1991_Cypress_Applications_Handbook 1991 Cypress Applications Handbook

Page Count: 736

APPLICATIONS
HANDBOOK
CYPRESS

SEMICONDUCTOR

Cypress Semiconductor, 3901 North First St., San Jose, CA 95134 (408) 943-2600
Telex: 821032 CYPRESS SNJ UD, TWX: 910 997 0753, FAX: (408) 943-2741

Cypress Semiconductor, Cypress PLD Toolkit, and QuickPro II are trademarks of Cypress Semiconductor
Corporation.
IBM, IBM PC, and PC/XT are registered trademarks of the International Business Machines Corporation.
SPARC is a registered trademark of SPARC International.
Data I/O is a registered trademark of the Data I/O Corporation.
PLD Test and ABEL are trademarks of the Data I/O Corporation.
STAG is a registered trademark of Stag Microsystems Ltd.

Published August 1991
© Cypress Semiconductor Corporation, 1991. The information contained herein is subject to change without notice. Cypress Semiconductor Corporation assumes no responsibility for the
use of any circuitry other than circuitry embodied in a Cypress Semiconductor Corporation product. Nor does it convey or imply any license under patent or other rights. Cypress Semiconductor
does not authorize its products for use as critical components in life-support systems where a malfunction or failure of the product may reasonably be expected to result in significant injury
to the user. The inclusion of Cypress Semiconductor products in life-support systems applications implies that the manufacturer assumes all risk of such use and in so doing indemnifies Cypress
Semiconductor against all damages.


Preface

About This Book
This Applications Handbook is a learning tool for using Cypress devices. The application notes included here range from general product overview articles, such as "Understanding Dual-Port RAMs," to specific design examples.
The general overviews describe product-family characteristics and explain some of the products' capabilities. These application notes appear at the beginning of this Handbook.
Next appear application examples that show how to use specific Cypress devices in the context of real designs. The application examples are organized by product type (e.g., SRAMs or EPLDs). Within each product type, examples are arranged by product number, using the product that is the article's primary focus.
Although your specific application might not appear explicitly in an application note, the design examples can still be useful to you. If the design example is similar to your application, you might be able to adapt the hardware or software to your design easily. Many of the application notes provide PLD software code for design tools from a variety of vendors, so that you can copy the code and use it as a skeleton for your own PLD designs. Even if none of the examples relate directly to your design, they can stimulate new ideas by showing features or applications that might not have occurred to you. The information can also significantly reduce the learning curve normally associated with unfamiliar ICs.
Most of the designs described in this Handbook are based on actual circuits produced either by Cypress or by one of our customers. Application notes that discuss specific designs indicate whether the designs have been simulated and/or built and completely debugged.
If you have questions about any Cypress product, please contact your local Field Applications Engineer at the nearest direct sales office. A list of Cypress sales offices, representatives, and distributors is included at the back of this Handbook. For continuous on-line information about Cypress products, you can connect to the Cypress Bulletin Board at (408) 943-2954.

About Cypress Semiconductor
Since its incorporation in 1982, Cypress has successfully addressed diverse, high-performance niche
markets by creating technologically sophisticated
products, using innovative packaging, and emphasizing
quality. Cypress is a complete semiconductor manufacturer, performing its own process development, circuit
design, wafer fabrication, assembly, and test. Its core
CMOS and BiCMOS processes lead the industry with
0.8-micron design rules. Cypress ships over 200
products in seven product areas: SRAMs, PROMs,
PLDs, logic devices, SPARC microprocessors and
peripherals, multichip modules, and high-speed
BiCMOS PLD and memory devices. Cypress is an international company, with headquarters in San Jose,
California and fabrication facilities in San Jose; Round
Rock, Texas; and Bloomington, Minnesota. The company has started up five subsidiaries that are funded by
Cypress but run as independent businesses, including
Cypress Semiconductor (Texas) Inc., Aspen Semiconductor Corporation,
Multichip Technology Incorporated, Ross Technology Inc., and Cypress Semiconductor (Minnesota) Inc.


Contents

General Information
System Design Considerations When Using Cypress CMOS Circuits
Power Characteristics of Cypress Products
Tips for High-Speed Logic Design
Protection, Decoupling, and Filtering of Cypress CMOS Circuits

Modules
Choosing Packages in High-Density Module Designs
The Multichip Family of Universal JEDEC ZIP/SIMM Modules

ECL and TTL BiCMOS
Noise Considerations in High-Speed Logic Systems
Using ECL in Single +5V TTL Systems
BiCMOS TTL and ECL SRAMs Improve High-Performance Systems
PLCC and CLCC Packaging for High-Speed Parts
A New Generation of BiCMOS High-Speed TTL SRAMs
Access Time vs. Load Capacitance for High-Speed BiCMOS TTL SRAMs
Combining SRAMs Without an External Decoder
BiCMOS TTL SRAMs Improve MIPS R3000 and R3000A Systems
Memory and Support Logic for Next-Generation ECL Systems

SRAMs
RAM I/O Characteristics
Understanding Dual-Port RAMs
Using Dual-Port RAMs Without Arbitration
Using Cypress SRAMs to Implement 386 Cache

PROMs
Pin-out Compatibility Considerations of SRAMs and PROMs
Introduction to Diagnostic PROMs
Interfacing the CY7C289 to the AM29000
Interfacing the CY7C289 to the CY7C601

PLDs
Introduction to Programmable Logic
CMOS PAL Basics
Are Your PLDs Metastable?
PLD-Based Data Path For SCSI-2
PAL Design Example: A GCR Encoder/Decoder
T2 Framing Circuitry
Using CUPL with Cypress PLDs
Using ABEL to Program the Cypress 22V10
Using ABEL to Program the CY7C330
Using ABEL 3.2 to Program the Cypress CY7C331
Using Log/IC to Program the CY7C330
State Machine Design Considerations and Methodologies
Understanding the CY7C330 Synchronous EPLD
Using the CY7C330 in Closed-Loop Servo Control
FDDI Physical Connection Management Using the CY7C330
Bus-Oriented Maskable Interrupt Controller
Using the CY7C330 as a Multi-channel Mbus Arbiter
Using the CY7C331 as a Waveform Generator
CY7C331 Application Example: Asynchronous, Self-Timed VMEbus Requestor
Understanding the 361
Using the CY7C361 as an Mbus Arbiter
TMS320C30/VME Signal Conditioner Using the CY7C361
DMA Control Using the CY7C342 MAX EPLD
Interfacing PROMs and RAMs to High-Speed DSP Using MAX
FIFO RAM Controller with Programmable Flags

Logic
Understanding Small FIFOs
Understanding Large FIFOs
Designing with the CY7C439 Bidirectional FIFO (BIFO)
Microcoded System Performance
Systems with CMOS 16-Bit Microprocessor ALUs

RISC
SPARC Software Advantages Over CISC
Register Windows
CY7C600 System Design Footnotes
The Impact of Memory on High-Performance RISC Microprocessors
High-Speed CMOS SPARC Design
SPARC System Surface-Mount Design
Memory System Design for the CY7C601 SPARC Processor
Cache Memory Design
Synchronous Trap Identification for CY7C600 Systems
An Introduction to Mbus
Multiprocessing System Boot-Up
Porting UNIX to the CY7C604 or CY7C605
Getting Started with Real-Time Embedded System Development
SPARC as a Real-Time Controller
Memory Protection and Address Exception Logic for the CY7C611 SPARC Controller

Bus Products
VIC068 Special Features and Tips
Interfacing the VIC068 to MC68020

Glossary
Index

System Design Considerations When Using Cypress CMOS Circuits

This application note describes some factors to consider when designing new systems using Cypress high-performance CMOS integrated circuits or when using Cypress products to replace either bipolar or NMOS circuits in existing systems. The two major areas of concern are device input sensitivity and transmission line effects due to impedance mismatching between the source and load.
To achieve maximum performance when using Cypress CMOS ICs, pay attention to the placement of the components on the printed circuit board (PCB); the routing of the metal traces that interconnect the components; the layout and decoupling of the power distribution system on the PCB; and, perhaps most important of all, the impedance matching of some traces between the source and the loads. The latter traces must, under certain conditions, be analyzed as transmission lines. The most critical traces are those of clocks, write strobes on SRAMs and FIFOs, output enables, and chip enables.

Replacing Bipolar or NMOS ICs
Cypress CMOS ICs are designed to replace both bipolar ICs and NMOS products and to achieve equal or better performance at one-third (or less) the power of the components they replace.
When high-performance Cypress CMOS circuits replace either bipolar or NMOS circuits in existing sockets, be aware of conditions in the existing system that could cause the Cypress ICs to behave in unexpected ways. These conditions fall into two general categories: device input sensitivity and sensitivity to reflected voltages.

Input Sensitivity
High-performance products, by definition, require less energy at their inputs to change state than low- or medium-performance products.
Unlike a bipolar transistor, which is a current-sensing device, a MOS transistor is a voltage-sensing device. In fact, a MOS circuit design parameter called K' is analogous to the gm of a vacuum tube and is inversely proportional to the gate oxide thickness.
Thin gate oxides, which are required to achieve the desired performance, result in highly sensitive inputs. These inputs require very little energy at or above the device input-voltage threshold (approximately 1.5V at 25°C) to be detected. CMOS products might detect high-frequency signals to which bipolar devices would not respond.
MOS transistors also have extremely high input impedances (5 to 10 MΩ), which make these transistors' gate inputs analogous to the input of a high-gain amplifier or an RF antenna. In contrast, because bipolar ICs have input impedances of 1000Ω or less, these devices require much more energy to change state than do MOS ICs. In fact, a typical Cypress IC requires less than 10 picojoules of energy to change state. Thus, when Cypress CMOS ICs replace bipolar or NMOS ICs in existing systems, the CMOS ICs might respond to pulses of energy in the system that are not detected by the bipolar or NMOS products.

Reflected Voltages
Cypress CMOS ICs have very high input impedances and, to achieve TTL compatibility and drive capacitive loads, low output impedances. The impedance mismatch due to low-impedance outputs driving high-impedance inputs might cause unwanted voltage reflections and ringing under certain conditions. This behavior could result in less-than-optimum system operation.
When the impedance mismatch is very large, a nearly equal and opposite negative pulse reflects back from the load to the source when the line's electrical length (PCB trace) is greater than
$$l = \frac{t_r}{2\,T_{pd}}$$
where tr is the rise time of the signal at the source, and Tpd is the one-way propagation delay of the line per unit length.
The input clamping diodes in bipolar IC families (e.g., TTL, LS, ALS, FAST, FACT) are inherent in the fabrication process. The P substrate is usually grounded, and N-wells are used for the NPN transistors and P-type resistors. The wells are reverse biased by connecting them to the VCC supply. As a result, a PN junction diode is formed between every input pin (cathode or N material) and the substrate (anode or P material). A negative voltage at an input pin due to either lead inductance or a voltage reflection forward biases the diode, which turns on and clamps the input pin to a Vf below ground (approximately -0.8V).
Historically, as circuit performance improved, the output rise and fall times of the bipolar circuits decreased to the point where voltage reflections began to occur even for short traces when an impedance mismatch existed between the line and the load. Most users, however, were unaware of these reflections because the reflections were suppressed by the diodes' clamping action.
Conventional CMOS processing results in PN junction diodes, which adversely affect the ESD (electrostatic discharge) protection circuitry at each input pin and cause an increased susceptibility to latch-up. In addition, when the input pin is negative enough to forward bias the input clamping diodes, electrons are injected into the substrate. When a sufficient number of electrons are injected, the resulting current can disturb internal nodes, causing soft errors at the system level.
To eliminate this problem, all Cypress CMOS products use a substrate bias generator. The substrate is maintained at a negative 3V potential, so the substrate diodes cannot be forward biased unless the voltage at the input pin becomes a diode drop more negative than -3V. (See Figure 5 in "CMOS PAL Basics" for a schematic of the input protection circuits used on all Cypress CMOS products.) To the systems designer, this translates to approximately five times (3.8V divided by 0.8V = 4.75) the negative undershoot safety margin for Cypress CMOS integrated circuits versus those that do not use a bias generator.
Voltage reflections should be eliminated by using impedance matching techniques and passive components that dissipate excess energy before it can cause soft errors. Crosstalk should be reduced to acceptable levels by careful PCB layout and attention to details.
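The two numbers used in this section can be checked with a few lines of Python. The sketch below evaluates the unloaded length limit l = tr / (2 Tpd) and the undershoot margin ratio (3.8V / 0.8V). The function names are illustrative, not part of any Cypress tool, and the example propagation delay of 2.27 ns per foot is the G-10 stripline value quoted later in this note.

```python
def critical_length_inches(rise_time_ns, tpd_ns_per_ft):
    """Unloaded trace length (inches) beyond which reflections may appear.

    Implements l = tr / (2 * Tpd), with Tpd in ns per foot.
    """
    length_ft = rise_time_ns / (2.0 * tpd_ns_per_ft)
    return 12.0 * length_ft


def undershoot_margin_ratio(substrate_bias_v=3.0, diode_drop_v=0.8):
    """Ratio of undershoot tolerated with a substrate bias generator
    to the undershoot tolerated with a grounded substrate."""
    return (substrate_bias_v + diode_drop_v) / diode_drop_v


if __name__ == "__main__":
    # 2-ns rise time on a line with 2.27 ns/ft intrinsic delay (assumed example)
    print(f"critical length: {critical_length_inches(2.0, 2.27):.2f} in")
    print(f"undershoot margin: {undershoot_margin_ratio():.2f}x")
```

This limit ignores load capacitance, so it is an upper bound on the loaded figure; Equation 13 later in the note gives the loaded result.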

Crosstalk
The rise and fall times of the waveforms generated by Cypress CMOS circuit outputs are 2 to 4 ns between levels of 0.4 and 4V. The fast transition times and the large voltage swings could cause capacitive and inductive coupling (crosstalk) between signals if insufficient attention is paid to PCB layout.
You can reduce crosstalk by avoiding running PCB traces parallel to each other. If this is not possible, run ground traces between signal traces.
In synchronous systems, the worst time for the crosstalk to occur is during the clock edge that samples the data. In most systems, it is sufficient to isolate the clock, chip select, output enable, and write and read control lines from each other and from data and address lines so that the signals do not cause coupling to each other or to the data lines.
It is standard practice to use ground or power planes between signal layers on multi-layered PCBs to reduce crosstalk. The capacitance of these isolation planes increases the propagation delay of the signals on the signal layers, but this drawback is more than compensated for by the isolation the planes provide.

The Theory of Transmission Lines
A connection (trace) on a PCB should be considered as a transmission line if the wavelength of the applied frequency is short compared to the line length. If the wavelength of the applied frequency is long compared to the length of the line, you can use conventional circuit analysis.
In practice, transmission lines on PCBs are designed to be as nearly lossless as possible. This simplifies the mathematics required for their analysis, compared to a lossy (resistive) line.
Ideally, all signals between ICs travel over constant-impedance transmission lines that are terminated in their characteristic impedances at the load. In practice, this ideal situation is seldom achieved for a variety of reasons.
Perhaps the most basic reason is that the characteristic impedances of all real transmission lines are not constants, but present different impedances depending upon the frequency of the applied signal. For "classical" transmission lines driven by a single-frequency signal source, the characteristic impedance is "more constant" than when the transmission line is driven by a square wave or a pulse.
According to Fourier series expansion, a square wave consists of an infinite set of discrete frequency components: the fundamental plus odd harmonics of decreasing amplitude. When the square wave propagates down a transmission line, the higher frequencies are attenuated more than the lower frequencies. Due to dispersion, the different frequencies do not travel at the same speed.
Dispersion indicates the dependence of phase velocity upon the applied frequency (Reference 1, pg. 192). The result is that the square wave or pulse is distorted when the frequency components are added together at the load.
A second reason why practical transmission lines are not ideal is that they frequently have multiple loads. You can distribute the loads along the line at regular or irregular intervals or lump them together as close as practical at the end of the line. The signal-line reflections and ringing caused by impedance mismatches, non-uniform transmission line impedances, inductive leads, and non-ideal resistors could compromise the dynamic system noise margins and cause inadvertent switching.

Figure 1. Transmission Line Model

One system design objective is to analyze the critical signal paths and design the interconnections such that adequate system noise margins are maintained. There will always be signal overshoot and undershoot. The objective is to accurately predict these effects, determine acceptable limits, and keep the undershoot and overshoot within the limits.
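The Fourier description of a square wave used in this section can be made concrete with a short Python sketch. It assumes an ideal square wave that swings between -1 and +1 with a 50 percent duty cycle, for which the n-th odd harmonic has amplitude 4/(πn); the function names are illustrative only. Dropping or delaying the higher harmonics, as a real line does, is what rounds and distorts the edges.

```python
import math


def square_wave_harmonics(count):
    """Return (harmonic number, amplitude) pairs for the first `count`
    odd harmonics of an ideal +/-1 square wave."""
    return [(n, 4.0 / (math.pi * n)) for n in range(1, 2 * count, 2)]


def partial_sum(t, period, count):
    """Rebuild the square wave at time t from its first `count` odd harmonics."""
    return sum(
        amplitude * math.sin(2.0 * math.pi * n * t / period)
        for n, amplitude in square_wave_harmonics(count)
    )


if __name__ == "__main__":
    for n, amplitude in square_wave_harmonics(5):
        print(f"harmonic {n}: amplitude {amplitude:.4f}")
    # Middle of the positive half-cycle, rebuilt from 200 harmonics
    print(f"value at T/4: {partial_sum(0.25, 1.0, 200):.4f}")
```

Comparing `partial_sum` for a small and a large harmonic count shows how much of the edge sharpness is carried by the high-frequency components that the line attenuates most.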

The Ideal Transmission Line
An equivalent circuit for a transmission line appears in Figure 1. The circuit consists of subsections of series resistance (R) and inductance (L) and parallel capacitance (C) and shunt admittance (G) or parallel resistance, Rp. For clarity and consistency, these parameters are defined per unit length. Multiply the values of R, L, C, and Rp by the length of the subsection, l, to find the total value. The line is assumed to be infinitely long.
If the line of Figure 1 is assumed to be lossless (R = 0, Rp = infinity), Figure 1 reduces to Figure 2. A small series resistance has little effect upon the line's characteristic impedance. In practice and by design, the series resistance is quite small. For 1-ounce (0.0015-inch-thick), 10-mil-wide (0.010-inch) copper traces on G-10 glass epoxy PCBs, the trace resistance is between 0.3 and 0.5Ω per foot. 2-ounce copper has a resistance 50 percent lower than that of 1-ounce copper.

Figure 2. Ideal Transmission Line Model

Input or Characteristic Impedance
To calculate the characteristic impedance (also called AC impedance or surge impedance) looking into terminals a-b of the circuit in Figure 2, use the following procedure.
Let Z1 be the input impedance looking into terminals a-b, with Z2 for terminals c-d, Z3 for terminals e-f, etc. Z1 is the series impedance of the first inductor (lL) in series with the parallel combination of Z2 and the impedance of the capacitor (lC).
From AC theory:
$$X_L = j\omega l L$$
where XL is the inductive reactance, and
$$X_C = \frac{1}{j\omega l C}$$
where XC is the capacitive reactance.
Then
$$Z_1 = X_L + \frac{Z_2 X_C}{Z_2 + X_C} \qquad \text{(Eq. 1)}$$
If the line is reasonably long, Z1 = Z2 = Z3. Substituting Z1 = Z2 into Equation 1 yields
$$Z_1 = X_L + \frac{Z_1 X_C}{Z_1 + X_C}$$
or,
$$Z_1^2 - Z_1 X_L - X_C X_L = 0 \qquad \text{(Eq. 2)}$$

Substituting the expressions for XC and XL yields
$$Z_1^2 - j\omega l L\,Z_1 = \frac{L}{C} \qquad \text{(Eq. 3)}$$
Equation 3 contains a complex component that is frequency dependent. You can eliminate the complex component by allowing l to become very small and by recognizing that the ratio L/C is constant and independent of l or ω:
$$Z_1 = \sqrt{\frac{L}{C}} \qquad \text{(Eq. 4)}$$
The AC input impedance of a purely reactive, uniform, lossless line is a resistance. This is true for AC or DC excitation.

Propagation Velocity and Delay
The propagation velocity (or phase velocity) of a sinusoid traveling on an ideal line (Reference 1, pg. 33) is
$$\alpha = \frac{1}{\sqrt{LC}}$$
The propagation delay for a lossless line is the reciprocal of the propagation velocity:
$$T_{pd} = \sqrt{LC} = Z_1 C \qquad \text{(Eq. 5)}$$
where L and C are once again the intrinsic line inductance and capacitance per unit length.
Adding additional stubs or loads to the line (Reference 2, pg. 129) increases the propagation delay by the factor
$$\sqrt{1 + \frac{C_D}{C}}$$
where CD is the load capacitance.
Therefore, the propagation delay of a loaded line, TpdL, is
$$T_{pdL} = T_{pd}\sqrt{1 + \frac{C_D}{C}} \qquad \text{(Eq. 6)}$$
This application note shows later that a transmission line's unloaded or intrinsic propagation delay is proportional to the square root of the dielectric constant of the medium surrounding or adjacent to the line. Propagation delay is not a function of the line's geometry.
The characteristic impedance of a capacitively loaded line decreases by the same factor that the propagation delay increases:
$$Z_1' = \frac{Z_1}{\sqrt{1 + \dfrac{C_D}{C}}} \qquad \text{(Eq. 7)}$$
Note that the capacitance per unit length must be multiplied by the line length, l, to calculate an equivalent lumped capacitance.

The Condition for Voltage Reflection
It is relatively straightforward to obtain a closed-form solution for a transmission line's maximum allowable length, which, if exceeded, might cause a voltage reflection. If the line is not terminated in its characteristic impedance, a reflection is guaranteed to occur. The reflection's amplitude depends on the amount of impedance mismatch between the line and the load and whether the rise time of the signal at the source equals or is greater (slower) than two times the propagation delay of the line.
The condition for a voltage reflection to occur is
$$l \ge \frac{t_r}{2\,T_{pdL}} \qquad \text{(Eq. 8)}$$
Solving for the loaded propagation delay yields
$$T_{pdL} = \frac{t_r}{2\,l} \qquad \text{(Eq. 9)}$$
However, the actual physical length of the line is
$$l = \frac{t_r}{2\,T_{pd}} \qquad \text{(Eq. 10)}$$
The intrinsic capacitance of the line from Equation 5 is
$$C_o = \frac{T_{pd}}{Z_o} \qquad \text{(Eq. 11)}$$
It is standard practice to use Co to designate the intrinsic line capacitance, Lo the intrinsic line self inductance, and Zo the intrinsic line characteristic impedance.
Substituting the expressions from Equations 9, 10, and 11 into Equation 6 gives the relationship for the line length at which voltage reflections might occur:
$$\frac{t_r}{2\,l} = T_{pd}\sqrt{1 + \frac{C_D}{l \times \dfrac{T_{pd}}{Z_o}}} \qquad \text{(Eq. 12)}$$
Two conditions must be present for voltage reflections to occur: the line must be long, and there must be an impedance mismatch between the line and the load.
Solving Equation 12 for the line length, l, yields
$$l = \frac{t_r}{2\,T_{pd}}\sqrt{\frac{1}{1 + \dfrac{C_D Z_o}{t_r}}} \qquad \text{(Eq. 13)}$$
Equation 13 is very useful to the system designer. It is generic and applies to all products irrespective of circuit type, logic family, or voltage levels. The equation allows you to estimate when a line requires termination, using variables you can easily determine.
When driving a distributed or non-lumped load, the signal's rise time depends on the source, not the load, as you might expect. The intrinsic, or unloaded, line propagation delay per unit length is a function of the dielectric constant and can be easily calculated. The intrinsic line characteristic impedance is a function of the dielectric constant and the PCB's physical construction or geometry and can also be calculated. Finally, you can estimate the equivalent (lumped) load capacitance by adding up the number of loads (device inputs) being driven and multiplying by 10 pF. For I/O pins, use 15 pF per pin.

Signal Transition Times
The standard Cypress 0.8µ (L drawn) CMOS process yields output buffers whose signals transition approximately 4V in 2 ns; that is, they have a slew rate of 2V per nanosecond. The rise time/fall time is 2 ns. Products fabricated using the Cypress BiCMOS process have the same rise times.
The Cypress ECL process yields products with 500-ps output signal rise times and fall times, or slew rates of 1V/0.5 ns = 2V per nanosecond. Internal signal slew rates are 10V per nanosecond, but only for short (usually less than 500 mV) voltage excursions. Thus, high-frequency noise is generated on chip, which you can eliminate by using 100- to 500-pF ceramic or mica filter capacitors between Vee and ground.
The values in Table 1 come from using Equation 13 to calculate the line length at which voltage reflections might occur. The calculations assume a 50Ω intrinsic line characteristic impedance and that the PCB is multilayer, using stripline construction on G-10 glass epoxy material (dielectric constant of 5). These conditions result in an unloaded line propagation delay of 2.27 ns per foot.
Table 1 reveals that decreasing the source rise time from 2 to 0.5 ns (a factor of 4) decreases the line length at which a voltage reflection might occur by a factor of 5 (4.73 divided by 0.93 = 5.09) for the same load (10 pF) and intrinsic propagation delay (2.27 ns/ft.). A second observation is that for signals with rise times of 0.5 ns, you should terminate all lines.
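The Table 1 values can be regenerated with a short Python sketch. It evaluates Equation 13 in the form l = (tr / 2Tpd) × sqrt(1 / (1 + CD × Zo / tr)), which reproduces the tabulated lengths for Zo = 50Ω and Tpd = 2.27 ns per foot. The function names and the load-estimating helper (10 pF per input, 15 pF per I/O pin, as suggested above) are illustrative and not part of any Cypress tool.

```python
import math


def lumped_load_pf(device_inputs, io_pins=0):
    """Estimate the equivalent lumped load: 10 pF per input, 15 pF per I/O pin."""
    return 10.0 * device_inputs + 15.0 * io_pins


def reflection_length_inches(rise_time_ns, load_pf, z0_ohms=50.0, tpd_ns_per_ft=2.27):
    """Line length (inches) at which a voltage reflection might occur (Equation 13)."""
    # pF * ohm = 1e-12 s = 1e-3 ns, so this term is dimensionless
    load_term = (load_pf * 1e-3) * z0_ohms / rise_time_ns
    length_ft = rise_time_ns / (2.0 * tpd_ns_per_ft) / math.sqrt(1.0 + load_term)
    return 12.0 * length_ft


if __name__ == "__main__":
    print("tr (ns)  CD (pF)  l (inches)")
    for rise_time in (2.0, 1.0, 0.5):
        for load in (10.0, 20.0, 40.0, 80.0):
            length = reflection_length_inches(rise_time, load)
            print(f"{rise_time:7.1f}  {load:7.0f}  {length:10.2f}")
```

For a line driving, say, two inputs and one I/O pin, `reflection_length_inches(2.0, lumped_load_pf(2, 1))` gives the corresponding limit for a 2-ns edge.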

reflects back from the load to the source, where the
voltage either adds to or subtracts from the original signal. A mismatch between the source and line impedance might also cause a voltage reflection, which in
tum reflects back to the load. Therefore, two reflection
coefficients are defmed.
For classical transmission lines driven by a single
frequency source, the impedance mismatches cause
standing waves. When pulses are transmitted and the
source's output impedance changes depending upon
whether a Low-to-High or a High-to-Low transition occurs, the analysis is complicated further.
You can use classical transmission line analysiswhere pulses are represented by complex variables with
exponentials - to calculate the voltages at the source
and the load after several back and forth reflections.
However, these complex equations tend to obscure
what is physically happening.

Energy Considerations
Now consider the effects of driving the ideal transmission line with digital pulses and analyze the behavior
of the line under various driving and loading conditions.
The first task is to define the load and source reflection
coefficient s.
Figure 3 shows the circuit to be analyzed. The ideal
transmission line of length I is driven by a digital source
of internal resistance Rs and loaded with a resistive load
RL. The characteristic impedance of the line appears as
a pure resistance,
Zo= ..JLIC
to any excitation.
The ideal case is when Rs = Zo = RL. The maximum energy transfer from source to load occurs under
this condition, and no reflections occur. Half the energy
is dissipated in the source resistance, Rs, and the other
half is dissipated in the load resistance, RL (the line is
lossless).
If the load resistor is larger than the line's characteristic impedance, extra energy is available at the load
and is reflected back to the source. This is called the
underdamped condition, because the load under-uses
the energy available. If the load resistor is smaller than
the line impedance, the load attempts to dissipate more
energy than is available. Because this is not possible, a
reflection occurs that signals the source to send more
energy. This is called the overdamped condition. Both
the underdamped and overdamped cases cause negative
traveling waves, which cause standing waves if the excitation is sinusoidal. The condition Zo = RL is called
critically damped.
The safest termination condition, from a systems
design viewpoint, is the slightly overdamped condition,
because no energy is reflected back to the source.

Reflection Coefficients
Another attribute of the ideal transmission line is
reflection coefficients, which are not actually line characteristics. The line is treated as a circuit component,
and reflection coefficients are defined that measure the
impedance mismatches between the line and its source
and the line and its load. The reason for defining and
presenting the reflection coefficients becomes apparent
later when it is shown that if the impedance mismatch is
sufficiently large, either a negative or positive voltage
Table 1. Line Length at which a Voltage Reflection Occurs

tr (ns)    CD (pF)    L (inches)
2          10         4.73
2          20         4.32
2          40         3.74
2          80         3.05
1          10         2.16
1          20         1.87
1          40         1.53
1          80         1.18
0.5        10         0.93
0.5        20         0.76
0.5        40         0.59
0.5        80         0.44

Line Voltage For a Step Function
To determine the line voltage for a step function excitation, you apply a step function to the ideal line and analyze the behavior of the line under various loading conditions. The step function response is important because any pulse can be represented by the superposition of a positive step function and a negative step function, delayed in time with respect to each other. By proper superposition, you can predict the response of any line and load to any width pulse. The principle of superposition applies to all linear systems.
According to theory, the rise time of the signal driven by the source is not affected by the characteristics of the line. This has been substantiated in practice by using a special coaxially constructed reed relay that delivers a pulse of 18A into 50Ω with a rise time of 0.070 ns (Reference 1 pg. 162).
The equation representing the voltage waveform going down the line (Figure 3) as a function of distance and time is
$V_L(x, t) = V_A(t)\, U(t - x\, t_{pd})$ for $t < T_0$    Eq. 14
$V_A(t) = V_S(t) \left( \dfrac{Z_0}{Z_0 + R_S} \right)$    Eq. 15
where
VA = the voltage at point A
x = the distance from point A to a point X on the line
l = the total line length
tpd = the propagation delay of the line in nanoseconds per foot
To = l tpd, or the one-way line propagation delay
U(t) = a unit step function occurring at x = 0
Vs(t) = the source voltage

Figure 3. Ideal Transmission Line Loaded and Driven (source Vs with internal resistance Rs driving point A; line of impedance Zo and length l; load RL at point B)

When the incident voltage reaches the end of the line, a reflected voltage, VL', occurs if RL does not equal Zo. The reflection coefficient at the load, ρL, can be obtained by applying Ohm's Law.
The voltage at the load is VL + VL', which must be equal to (IL + IL')RL. But
$I_L = \dfrac{V_L}{Z_0}$
and
$I_L' = -\dfrac{V_L'}{Z_0}$
(The minus sign is due to IL' being negative; i.e., IL' is opposite to the current due to VL.) Therefore,
$V_B = V_L + V_L' = \left( \dfrac{V_L}{Z_0} - \dfrac{V_L'}{Z_0} \right) R_L$    Eq. 16
By definition:
$\rho_L = \dfrac{\text{reflected voltage}}{\text{incident voltage}} = \dfrac{V_L'}{V_L}$
Solving for VL'/VL in Equation 16 and substituting in the equation for ρL yields
$\rho_L = \dfrac{R_L - Z_0}{R_L + Z_0}$    Eq. 17
The reflection coefficient at the source is
$\rho_S = \dfrac{R_S - Z_0}{R_S + Z_0}$    Eq. 18
Rearranging Equation 16 yields
$V_B = V_L + V_L' = \left( 1 + \dfrac{V_L'}{V_L} \right) V_L = (1 + \rho_L)\, V_L$    Eq. 19
Equation 19 describes the voltage at the load (VB) as the sum of an incident voltage (VL) and a reflected voltage (ρL VL) at time t = To. When RL = Zo, no voltage is reflected. When RL < Zo, the reflection coefficient at the load is negative; thus, the reflected voltage subtracts from the incident voltage, giving the load voltage. When RL > Zo, the reflection coefficient is positive; thus, the reflected voltage adds to the incident voltage, again giving the load voltage.
Note that the reflected voltage at the load has been defined as positive when traveling toward the source. This means that the corresponding current is negative, subtracting from the current driven by the source.
This piecewise analysis is cumbersome and can be tedious. However, it does provide an insight into what is physically happening and demonstrates that a complex problem can be solved by dividing it into a series of simpler problems. Also, eliminating the exponentials, which provide phase information in the classical transmission line equations, simplifies the mathematics. To use the piecewise method, you must do careful bookkeeping to combine the reflections at the proper time. This is quite straightforward, because a pulse travels with a constant velocity along an ideal or low-loss line, and the time delay between reflected pulses can be predicted.
The rules to keep in mind are that at any location and time the voltage or the current is the algebraic sum of the waves traveling in both directions. For example, two voltage waves of the same polarity and equal amplitudes, traveling in opposite directions, at a given location and time add together to yield a voltage of twice the amplitude of one wave. The same reasoning applies to all points of termination and discontinuities on the line. The total voltage or current is the algebraic sum of all the incident and reflected waves. Polarities must be observed. A positive voltage reflection results in a negative current reflection and vice versa.
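The relationships in Equations 15, 17, 18, and 19 can be checked numerically. The following is a minimal Python sketch, not part of the original handbook; the function names and the 5V, 50Ω example values are assumptions chosen only for illustration.

```python
import math


def reflection_coefficient(r_term, z0):
    """Eq. 17 / Eq. 18: rho = (R - Z0) / (R + Z0).

    An open circuit (R = infinity) is handled as the limiting value +1.
    """
    if math.isinf(r_term):
        return 1.0
    return (r_term - z0) / (r_term + z0)


def launched_voltage(vs, rs, z0):
    """Eq. 15: the source resistance and the line form a voltage divider."""
    return vs * z0 / (z0 + rs)


def load_voltage_at_t0(vs, rs, rl, z0):
    """Eq. 19: V_B = (1 + rho_L) * V_L when the incident wave first arrives."""
    return (1.0 + reflection_coefficient(rl, z0)) * launched_voltage(vs, rs, z0)


if __name__ == "__main__":
    # Hypothetical example: 5V source matched to a 50-ohm line (Rs = Z0).
    vs, rs, z0 = 5.0, 50.0, 50.0
    for rl in (0.0, 25.0, 50.0, 100.0, math.inf):
        rho = reflection_coefficient(rl, z0)
        vb = load_voltage_at_t0(vs, rs, rl, z0)
        print(f"RL = {rl:>6} ohm  rho_L = {rho:+.3f}  VB(T0) = {vb:.3f} V")
```

With a matched source, a short circuit gives VB = 0, a matched load gives Vs/2, and an open circuit gives the full Vs, which agrees with the sign rules stated for Equation 19.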

Step Function Response of the Ideal Line
Before examining reflections at the source due to mismatches between the source and line impedances, consider the behavior of the ideal line with various loads when driven by a step function. The circuit for analysis appears in Figure 3. Figure 4 shows the voltage and current waveforms at point A (line input) and point B (the load) for various loads. (These values are drawn from Reference 1 pg. 158 - 159.) Note that Rs = Zo and that VA at t = 0 equals Vs/2. This means that no impedance mismatch exists between the source and the line; thus, there is no reflection from the source at t = 2To. To is the one-way propagation delay of the line.
The time-domain response of the reactive loads is obtained by applying a step function to the Laplace transform of the load, then taking the inverse transform.
Note that the reflection coefficient at the load is not the total reflection coefficient (a complex number) but represents only the real part of the load. The piecewise method eliminates the complex (jωt) terms by performing the bookkeeping involving the phase relationships, which the complex terms account for in classical transmission line analysis.
Note that for the open-circuit condition in Figure 4b, ZL = infinity, so that ρL = +1. The voltage is reflected from the load to the source (at amplitude Vo = Vs/2). Thus, at time t = 2To, the reflected voltage adds to the original voltage, Vo = Vs/2, to give a value of 2Vo = Vs. While the voltage wave is traveling down to and back from the load, a current of
$I_0 = \dfrac{V_0}{Z_0} = \dfrac{V_S}{2 Z_0}$
exists. This current charges up the distributed line capacitance to the value Vs, then the current stops.
The waveforms at the source and load for the series RC termination shown in Figure 4g are of particular interest because this network dissipates no DC power; you can use this network to terminate a transmission line in its characteristic impedance at the input to a Cypress IC. Figure 4h represents the equivalent circuit of a Cypress IC's input. Combining both networks models a Cypress IC driven by a transmission line terminated in the line's characteristic impedance, when the values of R and C are properly chosen.

Reflections Due to Discontinuities
Figure 5 illustrates three types of common discontinuities found on transmission lines. Any change in the characteristic impedance of the line due to construction, connectors, loads, etc., causes a discontinuity, which causes a reflection that directs some energy back to the source. The amount of energy reflected back is determined by the discontinuity's reflection coefficient. Because discontinuities are usually small by design, most of the energy is transmitted to the load.
In general, a discontinuity has series inductance, shunt capacitance, and series resistance. An example is a via from a signal plane through a ground plane to a second signal plane in a multilayer PCB or module. IC sockets and other connectors can also cause discontinuities.

Figure 5. Reflections from Discontinuities with an Applied Step Function: (a) Series Inductance, (b) Shunt Capacitance, (c) Series Resistance

Ideal Transmission Line's Pulse Response
Consider next the behavior of the ideal transmission line when driven by a pulse whose width is short compared to the line's electrical length, that is, when the pulse width is less than the line's one-way propagation delay time, To.
Figure 6 shows another series of response waveforms for the circuit in Figure 3, this time for a pulse instead of a step (drawn from Reference 1 pg. 160 - 161). Note that Rs = Zo and that VA at t = 0 equals Vs/2. This means that there is no impedance mismatch between the source and the line; thus, there is no reflection from the source at t = 2To.

Finite Rise Time Effects
Now consider the effects of step functions with finite rise times driving the ideal transmission line. During the rise time of a pulse, half the energy in the static electric field is converted into a traveling magnetic field and half remains as a static electric field to charge the line.
If the rise time is sufficiently short, the voltage at the load changes in discrete steps. The amplitude of the steps depends on the impedance mismatch, and the width of the steps depends on the line's two-way propagation delay.
As the rise time and/or the line gets shorter (smaller To), the result converges to the familiar RC time constant, where C is the static capacitance. All devices should be treated as transmission lines for transient analysis when an ideal step function is applied. However, as the rise time becomes longer and/or the traces shorter, the transmission line analysis reduces to conventional AC circuit analysis.

Reflections From Small Discontinuities
Figure 7 shows a pulse with a linear rise time and rounded edges driving the transmission line of Figures 5a and 5b. The expressions for Vr are derived on pages 171 and 172 of Reference 1. The reflection caused by the small series inductance is useful for calculating the value of the inductor, L', but little else.
The reflection caused by the small shunt capacitor is more interesting. If this capacitor is sufficiently large, it can cause a device connected to the transmission line to see a logic Zero instead of a logic One.

The Effect of Rise Time on Waveforms
Next, consider the ideal line terminated in a resistance less than its characteristic impedance and driven by a step function with a linear rise time. The stimulus, the circuit, and the response appear in Figures 8a, b, and c, respectively. Once again, note that because the source resistance equals the line characteristic impedance, there are no reflections from the source.
The resulting waveforms are similar to those of Figure 4c when modified as shown in Figure 8c. The waveform's final value must be the same as before (Figure 4c).
The resultant wave at the line input (Vin) is easily obtained by superposition of the applied wave and the reflected wave at the proper time. In Figure 8, because the step function's rise time is less than the line's two-way propagation delay, the input wave reaches its final value, Vs/2. At t = 2To, the reflected wave arrives back at the source and subtracts from the applied step function (the load reflection coefficient is negative). Figure 9 illustrates waveforms for two relationships between the step function rise time and the propagation delay.

Multiple Reflections
Now consider the case of an ideal transmission line with multiple reflections caused by improper terminations at both ends of the line. The circuit and waveforms appear in Figure 10. The reflection coefficients at the source and the load are both negative; the source resistance and the load resistance are both less than the line characteristic impedance.
When the switch is initially closed, a step function of amplitude
$V_0 = V_{in} = \dfrac{V_S Z_0}{R_S + Z_0}$
appears on the line and travels toward the load. After a one-way propagation delay time, To, the wave reflects back with an amplitude of ρL Vo.
This first reflected wave then travels back to the source, and at time t = 2To, the wave reaches the input end of the line. At this time, the first reflection at the source occurs, and a wave of amplitude ρs (ρL Vo) reflects back to the load. At time t = 3To, this wave again reflects from the load back to the source with amplitude
$\rho_L \rho_S (\rho_L V_0) = \rho_S \rho_L^2 V_0$
This back and forth reflection process continues until the amplitudes of the reflections become so small that they cannot be observed. Then, the circuit is said to be in a quiescent state.
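The bookkeeping for the back and forth reflection process can be tabulated with a short loop. The following Python sketch is an illustration only and does not come from the handbook: the function name `bounce_diagram`, its arguments, and the example resistances are assumptions, and both terminations are assumed to be finite resistances.

```python
def bounce_diagram(vs, rs, rl, z0, n_round_trips=10):
    """Tabulate the step-by-step voltages on an ideal line.

    Returns (source_steps, load_steps). Each entry is a tuple
    (time in units of T0, voltage in volts). Assumes finite Rs and RL.
    """
    rho_s = (rs - z0) / (rs + z0)   # source reflection coefficient
    rho_l = (rl - z0) / (rl + z0)   # load reflection coefficient
    wave = vs * z0 / (rs + z0)      # initial step V0 launched at t = 0
    v_source = wave                 # line-input voltage just after t = 0
    v_load = 0.0
    source_steps = [(0, v_source)]
    load_steps = []
    for k in range(n_round_trips):
        # The forward wave reaches the load at t = (2k + 1) T0 and reflects.
        reflected = rho_l * wave
        v_load += wave + reflected
        load_steps.append((2 * k + 1, v_load))
        # The reflection reaches the source at t = (2k + 2) T0 and re-reflects.
        re_reflected = rho_s * reflected
        v_source += reflected + re_reflected
        source_steps.append((2 * k + 2, v_source))
        wave = re_reflected         # amplitude scaled by rho_s * rho_l per round trip
    return source_steps, load_steps


if __name__ == "__main__":
    # Hypothetical values: both ends lower than Z0, so both coefficients are negative.
    src, load = bounce_diagram(vs=5.0, rs=10.0, rl=20.0, z0=50.0, n_round_trips=6)
    for t, v in load:
        print(f"t = {t} T0  V_load = {v:.4f} V")
```

Each round trip multiplies the traveling wave by ρs ρL, so the steps shrink geometrically and the load voltage settles at the DC divider value Vs RL/(Rs + RL).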

Effective Time Constant
Voltage reflections in small increments and of short durations approximate an exponential function, as indicated by the dashed line in Figure 10b. The smaller and narrower the steps become, the more closely the waveform approaches an exponential curve.
The mathematical derivation is presented on pages 178 and 179 of Reference 1. The time constant is
$k = -\dfrac{2 T_0}{1 - \rho_S \rho_L}$    Eq. 20
Thus, the resultant voltage waveform at the load can be approximated by
$V(t) = V_0\, e^{t/k}$    Eq. 21
For Equation 21 to be accurate, ρL and ρs must be reasonably large (approaching ± 1) so that the incremental steps are small. Because the product ρs ρL is a positive number, less than one, the time constant is a negative number, which indicates that the exponential decreases with time. This is usually the case in transient circuits.
Both reflection coefficients must also have the same sign to yield a continually decreasing or increasing waveform. Opposite signs give oscillatory behavior that cannot be represented by an exponential function.

From Transmission Line to Circuit Analysis
When a transmission line is terminated in its characteristic impedance, the line behaves like a resistor. It usually does not matter if you use transmission line or circuit analysis, provided that you take the propagation delays into account.
Consider the case of a short-circuited transmission line driven by a step function with a source impedance unequal to the characteristic line impedance. The general case is shown in Figure 10a. For RL = 0 the reflection coefficients are
$\rho_S = \dfrac{Z_S - Z_0}{Z_S + Z_0}$
$\rho_L = -1$
The approximate time constant is
$-k = \dfrac{2 T_0}{1 - \rho_S \rho_L} = \dfrac{2 T_0}{1 + \rho_S} = \dfrac{T_0 (Z_S + Z_0)}{Z_S}$    Eq. 22
or
$-k = T_0 + \dfrac{T_0 Z_0}{Z_S}$
Recall that
$T_0 = l \sqrt{LC}$
(one-way delay) and
$Z_0 = \sqrt{L/C}$
where l is the physical length of the line, and L and C are the per-unit-length parameters. Substituting these variables into Equation 22 yields
$-k = T_0 + \dfrac{l L}{Z_S}$
It is necessary to have Zs smaller than Zo. Thus, the reflection coefficients have the same sign to give exponential behavior. Opposite signs give oscillatory behavior.
If Zs << Zo, the exponential approximation becomes more accurate. If Zs is very small compared to Zo, then To is negligible compared to lL/Zs, so that Equation 22 reduces to
$k = -\dfrac{l L}{Z_S}$
But lL is the total loop inductance, and Zs is the circuit's total series impedance. The magnitude of the time constant is then
$|k| = \dfrac{L'}{R_S}$
This is the same time constant you would obtain by a circuit analysis approach if you considered the line a series combination of L' and Rs. By open-circuiting the line and performing a similar analysis, it can be shown that an RC time constant results.

Figure 6. Pulse Response of Figure 3 for Various Terminations (VA = Vs/2, Io = Vo/Zo, To = l√(LC), ρL = (RL - Zo)/(RL + Zo))
Figure 7. Reflections From Small Discontinuities with a Finite Rise Time Pulse: (a) Applied Pulse from Generator, (b) Reflections from Small Series Inductor L', (c) Reflections from Small Shunt Capacitance C'
Figure 8. Effect of Rise Time on Response of Mismatched Line with RL < Zo
Figure 9. Effects of Rise Time on Response for RL < Zo: (a) TR = 2To, (b) TR > 2To
Figure 10. Step Function Applied to Line Mismatched on Both Ends; Shown for Negative Values of ρs and ρL: (a) circuit, (b) Vin, (c) line current, (d) load voltage

Types of Transmission Lines
The types of transmission lines include
Coaxial cable
Twisted pair
Wire over ground
Microstrip lines
Strip lines

Coaxial Cable
Coaxial cable offers many advantages for distributing high-frequency signals. The well-defined and uniform characteristic impedance permits easy matching. The cable's ground shield reduces crosstalk, and the low attenuation at high frequencies makes the cable ideal for transmitting the fast rise- and fall-time signals generated by Cypress CMOS ICs. However, because of its high cost, coaxial cable is usually restricted to applications that permit no alternatives. These applications usually involve clock distribution systems on PCBs or backplanes.
Because coaxial cable is not easily handled by automated assembly techniques, its application requires human assemblers. This requirement further increases costs.
Coaxial cables have characteristic impedances of 50, 75, 93, or 150Ω. These values are the most common, although special cables can be made with other impedances.
Coaxial cable's propagation delay is very low. You can compute it using the formula
$T_{pd} = 1.017 \sqrt{\epsilon_r}$ (ns/ft)    Eq. 23
where εr is the relative dielectric constant and depends upon the dielectric material used. For solid Teflon and polyethylene, the dielectric constant is 2.3. The propagation delay is 1.54 ns per foot. For maximum propagation velocity, you can use coaxial cables with dielectric Styrofoam or polystyrene beads in air. Many of these cables have high characteristic impedances and are slowed considerably when capacitively loaded.

Twisted Pair
You can make twisted pairs from standard wire (AWG 24 - 28), twisted about 30 turns per foot. The typical characteristic impedance is 110Ω.
Because the propagation delay is directly proportional to the characteristic impedance (Equation 5), the propagation delay is approximately twice that of coaxial cable. Twisted pairs are used for backplane wiring, sometimes for driving differential receivers, and for breadboarding.

Wire Over Ground
Figure 11 shows a wire over ground. This configuration is used for breadboarding and backplane wiring. The characteristic impedance is approximately 120Ω. This value can vary as much as ± 40 percent, depending upon the distance from the ground plane, the proximity of other wires, and the configuration of the ground.

Figure 11. Wire Over Ground
$Z_0 = \dfrac{60}{\sqrt{\epsilon_r}} \ln\left( \dfrac{4h}{d} \right)$

Microstrip Lines
A microstrip line (Figure 12) is a strip conductor (signal line) on a PCB separated from a ground plane by a dielectric. If the line's thickness, width, and distance from the ground plane are controlled, the line's characteristic impedance can be predicted with a tolerance of ± 5 percent.
The formula given in Figure 12 has proven to be very accurate for width-to-height ratios between 0.1:1 and 3.0:1 and for dielectric constants between 1 and 15.
The inductance per foot for microstrip lines is
$L = (Z_0)^2 C_0$    Eq. 24
where Zo is the characteristic impedance and Co is the capacitance per foot.
The propagation delay of a microstrip line is
$T_{pd} = 1.017 \sqrt{0.45\, \epsilon_r + 0.67}$ (ns/ft)    Eq. 25
Note that the propagation delay depends only upon the dielectric constant and is not a function of the line width or spacing. For G-10 fiberglass epoxy PCBs (dielectric constant of 5), the propagation delay is 1.74 ns per foot.

Figure 12. Microstrip Line
$Z_0 = \dfrac{87}{\sqrt{\epsilon_r + 1.41}} \ln\left( \dfrac{5.98h}{0.8w + t} \right)$

Strip Line
A strip line consists of a copper strip centered in a dielectric between two conducting planes (Figure 13). If the line's thickness, width, dielectric constant, and distance between ground planes are all controlled, the tolerance of the characteristic impedance is within ± 5 percent. The equation given in Figure 13 is accurate for W/(b - t) < 0.35 and t/b < 0.25.
The inductance per foot is given by the formula
$L = (Z_0)^2 C_0$
The propagation delay of the line is given by the formula
$T_{pd} = 1.017 \sqrt{\epsilon_r}$ (ns/ft)    Eq. 26
For G-10 fiberglass epoxy boards, the propagation delay is 2.27 ns per foot. The propagation delay is not a function of line width or spacing.
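The delay figures quoted for coaxial cable, microstrip, and strip line follow directly from Equations 23, 25, and 26. The Python sketch below simply evaluates those formulas; the function names are illustrative assumptions, and the dielectric constants are the ones quoted in the text.

```python
import math


def tpd_coax(er):
    """Eq. 23: coaxial cable delay in ns per foot."""
    return 1.017 * math.sqrt(er)


def tpd_microstrip(er):
    """Eq. 25: microstrip delay in ns per foot."""
    return 1.017 * math.sqrt(0.45 * er + 0.67)


def tpd_stripline(er):
    """Eq. 26: strip line delay in ns per foot."""
    return 1.017 * math.sqrt(er)


def inductance_per_foot(z0, c0):
    """Eq. 24: L = Z0^2 * C0, with C0 the capacitance per foot."""
    return z0 ** 2 * c0


if __name__ == "__main__":
    print(f"Coax, er = 2.3:        {tpd_coax(2.3):.2f} ns/ft")
    print(f"Microstrip, er = 5:    {tpd_microstrip(5.0):.2f} ns/ft")
    print(f"Strip line, er = 5:    {tpd_stripline(5.0):.2f} ns/ft")
```

For the same G-10 material, the strip line is slower than the microstrip because the strip line's field is entirely inside the dielectric, which is why Equation 26 uses εr directly.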

Line Termination Strategies
There are two general strategies for transmission
line termination:
1. Match the load impedance to the line impedance
2. Match the source impedance to the line impedance
In other words, if either the load reflection coefficient or the source reflection coefficient can be made to
equal zero, reflections are eliminated. From a systems
design viewpoint, strategy 1 is preferred. Eliminating
the reflection at the load (i.e., dissipating the excess
energy) before the energy travels back to the source
causes less noise, electromagnetic interference (EMI),
and radio frequency interference (RFI).

Modern PCBs
Most PCBs employ microstrip, stripline, or some
combination of the two. Microstrip construction on a
double-sided board with power and ground nets can
suffice for low- to medium-performance and low-density PCBs.
For high-performance, high-density PCBs, stripline
construction is preferred. Power planes isolate signal
layers from each other and provide higher-quality
power and grounds than those of a two-layer board.
Manufacturing quality control assures that the metalization is of uniform thickness and that the layers are
properly laminated, thus ensuring uniform, predictable
electrical characteristics.

Multiple Loads, Buses, and Nodes
In the case where multiple loads are connected to a
transmission line, only one termination circuit is required. The termination should be located at the load
that is electrically the greatest distance from the source.
This is usually the load that is the greatest physical distance from the source. A point-to-point or daisy chain
connection of loads is preferred.
Bidirectional buses should be terminated at each
end with a circuit whose impedance equals the intrinsic,
characteristic line impedance. The reason is that each
transmitting device sees the characteristic impedance of
the line when the device is transmitting.
Consider next a line that has three bidirectional
nodes: one on each end and one in the middle. The
middle node, when driving the line, sees an impedance
equal to Zo/2, because the node is looking into two lines
in parallel with each other. The end nodes, however, see
an impedance of Zo. In this case, as in a backplane,
each end of the line should be terminated in an impedance equal to Zo/2.

When to Terminate Transmission Lines
Transmission lines should be terminated when they are long. From the preceding analysis, it should be apparent that
$\text{Long Line} > \dfrac{T_r}{2\, T_{pdL}}$
where TpdL is the loaded propagation delay of the line per unit length. For Cypress CMOS and BiCMOS products, the rise time, Tr, is typically 2 ns.
For stripline construction (multilayer PCBs), the line length at which voltage reflections occur has been shown to vary from 4.73 inches for a 10-pF load to 3.05 inches for an 80-pF load (see Equation 13 and Table 1).
Not all lines exceeding these lengths need to be terminated. Terminations are usually required on control lines (such as clock inputs, write and read strobe lines on SRAMs and FIFOs) and chip select or output-enable lines on RAMs, PROMs, and PLDs. Address lines and data lines on RAMs and PROMs usually have time to settle because they are normally not the highest-frequency lines in a system. However, if very heavily loaded, address and data bus lines might require terminations.
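The long-line criterion is a one-line calculation once the loaded propagation delay is known. The Python sketch below is illustrative only: the function names are assumptions, and because the loaded-delay formula (Equation 13) is not reproduced in this section, the example uses the unloaded G-10 strip line delay of 2.27 ns per foot as a stand-in and then works backward from a Table 1 entry.

```python
def long_line_threshold_inches(tr_ns, tpd_loaded_ns_per_ft):
    """Length above which a line is 'long': Tr / (2 * TpdL), in inches."""
    return tr_ns / (2.0 * tpd_loaded_ns_per_ft) * 12.0


def implied_loaded_delay_ns_per_ft(tr_ns, length_inches):
    """Invert the criterion: the loaded delay that makes this length the threshold."""
    return tr_ns / (2.0 * (length_inches / 12.0))


if __name__ == "__main__":
    # Unloaded G-10 strip line (assumed stand-in for the loaded delay).
    print(f"{long_line_threshold_inches(2.0, 2.27):.2f} in")
    # Table 1 entry: Tr = 2 ns, 10-pF load, 4.73 inches.
    print(f"{implied_loaded_delay_ns_per_ft(2.0, 4.73):.2f} ns/ft")
```

The Table 1 length is shorter than the unloaded estimate because capacitive loading slows the line, which lowers the length at which reflections become visible.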

Types of Terminations
There are three basic types of terminations: series
damping, pull-up/pull-down, and parallel AC terminations. Each has its advantages and disadvantages.
Except for series damping, the termination network
should be attached to the input (load) that is electrically
the greatest distance from the source. Component leads
should be as short as possible to prevent reflections due
to lead inductance.

Figure 13. Strip Line Construction
$Z_0 = \dfrac{60}{\sqrt{\epsilon_r}} \ln\left( \dfrac{4b}{0.67 \pi w \left( 0.8 + \dfrac{t}{w} \right)} \right)$

Series Damping
Series damping is accomplished by inserting a small resistor (typically 10 to 75Ω) in series with the transmission line, as close to the source as possible (Figure 14). Series damping is a special case of damping in which the series resistor value plus the circuit output impedance equals the transmission line impedance. The strategy is to prevent the wave reflected back from the load from reflecting back from the source. This is done by making the source reflection coefficient equal to zero.

Figure 14. Series Damping Termination

The channel resistance (On resistance) of the pull-down device for Cypress ICs is 10 to 20Ω, depending upon the current-sinking requirements. Thus, subtract this value from the series damping resistor, Rd.
$Z_0 = R_S + R_d$    Eq. 27
A disadvantage of the series damping technique, as illustrated in Figure 15, is that during the two-way propagation delay time of the signal edges, the voltage at the input to the line is halfway between the logic levels, due to the voltage divider action of Rd. The "half voltage" propagates down the line to the load and then back from the load to the source. This means that no inputs can be attached along the line, because they would respond incorrectly during this time. However, you can attach any number of devices to the load end of the line because all the reflections are absorbed at the source. If two or more transmission lines must be driven in parallel, the value of the series damping resistor does not change.
The advantages of series termination are
Requires only one resistor per line
Consumes little power
Permits incident wave switching at the load after a To propagation delay
Provides current limiting when driving highly capacitive loads; the current limiting also helps avoid ground bounce
The disadvantages of series termination are
Degrades rise time at the load due to increased RC time constant
Should not be used with distributed loads
The low input current required by Cypress CMOS ICs results in essentially no DC power dissipation. The only AC power required is to charge and discharge the parasitic capacitances.

Pull-Up/Pull-Down Termination
The pull-up/pull-down resistor termination shown in Figure 16 is included for historical reasons and for the sake of completeness. For TTL driving long cables, such as ribbon cables, the values R1 = 220Ω and R2 = 330Ω are recommended by several bus interface standards. If the cable is disconnected, the voltage at point B is 3V, which is well above the 2V minimum High TTL specification. Because most control signals are active Low, a disconnected cable results in the unasserted state.
The maximum value of R1 is determined by the maximum acceptable signal rise time, which is a function of the charging RC time constant. The minimum value of R1 is determined by the amount of current the driver can sink. The value of R2 is chosen such that a logic High is maintained when the cable is disconnected and the equivalent Thevenin resistance is
$R_T = \dfrac{R_1 R_2}{R_1 + R_2}$
The value of R1 and R2 in parallel is slightly less than the cable's characteristic impedance. Ribbon cables with characteristic impedances of 150Ω are typical.
If both resistors are used, DC power is dissipated all the time. If only a pull-down resistor (R2) is used,
R1 > 50Ω and 20Ω > R2 > 10Ω, depending upon speed and output current-sinking requirements.
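The resistor arithmetic from the Series Damping and Pull-Up/Pull-Down Termination sections can be checked with a few lines of Python. This is a sketch only: the function names are assumptions, the 15Ω driver resistance is a hypothetical value inside the 10 to 20Ω range quoted for the pull-down device, and the 5V supply is assumed because the text does not state Vcc.

```python
def series_damping_resistor(z0, r_driver):
    """Eq. 27 rearranged: Rd = Z0 - Rs, where Rs is the driver's output resistance."""
    rd = z0 - r_driver
    if rd < 0:
        raise ValueError("driver resistance already exceeds the line impedance")
    return rd


def pullup_pulldown(vcc, r1, r2):
    """Return (open-cable voltage at point B, Thevenin resistance) for R1 to Vcc, R2 to ground."""
    v_open = vcc * r2 / (r1 + r2)
    r_thevenin = r1 * r2 / (r1 + r2)
    return v_open, r_thevenin


if __name__ == "__main__":
    # Hypothetical 50-ohm line driven by a 15-ohm pull-down device.
    print(series_damping_resistor(50.0, 15.0), "ohm series resistor")
    # R1 = 220 ohm, R2 = 330 ohm, assumed Vcc = 5V.
    v_open, r_th = pullup_pulldown(5.0, 220.0, 330.0)
    print(f"{v_open:.1f} V at point B, {r_th:.0f} ohm Thevenin resistance")
```

With the assumed 5V supply, the divider reproduces the 3V disconnected-cable level quoted in the text, and the parallel combination comes out somewhat below the 150Ω typical ribbon cable impedance.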

Positive Step Function Response
The initial voltage on the capacitor is zero. At t =
0, the switch is moved from position 2 to position 1. At t
= 0+, the capacitor appears as a short circuit, and the
voltage V is applied through Rl to charge the load
(R3C). The voltage across the capacitor Vc(t), is
Vc(t)

Commercially Available RC Networks

= V( 1-

J(Rl: ~3)C])

A variety of combinations of Rand C values are
available as series RC networks in SIP packages from at
least two sources.
Bourns calls these networks the Series 701 and 702
RC Termination Networks. You can obtain data sheets
by calling the factory in Logan, Utah (801-750-7200) or
a local sales office.
Thin Film Technology also refers to the networks
as RC Termination Networks. You can obtain data
sheets by calling the factory in North Mankato, Minnesota at 507-635-8445.

Negative Step Function Response
The capacitor is charged to approximately V. At t
= 0, the switch is moved from position 1 to position 2,
and the capacitor is discharged. The voltage across the
capacitor, Vc(t) is
Vc(t)

=

vJ (R2: ~3)C]

Eq.29

Vee
Rl

Zo
A

Eq.28

In theory, the voltage across the capacitor reaches
V when t equals infinity. In practice, the voltage reaches
98 percent of V after 3.9 RC time constants. You can
verify this by setting Vc(t)/V = 0.98 in Equation 28 and
solving for t.

Zo

B

Figure 16. Pullup/Pulldown

Figure 17. Parallel AC Termination

1-16

The voltage decays to 2 percent of its original value
in 3.9 RC time constants. You can verify this by setting
Vc(t)/V = 0.02 in Equation 29 and solving for t.

V

~

The Ideal Case
Consider the ideal case, where Rl = R2 = O. Let
R3 = R in Equations 28 and 29. If a positive pulse of
width T is applied to the modified circuit of Figure 18,
the pulse disappears if 4RC > T.
Because the discharging time constant is the same
as the charging time constant for the ideal case, a negative-going pulse of width T also disappears if 4RC > T.
That is, if the applied signal is normally High and goes
Low, as does the write strobe on an SRAM, the termination filters out all negative glitches less than 4 RC
time constants in width.
The maximum frequency that the circuit passes is
1
F(max.) = 2T
Eq. 30

'\
2

12
Source

This is true because the charging and discharging time
constants are equal for the ideal case.

Figure 18. Lumped Load; AC Termination

Capacitance for the Ideal Case
The value of the capacitor, C, must be chosen to
satisfy two conflicting requirements. First, the capacitor
should be large enough to either absorb or supply the
energy contained or removed when positive-going or
negative-going glitches occur. Second, the capacitor
should be small enough to avoid either delaying the signal
beyond some design limit or slowing the signal rise
and fall times to more than 5 ns.
A third consideration is the impedance caused by
the capacitor's capacitive reactance, Xc. The digital
waveforms applied to the AC termination can be expressed
as a Fourier series, so that they can be manipulated
mathematically. However, because these signals
are not periodic in the classical meaning of the word, it
is not clear that the AC steady-state analysis model of
Xc applies here.
In most applications, the degradation of the signal's
rise and fall times beyond 5 ns determines the maximum
value of the capacitor. The procedure is to calculate the
rise time between the 10- and 90-percent amplitude
levels, equate this rise time to 5 ns, and solve for C in
terms of R:
V(t) = V (1 - e^(-t/RC))
Solving for t yields
t = RC ln[ 1 / (1 - V(t)/V) ]                      Eq. 31
For V(t)/V = 0.1, t = 0.10 RC.
For V(t)/V = 0.9, t = 2.3 RC.
The time for the signal to transition from 10 to 90
percent of its final value is then T = 2.2 RC. Solving for
C yields
C = T / (2.2 R)                                    Eq. 32
For T = 5 ns, Table 2 can be constructed. This
table indicates that 50Ω transmission lines on PCBs that
are terminated with RC networks should use a 47Ω
resistor and a capacitor of 48 pF max; 47 pF is a standard
value. This network eliminates glitches of 9 ns or
less. The table's second column applies to wirewrapping
construction, which is not recommended for systems
operating at frequencies over 10 MHz. An exception is
if the system consists of fewer than six MSI or SSI ICs.

The Real World
To go from the ideal to the real world, calculate the
values of R1 and R2 from the curves on the data sheet
of the device driving the line. R1 is the slope of the output
source current vs. output voltage between 2 and 4V.
R2 is the slope of the output sink current vs. output
voltage between 0 and 0.8V.
Add the value of R1 to 47Ω and calculate C, using
Equation 32. Then check to see that the RC charging
time constant does not violate some minimum positive
pulse-width specification for the line. If it does, reduce C.
Add the value of R2 to 47Ω and calculate C. Then
check to see if the discharging RC time constant violates
some minimum pulse-width specification for the
line. If it does, reduce C.
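The sizing rule of Equation 32 and the 4RC glitch-width rule for the ideal case can be checked numerically. The following is a minimal sketch, not part of the original note; the function names are illustrative, and the resistor values are the 47Ω and 110Ω entries of Table 2.

```python
def max_termination_capacitance(rise_time_s, r_ohms):
    """Largest C (farads) that keeps the 10-90% rise time at rise_time_s.

    Implements C = T / (2.2 R) from Equation 32 (ideal case, R1 = R2 = 0).
    """
    return rise_time_s / (2.2 * r_ohms)


def glitch_filter_width(r_ohms, c_farads):
    """Width (seconds) below which pulses are filtered out: 4 RC."""
    return 4.0 * r_ohms * c_farads


# Values from the text: T = 5 ns, R = 47 ohms (PCB) or 110 ohms (wirewrapped).
for label, r in (("PCB", 47.0), ("Wirewrapped", 110.0)):
    c_max = max_termination_capacitance(5e-9, r)
    c_pf = int(c_max * 1e12)          # round down to whole picofarads
    width_ns = glitch_filter_width(r, c_pf * 1e-12) * 1e9
    print(f"{label}: C(max) = {c_pf} pF, 4RC = {width_ns:.1f} ns")
```

Rounding C down to a whole picofarad reproduces the 48 pF and 20 pF maxima of Table 2; in the real-world case R would be increased by the driver's R1 or R2 before repeating the calculation.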

Table 2. Termination Values for an Ideal Case
                 PCB     Wirewrapped
Zo (Ω)           50      120
R (Ω)            47      110
C (max., pF)     48      20
RC (ns)          2.25    2.2
4RC (ns)         9       8.8

Schottky Diode Termination
In some cases it can be expedient to use Schottky
diodes or fast-switching silicon diodes to terminate
lines. The diode switching time must be at least as fast
as the signal rise time. Where line impedances are not
well defined, as in breadboards and backplanes, the use
of diode terminations is convenient and can save time.
A typical diode termination appears in Figure 19.

Figure 19. Schottky Diode Termination

The Schottky diode's low forward voltage, Vf (typically
0.3 to 0.45V), clamps the input signal to a Vf below
ground (lower diode) and Vcc + Vf (upper diode). This
significantly reduces signal undershoot and overshoot.
Some applications might not require both diodes.
The advantages of diode terminations are:
Impedance matched lines are not required.
The diodes replace terminating resistors or RC terminations.
The diodes' clamping action reduces overshoot and
undershoot.
Although diodes cost more than resistors, the total
cost of layout might be less because a precise, controlled transmission-line environment is not required.
If ringing is discovered to be a problem during system debug, the diodes can be easily added.
As with resistor or RC terminations, the leads
should be as short as possible to avoid ringing due to
lead inductance.
A few of the types of Schottky diodes commercially
available are
1N4148 (switching diode)
1N5711
MBD 101, MBD 102 (Motorola)
SN74S1050/52/56 (TI, single-diode arrays)
SN74S1051/53 (TI, double-diode arrays)

Unterminated Line Example
The following example is presented to illustrate the
procedure for calculating the waveforms when a
Cypress PLD generates the write strobe for four
Cypress FIFOs. The PLD is a PAL C 16L8 device and
the FIFOs are CY7C429s.
The equivalent circuit appears in Figure 20 and the
unmodified driving waveform in Figure 21. The rise and
fall times are 2 ns. The length of the stripline trace on
the PCB is 8 inches and the intrinsic characteristic line
impedance is 50Ω. The voltage waveforms at the source
(point A) and the load (point B) must be calculated as
functions of time. Stripline construction is used for this
example because in most modern high-performance
digital systems, the PCBs have multiple layers.
The equivalent On channel resistance of the PLD
pull-up device, 62Ω, is calculated using the output
source current versus voltage graph, over the region of
interest (2 to 4V), from the PAL C 20 series data sheet.
The equivalent resistance of the pull-down device, 11Ω,
is calculated in a similar manner, using the output sink
current versus output voltage graph, over the region of
interest (0.4 to 2V), also on the data sheet.
The equivalent input circuit for the FIFO is constructed
by approximating the input and stray
capacitance with a 10-pF capacitor and the input resistance
with a 5-MΩ resistor. The input leakage current
for all Cypress products is specified as a maximum of
±10 µA, which guarantees a minimum of 500 kΩ at Vin
= 5V. Typical leakage current is 10 pA.
Because the PLD is driving four FIFOs in parallel,
the equivalent lumped capacitance is 4 x 10 pF = 40
pF, and the equivalent lumped resistance is 5,000,000/4
= 1.25 MΩ.
The next step is to calculate the propagation delay
and the loaded characteristic impedance of the line.
The unloaded propagation delay of the line is calculated
using Equation 26 with a dielectric constant of 5:
Tpd = 2.27 ns/ft
To calculate the loaded line propagation delay, the
intrinsic capacitance must first be calculated using
Equation 5.
Tpd = Zo Co
where Zo is the intrinsic characteristic impedance, and
Co is the intrinsic capacitance.
Co = Tpd / Zo = (2.27 ns/ft) / 50Ω = 45.4 pF/ft
Because the line is loaded with 40 pF, Equation 6 is
used to compute the loaded propagation delay of the
line.
TpdL = Tpd sqrt(1 + CD/Co)
TpdL = 2.27 ns/ft x sqrt(1 + 40 pF / (45.4 pF/ft x 8 in. / 12 in./ft))
TpdL = 3.46 ns/ft.
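The loaded-line arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the values given in the example (Tpd = 2.27 ns/ft, Zo = 50Ω, an 8-inch trace, and a 40-pF lumped load); the function and variable names are illustrative. The same loading factor that stretches the propagation delay also divides the characteristic impedance, so the sketch prints the loaded impedance and one-way delay as well.

```python
import math


def intrinsic_capacitance_pf_per_ft(tpd_ns_per_ft, z0_ohms):
    """Co from Tpd = Zo * Co; ns/ohm is nF, so multiply by 1000 for pF."""
    return tpd_ns_per_ft / z0_ohms * 1000.0


def loading_factor(c_load_pf, co_pf_per_ft, length_in):
    """sqrt(1 + CD / (Co * l)), with the line length converted to feet."""
    line_capacitance_pf = co_pf_per_ft * length_in / 12.0
    return math.sqrt(1.0 + c_load_pf / line_capacitance_pf)


TPD = 2.27        # unloaded delay, ns/ft
Z0 = 50.0         # intrinsic impedance, ohms
LENGTH_IN = 8.0   # trace length, inches
C_LOAD = 40.0     # four FIFO inputs at 10 pF each

co = intrinsic_capacitance_pf_per_ft(TPD, Z0)
factor = loading_factor(C_LOAD, co, LENGTH_IN)
tpd_loaded = TPD * factor                 # ns/ft
z0_loaded = Z0 / factor                   # ohms
t_one_way = tpd_loaded * LENGTH_IN / 12   # ns

print(f"Co = {co:.1f} pF/ft, factor = {factor:.3f}")
print(f"TpdL = {tpd_loaded:.2f} ns/ft, Zo' = {z0_loaded:.1f} ohms, To = {t_one_way:.1f} ns")
```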

Note that the capacitance per unit length must be
multiplied by the line length to arrive at an equivalent
lumped capacitance.
The intrinsic line impedance is reduced by the
same factor by which the propagation delay is increased
(1.524; see Equation 7):
Zo' = 50Ω / 1.524 = 32.8Ω

Initial Conditions
At time t = 0, the circuit shown in Figure 20 is in a
quiescent state. The voltage at points A and B must be
the same. By inspection:
VA = VB = (Vcc - Vf) x RL / (Rs + RL)
        = (5 - 1) x (1.25 x 10^6) / (62 + 1.25 x 10^6) = 4V
At t = 0, the driving waveform changes from 4V to
approximately 0V with a fall time of 2 ns. This is shown
in Figure 20 by the switch arm moving from position 1 to
position 2.
The wave propagates to the load at the rate of 3.46
ns per foot and arrives there
To = 3.46 ns/ft x (8 in. / 12 in./ft) = 2.3 ns
later, as illustrated in Figure 22b.
Because the reflection coefficient at the load is ρL
= 1, a nearly equal and opposite polarity waveform is
propagated back to the source from the load. The
reflection arrives at t = 2To = 4.6 ns (Figure 22a).
Note that the fall time is preserved.
The reflection coefficient at the source is:
ρs = (Rs - Zo') / (Rs + Zo') = (11 - 32.8) / (11 + 32.8) = -0.498
To simplify the calculations that follow, consider -0.5 to
be the Low-level source reflection coefficient.
The magnitude of the reflected voltage at the source is
then
VS1 = -4V x (-0.5) = 2V.
This wave propagates from the source to the load
and arrives at t = 3To. The wave adds to the 0V signal.
The rise time is preserved, and thus the time required
for the signal to go from 0 to 2V is
tr = 2V x 2 ns / 4V = 1 ns.
The signal at the load thus reaches the 2V level at time
t = 3To + 1 ns = 7.9 ns
and remains at that level until the next reflection occurs
at
t = 5To.
The wave that arrives at the load at 3To reflects back to
the source and arrives at
t = 4To = 9.2 ns.
The 2V level adds to the -4V level, for a total of
-2V. The rise time is preserved, so that this level is
reached at
t = 4To + 1 ns = 10.2 ns
and maintained until the next reflection occurs at
t = 6To.
The 2V wave that arrives at the source at t = 4To
reflects back to the load and arrives at t = 5To. The
portion that is reflected back to the load is
VS2 = 2 x (-0.5) = -1V.
This value subtracts from the 2V level to give 2 - 1
= 1V. Because the fall time is preserved, the time required
for the signal to go from 2 to 1V is
tf = 1V x 2 ns / 4V = 0.5 ns.
The 1V level is thus reached at time
t = 5To + 0.5 ns = 12 ns.

Figure 20. Equivalent Circuit for Cypress PAL Driving

Figure 21. VA(t), Unmodified
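The two small calculations that recur throughout this analysis, the reflection coefficient at a termination and the edge time of a reflected wave that preserves the original 4V-in-2-ns slope, can be sketched as follows. This is an illustrative helper under the example's stated values, not a general transmission-line simulator; the function names are assumptions.

```python
def reflection_coefficient(r_term_ohms, z_line_ohms):
    """rho = (R - Zo') / (R + Zo') for a resistive termination."""
    return (r_term_ohms - z_line_ohms) / (r_term_ohms + z_line_ohms)


def edge_time_ns(delta_v, full_swing_v=4.0, full_edge_ns=2.0):
    """Time for a reflected wave of amplitude delta_v to ramp,
    assuming the slope of the original edge (4V in 2 ns) is preserved."""
    return abs(delta_v) * full_edge_ns / full_swing_v


Z_LOADED = 32.8  # loaded line impedance from the example, ohms

rho_low = reflection_coefficient(11.0, Z_LOADED)    # pull-down device on
rho_high = reflection_coefficient(62.0, Z_LOADED)   # pull-up device on
vs1 = -4.0 * round(rho_low, 1)                      # text rounds rho to -0.5

print(f"rho(Low) = {rho_low:.3f}, rho(High) = {rho_high:.3f}")
print(f"VS1 = {vs1:.1f} V, ramp time = {edge_time_ns(vs1):.1f} ns")
```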

Figure 22(a). Unterminated Line Example; VA(t)

Figure 22(b). Unterminated Line Example; VB(t)

At t = 6To, the 1V wave arrives back at the source,
where it subtracts from the -2V level to give -1V. The
rise time is
tr = 1V x 0.5 ns/V = 0.5 ns.
The signal at the source reaches the -1V level at
t = 6To + 0.5 ns = 14.3 ns.
The 1V wave that arrives at the source at t = 6To
is reflected back to the load and arrives at t = 7To. The
portion that is reflected back is
VS3 = 1 x (-0.5) = -0.5V.
This value subtracts from the 1V level to give 0.5V.
The fall time is 0.25 ns. The 0.5V level remains until the
next reflection reaches the load at
t = 9To.
At t = 8To the 0.5V wave that reflects from the
load at t = 7To arrives back at the source, where it
subtracts from the -1V level to give -0.5V. The rise
time is 0.25 ns. The portion that reflects back to the
load is
VS4 = 0.5 x (-0.5) = -0.25V.
The -0.25V signal arrives at the load at t = 9To =
20.7 ns and subtracts from the 0.5V signal to give 0.25V.
This process continues until the voltages at points
A and B decay to approximately 0V.

Observations
The positive reflection coefficient at the load and
the negative reflection coefficient at the source result in
an oscillatory behavior that eventually decays to acceptable
levels. The voltage at point A reaches -1V after 6
To delays and the voltage at point B reaches 0.5V after
7 To delays.
The reflection at the load that causes the voltage to
equal the TTL minimum One level (2V) at t = 3To
causes a problem. The actual input voltage threshold
level is 1.5V for TTL-compatible devices that do not
exhibit hysteresis.
The voltage at the load falls from 4 to 0V in 2 ns,
beginning at t = To. Because To = 2.3 ns, the voltage
reaches zero at
2.3 ns + 2 ns = 4.3 ns.
The 1.5V level occurs at
4.3 ns - (2 ns / 4V) x 1.5V = 3.55 ns.
The rising edge begins at
t = 3To = 6.9 ns.
The 1.5V level occurs at
6.9 ns + (2 ns / 4V) x 1.5V = 7.65 ns.
The time difference (7.65 - 3.55 = 4.1 ns) is long
enough for the FIFO to interpret the signal as a Low.
Next, consider the width of the positive pulse that
begins at the load at t = 3To. Because the rise time is
preserved, the signal takes 1 ns to reach 2V, or 0.75 ns
to reach 1.5V. The signal begins to fall at t = 5To,
reaching 1.5V at
t = 5To + 0.25 ns = 11.75 ns.
The difference (11.75 - 7.65) is 4.1 ns, which is wide
enough for the FIFO to interpret as a second clock. To
eliminate this pulse, the line must be terminated.

Strobe Shortening Considerations
In this example the width of the negative strobe is
22 to 24 ns. If a CY7C429-20 FIFO is used, the write
(or read) strobe must not be shorter than 20 ns. Even if
the FIFO does not recognize the 4.1-ns negative pulse,
the shortening of the write strobe by 5To = 11.5 ns is
sufficient to violate the minimum negative-pulse-width
specification.
This strobe-shortening phenomenon might also
occur on other active-Low control lines such as output
enables and chip selects. Clock lines must also be
analyzed for this problem; in general, these lines should
be terminated.

The Rising Edge of the Write Strobe
Now consider an analysis of the write strobe's rising
edge to assure that the reflections associated with this
edge do not cause multiple clocks or false triggering of
the FIFO. At t = 22 ns, the rising edge of the write
strobe begins, which is the equivalent of closing the
switch in Figure 20 in the 1 position. For this analysis, it
is convenient to start the time scale over at zero, as
appears in Figures 22a and 22b.
If the forcing function were a step function, the
equations of Figure 4h would apply. The time constant
in the equation is
T = R Zo' Ce / (R + Zo')                           Eq. 33
Because
R >> Zo', T = Zo' Ce
where Zo' = 32.8Ω, and Ce = 45.4 pF.
This is the equivalent of saying that you can ignore
the 1.25-MΩ device input resistance for transient circuit
analysis. Substituting Zo' and Ce into the preceding
equation yields a time constant of T = 1.489 ns.
Writing the equation for the voltages for the circuit
of Figure 20 yields
VB(t) = i Zo' + (1/Ce) ∫ i dt (integrated from 0 to t)   Eq. 34
Also,
VB(t) = K t U(t) - K(t - T1) U(t - T1)             Eq. 35
where Kt is the rising edge of the write strobe (K =
2V/ns) applied at t = 0 using a unit step function, U(t);
and -K(t - T1) represents an equal but opposite
waveform applied at t = T1 (after the rise time) using a
unit step function, U(t - T1).
Equating the expressions and taking the Laplace
transforms of both sides yields
K/s^2 - K e^(-T1 s)/s^2 = Zo' I(s) + I(s)/(Ce s)
                        = (Zo' + 1/(Ce s)) I(s)    Eq. 36

However,
VB(t) = (1/Ce) ∫ i dt (integrated from 0 to t), or VB(s) = I(s) / (Ce s)
Therefore,
K/s^2 - K e^(-T1 s)/s^2 = (Zo' + 1/(Ce s)) Ce s VB(s)    Eq. 37
Solving for VB(s) yields
VB(s) = (K/s^2) (1 - e^(-T1 s)) / [Ce s (Zo' + 1/(Ce s))]    Eq. 38
which is equivalent to
VB(s) = [K / (Zo' Ce)] (1 - e^(-T1 s)) / [s^2 (s + 1/(Zo' Ce))]    Eq. 39
Taking the inverse Laplace transform yields
VB(t) = [K Zo' Ce (e^(-t/(Zo' Ce)) - 1) + K t] U(t)
        - [K Zo' Ce (e^(-(t - T1)/(Zo' Ce)) - 1) + K(t - T1)] U(t - T1)    Eq. 40
The first term in Equation 40 applies from time zero up
to and including T1, and the second term applies after
T1:
VB(t) = K Zo' Ce (e^(-t/(Zo' Ce)) - 1) + K t    for t <= T1    Eq. 41
VB(t) = K Zo' Ce (1 - e^(T1/(Zo' Ce))) e^(-t/(Zo' Ce)) + K1    for t > T1    Eq. 42
where K1 is the final value, which is 4V.
Substituting the correct values for t = T1 = 2 ns
yields
VB(t = T1) = [2 x 32.8 x 45.4 x 10^-12 / (2 x 10^-9)] (e^(-1.489) - 1) + 2 V/ns x 2 ns
           = -1.15 + 4 = 2.85V.
If the forcing function is a step function, the equation is
VB(t) = 4 (1 - e^(-t/(Zo' Ce)))                    Eq. 43
at t = 2 ns, VB = 3V, which is more than the 2.85V
calculated using Equation 41.
At t = 22 ns + To, the voltage waveform begins to
build up at the load and continues to build until the first
reflection from the source occurs at t = 3To.
Equation 42 is used to calculate the voltage at the
load at t = 2To, because 1 To is used for propagation
delay time:
VB(t = 2To) = [-2V x 32.8 x 45.4 x 10^-12 / (2 x 10^-9)] (1 - e^(-1.489)) (e^(-2)) + 4
            = -1.489 (0.774) (0.1353) + 4
            = -0.156 + 4 = 3.84V.
The voltage at the load remains at this value until
the first reflection from the source reaches the load at t
= 3To.
Meanwhile, at t = To, the wave at the load reflects
back to the source and arrives at t = 2To. The wave
subtracts from the 4V level at the source, as illustrated
in Figure 6c. The amplitude of the droop is given by
Vr = C' Zo' Vo / (2 Tr)                            Eq. 44
for Rs = Zo'.
If Rs does not equal Zo', Equation 44 must be
modified. Instead of Vo/2, the voltage is
Vo Rs / (Rs + Zo')
so that Equation 44 becomes
Vr = (C' Zo' Vo / Tr) x Rs / (Rs + Zo')            Eq. 45
where C' = 40 pF, Zo' = 32.8Ω, Rs = 62Ω, Tr = 2
ns, and Vo = 4V. Substituting these values into Equation 45 yields
Vr = 1.716V.
Because 4V - 1.716V = 2.284V, the voltage does not
drop below the minimum TTL VIH level of 2V, but the
voltage does come close.
The reflection coefficient at the source is
ρs = (Rs - Zo') / (Rs + Zo')
where Rs = 62Ω, Zo' = 32.8Ω, ρs = 0.308.
The amount of voltage reflected from the source
back to the load is then
VS1 = 1.716 x 0.308 = 0.53V.
The 40-pF capacitor lengthens the rise time of the
waveform at the load. The reflection at the source
caused by the load capacitor is insufficient to reduce
the 4V level to less than the TTL One level (2V).
The reflection coefficient at the source is small
enough so that the energy reflected back to the load is
insufficient to cause a problem.

References
1. Matick, Richard E. Transmission Lines for Digital
and Communications Networks. McGraw Hill, 1969.
2. Blood, Jr., William R. MECL System Design
Handbook. Motorola Inc., 1983.

Power Characteristics of Cypress Products

This application note presents and analyzes the
power dissipation characteristics of Cypress products.
The knowledge and tools presented here will help you
manage power when using Cypress CMOS products.

Design Philosophy
The design philosophy for all Cypress products is
to achieve superior performance at reasonable power
dissipation levels. The CMOS technology, circuit design
techniques, architecture, and topology are carefully
combined to optimize the speed/power ratio.

Power Dissipation Sources
Power is dissipated both inside and outside ICs.
The internal and external power have a quiescent (or
DC) component and a frequency-dependent component.
The relative magnitudes of each depend upon
the circuit design objectives.
In circuits designed to minimize power dissipation
at low to moderate performance, the frequency-dependent
component is significantly greater than the DC component.
In the high-performance circuits designed and
manufactured by Cypress, the frequency-dependent
power component is much lower than the DC component.
This is because a large percentage of the internal
power is dissipated in linear circuits such as sense
amplifiers, bias generators, and voltage/current references,
which are required for high performance.

Frequency-Dependent Power
CMOS circuits inherently dissipate significantly less
power than either bipolar or NMOS circuits. The ideal
CMOS circuit has no direct current path between Vcc
and Vss. In circuits using other technologies, such paths
exist, and DC power is dissipated while the device is in
a static state.
The principal component of power dissipation in a
power-optimized CMOS circuit is the transient power
required to charge and discharge the capacitances
associated with the inputs, outputs, and internal nodes.
This component is commonly called CV^2 power and is
directly proportional to the operating frequency, f.
The charge, Q, stored in a capacitor, C, that is
charged to a voltage, V, is given by the equation:
Q = CV                                             Eq. 1
Dividing both sides of Equation 1 by the time required
to charge and discharge the capacitor (one
period, or T) yields:
Q/T = CV/T                                         Eq. 2
By definition, current (I) is the charge per unit time and
f = 1/T
Therefore,
I = CVf                                            Eq. 3
The power (P = VI) required to charge and discharge
the capacitor is obtained by multiplying both
sides of Equation 3 by V:
P = VI = CV^2 f                                    Eq. 4
It is standard practice to assume that the capacitor
is charged to the supply voltage (Vcc), so that
P = Vcc I = C Vcc^2 f                              Eq. 5
The total power consumption for CMOS systems
depends upon the operating frequency, the number of
inputs and outputs, the total load capacitance, the internal
equivalent (device) capacitance, and the static
(quiescent) or standby power consumption. In equation
form:
Pd = [CINT FINT + Cload Fload] Vcc^2 + Icc Vcc     Eq. 6
The first four quantities are frequency dependent,
and the last is not. This same equation can be used to
describe the power dissipation of every IC in the system.
The total power dissipation is then the algebraic
sum of the individual components.
The relative magnitudes of the various terms in the
equation are device dependent. Note that Equation 6

must be modified if all of the internal nodes or all of the
outputs are not switching at the same frequency.

Table 1. Types of Input Buffers
Buffer Type     Icc (max., mA)
A               1.3
B               0.8
C               0.6

Transient Power
Cypress devices incorporate N-well CMOS inverters
that can affect the devices' transient power consumption.
In an ideal N-well CMOS inverter, the P-channel
pull-up transistor and the N-channel pull-down
transistor (which are in series with each other between
Vcc and Vss) are never on at the same time. Thus,
there is no direct current path between Vcc and
ground, and the quiescent power is very nearly zero.
In the real world, when the input signal makes the
transition through the linear region (i.e., between logic
levels) both the n-channel and p-channel transistors are
partially turned On. This creates a low-impedance path
between Vcc and Vss whose resistance equals the sum
of the n- and p-channel resistances.

DC or Static Power
In addition to conventional gates, Cypress devices
contain sense amplifiers; input and output buffers; and
bias and reference generators that all dissipate power.
RAMs and FIFOs also have memory cells that dissipate
standby power whether the IC is selected or not.
PROM and PAL products have EPROM memory cells
that do not dissipate as much standby power as a RAM
cell.

Power-Down Options
Five Cypress static RAMs offer a power-down option
that enables you to reduce the devices' power dissipation
by approximately an order of magnitude when
they are not accessed. The power-down technique disables
or turns off the input buffers and sense amplifiers.

Power Dissipation Model
The rest of this application note presents power
dissipation models for various Cypress CMOS products
as well as information on each product's typical and
worst-case power dissipation. The information is
presented as functions of frequency, Vcc, and temperature.
A general-purpose power dissipation model for all
Cypress ICs appears in Figure 1.

Figure 1. Power Dissipation Model

To obtain power dissipation data on an IC, you
must isolate the three components of power dissipation
included in Equation 6 by controlling the IC's inputs.
The standby current (Icc) is measured with the inputs
to the IC at 0.4V or less. Under this condition, the input
buffers and unloaded output buffers draw only DC
leakage currents. All other direct currents derive from
the substrate bias generator, sense amplifiers, other internal
voltage or current references, and NMOS
memory circuits.
At Vin = 1.5V, the input buffers draw maximum
Icc. To find the total input buffer Icc current, you
measure the total current and subtract the quiescent
current. You can then calculate the current per input
buffer by dividing the total input-buffer current by the
number of input buffers.

Input Buffers
Cypress products use three different types of input
buffers. For purposes of illustration, they are referred to
as types A, B, and C. Table 1 lists the buffer types used
in various products.
Figure 2 shows schematics and input characteristics
for the three types of buffers. A circle on a transistor's
gate means that the transistor is a P-channel device.
As Figure 2 shows, the input buffers draw essentially
zero Icc when Vin is 0.4V or less. This is also true
when Vin is 4V or more, except for type A. In other
words, if the inputs are driven rail to rail, the B and C
input buffers dissipate power only during the input signal
transitions.

Core and Output Buffers
The standby power dissipation of an IC's core
derives from the substrate bias generator, reference
generators, sense amplifiers, and polyload RAM cells or
EPROM cells. This current is measured with Vin = 0V,
so that the input buffers draw no current. Under these
conditions, the output buffers draw only leakage current
and dissipate essentially no power.
Programming either PROMs or PALs stores
charge on the floating gate of an NMOS transistor,
which increases the transistor's threshold voltage. This

higher threshold prevents the transistor from turning on
during normal operation; unprogrammed transistors do
turn on. Therefore, unprogrammed PALs and PROMs
draw more current and dissipate more power than
programmed devices.
The output buffers on Cypress products have n-channel
pull-up devices that cause the output voltage
level to reach
VOH = Vcc - VT = 5V - 1V = 4V
The capacitance of the output buffers, including
stray capacitance, is typically 10 pF. If
CL = 10 pF, VOH = 4V
Again, using Equation 3,
Icc(f) = 40 x 10^-12 f
for the output buffers.

Figure 2. Three Buffer Types

Current Measurement
Figure 3 illustrates the instantaneous current drawn
by a Cypress RAM. The instantaneous power is calculated
by multiplying this current by the constant
supply voltage, Vcc. Most of the power is dissipated
during the access time. This is also true for PROMs and
PALs.
The current measurement unit in an automatic
tester integrates the instantaneous current over the
measurement cycle and arrives at an equivalent average
current. In other words, the average current, I2, during
time TCY equals the area between the instantaneous
current, i(t), and the X axis during TCY, divided by TCY.
Thus, because the "current pulse" is effectively spread
over a longer time when the frequency is decreased, the
average current is proportionately lower.
Note that the preceding calculations have not accounted
for any DC loads. You must calculate these
separately.

Figure 3. RAM Icc (I1 = quiescent Icc; I2 = average Icc; i(t) = instantaneous Icc)

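Equation 3 and Equation 6 are simple enough to evaluate directly. The sketch below is illustrative only: the function names are assumptions, and the operating point used for Equation 6 (75 pF internal, 40 pF total load, 10 MHz, 50 mA quiescent) is a hypothetical example chosen for the demonstration rather than a figure taken from a data sheet.

```python
def cvf_current(c_farads, v_volts, f_hz):
    """Equation 3: average current needed to charge C to V at frequency f."""
    return c_farads * v_volts * f_hz


def total_power(c_int, f_int, c_load, f_load, vcc, icc_q):
    """Equation 6: Pd = [CINT*FINT + Cload*Fload] * Vcc^2 + Icc*Vcc (watts)."""
    return (c_int * f_int + c_load * f_load) * vcc ** 2 + icc_q * vcc


# Output buffer from the text: CL = 10 pF switched to VOH = 4V.
i_out = cvf_current(10e-12, 4.0, 10e6)   # at an assumed 10 MHz
print(f"output buffer current at 10 MHz = {i_out * 1e3:.2f} mA")

# Hypothetical operating point for Equation 6.
pd = total_power(c_int=75e-12, f_int=10e6,
                 c_load=40e-12, f_load=10e6,
                 vcc=5.0, icc_q=50e-3)
print(f"Pd = {pd * 1e3:.1f} mW")
```

At this assumed operating point the quiescent term (250 mW) dominates the frequency-dependent term (about 29 mW), consistent with the note's remark that the DC component is the larger one in high-performance parts.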

Product Characteristic Tables
Tables 2 through 5 allow you to calculate the current
requirements for Cypress products. CINT is the
equivalent device internal capacitance, Icc(Q) is the
quiescent or DC current, and Icc(MAX) is the maximum
Icc (as specified on the data sheet) for the commercial
operating temperature range. Conditions are Vcc =
5V and TA = 25°C.
Note that for the 16L8, 16R8, 16R6, and 16R4
PALs, the number of inputs and outputs is user configurable.
All the PALs use type B buffers.

Table 2. Static RAMs
Part No.       Buffer  No.     No.      CINT   Icc(Q)  Icc(max)
               Type    Inputs  Outputs  (pF)   (mA)    (mA)
CY7C122/123    A       16      4        24     50      90
CY7C128        B       14      8        27     59      120
CY7C147        B       15      1        34     28      90
CY7C148/149    B       12      1        32     45      90
CY7C150        B       18      4        20     44      90
CY7C161/162    B       22      4        300    13      70
CY7C164        B       20      4        300    13      70
CY7C166        B       21      4        300    13      70
CY7C167        C       17      1        75     25      70
CY7C168/169    C       18      4        75     50      70
CY7C170        B       18      4        50     33      90
CY7C171/172    B       18      4        100    27      70
CY7C185/186    B       25      8        330    13      100
CY7C187        B       19      1        150    7       100
CY7C189/190    B       10      4        21     32      90

Table 3. PROMs
Part No.       Buffer  No.     No.      CINT   Icc(Q)  Icc(max)
               Type    Inputs  Outputs  (pF)   (mA)    (mA)
                               [1]
CY7C225        B       12      8        32     35      90
CY7C235        B       13      8        35     35      90
CY7C245        B       13      8        35     50      90
CY7C251        C       18      8        43     9.5     100
CY7C254        C       18      8        43     35      100
CY7C261/3/4    C       14      8        60     45      100
CY7C268        C       19      1/8      60     60      100
CY7C269        C       17      1/8      60     60      100
CY7C281/282    B       14      8        35     35      100
CY7C291/292    B       14      8        35     50      100
[1] /Bidirectional pins

SRAM Calculation Example
To illustrate how to use Tables 2 through 5, consider
an example of estimating the typical Icc for the
CY7C169-35 RAM at room temperature (TA = 25°C)
and nominal Vcc (5V). Assume the duty cycle is 100 percent
at the specified access time. The procedure shown here
calculates the typical and worst-case Icc with all inputs and
outputs changing and with output loading of 10 pF.
From the RAM product characteristic table:
Number of inputs = 18
Number of outputs = 4
CINT = 75 pF
Icc(Q) = 50 mA
Because the input buffers on the CY7C169 are type
C, the average current is 0.3 mA. If the input-signal-level
transitions are 4V and the transition rate is
2V/ns, the transition time is
Tt = 4V / (2V/ns) = 2 ns
The duty cycle is then
2 ns / 35 ns = 0.057
Each input buffer thus draws
0.3 mA x 0.057 = 0.0171 mA
If all inputs change, the total transient input buffer
current is
18 x 0.0171 = 0.31 mA
To calculate the CVf input buffer current:
I = CVf = 0.57 mA, where
CIN = 5 pF
V = 4V
f = 1/35 ns
TOTAL = 18 x 0.57 = 10.28 mA
To calculate the internal CVf current:
I = CVf = 10.71 mA, where
CINT = 75 pF
V = 5V
f = 1/35 ns
To calculate the output CVf current:
I = CVf = 1.15 mA, where
COUT = 10 pF
V = 4V

f = 1/35 ns
TOTAL = 4 x 1.15 = 4.6 mA
The quiescent current is 50 mA. The total current
at TCY = 35 ns is:
Input Transient     0.31 mA
Input CVf          10.28 mA
Internal CVf       10.71 mA
Output CVf          4.6  mA
Quiescent          50    mA
Total Icc          75.9  mA (all inputs/outputs changing)
Note that the worst-case transient current is 25.9
mA. If half the inputs and outputs change, the worst-case
transient current decreases to 12.95 mA, which
gives a total current of 63 mA (typical Icc).
Note also that the input CVf current and the output
CVf current have the same values for a bipolar device.

Worst-, Worst-, Worst-Case ICC
Now consider a procedure for estimating Icc for
worst-case Vcc and low temperature, in addition to all
inputs and outputs changing. Then you can compare the
result with the Icc specified on the data sheet.
Icc is greater at high Vcc, which is 5.5V, or 1.1
times the nominal 5V Vcc. Because the increase in Icc
due to the lower temperature is 3 percent, the total increase
is 13 percent. These factors apply to the internal
CVf current (10.71 mA), the output CVf current (4.6
mA), and the quiescent current (50 mA), which
together total 65.31 mA.
Total Icc = Input Transient Icc
+ Input CVf Icc
+ [Internal CVf + Output CVf + Icc(Q)] x 1.13
Icc = 0.31 + 10.28 + [65.31] x 1.13 = 84.4 mA
This value is approximately 94 percent of the 90
mA specified on the data sheet. Note, however, that the
data sheet Icc maximum does not include the output
CVf current.

Table 4. PALs
Part No.               CINT (pF)   Icc(Q) (mA)   Icc(max) (mA)
PALC16L8/R8/R6/R4      40          25            45
PLDC20G10              50          30            55
PALC22V10              50          40            80
PLD CY7C330            300         42            120
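The typical and worst-case Icc estimates for the CY7C169-35 can be reproduced with a short calculation. This is a minimal sketch of the procedure described in the example, with illustrative function and parameter names; it carries unrounded intermediate values, so the totals can differ from the hand calculation in the second decimal place.

```python
def cvf_ma(c_pf, v_volts, t_cycle_ns):
    """I = C*V*f with C in pF and f = 1/t_cycle (ns); pF*V/ns is mA."""
    return c_pf * v_volts / t_cycle_ns


def estimate_icc_ma(n_inputs, n_outputs, c_int_pf, icc_q_ma, t_cycle_ns,
                    avg_buffer_ma=0.3, t_transition_ns=2.0,
                    c_in_pf=5.0, c_out_pf=10.0,
                    v_in=4.0, v_int=5.0, v_out=4.0, derating=1.0):
    """Sum the Icc components with all inputs and outputs changing.

    derating scales the internal CVf, output CVf, and quiescent terms
    (1.13 for high Vcc plus low temperature in the example).
    """
    input_transient = n_inputs * avg_buffer_ma * t_transition_ns / t_cycle_ns
    input_cvf = n_inputs * cvf_ma(c_in_pf, v_in, t_cycle_ns)
    internal_cvf = cvf_ma(c_int_pf, v_int, t_cycle_ns)
    output_cvf = n_outputs * cvf_ma(c_out_pf, v_out, t_cycle_ns)
    derated = derating * (internal_cvf + output_cvf + icc_q_ma)
    return input_transient + input_cvf + derated


# CY7C169-35 values from Table 2 and the example: 18 inputs, 4 outputs,
# CINT = 75 pF, Icc(Q) = 50 mA, 35-ns cycle.
typical = estimate_icc_ma(18, 4, 75.0, 50.0, 35.0)
worst = estimate_icc_ma(18, 4, 75.0, 50.0, 35.0, derating=1.13)
print(f"all inputs/outputs changing: {typical:.1f} mA")
print(f"with 13 percent derating:    {worst:.1f} mA")
```

The two results land within a few hundredths of a milliamp of the 75.9 mA and 84.4 mA figures worked out by hand; the small difference comes from the hand calculation rounding the per-output current up to 1.15 mA.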

ICC-Versus-Frequency Characteristic
The Ice-versus-frequency curves for all Cypress
products have the same basic shape, which is illustrated
by the PAL 16R8 curve in Figure4. The current remains
essentially constant at the quiescent Icc value until the
frequency increases to the point where the capacitances
begin to cause appreciable currents. The location of this
point depends upon the input, internal, and output
capacitances; the number of inputs and outputs; the
rate at which the inputs and outputs change; and the
voltage levels the inputs and outputs are switched be-

Icd VS FREQUENCY FOR PAL 16R8
ALL INPUTS / OUTPUTS CHANGE
Vcc=5V. TA=25OC. VIL=O.8V. VIH =2V

TYPICAL
Icc VS f

120

1

J/

!

100

..,.

...:

80

V'det=sop,

E

(OUTPUTS

~

EN~BL1D)

0

...Y

60

AR'{

40

~
1(0)

-'

25
20

,."",

~~

~

Jt~

I

o·S«\I'

~ ~CJ=O~F

I

(~

ct:i(

~.

B

o
10KHz

100KHz

1 t.lHz

FREOUENCY IN HERTZ

Figure 4. Typical Icc vs f
1-27

10t.lHz

~TYfl
100t.lHz

Table S. Logic Products
Part No.

Buffer No.
No.
CINT Icc Icc
Type Inputs Outputs (pF) (Q) (max
. (rnA) (rnA);

[1]

CY7C401

B

6

6

53

30

75

CY7C402

B

7

7

53

30

75

CY7C403

B

7

6

53

30

75

CY7C404

B

8

7

53

30

75

CY7C408

B

11

12

100

42

135

CY7C409

B

11

13

100

42

135

CY7C428/9

C

14

12

190

18

80

CY7C510

C

24

19/16

60

30

100

CY7C516

C

28

16/16

60

30

100

CY7C517

C

28

16/16

60

30

100

CY3341

B

6

6

53

30

45

CY7C601

C

25

19/64

950

89

600

CY7C901

C

24

10/4

160

25

80

CY7C909

C

21

5

80

25

55

CY7C910

C

22

16

150

2.6

70

CY7C911

C

13

5

80

25

55

CY7C9101

C

36

22/4

70

30

60

CY7C9116

C

22

1120

1000 35

150

CY7C9117

C

38

114

1000 35

150

tween. For Cypress products, this point is in the I-to
1O-MHz range.
The PAL 16R8 devices that were tested to obtain the data for the curve were exercised such that all inputs and outputs changed every cycle. Curve A shows the total Icc for a 50-pF load on each of the eight outputs. Curve B shows the total Icc when the outputs are disabled. The B curve results from the input and the internal capacitances. In most applications, the actual operation of the device falls somewhere between the A and B curves.
You can extrapolate the A and B curves backwards until they intersect the quiescent current, which occurs at point C in Figure 4. Point C is approximately 5.6 MHz. This gives you an easy-to-use formula for calculating Icc. For frequencies less than 5.6 MHz:
Icc = Icc(Q) = 25 mA
For frequencies greater than 5.6 MHz:
Icc = Icc(Q) + 3.5 mA/MHz (all outputs changing)
or
Icc = Icc(Q) + 0.5 mA/MHz (no outputs changing)
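
The piecewise formula above can be sketched in a few lines of Python. This is only an illustration, not a Cypress tool: the function name is made up here, and the sketch assumes that the mA/MHz slope applies to the portion of the frequency above the 5.6-MHz corner (point C), so that the estimate is continuous where the extrapolated curves meet the quiescent current.

```python
# Piecewise-linear Icc estimate for the PAL 16R8 curves of Figure 4.
ICC_Q_MA = 25.0               # quiescent current, mA (from the text)
F_CORNER_MHZ = 5.6            # point C, MHz (from the text)
SLOPE_ALL_MA_PER_MHZ = 3.5    # all outputs changing (curve A)
SLOPE_NONE_MA_PER_MHZ = 0.5   # no outputs changing (curve B)


def icc_ma(freq_mhz, outputs_changing=True):
    """Return the estimated Icc in mA at freq_mhz.

    Assumption: the slope is applied to (freq_mhz - F_CORNER_MHZ), which
    keeps the estimate continuous at the corner frequency.
    """
    if freq_mhz <= F_CORNER_MHZ:
        return ICC_Q_MA
    slope = SLOPE_ALL_MA_PER_MHZ if outputs_changing else SLOPE_NONE_MA_PER_MHZ
    return ICC_Q_MA + slope * (freq_mhz - F_CORNER_MHZ)


for f in (1.0, 5.6, 25.6):
    print(f, icc_ma(f), icc_ma(f, outputs_changing=False))
```

Because most applications fall between curves A and B, the two results for a given frequency are best read as upper and lower bounds on the typical supply current.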

[1] /Bidirectional pins


Tips for High-Speed Logic Design
This application note provides tips and makes substantive suggestions for designing high-speed logic circuits that operate reliably. The tips and suggestions are
organized under the headings:
Noise Considerations
Clock Distribution
Buses and Memories
Care and Feeding of PLDs
PCB Effects
Metastability and Crosstalk
As electronic system clock rates climb ever higher, logic designers who were engineering 10-MHz, 100-ns-cycle-time systems are finding themselves working with systems running at speeds upwards of 20 MHz, with 50-ns cycle times. These designers are discovering that techniques adequate for work at 10 MHz are no longer appropriate at 20 MHz and beyond. At 10 MHz, you can use sluggish and relatively well-behaved LS TTL logic with its leisurely set-up and hold parameters; long propagation delays; forgiving output-enable and output-disable times; and high output-current drive capacity.
As an alternative, designers turned to faster bipolar
logic families, but found that power dissipation rose
proportionally. To save power and enhance reliability,
today's designers are changing to CMOS components.
Designers are happy to find that CMOS can deliver the
speed they require at the low power levels they desire.
In the quiescent state, CMOS logic (AC, ACT, FCT) draws three to five orders of magnitude less power than bipolar logic (LS, ALS, AS). At 1
MHz, CMOS logic dissipates about 0.1 mW per gate,
while LS TTL logic dissipates about 2.0 mW per gate.
CMOS technology has truly rewritten the speed/power
rules set forth in the bipolar era.
Plenty of challenges still face the high-speed logic
designer, however. For example, high-performance logic
families are sensitive to system noise and generate noise
themselves. As a result of the effort to make these
devices as fast as possible, they often have anemic output drive capacity. Clock distribution becomes much more of an issue at high frequencies because skew and slow rise times degrade operating margins. As bus
cycles tighten, it becomes increasingly difficult to avoid
bus clashes (multiple devices driving a bus). Very fast
SRAMs and FIFOs require read and write pulse widths
that are very difficult to synthesize using synchronous
logic; hence the appearance of self-timed memory
devices. PLDs have become ubiquitous in modern
board-level designs, but high-speed designers must
carefully consider PLDs' relatively long propagation
delays and slow switching speeds.
You can no longer think of printed circuit boards
as an ideal electrical interconnect. In the high-speed
realm, you must account for the effects of distributed
capacitance, inductance, and propagation delay on the
PCB. To mitigate the effects of ringing, resistive termination of critical signals becomes a practical necessity
above 20 MHz. In the days of old, it wasn't appropriate
to factor loading into propagation delays. Today, conservative designers account for loading when calculating
worst-case prop delays and worst-case signal skew.
Heavy capacitive bypassing and low-inductance decoupling are essential to minimize switching noise above 20 MHz.
Metastability, a phenomenon not widely appreciated until recently, is a critical issue in high-frequency systems. It is essential to be able to resolve
asynchronous events quickly and reliably in high-performance designs. Finally, crosstalk is a substantial concern with high-slew-rate and noise-sensitive CMOS
logic.

Noise Considerations
High-speed CMOS logic tends to be noisier than LS TTL for two reasons: CMOS voltage swings are rail-to-rail, and small-geometry, dual-layer-metal CMOS technology makes possible faster edge rates (2V per ns and faster).
Figure 1. Maintaining Duty Cycle Symmetry

The classic ground-bounce noise situation arises when several outputs of a CMOS logic device switch from High to Low. The simultaneous switching causes a relatively large sink current from the load capacitance to flow to ground through the device package inductance. The potential developed momentarily across this inductance equals the product of the package inductance and the sink current's rate of change. This ground-bounce voltage spikes the Low state held on the quiescent outputs. The spike can often exceed the input Low-level maximum voltage (0.8V), causing the downstream logic device to switch erroneously. Both the chip ground reference and the chip Vcc reference are spiked, but because more energy is switched through the ground-lead inductance, it is much more common to see a problem in a quiescent Low-state output.
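
The relationship described above (bounce voltage = package inductance times the rate of change of the sink current) can be checked with a short Python sketch. The inductance, per-output current, and edge time below are hypothetical values chosen only for illustration; they are not taken from any data sheet.

```python
def ground_bounce_volts(l_package_h, n_switching, di_per_output_a, dt_s):
    """Voltage developed across the ground-lead inductance: V = L * di/dt."""
    di_dt = n_switching * di_per_output_a / dt_s  # total sink-current slew, A/s
    return l_package_h * di_dt


VIL_MAX = 0.8  # input Low-level maximum voltage quoted in the text, volts

# Hypothetical case: 10-nH ground lead, seven outputs each changing their
# sink current by 20 mA during a 2-ns edge while the eighth output holds Low.
bounce = ground_bounce_volts(10e-9, 7, 20e-3, 2e-9)
print(f"ground bounce = {bounce:.2f} V")
print(f"remaining margin to VIL(max) = {VIL_MAX - bounce:.2f} V")
```

Halving the lead inductance or doubling the edge time halves the spike, which is the reasoning behind the inductance-reduction and edge-slowing procedures in this section.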
Here are some procedures that minimize ground and Vcc bounce noise:
1. Pursue any steps that reduce the parasitic inductance between the package and ground and Vcc. These steps include using a PCB with ground and Vcc planes or, at the very least, power distribution elements. Avoid use of sockets, but do use low-inductance decoupling and bypass capacitors. On critical parts, use a standard ceramic decoupling capacitor (0.01 to 0.1 µF) along with a high-frequency filtering capacitor (approximately 470 pF). The Rogers Corp. Micro/Q 1000 Series high-frequency, low-inductance caps are optimal for this purpose. Surface-mount packages have lower package inductance than DIP packages. So-called rotated-die devices with center Vcc and ground pins also have lower inductance.
2. Whenever possible, design synchronous circuits. The ground bounce produced by an octal register, for instance, is triggered by the clock. If the register feeds another registered device, then the noisy output has until a set-up time before the next clock to settle. When you must drive an asynchronous signal with an octal driver, use an output pin close to the package ground pin. The output pin next to the Vcc pin can have as much as 50% more ground-bounce noise than the output pin next to the ground pin.
3. Use various techniques to slow switching or transition edge rates and, therefore, the sink and source currents' rate of change. This can be accomplished with series damping resistors or by increasing the inductance or capacitance between the driving device's output pin and the receiving device's input pin. PCB traces exhibit parasitic ground-path capacitance and inductance that depend on trace length and topology; these factors are thus difficult to predict. The most common technique is to use series damping resistors in the 25 to 35Ω range; 33Ω is a standard value. Series resistors also limit signal overshoot and undershoot.
4. Try to avoid running control signals through a device that drives data and address lines. When using a 10-output PLD such as a 22V10 in an 8-bit bus-oriented application, for instance, you might be tempted to use the extra two outputs for control signals. If the eight data lines switch simultaneously, however, the control lines will probably be disturbed. Using devices that feature input hysteresis adds to the noise margin. Input hysteresis can typically provide 200 mV of additional noise immunity.
Note that mixing logic families can compromise noise immunity margins. For comparison purposes, the margin for a specific logic family is the magnitude difference between the family's guaranteed input threshold and the guaranteed output voltage for the High and Low states:
Noise immunity (Low state) = |Vil - Vol|
Noise immunity (High state) = |Vih - Voh|
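
A small Python sketch makes the margin comparison concrete, including the mixed-family case. The function name is hypothetical and the voltage levels are illustrative assumptions to be checked against the actual data sheets. The differences are deliberately kept signed, so that a negative result flags a driver/receiver pair with no margin at all; for compatible levels the magnitudes match the expressions above.

```python
def noise_margins(vol_max, voh_min, vil_max, vih_min):
    """Return (low_margin, high_margin) in volts for a driver/receiver pair.

    vol_max, voh_min: guaranteed output levels of the driving family.
    vil_max, vih_min: guaranteed input thresholds of the receiving family.
    A negative value means the driver cannot guarantee a valid level.
    """
    low_margin = vil_max - vol_max
    high_margin = voh_min - vih_min
    return low_margin, high_margin


# Illustrative TTL-style levels (assumed values).
same_family = noise_margins(vol_max=0.4, voh_min=2.4, vil_max=0.8, vih_min=2.0)
print("same family:", same_family)

# Hypothetical mixed case: the same driver feeding a receiver whose
# High-level input threshold is assumed to be 3.5V.
mixed = noise_margins(vol_max=0.4, voh_min=2.4, vil_max=0.8, vih_min=3.5)
print("mixed families:", mixed)
```

In the mixed case the High-state margin comes out negative, which illustrates how combining families with different thresholds can compromise noise immunity.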

When possible, use a logic family that can drive
(commercial) transmission lines directly. This
specification is characteristic of devices that can switch
sufficient current to guarantee so-called incident-wave
switching. Switching that occurs on the incident wave is
faster than having to wait for the reflected wave.
In addition to causing false triggering of downstream sequential logic and glitches in downstream combinatorial logic, ground-bounce noise can also cause registers in the bounced device to "forget" their stored state. This is due to the momentary disturbance in the chip's ground and Vcc reference. The switching of multiple outputs can also skew the device's propagation delay by approximately 200 ps per switched output. With an octal or 10-bit device, this 1 to 2 ns additional delay should be included in worst-case timing analyses.


Clock Distribution
Adequate clock distribution is essential for 20-MHz
and faster systems because skew can eat up precious
nanoseconds and because high-speed logic devices are
sensitive to clock waveform distortion and slow rise
times.
All physical devices exhibit an edge-dependent propagation delay asymmetry; the Low-to-High edge propagates more quickly than the High-to-Low edge, or vice versa. For example, the clock-to-Q propagation delay for a Signetics 74F74 ranges from 3.8 to 6.8 ns Low to High, and 4.4 to 8.0 ns High to Low. The data sheet for the Texas Instruments 74AS1000 NAND driver specifies a 1-to-4-ns range for both Low-to-High and High-to-Low edges, but any specific physical device shows some asymmetry.
It is possible to maintain duty-cycle symmetry in a buffered-clock distribution network by cascading two inverting drivers. The two drivers must both be in the same package, as shown in Figure 1. Because the two drivers are in the same package, their prop delay characteristics track, and the High-to-Low and Low-to-High differential delays tend to cancel.
Limit the fanout from a clock buffer to eight to 15 devices. Fanout calculations must account for both AC and DC loading. The AC characteristics for logic components are specified at 50 pF of load capacitance and occasionally at 300 pF of load capacitance. Propagation delays and output-enable times increase by approximately 1 ns for each 50 pF of additional load capacitance. The input capacitance of bipolar logic families is higher (approximately 10 pF) than that of CMOS (approximately 5 pF). If the sum of the capacitance being driven exceeds 50 pF, derate the driver's AC characteristics appropriately.
Input current is the important DC electrical characteristic for loading purposes. The driving device must be able to sink the sum of the Low-level input currents to which it is connected (Iol at Vol). The driving device must also be able to source the sum of the High-level input currents to which it is connected (Ioh at Voh).
The Low-level input current for bipolar logic families ranges from -400 to -100 µA, while the Low-level input current for modern CMOS logic families ranges from -5 to -1 µA. The High-level input current for bipolar logic families ranges from 50 to 20 µA, while the High-level input current for modern CMOS logic families ranges from 5 to 1 µA.
Because the Iol at Vol for bus drivers is often as high as 48 mA, and the Ioh at Voh is often as high as -24 mA, input current loading is seldom an issue, except when driving a parallel (resistor) terminated load. For example, a 220Ω pull-up resistor requires about 22 mA worst case (Vol = 0V, Vcc = 5V), and a 330Ω pull-down resistor requires about 15 mA worst case (Voh = 5V, Gnd = 0V). Consider using an AC termination scheme if this additional current cannot be tolerated.
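
The AC derating rule and the termination currents quoted above can be reproduced with a short Python sketch. The helper names and the 25 pF of trace capacitance are assumptions made for illustration; the 50-pF rating, the 1 ns per 50 pF derating, the 5-pF CMOS input capacitance, and the resistor values come from the text.

```python
CMOS_INPUT_PF = 5.0        # approximate CMOS input capacitance (from the text)
RATED_LOAD_PF = 50.0       # load at which AC characteristics are specified
DERATE_NS_PER_50PF = 1.0   # approximate added delay per extra 50 pF


def added_delay_ns(load_pf):
    """Extra propagation delay for capacitance beyond the rated 50-pF load."""
    excess_pf = max(0.0, load_pf - RATED_LOAD_PF)
    return DERATE_NS_PER_50PF * excess_pf / 50.0


def termination_current_ma(volts_across, resistor_ohms):
    """Worst-case DC current through a parallel termination resistor, in mA."""
    return 1000.0 * volts_across / resistor_ohms


# Fifteen CMOS clock inputs plus a hypothetical 25 pF of trace capacitance.
load_pf = 15 * CMOS_INPUT_PF + 25.0
print(f"load = {load_pf:.0f} pF, added delay = {added_delay_ns(load_pf):.1f} ns")

# 220-ohm pull-up with the output at 0V; 330-ohm pull-down with the output at 5V.
print(f"pull-up:   {termination_current_ma(5.0, 220.0):.1f} mA")
print(f"pull-down: {termination_current_ma(5.0, 330.0):.1f} mA")
```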
If a single buffer cannot safely supply a sufficient clock fanout, use parallel drivers (Figure 2). When distributing a clock signal, attempt to load each of the parallel lines equally. Unequal loading increases the skew between lines.

Figure 2. Parallel Clock Drivers

Buses and Memories
When you design buses in high-performance systems, it is important to consider the effects of AC and DC loading. The input and output capacitance of CMOS SRAMs, PROMs, and DRAMs ranges from 5 to 7 pF. This capacitance can become a concern with large memory arrays.
Be especially careful when using SRAM modules, which might have high input and output capacitances due to the multiple devices connected to each signal line. Because the signals that drive large memory arrays (such as the address, RAS, CAS, and data lines) tend to have long PCB traces, it is common practice to series-terminate these lines to minimize ringing, undershoot, and overshoot.
The input load or leakage current for CMOS SRAMs, PROMs, and DRAMs is approximately 10 µA, sink and source. When you use high-output-current bus drivers (24 mA Iol or greater), DC loading is rarely an issue.
As system cycle times shorten, it becomes more difficult to avoid bus clash situations. Bus clash or bus contention occurs on a shared bus when one three-state device finishes its output-enable time before a second device finishes its output-disable time. For a short period of time, both devices drive the bus. Because the output stages of memories and logic components can typically withstand at least 20 mA of current, the excess current does not shorten the devices' useful lives. Bus clash does cause large positive and negative current changes in the device Vcc and ground paths, however. The demand for current induces Vcc and ground bounce noise just like the simultaneous switching situation previously discussed. Thus, avoid more than 5 ns of overlap in the worst-case output-enable and output-disable times.
You can use CMOS components' low input current to advantage on buses when hold time is deficient. For example, consider a CMOS memory connected to a CMOS octal register. The memory is read, the /OE (or the /CE) deasserted, and the data clocked into the register. Ordinarily, the data should be clocked into the register before /OE is deasserted because the memory's worst-case output-disable time could be very short. When the memory is read in this case, however, the distributed capacitance presented by the register inputs, the PCB trace, and the memory's own outputs is charged. Because the memory's output leakage current and the register's input current are very low (5 to 10 µA), this distributed capacitance remains charged for some time. In effect, the data is held long enough to make up for the deficient timing.
High-speed SRAMs and FIFOs have timing requirements that are often difficult to meet using synchronous circuits. In such situations, there are asynchronous alternatives to consider. You can use the delay lines supplied by various manufacturers by combinatorially gating the output taps to synthesize the required signal. Delay lines are typically calibrated by
comparing the input's rising edge to the various delayed outputs' rising edges; the delay times for the falling edges are less accurate. If a decoded signal uses falling edges, make sure that the design can tolerate a few nanoseconds of inaccuracy.
The Engineered Components Company makes a family of pulse-generator modules (PGMs), which issue a precise pulse when presented with a positive-going edge. The company offers standard PGMs, fast-recovery PGMs that have a higher maximum repetition rate, and delayed PGMs, which wait for a specified period before issuing the pulse. Both delay lines and PGMs have propagation delays that range from 5 to 10 ns.
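
The bus-clash guideline in this section (no more than 5 ns of overlap between the worst-case output-enable and output-disable times) lends itself to a simple worst-case check. The Python sketch below is only an illustration: the function name, the timing model, and all timing values are hypothetical. It assumes the outgoing device is disabled at time zero and the incoming device's enable is asserted some lag later.

```python
MAX_OVERLAP_NS = 5.0  # guideline from the text


def bus_overlap_ns(t_disable_max_ns, t_enable_min_ns, enable_lag_ns=0.0):
    """Worst-case time during which both devices drive the bus.

    t_disable_max_ns: slowest output-disable time of the outgoing device.
    t_enable_min_ns:  fastest output-enable time of the incoming device.
    enable_lag_ns:    delay from the outgoing device's disable edge to the
                      incoming device's enable edge.
    """
    overlap = t_disable_max_ns - (enable_lag_ns + t_enable_min_ns)
    return max(0.0, overlap)


# Hypothetical parts: 15-ns maximum disable time, 3-ns minimum enable time.
for lag in (0.0, 10.0):
    overlap = bus_overlap_ns(15.0, 3.0, enable_lag_ns=lag)
    verdict = "ok" if overlap <= MAX_OVERLAP_NS else "too much overlap"
    print(f"lag {lag:4.1f} ns -> overlap {overlap:4.1f} ns ({verdict})")
```

Using the maximum disable time against the minimum enable time is the pessimistic pairing; real parts usually overlap less.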

Care and Feeding of PLDs
Programmable Logic Devices (PLDs) are exceedingly useful for designing high-performance systems, but their characteristics and shortcomings must be well understood. The set-up time for most registered PLDs is usually just less than the propagation delay. This is because the signal to be latched must propagate through the AND array as well as the OR/XOR gate before reaching the flip-flop, while the clock is connected directly from the pin to the flip-flop. Accordingly, the hold time for this type of PLD is 0 ns minimum worst case and several nanoseconds negative, typically. This negative hold time implies that the PLD samples the state of the inputs as they existed several nanoseconds before the clock's rising edge. You can take advantage of this phenomenon when the device feeding the PLD is hold-time deficient with respect to the PLD clock.
PLD outputs usually do not have the drive capacity of standard logic. When you use a PLD to generate a critical signal, such as a FIFO-read or shift-out pulse, buffer the signal with a fast, hard-driving gate. Bear in mind, too, that identical equations implemented in the same PLD can exhibit different propagation delays due to different on-chip path lengths. PLD propagation delays are especially dependent on capacitive loading.

PCB Effects
The most conservative way to handle PCB signal distortion effects is to consider every substrate interconnect as a transmission line. In practice, this approach only works when the unloaded signal transition time approaches the round-trip substrate propagation delay.
For ordinary PCB materials (G-10 fiberglass epoxy), the round-trip propagation delay is approximately 0.3 ns per inch. Therefore, for 3-ns transition times, you should consider any PCB trace longer than 10 inches as a transmission line.
A transmission line presents a characteristic impedance and has distributed inductance and capacitance. You can minimize ringing on a transmission line by closely matching the output impedance of the driving device to the line's characteristic impedance. According to the microstrip model, for a 10-mil-wide, 1-oz. copper line 1.5 mils thick over a ground plane separated by a dielectric of G-10 fiberglass epoxy 62.5 mils thick, the theoretical unloaded characteristic impedance is approximately 110Ω. In reality, PCB trace characteristic impedances can range from 50 to 200Ω. Capacitive loading reduces the characteristic impedance, increases the delay, and slows the rise time on a transmission line.
The conventional method for reducing reflections on transmission lines is with some form of termination, the most common being the so-called Thevenin type. This termination consists of a pull-up resistor to Vcc and a pull-down resistor to ground. The goal is to match the two resistors' Thevenin equivalent to the trace's characteristic impedance.
Table 1 lists common values for the pull-up and pull-down resistors. Both of the termination pairs shown in the table pull the line to a logic High of approximately 3V when the driver is disabled. Place the termination resistors as close as possible to the receiver. Keep in mind that many CMOS logic components have input and output clamp diodes to help damp overshoot and undershoot.

Table 1. Pull-Up and Pull-Down Values

RESISTOR VALUES                    THEVENIN EQUIVALENT
220Ω pull-up, 330Ω pull-down       132Ω
330Ω pull-up, 470Ω pull-down       194Ω

Metastability
The output of a latch or flip-flop can go into an undefined or metastable state (neither High nor Low) when the set-up time or hold time for the device is violated. The metastable condition typically occurs when an asynchronous signal is being synchronized. It occurs in all process technologies and is impossible to completely eliminate.
The two important metastability parameters to consider in design work are the mean time between failures (MTBF) at maximum operating frequency and the average or typical resolution or settling time, Tsw. The latter is the time the device takes to resolve from a metastable state to a stable state. These parameters and/or the equations for deriving them should be available from a device's manufacturer.
Metastability performance is proportional to a technology's Vih-to-Vil slew time. High-speed CMOS registers such as those found in Cypress PLDs have very fast slew times and typical settling times that range from 100 to 600 ps, depending on the device type.
By double-latching asynchronous inputs, you can dramatically increase a system's MTBF and reduce the probability of a metastable event causing system malfunctions. When determining the length of time to delay before clocking the second register, multiply the published typical settling time by two or three to create an extra margin of protection.

Crosstalk
Crosstalk is the undesirable coupling of a transition on an active line (talker) onto an inactive line (listener). The crosstalk amplitude is proportional to the talker edge rates, the physical proximity between signal lines, and the distance over which the two lines are parallel or adjacent.
Crosstalk results from two important physical causes: mutual impedance and velocity differences. Mutual impedance is due to the mutual inductance and capacitance between adjacent signal lines and is a transformer-like effect. Velocity differences arise when a signal propagates along a conductor that is in contact with two materials of differing dielectric constants, such as fiberglass epoxy and air in PCBs. The wave propagating at the copper-to-epoxy interface travels slower than the wave propagating at the copper-to-air interface. A pulse develops whose duration is twice the difference in the arrival times of the two waves; thus, the magnitude of the disturbance increases when the length of the parallel or adjacent traces increases.
Due to CMOS's fast edge rates, crosstalk is a legitimate concern. You can take the following steps to reduce forward and reverse crosstalk:
1. Maximize the distance between traces, and minimize the distance over which traces are parallel or adjacent. When possible, make the signals on adjacent PCB layers perpendicular. Use the power and ground layers as shields between the signal layers. On two-layer PCBs, run ground lines between adjacent, parallel signal lines.
2. Make every other conductor a ground line when using flat ribbon cable. Protect critical signals such as clock lines with a dedicated ground strip on PCBs or with a ground twisted pair on backplanes.
3. Use Thevenin termination of a line to its characteristic impedance to reduce crosstalk amplitude by 50 percent.
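
Two of the rules of thumb from the PCB Effects discussion in this note, the Thevenin equivalent of a pull-up/pull-down pair and the trace length at which a line should be treated as a transmission line, can be checked numerically. The Python sketch below is illustrative only; the function names are made up here, while the resistor values, the 5V supply, and the 0.3 ns per inch round-trip delay are the figures quoted in the text.

```python
ROUND_TRIP_NS_PER_INCH = 0.3  # round-trip delay quoted for fiberglass-epoxy PCBs


def thevenin(r_pullup, r_pulldown, vcc=5.0):
    """Thevenin resistance (ohms) and open-circuit voltage (V) of the pair."""
    r_th = r_pullup * r_pulldown / (r_pullup + r_pulldown)
    v_th = vcc * r_pulldown / (r_pullup + r_pulldown)
    return r_th, v_th


def critical_length_in(transition_ns):
    """Trace length whose round-trip delay equals the signal transition time."""
    return transition_ns / ROUND_TRIP_NS_PER_INCH


for r_up, r_down in ((220.0, 330.0), (330.0, 470.0)):
    r_th, v_th = thevenin(r_up, r_down)
    print(f"{r_up:.0f}/{r_down:.0f} ohms -> {r_th:.0f} ohms, {v_th:.2f} V")

print(f"3-ns edges: treat traces over {critical_length_in(3.0):.0f} in. as lines")
```

Both resistor pairs give an undriven line voltage close to 3V, consistent with the logic High level mentioned for the terminations in Table 1.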

Protection, Decoupling, and Filtering
of Cypress CMOS Circuits
This application note explains how to protect your
ICs with a low-cost zener diode and why it is good insurance against inadvertent voltage transients. Also explained is the reason why decoupling and high-frequency-filtering capacitors are required. A method is
provided for determining the capacitors' values.

Zener Diode Protection
Linear power supplies can cause large voltage transients. When caused by the collapse of a magnetic field, the transient is negative. When the supply is turned on, the resulting transient is positive.
Some commercially available laboratory bench supplies behave the same way. When they turn on, they can overshoot several volts. When they turn off, lead inductance can cause a negative transient voltage at the Vcc pin. If sufficient energy is available, internal gate oxides can break down, either destroying or weakening the IC such that it might fail later.
You can avoid this problem by adding a 20¢ zener diode (also called a voltage-regulator diode) between Vcc and ground. Connect the diode's cathode to Vcc and the anode to ground (Figure 1). A 400-mW, 6.2V 1N525 or equivalent is recommended. You can also use the 1N753, a 500-mW, 6.2V zener diode.
If a voltage greater than the zener voltage (6.2V) occurs on Vcc, the diode breaks down, clamping the voltage to 6.2V and shunting the current to ground (Figure 2). The diode can be destroyed if the current multiplied by the zener voltage exceeds the diode's power rating. Because zener diodes always fail shorted, they cause the power supply to "crowbar" and thus protect the ICs.
A negative voltage on the Vcc line puts a forward bias on the diode. This turns on the diode, which clamps the voltage to approximately -0.8V. If the negative voltage times the current exceeds the diode's power rating, the diode fails shorted, as in the reversed-bias case, and protects the ICs.

Figure 1. Zener Diode Connection
Figure 2. Zener Diode Characteristic

High-Frequency Filtering
In addition to the protection offered by zener diodes, decoupling and high-frequency-filtering capacitors are required on high-performance CMOS circuits. To use these capacitors effectively, you must understand why they are required.
To realize the fast rise and fall times that Cypress CMOS integrated circuits are capable of achieving, the power-distribution system must be able to supply the instantaneous current required when the device outputs switch from Low to High. The energy converted to current is stored as charge on the local decoupling capacitors. They decouple or isolate the circuit from the power-distribution system. It is standard practice to use one decoupling capacitor for each IC that drives a transmission line and one capacitor for every three devices that do not.
The PCB trace inductance plus the IC lead inductance can "current-starve" the output circuits, causing

rise-time degradation. Remember that the current through an inductor cannot change instantaneously. Therefore, you must minimize any series inductance, including the lead inductance of the decoupling capacitors.

Decoupling-Capacitor Calculations
To determine the value of the decoupling capacitor, you must estimate the instantaneous current required when all the outputs of an IC switch from Low to High, assuming a reasonable droop of the voltage on the capacitor. The charge stored on the local decoupling capacitor is
$Q = CV$
Differentiating yields
$i(t) = \frac{dQ}{dt} = C \frac{dV}{dt}$    Eq. 1
The characteristic impedance of a typical transmission line is 50Ω. Lines with a heavy capacitive load have a lower characteristic impedance.
Next, assume that the IC is a nine-output FIFO, such as the CY7C429. The outputs reach
Vcc - Vt = 5V - 1V = 4V
Each output thus requires 4V/50Ω = 80 mA. Because the FIFO has nine outputs, it requires a total of 720 mA during the rise times of the outputs.
Solving Equation 1 for C yields
$C = i \frac{dt}{dV}$    Eq. 2
The last step is to assume a reasonable, tolerable droop in the capacitor voltage. Assume dV = 100 mV. Additionally, the signal rise and fall times are 2 ns. Substituting these values in Equation 2 yields
$C = \frac{720 \times 10^{-3} \times 2 \times 10^{-9}}{100 \times 10^{-3}} = 14.4 \times 10^{-9}\ \mathrm{F} = 0.0144\ \mu\mathrm{F}$
It is standard practice to use 0.01 to 0.1-µF decoupling capacitors. A 0.1-µF capacitor can supply 5A under the conditions assumed in the preceding calculations. Another way to look at the situation is that a 0.1-µF capacitor supplies 720 mA of instantaneous current in 2 ns with only 14.4 mV of voltage droop across the capacitor.
Decoupling capacitors for high-speed Cypress CMOS circuits should be of the high-K ceramic type with a low effective series resistance (ESR). Capacitors using Z5U dielectric are a good choice.

High-Frequency Filter Capacitors
The 0.1 to 0.01-µF decoupling capacitors usually do not provide high-frequency decoupling or filtering. These capacitors do not behave like capacitors at high frequencies because their series resonance frequency is not high enough. This is primarily because of lead inductance in their construction, which is a result of the capacitor's relatively large value.
For high-frequency filter analysis, you can use the simplified equivalent circuit of a capacitor shown in Figure 3. Rs is the effective series resistance (ESR), L is the effective series inductance (ESL), and C is the capacitance.

Figure 3. Simplified Capacitor Equivalent Circuit

The impedance of the simplified equivalent circuit is:
$Z_c = R_s + j\omega L + \frac{1}{j\omega C}$    Eq. 3
$Z_c = R_s + j\left[\omega L - \frac{1}{\omega C}\right]$    Eq. 4
The magnitude of the impedance is
$|Z_c| = \sqrt{R_s^2 + \left[\omega L - \frac{1}{\omega C}\right]^2}$    Eq. 5
At the series resonant frequency:
$\omega L = \frac{1}{\omega C}$
or,
$\omega = \frac{1}{\sqrt{LC}}$
At the resonant frequency, Zc = Rs, which is the minimum impedance.
Figure 4 shows how the impedance varies with frequency. The series resistance usually increases as the capacitance decreases. Also as the capacitance decreases, the inductance typically decreases, which means that the resonant frequency increases. This is usually due to the capacitor's physical construction. Note that a surface-mounted capacitor's lead inductance is at least an order of magnitude less than that of an axial-lead capacitor.

Figure 4. Capacitor Impedance Versus Frequency (Z in ohms versus frequency in Hz)

The next step in high-frequency filter analysis is to determine a typical system's expected high-frequency components. Begin by assuming that the circuit is driven by a series of digital pulses with finite rise and fall times, then perform a Fourier transform on the series to determine their frequency components.
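
The decoupling-capacitor arithmetic and the series-resonance relations from the capacitor analysis in this note can be reproduced with a short Python sketch. The function names are made up for illustration, and the ESR and ESL values used for the resonance example (0.1Ω and 5 nH) are hypothetical; the nine-output FIFO numbers are the ones worked through in the text. Note that the resonance relation gives an angular frequency, so the sketch divides by 2π to report hertz.

```python
import math


def decoupling_capacitance(i_amps, dt_s, dv_volts):
    """Eq. 2: C = i * dt / dV, in farads."""
    return i_amps * dt_s / dv_volts


def series_resonance_hz(l_henry, c_farad):
    """Series resonant frequency: omega = 1 / sqrt(L * C), f = omega / (2 * pi)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def impedance_ohms(f_hz, rs_ohms, l_henry, c_farad):
    """Eq. 5: magnitude of the Rs-L-C series impedance at f_hz."""
    w = 2.0 * math.pi * f_hz
    return math.hypot(rs_ohms, w * l_henry - 1.0 / (w * c_farad))


# Nine outputs, each charging a 50-ohm line to Vcc - Vt = 4V.
i_total = 9 * (5.0 - 1.0) / 50.0
c_min = decoupling_capacitance(i_total, 2e-9, 100e-3)
print(f"i = {i_total * 1e3:.0f} mA, C = {c_min * 1e6:.4f} uF")

# Droop of a 0.1-uF capacitor supplying the same current for 2 ns.
droop_v = i_total * 2e-9 / 0.1e-6
print(f"droop with 0.1 uF = {droop_v * 1e3:.1f} mV")

# Hypothetical parasitics for a 0.1-uF part: 0.1-ohm ESR, 5-nH ESL.
f_res = series_resonance_hz(5e-9, 0.1e-6)
print(f"resonance near {f_res / 1e6:.1f} MHz, |Z| = "
      f"{impedance_ohms(f_res, 0.1, 5e-9, 0.1e-6):.3f} ohms")
```

Above the resonant frequency the inductive term dominates, which is why a large decoupling capacitor alone does not filter the highest-frequency components.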

Fourier Transform of a Periodic Pulse
Figure 5 illustrates a periodic pulse of amplitude A, period T, rise and fall times of tr, and pulse width of Tp, as measured between the 50-percent-amplitude points.
The approximate frequency-domain transform appears in Figure 6. The amplitude of the frequency-domain voltage is a function of the signal's amplitude and duty cycle in the time domain. The fundamental frequency, F0, is related to the pulse train's period. The first harmonic, F1, is of equal energy and is a function of the pulse width. The second harmonic, F2, contains half the energy of F0 and is a function of the pulse rise time.
The rise and fall times of Cypress's CMOS and BiCMOS circuits are 2 ns, by design. If a Cypress PLD is driving the write- or read-strobe inputs of a CY7C429-20 FIFO at the maximum frequency of 33.3 MHz (T = 30 ns) with a 10-ns/30-ns-duty-cycle signal (Tp = 10 ns), the following signal frequencies are generated:
$F_0 = \frac{1}{\pi T} = \frac{1}{3.1416 \times 30 \times 10^{-9}} = 10.61\ \mathrm{MHz}$
$F_1 = \frac{1}{\pi T_p} = \frac{1}{3.1416 \times 10 \times 10^{-9}} = 31.83\ \mathrm{MHz}$
$F_2 = \frac{1}{\pi t_r} = \frac{1}{3.1416 \times 2 \times 10^{-9}} = 159.15\ \mathrm{MHz}$
Within the IC, signal rise and fall times can be as fast as 300 ps (picoseconds), which means that F2 = 1.061 GHz (1,061 MHz). In some ICs short timing pulses are generated internally, but they are usually longer than the 300-ps rise time, so the preceding F2 is the highest harmonic present.
Because the IC's data outputs can normally change no faster than those of the inputs, the outputs do not generate additional higher-frequency harmonics.

Figure 5. Periodic Pulse Waveform
Figure 6. Fourier Transform of Periodic Pulse

Parallel the Filter Capacitors
You cannot find a single capacitor whose series resonant frequency corresponds to all three of F0, F1, and F2. Instead, select three separate capacitors with the appropriate resonant frequencies and connect them in parallel between Vcc and ground, as close to the IC as possible. The capacitors act as a bandpass filter, shunting the unwanted, high-frequency signals to ground. The sum of the capacitors' values should be greater than or equal to the capacitance value given by Equation 2. The total high-frequency filtering capacitance is usually between 100 and 500 pF.

Low-Frequency Filter Capacitors
A solid tantalum capacitor of 10 µF is recommended for every 50 to 100 ICs to reduce power-supply ripple. Place this capacitor as close as physically possible to where the Vcc and ground enter the PCB or module.
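
The three frequencies worked out in the Fourier transform discussion of this note all have the form 1/(π × time), so they can be reproduced with one helper. The Python sketch below is illustrative; the function name is made up, and the timing values are the ones used in the text (T = 30 ns, Tp = 10 ns, tr = 2 ns, and the 300-ps internal edge).

```python
import math


def corner_mhz(time_s):
    """Return 1 / (pi * time_s), expressed in MHz."""
    return 1.0 / (math.pi * time_s) / 1e6


f0 = corner_mhz(30e-9)             # set by the period T
f1 = corner_mhz(10e-9)             # set by the pulse width Tp
f2 = corner_mhz(2e-9)              # set by the rise time tr
f2_internal = corner_mhz(300e-12)  # internal 300-ps edges

print(f"F0 = {f0:.2f} MHz, F1 = {f1:.2f} MHz, F2 = {f2:.2f} MHz")
print(f"F2 for 300-ps edges = {f2_internal:.0f} MHz")
```

These are the frequencies at which the paralleled filter capacitors would ideally be series resonant.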

Section Contents

Modules
Choosing Packages in High-Density Module Designs
The Multichip Family of Universal JEDEC ZIP/SIMM Modules


Choosing Packages
in High-Density Module Designs
This application note describes the various packages in which high-density memory modules are available and reviews some of the application areas where specific packages find use. Module outline drawings accompany the text.
You can use high-density memory modules in place of multiple monolithic ICs to minimize space, achieve better performance, and obtain single-device solutions. These modules are now available in a variety of package styles, each of which satisfies different needs in high-performance systems. Table 1 summarizes the characteristics of the different package types.
There are two general module types. The first type uses plastic-encapsulated ICs mounted on an epoxy-fiberglass substrate. The monolithic ICs on the modules can be mounted in SOIC, VSOP, or SOJ packages, which are small-outline parts with either gull-wing or J-bend leads. The second module type offers hermetically sealed LCC (leadless chip carrier) ICs mounted on ceramic substrates.
Modules built on epoxy-fiberglass substrates offer economic advantages over modules with ceramic substrates. In general, however, ceramic substrates can accommodate more components than epoxy-fiberglass substrates. Further, when assembled using military-grade components, ceramic modules can be used in military applications. For all applications, the ceramic-substrate devices have better thermal characteristics than nonceramic types.

SIPs
The single in-line package, or SIP, is a vertically
mounted module with a single row of pins along one edge
for through-hole mounting. The pins are on a 100-mil
pitch. Note in Figure 1 that the footprint of this plastic
package is only 0.66 square inches.
SIPs are typically used in low-pin-count applications
and are often used where high component density is required. These modules' vertical orientation and accommodation of components on both sides can increase component density by a factor of four or more over designs
that use monolithics. In addition to meeting space constraints, this higher density can also improve memory system performance by reducing path lengths from chip to
chip.
Another chief source of appeal for the SIP module is
fast, easy access to state-of-the-art package technology.
That is, a design's main circuit board can be implemented
in conventional, high-yield, through-hole technology,
while the system overall achieves superior component
density and high performance by employing fully tested
modules whose fine-pitch, surface-mount components are
mounted on a multilayer, tight-tolerance substrate.

Figure 1. SIP

Flat SIPs

Flat SIPs are virtually identical to SIPs, except that
their single rows of pins have a 90° bend (Figure 2).
Therefore, flat SIPs lie close and parallel to the board on
which they are mounted. Flat SIPs' advantage is their low
profile; they are typically used where component height
above the main board is constrained. Flat SIPs range in
height from 0.300 to 0.38 inch.

Figure 2. Flat SIP

ZIPs

ZIP modules are similar to SIPs. However, the ZIP
module has pins on 100-mil centers along both sides of
the substrate (Figure 3). Pins on alternate sides are staggered by 50 mils. The dual row of staggered pins allows a
higher connection density than the SIP, while maintaining
100-mil spacing between adjacent pins. The staggering
provides additional separation for the lead vias and supports between-lead traces. At the same time, pin count is
doubled over that of SIPs.
Many ZIP modules have a vertical dimension of
0.500 inch maximum. This low profile makes them candidates for VME systems, where there is a maximum allowable component height.

Figure 3. ZIP

Table 1. Module Package Characteristics

| Package Type | Typical Pin Count, Min | Typical Pin Count, Max | Typical Height (in.), Min | Typical Height (in.), Max | Mil | Advantages | Disadvantages | Board Space (sq. in.), FR4 | Board Space (sq. in.), Cer |
|---|---|---|---|---|---|---|---|---|---|
| SIP | 24 | 50 | 0.5 | 0.9 | N | Vertical orientation; FR4 or ceramic | Limited pin count | 1.2 | 0.9 |
| FSIP | 24 | 50 | 0.2 | 0.4 | N | Very low profile; mechanical stability; FR4 or ceramic | Lower density due to horizontal orientation | 2.7 | 2.4 |
| ZIP | 24 | 100 | 0.5 | 0.9 | N | Vertical orientation; JEDEC standard pinouts; pinout compatible with SIMM | | 1.2 | N/A |
| SIMM | 24 | 100 | 0.5 | 0.9 | N | Vertical orientation; socket mounting; pinout compatible with ZIP | | 1.2 | N/A |
| VDIP | 36 | 104 | 0.5 | 0.95 | Y | Vertical orientation | | 1.2 | 0.9 |
| DIP | 24 | 60 | 0.17 | 0.37 | Y | Low profile; excellent mechanical ruggedness | Horizontal | 2.9 | 2.9 |
| QUIP | 48 | 200 | | | Y | Low profile; excellent mechanical ruggedness; increased number of pins | Horizontal | 2.9 | 2.9 |
| QFP | 68 | 144 | | | Y | Surface mount; low profile; excellent mechanical ruggedness; large number of pins in small area | Surface-mount technology required; horizontal; components on one side only | 3.1 | 3.1 |
| PGA | 68 | 144 | | | Y | Large number of pins in through-hole technology; low profile; excellent mechanical ruggedness | Multilayer boards; horizontal; components on one side only | 2.9 | 2.9 |

Notes: Mil entries indicate whether a hermetic, military version is available.
Board space is the motherboard area that the package occupies when the module carries eight to 28 components.
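
As an illustration of how Table 1 might be used during package selection, the sketch below encodes the pin-count, height, and Mil columns and filters them against a design's constraints. The `Package` class, the `candidates` function, and its selection rule are hypothetical helpers, not part of any Cypress tool. The numbers are transcribed from Table 1 as read here, and a height of `None` marks an entry for which the table shows no height.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Package:
    name: str
    min_pins: int
    max_pins: int
    min_height_in: Optional[float]  # None where Table 1 lists no height
    mil_version: bool               # hermetic, military version available


TABLE_1 = [
    Package("SIP", 24, 50, 0.5, False),
    Package("FSIP", 24, 50, 0.2, False),
    Package("ZIP", 24, 100, 0.5, False),
    Package("SIMM", 24, 100, 0.5, False),
    Package("VDIP", 36, 104, 0.5, True),
    Package("DIP", 24, 60, 0.17, True),
    Package("QUIP", 48, 200, None, True),
    Package("QFP", 68, 144, None, True),
    Package("PGA", 68, 144, None, True),
]


def candidates(pins: int, height_limit_in: Optional[float] = None,
               need_mil: bool = False) -> List[str]:
    """Return package types whose typical ranges could satisfy the constraints."""
    result = []
    for p in TABLE_1:
        if not (p.min_pins <= pins <= p.max_pins):
            continue
        if need_mil and not p.mil_version:
            continue
        if height_limit_in is not None:
            # A package qualifies only if its shortest typical height is known
            # and fits under the limit.
            if p.min_height_in is None or p.min_height_in > height_limit_in:
                continue
        result.append(p.name)
    return result


print(candidates(64, height_limit_in=0.5))
print(candidates(40, need_mil=True))
```

The filter is deliberately conservative: a package with no listed height is excluded whenever a height limit is given, so the result errs toward omitting a viable package rather than suggesting one that may not fit.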

SIMMs

Single in-line memory modules, or SIMMs, are also
similar to SIPs, except that SIMMs have no pins for
through-hole mounting (Figure 4). Instead, the module's
bottom edge effectively acts as an edge connector, which
is part of the substrate material.
Contacts directly opposite each other are connected
together. Some SIMMs have contacts on 100-mil centers;
others have 50-mil centers.
The typical application for SIMM modules requires
socket-mounted components, either for repair or for
upgrades in the field. Some SIMM sockets hold the
SIMM at an angle, which reduces the height of the
module on the board.
Some module devices are available in both ZIP and
SIMM packages with the same form factor. The pinout is
such that the footprint of some SIMM sockets matches the
footprint of corresponding ZIP modules. This allows system prototypes to use socketed SIMMs, and production
systems to use through-hole-soldered ZIPs, with no
change in the motherboard.
Some SIMMs and matching ZIPs have presence-detect pins, whose unique combination of no-connects or
grounds can be used by external logic to identify the
module's memory capacity. Thus, the system can determine the amount of memory present without user input.

DIPs

DIP modules have identical footprints and similar
form factors to standard IC DIPs. The modules are typically taller than the DIP packages used for monolithics.
Components are mounted on both the top and bottom of
the substrate.
Generally, these modules are used in anticipation of
monolithic devices that will someday fit the same
footprint. DIP modules allow engineers to design-in
monolithic devices that do not yet exist by employing the
modules to meet immediate production needs. Practically,
even after monolithic devices become available, the
modules generally continue to find utility while initial

production ramp-up of the monolithic devices keeps supplies short.

Figure 4. SIMM

VDIPs
VDIP modules typically have the largest pinout of
any modules. Similar to ZIPs, VDIPs are vertically
mounted modules with plastic-encapsulated components
and epoxy-encapsulated chips (Figure 5a).
VDIP modules have pins along both sides of the substrate, with the pins on alternate sides aligned. Spacing
along each row and across the module is 100 mils. The
dual row of pins allows a higher connection density than
SIPs, while maintaining 100-mil minimum spacing between adjacent pins.
Like ZIPs, VDIPs are useful in high-pin-count
devices, where the host board is designed to normal
through-hole design rules. VDIPs help retain the density
advantages of vertical packages, while providing a low
profile.

Figure 5. VDIP (a) and HVDIP (b)

Ceramic Modules
For harsher environments, several types of modules
are available with ceramic substrates and side-brazed
leads. These modules sometimes have sealed metal lids to
protect directly mounted IC chips or utilize hermetically
sealed LCC-packaged ICs. Four hermetic packaging styles
are available: HVDIPs, HDIPs, PGAs, and QFPs.

HVDIPs

Hermetic vertical DIPs (HVDIPs) are vertically
mounted ceramic modules with pins along both edges for
through-hole mounting (Figure 5b). Components are hermetically encapsulated. Used in both low- and high-pin-count applications, they are especially attractive when
high component density is required on the main board.
As with the plastic VDIP, pins on opposite sides of
the module are aligned, and spacing in both directions is
100 mils.

HDIPs
Hermetic DIP (HDIP) modules have ceramic substrates with the same pin arrangements and footprints as
standard IC DIPs (Figure 6). Hermetic components are
mounted on both sides of the substrate. Hermetic DIP
modules range in size from 24-pin devices with 300-mil
widths to 60-pin, 600-mil devices to 900-mil special
modules.

The QUIP
The quad in-line package (QUIP, Figure 7) is similar
to the DIP except that the QUIP has a dual row of pins
along the package edge. In-row and row-to-row spacing is
100 mils, with pins in adjacent rows aligned directly
across from one another. The QUIP is a low-profile package with excellent mechanical ruggedness and the added
advantage over DIPs of higher pin density for the same
package length.

PGAs and QFPs

Pin grid arrays (PGAs, Figure 7) and quad flat packs
(QFPs, Figure 8) are ceramic-substrate packages similar to
those used for monolithic devices, except that the
modules' cavities house more than one die. Each die is
individually bonded to pads. The customized substrate
provides the die-to-die interconnect and the connection to
the I/O pins.

Figure 6. HDIP

Figure 7. PGA Module

Figure 8. QUIP

The Multichip Family
of Universal JEDEC ZIP/SIMM Modules
This application note describes three Cypress memory
modules, their special features, and how to use the
modules as universal memory building blocks. The three
modules are the CYM1821, CYM1831, and CYM1841,
which provide 512 Kbits, 2 Mbits, or 8 Mbits of static RAM.
The CYM1821, CYM1831, and CYM1841 provide
the versatility to design many different systems with the
same memory modules. The pinout and footprints allow
you to use the same module in 8-, 16-, or 32-bit systems
in ZIP or SIMM form factors using a single board layout.

History
The JEDEC Solid State Products Engineering Council
approved four series of SRAM ZIP/SIMM module pinouts
for balloting [1] in December 1987. The 64-pin
module included definitions for 4 x 16K x 8, 4 x 64K x 8,
and 4 x 256K x 8 generations.
The JEDEC definition established the industry standard for the mechanical specifications and pinouts of the
three generations of modules (Figure 1). The CYM1821,
CYM1831, and CYM1841 follow the JEDEC definition.

Variable Width
The JEDEC pinout definition includes four chip-enable pins that each control a byte-wide block of
memory (Figure 2):
CS1, on pin 32, enables I/O0 through I/O7
CS2, on pin 31, enables I/O8 through I/O15
CS3, on pin 34, enables I/O16 through I/O23
CS4, on pin 33, enables I/O24 through I/O31
You can use each generation as a x32, x16, or x8
memory block by driving the chip enables as address pins
and connecting the I/O pins in parallel. This scheme allows the memory configurations shown in Table 1.

Variable Depth
The three modules provide additional flexibility in
memory depth. All three are 64-pin modules with compatible pinouts. The CYM1821 has four no-connect pins:
29, 30, 35, and 36. Pins 29 and 30 are address pins on the
CYM1831, and pins 35 and 36 are still no-connects. On
the CYM1841, pins 35 and 36 are address pins. This allows the modules to function as memory options in a
design.
The module family's variable depth is enhanced by
the inclusion of two presence-detect pins: PD0 and PD1 on
pins 2 and 3, respectively. These pins provide unique
logic conditions for a system to automatically sense the
amount of memory present, which permits the system to
adapt automatically to the module that is plugged in. The
presence-detect pins are either tied to ground on the
module or left open, according to the information in
Table 2.
Figure 3 shows a simple circuit that decodes the
presence-detect pins and generates depth-indicator status
signals.

Layout Considerations
The three modules are available in either ZIP or
SIMM form factors. Additional versatility is included in
the module footprint to allow ZIP or SIMM modules to fit
into the same board layout. The ZIP pins are arranged in
the same hole pattern as a SIMM socket. If the board
layout fits a SIMM socket, such as the AMP 821825-1, a
ZIP plugs right in. This capability is useful for board
prototyping and testing various memory depths in the
same socket.

Table 1. Memory Configurations

| Module | Word Width x8 | Word Width x16 | Word Width x32 |
|---|---|---|---|
| CYM1821 | 64Kx8 | 32Kx16 | 16Kx32 |
| CYM1831 | 256Kx8 | 128Kx16 | 64Kx32 |
| CYM1841 | 1Mx8 | 512Kx16 | 256Kx32 |

Table 2. Presence-Detect Pins

| Module | PD1 | PD0 |
|---|---|---|
| No Module | OPEN | OPEN |
| CYM1821 | OPEN | GND |
| CYM1831 | GND | OPEN |
| CYM1841 | GND | GND |
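
The presence-detect encoding of Table 2 and the organizations of Table 1 can be sketched together in Python. This is only a software model of what the depth-indicator logic decides; the names `identify`, `configurations`, and `label` are hypothetical, and the "OPEN"/"GND" strings stand in for whatever levels the system's pull-ups and input buffers actually produce.

```python
# (PD1, PD0) -> (module, depth in K words of each byte-wide block), per Table 2.
PRESENCE_DETECT = {
    ("OPEN", "OPEN"): None,                # no module installed
    ("OPEN", "GND"): ("CYM1821", 16),
    ("GND", "OPEN"): ("CYM1831", 64),
    ("GND", "GND"): ("CYM1841", 256),
}


def configurations(depth_k):
    """Depth in K words for x8, x16, and x32 use of four byte-wide blocks.

    The total is depth_k * 32 Kbits; narrower words trade width for depth.
    """
    return {width: depth_k * 32 // width for width in (8, 16, 32)}


def label(depth_k, width):
    """Format an organization the way Table 1 writes it, e.g. 1Mx8."""
    size = f"{depth_k // 1024}M" if depth_k % 1024 == 0 else f"{depth_k}K"
    return f"{size}x{width}"


def identify(pd1, pd0):
    """Return (module, [organizations]) or None when no module is present."""
    entry = PRESENCE_DETECT[(pd1, pd0)]
    if entry is None:
        return None
    module, depth_k = entry
    orgs = [label(d, w) for w, d in configurations(depth_k).items()]
    return module, orgs


for pins in PRESENCE_DETECT:
    print(pins, identify(*pins))
```

For the x8 and x16 organizations, the chip enables are driven from decoded address bits, as the Variable Width section describes; the model above only reports the resulting depth.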
Figure 1. 64-Pin SRAM Module Pinout (4 x 16K x 8, 4 x 64K x 8, and 4 x 256K x 8 generations)

Figure 2. 64-Pin SRAM Module Block Diagram

Figure 3. Depth Indicator Circuit (outputs: NO MODULE, 16Kx32, 64Kx32, 256Kx32)

Reference
1. JEDEC Solid State Products Engineering Council,
Committee Letter Ballot JC-42.3-88-9, 16 January 1988.

Section Contents

ECL and TTL BiCMOS
Noise Considerations in High-Speed Logic Systems
Using ECL in Single +5V TTL Systems
BiCMOS TTL and ECL SRAMs Improve High-Performance Systems
PLCC and CLCC Packaging for High-Speed Parts
A New Generation of BiCMOS High-Speed TTL SRAMs
Access Time vs. Load Capacitance for High-Speed BiCMOS TTL SRAMs
Combining SRAMs Without an External Decoder
BiCMOS TTL SRAMs Improve MIPS R3000 and R3000A Systems
Memory and Support Logic for Next-Generation ECL Systems

Noise Considerations
in High-Speed Logic Systems
This application note explains why ECL is a lower-noise logic family than TTL or CMOS logic, both internally at the circuit level and externally at the system level.
Also presented are the implications of ECL for your
design needs.
In state-of-the-art logic system design, clock frequencies of 50 MHz and beyond are not uncommon and give
rise to many noise problems that were not significant in
the past. Due to the nature of TTL/CMOS logic, operating
at these faster clock rates is inherently noisier and requires
high-power output drivers with their associated ground-bounce problems.
Fortunately, ECL solves these problems. It is built for
speed and is available in a low-power BiCMOS process
technology. Since ECL was designed for high-speed applications in 1962, a number of design iterations have improved ECL devices. Consequently, it is the premier high-speed logic family.

ECL's Internal Advantages
Internally, ECL steers current and compares input
signals to a voltage level instead of switching transistors
on and off over a wide voltage excursion, as do other
logic families. ECL's small voltage swings and low-current switching in signal paths minimize crosstalk and
noise generation (Figures 1 and 2). ECL generates less
noise when switching logic levels due to the smaller dV/dt in
the I = C dV/dt equation, where C is the coupling
capacitance between signal paths, I is crosstalk current, dt
is the rise/fall time, and dV is logic swing.

Additionally, built-in temperature and voltage compensation provides constant noise immunity in 100K
devices, so that noise margins are flat. In these devices,
temperature compensation is designed into the DC input
thresholds by voltage regulation. A correction factor
designed into the current source, along with added circuitry between the output transistors' bases, makes 100K
ECL's output voltage levels insensitive to temperature.
These corrections rely on opposing positive or negative
temperature-tracking-coefficient circuits. In both 100K
and 10KH ECL devices, voltage compensation is done by
regulating an internal reference voltage, supplying a constant current source, and making both functions independent of supply voltage. These compensations result in
a 3x improvement over TTL noise immunity.
Additional anti-noise features include differential
pairs, which prevent large current spikes when switching
logic states, provide clean power supplies, and reduce
ground bounce. Differential paths also cancel internal
parasitic charging currents.
Finally, ECL's more constant power dissipation, which is independent of operating frequency, keeps power-supply
surges to a minimum. Supply current drain is governed by
the constant-current sources that provide operating current
for the differential switches and level-shifting networks.
Thus, ECL's current drain remains the same regardless of
the state of the switches. The high ratio of ECL noise immunity to internally generated noise also contributes significantly to reliable system operation.
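
The crosstalk relation I = C dV/dt given in this section can be put into numbers with a short sketch. The 0.8-V ECL swing corresponds to the -0.9V and -1.7V levels shown in Figure 1, and 3.5 V is a representative TTL swing; the 2-pF coupling capacitance and both edge times are hypothetical values chosen only for illustration.

```python
def crosstalk_current_a(coupling_c_f, swing_v, edge_time_s):
    """I = C * dV/dt for a linear edge of swing_v volts lasting edge_time_s."""
    return coupling_c_f * swing_v / edge_time_s


# Hypothetical coupling capacitance and edge times (not from the text).
ASSUMED_COUPLING_F = 2e-12
ASSUMED_TTL_EDGE_S = 2e-9
ASSUMED_ECL_EDGE_S = 1e-9

i_ttl = crosstalk_current_a(ASSUMED_COUPLING_F, 3.5, ASSUMED_TTL_EDGE_S)
i_ecl = crosstalk_current_a(ASSUMED_COUPLING_F, 0.8, ASSUMED_ECL_EDGE_S)

print(f"TTL crosstalk current: {i_ttl * 1e3:.2f} mA")
print(f"ECL crosstalk current: {i_ecl * 1e3:.2f} mA")
```

With these assumptions the ECL edge injects less current even though it is twice as fast, because the swing is more than four times smaller. A different choice of edge times would change the ratio, so the comparison should be repeated with data-sheet values for the parts in question.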

Figure 1. Effects of Rise and Fall Time (I = C dV/dt; crosstalk current I is less for ECL than TTL due to smaller dV and dt)

Figure 2. Effects of Slew Rate (rise/fall time = dt, logic swing = dV, slew rate = dV/dt; time is saved because the logic swings are smaller and rise/fall time faster)
The smaller transitions also prevent the emitter-follower outputs from generating large current spikes when
switching logic states, unlike TTL totem-pole outputs.
TTL current spikes are also related to I = C dV/dt. For
ECL, C is the capacitive load. TTL ground bounce results
from the current spikes and the inductance (L) between
the board and the device's pins and bond wires. The
bounce voltage (V) equals -L di/dt, which can be severe
(see the Reference) and can cause the chip's ground to
rise. Because ECL's emitter followers provide superior
output current and the lower capacitance of characteristic-impedance transmission lines, ECL solves the problem of
power-supply droop and spikes when a large number of
transistors change state.
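
The bounce relation V = -L di/dt can be evaluated the same way. Every number below (ground-path inductance, per-output current step, number of outputs, and edge time) is a hypothetical value picked for illustration; none is a measurement from the text or the Reference.

```python
def ground_bounce_v(inductance_h, delta_i_a, delta_t_s):
    """V = -L * di/dt for a linear current ramp of delta_i_a over delta_t_s."""
    return -inductance_h * delta_i_a / delta_t_s


# Hypothetical case: eight outputs switching together through one ground pin.
ASSUMED_GROUND_INDUCTANCE_H = 10e-9   # pin plus bond wire (assumed)
ASSUMED_STEP_PER_OUTPUT_A = 20e-3     # current step per output (assumed)
ASSUMED_OUTPUTS = 8
ASSUMED_EDGE_S = 2e-9

total_step_a = ASSUMED_STEP_PER_OUTPUT_A * ASSUMED_OUTPUTS
bounce = ground_bounce_v(ASSUMED_GROUND_INDUCTANCE_H, total_step_a, ASSUMED_EDGE_S)
print(f"Ground bounce: {bounce * 1e3:.0f} mV")
```

Because the bounce scales with the number of outputs that switch simultaneously, the same function shows why wide buses are the worst case and why additional ground pins, which lower the effective inductance, help.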

ECL in the System Environment
In the ECL system environment, low-impedance
open-emitter outputs and high current capacity allow you
to use board-level transmission-line techniques that reduce
reflections and decrease roll-off of high-speed rise and fall
times. To understand these system-level advantages, consider that voltage-mode circuits have a High-state output
impedance between 50 and 150 Ω and exhibit a stepped output characteristic. They first reach 50 percent of the
final value, then later reach the final value, which can be
3.5V and above.
In contrast, ECL output impedances are less than 10 Ω
and ensure a full-valued signal into transmission lines.
The signal only needs to be 800 mV. Outputs are also
capable of supplying 50 mA, which is required to drive
passive terminations. Because ECL gives you the built-in
ability to drive controlled-impedance PCB traces, you can
make tradeoffs among power dissipation, speed considerations, and PCB trace width.
Some ECL devices have skew-free differential or
complementary outputs for common-mode noise rejection
at the receiving end of either board traces or twisted-pair
wire. As mentioned earlier, ECL's smaller logic transitions lower crosstalk between board-level signal traces, as
well as at the IC level.

Logic Family of Choice
The factors described here make ECL the logic family of choice when designing systems at 50 MHz or greater
clock/data rates. As a percentage of total logic swing,
ECL provides superior noise margin in the system environment compared to both TTL and CMOS logic.
In a typical TTL/CMOS system, board-level noise
can be 800 mV or higher due to ground bounce and other
switching noise. The Reference explains this effect for
both CMOS and TTL and includes actual measurements
of ACL (CMOS) devices. In an ECL system, the noise is
closer to 20 mV. As shown in Table 1 and Figure 3,
ECL's overall system or board-level noise is at a much
lower percentage of total signal swing than in TTL/CMOS
systems. ECL is therefore less susceptible to logic errors
at high speeds (Figures 4 and 5).
ECL's full temperature and voltage compensation
results in relatively constant signal levels and thresholds;
improved noise margins over chip-to-chip temperature and
voltage variations; and a tighter AC window in the system
environment. Another benefit of reduced noise generation
is the improvement in electromagnetic interference (EMI)
and radio-frequency interference (RFI) in ECL systems
versus TTL and CMOS.

Reference
"EDN's advanced CMOS logic ground-bounce tests,"
David Shear, EDN, March 2, 1989.

Figure 3. Identifying Specification Limits on Input and Output Voltage Levels
Note: VNH and VNL are the High- and Low-level device noise margins. Because ECL system noise is much lower than TTL
system noise, the smaller ECL device noise margins are still better than the TTL margins.

Figure 4. ECL Signal (200 mV/div, 5 ns/div)

Figure 5. TTL Signal (600 mV/div, 5 ns/div)

Table 1. System Noise Generated by Logic

| Parameter | TTL | CMOS | 100K | 10K |
|---|---|---|---|---|
| HIGH "1" (VNH) (mV) | 400 | 400 | 140 | 145 |
| LOW "0" (VNL) (mV) | 400 | 400 | 145 | 175 |
| Typ. System Noise (mV) | 800 | 800 | 20 | 20 |
| Logic Swing | 3.5V | 5V | 900 mV | 900 mV |
| Percent Noise (1) | 22.8% | 16% | 2.2% | 2.2% |

(1) Percent Noise = (system noise/logic swing) x 100
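
The Percent Noise row of Table 1 follows directly from its footnote, and the comparison of noise margin against system noise is what the note to Figure 3 describes. The sketch below reproduces both using the table's values; the dictionary layout and function names are hypothetical, and the ECL entry uses the 100K noise margins.

```python
def percent_noise(system_noise_mv, logic_swing_mv):
    """Percent Noise = (system noise / logic swing) x 100."""
    return system_noise_mv / logic_swing_mv * 100.0


def margin_exceeds_noise(noise_margin_mv, system_noise_mv):
    """True when the device noise margin is larger than typical system noise."""
    return noise_margin_mv > system_noise_mv


# family: (typical system noise mV, logic swing mV, smaller noise margin mV)
FAMILIES = {
    "TTL": (800, 3500, 400),
    "CMOS": (800, 5000, 400),
    "ECL 100K": (20, 900, 140),
}

for name, (noise, swing, margin) in FAMILIES.items():
    pct = percent_noise(noise, swing)
    ok = margin_exceeds_noise(margin, noise)
    print(f"{name}: {pct:.2f}% of swing, margin covers noise: {ok}")
```

The TTL result works out to about 22.86 percent, which Table 1 lists as 22.8%. Only the ECL entry has a noise margin larger than its typical system noise.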


Using ECL in Single +5V TTL Systems

The advent of very high speed, low-power, ECL-compatible BiCMOS SRAMs and PLDs is causing an
evolution in high-performance systems. ECL's inherent
speed and noise improvement is well documented, but
questions and misconceptions concerning the devices
might occur. These questions stem from the fundamental
problems of mixing CMOS logic and bipolar ECL circuits
on the same die and from interfacing ECL devices in
single +5V supply CMOS/TTL systems.

Chip-Level Considerations
At the chip level, it is possible to integrate both ECL
and CMOS logic with negligible noise coupling. This
compatibility is mainly due to the absence of noisy high-drive output devices between the device's CMOS sections
and the ECL I/Os.
The combined ECL/CMOS chips exhibit very low interconnect capacitance between devices on-chip, and the
drive requirements are minimal. The devices generate less
noise than occurs between devices at the board level. The
noise magnitude on the chip VEE line equals approximately 20 mV worst case, in contrast to 500 mV of noise in
typical high-speed, board-level CMOS/TTL designs.
Further, the unique configuration that Cypress Semiconductor employs to connect the device ECL circuit
ground (VCC) and ECL output ground (VCCA) reduces
noise coupling between the internal CMOS circuitry and
the ECL output drivers. Because the devices have a low
overall noise level and employ internal supply decoupling,
both the ECL and CMOS sections of Cypress devices run
successfully on the same power pin.

Board-Level Considerations
At the board level, using ECL-I/O-type devices in
single +5V TTL systems is possible with off-the-shelf
level translators. These translators are made specifically to
run standard ECL devices in a pseudo-ECL logic mode,
with switching levels pulled up to the range between the
+5V supply and ground.
ECL normally uses a -5.2V supply for 10K- and
10KH-compatible devices or a -4.5V supply for 100K-compatible devices. Pulling ECL circuits and memories up
to a single positive 5V level instead of using the normal
supply does not change any performance or absolute-value
logic levels so long as all the ECL device VCC pins are
tied to +5V, and the device VEE pins are tied to ground.
The translators have separate supply pins and either
separate or common ground pins for the circuit's ECL and
TTL portions. This feature isolates the noisy TTL supplies
from the ECL section, which runs at much faster speeds
and with tighter noise margins.

ECL-TTL-ECL Translation
The Brooktree Bt501 (10KH ECL compatible) and
Bt502 (100K compatible) octal transceivers and translators perform bidirectional ECL/TTL transfers. These
devices offer the option of supplying +5V only to the
circuit's ECL portion (Figure 1). This arrangement makes
it possible to design the system with only one power
source and simplifies the task of adding ECL circuitry to a
TTL board.
You can isolate the ECL section from the TTL section in much the same way you isolate analog and digital
sections on a mixed-signal board. To isolate the TTL-generated noise from the ECL +5V supply lines, you must
maintain separate ECL and TTL power lines; you can

have common or separate ground planes. Employ normal
power-supply decoupling for ECL devices.
The Brooktree devices have the advantage of providing eight sets of transceivers for both translation directions
in one IC package. A disadvantage is speed. The
Bt501/502 devices have maximum propagation delays of
7 ns when translating from TTL to ECL and 11 ns in the
other direction. In some applications this might be too
slow.
For a faster set of translators that run on single 5V
supplies, try the Motorola MC10H350 and MC10H351
(Figure 2). The MC10H350 only translates in the ECL-to-TTL direction, but it is faster than the Brooktree parts,
with a 5-ns maximum propagation delay, and includes differential inputs. The MC10H351 is the TTL-to-ECL converter, offering a maximum delay of 2.1 ns. These devices
have separate ECL and TTL supply pins, but have common ground-pin connections.

Figure 1. Bt501/502 TTL/ECL Transceiver

Figure 2. ECL to TTL (a, MC10H350), TTL/NMOS to ECL (b, MC10H351)

Another method of translation is to use all discrete
components or a combination of discrete and integrated
products. The purely discrete approach speeds up the
translation but introduces the risk that noise from the
TTL-to-ECL sections might feed through the power and
ground connections. You also have to consider the lack of
temperature and/or voltage compensation, which affects
noise margins.
For translating TTL signals to ECL, use a simple
voltage divider network, whose primary purpose is to
reduce the TTL levels to ECL-level logic transitions (Figure 3). In the other direction, a high-speed PNP transistor
increases the logic swing to accommodate TTL-logic-level
transitions (Figure 4).
A faster approach appears in Figure 5, where a differential pair consisting of two PNP transistors takes advantage of the ECL differential outputs. The choice of
transistors greatly influences the propagation delay
through these translators. Motorola manufactures some
very fast RF-type PNP transistors and matched PNP pairs
that can serve well in the circuits shown.

Figure 3. TTL to ECL

Figure 4. ECL to TTL

Figure 5. ECL to TTL
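
To see what the quoted maximum propagation delays mean for a signal that leaves the TTL domain, passes through ECL logic, and returns, the sketch below adds the two translation delays and compares the total with a clock period. The delay figures are the maximums quoted above for the Brooktree and Motorola parts; the function names, the 50-MHz clock, and the 5-ns ECL logic delay are hypothetical choices for illustration, and setup time, skew, and trace delay are ignored.

```python
# Maximum propagation delays in nanoseconds, as quoted in the text.
TRANSLATORS = {
    "Bt501/502": {"ttl_to_ecl_ns": 7.0, "ecl_to_ttl_ns": 11.0},
    "MC10H351 + MC10H350": {"ttl_to_ecl_ns": 2.1, "ecl_to_ttl_ns": 5.0},
}


def round_trip_ns(name):
    """TTL-to-ECL plus ECL-to-TTL translation delay for one translator set."""
    d = TRANSLATORS[name]
    return d["ttl_to_ecl_ns"] + d["ecl_to_ttl_ns"]


def fits_in_cycle(name, clock_mhz, ecl_logic_delay_ns=0.0):
    """True if both translations plus the ECL logic delay fit in one clock period."""
    period_ns = 1000.0 / clock_mhz
    return round_trip_ns(name) + ecl_logic_delay_ns <= period_ns


for name in TRANSLATORS:
    total = round_trip_ns(name)
    ok = fits_in_cycle(name, clock_mhz=50.0, ecl_logic_delay_ns=5.0)
    print(f"{name}: {total:.1f} ns round trip, fits 50-MHz cycle with 5 ns of logic: {ok}")
```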

CY10E383/101E383 Full-Duplex Translator

For the ultimate in speed and flexibility, the Cypress CY10E383/CY101E383 is a new-generation, full-duplex, TTL-to-ECL and ECL-to-TTL logic-level translator designed for high-performance systems (Figure 6). The CY10E383/CY101E383 has many features to satisfy a variety of applications.
In the past, level translators suffered from having an insufficient number of channels or supply options. This caused skew and noise problems that made the use of high-speed ECL logic levels in TTL systems highly undesirable. The CY10E383/CY101E383 contains ten independent TTL-to-ECL translators and ten independent ECL-to-TTL translators for high-speed, bidirectional, full-duplex data-transmission, mixed-logic, and bus applications. The CY10E383/CY101E383 is especially well suited to driving ECL backplanes between TTL system boards.
The translator is implemented with differential ECL I/O to provide balanced, low-noise operation over controlled-impedance buses between TTL and/or ECL subsystems. The part features a delay of only 2 ns max from TTL to ECL and 3 ns max from ECL to TTL, with minimum skew between channels.
The CY10E383/CY101E383 comes equipped with internal 2-kΩ pull-down resistors tied to VEE (ECL supply) to decrease the number of external components. For system testing purposes or for driving light differential loads, the pull-down resistors are the only termination, thus eliminating up to 20 external resistors. You can also use standard ECL terminations with the internal pull-down resistors and still adhere to standard 10K/10KH and 100K logic levels. Additionally, the translator contains an ECL VBB reference voltage output, which you can use to tie half of the ECL inputs for single-ended operation.
The device is designed with ample ground pins to reduce bounce and has separate ECL and TTL power/ground pins to reduce noise coupling between logic families. The CY10E383/CY101E383 can operate in single or dual supply configurations while maintaining absolute 10K/10KH and 100K level swings, so it can be used with either TTL-type (+5V) or ECL-type (-5.2V) supplies or both.
The translators are offered in standard 10K/10KH (10E) and 100K (101E, 100K levels with up to -5.2V power supply) ECL-compatible versions. The TTL I/O is fully TTL compatible. The CY10E383/CY101E383 is packaged in an 84-pin, surface-mountable PLCC.

Figure 6. CY10E383 Full-Duplex TTL/ECL/TTL Translator

Reference
Blood, William R., Jr., "Motorola MECL System
Design Handbook," (Motorola Semiconductor Products
Inc., Fourth Edition, 1988.)


BiCMOS TTL and ECL SRAMs
Improve High-Performance Systems
A new BiCMOS process based on clean-slate approaches to implementing ECL or TTL logic with bipolar,
BiCMOS, and CMOS transistors in single devices is
revolutionizing the speed/density characteristics of
SRAMs. Historically, BiCMOS technologies were
developed as either CMOS speed enhancers or bipolar
power misers. The resulting BiCMOS processes were
patches on either CMOS or bipolar process flows, and
performance for the complementary bipolar or MOSFET
components was sub-optimal.
In contrast, Cypress's STAR M2 is a third-generation, 0.8µ BiCMOS technology in which the baseline
process is BiCMOS. In the STAR process, nonvolatile
elements such as polysilicon loads and TiW fuses are easily incorporated into the baseline process. This results in
high-density SRAMs, high-speed PLDs, and high-density
EPROMs/PLDs.
Figure 1 shows a simplified cross section of the
STAR M2 BiCMOS process. This 18-mask, double-poly,
double-metal technology utilizes a thin epitaxial layer to
achieve NPN Ft greater than 10 GHz and CMOS latch-up
immunity. The MOSFETs both use lightly doped drains
for high performance and reliability.
In contrast to the architectures of SRAMs made using
first-generation BiCMOS processes, STAR's polysilicon
bipolar emitter is the same poly used for MOS gates. This
enhances NPN performance and decouples the NPN from
the poly load module used for 4T SRAM cells. By utilizing this poly load resistor, STAR allows for an 85-square-micron memory cell. This third-generation process beats
second-generation BiCMOS technologies in terms of
product performance, density, and manufacturability.
SRAMs are a key technology driver, and BiCMOS
fills the gap between the power-hungry pure bipolar ECL
and the very high density, medium-speed CMOS. To indicate the performance of the Cypress process, Table 1 summarizes gate delays as a function of logic family and
fanout.

Table 1. Gate Delays
Gate       Tpd (ps), Fanout = 1    ps/Fanout
CMOS       110                     70
BiCMOS     240                     12
ECL        95                      30*
*ECL delay varies with the square root of fanout

Figure 1. STAR M2 BiCMOS Process, Simplified

Read Cycle No. 1 and Write Cycle No. 1 (timing diagrams)

Figure 5. Waveform Synthesis System
ECL I/Os. Interconnect capacitance between devices on
the chip is very low, and drive requirements are minimal.
Consequently, noise is not generated at the high levels encountered between discrete devices installed on a board.
The noise magnitude on the chip VCC line is approximately 20 mV worst case, rather than the 800 mV encountered
in typical high-speed, board-level CMOS/TTL designs.
With a low overall noise level and internal supply decoupling, both the ECL and CMOS sections of Cypress
devices run successfully on the same power pin.
Cypress employs a unique configuration to connect
the device ECL circuit ground (VCC) and ECL output
ground (VCCA). This configuration further reduces noise
coupling between the internal CMOS circuitry and the
ECL output drivers. The configuration also inhibits output
oscillation in response to slow or noisy input signals.

BiCMOS ECL and TTL SRAM Applications
Applications for ECL and TTL SRAMs include graphics and image processing, waveform generation via direct digital synthesis (DDS), and fast µP systems. In video graphics, ECL memory stores color image information. In waveform generation and DDS, ECL memory stores digital representations of analog waveforms before they are fed to a digital-to-analog converter (DAC).
In a typical raster-graphics video system (Figure 4), 3.5-ns CY100E422 ECL SRAMs are used as color lookup tables (LUTs) to drive a Brooktree Bt108 video DAC. The SRAMs are interleaved to achieve the necessary speed and to supply the 8-bit words required for 3D solids shading. Motorola MC100E155s, which have clock-to-output delays of 1 ns, are used as 2:1 mux latches.
In operation, read and write addresses and data are fed to the SRAMs from the octal 2:1 multiplexer/latches, and the color pixel data from the memories is sent to the DAC. This path is one of three in which the DAC drives the intensity of the display's red, green, or blue (RGB) electron gun drivers. This system's 360-MHz speeds are sufficient to drive 2K x 2K displays.
The waveform synthesis system in Figure 5 can be controlled by either a microprocessor or a numerically controlled oscillator (NCO). Another part of the system writes waveform data to memory. Then the processor commands an address sequencer, whose output controls the memory, and the data read out is fed to the DAC, which outputs an analog waveform. This type of fast digital waveform synthesis finds many applications in satellite communications and video and test equipment.
The 8-, 10-, and 12-ns speeds of the TTL 16K x 4 CY7B166 SRAM have improved the throughput of such systems. The system could also use ECL BiCMOS for increased speed, but the resolution of available high-speed ECL DACs is not as high as that of available TTL DACs.
For analog-to-digital applications, ECL and TTL SRAMs are used with high-speed flash A/D converters. Some converters have ECL outputs, whose clock rates range from 20 MHz to 1 GHz. Other converters have TTL outputs as fast as 25 MHz. In applications such as HDTV, phased-array radar, digital oscilloscopes, and single-event digitizers, the SRAMs create high-speed specialty memories such as self-timed SRAM, pipelined SRAM, and interleaved SRAM.
Further applications for ECL and TTL SRAMs are found in high-performance workstations, file servers, and high-end embedded controllers. Figure 6 shows an example based on Mips Computer's 100K ECL version of a commercial RISC microprocessor, which has a clock frequency of 67 MHz and is rated in a general-purpose application mix at 55 MIPS. The cache and TAG blocks are implemented using ECL BiCMOS SRAMs.

Figure 6. RISC System
The system uses standard memories to provide two levels of data cache. The primary caches include 64 Kbytes of storage for instructions and 16 Kbytes for data, using the fastest 64-Kbit, 8-ns, CY100E494 SRAMs. Cache control is part of the integer unit. With primary 8-ns caching, the R6000 CPU can fetch both an instruction and a data word every cycle, instead of having to wait several cycles for main memory to keep pace. The slower 512-Kbyte secondary cache is made up of 20-ns devices.
A general-purpose cache-TAG implementation using standard ECL memories (Figure 7) uses two Cypress 1K x 4 CY100E474 ECL SRAMs (3.5 ns max access time). Two Motorola MC100E107 quint XOR/XNOR gates (800 ps max prop delay D to F) perform the compare function. The speed of the SRAMs and logic corresponds to a 4.5-ns address-to-match comparison time. Note that the outputs of ECL PLDs or logic are wire-ORed to save one additional component.
Alternatively, one CY100E302 (16P4) ECL PLD could be programmed to implement the 8-bit compare function in approximately 3 ns and save board space. Other memory sizes (e.g., 16K x 4) could be used to increase depth, and word width could be optimized by cascading devices.
Figure 8 shows the critical path for a TTL 80386-based cache system with a two-phase clock. The path consists of a DRAM controller implemented in a gate array, address generation configured in PLDs, cache SRAM, and cache TAG. Table 2 shows how the speed of cache TAG and cache RAM affects path speeds.
Table 2. Path Speeds for 80386 Cache
                      Bus Cycle Time (MHz)
Device                33      40      50
Gate array            17.5    15.0    12.0
TTL PLDs              7.5     7.0     5.0
Cache RAM enable      15.0    12.0    10.0
Cache TAG             20.0    15.0    13.0
Total                 60.0    50.0    40.0
+ 2-phase clock       30.0    25.0    20.0

As bus cycle times decrease because of faster µP clocks, the speed of the cache TAG and cache RAM becomes very important in achieving path speeds. For today's TTL µPs, BiCMOS TTL SRAM implementations reduce access times to meet cycle time requirements. An example is the Cypress CY7B160 16K x 4 TTL BiCMOS SRAM. It is an 8-, 10-, and 12-ns device with internal decoding to enable easy memory expansion without sacrificing speed. Figure 9 shows how to use four devices to create an 8-, 10-, or 12-ns 64K x 4 memory and a 32K x 8 memory. Additional configurations are also easily implemented using no external decoding.

Figure 9. Example Memory Configurations (RAM configured with four CY7B160s, with truth tables)

PLCC and CLCC Packaging
for High-Speed Parts
The semiconductor industry is constantly searching
for package options that enhance the capabilities of high-performance devices. For fast device performance with
minimal ground bounce, electrical characteristics must include low inductance and capacitance from external pin to
die bond-wire pad. A package should also furnish good
thermal characteristics for reliability over extended
temperature ranges.
Other major properties sought after are low cost, as
well as standardized outline/pin configurations for compatibility, ease of manufacturing, and handling throughput.
The package must also work with surface mount technology and have a small footprint to save board space.
The package that best meets all these requirements is
the PLCC (plastic leaded chip carrier). In the past, utilization of PLCCs was not practical for high-power, bipolar
devices. However, the advent of low-power bipolar and
BiCMOS ECL-compatible SRAMs and PLDs now
provides the opportunity for high-volume usage. As
manufacturers switch from bipolar to BiCMOS, the lower
power dissipation of high-density ECL SRAMs and complex PLDs promises to give PLCC packages a bright future. For military applications and extended temperature
environments or for devices with higher power dissipation, you can substitute the CLCC (ceramic leaded chip
carrier).
The PLCC has many desirable qualities:
• Suitable for surface mounting with J-type leads
• Small footprint to save board space
• Low inductance and capacitance for high speed with little ground bounce
• Good thermal characteristics for reliability over temperature range
• Ease of manufacturing and handling for production throughput
• Low cost compared to CERDIP, flatpack, LCC
• Standard package outline and pin-configuration compatibility
The PLCC's J-type surface-mount leads have the advantage over gull-wing leads, which are susceptible to fatigue. J leads also enhance handling ease in test and burn-in fixtures. The PLCC's 1-pF capacitance compares favorably with the 3 and 6 pF for plastic DIPs and CERDIPs, and inductance is equally impressive: 2 nH versus 6 and 11 nH for plastic DIP and CERDIP. Unlike flatpacks, PLCCs are available in standard tooling. PLCCs come in a variety of pin configurations, from 18 to over 200 pins, versus a maximum of 40 pins for plastic DIPs.

The Ceramic Leaded Chip Carrier
For high-temperature environments and high-power
devices, you can make use of the ceramic leaded chip carrier (CLCC, Y package), which can also be surface
mounted. The Y package has the same footprint and J
leads as the PLCC (Figure 1) and works well for the
faster PLDs and SRAMs.
If you do not know system temperature in the early
stages of a design, you can substitute the Y package for
the PLCC and vice versa, so long as the device's die junction temperature does not exceed 150°C. The Y package is
slightly more expensive than the PLCC, but with a thermal resistance from junction to ambient (θJA) of 35°C/W
at 500 LFPM, the Y package can dissipate heat more efficiently.

Reliability
Cypress's bipolar and BiCMOS products in PLCC
and CLCC packages go through extensive burn-in and
testing at elevated temperature to guarantee package integrity. Cypress strongly recommends 500-LFPM system
forced air flow but guarantees reliability in systems with
or without the flow if the ambient air does not cause the
junction temperature (TJ) to exceed 150°C.
The PLCC's θJA is approximately 45°C/W. The SRAMs have power dissipation that ranges from 780 mW max for the CY100E422L-5 up to 1097 mW max for the CY10E474L-5. This dissipation results in junction temperature rises from 35 to 49°C. The 16P4-type PLD (CY100E302L-6) has a temperature rise of 39°C, and the 16P8-type PLD (CY10E301L-6) has a temperature rise of 47°C. The CLCC package's θJA equals 35°C/W for temperature rises of up to 55°C (CY10E474-3).

Figure 1. Diagrams of 28-Lead Chip Carriers (28-Lead Plastic Leaded Chip Carrier J64 and 28-Pin Ceramic Leaded Chip Carrier Y64; dimensions in inches)

The graphs are based on a linear method of interpreting the failures observed at burn-in and indicate the long-term reliability of Cypress devices. You can use the graphs to determine MTBF and FITs for any Cypress device in any package after calculating the appropriate ΔT.

Finding Chip-Level Junction Temperature

The following relationship determines chip-level
junction temperature for the PLCC package:
TJ = ΔT + TA
where
ΔT = PD x θJA
and
θJA = θJC + θCS + θSA
To calculate worst-case junction temperature (TJ), use maximum supply VEE and IEE for power dissipation and maximum TA for the temperature range of interest. For the 10K/10KH CY10E301L in a PLCC, for example, device IEE = 170 mA max and VEE = 5.46V max for PD = 928 mW. Add 15 mW per output for a total output PD = 120 mW. Therefore, the total PD = 1048 mW.
For a PLCC, θJA = 45°C/W at 500 LFPM, and θJA = 64°C/W for still air.
For a CLCC, θJA = 35°C/W at 500 LFPM, and θJA = 54°C/W for still air.
Because
TJ = total PD x θJA + TA
and
TA = 75°C worst-case commercial temperature range, for the PLCC:
TJ = (1.048 W)(45°C/W) + 75°C = 122°C at 500 LFPM
TJ = (1.048 W)(64°C/W) + 75°C = 142°C in still air
This calculation is for absolute worst-case data sheet conditions. The burn-in temperature used by Cypress (TJ) is much higher than the device will ever see in a system. Note that most systems will not run at worst case due to guard-banding. For this reason, use VEENOM = 5.2V or 4.5V and IEENOM = (IEEMAX)(85%) for nominal-condition calculations.
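The worst-case calculation above can be checked with a short script. This is only a sketch: the function name is illustrative, and the count of eight outputs is inferred from the quoted 120 mW total at 15 mW per output rather than stated in the text.

```python
def junction_temp_c(power_w, theta_ja_c_per_w, ambient_c):
    """TJ = PD x thetaJA + TA, with temperatures in degrees C."""
    return power_w * theta_ja_c_per_w + ambient_c

# Worst-case CY10E301L figures quoted in the text
device_power_w = 0.170 * 5.46   # IEE max x VEE max, about 0.928 W
output_power_w = 8 * 0.015      # assumed eight outputs at 15 mW each (120 mW total)
total_power_w = device_power_w + output_power_w

# PLCC thermal resistance: 45 C/W at 500 LFPM, 64 C/W in still air; TA = 75 C
tj_forced_air = junction_temp_c(total_power_w, 45.0, 75.0)
tj_still_air = junction_temp_c(total_power_w, 64.0, 75.0)

print(round(tj_forced_air), round(tj_still_air))
```

Substituting the CLCC values (35°C/W and 54°C/W) or nominal supply figures into the same function gives the corresponding estimates for other operating conditions.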

The X-axis on the graphs indicates junction temperature. These values are determined by adding the ΔT to ambient temperature, as described earlier. As an example, Figures 2 and 3 note the following critical points for a CY10E301L ECL PLD under three different operating conditions:
Point A - 10K/10KH typical data sheet conditions: 25°C ambient, nominal VEE and IEE, 50Ω loads, 500 LFPM air flow, TJ = 64°C, FITs = 7, MTBF = 18,000 yrs.
Point B - 10K/10KH typical operating conditions: 55°C ambient, nominal VEE and IEE, 50Ω loads, 500 LFPM air flow, TJ = 94°C, FITs = 45, MTBF = 2800 yrs.
Point C - 10K/10KH absolute worst-case conditions: 75°C ambient, 5.46 V max and 170 mA max, 50Ω loads, 500 LFPM air flow, TJ = 122°C, FITs = 225, MTBF = 525 yrs.
The activation energy used for the MTBF and FITs information is 0.7 eV. This is an average number for die-surface-related defects, such as metal and oxide pinholes, etc., but is very conservative for silicon defects or mechanical interfaces to packages. For those mechanisms, the number is usually 1.0 eV. A small change here results in a significant change in MTBF or FITs. A change to 0.8 eV equates to a 33% reduction in FITs rate or a 50% increase in MTBF.
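To see how an activation energy turns a junction temperature change into a change in failure rate, the sketch below uses the common Arrhenius acceleration model. The text does not state which model lies behind its graphs, so treat the model as an assumption. The FITs-to-MTBF conversion uses the standard definition of one FIT as one failure per 10^9 device-hours. The function names are illustrative.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_factor(tj_low_c, tj_high_c, ea_ev=0.7):
    """Ratio of failure rate at tj_high_c to that at tj_low_c (Arrhenius model, assumed)."""
    t_low_k = tj_low_c + 273.15
    t_high_k = tj_high_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_low_k - 1.0 / t_high_k))

def mtbf_years(fits):
    """Convert FITs (failures per 1e9 device-hours) to MTBF in years."""
    hours = 1e9 / fits
    return hours / 8760.0

# Points A and B from the text: TJ = 64 C (7 FITs) and TJ = 94 C (45 FITs)
factor_a_to_b = arrhenius_factor(64.0, 94.0)
print(f"Acceleration 64 C -> 94 C: {factor_a_to_b:.1f}x")
print(f"MTBF at 225 FITs: {mtbf_years(225):.0f} years")
```

The model gives roughly a sevenfold increase between Points A and B, in line with the quoted 7 and 45 FITs. The MTBF figures it produces are close to, but not identical with, the values read from the graphs.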

The Packages of Choice
The PLCC and CLCC are accepted as the packages
of choice by many manufacturers of high-speed devices.
Motorola Semiconductor uses the PLCC as the only package for the company's very high speed ECLINPS ECL
logic family, which stands for "ECL in picoseconds" and
is pronounced "eclipse." This family has set-up times and
propagation delays in the sub-nanosecond range, with
power dissipation of over 1W. Fully compatible with
Cypress SRAMs and PLDs, the ECLINPS family includes
many 10K, 10KH, and 100K standard logic gates, building blocks, and transceivers.

Real-World Values
Obviously, most systems do not operate at the worst-case conditions. Therefore, Figures 2 through 5 show
graphs over different operating conditions to determine
failures in time (FITs) and mean time between failure
(MTBF) for a typical system or in a worst-case scenario.

Figure 2. Failures in Time vs Junction Temperature (ECL PLD; FITs vs junction temperature in deg C)

Figure 3. Mean Time Between Failures vs Junction Temp. (ECL PLD; MTBF in years vs junction temperature in deg C)

Figure 4. Failures in Time vs Junction Temperature (ECL SRAM; FITs vs junction temperature in deg C)

Figure 5. Mean Time Between Failure vs Junction Temp. (ECL SRAM; MTBF in years vs junction temperature in deg C)

A New Generation of BiCMOS
High-Speed TTL SRAMs
This application note profiles the Cypress CY7B166 family of TTL-I/O 64K SRAMs, which are ushering in a new era of high-performance memory devices. These are the world's fastest BiCMOS RAMs, with address access times as low as 8 ns. Arranged in 16Kx4 and 8Kx8 architectures, the devices are functionally equivalent to their industry-standard, TTL-compatible, CMOS counterparts; there is no difference in I/O logic-level min/max specifications. In addition, on-chip features provide superior ground-bounce characteristics and faster propagation delays than is possible with rail-to-rail output swings.

BiCMOS Technology
BiCMOS technology employs CMOS inputs for compatibility with existing products and bipolar on-chip bus interconnects and sense amplifiers to speed the internal access timing. The resulting throughput improvement allows more time for the outputs to slew load capacitance. Further, BiCMOS uses both CMOS and bipolar transistors on the outputs to optimize drive capability.

Figure 1. 64K TTL SRAM Circuit Technology

Figure 2. CY7C166C BiCMOS Family I/O Architecture (TTL CMOS input, TTL BiCMOS output)
Figure 1 shows how the parts of the memories are partitioned by technology. On the outputs, two bipolar transistors drop two Vbe levels (approximately 1.6 V) to reduce the High-level output swing. One device is tied base to collector as a diode, and the other is the High-level drive transistor. Both transistors cause the output to conform to standard TTL logic levels (not CMOS rail-to-rail). This output structure appears in Figure 2. The diode is the bipolar transistor Q3, and Q2 is the High-level drive transistor. M18 is an output Low-level pull-down MOSFET (n-type). Keeping the output from swinging to the power supply rail saves time when changing states, as shown in Figure 3.
Figure 2 also shows the SRAM's input structure. The CMOS devices are M2 and M4. The input structure includes bipolar-type input clamping diodes, which act as ESD protection devices and meet MIL-STD-883C Method 3015 static discharge voltages of 2001 V. The inputs adhere to standard CMOS specifications. The outputs include the same diodes and are an improvement over CMOS-type diodes. The diodes also clamp transmission-line reflections in mismatched board traces.

Compatibility and Improvements
To reduce ground-bounce noise problems associated with full-swing, high-speed CMOS devices (and TTL parts to a lesser degree), the CY7B166 SRAMs include an internal supply-bypass capacitor between the power supply pin and the ground pin. In parallel with this capacitor, an inductor of equal value to package lead inductance cuts in half the overall inductance associated with output-swing ground bounce. Both the capacitor and inductor decrease the magnitude of the bounce on the output-logic swing's falling edge.
In conclusion, to illustrate BiCMOS compatibility
and improvements, Figure 4 shows I/O waveforms for
BiCMOS and CMOS devices. These waveforms show that
no compatibility problems arise when substituting
BiCMOS-type TTL devices for CMOS parts in a new or
existing TTL-I/O system. On the contrary, upgrading from
a CMOS 64K TTL-I/O SRAM to Cypress' BiCMOS
device family increases speed and noise immunity and
decreases noise generation, for an overall system
improvement.

Figure 3. Speed Increase from Reduced Logic Swing (slew rate = dv/dt, rise/fall time = dt, logic swing = dv; time is saved because the logic swings are smaller)

Figure 4. CY7B166 BiCMOS Output vs 64K TTL CMOS Output

Access Time vs Load Capacitance
for High-Speed BiCMOS TTL SRAMs
Although many TTL and CMOS components are
specified for 30- to 50-pF drive requirements, the actual
characteristics of modern high-speed systems are quite
different. In a system environment using good transmission lines and termination techniques, the drive requirement depends on the characteristics and. length of the
transmission lines, the number of succeeding device packages, and where devices are physically distributed along
the line. For testing purposes, however, you can approximate the effective capacitance seen at the output of
high-speed SRAMs as a lumped capacitance connected
directly to the output. This lumped value is from 10 to 30
pF in most board-level systems.
The graph in Figure 1 represents the additional access time requirements for various lumped-output-

This application note provides a technique for analyzing a system's load capacitance and shows how to determine access time (Taa) degradation. You can also determine other output-related specifications such as tDOE
using this method.
The BiCMOS process has made available a new
generation of 8-, 10-, and 12-ns 'J'TL-I10-compatible
SRAMs. In the past, the most significant speed barrier in
SRAMs was the propagation delay through the device.
This delay is now becoming quite small. Consequently,
the time the device takes to slew the output load
capacitance is a substantial portion of overall delay and
must be understood to determine optimum system timing.
The techniques presented here can thus help you maximize your system's throughput.

DELTA taa
(ns)

164K

0.8
0.7
0.6
0.5

------------------------------------- -- -- ------- -------------------------

0.3
0.2
0.1

-------------------------------------------------------

0.4

TIL SiGMOS SRAM

I

-------------------

o ~~~~~~~~~~~~~~++~r+~-r~~~;_~~_r++~~~-r~

-0 1

- - - - -. . - - - - -

-----

.-

-. --- - -

-----

.- - - -.-

-----

.- - - - . - - - - -

-0:2 0 - - - - - .5. - - - - - '0 - - - - -'5. - - - - .20- - - - - 25- - - - - 30 - - - - -35 - - - - _40. - - - - .45- - - - - 50
- - - - - - - CL Total Load Capacitance ( pF )

-0.3

Jj ~~~~~~~~;;~~~~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~~~~~~~l~~~~~~l~~~~~~

-0.9
-1
-1.1

--

--------------- ------ ------ ------------ ------ ---------------------- ------ ------ ------------ ------ ------------------------ ------ ------ ------------ ------ -----Figure 1. Normalized DELTA T aa vs Load Capacitance

3-23

~~
Access Time vs Load Capacitance for BiCMOS SRAMs
~
~~~OR~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The setup· used to get the values in Figure 1 approximates son terminated transmission lines and is
shown in Figure 2. To avoid loading the output, a lOOn
resistor is put in series with the termination resistor. This
adds a 3X attenuation factor but does not alter the results.
Using the following measurement techniques and a
reasonable number of device loads, you can derive any
system's characteristic capacitance. This allows load adjustment to optimize time degradation to keep address access to a minimum. You can also use this technique to
determine other specifications that depend on output rise
and fall time, such as tDOE.

31 AUt'Du8iion
]/0 1

1001'\

lOOp,

Mil

SCOPE

TEST SETUP

Measuring Load Capacitance
Now that the capacitance's effect on the device speed
is known, the 20-pF approximation can be used to determine Taa. This requires a method for measuring the system load capacitance. A simple method is to use timedomain reflectometry (1DR),
which determines
capacitance on a transmission line by measuring the pulse
reflection the capacitance causes.
The TOR test system (Figure 3) consists of a fast
pulse generator and oscilloscope with son terminated inputs. The oscilloscope's channel A measures the reflected
voltages, and channel B measures the setup of rise time,
logic swing, and pulse width. A single device or a critical

Figure 2. Test Load to Determine T aa vs CL

capacitance values. This graph applies to the CY7Bl60,
CY7B161, CY7B162, CY7Bl64, CY7B166, CY7B185,
and CY7B186. Data is shown for the falling edge only
because this edge is eff~ted most by load capacitance.
The graph is normalized to 20 pF and can be used for all
speed grades. For the -8 devices with no capacitive load,
for example, the Taa is 6.9; at 20 pF it is 8 ns; and at 50
pF it is 8.8 ns.

Power Djvider (optional)

!

Pulse Generator
HP 80B2A or
Equivalent

Cable A
Channel A

D1G1TIZ1NG

Test

OSCII.J..OSCOPE
HP54120 OR
EQUIVALENT

Cable B
Zo = non

Pin

CabJe C
Channel B

GND

DEV)CE

Dynamjc
Test
Board

Figure 3. Test Setup for TDR Capacitance Measurement

3-24

~RESS

Access Time vs Load Capacitance for BiCMOS SRAMs

~~ ~COND~OR ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pulse Width

DO ns

, - - - - - - - - - - - - - . . , . . - - + - - - - - Vih

+3.0 V

'------ViI

+OAV

90%

50%
10%

tr

=

2 ns _ _----34

Figure 4. Setup Pulse on Channel B (nothing in the path)

where ΔV = Maximum reflected voltage at channel A, tr =
Rise time of incident pulse, Zo = 50Ω, VI = Logic swing
of incident pulse.
The equation includes the 2X attenuation factor introduced by the test circuit.

path with various loads can be measured to determine
dynamic capacitive loading.
Note that although the length of cables A and C is
not critical to the measurement, the time it takes the pulse
to traverse cable B must be much greater than the pulse's
maximum rise time. This ensures that reflections are
measured after the pulse has stabilized and not during a
transition. Also note that the test point is any input or output to a PCB transmission line or device and that outputs
must be forced to a Low state to be measured.
Figure 4 shows the setup pulse on channel B with no
device or board in the path. The setup waveform corresponds to the SRAM's output characteristics. Figure 5
shows an example reflection, indicating the ΔV reflected
voltage measurement and position on cable B.
You can determine capacitance values from the test
data using the following equation:
$$C_D = \frac{4\,t_r\,\Delta V}{Z_0\,V_I} \qquad \text{(Eq. 1)}$$
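
As a rough illustration of Equation 1, the sketch below evaluates the load capacitance from one TDR reading. The function name and the measurement values (a 100-mV reflection and a 2.6-V logic swing) are hypothetical and chosen only to show the arithmetic; only the formula itself comes from the text.

```python
def tdr_capacitance(t_rise_s, delta_v, z0_ohm, v_swing):
    """Equation 1: C_D = 4 * tr * dV / (Zo * VI), result in farads.

    The factor of 4 already includes the 2X attenuation of the test circuit.
    """
    return 4.0 * t_rise_s * delta_v / (z0_ohm * v_swing)


# Hypothetical reading: 2-ns rise time, 100-mV reflection,
# 50-ohm cable, 2.6-V logic swing (0.4 V to 3.0 V).
c_load = tdr_capacitance(2e-9, 0.100, 50.0, 2.6)
print(f"C_D = {c_load * 1e12:.2f} pF")
```

With these assumed numbers the result is a little over 6 pF, which is in the range of a single device input.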

Measuring Capacitance Values Exactly
The line capacitance along with the load capacitance
found using Equation 1 determine the total capacitance
and time delay added to the access time. Two ways to
determine the additional delay are to calculate the extra
time and add the result to the no-load access time or calculate the load capacitance and use the graph in Figure 1.
These approximations for total capacitance are adequate in most situations, but you can also measure actual
line and load capacitance using a high-frequency LCR
meter. Usually this equipment is unavailable and/or expensive due to the frequency range needed to get an accurate measurement.

Figure 5. ΔV Reflected Voltage Measurement
[Waveform: reflection on channel A showing the 100% zone and ΔV, the maximum reflected voltage, located at 2X the length of cable B]

Another approach is to determine the transmission
line's capacitance per foot by analyzing the line's characteristics based on the type of line and board construction.
A typical 50Ω microstrip line has approximately 35 pF/ft.
(3 pF/in.) based on the equation:

$$C_0 = \frac{1.017 \times 10^{-9}\,(0.45\,e_r + 0.67)^{0.5}}{Z_0} \qquad \text{(Eq. 2)}$$

where Zo = 50Ω and er = 3 for G-10 fiberglass-epoxy PCBs.

The distributed load and line capacitance interact for
an overall transmission-line propagation delay equivalent to:

$$t_{pd} = 1.017\,(e_r)^{0.5}\left(1 + \frac{C_D}{C_0}\right)^{0.5}\ \text{ns/ft.} \qquad \text{(Eq. 3)}$$

where CD = Distributed capacitance

This line length and load-dependent delay can be
added to the no-load (0 pF) access time from Figure 1 to
derive system timing. For example, for a 3-in. microstrip
transmission line (Co = 35 pF/ft.) with a 12-ns device
driving one load (5 pF), the total delay is:

$$t_{aa}\,@\,20\ \text{pF} - 1.1\ \text{ns} = 12\ \text{ns} - 1.1\ \text{ns} = 10.9\ \text{ns} = \text{No-load access time}$$

$$t_{pd} = 1.017\,(3)^{0.5}\left(1 + \frac{5}{35}\right)^{0.5}\ \text{ns/ft.} = 1.88\ \text{ns/ft.} \times \frac{3\ \text{in.}}{12\ \text{in.}} = 0.47\ \text{ns}$$

$$t_{aa\,total} = 0.47\ \text{ns} + 10.9\ \text{ns} = 11.4\ \text{ns}$$

Another way to determine delay is from the overall
load capacitance, including line and distributed load:

$$C_{total} = C_0\left(1 + \frac{C_D}{C_0}\right)^{0.5} \qquad \text{(Eq. 4)}$$

where Ctotal = Total line capacitance

You can use Ctotal to determine the additional access
time from the graph in Figure 1. For example, for a 3-in.
50Ω microstrip transmission line (Co = 35 pF/ft.) driving
one load (CD = 5 pF/ft.), the total capacitance is:

$$C_{total} = 35\ \text{pF/ft.}\left(1 + \frac{5}{35}\right)^{0.5} \times \frac{3\ \text{in.}}{12\ \text{in.}} = 9.35\ \text{pF}$$

Using the graph, the access time decreases by 0.41
ns, for an access time of 11.6 ns. If the line is 6 in. long
with two loads, the total capacitance is 19.8 pF, for an
increased access time of 11.95 ns. Using Equation 3 gives
11.4 ns and 11.9 ns for the two examples.
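
The two worked examples can be checked numerically. The following is a minimal sketch that assumes the reading of Equations 3 and 4 used in the examples (er = 3, Co = 35 pF/ft., 5 pF per load); the function and variable names are illustrative only.

```python
import math


def line_delay_ns_per_ft(er, c_dist_pf, c_line_pf):
    """Equation 3: loaded propagation delay of the line, in ns/ft."""
    return 1.017 * math.sqrt(er) * math.sqrt(1.0 + c_dist_pf / c_line_pf)


def total_capacitance_pf(c_line_pf, c_dist_pf, length_in):
    """Equation 4 scaled by the line length: total capacitance in pF."""
    return c_line_pf * math.sqrt(1.0 + c_dist_pf / c_line_pf) * length_in / 12.0


# No-load access time of a 12-ns device (20-pF figure minus 1.1 ns).
NO_LOAD_TAA_NS = 12.0 - 1.1

# 3-in. line with one 5-pF load, then 6-in. line with two loads.
delay_3in = line_delay_ns_per_ft(3.0, 5.0, 35.0) * 3.0 / 12.0
delay_6in = line_delay_ns_per_ft(3.0, 10.0, 35.0) * 6.0 / 12.0
taa_3in = NO_LOAD_TAA_NS + delay_3in
taa_6in = NO_LOAD_TAA_NS + delay_6in

c_3in = total_capacitance_pf(35.0, 5.0, 3.0)
c_6in = total_capacitance_pf(35.0, 10.0, 6.0)

print(f"3 in.: tpd = {delay_3in:.2f} ns, taa = {taa_3in:.1f} ns, Ctotal = {c_3in:.2f} pF")
print(f"6 in.: tpd = {delay_6in:.2f} ns, taa = {taa_6in:.1f} ns, Ctotal = {c_6in:.1f} pF")
```

Both methods need only the line length, the per-foot line capacitance, and the number of loads, so the same helper functions can be reused for other trace lengths.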

Combining SRAMs
Without an External Decoder

Figure 1. 32K x 8 RAM Configured with Four CY7B160s
[Schematic: chips CY7B160A through CY7B160D share A0-A13 and the common chip enable; A14 is wired to the chip-enable inputs, with unused enables tied to GND or Vcc]

TRUTH TABLE (1 = chip selected)
A14 | A B C D
 0  | 1 0 1 0
 1  | 0 1 0 1

An internal decoder with four chip-enable inputs helps designers retain the 8/10-ns access times
of the CY7B160 16K x 4 BiCMOS SRAM in multiple-chip memory configurations. Without this
capability, denser memory arrays require external
logic, which adds 3 or more nanoseconds to the
access time. This application note describes how to
use the 16K x 4 SRAM to create 64K x 4, 32K x
8, or 64K x 8 memories without an external
decoder.
In the x4 configuration, only one Cypress
CY7B160 is active at a time. In the x8 configurations, two chips are active at once. Devices that are
deselected power-down to less than 40 mA of
standby current from a maximum operating current
of 120 mA.
Figures 1, 2, and 3 show how two additional
address lines, connected to the memories' chip-enable (CE) inputs, permit multiple-SRAM configurations without using an external decoder. You
can use a fifth chip-enable input to power-down all
devices.
The decoder works without external logic because two of the CE inputs (/CE2 and /CE3) are
active Low, and two (CE4 and CE5) are active
High. When any CE pin is pulled out of its active
state, the chip is deselected. Any CE pin can
deselect and power-down the device independently
of the other CE pins.
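
The decoding scheme can be modeled in a few lines. This is a sketch only: the per-chip wiring below is one reading of the 64K x 4 configuration in Figure 3 that reproduces its truth table, the common enable is assumed to be an active-Low /CE1, and all names are illustrative.

```python
def selected(ce1_n, ce2_n, ce3_n, ce4, ce5):
    """A chip is selected only when every enable is in its active state.

    /CE1, /CE2, /CE3 are active Low; CE4 and CE5 are active High.
    """
    return bool((not ce1_n) and (not ce2_n) and (not ce3_n) and ce4 and ce5)


def decode_64k_x4(a15, a14, ce_n=0):
    """Return a dict showing which of chips A-D is selected (1) or not (0)."""
    GND, VCC = 0, 1
    # Assumed wiring per chip: (/CE2, /CE3, CE4, CE5)
    wiring = {
        "A": (a15, a14, VCC, VCC),  # selected when A15 = 0, A14 = 0
        "B": (a15, GND, a14, VCC),  # selected when A15 = 0, A14 = 1
        "C": (a14, GND, a15, VCC),  # selected when A15 = 1, A14 = 0
        "D": (GND, GND, a15, a14),  # selected when A15 = 1, A14 = 1
    }
    return {chip: int(selected(ce_n, *pins)) for chip, pins in wiring.items()}


for a15 in (0, 1):
    for a14 in (0, 1):
        print(a15, a14, decode_64k_x4(a15, a14))
```

Driving the common enable High (`ce_n=1`) deselects every chip regardless of the address, which corresponds to the power-down use of the fifth enable.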
1991 Electronic Design. Reprinted by permission.

Figure 2. 64K x 8
[Schematic: chips CY7B160A through CY7B160H share A0-A13 and the common chip enable; A14 and A15 are wired to the chip-enable inputs, with unused enables tied to GND or Vcc]

TRUTH TABLE (1 = chip selected)
A15 A14 | A B C D E F G H
 0   0  | 1 0 1 0 0 0 0 0
 0   1  | 0 1 0 1 0 0 0 0
 1   0  | 0 0 0 0 1 0 1 0
 1   1  | 0 0 0 0 0 1 0 1

Figure 3. 64K x 4 RAM Configured with Four CY7B160s
[Schematic: chips CY7B160A through CY7B160D share A0-A13, I/O0-I/O3, and the common chip enable; A14 and A15 are wired to the chip-enable inputs, with unused enables tied to GND or Vcc]

TRUTH TABLE (1 = chip selected)
A15 A14 | A B C D
 0   0  | 1 0 0 0
 0   1  | 0 1 0 0
 1   0  | 0 0 1 0
 1   1  | 0 0 0 1


BiCMOS TTL SRAMs
Improve MIPS R3000 and R3000A Systems
This application note analyzes the speeds required for
the cache SRAMs used in RISC systems. The focus here
is on the R3000/R3000A RISC processor architecture
from MIPS Computer Systems Inc.
One of the goals of RISC-type machines is to execute
one instruction per CPU cycle. To achieve this goal, RISC
processors employ a compact and unified instruction set, a
deep instruction pipeline, and careful adaptation to optimizing compilers. However, these benefits can be
rendered useless without an efficient cache memory system composed of fast SRAMs.

Design Overview
A block diagram showing the memory components of
an R3000A cache system appears in Figure 1. The
memory system is designed for maximum bandwidth by
utilizing separate instruction and data caches and an external write buffer for main memory. The high-speed cache
is physically close to the processor and holds instructions
and data that are repetitively accessed by the CPU; this
reduces the number of times that slow main memory must
be utilized.
The R3000 can handle up to 256 Kbytes in 64K
entries. The processor provides cache control, which is
direct mapped. The processor also provides tag control to
verify that the correct data is read from cache. The controller can refill multiple words when a cache miss occurs.
With separate data and instruction caches on the same
bus the processor can access or write data and instructions
at the CPU's cycle rate. The separate cache architecture
for instruction and data memory means that the two caches are accessed alternately during each CPU cycle. This makes cache
access time equal to half the clock period.
As the processor speed increases from 25 MHz for
the R3000 to 33 and 40 MHz for the R3000A, the time
allowed for instruction and data fetches from cache
memory decreases. The clock period is 30 ns for the 33-MHz system and 25 ns for the 40-MHz system. This
leaves 15 ns to access and/or read/write data for the 33-MHz system and only 12.5 ns for the 40-MHz system.
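
The half-cycle figures follow directly from the clock frequency. A small sketch, assuming the nominal "33 MHz" part actually runs at 33.33 MHz (which is what a 30-ns period implies); the function names are illustrative.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def cache_window_ns(freq_mhz):
    """Half the clock period: the time available for each cache access."""
    return clock_period_ns(freq_mhz) / 2.0


for f in (25.0, 33.33, 40.0):
    print(f"{f:5.2f} MHz: period {clock_period_ns(f):.1f} ns, "
          f"cache window {cache_window_ns(f):.2f} ns")
```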
To further illustrate the cache timing, a sample read
critical path in a 64-Kbyte cache system appears in Figure
2. Path 1 is the access time from the R3000 through the
373A latch and into the CY7B166 SRAMs. Path 2 is the
time it takes data to be valid after an IRd signal is
received from the R3000.
The external latch between the R3000A and the
cache address inputs provides part of the pipelining used
in the R3000 system and also minimizes loading between
the addresses of cascaded memories and the R3000. This

Figure 1. R3000/R3000A System with High-Performance Cache
[Block diagram: R3000/R3000A RISC microprocessor connected by address and data buses to the caches and to main memory]

Figure 2. Data and Instruction Cache Critical Paths
[Schematic: R3000/R3000A processor, 373A transparent latches, CY7B166 data cache and CY7B166 instruction cache on the data bus and tag bus, with the memory interface, clocks, coprocessor, and hardware-interrupt connections; Path 1 and Path 2 are marked]

Table 1 lists the time constraints for critical paths 1
and 2 for different system speeds. This data indicates that
fast SRAMs are essential to keep up with the 33- and 40-MHz processors. Fortunately, BiCMOS processes now fill
this need with 8-, 10-, and 12-ns TTL-I/O-compatible
SRAMs with reduced internal propagation delays and improved driving capability.

extra device causes the memory's access time to become
critical, however.
As shown in Figure 3, data is fetched from the data
SRAM (path 2 for data cache) while the address for the
instruction SRAM is set up (path 1 for instr. cache).
During the next half cycle, the opposite operation is performed. This arrangement allows use of shared pins on the
processor, which save up to 64 I/O lines; however, bus
bandwidth requirements are doubled. You must therefore
keep signal lines short and loading as low as possible to
minimize capacitance.
For a 40-MHz system, critical path 1 in Figure 2 includes 3.9 ns for a 74FCT373A latch, which leaves only
8.6 ns for the memory, board trace, and address set-up.
Fortunately, memory access can overlap into the read
cycle by 3 ns. Path 2 for a read cycle includes the time it
takes the R3000 to send the IRd signal to the CY7B166,
the CY7B166 OE-Low-to-data-valid time (tDOE), and the
R3000's data set-up time. The set-up time for the 40-MHz
R3000A is 4 ns, and the read signal takes 3 ns to
generate. This leaves only 5.5 ns for tDOE and for slewing
the output load capacitance.
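
The two budgets quoted for the 40-MHz system are simple subtractions from the 12.5-ns half cycle. The sketch below restates that arithmetic; the function names are illustrative, and the input values are the ones given in the paragraph above.

```python
def path1_budget_ns(half_cycle_ns, latch_tpd_ns):
    """Time left for memory access, board trace, and address set-up."""
    return half_cycle_ns - latch_tpd_ns


def path2_budget_ns(half_cycle_ns, data_setup_ns, read_strobe_ns):
    """Time left for tDOE and for slewing the output load capacitance."""
    return half_cycle_ns - data_setup_ns - read_strobe_ns


HALF_CYCLE_40MHZ_NS = 12.5

p1 = path1_budget_ns(HALF_CYCLE_40MHZ_NS, 3.9)       # latch delay of 3.9 ns
p2 = path2_budget_ns(HALF_CYCLE_40MHZ_NS, 4.0, 3.0)  # 4-ns set-up, 3-ns read strobe

print(f"Path 1 budget: {p1:.1f} ns")
print(f"Path 2 budget: {p2:.1f} ns")
```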

CY7B166 16K x 4 BiCMOS SRAM
The CY7B166 SRAM is optimized using a BiCMOS
process to achieve 12-, 10-, and 8-ns access times. Bipolar
and CMOS technology combine to speed up critical
paths and boost output drive (see the block diagram in
Figure 1 of "A New Generation of BiCMOS TTL
SRAMs"). CMOS technology reduces memory array size
and keeps power to a minimum, while bipolar technology
speeds up critical paths.
BiCMOS technology allows the inputs to be CMOS
for compatibility with existing products, while the on-chip
bipolar bus interconnects and sense amplifiers speed the
internal access timing to allow more time for the outputs
to switch. On the outputs, two bipolar transistors drop two
Vbe levels (approximately 1.6V) to reduce the High-level
output swing. One transistor is tied base-to-collector as a
diode and the other transistor is the High-level drive transistor. Both transistors cause the output to conform to
standard TTL-type logic levels (not CMOS rail to rail).
(See the corresponding figure in "A New Generation of BiCMOS TTL
SRAMs" for a diagram of this output structure.) The diode
is the bipolar transistor Q3, and Q2 is the High-level drive
transistor. M18 is an output Low-level pull-down MOSFET (n type). Keeping the output from swinging to the
power supply rail saves time when changing states and
makes the ramp rate slower (as shown in Figure 3 of "A
New Generation of BiCMOS TTL SRAMs").
The CY7B166's input side includes CMOS devices
M2 and M4. Input clamping diodes are also included to
provide ESD protection and meet MIL-STD-883C Method
3015 static discharge voltages of 2001 V. The inputs meet
standard CMOS specifications.
To reduce ground-bounce noise problems associated
with full-swing, high-speed CMOS devices (as well as
TTL parts to a lesser degree), the CY7B166 incorporates
an internal supply-bypass capacitor between the power
supply pin and the ground pin. The device also includes
an inductor, whose value equals that of the package lead
inductance, in parallel with the bypass capacitor to cut the
overall inductance associated with output-swing ground
bounce in half. Both the capacitor and inductor decrease
the magnitude of the bounce on the falling edge of the
output logic swing.
Substituting BiCMOS-type TTL devices for CMOS
parts in a new or existing TTL-I/O system creates no
compatibility problems. Upgrading from a CMOS 64K
TTL SRAM to Cypress' BiCMOS family of devices increases speed and noise immunity, while decreasing noise
generation for overall system improvement.

Table 1. Delays Through Two Critical Paths

Path  Parameter            25 MHz    33 MHz    40 MHz
1     tAV R3000            1.5 ns    1 ns      1 ns
1     tpd 373A             5.5 ns    4.1 ns    3.9 ns
1     tAA CY7B166          12 ns     10 ns     8 ns
1     tAS                  2 ns      2 ns      1.5 ns
1     Board delays         *         *         *
1     Access cycle time    20 ns     15 ns     12.5 ns
2     tIRd R3000A          5.0 ns    3.75 ns   3.125 ns
2     tDOE CY7B166         6 ns      5 ns      4 ns
2     tDS R3000/A          6 ns      4.5 ns    4 ns
2     Board delays*        3.0 ns    1.75 ns   1.375 ns
2     Read cycle time      20 ns     15 ns     12.5 ns
      Clock period         40 ns     30 ns     25 ns

* Board delays are critical as speed increases. The access
time needed by the SRAM can overlap the path cycle
time by 3 ns to make up for loss in board delays.

Figure 3. Cache Interleaved Instruction/Data Timing
[Timing diagram: RISC µP clock with 15-ns (33-MHz clock) and 12.5-ns (40-MHz clock) half cycles; the address bus carries Path 1 while the data bus carries Path 2]


Memory and Support Logic
for Next-Generation ECL Systems
This application note describes the characteristics and
use of ECL-I/O technology. Available for many years, this
technology is now breaking into mainstream applications
due to innovative process technologies. The high power
requirements and low device density that once banished
ECL to high-speed niche markets are fading with advanced technology and circuit designs. Table 1 shows
how performance and power utilization are evolving.
As system clocks pass 50 MHz, it becomes hard for
TTL to provide the necessary low-noise drive capability
for fast rise times, and ECL becomes essential. Happily,
new BiCMOS SRAMs, gate arrays, and improved bipolar
PLDs combine ECL I/O speed with higher density and
lower power requirements.
A bipolar ECL implementation of an industry-standard PLD such as the 16P4, for example, draws a modest
220 mA (max), while exhibiting propagation delays of 3
ns (333 MHz). These specifications are for Cypress's
CY10E302 and CY100E302 10KH- and 100K-compatible
devices. Low-power (170 mA) versions with 4-ns
propagation delays are also available.

This performance is based on new approaches to
combining ECL and CMOS in single devices. Historically, BiCMOS technologies were developed as either
CMOS speed enhancers or bipolar power misers. The
resulting BiCMOS processes were based either on CMOS
or bipolar process flows, and performance for the complementary bipolar or MOSFET components was less than
optimal.
In contrast, Cypress's STAR M2 process is a third-generation, 0.8µ BiCMOS technology in which the
baseline process is BiCMOS. (See Figure 1 in "BiCMOS
TTL and ECL SRAMs Improve High-Performance Systems" for a simplified cross section of the STAR M2
BiCMOS process.) The STAR process utilizes a modular
architecture. That is, polysilicon loads, TiW fuses, or
other non-volatile elements are easily incorporated into
the baseline process. This results in high-density SRAMs,
high-speed PLDs, or high-density EPROMs/PLDs, respectively.
The STAR M2 process is an 18-mask, double-poly,
double-metal technology that utilizes a thin epitaxial layer
to achieve excellent production performance for NPNs (Ft
greater than 10 GHz) and CMOS latch-up immunity. The
MOSFETs use lightly doped drains for high performance
and reliability.
Unlike first-generation BiCMOS processes, which
were limited to SRAMs, STAR's polysilicon bipolar emitter is the same poly used for MOS gates. This enhances
NPN performance and decouples the NPN from the poly

ECL and BiCMOS
BiCMOS combines bipolar ECL I/O with both
bipolar and CMOS internal functions. This helps parts
such as Cypress's CY10E474/CY100E474 1K x 4 static
RAMs draw only 275 mA, while exhibiting access times
of 3.5 ns. Low-power (190 mA) versions exhibit 7-ns access times.

Table 1. ECL Families
Parameter
Ext. Gate Delay (ps)
Flip-Flop (MHz)

10KH

100K

ECLPSTM

500
250

400

500
800

Gate Power (mW)
Speed(X)Power (pJ)

25

30
12

25

400

8
2.4

Cypress STAR™
500
800

3
0.6

Figure 1. High-Speed A-to-D Application
[Block diagram: an ECL PLD distributes the R/W and clock signals, and an address sequencer drives the SRAM addresses]
load module used for 4T SRAM cells. Use of this poly
load resistor allows for an 85-square-micron memory cell
and small die size.
The advantages of the STAR M2 process over
second-generation BiCMOS technologies include higher
product performance and greater density and manufacturability.

ADC at top speeds. After the memory is full, you can load
the data at a slower rate to a PC or digital oscilloscope for
manipulation and/or measurements in software.
Instead of using the ECL PLD to implement the
SRAMs' address sequencer, it might once have been
necessary to incorporate the sequencer as part of the
memory chip or use discrete logic. Neither approach was
satisfactory, in the one case because of power dissipation
and in the other because of the speed limitations imposed
by multiple levels of discrete logic.
Further applications for ECL PLDs and SRAMs are
found in high-performance workstations, file servers, and
high-end embedded controllers. In fact, the next generation of high-end workstations will require ECL support
logic. Figure 2 shows an example based on Bipolar Integrated Technology's 10K-ECL version of Sun Microsystems Inc.'s SPARC processor. In this 80-MHz SPARC
implementation, cache and tag memories use BiCMOS
SRAMs, and the cache control, memory management unit
(MMU), and cache data path (CDP) are implemented with
ECL PLDs.
The BIT system is bipolar and consists of the main
integer unit (IU), a floating-point coprocessor interface
chip, a multiplier and accumulator floating point chip set,
and a register file chip. The IU can handle off-chip cache
of almost any size with complementary sets of 30Ω cache
address drivers to split the cache into two banks. This
minimizes trace length, reduces noise, and improves cycle
time. The 4K or 64K BiCMOS ECL SRAMs implement
the cache memory and reduce system power dissipation.
The IU has a 12.5-ns cycle time and provides a Data
Ready clock signal that allows a 15-ns cache access time.
This access time makes up for trace propagation delays.
The design can use SRAMs with access times from 3 to
12 ns, depending on the required cache size and power
requirements; these SRAMs can easily keep up with the
IU, as can the 3-ns PLDs.

Applications for ECL and BiCMOS ECL
Applications for ECL PLDs and SRAMs include
graphics and image processing, waveform generation, and
direct digital synthesis (DDS). In the case of video, ECL
memory stores images. In waveform generation and DDS,
ECL memory stores digital representations of analog
waveforms before feeding the information to a digital-to-analog converter (DAC).
In both image and waveform applications, PLDs are
used for address generation/decoding, data manipulation,
and clocking schemes/timing control. These functions previously had to be either built discretely with ECL gates or
added onto the DAC or memory on the same die. However, high-speed video DACs (greater than 125 MHz) use
bipolar process technology, which does not lend itself to
high density due to power dissipation problems. It is
easier to implement the functions in ECL PLDs and
BiCMOS SRAMs.
For analog-to-digital conversion, ECL PLDs work
with high-speed flash A/D converters (ADCs) that have
ECL outputs. These converters' clock rates range from 20
MHz up to 1 GHz. Applications include HDTV, phased-array radar, digital oscilloscopes, and single-event
digitization. Here, PLDs help create high-speed specialty
memories such as self-timed SRAM, pipelined SRAM,
and interleaved SRAM.
Using the design shown in Figure 1, you can implement a fast digital oscilloscope to display analog
waveforms on a PC. The flash ADC contains a string of
comparators that split the signal into a digital "thermometer" code. From there the digital codes are usually
decoded into 8 bits, which are latched on the outputs
every clock period. The flash A/D converter feeds
BiCMOS SRAMs, which can be interleaved for maximum
speed. The PLDs are programmed as address decoders
and counters to change the ECL SRAM's address location
every clock period. Similar to the way a cache memory
works, the memory stores the digital information from the

Designing with ECL
Because ECL PLD propagation delays are as short as
3 ns, and output rise/fall times are in the sub-nanosecond
region, you must adhere to strict system layout guidelines.
ECL speed and noise performance are enhanced with correct transmission-line design and power-supply bypassing
techniques. The underlying objectives are to minimize the
capacitive loading that slows data, prevent ringing and
reflections from impedance mismatch, and minimize voltage drops that add system noise and reduce noise margin.
ECL-I/O circuits achieve the best possible match to
transmission lines for maximum energy transfer. The output stage consists of a low-impedance, open-emitter transistor that can effectively drive different values of transmission-line Zo with the addition of a pull-down termination resistor. The pull-down resistor is also necessary for
operation of the output transistor and can serve a dual role
as the transmission-line termination. ECL input pins are
connected to a transistor's high-impedance (DC) base,
which appears as a small capacitive load to a properly terminated transmission line.

Figure 2. 80-MHz SPARC Implementation
[Block diagram: SPARC integer unit with cache data and tag SRAMs; ECL PLDs implement the SPARC cache control, SPARC MMU, and SPARC CDP; the floating-point controller and floating-point subsystem attach over the floating-point and coprocessor buses; a TTL system bus interface connects to the system bus]
It is always a good idea to use transmission lines, but
they are essential when line propagation delay to the

receiving end and back again is greater than or equal to
the signal's rise time. Basic calculations for different
etched circuit board (ECB) lines appear in Figure 3, along
with an equation for propagation delay through the transmission line. Table 2 lists common values for the
dielectric constant.
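
The round-trip rule above can be turned into a quick length check. This is a sketch under stated assumptions: it uses the delay expression that accompanies Figure 3 (1.016 times the square root of the dielectric constant, in ns/ft), with the effective constant 0.475 Er + 0.67 assumed for microstrip; the function names and the 1-ns rise time are illustrative.

```python
import math


def microstrip_delay_ns_per_ft(er):
    """Propagation delay of a microstrip trace in ns/ft.

    Uses the effective dielectric constant Ee = 0.475 * Er + 0.67.
    """
    return 1.016 * math.sqrt(0.475 * er + 0.67)


def max_unterminated_length_in(rise_time_ns, delay_ns_per_ft):
    """Longest trace (inches) whose round-trip delay is below the rise time."""
    return rise_time_ns / (2.0 * delay_ns_per_ft) * 12.0


delay = microstrip_delay_ns_per_ft(4.7)         # G-10 board, Er = 4.7
limit = max_unterminated_length_in(1.0, delay)  # assumed 1-ns rise time

print(f"Delay: {delay:.2f} ns/ft")
print(f"Treat traces longer than about {limit:.1f} in. as transmission lines")
```

For sub-nanosecond ECL edges the limit shrinks to an inch or two, which is why the text recommends treating essentially every trace as a transmission line.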
Stripline is used in multi-layered boards and between
ground planes; it consists of a trace buried between
ground/power planes. The stripline calculations assume
that W/(H - T) is less than 0.35 and that T/H is less than
0.25. Single and composite microstripline is used on the
top and/or bottom of single- or double-ground boards; it
consists of a trace on the surface, with the ground or
power plane buried.
Other common high-speed practices are to use equal
line lengths from device to device and rounded corners on
traces. Component lead lengths should be short, with surface-mount passive and active components used as much
as possible.

Figure 3. Zo for Microstrip and Stripline
[Figure: cross sections of microstrip and stripline showing trace width W, trace thickness T, and dielectric height H. Recoverable expressions: microstrip Zo = (87 / sqrt(Er + 1.41)) ln(5.98H / (0.8W + T)); propagation delay δ = 1.016 sqrt(Er) ns/ft, with the effective value Ee = 0.475 Er + 0.67 used in place of Er for microstrip]

Table 2. Common Values for Dielectric Constant

Material                  Er
Duroid                    2.56
Quartz                    3.78
G-10 (fiberglass epoxy)   4.7
Alumina                   9.7
Silicon                   11.7

Terminations

Transmission-line terminations must match the line's
characteristic impedance to minimize reflections. The termination is usually also used as the pull-down resistor for
the open-emitter ECL outputs. These outputs allow Zo
values from 50 to 150Ω. This means that ECL can accommodate 75Ω video systems as well as 50Ω communications systems. Some ECL outputs even allow 25Ω trans-

Figure 4. Three Types of Transmission Line Terminations
[Figure: (a) parallel termination; (b) Thevenin termination to the -5.2 or -4.5V supply, with R2 = 2.6 Zo (10KH) or 2.26 Zo (100K) and R1 = R2/1.6 (10KH) or R2/1.26 (100K); (c) series termination with Rt = Zo - 10Ω, shown driving n lines in parallel (n = number of lines)]


Table 3. ECL Output Transistor Power
and Terminating/Pull-Down Resistor Power

The series termination's efficiency depends on the
value of RE. The power dissipated by a small RE can exceed the power dissipated in the parallel termination. A
large RE can slow negative-going transitions because the
input capacitance of the following gates (typically 4 pF)
is being charged through the resistor. A large RE can
also reduce noise margins.
Note that, in this case, RE does not have to equal the
transmission-line impedance. Table 3 shows a tabulation
of ECL output transistor power and RE power dissipation.
Because the series termination is installed at the near
end of the transmission line, only lumped loads can be
used. Distributed loads cause problems because the full
value of the pulses are seen only at the far end of the line
and not along the length of the trace, as with the parallel
and Thevenin terminations.
Typically, you can have up to 10 lumped loads at the
end of the line. Thus, you must choose RE to supply
enough current to drive the loads. However, you must also
consider the voltage drop in the series terminating resistor.
One way to minimize dissipation is to make the series termination drive two or more lines with lumped loads in
parallel (as in Figure 4c).

Terminating Resistor    Dissipation in ECL         Dissipation in Terminating
Value (Ω)               Output Transistor (mW)     Resistor (mW)

Parallel Termination
150                     5.0                        4.3
100                     7.5                        6.5
75                      10                         8.7
50                      15                         13

Thevenin Termination
82/130                  15                         140

Series Termination
2K                      2.5                        7.7
1K                      4.9                        15.4
680                     7.2                        22.6
510                     9.7                        30.2

Measurements
After prototyping transmission lines and terminations,
you can make waveform measurements on a sample board
to uncover any mismatches. Simple time domain reflectometry (TDR, Figure 5) can show the position of discontinuities or mismatches along the line and the type of
reactance or termination needed to correct them. Discontinuities, such as gate input capacitances distributed along
the line, appear as small glitches on the output waveform.
The reflection's amplitude is proportional to the
capacitance. You can therefore calibrate the test setup
using a series of standard capacitances. Also, test equipment with TDR capability, which simplifies measurements, is available from HP.

mission lines to drive doubly terminated 50Ω bus lines in
backplane applications.
Figure 4 shows the types of terminations with calculations. The different options have tradeoffs that include routing, power dissipation, loading, and ease of use.
The parallel termination (Figure 4A) is simple: The
terminating resistor at the far end of the transmission line
equals the line's Zo. In reality, the line and Rt always
exhibit some mismatch caused by the ECL device's input
capacitance. This termination offers the fastest performance and lowest power dissipation, but requires an additional power supply for the termination resistor (Rt).
An advantage of parallel terminations over series terminations (Figure 4C) is that you can use the former with
ECL loads distributed along the length of the transmission
line. This is because the parallel termination is installed at
the transmission line's receiving end and absorbs almost all
reflections.
The Thevenin equivalent (Figure 4B) of the parallel
termination (called the Thevenin termination) requires two
resistors but needs no separate supply because the termination relies on the system power bus. Although this
feature is convenient for small systems, the Thevenin termination draws 11 times more power per termination than
does the parallel termination.
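
The Thevenin resistor pair can be computed from Zo using the multipliers that accompany Figure 4. The sketch below assumes R2 = 2.6 Zo and R1 = R2/1.6 for 10KH, and R2 = 2.26 Zo and R1 = R2/1.26 for 100K; the 1.26 divisor is inferred from the requirement that R1 in parallel with R2 equal Zo, and the names are illustrative.

```python
# (R2 multiplier, R1 divisor) per ECL family; the 100K divisor is an inference.
RATIOS = {"10KH": (2.6, 1.6), "100K": (2.26, 1.26)}


def thevenin_resistors(z0_ohm, family="10KH"):
    """Return (R1, R2) in ohms for a Thevenin termination of a Zo line."""
    k2, k1 = RATIOS[family]
    r2 = k2 * z0_ohm
    r1 = r2 / k1
    return r1, r2


def parallel(r_a, r_b):
    """Equivalent resistance of two resistors in parallel."""
    return r_a * r_b / (r_a + r_b)


r1, r2 = thevenin_resistors(50.0, "10KH")
print(f"10KH, 50-ohm line: R1 = {r1:.2f} ohms, R2 = {r2:.0f} ohms, "
      f"R1||R2 = {parallel(r1, r2):.1f} ohms")
```

For a 50Ω line the 10KH values land close to the standard 82Ω/130Ω pair listed in Table 3.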
The series termination is potentially the most power-efficient. It matches Zo by means of a resistor (Rt) in
series with the driving ECL gate's output impedance
(which is 10Ω in STAR devices). Instead of totally
preventing any reflections at the far end of the line, the
series termination allows pulses to be reflected by the
high impedance there, absorbing them when they are
reflected back to the near end.
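
Because the series resistor and the driver's output impedance together must equal Zo, the resistor value is a one-line calculation. A minimal sketch, assuming an output impedance of about 10Ω as the text indicates for STAR devices; the function name is illustrative.

```python
def series_termination_ohm(z0_ohm, r_out_ohm=10.0):
    """Series resistor Rt such that Rt plus the driver output impedance equals Zo."""
    rt = z0_ohm - r_out_ohm
    if rt < 0:
        raise ValueError("driver output impedance exceeds the line impedance")
    return rt


for z0 in (50.0, 75.0, 100.0):
    print(f"Zo = {z0:.0f} ohms -> Rt = {series_termination_ohm(z0):.0f} ohms")
```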

Interfacing and Prototyping
With the increased use of ECL in new and next-generation systems, many connector and cable companies,
such as W. L. Gore & Associates, are offering controlled-impedance coax ribbon cable and wrappable coax cable
for prototypes and final design.
Although most ECL system prototyping is done on
PC boards, alternatives exist. ECL and mixed-TTL/ECL
wire-wrapping boards with extensive ground planes are
available from MUPAC Corporation. You can use wrappable coax on these boards between signal pins, with additional connections to adjacent ground pins.

Programming ECL PLDs
Cypress's current ECL PLDs are bipolar devices with
proven TiW fuses. This means that, unlike the company's
erasable CMOS PLDs, the ECL PLDs are one-time fuse-programmable. You can program the devices using Data
I/O, Stag, and Logical Devices PLD programmers; you
can also use Cypress's QuickPro II. Development
software, including simulation models, is available from
Data I/O (ABEL) and Logical Devices (CUPL).

Figure 5. Time Domain Reflectometry Setup (pulse generator: V = 800 mV, Tr = 2 ns, T = 50 ns; power divider; 200-MHz or faster scope; termination)


Section Contents

SRAMs
RAM I/O Characteristics
Understanding Dual-Port RAMs
Using Dual-Port RAMs Without Arbitration
Using Cypress SRAMs to Implement 386 Cache

RAM I/O Characteristics

This application note describes the function and I/O
standards of Cypress high-speed static RAMs. Manufactured using a speed-optimized CMOS technology, these
RAMs meet and exceed the performance of competitive
bipolar devices, while consuming significantly less power
and providing superior reliability. While providing identical function, the RAMs exhibit slightly different input and
output characteristics, which permit you to improve overall system performance.

For more detailed information on these products, refer to
the Cypress Data Book.

Generic I/O Characteristics
Input and output characteristics fall generally into
two categories: normal operation, when the area of operation falls within the
normal limits of VCC and VSS plus or minus approximately 600 mV, and abnormal circumstances, when these
limits are exceeded. Under normal operating conditions,
inputs switch between logic Zero and logic One. This application note considers operation in a positive-True environment, and therefore a One is more positive than a
Zero.
The RAMs provide TTL-compatible I/O. Therefore a
One is 2.0V, while a Zero is 0.8V. To be considered a
One, the input of a device must be driven greater than
2.0V, but not exceeding VCC + 0.6V. To be considered a
Zero, the input must be driven to less than 0.8V, but not
less than VSS - 0.6V.
Output characteristics represent a signal that drives
the input of the next device in the system. Because the
RAM levels are TTL compatible, you can assume that the
VIL and VIH values of 0.8V and 2.0V referenced above
are valid.
In consideration of noise margin, however, driving
the input of the next stage to the required VIL or VIH is
not sufficient. Noise margins of 200 to 400 mV are considered more than adequate. Thus, an adequate VOH is
2.4V and VOL is 0.4V, providing a noise margin of 400
mV.
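
The noise-margin arithmetic can be checked with a few lines of code. This is only a sketch using the TTL levels quoted in the text; the function name is illustrative, and the values are kept in millivolts so the subtraction stays exact.

```python
# TTL levels quoted in the text, in millivolts.
VIH_MIN_MV = 2000  # lowest input voltage guaranteed to read as a One
VIL_MAX_MV = 800   # highest input voltage guaranteed to read as a Zero
VOH_MIN_MV = 2400  # lowest output voltage when driving a One
VOL_MAX_MV = 400   # highest output voltage when driving a Zero


def noise_margins_mv(voh, vol, vih, vil):
    """Return (high-level margin, low-level margin) in millivolts."""
    return voh - vih, vil - vol


high_margin, low_margin = noise_margins_mv(
    VOH_MIN_MV, VOL_MAX_MV, VIH_MIN_MV, VIL_MAX_MV
)
print(high_margin, low_margin)
```

Both margins come out to 400 mV, the upper end of the 200-to-400-mV range the text calls more than adequate.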
Because the driven node consists of both a resistive
and a capacitive component, output characteristics are
specified such that the output driver is capable of sinking
IOL at the specified VOL, and capable of sourcing IOH at
VOH. Because the values of IOL and IOH differ depending
on the device, these values are shown in Table 1.
Outputs have one other characteristic to be aware of:
output short-circuit current (IOS). This is the maximum
current that the output can source when driving a One into
Product Description
The five parts represented in Figure 1 constitute three
basic devices of 64, 1024, and 4096 bits. The CY7C189
and CY7C190 feature inverting and non-inverting outputs,
respectively, in a 16 x 4-bit organization. Four address
lines address the 16 words, which are accessed via
separate input and output lines. Both of these 64-bit
devices have separate active-Low select and write-enable
signals.
The 256 x 4 CY7C122 is packaged in a 22-pin DIP,
and features separate input and output lines, both active-Low and active-High select lines, eight address lines, an
active-Low output enable, and an active-Low write
enable.
Both the CY7C148 and CY7C149 are organized as
1024 x 4 bits and feature common pins for data input and
output. Both parts have 10 address lines, a single active-Low chip select, and an active-Low write enable. The
CY7C148 features automatic power-down whenever the
device is not selected, while the CY7C149 has a high-speed, 15-ns chip select for applications that do not require power control.
This family of high-speed static RAMs is available
with access times of 15 to 45 ns with power in the 300- to
500-mW range. These RAMs are designed from a common core approach and share the same memory cell, input
structures, and many other characteristics. The outputs are
similar, with the exception of output drive and the common I/O optimization for the CY7C148 and CY7C149.
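
The relationship between each organization, its total bit count, and its number of address lines can be verified with a short sketch. The dictionary and function names below are illustrative only; the organizations are the ones given in the product description.

```python
# (words, bits per word) for each device group named in the text.
ORGANIZATIONS = {
    "CY7C189/CY7C190": (16, 4),
    "CY7C122": (256, 4),
    "CY7C148/CY7C149": (1024, 4),
}


def capacity_and_address_lines(words, width):
    """Total bits and the number of address lines needed to select one word."""
    total_bits = words * width
    address_lines = (words - 1).bit_length()
    return total_bits, address_lines


for name, (words, width) in ORGANIZATIONS.items():
    bits, lines = capacity_and_address_lines(words, width)
    print(f"{name}: {words} x {width} = {bits} bits, {lines} address lines")
```

The results match the 64-, 1024-, and 4096-bit sizes and the four, eight, and ten address lines stated above.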

Figure 1. RAM Block Diagrams (7C189, 7C190, 7C122, and 7C148/9)

VSS. You need to be aware of IOS for two reasons. First,
the output should be capable of supplying this current for
some reasonable period of time without damage. Second,
this is the current that charges the capacitive load when
switching the output from a Zero to a One and will control the output rise time.
Because memories such as these are often tied
together, you also need to consider the output characteristics when the devices are deselected. All of the RAMs
in the family feature three-state outputs; when deselected
the outputs are in a high-impedance condition that does
not source or sink any current. In this condition, as long
as the input is driven in its normal operating mode, a
three-state output appears as an open, with less than 10
µA of leakage. Thus, to any other device driving this
node, the output does not exist.

Technology Dependencies and Benefits
Some of the products described in this application
note were originally produced in a bipolar technology.
They have since been re-engineered in NMOS technology,
and Cypress has now produced them in a speed-optimized
CMOS technology.
Both technology dependencies and benefits associated
with each technology relate to the design of input and output structures. When you use these products, you should
know about these characteristics and how they can benefit
or impede a design effort.
One of the most obvious factors is that both NMOS
and CMOS device inputs are high impedance, with less
than 10 µA of input leakage. Bipolar devices, however,
require that the driver of an input sink current when driv-

Table 1. DC Parameters

                                                                CY7C122       CY7C148/9     CY7C189/190
Parameter  Description            Test Conditions               Min.   Max.   Min.   Max.   Min.   Max.   Units
VOH        Output High Voltage    VCC = Min., IOH = -5.2 mA     2.4           2.4           2.4           V
VOL        Output Low Voltage     VCC = Min., IOL = 8.0 mA             0.4           0.4           0.4    V
VIH        Input High Voltage                                   2.1    VCC    2.0    VCC    2.0    VCC    V
VIL        Input Low Voltage                                    -3.0   0.8    -3.0   0.8    -3.0   0.8    V
IIL        Input Low Current      VCC = Max., VIN = VSS                10            10            10     µA
IIH        Input High Current     VCC = Max., VIN = VCC                10            10            10     µA
IOFF       Output Current         VOL < VOUT < VOH, TA = Max.   -10    +10    -10    +10    -10    +10    µA
           (High Z)
IOS        Output Short-Circuit   VCC = Max., VOUT = VSS,              -70           -90           -275   mA
           Current                0°C < TA < 70°C
                                  VCC = Max., VOUT = VSS,              -80           -90           -350   mA
                                  -55°C < TA < 125°C


perform normally. Operated in full CMOS mode, the
devices save power because the current consumed in the
input converter decreases as the input voltage rises above
3.0V or falls below 1.5V. Because the input signal is in
the 1.5V-to-3.0V range only when transitioning between
logic states, the power savings in a large array with true
CMOS inputs can be significant. With input signals on
over half of the pins of a device, significant savings in a
large system can be realized by using CMOS input voltage swings even in TTL systems.
Although this application note does not directly deal
with the AC characteristics of high-speed RAMs, the
input and output characteristics of these devices have a
great deal to do with the actual AC specifications. Conventionally, all AC measurements associated with high-speed devices are done at 1.5V and assume a maximum
rise and fall time. This eliminates the variations associated
with the various usage configurations (as a figure of merit
when testing the device), but does not mean that you can
ignore these influences when designing a system.
Maximum rise and fall time is usually included on
every data sheet. For the products referred to in this application note, a 10-ns maximum rise and fall time is
specified for all devices with access times equal to or

ing to VIL, but appear as high impedance at VIH levels.
This is because the input of a bipolar device is the emitter
of a bipolar NPN-type device with its base biased positive. The bias (1.5V) establishes the point at which the
input changes from requiring current to be sourced to
presenting a high impedance. This switching level is the
reason that AC measurements are done at the 1.5V level.
Although NMOS and CMOS device inputs do not
change from low to high impedance, great care is taken to
balance their switching threshold at 1.5V. This allows you
to consider only capacitive loading for MOS device
fanout, while bipolar has both a capacitive and a DC component.
The other input characteristic that differs between
bipolar and MOS is the clamp diode structure, which exists in both MOS and bipolar. However, in MOS devices
that use bias generator techniques (all high-speed MOS
devices), the diode does not become forward biased until
the input goes more negative than the substrate bias generator plus one diode drop. Because the bias generator is
usually at about -3V, this factor removes the clamping effect.

CMOS/NMOS/Bipolar Input Characteristics
Although NMOS, CMOS, and bipolar technologies
differ widely, the I/O characteristics are the TTL derivatives that have been covered above and are documented in
Table 1. With the exception of the differences in input
impedance between MOS and bipolar devices, all three
technologies are used to produce TTL-compatible
products.
Another group of devices provide a true CMOS interface, where signals swing from Vss + 1.5V. In addition,
loads are primarily capacitive. Only devices produced in a
CMOS technology are capable of behaving in this manner. CMOS devices can, however, handle both TTL and
CMOS inputs.
Devices such as the ones described in this application
note have input characteristics such as those depicted in
Figure 2. While operated in the TTL range, these devices

Figure 2. Input Voltage vs Current

Figure 3. Test Loads
AC Load: R1 = 470Ω (to 5V), R2 = 224Ω (to ground), 30 pF
High Impedance Load: R1 = 470Ω (to 5V), R2 = 224Ω (to ground), 5 pF
Thevenin Equivalent: 152Ω to 1.62V
ticularly because the VOL and VOR changes track to the
same 100 mV.

greater than 25 ns. All devices with access times less than
25 ns have a 5-ns maximum rise and fall time.
The AC load and its Thevenin equivalent in Figure 3
represent the resistive and capacitive load components that
the devices are specified to drive. With either of these
loads, the device must source or sink its rated output current or its specified output voltage. The capacitance stresses the ability of the device output to source or sink sufficient current to slew the outputs at a high enough rate to
meet the AC specifications.
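
The Thevenin equivalent of the resistive divider in Figure 3 follows from the usual divider formulas. The sketch below is a quick check using the 470Ω and 224Ω values and the 5V supply from the figure; the function name is illustrative.

```python
def thevenin_equivalent(v_supply, r_top, r_bottom):
    """Thevenin resistance and voltage of a two-resistor divider."""
    r_th = (r_top * r_bottom) / (r_top + r_bottom)  # resistors in parallel
    v_th = v_supply * r_bottom / (r_top + r_bottom)  # open-circuit divider voltage
    return r_th, v_th


r_th, v_th = thevenin_equivalent(5.0, 470.0, 224.0)
print(f"R_th = {r_th:.1f} ohms, V_th = {v_th:.2f} V")
```

The computed values agree closely with the 152Ω and 1.62V shown for the equivalent load; the small difference in the voltage is rounding.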
The high-impedance load is a convenience to testing
when trying to determine how rapidly the output enters a
high-impedance mode. The resistive divider charges the
capacitance until equilibrium is reached. Allowing for
noise margin, testing for a 500 mV change is normal. By
using a smaller capacitance than normal, you can make
the change occur more quickly, allowing a more accurate
determination of entry into the high-impedance state.
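
To see why the smaller capacitance gives a quicker indication, the load can be approximated as a first-order RC circuit. This is a sketch under stated assumptions: the output is taken to start at 0.4V (an assumed VOL level) when it goes to high impedance, the divider is replaced by its 152Ω, 1.62V Thevenin equivalent, and output leakage is ignored.

```python
import math


def time_to_change(delta_v, v_start, v_final, r_ohms, c_farads):
    """Time for an RC node to move delta_v volts from v_start toward v_final."""
    span = abs(v_final - v_start)
    if delta_v >= span:
        raise ValueError("the node never moves that far")
    return r_ohms * c_farads * math.log(span / (span - delta_v))


R_TH = 152.0    # Thevenin resistance of the divider
V_TH = 1.62     # Thevenin voltage of the divider
V_START = 0.4   # assumed output level when the driver turns off

t_5pf = time_to_change(0.5, V_START, V_TH, R_TH, 5e-12)
t_30pf = time_to_change(0.5, V_START, V_TH, R_TH, 30e-12)
print(f"5 pF: {t_5pf * 1e9:.2f} ns, 30 pF: {t_30pf * 1e9:.2f} ns")
```

Under these assumptions the 500-mV change takes six times longer with the 30-pF load than with the 5-pF load, which is why the lighter load locates the entry into the high-impedance state more precisely.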

Electrostatic Discharge
Because of extremely high input impedance and relatively low breakdown voltage (approximately 30V), MOS
devices have always suffered from destruction caused by
ESD (electrostatic discharge). This problem has had two
effects. First, major efforts to design input protection circuits without impeding performance have resulted in MOS
devices that are now superior to bipolar devices. Second,
care in handling semiconductors is now common practice.
Interestingly enough, bipolar products that once did
not suffer from ESD have now become sensitive to the
phenomenon, primarily because new processing technology involving shallow junctions is in itself sensitive. MOS
devices are in many cases now superior to bipolar
products. A sampling of competitive bipolar and NMOS
64-bit, 1-Kbit, and 4-Kbit products reveals breakdown
voltages ranging from as low as ±150V to greater than ±2000V.
The circuit in Figure 4 protects Cypress products
against ESD. The circuit consists of two thick-oxide field-effect
transistors wrapped around an input resistor and a
thin-oxide device with a relatively low breakdown voltage
(approximately 12V). Large input voltages cause the field
transistors to turn on, discharging the ESD current harmlessly to ground. The thin oxide transistor breaks down
when the voltage across it exceeds 12V; this transistor is
protected from destruction by the current limiting of Rp.
The combination of these two structures provides
ESD protection greater than 2250V, the limit of available
testing equipment. Repeated applications of this stress do
not cause a degradation that could lead to eventual device
failure, as observed in functionally equivalent devices.

Switching-Threshold Variations
Along with input rise and fall times, switching-threshold variations can affect the performance of any
device. Input rise and fall times are under your control
and are primarily affected by capacitive loading of the
driver and bus termination techniques. Switching
threshold is affected by process variations, changes in
VCC, and temperature. Compensation of these variables is
the responsibility of the manufacturer, both at the design
stage and during the manufacture of the device. Combined
threshold shifts over full military temperature ranges and
process variations average less than 100 mV. This translates directly to VIL and VIH variations that track well
within the noise margins of normal system design, par-

Figure 4. Input Protection Circuit (thick-oxide field transistors, thin-oxide transistor, substrate diode, RSUB, and VSUB)

Figure 5. CMOS Cross Section and Parasitic Circuits
Latch-up can be induced at either the inputs or outputs. In true CMOS output structures such as the ones previously discussed, the output driver has a PMOS pull-up
resistor that creates additional vertical bipolar PNP transistors, which compound the latch-up problem. Additional
isolation using the guard ring technique can solve this
problem at the expense of additional silicon area. Because
all the devices of concern here require TTL outputs, the
problem is totally eliminated through the use of an NMOS
pull-up resistor.

CMOS Latch-Up
The parasitic bipolar transistors shown in Figure 5
result in a built-in silicon-controlled rectifier (Figure 6).
Under normal circumstances the substrate resistor RSUB is
connected to ground. Therefore, whenever the signal on
the pin goes below ground by one diode drop, current
flows from ground through RSUB, forward biasing the
lower transistor in the effective SCR. If this current is sufficient to turn on the transistor, the upper PNP transistor is
forward biased, which turns on the SCR and normally
destroys the device.
Two possible solutions are to decrease the substrate
resistance or add a substrate bias generator (Figure 7).
The bias generator technique has several additional
benefits, such as threshold voltage control, which increases device performance. The bias generator is thus
employed in all Cypress products. Also used are guard
rings, which effectively isolate input and output structures
from the core of the device and thus decrease the substrate
resistance by short-circuiting the current paths.

Inducing Latch-Up for Testing Purposes
Exercise care in testing for latch-up because it is typically a destructive phenomenon. The normal method is to
power the device under test with a current-limited supply,
so that when latch-up is induced, insufficient current exists to destroy the device. Once this setup exists, driving
the inputs or outputs with a current and measuring the
point at which the power supply collapses allows nondestructive measurement of latch-up characteristics.
In actual testing, with the device under power, individual inputs and outputs are driven positive and negative with a voltage, and the current at which the
device latches up is measured. This provides the DC latch-up data for
each pin on the device as a function of trigger current.

Figure 6. Parasitic SCR and Bias Generator

Figure 7. Bias Generator Characteristics

Figure 8. Input V/I Characteristics (VCC = 5.0V)

Figure 9. Output V/I Characteristics (VCC = 5.0V)
Note: Output is in a High Impedance Condition.

Measuring the latch-up characteristics of devices
should encompass ranges of reasonable positive and negative currents for trigger sources. Depending on the device,
latch-up can occur at sink or source currents as low as a
few milliamperes to as high as several hundred milliamperes. Devices that latch at trigger currents of less
than 20 to 30 mA are in danger of encountering system
conditions that cause latch-up failure.

done using low-impedance, epitaxial substrates and/or a
substrate bias generator.
The use of a low-impedance substrate increases the
undershoot voltage required to generate the trigger current
that causes latch-up. A substrate bias generator has two
effects that help to eliminate latch-up. First, by biasing the
substrate at a negative (-3.0V) voltage, the parasitic
devices cannot be forward biased unless the undershoot
exceeds -3.0V by at least one diode drop. Second, if undershoot is this severe, the impedance of the bias generator itself is sufficient to deter enough trigger current from
being generated.
The bias generator has one additional noticeable characteristic: It effectively removes the input clamp diode.
This is due to the anode of the diode connecting to the
substrate that is at -3.0V. Therefore, even though the
diode exists, as shown in Figure 4, DC signals of -3.0V
do not forward-bias the diode and exhibit the clamp condition. The benefits of the bias generator are apparent in
higher noise tolerance, as substrate currents due to input
undershoot do not occur.
Figures 8 and 9 represent the voltage and current
characteristics of the devices discussed in this application
note. Figure 8 is characteristic of an input pin, and Figure
9 an output pin in a high-impedance state. In Figure 8, the
input covers +12V to -6V - well outside the +7V to -3V
specification.
Figure 4 helps explain these characteristics. When the
input voltage goes negative, the thin-oxide transistor acts
as a forward-biased diode, and the slope of the curve
is set by the value of Rp. As the input voltage goes positive, only leakage current flows. The output characteristics
in Figure 9 show the same phenomenon, except that, because this is not an input, no protection circuit exists and
therefore no Rp exists. An equivalent thin-film device acts
as a clamp diode that limits the output voltage to approximately -1V at -5 mA.

Competitive Devices
Although few devices compete directly with the
Cypress devices covered in this application note, the
latch-up characteristics of the closest functionally similar
devices were measured. The results show devices that
latch up at trigger currents as low as 10 mA all the way to
devices that can sustain greater than 100 mA without
latch-up. The Cypress devices covered in this application
note can sustain greater than 200 mA without incurring
latch-up, which is far more than it is possible to encounter
in any reasonable system environment.

Eliminating Latch-Up in Cypress RAMs
The latch-up characteristic inherently exists in any
CMOS device. Thus, rather than change the laws of
physics, semiconductor manufacturers design to minimize
latch-up effects over the operating environment that the
device must endure. The environmental variables include
temperature, power supply, and signal levels, as well as
process variations.
Several techniques are employed to eliminate the
latch-up phenomenon. One approach is to move the trigger threshold outside the operating range so that the voltage level never approaches this threshold. This can be


Understanding Dual-Port RAMs
This application note examines the evolution of
multi-port memories and explains the operation and
benefits of Cypress's dual-port RAMs.
A dual-port RAM is a random-access memory that
can be accessed simultaneously by two independent entities. In digital ICs, this implies a dual-port memory
cell that can be accessed at the same time using two
independent sets of address, data, and control lines.

A Brief History of Multi-Port Memories

The first multi-port memories were probably used
in the CPU of the first computers. Many two-operand
instructions are efficiently implemented using dual-port
registers for the operands and the result.
For example, consider Equation 1, which describes
a typical two-operand operation in the ALU (arithmetic
logic unit) of a CPU:
(C) = (A) [OPERATOR] (B)        Eq. 1
A and B could be either the operands (i.e., the
data) or the addresses of the operands, in which case
the data could be either in memory or in registers. In
any case, Equation 1 describes two pieces of data, A
and B, being operated upon by the OPERATOR and
the results designated as C. C could also be the data, a
register, or a memory location. OPERATOR could be
arithmetic or logical.

The Combinatorial ALU
The 74181 was the first integrated circuit ALU. In
this IC, the 4-bit operands, A and B, are operated upon
according to a 4-bit command; the result, C, is output.
The chip also provides a carry-in input, a carry-out output, and A = B outputs. A mode-control pin selects
either logical or arithmetic operations. The 74181 is
combinatorial; no storage is provided.
Early computers used the contents of a memory
location as one operand and an accumulator in the
CPU as the second operand. The results were usually
stored in the accumulator.

Bringing the Registers On Chip

The 67901 was the first 4-bit slice that brought 16
4-bit registers onto the chip. The MMI 67901 was
second-sourced by AMD and became the 2901. At one
point, five vendors offered this industry-standard
bipolar ALU. The Cypress CMOS CY7C901 is the
highest-performance, TTL-compatible, 4-bit slice that is
form, fit, and functionally equivalent to the original 901.
The 16-word deep, 4-bit wide register array is functionally equivalent to a 16 x 4 dual-port memory. Four
A address lines and four B address lines select the contents of two of the 16 registers, whose outputs are applied to transparent latches. The latch outputs are then
applied to 3:1 multiplexers, whose outputs drive the
ALU inputs. The ALU outputs can be sent off chip,
entered into a temporary register (Q), or written back
into the register file, thus replacing one of the operands.
This architecture is shown in the CY7C901 block
diagram in the Cypress data book.

CY7C901 Dual-Port Memory Operation
A simplified CY7C901 block diagram appears in
Figure 1. The device's A and B addresses select the contents of two registers, whose outputs are applied to two
4-bit latches. When the clock (CP) is High, the latch
outputs follow their data inputs (i.e., are transparent).
When the clock is Low, the ALU outputs are written
(WE) into the register array at the location specified by
the A or B addresses, depending upon the instruction
being executed. A Low on the clock causes the data in
the latches not to change, so that the ALU outputs are
stable when they are written back into the register
array.

Figure 1. **901 Dual-Port Memory (Simplified)

Note that the CY7C901 does not perform the
three-port function described by Equation 1. In the
CY7C901, the C operand equals either the A or B
operand, depending upon the instruction being executed. In fact, the A and B addresses can be the same.
An old programming trick is to Exclusive-OR the contents of a register with itself, which clears the register.
Additionally, the CY7C901's dual-port memory
does not use a dual-port memory cell. This type of cell
is not required because the CY7C901 does not need the
ability to simultaneously write independently to two
separate memory locations.
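
The two-read, one-write behavior described above, including the Exclusive-OR trick, can be modeled in a few lines. This is a behavioral sketch only: the class, method, and parameter names are hypothetical, and it ignores the latches, clock phases, and instruction decoding of the real part.

```python
import operator


class RegisterFile:
    """Toy 16 x 4 register array with two read addresses and one write-back."""

    def __init__(self, words=16, width=4):
        self.mask = (1 << width) - 1
        self.regs = [0] * words

    def execute(self, a_addr, b_addr, op, dest="B"):
        # Both operands are read in the same cycle (the dual-port read).
        a = self.regs[a_addr]
        b = self.regs[b_addr]
        result = op(a, b) & self.mask
        # The result replaces one of the operands; there is no third address.
        target = a_addr if dest == "A" else b_addr
        self.regs[target] = result
        return result


rf = RegisterFile()
rf.regs[3] = 0b1011
rf.execute(3, 3, operator.xor)       # A and B addresses are the same
print(rf.regs[3])                    # register 3 has been cleared

rf.regs[1], rf.regs[2] = 5, 12
rf.execute(1, 2, operator.add)       # result written back to the B register
print(rf.regs[2])                    # 4-bit result of 5 + 12
```

Because the result always overwrites the A or B register, the model needs only one write port, which mirrors the reason the text gives for not needing a dual-port memory cell.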

Figure 2. Dual-Port Memory Using Single-Port RAM
data, converts the data to analog form, and sends the
data out over the communications channel on the transmit side. If the system contains only one processor, the
data buffers are not shared, and the system needs
neither a virtual nor a physical dual-port RAM.
Control information associated with each data buffer tells the communications controller the number of
words in the buffer and the starting address of the data
in the buffer. The control information resides in one or
more memory locations whose addresses have been previously agreed upon by the two processors.
This simple software-based buffer example requires
a second level of control: a mechanism or procedure
that prevents the two microprocessors from getting in
each other's way. In other words, the system needs a
procedure control mechanism.
Another way of analyzing this requirement introduces the concept of data ownership. Say, for example,
that processor A assembles and stores messages and
thus owns the data while performing these tasks.
Likewise, the communications processor B owns the
data while performing its tasks. The procedure control
mechanism amounts to a technique for transferring data
ownership between processor A and B.
In large systems, where many processors perform
many different operations, the processing of the information is called a job or a procedure. The procedure is
divided into many tasks, which can be performed by different processors. The tasks can either be scheduled
and assigned by a processor dedicated to that task or be
performed by any available processor. These alternatives are referred to as autocratic and egalitarian systems, respectively. The term egalitarian implies that the
processors are treated equally. In either case, the
processors must have access to a shared memory location used for message passing.
Synchronizing sequential processes is the
cornerstone of concurrent programming, which applies
to multi-tasking, single-processor systems; distributed-processor networks; and tightly-coupled multiprocessor
systems.

Dual-Port Memory Using Single-Port RAM
Before the dual-port memory cell existed, designers
created dual-port RAMs from single-port RAMs by adding a multiplexer between the RAM and the two entities that shared the RAM. Figure 2 illustrates a block
diagram of such an arrangement. Two processors, MP1
and MP2, share the RAM. If each processor has access
to the RAM half the time, the resource is shared equally and is said to be allocated according to a fairness
doctrine.
This time division multiplexing assures that there is
no contention for the RAM. However, performance suffers if the RAM's access time is not 1/2 or less
of the processors' clock period, assuming that the
processors are clocked from the same source.
For example, consider two processors clocked from
the same 25-MHz source, for a period of 40 ns. Because
the processors are closely coupled, only one operating
system is in memory. In this case, the maximum access
time of the dual port has to be 20 ns or less. The
highest-speed dual-port RAM available has a 25-ns access time, so each 40-ns cycle stretches to 50 ns. Therefore, each processor suffers a worst-case 20% performance degradation.
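
The 20% figure in the example above can be reproduced with a short calculation. This is a sketch that assumes each processor's cycle simply stretches until the shared RAM can serve every sharer once per cycle; the function name and the `sharers` parameter are illustrative.

```python
def tdm_degradation(clock_hz, ram_access_ns, sharers=2):
    """Fractional slowdown when a time-multiplexed RAM is too slow."""
    period_ns = 1e9 / clock_hz
    slot_ns = period_ns / sharers          # time each sharer gets per cycle
    if ram_access_ns <= slot_ns:
        return 0.0                         # RAM is fast enough, no penalty
    stretched_period_ns = ram_access_ns * sharers
    return 1.0 - period_ns / stretched_period_ns


loss = tdm_degradation(25e6, 25.0)
print(f"{loss:.0%}")
```

With a 25-MHz clock and a 25-ns RAM, the 40-ns period stretches to 50 ns, so each processor runs at 40/50 of its nominal rate.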

Dual-Port RAM Applications
The first applications for dual-port memories were
for CPU register files. Dual-port RAMs can also serve
as data or instruction cache memories. However, the
largest usage of dual-port RAMs is in communications,
which includes the exchange of data between processors, processes, and systems.
Virtual Dual-Port RAM

Communication between systems does not require
physical dual-port RAMs. Instead, a conventional RAM
memory is partitioned into virtual data-storage areas
(buffers), usually to store at least two data packets.
These buffers are shared between the communications
controller and the intelligent element that assembles the
packets and stores them (usually a microprocessor).
The communications controller can also be a
microprocessor. It reads the data from memory, converts the data from parallel to serial form, encodes the

Message Passing

In the two-processor system under consideration,
synchronization can be achieved by using a lockword or
lockvariable. The lockvariable can apply either to data
(as in this example) or to executable instructions.
The lockvariable is a location in shared memory
that is operated upon using two synchronization primitives: LOCK (v) and UNLOCK (v), where (v) is the
location operated upon. These are simple binary switch
operations. If a processor wishes to lock or own a critical section of code or data, the processor indivisibly sets
the lockvariable if testing shows the lockvariable to be
zero. If the lockvariable is not zero, then the operation
is repeated until the lockvariable is zero. To unlock the
critical section, a processor sets the lockvariable to zero
and continues.
Most modern processors have indivisible
read/modify/write instructions, also called test and set
(TAS) instructions. In Reference 1, however, E. W.
Dijkstra shows that lockvariables can be implemented
without using a read/modify/write instruction. And in
Reference 2, he develops the semaphore, a technique for
managing a queue of tasks waiting for a resource. Lockvariables surround or bracket semaphores and thus provide entry and exit control on a mutual exclusion basis.

Note that this procedure does not require the use
of a dual-port RAM. The procedure does require each
processor to perform a TAS instruction, clear the lockvariable, and send a message to the other processor.
Sending a message implies writing to a location in
shared memory. To know that a message is waiting, the
processor receiving the message must either read the
memory location periodically (referred to as polling a
mailbox) or the act of writing to the mailbox must
generate an interrupt to the receiving processor. The interrupt-driven alternative is usually preferred because
the receiving processor does not have to waste time in a
polling sequence.

Dual-Port RAM Cell History
The first dual-port RAM ICs to use a dual-port
RAM cell were the Synertek SY2130 and SY2131, introduced in 1983. These products are organized as 1024
words of 8 bits and use n-channel, double-poly silicon
technology to achieve 100-ns access times. The SY2130
has an automatic power-down feature controlled by the
chip enables, and the SY2131 does not. The smaller
(512 X 8) SY2132 and SY2133 were similar but unsuccessful.
The original dual-port RAMs include two mailboxes for message passing. When written to from one
port, a mailbox generates an interrupt to the opposite
port. Additionally, on-chip arbitration logic generates a
busy signal to the loser when both left and right ports
address the same memory location. If the loser was attempting to write, the write is suppressed.
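
A small behavioral model may help make the mailbox and busy mechanisms concrete. Everything in this sketch is an assumption made for illustration: the mailbox addresses (the top two locations), the rule that reading a mailbox clears its interrupt, and all class and method names. Consult the device data sheet for the actual behavior.

```python
class DualPortRam:
    """Toy model of a dual-port RAM with two mailboxes and busy arbitration."""

    def __init__(self, size=1024):
        self.cells = [0] * size
        # Assumed map: a write by the key port to this address interrupts
        # the opposite port.
        self.mailbox = {"L": size - 1, "R": size - 2}
        self.interrupt = {"L": False, "R": False}

    @staticmethod
    def _other(port):
        return "R" if port == "L" else "L"

    def write(self, port, addr, value):
        self.cells[addr] = value & 0xFF
        if addr == self.mailbox[port]:
            self.interrupt[self._other(port)] = True

    def read(self, port, addr):
        if addr == self.mailbox[self._other(port)]:
            # Assumed: reading the sender's mailbox clears the interrupt.
            self.interrupt[port] = False
        return self.cells[addr]

    def contend(self, addr, first, writes):
        """Both ports address `addr` together; `first` wins arbitration.

        `writes` maps a port to the value it is trying to write (a port
        missing from the map is reading). Returns the port that sees BUSY.
        """
        loser = self._other(first)
        if first in writes:
            self.write(first, addr, writes[first])
        # Any write attempted by the loser is suppressed.
        return loser


ram = DualPortRam()
ram.write("L", 1023, 0x42)             # left port posts a message
print(ram.interrupt["R"])              # right port is interrupted
print(hex(ram.read("R", 1023)))        # right port reads the message
busy_port = ram.contend(5, "L", {"L": 1, "R": 2})
print(busy_port, ram.cells[5])         # loser sees busy, its write is dropped
```

The model keeps only the two ideas from the text: a mailbox write raises an interrupt on the opposite port, and the losing port in a same-address contention sees busy and has its write suppressed.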
Most of the dual-port RAMs on the market today
are functionally equivalent to the original Synertek
products. The "new features" added to several dual-port
RAM products by Motorola and Integrated Device
Technology (IDT) include dedicated semaphore
registers. These semaphores are unnecessary, however,
and the products that use them do not have second
sources.
The SY2130 was second-sourced by IDT in 1984
and Advanced Micro Devices (AMD) in 1985. IDT also
doubled the density to 2K X 8 and called the new part
the IDT7132. Due to pin limitations (48 pins), the interrupt functions were deleted.
The AMD part (Am2130, 1024 X 8) had at least
three logic errors. A busy-going-active indication failed
to reset the interrupt when both ports addressed the
same mailbox location. Additionally, busy going inactive
failed to retrigger the address transition detection circuitry at all locations. And finally, when contention occurred and both ports were attempting to write, the
losing port was not prevented from writing. The data
sheet for this device does not explicitly state these conditions, but they must occur for the device to make logical sense (more on this later).
In 1985 IDT added slave companion parts to the
company's dual-port family. The IDT7140 (1024 X 8) is
the slave to the IDT7130, and the IDT7142 (2K X 8) is

Typical TAS Instruction

The current example assumes that the processors
have a TAS instruction. A typical TAS instruction
operates as follows: Read, test, and set to X. The addressed memory location is read, and if its contents are
zero, the value X is written into that location. If the
contents are not zero, the contents are returned to the
processor, and the value in the memory location is not
disturbed.
The usual convention is that a value of zero in the
lockvariable means that the resource associated with it
is available. A non-zero value means that another
processor temporarily owns the resource and that the
resource is not available. After performing the task associated with the lockvariable, the processor sets the
lockvariable's value to zero. The system is initialized
with all lockvariables set to zero.
In the current example, processor A performs a
TAS operation on the lockvariable and, finding the
lockvariable zero, sets the lockvariable to a one. This
tells processor B that the message is in the process of
being assembled in the memory buffer area and is not
ready to be transmitted. Processor A then assembles the
message. After the message is assembled, processor A
clears the lockvariable, sends a message to processor B
saying that the message is ready to be transmitted, and
gives the data's location and the number of bytes to be
sent. Processor B reads the message from processor A
and performs a TAS operation on the lockvariable;
finding the lockvariable zero, processor B sets it to a
two. This tells processor A that the message is in the
process of being transmitted. Processor B then transmits the message and clears the lockvariable. Processor
B sends processor A a message that the transmission
task has been completed. After receiving the message
from processor B, processor A performs a TAS operation on the lockvariable; finding the lockvariable zero,
processor A concludes that the message has been successfully transmitted.
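The handshake just described can be traced step by step. The sketch below is a single-threaded walk-through, so the TAS operation is trivially indivisible; the function names, the dictionary used as memory, and the lockvariable address are hypothetical.

```python
def tas(memory, address, x):
    """Read, test, and set to x: write x only if the location holds zero.

    Returns the value that was read, as a typical TAS instruction would.
    """
    old = memory[address]
    if old == 0:
        memory[address] = x
    return old


def handshake():
    """Replay the processor A / processor B sequence from the text."""
    LOCK = 0x000  # hypothetical address of the lockvariable
    memory = {LOCK: 0}  # system initialized with the lockvariable at zero
    trace = []

    # A finds zero and sets the lockvariable to one while assembling the message.
    trace.append(("A claims", tas(memory, LOCK, 1), memory[LOCK]))
    # A clears the lockvariable and tells B the message is ready.
    memory[LOCK] = 0
    trace.append(("A clears", None, memory[LOCK]))
    # B finds zero and sets the lockvariable to two while transmitting.
    trace.append(("B claims", tas(memory, LOCK, 2), memory[LOCK]))
    # B clears the lockvariable and reports completion to A.
    memory[LOCK] = 0
    trace.append(("B clears", None, memory[LOCK]))
    # A finds zero again and concludes the transmission succeeded.
    trace.append(("A checks", tas(memory, LOCK, 1), memory[LOCK]))
    return trace
```

Each trace entry is (step, value returned by TAS, lockvariable afterwards). Note that the final check is itself a TAS, so in this model it leaves processor A owning the lockvariable again.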

which requires one or more of the same resources to
perform its task.
For example, if processor A owns resource X and
processor B owns resource Y, and both resources are
required to accomplish the task, a stalemate occurs in
which each processor waits for the other to relinquish
the required resource. This is the simplest example. The
concept extends to n processors and m resources.
The solution to the deadly embrace depends upon
whether the system is autocratic or egalitarian, the tasks'
priorities, etc., and is beyond the scope of this discussion. In the case of dual-port RAMs, however, the solution is simple: Do not cascade two masters in width; use
a master and a slave.
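The two-processor stalemate can be expressed as a cycle in a "who waits for whom" relation. The following is a minimal sketch for illustration only; the function name and the two dictionaries are hypothetical, and it detects the embrace rather than resolving it.

```python
def find_deadly_embrace(owns, wants):
    """Return the processors locked in a circular wait, or None.

    owns:  maps each resource to the processor that currently owns it.
    wants: maps each waiting processor to the resource it is waiting for.
    """
    for start in wants:
        chain = []
        processor = start
        # Follow the chain: processor -> wanted resource -> owner of that resource.
        while processor in wants and wants[processor] in owns:
            if processor in chain:
                return chain[chain.index(processor):]
            chain.append(processor)
            processor = owns[wants[processor]]
    return None
```

With `owns = {"X": "A", "Y": "B"}` and `wants = {"A": "Y", "B": "X"}`, the function reports that A and B wait on each other, which is the simplest case described above.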

the slave to the IDT7132. The slave device provides
word-width expansion. Busy is an input to the slave
from the master, and the slave contains no arbitration
logic. One master can drive many slaves. This
arrangement avoids the classic deadly embrace problem
described in the next section.

The Deadly Embrace
The deadly embrace can occur when two masters
are connected in parallel to make a wider word. If the
left and right port addresses match, and the left and
right port chip enables then become active to both chips
at approximately the same time, it is possible to have
one port of one master lose and the opposite port of the
other master also lose. In other words, if an address
match occurs and both ports are enabled during a small
time window, or aperture of uncertainty, the dual-port
RAM cannot determine which port wins or loses.
Under these conditions, if the corresponding left
and right port busy pins are connected together, the busy
signals to both ports of both masters are active (Low). This condition
occurs because the busy outputs are open drain, and the
loser pulls the node Low.
This condition is the simplest example of the deadly
embrace. So far as the external world is concerned,
both ports are busy, and the system remains locked up
indefinitely, with each port waiting to be released by the
other. Each master's arbiter section thinks it has lost
the arbitration and is waiting to be released by the
other.
In general, the deadly embrace occurs under two
conditions: a processor requires one or more resources
to perform a task, and one or more of the required
resources is temporarily owned by another processor,

The Cypress Dual-Port RAM Family
Table 1 lists the members of the Cypress dual-port

RAM family. The package designator D26 stands for
600-mil ceramic DIP, and P25 stands for 600-mil plastic
DIP. The 48-pin ceramic leadless chip carrier (LCC) is
designated as L68. The 52-pin packages are designated
as L69 for ceramic LCC and J69 for plastic LCC
(PLCC).
Note that the interrupt function is not available at
the 2048 X 8 level in a 48-pin package. This is due to
pin limitations. At the 2-Kbyte level, each port requires
an additional address pin for the address's most significant bit.
The M/S column in Table 1 indicates whether the
device is a master or slave. The difference between
these devices is that the masters have arbitration logic
and the slaves do not. The busy signals are outputs from
the master and inputs to the slave. (The ramifications of
this are examined later.)

Table 1. The Cypress Dual-Port RAM Family

                                              Package Options
                                    48-Pin DIP       48-Pin Square   52-Pin Square
Configuration  Part Number  M/S  Ceramic  Plastic         LCC         LCC    PLCC
1K X 8         CY7C130      M    D26      P25             L68         ---    ---
1K X 8         CY7C131      M    ---      ---             ---         L69    J69
1K X 8         CY7C140      S    D26      P25             L68         ---    ---
1K X 8         CY7C141      S    ---      ---             ---         L69    J69
2K X 8         CY7C132      M    D26      P25             L68         ---    ---
2K X 8         CY7C136      M    ---      ---             ---         L69    J69
2K X 8         CY7C142      S    D26      P25             L68         ---    ---
2K X 8         CY7C146      S    ---      ---             ---         L69    J69

Note: The interrupt function is not available at the 2K X 8 level in a 48-pin package.

interrupted port reads that memory location, the interrupt is reset.
When both ports address the same memory location and both chip enables are active (Low), contention
occurs for that address. An arbitration is then performed, and ownership of the memory location is assigned to the winner. An active (Low) busy signal
notifies the loser of the arbitration.

Dual-Port RAM Functional Description
Figure 3. Dual-Port RAM Block Diagram

An important aspect of the Cypress dual-port
RAMs is their interrupt logic. A simplified logic
diagram of this logic appears in Figure 4, with the chip
enables deleted. A port's chip enable must be asserted
for the port to either read from or write to any location,
including the mailboxes. Note that you can use the mailbox locations as conventional memory by not connecting
the interrupt line to the appropriate processor.
The upper two memory locations (7FF and 7FE for
2K x 8; 3FF and 3FE for 1K x 8) can be used for message passing. The highest memory location serves as the
mailbox for the right processor. When the left processor
writes to this mailbox, the interrupt (request) to the
right processor, INTR, goes Low. When the right
processor reads its mailbox, the flip-flop is reset, and
INTR goes High.
The second highest memory location serves as the
mailbox for the left processor. When the right processor
writes to this mailbox, the interrupt (request) to the left
processor, INTL, goes Low. When the left processor
reads its mailbox, the flip-flop is reset, and INTL goes
High.
Note that each port can read the other port's mailbox without resetting the associated flip-flop. If your application does not require message passing, leave the
appropriate pin open. Do not connect a pull-up resistor
to the pin, and do not connect the pin to the processor's
interrupt request pin.
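The mailbox behavior described above can be modeled in a few lines. This is a behavioral sketch only: the class and attribute names are hypothetical, chip enables and the busy signal are ignored, and the interrupt outputs are represented as booleans where `True` means High (inactive).

```python
class MailboxModel:
    """Behavioral model of the two mailbox locations of a dual-port RAM."""

    def __init__(self, size=2048):
        self.mem = [0] * size
        self.right_mailbox = size - 1  # highest location (7FF for 2K x 8)
        self.left_mailbox = size - 2   # second highest location (7FE for 2K x 8)
        self.int_r = True  # INTR High: no interrupt pending for the right processor
        self.int_l = True  # INTL High: no interrupt pending for the left processor

    def write(self, port, addr, data):
        """Write one byte from port 'L' or 'R'; a mailbox write interrupts the other side."""
        self.mem[addr] = data & 0xFF
        if port == "L" and addr == self.right_mailbox:
            self.int_r = False  # INTR goes Low
        elif port == "R" and addr == self.left_mailbox:
            self.int_l = False  # INTL goes Low

    def read(self, port, addr):
        """Read one byte; only the mailbox owner's read resets its interrupt."""
        if port == "R" and addr == self.right_mailbox:
            self.int_r = True
        elif port == "L" and addr == self.left_mailbox:
            self.int_l = True
        return self.mem[addr]
```

For example, after `ram.write("L", 0x7FF, 0x5A)` the model shows INTR Low; a left-port read of 0x7FF leaves it Low, and a right-port read of 0x7FF returns it High.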
Note that the active state of the busy signal
prevents a port from setting the interrupt to the winning
port. Additionally, an active busy signal to a port
prevents that port from reading its own mailbox and
thus resetting the interrupt. These operations are
ramifications of the data-ownership concept.
If both ports address the same memory location at
the same time, the master performs an arbitration, so
that one port wins and the other loses. Because each of
the two ports can be in either the reading or writing
state, there are four possible combinations of ports and
states (Table 2).

Cypress Dual-Port RAM Operation
A simplified block diagram of the Cypress dual
port RAM appears in Figure 3. The device interface includes three types of signals: address, data, and control.
There are two sets of these signals: those of the left port
and those of the right port. Each signal has either the
subscript L or R to designate left or right, respectively.
The address pins are designated A0 through A9
(1024 X 8) and A0 through A10 (2048 X 8), where A0
is the least significant bit (LSB) and A9 or A10 is the
most significant bit (MSB). The address pins are
unidirectional inputs to the device; their states specify
the memory location to be read from or written into.
The data pins are designated I/O0 through I/O7,
where I/O0 is the LSB and I/O7 is the MSB. The data
pins are bidirectional; their states represent either the
data to be written or the data to be read.
The control pins are chip enable (CE), read/write
(R/W), and output enable (OE). Two flags are also
provided, INT and BUSY; both have open-drain outputs and require external pull-up resistors. A Low on
the chip enable input allows that port to become functional. Data is either read from the internal dual-port
RAM array or written into it, depending upon the state
of the read/write signal; a Low initiates a write operation. The three-state data output drivers are enabled by
a Low output enable.
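The effect of the three control pins on one port can be summarized as a small decision function. This is a sketch for illustration; the function name and the returned strings are hypothetical, and the arguments are the logic levels on the pins (0 for Low, 1 for High).

```python
def port_cycle(ce, rw, oe):
    """Classify the cycle seen by one port from its control-pin levels.

    ce: chip enable (active Low), rw: read/write (Low = write),
    oe: output enable (active Low).
    """
    if ce != 0:
        return "port not enabled"
    if rw == 0:
        return "write"
    if oe == 0:
        return "read, outputs driven"
    return "read, outputs high-impedance"
```

The text does not spell out the role of output enable during a write, so the sketch reports a write regardless of the OE level.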
When one port writes to a pre-determined mailbox,
an interrupt to the other port is generated. When the

Both Ports Reading
If both ports read the same location at the same
time, you would assume that both ports should read the
same data. This is true for all dual-port ICs. When arbitration occurs as a result of contention in a Cypress
dual-port RAM, the port that wins the arbitration gets
temporary ownership of the memory location. The

Figure 4. Interrupt Logic

Table 2. Functional Operation of Dual-Port Masters

            OPERATION            RESULT OF OPERATION AFTER ARBITRATION (MASTER)
CASE  LEFT PORT  RIGHT PORT      CYPRESS and IDT              AMD
1     READ       READ            BOTH PORTS READ              BOTH PORTS READ
2     READ       WRITE           LOSER PREVENTED FROM         LOSER WRITES. WINNER, IF
3     WRITE      READ            WRITING. IF LOSER IS         READING, MIGHT HAVE
4     WRITE      WRITE           READING AND PORTS ARE        CORRUPTED DATA AND
                                 ASYNCHRONOUS, DATA           NOT KNOW IT
                                 READ MIGHT NOT BE VALID
                                 (cases 2 through 4)          (cases 2 through 4)

losing port can read the memory location but is told
that it lost the arbitration by the busy signal.
To guarantee data integrity in a multiprocessor system, it is standard practice to apply the concept of data
ownership. This ownership can apply to executable
code, data, or control locations in memory. The control
locations in memory can be associated with a resource,
such as a printer, tape drive, disk drive, or communications port.

The arbitration logic consists of left and right address equality comparators with their associated delay
buffers; the arbitration latch formed by the crosscoupled, three-input NAND gates labeled L and R; and
the gates that generate the busy signals.
Operation With Unequal Addresses

When the addresses of the right and left ports are
not equal, the outputs of the address comparators
(nodes A and B) are both Low, and the outputs of the
gates labeled L and R (nodes C and D) are both High.
This condition forces both Busy signals High and both
Write Inhibit signals High. The arbitration latch does
not function as a latch.

One Port Reading, the Other Writing

In the AMD dual-port RAM, the losing port is not
prevented from writing. In the Cypress and IDT
devices, the losing port is prevented from writing. All
dual-port RAMs assert a busy signal to the losing port,
so that this port can tell that the data might be corrupted.
In the Cypress dual-port RAMs, the losing port is
prevented from writing so that the data cannot be corrupted. Busy is asserted to the losing port, so that the
port can tell that its read or write operation might not
have been successful.

Left Port Camped on an Address

Next, consider the condition where the left-port address and chip enable are quiescent, and the right port
address changes to an address equal to that of the left
port. Nodes A and B are initially Low.
Because the right-port address does· not go through
the delay buffer, the output of the right-address com-

Both Ports Writing

In the AMD dual-port RAMs, both are allowed to
write. Busy is asserted to the losing port, indicating that
the data might be corrupted. However, the winning port
is not told that the data it just wrote might be corrupted
by the writing of the losing port. This situation can
cause system errors.
In the Cypress and IDT dual-port RAMs, the
losing port is prevented from writing, so that the data
cannot be corrupted. Busy is asserted to the losing port,
indicating that its write operation was unsuccessful.
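The difference between the device families during contention reduces to one question: does the losing port's write reach the array? The sketch below encodes only what the text states; the function names and the family strings are hypothetical.

```python
def loser_write_lands(family, loser_operation):
    """Return True if the losing port's write reaches the memory array.

    family: "cypress", "idt", or "amd"; loser_operation: "read" or "write".
    """
    if family not in ("cypress", "idt", "amd"):
        raise ValueError("unknown device family: " + family)
    if loser_operation != "write":
        return False  # a read never changes the array
    # Only the AMD part lets the loser write; Cypress and IDT inhibit it.
    return family == "amd"


def busy_asserted_to_loser(family):
    """All of the dual-port RAMs discussed assert busy to the losing port."""
    if family not in ("cypress", "idt", "amd"):
        raise ValueError("unknown device family: " + family)
    return True
```

A `True` result from `loser_write_lands` marks the case in which the winning port's data can be corrupted without the winner being told.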

Arbitration Logic
Figure 5 shows the arbitration logic used in Cypress
dual-port RAM masters. The arbitration logic has three
functions: to decide which port wins and which loses if
the addresses are equal simultaneously; to prevent the
losing port from writing; and to provide a busy signal to
the losing port.

Figure 5. Arbitration Logic

parator (node B) goes High before node A goes High
by a delay interval, d. The delay must be greater than
the delay through the R gate, so that when node B goes
High, node D goes Low, causing node C to remain
High. CE(R) and CE(L) are both High; they are the
inverse of the chip enable inputs. Node D going Low
causes the output of the BR gate to go Low, which tells
the right port that the memory location it just addressed
belongs to the left port. A write-inhibit signal is also
generated that prevents the right port from writing into
the addressed memory location.
In summary, when the right port addresses a
memory location that is already being addressed by the
left port, a delay occurs that equals the sum of the
propagation delays of the right-address comparator, the
R gate, the BR gate, and the output driver (not shown
in the diagram). Then the busy signal to the right port is
asserted. Nodes A, B, and C are now High, and node D
is Low. BUSY is asserted to the right port.
Due to the symmetry of the arbitration logic, the
device operates the same when either the right or left
ports are camped on an address.
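The behavior of the arbitration latch can be reproduced with a small gate-level simulation. The wiring used here is an assumption inferred from the description, since the figure itself is not reproduced: gate L is taken as NAND(A, CE(L), D) driving node C, and gate R as NAND(B, CE(R), C) driving node D. The function names are hypothetical.

```python
def nand(*inputs):
    """Three-input NAND (any number of inputs): Low only when all inputs are High."""
    return 0 if all(inputs) else 1


def settle(a, b, c, d, ce_l=1, ce_r=1, max_steps=8):
    """Iterate the cross-coupled L and R gates until nodes C and D stop changing.

    a, b: address comparator outputs (nodes A and B).
    c, d: present outputs of the L and R gates (nodes C and D).
    ce_l, ce_r: inverted chip enables, High when the port is enabled.
    """
    for _ in range(max_steps):
        next_c = nand(a, ce_l, d)  # assumed wiring of gate L
        next_d = nand(b, ce_r, c)  # assumed wiring of gate R
        if (next_c, next_d) == (c, d):
            return c, d
        c, d = next_c, next_d
    raise RuntimeError("latch did not settle in this idealized model")
```

Starting from unequal addresses (`settle(0, 0, 1, 1)` gives C and D both High), raising node B first and then node A leaves C High and D Low, which matches the summary above for a right port arriving at an address the left port is camped on. This idealized model updates both gates in lock step, so it cannot resolve the case where A and B rise at exactly the same instant.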

Figure 6. Busy Timing

time, another cycle must be added to detect the condition, which can severely reduce performance. This time
is less than the minimum cycle time for all speed grades
of all Cypress dual-port RAMs.
Another parameter, Busy High from address mismatch, tBHA, is the maximum time it takes busy to go
from Low to High, as measured from the time the two
port addresses do not match until the busy signal goes
High. The comments of the preceding paragraph also
apply here.
The next two parameters are similar to the preceding two. The difference is that the chip enable controls
the busy signal. The parameters are Busy Low from CE
Low, tBLC, and Busy High from CE High, tBHC. Both of
these parameters are less than the minimum cycle time
for all speed grades of all Cypress dual-port RAMs.
Busy High to valid data, tBDD, is the maximum time
it takes the data to become valid to the losing port after
Busy goes away. This parameter's value equals the address access time, tAA, because a read cycle is initiated
to the losing port when its Busy signal transitions from
Low to High. An action by either port can cause the
busy transition. The winning port can either change its
address or deassert its chip enable.
To illustrate the last two parameters, Figure 6
shows the timing for the right port performing a write
operation and the left port asynchronously moving to
the same address and attempting to perform a read
operation. The first parameter of interest is tDDD,
which is the maximum time between the stabilization of
the data to be written by the winning port and that same
data becoming valid at the outputs of the port that
received the Busy. The second parameter of interest is
tWDD, which is the maximum time between the High-to-Low transition of the winning port's write strobe and
the data becoming valid at the outputs of the port that
received the Busy.
It is possible for the losing port to read either the
old data, the new data, or some random combination of

Right and Left Addresses Equal Simultaneously
In the general case, it is possible to have both ports
access the same memory location simultaneously, unless
this is guaranteed not to occur by the design of the system. When nodes A and B go from Low to High at
exactly the same instant, the arbitration latch settles into
one of two states and determines which port wins and
which port loses. The latch is designed such that its two
outputs are never Low at the same time. It also has a
very fast switching time.
The dual-port RAM imposes a minimum time difference between either of two events: the two chip
enables going from inactive to active and the two sets of
addresses going from mismatch to equal. If the events
are close together in time, the probability of each port
either winning or losing the arbitration is approximately
equal. This parameter is called port set-up time for
priority and is abbreviated as tPS on the data sheets.
The specified value is 5 ns. (Note, though, that Cypress
product engineers have measured tPS at room temperature and nominal VCC (5V) and found a value of approximately 200 ps.) In other words, if one port addresses a memory location 5 ns before the other port, the
first port is guaranteed to win. If not, the result of the
subsequent arbitration is unpredictable.
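The port set-up time for priority can be read as a simple rule on arrival times. The sketch below applies the specified 5-ns value; the function and constant names are hypothetical, and the arrival times are whatever instant each port presents the matching address with its chip enable active.

```python
T_PS_NS = 5.0  # specified port set-up time for priority, in nanoseconds


def arbitration_winner(t_left_ns, t_right_ns, t_ps_ns=T_PS_NS):
    """Predict the arbitration result from the two arrival times.

    Returns "left" or "right" when one port leads by at least t_ps_ns,
    otherwise "unpredictable".
    """
    lead_of_left = t_right_ns - t_left_ns
    if lead_of_left >= t_ps_ns:
        return "left"
    if -lead_of_left >= t_ps_ns:
        return "right"
    return "unpredictable"
```

For instance, a left port that arrives 5 ns or more ahead of the right port is guaranteed to win, while a 2-ns lead gives no guarantee either way.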

Other Key Busy Parameters
Several other key parameters are specified with
respect to the busy signal. For example, Busy Low from
address match, tBLA, is the maximum time it takes busy
to go Low, as measured from the time the two port addresses
are the same. This is the time from an address
match until the losing port is notified that it has lost the
arbitration. Obviously, the sooner this occurs the better.
If the value of tBLA is greater than the memory cycle

the two under these circumstances: the two ports are
operating asynchronously (i.e., with independent
clocks), and the conditions illustrated in Figure 6 occur
(winning port writing and losing port reading). If the
read occurs early with respect to the write, old data is
read. If the read occurs late with respect to the write,
new data is read. And, if the read occurs at the same
time the data is changing from old to new, the data read
is not predictable. However, all is not lost. There are
two general solutions. Both use the fact that the busy
signal is asserted to the losing port, telling the port in
this instance that the data it is reading might not be
valid.
One solution is to use the High-to-Low transition
of the busy signal to the losing port to generate an interrupt to the processor (or state machine) so that the operation can be repeated. The drawback of this technique is
that a snapshot of the states of the losing port's address
lines and read/write line must be taken, so that the
processor can tell what load/store operation caused the
interrupt. Taking this snapshot requires latches or flip-flops for the data and control logic for doing the sampling, and the technique uses up an interrupt line. The
processor must also be able to read the sampled data
later.
A second solution is to use the Low level of the
Busy signal to the losing port to prompt one of three
types of delays: delay the reading of data until the data
becomes valid, which occurs an access time after the
Low-to-High transition of Busy; insert wait states until
Busy goes High; or stretch the clock until Busy goes
High. Any of these methods probably requires less
hardware and control logic than the preceding approach. Use of these methods does mean that the Busy
signal must eventually go from Low to High. This happens when the winning port either changes its address
or deasserts its chip enable. For this reason, as well as
for system noise immunity and power-saving considerations, it is recommended that blocks of addresses be
decoded to generate chip enables for the dual-port
RAMs.
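The wait-state approach can be illustrated by counting how long the losing port is held off. This is a sketch under stated assumptions: the function name is hypothetical, and the input is a list of Busy levels sampled once per processor clock (0 for Low, 1 for High).

```python
def wait_states_inserted(busy_samples):
    """Count the clock samples during which Busy is Low before it first goes High.

    Raises an error if Busy never goes High, which is the situation the text
    warns about when the winning port never moves off the address.
    """
    count = 0
    for level in busy_samples:
        if level == 1:
            return count
        count += 1
    raise RuntimeError("Busy never returned High; the losing port would wait forever")
```

A trace such as `[0, 0, 0, 1]` yields three wait states. After Busy goes High, the data still needs one access time to become valid, as noted above.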
Because the losing port has no control over the
winning port in the general case, however, a question
arises: What can the losing port do to successfully read
the data just written, assuming the winning port does
not change its address, write, or chip-enable signals?
There are two possible operations:
1. Change an address line to a different address,
then change back to the original address. This toggles
the busy signal to the losing port.
2. Change the state of the chip enable. This also
toggles the busy signal to the losing port.

Figure 7. Simplified ATD Sequence (flow: Idle, Detect Event, Turn-On Circuits, Perform Operation, Turn-Off Circuits)

Detection (ATD) to improve performance and reduce
power dissipation.
ATD improves performance by equilibration of differential paths, pre-charging critical nodes, and forcing
the outputs to a high-impedance state. Equilibration
and pre-charging bias critical nodes to voltage levels approximately in the mid-point of the small-signal operating range; when the data is sensed, it takes a shorter
amount of time to transition to the Zero or One level.
Forcing the outputs to their high-impedance states improves speed slightly, but more importantly, the technique reduces output switching noise by eliminating crowbar current and separating the output current into two
pulses instead of one.
ATD minimizes power consumption because it
turns on power-hungry circuits only when they are required. Slightly over 50 percent of a RAM's circuits are
linear, and approximately 70 percent of the power is
dissipated in the sense amplifiers during a read operation. When the RAM is operating at its maximum frequency, the ATD circuits are constantly triggered, so
the power savings are minimal. At lower speeds or
smaller duty cycles, however, the power savings are significant.
A diagram representing a typical ATD sequence is
illustrated in Figure 7. The event that triggers the ATD
sequence for either port is the transition of any address,
chip-enable, or read/write signal. Equilibration and precharging are performed next, followed by either turning
on the sense amplifiers and latching the data (read
operation) or pulling the BIT and BIT-bar lines to the required levels (write operation) at the addressed location. The master clock pulse lasts from 7 to 11 ns,
depending upon temperature, supply voltage, and the
distributions of IC processing parameters. At the end of
the pulse, the data is latched and the appropriate circuits are turned off.

Master Stand-Alone Operation
Figure 8 presents a block diagram of a system using
two 8-bit microprocessors, the Cypress CY7C132 dual-port
RAM, static RAM, and EPROM. The address
lines of each microprocessor are decoded to generate
the chip enables to the dual-port RAM, the SRAM, and
the EPROM. Note that pull-up resistors are required
on the interrupt requests to the microprocessors and

Address Transition Detection
Why does changing the address or chip enable
allow a losing port to read data successfully? All
Cypress dual-port RAMs, both masters and slaves, use
a circuit design technique called Address Transition

Figure 8. Typical 8-Bit Microprocessor

the busy signals, which go to the microprocessors' wait
inputs.

cycle times must be increased by this amount of time. In
equation form:

    t_{WC} = t_{PWE} + t_{BLA}    (Eq. 2)

where the delay must be at least equal to tBLA.
Note that if you add more slaves to make a wider
word, (e.g., 24 or 32 bits) the delay elements' outputs
can connect directly to the write-strobe inputs. Additional delay elements are not required.
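Equation 2 is simple enough to check by hand, but a short helper makes the dependency explicit. The function name is hypothetical, and the numbers in the usage note are placeholders chosen for illustration, not data-sheet values.

```python
def expanded_write_cycle_ns(t_pwe_ns, t_bla_ns):
    """Eq. 2: write cycle time in width expansion, tWC = tPWE + tBLA.

    t_pwe_ns: write pulse width; t_bla_ns: Busy Low from address match,
    which is also the minimum delay applied to the slave write strobe.
    """
    if t_pwe_ns < 0 or t_bla_ns < 0:
        raise ValueError("timing parameters must be non-negative")
    return t_pwe_ns + t_bla_ns
```

With a placeholder write pulse width of 20 ns and a placeholder tBLA of 20 ns, the helper returns 40 ns. Substitute the values from the data sheet for the speed grade actually used.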

Slave Word-Width Expansion
The block diagram in Figure 9 shows how to interconnect a CY7C132 (2K x 8) master and a CY7C142
(2K x 8) slave to form a 16-bit-wide word. The diagram
does not show the interfaces to the processors or the
connections for the interrupt signals. As previously explained, the interrupt outputs are not available at the
2K X 8 level in the 48-pin DIP due to pin limitations. In
the LCC and PLCC packages, the interrupt outputs are
available from both the master and the slave devices.
You can use either one. You do not have to tie the corresponding interrupt pins of the master and the slave
together.

Slave Stand-Alone Operation
Some applications might require that you give one
port permanent and absolute priority over the other.
You can easily do this by implementing the memory
using only slave dual-port RAMs. The Busy input to the
priority port must be tied High by either connecting it
directly to VCC or to VCC through a 10-kΩ pull-up resistor. You can connect the low-priority port's Busy input
to the high-priority port's read/write input.
In this configuration, the busy (read/write) signal to
the lower-priority port always prevents the port from
writing when the high-priority port is writing to any
location. The data of the lower-priority port is overwritten when the two ports operate asynchronously, the
lower-priority port is writing, and the higher-priority

Delaying the Write Strobe
In width expansion, the write signals to the slave
devices must be delayed by an interval at least equal to
tBLA, which is the time required for the master to assert
the busy signal to the slave after an address match. The
delay prevents the slave data at the address in contention from being overwritten. Both the write and read

port simultaneously writes. This is not a very elegant
solution because the Busy input to the low-priority port
is not qualified by comparing the addresses of the two
ports or their chip enables. However, this approach suggests how the slave dual-port RAMs can be used with
external arbitration logic. The busy inputs can be used
by control logic or under program control to dynamically
change the port priorities.
If the lower-priority port is read only, you can tie
its Busy input High by either connecting it directly to
VCC or to VCC through a pull-up resistor.

Dual-Port Design Example
The following design example illustrates the
methodology to follow when designing with Cypress
dual-port RAMs. In this example, a dual-port memory
is used for message passing and bus snooping for many
bus masters on a 32-bit-wide system bus. The dual-port
RAMs interface to a 32-bit system bus on the right side
and a 16-bit processor on the left side. From the right
port, the memory appears as 8K 32-bit words, and from
the left port the memory appears as 16K 16-bit words.
The memory has the following characteristics:
1. The memory location corresponding to address
zero for both ports is the same.
2. The data read from and written to the memory
from both ports is in the same order. Thus, D0 of the
right port corresponds to D0 of the left port. Additionally, D16 of the right port appears as D0 of the left port
in address location 2048.
3. The minimum cycle time is 35 ns.
4. To conserve power, blocks of addresses are
decoded to generate the required chip selects.

5. The CY7C132 and CY7C142 dual-port RAMs
are used. Part of the design task is to specify the number of masters and slaves required and the way they
must be interconnected.
6. The appropriate Busy signals must be generated
to the correct port when contention occurs.
7. All possible mailbox locations that can be used
for message passing are used.
8. The right port signals are AR0...AR12,
DR0...DR31, CER, and BusyR. The left port signals are
AL0...AL13, DL0...DL15, CEL, and BusyL.
A simplified logic diagram of the memory appears
in Figure 10. A total of sixteen 2K X 8 dual-port RAMs are
required. The devices labeled MA (master, bank A)
through MD (master, bank D) are CY7C132 masters.
The devices labeled SU (slave, upper half-word) and SL
(slave, lower half-word) are CY7C142 slaves. The
memory consists of four masters and twelve slaves,
along with the required control logic.
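The device count follows directly from the memory organization. The sketch below reproduces the arithmetic; the function name is hypothetical, and it assumes one master per bank with every other device in the bank a slave, as in this design.

```python
def plan_devices(words, word_bits, chip_words=2048, chip_bits=8):
    """Return (banks, masters, slaves) for a memory built from 2K X 8 parts.

    words and word_bits describe the memory as seen from the wider port.
    """
    if words % chip_words or word_bits % chip_bits:
        raise ValueError("organization must be a multiple of the chip size")
    banks = words // chip_words            # depth expansion
    chips_per_bank = word_bits // chip_bits  # width expansion
    masters = banks                         # one arbitrating master per bank
    slaves = banks * (chips_per_bank - 1)
    return banks, masters, slaves
```

For the 8K X 32 organization seen from the right port, the result is four banks, four masters, and twelve slaves, for sixteen devices in total.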
From the right port, the memory is configured as
8K 32-bit words, with a master controlling three slaves.
The one-of-four decoder labeled RB (right bank)
generates chip-enable signals for each bank of 2K 32-bit
words. Data is written (sampled) on the bus side, and
the only reads performed are from the mailbox
locations.
A general-purpose, right-port, control-logic block
generates control signals that conform to the timing
diagram shown in Figure 11. The diagram does not show
the generation of the output-enable control signals, but
they are similar to the RB decoder signals. If your application does not require message passing to the right
port, you can tie the right-port output-enable pins of all
of the dual-port RAMs directly to VCC.
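The address decoding for the two ports can be sketched as follows. The bit assignment is an assumption inferred from the stated memory characteristics (address zero is common to both ports, and D16 of the right port appears as D0 of the left port at left address 2048): the low 11 bits select a location within a device, the next left-port bit selects the lower or upper half-word, and the top two bits of either port select the bank. The function names are hypothetical.

```python
CHIP_WORDS = 2048  # depth of one CY7C132 or CY7C142


def decode_right(addr_r):
    """Split a 13-bit right-port address into (bank, offset)."""
    if not 0 <= addr_r < 4 * CHIP_WORDS:
        raise ValueError("right-port address out of range")
    return addr_r // CHIP_WORDS, addr_r % CHIP_WORDS


def decode_left(addr_l):
    """Split a 14-bit left-port address into (bank, upper_half, offset)."""
    if not 0 <= addr_l < 8 * CHIP_WORDS:
        raise ValueError("left-port address out of range")
    offset = addr_l % CHIP_WORDS
    upper_half = (addr_l // CHIP_WORDS) % 2  # assumed input to the UL decoder
    bank = addr_l // (2 * CHIP_WORDS)        # assumed input to the LB decoder
    return bank, upper_half, offset


def left_to_right(addr_l):
    """Map a left-port address to the right-port word and half-word it shares."""
    bank, upper_half, offset = decode_left(addr_l)
    return bank * CHIP_WORDS + offset, "upper" if upper_half else "lower"
```

Under this assumed mapping, left address 2048 lands on the upper half-word of right address 0, which agrees with the second memory characteristic listed for the design.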

Figure 9. Expansion (2K x 16) With Slave


From the left port, the memory is configured as
16K 16-bit words. For this organization, you might think
that the slave dual-port RAMs in the second column
from the right in Figure 10 should be masters. If this
were the case, however, you would have to defeat the
arbitration logic in them when the right port addressed
the same address; this would add logic, reduce the
speed, and complicate the design. Therefore, this design
uses a combination of left-bank decoding (LB, 1-of-4
decoder) and upper-lower 16-bit word decoding (UL, 1
DL'

.-

~
~

LEFT POIT
COITIOL
LOIIe

--

-

I.

>-f-

-

t--<

UL

4

I&-I&--

1/11-1
1(0.10)
1/0-1
CE-I
IE-I
I-L

I&•7 !G-

I
II

•

I OF
DECODE

1

L.
1 OF 4
DECODE

I
I

~

I--

CE-L
DI-L
I-I
1/11-1 0 - - I(O.lO)
I/D· I I"'""
C E-I ~fDE-I ~
I-L

---

-

SL

SL

'-----

'-----

I-PI-I--

PPl-

I/O-L
~ 1/0- L
...-< CE - L
f-- I--< CE-L
f---< OE-L
I-- I--c o E- L
f---< I-I
I-- I--< '-1
1/11-1 l>- f-I-1/11. I
I-- I CO .10) I-- I-- I CO. 10)
1/0-1 I-I/O ••
CE-I l>- tI-CE-I
OE -I l>- tDE-I
tI-- '-L
I - -< I-L

I-SU
'----

"'

'----

-

AL(D.lI)
I/D-L
--c CI- L
OI-L
'-1
1/11-1 ~
1(0.11)
1/0-1
CE-I
DE-I
I-L

III IT POI
CDUIOL
LOIIC

f-

L
II. CIOI-L
'-1
I-1/11- I l>ICO .10) I-1/0-1 ~
C E-I l>oE- I l>'-L
I--

IC

SU

'-----

'----

1~
ElAILE-1

SL

SL

'-----

'----

I>I--

I-I>l>-

~ .. ...

I--<

--< CI-L
f---< DE· L

l - I- OI-L

f-c '-1

I--

i--

rE

SL

'-----

'-----

1/11-1 ~

i-- ICO.IO) i-1/0-1
i-CE-I
DE -I
I-f---< '-L

I--

i~

l>-

I--

I

u"

J

I OF 4
DECODE

oE·

~cc

.......

I/II-L
i--<~i-- I--<~i""- I-~
AL(D .10)
ALU.IO) I-- I - AL(O.IO)i-- i-- AL(O.IO)
'--- 1/O-L
I - 1I0-L
'--- 1/0-L
- - 1I0-L
CE- L
' - - - - CE- L
I-- I---< C E- L
r-' CE - L
OI-L
OE- L
I-- I---< OI-L
+-- DE - L
1- I
'-1
i - H - I-I
i - t--< 1- I
1/11-1 1>--11/11-1 I>- H I/II-I l>- I-1/11-1 PI(O.lD)
ICO .10) I- 1-1- ICDalD) l - i--- I Co. 10) l - t1/0-1 I-I/O-I I-1/0-1 I-1/0- I i CE-I
C E- I P- 1-1-CE-I P- I-C E· I P- I - IE-I
01-1 P- I-f-01-1 l>- I-01-1 l>I- L
I-- I-f---< '-L
'-L
'-L
l - i---<
I-10
su
SL
5L

-.....J

II.

J

'----

'----

'-----

YeC}
Dill

011'

DIU

01'

Figure 10. Logic Diagram for Dual-Port Example

4-17

I-I--

Cl-L

I - '-1
1/11-1 ~
i""i - ICO.IO) i 110-1 i CE-I ~
ia E·I ~
fI--< '-L
SL

i--

A ..

0
1

I--

01!4 -

~
~

~

i--<~f- i--<~f- I--<~
AL (0 .10) ~ ~ AL(O.U) I-- i-- ALCO.lO)
~ I/O-L
I-- 1/0-L
~ I/O-L

-

L-...c

-

AI(O.lD

AL(D.l0)
I/O-L

I-

I/O-L
CE- L
OE-L
'-1
I/II· I
1(0.10)
1/0-1
C E· I
oE-I
I-L

l>-

--c~co

•

A
I

r

CE- L

Ir OE·L 1-1

I

I
I

I--

~ I/O-L

• iEr

A

-

f----<~I-- f----<~I-- I--<~
ALCD.IG) I-- I-- AL(OaU) I-- I-- AL(O.lI)

i--- AL(D.U)

•

-

- -

AL CD .10)
AL CD .10)
-_
I/O-L
1/0-L
C E- L
-<
--< CE- L
>----( OE·L
DE. L
-<
f - - --< I-I
I-I
-<
1/11-1 I>-- f1/11- I >-1(O.lO) ~ ~ 1(0.1'0) ~
1/0-1 ~
1/0-.1 i""C E·I l>CI-I ~ i foE·I l>- fDE-I ~ i 1- L
I-- f---< 1- L
l- i-

I-SU
L---

'-----

C

II

r------c~- --c~- -<~

r---<~CO

.----

For purposes of this discussion, "word" refers to the
32-bit word at the right-port system-bus interface. At
the 16-bit processor interface, the 32-bit word is
referred to as either the lower half word (right-port bits

ILl

U

EIAILI·

ALII

DLD

Right-Port Operation

yec 0

AL(D.lI)
I/O-L
--< CE- L
OE- L
I-I
1/11-1
1(0.11)
I/O-I
CE-I
II-I
I. L

Id

ALl!
ALll

DLll

--<~

AL(O.lD)

~

-

of 8 decoder) to cause the bank master to arbitrate
when the right port is addressing the same bank as the
left port (more on this later).

01 . .

010

-

.17

Aill
AUI

Left-Port Operation

ADDRESS _______X______________________XL_____________

CE,OE.'vIE

u

Figure 11. Dual-Port Timing for Example

o through 15) or the upper half-word (right-port bits 16
through 31).
The bank-selection process employs the chip
enables. Specifically, the l-of-4 RB decoder decodes
the four combinations of the upper two right-port address-bus signals and generates four active-Low chip
enables to each bank of four dual-port RAMs. Bank A
contains addresses 0 through 2047, bank B contains addresses 2048 through 4095, bank C contains addresses
4096 through 6143, and bank D contains addresses 6144
through 8191. In other words, bank A addresses 0 to
2K, bank B 2K to 4K, bank C 4K to 6K, and bank D 6K
to 8K.
The lower 11 right-port address lines, AR(0:10),
are connected to the AO through A10 right-port address
pins of all the dual-port RAMs.
Figure 11 does not show the generation of the write
strobe, but does show the signal's timing. The write
enable is applied directly to all the masters in parallel,
then buffered, and th~n applied to all the slaves. The
minimum propagation delay of the buffer must be at
least as large as tSLA, which is the time required for the
master to assert the busy signal to the slaves after an
address match occurs.
Note that all the right-port output-enable pins are
connected together. These pins should be driven if
reading is required; otherwise connect them to Vee.
The open-drain busy outputs of the right port
masters must be pulled up to Vee using resistors. A
value of 3300 is recommended. The master busy outputs connect to all the right-port slave busy inputs for
each bank.
For the data bus interface, the I/O pins of each
RAM column connect to their respective I/Q pins on
each bank. This OR-tie connection is allowed because
the bank-selection chip enable causes the output buffers
of the un-selected banks to go to the high-impedance
state.

The l-of-4 decoder labeled LB performs bank
selection for the left port. The upper two left-port address lines, AL13 and AL12, decode bank-select chipenable signals for the four masters only. Bank A corresponds to addresses 0 through 4095, bank B corresponds to addresses 4095 through 8191, bank C corresponds to addresses 8192 through 12,287, and bank D
corresponds to addresses 12,288 through 16,383.
To perform upper and lower half-word selection, the 1-of-8 decoder labeled UL decodes the upper three left-port address signals. The decoder then generates eight chip-enable signals with a resolution of 2048. The chip enables connect to the slaves' chip-enable and output-enable pins (2048 resolution) and to the masters' output enable. Because the master chip-enable resolution is 4096, the master arbitrates for two blocks of 2048 16-bit half words.
The lower eleven left-port address lines, AL(0:10), connect to left-port address pins A0 through A10 of all the dual-port RAMs.
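The address decoding described for the two ports can be modeled in a few lines. This is only a sketch: the function names are hypothetical, the UL decoder is assumed to take the upper three left-port address lines (AL13 through AL11), and the mapping of individual UL outputs to upper or lower half-words is not modeled because the text does not specify it.

```python
BANKS = "ABCD"


def right_port_select(ar):
    """Decode a 13-bit right-port word address AR(12:0)."""
    if ar not in range(8 * 1024):
        raise ValueError("right-port address out of range")
    bank = (ar >> 11) & 0x3   # RB decoder: upper two address bits
    offset = ar & 0x7FF       # AR(0:10) drive pins A0 through A10
    return BANKS[bank], offset


def left_port_select(al):
    """Decode a 14-bit left-port half-word address AL(13:0)."""
    if al not in range(16 * 1024):
        raise ValueError("left-port address out of range")
    bank = (al >> 12) & 0x3   # LB decoder: AL13 and AL12 (4096 resolution)
    ul = (al >> 11) & 0x7     # UL decoder: one of eight outputs (2048 resolution)
    offset = al & 0x7FF       # AL(0:10) drive pins A0 through A10
    return BANKS[bank], ul, offset


print(right_port_select(2048))   # first word of bank B
print(left_port_select(4096))    # first half-word of left-port bank B
```

Each left-port bank spans two UL outputs, which is why a master, enabled at 4096 resolution, arbitrates for two 2048-location blocks.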
At the 16-bit interface, writing is only required if the left port wishes to send a message to the right port. Otherwise, you can connect the left-port write pins of all the dual-port RAMs to VCC.
To implement the left-port data bus interface, the left port's data I/O pins are connected together in the same manner as those of the right port for all RAMs in the same column. In addition, to multiplex a 32-bit data word to a 16-bit half word, the least-significant bytes and the most-significant bytes of each 2048-word group are connected together. The UL decoder that controls the left-port output enable performs the selection.
If you use the masters' interrupt pins, pull them up to VCC through a 330Ω resistor and connect them to the processor interrupt-request input. You can leave the slaves' interrupt pins unconnected.
If the control signal connections from their source to the dual-port memory constitute electrically long lines, they might require proper termination to avoid voltage reflections due to impedance mismatches. Refer to the application note "Systems Design Considerations When Using Cypress CMOS Circuits" in this book for further information.

References
1. Dijkstra, E.W., "Solution of a Problem in Concurrent Programming Control." CACM, Vol. 8, No. 9, Sept. 1965, p. 569.
2. Dijkstra, E.W., "Co-operating Sequential Processes." Programming Languages, F. Genuys (Ed.), Academic Press, New York, 1968, pp. 43-112.

Using Dual-Port RAMs Without Arbitration
This application note offers several ways to implement dual-port RAMs to facilitate communication between processors. The applications covered include communication with general-purpose processors; video and radar equipment; digital signal processors; and bit-slice processors.
The most common application for dual-port RAMs is to provide a high-speed memory resource that can be shared between two processors in a system. Figure 1 illustrates how the two processors communicate by passing data and commands via the shared memory. Both processors benefit by having access to the dual-port RAM because it is mapped just like any other memory device on the board.
Fast, local access to the shared memory eliminates the need to arbitrate for and access the system bus when reading or writing a common resource area such as a shared memory card. In fact, many multiprocessor embedded-control systems implement dual-port RAMs for interprocessor communication and eliminate the system bus entirely. Removing the burden of a system bus, which only exists to hook the processors together, reduces the complexity of the system as well as the part count and power consumption.

Dual-Port Overview
Incorporating dual-port RAMs into a design such as the dual-processor example is straightforward. But it is important to consider the case of an address contention or busy situation that can arise when both processors simultaneously attempt to access the exact same location.
Cypress dual-port RAMs have several mechanisms that simplify simultaneous access. The simplest approach to resolving contention is to use the dual-port RAM's Busy output lines. Both right and left ports provide a Busy output signal. The arbitration logic inside the dual-port RAM activates Busy when the logic senses a match between the left and right address lines. Assertion of Busy indicates that both ports have attempted to access the same location in the RAM.
In the case of a dual-processor system, these signals can easily be gated with the processor's local Wait signal to generate a hold to the microprocessor until Busy is deasserted. Adding an occasional wait state to a microprocessor generally has no effect on the overall system performance.
Gating the Wait line and generating a hold to the processor resolves the logical problem of simultaneous address conflicts but does not address the system-level issues that can cause the conflicts. The two-processor example serves to illustrate a common underlying cause of a Busy state. Say that processor A attempts to read an array of data that was generated by processor B, but the system contains no mechanism to alert processor A when the data is ready or valid. Therefore, processor A might be updating a RAM location while processor B is reading the same address or vice versa.
This lack of overall synchronization or interprocessor communication can manifest as stale data or incomplete arrays of data in the shared memory. In a few cases, stale or incomplete data is tolerable, but in most cases it is fatal.
Locking a processor or processors out of specific memory areas until data is available guarantees that processors never receive stale data. To implement such address-space restrictions, you must provide a level of access protection above the basic gating-of-Busy technique. In most cases, you must add external hardware that signals the processors when new data is available or when

permission is granted to access a certain area of the dual-ported device.

Figure 1. Dual-Processor Communication

Interrupts serve well as a simple means of alerting or synchronizing interdependent system elements that pass data via a shared memory. Cypress dual-port RAMs provide interrupt outputs to simplify the task of interrupting or signaling the processors; this relieves you of the need to create your own interrupt mechanism. Assertion and deassertion of these interrupt lines is accomplished by performing write and read operations to special locations within the dual-port RAM. Table 1 lists the read and write operations.

Table 1. CY7C132 Interrupt Line Usage

Function                          Result
Write to Left Address 7FFh        Asserts Int_right
Read from Right Address 7FFh      Removes Int_right
Write to Right Address 7FEh       Asserts Int_left
Read from Left Address 7FEh       Removes Int_left

The data word written to these locations, 7FEh and 7FFh, can be used as a status word or semaphore. This word is presented to the data bus during the read operation of an interrupt removal cycle. The status word provides additional system-level information that augments the hardware interrupt signal by passing along some meaning with the actual interrupt event. More simply, the interrupt line alerts the processor that some action is required, and the status word provides additional information about exactly what happened or what needs to be done.
The actual meaning of the status byte is defined by the system designer. Generally, the status byte is used to indicate that data is ready, to lock a processor out of a specific range of addresses, or to prompt a processor for new data. Using the interrupt, along with status information, is an easy way of avoiding busy conditions by synchronizing processes or restricting address spaces via software.
You now have two main options for dealing with simultaneous address situations: Use Busy in a strictly hardware solution, or couple interrupts with status words for a software solution. Regardless of your preference for a hardware or software approach, Cypress dual-port RAMs provide all signals and functions necessary to ensure a simple and effective system solution that maintains data integrity and system sanity.

Using Dual-Port RAMs Without Arbitration

Wait states and interrupts are a good solution for systems with microprocessor-like elements that are not affected by an occasional wait state. However, a much broader class of systems and applications cannot tolerate any type of data flow interruption or busy condition. Typically, these systems are dedicated function units that are rigidly pipelined and operate on continuous or nearly continuous streams of data.
A high-speed video processor is a good example of a system whose elements cannot be wait-stated due to the requirement that a data word or pixel be processed in every clock cycle. The block diagram in Figure 2 shows a video data transform or look-up table.
This implementation uses a very common dual-banked or "ping-pong" RAM to realize a look-up-table translation function (Figure 3). A continuous stream of video data drives the address lines of RAM bank 0. The output or transformed data of bank 0 flows downstream to the post-processor units. Meanwhile, as continuous video data flows through RAM bank 0, the transform table of bank 1 is updated by a processor element, without interfering with the video data flow.
Dual banks make it impossible for a busy condition or address conflict to exist, because each system element essentially has its own discrete dedicated RAM. The processor finishes updating the look-up table, then swaps RAM banks by toggling the bank-select line. The PAL then changes the state of the buffer-enable signals, which redirects the data flow pattern of the two RAM banks.
The ping-pong arrangement is effective, but the implementation is very costly in terms of real estate. The design requires at least 11 very high speed devices, using standard static RAMs.

Figure 2. Video Look-Up Table

Table 2. Dual-Port vs. Ping-Pong RAM

Device          Qty    Power (mA)    Size (sq. in.)
FCT244           6         15            0.4
FCT245           2         10            0.4
PAL16L8-D        1        180            0.4
20-ns x8 RAM     2        140            0.52
Total           11        570            4.64

CY7C142-35       1        120            1.5

Power and size are listed per device; the Total row sums them over the quantities shown.

Figure 4. Video Lookup with Segmented Dual-Port RAM
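The totals in Table 2, and the savings they imply, can be checked with a short calculation. The sketch below assumes that the table's power and size figures are per device and that the PAL and the dual-port RAM each have a quantity of one, which is what makes the printed totals of 11 devices, 570 mA, and 4.64 square inches add up.

```python
# (device, quantity, power per device in mA, board area per device in sq. in.)
PING_PONG = [
    ("FCT244", 6, 15, 0.40),
    ("FCT245", 2, 10, 0.40),
    ("PAL16L8-D", 1, 180, 0.40),
    ("20-ns x8 RAM", 2, 140, 0.52),
]
DUAL_PORT = [("CY7C142-35", 1, 120, 1.50)]


def totals(parts):
    """Return (device count, total power in mA, total area in sq. in.)."""
    count = sum(qty for _, qty, _, _ in parts)
    power = sum(qty * ma for _, qty, ma, _ in parts)
    area = sum(qty * sq_in for _, qty, _, sq_in in parts)
    return count, power, area


pp_count, pp_power, pp_area = totals(PING_PONG)
dp_count, dp_power, dp_area = totals(DUAL_PORT)

area_saving = 100 * (1 - dp_area / pp_area)      # percent of board area saved
power_saving = 100 * (1 - dp_power / pp_power)   # percent of supply current saved
print(pp_count, pp_power, round(pp_area, 2))
print(round(area_saving), round(power_saving))
```

Under these assumptions the area saving comes out near 68 percent and the power saving near 79 percent.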
Replacing the buffers, logic, and SRAMs with a single dual-port RAM (Figure 4) simplifies the design substantially. Video data utilizes the device's left port, while the processor communicates with the right port. Having two ports eliminates the need for any type of data and address steering buffers. During processor update cycles, however, there remains the problem of simultaneous address accesses and busy conditions.
RAM segmentation eliminates the possibility of a busy conflict and provides the key to implementing a dual-banked RAM within a single dual-port RAM. A single inverter segments the RAM. The Bank_select signal from the processor drives the left address-port MSB, and the Bank_select signal's inverse drives the right MSB. The dual-port RAM is now segmented into two 1K address spaces that do not overlap. The RAM appears as two totally separate RAMs, as it did in the ping-pong implementation. Again, because the left address can never equal the right address due to the opposite state of their MSBs, a busy condition is not possible.
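The segmentation argument can be illustrated with a small model. This is a sketch under stated assumptions: the RAM is taken to be a 2K device with 11 address bits (so the MSB is A10), and the function names are hypothetical.

```python
ADDR_BITS = 11                # 2K x 8 device: 11 address bits
MSB = 2 ** (ADDR_BITS - 1)    # A10 splits the RAM into two 1K halves
LOW_MASK = MSB - 1            # A0 through A9


def port_addresses(bank_select, left_low, right_low):
    """Return the (left, right) addresses seen by the RAM.

    Bank_select drives the left-port MSB directly and the right-port MSB
    through an inverter, so the two MSBs always differ.
    """
    left = (MSB if bank_select else 0) | (left_low & LOW_MASK)
    right = (0 if bank_select else MSB) | (right_low & LOW_MASK)
    return left, right


def can_collide():
    """Search for an address match between the two ports.

    Identical lower address bits are the only case in which a match could
    occur, so it is enough to test those for both Bank_select states.
    """
    for bank_select in (0, 1):
        for low in range(MSB):
            left, right = port_addresses(bank_select, low, low)
            if left == right:
                return True
    return False


print(port_addresses(0, 5, 5))
print(can_collide())
```

Toggling `bank_select` swaps which 1K half each port sees, which is the single-chip equivalent of the ping-pong bank swap.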
Using a dual-port RAM does more than simplify the design. Table 2 shows the tremendous savings in real estate and power consumption. Specifically, a single dual-port device reduces the board area by 68 percent and reduces the power consumption by almost 80 percent. In terms of MTBF, system reliability benefits greatly from having fewer components and significantly lower power dissipation.
The multitude of buffers and transceivers that steer data and address signals in a ping-pong memory array take up relatively large amounts of board space as well as adding to the data propagation delay. The latter forces you to use very high speed RAMs. Dual-port RAMs do not suffer from the burden of buffer delays and can therefore operate at significantly lower speeds.

Figure 3. Ping-Pong RAM Array

Figure 5. Data Descrambler

Figure 6. CPU/Pipelined Processor Interface

Handling Video or Radar Data
Many types of high-speed data-processing applications can benefit from the use of dual-port RAMs. For example, high-speed video or radar data is often transmitted in nonsequential or cross-interleaved order. The receiver must first descramble or reorder the data before the data can be used. Again, the incoming data stream cannot be stopped in the event of an address contention.
Figure 5 shows that a dual-port RAM is an ideal solution for this type of problem. Incoming data is written into the RAM's left port in the received order. The pixel counter provides sequential addresses to the left side of the dual-port RAM and increments after each pixel. At the end of the first line, the counter reaches terminal count and initiates a bank toggle via a T-type flip-flop. After the banks switch, the new data is accessible via the right port.
A FIFO stores the reordering sequence and thus drives the right port's address lines to read out the stored video data. PROMs and counters can also implement the descrambling function, but this approach requires more parts and is much less flexible. Using a FIFO eliminates the need to generate addresses for the reordering sequence table. The CPU initializes the descrambling FIFO at boot up. Initialization is only required once because the FIFO utilizes its retransmit function (described in the CY7C429 FIFO data sheet), unless the data ordering changes. Because this design implements the dual-port RAM as a segmented memory, you can ignore the problems caused by address contention.
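The descrambling flow can be modeled behaviorally as follows. This is a minimal sketch, not the circuit itself: the class and method names are hypothetical, and it assumes that entry k of the reorder FIFO holds the RAM address of the k-th output pixel. Replaying the same list for every line stands in for the FIFO's retransmit function.

```python
class Descrambler:
    """Behavioral model of a segmented dual-port RAM used as a line descrambler."""

    def __init__(self, line_length, read_order):
        self.n = line_length
        self.ram = [None] * (2 * line_length)   # two banks in one dual-port RAM
        self.write_bank = 0                     # state of the T-type flip-flop
        self.count = 0                          # pixel counter
        self.read_order = list(read_order)      # contents of the reorder FIFO

    def write_pixel(self, pixel):
        """Left port: store one pixel; return True when a line is complete."""
        self.ram[self.write_bank * self.n + self.count] = pixel
        self.count += 1
        if self.count == self.n:                # terminal count reached
            self.count = 0
            self.write_bank ^= 1                # toggle banks
            return True
        return False

    def read_line(self):
        """Right port: read the bank not being written, in FIFO order."""
        base = (self.write_bank ^ 1) * self.n
        return [self.ram[base + addr] for addr in self.read_order]


received = ["b", "d", "a", "c"]                 # scrambled arrival order
unit = Descrambler(line_length=4, read_order=[2, 0, 3, 1])
for pixel in received:
    line_done = unit.write_pixel(pixel)
restored = unit.read_line()
print(line_done, restored)
```

Because the write side and the read side always work in opposite banks, the model never needs a busy check, which mirrors the segmented-memory argument in the text.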

For DSPs and Bit-Slice Processors
Interfacing a system's CPU to a high-speed, pipelined digital signal processor or bit-slice processor is another common system interface problem. Coefficients and commands must be passed to the pipelined processor, and final results read back by the CPU. Dual banks of RAM often furnish a solution because they provide a shared memory space that both system elements can use without address contention.
Because the machines involved are rigidly pipelined, they cannot easily be stopped or interrupted. Thus, a single, segmented, dual-port RAM (Figure 6), or several dual-port RAMs in parallel with no additional glue logic, provides a simple, cost-effective solution to this problem.
If two banks of data are too restrictive, you can segment the dual-port RAM into multiple address spaces by restricting more of the upper-address-line pairs. This scheme allows the processor to easily and quickly communicate with the pipeline processor without using large amounts of real estate and power.


Using Cypress SRAMs to Implement 386 Cache
Because the 80386 is the most commonly used 32-bit microprocessor available today, this application note
discusses some 386 cache implementations that take advantage of special features offered by Cypress's SRAM
products. This application note does not offer a broad
treatment of cache memories, however, and it assumes
that you have a fundamental understanding of cache
memories and the terminology associated with them.
Mainframe computers have used cache memories
for several years. Desktop systems did not require
caches until the advent of 32-bit microprocessors, such
as the 80386, that run at clock frequencies of 20 MHz
and above. A cache allows you to make full use of the
microprocessor's available throughput. This is because
the processor's bandwidth is greater than the bandwidth
available from commonly available DRAMs.
In a memory hierarchy, a cache is a small, fast
memory placed between the processor and main
memory. A cache stores the most often used data and
instructions to avoid accesses to main memory. Because
of speed requirements, a cache is usually implemented
with fast static RAM. The goal, then, is to implement
the memory subsystem such that the processor's effective average access time approaches that of the cache,
while the memory subsystem's cost per bit approaches
that of the main memory.
Computer programs exhibit temporal and spatial
locality, which make cache memories possible. Temporal locality refers to a program's tendency to re-reference the elements referenced in the recent past. Loops,
temporary variables, and stacks are examples of constructs that conform to this property. Spatial locality
refers to a program's tendency to access a portion of
the address space in the neighborhood of the last reference. Sequential program execution and repeated access to array variables are examples of this property.
In addition to discrete cache implementations,
several VLSI cache controllers are available today for
the 80386. This application note describes two of the
most popular: the Intel 82385 and the Chips and Technologies 82C307. A discrete cache implementation using
Cypress products is covered first.

Discrete Implementation
You can implement a cache memory without using
a VLSI cache controller. This discrete approach has the
advantage of allowing you to custom tailor the cache
subsystem to your specific requirements instead of
being limited by a VLSI cache controller's capabilities.
You can implement a low-cost cache subsystem or a
cache with higher performance characteristics than can
be achieved with today's VLSI cache controllers.
The discrete approach also has drawbacks. It makes high-speed caches more difficult to implement due to the delays incurred by the discrete ICs' input and output buffering, as well as trace delays introduced by the printed circuit board. Discrete solutions can also increase board-space and power requirements, and transmission line and noise effects become a more significant problem.
Figure 1 shows a block diagram of a simple, 32-Kbyte, direct-mapped, write-through cache. You can
implement the control logic in programmable logic or a
gate array (which are not detailed here). The cache tag RAM, or directory into the cache data, is implemented in the CY7C150 1K X 4 resettable SRAM. The CY7B185 8K X 8 SRAM serves as the cache data RAM. CY7C408A 64 X 8 FIFOs are used as write buffers, which reduce the number of processor stalls in the write-through cache.
This example assumes that no memory references are made above 1 Gbyte. Thus, only the lower 30 address bits of the 80386 are used. Because the tag directory has 1K entries, and the data cache is organized as 8K X 32, the line size for this example is eight words or 32 bytes.
The 80386 supports two modes of local bus operation: pipelined and non-pipelined. With address pipelining enabled, the processor puts the address of the next memory access on the bus during the current access. This effectively gives the memory subsystem an extra clock cycle to decode the address. This approach has two drawbacks, however. First, entering pipeline mode incurs an additional wait state. Wait states also occur during branches, after periods when the processor's pre-fetch queue is full, and after another bus master, such as a DMA controller, relinquishes the local bus to the processor. The second drawback is that the address and some of the control signals must be externally latched, requiring additional board space and complexity. Thus, for simplicity and increased performance, the 80386's address pipelining feature is disabled in this example.

Figure 1. A Discrete Cache Implementation
During memory read accesses, address bits 5 through 14 index one of the entries in the tag RAM. Simultaneously, address bits 2 through 14 access the data RAM. After time tAA of the tag RAM, the address tag appears at the comparator inputs. This tag is qualified by the valid bit and compared with 80386 address bits 15 through 29. The match output is fed to the control logic. If a match is found, and the cache line is valid (i.e., a read hit occurred), the cache RAMs supply data to the 80386, and the cache control logic asserts /386 RDY. If a match is not detected, or the cache line is invalid (i.e., a read miss occurred), the output enable of the cache RAMs is de-asserted, and a main memory access is initiated. The cache control logic causes the cache line to be updated from main memory. The control logic then updates the valid bit and supplies the requested data as well as /386 RDY to the processor.
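The address split and hit test described above can be sketched in software. The field boundaries come from the text (bits 2 through 14 for the data RAM, bits 5 through 14 for the tag index, bits 15 through 29 for the tag); the class, method, and field names are hypothetical, and the flush method only models the effect of clearing every valid bit, not the tag RAM's reset timing.

```python
TAG_ENTRIES = 1024   # 1K tag entries, one per 32-byte line


def split_address(addr):
    """Split a 30-bit 80386 address into the fields used by the cache."""
    addr &= 2 ** 30 - 1
    return {
        "tag": (addr >> 15) & 0x7FFF,    # bits 15..29, compared on lookup
        "line": (addr >> 5) & 0x3FF,     # bits 5..14, tag RAM index
        "word": (addr >> 2) & 0x1FFF,    # bits 2..14, data RAM index
    }


class TagDirectory:
    """Direct-mapped tag store with one valid bit per line."""

    def __init__(self):
        self.tags = [0] * TAG_ENTRIES
        self.valid = [False] * TAG_ENTRIES

    def is_hit(self, addr):
        fields = split_address(addr)
        line = fields["line"]
        return self.valid[line] and self.tags[line] == fields["tag"]

    def fill(self, addr):
        """Record a line fetched from main memory after a read miss."""
        fields = split_address(addr)
        self.tags[fields["line"]] = fields["tag"]
        self.valid[fields["line"]] = True

    def flush(self):
        """Invalidate every line at once."""
        self.valid = [False] * TAG_ENTRIES


directory = TagDirectory()
first_lookup = directory.is_hit(0x18024)     # cold cache: miss
directory.fill(0x18024)
second_lookup = directory.is_hit(0x18024)    # same line and tag: hit
conflict_lookup = directory.is_hit(0x20024)  # same line, different tag: miss
print(split_address(0x18024), first_lookup, second_lookup, conflict_lookup)
```

The upper ten bits of the data RAM index equal the tag RAM index, so the low three bits of `word` select one of the eight words within a line.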
T!lis cache implements. a. write-through, no-write-allocate' policy. Therefore,' for. write hits, both the cache
RAM and main memory; are updated before the 386
lin~ . is

the

4-24

10 ns. This speed is important, because the tag logic can
prove to be the critical speed path in the design.
Second, the CY7C150 has a memory reset function that
allows the contents of the entire tag to be flushed within
two memory cycles. Therefore, a cache flush operation
can be performed much faster than if the processor had
to invalidate the tag RAM on a line-by-line basis.
The CY7B185 SRAM is fabricated in Cypress's
high-performance BiCMOS process and is organized as
8K X 8. The device is available with access times as fast
as 10 ns and comes with a variety of packaging options.
This part's X 8 width allows you to implement the entire data cache with only four devices.
Cypress provides a wide variety of memory width
and depth configurations, all available with fast access
times. You can thus implement the configuration that
best suits your specific design requirements.

Table 1. Worst-Case Timing Calculations with the
82385
CALE : 82385 Cache Address Latch Enable
CS(3:0)# : Cache Select 3:0
COEA#,COEB# : Cache Output EnaBles A,B
WEA#, WEB# : Cache Write Enables A,B
Read Timinlf
tAA

(max) non-vivelined mode

4 CLK2 periods = 4 x 15 ns

60ns

CALE valid from CLK2 (max)

- 15 ns

386 data set up time (max)

...::..i!!§

tAA (max)

40ns

COE(A.Bl#. CS(3:0)# to Data Valid

4 CLK2 periods = 4 x 15 ns
CS(3:0)# valid from CLK2
386 data set up time
COE(A,B)#, CS(3:0)# to data valid
t()P (max)
2 CLK2 periods = 2 x 15 ns
COE(A,B)# active from CLK2 (max)

386 data set up time (max)
tOE (max)
Write Timin!!
WEA#, WEB# pulse width (min)

82385 Implementation

60ns

The 82385 is a VLSI cache controller offered by
Intel that is specifically designed to work with the
80386. The device supports a 32-Kbyte cache and can
be configured to operate in direct mapped or two-way
set-associative modes by strapping the 2W/D# pin. Appendix A provides information for strapping the 82385.
The CY7C184 cache RAM connects directly to the
Intel 82385 and 80386 with no external glue logic. You
can configure the CY7C184 as a 2 X 4K X 16-bit device
for set-associative implementations or as an 8K X 16
device for direct-mapped implementations.
During read misses, the 82385 invokes the 80386's
pipeline mode to reduce the miss penalty. Therefore,
the processor's address must be externally latched. The
CY7C184 contains address latches, eliminating the need
for discrete latches. Using discrete 4K X 4 SRAMs to
implement the two-way set-associative configuration
would require 18 ICs for the data cache and address
latches. Only two CY7C184s can implement the same
function in a space-saving 52-pin PLCC package.
The CY7C184 is configured by strapping the
MODE pin High for set-associative operation or Low
for direct-mapped operation. In set-associative mode,
address bit A12 is a Don't Care and should be externally grounded. Figures 2 and 3 show the connections for
two-way set-associative and direct-mapped modes,
respectively.
Table 1 illustrates some worst-case· timing calculations for a 33-MHz system. As the CY7C184 data sheet
shows, the -25 part meets or exceeds all the worst-case
requirements. For the 33-MHz configuration, there is
no difference in the 82385 timing specifications for setassociative and direct-mapped operation. Therefore,
set-associative operation is recommended, because it
yields higher hit rates. For some lower-speed grades of
the 82385, the timing is less stringent for direct-mapped
operation. Therefore, slower, less-expensive cache can
be implemented for direct-mapped operation. Thus, you
must make a price/performance decision.

- 25 ns
- 5 ns

-30ns

30ns
- 15 ns
- 5 ns

-10 ns

20ns

can continue execution. On write misses, only main
memory is updated.
Write buffers between the processor and main
memory improve write performance. During write
cycles, the processor writes to the write buffers, and the
cache control logic updates main memory as a background task. While main memory is updated, the
processor can continue executing as long as it executes
read hit cycles or write cycles and as long as the write
buffer has room. After a read miss, the processor halts
until the write buffer has been completely flushed to
main memory. Otherwise, the processor might access
stale data from main memory.
The write buffers are implemented with Cypress
CY7C408A 64 X 8 FIFOs. This device features speeds
up to 35 MHz. It is deep enough that a full write buffer
condition seldom occurs, and its output enable makes
external three-state devices unnecessary.
The CY7C150 SRAM has two features that are
beneficial in cache tag applications. First, access time is
very fast. This product is available with a tAA as fast as
Figure 2. Set Associative Operation with the 82385

Figure 3. Direct Mapped Operation with the 82385


82C307 Implementation

The 82C307 is a combination cache/DRAM controller offered by Chips and Technologies. The device is
part of a chip set designed to offer a high-performance
IBM PC/AT-compatible system with a minimum number of components. Because the 82C307 has a two-way
set-associative cache-mapping policy, strap the
CY7C184 MODE pin High for proper operation.
The cache organization for the 82C307 is
2 x 4K x 32 bits. Two CY7C184s implement the entire
data cache. The 82C307 also makes use of the
CY7C184's built-in address latches when pipelined
mode is required. The 82C307 has a programmable feature that allows either chip select or output enable to be
supplied to the cache data RAM. This feature should
always be programmed to generate a chip select when
using the CY7C184. Figure 4 illustrates how to use the
CY7C184 with the 82C307. The Chips and Technologies
82C306 is used to latch the 80386 byte enables.
Table 2 illustrates some worst-case read timing calculations for a 25-MHz system in both non-pipelined
and pipelined modes. As the CY7C184 data sheet
shows, the -25 part meets or exceeds the worst-case requirements for non-pipelined mode, and the -45 part
does the same for pipelined mode. Again, you must
make a price/performance decision based on these options.

Table 2. Worst-Case Timing Calculations with the 82C307
CRD(1:0)#: 307 Cache Read 1:0

Read Timing
tAA (max), non-pipelined mode
    4 CLK2 periods = 4 x 20 ns              80 ns
    CRD(1:0)# from CLK2 (max)             - 12 ns
    386 data set up time (max)            -  7 ns
    tAA (max)                               61 ns
tOE (max), non-pipelined mode
    1.5 CLK2 periods = 1.5 x 20 ns          30 ns
    CRD(1:0)# active from CLK2 (max)      - 12 ns
    386 data set up time                  -  7 ns
    tOE (max)                               11 ns
tAA (max), pipelined mode
    6 CLK2 periods = 6 x 20 ns             120 ns
    CRD(1:0)# from CLK2 (max)             - 12 ns
    386 data set up time (max)            -  7 ns
    tAA (max)                              101 ns
tOE (max), pipelined mode
    3 CLK2 periods = 3 x 20 ns              60 ns
    CRD(1:0)# active from CLK2 (max)      - 12 ns
    386 data set up time (max)            -  7 ns
    tOE (max)                               41 ns

PCB Layout Considerations
As with any high-speed system, you must pay careful attention to the layout phase of a 386 cache project.
The following rules of thumb help reduce noise
problems and radiated EMI. A multilayer board with
both power and ground planes is strongly recommended. Power and ground planes provide good, low-inductance paths for the power connections to the
devices on the PCB. These paths help minimize ground
bounce and other noise problems. Sandwiching power
or ground planes between signal layers greatly improves
the circuit board's noise characteristics. Ground-loop
currents are minimized, which reduces capacitive and
inductive signal coupling. A maximum center-to-center
spacing of 8 mils between signal and power layers is
recommended.
Good high-frequency decoupling on power and
ground connections is very important for reliable high-speed operation. High-frequency bypass capacitors with
NPO or X7R dielectrics are recommended. These
devices store charge and supply instantaneous power
required by the active devices on the PCB. For the
CY7C184, one 0.1-µF and one 0.01-µF capacitor are
recommended per device. Surface-mount capacitors are
preferred because of the lower lead inductance these
devices exhibit. Additionally, you can place surface-mount devices on the back of the PCB in the center of
the device they are intended to decouple. This placement reduces the inductance between the capacitor
leads and the active device's power and ground connections.
Avoid sockets whenever possible because of the
extra inductance introduced. If sockets are necessary,
high-quality sockets with gold-plated contacts are
recommended.
Pay careful attention to the routing of traces. In
general, traces should be kept as short as possible to
reduce transmission-line effects. Point-to-point connections are recommended, as opposed to stubbed or tree-type connections. The latter cause discontinuities in
the transmission line, which create reflections. Instead
of 90° bends, traces should be curved, or use two 45°
bends. This helps reduce EMI.
Critical signals, such as clocks and control lines,
should be routed first. Whenever possible, keep these
signals on the same layer, because vias cause transmission-line discontinuities. Routing these signals on the
inner layers reduces radiated emissions. To minimize
transmission-line effects, keep these traces to a maximum of six inches in length. To minimize crosstalk, a
center-to-center minimum spacing of 16 mils is recommended for critical traces.

Figure 4. Operation with the 82C307

The signal quality of the system clock is a very important consideration. Pay careful attention to clock
loading and skew. For high-speed clocks, it is usually
recommended to supply each clock input from a
separate driver. The clock drivers should be in a
monolithic package, such as a hex inverter, so that clock
skew is minimized. Keeping clock traces approximately
the same length also helps minimize clock skew.
Series damping resistors in the 10 to 27 Ω range
might be required on clock traces to achieve good signal quality. If so, use as low a value as possible. Experimentation determines the optimal value.
Once control lines have been routed, address and
data lines can be routed. These signals are somewhat
less critical, because some settling time is usually
provided in the worst-case timing. However, these signals should still be routed point to point, and trace
length should be minimized.

Appendix A
Strapping Information for Different Steppings of the Intel 82385

Intel manufactures different versions (steps) of the 82385 cache controller. For example, the C step activates the
output enables to the cache RAMs whenever the write enable signals are asserted. Step B, on the other hand, inhibits
OE# while WE# is Low. StepSB, one of the new revisions, allows you to control the state of the OE# output during
write cycles. Cypress recommends that pin A14 be tied Low; then OE# is de-asserted during write. There are two
reasons for this:
Although the 7C183 three-states its outputs (tHZWE = 15 to 20 ns) after WE# is asserted, even if the OE# input is
active, the write pulse width (tPWE) in some systems might not be long enough to satisfy the tSD requirement after
tHZWE is satisfied.
Assuming tPWE is long enough to satisfy tSD, you must contend with another problem. After the 7C183 three-states
its outputs, the noise caused by its buffers drives the VIH level to 3V. In other words, any inputs less than 3V might
not be recognized as a High level. If you want to avoid this condition, pull OE# High 10 ns before asserting WE#.

Section Contents
PROMs
Pin-out Compatibility Considerations of SRAMs and PROMs
Introduction to Diagnostic PROMs
Interfacing the CY7C289 to the AM29000
Interfacing the CY7C289 to the CY7C601

Pin-Out Compatibility Considerations
of SRAMs and PROMs
This application note discusses the non-electrical
parameters of pin-out and programming involved in
finding socket-compatible second sources for PROMs.
Included here is an example of a verified conversion
from the Motorola 68764 to the Cypress 7C264, a
PROM conversion that is not address-line compatible.

An SRAM Comparison
To understand how to choose second-source
PROMs, consider a comparison with the process of
choosing second-source SRAMs. Ignoring the AC
characteristics, finding a second source for an SRAM is
relatively simple. So long as the power, ground, control
(chip select, read, write), address, and data lines are on
the same pins, the devices should be compatible.
Specifically, on SRAMs, the address and data lines need
not be numbered identically between the two devices for
them to function identically in the same socket. As an
example, on several Cypress SRAMs, the address pin
numbering is not the same as that of some of our competitors.
Consider a simplified example that illustrates why
address pin numbering is not a problem: Assume you
have a new device, the 2-bit x 4-location SRAM shown
in Figure 1. Note that the inferior pin-out chosen by the
Brand "X" 2 x 4 assigns address line 2 (A2) to pin 1,
while the superior pin-out used by the Cypress device
has A1 at pin 1, etc.

Pin    Cypress 2 x 4    Brand "X" 2 x 4
1      A1               A2
2      A2               A1
3      D1               D2
4      D2               D1

Figure 1. Example 2x4 Simplified SRAMs

Assume that your engineering staff designed an infrared scanning-pattern-recognizing toaster oven using
the Brand "X" SRAM, working only from the device's
data sheet. Just as your company is about to ramp into
volume production, Brand "X" sends out an End Of
Life notice on the 2 x 4, because the company is converting all of its capacity to making DRAMs.
At this point, because you have no desire to lay out
a new PCB, take a look at how the Cypress and Brand
"X" SRAMs would look in your design (Figure 2). In the
figure, µP designates a microprocessor interfacing to
the SRAM. The important thing to notice in Figure 2 is
that the data read from an address generated by the
microprocessor is the same as data written to the same
location earlier. With an SRAM, any inconsistency between the address and data line numbering does not
matter because the data read is the same as the data
previously written.

Brand "X" board wiring (same for either device): µP A2 to pin 1, µP A1 to pin 2, pin 3 to µP D2, pin 4 to µP D1

Figure 2. Example System with 2x4 SRAMs

To illustrate the point further, suppose that you
write a value of 1 (µP:D2,D1 = 0,1) at location 2
(µP:A2,A1 = 1,0). If you read location 2, you obtain
the value 1 that was written, because the address
presented to the SRAM during the read is the same as
the address for the previous write. Similarly, the data
read is in the same bit order as presented during the
previous write to the location. So far as the system is
concerned, the two SRAM devices are compatible.
Although not significant to the system, the devices
differ in where they internally store the data. In the
Cypress device, the µP address of 2 (µP:A2,A1 = 1,0)
actually stores the data at SRAM location 1
(Cypress:A2,A1 = 0,1). The Brand "X" RAM physically
stores the data at address 2.
The address translation is transparent to the µP,
however. Because the same location is accessed for the
subsequent reads, the difference in address numbering
between the two devices does not matter to the system.
Similarly, any numbering difference on the data lines
does not matter either. All writes and reads are
generated in your system; thus, so long as the address
and data lines are on the same pins, differences in the
numbering do not matter.
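
The SRAM argument can be checked with a short simulation. The following Python sketch models the 2 x 4 example, with the pin assignments taken from the description above (Brand "X": A2, A1, D2, D1 on pins 1 to 4; Cypress: A1, A2, D1, D2). The class and helper names are hypothetical and exist only for this illustration.

```python
# Pin number -> signal name for each device in the 2 x 4 example.
BRAND_X = {1: "A2", 2: "A1", 3: "D2", 4: "D1"}
CYPRESS = {1: "A1", 2: "A2", 3: "D1", 4: "D2"}


def bit_map(board, device, prefix):
    """Return {board bit index: device bit index} for address ("A") or data ("D") pins."""
    return {int(board[p][1]) - 1: int(device[p][1]) - 1
            for p in board if board[p].startswith(prefix)}


def permute(value, mapping):
    """Move each bit of value from its source position to its mapped position."""
    out = 0
    for src, dst in mapping.items():
        out |= ((value >> src) & 1) << dst
    return out


class SocketedSram:
    """A 2-bit x 4-location SRAM plugged into a board laid out for Brand "X"."""

    def __init__(self, device_pinout, board_pinout=BRAND_X):
        self.addr_map = bit_map(board_pinout, device_pinout, "A")
        self.data_map = bit_map(board_pinout, device_pinout, "D")
        self.data_unmap = {dst: src for src, dst in self.data_map.items()}
        self.cells = [0] * 4

    def write(self, addr, data):
        self.cells[permute(addr, self.addr_map)] = permute(data, self.data_map)

    def read(self, addr):
        return permute(self.cells[permute(addr, self.addr_map)], self.data_unmap)


cypress = SocketedSram(CYPRESS)
cypress.write(2, 1)               # µP writes value 1 at location 2
print(cypress.read(2))            # the µP reads back what it wrote
print(cypress.cells)              # but the device stored it at its own location 1
```

The internal location and bit order differ from what the microprocessor believes, yet every read returns what was written, because writes and reads pass through the same wiring.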

Second-Sourced PROMs
For PROMs, the scenario becomes slightly more
complex. Because you program PROMs using a
programmer that is separate from the system in which
they are used, it is more difficult to substitute PROMs
that do not have the same address- and/or data-pin numbering.
Assume, for example, that the high-tech toaster
oven's 2 x 4's are PROMs. If you program each location
with data, you find that the Cypress device does not
work properly when used in the Brand "X"-designed
socket. In this case, the PROM programmer puts the
data at location 2 (A2,A1 = 1,0), and the board reads this data when
the microprocessor requests the data at location 1 (A2,A1 = 0,1). Additionally, the data bits are swapped on this read. What a
mess! It becomes apparent that it is easiest to replace
this PROM with a device that has the same address- and
data-line numbering.
There are methods that allow you to use the Cypress
2 x 4 PROM in the Brand "X" socket, however. The objective in trying to make the Cypress PROM work in the
foreign pin-out socket is to have the system read the
same data as when the Brand "X" device is used. In the 2
x 4 example, you encounter two problems: mismatches in
the numbering of address lines and data lines.

Correcting Data-Line Mismatch
First consider the data-line mismatch. As it stands,
data programmed in as bit1,bit2 is read as bit2,bit1. You
could fix this problem by swapping the printed traces for
D1 and D2. Unfortunately, this would also disallow the
use of the Brand "X" device.
If you could internally swap the data bits
programmed into the Cypress device, they would be in
the correct order when read. You can, in fact, swap the
data bits in the Cypress device through several means.
First, you might modify your programming adapter such
that D2 and D1 are swapped when programming the
part. Then when the device is read, you get the bits in
the same order as presented by the Brand "X" device.
This is not a recommended method of solving the problem, because modifying programmers tends to make the
manufacturer of the programmer unhappy.

1) Brand "X" 2 x 4        : Bit 2, Bit 1
2) Programmer (Cypress)   : Bit 1, Bit 2
3) Cypress 2 x 4          : Bit 1, Bit 2
4) System Board µP        : Bit 2, Bit 1

Figure 3. PROM Bit Swapping with Programmer
A second method of solving this problem is to alter
the binary image of the PROM contents such that bits
D1 and D2 are swapped in a file on your computer's
disk; this altered binary image file is then used to program the Cypress PROM. This approach is less likely to
cause damage than modifying a programmer, but requires some skill in altering the binary file.
Finally, the easiest solution to this problem is to
trick the PROM programmer into swapping the bits for
you. If you set your programmer for the Cypress device
type, read a programmed Brand "X" device into memory,
then program the Cypress part with the image in
programmer memory, the bits are swapped for you.
You can see how this bit swapping works by examining Figure 3. The bits in the Brand "X" device are stored
in the order Bit2,Bit1, the same order in which the
toaster's µP reads them. When you set the programmer
to read the Cypress part, the data lines are logically
swapped from the Brand "X" ordering. Thus, when you
read the Brand "X" part, the data bits are swapped as
shown.
When the Brand "X" part is removed from the socket, and the Cypress device is plugged in and
programmed, the bits are programmed into the Cypress
part in this same "reversed" order. When you place the
Cypress part into your board, the bits are swapped again
due to the difference in numbering between the Cypress
part and the board layout, and the µP gets the data in
the correct order.
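
The double swap described above can be traced in a few lines of Python. This is a sketch of the 2 x 4 example only, covering the data lines and ignoring addressing. The pin assignments are those of the example devices (Brand "X": D2 on pin 3, D1 on pin 4; Cypress: D1 on pin 3, D2 on pin 4), and the function names are hypothetical.

```python
BRAND_X_DATA = {3: "D2", 4: "D1"}   # pin -> data signal, Brand "X" numbering
CYPRESS_DATA = {3: "D1", 4: "D2"}   # pin -> data signal, Cypress numbering


def through_pins(word, source, sink):
    """Carry a 2-bit word across the package pins from one numbering to the other."""
    out = 0
    for pin in source:
        src_bit = int(source[pin][1]) - 1
        dst_bit = int(sink[pin][1]) - 1
        out |= ((word >> src_bit) & 1) << dst_bit
    return out


def convert(brand_x_contents):
    """Read a Brand "X" part in a programmer set to the Cypress device type."""
    # The programmer interprets the pins with Cypress numbering, so every word
    # arrives in programmer memory with its bits swapped (first swap).
    return [through_pins(w, BRAND_X_DATA, CYPRESS_DATA) for w in brand_x_contents]


def board_reads(cypress_contents):
    """What the µP sees from a Cypress part in the Brand "X" socket (second swap)."""
    return [through_pins(w, CYPRESS_DATA, BRAND_X_DATA) for w in cypress_contents]


original = [0b01, 0b10, 0b11, 0b00]
cypress_part = convert(original)
print(cypress_part)                 # image as stored in the Cypress part
print(board_reads(cypress_part))    # what the µP reads back
```

The second list matches the original contents, whereas programming the unmodified image into the Cypress part would deliver every word to the µP with its two bits exchanged.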

Correcting Address-Line Mismatch
The second problem in substituting PROMs is the
difference in address-line numbering. You can resolve
this problem in exactly the same manner as the data
swap problem. By simply setting the programmer to the
Cypress device type, reading the Brand "X" part, then
programming the Cypress part, any addressing differences are solved. The locations of data words are swapped
to allow for the difference in pin-outs, just as the bits
were swapped in the data-line mismatch.

Working with PROM Programmers
Many programmers allow you to read a device different than the part selected, complaining only during
programming if the device types do not match. With
such a programmer, carrying out the procedures to convert a PROM should not present a problem.
Some programmers, however, do not allow you to
read a device if it is different from the part selected.
These programmers prevent the conversion method
from working. Fortunately, the Cypress CY3000 QuickPro programmer does permit use of the conversion
method. Cypress Field Applications Engineers, sales offices, and distributors can use their QuickPro programmers to generate a Cypress master PROM that you can
use as a source for copying with uncooperative
programmers.

Conversion Example
As an example of a PROM conversion, consider
the Motorola 68764 8K x 8 PROM. It has a similar pin-out to the Cypress CY7C264, with the exception of address lines 10, 11, and 12.

PIN    Cypress 7C264    Motorola 68764
21     A10              A12
19     A11              A10
18     A12              A11

Figure 4. Cypress 7C264 vs. Motorola 68764 Pin-out

To program a Cypress CY7C264 to work properly
in a socket designed to accept the Motorola device, use
this procedure:
1. Invoke the Cypress QuickPro or other appropriate
programmer and select the Cypress CY7C264 as the
device to be programmed.
2. Place the Motorola part in the programmer adapter
socket and read the device. Optionally, write the device
contents to a disk file.
3. Place a Cypress CY7C264 in the programmer adapter
socket, and program the part. Optionally, you can read
the contents of the disk file as the source for programming.
The programmed device now works in the socket
designed for the Motorola part.
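
Where a programmer will not read the Motorola part as a CY7C264, the same result can in principle be produced in software, along the lines of the altered-binary-image method described earlier for data bits. The Python sketch below rearranges an 8K x 8 image using the pin table in Figure 4. It assumes that only address lines 10, 11, and 12 differ between the two parts, as stated above; the function names are hypothetical, and no programmer file format is implied.

```python
# pin: (Cypress 7C264 address bit, Motorola 68764 address bit), from Figure 4
PIN_TO_ADDRESS_BIT = {
    21: (10, 12),
    19: (11, 10),
    18: (12, 11),
}


def cypress_address(system_address):
    """Address the 7C264 sees when the board drives system_address into a 68764 socket."""
    remapped = system_address & 0x03FF          # A0-A9 are assumed pin-compatible
    for cypress_bit, motorola_bit in PIN_TO_ADDRESS_BIT.values():
        remapped |= ((system_address >> motorola_bit) & 1) << cypress_bit
    return remapped


def convert_image(motorola_image):
    """Rearrange a 68764 image so a 7C264 returns the same data in the same socket."""
    if len(motorola_image) != 8192:
        raise ValueError("expected an 8K x 8 image")
    cypress_image = bytearray(8192)
    for system_address, byte in enumerate(motorola_image):
        cypress_image[cypress_address(system_address)] = byte
    return bytes(cypress_image)


print(hex(cypress_address(0x1000)))   # system A12 lands on the Cypress A10 pin
```

Each system address maps to exactly one Cypress address, so the rearranged image holds every original byte once, at the location the board will actually select.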


Introduction to Diagnostic PROMs

This application note provides a basic understanding
of the concept of a diagnostic PROM, as well as a brief
introduction to possible applications.
Beginning with a short tutorial on system diagnostics,
this application note presents the reason for incorporating
diagnostics into a design and the special testability
problems associated with sequential designs. The concept
of shadow-register-based diagnostics is presented, and the
benefits of this approach are outlined.
Next, a description of diagnostic PROMs is given.
This covers the similarity of diagnostic PROMs to standard registered PROMs, as well as the fundamental operation of a diagnostic PROM. Next is a description of the
Cypress CY7C268 and CY7C269 8K x 8 diagnostic
PROMs. An application example is also included.

Introduction to System Diagnostics
As electronic systems continue to grow in size, function, and complexity, it is becoming increasingly difficult
to test them and determine their reliability, as well as to
service the end product in the field. One way to simplify
the task of testing electronic systems is to design some
form of testability into the system.
Controllability and observability are the key points of
testability. These two qualities are easily obtained for a
combinatorial system in which the outputs are strictly a
function of the current inputs. Test vector methods are
easily devised and implemented for combinatorial systems. But, for a sequential system, in which the outputs
are a function of both the current inputs and the previous
state(s), controllability and observability can be lost due to
lack of access to the internal states of the machine. Consequently, building testability into a system means being
able to control and observe all possible states of the system.
Consider the simple sequential machine in Figure 1.
Access to internal states is either denied or difficult to obtain. The obvious way to add testability to this system is
to permit access to these internal states.

Figure 1. Simple Sequential Machine

One way to gain this access is through addition of a
diagnostic shadow register, as shown in Figure 2. Observability is effected by adding a serial data output path
(SDO) to allow shifting internal state information out of
the system. Controllability is gained by permitting a serial
data input path (SDI) to set the state of the internal
registers. As a result, relatively simple test vector methods
can be used to test the system.

Figure 2. Simple Sequential Machine with Diagnostic Capability

Figure 3. Complex Sequential Machine
Consider, for example, the complex sequential
machine shown in Figure 3. This system would be virtually impossible to test in the current configuration because
you cannot control or observe the machine's internal
states. To increase this machine's testability, observability
must be added at points O1, O2, and O3. If this were accomplished, you would be able to observe the internal
states of the machine. Additionally, controllability must be
added at points C1, C2, and C3. This would allow you to
set the internal states of the machine.
This controllability and observability can be attained
by adding shadow registers, as depicted in Figure 4. The
result is a complex sequential machine with a high degree
of testability. As a result of these actions, simple test vector methods can now be used to fully test the machine.
For instance, the state of the register at point C1 can be
set, the machine can be clocked through some known
number of cycles, and the state of the machine can be
observed at points O1, O2, and O3.
The state the machine should be in at a
specific time at each observation point (the machine's
"known-correct" state) can then be compared with the observed
machine state. This comparison determines whether the machine
is functioning correctly and, if it is not, which "machine
primitive" is not functioning correctly (fault detection).
Note that this approach to sequential design also permits testing to see what the machine would do if a glitch
caused a jump into an unused state. This capability makes
the design task of forcing the machine back into a known
state much less complex.
The real advantage of this approach is that it requires
no changes in architecture and only minimal hardware changes,
and results in a minimal (5 to 10 percent) area penalty
when integrated into existing integrated circuits.
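
The test sequence described above (set a state, clock a known number of cycles, observe, compare with the known-correct state) can be sketched in Python. The 3-bit counter, the stuck-bit fault, and all names below are invented for illustration; they do not model any machine in the figures.

```python
class ScanMachine:
    """3-bit sequential machine whose state register can be loaded and read serially."""

    def __init__(self, next_state):
        self.next_state = next_state    # combinatorial logic: state -> next state
        self.state = 0

    def scan_in(self, bits):
        """Controllability: shift a state in, one bit per diagnostic clock."""
        self.state = 0
        for bit in bits:
            self.state = ((self.state << 1) | bit) & 0b111

    def clock(self, cycles=1):
        """Normal operation: advance the machine by the given number of clocks."""
        for _ in range(cycles):
            self.state = self.next_state(self.state) & 0b111

    def scan_out(self):
        """Observability: read the state out, most significant bit first."""
        return [(self.state >> i) & 1 for i in (2, 1, 0)]


def good_logic(state):
    return state + 1                    # intended behavior: modulo-8 counter


def faulty_logic(state):
    return (state + 1) & 0b011          # same counter with bit 2 stuck at 0


def run_test(machine, start_bits, cycles, expected_bits):
    """Return True when the observed state matches the known-correct state."""
    machine.scan_in(start_bits)
    machine.clock(cycles)
    return machine.scan_out() == expected_bits


print(run_test(ScanMachine(good_logic), [0, 1, 1], 2, [1, 0, 1]))
print(run_test(ScanMachine(faulty_logic), [0, 1, 1], 2, [1, 0, 1]))
```

The same vector passes on the good machine and fails on the faulty one. Because any state can be scanned in, unused states can be exercised in the same way.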

Diagnostic PROMs
Diagnostic PROMs are a relatively minor migration
from standard registered PROMs. A block diagram of a
diagnostic PROM appears in Figure 5. The addition of
diagnostic capability to a registered PROM includes the
addition of:
Shadow register
Multiplexer
MODE pin
SDI (Serial Data In) pin
SDO (Serial Data Out) pin
Diagnostic clock
The shadow register is dynamically configured, based
on the value of the mode signal. If the mode is set to input
data to the PROM, the shadow register is configured as
serial-in, parallel-out; if you want to extract information
from the PROM, the shadow register is configured as a
parallel-in, serial-out.
The shadow register thus serves two purposes. First,
it can be configured to serially receive state information
that will appear at the outputs during the next cycle. This
feature allows you to preset a condition to be sent through
the part of the system fed by the PROM; i.e., you can
insert state information into the system. This feature adds
controllability to the system.
The second purpose that the shadow register serves is
to allow you to transfer state information from the register
and to serially shift that data out of the PROM. This feature adds observability by allowing you to observe the
state of the PROM's pipeline register at any given time.
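
The two roles of the shadow register can be illustrated with a small behavioral model. The Python sketch below is generic: it does not reproduce the mode encoding or timing of any Cypress device, the method names are hypothetical, and shifting most significant bit first is an assumption made only for this example.

```python
class DiagnosticProm:
    """Registered PROM with an 8-bit shadow register (behavioral sketch only)."""

    def __init__(self, contents):
        self.contents = list(contents)
        self.pipeline = 0    # output (pipeline) register
        self.shadow = 0      # diagnostic shift register

    def clock_pipeline(self, address):
        """Normal operation: latch the addressed word into the pipeline register."""
        self.pipeline = self.contents[address]

    def shift(self, sdi):
        """One diagnostic clock: return the bit leaving on SDO, take SDI in at the other end."""
        sdo = (self.shadow >> 7) & 1
        self.shadow = ((self.shadow << 1) | (sdi & 1)) & 0xFF
        return sdo

    def shadow_to_pipeline(self):
        """Controllability: force the outputs to the serially loaded value."""
        self.pipeline = self.shadow

    def pipeline_to_shadow(self):
        """Observability: capture the outputs for serial read-out."""
        self.shadow = self.pipeline


prom = DiagnosticProm([0x3C, 0xA5])

# Observe: capture the pipeline register and shift it out.
prom.clock_pipeline(1)
prom.pipeline_to_shadow()
observed = [prom.shift(0) for _ in range(8)]
print(observed)

# Control: shift in a chosen value and transfer it to the outputs.
for bit in [0, 1, 0, 1, 1, 0, 1, 0]:
    prom.shift(bit)
prom.shadow_to_pipeline()
print(hex(prom.pipeline))
```

The first part reads the word that the PROM placed in its pipeline register; the second part overrides that register with a value of the tester's choosing.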

Figure 4. Complex Sequential Machine with Diagnostic Capability
(MODE, SDI, SDO, and DCLK are provided for each "Machine Primitive.")

Including the features listed above in a registered
PROM can therefore add testability to any system. Note
that this increase in function is effected without loss of
other desirable registered-PROM features, such as
programmable initialization, programmable output enable,
etc.

Cypress Diagnostic PROMs
Cypress Semiconductor manufactures two diagnostic
PROMs: the CY7C268 and CY7C269. These 64-Kbit, byte-wide (8K x 8) diagnostic PROMs are manufactured in CMOS for
an optimum speed/power tradeoff.
Both PROMs contain an edge-triggered pipeline
register and on-chip diagnostic shift register. Both PROMs
can withstand 2001 V ESD. Both PROMs are produced in
Cypress's EPROM-based process, which allows testing
for 100-percent programmability. Both PROMs are available in PLCC/LCC and dual-inline packages, and both
PROMs are available in a windowed package for
reprogrammability.

Figure 5. Diagnostic PROM Block Diagram

The CY7C268 features full diagnostic capability and is
available in 32-lead PLCC/LCC or 32-pin 0.5-inch DIPs.
The CY7C269 features limited diagnostic capability and is
available in 28-lead PLCC/LCC or 28-pin 0.3-inch DIPs.
For an in-depth description of the PROMs' functions,
refer to the data sheets. The following discussion briefly
describes the diagnostic functions available in each
device.

CY7C268
A condensed block diagram of the CY7C268 appears
in Figure 6. Table 1 lists the pin names and functions of
the CY7C268.
Note that full diagnostic capability is realized through
the use of four control signals: SDI (Serial Data In), SDO
(Serial Data Out), MODE, and DCLK (diagnostic clock).
Including both DCLK and PCLK ensures that serial data
can be shifted into or out of the diagnostic register while
the PROM is operating in normal pipeline fashion. As a
result, the CY7C268 has three possible modes of operation:
Normal (pipelined)
Diagnostic
Pipelined and diagnostic simultaneously
Table 2 summarizes the operational modes of the
CY7C268.

Table 1. CY7C268 Pin Functions
Name      I/O   Function
A0-A12    I     Address Input
O0-O7     O     Data Lines
ENA       I     Synchronous or Asynchronous Output Enable
INIT      I     Asynchronous Initialize
MODE      I     Sets PROM to Operate in Pipelined or Diagnostic Mode
DCLK      I     Diagnostic Clock (Used to Clock the Shadow Register)
PCLK      I     Pipeline Clock (Used to Clock the Output Registers)
SDI       I     Serial Data In (Used to Serially Shift Data into the Diagnostic Register)
SDO       O     Serial Data Out (Used to Serially Shift Data Out of the Diagnostic Register)

Figure 6. Condensed Block Diagram of the CY7C268

Table 2. CY7C268 Operational Modes
Data Flow Description    Mode   ENA[1]   SDI       SDO   DCLK          PCLK
Normal Operation[1]      L      H,L      Data In   SDO   --            Rising Edge
Shadow to Pipeline[1]    H      H,L      X         SDI   --            Rising Edge
Pipeline to Shadow       H      L        L         SDI   Rising Edge   --
Data In to Shadow        H      H        L         SDI   Rising Edge   --
Shift Shadow Reg.[1]     L      H,L      Data In   SDI   Rising Edge   --
No Operation[1]          H      H,L      H         SDI   Rising Edge   --

Note: 1. For the asynchronous-enable operation, data out is enabled on the first Low-to-High clock transition after E is
brought Low. When E goes from Low to High (enable to disable), the outputs go to the high-impedance state after a
propagation delay if the asynchronous enable was programmed. If the synchronous enable was selected, a Low-to-High
transition is required.

CY7C269
A condensed block diagram of the CY7C269 appears
in Figure 7. The CY7C269 has reduced diagnostic function relative to the CY7C268. The CY7C269 is ideal for
applications requiring limited diagnostics with a premium
on board-space conservation. This PROM is available in
28-pin, 300-mil DIPs (windowed or opaque) and in 28-lead PLCC/LCC packages. The pin names and functions
of the CY7C269 are listed in Table 3.
Note that limited diagnostic capability is realized
through inclusion of three diagnostic signals: MODE,
SDI, and SDO. Because there is only one clock, the
regular and diagnostic modes are mutually exclusive.
Table 4 summarizes the operating modes of the
CY7C269.

Table 3. CY7C269 Pin Functions
Name      I/O   Function
A0-A12    I     Address Input
O0-O7     O     Data Lines
E,I       I     Enable or Initialize
Clock     I     Pipeline and Diagnostic Clock
MODE      I     Sets PROM to Operate in Either Diagnostic or Regular Pipelined Mode
SDI       I     Serial Data In
SDO       O     Serial Data Out

Figure 7. Condensed Block Diagram of the CY7C269

Table 4. CY7C269 Operating Modes
Data Flow Description    Mode   E,I       Clock         SDI       SDO
Normal Operation         L      [1],[2]   Rising Edge   X         High Z
Shadow to Pipeline       H      L         Rising Edge   L         SDI
Pipe or Bus to Shadow    H      L         Rising Edge   H         SDI
Shift Shadow             H      H         Rising Edge   Data In   SDO

Notes:
1. The E or I function is selected during programming.
2. If I is selected, the outputs are always enabled. If E is selected, the outputs are enabled synchronously or asynchronously, as programmed.
3. If I is selected, the outputs are always enabled. If E is selected, during diagnostic operation the data outputs remain in
the state they were in when the mode was entered. When enabled, the data outputs reflect the outputs of the pipeline
register. Any changes in the data in the pipeline register appear on the output pins.

Design Example
As an example of using diagnostic PROMs, consider
the complex sequential machine presented earlier. This
machine could be easily implemented using CY7C268s or
CY7C269s, as shown in Figure 8. Note that the block
labeled "diagnostic control" could consist of PLDs,
PROMs, a sequencer, or a small microcontroller. Choosing between the CY7C268 and the CY7C269 is based on
the complexity of the diagnostic function required. For
full diagnostics that can function simultaneously with
regular pipelined operation, use the CY7C268. For an application where limited diagnostic capability is required (perhaps only a function at power-up or some other well-defined time), use the CY7C269.

Figure 8. Complex Sequential Machine Implemented with Cypress Diagnostic PROMs

Interfacing the CY7C289 to the AM29000
CY7C289 PROMs

This application note describes how to use high-speed Cypress CY7C289 PROMs to design an instruction memory system with virtually zero wait states for a
33-MHz AMD AM29000. The design includes 1 Mbyte
of CY7C289 PROMs in addition to the interface circuitry used to support processor bursts. A logic
schematic and the equations for the PLDs used in the
memory interface are included.
Traditionally, PROMs have been much slower than
RAMs. System designers used PROMs only for the
boot process, immediately transferring the information
into RAMs once power-up was complete. This inefficient solution wasted a considerable amount of board
space, but system performance was generally considered more important.
The need for this tradeoff is now evaporating.
Cypress PROMs have narrowed the speed gap between
RAMs and PROMs to almost nothing. The CY7C289
PROMs use a fast-column-access architecture to
produce on-page access times of just 20 ns (for
registered mode) at a 512-Kbit (64K x 8) density. This
architecture takes advantage of the burst mode feature
common in many current microprocessors. Because
most 32-bit processors burst just 16 bytes in a wrap-around fashion, the burst mode accesses fall within a
single page of the CY7C289 PROMs. Thus, each access
in a burst to the PROM is always completed in 20 ns.
Even with a prOcessor that generates bursts considerably longer than 16 bytes, the CY7C289 can supply all
the data in a burst from a single page. An excellent example of this capability is the 29000 instruction memory
design described in this application note. Even though
29000 bursts can be up to 1 Kbyte long, the memory
design described here never requires a wait s.tate dur~g
a processor burst. Wait states .are only.re9-urred dunng
an initial access,· and the .maxImum . walt· In a 33~MHz
system is just two clock cycles.
Figure 1 displays a block diagram of the ins~ction
memory system design for the 29000. The deSIgn has
three basic blocks: the 29000 microprocessor, the controllogic, and 1 Mbyte of CY7C289 PROM.

The CY7C289 is one of four new 64K x 8
reprogrammable PROMs offered by Cypress Semiconductor. Two of these PROMs, including the CY7C289,
feature the unique fast-column-access architecture. On
these devices, the PROM array is divided into 1024
pages that are each 64 bytes long. Any consecutive access to the same CY7C289 page requires just 20 ns to
complete. If an access cr~ss~s an internal. P~OM pa~e,
the device delivers data wlthm 65 ns. To mdlcate an mternal page crossing to the external circuitry, the
CY7C289 generates a W AIT\ signal.
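The page behavior described above can be summarized in a small model. The following Python sketch is illustrative only: the helper names `page_of` and `access_time_ns`, and the idea of comparing each address against the previously accessed one, are assumptions made for the example. The 64-byte page size and the 20-ns and 65-ns figures are taken from the text.

```python
PAGE_BYTES = 64     # bytes per internal PROM page (from the text)
ON_PAGE_NS = 20     # on-page access time in registered mode (from the text)
OFF_PAGE_NS = 65    # access time when an internal page is crossed (from the text)

def page_of(addr):
    """Return the internal page number (inputs A6-A15) of a 16-bit PROM address."""
    return (addr & 0xFFFF) // PAGE_BYTES

def access_time_ns(prev_addr, addr):
    """Hypothetical helper: time to access addr, given the previous address."""
    if page_of(prev_addr) == page_of(addr):
        return ON_PAGE_NS
    return OFF_PAGE_NS

print(access_time_ns(0x0000, 0x003F))  # both addresses lie in page 0
print(access_time_ns(0x003F, 0x0040))  # second address lies in page 1
```

An off-page result in this model corresponds to the condition under which the device would assert WAIT\.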
Along with the unique array architecture, the CY7C289 provides a variety of programmable features to simplify the memory interface. Among these programmable features is the ability to capture the input address with on-chip registers or latches.
If you select the address latch option, the address flows into the PROM during the active portion of the ALE signal and is captured when ALE is deasserted (the ALE signal's polarity is programmable). This option is appropriate for most CISC processors, which supply a valid address after the system clock's rising edge. The ALE option can improve system performance by allowing the PROM to capture the address as soon as it becomes available, as opposed to waiting for the system clock's next rising edge. The drawback to the address latch option is that external logic might be required to generate the ALE signal.
If you select the CY7C289's registered option, the address at the input is captured at the CLK input's rising edge. The advantage of the registered mode is that the memory interface is often simpler. This configuration is particularly useful when interfacing to RISC processors. Most of these processors generate addresses around the rising edge of a system clock, making it easy to capture the address with the CY7C289 input registers. (See the application note, "Interfacing the CY7C289 to the CY7C601.")
Another important CY7C289 feature is the ability to program the polarity of two chip selects (CS1 and CS2), which facilitates automatic bank selection for up to four banks of PROM. Proper use of the chip selects also allows you to extend the PROM page's length beyond 64 words when using multiple banks of PROM. This capability improves the system's performance by effectively increasing the size of a page in the CY7C289s (more on this later).
Here is a complete list of the programmable features available on the CY7C289:

The input address can be either registered at CLK's rising edge or latched by the ALE input.

You can program the address set-up and hold window.

You can program the WAIT output's polarity.

You can program the ALE input's polarity.

The WAIT output can be generated off the falling or rising edge of CLK for the registered-mode CY7C289.

You can program the polarity of both chip selects (CS1 and CS2).

You can set each of these options by appropriately programming a reserved PROM location. Therefore, the devices are configured at the same time the array is programmed.

[Figure 1: block diagram showing the AM29000 (I0-I31, IRDY, IREQ, IBREQ, A2-A9, A10-A31), the control logic with its counter and multiplexer, and four banks of CY7C289 PROM (D0-D31, A0-A5, CS1, CS2, A6-A15, WAIT, ALE).]

Figure 1. AM29000 Instruction Memory Block Diagram

AM29000 Microprocessor
The 29000 is a 32-bit general-purpose microprocessor used mainly in embedded controller applications. The version used in this design operates at 33 MHz, the highest-speed 29000 currently available. The processor's pipelined RISC architecture attempts to execute an instruction in every clock cycle. To do this, the 29000 relies heavily on burst-mode accesses.
The 29000 contains three buses, one each for address, data, and instruction. During a normal access, the three-bus architecture behaves essentially like a two-bus system (address and data), because the dedicated instruction and data buses must wait for the shared address bus. In burst mode, however, only the initial data or instruction address is sent to the corresponding memory space, and the task of incrementing the address is left to the interface circuitry. Thus, during a burst, both the data and instruction buses can operate simultaneously without having to wait for the shared address bus. In other words, a 29000 in burst mode can fully utilize the separate data and instruction bus architecture.
Although the 29000 achieves maximum throughput during bursts, AMD did impose a limit on a burst's length: The 29000 only performs bursts within a 1-Kbyte boundary. Therefore, an 8-bit counter suffices to increment the burst addresses. Note that a 29000's maximum burst length depends on where it begins. For example, if a burst begins four words before the end of a 1-Kbyte boundary, the burst can at most be four words long.

[Figure 2: control logic schematic showing the 22V10, the 16R4, the two 16L8s, the 74F74, and the 74F112.]

Figure 2. Control Logic for the 29000 Instruction Memory
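The 1-Kbyte burst limit described in the AM29000 Microprocessor section can be checked with a few lines of arithmetic. This Python sketch is only an illustration; the function name `max_burst_words` is an assumption, and the code assumes word-aligned byte addresses.

```python
BOUNDARY_BYTES = 1024   # the 29000 bursts only within a 1-Kbyte boundary
WORD_BYTES = 4          # one 32-bit instruction word

def max_burst_words(start_addr):
    """Words available from a word-aligned start_addr up to the next 1-Kbyte boundary."""
    offset = start_addr % BOUNDARY_BYTES
    return (BOUNDARY_BYTES - offset) // WORD_BYTES

# A burst that starts four words before a boundary can run for four words.
print(max_burst_words(0x0400 - 4 * WORD_BYTES))
# A burst that starts on a boundary can run for 256 words, the range of an 8-bit counter.
print(max_burst_words(0x0400))
```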

Control Logic
The memory interface logic required for this design is detailed in Figure 2 and appears symbolically in Figure 3. In addition, PLD Toolkit source code for the PLDs used in the control logic appears in Appendices A through D.
Referring to the block diagram in Figure 1, note that the interface circuitry performs two primary functions. One is to generate all necessary interface signals, and the other is to increment the instruction address to support processor bursts. The hardware required to implement the interface consists of two SSI devices (a 74F74 and a 74F112) and four small PLDs. The 22V10 PLD is a 15-ns version, while the remaining three PLDs (one 16R4 and two 16L8s) have a maximum propagation delay of 5 ns.

PROM Configuration
In this application, 16 CY7C289 PROMs constitute
a 1-Mbyte instruction memory, distributed in four
banks. The CY7C289s' 22-ns access time (in latch
mode) allows on-page accesses to complete in a single
clock cycle at 33 MHz. Proper use of the programmable
chip selects ensures that all burst accesses fall within
the same PROM page and never require a processor
wait cycle.
The CY7C289s are configured with address latches
to take full advantage of the 29000's mid-clock address
release. Latch mode minimizes the number of wait
cycles during a single access or during a burst's first access. The set-up and hold window for the address
latches should be programmed to minimize the hold
time required after latch close. This setting is critical to
proper operation of the address increment circuitry.
The CY7C289s' chip selects are programmed on a
bank-to-bank basis, such that each bank has a unique
polarity combination of CS1 and CS2. This arrangement
permits PROM bank selection without external address
decoding. The other applicable programmable features
on the CY7C289 are the polarity of the WAIT\ and
ALE signals. In the design implemented here, WAIT\
is active Low and ALE is active High.

Memory Interface
The 29000 has a few peculiarities that affect the
memory system design. For example, the instruction bus
is unidirectional. The 29000 can only READ from instruction memory. This limitation makes it difficult to
use RAMs for instruction memory, because there is no
mechanism to load the instructions into the RAMs to
begin with, but the nonvolatile nature of PROMs makes
them ideal for this application.
One way to use RAM for the instruction memory is
to trick the 29000 into thinking it is writing to data
memory (the data bus is bidirectional), but route the information back to the RAMs in instruction memory.
Implementing this memory subsystem requires two 32-bit 2:1 multiplexers on the data and instruction buses, in
addition to the associated glue logic necessary to control the transfer. To use the memory subsystem, the system copies the instruction information (from boot
PROMs located elsewhere on the board) onto the data
bus and subsequently into the RAMs on the instruction
side of memory. This solution is costly, wastes board
space, and slows system operation by adding multiplexer delays into both the instruction- and data-bus
paths.
A much better solution is to use PROMs in the instruction memory. Because they are nonvolatile, the instruction information is programmed into the device
prior to assembling the system, eliminating the extensive
logic needed to write to the instruction bus. Further,
with the CY7C289's high speed, the system has no need
for shadow RAMs. The resulting circuit occupies much
less board space than the RAM-based version and
provides better system performance. Moreover, lessening the number of components improves the circuit's
reliability.
Another unusual 29000 feature is the processor's
ability to suspend bursts to instruction memory. At any
time during an instruction burst, the 29000 can suspend
the sequence by deasserting the burst request signal
(IBREQ\). The instruction memory must respond by
discontinuing its operation while the IBREQ\ signal is
inactive. When the processor reasserts IBREQ\, the
memory system must resume from the point at which
the burst was suspended. Note that because the 29000
does not send a new address at this point, the interface
logic has to remember the address at which the processor suspended the burst. An instruction burst is not
complete until the 29000 asserts the instruction request
signal (IREQ\) and sends a new address. The interface
logic described in this design fully supports suspended
bursts.

Address Connection Scheme
For the most part, the placement of the CY7C289 PROMs in this design is straightforward. However, there are two important memory design features that bear clarification. The first is the address connection scheme used for the CY7C289 PROMs. In Figure 3's display of the address input to the CY7C289s, notice that the addresses fed to the PROMs are not entirely sequential. This non-sequential addressing scheme is used with the chip selects to extend the effective PROM page length to 1 Kbyte, and thus achieve no-wait-state burst performance.
To understand how this is done, consider some internal details of the CY7C289. In this PROM, the lowest six address inputs (A0 - A5) designate a specific byte within a 64-byte internal PROM page. Inputs A6 - A15 select one of 1024 PROM pages. When any of the inputs at pins A6 - A15 changes, a new page is selected, and the CY7C289 asserts the WAIT\ output.
You can think of the CY7C289's chip selects (CS1 and CS2) as additional address inputs in a multi-bank memory system. Like A0 - A5, changes at these inputs do not result in an internal CY7C289 page change. With four banks of PROM, you have a total of 8 address bits (A0 - A5, CS1, and CS2) that do not affect the internal PROM page, as opposed to just 6 (A0 - A5) when using one bank of PROM. The 8 bits of on-page addresses translate into a PROM page length of 256 words, or 1 Kbyte, which equals the 29000's maximum possible burst.
The schematic in Figure 3 reveals how this page-lengthening scheme is implemented. Note that all the outputs from the control logic (O2 - O9) connect to CY7C289 inputs that do not cause a page change (A0 - A5, CS1, and CS2). The lowest address connected directly from the CPU to the PROMs is A10. The 29000 is guaranteed never to change A10 - A31 during a burst, because this would constitute crossing a 1-Kbyte boundary. All the addresses that can change during a burst connect to A0 - A5, CS1, and CS2; thus, the CY7C289 never crosses an internal page, and never causes a wait state, during a 29000 burst. The chip selects in this design effectively quadruple the PROM page length, allowing a greater percentage of single accesses and all burst accesses to finish within a single clock cycle.

Figure 3. AM29000 Instruction Memory Design

To make the extended page useful, note that you need to locate sequential code on the same PROM page. Because this design extends each PROM page across all four banks, you must segment code into page-length blocks; this is analogous to using interleaved DRAMs. Because each CY7C289 PROM has a 64-byte internal page, your code must be separated into 64-word blocks. In other words, place the first 64 words of code in bank 1, the next 64 words in bank 2, and so on. You can accomplish this segmentation with a simple program.
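As an illustration of the kind of segmentation program mentioned above, here is a minimal Python sketch. It is not the author's program. The names `prom_location` and `segment`, the assignment of the four bytes of each word to four PROMs in image order, and the 0xFF fill value for unused locations are all assumptions made for the example; only the 64-word block size and the four-bank rotation come from the text.

```python
WORDS_PER_BLOCK = 64   # one internal PROM page per bank (from the text)
NUM_BANKS = 4          # four banks of PROM (from the text)
BYTES_PER_WORD = 4     # 32-bit instruction words, one byte per PROM in a bank

def prom_location(word_index):
    """Map a word index to (bank, prom_address); bank 0 here is 'bank 1' in the text."""
    offset = word_index % WORDS_PER_BLOCK                 # drives PROM inputs A0-A5
    bank = (word_index // WORDS_PER_BLOCK) % NUM_BANKS    # selected by CS1 and CS2
    page = word_index // (WORDS_PER_BLOCK * NUM_BANKS)    # drives PROM inputs A6-A15
    return bank, page * WORDS_PER_BLOCK + offset

def segment(image):
    """Split a flat byte image into images[bank][lane], one bytearray per PROM."""
    if len(image) % BYTES_PER_WORD:
        raise ValueError("image length must be a multiple of 4 bytes")
    images = [[bytearray() for _ in range(BYTES_PER_WORD)] for _ in range(NUM_BANKS)]
    for word in range(len(image) // BYTES_PER_WORD):
        bank, addr = prom_location(word)
        for lane in range(BYTES_PER_WORD):
            prom = images[bank][lane]
            if len(prom) <= addr:
                # Assumed fill value for locations that are not programmed.
                prom.extend(b"\xff" * (addr + 1 - len(prom)))
            prom[addr] = image[word * BYTES_PER_WORD + lane]
    return images

print(prom_location(0), prom_location(64), prom_location(256))
```

Words 0 through 63 land in the first bank, words 64 through 127 in the second, and word 256 returns to the first bank at the start of its next internal page.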

Using the WAIT\ Signal
The second memory design issue that bears clarification is the connection of the CY7C289's WAIT\ signal. The CY7C289 asserts this signal when the input address crosses an internal page boundary (at least one of the inputs A6 - A15 changes). WAIT\ tells the 29000 that the PROMs need an additional clock cycle to deliver the requested instruction.
Note in the schematic in Figure 3 that only one WAIT\ output connects to the control logic. This is because all the PROMs examine the same upper-order address inputs to determine if an internal page has been crossed. Therefore, only one PROM is required to identify a page crossing and assert the WAIT\ signal, even if the chip selects (CS1 and CS2) are deasserted at the time. The only time the PROM does not generate WAIT\ is when the chip enable signal (CE\) is inactive during an address change.

Burst Counter
The 22V10 and the two 16L8s support the 29000's burst mode capability. The 22V10 implements an 8-bit loadable counter, which loads a new address from the 29000 when the IREQ\ signal is asserted. On each subsequent clock rise in which IBREQ\ is active, the 22V10 increments the current address and delivers the result to a multiplexer. Note that the clock to the 22V10 is not the system clock. The 16R4 generates a special counter clock that properly times the loading of the counter and halts the count during a suspended burst.
The pair of 16L8s are utilized primarily as two high-speed multiplexers. Each 16L8 implements a 4-bit 2:1 multiplexer that selects the instruction address from the 29000 or the counter. During an initial access (IREQ\ Low), the 16L8s feed the processor address to the instruction memory. When the 29000 is bursting, the counter address is routed to the PROMs.

Signal Generation
The primary function of the remaining interface logic (the 16R4, 74F74, and 74F112) is to generate the necessary system control signals. These signals include the instruction ready signal (IRDY\) to the 29000 and the address latch enable signal (ALE) for the PROMs.
The IRDY\ input to the 29000 halts the processor when accessing slower memory. Based on the WAIT\ output from the CY7C289s, the interface circuitry deasserts IRDY\ when the PROM requires more time to complete an access. Because this design never requires a wait during a burst access, the control logic simply holds IRDY\ Low while a burst is in progress. For the PROM interface, IRDY\ is only used during a single access or during a burst's initial access.
The ALE input controls the input of addresses to the CY7C289s. The CY7C289's latch mode takes full advantage of the 29000's mid-clock address release and minimizes wait states during the initial access. The drawback to latch mode is that an ALE signal must be generated externally. In this design, the 16R4 creates ALE based on the input clock, IRDY\, WAIT\, and IBREQ\. The remaining signals generated by the 16R4 control the burst counter logic implemented in the 22V10 and the 16L8s.
Note that most of the logic displayed in Figure 2 is required regardless of the memory device you choose. Implementing the burst-counter interface to the 29000 requires the 22V10, both 16L8s, and a portion of the 16R4. Thus, only the two SSI components and part of one 16R4 are needed to create the appropriate communication signals between the 29000 and the CY7C289 PROMs.

System Timing
Figures 4 and 5 illustrate the communication between processor and memory that supports burst mode and inserts wait states. The 29000 generates the instruction request (IREQ\) and instruction burst request (IBREQ\) signals to initiate instruction accesses. To begin an access, the 29000 asserts IREQ\ and places a valid address on the address bus a maximum of 12 ns after CLK's rising edge. If this is the beginning of an instruction burst, the processor asserts IBREQ\ no more than 10 ns after the system clock's falling edge. At each subsequent rising edge of CLK, the 29000 samples the instruction ready (IRDY\) input before reading data. Therefore, by deasserting IRDY\, the external memory system can hold the CPU until an access is completed.
When the access is finished, the memory system must assert IRDY\ at least 10 ns before CLK's next rising edge. The data must appear on the bus at least 4 ns before CLK's rising edge.

No-Wait Timing
The control logic in this design generates IRDY\ based on the WAIT\ output from the CY7C289 PROMs and the IREQ\ and IBREQ\ signals from the processor. During a single access or a burst's first access, the interface automatically inserts one wait cycle due to the 29000's late delivery of a valid address; completing a single access without a wait state would require a 12-ns PROM access time. The interface inserts the wait state by deasserting IRDY\ in the cycle in which IREQ\ was asserted. In the scenario illustrated in Figure 4, this access falls on the same page as the previous PROM access and therefore does not generate a WAIT\. The interface logic asserts IRDY\ in the following cycle, and data is delivered prior to CLK's next rising edge. The CY7C289 PROMs' 22-ns on-page access time is well within the 44-ns window that results from the single inserted wait state.
Once the initial access is delivered, the memory can complete each burst access within a single cycle. The control logic therefore keeps IRDY\ asserted as long as the IBREQ\ signal from the CPU is active. In Figure 4, note that the 29000 temporarily deasserts IBREQ\, the method the processor uses to suspend an instruction burst. In response, the instruction memory suspends data delivery until IBREQ\ is reasserted. When IBREQ\ reasserts, the data is delivered from the point at which the burst was suspended, as illustrated in the timing diagram in Figure 4.
To govern the operation of the instruction PROMs, the control logic generates the address latch enable signal (ALE), also shown in Figure 4. In this design, the ALE input is programmed as active High. Thus, when the ALE input is active, the latch is transparent, and the address at the input flows into the PROM. On the transition of ALE from High to Low, the PROMs latch the address and ignore further changes to the address while ALE is Low.
In this design, the ALE input remains active (open) until a burst sequence begins. During a burst, the ALE signal advances the counter and controls the loading of the counter address into the PROM. Because ALE's falling edge increments the count, the PROM's address inputs change only after the address latch closes.
Note in the schematic in Figure 3 that the 16R4 generates the clock input to the AM29000. This clock arrangement ensures that the ALE and CPUCLK signals track each other and are as closely synchronized as possible.

[Figure 4: timing diagram showing CPUCLK, ADR, IREQ, IBREQ, IRDY, DATA, ALE, and WAIT.]

Figure 4. Instruction Memory Timing (WAIT deasserted)

PROM Wait Timing
If WAIT\ is asserted during a single access or during the initial access of a burst, the control logic inserts one additional wait cycle (Figure 5). This wait cycle occurs if a PROM address crosses a page boundary; the WAIT\ signal is then asserted a maximum of 21 ns after the address is loaded. The control logic displayed in Figure 2 uses this WAIT\ output's falling edge to send an additional wait signal to the 29000. This wait signal is created by keeping the IRDY\ signal High for one additional cycle.
As shown in Figure 5, this added wait provides a total of 74 ns for the PROM to complete the access. An access that involves crossing an internal PROM page actually requires only 65 ns. Note once again that after the initial data has been delivered, all subsequent burst accesses are delivered within a single clock cycle.

[Figure 5: timing diagram showing CPUCLK, ADR, IBREQ, IRDY, DATA, ALE, and WAIT.]

Figure 5. Instruction Memory Timing (WAIT asserted)
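The 44-ns and 74-ns windows quoted in the timing discussion can be reproduced with simple arithmetic. The sketch below is one reading of how those figures arise, not a derivation given in the text: it assumes a nominal 30-ns clock period at 33 MHz and subtracts the 12-ns address delay and the 4-ns data set-up time quoted under System Timing. The function name `access_window_ns` is hypothetical.

```python
CLOCK_NS = 30        # assumed nominal clock period at 33 MHz
ADDR_VALID_NS = 12   # maximum delay from CLK's rising edge to a valid address
DATA_SETUP_NS = 4    # data must be valid this long before CLK's rising edge

def access_window_ns(wait_cycles):
    """Time left for the PROM access when the given number of wait cycles is inserted."""
    total_cycles = wait_cycles + 1
    return total_cycles * CLOCK_NS - ADDR_VALID_NS - DATA_SETUP_NS

print(access_window_ns(1))  # one wait cycle, compared against the 22-ns on-page access
print(access_window_ns(2))  # two wait cycles, compared against the 65-ns off-page access
```

Under these assumptions, one wait cycle leaves room for the 22-ns on-page access and two wait cycles leave room for the 65-ns off-page access.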

Appendix A. PLD Toolkit Source Code for the 16R4
C16R4;
{ Norman Taffe
Cypress Semiconductor
April 23, 1990
Control Logic for CY7C289 PROM interface to the AMD 29000. }

CONFIGURE;
{inputs}
CLK, CLKIN, RESET, IREQ, WAIT, INLOAD, WAITOUT, DIBREQ, KILL, OE(node=11),
{outputs}
IRDY(node=12), ALE, PRESET, CLRJK, DKILL, DIREQ, COUNTCLK, CPUCLK,

EQUATIONS;
!PRESET = <sum> !INLOAD & !RESET;

!DIREQ = <sum> !IREQ & !RESET;

!ALE = <oe>
    <sum> !DIREQ & !WAITOUT & !CLKIN & !RESET
    # !DIBREQ & !WAITOUT & !CLKIN & !RESET;

!CPUCLK = <oe>
    <sum> !CLKIN & !CLKIN & !CLKIN & !CLKIN
    # !CLKIN & !CLKIN & !CLKIN & !CLKIN;

!IRDY = <oe>
    <sum> WAIT & !WAITOUT & !DIBREQ & !RESET
    # WAIT & !WAITOUT & !PRESET & !RESET
    # !WAITOUT & !DIBREQ & !DKILL & !RESET
    # !WAITOUT & !PRESET & !DKILL & !RESET
    # !CLRJK & !RESET;

!CLRJK = <sum> !PRESET & WAITOUT & !RESET;

!COUNTCLK = <sum> !WAITOUT & DIBREQ & PRESET & CLRJK & DIREQ
    # CLKIN & !WAITOUT & PRESET & CLRJK;

!DKILL = <sum> !CLRJK & !RESET;

Appendix B. PLD Toolkit Source Code for the Upper 16L8
C16L8;
{ Norman Taffe
Cypress Semiconductor
April 23, 1990
Control Logic for 285/9 PROM interface to AMD 29000. }
CONFIGURE;
{inputs}
RESET, IREQ, KILL, ALE, A2, A3, A4, A5, C2, C3(node=11),
DIBREQ(node=16), C4, C5,
{outputs}
O2(node=12), O3, O4, O5, CE(node=19),
EQUATIONS;
!O2 = <oe>
    <sum> !RESET & !A2
    # !IREQ & !KILL & ALE & !A2
    # RESET & !ALE & !C2
    # RESET & KILL & !C2
    # RESET & IREQ & !C2
    # !A2 & !C2;

!O3 = <oe>
    <sum> !RESET & !A3
    # !IREQ & !KILL & ALE & !A3
    # RESET & !ALE & !C3
    # RESET & KILL & !C3
    # RESET & IREQ & !C3
    # !A3 & !C3;

!O4 = <oe>
    <sum> !RESET & !A4
    # !IREQ & !KILL & ALE & !A4
    # RESET & !ALE & !C4
    # RESET & KILL & !C4
    # RESET & IREQ & !C4
    # !A4 & !C4;

!O5 = <oe>
    <sum> !RESET & !A5
    # !IREQ & !KILL & ALE & !A5
    # RESET & !ALE & !C5
    # RESET & KILL & !C5
    # RESET & IREQ & !C5
    # !A5 & !C5;

!CE = <oe>
    <sum> !IREQ # !DIBREQ;

Appendix C. PLD Toolkit Source Code for the Lower 16L8
C16L8;
{ Norman Taffe
Cypress Semiconductor
April 23, 1990
Control Logic for 285/9 PROM interface to AMD 29000. }
CONFIGURE;
{inputs}
RESET, IREQ, KILL, ALE, A6, A7, A8, A9, C6, C7(node=11), C8(node=17), C9,
{outputs}
O6(node=12), O7, O8, O9, ALEBAR,
EQUATIONS;
!O6 = <oe>
    <sum> !RESET & !A6
    # !IREQ & !KILL & ALE & !A6
    # RESET & !ALE & !C6
    # RESET & KILL & !C6
    # RESET & IREQ & !C6
    # !A6 & !C6;

!O7 = <oe>
    <sum> !RESET & !A7
    # !IREQ & !KILL & ALE & !A7
    # RESET & !ALE & !C7
    # RESET & KILL & !C7
    # RESET & IREQ & !C7
    # !A7 & !C7;

!O8 = <oe>
    <sum> !RESET & !A8
    # !IREQ & !KILL & ALE & !A8
    # RESET & !ALE & !C8
    # RESET & KILL & !C8
    # RESET & IREQ & !C8
    # !A8 & !C8;

!O9 = <oe>
    <sum> !RESET & !A9
    # !IREQ & !KILL & ALE & !A9
    # RESET & !ALE & !C9
    # RESET & KILL & !C9
    # RESET & IREQ & !C9
    # !A9 & !C9;

!ALEBAR = <oe>
    <sum> ALE;
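The six product terms that define each 16L8 address output in Appendices B and C reduce to a 2:1 multiplexer, which can be confirmed by brute force. The Python sketch below is an illustration only: the function names are hypothetical, signals are modeled as pin levels (True for High), and the select condition is an algebraic simplification made here rather than something stated in the source.

```python
from itertools import product

def o_from_equation(reset, ireq, kill, ale, a, c):
    """Evaluate one output bit from the sum-of-products given for !On."""
    not_o = ((not reset and not a)
             or (not ireq and not kill and ale and not a)
             or (reset and not ale and not c)
             or (reset and kill and not c)
             or (reset and ireq and not c)
             or (not a and not c))
    return not not_o

def o_as_mux(reset, ireq, kill, ale, a, c):
    """Same output written as a multiplexer between address bit a and counter bit c."""
    select_a = (not reset) or (not ireq and not kill and ale)
    return a if select_a else c

def equations_match_mux():
    """Compare both forms over all 64 input combinations."""
    return all(o_from_equation(*v) == o_as_mux(*v)
               for v in product([False, True], repeat=6))

print(equations_match_mux())
```

In this reading, the final term (!An & !Cn) is a redundant consensus term; covers of this kind are commonly added so that the output does not glitch when the select condition changes while both data inputs are Low.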

Appendix D. PLD Toolkit Source Code for the 22V10
C22V10;
{ Norman Taffe
Cypress Semiconductor
April 25, 1990
8-bit counter for AMD29000 PROM interface. }

CONFIGURE;
{inputs}
CLK, A2, A3, A4, A5, A6, A7, A8, A9, KILL, IREQ,
{outputs}
O9(node=14), O8, O7, O6, O5, O4, O3, O2, Q1(noreg),

EQUATIONS;
!Q1 := <sum> !KILL & !IREQ;

!O9 := <oe>
    <sum> O2 & O3 & O4 & O5 & O9 & O8 & O7 & O6 & Q1
    # !O2 & !O9 & Q1
    # !O3 & !O9 & Q1
    # !O4 & !O9 & Q1
    # !O5 & !O9 & Q1
    # !O9 & !O6 & Q1
    # !O9 & !O7 & Q1
    # !O9 & !O8 & Q1
    # A2 & A3 & A4 & A5 & A6 & A7 & A8 & A9 & !Q1
    # !A2 & !A9 & !Q1
    # !A3 & !A9 & !Q1
    # !A4 & !A9 & !Q1
    # !A5 & !A9 & !Q1
    # !A6 & !A9 & !Q1
    # !A7 & !A9 & !Q1
    # !A8 & !A9 & !Q1;

!O8 := <oe>
    <sum> O2 & O3 & O4 & O5 & O8 & O7 & O6 & Q1
    # !O2 & !O8 & Q1
    # !O3 & !O8 & Q1
    # !O4 & !O8 & Q1
    # !O5 & !O8 & Q1
    # !O8 & !O6 & Q1
    # !O8 & !O7 & Q1
    # A2 & A3 & A4 & A5 & A6 & A7 & A8 & !Q1
    # !A2 & !A8 & !Q1
    # !A3 & !A8 & !Q1
    # !A4 & !A8 & !Q1
    # !A5 & !A8 & !Q1
    # !A6 & !A8 & !Q1
    # !A7 & !A8 & !Q1;

!O7 := <oe>
    <sum> O2 & O3 & O4 & O5 & O7 & O6 & Q1
    # !O2 & !O7 & Q1
    # !O3 & !O7 & Q1
    # !O4 & !O7 & Q1
    # !O5 & !O7 & Q1
    # !O7 & !O6 & Q1
    # A2 & A3 & A4 & A5 & A6 & A7 & !Q1
    # !A2 & !A7 & !Q1
    # !A3 & !A7 & !Q1
    # !A4 & !A7 & !Q1
    # !A5 & !A7 & !Q1
    # !A6 & !A7 & !Q1;

!O6 := <oe>
    <sum> O2 & O3 & O4 & O5 & O6 & Q1
    # !O2 & !O6 & Q1
    # !O3 & !O6 & Q1
    # !O4 & !O6 & Q1
    # !O5 & !O6 & Q1
    # A2 & A3 & A4 & A5 & A6 & !Q1
    # !A2 & !A6 & !Q1
    # !A3 & !A6 & !Q1
    # !A4 & !A6 & !Q1
    # !A5 & !A6 & !Q1;

!O5 := <oe>
    <sum> O2 & O3 & O4 & O5 & Q1
    # !O2 & !O5 & Q1
    # !O3 & !O5 & Q1
    # !O4 & !O5 & Q1
    # A2 & A3 & A4 & A5 & !Q1
    # !A2 & !A5 & !Q1
    # !A3 & !A5 & !Q1
    # !A4 & !A5 & !Q1;

!O4 := <oe>
    <sum> O2 & O3 & O4 & Q1
    # !O2 & !O4 & Q1
    # !O3 & !O4 & Q1
    # A2 & A3 & A4 & !Q1
    # !A2 & !A4 & !Q1
    # !A3 & !A4 & !Q1;

!O3 := <oe>
    <sum> O2 & O3 & Q1
    # !O2 & !O3 & Q1
    # A2 & A3 & !Q1
    # !A2 & !A3 & !Q1;

!O2 := <oe>
    <sum> O2 & Q1
    # A2 & !Q1;
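The pattern in the 22V10 equations can be exercised in software. The Python sketch below evaluates the common form of the Appendix D equations for one counter clock and checks the result against ordinary integer arithmetic. It is an illustrative model only: the function names are hypothetical, Q1 is treated as a plain input, and bit 0 of each integer stands for O2 or A2. As the equations are transcribed here, the counter increments while Q1 is High and stores the processor address plus one while Q1 is Low.

```python
def to_bits(value):
    """Return 8 booleans, index 0 = O2/A2 (least significant bit)."""
    return [bool((value >> i) & 1) for i in range(8)]

def from_bits(bits):
    """Inverse of to_bits."""
    return sum(1 << i for i, b in enumerate(bits) if b)

def next_bits(o, a, q1):
    """Apply the Appendix D equations once; the source is O when Q1 is High, else A."""
    src = o if q1 else a
    nxt = []
    for i in range(8):
        lower_all_ones = all(src[:i])   # product of all lower-order bits
        # !On := (lower bits all High & On) # (any lower bit Low & !On)
        not_next = (lower_all_ones and src[i]) or (not lower_all_ones and not src[i])
        nxt.append(not not_next)
    return nxt

def step(count, addr, q1):
    """Hypothetical wrapper: next counter value for integer count and address inputs."""
    return from_bits(next_bits(to_bits(count), to_bits(addr), q1))

print(step(0x3F, 0x00, True))   # counting: 0x3F advances to 0x40
print(step(0x00, 0x10, False))  # loading: address 0x10 is stored as 0x11
```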

Interfacing the CY7C289 to the CY7C601

This application note describes how to use high-speed CY7C289 PROMs to design an instruction memory for a 40-MHz CY7C601 RISC processor. The design features 1 Mbyte of PROM and requires no interface circuitry. Utilizing a unique fast-column-access architecture, the CY7C289 supplies data in a 40-MHz system with only occasional wait states. A schematic of the design is included at the end of this application note.
Because microprocessor performance improvements have outpaced access-time advances in high-density memory devices, system designers have resorted to memory interleaving and high-speed SRAM caches to more fully utilize a processor's performance capability. In embedded control applications, the alternative has been to compromise system performance by slowing every processor access to PROM memory with wait states or by using PROMs only for the boot process and running instruction code from SRAMs. The necessity for faster, nonvolatile memory in high-performance embedded applications has prompted Cypress to design high-speed PROMs that you can easily interface to a variety of microprocessors.
Using the CY7C289, high-speed embedded applications can run code directly from PROM and eliminate the extra board space, cost, and logic required to transfer code into "shadow" RAMs. To achieve this level of performance, the CY7C289 PROMs employ an innovative architecture that accentuates local speed. The memory array is split into 64-byte pages that allow on-page access times of just 20 ns in a 512-Kbit (64K x 8) PROM. This performance equals that of the fastest static RAMs at similar densities. SRAM-like performance, combined with the non-volatility of EPROM technology, makes these devices ideal for high-performance embedded control applications.
Another important CY7C289 feature is the availability of on-chip address registers. The CY7C601 memory design presented in this application note is an example of the address registers' usefulness. Like many RISC architectures, the CY7C601 delivers its address and memory signals unlatched prior to the system clock's rising edge. Ordinarily, you must latch these signals externally with several 74F74s or the like. However, the CY7C289's on-chip registers capture the address bits at the system clock's rising edge. This feature, as well as the CY7C289's automatic WAIT-signal generation, allows for a straightforward connection between the memory and the processor.
Figure 1 displays a block diagram of the instruction memory system design for the CY7C601. As the diagram shows, the design has only two major components: the CY7C601 32-Bit RISC Processor and one Mbyte of CY7C289 PROM.

CY7C289 PROMs
The CY7C289 is part of a high-density (512K),
high-speed CMOS PROM family offered by Cypress
Semiconductor. The CY7C289, along with another of
the family members, features a unique fast-column-access architecture. The PROM array is arranged into
1024 pages, each 64 bytes long. Consecutive accesses to
the same page require only 20 ns to complete. When an
access crosses a page within the PROM, the data is
delivered in 65 ns. The 7C289 generates a WAIT signal
to alert external circuitry of an off-page access.
The CY7C289 emphasizes fast local accesses within a 64-byte page. The principle behind the
CY7C289 derives from a statistical approach to performance improvement. Many microprocessors linearize
memory access requests because of on-chip cache
burst-fill modes or instruction pre-fetch queues, in effect localizing the instruction fetch sequences. In the
CY7C289, Cypress uses the fast-column-access architecture to improve local performance and take advantage
of instruction stream linearity and locality. Fast access is
possible when consecutive PROM retrievals are within
the current page.
When a memory cycle requests data that is not on
the current page, the chip must power up the correct
page. Because processor code tends to be linear in nature, though, PROM accesses usually fall on the same
PROM page and therefore require only 20 ns to
complete.
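
The effect of the page architecture on a stream of accesses can be illustrated with a short model. The sketch below is only an illustration for a single PROM: it uses the 64-byte page size and the 20-ns and 65-ns figures quoted above, and it assumes (this is not stated in the text) that the first access after power-up always counts as an off-page access. The function name is hypothetical.

```python
PAGE_BYTES = 64      # internal page size of one CY7C289
ON_PAGE_NS = 20      # access time when the page is unchanged
OFF_PAGE_NS = 65     # access time when a new page must be selected

def access_times(addresses):
    """Return the access time in ns for each byte address in sequence."""
    current_page = None          # assumption: no page is open initially
    times = []
    for addr in addresses:
        page = addr // PAGE_BYTES
        times.append(ON_PAGE_NS if page == current_page else OFF_PAGE_NS)
        current_page = page
    return times

# A linear instruction stream pays the off-page penalty once per page.
linear = access_times(range(128))
average_ns = sum(linear) / len(linear)
```

For a strictly linear stream the penalty is paid once every 64 bytes, which is why the average stays close to the on-page figure.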

[Figure 1 block diagram; labels recovered from the scan: CY7C601; CY7C289 PROM (4 banks); data bus D0-D31; MHOLD and MDS inputs; PROM inputs A0-A5, CS1, CS2, and A6-A15; WAIT output; CPU address groups A2-A9 and A10-A31.]

Figure 1. Block Diagram of CY7C601 Memory Design

Along with the unique array architecture, the
CY7C289 simplifies system design by providing the on-chip logic necessary to generate a WAIT signal. This
signal is used to automatically insert microprocessor
wait states during an off-page access.
To simplify the memory interface with a variety of
microprocessors, the CY7C289 contains a rich set of
programmable features. For example, you can latch the
input address with the ALE input or register the address at CLK's rising edge. The CY7C289 provides a
programmable bit to select between latched and
registered address inputs. The default is registered mode, which samples the address on CLK's rising edge
and captures it in the address register. This
configuration suits most RISC processors, which
generate addresses around the system clock's rising
edge.
When in LATCH mode while the ALE pin is active, the PROM recognizes any address changes and
latches the address into the address registers on the
user-defined edge of ALE. This option is particularly
useful when interfacing with CISC processors (see Reference). Most CISC processors generate a valid address
some time following the system clock's rising edge. Instead of waiting for the next rising clock edge (and
sacrificing performance), you can capture the address
immediately using the ALE input. The drawback to
LATCH mode is that it might require external interface
circuitry. If you do select the ALE function, you can
define the ALE signal's polarity, with the default being
positive.
To eliminate external bank decoders, the CY7C289
includes two programmable chip selects (CS1 and CS2).
The polarity of these inputs is user programmable,
facilitating automatic bank selection of up to four banks
of PROM. The programmable chip selects provide an
additional advantage for multibank PROM designs. If
you arrange them correctly, you can effectively extend
the length of the CY7C289 pages from 64 to as many as
256 words. This extension improves system performance
by increasing the likelihood of on-page PROM accesses
(more on this feature later).
The CY7C289 includes these programmable features:
1. You can either register the input address at
CLK's rising edge or latch the address using the ALE
input.
2. You can program the address set-up and
hold window.
3. You can program the WAIT output's polarity.
4. You can program the ALE input's polarity.
5. You can generate the WAIT output from CLK's
falling or rising edge for the registered-mode CY7C289.
6. You can program the polarity of both chip
selects (CS1 and CS2).
Each of these options is set by appropriately
programming a reserved PROM location. Therefore,
the devices are configured at the same time the array is
programmed.

PROM Configuration
In this application, four banks (16 CY7C289s) of
PROM are used to provide 1 Mbyte of memory. Like
most RISC architectures, the CY7C601 sends out valid
address information immediately preceding a rising
clock edge (and removes it soon afterward). Thus, the
CY7C289s are configured in registered mode. The on-chip address registers capture the input at CLK's rising
edge and ignore all unclocked address changes.
The chip selects on the CY7C289s are programmed
on a bank-by-bank basis. Each bank is programmed with
a unique polarity combination of CS1 and CS2 to permit PROM bank selection without external address
decoding.
The other programmable features relevant to this
design involve the CY7C289's WAIT signal. For compatibility with the CY7C601, the WAIT signal should be
active Low and generated with respect to CLK's falling
edge.

CY7C601 Microprocessor

The CY7C601 is a 32-bit general-purpose
microprocessor that offers extremely high performance
for embedded controller applications. The system
described in this application note, for example, operates
at 40 MHz. The CY7C601 is Cypress's CMOS implementation of Sun Microsystems' SPARC (Scalable
Processor Architecture). This architecture achieves 29
MIPS by executing most instructions in a single clock
cycle.
A CY7C601 architectural feature that affects the
memory interface is an internal pipeline. To achieve an
instruction execution rate approaching one instruction
per clock cycle, the CY7C601 uses a four-stage instruction pipeline. All four stages operate in parallel, working on up to four different instructions at a time. The
stages are:
1. Fetch: The processor sends out the instruction
address to fetch an instruction.
2. Decode: The instruction is placed in the instruction register and decoded. The processor reads the
operands from the register file and computes the next
instruction address.
3. Execute: The processor executes the instruction
and saves the results in temporary registers.
4. Write: The processor writes the result to the
destination register.
A basic single-cycle instruction enters the pipeline
and completes four cycles later. Normally, once the
pipeline is full, an instruction is executed during every
clock cycle. The existence of the instruction pipeline affects the memory interface (as described in the System
Timing section of this application note). Otherwise, the
memory interface design is straightforward.

PROM Interface

Because this design involves no glue logic, the
CY7C289 PROM's circuit connections are relatively
straightforward. The CY7C601 communicates with external memory via a 32-bit address bus and a 32-bit
data/instruction bus. Note in Figure 2, however, that the
addresses fed to the PROMs are not entirely sequential.
The reason for the nonsequential addresses lies in
the way the CY7C289 is organized. To improve the
system's performance, the CY7C289 chip selects (CS1
and CS2) are used to extend the effective PROM page
length to 256 32-bit words (1 Kbyte). To understand
how this is done, consider that the CY7C289's lowest six
address inputs (A0 - A5) designate a specific byte
within a 64-byte internal PROM page. The CY7C289
uses inputs A6 - A15 to select one of 1024 PROM
pages. When any of the inputs at pins A6 - A15 changes, a new page is selected and the CY7C289 asserts the
WAIT\ output.
You can think of the CY7C289's chip selects as additional address inputs in a multibank memory system.
As with A0 - A5, changes at the chip select inputs do
not result in an internal page change.
With four banks of PROM, you have a total of 8
address bits (A0 - A5, CS1, CS2) that do not affect the
internal PROM page, as opposed to just 6 (A0 - A5)
when using one bank of PROM. The 8 bits of on-page
addresses translate into a PROM page length of 256
words or 1 Kbyte.
The schematic in Figure 2 reveals how this page-lengthening scheme is implemented. Note that the
lowest 8 address bits from the CPU (A2 - A9) connect
to the CY7C289 inputs that do not cause a page change
(A0 - A5, CS1, CS2). The lowest address that connects
directly from the CPU to the PROMs is A10. The chip
selects in this design have effectively quadrupled the
PROM page length, allowing a greater percentage of
PROM accesses to complete within a single clock cycle.

[Figure 2 schematic not recoverable from the source scan; surviving labels: CY7C601, 40 MHz.]

Figure 2. CY7C601 Memory Design

Note that the extended-page-length feature of this
design affects the software that runs on the system. To
make the extended page useful, sequential code needs
to be located on the same PROM page. In this design,
where each PROM page extends across all four banks,
code must be segmented into page-length blocks. This
situation is analogous to interleaving DRAMs. Because
each CY7C289 PROM has a 64-byte internal page, the
user's code must be separated into 64-word blocks. In
other words, place the first 64 words of code in bank 1,
the next 64 words in bank 2, and so on. A simple program can accomplish this segmentation.
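
A segmentation program of the kind mentioned above might look like the following sketch. It is not the program used for this design. It assumes the address wiring described in this section (CPU A2 - A7 to PROM A0 - A5, CPU A8 - A9 to the chip selects, CPU A10 - A19 to PROM A6 - A15), treats CPU A0 - A1 as the byte lane within a 32-bit word, and numbers banks from 0. Which chip select receives which CPU bit, which PROM drives which data byte, and the blank fill value are all assumptions; the function names are hypothetical.

```python
PROM_BYTES = 64 * 1024   # one CY7C289 holds 64K x 8
BANKS = 4                # selected by CS1/CS2 (CPU A8-A9, assumed order)
LANES = 4                # four byte-wide PROMs form one 32-bit bank
BLANK = 0xFF             # assumed fill value for unused locations

def decode(cpu_addr):
    """Map a CPU byte address to (bank, lane, PROM address)."""
    lane = cpu_addr & 0x3             # byte within the 32-bit word
    offset = (cpu_addr >> 2) & 0x3F   # CPU A2-A7  -> PROM A0-A5
    bank = (cpu_addr >> 8) & 0x3      # CPU A8-A9  -> chip selects
    page = (cpu_addr >> 10) & 0x3FF   # CPU A10-A19 -> PROM A6-A15
    return bank, lane, (page << 6) | offset

def split_image(image):
    """Split a linear code image into 16 per-PROM images."""
    if len(image) > BANKS * LANES * PROM_BYTES:
        raise ValueError("image exceeds 1 Mbyte")
    proms = [[bytearray([BLANK]) * PROM_BYTES for _ in range(LANES)]
             for _ in range(BANKS)]
    for addr, value in enumerate(image):
        bank, lane, prom_addr = decode(addr)
        proms[bank][lane][prom_addr] = value
    return proms
```

With this mapping the first 64 words of the image land in the first bank, the next 64 words in the second bank, and so on, so 256 consecutive words share one internal PROM page.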
Another design issue that bears clarification is the
connection of the WAIT\ signal generated by the
CY7C289. This signal is asserted when the input address crosses an internal page boundary on the PROMs.
WAIT\ connects directly to the CPU's Memory Hold A
(MHOLDA\) and Memory Data Strobe (MDS\) inputs to tell the CY7C601 that an additional clock cycle
is required to deliver the requested instruction from
PROM. In the schematic in Figure 2, only one WAIT\
output is connected to the CY7C601. This is because all
16 PROMs examine the same upper-order address inputs to determine if an internal page has been crossed.
Therefore, only one PROM is needed to assert the
WAIT\ signal when an off-page access is detected. It is
important to note that the PROM will not generate
WAIT\ if the chip enable signal (CE\) is inactive when
the address changes. This ensures that when the CPU
addresses some other portion of memory, such as
RAM, the internal PROM page does not change and a
WAIT\ signal is not generated.

System Timing
This section provides a brief description of the
CY7C601 timing interface to the CY7C289 PROMs.
The timing diagram in Figure 3 illustrates a typical communication sequence between the CPU and the
PROMs.
The memory interface's timing depends on whether
or not the access is on the same page as the previous
access. The case where an internal PROM page is
crossed is illustrated in the left side of Figure 3. Address 1 (displayed as A1) is an access to PROM that
causes an internal page change. WAIT\ is asserted by
the CY7C289 to freeze the processor until the PROMs
can deliver valid data. Note in Figure 3 that WAIT\ is
not asserted until the next processor clock cycle. This
delay is possible, using either MHOLDA\ or
MHOLDB\, because of the CY7C601's pipelined architecture. The delay allows memories or interface logic
more time to examine the address and determine if a
wait state is required.
The processor samples MHOLDA\ on the processor clock's falling edge. An active MHOLDA\ indicates
that the address in the previous clock cycle requires at
least one wait state to complete. However, as shown in
Figure 3, by the time MHOLDA\ is detected active, the
processor has already read the data corresponding to
A1. Reading this false data is perfectly acceptable due
to the CY7C601's internal instruction pipeline. The
CPU has the time to invalidate the erroneous data
before it reaches the execution stage. The MDS\ signal
strobes in the correct data when the data becomes
available.
The CY7C289s are configured to generate the
WAIT\ signal with respect to CLK's falling edge to ensure proper operation of the wait-state mechanism. If
the rising-edge option were selected, it is possible that
the WAIT\ signal would be generated too early by the
PROMs. Consequently, the CY7C601 would recognize
an active level on MHOLDA\ during the first cycle and
invalidate the data from the bus cycle prior to the
PROM access. Generating the WAIT\ signal from the
falling edge ensures that the CPU does not detect the
hold until the access's second cycle.
Another important aspect of the memory
interface's operation during a PROM page change is
that WAIT\ connects directly to MDS\ as well as to
MHOLDA\. This arrangement causes MDS\ to be asserted for two clock cycles instead of just one, but this
does not affect the system's operation. Although the
CY7C601 copies data erroneously during the first cycle
of MDS\, the erroneous data is overwritten with valid
data in the next cycle. This approach works because
MHOLDA\ remains asserted and does not allow the
internal pipeline to advance until the correct data arrives. The advantage to feeding WAIT\ directly into
MDS\ is that it avoids the use of any external logic for
the memory interface.

CY7C601 Interface
As shown in Figure 2, the instruction memory interface requires only two control inputs (MHOLDA\,
MDS\). MHOLDA\ freezes the clock to the instruction pipeline during a cache miss (for systems with
cache) or when accessing a slow memory, such as the
65-ns page-miss operation in the CY7C289. Whenever
the CY7C289 generates a WAIT\ signal, MHOLDA\ is
asserted and the instruction pipeline is frozen. The
processor freezes with the next instruction's address on
the address bus. MHOLDA\ must be presented to the
CY7C601 at the beginning of each processor clock cycle
and be stable during the processor clock's falling edge.
The other control signal, MDS\, signals the processor when slow or missed (cache-miss) data is ready on
the bus. The signal must be asserted only while the
processor is frozen by either MHOLDA\ or Memory
Hold B (MHOLDB\). Assertion of MDS\ enables the
clock to the on-chip instruction register during an instruction fetch and effectively strobes the valid data into
the CPU.


Interfacing the CY7C289 to the CY7C601

[Figure 3 timing diagram; traces recovered from the scan: CLK, ADR, MHOLDA\, MDS\, DATA (invalid, then valid), and WAIT\, with a span marked 65 on the off-page access.]

Figure 3. Memory Interface Timing

Figure 3 also displays some of the speed requirements that must be met in the instruction memory interface. In the case of an internal page change, the
CY7C289 PROMs require two wait cycles to complete
an access. The 40-MHz CY7C601 requires 2 ns of data
setup time before the system clock's rising edge. This
sequence results in a total of 73 ns available for the
memory to return valid data (three 25-ns clock cycles less the 2-ns setup time). The CY7C289 meets this
requirement with the 65-ns off-page access time.
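
The timing budget above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this section (40 MHz, two wait cycles, 2 ns of setup, 65-ns and 20-ns access times). It assumes, as a simplification not stated in the text, that the address is captured at the rising edge that starts the access and that no other delays (clock skew, bus loading) are charged against the budget.

```python
CLOCK_MHZ = 40
CYCLE_NS = 1000 / CLOCK_MHZ      # 25 ns per processor clock
SETUP_NS = 2                     # CY7C601 data setup before rising CLK
WAIT_CYCLES = 2                  # wait cycles inserted on a page change
OFF_PAGE_NS = 65                 # CY7C289 off-page access time
ON_PAGE_NS = 20                  # CY7C289 on-page access time

# Off-page access: one normal cycle plus the wait cycles, less setup time.
off_page_budget = (1 + WAIT_CYCLES) * CYCLE_NS - SETUP_NS
off_page_margin = off_page_budget - OFF_PAGE_NS

# On-page access: a single cycle, less setup time.
on_page_budget = CYCLE_NS - SETUP_NS
on_page_margin = on_page_budget - ON_PAGE_NS

print(f"off-page: {off_page_budget:.0f} ns available, {off_page_margin:.0f} ns margin")
print(f"on-page:  {on_page_budget:.0f} ns available, {on_page_margin:.0f} ns margin")
```

Under these simplifying assumptions both cases leave positive margin, which is consistent with the design running with only occasional wait states.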

The relatively trivial timing of sequential accesses
falling on the same PROM page is illustrated in the
right portion of Figure 3. The PROM latches A2 into
the on-chip registers at CLK's rising edge and delivers
data a maximum of 20 ns later.

Reference
For information on using the CY7C289 in latched
mode, see the application note entitled "Interfacing the
CY7C289 to the AM29000."


Section Contents
PLDs
Introduction to Programmable Logic
CMOS PAL Basics
Are Your PLDs Metastable?
PLD-Based Data Path For SCSI-2
PAL Design Example: A GCR Encoder/Decoder
T2 Framing Circuitry
Using CUPL with Cypress PLDs
Using ABEL to Program the Cypress 22V10
Using ABEL to Program the CY7C330
Using ABEL 3.2 to Program the Cypress CY7C331
Using Log/IC to Program the CY7C330
State Machine Design Considerations and Methodologies
Understanding the CY7C330 Synchronous EPLD
Using the CY7C330 in Closed-Loop Servo Control
FDDI Physical Connection Management Using the CY7C330
Bus-Oriented Maskable Interrupt Controller
Using the CY7C330 as a Multi-channel Mbus Arbiter
Using the CY7C331 as a Waveform Generator
CY7C331 Application Example: Asynchronous, Self-Timed VMEbus Requestor
Understanding the 361
Using the CY7C361 as an Mbus Arbiter
TMS320C30/VME Signal Conditioner Using the CY7C361
DMA Control Using the CY7C342 MAX EPLD
Interfacing PROMs and RAMs to High-Speed DSP Using MAX
FIFO RAM Controller with Programmable Flags


Introduction to Programmable Logic
Why Use a PLD?

ASICs (Application Specific Integrated Circuits)
are one of the fastest growing segments of the semiconductor market for good reason. In addition to increasing packaging density and reducing board real estate by
integrating SSI/MSI logic functions, ASICs reduce
power requirements, improve reliability, and provide
product secrecy.
ASICs include several different types of devices:
full-custom devices, standard cells, gate arrays, and
PLDs. Full-custom devices offer the greatest degree of
integration, but they are expensive, and the development cycles can be on the order of nine months to a
year. Full-custom designs are justified only for very
large volume applications.
Standard cell devices can be turned around much
more quickly (in about four months) and cost less than
full-custom devices. However, the level of integration,
and thus the speed, are lower than with the full-custom
product.
Gate arrays offer even less dense integration, but
because only two metal masks must be fabricated, the
design turnaround can be as short as six weeks. One
drawback of all these ASICs is that the design logic
must be set at the start of the fabrication cycle. If the
design changes, the whole product cycle must start over.
In addition, because each device is application specific,
you must watch inventory very carefully to make sure
that just enough of each device is ordered to meet
demand.
An alternative to custom or semicustom devices is
the PLD (Programmable Logic Device). Although
PLDs do not offer the same level of integration as the
other ASICs, the board-space reduction is still significant. The reduction factor is application dependent
and ranges from between 4:1 and 10:1 for smaller PLDs (20 to
24 pins) to 75:1 for high-density/pin-count devices such
as the LCA or MAX families. Additional benefits include reduced parts inventory, faster design and turnaround times, and simplified timing considerations.
Because a PLD is sold as a "generic" array of logic,
customized by the user, you can use the same PLD in
many different applications, spanning any number of
projects. Cypress's PLDs are based on EPROM technology, thus making them EPLDs, which are erasable
using an ultraviolet light source. You can make design
changes at any point in the product cycle more easily
than you can with other ASICs. The design cycle of a
moderately complex PLD can be a week or less, and
after the one-time purchase of a good development
software package and programmer, the parts are relatively inexpensive. PLDs simplify logic timing because
all logical functions take approximately the same path
through the device. Thus, the same propagation delays
apply to all device outputs (more on this later).

PLD Technology
All Cypress EPLD families except the CY7C360
family utilize the familiar sum-of-products architecture.
You can implement Boolean transfer functions of this
form by programming the AND array whose output
terms feed a fixed OR array. This scheme can implement most combinatorial logic functions and is limited
only by the number of product terms available in the
AND-OR array. PLDs come in a variety of different
sizes and with additional architectural features such as
flip-flops.
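
The sum-of-products scheme can be illustrated with a small behavioral model. This is only a sketch of the idea, not a description of any particular Cypress device: each product term is represented by a pair of flags per input saying whether the true or the complement of that input is connected to the AND gate, and the fixed OR combines the product terms. The function names and the data layout are hypothetical.

```python
def product_term(inputs, connections):
    """AND together every connected true/complement input line.

    connections[i] is a (use_true, use_complement) pair for inputs[i].
    """
    result = True
    for value, (use_true, use_complement) in zip(inputs, connections):
        if use_true:
            result = result and value
        if use_complement:
            result = result and not value
    return result

def sum_of_products(inputs, terms):
    """Fixed OR of the programmable AND terms."""
    return any(product_term(inputs, term) for term in terms)

# Example: XOR of two inputs as A./B + /A.B
XOR_TERMS = [
    [(True, False), (False, True)],   # A AND NOT B
    [(False, True), (True, False)],   # NOT A AND B
]

# A term with both the true and complement of an input connected can
# never be true, so it contributes nothing to the OR.
DEAD_TERM = [(True, True), (False, False)]
```

The number of entries in the term list corresponds to the number of product terms available to one output, which is the limit mentioned above.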
TTL PLDs use a fuse as their programmable element. During the manufacturing process, fuses are built
into all the connections between input pins and product
terms. All unwanted connections are then blown during
the programming process. Bipolar products are
programmed using 20V pulses from 50 µs to 100 ms
long. These 100- to 300-mA pulses blow unwanted
fuses. Fuses are blown one at a time so that the heat
generated does not damage or weaken the IC. Because
of the high currents required, bipolar PLDs have to be
programmed one at a time. Because physical fuses are
blown, you can program these devices only once.
In contrast, the Cypress CMOS EPLD family uses
an EPROM cell instead of fuses. This structure allows
Cypress to functionally test and then erase all devices
prior to packaging, thus facilitating 100-percent
programming yields. The EPROM cell used by Cypress
serves the same purpose as the fuse used in most
bipolar PLD devices. Before programming, the AND
gates (product terms) are connected via the EPROM
cells to both true and complement inputs.

[Figure 4 block diagram not recoverable from the source scan.]

Figure 4. The 16L8 Block Diagram.

The official, standardized version of a fuse map is
called a JEDEC map. This map can contain various informational fields and/or comments in addition to the 1s
and 0s. Figure 5 shows the JEDEC map that implements
the function shown in Figures 2 and 3. Each number
starting with L in the leftmost column represents the
first fuse number in that row. An N denotes a note or
comment. QF precedes the total number of fuses in this
device: QF2048 in this example. F0 means that the fuse
default is 0, or unprogrammed. G0 specifies an unprogrammed security fuse, whereas G1 denotes a
programmed security fuse (more on this later). C
precedes a checksum value for the file. An * specifies
the end of a field. A JEDEC file can also contain test
vectors, which are not shown here.
For more information on the JEDEC Standard,
refer to "JEDEC Standard No. 3-A, Standard Data
Transfer Format Between Data Preparation System and
Programmable Logic Device Programmer" available
from:
Solid State Products Engineering Council
2001 Eye Street N.W.
Washington, DC 20006
Most PLD design packages compile the design and
translate it into a JEDEC map. The map is then
downloaded to the programming hardware, which
programs the device(s) accordingly.

[Figure 5 listing, largely illegible in the source scan. The legible parts show the QF2048 and F0 fields followed by rows of the form Lnnnnn and 32 fuse bits, at fuse numbers that advance in steps of 32 from L00000. Each row carries an N note of the form "OE PT, pin= n" or "Sum PT, pin= n" for pins 19 down to 12.]

Figure 5. A 16L8 JEDEC Map.
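
The field structure described above (L, N, QF, F, G, and C fields, each ended by an asterisk) is simple enough to read with a few lines of code. The sketch below is an illustration only: it handles just those fields, ignores the transmission framing and test vectors that a complete JEDEC file can contain, and stores the checksum without verifying it. The sample text is a made-up 16-fuse example, not the map in Figure 5, and the function names are hypothetical.

```python
def parse_jedec_fields(text):
    """Collect the basic fields of a JEDEC fuse map into a dictionary."""
    info = {"notes": [], "rows": {}}
    for field in text.split("*"):          # '*' ends every field
        field = field.strip()
        if not field:
            continue
        if field.startswith("QF"):         # total number of fuses
            info["fuse_count"] = int(field[2:])
        elif field.startswith("F"):        # default fuse state
            info["default"] = int(field[1:])
        elif field.startswith("G"):        # security fuse
            info["security"] = int(field[1:])
        elif field.startswith("L"):        # first fuse number, then bits
            number, bits = field[1:].split(None, 1)
            info["rows"][int(number)] = "".join(bits.split())
        elif field.startswith("N"):        # note or comment
            info["notes"].append(field[1:].strip())
        elif field.startswith("C"):        # checksum, kept as text
            info["checksum"] = field[1:].strip()
    return info

def expand_fuses(info):
    """Build the full fuse list, filling unlisted fuses with the default."""
    fuses = [info.get("default", 0)] * info["fuse_count"]
    for start, bits in info["rows"].items():
        for i, bit in enumerate(bits):
            fuses[start + i] = int(bit)
    return fuses

# Hypothetical sample; the checksum value is illustrative and not checked.
SAMPLE = """
N Hypothetical 16-fuse example*
QF16* F0* G0*
L0000 11110000*
L0008 10101010*
C0064*
"""
info = parse_jedec_fields(SAMPLE)
fuses = expand_fuses(info)
```

A device programmer does essentially this expansion before writing the array, although a production tool would also validate the checksum and the fuse count.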

First-Generation PLDs
The first PLDs were strictly combinatorial logic
with three-state outputs, like the PALC16L8. Then D
flip-flops, a clock input, and internal feedback were
added, allowing a single PLD to implement sequential
logic or state machines. The 16L8, 16R4 (four
registered outputs), 16R6 (six registered outputs), and
16R8 (eight registered outputs) became industry-standard parts.
Testability was a problem in some of the earlier
devices. Because a blank device had all fuses intact, output enables were all turned off, configuring all device
pins as inputs. This scheme made it difficult to test
blank devices and to check whether the fuses could be
blown without actually blowing any of them.

[Figure 6 macrocell diagram; labels recovered from the scan: ASYNC RESET, GLOBAL CLOCK, SYNC PRESET, PTERM, PIN, PRODUCTS, FEEDBACK TO ARRAY, C1.]

Figure 6. The 22V10 Macrocell.

To get around these problems, a phantom array
was added to the device. The 16L8, for example, has
256 additional bits in its phantom array. These bits are
used to test the PLD functionally and verify dynamic
(AC) operation after the chip is packaged, without
using the normal array. The phantom array is so named
because it does not function in regular operating mode.
The device must be in a special mode to access the
phantom array.
The phantom array is usually programmed and
verified as part of the final electrical test procedure
during the manufacturing process. This procedure
verifies both the PLD programmability and function.
Cypress's EPLDs are programmed, tested, and then
erased before they are packaged. You can also use the
phantom array as part of incoming inspection.
Another feature of today's PLDs is register
preload, which loads data into the registers of
registered devices for testing purposes. This arrangement greatly simplifies and shortens the testing procedure. You can use this feature to check illegal state
resolution, that is, a state machine's ability to pass from an accidental illegal state to a legal one. Preloading is accomplished by applying a super-voltage (usually in the
range of 12 to 14V) pulse of at least 100-µs duration to
a specific pin, while holding a second pin at VIH. The
super voltage acts as a write strobe, which clocks data
applied to the I/O pins into the corresponding registers.
A security fuse has also become a standard PLD
feature. In addition to writing a fuse map into a device,
any good device programmer can read a device's fuse
map. This capability tends to negate the PLD's advantage of hiding proprietary logic from observers. But
if you do not want your PLDs to be read by a programmer, you can program the security bit, which disconnects the lines used to verify the array. In a Cypress
EPLD, the security EPROM cell is designed to retain
its charge longer than any of the other cells in the array.

The Programmable Macrocell
The basic 20-pin PLDs of the past still had some
limitations. For instance, they provided no way to control output-pin polarity without doing DeMorgan operations on the logic equations. Quite often the DeMorgan
version has too many product terms to fit in such a
device, even after several hours of reduction using a
logic-optimization program.
Another drawback is that you have to stock a
variety of the basic 20-pin PLDs and/or their 24-pin
equivalents to get the best fit for a given design. Often
extra registers are left unused when the design is
finished. Even though these PLDs tend to be pin
limited, the pins associated with the extra registers end
up being wasted because you cannot use them for anything else.
The 24-pin 22V10 overcame earlier limitations and
revolutionized PLDs by introducing the programmable
macrocell (Figure 6). The programmable macrocell allows you to select one of four output configurations:
combinatorial inverting, combinatorial noninverting,
registered inverting, and registered noninverting. You
can use the "output" pin as an input or for bidirectional
I/O if you specify the macrocell as combinatorial.
Each of the 22V10's ten I/O pins has all four configuration options. You select the option using two
fuses, or cells, identical to those in the array. These 20
bits (two for each of ten macrocells) appear at the bottom of the fuse map that represents the array.
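
The four macrocell configurations can be modeled as a decode of two bits followed by a simple selection. The sketch below is illustrative only: the text above says that two cells select the option but does not say which bit value selects which option, so the assignment used here (first bit 0 for registered, second bit 0 for inverting) is an assumption, as are all names in the code.

```python
from dataclasses import dataclass

MACROCELLS = 10
BITS_PER_MACROCELL = 2
CONFIG_BITS = MACROCELLS * BITS_PER_MACROCELL   # the 20 bits noted above

@dataclass(frozen=True)
class MacrocellConfig:
    registered: bool   # True: output comes from the flip-flop
    inverting: bool    # True: output is inverted at the pin

def decode_macrocell(bits):
    """Decode a (mode_bit, polarity_bit) pair; bit meanings are assumed."""
    mode_bit, polarity_bit = bits
    return MacrocellConfig(registered=(mode_bit == 0),
                           inverting=(polarity_bit == 0))

def macrocell_output(config, sum_term, register_q):
    """Value driven toward the pin for one macrocell."""
    value = register_q if config.registered else sum_term
    return (not value) if config.inverting else value
```

For example, decode_macrocell((1, 1)) yields a combinatorial noninverting cell under the assumed bit assignment, and macrocell_output then simply passes the sum term through.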
Another innovation of the 22V10 is that some pins
have a larger sum of products than others, an approach called variable product term distribution. In the
22V10, I/O pins have sums from eight to 16 product
terms wide. This variable distribution accommodates
applications such as D flip-flop counters, where several
outputs require a large number of product terms.
The 22V10 offers yet another improvement over
PLDs such as the 16R8, which powers up with all
registers in the reset state. The only way you can change
this is by clocking in new data. The 22V10 avoids this
problem by adding two extra product terms. One sets
all registers, the other resets all registers. Because the
set and reset are each a product term, they can be
programmed to be the AND of any array input(s). For
additional flexibility, the set is designated as a
synchronous operation, and the reset is asynchronous.
Because of the 22V10's versatility, it has become
something of an industry standard. It is available in
TTL, CMOS, and GaAs. Many companies have introduced similar architectures with slightly different features. For example, the Cypress PLDC20G10 uses a
similar macrocell that adds the capability to choose between a product-term output enable and a pin-controlled output enable. To make the PLDC20G10 faster and
less expensive than the 22V10, Cypress has reduced the
array to nine product terms per I/O macrocell and
removed the set and reset product terms.
Another device introduced around the same time
as the 22V10 is the 20RA10, which targets asynchronous
registered applications. Like the 22V10, the 20RA10
has I/O pins with programmable polarity bits. You can
configure the 20RA10's I/O pins as registered or combinatorial, but not with dedicated fuses. Instead, each
I/O pin has a sum of four product terms that connects,
through a polarity switch, to the D input of a flip-flop.
Each of these flip-flops has dedicated product terms
connected to its clock, set, and reset functions. When
both the set and reset of a flip-flop are asserted (High),
the flip-flop becomes transparent, thus making the output combinatorial.
In addition, the 20RA10 has an unusual output-enable scheme. Pin 13 is inverted and ANDed with an
output-enable product term. If pin 13 is High, all I/O
pins are at high impedance. The 20RA10 also offers a
synchronous register preload in operating mode. When
pin 1 goes Low, any data driven onto an I/O pin is
latched into the corresponding flip-flop. A 20RA10
I/O pin is illustrated in Figure 7. This device's flexibility
and asynchronous nature make it ideal for bus-arbiter
and interrupt-controller applications.
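The 20RA10 register and output-enable behavior described above can be sketched in a few lines of Python. This is a behavioral illustration, not a timing model; the function and argument names are hypothetical, and treating set and reset as overriding the clock is an assumption consistent with product-term-driven asynchronous controls.

```python
# Behavioral sketch of one 20RA10-style macrocell (names are hypothetical).

def ra10_register(d, q_prev, clock_edge, set_pt, reset_pt):
    """Return the flip-flop output for 0/1 inputs.

    d:          D input (sum of four product terms after the polarity switch)
    q_prev:     stored state before this evaluation
    clock_edge: True when the clock product term produces an active edge
    set_pt, reset_pt: values of the dedicated set and reset product terms
    """
    if set_pt and reset_pt:
        return d            # both asserted: flip-flop is transparent
    if set_pt:
        return 1            # assumption: set overrides the clock
    if reset_pt:
        return 0            # assumption: reset overrides the clock
    return d if clock_edge else q_prev

def ra10_output_enabled(pin13, oe_pt):
    """Pin 13 is inverted and ANDed with the output-enable product term."""
    return (not pin13) and bool(oe_pt)
```

With both set and reset asserted the output simply follows D, which is how the device produces a combinatorial output without a dedicated fuse.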

Second-Generation PLDs
The architectural features introduced by the 22V10
greatly enhance PLD flexibility, but this device still has
some limitations. It offers only D-type flip-flops, for example, which are cumbersome for applications such as
counters. Further, each flip-flop and its feedback still
use a pin, even if the flip-flop's output is not needed
outside the PLD. Bidirectional, registered pins cannot
be implemented. High-speed applications often require
flip-flops outside the PLD's inputs to latch data because
propagation delays impose relatively long set-up times
for output flip-flops.
Cypress solves all these problems with the
CY7C330. In addition to the output registers on the I/O
pins, each pin except power and ground has an input
register with a choice of two clocks. This input macrocell makes the 28-pin CY7C330 ideal for pipelined control and high-speed state machine applications.
Another CY7C330 feature is its ability to emulate T
and JK flip-flops, a useful alternative in counter
designs. In each I/O macrocell, the sum of products
from the array drives one input of an exclusive-OR
(XOR) gate. The second input to the XOR gate is
another product term. This gate's output connects to
the D input of the output flip-flop in the macrocell (Figure 8). If the flip-flop's Q output is fed back and connected to the single product term driving the XOR gate,
the sum-of-products acts as the T input of a T flip-flop.
The macrocell can also emulate a JK flip-flop in this
way, using the relation T = J·!Q + K·Q, where !Q is the complement of Q. If you require a
D flip-flop, you can use the XOR gate to control
polarity.
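The XOR-based flip-flop emulation can be checked with a small Python sketch. The function names are hypothetical; the logic follows the description above, with the D input formed as the XOR of the sum of products and the fed-back Q.

```python
# Sketch of T and JK emulation with a D flip-flop plus an XOR gate.
# All signals are modeled as the integers 0 and 1.

def next_q_t(t, q):
    """D = T xor Q: Q toggles when T is 1 and holds when T is 0."""
    return t ^ q

def jk_to_t(j, k, q):
    """T = J·!Q + K·Q, the relation quoted in the text."""
    return (j & (q ^ 1)) | (k & q)

def next_q_jk(j, k, q):
    """Next state of the emulated JK flip-flop."""
    return next_q_t(jk_to_t(j, k, q), q)

# Tabulate the emulated JK behavior for every input combination.
JK_TABLE = {(j, k, q): next_q_jk(j, k, q)
            for j in (0, 1) for k in (0, 1) for q in (0, 1)}
```

Walking through `JK_TABLE` reproduces the usual JK behavior: hold for J = K = 0, set for J = 1, reset for K = 1, and toggle for J = K = 1.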
Figure 7. The 20RA10 Macrocell.

Figure 8. The CY7C330 Macrocell.

Close examination of Figure 8 reveals two paths
into the array. The first is a multiplexer that selects
feedback from either the output register or the input
register's Q output. This multiplexer is called the feedback mux. The inputs to the second path, called the
shared input mux, are the Q outputs of input registers
belonging to adjacent I/O macrocells. This path allows
you to feed back the Q output of a macrocell's output
register and still use the pin associated with that
macrocell as an input. You can do this for six of the 12
I/O macrocells. If you need more registers for an application, the CY7C330 contains four additional buried
registers. These registers are identical to the output
register portion of the I/O macrocell, except they are
not connected to any pin.
Just as the CY7C330 can be considered an extended, enhanced version of the 22V10, the CY7C331
represents an extension of the 20RA10. The 20RA10
has many of the same limitations as the 22V10, with the
additional limitation that the sum of products is only
four product terms wide. The CY7C331 has 12 I/O
macrocells. In addition to the 20RA10-like output flip-flops, the CY7C331 has identical flip-flops in the input
path. As in the 20RA10, each flip-flop has a product-term-controlled clock, set, and reset. If the set and reset
product terms are both asserted, the flip-flop becomes
transparent. The 20RA10 polarity fuse has been
replaced in the CY7C331 by an XOR gate, which has as
inputs the sum of products and a dedicated product
term. Thus, you can control the output's polarity or
have the flip-flops emulate T or JK flip-flops, as in the
CY7C330. The CY7C331 macrocell appears in Figure 9.
Like the 22V10 and CY7C330, the CY7C331 has
variable-product-term distribution with sums from four
to 12 product terms wide. The CY7C331 borrows the
shared input mux and output enable schemes from the
CY7C330. The CY7C331 does not support the
20RA10's operating mode preload, but you can preload
the CY7C331's registers using a super voltage.
The CY7C331 is designed especially for self-timed
applications such as high-speed I/O interfaces. The
device supports self-timed designs with programmable
clock inputs, well-controlled internal timing relationships, and ultra-fast metastable resolution. No other
PLD has this self-timed capability.
Another PLD architectural trend is to put
registered inputs in combinatorial devices. These PLDs
generally serve in sophisticated decoding applications,
where the address or data is only stable for a short time.
In the past, an MSI chip with latches or flip-flops
was used to capture transient data, and the latched data
fed into a PLD. Now PLDs such as the CY7C332 feature an input macrocell that you can program as combinatorial, registered, or latched. You have a choice of
two clocks, and you can program the clock polarity as
well.
The CY7C332 I/O macrocell (Figure 10) incorporates the input macrocell and a combinatorial output
path. The latter includes a variable sum of products that
drives one input of an XOR gate; a dedicated product
term drives the XOR's other input. An output-enable
mux allows a product term or pin 14 to control the output enable. This combinatorial output path can act as
an input to the programmable-input register/latch, thus
allowing you to create state machines.
Figure 9. The CY7C331 Macrocell.

Figure 10. The CY7C332 Macrocell.

High-Density PLDs

Because of its low power consumption, CMOS can
achieve higher integration than can bipolar technologies. Several manufacturers are taking advantage of
this fact to produce very high density PLDs. The
CY7C342, for example, is a 68-pin member of the new
MAX family and contains 128 flip-flops and over 1000
product terms. Up to 256 additional latches can be configured using expander product terms. Each of these
product terms is called a logic array block (LAB). The
CY7C342 contains eight LABs, which connect together
via a programmable interconnect array (PIA).
The CY7C342 macrocell (Figure 12) contains a sum
of three product terms driving one input of an XOR.
The other XOR input is a dedicated product term. The
XOR drives a programmable flip-flop, which you can

Table 3

Select Pin Levels    D0   D1   D2   D3   D4   D5   D6   D7
VILP VILP VILP        0    8   16   24   32   40   48   56
VILP VILP VIHP        1    9   17   25   33   41   49   57
VILP VIHP VILP        2   10   18   26   34   42   50   58
VILP VIHP VIHP        3   11   19   27   35   43   51   59
VIHP VILP VILP        4   12   20   28   36   44   52   60
VIHP VILP VIHP        5   13   21   29   37   45   53   61
VIHP VIHP VILP        6   14   22   30   38   46   54   62
VIHP VIHP VIHP        7   15   23   31   39   47   55   63
                     Programmed Data Input

Table 4 (continued)

Input Term    Address Pin Levels
 9            VILP VIHP VILP VILP VIHP
10            VILP VIHP VILP VIHP VILP
11            VILP VIHP VILP VIHP VIHP
12            VILP VIHP VIHP VILP VILP
13            VILP VIHP VIHP VILP VIHP
14            VILP VIHP VIHP VIHP VILP
15            VILP VIHP VIHP VIHP VIHP
16            VIHP VILP VILP VILP VILP
17            VIHP VILP VILP VILP VIHP
18            VIHP VILP VILP VIHP VILP
19            VIHP VILP VILP VIHP VIHP
20            VIHP VILP VIHP VILP VILP
21            VIHP VILP VIHP VILP VIHP
22            VIHP VILP VIHP VIHP VILP
23            VIHP VILP VIHP VIHP VIHP
24            VIHP VIHP VILP VILP VILP
25            VIHP VIHP VILP VILP VIHP
26            VIHP VIHP VILP VIHP VILP
27            VIHP VIHP VILP VIHP VIHP
28            VIHP VIHP VIHP VILP VILP
29            VIHP VIHP VIHP VILP VIHP
30            VIHP VIHP VIHP VIHP VILP
31            VIHP VIHP VIHP VIHP VIHP
P0            VILP VILP Vpp  X    X
P1            VILP VIHP Vpp  X    X
P2            VIHP VILP Vpp  X    X
P3            VIHP VIHP Vpp  X    X

CMOS PAL Basics

Figure 2. Functional Logic Diagram of PAL C 16L8A

Programming Operation
In a PAL C device, pins 5 - 9 are decoded (Table 4)
in a 1-of-32 decoder, whose outputs correspond to the
inputs labeled 0 - 31 in Figure 2. For programming, 15V is
applied to the bottom of the word line through a weak depletion-mode device. The EN (enable) signal to all of
the three-state drivers is Low, which prevents the normal
PAL input signals from driving the word lines during
programming. The D0 - D7 inputs (pins 19 - 12) drive the
program transistors (0, 8, 16, 24, etc.) as selected by pins
2, 3, and 4 (Table 3). To disconnect a word line from a bit
line, the program transistor is forward biased, which increases the threshold of the read transistor.
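The addressing scheme can be sketched in Python as follows. This is a minimal illustration based on the binary pattern visible in Tables 3 and 4; the function names are hypothetical, and the mapping of individual columns to physical pins (which pin carries the most significant bit) is an assumption that should be checked against the device's programming specification.

```python
# Sketch of PAL C programming-mode addressing.
# Assumption: the level columns are listed most-significant bit first.

VILP, VIHP = "VILP", "VIHP"

def _levels(value, width):
    """Translate an integer into a list of programming logic levels."""
    return [VIHP if bit == "1" else VILP for bit in format(value, f"0{width}b")]

def input_term_levels(term):
    """Five address levels (pins 5 - 9) that select input term 0..31."""
    if not 0 <= term <= 31:
        raise ValueError("input term must be in 0..31")
    return _levels(term, 5)

def product_term_address(product_term):
    """Select levels (pins 2, 3, 4) and data input for product term 0..63.

    Each select code addresses terms n, n+8, ..., n+56, one per data
    input D0..D7, so the row is the term modulo 8 and the data input
    is the term divided by 8.
    """
    if not 0 <= product_term <= 63:
        raise ValueError("product term must be in 0..63")
    row, column = product_term % 8, product_term // 8
    return _levels(row, 3), f"D{column}"
```

For example, `product_term_address(19)` returns the select code for row 3 together with data input `D2`.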

Verify Operation
To verify the programmed cells, the device must go
from the Program PAL mode to the Program Inhibit mode
to the Program Verify mode. This is accomplished by
reducing the voltage on pin 11 to VIHP (3V) and then to
VILP (0.4V). Inside the device (Figure 4), the voltage
changes disable the 1-of-32 decoder, bring the EN signal
Low, and put 31 of the 32 input term lines at 0V. The line
being verified is at 5V. The input address lines (pins 2
through 9) do not need to change when going from Program to Verify mode.
Because the ones that were programmed cause the
thresholds of the R transistors to increase, these transistors
do not turn on during Verify mode. The unprogrammed
transistors do turn on, however; the complement (inverse)
of the data programmed is thus read during verify.

Regular (Normal) PAL Operation
The PAL implements the programmed function when
no supervoltages are applied to any of the pins. During
regular PAL operation, the 1-of-32 decoder and the D0 - D7 decoder are disabled, the EN signal is High, and all 32
input term lines are at 5V. Under these conditions, the
data at the PAL C input pins is applied to all 64 of the
product term lines. If any of the P transistors (16 per
product term line) have not been programmed, they turn
on and pull the lower input of the corresponding sense
amplifier (SA) to 2V or less. Because this voltage is lower
than the reference (Vref), the sense amplifier's output is
Low.
The reference is an unprogrammed EPROM cell that
tracks the same process, voltage, and temperature variations that affect all the cells in the array. The reference is
approximately 3V at room temperature and nominal VCC
(5V).

Phantom PAL Operation
The PAL is in the Phantom PAL operation mode
when a supervoltage (Vpp = 13.5V) is applied to pin 6.
The phantom array is programmed as shown in Figure 2.
When the device is in Phantom PAL mode, you can
measure the worst-case propagation delay from the pin 2
input to the outputs (pins 12 through 17). The truth table
for the phantom array appears in Table 5.

Reliability
Reliability is designed into all Cypress products from
the beginning by using design techniques to eliminate
latchup and improve ESD and by paying careful attention
to layout. All products are tested for all known types of
CMOS failure mechanisms.
Failure mechanisms can be classified as either those
generic to CMOS technology or those specific to EPROM
devices. Table 6 lists both categories of failures, their
relevant activation energies, Ea in electron volts, and the
detection method used by Cypress. In both cases, the
mechanisms are aggravated by HTOL (high temperature
operating life) tests and HTS (high temperature storage)
tests.

Figure 3. 16L8 Device Simplified Block Diagram

Table 5. Phantom Array Truth Table
Pin 2
0
1
0
0

Inputs
3 4
1
0
1
0
1 X
1 X

Outputs
19 18 17 16 15
X X 1 1 1
o· 0
X X 0
1 0 X X X
1 X X X
0

14 13 12
1 1 1
0 0 0
X X X
X X X

Specific EPROM failure mechanisms include charge
loss, charge gain, and electron trapping. Thermal energy
and field emission effects accelerate charge loss.
Thermal charge loss failures usually occur on random
bits and are often related to latent manufacturing defects.
In many instances a dramatic difference between typical
and worst-case bits is observed. Field emission effects
are generally detected as weakly programmed cells. The
high voltages used to program a selected bit might disturb
an unselected bit as a result of a defect.
Charge gain is due to electrons accumulating on a
floating gate as a result of bias or voltage on the gate.
This results in a reduced read margin. The effects of this
mechanism are generally negligible.
Electrons might become trapped in the gate oxide
during programming and cause diminished reprogrammability. For one-time-programmable devices, this failure
mode has little significance. This is because Cypress PAL
C devices are programmed only three times: twice during
manufacture and once by the customer.

HTOL Testing
High temperature operating life test (or burn-in)
detects most generic CMOS failure mechanisms. Units are
placed in sockets under bias conditions with power applied and at elevated temperatures for a specific number
of hours. This test weeds out the "weak sisters" that would
fail during the first 100 to 500 hours of operation under
normal operating temperatures. HTOL tests are also used
to measure parameter shifts to predict and screen for
failures that would occur much later.

HTS Testing
High temperature storage tests are used to thermally
accelerate charge loss. These tests are performed at the
wafer level and under unbiased conditions. Both pass/fail
data and shifts in thresholds are measured. For a
more detailed discussion of charge loss screening, see the
References.
The generally accepted screening method for identifying charge loss is a 168-hour bake at 250°C. This correlates with more than 220,000 years of normal operation
at 70°C using a failure activation energy of 1.4 eV. The
sample size chosen guarantees that at least 99 percent of
the units will not fail during their useful operating life.

Figure 4. Programming Method

Initial Qualification
The process in general and the PAL C design specifically were qualified using HTS (bake) at 250°C for 256
hours, in conjunction with an HTOL test at 125°C for
1000 hours.
In the qualification process, four wafers were erased
using ultraviolet light, and the linear thresholds of the
cell's read transistors measured at 25 sites on each wafer.
The wafers were then programmed, and the linear
thresholds measured and recorded.
The wafers were alternately baked at 250°C and the
linear thresholds measured and recorded at 0.25, 0.5, 1, 2,
4, 8, 16, 32, 64, 128, and 256 hours. The number of
device hours was therefore 100 x 256 = 25,600.
The results of this process revealed that the average
threshold reduction due to charge loss was 0.66V. The
range was 8 to 10 percent of the average initial threshold
of 7.7V. This reduced threshold is more than 4V above
the sense amplifier voltage reference. There were no
failures.
If the charge loss failure activation energy is assumed
to be 1.4 eV, the HTS time of 256 hours at 250°C translates to 438,356 years of operation at 70°C. This time
translation was computed using the industry-standard Arrhenius equation, which converts the time to failure
(operating lifetime) at one temperature and time to another
temperature and time.
To summarize the results:
Sample size: 100
Device hours: 25,600
HTS conditions: 256 hours at 250°C
Average initial threshold: 7.7V
Average threshold decrease: 0.66V
Standard deviation: 0.12
Lifetime (1.4 eV): 438,356 years at 70°C
These results confirm that the data retention characteristics of the EPROM cell used in all Cypress PALs and
PROMs guarantee a minimum operating lifetime of
438,356 years for activation energies of 1.4 eV.
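The Arrhenius time translation can be sketched in Python as shown below. This is an illustration only: the function names are hypothetical, and the Boltzmann constant value and the 273.15 offset used for the Celsius-to-kelvin conversion are assumptions. The handbook does not state the constants or rounding it used, so this sketch gives results of the same order of magnitude as the quoted lifetimes rather than reproducing them digit for digit.

```python
import math

# Assumed value of the Boltzmann constant in eV per kelvin.
BOLTZMANN_EV_PER_K = 8.617e-5
HOURS_PER_YEAR = 8760

def acceleration_factor(ea_ev, use_temp_c, stress_temp_c):
    """Arrhenius acceleration factor between a stress and a use temperature."""
    t_use = use_temp_c + 273.15
    t_stress = stress_temp_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_stress)
    return math.exp(exponent)

def equivalent_years(stress_hours, ea_ev, use_temp_c, stress_temp_c):
    """Years at the use temperature equivalent to a bake at the stress temperature."""
    factor = acceleration_factor(ea_ev, use_temp_c, stress_temp_c)
    return stress_hours * factor / HOURS_PER_YEAR

# The two bakes discussed in the text, both translated to 70 degrees C
# with an activation energy of 1.4 eV.
screen_years = equivalent_years(168, 1.4, 70, 250)
qualification_years = equivalent_years(256, 1.4, 70, 250)
```

Because the activation energy sits in the exponent, small changes in the assumed constants or in Ea shift the result noticeably, which is why the sketch is best read as an order-of-magnitude check.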

Production Screen
Units from the same population were assembled
without being subjected to HTS and were subjected to an
HTOL test at 150°C for 1000 hours. The units were tested at
12, 24, 48, 96, 168, 336, and 1008 hours and the measurements recorded. Variations in the thresholds of the
EPROM cells were measured and correlated to the units
tested in the HTS test to determine a maximum acceptable
rate of charge loss. This data allows Cypress to guarantee
data retention over the devices' normal operating lifetime.

Table 6. Generic CMOS Failure Mechanisms

Mechanism                         Activation Energy (eV)   Detection Method
Surface charge                    0.5 to 1.0               HTOL, fabrication monitors
Contamination                     1.0 to 1.4               HTOL, fabrication monitors
Electromigration                  1.0                      HTOL
Micro-cracks                      -                        Temperature cycling
Silicon defects                   0.3                      HTOL
Oxide breakdown                   0.3                      High-voltage stress, HTOL
Hot electron injection            -                        LTOL (low-temperature operating life)
Fabrication defects               -                        Burn in
Latchup                           -                        High-voltage stress, burn in, characterization
ESD                               -                        Characterization
Charge loss                       0.8 to 1.4               HTS (high-temperature storage)
Charge gain (oxide hopping)       0.3 to 0.6               HTOL
Electron trapping in gate oxide   -                        Program/erase cycle

Table adapted from "An Evaluation of 2708, 2716, 2532, and 2732 Types of U-V EPROMs, Including Reliability and Long
Term Stability." Danish Research Center for Applied Electronics, Nov. 1980.

PAL C Advantages Over Bipolar PALs
The most pertinent data sheet parameters of Cypress
PAL C devices are compared with those of representative
bipolar PALs in Table 1. The supply current and propagation delay specifications are compared under identical test
conditions. The output current sinking specifications are
also identical. Cypress PAL C devices are clearly superior
to bipolar PALs.
The lower power advantage of the PAL C results in
several benefits:
Lower capacity power supplies, which therefore cost
less
Reduced cooling requirements
Increased long term reliability due to lower die junction temperatures
You can further reduce the power dissipation by driving the PAL C inputs between 0.5V or less and 4V or
more. This reduces the power dissipation in the input
TTL-to-CMOS buffers, which dissipate power when their
inputs are between 0.8 and 3V.

PAL C Technology
The PAL C devices' 0.8µ, double-layer-polysilicon,
single-layer-metal, N-well, CMOS technology has been
optimized for performance. Careful attention to design
details and layout techniques has resulted in superior-performance products with improved ESD input protection
and improved latch-up protection.
The circuit shown in Figure 5 is used at every input
pin in all Cypress products to provide protection against
ESD. This circuitry withstands repeated applications of
high voltage without failure or performance degradation.
This is accomplished by preventing the high ESD voltage
from reaching the internal transistors' thin gate oxides.
The circuit consists of two thick-oxide field transistors wrapped around an input resistor (Rp) and a thin-oxide gate transistor with a relatively low breakdown voltage (12V). Large input voltages cause the thick-oxide
transistors to turn on, discharging the ESD current to
ground. The thin-oxide transistor breaks down when the
drain-to-source voltage exceeds 12V. This transistor is
protected from destruction by the current-limiting action
of Rp. Experiments confirm that this input protection circuitry results in ESD protection in excess of 2000V.

Figure 5. Input Protection Circuit

Latch-up
Latch-up is a regenerative phenomenon that occurs
when the voltage at an input or output pin is either raised
above the power supply voltage potential or lowered
below the substrate voltage potential, which is usually
ground. Current rapidly increases until, in effect, a short
circuit from VCC to ground exists. If the current is not
limited, it will destroy the device, usually by melting a
metal trace.
The CMOS processing used to fabricate both N- and
P-channel MOS transistors also inherently creates
parasitic bipolar transistors, both NPNs and PNPs.
Latch-up is caused when these parasitic transistors are inadvertently turned on.
So long as the voltages applied to the package pins of
the CMOS IC remain within the limits of the power supply voltages (usually 0 to 5V), the parasitic bipolar transistors remain dormant. However, when either negative voltages or positive voltages greater than the VCC supply voltage are applied to input or output pins, the parasitic
bipolar transistors might turn on and cause latch-up.
Figure 6 shows a cross section of a typical CMOS
inverter using a P-channel pull-up transistor and an N-channel pull-down transistor. Also shown is an N-channel
output driver that is isolated from the CMOS inverter by a
guard ring (channel stopper). The latter is necessary to
prevent parasitic MOS transistors between devices. P+
guard rings surround N-channel devices, and N+ guard
rings surround P-channel devices. The parasitic SCR
(PNPN) and bias generator appear in Figure 7, which
does not show the output driver schematic.
For latch-up to occur, two conditions must be satisfied: The product of the betas of the NPN and PNP transistors must be greater than one, and a trigger current
must exist that turns on the SCR.
Because the SCR structure in bulk CMOS cannot be
eliminated, the task of preventing latch-up is reduced to
keeping the SCR from turning on. If either Rwell or Rsub
equals 0, the SCR cannot turn on. This is because the base
and emitter of the PNP transistor are tied together and
thus the base/emitter junction cannot be forward biased;
and the base/emitter junction of the NPN cannot be forward biased because the base is connected to ground.
Note, however, that the NPN can be turned on by a negative voltage on the output pin if the right end of Rsub is
grounded.

Preventing Latch-Up
The traditional cures for latch-up include increased
horizontal spacing, diffused guard rings, and metal straps
to critical areas. These solutions are obviously opposite to
the goal of greater density.
A brute-force approach that has been successful in
reducing latch-up has been to increase the conductivity of
the N well and the substrate. Changing the well conductivity is unacceptable because it affects the characteristics
of the P-channel MOS transistors. Using an epitaxial layer
to reduce the substrate resistivity (Rsub) is also unacceptable because the price per wafer with a P+ epi-layer is
approximately three times the cost of the industry-standard 5-inch, 50Ω per square, P- wafer.
Cypress uses several design techniques in addition to
careful circuit layout and conservative design rules to
avoid latch-up.
Conventional CMOS technology uses a P-channel
MOS transistor as a pull-up device on the output drivers.
This has the advantage of being able to pull the output
voltage High to within 100 mV of the positive voltage
supply. However, this is of marginal value when TTL
compatibility is required. In addition, the P-channel pull-up transistor is sensitive to overshoot and introduces
another vertical PNP transistor that further compounds the
latch-up problem. Cypress uses N-channel pull-up transistors that eliminate all of these problems and still maintain
TTL compatibility.
Cypress is the first company to use a substrate bias
generator with CMOS technology. The bias generator
keeps the substrate at approximately -3V DC, which serves several purposes.
The parasitic diodes shown in Figure 5 cannot be forward biased unless the voltage at an input pin is at least
one diode drop more negative than -3V. This translates
into increased device tolerance to undershoot at the input
pins caused by inductance in the leads. If the undershoot
is larger than 3V, the output impedance of the bias generator itself is sufficient to prevent trigger current from
being generated.
The same reasoning applies to negative voltages at
the output pins (Figure 7). To turn on the NPN transistor,
the voltage at the output pin must be at least one VBE
more negative than -3V.
To protect the core of the die from free-floating holes
and stray currents, Cypress uses a diffused collection
guard ring that is strapped with metal and connected to
the bias generator. This provides an effective wall against
transient currents that could cause mis-reading of the
EPROM cells.

Figure 6. Parasitic SCR and Bias Generator

References
Woods, Murray H. "An E-PROM's integrity starts
with its cell structure," Electronics magazine, August 14,
1980, pg. 132.
Rosenberg, Stuart. "Tests and screens weed out
failures, project rates of reliability," Electronics magazine,
August 14, 1980, pg. 136.

Figure 7. CMOS Cross Section and Parasitic Circuits

Are Your PLDs Metastable?
This application note provides a detailed description
of the metastable behavior in PLDs from both circuit and
statistical viewpoints. Additionally, the information on the
metastable characteristics of Cypress PLDs presented here
can help you achieve any desired degree of reliability.
Metastable is a Greek word meaning "in between."
Metastability is an undesirable output condition of digital
logic storage elements caused by marginal triggering. This
marginal triggering is usually caused by violating the
storage elements' minimum set-up and hold times.
In most logic families, metastability is seen as a voltage level in the area between a logic High and a logic
Low. Although systems have been designed that did not
account for metastability, its effects have taken their toll
on many of those systems.
In most digital systems, marginal triggering of
storage elements does not occur. These systems are
designed as synchronous systems that meet or exceed their
components' worst-case specifications. Totally synchronous
design is not possible for systems that impose no fixed
relationship between input signals and the local system
clock. This includes systems with asynchronous bus arbitration, telecommunications equipment, and most I/O interfaces. For these systems to function properly, it is
necessary to synchronize the incoming asynchronous signals with the local system clock before using them.
Figure 1 shows a simple synchronizer, whose
asynchronous input comes from outside the local system.
The synchronizer operates with a system clock that is
synchronous to the local system's operation. On each
leading edge of this system clock, the synchronizer attempts to capture the state of the asynchronous input. Figure 2 shows the expected result. Most of the time, this
synchronizer performs as desired.

Figure 2. Expected Synchronizer Output
Digital systems are supposed to function properly all
the time, however. But because there is no direct relationship between the asynchronous input and the system
clock, at some point the two signals will both be in transition at very nearly the same instant. Figure 3 shows some
of the synchronizer's possible metastable outputs when
this input condition occurs. These types of outputs would
not occur if the synchronizer made a decision one way or
the other in its specified clock-to-output time. A flip-flop,
when not properly triggered, might not make a decision in
this time. When improperly triggered into a metastable
state, the output might later transition to a High or a Low
or might oscillate.
When other components in the local system sample
the synchronizer's metastable output, they might also become metastable. A potentially worse problem can occur
if two or more components sample the metastable signal
and yield different results. This situation can easily corrupt data or cause a system failure.
Such system failures are not a new problem. In 1952,
Lubkin (Reference 1) stated that system designers, incIud-

CLOCK
ASYNC
INPUT

IflCHIOlun
ASUCHROIOUI
,

T

ITiCHlOIOUI
OUTPUT
LOCALLY

- ...................- - - + - - - - - - - - - - - - i

SYNC
OUT

SYICHIOIOUI
nSTU

I

Figure 1. Simple Synchronizer

UTlSTAIU
lUOLVE TO

I

0

HITlSTAILE
RESOLVE TO 1

I

HETASTAILE
OSCILLATIU OUTPUT

Figure 3. Possible Metastable States of Synchronizer

6-21

take anywhere from an additional few hundred
picoseconds to tens of microseconds to reach a valid output level. The amount of additional time beyond tco_max
required for the outputs to reach a valid logic level is
known as the metastable walk-out time. This walk-out
time, while statistically predictable, is not deterministic.
Figure 6, from Reference 2, shows the variation in
output delay with data input time. The left portion of the
graph shows that when the data meets the required set-up
time, the device has valid output after a predictable delay,
which equals tco. The middle portion of the graph indicates the metastable region. If the data transitions in this
region, valid output is delayed beyond tco_max. The closer
the input transitions to the center of the metastable region,
violating the device's triggering requirements, the longer
the propagation delay. If the data transitions after the
metastable region, the device does not recognize the input
at that clock edge, and no transition occurs at the output.
As given in Reference 3, you can predict the region tw,
where data transitions cause a propagation delay longer
than tr, from the formula:

ing the designers of the ENIAC, knew about metastability.
The accepted solution at that time was to concatenate an
additional flip-flop after the original synchronizer stage
(Figure 4). This added flip-flop does not totally remove
the problem but does improve reliability. This same solution is still in wide use today.
Recovery from metastability is probabilistic. In the
improved synchronizer, the first flip-flop's output might
still be in a metastable state at the end of the sample clock
period. Because the flip-flops are sequential, the probability of propagating a metastable condition from the
second flip-flop stage is the square of the probability of
the first flip-flop remaining metastable for its sample
clock period. This type of synchronizer does have the
drawback of adding one clock cycle of latency, which
might be unacceptable in some systems.
As system speeds increase and as more systems utilize inputs from asynchronous external sources, metastability-induced failures become an increasingly significant portion of the total possible system failures. So
far, no known method totally eliminates the possibility of
metastability. However, while you cannot eliminate
metastability, you can employ design techniques that
make its probability relatively small compared with other
failure modes.

(Eq. 1)
where τ depends on device-specific characteristics such as
transistor dimensions and the flip-flop's gain-bandwidth
product.
Figure 7 shows another way of looking at metastability. A flip-flop, like any other bistable device, has two
minimum-potential energy levels, separated by a maximum-energy potential. A bistable system has stability at
either of the two minimum-energy points. The system can
also have temporary stability - metastability - at the
energy maximum. If nothing pushes the system from the
maximum-energy point, the system remains at this point
indefinitely.
A hill with valleys on either side is another bistable
system. A ball placed on top of the hill tends to roll
toward one of the minimum-energy levels. If left undisturbed at the top, the ball can remain there for an indeterminate amount of time. As this figure indicates, the characteristics of the top of the hill as well as natural factors
affect how long the ball stays there. The steepness of the
hill is analogous to the gain-bandwidth product of the flip-flop's input stage.
t w= t coe

Explanation of Metastability
In a flip-flop, a metastable output is undefined or oscillates between High and Low for an indefinite time due
to marginal triggering of the circuit. This anomalous flip-flop behavior results when data inputs violate the
specified set-up and hold times with respect to the clock.
In the case of a D-type flip-flop, the data must be
stable at the device's D input before the clock edge by a
time known as the set-up time, ts. This data must remain
stable after the clock edge by a time known as the hold
time, th (Figure 5). The data must satisfy both the set-up
and hold times to ensure that the storage device (register,
flip-flop, latch) stores valid data and to ensure that the
outputs present valid data after a maximum specified
clock-to-output delay tco_max. As used in this application
note, tco_max refers to the interval from the clock's rising
edge to the time the data is valid on the outputs. In most
cases, tco_max equals the maximum tco found in data
sheets, as opposed to the average or typical tco value.
If the data violates either the set-up or hold specifications, the flip-flop output might go to an anomalous state
for a time greater than tco_max (Figure 5). The outputs can

Figure 4. Two-Stage Synchronizer

Figure 5. Triggering Modes of a Simple Flip-Flop

Causes of Metastability
Systems with separate entities, each running at different clock rates, are called globally asynchronous systems (Reference 4). The entities might include keyboards,
communication devices, disk drives, and processors. A
system containing such entities is asynchronous because
signals between two or more entities do not share a fixed
relationship.
Metastability can occur between two concurrently
operating digital systems that lack a common time reference. For example, in a multiprocessing system, it is possible that a request for data from one system can occur at
nearly the exact moment that this signal is sampled by
another part of the system. In this case, the request might
be undefined if it does not obey the set-up and hold time
of the requested system.
When globally asynchronous systems communicate
with each other, their signals must be synchronized. Arbitration must occur when two or more requests for a
shared resource are received from asynchronous systems.
An arbiter decides which of two events should be serviced
first. A synchronizer, which is a type of arbiter with a
clock as one of the arbitrated signals, must make its decision
within a fixed amount of time. A device can synchronize
an input signal from an external, asynchronous device in
cases such as a keyboard input, an external interrupt, or a
communication request.
Care must be taken when two locally-synchronous
systems communicate in a globally-asynchronous environment. A synchronization failure occurs when one system
samples a flip-flop in the other system that has an undefined or oscillating output. This event can distribute
non-binary signals through a binary system (Reference 5).
In synchronizers, the circuit must decide the state of
the data input at the clock input's rising edge. If these two
signals arrive at the same time, the circuit can produce an
output based on either decision, but must decide one way
or the other within a fixed amount of time.

Figure 7. Graphical View of a Bistable System

Attacking Metastability
The design of synchronous systems is much different
than the design of globally-asynchronous systems. The
design of a synchronous digital system is based on known
maximum propagation delays of flip-flops and logical
gates. Asynchronous systems by definition have no fixed
relationship with each other, and therefore, any propagation delay from one locally-synchronous system to the
next has no physical meaning.
Two different methods are available to produce locally-synchronous systems from globally-asynchronous systems. The first method involves creating self-timed systems. In a self-timed system, the entity that performs a
task also emits a signal that indicates the task's completion. This handshaking signal allows the use of the results
when they are ready instead of waiting for the worst-case
delay. Such handshaking signals allow communications
between locally-synchronous systems.
The advantage of the self-timed method is that it permits machines to run at the average speed instead of the
worst-case speed. The disadvantages are that a self-timed
system must have extra circuitry to compute its own completion signals and extra circuitry to check for the completion of any tasks assigned to external entities.
Petri nets, data flow machines, and self-timed
modules all use the self-timed method of communication
among locally-synchronous systems. Self-timed structures
do not completely eliminate metastability, however, because they can include arbiters that can be metastable.
Most systems do not include self-timed interfaces due to
the additional circuitry and complexity.
The second method of producing locally-synchronous
systems from globally-asynchronous systems is the simple
synchronizer. This is the most common way of communicating between asynchronous objects. The metastability errors that might arise from these systems must be
made to play an insignificant role when compared with
other causes of system failure.
Many metastability solutions involve special circuits
(References 6 and 7). Some of these solutions do not
reduce metastability at all (References 13 and 8). Others,
however, do reduce metastability errors by pushing the
occurrence of metastability to a place where sufficient
time is available for resolving the error. Most of these circuits are system dependent and do not offer a universal
solution to metastability errors.
The easiest and the most widely used solution is to
give the synchronizing circuit enough time to both

[Figure: valid data output time plotted against data input time, marking the normal-delay region and the window tw]
1 
 SYNC;

{Have two registers hold the}
{true and inverted sense of }
{the synchronization register}

IF2


 ISYNC;

IERROR


 lRESET * Fl * IF2
+ /RESET * IFI * F2
+ RESET * ERROR;

{ERROR# goes low when the XOR }
{ofFI and F2 is false, ERROR#}
{also toggles on RESET}

ITSYNC

 TSYNC;

{Fmax reg toggles on every clock}

ITl

 TSYNC;

IT2

 /TSYNC;

{Have two registers hold the}
{true and inverted sense of }
{Fmax reg}

IFAIL


 Tl * IT2
+ ITI * T2;

{FAIL# goes low when the XOR}
{of Tl and T2 is false, indicating }
{Fmax has been exceeded}

Figure 23. PLD Equations for Metastability Testing

4. Chapiro, Daniel M., Globally-Asynchronous Locally-Synchronous Systems, Department of Computer Science
Report No. STAN-CS-84-1026, October 1984.
5. Horstmann, Jens U., Eichel, Hans W., Coates,
Robert L., "Metastability Behavior of CMOS ASIC Flip-Flops in Theory and Test," IEEE Journal of Solid-State
Circuits, Vol 24, No 1, Feb 1989, pp. 146 - 157.
6. Wormald, E.G., "A Note on Synchronizer or Interlock Maloperation," Professional Program Session Record
16, WESCON 87, November 17 - 19, 1987, Electronic
Conventions Management, Los Angeles, CA 90045.
7. Pechouchek, Miroslav, "Anomalous Response
Times of Input Synchronizers," IEEE Trans. Computers,
Vol. C-25, No. 2, Feb 1976, pp. 133 - 139.
8. Chaney, T. J., "Comments on 'A Note on
Synchronizer or Interlock Maloperation,'" IEEE Trans.
Computers, Vol C-28, No 10, Oct. 1979, pp. 802 - 804.
9. Couranz, George R., Wann, Donald F., "Theoretical and Experimental Behavior of Synchronizers Operating in the Metastable Region," IEEE Trans. Computers,
Vol C-24, No. 6, June 1975, pp. 604 - 616.
10. Veendrick, Harry J. M., "The Behavior of Flip-Flops Used as Synchronizers and Prediction of Their
Failure Rate," IEEE Journal of Solid-State Circuits, Vol
SC-15, No. 2, April 1980, pp. 169 - 176.
11. Kacprzak, Tomasz, Albicki, Alexander, "Analysis
of Metastable Operation in RS CMOS Flip-Flops," IEEE
Journal of Solid-State Circuits, Vol SC-22, No 1, Feb
1987, pp. 57 - 64.
12. Flannagan, Stephen T., "Synchronization
Reliability in CMOS Technology," IEEE Journal of Solid-State Circuits, Vol. SC-20, No. 4, Aug 1985, pp. 880 - 882.
13. Wakerly, John F., A Designer's Guide to
Synchronizers and Metastability, Center for Reliable
Computing Technical Report, CSL TN #88-341, February
1988, Computer Systems Laboratory, Departments of
Electrical Engineering and Computer Science, Stanford
University, Stanford, CA.
14. Freeman, Gregory G., Liu, Dick L., Wooley,
Bruce, and McCluskey, Edward J., Two CMOS Metastability Sensors, CSL TN #86-293, June 1986, Computer
Systems Laboratory, Electrical Engineering and Computer
Science Departments, Stanford University, Stanford, CA.
15. Rubin, Kim, "Metastability Testing in PALs,"
WESCON 87 (San Francisco, CA, Nov. 17 - 19, 1987),
Electronic Conventions Management, Los Angeles, CA
90045. 16/1.

because the first synchronization stage can synchronize
the asynchronous input signal, and the second
synchronization stage can perform a Boolean function on
a combination of the input and output signals. Boolean
functions can be performed at either stage; the metastability characteristics listed in Table 1 apply to PLD
registers' asynchronous inputs that are used directly as
well as asynchronous inputs used as a Boolean combination of existing inputs and outputs.
When implementing a two-stage synchronizer in a
PLD, the probability that a synchronizer is metastable
after the second stage of synchronization is the square of
the probability that a synchronizer is metastable after the
first stage of synchronization. The MTBF equation is

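\[ \mathrm{MTBF} = \left( \frac{e^{t_r / t_{sw}}}{f_c f_d W} \right)^{2} \]

From this result, the equation for tr becomes

\[ t_r = \frac{t_{sw} \left( \ln(\mathrm{MTBF}) + 2 \ln(f_c f_d W) \right)}{2} \]

Using this result for a two-stage synchronizer in a
Cypress PALC22V10C, the tr for a 10-year MTBF is
reduced from 13.0 ns to

\[ t_r = (0.5)(0.547 \times 10^{-9}\,\mathrm{s}) \left[ \ln(315 \times 10^{6}\,\mathrm{s}) + 2 \ln(90.9 \times 10^{6} \times 90.9 \times 10^{6} \times 8.08 \times 10^{-15}) \right] = 7.65\ \mathrm{ns} \]

The maximum fc increases from 41.6 MHz to

\[ f_c = \frac{1}{\frac{1}{f_{max}} + t_r} = \frac{1}{\frac{1}{90.9\ \mathrm{MHz}} + 7.65\ \mathrm{ns}} = 53.6\ \mathrm{MHz} \]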
This example shows that if the cycle of latency
caused by the additional synchronization stage is acceptable, you can dramatically increase the synchronizer's
maximum operating frequency.
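The arithmetic above can be checked with a short Python sketch. It assumes the general form MTBF = (e^(tr/tsw) / (fc fd W))^n for an n-stage synchronizer, which reduces to the two-stage equation above for n = 2 and reproduces the 13.0-ns single-stage figure for n = 1. The function names are illustrative only and do not come from the application note; the device constants are the PALC22V10C values used in the worked example.

```python
import math

def resolve_time(mtbf_s, f_c, f_d, w, t_sw, stages=1):
    """Resolve time t_r (seconds) needed for a target MTBF (seconds).

    Assumed model: MTBF = (exp(t_r / t_sw) / (f_c * f_d * w)) ** stages,
    solved for t_r. stages=2 matches the two-stage equation in the text.
    """
    return t_sw * (math.log(mtbf_s) + stages * math.log(f_c * f_d * w)) / stages

def max_clock(f_max, t_r):
    """Maximum clock frequency once t_r is added to the period 1/f_max."""
    return 1.0 / (1.0 / f_max + t_r)

# Values taken from the worked example in the text.
T_SW = 0.547e-9   # seconds
W = 8.08e-15      # seconds
F_MAX = 90.9e6    # Hz, used for both f_c and f_d in the example
MTBF = 315e6      # seconds, roughly 10 years

t_r1 = resolve_time(MTBF, F_MAX, F_MAX, W, T_SW, stages=1)
t_r2 = resolve_time(MTBF, F_MAX, F_MAX, W, T_SW, stages=2)
f_c1 = max_clock(F_MAX, t_r1)
f_c2 = max_clock(F_MAX, t_r2)

print(f"one stage:  t_r = {t_r1 * 1e9:.2f} ns, f_c = {f_c1 / 1e6:.1f} MHz")
print(f"two stages: t_r = {t_r2 * 1e9:.2f} ns, f_c = {f_c2 / 1e6:.1f} MHz")
```

Changing MTBF or the device constants shows how strongly the required resolve time depends on tsw, since tr scales linearly with it.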

References
1. Lubkin, S., (Electronic Computer Corp.),
"Asynchronous Signals in Digital Computers," Mathematical Tables and Other Aids to Computation, Vol. 6, No.
40, Oct 1952, pp. 238 - 241.
2. Nootbaar, Keith, (Applied Microcircuits Corp.),
"Design, Testing, and Application of a Metastable-Hardened Flip-Flop," WESCON 87 (San Francisco, CA, Nov.
17 - 19, 1987), Electronic Conventions Management, Los
Angeles, CA 90045.
3. Stoll, Peter A., "How to Avoid Synchronization
Problems," VLSI Design, November/December 1982, pp.
56 - 59.


Appendix. Metastability Graphs of Cypress Devices

[Graphs omitted. Each graph plots MTBF in seconds (logarithmic scale) against tr (ns) = 1/fc - 1/fmax. The appendix to "Are Your PLDs Metastable?" contains one graph for each of the following devices:]
CYPRESS PALC16R8-25
CYPRESS PLDC18G8-12
CYPRESS PALC20G10-20
CYPRESS PALC20RA10-15
CYPRESS PALC22V10-20
CYPRESS PALC22V10B-15
CYPRESS PALC22V10C-10
CYPRESS CY7C330-66
CYPRESS CY7C331-20
CYPRESS CY7C332-15
CYPRESS CY7C344-20


PLD-Based Data Path for SCSI-2
This application note begins by describing the
major differences between the original SCSI standard
and the new SCSI-2 document, with special emphasis
on SCSI-2's high-speed signal timing. This information
is then put to use in a PLD-based, high-speed data-path
design for a SCSI-2 host bus adapter.

Connectors/Cables
SCSI-2 documents a 50-mil-pitch connector system.
This connector family allows fully shielded assemblies
for the 50-wire A cable and optional 68-wire B cable.
Many SCSI manufacturers use this micro-D-type connector in volume. You can use the cable/connector
scheme in a mix-and-match system with SCSI-1 connector/cable types through the use of adapter cables that
have different connector types on each end.
One of the de facto (non-ANSI-standard) SCSI
cable schemes, the 25-pin D-sub connector made
popular by the Apple Macintosh, does not support
SCSI's differential signal implementation. This cable
system achieves its low pin count by removing a large
number of the ground signals specified for single-ended
operation. Because the single-ended transmission
scheme is not recommended for SCSI-2's fast
synchronous information transfer mode, users of this
connector/cable system limit the data rates, cable
lengths, and noise margins at which they can operate.

Small Computer System Interface
The SCSI-2 standards document is based on the
original SCSI-1 standard (ANSI X3.131-1986)
developed by the X3T9.2 Accredited Standards Technical Subcommittee. The SCSI-2 specification, generated
by this same subcommittee, offers substantial improvements over the existing SCSI-1 standard in documentation, function, performance, interoperability, and command-set standardization.
With the new SCSI-2 ANSI standard, companies
that use SCSI for their peripheral I/O now face difficult
decisions: Which of the new capabilities offered by
SCSI-2 should they support?
The changes in the SCSI-2 document affect both
hardware and software. Although it is possible to implement the changes affecting software drivers over time,
as these new features appear in peripherals delivered to
the marketplace, companies must decide now which
hardware features a host bus adapter (HBA) should
support. After deliveries to customers, hardware changes made as field upgrades or retrofits always bear high
costs and often present a negative picture to the customer.
The physical differences between the original SCSI
and the new standard fall into four main categories:
SCSI-1 options that are now requirements, new connector/cable options, faster transfer rates, and wider data
buses.

SCSI-1 Options
To be considered SCSI-2 compliant, an HBA must
support both the parity and arbitration options of SCSI-1. SCSI-2-compliant HBAs should be software configurable by SCSI device address to allow use of older
SCSI-1 peripherals that do not have both capabilities.

Transfer Rates
SCSI supports two types of information transfer:
asynchronous (interlocked) and synchronous (data
streaming/offset interlock).
In asynchronous transfers, a four-way handshake
occurs between the SCSI peripheral (target) and the
HBA (initiator) for each piece of information transferred on the SCSI bus. The SCSI bus's REQ (request)
and ACK (acknowledge) control signals are used in this
handshake operation, with the SCSI I/O signal determining the direction of information flow. This
asynchronous transfer mode is the default mode for all
SCSI devices and is required for all MESSAGE, COMMAND, and STATUS transfers. On SCSI systems implemented with very short cables and fast turn-around
times in both the target and the initiator, theoretical
burst-transfer rates can exceed 10 Mtransfers/s. None of
the commercial LSI SCSI controller chips available at
this time support this high rate for asynchronous transfers. Most of these controllers handle asynchronous
transfers at 50 Ktransfers/s to 3 Mtransfers/s.
SCSI-2 implements the synchronous transfer mode
to remove device turn-around time and cable and
transceiver delays as factors affecting transfer rates. Unlike asynchronous transfers, which are limited by the
interface's four-way path delay, synchronous transfers
are limited by interface skew-the difference in transmission delays among signals on the interface.
SCSI-2 allows use of the synchronous method only
for data transfers and only after enabling it with a SCSI
MESSAGE negotiation between the initiator and target.
Synchronous transfers exist in SCSI-1, but few commercial LSI SCSI controllers or peripherals implement this
capability. The SCSI-1 implementation defines
synchronous transfers for data transfer periods of 200
ns and slower. This specification limits the synchronous
data rate to 5 Mtransfers/s.
With tighter-tolerance parts and low-pair-to-pairskew cables now available, SCSI-2 defines an additional
form of synchronous data transfer with a 100-ns minimum period. This change pushes the SCSI-2 maximum
data rate to 10 Mtransfers/s. Because of the tighter
timing defined for the fast synchronous transfer mode,
the SCSI-2 document does not recommend this mode's
use with single-ended transceivers, even for short cable
lengths.

The vast majority of the SCSI-2 changes are not
really changes at all, just better definitions of items
documented in the existing SCSI-1 standard. The arbitration and parity capabilities carry over unchanged
from the SCSI-1 standard. The connectors and cables
are now well defined, with multiple component sources.
The wide bus options require only a replication of existing data-path hardware, but the data-path hardware itself has undergone a significant change.
The new fast synchronous data-transfer mode requires much tighter timing control than was necessary
with SCSI-1. If you plan on using the fast synchronous
transfer capability, you must contend with differential
transceivers, low-skew cables, three data-transfer modes
(asynchronous, synchronous, and fast synchronous), and
short set-up and hold times.
With all these challenges, it might seem doubtful
whether anyone will use the fast synchronous transfer
mode. However, a system analysis shows that implementing fast synchronous mode will cost less than
any of the wide-bus implementations and still yield a
burst data rate as high as 10 Mbytes/s with the standard
50-pin cables. This data rate is twice the maximum offered in SCSI-1 and equal to that offered by the competing Intelligent Peripheral Interface (IPI) in its 2-byte-wide standard implementation. The wide-bus requirement of a second cable also causes problems in
weight, cost, and space. Many of the newer 3.5-in.
peripherals just do not have room for an additional 68-pin connector.

Wide Data Bus
The last hardware addition allows use of wider
SCSI data buses. In SCSI-1 the interface's data-bus portion was only eight bits wide. SCSI-2 allows two additional bus widths of 16 and 32 bits. Because of these
different bus widths, SCSI-2 information transfer rates
are usually specified in transfers/second rather than
bytes/second. You determine the bytes/second rate by
multiplying the SCSI data-bus width in bytes by the
number of transfers per second on the interface.
The wide SCSI bus is currently defined as a secondary 68-signal B cable that can contain an additional
three bytes of bus width. Because this B cable contains
only the SCSI control signals necessary for information
transfer, you must use it in conjunction with a 50-signal
A cable for proper communications.
Use of the wide SCSI option at the maximum 32-bit
data-bus width, along with the fast synchronous transfer
mode, provides data transfer operations as high as 40
Mbytes/s.
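The bytes/second rule described above can be illustrated with a few lines of Python. This is only a sketch of the multiplication; the function name is hypothetical, and the 10-Mtransfers/s figure is the fast synchronous maximum quoted in the text.

```python
def bytes_per_second(bus_width_bits, transfers_per_second):
    """Byte rate = bus width in bytes times transfers per second."""
    return (bus_width_bits // 8) * transfers_per_second

FAST_SYNC_RATE = 10_000_000  # transfers/s, fast synchronous maximum from the text

for width in (8, 16, 32):
    rate = bytes_per_second(width, FAST_SYNC_RATE)
    print(f"{width}-bit bus: {rate / 1e6:.0f} Mbytes/s")
```

With a 32-bit bus this gives the 40 Mbytes/s quoted above, and with the standard 8-bit bus it gives 10 Mbytes/s.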

SCSI Transfer Timing
Of the 23 different interface timing values specified
in the SCSI-2 document, 11 apply directly to the different forms of information transfer. These values are:
Cable skew delay                                  10 ns
Deskew delay                                      45 ns
Synchronous REQ/ACK assertion period              90 ns
Synchronous data hold time                        45 ns
Synchronous REQ/ACK negation period               90 ns
Synchronous/fast synchronous transfer period      Selectable
Fast synchronous REQ/ACK assertion period         30 ns
Fast synchronous cable skew delay                 5 ns
Fast synchronous deskew delay                     20 ns
Fast synchronous data hold time                   10 ns
Fast synchronous REQ/ACK negation period          30 ns
Of these 11 timing values, only the cable skew delay
and the deskew delay apply to the asynchronous mode
of information transfer. The remaining values apply to
the two modes of synchronous data transfers.
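The timing values listed above can be compared mode by mode with a small Python sketch. The dictionary keys are shorthand chosen for this example, not names from the SCSI-2 document. The sketch also adds the cable skew delay and deskew delay for each mode; treating that sum as the data set-up delay is an assumption made here, although the sums happen to match the set-up figures quoted elsewhere in this note.

```python
# Timing values in nanoseconds, copied from the list above.
SYNC_NS = {
    "cable skew delay": 10,
    "deskew delay": 45,
    "REQ/ACK assertion period": 90,
    "data hold time": 45,
    "REQ/ACK negation period": 90,
}
FAST_NS = {
    "cable skew delay": 5,
    "deskew delay": 20,
    "REQ/ACK assertion period": 30,
    "data hold time": 10,
    "REQ/ACK negation period": 30,
}

# Ratio of each fast synchronous value to its synchronous counterpart.
ratios = {name: FAST_NS[name] / SYNC_NS[name] for name in SYNC_NS}
for name, ratio in ratios.items():
    print(f"{name}: {SYNC_NS[name]} ns -> {FAST_NS[name]} ns ({ratio:.2f}x)")

# Assumption: set-up delay taken as cable skew delay plus deskew delay.
setup_sync = SYNC_NS["cable skew delay"] + SYNC_NS["deskew delay"]
setup_fast = FAST_NS["cable skew delay"] + FAST_NS["deskew delay"]
print(f"set-up delay: {setup_sync} ns (synchronous), {setup_fast} ns (fast synchronous)")
```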
These timing values are all specified for the transmitting end of the SCSI interface. Sufficient margins are
included in these values to allow proper interface
operation under worst-case configurations of transmitters, receivers, and cables. The fast synchronous mode
cuts many of the timing parameters by half or more
from those of the synchronous mode. Because the interface must still operate over the same distance (up to
25 m), usage of fast synchronous mode demands tighter
tolerances for many of the electrical components.

New Problems
SCSI users who require no more performance than
they currently have need not make any changes to accommodate SCSI-2. The SCSI-1 standard's capabilities
exist as a subset of SCSI-2. However, users experiencing
an I/O bottleneck imposed by their current SCSI implementation must implement one or more of the new
SCSI-2 features to get additional performance.

Figure 1. Asynchronous Transfer Timing, Target Transmit (DB[0..7,P], REQ, and ACK shown at the target and at the initiator)

SCSI Transfers
All information transfers on the SCSI bus are controlled by the target device. The initiator cannot send or
receive information until it first has received a valid
REQ signal from the target device.

Asynchronous Mode Transfers
The interface timing for asynchronous transfers is
common to all SCSI devices. Because MESSAGE,
COMMAND, and STATUS transfers require support
for this mode, all SCSI devices must support it. The interface timing for asynchronous operation varies slightly, depending on whether the SCSI initiator or SCSI target is sending information.
When the target sends information, it must first
place the correct data on the SCSI bus, delay a minimum of 55 ns, then assert REQ. The 55-ns delay accounts for all possible data-transmission-time variations
caused by transceivers, bias and termination networks,
cables, and the information present on them. Because
the data has been on the SCSI bus for at least this long
prior to REQ's assertion, the initiator knows that the
data present at its inputs is supposed to be valid when it
receives the asserted REQ signal. Because no set-up
time is guaranteed at the initiator, it should not assert
its ACK signal to respond to the REQ signal until after
delaying long enough to ensure that it (the initiator) can
properly capture the data (Figure 1).
When the initiator sends information, it must first
wait until it receives the REQ signal from the target.
This is necessary because the bus phase, which determines the information to be sent and the direction of
the SCSI bus, does not begin until the REQ signal is
asserted for that phase's first transfer. After receiving
this first REQ, the initiator can place its data on the
SCSI bus, delay a minimum of 55 ns, and respond by
asserting ACK. The SCSI target must delay its negation
of REQ until it has captured the data.
Because the initiator is not supposed to drive the
SCSI bus until a transfer's first REQ occurs, the total
delay for this first transfer is longer than the delay for a
first transfer from the target to the initiator. To get
around this longer delay, many initiators prestage the
data for subsequent transfers. The initiator does this by
driving the data bus with the next byte of information as
soon as the REQ signal from the previous transfer goes
Low (Figure 2).

Synchronous Mode Transfers
The synchronous mode of information transfer is
an option for SCSI-1 and SCSI-2 devices. This mode is
only usable for data transfers and is not valid for MESSAGE, COMMAND, and STATUS transfers.
SCSI target devices with the ability to use
synchronous mode default to asynchronous transfer
mode following either a SCSI reset or power-up sequence. To allow synchronous transfers to occur, the
target device must first be placed into synchronous
mode through a MESSAGE negotiation sequence with
an initiator. This sequence sets both the minimum
synchronous transfer period and a maximum
REQ/ACK offset count.
The synchronous transfer period specifies the minimum period between successive leading edges of any
two consecutive REQ pulses or ACK pulses while
operating with synchronous transfers. If the negotiated
period is less than 200 ns but not less than 100 ns, the
data transfer is specified as operating in the fast
synchronous mode and must meet the interface timing
requirements specified for fast synchronous transfers. If
the negotiated period is 200 ns or longer, the data transfer is specified as operating in the synchronous mode
and must meet the interface timing requirements
specified for synchronous transfers. If the negotiated
period is ever set to zero, the data transfer mode reverts
to asynchronous.
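The mode selection rules above map directly onto a small decision function. The following Python sketch is illustrative only: the function name is hypothetical, and rejecting nonzero periods shorter than 100 ns is an assumption, since the text does not say how such values are handled.

```python
def transfer_mode(negotiated_period_ns):
    """Return the data transfer mode implied by the negotiated period."""
    if negotiated_period_ns == 0:
        return "asynchronous"
    if negotiated_period_ns >= 200:
        return "synchronous"
    if negotiated_period_ns >= 100:
        return "fast synchronous"
    # Assumption: periods between 0 and 100 ns are treated as invalid here.
    raise ValueError(f"unsupported transfer period: {negotiated_period_ns} ns")

for period in (0, 100, 150, 200, 400):
    print(f"{period} ns -> {transfer_mode(period)}")
```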
Unlike asynchronous transfers, .where REQ and
ACK are directly interlocked to each other to control
the transfer's speed, synchronous mode data transfers
impose no direct timing relationship between the
6-42

PLD-Based Data Path For SCSI-2

TIMING AT TARGET
DB[0 ..7.Pl~

REO

J

ACK

______________~I

)()

\\-------~I

\..

\\-_----~~

TIMING AT INITIATOR
DB[ 0 ..7. Pl XXXXXXXX

VALID DATA ON BUS

X

NEXT VALID DATA

REO
ACK
Figure 2. Asynchronous Transfer Timing, Initiator Transmit
Offset
Count

2

J

"

~

6

6

7

specifically identified for synchronous transfers. SCSI
synchronous-mode transfers do not require a 50-percent
duty cycle for REQ or ACK timing. When operated at
or near the maximum transfer rate the required interface timings approach this ratio, but at slower rates the
duty cycle is allowed wide variability.
When the target sends information in synchronous
mode, the target must place it's data on the SCSI bus a
minimum of 55 ns before asserting REQ. The target can
then remove or change the data a minimum of 100 ns
following REQ's assertion. REQ must remain active for
a minimum of 90 ns and, once negated, cannot be reasserted for a minimum of 90 ns. In addition to these requirements, the minimum negotiated period must be
maintained. A data transfer is completed when the target has no more data to send and the REQIACK offset
count has returned to zero. As with the asynchronous
transfer mode, the specified delays guarantee valid data
at the initiator on REQ's leading edge and not before

666

REQ
ACK _ _--'-_ _ _ _ _ _ __

Figure 3. REQIACK Offset Count

target's REQ pulses and the initiator's ACK pulses. Instead, the initiator uses a count relationship, known as
the REQ/ACK offset count, to slow the transfer. Maintained by both the initiator and the target, this count
keeps track of the difference between the number of
REQ and ACK pulses. When the count in the target
device reaches the negotiated maximum value (Figure
3), the target device stops sending REQ pulses until the
initiator brings the count below the maximum by returning an ACK pulse. A proper synchronous transfer requires that an equal number of REQ and ACK pulses
be sent.
The timing relationships of the REQ and ACK pulses and the data passed with them are specified by the
two values used for asynchronous transfers and values

(Figure 4).

When the initiator sends information to the target,
the initiator must wait until it receives the REQ signal
from the target. Once the initiator receives REQ's lead-

Figure 4. Synchronous Transfer Timing, Target Transmit (panels: timing at target and timing at initiator; signals: DB[0..7,P], REQ, ACK; data set-up 55 ns minimum, data hold 100 ns minimum)


Figure 5. Synchronous Transfer Timing, Initiator Transmit (panels: timing at target and timing at initiator; signals: DB[0..7,P], REQ, ACK)
synchronous mode. Additionally, the minimum data set-up
time prior to transmitting a REQ/ACK pulse decreases to
25 ns, and the data hold time after REQ/ACK's leading
edge is only 35 ns. This timing provides data specified
as valid, at the receive end of the SCSI bus, for only 10
ns immediately following REQ/ACK reception. See
Figures 6 and 7 for fast synchronous mode timing
diagrams.
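The minimum timings quoted in this section for the synchronous and fast synchronous modes can be gathered into a small lookup table. The Python sketch below is illustrative only: the names SyncTiming, TIMING, mode_for_period, and pulse_ok are hypothetical, and the numbers are simply the ones stated in this text, not a substitute for the SCSI-2 document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncTiming:
    setup_ns: int    # data set-up before the REQ/ACK leading edge
    hold_ns: int     # data hold after the REQ/ACK leading edge
    assert_ns: int   # minimum REQ/ACK assertion time
    negate_ns: int   # minimum REQ/ACK negation time


# Values as quoted in the text (assumed to apply to both transfer directions).
TIMING = {
    "synchronous": SyncTiming(setup_ns=55, hold_ns=100, assert_ns=90, negate_ns=90),
    "fast": SyncTiming(setup_ns=25, hold_ns=35, assert_ns=30, negate_ns=30),
}


def mode_for_period(period_ns):
    """Pick the timing set for a negotiated REQ/ACK period.

    Fast synchronous timing applies for 100 ns <= period < 200 ns.
    """
    if period_ns < 100:
        raise ValueError("period shorter than the 100-ns fast synchronous minimum")
    return "fast" if period_ns < 200 else "synchronous"


def pulse_ok(mode, assert_ns, negate_ns):
    """Check a proposed REQ/ACK pulse against the minimum assert/negate times."""
    t = TIMING[mode]
    return assert_ns >= t.assert_ns and negate_ns >= t.negate_ns
```

A 30-ns pulse that is legal in fast synchronous mode fails the check in synchronous mode, which is why the receive-side logic described later cannot rely on pulse width alone.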

ing edge, the REQIACK Offset count in the initiator is
no longer zero. So long as the initiator has data available to send and the REQ/ACK offset count is nonzero, the initiator can continue to send data to the
target.
The timing for this transfer (Figure 5) is like that of
the transfer from the target described above.
Synchronous mode provides valid data at the SCSI bus's
receiving end during a 45-ns interval immediately following REQ/ACK reception.

SCSI-2 Data Path Design
Synchronous and asynchronous data transfers, 10-ns timing windows, fixed and variable delays, and
programmable pulse widths are all necessary functions
of a SCSI-2 data path. The simpler techniques used
with SCSI-l's 45-ns data-availability windows are quite
different from those needed to operate with SCSI-2's
10-ns windows. Fortunately, designing a data path that
handles all possible SCSI-2 information transfer modes
is not as difficult as it might appear. By carefully selecting some of the newer PLD and interface parts, you can
implement the design quite efficiently.

Fast Synchronous Mode Transfers
Fast synchronous transfers function the same as
synchronous transfers but with different timing
parameters. These transfers exist only for REQ/ACK
pulse periods shorter than 200 ns and longer than or
equal to 100 ns.
With fast synchronous transfers, the REQ/ACK
minimum assert and negate times decrease to one third
their previous size. Thus, SCSI-2 permits REQ and
ACK pulses as short as 30 ns when operating in fast

Figure 6. Fast Synchronous Transfer Timing, Target Transmit (panels: timing at target and timing at initiator; signals: DB[0..7,P], REQ, ACK; data set-up 25 ns minimum, data hold 35 ns minimum)


Figure 7. Fast Synchronous Transfer Timing, Initiator Transmit (panels: timing at target and timing at initiator; signals: DB[0..7,P], REQ, ACK)
To successfully meet the needs of fast transfer rates
and operability for a wide variety of peripherals, the
SCSI-2 design must be capable of:
• Asynchronous data transfers at up to 5 Mtransfers/s
• Synchronous data transfers at a maximum transfer
  rate of 5 Mtransfers/s, with selectable lower transfer rates for peripherals that cannot operate at the
  maximum synchronous rate
• Fast synchronous data transfers at a maximum
  transfer rate of 10 Mtransfers/s, with selectable
  lower transfer rates between 10 and 5 Mtransfers/s
  for peripherals that cannot operate at the maximum
  fast synchronous rate, yet can operate faster than
  the maximum synchronous rate
• Operation with differential transceivers

trol operations for receive or transmit must perform the
same function: receiving or transmitting information.
Grouping the receive and transmit control functions
into two separate and more generalized functional units
reduces the design's complexity.
The necessary operations of the receive control
function are:
• Clocking information into the receive data register
• Returning and removing the ACK signal at the
  proper time
• Writing the received data into the data buffer
The necessary operations of the transmit control
function are:
• Reading the data from the data buffer and clocking
  the data into the transmit data register
• Returning and removing the ACK signal at the
  proper time
• Timing the necessary data set-up time
• Timing the necessary data hold time
• Timing the necessary ACK assertion time
• Timing the necessary ACK negation time
The data buffer function is another area where
some consolidation can occur. Because the SCSI interface cannot send and receive data at the same time, a
single common buffer is used for both transmit and
receive functions.
With these functions combined, the design now
comprises seven functions:
• SCSI interface transceivers
• Receive data register
• Transmit data register
• Data buffer
• REQ/ACK offset counter
• Receive control
• Transmit control

Design Partitioning
Correct partitioning is probably the most critical
part of achieving an efficient implementation of any
SCSI design. When partitioning the design, list the
necessary functions and, where possible, combine multiple functions into a single, more global function. A
SCSI-2 data path must include these functions:
• SCSI interface transceivers
• Receive data register
• Transmit data register
• Receive data buffer
• Transmit data buffer
• REQ/ACK offset counter
• Asynchronous receive control
• Asynchronous transmit control
• Synchronous receive control
• Synchronous transmit control
• Fast synchronous receive control
• Fast synchronous transmit control
Although the transmit and receive control functions
must operate with different timing values, the
asynchronous, synchronous, and fast synchronous con-

SCSI Interface Transceivers

The SCSI interface supports both single-ended and
differential transceiver types. The single-ended variety is
most common today because it is relatively inexpensive
and most commercial LSI SCSI controller chips incorporate this type. Single-ended transceivers suit cable
lengths of less than 6 m and synchronous data rates of
5 Mtransfers/s or less.
SCSI devices using fast synchronous mode require
differential transceivers. This transceiver type meets the
electrical specifications of the EIA RS-485 standard.
Operating from a single +5V supply, these transceivers
can handle large swings in common mode noise, are
guaranteed glitch free during power-up and -down
operations, and have short-circuit and thermal-shutdown protection. SCSI applications that use cables
longer than 6 m also require differential transceivers. Although currently limited in the SCSI standard to operation at no more than 25 m, this transceiver type can
drive signals much farther, as shown by the Intelligent
Peripheral Interface usage of the same parts at 65 m.
Differential transceivers have one other advantage
that is often overlooked. Because two differential signals determine the output state of each receiver, it is
possible to achieve either active High or active Low
TTL inputs and outputs by reversing the connection of
the + and - differential signal lines on the SCSI bus.
This programmable inversion can often eliminate the
need for an inverter, and its associated delay, from
many of the differential signal paths.
All existing SCSI applications that use differential
transceivers place these parts external to the LSI SCSI
controller chips. This practice is due primarily to the
transceivers' power dissipation and partially analog
operation. Until recently you could only get differential
transceivers in singles: one transmitter and receiver in
an 8-pin part. This packaging required 18 parts to implement the transceivers for a SCSI-1 bus.
Due to the growing usage of these parts and improvements in power control technology, manufacturers
now offer triple and quad transceiver parts. Some of
these parts are designed specifically for the SCSI environment. To allow for the selection and arbitration sequences, for example, the transceivers have separate
transmitter enables that allow individual transmitters to
be turned on within the part. These transceivers meet
all signal and skew requirements of the SCSI-2 fast
synchronous mode.
Receive Data Register
The information from the transceivers is used for
arbitration, selection, and reselection sequences, as well
as information transfers. Of the transfer sequences, the
fast synchronous transfer mode has the most stringent
timing concerns.
Because of the fast synchronous mode's 10-ns data-availability window, the receive data register must have
a very short set-up and hold time. The 74F823, a 9-bit
D-type register, fits this application nicely. With a maximum set-up-and-hold-time total of 5.5 ns, the register
leaves room for a 4.5-ns skew in clock timing for proper

operation. Because of this timing, the clock path to the
receive data register can afford only a single gate delay.
To meet the defined 10-ns data window and work with
the 74F823, the single gate must have a minimum
propagation delay of 3 ns and a maximum delay of 7.5
ns for the Low to High output transition. Depending on
the gating function needed, parts such as the 74F08,
74F11, or 74F32 meet the timing window.
Transmit Data Register
The same part type, 74F823, also works on the
transmit side of the interface. Because both the transmit
and receive data registers are as wide as the full SCSI
data bus, they implement a nearly seamless design.
Data Buffer
You can implement the data buffer for a SCSI interface in many ways. Host bus adapters that support
data-caching functions might require a large piece of
memory. Because the data cache usually exists several
logic levels away from the physical SCSI interface, the
HBA needs a smaller piece of memory to act as a "rubber band" between the SCSI target and the host or
HBA memory. Using such a front-end buffer allows
data to move quickly on the SCSI physical interface.
Because the SCSI interface is asynchronous to most
of the logic activity in any HBA, the cleanest form of
this front-end data buffer has an asynchronous interface, which permits the buffer to accept data as the data
becomes available. Memories of this type fall into two
categories: dual-port RAMs and FIFOs. The latter is an
excellent fit because the information transferred over
the SCSI interface is order dependent and does not
contain memory-address information. The FIFO
eliminates any need for address-sequencing logic for
moving information in and out of the data buffer.
The data buffer must also be bidirectional to allow
the HBA to send and receive information. You can create a bidirectional FIFO using unidirectional FIFO
memories with external bus-steering and control logic.
Unfortunately, a bidirectional FIFO built in this manner
requires many extra parts, power, and board space. A
much better choice is to use a monolithic bidirectional
FIFO.
Although most available bidirectional FIFOs are
register programmable and require a processor connection to control their operation, the Cypress CY7C439
bidirectional FIFO does not. This 2K x 9-bit FIFO supports the full 9-bit SCSI data bus, in addition to the pin
programmability necessary for simple state machine
control.
REQ/ACK Offset Counter
The HBA uses the REQ/ACK offset counter (Figure 8) for synchronous and fast synchronous transfers.
The counter keeps track of how many unanswered REQ
pulses the HBA has received and must respond to.
Both transmit and receive operations employ this logic.

lated from the information in the SCSI-2 document.
You can approximate the remaining values to arrive at a
number accurate to within a power of 2 (1 counter bit).
The cables specified for the SCSI interface use a
solid dielectric whose Vp ranges from 60 to 66 percent of the speed of light.
Additionally, the use of twisted-pair cables is strongly
recommended to reduce crosstalk. When wires are
twisted together to form a cable, longer wires are
needed to reach a specific physical cable length.
Depending on the amount of twist in the pairs, the
longer wires can lengthen the physical signal from 2 to
30 percent. The cables specified for fast synchronous
transmission have a very tight pair-to-pair signal skew
specification that is partially achieved by having a very

Just how big a counter is needed? Although it
would be easy to pick an arbitrary number, you can calculate the size of the counter needed to keep the SCSI
interface operating at its peak rate. This task requires a
counter of N bits, where R outstanding REQ and ACK
pulses can be active, such that R = 2^N - 1. This same R
value applies to the target device as the maximum
REQ/ACK offset count.
The value of R depends on the SCSI cable's length,
the velocity of the cable signals' propagation (Vp), the
fastest synchronous period to be used, the turnaround
time of a REQ pulse to an ACK pulse in the initiator,
and the recognition time for an ACK pulse in the target. Many of these values are specified or can be calcu-

Figure 8. REQ/ACK Offset Counter (schematic built around a PAL22V10C)

2. Each generated ACK pulse generates a single
count down.
3. The counter does not change if REQ and ACK
are recognized simultaneously.
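The three counting rules can be expressed as a small behavioral model. This Python sketch is not the PAL22V10C equation list from Appendix A; the class name OffsetCounter and the saturation at zero and at the maximum count are assumptions made for illustration. The req and ack arguments stand for the already filtered, one-count-per-pulse events seen on a given clock.

```python
class OffsetCounter:
    """Behavioral model of an N-bit REQ/ACK offset counter (illustrative)."""

    def __init__(self, bits=3):
        self.max_count = (1 << bits) - 1  # R = 2^N - 1
        self.count = 0

    def clock(self, req, ack):
        """Apply one clock with the given REQ and ACK events; return the count."""
        if req and not ack:
            # Rule 1: a received REQ pulse counts up (assumed to saturate).
            if self.count < self.max_count:
                self.count += 1
        elif ack and not req:
            # Rule 2: a generated ACK pulse counts down (assumed to stop at 0).
            if self.count > 0:
                self.count -= 1
        # Rule 3: simultaneous REQ and ACK leave the count unchanged.
        return self.count
```

A nonzero count tells the initiator that unanswered REQ pulses remain; a count equal to max_count is the condition under which the target stops sending REQ.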
Although the simplest approach would be to run
the REQ signal from the receiver straight into the
metastable-prevent circuit, this could cause problems in
some systems. Because the REQ signal is allowed to be
as narrow as 30 ns at the cable's transmitting end, this
pulse might shrink under some conditions such that the
received pulse is less than the 20-ns sample period (plus
set-up and hold time). This situation could occur under
worst-case conditions of intersymbol interference, cable
imbalance, and bias distortion, causing the
REQ/ACK offset counter to miss the REQ pulse and
create a transmission error.
To make sure the counter does not miss the REQ
pulse, you need to add a D flip-flop, configured as an
edge detector, just before the metastable-prevent circuit. This flip-flop forces the received REQ signal to
remain at the counter input until it is recognized.
Although you can build the REQ/ACK counter
with a small handful of MSI/SSI parts, a superior approach is to use a single Cypress PAL22V10C PLD.
This one part can include the entire 3-bit up/down
counter, two single-count-per-pulse filters, and both
REQ and ACK metastable-prevent structures. Because
of the PAL22V10C's synchronous operation, the
asynchronous edge-detector function still requires a
single 74F74 flip-flop external to the PAL22V10C
REQ/ACK offset counter. The equation list for this
PLD appears in Appendix A.
Receive Control
Data reception from the SCSI bus is handled the
same for all modes of information transfer. This is possible because the information on the SCSI bus is always
valid at REQ's leading edge for asynchronous,
synchronous, and fast synchronous transfer modes.
Every received REQ pulse can thus clock the receive
data register. Even when the initiator sends data to the
target, and therefore clocks invalid data into the receive
data register, the next REQ pulse overwrites the invalid
data.
It is necessary to delay the received REQ signal's
leading edge by a gate delay that matches the 74F823
Received Data Register's set-up and hold times. The
74F08 fits nicely here with a 3-ns minimum delay on
Low-to-High transitions and a 6.6-ns maximum delay.
This delay still gives a 900-ps margin for fast
synchronous transfers, judging from worst-case commercial specifications.
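The clock-path budget for the receive data register reduces to a few subtractions. The following Python sketch uses only figures quoted in the text (the 10-ns data window, the 74F823's 5.5-ns set-up-plus-hold total, the 3-ns and 7.5-ns gate-delay limits, and the 74F08's 6.6-ns maximum delay); the constant and function names are hypothetical.

```python
WINDOW_NS = 10.0             # fast synchronous data-valid window at the receiver
SETUP_PLUS_HOLD_NS = 5.5     # 74F823 maximum set-up plus hold total
GATE_MIN_NS = 3.0            # minimum allowed clock-gate delay (Low to High)
GATE_MAX_ALLOWED_NS = 7.5    # maximum allowed clock-gate delay (Low to High)


def skew_budget_ns():
    """Clock skew left over once set-up and hold are taken from the window."""
    return WINDOW_NS - SETUP_PLUS_HOLD_NS


def margin_ns(gate_max_ns):
    """Margin between a gate's worst-case delay and the allowed maximum."""
    return GATE_MAX_ALLOWED_NS - gate_max_ns


# The allowed gate-delay spread equals the skew budget (4.5 ns).
assert GATE_MAX_ALLOWED_NS - GATE_MIN_NS == skew_budget_ns()

# 74F08 worst case of 6.6 ns leaves roughly 0.9 ns (900 ps).
f08_margin = margin_ns(6.6)
```

Any additional delay mismatch between the REQ clock path and the data paths comes directly out of that 900-ps figure.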
Because timing is so tight when doing fast
synchronous transfers, take care to avoid destroying any
designed-in margins with poor circuit layout. The standard FR4 substrates used for most circuit boards exhibit
a dielectric constant of about 5. With this high number,
circuit trace delay exceeds 2 ns/foot. To prevent infor-

loose twist in the signal pairs. In these cables, each
line's internal physical signal length is approximately 2
to 10 percent longer than the external physical length.
With a maximum external cable length of 25 m, the
calculated one-way maximum signal delay through the
cable is
t = (25m + 2.5m) * 5.56 ns/m
t = 153 ns
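The one-way delay can be reproduced from the propagation velocity. In this Python sketch the function names are hypothetical, and it is an inference (not stated in the text) that the 5.56 ns/m figure corresponds to the slow end of the Vp range, 60 percent of the speed of light, combined with the 10 percent worst-case lengthening quoted for these cables.

```python
C_M_PER_NS = 0.299792458  # speed of light in metres per nanosecond


def delay_per_metre_ns(vp_fraction):
    """Signal delay per metre for a given velocity of propagation (0..1)."""
    return 1.0 / (vp_fraction * C_M_PER_NS)


def one_way_cable_delay_ns(external_len_m, twist_overhead, vp_fraction):
    """One-way delay, allowing for the extra wire length caused by twisting."""
    internal_len_m = external_len_m * (1.0 + twist_overhead)
    return internal_len_m * delay_per_metre_ns(vp_fraction)


# 25 m external length, 10 percent lengthening, Vp = 60 percent (assumed).
t_one_way = one_way_cable_delay_ns(25.0, 0.10, 0.60)
t_round_trip = 2 * round(t_one_way)
```

Rounded to the nearest nanosecond, this gives the 153-ns one-way figure used in the text, and doubling it gives the cable contribution to the loop delay.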
Because the SCSI target does not know that an
ACK has occurred until the ACK propagates to the
target's end of the cable, this one-way delay must be
doubled to allow for the return path time.
In addition to cable delay, the transceivers themselves contribute a major portion of the total loop delay.
The data sheet for a DS36954 quad differential
transceiver lists a maximum delay value of about 20 ns
for each transmitter and receiver that the REQ and
ACK signals pass through. This delay adds 80 ns to the
loop delay.
The next delays to consider are the turnaround and
recognition times in the initiator and target. These
delays must be approximated by examining the operations that must occur. Because both the REQ and ACK
signals are asynchronous when they are received, they
must go through a metastable-prevent circuit before
they can be used. The faster forms of TTL-compatible
logic can execute a metastable prevention procedure in
less than 20 ns and still provide a reasonable MTBF.
Following this procedure, a counter must operate on
the signal and generate a status value, which determines
whether the transfer can proceed or must suspend. For
worst-case operations, a miss must be assumed for the
first stage of the metastable-prevent circuit. This assumption yields a maximum REQ/ACK offset counter
delay of 80 ns.
The REQ/ACK send delay is the last piece of the
delay loop. The REQ/ACK send delay assumes the
necessary data set-up time before generation of the
REQ or ACK pulse to send the data. For the fastest
transmission mode, this delay could be as long as 70 ns.
Adding these values yields a loop delay of
306 ns   Cable delay
 80 ns   Transceiver delay
 80 ns   Initiator REQ/ACK offset counter delay
 80 ns   Target REQ/ACK offset counter delay
 70 ns   Data set-up delay
616 ns   Total loop delay
Considering this figure and the 100-ns minimum
period for fast synchronous transmission, achieving continuous data flow demands that there be at most six outstanding REQ pulses at the target. This task requires a
minimum of a 3-bit REQ/ACK offset counter to maintain data streaming for fast synchronous transfers.
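The sizing argument can be checked numerically. The Python sketch below sums the loop-delay components listed above, divides by the 100-ns fast synchronous period, and finds the smallest N for which R = 2^N - 1 covers the result. The names are hypothetical, and the rounding down to six outstanding pulses follows the text's own figure rather than an independent derivation.

```python
LOOP_DELAY_NS = {
    "cable (2 x 153 ns)": 306,
    "transceivers (4 x 20 ns)": 80,
    "initiator offset counter": 80,
    "target offset counter": 80,
    "data set-up": 70,
}

FAST_SYNC_PERIOD_NS = 100


def counter_bits(max_outstanding):
    """Smallest N such that R = 2^N - 1 is at least max_outstanding."""
    n = 1
    while (1 << n) - 1 < max_outstanding:
        n += 1
    return n


total_ns = sum(LOOP_DELAY_NS.values())
outstanding = total_ns // FAST_SYNC_PERIOD_NS  # whole periods within the loop
bits = counter_bits(outstanding)
```

With these inputs a 3-bit counter allows R = 7, one more than the six outstanding pulses the text calls for.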
This counter must operate under the following
rules:
1. Each received REQ pulse generates a single
count up.

mation transfer errors, make sure the REQ signal's
routing length to the receive data register is never more
than 5 in. shorter or longer than any of the data-path
signals.
Once information has been captured in the receive
data register, it must be written into the data buffer.
The I/O signal in this state indicates that the SCSI bus
direction is set for input to the initiator.
With these conditions met and REQ present, a
FIFO write operation must occur. For a correct write to
occur, the CY7C439 FIFO requires a pulse on the
ISTBB pin with a minimum width of 30 ns. With SCSI-1
peripherals, you could build a small asynchronous state
machine to generate a write FIFO pulse of this minimum width; the state machine could utilize the false
state of the REQ signal that occurs after each REQ
pulse. If you use this method, you need some external
logic to terminate the last write to the memory.
To support SCSI-2 peripherals that use fast
synchronous transfers, you need a different method. Because the REQ pulse's transmitted false state for fast
synchronous transfers can be as small as 30 ns, a pulse
of this same width cannot be guaranteed at the receive
end.
You can choose among many methods for generating fixed-width pulses: delay lines, TTL delay elements
(74LS31), strings of gates, counter chains, one shots,
and standard TTL parts feeding R-C circuits. Each circuit type has its inherent problems. One shots are
notorious for not triggering at all or mistriggering,
lumped-constant delay lines have high field failure rates,
and TTL delay elements have a too-wide margin of
variability for a manufacturable design. In this case,
however, a new type of reprogrammable CMOS
synchronous state machine PLD, the Cypress CY7C361,
can easily generate the required pulse.
The CY7C361 is a programmable state machine
that allows multiple concurrent and interacting state
machines to operate in the same part. Based on a Petri
Net or token-passing philosophy, the CY7C361 can contain as many state machines as its state registers, inputs,
and outputs support. This part contains 32 separate
state registers that can operate at internal frequencies
as high as 125 MHz. The CY7C361 also contains an internal clock doubler, which makes it unnecessary to
generate and distribute frequencies upwards of 100
MHz in a TTL environment. Because this part is
designed for interface operations, it also contains
metastable-hardened input structures.
By operating from the same 50-MHz clock used
with the REQ/ACK offset counter (doubled internally
to 100 MHz), a CY7C361-based 4-state machine can
generate a 40-ns pulse to write the information into the
FIFO memory.
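A clock-by-clock model shows how four active states at a 10-ns clock produce the 40-ns strobe. This Python sketch is behavioral only: it does not model the CY7C361's state-register encoding, and the function name and the start input (REQ present with the bus direction set to IN, sampled once per clock) are assumptions for illustration.

```python
CLOCK_NS = 10  # period of the 100-MHz internal clock


def fifo_write_strobe(start):
    """Return the strobe level for each clock, given per-clock start samples.

    State 0 is reset; states 1 to 4 are the active states during which the
    FIFO write strobe is asserted.
    """
    state = 0
    strobe = []
    for s in start:
        if state == 0:
            if s:
                state = 1      # start condition seen: enter state 1
        elif state < 4:
            state += 1         # step through states 2, 3, 4
        else:
            state = 0          # after state 4, return to reset
        strobe.append(state != 0)
    return strobe


# One start event produces four consecutive active clocks (40 ns).
pulse = fifo_write_strobe([False, True, False, False, False, False, False])
pulse_width_ns = sum(pulse) * CLOCK_NS
```

Because the pulse width is fixed by the state count rather than by the width of REQ, a narrowed REQ pulse at the receive end does not shorten the FIFO write.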
The state machine must account for the procedure
used to govern writes to the FIFO. Although FIFO
writes can occur even if the FIFO is half full, as determined by the FIFO status flags, the ACK signal that allows the interface to continue operation is held up until
the host reads enough information from the FIFO to
bring the FIFO state below half full. This governing
procedure is used for asynchronous and synchronous
operations. For synchronous operations, data continues
to be written into the FIFO even after reaching the half-full state. Although ACK pulses are no longer returned
to the target when the FIFO is at or above half full, the
FIFO writes are only suspended when the REQ/ACK
offset counter in the target reaches its maximum and
the target stops sending REQ pulses.
Figure 9 shows the simple state diagram for writing
information into the FIFO. The diagram includes four
active states (1 - 4) and a reset state (0). When in the
reset state, the CY7C361 continuously watches for a
REQ signal to occur while the SCSI bus's I/O signal is
asserted (SCSI bus direction = IN). When this condition occurs, the state machine advances to state 1 and
continues through states 2, 3, 4 and back to reset. The
CY7C361 implements this state machine using three of
the 32 available state registers, labeled here as W0, W1,
and W2. State registers W1 and W2 also serve as FIFO
strobe-delay states for FIFO read operations.
Figure 10 shows, through three FIFO write cycles,
how the CY7C361's state registers change to achieve a
fixed 40-ns delay. The outputs of the three state
registers are logically ORed together in the CY7C361.
Unlike many other register-based state machines, the
CY7C361's internal design allows you to OR together
adjacent but nonoverlapping state-register outputs to
generate a glitch-free output signal.
Next to each state register label in Figure 10 is
either an s, t, or w. These letters represent which of the
three possible CY7C361 state register configurations is
used for that specific state register. An s (start)
specifies that the state register becomes active for exactly one clock cycle each time the required input conditions are met. A t (toggle) specifies that the state
register changes state on each clock cycle while the required input conditions are met. The t-type state
registers allow very efficient construction of counters.
The last type of state register, w (wait for terminate), is
set only by a carry in signal generated in the immediately preceding state register; the w-type state register is
cleared when its required input conditions are met.
Transmit Control
Transmitting information to the SCSI target is by
far the most complex function. The procedure requires
controlled interval timing for reading data from the
FIFO data buffer, placing the data in the transmit data
register and on the SCSI bus, and generating multiple-width ACK pulses.
Because of these operations' controlled timing and
concurrency, the CY7C361 is again called into service.
The earlier application of this part used three of the 32
available state registers. The transmit function uses
many of the part's remaining states to generate the

Figure 9. FIFO Write State Register Timing
necessary delays for asynchronous, synchronous, and
fast synchronous transfers.
For the SCSI transmit cycles to occur at the maximum rate, the HBA must stage or pipeline data so that
the data is immediately available for transmitting. This
operation requires that the HBA handle concurrent
asynchronous events. As one transfer is occurring on
the SCSI bus, the next piece of information must be
read out of the FIFO and be available for the next bus
transfer. These FIFO read functions operate in two very
similar sequences: one for asynchronous SCSI writes
and one for synchronous SCSI writes.
Figure 11 shows the state diagram for FIFO read
operations. This state diagram has a similar reset state
(0) and the same delay states (2, 3, and 4) as the FIFO
write state machine. The two entry states are for
asynchronous (1) and synchronous (7) SCSI write
operations. For asynchronous SCSI writes, the FIFO
read starts when synchronous operations are not
enabled, data is available in the FIFO, the bus direction
is set to out, and a FIFO read is not currently active.
For synchronous SCSI writes, the FIFO read starts
when the REQ/ACK offset counter is nonzero,
synchronous operations are enabled, data is available in
the FIFO, the bus direction is set to out, and a FIFO
read is not currently active.
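The two sets of start conditions differ only in the synchronous enable and the offset-count test. The Python sketch below expresses them as a single decision function; the function name, argument names, and return strings are hypothetical and do not correspond to CY7C361 signal names.

```python
def fifo_read_entry(sync_enabled, fifo_has_data, bus_out, read_active,
                    offset_count):
    """Return which FIFO read sequence may start, or None if neither can.

    "async" corresponds to entry state 1 and "sync" to entry state 7 of the
    FIFO read state diagram described in the text.
    """
    # Conditions common to both entry states.
    if not fifo_has_data or not bus_out or read_active:
        return None
    if not sync_enabled:
        return "async"
    # Synchronous writes additionally need an unanswered REQ outstanding.
    if offset_count != 0:
        return "sync"
    return None
```

In synchronous mode a zero offset count blocks the read, so the initiator never stages data for which the target has not yet issued a REQ.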
The FIFO read operation uses five more state
registers in the CY7C361. The state-register timing
diagram in Figure 12 shows these new states:
• R0 starts the FIFO read for asynchronous SCSI
  writes
• RS0 starts the FIFO read for synchronous SCSI
  writes
• R1 serves as the FIFO strobe signal (ORed with
  state registers W0, W1, and W2) and notes internally that a FIFO read is currently active
• ES ends the FIFO read when the minimum delay
  has passed (delay states 2, 3, and 4) and the transmit data register contains no valid data
• DATA specifies that the transmit data register contains valid data

Figure 10. FIFO Load State Diagram

Figure 11. FIFO Read State Diagram
Figure 12 shows two sequences: R0 starts an
asynchronous FIFO read, and RS0 starts a synchronous
FIFO read. In normal operation, consecutive FIFO read
cycles are of the same type and overlap with data being
available in the transmit data register. Because the
FIFO output does not change (following the minimum
output delay time) until the FIFO strobe is removed,
this strobe's trailing edge is used to directly clock the
data from the FIFO into the transmit data register.
With the FIFO data now in the transmit data
register and driven out onto the SCSI bus, the HBA
must generate specific and precise delays to allow the
ACK signal to be sent at the proper time. From the
time that the data clocks into the transmit data register,
a minimum of 60 ns must be timed for asynchronous
SCSI writes, 40 ns for fast synchronous SCSI writes, and
90 ns for synchronous SCSI writes.
To create these delays and permit programmable
synchronous data rates slower than the maximum allowed, part of the CY7C361 is used to create a loadable
delay counter. This counter operates as a hardware subroutine within the CY7C361, providing all the necessary
delays for ACK timing.
For asynchronous SCSI writes, the state machine
calls the delay routine as soon as information is placed
in the transmit data register. When the timer times out
(returns to zero), the ACK signal is sent. For
synchronous SCSI writes, the state machine calls the
delay routine both to set and remove the ACK signal.
The CY7C361 implements the delay hardware as a
4-bit count-up toggle counter, which provides 15 different synchronous timing periods ranging from 100 to
380 ns. Table 1 lists the values that load into the counter


ACK Low interval when operating in fast synchronous
transfer mode with a 100-ns period. To generate the
ACK delay for asynchronous mode, the SCSI specification for writes requires two more delay states to get 60
ns. This added delay is achieved by setting the delay
counter inputs to 1101.
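The two load values mentioned in the text (1111 for the 40-ns interval and 1101 for the 60-ns interval) are consistent with a simple model: a fixed overhead, one fill state, and one 10-ns clock for each count between the load value and 1111. The Python sketch below encodes that model. It is an extrapolation from those two data points, not a reproduction of Table 1, and the constant and function names are hypothetical; the text states that only 15 of the 16 possible load values are used but does not say here which one is excluded.

```python
CLOCK_NS = 10          # assumed 100-MHz internal clock
OVERHEAD_CLOCKS = 3    # shortest interval the counter can time is 30 ns
FILL_CLOCKS = 1        # fill state stretches the minimum to 40 ns
TERMINAL_COUNT = 0b1111


def ack_delay_ns(load):
    """Estimated delay for a 4-bit count-up delay counter loaded with `load`."""
    if not 0 <= load <= TERMINAL_COUNT:
        raise ValueError("load value must fit in 4 bits")
    count_states = TERMINAL_COUNT - load  # clocks spent counting up to 1111
    return (OVERHEAD_CLOCKS + FILL_CLOCKS + count_states) * CLOCK_NS
```

Under this model each step down in the load value adds 10 ns to the interval; the synchronous periods in Table 1 should be taken from the table itself rather than from this estimate.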
Figure 15 shows the state diagram for asynchronous
SCSI write operations. The first active state (1) is the
fill state, which the state machine enters as soon as the
FIFO read completes and valid data is in the transmit
data register. The delay subroutine call appears as a
single state (2) that loops until the delay is complete.
Once the delay counter times out, the state machine advances to state 3, where ACK is transmitted. The state
machine remains in this state until the REQ signal is
removed. This clears the ACK signal and returns the
state machine to the reset state (0).
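The asynchronous write sequence can be written as a four-state transition function. This Python sketch follows the state numbering in the text (0 reset, 1 fill, 2 delay, 3 ACK); the function names and input names are hypothetical, and the delay subroutine is reduced to a single delay_done flag.

```python
def async_write_step(state, data_valid=False, delay_done=False, req=False):
    """Return the next state of the asynchronous ACK control sequence."""
    if state == 0:
        # Leave reset once the FIFO read has put valid data in the register.
        return 1 if data_valid else 0
    if state == 1:
        return 2                       # fill state lasts one step
    if state == 2:
        return 3 if delay_done else 2  # loop until the delay counter times out
    if state == 3:
        return 3 if req else 0         # hold ACK until REQ is removed
    raise ValueError("unknown state")


def ack_asserted(state):
    """ACK is driven only while the machine is in state 3."""
    return state == 3
```

Stepping the function with data_valid, then delay_done, then a falling REQ walks the machine through 0, 1, 2, 3 and back to 0, with ACK asserted only in state 3.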
The state register timing for this sequence appears
in Figure 16. This timing diagram shows not only the
state registers used for generating the ACK signal, but
all the state registers used in the CY7C361. You can
therefore see the interaction of the FIFO read, delay
counter, and asynchronous ACK control state machines.
Figure 16 shows three transfers. The ES state,
which ends the FIFO read operation, starts the ACK
delay state machine. As soon as this state machine is
started, the next FIFO read is also started. The ACK
cycle is terminated by the ATA state register, which
monitors the REQ and ACK signals. When the ACK
cycle completes, the next FIFO data is clocked into the
transmit data register, and another ACK cycle is
started.
The state diagram for synchronous transfers appears in Figure 17. This sequence starts the same as an
asynchronous transfer, except that the termination of
the first ACK delay starts a second delay to remove the
ACK signal. When this second delay times out, the
ACK state ends. Meanwhile, the ongoing FIFO read
operation has put data into the FIFO. The end of the
ACK state prompts the FIFO read to complete and
start the next ACK cycle. Two fill states, 4 and 5, are

Figure 12. FIFO Read State Register Timing
to provide these periods. The load value for the counter
enters the CY7C361 via four input pins. When the delay
subroutine is called, the signal levels on these four pins
load into four state registers, which in turn load into the
counter.
Figure 13 shows the state transition diagram for the
delay counter. From the reset state (0) the delay
counter enters a load state (L). Because the delay
counter has 15 possible start points, the load state must
have 15 possible exits. When the counter has reached its
maximum value (1111), the counter enters an exit state
(X) to toggle the ACK signal on or off.
This loadable delay counter uses nine more state
registers in the CY7C361. Four of these state registers
(CE0, CE1, CE2, and CE3) serve as counter enable bits
that load the four toggle state registers (CT0, CT1, CT2,
and CT3). The ninth state register is used for the exit
state (CTX).
Two count sequences appear in Figure 14. The first
sequence shows the shortest timing interval, created by
loading 1111 into the counter. The second sequence
shows a longer delay, which results from loading 0101.
Because the delay counter has overhead states, the
shortest interval the counter can time is 30 ns. To get
the widest range of synchronous transfer periods from
the delay counter, a fill state is generated at the start of
each ACK cycle to stretch this minimum interval to 40
ns. The 40-ns interval determines the shortest possible

Figure 13. Loadable Delay Counter State Diagram

Figure 14. Delay Counter State Register Timing

PLD-Based Data Path For SCSI-2

Figure 15. Asynchronous Write State Diagram
Figure 17. Synchronous Write State Diagram

ORed with the ACK state to meet the timing requirements of synchronous transfers.
By carefully selecting the data-enable, set-up, hold,
and ACK duty cycle, you can use the same state
machine for synchronous and fast synchronous transfers. Figure 18 shows the state register timing for three
transfers in fast synchronous mode, with a 100-ns data
period. Compare these transfers with Figure 19, which
shows the state register timing for two synchronous
transfers, with a 200-ns data period. The only difference
between the two types of transfers is the amount of time
spent in the delay counter. Additionally, the FIFO read
portion of the waveforms shows that the synchronous
FIFO read state register, RS0, starts the FIFO read instead of the R0 state register used with asynchronous
SCSI writes.
As configured thus far, the CY7C361-based state
machine generates the FIFO strobe signal for FIFO
read and write operations and the ACK signal for
asynchronous, synchronous, and fast synchronous SCSI
writes. As for SCSI read operations, the HBA generates
the ACK signal for asynchronous reads by returning the
REQ signal as ACK. For synchronous reads, however,
the HBA must use a different mechanism.


The ACK sequence needed for synchronous SCSI
reads has the same timing as the ACK generated for
SCSI writes, except that the initiator places no data on
the SCSI bus. Because the CY7C361 outputs do not
control the enables of the SCSI transceivers, or the
receive and transmit data registers, the same ACK control state sequence used for synchronous SCSI writes
can also serve for synchronous SCSI reads.
The return of an ACK on a SCSI read is based on
the FIFO having room rather than the HBA having data
available. Thus, a new state register must be added to
start the ACK cycle. Additionally, a signal is needed to
decrement the REQ/ACK Offset counter. Although you
might expect to use the output ACK signal for this purpose, it does not occur early enough in the cycle to
count down the REQ/ACK Offset counter before the
next ACK cycle is ready to start.
Figure 20 shows the state register timing for a fast
synchronous SCSI read operation. The DOWN state
Table 1. Synchronous Data Rates

Load      Synchronous    Data          Data Transfer
Value     Period         Rate          Mode
1111      100 ns         10.0 MB/s     Fast Synchronous
1110      120 ns         8.33 MB/s     Fast Synchronous
1101      140 ns         7.14 MB/s     Fast Synchronous
1100      160 ns         6.25 MB/s     Fast Synchronous
1011      180 ns         5.55 MB/s     Fast Synchronous
1010      200 ns         5.00 MB/s     Synchronous
1001      220 ns         4.54 MB/s     Synchronous
1000      240 ns         4.16 MB/s     Synchronous
0111      260 ns         3.84 MB/s     Synchronous
0110      280 ns         3.57 MB/s     Synchronous
0101      300 ns         3.33 MB/s     Synchronous
0100      320 ns         3.12 MB/s     Synchronous
0011      340 ns         2.94 MB/s     Synchronous
0010      360 ns         2.77 MB/s     Synchronous
0001      380 ns         2.63 MB/s     Synchronous

Figure 16. Asynchronous Write State Register Timing
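The rows of Table 1 follow a simple pattern: each step down in load value lengthens the synchronous period by 20 ns, starting from 100 ns at load value 1111. The Python sketch below reproduces the table from that pattern. The linear formula and the function names are assumptions fitted to the table rows rather than expressions given in the design, and the 200-ns boundary between the two modes is read off the table.

```python
def synchronous_period_ns(load_value: int) -> int:
    """Synchronous period in ns for a 4-bit load value (fitted to Table 1)."""
    if not 1 <= load_value <= 15:
        raise ValueError("load value must be in the range 0001..1111")
    return 100 + 20 * (15 - load_value)


def data_rate_mb_per_s(period_ns: int) -> float:
    """One byte per period, expressed in MB/s."""
    return 1000.0 / period_ns


def transfer_mode(period_ns: int) -> str:
    """Periods shorter than 200 ns are listed as fast synchronous."""
    return "Fast Synchronous" if period_ns < 200 else "Synchronous"


if __name__ == "__main__":
    for load in range(15, 0, -1):
        period = synchronous_period_ns(load)
        print(f"{load:04b}  {period:3d} ns  "
              f"{data_rate_mb_per_s(period):5.2f} MB/s  {transfer_mode(period)}")
```

The printed rates are rounded, so a few of them differ in the last digit from the table, which appears to truncate.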


The solution is to construct a small latch external to
the CY7C361. The latch allows the ACK signal to be
generated as soon as possible, but only transmitted on
the SCSI bus after the REQ signal is received. The
latch's output prompts the CY7C361 to terminate the
current ACK when the CY7C361 sees an external ACK
present and REQ not active.
Now another SCSI possibility must be considered.
When the HBA receives information on the SCSI bus in
asynchronous mode, the ACK signal is just a repeated
REQ signal. The repeated REQ must still be justified
by the half-full signal from the FIFO. This extra
qualifier requires use of another latch to handle the following sequence: If an ACK is returned when the FIFO
half-full state is reached, the ACK being sent remains
active until REQ is removed, but another ACK is not
sent until the half-full flag changes and REQ is present.
This same circuit must also give synchronous transfers a bypass path for generating an ACK pulse that is
not tied to REQ. By gating the latch with the SYNC signal, which specifies synchronous operation, you can
disable the latch for synchronous operations and
enable a different path at the same time.
These complex gating functions are again an excellent fit for a PLD. Because the ACK signal is part of
the asynchronous transfers' round-trip path, this application needs a fast part to limit the delays and skew
between data and clock. The best choices are probably
parts running at 10 ns or faster, such as the members of
the PAL18G8, PAL20G10, or PAL22V10C families. Although many of these parts are only available with active-Low outputs, you can correct the signal polarity at
the SCSI transceiver by reversing the differential signal
lines. Figure 21 shows the necessary gating function for
the ACK signal.
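As a rough behavioral illustration of the asynchronous-read latch and the synchronous bypass described above, consider the following Python sketch. It is not the gating logic of the PLD itself: the function name, the argument names, and the polarity of every signal (all treated as active-High booleans) are assumptions made for the example.

```python
def next_ack(ack: bool, req: bool, half_full: bool,
             sync: bool, sync_ack_pulse: bool) -> bool:
    """Return the next value of the ACK driven onto the SCSI bus.

    Assumed behavior:
    - Synchronous mode bypasses the latch and passes the ACK pulse through.
    - Asynchronous mode repeats REQ as ACK, but a new ACK starts only
      while the FIFO is not half full.
    - An ACK already in progress stays active until REQ is removed.
    """
    if sync:
        return sync_ack_pulse
    if ack:
        return req
    return req and not half_full


if __name__ == "__main__":
    ack = False
    # (req, half_full) samples for an asynchronous read
    for req, hf in [(True, False), (True, True), (False, True),
                    (True, True), (True, False)]:
        ack = next_ack(ack, req, hf, sync=False, sync_ack_pulse=False)
        print(f"REQ={req!s:5} HF={hf!s:5} -> ACK={ack}")
```

The sample sequence shows an ACK that stays active after the half-full flag rises, drops with REQ, and is then withheld until the flag clears.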

Figure 20. External ACK Gating/Latch

Figure 21. Fast Synchronous Read State Register
Timing

Putting It All Together
All the necessary SCSI-2 data-path functions are
now accounted for. Interconnecting these pieces as
shown in the overall schematic (Figures 22 and 23) completes the data-path design. The first sheet of this
schematic (Figure 22) details the physical SCSI interface
connection and interface transceivers. The second sheet
(Figure 23) contains the data-path logic functions. Although these two pages form a very compact and fast
data path, additional functions are needed to complete
the host bus adapter. For example, you need control
circuitry to operate the transceiver enables; read and
write the FIFO on the host bus side; monitor the SCSI
bus and change the FIFO direction when necessary;
control the selection/reselection sequences; and similar
operations.
This data-path design meets its performance goals
with a minimal amount of circuitry. Because much of it
is implemented in PLD-type devices, you do not have to
redesign the HBA to handle almost any change to
SCSI-2 or future SCSI versions that affects interface
timing; instead, you can simply reprogram the existing
parts. This PLD flexibility provides the faster time to
market necessary to remain competitive in today's
markets.

Figure 22. SCSI-2 Host Bus Adapter Schematic, Sheet 1 of 2: Physical Interface

Figure 23. SCSI-2 Host Bus Adapter Schematic, Sheet 2 of 2: Data Path and Control

Appendix A. PLD Toolkit Source Code for the REQ/ACK Offset Counter
C22V10;
{SCSI2DIF.CYP}

{***********************************************************************
*
* difference counter - keeps track of how many REQUEST pulses
* have been received vs. how many ACKNOWLEDGE pulses have been
* sent. The single output DIFF, is used only during synchronous
* data transfers. When DIFF = 1 there exists a received REQUEST
* pulse that has not been responded to by an ACKNOWLEDGE pulse.
*
* The circuit contains two metastable prevent circuits to
* capture the REQUEST and ACKNOWLEDGE signals. These signals
* are filtered to be enables to a 3 bit up down counter. These
* signals can occur at the same time. If they do the counter
* should not count. Only one count cycle is allowed per enable.
*
************************************************************************}
CONFIGURE;
CLOCK(node=1),               {50 MHz system clock (20ns period)}
REQ(node=2),                 {SCSI Request signal, used for count up}
SELECTED(node=3),            {used to reset the registers and counter}
!CT_DOWN(node=4),            {down count pulse from CY7C361}
SYNC(node=5),                {synchronous operation enabled}

{outputs}
DIFF(node=14,noreg,ninv),
Q2(ninv),
Q1(ninv),
Q0(ninv),
DOWN(ninv),
DOWN_INH(ninv),
ACK_IN(ninv),
/UP, UP_INH(ninv),
RE
 REQ & SYNC;

UP_INH = REQ_IN;

UP = REQ_IN & !UP_INH;

ACK_IN = !CT_DOWN;

DOWN_INH = ACK_IN;

DOWN = ACK_IN & !DOWN_INH;

{3 bit counter}
Q0 = SYNC * UP & !DOWN & !Q0
   # SYNC * DOWN & !UP & !Q0
   # SYNC * UP & DOWN & Q0
   # SYNC * !UP & !DOWN & Q0;

Q1 = SYNC * UP & !DOWN & !Q1 & Q0
   # SYNC * UP & !DOWN & Q1 & !Q0
   # SYNC * DOWN & !UP & !Q1 & !Q0
   # SYNC * DOWN & !UP & Q1 & Q0
   # SYNC * UP & DOWN & Q1
   # SYNC & !UP & !DOWN & Q1;

Q2 = SYNC * UP & !DOWN & !Q2 & Q1 & Q0
   # SYNC * UP & !DOWN & Q2 & !Q1
   # SYNC * UP & !DOWN & Q2 & !Q0
   # SYNC * DOWN & !UP & !Q2 & !Q1 & !Q0
   # SYNC * DOWN & !UP & Q2 & Q1
   # SYNC * DOWN & !UP & Q2 & Q0
   # SYNC * UP & DOWN & Q2
   # SYNC & !UP & !DOWN & Q2;

DIFF = Q2
     # Q1
     # Q0;
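One way to check the counter equations above is to evaluate them bit by bit and compare the result with ordinary modulo-8 arithmetic. The Python sketch below does that. It treats Q2, Q1, Q0 as the registered state and UP, DOWN, SYNC as single-bit inputs; the function names are hypothetical, and the model covers only the three counter bits and DIFF, not the input synchronizers.

```python
def next_count(q: int, up: int, down: int, sync: int = 1) -> int:
    """Next 3-bit counter state, transcribed from the Q0/Q1/Q2 equations."""
    q2, q1, q0 = (q >> 2) & 1, (q >> 1) & 1, q & 1
    n2, n1, n0 = 1 - q2, 1 - q1, 1 - q0
    nu, nd = 1 - up, 1 - down

    d0 = sync & ((up & nd & n0) | (down & nu & n0)
                 | (up & down & q0) | (nu & nd & q0))
    d1 = sync & ((up & nd & n1 & q0) | (up & nd & q1 & n0)
                 | (down & nu & n1 & n0) | (down & nu & q1 & q0)
                 | (up & down & q1) | (nu & nd & q1))
    d2 = sync & ((up & nd & n2 & q1 & q0) | (up & nd & q2 & n1)
                 | (up & nd & q2 & n0) | (down & nu & n2 & n1 & n0)
                 | (down & nu & q2 & q1) | (down & nu & q2 & q0)
                 | (up & down & q2) | (nu & nd & q2))
    return (d2 << 2) | (d1 << 1) | d0


def diff(q: int) -> int:
    """DIFF is the OR of the three counter bits."""
    return int(q != 0)


if __name__ == "__main__":
    q = 0
    for up, down in [(1, 0), (1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]:
        q = next_count(q, up, down)
        print(f"UP={up} DOWN={down} -> count={q} DIFF={diff(q)}")
```

Evaluated this way, the equations count up on UP alone, count down on DOWN alone, hold when both or neither are active, and clear when SYNC is low.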


Appendix B. PLD ToolKit Source Code for ACK and FIFO Strobe Control
CY7C361;

{****************************************************************
*
* SCSI2 FIFO and ACK timing controller. Supports asynchronous
* writes and synchronous and fast synchronous reads and writes
*
*****************************************************************}
CONFIGURE;
{reset control}
/RESET(node=3,ireg),         {low asserted reset, single reg}
GLBRST(node=64),             {global reset control node}

{clock control}
CLKIN(node=4),               {system clock}
CLKDB(node=74,dbl_clk),      {enable clock doubler}
IENA(node=29),               {input clock enable for nodes 3,5,6,9}
IENB(node=30),               {input clock enable for nodes 10,11,12,13}
IENC(node=31),               {input clock enable for nodes 1,2,14,15}

{inputs}
ZERO(node=73),               {internal tie point for enables}
REQ(node=5,iireg),           {asynchronous SCSI request signal}
ACK_IN(node=6,iireg),        {gated ACK output signal, latched by REQUEST}
IO_IN(node=10,ireg),         {SCSI bus set to 0=out, 1=in}
DIFF(node=9,ireg),           {difference count <> 0}
HF(node=11,iireg),           {room for data in FIFO - write}
EF(node=12,iireg),           {data in fifo - read}
SYNC(node=13,ireg),          {synchronous transfer mode}

{counter inputs}
C0(node=1,ireg),             {LSB (bit 0) of ACK length counter}
C1(node=2,ireg),             {bit 1 of ACK length counter}
C2(node=14,ireg),            {bit 2 of ACK length counter}
C3(node=15,ireg),            {MSB (bit 3) of ACK length counter}
                             {all low is an illegal value for C0,C1,C2,C3}
{outputs}
/ACK_OUT(node=16),           {ACKNOWLEDGE signal, used for asynchronous
                              SCSI writes and synchronous SCSI reads/writes}
/CT_DOWN(node=17),           {count down pulse for DIFF counter}
/FIFO_STRB(node=18),         {FIFO strobe for SCSI writes, FIFO reads}

{state nodes}
{FIFO Write State Machine}
W0(node=32,start),           {starts FIFO write sequence}
W1(node=33,tog),             {delay state for FIFO strobe}
W2(node=36,tog),             {delay state for FIFO strobe}
{FIFO Read State Machine}
RS0(node=34,start),          {start of sync FIFO read}
R0(node=37,start),           {start FIFO read strobe}
R1(node=35,tog),             {stays active until transmit register is marked
                              as empty, uses delay states from FIFO write machine}
ES(node=38,start),           {ends FIFO read strobe and sets data
                              in output latch}

DATA(node=39,cin,term),      {data in output latch}
ACKS1(node=43,start),        {start first ACK delay}
ACK(node=47,tog),            {ACK active}
ATA(node=42,start),          {ACK Terminate, Async}
ACKS2(node=47,start),        {start second ACK delay}
ACKA(node=40,start),         {synchronous ACK stretch 1}
ACKB(node=41,start),         {synchronous ACK stretch 2}
AS(node=44,start),           {start ACK for sync SCSI read}
DOWN(node=45,cin,term),      {count down pulse for SCSI reads}

{4 bit loadable counter}
CE0(node=54,start),          {load counter bit 0}
CT0(node=56,tog),            {counter bit 0}

CE1(node=57,start),          {load counter bit 1}
CT1(node=58,cin,tog),        {counter bit 1}

CE2(node=59,start),          {load counter bit 2}
CT2(node=60,cin,tog),        {counter bit 2}

CE3(node=61,start),          {load counter bit 3}
CT3(node=62,cin,tog),        {counter bit 3}

CTX(node=63,start),          {terminal count reached (1111)}

EQUATIONS;
{CONTROL}
GLBRST = RESET;              {global reset set to RESET signal}
IENA = /ZERO;                {allow input clocks}
IENB = /ZERO;                {allow input clocks}
IENC = /ZERO;                {allow input clocks}

{STATES}
{start}
W0 = IO_IN * REQ;
     {W0 starts all FIFO write sequences when a REQuest is
      received with the bus direction set to IN, used as part
      of the FIFO STBX signal for FIFO writes}

{tog}
W1 = /W0 * /W1 * /W2 * /R0 * /RS0;
     {W1 is triggered by W0 and continues to toggle until W1
      and W2 return to 0, used as part of the FIFO STBX
      signal for FIFO writes}

{tog}
W2 = W1;
     {W2 is triggered by W1 for two clocks,
      used as part of the FIFO STBX signal for FIFO writes}

{start}
RS0 = /IO_IN * SYNC * EF * DIFF * /R1;
     {synchronous FIFO read started when the bus is in the proper
      direction, synchronous mode is active, data is in the FIFO, at
      least one ACK is pending (DIFF) and a read is not in progress}

{start}
R0 = /IO_IN * /SYNC * EF * /R1;
     {asynchronous reads are started when the bus is in the proper
      direction (OUT), synchronous mode is not active, there is data
      in the FIFO (EF) and a read is not in progress (R1)}

{tog}
R1 = /R0 * /RS0 * /ES;
     {set a read in progress with R0 or RS0, end same with ES when
      read is complete and no data is in the output latch, used
      as the FIFO STBX signal for FIFO reads}

{start}
ES = R1 * /W1 * /W2 * /DATA;
     {end read strobe and sets DATA in output latch}

{cin,term}
DATA =
     ACK
     /CTX * /ATA;
     {data in latch set when FIFO read is
      ended and cleared by end of ACK cycle}

{start}
AS = IO_IN * HF * DIFF * /DOWN * /ACK;
     {start new ACK cycle if DIFF<>0 and
      cycle not active with room in FIFO}

{cin,term}
DOWN = CTX;
     {end counter down count}

{start}
ACKS1 = /ES * /AS;
     {start the delay counter for the
      leading edge of the ACK signal}

{tog}
ACK = /CTX * /ATA;
     {turn ACK on and off}

{start}
ATA = ACK_IN * /SYNC * /REQ;
     {ACK Terminate Async is triggered when an external ACK is
      present and REQUEST has dropped, this occurs a minimum of 3
      clocks after ACK is set due to metastable prevent pipeline
      delays. One more cycle occurs to remove ACK and DATA}

{start}
ACKS2 = SYNC * /ACK * CTX;
     {used only in synchronous modes, starts
      delay counter for terminate of ACK}

{start}
ACKA = CTX * ACK;
     {lengthen the ACK signal by two clock periods to allow data
      to change at the trailing edge of output ACK signal}

{start}
ACKB = ACKA;
     {lengthen the ACK signal by 2nd clock}

{start}
CE0 =
     C0
     /ACKS1 * /ACKS2;
     {latch bit 0 of counter for preset}

{start}
CE1 =
     C1
     /ACKS1 * /ACKS2;
     {latch bit 1 of counter for preset}

{start}
CE2 =
     C2
     /ACKS1 * /ACKS2;
     {latch bit 2 of counter for preset}

{start}
CE3 =
     C3
     /ACKS1 * /ACKS2;
     {latch bit 3 of counter for preset}

{tog}
CT0 = /CT0 * /CT1 * /CT2 * /CT3 * /CE0;
     {toggle bit 0 of counter when any bit set}

{cin,tog}
CT1 = CT0;
     {toggle bit 1 of counter when bit 0 is set}

{cin,tog}
CT2 = CT0 * CT1;
     {toggle bit 2 of counter when bits 1 and 0 are set}

{cin,tog}
CT3 = CT0 * CT1 * CT2;
     {toggle bit 3 of counter when bits 0, 1, and 2 are set}

{start}
CTX = CT0 * CT1 * CT2 * CT3;
     {counter has completed count up to 1111}

{OUTPUTS}
/C0          {disable output driver to allow C0 as input}
/C1          {disable output driver to allow C1 as input}
/C2          {disable output driver to allow C2 as input}
/C3          {disable output driver to allow C3 as input}

ACK_OUT = /ACK * /ACKA * /ACKB;
     {ACKNOWLEDGE signal}

FIFO_STRB = /W0 * /W1 * /W2 * /R1;
     {FIFO read/write strobe}

CT_DOWN = /AS * /DOWN * /R1;
     {count down input for difference counter}

PAL Design Example:
A GCR Encoder/Decoder
This application note describes the procedure used to
encode/decode serial digital data for recording/reading
from one-quarter-inch magnetic tape. The design
presented here uses a Cypress CMOS PAL C 16R6 to implement the logic.
Digital data encoding and decoding is often used to
increase the reliability of data transmission and storage.
One such area is the transformation between data stored
on one-quarter-inch magnetic tape and serial digital data.

A Little History
The recording format and the Group Code Recording
(GCR) code used in this design have been adopted and
incorporated in a series of standards. The standards are set
by the QIC (Quarter Inch Cartridge) Committee, composed of manufacturers and users of quarter-inch tapes
and cartridges. The committee's purpose is to ensure compatibility between manufacturers and reliability to end
users.
Quarter-inch tape cartridges are used extensively to
back up or archive data from hard disks. Most drives are
operated in a continuous or streaming mode (for reasons
discussed later). Data is recorded at 10,000 FRPI (flux
reversals per inch) in a serpentine manner on seven to 14
channels. The tape moves at 30 to 90 ips (inches per
second), and the error rates achieved are one in 10^9 or
10^10. A cartridge holds 2000 to 3000 feet of 0.001-inch-thick tape and stores 20 to 80 Mbytes of data.

A Typical System
Figure 1 shows a block diagram of a typical tape
drive system. The interface with the host (or host adapter)
is bidirectional. The interface has a byte-wide data path
and 10 to 20 control signals, depending upon the interface
standard. Data rates are 300 KBytes/s to 1 MByte/s.
The formatter or tape controller performs serial/parallel conversion and encoding/decoding as well as error
checking; in some cases, the data is also error corrected.
Control is usually provided by a state machine, which
handles the handshaking with the host as well as control
of the tape. Data is written in blocks of various lengths
(depending upon the standard), and a read-after-write
check is usually performed. Buffer storage of at least two
blocks of data is usually provided using static RAMs,
FIFOs, or some combination of the two.
The drive electronics include digital signals for controlling and sensing the tape motion and analog signals for
the read and write paths. The interface between the drive
electronics and the formatter is digital and varies depending on the standard used.

Figure 1. Typical Tape Drive System

Figure 2. GCR Signal

Reading and Writing on Tape
To write on the tape, a current of 100 mA or less is
used to change the direction of magnetization. To read
from the tape, a coil of wire (the read head) is held
against the tape; changes in direction of the tape's magnetic flux induce a voltage (10 mV or less) in the coil.

Recording Codes
All codes used for recording on magnetic media
are classified as Franaszek Run Length Limited (RLL)
codes of the form:
(D, K)

where D = the minimum number of Zeros between consecutive Ones, and K = the maximum number of Zeros
between consecutive Ones.
D controls the highest frequency that can be
recorded, and K controls the lowest frequency.
Using the Franaszek notation, the GCR code is (0, 2).
As illustrated in Figure 2, a flux reversal signifies a One,
and the absence of a flux reversal signifies a Zero. This is
true for all codes.

Peak Detection and Data Separation
GCR recording equipment detects peaks instead of
zero crossings because peak-detection circuits are less
sensitive to noise. The output of the peak detector goes to
the most critical analog circuit in the drive: the data
separator.
The data separator provides Ones and Zeros that
occur at a precise frequency. The circuit does this using a
phase locked loop (PLL). First the data separator
synchronizes itself to a crystal-controlled reference clock.
Then the circuit attempts to lock itself to the maximum
data frequency on the tape. This is done by finding the
phase difference between the data separator's own frequency and the peak detector's data output, then adjusting
a voltage controlled oscillator (VCO) until the VCO's frequency equals that of the data.
The reference clock's frequency must be at least
twice (2f) that of the highest frequency to be read (f). The
PLL is synchronized to the 2f reference frequency when
not in use.
Before a block of data is recorded, a string of Ones is
recorded, which is called the preamble. When the command to read is given, the 2f reference frequency is
removed from the data separator, and the signal from the
peak detector applied. The PLL then attempts to lock to
the preamble, a procedure called getting bit sync.
Just after the preamble, a code violation is recorded
so that the formatter can recognize where valid data
begins. The detection of the code violation is referred to
as obtaining byte sync.
PLLs typically exhibit frequency and phase offsets
during preamble acquisition. Phase errors also occur after
lock, during the reading of the data field. Differences in
tape speed during record and playback (as well as from
unit to unit) result in frequency differences between the 2f
reference and the data read from the tape. Random phase
errors caused by noise, intersymbol interference (bit
crowding), timing errors, and other transients might also
get the PLL out of lock.
The data separator's PLL is susceptible to these errors because it must satisfy two conflicting conditions: it
must lock quickly enough to detect the preamble, but it
must not over-correct phase for a single misaligned bit.
Strings of Zeros cause the PLL's phase to shift. If the
shift is larger than the bit window, an error occurs. The
QIC-24 standard calls for up to a 37-percent bit-shift
tolerance, which means that the data separator must be
able to recognize a One (flux reversal) that deviates
±18.5 percent from its expected time position without
causing a data error. To achieve this performance, a 4-bit
binary nibble is encoded into a 5-bit GCR code word,
which is written onto the tape.

The Purposes of GCR Code
The 5-bit GCR code format encodes data such that no
more than two consecutive Zeros occur in the serial data.
This encoding relaxes the performance requirements of
the PLL and loop filter, so that the system can achieve the
desired performance.
GCR encoding also compensates for the speed variation of the tape due to:
• Mechanical tolerances in cartridges and tape thickness (±3 percent)
• Tape elasticity and wear
• Motor speed variation
• Temperature and humidity
These static tolerances can result in a ±10-percent
tape-speed variation.
In addition to the static tolerances, instantaneous
speed variations (ISVs) occur. These result from discontinuous tape release at the unwind spool (10 - 20 percent),
guide/back stick slip (5 percent), and shuffle ISV (vibration) due to start/stop (5 - 30 percent). The shuffle ISV
can be avoided by operating the tape in a continuous
(streaming) mode. If these dynamic tolerances are added
together they can result in a ±15-percent speed variation.
The electronics in the tape controller and the drive
are designed to compensate for the tape-speed variations
due to mechanical tolerances.
The compensation is accomplished by:
• Data encoding and error detection and correction
• PLL design
• Bit-window tolerance

Table 1. GCR Code

Line Number     4-Bit Code       5-Bit Code
(For Ref.)      D3 D2 D1 D0      Y3 Y2 Y1 Y0 S0
     0           0  0  0  0       1  1  0  0  1
     1           0  0  0  1       1  1  0  1  1
     2           0  0  1  0       1  0  0  1  0
     3           0  0  1  1       1  0  0  1  1
     4           0  1  0  0       1  1  1  0  1
     5           0  1  0  1       1  0  1  0  1
     6           0  1  1  0       1  0  1  1  0
     7           0  1  1  1       1  0  1  1  1
     8           1  0  0  0       1  1  0  1  0
     9           1  0  0  1       0  1  0  0  1
    10           1  0  1  0       0  1  0  1  0
    11           1  0  1  1       0  1  0  1  1
    12           1  1  0  0       1  1  1  1  0
    13           1  1  0  1       0  1  1  0  1
    14           1  1  1  0       0  1  1  1  0
    15           1  1  1  1       0  1  1  1  1
                A3 A2 A1 A0      B0 B1 B2 B3 B4

Sequence of operations
During a write operation, the following sequence
occurs:
1. Idle (hold)
2. Convert 4-bit parallel input to 5-bit GCR code and
load into 5-bit register
3. Shift-out 5 bits to write amplifier.
During a read operation, the following sequence
occurs:
1. Idle (same as during write)
2. Shift-in 5 bits
3. Detect sync mark, set/clear invalid flag, convert
5-bit serial input to 4-bit binary value, and load
value into register
Note that the read clock and the write clock are not
the same. Additionally, the logic must keep up with the
tape data rate. Finally, the read and write operations are
mutually exclusive. This means that the storage elements
(D flip-flops) can be time-shared and that read and write
operations require five clocks.
The GCR design requires a total of five states because the idle state is common to both read and write
operations. Therefore, the design requires three control
lines. It is convenient to designate one control line as an
enable line (active Low) and the other two lines as mode-control signals.
This application note does not describe the control of
these lines or the required clock synchronization. This is
because at the next level of control, you must implement
in hardware the responses to error conditions. These
response choices tend to be application dependent as well
as subjective.
The diagrams in Figure 3 show the flow of data
under control of the ENABLE signal and the M0 and M1
mode-control signals. The GCR code used in this design
is part of the QIC-24 Standard and is also the ANSI
X3.54 standard (1976). The MSB (leftmost bit) is
recorded first. Note that there are a maximum of two consecutive Zeros in the 5-bit code recorded on the tape.

Design Procedure
The procedure for designing the GCR circuits is to
map the code conversions using Venn diagrams and write
the logic equations as the sum of products, or in minterm
form. Because the design requires six flip-flops, the logic
is implemented using a CY7C16R6 PAL. Because the
PAL has inverting output buffers, the Zeros are mapped
instead of the Ones. The D flip-flops require an extra term
to hold their states when the ENABLE is HIGH.
For a conventional D flip-flop, for example, the form
of the logic equations is:
D = ENABLE 1 (Q)       ; RECIRCULATE PRESENT STATE
  + ENABLE 2 (F2)      ; FUNCTION 2
  + ENABLE 3 (F3)      ; FUNCTION 3
where the ENABLE controls are mutually exclusive.
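The enable-multiplexed D input above can be illustrated with a small Python model of the five data flip-flops. This is a sketch under stated assumptions, not the PAL equations themselves: the `Mode` names and `next_state` function are hypothetical, the two conversion modes are omitted, and the shift directions follow the descriptions of serial shift in (SIN to SOUT to Y0 to Y1 to Y2 to Y3) and end-around serial shift out given in this note.

```python
from enum import Enum


class Mode(Enum):
    HOLD = "hold"            # recirculate present state
    SHIFT_IN = "shift_in"    # read path: SIN enters at SOUT, moves toward Y3
    SHIFT_OUT = "shift_out"  # write path: opposite direction, end around


def next_state(reg, mode, sin=0):
    """Return the next (Y3, Y2, Y1, Y0, SOUT) tuple for one clock."""
    y3, y2, y1, y0, sout = reg
    hold = mode is Mode.HOLD
    shift_in = mode is Mode.SHIFT_IN
    shift_out = mode is Mode.SHIFT_OUT

    def d(q, f_in, f_out):
        # D = ENABLE1(Q) + ENABLE2(F2) + ENABLE3(F3), enables mutually exclusive
        return int((hold and q) or (shift_in and f_in) or (shift_out and f_out))

    return (
        d(y3, y2, sout),   # Y3 takes Y2 on shift in, SOUT (end around) on shift out
        d(y2, y1, y3),
        d(y1, y0, y2),
        d(y0, sout, y1),
        d(sout, sin, y0),  # SOUT takes SIN on shift in, Y0 on shift out
    )


if __name__ == "__main__":
    reg = (0, 0, 0, 0, 0)
    for bit in (1, 1, 0, 0, 1):          # MSB first
        reg = next_state(reg, Mode.SHIFT_IN, bit)
    print("after shift in :", reg)
    for _ in range(5):
        reg = next_state(reg, Mode.SHIFT_OUT)
    print("after shift out:", reg)
```

Because exactly one enable is active per clock, each flip-flop either keeps its value or takes one neighbor's value, which is what lets the same five registers be time-shared between reading and writing.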

4-bit to 5-bit Conversion for Y3 Output
At the bottom of Table 1, the 5-bit code columns are
labeled B0 through B4 to help show how the 4-bit code is
mapped. In addition, the line numbers are labeled 0
through 15, which correspond to the values of the 4-bit
binary code.
Figure 4a shows how the 4-bit binary code is mapped
on the Venn diagram. For example, reference line zero,
which corresponds to binary value zero, is located in the
lower right-hand corner.
The Venn diagram in Figure 4b shows the conversion
for the Y3 output, which is labeled the B0 input to the D
flip-flop. Note that the parallel nibble (see Figure 3) is
reversed end for end so that the MSB is written first when
the nibble is shifted out.
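For experimenting with the code conversion in software, the 4-bit to 5-bit mapping can be held in a lookup table. The Python sketch below assumes the usual ANSI X3.54 GCR assignment with the leftmost bit as B0; the names `GCR_ENCODE`, `encode`, `decode`, and `longest_zero_run` are hypothetical. It also checks the property noted earlier, that no more than two consecutive Zeros appear in the recorded stream.

```python
# Index = 4-bit value (line number), entry = 5-bit GCR code, MSB (B0) first.
# Assumption: standard ANSI X3.54 GCR assignment.
GCR_ENCODE = [
    0b11001, 0b11011, 0b10010, 0b10011,
    0b11101, 0b10101, 0b10110, 0b10111,
    0b11010, 0b01001, 0b01010, 0b01011,
    0b11110, 0b01101, 0b01110, 0b01111,
]
GCR_DECODE = {code: nibble for nibble, code in enumerate(GCR_ENCODE)}


def encode(nibbles):
    """Encode a sequence of 4-bit values into a string of recorded bits."""
    return "".join(format(GCR_ENCODE[n], "05b") for n in nibbles)


def decode(code):
    """Return the 4-bit value for a 5-bit code, or None for an unused code."""
    return GCR_DECODE.get(code)


def longest_zero_run(bits):
    """Length of the longest run of consecutive Zeros in a bit string."""
    return max(len(run) for run in bits.split("1"))


if __name__ == "__main__":
    stream = encode([0x2, 0x9, 0xF])
    print(stream, "longest zero run:", longest_zero_run(stream))
    unused = [c for c in range(32) if decode(c) is None]
    print("unused 5-bit codes:", len(unused))
```

The unused codes reported by the last line are the combinations that can be treated as Don't Cares, or flagged as invalid, when converting from 5 bits back to 4.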

Figure 3. Data Flow Diagrams

ENABLE  M1  M0  OPERATION
   1     X   X  HOLD
   0     0   0  SERIAL SHIFT IN
   0     1   0  CONVERT 5-BIT TO 4-BIT
   0     0   1  CONVERT 4-BIT TO 5-BIT
   0     1   1  SERIAL SHIFT OUT

Figure 4. 4- to 5-Bit Conversions
(a leading / denotes a complemented, active-Low signal)
(a) Binary Values
(b) Y3 Map: /Y3 = /B0 = D3 D0 + D3 D1
(c) Y2 Map: /Y2 = /B1 = /D3 D1 + /D3 D2 D0
(d) Y1 Map: /Y1 = /B2 = /D2
(e) Y0 Map: /Y0 = /B3 = /D3 /D1 /D0 + D3 /D1 D0 + D2 /D1 D0
(f) S0 Map: /S0 = /B4 = D1 /D0 + D3 /D0
In Figure 4b, the Ones and Zeros in column B0 are
mapped. For example, reference line zero has the value
One in column B0 of Table 1. Therefore, a One is placed
in the square corresponding to binary value zero in Figure 4b. In a similar manner, reference line 15 has a value
of Zero in column B0, so a Zero is placed in the square
corresponding to binary value fifteen.

Writing the Equation
If the output of the 16R6 PAL were positive-true
logic, the equation would include all the Ones on the
Venn Diagram. However, because the PAL output is
negative logic (active Low), the equation includes all the
Zeros. When the PAL inverts the signals, the Zeros are
changed to Ones, so that the final outputs are positive-true
logic. By inspection (a leading / denotes the complemented, active-Low signal):
/B0 = D3 D0 + D3 D1
or,
/Y3 = D3 D0 + D3 D1

5- to 4-bit Conversion for Y Outputs
A 5- to 4-bit conversion for Y outputs requires two
16-square Venn diagrams, because 2^5 = 32 possible binary
values exist. Note in Table 1, however, that the 5-bit code
columns do not use all 32 possible combinations. The unused combinations are Don't Cares, which are represented
by Xs in the Venn diagrams. Don't Cares can be either
Ones or Zeros, which further reduces or simplifies the
logic equations.
The procedure is to plot the Ones and Zeros, put Xs
in the blank squares, and write the equations for the Zeros
(Figure 5).

Serial Shift In
During serial shift in (both mode control signals
Low), the data separator's data output goes to the
formatter's input. The signal is called SIN and is applied
to the SOUT flip-flop's D input. The SOUT flip-flop's output goes to the Y0 flip-flop's D input, whose output goes
to the Y1 flip-flop's input, etc. After five read clocks, the
MSB of the 5-bit GCR coded data is in Y3, and the LSB
is in SOUT.

Serial Shift Out
During a write operation, after the 4-bit data is converted to 5-bit data and reversed, the data is shifted out
using the write clock and written on tape. The shift direction is opposite to that in serial shift in. Note that the data
is right-shifted "end around" (see Figure 3) so that after
five write clocks the same data appears in the register.

6-67

[Figure 5 maps: each panel is a pair of 16-square Venn diagrams on Y3-Y0, one for SO = 0 and one for SO = 1, with Xs marking the Don't Cares; a leading / denotes a complemented signal:
(a) Y3 Map: /Y3 = /A3 = /Y2 + Y3 SO
(b) Y2 Map: /Y2 = /A2 = /Y1
(c) Y1 Map: /Y1 = /A1 = /Y0 + Y3 Y2
(d) Y0 Map: /Y0 = /A0 = Y3 Y2 /Y0 + /SO]

Figure 5. 5- to 4-Bit Conversions
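
The 5- to 4-bit equations can be exercised the same way. The sketch below is a Python model that evaluates the "CONV. SIN & LOAD" terms of Appendix A for the Y flip-flops; the function name and the sample code words are illustrative assumptions, and the input word is taken to be held as Y3 Y2 Y1 Y0 SOUT.

```python
def decode_5b4b(word: int) -> int:
    """Return the 4-bit value left in Y3..Y0 after a 5- to 4-bit conversion."""
    y3, y2, y1, y0, so = [(word >> i) & 1 for i in (4, 3, 2, 1, 0)]

    def inv(x: int) -> int:
        return x ^ 1

    # Active-low next-state terms for mode M1 = 1, M0 = 0 (Appendix A).
    y3_n = inv(y2) | (y3 & so)
    y2_n = inv(y1)
    y1_n = inv(y0) | (y3 & y2)
    y0_n = (y3 & y2 & inv(y0)) | inv(so)

    # The PAL output inverts each term, giving the positive-true nibble.
    return (inv(y3_n) << 3) | (inv(y2_n) << 2) | (inv(y1_n) << 1) | inv(y0_n)


# A few 5-bit words and the nibbles the equations produce for them.
samples = [0b11001, 0b11011, 0b10010, 0b11010, 0b01111]
decoded = [decode_5b4b(w) for w in samples]
for w, d in zip(samples, decoded):
    print(f"{w:05b} -> {d:04b}")
```

Because the unused 5-bit combinations are treated as Don't Cares, the result for a word outside Table 1 is not meaningful.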

Invalid Flag (INV Flip-Flop)
The Invalid flip-flop is set to a One when an invalid 5-bit code is read from the tape. This tells the tape formatter that the next data read is the beginning of the data block. Because INV is a negative-true signal, the logic equations are written for Ones on the Venn diagram.
The 16 binary values not listed in Table 1 are plotted as Ones in Figure 6. Squares corresponding to valid 5-bit codes contain Zeros; the rest of the squares contain Ones. The equation for the Ones (a leading / denotes a complemented signal) is:
INV = /Y0 /SOUT + /Y3 /Y2 + /Y3 /Y1 /Y0 + Y3 Y2 Y1 Y0 SOUT
The Invalid flip-flop is enabled by a signal called CIF (Control Invalid Flag) and reset when CIF is Low.

Synchronization Mark Detection
Bit synchronization is achieved when the illegal 5-bit code of all Ones is read from the tape. This condition is the logical AND of all 5 bits, or
BS = Y3 Y2 Y1 Y0 SOUT

Implementing the Design
Once the conceptual design is complete, it must be reduced to practice. This process has two main steps: describe the logic using a high-level language, and program the PAL.
Several design programs that run on the IBM PC (or equivalent) or the VAX computer are available from either semiconductor manufacturers or from third-party software vendors. The first such program, called PALASM (PAL Assembler), was developed by Monolithic Memories. The program enables you to describe the logic in terms of Boolean equations, truth tables, or state diagrams using a language whose syntax is comparable to a microcomputer assembly language.
Appendix A shows the equations for the GCR design, written in the PALASM syntax. This ASCII file was created using Wordstar in the non-document mode.
The PALASM file (GCREX.PAL) is then translated to the syntax of the ABEL design program using the TOABEL program. The format of the command is:
TOABEL -IB:GCREX -OB:GCREXT
The TOABEL program converts the GCREX.PAL file to a file named GCREXT.ABL, whose listing appears in Appendix B.
ABEL consists of an executive and several overlay programs that are executed by typing in:
ABEL B:GCREXT
The ABEL program was developed by a programmer manufacturer, Data I/O Corporation. ABEL can simplify a source file (logic reduction), perform logic simulation, and generate test vectors. Table 2 lists the ABEL programs.

[Figure 6 map: the binary values not listed in Table 1 plotted as Ones, with
INV = /Y0 /SOUT + /Y3 /Y2 + /Y3 /Y1 /Y0 + Y3 Y2 Y1 Y0 SOUT]

Figure 6. Binary Values Not Listed in Table 1
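
The INV and BS equations can be tabulated over all 32 register values. The following Python sketch evaluates the four INV product terms as they appear in Appendix A, again assuming the word is held as Y3 Y2 Y1 Y0 SOUT; the function names are hypothetical. Table 1 is not reproduced here, so whether the flagged words are exactly the 16 words absent from that table should be checked against it.

```python
def invalid_flag(word: int) -> bool:
    """Evaluate INV = /Y0 /SOUT + /Y3 /Y2 + /Y3 /Y1 /Y0 + Y3 Y2 Y1 Y0 SOUT."""
    y3, y2, y1, y0, so = [(word >> i) & 1 for i in (4, 3, 2, 1, 0)]

    def inv(x: int) -> int:
        return x ^ 1

    return bool(
        (inv(y0) & inv(so))
        | (inv(y3) & inv(y2))
        | (inv(y3) & inv(y1) & inv(y0))
        | (y3 & y2 & y1 & y0 & so)
    )


def bit_sync(word: int) -> bool:
    """Evaluate BS = Y3 Y2 Y1 Y0 SOUT (the all-Ones synchronization mark)."""
    return (word & 0b11111) == 0b11111


flagged = [w for w in range(32) if invalid_flag(w)]
print(len(flagged), "words flagged:", [f"{w:05b}" for w in flagged])
```

The all-Ones word satisfies both equations, which matches the text's description of the synchronization mark as an illegal code.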
Table 2. ABEL Programs

PROGRAM NAME   FUNCTION
PARSE          Read source file; check syntax; expand macros; act upon assembler directives
TRANSFOR       Convert the description to an intermediate form
REDUCE         Perform logic reduction
FUSEMAP        Create the programmer load (JEDEC) file
SIMULATE       Simulate the operation of a programmed device
DOCUMENT       Create a design documentation file

The ABEL output files for this design based on the PAL C 16R6 (Figure 7) are:
GCREXT.LST
GCREXT.OUT
GCREXT.DOC (see Appendix C)
GCREXT.SIM (This design was not simulated.)
P16R6.JED (see Appendix D)
The last file is in JEDEC (JC-42.1-81-62) format and is suitable for loading into a PLD programmer. The listing appears in Appendix D. The DOCUMENT program output appears in Appendix C. Note that, although the file list includes a simulation file, this design was not simulated.
The CY7C16R6 that implements the design was programmed using the Data I/O model 29B programmer operated in the remote mode to the PC. The design was then verified by testing the device on the bench.

       CK   1    20  VCC
       M1   2    19  BS
       M0   3    18  SOUT
       D3   4    17  Y3
       D2   5    16  Y2
       D1   6    15  Y1
       D0   7    14  Y0
       EN   8    13  INV
       CIF  9    12  SIN
       GND 10    11  E

Figure 7. PAL C 16R6

PAL Advantages
This design example illustrates the space-saving advantage of Cypress CMOS PALs. The FUSEMAP program printed out that 40 of the device's 64 available product terms were used.
If the PALASM input equations shown in Appendix A are implemented in two-input gates, approximately 30 gates are required for each of the six D flip-flop inputs, or a total of 6 x 30 = 180 two-input gates. The logic equations alone would then require 180/4 = 45 14-pin DIPs. The six flip-flops would require three 14-pin DIPs, for a total of 48 DIPs. Thus, one 20-pin Cypress PAL replaces approximately 50 14-pin DIPs.
This design also illustrates the Cypress PAL's power-saving advantage. The 16R6 PAL's maximum ICC current, under worst-case conditions, is 45 mA. In contrast, the total ICC for 50 TTL packages would be 500 mA, assuming 10 mA for the typical ICC per package. The worst-case ICC for the TTL system could be as high as 20 mA per DIP, which would mean a total of 1 A for the system.
The Cypress CMOS PAL reduces system power by a factor of 10 to 15, depending upon whether typical or worst-case numbers are compared.


Appendix A. PALASM Equations
DESIGN EXAMPLE
FILENAME: GCREX.PAL
PAL16R6
BRUCE WENNIGER 9/17/85
PAT001
4B-5B ENCODER/DECODER
CYPRESS SEMICONDUCTOR
CK M1 M0 D3 D2 D1 D0 /EN /CIF GND
/E SIN /INV Y0 Y1 Y2 Y3 SOUT /BS VCC

/SOUT :=  EN*/SOUT                   + ; HOLD/RECIRCULATE
         /EN*/M1*/M0*/SIN            + ; SERIAL SHIFT IN
         /EN*/M1* M0*/Y0             + ; SERIAL SHIFT OUT
         /EN* M1*/M0*/SIN            + ; CONV. SIN & LOAD
         /EN* M1* M0* D1*/D0         + ; CONV. PAR. & LOAD
         /EN* M1* M0* D3*/D0           ; DITTO

/Y0   :=  EN*/Y0                     + ; HOLD
         /EN*/M1*/M0*/SOUT           + ; SERIAL SHIFT IN
         /EN*/M1* M0*/Y1             + ; SERIAL SHIFT OUT
         /EN* M1*/M0*/SOUT           + ; CONV. SIN & LOAD
         /EN* M1*/M0* Y3* Y2*/Y0     + ; DITTO
         /EN* M1* M0* D2*/D1* D0     + ; CONV. PAR. & LOAD
         /EN* M1* M0* D3*/D1* D0     + ; DITTO
         /EN* M1* M0*/D3*/D1*/D0       ; DITTO

/Y1   :=  EN*/Y1                     + ; HOLD
         /EN*/M1*/M0*/Y0             + ; SERIAL SHIFT IN
         /EN*/M1* M0*/Y2             + ; SERIAL SHIFT OUT
         /EN* M1*/M0*/Y0             + ; CONV. SIN & LOAD
         /EN* M1*/M0* Y3* Y2         + ; DITTO
         /EN* M1* M0*/D2               ; CONV. PAR. & LOAD

/Y2   :=  EN*/Y2                     + ; HOLD
         /EN*/M1*/M0*/Y1             + ; SERIAL SHIFT IN
         /EN*/M1* M0*/Y3             + ; SERIAL SHIFT OUT
         /EN* M1*/M0*/Y1             + ; CONV. SIN & LOAD
         /EN* M1* M0*/D3* D1         + ; CONV. PAR. & LOAD
         /EN* M1* M0*/D3* D2* D0       ; DITTO

/Y3   :=  EN*/Y3                     + ; HOLD
         /EN*/M1*/M0*/Y2             + ; SERIAL SHIFT IN
         /EN*/M1* M0*/SOUT           + ; SERIAL SHIFT OUT
         /EN* M1*/M0* Y3* SOUT       + ; CONV. SIN & LOAD
         /EN* M1*/M0*/Y2             + ; DITTO
         /EN* M1* M0* D3* D0         + ; CONV. PAR. & LOAD
         /EN* M1* M0* D3* D1           ; DITTO

INV   := /CIF* INV                        + ; HOLD INV FLAG (ACTIVE LOW)
          CIF* M1*/M0*/Y3*/Y2             + ; SET IF INVALID
          CIF* M1*/M0*/Y3*/Y1*/Y0         + ; DITTO
          CIF* M1*/M0*/Y0*/SOUT           + ; DITTO
          CIF* M1*/M0* Y3* Y2* Y1* Y0* SOUT ; DITTO

BS    =   Y3* Y2* Y1* Y0* SOUT              ; BIT SYNC. (ACTIVE LOW)


Appendix B. ABEL Listing
module _gcrext;
flag '-r0';
title
'PAL16R6
FILENAME: GCREX.PAL
DESIGN EXAMPLE
PAT001
BRUCE WENNIGER 9/17/85
4B-5B ENCODER/DECODER
CYPRESS SEMICONDUCTOR
-Translated by TOABEL-';
P16R6 device 'P16R6';
"declarations
TRUE,FALSE = 1,0;
H,L = 1,0;
X,Z,C = .X.,.Z.,.C.;
GND,VCC                          pin 10,20;
CK,M1,M0,D3,D2,D1,D0,EN,CIF,E    pin 1,2,3,4,5,6,7,8,9,11;
INV,Y0,Y1,Y2,Y3,SOUT             pin 13,14,15,16,17,18;
SIN,BS                           pin 12,19;
equations
!SOUT := !EN & !SOUT                        " HOLD/RECIRCULATE
       # EN & !M1 & !M0 & !SIN              " SERIAL SHIFT IN
       # EN & !M1 & M0 & !Y0                " SERIAL SHIFT OUT
       # EN & M1 & !M0 & !SIN               " CONV. SIN & LOAD
       # EN & M1 & M0 & D1 & !D0            " CONV. PAR. & LOAD
       # EN & M1 & M0 & D3 & !D0 ;          " DITTO

!Y0   := !EN & !Y0                          " HOLD
       # EN & !M1 & !M0 & !SOUT             " SERIAL SHIFT IN
       # EN & !M1 & M0 & !Y1                " SERIAL SHIFT OUT
       # EN & M1 & !M0 & !SOUT              " CONV. SIN & LOAD
       # EN & M1 & !M0 & Y3 & Y2 & !Y0      " DITTO
       # EN & M1 & M0 & D2 & !D1 & D0       " CONV. PAR. & LOAD
       # EN & M1 & M0 & D3 & !D1 & D0       " DITTO
       # EN & M1 & M0 & !D3 & !D1 & !D0 ;   " DITTO

!Y1   := !EN & !Y1                          " HOLD
       # EN & !M1 & !M0 & !Y0               " SERIAL SHIFT IN
       # EN & !M1 & M0 & !Y2                " SERIAL SHIFT OUT
       # EN & M1 & !M0 & !Y0                " CONV. SIN & LOAD
       # EN & M1 & !M0 & Y3 & Y2            " DITTO
       # EN & M1 & M0 & !D2 ;               " CONV. PAR. & LOAD

!Y2   := !EN & !Y2                          " HOLD
       # EN & !M1 & !M0 & !Y1               " SERIAL SHIFT IN
       # EN & !M1 & M0 & !Y3                " SERIAL SHIFT OUT
       # EN & M1 & !M0 & !Y1                " CONV. SIN & LOAD
       # EN & M1 & M0 & !D3 & D1            " CONV. PAR. & LOAD
       # EN & M1 & M0 & !D3 & D2 & D0 ;     " DITTO

!Y3   := !EN & !Y3                          " HOLD
       # EN & !M1 & !M0 & !Y2               " SERIAL SHIFT IN
       # EN & !M1 & M0 & !SOUT              " SERIAL SHIFT OUT
       # EN & M1 & !M0 & Y3 & SOUT          " CONV. SIN & LOAD
       # EN & M1 & !M0 & !Y2                " DITTO
       # EN & M1 & M0 & D3 & D0             " CONV. PAR. & LOAD
       # EN & M1 & M0 & D3 & D1 ;           " DITTO

!INV  := CIF & !INV                                     " HOLD INV FLAG
       # !CIF & M1 & !M0 & !Y3 & !Y2                    " SET IF INVALID
       # !CIF & M1 & !M0 & !Y3 & !Y1 & !Y0              " DITTO
       # !CIF & M1 & !M0 & !Y0 & !SOUT                  " DITTO
       # !CIF & M1 & !M0 & Y3 & Y2 & Y1 & Y0 & SOUT ;   " DITTO

!BS   = Y3 & Y2 & Y1 & Y0 & SOUT ;          " BIT SYNC.

end _gcrext;

Appendix C. Document File

Page 1
ABEL(tm) Version 1.10 - Document Generator
17-Sept-85 8:30 AM
PAL16R6
DESIGN EXAMPLE
FILENAME: GCREX.PAL
PAT001
BRUCE WENNIGER 9/17/85
4B-5B ENCODER/DECODER
CYPRESS SEMICONDUCTOR
-Translated by TOABEL-
Equations for Module _gcrext
Device P16R6
Reduced Equations:
SOUT := !(!EN & !SOUT
        # EN & !M0 & !M1 & !SIN
        # EN & M0 & !M1 & !Y0
        # EN & !M0 & M1 & !SIN
        # !D0 & D1 & EN & M0 & M1
        # !D0 & D3 & EN & M0 & M1);
Y0 := !(!EN & !Y0
        # EN & !M0 & !M1 & !SOUT
        # EN & M0 & !M1 & !Y1
        # EN & !M0 & M1 & !SOUT
        # EN & !M0 & M1 & !Y0 & Y2 & Y3
        # D0 & !D1 & D2 & EN & M0 & M1
        # D0 & !D1 & D3 & EN & M0 & M1
        # !D0 & !D1 & !D3 & EN & M0 & M1);
Y1 := !(!EN & !Y1
        # EN & !M0 & !M1 & !Y0
        # EN & M0 & !M1 & !Y2
        # EN & !M0 & M1 & !Y0
        # EN & !M0 & M1 & Y2 & Y3
        # !D2 & EN & M0 & M1);
Y2 := !(!EN & !Y2
        # EN & !M0 & !M1 & !Y1
        # EN & M0 & !M1 & !Y3
        # EN & !M0 & M1 & !Y1
        # D1 & !D3 & EN & M0 & M1
        # D0 & D2 & !D3 & EN & M0 & M1);
Y3 := !(!EN & !Y3
        # EN & !M0 & !M1 & !Y2
        # EN & M0 & !M1 & !SOUT
        # EN & !M0 & M1 & SOUT & Y3
        # EN & !M0 & M1 & !Y2
        # D0 & D3 & EN & M0 & M1
        # D1 & D3 & EN & M0 & M1);
INV := !(CIF & !INV
        # !CIF & !M0 & M1 & !Y2 & !Y3
        # !CIF & !M0 & M1 & !Y0 & !Y1 & !Y3
        # !CIF & !M0 & M1 & !SOUT & !Y0
        # !CIF & !M0 & M1 & SOUT & Y0 & Y1 & Y2 & Y3);
BS = !(SOUT & Y0 & Y1 & Y2 & Y3);

Chip diagram for Module _gcrext
Device P16R6

           PALC16R6
       CK   1    20  VCC
       M1   2    19  BS
       M0   3    18  SOUT
       D3   4    17  Y3
       D2   5    16  Y2
       D1   6    15  Y1
       D0   7    14  Y0
       EN   8    13  INV
       CIF  9    12  SIN
       GND 10    11  E

end of module _gcrext


Appendix D. JEDEC File
ABEL(tm) Version 1.10 JEDEC file for: P16R6
Created on: 17-Sep-85 8:30 AM
PAL16R6
DESIGN EXAMPLE
FILENAME: GCREX.PAL
PAT001
BRUCE WENNIGER 9/17/85
4B-5B ENCODER/DECODER
CYPRESS SEMICONDUCTOR
-Translated by TOABEL-*

QP20* QF2048*
L0000

11111111111111111111111111111111
11111101110111011101110111111111
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
11111110111111111111111110111111
10111011111111111111111101111110
10110111111111111111111001111111
01111011111111111111111101111110
01110111111111110111101101111111
01110111011111111111101101111111
00000000000000000000000000000000
00000000000000000000000000000000
11111111111011111111111110111111
10111011111111101111111101111111
10110110111111111111111101111111
01111001110111111111111101111111
01111011111111101111111101111111
01110111011111111111011101111111
01110111011111110111111101111111
00000000000000000000000000000000
00000000000000000000000000000000
11111111111011111111111110111111
10111011111111101111111101111111
10110110111111111111111101111111
01111001110111111111111101111111
01110111111111011111111101111111
01110111011111111111011101111111
01110111011111110111111101111111
00000000000000000000000000000000
00000000000000000000000000000000
11111111111011111111111110111111
10111011111111101111111101111111
10110110111111111111111101111111
01111001110111111111111101111111
01111011111111101111111101111111
01111111111111111111011101111111
01110111011111110111111101111111
00000000000000000000000000000000
11111111111111101111111110111111
10111011111111111110111101111111
10110111111011111111111101111111
01111011111111111110111101111111
01110111101111110111111101111111


01110111101101111011011101111111
00000000000000000000000000000000
00000000000000000000000000000000
11111111111111111110111110111111
10111011111111111111111001111111
10110111111111101111111101111111
01111011111111111111111001111111
01111011110111011111111101111111
01110111111110111111111101111111
00000000000000000000000000000000
00000000000000000000000000000000
11111111111111111111111010111111
10111010111111111111111101111111
10110111111111111110111101111111
01111011111111111111111101111111
01111011110111011111111001111111
01110111111101111011011101111111
01110111011111111011011101111111
01110111101111111011101101111111
11111111111111111111111111100111
01111011111011101111111111111011
01111011111011111110111011111011
01111010111111111111111011111011
01111001110111011101110111111011
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000*
C8E51*
D15A


T2 Framing Circuitry

This application note describes the design of a T2-based transmission system. This system adds control characters to an image processor's data stream so that the resulting output can be slotted into a T2 channel. DS-2 transmission equipment is then used to relay this information onward.
At receiving locations, the control bits are used to synchronize the site's circuitry to the incoming characters. The data is then restored to its original form, before being routed to its final destination. A block diagram of this system appears in Figure 1.

[Figure 1 diagram: IMAGE PROCESSOR -> T2 TRANSMIT INTERFACE -> DS-2/T2 LINK -> T2 RECEIVER INTERFACE -> FINAL DESTINATION]
Figure 1. System Overview

Overview of T1 and T2

Digital transmission systems in North America are hierarchical in structure. Each carrier is multiplexed into higher bandwidth carriers. The lowest level is known as T1. This typically consists of 24 64-Kbit/s pulse code modulation (PCM) telephone channels multiplexed together into frames. A single framing (F) bit precedes every T1 frame to allow for features such as synchronizing channels, sending control characters, and generating cyclic redundancy code (CRC) bits.
Thus, each frame contains 24 8-bit channels plus an additional framing bit, for a total of 193 bits per frame. The bit rate for a T1 channel equals the rate of a bit in the frame multiplied by the total number of bits in the frame:

(64,000 / 8) x 193 = 1.544 Mbits/s

The maximum data rate in a T1 channel is therefore

1.544 x 192/193 = 1.536 Mbits/s

You can achieve this maximum data rate for T1 transmission when using the Extended Super Frame format. This format dedicates all 8 bits of every channel to user data instead of reserving the eighth bit for channel signaling. Figure 2 illustrates the composition of a T1 frame.

[Figure 2 diagram: F | Ch 1 | Ch 2 | ... | Ch 24 | F | Ch 1 | Ch 2 | ...]
F = F BIT (one bit)
Channel data = 8 bits
Number of channels = 24
Figure 2. T1 Frame Structure

The next level in the digital communications hierarchy is referred to as T2. Four T1 frames constitute a T2 Multi-frame. These frames are arranged as four sub-frames, each having six blocks of 49 bits. The leading character of every block is used for control purposes, and the following 48 bits consist of data. In total, a Multi-frame comprises 1176 characters.
This format includes three control features:
Multi-frame alignment, provided by a 0111 pattern in each of the four sub-frames. These four bits are referred to as M bits. The fourth M bit location can also serve as an alarm service digit, if required.
Frame alignment, implemented by alternating between logic level 0 and 1. Each sub-frame contains two of these bits, which are referred to as F bits.
Justification: Three bits, referred to as Stuffing Indicator bits (C), are inserted into every sub-frame for justification purposes. Positive, negative, and no justification are possible by inserting the correct code into the relevant locations.

[Figure 3 diagram: T2 MULTI-FRAME, showing the control bits interleaved with 48-bit data blocks]
M = MULTI-FRAME ALIGNMENT BITS (M1, M2, M3 and M4)
C = STUFFING INDICATOR (Cj1, Cj2 and Cj3)
F = FRAME ALIGNMENT BITS (F0=0, F1=1)
Figure 3. T2 Frame Structure

Figure 3 shows how the data and control bits interleave. Figure 4 illustrates the sequence in which control bits occur.

M1 -> C11 -> F0 -> C12 -> C13 -> F1
M2 -> C21 -> F0 -> C22 -> C23 -> F1
M3 -> C31 -> F0 -> C32 -> C33 -> F1
M4 -> C41 -> F0 -> C42 -> C43 -> F1
Figure 4. Control Bit Sequence

The bit rate of T2 information is 6.312 Mbits/s. The corresponding data rate in a T2 channel is therefore:

6.312 x 48/49 = 6.183 Mbits/s

Further levels exist within the communication hierarchy, but they are not relevant to this design. CCITT G.743 provides further details of all the framing structures mentioned here.

Transmitter Site Circuitry
In the example T2 system, the machine from which data originates can operate at frequencies as high as 10 MHz. The data is sourced to the T2 system at 6.183 MHz, which is the data rate of a T2 line. At 10 MHz, stopping and starting artifacts would arise from the disparity between the source and the transmission medium. The output from the transmitter circuitry is maintained at 6.312 MHz to allow the inclusion of control characters into the data stream. Phase-lock-loop design techniques ensure that the clocks in the T2 system and the data source are tightly coupled.
Figure 5 shows the transmitter block diagram. Information feeds into a FIFO, IC1, under control of TXCLKIN (6.183 MHz, the source's clock). TXCLKOUT (6.312 MHz) retrieves data from IC1. IC2 (TXCNTRL PAL) controls the insertion of control bits into the data stream at every 49th time slot. IC4 is a PROM that holds a unique 24-bit control pattern.
A counter, IC3 (PROMADDR PAL), provides the address to the PROM. IC5 (DIVBY49 PAL) is programmed as a counter that increments on successive clock pulses. When this counter reaches its terminal count (49), a carry-out signal is produced (FBITLOAD), which serves three purposes:
It causes the counter to reload its base count (zero)
It indicates that a control bit has to be inserted into the data stream
It serves as an input to the state machine in the IC2 PLD, which is the control-bit sequencer that governs when the PROM address generator has to be incremented. A decode of one of the sequencer's states, INCFADD, causes the PROM address to increase by one.

[Figure 5 diagram: the IMAGE PROCESSOR supplies IPDATA to the FIFO (IC1); the PHASE LOCK LOOP CIRCUITRY relates TXCLKIN and TXCLKOUT; the TXCNTRL PAL (IC2) takes HFULL and FIFODATA from the FIFO and the control bits from the PROM (IC4), and drives OPDATA to the DS-2 interface; the PROMADDR PAL (IC3) supplies the PROM address bits; the DIVBY49 counter (IC5) is clocked by TXCLKOUT.]
Figure 5. Transmitter Site Circuitry

The listings for the design's PALs appear in Appendices A through J.
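
The transmitter's framing operation can be sketched in software. The Python below is an illustrative model only, not the hardware implementation: it places one control bit ahead of every 48 data bits, using the 24-entry control pattern listed for the PROM in Appendix C, and checks the 48/49 rate relationship. The function names are hypothetical.

```python
# 24-bit control pattern held in the PROM (Appendix C), in the order
# M, C, F0, C, C, F1 for each of the four sub-frames.
CONTROL_BITS = [0, 0, 0, 0, 0, 1,
                1, 0, 0, 0, 0, 1,
                1, 0, 0, 0, 0, 1,
                1, 0, 0, 0, 0, 1]
DATA_BITS_PER_BLOCK = 48
BLOCK_BITS = DATA_BITS_PER_BLOCK + 1          # one control bit plus 48 data bits


def build_multiframe(data):
    """Insert a control bit at the start of every 49-bit block."""
    if len(data) != len(CONTROL_BITS) * DATA_BITS_PER_BLOCK:
        raise ValueError("a multi-frame carries exactly 24 x 48 data bits")
    frame = []
    for block, control in enumerate(CONTROL_BITS):
        frame.append(control)
        start = block * DATA_BITS_PER_BLOCK
        frame.extend(data[start:start + DATA_BITS_PER_BLOCK])
    return frame


def strip_control_bits(frame):
    """Recover the data by dropping every 49th bit, as the receiver does."""
    return [bit for i, bit in enumerate(frame) if i % BLOCK_BITS != 0]


data = [(i // 3) % 2 for i in range(24 * 48)]   # arbitrary test data
frame = build_multiframe(data)
data_rate_mbps = 6.312 * DATA_BITS_PER_BLOCK / BLOCK_BITS
print(len(frame), round(data_rate_mbps, 3))
```

In the hardware, the DIVBY49 counter marks the control-bit slots and the TXCNTRL PAL multiplexes the PROM output into the stream; the loop above plays both roles.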

Receiver Site Circuitry
The most obvious way to detect a valid pattern of T2 data and control characters is to serially shunt them through a shift register with 1176 stages. Outputs from the first, 50th, 99th, etc. through the 1128th location can then be continuously monitored for the relevant character sequence. This approach is very wasteful in terms of circuitry because monolithic shift registers provide either eight or 16 stages.
Fortunately, you can achieve the same result with one FIFO and two PALs. The principle is to arrange the incoming information so that a pattern recognition circuit periodically samples the most recent, the 50th, the 99th, the 148th, and the 197th bits. This circuitry then compares the information to that expected. When a complete frame of control characters has been detected, the incoming information is frame aligned with the circuitry at the receiver site.
The devices required to implement these tasks appear in Figure 6. IC1 (IPFIFO) is a FIFO whose input source is the data and control character stream from the transmit site. The FIFO holds the most recent 196 bits of information entering the receiver circuitry.
IC2 (DATASORT PAL) provides the commands that control this operation and acts as an intermediate buffer stage between the information presented to the FIFO and the characters subsequently read from that device. The outputs of IC3 (CLKGEN PAL) are the Read and Write clocks for the FIFO.
IC4 (ALIGNDET PAL) and IC5 (FRAMCHEK PAL) perform pattern recognition. IC4 compares the expected control bit pattern to the stream of characters appearing at the FIFO's outputs. IC5 interprets the results and sets a flag whenever frame alignment is attained. IC5 also indicates if alignment is subsequently lost.
Frame alignment is declared when four pre-determined bit patterns have been recognized. Thereafter, the circuit makes continuous checks to ensure that alignment is maintained. In total, the circuit seeks 12 bit patterns. If any check yields a negative result, alignment has been lost. A locally generated reset pulse then sets the relevant circuitry to its initial state, and the process of alignment detection begins once again.

[Figure 6 diagram: ALIGNDET (IC4), FRAMCHEK (IC5), CLKGEN (IC3), IPFIFO (IC1), and OPFIFO (IC8), with signals START, CLK3, /CLK3, PUR, RESET, ALIGNF, MTRUE, FTRUE, D0-D4, A1-A4, MB, MA, E, D, C, RD, WR, RDCLK, WRCLK, and OPHFULL; DATAOUT goes to the final destination.]
Figure 6. Receiver Site Circuitry

For a short period following the application of power, an initialization signal, RESET, is active. This signal ensures that the outputs of IC5 (FRAMCHEK PAL) and IC6 (DSCOUNT PAL) are driven to their initial states and the FIFO (IC1) has all of its internal memory locations and control registers cleared to zero. Once the power-up routine has completed, the process of writing information into the FIFO commences.
All data entering the receiver is initially fed to the first input stage of the FIFO (D4) via a register in IC2. This ensures that the FIFO's set-up parameter is not violated. Every time a character enters the FIFO, a counter in IC6 increments once. When the terminal count (49) is reached, the counter's carry-out pin (NEXT) goes active. This condition causes the YX sequencer in IC2 to move from its initial state 0 position to state 1 (Figure 7). A decode of this state enables the strobe RD, which retrieves stored data from the FIFO. Thereafter, the data from the FIFO's first output stage (A4) is coupled, via IC2, to the FIFO's second input port (D3).

[Figure 7 state diagram]
Figure 7. YX Sequencer

After two further occurrences of NEXT going active, the FIFO's second and third output stages (A3 and A2) are coupled to the third and fourth input ports (D2 and D1), respectively. The YX sequencer goes to state 2. Figure 8 shows the FIFO's contents when NEXT becomes active for the fourth and final time. At this point, the pattern recognition circuitry can be enabled.

[Figure 8 diagram: IPFIFO contents across stages D4, D3, D2, D1, and D0]
Figure 8. Contents of Receiver Site FIFO

IC2's five data output pins (D4 - D0) effectively perform the same function as a shift register with 197 stages. IC4 monitors this information until it detects the first occurrence of 01000. These control bits are M1/F1/C43/C42/F0, which are the signals present on D4 - D0 of the FIFO after the transmission of 1176 characters. This pattern could correspond to the detection of 01000 in IC4 for the first time. However, it is also quite probable that this sequence could randomly occur in the data stream. Thus, further checks are needed before assuming that the valid recognition pattern has been detected.
As soon as the receiver recognizes the 01000 pattern, a signal labeled START goes active. This term enables a six-stage counter in IC7 (CBITCNTR PAL). The counter counts to 48, then issues a carry-out signal (LD49). A seven-state EDC sequencer (Figure 9) in IC5 recognizes every occurrence of this signal and thus always moves to its next stable position (state 1 in this case).

[Figure 9 state diagram]
Figure 9. EDC Sequencer

A second check is made in IC4 to determine whether the second valid control bit pattern has been detected. IC4 uses the control bits E, D, C, MA, and MB from IC5's EDC and M sequencer (Figure 10) to determine whether the incoming data has been aligned. These control bits represent the state of the sequencers in IC4 and determine the control sequence that should exist on the D4 - D0 inputs.

[Figure 10 state diagram]
Figure 10. M Sequencer

The second valid control pattern, 10001, is now sought on the bits F1, C43, C42, F0, and C41. If the pattern is not detected, a global reset is issued, and the search for the 01000 pattern recommences. Conversely, if the 10001 pattern is detected, the EDC sequencer assumes state 3. Further, FTRUE becomes true. This signal exists for one clock period and causes the sub-frame detector implemented by the F sequencer (Figure 11) to move to its next stable state. A further 147 clocks are allowed to elapse before the next control bit pattern check is carried out.

[Figure 11 state diagram]
Figure 11. F Sequencer

By this time, the EDC sequencer is in state 6. The occurrence of a 11000 pattern for M4, F1, C33, C32, and F0 provides further proof that alignment has been attained, and the F sequencer moves to its next stable position, state 3. As before, a negative result causes the circuit to issue a global reset. The checking process would then continue with a 10000 pattern for F1, C33, C32, F0, and C31 being sought when a further 49 clock periods had elapsed (EDC sequencer in state 7).
In this case, the occurrence of the correct pattern causes the M sequencer (multi-frame detector) to progress from its state 0 start position to state 1. The F sequencer's state diagram shows that the sequencer assumes state 2 after the next occurrence of an active LD49 signal, followed one clock period later by a return to the start position, state 0.
As stated previously, the declaration of alignment is made only when four consecutive bit patterns - commencing with the start condition on M0, F1, C3, C3, and F0 - have been sequentially detected. When these criteria have been satisfied, the ALIGNF flag is raised. This flag is held in its active state until one of the ensuing checks produces a negative result. In such an event, the RESET term goes active, thereby forcing certain areas of the receiver's circuitry into the same conditions as occurred at power-up.
Immediately following the receiver's alignment of the incoming data stream, the ensuing information is written into a second FIFO (IC8, OPFIFO). This action is a preface to restoring the data to its original form, i.e., removing the control bits added by the transmitter. Once this operation has been completed, the data can be passed to its final destination. As in the transmitter's design, the receiver's source (IPCLK, 6.312 MHz) and sink (PLLOPCLK, 6.183 MHz) clocks must be locked together. A phase lock loop circuit performs this function.
IC3 provides the control and strobe signals for removing control bits from the data stream. The equations in the source code for this device (Appendix D) reveal the following facts:
Control bits are not written to IC8 (OPFIFO); they coincide with the occurrence of an active LD49 (counter carry-out) signal. Thus, although a data bit is read out of the IPFIFO, the occurrence of LD49 prevents a write strobe (WRCLK) from being generated and the data bit from being written into OPFIFO.
The process of removing data from IC8 commences as soon as that device is half full, indicated by OPHFULL. This prevents invalid data from being passed to the next stage when the FIFO empties.
The frequencies of the FIFO's write (WRCLK) and read (RDCLK) strobes are 6.312 and 6.183 MHz, respectively.

Other Considerations
The T2 system requires interfaces at both the transmit and receive sites between the hardware described here and the relevant DS-2 equipment. Rockwell's industry-standard DX-33B-4 (CLNS-95-297) and DX-33K-3 (CLNS-95-308) boards suit this task. The latter is fitted with a termination network that matches the receiver's input impedance to that of the transmission medium.

Parts Lists
Transmitter:
IC1 = CY7C433
IC2 = CY7C22V10
IC3 = CY7C22V10
IC4 = CY7C225
IC5 = CY7C22V10
Receiver:
IC1 = CY7C433
IC2 = CY7C22V10
IC3 = CY7C22V10
IC4 = CY7C22V10
IC5 = CY7C22V10
IC6 = CY7C22V10
IC7 = CY7C22V10
IC8 = CY7C433

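
The alignment search described under Receiver Site Circuitry can be illustrated with a software analogue. The Python sketch below is not the FIFO and PAL implementation; it simply samples bits spaced 49 positions apart and compares them with the 24-bit control pattern from the PROM listing, which is the same principle applied to a stored bit stream. The function name and the test stream are assumptions made for the example.

```python
CONTROL_BITS = [0, 0, 0, 0, 0, 1,
                1, 0, 0, 0, 0, 1,
                1, 0, 0, 0, 0, 1,
                1, 0, 0, 0, 0, 1]
BLOCK_BITS = 49


def find_alignment(bits):
    """Return the first offset whose 49-spaced samples match the pattern."""
    n = len(CONTROL_BITS)
    for offset in range(BLOCK_BITS * n):
        taps = bits[offset::BLOCK_BITS][:n]      # every 49th bit from offset
        if len(taps) == n and taps == CONTROL_BITS:
            return offset
    return None


# Test stream: 10 idle bits, then one multi-frame whose data bits are all 1.
multiframe = []
for control in CONTROL_BITS:
    multiframe.append(control)
    multiframe.extend([1] * 48)
stream = [0] * 10 + multiframe
offset = find_alignment(stream)
print("multi-frame starts at bit", offset)
```

As the text notes, a short pattern can occur by chance in the data, which is why the hardware confirms several consecutive patterns before raising ALIGNF; the sketch achieves the same effect by demanding all 24 control bits at once.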

Appendix A. PAL Equations For TXCNTRL
PAL 22V10
T2 TRANSMITTER CONTROLLER (IC2)
CYPRESS SEMICONDUCTOR
/TXCLKIN /PUR /TXCLKOUT HFULL FBIT /FBITLOAD FIFODATA NC8 NC9 NC10 NC11 GND
NC13 WRCLK /RDCLK /ENREAD OPDATA /EN /B /A /INCFADD NC22 NC23 VCC

EQUATIONS
WRCLK = TXCLKIN
RDCLK = TXCLKOUT*ENREAD*/PUR
ENREAD := /ENREAD*HFULL*/PUR
        + ENREAD*/PUR
; TXCLKIN = 6.183 MHz
; TXCLKOUT = 6.312 MHz
; HFULL = FIFO HALF-FULL FLAG
; PUR = POWER-ON-RESET SIGNAL
; WRCLK = FIFO SHIFT-IN
; RDCLK = FIFO SHIFT-OUT
OPDATA := /OPDATA*FBIT*EN
        + /OPDATA*FIFODATA*/EN
        + OPDATA*/FBIT*EN
        + OPDATA*FIFODATA*/EN
; FBIT = FRAMING BIT FROM PROM
; EN = SELECTS DATA OR FRAMING BIT
; FIFODATA = DATA RETRIEVED FROM FIFO
; OPDATA = DATA PASSED TO DS-2 INTERFACE
; FBITLOAD = FRAMING BIT TO BE INSERTED INTO DATA STREAM
; INCFADD = CAUSES PROM ADDRESS TO BE INCREMENTED
; BA SEQUENCER = CONTROLS SELECTION OF FRAMING BITS
A := /B*/A*FBITLOAD*/PUR
   + /B*A*/PUR
B := /B*A*/PUR
   + B*/A*/PUR
EN = /B*A
INCFADD = B*A
; STATE DIAGRAM FOR BA SEQUENCER
;
;                                EN            INCFADD
; PUR ---->| 0 |-- FBITLOAD -->| 1 |--------->| 3 |--------->| 2 |
;            ^                                                 |
;            +----------------------<--------------------------+


Appendix B. PAL Equations For DIVBY49
PAL 22V10
DIVIDE BY 49 COUNTER (IC 5)
CYPRESS SEMICONDUCTOR
/TXCLKOUT NC2 NC3 NC4 NC5 NC6 NC7 NC8 NC9 NC10 NC11 GND
NC13 /FBITLOAD Q0 Q1 Q2 Q3 Q4 Q5 NC21 NC22 NC23 VCC

EQUATIONS
Q0 := /Q0*/FBITLOAD
Q1 := /Q1*Q0*/FBITLOAD
    + Q1*/Q0*/FBITLOAD
Q2 := /Q2*Q1*Q0*/FBITLOAD
    + Q2*/Q1*/FBITLOAD
    + Q2*/Q0*/FBITLOAD
Q3 := /Q3*Q2*Q1*Q0*/FBITLOAD
    + Q3*/Q2*/FBITLOAD
    + Q3*/Q1*/FBITLOAD
    + Q3*/Q0*/FBITLOAD
Q4 := /Q4*Q3*Q2*Q1*Q0*/FBITLOAD
    + Q4*/Q3*/FBITLOAD
    + Q4*/Q2*/FBITLOAD
    + Q4*/Q1*/FBITLOAD
    + Q4*/Q0*/FBITLOAD
Q5 := /Q5*Q4*Q3*Q2*Q1*Q0*/FBITLOAD
    + Q5*/Q4*/FBITLOAD
    + Q5*/Q3*/FBITLOAD
    + Q5*/Q2*/FBITLOAD
    + Q5*/Q1*/FBITLOAD
    + Q5*/Q0*/FBITLOAD
FBITLOAD = Q5*Q4*/Q3*/Q2*/Q1*/Q0
; T2CLKOUT = 6.312 MHz
; Q0-Q5 = COUNTER OUTPUTS
; FBITLOAD = USED TO INSERT FRAMING BITS INTO DATA STREAM
;            (EVERY FORTY-NINTH LOCATION)
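The divide-by-49 behavior of these equations can be checked with a short behavioral model. The following Python sketch is an illustration only, not part of the original design files: the function names step and run are hypothetical, and the model assumes all six registers start at zero and that FBITLOAD acts purely as a synchronous clear.

```python
def step(q):
    """Advance the six counter bits (q[0] = Q0 ... q[5] = Q5) by one clock."""
    # FBITLOAD = Q5*Q4*/Q3*/Q2*/Q1*/Q0, i.e. the count equals 48
    fbitload = q[5] and q[4] and not (q[3] or q[2] or q[1] or q[0])
    nxt = []
    for i in range(6):
        carry = all(q[:i])  # every lower-order bit is 1 (always true for Q0)
        # The product terms reduce to: toggle on carry, otherwise hold
        toggled = (not q[i] and carry) or (q[i] and not carry)
        nxt.append(bool(toggled and not fbitload))
    return nxt, bool(fbitload)


def run(clocks):
    """Return the clock indices at which FBITLOAD is asserted."""
    q = [False] * 6  # assumed power-up state
    hits = []
    for t in range(clocks):
        q, fbitload = step(q)
        if fbitload:
            hits.append(t)
    return hits


hits = run(200)
print(hits)
```

Under these assumptions the counter visits the 49 states 0 through 48, so FBITLOAD recurs once every 49 clocks.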

Appendix C. PROM Equations
PROM
FILENAME:PROM
CONTROL BIT GENERATOR (IC 4)
CYPRESS SEMICONDUCTOR

ADDRESS   PROM CONTENTS
(HEX)     (HEX)
00        00
01        00
02        00
03        00
04        00
05        01
06        01
07        00
08        00
09        00
0A        00
0B        01
0C        01
0D        00
0E        00
0F        00
10        00
11        01
12        01
13        00
14        00
15        00
16        00
17        01

Appendix D. PAL Equations For PROMADDR
PAL 22V10
FILENAME:PROMADDR
PROM ADDRESS GENERATOR (IC 3)
CYPRESS SEMICONDUCTOR
/TXCLKOUT /PUR /INCFADD NC4 NC5 NC6 NC7 NC8 NC9 NC10 NC11 GND
NC13 A0 A1 A2 A3 A4 /RELOAD NC20 NC21 NC22 NC23 VCC

EQUATIONS
A0 := /A0*INCFADD*/RELOAD
    + A0*/INCFADD*/RELOAD
A1 := /A1*A0*INCFADD*/RELOAD
    + A1*/A0*/RELOAD
    + A1*/INCFADD*/RELOAD
A2 := /A2*A1*A0*INCFADD*/RELOAD
    + A2*/A1*/RELOAD
    + A2*/A0*/RELOAD
    + A2*/INCFADD*/RELOAD
A3 := /A3*A2*A1*A0*INCFADD*/RELOAD
    + A3*/A2*/RELOAD
    + A3*/A1*/RELOAD
    + A3*/A0*/RELOAD
    + A3*/INCFADD*/RELOAD
A4 := /A4*A3*A2*A1*A0*INCFADD*/RELOAD
    + A4*/A3*/RELOAD
    + A4*/A2*/RELOAD
    + A4*/A1*/RELOAD
    + A4*/A0*/RELOAD
    + A4*/INCFADD*/RELOAD
RELOAD = PUR
       + Q4*Q3*/Q2*/Q1*/Q0
; T2CLKOUT = 6.312 MHz
; PUR = POWER-ON-RESET
; INCFADD = INCREMENT ADDRESS COUNT
; A0-A4 = PROM ADDRESS
; RELOAD = LOAD COUNTER WITH BASE COUNT

Appendix E. PAL Equations For DATASORT
PAL 22V10
FILENAME:DATASORT
ARRANGE DATA READY FOR PATTERN DETECTOR (IC 2)
CYPRESS SEMICONDUCTOR
/ICLK3 A4 A3 A2 A1 INPUT /NEXT /PUR NC9 NC10 NC11 GND
NC13 /Y /X /DSTAGGR D4 D3 D2 D1 D0 REGIN NC23 VCC

EQUATIONS
X := /X*/Y*NEXT*/DSTAGGR*/PUR
   + X*/Y*/PUR
   + X*/NEXT*/PUR
   + X*DSTAGGR*/PUR
Y := /Y*X*NEXT*/DSTAGGR*/PUR
   + Y*X*/PUR
   + Y*/NEXT*/PUR
   + Y*DSTAGGR*/PUR
DSTAGGR := /DSTAGGR*Y*/X*NEXT*/PUR
         + DSTAGGR*/PUR
; STATE DIAGRAM FOR YX SEQUENCER
;
;  PUR ----  NEXT*/DSTAGGR ----  NEXT*/DSTAGGR ----  NEXT*/DSTAGGR ----
; ---->| 0 |------>------| 1 |------>------| 3 |------>------| 2 |
;        |                    NEXT*/DSTAGGR                    |
;        --------------------------<---------------------------
; YX SEQUENCER = CONTROLS ARRANGEMENT OF DATA IN FIFO
; DSTAGGR = INDICATES WHEN DATA READY FOR PATTERN RECOGNITION
; PUR = POWER-ON-RESET
; NEXT = COUNTER O/P, CONTROLS DATA ORGANISATION INTO/OUT OF FIFO
REGIN := INPUT
D0 := /DSTAGGR
    + /D0*A1*DSTAGGR*/Y*/X
    + D0*Y
    + D0*X
    + D0*A1
D1 := /Y*/DSTAGGR
    + /D1*Y*X*/DSTAGGR
    + /D1*Y*/X*A2*/DSTAGGR
    + /D1*/Y*/X*A2*DSTAGGR
    + D1*A2
    + D1*X
    + D1*Y*DSTAGGR
D2 := /Y*/DSTAGGR
    + /D2*Y*/X*A3*/DSTAGGR
    + /D2*/Y*/X*A3*DSTAGGR
    + D2*A3
    + D2*Y*DSTAGGR
    + D2*X*DSTAGGR
D3 := /Y*/X*/DSTAGGR
    + /D3*X*A4*/DSTAGGR
    + /D3*Y*/X*A4*/DSTAGGR
    + /D3*/Y*/X*A4*DSTAGGR
    + D3*A4
    + D3*X*DSTAGGR
    + D3*Y*DSTAGGR
D4 := REGIN
; D0-D4 = OUTPUTS TO PATTERN RECOGNITION CIRCUITRY, ALSO
;         REGISTERED DATA BEING FED BACK INTO FIFO I/P STAGES
; A0-A4 = FIFO OUTPUTS BEING FED TO REGISTER
; INPUT = SERIAL DATA STREAM FROM RECEIVER I/P STAGE
; REGIN = REGISTERED I/P DATA

Appendix F. PAL Equations For CLKGEN
PAL 22V10
FILENAME:CLKGEN
CLOCK GENERATOR FOR DATA SORTING CIRCUITRY AND OPFIFO (IC 3)
CYPRESS SEMICONDUCTOR
/IPCLK /Y /X /DSTAGGR PLLOPCLK /PUR OPHFULL /LD49 /ALIGNF NC10 NC11 GND
NC13 /CLK3 /ICLK3 /WR /RD /CLK4 /ICLK4 /ENREAD RDCLK WRCLK NC23 VCC

EQUATIONS
CLK3 = IPCLK
ICLK3 = /IPCLK
CLK4 = PLLOPCLK
ICLK4 = /PLLOPCLK
WR = /ICLK3
RD = /Y*X*/DSTAGGR*ICLK4
   + Y*/DSTAGGR*ICLK4
   + /Y*/X*DSTAGGR*ICLK4
; IPCLK = MASTER CLOCK FROM DS-2 INTERFACE (6.312 MHz)
; YX SEQUENCER = USED TO CONTROL WHEN FIFO DATA RETRIEVED
; DSTAGGR = USED TO CONTROL WHEN FIFO DATA RETRIEVED
; PLLOPCLK = O/P FROM PHASE LOCK LOOP CIRCUIT (6.183 MHz),
;            DERIVED FROM 6.312 MHz MASTER CLOCK
; CLK3/ICLK3 = DERIVATION OF MASTER CLOCK (6.312 MHz)
; CLK4/ICLK4 = DERIVATION OF PHASE LOCKED O/P (6.183 MHz)
; WR = I/P STAGE FIFO SHIFT-IN
; RD = I/P STAGE FIFO SHIFT-OUT
WRCLK = ALIGNF*/LD49*/CLK3
RDCLK = ENREAD*CLK4
ENREAD := /ENREAD*OPHFULL*/PUR
        + ENREAD*/PUR
; WRCLK = SHIFT-IN SIGNAL TO O/P STAGE FIFO
; RDCLK = SHIFT-OUT SIGNAL TO O/P STAGE FIFO
; ENREAD = CONTROLS WHEN DATA CAN BE READ FROM O/P STAGE FIFO
; ALIGNF = "ALIGNMENT" INDICATOR
; LD49 = O/P STAGE FIFO SHIFT-IN DISABLE TERM
; OPHFULL = INDICATES WHEN O/P STAGE FIFO IS HALF FULL
; PUR = POWER-ON-RESET
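The ENREAD equation behaves as a set-once latch: reading from the output FIFO is enabled the first time the half-full flag is seen and stays enabled until a power-on reset. The Python sketch below is a behavioral illustration of that single equation and of the gated RDCLK; the function names and the sample input sequences are hypothetical and are not taken from the design files.

```python
def enread_next(enread, ophfull, pur):
    """ENREAD := /ENREAD*OPHFULL*/PUR + ENREAD*/PUR (registered)."""
    return bool((not enread and ophfull and not pur) or (enread and not pur))


def rdclk(enread, clk4):
    """RDCLK = ENREAD*CLK4 (combinatorial gating of the read clock)."""
    return bool(enread and clk4)


def simulate(ophfull_seq, pur_seq):
    """Clock the ENREAD register once per sample and record its value."""
    enread = False  # assumed state after power-up
    trace = []
    for ophfull, pur in zip(ophfull_seq, pur_seq):
        enread = enread_next(enread, ophfull, pur)
        trace.append(enread)
    return trace


# Half-full appears on the third clock; PUR is pulsed on the fifth clock.
trace = simulate(
    [False, False, True, False, False, True],
    [False, False, False, False, True, False],
)
print(trace)
```

In this model ENREAD does not drop when OPHFULL goes away again; only PUR clears it.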

Appendix G. PAL Equations For DSCOUNT
PAL 22V10
FILENAME:DSCOUNT
DIVIDE-BY-49 COUNTER FOR DATA SORTING PROCESS (IC 6)
CYPRESS SEMICONDUCTOR
/ICLK3 /PUR /DSTAGGR NC4 NC5 NC6 NC7 NC8 NC9 NC10 NC11 GND
NC13 Q0 Q1 Q2 Q3 Q4 Q5 /NEXT NC21 NC22 NC23 VCC

EQUATIONS
Q0 := /Q0*/DSTAGGR*/NEXT
    + Q0*DSTAGGR*/NEXT
Q1 := /Q1*Q0*/DSTAGGR*/NEXT
    + Q1*/Q0*/NEXT
    + Q1*DSTAGGR*/NEXT
Q2 := /Q2*Q1*Q0*/DSTAGGR*/NEXT
    + Q2*/Q1*/NEXT
    + Q2*/Q0*/NEXT
    + Q2*DSTAGGR*/NEXT
Q3 := /Q3*Q2*Q1*Q0*/DSTAGGR*/NEXT
    + Q3*/Q2*/NEXT
    + Q3*/Q1*/NEXT
    + Q3*/Q0*/NEXT
    + Q3*DSTAGGR*/NEXT
Q4 := /Q4*Q3*Q2*Q1*Q0*/DSTAGGR*/NEXT
    + Q4*/Q3*/NEXT
    + Q4*/Q2*/NEXT
    + Q4*/Q1*/NEXT
    + Q4*/Q0*/NEXT
    + Q4*DSTAGGR*/NEXT
Q5 := /Q5*Q4*Q3*Q2*Q1*Q0*/DSTAGGR*/NEXT
    + Q5*/Q4*/NEXT
    + Q5*/Q3*/NEXT
    + Q5*/Q2*/NEXT
    + Q5*/Q1*/NEXT
    + Q5*/Q0*/NEXT
    + Q5*DSTAGGR*/NEXT
NEXT = PUR
     + Q5*Q4*/Q3*/Q2*/Q1*/Q0*/DSTAGGR
; ICLK3 = 6.312 MHz CLOCK DERIVED FROM DS-2 INTERFACE
; DSTAGGR = INDICATES WHEN DATA IS READY TO BE INTERROGATED BY
;           PATTERN RECOGNITION CIRCUITRY
; PUR = POWER-ON-RESET
; Q0-Q5 = O/P STAGES OF COUNTER
; NEXT = LOAD-ALL-ZEROES COMMAND TO COUNTER

Appendix H. PAL Equations For CBITRCNT
PAL 22V10
FILENAME:CBITRCNT
CONTROL BIT REMOVAL INDICATOR/COUNTER (IC 7)
CYPRESS SEMICONDUCTOR
/CLK3 /RESET /START NC4 NC5 NC6 NC7 NC8 NC9 NC10 NC11 GND
NC13 Q0 Q1 Q2 Q3 Q4 Q5 /LD49 NC21 NC22 NC23 VCC

EQUATIONS
Q0 := /Q0*START*/LD49
    + Q0*/START*/LD49
Q1 := /Q1*Q0*START*/LD49
    + Q1*/Q0*/LD49
    + Q1*/START*/LD49
Q2 := /Q2*Q1*Q0*START*/LD49
    + Q2*/Q1*/LD49
    + Q2*/Q0*/LD49
    + Q2*/START*/LD49
Q3 := /Q3*Q2*Q1*Q0*START*/LD49
    + Q3*/Q2*/LD49
    + Q3*/Q1*/LD49
    + Q3*/Q0*/LD49
    + Q3*/START*/LD49
Q4 := /Q4*Q3*Q2*Q1*Q0*START*/LD49
    + Q4*/Q3*/LD49
    + Q4*/Q2*/LD49
    + Q4*/Q1*/LD49
    + Q4*/Q0*/LD49
    + Q4*/START*/LD49
Q5 := /Q5*Q4*Q3*Q2*Q1*Q0*START*/LD49
    + Q5*/Q4*/LD49
    + Q5*/Q3*/LD49
    + Q5*/Q2*/LD49
    + Q5*/Q1*/LD49
    + Q5*/Q0*/LD49
    + Q5*/START*/LD49
LD49 = Q5*Q4*/Q3*/Q2*/Q1*/Q0
     + RESET
; CLK3 = 6.312 MHz CLOCK DERIVED FROM THE DS-2 INTERFACE
; RESET = LOCALISED RESET GENERATED WHEN "ALIGNMENT" IS LOST
; START = INDICATES THAT THE FIRST CONTROL BIT SEQUENCE (01000)
;         HAS BEEN DETECTED
; Q0-Q5 = COUNTER O/P STAGES
; LD49 = LOAD-ALL-ZEROES COMMAND

Appendix I. PAL Equations For ALIGNDET
PAL 22V10
FILENAME:ALIGNDET
FRAME ALIGNMENT DETECTOR (IC 4)
CYPRESS SEMICONDUCTOR
/CLK3 D0 D1 D2 D3 D4 /PUR /E /D /C /LD49 GND
NC13 NC14 /MTRUE /FTRUE /START /RESET NC19 NC20 NC21 /MB /MA VCC

EQUATIONS
START := /START*/D4*D3*/D2*/D1*/D0
       + START*/RESET
FTRUE = /E*/D*C*LD49*/D4*/D1*START
      + E*D*/C*LD49*D4*/D1*START
MTRUE = E*D*C*LD49*D4*D3*/D0*START*/MB
      + E*D*C*LD49*D4*D3*/D0*START*MB*MA
      + E*D*C*LD49*/D4*D3*/D0*START*MB*/MA
RESET = PUR
      + E*D*C*LD49*/D4*START*/MB
      + E*D*C*LD49*/D3*START*/MB
      + E*D*C*LD49*D0*START*/MB
      + E*D*C*LD49*/D4*START*MB*MA
      + E*D*C*LD49*/D3*START*MB
      + E*D*C*LD49*D0*START*MB
      + E*D*C*LD49*D4*START*MB*/MA
      + /E*/D*C*LD49*D4*START
      + /E*/D*C*LD49*/D1*START
      + E*D*/C*LD49*/D4*START
      + E*D*/C*LD49*D1*START
; CLK3 = 6.312 MHz CLOCK DERIVED FROM DS-2 INTERFACE
; D0-D4 = DATA CHANNELS ON WHICH CONTROL-BIT-PATTERN-RECOGNITION
;         IS CARRIED OUT
; PUR = POWER-ON-RESET
; EDC = SEQUENCER USED WHEN SEEKING "ALIGNMENT"
; LD49 = INDICATES WHEN COMPARISON BETWEEN DATA CHANNELS
;        AND EXPECTED PATTERN SHOULD BE CARRIED OUT
; MTRUE = MULTI-FRAME DETECTION INDICATOR
; FTRUE = SUB-FRAME DETECTION INDICATOR
; START = INDICATES THAT THE FIRST CONTROL BIT PATTERN HAS BEEN
;         DETECTED
; RESET = ASSERTED WHEN ACTUAL AND EXPECTED CONTROL BIT PATTERNS
;         ARE NOT IN AGREEMENT
; MBMA = SEQUENCER ASSOCIATED WITH MULTI-FRAME DETECTION

Appendix J. PAL Equations For FRAMCHEK
PAL 22V10
FILENAME:FRAMCHEK
FRAME ALIGNMENT CHECKER AND OPFIFO WRITE CONTROLLER (IC 5)
CYPRESS SEMICONDUCTOR
/CLK3 /RESET /MTRUE /FTRUE /LD49 NC6 NC7 NC8 NC9 NC10 NC11 GND
NC13 /MB /MA /FB /FA /E /D /C /ALIGNF NC22 NC23 VCC

EQUATIONS
MB := /MB*MA*MTRUE*/RESET
    + MB*MA*/RESET
    + MB*/MTRUE*/RESET
MA := /MA*/MB*MTRUE*/RESET
    + MA*/MB*/RESET
    + MA*/MTRUE*/RESET
; M SEQUENCER STATE DIAGRAM
;
;  RESET ---   MTRUE ---   MTRUE ---   MTRUE ---
; ------>| 0 |---->----| 1 |---->----| 3 |---->----| 2 |
;          |                 MTRUE                   |
;          --------------------<---------------------
FB := /FB*FA*FTRUE*/RESET
    + FB*/FA*/RESET
    + FB*/E*/RESET
    + FB*D*/RESET
    + FB*/C*/RESET
FA := /FA*/FB*FTRUE*/RESET
    + FA*/FB*/RESET
    + FA*/E*/RESET
    + FA*D*/RESET
    + FA*/C*/RESET
; F SEQUENCER STATE DIAGRAM
;
;  RESET ---   FTRUE ---   FTRUE ---   E*/D*C ---
; ------>| 0 |---->----| 1 |---->----| 3 |---->----| 2 |
;          |                 E*/D*C                  |
;          --------------------<---------------------
E := /E*D*/C*LD49*/RESET
   + E*D*/RESET
   + E*/C*/RESET
D := /D*/E*C*LD49*/RESET
   + D*/E*/RESET
   + D*/C*/RESET
   + D*/LD49*/RESET
C := /E*/D*/C*LD49*/RESET
   + E*D*/C*LD49*/RESET
   + /E*/D*C*/RESET
   + E*D*C*/RESET
   + /E*C*/LD49*/RESET
; EDC SEQUENCER STATE DIAGRAM
;
;  RESET --- LD49 --- LD49 --- LD49 --- LD49 --- LD49 --- LD49 ---
; ------>| 0 |-->--| 1 |-->--| 3 |-->--| 2 |-->--| 6 |-->--| 7 |-->--| 5 |
;          |                                                           |
;          -------------------------------<----------------------------
ALIGNF := /ALIGNF*E*/D*C*/RESET
        + ALIGNF*/RESET
; ALIGNF STATE DIAGRAM
;
;                           ALIGNF
;  RESET ---   E*/D*C ---
; ------>| 0 |---->----| 1 |
;
; SEQUENCE OF EVENTS PRIOR TO ALIGNMENT DECLARATION:
;
; START-LD49-STRUE-LD49-LD49-LD49-STRUE-LD49-LD49-MTRUE
; CLK3 = 6.312 MHz CLOCK DERIVED FROM DS-2 INTERFACE
; RESET = ISSUED IF ACTUAL AND EXPECTED CONTROL BIT PATTERNS DO
;         NOT AGREE
; MTRUE = MULTI-FRAME DETECTION INDICATOR
; FTRUE = SUB-FRAME DETECTION INDICATOR
; LD49 = INDICATES WHEN COMPARISON BETWEEN ACTUAL AND EXPECTED
;        CONTROL BIT PATTERNS SHOULD TAKE PLACE
; MBMA = SEQUENCER ASSOCIATED WITH MULTI-FRAME DETECTION
; FBFA = SEQUENCER ASSOCIATED WITH SUB-FRAME DETECTION
; EDC = SEQUENCER USED IN DETERMINATION OF "ALIGNMENT"
; ALIGNF = WHEN TRUE INDICATES "ALIGNMENT" HAS BEEN ATTAINED
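The M sequencer equations can be exercised in software to confirm the 0, 1, 3, 2 state order drawn in its state diagram. The Python sketch below models only the MB and MA registers; the function names are hypothetical, and the state number is assumed to be 2*MB + MA.

```python
def m_next(mb, ma, mtrue, reset):
    """One clock of the M sequencer, term by term from the MB and MA equations."""
    nmb = ((not mb and ma and mtrue and not reset)
           or (mb and ma and not reset)
           or (mb and not mtrue and not reset))
    nma = ((not ma and not mb and mtrue and not reset)
           or (ma and not mb and not reset)
           or (ma and not mtrue and not reset))
    return bool(nmb), bool(nma)


def state_number(mb, ma):
    """Assumed encoding: MB is the high-order bit, MA the low-order bit."""
    return 2 * int(mb) + int(ma)


def walk(clocks):
    """Apply MTRUE on every clock, starting from the reset state 0."""
    mb, ma = False, False
    visited = []
    for _ in range(clocks):
        mb, ma = m_next(mb, ma, mtrue=True, reset=False)
        visited.append(state_number(mb, ma))
    return visited


visited = walk(5)
print(visited)
```

With MTRUE held inactive every state holds its value in this model, and RESET returns the sequencer to state 0 from any state.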

Using CUPL With Cypress PLDs
This application note covers the following topics:
CUPL package components
CUPL programming language syntax
CUPL examples, using Cypress PLDs
CUPL compiling
A high-level universal language for programmable
logic devices (PLDs), CUPL works with schematic capture
packages such as SCHEMA and OrCAD-SDT and can port to
UNIX-based systems.

CUPL Package Components
The CUPL package consists of CUPL (Universal
Compiler for Programmable Logic), CSIM (CUPL
Simulator), CBLD (CUPL Build), and PTOC
(PALASM to CUPL Translator).

CUPL
The major component of the CUPL package is the
CUPL program. This program allows you to compile logic
description files that can be downloaded to a device
programmer. CUPL supports Cypress's entire 20-pin
PAL family, the PAL C 22V10, the PAL C 20G10,
and the CY7C33x family of parts.
In addition to providing a programming syntax
similar to that of other PLD programming packages,
CUPL helps implement lists, address ranges, and bit
fields efficiently. CUPL includes state machine syntax
(SMS) and truth-table input capability, allowing you to
enter complex designs easily into Cypress's PLDs.
CUPL also has four levels of minimization for logic
reduction.
CUPL comes with a menu-driven interface and a
DOS command-line interface (the latter is explained in
the last section of this application note). The menu
interface integrates all the features necessary for efficient
design implementation, including a program and
JEDEC file editor, compiler, and simulator (Figure 1).

[Menu screen. Legible entries: Compile CUPL file; Look at DOC
file; Review error LST file; JEDEC file editor; Input simulation
file; Simulate CUPL file; View Simulation Results; Device
Selection; Help (CUPL Quick Reference); Tutorial for PLD's;
Quit. Message area: "Allows you to edit or convert a design
file."]
Figure 1. Menu Interface Screen

CSIM
CSIM, the file simulator for CUPL, takes an ASCII
file as input (filename.SI) and outputs a file called
filename.SO. The input file functionally describes the
part by specifying the device's input and expected output.
The output file contains a comparison of the
device's expected output with its actual output; this is
based on a file created by CUPL during compilation
called the absolute file, filename.ABS. The comparison
file contains the original header information found in
filename.SI, all vectors that compared positively, and all
discrepancies. CSIM flags the discrepancies with the
values determined from the original logic equations.
The CSIM command line is shown in Figure 2.
When running CSIM with the -w or -d flag, you can
change the view of the waveform by using the keys
shown in Figure 3.

CBLD
The CBLD program allows you to maintain and
personalize CUPL device libraries. Figure 4 shows the
CBLD command line. You can use CBLD to create
custom library files consisting, for example, of only the
parts you currently use. The structure of this ASCII text
file appears in Figure 5.
CBLD also checks to see if the current CUPL version
matches the current version of the device library. If
the key in the library does not match the CUPL version,

CBLD generates an error message, and compilation is
aborted. The file CUPL.DL contains a description of all
devices supported by the current version of CUPL.

CSIM [flags] [library] source

Where:
[flags] may have the following values
-l    create listing file
-j    append test vectors to JEDEC file
-v    display simulation to screen
-u    use specified library
-w    (MS-DOS only) create listing file and display waveforms
-d    (MS-DOS only) display an existing simulation output
      file in waveform format

[library] is the name of the library that contains the device
which was used when CUPL compiled the original source file.

source is the name of the ASCII source file

Figure 2. CSIM Command Line

Keys: arrow keys, +, F1, F2, F3, F4, F5, F6, F9, F10
Functions: Scroll Right, Scroll Left, Scroll Up, Scroll Down,
Decrease scale horizontally, Enlarge scale horizontally,
Grid on/off, Exit to DOS, Shift screen left, Shift screen right,
Create waveform hardcopy, Waveform legend

Figure 3. CSIM Waveform Viewing Commands

CBLD [flags] [build] [library] [devices]

Where:
[flags] may have the following values
-b    generate library using build file
-l    list long contents of library
-m    list allowable macros by pin
-t    list short contents of library
-u    use specified library
-e    list allowable extensions for devices

[Build] is the name of the build file to be used with the
-b option flag

[Library] is a device library name and path name to be used
with the -u option

[Devices] is one or more device names to be used with the
-t or -l option

Figure 4. CBLD Command Line

TARGET library
SOURCE library1
devices | *
SOURCE library2
devices | *

Where:
TARGET identifies the new library.
SOURCE identifies the source libraries.
library indicates the target library name.
library1 and library2 indicate source library names
devices describes devices that are contained in the libraries
* is used to describe all devices in a library

Figure 5. CBLD Custom Library Build File Format

CUPL Programming Language Elements
The CUPL programming language's elements and
syntax are very similar to those of other languages.
Reserved words that cannot be used as variable names
are listed in Figure 6.

APPEND      ASSEMBLY    ASSY        COMPANY     CONDITION
DATE        DEFAULT     DESIGNER    DEVICE      ELSE
FIELD       FLD         FORMAT      FUNCTION    IF
JUMP        LOC         LOCATION    MACRO       MIN
NAME        NODE        OUT         PARTNO      PIN
PINNODE     PRESENT     REV         REVISION    SEQUENCE
SEQUENCED   SEQUENCEJK  SEQUENCERS  SEQUENCET   TABLE

Figure 6. CUPL Reserved Words

You can use alternate number bases in CUPL by
putting the base's name within single quotes immediately
before the number. The designations for the supported
number bases appear in Table 1. For example, to
assign the hexadecimal value 16 to the variable "A,"
write:
A = 'h'16

Table 1. Number Base Representation
Base Name     Base   Prefix
Binary        2      'b'
Octal         8      'o'
Decimal       10     'd'
Hexadecimal   16     'h'

Table 2. CUPL Logical Operators
Operator   Example   Description
!          !A        NOT
&          A&B       AND
#          A#B       OR
$          A$B       XOR

You can place an "X" within any number to indicate a
Don't Care value. Appendix A shows an example
of using the Don't Care specification within truth tables.
Comments are delimited with /* and */. The CUPL
compiler ignores everything between these characters.
For example, to put a paragraph of explanation within a
program, enclose the entire paragraph in a set of comment
delimiters. You do not have to put delimiters on
every line, as in some packages.
CUPL also supports list notation. Enclose all items
in the list in square brackets:
[variable, variable, variable, ...]
When using sequentially numbered lists, you can
abbreviate the format to
[variablem..n]
CUPL's format can be considered in three major
parts: the header, pin/node definition, and equations
sections. The header section contains general information
about the design. The pin/node section assigns variable
names to the device's pins and nodes. The equations
section declares the device's function and can include
truth tables, state machine syntax, Boolean equations,
or a combination of these three. (Sample CUPL
programs are listed in the appendices and are described
later in this application note.)

Header Section
Figure 7 shows the header format. The NAME
descriptor must be followed by the name for the
JEDEC map output, and the DEVICE descriptor must
specify the device library for use during compilation. If
you specify a different device file on the command line
when you invoke the compiler, this file overrides the
name found after DEVICE in the programming file.

NAME;
PARTNO;
REVISION;
DATE;
DESIGNER;
COMPANY;
ASSEMBLY;
LOCATION;
DEVICE;
FORMAT;

Figure 7. CUPL Header Format

Pin/Node Section
The pin declaration assigns specific pins to variable
names using the format
PIN pin_n = [!]var;
Both pin_n and var can be lists. Use the "!" with
inputs to indicate an active Low. The compiler chooses
the signal's inverted sense when it is indicated as active
in the logic equations. Use the "!" with outputs to indicate
an active-Low output, and write the equations in a
logically true form. In this case, the compiler performs
DeMorgan's Theorem on the output variable to ensure
that the output is a Low-asserted signal.
The NODE declaration statement tells the compiler
that a variable is needed to hold some kind of
state information within the device. This variable's outputs
are not assigned to any output pin. You can use
the NODE statement to assign variable names, and
thus functions, to the buried registers in the CY7C330.
Or you might use the NODE statement to arbitrarily assign
a variable name to any unused macrocell in a PAL
C 22V10. This statement has the form
NODE [!]var;
Because the NODE statement arbitrarily assigns a
register to the specified variable name, it might be more
desirable to force the assignment of a variable to a
specific node. You can do this with the PINNODE
statement:
PINNODE node_n = [!]var;
The FIELD assignment assigns a group of signals
to one variable name. This feature is useful for address
decoding and with truth tables, as shown in Appendix A.
The FIELD statement has the form:
FIELD var = [var,var, ...,var]
The MIN declaration overrides the minimization
level for a specific variable. This is useful, for example,
in designs where a portion of the design should not be
minimized. The MIN declaration has the form
MIN var[.ext] = level;
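The number-base prefixes of Table 1 and the sequentially numbered list shorthand can be illustrated with a small parser. The Python sketch below is not part of CUPL or its tool set: the function names parse_number and expand_list are hypothetical, Don't Care digits are not handled, and the list form is assumed to look like [Q0..3].

```python
import re

# Prefix letter to radix, as listed in Table 1
BASES = {"b": 2, "o": 8, "d": 10, "h": 16}


def parse_number(text):
    """Convert a CUPL-style literal such as 'h'16 to an integer."""
    match = re.fullmatch(r"'([bodhBODH])'([0-9A-Fa-f]+)", text.strip())
    if match is None:
        raise ValueError("not a prefixed number: " + text)
    prefix, digits = match.groups()
    return int(digits, BASES[prefix.lower()])


def expand_list(text):
    """Expand a numbered list such as [Q0..3] into its member names."""
    match = re.fullmatch(r"\[([A-Za-z_]+)(\d+)\.\.(\d+)\]", text.strip())
    if match is None:
        raise ValueError("not a numbered list: " + text)
    name, first, last = match.group(1), int(match.group(2)), int(match.group(3))
    step = 1 if last >= first else -1
    return [name + str(i) for i in range(first, last + step, step)]


print(parse_number("'h'16"))   # hexadecimal 16 is decimal 22
print(expand_list("[Q0..3]"))
```

Note that 'h'16 denotes hexadecimal 16, which is 22 in decimal, so the prefix matters whenever a constant contains only the digits 0 through 9.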

CUPL also contains several preprocessor commands
that operate on the source file before the file is
passed on to the parser. These commands perform
functions such as string substitution, file inclusion, and
conditional compilation. The commands allow you to
develop general-purpose descriptions or modular portions
of descriptions and customize them for different
applications. Appendix D shows how to use the
preprocessor command $DEFINE to assign numbers to
state variables.

CUPL Programming Language Syntax
This section focuses on CUPL's equation section.
The program's logical and arithmetic operators (Tables
2 and 3, respectively) resemble those used in other
programming languages.

Table 3. CUPL Arithmetic Operators
Operator   Example   Operation   Priority
+          A+B       Add         1
-          A-B       Subtract    1
*          A*B       Multiply    2
/          A/B       Divide      2
%          A%B       Modulus     2
**         A**B      Exponent    3

A variable's function depends on the extension
added to it in the logic equation. These extensions
define such capabilities as flip-flop descriptions and
programmable three-state enables. The first column of
Figure 8 lists the extension that is used after the variable
name. The second column indicates the side of the
equation on which the extension is used. The third
column briefly describes the extension's function. For
example, the .OE extension controls the output-enable
function for all Cypress PLDs with I/O pins; the
.CKMUX extension selects the source for the input-register
clock in the CY7C330 and CY7C332; and .D
selects registered output on devices that have both
combinatorial and registered outputs.

Ext      Side   Description
.D       L      D input of D flip-flop
.L       L      D input of latch
.J       L      J input of JK flip-flop
.K       L      K input of JK flip-flop
.S       L      S input of SR flip-flop
.R       L      R input of SR flip-flop
.T       L      T input of T flip-flop
.DQ      R      Q output of D flip-flop
.LQ      R      Q output of a latch
.AP      L      Asynch preset of flip-flop
.AR      L      Asynch reset of flip-flop
.SP      L      Synch preset of flip-flop
.SR      L      Synch reset of flip-flop
.CK      L      Programmable clock of flip-flop
.OE      L      Programmable OE
.CA      L      Complement array
.PR      L      Programmable preload
.CE      L      CE input of enabled D-CE type flip-flop
.LE      L      Programmable latch enable
.OBS     L      Programmable observability of buried nodes
.BYP     L      Register bypass
.DFB     R      D feedback selection
.LFB     R      Latch feedback selection
.TFB     R      T feedback selection
.IO      R      Pin feedback selection
.INT     R      Internal feedback selection
.CKMUX   L      Clock MUX selection
.OEMUX   L      Tri-state MUX selection
.TEC     L      Technology-dependent fuse selection
.IMUX    L      Input MUX selection of two pins
.T1      L      T1 input of 2-T flip-flop
.T2      L      T2 input of 2-T flip-flop
.IOD     R      Pin feedback path through D register
.IOL     R      Pin feedback through Latch
.IOCK    L      Clock for pin feedback register
.IOAR    L      Asynchronous reset for pin feedback register
.IOAP    L      Asynchronous preset for pin feedback register
.IOSR    L      Synchronous reset for pin feedback register
.IOSP    L      Synchronous preset for pin feedback register
.ARMUX   L      Asynchronous reset MUX selection
.APMUX   L      Asynchronous preset MUX selection
.LEMUX   L      Latch enable MUX selection

Figure 8. CUPL Variable Extensions

To see the extensions you can use with a specific
Cypress part, use the CBLD program. To see all the
possible extensions for use when programming the PAL
C 22V10, for example, the command line is
CBLD -e CUPL P22V10
You can use the APPEND statement to assign
more than one expression to a variable. This is the same
as logically ORing the variable's present state with the
expression that follows the APPEND statement. The
latter has the form
APPEND [!]var[.ext] = expr;
CUPL also has several powerful set operations that
you can use to increase code readability and decrease
the amount of equation input. For example,
[var1, var2, var3] & var4;
equates to
[var1&var4, var2&var4, var3&var4]
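The set expansion shown above can be mimicked in a few lines to see how one operand is distributed over every member of a list. The Python sketch below is only a model of that expansion rule, not of the CUPL compiler; the function names and the sample signal values are hypothetical.

```python
def expand_and(variables, operand):
    """Distribute '& operand' over each list member, as in the example above."""
    return [v + "&" + operand for v in variables]


def evaluate_and(variables, operand, values):
    """Evaluate each expanded term for a given assignment of signal values."""
    return [bool(values[v] and values[operand]) for v in variables]


terms = expand_and(["var1", "var2", "var3"], "var4")
print(terms)

# Hypothetical signal levels used only to exercise the expansion
levels = {"var1": True, "var2": False, "var3": True, "var4": True}
print(evaluate_and(["var1", "var2", "var3"], "var4", levels))
```

Each list member yields its own product term, so a long list combined with a complex operand can grow the equation set quickly.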

Use set operations such as this with caution to ensure
that when CUPL expands an expression, the result
represents the minimum amount of logic needed to
completely specify the desired operation. To see if a set
of variables equals a constant, type
[var1, var2, var3]:constant
Or to check whether a set of variables lies between
a range of constants, type
[var1, var2, var3]:[constant_lo..constant_hi]
CUPL supports truth tables with the format shown
in Figure 9. Truth tables are one of the easiest ways to
express device function, and they are among the most
easily modified methods of design entry. You specify
the input and output variable lists, then specify a
one-to-one assignment from the value of the input variable
list to the value of the output variable list. You can use
Don't Care values in the input specifications to make
design entry easier. An example of truth tables with
Don't Care values is shown in Appendix A.

TABLE var_list_1 => var_list_2
{
input_1 => output_1;
input_2 => output_2;
.
.
input_n => output_n;
}

Where:
var_list_1 are the input variables
var_list_2 are the output variables
input_n is the value of the inputs (hex by default)
output_n is the value of the outputs

Figure 9. Truth Table Entry Format

The state machine syntax of CUPL has the general
form of
SEQUENCE state_var_list
{
PRESENT state_1
statements;
.
.
PRESENT state_n
statements;
}
where SEQUENCE is the state space, and PRESENT
indicates the device's present state and the function the
machine should perform based on that state.
The state machine syntax can be divided into six
parts:
1. Unconditional Next Statement (Figure 10): If the
machine is in state_n, then transition to state_m.
PRESENT state_n
NEXT state_m;

PRESENT 'b'01
NEXT 'b'10;

Figure 10. Unconditional Next State Diagram

2. Conditional Next Statement (Figure 11): If the
machine is in state_n and if expr_1 is true, then transition
to state_m, else if expr_n is true, transition to
state_y, else transition to state_z.
PRESENT state_n
IF expr_1 NEXT state_m;
.
.
IF expr_n NEXT state_y;
[DEFAULT NEXT state_z;]

PRESENT 'b'01
IF INPUTA NEXT 'b'10;
IF !INPUTA NEXT 'b'11;

Figure 11. Conditional Next Statement Diagram

3. Unconditional Synchronous Output Statement
(Figure 12): This statement describes a transition from
the present state to a next state with a synchronous output
accompanying the transition.
PRESENT state_n
NEXT state_n OUT [!]var ... OUT [!]var;

PRESENT 'b'01
NEXT 'b'10 OUT Y OUT !Z;

Figure 12. Unconditional Synchronous Output Diagram

4. Conditional Synchronous Output Statement (Figure 13):
This statement describes a conditional transition
with its associated synchronous outputs.
PRESENT state_n
IF expr NEXT state_1 OUT [!]var ... OUT [!]var;
IF expr NEXT state_n OUT [!]var ... OUT [!]var;
[DEFAULT NEXT state_m OUT [!]var;]

PRESENT 'b'01
IF INPUTA NEXT 'b'10 OUT Y;
IF !INPUTA NEXT 'b'11 OUT !Z;

Figure 13. Conditional Synchronous Output Diagram

5. Unconditional Asynchronous Output Statement
(Figure 14): This statement describes the asynchronous
outputs associated with a specific state.
PRESENT state_n
OUT [!]var ... OUT [!]var;

PRESENT 'b'01
OUT Y OUT !Z;

Figure 14. Unconditional Asynchronous Output Diagram

6. Conditional Asynchronous Output Statement
(Figure 15): This statement describes a conditional
asynchronous output associated with a specific state and
a specific input.
PRESENT state_n
IF expr OUT [!]var ... OUT [!]var;
.
.
IF expr OUT [!]var ... OUT [!]var;
[DEFAULT OUT [!]var ... OUT [!]var;]

PRESENT 'b'01
IF INPUTA OUT X;
IF !INPUTB OUT Y;
DEFAULT OUT Z;

Figure 15. Conditional Asynchronous Output With Default

CUPL Examples Using Cypress PLDs
The two examples described here both implement
the functions of a Thunderbird's (T-Bird's) tail lights,
including the sequentially flashing directional signals.
The examples present this function in both the truth
table and state machine formats to give you models of
these CUPL syntax structures.

Truth Table Example
The first example shows how to configure a 22V10
so that it makes two three-segment T-Bird tail lights
perform flashing, braking, left turn, right turn, and a
combination of these functions.
Consider the truth table example first. This example
illustrates both the Truth Table syntax and
CUPL's pin declarations. Note that when you use a
truth table, you must assign all inputs to a variable name
using the FIELD statement. Similarly, you must assign
all outputs to a variable name. All the inputs and outputs
in the body of the truth table must be specified
without commas, brackets, or variables. The CUPL 3.2
source code for this example is shown in Appendix A.
CUPL's simulator verifies that this truth table
operates correctly. When compiling the source code,
you must use the -A flag to produce an absolute file for
the simulator's use. The simulator also needs an input
file, filename.SI, which contains the test vectors. To
simulate a design with output going to both the screen
and a listing file, filename.SO, type
CSIM -L -v FILENAME
Appendices B and C list the input and output
simulation files, respectively.

State Machine Examples with the CY7C330
The second example performs the same function as
the first, but is coded in CUPL's state machine syntax
instead of truth tables. This second example also differs
in that it employs Cypress's CY7C330.
The CY7C330 is a high-performance, erasable
programmable logic device (EPLD). Through the use of
the user-configurable output macrocell, bidirectional
I/O capability, input registers, and three separate

uses the XOR term to invert an equation's polarity
when an active-Low output signal is specified. Using the
XOR term in this example greatly reduces the number
of product terms needed to specify the design. By connecting the signal name to the XOR product term, as
shown in the equations, the equations represent a T
flip-flop.
For example, the equations for CNT2 specify that
the flip-flop toggles (a) when preloading the lower limit,
for CNT2 not equal to LL2, (b) when preloading the
upper limit, for CNT2 not equal to UL2, (c) when
counting UP, for CNT0 and CNT1 High, and (d) when
counting DOWN, for CNT0 and CNT1 Low. It is
important to keep in mind that UP, UEQUAL, and
LEQUAL are Low-asserted internal signals.
The part utilization for this design is shown in Appendix E. The CUPL design file appears in Appendix F.
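Toggle conditions (c) and (d) can be illustrated with a three-bit behavioral model of a T flip-flop counter. The Python sketch below is an illustration only and is not the Appendix F design: it omits the preload conditions (a) and (b), models the direction as an active-High flag named up (the text notes that UP is Low-asserted inside the device), and uses hypothetical function names.

```python
def toggles(q0, q1, up):
    """Toggle inputs for bits 0 to 2 of an up/down counter."""
    t0 = True                                          # bit 0 toggles every clock
    t1 = q0 if up else not q0                          # carry or borrow out of bit 0
    t2 = (q0 and q1) if up else (not q0 and not q1)    # conditions (c) and (d)
    return [bool(t0), bool(t1), bool(t2)]


def step(count, up):
    """Advance a 3-bit count by one clock using T flip-flops (Q_next = Q xor T)."""
    q = [(count // 2 ** i) % 2 == 1 for i in range(3)]
    t = toggles(q[0], q[1], up)
    nq = [qi != ti for qi, ti in zip(q, t)]
    return sum(2 ** i for i, bit in enumerate(nq) if bit)


print([step(n, True) for n in range(8)])
print([step(n, False) for n in range(8)])
```

Because each bit needs only its toggle condition, the equations stay short; this is the product-term saving that the XOR term provides.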

clocks, Cypress has tailored the CY7C330's architecture
to implement high-performance state machines.
This 28-pin device contains 11 dedicated input
macrocells, whose input registers can be controlled by
either of two input-register clocks. The 12 I/O macrocells (see Figure 1 in "Using ABEL to Program the
CY7C330") contain an output register that is controlled
by a dedicated state-register clock, output-enable control, an exclusive-or product term, an input register, and
feedback selection. Each macrocell has between nine
and 19 product terms you can use for design implementation. Each pair of macrocells also has a shared input
multiplexer, which allows you to bury an output register
while still utilizing the I/O pin as a device input. The
CY7C330's output enable can be controlled by either
pin 14 or a product term. The device also provides four
buried registers that can hold state information.
The T-Bird design requires only four flip-flops
[Q0..3] to specify all possible tail-light combinations.
Note that assignments such as LEFT.D = 'b'001 are
not allowed in the main body of the state machine structure. Instead, all outputs must be handled individually
with the OUT command. The source code for this example appears in Appendix D.
An additional CY7C330 example shows the extended function of this PLD family. The CY7C330, unlike the PALC22V10, has more nodes than pins. Thus,
the additional nodes must be assigned node numbers so
that they can be referenced in the design. Table 4 lists
the node names. Numbers 33 to 44 refer to the output
register associated with each pin. IMUX1 refers to the
shared input multiplexer between pins 28 and 27.
The second CY7C330 design example is an
up/down counter with preloadable limits. The lower
limits are loaded into the dedicated input registers on the
rising edge of the lower-limit clock (LLC), and the
upper limits are loaded into the I/O macrocells' input
registers on the rising edge of the upper-limit clock
(ULC). The waveforms for preloading the upper limits
and lower limits are shown in Figure 16.
When preloading is done, the counter counts upward from the last loaded limit until the other limit is
reached. The counter then counts in the opposite direction until reaching the other limit. The waveforms for
counting between the preloaded limits of 4 and 8 are
shown in Figure 17. If the input register on a specific pin
is not being used, you can reference the output register
by referring to the I/O pin name. This is shown on pins
20 and 23.
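The counting behavior between the preloaded limits can be sketched as a behavioral model. The Python below is not the device logic; it assumes lower < upper, a start value within the limits, and an initial upward direction, and the function name count_between is hypothetical.

```python
def count_between(lower: int, upper: int, start: int, steps: int) -> list[int]:
    """Count up to `upper`, then down to `lower`, reversing at each limit.

    Assumes lower < upper and lower <= start <= upper.
    """
    value = start
    going_up = True
    history = []
    for _ in range(steps):
        # Reverse direction when the limit in the current direction is reached.
        if going_up and value == upper:
            going_up = False
        elif not going_up and value == lower:
            going_up = True
        value += 1 if going_up else -1
        history.append(value)
    return history


# Limits of 4 and 8, as in the waveform example in the text.
sequence = count_between(lower=4, upper=8, start=4, steps=10)
print(sequence)
```

In the CY7C330 design the two comparisons correspond to the UEQUAL and LEQUAL internal signals, and the direction flag corresponds to UP.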
The CY7C330's shared input multiplexer is used to
select an additional input into the product term array
from either of a macrocell pair's input registers (and
thus either macrocell's I/O pin). When referencing this
input-signal name in the equations section, you must use
the MUX name instead of the actual input signal name.

CY7C332 State Machine Example

The last example uses the Cypress CY7C332. This
versatile combinatorial PLD has 25 array inputs: 13
dedicated inputs and 12 I/O inputs. Each input has a
macrocell that you can configure as a register, latch, or
simple buffer. Outputs have polarity and three-state

Table 4. Cypress CY7C330 Node Assignments

PIN      NODE
BR0      29
BR1      30
BR2      31
BR3      32
28       33
27       34
26       35
25       36
24       37
23       38
20       39
19       40
18       41
17       42
16       43
15       44
IMUX1    45
IMUX2    46
IMUX3    47
IMUX4    48
IMUX5    49
IMUX6    50

Another important CY7C330 feature is the XOR
product term. During DeMorgan minimization, CUPL

[Waveform plots not reproduced. Signals traced: CLK, LLC, ULC, LL0-LL7, LPL, UL0-UL7, UPL, CNT0-CNT7, UEQUAL, UP, PLDONE, LEQUAL, !CNTOE, !RESET]

Figure 17. Up/Down Counter Operation Waveforms


[Macrocell diagram not reproduced. Labeled elements: OE product term, sum of products, XOR, input buffer, I/O pin, OE (pin 14), CLK1, CLK2]

Figure 19. The CY7C332 I/O Macrocell

control product terms. Figure 18 shows the I/O macrocell. Each macrocell has up to 19 product terms to accommodate complex applications.
In this example, the CY7C332 serves as a simple
decoder (Appendix G). The device decodes a group of
address lines to select one of four "windows" in memory.
Inputs are implemented in each of the possible macrocell configurations. When reviewing the example code,
it is important to note the use of the .CKMUX,
.LEMUX, .DQ, and .LQ extensions.

JED file for programming, a .LST error listing, and a
.DOC equations-and-utilization file. You can compile a
file either from the DOS command line or from the
CUPL menu structure. The compilation command and
its description are shown in Figure 19.

References
This Application Handbook provides a more
detailed explanation of the up/down counter example
using the CY7C330 in "Understanding the CY7C330
Synchronous EPLD." More information on the
CY7C33x family can be found in Cypress's
BiCMOS/CMOS Data Book.

CUPL Compilation
The input to the CUPL system is an ASCII text file
with extension .PLD. The various outputs include a


cupl [-flags]
source

[library]

[device]

Where -flags is the following set of
compiler options:

-j     JEDEC download format
-n     use source filename as JEDEC filename
-h     ASCII-HEX download format
-i     HL download format
-a     create absolute file (for simulation purposes)
-l     create listing file
-x     create expanded product terms in documentation file
-f     create fuse plot/chip diagram in documentation file
-p     create PDEF database interchange format file
-b     create Berkeley PLA format file
-d     deactivate unused OR terms
-r     disable product term merging
-g     program security fuse
-u     use specified library for compilation
-s     perform logic simulation after compile
-e     create expanded macro definition file
-x     generate a part usage documentation file (filename.DOC)
-w     perform simulation with waveform output (PC only)
-m0    no minimization
-m1    quick minimization (default)
-m2    minimization level 2 (Quine-McCluskey)
-m3    minimization level 3 (Presto)
-m4    minimization level 4 (Espresso)
-c     create PALASM format file

Library is the library name, including
the path, that should be used instead
of the default library.
This option is used in conjunction with the -u flag.
Device is the CUPL mnemonic name of
the device that should be used when
compiling the source file.
This option overrides the name used in the
CUPL source file.
Source is the user-created ASCII
logic description file (filename.PLD).
Figure 20. CUPL Compilation


Appendix A. T-Bird Truth-Table CUPL Code for PALC22V10

Name            TBIRD_TT.PLD;
Partno          PALC22V10;
Revision        01;
Date            04-08-90;
Designer        Joe Designer;
Company         Cypress Semiconductor;
Location        U1;
Assembly        Test;
Device          P22V10;

/*
This program implements the control signals for the tail lights
of a Thunderbird.  The lights have three segments for both the
left and right tail light.  The control signals into the device
include a Left and Right signal, a Flash signal (Hazard), a brake
signal, and an ignition signal (IGN).  The outputs of the device
are the six separate tail light segments.  A Truth Table is used
to specify the control logic.
*/

PIN 1        = CLK;        /* Clock for Device         */
PIN 4        = LT;         /* Left turn signal         */
PIN 5        = RT;         /* Right turn signal        */
PIN 6        = BRAKE;      /* Brake signal             */
PIN 7        = FLASH;      /* Hazard flash signal      */
PIN 8        = IGN;        /* Ignition input           */

PIN 21       = RI;         /* Right inside tail light  */
PIN 22       = RM;         /* Right middle             */
PIN 23       = RO;         /* Right outside            */
PIN 16       = LI;         /* Left inside              */
PIN 15       = LM;         /* Left middle              */
PIN 14       = LO;         /* Left outside             */
PIN [17..20] = [Q0..3];    /* State variable holders   */

FIELD INPUTS  = [IGN,FLASH,LT,RT,BRAKE,LO,LM,LI,RI,RM,RO];
FIELD OUTPUTS = [LO.D,LM.D,LI.D,RI.D,RM.D,RO.D];

TABLE INPUTS => OUTPUTS {

/* Quiescent state */
'B'11000XXXXXX => 'B'0;
'B'01XX0XXXXXX => 'B'0;

/* Flash */
'B'X0XXX111111 => 'B'0;
'B'X0XXX000000 => 'B'111111;

/* Brake */
'B'X1001XXXXXX => 'B'111111;

/* Left turn */
'B'11100000XXX => 'B'001000;
'B'11100001XXX => 'B'011000;
'B'11100011XXX => 'B'111000;
'B'11100111XXX => 'B'0;

/* Right turn */
'B'11010XXX000 => 'B'000100;
'B'11010XXX100 => 'B'000110;
'B'11010XXX110 => 'B'000111;
'B'11010XXX111 => 'B'0;

/* Left turn and brake */
'B'11101000XXX => 'B'001111;
'B'11101001XXX => 'B'011111;
'B'11101011XXX => 'B'111111;
'B'11101111XXX => 'B'000111;

/* Right turn and brake */
'B'11011XXX000 => 'B'111100;
'B'11011XXX100 => 'B'111110;
'B'11011XXX110 => 'B'111111;
'B'11011XXX111 => 'B'111000;

/* Both turn - light flash in reverse sequence */
'B'11110000000 => 'B'111111;
'B'11110111111 => 'B'011110;
'B'11110011110 => 'B'001100;
'B'11110001100 => 'B'0;

/* Illegal condition - All on */
'B'11111000000 => 'B'100001;
'B'11111100001 => 'B'010010;
'B'11111010010 => 'B'001100;
'B'11111001100 => 'B'0;
}
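The way a truth-table row with don't-care (X) positions selects the next output value can be illustrated with a short behavioral model. The Python sketch below copies the four left-turn rows from the truth table in Appendix A; the helper names matches, lookup, and step are hypothetical, 'B'0 is written out as six zeros, and the model ignores clocking.

```python
# Input bit order: IGN, FLASH, LT, RT, BRAKE, LO, LM, LI, RI, RM, RO
# Output bit order: LO, LM, LI, RI, RM, RO
LEFT_TURN = [
    ("11100000XXX", "001000"),
    ("11100001XXX", "011000"),
    ("11100011XXX", "111000"),
    ("11100111XXX", "000000"),
]


def matches(pattern: str, bits: str) -> bool:
    """True when every non-X position of the pattern equals the input bit."""
    return len(pattern) == len(bits) and all(
        p in ("X", b) for p, b in zip(pattern, bits)
    )


def lookup(table, bits):
    """Return the output field of the first matching row, or None."""
    for pattern, outputs in table:
        if matches(pattern, bits):
            return outputs
    return None


def step(outputs: str) -> str:
    """One clock with IGN=1, FLASH=1, LT=1, RT=0, BRAKE=0; outputs fed back."""
    return lookup(LEFT_TURN, "11100" + outputs)


state = "000000"
sequence = []
for _ in range(4):
    state = step(state)
    sequence.append(state)
print(sequence)
```

Because the current outputs are part of the input field, the table itself encodes the sequencing, which is why no separate state variables appear in the rows.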


Appendix B. T-Bird Simulator Input

Name            TBIRD_TT.PLD;
Partno          PALC22V10;
Revision        01;
Date            04-08-90;
Designer        Joe Designer;
Company         Cypress Semiconductor;
Location        U1;
Assembly        Test;
Device          P22V10;

ORDER: "INPUTS-", CLK, IGN, FLASH, LT, RT, BRAKE,
       "OUTPUTS-", LO, LM, LI, RI, RM, RO;

VECTORS:
$MSG " QUIESCENT STATE - 1";
C11000 LLLLLL
$MSG " QUIESCENT STATE - 1";
C01XX0 LLLLLL
$MSG "";
$MSG " FLASH HIGH";
C00XXX HHHHHH
$MSG " FLASH LOW";
C00XXX LLLLLL
$MSG " FLASH HIGH";
C00XXX HHHHHH
$MSG "";
$MSG " BRAKE";
CX1001 HHHHHH
$MSG "";
$MSG " LEFT TURN OFF";
C11100 LLLLLL
$MSG " LEFT TURN 1";
C11100 LLHLLL
$MSG " LEFT TURN 2";
C11100 LHHLLL
$MSG " LEFT TURN 3";
C11100 HHHLLL
$MSG " LEFT TURN OFF";
C11100 LLLLLL
$MSG "";
$MSG " RIGHT TURN 1";
C11010 LLLHLL
$MSG " RIGHT TURN 2";
C11010 LLLHHL
$MSG " RIGHT TURN 3";
C11010 LLLHHH
$MSG " RIGHT TURN OFF";
C11010 LLLLLL
$MSG "";
$MSG " BRAKE AND LEFT TURN 1";
C11101 LLHHHH
$MSG " BRAKE AND LEFT TURN 2";
C11101 LHHHHH
$MSG " BRAKE AND LEFT TURN 3";
C11101 HHHHHH
$MSG " BRAKE AND LEFT TURN OFF";
C11101 LLLHHH
$MSG "";
$MSG " BRAKE AND RIGHT TURN OFF";
C11011 HHHLLL
$MSG " BRAKE AND RIGHT TURN 1";
C11011 HHHHLL
$MSG " BRAKE AND RIGHT TURN 2";
C11011 HHHHHL
$MSG " BRAKE AND RIGHT TURN 3";
C11011 HHHHHH

Appendix C. T-Bird Simulator Output

CSIM: CUPL Simulation Program
Version 3.2a Serial# MD-32A-6295
Copyright (C) 1983,1989 Logical Devices, Inc.
CREATED Mon Apr 09 09:32:04 1990

LISTING FOR SIMULATION FILE: tbird_tt.si

   1: Name            TBIRD_TT.PLD;
   2: Partno          PALC22V10;
   3: Revision        01;
   4: Date            04-08-90;
   5: Designer        Joe Designer;
   6: Company         Cypress Semiconductor;
   7: Location        U1;
   8: Assembly        Test;
   9: Device          P22V10;
  10:
  11:
  12: ORDER: "INPUTS-", CLK, IGN, FLASH, LT, RT, BRAKE,
  13:        "OUTPUTS-", LO, LM, LI, RI, RM, RO;
  14:

Simulation Results

      QUIESCENT STATE - 1
0001: INPUTS- C11000   OUTPUTS- LLLLLL
      QUIESCENT STATE - 1
0002: INPUTS- C01XX0   OUTPUTS- LLLLLL
      FLASH HIGH
0003: INPUTS- C00XXX   OUTPUTS- HHHHHH
      FLASH LOW
0004: INPUTS- C00XXX   OUTPUTS- LLLLLL
      FLASH HIGH
0005: INPUTS- C00XXX   OUTPUTS- HHHHHH
      BRAKE
0006: INPUTS- CX1001   OUTPUTS- HHHHHH
      LEFT TURN OFF
0007: INPUTS- C11100   OUTPUTS- LLLLLL
      LEFT TURN 1
0008: INPUTS- C11100   OUTPUTS- LLHLLL
      LEFT TURN 2
0009: INPUTS- C11100   OUTPUTS- LHHLLL
      LEFT TURN 3
0010: INPUTS- C11100   OUTPUTS- HHHLLL
      LEFT TURN OFF
0011: INPUTS- C11100   OUTPUTS- LLLLLL
      RIGHT TURN 1
0012: INPUTS- C11010   OUTPUTS- LLLHLL
      RIGHT TURN 2
0013: INPUTS- C11010   OUTPUTS- LLLHHL
      RIGHT TURN 3
0014: INPUTS- C11010   OUTPUTS- LLLHHH
      RIGHT TURN OFF
0015: INPUTS- C11010   OUTPUTS- LLLLLL
      BRAKE AND LEFT TURN 1
0016: INPUTS- C11101   OUTPUTS- LLHHHH
      BRAKE AND LEFT TURN 2
0017: INPUTS- C11101   OUTPUTS- LHHHHH
      BRAKE AND LEFT TURN 3
0018: INPUTS- C11101   OUTPUTS- HHHHHH
      BRAKE AND LEFT TURN OFF
0019: INPUTS- C11101   OUTPUTS- LLLHHH
      BRAKE AND RIGHT TURN OFF
0020: INPUTS- C11011   OUTPUTS- HHHLLL
      BRAKE AND RIGHT TURN 1
0021: INPUTS- C11011   OUTPUTS- HHHHLL
      BRAKE AND RIGHT TURN 2
0022: INPUTS- C11011   OUTPUTS- HHHHHL
      BRAKE AND RIGHT TURN 3
0023: INPUTS- C11011   OUTPUTS- HHHHHH


Appendix D. T-Bird State-Machine CUPL Code For CY7C330

Name            TBIRD_SM.PLD;
Partno          CY7C330;
Revision        01;
Date            04-07-90;
Designer        Joe Designer;
Company         Cypress Semiconductor;
Location        U1;
Assembly        Test;
Device          P7C330;

/*
This program implements the control signals for the tail lights
of a Thunderbird.  The lights have three segments for both the
left and right tail light.  The control signals into the device
include a Left and Right signal, a Flash signal (Hazard), a brake
signal, and an ignition signal (IGN).  The outputs of the device
are the six separate tail light segments.  A State Machine is used
to specify the control logic.
*/

PIN 1            = CLK;       /* Clock for Device         */
PIN 2            = INCLK;     /* Clock for Inputs         */
PIN 4            = LT;        /* Left turn signal         */
PIN 5            = RT;        /* Right turn signal        */
PIN 6            = BRAKE;     /* Brake signal             */
PIN 7            = FLASH;     /* Hazard flash signal      */
PIN 9            = IGN;       /* Ignition input           */

PIN 28           = RI;        /* Right inside tail light  */
PIN 27           = RM;        /* Right middle             */
PIN 26           = RO;        /* Right outside            */
PIN 25           = LI;        /* Left inside              */
PIN 24           = LM;        /* Left middle              */
PIN 23           = LO;        /* Left outside             */
PINNODE [29..32] = [Q0..3];   /* State variable holders   */

FIELD OUTPUTS = [LO,LM,LI,RI,RM,RO];
OUTPUTS.OE    = 'B'1;
OUTPUTS.SR    = 'B'0;
OUTPUTS.SP    = 'B'0;

/* Using the $DEFINE statement to assign variable name to state values */
$DEFINE S0   'B'0000
$DEFINE S1   'B'0001
$DEFINE S2   'B'0010
$DEFINE S3   'B'0011
$DEFINE S4   'B'0100
$DEFINE S5   'B'0101
$DEFINE S6   'B'0110
$DEFINE S7   'B'0111
$DEFINE S8   'B'1000
$DEFINE S9   'B'1001
$DEFINE S10  'B'1010
$DEFINE S11  'B'1011
$DEFINE S12  'B'1100
$DEFINE S13  'B'1101
$DEFINE S14  'B'1110
$DEFINE S15  'B'1111

/* The state machine construct where Q0..3 are the state variables */
SEQUENCE [Q0..3]
{

/* Initial state all lights off */
PRESENT S0
    OUT !LO.D OUT !LM.D OUT !LI.D OUT !RI.D OUT !RM.D OUT !RO.D;
    IF (FLASH) NEXT S15;
    IF (BRAKE & !(LT # RT)) NEXT S15;
    IF (IGN & LT & !BRAKE) NEXT S1;
    IF (IGN & RT & !BRAKE) NEXT S4;
    IF (IGN & LT & BRAKE) NEXT S7;
    IF (IGN & RT & BRAKE) NEXT S11;
    DEFAULT NEXT S0;

/* Left turn */
PRESENT S1
    OUT !LO.D OUT !LM.D OUT LI.D OUT !RI.D OUT !RM.D OUT !RO.D;
    IF (IGN & LT) NEXT S2;
    DEFAULT NEXT S0;
PRESENT S2
    OUT !LO.D OUT LM.D OUT LI.D OUT !RI.D OUT !RM.D OUT !RO.D;
    IF (IGN & LT) NEXT S3;
    DEFAULT NEXT S0;
PRESENT S3
    OUT LO.D OUT LM.D OUT LI.D OUT !RI.D OUT !RM.D OUT !RO.D;
    NEXT S0;

/* Right Turn */
PRESENT S4
    OUT !LO.D OUT !LM.D OUT !LI.D OUT RI.D OUT !RM.D OUT !RO.D;
    IF (IGN & RT) NEXT S5;
    DEFAULT NEXT S0;
PRESENT S5
    OUT !LO.D OUT !LM.D OUT !LI.D OUT RI.D OUT RM.D OUT !RO.D;
    IF (IGN & RT) NEXT S6;
    DEFAULT NEXT S0;
PRESENT S6
    OUT !LO.D OUT !LM.D OUT !LI.D OUT RI.D OUT RM.D OUT RO.D;
    NEXT S0;

/* Brake and Left Turn */
PRESENT S7
    OUT !LO.D OUT !LM.D OUT LI.D OUT RI.D OUT RM.D OUT RO.D;
    IF (IGN & LT) NEXT S8;
    DEFAULT NEXT S0;
PRESENT S8
    OUT !LO.D OUT LM.D OUT LI.D OUT RI.D OUT RM.D OUT RO.D;
    IF (IGN & LT) NEXT S9;
    DEFAULT NEXT S0;
PRESENT S9
    OUT LO.D OUT LM.D OUT LI.D OUT RI.D OUT RM.D OUT RO.D;
    IF (IGN & LT) NEXT S10;
    DEFAULT NEXT S0;
PRESENT S10
    OUT !LO.D OUT !LM.D OUT !LI.D OUT RI.D OUT RM.D OUT RO.D;
    IF (IGN & LT) NEXT S7;
    DEFAULT NEXT S0;

/* Brake and Right Turn */
PRESENT S11
    OUT LO.D OUT LM.D OUT LI.D OUT RI.D OUT !RM.D OUT !RO.D;
    IF (IGN & RT) NEXT S12;
    DEFAULT NEXT S0;
PRESENT S12
    OUT LO.D OUT LM.D OUT LI.D OUT RI.D OUT RM.D OUT !RO.D;
    IF (IGN & RT) NEXT S13;
    DEFAULT NEXT S0;
PRESENT S13
    OUT LO.D OUT LM.D OUT LI.D OUT RI.D OUT RM.D OUT RO.D;
    IF (IGN & RT) NEXT S14;
    DEFAULT NEXT S0;
PRESENT S14
    OUT LO.D OUT LM.D OUT LI.D OUT !RI.D OUT !RM.D OUT !RO.D;
    IF (IGN & RT) NEXT S11;
    DEFAULT NEXT S11;

/* Brake and/or flash tail lights on */
PRESENT S15
    OUT LO.D OUT LM.D OUT LI.D OUT RI.D OUT RM.D OUT RO.D;
    IF (BRAKE & !(RT # LT)) NEXT S15;
    DEFAULT NEXT S0;
}
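The left-turn branch of the state machine in Appendix D (states S0 through S3) can be traced with a small behavioral model. The Python sketch below is illustrative only: it assumes BRAKE, FLASH, and RT stay low, it lists the values each state drives onto the .D inputs without modeling register timing, and the names OUTPUTS, next_state, and run are hypothetical.

```python
# Values driven in each state, in the order LO, LM, LI, RI, RM, RO.
OUTPUTS = {
    "S0": "000000",
    "S1": "001000",
    "S2": "011000",
    "S3": "111000",
}


def next_state(state: str, ign: bool, lt: bool) -> str:
    """Next-state function for S0..S3 with BRAKE, FLASH, and RT assumed low."""
    advance = ign and lt
    if state == "S0":
        return "S1" if advance else "S0"
    if state == "S1":
        return "S2" if advance else "S0"
    if state == "S2":
        return "S3" if advance else "S0"
    return "S0"  # S3 returns to S0 unconditionally


def run(clocks: int, ign: bool = True, lt: bool = True):
    """Clock the machine from S0 and record (state, outputs) after each edge."""
    state = "S0"
    seen = []
    for _ in range(clocks):
        state = next_state(state, ign, lt)
        seen.append((state, OUTPUTS[state]))
    return seen


for state, lights in run(5):
    print(state, lights)
```

Dropping the turn signal in any state takes the DEFAULT branch back to S0, which is what `run(3, lt=False)` exercises.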


Appendix E. Up/Down Counter Part Utilization

CY7C330 Resources Planning Sheet
Project: Up/Down Counter with Limits

                       Input
       Input           Register   Register        Output    # of
Pin    Function        Clock      Function        Enable    PTerms
1      State Clk
2      Clk 1
3      Clk 2
4      LL0             1
5      LL1             1
6      LL2             1
7      LL3             1
8      VSS
9      LL4             1
10     LL5             1
11     LL6             1
12     LL7             1
13     PRELOAD LOW     1
14     COUNTER OE
15     UL1             2          CNT1            Pin 14    9
16     Reset           1                          Z         19
17     UL3             2          CNT3            Pin 14    11
18     UL6             2                          Z         17
19     UL4             2          CNT4            Pin 14    13
20                                CNT6            Pin 14    15
21     VSS
22     VCC
23                                CNT7            Pin 14    15
24     UL5             2          CNT5            Pin 14    13
25     UL7             2                          Z         17
26     UL2             2          CNT2            Pin 14    11
27     PRELOAD HIGH    2                          Z         19
28     UL0             2          CNT0            Pin 14    9
None   H1                         Up Equals       None      19
None   H2                         UH Prel'Done    None      11
None   H3                         Down Equals     None      17
None   H4                         Up Count        None      13

Notes: Input Register Clock #1 is pin 2; #2 is pin 3.
See the Application Note for the meaning of the pin names.
Output Enable = 14 means the asynchronous pin 14 direct enable.
Z means the pin is never active.

Appendix F. Up/Down Counter CUPL Code for the CY7C330

Name            COUNTER.PLD;
Partno          PALC22V10;
Revision        01;
Date            02-25-90;
Designer        Joe Designer;
Company         Cypress Semiconductor;
Location        U1;
Assembly        COUNTER;
Device          P7C330;

/*
This design is an up/down counter with preloadable limits.  The lower limits
are loaded into the dedicated input registers on the rising edge of LLC and
the upper limits are loaded into the input registers found in the I/O macrocells
on the rising edge of ULC.  When preloading is done, the counter begins counting
upward until the upper limit is reached, and then begins counting downward.
Because the equations are already minimized and in sum-of-products
form, this design should be compiled with the -m0 flag (no minimization).
*/

PIN 1       = CLK;         /* Clock used for counting */
PIN 2       = LLC;         /* Clock for preloading lower limit */
PIN 3       = ULC;         /* Clock for preloading upper limit */
PIN [4..7]  = [LL0..3];    /* Lower limit hold registers */
PIN [9..12] = [LL4..7];
PIN 13      = LPL;         /* Lower limit preload indication */

/*
Counter output registers.  Pin assignments are based on the number of
product terms available on that pin.
*/

PIN 28 = CNT0;             /* Also used for Upper limit loading */
PIN 15 = CNT1;             /* Also used for Upper limit loading */
PIN 26 = CNT2;             /* Also used for Upper limit loading */
PIN 17 = CNT3;             /* Also used for Upper limit loading */
PIN 19 = CNT4;             /* Also used for Upper limit loading */
PIN 24 = CNT5;             /* Also used for Upper limit loading */
PIN 20 = CNT6;
PIN 23 = CNT7;
PIN 18 = UL6;              /* Used for Upper limit loading */
PIN 25 = UL7;              /* Used for Upper limit loading */
PIN 27 = UPL;

PINNODE 29 = UEQUAL;       /* Upper limit has been reached */
PINNODE 30 = PLDONE;       /* Preloading has finished */
PINNODE 31 = LEQUAL;       /* Lower limit has been reached */
PINNODE 32 = UP;           /* Count direction */

PIN 16 = !RESET;           /* Reset signal clears all registers */
PIN 14 = !CNTOE;           /* I/O pin OE used for loading upper limit */

PINNODE 45 = UL0;          /* Shared input MUX definition */
PINNODE 46 = UL2;          /* Shared input MUX definition */
PINNODE 47 = UL5;          /* Shared input MUX definition */
PINNODE 48 = UL4;          /* Shared input MUX definition */
PINNODE 49 = UL3;          /* Shared input MUX definition */
PINNODE 50 = UL1;          /* Shared input MUX definition */

/* These definitions are used to    */
/* indicate which pin will be fed   */
/* through the shared feedback Mux. */
UL0.IMUX = CNT0.IOD;
UL2.IMUX = CNT2.IOD;
UL5.IMUX = CNT5.IOD;
UL4.IMUX = CNT4.IOD;
UL3.IMUX = CNT3.IOD;
UL1.IMUX = CNT1.IOD;

/* These definitions are used to */
/* simulate the design properly  */
UL0 = CNT0.IOD;
UL2 = CNT2.IOD;
UL5 = CNT5.IOD;
UL4 = CNT4.IOD;
UL3 = CNT3.IOD;
UL1 = CNT1.IOD;

UPL.CKMUX        = ULC;
LPL.CKMUX        = LLC;
RESET.CKMUX      = LLC;
[CNT0..5].CKMUX  = ULC;       /* Pin 3 will be used for upper preload */
[UL6..7].CKMUX   = ULC;       /* Pin 3 will be used for upper preload */
[LL0..7].CKMUX   = LLC;       /* Pin 2 will be used for lower preload */
[CNT0..7].SR     = RESET.DQ;  /* Count register will be reset by pin 16 */
[CNT0..7].OEMUX  = CNTOE;     /* OE will be controlled by pin 14 */

/*
Count equations.  Note how the use of the XOR terms significantly reduces the
number of product terms that are needed.  This allows this complex design to
fit into the device.
*/

CNT0.D = CNT0
       $ PLDONE
       # !LL0.DQ & LPL.DQ & CNT0
       # !CNT0.IOD & UL0 & UPL.DQ
       # LL0.DQ & LPL.DQ & !CNT0
       # CNT0.IOD & !UL0 & UPL.DQ;

CNT1.D = CNT1
       $ !LL1.DQ & LPL.DQ & !PLDONE & CNT1
       # LL1.DQ & LPL.DQ & !PLDONE & !CNT1
       # UPL.DQ & !PLDONE & !UL1 & CNT1
       # UPL.DQ & !PLDONE & UL1 & !CNT1
       # CNT0.IOD & PLDONE & !UP
       # !CNT0.IOD & PLDONE & UP;

CNT2.D = CNT2
       $ !LL2.DQ & LPL.DQ & CNT2 & !PLDONE
       # LL2.DQ & LPL.DQ & !CNT2 & !PLDONE
       # UPL.DQ & CNT2 & !UL2 & !PLDONE
       # UPL.DQ & !CNT2 & UL2 & !PLDONE
       # CNT0.IOD & PLDONE & !UP & CNT1
       # !CNT0.IOD & PLDONE & UP & !CNT1;

CNT3.D = CNT3
       $ !LL3.DQ & LPL.DQ & !PLDONE & CNT3
       # LL3.DQ & LPL.DQ & !PLDONE & !CNT3
       # UPL.DQ & !PLDONE & !UL3 & CNT3
       # UPL.DQ & !PLDONE & UL3 & !CNT3
       # CNT0.IOD & CNT2 & PLDONE & !UP & CNT1
       # !CNT0.IOD & !CNT2 & PLDONE & UP & !CNT1;

CNT4.D = CNT4
       $ !LL4.DQ & LPL.DQ & !PLDONE & CNT4
       # LL4.DQ & LPL.DQ & !PLDONE & !CNT4
       # UPL.DQ & !PLDONE & !UL4 & CNT4
       # UPL.DQ & !PLDONE & UL4 & !CNT4
       # CNT0.IOD & CNT2 & PLDONE & !UP & CNT3 & CNT1
       # !CNT0.IOD & !CNT2 & PLDONE & UP & !CNT3 & !CNT1;

CNT5.D = CNT5
       $ !LL5.DQ & LPL.DQ & CNT5 & !PLDONE
       # LL5.DQ & LPL.DQ & !CNT5 & !PLDONE
       # UPL.DQ & CNT5 & !UL5 & !PLDONE
       # UPL.DQ & !CNT5 & UL5 & !PLDONE
       # CNT0.IOD & CNT2 & PLDONE & CNT4 & !UP & CNT3 & CNT1
       # !CNT0.IOD & !CNT2 & PLDONE & !CNT4 & UP & !CNT3 & !CNT1;

CNT6.D = CNT6
       $ !LL6.DQ & LPL.DQ & !PLDONE & CNT6
       # LL6.DQ & LPL.DQ & !PLDONE & !CNT6
       # UPL.DQ & !PLDONE & CNT6 & !UL6.DQ
       # UPL.DQ & !PLDONE & !CNT6 & UL6.DQ
       # CNT0.IOD & CNT2 & CNT5 & PLDONE & CNT4 & !UP & CNT3 & CNT1
       # !CNT0.IOD & !CNT2 & !CNT5 & PLDONE & !CNT4 & UP & !CNT3 & !CNT1;

CNT7.D = CNT7
       $ !LL7.DQ & LPL.DQ & CNT7 & !PLDONE
       # LL7.DQ & LPL.DQ & !CNT7 & !PLDONE
       # UPL.DQ & !UL7.DQ & CNT7 & !PLDONE
       # UPL.DQ & UL7.DQ & !CNT7 & !PLDONE
       # CNT0.IOD & CNT2 & CNT5 & PLDONE & CNT6 & CNT4 & !UP & CNT3 & CNT1
       # !CNT0.IOD & !CNT2 & !CNT5 & PLDONE & !CNT6 & !CNT4 & UP & !CNT3 & !CNT1;

/* Direction of count */
UP.D = UP
     $ !UEQUAL & !UP & PLDONE
     # !LEQUAL & UP & PLDONE
     # UPL.DQ & !PLDONE & !UP
     # LPL.DQ & !PLDONE & UP;

/* Has the lower limit been reached */
LEQUAL.D = LL6.DQ & !CNT6
         # !LL7.DQ & CNT7
         # LL7.DQ & !CNT7
         # LL3.DQ & !CNT3
         # !LL5.DQ & CNT5
         # LL5.DQ & !CNT5
         # !LL1.DQ & CNT1
         # LL0.DQ & !CNT0
         # !LL2.DQ & CNT2
         # !LL4.DQ & CNT4
         # LL4.DQ & !CNT4
         # !LL0.DQ & CNT0
         # LL1.DQ & !CNT1
         # !LL6.DQ & CNT6
         # !LL3.DQ & CNT3
         # LL2.DQ & !CNT2;

/* Has preloading finished */
PLDONE.D = !LPL.DQ & !UPL.DQ;

/* Has the upper limit been reached */
UEQUAL.D = UL6.DQ & !CNT6
         # !UL7.DQ & CNT7
         # UL7.DQ & !CNT7
         # UL3 & !CNT3
         # CNT5 & !UL5
         # !CNT5 & UL5
         # !UL1 & CNT1
         # !CNT0.IOD & UL0
         # CNT2 & !UL2
         # !UL4 & CNT4
         # UL4 & !CNT4
         # CNT0.IOD & !UL0
         # UL1 & !CNT1
         # CNT6 & !UL6.DQ
         # !UL3 & CNT3
         # !CNT2 & UL2;

Appendix G. Decoder CUPL Code

Name            DCOLUMNS332.PLD;
Partno          P7C332;
Revision        01;
Date            10-09-90;
Designer        Joe Designer;
Company         Cypress Semiconductor;
Location        332 DCOLUMNSR;
Assembly;
Device          P7C332;

/*
This design is a simple decoder.  A group of address lines are decoded
to select one of 4 "windows" in memory.  The inputs have been configured
in each of their possible configurations.  Although this application would
not be used in a real design, this example shows how to configure the
input registers in each of their possible modes.
*/

PIN 1        = CLK;              /* Clock pin */
PIN 2        = LTCHEN;           /* Latch enable pin */
PIN [3..7]   = [AD16..20];       /* Address lines */
PIN [9..13]  = [AD21..25];
PIN 14       = !COE;             /* Output enable */
PIN [15..20] = [AD26..31];
PIN [23..26] = ![WINDOW0..3];    /* Window selection output */
PIN 27       = !NOTME;           /* No window selected */
PIN 28       = !DCDEN;           /* Decode enable */

[!WINDOW0..3].OEMUX = COE;       /* OE controlled by pin 14 */
NOTME.OE            = 'b'1;      /* Notme always on bus */
[AD16..19].CKMUX    = CLK;       /* Clocked on rising edge */
[AD20..23].CKMUX    = !CLK;      /* Clocked on falling edge */
[AD24..27].LEMUX    = LTCHEN;    /* Latched when high */
[AD28..31].LEMUX    = !LTCHEN;   /* Latched when low */

/* Window selection Equations */

WINDOW0 = DCDEN.DQ & AD31.LQ & AD30.LQ & AD29.LQ & AD28.LQ &
          AD27.LQ & AD26.LQ & AD25.LQ & AD24.LQ &
          AD23.DQ & AD22.DQ & AD21.DQ & AD20.DQ &
          !AD19.DQ & !AD18.DQ & !AD17.DQ & !AD16.DQ;

WINDOW1 = DCDEN.DQ & AD31.LQ & AD30.LQ & AD29.LQ & AD28.LQ &
          AD27.LQ & AD26.LQ & AD25.LQ & AD24.LQ &
          AD23.DQ & AD22.DQ & AD21.DQ & AD20.DQ &
          !AD19.DQ & AD18.DQ & !AD17.DQ & !AD16.DQ;

WINDOW2 = DCDEN.DQ & AD31.LQ & AD30.LQ & AD29.LQ & AD28.LQ &
          AD27.LQ & AD26.LQ & AD25.LQ & AD24.LQ &
          AD23.DQ & AD22.DQ & AD21.DQ & AD20.DQ &
          AD19.DQ & !AD18.DQ & !AD17.DQ & !AD16.DQ;

WINDOW3 = DCDEN.DQ & AD31.LQ & AD30.LQ & AD29.LQ & AD28.LQ &
          AD27.LQ & AD26.LQ & AD25.LQ & AD24.LQ &
          AD23.DQ & AD22.DQ & AD21.DQ & AD20.DQ &
          AD19.DQ & AD18.DQ & !AD17.DQ & !AD16.DQ;

NOTME = 'B'1
      $ DCDEN.DQ & AD16.DQ
      # DCDEN.DQ & !AD16.DQ & AD17.DQ
      # DCDEN.DQ & !AD31.LQ
      # DCDEN.DQ & !AD30.LQ
      # DCDEN.DQ & !AD29.LQ
      # DCDEN.DQ & !AD28.LQ
      # DCDEN.DQ & !AD27.LQ
      # DCDEN.DQ & !AD26.LQ
      # DCDEN.DQ & !AD25.LQ
      # DCDEN.DQ & !AD24.LQ
      # DCDEN.DQ & !AD23.DQ
      # DCDEN.DQ & !AD22.DQ
      # DCDEN.DQ & !AD21.DQ
      # DCDEN.DQ & !AD20.DQ;
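The address ranges selected by the four WINDOW equations in Appendix G can be made explicit with a behavioral model. The Python sketch below is an illustration under stated assumptions: the decode enable (DCDEN) is taken as asserted, the input registers and latches are treated as transparent, and the function name decode_window is hypothetical.

```python
def decode_window(addr: int):
    """Return the window number (0..3) selected by a 32-bit address, or None.

    Mirrors the product terms: AD31..AD20 must all be 1, AD17 and AD16
    must both be 0, and AD19:AD18 pick the window.
    """
    if (addr >> 20) & 0xFFF != 0xFFF:   # AD31..AD20 all High
        return None
    if (addr >> 16) & 0x3 != 0:         # AD17 and AD16 both Low
        return None
    return (addr >> 18) & 0x3           # AD19:AD18 select the window


for base in (0xFFF00000, 0xFFF40000, 0xFFF80000, 0xFFFC0000):
    print(hex(base), decode_window(base))
```

Address bits below AD16 are not connected to the device, so each window in this model spans 64K addresses.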


Using ABEL to Program the Cypress 22V10

Introduction
This application note presents a compilation of examples using the popular PALC22V10 programmable
logic device. The examples demonstrate the 22V10's advanced features and some of the high-level logic
description techniques of the ABEL programming language.
Each of the first seven examples illustrates a specific
22V10 feature and lists the ABEL programming language statements necessary to implement the feature.
The ABEL files also contain test vectors that exercise
the feature. The remaining examples describe complete
22V10 designs that combine many of the individual features. All the examples have been tested, and you can
obtain the code for them on floppy disk from Cypress
Semiconductor. The design examples provided are:

    Asynchronous reset/synchronous preset from single inputs
    Asynchronous reset/synchronous preset from product terms
    Asynchronous reset/synchronous preset used to load predetermined non-zero values, employing istype statements
    Output-enable control from a single input
    Output-enable control from product terms
    Using 16 product terms: an 8-bit identity comparator
    Using feedback to realize more than 16 product terms in a 9-bit single-output identity comparator
    Bidirectional I/O-bus interface with answer-back
    10-bit address generator/multiplexer
    Three state machines in one 22V10

You can use these examples as a design reference. They
are excellent tools for designers new to programmable
logic as well as for veteran PLD users. Add the files to
your ABEL source-file library, and include any part of
the files in your own designs. You can use the files as a
template by editing them using any text editor in the
non-document mode. Conversion to the CUPL or
PLD ToolKit programming language is easily
accomplished due to these languages' syntactical
similarity. For conversion to other languages, consult
your user's guide.

Notes on the ABEL Programming Language
Before examining the application examples, consider an
introduction to the structure and syntax of the ABEL
programming language. A rudimentary understanding
of the ABEL language is necessary to fully appreciate
the example files included here.

An ABEL source file provides the information necessary to describe a PLD design's logical operation. You
can see these files' keywords and structure in any of the
examples. The ABEL language processor processes
source files to generate a JEDEC programming file and
design documentation. The language processor also
uses test vectors, which you generate as part of the
source file, to test the design's function.

ABEL Design Entry Methods

The ABEL programming language offers three methods
for defining the logical operation of a given design.
These methods are:

    Boolean equation
    Truth table
    State diagram

A source file can include any or all of these design entry
methods. The following sections describe the Boolean
equation, truth table, and state diagram entry methods
as well as the operators and notation conventions used
in the source files.

ABEL Operators and Notation Conventions
In addition to the standard AND and OR logical
operators, ABEL supports several high-level logic
definitions. ABEL interprets "+" and "*" signs, which
in standard Boolean notation stand for OR and AND
operations, respectively, to indicate arithmetic addition
and multiplication. This convention greatly simplifies
the design of counters and ALU logic. Table 1 shows
the logical operators ABEL supports. The labels A, B,
and C in the examples can be either individual pins or a
set of pins, as defined in the source file.
Note that you can use these operators with operands of
more than one bit on a bit-by-bit basis. For example,
logically ORing hexadecimal values of 8 and 2 yields
hexadecimal value A:
^h08 # ^h02 = ^h0A

Specifying Alternate Number Bases
The ^h symbols in the example above instruct the language processor to interpret the value following the
symbol as base 16 (hex). The default number base in
ABEL is decimal, but you can change the base for individual expressions with ^b for binary, ^o for octal,
^d for decimal, or ^h for hexadecimal. You can also
use the "@radix" command to change the default number base to binary, octal, decimal, or hexadecimal for all
subsequent statements in a source document. All the
source files in this application note include the command "@radix 16" to set the number base to
hexadecimal.
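The bit-by-bit OR and the number-base prefixes described above have direct counterparts in most programming languages. The Python sketch below is only an analogy: Python's | operator stands in for the ABEL # operator, and Python's own literal prefixes stand in for the ABEL base prefixes.

```python
# Bitwise OR of hexadecimal 8 and 2, the example given in the text.
a = 0x08
b = 0x02
result = a | b
print(hex(result))

# The same value (decimal 10) written in binary, octal, decimal, and hex.
same_value = [int("1010", 2), int("12", 8), int("10", 10), int("A", 16)]
print(same_value)
```

All four spellings denote one number; only the notation in the source file changes, which is the point of the per-expression prefixes and of the default-radix command.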

Arithmetic Operators
ABEL provides arithmetic operators to allow for easy
implementation of math and shifting functions. Table 2
lists the arithmetic operators supported by ABEL.
Shifting operations are unsigned, and zeros are shifted
into the side of the expression opposite the direction of
the shift. Also note that ABEL interprets the symbol "/"
as an unsigned division operation. Other programmable
logic languages use this symbol to indicate inversion.
The symbol "%" gives the remainder of the division
operation performed by "/".

Relational Operators
Relational operators perform various comparisons of
elements in an expression and yield a Boolean true or
false based on the result of the comparison. These
operators greatly simplify the description of magnitude
comparisons and reduce an identity comparison to a
single statement. All relational operations are unsigned;
take care when you represent negative numbers in twos
complement. Table 3 lists the relational operators.
Relational operators are frequently used where ranges
of values cause a given output. For example, if you want
to decode an active-low chip-select line (CS1) for any
address from ^h2000 to ^h2FFF, you can write the
logic for this output in a single line:
!CS1 = (ADD >= ^h2000) & (ADD <= ^h2FFF);
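The range decode above can be checked against a few addresses with a behavioral model. The Python sketch below is illustrative; the function name cs1_pin is hypothetical, and it returns the level on the active-low pin (0 when the address is in range).

```python
def cs1_pin(addr: int) -> int:
    """Level on the active-low CS1 pin: 0 when 0x2000 <= addr <= 0x2FFF."""
    selected = (addr >= 0x2000) and (addr <= 0x2FFF)
    return 0 if selected else 1


for addr in (0x1FFF, 0x2000, 0x2ABC, 0x2FFF, 0x3000):
    print(hex(addr), cs1_pin(addr))
```

Both comparisons are unsigned, matching the note above that all relational operations are unsigned.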

Assignment Operators
Note that all example operations shown so far are for
purely combinatorial outputs. The structure for combinatorial equations is:
OUTPUT(s) = Expression(s) and/or Condition(s);
Table 1. ABEL Logical Operators

Operator   Definition             Example
!          NOT: ones complement   C = !A;
&          AND                    C = A & B;
#          OR                     C = A # B;
$          XOR: exclusive OR      C = A $ B;
!$         XNOR: exclusive NOR    C = A !$ B;

Table 2. ABEL Arithmetic Operators

Operator   Definition             Example
-          twos complement        C = -A;
-          subtraction            C = A - B;
+          addition               C = A + B;
*          multiplication         C = A * B;
/          integer division       C = A / B;
%          remainder              C = A % B;
<<         shift left             C = A << 2;  (shift left 2 bits)
>>         shift right            C = A >> 2;  (shift right 2 bits)

The assignment operator is the "=" sign, meaning that
OUTPUT(s) combinatorially follow the evaluation of
the expressions and conditions. If an output or set of
outputs is registered (changing synchronously with the
clock's rising edge), use the assignment operator ":=".
The structure of a registered equation, shown below, is
essentially the same as a combinatorial equation but
with this assignment operator:
OUTPUT(s) := Expression(s) and/or Condition(s);

Table 3. ABEL Relational Operators

Operator   Definition               Example
==         equal                    C = (A == B);
!=         not equal                C = (A != B);
<          less than                C = (A < B);
>          greater than             C = (A > B);
<=         less than or equal       C = (A <= B);
>=         greater than or equal    C = (A >= B);

Table 4. ABEL Operator Priority

Highest Priority
   -    Twos complement, not subtraction
   !    NOT
Second Highest Priority
   &    AND
   <<   Shift left
   >>   Shift right
   *    Multiply
   /    Unsigned division
   %    Remainder from division
Third Highest Priority
   +    Add
   -    Subtract
   #    OR
   $    XOR
   !$   XNOR
Lowest Priority
   All relational operators (==, !=, <, >, <=, >=)

Operator Priority
Operators in an expression are evaluated using a
priority hierarchy. If two or more operators with equal
priority appear in a single expression, they are
evaluated in the order listed, from left to right within
the expression. Table 4 lists the priority of all operators.
You can use parentheses as in normal mathematics to
alter the order of evaluation. ABEL performs the
operation in the innermost parentheses first.

Special Constants
ABEL supports several special constants that ease the
writing of equations and test vectors. Table 5 lists these
special constants and their functions.
To use several of these constants in an abbreviated form
and enable the symbols H and L to represent binary
Ones and Zeros, place the following statement in the
labels section of the source document, as in the examples in this application note:
H,L,X,C,Z = 1,0,.X.,.C.,.Z.;

Table 5. ABEL Special Constants

Special Constant   Definition
.C.                Clock: causes a low-high-low transition at a
                   selected input for testing
.F.                Floating input or output
.K.                Same as .C., but high-low-high
.P.                Register preload
.X.                Don't care condition
.Z.                Tests input or output for high impedance

Logic Reduction Levels
At the beginning of every source file in this brief appears the statement
flag '-r4'
This statement signals the language processor to use
logic reduction level 4. In cases where you need
propagation delays of a specific length, use the statement
flag '-r0'
which tells the language processor to use no reduction.
ABEL provides four logic reduction levels, as listed in
Table 6.

ABEL Design Entry: Boolean Equations
Boolean equations are the most common method of
design entry. To use them, you give a name to each pin
required for the application. If a design requires the
special functions available in many devices (i.e., reset
and preset), you also identify and name the nodes that
control these functions. (The 22V10 has two such
nodes: asynchronous reset at node 25 and synchronous
preset at node 26.) Groups of pins and/or frequently
used constants can also be given labels to facilitate writing equations.
Following the keyword EQUATIONS in the source file,
you describe the required logic with Boolean equations
that use the pin, node, and/or label names.
If an output has an output-enable term associated with
it, you can write an equation for that term by using the
pin name with the extension ".OE" followed by the
equation for the term. An example of this is:
OUT1.OE = !RD & (INPUTS == 0);
This statement enables OUT1 if pin RD is Low and the
group of pins (can be any number of pins) labeled INPUTS are all Low. If these conditions are not met, the
output remains three-stated.
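The behavior of this enable term can be sketched in Python. The function name out1_pin and the use of the string "Z" for a three-stated pin are conventions of this sketch only, not ABEL constructs.

```python
def out1_pin(rd, inputs, out1_logic):
    """Level seen at OUT1: 0 or 1 when enabled, 'Z' when three-stated.

    rd          level on pin RD (0 or 1)
    inputs      list of levels on the pins grouped under the label INPUTS
    out1_logic  value the OUT1 logic equation is producing
    """
    # OUT1.OE = !RD & (INPUTS == 0)
    enabled = (rd == 0) and all(bit == 0 for bit in inputs)
    return out1_logic if enabled else "Z"


print(out1_pin(0, [0, 0, 0], 1))   # enabled: the logic value appears
print(out1_pin(1, [0, 0, 0], 1))   # RD High: three-stated
print(out1_pin(0, [0, 1, 0], 1))   # INPUTS not all Low: three-stated
```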
The 22V10 has a separate combinatorial output-enable
product term for each I/O pin. The output enable is
therefore easily controlled by either a single selectable
pin or from a product term. To make an output enable
synchronous or to expand the number of product terms
available, you can dedicate an I/O macrocell to realize
the appropriate logic; the macrocell's output feeds back
to control the output-enable product term. This method
causes additional propagation delay, however, due to
the extra pass through the AND/OR array.
The use of the enable equations is purely optional; in
the absence of these equations, the ABEL language
processor automatically enables any I/O pin defined in
the Boolean equations as an output and disables any
I/O specified as an input. The outputs appear on the
left side of the equations.
This application note outlines the operators and syntax
of all Boolean equations. You can find additional information in the ABEL Language Reference and User's
Guide supplied with the ABEL software.

Table 6. ABEL Logic Reduction Levels

Level   Statement     Description
0       flag '-r0'    No reduction. All equations must be in
                      sum-of-products form.
1       flag '-r1'    Equations are expanded to sum-of-products form
                      and reduced with standard Boolean algebra.
                      This is the default.
2       flag '-r2'    Includes level 1 reduction plus the PRESTO
                      algorithm. This process is iterative, so
                      processing time is increased significantly.
3       flag '-r3'    The PRESTO algorithm is performed on a
                      pin-by-pin basis. This is faster than standard
                      PRESTO reduction.
4       flag '-r4'    This reduction level uses the ESPRESSO
                      reduction algorithm.

ABEL Design Entry: Truth Tables
A truth table is a list of input combinations and the
resulting outputs. Normally, the inputs are listed in ascending binary order from the minimum value to the
maximum value. This format takes all possible input
situations into account and prevents any undefined
input combinations from producing undesirable outputs.
The keyword TRUTH_TABLE marks the beginning of
the table within the source file. Immediately following
the keyword, you list the input(s) and output(s) labels in
parentheses with an arrow (a minus sign and a greater
than sign "->") between the inputs and outputs. If you
specify more than one input or output, you must enclose
the set in square brackets "[ ]".
Figure 1 shows the statements required to implement a
3-to-8-line decoder. Note the use of the set identifier
Q7..Q0. This can be written out as
Q7,Q6,Q5,Q4,Q3,Q2,Q1,Q0.
The main advantage of the truth table entry method lies
in writing test vectors. You can block-copy the entire
truth table to the source file's test-vector section.
Any design specified by a truth table can also be
entered as Boolean equations. For example, the output
Q6 in this example could be represented by the
Boolean equation:
Q6 = I2 & I1 & !I0;
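The equivalence between the truth-table form and the Boolean-equation form can be checked with a short Python sketch. The function names are hypothetical; the table rows follow the 3-to-8-line decoder of Figure 1, with outputs ordered Q7 down to Q0.

```python
from itertools import product


def decoder_row(i2, i1, i0):
    """Return [Q7, ..., Q0] for one input combination of the decoder."""
    index = (i2 << 2) | (i1 << 1) | i0
    return [1 if bit == index else 0 for bit in range(7, -1, -1)]


def q6_equation(i2, i1, i0):
    """Boolean-equation form of output Q6: I2 & I1 & !I0."""
    return i2 & i1 & (1 - i0)


# Print the full table in ascending binary order of the inputs.
for i2, i1, i0 in product((0, 1), repeat=3):
    print([i2, i1, i0], "->", decoder_row(i2, i1, i0))

# Q6 is the second element of each row (the list starts at Q7).
print(all(decoder_row(a, b, c)[1] == q6_equation(a, b, c)
          for a, b, c in product((0, 1), repeat=3)))
```

The same comparison can be repeated for any other output by writing its product term in place of the one for Q6.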

ABEL Design Entry: State Diagrams
One of the most powerful features of the ABEL
programming language is its ability to compile state
diagrams directly. By allowing direct state-diagram
entry, ABEL frees you from the tedious task of generating Boolean equations with the expressions and conditions that cause each possible transition for each individual state register.
You can implement several state machines in a single
device, and you might have a set of outputs for each
state machine. The state diagram for each set of outputs
begins with the keyword STATE_DIAGRAM, followed
by the pin names or labels that make up the state outputs. You then list each state, followed by any operations to be performed while in that state and at least
one transition statement. A transition statement can be
in any of three forms:

GOTO, for unconditional transitions to the next
state
IF..THEN..ELSE, for two-way branching
CASE..ENDCASE, for n-way branching

You can chain IF..THEN..ELSE statements to achieve
n-way branching, but the CASE..ENDCASE construct
accomplishes the same objective with less typing. By
using labels for state outputs and condition inputs, you
can implement even the most complex designs with
ease.
As an example, consider a bidirectional, 3-bit counter
with inputs UP and DOWN and outputs Q2, Q1, and
Q0. If UP or DOWN is High, the counter counts in the
direction specified. If both UP and DOWN are High,
the counter holds the current count. If both UP and
DOWN are Low, the counter resets to zero. In addition, output MAX is High if the counter is in the UP
mode and the count equals 7 or if the counter is in the
DOWN mode and the count equals zero. Convenient
labels for implementing this design appear in Figure 2,
and Figure 3 lists the source code for the state diagram.
You can add another statement, WITH..ENDWITH, to
any transition statement to set additional outputs to any
given state when the transition preceding the
WITH..ENDWITH statement is executed. In the previous state diagram, for example, assume the transition
from state S5 to S6 is to set a pin called FLAG. To
achieve this result, the S5 diagram is modified as shown
in Figure 4.

PALC22V10 Design Examples
The design examples presented here exploit the various
features of the 22V10 PLD. The first seven designs
focus on specific features and illustrate the techniques
for using and testing these features. The last three
designs combine several of the features to demonstrate
the device's versatility. It is the 22V10's tremendous versatility that has made it the most popular of all Cypress
PLDs. Each of the last three designs, if implemented in
SSI and MSI TTL, would require from seven to 13
packages.

Asynchronous Reset/Synchronous Preset
As shown in Figure 5, this example defines pins 2 and 3
to be the asynchronous reset and synchronous preset inputs,
respectively. Eight inputs defined as
INPUT7..INPUT0 are given the label INPUTS. Eight

truth_table
([I2,I1,I0] -> [Q7..Q0])
[0,0,0] -> [0,0,0,0,0,0,0,1];
[0,0,1] -> [0,0,0,0,0,0,1,0];
[0,1,0] -> [0,0,0,0,0,1,0,0];
[0,1,1] -> [0,0,0,0,1,0,0,0];
[1,0,0] -> [0,0,0,1,0,0,0,0];
[1,0,1] -> [0,0,1,0,0,0,0,0];
[1,1,0] -> [0,1,0,0,0,0,0,0];
[1,1,1] -> [1,0,0,0,0,0,0,0];

Figure 1. Truth Table for 3:8 Line Decoder

"labels
OUTS  = [Q2..Q0];
MODE  = [UP,DOWN];
CNTUP = ^b10;  CNTDWN = ^b01;
RST   = ^b00;  HOLD   = ^b11;
S0 = ^b000;  S1 = ^b001;  S2 = ^b010;
S3 = ^b011;  S4 = ^b100;  S5 = ^b101;
S6 = ^b110;  S7 = ^b111;

Figure 2. State Machine Labels for Counter Example
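The bidirectional counter described in the State Diagrams section can be cross-checked against a small behavioral model before reading its ABEL source. The Python sketch below is not part of the original design files; it assumes the encodings of Figure 2 (MODE = [UP,DOWN], states equal to the count) and treats MAX as a combinatorial output. The function names are hypothetical.

```python
# Mode encodings from Figure 2, with MODE = [UP, DOWN].
CNTUP, CNTDWN, RST, HOLD = 0b10, 0b01, 0b00, 0b11


def next_state(count, mode):
    """State the 3-bit counter enters on the next rising clock edge."""
    if mode == CNTUP:
        return (count + 1) % 8      # S7 wraps to S0
    if mode == CNTDWN:
        return (count - 1) % 8      # S0 wraps to S7
    if mode == HOLD:
        return count
    return 0                        # RST: both UP and DOWN Low


def max_out(count, mode):
    """MAX: High at 7 while counting up, or at 0 while counting down."""
    return int((mode == CNTUP and count == 7) or
               (mode == CNTDWN and count == 0))


# Walk once around the up-counting sequence, printing state and MAX.
count = 0
for _ in range(9):
    print(count, max_out(count, CNTUP))
    count = next_state(count, CNTUP)
```

Each `case` branch of a state in the state diagram corresponds to one call of next_state with a fixed count and one of the four modes.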

state_diagram OUTS
state S0 : MAX = (MODE == CNTDWN);
   case (MODE == CNTUP)  : S1;
        (MODE == CNTDWN) : S7;
        (MODE == HOLD)   : S0;
        (MODE == RST)    : S0;
   endcase;
state S1 : MAX = 0;
   case (MODE == CNTUP)  : S2;
        (MODE == CNTDWN) : S0;
        (MODE == HOLD)   : S1;
        (MODE == RST)    : S0;
   endcase;
state S2 : MAX = 0;
   case (MODE == CNTUP)  : S3;
        (MODE == CNTDWN) : S1;
        (MODE == HOLD)   : S2;
        (MODE == RST)    : S0;
   endcase;
state S3 : MAX = 0;
   case (MODE == CNTUP)  : S4;
        (MODE == CNTDWN) : S2;
        (MODE == HOLD)   : S3;
        (MODE == RST)    : S0;
   endcase;
state S4 : MAX = 0;
   case (MODE == CNTUP)  : S5;
        (MODE == CNTDWN) : S3;
        (MODE == HOLD)   : S4;
        (MODE == RST)    : S0;
   endcase;
state S5 : MAX = 0;
   case (MODE == CNTUP)  : S6;
        (MODE == CNTDWN) : S4;
        (MODE == HOLD)   : S5;
        (MODE == RST)    : S0;
   endcase;
state S6 : MAX = 0;
   case (MODE == CNTUP)  : S7;
        (MODE == CNTDWN) : S5;
        (MODE == HOLD)   : S6;
        (MODE == RST)    : S0;
   endcase;
state S7 : MAX = (MODE == CNTUP);   "MAX is High at count 7 in the UP mode
   case (MODE == CNTUP)  : S0;
        (MODE == CNTDWN) : S6;
        (MODE == HOLD)   : S7;
        (MODE == RST)    : S0;
   endcase;

Figure 3. ABEL Source Code for Counter Example

corresponding outputs, OUTPUT7..OUTPUT0, are
labeled OUTPUTS. Note how the use of labels enables
the logic for all eight outputs to be written in a single
equation. The equation:
OUTPUTS := INPUTS;
causes the data at INPUTS to be registered in OUTPUTS on the rising edge of CLK. The assignment
operator ":=" indicates that the operation is clocked
(registered). The 22V10 clock input is, by definition, pin 1.
The pin assignments section identifies the predefined
node numbers for the reset and preset functions. The
equations for the nodes, in terms of the selected pins,
are then written in the file's equations section.
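The difference between the asynchronous reset and the synchronous preset can be made concrete by replaying the test vectors of Figure 5 against a behavioral model. The Python sketch below is an illustration, not a simulator for the device: the class name is hypothetical, a clock pulse (.C.) is represented by True, and letting reset win over a simultaneous clock edge is an assumption of the sketch.

```python
class RegisterBank:
    """Eight D registers with a shared clock, async reset, sync preset."""

    def __init__(self):
        self.q = 0x00

    def apply(self, clock_pulse, rst_pin, pre_pin, inputs):
        """Apply one test vector; return the value at OUTPUTS afterwards."""
        if rst_pin == 0:
            # reset = !RST: takes effect immediately, no clock required.
            self.q = 0x00
        elif clock_pulse:
            # preset = PRE: only acts on the rising edge of CLK.
            self.q = 0xFF if pre_pin else (inputs & 0xFF)
        return self.q


# Figure 5 vectors: (clock pulse?, RST, PRE, INPUTS, expected OUTPUTS)
FIGURE_5_VECTORS = [
    (True,  1, 0, 0x55, 0x55),   # clock in 55
    (False, 1, 0, 0xAA, 0x55),   # no clock: registers hold 55
    (True,  1, 0, 0xAA, 0xAA),   # clock in AA
    (True,  1, 0, 0xFF, 0xFF),   # set all outputs high
    (False, 0, 0, 0xFF, 0x00),   # RST low, no clock: async reset
    (True,  1, 1, 0x00, 0xFF),   # PRE high with clock: sync preset
]


def replay(vectors):
    bank = RegisterBank()
    return [bank.apply(c, r, p, i) for c, r, p, i, _ in vectors]


print([hex(v) for v in replay(FIGURE_5_VECTORS)])
```

The fifth vector changes the outputs without a clock pulse, while the sixth needs one; this is the distinction the test vectors are written to expose.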

Asynch. Reset and Synch. Preset from
Product Terms
This example (Figure 6) implements an asynchronous
reset and synchronous preset, as does the example in
Figure 5. In this case, however, product terms activate
the reset and preset nodes. Specifically, the reset node
is High (active) only when INPUTS equal 55 hex.
Similarly, INPUTS equal to AA hex activate the preset
term. Note how the test vectors distinguish and test the
synchronous versus the asynchronous operations.

Reset and Preset Load Predetermined Values
The examples in Figures 5 and 6 use the macrocells'
positive, registered output for the pins represented by
OUTPUTS. Under this arrangement, the asynchronous
reset causes all outputs to go Low and the synchronous
preset causes them to go High.
This example demonstrates how you can use istype
statements in the pin assignments section to set any pattern of Ones and Zeros, either asynchronously with
reset or synchronously with preset. To understand this
operation, note in Figure 7 that the 22V10 provides four
state S5 : MAX = 0;
   case (MODE == CNTUP)  : S6
              with FLAG := 1;
              endwith
        (MODE == CNTDWN) : S4;
        (MODE == HOLD)   : S5;
        (MODE == RST)    : S0;
   endcase;
Figure 4. WITH..ENDWITH Example

"Cypress Semiconductor Corp. 11/10/1987
module Rst_Pre1                      "Module name
flag '-r3'                           "Logic Reduction level r3, fast PRESTO
title 'Asynchronous Reset / Synchronous Preset Control From A Single Input'

"Device designator and type
U1 device 'P22V10';

"Pin assignments
CLK                              pin 1;             "Clock input
RST                              pin 2;             "Defines async reset pin
PRE                              pin 3;             "Defines sync preset pin
INPUT7,INPUT6,INPUT5,INPUT4      pin 4,5,6,7;
INPUT3,INPUT2,INPUT1,INPUT0      pin 8,9,10,11;
OUTPUT7,OUTPUT6,OUTPUT5,OUTPUT4  pin 23,22,21,20;
OUTPUT3,OUTPUT2,OUTPUT1,OUTPUT0  pin 19,18,17,16;
reset,preset                     node 25,26;        "Pre-assigned node #s

"Labels
H,L,X,C,Z = 1,0,.X.,.C.,.Z.;
INPUTS    = [INPUT7..INPUT0];
OUTPUTS   = [OUTPUT7..OUTPUT0];

@radix 16;          "This command forces the default
                    "number base to HEX.

equations
reset    = !RST;    "Async reset when pin RST low
preset   = PRE;     "Sync preset if pin PRE is high during the rising edge of CLK
OUTPUTS := INPUTS;  "The := indicates that this is a clocked (synchronous) operation

test_vectors
"Test reset and preset
([CLK,RST,PRE,INPUTS] -> OUTPUTS)
[C,H,L,55]  -> 55;    "Test outputs by clocking in 55
[L,H,L,0AA] -> 55;    "Test registers hold old data (55)
[C,H,L,0AA] -> 0AA;   "Clock AA (leading zero necessary for hex digits A-F)
[C,H,L,0FF] -> 0FF;   "Set all outputs high (FF)
[L,L,L,0FF] -> 0;     "RST low asynchronously
[C,H,H,0]   -> 0FF;   "PRE high synchronously
end Rst_Pre1

Figure 5. Reset/Preset from Single Pins

"Cypress Semiconductor Corporation, 11/10/1987
module Rst_Pre2                      "Module name
flag '-r3'                           "Logic Reduction level r3, PRESTO algorithm by pin
title 'Asynchronous Reset / Synchronous Preset Example 2, Reset and Preset generated from Product terms'

"************************************************************
"* This Example will Asynchronously Reset all registers when the inputs
"* equal 55 and
"* Synchronously Set all registers when the inputs equal AA
"************************************************************

"Device designator and type
U1 device 'P22V10';

"Pin assignments
CLK                              pin 1;             "Clock input
INPUT7,INPUT6,INPUT5,INPUT4      pin 4,5,6,7;
INPUT3,INPUT2,INPUT1,INPUT0      pin 8,9,10,11;
OUTPUT7,OUTPUT6,OUTPUT5,OUTPUT4  pin 23,22,21,20;
OUTPUT3,OUTPUT2,OUTPUT1,OUTPUT0  pin 19,18,17,16;
reset,preset                     node 25,26;        "Pre-assigned node #s

"Labels
H,L,X,C,Z = 1,0,.X.,.C.,.Z.;
INPUTS    = [INPUT7..INPUT0];
OUTPUTS   = [OUTPUT7..OUTPUT0];

@radix 16;          "command forces the default number base to be HEX

equations
reset    = (INPUTS == 55);    "Async reset when input = 55
preset   = (INPUTS == 0AA);   "Sync preset if inputs = AA during the rising edge of CLK
OUTPUTS := INPUTS;            "The := indicates that this is a clocked (synchronous) operation

test_vectors
"Test reset and preset
([CLK,INPUTS] -> OUTPUTS)
[C,0]   -> 0;     "Test outputs by clocking in 0
[L,0FF] -> 0;     "Test registers hold old data (0)
[C,0FF] -> 0FF;   "Clock in FF (note leading zero for hex digits A thru F)
[L,55]  -> 0;     "RST low asynchronously on inputs = 55
[L,0AA] -> 0;     "No change, PRE is synchronous
[C,0AA] -> 0FF;   "PRE acts synchronously on inputs = AA
end Rst_Pre2

Figure 6. Reset/Preset From Product Terms

paths from the macrocells to the I/O pins: the Q and
Q-bar outputs of the macrocell's register and the true and
inverted combinatorial terms that bypass the register.
All these paths pass through a 4:1 multiplexer, which is
controlled by architecture bits C0 and C1.

The istype statements allow you to select which channel
of the multiplexer is routed to the I/O pin. Table 8
shows the choices available.

An additional parameter in the istype statement allows
you to select feedback paths. The choices are
feed_term, feed_reg, and feed_pin. An example showing this parameter is:
OUTPUT6 istype 'pos,com,feed_pin';
Specifying a feedback path for the 22V10 is redundant,
however. This is because the 22V10 selects a feedback
path using the same architecture bit (C1) that controls
the selection of registered or combinatorial outputs.
The 22V10 does not offer a feedback path from product
terms.

Note from the test vectors in Figure 8 that the use of
istype statements does not affect the outputs' polarity as
described by the Boolean equations. Conversely, if you
define an output as active Low through a Boolean equation, as in:
!OUTPUT6 := INPUT6;
the state of the register is inverted for normal operation
and for reset and preset conditions.

A final note on using istype statements in conjunction
with the reset node: The 22V10 resets when VCC is first
applied to the chip. Istype statements and active-Low
Boolean equations give you the opportunity to force the
device's outputs to any desired state upon power up.

Output Enable Controlled by One Pin
The example in Figure 9 defines pin 2 as the output
enable pin for all outputs. Note the use of special constant ".Z." which is redefined as simply "Z" in the file's
labels section. The constant is used in the test vectors to
verify that the outputs are three-stated (high-Z) under
the appropriate conditions.

Table 8. Macrocell Configuration Selections

C1   C0   Configuration       istype Values
0    0    Reg, Active Low     'neg,reg'
0    1    Reg, Active High    'pos,reg'
1    0    Comb, Active Low    'neg,com'
1    1    Comb, Active High   'pos,com'

Product-Term-Based Output Enable
While Figure 9 illustrates gang control of all output
enables via an input pin, Figure 10 shows several outputs
with individual output enables generated from separate
product terms.
As with reset and preset, you can make output enables
synchronous or extend the number of product terms by
using a macrocell to generate the necessary logic and
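[Diagram: D register with ASYNC RESET, GLOBAL CLOCK, and SYNC PRESET inputs, fed by the SUM OF PRODUCTS; register outputs Q and QB and the true and inverted combinatorial terms drive a 4:1 output multiplexer controlled by C1 and C0; an OUTPUT ENABLE PTERM controls the output buffer; a 2:1 multiplexer controlled by C1 selects the FEEDBACK TO ARRAY.]

Figure 7. The PALC22V10 Macrocell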

"Cypress Semiconductor Corporation, 11/10/1987
module Rst_Pre3                      "Module name
flag '-r3'                           "Logic Reduction level r3, PRESTO algorithm by pin
title 'Asynchronous Reset/Synchronous Preset Example 3, Using Reset and Preset to Load to Predetermined States'

"************************************************************************
"* This Example will Asynchronously Load a Value of 55 and Synchronously Load
"* Value of AA by using 'istype' statements to invert alternating output registers
"************************************************************************

"Device designator and type
U1 device 'P22V10';

"Pin assignments
CLK                              pin 1;             "Clock input
RST                              pin 2;             "Defines async reset pin
PRE                              pin 3;             "Defines sync preset pin
INPUT7,INPUT6,INPUT5,INPUT4      pin 4,5,6,7;
INPUT3,INPUT2,INPUT1,INPUT0      pin 8,9,10,11;
OUTPUT7,OUTPUT6,OUTPUT5,OUTPUT4  pin 23,22,21,20;
OUTPUT3,OUTPUT2,OUTPUT1,OUTPUT0  pin 19,18,17,16;
OUTPUT7,OUTPUT5,OUTPUT3,OUTPUT1  istype 'pos,reg';  "Odd regs positive logic
OUTPUT6,OUTPUT4,OUTPUT2,OUTPUT0  istype 'neg,reg';  "Even regs negative
reset,preset                     node 25,26;        "Pre-assigned node #s

"Labels
H,L,X,C,Z = 1,0,.X.,.C.,.Z.;
INPUTS    = [INPUT7..INPUT0];
OUTPUTS   = [OUTPUT7..OUTPUT0];

@radix 16;          "command forces the default number base to be HEX

equations
reset    = !RST;    "Async reset when pin RST low
preset   = PRE;     "Sync preset if pin PRE is high during the rising edge of CLK
OUTPUTS := INPUTS;  "The := indicates that this is a clocked (synchronous) operation

test_vectors
([CLK,RST,PRE,INPUTS] -> OUTPUTS)    "Test Reset and Preset
[C,H,L,55]  -> 55;    "Test outputs by clocking in 55
[L,H,L,0AA] -> 55;    "Test registers hold old data (55)
[C,H,L,0AA] -> 0AA;   "Clock in AA (note the leading zero necessary for hex digits A thru F)
[C,H,L,0FF] -> 0FF;   "Set all outputs high (FF)
[L,L,L,0FF] -> 55;    "RST low asynchronously (bits 6,4,2,0 inverted)
[C,H,H,0]   -> 0AA;   "PRE high synchronously (bits 6,4,2,0 inverted)
end Rst_Pre3

Figure 8. Resetting and Presetting to Predetermined Values
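Why reset yields 55 and preset yields AA in Figure 8 can be seen with a small Python sketch of the output polarity selection. The sketch is an illustration under stated assumptions: NEG_MASK marks the even-numbered outputs declared 'neg,reg', and the function names are hypothetical.

```python
NEG_MASK = 0x55   # bits 6,4,2,0: outputs declared 'neg,reg' in Figure 8


def pins_from_registers(q):
    """Pin values for a register pattern q: 'neg' pins show the inverse."""
    return (q ^ NEG_MASK) & 0xFF


def registers_for_pins(value):
    """Register pattern that must be loaded so the pins show 'value'.

    This models the compiler keeping the polarity written in the
    equations by inverting the logic ahead of the 'neg' registers.
    """
    return (value ^ NEG_MASK) & 0xFF


print(hex(pins_from_registers(0x00)))   # after reset: all registers 0
print(hex(pins_from_registers(0xFF)))   # after preset: all registers 1
print(hex(pins_from_registers(registers_for_pins(0xAA))))  # normal load
```

Reset and preset act on the registers themselves, so the inverted pins show the opposite level; a normal clocked load is compensated and appears at the pins unchanged.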

"Cypress Semiconductor Corporation November 10, 1987
module Out_Enable1                   "Module name
flag '-r3'                           "Logic Reduction level r3
title 'Output Enable from Single Input Example'

"***********************************************
"* This example demonstrates the Output Enable *
"* Function being controlled by a single input *
"***********************************************

U1 device 'P22V10';                  "Device designator and type

"Pin assignments
CLK                              pin 1;             "Clock input
OE                               pin 2;             "Output enable input
INPUT7,INPUT6,INPUT5,INPUT4      pin 4,5,6,7;
INPUT3,INPUT2,INPUT1,INPUT0      pin 8,9,10,11;
OUTPUT7,OUTPUT6,OUTPUT5,OUTPUT4  pin 23,22,21,20;
OUTPUT3,OUTPUT2,OUTPUT1,OUTPUT0  pin 19,18,17,16;
reset,preset                     node 25,26;        "Pre-assigned node #s

"Labels
H,L,X,C,Z = 1,0,.X.,.C.,.Z.;
INPUTS    = [INPUT7..INPUT0];
OUTPUTS   = [OUTPUT7..OUTPUT0];
OUTENA    = [OUTPUT7.OE,OUTPUT6.OE,OUTPUT5.OE,OUTPUT4.OE];
OUTENB    = [OUTPUT3.OE,OUTPUT2.OE,OUTPUT1.OE,OUTPUT0.OE];

@radix 16;          "This command forces the default number base to be HEX

equations
OUTENA   = !OE;     "Outputs enabled only if pin OE is low
OUTENB   = !OE;
OUTPUTS := INPUTS;

test_vectors
"Test output enables
([CLK,OE,INPUTS] -> OUTPUTS)
[C,L,55]  -> 55;    "Test outputs by clocking in 55 (outputs enabled)
[L,H,0AA] -> Z;     "Test outputs go to high-Z state on OE high
[L,L,0AA] -> 55;    "Test registers hold old data (55)
[C,L,0AA] -> 0AA;   "Clock in AA (note the leading zero necessary for hex digits A thru F)
[C,H,0FF] -> Z;     "Set all outputs high (FF) but tri-stated
[L,L,X]   -> 0FF;   "Turn outputs on and read FF
end Out_Enable1

Figure 9. Output Enable Controlled by a Single Input
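The test vectors of Figure 9 can be replayed against a simple behavioral model to see that the registers keep clocking while the outputs are three-stated. The Python sketch below is an illustration only; the function name is hypothetical, True stands for a clock pulse (.C.), None stands for a don't-care input (.X.), and "Z" stands for a three-stated bus.

```python
def run_vectors(vectors):
    """Return what is observed at OUTPUTS after each (clk, OE, INPUTS)."""
    q = 0x00
    observed = []
    for clock_pulse, oe_pin, inputs in vectors:
        if clock_pulse and inputs is not None:
            # OUTPUTS := INPUTS, clocked regardless of the enable pin.
            q = inputs & 0xFF
        # OUTENA = OUTENB = !OE: outputs drive only while OE is low.
        observed.append(q if oe_pin == 0 else "Z")
    return observed


FIGURE_9_VECTORS = [
    (True,  0, 0x55),   # clock in 55, outputs enabled
    (False, 1, 0xAA),   # OE high: high-Z
    (False, 0, 0xAA),   # OE low again: old data (55) still held
    (True,  0, 0xAA),   # clock in AA
    (True,  1, 0xFF),   # clock in FF while three-stated
    (False, 0, None),   # enable outputs and read FF
]

print(run_vectors(FIGURE_9_VECTORS))
```

The last two vectors show the point of the test: data loaded while the outputs are disabled is present as soon as they are enabled.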

"Cypress Semiconductor Corp. 11/10/1987
module Out_Enable2                   "Module name
flag '-r3'                           "Logic Reduction level r3
title 'Output Enable From a Product Term Example'

"***********************************************
"* This example demonstrates the Output Enable *
"* Function being controlled by a product term *
"***********************************************

U1 device 'P22V10';                  "Device designator and type

"Pin assignments
CLK,OE                           pin 1,2;           "Clock and Output Enable inputs
INPUT7,INPUT6,INPUT5,INPUT4      pin 4,5,6,7;
INPUT3,INPUT2,INPUT1,INPUT0      pin 8,9,10,11;
OUTPUT7,OUTPUT6,OUTPUT5,OUTPUT4  pin 23,22,21,20;
OUTPUT3,OUTPUT2,OUTPUT1,OUTPUT0  pin 19,18,17,16;
reset,preset                     node 25,26;        "Pre-assigned node #s

"Labels
H,L,X,C,Z = 1,0,.X.,.C.,.Z.;
INPUTS    = [INPUT7..INPUT0];
OUTPUTS   = [OUTPUT7..OUTPUT0];

@radix 16;          "This command forces the default number base to be HEX

equations
"Each Output individually enabled if the corresponding digital code is applied at
"inputs and OE is low
OUTPUT0.OE = (INPUTS == 0) & !OE;
OUTPUT1.OE = (INPUTS == 1) & !OE;
OUTPUT2.OE = (INPUTS == 2) & !OE;
OUTPUT3.OE = (INPUTS == 3) & !OE;
OUTPUT4.OE = (INPUTS == 4) & !OE;
OUTPUT5.OE = (INPUTS == 5) & !OE;
OUTPUT6.OE = (INPUTS == 6) & !OE;
OUTPUT7.OE = (INPUTS == 7) & !OE;
OUTPUTS := INPUTS;

test_vectors
"Loads 55, checks OE high overrides
"all enable terms, then enables and
"checks all outputs one at a time
([CLK,OE,INPUTS] -> [OUTPUT7..OUTPUT0])
[C,H,55] -> [Z,Z,Z,Z,Z,Z,Z,Z];
[L,H,0]  -> [Z,Z,Z,Z,Z,Z,Z,Z];
[L,L,0]  -> [Z,Z,Z,Z,Z,Z,Z,1];
[L,L,1]  -> [Z,Z,Z,Z,Z,Z,0,Z];
[L,L,2]  -> [Z,Z,Z,Z,Z,1,Z,Z];
[L,L,3]  -> [Z,Z,Z,Z,0,Z,Z,Z];
[L,L,4]  -> [Z,Z,Z,1,Z,Z,Z,Z];
[L,L,5]  -> [Z,Z,0,Z,Z,Z,Z,Z];
[L,L,6]  -> [Z,1,Z,Z,Z,Z,Z,Z];
[L,L,7]  -> [0,Z,Z,Z,Z,Z,Z,Z];
end Out_Enable2

Figure 10. Separate Output Enables Controlled by Product Terms

looping back the term via a feedback path. This method
incurs additional propagation delay due to passing
through the AND/OR array twice, however.
The special constant ".Z." is used in the test vectors for
this design to verify the operation of outputs in the
three-stated (high-Z) mode.

An 8-Bit Identity Comparator
This example (Figure 11) points out how the 22V10's
variable-product-term architecture permits you to
directly implement logic that would otherwise require
multiple feedback terms in standard PLDs. The 22V10
offers 16 product terms maximum, compared to only
eight product terms per output for standard 20-pin
PLDs.
An n-bit comparator requires 2^n product terms to implement. This example achieves 8-bit comparison by
decomposing the 8 bits into two 4-bit comparisons and
using I/O pins 18 and 19 for each 4-bit comparison.
These pins have 16 product terms each. The results of
each 4-bit comparison are available at the pins one tPD
after a match is detected.
Note in Figure 11 how the inputs and outputs are used
in more than one label. This practice facilitates writing
equations and test vectors for the individual 4-bit fields
and the complete 8-bit fields.

Single-Output, 9-Bit Identity Comparator
This example is very similar to the example in Figure 11,
except this example rearranges the DATA inputs to
AND the two 4-bit comparator outputs with the result of
the single, 9th-bit compare. The result is a single DATA
= INPUTS output called INEQDATA. The source
code for this example appears in Figure 12.
The disadvantage of this implementation is that it incurs
an additional tPD by feeding the individual 4-bit comparator outputs back through the AND/OR array. Note
that although the terms fed back to INEQDATA represent 34 (16 + 16 + 2) product terms, only three of the
eight product terms available at I/O pin 23 are used;
each of the three individual compares has already
been reduced to a single signal by the time it reaches
the AND/OR array for pin 23. You can also use the
extra product terms along with a separately defined
input for cascading the design to n-bit length.

Bus Interface Data Trap with Answer-back
This example demonstrates the 22V10's bidirectional
I/O capabilities (Figure 13). In this example, an 8-bit
pattern is supplied to INPUTS and is continuously compared to the data on DATA7..DATA0.
This design is intended for an application in which
DATA7..DATA0 is a Z80 microprocessor's data bus. If
the interrupt is enabled (pin INTRENBL is High), the
8-bit comparator output drives pin INTR active (Low).
In response, the Z80 drives pin IDREQ High. This action asks the device that initiated the interrupt to place
its 8-bit ID code on the data bus. In this example, the
ID code used is ^h55. You can use any code by
modifying the equation for DATA in the source file.

Counter/Address Generator/Multiplexer
This 10-bit counter, address generator, and multiplexer
example (Figure 14) implements the address-generation
circuitry for the front end of a high-speed data-acquisition module. The design requires two modes of operation: In ACQUIRE mode, counters generate the ten
address lines. In READ mode, a microprocessor's address lines generate the same addresses.
A discrete version of this application employs quad 2:1
multiplexers to select whether the counters or
microprocessor provide the address information. The
entire discrete circuit, excluding the SRAM being addressed, consists of 11 SSI and MSI TTL components.
The example given here implements the equivalent circuitry in a single 22V10.
Note how the MODE pin in the equations for the
AOUT outputs controls the source of the addresses.
Also note the use of the asynchronous reset node: the
reset term is generated when the MODE is set for
microprocessor access (Low) and the processor address
itself is zero. Although the effect at the outputs (all outputs = zero) is the same as if the reset term were not
included, the asynchronous reset gives the processor a
way to reset all the registers to a known state before
allowing the counters to free-run again.

Timing Diagram
One of the more interesting features of the ABEL
SIMULATE program is its ability to generate timing
diagrams for specified pins based on the test vectors in
a source file. Although a timing diagram does not show
propagation delays, it can help you verify a device's in-circuit operation with a logic analyzer. The SIMULATE
output file shown in Figure 15 is generated with the
command line:
simulate -iaddmux.out -oaddmux.sim -t4 -w1,2,3,4,5,13,14,15,16,17,18

"Cypress Semiconductor Corporation November 10, 1987
module AllTerms                      "Module name
flag '-r3'                           "Logic Reduction level r3, PRESTO algorithm by pin
title 'Using 16 Product Terms; An 8-bit Identity Comparator'

"***************************************************************************
"* In this design, an 8-bit word is presented at I/O pins 23,22,21,20,17,16,15 and 14.
"* These pins are used for inputs only in this example. The 8-bit word is compared, 4 bits
"* at a time, to inputs INPUT7..0. Combinatorial outputs COMPHI and COMPLO show
"* the result of each 4-bit comparison. Pins 19 and 18 are used as the comparator outputs
"* since these pins have enough Product Terms (16) for the required 4-bit comparisons.
"***************************************************************************

U1 device 'P22V10';                  "Device designator and type

"Pin assignments
CLK                              pin 1;             "Clock input (NOT used)
INPUT7,INPUT6,INPUT5,INPUT4      pin 4,5,6,7;
INPUT3,INPUT2,INPUT1,INPUT0      pin 8,9,10,11;
DATA7,DATA6,DATA5,DATA4          pin 23,22,21,20;
DATA3,DATA2,DATA1,DATA0          pin 17,16,15,14;
COMPHI,COMPLO                    pin 19,18;         "Comparator outputs
reset,preset                     node 25,26;        "Pre-assigned node #s

H,L,X,C,Z = 1,0,.X.,.C.,.Z.;
INPUTSH   = [INPUT7..INPUT4];        "High-order nibble
DATAH     = [DATA7..DATA4];
INPUTSL   = [INPUT3..INPUT0];        "Low-order nibble
DATAL     = [DATA3..DATA0];
DATA      = [DATA7..DATA0];          "All 8 bits
INPUTS    = [INPUT7..INPUT0];

@radix 16;

equations
COMPHI = (INPUTSH == DATAH);         "High-order nibble compare
COMPLO = (INPUTSL == DATAL);         "Low-order nibble compare

test_vectors
([DATA,INPUTS] -> [COMPHI,COMPLO])
[0,0]     -> [H,H];
[1,1]     -> [H,H];
[2,2]     -> [H,H];
[4,4]     -> [H,H];
[8,8]     -> [H,H];
[0F,0F]   -> [H,H];
[0E,0E]   -> [H,H];
[0D,0D]   -> [H,H];
[0B,0B]   -> [H,H];
[7,7]     -> [H,H];
[0,0F]    -> [H,L];
[0F0,0FF] -> [H,L];
[0F0,0]   -> [L,H];
[0F0,0F]  -> [L,L];
end AllTerms

Figure 11. Using 16 Product Terms: An 8-Bit Identity Comparator
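The product-term argument behind the split into two 4-bit comparisons can be illustrated in Python. The sketch is not derived from the ABEL compiler's output: it counts product terms from the sum-of-products expansion of an identity compare (each bit equality contributes two terms, and ANDing n of them multiplies the terms out), and the function names are hypothetical.

```python
def product_terms_for_identity(n_bits):
    """Product terms in the expanded sum-of-products of an n-bit compare."""
    return 2 ** n_bits


def comp_nibbles(data, inputs):
    """Return (COMPHI, COMPLO) for 8-bit DATA and INPUTS values."""
    comphi = int(((data >> 4) & 0xF) == ((inputs >> 4) & 0xF))
    complo = int((data & 0xF) == (inputs & 0xF))
    return comphi, complo


print(product_terms_for_identity(4))   # fits the 16-term pins 18 and 19
print(product_terms_for_identity(8))   # far more than any single output has

# ANDing the two nibble results gives the full 8-bit identity compare.
print(all((comp_nibbles(d, i)[0] & comp_nibbles(d, i)[1]) == int(d == i)
          for d in range(256) for i in range(256)))
```

The nibble results reproduce the mismatching test vectors of Figure 11, for example DATA = F0 against INPUTS = FF, where only the high-order compare matches.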

"Cypress Semiconductor Corporation November 10, 1987
module CompFB                        "Module name
flag '-r3'                           "Logic Reduction level r3, PRESTO algorithm by pin
title 'Using Feedback to Realize more than 16 Product Terms; A Single Output, 9-bit Identity Comparator'

"****************************************************************************
"* In this design, a 9-bit word is presented at pins 23,22,21,20,17,16,11,10 and 9.
"* These pins are used for inputs only in this example. The 8 LSBs of the 9-bit word are
"* compared, 4 bits at a time, to inputs INPUT7..0. Combinatorial outputs COMPHI and
"* COMPLO show the results of each 4-bit comparison. Pins 19 and 18 are used as the
"* comparator outputs since these pins have enough Product Terms (16) for the required
"* 4-bit comparison. The MSBs (bit 8) of DATA and INPUTS are compared at output COMPMSB.
"* Outputs COMPMSB, COMPHI, and COMPLO are ANDED together to form output
"* INEQDATA.
"****************************************************************************

U1 device 'P22V10';                  "Device designator and type

"Pin assignments
INPUT8,INPUT7,INPUT6,INPUT5,INPUT4   pin 1,2,3,4,5;
INPUT3,INPUT2,INPUT1,INPUT0          pin 6,7,8,9;
DATA8,DATA7,DATA6,DATA5,DATA4        pin 10,11,13,14,15;
DATA3,DATA2,DATA1,DATA0              pin 16,17,20,21;
COMPH,COMPL,COMPMSB,INEQDATA         pin 19,18,22,23;   "Comparator outputs
reset,preset                         node 25,26;        "Pre-assigned node #s

H,L,X,C,Z = 1,0,.X.,.C.,.Z.;
INPUTSH   = [INPUT7..INPUT4];        "High-order nibble
DATAH     = [DATA7..DATA4];
INPUTSL   = [INPUT3..INPUT0];        "Low-order nibble
DATAL     = [DATA3..DATA0];
DATA      = [DATA8..DATA0];          "All nine bits
INPUTS    = [INPUT8..INPUT0];

@radix 16;

equations
COMPH    = (INPUTSH == DATAH);       "High-order nibble compare
COMPL    = (INPUTSL == DATAL);       "Low-order nibble compare
COMPMSB  = (INPUT8 == DATA8);        "MSB compare
INEQDATA = COMPH & COMPL & COMPMSB;  "Logical AND of all comparisons

test_vectors
([DATA,INPUTS] -> [COMPH,COMPL,COMPMSB,INEQDATA])
[0,0]     -> [H,H,H,H];
[111,111] -> [H,H,H,H];
[22,22]   -> [H,H,H,H];
[44,44]   -> [H,H,H,H];
[88,88]   -> [H,H,H,H];
[1FF,1FF] -> [H,H,H,H];
[1FF,0FF] -> [H,H,L,L];
[0,100]   -> [H,H,L,L];
[1FE,1EE] -> [L,H,H,L];
[1FE,1FF] -> [H,L,H,L];
end CompFB

Figure 12. Realizing More Than 16 Product Terms Through Feedback: A 9-Bit, Single-Output Identity Comparator
6-133
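The nibble-wise comparison and the feedback AND used in Figure 12 can be cross-checked in software before the vectors are run through SIMULATE. The following is a minimal Python sketch of that logic, not ABEL source; the function name compare9 is hypothetical, and the expected values are the ones the equations in the listing imply.

```python
def compare9(data, inputs):
    """Model the four outputs of the 9-bit comparator in Figure 12.

    Returns (COMPH, COMPL, COMPMSB, INEQDATA) as booleans.
    """
    comph = ((data >> 4) & 0xF) == ((inputs >> 4) & 0xF)    # bits 7..4
    compl = (data & 0xF) == (inputs & 0xF)                  # bits 3..0
    compmsb = ((data >> 8) & 1) == ((inputs >> 8) & 1)      # bit 8
    ineqdata = comph and compl and compmsb                  # AND of fed-back outputs
    return comph, compl, compmsb, ineqdata


H, L = True, False

# A few vectors in the same [DATA,INPUTS] -> outputs form as the listing.
vectors = [
    (0x111, 0x111, (H, H, H, H)),
    (0x1FF, 0x0FF, (H, H, L, L)),
    (0x000, 0x100, (H, H, L, L)),
    (0x1FE, 0x1EE, (L, H, H, L)),
    (0x1FE, 0x1FF, (H, L, H, L)),
]

for data, inputs, expected in vectors:
    assert compare9(data, inputs) == expected
```

Because each 4-bit equality stays within the 16 product terms of pins 19 and 18, only the final AND needs the fed-back signals, which is the point of the example.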

"Cypress Semiconductor Corp., 11/10/1987
module BiDirect                 "Module name test
flag '-r3'                      "Logic Reduction level r3, PRESTO algorithm by pin
title 'Bi-Directional I/O A Bus Interface Data Trap with Answer-Back'

"****************************************************************************
"* This example compares the pattern at pins INPUTS to the data on data bus pins     *
"* DATA7..DATA0. Pin INTR is driven low if they match and INTRENBL (interrupt        *
"* enable) is high. Input IDREQ is then driven high, requesting ID code (^h55 in     *
"* this example) to be put on the data bus                                           *
"****************************************************************************
U1 device 'P22V10';
IDREQ,INTRENBL                  pin 2,3;        "Output Enable, Interrupt Enable
COMPL,INTR                      pin 19,18;      "Used in comparison of 4 LSBs
INPUT7,INPUT6,INPUT5,INPUT4     pin 4,5,6,7;
INPUT3,INPUT2,INPUT1,INPUT0     pin 8,9,10,11;
DATA7,DATA6,DATA5,DATA4         pin 23,22,21,20;
DATA3,DATA2,DATA1,DATA0         pin 17,16,15,14;
reset,preset                    node 25,26;     "Pre-assigned node #s
H,L,X,C,Z = 1,0,.X.,.C.,.Z.;
INPUTS  = [INPUT7..INPUT0];                     "All inputs
INPUTH  = [INPUT7..INPUT4];                     "High order nibble of INPUTS
INPUTL  = [INPUT3..INPUT0];                     "Low order nibble of INPUTS
DATA    = [DATA7..DATA0];                       "All data I/Os
DATAH   = [DATA7..DATA4];                       "High order nibble of DATA
DATAL   = [DATA3..DATA0];                       "Low order nibble of DATA
DATAOEA = [DATA7.OE,DATA6.OE,DATA5.OE,DATA4.OE];
DATAOEB = [DATA3.OE,DATA2.OE,DATA1.OE,DATA0.OE];
IDCODE  = ^h55;                                 "Identification code
equations
DATAOEA = IDREQ;                                "Enables ID output onto data bus
DATAOEB = IDREQ;
DATA    = IDCODE;                               "Identification code for device (^h55)
COMPL   = (DATAL == INPUTL);                    "4 LSBs compare
!INTR   = (DATAH == INPUTH) & COMPL & INTRENBL; "INTR active low, All bits equal and
                                                "interrupt enabled (INTRENBL high)

test_vectors
([IDREQ,INTRENBL,DATA,INPUTS] -> [COMPL,INTR,DATA])
[L,H,^h0F,^h1F]   -> [H,H,X];                   "Low nibble equal, high not equal
[L,H,^h0F0,^h0F1] -> [L,H,X];                   "High nibble equal, low not equal
[L,L,^h0AA,^h0AA] -> [H,H,X];                   "Test Interrupt Enable
[L,H,^h0AA,^h0AA] -> [H,L,X];                   "DATA = INPUTS, INTR goes active (low)
[L,H,^h55,^h55]   -> [H,L,X];
[H,H,Z,X]         -> [X,X,IDCODE];              "DATA pins output IDCODE (^h55)
end BiDirect

Figure 13. BiDirectional I/O: Bus Interface Data

"Cypress Semiconductor Corporation November 10, 1987

module AddGenMux flag '-r3'

title '10-bit Address Generation / Multiplexer IC'

"********************************************************
"* This PLD design generates Address signals A0-A9.     *
"* If Control signal MODE is high, the address signals  *
"* are the output of a 10-bit counter. If MODE is low   *
"* the device passes uP Address lines UPADD0-UPADD9     *
"********************************************************
AdrsGen device 'p22v10';
CLK                             pin 1;          "System Master Clock
A0,A1,A2,A3,A4,A5,A6,A7,A8,A9   pin 14,15,16,17,18,19,23,22,21,20;   "Address Outputs
UPADD0,UPADD1,UPADD2,UPADD3     pin 2,3,4,5;    "uP Address Lines
UPADD4,UPADD5,UPADD6,UPADD7     pin 6,7,8,9;
UPADD8,UPADD9                   pin 10,11;
MODE                            pin 13;
reset,preset                    node 25,26;
H,L,X,C,Z       = 1,0,.X.,.C.,.Z.;
AOUT            = [A9..A0];
UPADD           = [UPADD9..UPADD0];
@radix 16;
equations                                       "Boolean equations
reset = (UPADD == 0) & !MODE;                   "Reset if uP Address = 00 and MODE is low
AOUT := ((AOUT + 1) & MODE)                     "Count up if MODE high or
      # (UPADD & !MODE);                        "Pass UPADD if MODE low

test_vectors                                    "Check Operation
([CLK,UPADD,MODE] -> AOUT)
[X,0,L] -> 0;                                   "Checks Reset Function
[C,X,H] -> 1;     [C,X,H] -> 2;     [C,X,H] -> 3;     [C,X,H] -> 4;
[C,X,H] -> 5;     [C,X,H] -> 6;     [C,X,H] -> 7;     [C,X,H] -> 8;
[C,X,H] -> 9;     [C,X,H] -> 0A;    [C,X,H] -> 0B;    [C,X,H] -> 0C;
[C,X,H] -> 0D;    [C,X,H] -> 0E;    [C,X,H] -> 0F;    [C,X,H] -> 10;
[C,111,L] -> 111; [C,222,L] -> 222; [C,44,L] -> 44;   [C,88,L] -> 88;
[C,2EE,L] -> 2EE; [C,155,L] -> 155; [C,1DD,L] -> 1DD; [C,2AA,L] -> 2AA;
[C,3BB,L] -> 3BB; [C,377,L] -> 377; [C,3FF,L] -> 3FF; [C,222,H] -> 00;
                                                "Load to states where all 8 LSBs
                                                "are high (uP mode), then toggle in
                                                "counter mode
[C,0FF,L] -> 0FF; [C,X,H] -> 100;
[C,1FF,L] -> 1FF; [C,X,H] -> 200;
[C,2FF,L] -> 2FF; [C,X,H] -> 300;
[C,3FF,L] -> 3FF; [C,X,H] -> 0;

end AddGenMux
Figure 14. 10-Bit Address Generator/Multiplexer
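The registered equation for AOUT in Figure 14 can be stated as a simple next-state function, which is handy for predicting test-vector results by hand. The following Python sketch models that behavior under the assumption that the registers update only on a clock edge; the names clock_edge and reset_asserted are hypothetical and are not part of ABEL.

```python
ADDRESS_MASK = 0x3FF  # ten address bits, A9..A0


def clock_edge(aout, upadd, mode):
    """Return AOUT after one clock edge.

    mode high: count up (wrapping at 10 bits); mode low: pass UPADD.
    """
    if mode:
        return (aout + 1) & ADDRESS_MASK
    return upadd & ADDRESS_MASK


def reset_asserted(upadd, mode):
    """The reset term: uP address is 0 and MODE is low."""
    return upadd == 0 and not mode


# Count sixteen times from the reset state, as the first test vectors do.
aout = 0
for _ in range(16):
    aout = clock_edge(aout, 0, 1)
assert aout == 0x10

# Load 0x0FF in uP mode, then one count ripples into bit 8.
aout = clock_edge(aout, 0x0FF, 0)
aout = clock_edge(aout, 0, 1)
assert aout == 0x100
```

The load-then-count pairs exercise the carry out of the eight LSBs, which is why the listing loads 0FF, 1FF, 2FF, and 3FF before toggling into counter mode.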

The "-i" indicates the input file, which in this case is the
intermediate output file created by ABEL's FUSEMAP
program. The "-o" tells SIMULATE which file to write
the results into. The "-t4" specifies the trace level where
waveforms are displayed, and the "-w1..18" indicates
which pins to show in the waveform output.

You can find more information on SIMULATE in the
ABEL User's Guide and Language Reference supplied
with the ABEL software from Data I/O.

[Figure 15. ABEL Simulated Waveform: SIMULATE output from ABEL Version 2.00b (Data I/O Corp.) for the Address Generation / Multiplexer IC, device AdrsGen, type 'P22V10', vectors V0001 through V0022]

Three State Machines in One 22V10
This final example demonstrates the power of the
22V10 when used as a synchronous state machine. The
application involves the redesign of a radar system's
timing circuitry. The system performs 12 discrete
Fourier transforms on each set of quadrature data
returned in three antenna beams that are gated for nine
ranges. The nonbinary nature of these numbers (three
beams, nine ranges, and 12 speed bins) makes generating
the timing signals with counter circuits cumbersome.
This example creates three state machines in a single
22V10. As you can see from the state diagrams (Figure
16), the filter state machine is free running. The beam
state machine changes states only when the filter outputs
are in their maximum condition. Similarly, the gate
information changes only if both the filter and beam
outputs are at their maximum values.
Note the combined use of Boolean equations and state
diagrams. A separate state diagram describes each state
machine, but the transitions depend upon the condition
of the other state outputs. Also of note is the extensive
use of labels for pins, groups of pins, and the state
outputs. This approach greatly simplifies the writing of the
state diagrams and test vectors.
When this design was first compiled, the ABEL
FUSEMAP routine indicated several outputs that had
too many terms for the physical array of the
corresponding I/O pin. The design was made to fit by
carefully arranging the I/Os. The flag "-r3" reduction
statement made the fit possible without the tedium of
generating and manually reducing Boolean equations
from the state diagrams.
The test vectors for this design are of particular
interest. Note how the @REPEAT command cycles through
35 states to make the gate state outputs toggle. This
powerful command helps describe 325 test vectors in a
concise and manageable manner.
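The nesting of the three state machines can be illustrated with a short behavioral model. The Python sketch below is only an illustration, not the ABEL design: the state codes are copied from the Figure 16 listing, while the function names step, run, and pin_value are hypothetical. It shows why 35 "don't care" clocks plus one checked clock separate successive gate states.

```python
# State codes as assigned in the Figure 16 listing (note the skipped codes).
FILT_CODES = [0x0, 0x1, 0x2, 0x4, 0x5, 0x6, 0x8, 0x9, 0xA, 0xC, 0xD, 0xE]
BEAM_CODES = [0x0, 0x1, 0x2]
GATE_CODES = [0x0, 0x1, 0x2, 0x4, 0x5, 0x6, 0x8, 0x9, 0xA]


def step(gate, beam, filt):
    """Advance the (gate, beam, filter) state indices by one clock."""
    filt_at_max = FILT_CODES[filt] == 0xE
    beam_at_max = BEAM_CODES[beam] == 0x2
    # Gate moves only when both beam and filter are at their maximum.
    next_gate = (gate + 1) % len(GATE_CODES) if (filt_at_max and beam_at_max) else gate
    # Beam moves only when the filter is at its maximum.
    next_beam = (beam + 1) % len(BEAM_CODES) if filt_at_max else beam
    # The filter machine is free running.
    next_filt = (filt + 1) % len(FILT_CODES)
    return next_gate, next_beam, next_filt


def run(clocks, state=(0, 0, 0)):
    """Apply `clocks` clock edges starting from the reset state."""
    for _ in range(clocks):
        state = step(*state)
    return state


def pin_value(state):
    """Pack the state codes as GATE = AB9..AB6, BEAM = AB5..AB4, FILT = AB3..AB0."""
    gate, beam, filt = state
    return (GATE_CODES[gate] << 6) | (BEAM_CODES[beam] << 4) | FILT_CODES[filt]


assert run(12) == (0, 1, 0)      # beam advances after 12 filter states
assert run(36) == (1, 0, 0)      # gate advances after 3 x 12 = 36 clocks
assert run(324) == (0, 0, 0)     # full cycle: 9 x 3 x 12 clocks
```

A complete pass through all states takes 9 x 3 x 12 = 324 clocks, which is why writing the vectors out without @REPEAT would be unmanageable.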

"Cypress Semiconductor Corporation November 10, 1987
module Statexam flag '-r3'
title 'Timing Generation TRIPLE State Machine for DFT Processor using a Cypress Semiconductor PAL C22V10'

"*********************************************************************
"* BEAM STATES - 0,1,2 (3 not used), GATE STATES - 0,1,2,4,5,6,8,9,A
"* (3,7,B,C,D,E,F not used), FILTER STATES - 0,1,2,4,5,6,8,9,A,C,D,E
"* (3,7,B,F not used)
"*********************************************************************
U1 device 'P22V10';
SYSCLK                  pin 1;
START                   pin 2;                  "Used for reset/power-up
AB0,AB1,AB2,AB3,AB4     pin 23,14,22,15,21;     "Pins are non-sequential to take advantage of
AB5,AB6,AB7,AB8,AB9     pin 16,18,19,20,17;     "the variable number of product terms in the 22V10
reset,preset            node 25,26;             "Pre-assigned node #s
AB0,AB1,AB2,AB3,AB4     istype 'pos,reg';       "Unnecessary because ABEL will set architecture bits
AB5,AB6,AB7,AB8,AB9     istype 'pos,reg';       "automatically - shown for example purposes only
H,L,X,C,Z = 1,0,.X.,.C.,.Z.;
ABall   = [AB9..AB0];
FILT    = [AB3..AB0];
BEAM    = [AB5,AB4];
GATE    = [AB9..AB6];
@radix 16;

"Filter States - note missing states
F0 = 00; F1 = 01; F2 = 02; F3 = 04; F4 = 05; F5 = 06; F6 = 08;
F7 = 09; F8 = 0A; F9 = 0C; F10 = 0D; F11 = 0E;
"Beam States
B0 = 00; B1 = 01; B2 = 02;
"Gate States
G0 = 00; G1 = 01; G2 = 02; G3 = 04; G4 = 05; G5 = 06; G6 = 08; G7 = 09; G8 = 0A;
equations
"Initialize to all lows on START
reset = START;
state_diagram FILT
State F0: GOTO F1; State F1: GOTO F2; State F2: GOTO F3; State F3: GOTO F4;
State F4: GOTO F5; State F5: GOTO F6; State F6: GOTO F7; State F7: GOTO F8;
State F8: GOTO F9; State F9: GOTO F10; State F10: GOTO F11; State F11: GOTO F0;
state_diagram BEAM
State B0: case (FILT == ^b1110) : B1;
               (FILT != ^b1110) : B0;
          endcase;
State B1: case (FILT == ^b1110) : B2;           "Increment ONLY if
               (FILT != ^b1110) : B1;           "FILT is at max (0E)
          endcase;
State B2: case (FILT == ^b1110) : B0;
               (FILT != ^b1110) : B2;
          endcase;

Figure 16. Triple State Machine (Part 1)

state_diagram GATE                              "Increments ONLY if BEAM and FILT are at max
State G0: case ((BEAM == ^b10) & (FILT == ^b1110)) : G1;
               ((BEAM != ^b10) # (FILT != ^b1110)) : G0;
          endcase;
State G1: case ((BEAM == ^b10) & (FILT == ^b1110)) : G2;
               ((BEAM != ^b10) # (FILT != ^b1110)) : G1;
          endcase;
State G2: case ((BEAM == ^b10) & (FILT == ^b1110)) : G3;
               ((BEAM != ^b10) # (FILT != ^b1110)) : G2;
          endcase;
State G3: case ((BEAM == ^b10) & (FILT == ^b1110)) : G4;
               ((BEAM != ^b10) # (FILT != ^b1110)) : G3;
          endcase;
State G4: case ((BEAM == ^b10) & (FILT == ^b1110)) : G5;
               ((BEAM != ^b10) # (FILT != ^b1110)) : G4;
          endcase;
State G5: case ((BEAM == ^b10) & (FILT == ^b1110)) : G6;
               ((BEAM != ^b10) # (FILT != ^b1110)) : G5;
          endcase;
State G6: case ((BEAM == ^b10) & (FILT == ^b1110)) : G7;
               ((BEAM != ^b10) # (FILT != ^b1110)) : G6;
          endcase;
State G7: case ((BEAM == ^b10) & (FILT == ^b1110)) : G8;
               ((BEAM != ^b10) # (FILT != ^b1110)) : G7;
          endcase;
State G8: case ((BEAM == ^b10) & (FILT == ^b1110)) : G0;
               ((BEAM != ^b10) # (FILT != ^b1110)) : G8;
          endcase;
test_vectors                                    "Verifies device's operation
([SYSCLK,START] -> [GATE,BEAM,FILT])
[X,H] -> [G0,B0,F0];
[C,L] -> [G0,B0,F1];  [C,L] -> [G0,B0,F2];  [C,L] -> [G0,B0,F3];  [C,L] -> [G0,B0,F4];
[C,L] -> [G0,B0,F5];  [C,L] -> [G0,B0,F6];  [C,L] -> [G0,B0,F7];  [C,L] -> [G0,B0,F8];
[C,L] -> [G0,B0,F9];  [C,L] -> [G0,B0,F10]; [C,L] -> [G0,B0,F11];
[C,L] -> [G0,B1,F0];  [C,L] -> [G0,B1,F1];  [C,L] -> [G0,B1,F2];  [C,L] -> [G0,B1,F3];
[C,L] -> [G0,B1,F4];  [C,L] -> [G0,B1,F5];  [C,L] -> [G0,B1,F6];  [C,L] -> [G0,B1,F7];
[C,L] -> [G0,B1,F8];  [C,L] -> [G0,B1,F9];  [C,L] -> [G0,B1,F10]; [C,L] -> [G0,B1,F11];
[C,L] -> [G0,B2,F0];  [C,L] -> [G0,B2,F1];  [C,L] -> [G0,B2,F2];  [C,L] -> [G0,B2,F3];
[C,L] -> [G0,B2,F4];  [C,L] -> [G0,B2,F5];  [C,L] -> [G0,B2,F6];  [C,L] -> [G0,B2,F7];
[C,L] -> [G0,B2,F8];  [C,L] -> [G0,B2,F9];  [C,L] -> [G0,B2,F10]; [C,L] -> [G0,B2,F11];
[C,L] -> [G1,B0,F0];                            "Gate output changes state here

@REPEAT ^d35 {[C,L] -> [X,X,X];} [C,L] -> [G2,B0,F0];
@REPEAT ^d35 {[C,L] -> [X,X,X];} [C,L] -> [G3,B0,F0];
@REPEAT ^d35 {[C,L] -> [X,X,X];} [C,L] -> [G4,B0,F0];
@REPEAT ^d35 {[C,L] -> [X,X,X];} [C,L] -> [G5,B0,F0];
@REPEAT ^d35 {[C,L] -> [X,X,X];} [C,L] -> [G6,B0,F0];
@REPEAT ^d35 {[C,L] -> [X,X,X];} [C,L] -> [G7,B0,F0];
@REPEAT ^d35 {[C,L] -> [X,X,X];} [C,L] -> [G8,B0,F0];
@REPEAT ^d35 {[C,L] -> [X,X,X];} [C,L] -> [G0,B0,F0];   "Check the final state rolls over to the first
"This completes a run-through of ALL states, the following 2 vectors retest reset (START)
[C,L] -> [G0,B0,F1]; [C,H] -> [G0,B0,F0];

end Statexam

Figure 16. Triple State Machine (continued)

Using ABEL to Program the CY7C330

This application note describes how to access all the
features of the Cypress CY7C330 using ABEL. Examples
show how to put the features to work. ABEL is a versatile
logic design tool that can program over 300 different
devices.
The Cypress CY7C330 is a powerful PLD. Features
such as input and buried registers allow the CY7C330 to
fit into a wide variety of applications. Although the same
features can make programming the device a challenge,
this application note should minimize the challenge.

ABEL 3.0 Bug
If you are still using ABEL 3.0 and trying to program
the CY7C330 for the first time, note that the supplied
device driver has a fatal flaw. Both Cypress and Data I/O
offer updated device drivers.
ABEL 3.1 also supplies a correct device file, with a
new name. P330 was used for revision 3.0, and P330A for
3.1, although 3.1 still compiles with the P330 device
name. The only difference between these two device files
is the syntax for specifying the shared feedback mux.

Input Registers
The CY7C330 contains 11 dedicated input registers.
An input register is also associated with each one of the
12 output registers (more on this later).
Pin 3 can serve as an input register or a clock input.
In fact, ten of the 11 input registers can be clocked from
two different sources: pins 2 or 3. You can program the
choice of the clock source individually, on a
register-by-register basis. If an application requires only
one input clock source, you can use pin 3 as a normal input.
If an application requires both input clocks, however, you
must use pin 3 as a clock input. A configuration bit must be
changed to enable pin 3 as a clock input.
Like pin 3, pin 14 is a dual-function pin; it can be
used as a registered input or a global, asynchronous,
output-enable line. Control of the CY7C330's output enable
can originate from the product term array or from pin 14.
You can program the choice on a register-by-register
basis. (The I/O macrocell section of this application note
gives more information on controlling the output enable.)
You can control the input-register clock mux in two
ways. The most descriptive way is to use the ".C" suffix,
as shown in the DEMO330.ABL example file supplied
with ABEL. This method works for the dedicated input
registers (pins 4 - 7 and 9 - 14) but does not work in
ABEL 3.1 for the input registers in the I/O macrocells.
The reason for this problem is that for the 12 I/O
macrocells, ABEL thinks the clock mux is for the output or
state register and not the input register.
Thus, the recommended method for controlling the
input-register clock mux is to use macro commands. The
macro file supplied with ABEL 3.0 does not include the
complete macro list needed to program all the clock
muxes, but you can get the complete file from Cypress.
This file, P330.INC, contains the macros needed to program
all the clock muxes, including the input registers. A
listing of the macro file appears in Appendix A. ABEL
versions 3.1 and higher come with the complete macro
file.
After you reference the macro file in the ABEL
source file, the command CLK2 must enable the pin-3
clock. Then you set specific clock muxes by entering
CLK2_n, where n is the input register's pin number. For
example:
LIBRARY 'p330';         "allows use of p330.inc macro file
CLK2;                   "enables pin 3 as a clock input
CLK2_5;                 "pin 5 input reg uses the pin 3 clock
CLK2_15;                "pin 15 input reg uses the pin 3 clock

You do not need a macro statement to specify the use
of clock 1 (pin 2) for input registers, because clock 1 is
the clock mux default setting for both the dedicated input
registers and the I/O macrocell input registers.
ABEL handles the accessing of data from one of the
dedicated input registers (pins 3 - 14) the same as for a
straight buffered input. The only difference is that for the
dedicated input registers, input data is not available in the
product term array until after the appropriate input clock
pulse is received.

Controlling the Output Enable
You specify an output enable by appending the suffix
".OE" to the appropriate pin name. You must define
whether control of the output enable mux comes from pin
14 or the product term array. Configuration bit C0
controls this choice, and you make the selection using the
ISTYPE statement:
OUT1,OUT2,OUT3,OUT4 pin 15,16,17,18;    "I/O pins
OUT1.OE,OUT2.OE ISTYPE 'EQN';           "OE is product-term controlled
OUT3.OE,OUT4.OE ISTYPE 'PIN';           "OE is controlled by pin 14

When controlling the output enable with a product
term, you have the option of setting it always on, always
off, or making it a combination of some number of inputs
or outputs. All three choices are illustrated in this code:
[OUT1.OE,OUT2.OE] = [1,1];              "permanently enable outputs
OUT3.OE = 0;                            "permanently disable output
OUT4.OE = IN1 & IN2 & OUT1;             "OE controlled by IN1, IN2, OUT1

Using Set and Reset
The CY7C330 has global synchronous set and reset
capability. When used, it sets or resets all 12 state
registers and the four buried registers. Watch out for two
conditions when using set or reset: First, when you reset
the registers, all the outputs go High if they are enabled
because of the inverter between the state register and the
output (Figure 1). Second, be aware that the reset does
not occur for two clock pulses if an input is designated as
the set/reset pin. This occurs because the reset data must
first be clocked into the product term array using one of
the two input clocks. The output registers must then be
clocked to cause the reset or set to occur.
You can access the CY7C330's set and reset
capability in two ways: First, you can append the suffix
".PR" for preset or ".RE" for reset to any output-pin or
buried-register node name. The syntax is:
OUT1, INP1, INP2 PIN 16, 5, 6;
OUT1.PR = INP1;                         "preset all output nodes on INP1=1
OUT1.RE = INP2;                         "reset all output nodes on INP2=1
The second way to utilize set and reset is to employ
the node notation shown in the following code, in which
the set and reset product terms are designated node 30 and
29, respectively.
SET, RESET NODE 30, 29;
SET = INP1;                             "preset all output nodes on INP1=1
RESET = INP2;                           "reset all output nodes on INP2=1
Even though the reset and preset functions are
synchronous, an error occurs while parsing the equations
if you use the ":=" notation, which signifies a registered
operation.

Using the Macrocell as an Output Only
When using the I/O macrocell as an output, you need
to consider two parameters. The first is the setting of the
macrocell feedback mux, as controlled by configuration
bit C1. The second parameter is the control of the output
enable, as described in the previous section. As with the
output-enable control, you set the configuration bit for the
feedback mux using the ISTYPE statement. When the
input register is not used, data from the output register is
typically fed back to the product-term array through the
macrocell feedback mux. When this feedback arrangement
is used, ISTYPE is followed by the FEED_REG attribute:
OUT1 PIN 15;                            "located in initial pin definitions
OUT1 ISTYPE 'FEED_REG';                 "sets C1=0, allowing feedback mux
                                        "to pass data from state register
OUT1 := INP1 $ ((INP1 & INP2) # INP3);  "sample eq from 'equations' section
The ABEL default for the feedback mux configuration
bit (C1) is to take data from the state register. Thus
the "ISTYPE 'FEED_REG';" statement is not required,
but it is recommended that the defaults be documented.

Using the Macrocell as an Input Only
When you use the I/O macrocell as an input register,
the syntax differs from that of the previous example.
Specifically, the output buffer must be three-stated, and
the macrocell feedback mux must be set to accept data
from the input register (C1 must be set to 1). The
following example assumes that the output register is not
used at all. Keep in mind that the input register clock
defaults to clock 1 (pin 2) unless specifically changed.
INP1, INP2, OUT1 PIN 5, 15, 16;
INP2 ISTYPE 'FEED_PIN';                 "set C1=1, allowing feedback mux to
                                        "take data from the input register
INP2.OE ISTYPE 'EQN';                   "set C0=0 for product term OE
EQUATIONS
INP2.OE = 0;                            "three-state output buffer permanently
OUT1 := INP1 & INP2;

Shared Input Mux
Each pair of I/O macrocells has a shared input mux.
This mux feeds data from the input pin into the
product-term array if both registers are fed back in an I/O
macrocell. A configuration bit (C3) controls whether the
mux's input is from an even- or odd-pin-number macrocell.
The ABEL default is that the data is supplied from the
even-pin-number macrocell. Changing to an odd pin requires
that you invoke macros located in the P330.INC file. (The
example in the next section shows how to make this
change.)
The purpose of the shared input mux is to provide
another input path to the product-term array, when
registered feedback is used, without losing input
capability.

Table 1. Node Numbers for Shared Input Multiplexers
Node Number     Mux Between Pins
35              15, 16
36              17, 18
37              19, 20
38              23, 24
39              25, 26

[Figure 1. The CY7C330 Macrocell]

Using the Input and Output Registers
When using both the input and output registers in the
I/O macrocell, the most difficult task is to get the data
into the product-term array.
You can use two muxes to feed data from the
registers into the product-term array. The state-register
information must be fed back through the feedback mux
controlled by configuration bit C1. You can route
input-register data through the feedback mux or through the
shared input mux (Figure 1).
The state-register output is referred to by the pin
name associated with the macrocell. The data clocked into
the input register is referred to by using the node name
assigned to the shared input mux. Table 1 lists the node
numbers of the shared input muxes.
In ABEL, the configuration bit controlling the shared
input mux (C3) defaults to an even I/O pin. When the
input data is on an odd pin, you can use a macro in the
P330.INC macro file to change the C3 configuration bit.

The following example also uses clock 2 (pin 3) to clock
the input register:
BREG PIN 15;                    "BREG is output register for pin 15
INP1 NODE 35;                   "INP1 is the input register for pin 15
BREG ISTYPE 'FEED_REG';         "C1 is set to 0, mux routes Q of BREG
BREG ISTYPE 'EQN';              "OE is product term controlled
LIBRARY 'P330';                 "enables use of the P330.INC file
CLK2;                           "enables pin-3 clock
CLK2_15;                        "enables CLK2 on pin-15 input reg
FEEDPIN_15;                     "shared input mux control bit (C3) set
                                "This gives pin 15 an input path
EQUATIONS
BREG.OE = 0;                    "disable output
BREG := BREG $ (INP1 & INP2);   "BREG is fed back and INP1 is an input

The Exclusive-OR Gate
The CY7C330 provides an exclusive-OR (XOR) gate
on the D input of the 12 I/O-macrocell output registers
and the four buried registers. You can use this gate for
two purposes. First, you can invert the polarity of a signal
going into the output register. This inversion is
accomplished by setting one of the XOR inputs to a logic 1,
using the ABEL "$" symbol for XOR. In ABEL, you can
use the following format:
OUT1 := 1 $ (INP1 & INP2 & INP3);
In ABEL versions before 3.1, however, the reduction
algorithms do not recognize a 1 mixed with variables in
an equation. The equivalent expression for earlier versions
is:
OUT1 := (INP1 # !INP1) $ (INP1 & INP2 & INP3);
The second use for the XOR gate is to emulate JK or
T flip-flops in software. T flip-flops are more efficient
than D flip-flops for implementing counters and state
machines. You can emulate T-type flip-flops by feeding
back the output register's Q output and tying it to the
XOR product term. The sum-of-products input to the
XOR becomes the T input (Figure 2). You can configure
this emulation with Boolean equations:
TFLOP := TFLOP $ (T input expression);
where "T input expression" is a legal sum-of-products
expression. A JK flip-flop is emulated using the same
configuration, and the relationship:
T = J & !Q # K & Q
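The XOR identities in this section can be checked exhaustively. The following Python sketch is only a truth-table check, not ABEL; the helper names next_q and t_from_jk are hypothetical. It confirms that D = Q $ T behaves as a T flip-flop, that T = J & !Q # K & Q reproduces the JK truth table, and that (INP1 # !INP1) is an equivalent way of writing the constant 1.

```python
def next_q(q, t):
    """D input of the register when the XOR gate combines Q with T."""
    return q ^ t


def t_from_jk(j, k, q):
    """T = J & !Q # K & Q, written with 0/1 integers."""
    return (j & (1 - q)) | (k & q)


for q in (0, 1):
    # T flip-flop: hold when T = 0, toggle when T = 1.
    assert next_q(q, 0) == q
    assert next_q(q, 1) == 1 - q
    # JK flip-flop built on the T emulation.
    assert next_q(q, t_from_jk(0, 0, q)) == q        # hold
    assert next_q(q, t_from_jk(1, 0, q)) == 1        # set
    assert next_q(q, t_from_jk(0, 1, q)) == 0        # reset
    assert next_q(q, t_from_jk(1, 1, q)) == 1 - q    # toggle

# Polarity inversion: (a # !a) is always 1, so XOR with it inverts x.
for a in (0, 1):
    for x in (0, 1):
        assert ((a | (1 - a)) ^ x) == 1 - x
```

Since a T register needs a product term only when the state changes, the same check also illustrates why counters written this way consume fewer product terms than their D-type equivalents.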
The second way to configure an output flip-flop as a
T-type flop is to use an ISTYPE statement such as the one
in the next example. The following syntax describes a
simple 2-bit counter:
CLK, INSTB, !OE         PIN 1, 2, 3, 14;
Q0, Q1                  PIN 28, 27;
Q0, Q1                  ISTYPE 'REG_T';
Q0.OE, Q1.OE            ISTYPE 'PIN';
CNT = [Q1,Q0];
EQUATIONS
Q0.OE = OE;
Q1.OE = OE;
CNT = (CNT + 1);

[Figure 2. The CY7C330 Macrocell as a T-Type Flip-Flop]

Buried Registers
As mentioned before, the CY7C330 contains four
buried registers. You access these registers by assigning a
name to the buried register node number. Table 2 lists the
node numbers, and Figure 3 shows a diagram of a buried
register.
To use a buried register, assign a name to the node
and use it as if it were a normal output. The only
difference is that the I/O macrocell has an inverter between
the state register and the output pin, which causes ABEL
to handle the polarity differently (more on this in the next
section).

[Figure 3. A Buried Register]

Table 2. Node Numbers of Buried Registers
Buried Register     Node Number     Product Terms
1                   31              13
2                   32              17
3                   33              11
4                   34              19

Polarity Conventions
As shown in later examples, you typically do not
have to worry about signal polarity except when sending
data to an output pin. This is because all data enters the
product-term array in both the non-inverted and inverted
states. ABEL chooses the right polarity to obtain the
output as specified by the equations.
When you export data from the device via an output
pin, polarity is more critical, especially when using the
set or reset. As shown by the block diagrams, the
macrocell includes an inverter between the output register and
output pin. Therefore, if you use the reset capability, the
registers' Q output goes Low, and the output pins go
High. If your application requires all the outputs to start
out Low, use preset instead of reset.
In the following example, the output is defined as
positive, and a 1 and a 0 are passed through the device.
ABEL compensates for the lack of inversion in the output
by inverting the data coming out of the input register.
"inputs
CKS, CK1, CK2, INP PIN 1, 2, 3, 4;
"output
OUT PIN 15;
EQUATIONS
OUT := INP;
TEST_VECTORS
([CKS,CK1,CK2,INP] -> [OUT])
[0, C, 0, 0] -> [X];
[C, 0, 0, X] -> [0];
[C, 0, 0, X] -> [0];
[0, C, 0, 1] -> [0];
[C, 0, 0, X] -> [1];
[C, 0, 0, X] -> [1];
END
When using state machine syntax, ABEL does not
handle the polarity of the buried registers correctly. Not
only do the equations not work, but the simulation also
fails. You can easily fix the problem, however, by
negating the names in the node declaration:
CLK1, CLK2, CLK3 PIN 1,2,3;
INP, OUT         PIN 4,15;
"hidden register declaration (negated)
!C1, !C2, !C3 NODE 31,32,33;
As with the state machine syntax, when using the
"COUNT = COUNT + 1" syntax, you also must invert the
polarity of any buried registers. The easiest place to
accomplish the inversion is at the node definitions
statement, as shown in the previous example. Additionally,
refer to the counter example at the end of this application
note.

State Machine Syntax
ABEL supports state machine syntax on the
CY7C330. The only drawback is that you can only use
the toggle flip-flop emulation mode for very simple state
machines. Up to revision 3.1, the results of using state
machine syntax with T flip-flop emulation are
unpredictable.
The T flip-flop is efficient for state machines because
it holds its state unless told otherwise and thus needs a
product term only for a state change. In contrast, a state
machine using D flip-flops needs a product term both to
change states and to hold states. Even with this limitation,
the CY7C330 contains from nine to 19 product terms per
output and usually handles a medium-size state machine
with ease.

Simulation Caveat
Be aware of a limitation to what ABEL can simulate.
Specifically, when writing simulation test vectors, you can
use only one of the three clock lines on a single
test-vector line. The following example does not simulate
correctly:
TEST_VECTORS
([CKS,CK1,CK2,INP] -> [OUT])
[C,C,0,0] -> [0];
The following modified version does simulate
correctly:
TEST_VECTORS
([CKS,CK1,CK2,INP] -> [OUT])
[0,C,0,0] -> [X];
[C,0,0,X] -> [0];
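The one-clock-per-vector rule is easy to check mechanically before a long vector file is handed to the simulator. The Python sketch below is a hypothetical pre-check script, not an ABEL or SIMULATE feature; it assumes the three clock columns (CKS, CK1, CK2) come first in each vector, as in the examples above, and that a clock pulse is written as "C".

```python
CLOCK_COLUMNS = 3  # CKS, CK1, CK2 are assumed to be the first three entries


def clocks_pulsed(vector):
    """Count how many of the clock columns carry a clock pulse ("C")."""
    return sum(1 for value in vector[:CLOCK_COLUMNS] if value == "C")


def at_most_one_clock(vector):
    """True if the vector pulses no more than one clock line."""
    return clocks_pulsed(vector) <= 1


def offending_vectors(vectors):
    """Return the indices of vectors that pulse two or more clocks."""
    return [i for i, v in enumerate(vectors) if not at_most_one_clock(v)]


bad = [["C", "C", 0, 0]]                       # pulses CKS and CK1 together
good = [[0, "C", 0, 0], ["C", 0, 0, "X"]]      # input clock first, then state clock

assert offending_vectors(bad) == [0]
assert offending_vectors(good) == []
```

Splitting a two-clock vector into two consecutive vectors, as the corrected example does, also mirrors the hardware: the input register is clocked first and the state register afterward.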

ABEL supports the preload function. Refer to the 15-bit
counter example for more information on how to use
it.

16-Bit Up/Down Counter
This application, COUNTER6, is an example of a 15-bit
up counter with a terminal-count output. The application
shows how to use ABEL's "COUNT = COUNT + 1"
syntax and corrects the polarity problem that crops up
when combining normal I/O macrocell output registers
and buried registers. This example also illustrates how to
use the preload function. The ABEL source code for this
example appears in Appendix B.

State-Machine-Based Modulo-11 Counter
This example is a state machine application
implementing a modulo-11 counter using state machine
syntax. This example again shows how to handle polarity
using both normal registers and buried registers. Appendix
C lists the ABEL source code for this example.

Appendix A. P330.INC -- Macro Listing
" P330.INC
"The following select Clock 2 (pin 3) for the Output Macrocell Input register.
CLK2_28 macro () {FUSES[17030] = I;}
CLK2_27 macro 0 {FUSES[17034] = I;}
CLK2 26 macro 0 {FUSES[17037] = I;}
CLK2=25 macro 0 {FUSES[17041] = I;}
CLK2 24 macro 0 {FUSES[17044] = I;}
CLK2-23 macro 0 {FUSES[17048] = I;}
CLK2=20 macro 0 {FUSES[17051] = I;}
CLK2 19 macro 0 {FUSES[17055] = I;}
CLK2)8 macro 0 {FUSES[17058] = I;}
CLK2_17 macro 0 {FUSES[17062] = I;}
CLK2_I6 macro 0 {FUSES[I7065] = I;}
CLK2_15 macro 0 {FUSES[17069] = I;}
"The following enables clock 2 (pin 3)
CLK2 macro 0 {FUSES[17070] = I;}
CLK2_4 macro 0 {FUSES[17072] = I;}
CLK2 5 macro 0 {FUSES[I7073] = I;}
CLK2=6 macro 0 {FUSES[17074] = I;}
CLK2_7 macro 0 {FUSES[17075] = I;}
CLK2 9 macro 0 {FUSES[17076] = I;}
CLK2-10 macro 0 {FUSES[17077] = I;}
CLK2-U macro 0 {FUSES[17078] = I;}
CLK2-12 macro () {FUSES[17079] = I;}
CLK2-13 macro () {FUSES[17080] = I;}
CLK2)4 macro 0 {FUSES[17081] = I;}
"The following program the C3 bit in the Output Macrocell and selects feedback from the lower pin.
FEEDPIN 27 macro 0 {FUSES[17031] = I;}
FEEDPIN-25 macro 0 {FUSES[17038] = I;}
FEEDPIN-23 macro 0 {FUSES[17045] = I;}
FEEDPIN-19 macro 0 {FUSES[17052] = I;}
FEEDPIN-17 macro 0 {FUSES[17059] = I;}
FEEDPIN)5 macro 0 {FUSES[17066] = I;}


Appendix B. ABEL Source Code for the 15-Bit Counter Example
module counter6
title 'Counter application for CY7C330 application note - Cypress Semiconductor June 19, 1989'
counter6 device 'p330';
" This is an example of a 15-bit counter showing:
" 1. How to handle the polarity when combining normal output registers and buried regs.
" 2. How to use the 'count = count + 1' syntax.
" 3. How to use preload for simulation vectors and handle the polarity inversion for the
"    buried registers.

" input pins
clk,clk1,clk2,preset     pin 1,2,3,4 ;
" output pins
c0,c1,c2,c3,c4,c5,c6     pin 15,28,26,17,24,19,20 ;
c11,c12,c13,c14          pin 25,18,16,27 ;
tci                      pin 23 ;
spreset                  node 30 ;
!c7,!c8,!c9,!c10         node 31,32,33,34 ;
" macros
c_cntr = [c14, c13, c12, c11, c10, c9, c8, c7, c6, c5, c4, c3, c2, c1, c0] ;
" this is used to handle the preload inversion of the buried registers. See test vectors below.
c_cntrs = [c14, c13, c12, c11, !c10, !c9, !c8, !c7, c6, c5, c4, c3, c2, c1, c0] ;
c,x,p = .c., .x., .p.;
equations
spreset = preset;
c_cntr := (c_cntr + 1) ;
tci := (c_cntr == 2346) ;
" Example of using preset with simulation
test_vectors
([clk,clk1,preset,c_cntrs] -> [c_cntr,tci])
[0,0 , x , x    ] -> [ x    , x ];
[0,c , 1 , x    ] -> [ x    , x ];
[c,0 , x , x    ] -> [ 0    , 0 ];
[0,c , 0 , x    ] -> [ 0    , 0 ];
[c,0 , x , x    ] -> [ 1    , 0 ];
[c,0 , x , x    ] -> [ 2    , 0 ];
[c,0 , x , x    ] -> [ 3    , 0 ];
[c,0 , x , x    ] -> [ 4    , 0 ];
[c,0 , x , x    ] -> [ 5    , 0 ];
[p,0 , x , 62   ] -> [ x    , 0 ];
[0,0 , x , x    ] -> [ 62   , 0 ];
[c,0 , x , x    ] -> [ 63   , 0 ];
[c,0 , x , x    ] -> [ 64   , 0 ];
[c,0 , x , x    ] -> [ 65   , 0 ];
[c,0 , x , x    ] -> [ 66   , 0 ];
[c,0 , x , x    ] -> [ 67   , 0 ];
[c,0 , x , x    ] -> [ 68   , 0 ];
[p,0 , x , 2345 ] -> [ x    , 0 ];
[0,0 , x , x    ] -> [ 2345 , 0 ];
[c,0 , x , x    ] -> [ 2346 , 0 ];
[c,0 , x , x    ] -> [ 2347 , 1 ];
[c,0 , x , x    ] -> [ 2348 , 0 ];
[c,0 , x , x    ] -> [ 2349 , 0 ];
end

Appendix C. ABEL State Machine Source Code for Modulo 11 Counter
module statem
title 'Application Note State Machine Example - Cypress Semiconductor 5-12-89'
statem           device 'P330';
clk1,clk2,clk3   pin 1,2,3 ;
c1,c2            pin 15,16 ;
res              pin 4 ;
reset            node 30 ;
!c3,!c4          node 31,32 ;
count            = [c4,c3,c2,c1] ;
c4,c3,c2,c1      istype 'feed_reg';
c,x,z,h,l        = .c.,.x.,.z.,1,0;
" This is an example of implementing a modulo counter using state machine syntax.
" This example also shows how to use the hidden registers.

" counter states
s0 = ^b0000 ; s3 = ^b0011 ; s6 = ^b0110 ; s9  = ^b1001 ;
s1 = ^b0001 ; s4 = ^b0100 ; s7 = ^b0111 ; s10 = ^b1010 ;
s2 = ^b0010 ; s5 = ^b0101 ; s8 = ^b1000 ;
equations
c4.pr = res;

state_diagram [c4,c3,c2,c1]
state s0: goto s1 ;
state s1: goto s2 ;
state s2: goto s3 ;
state s3: goto s4 ;
state s4: goto s5 ;
state s5: goto s6 ;
state s6: goto s7 ;
state s7: goto s8 ;
state s8: goto s9 ;
state s9: goto s10 ;
state s10: goto s0 ;
test_vectors
([clk1,clk2,res] -> [count])
[0 , c , 1 ] -> [ 15 ];
[c , 0 , 0 ] -> [ 0 ];
[0 , c , 0 ] -> [ 0 ];
[c , 0 , 0 ] -> [ 1 ];
[c , 0 , 0 ] -> [ 2 ];
[c , 0 , 0 ] -> [ 3 ];
[c , 0 , 0 ] -> [ 4 ];
[c , 0 , 0 ] -> [ 5 ];
[c , 0 , 0 ] -> [ 6 ];
[c , 0 , 0 ] -> [ 7 ];
[c , 0 , 0 ] -> [ 8 ];
[c , 0 , 0 ] -> [ 9 ];
[c , 0 , 0 ] -> [ 10 ];
[c , 0 , 0 ] -> [ 0 ];
end

Using ABEL to Program
the Cypress CY7C331
This application note describes how to program the
CY7C331 using Data I/O's ABEL. Each section of the
application note describes a configuration and presents
the relevant ABEL source code. (You can obtain all the
examples presented in this application note from the
Cypress Bulletin Board at (408) 943-2954. Retrieve the
file 331APNT.EXE; it unarchives itself automatically.)
The information presented here can simplify the
jobs of circuit designers, who are under a lot of pressure to shorten design cycles and fit numerous functions
into a small footprint. The latest programmable logic
devices (PLDs) give you the ability to increase circuit
density with a reduced design cycle. When you combine
multiple types of PLDs from multiple vendors on the
same board, using a general programmable logic compiler such as ABEL makes a lot of sense.
Unfortunately, as PLDs get more complex, the concept and implementation of a universal compiler become non-trivial. A compiler vendor such as Data I/O
must define a syntax that is both easy to use and powerful enough to accommodate hundreds of different
PLDs. The ABEL PLD compiler succeeds with a vast
array of features. It does an admirable job of supporting
over 300 different types of PLD source equations with a
multitude of different architectures.
The architecture covered in this application note is
that of the Cypress CY7C331. This device belongs to a
family of high-speed, high-density, 28-pin PLDs. Features such as individual set, reset, and clock product
terms for each of the 24 registers make the device one
of the most versatile PLDs on the market today.

Controlling the Output Enable
The CY7C331 has two different methods of controlling the output enable on each of the twelve outputs
(see the CY7C331 diagram in Figure 1 of "Using the
CY7C331 as a Waveform Generator"). Either pin 14 or
a product term can control each output enable. Controlling the output enable by a product term means
using any combination of inputs and outputs ANDed
together. Because only one term is available, OR terms
are not allowed in the equation.
The advantage to using pin 14 rather than a
product term is that the pin enables or disables the output buffers 5 ns faster. This is because the output
enable signal does not travel through the array.
Any I/O pin (pins 15 - 28) used on the left side of
an equation, by default, has its output enable
programmed as asserted. For example:
I1, OUT15        PIN 1, 15;
EQUATIONS
OUT15            = I1;
is the same as
I1, OUT15        PIN 1, 15;
EQUATIONS
OUT15            = I1;
OUT15.OE         = 1;
If you use the direct connection to pin 14, the signal must be configured as active Low. The way ABEL
configures the output enable mux depends on the equations. If the right-hand side of an ".OE" equation has
just an inverted pin 14 on it, ABEL assumes you want
to use the direct connection to pin 14. For example, the
following equations use the direct connection to pin 14
(C0 = 1):
I14, I15         PIN 14,15;
OUT15, OUT16     PIN 15,16;
EQUATIONS
[OUT15,OUT16].OE = !I14;
The same example uses the product term array if
you change the equation to:
[OUT15,OUT16].OE = I15;
or
[OUT15,OUT16].OE = I14 & I15;
or even:
[OUT15,OUT16].OE = I14;
In some cases, you might want to use pin 14 to control the output enable, but for timing reasons use the
product term array instead of the direct connection.
ABEL allows you to do this by using an ISTYPE statement. In the following example, the output enable for
pin 16 goes through the product term array, and pin 17
uses a direct connection:
I2, I14            PIN 2,14;
OUT16, OUT17       PIN 16,17;
OUT16.OE           ISTYPE 'EQN';
EQUATIONS
[OUT16,OUT17].OE   = !I14;
OUT16              = I2;
OUT17              = I2;
TEST_VECTORS
([I2,I14] -> [OUT16,OUT17])
[ X, 0 ] -> [ Z , Z ];
[ 0, 1 ] -> [ 0 , 0 ];
[ 1, 1 ] -> [ 1 , 1 ];
Note that in most cases when an output register is
buried and the I/O pin serves as an input, ABEL does
not automatically disable the output enable. In fact, you
cannot disable the output enable unless you define it
with an ISTYPE 'EQN' statement.
In the following example, the OUT15.OE = 0
statement does not disable the output enable unless the
statement is preceded with OUT15.OE ISTYPE 'EQN':
" The following code is for testing
" polarity on the CY7C331.
"input pins
I1, CLK           PIN 1,2;
RES, PRE, OE      PIN 4,5,6;
"output pins
OUT15, OUT16      PIN 15,16;
OUT17, OUT18      PIN 17,18;
"constants
C,X,Z             = .C., .X., .Z.;
OUT15.OE          ISTYPE 'EQN';
EQUATIONS
" the example below shows using the
" feedback from register 15 to
" control the preset and set of
" register 16.
OUT15     := I1;
OUT15.C    = CLK;
OUT15.RE   = RES;
OUT15.PR   = PRE;
" The following statement is ignored
" without previous istype 'eqn'.
OUT15.OE   = 0;
OUT16.RE   = OUT15;
OUT16.PR   = !OUT15;
OUT16.OE   = 1;
TEST_VECTORS
([I1,CLK,RES,PRE] -> [OUT15,OUT16])
[ 0, 0, 0, 0 ] -> [ Z , 1 ];
[ 0, C, 0, 0 ] -> [ Z , 0 ];
[ 1, C, 0, 0 ] -> [ Z , 1 ];
" This tests what happens to the
" polarity of the register feedback
" when you go from register to
" transparent.
[ 0, 0, 1, 1 ] -> [ Z , 0 ];
[ 1, 0, 1, 1 ] -> [ Z , 1 ];
In general, it is advisable to use the ISTYPE 'EQN'
for all I/O pins that use a product term to control the
output enable, especially when trying to disable an output buffer.

Registered Output Only
You can use the CY7C331 macrocell as a
registered output, without using the input register, as illustrated in the following example:
"input pins
D_INP, CLK     PIN 1, 2 ;
"output pins
OUT15          PIN 15;
"constants
C,X,Z          = .C., .X., .Z. ;
EQUATIONS
OUT15         := D_INP ;
OUT15.C        = CLK ;
TEST_VECTORS
([D_INP,CLK] -> OUT15)
[ X, 0 ] -> 1;
[ 0, C ] -> 0;
[ 1, C ] -> 1;
As shown in this example, the minimum requirement to configure an output into a register is the OUTPUT := INPUT equation and an equation describing
where the clock is coming from. The latter is necessary
because the CY7C331 has no dedicated clock pin.
Because the following equations are ABEL
defaults, you do not need to explicitly define them:
OUT15.RE = 0;    "disable reset
OUT15.PR = 0;    "disable preset
" permanently enable output buffer
OUT15.OE = 1;
The next example uses all the output register's features. For example, you can dynamically switch from
registered mode to combinatorial and back to
registered. Although the ABEL simulation always shows
the register returning to the same state when switching
from combinatorial to registered mode, the actual state
varies from device to device.
Also note that this example adds OUT17 to show
that even when the pin 15 output buffer is disabled, the
register's state still feeds back to the product term array
via the feedback mux. The ABEL default for the feedback mux in the registered mode is to take information
from the register (C1 = 0).
"input pins
D_INP, CLK       PIN 1,2;
RES, PRE, OE     PIN 3,4,5;
"output pins
OUT15, OUT16     PIN 15,16;
OUT17, OUT18     PIN 17,18;
"constants
C,X,Z            = .C., .X., .Z.;
EQUATIONS
" OUT15 is using the output register in both registered
" and combinatorial mode by manipulating the
" set and reset terms.
OUT15      := D_INP ;
OUT15.C     = CLK ;
OUT15.RE    = RES ;
OUT15.PR    = PRE ;
OUT15.OE    = OE ;
OUT17       = OUT15 ;
TEST_VECTORS
([D_INP,CLK,RES,PRE,OE] -> [OUT15,OUT17])
[ 0 , 0 , 0 , 0 , 0 ] -> [ Z , 1 ];
" with no external help, the registers initialize to the
" reset state, which means the outputs are high,
" because of the non-bypassable inverter in
" the output path.
[ 0 , 0 , 0 , 0 , 1 ] -> [ 1 , 1 ];
[ 0 , 0 , 0 , 1 , 1 ] -> [ 0 , 0 ];
[ 0 , 0 , 1 , 0 , 1 ] -> [ 1 , 1 ];
[ 0 , C , 0 , 0 , 1 ] -> [ 0 , 0 ];
[ 1 , C , 0 , 0 , 1 ] -> [ 1 , 1 ];
" The register becomes combinatorial
" when the reset and preset are both asserted
[ 0 , 0 , 1 , 1 , 1 ] -> [ 0 , 0 ];
[ 1 , 0 , 1 , 1 , 1 ] -> [ 1 , 1 ];
" this is the state the register returned to
" when going from combinatorial to registered mode.
[ 0 , 0 , 0 , 0 , 1 ] -> [ 0 , 0 ];
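The pin-level behavior exercised by these vectors can be summarized in a small Python model. It is a sketch under stated assumptions rather than a description of the silicon: the class name `OutputMacrocell` and its method are hypothetical, values are the levels seen at the pin (after the fixed inverter), and the state kept after leaving transparent mode is an arbitrary modeling choice, since the text notes that the real state varies from device to device.

```python
class OutputMacrocell:
    """Pin-level model of one output register with set, reset, and enable."""

    def __init__(self):
        # Power-up reset state: the pin reads High because of the inverter.
        self.q = 1

    def evaluate(self, d, clock_edge, res, pre, oe):
        """Return (pin, feedback) for one test vector."""
        if res and pre:
            self.q = d        # both asserted: register is transparent
        elif res:
            self.q = 1        # reset alone drives the pin High
        elif pre:
            self.q = 0        # preset alone drives the pin Low
        elif clock_edge:
            self.q = d        # normal registered operation
        pin = self.q if oe else "Z"
        return pin, self.q    # feedback is unaffected by the output enable


cell = OutputMacrocell()
vectors = [
    (0, False, 0, 0, 0), (0, False, 0, 0, 1), (0, False, 0, 1, 1),
    (0, False, 1, 0, 1), (0, True, 0, 0, 1), (1, True, 0, 0, 1),
    (0, False, 1, 1, 1), (1, False, 1, 1, 1),
]
results = [cell.evaluate(*v) for v in vectors]
```

The final vector of the listing (the return from combinatorial to registered mode) is deliberately left out of the model run, because that state is the one the text describes as device dependent.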
Remember that the ABEL default for the feedback
mux in the registered mode is to take information from
the register (C1 = 0). This is not the case when you
configure the output register as transparent, however, as
shown in the next example.

When you configure the output register as
transparent, the input register path data is automatically
fed to the product term array (C1 = 1). Because ABEL
also defaults to transparent input registers, the data fed
to the product term array is not the same as the
registered output data.
You can feed data back to the product term array
from before the output buffer, even when the output
register is configured as transparent, by using an ISTYPE 'FEED_REG' statement:
"input pins
I6, I7           PIN 6,7 ;
"output pins
OUT16, OUT18     PIN 16,18;
"constants
C,X,Z            = .C., .X., .Z. ;
OUT16            ISTYPE 'FEED_REG';
EQUATIONS
OUT16       = I6 ;
OUT16.OE    = I7 ;
OUT18       = OUT16 ;
TEST_VECTORS
([I6,I7] -> [OUT16,OUT18])
[ 0, 0 ] -> [ Z , 0 ];
[ 1, 0 ] -> [ Z , 1 ];
[ 0, 1 ] -> [ 0 , 0 ];
[ 1, 1 ] -> [ 1 , 1 ];
If you omit the FEED_REG statement, an error
occurs in the simulation. The FEED_REG statement
changes the feedback-mux configuration bit from One
to Zero (C1 = 0).

Transparent Input Only
The ABEL 3.2 default is to make the input register
transparent. Thus, to specify an I/O macrocell as a combinatorial input, place the signal on the right side
of an equation:
"INPUTS
INP16, OUT18     PIN 16, 18;
EQUATIONS
OUT18            = INP16;
TEST_VECTORS
(INP16 -> OUT18)
0 -> 0;
1 -> 1;
In this example, only one operator (=) serves to
configure both registers as transparent. This method
works because the equals sign controls only the output
register configuration (OUT18), which is possible because the default configuration for an input register is
transparent. Changing the "=" to ":=" changes the pin-18 output register from transparent to registered, but
does not affect the pin-16 input register.

Combinatorial Output Only
ABEL allows you to configure the output register
as transparent by using the "=" symbol instead of ":="
in the equations, as this example shows:
"input pins
I1               PIN 1;
"output pins
OUT15            PIN 15;
"constants
C, X, Z          = .C., .X., .Z.;
EQUATIONS
OUT15            = I1 ;
TEST_VECTORS
( I1 -> OUT15 )
0 -> 0;
1 -> 1;
In this example, the following equations are ABEL
defaults, and you do not have to write them. Including
these equations does not cause an error.
OUT15.PR = 1;    "set and reset
OUT15.RE = 1;    "high = transparent.
OUT15.OE = 1;    "enable on.

The Macrocell as a Registered Input Only
To change an input register from transparent to
registered, you configure the register using its node
number. Table 1 lists the node assignment for each
register.
To use an input register as a register, place the signal on the right side of the equation and add the rest of
the terms needed. In the following example, INP17 is a
registered input pin. The register itself is called
INP17REG. OUT19 is a transparent or combinatorial
output:
"pin definitions
INP17, RESET
PIN 17,3;
SET, CLK
PIN 4,5;
OUT19
PIN 19;
INP17REG
NODE 145;
EQUATIONS
OUT19
= INP17 ;
INP17REG.C
= CLK;
INP17REG .PR
= SET;
INP17REG.RE
= RESET;
TEST_VECTORS
([INP17,CLK,SET,RESET] -> OUT19)
[ X ,X, 0, 1]
-> 0;
[ X ,X, 1, 0]
-> 1;
[ 0 ,C, 0, 0]
-> 0;
[ 1 ,C, 0, 0]
-> 1 ;
[ 0 ,X, 1, 1]
-> 0;
[ 1 ,X, 1, 1]
-> 1;
To access the data stored in an input register, use
the pin name. Access the set, reset, and clock using the
input register node name.

Burying the Output Register/Registered
Input
The CY7C331 allows you to bury an output register
and still use the pin as a registered input by using the
shared-input mux. The CY7C331 provides a shared-input mux between pins 15 and 16, 17 and 18, 19 and 20,
etc. Thus, there are three paths into the product term
array for every pair of macrocells. You therefore cannot
bury both of a pair's output registers and still use the
pin as an input. If you bury the output register at pin 15
and use the pin for an input, for example, you cannot
bury the output register at pin 16 and also use the pin
for an input.
Use the pin name to access the information fed
back to the product term array from the output register.
Use the node number of the shared-input mux to access
the input data coming from the pin and passing through
the input register. The shared-input mux node number
assignments appear in Table 2.
The shared-input mux can take information from
one of the two macrocells. ABEL defaults to selecting
the macrocell of the even pin number. However, macros
are available that select the odd pin's macrocell. You
can access these macros by using the following syntax:
LIBRARY
'P331';
FEEDPIN_27;

The LIBRARY statement inserts a copy of all the possible
CY7C331 macros into the source during compilation.
You can observe the result by looking at the listing file
(.LST). The FEEDPIN_27 statement selects pin 27 to
pass through the shared-input mux, overriding the
default, which is pin 28.

Table 1. CY7C331 Input Register Node Assignments

Pin Number    Register Node
15            143
16            144
17            145
18            146
19            147
20            148
23            149
24            150
25            151
26            152

The following code is the complete listing of a test
program that shows how to bury a register and employ
the pin as an input, using macros to change the shared-input mux:
module test3
title 'CY7C331 test programs for applications note
Cypress Semiconductor Inc. 3/16/90'
TEST3 DEVICE 'P331';
"This is an example of burying the output register
"of a CY7C331 and using the I/O pin as an input.
"input pins
I1, CLK1, CLK2   PIN 1,2,3;
RES, PRE, OE     PIN 4,5,6;
CLK3             PIN 7;
"output pins
OUT15, OUT16     PIN 15,16;
OUT17, OUT18     PIN 17,18;
"constants
C,X,Z            = .C.,.X.,.Z.;

"LIBRARY statement is used to access the macros
"needed to change the shared-input mux selection.
LIBRARY 'P331';
"Data from pin 1 gets clocked through the buried
"register on pin 15, and output on pin 16.
"Output register 15 is configured as a register and the
"pin 16 output register is transparent.
"Data also gets input on pin 15 and output on pin 17.
"Both are configured as registers.

INP15REG         NODE 143;
INP15MUX         NODE 29;
OUT15.OE         ISTYPE 'EQN';
OUTPUTS          = [OUT16,OUT17];
FEEDPIN_15;
EQUATIONS
OUT17       := INP15MUX ;
OUT17.C      = CLK3 ;
INP15REG.C   = CLK2 ;
OUT15       := I1 ;
OUT15.C      = CLK1 ;
OUT15.RE     = 0 ;   " disable reset,
OUT15.PR     = 0 ;   " preset, and oe
OUT15.OE     = 0 ;
OUT16        = OUT15 ;
TEST_VECTORS
([I1,CLK1,OUT15,CLK2,CLK3] -> [OUTPUTS])
[ X, 0, 0, 0, 0 ] -> [ 1, 1 ];
[ 0, C, X, 0, 0 ] -> [ 0, 1 ];
[ 1, C, X, 0, 0 ] -> [ 1, 1 ];
[ X, 0, 0, C, 0 ] -> [ 1, 1 ];
[ X, 0, X, 0, C ] -> [ 1, 0 ];
[ X, 0, 1, C, 0 ] -> [ 1, 0 ];
[ X, 0, X, 0, C ] -> [ 1, 1 ];
END
The ABEL 3.2 compiler contains a bug that relates
to this example. If you remove the line OUT15.OE ISTYPE 'EQN';, the code compiles and simulates correctly. However, if you look at the resulting JEDEC map
for the equations, the output buffer for pin 15 is
enabled, which should cause the simulation to fail. Contact Data I/O for more information.
When you use macros, be cautious about several
aspects of ABEL. In equations, for instance, the ABEL
parser allows spaces between the end of the equation
and the semicolon. However, you must place a semicolon immediately after a library statement and a
macro. The parser does not allow a space between a
semicolon and a library statement or a macro.
Additionally, because the key words of the macros
that are accessed using the library statement are in
upper case, you must put all references to the macros
(e.g., FEEDPIN_27) in upper case. This is the only
place where ABEL is case sensitive.
Finally, although you can put the library statement
anywhere in the source code's declaration section, you
must put macros last in the declaration section, before
the equations section.

Table 2. CY7C331 Shared Input Mux Node Assignment

Pin Numbers    Shared Input Mux Node
15/16          143
17/18          144
19/20          145
23/24          146
25/26          147

Transparent Output with Registered Input
This example shows how to configure a buried
transparent output register with a registered input. As
described in the earlier section on transparent output
registers, when you configure the output as transparent,
the feedback to the product term array passes through
the input register, unless programmed otherwise. The
following code shows how to override the default using
the ISTYPE 'FEED_REG' statement.
(Note that in the input section of the simulation,
OUT15 represents the data being input on pin 15. This
representation is somewhat confusing because in the
equations OUT15 refers to the information coming
from the pin-15 output register. See the simulation section of this application note for an explanation of this
apparent discrepancy.)
"input pins
I1, CLK2         PIN 1,2;
CLK3             PIN 3;
"output pins
OUT15, OUT16     PIN 15,16;
OUT17, OUT18     PIN 17,18;
"constants
C,X,Z            = .C., .X., .Z.;
LIBRARY 'P331';
"Input data from pin 1 goes through the buried
"register on pin 15, and is output on pin 16.
"Output registers 15, 16 are configured as transparent.
"Data is also input on pin 15 and output on pin 17.
"Pin 15 input, pin 17 output are registered.
INP15REG         NODE 143 ;
INP15MUX         NODE 29 ;
OUT15            ISTYPE 'FEED_REG';
FEEDPIN_15;
EQUATIONS
OUT17       := INP15MUX ;
OUT17.C      = CLK3 ;
INP15REG.C   = CLK2 ;
OUT15        = I1 ;
OUT15.OE     = 0 ;
OUT16        = OUT15 ;
TEST_VECTORS
([I1,OUT15,CLK2,CLK3] -> [OUT16,OUT17])
[ 0, X, 0, 0 ] -> [ 0, 1 ];
[ 1, X, 0, 0 ] -> [ 1, 1 ];
[ 1, 0, C, 0 ] -> [ 1, 1 ];
[ 1, X, 0, C ] -> [ 1, 0 ];
[ 1, 1, C, 0 ] -> [ 1, 0 ];
[ 1, X, 0, C ] -> [ 1, 1 ];
"end

Using the CY7C331 for Counting
You can use the CY7C331 to create a synchronous
counter. The only limitation to using the device in a
synchronous mode is that all feedback must be internal
to the part, because the input-data hold time is not
compatible with the output-data hold time.
ABEL provides many ways to implement a counter,
including describing it explicitly in D or T flip-flop
form.
The following example shows how to use the "count
= count + 1" capability with the CY7C331 to implement a basic counter. The ABEL compiler uses the
CY7C331's XOR gate to implement T flip-flops without
any external instructions such as ISTYPE 'REG_T'.
"input pins
I1, CLK2, CLK3   PIN 1,2,3;
RES, PRE, OE     PIN 4,5,6;
"output pins
OUT15, OUT16     PIN 15,16;
OUT17, OUT18     PIN 17,18;
"constants
LIBRARY 'CONSTANT';
COUNT = [OUT18, OUT17, OUT16, OUT15];
EQUATIONS
" Example of 4-bit counter
" that starts and wraps around at 15.
COUNT.C     = CLK2;
COUNT      := COUNT + 1;
" Example of how to use set and reset with this form
COUNT.RE    = RES;
COUNT.PR    = PRE;
TEST_VECTORS
(CLK2 -> COUNT)
0 -> 15;
C -> 0;
C -> 1;
C -> 2;
C -> 3;
TEST_VECTORS
([CLK2, RES, PRE] -> COUNT)
[ 0, 0, 0 ] -> 3;
[ 0, 0, 1 ] -> 0;
[ 0, 1, 0 ] -> 15;
[ 0, 0, 1 ] -> 0;
[ C, 0, 0 ] -> 1;
[ C, 0, 0 ] -> 2;
[ C, 0, 0 ] -> 3;
"end

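To make the T flip-flop remark concrete, the following Python sketch shows one standard way an incrementing counter maps onto toggle registers: each bit toggles when all lower-order bits are 1, and an XOR with the current state performs the toggle. The function names are hypothetical, and the sketch does not claim to reproduce the exact equations that ABEL's reduction produces for the CY7C331.

```python
def t_inputs(count, width=4):
    """Return the toggle enable for each bit, least-significant bit first."""
    toggles = []
    carry = 1                          # bit 0 toggles on every clock
    for i in range(width):
        toggles.append(carry)
        carry &= (count >> i) & 1      # higher bits need all lower bits set
    return toggles


def clock(count, width=4):
    """Apply one clock edge using T flip-flop behavior (XOR with the state)."""
    next_count = count
    for i, t in enumerate(t_inputs(count, width)):
        if t:
            next_count ^= 1 << i       # the XOR gate flips the stored bit
    return next_count


sequence = [15]
for _ in range(4):
    sequence.append(clock(sequence[-1]))
```

Starting from 15, the model wraps to 0 and then counts 1, 2, 3, which is the same order as the first set of test vectors in the listing.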
Polarity Issues
The CY7C331's outputs do not have programmable
polarity control in the same sense as the 22V10. The
CY7C331 has a hard-wired inverter between the output
register and the output pin that results in an active-Low
output. You generally control the device's polarity using
the XOR gate located in front of the output register.
ABEL makes polarity control transparent by allowing you to write equations with both positive- and negative-polarity outputs. Most of the examples in the previous sections, for instance, had active-High outputs.
But hard-wired polarity becomes an issue when using
set and reset. Keep in mind that a reset causes the output to go High.
ABEL takes care of the necessary inversions in the
device to get the correct output polarity. This operation
can be tricky when the internal feedback from a register
controls another register's set or reset. Because both
polarities are available in the product term array, it is
not obvious which polarity should be used. Refer to the
last example in the "Controlling the Output Enable" section of this application note for an example of indirect
set and reset control.
Although the CY7C331 has active-Low outputs,
defining the outputs active High (using OUT15 ISTYPE
'POS') sometimes causes ABEL's Reduce module to
create equations that suit the CY7C331 better. This effect is especially true when you use the XOR gate.
Refer to pages 3 - 4 in the ABEL 3.2 User Notes for
more information.

Simulation
Simulation is very important with a part as versatile
as the CY7C331. All the examples in this application
note have been simulated to verify their function.
The ABEL simulator is powerful enough to simulate most of the configurations possible with the
CY7C331. For example, the simulator supports multiple
clock inputs controlling different registers. An application that illustrates this capability is a ripple counter.
This counter has the clock input driven from the previous stage's output, with the least-significant bit driven
by an external clock.
The following is an example of a 4-bit decrementing ripple counter implemented in the CY7C331.
"input pins
CLK2, RESET      PIN 2,3;
"output pins
OUT15, OUT16     PIN 15,16;
OUT17, OUT18     PIN 17,18;
"constants
LIBRARY 'CONSTANT';
COUNT = [OUT18, OUT17, OUT16, OUT15];
EQUATIONS
"example of a 4-bit ripple counter that starts at 15
"and wraps around at 0.
COUNT.RE    = RESET;
OUT15.C     = CLK2;
OUT15      := !OUT15;
OUT16.C     = OUT15;
OUT16      := !OUT16;
OUT17.C     = OUT16;
OUT17      := !OUT17;
OUT18.C     = OUT17;
OUT18      := !OUT18;
TEST_VECTORS
([CLK2,RESET] -> COUNT)
[ 0, 0 ] -> X;
[ 0, 1 ] -> 15;
[ C, 0 ] -> 14;
[ C, 0 ] -> 13;
[ C, 0 ] -> 12;
[ C, 0 ] -> 11;
[ C, 0 ] -> 10;
[ C, 0 ] -> 9;
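The reason this arrangement counts down rather than up can be checked with a short Python model. It is a sketch only: the function names are hypothetical, each stage is assumed to toggle on the rising edge of the previous stage's pin, and propagation delays between stages are ignored.

```python
def ripple_clock(bits):
    """Apply one rising edge of the external clock.

    bits[0] models OUT15 (least-significant bit). A stage that toggles
    from 0 to 1 produces a rising edge that clocks the next stage; a
    stage that toggles from 1 to 0 does not.
    """
    bits = list(bits)
    for i in range(len(bits)):
        bits[i] ^= 1
        if bits[i] == 0:      # falling edge: the ripple stops here
            break
    return bits


def value(bits):
    """Combine the stage outputs into a number, bits[0] being the LSB."""
    return sum(bit << i for i, bit in enumerate(bits))


state = [1, 1, 1, 1]          # reset state: all pins High, count = 15
counts = [value(state)]
for _ in range(6):
    state = ripple_clock(state)
    counts.append(value(state))
```

Because every stage toggles and only rising edges propagate, the borrow ripples upward exactly as in a binary decrement, which matches the 15, 14, 13 ... order of the test vectors.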
The CY7C331 powers up with all registers in the
reset state. The simulator, in most cases, mimics the
device power-up characteristics. However, in certain
applications, including the previous one, the simulation
consistently initializes to a non-reset state.
Another interesting problem with simulating the
CY7C331 is naming the input data when you bury the
output register and use an I/O pin as an input. Although the input-register data is accessed in the equations using a node name, the ABEL simulator only
works with pin names. In this application note's
"Transparent Output with Registered Input" section, the
example's equations section uses the node name
(INP15MUX) to access the data being input on pin 15;
the pin name (OUT15) is used to represent the data
from the output register, which is fed back to the
product-term array and then to pin 16. In the simulation
section, however, OUT15 now represents the data being
input on pin 15. The ABEL simulator is smart enough
to know which data you are referring to.
Remember that simulation preload does not work
with registered asynchronous parts such as the
CY7C331 or 20RA10. However, if your design has an
extra input, you can preset to a specific value by using
the set and reset product terms individually. For example:

"input pins
CLK, PRE         PIN 2,3;
"output pins
OUT15, OUT16     PIN 15,16;
OUT17, OUT18     PIN 17,18;
OUT19, OUT20     PIN 19,20;
"constants
LIBRARY 'CONSTANT';
COUNT  = [OUT20, OUT19, OUT18,
          OUT17, OUT16, OUT15];
PRESET = [OUT20, OUT19, OUT18,
          OUT17, OUT16, OUT15].RE;
RESET  = [OUT20, OUT19, OUT18,
          OUT17, OUT16, OUT15].PR;
EQUATIONS
COUNT.C     = CLK;
COUNT      := COUNT + 1;
WHEN (PRE == 1)
THEN PRESET = [1,0,1,0,1,1];
WHEN (PRE == 1)
THEN RESET  = [0,1,0,1,0,0];
TEST_VECTORS
(CLK -> COUNT)
0 -> 63;
C -> 0;
C -> 1;
C -> 2;
C -> 3;
TEST_VECTORS
"preload simulation test
([CLK,PRE] -> COUNT)
[ 0, 0 ] -> 3 ;   "remembers from previous sim.
[ C, 0 ] -> 4 ;
[ 0, 1 ] -> 43 ;
[ C, 0 ] -> 44 ;
[ C, 0 ] -> 45 ;
"end

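The way the two product-term sets combine to load 43 can be followed in a small Python model. This is a sketch under stated assumptions: the names `RE_TERMS`, `PR_TERMS`, and `apply_vector` are hypothetical, values are pin levels, and (as the Polarity Issues section notes) a reset drives the pin High while a set drives it Low.

```python
# Per-bit terms from the listing, most-significant bit (OUT20) first.
RE_TERMS = [1, 0, 1, 0, 1, 1]   # reset terms: asserted bits go High
PR_TERMS = [0, 1, 0, 1, 0, 0]   # set terms: asserted bits go Low


def apply_vector(count, clock_edge, pre):
    """Return the 6-bit count seen at the pins after one test vector."""
    if pre:
        loaded = 0
        for re_bit, pr_bit in zip(RE_TERMS, PR_TERMS):
            bit = 1 if (re_bit and not pr_bit) else 0
            loaded = (loaded << 1) | bit
        return loaded                    # asynchronous load wins
    if clock_edge:
        return (count + 1) % 64          # COUNT := COUNT + 1
    return count


count = 3                                # left over from the first vectors
trace = []
for clock_edge, pre in [(False, 0), (True, 0), (False, 1), (True, 0), (True, 0)]:
    count = apply_vector(count, clock_edge, pre)
    trace.append(count)
```

Every bit appears in exactly one of the two term sets, so asserting PRE forces a fully defined value (binary 101011, decimal 43) without leaving any register in transparent mode.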
Using LOG/iC to Program the CY7C330
This application note provides you with a running
start towards using the LOG/iC design synthesis tool for
designs using the Cypress CY7C330 programmable
logic device.
Of the steps required for implementing designs
using PLDs, generating JEDEC files from high-level
descriptions is probably the most time consuming. Unfortunately, the documentation that comes with many
high-level synthesis packages does not provide enough
detailed information to use advanced PLDs without a
significant learning curve. Although the LOG/iC
documentation is quite good, this application note
should help flatten the LOG/iC learning curve further.
Isdata's LOG/iC is an advanced universal logic synthesis program that generates designs targeted for
PROMs, PLDs, and gate arrays. The LOG/iC package's
basic algorithms were developed in the Electrical Engineering Department of the University of Karlsruhe,
West Germany. Although a relative newcomer to the
PLD software market in the U.S., LOG/iC has become
very popular in Europe.
LOG/iC is available for a variety of operating environments including PC DOS and SPARC-based
SunOS platforms. The software is available as four different packages with two options. The first (PLC) package supports PAL designs. It offers input in either
equations or tables with syntax constructs that include
address ranges and functional blocks. Also available are
hexadecimal, decimal, octal, and binary representations.
The second (PLUS) package extends the first and
supports the design of sequential controllers via inclusion of their FSM (finite state machine) syntax. This
package includes an automatic test vector generation
feature.
Package three (PERFECT) extends support to include partitioning designs across multiple devices. Package four (GATES) supports the design of multi-level-structure gate arrays by producing netlists from the
various input formats. The two option packages offered
are the Functional Verifier and PLD Database.
This application note deals with the functions available in package two (PLUS).

LOG/iC Language Overview
LOG/iC offers three different entry methods:
Boolean equations, truth tables, and FSM. Declarations
partition an input file into sections. Additionally,
designs can be logically partitioned into functional
blocks within a LOG/iC design file. These options are
described briefly before proceeding to CY7C330-specific information.
Declarations

Declarations are directives to the LOG/iC compiler
that identify the design, indicate the inputs and outputs,
specify compiler options, assign pin numbers to variables, and specify the type of input format. These declarations separate the input file into discrete sections that
describe the design's various aspects. LOG/iC declarations consist of a key word preceded by an asterisk (*).
The first section is the *Identification section,
where you enter comments regarding the function of the
design, etc. Variable declarations follow this information. LOG/iC supports input, output, local, and state
variables for both Mealy and Moore machines. You can
specify variables in ranges for compactness of expression, such as Address[0..31]. Variables can also have
special function extensions that control the function of
the device, such as RAS.OE.
Following the variable declarations is the design
description. It is denoted by the declaration *Boolean-Equations, *Function-Table, or *Flow-Table for
Boolean, truth table, and FSM entry methods, respectively. In the design description, you specify the circuit's
function. Drawing an analogy to programming a computer in a high-level language, you could say that most
of the other declarations describe the circuit's variables.
The design description implements the algorithm to
perform the function you wish to create.
Next are the *PLD, *PINS, *Run-Control, and
*END sections. The *PLD declaration describes the
device type targeted for use in this design. *PINS controls the assignment of the external variables to device
pins. Finally, *Run-Control provides compiler directives, and *END signifies the end of the design file.

*Identification
   Parallel Load Register with acknowledge
   MMA - Cypress Semiconductor
*X-Names
   Load, Data[0..7];
*Y-Names
   Qout[0..7], ACK;
*Boolean-Equations
   Qout[0..7] := Load & Data[0..7];
   ACK = Load;

Figure 1. Boolean Entry Example

Table 1. LOG/iC Operators

Operation              Symbol    Example
Unregistered Output    =         Z = X;
Registered Output      :=        Z := X;
Negation               /         Z = /X;
AND                    &         Z = X & Y;
OR                     +         Z = X + Y;
XOR                    #         Z = X # Y;
Constant 1             VCC       Z = VCC;
Constant 0             GND       Z = GND;

Boolean Design Entry
The simplest design entry method is by Boolean
equations. Table 1 shows the operators supported by
LOG/iC in order of precedence. The labels X, Y, and Z
can represent either a single variable or a range of variables.
Logic polarity often creates an amazing amount of
confusion for a methodology that has only two values.
LOG/iC removes the burden of considering whether a
given signal is active Low or High, because Boolean
equations always have a positive polarity. Thus, if a
given input variable is specified without a '/', that variable is deemed to be true independently of the active
level of the signal on the pin.
LOG/iC deals with signals that are active Low via
the *Level declaration. You therefore write equations
for an active-Low signal exactly the same as those for an
active-High signal. The *Level section identifies the
polarity of given input signals and manages negative/positive polarity issues for you.
Another useful aspect of Boolean entry is the use
of ranges, which provide a compact method of referring
to many variables in a succinct fashion. Typical examples include references to address or data buses. Figure 1 shows an example of Boolean entry that utilizes
variable ranges. This example features an 8-bit data bus
whose values are captured in a register when a load
command is issued.
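
To make the difference between the registered (:=) and unregistered (=) operators concrete, the following is a minimal behavioral sketch of the Figure 1 design in Python. The class and method names are illustrative assumptions, not part of LOG/iC; the model simply evaluates the two equations as written.

```python
from dataclasses import dataclass


@dataclass
class ParallelLoadRegister:
    """Behavioral model of the Figure 1 equations (names are hypothetical)."""

    qout: int = 0  # registered outputs Qout[0..7], packed into one integer

    def ack(self, load: bool) -> bool:
        # ACK = Load;  '=' is an unregistered output, so it follows Load directly
        return load

    def clock(self, load: bool, data: int) -> int:
        # Qout[0..7] := Load & Data[0..7];  ':=' takes effect on the clock edge.
        # Every bit is ANDed with Load, so clocking with Load = 0 stores zeros.
        mask = 0xFF if load else 0x00
        self.qout = data & mask
        return self.qout


reg = ParallelLoadRegister()
reg.clock(load=True, data=0xA5)   # register now holds 0xA5
reg.clock(load=False, data=0xFF)  # equation as written clears the register
```

Note that the equation, taken literally, does not hold the previous value when Load is inactive; a holding register would need a feedback term.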

Figure 3 shows an example in which a header changes the variable ordering. This example uses two important constructs that can assist in reducing the logic
design to the minimum number of product terms. The
first construct is the Don't Care entries designated by a
hyphen (-), which appear on both the input and output
sides of the table.
The use of the Don't Care input is unique to the
function table entry method and can significantly improve the compiler's ability to produce minimized logic.
Note that Don't Cares are only available when using bit
fields and that the table ends with the word "REST" on the
input side. The use of the REST statement stems from the
fact that, to uniquely identify all possible input patterns
with N input variables, you would require 2^N table
entries. A single Don't Care in any given line represents
two entry lines rather than one. The REST statement
provides a brief way to specify all remaining possible
input values and the output values they should produce.
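
The arithmetic behind Don't Cares and REST can be sketched in a few lines of Python. This is only an illustration of the counting argument, not LOG/iC's algorithm; the function names expand and rest are assumptions made for the example.

```python
from itertools import product


def expand(pattern: str) -> list:
    """Expand a row such as '0---01' into every fully specified input pattern.

    Each '-' doubles the number of explicit table lines the row stands for.
    """
    choices = [("0", "1") if ch == "-" else (ch,) for ch in pattern]
    return ["".join(bits) for bits in product(*choices)]


def rest(rows: list, width: int) -> list:
    """Return the input patterns that no row covers (what REST stands for)."""
    covered = {p for row in rows for p in expand(row)}
    every = ("".join(bits) for bits in product("01", repeat=width))
    return [p for p in every if p not in covered]


print(len(expand("0---01")))       # one row with three Don't Cares covers 8 lines
print(rest(["0-", "10"], width=2)) # the single remaining pattern
```

With six inputs a full table needs 2^6 = 64 lines, so a handful of Don't Care rows plus REST replaces most of them.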
The header line has an additional benefit beyond
merely changing the order of bit data. You can also use
the header line to indicate logical groupings of data as
fields. Data that is not entered in groups must be
entered as binary data. Grouped variables, however, can
represent input data that is in binary, octal, decimal, or
hexadecimal representations. Suffixes that indicate the
*Identification
Truth Table Example
MMA - Cypress Semiconductor
*X-Names
X[6 .. 1];
*Y-Names

Truth Table Entry
Truth table entry represents one of the most compact entry methods to describe a combinatorial system.
With this entry format, you map the outputs as a function of the input variables. The basic format of truth
tables appears in Figure 2.
This example contains several noteworthy characteristics. The first is the ordering of the inputs and outputs. Note that the labels after the key word "*Function-Table" are comments, indicated by the leading semicolon (;). Thus, the ordering of the X and Y variables in
the *X-Names and *Y-Names declarations specifies
their ordering in the function table.
If you want some other ordering, you can specify it
with a header. A header is a logical line preceded by
the dollar sign symbol ($). When using a header, you
separate the variables into fields delimited by commas.

Y[1..4];
*Function-Table
; Input Side      Output Side
; X X X X X X     Y Y Y Y
; 6 5 4 3 2 1     1 2 3 4
[Table rows and closing REST line not legible in the source.]

Figure 2. Truth Table Example


*Identification
Truth Table Example with header
MMA - Cypress Semiconductor
*X-Names
X[6..1];
*Y-Names
Y[1..4];
*Function-Table
; Input Side               Output Side
$ X6, X5, X4, X3, X1, X2 : Y4, Y3, Y2, Y1
  0,  -,  -,  -,  0,  1  : -,  0,  -,  1;
  1,  1,  1,  0,  0,  0  : 1,  0,  0,  0;
  0,  1,  -,  -,  1,  0  : 0,  1,  -,  1;
  1,  -,  -,  1,  1,  1  : 1,  -,  1,  0;
  0,  -,  1,  -,  0,  0  : 1,  0,  0,  1;
  0,  -,  -,  1,  0,  0  : 1,  0,  0,  1;
  REST                   : 0,  -,  -,  1;

Figure 3. Truth Table with Header
data format appear in Table 2. It is important to note
that a field is always totally occupied by a number; if
necessary, leading zeros are added to completely fill the
field.
In addition to fields, function tables allow the use
of ranges. This feature permits efficient implementation
of address decoders (Figure 4). The function table for
this decoder specifies the address as ordered from 15..0.
This order is significant because it is the same order as
that of the hexadecimal numbers entered in the ranges
below, when you view the hexadecimal numbers as individual bits. Also note the double parentheses surrounding the outputs in the header line, which label this
field as a bit field, eliminating the need for separating
commas.
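
As a cross-check on how the range rows of an address decoder behave, here is a small Python model. The range limits and output patterns are transcribed from Figure 4 as best they can be read, and the names RANGES, REST, and decode are assumptions for this sketch; the outputs are kept as bit strings in the order ROM[1..3], Port[1..2].

```python
# (low, high, ROM[1..3], Port[1..2]) rows as read from the Figure 4 table
RANGES = [
    (0x0000, 0x07FF, "011", "11"),  # ROM1 selected
    (0x0800, 0x0FFF, "101", "11"),  # ROM2 selected
    (0x1000, 0x17FF, "110", "11"),  # ROM3 selected
    (0x8000, 0x8007, "111", "01"),  # I/O port 1 selected
    (0x8008, 0x800F, "111", "10"),  # I/O port 2 selected
    (0xF800, 0xFFFF, "011", "11"),  # ROM1 (shadow)
]
REST = ("111", "11")  # every address not listed: disabled


def decode(enable: int, adr: int):
    """Return (ROM bits, Port bits) for a 16-bit address."""
    if enable == 1:
        # First table row: ROM outputs off, port outputs are Don't Cares
        return ("111", "--")
    for low, high, rom, port in RANGES:
        if low <= adr <= high:
            return (rom, port)
    return REST


print(decode(0, 0x0123))  # inside the ROM1 range
print(decode(0, 0x2000))  # falls through to REST
```

Each range row stands for every address between its limits, which is why seven table lines are enough to describe a 64K address space.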
Finite State Machine Entry
FSM entry is probably the design methodology that
correlates best with the CY7C330's target application as
a high-speed state machine. LOG/iC's documentation
defines an FSM as a circuit that has combinatorial logic
and state registers of arbitrary type that feed back to a
combinatorial array. Add to this definition multi-clocked input registers that minimize set-up and hold time
requirements and you have a high-level description of
the CY7C330.
More generally described, state machines have
memory elements that describe the present condition
and inputs that influence both the transition to the next
state and the outputs. FSMs are typically classified in
two general categories: Moore and Mealy machines.
LOG/iC differentiates between these types by stating
that machines whose outputs might change arbitrarily
within a state, even without a clock pulse, exhibit "Mealy
behavior." Moore machines, on the other hand, have
outputs that change only with the state clock and are
free of glitches. This output is typified as "Moore behavior" and is characteristic of the CY7C330. These outputs
are tied to the state clock and are referred to in
LOG/iC as Z-variables.

Table 2. Numeric Base Indicator Suffixes
B   Binary (default - can be omitted)
O   Octal
Q   Octal (alternate - to eliminate confusion between O and 0)
D   Decimal
H   Hexadecimal

Four variables describe an FSM's behavior: the
input variables' values, the present state, the output
variables' values, and the next state. An FSM's variable
declarations section has options for all these
parameters. As in the previous entry methods, *X-Names describe the circuit's inputs. *Y-Names are
values that exhibit Mealy behavior. *Z-Names are outputs that change relative to the state clock, as do the
CY7C330's.
State information can assume one of two forms.
The most common (and easiest) way to store the
machine's state is to determine the total number of
states required and dedicate N register bits (where 2^N
is at least the number of states) to maintain state information.
This method is reliable and produces discrete non-overlapping state assignments. The disadvantage is that you
must dedicate register resources (i.e., macrocells) that
might have served better in another capacity.
The second method available for state assignment
is assignment of states based purely on the output
values. This method requires more thought, as it is critical that all output patterns be unique. A design that
might meet this criterion on first pass might not be
realizable if you add features - or remove them, in the
case of undesirable "features."
*Identification
Address Decoder Example
MMA - Cypress Semiconductor
*X-Names
Enable, Adr[0..15];
*Y-Names
ROM[1..3], Port[1,2];
*Function-Table
; Input Side               : Output Side
$ Enable, (Adr[15..0])     : ((ROM[1..3], Port[1..2]));
  1,                       : 111 -- ; Disabled
  0, 00000H..007FFH        : 011 11 ; ROM1 Selected
  0, 00800H..00FFFH        : 101 11 ; ROM2 Selected
  0, 01000H..017FFH        : 110 11 ; ROM3 Selected
  0, 08000H..08007H        : 111 01 ; I/O Port 1 Selected
  0, 08008H..0800FH        : 111 10 ; I/O Port 2 Selected
  0, 0F800H..0FFFFH        : 011 11 ; ROM1 (Shadow)
  REST                     : 111 11 ; Disabled
Figure 4. Address Decoder Function Table


*Identification
Counter with 247 states and overflow signal
MMA - Cypress Semiconductor
*X-Names
Reset;
*Y-Names
Overflow;
*Z-Names
Q[1..8]
*Flow-Table
S[1..247], X 1, Y 0, F1          ; Reset condition
S[1..246], X 0, Y 0, F[2..247]   ; Count
S[247],    X 0, Y 1, F1          ; Overflow

inputs and outputs might not be relevant to a subset of
the machine's sequence of operations. Rather than
force you to specify the status of all variables, LOG/iC
has a directive that lets you specify what variables are
significant. This statement is called Relevant and stays
in effect until the next Relevant statement or until the
end of the design. As an example, you can describe the
simple machine as:
*Flow-Table
Relevant = X1, X2 : Y2;
S1, X 0 1, Y 1, F2;
Omitting Y1 from the Relevant statement indicates
that Y1 is a Don't Care. If, instead, you want Y1 always
to be off for the subsequent lines, you can state Y1 = 0.
Another powerful statement is Xrest. Similar to the
REST statement in function tables, Xrest provides a
brief way to assign all remaining non-specified input
patterns and these conditions' desired output and next
state.
You can also use ranges in flow tables for compact
machine descriptions. In only three lines, the counter
definition in Figure 5 completely specifies a state
machine with 247 states through the use of ranges. The
only limitation is the number of states that LOG/iC allows in a machine.
The table-driven LOG/iC optimizer allows a maximum of 1024 states. For most true state machine applications, you would be hard pressed to fit 1024 states
into a single PLD. But this syntax's attractiveness for
use in counters as large as 16 bits (64K states) in the
CY7C330 can lead you to run up against the 1024-state
limitation in short order.
Fortunately, LOG/iC can partition designs into
blocks. This capability allows you to partition the design
into smaller chunks that are optimized individually and
merged after compilation. Blocks also tend to mimic
optimal approaches to finding solutions by segmenting
designs into smaller functional units (more on this
later).
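
The following Python sketch shows what the three range lines of the Figure 5 flow table expand to, and why a wide counter written the same way exceeds the optimizer's limit. It is an illustration only; build_counter, fits, and MAX_STATES are names assumed for the example, with the 1024-state figure taken from the text.

```python
MAX_STATES = 1024  # limit of the table-driven optimizer quoted in the text


def build_counter(n_states: int = 247) -> dict:
    """Expand the range lines into explicit transitions.

    Keys are (state, reset); values are (overflow, next_state).
    """
    table = {}
    for s in range(1, n_states + 1):   # S[1..247], X 1, Y 0, F1
        table[(s, 1)] = (0, 1)
    for s in range(1, n_states):       # S[1..246], X 0, Y 0, F[2..247]
        table[(s, 0)] = (0, s + 1)
    table[(n_states, 0)] = (1, 1)      # S[247],    X 0, Y 1, F1
    return table


def fits(n_states: int) -> bool:
    """True when a single flow table can hold this many states."""
    return n_states <= MAX_STATES


table = build_counter()
print(len(table))        # explicit lines that the three range lines replace
print(fits(247))         # a 247-state counter fits in one table
print(fits(2 ** 16))     # a 16-bit counter written as one table does not
```

Splitting a 16-bit counter into blocks of at most 10 bits each keeps every block within the limit, which is the motivation for the block structure described above.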
LOG/iC also includes a simple statement that
determines the type of flip-flop for implementing the
state registers via the *Flip-Flops directive. The default
is D-FlipFlops, but the T-FlipFlops statement can also
be used. The LOG/iC reduction algorithm automatically
generates optimized equations for the flip-flop type
specified. This capability is especially significant for the
CY7C330, because LOG/iC understands how to use the
XOR product term for both polarity control and T flip-flop creation. The CY7C330 can implement large
counters extremely efficiently using T flip-flops automatically generated by LOG/iC.
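
Why T flip-flops suit counters can be seen from a short Python sketch of a binary up counter. This is a textbook model, not LOG/iC output: bit i toggles when all lower bits are 1, so its T input is a single AND term, whereas the equivalent D input, written as a sum of products, needs i + 1 terms. The function names are assumptions for the example.

```python
def t_inputs(q: int, n_bits: int) -> int:
    """Return the T inputs of an n-bit up counter for present count q.

    T for bit i is the AND of all lower bits, which is one product term.
    """
    t = 0
    chain = 1                      # T0 = 1: bit 0 toggles on every clock
    for i in range(n_bits):
        t |= chain << i
        chain &= (q >> i) & 1      # extend the AND chain with bit i
    return t


def next_count(q: int, n_bits: int) -> int:
    # A T flip-flop toggles when T = 1, that is, Q+ = Q XOR T
    return q ^ t_inputs(q, n_bits)


def product_terms(n_bits: int):
    """Sum-of-products term counts for the whole counter: (T design, D design)."""
    t_terms = n_bits                               # one AND term per bit
    d_terms = sum(i + 1 for i in range(n_bits))    # bit i: i hold terms + 1 toggle term
    return t_terms, d_terms


print(product_terms(8))    # term totals for an 8-bit counter
print(product_terms(16))   # the gap widens quickly with counter width
```

In the CY7C330 the XOR in Q+ = Q XOR T is supplied by the XOR product term, so the toggle condition costs only the AND chain.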

Figure 5. FSM Counter
LOG/iC can implement designs using either type of
state assignment. The *State-Assignment directive
provides the options of binary, number, gray, 1-out-of-N, and Z-variables. The binary option dedicates
registers to state values and encodes the state values in
binary. LOG/iC can do this encoding automatically, or
you can specify the encoding explicitly.
Using the number option ensures that the binary
code for each state is the same as the state numbers
used in the high-level description, i.e., state 1 = 001,
etc. The gray option assigns the states using gray coding
to minimize transitions. 1-out-of-N assignment again
uses registers but does not binary-encode states; instead, each discrete register represents a single state.
This approach is especially demanding on macrocell
resources but minimizes the number of state bits switching at a single clock edge. Finally, the Z-variable option
allows the output values themselves to represent the
states.
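
The register cost of each assignment style can be compared with a short Python sketch. The encodings below are the standard definitions of binary, gray, and 1-out-of-N codes; the function names are assumptions, and LOG/iC's own automatic assignment may choose different code words.

```python
def bits_needed(n_states: int) -> int:
    """Smallest N with 2**N >= n_states (register bits for encoded states)."""
    return max(1, (n_states - 1).bit_length())


def binary_code(index: int, width: int) -> str:
    return format(index, "0{}b".format(width))


def gray_code(index: int, width: int) -> str:
    # Reflected gray code: neighbouring indices differ in exactly one bit
    return format(index ^ (index >> 1), "0{}b".format(width))


def one_out_of_n(index: int, n_states: int) -> str:
    # One register per state; exactly one bit is set
    return format(1 << index, "0{}b".format(n_states))


n = 8
width = bits_needed(n)
for i in range(n):
    print(i, binary_code(i, width), gray_code(i, width), one_out_of_n(i, n))
```

For eight states, binary and gray assignment need three registers each, while 1-out-of-N needs eight, which is the macrocell cost the text refers to.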
You enter the FSM design as a table after the
directive *Flow-Table. Each line in the flow table has as
many as four fields separated by commas. These fields
represent the present state, inputs, outputs, and next
state. Not all designs require all four fields. Counters
are good examples of applications that require only
three fields to describe the machine, because the count
value is the same as the state value.
The order in which the fields appear is not significant, because a letter indicating the field type
precedes each field. The letters S, X, Y, and F indicate
the state-number, input, output, and next-state fields,
respectively.
A line in an FSM that describes part of a machine
might look like this:

*Flow-Table
S1, X 0 1, Y - 1, F2;
When in state 1, with inputs at 0 and 1, this
machine causes the second output to go True and transitions to state 2. In this case, the first output is not
relevant to the design. In a large machine, many of the

Optimization Levels
You control LOG/iC's optimizer via the Compute
and Nocompute statements, which you can place in the
design file's *Run-Control section. Optimization levels
are essentially binary. Nocompute allows you to indicate
outputs for which you desire no reduction. Compute is
complementary and allows you to explicitly specify the
outputs you want reduced. Another directive, CPUTime = nn, allows you to specify the maximum amount
of time the compiler can take to attempt an exact solution. After this time, the compiler computes approximated solutions.

and CLK2, respectively. The default clock used is
CLKl. To specify CLK2 instead, use the *Special Functions directive along with the .IC2 pin name suffix.
Thus, to select CLK2 for input Fred, use the following
syntax:
*Special Functions
Fred.IC2 = YES;

CY7C330 Characteristics

Controlling Output Enable
The default for OE in LOG/iC is asynchronous,
pin-14 control of the output buffer. If you use the macrocell for input only (pure input), the OE-select fuse is
left intact, which selects OE from the product term. Because none of the product term fuses are blown, selecting OE from the product term results in the output
driver being turned off. Finally, if you use the macrocell
for both input and output, the OE again defaults to
asynchronous, pin-14 control.
You have several options for changing this default
behavior. First, you can use the OE special function. If
the macrocell is called AO, then:

Cypress's CY7C330 is a high-performance PLD optimized for state machine applications. It features a
pipelined architecture that achieves a 66-MHz state
transition speed. The device's 11 dedicated registered
inputs offer small set-up and hold times. These versatile
input registers can be clocked with either of two input
clocks. You select the input clock by programming a
configuration fuse unique to each input register. The
CY7C330 has a total of three clock pins - two for the
input registers and one for the output/state registers.
This feature allows you to synchronize input data
without using an external register. You can tie the clock
pins together if you need only a single clock source.
The CY7C330 provides 12 I/O macrocells and four
buried macrocells. The 12 I/O macrocells have an input
register structure identical to that of the dedicated inputs.
The outputs from the CY7C330 logic array feature
variable product-term distribution with nine to 19
product terms per output. These product terms are
XORed with an additional product term, which you can
use for equations that require an XOR, polarity control,
or T flip-flop implementation.
A fuse-configurable feedback mux allows you to
program the CY7C330 macrocell for feedback from the
input register or the output register (buried). The
device's output enable is configurable for control via a
product term or pin 14. This pin allows you to enable
the output buffers asynchronously. Product term OE
(output enable) is synchronous to the input register
values that comprise the OE equation. You can also
program this equation to permanently enable or disable
the output buffer.
When the feedback is programmed for state-register (rather than input register) buried feedback,
you have an additional feedback connection between
pairs of I/O macrocells. This connection provides an
input path for the pin that would otherwise be lost. You
thus have the flexibility of burying six of the 12 I/O macrocells and using the associated pins as dedicated inputs. The four hidden macrocells have the same
product-term structure as the I/O macrocells, with fixed
state-register feedback to the logic array. The CY7C330
also furnishes two product terms that permit you to set
or reset all the state registers synchronously.
Selecting the CY7C330's Input Clock
The CY7C330's input registers are clocked with
either pin 2 or 3. LOG/iC refers to these pins as CLK1

AO.OE = 0;
; Sets OE to synchronous product term control and permanently turns OFF the driver
AO.OE = 1;
; Sets OE to synchronous product term control and permanently turns ON the driver
AO.OE = EQN;
; Sets OE to synchronous product term control, output
; driver is controlled by the specified equation (EQN).
These constructs should allow you to create any
desired OE configuration, while maintaining readability.
You can also use the FUSES statement to control the
OE mux, as follows:
; BLOWN Selects synchronous product term output buffer control
; INTACT Selects asynchronous pin 14 output buffer control
*Fuses;             Pin #
$17067 = INTACT;    15
$17063 = INTACT;    16
$17060 = INTACT;    17
$17056 = INTACT;    18
$17053 = INTACT;    19
$17049 = INTACT;    20
$17046 = INTACT;    23
$17042 = INTACT;    24
$17039 = INTACT;    25
$17035 = INTACT;    26
$17032 = INTACT;    27
$17028 = INTACT;    28


Use of the XOR Product Term

LOG/iC supports use of the XOR product term to
implement polarity control and T flip-flops. Polarity
control is automatic for all entry formats and is controlled via the *Level directive. LOG/iC uses the XOR to
create T flip-flops by using the *Flip-Flops directive
and specifying T-FlipFlops. The LOG/iC optimizer then
automatically produces reduced equations targeted at T
flip-flops.
Macrocell Feedback

statements in the *Boolean-Equations section. Avoid a
potential pitfall by remembering that resetting the
register to Zero causes a value of One to appear on the
output pin because of the inverting output buffer.
The following code shows the usage of the preset
and reset statements, where variable Paul presets the
register, and variable Ray resets the register.
*Boolean-Equations
$PS = Paul;
$RS = Ray;

LOG/iC defaults to selecting feedback from the
state register. If you use the macrocell as a pure input,
feedback is automatically routed from the input pin
register. Designs that use the macrocell state register
and the input pin register can specify feedback via the
.FBK function or FUSES statements.
As an example, say you use the state register as an
adder, and the associated macrocell input-pin register
holds a base value. In this case, you want to drive the
result onto the output pins during normal operation,
while the macrocell input register uses the feedback
path to provide the base value to the adder equations.
During base-value updates, you three-state the output
buffers and clock a new value into the macrocell input
registers. The following statements configure
the desired feedback:
SUM3.FBK = PIN;
or
; BLOWN Selects feedback from macrocell input register
; INTACT Selects feedback from macrocell output register
*Fuses;            Pin #
$17068 = BLOWN;    15
$17064 = BLOWN;    16
$17061 = BLOWN;    17
$17057 = BLOWN;    18
$17054 = BLOWN;    19
$17050 = BLOWN;    20
$17047 = BLOWN;    23
$17043 = BLOWN;    24
$17040 = BLOWN;    25
$17036 = BLOWN;    26
$17033 = BLOWN;    27
$17029 = BLOWN;    28

Using the Shared-Input Feedback Mux

As mentioned previously, the CY7C330 has a
shared-input feedback mux, which allows you to use a
given macrocell for both input and output. This feature
is useful for several configurations, such as when the
state register is buried as an internal state bit that is fed
back to the array, and the pin serves as a dedicated
input. In this case, the OE product term is typically configured to disable the output buffer.
Another good application for the shared-input
feedback mux occurs when you use the input register to
hold a seldom-changed value used by the machine. For
example, a counter might have an upper limit that is
loadable. During normal operation, the output buffer
OE is enabled and the count appears on the output
pins. When a new limit is desired, the output is threestated, and the limit value is clocked into the input
register. The machine can then access this value via the
shared-input feedback mux.
LOG/iC deals with these situations by referring to
the state register as a buried node. LOG/iC provides a
list of the node numbers and the pins they correspond
to. The input to the macrocell is assigned to the pin
number. Using this notation, LOG/iC automatically uses
the shared-input feedback mux for the input. The following statements correctly configure and use the
shared-input feedback mux for a buried macrocell that
has a variable assigned to the state register named S1
and an input named X29:
*X-Names
X29;

*Y-Names
S1;
; Design entry here
*Pins
X29 = 27;
*Nodes
S1 = 15;
Remember that the shared-input feedback mux is
available for only one of every pair of macrocells. Node
numbers, the corresponding pin numbers, and their

Controlling Synchronous Reset and Preset

The CY7C330 has a single product term that controls the synchronous resets of all of the state/output
registers. Similarly, a single product term controls all
the state/output registers' synchronous presets. These
two product terms are controlled via the $PS and $RS

available product terms are as follows (h1 - h4 are the
hidden macrocells):
Node:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Pin:   h1 h2 h3 h4 15 16 17 18 19 20 23 24 25 26 27 28
PTs:   19 11 17 13 09 19 11 17 13 15 15 13 17 11 19 09
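
When planning pin assignments by hand, it can help to hold this table in a small lookup structure. The Python sketch below transcribes the three rows above; NODE_INFO and fits_node are hypothetical names for this example and are not part of LOG/iC.

```python
NODES = list(range(1, 17))
PINS = ["h1", "h2", "h3", "h4", 15, 16, 17, 18, 19, 20, 23, 24, 25, 26, 27, 28]
PTS = [19, 11, 17, 13, 9, 19, 11, 17, 13, 15, 15, 13, 17, 11, 19, 9]

# node number -> (pin or hidden macrocell, available product terms)
NODE_INFO = {node: (pin, pts) for node, pin, pts in zip(NODES, PINS, PTS)}


def fits_node(node: int, terms_required: int) -> bool:
    """True when an equation with this many product terms fits the node."""
    return terms_required <= NODE_INFO[node][1]


def nodes_with_at_least(terms_required: int) -> list:
    """List the nodes that can hold an equation of the given size."""
    return [n for n in NODES if fits_node(n, terms_required)]


print(NODE_INFO[15])             # node 15 corresponds to pin 27
print(nodes_with_at_least(19))   # only the widest macrocells qualify
```

Placing the widest equations on the 19-term nodes first leaves the narrower macrocells for simple outputs.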

Design Examples
An old adage about designing asserts that good engineers borrow and great engineers steal! Most often
used to describe the practice of re-using existing
software in a new design, this proverb applies equally
well to doing new PLD designs. Also, examples tend to
be the best way to flatten the learning curve for a new
language. The examples that follow highlight features of
LOG/iC and the CY7C330.

Example 1: Modulo-11 Counter
The ability of the CY7C330's XOR product term to
implement T flip-flops proves ideal for building complex counters. This first example is a small counter that
counts through 11 states (0 to 10) and then wraps to 0. The design also features a
clear, a hold, and count-up/down controls. Appendix A
shows the LOG/iC source code for this design. This
counter is an excellent example of the expression compactness with which LOG/iC describes designs.
The counter's four inputs are CLR, UP, HOLD,
and OE. CLR resets the counter to zero when asserted.
UP determines the direction in which the counter
operates. HOLD causes the counter to stop at its current count value until HOLD is released. OE is tied to
pin 14 and serves as an asynchronous output enable.
The outputs are count bits Q0 - Q3.
Note that the *Level statement has been used to
indicate that OE and CLR are active Low. As noted
earlier, the design needs no polarity conversion;
LOG/iC automatically creates the proper reduced
equations for these active-Low inputs.
Because each output value is unique, this design
uses Z-Values state assignment. Thus, the states are the
counter values. Examining the flow table, you can see
that whenever CLR is active, the counter goes to state
1, which has a value of zero. Entered this way, the flow
table values cause LOG/iC to use the macrocell product
terms to implement the CLR function. You could use
the CY7C330's preset and reset product terms to
achieve the same result. This design falls well within the
limit of nine to 19 product terms per output, however,
and the design is very readable in the current format.
The next line in the flow table shows the counting-down state: If in state 1, wrap around to state 11; if in
any other state, move to the next lower state. The third
line does not have to restate the state section of the
flow table, because no change occurs from the second
line. The third line specifies the design's up counter: If
in state 1 - 10, go to the next higher state; if in state 11,
wrap around to state 1.
The flow table's last line shows the hold state.
Notice that the file contains statements for both T and
D flip-flops. This practice allows you to comment one
of the two out easily and see the number of product
terms necessary for each type of design implementation.
As expected, the T flip-flop design generates a more efficient counter implementation.
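
The four flow-table lines described for this counter can be checked with a short behavioral model. The Python below is a sketch of the next-state rules as the text describes them, with inputs treated as logical values (True = asserted) regardless of pin polarity; next_state and q_value are names assumed for the example.

```python
def next_state(state: int, clr: bool, up: bool, hold: bool) -> int:
    """Next state of the modulo-11 counter; states are numbered 1..11."""
    if clr:                      # clear wins over everything else
        return 1
    if hold:                     # hold keeps the current state
        return state
    if up:                       # count up, wrapping 11 -> 1
        return 1 if state == 11 else state + 1
    return 11 if state == 1 else state - 1   # count down, wrapping 1 -> 11


def q_value(state: int) -> int:
    """Output value for a state: states 1..11 carry the values 0..10."""
    return state - 1


state = 1
sequence = []
for _ in range(12):
    sequence.append(q_value(state))
    state = next_state(state, clr=False, up=True, hold=False)
print(sequence)   # counts 0 through 10, then wraps to 0
```

Because the output value identifies the state uniquely, no extra state registers are needed beyond the four count bits.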
Example 2: 15-Bit Counter with Carry Out
The previous example generated a counter with a
very compact design expression. If you want a larger
counter, you might wish to borrow that example and
edit the numbers to provide more count bits. Doing so
quickly runs you into the wall of the 1024-states maximum, however. The solution is to use LOG/iC's block
structure to partition the task into multiple smaller
counters that cascade to form a large counter. An example of this technique appears in Appendix B.
This design consists of three smaller design blocks
named CTR1, CTR2, and CTR3 - all identical. The
design has global inputs called RESET, HOLD, and
UP that perform obvious functions. Global outputs include 15 bits of counter value and a carry out. Between
the design blocks are two local variables, INT1 and
INT2, which provide carry out internally between
counter blocks. The *Link statement reconciles all the
global variables and the local variables that each block
declares.
Although this design looks rather large, bear in
mind that when the optimization is complete, the internal variables completely disappear, and only two
product terms are required per output. Finally, note the
block titled HW_RESET. This block uses the CY7C330
preset product term to reset all the output pins to zero.
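
The cascading scheme can be checked with a behavioral model of three 5-bit stages whose carry outputs feed the count enables of the next stage. The Python below is a sketch under the assumption that each stage's carry is a combinational (Mealy) function of its present value and inputs, as the flow tables in Appendix B suggest; stage, step, and to_int are hypothetical names.

```python
def stage(value: int, cnt: int, up: int):
    """One 5-bit block. value is 0..31; returns (next_value, carry_out)."""
    if not cnt:
        return value, 0                          # count enable off: hold
    if up:
        return (value + 1) % 32, 1 if value == 31 else 0
    return (value - 1) % 32, 1 if value == 0 else 0


def step(values: list, cnt: int, up: int):
    """Clock all three stages once; values[0] is the least significant stage."""
    nxt = []
    for v in values:
        v, cnt = stage(v, cnt, up)   # carry of this stage enables the next one
        nxt.append(v)
    return nxt, cnt                  # the last carry is the global CARRY


def to_int(values: list) -> int:
    return values[0] + 32 * values[1] + 1024 * values[2]


values = [0, 0, 0]
values, carry = step(values, cnt=1, up=0)   # count down from zero
print(to_int(values), carry)                # wraps to the maximum 15-bit value
```

Each block only ever sees 32 states, yet the three blocks together behave as a single 32768-state counter.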
Example 3: T-Bird Tail Lights via Truth Table
The T-Bird tail lights example is a simple design
that emulates the function of the early 1960s Ford
Thunderbird tail lights. The original design used a
motorized assembly that caused the left or right cluster
of three lights to turn on sequentially from the inside to
the outside when the driver activated the directional signal.
The design presented here has five inputs: left turn
(LT), right turn (RT), ignition (IGN), brake, and flash.
For this design, the six output lights are also listed as
inputs, because the truth table uses them to determine
present state - similar to Z-values in an FSM. The six
output lights are designated the right and left inside,
middle, and outside. The brakes and emergency flashers
operate regardless of whether or not the ignition is active. The turn signals, however, operate only with the
ignition on. The brake and turn inputs activate all the
lights on the side that is not sequencing through a turn
indication.
This design introduces the concept of a bus
through the constructs LEFT = LO,LM,LI and
RIGHT = RI,RM,RO. Also note that the design uses
string substitution to describe the output states. Appendix C shows this example.


macrocell's input registers. Appendix E shows the
source code for this example.
One difference between this example and the earlier ones is that both the X and Y input sections contain
the variables A[0..7]. This arrangement is due to the fact
that the same macrocells provide the desired position
(input) and the difference value (output). The *Local
attribute identifies intermediate values that are not
needed for output but are used to generate the correct
results via substitution into other equations. Because the
basic equation for an adder uses an XOR to calculate
the sum, this example specifies the .XRB attribute to
use the CY7C330's XOR product term as an XOR - a
technique that reduces the number of other product
terms required.
The adder completes an 8-bit add in three clock
cycles, producing two intermediate carry bits, which are
generated and stored in two of the four internal hidden
registers. The special functions attributes .IC2 and .FBK
configure the output macrocell appropriately.
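
The role of the XOR in the adder equations, and the use of two's complement for a signed delta, can be illustrated with a bit-level Python sketch. This is a generic ripple-style model, not the three-cycle pipelined implementation of Appendix E; add8 and to_twos_complement are names assumed for the example.

```python
def to_twos_complement(delta: int, bits: int = 8) -> int:
    """Encode a signed delta as an unsigned two's complement value."""
    return delta & ((1 << bits) - 1)


def add8(a: int, b: int) -> int:
    """8-bit add built from per-bit equations; the final carry is dropped."""
    result, carry = 0, 0
    for i in range(8):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        # Sum bit is an XOR of the operand bits and the incoming carry,
        # which is the part that maps onto an XOR product term.
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return result


position = 100
delta = to_twos_complement(-30)
print(add8(position, delta))   # adding a negative delta moves the position down
```

Because a negative delta is just another 8-bit pattern, the same adder equations serve for moves in both directions.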

Example 4: T-Bird Tail Lights via Flow Table

FSM syntax can also implement the T-Bird tail
lights. For this approach, state bits are assigned to
guarantee that all states are unique and non-overlapping. The CY7C330's hidden macrocells are ideal for
this use. Refer to Appendix D for this design.
Although this FSM implementation is safer than
the truth-table version from the aspect of uniquely assigned states, the FSM approach is not without cost.
Specifically, the truth table implementation was able to
incorporate additional functions for invalid conditions
such as LT and RT active simultaneously.
Example 5: 8-Bit Adder for Servo Control
This servo example is covered in detail in the application note, "Using the CY7C330 as a Closed-Loop
Servo Controller." The basic idea is that you can use the
CY7C330 to calculate the difference between the
desired position and the actual position to provide feedback to the servo loop.
In the servo application, the target position is
loaded into the I/O macrocell's input register during a
special update cycle. During this cycle, a microprocessor provides data to the dedicated inputs as a delta
from the current position. The CY7C330 adds the position value to the current position and makes the result
available at the output pins in three clocks. Then the
second input clock is toggled once to load the new
desired position into the I/O-macrocell input registers.
This operation is possible because the outputs are driving the macrocell input registers.
This design uses nearly all of the registers in the
CY7C330. To provide a difference between desired and
current position during normal operation, the input
values are furnished in two's complement form and
added to the target position stored in the I/O

Summary
The examples presented here frequently optimized
to levels exceeding results produced previously. The
ability to specify Don't Cares for output cases, along
with LOG/iC's table-driven optimizer, produced results
much more quickly than has previously been typical.
Documentation, an Achilles heel for many PLD tools,
proved quite readable in this case and minimized the
dreaded learning curve. LOG/iC's finite state machine
syntax allows compact descriptions of complex designs
that produced correct results - quite a contrast to previous experiences.
Clearly, LOG/iC can implement designs that use all
the features available in the CY7C330. LOG/iC has
quickly become an essential tool for Cypress PLD
designs.


Appendix A. LOG/iC Source Code for Modulo-11 Counter
*IDENTIFICATION
Bit - modulo 11 counter using LOG/iC FSM entry
Z-Value state assignment used to specify absolute output value associated with each of the 11 states
MMA - Cypress Semiconductor
*X-NAMES
CLR, UP, HOLD, OE;
*Z-NAMES
Q[3..0];
*LEVEL
LOW == CLR,OE; Active low level for these pins

*Z-VALUES
S[1..11] = [0..10];
*FLOW-TABLE
RELEVANT = CLR, UP, HOLD;
S[1..11], X 1 - -, F1            ; Clear counter to zero
S[1..11], X 0 0 0, F[11,1..10]   ; Count Down
          X 0 1 0, F[2..11,1]    ; Count Up
S[1..11], X 0 - 1, F[1..11]      ; Hold Counter value

;Spacing between X variables above added only to improve clarity
*STATE-ASSIGNMENT
Z-Values;
*FLIP-FLOPS
D-FlipFlops;    D-F/F uses total of 22 Product Terms
T-FlipFlops;    T-F/F uses total of 16 Product Terms
*PLD
TYPE = PLD7C330;
*PINS
Q[3..0] = [28..25],
CLR     = 3,
UP      = 4,
HOLD    = 5,
OE      = 14;

*RUN-CONTROL
PROG = JEDEC;
LIST = PLOT, EQUATIONS, PINOUT, FUSEPLOT;
*END


Appendix B. 15-Bit Counter with Carry Out
*Identification
15 bit counter - Using 7C330 hardware Reset
Using Block Syntax to implement large counter w/FSM input Syntax (bypasses problem with exceeding maximum
number of states when building large counters - block structure adds NO extra product terms to compiled design.
INT1 & 2 are completely eliminated.)

MMA
Cypress Semiconductor

[Block diagram: three cascaded 5-bit counter blocks, CTR1, CTR2, and CTR3, each with CNT and CY connections and a RESET input. HOLD drives the CNT input of CTR1, the CY output of each block drives the CNT input of the next block, and the CY output of CTR3 is CARRY. The block outputs are Q1-Q5, Q6-Q10, and Q11-Q15.]


*X-Names
RESET, HOLD, UP;
*Y-Names
CARRY,Q[1..15];
*Local
INT[1,2];
*Link
RESET      CTR1:R,CTR2:R,CTR3:R;
RESET      HW_RESET:R;
HOLD       CTR1:CNT;
UP         CTR1:UP,CTR2:UP,CTR3:UP;
CARRY      CTR3:CY;
INT1       CTR1:CY,CTR2:CNT;
INT2       CTR2:CY,CTR3:CNT;
Q[1..5]    CTR1:QQ[1..5];
Q[6..10]   CTR2:QQ[1..5];
Q[11..15]  CTR3:QQ[1..5];

;*** First 5-bit counter stage here ************
@BLOCK = CTR1;
*X-Names
CNT,R,UP;
*Y-Names
CY;
*Q-Names
QQ[5 .. 1];

*Flow-Table
;Using '330s Internal Reset
Relevant = CNT,UP:CY;
S[1..32],X 0 -, Y 0, F[1..32] ;Hold Condition
S[1..31],X 1 1, Y 0, F[2..32] ;Counting
S[32],   X 1 1, Y 1, F1       ;Maximum Count Reached
S[32..2],X 1 0, Y 0, F[31..1] ;Counting
S[1],    X 1 0, Y 1, F32      ;Minimum Count Reached
*Flip-Flops
T-FLIPFLOPS;
*State-Assignment
binary;
@ENDBLOCK = CTR1;

;*** Second 5-bit counter stage here ************
@BLOCK = CTR2;
*X-Names
CNT,R,UP;
*Y-Names
CY;
*Q-Names
QQ[5 .. 1];
*Flow-Table
;Using '330s Internal Reset
Relevant = CNT,UP:CY;
S[1..32],X 0 -, Y 0, F[1..32] ;Hold Condition
S[1..31],X 1 1, Y 0, F[2..32] ;Counting
S[32],   X 1 1, Y 1, F1       ;Maximum Count Reached
S[32..2],X 1 0, Y 0, F[31..1] ;Counting
S[1],    X 1 0, Y 1, F32      ;Minimum Count Reached
*Flip-Flops
T-FLIPFLOPS;
*State-Assignment
Binary;
@ENDBLOCK = CTR2;

;*** Third 5-bit counter stage here ************
@BLOCK = CTR3;
*X-Names
CNT,R,UP;
*Y-Names
CY;

*Q-Names
QQ[5 .. 1];
*Flow-Table
;Using '330s Internal Reset
Relevant = CNT,UP:CY;
S[1..32],X 0 -, Y 0, F[1..32] ;Hold Condition
S[1..31],X 1 1, Y 0, F[2..32] ;Counting
S[32],   X 1 1, Y 1, F1       ;Maximum Count Reached
S[32..2],X 1 0, Y 0, F[31..1] ;Counting
S[1],    X 1 0, Y 1, F32      ;Minimum Count Reached
*Flip-Flops
T-FLIPFLOPS;
*State-Assignment
Binary;
@ENDBLOCK = CTR3;
;******* End of Counter Blocks ***********
@BLOCK = HW_RESET;
*X-Names
R;
*Boolean Equations
$PS = R;
@ENDBLOCK
*PLD
Type = PLD7C330;
*Pins
REGCLK      1,
INPCLK      2,          ! needed for creating testvectors
Q[5..10]    [15..20],
Q[11..15]   [23..27],
CARRY       28,
RESET       4,
HOLD        5,
UP          6;
*Nodes
Q[1..4] =   [1..4];
*Run-control
Listing    = Pinout, Plot;
Progformat = Jedec;

*END
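
The block structure above chains three identical 5-bit stages through their CY outputs. A rough Python model of that cascade is sketched below to show why the chain behaves as a single 15-bit counter. The function names are hypothetical, the count-enable input is simply called `cnt`, and the carry is modeled as a combinational function of the present state and inputs, as the Y column of each flow table suggests.

```python
STAGE_STATES = 32  # each 5-bit stage has states S1..S32 (values 0..31)


def step_stage(value: int, cnt: int, up: int) -> tuple[int, int]:
    """One clock of a 5-bit stage: returns (next_value, carry_out)."""
    if not cnt:
        # Hold condition: no count, no carry
        return value, 0
    if up:
        # Carry is asserted when counting up from the maximum count
        carry = 1 if value == STAGE_STATES - 1 else 0
        return (value + 1) % STAGE_STATES, carry
    # Carry is asserted when counting down from the minimum count
    carry = 1 if value == 0 else 0
    return (value - 1) % STAGE_STATES, carry


def step_counter(stages: list[int], cnt: int, up: int) -> tuple[list[int], int]:
    """Clock all stages together; each stage's CY enables the next stage."""
    next_stages = []
    enable = cnt
    for value in stages:  # stages[0] is the least significant stage
        next_value, enable = step_stage(value, enable, up)
        next_stages.append(next_value)
    return next_stages, enable  # the last carry is the CARRY output


def as_integer(stages: list[int]) -> int:
    """Combine the three 5-bit stage values into one 15-bit number."""
    return sum(value << (5 * i) for i, value in enumerate(stages))


if __name__ == "__main__":
    state = [0, 0, 0]
    for _ in range(40):
        state, carry = step_counter(state, cnt=1, up=1)
    print(as_integer(state), carry)
```

Because every stage sees only its own five state bits plus one enable, no stage needs more than 32 states, which is the point of the block syntax in this appendix.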

Appendix C. T-Bird Tail Lights Example
*IDENTIFICATION
Thunderbird sequencing Taillights example for 7C330 using ISDATA LOG/IC
Truth Table Implementation
MMA
Cypress Semiconductor
*X-NAMES
LT, RT, BRAKE, FLASH, IGN, RI, RM, RO, LI, LM, LO;
*Y-NAMES
RI, RM, RO, LI, LM, LO;
*BUS
LEFT = LO,LM,LI;
RIGHT = RI,RM,RO;
*LEVEL
LOW = FLASH;
;Macros for All desired output combinations:
*STRING
ON       1, 1, 1;
OFF      0, 0, 0;
LEFT1    0, 0, 1;
LEFT2    0, 1, 1;
RIGHT1   1, 0, 0;
RIGHT2   1, 1, 0;
ONE      1, 0, 0;
TWO      0, 1, 0;
THREE    0, 0, 1;
TRI      -, -, -;


*FUNCTION-TABLE
$ IGN,FLASH,LT,RT,BRAKE,LEFT ,RIGHT
;Quiescent
1, 0, 0, 0, 0, 'TRI' ,
'TRI'
0, 0, , -, 0, 'TRI' ,
'TRI'

-

;Flash
1,
1,

-,
-,

-, -, -,
-, -, -,

;Brake
0, 0,
0, 0, -,

-,

;Left Turn
1, 0,
1, 0,
1, 0,
1, 0,

1,
1,
1,
1,

: LEFT ,RIGHT
'OFF' ,'OFF';
'OFF','OFF';

ON',
'OFF',

'ON'
'OFF'

'OFF','OFF';
'ON','ON';

-,

0,

1,
1,

'TRI',
'TRI',

'TRI'
'TRI'

'ON','ON';
'ON','ON';

0,
0,
0,
0,

0,
0,
0,
0,

'OFF'
'LEFTl',
'LEFT2',
'ON',

,'TRI'
'TRI'
'TRI'
'TRI'

'LEFTl', 'OFF';
'LEFT2' ,'OFF';
'ON','OFF';
'OFF','OFF';

;Right Turn
1, 0, 0,
1, 0, 0,
1, 0, 0,
1, 0, 0,

1,
1,
1,
1,

0,
0,
0,
0,

'TRI' ,
'TRI',
'TRI',
'TRI',

'OFF'
'RIGHTl'
'RIGHT2'
'ON'

'OFF','RIGHT1';
'OFF' ,'RIGHT2';
'OFF' ,'ON';
'OFF' ,'OFF';

1,
1,
1,
1,

'OFF',
'LEFT1',
'LEFT2',
'ON',

'TRI'
'TRI'
'TRI'
'TRI'

'LEFT1','ON';
'LEFT2' ,'ON';
'ON','ON';
'OFF','ON' ;

'TRI',
'TRI',
'TRI',
'TRI',

'OFF'
'RIGHTl'
'RIGHT2'
'ON'

'ON' ,'RIGHT1';
'ON' ,'RIGHT2';
'ON','ON';
'ON' ,'OFF';

;Left Turn + Brake

1,
1,
1,
1,

0,
0,
0,
0,

1,
1,
1,
1,

0,
0,
0,
0,

;Right Turn + Brake
1, 0, 0, 1, 1,
1, 0, 0, 1, 1,
1, 0, 0, 1, 1,
1, 0, 0, 1, 1,
;Both Turn
1, 0,
1, 0,
1, 0,
1, 0,
;Illegal
1,
1,
1,
1,

- lights flash
1, 1, 0,
1, 1, 0,
1, 1, 0,
1, 1, 0,

in reverse sequence

'OFF',
'ON',
'LEFT2',
'LEFT1',

condition, All ON
0, 1, 1, 1, 'OFF',
0, 1, 1, 1, 'ONE',
0, 1, 1, 1, 'TWO',
0, 1, 1, 1, 'THREE',

'OFF'
'ON'
'RIGHT2'
'RIGHT1'

'ON','ON';
'LEFT2' ,'RIGHT2';
'LEFT 1' ,'RIGHT1';
'OFF','OFF';

'OFF'
'THREE'
'TWO'
'ONE'

'ONE','THREE';
'TWO','TWO';
'THREE' ,'ONE';
'OFF','OFF';

*FLIP-FLOPS
D-FLIPFLOPS;
T-FLIPFLOPS;
*PLD
TYPE = PLD7C330;
*PINS
LT       4,
RT       5,
BRAKE    6,
FLASH    7,
IGN      9,
RI       23,
RM       24,
RO       25,
LI       20,
LM       19,
LO       18;

*RUN-CONTROL
PROG = JEDEC;
LIST = PLOT, EQUATIONS, PINOUT, FUSEPLOT;
*END


Appendix D. T-Bird Tail Lights via FlowTable

*IDENTIFICATION
Thunderbird sequencing Taillights example for 7C330 using ISDATA LOG/IC
State Machine Implementation
MMA
Cypress Semiconductor
*X-NAMES
LT,RT,BRAKE,FLASH,IGN;
*Z-NAMES
LO,LM,LI,RI,RM,RO;
*LEVEL
LOW = FLASH;
*Q-NAMES
Q[1..4];
*Z-VALUES
S1  = 000000; All lights off or Flash Off
S2  = 001000; Left Turn 1
S3  = 011000; Left Turn 2
S4  = 111000; Left Turn 3
S5  = 000100; Right Turn 1
S6  = 000110; Right Turn 2
S7  = 000111; Right Turn 3
S8  = 001111; Brake + Left Turn 1
S9  = 011111; Brake + Left Turn 2
S10 = 111111; Brake + Left Turn 3
S11 = 000111; Brake + Left Turn 4
S12 = 111100; Brake + Right Turn 1
S13 = 111110; Brake + Right Turn 2
S14 = 111111; Brake + Right Turn 3
S15 = 111000; Brake + Right Turn 4
S16 = 111111; Brake or Flash On
*FLOW-TABLE
Sn, LT RT Brake Flash
SI, X 0 0 0 0
X 1
X 0 0 1 0
X 1 0
X 1 0 0 0
X 0 1 0 0
X 1 0 1 0
X 0 1 1 0
XREST,

IGN, Fn
-, Fl;
-, FI6;
1, FI6;
0, FI6;
1, F2;
1, F5;
1, F8;
1, F12;
FI;

S2, X 1
XREST,

1,

F3;
FI;

S3, X 1
XREST,

1,

F4;
Fl;

S4, XREST,

All Lights Off

Left Tum Sequence

Fl;

RI       23,
RM       24,
RO       25,
LI       20,
LM       19,
LO       18;
*NODES
Q[1..4] = [1..4];

*RUN-CONTROL
PROG = JEDEC;
LIST = PLOT, EQUATIONS, PINOUT, FUSEPLOT;
*END


Appendix E. 8-Bit Adder Example
*Identification
8-Bit multi-stage adder - as detailed in 7C330 Servo control Application Note
Mark Aaldering
Cypress Semiconductor
*X-Names
CIN,C2,C5,A[0..7],B[0..7];
*Y-Names
A[0..7],C2,C5,CARRY;
*Local
C[0..1,3..4,6..7];
*Boolean-Equations
A[0..7].XRB = A[0..7];
A0 = B0 # CIN;
C0 = (A0 & B0) + (A0 & CIN) + (B0 & CIN);
A[1..7] = B[1..7] # C[0..6];
C[1..6] = (A[1..6] & B[1..6]) + (A[1..6] & C[0..5]) + (B[1..6] & C[0..5]);
CARRY = (A7&B7) + (A7&C6) + (B7&C6);
*Flip-Flops
D-FLIPFLOPS;
*PLD
Type = PLD7C330;
*Nodes
C2 = 1;
C5 = 3;

*Pins
OUTCLK    1,
INCLK     2,
ACLK      3,
CIN       4,
B[0..7]   [5..7,9..13],
A0        28,
A1        15,
A2        20,
A3        17,
A4        26,
A5        23,
A6        19,
A7        24,
CARRY     18;


*Special-Functions
A0.IC2 = Yes;
A1.IC2 = Yes;
A2.IC2 = Yes;
A3.IC2 = Yes;
A4.IC2 = Yes;
A5.IC2 = Yes;
A6.IC2 = Yes;
A7.IC2 = Yes;

A0.FBK = Pin;
A1.FBK = Pin;
A2.FBK = Pin;
A3.FBK = Pin;
A4.FBK = Pin;
A5.FBK = Pin;
A6.FBK = Pin;
A7.FBK = Pin;


*RUN-CONTROL
PROG = JEDEC;
LIST = PLOT, EQUATIONS, PINOUT, FUSEPLOT;
*END


State Machine Design Considerations
and Methodologies
The use of state machines provides a systematic
way to design complex sequential logic circuits, an increasingly popular approach since the advent of PLD
(Programmable Logic Device) circuitry. This application note describes the many options encountered
during the state machine design cycle. By exhaustively
walking through the PLD-based design example
presented here, you can weigh the merits of several
design approaches.

Definitions of Commonly Used Terms
1. External input vector - External signals (stimulus) applied to the state machine.
2. System outputs - Signals generated by the state
machine that are explicitly designed for availability to
the external system (hardware outside of the state
machine). Registered system outputs can also be fed
back into the state machine as part of the state vector,
which is then used in the decode of the state machine's
next state.
3. State registers - Registers used exclusively for determining the next state of the machine (feedback).
4. State outputs - Outputs of the state registers that are
available to the external system. (They are typically
available to the external machine for debug or due to
the lack of buried registers.)
5. State vector or machine state - The registered feedback information defining the present state of the
machine and required to determine the next state of the
machine.
6. State path - The transitional condition that must be
met for the state machine to progress from one state to
another. The state path typically consists of one or more
product terms generated from external inputs, although
other state paths are possible.
7. Total input vector - The combination of the external
input vector and the state vector. The total input vector
is decoded to generate the next state of the machine.

State Machine Entry Methods
There are many ways of describing a state
machine, each with distinct advantages and disadvantages. Three popular description methods are state
diagrams, state tables, and high-level languages (HLLs).
The state diagram provides an easily observable flow
description of the state machine. Because the ability to
view the flow of states provides distinct documentation
advantages, state diagrams will be used throughout this
application note to describe the example state machine.
Upon completing a state diagram, you can easily
convert the diagram's visual information into the other
types of state machine description or directly into
Boolean equations. Several available software programs
accept their own forms of state table, HLL, and/or
Boolean entry. You can enter all these formats easily via
your favorite text editor. The software then translates
the inputs into suitable forms (usually a JEDEC map)
for hardware implementation.
Another method of describing a state machine, the
state table, offers perhaps the most concise description.
Its major advantage over the other entry methods is the
availability of state table reduction methods (see Reference 1). When applied to your state table definition, a
reduction program generates a minimal model for the
function. The software used for state machine synthesis
throughout this application note uses the state table
method of entry. The program is called LOG/iC from
ISDATA Corporation.
Finally, high level language (HLL) state machine
entry is probably the most popular form of state
machine design. HLLs typically offer C-language-like
instructions (e.g., case, if-then-else, etc.) to describe the
machine.

An Example State Machine
The example state machine is a clock generator for
a pipelined (three system execution stages), bit-slicebased, central processing unit (CPU). Each of the three
system execution stages contains two clocks for a total
of six system clocks for every instruction execution.
With pipelining enabled, each instruction takes an
average of two clock periods. Further, external
hardware unaffected by CPU wait and stop states (e.g.,
cache memory) needs both polarities of an additional
free-running clock.
To minimize clock edge skew, the state machine
provides both versions of the clock. To put the timing of
this application into perspective, executing each
pipeline stage in an 80-ns period (or 12.5 MHz) requires the state machine to run at 25 MHz. This speed
is well within the range of the available PALs, EPLDs
and PROMs that can be used to implement the state
machine.
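
The timing figures in the paragraph above can be checked with a few lines of arithmetic. The sketch below assumes that the factor of two between the stage rate and the state clock comes from the A and B halves of each execution stage; the function names are illustrative only.

```python
PHASES_PER_STAGE = 2  # assumption: each stage is split into an A half and a B half


def stage_rate_mhz(stage_period_ns: float) -> float:
    """Rate at which pipeline stages complete, in MHz."""
    return 1e3 / stage_period_ns


def state_clock_mhz(stage_period_ns: float, phases: int = PHASES_PER_STAGE) -> float:
    """State clock frequency needed to generate every phase of a stage."""
    state_clock_period_ns = stage_period_ns / phases
    return 1e3 / state_clock_period_ns


if __name__ == "__main__":
    print(stage_rate_mhz(80.0))   # stage rate for an 80-ns stage
    print(state_clock_mhz(80.0))  # corresponding state clock
```

With an 80-ns stage this reproduces the 12.5-MHz stage rate and the 25-MHz state clock quoted in the text.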
Each of the pipeline's three execution stages has a
specific function. Briefly, the first stage of the pipeline
accesses the Writable Control Store (WCS) RAM. The
Arithmetic Logic Unit (ALU) execution occurs during
the second stage of the pipeline. Finally, the third
pipeline stage clocks status and memory address
registers. The function(s) performed during each of the
three stages are described in greater detail in the "State
Machine Output Definition" section of this application
note.
If this design only generates a simple set of
pipelined clocks, why not use shift registers and miscellaneous glue logic instead of a state machine? There are
two reasons to consider a state machine. First, it is
usually desirable to minimize the number of chips required; the state machine in PLD form might need external glue logic, but significantly less than the shift
register solution.
The second reason for considering a state machine
is that this application requires more than just a simple
set of pipeline clocks. The function of the clock signals
is to provide control of the CPU in multiple modes of
operation. The desired modes of operation are as follows:

PIPELINED RUN Mode
In this mode, the CPU simultaneously performs
the instructions in all three stages of the pipeline. For
example, while instruction n does an ALU operation,
instruction n+1 accesses WCS, and instruction n-1
clocks ALU status.
NONPIPELINED RUN Mode
NONPIPELINED RUN mode performs all three
stages of instruction execution without overlap. The
time to complete one nonpipelined instruction equals
the average of three pipelined instructions.
CPU STOP
The system must have a way to perform an orderly
stop of CPU execution from both of the above run
modes. This stop might be the result of several possible
conditions, including a utility stop from a system control
unit, a single step, a breakpoint, or a response to external hardware (e.g., a logic analyzer). The free-running
clocks continue to run during the CPU STOP mode and
remain running at all times, except during a reset condition.
CPU WAIT
In CPU WAIT mode, an external condition causes
a delay in an instruction's execution. The instruction
pauses until the external condition is removed. One application for the CPU WAIT mode is to handle a cache
miss. When a cache miss occurs, the CPU remains in
the CPU WAIT mode until the cache completes its
memory transfer.
SINGLE STEP
The ability to execute one instruction at a time is
needed to debug the CPU. You can easily implement
SINGLE STEP external to the clock state machine by
pulsing the RUN signal. SINGLE STEP mode is
described further in the State Machine Input Definition
section of this application note.
INTERRUPT
A variety of system conditions can interrupt the
CPU out of its normal execution sequence and immediately start the execution of the interrupt handler. The
influence of the INTERRUPT mode on the system
clocks will be discussed in greater detail later in this
application note.
REPEAT INSTRUCTION
The REPEAT INSTRUCTION mode is a CPU
debug feature. It is a good idea to implement this mode
external to the clock state machine. By disabling the
clock to the instruction register and the interrupt line to
the clock state machine, the CPU continually executes
the instruction in the instruction register.

Synchronous vs. Asynchronous Machine
At this point in the state machine design, an appropriate type of state machine must be chosen to
match the application. Two major types are the
asynchronous and the synchronous implementations.
The asynchronous machine changes state when one or
more of its inputs changes from a previously stable
input state. After a state change, the outputs of the state
machine settle, while the machine stabilizes once again.
A basic example of an asynchronous state machine
would be a simple SR latch built from two NAND gates
(Figure 1). For the clocking application considered in
this application note, the asynchronous state machine
implementation would be a poor choice, due to the instability of the system outputs.

Figure 1. SR Latch, Asynchronous State Machine Example

The synchronous state machine offers a better
choice. A synchronous state machine block diagram appears in Figure 2. Generally, a synchronous state
machine samples the total input vector at specific
periods to determine the machine's next state. When
designing synchronous state machines, it is important to
avoid state register metastability. External inputs to the
machine must be synchronized to guarantee stable state
register inputs, and the feedback time plus data setup
time to the state register clock must be less than or
equal to the state clock period.
The modern theory of synchronous state machines
was pioneered by Mealy and Moore (see Reference 1).
Mealy and Moore machines differ slightly from each
other in the way they control the system outputs.
During a specific machine state, a Mealy machine allows the input conditions to alter the system outputs
(the outputs depend on the "total" input state). In contrast, a Moore machine's system outputs depend only on
the present machine state. Thus, the system outputs
remain stable until the next time period, when the
Moore machine samples the total input vector to determine the next state. If all design conditions are met (external inputs are stable prior to the next state clock),
the Moore machine provides glitch-free system outputs, a desirable characteristic for the CPU system
clock. The design described here is therefore implemented as a Moore machine.

Clock Generator Output Definition
As explained earlier, each of the three system execution stages contains two clocks for a total of six system clocks for every instruction execution. The naming
convention for these clocks is
CLK_xy
where x = 1, 2, or 3, representing the first, second, or
third stage of the instruction execution
and y = A or B, representing the first or second half of
the execution stage.
Following this convention, the state machine's two
free-running clocks are named CLK_A and CLK_B.
These clocks run at half the state clock frequency and
180 degrees out of phase. The free-running clocks occur
at the same time as their respective CLK_xA and
CLK_xB clocks.
The major clock functions for this application are:
CLK_1B: The leading edge of this clock updates the instruction register.
CLK_2A: This clock's leading edge marks the start of
ALU execution. The information on the ALU input bus
clocks into the appropriate input registers at this time.
The instruction cycle is considered recoverable up
through and including CLK_2A (i.e., the status of the
machine from the previous instruction has not been
altered).

Figure 2. Synchronous State Machine Block Diagram

CLK_2B: Used to control the second half of the ALU
execution stage, this clock initiates a write to RAM,
triggers counters, gates ALU output into its latch, and
clocks the ALU output information into any of the distributed destination registers.
CLK_3A: On this clock the memory address register
can be updated. The ALU output bus status and ALU
status are also clocked into the CPU status register.

Clock Generator Inputs
A set of inputs (external stimulus to the state
machine) controls the state machine. The clock state
machine described here has eight external inputs, including the state machine clock. These inputs are:
STATECLK: The state machine clock.
RESET: An asynchronous or synchronous reset
input that can be connected directly to the state
registers' preset or clear or to all clocked register inputs
(D or T input). If connected to the preset or clear,
RESET need not be synchronized. In this case, RESET
forces the state machine into the machine's initial state,
regardless of the present state. RESET can result from
any combination of the following sources:
1. Power up circuit (system reset)
2. System controller software decodes system reset
3. System controller software decodes module reset
4. CPU software decodes module reset
RUN: This signal controls the start and stop sequence of the CPU clocks. In PIPELINE RUN mode,
the start sequence generates the proper clock progression to fill up the pipeline registers, and the stop sequence empties the pipeline. RUN is externally manipulated to implement the single step and breakpoint functions.
NPL: Used to select NONPIPELINED RUN vs.
PIPELINED RUN modes, this signal must be set to the
selected mode prior to activating the RUN signal. Setting NPL = 1 selects NONPIPELINED RUN mode,
and NPL = 0 selects PIPELINED RUN mode. The
single step function operates properly in NONPIPELINED RUN mode only.
INTR: This signal indicates an external interrupt.
When INTR is received, and IEN (interrupt enable,
described below) is active, the CPU executes its interrupt handler. An interrupt inhibits the instruction
register update clock (CLK_1B) and the ALU update
clock (CLK_2B). CLK_1A for the interrupt instruction
executes on the next cycle. The interrupt condition has
priority over a wait condition and therefore starts
generating clocks to permit execution of the interrupt
instructions.
IEN: This interrupt enable signal qualifies INTR.
IEN is likely to be a bit in the instruction word, allowing
the user to define sections of uninterruptible code.
WAIT: The wait condition is initiated when both
WAIT and WEN (wait enable, described below) are active. The CPU remains in the wait condition until
WAIT goes inactive.

WEN: This wait enable signal qualifies WAIT for
entrance into the wait condition. Like IEN, WEN is
usually a bit in the instruction word, allowing the user to
define sections of wait-sensitive code.
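
The way the enable inputs qualify INTR and WAIT, and the priority of an interrupt over a wait, can be summarized in a short sketch. The function name and the returned labels are hypothetical and are not taken from the design; only the qualification and priority rules come from the input descriptions above.

```python
def qualified_condition(intr: bool, ien: bool, wait: bool, wen: bool) -> str:
    """Classify the request seen by the clock state machine.

    Returns one of the illustrative labels "interrupt", "wait", or "none".
    """
    if intr and ien:
        # INTR is honored only when IEN is active, and it outranks a wait
        return "interrupt"
    if wait and wen:
        # WAIT is honored only when WEN is active
        return "wait"
    return "none"


if __name__ == "__main__":
    print(qualified_condition(intr=True, ien=True, wait=True, wen=True))
    print(qualified_condition(intr=True, ien=False, wait=True, wen=True))
```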

State Machine Partitioning
When architecting a state machine, it is generally a
good practice to break up large machines into workable
blocks, with each of the smaller machines containing
states that require common inputs and generate common outputs. The example clock state machine is small
enough to be designed as a single state machine, although it would be trivial to design logic to generate the
free-running clocks as a separate machine from the rest
of the clock state machine. Equations for the free-running clocks are:
CLK_A := /RESET * /CLK_A
CLK_B := /RESET * CLK_A
where ":=" indicates a registered output and "/" indicates inversion.
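
A cycle-by-cycle simulation makes the behavior of these two registered equations easier to see. The sketch below reads them as CLK_A := /RESET * /CLK_A and CLK_B := /RESET * CLK_A, with "/" meaning NOT; the helper names and the one-cycle reset are assumptions made for illustration.

```python
def clock_step(clk_a: int, clk_b: int, reset: bool) -> tuple[int, int]:
    """Compute the registered outputs after one state clock edge."""
    next_a = int((not reset) and (not clk_a))  # CLK_A := /RESET * /CLK_A
    next_b = int((not reset) and bool(clk_a))  # CLK_B := /RESET * CLK_A
    return next_a, next_b


def simulate(cycles: int, reset_cycles: int = 1) -> list[tuple[int, int]]:
    """Return (CLK_A, CLK_B) after each state clock, starting in reset."""
    clk_a, clk_b = 0, 0
    trace = []
    for n in range(cycles):
        reset = n < reset_cycles
        clk_a, clk_b = clock_step(clk_a, clk_b, reset)
        trace.append((clk_a, clk_b))
    return trace


if __name__ == "__main__":
    for pair in simulate(6):
        print(pair)
```

In this model each output repeats every two state clocks, so both run at half the state clock frequency, and once reset is released they are always complementary.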
By examining these output equations, you can see
that the free-running clocks have only two dependencies
in common with the remaining portion of the clock state
machine, i.e., RESET and STATECLK. The free-running clocks are required as inputs to the other state
machine to synchronize the additional system outputs,
however.
The example presented here implements the free-running clocks and the other system outputs within the
same state definition. The resulting output equations
can be verified against the equations for the free-running clocks alone.
The Initial Machine State
Regardless of the preferred state machine entry
method, attacking the problem starts with defining the
initial state of the machine. This initial state (INIT in
the example) must be consistent with the power-on condition and/or an external input used to initialize the
machine (RESET).
The state of the machine can be decoded from the
present values of the system outputs, state registers, or a
combination of the two. (The advantages and disadvantages of the state definition options will be discussed
in greater detail later in this application note.) The initial machine state is generally, but not always, a decode
of all 0s or all 1s. In the example design, INIT is the
decode of all 0s.
Naming the States
With the exception of INIT, each state in the example design is named to indicate the active system
clocks occurring during that state. For example, during
state A, only CLK_A is active. Similarly, state 123B has
only CLK_1B, CLK_2B, CLK_3B, and CLK_B active.
Additionally, an "N" suffix designates a nonpipelined
state and a "W" suffix designates a wait condition state;
this convention differentiates between states with identical active system outputs.


CPU Inactive States

The RESET input causes the state machine to
enter the INIT state from any state in the machine.
From the INIT state, the machine unconditionally starts
to generate the free-running clocks. As shown in Figure
3, a line pointing from the INIT state to the A state,
with a path equation equal to 1, indicates an unconditional branch. The state machine progression continues
from the A state unconditionally into the B state. In the
B state a multi-branch condition exists. If the RUN
input remains inactive, then the A and B states continue
to toggle, generating only the free-running clocks.
Hence the INIT, A, and B states are referred to as
"CPU inactive states".
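
The transitions among the CPU inactive states can be written out as a small next-state function, in the spirit of the HLL entry style mentioned earlier. This is a sketch only: it covers INIT, A, and B, assumes that the branch out of B to 1A or 1AN is selected by RUN and NPL as the text describes, and treats any other state as outside the model.

```python
def next_state(state: str, run: bool, npl: bool, reset: bool = False) -> str:
    """Next state for the CPU inactive portion of the clock state machine."""
    if reset:
        return "INIT"          # RESET forces INIT from any state
    if state == "INIT":
        return "A"             # unconditional branch (path equation = 1)
    if state == "A":
        return "B"             # unconditional branch
    if state == "B":
        if not run:
            return "A"         # keep toggling A and B: free-running clocks only
        return "1AN" if npl else "1A"   # enter nonpipelined or pipelined states
    raise ValueError(f"state {state!r} is outside this partial model")


if __name__ == "__main__":
    state = "INIT"
    for run in (False, False, False, False, True, True):
        state = next_state(state, run=run, npl=False)
        print(state)
```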
Nonpipelined States
If the NPL input is active while the RUN input becomes active, the state machine operates in NONPIPELINED RUN mode and follows the model
portrayed in Figure 4.
Pipelined States
If the NPL input is inactive when the RUN input
goes active, thus indicating PIPELINED RUN mode,
the state machine operates as depicted in Figure 5.
Unique States
When the RUN input goes active, the next state
executed is either the 1A or the 1AN state, depending
upon the value of the NPL input (refer to Figures 4 and
5). Notice that the active system outputs in these two
states are identical. Why generate two identical states
when an additional state register might be required to
differentiate between the states? (This assumes you use
the system outputs to decode the machine's states.) The
redundant states are not a problem because the additional state register needed to differentiate between the
states is not an issue. There are two reasons for this.
First, if you eliminate the redundant states, the state
machine would require at least one additional state
register anyway to differentiate between the B and the
BW or BWN states, which would be needed without 1A
and 1AN. (Separation of states BW and BWN from
state B is required for correct functionality.) Second,
adding another state only increases the number of state
registers if the new total number of states exceeds an
additional binary boundary (2, 4, 8, 16, ...). This is not a
problem here.

Figure 3. CPU Inactive States

Figure 4. Non-Pipelined States

You might also choose to widen your state
machine (increase the number of state registers) to
reduce the number of product terms to the state or system output registers. This decision should take into account the desired circuit implementation (PLDs,
PROMS, discrete hardware, etc.) and is often an iterative process. In general, you can initially architect the
state machine in the manner that is the easiest for you
to understand, then make additional changes or small
adjustments later if they become necessary.

State Description Verification
Now that all the pieces of the state machine are
functionally defined (refer to Figure 6 for the completed
state diagram), consider methods for verifying the
validity of the design. Some software you can use to
describe and implement state machines would already
offer verification at this point in a design. For other
methods, read on!
One way to verify a state machine design is to
recognize a rule of thumb: Out of every state, there
should be a state path to another state for every possible combination of relevant external inputs. For example, there are two paths out of state 123B, with
INTR and IEN as the relevant external inputs:
Path 1 = INTR * IEN
Path 2 = /INTR + INTR * /IEN

Figure 5. Pipelined States

If there are no known restrictions on the external
inputs, a simple method of verifying the above rule of
thumb is to generate an equation where all of the paths
out of a state are ORed together as follows:
OUT_STATE_123B = Path 1 + Path 2;
OUT_STATE_123B = (INTR * IEN) + /INTR + (INTR * /IEN);
OUT_STATE_123B = 1
If the equation's terms equal 1 after Boolean
reduction, then every state path out of the state is accounted for. The main advantage to this verification
method is that you can easily do it using readily available Boolean reduction software.
If there are known restrictions to the external inputs, you can use this information to reduce the complexity of the machine. If it is impossible for the INTR *
/IEN condition to occur externally, for example, then
you can leave this condition out of the Path 2 equation.
In that case, the reduction of the OUT_STATE_123B
equation yields a non-1 result.
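
For a state with only a few relevant inputs, the same check can be done by brute force instead of Boolean reduction: evaluate the OR of all path equations for every input combination and list the combinations that no path covers. The sketch below uses the two paths out of state 123B, read as Path 1 = INTR * IEN and Path 2 = /INTR + INTR * /IEN; the helper name `uncovered_inputs` is hypothetical.

```python
from itertools import product


def path1(intr: bool, ien: bool) -> bool:
    return intr and ien                        # INTR * IEN


def path2(intr: bool, ien: bool) -> bool:
    return (not intr) or (intr and not ien)    # /INTR + INTR * /IEN


def path2_restricted(intr: bool, ien: bool) -> bool:
    return not intr                            # Path 2 with INTR * /IEN left out


def uncovered_inputs(paths, n_inputs: int) -> list[tuple[bool, ...]]:
    """Return every input combination for which no path equation is true."""
    return [
        bits
        for bits in product([False, True], repeat=n_inputs)
        if not any(path(*bits) for path in paths)
    ]


if __name__ == "__main__":
    print(uncovered_inputs([path1, path2], 2))             # complete set of paths
    print(uncovered_inputs([path1, path2_restricted], 2))  # restricted Path 2
```

An empty list corresponds to the equation reducing to 1. With the restricted Path 2, the only uncovered combination is INTR active with IEN inactive, which is the non-1 result the text refers to.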
Because the method of verification just described
does not detect redundant path equations, it is useful to
revise the original rule of thumb to: Out of every state,
there should be one and only one state path to another
state for every possible combination of relevant external
inputs.
This revised condition is not as easily verified as
the original statement. The easiest way to verify the
more restrictive case is to simulate the state machine.

To do this, you must generate a test vector for every
possible external input that is relevant to each state
simulated. Automatic test vector generation programs
are available that produce every possible combination.
After running the vectors against the design, you must
visually inspect the output to verify that the machine
never enters an illegal state.
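
Before resorting to full simulation, the "one and only one" condition can at least be screened per state by counting how many path equations are true for each input combination. The sketch below is an illustration only: `path_overlapping` is a deliberately redundant, made-up path used to show what a violation looks like, and the function names are hypothetical.

```python
from itertools import product


def path1(intr: bool, ien: bool) -> bool:
    return intr and ien                        # INTR * IEN


def path2(intr: bool, ien: bool) -> bool:
    return (not intr) or (intr and not ien)    # /INTR + INTR * /IEN


def path_overlapping(intr: bool, ien: bool) -> bool:
    return intr                                # hypothetical redundant path


def violations(paths, n_inputs: int) -> dict[tuple[bool, ...], int]:
    """Map each input combination to its path count when that count is not 1."""
    result = {}
    for bits in product([False, True], repeat=n_inputs):
        count = sum(1 for path in paths if path(*bits))
        if count != 1:
            result[bits] = count
    return result


if __name__ == "__main__":
    print(violations([path1, path2], 2))
    print(violations([path1, path2, path_overlapping], 2))
```

A count of 0 marks a missing path and a count above 1 marks redundant or conflicting paths. This screen does not replace simulation, since it looks at one state at a time.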

System and State Register Output Generation
The model defining the clock state machine is
complete, but there are still quite a few important
decisions to be made regarding the final circuit implementation. Some of the major alternatives for final
implementation are:
System output vs. exclusive state register state decode
D flip-flop vs. T flip-flop implementation
PLD vs. PROM implementation

Figure 6. CPU Clock State Machine

To· gain some insight into these choices, consider
how the output or feedback equations are assembled.
Take, for example, the generation of CLK_3A using a
D flip-flop (FF) implementation. By referring to Figure
6, you can find all the states in which CLK 3A is active.
These are 123A, 3A, and 3AN. The CLK.=-3A output is
generated by ORing the state decodes that, when
ANDed with their respective state paths, advance the
state machine into the three states listed above. Specifically:
eLK 3A :=
- (Decode of 12B)*(/INTR+INTR*/IEN) ;-123A
;-123A
+(Decode of BW) *(/WAIT)
+(Decode of 23B)*(1)
;-3A
+(Decode of 2BN)*(/INTR+INTR*I1EN);-3AN
When you define the state decodes, the CLK_3A
equations are completely specified in terms of the state
machine inputs (state path), state registers, and/or system outputs (state decode). Typically, you then multiply
the equation out to form a sum of products. This format
provides for easy implementation in a PLD, which has a
sum of products architecture, and also provides a useful
foundation for further equation reduction.
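
As a rough illustration of the multiplying-out step, the sketch below expands the CLK_3A equation into individual product terms. The data layout (tuples of literal names, with "/" marking a complemented input) and the function name `multiply_out` are assumptions made for this example; the state decodes are left symbolic because they depend on the state assignment.

```python
# Each entry: (state name, list of product terms in its state path).
# A product term is a tuple of literals; "/" marks a complemented input.
CLK_3A_TERMS = [
    ("12B", [("/INTR",), ("INTR", "/IEN")]),   # advances to 123A
    ("BW",  [("/WAIT",)]),                     # advances to 123A
    ("23B", [()]),                             # state path of 1, advances to 3A
    ("2BN", [("/INTR",), ("INTR", "/IEN")]),   # advances to 3AN
]

def multiply_out(terms):
    """Distribute each state decode over the OR terms of its state path."""
    products = []
    for state, path in terms:
        for literals in path:
            products.append(("Decode(%s)" % state,) + literals)
    return products

sop = multiply_out(CLK_3A_TERMS)
for term in sop:
    print(" * ".join(term))
```

Before any Boolean reduction, this form of the equation has six product terms; substituting actual state codes for the symbolic decodes is what makes further reduction possible.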

decodes in the state registers can be selected to assist in
Boolean reduction, proper state assignment enables the
more complex equations to fit into a specific implementation.
This type of decode is useful in a PLD implementation, where there is a shortage of product terms for a
specific state flip-flop, but extra flip-flops are available.
Adding an extra state register can simplify the decode
logic enough to fit the design in a single PLD.
The total number of exclusive state registers required to implement a state machine varies from a minimum of log2(X) (rounded up to the nearest integer)
to a maximum of X, where X is the total number of
states in the machine. You can iteratively change this
number, along with the state assignment, to obtain a
suitable solution.
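
The two bounds are easy to compute. The following sketch (the function name is an assumption for illustration) returns the minimum and maximum number of exclusive state registers for a given number of states, using the 22 states defined in the Appendix A listing as the example.

```python
def state_register_bounds(num_states):
    """Return (minimum, maximum) exclusive state registers for num_states states."""
    if num_states < 1:
        raise ValueError("a state machine needs at least one state")
    # (n - 1).bit_length() equals log2(n) rounded up to the nearest integer.
    minimum = (num_states - 1).bit_length()
    maximum = num_states          # one register per state
    return minimum, maximum

print(state_register_bounds(22))
```

Any register count between the two bounds is a candidate; the fewer the registers, the more decoding logic each next-state equation tends to need.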
The state assignment itself is a non-trivial issue,
with almost limitless possibilities and no known method
of obtaining the optimal solution. There are, however,
some guidelines that can be used to obtain workable
solutions:
1. Two or more states that potentially enter the same
state with identical path equations should be adjacent
(their binary codes differ in exactly one position). As an
example, refer to Figure 5. States 12B and 123B both
proceed into state 1A if the path condition INTR * IEN
is true. When generating the CLK_1A equation, two of
the terms of the equation look like this:
CLK_1A :=
  (Decode of 12B) * (INTR * IEN)     ;->1A
+ (Decode of 123B) * (INTR * IEN)    ;->1A

State Decode

If the decodes of 12B and 123B differ in exactly one
position, then Boolean reduction (which uses the A*B
+ /A*B = B relationship) converts the two product
terms into one smaller product term.
2. Two or more states that might proceed into different
states with identical path equations, and an identical active output, should be adjacent. This situation occurs in
the previous CLK_3A equation, shown again here:
CLK_3A :=
  (Decode of 12B)*(/INTR+INTR*/IEN)    ;->123A
+ (Decode of BW)*(/WAIT)               ;->123A
+ (Decode of 23B)*(1)                  ;->3A
+ (Decode of 2BN)*(/INTR+INTR*/IEN)    ;->3AN
Note that if states 12B and 2BN are adjacent, then
you can reduce the CLK_3A equation to three product
terms.
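
The effect of adjacency in both guidelines can be demonstrated with a small sketch. Product terms are written as strings over '0', '1', and '-' (don't care); the 3-bit state codes used in the example are hypothetical and are not the assignment used in Appendix A.

```python
def merge_adjacent(term_a, term_b):
    """Apply A*B + /A*B = B to two product terms.

    Returns the merged term, or None when the terms do not differ in
    exactly one fully specified position.
    """
    if len(term_a) != len(term_b):
        raise ValueError("terms must cover the same variables")
    differing = [i for i, (a, b) in enumerate(zip(term_a, term_b)) if a != b]
    if len(differing) != 1:
        return None
    i = differing[0]
    if "-" in (term_a[i], term_b[i]):
        return None
    return term_a[:i] + "-" + term_a[i + 1:]

# Hypothetical codes: 12B = 010, 123B = 011, followed by INTR = 1, IEN = 1.
print(merge_adjacent("01011", "01111"))   # adjacent codes merge into one term
print(merge_adjacent("01011", "10111"))   # non-adjacent codes do not merge
```

With adjacent codes the two CLK_1A terms collapse into a single term that no longer depends on the differing state bit; with non-adjacent codes both terms remain.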

As discussed earlier, the next state of the machine
can be decoded from the present values of the system
outputs, the state registers, or a combination of the two.
The choice typically comes down to weighing the maximum number of product terms versus the maximum
number of flip-flops available in an implementation. For
a Moore machine, with registered system outputs, using
the system outputs to uniquely define the states uses the
smallest number of flip-flops to define the state
machine. However, it is often necessary to add one or
more state registers to uniquely define the states.
State assignment for this state decoding method is
quite simple, but also rigidly defined, allowing limited
flexibility when assigning the additional state registers.
After reduction, the feedback and output equations of
this "narrow" state machine might contain too many
product terms to be implemented in a specific PLD, although product term complexity is never a problem with
a PROM implementation.

Clock Generator Implementation
As mentioned earlier, there are many ways to implement state machines. The following sections discuss
some of the pros and cons associated with some of the
more common state machine implementations.

Exclusive State Registers

Another consideration in state machine design is
that you might be able to distribute the number of
product terms more evenly among the equations implementing the state machine by using state registers exclusively to decode the states. Because the state

D Flip-Flop Implementation

There are more products available that support a
D flip-flop solution than any other implementation.
Table 1. Optimized Results for Clock Generator:
T Flip-Flop Implementation

LOG/IC OPTIMIZATION SUMMARY (FACT)
CPU TIME QUOTA PER FUNCTION: 100 SEC

FUNCTION   INV   PTERMS   CPUTIME   FLAGS
CLK_1A.T   NO    6        <1
           YES   7        1
CLK_1B.T   NO    4        1
           YES   3        1
CLK_2A.T   NO    5        1
           YES   4        <1
CLK_2B.T   NO    4        1
           YES   3        <1
CLK_3A.T   NO    5        <1
           YES   6        2
CLK_3B.T   NO    4        <1
           YES   2        <1
CLK_A.T    NO                       C
           YES                      C
CLK_B.T    NO    2        1
           YES   1        <1
QQ1.T      NO    3        <1
           YES   5        1
QQ2.T      NO    6        <1
           YES   11       2

C: Constant Function
FACT MINIMIZATION: 11 SEC

Table 2. Non-optimized Results for Clock Generator:
D Flip-Flop Implementation

LOG/IC OPTIMIZATION SUMMARY (FACT)
CPU TIME QUOTA PER FUNCTION: 100 SEC

FUNCTION   INV   PTERMS   CPUTIME   FLAGS
CLK_1A.D   NO    12       <1        N
           YES   27       <1        N
CLK_1B.D   NO    5        <1        N
           YES   34       1         N
CLK_2A.D   NO    8        <1        N
           YES   31       <1        N
CLK_2B.D   NO    7        <1        N
           YES   32       <1        N
CLK_3A.D   NO    8        <1        N
           YES   31       <1        N
CLK_3B.D   NO    6        <1        N
           YES   33       <1        N
CLK_A.D    NO                       NT
           YES                      NT
QQ1.D      NO    6        <1        N
           YES   5        <1        N
QQ2.D      NO    10       <1        N
           YES   9        <1        N

N: No Optimization
T: Trivial Function
FACT MINIMIZATION: 2 SEC

Therefore, it is usually the most cost-effective solution
for a state machine.
Table 2 lists the number of product terms per output obtained by compiling the clock generator state
machine definition with the LOG/iC software, using D
flip-flops. The compiler input file appears in Appendix
A. Optimizing the design (Table 3) significantly reduces
the number of product terms needed.

T Flip-Flop Implementation
Even though D flip-flop solutions are more widely
available, there are times when the logic needed for this
implementation is prohibitively complex. Under these
circumstances, a T flip-flop implementation might be
more cost effective, because using T flip-flops reduces
the logic significantly.
The best example of this situation is a simple
synchronous binary counter. While the most significant
bit (MSB) of an N-bit counter in a D flip-flop implementation requires N product terms, the T flip-flop
solution requires only one product term. Note that the
Cypress family of CY7C33x devices offers you a configurable T or D type implementation if you place an
XOR gate prior to the D flip-flop; route the AND/OR
array to one of the XOR's inputs and the flip-flop's Q
output (via an additional product term) to the other
XOR input.
It isn't clear from simple observation, however,
whether the T flip-flop implementation is beneficial for
the clock generator state machine. One way to clarify
this question is to change three command lines in the
state machine description shown in Appendix A and
recompile to produce a T flip-flop implementation.
Table 1 contains the product term results using T flip-flops. A quick study of the results reveals that the optimized version using D flip-flops (Table 3) requires
fewer product terms than the T flip-flop version.
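
The binary counter claim can be checked by brute force. The sketch below builds an N-term sum of products for the D input of the MSB and a one-term expression for the T input, then compares both against the actual counting sequence. The representation of a product term as a dict of required bit values is an assumption made for this example, and the check confirms that these covers are correct rather than proving that no smaller cover exists.

```python
def d_cover_for_msb(n):
    """Product terms (bit index -> required value) for the MSB's D input."""
    msb = n - 1
    # Hold the MSB at 1 while any lower bit is still 0 ...
    terms = [{msb: 1, low: 0} for low in range(msb)]
    # ... and set it when it is 0 and every lower bit is 1.
    carry = {low: 1 for low in range(msb)}
    carry[msb] = 0
    terms.append(carry)
    return terms

def t_cover_for_msb(n):
    """The MSB toggles exactly when every lower bit is 1: one product term."""
    return [{low: 1 for low in range(n - 1)}]

def evaluate(terms, bits):
    return int(any(all(bits[i] == v for i, v in term.items()) for term in terms))

def check(n):
    """Verify both covers against an n-bit up counter; return their sizes."""
    for count in range(2 ** n):
        bits = [(count >> i) & 1 for i in range(n)]
        next_msb = (((count + 1) % 2 ** n) >> (n - 1)) & 1
        assert evaluate(d_cover_for_msb(n), bits) == next_msb
        assert evaluate(t_cover_for_msb(n), bits) == int(next_msb != bits[n - 1])
    return len(d_cover_for_msb(n)), len(t_cover_for_msb(n))

print(check(4))
```

For a 4-bit counter the D cover has four product terms and the T cover has one, which matches the N versus one comparison in the text.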

PLD Implementation

With the LOG/iC PLD Database option, the
software assists in selecting a PLD, and it shows that
the non-optimized version of the clock state machine
fits in a PALC22V10 without further reduction. If the
equations are reduced using Boolean reduction, however, a lower-cost solution is available. The results
shown in Table 3 indicate that the less expensive
PALC20G10 would work. Appendix A shows the listing
for the 20G10 LOG/iC implementation. Waveforms for
the completed design appear in Appendix B. You can
verify the CLK_A and CLK_B equation results against
the equations generated in the State Machine Partitioning section of this application note.

Table 3. Optimized Results for Clock Generator:
D Flip-Flop Implementation

LOG/IC OPTIMIZATION SUMMARY (FACT)
CPU TIME QUOTA PER FUNCTION: 100 SEC

FUNCTION   INV   PTERMS   CPUTIME   FLAGS
CLK_1A.D   NO    6        1
           YES   11       2
CLK_1B.D   NO    3        1
           YES   4        <1
CLK_2A.D   NO    4        1
           YES   7        <1
CLK_2B.D   NO    3        1
           YES   4        <1
CLK_3A.D   NO    4        1
           YES   9        1
CLK_3B.D   NO    3        <1
           YES   3        1
CLK_A.D    NO    1        <1
           YES   2        <1
CLK_B.D    NO    1        1
           YES   2        <1
QQ1.D      NO    3        <1
           YES   3        1
QQ2.D      NO    6        16
           YES   6        2

FACT MINIMIZATION: 29 SEC

PROM Implementation

You can obtain very high speed solutions by implementing state machines using PROMs. A PROM
uses a look-up table to decode the machine's next state,
as opposed to the AND/OR array in a PLD. The main
advantage of using a look-up table to decode the next
state is that every combination of the inputs can be
decoded. Thus, you can create an extremely complex
machine, without equation reductions.
The look-up table's drawback is that the PROM's
depth grows exponentially (2^N, where N = number of inputs
to the look-up table) with every additional input to the
look-up table. To determine the depth required, notice
that the present total input vector provides the inputs to
the look-up table. The clock generator state machine
has seven external inputs, six system outputs, and two
state outputs, which indicates a feasible implementation
using the CY7C277 (32K x 8) registered PROM.
Using a registered PROM such as the CY7C277 to
implement the machine also helps to reduce the parts
count, because the PROM implements both the state
and system output registers. LOG/iC offers support for
implementing state machines in PROMs, and only a few
minor changes to the state machine description shown
in Appendix A are required. *PROM replaces the *PAL
command, some simple statements indicating the
CY7C277 architecture (INPUTS = 15 AND OUTPUTS = 8) replace the TYPE = statement, and
the program format is set with PROGFORMAT = INTEL-HEX.
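
The PROM sizing arithmetic from the PROM Implementation discussion can be written out as a short sketch. The function name and argument names are assumptions for illustration; the input and output counts are the ones quoted in the text.

```python
def prom_depth(external_inputs, fed_back_outputs):
    """Return (address bits, number of words) for a look-up table state machine."""
    address_bits = external_inputs + fed_back_outputs
    return address_bits, 2 ** address_bits

# Seven external inputs, plus six system outputs and two state outputs fed back.
bits, depth = prom_depth(external_inputs=7, fed_back_outputs=6 + 2)
print(bits, depth, depth // 1024)
```

Fifteen address inputs give 32,768 words, which is the 32K depth of the CY7C277; each additional input would double the required depth.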

CY7C361 Implementation
A new way to obtain high-speed operation of state
machines became available with Cypress
Semiconductor's development of a revolutionary architecture that enables a CMOS PLD state machine
part to operate at speeds in the 125-MHz range. The
first part in this family is the CY7C361. The architectural innovations used to obtain 125-MHz operation require that you approach state machine design slightly
differently than you would when designing with traditional PLD architectures.
To fully understand the information in this section,
consult the Cypress Semiconductor application note,
"Understanding the CY7C361."
Using the clock generator state machine example,
this section shows how you can generate a state diagram
for the CY7C361 by following some simple rules. This
diagram allows you to determine whether the design
can fit in a CY7C361. The rule of thumb is that a state
diagram with 32 or fewer state nodes will probably fit.
(The likelihood of the implementation at that point
depends totally upon split-input-array fitting issues.)
You can convert the state diagram directly into Boolean
equations (with no Boolean reduction required) and
compile the equations into JEDEC code for the final
implementation.

The CY7C361's condition-decode array has been
optimized for use in state machine applications. As
shown in Figure 7, the CY7C361 condition decoder contains the necessary logic to generate two kinds of state
machine operations. The Entering a State operation
should look familiar. The process used to generate the
system output and state register equations (in the System and State Register Output Generation section of
this application note) utilizes a similar equation form.
There are two small differences, though.

Figure 7. The Condition Decoder - Optimized for Two
State Machine Operations
[Entering a state: (SA+SB+SC)*(a*/b). Leaving a state: (a+b+c)*SO.]

First, the Entering a State equation shown in Figure 7 assumes the present state conditions are available
as single entities on the input array. That is, one state
register uniquely defines each state, and therefore the
present state is not encoded using multiple flip-flops, as
is typical in traditional state machines. There is a special case, however, that allows you to encode the states
[...]
state register is required in the decode of the next state.
An example of this is a simple synchronous 32-bit binary counter using the TOGGLE (or T flip-flop) configuration.
The second design difference with the CY7C361 is
that the Entering a State equation also shows all states
(SA, SB, SC) that have an identical state path to SO.
This is not necessarily the case when designing with
traditional PLDs (as shown in Figure 6). The CY7C361,
however, requires a machine definition in which all state
paths into any given state are identical. You can easily
convert an existing state diagram and satisfy the new
condition by simply adding additional states for those
states that do not meet the above condition.
To remain consistent with the naming conventions
already defined for the clock generator example, two
additional suffixes, "X" and "Y", indicate the additional
states. For example, state BW in Figure 6 has two state
paths entering into it: WAIT * WEN from state 123A
and a path from state AW. To meet the design conditions for the CY7C361, you add an additional state,
BWX, such that state AW enters BWX with a state
path of 1, and state 123A enters state BW with a state
path of WAIT * WEN.
The CY7C361 implementation of the clock generator state machine appears in Figure 8. Note that both
of the new states (BW and BWX) have exits with the
same state path equation. Thus, the number of states in
the state machine does not grow geometrically due to
this new methodology.
In addition to the normal Entering a State equation, the CY7C361 supports operations in which multiple state paths go from one state to another, and each
state path term contains only one input. Figure 9 shows
a diagram of this condition.
Another operation, called Leaving a State, proves
especially useful in conjunction with the (Wait Until)
TERMINATE state macrocell configuration in the
CY7C361. The flip-flop in this macrocell configuration
is unconditionally set by the active previous state macrocell. The flip-flop remains set until the condition
decoder equation (a Leaving a State equation) for the
TERMINATE macrocell goes active. Figure 10 shows
how the TERMINATE configuration looks within a
state diagram.
When implementing the clock generator state
machine in the CY7C361 using the conversion techniques discussed above, the number of states slightly exceeds 32. But by allowing the machine's pipelined and
nonpipelined portions to share common states (1A, 1B,
and 3B), the total number of states reduces to fewer than
32.
Note that you can use this same kind of state
reduction for the original implementation (refer to the
Unique States section). Figure 8 shows the resulting
state diagram.
It is a simple matter to convert information from
the state diagram to PLD ToolKit Equations (refer to
Appendix C for the PLD ToolKit source file). You must
generate an Entering a State equation for every state
node in the diagram. (The TERMINATE configuration
was not used in this example, but it can be useful for
implementing wait states.)
You generate the equations in Appendix C using
the  and  connectives for
the AND and NAND terms, respectively. Then
generate the system outputs by ORing the appropriate
states in the OR-based output array. For example, the
CLK_1B output is active during the 1B, 12B, 123B, or
123BX states. The PLD ToolKit connective for the OR
array is . The CY7C361 implementation
of the clock generator state machine was simulated
using the PLD ToolKit (see Appendix D).
Reference
1. Donald D. Givone, Introduction to Switching Circuit Theory (New York: McGraw-Hill, Inc., 1970)


Figure 8. CY7C361 Implementation, CPU Clock State Machine

Figure 9. Entering a State Along Multiple Paths
[Entering a State equation shown in the figure: (SA) (a+/b+c)]

Figure 10. Leaving a State (TERMINATE
Configuration)

Appendix A. LOG/iC PLD Source Code: Clock State Machine

LOG/iC-PAL

Rel 3.212-2328-1721100034 # 32-5955 90/03/15 23:49:45

LOG/iC - COPYRIGHT (C) 1985,1988 BY ISDATA GMBH, 7500 KARLSRUHE WEST-GERMANY
Cypress Semiconductor

LICENCE FOR IBM-PC/XT/AT

Data Set: OD20G10.DCB

  1: *IDENTIFICATION
  2: PIPELINED CLOCKING SYSTEM OD20G10 3/7/90
  3: ERIC B. ROSS
  4: CYPRESS SEMICONDUCTOR
  5: NAMING CONVENTION
  6: OD    = SYSTEM OUTPUTS ARE DFLOPS AND ARE USED FOR STATE DEF
  7: 20G10 = PALC20G10 IMPLEMENTATION
  8: *PAL
  9: TYPE= PALC20G10
 10:
 11: *X-NAMES
 12: ;----------------------------------------------------------------------
 13: ;INPUT DEFINITIONS:
 14: ; RUN  = START & STOP EXECUTION OF OUTPUT CLOCKS (NORMAL, SINGLE
 15: ;        STEP, & BREAK PT. EXECUTION
 16: ; NPL  = PIPELINED VS NON-PIPELINED MODE OF EXECUTION
 17: ; INTR = EXTERNAL INTERRUPT CONDITION (TLB MISS, PARITY ERROR, ... )
 18: ; IEN  = INTERRUPT ENABLE
 19: ; WAIT = WAIT ENABLE (CACHE MISS)
 20: ; WEN  = WAIT ENABLE
 21: ;----------------------------------------------------------------------
 22: ;
 23: RUN, NPL, INTR, IEN, WAIT, WEN, RESET;
 24:
 25: *Z-NAMES
 26: ;----------------------------------------------------------------------
 27: ;OUTPUT DEFINITIONS:
 28: ;
 29: ; 3 CLOCK STAGES 1, 2, 3
 30: ; 2 CLOCKS PER STATE A, B
 31: ; CLK_XX WHERE XX = 1A,1B,2A,2B,3A,3B
 32: ;
 33: ; 2 FREE RUNNING CLOCKS
 34: ; CLK_A, CLK_B
 35: ;
 36: ; ADDITIONAL REGISTERS FOR STATE DEFINITION
 37: ; QQ1, QQ2
 38: ;----------------------------------------------------------------------
 39: ;
 40: CLK_1A, CLK_1B, CLK_2A, CLK_2B, CLK_3A, CLK_3B, CLK_A, CLK_B, QQ1, QQ2;
 41:
 42: *Z-VALUES
 43:


 44: ;                           ADDITIONAL OUTPUTS
 45: ;     SYSTEM OUTPUTS        FOR STATE DEFINITION
 46: ;
 47: ;     CCCCCCCC QQ
 48: ;     LLLLLLLL QQ
 49: ;     KKKKKKKK 12
 50: ;     112233AB
 51: ;     ABABAB
 52: ;
 53:
 54: S1  = 0 0 0 0 0 0 0 0 - -   ; INIT    COMMON STATES
 55: S2  = 0 0 0 0 0 0 1 0 0 -   ; SA      - INACTIVE MODE
 56: S3  = 0 0 0 0 0 0 0 1 0 -   ; SB        STATES
 57:
 58: S4  = 1 0 0 0 0 0 1 0 - 0   ; S1A     PIPELINE STATES
 59: S5  = 0 1 0 0 0 0 0 1 - 0   ; S1B
 60: S6  = 1 0 1 0 0 0 1 0 - -   ; S12A
 61: S7  = 0 1 0 1 0 0 0 1 - -   ; S12B
 62: S8  = 1 0 1 0 1 0 1 0 - -   ; S123A
 63: S9  = 0 1 0 1 0 1 0 1 - -   ; S123B
 64: S10 = 0 0 0 1 0 1 0 1 - -   ; S23B
 65: S11 = 0 0 0 0 1 0 1 0 - 0   ; S3A
 66: S12 = 0 0 0 0 0 1 0 1 - 0   ; S3B
 67: S13 = 0 0 0 0 0 0 1 0 1 0   ; SAW
 68: S14 = 0 0 0 0 0 0 0 1 1 0   ; SBW
 69:
 70: S15 = 1 0 0 0 0 0 1 0 - 1   ; S1AN    NON-PIPELINE
 71: S16 = 0 1 0 0 0 0 0 1 - 1   ; S1BN
 72: S17 = 0 0 1 0 0 0 1 0 - -   ; S2AN
 73: S18 = 0 0 0 1 0 0 0 1 - -   ; S2BN
 74: S19 = 0 0 0 0 1 0 1 0 - 1   ; S3AN
 75: S20 = 0 0 0 0 0 1 0 1 - 1   ; S3BN
 76: S21 = 0 0 0 0 0 0 1 0 1 1   ; SAWN
 77: S22 = 0 0 0 0 0 0 0 1 1 1   ; SBWN
 78:
 79: *STRING
 80: INIT  = 1                   ; COMMON STATES
 81: SA    = 2                   ; -INACTIVE MODE
 82: SB    = 3                   ;  STATES
 83:                             ; PIPELINE STATES
 84: S1A   = 4
 85: S1B   = 5
 86: S12A  = 6
 87: S12B  = 7
 88: S123A = 8
 89: S123B = 9
 90: S23B  = 10
 91: S3A   = 11
 92: S3B   = 12
 93: SAW   = 13
 94: SBW   = 14
 95:


 96: S1AN  = 15                  ; NON-PIPELINE
 97: S1BN  = 16
 98: S2AN  = 17
 99: S2BN  = 18
100: S3AN  = 19
101: S3BN  = 20
102: SAWN  = 21
103: SBWN  = 22
104: LASTSTATE = 22;
105:
106: *FLOW-TABLE
107: ;
108: ;----------------------------------------------------------------------
109: ;RESET STATE
110: ;ALL STATES MUST RESET TO THE INITIAL STATE (ALL OUTPUTS REGISTERS 0) UPON
111: ;AN ACTIVE RESET INPUT. SINCE THE 20G10 HAS NO GLOBAL OR INDIVIDUAL
112: ;RESETS TO THE OUTPUT REGISTERS, RESET TO INITIAL STATE MUST BE EMBEDDED
113: ;INTO THE STATE MACHINE
114: ;
115: RELEVANT = RESET ;
116: S[1 .. 'LASTSTATE'], X 1 , F 'INIT' ;ALL STATE INIT UPON RESET
138: RELEVANT = RESET = 0
139: ;
140: ;----------------------------------------------------------------------
141: ;INACTIVE MODE STATES
142: RELEVANT = RUN, NPL ;
143: S 'INIT' , X - - , F 'SA'   ;INITIAL STATE AFTER RESET
144:
145: S 'SA'   , X - - , F 'SB'   ;INACTIVE MODE STATE, ONLY
146:
147: S 'SB'   , X 0 - , F 'SA'   ;FREE RUN CLKS A & B ARE ACTIVE
148:            X 1 0 , F 'S1A'  ; PIPELINE VS.
149:            X 1 1 , F 'S1AN' ; NON-PIPELINE DECISION
150:
151: ;----------------------------------------------------------------------
152: ;PIPELINE MODE STATES
153:
154: RELEVANT = INTR, IEN        ;*PRIMING THE PIPELINE *
155: S 'S1A'  , X - - , F 'S1B'
156:
157: S 'S1B'  , X - - , F 'S12A' ;
158:
159: S 'S12A' , X - - , F 'S12B' ;
160:
161: S 'S12B' , X 1 1 , F 'S1A'   ; INTERRUPT CONDITION? YES
162:            X 1 0 , F 'S123A' ;                      NO
163:            X 0 - , F 'S123A' ;                      NO
164:
165: RELEVANT = RUN, INTR, IEN, WAIT, WEN ; *FULL PIPELINE *
166: S 'S123A' , X - - - 1 1 , F 'SBW'   ; WAIT CONDITION
167:             X 0 - - 0 - , F 'S23B'  ; /RUN COND., EMPTY PIPELINE
168:             X 0 - - 1 0 , F 'S23B'  ; /RUN COND., EMPTY PIPELINE
169:             X 1 - - 0 - , F 'S123B' ; RUN CONDITION
170:             X 1 - - 1 0 , F 'S123B' ; RUN CONDITION
171:


172: S 'S123B' , X - 1 1 - - , F 'S1A'   ; INTERRUPT CONDITION
173:             X - 0 - - - , F 'S123A' ; RUN CONDITION
174:             X - 1 0 - - , F 'S123A' ; RUN CONDITION
175:
176: RELEVANT = RUN              ; *EMPTY PIPELINE *
177: S 'S23B' , X - , F 'S3A'
178:
179: S 'S3A'  , X - , F 'S3B'
180:
181: S 'S3B'  , X - , F 'SA'     ; BACK TO INACTIVE STATE
182:
183: RELEVANT = WAIT             ; *PIPELINE WAIT STATES*
184: S 'SBW'  , X 1 , F 'SAW'    ; WAIT
185:            X 0 , F 'S123A'  ; /WAIT
186:
187: S 'SAW'  , X - , F 'SBW'    ;
188:
189: ;----------------------------------------------------------------------
190: ;NON-PIPELINE MODE STATES
191:
192: S 'S1AN' , X - , F 'S1BN'   ;
193:
194: S 'S1BN' , X - , F 'S2AN'   ;
195:
196: RELEVANT = WAIT, WEN        ;
197: S 'S2AN' , X 1 1 , F 'SBWN' ; WAIT CONDITION
198:            X 0 - , F 'S2BN' ; /WAIT CONDITION
199:            X 1 0 , F 'S2BN' ; /WAIT CONDITION
200:
201: RELEVANT = INTR, IEN        ;
202: S 'S2BN' , X 1 1 , F 'S1AN' ; INTERRUPT CONDITION
203:            X 0 - , F 'S3AN' ; /INTERRUPT CONDITION
204:            X 1 0 , F 'S3AN' ; /INTERRUPT CONDITION
205:
206: RELEVANT = RUN
207: S 'S3AN' , X - , F 'S3BN'   ;
208:
209: S 'S3BN' , X 1 , F 'S1AN'   ;
210:            X 0 , F 'SA'     ; BACK TO INACTIVE STATE
211:
212: RELEVANT = WAIT             ;*NON-PIPELINED WAIT STATES*
213: S 'SBWN' , X 1 , F 'SAWN'   ; REMAIN IN WAIT
214:            X 0 , F 'S2AN'   ; END OF WAIT CONDITION
215:
216: S 'SAWN' , X - , F 'SBWN'   ; REMAIN IN WAIT
217:
218: *STATE-ASSIGNMENT
219: Z-VALUES
220:
221:
222: *PIN
223: STATECLK = 1, RUN = 2, NPL = 3, INTR = 4, IEN = 5, WAIT = 6, WEN = 7,
223: RESET = 8, CLK_1A = 14, CLK_1B = 15, CLK_2A = 16, CLK_2B = 17,
223: CLK_3A = 18, CLK_3B = 19, CLK_A = 20, CLK_B = 21, QQ1 = 22, QQ2 = 23;
224:


225: *RUN-CONTROL
226: LISTING= LONG,SYMBOL-TABLE,EQUATIONS,PINOUT;
227: PROGFORMAT= L-EQUATIONS
228: OPTIMIZATION= P-TERMS;
229: *END

LOG/IC SYMBOL TABLE

SYMBOL     TYPE         REG   LEVEL   PIN/NODE
GND        LOCAL        -     HIGH
VCC        LOCAL        -     HIGH
RUN        X-VARIABLE   -     HIGH    2
NPL        X-VARIABLE   -     HIGH    3
INTR       X-VARIABLE   -     HIGH    4
IEN        X-VARIABLE   -     HIGH    5
WAIT       X-VARIABLE   -     HIGH    6
WEN        X-VARIABLE   -     HIGH    7
RESET      X-VARIABLE   -     HIGH    8
CLK_1A     X-VARIABLE   -     HIGH    14
CLK_1B     X-VARIABLE   -     HIGH    15
CLK_2A     X-VARIABLE   -     HIGH    16
CLK_2B     X-VARIABLE   -     HIGH    17
CLK_3A     X-VARIABLE   -     HIGH    18
CLK_3B     X-VARIABLE   -     HIGH    19
CLK_A      X-VARIABLE   -     HIGH    20
CLK_B      X-VARIABLE   -     HIGH    21
QQ1        X-VARIABLE   -     HIGH    22
QQ2        X-VARIABLE   -     HIGH    23
CLK_1A.D   Z-VARIABLE   DFF   HIGH    14
CLK_1B.D   Z-VARIABLE   DFF   HIGH    15
CLK_2A.D   Z-VARIABLE   DFF   HIGH    16
CLK_2B.D   Z-VARIABLE   DFF   HIGH    17
CLK_3A.D   Z-VARIABLE   DFF   HIGH    18
CLK_3B.D   Z-VARIABLE   DFF   HIGH    19
CLK_A.D    Z-VARIABLE   DFF   HIGH    20
CLK_B.D    Z-VARIABLE   DFF   HIGH    21
QQ1.D      Z-VARIABLE   DFF   HIGH    22
QQ2.D      Z-VARIABLE   DFF   HIGH    23

EXPANDED FUNCTION TABLE (INCLUDING LOCAL VARIABLES):
Input columns, left to right: GND, VCC, RUN, NPL, INTR, IEN, WAIT, WEN, RESET, CLK_1A, CLK_1B, CLK_2A, CLK_2B, CLK_3A, CLK_3B, CLK_A, CLK_B, QQ1, QQ2
Output columns, left to right: CLK_1A.D, CLK_1B.D, CLK_2A.D, CLK_2B.D, CLK_3A.D, CLK_3B.D, CLK_A.D, CLK_B.D, QQ1.D, QQ2.D
Each row ends with its row number and the source line that generated it.

---- ---- 1000 0000 0-- : 0000 0000 --;   1/ 116
---- ---- 0000 0000 0-- : 0000 0010 0-;   2/ 143
---- ---- 1000 0001 00- : 0000 0000 --;   3/ 117
---- ---- 0000 0001 00- : 0000 0001 0-;   4/ 145
---- ---- 1000 0000 10- : 0000 0000 --;   5/ 118
--0- ---- 0000 0000 10- : 0000 0010 0-;   6/ 147
--10 ---- 0000 0000 10- : 1000 0010 -0;   7/ 148
--11 ---- 0000 0000 10- : 1000 0010 -1;   8/ 149
---- ---- 1100 0001 0-0 : 0000 0000 --;   9/ 119
---- ---- 0100 0001 0-0 : 0100 0001 -0;  10/ 155
---- ---- 1010 0000 1-0 : 0000 0000 --;  11/ 120
---- ---- 0010 0000 1-0 : 1010 0010 --;  12/ 157
---- ---- 1101 0001 0-- : 0000 0000 --;  13/ 121
---- ---- 0101 0001 0-- : 0101 0001 --;  14/ 159
---- ---- 1010 1000 1-- : 0000 0000 --;  15/ 122
---- 11-- 0010 1000 1-- : 1000 0010 -0;  16/ 161
---- 10-- 0010 1000 1-- : 1010 1010 --;  17/ 162
---- 0--- 0010 1000 1-- : 1010 1010 --;  18/ 163
---- ---- 1101 0101 0-- : 0000 0000 --;  19/ 123
---- --11 0101 0101 0-- : 0000 0001 10;  20/ 166
--0- --0- 0101 0101 0-- : 0001 0101 --;  21/ 167
--0- --10 0101 0101 0-- : 0001 0101 --;  22/ 168
--1- --0- 0101 0101 0-- : 0101 0101 --;  23/ 169
--1- --10 0101 0101 0-- : 0101 0101 --;  24/ 170
---- ---- 1010 1010 1-- : 0000 0000 --;  25/ 124
---- 11-- 0010 1010 1-- : 1000 0010 -0;  26/ 172
---- 0--- 0010 1010 1-- : 1010 1010 --;  27/ 173
---- 10-- 0010 1010 1-- : 1010 1010 --;  28/ 174
---- ---- 1000 1010 1-- : 0000 0000 --;  29/ 125
---- ---- 0000 1010 1-- : 0000 1010 -0;  30/ 177
---- ---- 1000 0101 0-0 : 0000 0000 --;  31/ 126
---- ---- 0000 0101 0-0 : 0000 0101 -0;  32/ 179
---- ---- 1000 0010 1-0 : 0000 0000 --;  33/ 127
---- ---- 0000 0010 1-0 : 0000 0010 0-;  34/ 181
---- ---- 1000 0001 010 : 0000 0000 --;  35/ 128
---- ---- 0000 0001 010 : 0000 0001 10;  36/ 187
---- ---- 1000 0000 110 : 0000 0000 --;  37/ 129
---- --1- 0000 0000 110 : 0000 0010 10;  38/ 184
---- --0- 0000 0000 110 : 1010 1010 --;  39/ 185
---- ---- 1100 0001 0-1 : 0000 0000 --;  40/ 130
---- ---- 0100 0001 0-1 : 0100 0001 -1;  41/ 192
---- ---- 1010 0000 1-1 : 0000 0000 --;  42/ 131
---- ---- 0010 0000 1-1 : 0010 0010 --;  43/ 194
---- ---- 1001 0001 0-- : 0000 0000 --;  44/ 132
---- --11 0001 0001 0-- : 0000 0001 11;  45/ 197
---- --0- 0001 0001 0-- : 0001 0001 --;  46/ 198
---- --10 0001 0001 0-- : 0001 0001 --;  47/ 199
---- ---- 1000 1000 1-- : 0000 0000 --;  48/ 133
---- 11-- 0000 1000 1-- : 1000 0010 -1;  49/ 202
---- 0--- 0000 1000 1-- : 0000 1010 -1;  50/ 203
---- 10-- 0000 1000 1-- : 0000 1010 -1;  51/ 204
---- ---- 1000 0101 0-1 : 0000 0000 --;  52/ 134
---- ---- 0000 0101 0-1 : 0000 0101 -1;  53/ 207
---- ---- 1000 0010 1-1 : 0000 0000 --;  54/ 135
--1- ---- 0000 0010 1-1 : 1000 0010 -1;  55/ 209
--0- ---- 0000 0010 1-1 : 0000 0010 0-;  56/ 210
---- ---- 1000 0001 011 : 0000 0000 --;  57/ 136
---- ---- 0000 0001 011 : 0000 0001 11;  58/ 216
---- ---- 1000 0000 111 : 0000 0000 --;  59/ 137
---- --1- 0000 0000 111 : 0000 0010 11;  60/ 213
---- --0- 0000 0000 111 : 0010 0010 --;  61/ 214
REST                    : ---- ---- --;  62
1234 5678 9012 3456 789 : 1234 5678 90

STATE ASSIGNMENT:
Columns, left to right: CLK_1A, CLK_1B, CLK_2A, CLK_2B, CLK_3A, CLK_3B, CLK_A, CLK_B, QQ1, QQ2
0000 0000 --; 1
0000 0010 0-; 2
0000 0001 0-; 3
1000 0010 -0; 4
0100 0001 -0; 5
1010 0010 --; 6
0101 0001 --; 7
1010 1010 --; 8
0101 0101 --; 9
0001 0101 --; 10
0000 1010 -0; 11
0000 0101 -0; 12
0000 0010 10; 13
0000 0001 10; 14
1000 0010 -1; 15
0100 0001 -1; 16
0010 0010 --; 17
0001 0001 --; 18
0000 1010 -1; 19
0000 0101 -1; 20
0000 0010 11; 21
0000 0001 11; 22
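
One property worth checking in a state assignment that contains don't-care positions is that no two state codes can match the same register contents. The sketch below performs that check on the 22 codes listed above; the dictionary name and the helper `conflict` are assumptions made for this example, and the check is not part of the LOG/iC output.

```python
from itertools import combinations

# State number -> code (CLK_1A..CLK_3B, CLK_A, CLK_B, QQ1, QQ2), '-' = don't care.
STATE_CODES = {
    1: "00000000--",  2: "000000100-",  3: "000000010-",  4: "10000010-0",
    5: "01000001-0",  6: "10100010--",  7: "01010001--",  8: "10101010--",
    9: "01010101--", 10: "00010101--", 11: "00001010-0", 12: "00000101-0",
    13: "0000001010", 14: "0000000110", 15: "10000010-1", 16: "01000001-1",
    17: "00100010--", 18: "00010001--", 19: "00001010-1", 20: "00000101-1",
    21: "0000001011", 22: "0000000111",
}

def conflict(code_a, code_b):
    """True when some position is specified in both codes with different values."""
    return any(a != b and "-" not in (a, b) for a, b in zip(code_a, code_b))

overlaps = [
    (i, j)
    for (i, a), (j, b) in combinations(STATE_CODES.items(), 2)
    if not conflict(a, b)
]
print(overlaps)
```

An empty list means every pair of states differs in at least one fully specified bit, so the two additional registers QQ1 and QQ2 are enough to separate states that share the same system outputs.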
EXPANDED FUNCTION TABLE (LOCAL VARIABLES REMOVED):
Input columns, left to right: RUN, NPL, INTR, IEN, WAIT, WEN, RESET, CLK_1A, CLK_1B, CLK_2A, CLK_2B, CLK_3A, CLK_3B, CLK_A, CLK_B, QQ1, QQ2
Output columns, left to right: CLK_1A.D, CLK_1B.D, CLK_2A.D, CLK_2B.D, CLK_3A.D, CLK_3B.D, CLK_A.D, CLK_B.D, QQ1.D, QQ2.D

---- --10 0000 000- - : 0000 0000 --;   1/ 116
---- --00 0000 000- - : 0000 0010 0-;   2/ 143
---- --10 0000 0100 - : 0000 0000 --;   3/ 117
---- --00 0000 0100 - : 0000 0001 0-;   4/ 145
---- --10 0000 0010 - : 0000 0000 --;   5/ 118
0--- --00 0000 0010 - : 0000 0010 0-;   6/ 147
10-- --00 0000 0010 - : 1000 0010 -0;   7/ 148
11-- --00 0000 0010 - : 1000 0010 -1;   8/ 149
---- --11 0000 010- 0 : 0000 0000 --;   9/ 119
---- --01 0000 010- 0 : 0100 0001 -0;  10/ 155
---- --10 1000 001- 0 : 0000 0000 --;  11/ 120
---- --00 1000 001- 0 : 1010 0010 --;  12/ 157
---- --11 0100 010- - : 0000 0000 --;  13/ 121
---- --01 0100 010- - : 0101 0001 --;  14/ 159
---- --10 1010 001- - : 0000 0000 --;  15/ 122
--11 --00 1010 001- - : 1000 0010 -0;  16/ 161
--10 --00 1010 001- - : 1010 1010 --;  17/ 162
--0- --00 1010 001- - : 1010 1010 --;  18/ 163
---- --11 0101 010- - : 0000 0000 --;  19/ 123
---- 1101 0101 010- - : 0000 0001 10;  20/ 166
0--- 0-01 0101 010- - : 0001 0101 --;  21/ 167
0--- 1001 0101 010- - : 0001 0101 --;  22/ 168
1--- 0-01 0101 010- - : 0101 0101 --;  23/ 169
1--- 1001 0101 010- - : 0101 0101 --;  24/ 170
---- --10 1010 101- - : 0000 0000 --;  25/ 124
--11 --00 1010 101- - : 1000 0010 -0;  26/ 172
--0- --00 1010 101- - : 1010 1010 --;  27/ 173
--10 --00 1010 101- - : 1010 1010 --;  28/ 174
---- --10 0010 101- - : 0000 0000 --;  29/ 125
---- --00 0010 101- - : 0000 1010 -0;  30/ 177
---- --10 0001 010- 0 : 0000 0000 --;  31/ 126
---- --00 0001 010- 0 : 0000 0101 -0;  32/ 179
---- --10 0000 101- 0 : 0000 0000 --;  33/ 127
---- --00 0000 101- 0 : 0000 0010 0-;  34/ 181
---- --10 0000 0101 0 : 0000 0000 --;  35/ 128
---- --00 0000 0101 0 : 0000 0001 10;  36/ 187
---- --10 0000 0011 0 : 0000 0000 --;  37/ 129
---- 1-00 0000 0011 0 : 0000 0010 10;  38/ 184
---- 0-00 0000 0011 0 : 1010 1010 --;  39/ 185
---- --11 0000 010- 1 : 0000 0000 --;  40/ 130
---- --01 0000 010- 1 : 0100 0001 -1;  41/ 192
---- --10 1000 001- 1 : 0000 0000 --;  42/ 131
---- --00 1000 001- 1 : 0010 0010 --;  43/ 194
---- --10 0100 010- - : 0000 0000 --;  44/ 132
---- 1100 0100 010- - : 0000 0001 11;  45/ 197
---- 0-00 0100 010- - : 0001 0001 --;  46/ 198
---- 1000 0100 010- - : 0001 0001 --;  47/ 199
---- --10 0010 001- - : 0000 0000 --;  48/ 133
--11 --00 0010 001- - : 1000 0010 -1;  49/ 202
--0- --00 0010 001- - : 0000 1010 -1;  50/ 203
--10 --00 0010 001- - : 0000 1010 -1;  51/ 204
---- --10 0001 010- 1 : 0000 0000 --;  52/ 134
---- --00 0001 010- 1 : 0000 0101 -1;  53/ 207
---- --10 0000 101- 1 : 0000 0000 --;  54/ 135
1--- --00 0000 101- 1 : 1000 0010 -1;  55/ 209
0--- --00 0000 101- 1 : 0000 0010 0-;  56/ 210
---- --10 0000 0101 1 : 0000 0000 --;  57/ 136
---- --00 0000 0101 1 : 0000 0001 11;  58/ 216
---- --10 0000 0011 1 : 0000 0000 --;  59/ 137
---- 1-00 0000 0011 1 : 0000 0010 11;  60/ 213
---- 0-00 0000 0011 1 : 0010 0010 --;  61/ 214
REST                  : ---- ---- --;  62
1234 5678 9012 3456 7 : 1234 5678 90

PIPELINED CLOCKING SYSTEM OD2OG10 3/7/90
ERIC B.ROSS
CYPRESS SEMICONDUCTOR
90/03/15 23:49:45

****************************************************
*** NET DESCRIPTION TABLE FOR AND/OR STRUCTURE ***
****************************************************
Input columns 1-17 : RUN NPL INTR IEN, WAIT WEN RESET CLK_1A,
                     CLK_1B CLK_2A CLK_2B CLK_3A,
                     CLK_3B CLK_A CLK_B QQ1, QQ2
Output columns 1-10: CLK_1A CLK_1B CLK_2A CLK_2B,
                     CLK_3A CLK_3B CLK_A CLK_B, QQ1 QQ2
INV                   : .... .... ..
REG                   : DDDD DDDD DD
---- 0-0- --0- 0-11 0 : A... .... ..; 1
1--- --0- --0- 1--- 1 : A... .... ..; 2
---- --0- 1--- ---- 0 : A... .... ..; 3
---- --0- 1-1- ---- - : A... .... ..; 4
--11 --0- --1- 0--- - : A... .... ..; 5
1--- --0- 0-0- 0-10 - : A... .... ..; 6
---- --01 ---0 ---- - : .A.. .... ..; 7
1--- -001 ---- ---- - : .A.. .... ..; 8
1--- 0-01 ---- ---- - : .A.. .... ..; 9
---0 --0- 1--- ---- - : ..A. .... ..; 10
--0- --0- 1--- ---- - : ..A. .... ..; 11
---- 0-0- --0- 0-11 - : ..A. .... ..; 12
---- --0- 1-0- ---- - : ..A. .... ..; 13
---- 0-0- -1-- ---- - : ...A .... ..; 14
---- -00- -1-- ---- - : ...A .... ..; 15
---- --01 -1-0 ---- - : ...A .... ..; 16
---0 --0- --1- ---- - : .... A... ..; 17
--0- --0- --1- ---- - : .... A... ..; 18
---- --0- 0-1- 1--- - : .... A... ..; 19
---- 0-0- 0-0- 0-11 0 : .... A... ..; 20
---- 0-0- ---1 ---- - : .... .A.. ..; 21
---- -00- ---1 ---- - : .... .A.. ..; 22
---- --0- -0-1 ---- - : .... .A.. ..; 23
---- --0- ---- -0-- - : .... ..A. ..; 24
---- --0- ---- -1-- - : .... ...A ..; 25
---- ---- ---- -1-1 - : .... .... A.; 26
---- ---- ---- 0-11 - : .... .... A.; 27
---- ---- -1-- ---- - : .... .... A.; 28
---- ---- --0- 1--- - : .... .... .A; 29
---- ---- 0-1- 0--- - : .... .... .A; 30
---- ---0 -1-- ---- - : .... .... .A; 31
---- ---- -0-- -1-- 1 : .... .... .A; 32
-1-- ---0 0--0 0--0 - : .... .... .A; 33
---- ---- 00-- 0--1 1 : .... .... .A; 34
1234 5678 9012 3456 7 : 1234 5678 90
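
Each row of the net description table is one product term: a 1 selects the true input, a 0 selects the complement, and a dash leaves the input out of the AND. The short Python sketch below turns a row into a readable product term. It is only an illustration, not part of the LOG/iC output, and the column order in INPUTS is an assumption read from the vertical header of the listing.

```python
# Assumed input column order (columns 1-17 of the net description table).
INPUTS = [
    "RUN", "NPL", "INTR", "IEN", "WAIT", "WEN", "RESET", "CLK_1A",
    "CLK_1B", "CLK_2A", "CLK_2B", "CLK_3A", "CLK_3B", "CLK_A", "CLK_B",
    "QQ1", "QQ2",
]


def decode_row(pattern: str) -> str:
    """Return the product term encoded by one input pattern of the table."""
    bits = pattern.replace(" ", "")
    if len(bits) != len(INPUTS):
        raise ValueError("expected %d columns, got %d" % (len(INPUTS), len(bits)))
    literals = []
    for name, bit in zip(INPUTS, bits):
        if bit == "1":
            literals.append(name)          # true input used
        elif bit == "0":
            literals.append("/" + name)    # complemented input used
        elif bit != "-":
            raise ValueError("unexpected character %r" % bit)
    return " & ".join(literals)


if __name__ == "__main__":
    # Row 1 of the table, which drives the CLK_1A register.
    print(decode_row("---- 0-0- --0- 0-11 0"))
```

Decoding the rows in this way gives a quick cross-check between the table and the Boolean equations that the compiler prints next.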
PIPELINED CLOCKING SYSTEM OD2OG10 3/7/90
ERIC B. ROSS
CYPRESS SEMICONDUCTOR
90/03/15 23:49:45

****************************************************
***              BOOLEAN EQUATIONS               ***
****************************************************
CLK_1A.D := /WAIT & /RESET & /CLK_2B & /CLK_3B & CLK_B & QQ1 & /QQ2
          + RUN & /RESET & /CLK_2B & CLK_3B & QQ2
          + /RESET & CLK_1B & /QQ2
          + /RESET & CLK_1B & CLK_2B
          + INTR & IEN & /RESET & CLK_2B & /CLK_3B
          + RUN & /RESET & /CLK_1B & /CLK_2B & /CLK_3B & CLK_B & /QQ1 ;
CLK_1B.D := /RESET & CLK_1A & /CLK_3A
          + RUN & /WEN & /RESET & CLK_1A
          + RUN & /WAIT & /RESET & CLK_1A ;
CLK_2A.D := /IEN & /RESET & CLK_1B
          + /INTR & /RESET & CLK_1B
          + /WAIT & /RESET & /CLK_2B & /CLK_3B & CLK_B & QQ1
          + /RESET & CLK_1B & /CLK_2B ;
CLK_2B.D := /WAIT & /RESET & CLK_2A
          + /WEN & /RESET & CLK_2A
          + /RESET & CLK_1A & CLK_2A & /CLK_3A ;
CLK_3A.D := /IEN & /RESET & CLK_2B
          + /INTR & /RESET & CLK_2B
          + /RESET & /CLK_1B & CLK_2B & CLK_3B
          + /WAIT & /RESET & /CLK_1B & /CLK_2B & /CLK_3B & CLK_B & QQ1 & /QQ2 ;
CLK_3B.D := /WAIT & /RESET & CLK_3A
          + /WEN & /RESET & CLK_3A
          + /RESET & /CLK_2A & CLK_3A ;
CLK_A.D  := /RESET & /CLK_A ;
CLK_B.D  := /RESET & CLK_A ;
QQ1.D    := CLK_A & QQ1
          + /CLK_3B & CLK_B & QQ1
          + CLK_2A ;
QQ2.D    := /CLK_2B & CLK_3B
          + /CLK_1B & CLK_2B & /CLK_3B
          + /CLK_1A & CLK_2A
          + /CLK_2A & CLK_A & QQ2
          + NPL & /CLK_1A & /CLK_1B & /CLK_3A & /CLK_3B & /QQ1
          + /CLK_1B & /CLK_2A & /CLK_3B & QQ1 & QQ2 ;

PIPELINED CLOCKING SYSTEM OD2OG10 3/7/90
ERIC B. ROSS
CYPRESS SEMICONDUCTOR
90/03/15 23:49:45

PALC20G10

STATECLK  1     24  @VCC
RUN       2     23  QQ2
NPL       3     22  QQ1
INTR      4     21  CLK_B
IEN       5     20  CLK_A
WAIT      6     19  CLK_3B
WEN       7     18  CLK_3A
RESET     8     17  CLK_2B
@09       9     16  CLK_2A
@10      10     15  CLK_1B
@11      11     14  CLK_1A
@GND     12     13  @OE

[Pin-out diagram: PALC20G10 in the 28-pin LCC package]

[Pin-out diagram: PALC20G10 in the 28-pin PLCC package]

LOG/iC - PAL
CPU TIME USED: 45 SEC

State Machine Design Considerations and Methodologies
Appendix B. LOG/iC Simulation: Clock State Machine

PIPELINED CLOCKING SYSTEM OD2OG10 3/7/90

[Simulation trace listing, events 1 through 116. Columns: Event #, State #, the inputs RUN, NPL, INTR, IEN, WAIT, WEN, and RESET, and the clock outputs CLK_1A through CLK_3B, CLK_A, and CLK_B, each drawn as a 0-1 waveform.]


Appendix C. Cypress PLD ToolKit: CY7C361 Implementation
CY7C361;
{PIPELINED CLOCKING SYSTEM AN1 361 4/27/90
ERIC B. ROSS
CYPRESS SEMICONDUCTOR}
CONFIGURE;
{-------------------------------------------------------------------
; INPUT DEFINITIONS:
; RUN    = START & STOP EXECUTION OF OUTPUT CLOCKS (NORMAL, SINGLE STEP,
;          & BREAK PT. EXECUTION
; NPL    = PIPELINED VS NON-PIPELINED MODE OF EXECUTION
; INTR   = EXTERNAL INTERRUPT CONDITION (TLB MISS, PARITY ERROR, ...)
; IEN    = INTERRUPT ENABLE
; WAIT   = WAIT ENABLE (CACHE MISS)
; WEN    = WAIT ENABLE
; RPT_EO = USED TO DUB CLK_1B, CLK USED TO UPDATE THE EO REG
;
; OUTPUT DEFINITIONS:
; 3 CLOCK STAGES 1,2,3
; 2 CLOCKS PER STATE A, B
;   CLK_XX WHERE XX = 1A,1B,2A,2B,3A,3B
; 2 FREE RUNNING CLOCKS
;   CLK_A, CLK_B
;-------------------------------------------------------------------}

RUN(node= 3), STATECLK, NPL, INTR,
IEN(node= 9), WAIT, WEN, RESET,
IRPT_EO,

{*INPUTS*}

ICLK A(node= 16), ICLK B, ICLK lA,
ICLK-2B(node= 24), ICLK IB(and~ ICLK 2A, /CLK 3A,
ICLK)B,
--

{LOCAL 8
FEEDBACK

LOCAL 8
HALF 16
GLOBAL 32
FEEDBACK FEEDBACK FEEDBACK

{*OUTPUTS*}

*STATE MACROCELLS*
START = DEFAULT}

AX(node= 32),
lA,

A,
lAX,

B,
12A,

BW,
BWX,

AW,

12B,

AWN,

BWN,
2AN,
BWNX(node= 53),2ANX,

23B,
3A,

23BX,
3AN,

2BNX,
3ANX,

1B,
123A,

{LOCAL 8 = 1, HALF = 1}

123AX,
{LOCAL 8 = 2, HALF = 1}
123AY(node= 47),

123B,
123BX,

{LOCAL 8 = 1, HALF = 2}

2BN,
3BN,

{LOCAL 8 = 2, HALF = 2}

{*MISC*}

IENA(node= 29), IENB,
GLBRST(node= 64),
GND(NODE= 73),
CLKDB(NODE= 74)
EQUATIONS;

{*MISC*}

GLBRST = <prod> RESET;
IENA   = <INV_SUM> /GND;
IENB   = <INV_SUM> /GND;
AX     = <prod> /RESET;

{*STATE MACROCELLS*}

A      = <prod> /RUN
         <inv_prod> /B * /3BN;

B      = <prod>
         <inv_prod> /AX * /1AX;

1B     = <prod>
         <inv_prod> /1A * /1AX;

1A     = <prod> INTR * IEN
         <inv_prod> /123B * /123BX * /12B * /2BN;

1AX    = <prod> RUN
         <inv_prod> /B * /3BN;

12A    = <prod> /NPL
         <inv_prod> /1B;

123A   = <prod> /INTR
         <inv_prod> /12B * /123B * /123BX;

BW     = <prod> WAIT * WEN
         <inv_prod> /123A * /123AX * /123AY;

AW     = <prod> WAIT
         <inv_prod> /BW * /BWX;

12B    = <prod> 12A;

123AX  = <prod> INTR * /IEN
         <inv_prod> /12B * /123B * /123BX;

BWX    = <prod> AW;

123AY  = <prod> /WAIT
         <inv_prod> /BW * /BWX;

AWN    = <prod> WAIT
         <inv_prod> /BWN * /BWNX;

BWN    = <prod> WAIT * WEN
         <inv_prod> /2AN;

2AN    = <prod> NPL
         <inv_prod> /1B;

123B   = <prod> RUN * /WAIT
         <inv_prod> /123A * /123AX * /123AY;

BWNX   = <prod> AWN;

2ANX   = <prod> /WAIT
         <inv_prod> /BWN * /BWNX;

123BX  = <prod> RUN * WAIT * /WEN
         <inv_prod> /123A * /123AX * /123AY;

23B    = <prod> /RUN * WAIT * /WEN
         <inv_prod> /123A * /123AX * /123AY;

23BX   = <prod> /RUN * /WAIT
         <inv_prod> /123A * /123AX * /123AY;

2BNX   = <prod> WAIT * /WEN
         <inv_prod> /2AN * /2ANX;

2BN    = <prod> /WAIT
         <inv_prod> /2AN * /2ANX;

3A     = <prod>
         <inv_prod> /23B * /23BX;

3AN    = <prod> /INTR
         <inv_prod> /2BN * /2BNX;

3ANX   = <prod> INTR * /IEN
         <inv_prod> /2BN * /2BNX;

3BN    = <prod>
         <inv_prod> /3A * /3AN * /3ANX;


{*OUTPUTS*}

CLK_A  = <inv_sum> /A * /AX * /1A * /1AX *
         /12A * /123A * /123AX *
         /123AY * /AW * /3A * /2AN *
         /2ANX * /AWN * /3AN * /3ANX;

CLK_B  = <inv_sum> /B * /1B * /12B * /123B *
         /123BX * /BW * /BWX * /23B *
         /23BX * /2BN * /2BNX * /BWN *
         /BWNX * /3BN;

CLK_1A = <inv_sum> /1A * /1AX * /12A * /123A *
         /123AX * /123AY;

CLK_1B = <inv_sum> /1B * /12B * /123B * /123BX;

CLK_2A = <inv_sum> /12A * /123A * /123AX *
         /123AY * /2AN * /2ANX;

CLK_2B = <inv_sum> /12B * /123B * /123BX *
         /23B * /23BX * /2BN * /2BNX;

CLK_3A = <inv_sum> /123A * /123AX * /123AY *
         /3A * /3AN * /3ANX;

CLK_3B = <inv_sum> /123B * /123BX * /23B *
         /23BX * /3BN;


Appendix D. Cypress PLD ToolKit: CY7C361 Simulation

[Figure: CY7C361 simulation waveforms]

Understanding the CY7C330
Synchronous EPLD

This application note provides basic information on
the CY7C330 and presents four design examples: a
high-speed up/down counter with limits, a 16x16
crossbar switch, a pipelined buffer, and a simple toggle
counter. Also included is an internal product term
numbering chart. All example source code is in Cypress
PLD ToolKit syntax.
The Cypress CY7C330 is the first in a family of
high-speed, application-optimized CMOS EPLDs. This
fully synchronous part is designed to implement state
machines and other clocked systems. The CY7C330
offers new solutions for systems designers, with a truly
usable high clock rate, 39 total registers, and 17,000
programmable bits providing up to 1200-gate complexity.
Other devices in the family are the CY7C331 and
the CY7C332. All family members are packaged in
28-pin, 300-mil dual in-line and LCC/PLCC packages. The
technology is low-power CMOS and UV erasable. The
application-specific family from Cypress provides the
CY7C330 for sequential state machine applications, the
CY7C331 for general-purpose asynchronous designs,
and the CY7C332 for decoders and combinational logic
applications.
This family of high-speed devices provides the
optimal solution for each system design using Cypress's
0.8-micron, dual-level-metal CMOS technology. Systems
using other types of programmable logic devices
for synchronous state machine applications can use the
CY7C330 as a higher-density, lower-power solution at
speeds up to 66 MHz.
The Cypress PALC22V10, PLDC20G10, and
PAL20 devices proved the popularity of high-speed,
low-power, erasable CMOS logic. The CY7C330 builds
on that base. One CY7C330 can easily replace four
PALC22V10s because the CY7C330 extends the number
of state registers to 16, extends the number of
product terms per output to 19 maximum, adds an
XOR logic function, and provides the ability to use pins
as bidirectional I/O.
The CY7C330 increases the speed of synchronous
systems to 66 MHz. This is the actual usable speed, as
determined by the total 15-ns feedback time from the Q
output of a flip-flop to the D input of any flip-flop in
the device. To ensure the 66-MHz operation, all 23 inputs
to the device have registers. This structure permits
pipelined operations, which allow external data to be
synchronized or CPU bus-oriented data to be latched.
Input registers can be clocked from either of two input
clock sources, on pin 2 or pin 3.
The CY7C330 offers 258 variable product terms for
16 state registers. This allows you to design very complex
sequential machines with virtually no limitation of
product terms. These designs can easily exceed the size
you want to manage with Karnaugh mapping. However,
the new generation of advanced EPLD compilers can
manage very complex state machine designs on
workstations such as the IBM PC/XT.

Overview of the CY7C330
An easy way to picture the CY7C330 is with the
block diagram in Figure 1. On the input side of the
CY7C330 (pins 1 - 7 and 9 - 14) are 11 input registers
and three clocks. Pin 1 is the state clock. Each of the 11
input registers is edge triggered, and each can use
either device pin 2 (clock 1) or pin 3 (clock 2) (shown
in Figure 2) as a clock. An architecture bit for each
input register controls the selection of the input clock.
This approach allows input data to be synchronized to a
clock edge or loaded into the device from a CPU data
bus, with the clocks being decoded I/O-write signals.
The registers' setup and hold times are very short,
allowing high system throughput. Note that the outputs of
the registers feed the device's AND-OR-XOR array.
Pin 14 has an additional function that affects the
input register: You can use the pin as a fast,
asynchronous output enable to the device, allowing a
CPU to move data in the state machine registers onto a
bus, for example.
On the I/O side of the device (pins 15 - 20 and 23 - 28)
are 12 macrocells. Each I/O macrocell (see Figure 1
in "Using ABEL to Program the CY7C330") contains a
D-type register, an input register with clock controls,
and output-enable resources. Architecture bits for
feedback selection, output-enable configuration, and
input-register clock selection allow you to configure each
macrocell independently.

Figure 1. The CY7C330 Block Diagram

Each pair of adjacent I/O macrocells shares an input
multiplexer (Figure 3). This allows either macrocell register
to be hidden, while the I/O pin is used as an input. In
addition, four hidden register macrocells (see Figure 3
in "Using ABEL to Program the CY7C330") provide
additional state registers without direct output connections.
The AND-OR-XOR array in Figure 1 has 66 inputs
and 244 product terms driving 16 OR-XOR gates. The
16 OR gates have from nine to 19 inputs (variable
product terms), which allow complex designs to fit into
each stage. An XOR product term for each OR output
permits equations to be solved either with D or T
flip-flops in the output stage, or for active-High or
active-Low equations. Twelve product terms provide the
output-enable function. A global reset and preset is also
generated out of the array. Each product term forms an
AND function with up to 66 inputs. The 66 inputs are
the true and complement signals of 33 internal nodes in
the CY7C330.
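
The role of the XOR product term can be made concrete with a small behavioral model. The Python sketch below is an illustration under simplifying assumptions, not a description of the silicon: a product term is modeled as a dictionary of required node levels, an empty term is treated as a constant 1, and the names product_term, macrocell_d, and toggle_sequence are invented for this example. With the OR terms computing a toggle condition T and the XOR term fed by the register output Q, the D input becomes Q XOR T, which is T flip-flop behavior. With the XOR term held at 1, the OR output is simply inverted, which covers the active-Low case.

```python
from typing import Dict, List

Term = Dict[str, int]   # node name -> required level (1 = true, 0 = complement)


def product_term(term: Term, nodes: Dict[str, int]) -> int:
    """AND of the selected node levels; an empty term is modeled as 1."""
    return int(all(nodes[name] == level for name, level in term.items()))


def macrocell_d(sum_terms: List[Term], xor_term: Term,
                nodes: Dict[str, int]) -> int:
    """D input of the register: (OR of the sum terms) XOR (XOR term)."""
    or_out = int(any(product_term(t, nodes) for t in sum_terms))
    return or_out ^ product_term(xor_term, nodes)


def toggle_sequence(t_inputs: List[int]) -> List[int]:
    """Clock a D register wired as a T flip-flop: D = T XOR Q."""
    q = 0
    history = []
    for t in t_inputs:
        q = macrocell_d([{"T": 1}], {"Q": 1}, {"T": t, "Q": q})
        history.append(q)
    return history


if __name__ == "__main__":
    print(toggle_sequence([1, 1, 0, 1]))
    # Active-Low output: the XOR term is held at 1, inverting A AND B.
    print(macrocell_d([{"A": 1, "B": 1}], {}, {"A": 1, "B": 1}))
```

A counter bit that only needs to say "toggle when all lower bits are 1" takes one product term in T form, where the equivalent D-form equation grows with the bit position.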

Figure 2. The CY7C330 Input Macrocell

Figure 3. The CY7C330 Shared Input Multiplexer

Figure 4. Four CY7C330 I/O Macrocell Configurations


Macrocell State Registers
The CY7C330's OR-XOR gates feed into 16 state
registers. These registers are edge-triggered D flip-flops
with pin 1 serving as clock. The outputs from these state
registers feed back into the array, allowing you to
construct high-speed state machines. The total feedback
time period from Q to D and the array delay from input
register to state register is 15 ns, allowing a full, usable
clock rate of 66 MHz.
Four of the CY7C330's state registers are always
hidden inside the device. A hidden register lets you
build intermediate states or other functions without
loading an I/O pin. Of the 12 remaining registers, up to
six can be hidden. This gives a total of 10 maximum
usable hidden registers, while allowing the 28-pin device
to have 17 dedicated input pins, six I/O pins, and many
other combinations. Valid I/O macrocell configurations
appear in Figure 4.
Each I/O macrocell (pins 15 - 20 and 23 - 28) also
has an edge-triggered input register with either pin 2 or
pin 3 serving as clock. The total register count is 39:
16 state registers and 23 input registers. To keep the
device speed as high as possible, the number of inputs
to the array is limited to 33 (x2); six of the array inputs
from the I/O macrocells are multiplexed (shared). Thus,
three feedbacks are provided for the two output and
two input registers for each set of two I/O pins. The
easiest way to understand the net result is that the
maximum number of hidden registers in the 12 I/O
macrocells is six. Output registers that have no feedback to
the array are useful for data outputs or
single-clock-delayed Mealy outputs from the state machine.
The 12 macrocells have 24 registers total and 18
feedbacks. When you assign functions in your application
to physical pins in the device, consider the number
of feedbacks available and the number of product terms
required.

Center Pinning
All Cypress CY7C330 family products use center
pins for VCC and VSS connections. In addition, the VSS
for the internal logic and the VSS for the output drivers
are on different pins. Center power pins reduce the noise
generated by both TTL and CMOS devices. This noise
is inductive noise proportional to the package lead
inductance. Moving the power pins to the center lowers
pin inductance and noise by a factor of 3 compared
with corner-pin power connections.
Splitting the ground lines, with the ground for input
and logic on pin 8 and the ground for output drivers on
pin 21, has additional noise benefits. Ground-bounce
noise is caused when outputs switch from High to Low.
The more pins switching at the same time, the more
noise generated. Several hundred millivolts can be
induced on the chip's internal ground from this effect.
Although the level is low enough to meet output VOL specs,
the noise voltage must be considered when designing
the input buffers on a chip because the noise influences
the VIL spec of 0.8V. 400 mV of ground-bounce noise
shifts the AC effective VIL to 1.2V.
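
The ground-bounce arithmetic above can be written out explicitly. This is a minimal Python sketch that assumes the bounce adds directly to the input Low threshold as seen from the board ground; the function name is illustrative.

```python
def effective_vil(vil_spec_v: float, ground_bounce_v: float) -> float:
    """Input Low threshold seen from the board when the chip ground rises."""
    return vil_spec_v + ground_bounce_v


if __name__ == "__main__":
    # 0.8V VIL spec plus 400 mV of ground bounce, as in the text.
    print("%.1f V" % effective_vil(0.8, 0.4))
```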
By separating the input reference ground from the
output ground where the noise is generated, the need for
ground noise compensation is reduced or eliminated. This
permits Cypress to offer a faster input buffer. Externally, the
two grounds are connected together. Also, by placing
the VCC pin close to the GND pin, external 0.1 µF
capacitors (as usual, one per chip) can be very close to
the actual device power pins.
All Cypress EPLDs permit the registers to be
preloaded into any configuration. This capability can
vastly reduce the test time and allows all patterns
programmed into an EPLD to be completely tested.
Without preload, for example, testing a multibit counter
that has no reset product term could be very slow or
impossible.

CY7C33X Family Technology
The CY7C330 and most other new Cypress
products are built in the Cypress 0.8-micron, N-well
CMOS, high-speed technology. New Cypress EPLDs
use a dual-metal-layer connection method to further
increase speed. This technology allows Cypress to build
static RAMs with 7-ns access times, 35-MHz FIFOs, a
33-MHz RISC processor, and many other
high-performance products.
Cypress uses an EPROM technology (as distinct
from fuse-link or EEPROM technology) for all its
EPLDs and (E)PROMs because of the tremendous
increase in manufacturing yields and the 100-percent
testability offered by EPROM technology. This
UV-erasable EPROM technology provides proven data
retention, testability, and manufacturability.
In addition, the Cypress 2T (two-transistor) cell
design allows very high speed circuits to be built. One
transistor is used only for programming and the other
only for reading, with each optimized for its one function.
The program transistor can be larger and slower. It is
designed to withstand 15V source to drain, which is the
maximum program charge on the floating gate. The
read transistor can be very small and fast. Because the
read bit line is only switching between 0 and 5V, the
sense amp is smaller and faster, and no high-current
15V driver MOSFETs are present. The result is very
fast (sub-10-ns) array times.
All Cypress devices offer protection against static
discharge (ESD). This means the devices are no more
sensitive than bipolar devices. By using a unique -3V
substrate bias generator (Vbb), Cypress devices are
protected from latchup caused by transient voltages
below ground, which are commonly seen in TTL systems.
This internally generated Vbb also allows the
device to maintain high speed over a wide temperature
range by controlling switching thresholds. No current
flows in an input even under extreme undershoot
situations, and the input transistor requires no recovery time
after an undershoot.
In addition to substrate bias for latchup elimination,
Cypress uses a stacked TTL output driver. This
feature removes the pin-to-P-channel-transistor connection,
a major source of latchup. Reducing the energy in
High-to-Low transitions also reduces overshoot and
noise generation. Virtually all high-performance systems
using TTL or CMOS adhere to the TTL standard voltage
specification: 2.0V for a TTL High and 0.8V for a
TTL Low. Thus, a P-channel output transistor that pulls
the output to VCC causes more problems than it solves
because it overdrives the output. The lower output
voltage of a stacked N-channel output driver (3.5V vs.
5.0V) causes less noise on the High-to-Low transition
because less energy needs to be switched.
Cypress uses stacked N-channel transistors on the
outputs of all devices, eliminating latchup and fast
transition to an overly high output 1 level. The devices are
more compatible with the TTL devices Cypress
replaces.


Resource Planning
Planning the assignment of functions to pins in the
CY7C330 is an important step in a CY7C330 design.
The resource planning sheet presented in Table 1
should be helpful for this procedure. Examples of its
use are included with each application presented here.
The decision on which pin to use is based on:
1. Asynchronous output enable, set to pin 14, or
synchronous enable with a product term
2. State clock is pin 1
3. Input clock is pin 2
4. Second input clock is pin 3, or use pin 3 as a normal
input if pin 2 will be the only input clock
5. Input only on pins 4 - 7 and 9 - 13
6. Device outputs: Assign pins keeping in mind that
they have different product term widths. The widths
are: 9, 11, 13, 15, 17, 19 for pins 28/15, 26/17, 24/19,
23/20, 25/18, 27/16, respectively
7. Use of hidden registers:
a. Four registers - H1 to H4 - are always hidden
b. Up to six additional hidden registers can be
defined; Cypress suggests this sequence: 25, 18, 27,
16, 23, 20
d. Assign input names to these six registers that are
different from the physical device pin names
e. The optionally hidden registers can be viewed if
their output enable is made active and the external
logic driving the pin is in a high-impedance state;
otherwise the OE (output enable) product term of
the hidden register must be set to zero
(NAME.ENA = 0)


Table 1. A CY7C330 Resource Planning Sheet

CY7C330 Resources Planning Sheet
Project: Your project name

      Input Register  Input Register  Register  Output        # of
Pin   Function        Clock           Function  Enable        PTerms
1     State Clk
2     Clk 1
3     Input/Clk 2     1 if Input
4     Input           1/2
5     Input           1/2
6     Input           1/2
7     Input           1/2
8     VSS
9     Input           1/2
10    Input           1/2
11    Input           1/2
12    Input           1/2
13    Input           1/2
14    Input/OE        1/2 if Input
15    Input           1/2 if Input    Output    Pin 14/Pterm  9
16    Input           1/2 if Input    Output    Pin 14/Pterm  19
17    Input           1/2 if Input    Output    Pin 14/Pterm  11
18    Input           1/2 if Input    Output    Pin 14/Pterm  17
19    Input           1/2 if Input    Output    Pin 14/Pterm  13
20    Input           1/2 if Input    Output    Pin 14/Pterm  15
21    VSS
22    VCC
23    Input           1/2 if Input    Output    Pin 14/Pterm  15
24    Input           1/2 if Input    Output    Pin 14/Pterm  13
25    Input           1/2 if Input    Output    Pin 14/Pterm  17
26    Input           1/2 if Input    Output    Pin 14/Pterm  11
27    Input           1/2 if Input    Output    Pin 14/Pterm  19
28    Input           1/2 if Input    Output    Pin 14/Pterm  9
H1    None                                      None          19
H2    None                                      None          11
H3    None                                      None          17
H4    None                                      None          13

Notes:
Input Register Clock: #1 is pin 2, #2 is pin 3.
See the Application Note for the meaning of the pin names.
Output Enable = 14 means the asynchronous pin 14 direct enable.
Z means the pin is never active.

8. The remaining visible registers can still be used in
applications where both inputs of a macrocell pair
are used. However, one of the output registers of
each adjacent pair cannot have a feedback; it is used
only as an output synchronized by the state clock on
pin 1. If, after this assignment, the compiler or
assembler complains that not enough product terms
are available, some pins might have to be re-assigned
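
The product-term widths in the planning rules lend themselves to a quick pre-check before running a compiler. The Python sketch below is a hypothetical helper, not a Cypress tool: it takes the number of product terms each output equation needs and places the widest requirement first on the narrowest free pin that still fits. The pin widths are copied from Table 1; the greedy strategy, the function name assign_outputs, and the signal names are assumptions made for illustration. The script also checks that the widths add up to 228 OR terms, which together with 16 XOR terms gives 244, and with 12 output-enable terms plus reset and preset gives 258. That decomposition is an interpretation of the figures quoted in this note.

```python
from typing import Dict

# Product-term widths per I/O pin and per hidden register, from Table 1.
PIN_PTERMS = {15: 9, 16: 19, 17: 11, 18: 17, 19: 13, 20: 15,
              23: 15, 24: 13, 25: 17, 26: 11, 27: 19, 28: 9}
HIDDEN_PTERMS = {"H1": 19, "H2": 11, "H3": 17, "H4": 13}


def assign_outputs(required: Dict[str, int]) -> Dict[str, int]:
    """Map each signal to a pin whose OR gate is wide enough for it."""
    free = sorted(PIN_PTERMS, key=lambda pin: (PIN_PTERMS[pin], pin))
    placement = {}
    # Widest requirement first; each takes the narrowest pin that fits.
    for name, need in sorted(required.items(), key=lambda kv: -kv[1]):
        pin = next((p for p in free if PIN_PTERMS[p] >= need), None)
        if pin is None:
            raise ValueError("no free pin with %d product terms for %s"
                             % (need, name))
        free.remove(pin)
        placement[name] = pin
    return placement


OR_TERMS = sum(PIN_PTERMS.values()) + sum(HIDDEN_PTERMS.values())

if __name__ == "__main__":
    print(assign_outputs({"cnt0": 18, "cnt1": 10, "flag": 9}))
    print(OR_TERMS, OR_TERMS + 16, OR_TERMS + 16 + 12 + 2)
```

A real assignment also has to respect the feedback limits of rule 8 and the hidden-register sequence of rule 7, which this sketch does not model.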

Software Design Tools
You can compile logic for the CY7C330 with a
number of packages available from independent
software vendors. These packages include ABEL V3.0
from Data I/O and LOG/iC V3.0 from ISDATA.
Cypress has developed the PLD ToolKit (CY7C3101),
which you can use to design any PLD that Cypress
makes. All these packages are logic compilers capable
of converting state machine or binary logic descriptions
into a JEDEC file that can program the device.
The JEDEC file is the standard interface from a
software development tool to a logic programmer. See
the examples section for more detail on the software
tools.

Logic Programmers
The CY7C330 can be programmed today on the
QuickPro plug-in board for IBM and compatible personal
computers. Soon you will also be able to use the
Data I/O, STAG, and other programmers.
Some software tools require you to set fuses or bits
in the device to enable certain functions, whereas others

Figure 5. Pipelined Buffer Block Diagram

Pipelined Buffer
The Pipe330 example is a two-stage pipeline that
shifts parallel data from the inputs to the outputs (Figure 5).
This example demonstrates the overall Cypress
PLD ToolKit source syntax and shows how macrocells
are configured.
In the Pipe330 example, the output enable for
specific macrocells is under control of either pin 14 or
the associated product term. The latter case is the
default. To control the output enable of a macrocell
with pin 14, add NENBPT to the list of attributes following
the node assignment in the configuration section.
If NENBPT does not appear in the attribute list for
a node, the expression that follows the construct
<OE> in the equations controls the output enable. If
<OE> is not part of the equation, the output is permanently
disabled. If <OE> is present, but no expression follows it,
the output is permanently enabled.
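The output-enable rules above amount to a small decision table. The Python sketch below restates them; the function name and its argument encoding (None for "no output-enable construct in the equation", an empty string for "construct present with no expression") are assumptions made for illustration, not PLD ToolKit behavior.

```python
def output_enable_source(nenbpt, oe_expr):
    """Return what controls a macrocell's output enable.

    nenbpt  -- True if NENBPT appears in the node's attribute list.
    oe_expr -- None if the equation has no output-enable construct,
               "" if the construct is present with no expression,
               otherwise the text of the expression that follows it.
    """
    if nenbpt:
        return "pin 14"                      # direct enable from pin 14
    if oe_expr is None:
        return "permanently disabled"        # no enable construct at all
    if oe_expr.strip() == "":
        return "permanently enabled"         # construct present, nothing after it
    return "product term: " + oe_expr.strip()


if __name__ == "__main__":
    for args in [(True, None), (False, None), (False, ""), (False, "OE1 & OE2")]:
        print(args, "->", output_enable_source(*args))
```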
The pin 1 signal always clocks the output registers
in the CY7C330. Either the pin 2 or pin 3 signal can clock
the input registers. Because pin 2 is the default clock,
no special attributes are required for this configuration.
If you wish to clock an input register with pin 3, the
attribute list for that node must contain ICLK=3.
The resource planning sheet for the pipelined buffer
appears in Table 2, and the source code appears in
Appendix A.
Test patterns for the Pipe330 example are relatively
simple, but keep a few guidelines in mind. First, the
state of the registers in the device is initially unknown,
so all registers must be put in a known (non-X) state before
any outputs are checked. Second, CY7C330 simulation must
account for multiple clocks. Treat the input and output
clocks separately, because simultaneous clock assertion is
not guaranteed in programmers, or in any real system
for that matter.

Up/Down Toggle Counter with Preloads

set the architecture bits automatically. Note that bit
17070 requires special attention: it must be set to 1 if
any input register uses a clock from pin 3. This requirement
will disappear in future releases of the software
packages, and the bits will be set automatically.

The Tog330 example shows how you can use the
CY7C330's XOR product terms to emulate a T-type
flip-flop. The statement:
Q = <XSUM> Q
    <SUM> T;
programs the XOR product term with the feedback of
the register output, making the register into a T type.
The T-type register configuration is active Low because,
by architecture, all the outputs are active Low. You can
emulate a JK-type flip-flop by using the configuration
above with the following relation:
T = J!Q + KQ
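The JK relation can be checked exhaustively. The short Python sketch below compares a register whose next state is Q XOR T, with T = J!Q + KQ, against the textbook JK behavior (hold, set, reset, toggle). The function names are introduced here for illustration, and the model ignores the active-Low output polarity of the device.

```python
from itertools import product


def t_register_next(q, t):
    """Next state of a register whose D input is Q XOR T (the XOR product term)."""
    return q ^ t


def jk_next(q, j, k):
    """Textbook JK flip-flop: 00 hold, 10 set, 01 reset, 11 toggle."""
    if j and k:
        return q ^ 1
    if j:
        return 1
    if k:
        return 0
    return q


def relation_holds():
    """True if T = J!Q + KQ makes the T register behave as a JK flip-flop."""
    for q, j, k in product((0, 1), repeat=3):
        t = (j & (q ^ 1)) | (k & q)
        if t_register_next(q, t) != jk_next(q, j, k):
            return False
    return True


if __name__ == "__main__":
    print(relation_holds())
```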


Table 3 presents the resource planning sheet for
the toggle counter example, and the source code appears in Appendix B. Figure 6 shows the block diagram
for the design.

Up/Down Counter with Limits
The up/down counter example shows how you can
assign the pins for maximum use in the CY7C330. This
counter operates at 66 MHz, counting up until reaching
the value stored in the 8-bit upper-limit register, then
down until reaching the lower limit. Also included are a
device reset and a method to preload the counter to
either the upper or lower limit.
Consider an application in which the two 8-bit limit
registers are loaded from a CPU. The lower limit is on
pins 4 to 12, with a 9th bit for preload on pin 13. The
clock for this lower limit is on pin 2. The upper limit is
loaded via pins 15 - 27, with pin 27 providing the 9th
(preload) bit. These pins are also used for reading out the
counter value, and pin 14 is the output enable for the
up/down counter.

Table 2. Resource Planning Sheet for Pipelined Buffer

CY7C330 Resources Planning Sheet
Project: Pipelined Buffer
Input
Register
Pin
Function
1
2
3
4
5
6
7
8
9
10
11
12
13

14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
H1
H2
H3
H4

Input
Register
Clock

State Clk
Clk 1 (LHS)
Clk 2 (RHS)
I4
I5
I6
I7
VSS
I9
I10

1
2

I11

2

I12
I13
OE

2

Register
Function

Output
Enable

# of

Z
Z
Z
Z

PTerms

2

Q19
Q20

Pterm (Eqn)
Pterm (Eqn)

9
19
11
17
13
15

Q23
Q24
Q25
Q26
Q27
Q28

Pterm (Eqn)
Pterm (Eqn)
Pin 14
Pin 14
Pin 14
Pin 14
None
None
None
None

15
13
17
11
19
9
19
11
17
13

VSS

VCC

None
None
None
None

Notes: Input Register Clock #1 is pin 2
#2 is pin 3
See the Application Note for the meaning of the pin names.
Output Enable = 14 means the asynchronous pin 14 direct enable.
Z means the pin is never active

Figure 6. Toggle Counter Block Diagram

Figure 8. 16X16 Crossbar Switch Block Diagram

Four buried registers detect equality of the counter
with the limits to maintain up/down direction and to
detect the preload request as an edge-triggered signal.
By using the XOR product terms, the counter needs
only nine total products even on the most significant bit.
Without XOR, the 8th bit would need 18 product terms
because of the two preload sources. Due to the large
number of product terms per output in the CY7C330,
this counter can operate at 66 MHz.
The counter's contents can be read out when pin 14
(direct output enable) is Low. In a bus-oriented system,
a microprocessor can read the register if a decoded I/O
read signal is applied to pin 14. Note that the other
method of output enable, via the array, requires a clock
edge to load the enable input condition into the input
registers. When pin 14 is High, the upper-limit register
can be loaded-from a processor bus, for example. The
lower-limit register can be loaded at any time.


Figure 7 shows the block diagram for this design.
The resource planning sheet appears in Table 4, and the
code is in Appendix C.
In operation, the up/down counter counts between
the limits stored in two registers. Lower-limit (LL) data
is loaded on the positive edge of the pin 2 clock. There
are 8 data bits plus 2 control bits, LPL and Reset. If
LPL is Low, only the limit compare register is changed.
If LPL is High, the LL data is loaded into the counter
on the next clock edge, and the counter counts up. The
LL data is one count higher than the actual lower limit.
If RESET is active, all internal registers are reset to 0,
so long as the reset bit is set in the LL register.
Upper-limit (UL) data is loaded on the positive
edge of the pin 3 clock. This part of the counter uses 8
data bits plus a preload control bit, UPL. If UPL is
Low, only the limit-compare register is changed. If UPL
is High, the UL data is loaded into the counter on the
next clock edge, and the counter counts down. UL data
is multiplexed with the counter output data. The UL
data is one count lower than the actual upper limit. Pin
16 is the RESET input. Pin 14 is the active-Low output
enable for the counter; the counter can be read at any
time. Pin 1 is the clock for the counter. Pins 18 and 20
are connected together for data bit 6. Pins 23 and 25
are connected together for data bit 7.
The buried (hidden) registers are used as follows:
H1 is loaded with the result of the comparison between
the counter and UL. H2 is UPL or LPL, delayed by one
clock edge; H2 serves as an edge detect. H3 is loaded
with the result of the comparison between the counter
and LL. H4, when High, forces the counter to count up.
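The one-count offset between the stored limits and the actual limits follows from the direction decision being registered. The Python sketch below is a simplified behavioral model, not a translation of the Appendix C equations: it assumes a single registered direction flag that is updated from the comparison on the same clock edge that advances the counter, and it ignores preload, reset, and signal polarity. The function names are introduced here for illustration.

```python
def step(count, up, lower, upper):
    """One clock edge: both registers update from the values before the edge."""
    next_count = (count + (1 if up else -1)) & 0xFF   # 8-bit counter
    if count == upper:
        next_up = False        # comparison seen now takes effect one edge later
    elif count == lower:
        next_up = True
    else:
        next_up = up
    return next_count, next_up


def run(lower, upper, cycles):
    """Start at the lower limit counting up; return the visited counts."""
    count, up = lower, True
    trace = [count]
    for _ in range(cycles):
        count, up = step(count, up, lower, upper)
        trace.append(count)
    return trace


if __name__ == "__main__":
    print(run(lower=3, upper=6, cycles=12))
```

With stored limits of 3 and 6, this model swings between 2 and 7: the counter passes each stored limit by one count before the registered direction change takes effect, which is consistent with the text's statement that the LL data is one count higher, and the UL data one count lower, than the actual limits.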

Figure 7. Up/Down Counter Block Diagram

16 x 16 Crossbar Switch
A data switch capable of multiplexing 16 inputs
into four outputs can be built with one CY7C330. The
66-MHz clock rate allows even asynchronous input signals
of up to 33 MHz to be switched through the device.
The compact 300-mil package saves PCB space, in contrast
to the space such a multiplexer would otherwise
need. At least 40 pins would normally be required, partitioned as follows:
16 input pins,
4 output pins,
4 x 4 = 16 selection inputs,
4 pins for power and clock connections.
No other PLD today can perform this function
using a single device, due to the logic requirement (the
number of product terms required per output) as well
as the timing requirement.

The crossbar switch uses 12 state registers plus four
input registers to act as the 4 x 4-bit selection registers.
Each output channel needs a 4-bit register to select one
of 16 input channels. A 4-stage, 4-bit-wide shift register
implemented in the device holds the select status. This
allows the 4 x 4 selection bits to be loaded via only four
pins, without needing any address pins.
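The selection scheme can be pictured with a small behavioral model. The Python sketch below is an illustration under stated assumptions: the `Crossbar` class and its method names are hypothetical, stage k of the shift register is taken to select output k, and the model ignores the input-register latency and the inverted storage of some stages in the real device.

```python
class Crossbar:
    """16-input, 4-output selector loaded through a 4-stage, 4-bit shift register."""

    def __init__(self):
        # select[k] is the 4-bit code choosing which of 16 inputs drives output k.
        self.select = [0, 0, 0, 0]

    def shift_in(self, nibble):
        """A new 4-bit code enters stage 3; older codes move toward stage 0."""
        self.select = self.select[1:] + [nibble & 0xF]

    def outputs(self, data):
        """Route the 16 input bits to the 4 outputs according to the select codes."""
        if len(data) != 16:
            raise ValueError("expected 16 input bits")
        return [data[code] for code in self.select]


if __name__ == "__main__":
    xb = Crossbar()
    for code in (5, 0, 15, 9):        # the first code shifted in ends up in stage 0
        xb.shift_in(code)
    print(xb.select, xb.outputs([i % 2 for i in range(16)]))
```

Loading all sixteen selection bits therefore needs only the four data lines and a clock, which is the point of the shift-register arrangement.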
When the PL (PRELOAD) signal on pin 3 is Low,
input data bits 0 to 3 become the selector data lines;
five clock pulses shift the select data through the device

Table 3. Resource Planning Sheet for Toggle Counter

CY7C330 Resources Planning Sheet
Project: 4 Bit Toggle Counter

      Input Register   Input Register   Register   Output    # of
Pin   Function         Clock            Function   Enable    PTerms
1     State Clk
2     Clk 1
3     Clear
4
5
6
7
8     VSS
9
10
11
12
13
14
15                                      !Q0        Pterm     9
16                                      !Q1        Pterm     19
17                                      !Q2        Pterm     11
18                                      !Q3        Pterm     17
19                                                 Z         13
20                                                 Z         15
21    VSS
22    VCC
23                                                 Z         15
24                                                 Z         13
25                                                 Z         17
26                                                 Z         11
27                                                 Z         19
28                                                 Z         9
H1    None                                         None      19
H2    None                                         None      11
H3    None                                         None      17
H4    None                                         None      13

Notes: Input Register Clock #1 is pin 2
#2 is pin 3
See the Application Note for the meaning of the pin names.
Output Enable = 14 means the asynchronous pin 14 direct enable.
Z means the pin is never active

into selectors 1, 2, and 3, as well as the output pins.
Setting pin 3 High after the fifth pulse loads the signals
on the output data pins into select register 0. This last
load operation utilizes the function of pin 3 as a data
pin as well as a clock. Setting the signal on pin 3 Low
switches the internal logic from a selector into a shift
register; the clock edge created by applying a High to
pin 3 loads the data outputs into the input registers
associated with output pins 16, 18, 25, and 27.
This design buries the output registers of several
I/O macrocells and uses the pin as an input by utilizing
a shared-input mux. The source file's configuration section
specifies this arrangement by first assigning the
name of the output register to the macrocell node number.
Because the default configuration is for the output
register's Q output to feed back into the array, no other
configuration attributes are needed here. Next, the
input's name is assigned to the node number of the
shared input mux adjacent to the pin. The default for
the shared input muxes is to pass the data on the
even-numbered pin into the array. If the input should come
from an odd-numbered pin, you must add the attribute
Table 4. Resource Planning Sheet for Up/Down Counter

CY7C330 Resources Planning Sheet
Project: Up/Down Counter with Limits

      Input Register   Input Register   Register          Output    # of
Pin   Function         Clock            Function          Enable    PTerms
1     State Clk
2     Clk 1
3     Clk 2
4     LL0              1
5     LL1              1
6     LL2              1
7     LL3              1
8     VSS
9     LL4              1
10    LL5              1
11    LL6              1
12    LL7              1
13    PRELOAD LOW      1
14    COUNTER OE
15    UL1              2                CNT1              Pin 14    9
16    Reset            1                -                 Z         19
17    UL3              2                CNT3              Pin 14    11
18    UL6              2                -                 Z         17
19    UL4              2                CNT4              Pin 14    13
20    -                -                CNT6              Pin 14    15
21    VSS
22    VCC
23    -                -                CNT7              Pin 14    15
24    UL5              2                CNT5              Pin 14    13
25    UL7              2                -                 Z         17
26    UL2              2                CNT2              Pin 14    11
27    PRELOAD HIGH     2                -                 Z         19
28    UL0              2                CNT0              Pin 14    9
H1    None             -                Up Equals         None      19
H2    None             -                U/L Prel Done     None      11
H3    None             -                Down Equals       None      17
H4    None             -                Up Count          None      13

Notes: Input Register Clock #1 is pin 2
#2 is pin 3
See the Application Note for the meaning of the pin names.

SRC=N (where N is the pin number) to the list of attributes
in parentheses following the node name. For an
example of this syntax, refer to d10 and sa2 in the
source file.
The space advantage of the CY7C330 in this
crossbar switch application becomes especially important
as the size of the matrix increases. A 32 x 32 matrix
requires only 16 devices vs. 64 PALC22V10s or 96 TTL
parts. You can easily load the internal data selection
registers with a Cypress 24-pin EPLD, the PLDC20G10,
and a FIFO. A CPU can load the 16 x 4-bit selector
information into the FIFO, and the PLDC20G10 can
move the data from the FIFO into the device. One
PLDC20G10 and one 16 x 4 (or larger) FIFO are required.
The Cypress CY7C403 is an ideal FIFO for this
application.
Table 5 shows the resource planning sheet for the
16 x 16 crossbar switch, and a block diagram of the
design appears in Figure 8. The source code can be
found in Appendix D.

Table 5. Resource Planning Sheet for Crossbar Switch
CY7C330 Resources Planning Sheet
Project :16 X 16 Crossbar Switch
Input
Register
Pin
Function
1
2
3
4
5
6
7

10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
H1
H2
H3
H4

State Clk
Clk 1
Sel PRELOAD
Data 0
Data 1
Data 2
Data 3
VSS
Data 4
Data 5
Data 6
Data 7
Data 8
Data 9
Data 10
Select DO
Data 11
Select CO
Data 12

Input
Register
Clock

1

2
2

Register
Function

Output
Enable

# of
PTerms

Select A2
Output 3
Select Al
Output 2
Select Cl
Select Dl

Z

9
19
11
17
13
15

Select B2
Select A2
Output 1
Select C2
Output 0
Select D2
Select A3
Select B3
Select C3
Select D3

Z
Z

Pterm
Z

Pterm
Z
Z

VSS
VCC
Data 13
Select BO
Data 14
Select AO
Data 15
None
None
None
None

Notes: Input Register Clock

1

2
1

2

Pterm
Z

Pterm
Pterm
None
None
None
None

15
13
17
11
19
9
19
11
17
13

# 1 is pin 2
#2 is pin 3
See the Application Note for the meaning of the pin names.
Output Enable = 14 means the asynchronous pin 14 direct enable.
Z means the pin is never active


Reading the CY7C330 JEDEC Map
Table 6 should help you read the JEDEC map of a
CY7C330. The pin or node reference number is on the
left. These numbers correspond to the pin and node
numbers on the block diagram in Figure 1.
The column labeled Input True gives the sequential
number (left to right) of the column corresponding to
the non-inverted input to the array. If the number is
even, then the false input is the next-higher integer; if
the number is odd, then the false input is the next-lower
integer.
The table lists the number of product terms in each
output stage, along with the JEDEC offset (sequential
fuse position) for each.
Table 6. The CY7C330 Internal Array Reference List
Pin or
Node

Function

1
2
3
4
5
6

State Clock
Input Clock 1
Input Clock 2
Input Register
Input Register
Input Register
Input Register
VSS
Input Register
Input Register
Input Register
Input Register
Input Register
Input Register
I/O Regs, mux
mux input (node)
I/O Regs, mux
I/O Regs, mux
mux input (node)
I/O Regs, mux
I/O Regs, mux
mux input (node)
I/O Regs, mux
VSS

7
9
10
11
12
13
14
15
N-35
16
17
N-36
18
19
N-37
20
21
22
23
N-38
24
25
N-39
26
27
N-40
28
N-29
N-30
N-31
N-32
N-33
N-34

Input
True

# of
Pterms

OE

XOR

1st
OR

9

L16236

Ll6302

Ll6368

19
11

L14850
Ll3992

Ll4916
Ll4058

L14982
Ll4124

17
13

Ll2738
L9636

Ll2804
L9702

Ll2870
L9768

15

L8514

L8580

L8646

39
36
35
33
30

15

L5280

L5346

L5412

13
17

L4290
L3036

L4356
L3102

L4422
L3168

29
27
24
23

11
19

L2178
L792

L2244
L858

L2310
L914

9

L66

L132

Ll98

0
2
4
6
8
10
12
14
16
18
20
65
62
61
59
56
55
49
46
45

VCC
I/O Regs, mux
mux input (node)
I/O Regs, mux
I/O Regs, mux
mux input (node)
I/O Regs, mux
I/O Regs, mux
mux input (node)
I/O Regs, mux
Sync. Reset
Sync. Preset
Buried Register
Buried Register
Buried Register
Buried Register

LO
Ll6962
40
42

13
17

Ll1814
Ll0626

Ll1870
Ll0692

50
52

11
19

L7722
L6402

L7788
L6468
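The Input True rule for Table 6 (an even column pairs with the next-higher column, an odd column with the next-lower one) reduces to flipping the lowest bit. The Python sketch below shows that, plus a fuse-number helper that rests on an assumption: the row width of 66 columns is inferred here from the 66-fuse spacing between the OE, XOR, and first OR offsets in the table (for example 16236, 16302, 16368), and the idea that a fuse number is the row offset plus the column number is likewise an inference, not a statement from the data sheet. Both function names are hypothetical.

```python
COLUMNS = 66  # assumption: inferred from the 66-fuse spacing of offsets in Table 6


def false_column(true_column):
    """Column of the inverted input that pairs with an Input True column."""
    return true_column ^ 1          # even -> next higher, odd -> next lower


def fuse_number(row_offset, column):
    """Sequential fuse position of one array column within a product-term row."""
    if not 0 <= column < COLUMNS:
        raise ValueError("column outside the assumed 66-column array")
    return row_offset + column


if __name__ == "__main__":
    print(false_column(0), false_column(39))
    print(fuse_number(16236, false_column(0)))
```

Check any fuse number computed this way against the device data sheet before editing a JEDEC file by hand.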


Appendix A. PLD ToolKit Source Code for Pipelined Buffer

CY7C330;                  {Pipe330}

CONFIGURE;
CkS (node=1),             {Output register clock}
Ck1,                      {Input register clock 1}
Ck2,                      {Input register clock 2}
I0 (iclk=3),              {Input 0, clocked by Ck2 (pin 3)}
I1 (iclk=3),              {Input 1, clocked by Ck2 (pin 3)}
I2 (iclk=3),              {Input 2, clocked by Ck2 (pin 3)}
I3 (iclk=3),              {Input 3, clocked by Ck2 (pin 3)}
I4 (node=9),              {Input 4, clocked by Ck1 (pin 2)}
I5,                       {Input 5, clocked by Ck1 (pin 2)}
I6,                       {Input 6, clocked by Ck1 (pin 2)}
I7,                       {Input 7, clocked by Ck1 (pin 2)}
OE1,                      {output enable for Q<7:4>}
!OE2(node=14),            {direct output enable for Q<7:0>}
Q7,                       {Output 7, clocked by CkS, enabled by OE1 & !OE2}
Q6,                       {Output 6, clocked by CkS, enabled by OE1 & !OE2}
Q5,                       {Output 5, clocked by CkS, enabled by OE1 & !OE2}
Q4,                       {Output 4, clocked by CkS, enabled by OE1 & !OE2}
Q3(nenbpt),               {Output 3, clocked by CkS, enabled: pin 14}
Q2(nenbpt),               {Output 2, clocked by CkS, enabled: pin 14}
Q1(node=23,nenbpt),       {Output 1, clk: CkS, OE: pin 14}
Q0(nenbpt),               {Output 0, clocked by CkS, enabled: pin 14}
!RST(iop),                {low asserted reset, I/O macrocell as input}
reset(node=29),           {internal reset node}

EQUATIONS;
reset = RST;

!Q0 = <sum> !I0;

!Q1 = <sum> !I1;

!Q2 = <sum> !I2;

!Q3 = <sum> !I3;

!Q4 = <OE> OE1 & OE2
      <sum> !I4;

!Q5 = <OE> OE1 & OE2
      <sum> !I5;

!Q6 = <OE> OE1 & OE2
      <sum> !I6;

!Q7 = <OE> OE1 & OE2
      <sum> !I7;
{end of file}

Appendix B. PLD ToolKit Source Code for a Toggle Counter

CY7C330;                  {Tog330}

CONFIGURE;
CkS,                      {Count clock. This is pin 1 since it is first in the list.}
Ck1,                      {Input clock. This is pin 2 since it is next.}
!clr,                     {Low true clear. Pin 3 is next in sequential order.}
!OE(node=14),             {Low asserted output enable pin, pin 14}
!Q0(nenbpt),              {Q0-Q3 are the counter outputs - pins 15-18.}
!Q1(nenbpt),
!Q2(nenbpt),
!Q3(nenbpt),
reset(node=29),           {The reset product term is node 29.}

EQUATIONS;
reset = Clr;

Q0 = <XSUM> Q0            {Feeding the register output back into the XOR emulates a T flop.}
     <sum> ;              {T input - No expression after the connective <sum> means always asserted}

Q1 = <XSUM> Q1            {Feeding the register output back into the XOR emulates a T flop.}
     <sum> Q0;            {T input}

Q2 = <XSUM> Q2            {Feeding the register output back into the XOR emulates a T flop.}
     <sum> Q1 & Q0;       {T input}

Q3 = <XSUM> Q3            {Feeding the register output back into the XOR emulates a T flop.}
     <sum> Q2 & Q1 & Q0;  {T input}

{end of file}
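The toggle equations in this listing can be exercised in software. The Python sketch below models each register as Q XOR T with T0 always asserted, T1 = Q0, T2 = Q1 & Q0, and T3 = Q2 & Q1 & Q0, and confirms that the four bits step through a binary count. The function names are introduced here for illustration; the clear input, the output enable, and the active-Low output polarity are left out.

```python
def tick(q):
    """One state-clock edge for the four T-type registers (q[0] is the LSB)."""
    t = [1, q[0], q[1] & q[0], q[2] & q[1] & q[0]]   # T inputs from the listing
    return [qi ^ ti for qi, ti in zip(q, t)]         # XOR with the register feedback


def count_sequence(cycles):
    """Counter values, starting from all registers cleared."""
    q = [0, 0, 0, 0]
    values = []
    for _ in range(cycles):
        values.append(q[0] | (q[1] << 1) | (q[2] << 2) | (q[3] << 3))
        q = tick(q)
    return values


if __name__ == "__main__":
    print(count_sequence(17))
```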


Appendix C. PLD ToolKit Source Code for Up/Down Counter

{File: COUNTER.CYP Date: 11/9/1988}
CY7C330;
CONFIGURE;
CLK(node=1), LLC(node=2), ULC(node=3),    {Count clock, Lower Limit Clock, Upper Limit Clock}
LL0(node=4, iclk=2), LL1, LL2, LL3,       {The Lower Limit register is clocked by pin 2-LLC- by default.}
LL4(node=9), LL5, LL6, LL7,               {The register is located at pins 4-7, 9-12 - pin 8 is Vss.}
LPL(node=13),                             {Lower limit PreLoad}
/CNTOE (node=14),                         {Counter output enable on pin 14}
CNT0 (node=28, nenbpt, oclk=1, iclk=3),   {The counter itself is in the output register of various I/O macrocells}
CNT1 (node=15, nenbpt, iclk=3),           {as noted in the node numbers after the names. Pin 1 always clocks the}
CNT2 (node=26, nenbpt, iclk=3),           {output registers-oclk=1 was included once for documentation.}
CNT3 (node=17, nenbpt, iclk=3),           {'nenbpt' specifies that the output enable is controlled by pin 14}
CNT4 (node=19, nenbpt, iclk=3),           {rather than the output enable product terms in each macrocell}
CNT5 (node=24, nenbpt, iclk=3),           {Most of these macrocells will be bidirectional, with the Upper Limit}
CNT6 (node=20, nenbpt),                   {register residing in the input registers. 'iclk=3' specifies that pin 3}
CNT7 (node=23, nenbpt),                   {clocks the input registers. This overrides the default, pin 2.}
                                          {The output register is fed back into array by default.}
UL0 (node=40, src=28),                    {UL0 is the input reg of pin 28, routed thru shared input mux-node 40}
UL1 (node=35, src=15),                    {UL1 is the input reg of pin 15, routed thru shared input mux-node 35}
UL2 (node=39, src=26),                    {UL2 is the input reg of pin 26, routed thru shared input mux-node 39}
UL3 (node=36, src=17),                    {UL3 is the input reg of pin 17, routed thru shared input mux-node 36}
UL4 (node=37, src=19),                    {UL4 is the input reg of pin 19, routed thru shared input mux-node 37}
UL5 (node=38, src=24),                    {UL5 is the input reg of pin 24, routed thru shared input mux-node 38}
UL6 (node=18, iop, iclk=3),               {UL6 is the input reg of pin 18, 'iop' selects array input from input reg}
UL7 (node=25, iop, iclk=3),               {UL7 is the input reg of pin 25, 'iop' selects array input from input reg}
UPL (node=27, iop, iclk=3),               {Upper limit PreLoad, array input from input reg, clocked by pin 3}
/reset (node=16, iop),                    {Low asserted clear, array input from input reg, clocked by pin 2}
node29 (node=29),                         {The reset product term is node 29}
UP (node=31),                             {buried node 31 selects the counter direction, clocked by pin 1}
LEQUAL (node=32),                         {buried node 32 compares counter with lower limit, clocked by pin 1}
PLDONE (node=33),                         {buried node 33 is the preload done flag, clocked by pin 1}
UEQUAL (node=34),                         {buried node 34 compares counter with upper limit, clocked by pin 1}
EQUATIONS;
ICNTO= < XSUM> ICNTO
< SUM> ILPL & IUPL

< SUM> IPLDONE
< SUM> ILLO & LPL & CNTO
< SUM> ICNTO & ULO & UPL
< SUM> LLO & LPL & ICNTO
 CNTO & /ULO & UPL;
ICNT1= < XSUM> ICNTI
< SUM> ILPL & CNTO & IUPL & IUP
< SUM> ILPL & ICNTO & IUPL & UP
< SUM> ILLI & LPL & PLDONE & CNTI
< SUM> LLI & LPL & PLDONE & ICNTI
< SUM> UPL & PLDONE & lULl & CNTI
< SUM> UPL & PLDONE & ULI & ICNTI
< SUM> CNTO & IPLDONE & IUP
 ICNTO & IPLDONE & UP;

leNT2=: < XSUM> ICNT2
< SUM> ILPL & CNTO & /uPL & /uP & CNTl
< SUM> ILPL & ICNTO & /uPL & UP & ICNTl
< SUM> ILL2 & LPL & CNT2 & PLDONE
< SUM> LL2 & LPL & ICNT2 & PLDONE

< SUM> UPL & CNT2 & /uL2 & PLDONE
< SUM> UPL & ICNT2 & UL2 & PLDONE
< SUM> CNTO & IPLDONE & IUP & CNTl
 ICNTO & IPLDONE & UP & ICNTl;
ICNT3= < XSUM> ICNT3

ILPL&CNTO&/UPL&CNT2&/UP&CNTl
ILPL&/CNTO&IUPL&/CNT2&UP&/CNTl
< SUM> ILL3 & LPL & PLDONE & CNT3
< SUM> LL3 & LPL & PLDONE & ICNT3
< SUM> UPL & PLDONE & /uL3 & CNT3
< SUM> UPL & PLDONE & UL3 & ICNT3
CNTO&CNT2&/PLDONE&IUP&CNTl
/CNTO&/CNT2&IPLDONE&UP&/CNTl;
ICNT4=

 ICNT4
< SUM> ILL4 & LPL & PLDONE & CNT4
< SUM> LL4 & LPL & PLDONE & ICNT4
< SUM> UPL & PLDONE & /UL4 & CNT4
< SUM> UPL & PLDONE & UL4 & ICNT4
 ILPL' & CNTO & IUPL & CNT2 & IUP & CNT3' & CNTl
 /LPL & ICNTO & IUPL & ICNT2 & UP & ICNT3 & ICNT
< SUM> CNTO & CNT2 & IPLDONE & /uP & CNT3 & CNTl
 ICNTO & ICNT2 & IPLDONE & UP & ICNT3 & ICNTl;

ICNT5=

 ICNT5
< SUM> ILL5 & LPL & CNT5 & PLDONE
< SUM> LL5 & LPL & ICNT5 & PLDONE
 UPL & CNT5 & /UL5 & PLDONE
 UPL & ICNT5 & UL5 & PLDONE
< SUM> ILPL & CNTO & IUPL & CNT2 & CNT4 & IUP & CNT3 & CNTl
< SUM> ILPL & ICNTO & /uPL & ICNT2 & ICNT4 & UP & ICNT3 & ICNTl
< SUM> CNTO & CNT2 & IPLDONE & CNT4 & /uP & CNT3 & CNTl
< SUM> ICNTO & ICNT2 & IPLDONE & ICNT4 & UP & ICNT3 & ICNTl;

ICNT6=

 ICNT6
< SUM> ILL6 & LPL & PLDONE & CNT6
< SUM> LL6 & LPL & PLDONE & ICNT6
< SUM> UPL & PLDONE & CNT6 & /uL6
< SUM> UPL & PLDONE & ICNT6 & UL6
< SUM> ILPL&CNTO&/UPL&CNT2&CNT5&CNT4 & IUP & CNT3 & CNTl
< SUM> ILPL & ICNTO & IUPL & ICNT2 & /CNT5 & ICNT4 & UP & ICNT3 & ICNTl
< SUM> CNTO&CNT2&CNT5&/PLDONE&CNT4 & IUP & CNT3 & CNTl
< SUM> ICNTO & /CNT2 & ICNT5 & IPLDONE & ICNT4 & UP & ICNT3 & ICNTl;

ICNT7 =

 ICNTI
< SUM> ILL7 & LPL & CNT7 & PLDONE
< SUM> LL7 & LPL & ICNT7 & PLDONE
 UPL & !UL7 & CNTI & PLDONE
 UPL & UL7 & ICNTI & PLDONE
< SUM> ILPL & CNTO & IUPL & CNT2 & CNTS & CNT6 & CNT4 & IUP & CNT3 & CNTl
 ILPL & ICNTO & /UPL & ICNT2 & ICNTS & ICNT6 & ICNT4 & UP & ICNT3 & ICNTl
< SUM> CNTO & CNT2 & CNT5 & IPLDONE & CNT6 & CNT4 & IUP & CNT3 &CNTl
< SUM> ICNTO & ICNT2 & ICNT5 & IPLDONE & ICNT6 & ICNT4 & UP & ICNT3 & ICNTl;

node29 =  reset;
UP= < XSUM> UP
 lUEQUAL & IUP
 lLEQUAL & UP
< SUM> UPL & PLDONE & IUP
< SUM> LPL & PLDONE & UP;
PLDONE= < SUM> ILPL & IUPL;
LEQU AL= < SUM> LL6 & ICNT6
< SUM> ILL7 & CNT7
< SUM> LL7 & ICNT7
< SUM> LL3 & ICNT3
< SUM> ILLS & CNTS
< SUM> LL5 & ICNT5
< SUM> ILLl & CNTl
< SUM> LLO & ICNTO
< SUM> ILL2 & CNT2
< SUM> ILL4 & CNT4
< SUM> LL4 & ICNT4
< SUM> ILLO & CNTO
< SUM> LLl & ICNTl
< SUM> ILL6 & CNT6
< SUM> ILL3 & CNT3
< SUM> LL2 & ICNT2;
UEQU AL=
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<

< SUM> ICNT6 & UL6
SUM> IUL7 & CNT7
SUM> UL7 & ICNT7
SUM> UL3 & ICNT3
SUM> CNT5 & IUL5
SUM> ICNTS & ULS
SUM> lULl & CNTl
SUM> ICNTO & ULO
SUM> CNT2 & IUL2
SUM> IUL4 & CNT4
SUM> UL4 & ICNT4
SUM> CNTO & IULO
SUM> ULl & ICNTl
SUM> CNT6 & IUL6
SUM> IUL3 & CNT3
SUM> ICNT2 & UL2;


Appendix D. Source Code for Crossbar Switch

CY7C330;
configure;
clk (node=1), iclk, pl,
d0, d1, d2, d3,
d4 (node=9), d5, d6, d7, d8, d9,
d10 (node=35, src=15), d11 (node=36, src=17),
d12 (node=37, src=19), d13 (node=38, src=24),
d14 (node=39, src=26), d15 (node=40, src=28),
sa1 (node=17), sa2 (node=15), sa3 (node=34),
sb1 (node=24), sb2 (node=23), sb3 (node=33),
sc1 (node=19), sc2 (node=26), sc3 (node=32),
sd1 (node=20), sd2 (node=28), sd3 (node=31),
y0 (node=27, iop, iclk=3),     {Input reg is sa0}
y1 (node=25, iop, iclk=3),     {Input reg is sb0}
y2 (node=18, iop, iclk=3),     {Input reg is sc0}
y3 (node=16, iop, iclk=3),     {Input reg is sd0}

EQUATIONS;

/sa1 = <SUM> /pl & /sa2
       <SUM> pl & /sa1;
/sa2 = <SUM> /pl & sa3
       <SUM> pl & /sa2;
sa3 = <SUM> /pl & d0
      <SUM> pl & sa3;

/sb1 = <SUM> /pl & /sb2
       <SUM> pl & /sb1;
/sb2 = <SUM> /pl & sb3
       <SUM> pl & /sb2;
sb3 = <SUM> /pl & d1
      <SUM> pl & sb3;

/sc1 = <SUM> /pl & /sc2
       <SUM> pl & /sc1;
/sc2 = <SUM> /pl & sc3
       <SUM> pl & /sc2;
sc3 = <SUM> /pl & d2
      <SUM> pl & sc3;

/sd1 = <SUM> /pl & /sd2
       <SUM> pl & /sd1;
/sd2 = <SUM> /pl & sd3
       <SUM> pl & /sd2;
sd3 = <SUM> /pl & d3
      <SUM> pl & sd3;

ly3

=

< OE> IpI
 pI & Ida & Isa3 & Isb3 & Isc3 & Isd3
 pI & Idl & sa3 & Isb3 & Isc3 & Isd3
 pI & 1d2 & Isa3 & sb3 & Isc3 & Isd3
 pI & Id3 & sa3 & sb3 & Isc3 & Isd3
 pI & Id4 & Isa3 & Isb3 & sc3 & Isd3
 pI & Id5 & sa3 & Isb3 & sc3 & Isd3
 pI & Id6 & Isa3 & sb3 & sc3 & Isd3
 pI & Id7 & sa3 & sb3 & sc3 & Isd3
 pI & Id8 & Isa3 & Isb3 & Isc3 & sd3
 pI & Id9 & sa3 & Isb3 & Isc3 & sd3
 pI & Isa3 & sb3 & Isc3 & sd3 & IdlO
 pI & sa3 & sb3 & Isc3 & sd3 & Idll
< SUM> pI & Isa3 & Isb3 & Idl2 & sc3 & sd3
 pI & Idl3 & sa3 & Isb3 & sc3 & sd3
 pI & Idl4 & Isa3 & sb3 & sc3 & sd3
 pI & Idl5 & sa3 & sb3 & sc3 & sd3
 IpI & sdl;

1y2 = < OE> IpI
< SUM> pI & Ida & sd2 & sc2 & sb2 & sa2
< SUM> pI & Idl & sd2 & sc2 & sb2 & Isa2
< SUM> pI & Id2 & sd2 & sc2 & Isb2 & sa2
< SUM> pI & Id3 & sd2 & sc2 & Isb2 & Isa2
< SUM> pI & Id4 & sd2 & Isc2 & sb2 & sa2
< SUM> pI & Id5 & sd2 & Isc2 & sb2 & Isa2
< SUM> pI & Id6 & sd2 & Isc2 & Isb2 & sa2
< SUM> pI & Id7 & sd2 & Isc2 & Isb2 & Isa2
< SUM> pI & Id8 & Isd2 & sc2 & sb2 & sa2
< SUM> pI & Id9 & Isd2 & sc2 & sb2 & Isa2
< SUM> pI & Isd2 & sc2 & Isb2.& IdlO & sa2
< SUM> pI & Isd2 & sc2 & Isb2 & Idll & Isa2
< SUM> pI & Isd2 & Isc2 & sb2 & Idl2 & sa2
< SUM> pI & Isd2 & Isc2 & Idl3 &sb2 & Isa2
< SUM> pI & Isd2 & Isc2 & Idl4 & Isb2 & sa2
< SUM> pI & Isd2 & Idl5 & Isc2 & Isb2 & Isa2
 IpI & scI;
Iyl = < OE> IpI
< SUM> pI & IdO & sbl & sdl & scI & sal
< SUM> pI & Idl & sbl & sdl & scI & Isal
< SUM> pI & Id2 & Isbl & sdl & scI & sal

< SUM> pI & Id3 & Isbl & sdl & scI & Isal
< SUM> pI & Id4 & sbl & sdl & Isel & sal
< SUM> pI & Id5 & sbl & sdl & Isel & Isal
< SUM> pI & Id6 & Isbl & sdl & Isel & sal
< SUM> pI & Id7 & Isbl & sdl & Isel & Isal
< SUM> pI & Id8 & sbl & Isdl & sel & sal
< SUM> pI & Id9 & sbl & Isdl & scI & Isal
< Sl!M> pI & Isbl & Isdl & scI & sal & IdlO
< SUM> pI & Isbl & Isdl & scI & Idll & Isal
< SUM> pI & sbl & Isdl & Idl2 & Iscl & sal
< SUM> pI & sbl & Id13 & Isdl & Isel & Isal
< SUM> pI & Idl4 & Isbl & Isdl & Isel & sal
< SUM> pI & Idl5 & Isbl & Isdl & Iscl & Isal
 IpI & sbl;


lyO = < OE> Ipl
< SUM> pI & IdO & lyO & Iyl & lil & ly3
< SUM> pI & Idl & yO & Iyl & lil & ly3
< SUM> pI & Id2 & lyO & yl & lil & ly3
< SUM> pI & Id3 & yO & yl & lil & ly3
< SUM> pI & /d4 & lyO & Iyl & il & ly3
< SUM> pI & IdS & yO & Iyl & il & ly3
< SUM> pI & Id6 & lyO & yl & il & /y3
< SUM> pI & Id7 & yO & yl & il & 1y3
< SUM> pI & IdS & lyO & Iyl & lil & y3
< SUM> pI & Id9 & yO & Iyl & lil & y3
< SUM> pI & lyO & yl & lil & y3 & IdlO
< SUM> pI & yO & yl & lil & IdB & y3
< SUM> pI & lyO & Iyl & Idl2 & il & y3
< SUM> pI & yO & Iyl & Idl3 & il & y3
< SUM> pI & lyO & Idl4 & yl & il & y3
< SUM> pI & IdlS & yO & yl & il & y3
 IpI & sal;


Using the Cypress CY7C330 in Closed-Loop
Servo Control
This application note examines a cornmon facet of
engineering design - control systems - and offers an
alternative to cornmon implementations. Along with an
overview of the subject, this application note explores
the tradeoffs among several implementation strategies.
Also included here is a description of a PLD-based
method that offloads the processing bandwidth require~
ments of a controlling CPU. Implemented in a Cypress
CY7C330 PLD, this method has been successfully
employed in a high-speed customer application - a
laser mirror-positioning servo.

are using to read this line of text, the engine thermostat
in most automobiles, and the print head of a dot-matrix
printer. The closed-loop application described later in
this application note consists of a motor-driven mirror
that can rotate 360 degrees in either direction.
Closed-loop systems use information from the environment under control to influence the output. Block
diagrams such as the one in Figure 1 typically represent
such a control system.
Control System Influences
In a closed-loop design, numerous factors influence
the system behavior. Among them are:
Input, I(t): The system input is the signal from an
external source that references the desired steady-state
behavior. In the mirror servo system, the steady-state
output is the absolute position at a given location within
a given accuracy. The input is also known as the reference or set point.
Summing function: This is the section of the control
system that determines the amount of error, E(t), currently in the system. It is the difference between the reference point and the controlled environment's present
state. In a motor servo system, E(t) is the difference between the target reference position and the motor's
present position. In an analog circuit, an operational
amplifier usually implements the summing function.
Controller. Most control systems incorporate a controller that receives the error signal as an input and
generates. an output that attempts to reduce this error
to within a specific tolerance (ideally 0). The controller
has a control mode that determines how the controller
should manipulate the error signal to produce a control
signal. Cornmon control modes include proportional, integral, differential, and the combination of these three
- PID. Approximately 80 - 90 percent of industrial
control implementations use variations of the PID
method.
Controlled device: The. object of the control system
is to have a controlled device perform satisfactorily.
This is the motor, in the case of the mirror servo.

Control System Concepts
Control system theory is applied to areas as diverse
as pneumatic controls and economic models. Analyzing
control system behavior mathematically relies heavily on
an understanding of Laplace and Z-transforms (see the
References). However, this application note deals with
the subject on a more practical level.
Control systems fall into two major categories:
open loop and closed loop. An open-loop system
generates outputs based on input conditions, but has no
feedback from the output to verify or correct the output
condition. Examples of open-loop systems include light
switches (although you could reasonably argue that the
human is the feedback loop) and self-timed, free-running traffic-control signals.
Closed-loop systems, on the other hand, provide information on system status to the controller. Examples
of closed-loop systems include the eye-brain system you
Figure 1. Closed-Loop Servo System (block diagram; labels: Input, Reference Point, Disturbances, Output, Feedback)

Output, O(t): This is the physical characteristic to
be controlled. In an automobile thermostat system, O(t)
is the engine's temperature. In the mirror servo, O(t) is
the mirror's position.
Disturbance, D(t): Any influence on the system that
negatively affects the desired output is called a disturbance. In an automobile, operation in bumper-to-bumper traffic that reduces airflow through the radiator
is a disturbance to the thermostat.
This is only a partial list of the influences in a
closed-loop control system, but the factors mentioned
are the most significant for the mirror servo. (For more
complete information, consult any of the References.)
Control System Parameters
Some of the parameters used to quantify control
system behavior are:
Accuracy: the difference between ideal and actual
steady-state system behavior.
Settling time: the time required to reach steady state
after the reference point is changed or set.
Percentage overshoot: the difference between the
reference point and the maximum excursion after passing through the reference point.
Jitter: a condition that occurs when the controlling
element improperly overcompensates for an overshoot
of the reference point. The overcompensation results in
an undershoot that is again overcompensated for and
produces overshoot. Jitter can increase the system's settling time or result in unstable oscillations that never attain
the reference point.
Rise time: the time required for the system's output
to increase from 10 to 90 percent of the final value.
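The parameters above can be measured from a recorded step response. The following Python sketch is illustrative only: the function name `step_metrics`, the 2 percent settling band, and the assumption that the output starts at zero and moves toward a positive reference are not taken from this application note.

```python
def step_metrics(times, values, reference, tolerance=0.02):
    """Estimate rise time, percentage overshoot, and settling time.

    Assumes the response starts near zero, the reference is positive,
    and the samples actually reach 90 percent of the reference.
    """
    # Rise time: first crossing of 10 percent to first crossing of 90 percent.
    t10 = next(t for t, v in zip(times, values) if v >= 0.1 * reference)
    t90 = next(t for t, v in zip(times, values) if v >= 0.9 * reference)

    # Percentage overshoot: maximum excursion beyond the reference point.
    overshoot = max(0.0, (max(values) - reference) / reference * 100.0)

    # Settling time: first sample after the last one outside the tolerance band.
    band = tolerance * reference
    last_outside = -1
    for i, v in enumerate(values):
        if abs(v - reference) > band:
            last_outside = i
    settled = last_outside + 1
    settling_time = times[settled] if settled < len(times) else None

    return {
        "rise_time": t90 - t10,
        "overshoot_pct": overshoot,
        "settling_time": settling_time,
    }


if __name__ == "__main__":
    t = [0, 1, 2, 3, 4, 5, 6]
    y = [0.0, 0.5, 1.0, 1.2, 0.95, 1.01, 1.0]
    print(step_metrics(t, y, reference=1.0))
```

A settling time of `None` means the recorded samples never stayed inside the band, which is how sustained jitter would show up in this sketch.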

Control System Implementations
Control system implementations vary from purely
analog to completely digital. Many popular implementations use a hybrid of digital and analog techniques. The
approach described here uses a digital element to perform the summing, the control, and part of the feedback
function. This approach and the pure-analog method
are possibly the most often used.
Each approach has its own tradeoffs. Because
analog systems continuously perform the summing function (usually with an op-amp), they are immune to the
problems associated with data quantization. Thus
analog systems usually offer excellent stability.
Digital hybrids offer good sensitivity, immunity to
noise, resolution, and flexibility, along with minimized
drift. These systems are usually easier to design at a
lower cost, compared to alternatives. Microprocessors
make it relatively easy to implement the system's controller and summing function on one chip.
When you use a microprocessor, you can take advantage of several algorithms for generating the control
signal. The simplest is proportional control, in which
the correction made is proportional to the error signal.
The value by which the error is scaled is the system's

proportionality constant or gain. Proportional control
offers an intuitively reasonable solution: the larger the
error, the larger the corrective signal.
Another control algorithm is integral control,
where the corrective signal is based on the error's time
integral multiplied by a weighting factor. You typically
calculate this value using a numeric approximation. Integral control is usually combined with proportional
control to increase accuracy or reduce steady-state
error.
One other control algorithm, derivative control,
employs a corrective signal comprised of the error
signal's derivative over time multiplied by a weighting
factor. Again, a numeric approximation is used to calculate the derivative. Combining this method with proportional control contributes a stabilizing influence to the
system. However, noisy systems often omit the derivative function because it amplifies high-frequency disturbances.
When all three control algorithms are combined,
they constitute proportional + integral + derivative, or
PID control. You can verify the influences of the integral and derivative methods on PID with analysis
based on Laplace transforms. A PID tradeoff is that it
reduces the processor bandwidth available to perform
other tasks. PID systems also require a finite amount of
time to calculate the output value.
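As a rough illustration of how the three terms combine in software, the following Python sketch computes one PID output per sample. It is a minimal sketch under stated assumptions: the class name `PID`, the gain names `kp`, `ki`, and `kd`, the rectangular approximation of the integral, and the backward-difference approximation of the derivative are illustrative choices, not the method used in the mirror servo (which uses proportional control only).

```python
class PID:
    """Discrete PID controller using simple numeric approximations."""

    def __init__(self, kp, ki, kd, dt):
        self.kp = kp          # proportional gain
        self.ki = ki          # integral weighting factor
        self.kd = kd          # derivative weighting factor
        self.dt = dt          # sampling period in seconds
        self.integral = 0.0   # running approximation of the error's time integral
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        # Integral term: rectangular (Euler) approximation.
        self.integral += error * self.dt
        # Derivative term: backward difference between successive samples.
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)


if __name__ == "__main__":
    controller = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.01)
    print(controller.update(setpoint=1.0, measurement=0.0))
```

Setting `ki` and `kd` to zero reduces the sketch to pure proportional control. The multiplications and the stored state per sample hint at why a PID loop consumes more processor bandwidth than a proportional loop.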
Another factor to consider in a hybrid control system is the system's sampling/processing rate. Several
reference books indicate that the sampling rate for a
closed-loop control system should be significantly above
the minimum dictated by Shannon's sampling theorem.
Thus, rather than operating at the Nyquist rate
(twice the highest frequency sampled), the sampling
rate would be eight to ten times the highest sampled
frequency. The reasons for this practice include an uncertainty associated with determining the sampled
signal's highest frequency component, the possibility of
aliasing, and the decrease in system stability that can
result from a too-low sampling rate. Unfortunately, increasing the sampling rate quickly consumes the available bandwidth of a microprocessor-based implementation.

Using the CY7C330 in Servo Control
The Cypress CY7C330 can help offload the
microprocessor in a high-speed servo control system.
The application described here positions a mirror to
form images with a laser beam. A previous implementation of this system used a 68000 microprocessor in the
servo loop. But as the number of tasks on the 68000
increased, the processor's ability to maintain a stable
servo system became marginal. The CY7C330-based
version maintains servo loop stability as well as freeing
processor system throughput with a minimum of additional cost and complexity.
Several features of the CY7C330 are fundamental
to understanding this design (see the CY7C330 block
Figure 2. CY7C330 Dedicated Input Register (schematic; labels: Inputs, D/Q register, CLK1, CLK2, To Logic Array)

diagram in Figure 1 of "Understanding the CY7C330
Synchronous EPLD"). The dedicated input registers
(Figure 2), for example, allow data to be loaded into the
chip with either of two data input clocks, CLK1 (pin
2) or CLK2 (pin 3). You choose the input clock at program time via an EPROM configuration fuse.
The macrocells (Figure 3) also feature input
registers, again with two clocks for data entry. The
ability to three-state the macrocell output drivers and
load data into the macrocell input register allows you to
use these macrocell input registers to hold reference
values. This is handy in applications such as up/down
counters, where the input registers can hold the counter
upper/lower limit.
In the mirror-servo design, the macrocell input
registers store the mirror's calculated target position
and are clocked by CLK2. While actively controlling the
servo, this design uses the dedicated input registers for
loading the present mirror position from the servo loop.
In command mode, though, the dedicated input
registers hold data from the microprocessor that is used
to calculate a new target position. In either case, CLK1
loads the dedicated input registers.

Figure 3. CY7C330 I/O Macrocell
diagram in Figure 4 shows the general approach used.
The design employs three CY7C330s that each generate
an 8-bit accumulate for 24-bit precision. The
microprocessor provides the CY7C330s with a 24-bit
position reference target for the mirror. The CY7C330s
latch this 24-bit value into their on-board registers.
The CY7C330s perform the control loop's summing
and proportional feedback functions. The PLDs compare the 24-bit desired position to the present position,
which is maintained in an external 24-bit present-position counter. The result is the error multiplied by a
fixed unity gain. This proportional control signal is then
converted to an analog signal, which is converted to a
current level to control the positioning mirror's motor.
The motor's shaft has an optical encoder that
creates a sin-cos analog signal. When converted to digital form, this signal indicates the direction of rotation
and provides a pulse that increments or decrements the
external 24-bit present-position counter. This allows the

Mirror Servo Fundamentals
As Figure 1 shows, the basic mechanism of control
loops is proportional feedback of the error signal. If this
loop acts as a self-contained coprocessor to the main
CPU, the CPU is only required to input the reference
point to which the mirror should be moved. Now the
CPU no longer needs to perform the control algorithm
at a pace equal to the sampling rate. Essentially, the
processor can "set and forget" the servo coprocessor.
One way to implement this servo coprocessor is to
add another microprocessor. This would add software
and hardware (CPU, RAM, ROM, clock, I/O, interrupt
control, etc.), and possibly require an in-circuit
emulator for development if a low-cost microcontroller
is used. Another possibility is to use an analog servo
controller, but the accuracy requirements preclude this
when drift is considered.
Another approach is to use several simple PLDs in
a hybrid control-loop implementation. The system block

Figure 4. CY7C330 Servo Control Loop


result of this addition moves to the macrocell output
registers, and CLK2 clocks it into the same macrocell
input registers that were a source value for the add.
Thus, in this mode, the CY7C330s use the present
value on the dedicated input pins to adjust the target
position in the macrocell input registers with an accumulate cycle. This target-position update cycle is pictured in Figure 5. The microprocessor always provides
data as a delta or step from the present position. The
accumulate can be either an add or subtract. Subtracts
are accomplished by providing the step data from the
microprocessor in 2's-complement form. After alignment, the position and accumulator values are reset to
zero, and the system is ready for operation.
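The target-update accumulate can be modeled in a few lines of Python. This is a behavioral sketch only, assuming a 24-bit register that wraps on overflow; the names `update_target` and `to_twos_complement` are illustrative and do not appear in the design.

```python
WIDTH = 24
MASK = (1 << WIDTH) - 1  # 0xFFFFFF


def to_twos_complement(step):
    """Encode a signed step as a 24-bit 2's-complement value."""
    return step & MASK


def update_target(target, step):
    """Add a signed step to the stored target position (carry in = 0).

    A negative step arrives in 2's-complement form, so the same adder
    performs both the add and the subtract.
    """
    return (target + to_twos_complement(step)) & MASK


if __name__ == "__main__":
    target = 0
    for step in (100, -30, 5):
        target = update_target(target, step)
    print(target)
```

Because the step is always a delta from the present target, the host only has to write one value per move and the accumulator keeps the absolute position.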
In operation, the outputs from the microprocessor
are three-stated, and the value from the 24-bit position
counter is loaded into the dedicated input registers.
This value is always provided in a 2's-complement form
by inverting the position counter's outputs (1's complement) and setting the carry in (Cin) input to one. The
position-counter value is thus subtracted from the
present-target-position value stored in the macrocell
input registers; this forms the proportional error feedback value used to control the servo motor. Figure 6 illustrates this servo control mode.
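The subtraction described above relies on the identity target - position = target + NOT(position) + 1 in modular arithmetic. The Python sketch below checks that identity for 24-bit values. The function name `position_error` and the signed interpretation of the result are assumptions made for illustration.

```python
WIDTH = 24
MASK = (1 << WIDTH) - 1


def position_error(target, position):
    """Return target - position using an adder, an inverter, and Cin = 1."""
    ones_complement = ~position & MASK           # inverted counter outputs
    raw = (target + ones_complement + 1) & MASK  # carry in set to one
    # Interpret the 24-bit result as a signed 2's-complement number.
    if raw & (1 << (WIDTH - 1)):
        return raw - (1 << WIDTH)
    return raw


if __name__ == "__main__":
    print(position_error(1000, 900))   # mirror is short of the target
    print(position_error(900, 1000))   # mirror is past the target
```

The sign of the result corresponds to the direction in which the motor must turn, and the magnitude is the proportional error at unity gain.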
Note that the D/A converter does not need a 24-bit
digital value for control. In practice, the circuit uses an
8-bit D/A value biased such that the eighth bit provides
direction control (clockwise vs. counterclockwise). In
the actual design, the upper 16 bits from the two most-significant CY7C330s are tested for rail High and Low
conditions and generate two offscale bits each

loop to operate as fast as the slowest of the following
elements: the CY7C330s configured as a multistage accumulator/subtractor, the D/A converter, or the A/D
converter. The host microprocessor is completely
decoupled from the servo loop. Should the
microprocessor halt, the servo circuitry continues to
maintain the desired reference position without intervention.

Details of the Mirror Servo
Getting into the inner workings of the mirror servo
loop, the CY7C330 macrocell output registers act essentially as an accumulator. Depending on the mode of
operation, the accumulator generates a value that is
either a new servo-motor target position or the proportional error feedback value to the servo.
When the system starts, the macrocell input
registers wake up with an initial value of 0. These
registers are dedicated to holding the motor's present
target position. At the same time, the external position
counter is set to zero. Then the microprocessor steps
the target position until the laser targets an alignment
sensor.
The following steps accomplish this sequence: First,
the outputs of the external 24-bit position counter are
placed in a three-state condition. These outputs and the
microprocessor's outputs act as inputs to the
CY7C330's dedicated input registers. The processor
drives a step value onto the inputs, and CLK1 clocks
the value into the CY7C330's dedicated input registers.
On CLK1's rising edge, this value is added to the
present value in the macrocell input registers. The

Figure 5. Target Update Mode Operation Sequence (block diagram; labels: microprocessor position data, dedicated input register clocked by CLK1, logic array programmed with accumulator equations, Cin = 0, macrocell output register holding the adder result, macrocell input register holding the target position and clocked by CLK2)
(1) With the external position counter's outputs three-stated, the host microprocessor drives position step data.
(2) Step data (provided in 2's-complement form if a subtract is desired) is loaded into the CY7C330 with CLK1.
(3) Step data is added to or subtracted from the present target position with logic equations to create the new target position.
(4) The new target position is clocked into the macrocell output registers with CLK.
(5) On CLK2, the new target position is clocked into the macrocell input register.

for these conditions. The seven low-order bits, along
with the four offscale bits, are passed to a second PLD
(22V10), which drives the output to the D/A in the correct direction (eighth bit) and with the correct magnitude. If the four offscale bits indicate that the upper
bits are all close to 0, the seven bits to the D/A are
masked to 0. Likewise, if the upper bits are mostly 1,
the D/A bits are set to 1.
The offscale bits are generated to minimize the
number of inputs required for the subsequent PLD that
feeds the D/A converter. The determination of how to
use the offscale bits for compensation in the second
PLD is specific to a given application.

The Accumulator Design
The backbone of the logic in this design is the
CY7C330-based accumulator. The logic that implements this synchronous full adder is described by an
equation for the sum and an equation for the carry of a
given bit. The equation for the sum (S) at bit position n,
with inputs A, B, and carry in (Cin) is:
Sn = (An XOR Bn XOR Cin).
The equation for the carry out is:
COUTn = (An * Bn) + (An * Cin) + (Bn * Cin)
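The two equations translate directly into bitwise operations. The Python sketch below implements them for a single bit and chains them into a ripple adder, which corresponds to the general case before any carry substitution. The function names `full_adder` and `ripple_add` are illustrative.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin                          # Sn = An XOR Bn XOR Cin
    cout = (a & b) | (a & cin) | (b & cin)   # COUTn = AnBn + AnCin + BnCin
    return s, cout


def ripple_add(a, b, cin, width):
    """Add two width-bit numbers by chaining full adders bit by bit."""
    total, carry = 0, cin
    for n in range(width):
        s, carry = full_adder((a >> n) & 1, (b >> n) & 1, carry)
        total |= s << n
    return total, carry


if __name__ == "__main__":
    print(ripple_add(5, 3, 0, 4))    # sum and carry out of a 4-bit add
    print(ripple_add(15, 1, 0, 4))   # overflow produces a carry out
```

In registered logic, each pass through `full_adder` in this loop would cost one clock, which is why the general case in Figure 7 needs four clocks for four bits.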
Figure 7 shows the equations for a 4-bit
synchronous adder, whose sequence completes in four
clocks. Because the objective is to calculate a complete
24-bit sum as quickly as possible, the equation for carry
out (CO) from the adder's first bit can be substituted

into the equation for the adder's second bit. This arrangement allows the first two bits to be added in a single
clock cycle. Similarly, the equation for the carry out
from the second bit can be substituted into the equation
for the third sum, and so on. The resulting equations for
three bits of substitution appear in Figure 8.
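Substituting each carry into the next stage removes a clock of delay but makes the sum-of-products form grow quickly. The Python sketch below expands the carry equations symbolically and counts the product terms at each bit. It is a rough illustration: the set-of-literals representation and the names `lit`, `sop_and`, `sop_or`, `absorb`, and `carry_term_counts` are assumptions made for this example, and only uncomplemented literals are handled.

```python
def absorb(terms):
    """Drop any product term that contains another term (X + XY = X)."""
    return {t for t in terms if not any(other < t for other in terms)}


def lit(name):
    """A sum of products holding one single-literal product term."""
    return {frozenset([name])}


def sop_and(x, y):
    """AND of two sums of products: multiply out every pair of terms."""
    return absorb({p | q for p in x for q in y})


def sop_or(x, y):
    """OR of two sums of products: merge the term lists."""
    return absorb(x | y)


def carry_term_counts(bits):
    """Product-term count of each carry after full substitution."""
    carry = lit("CIN")
    counts = []
    for n in range(bits):
        a, b = lit(f"A{n}"), lit(f"B{n}")
        # Cn = (An * Bn) + (An * Cn-1) + (Bn * Cn-1), with Cn-1 substituted in.
        carry = sop_or(sop_or(sop_and(a, b), sop_and(a, carry)),
                       sop_and(b, carry))
        counts.append(len(carry))
    return counts


if __name__ == "__main__":
    print(carry_term_counts(4))
```

The count roughly doubles with every substituted bit, so the number of product terms available per macrocell limits how many bits can be resolved in one clock.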
The CY7C330's XOR product term is useful for
reducing the number of product terms required for a
given sum bit. However, even after Boolean reduction
and utilization of the XOR product term, the fourth bit
of the adder requires 30 product terms for the sum bit
and 31 product terms for the carry out bit to generate a
4-bit result in a single clock cycle. Because a given
CY7C330 macrocell provides a maximum of 19 product
terms, the device must run the accumulate process over
multiple 3-bit stages. The addition of the first three bits
finishes after one clock cycle, the second three bits after
two cycles, and so on. Implemented in three CY7C330s,
the complete 24-bit accumulate therefore requires nine
clock cycles. With 66-MHz devices, nine clock cycles
translate to a complete calculation cycle of about 136 ns.
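The multi-stage behavior can be checked with a small clock-by-clock model. The Python sketch below assumes each 8-bit device is split into groups of 3, 3, and 2 bits (matching the C2, C5, and COUT carries), that every group's sum and carry are registered on each clock, and that both operands are held constant while the carries settle. The names `GROUP_WIDTHS`, `clock_edge`, and `accumulate` are illustrative.

```python
# Nine carry groups across three 8-bit devices: 3 + 3 + 2 bits per device.
GROUP_WIDTHS = [3, 3, 2] * 3


def clock_edge(a, b, cin, sums, carries):
    """Compute the next registered sums and carries for one clock edge."""
    new_sums, new_carries = [], []
    lsb = 0
    for group, width in enumerate(GROUP_WIDTHS):
        # Each group sees the carry registered by the previous group
        # on the PREVIOUS clock, not the one being computed now.
        carry_in = cin if group == 0 else carries[group - 1]
        mask = (1 << width) - 1
        total = ((a >> lsb) & mask) + ((b >> lsb) & mask) + carry_in
        new_sums.append(total & mask)
        new_carries.append(total >> width)
        lsb += width
    return new_sums, new_carries


def accumulate(a, b, cin=0, clocks=9):
    """Return the 24-bit registered sum after the given number of clocks."""
    sums = [0] * len(GROUP_WIDTHS)
    carries = [0] * len(GROUP_WIDTHS)
    for _ in range(clocks):
        sums, carries = clock_edge(a, b, cin, sums, carries)
    result, lsb = 0, 0
    for width, group_sum in zip(GROUP_WIDTHS, sums):
        result |= group_sum << lsb
        lsb += width
    return result


if __name__ == "__main__":
    # Worst case: a carry must travel through all nine groups.
    print(hex(accumulate(0xFFFFFF, 1, clocks=8)))  # top group not yet settled
    print(hex(accumulate(0xFFFFFF, 1, clocks=9)))  # fully settled
```

In this model, group k is guaranteed correct after k + 1 clocks, so the worst-case input needs all nine clocks before the full 24-bit result is valid.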
Appendix A lists the minimized equations for one of
the three 8-bit adder stages. The syntax used in this example is that of the Cypress PLD ToolKit. Variables B0
- B7 are the eight dedicated inputs sourced from either
the microprocessor or the 24-bit position counter.
INCLK is the CLK1 pin on the CY7C330 used to clock
in the B0 - B7 variables. Cin is the carry in from external
logic (set to one for subtraction when in control mode

Figure 6. Control Mode Operation Sequence (block diagram; labels: counter position data, dedicated input register clocked by CLK1, logic array programmed with accumulator equations, Cin = 1, macrocell output register holding the adder result as proportional error feedback, macrocell input register holding the target position)
(1) CLK1 loads external 24-bit position data (in 1's-complement form) into the CY7C330's dedicated input register.
(2) With carry in set to 1, logic equations subtract the current position from the target position to form the error amount.
(3) The error result is clocked into the macrocell output register with CLK and is available to the servo motor interface.

on the first 8-bit adder stage) or from the previous stage
of the adder.
A0 - A7 are the sum outputs for either target update or control mode. If the processor is updating the
target position by a step increment, A0 - A7 are loaded
into the macrocell input registers with CLK2 (named
ACLK). When this new position update is being loaded,
the output drivers of the macrocells are not three-stated
with the OE pin or a product term equation. This allows ACLK to load the macrocell output registers
(which have the newly calculated target position) into
the macrocell input registers (which are used to hold
the target position).
C2 and C5 are internal carry-out bits generated
from the first and second 3-bit adder stages, respectively. Finally, COUT is the carry out generated as either
the final carry out or as the input to the next 8-bit adder
stage's carry in.
Appendix B shows the implementation of the two
upper CY7C330 stages. The equations for the accumulator function are the same as in the previous
equations. The additions here are the equations for
detecting rail conditions and generating the offscale
bits.

/* Four Bit Adder - General Case */

/* Inputs:  An, Bn ; Inputs to be added at Bit n
            CIN    ; Carry in to Adder */

/* Outputs: Sn     ; Sum out for Bit n
            Cn     ; Carry out from adder stage n */

/* Equations to be reduced */

S0 = A0 XOR B0 XOR CIN
C0 = (A0 * B0) + (A0 * CIN) + (B0 * CIN)

S1 = A1 XOR B1 XOR C0
C1 = (A1 * B1) + (A1 * C0) + (B1 * C0)

S2 = A2 XOR B2 XOR C1
C2 = (A2 * B2) + (A2 * C1) + (B2 * C1)

S3 = A3 XOR B3 XOR C2
C3 = (A3 * B3) + (A3 * C2) + (B3 * C2)

/* C3 == Carry Out of Four Bit Adder */

Figure 7. Equations for Four-Bit Adder

/* Synchronous 3 bit adder - derivative of General Case */
/* Uses substitution of Carry Out in first 3 bits to generate 3 bit
   result in one clock cycle */

S0 = A0 XOR B0 XOR CIN

/* C0 = (A0 * B0) + (A0 * CIN) + (B0 * CIN) */

S1 = A1 XOR B1 XOR [(A0 * B0) + (A0 * CIN) + (B0 * CIN)]

/* C1 = (A1 * B1)
      + (A1 * [(A0 * B0) + (A0 * CIN) + (B0 * CIN)])
      + (B1 * [(A0 * B0) + (A0 * CIN) + (B0 * CIN)]) */

S2 = A2 XOR B2 XOR
     {(A1 * B1)
      + (A1 * [(A0 * B0) + (A0 * CIN) + (B0 * CIN)])
      + (B1 * [(A0 * B0) + (A0 * CIN) + (B0 * CIN)])}

C2 = (A2 * B2)
   + (A2 *
      {(A1 * B1)
       + (A1 * [(A0 * B0) + (A0 * CIN) + (B0 * CIN)])
       + (B1 * [(A0 * B0) + (A0 * CIN) + (B0 * CIN)])})
   + (B2 *
      {(A1 * B1)
       + (A1 * [(A0 * B0) + (A0 * CIN) + (B0 * CIN)])
       + (B1 * [(A0 * B0) + (A0 * CIN) + (B0 * CIN)])})

Figure 8. Equations for a Synchronous 3-Bit Adder

Note that the intent here has been to focus on a
different approach to implementing a closed-loop servo
controller, with the CY7C330 as the central element,
and to disclose the details unique to the CY7C330.
Many hardware implementation details are left to the
designer, including the D/A design, feedback design,
and the lead/lag compensation.

References
Houpis & Lamont, Digital Control Systems - Theory,
Hardware, Software (New York: McGraw-Hill, 1985)
Ball & Pratt, Engineering Applications of Microcomputers (Prentice Hall Int'l (UK) Ltd, 1986)
Kuo, Digital Control Systems (New York: Holt,
Rinehart, & Winston, Inc., 1980)
Gayakwad & Sokoloff, Analog and Digital Control
Systems (Prentice Hall, 1988)
Bollinger & Duffie, Computer Control of Machines
& Processes (New York: Addison-Wesley, 1988)
For more information on implementing the
CY7C330-based, 24-bit up/down position counter mentioned in this application note, consult the application
note, "66-MHz CY7C330 Synchronous State Machine."


Appendix A. PLD ToolKit Code for an 8-Bit Accumulator
{Mark Aaldering - Cypress Semiconductor - 8-bit accumulator - June 14, 1989}
CY7C330;
CONFIGURE;

{ Dedicated input registers. Default configuration is use of pin 2 for clock }

Outclk(node=1),
Inclk(node=2),
Aclk(node=3),
CIN(node=4),
B0(node=5),
B1(node=6),
B2(node=7),
B3(node=9),
B4(node=10),
B5(node=11),
B6(node=12),
B7(node=13),
oe(node=14),

{Output nodes assigned to maximize available product term utilization. In the following declarations, the 7C330's
macrocell outputs are configured as follows:
ireg--This sets the macrocell feedback MUX for feedback from the macrocell input register instead of the
(default) macrocell output register (rgd)
iclk=3--This selects the clock on pin 3 instead of the default (used for the inputs above) of clock on pin 2 for the
macrocell input register
IOP--Same as ireg.
nenbpt--Selects OE control from pin 14 instead of a product term}

A0(node=28,iop,iclk=3,ireg,nenbpt),   { Sum 0 / Accum. Feedback Register 0 }
A1(node=15,iop,iclk=3,ireg,nenbpt),   { Sum 1 / Accum. Feedback Register 1 }
A2(node=20,iop,iclk=3,ireg,nenbpt),   { Sum 2 / Accum. Feedback Register 2 }
A3(node=17,iop,iclk=3,ireg,nenbpt),   { Sum 3 / Accum. Feedback Register 3 }
A4(node=26,IOP,iclk=3,ireg,nenbpt),   { Sum 4 / Accum. Feedback Register 4 }
A5(node=23,IOP,iclk=3,ireg,nenbpt),   { Sum 5 / Accum. Feedback Register 5 }
A6(node=19,IOP,iclk=3,ireg,nenbpt),   { Sum 6 / Accum. Feedback Register 6 }
A7(node=24,IOP,iclk=3,ireg,nenbpt),   { Sum 7 / Accum. Feedback Register 7 }
COUT(node=18,nenbpt),                 { Carry out }
C2(node=32),                          { Carry 2 - Hidden }
C5(node=34),                          { Carry 5 - Hidden }

{ Available nodes         # P.T.'s }
{ I/O macrocell    - 16 - 19 }
{ I/O macrocell    - 25 - 17 }
{ I/O macrocell    - 27 - 19 }
{ hidden macrocell - 31 - 13 }
{ hidden macrocell - 33 - 11 }
{End of configuration section}
{Logic equation section}
EQUATIONS;

{A0: 2 product terms, pin 28: 9 P.T. Available}
/A0 = <XSUM> CIN
      <SUM>  /A0 * /B0
           + A0 * B0;

{A1: 6 product terms, pin 15: 9 P.T. Available}
/A1 = <XSUM> /A1
      <SUM>  B1 * /B0 * /CIN
           + /B1 * B0 * CIN
           + /B1 * A0 * CIN
           + /B1 * A0 * B0
           + B1 * /A0 * /CIN
           + B1 * /A0 * /B0;

{A2: 14 product terms, pin 20: 15 P.T. Available}
/A2 = <XSUM> /A2
      <SUM>  B2 * /A1 * /B1
           + /B2 * B1 * B0 * CIN
           + /B2 * A1 * B0 * CIN
           + /B2 * B1 * A0 * CIN
           + /B2 * A1 * A0 * CIN
           + /B2 * B1 * A0 * B0
           + /B2 * A1 * A0 * B0
           + B2 * /B1 * /B0 * /CIN
           + B2 * /A1 * /B0 * /CIN
           + /B2 * A1 * B1
           + B2 * /B1 * /A0 * /CIN
           + B2 * /A1 * /A0 * /CIN
           + B2 * /B1 * /A0 * /B0
           + B2 * /A1 * /A0 * /B0;

{C2: 15 product terms, virtual pin 32: 17 P.T. Available}
C2 =  <SUM>  B2 * B1 * B0 * CIN
           + A2 * B1 * B0 * CIN
           + B2 * A1 * B0 * CIN
           + A2 * A1 * B0 * CIN
           + B2 * B1 * A0 * CIN
           + A2 * B1 * A0 * CIN
           + B2 * A1 * A0 * CIN
           + A2 * A1 * A0 * CIN
           + B2 * B1 * A0 * B0
           + A2 * B1 * A0 * B0
           + B2 * A1 * A0 * B0
           + A2 * A1 * A0 * B0
           + B2 * A1 * B1
           + A2 * A1 * B1
           + A2 * B2;

{A3: 2 product terms, pin 17: 11 P.T. Available}
/A3 = <XSUM> C2
      <SUM>  /A3 * /B3
           + A3 * B3;

{A4: 6 product terms, pin 26: 11 P.T. Available}
/A4 = <XSUM> /A4
      <SUM>  B4 * /B3 * /C2
           + /B4 * B3 * C2
           + /B4 * A3 * C2
           + /B4 * A3 * B3
           + B4 * /A3 * /C2
           + B4 * /A3 * /B3;

{A5: 14 product terms, pin 23: 15 P.T. Available}
/A5 = <XSUM> /A5
      <SUM>  B5 * /A4 * /B4
           + /B5 * B4 * B3 * C2
           + /B5 * A4 * B3 * C2
           + /B5 * B4 * A3 * C2
           + /B5 * A4 * A3 * C2
           + /B5 * B4 * A3 * B3
           + /B5 * A4 * A3 * B3
           + B5 * /B4 * /B3 * /C2
           + B5 * /A4 * /B3 * /C2
           + /B5 * A4 * B4
           + B5 * /B4 * /A3 * /C2
           + B5 * /A4 * /A3 * /C2
           + B5 * /B4 * /A3 * /B3
           + B5 * /A4 * /A3 * /B3;

{C5: 15 product terms, virtual pin 34: 19 P.T. Available}
C5 =  <SUM>  B5 * B4 * B3 * C2
           + A5 * B4 * B3 * C2
           + B5 * A4 * B3 * C2
           + A5 * A4 * B3 * C2
           + B5 * B4 * A3 * C2
           + A5 * B4 * A3 * C2
           + B5 * A4 * A3 * C2
           + A5 * A4 * A3 * C2
           + B5 * B4 * A3 * B3
           + A5 * B4 * A3 * B3
           + B5 * A4 * A3 * B3
           + A5 * A4 * A3 * B3
           + B5 * A4 * B4
           + A5 * A4 * B4
           + A5 * B5;

{A6: 2 product terms, pin 19: 13 P.T. Available}
/A6 = <XSUM> C5
      <SUM>  /A6 * /B6
           + A6 * B6;

{A7: 6 product terms, pin 24: 13 P.T. Available}
/A7 = <XSUM> /A7
      <SUM>  B7 * /B6 * /C5
           + /B7 * B6 * C5
           + /B7 * A6 * C5
           + /B7 * A6 * B6
           + B7 * /A6 * /C5
           + B7 * /A6 * /B6;

{COUT: 7 product terms, pin 18: 17 P.T. Available}
/COUT = <SUM> /B7 * /B6 * /C5
            + /A7 * /B6 * /C5
            + /B7 * /A6 * /C5
            + /A7 * /A6 * /C5
            + /B7 * /A6 * /B6
            + /A7 * /A6 * /B6
            + /A7 * /B7;

{End of file.}

Appendix B. PLD ToolKit Code for an Accumulator with Rail Condition
{Mark Aaldering - Cypress Semiconductor - 8-bit accumulator with rail condition outputs - June 14, 1989}
CY7C330;
CONFIGURE;

{ Dedicated input registers. Default configuration is use of pin 2 for clock }

Outclk(node=1),
Inclk(node=2),
Aclk(node=3),
Cin(node=4),
B0(node=5),
B1(node=6),
B2(node=7),
B3(node=9),
B4(node=10),
B5(node=11),
B6(node=12),
B7(node=13),
oe(node=14),

{Output nodes assigned to maximize available product term utilization. In the following declarations, the 330's
macrocell outputs are configured as follows:
ireg--This sets the macrocell feedback MUX for feedback from the macrocell input register instead of the
(default) macrocell output register (rgd)
iclk=3--This selects the clock on pin 3 instead of the default (used for the inputs above) of clock on pin 2 for the
macrocell input register
IOP--Same as ireg.
nenbpt--Selects OE control from pin 14 instead of a product term }

A0(node=28,iop,iclk=3,ireg,nenbpt),   { Sum 0 / Accum. Feedback Register 0 }
A1(node=15,iop,iclk=3,ireg,nenbpt),   { Sum 1 / Accum. Feedback Register 1 }
A2(node=20,iop,iclk=3,ireg,nenbpt),   { Sum 2 / Accum. Feedback Register 2 }
A3(node=17,iop,iclk=3,ireg,nenbpt),   { Sum 3 / Accum. Feedback Register 3 }
A4(node=26,iop,iclk=3,ireg,nenbpt),   { Sum 4 / Accum. Feedback Register 4 }
A5(node=23,iop,iclk=3,ireg,nenbpt),   { Sum 5 / Accum. Feedback Register 5 }
A6(node=19,iop,iclk=3,ireg,nenbpt),   { Sum 6 / Accum. Feedback Register 6 }
A7(node=24,iop,iclk=3,ireg,nenbpt),   { Sum 7 / Accum. Feedback Register 7 }
COUT(node=18,nenbpt),                 { Carry Out }
C2(node=32),                          { Carry 2 - Hidden }
C5(node=34),                          { Carry 5 - Hidden }
R0(node=16,nenbpt),                   { Rail bit 0 }
R1(node=25,nenbpt),                   { Rail bit 1 }

{ Available nodes         # P.T.'s }
{ I/O macrocell    - 27 - 19 }
{ Hidden macrocell - 31 - 13 }
{ Hidden macrocell - 33 - 11 }
{End of configuration section}
{Logic equation section}
EQUATIONS;

{A0: 2 product terms, pin 28: 9 P.T. Available}
/A0 = <XSUM> CIN
      <SUM>  /A0 * /B0
           + A0 * B0;

{A1: 6 product terms, pin 15: 9 P.T. Available}
/A1 = <XSUM> /A1
      <SUM>  B1 * /B0 * /CIN
           + /B1 * B0 * CIN
           + /B1 * A0 * CIN
           + /B1 * A0 * B0
           + B1 * /A0 * /CIN
           + B1 * /A0 * /B0;

{A2: 14 product terms, pin 20: 15 P.T. Available}
/A2 = <XSUM> /A2
      <SUM>  B2 * /A1 * /B1
           + /B2 * B1 * B0 * CIN
           + /B2 * A1 * B0 * CIN
           + /B2 * B1 * A0 * CIN
           + /B2 * A1 * A0 * CIN
           + /B2 * B1 * A0 * B0
           + /B2 * A1 * A0 * B0
           + B2 * /B1 * /B0 * /CIN
           + B2 * /A1 * /B0 * /CIN
           + /B2 * A1 * B1
           + B2 * /B1 * /A0 * /CIN
           + B2 * /A1 * /A0 * /CIN
           + B2 * /B1 * /A0 * /B0
           + B2 * /A1 * /A0 * /B0;

{C2: 15 product terms, virtual pin 32: 17 P.T. Available}
C2 =  <SUM>  B2 * B1 * B0 * CIN
           + A2 * B1 * B0 * CIN
           + B2 * A1 * B0 * CIN
           + A2 * A1 * B0 * CIN
           + B2 * B1 * A0 * CIN
           + A2 * B1 * A0 * CIN
           + B2 * A1 * A0 * CIN
           + A2 * A1 * A0 * CIN
           + B2 * B1 * A0 * B0
           + A2 * B1 * A0 * B0
           + B2 * A1 * A0 * B0
           + A2 * A1 * A0 * B0
           + B2 * A1 * B1
           + A2 * A1 * B1
           + A2 * B2;

{A3: 2 product terms, pin 17: 11 P.T. Available}
/A3 = <XSUM> C2
      <SUM>  /A3 * /B3
           + A3 * B3;

{A4: 6 product terms, pin 26: 11 P.T. Available}
/A4 = <XSUM> /A4
      <SUM>  B4 * /B3 * /C2
           + /B4 * B3 * C2
           + /B4 * A3 * C2
           + /B4 * A3 * B3
           + B4 * /A3 * /C2
           + B4 * /A3 * /B3;

{A5: 14 product terms, pin 23: 15 P.T. Available}
/A5 = <XSUM> /A5
      <SUM>  B5 * /A4 * /B4
           + /B5 * B4 * B3 * C2
           + /B5 * A4 * B3 * C2
           + /B5 * B4 * A3 * C2
           + /B5 * A4 * A3 * C2
           + /B5 * B4 * A3 * B3
           + /B5 * A4 * A3 * B3
           + B5 * /B4 * /B3 * /C2
           + B5 * /A4 * /B3 * /C2
           + /B5 * A4 * B4
           + B5 * /B4 * /A3 * /C2
           + B5 * /A4 * /A3 * /C2
           + B5 * /B4 * /A3 * /B3
           + B5 * /A4 * /A3 * /B3;

{C5: 15 product terms, virtual pin 34: 19 P.T. Available}
C5 =  <SUM>  B5 * B4 * B3 * C2
           + A5 * B4 * B3 * C2
           + B5 * A4 * B3 * C2
           + A5 * A4 * B3 * C2
           + B5 * B4 * A3 * C2
           + A5 * B4 * A3 * C2
           + B5 * A4 * A3 * C2
           + A5 * A4 * A3 * C2
           + B5 * B4 * A3 * B3
           + A5 * B4 * A3 * B3
           + B5 * A4 * A3 * B3
           + A5 * A4 * A3 * B3
           + B5 * A4 * B4
           + A5 * A4 * B4
           + A5 * B5;

{A6: 2 product terms, pin 19: 13 P.T. Available}
/A6 = <XSUM> C5
      <SUM>  /A6 * /B6
           + A6 * B6;

{A7: 6 product terms, pin 24: 13 P.T. Available}
/A7 = <XSUM> /A7
      <SUM>  B7 * /B6 * /C5
           + /B7 * B6 * C5
           + /B7 * A6 * C5
           + /B7 * A6 * B6
           + B7 * /A6 * /C5
           + B7 * /A6 * /B6;

{COUT: 7 product terms, pin 18: 17 P.T. Available}
/COUT = <SUM> /B7 * /B6 * /C5
            + /A7 * /B6 * /C5
            + /B7 * /A6 * /C5
            + /A7 * /A6 * /C5
            + /B7 * /A6 * /B6
            + /A7 * /A6 * /B6
            + /A7 * /B7;

{R0: rail bit 0; equation arbitrarily chosen to detect when upper 5 bits are all 1 - this decision is a matter of
preference; output active low}
/R0 = <SUM> A7 * A6 * A5 * A4 * A3;

{R1: rail bit 1; again, arbitrarily chosen to reflect value of carry out, therefore this is a redundant output - active low
output}
/R1 = <SUM> COUT;

{End of file}


FDDI Physical Connection Management
Using the CY7C330
This application note shows how you can use the
Cypress CY7C330 programmable logic device (PLD) to
implement the Physical Connection Management
(PCM) state machine specified in the Station Management (SMT) of the Fiber Distributed Data Interface
(FDDI) standard. Along with a brief overview of the
FDDI standard, this application note explains the
CY7C330's features, the design methodology used in
this design, and an example of how you can synthesize a
complex function into this device. Note, however, that
this is not meant to be an in-depth tutorial of the FDDI
standard and its various layers.

1988 update of the SMT specification. The final FDDI
specification might differ slightly, but the design
methodology remains the same.
The PMD layer is the lowest and specifies the
network's connectors, transceivers, and bypass switches.
The PHY layer specifies the type of encoding used on
the data (4B/5B) and specifies a set of line states. These
line states implement a handshake mechanism between
PHYs of adjacent nodes. The MAC layer performs
higher-level, peer-to-peer communications. It also
provides for system timer support, packet framing, and
responses to various types of errors in the network. The
SMT layer controls the activities of the MAC, PHY,
and PMD. SMT includes functions such as connection
management (CMT), fault detection, and ring reconfiguration.
The CMT is the portion of Station Management
that controls the insertion, removal and logical connection of the PRY entities. Within the CMT is an area
known as the Physical Connection Management (PCM).
A chart showing a hierarchical view of the location of

FDDIOverview
FDDI is a lOO-Mbits/s dual token ring network that
can connect as many as 500 nodes with a maximum linkto-link distance of 2 km and a total network circumference of about 100 km. The network employs a
primary and a secondary ring. The primary ring handles
data transmission, and the secondary ring mainly
provides fault tolerance, but can be used for data transmission as well.
FDDI is a token ring network, in which rotating a
token grants network access. The node with the token
can transmit data. This arrangement ensures a deterministic, collision-free network, independent of the
number of stations in the network.
Because of the dual-ring topology, FDDI defines a
fault-recovery mechanism. If a fault is detected, such as
a broken fiber-optic cable, the network can be restored
by routing around the break with the second ring. This
function is largely controlled by the state machine
shown later, which is implemented with the CY7C330.
The ANSI X3T9.5 standards committee controls
the FDDI standard, which was developed using the
Open Systems Interconnection (OSI) model; FDDI implements the model's physical and data-link layers. The
four FDDI layers are Physical Media Dependent
(PMD), Physical (PRY), Media Access Control
(MAC), and Station Management (SMT).
The state machine example described later in this
application note was developed with the December 2,

I

SMT

I
I

MAC

I

I I CMTII

I
PHY
I
PMD
I
Figure 1. FDDI Hierarchy

6-247

PCM

~

==-~~~~~~~~~~~~~~~~~~~~F~D~D~I~U~S~in~g~th~e~C~Y7~·~C~3~30
the PCM appears in Figure 1. The PCM provides the
signals to perform the following functions:
Initialize a connection
Reject a marginal connection
Support maintenance
Figure 2 shows the synthesized state machine that
performs these activities. This state machine is based on
version 9.1 of the PCM state machine described in the
SMT specification.
To keep within the CY7C330's 25 I/O constraint, a
small amount of logic is implemented outside the
CY7C330. For instance, the PCM uses two timers. The
CY7C330 does not include these timers, but two
decoded signals (timer1 and timer2) indicate that the
timers have reached specific values. The timer1 and
timer2 signals are inputs to the CY7C330. The chart in
Figure 3 shows all the macro functions, how they are
decoded, and their functions.

Introduction to the CY7C330
The CY7C330 is a synchronous, 28-pin PLD. It is
packaged in a 300-mil DIP as well as several types of
surface mount packages, including a leadless ceramic
chip carrier (LCC) and a plastic leaded chip carrier
(PLCC). The device is fabricated with the Cypress
0.8-micron CMOS process and is available in speeds of 33,
50, and 66 MHz. The CY7C330 is also available as a
military device in speeds of 33, 40, and 50 MHz. The
device is optimized to implement high-speed state
machine designs.
The CY7C330's features can be generalized into
four groups:
1. Dedicated input cell
2. Product term array
3. I/O macrocell
4. Hidden state-register macrocell
The CY7C330 contains 11 of the dedicated input
macrocells. This cell (Figure 4) contains a D flip-flop
and a programmable multiplexer (mux) that allows a
choice of two input clocks. The two input clocks are
CK1 and CK2, which come directly from pins 2 and 3 of
the device, respectively. Note that you cannot bypass
any of the CY7C330's registers. The device is purely
synchronous in nature.
Figure 2. PCM State Machine

MACRO NAME   SYNTHESIZED SIGNAL   FUNCTION
MLS          !MLS                 Master Line State
ILS          !ILS                 Idle Line State
HLS          !HLS                 Halt Line State
QLS          !QLS                 Quiet Line State
pc_start     !pc0 & !pc1          Start PCM State Machine
pc_reject    !pc0 & pc1           Enter Reject State
sc_join      pc0 & !pc1           Incorporate connection into token path
pcstop       !pc_stop             PCM state machine to enter OFF state
pcmaint      !pc_maint            Enter maintenance state
time1        !timer1              See timer explanations below.
time2        !timer2              See timer explanations below.
n_neq_10     !n0 & !n1            Counter indicating 10 bits of data have not been received or transmitted
n_eq_7       !n0 & n1             Counter indicating 7 bits have been transmitted or received
n_eq_9       n0 & !n1             Counter indicating 9 bits have been transmitted or received
n_eq_10      n0 & n1              Counter indicating 10 bits have been transmitted or received
noise        !noise_count         Noise counter threshold
valn         Val_n                Transmitted value n
val8         !Val_8               Transmitted or Received value = 8
val9         !Val_9               Transmitted or Received value = 9

TIMER VALUES
Timer 1:
0 ms         TB_Min               Minimum break time for link.
0.2 ms       A_Max                Maximum time required to achieve signal acquisition.
480 ns       LS_Min               Length of time reception of ILS
15 us        LS_Max               Max time required for line state recognition
25 ms        I_Max                Max optical bypass insertion/deinsertion time
200 ms       T_next(9)            Default time for MAC loopback
Timer 2:
100 ms       T_Out                Signalling Timeout

Figure 3. Macro Definitions

As with any PLD, the CY7C330's product term
array (see the CY7C330 block diagram in Figure 1 of
"Understanding the CY7C330") synthesizes the logical
connections of the design. The product terms control a
global reset, a global preset, an Exclusive-OR gate, the
output enables, and the product terms that go to the D
input of the flip-flops in the output macrocells. (Most of
these features are covered later in the explanation of
the macrocell.) The device offers product term
distribution that varies between nine and 19, depending on
which output macrocell is being addressed. The 19
product terms become the limiting factor in the
complexity of the design.
The I/O macrocell (see Figure 1 in "Using ABEL to
Program the CY7C330") contains two D flip-flops. One
of the D flip-flops clocks data from the array to either
the output pin or back to the array and is intended to
be a state register. The I/O macrocell has a different
clock than the input registers, called CLK, which comes
directly from pin 1. The other D flip-flop is an input
register, which can clock data from the I/O pin into the
array. This flip-flop can be clocked from CK1 or CK2,
as with the dedicated input cell.

Figure 4. Input Macrocell (input register fed from the input pin and driving
the input buffer; clock multiplexer selects CLK1 from pin 2 or CLK2 from pin 3)


As mentioned earlier, the product term array feeds
an XOR gate, which in turn feeds the D input of the
state register. This gives you quite a bit of design
flexibility. For example, you can use the XOR as an inverter by setting the XOR product term to a One. You
can use the XOR to make the flip-flop a D, T, or JK
type. Wrapping the Q output back to the XOR input
changes the flip-flop from D to T, for instance. The
design example described later uses this feature.
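
The D-to-T conversion described above can be sketched behaviorally. This is only an illustrative Python model, not device code: the function names `next_q` and `run` are hypothetical, and the model assumes the XOR gate combines the fed-back Q output with a single toggle term T.

```python
def next_q(q: int, t: int) -> int:
    """Value the state register holds after one clock edge.

    The D input is the XOR of the fed-back Q output and the toggle
    term T, so the register toggles when T is 1 and holds when T is 0.
    """
    d = q ^ t   # XOR gate ahead of the D input
    return d    # on the clock edge, Q takes the value of D


def run(q0: int, t_sequence: list[int]) -> list[int]:
    """Apply a sequence of T values, one per clock, and record Q."""
    q = q0
    trace = []
    for t in t_sequence:
        q = next_q(q, t)
        trace.append(q)
    return trace


if __name__ == "__main__":
    print(run(0, [1, 0, 1, 1, 0]))
```

With T held at 0 the register keeps its value, which is why a T-type state bit needs product terms only for the transitions in which that bit changes.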
The output macrocell also allows you to choose the
output-enable control for the pin. The output enable
can come from a product term or directly from pin 14.
The CY7C330 provides 12 I/O macrocells.
The hidden-state macrocell (Figure 5) contains a
state register with no output pin associated with it. The
CY7C330 contains four hidden-state macrocells. You
can use these macrocells to synthesize a small 4-bit
internal state machine or perform any function that is
required only internally to the device itself.
The timing required for this design is 12.5 MHz,
which allows use of the slowest CY7C330 version (33
MHz). The design requires one clock, although two
pins are dedicated for clocks in the CY7C330. In this
design, pins 1 and 2 are tied together externally,
connecting the input-register and state-register clocks
together. In the ABEL source code described below,
the labels for the two clocks are CKS and CK1.

Design Methodology
The PCM design is implemented using the state
machine syntax in ABEL version 3.0. The first-pass
ABEL source code appears in Appendix A. Note that
the state machine requires 31 states. This means that
the state machine is implemented with 5 bits, which
gives 32 total states and leaves one illegal state. When
the design is run at reduction level 4 - the maximum
reduction in ABEL - the software responds that the
design requires more than 30 product terms per output.
This is far more than the 19 product terms that are possible on any one output.

Figure 5. The CY7C330 Buried Register

          Decimal   Binary
Case 1.   6         000110
          9         001001   (4 bits toggle)

Case 2.   6         000110
          7         000111   (1 bit toggles)

Figure 6. State Change Comparison
At first glance, you might assume that the design is
far too complex for the CY7C330. But further
procedures make this implementation possible. To
understand these procedures, it is necessary to understand
some facts about ABEL.
ABEL reduces a design to a sum of products and
does not make use of the XOR gate in the macrocell.
To use the XOR gate, you must specify it in Boolean
equation form and run the reduction at level 0.
Specifying T flip-flops in version 3.0 also causes ABEL to
reduce to a sum of products and not create T flip-flops
using the XOR gate. ABEL 3.1 accepts T flip-flops,
however, and corrects this situation.

Product Term Squeezing
The first method for reducing the number of
product terms is to increase the number of bits in the
state machine from 5 to 6 bits. Although the state
machine only requires 31 states, a much broader range
of choice results from having 64 possibilities for placing
the states.
The next procedure involves changing from D
flip-flops to T flip-flops. T flip-flops are more efficient
because when the T input is High, the flip-flop toggles.
Otherwise, the flip-flop retains its previous state.
Because a T flip-flop only needs one product term
for a transition to occur, the state machine can be
optimized by choosing state transitions that change a
minimum number of bits. For example, a transition between
states 6 and 9 requires more bits to change than a
transition between states 6 and 7 (Figure 6).
The 6-to-9 transition requires four product terms,
while the 6-to-7 transition requires only one product
term. Because the number of total states has been
increased from 32 to 64 by adding one more bit to the
state machine, you gain much more flexibility in
choosing states. Carefully choosing the states in a state
machine is the easiest way to reduce the number of
product terms required.
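
The cost comparison in Figure 6 can be checked with a few lines of Python. This is a minimal sketch: the helper name `toggled_bits` is hypothetical, and it assumes one toggle product term per state bit that changes, as the text describes for T flip-flops.

```python
def toggled_bits(current: int, nxt: int, width: int = 6) -> list[int]:
    """Return the positions of the state bits that differ between two
    state codes. With T flip-flops, each differing bit needs one
    product term on its toggle input for this transition."""
    diff = current ^ nxt
    return [bit for bit in range(width) if (diff >> bit) & 1]


if __name__ == "__main__":
    # Case 1 of Figure 6: 000110 -> 001001
    print(len(toggled_bits(6, 9)), "bits toggle for 6 -> 9")
    # Case 2 of Figure 6: 000110 -> 000111
    print(len(toggled_bits(6, 7)), "bit toggles for 6 -> 7")
```

Running the same helper over every transition of a candidate state assignment gives a quick estimate of how many product terms each state bit will need.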
Another way to make the design implementation
more efficient is to use the CY7C330's synchronous
global reset and preset to deal with illegal states.
(Initially, the state machine is in state 0 because the
CY7C330 has a power-on reset.) It is good design
practice to make provisions for illegal states. Although an
illegal state should never occur, the state machine
should be able to recover from such a state. Many times
the recovery mechanism is built into the state machine
itself, which requires more product terms.
If an illegal state is detected in this design, the state
machine re-initializes itself and goes to state 0. Instead
of building this requirement into the design, you can use
a hidden register to detect the occurrence of illegal
states. The signal from that register controls the
CY7C330's synchronous reset, which returns the state
machine to state 0. The CY7C330's synchronous nature
causes the state machine to go to state 0 two clocks
after the illegal state is encountered. One clock is
required to detect the illegal state, and one clock is
required to reset the device. This requirement is
acceptable for this application.
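
The two-clock recovery can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the device's actual logic: the set `LEGAL_STATES` is a hypothetical placeholder (the real design uses 31 of the 64 codes), the function names are invented for illustration, and the model assumes the synchronous reset also clears the hidden detect register.

```python
from typing import Callable, Tuple

# Hypothetical set of legal state codes, for illustration only.
LEGAL_STATES = {0, 1, 2, 3}


def hold(state: int) -> int:
    """Next-state function that simply holds the current state."""
    return state


def clock(state: int, ilstate: bool,
          next_state: Callable[[int], int] = hold) -> Tuple[int, bool]:
    """Advance the state register and the hidden detect register by one
    clock edge. Both registers sample their inputs from the values
    present before the edge."""
    if ilstate:
        # Second clock: the hidden register drives the synchronous
        # reset, which returns the machine to state 0.
        return 0, False
    # First clock: the hidden register captures the illegal-state detect.
    return next_state(state), state not in LEGAL_STATES


if __name__ == "__main__":
    regs = (7, False)          # the machine has landed in illegal state 7
    for edge in (1, 2):
        regs = clock(*regs)
        print("after clock", edge, "-> state", regs[0], "ILSTATE", regs[1])
```

The first edge only sets the detect register; the state register is cleared on the following edge, which matches the two-clock latency described above.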
In this design, the pcmaint condition is checked in
every state: whenever pcmaint is asserted, the state
machine must unconditionally go to the maintenance
state. To reduce the state machine further, the state
assigned to this condition is 63 (111111 binary), and the
pcmaint signal drives the synchronous preset. The
assertion of pcmaint forces the state machine to state
63, thus avoiding the use of any product terms in the
main body of the design.
This design requires several synchronous resets: an
external pin (RST), the illegal state detect, and the
signal pc_stop. Because only one product term is allowed
for the device's synchronous reset, the other two resets
must be developed by ANDing the reset signal with
every product term associated with the outputs that are
to be reset. This performs the same function as having
multiple product terms for the synchronous reset but
does not utilize any additional resources in the CY7C330.
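
The effect of ANDing the reset conditions into every product term can be sketched as follows. This is an illustrative model only. It assumes, based on the shape of the equations in Appendix B, that each state bit is built as a T flip-flop whose fed-back Q term and toggle terms are all gated by `!ILSTATE & pc_stop`, with the pc_stop pin active Low. The function name `next_bit` is hypothetical.

```python
def next_bit(q: int, toggle_terms: list[bool],
             ilstate: bool, pc_stop_pin: bool) -> int:
    """Next value of one state bit.

    gate is true only when no reset is requested: the illegal-state
    register is clear and the active-Low pc_stop pin is High.
    """
    gate = (not ilstate) and pc_stop_pin
    feedback = bool(q) and gate                        # gated Q feedback
    toggle = any(term and gate for term in toggle_terms)
    # XOR of feedback and toggle: hold, toggle, or (gate false) clear.
    return int(feedback != toggle)


if __name__ == "__main__":
    print(next_bit(1, [False], ilstate=False, pc_stop_pin=True))   # hold
    print(next_bit(1, [True], ilstate=False, pc_stop_pin=True))    # toggle
    print(next_bit(1, [True], ilstate=True, pc_stop_pin=True))     # reset
    print(next_bit(1, [False], ilstate=False, pc_stop_pin=False))  # reset
```

When either reset condition is active, every term evaluates to 0, so the bit loads 0 on the next clock without using the device's single reset product term.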
Keep in mind that the CY7C330 has varied product
term distribution. The state registers associated with
pins 16 and 27 have 19 product terms. Put the state outputs that require the most product terms to these pins.
In this example, QO requires 18 product terms, and Q5
requires 17. These outputs are assigned to pins 27 and
16. The remaining outputs are placed in the same
manner.
Converting the state machine to Boolean equations
is a straightforward procedure. By examining the state
transitions, you can extract the Boolean equations. The
reduced design is shown in Figure 7.

State !S48:
    if (HLS) then !S52
    else if (QLS # time2) then !S32
    else !S48;

48 = 110000 (binary)
52 = 110100

Q2 is the only bit that transitions.
Therefore, a product term of:

    Q5 & Q4 & !Q3 & !Q2 & !Q1 & !Q0 & HLS
    \_____________ state 48 ____________/

would be added to the equation for Q2.
To continue the example:

48 = 110000
32 = 100000

Q4 is the only bit that transitions.
Therefore, the product terms of:

      Q5 & Q4 & !Q3 & !Q2 & !Q1 & !Q0 & QLS
    # Q5 & Q4 & !Q3 & !Q2 & !Q1 & !Q0 & time2
      \_____________ state 48 ____________/

would be added to the equation for Q4.

Figure 7. Boolean Equation Extraction Example
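
The extraction procedure of Figure 7 can be mechanized. The following Python sketch is not part of the original tool flow; the function name `toggle_terms` is hypothetical, and it assumes one toggle product term per changing bit, written in the same `&`/`!` notation as the figure.

```python
def toggle_terms(current: int, nxt: int, condition: str,
                 width: int = 6) -> dict[str, str]:
    """For a transition current -> nxt taken when `condition` is true,
    return the product term to add to the equation of each state bit
    that changes. The term decodes the current state and ANDs in the
    transition condition."""
    literals = " & ".join(
        f"Q{bit}" if (current >> bit) & 1 else f"!Q{bit}"
        for bit in range(width - 1, -1, -1)
    )
    term = f"{literals} & {condition}"
    diff = current ^ nxt
    return {f"Q{bit}": term for bit in range(width) if (diff >> bit) & 1}


if __name__ == "__main__":
    # The two transitions out of state 48 shown in Figure 7.
    print(toggle_terms(48, 52, "HLS"))
    print(toggle_terms(48, 32, "QLS"))
    print(toggle_terms(48, 32, "time2"))
```

For the 48-to-52 transition the helper yields a single term for Q2, and for the 48-to-32 transition it yields one term for Q4 per condition, matching the figure.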
The Cypress PLD ToolKit is used as the development
platform for the reduction process. The PLD
ToolKit is a low-cost software development system for
all Cypress PLDs. Although the reduced equations
could have been obtained using ABEL, in many ways
the PLD ToolKit is easier to use and more tailored to
the Cypress devices. The PLD ToolKit source file appears in Appendix B. The PLD ToolKit also features a
mouse-driven, interactive, simulator/waveform editor
that makes design verification easy.


Appendix A. Original ABEL Source Code

module pcm flag '-r3'
title 'Physical Connection Management (PCM) state Machine version 9.1
Steve Traum Cypress Semiconductor March 27, 1989'

        U1                      device 'P330';

"Inputs
        CKS,Ck1,rst             pin 1,2,3;
        pc0,pc1                 pin 4,5;
        timer1                  pin 6;
        timer2                  pin 7;
        mls,ils,hls,qls         pin 9,10,11,12;
        Val_n                   pin 13;
        n0,n1                   pin 14,15;
        Val_8                   pin 16;
        Val_9                   pin 17;
        noise_count             pin 18;
        pc_stop                 pin 19;
        pc_maint                pin 20;

        n1                      istype 'feed_pin';
        Val_8                   istype 'feed_pin';
        Val_9                   istype 'feed_pin';
        noise_count             istype 'feed_pin';

"Outputs
        Reset                   node 29;
        Q5,Q4,Q3,Q2,Q1,Q0       pin 28,27,26,25,24,23;
        Q5,Q4,Q3,Q2,Q1,Q0       istype 'pos,reg';
        Qstate = [Q5,Q4,Q3,Q2,Q1,Q0];

"declarations
        High,Low  = 1,0;
        H,L,C,X,Z = 1,0,.C.,.X.,.Z.;

"Qstate
        S0  = ^b000000; S1  = ^b000001; S2  = ^b000010; S3  = ^b000011;
        S4  = ^b000100; S5  = ^b000101; S6  = ^b000110; S7  = ^b000111;
        S8  = ^b001000; S9  = ^b001001; S10 = ^b001010; S11 = ^b001011;
        S12 = ^b001100; S13 = ^b001101; S14 = ^b001110; S15 = ^b001111;
        S16 = ^b010000; S17 = ^b010001; S18 = ^b010010; S19 = ^b010011;
        S20 = ^b010100; S21 = ^b010101; S22 = ^b010110; S23 = ^b010111;
        S24 = ^b011000; S25 = ^b011001; S26 = ^b011010; S27 = ^b011011;
        S28 = ^b011100; S29 = ^b011101; S30 = ^b011110; S31 = ^b011111;
        S32 = ^b100000; S33 = ^b100001; S34 = ^b100010; S35 = ^b100011;
        S36 = ^b100100; S37 = ^b100101; S38 = ^b100110; S39 = ^b100111;
        S40 = ^b101000; S41 = ^b101001; S42 = ^b101010; S43 = ^b101011;
        S44 = ^b101100; S45 = ^b101101; S46 = ^b101110; S47 = ^b101111;
        S48 = ^b110000; S49 = ^b110001; S50 = ^b110010; S51 = ^b110011;
        S52 = ^b110100; S53 = ^b110101; S54 = ^b110110; S55 = ^b110111;
        S56 = ^b111000; S57 = ^b111001; S58 = ^b111010; S59 = ^b111011;
        S60 = ^b111100; S61 = ^b111101; S62 = ^b111110; S63 = ^b111111;

MLS MACRO {(!mls)};
ILS MACRO {(!ils)};
HLS MACRO {(!hls)};
QLS MACRO {(!qls)};
pc_start MACRO {(!pc0 & !pc1)};
pc_reject MACRO {(!pc0 & pc1)};
sc_join MACRO {(pc0 & !pc1)};
pcstop MACRO {(!pc_stop)};
pcmaint MACRO {(!pc_maint)};
time1 MACRO {(!timer1)};
time2 MACRO {(!timer2)};
n_neq_10 MACRO {(!n0 & !n1)};
n_eq_7 MACRO {(!n0 & n1)};
n_eq_9 MACRO {(n0 & !n1)};
n_e
 QO & !ILSTATE & pc stop
# !Q5 & !Q4 & !Q3 & !Q2 & !Ql & QO & !HLS & !ILSTATE & pc stop
# !Q5 & !Q4 & !Q3 & !Q2 & Ql & !QO & !timer! & !ILSTATE & Pc_stop
# !Q5 & !Q4 & Q3 & !Q2 & !QI & QO & pcO & !pcl & !timer! & !ILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & !Q2 & !QI & QO & !ILSTATE & pc stop
# !Q5 & Q4 & !Q3 & !Q2 & QI & QO & nO & nl & !ILSTATE & pc stop
# Q5 & Q4 & Q3 & !Q2 & Ql & QO & !Val 8 & !ILSTATE & pc stop
# Q5 & Q4 & !Q3 & Q2 & QI & !QO & !ReS & !lLSTATE & pc -stop
# Q5 & Q4 & !Q3 & Q2 & QI & !QO & !MLS & !lLSTATE & pc-stop
# Q5 & Q4 & !Q3 & Q2 & Ql & !QO & !timerl & !ILSTATE & pc_stop
# !Q5 & Q4 & Q3 & !Q2 & Ql & QO & Val_n & !lLSTATE & pc_stop
# Q5 & !Q4 & !Q3 & !Q2 & !Ql & !QO & !QLS & !timerl & !lLSTATE& pc stop
# Q5 & !Q4 & !Q3 & !Q2 & !Q 1 & !QO & !HLS & !timerl & !lLSTATE & pc-stop
# Q5 & !Q4 & !Q3 & !Q2 & !Ql & !QO & !MLS & !timerl & !lLSTATE & pc-='stop
# Q5 & !Q4 & !Q3 & !Q2 & !Ql & QO & !ILS & !lLSTATE & pc stop
# Q5 & !Q4 & !Q3 & !Q2 & Ql & QO & !timerl & !lLSTATE & pc stop
# Q5 & !Q4 & Q3 & !Q2 & !QI & !QO & !ILS & !lLSTATE & pc_stop
# Q5 & Q4 & !Q3 & !Q2 & Ql & QO & !Val_n & !lLSTATE & pc_stop
# Q5 & Q4 & Q3 & Q2 & Ql & QO & !pcO & !pcl & !lLSTATE & pc_stop;


Appendix B. Cypress PLD ToolKit Source File (continued)

QI

Q2

Q3

0-

0-

0-

< oe>
 QI & !ILSTATE & pc stop
# !Q5 & !Q4 & !Q3 & !Q2 & Ql & QO & !pcO & pcl & !ILSTATE & pc_stop
# Q5 & Q4 & Q3 & Q2 & QI & QO & !pcO & !pcl & !ILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & !Q2 & !Ql & QO & !ILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & !Q2 & QI & !QO & !QLS & !ILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & !Q2 & QI & !QO & !timer2 & !ILSTATE & pc stop
# !Q5 & Q4 & !Q3 & !Q2 & QI & QO & nO & nl & !ILSTATE & pc-stop
# Q5 & !Q4 & !Q3 & !Q2 & !QI & QO &"!HLS & !ILSTATE & pcji"op
# Q5 & !Q4 & !Q3 & !Q2 & QI & !QO & !QLS & !ILSTATE & pc stop
# Q5 & !Q4 & !Q3 & !Q2 & QI & !QO & !timer2 & !MLS & !ILSTATE & pc stop
# Q5 & Q4 & !Q3 & !Q2 & QI & QO & Vatn & !ILSTATE & pc_stop;

< oe>
 Q2 & !ILSTATE & pc_stop
# Q5 & Q4 & Q3 & Q2 & QI & QO & !pcO & !pcl & !ILSTATE & pc_stop
#!Q5 & Q4 & !Q3 & !Q2 & QI & !QO & lHLS & !ILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & !Q2 & QI & !QO & !MLS & !ILSTATE & pc_stop
# Q5 & Q4 & Q3 & !Q2 & QI & QO & !Val_8 & !ILSTATE & pc_stop
# !Q5 & Q4 & Q3 & !Q2 & QI & QO & !ILSTATE & pc_stop
# Q5 & !Q4 & !Q3 & Q2 & !QI & !QO & !QLS & !ILSTATE & pc stop
# Q5 & !Q4 & !Q3 & Q2 & !QI & !QO & !timer2 & !ILSTATE & Pc stop
# Q5 & !Q4 & !Q3 & Q2 & QI & !QO & !timerl & !ILSTATE & pc stop
# Q5 & !Q4 & Q3 & Q2 & !QI & !QO & !timerl & !ILSTATE & pc-stop
# Q5 & Q4 & !Q3 & !Q2 & !Ql & !QO & !HLS & !ILSTATE & pc3top
# Q5 & Q4 & !Q3 & Q2 & Ql & QO & !ILSTATE & pc_stop;

< oe>
 Q3 & !ILSTATE & pc stop
# Q5 & Q4 & Q3 & Q2 & Ql &-QO & !pcO & !pcl & !ILSTATE & pc stop
# !Q5 & !Q4 & Q3 & !Q2 & !QI & !QO & !QLS & !ILSTATE & pc stop
# !Q5 & !Q4 & Q3 & !Q2 & !QI & !QO & !HLS & !ILSTATE & pc-stop
# !Q5 & !Q4 & Q3 & !Q2 & !Ql & !QO & !noise count & !ILSTATE & pc stop
# !Q5 & !Q4 & Q3 & !Q2 & !Ql & QO & !pcO &-pcl & !ILSTATE & pc_stop
# !Q5 & !Q4 & Q3 & !Q2 & !Ql & QO & !MLS & !ILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & !Q2 & Ql & QO & !nO & nl & !ILSTATE & pc stop
# !Q5 & Q4 & !Q3 & !Q2 & Ql & QO & nO & !nl & !ILSTATE & pc-stop
# Q5 & Q4 & Q3 & !Q2 & QI & QO & !ILSTATE & pc stop
# Q5 & !Q4 & !Q3 & Q2 & !Ql & !QO & !MLS & !ILST-ATE & pc_stop
# Q5 & !Q4 & Q3 & !Q2 & !Ql & !QO & !QLS & !ILSTATE & pc_stop
# Q5 & !Q4 & Q3 & !Q2 & !Ql & !QO & !HLS & !ILSTATE & pc_stop
# Q5 & !Q4 & Q3 & !Q2 & !Ql & !QO & !timer2 & !ILSTATE & pc_stop
# Q5 & !Q4 & Q3 & !Q2 & !Ql & !QO & !noise_count & !ILSTATE & pc_stop;


Q4 := <oe>
 Q4 & !ILSTATE & pc_stop
# !Q5 & !Q4 & !Q3 & !Q2 & Q1 & Q0 & !timer1 & !ILSTATE & pc_stop
# Q5 & Q4 & Q3 & Q2 & Q1 & Q0 & !pc0 & !pc1 & !ILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & !Q2 & !Q1 & !Q0 & Val_9 & !ILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & !Q2 & Q1 & !Q0 & !QLS & !ILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & !Q2 & Q1 & !Q0 & !timer2 & !ILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & !Q2 & Q1 & !Q0 & !MLS & !ILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & Q2 & Q1 & !Q0 & !ILSTATE & pc_stop
# !Q5 & Q4 & Q3 & !Q2 & Q1 & Q0 & !Val_n & !ILSTATE & pc_stop
# Q5 & !Q4 & !Q3 & Q2 & Q1 & Q0 & !HLS & !ILSTATE & pc_stop
# Q5 & !Q4 & !Q3 & Q2 & Q1 & Q0 & !MLS & !ILSTATE & pc_stop
# Q5 & !Q4 & !Q3 & Q2 & Q1 & Q0 & !timer1 & !ILSTATE & pc_stop
# Q5 & Q4 & !Q3 & !Q2 & !Q1 & !Q0 & !QLS & !ILSTATE & pc_stop
# Q5 & Q4 & !Q3 & !Q2 & !Q1 & !Q0 & !timer2 & !ILSTATE & pc_stop
# Q5 & Q4 & !Q3 & Q2 & !Q1 & !Q0 & !timer1 & !ILSTATE & pc_stop;

Q5

< oe>
 Q5 & !ILSTATE & pc_stop
# !Q5 & !Q4 & !Q3 & !Q2 & !Ql & !QO & !pcO & !pcl & !lLSTATE & pc_stop
# !Q5 & !Q4 & !Q3 & !Q2 & !QI & QO & !HLS & !ILSTATE & pc_stop
# !Q5 & !Q4 & !Q3 & Q2 & QI & !QO & !ILSTATE & pc_stop
# !Q5 & !Q4 & Q3 & !Q2 & !Ql & !QO & !QLS & !ILSTATE & pc_stop
# !Q5 & !Q4 & Q3 & !Q2 & !QI & !QO & !HLS & !ILSTATE & pc_stop
# !Q5 & !Q4 & Q3 & !Q2 & !QI & !QO & !noise count & !ILSTATE & pc stop
# !Q5 & Q4 & !Q3 & !Q2 & !Ql & !QO & !ILSTATE & pc stop
# !Q5 & Q4 & IQ3 & IQ2 & Ql & IQO & IQLS & !ILSTATE & pc_stop
# !Q5 & Q4 & IQ3 & IQ2 & Ql & !QO & !timer2 & !ILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & !Q2 & QI & QO & InO & !nl & !ILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & !Q2 & Ql & QO & nO & lnl & IILSTATE & pc_stop
# !Q5 & Q4 & !Q3 & Q2 & Ql & !QO & !ILSTATE & pc stop
# !Q5 & Q4 & Q3 & IQ2 & Ql & QO & !ILSTATE & pcjtop
# Q5 & !Q4 & !Q3 & !Q2 & Ql & !QO & !ILS & !ILSTATE & pc_stop
# Q5 & IQ4 & Q3 & !Q2 & !Ql & QO & !timerl & !ILSTATE & pc stop
# Q5 & Q4 & !Q3 & !Q2 & Ql & !QO & !ILSTATE & pc stop
# Q5 &Q4 & !Q3 & !Q2 & Ql & QO & Vatn & !ILSTATE & pc_stop;

{end of file}


Bus-Oriented Maskable Interrupt Controller

This application note illustrates the design
flexibility of Cypress's CY7C331 PLD by describing a
single-chip interrupt controller based on the PLD.
Virtually all microprocessor designs require some
type of interrupt support. Complex applications can
take advantage of a dedicated interrupt controller chip
from the microprocessor family. But for simple
applications or where special requirements exist, a standard
interrupt controller can prove inadequate or represent
overkill for the design.
In such cases, you generally implement a
custom-designed controller using some combination of MSI
logic and PLDs. The single-chip design described here
is implemented in two stages: The first stage comprises
a simple 4-channel controller, which includes the major
functional blocks. In the second stage, another
controller is cascaded from the stage-1 design to provide
support for up to eight interrupt channels.
The interrupt controller's design features include:
1. Programmable-polarity, level-sensitive inputs
2. Interlocked REQ/ACK handshake
3. Simple MPU bus attachment for read and write
4. Masking of individual channels
5. Prioritized interrupt vector
6. Fully asynchronous operation

CY7C331 Description
The device used to implement the interrupt controller
is the CY7C331, an asynchronous PLD packaged
in a 28-pin, 300-mil DIP. The device features 12 I/O
macrocells and 13 dedicated inputs. The I/O macrocell
has a separate input and output flip-flop, which is highly
useful in bus-oriented applications.

Figure 1. Data Bus Bit Assignments (mask word, write: one mask bit per
channel, 0 = enabled; interrupt vector, read: status bit plus priority vector)

Each flip-flop has a separate product term for the
clock, set, and reset. The output flip-flop's D input
incorporates an XOR with the sum-of-products array.
This allows you to select polarity or implement a toggle
or JK flip-flop.
The macrocell flip-flops also offer a unique
transparency feature: When the set and reset inputs are
both asserted, the flip-flop's Q output follows its D
input. Thus, you can use the flip-flop as a clocked
register with independent clock, set, and reset inputs or
as a combinational path.
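
The transparency feature can be pictured with a small behavioral model. This is a sketch, not a description of the silicon: the function name `ff_output` is hypothetical, and the behavior for set alone and reset alone (force 1 and force 0) is an assumption based on conventional flip-flop operation. Only the both-asserted case comes from the description above.

```python
def ff_output(q: int, d: int, clock_edge: bool,
              set_: bool, reset: bool) -> int:
    """Behavioral model of a macrocell flip-flop.

    q          current output
    d          value on the D input
    clock_edge True if the clock product term produces an edge now
    """
    if set_ and reset:
        return d          # transparent: Q follows D (combinational path)
    if set_:
        return 1          # assumed conventional set behavior
    if reset:
        return 0          # assumed conventional reset behavior
    return d if clock_edge else q   # ordinary clocked register


if __name__ == "__main__":
    # Transparent mode: output tracks D with no clock edge at all.
    for d in (0, 1, 0):
        print(ff_output(q=1, d=d, clock_edge=False, set_=True, reset=True))
```

Holding both set and reset asserted turns the register into a buffer, which is how a single macrocell can serve either as storage or as a pass-through path onto the bus.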

Design Description
The interrupt controller attaches to the MPU data
bus and is controlled by the system processor through
read and write ports on the data bus. The read port
provides interrupt status and a prioritized vector for the
processor, and the write port allows the processor to
selectively mask individual interrupt channels. The
controller provides a separate interrupt request line to the
processor to signal a pending interrupt. Figure 1 shows
the bit assignments for the read and write ports. In
Figure 2, you can see the interrupt controller's major
functional blocks.


Figure 2. Interrupt Controller Block Diagram
Additionally, the CY7C331 includes six shared
input multiplexers, which allow you to bury up to six
output flip-flops without giving up any pins. Figure 3
shows a block diagram of the CY7C331. A diagram of
the device's I/O macrocell appears in Figure 1 of the
application note "Using the CY7C331 as a Waveform
Generator."

Four-Channel Interrupt Controller
The interrupt controller's operation is quite simple.
On reset, all interrupt channels are masked off, and no
interrupts are permitted. The processor then loads the
mask register with the desired interrupt-channel mask
bits cleared. If the channel is not masked when an interrupt request occurs, the request is prioritized, and the
controller asserts the Interrupt Request (IRQ) to the
processor.
The processor responds to the IRQ by reading the
interrupt vector port. When the interrupt controller
detects the read, the controller latches the current
interrupt priority and places the priority vector on the
data bus. Latching the current priority while the vector
is being read prevents the vector from being altered
during the read cycle. In addition, the controller
decodes the vector and asserts the corresponding
channel's acknowledge line.
The acknowledge remains asserted until detected
by the interrupting element, which responds by
deasserting its interrupt request. This interlocking
handshake ensures that a pending interrupt is not lost or
responded to more than once. The controller also uses
the acknowledge internally to disable the interrupt
request into the priority encoder; this is done in the time
between the interrupt acknowledge and the interrupt
request being deasserted. A simple example of the
timing sequence for a single interrupting channel
appears in Figure 4.
Figure 5 defines the pin assignments for the
first-stage interrupt controller.
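
The masking and prioritizing steps described in this section can be sketched in a few lines of Python. This is an illustrative model only: the function name `prioritize` is hypothetical, a mask bit of 1 is taken to mean "masked", and the priority order (highest channel number wins) is an assumption made for the example, since the ordering within the four channels is not stated here.

```python
from typing import Optional, Tuple


def prioritize(requests: int, mask: int,
               channels: int = 4) -> Tuple[bool, Optional[int]]:
    """Return (irq, vector) for the given request and mask bits.

    requests  bit n is 1 when channel n is requesting an interrupt
    mask      bit n is 1 when channel n is masked off
    """
    pending = requests & ~mask & ((1 << channels) - 1)
    if pending == 0:
        return False, None               # no IRQ to the processor
    # Assumed ordering: the highest-numbered pending channel wins.
    return True, pending.bit_length() - 1


if __name__ == "__main__":
    # After reset every channel is masked, so nothing gets through.
    print(prioritize(0b0101, mask=0b1111))
    # Channels 0 and 2 request, channel 2 is masked: channel 0 is served.
    print(prioritize(0b0101, mask=0b0100))
```

In the real controller the vector is also latched for the duration of the read cycle, which this combinational sketch does not model.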

Figure 3. CY7C331 Block Diagram

Data Bus Interface
The data bus interface requires bidirectional
operation. When CS and WE are asserted Low, the controller
writes data into the mask register. When CS is asserted
Low and WE remains High, the controller holds the
current priority vector and interrupt status and places
them on the data bus. The CY7C331's I/O macrocell is
readily adapted to this requirement, as illustrated in
Figure 6.
The INTERRUPT status generation requires a
different implementation. If any interrupt requests are
pending when the controller detects a read cycle (CS
Low, WE High), the interrupt status bit must be
asserted High. Moreover, new interrupt requests are held
off until the end of the read cycle. This requires a
clocked implementation of the interrupt status bit on
the data bus, as shown in Figure 7.

Figure 4. Timing Sequence for Single Interrupt Channel

Figure 6. Mask/Priority Vector Function

Acknowledge Generation
Acknowledge generation requires that the
controller decode the priority vector placed on the data bus
and assert the corresponding acknowledge line until the
interrupt request line is deasserted. The controller must
handle the timing carefully for correct operation.
Specifically, because a valid priority vector is not
available until after CS is asserted Low, the controller
cannot decode the correct channel until the priority vector
register has settled. Thus, a delay is required before the
controller can generate an acknowledge.
The controller can generate a delay by taking
advantage of the following sequence: If there is a pending
interrupt request, the interrupt status bit is always
asserted one propagation delay after CS is asserted on a
read cycle. The interrupt status signal is then passed
through an internal strobe stage, which causes an
additional propagation delay. The internal strobe then
initiates the acknowledge-generation sequence.
The delayed strobe assures that the priority vector
value has settled and the setup requirements for
decoding have been met. An SR flip-flop implements the
acknowledge-generation function for each channel. The
flip-flop is set when a read cycle occurs, the priority
vector corresponds to the channel, and the delayed
internal strobe occurs. The flip-flop is reset when the
interrupt request for the channel is deasserted.
A logic diagram for the internal strobe generation
and a single acknowledge-generation block appears in
Figure 8. The timing diagram in Figure 9 illustrates a
typical operation.

Logic Equations
The Cypress PLD ToolKit assembles the Boolean
equations for the interrupt controller (Appendix A).
The equations are heavily commented for clarity.
Because the PLD ToolKit does not currently support
"DeMorganization," and because the CY7C331 contains
inverting output buffers, the Boolean equations for output
flip-flops are written for negative logic (i.e., solving for
zero). In addition, the inversion requires swapping of
the SET and RESET functions on the output flip-flops.
Thus, the logical Boolean equation required to set the
flip-flop must be implemented on the flip-flop's reset
input. Similarly, the equation required to reset the
flip-flop must be implemented on the flip-flop's set input.
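
The SET/RESET swap can be checked with a small model of a flip-flop followed by an inverting buffer. This is a sketch for illustration: the names `sr_flip_flop` and `pin_after` are hypothetical, and the flip-flop is modeled as a simple set-dominant SR element.

```python
from typing import Tuple


def sr_flip_flop(q: int, set_: bool, reset: bool) -> int:
    """Simple SR element: set forces 1, reset forces 0, else hold."""
    if set_:
        return 1
    if reset:
        return 0
    return q


def pin_after(q_internal: int, want_set: bool,
              want_reset: bool) -> Tuple[int, int]:
    """Apply the *logical* set/reset request for the output pin.

    Because the output buffer inverts, the logical set equation drives
    the flip-flop's reset input and the logical reset equation drives
    its set input. Returns (new internal Q, pin level).
    """
    q_next = sr_flip_flop(q_internal, set_=want_reset, reset=want_set)
    return q_next, 1 - q_next      # inverting output buffer


if __name__ == "__main__":
    print(pin_after(1, want_set=True, want_reset=False))   # pin goes High
    print(pin_after(0, want_set=False, want_reset=True))   # pin goes Low
```

Seen from the pin, the swapped wiring behaves like an ordinary set/reset output even though the internal register holds the complement.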

Figure 5. Interrupt Controller Pin Assignments (signals: CS, WE, RST,
REQ3-REQ0, IRQ, DTB3-DTB0, ACK3-ACK0)

Figure 7. Interrupt Status Generation

Figure 8. Internal Strobe/Acknowledge Generation

Figure 10. Cascading Interrupt Controllers (upper and lower interrupt
controllers)

Adding Cascade Capability
You can readily extend the interrupt controller
design to accommodate four additional channels by
incorporating a cascade mechanism. You can then attach
a second interrupt controller to the first (Figure 10).
The additional channels require an extension to the
formats of the mask register and the interrupt vector
(Figure 11).
The lower interrupt controller supports the
lower-priority interrupt channels, generates the IRQ to the
processor, and places the interrupt status and priority
vector on the data bus during a read cycle. The upper
interrupt controller supports the higher-priority
channels and passes its current status and priority vector
down to the lower interrupt controller.
The interrupt status line is asserted High when the
upper interrupt controller has a non-masked interrupt
request pending. To permit the host processor to write
into the upper interrupt controller's mask register, the
controller monitors the data bus's upper four bits. Because the upper interrupt controller passes its priority
vector directly to the lower interrupt controller, however, the upper interrupt controller does not need to
output any data on the bus during a read cycle.

Figure 9. Timing Diagram

Figure 11. Extended Interrupt Vector

In operation, the lower interrupt controller must monitor the interrupt status line from the upper controller. The lower controller incorporates the interrupt into the IRQ to the host processor and into the interrupt vector placed on the data bus during a read cycle.

Modifying the interrupt vector is straightforward. Because the upper interrupt channels have higher priority, when the interrupt status from the upper controller is asserted, the interrupt vector's lower two bits are the two vector bits from the upper controller. When the status is not asserted, the interrupt vector's lower two bits are the lower-priority interrupt vector encoded from the lower interrupt controller. The interrupt vector's third bit is simply the state of the interrupt status signal from the upper controller. The modified interrupt controller equations for the lower element appear in Appendix B, and the upper element equations in Appendix C.
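The vector merging described above can be sketched as a small function. This is a minimal behavioral model, not the PLD source: the function name and the integer encoding of the 2-bit vectors are assumptions made for illustration.

```python
def merge_vector(ustat: bool, rvec: int, lower_vec: int) -> int:
    """Build the 3-bit extended vector seen by the host.

    ustat     -- interrupt status from the upper controller (True = pending)
    rvec      -- 2-bit ripple vector from the upper controller
    lower_vec -- 2-bit vector encoded by the lower controller
    """
    if not 0 <= rvec <= 3 or not 0 <= lower_vec <= 3:
        raise ValueError("vectors must be 2-bit values")
    # Upper channels have priority: their vector wins whenever USTAT is asserted.
    low_bits = rvec if ustat else lower_vec
    # Bit 2 simply mirrors the upper controller's status line.
    return (int(ustat) << 2) | low_bits


print(merge_vector(True, 2, 1))    # upper request pending: bit 2 set, low bits from upper
print(merge_vector(False, 2, 1))   # no upper request: low bits from lower controller
```

With this encoding, vector values 4 through 7 identify the four upper channels and values 0 through 3 the four lower channels.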

Summary

The interrupt controller described in this application note can serve as the basis for flexible low-to-moderate-complexity interrupt controllers. You can extend the design as required for different request polarity levels, edge-sensitive inputs, or additional channels.

Simulations of the interrupt controller show that the design works as expected. You can obtain the PLD source files for the design from your local Cypress sales office.


Appendix A. PLD ToolKit Source Code
Stand Alone Interrupt Controller
{Stand Alone Interrupt Controller}
CY7C331; {declare device type}
CONFIGURE;
CS(node = 4), {pin 4, chip select}
WE(node = 5), {pin 5, write enable}
RST(node = 6), {pin 6, reset}
REQ3(node = 9), {pin 9, interrupt request channel 3}
REQ2(node = 10), {pin 10, interrupt request channel 2}
REQ1(node = 11), {pin 11, interrupt request channel 1}
REQ0(node = 12), {pin 12, interrupt request channel 0}
!IRQ(node = 27), {pin 27, interrupt to processor}
ISTAT(node = 28), {pin 28, data bus 3 - interrupt status}
PVEC2(node = 26), {pin 26, data bus 2 - priority vector bit 2}
PVEC1(node = 24), {pin 24, data bus 1 - priority vector bit 1}
PVEC0(node = 20), {pin 20, data bus 0 - priority vector bit 0}
ACK3(node = 18), {pin 18, acknowledge channel 3}
ACK2(node = 17), {pin 17, acknowledge channel 2}
ACK1(node = 16), {pin 16, acknowledge channel 1}
ACK0(node = 15), {pin 15, acknowledge channel 0}
MSK3(node = 34,SRC = 28), {shared input mux for pin 28}
MSK2(node = 33,SRC = 26), {shared input mux for pin 26}
MSK1(node = 32,SRC = 24), {shared input mux for pin 24}
MSK0(node = 31,SRC = 20), {shared input mux for pin 20}
ISTB(node = 25), {pin 25, internal strobe}

EQUATIONS;
IRQ = < oe>
  < set_out> {make FF transparent}
  < clr_out> {make FF transparent}
  < xsum> {force invert}
  < sum> REQ3 & !ACK3 & !MSK3
  # REQ2 & !ACK2 & !MSK2
  # REQ1 & !ACK1 & !MSK1
  # REQ0 & !ACK0 & !MSK0;
!ISTAT = < oe> !CS & WE
  < xsum> {force invert}
  < set_out> CS & ISTAT {FF output is reset}
  < ck_out> !CS & WE
  < set_in> !RST {interrupt is masked on reset}
  < ck_in> !WE & !CS
  < sum> REQ3 & !ACK3 & !MSK3
  # REQ2 & !ACK2 & !MSK2
  # REQ1 & !ACK1 & !MSK1
  # REQ0 & !ACK0 & !MSK0;
!PVEC2 = < oe> !CS & WE
  < set_out> {always zero}
  < set_in> !RST {interrupt is masked on reset}
  < ck_in> !WE & !CS;
!PVEC1 = < oe> !CS & WE
  < xsum> {force invert}
  < ck_out> !CS & WE
  < sum> !ACK3 & REQ3 & !MSK3
  # !ACK2 & REQ2 & !MSK2
  < set_in> !RST {interrupt is masked on reset}
  < ck_in> !WE & !CS;
!PVEC0 = < oe> !CS & WE
  < xsum> {force invert}
  < ck_out> !CS & WE
  < sum> !ACK3 & REQ3 & !MSK3
  # !ACK1 & REQ1 & !MSK1 & MSK2
  # !MSK1 & !ACK1 & REQ1 & !REQ2
  < set_in> !RST {interrupt is masked on reset}
  < ck_in> !WE & !CS;
!ACK3 = < oe>
  < clr_out> !CS & WE & PVEC1 & PVEC0 & ISTB & !ACK3 {FF output is set}
  < set_out> CS & ACK3 & !REQ3; {FF output is reset}
!ACK2 = < oe>
  < clr_out> !CS & WE & PVEC1 & !PVEC0 & ISTB & !ACK2 {FF output is set}
  < set_out> CS & ACK2 & !REQ2; {FF output is reset}
!ACK1 = < oe>
  < clr_out> !CS & WE & !PVEC1 & PVEC0 & ISTB & !ACK1 {FF output is set}
  < set_out> CS & ACK1 & !REQ1; {FF output is reset}
!ACK0 = < oe>
  < clr_out> !CS & WE & !PVEC1 & !PVEC0 & ISTB & !ACK0 {FF output is set}
  < set_out> CS & ACK0 & !REQ0; {FF output is reset}
!ISTB = < oe>
  < clr_out> ISTAT & !ISTB {FF output is set}
  < set_out> CS & ISTB; {FF output is reset}


Appendix B. PLD ToolKit Source Code
Cascadable Interrupt Controller - Lower Element
{Cascaded Interrupt Controller - Lower Element}
CY7C331; {declare device type}
CONFIGURE;
USTAT(node = 1), {pin 1, upper element interrupt status}
RVEC1(node = 2), {pin 2, ripple vector bit 1 from upper element}
RVEC0(node = 3), {pin 3, ripple vector bit 0 from upper element}
CS(node = 4), {pin 4, chip select}
WE(node = 5), {pin 5, write enable}
RST(node = 6), {pin 6, reset}
REQ3(node = 9), {pin 9, interrupt request channel 3}
REQ2(node = 10), {pin 10, interrupt request channel 2}
REQ1(node = 11), {pin 11, interrupt request channel 1}
REQ0(node = 12), {pin 12, interrupt request channel 0}
!IRQ(node = 27), {pin 27, interrupt to processor}
ISTAT(node = 28), {pin 28, data bus 3 - interrupt status}
PVEC2(node = 26), {pin 26, data bus 2 - priority vector bit 2}
PVEC1(node = 24), {pin 24, data bus 1 - priority vector bit 1}
PVEC0(node = 20), {pin 20, data bus 0 - priority vector bit 0}
ACK3(node = 18), {pin 18, acknowledge channel 3}
ACK2(node = 17), {pin 17, acknowledge channel 2}
ACK1(node = 16), {pin 16, acknowledge channel 1}
ACK0(node = 15), {pin 15, acknowledge channel 0}
MSK3(node = 34,SRC = 28), {shared input mux for pin 28}
MSK2(node = 33,SRC = 26), {shared input mux for pin 26}
MSK1(node = 32,SRC = 24), {shared input mux for pin 24}
MSK0(node = 31,SRC = 20), {shared input mux for pin 20}
ISTB(node = 25), {pin 25, internal strobe}

EQUATIONS;
IRQ = < oe>
  < set_out> {make FF transparent}
  < clr_out> {make FF transparent}
  < xsum> {force invert}
  < sum> REQ3 & !ACK3 & !MSK3
  # REQ2 & !ACK2 & !MSK2
  # REQ1 & !ACK1 & !MSK1
  # REQ0 & !ACK0 & !MSK0
  # USTAT;
!ISTAT = < oe> !CS & WE
  < xsum> {force invert}
  < set_out> CS & ISTAT {FF output is reset}
  < ck_out> !CS & WE
  < set_in> !RST {interrupt is masked on reset}
  < ck_in> !WE & !CS
  < sum> REQ3 & !ACK3 & !MSK3
  # REQ2 & !ACK2 & !MSK2
  # REQ1 & !ACK1 & !MSK1
  # REQ0 & !ACK0 & !MSK0
  # USTAT;
!PVEC2 = < oe> !CS & WE
  < xsum> {force invert}
  < ck_out> !CS & WE
  < sum> USTAT
  < set_in> !RST {interrupt is masked on reset}
  < ck_in> !WE & !CS;
!PVEC1 = < oe> !CS & WE
  < xsum> {force invert}
  < ck_out> !CS & WE
  < sum> !ACK3 & REQ3 & !MSK3 & !USTAT
  # !ACK2 & REQ2 & !MSK2 & !USTAT
  # RVEC1 & USTAT
  < set_in> !RST {interrupt is masked on reset}
  < ck_in> !WE & !CS;
!PVEC0 = < oe> !CS & WE
  < xsum> {force invert}
  < ck_out> !CS & WE
  < sum> !ACK3 & REQ3 & !MSK3 & !USTAT
  # !ACK1 & REQ1 & !MSK1 & MSK2 & !USTAT
  # !MSK1 & !ACK1 & REQ1 & !REQ2 & !USTAT
  # RVEC0 & USTAT
  < set_in> !RST {interrupt is masked on reset}
  < ck_in> !WE & !CS;
!ACK3 = < oe>
  < clr_out> !CS & WE & !PVEC2 & PVEC1 & PVEC0 & ISTB & !ACK3 {FF output is set}
  < set_out> CS & ACK3 & !REQ3; {FF output is reset}
!ACK2 = < oe>
  < clr_out> !CS & WE & !PVEC2 & PVEC1 & !PVEC0 & ISTB & !ACK2 {FF output is set}
  < set_out> CS & ACK2 & !REQ2; {FF output is reset}
!ACK1 = < oe>
  < clr_out> !CS & WE & !PVEC2 & !PVEC1 & PVEC0 & ISTB & !ACK1 {FF output is set}
  < set_out> CS & ACK1 & !REQ1; {FF output is reset}
!ACK0 = < oe>
  < clr_out> !CS & WE & !PVEC2 & !PVEC1 & !PVEC0 & ISTB & !ACK0 {FF output is set}
  < set_out> CS & ACK0 & !REQ0; {FF output is reset}
!ISTB = < oe>
  < clr_out> ISTAT & !ISTB {FF output is set}
  < set_out> CS & ISTB; {FF output is reset}


Appendix C. PLD ToolKit Source Code
Cascadable Interrupt Controller - Upper Element
{Cascaded Interrupt Controller - Upper Element}
CY7C331; {declare device type}
CONFIGURE;
CS(node = 4), {pin 4, chip select}
WE(node = 5), {pin 5, write enable}
RST(node = 6), {pin 6, reset}
REQ3(node = 9), {pin 9, interrupt request channel 3}
REQ2(node = 10), {pin 10, interrupt request channel 2}
REQ1(node = 11), {pin 11, interrupt request channel 1}
REQ0(node = 12), {pin 12, interrupt request channel 0}
PVEC3(node = 28), {pin 28, data bus 3 - always zero}
PVEC2(node = 26), {pin 26, data bus 2 - always zero}
PVEC1(node = 24), {pin 24, data bus 1 - always zero}
PVEC0(node = 20), {pin 20, data bus 0 - always zero}
ACK3(node = 25), {pin 25, acknowledge channel 3}
ACK2(node = 23), {pin 23, acknowledge channel 2}
ACK1(node = 19), {pin 19, acknowledge channel 1}
ACK0(node = 17), {pin 17, acknowledge channel 0}
MSK3(node = 34,SRC = 28), {shared input mux for pin 28}
MSK2(node = 33,SRC = 26), {shared input mux for pin 26}
MSK1(node = 32,SRC = 24), {shared input mux for pin 24}
MSK0(node = 31,SRC = 20), {shared input mux for pin 20}
ISTB(node = 27), {pin 27, internal strobe}
USTAT(node = 18), {pin 18, interrupt status output}
ISENSE(node = 30,SRC = 18), {shared input mux for pin 18}
{internal interrupt sense to generate input for ISTB}
RVEC1(node = 16), {pin 16, ripple vector bit 1 output}
RVEC0(node = 15), {pin 15, ripple vector bit 0 output}

EQUATIONS;
!PVEC3 = < set_out> {always zero}
  < set_in> !RST {interrupt is masked on reset}
  < ck_in> !WE & !CS;
!PVEC2 = < set_out> {always zero}
  < set_in> !RST {interrupt is masked on reset}
  < ck_in> !WE & !CS;
!PVEC1 = < xsum> {force invert}
  < ck_out> !CS & WE
  < sum> !ACK3 & REQ3 & !MSK3
  # !ACK2 & REQ2 & !MSK2
  < set_in> !RST {interrupt is masked on reset}
  < ck_in> !WE & !CS;
!PVEC0 = < xsum> {force invert}
  < ck_out> !CS & WE
  < sum> !ACK3 & REQ3 & !MSK3
  # !ACK1 & REQ1 & !MSK1 & MSK2
  # !MSK1 & !ACK1 & REQ1 & !REQ2
  < set_in> !RST {interrupt is masked on reset}
  < ck_in> !WE & !CS;
!ACK3 = < oe>
  < clr_out> !CS & WE & PVEC1 & PVEC0 & ISTB & !ACK3 {FF output is set}
  < set_out> CS & ACK3 & !REQ3; {FF output is reset}
!ACK2 = < oe>
  < clr_out> !CS & WE & PVEC1 & !PVEC0 & ISTB & !ACK2 {FF output is set}
  < set_out> CS & ACK2 & !REQ2; {FF output is reset}
!ACK1 = < oe>
  < clr_out> !CS & WE & !PVEC1 & PVEC0 & ISTB & !ACK1 {FF output is set}
  < set_out> CS & ACK1 & !REQ1; {FF output is reset}
!ACK0 = < oe>
  < clr_out> !CS & WE & !PVEC1 & !PVEC0 & ISTB & !ACK0 {FF output is set}
  < set_out> CS & ACK0 & !REQ0; {FF output is reset}
!USTAT = < oe>
  < xsum> {force invert}
  < set_out> {make FF transparent}
  < clr_out> {make FF transparent}
  < sum> REQ3 & !ACK3 & !MSK3
  # REQ2 & !ACK2 & !MSK2
  # REQ1 & !ACK1 & !MSK1
  # REQ0 & !ACK0 & !MSK0
  < ck_in> !CS & WE
  < clr_in> CS & ISENSE;
!RVEC1 = < oe>
  < xsum> {force invert}
  < set_out> {make FF transparent}
  < clr_out> {make FF transparent}
  < sum> !ACK3 & REQ3 & !MSK3
  # !ACK2 & REQ2 & !MSK2;
!RVEC0 =
  < xsum> {force invert}
  < set_out> {make FF transparent}
  < clr_out> {make FF transparent}
  < ck_out> !CS & WE
  < sum> !ACK3 & REQ3 & !MSK3
  # !ACK1 & REQ1 & !MSK1 & MSK2
  # !ACK1 & REQ1 & !MSK1 & !REQ2;
!ISTB = < oe>
  < clr_out> ISENSE & !ISTB {FF output is set}
  < set_out> CS & ISTB; {FF output is reset}


Using the CY7C330 as a Multi-channel Mbus Arbiter
This application note discusses the use of the CY7C330 as a bus arbiter for an Mbus system based on the Cypress SPARC CY7C600 RISC processor. The CY7C330 is a high-speed synchronous erasable programmable logic device (EPLD) optimized for finite state machine (FSM) applications.
The Cypress SPARC system utilizes a CY7C601 RISC processor, a CY7C602 floating-point unit (FPU), four CY7C604 cache controller and memory management units (CMU), and eight CY7C157 16K x 16 cache RAMs for a 256-Kbyte cache. The arbiter uses a combination of techniques to resolve Mbus access contention for a system with four CMU bus masters. Figure 1 shows a block diagram of the Mbus system.

CY7C330 Brief Description
The CY7C330 is a 66-MHz, high-performance PLD with 11 input latches, 17,000 programmable bits, four buried state registers, and 12 user-configurable output macrocells. It is manufactured using a CMOS 0.8-micron, double-metal processing technology that is UV erasable. The CY7C330 comes in 28-pin, 300-mil dual in-line and LCC/PLCC packages. You can partition it into multiple functional blocks, as shown in this application. (See Figure 1 in "Understanding the CY7C330 Synchronous EPLD" for a block diagram of the CY7C330.)

Figure 1. Mbus System Block Diagram

Mbus Description
The Mbus is a system bus defined to be a SPARC standard main memory interface for the Cypress CY7C604 SPARC cache/memory management unit. The M in Mbus stands for module and emphasizes the multi-processor module support that SPARC offers.
The Mbus is a high-speed synchronous, 64-bit, multiplexed address/data bus that operates at the CY7C601's clock rate. Mbus accesses are initiated by a master and responded to by a slave. Generally, a bus transaction takes place between a master and main memory, but in the case of direct data intervention, transactions can occur between masters.
The handshake between the CY7C604 CMU and the arbiter utilizes a request line (MRQ0-3) and a grant line (MGT0-3) for each master. A busy line (MBB) is common to all masters and indicates that the bus is in use.
Figure 2 shows the multiple Mbus request sequence. By design, bus mastership and resolution of multiple requests are performed outside the realm of Mbus and SPARC. This allows you to implement the arbitration scheme that best fits your system requirements. The application example presented here describes only one such implementation.
Mbus transfers are synchronous with respect to the system clock. The data transactions across the bus consist of a single-clock-period address phase and a multiple-clock-period data phase. The bus transfers data in word (64-bit), multi-word burst, or atomic-load-store formats. All signals are valid and sampled on the system clock's rising edge. The address phase is validated by the memory address strobe (/MAS) signal, which denotes the start of the actual data transfer. Bus states are indicated by three status lines and convey the current bus operation as well as error status. Figure 3 shows Mbus data transfer waveforms.

Figure 3. Mbus Data Transfer Waveforms

Timing Considerations

To meet the Mbus timing specifications, the arbiter must be able to accept a request, resolve any access contention, and grant bus rights to a master, all in a single Mbus clock cycle. In this application, a 66-MHz CY7C330 implements the arbiter, whose input registers run at the same 33-MHz clock rate as the CY7C601 and CY7C604s. This speed allows the arbiter inputs to meet the Mbus masters' timing requirements. The output registers (including the state machine) are clocked at twice the rate of the bus masters (66 MHz), enabling the arbiter to sample requests with the input latches on one Mbus clock cycle's rising edge, transfer from one state to another, and grant access before the Mbus clock's next rising edge. Figure 4 illustrates the timing relationship between Master 0 (CY7C604 at 33 MHz) and the 66-MHz CY7C330 arbiter.

Arbitration Scheme

You can employ several resolution techniques for the arbitration function. Fixed priority, rotating priority, least recently used (LRU), and random priority prove successful, although each has its own faults. A fixed priority, for instance, favors one requester more than the others. Rotating priority provides a simple but not always fair approach to arbitration. An LRU arbitration scheme represents the fairest form of contention resolution but requires a highly complex implementation. The random technique does not allow predictable arbitration results and could result in performance problems.
A combination of methods minimizes the associated problems. The circuit presented here, for example, employs both a random and a fixed priority scheme. The random scheme uses a 2-bit counter that increments every clock cycle and varies the priority accordingly.
You can set the priority function such that the processor can specify which master has the highest priority; the processor does this by loading a value into the CY7C330 via a store instruction. To support the processor in this function, the interface to the processor must provide a latched and decoded chip select, along with a latched write enable connected directly to the arbiter. The priority function can be of value if the preset highest-priority Mbus master is fetching a program's critical data from main memory. The remaining channels follow a preset priority defined in Table 1.
The Random Priority Counter employs the same priority scheme used for preset priority and operates only when the latched priority is disabled by the priority selection block via the EN signal.
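The combined scheme can be sketched behaviorally as follows. This is an illustrative model, not the PLD implementation: the function names are hypothetical, and the priority ordering assumes that Table 1 wraps modulo four (the selected master first, then each lower-numbered master in turn).

```python
from typing import Optional, Sequence


def select_priority(en: bool, latched: int, counter: int) -> int:
    """Priority-select block: latched value when EN is set, else the random counter."""
    return latched if en else counter


def arbitrate(requests: Sequence[bool], pri: int) -> Optional[int]:
    """Return the winning master (0-3), or None if nobody is requesting.

    requests[m] is True when master m is requesting the bus.
    The search order is pri, pri-1, pri-2, pri-3 (modulo 4).
    """
    for k in range(4):
        master = (pri - k) % 4
        if requests[master]:
            return master
    return None


pri = select_priority(en=False, latched=2, counter=0)
print(arbitrate([False, True, True, True], pri))   # master 0 idle, so master 3 wins
```

Because the counter value changes every output clock, the master that wins a simultaneous request varies from one arbitration to the next unless the processor enables the latched priority.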

Figure 2. Mbus Multiple Request Sequence

Figure 4. CY7C604 & CY7C330 Timing for Master 0

Design Partitioning
The arbiter design is partitioned into four functional blocks that are designed separately (Figure 5). The first block is the priority latch, which is a synchronous register using the decoded and latched chip select (/CS) and write enable (/WE) signals from the CY7C601 to generate an enable signal.
The priority latch accepts three data lines from the processor bus (one for the priority enable and two for the high-priority bus master's value). The latch loads the values into dedicated registers.
The random counter, a minor portion of the design, is a free-running counter that supplies a 2-bit binary value to the priority-select block. The count changes every output clock (CLK1) cycle and provides a "seed" for the random priority function.
The priority-select block chooses between the priority latch outputs (LP0-1) and the random counter value (CT0-1) using the EN signal as the selection criterion. The two outputs (PRI0-1) feed the handshake state machine and arbitrate between bus masters when more than one simultaneous request occurs.
The handshake state machine monitors the request (MRQ0-3) and busy (MBB) inputs and generates the grant (MGT0-3) signals that give an Mbus master ownership of the bus.

Figure 5. Arbiter Block Diagram

Priority Latch, Select and Random Counter
As described previously, the priority latch is a synchronous register loaded by the processor. When the active-Low write enable (/WE) and chip select (/CS) signals are both Low, the latch loads three data bits from the bus to the three macrocells dedicated to the priority latch. When either /WE or /CS is inactive (High), each register's output value is continuously reloaded every clock cycle, thus retaining the proper value. The equations for the priority latch are:
EN  = /CS*/WE*D2 + EN*WE  + EN*CS;
LP1 = /CS*/WE*D1 + LP1*WE + LP1*CS;
LP0 = /CS*/WE*D0 + LP0*WE + LP0*CS;
The random counter is simply a 2-bit counter that changes state every output clock (CLK1) transition. The counter clears when /RESET is Low and counts in a 0-1-2-3 sequence. The equations for the random counter are:
CT1 = CT1*/CT0 + /CT1*CT0;
CT0 = /CT0;
The priority selection block selects between the priority latch and the random counter. This block is a registered multiplexer that loads its register outputs with the priority latch value if EN = 1, or the counter's current state if EN = 0. The outputs are updated every clock and fed to the handshake state machine.

Table 1. Mbus Channel Priorities

Latched             PRIORITY
Value     FIRST     2ND       3RD       LOWEST
11        master3   master2   master1   master0
10        master2   master1   master0   master3
01        master1   master0   master3   master2
00        master0   master3   master2   master1

Handshake State Machine
The handshake state machine controls Mbus handshake and arbitration. The machine cycles through 13 discrete states in performing its function. On power-up or reset, the state machine enters the idle state, waiting for a bus request. Upon receiving a request (/MRQ0, for instance), the machine enters a wait mode (state GT0_0). In wait mode, the arbiter looks for busy (/MBB) to go inactive, while driving the /MGT0 output active. When /MBB goes inactive, the machine goes to state GT0_1 and holds /MGT0 active, while waiting for the granted master to assert /MBB. When /MBB is detected, the machine goes to state GT0_WAIT and looks for another request. The MGT0 grant line is held active during and after the sequence, allowing the master to maintain bus ownership until another master requests ownership.

Figure 6. Bus Master 0 State Diagram

Figure 6 shows the bus master 0 state diagram and the request/grant handshake. The operation is identical for each of the four bus masters.
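The handshake sequence for one grant can be sketched as a small state function. This is a behavioral illustration only: the phase names are shortened from the text (GT_0, GT_1, GT_WAIT stand for GTn_0, GTn_1, GTn_WAIT), the `step` and `granted` helpers are hypothetical, and `busy` is True when /MBB is asserted (Low).

```python
from typing import Optional, Tuple

State = Tuple[str, Optional[int]]  # (phase, owning master)


def step(state: State, winner: Optional[int], busy: bool) -> State:
    """Advance the handshake by one output clock.

    winner -- master chosen by arbitration among pending requests, or None
    busy   -- True while /MBB is asserted
    """
    phase, owner = state
    if phase in ("IDLE", "GT_WAIT"):
        # A new request starts a grant sequence; otherwise hold the current state.
        return ("GT_0", winner) if winner is not None else state
    if phase == "GT_0":
        # Grant is driven; wait for the bus to go idle.
        return ("GT_1", owner) if not busy else state
    if phase == "GT_1":
        # Wait for the granted master to claim the bus.
        return ("GT_WAIT", owner) if busy else state
    raise ValueError(f"unknown phase {phase!r}")


def granted(state: State) -> Optional[int]:
    """The grant stays with the owner in every phase except IDLE."""
    phase, owner = state
    return None if phase == "IDLE" else owner


s: State = ("IDLE", None)
for winner, busy in [(0, False), (None, False), (None, True), (None, False)]:
    s = step(s, winner, busy)
    print(s, "grant ->", granted(s))
```

One idle state plus three phases for each of the four masters gives the 13 states mentioned in the text.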
The equations for the handshake state machine can be produced from a state transition table that also includes the arbiter's priority encoding. The table can be reduced to a manageable number of minterms using a public-domain optimizer called McBOOLE (see the Reference). Appendix A shows the state transition table. The sum-of-products format equations are then merged into the Cypress PLD ToolKit design file with the priority-latch, random-counter, and priority-selection equations. The PLD ToolKit design file appears in Appendix B.

Design Verification
The CY7C330 four-channel Mbus arbiter design was entered and verified using the PLD ToolKit. Design verification was performed using the PLD ToolKit's interactive simulator. A mouse was used with pop-down menus to create the circuit stimuli by drawing the waveform on the graphics screen for each CY7C330 node or pin. The SIMULATE command was then selected, and the response waveforms were visually inspected, giving a high degree of confidence in the design's function before programming a part.
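A comparable software check of the smaller blocks can be done outside the PLD ToolKit. The sketch below is not the ToolKit simulator; it steps a behavioral model of the random counter and one priority-latch bit. It assumes the counter equations take the form CT1 = CT1*/CT0 + /CT1*CT0 and CT0 = /CT0, that each latch bit follows Q = /CS*/WE*D + Q*WE + Q*CS, and that reset acts as a synchronous clear; the function names are hypothetical.

```python
from typing import Tuple


def counter_next(ct1: bool, ct0: bool, reset_n: bool = True) -> Tuple[bool, bool]:
    """One output-clock step of the 2-bit random counter (reset_n Low clears it)."""
    if not reset_n:
        return False, False
    n1 = (ct1 and not ct0) or (not ct1 and ct0)
    n0 = not ct0
    return n1, n0


def latch_next(q: bool, d: bool, cs_pin: bool, we_pin: bool) -> bool:
    """One clock step of a priority-latch bit.

    cs_pin and we_pin are the pin levels of the active-Low /CS and /WE signals,
    so the bit loads D only when both pins are Low and holds otherwise.
    """
    return (not cs_pin and not we_pin and d) or (q and we_pin) or (q and cs_pin)


state = (False, False)
seen = []
for _ in range(5):
    seen.append(2 * state[0] + state[1])
    state = counter_next(*state)
print(seen)   # the count sequence, starting from the cleared state
```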

Reference
M.R. Dagenais, V.K. Agarwal, and N.C. Rumin, "McBOOLE: A New Procedure for Exact Logic Minimization," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. CAD-5, no. 1, January 1986, p. 229.


Appendix A. Mbus Handshake/Arbiter State Transition Table

/* STATE TABLE FOR MBUS ARBITER HANDSHAKE STATE MACHINE - names:
MBB,MRQ3,MRQ2,MRQ1,MRQ0,PRI1,PRI0,ST3,ST2,ST1,ST0,MGT3,MGT2,MGT1,MGT0; input
ST3,ST2,ST1,ST0,MGT3,MGT2,MGT1,MGT0; output
*/
/*
PRESENT          NEXT
STATE            STATE
(INPUTS)         (OUTPUTS)
-------------------------------------------
 MMMMPP    MMMM      MMMM
MRRRRRRSSSSGGGG  SSSSGGGG
BQQQQIITTTTTTTT  TTTTTTTT
B32101032103210  32103210
*/
X1111XX00000000  00000000  /* WAIT FOR MRQx */
X1110XX00X0XXXX  01000001  /* GOTO GT0 */
X1101XX00X0XXXX  01000010  /* GOTO GT1 */
X1011XX00X0XXXX  10000100  /* GOTO GT2 */
X0111XX00X0XXXX  10001000  /* GOTO GT3 */
X00000000X0XXXX  01000001  /* GOTO GT0 */
X00010000X0XXXX  10001000  /* GOTO GT3 */
X00100000X0XXXX  01000001  /* GOTO GT0 */
X00110000X0XXXX  10001000  /* GOTO GT3 */
X01000000X0XXXX  01000001  /* GOTO GT0 */
X01010000X0XXXX  10001000  /* GOTO GT3 */
X01100000X0XXXX  01000001  /* GOTO GT0 */
X10000000X0XXXX  01000001  /* GOTO GT0 */
X10010000X0XXXX  10000100  /* GOTO GT2 */
X10100000X0XXXX  01000001  /* GOTO GT0 */
X11000000X0XXXX  01000001  /* GOTO GT0 */
X00000100X0XXXX  01000010  /* GOTO GT1 */
X00010100X0XXXX  01000010  /* GOTO GT1 */
X00100100X0XXXX  01000001  /* GOTO GT0 */
X00110100X0XXXX  10001000  /* GOTO GT3 */
X01000100X0XXXX  01000010  /* GOTO GT1 */
X01010100X0XXXX  01000010  /* GOTO GT1 */
X01100100X0XXXX  01000001  /* GOTO GT0 */
X10000100X0XXXX  01000010  /* GOTO GT1 */
X10010100X0XXXX  01000010  /* GOTO GT1 */
X10100100X0XXXX  01000001  /* GOTO GT0 */
X11000100X0XXXX  01000010  /* GOTO GT1 */
X00001000X0XXXX  10000100  /* GOTO GT2 */
X00011000X0XXXX  10000100  /* GOTO GT2 */
X00101000X0XXXX  10000100  /* GOTO GT2 */
X00111000X0XXXX  10000100  /* GOTO GT2 */
X01001000X0XXXX  01000010  /* GOTO GT1 */
X01011000X0XXXX  01000010  /* GOTO GT1 */
X01101000X0XXXX  01000001  /* GOTO GT0 */
X10001000X0XXXX  10000100  /* GOTO GT2 */
X10011000X0XXXX  10000100  /* GOTO GT2 */
X10101000X0XXXX  10000100  /* GOTO GT2 */
X11001000X0XXXX  01000010  /* GOTO GT1 */
X00001100X0XXXX  10001000  /* GOTO GT3 */
X00011100X0XXXX  10001000  /* GOTO GT3 */
X00101100X0XXXX  10001000  /* GOTO GT3 */
X00111100X0XXXX  10001000  /* GOTO GT3 */
X01001100X0XXXX  10001000  /* GOTO GT3 */
X01011100X0XXXX  10001000  /* GOTO GT3 */
X01101100X0XXXX  10001000  /* GOTO GT3 */
X10001100X0XXXX  10000100  /* GOTO GT2 */
X10011100X0XXXX  10000100  /* GOTO GT2 */
X10101100X0XXXX  10000100  /* GOTO GT2 */
X11001100X0XXXX  01000010  /* GOTO GT1 */
/* CH 0 STATES */
0XXXXXX01000001  01000001  /* GT0_0, WAIT ON MBB=1 IN GT0_0 */
1XXXXXX01000001  00010001  /* GT0_0, GOTO GT0_1 */
1XXXXXX00010001  00010001  /* GT0_1, WAIT ON MBB=0 */
0XXXXXX00010001  00100001  /* GT0_1, GOTO GT0_WAIT */
X1111XX00100001  00100001  /* GT0_WAIT */
/* CH 1 STATES */
0XXXXXX01000010  01000010  /* GT1_0, WAIT ON MBB=1 IN GT1_0 */
1XXXXXX01000010  00010010  /* GT1_0, GOTO GT1_1 */
1XXXXXX00010010  00010010  /* GT1_1, WAIT ON MBB=0 */
0XXXXXX00010010  00100010  /* GT1_1, GOTO GT1_WAIT */
X1111XX00100010  00100010  /* GT1_WAIT */
/* CH 2 STATES */
0XXXXXX10000100  10000100  /* GT2_0, WAIT ON MBB=1 IN GT2_0 */
1XXXXXX10000100  00010100  /* GT2_0, GOTO GT2_1 */
1XXXXXX00010100  00010100  /* GT2_1, WAIT ON MBB=0 */
0XXXXXX00010100  00100100  /* GT2_1, GOTO GT2_WAIT */
X1111XX00100100  00100100  /* GT2_WAIT */
/* CH 3 STATES */
0XXXXXX10001000  10001000  /* GT3_0, WAIT ON MBB=1 IN GT3_0 */
1XXXXXX10001000  00011000  /* GT3_0, GOTO GT3_1 */
1XXXXXX00011000  00011000  /* GT3_1, WAIT ON MBB=0 */
0XXXXXX00011000  00101000  /* GT3_1, GOTO GT3_WAIT */
X1111XX00101000  00101000  /* GT3_WAIT */
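A table of this kind can be exercised in software by matching the present-state bits against each row, with X treated as a don't-care. The sketch below is illustrative only: it copies a handful of channel 0 rows, uses hypothetical helper names, and is not the McBOOLE input processing itself.

```python
from typing import Optional

# (input pattern, next-state output); field order is
# MBB, MRQ3..MRQ0, PRI1, PRI0, ST3..ST0, MGT3..MGT0 -> ST3..ST0, MGT3..MGT0
ROWS = [
    ("X1111XX00000000", "00000000"),  # idle, no requests
    ("X1110XX00X0XXXX", "01000001"),  # only MRQ0 active (Low): go to GT0_0
    ("0XXXXXX01000001", "01000001"),  # GT0_0: hold while MBB is Low
    ("1XXXXXX01000001", "00010001"),  # GT0_0: go to GT0_1
    ("1XXXXXX00010001", "00010001"),  # GT0_1: hold while MBB is High
    ("0XXXXXX00010001", "00100001"),  # GT0_1: go to GT0_WAIT
    ("X1111XX00100001", "00100001"),  # GT0_WAIT: hold while no requests
]


def matches(pattern: str, bits: str) -> bool:
    """True when every non-X position of the pattern equals the input bit."""
    return len(pattern) == len(bits) and all(
        p in ("X", b) for p, b in zip(pattern, bits)
    )


def next_state(bits: str) -> Optional[str]:
    """Return the output of the first matching row, or None if no row applies."""
    for pattern, output in ROWS:
        if matches(pattern, bits):
            return output
    return None


# Idle state, MBB Low, only master 0 requesting, PRI = 00.
print(next_state("0" + "1110" + "00" + "0000" + "0000"))
```

Feeding each returned output back in as the state and grant fields of the next input walks the machine through the GT0_0, GT0_1, and GT0_WAIT states.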


Appendix B. PLD ToolKit Source File for Mbus Arbiter
CY7C330;
{DESIGN FILE: FOUR CHANNEL MBUS ARBITRATION UNIT WITH
RANDOM PRIORITY COUNTERS AND SYNCHRONOUS PRIORITY ENABLE}
CONFIGURE;
{INPUTS}
CLKl,
CLK2,
!RESET,
MBB,
MRQO,
MRQl,
MRQ2,
MRQ3(node=9),
CS,
WE,
DO,
01,

{Output Clock 2x CLK2 }
{Input Clock = MBUS System Clock }
{Reset, Active Low}
{MBUS Busy, Active Low}
{MBUS Channel 0 Request, Active Low}
{MBUS Channel 1 Request, Active Low}
{MBUS Channel 2 Request, Active Low}
{MBUS Channel 3 Request, Active Low}
{Decoded Processor Chip Select}
{Processor Write Enable}
{Data Bus Bit 0, Lalched Priority Bit O}
{Data Bus Bit 1, Latched Priority Bit I}
{Data Bus Bit 2; Latched Priority Enable Bit}

02,

{OUTPUTS}
!MGTO(node= 15),
!MGTl,
!MGT2,
IMGT3,
lEN,
IPRIO(node=23 ),
!PRIl,
!CTO,

!cn,

!LPO,
!LPt,
INT RST(node=29),
STO(node=31),

STl,
ST2,
ST3,

{MBUS Channel 0 Grant, Active
{MBUS Channel 1 Grant, Active
{MBUS Channel· 2 Grant, Active
{MBUS Channel 3 Grant, Active
{Settable Priority Enable Bit}
{Priority Selection Bit O}
{Priority Selection Bit I}
{Random Counter Bit O}
{Random Counter Bit I}
{Latched Priority Bit O}
{Latched Priority Bit I}
{Sync Reset Node}
{State Variable Bit O}
{State Variable Bit I}
{State Variable Bit 2}
{State Variable Bit 3}
{End of configuration section}

Low}
Low}
Low}
Low}

EQUATIONS;
INT_ RST = RESET;
{MBUS Request/Grant Handshake State Machine Equations}
ST3 =



IMRQ3*MRQl *MRQO*/PRIl *IST3*IST2*ISTO

+ IMRQ3*PRIl *PRIO*IST3*IST2*ISTO

+ IMRQ3*MR QO*/PRIl */PRIO*IST3*IST2*ISTO
+ IMRQ3*MRQ2*MRQl*MRQO*IST3*IST2*ISTO
+ IMRQ2*PRll*/PRIO*IST3*IST2*ISTO
+ MRQ3*/MRQ2*MRQl*MRQO*IST3*IST2*ISTO
+ MRQ3*/MRQ2*MRQO*/PRIO*IST3*IST2*ISTO
+ MRQ3*/MRQ2*PRIl*IST3*IST2*ISTO
+ IMBB*ST3*IST2*ISTl*ISTO*/MGT3*MGT2*/MGTl*/MGTO
+ IMBB*ST3*IST2*ISTl*ISTO*MGT3*/MGT2*/MGTl*/MGTO;

6-276

~

~~RESS

Using the CY7C330 as a Multi-channel Mbus Arbiter

~;r~~~OR~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Appendix B. PLD ToolKit Source File for Mbus Arbiter
ST2

 MRQ2*/MRQl *PRIl */PRIO*IST3*IST2*ISTO
+ MRQ2*MRQ 1*/MRQO*/PRIO*IST3*IST2*ISTO
+ IMR Q 1*/PRIl *PRIO*IST3*IST2*ISTO
+ IMRQO*/PRIl */PRIO*IST3*IST2*ISTO

+ MRQl*/MRQO*/PRIl*IST3*IST2*ISTO
+ MRQ3*MRQ2*/MRQ 1*MRQO*IST3*IST2*ISTO
+ MRQ3*MRQ2*/MRQl*PRIl*IST3*IST2*ISTO
+ MRQ3*MRQ2*MRQ 1*/MRQO*IST3*IST2*ISTO
+ IMBB*IST3*ST2*ISTl *ISTO*/MGT3*/MGT2*/MGTl *MGTO
+ IMBB*IST3*ST2*ISTl *ISTO*IMGT3*/MGT2*MGTl */MGTO;
STl
 IMBB*IST3*IST2*ISTl *STO*/MGT3*/MGT2*MGTl */MGTO
+ IMBB*IST3*IST2*/ST 1*STO*IMGT3*MGT2*/MGTl */MGTO
+ IMBB*IST3*IST2*ISTl *STO*MGT3*/MGT2*/MGTl */MGTO
+ IMBB*IST3*IST2*ISTl*STO*/MGT3*/MGT2*/MGTl*MGTO
+ MRQ3*MRQ2*MRQl *MRQO*IST3*IST2*STl *ISTO*/MGT3*/MGT2*MGTl */MGTO
+ MRQ3*MRQ2*MRQl*MRQO*IST3*IST2*STl*ISTO*/MGT3*MGT2*/MGTl*/MGTO
+ MRQ3*MRQ2*MRQ 1*MRQO*IST3*IST2*STl *ISTO*/MGT3*/MGT2*/MGTl *MGTO
+ MRQ3*MRQ2*MRQl*MRQO*IST3*IST2*STl*ISTO*MGT3*IMGT2*/MGTl */MGTO;
STO =  MBB*IST3*IST2*ISTl *STO*/MGT3*/MGT2*/MGTl *MGTO
+ MBB*IST3*/ST2*ISTl *STO*IMGT3*/MGT2*MGTl */MGTO
+ MBB*/ST3*IST2*/STl *STO*/MGT3*MGT2*/MGTl *IMGTO
+ MBB*IST3*/ST2*ISTl *STO*MGT3*/MGT2*/MGTl */MGTO
+ MBB*IST3*ST2*ISTl *ISTO*/MGT3*/MGT2*MGTl *IMGTO
+ MBB*ST3*IST2*ISTl */STO*/MGT3*MGT2*/MGTl */MGTO
+ MBB*/ST3*ST2*/STl*ISTO*/MGT3*/MGT2*/MGTl*MGTO
+ MBB*ST3*/ST2*/STl *ISTO*MGT3*/MGT2*/MGTl */MGTO;
MGT3=
 /MRQ3*MRQl *MRQO*/PRIl *IST3*/ST2*ISTO
+ IMRQ3*PRIl *PRIO*/ST3*IST2*/STO
+ IMRQ3*MRQO*/PRIl */PRIO*/ST3*/ST2*ISTO
+ /MRQ3*MRQ2*MRQ 1*MRQO*IST3*IST2*/STO
+ MBB*IST3*IST2*/STl *STO*MGT3*IMGT2*/MGTl */MGTO
+ IMBB*/ST3*/ST2*/STl *STO*MGT3*/MGT2*/MGTl */MGTO
+ MRQ3*MRQ2*MRQ 1*MRQO*IST3*/ST2*STl */STO*MGT3*/MGT2*/MGTl */MGTO
+ MBB*ST3*IST2*/STl */STO*MGT3*/MGT2*/MGTl */MGTO
+IMBB*ST3*IST2*/STl *ISTO*MGT3*IMGT2*IMGTl */MGTO;
MGT2 = 
 MBB*IST3*IST2*/STl *STO*/MGT3*MGT2*IMGTl */MGTO
+ IMBB*/ST3*IST2*ISTl *STO*/MGT3*MGT2*/MGTl */MGTO
+ IMRQ2*PRIl */PRIO*IST3*IST2*ISTO
+ MRQ3*/MRQ2*MRQ 1*MRQO*IST3*IST2*ISTO
+ MRQ3*MRQ2*MRQ 1*MRQO*IST3*IST2*STl *ISTO*/MGT3*MGT2*/MGTl */MGTO

+
+
+
+
MGTl

MRQ3*/MRQ2*MRQO*/PRIO*IST3*IST2*ISTO
MRQ3*/MRQ2*PRIl */ST3*IST2*ISTO
MBB*ST3*IST2*ISTl *ISTO*/MGT3*MGT2*/MGTl */MGTO
IMBB*ST3*IST2*ISTl *ISTO*/MGT3*MGT2*/MGTl */MGTO;

= 
 MBB*IST3*IST2*ISTl *STO*/MGT3*/MGT2*MGTl */MGTO
+ IMBB*IST3*IST2*ISTl *STO*/MGT3*/MGT2*MGTl */MGTO
+ MRQ2*/MRQl *PRIl */PRIO*IST3*IST2*ISTO
+ IMRQ 1*/PRIl *PRIO*IST3*IST2*ISTO

+
+
+
+

MRQ3*MRQ2*/MRQl*MRQO*IST3*IST2*ISTO
MRQ3*MRQ2*/MRQ 1*PRIl *IST3*/ST2*/STO
MRQ3*MRQ2*MRQ 1*MRQO*IST3*/ST2*STl */STO*/MGT3*/MGT2*MGTl */MGTO
IMBB*/ST3*ST2*ISTl */STO*/MGT3*/MGT2*MGTl */MGTO
+ MBB*/ST3*ST2*/STl *ISTO*/MGT3*/MGT2*MGTl */MGTO;

MGT0 =
  MBB*/ST3*/ST2*/ST1*ST0*/MGT3*/MGT2*/MGT1*MGT0
+ /MBB*/ST3*/ST2*/ST1*ST0*/MGT3*/MGT2*/MGT1*MGT0
+ MRQ2*MRQ1*/MRQ0*/PRI0*/ST3*/ST2*/ST0
+ /MRQ0*/PRI1*/PRI0*/ST3*/ST2*/ST0
+ MRQ1*/MRQ0*/PRI1*/ST3*/ST2*/ST0
+ MRQ3*MRQ2*MRQ1*/MRQ0*/ST3*/ST2*/ST0
+ MRQ3*MRQ2*MRQ1*MRQ0*/ST3*/ST2*ST1*/ST0*/MGT3*/MGT2*/MGT1*MGT0
+ /MBB*/ST3*ST2*/ST1*/ST0*/MGT3*/MGT2*/MGT1*MGT0
+ MBB*/ST3*ST2*/ST1*/ST0*/MGT3*/MGT2*/MGT1*MGT0;

{Random Counter Equations}
CT1 =
  CT1*/CT0
+ /CT1*CT0;
CT0 = /CT0;

{Latched Priority Equations}
EN =
  /CS*/WE*D2
+ EN*WE
+ EN*CS;
LP1 =
  /CS*/WE*D1
+ LP1*WE
+ LP1*CS;
LP0 =
  /CS*/WE*D0
+ LP0*WE
+ LP0*CS;

{Priority Selection Latch}
PRI1 =
  /EN*CT1
+ EN*LP1;
PRI0 =
  /EN*CT0
+ EN*LP0;
{End of file}
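
The counter, latch, and priority-selection equations at the end of the listing are small enough to check by hand. The following Python sketch is only a behavioral illustration, not part of the PLD ToolKit source; the function names are hypothetical, and each Boolean expression simply transcribes the sum-of-products form shown above.

```python
def next_count(ct1: bool, ct0: bool) -> tuple[bool, bool]:
    """Next state of the 2-bit counter: CT1 = CT1*/CT0 + /CT1*CT0, CT0 = /CT0."""
    return (ct1 and not ct0) or (not ct1 and ct0), not ct0


def latch(q: bool, cs: bool, we: bool, d: bool) -> bool:
    """Latched priority bit: Q = /CS*/WE*D + Q*WE + Q*CS.

    The latch follows D while CS and WE are both Low and holds otherwise.
    """
    return (not cs and not we and d) or (q and we) or (q and cs)


def select_priority(en: bool, ct: tuple[bool, bool], lp: tuple[bool, bool]) -> tuple[bool, ...]:
    """PRIx = /EN*CTx + EN*LPx: the counter value when EN is Low, the latched value when High."""
    return tuple((not en and c) or (en and l) for c, l in zip(ct, lp))


if __name__ == "__main__":
    state = (False, False)
    for _ in range(4):
        state = next_count(*state)
        print(state)
```

Stepping the counter from 00 visits 01, 10, 11, and returns to 00, so with EN Low the priority code rotates through all four masters.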


Using the CY7C331 as a Waveform Generator
This application note demonstrates the ability of the
Cypress CY7C331 CMOS Erasable Programmable Logic
Device (EPLD) to implement a design requiring multiple
clocks, input registers, buried registers, and independent
control of individual registers' set and reset inputs. Combined
with this design flexibility, the CY7C331 provides
high-speed performance, an unprecedented combination.
The application example described in this note shows how
to use the CY7C331 as a programmable waveform generator.

CY7C331 Background
The CY7C331 is a member of the Cypress slimline
28-pin family of high-performance CMOS EPLDs, which
are characterized by high speed, increased I/O, and high
integration. The CY7C331 has a highly flexible architecture
that supports asynchronous and general-purpose glue-logic
integration applications.
The CY7C331 has a 192-product-term array and 12
I/O-logic macrocells. Each macrocell has two D-type flip-flops
with asynchronous set, reset, and bypass capability.
You can individually program the flip-flops' clock, set,
and reset inputs, as well as each macrocell's logic polarity
and output enable control. The CY7C331 easily supports
combinatorial and registered inputs, along with buried
states.

Figure 1. The CY7C331 I/O Macrocell and Shared Input Mux

The ability to bury registers and associated gates is
highly desirable because it helps increase the number of
usable gates in an EPLD. Typically, if you use an I/O pin
as an input, you waste the output register and its supporting
product term structure. This loss occurs because conventional
devices provide only one macrocell feedback
path. Using this path as an input makes it impossible to
feed the contents of the register back into the array.
The CY7C331's dual-muxing structure eliminates this
limitation by allowing you to use the shared input mux
(Figure 1) as an I/O path into the array, while simultaneously feeding back the register contents using the
separate macrocell feedback mux. Because you can make
the CY7C331's output register transparent by asserting
both the register's set and clear nodes, you can also
achieve simultaneous combinatorial feedback. Using this
feature, you can implement bidirectional I/O in both
registered and combinatorial configurations.

Configuring the CY7C331
Figure 2 lists PLD ToolKit source code that configures
a CY7C331 I/O macrocell as bidirectional, with
feedback from the output. The I/O pin corresponding to
the macrocell is labeled IO_PIN, and the path from the
I/O pin to the macrocell is IN_PATH. The code includes
explanatory comments.
Note that the source code assigns IO_PIN to node 28
and IN_PATH to node 34, with pin 28 as a source. In the
PLD ToolKit simulator, you must add the input waveform
on the trace corresponding to node 28, even though that
trace is named IO_PIN. IN_PATH's node 34 is a read-only
node. This is true even if you configure IO_PIN as a
buried register and IN_PATH is always an input. The
reason is that node 34 is just a mux, and the register
associated with the input belongs to node (pin) 28. If you
want to see the output register's value when the pin is an
input, you can create a view node for the mux node. This
arrangement allows you to probe several different places
inside a macrocell (see the Reference for more information
on view nodes).

{*****************************************************************************************}
CY7C331;    {The first line of code selects the device}
CONFIGURE;  {In this section pin and node names are specified, along with configuration information}

INCLK, OUTCLK, /INCLR, /INSET, OE1, /OE2, INPUT, /OUTCLR(NODE=9), /OUTSET,
{The input names are listed above. Pin 1 will be the input clock, pin 2 will be the output clock. Pins 3 and
4 will be the input register's clear and set signals respectively. Pins 5 and 6 will be output enables, OE1 is high
asserted, /OE2 is low asserted. Pin 7 is a straight input. We skip pin 8 because it is Vss. Pins 9 and 10 will be
the output register's clear and set signals.}
IO_PIN(NODE=28, IREG), IN_PATH(NODE=34, SRC=28), OUT(NODE=27),
{Pin 28 is the actual bidirectional pin. The IREG attribute specifies that the input to the array comes from
the output register, rather than the pin. Node 34 is the shared input mux for nodes 27 and 28. IN_PATH is the
input path to the array from pin 28. Pin 27 is a simple output.}
EQUATIONS; {This is where the array is specified.}
 INPUT {When IO_PIN is an output, it follows Pin 7.}
 OUTSET
 OUTCLR
 OUTCLK
 OE1 * OE2 {Outputs are enabled when OE1 is high, and /OE2 is low.}
 INCLK
 INCLR
 INSET;
OUT =
{Listing the connective alone sets the product term to "1", always asserted.}
{When both the set and reset product terms are asserted, the register}
{becomes transparent. Thus, this is a combinatorial output.}
 IN_PATH; {This output always shows the value of the input register at pin 28.}
{If the register is in combinatorial mode, the value on pin 28 will be shown.}

Figure 2. PLD ToolKit Source Code for a Bidirectional Pin With Feedback

The CY7C331 as a Function Generator
Waveform generators are useful in a variety of applications,
primarily in the test and diagnostic areas. Any
time you need to create high-speed digital waveforms, a
programmable waveform generator is the ideal solution.
The CY7C331 design described here allows you to
generate waveforms of frequencies greater than 30 MHz.
This waveform generator builds waveforms with
respect to a system clock called SYS_CLK. To use the
generator, you load into LOW_REG(2:0) the number of
SYS_CLK cycles that you want the output waveform
(OUT_WAVE) to remain Low. HI_REG(2:0) contains the
number of SYS_CLK cycles that you want OUT_WAVE
to be High. For this implementation, the values must be
between 2 and 7.
When the START signal is asserted, OUT_WAVE
goes Low, and LOW_REG(2:0) is loaded into a counter.
When the count is almost 0, the signal TERM_CNT is
deasserted, then reasserted when the count reaches 0. This
toggles OUT_WAVE and loads a second counter with the
value in HI_REG(2:0). The cycle repeats, alternating
between HI_REG(2:0) and LOW_REG(2:0) until SYS_CLK
is withheld, or new values are loaded into HI_REG(2:0)
and LOW_REG(2:0), and START is reissued. Figure 3
depicts the waveforms for this design.
HI_REG(2:0) and LOW_REG(2:0) are loaded using
/DS and ADDR(7:0). You can specify any address for
these registers. In this example, HI_REG(2:0) is at
ADDR(7:0) = 00 Hex, and LOW_REG(2:0) is at
ADDR(7:0) = 01 Hex.
LOW_CLK_IN is the clock input for LOW_REG(2:0).
The clock results from decoding the active-Low /DS
(data strobe) and ADDR(7:0) = 01 Hex.
HI_CLK_IN is similarly decoded from /DS and
ADDR(7:0) = 00 Hex.
LOW_CNT(2:0) and HI_CNT(2:0) form two 3-bit
counters. These counters are loaded with the contents of
the LOW_REG(2:0) and HI_REG(2:0) registers, respectively,
via each flip-flop's individual set and reset.
LOW_CNT(2:0) is loaded when /TERM_CNT is Low
and OUT_WAVE is High. Similarly, HI_CNT(2:0) is
loaded when /TERM_CNT is Low and OUT_WAVE is
Low. SYS_CLK clocks both counters.
/TERM_CNT is also clocked by SYS_CLK and
detects when either of the counters equals 1. When this
occurs, /TERM_CNT goes Low for one clock, then goes
High again. /TERM_CNT's rising edge clocks
OUT_WAVE, which toggles on every clock.
Implementing this design requires two separate 3-bit
input registers, decoding logic for the input-register
clocks, two separate 3-bit counters, logic, and two
miscellaneous registers. All the counter flip-flops must be
individually settable or resettable. In addition, there are four
separate clocking functions. Figure 4 shows an implementation
of this design using small-scale integration.
This type of design is usually difficult to implement
in a PLD. The flip-flops in most PLDs permit neither the
use of the individual set and reset inputs nor separate
clocking. Because the CY7C331 has these features, however,
it implements the design effortlessly.
Figure 3. Waveform Generator Internal and External Timing

PLD ToolKit Implementation
Appendix A contains the Cypress PLD ToolKit source
code for the waveform generator. Two aspects of the code
require some clarification: the pin assignments and
polarity.
The pin assignments for nodes (pins) 1 through 14
are straightforward. Pin 8 has been skipped because it is a
Vss pin. Otherwise, these pins are the CY7C331's
combinatorial inputs and thus require no configuration
information.
OUT_WAVE is assigned to pin 16. "IOP" following
the node assignment indicates that the feedback mux is
programmed to feed back the OUT_WAVE register's Q
output. Because this is the default, it does not need to be
specified, but it is included here for documentation
purposes. The same is true for TERM_CNT, /HI_CNT_0,
and /LOW_CNT_1.
Notice that HI_IN_1 and LOW_IN_0 have the attribute
"IREG" listed after the node assignment. This attribute
specifies that these pins are dedicated inputs; the
feedback mux selects the Q output of the input register
associated with the pin, as opposed to the output register's
Q output. This is an override of the default discussed
above.
Figure 4. Schematic of the Waveform Generator

The rest of the assignments are of the same form as
/HI_CNT_2 and HI_IN_2. /HI_CNT_2 is assigned to node
18, with an attribute of IOP. As mentioned earlier, this
configures the feedback mux to select feedback from
/HI_CNT_2 as the array input. HI_IN_2 is assigned to
node 30, which is an additional mux that serves as an
input path from the input register on either pin 18 or 17.
The notation "SRC = 18" specifies that HI_IN_2 is assigned
to the input register on pin 18. The default is that
the even pin is always selected, and thus "SRC = 18" is
included primarily for documentation purposes. This
method for utilizing both a pin's input and output registers
is used four times in this design. In each case, the output
register is buried (not accessible to the pin). Figure 5
shows the CY7C331 footprint with all external pin signals
labeled.

Figure 5. Footprint of the CY7C331 Waveform Generator

A close look at the file in Appendix A might also
raise questions concerning polarity conventions in the
PLD ToolKit. Polarity on inputs is fairly straightforward.
Note that the "/" in /START denotes a Low-asserted signal.
When START appears in the EQUATIONS section
(refer to the /OUT_WAVE and /TERM_CNT equations)
without the "/", the signal is interpreted as /START being
asserted. Thus, when /START = 0, the OUT_WAVE
register is set.
The output feedback polarity can cause more confusion.
Polarity on the CY7C331 is programmed using the
XOR in the array. Thus, when TERM_CNT is specified in
the CONFIGURATION section, the output register is
actually /TERM_CNT, because an inverter lies between the
register output and the pin. Further, when you set
TERM_CNT, the pin is Low. How, then, do you specify
that TERM_CNT is asserted when it appears on the right
of an equation? You refer to the polarity present on the
pin. Thus, in the /OUT_WAVE equation's clock
portion, TERM_CNT is specified. This means that
/OUT_WAVE is clocked when pin 17 (TERM_CNT)
exhibits a rising edge.

Reference
PLD ToolKit Manual, Chapter 4.3. Available from
Cypress Semiconductor.


Appendix A. PLD ToolKit Code for the Waveform Generator
CY7C331;
CONFIGURE;
/DS,                                        {Low asserted data strobe}
ADDR0, ADDR1, ADDR2, ADDR3, ADDR4, ADDR5,   {address bits 0,1,2,3,4,5}
ADDR6(NODE=9), ADDR7,                       {address bits 6 and 7}
/START,                                     {start sequence}
SYS_CLK,                                    {counter clock}
SYS_CLEAR(NODE=14),                         {initialize OUT_WAVE, TERM_CNT to a quiescent state}
OUT_WAVE(NODE=16,IOP),                      {output wave form}
TERM_CNT(NODE=17,IOP),                      {terminal count decode register}
/HI_CNT_2(NODE=18,IOP),                     {high counter bit 2, a buried register}
HI_IN_2(NODE=30,SRC=18),                    {high register input bit 2}
HI_IN_1(NODE=19,IREG),                      {high counter input bit 1}
/HI_CNT_1(NODE=20,IOP),                     {high counter bit 1, a buried register}
HI_IN_0(NODE=31,SRC=20),                    {pin 20 acts as high register input bit 0}
/HI_CNT_0(NODE=23,IOP),                     {high counter bit 0}
/LOW_CNT_2(NODE=24,IOP),                    {low counter bit 2, a buried register}
LOW_IN_2(NODE=32,SRC=24),                   {pin 24 is low register input bit 2}
/LOW_CNT_1(NODE=25,IOP),                    {low counter bit 1}
LOW_IN_1(NODE=33,SRC=26),                   {pin 26 acts as low register input bit 1}
/LOW_CNT_0(NODE=26,IOP),                    {low counter bit 0, a buried register}
LOW_IN_0(NODE=27,IREG),                     {low register input bit 0}

EQUATIONS;
LOW_CNT_0 :=
 /LOW_CNT_0
 SYS_CLK
 DS*ADDR0*/ADDR1*/ADDR2*/ADDR3*/ADDR4*/ADDR5*/ADDR6*/ADDR7
 /LOW_IN_0 * /OUT_WAVE * /TERM_CNT
 LOW_IN_0 * /OUT_WAVE * /TERM_CNT;
 DS*ADDR0*/ADDR1*/ADDR2*/ADDR3*/ADDR4*/ADDR5*/ADDR6*/ADDR7;

LOW_CNT_1 :=
 LOW_CNT_1
 LOW_CNT_0
 /LOW_IN_1 * /OUT_WAVE * /TERM_CNT
 LOW_IN_1 * /OUT_WAVE * /TERM_CNT
 SYS_CLK;
LOW_CNT_2 :=
 LOW_CNT_2
 LOW_CNT_0 * LOW_CNT_1
 /LOW_IN_2 * /OUT_WAVE * /TERM_CNT
 LOW_IN_2 * /OUT_WAVE * /TERM_CNT
 SYS_CLK
 DS*ADDR0*/ADDR1*/ADDR2*/ADDR3*/ADDR4*/ADDR5*/ADDR6*/ADDR7;
 OUT_WAVE
 TERM_CNT
 START
 SYS_CLEAR
/TERM_CNT :=
 /LOW_CNT_0 * LOW_CNT_1 * LOW_CNT_2
 /HI_CNT_0 * HI_CNT_1 * HI_CNT_2
 START
 SYS_CLEAR
;
 /HI_CNT_0
 SYS_CLK
 HI_IN_0 * OUT_WAVE * /TERM_CNT
 /HI_IN_0 * OUT_WAVE * /TERM_CNT;
 HI_CNT_1
 HI_CNT_0
 /HI_IN_1 * OUT_WAVE * /TERM_CNT
 HI_IN_1 * OUT_WAVE * /TERM_CNT
 SYS_CLK
 DS*/ADDR0*/ADDR1*/ADDR2*/ADDR3*/ADDR4*/ADDR5*/ADDR6*/ADDR7;
/HI_IN_1
=
HI_CNT_2 :=
 DS*/ADDR0*/ADDR1*/ADDR2*/ADDR3*/ADDR4*/ADDR5*/ADDR6*/ADDR7;
 HI_CNT_2
 HI_CNT_1 * HI_CNT_0
 /HI_IN_2 * OUT_WAVE * /TERM_CNT
 HI_IN_2 * OUT_WAVE * /TERM_CNT
 SYS_CLK
 DS*/ADDR0*/ADDR1*/ADDR2*/ADDR3*/ADDR4*/ADDR5*/ADDR6*/ADDR7;
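
The long DS*ADDR... product terms in the listing are the input-register clock decodes: a single AND of the data strobe with all eight address literals. As a rough illustration (Python, with hypothetical names; the addresses 00 Hex and 01 Hex are the ones this example uses for HI_REG and LOW_REG), the decode can be sketched as:

```python
HI_REG_ADDR = 0x00   # HI_REG(2:0) address used in this example
LOW_REG_ADDR = 0x01  # LOW_REG(2:0) address used in this example


def decode_register_clock(ds_asserted: bool, addr: int, target: int) -> bool:
    """Model one product term: the strobe ANDed with eight address literals.

    ds_asserted is True when the Low-asserted data strobe is active.
    Each address bit must match the corresponding bit of the target address.
    """
    return ds_asserted and all(
        ((addr >> bit) & 1) == ((target >> bit) & 1) for bit in range(8)
    )


if __name__ == "__main__":
    print(decode_register_clock(True, 0x01, LOW_REG_ADDR))
```

Because every address bit appears in the term, moving a register to another address only changes which literals are complemented; the term count stays at one per register clock.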


CY7C331 Application Example: Asynchronous,
Self-Timed VMEbus Requester
This application note describes how to use the
Cypress CY7C331 CMOS erasable programmable logic
device (EPLD) to support asynchronous, self-timed
designs. The CY7C331 is ideal for implementing
asynchronous, self-timed, and general-purpose logic
integration applications. The application example
described here is an asynchronous, self-timed VMEbus
requester.
The CY7C331 is a member of the Cypress slim-line,
28-pin family of high-performance CMOS EPLDs.
Family members are characterized by high speed,
increased I/O, and high integration. The CY7C331 has a
highly flexible architecture with a 192-product-term
logic array and 12 I/O-logic macrocells. Each macrocell
provides two D flip-flops with asynchronous set, reset,
and bypass capability. The flip-flops' clock, set, and
reset inputs are individually programmable, as are each
macrocell's logic polarity and output-enable control.
The CY7C331 easily supports combinatorial and
registered inputs and outputs and buried states.
Additionally, the CY7C331 has the uncommon
ability to self-time asynchronous, sequential applications.
A self-timed design performs a sequential task
without the presence of a clock to synchronize each
step in the sequence. This design approach usually
results in higher performance compared to synchronous
designs. The main application for self-timing is in
high-performance I/O interfaces. The CY7C331 supports
self-timed designs because its clock inputs are programmable,
internal timing relationships are well-controlled,
and metastable resolution is ultra-fast.
The VMEbus is a common, high-performance
asynchronous bus. The VMEbus request function is
asynchronously initiated and sequential. In addition to
showing the CY7C331's ability to handle asynchronous,
self-timed tasks, this application example demonstrates
the use of many unique CY7C331 features.

CY7C331 Brief Description
The CY7C331 is available in a 28-pin slim-line (300
mil wide) plastic or windowed DIP and in 28-pin PLCC
and LCC packages. The windowed version is UV
erasable and reprogrammable, and the plastic DIP,
PLCC, and LCC versions are one-time programmable.
The CY7C331 is available with tPD and tCO specified
at 20 ns max and with register set-up times of 12 or 2
ns, depending on whether the register connects to an
input pin or to the device's logic array. Other commercial
and military speed grades are also available.
The CY7C331 is based on a programmable sum-of-products
(AND-OR) logic-array architecture. The logic
array consists of 192 programmable product terms, each
having as input the true and complement versions of 31
logic inputs. The product terms connect to one of
twelve I/O logic macrocells, and each of these macrocells
connects to a device pin. The product terms are
allocated with a variable distribution to the macrocells.
The CY7C331 provides 13 combinatorial inputs to
the array from dedicated input pins, one of which (pin
14) can also be used as an output-enable control. The
macrocells and six shared input muxes each provide an
input to the array. A shared input mux selects the input
from one of two adjacent macrocells (Figure 1).

Figure 1. Cypress CY7C331 Block Diagram

The CY7C331's I/O-logic macrocell sums array
product terms, selectively inverts the sum, and provides
the result to the D input of a D flip-flop. The flip-flop's
output (Q) connects through an inverting three-state
buffer to a device pin and can be fed back to the array.
The I/O macrocell also provides a second D flip-flop
that latches data from the same device pin. This flip-flop's
Q output connects to the macrocell input-select
mux and to the shared-input mux (see Figure 1 in "Using
the CY7C331 as a Waveform Generator"). Both flip-flops
have asynchronous set (S) and reset (R) inputs, as
well as bypass capability. A flip-flop bypasses the D
input to Q when S and R are both High. Separate
product terms drive both flip-flops' clock, S, and R
inputs.
A multi-input OR gate sums the product terms.
The number of product terms input to the OR gate
depends on the macrocell (Figure 1). A dual-input XOR
gate selectively inverts the sum. The XOR gate's second
input is a product term that controls selective inversion.
You can control a macrocell's output enable (OE) by
using pin 14 or a product term. The OE mux selects one
of these two options. Another mux, the FB mux, selects
the macrocell array input. Each OE, FB, and shared-input
feedback mux has an associated programmable
configuration bit that controls mux selection.
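
As a cross-check of the description above, the 31 array inputs are accounted for by the 13 dedicated input pins, the 12 macrocell feedback paths, and the six shared input muxes. The following Python sketch models the macrocell data path and the flip-flop bypass behaviorally. It is an illustration under simplifying assumptions (ideal logic, no delays, no output buffer), and the function names are hypothetical rather than anything defined by the device or its tools.

```python
# 13 dedicated inputs + 12 macrocell feedbacks + 6 shared input muxes
ARRAY_INPUTS = 13 + 12 + 6


def macrocell_d(product_terms: list[bool], invert: bool) -> bool:
    """OR the product terms, then XOR the sum with the inversion product term."""
    return any(product_terms) != bool(invert)


def flip_flop(q: bool, d: bool, clock_edge: bool, s: bool, r: bool) -> bool:
    """Next Q of a macrocell flip-flop with asynchronous set, reset, and bypass."""
    if s and r:          # both High: D is bypassed straight to Q
        return d
    if s:
        return True
    if r:
        return False
    return d if clock_edge else q   # otherwise an ordinary edge-triggered D flip-flop


if __name__ == "__main__":
    print(ARRAY_INPUTS)
    print(flip_flop(q=False, d=True, clock_edge=False, s=True, r=True))
```

With S and R both High the model ignores the clock entirely, which is the combinatorial (transparent) mode the text describes.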

CY7C331 Self-Timed Capability
The main application for self-timed functions is in
high-performance I/O interfaces, where clocking restrictions
prevent performance requirements from being
satisfied. These applications might not have an available
clock, the clock might be too slow, or synchronization
time might have to be minimized.
A self-timed design implements a state machine
without the presence of a clock to synchronize each
state transition. The implementation of a self-timed
design must meet two basic requirements:
1. It must time and perform state transitions.
2. It must synchronize asynchronous inputs.
As in any state machine, a self-timed design must
meet minimum state flip-flop set-up times before
performing a state transition. Without the benefit of a
clock, the design must generate self-timing clocks based
on the state data change due to a state transition itself.
Thus, clock initiation and data changes are coincident,
and the design must delay a clock to allow data to settle
and meet minimum set-up time requirements.
The simplest example of self-timing appears in Figure 3.
This circuit clocks a logic 1 into a D flip-flop on
the input's rising edge. The design works if the clock
delay time is long enough to allow the data input to be
set up. This simple circuit illustrates how the CY7C331
supports self-timed designs; the CY7C331 allows you to
program the timing relationship between the flip-flop's
D-input logic and clock input logic to guarantee
satisfaction of minimum set-up time requirements. The
CY7C331 synchronizes asynchronous inputs in the same
manner, except that the set-up time is longer to allow
for metastable resolution. The CY7C331 can also perform
self-timed synchronization because metastable
resolution is ultra-fast.
The approach used in the CY7C331 to self-time
state transitions is to delay a clock signal by passing it
through the logic array one additional time; this arrangement
allows data to meet set-up time requirements.
To guarantee that this approach works, the extra delay
in the clock path must be programmed to delay the
clock as long as possible (Figure 4). In general, a self-timed
design should set up data as fast as possible and
delay the clock long enough to guarantee that data is set
up. But delay time in the CY7C331 is sensitive to the
logic function programmed. Guaranteeing that data is
set up as fast as possible restricts the logic functions the
device can perform. You can avoid this limitation by
placing restrictions on the clock path. You can program
any logic function if the clock delay path is slow enough.
To perform self-timed synchronization, the clock is
delayed by two extra passes to provide the extra delay
required for metastable resolution (Figure 5). Program
both clock delay elements to be as slow as possible so
you can configure any logic function. With these
restrictions, the mean time to failure (MTF) due to a
metastable condition is greater than 10 years.

Clock Delay Programming
In the CY7C331, a product term generates an output
transition from Low to High faster than from High
to Low. A transition caused by a single input and a
single product term is faster than those caused by
multiple inputs and/or product terms. The shortest delay
time through a CY7C331 occurs when a single input
triggers a single product term to transition from Low to
High. The slowest clock path results from placing
restrictions on how the extra level of clock delay is
programmed. These restrictions are:
- The clock delay should use a logic path through
  multiple product terms, OR gates, and XOR gates
  to a bypassed flip-flop.
- Clock delay logic should make product term outputs
  transition from High to Low.
- All product terms to the OR gate should be
  programmed identically to implement clock logic.
- The OR gate should have the same or more inputs
  than associated data-path OR gates.
- The programmable XOR input should be set Low.
The clock delay element shown in Figure 4 illustrates
each of the four programming restrictions.
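
The self-timing condition described in this section reduces to a simple inequality: the delayed clock must arrive no earlier than the data-path delay plus the flip-flop set-up time. The Python sketch below illustrates that check. The function names are hypothetical and the delay values in the usage example are placeholders, not CY7C331 data-sheet figures; only the 2 ns array-register set-up time comes from the device description in this note.

```python
def setup_margin_ns(clock_delay_ns: float, data_delay_ns: float, setup_ns: float) -> float:
    """Slack between the delayed clock edge and the point where data is set up.

    A non-negative result means the set-up requirement is met.
    """
    return clock_delay_ns - (data_delay_ns + setup_ns)


def clock_delay_ok(clock_delay_ns: float, data_delay_ns: float, setup_ns: float) -> bool:
    """True when the clock path is slow enough for the data path."""
    return setup_margin_ns(clock_delay_ns, data_delay_ns, setup_ns) >= 0


if __name__ == "__main__":
    # Placeholder delays: slowest-case clock path vs. fastest acceptable data path.
    print(setup_margin_ns(clock_delay_ns=20.0, data_delay_ns=15.0, setup_ns=2.0))
```

Programming the clock path for its slowest transition and the data path for any function, as the restrictions above require, amounts to keeping this margin non-negative for every logic function you might place in the data path.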

Self-Timed VMEbus Requester
Bus requesters are used in common bus systems
that support multiple processors controlling bus transfers.
A processor that controls bus transfers is typically
referred to as a bus master. The bus requester requests
permission for a master to control the data bus and
indicates to the master when data bus control has been
granted. The VMEbus supports multiple bus masters.
A self-timed design approach for a VMEbus requester
is appropriate because the VMEbus is
asynchronous and offers high performance. The bus-request
function is asynchronously initiated and is sequential.
A self-timed design self-synchronizes to initiate the
request and self-times the rest of the request sequence
at CY7C331 device speed. A synchronous approach
requires an external clock to synchronize and time the
sequence, for which the VMEbus provides a 16-MHz
system clock. However, a CY7C331 self-timed design
provides much higher performance than a synchronous
design using the system clock.

Figure 2. A Self-Timed Element

Figure 3. CY7C331 Self-Timed Element

VME Background
The VMEbus is defined to support multiple bus
masters, although only one master can control the bus
at a time. The VMEbus provides an arbitration subsystem
in which a central bus arbiter determines which
master is granted the data bus. Each master contains a
bus requester to request control of the bus from the
arbiter.
The arbitration subsystem is supported on the
VMEbus with six bused lines and four daisy-chained
lines. All these lines are active Low, which is indicated
by a "-" suffix on a line name. The bused lines are Bus
Busy (BBSY-), Bus Clear (BCLR-), and Bus Request
3-0 (BR3- through BR0-).
When the daisy-chained lines enter a board, they
are designated Bus Grant 3-0 In (BG3IN- through
BG0IN-), and when leaving are designated Bus Grant
3-0 Out (BG3OUT- through BG0OUT-). (The terms
BRx-, BGxIN-, and BGxOUT- are used when references
are not to a specific line or lines; x is any value
from 0 to 3.) The highest priority is allocated to number
3 lines and lowest to number 0 lines. The BGxOUT-
lines that leave a board in slot n enter the board in slot
n+1 as BGxIN- lines. The bus arbiter must always
reside in the first slot of a VMEbus-based system to
initiate BGxOUT- generation.
All masters in the system drive BBSY- when they
have control of the bus. Within each bus-grant daisy
chain, all masters drive the same BRx- line. Multiple
masters on a bus-grant daisy chain can request the data
bus at the same time by simultaneously driving their
associated BRx- lines. When this occurs, the requester
furthest up in the daisy chain gets the bus grant. The
remaining master(s) on the daisy chain can continue to
assert BRx- until they receive a bus grant.
A simple VMEbus requester initiates a request
after detecting an on-board request (OBR). (A
simplified bus-request state diagram and timing
diagram appear in Figures 6 and 7.) The requester then
drives the BRx- line active and waits for the associated
BGxIN- line to become active. Once the requester
detects BGxIN- active, BBSY- and the appropriate
DMA Grant line (DMAGRx-) are driven active, while
BRx- is released to inactive. The active DMAGRx- line
indicates to an on-board master that it has the bus and
can perform a data transfer.
While data is being transferred, the bus master
asserts the Data Transfer (DTR-) input to the CY7C331
bus requester. When the master has finished using the
bus, the DTR- input is deasserted. The requester then
releases the bus by deasserting BBSY- and OBG. Even
if one of the other on-board masters wants the bus, the
requester deasserts BBSY- and waits for a new BGxIN-
before granting the bus to this master. This extra
overhead allows other requesters that might be further up
the daisy chain to obtain the bus between on-board bus
requests.
If the bus grant input (BGxIN-) becomes active
while none of the on-board request lines are active, the
requester must pass the request down the daisy chain.
This is accomplished by asserting the bus grant out
(BGxOUT-) signal.
The VMEbus specification includes a few timing
and requester design restrictions. A VMEbus requester
must satisfy the two timing requirements displayed in
Figure 6. BBSY- must be driven for a minimum of 90 ns,
and the release of BRx- must occur at least 30 ns before
BBSY- is released. The primary design requirements
are that BBSY- and BRx- must use open-collector
drivers, and BGxOUT- must never glitch during operation.
The restriction on BGxOUT- ensures avoidance of
inadvertent bus grants.
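
The two timing requirements stated above (BBSY- driven for at least 90 ns, and BRx- released at least 30 ns before BBSY- is released) can be expressed as a small check over event timestamps. This Python sketch is illustrative only; the function name and the idea of feeding it simulated or measured edge times in nanoseconds are assumptions, not part of the VMEbus specification or the CY7C331 design.

```python
MIN_BBSY_DRIVE_NS = 90     # minimum time BBSY- must be driven
MIN_BR_TO_BBSY_NS = 30     # BRx- release must lead BBSY- release by at least this much


def check_release_timing(bbsy_assert_ns: float, br_release_ns: float,
                         bbsy_release_ns: float) -> list[str]:
    """Return a list of violated requirements (empty when both are met)."""
    violations = []
    if bbsy_release_ns - bbsy_assert_ns < MIN_BBSY_DRIVE_NS:
        violations.append("BBSY- driven for less than 90 ns")
    if bbsy_release_ns - br_release_ns < MIN_BR_TO_BBSY_NS:
        violations.append("BRx- released less than 30 ns before BBSY-")
    return violations


if __name__ == "__main__":
    print(check_release_timing(bbsy_assert_ns=0, br_release_ns=10, bbsy_release_ns=100))
```

A check like this is convenient to run against simulator traces of a requester design before committing to hardware.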

Requester Design
The requester supports overlapped bus requests. It
also releases the data bus every transfer cycle to allow
the central arbiter to grant the bus to a higher-priority
requester, if one exists.
The CY7C331 VMEbus requester supports three
on-board DMA request lines (DMARQ2- through
DMARQ0-). All the DMARQx- lines can generate a
bus request on the BRx- line. The requester supports
three on-board grant lines (DMAGR2- through
DMAGR0-), one for each request line. When a bus
grant is received on BGxIN-, the requester must
determine which DMAGRx- line to activate. The requester
prioritizes the DMARQx- lines and grants the bus to
the highest priority request; DMARQ0- has the highest
priority and DMARQ2- the lowest. The selected
DMAGRx- line is not activated until the previous data
transfer is complete.
If any of the DMARQx- lines are active when a bus
grant is received, the requester drives BBSY- active.
For overlapped operation, BBSY- is released as soon as
possible to facilitate the next bus arbitration. BBSY- is
not released, however, until the following criteria are
met: BBSY- has been driven for at least 90 ns, BGxIN-
is inactive, and the previous data transfer is complete
(DTR- is deasserted). If none of the DMARQx- lines is
requesting the bus when a grant is received, the requester
passes the grant onto BGxOUT- for the next requester
on the daisy chain. The requester also recognizes a system
reset (SYSRESET-) and initializes the device appropriately.
A logic diagram of a self-timed VMEbus requester
using the CY7C331 appears in Figure 8. BRx- is the OR
of the DMARQx- lines.
Requester Operation
If any DMARQx line becomes active, BRx- be-

comes active, signifying to the arbiter that one of the
masters on this board wants the data bus. An external
open-collector driver drives BRx-.
Self-timed operation begins when the incoming
BGxIN- line becomes active. The three on-board DMA
request lines (DMARQ2- through DMARQ0-) are
self-synchronized to the BGxIN- line. BGxIN-'s falling edge
serves as a clock to register the DMARQx- lines and
toggle a flip-flop from High to Low to initiate an internal,
self-timed clock signal (STCP). The DMARQx- lines
must be synchronized, because BGxIN- can be activated
when any BRx- line becomes active or when
BBSY- is released. For example, if DMARQ0- causes
the associated BRx- to initiate bus arbitration, and
DMARQ2- attempts to become active at the same time
BGxIN- becomes active, DMARQ2-'s resulting state
could be an indeterminate metastable condition that
needs time for resolution. The pair of internal clock
delays provides this time before the DMAGR2- output
register samples the state of DMARQ2-.

Figure 4. CY7C331 Self-Synchronizing Element

Two CY7C331 delay elements delay the internal,
self-timed clock signal to provide enough time to
self-synchronize the requests. The requests are prioritized
during the clock delay time. The resulting delayed clock
(STCP2) then asserts BBSY- if any of the DMARQx-
lines are active. If none are active, the BGxOUT- line is
asserted to send the grant to the next requester in the
daisy chain. Using the delayed clock to generate BBSY-
and BGxOUT- guarantees that both lines are
synchronized and cannot glitch.
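
The decision made at the delayed clock edge (take the bus for the highest-priority request, or pass the grant down the chain) can be sketched behaviorally. This is an illustrative Python model only; the function name and the dictionary it returns are assumptions, and signals are expressed as logical "active" flags rather than pin levels.

```python
def arbitrate(dmarq_active):
    """Decide what the requester does when a bus grant arrives.

    dmarq_active: three booleans; index 0 is DMARQ0- (highest priority),
    index 2 is DMARQ2- (lowest priority). True means the request is active.
    """
    for line, active in enumerate(dmarq_active):
        if active:
            # Take the bus: assert BBSY- and select the matching grant line.
            return {"bbsy": True, "bgxout": False, "dmagr": line}
    # Nobody on this board wants the bus: pass the grant down the daisy chain.
    return {"bbsy": False, "bgxout": True, "dmagr": None}
```

With DMARQ1- and DMARQ2- both active, `arbitrate([False, True, True])` selects grant line 1; with no request active the grant goes to BGxOUT-.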
BBSY- is driven onto the bus with an external
open-collector driver. The prioritized requests are
clocked into registers to create the DMAGRx- signals
on the delayed STCP's rising edge, if the previous data
transfer has completed, or on the rising edge of DTR-
when the data transfer completes. An internal flip-flop
toggles at the same time. The flip-flop output indicates
transfer completion (TC).

Figure 6. VME Arbitration Timing

The registered BBSY- line feeds into an external
90-ns delay line to guarantee that BBSY- is active for
the minimum required time. The delay mechanism
should be designed such that the delay circuit has no
effect if the data transfer requires more than 90 ns to
complete. One way to implement this feature is to use a
one-shot triggered by the falling edge of the CY7C331's
BBSY- signal. The one-shot's output is ORed with the
BBSY- signal from the CY7C331 to generate the
BBSY- signal to the VMEbus. The VME BBSY- signal
is inactivated when the 90-ns delay has elapsed,
provided that TC is True and DTR- and BGxIN- are
inactive. The requester is initialized for another
self-timed operation at the same time. The requester also
initializes when the SYSRESET input is asserted.

Figure 5. VME Bus Requester State Diagram

This design uses the 90-ns delay circuit because an
absolute delay is required to meet the VME specification.
A self-timed delay can yield only relative results
because there is no way to determine how many delay
levels are required to obtain a 90-ns delay. Any one
delay is usually much faster than the worst-case
specification, but the delay might be that slow. You can
emulate the delay on-chip by creating a digital delay,
but accuracy would be poor because you would have to
synchronize BBSY- to an absolute time base, such as
the 16-MHz system clock.
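
A short calculation shows why a clock-counted delay is coarse. The sketch below is an illustration in Python, under the assumption that BBSY- is asserted at an arbitrary point in the clock period and that the delay ends on the Nth clock edge after assertion; the function names are hypothetical.

```python
import math


def digital_delay_range(clock_mhz, edges_counted):
    """Return (min_ns, max_ns) for a delay that ends on the Nth clock edge.

    Assumption: the start event is asynchronous, so the first edge arrives
    anywhere between 0 and one full period after it.
    """
    period_ns = 1000.0 / clock_mhz
    return ((edges_counted - 1) * period_ns, edges_counted * period_ns)


def edges_needed(clock_mhz, min_delay_ns):
    """Smallest edge count whose minimum delay still meets min_delay_ns."""
    period_ns = 1000.0 / clock_mhz
    return math.ceil(min_delay_ns / period_ns) + 1
```

With a 16-MHz clock (62.5-ns period), guaranteeing 90 ns takes three edges, and the resulting delay lies anywhere between 125 ns and 187.5 ns.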
The CY7C331 can emulate the external open-collector
drivers, but the emulation would not meet the
VMEbus specification's drive requirements. To emulate
an open-collector driver, use the signal output to the
external driver to drive the output enable of an on-board
inverting, three-state driver (with the input tied High).

CY7C331 Implementation
The bus requester can be implemented and simulated
using the source code in Appendix A, generated
via the Cypress PLD ToolKit software package. A close
examination of the code reveals how many of the
CY7C331's features are utilized.
The DMARQx- lines use two CY7C331 pins for
each line: one combinatorial and one registered. The
registered input pins are used to conserve output logic
for other functions. The three macrocells associated
with the registered inputs also perform the internal
self-timed clock generation and delay functions; most other
PLDs require six outputs to implement these functions.
In addition, the CY7C331's individually programmable
clocks allow the input register flip-flops to be clocked
on BGxIN-'s falling edge.
BBSY is assumed to be the input to the external
delay line, and the CY7C331 input BBSY90 is assumed
to connect to the delay line output.
The source code defines the self-timed clock
generation and delay logic needed to meet the
requirements of CY7C331 self-synchronization.

Figure 7. Self-Timed VMEbus Requester

Appendix A. PLD ToolKit Source Code for VMEbus Requester

CY7C331;
{ Norman Taffe
  Cypress Semiconductor
  6/20/1990

  Cypress PLD Toolkit
  VME Bus Requester
}

CONFIGURE;
DMARQ2(node=1),
DMARQ1(node=2),
DMARQ0(node=3),              { On-board Request Lines }
BGxIN(node=4),               { VME Bus Grant Input }
SYSRESET(node=6),
BBSY90(node=7),              { Externally delayed BBSY signal }
DTR(node=9),                 { Signifies a Data Transfer in progress }
node14(node=14),
!INIT(node=15),              { Requester initialize signal }
!OBG(node=16,ireg),          { Signals board that it has the bus }
!STCP(node=17),              { Self timed CLK input register }
!BBSY(node=18),              { Assert Bus Busy when taking the bus }
!BGxOUT(node=19,ireg),       { Send Bus Grant down the daisy chain if not wanted }
!BRx(node=20,ireg),          { Signal arbiter that this board wants the bus }
!DMAGR0(node=23),
!DMAGR1(node=24),            { On-board grant lines }
!DMAGR2(node=25),
!RDMARQ1(node=26),
!RDMARQ2(node=27),           { Registered On-Board Request lines }
!RDMARQ0(node=28,ireg),
STCP2(node=33),              { Second delay stage of self timed clock }
STCP1(node=34,src=27),       { First delay stage of self timed clock }
TC(node=30,SRC=17),          { Resets the INIT signal }

EQUATIONS;
INIT =

< OE>
< SET .OUT>
< CLR-OUT>
- BGxIN*BBSY90*TC*D1R
 ISYSRESET;

STCP

< CK OUT> RDMARQ1 & DTR
 - INIT
< CK -IN> IBGxIN
 INIT
;

BBSY

< OE>
 RDMARQ1
< CLR _OUT> INIT

{Output Register is used for TC}


 RDMARQO
 ISTCPl
 ISTCP2;

BGxOUT = < OE>
 RDMARQl
 BGxIN
 IRDMARQO*STCPl *STCP2;
BRx

=

< OE>
< SET OUT>
< CLR-OUT>
< XSU~1>

 DMARQ2*DMARQl *DMARQO
< SUM> BBSY;
DMAGRO = < OE>
 RDMARQl
< CLR OUT> INIT
- RDMARQO;
DMAGRI = < OE>
 RDMARQl
< CLR OUT> INIT

 RDMARQl
 INIT
 IRDMARQO*ISTCPl *STCP2;
RDMARQl = < SET OUT>
< CLR OUT;
< CK iN> IBGxIN
< CLR IN> INIT
< SUM> RDMARQ2
 RDMARQ2
 RDMARQ2
< SUM> RDMARQ2
 RDMARQ2
 RDMARQ2;
RDMARQ2 = < SET OUT>
< CLR OUT;
< CK iN> IBGxIN
< CLR IN> INIT
< SUM> TC
 TC
 TC
 TC
 TC
< SUM> TC
 TC
 TC
 TC

{ output register for STCP2 }
{ Note that XSUM is set to zero and }
{ p-term transitions are from high }
{ to low, to maximize self-timed delay }
{ Use all 6 p-terms to add to delay}

{ output register for STCPl }
{ Note that XSUM is set to zero and}
{ p-term transitions are from high }
{ to low, to maximize self-timed delay}

{ Use all 12 p-terms to add to delay}


$.i;CYPI TC
 TC
 TC;
RDMARQO =  IBGxIN
 lNIT

< CLR OUT>
< SET-OUT>

<

xsu1.1>

< SUM> DMAGRO
< SUM> DMAGRI
< SUM> DMAGR2;


Understanding the CY7C361
The Cypress CY7C361 UV-erasable PLD employs
a revolutionary architecture that allows internal speeds
as high as 125 MHz. The part comes in a 28-pin, 300-mil
DIP and a 28-pin (P)LCC. The CY7C361 has eight
input pins with macrocells, four bidirectional pins with
input macrocells, one clock input with doubler, six
"pure" outputs, and four Mealy macrocell outputs.
Internally, there are 32 state registers.
Control-logic clocks usually run at twice the
system-clock frequency in high-performance systems. Thus,
for a 33-MHz system, a CY7C330 running at 66 MHz
works fine. But 40-MHz RISC CPUs are now available,
and even faster clock rates are right around the corner.
Because control logic often does not stabilize until late
in the design cycle, a PLD solution beyond 66 MHz is
needed. The CY7C361 is that solution.
How does the CY7C361 achieve speeds up to 125
MHz? Through a combination of state-of-the-art
process technology, circuit design, and architectural
innovation (see Figure 1).


Traditional Architectures
To understand how the CY7C361 achieves its high
level of performance, consider some common PLD
architectures and their limitations.
The PAL
Figure 2 shows a simplified block diagram of a
traditional PAL architecture. When you implement a
state machine in a PAL, two components contribute to
the worst-case fMAX. The first is ts, which is the delay
through the AND array and the fixed OR plus the
register set-up time. The second factor is tCF, which is
the clock-to-feedback time. Although you cannot
measure tCF directly, it is slightly less than the
clock-to-output time, tCO. The maximum frequency for a state
machine implemented in this device is:
fMAX = 1/(ts + tCF)          Eq. 1
Substituting the minimum ts and the maximum tCF
yields the worst-case fMAX. Typical numbers for these
parameters are ts = 18 ns and tCF = 13 ns. The
dominant parameter is always ts, primarily due to the
delay through the AND array. Thus, the key to improving
fMAX lies in minimizing the propagation delay
through the AND array.

Figure 1. Block Diagram of the CY7C361

The FPLS
Figure 3 shows a simplified block diagram of an
FPLS, another architecture commonly used to implement
state machines. The method for computing fMAX
for the FPLS is similar to that for the PAL. In this case
ts is approximately twice the corresponding value for a
PAL, because the FPLS value includes delays through
two arrays (the AND and OR) instead of just one. This
makes ts even more of a dominant parameter in the
fMAX calculation. The higher ts value is the tradeoff for
some extra flexibility in implementing the state machine,
due to having an OR array rather than a fixed-OR
scheme.
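
As a worked instance of Eq. 1, the typical PAL numbers quoted above (ts = 18 ns, tCF = 13 ns) can be plugged in directly. The sketch below is illustrative Python; the FPLS figure assumes, for the sake of the example only, that ts exactly doubles and tCF is unchanged.

```python
def fmax_mhz(ts_ns, tcf_ns):
    """Eq. 1: fMAX = 1/(ts + tCF), with times in ns and the result in MHz."""
    return 1000.0 / (ts_ns + tcf_ns)


pal_fmax = fmax_mhz(18, 13)        # typical PAL values from the text
fpls_fmax = fmax_mhz(2 * 18, 13)   # assumption: ts doubles, tCF unchanged
```

Under these assumptions the PAL state machine tops out near 32 MHz and the FPLS near 20 MHz, which is why ts is the parameter to attack.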


General Limitations
Another major barrier to speed in both PAL and
FPLS state machine designs is the design methodology
itself. Traditionally, efficiency of state machine
implementation has been the overriding concern for
designers. The goal was to use as few flip-flops as possible.
In such devices, the required number of states (S) is
encoded into N flip-flops, where N is the smallest integer
such that S <= 2^N. The actual control signals must
be decoded from the state machine inputs and registers.
This adds extra latency time.

Figure 3. A Simplified FPLS Block Diagram

With the advent of high-density PLDs and the
shrinking of cycle times, the minimal-flip-flop strategy is
no longer viable. A Petri net (see Reference) or
token-passing methodology suits high-speed state machine
design better. In the token-passing methodology, each
state has its own register, and these registers are directly
connected, as in a shift register. Passing from one
register to the next, a token signifies the present state by
its position. Branching results from passing the token to
a new process, which is enabled by an input condition.
This approach removes the need for the encoding/decoding
logic of the traditional state machine in
two ways: the token passes directly from one
state register to the next, and the control signals
can usually be taken directly from the state-register outputs.
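
The register-count tradeoff between the two methodologies is easy to quantify. The following Python sketch is only an illustration; the function names are hypothetical.

```python
def encoded_flip_flops(states):
    """Smallest N such that states <= 2**N (minimal-flip-flop encoding)."""
    # (states - 1).bit_length() is the integer form of ceil(log2(states)).
    return max(1, (states - 1).bit_length())


def token_passing_registers(states):
    """Token passing uses one register per state; the token's position is the state."""
    return states
```

A nine-state machine needs four encoded flip-flops plus decode logic, or nine token-passing registers whose outputs can serve directly as control signals.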

The CY7C361 Architecture
Cypress Semiconductor developed the CY7C361
architecture by modifying and streamlining the
architectures discussed earlier. Direct connections between
state register macrocells allow implementation of the
token-passing methodology. But the CY7C361 removes
the FPLS's primary speed barrier: that all inputs and
feedback propagate through two arrays before reaching
the registers. The CY7C361 removes the barrier by
placing the state register macrocells between the two
arrays, with feedback going directly to the input array.
This strategy cuts ts in half and minimizes tCF, along
with providing state-decoding logic on chip, if needed.

Figure 2. A Simplified PAL Block Diagram

Even reduced by half, however, ts is still a
dominant factor in the fMAX calculation shown in Eq. 1.
But because the propagation delay through a
programmable array is directly proportional to the array's size,
streamlining the array further reduces ts. In the
CY7C361, state-register feedback accounts for 64 array
inputs; actual chip inputs account for only 24 array
inputs. A total of 88 inputs makes for a fairly large array.
How can the array size be reduced without sacrificing
inputs or state registers?
You can implement most state machines with four
or fewer registers, and you can usually break larger
state machines into several smaller processes that pass
control back and forth. And although all registers need
to have access to the inputs, in most cases feedback is
local to a specific process.
The CY7C361 takes advantage of these facts by
implementing feedback in stages. The state-register
macrocells have been separated into eight groups of four,
each with its own local reset. In each of the groups, one
register has feedback available to all 32 registers, one
register has feedback available to a group of 16
registers, and the other two registers have feedback
available to eight registers. You can break a large state
machine into several processes, with local feedback
used within an individual process. Arbitration among
the smaller processes is accomplished with global
feedback or the direct connections between adjacent state
macrocells. This distribution allows the effective array
size to shrink to 56 input lines, without sacrificing the
number of inputs. Figure 4 illustrates the concept.
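
One way to account for the 56 input lines is to count the feedback registers that any single register can see. The Python sketch below is an interpretation, not a statement from the data sheet: it assumes each visible register contributes a true and a complement line (consistent with 32 registers giving 64 feedback inputs) and that the 16- and 8-register scopes are aligned to whole groups of four.

```python
GROUPS = 8              # eight groups of four state registers (from the text)
INPUT_LINES = 24        # array inputs from the chip inputs (from the text)
POLARITIES = 2          # assumption: true and complement of each register


def visible_feedback_registers():
    """Count the feedback registers seen by any one state register."""
    global_regs = GROUPS * 1        # one register per group is visible to all 32
    half_regs = (16 // 4) * 1       # one per group within the same block of 16
    local_regs = (8 // 4) * 2       # two per group within the same block of 8
    return global_regs + half_regs + local_regs


effective_lines = visible_feedback_registers() * POLARITIES + INPUT_LINES
full_lines = 32 * POLARITIES + INPUT_LINES
```

Under these assumptions each register sees 16 feedback registers, giving 32 feedback lines plus 24 input lines, or 56 in total, against 88 for full feedback.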


The Condition Decoder
Array size has two components: the number of inputs
and the number of product terms. In PAL architecture
design, one of the most critical tradeoffs is
speed vs. the number of product terms. Thus, the
second part of the challenge of minimizing ts is to
provide only as many product terms as needed to
implement state machines. Most PALs offer a minimum of
seven product terms per register. (The CY7C330 offers
a range between nine and 19.) However, seven product
terms for each of 32 registers (224 product terms total)
is obviously not the answer for a high-speed device.
Because the CY7C361 is designed specifically for
state machine applications, Cypress Semiconductor
analyzed state machine operations to find out what logical
functions are necessary for state machine implementation.
Cypress found that all state machine operations
fall into two classes: entering a new state from one of
several states based on a condition, or leaving the
present state for one of several other states based on a
condition.

Figure 4. Global vs. Local Feedback

As illustrated in Figure 5, both operations lend
themselves to the form:
(a + b + c + ... + n) & (N & ... & X & Y & Z)          Eq. 2
Thus, in state machine applications, the standard
sum of products construct can be replaced with the
more efficient construct shown in Figure 6. The
CY7C361 uses the version pictured on the right, because
that implementation is faster in CMOS. This circuit
is called the Condition Decoder.
Replacing the standard sums of products with
condition decoders reduces the number of terms in the
array to 64 for state registers, plus 16 terms used for
local resets, and two terms for the global reset. This
permits a total input array size of 56 x 82. This is small
enough to make possible a ts + tCF of 8 ns or under. A
tP of 8 ns means an fMAX of 125 MHz.

Figure 5. The Two Kinds of State Machine Operations (leaving a state: (a+b+c)·S0; entering a state: (SA+SB+SC)·(a·/b))

The Output Array
The CY7C361's output array is OR based. The
state macrocell outputs are driven in complemented
form only, and the pure and bidirectional outputs are
product terms. This structure results in a logical NOR
for the overall output array function because:
!A & !B = !(A + B)          Eq. 3
The output enables are Low asserted and fed by
product terms. When taken with the complemented inputs to the output array, the output enables are an OR
function of any state register output(s). Mealy output
terms are NAND terms (more on this later). Each of
the output types appears in Figure 7.

State Macrocells
As mentioned earlier, the CY7C361 has 32 state
macrocells. The state macrocells each have a single
condition-decoder input, and they all share the same clock
and a global reset-condition decoder. For each group of
four state macrocells there is also a local reset-condition
decoder. Additionally, each state macrocell has a
C_IN input and a C_OUT output that connect macrocells
to their adjacent macrocells. In addition to
C_OUT, the macrocell output is driven directly back to
the input array, in both true and complement form, and
to the output array in complement form only.

Figure 7. Portions of the Output Array and Output Types

There are three possible configurations for the
state macrocell: START, TOGGLE, and TERMINATE.
The START Configuration
Figure 8 shows the CY7C361 state macrocell in its
START configuration, which causes the macrocell to
act like a one-shot circuit. Configuration bit C2 selects
whether C_IN is a logic 0 or C_OUT from the previous
macrocell. The activating signal is the logical OR of the
C_IN signal and the condition decoder. When these
signals activate the input, the macrocell output is asserted
for one clock period only. You can use this macrocell
configuration to start a process or as a state in an
unbranching sequence.
The TOGGLE Configuration
The CY7C361 state macrocell in its TOGGLE
configuration acts like a toggle flip-flop. Once again,
configuration bit C2 selects whether C_IN is a logic 0 or
C_OUT from the previous macrocell. The activating
signal is the logical OR of the C_IN signal and the
condition decoder. While this input is active, the macrocell
changes state on the rising edge of every clock. If the
input is not active, the macrocell retains its state (see
Figure 9). You can use the TOGGLE configuration for
binary counters and other traditional state machine
implementations.
The (Wait Until) TERMINATE Configuration
The third configuration of the CY7C361 state
macrocell is TERMINATE. The TERMINATE configuration
differs from those already described in that you
must configure C2 such that C_IN is the C_OUT of the
previous macrocell. Asserting C_IN activates the circuit,
causing the output to become asserted on the next
rising clock edge. The macrocell remains in this state
until the condition decoder becomes asserted. This
causes the output to be deasserted (terminated) on the
clock's next rising edge. Figure 10 shows this configuration.
You can use TERMINATE to insert wait states in
a process.

Figure 6. The Condition Decoder

Figure 8. The START Macrocell Configuration (C1, C0 = 00)

Figure 9. The TOGGLE Macrocell Configuration (C1, C0 = 10)

Figure 10. The TERMINATE Macrocell Configuration (C1, C0 = 01)
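
The three state-macrocell configurations can be summarized as next-state functions. The Python sketch below is a behavioral reading of the descriptions above, not a gate-level model of the device. Two points are assumptions: that a START macrocell whose input stays active produces a pulse only while its output is currently deasserted, and that a TERMINATE macrocell ignores C_IN while its output is asserted.

```python
def start_next(q, condition, c_in):
    """START (one-shot): output is asserted for a single clock when activated."""
    return (condition or c_in) and not q


def toggle_next(q, condition, c_in):
    """TOGGLE: change state on each clock while the input is active, else hold."""
    return q != (condition or c_in)


def terminate_next(q, condition, c_in):
    """(Wait until) TERMINATE: set by C_IN, held until the condition asserts."""
    if q:
        return not condition   # stay asserted until the condition decoder fires
    return c_in                # C_IN from the previous macrocell starts the wait
```

Each function takes the present output `q` and returns the output after the next rising clock edge, so a sequence of clocks can be simulated by feeding the result back in.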

The Input Macrocell
The CY7C361 contains 12 input macrocells, of
which four are straight inputs, four are inputs from pins
that are also connected to the Mealy macrocells, and
four constitute the input path of the bidirectional pins.
The input macrocell of the CY7C361 avoids
metastability problems and provides for flexibility in the
timing of inputs. Metastability has always been a problem
in asynchronous systems, but with cycle times
shrinking dramatically, metastability is becoming more
of an issue in high-speed synchronous design as well.
The CY7C361 inputs are designed to be very
metastability resistant. That is, metastability occurs rarely, and
when it does happen, the device resolves it quickly.
The input macrocell (Figure 11) has three possible
configurations. The first (default) is a nonregistered
configuration that can be used to cascade the CY7C361
with other high-speed devices. If you use the CY7C361
inputs in this mode, however, be careful not to violate
the state registers' set-up and hold time specs. This
timing is tight, and violating either spec could lead to
metastability conditions in the state macrocells. The
second configuration of the input macrocell is
single-registered mode, which you can use for pipelining inputs.
The last configuration is double-registered mode.
It is used to synchronize asynchronous signals. Note that
the clock for the registered modes is the same as the
internal state clock. Thus, if you enable the clock doubler,
the input registers are clocked at twice the frequency of
the external clock. The input-macrocell configuration-bit
settings appear in Table 1.
The input macrocells also have a clock-enable function.
Each of the three groups of macrocells has its own
input enable; the enables come from three product terms in
the output array. Because the enables are Low asserted,
and the output array inputs are all in complemented
form, the logical function of these clock enables is an
OR.

Figure 11. The CY7C361 Input Macrocell

Table 1. Input Modes for the CY7C361

C1  C0  Input Macrocell Mode
0   0   Combinatorial
0   1   Single Registered (Pipeline Mode)
1   X   Double Registered (Synchronizer Mode)

The Mealy Macrocell
The CY7C361 provides four macrocells that allow
you to build Mealy machines, one of the two general
classes of state machine. (The other is the Moore
machine, in which outputs depend only on the present
state of the registers.) In a Mealy machine, outputs
depend on both present state information and the
state machine inputs. The CY7C361's Mealy macrocell
appears in Figure 12.
The Mealy macrocells have two inputs, one from
the output array and the other from a dedicated input
pin (before the macrocell). As mentioned earlier, the
array outputs have an extra inverter in the path. This is
because the Mealy macrocells have programmable
polarity; the default configuration has an inverter in the
path. These two inverters cancel each other, effectively
making the Mealy macrocell's straight output a logical
NOR, just like the other outputs.
In addition to programmable polarity, the Mealy
outputs offer configurations of output only; AND of
input and output; OR of input and output; or XOR of
input and output. Table 2 shows the Mealy macrocell
configuration bit settings.

Figure 12. The CY7C361 Mealy Macrocell

Table 2. Mealy Macrocell Configuration Settings

C2  C1  C0  Mealy Configuration
0   0   0   pin = input NAND array out
0   0   1   pin = input NOR array out
0   1   0   pin = input XNOR array out
0   1   1   pin = INV array out
1   0   0   pin = input AND array out
1   0   1   pin = input OR array out
1   1   0   pin = input XOR array out
1   1   1   pin = array out

Example: Pulse-Triggered Counter
An example application of the CY7C361 illustrates
the use of most of the features discussed in this
application note. In this circuit, an asynchronous trigger,
!TRIG, starts a 3-bit binary counter. Because only one
count cycle can be triggered at a time, the circuit
ignores any !TRIG pulses received in the middle of a
count cycle. !TRIG is synchronized using the
double-registered input macrocell configuration.
The counter outputs, COUNT(2:0), cycle from 0 to
7 and reset to 0. Note that each instance of
COUNT(2:0) is High asserted. The outputs are thus
assigned to Mealy macrocells, where the polarity can be
controlled.
When the circuit accepts a !TRIG pulse, the fourth
Mealy output is used to generate an acknowledge
signal, !TRGACK. The clock for this circuit is called
CLKIN, and the clock doubler circuit is activated. !RST
is a Low-asserted global reset signal. Because the timing
of !RST is not critical for this example, the input
macrocell is configured as combinatorial.

Figure 13. State Diagram of the Pulse-Triggered Counter

Two state machines implement this design. The
first, a supervisory machine, consists of two states: S0
and S1. S0 is a START macrocell, triggered by !TRIG.
S1 is a (wait until) TERMINATE macrocell, with
C_OUT from S0 connected to C_IN. Thus, !TRIG
initiates a token in S0, which then passes the token to S1.
S1 acts as an enable for the counter, which is
implemented using three TOGGLE macrocells, C(2:0). The
outputs COUNT(2:0) come directly from C(2:0). When
the counter reaches 7, S1's terminate condition is met
and the circuit is ready for the next !TRIG. Figure 13
shows the state diagram for this example.
!TRGACK is to be asserted only when the circuit
receives !TRIG and S1 is not asserted. The Mealy
macrocell is chosen so that !TRIG is the input, and it is
programmed as an OR gate. The output term is !S1.
Figure 14 shows the physical implementation of this
circuit, and Appendix A lists the PLD ToolKit source
file. The source file's CONFIGURE section must list
configuration information for all the state machine's
macrocells. For the CY7C361, this includes input
macrocells, state macrocells, Mealy macrocells, and the
clock doubler.
This example uses two input macrocell configurations.
When configuring an input register, the default is
combinatorial. To override the default, specify IREG
for a single register or IIREG for double registers. In
this example, !RST is assigned to node (pin) 12 and
configured as single registered. !TRIG is assigned to
node (pin) 13 and configured as double registered.
CLKIN is assigned to node 4, the dedicated clock
input. Node 74 is the clock doubler. The default is that
the doubling function is not enabled. To enable it,
assign the DBL_CLK attribute to node 74.
The internal state macrocells for this design are
S(1:0) and C(2:0). S0 is assigned to node 32 and
configured as a start macrocell by specifying the START
attribute. START is the default, but it is specified here
for completeness. S1 is assigned to node 33, and
configured as a terminate macrocell with the attribute
TERM. CIN must also be specified for every macrocell
configured as terminate. The C(0:2) macrocells are
assigned to nodes 34, 35, and 36. Because these
macrocells make up a counter, they are configured as toggle
macrocells by specifying the TOG attribute.

Figure 14. Physical Implementation of the Pulse-Triggered Counter

This example uses all four Mealy macrocells. The
COUNT(2:0) macrocells are assigned to nodes 19, 20,
and 24. In this case, no logical function is used. This is
the default, although it is not configuration 00. The
Mealy macrocells are used to make the outputs High
asserted. The toggle macrocells, C(2:0), are inverted
going into the output array, then inverted again going
into the Mealy macrocell. The macrocell contains one
other inverter, which the attribute NINV bypasses.
!TRGACK, the OR function of !S1 and !TRIG, is
assigned to the fourth Mealy macrocell at node 25. The
OR attribute is specified, and the NINV attribute
bypasses the inverter in the Mealy macrocell path.
IEN and GLBRST are both internal nodes. IEN is
assigned to node 30, which is the input-enable term for
pins 10 - 13. GLBRST is the global reset term for all
internal registers. Node 73, OFF in this example, is used
for tying anything in the output array Low.
The source file's EQUATIONS section contains
equations for both the condition-decode array and the
output array. The legal connectives for the condition-decode
array are  for the AND product
term and  for the NAND (or OR)
product term. This example uses only 
terms, and the logic/miser bits are automatically
programmed to enable the condition decoders.

The two legal connectives for the output array are
<INV SUM>, which is used for all outputs and input
enables, and , which is used for the output
enables on the bidirectional pins. The output-array
connectives are somewhat confusing unless you remember
that the entire output array is an OR array, and all the
state inputs are Low-asserted only. Thus, if you wanted
to assert an output only while S1 is asserted, the
equation would use  !S1. Because the output
enable is Low asserted, the sum of !S1 and nothing else
serves in this example.
The equations are fairly straightforward. TRIG
triggers S0. S1 is terminated when C2, C1, C0 = 7. The
toggle macrocells, C(2:0), are configured as a toggle
counter with S1 as an enable. GLBRST is assigned to
RST.
The input clock enable is always enabled. Because
it is Low asserted, IEN is assigned to node 73, GND.
The counter outputs, COUNT(2:0), connect directly to
the outputs of the toggle macrocells, C(2:0), which are
inverted. TRGACK is assigned to pin 25, the Mealy
macrocell connected to pin 13, which is !TRIG. The
array input is !S1. The NOR function is selected in the
source file's CONFIGURE section.
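
The behavior these equations describe can be cross-checked with a short Python model. This is only a behavioral sketch, not a model of the CY7C361 hardware: the function name simulate_pulse_counter is hypothetical, the input register and clock doubler are ignored, and the START macrocell is assumed to pulse once per rising edge of TRIG.

```python
def simulate_pulse_counter(trig):
    """Clock the S0/S1/C(2:0) machine once per sample of TRIG.

    Returns a list of (S0, S1, COUNT) tuples, one per clock.
    """
    s0 = s1 = False
    c0 = c1 = c2 = False
    prev_trig = False
    trace = []
    for t in trig:
        t = bool(t)
        trace.append((int(s0), int(s1), c0 + 2 * c1 + 4 * c2))
        terminal = c0 and c1 and c2          # C2, C1, C0 = 7
        next_s0 = t and not prev_trig        # START: one-cycle pulse (assumed edge triggered)
        next_s1 = s0 or (s1 and not terminal)  # TERMINATE: set by C_IN, held until terminal
        next_c0 = c0 ^ s1                    # toggle while S1 is asserted
        next_c1 = c1 ^ (c0 and s1)
        next_c2 = c2 ^ (c0 and c1 and s1)
        prev_trig = t
        s0, s1, c0, c1, c2 = next_s0, next_s1, next_c0, next_c1, next_c2
    return trace


if __name__ == "__main__":
    for clock, row in enumerate(simulate_pulse_counter([0, 1] + [0] * 12)):
        print(clock, row)
```

In this model a single trigger produces one S0 pulse, after which S1 stays asserted while the count runs from 0 through 7 and then releases with the counter back at 0.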

Reference
Murata, Tadao, "Petri Nets: Properties, Analysis,
and Applications" (Proceedings of the IEEE, Vol. 77,
No. 4, April 1989)


Appendix A. PLD ToolKit Source File for Pulse-Triggered Counter

CY7C361;
CONFIGURE;
CLKIN(node=4),             {system clock}
!RST(node=12,ireg),        {low asserted reset, no input register}
!TRIG(iireg),              {asynchronous trigger, iireg means double registered}
IEN(node=30),              {node 30 is the clock enable for inputs 10-13}
CLKDB(node=74,dbl_clk),    {node 74 is the clock doubler, enabled here}
COUNT0(node=19,ninv),      {nodes 19,20,24,25 are the mealy outputs}
COUNT1(ninv),              {COUNT(2:0) are not inverted, no logical function is used}
COUNT2(node=24,ninv),
!TRGACK(or,ninv),          {!TRGACK is an OR function of pin 13 (!TRIG), and the
                            output assigned below, no invert of output is performed}
S0(node=32,start),         {state register 32, configured as START (one shot)}
S1(cin,term),              {state register 33, C_IN enabled,
                            configured as (wait until) TERMINATE}
C0(tog), C1(tog), C2(tog), {state registers 34,35,36, configured as TOG (toggle flops)
                            the internal counter}
GLBRST(node=64),           {node 64 is the global reset condition decoder}
OFF(node=73)               {node 73 is used to tie signals low}

EQUATIONS;
S0 = <PROD> TRIG;              {S0 is triggered by TRIG}
S1 = <PROD> C0 * C1 * C2;      {S1 is triggered by C_IN from S0, released when C(2:0)=7}
C0 = <PROD> S1;                {C0, least significant bit of counter, enabled by S1}
C1 = <PROD> C0 * S1;           {C1, middle bit of counter, enabled by S1 and C0}
C2 = <PROD> C0 * C1 * S1;      {C2, most significant bit of counter, enabled by S1, C0, C1}

GLBRST = <PROD> RST;           {RST selected as a global reset}

COUNT0 = <INV_SUM> !C0;        {counter outputs, connected to !C0, !C1, !C2 respectively}
COUNT1 = <INV_SUM> !C1;        {inverted once more before mealy macrocell,}
COUNT2 = <INV_SUM> !C2;        {high asserted on pins}

IEN = <INV_SUM> !OFF;          {input clocks always enabled}

TRGACK = <INV_SUM> !S1;        {TRGACK is a mealy and of TRIG and !S1}

Using the CY7C361 as an Mbus Arbiter
This application note discusses the use of the
CY7C361 as a bus arbiter for a Cypress SPARC
CY7C600 RISC-processor Mbus system. The Cypress
CY7C361 is a very high-speed synchronous Erasable
Programmable Logic Device (EPLD) optimized for
state machine applications. The Cypress SPARC system
uses a CY7C601 40-MHz RISC processor, a
CY7C602 Floating Point Unit (FPU), four CY7C604
Cache Controller and Memory Management Units
(CMUs), and eight CY7C157 16K x 16 cache RAMs
that together make up a 256-Kbyte cache. The arbiter
resolves Mbus access contention for a system with four
CMU bus masters. Refer to Figure 1 for a block diagram
of the Mbus system.


CY7C361 Brief Description
The CY7C361 is a high-performance PLD with 32
state macrocells, a condition-decode array, an output
array, 12 input macrocells for eight dedicated inputs
and four bidirectional inputs, six dedicated outputs, and
four Mealy output macrocells. The CY7C361 also has a
clock-doubler circuit, which allows up to 125-MHz
internal operation. Packaged in a 28-pin, 300-mil DIP or
LCC/PLCC package, the CY7C361 is manufactured
using a 0.8-micron, double-metal CMOS process
technology that is UV erasable. Please consult the
Cypress application note, "Understanding the
CY7C361," for an in-depth description of the CY7C361
architecture.

Figure 1. Mbus System Block Diagram (masters 0 through 3, the CY7C361 Mbus arbiter, and main memory on the Mbus)


Mbus Description
The Mbus is a SPARC standard main-memory interface
for the Cypress SPARC Cache/Memory
Management Unit device (the CY7C604). The M in
Mbus stands for module and emphasizes the
multiprocessor module support that SPARC offers. It is a
high-speed, synchronous, 64-bit, multiplexed address
and data bus that operates at the CY7C601's clock rate.
Mbus accesses are initiated by a master and
responded to by a slave. Generally a bus transaction
takes place between a master and main memory, but in
the case of direct data intervention, transactions can
occur between masters. The handshake between the
CY7C604 CMU and the arbiter consists of a request
line (/MBR(3:0)) and a grant line (/MBG(3:0)) for each
master. A busy line (/MBB) is common to all masters
and indicates that the bus is in use.
Mbus arbitration uses the following procedure: A
master asserts its request line. The arbiter decides
whether to grant the request. The request is granted
when the arbiter asserts the master's corresponding
grant. As soon as the grant is received, the master can
deassert its request. The newly granted master watches
the /MBB (busy) signal. When /MBB is deasserted, the
new master must drive /MBB Low on the next clock
cycle to take control of the bus or risk losing its chance
for mastership. The new master can now start its bus
transaction. When the transaction is completed, the
master deasserts /MBB. The arbiter continues to assert
the grant until another request is received. This allows
the master to perform multiple transactions without
repeating the arbitration sequence. Refer to Figure 2 for
the Mbus multiple request sequence.
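
Seen from one master, the procedure above amounts to a small state machine. The following Python sketch is illustrative only: the state names and the master_step function are hypothetical, every argument is a boolean in which True means "asserted" regardless of pin polarity, and bus parking (reusing a grant that is still asserted) is deliberately left out.

```python
from enum import Enum


class MasterState(Enum):
    IDLE = "idle"        # no request outstanding
    REQUEST = "request"  # /MBR asserted, waiting for /MBG
    GRANTED = "granted"  # grant seen, waiting for /MBB to be released
    OWNER = "owner"      # driving /MBB, transaction in progress


def master_step(state, want_bus, grant, busy, done):
    """Advance one clock; return (next_state, drive_request, drive_busy)."""
    if state is MasterState.IDLE:
        nxt = MasterState.REQUEST if want_bus else MasterState.IDLE
    elif state is MasterState.REQUEST:
        # The request may be dropped as soon as the grant is received.
        nxt = MasterState.GRANTED if grant else MasterState.REQUEST
    elif state is MasterState.GRANTED:
        # Take the bus on the clock after /MBB is seen deasserted.
        nxt = MasterState.GRANTED if busy else MasterState.OWNER
    else:
        nxt = MasterState.IDLE if done else MasterState.OWNER
    return nxt, nxt is MasterState.REQUEST, nxt is MasterState.OWNER


if __name__ == "__main__":
    state = MasterState.IDLE
    # (want_bus, grant, busy, done) for successive clocks
    stimulus = [(True, False, True, False), (True, True, True, False),
                (False, True, False, False), (False, True, False, True)]
    for inputs in stimulus:
        state, drive_request, drive_busy = master_step(state, *inputs)
        print(state.name, drive_request, drive_busy)
```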
Mbus transfers are synchronous with respect to the
system clock. The data transactions across the bus
consist of a single-clock-period address phase and a
multiple-clock-period data phase. Data transfers can occur in
word (64 bit), multi-word-burst, or atomic-load-store
formats. All signals are valid and sampled on the system
clock's rising edge. The Memory Address Strobe
(/MAS) signal validates the address phase and denotes
the start of the actual data transfer. Three status lines
indicate bus states and convey the current bus
operation, as well as error status. Figure 3 shows the Mbus
data-transfer waveforms.

By design, the details of bus mastership and resolution
of multiple requests are handled outside the realm
of Mbus and SPARC. This approach allows you to
implement any arbitration scheme that suits the system
requirements.
Two arbitration schemes fit the Mbus specification.
The simplest is concurrent arbitration, in which the bus
is granted to a master, and the master performs its bus
transaction. If requests are pending when the master
completes its transaction, the bus is re-arbitrated, and
the new master takes over. In this arbitration scheme,
the current master's grant is asserted during the bus
transaction and deasserted after the transaction is
finished. The bus arbitration happens between bus
transactions, causing several cycles of latency between
transfers.
The second arbitration scheme is pre-arbitration,
which is more efficient on Mbus but trickier to
implement. In pre-arbitration, the arbitration happens before
the previous bus cycle completes. The bus is granted to
a master, and the master starts its transaction. If other
requests are pending, the grant is withdrawn and the
bus is re-arbitrated. Once the new master receives its
grant, it waits until /MBB is deasserted and then takes
control of the bus by asserting /MBB on the next cycle.
At this point the bus can be arbitrated again. This
means that as long as requests are pending, /MBB is
inactive for, at most, one cycle at a time. This takes
more work to implement because the arbiter must have
two different arbitration modes: one for when the bus is
idle (e.g., immediately after power up), the second for
normal operation.
The arbiter described in this application note uses
the pre-arbitration scheme.

Figure 2. Mbus Multiple Request Sequence (waveforms: clock, /MBR0, /MBR1, /MBG0, /MBG1, /MBB)

Figure 3. Mbus Data Transfer Waveforms (single write access with no wait states; 16-byte burst read with one wait state; signals: Mbus clock, address/data, /MAS, /MRDY, /MRTY, /MERR, /MBB)

Figure 4. CY7C604 & CY7C361 Timing for Master 0 (waveforms: Mbus clock, CY7C361 clock, /MBR0, /MBG0, /MBB, arbiter state)

Timing Considerations
Because this arbiter uses pre-arbitration, the arbiter
need not be able to accept a request, resolve access
contention, and grant bus rights to a master in a
single Mbus clock cycle. In this application, the arbiter's
clock input runs at the same 40-MHz clock rate used by
the CY7C601 and the CY7C604s. This clock rate allows
the arbiter inputs to meet the timing requirements of
the Mbus masters. Internally, the CY7C361 that
implements the arbiter is clocked at 80 MHz by virtue of the
on-chip clock doubler.
Figure 4 shows the timing relationship between
master 0 (a CY7C604 at 40 MHz) and the CY7C361
arbiter. This figure also illustrates the first arbitration
cycle after a reset.

Arbitration Scheme
With the arbitration function left to the designer,
there are several resolution techniques you can employ.
Fixed priority, rotating priority, least recently used, and
random priority are all contention-resolution schemes
that have proven successful, yet each has its own faults.
A fixed priority, for instance, favors one requester more
than the others. Rotating priority provides a simple, but
not always fair, approach to arbitration. A least recently
used arbitration scheme represents the fairest form of
contention resolution but requires a highly complex
implementation. The random technique does not
guarantee arbitration results. To keep this example simple,
the arbiter uses fixed priority, with master 0 having the
highest priority and master 3 the lowest.

Design Partitioning
The design is partitioned into three functional
blocks (Figure 5). The first block is the condition
decoder or input array. The second block is the
handshake state machine, which keeps track of requests
(/MBR(3:0)) and the bus-busy input (/MBB). The
CY7C361's state macrocells implement this block. The
third block is implemented in the output array, which
generates the grant (/MBG(3:0)) signals that give an
Mbus master ownership of the bus.

Figure 5. Arbiter Block Diagram

Handshake State Machine
Because the condition decode array uses the feedback
from the handshake state machine, consider the
state machine first. The machine controls Mbus
handshake and arbitration. The arbiter cycles through 26
discrete states in performing its function and thus takes 26
state macrocells to implement. The first two states,
BEGIN and IDLE, can be thought of as supervisory
states. Their state diagram appears in Figure 6. Each
master has its own grant sequence of six states for a
total of 24 (Figure 7).
You need to consider a number of issues when
fitting a design into the CY7C361. The first is the
placement of the state macrocells with relation to each other.
All (wait until) TERMINATE macrocells must be
preceded by another macrocell to provide a way to pass
the token into the cell.
Because no grants are asserted on power-up or
reset, the machine starts a token in BEGIN and passes
the token to IDLE on the next clock. IDLE is a (wait
until) TERMINATE macrocell. This means that the
previous state macrocell (BEGIN in this case) passes a
token to IDLE, which keeps the token until certain
conditions are met (more on this shortly).
When the machine receives a request (/MBRn,
where n = 0, 1, 2, 3), priority is decided and a token is
created in a START macrocell called GTn_PRI. The
priority selection is built into the condition for each
GTn_PRI macrocell: the condition for GT0_PRI is simply
that /MBR0 is asserted, whereas the condition for
GT3_PRI is that /MBR3 is asserted and none of the other
requests are.
The token from GTn_PRI is immediately passed to
the TERMINATE macrocell GTn_WAIT. The GTn_PRI
macrocell must be placed first, because it is the START
macrocell that creates the token. GTn_PRI is
immediately followed by the GTn_WAIT macrocell, which is
essentially a timing loop that waits until the bus can be
arbitrated. GTn_WAIT terminates under two conditions:
when IDLE is active, meaning that no bus activity
is occurring, or when /MBB goes active, meaning that
the bus can be arbitrated. At this point GTn_WAIT
passes the token to one of two processes.
If IDLE is still active, the token passes to
IDLn_SGT. This is another START macrocell, which
passes the token to the adjacent TERMINATE
macrocell, called IDLn_TGT. These two states produce the
/MBGn signal. IDLn_TGT and IDLE both terminate
when /MBB is asserted (meaning that master n is now
controlling the bus) and any GTn_WAIT state is
asserted (meaning that another request is pending).
This sequence is only used for the first two
arbitrations immediately after a reset, or if there has been a
lapse in bus requests. The sequence finishes as soon as
the master takes control of the bus.
If IDLE is not active, GTn_WAIT passes the token
to NRMn_SGT. This is a START macrocell, which
passes the token to the adjacent TERMINATE macrocell
called NRMn_TGT. Like the equivalent IDLn_SGT
and IDLn_TGT states, these two states produce the
/MBGn signal. However, NRMn_TGT terminates when
/MBB is deasserted (signaling that the current bus
transaction has completed and master n will control the
bus next) and any GTn_WAIT state is asserted
(indicating another pending request). The next grant is
asserted as soon as master n asserts /MBB.
This sequence is the normal mode of operation. It
terminates as soon as the previous bus transaction
completes. On the next cycle, the granted master takes
control of the bus while the arbiter is issuing the next grant.
Note that both modes allow for bus parking, which
allows a master to do multiple bus transactions without
re-arbitrating, so long as no other requests are pending.
This is why GTn_WAIT is used to terminate the grant
line.
Figure 8 shows the waveforms for two consecutive
arbitrations, the first starting from the IDLE state and
using the idle mode, the second proceeding in normal
operation mode.
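
The fixed-priority selection that is built into the GTn_PRI conditions can be sketched in a few lines of Python. This is an illustrative model rather than ToolKit source; the function name gt_pri and the list-of-booleans representation (True meaning that request /MBRn is asserted) are assumptions made for the example.

```python
def gt_pri(requests):
    """Return one flag per master; at most one flag is True.

    requests[n] is True when master n's request (/MBRn) is asserted.
    Master 0 has the highest priority and master 3 the lowest, so the
    condition for master n is that its own request is asserted and no
    lower-numbered master is requesting.
    """
    return [req and not any(requests[:n]) for n, req in enumerate(requests)]


if __name__ == "__main__":
    # Masters 1 and 2 request together; only master 1 starts a token.
    print(gt_pri([False, True, True, False]))
```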
Figure 6. Supervisory State Machine: Begin/Idle

Figure 7. State Diagram of the Mbus Arbiter
NOTES:
MBGn = IDLn_SGT + IDLn_TGT + NRMn_SGT + NRMn_TGT; so
/MBGn = /IDLn_SGT * /IDLn_TGT * /NRMn_SGT * /NRMn_TGT.
IDLn_TGT terminates when a GTn_WAIT state is asserted and /MBB = 0.
NRMn_TGT terminates when a GTn_WAIT state is asserted and /MBB = 1.

Figure 8. Handshake State Machine Waveforms (waveforms: Mbus clock, CY7C361 clock, /MBR0, /MBR1, GT0_PRI, GT0_WAIT, IDL0_SGT, IDL0_TGT, /MBG0, GT1_PRI, GT1_WAIT, NRM1_SGT, NRM1_TGT, /MBG1, /MBB, IDLE; annotation: if another request has been received, NRMn_TGT terminates when the bus transaction ends, and the next grant is issued)

The Condition Decode Array
The condition decode array implements the control
logic for the handshake state machine. This array's
inputs consist of the true and complement of all input
pins along with the true and complement of the state
macrocell feedback. The state macrocell feedback terms
are not all global inputs, however. Every fourth
macrocell has global feedback, and every third macrocell has
feedback to 16 of the 32 macrocells. The rest of the
macrocells have local feedback within their group of
eight. Placement of the states so that there is adequate
feedback is one of the biggest challenges when using the
CY7C361. The global feedback in the Mbus arbiter
comes from the GTn_WAIT states.
The other array inputs are /MBB, /RESET, and
/MBR(3:0). The /MBR(3:0) inputs are single registered.
Because the input registers have very small set-up and
hold times (tSU = 2 ns, tH = 3 ns), metastable events
are unlikely. The combinatorial nature of /RESET
makes it suitable to work as a non-registered signal.
/MBB, on the other hand, is double registered because
of its importance to the design's internal processes. If
you can guarantee that /MBB will not change until after
the falling edge of the system clock, you can
single-register /MBB for slightly better performance.
All the array inputs mentioned above are Low
asserted. You specify this fact in Cypress PLD ToolKit
syntax by the preceding slash when you define the
signals in the source file's configuration section. In the
source file's equation section, because all signals are
treated as High asserted, specifying MBB in an
equation is the same as saying that /MBB = 0 on the pin.
Each state macrocell has two possible inputs. One
input comes from the previous macrocell. You enable
this input by specifying CIN in the macrocell's
configuration. A TERMINATE macrocell requires the CIN
designation; otherwise the macrocell cannot be set.
The other state macrocell input comes from a
condition decoder, which is the output of the
condition-decoder array. A condition decoder is the output of an
AND function, with inputs from a NAND (or
INV_PROD) term and an AND (or PROD) term. OR
functions are accomplished by inverting the inputs to
the INV_PROD term.
The BEGIN state starts a token after a reset, when
none of the requests (/MBR(3:0)) are asserted. This
state passes the token to IDLE, where it stays until
terminated by the combination of a GTn_WAIT state and
the /MBB line being activated. The PLD ToolKit code
for this is:

BEGIN = <PROD> /MBR0 * /MBR1 * /MBR2 * /MBR3;

IDLE = <PROD> MBB
       <INV_PROD> /GT0_WAIT * /GT1_WAIT * /GT2_WAIT * /GT3_WAIT;

Keep in mind that, because BEGIN is a START
macrocell, BEGIN is asserted for one cycle after its
condition is true. Because IDLE is a TERMINATE
macrocell, it is asserted after the previous macrocell is
asserted; IDLE is deasserted when its condition is true.
The machine enters the GTn_PRI state if this
state's request is the highest priority received. GTn_PRI
passes the token directly to GTn_WAIT, which
terminates when either /MBB goes active (meaning that a
bus transaction has started) or IDLE is active
(indicating that no bus transactions have taken place). The
PLD ToolKit code for GT3_PRI and GT3_WAIT is:

GT3_PRI = <PROD> /MBR0 * /MBR1 * /MBR2 * MBR3;

GT3_WAIT = <PROD>
           <INV_PROD> /MBB * /IDLE;

Remember that <INV_PROD> is a NAND term,
and therefore the expression above means that
GT3_WAIT is terminated when MBB or IDLE is
asserted (according to the DeMorgan theorem). Also
note that the GT3_WAIT equation specifies
<PROD> without an expression. This automatically
sets the AND term to a logical 1. If the equation did
not include <PROD>, PLD ToolKit would assume
that <PROD> equals 0, which would cause the
condition decoder to always be false and GT3_WAIT to
never terminate.
At the same time that GTn_WAIT is terminated, a
new token is started in one of two macrocells. In the
first case, IDLE is asserted, and the token is created in
IDLn_SGT, a START macrocell. The condition for
entering IDLn_SGT is that IDLE and the
corresponding GTn_WAIT state are asserted. IDLn_SGT then
passes the token to the TERMINATE macrocell
IDLn_TGT. IDLn_TGT terminates when MBB is
asserted and one of the other handshake processes is in
its GTn_WAIT state. The equations are:

IDL3_SGT = <PROD> GT3_WAIT * IDLE;

IDL3_TGT = <PROD> MBB
           <INV_PROD> /GT0_WAIT * /GT1_WAIT * /GT2_WAIT;

This sequence only happens immediately after a
reset or if there has been a lapse in bus requests. The
sequence finishes as soon as the master takes control of
the bus.
The second state the machine can enter from
GTn_WAIT is NRMn_SGT. This state is entered when
the corresponding GTn_WAIT is asserted, MBB is
asserted, and IDLE is not asserted. NRMn_SGT (a
START macrocell) passes the token to NRMn_TGT,
which is a TERMINATE macrocell. NRMn_TGT
terminates when MBB is deasserted and one of the other
GTn_WAIT states is asserted. The equations are:

NRM3_SGT = <PROD> GT3_WAIT * MBB * /IDLE;

NRM3_TGT = <PROD> /MBB
           <INV_PROD> /GT0_WAIT * /GT1_WAIT * /GT2_WAIT;

This sequence is the normal mode of operation.
The sequence terminates as soon as the previous bus
transaction finishes. On the next cycle, the granted
master takes control of the bus, while the arbiter is
issuing the next grant.

The Output Array
The CY7C361's output array has the complements
of all the state registers as its inputs. The terms are
NAND based and connected directly to the output pins
or Mealy macrocells, which makes the outputs an OR
function of the state macrocells. /MBGn (where n = 0,
1, 2, 3) is produced by ORing states IDLn_SGT,
IDLn_TGT, NRMn_SGT, and NRMn_TGT.

Design Verification
The entire CY7C361 Mbus arbiter design was
entered using the Cypress PLD ToolKit and verified
using the PLD ToolKit's interactive simulator. Working
with a mouse and pop-down menus, the designer
created the circuit stimuli by drawing waveforms on a
graphics screen for each CY7C361 node or pin. The
PLD ToolKit's SIMULATE command then displays the
response waveforms, promoting a high degree of
confidence in the design's operation before programming a
part. The PLD ToolKit source file can be found in
Appendix A.
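
The condition-decoder rule used throughout this note (an AND of the PROD term and the NAND-based INV_PROD term) can be checked with a short Python sketch. The function names are hypothetical, and treating an omitted INV_PROD term as a logical 1 is an assumption of this sketch rather than documented ToolKit behavior. Arguments use True for "asserted", matching the High-asserted view taken in the equation section.

```python
def condition_decoder(prod_literals, inv_prod_literals=None):
    """Return the decoder output: PROD term AND INV_PROD (NAND) term.

    An empty PROD list evaluates to True, mirroring a bare <PROD>
    connective. Passing None for the INV_PROD literals means the term
    is unused and is treated as 1 (an assumption of this sketch).
    """
    prod = all(prod_literals)
    inv_prod = True if inv_prod_literals is None else not all(inv_prod_literals)
    return prod and inv_prod


def gt3_wait_terminates(mbb, idle):
    # GT3_WAIT = <PROD> <INV_PROD> /MBB * /IDLE
    return condition_decoder([], [not mbb, not idle])


def nrm3_tgt_terminates(mbb, gt0_wait, gt1_wait, gt2_wait):
    # NRM3_TGT = <PROD> /MBB <INV_PROD> /GT0_WAIT * /GT1_WAIT * /GT2_WAIT
    return condition_decoder([not mbb],
                             [not gt0_wait, not gt1_wait, not gt2_wait])


if __name__ == "__main__":
    # De Morgan: NAND(/MBB, /IDLE) equals MBB OR IDLE for every input.
    for mbb in (False, True):
        for idle in (False, True):
            print(mbb, idle, gt3_wait_terminates(mbb, idle))
```

The loop prints True for every combination except the one in which neither MBB nor IDLE is asserted, which is the OR behavior the text derives from the NAND term.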

Reference
SPARC Mbus Interface Specification, Revision 1.1,
published March 29, 1990, by Sun Microsystems.


Appendix A. PLD ToolKit Source File for Mbus Arbiter

CY7C361;                     {Mbus Arbiter using the CY7C361}

CONFIGURE;                   {Configuration of macrocells}

{Inputs}
CLK (NODE=4),
/MBB (IIREG),                {MBus Busy can be single or double registered,
                              depending on the system}
/RESET,
/MBR0 (NODE=10, IREG),       {MBus Request from master 0, single registered}
/MBR1 (IREG),                {MBus Request from master 1, single registered}
/MBR2 (IREG),                {MBus Request from master 2, single registered}
/MBR3 (IREG),                {MBus Request from master 3, single registered}

{Outputs}
/MBG0 (NODE=16),             {MBus Grant for master 0, low asserted output}
/MBG1,                       {MBus Grant for master 1, low asserted output}
/MBG2,                       {MBus Grant for master 2, low asserted output}
/MBG3 (NODE=28),             {MBus Grant for master 3, low asserted output}

{internal node configuration}
CKEN1 (NODE=30), CKEN2,      {clock enables for input macrocells}

BEGIN (NODE=34, START),      {begin and idle are supervisory states}
IDLE (TERM, CIN),            {idle is situated on a global feedback macrocell}

GT0_PRI (NODE=38, START),    {prioritization is a condition to this state, 0 is highest priority}
GT0_WAIT (TERM, CIN),        {waits until a grant can be issued, this macrocell has global feedback}
NRM0_SGT (START),            {These 2 states are the actual grant states for /MBG0}
NRM0_TGT (TERM, CIN),        {during normal operation}

GT1_PRI (START),             {prioritization is a condition to this state, 1 is second highest priority}
GT1_WAIT (TERM, CIN),        {waits until a grant can be issued, this macrocell has global feedback}
NRM1_SGT (START),            {These 2 states are the actual grant states for /MBG1}
NRM1_TGT (TERM, CIN),        {during normal operation}

GT2_PRI (START),             {prioritization is a condition to this state, 2 is third highest priority}
GT2_WAIT (TERM, CIN),        {waits until a grant can be issued, this macrocell has global feedback}
NRM2_SGT (START),            {These 2 states are the actual grant states for /MBG2}
NRM2_TGT (TERM, CIN),        {during normal operation}

GT3_PRI (START),             {prioritization is a condition to this state, 3 is lowest priority}
GT3_WAIT (TERM, CIN),        {waits until a grant can be issued, this macrocell has global feedback}
NRM3_SGT (START),            {These 2 states are the actual grant states for /MBG3}
NRM3_TGT (TERM, CIN),        {during normal operation}

IDL0_SGT (START),            {These 2 states are the actual grant states for /MBG0}
IDL0_TGT (TERM, CIN),        {for the first two transactions after an idle state}
IDL1_SGT (START),            {These 2 states are the actual grant states for /MBG1}
IDL1_TGT (TERM, CIN),        {for the first two transactions after an idle state}
IDL2_SGT (START),            {These 2 states are the actual grant states for /MBG2}
IDL2_TGT (TERM, CIN),        {for the first two transactions after an idle state}
IDL3_SGT (START),            {These 2 states are the actual grant states for /MBG3}
IDL3_TGT (TERM, CIN),        {for the first two transactions after an idle state}

GRST (NODE=64),              {Global ReSeT node}
OFF (NODE=73),               {internal reference point}
CLK2X (NODE=74, DBL_CLK),    {the internal clock doubler is enabled}

EQUATIONS;                   {the equations for the part are specified here}

{the condition decode array}

GRST = <PROD> RESET;

{start macrocell, master 0 has highest priority}
GT0_PRI = <PROD> MBR0;
{terminate macrocell, <PROD> enables condition decoder}
{gt0_wait terminates if MBB or IDLE are asserted}
GT0_WAIT = <PROD>
           <INV_PROD> /MBB * /IDLE;
{grant is issued if request is highest order pending, and}
{MBB is asserted and IDLE is not asserted}
NRM0_SGT = <PROD> GT0_WAIT * MBB * /IDLE;
{grant terminates when MBB is deasserted,}
{and a request is pending}
NRM0_TGT = <PROD> /MBB
           <INV_PROD> /GT1_WAIT * /GT2_WAIT * /GT3_WAIT;
{grant is issued if request is highest order pending, and}
{IDLE is asserted}
IDL0_SGT = <PROD> GT0_WAIT * IDLE;
{grant terminates when MBB is asserted,}
{and a request is pending}
IDL0_TGT = <PROD> MBB
           <INV_PROD> /GT1_WAIT * /GT2_WAIT * /GT3_WAIT;

{start macrocell, master 1 has second highest priority}
GT1_PRI = <PROD> /MBR0 * MBR1;
{terminate macrocell}
{gt1_wait terminates if MBB or IDLE are asserted}
GT1_WAIT = <PROD>
           <INV_PROD> /MBB * /IDLE;
{grant is issued if request is highest order pending, and}
{MBB is asserted and IDLE is not asserted}
NRM1_SGT = <PROD> GT1_WAIT * MBB * /IDLE;
{grant terminates when MBB is deasserted,}
{and a request is pending}
NRM1_TGT = <PROD> /MBB
           <INV_PROD> /GT0_WAIT * /GT2_WAIT * /GT3_WAIT;
{grant is issued if request is highest order pending, and}
{IDLE is asserted}
IDL1_SGT = <PROD> GT1_WAIT * IDLE;
{grant terminates when MBB is asserted,}
{and a request is pending}
IDL1_TGT = <PROD> MBB
           <INV_PROD> /GT0_WAIT * /GT2_WAIT * /GT3_WAIT;

{condition decode array equations, continued}

{start macrocell, master 2 has third highest priority}
GT2_PRI = <PROD> /MBR0 * /MBR1 * MBR2;
{terminate macrocell}
{gt2_wait terminates if MBB or IDLE are asserted}
GT2_WAIT = <PROD>
           <INV_PROD> /MBB * /IDLE;
{grant is issued if request is highest order pending, and}
{MBB is asserted and IDLE is not asserted}
NRM2_SGT = <PROD> GT2_WAIT * MBB * /IDLE;
{grant terminates when MBB is deasserted,}
{and a request is pending}
NRM2_TGT = <PROD> /MBB
           <INV_PROD> /GT0_WAIT * /GT1_WAIT * /GT3_WAIT;
{grant is issued if request is highest order pending, and}
{IDLE is asserted}
IDL2_SGT = <PROD> GT2_WAIT * IDLE;
{grant terminates when MBB is asserted,}
{and a request is pending}
IDL2_TGT = <PROD> MBB
           <INV_PROD> /GT0_WAIT * /GT1_WAIT * /GT3_WAIT;

{start macrocell, master 3 has lowest priority}
GT3_PRI = <PROD> /MBR0 * /MBR1 * /MBR2 * MBR3;
{terminate macrocell}
{gt3_wait terminates if MBB or IDLE are asserted}
GT3_WAIT = <PROD>
           <INV_PROD> /MBB * /IDLE;
{grant is issued if request is highest order pending, and}
{MBB is asserted and IDLE is not asserted}
NRM3_SGT = <PROD> GT3_WAIT * MBB * /IDLE;
{grant terminates when MBB is deasserted,}
{and a request is pending}
NRM3_TGT = <PROD> /MBB
           <INV_PROD> /GT0_WAIT * /GT1_WAIT * /GT2_WAIT;
{grant is issued if request is highest order pending, and}
{IDLE is asserted}
IDL3_SGT = <PROD> GT3_WAIT * IDLE;
{grant terminates when MBB is asserted,}
{and a request is pending}
IDL3_TGT = <PROD> MBB
           <INV_PROD> /GT0_WAIT * /GT1_WAIT * /GT2_WAIT;

{begin asserts when there are no requests, usually after a reset}
BEGIN = <PROD> /MBR0 * /MBR1 * /MBR2 * /MBR3;
{idle terminates when MBB is asserted,}
{and a request is pending}
IDLE = <PROD> MBB
       <INV_PROD> /GT0_WAIT * /GT1_WAIT * /GT2_WAIT * /GT3_WAIT;

{output array equations}

{input register clocks are always enabled}
CKEN1 = <INV_SUM> /OFF;
CKEN2 = <INV_SUM> /OFF;

{each grant is made up of these four states}
{the output pins are inverted for the low asserted grants}
MBG0 = <INV_SUM> /NRM0_SGT * /NRM0_TGT * /IDL0_SGT * /IDL0_TGT;
MBG1 = <INV_SUM> /NRM1_SGT * /NRM1_TGT * /IDL1_SGT * /IDL1_TGT;
MBG2 = <INV_SUM> /NRM2_SGT * /NRM2_TGT * /IDL2_SGT * /IDL2_TGT;
MBG3 = <INV_SUM> /NRM3_SGT * /NRM3_TGT * /IDL3_SGT * /IDL3_TGT;

{end of file};



TMS320C30/VME Signal Conditioner
Using the CY7C361
The design documented in this application note
shows how to use the Cypress CY7C361 to work with
the TMS320C30 ('C30) digital signal processor from
Texas Instruments and a VME interface. The design
uses a single CY7C361 to perform 'C30 interrupt signal
conditioning as well as VME DTACK (Data Transfer
ACKnowledge) generation. The CY7C361 performs
these functions at a cost that is generally lower than
would otherwise be possible.
This application note provides a brief introduction
to the CY7C361 and the methods you can use to
implement two different functions in the device. This design
contains six different state machines and uses 30 of 32
available macrocells.

CY7C361 Description
The CY7C361 is a 28-pin, 125-MHz state machine
EPLD. It contains 32 macrocells, eight dedicated
inputs, four bidirectional pins, six dedicated output pins,
and four Mealy output pins (which you can use as fast
combinatorial outputs).
The CY7C361 is based on a token-passing state
machine methodology, which is distinctly different from
what you might consider the "normal" method of
designing state machines (e.g., encoding states). The
CY7C361's token-passing scheme effects a logical,
streamlined state machine design methodology. In this
scheme, each macrocell typically corresponds to a state.
It is possible, however, to encode states in this device.
But associating each macrocell with a state generally
obviates the need to decode the macrocell outputs to
determine the machine's present state, which eliminates
the need for a state table.
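
To make the contrast concrete, here is a small Python sketch. It is an illustration only: the state names and helper functions are hypothetical and are not part of any Cypress tool. It compares a token-passing register that keeps one flag per state with a binary-encoded state register. With one flag per state, an output that depends on a state is simply that flag, whereas the encoded form needs a decode step first.

```python
STATES = ["IDLE", "REQUEST", "GRANT", "DONE"]  # hypothetical example states


def pass_token(flags):
    """Move the single token to the next macrocell, wrapping at the end."""
    return flags[-1:] + flags[:-1]


def decode(encoded):
    """Expand a binary-encoded state number into one flag per state."""
    return [encoded == index for index in range(len(STATES))]


if __name__ == "__main__":
    flags = [True, False, False, False]  # token starts in IDLE
    encoded = 0
    for _ in range(2):
        flags = pass_token(flags)
        encoded = (encoded + 1) % len(STATES)
    # Token passing: the GRANT output is just flags[2].
    # Encoded states: the same output requires the decode step.
    print(flags[2], decode(encoded)[2])
```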

You can configure each CY7C361 state macrocell
in one of three ways. First, in the START configuration,
the macrocell's output pulses High for exactly one cycle
when either of two conditions is asserted: the C_IN
signal from the previous macrocell or the output of the
CY7C361's condition decoder. The START configuration
is useful for starting a sequence.
The second macrocell configuration is the (Wait
Until) TERMINATE configuration. In this
configuration, the output goes High when C_IN is received from
the previous macrocell and remains High until the
condition decoder's output is asserted. You can use this
configuration to indicate, for example, "I am performing
this function now," where "I" is a state machine
implemented in the CY7C361.
In the third macrocell configuration, the TOGGLE
configuration, the macrocell output toggles so long as
the C_IN from the previous macrocell or the condition
deco$
~

TMS320C30/VME Signal Conditioner Using the CY7C361

[Chart assigning state macrocell nodes N33 through N63 to the INTx S0 and S1 state bits, the INTx C0 through C2 counter bits, and the INT1 and INT2 MON monitor bits, with local reset nodes N66, N68, N70, and N72, two macrocells marked NOT USED, and the remaining macrocells reserved for the DTACK circuitry.]
Figure 4. Resource Allocation Chart for 'C30 Interrupt Conditioner Circuit


[Timing diagram showing the signals INTx S0, INTx S1, INTx C0, INTx C1, INTx C2, and INTx MON.]
Figure 5. Output Signals for the Interrupt Controller
PLD ToolKit simulator output is presented in Figures 8
and 9.

CY7C361 Configuration Information
Both the configuration and equations parts of the
PLD ToolKit source file in Appendix A contain a section for miscellaneous functions. These lines of code
configure a variety of CY7C361 characteristics.
The configuration section contains a required line
that sets up the internal clock:
CLKDB(node=74, dbl_clk),
CLKDB is the name assigned to the CY7C361's internal clock. Node 74 is the clock's internal node number. "dbl_clk" is an attribute that turns on the
CY7C361's clock doubler. If you do not want double-clock
operation, simply leave out the "dbl_clk" attribute.
Three lines of code configure the input enables:
ien1(node=29),
ien2(node=30),
ien3(node=31),
"ienx" is the name assigned to the nodes. Node 29
enables inputs I0 through I3. Node 30 enables I4
through I7. Node 31 enables the inputs of the bidirectional pins (B0 through B3).
One line of code provides a way to tie signals to
ground:
GND(node=73),
You can use this code, for example, to permanently
enable inputs, which the equations section demonstrates:
ien1 = <inv_sum> /gnd;
ien2 = <inv_sum> /gnd;
ien3 = <inv_sum> /gnd;

Acknowledgment
The author thanks Steve Heinrichs and Scott Mindemann for permitting their idea to be used here and
for providing information regarding the TMS320C30's
interrupt structure and timing.

[Timing diagram showing /GMSEL0, /GMSEL1, and DTACK, annotated with 135 ns (min) and 75 ns (min).]
Figure 6. Timing Requirements for DTACK Circuitry


[Chart assigning state macrocell nodes to the DTACK circuitry: N32 through N35 (Local Reset = N65); N57 = SEL0 S1, N58 = SEL1 S0, N59 = SEL1 S1; N60 through N63 = DTACK C0 through DTACK C3 (Local Reset = N72).]
Figure 7. Resource Allocation for the DTACK Circuitry

[Simulator timing diagram showing CLKIN, CLKDB, /GMSEL0, /GMSEL1, DTACK, SEL0 S0, SEL0 S1, SEL1 S0, SEL1 S1, DTACK C0 through DTACK C3, and DTACKRST.]
Figure 8. DTACK Timing for /GMSEL0
[Simulator timing diagram showing CLKIN, /GMSEL1, SEL0 S1, SEL1 S0, SEL1 S1, DTACK C0 through DTACK C3, and DTACKRST.]
Figure 9. DTACK Timing for /GMSEL1


Appendix A. PLD ToolKit Source File for
TMS320C30/VME Signal Conditioner/Generator
CY7C361;
{PLD Toolkit source code listing for TMS320C30/VME signal conditioner/generator}
CONFIGURE;
{These lines are miscellaneous setup for the 361.}
CLKDB(node=74, dbl_clk),     {Required line. In this case, the clock doubler is ON.}
ien1(node=29),
ien2(node=30),
ien3(node=31),               {Nodes 29, 30, and 31 are the input enables for the}
                             {dedicated inputs and the bidirectional inputs used on}
                             {this device.}
GND(node=73),                {Used to permanently assert an internal signal.}
CLKIN(node=4),               {Clock signal. Feeds both the 361 and TMS320C30.}
GLBL_RST(node=64),           {Naming the internal global reset node}
/RESET(node=9),              {Pin that will be used to assert GLBL_RST}

{This section is the configuration of the interrupt logic for INT0.}
/XINT0(node=1, iireg),       {An interrupt trigger input. Pin 1, double registered.}
/INT0(node=28),              {Massaged interrupt pulse, output on pin 28.}
INT0_S0(node=32, start),     {State S0 means "not counting", start configuration.}
INT0_S1(cin, term),          {State S1 means "counting", (Wait Until) Terminate}
                             {configuration, triggered by C_IN from INT0_S0.}
INT0_C0(node=36, tog),       {C0, C1, and C2 are the counter bits. They count to}
INT0_C1(tog),                {6 and are locally reset to 000 binary, all configured in}
INT0_C2(tog),                {the toggle configuration}
INT0_RST(node=66),           {Local reset for INT0 counter. Resets when C2C1C0 = 110.}

{This section is the configuration of the interrupt logic for INT1.}
/XINT1(node=2, iireg),       {An interrupt trigger input. Pin 2, double registered.}
/INT1(node=27),              {Massaged interrupt pulse, output on pin 27.}
INT1_S0(node=34, start),     {State S0 means "not counting", start configuration.}
INT1_S1(cin, term),          {State S1 means "counting", (Wait Until) Terminate}
                             {configuration, triggered by C_IN from INT1_S0.}
INT1_C0(node=40, tog),       {C0, C1, and C2 are the counter bits. They count to 5}
INT1_C1(tog),                {which triggers INT1_MON, which then causes them to be reset}
INT1_C2(tog),                {to 000 binary, all configured in the toggle configuration}
INT1_MON(start),             {This is the monitor bit for the INT1 counter. Configured}
                             {as Start and triggered by C2C1C0 = 101}
INT1_RST(node=67),           {Local reset for INT1 counter. Resets when INT1_MON is}
                             {asserted.}


{This section is the configuration of the interrupt logic for INT2. All comments from INT1 apply here.}
/XINT2(node=3, iireg),
/INT2(node=26),
INT2_S0(node=50, start),
INT2_S1(cin, term),
INT2_C0(node=44, tog),
INT2_C1(tog),
INT2_C2(tog),
INT2_MON(start),
INT2_RST(node=68),

{This section is the configuration of the interrupt logic for INT3. All comments from INT0 apply here.}
/XINT3(node=5, iireg),
/INT3(node=25),
INT3_S0(node=48, start),
INT3_S1(cin, term),
INT3_C0(node=52, tog),
INT3_C1(tog),
INT3_C2(tog),
INT3_RST(node=70),

{Configuration of the DTACK circuitry.}
/GMSEL0(node=10, iireg),     {Input, pin 10, double registered}
/GMSEL1(node=11, iireg),     {Input, pin 11, double registered}
DTACK(node=19, ninv),        {Output, pin 19. Mealy output is used to get non-inverting}
                             {output}
SEL0_S0(node=56, start),     {Supervisory state for /GMSEL0.}
SEL0_S1(cin, term),          {Supervisory state for /GMSEL0.}
SEL1_S0(start),              {Supervisory state for /GMSEL1.}
SEL1_S1(cin, term),          {Supervisory state for /GMSEL1.}
DTACK_C0(tog),               {LSB of the DTACK delay counter.}
DTACK_C1(tog),
DTACK_C2(tog),
DTACK_C3(tog),               {MSB of the DTACK delay counter.}
DTACKRST(node=72)            {Reset term for the DTACK delay counter.}


EQUATIONS;
{This section makes sure the inputs are always ENABLED and connects RESET to the internal Global Reset.}
ien1 = <inv_sum> /gnd;
ien2 = <inv_sum> /gnd;
ien3 = <inv_sum> /gnd;
GLBL_RST =  RESET;

{Equations for the interrupt logic for /INT0.}
INT0_S0 = <inv_prod> /XINT0              {Start configuration, triggers on falling}
          <prod> ;                       {edge of /XINT0. <prod> is included}
                                         {to set the AND term to logic 1. If this}
                                         {is not done, the condition decoder is}
                                         {always logic 0.}
                                         {(Wait Until) Terminate configuration,}
                                         {triggered by C_IN from above,}
                                         {terminated by count of 6.}
INT0_C0 =  INT0_S1;                      {LSB of the counter}
INT0_C1 =  INT0_C0 * INT0_S1;
INT0_C2 =  INT0_C0 * INT0_C1 * INT0_S1;  {MSB of the counter}
INT0 =  /INT0_S1;                        {Output equation. Effectively,}
                                         {/INT0 is asserted while INT0_S1 is}
                                         {asserted.}
                                         {Local reset term.}
                                         {Included since XINT0 is an input to a}
                                         {bidirectional pin. This turns off the}
                                         {output buffer, making the pin an input.}

{Equations for the interrupt logic for /INT1.}
INT1_S0 = <inv_prod> /XINT1              {Start configuration, triggers on falling}
          ;                              {edge of /XINT1.  is included}
                                         {to set the AND term to logic 1. If this}
                                         {is not done, the condition decoder is}
                                         {always logic 0.}
INT1_S1                                  {triggered by C_IN from above,}
                                         {terminated when INT1_MON is asserted.}
INT1_C0 =  INT1_S1;                      {LSB of the counter}
INT1_C1 =  INT1_C0 * INT1_S1;
INT1_C2 =  INT1_C0 * INT1_C1 * INT1_S1;  {MSB of the counter}
INT1_MON =  INT1_C2 * /INT1_C1 * INT1_C0;
                                         {This is the monitor bit. In order to make}
                                         {the output be 7 clocks long, this must}
                                         {be triggered at 6 clocks (eg, a count of 5.)}
                                         {Output equation. Effectively,}
                                         {/INT1 is asserted while INT1_S1 is}
                                         {asserted.}
                                         {Counter is reset when INT1_MON is}
                                         {asserted.}

XINT1                                    {Included since XINT1 is an input to a}
                                         {bidirectional pin. This turns off the}
                                         {output buffer, making the pin an input}

{Equations for the interrupt logic for /INT2. All comments from /INT1 apply here.}
INT2_S0 = <inv_prod> /XINT2
          ;
INT2_S1 =  INT2_MON;
INT2_C0 =  INT2_S1;
INT2_C1 =  INT2_C0 * INT2_S1;
INT2_C2 =  INT2_C0 * INT2_C1 * INT2_S1;
INT2_MON =  INT2_C2 * /INT2_C1 * INT2_C0;
INT2 = <inv_sum> /INT2_S1;

{Equations for the interrupt logic for /INT3. All comments from /INT0 apply here.}
INT3_S0 = <inv_prod> /XINT3
          ;
INT3_S1 =  INT3_C2 * INT3_C1 * /INT3_C0;
INT3_C0 =  INT3_S1;
INT3_C1 =  INT3_C0 * INT3_S1;
INT3_C2 =  INT3_C0 * INT3_C1 * INT3_S1;

{Equations for the DTACK counter}
DTACK_C0 =  /GMSEL0 * /GMSEL1            {LSB toggles when /GMSEL0 OR}
           ;                             {/GMSEL1 is active.}
DTACK_C1 = DTACK_C0
           <inv_prod> /GMSEL0 * /GMSEL1;
DTACK_C2 =  DTACK_C0 * DTACK_C1
           <inv_prod> /GMSEL0 * /GMSEL1;
DTACK_C3 =  DTACK_C0 * DTACK_C1 * DTACK_C2
           <inv_prod> /GMSEL0 * /GMSEL1;

{Equations for DTACK pulse generation.}
SEL0_S0 =  /DTACK_C3 * DTACK_C2 * DTACK_C1 * /DTACK_C0 * /GMSEL1;
SEL0_S1 =  /GMSEL0;
SEL1_S0 =  /DTACK_C3 * /DTACK_C2 * DTACK_C1 * /DTACK_C0 * /GMSEL0;
SEL1_S1 =  /GMSEL1;

DTACKRST = <inv_prod> /SEL0_S1 * /SEL1_S1
           ;
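
The toggle-configured counter bits in the listing follow a common pattern: bit C0 toggles whenever S1 is High, C1 toggles when C0 and S1 are High, and C2 toggles when C0, C1, and S1 are all High. The Python sketch below simulates only that toggle logic to show that it produces a binary up-count; the function names are hypothetical, and the local reset and the device's real clocking are not modeled.

```python
def toggle_step(c2, c1, c0, s1):
    """Advance the 3-bit toggle counter by one clock (bits are 0 or 1)."""
    t0 = s1               # C0 toggles while S1 is asserted
    t1 = c0 & s1          # C1 toggles when C0 and S1 are asserted
    t2 = c0 & c1 & s1     # C2 toggles when C0, C1, and S1 are asserted
    return c2 ^ t2, c1 ^ t1, c0 ^ t0


def count_values(clocks, s1=1):
    """Return the counter value after each of the given number of clocks."""
    bits = (0, 0, 0)
    values = []
    for _ in range(clocks):
        bits = toggle_step(*bits, s1)
        values.append(bits[0] * 4 + bits[1] * 2 + bits[2])
    return values
```

With S1 held High the counter steps through 1, 2, 3, and so on; with S1 Low every toggle term is 0 and the count holds.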


DMA Control Using the
CY7C342 MAX EPLD
This application note details the use of a CY7C342
MAX EPLD as a general-purpose DMA controller for
a 16-bit microprocessor-based system. The design showcases the versatility and density you can achieve with
this type of device, and demonstrates a modular
and hierarchical design approach utilizing the
MAX+PLUS development system's schematic capture
and textual design entry capabilities.
The Cypress Multiple Array MatriX (MAX)
EPLDs are a reprogrammable, user-configurable, high-density, high-performance family of logic devices that
suit a host of applications. The MAX architecture allows you to replace large numbers of small- and
medium-scale integration (74XXX series) parts, as well
as programmable logic devices (PLDs), with a single
CY7C340 MAX family device.

CY7C342 Description
The CY7C342 EPLD is functionally equivalent to
4000 - 5000 logic gates. A block diagram of the chip
appears in Figure 1. The CY7C342 offers high performance, reprogrammability, excellent design tool support, and fast design turnaround. The device has 128
flexible macrocells arranged into eight groups called
Logic Array Blocks (LABs). You can configure individual macrocells for combinatorial or registered
operation, supporting product term control of XOR
input, register preset, register clear, asynchronous clock,
synchronous clock, and output enable.
Additionally, you can augment macrocell product
terms by the use of expander product terms (expanders). This array of inverted-AND product terms
feeds back to the macrocell inputs as well as their own
inputs, allowing the implementation of large sum-of-products
structures and cross-coupled NAND registers.
Each LAB contains 16 macrocells and 32 expanders and functions much like a small EPLD. Signals
are routed between the eight LABs via the programmable interconnect array (PIA), which allows you to
partition a design across several LABs. The PIA's routing resources feature a single uniform delay, which minimizes the overall impact on device performance. For a
more complete description of the features of the
CY7C340 MAX family, refer to the Cypress Semiconductor Databook.

Figure 1. CY7C342 Block Diagram

MAX+PLUS Description
The MAX+PLUS development system is a computer-aided design environment used to implement
designs with the Cypress CY7C340 EPLD family.
MAX+PLUS offers an integrated approach to design
entry, design verification, and device programming.
Running on an IBM PC/AT-compatible platform,
MAX+PLUS provides all the tools necessary to quickly and efficiently convert complex logic designs into
functional silicon.
You enter designs in MAX+PLUS using a powerful hierarchical graphic editor that supports both
schematic capture and high-level text definition. The
graphical editor gives you the ability to capture
schematics utilizing standard 74XXX TTL macrofunctions, generic logic primitives, or user-defined custom
functions. You can also use a high-level text definition
written in the Advanced Hardware Description Language (AHDL) to define the function of an entire
design or just a portion of a design. By incorporating
schematic capture, Boolean equation, state machine, or
truth table design entry methods, you are free to choose
the technique that best fits your application.
When the MAX+PLUS design entry is complete,
the design is compiled. The compiler performs several
tasks as it goes through the design database. The compiler performs a minimization function on the logic, fits
the design into a device, creates files for the simulator,
and provides an object file for programming a device.
After compiling the design, you can use the interactive,
event-driven timing simulator to determine the design's
function and worst-case timing characteristics.

Table 1. DMAC Modules
DMAC Module            Design Method
CPU Decoder            Text Entry
Control Register       Schematic Entry
Address Counter        Schematic Entry
Word Counter           Schematic Entry
Three-State Buffers    Schematic Entry
Output Multiplexers    Schematic Entry
Cycle Controller       Text Entry

Application Description
The application illustrated here is a general-purpose, 16-bit direct memory access controller (DMAC).
A DMAC shares the address, data, and control buses
with the central processing unit (CPU) and acts as a
bus master in place of the CPU when granted bus
ownership. Generally, a DMAC has access to all system
resources, including memory, memory-mapped I/O, and
I/O. Figure 2 shows a typical system block diagram with
a 16-bit CPU, RAM and ROM memory, I/O devices,
and the DMAC.
When an I/O device requires data from memory or
needs to transfer data to memory, it must request service from the DMAC by asserting a DMA request
(DREQ). The DMAC (when configured and enabled)
then requests ownership of the system bus by activating
its hold request output (HREQ). The DMAC then
waits until it receives a hold acknowledge (HLDA)
from the CPU. Next, the DMAC enables its 23 address
outputs to access a specific memory location. Depending on the DMAC's programmed configuration, it either
reads the memory location's contents and writes the
data to the I/O device or reads data from the I/O device
and writes the data into the referenced memory location.
Data transfers between memory and I/O devices
can occur as single-word operations or as bursts of
words under CPU program control. A 16-bit counter is
initialized and maintained to control the number of
words being transferred. This counter is decremented
every transfer, allowing a count of zero to generate an
interrupt. The DMAC also handles memory interface
timing and transfer control, because the DMAC essentially functions as the processor when in control of the
bus.

[System block diagram; legible labels include HREQ, CPU, and INTP.]
Figure 2. Typical System Block Diagram

Design Partitioning
The MAX+PLUS development system is a hierarchical tool that allows you to partition your design into
functional blocks, with each block designed as a
separate module. The DMAC implemented here is partitioned into seven functional modules. Table 1 lists the
modules' names and the design method utilized. In addition to the main modules, the DMAC uses some glue
logic, which effectively ties the design together.
Figure 3 shows the DMAC block diagram. Each
module was constructed utilizing the design technique
that best suits the application. The modules are
described here in detail, including an explanation of the
design methodology chosen for each implementation.


CPU Decoder Module
Because the DMAC acts as a peripheral until it has
control of the system bus, the DMAC must respond to
several I/O commands from the CPU. These commands
are essential to configure the DMAC properly and to
successfully transfer data. The CPU decoder module
receives interface signals from the CPU and decodes
them into write strobes and a read enable. The write
strobes latch incoming parameters from the data bus
into the parameters' respective locations (e.g., control
register, word counter, etc.). The read enable and address line MA1 allow internally selected registers to be
multiplexed onto the data bus. The DMAC is configured by the processor via I/O instructions and
responds to the addresses as shown in Table 2.
The CPU interface's function is defined using
AHDL. The resulting ASCII text file describes the CPU
decoder's behavior without determining the Boolean
equations or using the schematic-capture/graphic-design
entry method. Because MAX+PLUS AHDL provides
several different ways to specify a module's operation, a
truth table is used to describe the CPU decoder to
clearly express which outputs are active when specific
inputs are asserted. Refer to Appendix A for the CPU
decoder AHDL text file.

Table 2. DMAC I/O Addresses
[Truth table with columns CS, A2, A1, IORD, IOWR, and OPERATION. The operations listed are No Operation, Write Control Register, Write Word Count, Write Low Mem Addr, Read Low Mem Addr, Write High Mem Addr, and Read High Mem Addr and DMAC Status.]

Control Register Module
The control register module configures the DMAC
and controls the DMAC's operation. The CPU writes
to the control register module, which has control bits to
enable or disable the DMAC, enable an interrupt when
the word count equals zero, clear the word counter,
enable burst or single-byte transfers, and define the
transfer direction (memory to I/O or I/O to memory).
The bit definitions for each DMAC function appear in
Table 3.
The processor can read the DMAC's current status
and configuration. The bit definitions (Table 4) are essentially the same as those for the control word.
MAX+PLUS's schematic capture capability is
used to implement the control register function (Figure
4). This register stores control and configuration information from the CPU, with the exception of the clear
word counter bit. When written by the CPU as a logic 1,
this bit uses an additional flip-flop to clear itself and the
16-bit counter.
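
To make the control word layout concrete, the sketch below unpacks a 16-bit control word using the bit positions listed in Table 3 (bit 0 enable, bit 1 interrupt enable, bit 2 clear word counter, bit 3 burst, bit 4 transfer direction). The class and function names are hypothetical and exist only for illustration; the design itself implements this register as a schematic.

```python
from dataclasses import dataclass


@dataclass
class ControlWord:
    enable: bool              # bit 0: 1 = DMA controller enabled
    interrupt_enable: bool    # bit 1: 1 = interrupt enabled
    clear_word_counter: bool  # bit 2: 1 = clear the word counter
    burst: bool               # bit 3: 0 = single transfer, 1 = burst
    io_to_memory: bool        # bit 4: 0 = memory to I/O, 1 = I/O to memory


def decode_control(word):
    """Unpack the low five bits of a control word; bits 5-15 are unused."""
    return ControlWord(
        enable=bool(word & 0x01),
        interrupt_enable=bool(word & 0x02),
        clear_word_counter=bool(word & 0x04),
        burst=bool(word & 0x08),
        io_to_memory=bool(word & 0x10),
    )
```

For example, `decode_control(0b11001)` describes an enabled controller performing burst transfers from I/O to memory with the interrupt disabled.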

Table 3. DMAC Control Register Bit Definitions
BIT      DEFINITION
0        Enable DMA Controller (0=Disabled, 1=Enabled)
1        Enable Interrupt (0=Disabled, 1=Enabled)
2        Clear Word Counter (1 clears Word Counter and Bit 2 to zero)
3        Burst/Single Word Transfer (0=Single Transfer, 1=Burst Transfer)
4        Transfer Direction (0=Memory to I/O, 1=I/O to Memory)
5-15     Not Used

Table 4. DMAC Status Register Definitions
BIT      DEFINITION
8        DMA Controller Enabled (0=Disabled, 1=Enabled)
9        Interrupt Enabled (0=Disabled, 1=Enabled)
10       Not Used = 0
11       Burst Transfer Mode (0=Disabled, 1=Enabled)
13       Transfer Direction (0=Memory to I/O, 1=I/O to Memory)
14,15    Not Used

[Block diagram; legible labels include ADDR1 through ADDR23, WORD COUNT, CYCLE CONTROL, HREQ, DACK, DMAEN, and INTP.]
Figure 3. DMAC Block Diagram

[Schematic; legible labels include the ENABL, INTEN, BURST, and IO2MEM outputs and the CLRENB, /RESET, and WCT_CLR signals.]
Figure 4. Control Register Schematic

Address Generator Module
The address generator module is a 23-bit
synchronous counter that provides the system memory
address for the data transfer operation. The CPU must
initialize this counter to the 23-bit value that corresponds to the transfer's starting memory address. As
memory transactions take place, the counter is incremented at the end of every memory operation to
guarantee that the address is set for the next transfer.
The counter is incremented under the control of the
cycle controller module. The CPU can read the 23-bit
address with two I/O read operations, one for the lower
16 bits and another for the upper 7 bits.
Using the 74XXX TTL macrofunctions available in
MAX+PLUS, the address generation function is implemented with six 4-bit, 74161-equivalent counters.
These counters are arranged so that when each 4-bit
counter increments to a binary count of 1111, its ripple
carry output (RCO) enables the next higher 4-bit
counter via the enable P (ENP) and enable T (ENT)
inputs. The CPU parallel loads these counters via the
data bus in two operations, one for the lower 16 bits
and one for the upper 7 bits. Figure 5 shows the address
generator schematic diagram.
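
The cascade of 4-bit counters can be sketched in Python as follows. This is a behavioral approximation only: the helper names are hypothetical, the first stage is assumed to be enabled on every call, and the result is masked to 23 bits to reflect the 23 address outputs.

```python
def to_nibbles(address):
    """Split an address into six 4-bit stages, least significant first."""
    return [(address >> (4 * i)) & 0xF for i in range(6)]


def from_nibbles(nibbles):
    """Reassemble the stages and keep the 23 address bits."""
    return sum(value << (4 * i) for i, value in enumerate(nibbles)) & 0x7FFFFF


def increment_cascade(nibbles):
    """Increment once; each stage counts only if every lower stage is 1111."""
    enable = True  # assumed: the cycle controller enables the first stage
    result = []
    for value in nibbles:
        rco = enable and value == 0xF  # ripple carry out of this stage
        result.append((value + 1) & 0xF if enable else value)
        enable = rco                   # RCO enables the next higher stage
    return result
```

Calling `from_nibbles(increment_cascade(to_nibbles(a)))` behaves like `a + 1` within the 23-bit address range, with the carry rippling from stage to stage only when the lower stages are all at 1111.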

[Schematic of the cascaded 74161-equivalent counters, with data inputs, ADDR outputs, and the /RESET input.]
Figure 5. Address Generator Schematic

Word Counter Module
Because each transfer operation requires a word
count, a 16-bit word counter monitors the number of
words to be transferred. The CPU initializes this
counter to a value representing one less than the total
number of words to be transferred. This value allows
the counter to reach zero before the last transfer and
terminate the operation at the proper time, with the
correct number of words transferred.
The 16-bit word counter is constructed utilizing
four 74161 synchronous counter macrofunctions. Initialization of the word count occurs during setup of
the transfer operation. The 16-bit word count value is
1's complemented (inverted) as it is loaded into the
counters. The counter is actually incremented instead of
decremented, and the RCO output of the last counter
indicates that the word count has reached zero. This arrangement requires only four macrocells or 3 percent of
the CY7C342's resources, and thus allows use of the
74161 macrofunction. Figure 6 shows the word counter
schematic.
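
The inverted-load trick can be checked with a few lines of Python. The sketch assumes the RCO of the last 74161 is asserted when all 16 counter bits are 1; the function names are hypothetical.

```python
MASK = 0xFFFF  # 16-bit counter


def load_word_count(words_minus_one):
    """Load the 1's complement of the value written by the CPU."""
    return ~words_minus_one & MASK


def rco(counter):
    """Assumed terminal condition: all 16 bits are 1 ("count is zero")."""
    return counter == MASK


def transfers_until_zero(total_words):
    """Count the increments needed before RCO signals a zero word count."""
    counter = load_word_count(total_words - 1)
    transfers = 0
    while not rco(counter):
        counter = (counter + 1) & MASK  # the counter increments each transfer
        transfers += 1
    return transfers
```

Loading the complement of N and counting up reaches all ones after exactly N increments, which is equivalent to loading N and counting down to zero. Because the CPU writes one less than the total, the zero indication appears with one transfer remaining, as the text describes.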

Three-State Buffers
When the CPU has ownership of the system bus,
the DMAC's address, memory, and I/O control lines
are in a high-impedance state. The data bus must also
remain in a high-impedance state unless the CPU is
reading the DMAC's internal register.
An octal three-state buffer implements the high-impedance interface function. The buffer uses eight Tri
buffers from the macrocell library with the enables all
tied together. These octal buffers correspond to the
output portion of the CY7C342's I/O pins. The cycle
control module output, DMAEN, enables the address,
memory, and I/O control outputs. The RD_ENAB signal from the CPU decoder module enables the data
outputs.

Output Multiplexer Module
The CPU must have access to the DMAC's internal
registers to monitor operation. Because the MAX family of devices does not support an internal three-state
bus, an alternative technique is employed when driving
I/O pins from multiple sources within the device. Four
74157 2:1 multiplexer macrofunctions are placed in
front of the 16 I/O pins for the data bus. With address
input A1 connected to each 74157 mux's select input,
either the address generator's lower 16 bits (when A1
= 0) or upper 7 bits (including the DMAC status information) are driven onto the data bus during a CPU
read operation. Figure 7 shows the output multiplexer
schematic with octal buffers.

Cycle Control Module
The cycle control module controls DMAC memory
and I/O operations. This finite state machine (FSM)
handles the hold request (HREQ) and hold acknowledge (HLDA) handshake with the CPU. It also accepts
DMA requests (DREQ) from I/O devices and acknowledges the requests with the output DACK. The FSM
generates all memory and I/O interface timing, as well
as the address/control output enables, counter increment/decrement operations, and interrupt generation
(when enabled). The FSM is specified as a text design
file and uses the AHDL state machine syntax with
CASE statements to define the operation in each state.
Figure 8 shows the cycle controller state diagram, and
Appendix B lists the cycle controller AHDL text design
file.
Examining the state variable declarations in the
cycle controller text design file reveals that all the internal control signals, handshake lines, and external interface signals are encoded into the FSM's state definitions. This text allows for a clear definition of each state
using the fewest number of macrocells. However, this
method can sometimes result in each macrocell having a
complex Boolean expression that requires additional expanders. Thus, you might find it beneficial in some
cases to allow MAX+PLUS to define an FSM's state
definitions utilizing more macrocells, which can reduce
the number of expanders. This approach relieves you
from the responsibility of manually assigning state bits
and often results in better performance.
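
The DREQ/HREQ/HLDA/DACK handshake that the cycle controller manages can be sketched as a small state machine. The three states below are hypothetical simplifications chosen for illustration; they are not the states of the cycle controller in Appendix B, and memory and I/O strobe timing is omitted.

```python
def next_state(state, dreq, hlda, enabled, burst, word_count_zero):
    """Return the next state of a simplified, hypothetical handshake FSM."""
    if state == "IDLE":
        # An I/O device asks for service; the DMAC must be enabled to respond.
        return "HOLD_REQUEST" if (enabled and dreq) else "IDLE"
    if state == "HOLD_REQUEST":
        # HREQ stays asserted until the CPU grants the bus with HLDA.
        return "TRANSFER" if hlda else "HOLD_REQUEST"
    if state == "TRANSFER":
        # Burst mode keeps the bus until the word count reaches zero.
        return "TRANSFER" if (burst and not word_count_zero) else "IDLE"
    raise ValueError(f"unknown state: {state}")


def outputs(state):
    """Handshake outputs for each hypothetical state."""
    return {
        "HREQ": state in ("HOLD_REQUEST", "TRANSFER"),
        "DACK": state == "TRANSFER",
        "DMAEN": state == "TRANSFER",
    }
```

Here the outputs are derived from the state after the fact. The text notes that the actual design instead encodes the control and handshake signals directly into the state bits.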

Design Compilation
Upon completing each DMAC module, you can
compile the module to eliminate any errors. If a design
module contains an error, MAX+PLUS flags the error
and takes you to the error's location in the schematic or
text file. As each module compiles successfully,
MAX+PLUS automatically generates a symbol representing the module design. These symbols are then incorporated in the DMAC's top-level schematic (Appendix D).

Once all DMAC blocks are integrated into the
design, you can perform top-level compilation of the
DMAC. You can compile the DMAC design from
within MAX+PLUS's graphic editor or from the compiler itself.
The compiler follows a series of steps during the
compilation process, ranging from netlist extraction to
the creation of the object file used by the device
programmer. First, the design processor creates a compiler netlist file (.CNF file) and tests for any design rule
violations (output shorts, syntax errors in AHDL files,
inputs and outputs not used, etc.). Next, the compiler
generates a hierarchy interconnect file (.HIF file),
which details the design's hierarchical interconnections.
The database builder then flattens the design into a
single level, maintaining the original design's function
and connectivity. A logic synthesizer determines
Boolean expressions for each logic function and primitive, allowing for the sum-of-products form required by
the MAX EPLD architecture. Proprietary minimization
algorithms remove redundant logic and reduce the
number of required product terms.

[Schematic of four cascaded 74161 counter macrofunctions with inverted (NOT) data inputs D00 through D15, MCLK and RESET inputs, and a ZERO output.]
Figure 6. Word Counter Schematic

DMA Control Using the CY7C342 MAX EPLD
Figure 7. Output Mux Schematic W/Octal Buffers

Figure 8. Cycle Controller State Diagram

A rule-based expert system then groups the design's logical
requirements into a balanced number of macrocells and expanders.
MAX+PLUS's fitter allocates resources within the device, selecting
the best macrocell location, pin assignments (if not already
assigned), and interconnection paths. After a successful fit,
MAX+PLUS creates a programmer object file (.POF file), which you
can use to program a device. Whether the design compiled
successfully or not, MAX+PLUS also creates a report file (.RPT file)
that details the utilization of macrocells, expanders,
interconnects, and I/O pins.

If the design compiled properly, you can program a device or verify
the design through simulation. If the design did not compile, the
compiler provides warnings and error messages to aid you in
correcting any problems. Refer to the Cypress MAX+PLUS User's Guide
for a complete list of error messages and recommended corrective
actions.

Upon initial compilation of the DMAC design, the MAX+PLUS fitter
determined that two of the LABs required more connections from the
PIA than are available. The macrocell interconnection cross
reference in the report file (Appendix C) revealed that portions of
the CPU decode function were implemented with expanders in several
LABs. This arrangement required the routing of many CPU decoder
module inputs to each LAB. The compiler chose this approach to
conserve macrocells, enhance performance, and prevent an additional
PIA delay. However, this extra delay affects operation only when the
CPU writes to or reads from the DMAC. Because the CPU typically
requires a slow I/O operation to access the DMAC, the extra delay
would cause no significant performance reduction.

Using the graphic editor, the DMAC schematic is altered by placing
an MCELL buffer between each output of the CPU Decoder block and the
destination. Placing the MCELL buffer after a module forces
MAX+PLUS to place the logic function preceding the buffer in a
macrocell. The module's output is then routed to all LABs via the
PIA, resulting in an additional delay but requiring fewer
interconnects. The design was recompiled and successfully fit into a
CY7C342.

Design Verification

Design verification is an important step in the development of any
programmable logic function and can be accomplished in several
different ways. One way is to take a programmed device, insert it in
a circuit, and observe its behavior. This "plug and chug" technique
works fine for simple devices performing well-defined functions; it
does not work well for large, complex designs with major portions of
the logic buried within the device.

The second, far more sophisticated approach is to model the
programmable logic function's behavior and simulate the operation
before the part goes on a board. This design verification procedure
is recommended for all but the most elementary designs to determine
that the function and timing characteristics obtained match the
system's requirements.

The MAX+PLUS simulator provides fast, easy design verification. It
allows you to input circuit stimuli from either a vector or waveform
file (.VEC and .SCF files, respectively). Output information can be
stored in a table and compared during later simulation sessions or
viewed in the waveform editor. You can create batch operations to
automate the simulation process.

The simulator is accurate to 100 ps and features glitch,
oscillation, and set-up/hold monitoring on every internal node
within the selected device. This capability allows you to monitor
the device's operation from a functional standpoint, utilizing
worst-case timing parameters to guarantee proper operation in the
application.

Although you can obtain "standard" DMA controllers, often they are
not a good fit for the specific system you are designing. A custom
gate array solution, with its high non-recurring engineering charges
and long processing delays, is difficult to justify when compared to
MAX EPLDs.

The Cypress CY7C340 family of EPLDs offers you capabilities far
beyond those of earlier PLD generations. Through the use of the
powerful tools in the MAX+PLUS development system, you can complete
complex designs in less time, using fewer components, and achieve
lower system cost than ever before.

Appendix A. CPU Decoder AHDL Text File

% CYPRESS SEMICONDUCTOR INC. %

TITLE "CPU Address Decoder";

%***************************************************************************
CPU Interface Decoder for DMA Controller
***************************************************************************%
SUBDESIGN cpu_decd (

%***************************************************************************
Decoder Inputs
***************************************************************************%
    ma2,        % Address Bit 2 %
    ma1,        % Address Bit 1 %
    /cs,        % Chip Select %
    /iowr,      % I/O Write Signal %
    /iord       % I/O Read Signal %
        : INPUT;

%***************************************************************************
Write Strobe Outputs
***************************************************************************%
    /wr_ctrl,   % Control Register Write Strobe %
    /wr_wcnt,   % Word Count Write Strobe %
    /wr_ma_0,   % Lower Memory Address Write Strobe %
    /wr_ma_1,   % Upper Memory Address Write Strobe %
    rd_enabl    % Output Multiplexer Read Enable %
        : OUTPUT;
)

BEGIN
    TABLE
        ma2, ma1, /iord, /iowr, /cs => /wr_ctrl, /wr_wcnt, /wr_ma_0, /wr_ma_1, rd_enabl;
        % ----------------------------------------------------------------------------- %
        x,   x,   0,     1,     0   => 1,        1,        1,        1,        1;
        0,   0,   1,     0,     0   => 0,        1,        1,        1,        0;
        0,   1,   1,     0,     0   => 1,        0,        1,        1,        0;
        1,   0,   1,     0,     0   => 1,        1,        0,        1,        0;
        1,   1,   1,     0,     0   => 1,        1,        1,        0,        0;
    END TABLE;
END;
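The decoder's truth table can be mirrored in a few lines of Python to check a row by hand. This is only a sketch, not part of the Cypress design files: the function name `cpu_decode` and the `n_` prefix for active-Low signals are invented for illustration, and returning `None` for input combinations that the table does not list is an assumption of the sketch, not a statement about how AHDL treats unlisted rows.

```python
# Rows of the cpu_decd truth table:
#   (ma2, ma1, /iord, /iowr, /cs) -> (/wr_ctrl, /wr_wcnt, /wr_ma_0, /wr_ma_1, rd_enabl)
# None marks a don't-care ("x") input.
ROWS = [
    ((None, None, 0, 1, 0), (1, 1, 1, 1, 1)),  # CPU read: enable the output mux
    ((0, 0, 1, 0, 0), (0, 1, 1, 1, 0)),        # write control register
    ((0, 1, 1, 0, 0), (1, 0, 1, 1, 0)),        # write word count
    ((1, 0, 1, 0, 0), (1, 1, 0, 1, 0)),        # write lower memory address
    ((1, 1, 1, 0, 0), (1, 1, 1, 0, 0)),        # write upper memory address
]


def cpu_decode(ma2, ma1, n_iord, n_iowr, n_cs):
    """Return the output tuple of the first matching table row, else None."""
    inputs = (ma2, ma1, n_iord, n_iowr, n_cs)
    for pattern, outputs in ROWS:
        if all(p is None or p == v for p, v in zip(pattern, inputs)):
            return outputs
    return None  # combination not listed in the table (sketch-only convention)


if __name__ == "__main__":
    print(cpu_decode(0, 1, 1, 0, 0))  # word-count write strobe goes Low
```

Each write strobe is active Low, so exactly one of the first four outputs is 0 during a CPU write, while `rd_enabl` is the only active-High output.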

Appendix B. Cycle Controller AHDL Design File
TITLE "Cycle Controller";

% CYPRESS SEMICONDUCTOR INC. %

%**************************************************************************
MEMORY and I/O Cycle Controller
**************************************************************************%
SUBDESIGN cyc_ctrl (

%*************************************************************************
cyc_ctrl Input Definitions
*************************************************************************%
    reset,      % Reset Input Active High %
    dreq,       % DMA Request Input %
    hlda,       % CPU Hold Acknowledge %
    zero,       % Word Counter Borrow Output %
    enabl,      % DMA Enable Input %
    inten,      % Interrupt Enable %
    dir,        % Direction Bit: dir = 0 = MEM TO I/O, dir = 1 = I/O TO MEM %
    burst,      % Burst Enable Bit: burst = 0 = single xfers,
                  burst = 1 = multiple xfers %
    clock       % System Clock %
        : INPUT;

%*************************************************************************
cyc_ctrl Output Definitions
*************************************************************************%
    count,      % Count Enable Bit %
    memw,       % Memory Write %
    memr,       % Memory Read %
    iowr,       % I/O Write %
    iord,       % I/O Read %
    dack,       % DMA Acknowledge %
    dmaen,      % DMA Address Enable %
    hreq,       % CPU Hold Request %
    setint,     % Set Interrupt Output %
    clrenb      % Clear DMA Enable bit %
        : OUTPUT;
)

VARIABLE

    cyc_ctrl: MACHINE OF BITS (q[10..0])
        WITH STATES ( stidl  = B"00000000000",
                      sthld  = B"00000001000",
                      stdir  = B"00000011000",
                      mem0   = B"00100111000",
                      mem1   = B"00110111000",
                      mem2   = B"00000111001",
                      io0    = B"00001111000",
                      io1    = B"01001111000",
                      io2    = B"10000111001",
                      stend  = B"10000011000",
                      stint  = B"00000001100",
                      endhld = B"10000000000",
                      clenb  = B"00000000010");
BEGIN
    cyc_ctrl.clk = clock;       % system clock %
    cyc_ctrl.reset = reset;     % system reset %
    memw = cyc_ctrl.q[9];
    memr = cyc_ctrl.q[8];
    iowr = cyc_ctrl.q[7];
    iord = cyc_ctrl.q[6];
    dack = cyc_ctrl.q[5];
    dmaen = cyc_ctrl.q[4];
    hreq = cyc_ctrl.q[3];
    setint = cyc_ctrl.q[2];
    clrenb = cyc_ctrl.q[1];
    count = cyc_ctrl.q[0];
    % Q10 is a state variable to make all state definitions unique %

    CASE (cyc_ctrl) IS
        WHEN stidl =>
            % Wait for Enable and Request %
            IF enabl & !dreq THEN cyc_ctrl = sthld;
            END IF;

        WHEN sthld =>
            % Wait for Hold Acknowledge %
            IF hlda THEN cyc_ctrl = stdir;
            END IF;

        WHEN stdir =>
            % Determine which direction %
            IF dir THEN cyc_ctrl = io0;     % I/O to Memory %
            ELSE cyc_ctrl = mem0;           % Memory to I/O %
            END IF;

        WHEN mem0 =>
            % Memory Read and I/O Write %
            cyc_ctrl = mem1;

        WHEN mem1 =>
            cyc_ctrl = mem2;


        WHEN mem2 =>
            cyc_ctrl = stend;

        WHEN io0 =>
            % I/O Read and Memory Write %
            cyc_ctrl = io1;

        WHEN io1 =>
            cyc_ctrl = io2;

        WHEN io2 =>
            cyc_ctrl = stend;

        WHEN stend =>
            % Determine what to do next %
            IF !dreq & !zero & burst THEN cyc_ctrl = stdir;
            ELSIF dreq & !zero THEN cyc_ctrl = endhld;
            ELSIF zero & inten THEN cyc_ctrl = stint;
            ELSIF zero & !inten THEN cyc_ctrl = clenb;
            END IF;

        WHEN stint =>
            % Set Interrupt, if Enabled %
            cyc_ctrl = clenb;

        WHEN clenb =>
            % Clear Enable and Counters %
            cyc_ctrl = endhld;

        WHEN endhld =>
            % Wait for end of HOLD/ACK Sequence %
            IF !hlda THEN cyc_ctrl = stidl;
            END IF;
    END CASE;
END;
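To trace the cycle controller without a simulator, the state machine can be modeled in Python. The sketch below is illustrative only: the helper names (`next_state`, `outputs`, `run`) are invented here, `dir_` stands in for the `dir` input because `dir` is a Python built-in, and the model assumes the machine holds its current state when no listed transition fires, which the listing does not state explicitly.

```python
# State encodings q[10..0] copied from the cyc_ctrl listing (leftmost char = q10).
STATE_BITS = {
    "stidl": "00000000000", "sthld": "00000001000", "stdir": "00000011000",
    "mem0": "00100111000", "mem1": "00110111000", "mem2": "00000111001",
    "io0": "00001111000", "io1": "01001111000", "io2": "10000111001",
    "stend": "10000011000", "stint": "00000001100",
    "endhld": "10000000000", "clenb": "00000000010",
}
# Output name -> state bit index, as assigned in the listing's BEGIN section.
OUTPUT_BIT = {"memw": 9, "memr": 8, "iowr": 7, "iord": 6, "dack": 5,
              "dmaen": 4, "hreq": 3, "setint": 2, "clrenb": 1, "count": 0}
# States that always advance to a fixed successor.
CHAIN = {"mem0": "mem1", "mem1": "mem2", "mem2": "stend",
         "io0": "io1", "io1": "io2", "io2": "stend",
         "stint": "clenb", "clenb": "endhld"}


def outputs(state):
    """Decode the controller outputs from the state encoding."""
    bits = STATE_BITS[state]
    return {name: int(bits[10 - q]) for name, q in OUTPUT_BIT.items()}


def next_state(state, enabl=0, dreq=1, hlda=0, zero=0, inten=0, dir_=0, burst=0):
    """One clock edge of the CASE statement (dreq is active Low)."""
    if state == "stidl":
        return "sthld" if enabl and not dreq else state
    if state == "sthld":
        return "stdir" if hlda else state
    if state == "stdir":
        return "io0" if dir_ else "mem0"
    if state in CHAIN:
        return CHAIN[state]
    if state == "stend":
        if not dreq and not zero and burst:
            return "stdir"
        if dreq and not zero:
            return "endhld"
        if zero and inten:
            return "stint"
        if zero and not inten:
            return "clenb"
        return state  # assumed hold when no branch matches
    if state == "endhld":
        return "stidl" if not hlda else state
    raise ValueError(f"unknown state {state!r}")


def run(stimulus, state="stidl"):
    """Apply one input dict per clock and return the visited states."""
    trace = [state]
    for inputs in stimulus:
        state = next_state(state, **inputs)
        trace.append(state)
    return trace


if __name__ == "__main__":
    # Single memory-to-I/O transfer: request, hold acknowledge, transfer, release.
    single = [dict(enabl=1, dreq=0), dict(dreq=0, hlda=1), dict(hlda=1, dir_=0),
              dict(hlda=1), dict(hlda=1), dict(hlda=1),
              dict(hlda=1, dreq=1), dict(hlda=0)]
    print(run(single))
    print(outputs("mem1"))
```

Decoding the outputs straight from the state bits mirrors the listing's approach, in which each output is simply one bit of the state register.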

Appendix C. DMAC Report File
C:\MAX_WORK\DMAC_APP\DMAC.RPT
MAX+PLUS Compiler Report File
Version 2.03C 01/12/90

***** Design compiled without errors

Title: DMA CONTROLLER
Company: Cypress Semiconductor
Designer: Joe Engineer
Rev: A
Date: 12:25a 4-14-1990
Turbo: ON
Security: OFF

[Pin-out diagram of the CY7C342 showing the assigned signals]


** RESOURCE USAGE **

                                                                 External
Logic Array Block     Macrocells     I/O Pins      Expanders     Interconnect

A: MC1   - MC16        8/16( 50%)    8/8(100%)     0/32(  0%)     5/24( 20%)
B: MC17  - MC32        0/16(  0%)    0/5(  0%)     0/32(  0%)     0/24(  0%)
C: MC33  - MC48        5/16( 31%)    5/5(100%)     0/32(  0%)     9/24( 37%)
D: MC49  - MC64        8/16( 50%)    8/8(100%)     0/32(  0%)    16/24( 66%)
E: MC65  - MC80       14/16( 87%)    8/8(100%)     4/32( 12%)    24/24(100%)
F: MC81  - MC96        7/16( 43%)    5/5(100%)     0/32(  0%)     9/24( 37%)
G: MC97  - MC112       5/16( 31%)    5/5(100%)     0/32(  0%)    22/24( 91%)
H: MC113 - MC128       8/16( 50%)    8/8(100%)     0/32(  0%)    24/24(100%)

Total dedicated input pins used:       5/8   ( 62%)
Total I/O pins used:                  47/52  ( 90%)
Total macrocells used:                55/128 ( 42%)
Total expanders used:                  4/256 (  1%)

Total input pins required:             5
Total output pins required:           27
Total bidirectional pins required:    20
Total macrocells required:            55
Total expanders in database:           4

Synthesized macrocells:                0/128 (  0%)
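The per-LAB figures in the resource-usage section can be cross-checked against the reported totals with a short script. This is a sketch for checking the arithmetic only; the dictionary layout and the `pct` helper are invented here, and the observation that the percentages are truncated rather than rounded is inferred from the numbers in this report, not from MAX+PLUS documentation.

```python
# Per-LAB usage from the report: (macrocells, I/O pins, expanders, interconnect).
LAB_USAGE = {
    "A": (8, 8, 0, 5),
    "B": (0, 0, 0, 0),
    "C": (5, 5, 0, 9),
    "D": (8, 8, 0, 16),
    "E": (14, 8, 4, 24),
    "F": (7, 5, 0, 9),
    "G": (5, 5, 0, 22),
    "H": (8, 8, 0, 24),
}


def pct(used, available):
    """Integer percentage, truncated (this matches the figures in the report)."""
    return 100 * used // available


macrocells = sum(row[0] for row in LAB_USAGE.values())
io_pins = sum(row[1] for row in LAB_USAGE.values())
expanders = sum(row[2] for row in LAB_USAGE.values())

if __name__ == "__main__":
    print(f"macrocells {macrocells}/128 ({pct(macrocells, 128)}%)")
    print(f"I/O pins   {io_pins}/52 ({pct(io_pins, 52)}%)")
    print(f"expanders  {expanders}/256 ({pct(expanders, 256)}%)")
```

The sums reproduce the report's totals, and the 24/24 interconnect figures for LABs E and H show how close this fit is to the PIA connection limit discussed in the text.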

** FILE HIERARCHY **

|OCT_BUF:167|
|CYC_CTRL:106|
|CTRL_REG:73|
|CPU_DECD:75|
|ADDR_GEN:123|
|ADDR_GEN:123|74161:66|
|ADDR_GEN:123|74161:71|
|ADDR_GEN:123|74161:70|
|ADDR_GEN:123|74161:69|
|ADDR_GEN:123|74161:68|
|ADDR_GEN:123|74161:67|
|WORD_CNT:125|
|WORD_CNT:125|74161:66|
|WORD_CNT:125|74161:69|
|WORD_CNT:125|74161:68|
|WORD_CNT:125|74161:67|
|74157:130|
|74157:132|
|74157:133|
|74157:134|
|OCT_BUF:56|
|OCT_BUF:55|
|OCT_BUF:35|
|OCT_BUF:168|


** INPUTS **

                                 Expanders       Fan-In
Pin   MCell   LAB   Primitive   Total  Shared   INP  FBK   Name
68                  INPUT         0      0       0    0    /CS
66                  INPUT         0      0       0    0    DREQ
24    (49)    (D)   INPUT         0      0       0    0    D00
25    (50)    (D)   INPUT         0      0       0    0    D01
26    (51)    (D)   INPUT         0      0       0    0    D02
27    (52)    (D)   INPUT         0      0       0    0    D03
18    (33)    (C)   INPUT         0      0       0    0    D04
19    (34)    (C)   INPUT         0      0       0    0    D05
28    (53)    (D)   INPUT         0      0       0    0    D06
38    (65)    (E)   INPUT         0      0       0    0    D07
29    (54)    (D)   INPUT         0      0       0    0    D08
21    (35)    (C)   INPUT         0      0       0    0    D09
22    (36)    (C)   INPUT         0      0       0    0    D10
46    (81)    (F)   INPUT         0      0       0    0    D11
47    (82)    (F)   INPUT         0      0       0    0    D12
23    (37)    (C)   INPUT         0      0       0    0    D13
48    (83)    (F)   INPUT         0      0       0    0    D14
49    (84)    (F)   INPUT         0      0       0    0    D15
36                  INPUT         0      0       0    0    HLDA
 7     (4)    (A)   INPUT         0      0       0    0    /IORD
 8     (5)    (A)   INPUT         0      0       0    0    /IOWR
 9     (6)    (A)   INPUT         0      0       0    0    MA01
39    (66)    (E)   INPUT         0      0       0    0    MA02
 1                  INPUT         0      0       0    0    MCLK
35                  INPUT         0      0       0    0    /RESET


** OUTPUTS **

                                 Expanders       Fan-In
Pin   MCell   LAB   Primitive   Total  Shared   INP  FBK   Name
 4      1      A    DFF+          0      0       2    7    DACK
 5      2      A    DFF+          0      0       4    7    DMAEN
24     49      D    OR2           0      0       1    3    D00
25     50      D    OR2           0      0       1    3    D01
26     51      D    OR2           0      0       1    3    D02
27     52      D    OR2           0      0       1    3    D03
18     33      C    OR2           0      0       1    3    D04
19     34      C    OR2           0      0       1    3    D05
28     53      D    OR2           0      0       1    3    D06
38     65      E    OR2           0      0       1    2    D07
29     54      D    OR2           0      0       1    2    D08
21     35      C    OR2           0      0       1    2    D09
22     36      C    OR2           0      0       1    2    D10
46     81      F    OR2           0      0       1    3    D11
47     82      F    OR2           0      0       1    3    D12
23     37      C    OR2           0      0       1    2    D13
48     83      F    OR2           0      0       1    2    D14
49     84      F    OR2           0      0       1    2    D15
 6      3      A    DFF+          0      0       3    7    HREQ
51     85      F    OUTPUT        0      0       0    0    INTP
 7      4      A    DFF+          0      0       2    8    /IORD
 8      5      A    DFF+          0      0       2    7    /IOWR
 9      6      A    DFF+          0      0       3    3    MA01
39     66      E    DFF+          0      0       3    4    MA02
30     55      D    DFF+          0      0       3    5    MA03
31     56      D    DFF+          0      0       3    6    MA04
58    113      H    DFF+          0      0       3    7    MA05
59    114      H    DFF+          0      0       3    8    MA06
60    115      H    DFF+          0      0       3    9    MA07
40     67      E    DFF+          0      0       3   10    MA08
41     68      E    DFF+          0      0       3   11    MA09
42     69      E    DFF+          0      0       3   12    MA10
43     70      E    DFF+          0      0       3   13    MA11
44     71      E    DFF+          0      0       3   14    MA12
45     72      E    DFF+          0      0       3   15    MA13
52     97      G    DFF+          0      0       3   16    MA14
53     98      G    DFF+          0      0       3   17    MA15
55     99      G    DFF+          0      0       3   18    MA16
56    100      G    DFF+          0      0       3   19    MA17
57    101      G    DFF+          0      0       3   20    MA18
61    116      H    DFF+          0      0       3   21    MA19
62    117      H    DFF+          0      0       3   22    MA20
63    118      H    DFF+          0      0       3   23    MA21
64    119      H    DFF+          0      0       3   24    MA22
65    120      H    DFF+          0      0       3   25    MA23
10      7      A    DFF+          0      0       2    9    /MEMR
11      8      A    DFF+          0      0       2    7    /MEMW


** BURIED LOGIC **

                                 Expanders       Fan-In
Pin   MCell   LAB   Primitive   Total  Shared   INP  FBK   Name
       96      F    DFF           0      0       2    1    |CTRL_REG:73|:9
       95      F    DFF           0      0       2    1    |CTRL_REG:73|:33
       80      E    DFF+          4      0       2    7    |CYC_CTRL:106|q0
       79      E    DFF+          0      0       4    8    |CYC_CTRL:106|q10
       78      E    MCELL         0      0       3    0    :152
       77      E    MCELL         0      0       5    0    :153
       76      E    MCELL         0      0       5    0    :155
       75      E    MCELL         0      0       5    0    :156

** STATE MACHINE ASSIGNMENTS **

|CYC_CTRL:106|cyc_ctrl: MACHINE
    OF BITS ( MC079, MC008, MC007, MC005, MC004, MC001, MC002, MC003, MC080 )
    WITH STATES (
        stidl  = B"000000000",
        sthld  = B"000000010",
        stdir  = B"000000110",
        mem0   = B"001001110",
        mem1   = B"001101110",
        mem2   = B"000001111",
        io0    = B"000011110",
        io1    = B"010011110",
        io2    = B"100001111",
        stend  = B"100000110",
        stint  = B"000000010",
        endhld = B"100000000",
        clenb  = B"000000000"
    );


Interfacing PROMs and RAMs to a High-Speed
DSP Chip using MAX
This application note describes how to interface Cypress CY7C128A
Static RAMs and CY7C291A PROMs to the AT&T DSP16A using the CY7C343
64-macrocell MAX EPLD. This design illustrates MAX's ability to
integrate SSI and MSI logic for system cost and space savings.

The DSP16A includes a parallel port, with associated strobe signals,
which is available for interfacing to external memory. The parallel
port needs an external address generator when interfacing to RAM or
PROM, and an EPLD of MAX's density suits this purpose.

Design Description

The conventional method of attaching external memory to the DSP16A
is through its external memory interface, which comprises the
following signals: a 16-bit ROM address bus; a 16-bit ROM data bus;
and a clock-out signal, CKO, which cycles at 25 ns. Bear in mind two
constraints when using this external memory interface. First, it
allows only memory reads; second, it requires extremely fast PROM
speeds. With the DSP16A-25, for example, you must use PROMs with
7-ns address access times (clock cycle - address delay - data set up
= 25 - 5 - 13 = 7 ns). Memory devices at this performance level are
very expensive.

The DSP16A parallel port provides a non-zero-wait-state alternative
to the device's external memory interface, which accommodates both
read and write memory accesses. A non-zero-wait-state external
memory subsystem is appropriate because the DSP16A has 2K words of
RAM on chip, and the external memory can download data or
coefficients to the on-chip memory prior to time-critical
computation (or, equivalently, upload data or results from the
on-chip memory following time-critical computation). The design
outlined in this application note requires four cycles (100 ns) to
load a starting address, three cycles (75 ns) to perform a write
operation, and five cycles (125 ns) to perform a read operation.

The DSP16A's 16-bit bidirectional parallel port includes three
associated signals:

PSEL, peripheral select: indicates which one of two logical ports,
pdx0 or pdx1, is used during the current parallel I/O transaction
PIDS, parallel input data strobe: asserted during a read transaction
PODS, parallel output data strobe: asserted during a write
transaction

You can program the pulse width of both PIDS and PODS to be from one
to four times the processor's cycle time (abbreviated as T). The
pulse width is controlled by two bits in the DSP16A's parallel I/O
control (PIOC) register. Two other bits in the PIOC define PIDS and
PODS as either active (output signal) or passive (input signal).
This design assumes that PIDS and PODS are in the active mode and
that all 16 bits of the parallel port bus are configured to be
bidirectional (PIOC's status/control bit equals 0).

Because the DSP16A parallel port lacks an address bus, it is
necessary to create an external one. In this design example, the
CY7C343 MAX implements an address generator and an address decoder.
When fully utilized, this generator/decoder addresses up to 16K
words of mixed ROM and RAM.

Design Details

Figure 1 shows the block diagram for this design. In addition to the
PROMs, SRAMs, and MAX chip, the design requires a discrete 74F08 AND
gate (more on this later). Note that the MAX chip generates four
BANK! and four BANK signals. The BANK! lines control the CY7C128A
SRAMs' active-Low chip enable, and the BANK lines connect to the
CY7C291A PROMs' active-High chip select. You can modify the number
of SRAMs versus the number of PROMs just by changing the MAX design,
because the bank signals' timing is the same for both the
active-High (BANK) and the active-Low (BANK!) version.

Figure 2 shows a schematic of the CY7C343 logic. The four 74163s
make up a 16-bit preloadable, auto-incrementing up counter. The
74138 decodes the eight memory-chip enables and chip selects, which
are conditioned with the PSEL signal. SCLK is the symbol that forces
the MAX+PLUS compiler to use synchronous clocking.

The DSP16A's physical parallel I/O port connects to the two logical
ports, pdx0 and pdx1, which distinguish between address and data
transfers. When writing code for the DSP16A, you issue an external
memory address from pdx1. This causes PSEL to go High and enables
the load function on the 74163s; PODS's rising edge clocks the
address from the parallel bus (PB00-PB13) into the 74163s. The code
for the DSP16A then reads data in through pdx0 or writes data out of
pdx0.

In the case of a read, PSEL goes Low, which disables the 74163s'
load function and enables the bank signal to the memories. Because
PIDS is Low, the memories are output enabled, data returns to the
DSP16A on the parallel bus, and PIDS's rising edge increments the
74163s.

In the case of a write, PSEL again goes Low. Because PODS is Low,
the memories are write enabled, data is written from the DSP16A
parallel bus, and PODS's rising edge increments the 74163s.
Appendix A lists a code fragment that performs the read and write
operations described here.

This design's conceptual operation is relatively straightforward.
The challenge is to design the addressing logic in MAX, determine
the proper pulse widths for the PIDS and PODS strobes, and find
memories with the appropriate speed.

Timing Diagrams

The timing diagrams for this design appear in Figures 3 and 4, with
the corresponding timing parameters in Tables 1, 2, and 3.
Specifically, Figure 3 shows an address load and back-to-back data
writes. Figure 4 shows an address load and a single data read. The
CY7C343-30 timing parameters shown in these illustrations were
calculated from the internal switching characteristics described in
the device's data sheet; the MAX+PLUS simulator verified the signal
timing. The address-generator circuit was captured with MAX+PLUS's
graphic editor, then compiled and simulated.
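The address generator's load-then-auto-increment behavior can be pictured with a small behavioral model. The Python sketch below is not derived from the MAX+PLUS design files: the class and method names are invented, one call to `strobe` stands for one rising edge of the PIDS/PODS-derived clock, and the split of the address into a 3-bit bank number (bits 13 to 11) and an 11-bit chip address (bits 10 to 0) is an assumption based on the eight bank enables and the 2K-word memory devices.

```python
class AddressGenerator:
    """Behavioral sketch of the 74163-based counter and 74138 bank decoder."""

    def __init__(self):
        self.count = 0  # 16-bit preloadable up counter

    def strobe(self, psel, pb=0):
        """Model one rising strobe edge.

        psel = 1 (pdx1 access): load the counter from the parallel bus.
        psel = 0 (pdx0 access): the data transfer has finished, so increment.
        """
        if psel:
            self.count = pb & 0xFFFF
        else:
            self.count = (self.count + 1) & 0xFFFF

    def decode(self):
        """Return (bank, chip_address); the bit split is assumed, see above."""
        bank = (self.count >> 11) & 0x7
        chip_address = self.count & 0x7FF
        return bank, chip_address


if __name__ == "__main__":
    gen = AddressGenerator()
    gen.strobe(psel=1, pb=0x07FF)   # write to pdx1: load the last word of bank 0
    print(gen.decode())
    gen.strobe(psel=0)              # pdx0 access completes: counter rolls into bank 1
    print(gen.decode())
```

Because the increment happens on the strobe's rising edge, the address changes only after each data transfer has completed, which is what lets consecutive pdx0 accesses walk through memory without reloading the address.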
Figure 1. Block Diagram

Figure 2. CY7C343 Schematic Diagram
Note that you could include inside the MAX chip the AND gate that
uses PIDS and PODS to create the clock signal. This forces the
MAX+PLUS compiler to use asynchronous clocking, however, pushing the
I/O input hold time out beyond the 10 ns provided by the DSP16A (see
P13 in the timing diagrams and parameter listings). Using an
external AND gate allows the macrocell clocking to be synchronous,
which eliminates the hold-time problem. The maximum propagation
delay through the 74F08 AND gate is taken to be 6 ns (P14).

To create the design, you must determine the length of the PODS
strobe during the address-load cycle. The critical requirement is
the 34-ns 74163 set-up time (P15). Because the PB bus is valid 25 ns
after PODS goes active (P12 in Figure 3), PODS must be programmed
for a pulse width of 3T to meet the set-up time requirement. After
the load cycle, PODS must go inactive for at least one cycle, so
that an address load takes 4T, or 100 ns.
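The strobe-width choice for the address load can be reproduced with a short calculation. The Python sketch below is a simplified reading of the argument, under stated assumptions: the function name is invented, the data-valid window is taken as the strobe width minus the 25-ns bus-valid delay, and the 74F08 clock-path delay is ignored.

```python
T_NS = 25  # processor cycle time in ns, from the 25-ns CKO period


def min_strobe_multiple(setup_ns, bus_valid_delay_ns, t_ns=T_NS, max_multiple=4):
    """Smallest strobe width (in multiples of T) that leaves enough set-up time.

    The bus becomes valid bus_valid_delay_ns after the strobe goes active and
    must then be stable for setup_ns before the strobe's rising edge.
    Returns None if even the widest programmable strobe is too short.
    """
    for multiple in range(1, max_multiple + 1):
        if multiple * t_ns - bus_valid_delay_ns >= setup_ns:
            return multiple
    return None


pods_multiple = min_strobe_multiple(setup_ns=34, bus_valid_delay_ns=25)
# One extra inactive cycle follows the strobe before the next transaction.
address_load_ns = (pods_multiple + 1) * T_NS

if __name__ == "__main__":
    print(f"PODS width: {pods_multiple}T, address load: {address_load_ns} ns")
```

With a 2T strobe only 25 ns of valid data precede the rising edge, which is short of the 34-ns requirement, so 3T is the narrowest setting that works.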
Similarly, it is necessary to determine the length of the PODS
strobe during a data write. Using the CY7C128A-20 SRAM, the critical
requirement is the 15-ns interval from chip-enable Low to the end of
the write. With PODS programmed to 2T, the
chip-enable-Low-to-write-end interval is guaranteed to be 18 ns
(P21). A write to SRAM thus takes 3T, or 75 ns. As shown in
Figure 3, this configuration also provides a write-enable pulse
width of 50 ns (P22), data-set-up-to-write-end time of 25 ns (P23),
data-hold-from-write-end time of 10 ns (P24),
address-set-up-to-write-end time of 53 ns (P25), write cycle time of
53 ns (P26), address-set-up-to-write-start time of 35 ns (P27), and
address-hold-from-write-end time of 0 ns (P28).

The next requirement to be determined is the length of the PIDS
strobe during a read operation. For the CY7C291A-20 EPROM, the
critical parameter is the chip-select-active-to-data-valid time of
15 ns (see P29 in Figure 4). To meet this requirement, PIDS must be
programmed to 4T, thus providing a total EPROM read cycle time of
5T, or 125 ns. This also provides an address-to-output-valid time of
69.5 ns (P30).

Now it is necessary to verify that this read cycle timing also meets
the CY7C128A-20 SRAM's requirements. Again, the critical parameter
is the chip-select-active-to-data-valid time, but for the SRAM this
parameter is 20 ns (P31). As shown earlier, programming PIDS to 4T
suffices for this operation as well, so that an SRAM read cycle is
also 5T, or 125 ns. As for the EPROM, this configuration provides
the SRAM an address-to-data-valid time of 69.5 ns (P32) and an
output-enable-to-data-valid time of 69.6 ns (P33).

This MAX design is I/O intensive, rather than macrocell intensive.
Although the design uses 96 percent of the I/O pins, it uses only
34 percent of the macrocells. Still, the CY7C343 in this example
integrates five 16-pin and two 14-pin TTL MSI packages into a 44-pin
PLCC package. In situations where space is at a premium, MAX is a
very powerful integration tool.

A discrete implementation of the circuit outlined here requires
approximately 1.6 square inches of board space, while the CY7C343
requires approximately 0.5 square inches. And with the CY7C343, you
also have the advantage of design flexibility, especially when the
CY7C343 is socketed. You could, for instance, change the MAX design
to include up- and down-count addressing or accommodate higher
density PROMs and SRAMs. MAX EPLDs are available in windowed
packages for erasure under UV light; to make a change, you simply
redesign, recompile, and reprogram the chips.

Because one MAX part replaces seven TTL parts, the MAX
implementation offers inherently higher reliability. Inventory
overhead is reduced, and the CY7C343 consumes 155 mA worst case
versus 311 mA worst case for the TTL parts it replaces.

Table 1. DSP16A Parallel I/O Read-Cycle Specs

CKO High to PIDS Low = 15 ns max [P1]
CKO High to PIDS High = 15 ns max [P2]
PIDS Low to PSEL valid = 10 ns max [P3]
PIDS High to PSEL invalid (PSEL hold) = 25 ns min [P4]
PB valid before PIDS High (data set up) = 15 ns min [P5]
PIDS High to PB invalid (data hold) = 0 ns min [P6]

Table 2. DSP16A Parallel I/O Write-Cycle Specs

CKO High to PSEL valid = 8 ns max [P7]
PSEL valid before PODS Low = 2.5 ns min [P8]
PODS High to PSEL invalid (PSEL hold) = 125 ns min [P9]
CKO Low to PODS Low = 8 ns max [P10]
CKO Low to PODS High = 8 ns max [P11]
PODS Low to PB valid = 25 ns max [P12]
PODS High to PB invalid (data hold) = 10 ns min [P13]

Figure 3. Address Load, Data Write, Data Write

Table 3. Critical CY7C343 Parameters

74163 set-up time (I/O input) =
Tio + Tpia + Tlad + Trsu - Tin - Tics =
5 + 16 + 14 + 8 - 7 - 2 = 34 ns [P15]
74163 set-up time (dedicated input) =
Tin + Tlad + Trsu - Tin - Tics =
7 + 14 + 8 - 7 - 2 = 20 ns [P16]
74163 hold time (I/O input) =
Tin + Tics - Tio - Tpia - Tlad + Trh =
7 + 2 - 5 - 16 - 14 + 8 =
-18 ns, assume 0 ns [P17]
74163 hold time (dedicated input) =
Tin + Tics - Tin - Tlad + Trh =
7 + 2 - 7 - 14 + 8 =
-4 ns, assume 0 ns [P18]
74163 clock-to-output time =
Tin + Tics + Trd + Tod =
7 + 2 + 2 + 5 = 16 ns [P19]
74138 propagation time =
Tfd + Tpia + Tlad + Tcomb + Tod - Tod =
1 + 16 + 14 + 4 + 5 - 5 = 35 ns [P20]
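The Table 3 arithmetic can be re-evaluated in Python, which is handy if the same formulas are to be tried with another speed grade's delays. This is a sketch only: the dictionary keys and the function name are invented for illustration, and the delay values are simply the numbers that appear in the table's own sums.

```python
# Internal delays in ns, read off the arithmetic in Table 3.
DELAYS = {
    "t_in": 7, "t_io": 5, "t_pia": 16, "t_lad": 14, "t_ics": 2,
    "t_rsu": 8, "t_rh": 8, "t_rd": 2, "t_od": 5, "t_fd": 1, "t_comb": 4,
}


def critical_parameters(d):
    """Evaluate the Table 3 formulas; negative hold times are taken as 0 ns."""
    return {
        "P15_setup_io": d["t_io"] + d["t_pia"] + d["t_lad"] + d["t_rsu"]
                        - d["t_in"] - d["t_ics"],
        "P16_setup_dedicated": d["t_in"] + d["t_lad"] + d["t_rsu"]
                               - d["t_in"] - d["t_ics"],
        "P17_hold_io": max(0, d["t_in"] + d["t_ics"] - d["t_io"]
                           - d["t_pia"] - d["t_lad"] + d["t_rh"]),
        "P18_hold_dedicated": max(0, d["t_in"] + d["t_ics"] - d["t_in"]
                                  - d["t_lad"] + d["t_rh"]),
        "P19_clock_to_output": d["t_in"] + d["t_ics"] + d["t_rd"] + d["t_od"],
        "P20_decoder_propagation": d["t_fd"] + d["t_pia"] + d["t_lad"]
                                   + d["t_comb"] + d["t_od"] - d["t_od"],
    }


if __name__ == "__main__":
    for name, value in critical_parameters(DELAYS).items():
        print(f"{name}: {value} ns")
```

The I/O-input set-up time is 14 ns longer than the dedicated-input figure because an I/O input passes through the I/O buffer and the PIA before reaching the logic array.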

Acknowledgments
The AT&T application note "Interfacing External
RAM to the WE DSP16 Family of Digital Signal
Processors" outlines a parallel-port implementation
using discrete logic. Thanks to Daniel Yasi and Jim
Flynn of AT&T.

Figure 4. Address Load, Data Read


Appendix A. Code Fragment
This DSP16A program reads two words from external memory, multiplies them together, and writes the result back to
external memory. Note that there is a latency of one read cycle during active read because of a double-buffering
process. When a read statement is encountered, the data on which the program operates is taken from the on-chip
input register. As the data is being taken from the input register, a read transaction is initiated on the physical port so
that, at the end of the read cycle, the correct value is in the input register.
This program example is pathological because it does not make use of any of the pipelining or parallelism of which the
DSP16A is capable. Ordinarily, a large block of data would be downloaded, processed, and uploaded, not just a few
words. However, this example does illustrate the steps necessary to address, read, and write external memory through
the parallel port.

/* Issue Address */

pioc=0x5800     /* not status/control mode (parallel bus 16-bit  */
                /* bidirectional), PIDS and PODS are outputs     */
                /* (active mode), PIDS and PODS strobe width     */
                /* equals three times the processor cycle time   */
                /* or 3T                                         */

pdx1=0x00       /* address external memory location 0x00         */

/* Read Data */

pioc=0x7800     /* not s/c mode, PIDS and PODS active, PIDS      */
                /* and PODS strobe width equals 4T               */

a0=pdx0         /* first read not valid, discard                 */

a0=pdx0         /* second read valid, read first location of     */
                /* external memory into accumulator 0            */

a1=pdx0         /* read second location of external memory into  */
                /* accumulator 1                                 */

/* Process Data */

x=a0            /* put first word into x register                */

y=a1            /* put second word into y register               */

p=x*y           /* multiply words, result in p register          */

a0=p            /* put product back into a0                      */

/* Write Data */

pioc=0x3800     /* not s/c mode, PIDS and PODS active, PIDS      */
                /* and PODS strobe width equals 2T               */

pdx0=a0         /* write product to third location in external   */
                /* memory                                        */
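The one-read latency noted in the introduction to this appendix is easy to misjudge, so a tiny behavioral model may help. The Python sketch below is illustrative only and is not DSP16A code: the class and method names are invented, the input register is assumed to hold no valid data (`None`) before the first transaction, and each read is assumed to advance the external address by one, as the address generator in this design does.

```python
class ParallelPortModel:
    """Sketch of the double-buffered pdx0 read path."""

    def __init__(self, memory, start_address=0):
        self.memory = memory          # external memory contents
        self.address = start_address  # value held by the external address counter
        self.input_register = None    # on-chip input register, initially stale

    def read_pdx0(self):
        """Return the old register contents, then refill from external memory."""
        value = self.input_register
        self.input_register = self.memory[self.address]  # read transaction completes
        self.address += 1                                 # strobe edge increments address
        return value


if __name__ == "__main__":
    port = ParallelPortModel(memory=[3, 4, 0])
    discarded = port.read_pdx0()   # first read: register not yet valid
    a0 = port.read_pdx0()          # second read: first memory location
    a1 = port.read_pdx0()          # third read: second memory location
    print(discarded, a0, a1, a0 * a1)
```

This is why the code fragment issues `a0=pdx0` twice: the first statement only primes the input register, and each later read returns the word fetched by the read before it.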


FIFO RAM Controller With
Programmable Flags

This application note describes a scalable FIFO (first
in, first out) RAM controller that provides all the control
circuitry necessary to make a deep FIFO. The design uses
off-the-shelf dual-port static RAMs (Cypress CY7C130s,
for example). The controller also features an array of
programmable flags that you can tailor to the specific
needs of your project.
FIFOs are often used to buffer data transfers. The increasing volumes of data that must be manipulated and
transferred between systems have prompted the need for
large FIFOs.

8-Word FIFO RAM Controller Operation

For the simple 8-word design, the signal names and
their definitions are:
/MR - Master Reset
/SI - Shift in is the external signal used to write data into
the FIFO
/SO - Shift out is the external signal that requests a read
from the FIFO
RDADDR(3:0) - The dual-port read address, connected
to A(3:0)R on the CY7C130
WRADDR(3:0) - The dual-port write address, connected
to A(3:0)L on the CY7C130
FULL - The flag that indicates when the FIFO is full
ALMOST_FULL - The flag that indicates when the
FIFO is 75 percent full
ALMOST_EMPTY - The flag that indicates when the
FIFO is 25 percent full
EMPTY - The flag that indicates that the FIFO is empty
/SIINT - Internal shift-in signal for the dual-port RAM;
connected to R/WL and /CEL of the CY7C130
/SOINT - Internal shift-out signal for the dual-port
RAM; connected to /OER and /CER of the CY7C130
DATAIN(7:0) - Input data lines connected to I/OL(0:7)
on the CY7C130
DATAOUT(7:0) - Output data lines connected to
I/OR(0:7) on the CY7C130
BUSY_IN, BUSY_OUT - The busy flags on the dual-port
RAM should be used to indicate when data can safely
be shifted into or out of the device
Asserting IMR initializes the FIFO. This signal resets
the write counter and read counter, so that they both point
to location 0000. A master reset also clears the address
latches and causes the EMPTY and ALMOST_EMPTY
flags to be asserted.
The inverse of the FUlL flag enables ISIINT, so that
when the FIFO is full, no more data can be shifted in.
This gated SI signal is connected to the RlWL and ICEL
pins of the CY7C130. When lSI is asserted, and the FIFO
has room, the dual-port RAM's left port is enabled and

FIFO RAM Controller Architecture
The FIFO RAM controller is implemented here in
two stages. The first stage illustrates the architecture of
the controller by implementing a shallow, 8-word-deep
FIFO using a dual-port RAM. The second stage expands
this scalable architecture to implement an 8-Kword-deep
FIFO.
Typically, FIFOs are based on a dual-port RAM
structure. This structure includes a memory cell that can
be written to and read from at the same time. These
devices are relatively inexpensive and provide the kind of
asynchronous operation essential to a FIFO.
The design includes four primary sections: counter
logicladdress generation, flag generation, overflow control, and memory (Figure 1).
Data is written into the left port of the dual-port
SRAM with an address supplied from the write counter.
Data is read from the right port of the dual-port SRAM
using an address supplied from the read counter.
The core of the design is the dual-port memory. A
Cypress CY7C130 lK x 8 dual-port SRAM is used here.
(Refer to "Understanding Dual-Port RAMs" in the Logic
section of this book for more information on dual-port
RAMs.) To make a wide FIFO, you add as many
CY7C130s as necessary and address them in parallel. As
mentioned earlier, you can implement deep FIFOs (even
deeper than 8 Kwords) by scaling this design properly.

6-351

put into read mode. Data on 1I0L(7:0) is read into the
FIFO. 10EL is permanently disabled.
Applying the first /SI pulse causes data from the
I/OL(7:0) lines to be latched into memory. When the read
is completed, the EMPTY flag is deasserted, indicating
that there is data in the FIFO that can be shifted out. The
ALMOST_EMPTY flag stays asserted if the FIFO contains two or fewer valid words (25 percent full). ALMOST_EMPTY deasserts when there are three or more
valid data words in the FIFO.
If six consecutive shift-in cycles are completed
without a shift-out cycle, the ALMOST_FULL flag is asserted. This means that the FIFO is 75 percent full. After
eight consecutive shift-in cycles without a shift-out cycle,
the FULL flag is asserted, signalling that the FIFO is full,
and no more data can be shifted in.
The inverse of the EMPTY flag enables /SOINT, so
that when the FIFO is empty, invalid data is not read. This
gated SO signal is connected to the /OER and /CER pins
of the CY7C130. When /SO is asserted, and the FIFO
contains valid data, the data is driven to the I/OR(7:0)
pins. R/WR is tied high, forcing the right port to read
mode.
If the FIFO is full and a shift-out cycle is completed,
the FULL flag is deasserted, because the FIFO is no
longer full. Once there are fewer than six words in the FIFO
(less than 75 percent full), the ALMOST_FULL flag is
deasserted. When there are only two words left in the
FIFO, the ALMOST_EMPTY flag is asserted. If all the
valid words in the FIFO have been read, the EMPTY flag
is asserted. Waveforms for this circuit appear in Figure 2.
Due to the asynchronous nature of dual-port RAMs,
the FIFO can be written to or read from at any time (unless, of course, the FIFO is full or empty). It is a good
idea to monitor the flags when /SI and /SO are both deasserted, however, to give the internal logic time to settle.
The flags can be safely monitored at any time if you can
guarantee that only one operation is performed at a time.
Note that flags are updated when /SI and /SO are High.

Counter/Address Generation Logic
The two counters in this design serve as a read
counter and a write counter (Figure 3). The write counter
provides an address to the memory port that is being written to. This counter is incremented every time an /SI (shift
in) pulse is received, until the FIFO is full. When the
FIFO is full, the FULL flag is asserted and the counter is
inhibited, preventing data overflow. When the FIFO is no
longer full due to a read, the FULL flag deasserts and the
write counter is enabled again.
The read counter provides the address to the memory
port that is being read from. This counter is incremented
every time an /SO pulse is received, until the FIFO is
empty. When the FIFO is empty, the EMPTY flag is asserted, and the read counter is inhibited until some valid
data has been written to the FIFO.
Both the read and write counters are forced to 0000
when /MR is asserted.
/SI and /SO are Low-asserted signals, as are their internal counterparts /SIINT and /SOINT. The counters are
incremented on the High-to-Low transition of /SIINT or
/SOINT. Note that SIINT and SOINT clock the counters.
The counters must be clocked on the signal's rising edge,
but the external signals are Low asserted; an inverter is
thus required. With this scheme, the current address is always stable before the next /SI or /SO pulse.
The address lines are latched to keep the address
stable during the Low-asserted /SI or /SO pulse. When /SI
or /SO deasserts, the latches enable the new address to the
dual-port RAM. In simple form, the counter is incremented on the falling edge of /SI or /SO, and the new
address is latched out on the rising edge of /SI or /SO.
Note that the rising edge of the /SI or /SO signal
propagates through the CY7C342 PLD used to implement
this design; the signal continues out to the dual-port RAM
before the new address appears, thereby allowing the correct address at the rising edge of /SI or /SO before changing to the next address.

Figure 1. Block Diagram of a FIFO RAM Controller

Figure 2. Sample Waveforms for the 8 Word FIFO Example
(traces: SO, SI, WRADDR1-4, RDADDR1-4, SUB1-4, FULL, ALMOST_FULL, ALMOST_EMPTY, EMPTY, SOOUT, SIOUT)

Figure 3. Counter/Address Generation Logic and

Figure 4. The FULL Flag Circuit

Flag Control Logic
The flag control logic consists of several comparators
and a subtraction circuit, which keeps track of the number
of valid data words currently in the FIFO. This is done by
subtracting the value in the read counter from the value in
the write counter. The output of the subtraction circuit is
compared with values you supply. The flags result from
these comparisons.
Because this is a PLD solution, you can customize
the flags to suit your needs. This example includes only
four flags, but you can create as many flags as needed to
check for any number of words.
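
The subtract-and-compare scheme described above can be sketched in a few lines of Python. This is only an illustrative model, not the PLD logic: the function name `flags`, its parameter names, and the use of free-running counters whose difference is the plain word count are assumptions made for the example. The default thresholds follow the 8-word example in this note (ALMOST_EMPTY at two or fewer words, ALMOST_FULL at six or more).

```python
def flags(write_count, read_count, depth=8, almost_empty_at=2, almost_full_at=6):
    """Derive FIFO flags from the two counters (illustrative model only)."""
    # Subtraction circuit: number of valid words currently in the FIFO.
    words = write_count - read_count
    # Comparators: each flag is the result of comparing the word count
    # with a value that the designer supplies.
    return {
        "EMPTY": words == 0,
        "ALMOST_EMPTY": words <= almost_empty_at,
        "ALMOST_FULL": words >= almost_full_at,
        "FULL": words >= depth,
    }


after_reset = flags(0, 0)
after_six_writes = flags(6, 0)
```

Adding a further flag in this model amounts to adding one more comparison, which mirrors the point that a PLD solution lets you check for any number of words.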

The FULL Flag
The FULL flag logic appears in Figure 4. The comparator takes its A port inputs from the subtractor. Its B
port inputs come from tying the first 3 bits high and the
fourth to ground (0111 binary or 7 decimal). The value of
0111 is used instead of 1000 (8 decimal) because the first
address is 0000, not 0001, and thus the subtract circuit's
output is actually 1 less than the number of valid words in
the FIFO. The 0111 value also allows the flag to be
processed concurrently with other activities. In other
words, the FULL flag is asserted on the eighth valid word,
rather than waiting until the ninth, when it is too late.
Again, you can customize the flag to be asserted whenever you need the information.
The flag comes from an RS latch. The latch is set
only when the comparator indicates that the subtract logic
value is equal to or greater than 0111 (7). The
comparator's greater-than output is not strictly necessary
here, but it is included as a safeguard. The latch's set
input is gated by a combination of /SIINT and /SOINT
that is asserted only when neither /SI nor /SO is asserted.
This arrangement ensures that the flags do not change
until the counters both settle. Remember that the counters
change on the falling edge of /SIINT and /SOINT, and
thus settle by the time /SI or /SO is deasserted.
The reset input of the FULL flag latch is fed by the
master reset (/MR), the comparator's less-than output and
the gating signal discussed above. In other words, the
FULL flag is reset when the number of valid words is less
than 8 and both /SI and /SO are deasserted, or when an
/MR pulse is received.

The ALMOST_FULL Flag
For the ALMOST_FULL flag (Figure 5), the comparator takes its A port inputs from the subtract circuit;
the comparator's B port inputs are tied to 0101 (5). The
operation of this flag is almost identical to that of the
FULL flag, except that the ALMOST_FULL flag is asserted when there are 6 or more valid words in the FIFO.
This is where the comparator's greater-than output comes
into play. Note that the ALMOST_FULL flag is always
asserted while the FULL flag is asserted.

The ALMOST_EMPTY Flag
The ALMOST_EMPTY flag comparator (Figure 6)
takes its A port inputs from the subtractor and its B port
inputs from the value 0010 (2). The /MR signal feeds the
set input of the RS latch, because when the part is reset,
the FIFO is empty. Because this means there are fewer than
2 valid data words in the FIFO, the ALMOST_EMPTY
flag is asserted. The flag is also set by the comparator's
less-than or equal-to outputs, gated by the deasserted
/SIINT and /SOINT, as before. Consequently, whenever
the subtractor gives a value less than or equal to 0010 (2),
the flag is set.
The reset of the ALMOST_EMPTY flag is fed by the
comparator's greater-than output, gated with /SIINT and
/SOINT. Thus, when the subtractor indicates a value
greater than 0010 (2), the flag is reset.

The EMPTY Flag
The EMPTY flag comparator (Figure 7) takes its A
port input from the subtractor. The comparator's B port
input is tied to 0000. The RS latch is set with the /MR
signal or when the subtractor value is 0000, which indicates that the FIFO is empty. Any other value at the subtract circuit clears the EMPTY flag. Again, /SIINT and
/SOINT gate all signals except /MR.

Figure 5. The ALMOST_FULL Flag Circuit

Figure 6. The ALMOST_EMPTY Flag Circuit

Overflow Control
The controller contains a simple mechanism to
prevent data overflow and underflow, the condition
where an address that does not contain current, valid data
is read. Specifically, when the FULL flag goes active, its
inverse is ANDed with the SI signal to block any further
shift-in pulses from entering the system. This keeps the
write address counter from incrementing and the dual-port
RAM from accepting any more data. When a shift-out
pulse is received, the FULL flag resets, and data can be
shifted into the FIFO again.
When the FIFO is empty, the inverse of the EMPTY
flag is ANDed with the SO signal to block any further
shift-out operations. This keeps the read address counter
from incrementing and disallows any further reads from
the dual-port RAM. When the next shift-in pulse is
received, the EMPTY flag resets, and data can be shifted
out again.
Referring back to Figure 2, notice that 10 total /SI
pulses are sent, but only eight /SIINT pulses are
generated.
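
The gating described in this section can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the circuit itself: the class `FifoController` and its method names are hypothetical, and the model counts accepted pulses rather than reproducing the latched addresses. It reproduces the behavior noted for Figure 2, where ten external shift-in pulses produce only eight internal ones.

```python
class FifoController:
    """Behavioral model of the gated shift-in/shift-out logic (illustrative only)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.master_reset()

    def master_reset(self):
        # /MR forces both counters to zero, so the FIFO starts out empty.
        self.write_count = 0   # accepted shift-ins
        self.read_count = 0    # accepted shift-outs

    @property
    def words(self):
        return self.write_count - self.read_count

    @property
    def full(self):
        return self.words == self.depth

    @property
    def empty(self):
        return self.words == 0

    def shift_in(self):
        """External SI pulse; returns True if an internal SIINT pulse results."""
        if self.full:          # inverse of FULL gates the pulse
            return False
        self.write_count += 1
        return True

    def shift_out(self):
        """External SO pulse; returns True if an internal SOINT pulse results."""
        if self.empty:         # inverse of EMPTY gates the pulse
            return False
        self.read_count += 1
        return True


fifo = FifoController()
accepted = sum(fifo.shift_in() for _ in range(10))   # ten SI pulses, as in Figure 2
```

After the ten pulses, `accepted` holds the number of shift-ins that actually reached the memory, and a single `shift_out()` call clears the full condition so that writes are accepted again.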


Figure 7. The EMPTY Flag Circuit
Scaling the Controller to 8 Kwords
So far this application note has analyzed the basic
functions of a FIFO RAM controller with a simplified example. Next, consider a real-world example of an 8-Kword-deep FIFO.
The counters are expanded to provide a 13-bit address that drives the dual-port memories. The comparators
and subtract circuit are also expanded to 13 bits. To
facilitate the large design, the comparators are eliminated,
and the desired toggle point is decoded using a 13-input
AND structure. Put the decoded output into S/R of the
NORLTCH (a latch made of NORs). The memory core
now consists of four CY7C132 2K x 8 dual-port RAMs.
The CY7C344 PLD is used to implement the design's
first stage. The final implementation scales into the
CY7C342 in much the same way as the design itself. The
CY7C344 is a high-density (1250 gate equivalent) 28-pin,
32-macrocell PLD. The design fits nicely into this device,
as shown in the design report file in Appendix D. All the
macrocells are used, while still leaving five inputs, two
outputs, and half the available p-terms unused.

The MAX+PLUS Tools
The FIFO RAM controllers described here have been
implemented using the MAX+PLUS design package. This
well-integrated, user-friendly package provides schematic
capture, a high-level design language, simulation, and
programming facilities. MAX+PLUS also supports hierarchical designs.
The schematic capture utility features an easy-to-use,
mouse-driven, pull-down menu format. Most of the macrofunctions used in the design come from the
MAX+PLUS macrofunction library. This library includes
everything from basic logic components, such as a two-input AND gate (AND2), to an 8-bit counter (8COUNT)
and 7400 Series logic. These robust libraries facilitate
quick, efficient schematic generation.
The MAX+PLUS package provides a high-level
design language, AHDL (Advanced Hardware Design
Language). AHDL allows you to create a textual description of your design using Boolean equations, truth tables,
and state machine syntax.
If you need a new macrofunction, you can easily create exactly what you need using schematic capture,
AHDL, or a combination of both. When the new macrofunction is completed, a symbol for it can be created automatically.

Creating a Macrofunction
Because the MAX+PLUS library contains no subtractor macrofunction, the subtract circuit is created using
AHDL:
subdesign subtract
(
wrcnt[12..0], rdcnt[12..0] : input;
sub[12..0] : output;
)
begin
(,,sub[12..0]) = (0,1,wrcnt[12..0]) - (0,0,rdcnt[12..0]);
end;
The first line of code specifies that this subdesign, or
macrofunction, is called "subtract." This is an arbitrary
choice. Next, the inputs and outputs are defined as being
13 bits wide (0 through 12). Finally, the required function
is defined between the begin and end statements. The term
"0,1," before wrcnt[12..0] assures that wrcnt is always
bigger than 0,0,rdcnt. It is like adding another significant
digit. Thus, when a subtract occurs, this macrofunction always returns a positive number. This number represents
the magnitude difference between the write counter and
the read counter. The subtract function returns a 2's complement number; for simplicity of design, the result
should therefore always be positive. If the result were
negative, the comparators would have to be very complex.
For convenience, the MAX+PLUS Text Editor is used to
create this file, which is called SUBTRACT.TDF.
After the function is implemented, the design is compiled using the COMPILE function in the pull-down
menu. The logic is automatically generated and a symbol
created that can now be placed in the design. The compiler is run at this stage to turn the subtractor into a
functioning macrocell, but do not run the compiler on the
entire design until you have finished total design capture.
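
The effect of prefixing the write count with "0,1" and the read count with "0,0" in the AHDL subtract macrofunction can be checked with a short Python model. This is a sketch only: the function name `subtract` and the constants `WIDTH` and `MASK` are assumptions for illustration, and the model simply keeps the low 13 bits of the widened difference, which is how the `sub[12..0]` output is read here.

```python
WIDTH = 13
MASK = (1 << WIDTH) - 1                      # 0x1FFF, the low 13 bits


def subtract(wrcnt, rdcnt):
    """Model of (0,1,wrcnt[12..0]) - (0,0,rdcnt[12..0]), keeping sub[12..0]."""
    minuend = (1 << WIDTH) | (wrcnt & MASK)  # "0,1" prefix: adds a leading 1 bit
    subtrahend = rdcnt & MASK                # "0,0" prefix: value is unchanged
    difference = minuend - subtrahend        # never negative: minuend >= 2**13 > subtrahend
    return difference & MASK                 # drop the two prefix bits


no_wrap = subtract(0x07FE, 0x0000)           # write counter ahead of read counter
wrapped = subtract(0x0002, 0x1FFE)           # write counter has rolled over past zero
```

In the second call the write counter has wrapped around while the read counter has not, yet the result is still the small positive distance between them, which is what lets the flag logic stay simple.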

Design Verification and Simulation
After the entire design has been entered, the
MAX+PLUS compiler checks the design to detect any
design rule violations. When an error is detected, the
MAX+PLUS software is so well integrated that it jumps
to the schematic or text editor and highlights the error.
This feature is extremely helpful during the initial debug
phase of the design. The compiler also creates all the files
necessary for design simulation and device programming.

Figure 8. Simple FIFO Timing
(signals shown: SI, WRITE ADDRESS, DATA IN)

Parameter   Description                        Value (ns)
Taw         Address Set-Up to Write End        20
Tha         Address Hold from Write End        2
Tsiw        Shift In Pulse Width               20
Tsd         Data Set-Up to Write End           15
Thd         Data Hold from Write End           35
Tadr        Rising Edge SI to Next Address     65

Simulation
MAX+PLUS provides a simulation package that allows you to test your design. You can define waveforms
using tabular entry format or a waveform editor. Vectors
entered in the tabular format can be converted and displayed as waveforms. The MAX+PLUS simulator performs both timing and logic simulation. The MAX+PLUS
simulation facility generated the timing diagram shown in
Figure 2.
The code used to generate the waveforms is created
in the MAX+PLUS text editor. The top part of the file
defines the inputs and outputs. START and STOP commands permit you to start and stop the simulation at a
given time. INTERVAL defines the time between the
lines of code; an interval of 200 means the inputs change
every 200 ns. The rest of the file consists of columnized
entries for /MR, /SO, and /SI. Appendix A lists the vector
file for the 8-word FIFO RAM controller, and Appendix C
contains the vector file for the 8-Kword controller.
Simulating a design as large as the 8-Kword FIFO
RAM controller takes a prohibitively long period of time
(16,000 cycles) without some kind of complex simulation
capability. The MAX+PLUS simulator provides such features through the implementation of command and vector
files. The vector file used to exercise the flags for the 8K
FIFO appears in Appendix C.
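
The relationship between START, INTERVAL, and the pattern rows of a vector file can be sketched as follows. This is not a MAX+PLUS parser: the helper `schedule` and its arguments are hypothetical, and it assumes that row n of the pattern is applied at START + n x INTERVAL, with the columns in the order MR, SO, SI used in Appendix A.

```python
def schedule(pattern_rows, start=0, interval=200, names=("MR", "SO", "SI")):
    """Pair each pattern row with the simulation time (ns) at which it is applied."""
    events = []
    for index, row in enumerate(pattern_rows):
        bits = [int(ch) for ch in row if ch in "01"]    # ignore any spaces in the row
        events.append((start + index * interval, dict(zip(names, bits))))
    return events


# First rows of a pattern: master reset, idle, one shift-in pulse, idle.
events = schedule(["011", "111", "110", "111"])
```

With an interval of 200, the third row in this example takes effect at 400 ns and drives SI Low while MR and SO stay High.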

Note that a vector file can run without a command
file. A typical vector file contains the START, STOP, and
INTERVAL statements explained earlier. Additionally, a
PATTERN statement sets up the inputs, and the subsequent pattern of 1s and 0s provides the desired input
stimulation. This simple compilation of rows of 1s and 0s
quickly exercises a design. You can enter the vector file
in the text editor; the filename must end with a .VEC extension.
When you are ready to simulate, you use the pull-down menu to activate "FILE," then "VECTOR input,"
followed by "VECTOR file (.VEC)." This sequence of
commands fetches the vector file as input stimulus. Now
activate the SIMULATE command. You are prompted for
the length of the simulation, and simulation begins. When
the simulation is completed, activate the WAVEFORM
command to display the results.
To further aid in complex simulation, you can create
a command file, which includes the instructions needed to
execute a simulation sequence. Appendix B lists the command file used to exercise the flag operation for the 8K
FIFO. The command file works in conjunction with a vector file (Appendix C).
The first line of the command file in Appendix B
defines that the GROUP command accepts hex input.
Next, the VECTOR command uses the given vector file as
a source of stimulation. GROUP shortens the description
of a group of inputs/nodes by allowing you to describe
them in hex format instead of defining every input individually in binary format. You can type 07FE, for example, instead of 0011111111110.
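
The saving that GROUP provides can be seen by expanding a hex value into the individual node values it stands for. The helper `group_bits` below is a hypothetical illustration, not part of MAX+PLUS; it assumes the group is listed most significant node first, as in the GROUP CREATE lines of Appendix B.

```python
def group_bits(hex_value, names):
    """Expand a hex group value into one bit per named node (MSB first)."""
    value = int(hex_value, 16)
    bits = format(value, "0{}b".format(len(names)))   # zero-padded binary string
    return dict(zip(names, (int(b) for b in bits)))


# The 13 write-counter nodes, most significant first: WCNT13 ... WCNT1.
WRCNT = ["WCNT{}".format(n) for n in range(13, 0, -1)]
preload = group_bits("07FE", WRCNT)
```

One four-character hex value here replaces thirteen separate binary assignments.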
The "SIMULATE 2000ns" command gives the
amount of time the simulator runs for this section of the
command file. In this example, the simulation starts by
running for 2000 ns. As shown in the command and
vector files, this resets (/MR) the device and allows a
couple of shift-in operations (/SI) to occur. Next, a FORCE
STICK command forces the RDCNT and WRCNT to a
desired value. "SIMULATE +100ns" allows this value to
be implemented. The + in front of the number means that
this value is for an additional amount of time, or incremental. The absence of the + indicates an absolute
time.
Now a FORCE UNSTICK command allows the
nodes to simulate freely. The next "SIMULATE +1800ns"
allows for two SIs followed by two SOs. This sequence
sets the counter to a desired value just before the ALMOST_FULL flag activates; the sequence also provides
SIs to activate the counter and SOs to deactivate it.
The rest of the code sets counters next to the value
that toggles flags and activates and deactivates the
counters. These two files save a great deal of simulation
effort and compute time.

Timing Analysis
The MAX+PLUS software offers valuable timing information. The timing data from the MAX+PLUS simulation and the information in the CY7C132 data sheet allow
complete timing analysis of the FIFO RAM controller.
Figures 8 and 9 show the controller's timing waveforms.
The pertinent timing parameters and their values are:
tAW = 20 ns: Dual-port address set-up to write end
tHA = 2 ns: Dual-port address hold from write end
tPWE = 20 ns: Dual-port /WE pulse width
tSD = 15 ns: Dual-port data set-up to write end
tHD = 0 ns: Dual-port data hold from write end
tADR = 30 ns: CY7C344 /SI to next clock
tACE = 30 ns: Dual-port /CE Low to data out
tRDADDR = 30 ns: CY7C344 /SO to valid address
tRC = 25 ns: Dual-port read cycle time
tAA = 25 ns: Dual-port valid address to data valid
Timing simulation data from the VECTOR and
COMMAND files indicates the worst-case timing from
/SI or /SO until flags are stable is approximately 95 ns.
This value, coupled with the normal timing parameters for
shifting data in and out of the dual-port RAM, gives a
200-ns FIFO RAM controller system cycle time.
Because this is a very complex design, you can
change the logic to add or delete features. Note that you
can use the minimum /SI or /SO pulse width, but keep the
overall period at approximately 200 ns.

Figure 9. Read Cycle
(signals shown: READ, DATA OUT)

Parameter   Description                        Value (ns)
Tsad        Shift Out to Data Valid            65
Trdaddr     Shift Out to Valid Address         65
Trc         Read Cycle Time                    95
Tadv        Valid Address to Data Valid        25

Programming Support
When the design is complete and fully exercised, a
device can be programmed using the MAX+PLUS
programmer module in conjunction with the QuickPro II
programmer hardware. The small design is fitted to a
CY7C344, and the expanded design is fitted to a
CY7C342.
With the proper software and adapters, the QuickPro
II is versatile enough to program all the MAX devices, as
well as every PROM and PLD Cypress manufactures. The
QuickPro II is connected to a PC via a parallel port, leaving the slots in your PC available for other peripherals.
The complete MAX+PLUS package contains the
MAX+PLUS software, the QuickPro II programmer and
software, and adapter sockets for the entire MAX family.
The designs presented here have been verified by
simulation.

Appendix A. Simulation File: 8 Word FIFO RAM Controller
SIMPLE EXAMPLE SIMULATION CODE

101

START 0;
STOP 9000;
INTERVAL 200;
OUTPUTS WRADDR1 WRADDR2 WRADDR3 WRADDR4
RDADDR1 RDADDR2 RDADDR3 RDADDR4 SUB1 SUB2 SUB3 SUB4 FULL ALMOST_FULL
ALMOST_EMPTY EMPTY FULL SOOUT SIOUT ;
INPUTS MR SO SI;
PATTERN
011
%RESET%
111
111
110
111
110
111
110

101
111
111
% SHIFTED OUT 2 EXTRA TIMES TO ENSURE
PROPER FLAG OPERATION %
110
1 11
110

111

110
111
110
11 1
% SHIFTED IN 8-BYTES%
1 10
111
1 10
111
% SHIFTED IN 2 EXTRA BYTES TO ENSURE
PROPER FLAG OPERATION%
111
101

1 11

110
111
110
111
1 10
111

1 10
111
1 10
111
% SHIFTED IN 8-BYTES %
110
111
110
111
111
% SHIFTED 2 EXTRA TIMES TO ENSURE PROPER
FLAG OPERATION %
101
111
101
111
101
111
101
111
101
111
101
111
101
111
101
111
% SHIFTED OUT 8-BYTES %

111
110

111
110
111
1 10
111

110
111

111

101
11 1
101

111
101
111
101
1 11
101
111

101
111

101
111
%SHIFTED IN 8-BYTES%
101
111

101
111
%SHIFTED 2 EXTRA TIMES TO ENSURE PROPER
FLAG OPERATION%

111;

Appendix B. Simulation Command File: 8K FIFO RAM Controller
/* This file executes preloads to simplify */
/* simulation of the FIFO RAM Controller   */
RADIX HEX
VECTOR BIGFIF.VEC
GROUP CREATE RDCNT = RCNT13 RCNT12 RCNT11 RCNT10 RCNT9 RCNT8 RCNT7 RCNT6 RCNT5 RCNT4
RCNT3 RCNT2 RCNT1
GROUP CREATE WRCNT = WCNT13 WCNT12 WCNT11 WCNT10 WCNT9 WCNT8
WCNT7 WCNT6 WCNT5 WCNT4 WCNT3 WCNT2 WCNT1
SIMULATE 2000NS            /* initialize, turn EMPTY flag off */

FORCE STICK WRCNT=07FE     /* force controller to ALMOST EMPTY boundary */
FORCE STICK RDCNT=0000
SIMULATE +100NS            /* preload */
FORCE UNSTICK WRCNT        /* relieve force */
FORCE UNSTICK RDCNT
SIMULATE +1800NS           /* do several writes to turn flag on */
                           /* do several reads to turn flag off */

FORCE STICK WRCNT=0FFE     /* force controller to ALMOST FULL boundary */
FORCE STICK RDCNT=0000
SIMULATE +100NS            /* preload */
FORCE UNSTICK WRCNT        /* relieve force */
FORCE UNSTICK RDCNT
SIMULATE +1800NS           /* do several writes to turn flag on */
                           /* do several reads to turn flag off */

FORCE STICK WRCNT=1FFE     /* force controller to FULL boundary */
FORCE STICK RDCNT=0000
SIMULATE +100NS            /* preload */
FORCE UNSTICK WRCNT        /* relieve force */
FORCE UNSTICK RDCNT
SIMULATE +1800NS           /* do a write to turn flag on, attempt more */
                           /* do several reads to turn flag off */


Appendix C. Simulation File: 8K Word FIFO RAM Controller

% Simulation file for the FIFO RAM Controller
START 0;
STOP 399;
INTERVAL 100;
% Each cycle is lOOns long
OUTPUTS

1 0
1 1
111
1 0 1
1 0 1
111
111
1 1 0
110
1 1 1
111
110
1 1 0
1 1 1
1 1 1
1 1 1
1 1 1
111

WCNT13 WCNT12 WCNT11 WCNT10
WCNT9 WCNT8 WCNT7 WCNT6
WCNT5 WCNT4 WCNT3 WCNT2
WCNT1 RCNT13 RCNT12 RCNT11
RCNT10 RCNT9 RCNT8 RCNT7
RCNT6 RCNT5 RCNT4 RCNT3
RCNT2 RCNT1 WRADDR13
WRADDR12 WRADDR11
WRADDR10 WRADDR9 WRADDR8
WRADDR7 WRADDR6 WRADDR5
WRADDR4 WRADDR3 WRADDR2
WRADDR1 RDADDR13 RDADDR12
RDADDR11 RDADDR10 RDADDR9
RDADDR8 RDADDR7 RDADDR6
RDADDR5 RDADDR4 RDADDR3
RDADDR2 RDADDR1
FULL ALMST EMP ALMST FUL
EMPTY SIOUT SOOUT ;
MR SI SO;

101
101
111
1 1 1
101
1 0 1
1 1 1
1 1 1
1 1 0
110
111
1 1 1
1 1 0
110
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 0 1
1 0 1
1 1 1
1 1 1
1 0 1
101
1 1 1
111
1 1 0
110
111
1 1 1
110
1 1 0
111
1 1 1;

INPUTS
PATTERN
011
% master reset and initialization
111
111
1 11;
START 400;
STOP 7500;
INTERVAL 100;
INPUTS MR SI SO;
PATTERN 1 0 1
101
1 1 1
1 1 1
1 0 1
101
111
111
1 1 0
1 1 0
111
1 1 1
1 1 0
1 1 0
111
111
1 1 1
111
1 1 1
1 0 1

Appendix D. Report File: 8K Word FIFO RAM Controller
C:\MAX_WORK\FIFOPLD\BIGFIF.RPT

MAX+PLUS Compiler Report File
Version 2.50C 7/18/90
***** Design compiled without errors
Title: DESIGN NAME
Company: CYPRESS SEMICONDUCTOR
Designer: MIKE LEWIS
Rev: A
Date: 6:41p 11-01-1990
Turbo: ON
Security: OFF

TSI > tPHSI + tOHIR    Eq. 7
and
TSO > tPHSO + tOHOR    Eq. 8
The difference (X) between the TSI/TSO period
and the sums represented by Equations 5 and 6 is the
time within which the external logic must respond to
achieve the performance specified on the data sheet.

Input Data
As explained previously, a rising edge on SI causes
a falling edge on IR. Nothing further happens as long as
SI is held High. The internal write pointer is incremented on the SI signal's High-to-Low transition. If the
FIFO is not full, after the write pointer settles the IR
signal goes High, indicating that more room is available.
If the IR signal does not go High within tOHIR (delay,
SI Low to IR High), the IR Low signifies that the FIFO
is full.
Output Data
Output data appears at the data output pins, then
the OR output signal goes from Low to High, signifying
that the data is valid. Internally, the OR signal is logically ANDed with the SO input.
As a result, if external logic generates a positive SO
pulse when OR is Low, the FIFO ignores the pulse.
Therefore, the read pointer is not incremented, and the
same data is read, assuming that external logic samples
the output data on SO's rising edge; this makes the
FIFO appear to pick up words. In fact, the device that
generates the SO pulse is reading the words more than
once.
As explained previously, a rising edge on SO causes
a falling edge on OR. The internal read pointer is incremented on the SO signal's High-to-Low transition. The
read pointer now settles, and an interval equivalent to
an SRAM's address access time passes. Then, if the
FIFO contains at least one word of data, the OR signal
goes High, signifying that more data is available. If the
OR signal does not go High within tOHOR (delay, SO
Low to OR High), the OR Low indicates that the FIFO
is empty.

Data Timing
Examination of Figure 8 shows the minimum SI
period to be:
TSI = tSSI + tHSI    Eq. 9
Comparing these parameters' values in the data
sheets for the small FIFOs reveals that the maximum
input data frequency is considerably greater than the
frequency represented by Equation 3. This is because
the control signals go into the IC as well as come out of
it, which requires more time than simply presenting the
input data to be sampled. In other words, the maximum
input frequency is limited by the propagation delay of
the control signal path, not the data path.
Examination of Figure 9 shows the minimum SO
period to be:
TSO = tPHSO + tOHOR    Eq. 10
Comparison of these parameters' values in the data
sheets for the small FIFOs reveals that this maximum
output data frequency is considerably greater than the
frequency represented by Equation 4, for the same
reason given for the input data frequency.

Output Data Set-Up Time
The difference between tPLSO and tOHOR is the
set-up time for the output data, tSOR. This is true because, when cascading FIFOs, the output data must be
available a set-up time before the OR signal goes from
Low to High. The data sheets for the small FIFOs
specify tSOR as 0 ns (min.), but you can also calculate it
yourself. Simply subtracting the data sheet values tPLSO
- tOHOR does not give a reasonable answer, because
tPLSO is specified as a minimum and tOHOR as a maximum. Instead, calculate the maximum value of tPLSO
using the relationship
tPLSO (max.) = 1/Fo - tPHSO    Eq. 11
where Fo is the output frequency.
Then calculate the set-up time as
tSOR = tPLSO (max.) - tOHOR (max.)    Eq. 12
where tOHOR (max.) is the maximum value from the
data sheet.
Table 1 summarizes the data and calculations using
Equations 11 and 12 for the CY7C408 and CY7C409.
Output data should be sampled using a positive-edge-triggered flip-flop or register such as the 74AS374
or equivalent. Clock the register with the SO signal.

Table 1. Output Data Setup to OR

                    CY7C408/409-xx
Parameter           -15     -25     -35
Fo (MHz)            15      25      35
1/Fo (ns)           67      40      29
tPLSO (min)         25      24      17
tPHSO (min)         23      11      9
tPLSO (max)         44      29      20
tOHOR (max)         40      23      16
tSOR (calc)         4       6       4

Operation at the Boundary Conditions
When FIFOs are connected in parallel to make a
wider word, under certain conditions, they might individually ignore a read or write request. The system-level symptom of this problem is byte misalignment.
When a single FIFO is operating alone, the words are

Figure 8. Input Data Timing

Figure 9. Output Data Timing

With the small FIFOs, this is easier said than done.
This is because, with the exception of the CY7C40S/409,
they do not have full or empty flags. However, the other
FIFO s do have handshaking signals, and it has been
shown that the output data is available before OR's
Low-to-High transition. So long as the consumer
generates a Low-to-High transition on SO only when
there is a Low-ta-High transition on OR, proper operation at the empty boundary (as well as everywhere else)
is guaranteed.
Similarly, if the producer generates a Low-to-High
transition on SI only when there is a Low-to-High signal
transition on IR, proper operation at the full boundary
(as well as everywhere else) is guaranteed.

Figure 10. Forbidden Window

A Caveat for the CY7C401, 402, 403, and 404
simply missing. They were either not written or not
read.
The problem occurs at the empty condition, when a
write is immediately followed by a read, and at the full
condition, when a read is immediately followed by a
write.

In addition to the aperture of uncertainty, note that
the CY7C401 - 404 have a forbidden window of 40 ns
during which they recognize only one SO pulse. This
window (Figure 10) is measured from OR's rising edge
when the fIrst word is written into an empty FIFO to
the rising edge of the second SO pulse. The forbidden
window is a consideration only at high speeds (25
MHz), when a second output system clock could cause
a second SO pulse within 40 ns of the first OR
transition.
One way around this situation is to detect the
empty and full conditions and delay the appropriate
clock (SI or SO) the required amount of time. If the
FIFO is empty, OR does not go High within a
fallthrough time after SO goes Low; this condition can
be sensed and used to indicate EMPTY. Similarly, if the
FIFO is full, IR does not go High within a bubblethrough time after SI goes Low, and this condition
can indicate FULL.

Operation at the Empty Boundary
Consider first a FIFO that has been reset and is
empty. Read operations are inhibited by internal logic,
so that the read pointer is not incremented, but all
zeros are read at the data outputs. In the general case,
the read and write signals are asynchronous.
Upon completion of a write operation, the FIFO's
internal state goes from empty to empty + 1. During
this interval, a read operation might not be recognized.
If the read precedes the write, the read is ignored; if the
read follows the write, the read is executed. Between
these conditions, the FIFO must decide whether to
recognize the read. During this aperature of uncertainty, you cannot determine whether the read will be ignored or not With one FIFO, this behavior is acceptable. If two or more FIFOs are connected in parallel to
make a wider word, however, some might ignore the
read, and others might not.

Interfacing to the FIFO
This section deals with issues regarding interfacing
to the small FIFOs. The two areas of concern are (1)
voltage sensitivity on the SI and SO inputs, and (2)
metastability when the handshaking signals are used and
the SI and SO signals are derived from independent frequency sources. The following information applies to all
of the small FIFOs.

Operation at the Full Boundary
A similar condition occurs when a single FIFO becomes full. Write operations are inhibited by internal
logic. A read operation immediately followed by a write
operation causes the FIFO to go from full to full - 1 and
back to full. During the time the FIFO is going from full
to full - 1, the write operation might not be recognized.
The same aperature of uncertainty exists, because the
FIFO takes a finite amount of time to change internal
states. If a write command arrives at this instant, it
might be ignored.
The most obvious solution to the aperture-of-uncertainty problems is to not perform the operation at
the boundary condition. That is, (l) do not perform a
read immediately after writing the first word into an
empty FIFO, and (2) do not perform a write immediately after reading from a full FIFO.

High-Gain Inputs
The FIFO data sheets specify the minimum positive
SI and SO pulse widths as 9 ns for the 35-MHz SI/SO
versions of the CY7C408/409 and 11 to 20 ns for the
other speed grades of all the small FIFOs. At room
temperature and nominal (5V) Vcc, the FIFO operates
reliably with SI/SO pulses as short as 5 ns, as measured
at the input threshold level (approximately 1.5V). These
FIFOs respond to such short pulses because the
Cypress high-performance CMOS process yields
circuits that have very thin gate oxides. This characteristic
permits the transistors to have high gains and,
consequently, require very little energy to change state.

Figure 11. Recommended Termination Network
Termination networks are recommended on the SI
and SO lines (traces) on printed circuit boards (PCBs)
when the lines from source to load are long. A long line
is defined as a line whose "electrical length" is equal to
or greater than the rise time of its signal divided by the
two-way propagation delay of the line per unit length.
When the line is long, a voltage reflection might occur
that the FIFO can interpret as a clock.
The termination matches the load impedance to the
characteristic impedance of the PCB trace, which is
typically 50 ohms or less for microstrip or stripline
construction on G-10 glass epoxy material. For minimum
voltage reflections, a slightly overdamped termination is
preferred. Cypress recommends a capacitor of 10
to 47 pF in series with a resistor of 47 ohms, connected
from the input pin (SI/SO) to ground (Figure 11). This
termination network acts as a low-pass filter for short,
high-frequency pulses and dissipates no DC power.
If you connect more than one FIFO in parallel to
make a wider word, only one termination network is
required. Put it at the input that is electrically the farthest
from the source.
For the method of determining the values of R and
C for the termination network, please refer to the
low-pass filter analysis in the "Systems Design
Considerations When Using Cypress CMOS Circuits" application
note in this book. That application note also explains
how to determine when a line is long. The line length at
which a voltage reflection might occur is a function of
the signal rise time, the unloaded (intrinsic) line
propagation delay, the load, and the intrinsic line
characteristic impedance.
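As a rough numerical illustration of the two rules in this section, the Python sketch below computes the RC time constant of the recommended network and applies the long-line definition given above. It is a sketch under stated assumptions, not the method of the referenced application note: the function names are invented, the four-time-constant figure is the rule of thumb this note gives later under SI and SO signal considerations, and the rise time and per-inch delay used in the example call are hypothetical values.

```python
# Illustrative calculations for the SI/SO termination network.
# Names and example trace parameters are assumptions, not data sheet values.

def rc_time_constant_ns(r_ohms, c_pf):
    """RC time constant in ns (1 ohm * 1 pF = 1e-3 ns)."""
    return r_ohms * c_pf * 1e-3


def absorbed_pulse_limit_ns(r_ohms, c_pf):
    """Pulse width below which the network is said to absorb a pulse (4 RC)."""
    return 4 * rc_time_constant_ns(r_ohms, c_pf)


def critical_length(rise_time_ns, delay_ns_per_unit):
    """Length at which a line becomes 'long': rise time / two-way delay per unit."""
    return rise_time_ns / (2 * delay_ns_per_unit)


def is_long_line(length, rise_time_ns, delay_ns_per_unit):
    """True if the line is at least as long as the critical length."""
    return length >= critical_length(rise_time_ns, delay_ns_per_unit)


if __name__ == "__main__":
    for c_pf in (10, 47):
        print(c_pf, "pF:", absorbed_pulse_limit_ns(47, c_pf), "ns")
    # Hypothetical trace: 2-ns rise time, 0.15 ns per inch one-way delay.
    print(is_long_line(8, 2.0, 0.15))
```

With 47 ohms, the recommended 10- to 47-pF range corresponds to four time constants of roughly 1.9 to 8.8 ns.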

Synchronous and Asynchronous Operation
When the SI and SO signals are derived from a
common frequency source or clock, the FIFO is, by
definition, operating in the synchronous mode. This
approach establishes a precise, known relationship
between the SI and SO signals. Conversely, when the SI
and SO signals are derived from independent frequency
sources, the FIFO is operating in an asynchronous
mode.
In the synchronous mode, you can guarantee that
the OR signal does not occur within the set-up-and-
hold-time window that normally surrounds the output
system clock edge or sampling signal. The same
reasoning applies to the occurrence of the IR signal, with
respect to the input system clock.
In the asynchronous mode, you cannot assure a
known relationship between the OR signal and the
output system clock, with respect to either frequency or
phase. It is the responsibility of the designer to ensure
that, even though the output system clock edge might
occur at the same time as OR, the FIFO still receives
an SO clock wide enough for the FIFO to recognize
reliably. The same reasoning applies to the SI signal
generated in response to IR, under control of the input
system clock.

Pulse Synchronizer
The circuit shown in Figure 12 is recommended to
generate the SO pulse as a function of OR, under
control of the output system clock. Use an identical circuit
to generate the SI pulse as a function of IR, under
control of the input system clock. If you want to perform
control functions on OR or IR, do so before they are
clocked by the first D flip-flop.

Figure 12. Pulse Synchronizer

Figure 13 shows a diagram of the two-stage shift
register as a state machine. You can design more
complex state machines for the task, but the idea is the
same: reliably generate a single pulse of a known
minimum width for every OR or IR Low-to-High signal
transition.
Make the frequency of the clock to the pulse
synchronizer at least twice the maximum rate at which
you want to shift data into or out of the FIFO. For
example, if you want to shift data into the FIFO at a
10-MHz SI rate, make the clock to the input pulse
synchronizer 20 MHz. If you want to shift data out of
the FIFO at a 15-MHz SO rate, make the clock to the
output pulse synchronizer 30 MHz.
If a clock of this frequency is not available, you can
easily double the frequency of the existing clock by
delaying it and exclusive-ORing the delayed signal with
the original signal. A circuit to do this appears in Figure
14a, with the timing shown in Figure 14b. If d1 is the
propagation delay of the non-inverting buffer in this
circuit and d2 is the XOR gate's delay, the width of the
strobe is d1 + d2. This circuit does not generate a
positive output strobe unless d1 > d2.
You can, of course, replace the non-inverting
buffer with an even number of inverting buffers. Lumped
delay elements such as gates act as glitch filters. A gate
whose propagation delay is d absorbs or filters out short
pulses whose width is less than, but almost equal to, d.
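The behavior of the two-stage pulse synchronizer of Figures 12 and 13 can be sketched as a small clocked model. The Python below is an idealized illustration, not the circuit itself: it assumes that flip-flop A samples OR (or IR), flip-flop B samples A on the same clock edge, and the NOR gate combines A's /Q with B's Q so that SO = A AND NOT B. It ignores gate delays and metastability, and the function name is invented here.

```python
# Idealized model of the two-flip-flop pulse synchronizer.
# One list entry per rising clock edge; names are illustrative.

def synchronize(or_samples):
    """Return the SO level after each clock edge.

    or_samples: level of the FIFO's OR (or IR) line as seen at each
    rising edge of the synchronizer clock.
    """
    a = b = 0          # state 0 in Figure 13: idle
    so_levels = []
    for level in or_samples:
        a, b = level, a                    # both flip-flops clock together
        so_levels.append(int(a and not b)) # NOR(/QA, QB) == QA AND NOT QB
    return so_levels


if __name__ == "__main__":
    print(synchronize([0, 1, 1, 1, 0, 0, 1, 0]))
```

For the input shown, the model produces one SO pulse, one clock period wide, for each Low-to-High transition of the sampled line, passing through the states (A, B) = 10, 11, 01, and back to 00.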
When you use the pulse synchronizer shown in
Figure 12 under normal operating conditions, SO's
minimum pulse width is one cycle of the output clock
(CLK). However, when OR or IR changes within the
forbidden window around the clock edge, the flip-flop
might go into a metastable state (outputs between logic
One and Zero). The amount of time the flip-flop stays
in the metastable region is approximately 4X, where X
is the flip-flop's clock-to-output propagation delay time.
The minimum pulse width of the SO signal depends
on the delay, d, through the NOR gate, plus any delay
you might add (D, shown as a box) in the path from the
A flip-flop's /Q output to the NOR gate's input. The
NOR gate acts as a low-pass filter and does not pass
pulses narrower than d. Adding an external delay, D,
increases the minimum pulse width to d + D. Assuming
equal gate turn-on and turn-off times, the maximum
frequency at which the circuit can operate is
f(max.) = 1 / [2(d + D)]                         Eq. 13

Choose the total delay such that the FIFO can
reliably detect the minimum pulse width. If only the
NOR gate provides the delay, Table 2 lists typical and
maximum propagation delays under nominal Vcc and
loading (20 pF) conditions. A 74LS02 NOR gate results
in a minimum pulse width of 10 ns, which reliably
operates a 25-MHz CY7C403 or CY7C404 FIFO.
If you want to operate a 10-MHz CY7C401/402,
you can invert the A flip-flop's Q output through a
74LS04 and apply the result to the NOR gate's lower
input. The minimum pulse width is then 10 + 10 = 20
ns. You can also use a delay line or RC network to
delay the signal to the lower input of the NOR gate.
Use SO's rising edge to sample (clock) the FIFO
data into a D-type flip-flop.
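Equation 13 and the minimum-pulse-width rule can be checked numerically. The following Python sketch is illustrative only: the function names are invented, the delays are the typical values listed in Table 2, and the 10-ns figure for the added 74LS04 is the value used in the text above.

```python
# Sketch of Eq. 13: f(max.) = 1 / [2(d + D)], with delays in nanoseconds.
# Typical gate delays are taken from Table 2; names are illustrative.

TYPICAL_DELAY_NS = {"LS": 10, "ALS": 5, "HCMOS": 8, "FACT": 5}


def min_pulse_width_ns(d_ns, extra_delay_ns=0.0):
    """Minimum SO pulse width: NOR delay d plus any added delay D."""
    return d_ns + extra_delay_ns


def f_max_mhz(d_ns, extra_delay_ns=0.0):
    """Eq. 13, returning MHz for delays given in ns."""
    return 1000.0 / (2.0 * min_pulse_width_ns(d_ns, extra_delay_ns))


if __name__ == "__main__":
    # 74LS02 alone, then with a 74LS04 adding about 10 ns in the /Q path.
    print(min_pulse_width_ns(10), f_max_mhz(10))
    print(min_pulse_width_ns(10, 10), f_max_mhz(10, 10))
    for family, d in TYPICAL_DELAY_NS.items():
        print(family, f_max_mhz(d))
```

With the typical LS delay, the NOR gate alone gives a 10-ns pulse and a 50-MHz limit; adding the inverter gives 20 ns and 25 MHz.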

Operating FIFOs in Cascade Mode
When you connect two or more FIFOs together to
make a deeper FIFO, they are said to be cascaded.
There are two basic types of cascade mechanisms: serial
and parallel, used by the small and large FIFOs,
respectively. In the parallel method, data is steered between
FIFOs using an internal token. In the serial method,
data is passed serially from FIFO to FIFO using the
handshaking signals.
The throughput of serially cascaded FIFOs is
reduced in proportion to the reciprocal of the product
of the fallthrough time and the number (N) of cascaded
FIFOs. See Equation 2.
The throughput of FIFOs operating in the parallel
cascade mode is a constant, independent of the number
of FIFOs and equal to the throughput of a single FIFO
operating alone.

Serial Cascade Analysis
A consideration in cascading FIFOs serially is to
calculate the maximum SI and SO frequencies using the
data sheet AC parameters. It is also useful to analyze
the fallthrough (empty) and bubblethrough (full)
conditions.
Another aspect of analyzing serially cascaded
FIFOs is to understand burst mode. In this mode, you
prevent the FIFOs from "thinking" they are empty,
which avoids the devices' inherent cascaded frequency
limitation.
Figure 15a shows the required interconnections
between FIFOs for correct cascading. Data (DIA) is input
to the A FIFO and then transferred to the B FIFO. The
data flows from left to right, and it is standard practice
to call FIFO A the upstream FIFO and FIFO B the
downstream FIFO.
Transition Table

A    B    STATE    Description
0    0    0        idle at state 0
1    0    1        output SO = 1
1    1    3        output SO = 0
0    1    2        transition state

Figure 13. Pulse Synchronizer State Diagram

Table 2. Gate Propagation Delay Times

Family    Typical (ns)    Maximum (ns)
LS        10              15
ALS       5               11
HCMOS     8               23
FACT      5               9.5

Figure 14a. Digital Frequency Doubler

Figure 14b. Digital Frequency Doubler Timing

For the data to transfer reliably from FIFO A to
FIFO B, the data must be valid at the inputs to FIFO B
at least a set-up time before the Low-to-High transition
of FIFO A's OR output. This is because the OR is
applied to FIFO B's SI input. As explained previously,
data is sampled on SI's Low-to-High transition.
In the cascade configuration, the downstream
FIFO's IR output connects to the upstream FIFO's SO
input, and the upstream FIFO's OR output connects to
the downstream FIFO's SI input. These two control
connections are the only ones required to cascade the
FIFOs. In theory, you can cascade any number of
FIFOs in this manner.
The timing for serially cascaded FIFOs appears in
Figure 15b, which does not show the data. The signals
begin in their quiescent states after a reset (/MR, not
shown). Both FIFOs are initially empty.
There is one key difference between the quiescent
state of a FIFO operating alone versus two or more
FIFOs operating in cascade mode: In the stand-alone
configuration, SO is Low, whereas in the cascade
configuration, the SO inputs of the upstream N - 1 of N
cascaded FIFOs are High. This is true for all conditions,
except when the downstream FIFO is full. As the
downstream FIFOs fill, their IR outputs go Low,
indicating that they are full.
When you cascade two FIFOs, the intrinsic
handshaking frequency limitation goes away when the
downstream FIFO becomes full.
In operation, the producer samples the IRA line
and, finding it High, presents the data to be written to
FIFO A. A set-up time later, the producer causes a
Low-to-High transition on FIFO A's SI input. FIFO A
samples the data, and the IR output goes from High to
Low.
Nothing further happens until the producer causes
a High-to-Low transition on FIFO A's SI input. As a
result, the write pointer is incremented, it settles, the IR
output goes from Low to High, and the FIFO's internal
state becomes empty + 1. Because FIFO A's SO input
is High, the data just written is output on the DOA
pins, and an internal one-shot is fired that causes a
15-ns pulse to appear on FIFO A's OR output. When the
one-shot fires, the conditions at FIFO B's inputs are
identical to those at FIFO A's inputs when the
sequence began.
This cascade handshaking sequence repeats for
every FIFO in the string. The first N - 1 FIFOs must go
from empty to empty + 1 and then back to empty to
pass the data word to the last (Nth, or output) FIFO.
This does not mean that the first data word must pass
through all N or N - 1 FIFOs before the second (or
subsequent) data words enter the first FIFO. However,
the first FIFO must go from empty to empty + 1 and
then back to empty before a second data word (rising
edge on SI) can enter.

Serial Handshaking Calculations
Now consider how to calculate the intrinsic
handshaking frequency for two or more FIFOs cascaded
together. On the SIA signal's falling edge, fallthrough
begins (tBT on the data sheet). When FIFO A's OR
output goes from Low to High, FIFO B samples the
input data. In response to the Low-to-High transition
on FIFO B's SI input, FIFO B's IR output goes from
High to Low. This time is called tDLIR. FIFO A is now
empty, and FIFO B is empty + 1.
In equation form:
F(hs) = 1 / (tBT + tDLIR)                        Eq. 14
From the CY7C408/409 data sheet for the 35-MHz
speed grade:
tBT = 50 ns, tDLIR = 15 ns
Substituting these values in Equation 14 yields:
F(hs) = 15.38 MHz
In practice, you will probably never observe this
worst-case cascade-handshaking-frequency limitation,
because the values given in the data sheet are
"guardbanded." For typical Cypress FIFOs at room
temperature and Vcc = 5V:
tBT = 20 ns, tDLIR = 10 ns
which yields a cascade handshaking frequency of
33.3 MHz. Note that this value applies to the entire
string of cascaded FIFOs and is independent of the
number of FIFOs cascaded together.
The same cascade-handshaking-frequency
limitation occurs when both FIFOs are full and the
downstream FIFO receives two SO pulses. In this case,
the downstream FIFO goes from full to full - 1 and back
to full. The empty location then bubbles through to the
upstream FIFO, and the downstream FIFO's IR output
pulses. Internally, the one-shot is fired, and the upstream
FIFO changes its OR output from High to Low and
then back High (when the pulse ends).
The cascade handshaking frequency is the
reciprocal of the sum of the bubblethrough time and the
propagation delay time from SO going Low to High to
OR going High to Low (tDLOR). By design, these
parameters have the same values as the fallthrough time
and tDLIR, respectively. Therefore, the cascade
handshaking frequency is the same for the full condition as it
is for the empty condition.
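The two handshaking figures quoted above can be reproduced with a short calculation. The Python sketch below takes Equation 14 as the reciprocal of the sum of tBT and tDLIR, which is what the worked numbers (15.38 MHz and 33.3 MHz) imply. The function name is invented for illustration.

```python
# Sketch of Eq. 14: intrinsic cascade handshaking frequency.
# Times are in nanoseconds, result in MHz; the name is illustrative.

def handshake_freq_mhz(t_bt_ns, t_dlir_ns):
    """F(hs) = 1 / (tBT + tDLIR)."""
    return 1000.0 / (t_bt_ns + t_dlir_ns)


if __name__ == "__main__":
    # Data sheet values for the 35-MHz CY7C408/409, then typical values.
    print(round(handshake_freq_mhz(50, 15), 2))
    print(round(handshake_freq_mhz(20, 10), 1))
```

Because the bubblethrough time and tDLOR are stated to equal tBT and tDLIR, the same function applies to the full condition.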

Burst Input
It stands to reason that if the cascaded FIFOs can
be made to think they are not empty, you can enter data
at a higher rate than the cascade handshaking
frequency. Also, if they can be made to think they are not full,
you can remove data at a higher rate than the cascade
handshaking frequency.
Figure 16a shows how to take advantage of these
facts by adding an inverter between the downstream
FIFO's IR output and the upstream FIFO's SO input.
Note, however, that every FIFO whose SO input is the
inverted IR output of a downstream FIFO has its
capacity reduced by one word.
From the timing diagram in Figure 16b, you can see
that the composite FIFO never goes empty. The
cascaded handshaking illustrated is essentially the same as
that of Figure 7, which is the stand-alone output
handshaking timing.
When the first (most upstream) FIFO is empty,
there is a fallthrough time (tBT) delay after the first
word is shifted in. Or if the difference between the
shift-in and cascade handshaking frequencies is great
enough, the first FIFO goes empty. If the shift-in
frequency is sufficiently greater than the cascade
handshaking frequency, the first FIFO goes full, and a
fallthrough time (tBT) delay occurs.
Except for the preceding conditions, adding the
inverter enables the cascaded FIFOs to be either loaded
at the stand-alone maximum shift-in frequency, or to be
burst loaded using the AFE and HF flags of the
CY7C408/409.
If you cascade N FIFOs together, inverters are
required on the IR outputs of the N - 1 downstream
FIFOs.
Figure 15a. Cascaded FIFOs: Intrinsic Handshaking

Figure 15b. Cascade Timing: Intrinsic Handshaking
with FIFO

With the exception of the full and empty conditions
of the first FIFO, the cascade handshaking frequency
with the inverter is:
f(hsi) = 1 / (tDLIR + tD)                        Eq. 15
where f(hsi) is the handshaking frequency with the
inverter and tD is the inverter's Low-to-High propagation
delay time.
Comparing Equations 14 and 15 reveals that if the
inverter's delay is less than the fallthrough time, the
cascade handshaking frequency with the inverter is greater
than the intrinsic cascaded handshaking frequency. If
tDLIR = 10 ns and tD = 10 ns, then f(hsi) = 50 MHz.
This means that because the handshaking frequency is
greater than the stand-alone SI/SO frequencies, the
throughput is not limited by the handshaking frequency.
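To see when the inverter actually helps, the burst-mode handshaking frequency can be compared with the intrinsic one. The sketch below is illustrative: it takes the burst-mode expression as the reciprocal of tDLIR plus the inverter delay, takes the intrinsic expression as the reciprocal of tBT plus tDLIR, and uses invented function names and the typical values quoted in the text.

```python
# Compare handshaking frequency with and without the burst-mode inverter.
# Times in ns, frequencies in MHz; names are illustrative.

def intrinsic_freq_mhz(t_bt_ns, t_dlir_ns):
    """Intrinsic cascade handshaking: 1 / (tBT + tDLIR)."""
    return 1000.0 / (t_bt_ns + t_dlir_ns)


def inverter_freq_mhz(t_dlir_ns, t_inv_ns):
    """Handshaking with the inverter: 1 / (tDLIR + inverter delay)."""
    return 1000.0 / (t_dlir_ns + t_inv_ns)


def inverter_helps(t_bt_ns, t_dlir_ns, t_inv_ns):
    """True when the inverter raises the handshaking frequency."""
    return inverter_freq_mhz(t_dlir_ns, t_inv_ns) > intrinsic_freq_mhz(
        t_bt_ns, t_dlir_ns
    )


if __name__ == "__main__":
    print(inverter_freq_mhz(10, 10))          # typical values from the text
    print(inverter_helps(20, 10, 10))         # inverter faster than fallthrough
    print(inverter_helps(20, 10, 25))         # hypothetical slow inverter
```

The comparison reduces to whether the inverter delay is shorter than the fallthrough time, which is the condition stated in the text.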

Figure 16a. Cascaded FIFOs: Burst Mode Operation

Figure 16b. Cascade Timing: Burst Mode with FIFO

Care and Handling of Small FIFOs
The rest of this application note provides general
guidelines for overcoming any problems you might have
using the CY7C401 through 404, CY7C408/409, and
CY3341 FIFOs.
One important factor to keep in mind involves the
very high gain transistors used in the FIFOs to achieve
the desired performance. These transistors' high speed
can cause the FIFOs to respond to short pulses on the
SI and SO inputs that bipolar, NMOS, and some CMOS
FIFOs do not see. As a result, the small FIFOs might
lock up or drop bits.

Avoiding Lock Up
The lock-up phenomenon occurs in the presence of
excessive noise on the Vcc or ground lines; short pulses
on SI or SO; or noise on the /MR line. Two distinct
lock-up states have been observed: full and locked up.
When IR is Low and OR is High, the FIFO thinks
it is full. However, in the full-lock-up state, no matter
how many SO pulses are applied, the FIFO never goes
empty (i.e., IR never stays High). You can get out the
FIFO's contents, but those contents might not be the
same as the data that went in.
The locked-up lock-up state should never occur. It
is the quiescent state where both the IR and OR signals
are Low. In other words, the FIFO thinks it is
simultaneously full and empty.
The only method of recovery from the two lock-up
states is to reset the FIFO by activating the /MR pin.
All data is lost. If the FIFO is dropping bits, misaligning
words, or occasionally just stopping, make sure that the
/MR signal does not have noise on it. A small capacitor
(47 to 100 pF) connected between the /MR pin and
ground eliminates the noise.

Dropping Bits
The dropping of bits is annoying and unacceptable,
but the data corruption does not cause the control logic
to fail. Data corruption occurs because of either noise
on the Vcc and ground pins or improper data sampling
at the FIFO inputs or outputs. A significant amount of
noise causes lock up; less noise can cause data
corruption.

Vcc and Ground Noise
Reliable operation of the small FIFOs requires
clean Vcc and grounds. Keep peak-to-peak Vcc noise to
less than 200 mV. Additionally, keep the "quiet ground"
(pin 7, DIP, CY7C408/409) separate from the "noisy
ground" (pin 22, DIP, CY7C408/409), and connect both
to system ground or a groundplane. Make the lead
length from the pin to ground as short as possible.
Another noise-control procedure is to connect a
0.01-µF ceramic decoupling capacitor between each
FIFO's Vcc and "noisy ground" pins. In addition, if
either the SI or the SO frequency is over 5 MHz,
connect a 100- to 400-pF mica capacitor or high-frequency-
filtering ceramic capacitor between Vcc and "noisy
ground," and connect a second 100-pF capacitor between
Vcc and "quiet ground." Keep lead lengths as short as
possible.
For applications in which the SI and SO
frequencies exceed 10 MHz, Cypress recommends a Pi filter on
the Vcc line to the FIFOs. Because the filter is
bidirectional, it keeps other ICs' noise from the FIFOs and the
FIFOs' high-frequency noise from the other ICs. The
inductor should be a subminiature RF choke with a
series DC resistance of 1 ohm or less and an inductance of
100 µH. Make the capacitors 500-pF mica or ceramic
types.

SI and SO Signal Considerations
To achieve the best results, make sure the SI and
SO signals have rise times and fall times of 5 ns or less
between 0.4 and 4V. At 5V Vcc and room temperature,
the small FIFOs operate reliably with SI/SO pulses 5 ns
wide, measured between the 1.5V levels.
Therefore, it is imperative that the signals be clean
and slew rapidly between logic levels. If noise is
superimposed on a slowly rising or falling signal, the FIFO
might interpret the signal as multiple clocks.

The source that drives the SI/SO pin should have
active devices pulling in each direction. That is, use
totem-pole-output drivers instead of open-collector or
open-drain outputs with a resistor to Vcc.
Beware of decoding glitches on the SI/SO signals.
You can eliminate these glitches by using an AC
termination network consisting of a series RC from the
SI/SO pin to ground (Figure 11). This network also acts
as a filter and absorbs pulses that are shorter than four
RC time constants. If the line is short and does not
require termination, you can use a small capacitor (47 to
100 pF) to kill the glitches. When you connect FIFOs in
parallel to make a wider word, one termination network
is required for all SI pins and a second for all SO pins.
Connect the network to the pin farthest from the
source.
All the small FIFOs sample the input data for 10 ns
after the SI pulse's Low-to-High transition. Therefore,
the input data should be held stable at least 10 ns after
SI's rising edge. Violation of the set-up and hold-time
specifications can cause data corruption.

General Troubleshooting Guidelines

The switching speeds of CMOS devices are inversely proportional to temperature and directly proportional
to supply voltage. Thus, the combination of low
temperature and high Vcc is called the "fast-fast" corner,
and high temperature and low Vcc is the "slow-slow"
corner.
If increasing Vcc to the FIFO or PCB increases the
number of failures, the problem is probably noise
related. If increasing Vcc reduces the number of failures,
the problem is probably due to marginal timing. If you
reduce the temperature using a product such as Freezit
while at low Vcc, and the failure rate increases, you have
confirmed that the problem is marginal timing.
CY7C408/409 Only

If all else fails, you can increase the internal device
thresholds by adding a diode (1N914, 1N4004) between
"quiet ground" and power ground, cathode to power
ground. This increases the threshold to Vt = 1.5V +
0.8V = 2.3V. A single diode suffices for many FIFOs.
The number of FIFOs one diode can handle depends
on the diode's forward current rating.


Understanding Large FIFOs
This application note explains the internal
operation of the large FIFOs manufactured by Cypress and
shows how to use the devices to accomplish depth and
width expansion. Other topics covered here include
FIFO interfacing, the writing and reading process,
failure modes, and typical problem symptoms and
solutions. This information applies to the following Cypress
FIFOs: CY7C420, CY7C421, CY7C424, CY7C425,
CY7C428, CY7C429, CY7C432, CY7C433, CY7C439,
CYM4210, and CYM4220.
Timing parameters given in this application note
are taken from the Cypress Semiconductor
BiCMOS/CMOS Data Book.

Large FIFO Overview
The Cypress line of large FIFOs provides densities
from 512 x 9 to 4K x 9 in monolithic devices; 8K and
16K x 9 in high-density modules; and a 2K x 9
bidirectional FIFO. Access times are as fast as 20 ns, and all
the FIFOs feature identical, industry-standard pinouts.
The monolithic devices are available in space-saving
300-mil-wide DIPs (odd-numbered devices) as well as
industry-standard 600-mil-wide DIPs (even-numbered
devices), and various surface-mount packages.
The CY7C420, CY7C421, CY7C424, CY7C425,
CY7C428, CY7C429, CY7C432, and CY7C433 are
fabricated using an advanced 0.8µ (drawn), n-well,
CMOS technology. Input ESD protection is greater
than 2000V, and careful layout, guard rings, and a
substrate bias generator prevent latchup.
Although the first FIFOs utilized a shift-register
type of architecture, today's large FIFOs employ an
SRAM type of interface. Data is written into and read
out of the devices, as with SRAM write and read
operations. These operations can occur totally independently
of one another and are made possible by a specially
designed six-transistor, dual-ported SRAM cell. This
cell makes use of separate read and write transistors to
allow independent R/W operation.
Operating these FIFOs at their maximum
throughput rates demands the generation of extremely narrow
write and read pulses. To facilitate significantly higher
throughput rates, Cypress has developed the CY7C440
and CY7C450 families of clocked, or self-timed, FIFOs.
These FIFOs feature 70-MHz operation and are
characterized by self-timed interfaces. You generate the
read and write enables, which are combined internally
with the appropriate clocks. Thus, you do not need to
generate narrow read and write pulses. These FIFOs
also feature totally independent, asynchronous read
and write operations.
The CY7C420/421, CY7C424/425, CY7C428/429,
and CY7C432/433 are, respectively, 512, 1024, 2048, and
4096 words deep by 9 bits wide. Each FIFO is
organized such that data is read out in the same
sequential order in which it was written. Full, half-full, and
empty flags facilitate writing and reading. Additional
pins are provided to facilitate unlimited expansion in
width and depth, with no performance penalty.

Writing to and Reading From the FIFO
Figure 1 shows the large FIFOs' read and write
timing. Reads and writes are asynchronous to each
other. The read process begins with R's falling edge.
The output data bus, Q0 - Q8, leaves the
high-impedance state tLZR ns after R's falling edge. The output
data becomes valid tA ns after that same falling edge.
This tA period is referred to as the FIFO's read access
time. R's rising edge ends the read process.
The data on the Q0 - Q8 bus remains valid for
tDVR ns following the R rising edge. This is the output
data hold time at the end of the read cycle. The internal
circuitry then readies itself for the next read operation.
This period is referred to as tRR, or read recovery
time, and must be observed between consecutive read
operations. The read signal's minimum pulse width is
denoted by tPR and is identical to the read access time,
tA. The maximum read frequency is the reciprocal of
tPR + tRR.
The write process is similar to the read process. A
write begins with the falling edge of the write line, W,
and terminates with W's rising edge. For a valid write to
occur, the input data bus, D0 - D8, must be stable for
tSD ns prior to W's rising edge and for tHD ns after this
edge. These specifications are referred to as the data
set-up and hold times, respectively. The write strobe
also has a minimum negative pulse width, denoted as
tPW. A minimum recovery time, tWR, is required between write cycles.
The maximum write frequency is the reciprocal of
tPW + tWR. As an example, a device with a 20-ns write
strobe width and a 10-ns write recovery time yields a
30-ns write cycle time, or a 33.3-MHz maximum write
cycle frequency.

Figure 1. Asynchronous Read and Write Timing

You can determine the read cycle time (tRC) by adding the access time (tA) and the read recovery time
(tRR), which you can find in the FIFO data sheet. The
maximum read frequency is the reciprocal of tA + tRR.
For example, a Cypress FIFO with a 20-ns access time
and a 10-ns read recovery time results in a 30-ns read
cycle time, or 33.3-MHz maximum read cycle frequency.
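
The read and write cycle-time calculations above reduce to the same arithmetic. The following Python sketch illustrates it; the function name max_cycle_frequency_mhz is illustrative only, and the timing values are the example figures quoted in the text, not data-sheet limits for a particular speed grade.

```python
def max_cycle_frequency_mhz(pulse_width_ns: float, recovery_ns: float) -> float:
    """Return the maximum strobe frequency in MHz.

    The cycle time is the strobe pulse width (tPW for writes, tPR = tA for
    reads) plus the recovery time (tWR or tRR); the frequency is its reciprocal.
    """
    cycle_time_ns = pulse_width_ns + recovery_ns
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    return 1000.0 / cycle_time_ns  # 1/ns expressed in MHz


# Example values from the text: 20-ns strobe width (or access time), 10-ns recovery.
write_mhz = max_cycle_frequency_mhz(pulse_width_ns=20, recovery_ns=10)
read_mhz = max_cycle_frequency_mhz(pulse_width_ns=20, recovery_ns=10)
print(f"write: {write_mhz:.1f} MHz, read: {read_mhz:.1f} MHz")
```

Substituting the tPW, tWR, tA, and tRR values from the data sheet of the device actually used gives the limits for that speed grade.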
The FIFOs include separate write and read
counters (pointers). Each write or read operation increments the appropriate counter one position. When the
FIFO is empty, both counters point to the same location. The relative position of these counters determines
the device's status, which is indicated externally via
empty, half-full, and full flags.
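
The counter-and-flag behavior just described can be sketched as a small behavioral model. This is not a description of the device's internal logic: the class name FifoModel, the explicit count field (used here to tell a full FIFO from an empty one when both counters coincide), and the choice to assert half-full when more than half the locations are occupied are all assumptions made for illustration.

```python
class FifoModel:
    """Behavioral sketch of a FIFO with separate write and read counters."""

    def __init__(self, depth: int = 512):
        self.depth = depth
        self.mem = [None] * depth
        self.wr = 0      # write counter (pointer)
        self.rd = 0      # read counter (pointer)
        self.count = 0   # occupied locations (assumed bookkeeping, see lead-in)

    @property
    def empty(self) -> bool:
        return self.count == 0

    @property
    def half_full(self) -> bool:
        # Assumption: flag asserts when more than half the locations are used.
        return self.count > self.depth // 2

    @property
    def full(self) -> bool:
        return self.count == self.depth

    def write(self, word) -> bool:
        """Store a word; a write to a full FIFO is inhibited."""
        if self.full:
            return False
        self.mem[self.wr] = word
        self.wr = (self.wr + 1) % self.depth  # increment write counter
        self.count += 1
        return True

    def read(self):
        """Return the oldest word; a read from an empty FIFO is inhibited."""
        if self.empty:
            return None
        word = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth  # increment read counter
        self.count -= 1
        return word


fifo = FifoModel(depth=8)
fifo.write("a")
fifo.write("b")
print(fifo.read(), fifo.empty, fifo.full)
```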

Applications
FIFOs are asynchronous devices that are ideal for
interfacing between two asynchronous processes. A
FIFO allows two systems running at different data rates
to communicate by providing a temporary data or control buffer.
Typical FIFO applications include:
Inter-processor communications, in which bidirectional devices are especially useful
Communications systems, including local area networks
Digital-signal-processing-based systems, for buffering real-time data
Electronic data processing, CPU, and peripheral equipment, including high-performance disk controllers

Common FIFO Configurations
Every Cypress FIFO, from the 512 x 9 CY7C420/21
to the 4K x 9 CY7C432/33, is fully cascadable. Width
expansion allows you to create word widths of any multiple of nine bits. Cascading in depth creates FIFOs of
various depths. Width and depth expansion modes are
described here, along with design considerations.
Figure 2 illustrates stand-alone mode, and Figure 3
shows width expansion mode. In both these modes, the
XI (expansion in) pin is grounded and the FL (first
load) pin is tied High.

Figure 2. Stand-Alone Operation

The OR gates in the width-expansion design
generate composite full, half-full, and empty flags (F,
H-F, E). Composite flags are necessary because variations in propagation delays might prevent the individual
FIFOs in the design from entering the F, H-F, or E
states simultaneously. A composite flag properly reflects
the instantaneous status of the entire word.
Figure 4 illustrates depth expansion. The FL (first
load) pin on one device must be grounded to define
that FIFO as the first FIFO to be written to. The FIFOs
are then daisy-chained together by connecting one
device's XO (expansion out) output pin to the next
device's XI (expansion in) input. The XO of the last
device in the chain is connected to the XI of the first
device, thus forming a token-passing ring.
Token passing allows the writing and reading
processes to stay consistent. That is, the passing and
holding of a read or write token tells an individual
FIFO whether it is actively being read from or written
to. In the token-passing procedure for write operations,
the first FIFO is written to until it is filled. An internal
write pointer determines the location written to, and
after every write, the pointer is incremented. When the
pointer reaches the last physical location, no more
writes can occur to that device. At that point, the first
FIFO passes the write token to the next FIFO in the
chain via the XO-XI interface. The second device, now
in possession of the write token, receives all future written data until this device also fills up and passes the
write token on to the next device in the chain.
If enough writes occur to fill up the FIFO chain,
the last device fails in its attempt to pass the write token
back to the first device. This is because the full FIFO
cannot accept a write token. No further writes to the
FIFO chain are allowed until a read operation occurs,
which frees up an internal location. The relative positions of the internal write and read counters determine
a device's status and whether it can accept data through
a write operation. Figure 5 shows the timing for write
operations.
As with the procedure for writes, the first FIFO in
the chain holds the read token. When the FIFO chain is
read from, the device holding the read token supplies
the data from the address specified by the device's read
pointer. The read pointer is then incremented. The incrementing continues until the FIFO is empty, and the
read token is passed to the next device in the chain. The
passing of the read token is done via the XO-XI interface. Figure 6 shows the timing for read operations.
A depth-expansion design must generate composite
status flags to adequately reflect the instantaneous state
of the FIFO chain, as is done for width expansion.
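
The token-passing scheme for depth expansion can be sketched behaviorally as follows. The class ChainedFifo and its fields are hypothetical. The model also simplifies the handshake: it passes a token as soon as the last physical location of a device has been used and lets the receiving device refuse the access if it is full (or empty), whereas the text describes the hardware as withholding the token. The externally visible behavior (data order, chain full, chain empty) is intended to be the same.

```python
class ChainedFifo:
    """Behavioral sketch of depth expansion with write and read tokens."""

    def __init__(self, devices: int, depth: int):
        self.n = devices
        self.depth = depth
        self.mem = [[None] * depth for _ in range(devices)]
        self.count = [0] * devices    # occupied locations per device
        self.wr_ptr = [0] * devices
        self.rd_ptr = [0] * devices
        self.write_token = 0          # device 0 models the first-load device
        self.read_token = 0

    def write(self, word) -> bool:
        dev = self.write_token
        if self.count[dev] == self.depth:
            return False              # token holder is full: the chain is full
        self.mem[dev][self.wr_ptr[dev]] = word
        self.count[dev] += 1
        self.wr_ptr[dev] += 1
        if self.wr_ptr[dev] == self.depth:   # last physical location written
            self.wr_ptr[dev] = 0
            self.write_token = (dev + 1) % self.n   # pass token via XO -> XI
        return True

    def read(self):
        dev = self.read_token
        if self.count[dev] == 0:
            return None               # token holder is empty: the chain is empty
        word = self.mem[dev][self.rd_ptr[dev]]
        self.count[dev] -= 1
        self.rd_ptr[dev] += 1
        if self.rd_ptr[dev] == self.depth:   # last physical location read
            self.rd_ptr[dev] = 0
            self.read_token = (dev + 1) % self.n
        return word


chain = ChainedFifo(devices=2, depth=4)
accepted = [chain.write(i) for i in range(9)]   # ninth write exceeds 2 x 4 words
print(accepted.count(True), chain.read())
```

In a real design the composite full and empty flags, not the return values used here, tell the surrounding logic when the chain cannot accept or supply data.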

Retransmit
The retransmit feature is useful in communications
for retransmitting packets of data and in disk drives for
rewriting sectors. It is especially useful in applications
where a single block of data in the FIFO must be sent
out multiple times, as in a word or pattern generator.
Data can be retransmitted any number of times,
and with Cypress FIFOs, the retransmit feature can be
used at any time, no matter how much data the FIFO
contains. This is in contrast to some competing FIFOs,
such as those from IDT, which do not allow use of the
retransmit function when the FIFO is full.
In the retransmit operation, the read pointer is
reset to its initial location and the R pin is pulsed until
the read pointer advances to the same memory location
addressed by the write pointer. The retransmit (RT) pin
is available in the single-device and width-expansion
modes, but not in depth expansion because in that mode this pin (FL/RT)
designates the FIFO to be loaded first.

Figure 3. Width Expansion

Figure 4. Depth Expansion

The retransmit function is initiated by asserting an
active-Low pulse to the retransmit input, which resets
the internal read counter to zero. Keep the R input inactive during this time; otherwise, the conflicting requirements on the read counter might cause it to become corrupted. The retransmit process does not affect
the state of the write counter or the write process,
though the retransmit timing constraints shown in Figure
7 must not be violated.
Note that the architectural description in the 1990
and previous Cypress data books incorrectly stated that
the W input must be inactive during a retransmit cycle.
No design or usage rules are violated if retransmit and
write cycles overlap or occur simultaneously; the device
does not lock up, and data is neither lost nor corrupted.
The reasons for the data book's retransmit/write
restriction are more historical and application-oriented
than functional. Specifically, the first large FIFOs did
not permit writes during a retransmit cycle. This set a
documentation precedent that all future devices had to
match.

Figure 5. Write Expansion Timing
Waveforms for W, XO1 (XI2), and the data bus, showing a write to the last physical location of device 1 followed by a write to the first physical location of device 2. Expansion Out of Device 1 (XO1) is connected to Expansion In of Device 2 (XI2).

Figure 6. Read Expansion Timing
Waveforms for R, XO1 (XI2), and the data bus, showing a read from the last physical location of device 1 followed by a read from the first physical location of device 2. Expansion Out of Device 1 (XO1) is connected to Expansion In of Device 2 (XI2).

Additionally, keeping track of what data is currently in the FIFO and what data is being read out can become complicated. For example, if a FIFO is half full
and the retransmit function is activated and writes continue, filling the FIFO to three quarters full before the
read pointer catches up with the write pointer, the
FIFO outputs all of the data.
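
The interaction between retransmit and continuing writes can be sketched with a small model. The class RetransmitFifo is hypothetical and deliberately minimal: its pointers do not wrap, so it only covers the case in which no more than one FIFO depth of data has been written since reset.

```python
class RetransmitFifo:
    """Sketch of retransmit: the read pointer is reset, the write pointer is not."""

    def __init__(self, depth: int = 8):
        self.depth = depth
        self.mem = [None] * depth
        self.wr = 0   # write pointer (not affected by retransmit)
        self.rd = 0   # read pointer

    def write(self, word) -> bool:
        if self.wr == self.depth:      # simplified model: no wrap-around
            return False
        self.mem[self.wr] = word
        self.wr += 1
        return True

    def read(self):
        if self.rd == self.wr:         # read pointer has caught up with writes
            return None
        word = self.mem[self.rd]
        self.rd += 1
        return word

    def retransmit(self) -> None:
        self.rd = 0                    # reset read counter to the first location


fifo = RetransmitFifo(depth=8)
for word in "ABCD":                    # four of eight locations written
    fifo.write(word)
first_pass = [fifo.read(), fifo.read()]

fifo.retransmit()
pending = ["E", "F"]                   # writes that continue during the replay
replay = []
while True:
    if pending:
        fifo.write(pending.pop(0))
    word = fifo.read()
    if word is None:
        break
    replay.append(word)
print(first_pass, replay)
```

The replay contains the original four words plus the two written during retransmission, which is why the receiving side needs some way to tell repeated data from new data.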

Figure 7. Retransmit Timing
tPRT is the minimum retransmit pulse width. tRTR is the retransmit recovery time; it is a timing window that must not be violated.

Common Problems and Solutions
To help prevent problems and correct them when
they occur, this section describes the causes of and solutions to some common FIFO problems. The first problem to consider is corrupted or repetitive data in a
FIFO.

Corrupted or Repetitive Data
The most common cause of corrupted and repetitive data being present in a FIFO is a spurious active
signal (glitch) on the FIFO's W input. Because Cypress
devices are extremely fast, a write pulse as short as 3 ns
initiates a write. Write glitches cause whatever logic
levels are present at the data inputs to be written into
the FIFO, which can put false data into the device. If
valid data is present at the data inputs, a write glitch
causes this data to be written a second time, resulting in
duplicated data.
Write glitches are often the result of voltage reflections due to impedance mismatches, which you can
eliminate using impedance-matching termination networks. Termination networks are recommended on the
W and R traces on printed circuit boards (PCBs) when
the lines exceed approximately 4 inches from source to
a single load. This line length assumes a 2-ns rise/fall
time for the read and write strobes. For R and W signals with sub-2-ns rise/fall times, line lengths as short as
1 inch might require termination.
A termination network matches the load impedance to the PCB trace's characteristic impedance,
which is typically 50 Ω or less for microstrip or stripline
construction on G-10 glass epoxy material. To minimize
voltage reflections, a slightly overdamped termination is
preferred. Cypress recommends that a 47-pF (max) series
capacitor and a 47-ohm resistor be connected from the
read or write pin to ground (Figure 8). This termination
network acts as a high-pass filter to short, high-frequency pulses and dissipates no DC power. Read or write
lines that drive more than one FIFO require only one
termination network. Put the network at the input that
is electrically farthest from the source. For multiple
loads, see the "Systems Design Considerations When
Using Cypress CMOS Circuits" application note for
help in determining the maximum line length.

Figure 8. Recommended Termination Network

FIFO data corruption can also be caused by violation of master-reset timing constraints. As shown in the
timing diagram in Figure 9, the read and write signals
must be inactive around the rising edge of MR (master
reset) to satisfy the tRMR, or master-reset recovery-time,
specification. This constraint is necessary because the
FIFO goes through an internal initialization process
during reset and requires a settling period after the
reset terminates.

Figure 9. Master Reset Timing

FIFO Locks Up
Short noise pulses on the FIFO's master reset pin
can cause the FIFO to not respond because it is "partially reset." If this problem occurs, you might need to
terminate the master reset line.

Missing or Disappearing Data
Glitches on the R input can cause data to disappear because of an unintended read operation. The
read increments the internal read counter, resulting in
the loss of the current data word. Here again, a termination network eliminates the unwanted glitches.

Repetitive or Out-of-Sequence Data, False Full or Empty
A misaligned internal read or write pointer can
cause a variety of symptoms, including repetitive or out-of-sequence data and false full and/or empty conditions.
The two most common causes of misaligned pointers
are master-reset violations and boundary-condition
violations.
Boundary conditions are defined as the FIFO being
either full or empty. When high-density FIFOs are connected in parallel to make a wider word, certain conditions can cause the FIFOs to choose individually to
either ignore or act upon a read or write request. The
system-level symptom of individual FIFOs making different decisions is word misalignment. The problem
occurs in the empty condition when a read immediately
follows a write and in the full condition when a write
immediately follows a read.

Operation at the Empty Boundary
Consider a FIFO that has been reset and is empty.
The empty flag is active (Low), and internal logic inhibits read operations. In the general case, the read and
write signals are asynchronous. Upon completion of the
write operation the internal state of the FIFO goes from
empty to empty + 1. During this interval, a read operation might or might not be recognized. A read preceding the write is ignored; a read following the write is
not. In between these conditions, the FIFO decides
whether to recognize the read. During this aperture of
uncertainty, you cannot determine whether the read will
be ignored or not. With one FIFO, this uncertainty is
acceptable. However, if two or more FIFOs are connected in parallel to make a wider word, some might
ignore the read, and others might not.

Operation at the Full Boundary
A similar condition occurs when a single FIFO becomes full. The full flag is active (Low), and internal
logic inhibits write operations. A read operation immediately followed by a write operation causes the FIFO
to go from full to full - 1 and back to full. During the
time the FIFO is going from full to full - 1, a write
operation might or might not be recognized. The aperture of uncertainty applies here because the FIFO takes
a finite amount of time to change states, and a write
command arriving at this instant might be ignored.

Waiting at the Empty Boundary
Figure 10 shows the timing that prevents problems
with reads at the empty boundary. Any device reading
from the FIFO must wait an amount of time, tRAE, after
the termination of the write operation before causing a
High-to-Low transition of the R signal. The W signal's
rising edge indicates the termination of the write operation.
One way to satisfy this timing is to gate read operations with the composite empty flag (EF) such that the
read operation is prevented when the empty flag is active. Note, however, that the R signal can be Low either
before or during the first write to the empty FIFO and
the data still propagates to the outputs correctly.

Figure 10. Read Fall-Through Timing Violation
tRAE is an invalid read window. A read operation should never be initiated inside this window.

Waiting at the Full Boundary
Figure 11 shows the timing that prevents problems
with writes at the full boundary. Any device writing to
the FIFO must wait an amount of time, tWAF, after the
termination of the read operation before causing a
High-to-Low transition of the W signal. The R signal's
rising edge indicates the end of the read operation.
You can enforce this timing by gating write operations with the composite full flag (FF) such that the
write operation is prevented when the full flag is active.
However, the W signal can be Low either before or
during the first read from a full FIFO and the data is
still properly written.

Figure 11. Write Bubble-Through Timing Violation
tWAF is an invalid write window. A write operation should never be initiated inside this window.

Empty Reads and Full Writes
When Cypress FIFOs are empty, their data outputs
go to the high-impedance state. Therefore, attempting
to read from an empty FIFO yields unpredictable data.
Internal logic inhibits the read, and the read pointer is
not incremented.
Internal logic also inhibits attempts to write to a
full FIFO, and the write pointer is not incremented.

Intermittent Malfunctions
If all the timing requirements appear to be met and
data in the FIFO is still corrupted, the cause is likely to
be noise on the power supply. Random spikes on either
the Vcc or ground pins of the FIFO are likely culprits
when non-repeatable failures occur.
The cure for this problem is to add a high-pass filter capacitor between the device's power and ground
pins. This practice is recommended whenever the read
or write frequency exceeds 5 MHz. Use a very small
(100 - 500 pF) ceramic or mica capacitor. Precision filtering capacitors of this type are available through suppliers such as Rogers Corporation, 2400 S. Roosevelt
St., Tempe, AZ 85282.
The filter capacitor is in addition to the 0.1- or
0.01-µF decoupling capacitor that should always be
present with any high-speed digital chip. Although
decoupling capacitors are often referred to as bypass
capacitors, implying filtering properties, their true
function is to supply the instantaneous current required
when many or all device outputs simultaneously switch
from Low to High. This larger capacitor thus decouples
or isolates the IC from the power distribution system.
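
The waiting rules at the empty and full boundaries (Figures 10 and 11) can be expressed as a simple timing check, for example in a simulation test bench. The sketch below is an illustration under stated assumptions: the function name, the treatment of the window as half-open, and the 20-ns window value are placeholders, not data-sheet figures. Take tRAE and tWAF from the data sheet of the device in use.

```python
def violates_boundary_window(opposite_rise_ns: float,
                             strobe_fall_ns: float,
                             window_ns: float) -> bool:
    """Return True if a strobe falls inside the invalid window.

    Empty boundary: opposite_rise_ns is the rising edge of W that ends the
    first write into an empty FIFO, strobe_fall_ns is the falling edge of R,
    and window_ns is tRAE.
    Full boundary: opposite_rise_ns is the rising edge of R that ends the
    first read from a full FIFO, strobe_fall_ns is the falling edge of W,
    and window_ns is tWAF.
    A strobe that is already Low before the opposite operation ends is not
    flagged, because the text states that this case still works correctly.
    """
    delay = strobe_fall_ns - opposite_rise_ns
    return 0 <= delay < window_ns


T_RAE_NS = 20.0  # placeholder value, not taken from a data sheet

print(violates_boundary_window(100.0, 105.0, T_RAE_NS))  # read starts 5 ns after write ends
print(violates_boundary_window(100.0, 125.0, T_RAE_NS))  # read starts 25 ns after write ends
```

Gating the strobes with the composite empty and full flags, as described in the text, achieves the same result in hardware.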

Designing with the CY7C439
Bidirectional FIFO (BIFO)
This application note describes the features of the
CY7C439 bidirectional FIFO (BIFO) and shows how to
use the BIFO in a multiprocessor communication
design. The CY7C439 is a 2K x 9 FIFO memory that
transfers data asynchronously at rates as high as 28.5
MHz.

BIFO Overview
Figure 1 shows a block diagram of the CY7C439
BIFO. The device has three internal data paths. The
first path consists of a 2048-word-by-9-bit dual-ported
RAM array, which allows half-duplex, bidirectional
FIFO buffering. The second path, the registered bypass,
allows registered message passing in the opposite direction from the FIFO-path operations. The last path, the
transparent bypass, allows data to pass in either direction around the FIFO path.

Figure 1. BIFO Block Diagram

The BIFO eliminates the need for other costly,
space-intensive solutions that bidirectionally transfer
data between two buses with disparate data rates. One
alternative is a two-FIFO design (Figure 2), which requires a significant amount of board space and control
circuitry. Although this solution achieves its objective,
the two separate FIFOs are rarely needed because large
amounts of data usually transfer in only one direction at
a time. Another BIFO alternative utilizes one FIFO that
can be switched from one direction to the other with
bus-steering logic (Figure 3). Although this solution
costs less than the previous one, it requires many MSI
parts and thus requires more board area.
The CY7C439 solves all the problems associated
with these alternative solutions. The CY7C439 utilizes
significantly less board area, requires less power, and
eliminates the need for complex FIFO control circuitry.
Additionally, the CY7C439 is fully pin programmable,
contains a hardware reset, allows message passing
against the BIFO flow, and permits the initialization of
dumb peripherals via the transparent bypass feature.

Figure 2. Two FIFO Design

Figure 3. Switch FIFO Design

Half-Duplex BIFO Operation
When you reset the BIFO externally, you pulse the
MR pin Low. During master reset, the BIFO's direction
is set according to the state of the BYPA pin (Table 1).
The BYPA state is latched internally on MR's rising
edge. If the BYPA state is High on MR's rising edge,
the BIFO direction is A to B, and the registered bypass
direction is B to A. If, on the other hand, BYPA is Low
on MR's rising edge, the FIFO direction is B to A, and
the registered bypass direction is A to B. The master
reset cycle is thus useful for setting the FIFO and
bypass directions as well as resetting the BIFO RAM
array.

Table 1. Master Reset BIFO Direction Selection
MR            BYPA  BYPB  STBA  STBB  Action
Rising edge   1     X     X     X     FIFO (A->B), Bypass (B->A)
Rising edge   0     X     X     X     FIFO (B->A), Bypass (A->B)
0             X     X     X     X     Internal Reset

The bidirectional BIFO interface is similar to
Cypress's CY7C42x family of FIFOs. In the CY7C42x
FIFOs, data is written into the FIFO on the W line's
rising edge and read out of the FIFO on the R line's
falling edge. The CY7C439 works nearly the same. If
the direction of the FIFO is from A to B, data is written
into the FIFO on the rising edge of the STBA signal
and read out of the FIFO on the falling edge of the
STBB signal. The function of these two pins is reversed
if the FIFO direction is set from B to A. Table 2 shows
these relationships.

Table 2. BIFO Operation Truth Table
Columns: Dir, STBA, BYPA, STBB, BYPB, Action. The rows cover the A->B, B->A, and ANY directions with the following actions: Normal Operation; FIFO Write at A, FIFO Read at B; FIFO Read at B, Reg Byp Read at A; FIFO Write at A, Reg Byp Write at B; FIFO Write at B, FIFO Read at A; FIFO Read at A, Reg Byp Read at B; FIFO Write at B, Reg Byp Write at A; No FIFO, Trans Data B to A.

The BIFO three-states its data lines on STBB's
rising edge. BIFO circuitry does not allow additional
reads beyond empty or additional writes beyond full.
The "AC Timing" section describes the device's critical
timing parameters.

Registered Bypass
The CY7C439's registered bypass feature provides
a way to send a word in the opposite direction to the
FIFO data flow. The bypass feature is useful for message passing to indicate control and status information.
In communication environments, for example, you can
use the bypass register to indicate that a packet was not
received correctly. This feature eliminates the additional circuitry required to allow a data consumer to communicate with a data producer.
The bypass operation does not affect the normal
FIFO operation. The consumer writes bypass data into
the register on the rising edge of BYPx. The x in this
pin name indicates that either the BYPA or BYPB pin
is applicable, depending on the BIFO's direction. The
assertion of the BDA flag signals the producer that it
has a message waiting for it in the bypass register. The
producer can then read the bypass register by pulsing
the BYPx pin Low. The BYPx pins perform bypass
register read and write functions with timing identical to
that of the STBx signals.

Transparent Bypass
The CY7C439's transparent bypass capability allows the producer to transmit information through the
BIFO without the consumer manipulating the BIFO to
receive the information. This feature is useful for initializing dumb peripherals. Either side can initiate a
transparent bypass by bringing both STBx and BYPx
Low at the same time. The FIFO's contents are not affected by the transparent bypass operation. The port
wishing to send data transparently to the other port
must ensure that the other port will not attempt a FIFO
read or write during the transparent bypass cycle.

Flag Operation
The BIFO provides two flag pins that can be
decoded to represent one of four states (Table 3):
empty; between empty and half full; between half full
and full; and full. These flags indicate the FIFO's status
and are useful for controlling the FIFO read and write
operations.

Table 3. BIFO Flag Operation
E/F   HF   Words in FIFO
0     1    0
1     1    1-1024
1     0    1025-2047
0     0    Full

AC Timing
Figure 4 shows the FIFO read and write timing
diagram. As mentioned earlier, the timing looks very
similar to that of the Cypress CY7C42x family of FIFOs.
Assuming that the FIFO direction is from A to B, a
read operation is performed by pulsing STBB Low
while maintaining BYPB High. STBB must be held Low
for a minimum time of tPR (25 ns, for the CY7C439-25
part). The data lines remain three-stated for a minimum
of tLZR (3 ns) after STBB's falling edge, and data becomes available after tA (25 ns). The STBB signal must
recover for tRR (10 ns) before another read operation is
performed. The data lines remain valid for tDVR (3 ns)
and three-state after tHZR (18 ns) from the rising edge
of STBB.
The STBA signal is used to perform writes to the
FIFO array. STBA must be Low for tPW (25 ns) and
recover for tWR (10 ns). The data to be written into the
FIFO must be set up for tSD (15 ns) and held for tHD (0
ns) from the rising edge of STBA.

Figure 4. FIFO Read and Write Timing

Figure 5 shows the timing waveforms for the bypass
register mode of operation. Reads from the bypass
register look much the same as reads from the FIFO.
During the bypass register reads or writes, the STBx
signal must be High within tBSR (10 ns) of the falling
edge of BYPx. The only differences between the timing
for FIFO read and bypass register read are that the
data lines remain three-stated for at least tBLZ (10 ns)
after BYPA's falling edge and that data will be available
after tBA (30 ns) on the -25 part. The bypass register
write timing parameters are identical to those of the
FIFO write parameters.
Figure 6 shows the timing waveforms for the
transparent bypass mode of operation. This transceiver-like data path allows data to be driven from one port to
another without the need for the consumer to control
the BIFO to receive data. Either port can initiate a data
transfer. STBx on the producing side must be High for
tTSB (10 ns) and must go Low no longer than tTBS (10
ns) after the falling edge of BYPx. Data is available
tTPD (20 ns) after the falling edge of STBx, and the output data changes tDL (20 ns) after the input data changes. The consumer bus three-states after tTSD (18 ns)
from the rising edge of STBx.
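
As a cross-check of the FIFO-path figures quoted in this section for the -25 part, the short Python sketch below derives the read and write cycle times from the pulse-width and recovery parameters. The dictionary name and layout are illustrative; the values are those given in the text and should be confirmed against the current data sheet.

```python
# FIFO-path timing values quoted in the text for the -25 part (ns).
CY7C439_25_NS = {
    "tPR": 25,  # read strobe Low time
    "tRR": 10,  # read recovery
    "tPW": 25,  # write strobe Low time
    "tWR": 10,  # write recovery
}

read_cycle_ns = CY7C439_25_NS["tPR"] + CY7C439_25_NS["tRR"]
write_cycle_ns = CY7C439_25_NS["tPW"] + CY7C439_25_NS["tWR"]

# The slower of the two strobes limits the sustained transfer rate.
max_rate_mhz = 1000.0 / max(read_cycle_ns, write_cycle_ns)

print(f"read cycle {read_cycle_ns} ns, write cycle {write_cycle_ns} ns, "
      f"max rate {max_rate_mhz:.2f} MHz")
```

The resulting 35-ns cycle corresponds to roughly 28.57 MHz, in line with the 28.5-MHz transfer rate stated at the start of this application note.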

Figure 5. Bypass Register Timing

Multiprocessor Communication Design
An excellent application for the CY7C439 BIFO is
in interprocessor communication. Systems that process
image, RADAR, and SONAR data and equipment that
performs telecommunications bridging must all transfer
significant amounts of data among system processors.
A design example shows the interface issues involved in communications between a 25-MHz Cypress
CY7C601 SPARC processor and a 25-MHz 80386 used
as an embedded processor. This high-speed processor
bridge provides bidirectional data packet transfer, isolation of host processor from embedded processor, im-


a read from the FIFO and a write to the bypass register.
CLK is the host system clock. CLKD is the delayed system clock used in the BIFO control logic. ADDR represents the state of the A31, A30, A29, and A2 address
lines. These signals, as well as the RD and WE control
signals, are valid 7 ns before and 7 ns after CLK's rising
edge.
The CY7C601 asserts RD at the beginning of a
read cycle and WE in the second cycle of a write operation. Both RD and WE are used to determine if a valid
read (load) or write (store) cycle has begun.
The INULL signal nullifies an active write cycle.
Asserting INULL in the first cycle of a store operation
cancels the operation. INULL appears in the second
cycle of all valid store operations (Figure 8).

Figure 8. 7C601 Control Circuitry Timing Waveforms
Figure 9. BIFO Initialization - 7C601 Producer

Figure 10. BIFO Initialization - 7C601 Consumer

The DATAS signal shows the data timing requirements of the CY7C601. During a load operation (read
from the BIFO), the CY7C601 requires valid data 3 ns
before and 5 ns after the system clock's rising edge.
During a store operation, the CY7C601 produces the
data to write 29 ns after the falling edge of the first
store cycle CLK; this data remains valid for 4 ns after
the falling edge of the second store cycle CLK.
Figures 9 and 10 show the simulated waveforms for
initializing the BIFO. To initialize the BIFO for FIFO
read operations, BYPA must stay Low for two clock
cycles; this ensures that BYPA is stable for the entire
time MR is Low. At any time, the CY7C601 can reset
the BIFO or switch the BIFO direction by writing to or
reading from address $40000004. The control circuitry
decodes this address by looking at address lines A31,
A30, A29, and A2. A2 determines whether the BIFO is
to be initialized or if a normal BIFOoperation is to be
performed. The resetting operation takes priority over
all other operations.
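The address decode described above can be sketched in a few lines of Python. This is only an illustrative software model, not part of the 22V10 design: the function name and return strings are hypothetical, and the sketch assumes that only A31, A30, A29, and A2 take part in the decode, as the text states.

```python
# Assumption: only A31, A30, A29, and A2 take part in the decode, as the
# text states; every other address bit is ignored by this model.
FIFO_SELECT_MASK = 0xE0000000   # A31, A30, A29
FIFO_SELECT_VALUE = 0x40000000  # A31 Low, A30 High, A29 Low
A2_MASK = 0x00000004            # A2 separates reset/switch from data access


def decode_bifo_access(address: int) -> str:
    """Classify a 32-bit address the way the control circuitry decodes it."""
    if (address & FIFO_SELECT_MASK) != FIFO_SELECT_VALUE:
        return "not_selected"
    if address & A2_MASK:
        return "reset_or_switch"   # e.g. $40000004
    return "normal"                # e.g. $40000000


if __name__ == "__main__":
    for addr in (0x40000000, 0x40000004, 0x80000004):
        print(f"${addr:08X}: {decode_bifo_access(addr)}")
```

Because so few address lines are decoded, this model treats many addresses as aliases of $40000000 and $40000004; whether that matters depends on the rest of the system's memory map.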
The state diagram in Figure 11 shows the BIFO
control circuitry's behavior after the BIFO is initialized
to receive data from the CY7C601. The boxes in this
diagram represent the states of the control logic. The
diamonds represent the conditions that produce the
transition from one state to the next.
The BIFO is reset in State 11, and the state machine then moves to
WRITE_IDLE (State 10). The CY7C601 can begin
writing to the BIFO at this time by writing data to
memory location $40000000.
Because of the pipelined operation of the
CY7C601, a latch must hold the write data to meet the
BIFO's write set-up time. The data from the latch is
shown as the DATAL signal in Figure 8. The latch (not
shown in Figure 7) latches data from the CY7C601 and
allows this data to remain on the bus after the CY7C601
removes the data.
The CY7C601 produces valid write data at the falling edge of the clock in the write's second cycle. A 12-ns-delayed LEOE signal from the 22V10 is asserted at
this time to provide the latch enable and output enable
of the latch; the delayed LEOE signal is deasserted at
the end of the cycle. The LEOE signal ensures that the
BIFO data set-up and hold times are met with respect
to STBA. The simulated waveforms for a write cycle appear in Figure 12.
Each time the CY7C601 writes to the BIFO, the
state machine checks to see if the BIFO has become
half full (HF asserted). If HF is asserted, the state
machine moves to State 13 and interrupts the CY7C601.
Figure 12 shows the timing of these activities.
The interrupt at HF is one of three possible interrupts the control circuitry can generate. A complete list
of the interrupt values appears in Table 5.
The most efficient way to transfer data through the
BIFO is based on fixed-length packets, ideally 1 Kbyte
in size. This packet size makes efficient use of the
CY7C439 because 1024-word packets can be transferred without interruption unless the HF flag is asserted.
If the HF flag is asserted, the remainder of the current packet can be transferred, then the CY7C601 discontinues
writing to the CY7C439 until the HF
flag becomes deasserted (FIFO less than half full). The state
machine interrupts the host processor at each of these
two conditions: HF flag asserted when writing and
HF flag deasserted when not writing.
If a design does not need to transfer fixed-length
packets, the E/F flag should be monitored to determine
when to stop writing, and the HF flag should be
monitored to determine when to begin writing. This
prevents continual CY7C601 interrupts. The CY7C601
can write to the BIFO until the BIFO is full, then do
other work while the consumer reads the BIFO below
the half full threshold.

Table 5. 7C601 BIFO Control Logic Interrupts

Name             Cause                             I1/I0
Not_Empty        E/F goes High when not Reading    01
Empty            E/F goes Low when Reading         01
Not Half Full    HF goes High when Writing         10
Half Full        HF goes Low when Writing          10
Bypass Data      BDA goes Low when Writing         11
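The variable-length policy described above (stop writing when the BIFO is full, resume only once it has drained below half full) behaves like a simple hysteresis loop. The following Python sketch is a software model under stated assumptions, not the author's implementation: the function names, the `depth` parameter, the drain rate, and the exact point at which the half-full condition becomes true are all hypothetical.

```python
from typing import List


def producer_should_write(was_writing: bool, full: bool, half_full: bool) -> bool:
    """Hysteresis rule: keep writing until full; resume only below half full."""
    if was_writing:
        return not full
    return not half_full


def simulate(depth: int, steps: int, drain_every: int) -> List[int]:
    """Toy model: the producer writes one word per step when allowed and the
    consumer removes one word every `drain_every` steps. All parameters are
    illustrative assumptions, not CY7C439 figures."""
    level = 0
    writing = True
    trace = []
    for step in range(steps):
        full = level >= depth
        half_full = level >= depth // 2   # assumed half-full threshold
        writing = producer_should_write(writing, full, half_full)
        if writing:
            level += 1
        if step % drain_every == 0 and level > 0:
            level -= 1
        trace.append(level)
    return trace


if __name__ == "__main__":
    print(simulate(depth=8, steps=40, drain_every=3))
```

The point of the two thresholds is that the processor is interrupted only at the full and below-half-full transitions rather than on every word.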
The state machine allows the processor to continue
writing to the BIFO and to acknowledge the interrupt
that was generated when the FIFO became half full.
This ability to continue writing without waiting for an
interrupt acknowledge permits the CY7C601 to mask
out the HF interrupt and continue transferring the current packet. If the CY7C601 acknowledges the interrupt, the state machine moves to State 14. There, the
state machine enables the SPARC processor to continue writing to the BIFO. In this state, the state
machine also continues monitoring the HF flag and
causes an interrupt whenever the flag becomes deasserted (FIFO less than half full).

Figure 11. Writing State Diagram of the 7C601 Control Logic

Figure 12. Write Cycle Simulation Waveforms

Another CY7C439 feature that this design utilizes
is registered bypass. When the 386 has passed a message to the CY7C601, the BDA flag is asserted. If this
happens during the HF_IDLE state, the control circuitry generates an interrupt (I1/I0 = 11). After the
CY7C601 acknowledges the interrupt, the CY7C601 has
the option of writing to the BIFO or reading the bypass
register. If the bypass register is read, the state machine
moves back to the HF_IDLE state. Figure 13 shows a
simulation of these activities.

Figure 13. Bypass Register Read

It is especially important to note that during a
registered bypass read, the control circuitry does not
monitor the MHOLD signal as in other states, but instead drives this signal Low, indicating that the
CY7C601 should wait for the expected data. The
MHOLD signal must be maintained while the missed
data is strobed into the processor with the MDS signal.
For this design, the MHOLD output from the control
circuitry can also be used as the MDS signal.
During the HF_IDLE state, when the BIFO
empties below half full (HF High), the state machine
moves to the WRITE_IDLE state again (State 10).
Writing can continue from this state, as before. If the
BDA flag is asserted during the WRITE_IDLE state,
the state machine operates the same as for a BDA
assertion during the HF_IDLE state.
Figure 14 shows the state diagram of the control logic with the CY7C601 configured as the consumer of BIFO data. Figure 10 shows
a simulation of the CY7C601 processor switching the
direction of the FIFO from writing (A to B) to reading
(B to A). From the READ_IDLE state, the control circuitry interrupts the CY7C601 processor (I1/I0 = 01)
whenever the BIFO becomes not empty (E/F deasserted). After the CY7C601 has acknowledged the interrupt, it can begin reading from the BIFO. Figure 15
shows a timing diagram of a read from the BIFO. The
DATAS waveform shows the CY7C601's data timing requirements, and the DATAF waveform shows the
timing of the data supplied by the BIFO (Figure 8). The
BIFO holds data valid tDVR after the rising edge of
STBA. The CY7C601, on the other hand, requires that
valid data be maintained for at least 5 ns after the rising
edge of the processor clock. The 22V10 control clock
must therefore be delayed with a device such as a gate
that has a delay of 5 ns or even a delay line (CLKD in
Figure 8) to meet the CY7C601's data-hold requirements.
The CY7C601 can continue reading from the BIFO
until the BIFO becomes empty (E/F asserted) or the
CY7C601 has read all the information it needs. When
the BIFO becomes empty, the CY7C601 control circuitry interrupts the processor (I1/I0 = 01); the
CY7C601 cannot read from the BIFO any more until
the BIFO offers the not empty flag.
If the processor reads until the BIFO is empty, the
processor might continue reading after the BIFO
empties due to the latency of the processor responding
to the interrupt. This might cause the processor to read
invalid data. This problem is avoided by one of two
methods: Employ a special value as the last word written to the FIFO to indicate the transfer's end; or have
the producer send the number of data words in the
transmission at the transfer's beginning and have the
consumer continue reading data until the specified
number of words has been read.
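The two end-of-transfer methods just described (a sentinel word, or a word count sent first) can be sketched as follows. This is a minimal Python illustration rather than processor code: `read_word` stands in for a single BIFO read, and the sentinel value is a hypothetical choice that must never occur as ordinary data.

```python
from typing import Callable, List

# Hypothetical end-of-transfer marker; it must never appear as ordinary data.
END_OF_TRANSFER = 0x1FF


def read_until_sentinel(read_word: Callable[[], int]) -> List[int]:
    """Method 1: consume words until the agreed sentinel value is seen."""
    words = []
    while True:
        word = read_word()
        if word == END_OF_TRANSFER:
            return words
        words.append(word)


def read_counted(read_word: Callable[[], int]) -> List[int]:
    """Method 2: the first word gives the number of data words that follow."""
    count = read_word()
    return [read_word() for _ in range(count)]


if __name__ == "__main__":
    stream = iter([10, 20, 30, END_OF_TRANSFER])
    print(read_until_sentinel(lambda: next(stream)))
    stream = iter([2, 10, 20])
    print(read_counted(lambda: next(stream)))
```

In both cases the consumer stops on information carried in the data stream itself, so interrupt latency on the empty flag no longer determines when reading ends.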
From either the READ_IDLE state or from the
NOT_EMPTY_IDLE state, the CY7C601 can write to
the bypass register. This operation passes a message
against the normal FIFO flow to the 386. Figure 15 includes a bypass register write cycle. The control circuitry performs a bypass register write whenever the
CY7C601 is consuming data from the BIFO and the
processor performs a write to address location
$40000000.
CY7C601 BIFO Control Design File
Appendix A lists the design file used to generate the
equations for programming the 22V10. This design file
was created using the LOG/iC software package from
ISDATA. For a detailed description of this software
package, refer to the application note "Using LOG/iC
to Program the CY7C330."
The *IDENTIFICATION section of the design file
gives general information about the file. The *PAL section indicates that this file will be used to program a
22V10. The *X-NAMES section describes inputs to the
22V10, and the *Z-NAMES section describes
registered outputs of the 22V10. The *Z-VALUES section assigns a unique value to all states in the state
machine. These values indicate the signal level on each
output while a current state is active, as well as the
value of any additional state bits (Q[3..1]).

Figure 14. Reading State Diagram of the 7C601 Control Logic

Figure 15. BIFO Read Simulation Waveforms

Figure 16. 80386 BIFO Control Logic Timing Diagram

The *FLOW-TABLE section describes the state
transitions found in the state diagram. Each line of this
section contains the current state, a possible state of the
inputs, and the next state the state machine goes to if
these inputs are true. For example, line one states that
if the state machine is in State 1 (READ_IDLE) and
several conditions are met, then the state machine goes to State 2 (MR_READ_1, a
master reset cycle configuring the BIFO to transfer data
to the CY7C601). The conditions are: RD and WE are High (indicating a microprocessor read), A31 is Low, A30 is
High, A29 is Low (FIFO is selected), A2 is High (FIFO
reset address), and MHOLD is High (7C601 has not
been held). Comparing the *Z-VALUES and
*FLOW-TABLE sections to the state diagrams in
Figures 11 and 14 reveals that the state diagrams map
easily into the LOG/iC design file.
The *STATE-ASSIGNMENT section indicates
that the compiler should use the variables listed in the
*Z-VALUES section for state-assignment values. The
*PIN section assigns the variable names to 22V10 pin
numbers, and the *RUN-CONTROL section configures
the compiler and requests various outputs.
The LOG/iC design file produces fully reduced
equations that accurately describe the state diagram.
The simulation waveforms shown in this application
note are taken from the Cypress PLD Toolkit and
reflect the function of the equations produced with the
ISDATA software.
Figure 17. Reading State Diagram of the 80386 Control Logic

Figure 18. Writing State Diagram of the 80386 Control Logic
80386 BIFO Control Circuitry
A Cypress PAL22V10C-7 implements the 386 control circuitry. The state machine used to control the
BIFO operation uses a 6-ns delay (CLK-ST in Figure
16) of the 386 system clock to capture the address and
control information from the 386. The clock is delayed
in the same manner as the delayed clock for the
CY7C601.
The 386 control logic closely resembles that of the
CY7C601. Because the 386 uses a doubled system clock
(CLK2 in Figure 16), the STBB and BYPB signals must
be strobed for three CLK2 cycles, in contrast to the one
CLK cycle in the CY7C601 state machine. The other
major difference between the two designs is that the 386
has a two-clock-cycle read, in contrast to the one-clock-cycle read for the CY7C601. This design easily meets
the set-up and hold times of the BIFO. In fact, you can
use the design for the 386 control logic at system speeds
as high as 33 MHz.
Unlike the design on the CY7C601 side, the 386
side monitors the BIFO MR line and BYPA signal to
determine the direction for which the BIFO is configured. At start up, the CY7C601 resets the BIFO and sets
the direction (usually configuring itself as producer).
The embedded control circuitry notices the MR signal
pulse Low and that the BYPA signal is High. These two
signal states force the 386 control logic into the read
portion of the state machine (Figure 17). If, at some
later time, the CY7C601 switches the BIFO's direction
so that the CY7C601 becomes the consumer of BIFO
data, the CY7C601 pulses both MR and BYPA Low.
This forces the 386 control circuitry into the write portion of the state machine (Figure 18). Figures 19 and 20
show simulated waveforms for the 386 control logic's master
reset read and master reset write.
The 386 performs a read from the BIFO by driving
W/R and M/IO Low and driving A16 High. As these
states imply, the BIFO lies in the upper 32 Kbytes of
memory-mapped I/O. The simulated waveforms shown
in Figure 21 look similar to those of the CY7C601.
Figure 22 shows the simulated waveforms for a
write to the BIFO. The 386 performs a write in the
same way as a read, with the exception of driving W/R
High for the write.

Figure 19. Master Reset Initialization -- 80386 Consumer

Figure 20. Master Reset Initialization -- 80386 Producer

Figure 21. 80386 Control Logic Read Simulation Waveforms

Figure 22. 80386 Control Logic Write Simulation Waveforms
The 386 control logic has seven separate interrupts
that it sends to the interrupt controller (Table 6). In addition to generating the empty and half-full interrupts
also generated by the CY7C601 control logic, the 386
state machine interrupts the microprocessor whenever
the BIFO direction is switched. Two separate interrupts
ensure that the microprocessor knows the direction in
which the BIFO is switched.
The design file for the 386 control circuitry was
created using LOG/iC and appears in Appendix B. The
format of this file is the same as that of the CY7C601
22V10 control logic. Notice that the *FLOW-TABLE
section contains fewer state transitions because the control logic does not have to decode address lines to
determine if the BIFO direction has switched.

Table 6. 80386 BIFO Control Logic Interrupts

Name             Cause                             I2/I1/I0
Switch Read      MR goes Low                       001
Switch Write     MR and BYPA go Low                010
Not Empty        E/F goes High when not Reading    011
Empty            E/F goes Low when Reading         011
Not Half Full    HF goes High when Writing         100
Half Full        HF goes Low when Writing          100
Bypass Data      BDA goes Low when Writing         111


Appendix A. 7C601 Control Logic Design File

*IDENTIFICATION
BIDIRECTIONAL FIFO CONTROL CIRCUITRY FOR THE CYPRESS 601
SEAN DINGMAN
CYPRESS SEMICONDUCTOR
*PAL
TYPE=PALC22V10;
*X-NAMES
CLK,BDA,HF,EF,RD,WE,A31,A30,A29,A2,INULL,MHOLD,INTACK;
*Y-NAMES
MHOLD;
*Z-NAMES
I1,IO,MR,STBA,BYPA,LEOE,Q3,Q2,Q1;
*Z-VALUES
S1
S2
S3
S4
S5
S6
87
88
S9

0
0
0
0
0
0
0
0
0

0
0
0
1
0
0
1
0
0

1
0
1
1
1
1
1
1
1

1
1
1
1
1
0
1
1
1

1
0
0
1
1
1
1
0
0

0
0
0
0
0
0
0
1
1

000

S10
Sl1
S12
S13
S14
S15
S16
817
S18
S19
820
S21
S22
S23
S24
825
S26

0
0
0
1
0
1
0
1
0
0
0
0
1
0
0
0
0

0
0
0
0
0
0
0
1
0
0
0
0
1
0
0
0
0

1
0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1

1
1
0
1
1
1
0
1
1
1
1
0
1
1
1
1
0

1
1
1
1
1
1
1
1
1
0
0
1
1
1
0
0
1

0
0
1
0
0
0
1
0
0
0
0
1
0
0
0
0
1

010

(A) READ_IDLE
MR READ 1
(B) MR READ "2
(E) INT_NOT_EMPTY
(A) NOT_EMPTY_IDLE .
READ_FROM_NOT_EMPTY_IDLE
(E) INT_EMPTY
(C) WRITE_FROM_NOT_EMPTY_IDLE
(C) WRITE_FROM_READ_IDLE

0--0001
-10-0
0-1

(A) WRITE_IDLE
MR SW WRITE
(D) WRITE FROM WRITE IDLE
(F) INT HF
(A) HF IDLE
(F) INT NOT HF
(D) WRITE_FROM_HF_IDLE
(G) INT_BDA_FROM_HF_IDLE
(A) INT BDA IDLE FROM HF IDLE
(B) READ1 FROM HF IDLE (B) READ2-FROM-HF-IDLE
-(D) WRITE_FROM_INT BDA_IDLE_FROM HF IDLE
(G) INT_BDA_FROM_WRITE_IDLE
(A) INT_BDA_IDLE_FROM_WRITE_IDLE
(B) READ1 FROM WRITE IDLE
(B) READ2=FROM=WRITE=IDLE
(D) WRITE_FROM_INT_BDA_IDLE_FROM_WRITE_IDLE

-00
-0011
-1-01
0-1-0
100
101
-10
1-1-1
110
111
-11

*BOOLEAN-EQUATIONS
MHOLD.OE = MR & STBA & /BYPA & Q3;

*FLOW-TABLE

; READING
RELEVENT

S
S
S
S
S

1
1
1
1
1

S 1

S
S
S
S
S
S
S
S

1
1
1
1
1
1
1
1

S 2
S 2
S 2
S 2

S
S
S
S

2
2
2
2

S
S
S
S
S
S
S
S

3
3
3
3
3
3
3
3

S
S
S
S
S
S
S
S
S
S
S
S
S
S

4
4
4
4
4
4
4
4
4
4
4
4
4
4

,

EF,RD,WE,A31,A30,A29,A2,INULL,MHOLD,INTACK;

X
X
X
X
X
X
X
X
X
X
X

-110101-1-,
-00010101-,
1--1------,
1--00-----,
1--011----,
1--0100---,
0-0010001-,
0--01001--,
0--010000-,
0--1------,
0--00-----,
X 0--011----,
X ---0101-0-,
X -00010111-,

Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y

1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F
F
F
F
F
F
F

2
11
4
4
4
4
9
1
1
1
1
1
1
1

MR READ 1
MR WRITE
READ IDLE
READ_IDLE
READ IDLE
READ_IDLE
READ IDLE
READ_IDLE
READ IDLE
READ IDLE
READ IDLE
READ IDLE
READ. IDLE
READ IDLE

-

INT_NOT_EMPTY -- MISC ADDR SPACE
INT_NOT_EMPTY
INT_NOT_EMPTY
INT_NOT_EMPTY
WRITE_FROM_READ_IDLE
READ IDLE
INULLED
READ IDLE
MHELD
READ IDLE
MISC ADDR SPACE
READ IDLE
READ-IDLE
READ IDLE
HELD SWITCH
READ IDLE
INULLED SWITCH

X
X
X
X
X
X
X
X

-110101-1-,
-00010101-,
---1------,
---00-----,
---011----,
---0100---,
---0101-0-,
-00010111-,

Y
Y
Y
Y
Y
Y
Y
Y

1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F

2
11
3
3
3
3
3
3

MR READ 1
MR WRITE
MR READ 1
MR READ 1
MR READ 1
MR-READ-1
MR READ 1
MR-READ-1

-

MR READ 2
MR:=READ~)
MR READ 2
MR:=READ:=2
MR READ 2
MR-READ-2

X
X

-110101-1-,
-00010101-,
---1------,
---00-----,
---011----,
---0100---,
---0101-0-,
-00010111-,

Y
Y
Y
Y
Y
Y
Y
Y

1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F

2
11
1
1
1
1
1
1

MR READ 1
MR WRITE
MR- READ 2
MR:=READ:=2
MR READ 2
MR:=READ~)
MR READ 2
MR-READ-2

-

IDLE
IDLE
IDLE
IDLE
IDLE
IDLE

-110101-1-,
-00010101-,
---1-----0,
---00----0,
---011---0,
---0100--0,
---0101-00,
-000101110,
---1-----1,
---00----1,
---011---1,
---0100--1,
---0101-01,
-000101111,

Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y

1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F
F
F
F
F
F
F

2
11
4
4
4
4
4
4

MR READ 1
MR WRITE
INT_NOT_EMPTY
INT_NOT_EMPTY
INT_NOT_EMPTY
INT_NOT_EMPTY
INT_NOT_EMPTY
INT_NOT_EMPTY
INT_NOT_EMPTY
INT_NOT_EMPTY
INT_NOT_EMPTY
INT_NOT_EMPTY
INT_NOT_EMPTY
INT_NOT_EMPTY

X
X
X
X
X

X
X
X

X
X
X
X
X
X
X
X
X
X
X
X

5
5
5
5
5
5


READ
READ
READ
READ
- READ
- READ

-

IGNORE MHOLD
AND INULL
FIFO ACTS
INDEPENDENTLY

INT_NOT_EMPTY
INT_NOT_EMPTY
INT_NOT_EMPTY
INT NOT EMPTY
INT NOT EMPTY
INT_NOT_EMPTY
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE


5
5
5
5
5
5
5
5
5

X
X
X
X
X
X
X
X
X

-110101-1-,
-00010101-,
-110100-1-,
-00010001-,
---1------,
---00-----,
---011----,
---010--0-,
-00010-11-,

Y
Y
Y
Y
Y
Y
Y
Y
Y

1,
1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F

6
6
6
6
6
6
6
6
6
6
6
6
6
6
6

X
X
X
X
X
X
X
X
X
X
X
X
X
X
X

-110101-1-,
-00010101-,
1110100-1-,
1000100---,
0--1------,
0--00-----,
0--011----,
0--0100---,
0--0101-0-,
000010111-,
1--1------,
1--00-----,
1--011----,
1--010--0-,
100010-11-,

Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y

1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F

2

F
F
F
F
F

5
5
5
5
5

S 7
S 7

S
S
S

-110101-1-,
-00010101-,
---1-----0,
---00----0,
---011---0,
---0100--0,
---0101-00,
-000101110,
---1-----1,
---00-----1,
---011---1,
---0100--1,
---0101-01,
-000101111,

Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y

1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F
F
F
F
F
F
F

2

7
7
7
7
7
7
7
7
7
7
7
7

X
X
X
X
X
X
X
X
X
X
X
X
X
X

S
S
S
S
S
S
S
S

8
8
8
8
8
8
8
8

X
X
X
X
X
X
X
X

-110101-1-,
-00010101-,
---1------,
---00-----,
---011----,
---0100---,
---0101-0-,
-00010111-,

Y
Y
Y
Y
Y
Y
Y
Y

1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F

2

S
S
S
S
S

s
S
S
S
S
S
S
S
S
S
S
S
S

S
S
S

S
S
S

s
S
S
S
S
S
S
S
S

F

F
F
F
F

2

MR_READ_1

11

~WRITE

6

NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE

8
5
5
5
5
5

11
6

5

7
7
7
F 7
F 7
F 7

11
7
7
7
7
7
7

1
1
1
1
1
1
11
5
5
5
5
5
5

MR_READ_1
MR WRITE
READ_NOT_EMPTY
READ_NOT_EMPTY
READ_NOT_EMPTY
READ_NOT_EMPTY
READ NOT EMPTY
READ=NOT=EMPTY
READ NOT EMPTY
READ=NOT=EMPTY
READ_NOT_EMPTY
READ_NOT_EMPTY
READ_NOT_EMPTY
READ_NOT_EMPTY
READ_NOT_EMPTY
MR_READ- 1
MR_WRITE
INT_EMPTY
INT_EMPTY
INT_EMPTY
INT EMPTY
INT_EMPTY
INT EMPTY
INT EMPTY
INT EMPTY
INT EMPTY
INT EMPTY
INT EMPTY
INT_EMPTY

-

-

-

READ_NOT_EMPTY
WRITE_NOT_EMPTY
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE

-

READ_NOT_EMPTY
NOT_EMPTY_IDLE
INT EMPTY
INT EMPTY
INT_EMPTY
INT_EMPTY
INT_EMPTY
INT_EMPTY
NOT_EMPTY_IDLE
NOT_EMPTY_ IDLE
NOT_EMPTY_IDLE
NOT EMPTY IDLE
NOT=EMPTY=IDLE

-

INT EMPTY
INT_EMPTY
INT_EMPTY
INT EMPTY
READ- IDLE
READ- IDLE
READ IDLE
READ IDLE
READ IDLE
READ IDLE
READ- IDLE
READ IDLE

MR READ- 1
MR_WRITE
WRITE_NOT_EMPTY
WRITE_NOT_EMPTY
WRITE_NOT_EMPTY
WRITE_NOT_EMPTY
WRITE_NOT_EMPTY
WRITE_NOT_EMPTY

-

-

NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE
NOT_EMPTY_IDLE

MIse ADDR
MHELD
INULLED


s
S
S
S
S
S
S
S

9
9
9
9
9
9
9
9

X
X
X
X
X
X
X
X

-110101-1-, y 1, F 2
-00010101-, y 1, F 11
---1------,

y

---00-----,
---011----,
---0100---,
---0101-0-,
-00010111-,

Y

y
y
y
y

1,
1,
1,
1,
1,
1,

F
F
F
F
F
F

1
1
1
1
1
1

MR READ- 1
MR WRITE
WRITE READ IDLE
WRITE READ IDLE
WRITE_READ_ IDLE
WRITE_READ_ IDLE
WRITE READ- IDLE
WRITE_READ_ IDLE

-

READREAD
READ
READREADREAD

IDLE
IDLE
IDLE
IDLE
IDLE
IDLE

; WRITING
RELEVENT=HF,BDA,RD,WE,A31,A30,A29,A2,INULL,MHOLD,INTACK;
S
S
S
S
S
S
S
S
S
S
S
S
S
S
S
S
S

10,
10,
10,
10,
10,

--110101-1-, y 1, F 2

10,

X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X

--00010101-, y 1,
-0--1------, y 1,
-0--00-----, Y 1,
-0--011----, y 1,
-0--0100---, y 1,
-0--0101-0-, y 1,
-000010111-, y 1,
-100010001-, y 1,
-1--1------, y 1,
-1--00-----, Y 1,
-1--011----, y 1,
-1--0100-0-, y 1,
-100010011-, y 1,
-1110100---, y 1,
-1--0101-0-, y 1,
-100010111-, y 1,

F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F

11
22
22
22
22
22
22
12
10
10
10
10
10
10
10
10

MR READ 1
MR WRITE
WRITE IDLE
WRITE_ IDLE
WRITE IDLE
WRITE IDLE
WRITE IDLE
WRITE IDLE
WRITE IDLE
WRITE- IDLE
WRITE_ IDLE
WRITE IDLE
WRITE IDLE
WRITE IDLE
WRITE IDLE
WRITE IDLE
WRITE- IDLE

-

S
S
S
S
S
S
S
S

11,
11,
11,
11,
11,
11,
11,
11,

X
X
X
X
X
X
X
X

--110101-1-, y 1,
--00010101-, y 1,
----1------, y 1,
----00-----, Y 1,
----011----, y 1,
----0100---, y 1,
----0101-0-, y 1,
--00010111-, y 1,

F
F
F
F
F
F
F
F

2
11
10
10
10
10
10
10

MR
MR
MR
MR
MR
MR
MR
MR

WRITE- IDLE
WRITE_ IDLE
WRITE- IDLE
WRITE IDLE
WRITE- IDLE
WRITE IDLE

S
S
S
S
S
S
S
S
S
S
S
S
S
S

12,
12,
12,
12,
12,
12,
12,
12,
12,
12,
12,
12,
12,
12,

X
X
X
X
X
X
X
X
X
X
X
X
X
X

--110101-1-, y 1, F 2
--00010101-, y 1, F 11
0---1------, y 1, F 13
0---00-----, Y 1, F 13
0---011----, y 1, F 13
0---0100---, y 1, F 13
0---0101-0-, y 1, F 13
0-00010111-, y 1, F 13
1---1------, y 1, F 10
1---00-----, y 1, F 10
1---011----, y 1, F 10
1---0100---, y 1, F 10
1---0101-0-, y 1, F 10
1-00010111-, y 1, F 10

10,
10,
10,

10,
10,
10,

10,
10,
10,
10,
10,

READ 1
WRITE
WRITE WRITE WRITE WRITE WRITE WRITE -

INT_BDA_FROM_WRITE_ IDLE
INT_BDA_FROM_WRITE_ IDLE
INT_BDA_FROM_WRITE_ IDLE
INT_BDA_FROM_WRITE IDLE
INT_BDA_FROM_WRITE_IDLE
INT_BDA_FROM_WRITE_ IDLE
WRITE FROM WRITE IDLE
WRITE IDLE
WRITE IDLE
WRITE IDLE
WRITE IDLE
WRITE IDLE
WRITE IDLE
WRITE IDLE
WRITE IDLE
-

MR READ 1
MR WRITE
WRITE_FROM_WRITE_ IDLE
WRITE_FROM_WRITE_IDLE
WRITE_FROM_WRITE- IDLE
WRITE_FROM_WRITE IDLE
WRITE_FROM_WRITE IDLE
WRITE_FROM_WRITE_ IDLE
WRITE FROM WRITE- IDLE
WRITE_FROM_WRITE_IDLE
WRITE_FROM_WRITE_IDLE
WRITE FROM WRITE IDLE
WRITE=FROM=WRITE= IDLE
WRITE_FROM_WRITE IDLE

-


-

-

-

INT_HF
INT HF
INT HF
INT HF
INT HF
INT HF
WRITE IDLE
WRITE IDLE
WRITE_ IDLE
WRITE IDLE
WRITE IDLE
WRITE IDLE

-

S
S
S
S
S
S
S
S
S
S
S
S
S
S

13,
13,
13,
13,
13,
13,
13,
13,
13,
13,
13,
13,
13,
13,

X
X
X
X
X
X
X
X
X
X
X
X
X
X

--110101-1-,
--00010101-,
----1-----0,
----00----0,
----011---0,
----0100--0,
----0101-00,
--000101110,
----1-----1,
----00----1,
----011---1,
----0100--1,
----0101-01,
--000101111,

y
y
y
Y
y
y
y
y
y
y
y
y
y
y

1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F
F
F
F
F
F
F

2
11
13
13
13
13
13
13
14
14
14
14
14
14

MR_READ_1
MR_WRITE
INT HF - INT_HF
INT HF - INT HF
INT HF - INT_HF
INT_HF - INT_HF
INT.HF - INT HF
INT HF - INT HF
INT HF - HF IDLE
INT HF - HF IDLE
INT HF - HF IDLE
INT HF - HF IDLE
INT HF - HF IDLE
INT HF - HF IDLE

S
S
S
S
S
S
S
S
S
S
S
S
S
S
S
S
S
S
S
S
S

14,
14,
14,
14,
14,
14,
14,
14,
14,
14,
14,
14,
14,
14,
14,
14,
14,
14,
14,
14,
14,

X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X

--110101-1-,
--00010101-,
1---1------,
1---00-----,
1---011----,
1---0100---,
1---0101:-0-,
1-00010111-,
00--1------,
00--00-----,
00--011----,
00--0100---,
00--0101-0-,
0000010111-,
0100010001-,
01--1------,
01--00-----,
01--011----,
01110100---,
01--010--0-,
0100010-11-,

y
y
y
y
y
y
y
y
y
Y
y
y
y
y
y
y
y
y
y
y
y

1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F

2
11
15
15
15
15
15
15
17
17
17
17
17
16
14
14
14
14
14
14

MR READ 1
MR WRITE
HF IDLE HF IDLE HF IDLE HF IDLE HF- IDLE HF IDLE HF_ IDLE HF IDLE HF IDLE HF IDLE HF IDLE HF- IDLE HF- IDLE HF IDLE HF IDLE HF IDLE HF IDLE HF IDLE HF- IDLE -

--110101-1-,
--00010101-,
----1-----0,
----00----0,
----011---0,
----0100--0,
----0101-00,
--000101110,
----1-----1,
----00----1,
----011---1,
----0100--1,
----0101-01,
--000101111,

y
y
y
Y
y
y
y
y
y
y
y
y
y
y

1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F
F
F
F
F
F
F

2
11
15
15
15
15
15
15
10
10
10
10
10
10

MR READ_1
, MR..:...WRITE
INT_NOT_HF
INT_NOT_HF
INT_NOT_HF
INT_NOT_HF
,- INT_NOT_HF
INT_NOT_HF
INT_NOT_HF
INT_NOT_HF
INT_NOT_HF
INT_NOT_HF
INT_NOT_HF
INT_NOT_HF

S 15,
S 15,

s
S
S
S
S
S
S
S
S
S
S
S

15,
15,
15,
15,
15,
15,
15,
15,
15,
15,
15,
15,

X
X
X
X
X
X
X
X
X
X
X
X
X
X

17


INT_NOT_HF
INT_NOT_HF
INT-,NOT_HF
I NT_NO T_HF
INT_NOT_HF
INT_NOT_HF
INT_BDA_FROM_HF_IDLE
INT_BDA_FROM_HF_ IDLE
INT_BDA_FROM_HF_ IDLE
INT_BDA_FROM_HF_ IDLE
INT_BDA_FROM_HF_ IDLE
INT_BDA_FROM_HF_IDLE
WRITE_FROM_HF_IDLE
HF_IDLE
HF_ IDLE
HF_IDLE
HF_IDLE
HF_ IDLE
HF IDLE

-

-

-

INT NOT HF
INT_NOT_HF
INT NOT HF
INT NOT HF
INT=NOT=HF
INT_NOT_HF
WRITE IDLE
WRITE IDLE
WRITE_ IDLE
WRITE_ IDLE
WRITE IDLE
WRITE_ IDLE


S
S
S

16,
16,
16,
16,
16,
16,
16,
16,

X
X
X
X
X
X
X
X

--110101-1-,
--00010101-,
----1------,
----00-----,
----011----,
----0100---,
----0101-0-,
--00010111-,

y
y
y
Y
y
y
y
y

1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F

2
11
14
14
14
14
14
14

MR READ 1
MR WRITE
WRITE- FROMHF- IDLE
WRITE_FROM_HF IDLE
WRITE_FROM_HF_ IDLE
WRITE FROM HF- IDLE
WRITE_FROM_HF_ IDLE
WRITE_FROM_HF- IDLE

S
S
S
S
S
S
S
S
S
S
S
S
S
S

17,
17,
17,
17,
17,
17,
17,
17,
17,
17,
17,
17,
17,
17,

X
X
X
X
X
X
X
X
X
X
X
X
X
X

--110101-1-,
--00010101-,
----1-----0,
----00----0,
----011---0,
----0100--0,
----0101-00,
--000101110,
----1-----1,
----00----1,
----011---1,
----0100--1,
----0101-01,
--000101111,

y
y
y
Y
y
y
y
y
Y
y
y
y
y
y

1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F
F
F
F
F
F
F

2
11
17
17
17
17
17
17
18
18
18
18
18
18

MR READ- 1
MR WRITE
INT BDA FROM HF IDLE - I NT_BDA_FROM_HF- IDLE
INT BDA FROM HF- IDLE
INT_BDA_FROM_HF_ IDLE
INT_BDA_FROM_HF- IDLE - INT_BDA_FROM_HF- IDLE
INT_BDA_FROM_HF IDLE - INT- BDA- FROM-HF- IDLE
I NT_BDA_FROM_HF_ IDLE - INT- BDA- FROM-HF- IDLE
INT_BDA_FROM_HF_IDLE - INT- BDA- FROM- HF- IDLE
INT_BDA_IDLE_FROM_HF- IDLE
INT- BDA- FROM-HF-IDLE
INT_BDA_FROM_HF_ IDLE - INT- BDA- IDLE- FROM- HF- IDLE
INT BDA FROM HF- IDLE - INT- BDA- IDLE- FROM-HF- IDLE
- INT_BDA_FROM_HF IDLE - INT- BDA- IDLE- FROM-HF-IDLE
I NT_BDA_FROM_HF_ IDLE - INT BDA IDLE FROM HF IDLE
INT- BDA- FROMHF- IDLE -iNT- BDA- IDLE- FROM- HF- IDLE

S
S
S
S
S
S
S
S
S

18,
18,
18,
18,
18,
18,
18,
18,
18,

X
X
X
X
X
X
X
X
X

--110101-1-,
--00010101-,
--110100-1-,
--00010001-,
----1------,
----00-----,
----011----,
----010--0-,
--00010-11-,

y
y
y
y
y
Y
y
y
y

1, F
1~ F
0, F
1, F
1, F
1, F
1, F
1, F
1, F

2
11
19
21
18
18
18
18
18

MR READ 1
MR WRITE
INT BDA IDLE_FROM_HF_IDLE - READ1_FROM_BDA_IDLE
INT_BDA_IDLE_FROM_HF_ IDLE - WRITE FROM BDA IDLE
INT_BDA_IDLE_FROM_HF_ IDLE-INT_BDA_IDLE_FROM_HF_IDLE
INT_BDA_IDLE_FROM_HF- IDLE - INT_BDA_IDLE_FROM_HF- IDLE
INT BDA IDLE FROM HF- IDLE - INT- BDA- IDLE- FROM- HF- IDLE
INT BDA IDLE FROM HF- IDLE - INT- BDA- IDLE- FROM- HF- IDLE
INT- BDA- IDLE- FROM- HF- IDLE - INT- BDA- IDLE- FROM- HF- IDLE

S 19,

X

-----------

Y 0,

F 20

READ1_FROM BDA IDLE

-

HF IDLE

S 20,

X

-----------

Y 0,

F 14

READ2 FROM BDA IDLE

-

HF- IDLE

21,
21,
21,
21,
21,
21,
21,
21,

X
X
X
X
X
X
X
X

--110101-1-,
--00010101-,
----1------,
----00-----,
----011----,
----0100---,
----0101-0-,
--00010111-,

y
y
y
Y
y
y
y
y

F
F
F
F
F
F
F
F

MR READ 1
MR WRITE
WRITE_FROM_BDA IDLE
WRITE_FROM_BDA_IDLE
WRITE_FROM_BDA_IDLE
WRITE_FROM_BDA_IDLE
WRITE_FROM_BDA_IDLE
WRITE_FROM_BDA_IDLE

-

INT_BDA_IDLE_FROM_HF- IDLE
INT_BDA_IDLE_FROM_HF- IDLE
INT_BDA_IDLE_FROM_HF IDLE
INT_BDA_IDLE_FROM_HF- IDLE
INT_BDA_IDLE_FROM_HF- IDLE
INT_BDA_ID LE_F ROM_HF- IDLE

S
S
S
S
S

S
S
S
S
S
S
S
S

1,
1,
1,
1,
1,
1,
1,
1,

2
11
18
18
18
18
18
18

-

HF- IDLE
HF IDLE
HF- IDLE
HF- IDLE
HF- IDLE
HF- IDLE

-

-

-

Designing with the CY7C439 Bidirectional FIFO (BIFO)

Appendix A. 7C601 Control Logic Design File (Continued)

S
S
S
S
S
S
S
S
S
S
S
S
S
S

22,
22,
22,
22,
22,
22,
22,
22,
22,
22,
22,
22,
22,
22,

X --110101-1-,
X --00010101-,
X ----1-----0,
X ----00----0,
X ----011---0,
X ----0100--0,
X ----0101-00,
X --000101110,
X ----1-----1,
X ----00----1,
X ----011---1,
X ----0100--1,
X ----0101-01,
X --000101111,

Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y

1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F
F
F
F
F
F
F

2
11
22
22
22
22
22
22
23
23
23
23
23
23

S
S
S
S
S
S
S
S
S

23,
23,
23,
23,
23,
23,
23,
23,
23,

X --110101-1-,
X --00010101-,

Y
Y
Y
Y
Y
Y
Y
Y
Y

1,
1,
0,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F
F

2
11
24
26
23
23
23
23
23

X --110100-1-,
X --00010001-,
X ----1------,
X ----00-----,
X ----011----,
X ----010--0-,
X --00010-11-,

S 24, X -----------, Y 0, F 25

S 25, X -----------, Y 0, F 10

S
S
S
S
S
S
S
S

Y
Y
Y
Y
Y
Y
Y
Y

26,
26,
26,
26,
26,
26,
26,
26,

X --110101-1-,
X --00010101-,

X
X
X
X
X
X

----1------,
----00-----,
----011----,
----0100---,
----0101-0-,
--00010111-,

1,
1,
1,
1,
1,
1,
1,
1,

F
F
F
F
F
F
F
F

2
11
23
23
23
23
23
23

MR READ 1
MR WRITE
INT_BDA_FROM_WRITE_IDLE
INT_BDA_FROM_WRITE_IDLE
INT_BDA_FROM_WRITE_IDLE
INT_BDA_FROM_WRITE_IDLE
INT_BDA_FROM_WRITE_IDLE
INT_BDA_FROM_WRITE_IDLE
INT_BDA_FROM_WRITE_IDLE
INT_BDA_FROM_WRITE_IDLE
INT_BDA_FROM_WRITE_IDLE
INT_BDA_FROM_WRITE_IDLE
INT_BDA_FROM_WRITE_IDLE
INT_BDA_FROM_WRITE_IDLE

INT_BDA_FROM_WRITE_IDLE
- INT_BDA_FROM_WRITE_IDLE
- INT_BDA_FROM_WRITE_IDLE
- INT_BDA_FROM_WRITE_IDLE
- INT_BDA_FROM_WRITE_IDLE
- INT_BDA_FROM_WRITE_IDLE
-INT_BDA_IDLE_FROM_WRITE_IDLE
-INT_BDA_IDLE_FROM_WRITE_IDLE
-INT_BDA_IDLE_FROM_WRITE_IDLE
-INT_BDA_IDLE_FROM_WRITE_IDLE
-INT_BDA_IDLE_FROM_WRITE_IDLE
-INT_BDA_IDLE_FROM_WRITE_IDLE

MR READ 1
MR WRITE
INT BDA IDLE_FROM_WRITE_IDLE - READ1_FROM_BDA_IDLE
INT_BDA_IDLE_FROM_WRITE IDLE - WRITE_FROM_BDA_IDLE
;INT_BDA_IDLE_FROM_WRITE_IDLE INT_BDA_IDLE_FROM_WRITE_IDLE
INT_BDA_IDLE_FROM_WRITE_IDLE
INT_BDA_IDLE_FROM_WRITE_IDLE
INT_BDA_IDLE_FROM_WRITE_IDLE
INT_BDA_IDLE_FROM_WRITE_IDLE

MR READ 1
MR WRITE
WRITE_FROM_BDA IDLE
WRITE FROM BDA IDLE
WRITE_FROM_BDA_IDLE
WRITE_FROM_BDA_IDLE
WRITE_FROM_BDA_IDLE
WRITE_FROM_BDA_IDLE

-

INT_BD~IDLE_FROM_HF_IDLE

INT BDA IDLE FROM HF IDLE
-INT_BDA_IDLE_FROM_HF_IDLE
INT_BDA_IDLE_FROM_HF_IDLE
INT_BDA_IDLE_FROM_HF_IDLE
INT_BDA_IDLE_FROM_HF_IDLE

*STATE-ASSIGNMENT
Z-VALUES
*PIN
CLK = 1, BDA = 2, HF = 3, EF = 4, RD = 5, WE = 6, A31 = 7, A30 = 8,
A29 = 9, A2 = 10, INULL = 11, MHOLD = 14, INTACK = 13,
Q3 = 21, Q1 = 18, LEOE = 15, I1 = 17, BYPA = 19, I0 = 20,
Q2 = 16, MR = 22, STBA = 23;

*RUN-CONTROL
LISTING = LONG,SYMBOL-TABLE,EQUATIONS,PINOUT,PLOT,FUSEPLOT;
PROGFORMAT = L-EQUATIONS,JEDEC;
OPTIMIZATION = p-terms;
*END


Appendix B. 80386 Control Logic Design File

*IDENTIFICATION
BIDIRECTIONAL FIFO CONTROL CIRCUITRY FOR THE INTEL 386
SEAN DINGMAN
CYPRESS SEMICONDUCTOR
*PAL
TYPE=PALC22V10;
*X-NAMES
CLK,RESET,MR,BDA,BYPA,HF,EF,RW,MIO,A16,ADS,INTA;
*Z-NAMES
I2,I1,I0,STBB,BYPB,Q5,Q4,Q3,Q2,Q1;
*Z-VALUES
S1
S2
S3
S4
S5
S6
S7
S8
S9
S10
S11
S12
S13
S14

0
0
0
0
0
0
0
0
0
0
0
0
0
0

0
0
1
0
0
0
0
1
0
0
0
0
0
0

0
1
1
0
0
0
0
1
0
0
0
0
0
0

1
1
1
1
0
0
0
1
1
1
1
1
1
1

1
1
1
1
1
1
1
1
0
0
0
0
0
0

S15
S16
S17
S18
S19
S20
S21
S22
S23
S24
S25
S26
S27
S28
S29
S30
S31
S32
S33
S34
S35

0
0
0
0
0
1
0
1
0
0
0
1
0
0
0
0
1
0
0
0
0

0
1
0
0
0
0
0
0
0
0
0
1
0
0
0
0
1
0
0
0
0

0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
1
0
0
0
0

1
1
0
0
0
1
1
1
0
0
0
1
1
1
1
1
1
1
1
1
1

1
1
1
1
1
1
1
1
1
1
1
1
1
0
0
0
1
1
0
0
0

--000
0-----001
-0000
-0001
-0010
1----0000
-0001
-0010
-0011
-0100
-0101
--010
-0011
-0100
-0101
0-----011
1----0110
-0111
-1000
0-----100
-0110
-0111
-1000
1-----101
-1001
-1010
-1011

(A) READ_IDLE (6)
INT_SW_READ
(D) INT_NOT_EMPTY
(A) NOT_EMPTY_IDLE
(B) READ_FROM_NOT_EMPTY_IDLE_1 (9)
(B) READ_FROM_NOT_EMPTY_IDLE_2
(B) READ_FROM_NOT_EMPTY_IDLE_3
(D) INT_EMPTY
(C) WRITE_FROM_NOT_EMPTY_IDLE_1 (12)
(C) WRITE_FROM_NOT_EMPTY_IDLE_2
(C) WRITE_FROM_NOT_EMPTY_IDLE_3
(C) WRITE_FROM_READ_IDLE_1
(C) WRITE_FROM_READ_IDLE_2
(C) WRITE_FROM_READ_IDLE_3
(A) WRITE_IDLE
INT_SW_WRITE
(B) WRITE_FROM_WRITE_IDLE_1
(B) WRITE_FROM_WRITE_IDLE_2
(B) WRITE_FROM_WRITE_IDLE_3
(E) INT_HF
(A) HF_IDLE
(E) INT_NOT_HF
(B) WRITE_FROM_HF_IDLE_1
(B) WRITE_FROM_HF_IDLE_2
(B) WRITE_FROM_HF_IDLE_3
(F) INT_BDA_FROM_HF_IDLE
(A) INT_BDA_IDLE_FROM_HF_IDLE
(C) READ_FROM_HF_IDLE_1
(C) READ_FROM_HF_IDLE_2
(C) READ_FROM_HF_IDLE_3
(F) INT_BDA_FROM_WRITE_IDLE
(A) INT_BDA_IDLE_FROM_WRITE_IDLE
(C) READ_FROM_WRITE_IDLE_1
(C) READ_FROM_WRITE_IDLE_2
(C) READ_FROM_WRITE_IDLE_3

*FLOW-TABLE
; RESETTING STATES
RELEVENT = RESET,MR,BYPA
S[1..26], X 0--, F 1      READ IDLE ON RESET
S[1..26], X 100, F 16     SWITCH DIRECTIONS B-A
S[1..26], X 101, F 2      SWITCH DIRECTIONS A-B

RELEVENT
RELEVENT

MR = 1;
RESET = 1;

; READING
RELEVENT = EF,RW,MIO,A16,ADS,INTA;

S
5
5
5

1
1
1
1

READ_IDLE
READ_IDLE
READ IDLE
READ_IDLE

- INT_NOT_EMPTY
- INT_NOT_EMPTY
- WRITE BYPA55 FROM READ IDLE 1
- READ~IDLE
-

S 2
5 2

INT_5W_READ - READ_IDLE
INT_5W_READ _. INT_5W_READ

5 3
5 3

INT_NOT_EMPTY INT_NOT_EMPTY -

5 4
5 4
5 4

NOT EMPTY IDLE - READ FROM NOT EMPTY IDLE 1
NOT_EMPTY:=IDLE - WRITE_FROM_NOT_EMPTY_IDLE_l
NOT_EMPTY_IDLE - NOT_EMPTY_IDLE

5 7
S 7

READ_FROM_NOT_EMPTY_IDLE_3 READ_FROM_NOT_EMPTY_IDLE_3 -

5 8

INT_EMPTY INT_EMPTY -

S 8
S 9

INT_EMPTY
NOT_EMPTY_IDLE

READ_IDLE
INT_EMPTY

F 10

X

5 10, X

NOT_EMPTY_IDLE
NOT_EMPTY_IDLE

------

5 11, X

F 4

WRITE_FROM_NOT_EMPTY_IDLE_3 -NOT_EMPTY_IDLE

5 12,

X

F 13

WRITE_FROM_READ_IDLE_l -

WRITE FROM READ IDLE_2

5 13, X

F 14

WRITE_FROM_READ_IDLE_2 -

WRITE_FROM_READ_IDLE_3

5 14, X

F 1

WRITE_FROM_READ_IDLE_3 -

READ_IDLE


;WRITING
RELEVENT = HF,BDA,RW,MIO,A16,ADS,INTA ;

S 15, X --1010-, F 17     WRITE_IDLE - WRITE_FROM_WRITE_IDLE_1
S 15, X -0-1---, F 31     WRITE_IDLE - INT_BDA_FROM_WRITE_IDLE
S 15, X -0-00--, F 31     WRITE_IDLE - INT_BDA_FROM_WRITE_IDLE
S 15, XREST,     F 15     WRITE_IDLE - WRITE_IDLE

S 16, X ------1, F 15     INT_SW_WRITE - WRITE_IDLE
S 16, X ------0, F 16     INT_SW_WRITE - INT_SW_WRITE

S 17, X -------, F 18

S 18, X -------, F 19

S 19, X 1------, F 15     WRITE_FROM_WRITE_IDLE_3 - WRITE_IDLE
S 19, X 0------, F 20     WRITE_FROM_WRITE_IDLE_3 - INT_HF

S 20, X ------1, F 21     INT_HF - HF_IDLE
S 20, X ------0, F 20     INT_HF - INT_HF

S 21, X 1--1---, F 22     HF_IDLE - INT_NOT_HF
S 21, X 1--00--, F 22     HF_IDLE - INT_NOT_HF
S 21, X --1010-, F 23     HF_IDLE - WRITE_FROM_HF_IDLE_1
S 21, X 00-1---, F 26     HF_IDLE - INT_BDA_FROM_HF_IDLE
S 21, X 00-00--, F 26     HF_IDLE - INT_BDA_FROM_HF_IDLE
S 21, X --0010-, F 27     HF_IDLE - READ_FROM_HF_IDLE_1
S 21, XREST,     F 21     HF_IDLE - HF_IDLE

S 22, X ------0, F 22     INT_NOT_HF - INT_NOT_HF
S 22, X ------1, F 15     INT_NOT_HF - WRITE_IDLE

S 23, X -------, F 24     WRITE_FROM_HF_IDLE_1 - WRITE_FROM_HF_IDLE_2

S 24, X -------, F 25

S 25, X -------, F 21

S 26, X ------0, F 26     INT_BDA_FROM_HF_IDLE - INT_BDA_FROM_HF_IDLE
S 26, X ------1, F 27     INT_BDA_FROM_HF_IDLE - INT_BDA_IDLE_FROM_HF_IDLE

S 27, X --0010-, F 28     INT_BDA_IDLE_FROM_HF_IDLE - READ_FROM_HF_IDLE_1
S 27, XREST,     F 27     INT_BDA_IDLE_FROM_HF_IDLE - INT_BDA_IDLE_FROM_HF_IDLE

S 28, X -------, F 29

S 29, X -------, F 30

S 30, X -------, F 21

S 31, X ------0, F 31     INT_BDA_FROM_WRITE_IDLE - INT_BDA_FROM_WRITE_IDLE
S 31, X ------1, F 32     INT_BDA_FROM_WRITE_IDLE - INT_BDA_IDLE_FROM_WR_IDLE

S 32, X --0010-, F 33     INT_BDA_IDLE_FROM_WR_IDLE - READ_FROM_WR_IDLE_1
S 32, XREST,     F 32     INT_BDA_IDLE_FROM_WR_IDLE - INT_BDA_IDLE_FROM_WR_IDLE

S 33, X -------, F 34

S 34, X -------, F 35

S 35, X -------, F 15

*STATE-ASSIGNMENT
Z-VALUES
*PIN
CLK = 1, RESET = 2, MR = 3, BDA = 4, BYPA = 5, HF = 6, EF = 7,
RW = 8, MIO = 9, A16 = 10, ADS = 11, INTA = 13, STBB = 14, BYPB = 15,
I2 = 16, I1 = 17, I0 = 18, Q1 = 19, Q2 = 20, Q3 = 21, Q4 = 22, Q5 = 23;
*RUN-CONTROL
LISTING = LONG,SYMBOL-TABLE,EQUATIONS,PINOUT,PLOT,FUSEPLOT;
PROGFORMAT = L-EQUATIONS,JEDEC;
OPTIMIZATION = p-terms;
*END


Microcoded System Performance

This application note describes the performance of
Cypress's microcoded processor devices in 16- and 32-bit
processor configurations. Included is a critical-path
timing analysis of the data loop and control loop for
generic 16- and 32-bit systems. A discussion of the speed
and power advantages offered by CY7C9101 systems is
also presented.

The Cypress microcoded processor family is the
fastest available. Increasing functional integration is evident in the CY7C9101 16-bit slice, which is the
equivalent of four CY7C901s (4-bit slices) and a 2902
carry lookahead generator. By placing these functions on
a single chip, Cypress has reduced the interconnect delays
between chips. Significant improvement in overall system

Data Loop
  CY7C245       Clock to Output       12
  CY7C901       A, B to G, P          28
  Carry Logic   G0, P0 to Cn+z         9
  CY7C901       Cn to Worst Case      18
  Register      Setup                  4
  Total                               71 ns

Control Loop
  CY7C245       Clock to Output       12
  MUX           Select to Output      12
  CY7C910       CC to Output          22
  CY7C245       Access Time           20
  Total                               66 ns

Minimum Clock Period = 71 ns

Figure 1. CY7C901-Based 16-Bit System (Pipelined System, Add without Simultaneous Shift)

Data Loop
  CY7C245       Clock to Output       12
  CY7C901       A, B to G, P          28
  Carry Logic   G0, P0 to G, P        12
                G0, P0 to Cn+x         9
                Cn to Cn+x, y, z      14
  CY7C901       Cn to Worst Case      18
  Register      Setup                  4
  Total                               97 ns

Control Loop
  CY7C245       Clock to Output       12
  MUX           Select to Output      12
  CY7C910       CC to Output          22
  CY7C245       Access Time           20
  Total                               66 ns

Minimum Clock Period = 97 ns

Figure 2. CY7C901-Based 32-Bit System (Pipelined System, Add without Simultaneous Shift)

Data Loop
  CY7C245       Clock to Output             12
  CY7C9101      A, B to Y, Cn+16, OVR       37
  Register      Setup                        4
  Total                                     53 ns

Control Loop
  CY7C245       Clock to Output             12
  MUX           Select to Output            12
  CY7C910       CC to Output                22
  CY7C245       Access Time                 20
  Total                                     66 ns

Minimum Clock Period = 66 ns

Figure 3. CY7C9101-Based 16-Bit System (Pipelined System, Add without Simultaneous Shift)

Data Loop
  CY7C245       Clock to Output       12
  CY7C9101      A, B to Cn+16         35
  CY7C9101      Cn to Worst Case      24
  Register      Setup                  4
  Total                               75 ns

Control Loop
  CY7C245       Clock to Output       12
  MUX           Select to Output      12
  CY7C910       CC to Output          22
  CY7C245       Access Time           20
  Total                               66 ns

Minimum Clock Period = 75 ns

Figure 4. CY7C9101-Based 32-Bit System (Pipelined System, Add without Simultaneous Shift)
throughput, reduced board space, and reduced power requirements are among the advantages of CY7C9101-based
systems over CY7C901-based systems.
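
The minimum clock period quoted in each of Figures 1 through 4 is simply the longer of the two loop delays. The following Python sketch reproduces that arithmetic, assuming the per-stage delays (in ns) as read from the figures; the dictionary keys and the function name are illustrative only and do not come from any Cypress tool.

```python
# Worst-case stage delays in ns, as listed in Figures 1 through 4.
DATA_LOOPS = {
    "CY7C901 16-bit": [12, 28, 9, 18, 4],
    "CY7C901 32-bit": [12, 28, 12, 9, 14, 18, 4],
    "CY7C9101 16-bit": [12, 37, 4],
    "CY7C9101 32-bit": [12, 35, 24, 4],
}

# The control loop (register, MUX, sequencer, PROM) is the same in all four systems.
CONTROL_LOOP = [12, 12, 22, 20]


def min_clock_period(data_loop, control_loop):
    """Return the minimum clock period: the slower of the two loops sets the cycle."""
    return max(sum(data_loop), sum(control_loop))


periods = {
    name: min_clock_period(stages, CONTROL_LOOP)
    for name, stages in DATA_LOOPS.items()
}

for name, period in periods.items():
    print(f"{name}: {period} ns")
```

Note that only the 16-bit CY7C9101 system is limited by its control loop; in the other three configurations the data loop is the critical path.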

Minimum Cycle Time Calculations

Power is an important consideration in microcoded
systems. For an equivalent system, the CY7C901 offers
substantial savings in power over bipolar devices. Coupled
with other low-power Cypress CMOS devices, the power
savings over bipolar is clearly evident.
The functional integration of four CY7C901s with
carry lookahead gives the CY7C9101 even greater advantages.
The number of ALU elements is reduced by a
factor of four, and there is a reduction in the carry logic
needed. A comparison between bipolar, CY7C901-based,
and CY7C9101-based systems appears in Table 1. Note
that in this comparison the devices common to all 16- and
32-bit system configurations are included in the Icc
computations.
Cypress CMOS devices offer the fastest microcoded
solutions, while keeping power consumption to reasonable
levels. The CY7C901-based systems beat bipolar's fastest
devices in a speed comparison, while consuming roughly
one-third the power. Upgrading to the CY7C9101 results
in even faster systems, at close to one-third the power of
the CY7C901-based systems. This comparison is illustrated
in Table 2.

Table 1. Icc Calculations

Icc Calculations for 16-Bit Systems (mA)
                            Cypress CMOS
                       CY7C901     CY7C9101
                        Based       Based      Bipolar
Sequencer                100         100         340
Registered PROM           90          90         185
Carry Logic              110          --         110
ALU Elements
  4 x 4-Bit Slice        320                    1060
  16-Bit Slice                        75
Total                    620         265        1695

Icc Calculations for 32-Bit Systems (mA)
                            Cypress CMOS
                       CY7C901     CY7C9101
                        Based       Based      Bipolar
Sequencer                100         100         340
Registered PROM           90          90         185
Carry Logic              330         110         330
ALU Elements
  8 x 4-Bit Slice        640                    2120
  2 x 16-Bit Slice                   150
Total                   1160         450        2975

Table 2. Speed/Power Comparison of Bipolar, CY7C901, CY7C9101

                  Minimum Clock Cycle (ns)          Maximum Icc (mA)
                 Bipolar  CY7C901  CY7C9101    Bipolar  CY7C901  CY7C9101
16-Bit Systems      85       71       66         1695      620      265
32-Bit Systems     111       97       75         2975     1160      450
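
The Icc totals in Tables 1 and 2 are the sums of the per-component currents. The Python sketch below recomputes them and expresses each total as a fraction of the bipolar figure, assuming the component values as read from Table 1; the data layout and names are illustrative, and the missing carry-logic entry for the 16-bit CY7C9101 system is treated as 0 mA.

```python
# Component supply currents in mA, as read from Table 1.
ICC_MA = {
    "16-bit": {
        "Bipolar": {"sequencer": 340, "registered PROM": 185, "carry logic": 110, "ALU": 1060},
        "CY7C901": {"sequencer": 100, "registered PROM": 90, "carry logic": 110, "ALU": 320},
        # No external carry logic is listed for the 16-bit CY7C9101 system.
        "CY7C9101": {"sequencer": 100, "registered PROM": 90, "carry logic": 0, "ALU": 75},
    },
    "32-bit": {
        "Bipolar": {"sequencer": 340, "registered PROM": 185, "carry logic": 330, "ALU": 2120},
        "CY7C901": {"sequencer": 100, "registered PROM": 90, "carry logic": 330, "ALU": 640},
        "CY7C9101": {"sequencer": 100, "registered PROM": 90, "carry logic": 110, "ALU": 150},
    },
}


def total_icc(parts):
    """Sum the component currents of one system configuration."""
    return sum(parts.values())


totals = {
    width: {tech: total_icc(parts) for tech, parts in techs.items()}
    for width, techs in ICC_MA.items()
}

for width, row in totals.items():
    for tech, total in row.items():
        ratio = total / row["Bipolar"]
        print(f"{width} {tech}: {total} mA ({ratio:.0%} of bipolar)")
```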


Systems with CMOS 16-bit Microprogrammed ALUs

This application note shows how to improve
reliability, flexibility, and speed by diagramming timing
for the CY7C9116 and CY7C9117 arithmetic and logic
units (ALUs). Also highlighted are applications that
benefit significantly from these devices' architecture and
CMOS technology.
In the past, the dominant use of microprogrammed
ALUs has been as general-purpose data processors in
computers. Using microprogrammed machines in these
applications improved performance because general-purpose
microprocessors were too slow. In addition to
allowing custom instruction sets, microprogrammed
processors provided the only way to achieve the desired
number of MIPS (millions of instructions per second).

With the advent of high-performance, 30-MIPS
reduced instruction set computers (RISC), however,
microprogrammed ALUs have relinquished their hold
on general-purpose data-processor applications and
found homes as custom processors or special-purpose
controllers.

The CY7C9116/7
The CY7C9116 and CY7C9117 are extremely fast
arithmetic and logic units implemented in a 1.2-micron,
double-metal CMOS process technology. As shown in
Figures 1 and 2, the CY7C9117 differs from the
CY7C9116 by incorporating separate buses for data
input (D) and output (Y) and thus allows for the design
of faster microprogrammed systems. Otherwise the
CY7C9116 and CY7C9117 are identical and will be
described here as a single device. Both units are
capable of 35-ns worst-case propagation delays from
instruction in to data out.

Figure 1. CY7C9116 Block Diagram

The CY7C9116/7 contains a single-port, 32 x 16-bit-word
register file; two-operand arithmetic units; and
three-input logic units. Carry-look-ahead logic is also
integrated with the logic and arithmetic units.
The CY7C9116/7's instructions can be divided into
eleven types, as listed in Table 1. The on-chip barrel
shifter attains single-clock operation on the extensive
bit-manipulation and rotate instructions. In fact, all
instructions in the ALU execute within one clock except
for immediate instructions, where a second clock is
needed to obtain the immediate operand.
The CY7C9116/7 is TTL compatible and fully interchangeable
with its counterparts from Advanced Micro
Devices and Texas Instruments. However, exercise caution
when illegal instructions or undefined opcodes are
used. Because the results are not predictable or guaranteed
during these operations, they should not be used in
any production system. Table 2 shows an example of
such a condition, when SOA is mistakenly encoded as
an undefined operation.
Another feature of the Cypress CY7C9116/7 is that
it allows the priority instruction to use both the source
and destination as the accumulator. Be aware that older
implementations of this architecture in bipolar technology
do not allow such an operation. When you use older
bipolar implementations or test devices, some machines
might behave improperly, and undefined or illegal
operations might produce different results for various
device types, depending on vendor and technology.

Table 1. 7C9116/7 Instruction Types

Instruction Type     Example
Single Operand       inc:     src plus 1 -> dest
Two Operand          add:     src plus src -> dest
Single Bit Shift     shupl:   src up 1
Bit Oriented         setnr:   set RAM bit n
Rotate by n bits     rotrl:   rotate RAM n bits
Rotate & Merge       mdai:    rotate src and src' w/mask
Rotate & Compare     rotc:    rotate src cmp w/src' set cc
Prioritize           prtnr:   indicate highest priority bit
CRC                  crcf:    create crc fwd from qlink
Status               rstst:   reset status register
No-Op                noop:    no effect

Faster Operation and Lower Power
Combining the CY7C9116/7's advanced
microprogrammed architecture with Cypress's CMOS
process technology provides many benefits. Specifically,
custom computing units and controllers can operate at
higher frequencies and consume less power - about

Figure 2. CY7C9117 Block Diagram


Table 2. Example Instruction Encoding Error
SOA instruction:

Table 3. CMOS vs. Bipolar Performance and Power

ACC -> Y bus

Cypress

E2lOO
Instruction Code

Generic
~

Rm!J.t
Speed '(ns)

35

53

30
150

600

Conect encoding:
All

111\;000 10000000

Power (Icc, rnA)
Stactic
Max@10Mhz

()()()()1l10()()()()1101

A

Coding Error

Technology

AMD

l11QJUlO 10000000

1111111111111110

Cypress

1110011010000000

11110100 1000 1100

11

1110011010000000

()()()()OOOO()()()()IOOI

CMOS

400

Bipolar

80% less - while offering higher reliability. Table 3
compares the performance and power characteristics of
a typical 16-bit microprogrammed ALU and the
CY7C9116/7. The results show a significant power
savings, which promotes lower die temperatures and thus
enhances the CY7C9116/7's reliability.
Other aspects of the CY7C9116/7's CMOS processing
technology also contribute to increased system
reliability. In the past, CMOS technologies experienced
problems with destructive latch-up conditions. Cypress
CMOS processes minimize this problem by employing
guard rings and a substrate bias generator to achieve
latch-up trigger currents in excess of 200 mA. Also
contributing to reliability and performance are voltage
supply tolerances of 10% and electrostatic discharge
(ESD) protection circuitry, which allows the device to
withstand voltages greater than 2001 V.

System Timing
In microcoded systems, two loops determine system
performance: the data and control loops. The control
loop (Figure 3) is essentially the instruction stream for
the CY7C9116/7. The current instruction combined with
other status information generates a new address and
instruction for the processor.

Figure 3. Microcoded System Control Loop

The data loop (Figure 4) moves information from
an external source to a register; the CY7C9116/7 then
uses the information to produce a result and status
information for use by the external element. Because
instructions and data are in separate domains, it should
be apparent that this is a Harvard-style architecture.
Thus, to achieve optimal performance, both the control
and data loops should be as short as possible and equal
in length.

Figure 4. Microcoded System Data Loop

Figure 5 shows an example of control loop timing
for a typical CY7C9116/7 system. Four CY7C245A
registered 2K x 8 PROMs implement the control store
and current state register. The CY7C910 12-bit
microsequencer allows for 4K words of addressing, i.e.,
instruction memory. In this example a 74F151 multiplexes
status and condition-code information into the sequencer
to complete the control loop. The components that
make up this system are appropriate for embedded
applications that have a fixed microcode control store.

  Mux. Delay               9 ns
  7C910 CC -> Output      22 ns
  7C245A Setup Time       12 ns
  7C245A CP -> Q          18 ns
  Total                   61 ns

Figure 5. Embedded Application Control Loop Timing

You can improve system performance and
flexibility by using Cypress static RAMs instead of
PROMs, thus forming a writable control store (WCS).
(In this case, flexibility represents the ability to
download or reprogram microcode at run time, which
permits the system designer or user to load different
applications or algorithms into the machine.) As
diagrammed in Figure 6, four CY7C168 4K x 4 static
RAMs can replace the ROMed microcode control
store. However, you must add an external 74FCT374A
register to replace the CY7C245A PROM's on-chip
register. Thus, you pay a board space penalty for
slightly improved performance and flexibility.

  Mux. Delay               9 ns
  7C910 CC -> Output      22 ns
  7C168 Access Time       20 ns
  74FCT374 CP -> Q       6.5 ns
  Total                 57.5 ns

Figure 6. 7C9116/7 Reprogrammable Control Loop Timing

The data-loop timing for both the embedded and
reprogrammable microcoded applications appears in
Figure 7. Here, the CY7C9116/7 and its fast operation
benefit the systems designer in two ways. First, because
the data path is significantly faster than the control
path, results are available early for the external data
units, thereby allowing more time for external operations.
Second, as faster memory technologies become
available, you can design systems to operate at rates up
to 25 MIPS.

Figure 7. Microcoded System Data Loop Timing (6.5 ns + 35 ns = 41.5 ns)

Applications, Old and New
The applications for fast 16-bit microprogrammed
CMOS ALUs fall into two categories. The first category
resembles these devices' traditional use as a central
processing unit for general-purpose computing. You
might use a microprogrammed machine simply because
instruction-set compatibility with previous machines is a
design requirement. Here, the CY7C9116/7's speed and
low power serve as powerful upgrades to existing
hardware, with the possibility of lower cost from
reduced power supply needs.
The more exciting applications for 16-bit
microprogrammed ALUs are in loosely coupled
coprocessors or embedded controllers. Here, the
CY7C9116/7's special bit, rotate, and CRC capabilities
deliver significant performance advantages over
"off-the-shelf" microprocessors. Graphics and imaging
coprocessors benefit from single-clock bit manipulation
and rotation. The forward and reverse CRC instructions
prove very helpful in communications and disk-controller
applications, in terms of speed and code density.
Graphics, communications, and disk controllers are just
three examples that benefit from an application-specific
instruction set, as provided by microprogrammed
machines such as the CY7C9116/7.
There remain a myriad of custom control and embedded
applications in military, industrial, and commercial
systems that can exploit the performance and
flexibility of the CY7C9116 and CY7C9117 CMOS 16-bit
microprogrammed arithmetic and logic units.
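
The loop timings discussed under System Timing can be checked with a short calculation. The Python sketch below assumes the per-stage delays as read from Figures 5 through 7; the 18-ns 7C245A entry, the 6.5-ns 74FCT374 entry, and the split of the data loop into a 6.5-ns register delay plus the 35-ns ALU delay are inferred from the printed totals (61 ns, 57.5 ns, and 41.5 ns) rather than taken from a data sheet. All names are illustrative.

```python
# Stage delays in ns. Values marked "inferred" are deduced from the figure totals.
EMBEDDED_CONTROL_LOOP = {
    "mux delay": 9,
    "7C910 CC to output": 22,
    "7C245A setup time": 12,
    "7C245A CP to Q": 18,  # inferred from the 61-ns total
}
WCS_CONTROL_LOOP = {
    "mux delay": 9,
    "7C910 CC to output": 22,
    "7C168 access time": 20,
    "74FCT374 CP to Q": 6.5,  # inferred from the 57.5-ns total
}
DATA_LOOP = {
    "register CP to Q": 6.5,  # assumed to be the same register delay
    "CY7C9116/7 instruction in to data out": 35,
}


def loop_delay(stages):
    """Total delay around one loop, in ns."""
    return sum(stages.values())


def max_rate_mhz(delay_ns):
    """Highest clock rate (MHz) that a loop of the given delay allows."""
    return 1000.0 / delay_ns


for name, loop in [
    ("embedded control loop", EMBEDDED_CONTROL_LOOP),
    ("reprogrammable control loop", WCS_CONTROL_LOOP),
    ("data loop", DATA_LOOP),
]:
    delay = loop_delay(loop)
    print(f"{name}: {delay} ns, up to {max_rate_mhz(delay):.1f} MHz")
```

Under these assumptions the data loop alone would allow roughly 24 MHz, which is consistent with the "up to 25 MIPS" figure quoted for systems with faster control-store memory; with the parts shown, the control loop sets the cycle time.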

Section Contents

RISC
SPARC Software Advantages Over CISC
Register Windows
CY7C600 System Design Footnotes
The Impact of Memory on High-Performance RISC Microprocessors
High-Speed CMOS SPARC Design
SPARC System Surface-Mount Design
Memory System Design for the CY7C601 SPARC Processor
Cache Memory Design
Synchronous Trap Identification for CY7C600 Systems
An Introduction to Mbus
Multiprocessing System Boot-Up
Porting UNIX to the CY7C604 or CY7C605
Getting Started with Real-Time Embedded System Development
SPARC as a Real-Time Controller
Memory Protection and Address Exception Logic for the CY7C611 SPARC Controller

SPARC Software Advantages over CISC

This application note explains the ways in which
SPARC promotes more efficient software implementations
of applications. Several attributes of the SPARC
architecture make efficient high-level language (HLL) optimizing
compilers possible. These attributes enable a compiler to
map code from HLLs such as C, Fortran, and Pascal into
SPARC native code without a significant loss in execution
speed.

CISC Software Drawbacks
The efficiency of an optimizing compiler is critical.
Before the development of RISC architecture, a compiler
designer was faced with the near-impossible task of creating
a compiler that mapped HLL code correctly into CISC
native code and simultaneously generated an optimal
CISC native code stream. There is no way of algorithmically
resolving the twin objectives that compiled
CISC native code be both correct and optimal - i.e., that
the code does what the programmer defined in the HLL
code and that, out of all the instruction streams possible,
the one generated executes in the shortest time.
The following primary attributes of a CISC architecture
cause this fundamental difficulty:
  • A complex, non-orthogonal, overlapping instruction set
  • Non-visible execution pipeline
  • Destructive two-address architecture
  • Mixed memory/register model of execution
A complex, non-orthogonal, overlapping instruction
set allows you to substitute more than one native code
instruction sequence for the same HLL instruction. For
example, a typical CISC instruction set has more than one
instruction for addressing memory, performing arithmetic
instructions, testing and branching, etc. No one instruction
in an instruction category is optimal for all situations, and
there lies the problem.
A compiler only knows how to accomplish tasks, not
why. The compiler does not understand what the programmer
is trying to do with the HLL code. The compiler only
knows how to parse the HLL code to produce native code
for the target machine. Without knowing what the HLL
program is trying to accomplish, the compiler cannot
select the one optimal instruction out of several similar
instructions that accomplish almost the same result. The
compiler merely uses one that works in all situations. The
generated code is optimal in terms of execution speed
only by chance.
When the CPU's pipeline is not visible to the compiler,
there is no way to schedule native code instructions
to take advantage of unfilled pipeline slots. These pipeline
"bubbles" exist in every computer architecture because of
delays caused by the underlying system hardware. Because
the compiler cannot schedule operations during
these bubbles, the processor spends a significant portion
of its time in an idle or No-op mode.
A destructive two-address architecture means that an
instruction is of the form
A & B --> A
where A and B are registers, "&" is a logical or arithmetic
operation, and "-->" signifies that the results are
moved to a destination (in this case, A). This instruction
destroys the contents of A; hence the name, destructive
two-address architecture. When A's contents are
destroyed, the data that was stored in A cannot be used
for further calculations.
This architecture imposes a stiff overhead penalty for
an algorithm such as a recursive digital filter, in which the
intermediate results of an input-value stream multiplied by
some constant are reused to produce the final value. To
overcome this limitation, the intermediate values must be
saved somewhere, then constantly reloaded into the
registers. A programmer might be able to save some of
this overhead by loading multiple copies of the data into
several registers and switching from one set of registers to
the other. Although a programmer might have the craft
and intelligence to do this, a compiler does not.
The mixed memory/register model of execution
means that the instruction set allows the programmer to
specify that values can be directly fetched from or stored
to memory. Because of the physical properties of
electronic circuits, the data value does not appear in the
CPU instantaneously. Some time is needed to assert the
address lines, to let the read strobe reach a steady voltage
level, etc. During this time, the processor is idle. Because
of CISC's non-visible pipeline, another instruction cannot
be scheduled to utilize the idle time. The pipeline bubble
must be left unfilled, causing a decrease in processor
efficiency.
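
The cost of a destructive two-address form can be illustrated with a toy register machine. The Python sketch below is not a model of any real CISC or SPARC instruction set: the `mov`, `add`, and `mul` operations, the register names, and both interpreter functions are hypothetical. It computes A + B and A * B while keeping A intact, once with destructive two-address instructions (A & B --> A) and once with a three-address form (A & B --> C) for contrast.

```python
import operator

# Hypothetical arithmetic operations for the toy machine.
OPS = {"add": operator.add, "mul": operator.mul}


def run_two_address(program, regs):
    """Execute (op, dst, src) instructions: dst = dst op src, overwriting dst."""
    regs = dict(regs)
    for op, dst, src in program:
        if op == "mov":
            regs[dst] = regs[src]  # copy needed to protect a value from being destroyed
        else:
            regs[dst] = OPS[op](regs[dst], regs[src])
    return regs


def run_three_address(program, regs):
    """Execute (op, dst, a, b) instructions: dst = a op b, sources preserved."""
    regs = dict(regs)
    for op, dst, a, b in program:
        regs[dst] = OPS[op](regs[a], regs[b])
    return regs


start = {"A": 6, "B": 7, "T": 0, "U": 0}

# Two-address: A must be copied before each operation or it is lost.
two_address = [
    ("mov", "T", "A"),
    ("add", "T", "B"),
    ("mov", "U", "A"),
    ("mul", "U", "B"),
]

# Three-address: both results are produced directly, A and B are untouched.
three_address = [
    ("add", "T", "A", "B"),
    ("mul", "U", "A", "B"),
]

r2 = run_two_address(two_address, start)
r3 = run_three_address(three_address, start)
print(len(two_address), "instructions:", r2)
print(len(three_address), "instructions:", r3)
```

Both programs leave the same register contents, but the two-address version spends half of its instructions on copies whose only purpose is to preserve A.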

decoupled from ALU operations. This means that the
ALU can be operating in parallel with the SPARC chip's
load/store components, overlapping operations and increasing the processor's efficiency.
SPARC's non-destructive, triadic address architecture
has instructions of the following form:
A & B --> C
Where A, B, and C are registers; &" is a logical or
arithmetic operation; and "__" signifies that the results are
moved to a destination (in this case, C). The contents of
the A and B registers are presereved during this operation
- hence the name non-destructive:· The; compiler can
reuse the data in both A and B in subsequent operations,
saving the overhead of reloading intermediate data again
and again.
In addition to allowing hardware speed to be increased by scaling the device geometry and/or retargeting
to another semiconductor technology, the SPARC architecture allows the creation of efficient optimizing HLL
compilers. This software advantage improves the productivity of application developers because they can write
code in an HLL such as C and still achieve the performance they need.
The majority of applications for mainframes, minis,
and PCs were first written in assembly code because that
was the only way to attain the execution speed needed to
run the application algorithm at a reasonable rate.
Programmer productivity is measured in lines of debugged
code per day. The number of lines produced is the same,
whether they are lines of assembly or HLL code. Because
one line of HLL code can be equivalent to ten or more
lines of assembly code, the ability to write an application
in C or another HLL can increase software productivity
by a full order of magnitude.
II

RIse Software Advantages
In contrast, the· following significant factors of a
RiSe machine make efficient optimizing compilers possible:
A simplified, orthogonal instruction set
Visible execution pipeline
Load/store model of execution
Non-destructive triadic address architecture
With a simplified, orthogonal instruction set, only a
small set of native code instruction streams achieve the
effect of an HLL instruction. This simplifies the
compiler's task of selecting the correct native code stream
to emulate an HLL instruction. Instead of spending effort
to ensure that the native code does what the HLL program
states, the compiler writer can concentrate on scheduling
the generated native code so that it executes in the minimum amount of time.
SPARC's visible execution pipeline allows an optimizing compiler to see when idle periods occur. Using
this knowledge, the compiler can re-schedule native code
instructions to fill these empty slots in the pipeline.
In the load/store model of execution, data is first
loaded from memory into registers or stored into registers
before being sent to memory. Data load/stores are


Register Windows

This application note explains how the Cypress
CY7C601 SPARC microprocessor uses register windows
and shows how they decrease system execution time.
The CY7C601 is one of the few processors to use
register windowing for context switching. When entering
and returning from trap handlers and procedures, the
CY7C601 thus enjoys a significant speed advantage over
other processors with "flat" register files. Register windows are also the CY7C601's least understood architectural feature.

has a flat register file, each register is addressed as an
offset from the beginning of the register file. A register's
effective address equals the register number times the
register's size (usually 32 bits) plus the address base (0 for
a flat register file).
The CY7C601's register windowing feature adds an
entry to the processor state register (PSR) that provides
the base address used to generate the effective register address. This entry in the PSR is called the Current Window
Pointer (CWP). Changing the CWP by one offsets the
register addressing by 16. Thus the effective register address is: (the CWP times 16 plus the register number)
times the register size (32 bits). High-speed hardware ensures that the correct register can be selected and the data

Register Windows
Most of today's microprocessors implement a register
file as a contiguous piece of fast memory. If a processor

Figure 1. Addressing Mechanisms
(Flat register file: register offset only. Windowed register file: CWP plus register offset.)

Figure 2. Overlapping Register Windows
(Previous window (CWP + 1), current window (CWP), and next window (CWP - 1); each window comprises INS r24-r31, LOCALS r16-r23, and OUTS r8-r15.)
windows of 16 registers each plus eight global registers.
Each procedure has 16 registers for its exclusive use: eight locals and eight outs. 32 registers can be addressed
- eight locals, eight outs, eight globals, and eight ins
(from the calling procedure).
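
The register counts quoted above follow from simple arithmetic. The short Python sketch below only restates the figures given in the text; the function and constant names are illustrative and are not part of any Cypress tool.

```python
# Register-file arithmetic as described in the text (names are illustrative).
REGS_PER_WINDOW = 16   # eight locals + eight outs owned by each window
GLOBAL_REGS = 8        # shared by every window


def total_registers(windows: int) -> int:
    """Physical registers needed for the given number of windows."""
    return windows * REGS_PER_WINDOW + GLOBAL_REGS


def addressable_registers() -> int:
    """Registers one procedure can address: ins, locals, outs, and globals."""
    return 8 + 8 + 8 + 8


print(total_registers(8))       # 136, the CY7C601 register file
print(addressable_registers())  # 32
```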
97 percent of all procedures pass fewer than six
parameters during a procedure call; the average is 2.1.
Eight registers are more than sufficient for passing
parameters between procedures. If more parameters need
to be passed, one of the registers is used as a frame
pointer, and the additional parameters are stored to
memory. The eight outs can carry data and address from
the current procedure to a procedure called by the current
procedure or to data local to the procedure. As Figure 2
shows, the CWP is decremented when a procedure is
called and incremented when a procedure returns.

loaded or extracted from the register in one clock. The
diagram in Figure 1 illustrates this register addressing
mode.
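
As a rough illustration of the two addressing schemes in Figure 1, the following Python sketch evaluates the effective-address expressions exactly as the text states them. It is a simplified model under stated assumptions: addresses are expressed in bytes (4 bytes per 32-bit register), and the function names are hypothetical. It does not model the real hardware's handling of the global registers or wraparound.

```python
# Simplified model of the register-addressing formulas given in the text.
REGISTER_SIZE_BYTES = 4   # 32-bit registers
WINDOW_STRIDE = 16        # changing the CWP by one offsets addressing by 16 registers


def flat_address(reg: int, base: int = 0) -> int:
    """Flat register file: register number times register size plus the base."""
    return base + reg * REGISTER_SIZE_BYTES


def windowed_address(cwp: int, reg: int) -> int:
    """Windowed register file: (CWP * 16 + register number) * register size."""
    return (cwp * WINDOW_STRIDE + reg) * REGISTER_SIZE_BYTES


# The same register number selects different storage in adjacent windows.
print(windowed_address(0, 5), windowed_address(1, 5))
```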
The CWP allows you to partition the register file into
separate sets, or windows. When a context switch is
necessary after a trap is taken or a procedure called, the
processor can save its old state information and get a new
set of registers to use simply by decrementing the CWP.
The CY7C601 performs this operation with one single-cycle instruction.
In most processors that use a flat register file, a
process can get a new set of registers to use only after
saving the current registers to memory to preserve state
information. Depending on the number of registers to save
and the clock time for a save instruction, the context
switch can take quite a while.

Parts of the Register File

The Window Invalid Mask

The CY7C601 employs four types of registers: outs,
ins, locals, and globals (Figure 2). The ins registers contain values from the procedure that called the current procedure. The globals can be accessed by any procedure, no
matter what the procedure's nesting level. The outs hold
local information or pass information to a procedure that
the current procedure calls. The locals are for the current
procedure's exclusive use. The diagram on the next page
gives a conceptual picture of what these overlapping
register windows look like.
Registers are shared between procedures: the previous procedure's outs are the current procedure's ins.
Parameters are passed between procedures using the ins
and outs. The ins contain the data and return address of
the calling procedure.
Partitioning the register file into windows reduces the
number of registers each process can use. This is why the
CY7C601 has such a large register file. The CY7C601 has
136 registers for a maximum of eight windows - eight

The CWP is not the only unique CY7C601 hardware
feature that supports register windows. The CY7C601 also
includes a dedicated 32-bit register called the Window Invalid Mask (WIM). The WIM tells the CY7C601 how
many windows it has and which ones are active. By comparing the WIM and the CWP, the CY7C601 can determine when it is attempting to utilize more windows than it
has available. This would not cause a physical problem
because the register file is implemented as a circular
stack, but the data in the first window would be corrupted.
The processor's attempt to use more windows than it
has causes a window overflow trap. During this trap, the
processor saves the oldest window to memory. This is the
only time when the CY7C601 microprocessor must save
registers to memory during a context switch (unless more
than eight parameters must be passed between procedures). On a window overflow, only 16 registers must be
saved to memory (eight locals & eight outs).

Figure 3. Register Window Concept with Eight Windows
An alternative way to conceptually view the
CY7C601 register windows is to think of them as a ring
of registers (Figure 3). As mentioned earlier, the
CY7C601's register file is circular. If the CWP is pointing
to window 7 and is incremented, the CWP now points to
window 0. If the CWP points to window 0 and is decremented, the CWP points to window 7.
As an example, say that the CY7C601 makes a procedure call with the CWP pointing to window 1, as shown
in Figure 3. Assume that the WIM has been set to reflect
the fact that eight windows are physically implemented.
Upon making the procedure call, the CY7C601 attempts a
SAVE to provide the called procedure with a new set of
registers. During a SAVE, the CWP is decremented by
one (CWP = 0). The CY7C601 checks this value against
the WIM. Because window 0 must be reserved for use by
the trap handler, the CY7C601 has run out of windows for
user procedures. Bit 0 of the WIM was set to reflect the
fact that window 0 is reserved for system use. Upon
checking the value of the CWP against the WIM, the
CY7C601 detects a window overflow condition and
causes a trap. During this window overflow trap, the

CY7C601 increments the CWP to point back to window 1
and saves the calling procedure's registers (eight ins and
eight locals) to a location in memory. Upon returning
from the procedure, the registers are restored from
memory and the values of the registers in window 1 are
overwritten.
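
The SAVE example above can be sketched in a few lines of Python. This is a behavioral illustration only, assuming eight windows and a WIM in which bit n marks window n as reserved; the names `save`, `restore`, and `WindowOverflow` are hypothetical, and the sketch checks the WIM only on SAVE because that is the only check the text describes.

```python
NWINDOWS = 8  # the CY7C601 implements eight windows


class WindowOverflow(Exception):
    """Raised where the hardware would take a window overflow trap."""


def save(cwp: int, wim: int) -> int:
    """Return the CWP after a SAVE, or raise if the target window is set in the WIM."""
    new_cwp = (cwp - 1) % NWINDOWS   # circular register file: 0 wraps to 7
    if wim & (1 << new_cwp):
        raise WindowOverflow(f"window {new_cwp} is marked in the WIM")
    return new_cwp


def restore(cwp: int) -> int:
    """Return the CWP after a procedure return (increment; 7 wraps to 0)."""
    return (cwp + 1) % NWINDOWS


WIM = 0b00000001          # bit 0 set: window 0 reserved for the trap handler
print(save(2, WIM))       # moves from window 2 to window 1
try:
    save(1, WIM)          # the case in the text: CWP = 1, SAVE targets window 0
except WindowOverflow as exc:
    print("overflow trap:", exc)
```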

A Versatile Architecture
One of the advantages of register windowing is its
versatility. The SPARC architecture has provisions for implementing up to 32 register windows, and the CY7C601
has eight. You can partition the available registers into
different numbers of windows to increase the CY7C601's
efficiency for specific applications. When you use a real-time operating system, for example, you can set the WIM
to partition the register set into a small number of windows, say four. You can assign each real-time task to its
own window. The total interrupt response time is now the
interrupt latency (4 - 7 clocks) plus one clock to switch
windows. Compare this response time to the response
time for a CISC or RISC architecture with a flat register

Table 1. Register Windows vs. a Flat Register File

Benchmark program                          GCC          TeX
Percentage of CALL or RETURN instructions  1.8%         3.6%
Average registers stored per call          2.3          3.2
Loads (flat register file)                 3,928,710    2,811,545
Loads, SPARC (register windows)            3,313,317    2,736,979
Ratio of loads, windows/flat               0.84         0.97
Stores (flat register file)                2,037,226    1,974,078
Stores, SPARC (register windows)           124~38       1,401,186
Ratio of stores, windows/flat              0.61         0.70

to be saved to and restored from memory, reducing
load/store traffic. This traffic reduction shrinks the edge
that register windowing gives the CY7C601 over
microprocessors with flat register files.
Interprocedural register allocation has a glaring weakness, however: It depends upon a complete knowledge of
how many registers the called procedure uses and for
what purpose. With an object-oriented language such as
C++ or Smalltalk, this knowledge is not available at compile time. Interprocedural register allocation is therefore
not possible when using an object-oriented language, and
register windowing's performance edge comes to bear in
full force.
The software world is shifting toward object-oriented
languages such as C++ because of the need for increased
productivity. Register windowing thus makes the
CY7C601 the performance leader for today and promises
to further solidify the CY7C601's lead in the future as the
use of object-oriented languages increases.

file, where saving register contents consumes most of the
interrupt response time.
A register window architecture means that the
CY7C601 must perform fewer load/stores. This is amply
demonstrated by Table 1, which lists the loads and stores
done by two microprocessor architectures: SPARC with
eight register windows and another architecture with a flat
register file.
The two benchmark programs used to obtain the data
in the table are the Gnu C Compiler (GCC) and the text
processing program TeX. Because of register windowing,
the CY7C601 has to do up to 16 percent fewer loads and
39 percent fewer stores, compared to a microprocessor
with a flat register file. Register windowing thus increases
processing speed significantly.
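
The "16 percent fewer loads" figure can be checked directly against the load counts in Table 1. The Python sketch below does so; the counts are as read from the table, and the helper names are illustrative. The 39 percent store figure is derived from the table's published ratio of 0.61 rather than from the raw store counts.

```python
# Load counts from Table 1: (flat register file, SPARC register windows).
LOADS = {
    "GCC": (3_928_710, 3_313_317),
    "TeX": (2_811_545, 2_736_979),
}


def windows_to_flat_ratio(flat: int, windowed: int) -> float:
    """Ratio of memory operations with register windows to those without."""
    return windowed / flat


def percent_fewer(ratio: float) -> int:
    """Convert a windows/flat ratio into a whole-number percentage reduction."""
    return round((1.0 - ratio) * 100)


for name, (flat, windowed) in LOADS.items():
    r = windows_to_flat_ratio(flat, windowed)
    print(f"{name}: ratio {r:.2f}, {percent_fewer(r)} percent fewer loads")

print(percent_fewer(0.61), "percent fewer stores (GCC, from the table ratio)")
```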
Some of the load/store traffic generated by the use of
a flat register file can be reduced by using interprocedural
register allocation. This technique consolidates the use of
registers to hold variables passed between procedures. By
consolidating the number of registers used, less data needs


CY7C600 System Design Footnotes

This application note covers several topics that have
generated questions from SPARC systems designers. The
intent here is to provide additional insight into the operation of the CY7C600 chip set through discussion of these
short topics. Of course, a single paper cannot answer all
questions regarding SPARC design. Please contact your
local Cypress field applications engineer regarding any
other questions you might have about SPARC.

left unchanged. This feature provides easy error recovery
in the case of an error-mode-generated reset (more on this
later), because the registers are not changed after an error-causing condition.
Upon reset, the CY7C601/611 initializes the PSR's
supervisor-mode bit to 1 (enabling supervisor mode) and
sets the ET bit to 0 (traps disabled). The program counter
(PC) and the next program counter (nPC) are initialized to
0 and 4, respectively. If the reset is a power-on (initial)
reset, the state of all other registers is undefined. In addition, the state of all PSR fields other than the ET and S
bits is also undefined. A reset that occurs after the initial
power-on reset (such as a reset to exit error mode) does
not affect any registers other than the PSR, PC, and nPC.
Upon entering execution mode from a power-on
reset, the software designer must ensure that the
CY7C601/611 (and CY7C604A, if present) is properly initialized. Three registers in the CY7C601/611 must be initialized upon power-on reset: the processor state register
(PSR), the trap base register (TBR), and the window invalid mask register (WIM).
One common mistake is to neglect to initialize the
WIM register, which is undefined upon a power-on reset.
If not initialized, this register can unexpectedly disable
one or more windows. The processor does
automatically initialize the PSR's S (supervisor) and
ET (enable traps) bits upon reset, but all other PSR fields must
be initialized by software.
The TBR register must be initialized to point to the
beginning of the trap vector table to handle traps. The
register should be initialized before the PSR's ET bit is
set. Note that three NOP instructions are generally inserted after writes to the PSR, WIM, and TBR registers to
ensure that the CY7C601/611 correctly handles instructions immediately following these special register writes.
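
The initialization rules above (initialize the PSR, TBR, and WIM; set up the TBR before enabling traps; follow each special-register write with three NOPs) lend themselves to a simple checklist. The Python sketch below is one hypothetical way to express that checklist over an abstract list of steps. The step encoding and the function name `check_init` are assumptions made for illustration; they are not an assembler or a Cypress utility.

```python
REQUIRED = ("PSR", "TBR", "WIM")


def check_init(steps):
    """Return a list of problems found in a power-on initialization sequence.

    Each step is ("write", register_name, sets_et) or ("nop",).
    """
    problems = []
    written = set()
    for i, step in enumerate(steps):
        if step[0] != "write":
            continue
        _, reg, sets_et = step
        # The TBR should point at the trap table before ET is set.
        if reg == "PSR" and sets_et and "TBR" not in written:
            problems.append("ET set before TBR initialized")
        written.add(reg)
        # Three NOPs are generally inserted after each special-register write.
        following = [s[0] for s in steps[i + 1:i + 4]]
        if following != ["nop", "nop", "nop"]:
            problems.append(f"fewer than three NOPs after write to {reg}")
    for reg in REQUIRED:
        if reg not in written:
            problems.append(f"{reg} never initialized")
    return problems


NOPS = [("nop",)] * 3
good = ([("write", "WIM", False)] + NOPS +
        [("write", "TBR", False)] + NOPS +
        [("write", "PSR", True)] + NOPS)
print(check_init(good))
```

A sequence that enables traps first and never writes the WIM is flagged for both problems, which mirrors the "common mistake" the text warns about.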
Error mode is a self-initiated halt mode that the
CY7C601/611 enters upon encountering a synchronous
trap when the PSR's ET bit is set to zero (traps disabled).
The processor also enters error mode if a return from trap

Reset and Error Modes
The CY7C601/611 is reset by the assertion of the
RESET signal for a minimum of eight clocks. The clock
signal must be active for the CY7C601/611 to correctly
synchronize upon receiving RESET.
For systems using the CY7C604A/605A, the system
reset signal is supplied to the CY7C604A/605A POR
input for a minimum of eight clocks. Upon receiving the
POR signal, the CY7C604A/605A asserts the IRST output, which drives the CY7C601 RESET input. IRST is
released one clock after the POR input to the
CY7C604A/605A is released.
The CY7C601/611 enters reset mode upon receiving
the RESET signal at a rising clock edge. Figure 1 illustrates CY7C601/611 reset timing. All processor operation halts. The CY7C601/611 asserts address 0x00000000,
and the appropriate control signals for the first instruction
access are asserted while RESET is asserted. The
CY7C601/611 remains in reset mode until RESET is
released, then the CY7C601/611 immediately enters execution mode. One clock after receiving the release of
RESET, the CY7C601/611 asserts the address for the next
instruction access on the bus. On the clock after RESET is
released, the CY7C601/611 latches the first instruction on
the data bus. Note that the MAO and MHOLD signals
must be de-asserted while RESET is asserted.
The CY7C601/611 initializes the enable traps (ET)
and supervisor (S) bits of the processor state register
(PSR), the program counter, and next program counter
upon reset. All other registers in the CY7C601/611 are

Figure 1. Power-On Reset Timing
(Signals shown: CLK, A<31:0>, ASI<7:0>, D<31:0>, SIZE<1:0>, INULL, MAO.)
(RETT) instruction is encountered with either the traps
enabled (ET = 1) or the supervisor bit cleared. Upon encountering one of these conditions, the CY7C601/611 sets
the TBR's tt (trap type) field to reflect the type of
synchronous trap that caused the error state, after which
the processor asserts the ERROR signal and halts (Figure
2). Error mode is exited when RESET is asserted.
A CY7C604A/605A responds to ERROR by executing a watchdog reset. During this reset, the
CY7C604A/605A asserts the IRST output (used as the
RESET input), sets the watchdog reset bit in the
CY7C604A/605A reset register, and sets the boot-mode
bit of the CY7C604A/605A system control register. All
other registers in the CY7C604A/605A are left unchanged. Because the CY7C604A/605A enters boot mode,
all instruction fetches made by the CY7C601 are fetched
from physical memory on the Mbus, regardless of whether
the cache is enabled. This action is appropriate because
when the CY7C601 enters execution mode from reset, the
processor executes the reset routine, which is generally
stored in nonvolatile physical memory.

causing headaches for those debugging their hardware.
Table 1 lists the state to which you must tie CY7C600
signals for proper operation. The table assumes a
CY7C604A/605A-based CPU.
Pay attention to the resistance value of passive pull
ups and pull downs to ensure that they match the
CY7C600 buffers' drive capabilities. CY7C600 buffers
sink a minimum of 8 mA and source a minimum of -2
mA. 1 kΩ is a reasonable value for pull-up resistors on
signals that must be driven by CY7C600 buffers. 10 kΩ
is a reasonable pull-up or pull-down value on input signals such as mmm, CP, or CCCV.
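
A quick Ohm's-law check shows why a 1 kΩ pull-up suits the stated 8 mA sink capability. The Python sketch below assumes a 5 V supply and an output that pulls all the way to ground; both are assumptions made for this illustration and are not stated in the text. The function names are hypothetical.

```python
SINK_MIN_MA = 8.0   # minimum sink current of a CY7C600 buffer, from the text
VCC_VOLTS = 5.0     # assumed supply voltage (not stated in the text)


def pullup_current_ma(resistor_ohms: float, vcc_volts: float = VCC_VOLTS) -> float:
    """Worst-case current a driver must sink when pulling the line to ground."""
    return vcc_volts * 1000.0 / resistor_ohms


def min_pullup_ohms(sink_ma: float = SINK_MIN_MA, vcc_volts: float = VCC_VOLTS) -> float:
    """Smallest pull-up resistor that keeps the sink current within the limit."""
    return vcc_volts * 1000.0 / sink_ma


print(pullup_current_ma(1_000))    # 5.0 mA, within the 8 mA sink capability
print(pullup_current_ma(10_000))   # 0.5 mA for a 10 kΩ input pull-up
print(min_pullup_ohms())           # 625.0 ohms under these assumptions
```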

Signal Termination
Design of high-speed CMOS systems requires close
attention to board layout, PCB trace propagation delays,
crosstalk, noise, clock skew, and signal terminations. This
is extremely important to systems operating at 33 to 40
MHz and beyond. (Consult the application note, "High-speed CMOS SPARC System Design.") Cypress recommends close attention to clock distribution and termination
for any high-speed CMOS design.
Termination is highly recommended on the clock signals for a CY7C600 system. Due to the minimum slew-rate requirement of 0.8 V/ns for CY7C600 parts, you must
pay close attention to the clock drivers' drive and slew-

CY7C600 Pull-Up/Pull-Down Resistors
For proper operation, several signals for the
CY7C600 chip set must be pulled either High or Low.
This has often been overlooked in some SPARC designs,

Figure 2. Error/Reset Timing
(Signals shown: CLK, A<31:0>, ASI<7:0>, D<31:0>, SIZE<1:0>, INULL, RESET. RESET must be asserted for a minimum of 8 clocks.)


Table 1. CY7C600 Signal Pull Ups and Pull Downs
Part Affected
CY7C604/605 input/output

Pulled
High

mnY
mTY

CY7C604/605 input

High

CY7C604/605 input

~

CY7C604/605 output

High
. High

Signal
~

Comments
CY7C604/605 cannot acquire Mbus if not pulled up

~

CY7C604/605 input

High

CMER

CY7C604/605 output

High

mE
SNUII

CY7C601, CY7C604/605 input

Low

CY7C604/605 input

High

Assuming a single-CY7C604/605 system

~

CY7C601 input

CY7C604/605 allows this signal to three-state

K1EXC

CY7C601 input

.High
High

mmrn

CY7C601, CY7C602 inputs

High

MiIOLDB

CY7C601, CY7C602 inputs

High

rnm:r>

CY7C601, CY7C602 inputs

High

Required if coprocessor is removable or not present

CP          CY7C601 input        High   Required if coprocessor is removable or not present
CCC[1:0]    CY7C601 input        High   Required if coprocessor is removable or not present
CCCV        CY7C601 input        High   Required if coprocessor is removable or not present
MAO         CY7C601 input        Low    Must be pulled Low for system to operate

CY7C604/605 allows this signal to three-state

FPSYN       CY7C601 input        Low
IFT         CY7C601 input        Low
FNULL       CY7C604/605 input    Low    Required if CY7C602 is removable or not present
FP          CY7C601 input        High   Required if CY7C602 is removable or not present
FCC[1:0]    CY7C601 input        High   Required if CY7C602 is removable or not present
FCCV        CY7C601 input        High   Required if CY7C602 is removable or not present
FHOLD       CY7C601 input        High   Required if CY7C602 is removable or not present
FEXC        CY7C601 input        High   Required if CY7C602 is removable or not present

rate capabilities. For many designs, you cannot use a
simple parallel resistor termination because the buffer
drive required to attain the minimum clock slew rate often
exceeds the drive capabilities of available CMOS or TTL
buffers.
One recommended method of clock signal termination is to use one or more diodes to clamp the clock signal
voltages to within a single diode voltage drop of ground
or VCC (Figure 3). Unlike parallel resistor termination,
diode termination does not require a high-drive clock
buffer. And unlike AC termination, diode termination
does not degrade the clock signal's slew rate.

fetch is nullified in the pipeline, but no other valid address
is yet available to assert on the address bus.
For cached systems, INULL prevents a cache miss on
a nullified access. INULL is also used by the exception
logic to prevent an exception that might be generated by a
nullified access.
INULL is asserted when an address is generated in an
interlock case, such as a load that produces a hardware
interlock. INULL is also generated when a trap or interrupt is encountered. INULL is asserted in this case to nullify the address generated before the trapped instruction
enters the pipeline's execute stage.
INULL is asserted:
During the second address cycle of any store instruction (including atomic load/stores)
For the third instruction fetch after a trapped instruction
To nullify the error-causing address after a reset
On a load that causes a hardware interlock
On the execution of JMPL and RETT instructions

INULL
The CY7C601/611 generates the INULL signal to indicate that the processor will ignore the current memory
access. INULL is asserted before the rising clock edge on
which the nullified memory access would have been
latched (Figure 4). This event occurs when an instruction

Figure 3. Signal Termination Examples
(a) parallel termination, (b) AC termination, (c) diode termination
state of the MEXC signal (described later) but does not
cause the pipeline to advance. As Figure 5 shows, MDS is
sampled on the falling edge of the clock, and the information valid on the data bus at the next rising clock edge is
latched into the CY7C601/611.
Because the pipeline does not advance until MHOLD
is released, MDS can be asserted for more than one clock,
although this is not necessary. The only qualification is
that data must be valid for the rising clock edge after the
last assertion of MDS. The information on the data bus at
this time is used by the CY7C601/611 when MHOLD is
released.
MEXC indicates a memory exception to the
CY7C601/611. Upon detecting an exception case, the exception control logic asserts MHOLD to halt the
CY7C601/611 pipeline. MEXC and MDS are then asserted to signal the exception. MDS must be asserted with
MEXC to cause the CY7C601/611 to latch the value of
MEXC while MHOLD is asserted. Otherwise, the
CY7C601/611 ignores the MEXC signal while MHOLD
is asserted.

MHOLD, MDS, and MEXC
The CY7C601/611 signals MHOLD, MDS, and
MEXC are used by outside control logic to control
CY7C601/611 memory accesses. Typically, this control
logic takes the form of a cache controller or exception-generation controller.
MHOLD freezes the CY7C601/611's pipeline. External logic uses this signal to freeze the CY7C601/611's
operation so that the supporting memory and exception
logic can provide a response in synchronization with the
CY7C601/611's pipeline. The CY7C601/611 samples
MHOLD on the falling clock edge. Asserting MHOLD
causes the CY7C601/611 to hold its outputs at the state
that would be valid at the next rising clock edge. Note
that this state is driven from the rising clock edge before
MHOLD is asserted. From the perspective of the
CY7C601/611, the pipeline is frozen before the rising
clock edge following MHOLD assertion.
MDS is used during the assertion of MHOLD to
cause the CY7C601/611 to latch the data present on the
data bus. MDS also causes the CY7C601/611 to latch the
Figure 4. INULL Assertion
(Signals shown: CLK, A[31:0], D[31:0], INULL.)

Figure 5. Wait-State Generation using MHOLD with MDS
(Signals shown: CLK, A(0-31), D(0-31).)

MHOLD must be asserted immediately after the rising
clock edge of a memory access. This method requires fast
logic even at 25 MHz and is probably not feasible for
higher frequencies. The assertion of MHOLD must make
the set-up time before the falling clock edge (Figure 7).
With MHOLD asserted, the memory system can catch up
with the CY7C601/611 and assert the data on the bus. The
wait-state logic then releases MHOLD, allowing the
CY7C601/611 to latch the data.

Wait-State Generation
Memory wait states can be generated using MHOLD
or by stretching the CY7C601/611 clock. Because the
CY7C601/611 is a fully static processor design, clock
stretching is a simple method for generating memory wait
states (Figure 6).
Another method is to use MHOLD to freeze the
CY7C601/611 pipeline. You can do this in two ways. One
way is to use MHOLD in the same manner as intended for
a cache miss (Figure 7). MHOLD is asserted by the wait-state logic after the rising clock edge on which the
CY7C601/611 would have latched the memory access.
When the memory has responded to the access, the wait-state logic strobes MDS to make the CY7C601/611 latch
the information, then releases MHOLD.
You can also use MHOLD to halt the pipeline before
the CY7C601/611 has missed the memory access.

Interrupts
Interrupts are signaled to the CY7C601/611 by asserting the interrupt request level inputs, IRL[3:0]. For the
interrupt to be taken by the CY7C601/611, the value asserted on the IRL[3:0] signals must exceed the value
stored in the processor interrupt level (PIL) field of the
processor state register (PSR). An IRL level of 0 indicates no interrupt; a level of 15 indicates a non-maskable
Figure 6. Simple Clock-Stretching Circuit
(Labels shown: 74AS1805, Clock From Generator, Stretch Indicator, INULL, Stretched Clock, Free-Running Clock.)

Figure 7. Wait-State Generation using MHOLD without MDS
(Signals shown: CLK, A(0-31), D(0-31).)

interrupt. In addition, the PSR's enable traps (ET) bit must
also be set for the CY7C601/611 to respond to an interrupt input.
The IRL[3:0] inputs are sampled, then latched before
the interrupt is allowed to be prioritized and taken by the
CY7C601/611. This requires an IRL[3:0] value to be asserted two clocks before the CY7C601/611 accepts the interrupt input, thus helping to prevent extraneous interrupts. The CY7C601/611 uses interrupt acknowledge (INTACK) to acknowledge that an interrupt has been taken.
The CY7C601/611 asserts INTACK after the rising clock
edge upon which the address of the first trap instruction is
asserted. Interrupts that are not taken, such as those
masked by the PIL, are not acknowledged.
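
The acceptance conditions described above (IRL compared against the PIL, level 15 treated as non-maskable, and the ET bit required in every case) can be summarized as a small predicate. The Python sketch below is a behavioral illustration only; the function name is hypothetical, and the sketch ignores the sampling, latching, and prioritization stages.

```python
def interrupt_taken(irl: int, pil: int, et: bool) -> bool:
    """Return True if an interrupt request at level irl would be accepted.

    irl: value on IRL[3:0] (0 means no interrupt, 15 is non-maskable)
    pil: processor interrupt level field of the PSR
    et:  enable-traps bit of the PSR
    """
    if not (0 <= irl <= 15 and 0 <= pil <= 15):
        raise ValueError("IRL and PIL are 4-bit values")
    if irl == 0 or not et:
        return False            # no request, or traps disabled
    return irl == 15 or irl > pil


print(interrupt_taken(irl=5, pil=3, et=True))    # True: 5 exceeds the PIL
print(interrupt_taken(irl=3, pil=3, et=True))    # False: masked by the PIL
print(interrupt_taken(irl=15, pil=15, et=True))  # True: non-maskable level
```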
The prioritization stage of interrupt processing compares the interrupt's trap priority level against that of any
other synchronous trap that might be occurring simultaneously. All other trap types take priority over interrupt
traps, and in the case of contention, the other trap is serviced instead of the interrupt. In this case, INTACK is not
asserted until the CY7C601/611 has returned from the trap
handler and the interrupt level can again be sampled,
latched, and prioritized.
The CY7C601/611 features extremely fast response
time for interrupt inputs. Interrupt latency (interrupt reception to first trap address assertion) is from four to seven
clocks. The three-clock variation in interrupt latency is
due to the effect of multiple-cycle instructions upon
CY7C601/611 execution. Interrupt latencies greater than
four cycles occur when a multiple-cycle instruction is
fetched immediately before an instruction is interrupted.
The worst-case, seven-clock interrupt latency occurs when
a three-cycle instruction is fetched immediately before the
interrupt.
Figure 8 shows the assertion of INTACK with
respect to IRL[3:0] and the first interrupt address (TO).

The CY7C601/611 generates TO when the interrupted instruction reaches the pipeline's execute stage. The
CY7C601/611 asserts INTACK during the clock cycle
when the interrupted instruction reaches the pipeline's
write stage. The memory system sees INTACK asserted
on the rising clock edge on which the first trap instruction
returns from memory.
Delays in interrupt acknowledgment are due to multicycle instructions in the pipeline that were fetched before
the interrupted instruction. Therefore, if a multi-cycle instruction is in the pipeline's decode stage when the interrupt is sampled, that instruction's pipeline delays (designated by internal ops, or IOPs) affect the timing of the
interrupt acknowledge. Figure 9 shows these events; instruction 2 is the interrupted instruction. Interrupts are executed before the interrupted instruction, and thus instruction 2 is annulled (not executed).
If the CY7C604A asserts MHOLD, the control signal
outputs from the CY7C601 are frozen, and INTACK is
asserted as long as MHOLD is asserted. MHOLD causes
the pipeline for the CY7C601 to freeze, and all bus signals asserted by the CY7C601 during this freeze remain
asserted on the bus.
An example of such a case appears in Figure 10, in
which MHOLD is asserted due to the fetch of an interrupt
instruction that has been declared non-cacheable in the
MMU. This is a common case, as interrupt handlers are
generally part of the kernel or monitor code. Declaring the
interrupt routine to be in a non-cacheable segment of
memory forces the CY7C604A to fetch the interrupt instruction from main memory. Thus, the CY7C604A must
assert MHOLD until this instruction is fetched. The assertion of MHOLD causes the INTACK signal to remain asserted until MHOLD is released. Note that this case is
likely to repeat, as the subsequent interrupt instructions
probably reside in the same memory segment.

Figure 8. Best-Case Interrupt Latency
(Signals shown: CLK, A<31:0>, D<31:0>, IRL<3:0>, INTACK; the interrupt is sampled, latched, and prioritized.)

Figure 9. Worst-Case Interrupt Latency

Figure 10. INTACK Frozen Due to MHOLD
(Signals shown: CLK, A<31:0>, D<31:0>, IRL<3:0>, INTACK, MHOLD, MDS.)
visor state in the PS bit. The CY7C601I611 does automatically save the return address for the trap in the trap-window registers r[17] and r[18].
Note that for nested interrupts, the PSR's PIL field
must be updated to equal the current interrupt level before
re-enabling traps. This protects the CY7C601/611 from
further interruptions by the same level of interrupt.
In addition to saving the processor state, the software
designer must determine how to handle potential window
overflows, which can be caused by nested-trap handlers.
This problem can occur because the CY7C601/611 does
not check the WIM register for window overflow when
the processor enters the next window to process a trap.
This aspect of the SPARC architecture is necessary to
save at least one window for trap handlers. For instance,
the CY7C601/611 checks the WIM register to detect
potential window overflow when a SAVE instruction is
executed. Upon detecting that a window save would push
the processor state into a "WIMmed" window, the
CY7C601/611 enters a window overflow trap. To process
this trap without overwriting the current window registers,
the CY7C601/611 jumps into the WIMmed window, ignoring the WIM register. Because the WIM register does
not affect trap entry, the register must save a window for
trap handlers. The register also prevents procedure calls
from overwriting valid windows.
The use of nested interrupts adds another level of
complexity to window management. If the entire set of
non-WIMmed windows has been used, an interrupt jumps
into the last (WIMmed) register window. If traps are
enabled again without any other corrective actions, the
next trap (or interrupt) overwrites the next window upon
entering the trap. To prevent this problem, the software
Nested Interrupts
Upon taking a non-reset trap, the CY7C601/611 executes the following operations:
Sets the PSR's ET bit to Zero (traps disabled)
Copies the PSR's S bit into the PS (previous supervisor) bit, then sets the S bit to One
Decrements the CWP (current window pointer) by one (next window)
Saves the PC and nPC into r[17] and r[18], respectively, of the trap window
Sets the tt field of the TBR (trap base register) to the appropriate value (according to IRL[3:0])
Writes the PC with the contents of the TBR, and writes the nPC with the value of the TBR + 4
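The trap-entry sequence listed above can be sketched as a small software model. This is only an illustration, not a description of the silicon: the class name `TrapState`, the function `take_trap`, the eight-window count, and the treatment of each window as an independent register array are all assumptions made for the example. The TBR layout used here (trap base address in bits 31:12, tt field in bits 11:4) follows the SPARC architecture definition.

```python
from dataclasses import dataclass, field

NWINDOWS = 8  # assumption: window count chosen for this model


@dataclass
class TrapState:
    """Toy model of the state touched by trap entry (not cycle accurate)."""
    pc: int = 0
    npc: int = 4
    et: int = 1   # PSR enable-traps bit
    s: int = 0    # PSR supervisor bit
    ps: int = 0   # PSR previous-supervisor bit
    cwp: int = 0  # current window pointer
    tba: int = 0  # trap base address (TBR bits 31:12)
    tt: int = 0   # trap type (TBR bits 11:4)
    # Simplification: each window is modeled as an independent r[0..31] array.
    regs: list = field(default_factory=lambda: [[0] * 32 for _ in range(NWINDOWS)])

    @property
    def tbr(self) -> int:
        return (self.tba & 0xFFFFF000) | ((self.tt & 0xFF) << 4)


def take_trap(st: TrapState, tt: int) -> None:
    """Apply the listed trap-entry operations in order."""
    st.et = 0                           # traps disabled
    st.ps = st.s                        # remember previous supervisor state
    st.s = 1                            # enter supervisor mode
    st.cwp = (st.cwp - 1) % NWINDOWS    # next window; WIM is not checked here
    st.regs[st.cwp][17] = st.pc         # return address
    st.regs[st.cwp][18] = st.npc
    st.tt = tt                          # trap type selects the table entry
    st.pc = st.tbr                      # vector through the trap table
    st.npc = st.tbr + 4
```

Note that nothing in `take_trap` saves the PSR itself or consults the WIM register, which is why a handler that re-enables traps has to do both jobs in software.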
Note that upon entering a trap, the CY7C601/611 immediately disables all other traps. Some systems require
that the processor be able to respond to higher-priority interrupts or other traps while executing an interrupt handler. This capability is referred to as nested interrupts.
If the CY7C601/611 must support nested interrupts or
traps, the software designer must re-enable traps after
taking precautions to protect the previous state of the
machine. Most software designers using SPARC systems
use a stack to save windows and the state of the
processor.
Note that for SPARC (and most RISC processors in
general), the hardware does not implement the stack
pointer and the process of saving the processor state upon
entering a trap, which leaves the task to the software
designer. This task includes saving the PSR, because the
CY7C601/611 does not save the PSR upon entering a
trap; instead, the CY7C601/611 saves the previous super-


Table 2. CY7C604A/605A Mbus Signals
CY7C604/605 Name    Mbus Name      Signal Description
AERR*               (illegible)    Asynchronous error output
RSTOUT*             (illegible)    Module reset output signal
RSTIN*              (illegible)    Module reset input signal

designer must ensure that at least one additional window
beyond the current trap window is available before re-enabling traps within a trap handler.

CY7C604A/605A Notes
Three CY7C604A/605A signals differ from the corresponding signal name used in the Mbus specification.
Table 2 lists these CY7C604A/605A Mbus signals and
their corresponding Mbus names.
If you implement an Mbus arbiter, note that under
certain conditions the CY7C604A/605A holds ~ active for multiple Mbus transactions. Those conditions are:

When the CY7C604A is holding the bus during a table walk and has not received a relinquish-and-retry response
When the CY7C604A is holding the bus for a retried write
When the CY7C604A is holding the bus for a retried read
When the CY7C604A is holding the bus for an atomic load/store that was not relinquished and retried
When the CY7C604A is holding the bus to complete a burst access (normal operation)
When the CY7C604A had the bus for the last transaction that was not relinquished and retried, and currently has a grant, and has an access pending
Accesses are considered to be pending for the
CY7C604A only when one or more write accesses are
queued in the write buffer. This can take the form of
either multiple write accesses queued in the write buffer,
or of one or more write accesses in the write buffer forcing a pending read access. In the latter case, the read access must remain pending until the write buffer is cleared.
Read transactions must be delayed until the write buffer is
cleared. This ensures data consistency in case one of the
writes is to the same address as the read transaction.

The Impact of Memory Design on
High-Performance RISC Microprocessors
Memory design has always been a crucial factor in
the race for high-performance processing. Now the stakes
are higher than ever before with the advent of RISC
microprocessors, which require a memory access during
every clock cycle and speeds exceeding 40 MHz.
To feed these high-performance engines, you are
faced with building a CMOS or TTL-based memory system that must sustain a bandwidth on the order of 160
Mbytes per second (assuming 32-bit accesses, one access
per clock, and 40 MHz). Because the processor can only
run as fast as the memory system, a high-performance
memory system is a crucial part of any RISC design.
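The 160-Mbyte/s figure follows directly from the stated assumptions, and the arithmetic can be checked with a short sketch. The function name `peak_bandwidth_mbytes_per_s` is invented for this example; the second call uses the 64-bit Mbus width and 40-MHz clock quoted later in this note.

```python
def peak_bandwidth_mbytes_per_s(bus_bits: int, clock_mhz: float,
                                accesses_per_clock: float = 1.0) -> float:
    """Peak bandwidth in Mbytes/s for a bus of the given width and clock rate."""
    bytes_per_access = bus_bits / 8
    return bytes_per_access * clock_mhz * accesses_per_clock


# 32-bit accesses, one access per clock, 40 MHz
processor_demand = peak_bandwidth_mbytes_per_s(32, 40)   # 160.0

# 64-bit Mbus at 40 MHz
mbus_peak = peak_bandwidth_mbytes_per_s(64, 40)          # 320.0
```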
Ideally, a memory system should be big, fast, and
cheap. Unfortunately, these goals are often at odds with
one another. A simple high-speed SRAM memory system
large enough and fast enough to fulfill the RISC
processor's needs would be ideal if SRAMs were not
expensive and power hungry and did not need much more
board area than DRAMs. The latter cost much less and
provide better memory density but also run several times
slower and require a more complex addressing and control
interface. Using only DRAM for a memory system implies multiple wait states for each memory access. This is
disastrous to the performance of a high-speed RISC
processing engine. The typical solution to these conflicting requirements of speed and density versus cost is to use
a high-speed SRAM cache memory system backed by a
DRAM main memory system.
Cache systems are a well-recognized and commonly
used solution for high-speed processing systems. Cache
memory systems were proposed early in the 1960s and
have been used extensively in mainframes and minicomputers since the mid to late 1960s. Cache memory has become increasingly interesting to the designers of small
computer systems as microprocessor speed and memory-bandwidth requirements have increased. Consequently,
RISC processors make extensive use of cache systems to
meet their memory-bandwidth needs.

Cache systems are not the only variable in the
memory system performance equation, however. The
memory system picture includes two factors: cache performance, which is often measured in terms of cache hit
ratio, and cache miss penalty, which is a function of both
the cache controller and the main memory system. The
average memory access time gives a good perspective on
the total memory solution:
$$t_{avg} = t_{ch}\,c_{hr} + t_{cm}\,(1 - c_{hr}) \qquad \text{(Eq. 1)}$$

where $t_{avg}$ = average memory system access time
$t_{ch}$ = cache-hit memory access time
$t_{cm}$ = cache-miss memory access time
$c_{hr}$ = cache hit ratio
$1 - c_{hr}$ = cache miss ratio
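Equation 1 is simple enough to evaluate by hand, but a small helper makes it easy to explore how sensitive the average is to the hit ratio. The function name and the 10-clock miss time in the example are illustrative assumptions, not figures from this note.

```python
def average_access_time(t_ch: float, t_cm: float, hit_ratio: float) -> float:
    """Eq. 1: probability-weighted average of hit and miss access times."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return t_ch * hit_ratio + t_cm * (1.0 - hit_ratio)


# Hypothetical system: 1-clock hits, 10-clock misses.
for chr_ in (0.90, 0.96, 0.98):
    print(chr_, round(average_access_time(1, 10, chr_), 2))
```

With these assumed numbers, moving the hit ratio from 90 to 98 percent brings the average from 1.9 clocks down to 1.18 clocks.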
For most cache systems, the cache-hit memory access
time is one clock, which represents a zero-wait-state
memory for RISC processors. A useful approximation for
estimating performance for systems using RISC processors (assuming one clock per memory access), is that performance equals the product of the average memory system access time and the processor's average number of
clocks per instruction (CPI). This product yields an adjusted system clocks per instruction value that is useful in
estimating system performance:
$$CPI_{system} = CPI_{processor} \times t_{avg} \qquad \text{(Eq. 2)}$$

where $CPI_{system}$ is the adjusted CPI for system performance
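As a rough illustration of Equation 2, the sketch below scales a processor CPI by the average memory access time (in clocks) and reports the resulting fraction of peak performance. The function names and the processor CPI of 1.5 are assumptions chosen for the example only.

```python
def system_cpi(cpi_processor: float, t_avg_clocks: float) -> float:
    """Eq. 2: adjusted clocks per instruction for the whole system."""
    return cpi_processor * t_avg_clocks


def fraction_of_peak(t_avg_clocks: float) -> float:
    """Performance relative to a zero-wait-state (1-clock) memory system."""
    return 1.0 / t_avg_clocks


cpi = 1.5  # hypothetical processor CPI
print(system_cpi(cpi, 1.0), fraction_of_peak(1.0))  # zero wait states
print(system_cpi(cpi, 2.0), fraction_of_peak(2.0))  # two-clock average
```

Under this rule of thumb, the processor CPI cancels out of the relative figure: only the average access time determines how far the system falls below its peak.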
This rule-of-thumb equation illustrates the importance
of memory system performance. As with any processor,
only a memory system providing zero wait-state accesses
permits a RISC processor to achieve maximum performance. Because a RISC system requires a memory access
for every clock cycle, an average memory access time of

memory system features can be optimized to offset the
cache's performance.

two clocks cuts the maximum attainable system performance in half!
As can be seen in Eq. 1, the cache hit ratio and the
cache-miss memory access time are the two parameters
that can be manipulated to achieve the maximum system
performance within your constraints. Cache hit ratio is the
ratio of cache hits to the total number of cache requests
and is largely a function of the cache design. The cache-miss memory access time is the combination of the latency imposed by the cache controller as it fetches the
missed cache line and the latency caused by the main
memory system. Cache-miss memory access time is not
simply the sum of these two latencies, though, because the
cache line fetch timing and main memory timing overlap.
Cache hit ratio is important to memory system performance. Whether you are using a custom-designed cache
controller or an off-the-shelf product, it is necessary to understand the factors that contribute to cache hit ratio. This
understanding allows you to make a rough estimate of
cache performance, which in turn helps you define the main-memory-system performance required to
meet the desired system performance. Along with processor performance, the cache and supporting memory system determine the achievable level of system performance.

Size and Set Associativity
Set associativity also contributes to cache performance. Set associativity describes the number of cache
locations to which a single address can be mapped. In
other words, a cache with N-way set associativity can map
any address to N cache locations.
A fully associative cache large enough to
yield a high cache hit ratio is, in practice, extremely difficult to implement for a useful clock speed. Therefore,
cache designs generally use four-way, two-way, or one-way (direct-mapped) set-associative caching.
For smaller cache sizes, the greater the set associativity, the greater the cache hit ratio. However,
studies have demonstrated that the benefits of set associativity decrease as cache size increases. Figure 1 illustrates the cache hit ratios for 1-, 2-, and 4-way set-associative caches as a function of cache size. Note that as
cache size increases, the cache hit ratio curves for the
cache systems converge.
Multiple set associativity carries a penalty for cache
system design. The greater the level of set associativity,
the greater the number of cache tags that must be compared to determine a cache hit. This requirement directly
affects the maximum clock speed at which a cache controller can operate. As Figure 1 shows, a 2- or 4-way set-associative cache offers little performance advantage over
the direct-mapped cache at cache sizes of 64 Kbytes and
larger. Assuming that the cache can provide a sufficiently
large memory size, reducing the level of set associativity
carries the advantage of decreased cache controller complexity and increased maximum speed. The direct-mapped
cache provides virtually the same cache hit ratio as the
more complex 2- and 4-way set-associative caches, yet
promotes greater overall system performance by allowing
a faster system clock for the processor and cache system.

Cache System Performance
Cache performance and its contributing factors have
been a topic of intense study in computer architecture
circles. This application note is not intended to provide a
detailed analysis of cache performance and design. However, a short discussion of cache performance serves as
background for a discussion of memory system performance.
Cache hit ratio is the primary metric of cache performance and is strongly influenced by cache size. The larger
a cache is, the more likely it is to hold the required datum.
However, the tradeoff to cache size is cost. The reasons
for avoiding a large SRAM memory system are system
cost and memory density. If several megabytes of SRAM
are an affordable option, why build a cache in the first
place? The purpose of a cache is to provide enough high-speed memory to effectively increase processor performance, yet still stay within the system budget. Therefore
the next question is: How big is big enough?
Unfortunately, the question of cache size is not easily
answered. Caches often cannot be made arbitrarily big,
but they need to be large enough so that their benefit offsets their cost. Assuming a system budget of some type
(cost, power consumption, size, or a combination of
these), a good approach to designing a cache is to choose
a target cache hit ratio based on the system performance
requirements. This target cache hit ratio is driven by system performance (as described by equations 1 and 2) and
the system design constraints. If the system budget for the
cache does not allow the cache hit ratio required to meet
the system performance requirements, the supporting

Block Size
Another contributing factor to cache hit ratio is cache
block (or line) size, which is the number of bytes fetched
by the cache upon a cache miss. Cache performance
generally increases as the cache line size increases, because the cache fetches more data upon a miss and is
more likely to contain the next segment of code. As the
cache line size increases, however, processor delays
caused by the cache line fetch and the likelihood of fetching unnecessary memory contents detract from performance. The net effect is that cache performance generally
increases as a function of cache line size; but the small
improvements in cache performance must be balanced
against the disadvantage of processor stalls caused by the
longer cache line fetches.
Many other factors contribute to a cache's overall
performance. One such factor is the method by which the
main memory is updated upon a write access to the cache.
Because the cache contains a copy of data stored in main


[Figure 1 plot: "Effects of associativity on miss rate for user component in multiprogramming environment." Miss rate (%, 0 to 20) versus cache size (Kb, 1 to 512) for 1-set, 2-set, and 4-set caches. Source: ACM Transactions on Computer Systems, 11/88, Vol. 6, No. 4, "Cache Performance of Operating System and Multiprogramming Workloads" (Agarwal, Hennessy, Horowitz).]
Figure 1. Effects of Associativity
memory, writing to the cache changes that data, which outdates main memory's contents. This problem introduces the issue of maintaining data consistency between the cache and main memory.
Two caching modes are used to ensure data consistency: write through and copy back. Write-through caching avoids the data-consistency problem by writing to main memory with every cache-write access. The problem with the write-through approach is that write accesses incur main memory delays upon every write to the cache. You can avoid these delays by using write buffers to store the data from write accesses, but buffers solve the problem only to the extent that they can store the write-access data and unload it to main memory. Block store operations, such as those used in context switches, often cause processor stalls under write-through mode, when the write buffers become overwhelmed with data. The write-through method also has the disadvantage of increasing bus traffic, because each write access forces a bus transfer. For these reasons, designers of shared-bus multiprocessing systems have largely abandoned write-through caching.
The alternative, copy-back caching, allows the processor to write to cache memory without immediately updating main memory. The copy-back cache keeps a state bit in each cache tag entry to report the modified status of a cache line. If the processor writes to a cache line, the copy-back cache controller sets the modified bit for that cache line. When a cache line is no longer needed, the state of its modified bit is checked. If the cache line has not been modified, the cache line is overwritten with a new cache line. If the cache line has been modified, however, the modified cache line is written out to main memory before being replaced.
Copy-back caching has the advantage of allowing any number of write accesses to the cache without processor delays. It also conserves system bus bandwidth, because cache lines are only written to memory when the cache line is no longer needed.

Cache Speed
A design issue of growing importance is the difficulty of building a cache fast enough to meet the processor's needs. Designing a discrete CMOS or TTL cache controller that can achieve zero-wait-state performance is becoming prohibitively difficult at processor speeds of 25 MHz and beyond. Driven by the timing problems of high-speed cache design, many designers are using ASICs to implement custom cache controllers at speeds of 16 to 25 MHz. At speeds of 33 MHz and above, designers are relying on VLSI cache controllers.
The use of VLSI cache controllers as part of a processor chip set is becoming the preferred method of microprocessor CPU design. This approach minimizes design time, while offering superior performance with minimal cost. VLSI cache controllers also provide speed enhancements due to the integration of features such as cache tag memories and MMU controllers. By providing greater functional density than that achievable with ASICs, the VLSI custom controller offers greatly enhanced levels of integration and maximum system speed.
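The difference in bus traffic between the write-through and copy-back modes described above can be illustrated with a toy direct-mapped cache model. This is a sketch under stated assumptions: the function name, the four-line cache, and the 32-byte line size are invented for the example, only store accesses are modeled, and any lines still marked modified at the end of the run are counted as one final write-back each. A write-through transaction moves a single datum while a copy-back transaction moves a whole cache line, so the counts compare bus transactions, not bytes.

```python
def memory_write_transactions(write_addrs, policy, num_lines=4, line_bytes=32):
    """Count main-memory write transactions for a stream of store addresses."""
    if policy == "write-through":
        # Every store is also written to main memory.
        return len(write_addrs)
    if policy != "copy-back":
        raise ValueError("policy must be 'write-through' or 'copy-back'")

    tags = [None] * num_lines        # which memory block each line holds
    modified = [False] * num_lines   # per-line modified (dirty) bit
    writes = 0
    for addr in write_addrs:
        block = addr // line_bytes
        index = block % num_lines    # direct-mapped placement
        if tags[index] != block:
            if modified[index]:
                writes += 1          # write the modified line out before replacing it
            tags[index] = block
        modified[index] = True       # the store marks the line as modified
    return writes + sum(modified)    # flush lines still modified at the end


# Eight stores that all fall within one cache line.
stores = [0, 4, 8, 12] * 2
print(memory_write_transactions(stores, "write-through"))  # 8
print(memory_write_transactions(stores, "copy-back"))      # 1
```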

memory bank outputs to place each word of the cache line
on the memory bus. This method does not necessarily
reduce the latency associated with the initial cache-line
access, but the latency for all subsequent accesses in the
cache line is minimized.
Cache line prefetch is another method you can use to
maximize main-memory performance. Because caches are
designed around the concept of sequential memory accesses due to an effect known as spatial locality, a cache
miss on any specific cache line increases the probability
of a cache miss on the next cache line. Main memory can
use this concept to anticipate the cache by prefetching the
next cache line after servicing a cache-line fetch.
You can accomplish cache-line prefetch by designing
a memory controller that can access the next cache line
and store it into a prefetch buffer in the memory controller. You can also implement cache-line prefetch by asserting the next cache-line address to the memory in anticipation of the next cache line, thereby minimizing the initial
memory access latency. Either method requires a more
complex memory controller and an address comparator to
prevent memory access errors. Note that by implementing
cache-line prefetch in the main memory system as opposed to the CPU cache, you avoid unnecessary bus traffic for unused prefetched cache lines.
An extension of the cache-line prefetch approach is to
employ secondary, or second-level, caching. The secondary cache is essentially a much larger cache used to support the smaller CPU cache. In general, the secondary
cache is $2^k$ times larger than the primary CPU cache, and
the secondary cache blocks contain $2^n$ primary cache
blocks (where n and k are typically ≥ 2). The use of a
secondary cache allows fast cache-line fetching by the
primary-level cache, assuming a cache hit in the secondary cache. Upon a secondary-cache miss, data is supplied
from main memory.
To minimize the secondary-cache-miss penalty,
cache-line forwarding is generally used. This allows the
cache line requested by the first-level cache to be fetched
from main memory with essentially the same latency as
main memory alone. The secondary cache updates itself
with the missed cache line as the line is supplied to the
primary cache. The secondary cache then fetches the
remainder of the secondary cache line.
Note that the initial main memory access delay for a
DRAM memory system is generally sufficient time for a
secondary cache to determine if a cache hit has occurred.
This delay can be used in designing a secondary cache
that introduces no latency penalty over a main memory
system alone. The cache-line address can be supplied
simultaneously to both main memory and the secondary
cache. The secondary cache uses the initial main memory
access latency time to determine whether a cache hit occurred and to inhibit main memory, thus preventing bus
contention.
Including a secondary cache in the system greatly
reduces the latency associated with a primary cache miss.
Assuming a primary-cache hit ratio of 90 percent, only 10

As semiconductor technology matures, increasing
levels of CPU integration become possible. This has
resulted in the recent emergence of integrated processors
with on-chip cache systems. Integrated caches offer
greater system integration and the opportunity for processor architectural improvements that are prohibitive to implement outside the chip.
However, one die currently cannot accommodate an
entire CPU plus a cache that achieves a 96-percent cache
hit ratio. The number of transistors that can be placed on a
single chip is limited, which forces chip designers to
reduce cache size to allow room for the processor. The
cache size problem increases significantly as the processor
becomes larger and more complex. The transistor budget
for an integrated cache processor currently forces system
tradeoffs that require the supporting memory system to
compensate for the resulting low cache hit ratio.
Integrated cache processors can present a problem for
the system designer attempting to assess system performance. Benchmarks for these processors are often carefully chosen or modified to maximize the cache hit ratio,
providing performance numbers that you cannot achieve
in the real world. This leads to a buyer-beware situation.
In evaluating any cache system, whether it is on or off
chip, weigh performance numbers against unbiased,
authoritative research findings on similar cache designs.
Always keep in mind that processor performance
depends on the entire memory system, not just the cache.
Even a cache system with a very high hit ratio can result
in mediocre system performance if a slow supporting
memory system hinders the cache. You must therefore
pay attention to the supporting memory system to achieve
the desired system performance.

Getting the Speed You Need From Main
Memory
As previously stated, minimum average memory access time represents maximum system performance. Equation 1 gives the average memory access time as the probability-weighted sum of the average cache-hit memory access time and the average cache-miss memory access
time. Although you always want to minimize the supporting memory latency, the importance of minimizing main-memory latency grows as the probability of a cache miss
increases.
The obvious approach to minimizing the supporting
memory's latency is to design a fast DRAM main
memory. To provide maximum access speeds, DRAMs
commonly provide fast sequential memory accesses via an
addressing mechanism such as page mode, static column,
or nibble mode. These features prove useful for cache line
fetch, because a cache line is a fixed-length, sequential
series of memory accesses. However, sequential accesses
are often not enough.
Another method of increasing DRAM memory system speed is to employ interleaved banks of DRAM. This
essentially involves supplying addresses to several banks
of DRAM simultaneously and sequentially enabling the


[Figure 2 diagram. Labels shown: RISC CPU with cache, secondary cache, DRAM memory, memory bus, memory inhibit.]

Figure 2. Secondary Cache Systems
percent of the memory accesses are made from the supporting memory system. With the inclusion of a secondary
cache, 90 percent of those primary cache misses are
fetched from the zero-wait-state secondary cache, again
assuming a 90-percent cache hit rate. This leaves only 10
percent of the primary cache misses to be fetched from
main memory. Therefore, the percentage of memory accesses that incur the full delay from main memory drops
to 10 percent of 10 percent, or only 1 percent!
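The 1-percent figure, and its effect on average access time, can be worked through with a short sketch that extends Equation 1 to two cache levels. The function names and the 10-clock main-memory time are assumptions for illustration; the 1-clock secondary-cache time reflects the zero-wait-state secondary cache assumed in the text.

```python
def main_memory_fraction(primary_hit: float, secondary_hit: float) -> float:
    """Fraction of all accesses that miss both caches and go to main memory."""
    return (1.0 - primary_hit) * (1.0 - secondary_hit)


def two_level_average(t_primary, t_secondary, t_main, primary_hit, secondary_hit):
    """Average access time: Eq. 1 applied again to the primary-miss path."""
    miss_path = secondary_hit * t_secondary + (1.0 - secondary_hit) * t_main
    return primary_hit * t_primary + (1.0 - primary_hit) * miss_path


print(round(main_memory_fraction(0.9, 0.9), 4))        # 0.01, i.e. 1 percent
print(round(two_level_average(1, 1, 10, 0.9, 0.9), 2)) # hypothetical 10-clock main memory
```

With the same assumed 10-clock main memory and no secondary cache, Equation 1 gives 1.9 clocks at a 90-percent hit ratio, so the secondary cache in this example recovers most of the lost performance.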

using additional CY7C604 cache controller/MMUs and
CY7C157 cache RAMs.
The CY7C604 provides a high degree of functional
integration and includes features such as on-chip write and
read buffers, cache-tag memory, and a SPARC reference
MMU with 64 lockable TLB (translation lookaside buffer)
entries. The CY7C604 supports both copy-back and write-through cache modes, giving you the superior system performance of copy back or the simple cache coherency afforded by write through.
Under copy-back mode, the CY7C604's on-chip
write and read buffers allow modified cache lines to be
simultaneously flushed out of the cache while the missed
cache line is fetched from main memory. Write buffers
also boost performance in write-through mode and for
non-cached memory accesses, allowing the CY7C604 to
store up to four double-word memory writes without stalling the CY7C601 processor.
Incorporating cache-tag memory into the CY7C604
provides extremely fast recognition of cache hits or misses, thus allowing the cache to run faster than is possible
with off-chip tag memory. Including the SPARC reference
MMU with the cache controller on the CY7C604 allows
tightly coupled operation between the cache controller and
MMU functions. The MMU checks access-privilege status
for all memory accesses, including those to cache, thereby
protecting memory from unauthorized accesses. In addition, the SPARC reference MMU supports execute-only
access protection for memory, providing an additional
level of security for sensitive code and computing environments.
Cache miss latency for the CY7C604 cache is minimized by the use of high-speed, 0.8-µm, dual-layer-metal,
CMOS logic and support of the SPARC reference Mbus.
Mbus is a 64-bit multiplexed address and data bus that
supports burst-mode accesses and provides a peak bus

Secondary Cache System Applications
Secondary caching proves especially useful for supporting small, on-chip integrated processor caches and for
supporting shared-bus, multiprocessing systems (Figure
2). For the latter, processing nodes with small- to
medium-sized primary caches can share a large secondary
cache. This approach allows equal or greater performance
than nodes with large primary caches and also reduces the
cost of each processor node.

The CY7C600 Chip Set
The Cypress CY7C600 SPARC RISC chip set is an
example of a high-integration CPU with a VLSI cache
subsystem (Figure 3). This chip set comprises the
CY7C601 Integer Unit, the CY7C602 Floating-Point Unit,
the CY7C604 Cache Controller and MMU, and two
CY7C157 Cache RAMs. These five chips constitute a
high-performance CPU that requires no glue logic and
operates at speeds from 25 to 40 MHz, providing 29
MIPS of sustained integer performance.
As a part of this CPU, the CY7C604 provides a tightly coupled SPARC reference MMU and cache controller
with cache tag RAM. The chip implements a 64-Kbyte,
direct-mapped cache. The 64-Kbyte cache is estimated to
provide an average cache hit ratio of 96 to 98 percent.
You can expand the cache to a maximum of 256 Kbyte by


[Figure 3 block diagram. Blocks and labels shown: CY7C601 Integer Unit, CY7C602 Floating-Point Unit, FP interface signals, virtual address bus VA<31:0>, CY7C604 Cache Controller and MMU, Mbus (64-bit multiplexed data/address bus).]

Figure 3. CY7C600 SPARC Chip Set
bandwidth of 320 Mbyte/s at 40 MHz. Mbus allows cache
lines to be transferred in bursts, providing a fast interface
to main memory. All CY7C604 burst accesses are in
cache line lengths and on cache line boundaries, simplifying both the main memory design and the Mbus interface.
In addition to supporting the high-speed Mbus, the
CY7C604 provides support signals for secondary-cache
systems. The CY7C604 furnishes visibility into the cache
operation from the memory bus by supplying a cache
status signal. This function gives a secondary-cache controller greater flexibility in managing the status of its
cache and can be used to increase secondary-cache efficiency.
The CY7C600 chip set is a high-performance RISC
CPU, providing maximum system performance with minimal design effort. The chip set is available in speeds of 25
to 40 MHz, and its five-chip, no-glue-logic design offers a
highly compact solution to state-of-the-art computing
needs.


High-Speed CMOS SPARC
System Design
This application note describes many of the effects
caused by high clock speeds and gives rules of thumb for lessening their severity. Following these rules of
thumb will help ensure a successful SPARC hardware
design.
The SPARC (Scalable Processor ARChitecture) RISC
processor is the only RISC processor architecture
designed to be scalable, so that the processor's clock
speed can increase as semiconductor process technology
improves. The benefits of scalability appear most dramatically in the Cypress CY7C600 SPARC product family. In
a little more than a year, Cypress has increased the clock
speed on the CY7C601 integer unit from 25 to 40 MHz.
As the CY7C600 SPARC family leaps upward from
25 to 40 MHz, system designers must become more aware
of the effects of fast clock speeds upon hardware design.
High-speed hardware design is not a difficult art, but it
does require careful and close attention to detail.
The effects that can lead to untraceable bugs in a
high-speed system also exist in a low-speed system; however,
the magnitude of these effects in a slow system is small
enough that they can be safely ignored. This is not the
case when clock speeds rise above 25 MHz.

System Clock
At speeds above 25 MHz, generating and distributing
the system clock becomes a critical issue. The goal is to
minimize the effects caused by duty-cycle imbalance,
clock skew, and noise on clock lines.
Duty-Cycle Imbalance
Duty-cycle imbalance occurs when the High and Low
portions of the clock signal are not symmetrical. Clock
symmetry can vary from 40 to 60 percent, depending
upon the hybrid crystal oscillator used. A simple way to
ensure that the clock is symmetrical is to generate a signal
at twice the desired frequency, then divide this frequency
down to the system clock frequency using a D flip-flop
(74AC11074). Figure 1 depicts a simple clock generation circuit.
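The reason the divide-by-two approach restores symmetry is that the flip-flop responds only to one edge of the oscillator, so the oscillator's own duty cycle never reaches the output. The sketch below models this behavior. It assumes the usual divide-by-two wiring (D tied to the complementary Q output) and an ideal flip-flop with equal rising and falling delays; the function names and the 80-MHz oscillator are illustrative.

```python
def rising_edges(period_ns: float, n_cycles: int):
    """Times of the oscillator's rising edges; its duty cycle is irrelevant here."""
    return [k * period_ns for k in range(n_cycles)]


def divide_by_two(edge_times):
    """D flip-flop with D tied to Q-bar: Q toggles on every rising clock edge."""
    q = 0
    transitions = []
    for t in edge_times:
        q ^= 1
        transitions.append((t, q))
    return transitions


def duty_cycle(transitions, end_time: float) -> float:
    """Fraction of the observed interval during which the output is High."""
    high = 0.0
    following = transitions[1:] + [(end_time, None)]
    for (t, level), (t_next, _) in zip(transitions, following):
        if level:
            high += t_next - t
    return high / (end_time - transitions[0][0])


# 80-MHz oscillator (12.5-ns period) divided down to a 40-MHz system clock.
clock = divide_by_two(rising_edges(12.5, 8))
print(duty_cycle(clock, 100.0))  # 0.5
```

A real flip-flop and any buffer that follows it reintroduce some asymmetry through unequal Low-to-High and High-to-Low delays, which this ideal model ignores.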
All physical devices exhibit an edge-dependent,
propagation-delay asymmetry; i.e., the Low-to-High-going
edge rises faster than the High-to-Low-going edge falls, or
vice-versa. If a single driver buffers a clock line, the
driver introduces asymmetry into the system clock signal.
You can avoid this asymmetry by cascading two inverting

[Figure 1 schematic: a D flip-flop with PR, CLR, D, Q, and complementary Q pins. Labels shown: 40 MHz, System Clock.]

Figure 1. Symmetric-Duty-Cycle Clock Generation


[Figure 2 plot: delay (ns), 15 to 20, versus load (pF), 20 to 200.]
Figure 2. Delay (ns) vs Load (pF) for CY7C601
drivers in the same package. Because the drivers are in
the same package, their delay characteristics are
equivalent, and the differential between the Low-to-High
transition and the High-to-Low transition is zero. A clock
signal introduced into such a cascaded driver has the same
symmetry going out as it had going in.

(PCB). The only way to minimize clock skew is to design
the PCB so that the fanout on all clock lines is equivalent.
Use a chip with multiple on-board buffers to maximize
line-driving capability.
The load on a clock line has three components: trace
capacitance, socket capacitance, and input capacitance.
Because the high integration of the CY7C600 SPARC
family lends itself well to single-board designs, trace
capacitance is not usually an issue. Socket and input
capacitance dominate on PCBs.

Clock Skew
Clock skew is caused by the need to distribute the
system clock signal from a central point (the oscillator) to
components that are dispersed on the printed circuit board


clock drive lines in place of one. This cuts the capacitance
of the clock line in half.

Noise Generation
The CY7C600 family is fabricated using the Cypress
CMOS process. Because of the fast edge rates (1 - 2 V/ns)
and rail-to-rail voltage swings of high-speed Cypress
CMOS logic, careful attention must be paid to signal
noise. The primary sources of noise are ground bounce,
power supply, crosstalk, and transmission-line reflections.
You can combat noise effects by noise budgeting, good
grounding, use of synchronous circuits, and proper line
termination.

Figure 3. Parallel Clock Drivers
The pin grid array (PGA) package used for the
CY7C600 family has extremely low capacitance. The
maximum pin capacitances are 10 pF for input pins, 12 pF
for output pins, and 15 pF for bus pins. You can limit
other components' socket and input capacitances by using
surface-mount design techniques.
As a rule of thumb, limit a clock buffer's fanout to
eight to 14 devices. It is important to include both AC and
DC loading in your fanout calculations. Data for the
CY7C601 SPARC Integer Unit that relates delay to load
appears in Figure 2.
AC characteristics for logic devices are usually calculated using a value of 50 pF. If more than 50 pF of
capacitance is being driven, the driver's AC characteristics should be reduced for your calculations.
The input capacitance of a typical CMOS part is 5
pF. Bipolar logic is higher, with a typical input
capacitance of 10 pF. Typical ECL parts are lower, with
an input capacitance of about 3 pF. When you need a
clock fanout greater than a single buffer can supply, use the
parallel driver scheme shown in Figure 3.
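The capacitive side of the fanout calculation can be sketched in a few lines. This is a minimal illustration, not a Cypress tool: the function names, the 2 pF socket capacitance, and the choice of 50 pF as a per-driver budget are assumptions made for the example. Only the typical input capacitances and the 50 pF specification load come from the text.

```python
import math

# Typical input capacitances quoted in the text, in pF.
INPUT_CAP_PF = {"cmos": 5.0, "bipolar": 10.0, "ecl": 3.0}

# Load at which AC characteristics are usually specified, in pF.
SPEC_LOAD_PF = 50.0


def clock_line_load_pf(n_devices, family="cmos", socket_pf=0.0, trace_pf=0.0):
    """Sum the trace, socket, and input capacitance on one clock line."""
    return trace_pf + n_devices * (INPUT_CAP_PF[family] + socket_pf)


def parallel_drivers_needed(total_load_pf, budget_pf=SPEC_LOAD_PF):
    """Drivers needed so that each one sees at most budget_pf (assumed budget)."""
    return max(1, math.ceil(total_load_pf / budget_pf))


# Hypothetical case: 14 socketed CMOS inputs, 2 pF per socket.
load = clock_line_load_pf(14, "cmos", socket_pf=2.0)
print(load, parallel_drivers_needed(load))
```

With these assumed numbers the line carries 98 pF, so splitting it across two parallel drivers keeps each driver under the 50 pF specification load.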
DC input current ratings are important when calculating total loading. The driving device must be able to sink
the sum of the Low-level input currents to which it is connected. Low-level input current for bipolar logic ranges
from -100 to -400 µA. The corresponding figure for
CMOS is -1 to -5 µA, while ECL weighs in at 140 to
200 µA.
High-level input current for bipolar logic is from 20
to 50 µA, with CMOS at 1 to 5 µA and ECL at 265 to
350 µA. Because most bus drivers can sink up to -24 mA
and source up to 48 mA, input current loading is seldom
an issue. Input current loading might become significant
when driving a parallel-resistor-terminated load. In such a
case, use an AC termination scheme.
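The DC loading check described above is a simple sum. The sketch below uses the worst-case Low-level input current magnitudes quoted in the text for bipolar and CMOS inputs. The function names, the mix of loads, and the 24 mA limit used as the default are illustrative assumptions.

```python
# Worst-case Low-level input current magnitudes from the text, in microamps.
LOW_LEVEL_INPUT_UA = {"bipolar": 400.0, "cmos": 5.0}


def total_low_level_current_ma(loads):
    """loads maps a logic family to the number of inputs on the line."""
    total_ua = sum(LOW_LEVEL_INPUT_UA[family] * count
                   for family, count in loads.items())
    return total_ua / 1000.0


def within_sink_limit(loads, sink_limit_ma=24.0):
    """True when the summed input current stays within the driver's rating."""
    return total_low_level_current_ma(loads) <= sink_limit_ma


# Hypothetical mix: ten bipolar inputs and four CMOS inputs on one driver.
loads = {"bipolar": 10, "cmos": 4}
print(total_low_level_current_ma(loads), within_sink_limit(loads))
```

Ten bipolar and four CMOS inputs draw about 4 mA in total, which illustrates why input current loading is seldom the limiting factor.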

Ground Bounce
Ground-bounce noise arises when several outputs of a
CMOS logic device switch from High to Low. This simultaneous switching causes a large sink current from the
load capacitance to flow to ground through the device
package inductance. This current develops a momentary
potential whose magnitude equals the product of the package inductance and the sink current's rate of change:
$$V = L \times \frac{dI}{dt} \qquad \text{(Eq. 1)}$$

where V is voltage, L is the package inductance, and dI/dt
is the current's rate of change per unit time.
Figure 4 illustrates typical ground bounce as seen at a
device's output pin and the corresponding voltage induced
across a ground pin. The graph was computed using typical
values of L and V. The voltage is normalized to 1V. If
you apply 5V, for example, you see a ground bounce of
approximately 0.75V 1.2 ns after the power is applied.
Note the voltage undershoot at 0.5 ns caused by the inductance. Without damping or termination, you can expect
the ground bounce to settle to zero in approximately
1.8 ns.
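Equation 1 is easy to evaluate for a rough estimate. The following sketch assumes hypothetical values (10 nH of package inductance and eight outputs that each sink 24 mA within 2 ns); none of these numbers comes from a Cypress data sheet, and the function name is illustrative.

```python
def ground_bounce_volts(inductance_nh, delta_current_ma, delta_time_ns):
    """Equation 1: V = L x dI/dt, returned in volts."""
    inductance_h = inductance_nh * 1e-9
    di_dt = (delta_current_ma * 1e-3) / (delta_time_ns * 1e-9)
    return inductance_h * di_dt


# Hypothetical values: 10 nH, eight outputs sinking 24 mA each, 2 ns edge.
bounce = ground_bounce_volts(10.0, 8 * 24.0, 2.0)
print(round(bounce, 3))
```

Under these assumptions the bounce is about 0.96V, which is already above the 0.8V input Low-level maximum. Halving either the inductance or the number of switching outputs halves the result.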
The fast edge rates of the CY7C600 devices can lead
to a fairly large ground-bounce potential. This voltage
spikes the Low state held on the quiescent outputs and can
exceed the input Low-level maximum (0.8V), causing
downstream logic to switch erroneously. Ground-bounce
noise can also cause registers in the bounced device to
lose their stored state. This is caused by the momentary
disturbance in the device's ground and Vcc reference.
The switching of multiple outputs on a CMOS device
also changes its propagation delay. The delay increases by
approximately 200 ps per switched output. For a device
with a large number of outputs, this additional delay
should be included in worst-case timing analyses.
The magnitude of a given ground bounce is proportional to the package inductance and the number of outputs switched. By reducing parasitic inductance between
the package, ground and Vcc, you can minimize the effect of ground bounce. The most effective way to reduce
parasitic inductance is to use surface-mount technology
(SMT). You can also reduce parasitic inductance by using
packages with center Vcc and ground pins and by using
low-inductance bypass and decoupling capacitors. For

Clock-Line Noise
Noise on the clock distribution lines can have a ripple
effect upon other logic. It is important to take steps to
minimize self-generated noise on clock lines. Self-generated noise comes from two sources: reflectance from
the end of the clock line and overshoot caused by line
load capacitance. You can minimize reflectance by
properly terminating the end of the clock line. (The
"Noise Reduction" section covers line termination.) You
can substantially reduce overshoot by using two parallel


parts in critical logic paths, use a standard decoupling
capacitor (0.01 - 0.1 µF) along with a high-frequency
decoupling capacitor (470 pF).
If you employ pin grid array (PGA) or through-hole
technology, you can also reduce the effect of the ground
bounce by using series damping resistors on the package
outputs. The resistors lower the magnitude of the ground
bounce before it reaches the downstream logic. As an
added benefit, the magnitude of signal overshoot and undershoot is decreased. The tradeoff is slower switching
rates, due to the increased RC time constant.
You can also reduce ground-bounce magnitude by
using fewer outputs per package. Figure 5 shows the
relationship between ground-bounce magnitude and the
number of outputs switched. Note that the relationship is
roughly linear.
Further, ground-bounce magnitude is directly proportional to the power supply voltage. By reducing the magnitude of Vcc, you can reduce noise problems caused by
ground bounce.
Using only synchronous circuits provides built-in resistance to false triggering caused by ground bounce.
Synchronous circuits only trigger when inputs and the
clock signal change. The ground-bounce noise produced
by the upstream logic has one clock cycle minus the setup time to settle before the next clock reaches and triggers
the downstream logic.
If asynchronous logic is required, the use of an output
pin close to the package ground pin reduces ground-bounce
noise. The difference in noise magnitude between
pins next to the ground pin and pins next to the Vcc pin
can be as much as 50 percent.
To minimize the effect of ground-bounce noise upon
the rest of the circuit, avoid running control signals
through a device that drives data and/or address lines. The
probability of multiple data or address lines simultaneously transitioning is high. If the device also contains control
signals, they can be erroneously switched by the ensuing
ground bounce.

Power Supply
Like the system clock, the power supply generates a
global signal; its fluctuations have an effect upon every
component in the system. Power-supply variations have a
greater effect as clock speeds increase. High-frequency
noise and ripple from the power supply can cause differences in voltage levels among different sections of the
system. As a rule of thumb, high-frequency noise occurs
whenever the mean wavelength of the noise on the power
lines is not several times greater than the length of the
longest power line.
By causing the voltage levels to vary across the PCB,
high-frequency noise and ripple from the power supply
lead to a loss of noise immunity due to a reduction in the
difference between the voltage value of input Low and
input High. Bypass capacitors at the power supply input
smooth out momentary current fluctuations. High-frequency
power-supply noise should be specified to be under 50
mV peak to peak.

Figure 4. Ground Bounce and Voltage Induced on Ground Pin (plot: normalized voltage from -0.3 to 1 versus time from 0 to 2 ns)

Figure 5. Ground Bounce Magnitude vs Number of Switched Outputs (from Reference 5) (plot: magnitude from 0 to 100 versus number of outputs switching, 0 to 8)

Crosstalk
Crosstalk occurs when a signal passing through a
board trace or transmission line generates a corresponding
signal in an adjacent quiescent trace or line. Crosstalk
magnitude is proportional to three factors: edge rates,
physical proximity of the lines, and the distance over
which the two lines are adjacent. Because of the fast edge
rates of CY7C600 devices (up to 2 V/ns) and other high-speed
CMOS logic, crosstalk deserves careful consideration.
There are three ways to minimize crosstalk: grounding,
shielding, and separation. During the initial design,
maximize the distance between traces and minimize the
length over which they are adjacent or parallel. Run
ground strips alongside either the cross talker or the cross
listener or between them.
To prevent crosstalk, critical signals such as the clock
should always have a dedicated ground line. When possible,
the signals on adjacent PCB layers should be perpendicular
to each other. Use the power and ground layers
as shields between signal layers. For backplane and
wire-wrap applications, use twisted-pair for sensitive signals
such as clocks, asynchronous set and clear signals, and
asynchronous parallel loads. When using ribbon or flat
cabling, make every other conductor a ground line.
If crosstalk occurs in an already-designed board or
system, try these quick fixes to solve the problem: On
PCBs, glue a grounded wire or copper strip alongside or
between the affected traces. In a backplane or wire-wrapping
situation, spiral a ground wire around the talker
and/or listener to increase their shielding. Use a
split-resistor termination on the offending line, where
R1 || R2 = the Thevenin resistance, which is the impedance
of the line (Figure 6). (The "Parallel Termination" section
explains how to determine the Thevenin resistance.) You can use
diode or active termination to reduce ringing (see the
"Diode Termination" section). As a last resort, cut the
offending crosstalk trace from the PCB and replace it with a
wire. By re-routing the wire, you might reduce crosstalk,
at the possible cost of greater propagation delay.

Figure 6. Split-Resistor Termination (R1 || R2 = equivalent Thevenin resistance of termination)

Reflectance is caused by a mismatch between the line
characteristic impedance and the load impedance. The
following equation shows the relationships involved:
$$\rho_L = \frac{R_L - Z_0}{R_L + Z_0} \qquad \text{(Eq. 2)}$$
where RL is the load impedance, Zo is the line impedance,
and ρL is the coefficient of reflectance, which equals the
reflected voltage over the incident voltage. The equation
shows that the reflectance from the end of the line goes to
zero as the term RL - Zo goes to zero. Additionally, the
magnitude of the reflectance decreases to zero as RL + Zo
goes to infinity. These relationships show two ways to
decrease the reflection from the end of a transmission
line: match the line's impedance to that of the load to
minimize the voltage reflected or maximize the sum of
both impedances to minimize the effect of the reflected
voltage. The tradeoff is that maximizing impedance
decreases the signal rise time. Both methods are discussed
in the next section.
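The coefficient of reflectance in Equation 2 can be explored numerically. The sketch below is only an illustration: the function name and the sample load values on a 50Ω line are assumptions chosen for the example.

```python
def reflection_coefficient(load_ohms, line_ohms):
    """Equation 2: (RL - Zo) / (RL + Zo), reflected over incident voltage."""
    return (load_ohms - line_ohms) / (load_ohms + line_ohms)


# Hypothetical loads on a 50-ohm line.
for load in (25.0, 50.0, 100.0, 1000.0):
    print(load, round(reflection_coefficient(load, 50.0), 3))
```

A matched load returns zero, a load above the line impedance reflects with the same polarity as the incident signal, and a load below the line impedance reflects with inverted polarity.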

Reducing Noise
You can envision the effect of noise upon a system
by using a design method called noise budgeting. In the
simplest sense, noise budgeting is the allocation of noise
to system noise sources (ground, power, crosstalk, etc.) in
such a way that the noise immunity of individual components is not exceeded. Allocation is based on the calculated or expected values of the noise generated.
A noise budget table shows you the relative magnitude of noise generated by each source. This allows you
to focus your noise-reduction efforts where they will have
the most effect. A noise-budget table for a representative
system appears in Table 1. The entries for DC and AC
noise represent the peak noise values allocated to the
specific noise source. For example, the expected peak
noise due to EMI is 2 mV. Note that some noise sources,
such as temperature, have only DC components; others,
such as crosstalk, have only AC components.
The concept of a noise budget rests upon three points:
effectivity measures, probability theory, and noise immunity. An effectivity measure relates a circuit parameter,
such as temperature, to its effect upon a device's output
level. For example, temperature has a DC effectivity
measure of 1.0 in Table 1. This means that if the noise
generated by temperature variations is 10 mV, then the
noise output from the circuit equals 10 mV of noise times
a 1.0 DC effectivity, or 10 mV. You can determine effectivity measures from vendor-supplied circuit characteristics and graphs.
Probability theory comes into play when you consider
the chance that a noise event will falsely trigger a circuit.
You can consider noise in a computer system as random
for most practical purposes. This assumption might seem
counter-intuitive, as noise is caused by transients, whether
its source lies in temperature, ground levels, signal edges,
etc. Each noise point source is deterministic; a reflected
noise pulse is generated only when the incident signal
reaches the end of a transmission line. However, the noise

Transmission-Line Reflections
For long trace lengths or backplane connections, it is
sometimes necessary to consider transmission-line effects.
These effects are significant when the unloaded signal
transition time is less than or equal to the round-trip substrate propagation delay. For ordinary PCB materials (G-10 epoxy), the round-trip propagation delay is approximately 0.295 ns/inch. Unloaded signal transition time
for the CY7C600 devices varies from 3 to 2 ns, depending
upon clock speed. Traces longer than 6 - 10 inches should
be treated as transmission lines for noise calculation purposes.
Transmission lines suffer from three types of noise
effects: undershoot, overshoot, and ringing. Undershoot
occurs when a signal's voltage level momentarily drops
below the Low level (0V). Overshoot is the inverse:
when a signal's voltage level momentarily rises above the
High level (+5V). (Use of a 5V power supply is assumed.)
Ringing is when a noise pulse keeps on reflecting back
from the two ends of a trace or wire.
All of these effects result from reflectance at the end
of the trace or wire. Depending on where the reflection
appears in relation to the signal, a reflected noise pulse
can manifest as undershoot or overshoot.
Because of the CMOS technology used to fabricate
the CY7C600 devices, the parts resist damage caused by
undershoot and overshoot on input lines. The devices are
insensitive to -3V DC input levels (sustained) and -5V undershoot levels less than 10 ns long (measured at the 50
percent point). Input levels as high as +5.5V DC can be
withstood without damage, as can momentary overshoot
pulses of up to +6V DC.
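The 6 - 10 inch guideline in this section follows from dividing the signal transition time by the round-trip propagation delay per inch. The sketch below reproduces that arithmetic; the function name is illustrative, and the 0.295 ns/inch default is the G-10 figure quoted in the text.

```python
ROUND_TRIP_DELAY_NS_PER_INCH = 0.295  # G-10 epoxy figure quoted in the text


def critical_trace_length_in(transition_time_ns,
                             delay_ns_per_inch=ROUND_TRIP_DELAY_NS_PER_INCH):
    """Length at which the round-trip delay equals the transition time."""
    return transition_time_ns / delay_ns_per_inch


# Transition times of 2 ns and 3 ns span the range given for the CY7C600.
for t_ns in (2.0, 3.0):
    print(t_ns, round(critical_trace_length_in(t_ns), 1))
```

The two transition times give roughly 6.8 and 10.2 inches, which brackets the trace lengths at which you should start treating a trace as a transmission line.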


Table 1. Noise Budget

Source               DC Noise  DC           Equivalent  AC Noise  AC           Equivalent  Total Effective
                     (mV)      Effectivity  DC (mV)     (mV)      Effectivity  AC (mV)     Noise (mV)
Gnd to Gnd           11        1.0          11          105       0.45         47.25       58.25
PC card Crosstalk    --        --           --          75        1.0          75          75
Backpanel Crosstalk  --        --           --          71        1.0          71          71
Vcc Bus              230       0.10         23          80        0.29         23.2        46.2
Temperature          10        1.0          10          --        --           --          10
SIP Crosstalk        --        --           --          22        1.0          22          22
Termination          --        --           --          24        1.0          24          24
Wire Untwist         --        --           --          35        1.0          35          35
EMI                  --        --           --          2         1.0          2           2

The last point upon which noise budgeting rests is
noise immunity. Noise immunity is the amount of noise in
volts that a component can absorb without changing state.
The noise immunity for a component is the difference between input High voltage (VIH) and input Low voltage
(VIL). For the CY7C600 family, VIH = 2.1V and VIL = 0.8V.
This gives a noise immunity of 1.3V. For a CY7C600
component to be switched by a noise pulse, the noise must
therefore have a magnitude of at least 1.3V.

observed at any place in the system is the sum of all the
point sources of noise. This overall noise is random, because it is the sum of time-diverse noise sources such as
slow variations due to temperature, fast variations at clock
edges, etc.
Note also that noise generated by an event might arrive at a component from different sources separated in
time because of the different-length paths the noise took
to arrive at the component. Therefore, noise magnitude assumes a Gaussian distribution - the familiar bell-shaped
curve.
In a typical electronic system, no single point source
can generate enough noise to falsely trigger a circuit. A
circuit is only triggered when a group of random noise
pulses sum to greater than the circuit's noise immunity.
Because the resultant noise is the sum of random noise
with a normal distribution, the probability that noise of a
certain magnitude will be encountered equals the area
under the normal curve. Peak noise voltage occurs within
3 sigma limits of the mean noise voltage on the normal
curve. 99.7 percent of the area under the normal curve is
within 3 sigma of the mean. Noise magnitude will be less
than or equal to the peak noise 99.7 percent of the time.
Thus, the peak noise voltage will be exceeded approximately 0.3 percent of the time. The peak equivalent
noise for a system equals the root of the sum of the
squares (RSS) of the individual sources:
$$\text{Noise(equivalent)} = \sqrt{S_1^2 + S_2^2 + \cdots}$$

Grounding Techniques
Like the clock and power, ground is a common signal
for all components. For high-speed SPARC CMOS logic
design, the use of proper grounding techniques is important to reduce crosstalk and increase switching rates.
The basic grounding technique for PCBs is to provide
a ground comb on one side of the board. A ground comb
is a series of parallel strips connected at one end by a
perpendicular trace. The ground strips should only be connected at one end, to minimize noise coupling. The
ground load caused by switching components on each
strip should be dispersed in both time and space to
decrease the amount of noise coupling between components.
Remember that the ground, while providing a common voltage reference, also provides an alternate path for
noise signals. Ground bounce travels down signal lines as
well as the ground plane to which the circuit is connected.
Take care to ensure that chips with many outputs that
switch at the same time do not connect to the same
ground plane. The ground plane should be connected to
10 percent of the edge connector pins spaced equally
apart. This reduces the ground impedance, which minimizes crosstalk because multiple signals do not rely upon
a single ground return path. Connect high-current circuits
to a separate ground to minimize noise coupling to other
circuits. For high-speed SPARC CMOS designs, use a

where S1, S2 ... are the noise sources. The RSS of the system described in Table 1 is 136.3 mV. This is the peak
value of the total effective noise, and this value will not
be exceeded 99.7 percent of the time. For a design to be
immune to noise effects, the noise immunity of the component with the least amount of noise immunity must exceed this peak value by a wide margin. Conservatively, a
minimum noise immunity of 2 x peak is acceptable.
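You can reproduce the 136.3 mV figure from the Table 1 entries. The sketch below is illustrative: the dictionary layout and function names are assumptions, while the equivalent DC and AC values are the ones listed in Table 1 and the 1.3V noise immunity is the CY7C600 figure given in the text.

```python
import math

# (equivalent DC mV, equivalent AC mV) per source, as listed in Table 1.
NOISE_BUDGET = {
    "Gnd to Gnd": (11.0, 47.25),
    "PC card Crosstalk": (0.0, 75.0),
    "Backpanel Crosstalk": (0.0, 71.0),
    "Vcc Bus": (23.0, 23.2),
    "Temperature": (10.0, 0.0),
    "SIP Crosstalk": (0.0, 22.0),
    "Termination": (0.0, 24.0),
    "Wire Untwist": (0.0, 35.0),
    "EMI": (0.0, 2.0),
}


def total_effective_mv(dc_mv, ac_mv):
    """Total effective noise for one source: equivalent DC plus equivalent AC."""
    return dc_mv + ac_mv


def peak_equivalent_noise_mv(budget):
    """Root of the sum of the squares (RSS) of the per-source totals."""
    return math.sqrt(sum(total_effective_mv(dc, ac) ** 2
                         for dc, ac in budget.values()))


peak = peak_equivalent_noise_mv(NOISE_BUDGET)
NOISE_IMMUNITY_MV = 1300.0  # 1.3V noise immunity quoted for the CY7C600 family
print(round(peak, 1), NOISE_IMMUNITY_MV >= 2 * peak)
```

The RSS comes out at about 136.3 mV, so the 1.3V noise immunity clears the conservative 2 x peak criterion with room to spare.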


Figure 7. Series Termination
multi-layer PCB with separate ground and Vcc planes to
decrease system-wide noise.

Using Synchronous Circuits
Because noise is random, large noise spikes can
occur at any time. The only ones of interest are the ones
that falsely trigger a circuit. In an asynchronous design,
any noise pulse that exceeds a circuit's noise immunity
triggers the circuit.
In a synchronous design, on the other hand, the circuit is triggered only when a clock edge occurs at the
same time as a noise pulse that exceeds the circuit's noise
immunity. A valid clock edge only occurs during 25 - 35
percent of the clock cycle, depending on whether one or
both of the clock's edges can trigger an event. Thus, 65 - 75 percent of the noise pulses that randomly occur are
unable to falsely trigger circuits. The inherent noise resistance of synchronous logic makes it a must for robust high-speed system design.

Termination Methods
You can reduce reflectance from the end of a signal
line by using proper termination. To do this, you use a
resistance to either damp the reflected signal or match the
impedance of a transmission line to the source or load,
thus reducing the reflection's magnitude. Consider using
termination techniques for signal lines longer than 6 1/2
inches. Termination is mandatory only for clock inputs,
write and read strobes, and chip select and enable lines.
Address and data lines usually have time to settle before
they are sampled.

Series Termination
There are four methods of termination: series, parallel, AC, and diode. You can accomplish series termination
by placing a resistor in series with the output of the device
driving the signal trace (Figure 7). The intent is to match
the trace's impedance, Zo, to the circuit output impedance
plus the resistor value. When these two quantities are
equal, according to Equation 2, the reflectance from the
driving device is zero. Thus, if a noise pulse is reflected
back from the driven device's input pin, the pulse is absorbed when it meets the series resistor. Place the series
resistor as close to the output pin as possible.
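The series resistor value follows directly from the matching condition described above: the driver's output impedance plus the resistor should equal Zo. The sketch below assumes a hypothetical 17Ω driver output impedance on a 50Ω trace; the function name is illustrative, and you should take the real output impedance from the driver's data sheet.

```python
def series_termination_ohms(line_ohms, driver_output_ohms):
    """Resistor that makes driver output impedance plus resistor equal Zo."""
    resistor = line_ohms - driver_output_ohms
    if resistor < 0:
        raise ValueError("driver output impedance exceeds the line impedance")
    return resistor


# Hypothetical driver with a 17-ohm output impedance on a 50-ohm trace.
print(series_termination_ohms(50.0, 17.0))
```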
One of the advantages of series termination is that it
causes no DC power dissipation and therefore does not
add to the system's overall power requirements. Series
termination does have several disadvantages, however.
One is slower signal propagation, which is due to the
larger RC time constant. Another disadvantage is that you

cannot use distributed loading along the line. The series
resistor's voltage-divider effect during the two-way
propagation delay time causes any inputs attached along
the line to see an input voltage halfway between the logic
levels; the devices therefore fail to respond correctly.
Reflections at the receiving gate place no restriction
on the number of lumped loads you can place at the end
of the line because all reflections are absorbed at the
source. However, the voltage drop across the series-terminating resistor limits the effective loading of the line.
A variation of series termination is series damping. In
this technique, instead of matching the line's impedance
to the driving circuit's output impedance plus the resistor
value, you use a resistor of 10 to 75Ω. This resistor damps
the noise pulses caused by reflection at impedance mismatches; the pulses are not completely absorbed as with
series termination. Except for this difference, the advantages and disadvantages of both techniques are the
same.

Parallel Termination
In parallel termination, you place a pull-up and a
pull-down resistor at the end of the signal trace (Figure
8). The Thevenin equivalent value of the resistors equals
the impedance of the signal trace. Two simple formulas
for calculating the proper values of Rl and R2 are
$$R_2 = 2.6\,Z_0$$
$$R_1 = \frac{R_2}{1.6}$$
where Zo is the impedance of the signal trace.
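A quick calculation confirms that these two formulas give a Thevenin resistance equal to Zo. The sketch below is illustrative: the function names and the 50Ω example are assumptions, and the Thevenin resistance is taken as the parallel combination of R1 and R2.

```python
def parallel_termination(line_ohms):
    """Apply R2 = 2.6 Zo and R1 = R2 / 1.6 from the text."""
    r2 = 2.6 * line_ohms
    r1 = r2 / 1.6
    return r1, r2


def thevenin_ohms(r1, r2):
    """Parallel combination of the pull-up and pull-down resistors."""
    return (r1 * r2) / (r1 + r2)


# Hypothetical 50-ohm signal trace.
r1, r2 = parallel_termination(50.0)
print(round(r1, 2), round(r2, 2), round(thevenin_ohms(r1, r2), 2))
```

For a 50Ω trace the formulas give R1 of about 81Ω and R2 of 130Ω, and their parallel combination returns 50Ω. In practice you would pick the nearest standard resistor values.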
The advantage of parallel termination is that the
waveform along the full length of the line remains undistorted. Also, the rise time of the signal traveling down the
terminated trace is unaffected. Additionally, you can use
parallel termination when the signal line's characteristic
impedance is not completely defined. If you approximate the
impedance, the reflection coefficient is still relatively
small; thus, overshoot and undershoot will probably
remain within safe limits. On the negative side, the pull-down and pull-up resistors constantly dissipate power. For
this reason, parallel termination should probably not be
used in systems utilizing the high-speed CMOS CY7C600
family.

Figure 8. Parallel Termination

AC Termination
Figure 9 shows the AC termination method, which is
the most common termination approach. It does not have
the half-voltage disadvantage of series damping and
causes no DC power dissipation. The latter feature is important when using low-power CY7C600 devices.
AC termination consumes no DC power because the
capacitor blocks the path to ground. You can attach loads
at any point along the trace, and they see a full voltage
swing. AC termination also acts as a low-pass filter for
short noise pulses. Any noise pulse less than 4(R x C)
seconds wide is filtered out.
The value of the capacitor, C, must satisfy two conflicting requirements. It must be large enough to either absorb or supply the energy contained or removed when
positive or negative noise pulses occur. Additionally, the
capacitor must be small enough to avoid delaying the signal or slowing the signal's rise and fall times beyond the
design limit. For CY7C600 applications, the minimum
set-up times determine the maximum degradation of the
signal rise time. The minimum set-up time can be as small
as 4 ns for a 33-MHz part. You can use the following formula to closely approximate the values for R and C:
$$C = \frac{T}{2.2R} \qquad \text{(Eq. 3)}$$
where T is the maximum degradation of the signal rise
time. Start with a value of R slightly less than that of Zo,
the characteristic line impedance. Then calculate the value
of C. The combined impedance of the resistor and the
capacitor approximates that of the line, reducing the reflectivity at the end of the trace to near zero. You can verify
this by calculating the capacitive reactance, Xc, of the
capacitor:
$$X_C = \frac{1}{2\pi f C} \qquad \text{(Eq. 4)}$$
where f is the frequency of the signal passing through the
signal trace. As an example, Table 2 was calculated for a
4-ns maximum signal degradation. The calculated values
are based upon a 50Ω PCB trace and a 120Ω wire-wrapping line. If you use a resistor value of 47Ω and a
capacitor value of 38 pF, the termination closely matches
the impedance of the PCB signal trace. Additionally, the
termination acts as a low-pass filter, absorbing all noise
pulses under 7.12 ns in length and over 140 MHz in frequency.
The disadvantage of the AC termination method is
that it requires two components, a capacitor and a resistor.
Remember to keep the leads as short as possible to
prevent ringing caused by lead inductance.

Figure 9. AC Termination

Diode Termination
Terminating a signal trace with a pair of Schottky
diodes is called diode or active termination (Figure 10).
The lower diode's low forward voltage, Vf, clamps the
input signal at Vf below ground, and the upper diode does the
same at Vcc + Vf. These effects significantly reduce signal overshoot and undershoot. If undershoot and
overshoot are not both a problem, you might require only one
diode.
The advantage of diode termination is that you do not
have to match the line impedance exactly, as you do in
series, parallel, and AC termination. Diodes are more expensive than resistors or capacitors but might reduce overall system cost because they eliminate the work of
precisely determining line impedances. Additionally, if
you discover that ringing is a problem during system
checkout, you can easily add diodes. As with all termination methods, keep the leads as short as possible to avoid
ringing caused by lead inductance.

Figure 10. Diode Termination
The following Schottky diodes are suitable for termination purposes:
1N4148 (Switching)
1N5711
MBD101 (Motorola)
HP5042 (Hewlett-Packard)

References

1. FAST Applications Handbook, Fairchild, Inc., 1987.
2. Blood, Jr., William R. MECL System Design Handbook, Motorola Inc., 1988.
3. Hefner, Moore & Weinstein. Advanced CMOS Logic Designer's Handbook, Texas Instruments Inc., 1988.
4. CY7C600 RISC Family Users Guide, Cypress Semiconductor Corp., 1988.
5. Tripp & Hall. "Good design methods quiet high-speed CMOS noise problems," EDN, October 29, 1987.

Table 2. AC Termination for a 4-ns Signal Degradation

Values                  PCB               Wirewrapping
Zo (Ω)                  50                120
R (Ω)                   47                110
C (pF)                  38                16
RC (ns)                 1.78              1.76
4RC (ns) - passwidth    7.12 (140 MHz)    7.04 (142 MHz)
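The Table 2 entries can be cross-checked against Equations 3 and 4. The sketch below is an illustration under stated assumptions: the function names and unit conversions are choices made for the example, and the inputs (4 ns degradation, 47Ω and 110Ω resistors) are the Table 2 values.

```python
import math


def capacitor_pf(resistor_ohms, degradation_ns):
    """Equation 3: C = T / (2.2 R), returned in picofarads."""
    return degradation_ns * 1e-9 / (2.2 * resistor_ohms) * 1e12


def filter_window_ns(resistor_ohms, cap_pf):
    """4 x R x C in nanoseconds; narrower noise pulses are filtered out."""
    return 4 * resistor_ohms * cap_pf * 1e-3


def capacitive_reactance_ohms(freq_hz, cap_pf):
    """Equation 4: Xc = 1 / (2 pi f C)."""
    return 1.0 / (2 * math.pi * freq_hz * cap_pf * 1e-12)


for r in (47.0, 110.0):
    print(r, round(capacitor_pf(r, 4.0), 1))
print(round(filter_window_ns(47.0, 38.0), 2))
print(round(capacitive_reactance_ohms(140e6, 38.0), 1))
```

Equation 3 gives slightly more than 38 pF and 16 pF, which Table 2 lists as whole numbers. The computed 4RC window for the PCB case is about 7.14 ns; Table 2 shows 7.12 ns because it multiplies the rounded RC value of 1.78 ns by four.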

SPARC System Surface-Mount Design

This application note covers most of the pitfalls in
SMT design and should help make your first SMT design
successful. This is not a complete reference, however. For
thorough coverage of SMT techniques, please refer to References 1 and 2.
Cypress's objective is to design and build the fastest,
most capable SPARC chip sets in the world. As the
operating frequency of Cypress's CY7C601, CY7C602,
CY7C604, and CY7C157 SPARC chip set increases, concerns about factors such as package capacitance and inductance and PCB trace length become more important.
You can reduce the impact of these factors by using surface-mount technology (SMT).
SMT differs from through-hole technology in that the
component leads are placed directly onto the PCB rather
than through the PCB. SMT permits greater component
density, more reliable systems, and savings in labor and
material costs. To gain these benefits, SMT demands care
and precision in the placement and soldering of devices to
the PCB.
Fine-pitch leads for SMT devices are not uncommon.
Cypress and other advanced semiconductor vendors use
208-lead, 25-mil-pitch ceramic quad flat packs for many
products. Fine-pitch packages such as these require
precision in initial placement and alignment.

Ins and Outs of Surface-Mount Technology
Through-hole or leaded technology is a robust packaging technique that has served the electronics industry
well in moving up the integration curve. In fact, if integrating more functions on chip were the only technology
driver, through-hole technology would serve well for the
foreseeable future. However, as the industry moves into
the realm of system engineering, which involves multiple
chips, other integration requirements reveal the flaws in
through-hole technology.
The SPARC community's primary system-integration
need is to reduce the physical and electrical distance between components to achieve higher clock frequencies.
This is difficult to do because of the mechanical constraints imposed by through-hole technology.
Consider, for example, a typical through-hole package, the DIP. While small, a DIP is a physically imposing
thing with its large package and stiff leads. These
mechanical constraints are imposed by the need for the
DIP leads to go through either a socket or the PCB. To
achieve this penetration, the leads and package need stiffness and strength. This requirement causes the leads and
package to have more mass and material, which, in turn,
means greater capacitance, inductance, and package
volume.
SMT packages do not have these mechanical constraints. Because surface-mount devices (SMDs) are
placed onto, instead of inserted into, the PCB, their
strength and stiffness requirements are considerably lower.
Thus, leads and package can be made as small as the
number of signal leads and die bonds allow. Because the
leads can be reduced to where they are just big enough to
physically reach the PCB pads from the package, their
capacitance and inductance are correspondingly reduced.
This reduction decreases the capacitive and inductive mismatch between the leads, the PCB pads, and the PCB
traces, which decreases the noise effects the component
sees. The decreased noise effects allow the signal lines to
run at higher frequencies without random problems caused
by noise spikes.
An additional SMT benefit is the capacity to place
more components in the same board area. Because SMT
packages are smaller than through-hole packages, more
components can reside in the same area. Because the components are closer together, the traces needed to connect
them are shorter. This means less trace capacitance and
impedance, which also makes higher operating frequencies possible.
As with everything in life, the advantages of SMT are
not free. The primary difficulties encountered in using
SMT involve placement and soldering. Placement of surface-mount devices is more difficult than for through-hole

[Figure 1. CY7C157 PLCC: 52-lead plastic leaded chip carrier, JEDEC J69. All dimensions are in inches.]
160-pin plastic quad flat pack (PQFP), and a 208-lead
ceramic quad flat pack (CQFP). The CY7C157 cache
RAM comes in a 52-lead PLCC; the CY7C611 embedded
controller is offered in a 160-lead PQFP; and the
CY7C601 integer unit and CY7C604 and CY7C605
cache/memory management units can be packaged in a
208-lead CQFP. The drawings and form factors of these
three packages appear in Figures 1, 2, and 3, respectively.

devices because SMD placement is relative, not absolute.
Because through-hole components are inserted into either
the PCB or a socket, feedback on correct alignment is instantaneous: Either the component leads go into the holes,
or they do not.
On the other hand, SMDs must be placed relative to
the appropriate solder pads on the PCB. A misalignment
of one or more leads does not become apparent until the
placement is visually inspected. Additionally, the lead
placement is not self-corrective. For a through-hole component, if one or more leads or PCB through-holes are
slightly off, inserting the other leads tends to force the
out-of-alignment leads into the correct orientation. This
self correction does not exist for SMDs.
Soldering is the other area where differences between
through-hole and surface-mount techniques become apparent. SMDs have a lower profile than through-hole
devices, which puts SMDs closer to the PCB. If wave
soldering is used, then a problem known as shadowing becomes a concern. Shadowing occurs when the solder wave
must rise over the component instead of going under it, as
in through-hole designs. The component body can shadow
the component leads, preventing the solder from wetting
them. As a result, some of the leads are not soldered to
their PCB pads.
Another possible SMT problem caused by wave
soldering is heat damage to components. While through-hole packages stand off from the PCB, SMDs sit on the
board. The solder wave therefore washes over the SMD.
If the solder temperature is not carefully controlled, the
components can be damaged.
Fortunately, you can control all these difficulties
peculiar to SMT by careful attention to the fine details of
board stuffing and assembly.

Lead Handling for SMDs
The relative fragility of SMDs requires a change in
handling procedures from that used for DIPs and other
through-hole devices. These parts can be shipped and
transported in carriers that allow flex and slight device
movement. This type of packaging suits the JEDEC J69
52-lead PLCC used for the CY7C157 due to the robust
nature of its leads.
However, this packaging is not suitable for the EIAJ
standard 160-lead PQFP or 208-lead CQFPs that Cypress
uses for non-memory devices. The leads in these packages
are very fine and fragile and are susceptible to twisting or
bending. These packages must be firmly fixed in place
during transport. The best method is to use a waffle pack,
in which the component leads are fixed by a small ridge
of material that forms a box around the package.
Sandwiching the package between two carriers holds it
firmly in place. The ASAT 125C is a good example of
this type of carrier.

Creating SMD Footprints on a PCB
Footprint or solder-pad design for PCBs is a critical
part of good SMT design. This is because SMDs are not
rigidly connected to the PCB during soldering, as are
through-hole components. SMDs essentially float during
the soldering process. This floating results from differences in surface tension due to uneven cooling after
soldering.

Surface-Mount Packages

The Cypress product line includes three SMD package types: a 52-lead plastic leaded chip carrier (PLCC), a


devices, wave soldering or combination wave/reflow
soldering is usually necessary. When you use these soldering techniques, you must apply an adhesive to the PCB to
hold the SMDs in place until soldering.
The use of adhesives brings a new set of potential
problems, especially relating to product reliability. The
adhesive might absorb moisture and create a short on the
PCB. The adhesive might also degrade in an unattractive
way, causing marring or shorting of other system components. For these reasons, select adhesives for their
lifetime properties. As a general guide to adhesive selection, follow this framework:
System: Account for the intended application of the
system in which the surface-mount PCB is used. An
adhesive that has lifetime properties suitable for a
workstation environment might not be optimum for
industrial or military applications. Selecting the right
type of adhesive at this stage prevents system failures
during the product's lifetime.
Device: Keep in mind the type of SMD used on the
PCB. Most adhesives work with the plastic and
ceramic SMT packages used by Cypress. However,
other SMDs on your PCB might require a different
type of adhesive.
Process: How will the adhesive be applied to the
PCB? The three available methods are pin transfer,
screen printing, and pressure syringe. Each method
has its own advantages and disadvantages. The

The effects of floating can be reduced by carefully
crafting the pad sizes. Reducing the pad width is the first
step. A pad that is too long causes the SMD to float off
the high point on the pad and over to one side. A pad that
is too wide might allow the component to rotate. The
ideal pad is almost exactly the same size as the SMD
lead's contact surface. The pad width should equal 1.02
times the lead width, and the pad length should equal 1.02
times the lead contact length.
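
The 1.02 scaling rule above can be sketched in a few lines of Python. This is only an illustration: the function name and the example lead dimensions are hypothetical and do not come from a Cypress data sheet.

```python
PAD_SCALE = 1.02  # pad dimension = 1.02 x lead dimension, per the guideline above


def pad_size(lead_width_in, lead_contact_length_in):
    """Return (pad_width, pad_length) in inches for one SMD lead."""
    return (PAD_SCALE * lead_width_in, PAD_SCALE * lead_contact_length_in)


# Hypothetical lead: 0.010 inch wide with a 0.030-inch contact length.
width, length = pad_size(0.010, 0.030)
print(f"pad width  = {width:.4f} in")
print(f"pad length = {length:.4f} in")
```

Substitute the lead width and contact length from the package drawing of the device actually being placed.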
For PLCC devices (CY7C157), it is important that
the lead footprints on the PCB not run too far under the
package. Footprints should be extended out approximately
0.050 inch to the outside of the package. This helps
reduce solder bridges under the PLCC, where they cannot
be seen during visual inspection.

Fixing SMDs in Place
Through-hole devices are fixed in place in a socket or
PCB either by lead bending or the mechanical tightness of
the lead fit. SMDs are not. You must use adhesives to
firmly fix SMDs in place before soldering. The only exception is when you use reflow soldering. In this technique, solder paste is applied to the PCB before the SMDs
are placed. Then, IR lamps or hot air cause the solder paste
to reflow. The paste usually has sufficient adhesion to
hold the devices in place until soldering is complete.
However, not all SMT PCBs can use solder reflow. If
a board includes a mixture of through-hole and SMT
[Figure 2. CY7C611 PQFP: 160-pin EIAJ standard quad flat package, top view. All dimensions in inches.]

You can use either a stand-alone adhesive application
machine or one integrated into a pick-and-place
system.
Adhesive: The final adhesive choice must be compatible with all the requirements you establish. This
choice is often driven by the type of machine you
choose, because the machine might have been
designed with a specific adhesive type in mind. This
makes the adhesive choice the responsibility of the
machine manufacturer.

dominant criteria for selection are the estimated
production volume and the type of PCB substrate
used. For example, screen printing demands a flat and
distortion-free substrate. You cannot use this method
for PCBs that already have components on them, such
as pre-loaded mixed-print boards. Also limiting your
choice is the fact that not all adhesives are compatible
with all three methods.
Machine: The application process you choose drives
the selection of a machine for applying the adhesive.

[Figure 3. CY7C601/CY7C604/CY7C605 CQFP: 208-pin EIAJ standard quad flat package. All dimensions in inches.]

features (vias, pads) is determined by the inaccuracies of
the PCB fabrication and layer masking process.
Because of these inaccuracies, determining pad location by absolute methods - in terms of X-Y coordinates
from the pick-and-place machine - does not work. The
only way to achieve the required accuracy is to actively
determine the location of the component on the end of the
pick-up nozzle relative to the PCB. This is done by determining the location of the PCB by vision system inspection of the fiducials and extrapolating this location to
determine the location of the SMD pads.
An accurate vision system can determine the location
of a fiducial to within 0.0007 inch. In the worst case then,
the starting inaccuracy of the pick-and-place machine is
0.0017 inch (0.001-inch pick-up nozzle inaccuracy plus
0.0007-inch location inaccuracy). Because the leads must
be placed within 0.002 inch of the actual pad location, this
only leaves 0.0003 inch for machine inaccuracies in arm
location and in the PCB holding fixture.
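
The worst-case stack-up described above is a simple linear sum. The following Python sketch reproduces the arithmetic; the constant and function names are illustrative, and the linear (non-statistical) summing is an assumption taken from the wording "worst case."

```python
NOZZLE_ERROR_IN = 0.001         # centering accuracy of the pick-up nozzle
FIDUCIAL_ERROR_IN = 0.0007      # vision-system accuracy in locating a fiducial
PLACEMENT_TOLERANCE_IN = 0.002  # allowed lead-to-pad placement error


def remaining_budget(tolerance, errors):
    """Worst-case budget left after subtracting each error source linearly."""
    return tolerance - sum(errors)


starting_error = NOZZLE_ERROR_IN + FIDUCIAL_ERROR_IN
margin = remaining_budget(PLACEMENT_TOLERANCE_IN,
                          [NOZZLE_ERROR_IN, FIDUCIAL_ERROR_IN])
print(f"starting inaccuracy: {starting_error:.4f} in")
print(f"left for arm and fixture errors: {margin:.4f} in")
```

With the figures quoted in the text, the starting inaccuracy is 0.0017 inch, which leaves 0.0003 inch of the 0.002-inch tolerance for the arm and the holding fixture.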
The only way to reduce this inaccuracy is to use multiple fiducials, which permit an angular orientation accuracy of ±0.2°. The use of multiple fiducials means that
you must use a computationally powerful vision system
with interpolation algorithms, which implies high cost and
slow fabrication.

SMD Alignment
The very fine pitch between leads on the 208-lead
EIAJ CQFPs (0.0196 inch) places exacting requirements
on pick-and-place machines. Cypress uses this package for
some products, which usually require absolute placement
to within ±0.002 inch or less in both X and Y coordinates,
relative to the lead pads on the PCB. Additionally, angular
error should be held to less than 1°. This requires that the
pick-and-place machine have rotational correction
capability. Vision capability is also needed.
It is important to realize that an interacting set of inaccuracies determine the required placement accuracy.
The first inaccuracy is the location of the CQFP with
respect to the vacuum pick-up nozzle. Usually, the pickup nozzle only has a general idea of its position with
respect to the true center of the device. This general idea
is not sufficient for fine-pitch CQFP devices. Pick-up
position needs to be controlled by accurately positioning
the waffle pack (if used) in relation to the pick-and-place
machine. The pick-up nozzle usually picks up the device,
then repositions it by use of a centering system. Centering
is done with reference to the leads' edge surfaces.
Depending on the centering system's capability, it can
achieve an accuracy of ±0.001 inch of the placement center to the device center. This is half the allowable error,
and the device has not been placed on the PCB yet.
A vision system guides SMD placement on the PCB.
The vision system first orients itself either by detecting
fixed locating patterns on the PCB called fiducials or by
looking for unique combinations of pads and vias that
occur at fixed places on the PCB. Fiducials are the
preferred method, because they take less processing
capability for the vision system to recognize, and their
location on the PCB can be more tightly controlled. One
fiducial allows the vision system to locate itself with relation to the PCB. Two fiducials allow the vision system to
establish a second-order level of correction, which encompasses X-Y offset, angular offset, and a linear expansion/contraction compensation for the medium. Adding a
third fiducial improves the accuracy of these corrections
through use of an interpolation algorithm in the vision
machine software. It is a good idea to use several levels of
fiducials to give several levels of position and angular
correction: PCB to PCB, circuit to circuit, or component
to component.
As mentioned earlier, inaccuracies have a compounding effect. The machine's location is determined by the
inaccuracies of its placement on the shop floor. The
PCB's location is determined by the inaccuracies of its
placement in the PCB fixing jig. The location of the PCB

Component Spacing
Because of the fine pitch of the EIAJ CQFPs used by
Cypress, it is important to recognize the effects that the
close tolerance of the PCB pads and vias can have upon
the PCB's solderability. The prime objective here is to
reduce solder bridging, which occurs when a pad, lead, or
via connects to the wrong place, causing either shorting or
a path for random circuit effects.
You can control solder bridging by ensuring that the
clearance between PCB vias and pads is large enough to
prevent solder migration. A 0.01-inch air-gap distance between vias and pads is recommended; 0.012 inch is
recommended where a 90° via point is adjacent to a pad.
At least 0.025 inch should be available between pads.
Careful alignment of the solder mask is also helpful
in reducing solder migration. Use a photo-imaged mask
coating. Keep the maximum clearance of the solder mask
in relation to the pads and PCB vias to 0.005 inch.

References
1. Traiser, John E. Design Guidelines for Surface Mount
Technology. Academic Press, Inc., New York, 1990.
2. Prasad. Surface Mount Technology Principles and
Practice.


Memory System Design
for the CY7C601 SPARC Processor
This application note describes a simple 25-MHz
CY7C601 memory design for non-cache-memory applications. The memory subsystem consists of 128
Kbytes of data RAM and 128 Kbytes of instruction
RAM. (You can easily expand the instruction RAM to
256 Kbytes using this design.) The difference between
data memory and instruction memory is that the
CY7C601 integer unit (IU) is not allowed to write to
instruction memory. This restriction implies that an external device loads instruction RAM at power-up.
The design utilizes the CY7C157 cache RAM,
which is specifically intended for use with the CY7C601
and the CY7C604/605 cache/memory management unit
(CMMU). When used in this environment, the CMMU
provides all necessary control signals (byte writes and
output enables). This article shows that the CY7C157
also adapts easily to non-cache applications.
First, this application note describes the CY7C157,
followed by a brief description of the CY7C601 bus interface. Second, a design is presented that uses the
CY7C330 EPLD to generate the byte-write signals and
the CY7C332 EPLD to provide the output-enable signals. Figure 1 shows the design's block diagram.

the IU sends the address bus, data bus, and all memory
interface signals (except INULL) unlatched; they
should be latched externally before being used (more
on this later).

Memory Wait States and Exceptions
The memory design described here needs no wait
states, but you can find information on this topic and on
memory exceptions in the IU data sheet.

Bus Cycles
Assuming that the system does not contain a floating-point processor or a coprocessor, memory must
deal with these bus cycles: instruction fetch, load single,
load double, store single, store double, and atomic
load/store.
Instruction Fetch
The IU sends out address and control bits at the
beginning of the fetch cycle. Remember that you must
latch these bits externally. At the end of the fetch cycle,
the IU latches instruction data from the data bus into
an on-chip instruction register.
The first cycle in Figure 2 illustrates an instruction
fetch. Because all instruction fetches are single-cycle

CY7C157 Cache RAM
The CY7C157 cache RAM is a very high performance 16K x 16-bit static RAM. This device employs
common I/O architecture and a self-timed byte-write
mechanism. The self-timed write eliminates the difficult
task of generating accurate write strobes in high-speed
systems. Address and write-enable inputs load into
input registers on the system clock's rising edge. The
SRAM provides data-input and -output latches, along
with an asynchronous output enable. The CY7C157 is
available in 20-, 24-, and 33-ns speed grades. Because a
25-MHz IU requires only the slowest device offered, 33 ns,
this device is used for the memory system presented
here.


CY7C601 Bus Interface
The IU has a 32-bit address bus and can directly
address 4 Gbytes of memory. In the cycle prior to use,

Figure 1. Block Diagram


operations, they incur no pipeline delays. Under some
conditions, the processor is unable to fetch an instruction, usually because a prior multi-cycle instruction
needs to use the bus. When this occurs, the processor
asserts INULL to indicate that the current fetch cycle
should be nullified.


Load Cycles
The first and second clock cycles in Figure 2 show
the timing for a load single integer instruction. Load
single integer is a two-cycle operation: The first cycle
fetches the load instruction, and the second cycle actually loads the required information from memory. A
load double instruction is similar to the load single instruction except that a third cycle is added to fetch the
second data word from memory. Figure 2 also illustrates
this event.

[Figure 2. Load/Load Double Timing. Signals shown: Addr/Size, RD, DXFER, Data In.]

Store Cycles
Figure 3 illustrates store single and store double instructions. A store single requires three clocks: The
store instruction is fetched during the first clock.
During the second clock, the destination address of the
store is driven onto the bus. Store data is driven onto
the data bus at the middle of cycle two and removed at
the middle of cycle three. Memory update occurs in
cycle three. The store address's early arrival allows it to
be checked for possible write-protect violations or
memory exceptions in systems that implement these
features.
The store double instruction closely resembles a
store single instruction, except for an extra cycle needed
to store the second data word. Note that the second
store's address is set to the first address plus 4, and that
the size bits are set to 11, indicating a double-bus access.

The CY7C157 requires a 6-ns write-enable set-up-to-clock-Low time and a 3-ns write-enable hold from
clock Low. From the store transaction timing diagrams,
you can see that the store data valid times are referenced to the system clock's falling edge, while transaction information (address, size, etc.) is referenced to the
same clock's rising edge. The desired PLD architecture
for the write-enable generator must provide one clock
for clocking in the transaction information and a
separate clock for clocking out the write enables. The
Cypress CY7C330 state machine can handle this task.
The next critical factor is: Can the CY7C330 meet
the write-enable set-up and hold times? Inspection of
the CY7C330-50WC data sheet for tCO and tOH specs
indicates that the device meets these conditions. Figure
5 shows that any write enable is valid 15 ns after
Sys_Ck's falling edge (thus providing a 25-ns set-up
time) and is held for 3 ns after Sys_Ck's falling edge
(matching the required hold time at the CY7C157).
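
The write-enable margins quoted above can be checked with a short Python sketch. It assumes that the set-up time is measured to the next falling edge of the 25-MHz clock, one full period after the edge that launches the write enable; the variable names are illustrative only.

```python
CLOCK_MHZ = 25.0
WE_VALID_AFTER_FALL_NS = 15.0  # write enable valid after Sys_Ck falls
WE_HOLD_AFTER_FALL_NS = 3.0    # write enable held after Sys_Ck falls
REQUIRED_SETUP_NS = 6.0        # CY7C157 write-enable set-up to clock Low
REQUIRED_HOLD_NS = 3.0         # CY7C157 write-enable hold from clock Low


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


setup_ns = period_ns(CLOCK_MHZ) - WE_VALID_AFTER_FALL_NS
setup_margin_ns = setup_ns - REQUIRED_SETUP_NS
hold_margin_ns = WE_HOLD_AFTER_FALL_NS - REQUIRED_HOLD_NS

print(f"set-up provided: {setup_ns} ns, margin: {setup_margin_ns} ns")
print(f"hold margin: {hold_margin_ns} ns")
```

The set-up requirement is met with a wide margin, while the hold requirement is met with none, so the hold path deserves the closer look if a faster clock or a different PLD speed grade is considered.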

Atomic Load/Store Cycles
Atomic transactions consist of two or more transactions that are indivisible; once started, the sequence
cannot be interrupted. To ensure bus access for the
second transaction, the IU asserts the LOCK signal for
the necessary length of time. Figure 4 shows the timing
of an atomic load/store instruction.


Design Considerations


Using the CY7C157s in a non-cache application requires generation of appropriate byte-write signals and
output enables. Because the CY7C157 does not require
a chip select when used with the CMMU, this design
decodes separate sets of write enables for each 64
Kbytes (16 Kword deep) block of RAM. An output
enable must also be generated on 16-Kword boundaries
during reads. Because address and data set-up/hold requirements between the IU and the CY7C157 are
guaranteed by design, you can concentrate on the write-enable and output-enable timing requirements of the
CY7C157-33.

[Figure 3. Store/Store Double Timing. Signals shown: Sys_clk, State_clk, Addr/Size, RD, WE, WRT, DXFER, Data, WAx/WBx (signal from 7C330 PLD), INULL.]

[Figure 5. Actual Timing]

also uses these lines to inhibit writes on unaligned boundaries; you can easily modify this feature to generate a
memory exception. Address 14 selects between Bank A
write enables (lower 16 Kwords) and Bank B write
enables (upper 16 Kwords) for the data RAMs. The address is sent out unlatched and must be latched externally before use. If the address output enable (/AOE)
or test output enable (/TOE) signals are deasserted, the
address bus three-states.
INULL: occurs on two occasions. First, it always
occurs during the second cycle of a store transaction to
tell the memory subsystem that the current memory
transaction has proceeded too far to be nullified; i.e., it
is too late to initiate a wait state or memory exception.
Second, INULL can occur during a transaction's first
cycle to tell the memory subsystem to ignore the transaction entirely. This signal is of consequence only for
store transactions that must be inhibited before the
write occurs.
/Reset: an active-Low input to the CY7C330 PLD
that forces all outputs to the inactive state. It is a clocked reset.
Table 1. Byte Write Signals

Figure 4. Atomic Load/Store Timing
For reads, Figure 5 shows that the CY7C332 output
delay plus the CY7C157 output-enable time provides a
5-ns data set-up time, which easily meets the IU's 3-ns
requirement. Data hold time requirements are determined by examining the CY7C332 output-enable hold
time from Sys_Ck's falling edge. This hold time is 3 ns,
which, when added to a 2-ns minimum turn-off time for
the CY7C157, guarantees the required 5-ns data hold
time at the IU.
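
The read-path numbers above reduce to two comparisons, sketched here in Python. All values are the ones quoted in the text; the names are illustrative.

```python
DATA_SETUP_PROVIDED_NS = 5.0  # CY7C332 output delay plus CY7C157 output-enable time
IU_SETUP_REQUIRED_NS = 3.0    # IU data set-up requirement
OE_HOLD_NS = 3.0              # CY7C332 output-enable hold from Sys_Ck falling edge
RAM_TURN_OFF_MIN_NS = 2.0     # CY7C157 minimum output turn-off time
IU_HOLD_REQUIRED_NS = 5.0     # IU data hold requirement

data_hold_ns = OE_HOLD_NS + RAM_TURN_OFF_MIN_NS
read_setup_margin_ns = DATA_SETUP_PROVIDED_NS - IU_SETUP_REQUIRED_NS
read_hold_margin_ns = data_hold_ns - IU_HOLD_REQUIRED_NS

print(f"read set-up margin: {read_setup_margin_ns} ns")
print(f"read hold provided: {data_hold_ns} ns, margin: {read_hold_margin_ns} ns")
```

As with the write enables, the hold time is met exactly, so any change to the output-enable PLD speed grade should be rechecked against this sum.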

CY7C330 Write-Enable Design
The signals required to generate the byte-write signals appear in Table 1. The signals are defined as follows:
State Clock: the inverted version of System Clock.
State Clock drives the state registers in the CY7C330
PLD.
System Clock: the clock that drives the IU and
CY7C330 input registers. All transaction information is
valid on System Clock's rising edge.
Advanced Write: The processor asserts (sets to 1)
Advanced Write (WRT) during the first data cycle of
single or double integer store instructions and during
the second cycle of atomic load/store instructions. WRT
is sent out unlatched and must be latched externally
before it is used.
Size(1:0): These two bits specify the data size associated with all transactions on the data bus. The IU
sends out size bits unlatched. The value of these bits
indicates the data size corresponding to the current
cycle's memory address. The size bits are valid at the
same time as the address bus. Because all instructions
are 32 bits long, Size(1:0) is set to 10 during all instruction fetch cycles. Encoding of the size bits is shown in
Table 2.
Address (1:0), Address 14: Address (1:0) decodes
individual byte-write lines for writes within a 32-bit
word boundary. The CY7C330 design described here


Name                       Mnemonic
State Clock                St_Ck
System Clock               Sys_Ck
Advanced Write             WRT
Size(1:0)                  Size1, Size0
Adr(1:0)                   A1, A0
Adr14                      A14
INULL                      INULL
/Reset                     !Rst
/Output Enable             !OE
/Write Enables - Bank A    !WA3 - !WA0
/Write Enables - Bank B    !WB3 - !WB0


Table 2. Size Bit Encoding

Size(1:0)    Transaction Type
00           Byte
01           Halfword
10           Word
11           Double Word

Table 4. Memory Subsystem Characteristics

Component     Quantity    Power
CY7C157-33    8           1.375 W
CY7C330-50    1           0.99 W
CY7C332-20    1           0.99 W
TOTAL         10          13.0 W

/OE: an active-Low input to the CY7C330 PLD
that enables all the device's outputs. When High, all
CY7C330 outputs are three-stated. The source file containing the PLD equations for the CY7C330 write-enable generator appears in Appendix A.

write-enable generator, complete memory control is
achieved in just two PLDs.
Pin 1 of the CY7C332 is the system clock, active on
the rising edge. Pins 2 - 4 are address bits 16 - 14, which
are used in the output-enable decoding. Pins 5 and 6
are the IU size bits.
For instruction fetches, if SIZE does not equal 10B
(see Table 2), then IFMEMx is made active. The SIZE
bits are ignored for data fetches, because all alignment
occurs in the IU.
RD = 1 signifies that the following cycle is a read
cycle. DXFER = 1 signals that the following cycle is a
data transfer. Conversely, if DXFER = 0, the next
cycle is a non-data (instruction) cycle. The INULL signal is not needed here, because the CPU ignores instruction/data fetched in the next cycle anyway. DOEx
and IOEx are the data output enables and instruction
output enables, respectively. IFMEMx occurs when an
instruction fetch is attempted with SIZE not equal to 10
(one word).

CY7C332 Output-Enable Design
Table 3 lists the signals used to generate the required output-enable signals. Appendix B shows the output-enable circuit's design file, implemented for the
CY7C332 PLD using the Cypress PLD ToolKit. The
PLD ToolKit is an assembler/simulator package for
PLDs.
The design utilizes a CY7C332 to generate five instruction output enables and five data output enables
for a Cypress SPARC-based, non-cached memory system. Each output enable is decoded on a 16-Kword
boundary (word = 32 bits). The CY7C332 suits this application especially well, because this one PLD incorporates input latch/registers with output decoding.
When combined with a CY7C330 programmed as a

Table 3. Output Enable Signals

Name                           Mnemonic
System Clock                   Sys_Ck
Size(1:0)                      Size1, Size0
Adr(16:14)                     A16, A15, A14
INULL                          INULL
/Reset                         !Rst
/Output Enable -> 332          !OE
/Output Enables - Inst Bank    !IOE4 - !IOE0
/Output Enables - Data Bank    !DOE4 - !DOE0
Inst Fetch Mem Exception       !IFMEMx

Conclusions

The design presented here provides 128 Kbytes of
instruction memory and 128 Kbytes of data memory
with just ten components (eight CY7C157s, one
CY7C330, and one CY7C332). Table 4 tabulates some
of the memory subsystem's key characteristics.
You can easily expand the memory subsystem's
capacity by using the CY7C330's four additional outputs
as write enables. This change furnishes another 64
Kbytes of data memory. The CY7C332 design already
provides output enables for 320 Kbytes of data memory
and 320 Kbytes of instruction memory.
For systems requiring even larger memory spaces,
you can make a tradeoff with the CY7C330. If the smallest write boundary is changed to half word (16 bits)
instead of byte, the CY7C330 can provide byte writes
for 384 Kbytes of data memory. Similarly, for systems
requiring only 32-bit writes to data memory, a single
CY7C330 can provide the required write enables for
768 Kbytes of memory. However, this configuration requires an additional CY7C332 to decode output enables
for data memory reads.
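
The capacity figures in these conclusions follow from the RAM organization and the number of PLD outputs. The Python sketch below reproduces them. It assumes 12 usable CY7C330 outputs (the eight write enables in this design plus the four additional outputs mentioned above) and one 64-Kbyte bank per pair of 16K x 16 RAMs; the function names are illustrative.

```python
RAM_DEPTH_WORDS = 16 * 1024  # CY7C157 is organized as 16K x 16
BUS_WIDTH_BITS = 32          # two 16-bit RAMs side by side form one bank


def bank_kbytes():
    """Size of one 32-bit-wide bank in Kbytes."""
    return RAM_DEPTH_WORDS * (BUS_WIDTH_BITS // 8) // 1024


def writable_kbytes(pld_outputs, write_granularity_bits):
    """Data memory that a given number of write-enable outputs can control."""
    enables_per_bank = BUS_WIDTH_BITS // write_granularity_bits
    return (pld_outputs // enables_per_bank) * bank_kbytes()


print("bank size:", bank_kbytes(), "Kbytes")
print("8 outputs, byte writes:", writable_kbytes(8, 8), "Kbytes")
print("12 outputs, byte writes:", writable_kbytes(12, 8), "Kbytes")
print("12 outputs, halfword writes:", writable_kbytes(12, 16), "Kbytes")
print("12 outputs, word writes:", writable_kbytes(12, 32), "Kbytes")
print("5 output enables:", 5 * bank_kbytes(), "Kbytes")
```

The same bank size explains the CY7C332 figure: five output enables per memory type cover five 64-Kbyte banks.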


Appendix A. ABEL CY7C330 Write Enable PLD Equations

Module SPARC_WRTENB flag '-r3'
title 'SPARC Write Enable Generator'
LIBRARY 'P330';                              "Enable various useful macros
IC device 'P330';

St_Ck,Sys_Ck,INULL,Rst        Pin 1, 2, 10, 13;       "Inputs
WRT,Size1,Size0,A1,A0,A14     Pin 3, 4, 5, 6, 7, 9;
Reset,Set                     node 29, 30;            "Outputs and Internal Node declarations.
!WA3,!WA2,!WA1,!WA0           Pin 28, 27, 26, 25;
!WB3,!WB2,!WB1,!WB0           Pin 24, 23, 20, 19;
!OE                           Pin 14;

!WA3.OE                       istype 'Pin';           "Enable pin 14 as common OE for all outputs

SIZE = [Size1,Size0];                        "Definitions for readability and test vector generation
ADR = [A1,A0];
WA = [WA3,WA2,WA1,WA0];
WB = [WB3,WB2,WB1,WB0];

H,L,C,X,Z = 1,0,.C.,.X.,.Z.;                 "Declarations

equations
WA3.OE = !OE;                                "Turn on outputs

WA3 :=

!Rst & !!NULL & !A14 & WRT & (SIZE == 0) & (ADR == 3)
# !Rst &!!NULL .&!AI4 & WRT & (SIZE == 1) & (ADR == 2)
# !Rst & !!NULL & !A14 & WRT & (SIZE == 2) & (ADR == 0)
# !Rst & !!NULL & !A14 & WRT & (SIZE == 3) & (ADR == 0)
# !Rst & !A14 & (SIZE == 3) & (ADR == 0) & WA3;

WA2 :=

!Rst & !!NULL & !A14 & WRT & (SIZE == 0) & (ADR == 2)
# !Rst & !!NULL & !A14 & WRT & (SIZE == 1) & (ADR == 2)
# !Rst & !!NULL & !A14 & WRT & (SIZE == 2) & (ADR == 0)
# !Rst & !!NULL & !A14 & WRT & (SIZE == 3) & (ADR == 0)
# !Rst & !A14 & (SIZE == 3) & (ADR == 0)& W A2;

WA1.-

!Rst & !!NULL & !A14 & WRT & (SIZE == 0) & (ADR == 1)
# !Rst & !!NULL & !A14 & WRT & (SIZE == 1) & (ADR == 0)
# !Rst & !!NULL & !A14 & WRT & (SIZE == 2) & (ADR == 0)
# !Rst & !!NULL & !A14 &. WRT & (SIZE == 3) & (ADR == 0)
# !Rst & !A14 & (SIZE == 3) & (ADR == 0) & WA1;

WAD :=

!Rst & !!NULL & !A14 & WRT & (SIZE == 0) & (ADR == 0)
# !Rst & !!NULL & !A14 & WRT & (SIZE == 1) & (ADR == 0)
# !Rst & !!NULL & !A14 & WRT & (SIZE == 2) & (ADR == 0)
# !Rst & !!NULL & !A14 & WRT & (SIZE == 3) & (ADR == 0)
# !Rst &. !A14 & (SIZE == 3) & (ADR == 0)& WAO;


WB3 :=

!Rst & !!NULL & A14 & WRT & (SIZE == 0) & (ADR == 3)
# !Rst & !!NULL & A14 & WRT & (SIZE == 1) & (ADR == 2)
# !Rst & !!NULL & A14 & WRT & (SIZE == 2) & (ADR == 0)
# !Rst & !!NULL & A14 & WRT & (SIZE == 3) & (ADR == 0)
# !Rst & A14 & (SIZE == 3) & (ADR == 0)& WB3;

WB2 :=

!Rst & !!NULL & A14 & WRT & (SIZE == 0) & (ADR == 2)
# !Rst & !!NULL & A14 & WRT & (SIZE == 1) & (ADR == 2)
# !Rst & !!NULL & A14 & WRT & (SIZE == 2) & (ADR == 0)
# !Rst & !!NULL & A14 & WRT & (SIZE == 3) & (ADR == 0)
# !Rst & A14 & (SIZE == 3) & (ADR == 0)& WB2;

WB1 :=

!Rst & !!NULL & A14 & WRT & (SIZE == 0) & (ADR == 1)
# !Rst & !!NULL & A14 & WRT & (SIZE == 1) & (ADR == 0)
# !Rst & !!NULL & A14 & WRT & (SIZE == 2) & (ADR == 0)
# !Rst &!!NULL & A14 & WRT & (SIZE == 3) & (ADR == 0)
# !Rst & A14 & (SIZE == 3) & (ADR == 0)& WB1;

WBO :=

!Rst & !!NULL & A14 & WRT & (SIZE == 0) & (ADR == 0)
# !Rst & !!NULL & A14 & WRT & (SIZE == 1) & (ADR == 0)
# !Rst & !!NULL & A14 & WRT & (SIZE == 2) & (ADR == 0)
# !Rst &!!NULL & A14 & WRT & (SIZE == 3) & (ADR == 0)
# !Rst & A14 & (SIZE == 3) & (ADR == 0)& WBO;
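
For readers without an ABEL tool chain, the registered write-enable equations above can be modeled in Python as a next-state function. This is a behavioral sketch, not the PLD source: the function names are hypothetical, `reset=True` is assumed to mean that the reset term is active, and the masks are treated as active-high (bit n stands for line Wx[n]), as in the test vectors.

```python
def byte_mask(size, adr):
    """4-bit byte-write mask for one bank, following the product terms."""
    if size == 0:                      # byte: one line selected by ADR
        return 1 << adr
    if size == 1:                      # halfword: only ADR 0 or 2 is aligned
        return {0: 0b0011, 2: 0b1100}.get(adr, 0)
    return 0b1111 if adr == 0 else 0   # word and double word: aligned only


def next_write_enables(reset, inull, wrt, size, adr, a14, prev=(0, 0)):
    """Return the next (WA, WB) masks given the latched inputs.

    prev holds the current (WA, WB) register contents; it matters only for
    the hold term that keeps the enables set during a double-word store.
    """
    if reset:
        return (0, 0)
    bank = 1 if a14 else 0             # A14 = 0 selects Bank A, 1 selects Bank B
    mask = byte_mask(size, adr) if (wrt and not inull) else 0
    if size == 3 and adr == 0:
        mask |= prev[bank]             # hold term: SIZE == 3 & ADR == 0 & Wxn
    result = [0, 0]
    result[bank] = mask
    return tuple(result)


for size, adr in [(0, 3), (1, 0), (1, 2), (2, 0)]:
    wa, wb = next_write_enables(False, False, True, size, adr, 0)
    print(f"SIZE={size} ADR={adr} -> WA={wa:02x} WB={wb:02x}")
```

Unaligned halfword and word stores produce no write enable in this model, which mirrors the way the equations inhibit writes on unaligned boundaries.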

"Test vectors for WA outputs, WB outputs are similar except for A14
"Note that the WA outputs are treated as active-high in the test vectors
"since they were declared as active-low in the pin declaration sections.
test_vectors
([!OE,!Rst,St_Ck,Sys_Ck,WRT,INULL,SIZE,ADR,A14] -> [WA,WB]);
[ 0,0, 0, 0, X, X, X, X, X ] -> [ X,X ];      "v1 Reset
[ 0,0, 0, 1, X, X, X, X, X ] -> [ X,X ];
[ 0,0, 1, 0, X, X, X, X, X ] -> [ 0,0 ];
"WRT = 0 = WAx inactive
[ 0,1, 0, 1, 0, 0, X, X, 0 ] -> [ 0,0 ];
[ 0,1, 1, 0, 0, 0, X, X, 0 ] -> [ 0,0 ];
"Halfword transactions to lower word (bytes 1:0)
[ 0,1, 0, 1, 1, 0, 1, 0, 0 ] -> [ 0,0 ];
[ 0,1, 1, 0, 1, 0, 1, 0, 0 ] -> [ 03,0 ];
"Halfword write on byte boundary results in IU generated alignment error.
[ 0,1, 0, 1, 1, 0, 1, 1, 0 ] -> [ 03,0 ];
[ 0,1, 1, 0, 1, 0, 1, 1, 0 ] -> [ 00,0 ];
"Halfword write to upper word
[ 0,1, 0, 1, 1, 0, 1, 2, 0 ] -> [ 00,0 ];     "v10
[ 0,1, 1, 0, 1, 0, 1, 2, 0 ] -> [ 0c,0 ];
[ 0,1, 0, 1, 0, 0, 1, X, 0 ] -> [ 0c,0 ];
[ 0,1, 1, 0, 0, 0, 1, X, 0 ] -> [ 00,0 ];
"Word write on byte boundary results in IU generated alignment error
[ 0,1, 0, 1, 1, 0, 1, 3, 0 ] -> [ 0,0 ];
[ 0,1, 1, 0, 1, 0, 1, 3, 0 ] -> [ 0,0 ];
"Verify WA follows byte writes correctly [!OE,!Rst,StCk,SyCk,W,I,S,ADR,A14]
[ 0,1, 0, 1, 1, 0, 0, 3, 0 ] -> [ 0,0 ];      "v20
[ 0,1, 1, 0, 1, 0, 0, 3, 0 ] -> [ 08,0 ];
[ 0,1, 0, 1, 1, 0, 0, 2, 0 ] -> [ 08,0 ];     "wrt byte 3
[ 0,1, 1, 0, 0, 0, 0, 2, 0 ] -> [ 04,0 ];     "wrt byte 2
[ 0,1, 0, 1, 1, 0, 0, 1, 0 ] -> [ 04,0 ];
[ 0,1, 1, 0, 0, 0, 0, 1, 0 ] -> [ 02,0 ];
[ 0,1, 0, 1, 1, 0, 0, 0, 0 ] -> [ 02,0 ];     "wrt byte 1
[ 0,1, 1, 0, 0, 0, 0, 0, 0 ] -> [ 01,0 ];
[ 0,1, 0, 1, 0, 0, 0, 0, 0 ] -> [ 01,0 ];     "wrt byte 0
[ 0,1, 1, 0, 0, 0, 0, 0, 0 ] -> [ 00,0 ];     "writes are inactive
"Verify single store works correctly [!OE,!Rst,StCk,SyCk,W,I,S,ADR,A14] for ease of programming only
[ 0,1, 0, 1, 1, 0, 2, 0, 0 ] -> [ 0,0 ];
[ 0,1, 1, 0, 1, 0, 2, 0, 0 ] -> [ 0f,0 ];
[ 0,1, 0, 1, 0, 0, 0, X, 0 ] -> [ 0f,0 ];
[ 0,1, 1, 0, 0, 0, 0, X, 0 ] -> [ 0,0 ];      "v30

"Verify WA responds correctly to double stores
[ 0,1, 0, 1, 1, 0, 3, X, 0 ] -> [ 0,0 ];
[ 0,1, 1, 0, 1, 0, 3, X, 0 ] -> [ 0f,0 ];
[ 0,1, 0, 1, 0, 0, 3, X, 0 ] -> [ 0f,0 ];
[ 0,1, 1, 0, 0, 0, 3, X, 0 ] -> [ 0f,0 ];
[ 0,1, 0, 1, 0, 0, 2, X, 0 ] -> [ 0f,0 ];
[ 0,1, 1, 0, 0, 0, 2, X, 0 ] -> [ 0,0 ];
"Do the same thing for the WB outputs (no comments)
[ 0,0, 0, 0, X, 0, X, X, X ] -> [ X,X ];
[ 0,0, 0, 1, X, 0, X, X, X ] -> [ X,X ];
[ 0,0, 1, 0, X, 0, X, X, X] -> [ 0,0 ];

"v1 Reset
"WRT = 0 = WA/B inactive
[ 0,1, 0, 1, 0, 0, X, X, 1 ] -> [ 0,0 ];
[ 0,1, 1, 0, 0, 0, X, X, 1 ] -> [ 0,0 ];
"Halfword transactions to lower word (bytes 1:0)
[ 0,1, 0, 1, 1, 0, 1, 0, 1 ] -> [ 0,0 ];
[ 0,1, 1, 0, 1, 0, 1, 0, 1 ] -> [ 0,03 ];
"Halfword write on byte boundary - occurrence results in IU generated alignment error.
"v10
[ 0,1, 0, 1, 1, 0, 1, 1, 1 ] -> [ 0,03 ];
[ 0,1, 1, 0, 1, 0, 1, 1, 1 ] -> [ 0,0 ];
"Halfword write to upper word
[ 0,1, 0, 1, 1, 0, 1, 2, 1 ] -> [ 0,0 ];
[ 0,1, 1, 0, 1, 0, 1, 2, 1 ] -> [ 0,0c ];
[ 0,1, 0, 1, 0, 0, 1, X, 1 ] -> [ 0,0c ];
[ 0,1, 1, 0, 0, 0, 1, X, 1 ] -> [ 0,0 ];
"Word write on byte boundary results in IU generated alignment error
[ 0,1, 0, 1, 1, 0, 1, 3, 1 ] -> [ 0,0 ];
[ 0,1, 1, 0, 1, 0, 1, 3, 1 ] -> [ 0,0 ];
"Verify WB follows byte writes correctly
[!OE,!Rst,StCk,SyCk,W,I,S,ADR,A14]
[ 0,1, 0, 1, 1,0, 0, 3, 1 ] -> [ 0,0 ];
[ 0,1, 1,0, 1,0,0, 3, 1 ] -> [ 0,08 ];
[ 0,1, 0, 1, 1,0,0, 2, 1 ] -> [ 0,08 ];
[ 0,1, 1, 0, 0, 0, 0, 2, 1 ] -> [ 0,04 ];
[ 0,1, 0, 1, 1, 0, 0, 1, 1 ] -> [ 0,04 ];
[ 0,1, 1, 0, 0, 0, 0, 1, 1 ] -> [ 0,02 ];
[ 0,1, 0, 1, 1,0,0, 0, 1 ] -> [ 0,02 ];
[ 0,1, 1, 0, 0, 0, 0, 0, 1 ] -> [ 0,01 ];
[ 0,1, 0, 1, 0, 0, 0, 0, 1 ] -> [ 0,01 ];
[ 0,1, 1, 0, 0, 0, 0, 0, 1 ] -> [ 0,0 ];
"v20
"wrt byte 3
"wrt byte 2
"wrt byte 1
"wrt byte 0
"writes are inactive

"Verify single store works correctly [!OE,!Rst,StCk,SyCk,W,I,S,ADR,A14] for ease of programming only
[ 0,1, 0, 1, 1, 0, 2, 0, 1 ] -> [ 0,0 ];
[ 0,1, 1, 0, 1, 0, 2, 0, 1 ] -> [ 0,0f ];
[ 0,1, 0, 1, 0, 0, 0, X, 1 ] -> [ 0,0f ];
[ 0,1, 1, 0, 0, 0, 0, X, 1 ] -> [ 0,0 ];


"Verify WB responds correctly to double stores
[ 0,1, 0, 1, 1, 0, 3, X, 1 ] -> [ 0,0 ];
[ 0,1, 1, 0, 1, 0, 3, X, 1 ] -> [ 0,0f ];
[ 0,1, 0, 1, 0, 0, 3, X, 1 ] -> [ 0,0f ];
[ 0,1, 1, 0, 0, 0, 3, X, 1 ] -> [ 0,0f ];
[ 0,1, 0, 1, 0, 0, 2, X, 1 ] -> [ 0,0f ];
[ 0,1, 1, 0, 0, 0, 2, X, 1 ] -> [ 0,0 ];


"Check that all WA's and WB's are inhibited when !NULL occurs with WRT
[ 0,0, 0, 0, X, X, X, X, X ] -> [ X,X ];
[ 0,0, 0, 1, X, X, X, X, X ] -> [ X,X ];
"v1 Reset
[ 0,0, 1, 0, X, X, X, X, X ] -> [ 0,0 ];
[ 0,1, 0, 1, 1, 1, 0, 3, X ] -> [ 0,0 ];
[ 0,1, 1, 0, 1, 1, 0, 3, X ] -> [ 0,0 ];
"write byte 3
[ 0,1, 0, 1, 1, 1, 0, 2, X ] -> [ 0,0 ];
[ 0,1, 1, 0, 0, 1, 0, 2, X ] -> [ 0,0 ];
"write byte 2
[ 0,1, 0, 1, 1, 1, 0, 1, X ] -> [ 0,0 ];
[ 0,1, 1, 0, 0, 1, 0, 1, X ] -> [ 0,0 ];
[ 0,1, 0, 1, 1, 1, 0, 0, X ] -> [ 0,0 ];
"write byte 1
[ 0,1, 1, 0, 0, 1, 0, 0, X ] -> [ 0,0 ];
"write byte 0
[ 0,1, 0, 1, 0, 1, 0, 0, X ] -> [ 0,0 ];
[ 0,1, 1, 0, 0, 1, 0, 0, X ] -> [ 0,0 ];
"writes are inactive


"Double stores
[ 0,1, 0, 1, 1, 1, 3, X, X ] -> [ 0,0 ];
[ 0,1, 1, 0, 1, 1, 3, X, X ] -> [ 0,0 ];
[ 0,1, 0, 1, 0, 1, 3, X, X ] -> [ 0,0 ];
[ 0,1, 1, 0, 0, 1, 3, X, X ] -> [ 0,0 ];
[ 0,1, 0, 1, 0, 1, 2, X, X ] -> [ 0,0 ];
[ 0,1, 1, 0, 0, 1, 2, X, X ] -> [ 0,0 ];
"MORE REALISTIC OCCURRENCE OF DOUBLE STORE !NULL
[ 0,1, 0, 1, 1, 0, 3, X, 0 ] -> [ 0,0 ];
[ 0,1, 1, 0, 1, 0, 3, X, 0 ] -> [ 0f,0 ];
[ 0,1, 0, 1, 0, 1, 3, X, 0 ] -> [ 0f,0 ];
[ 0,1, 1, 0, 0, 1, 3, X, 0 ] -> [ 0f,0 ];
[ 0,1, 0, 1, 0, 0, 2, X, 0 ] -> [ 0f,0 ];
[ 0,1, 1, 0, 0, 0, 2, X, 0 ] -> [ 0,0 ];
"Double stores
[ 0,1, 0, 1, 1, 0, 3, X, 1 ] -> [ 0,0 ];
[ 0,1, 1, 0, 1, 0, 3, X, 1 ] -> [ 0,0f ];
[ 0,1, 0, 1, 0, 1, 3, X, 1 ] -> [ 0,0f ];
[ 0,1, 1, 0, 0, 1, 3, X, 1 ] -> [ 0,0f ];
[ 0,1, 0, 1, 0, 0, 2, X, 1 ] -> [ 0,0f ];
[ 0,1, 1, 0, 0, 0, 2, X, 1 ] -> [ 0,0 ];
"Inactive


Appendix B. PLD ToolKit Source Code for CY7C332 Write Enable
CY7C332;
CONFIGURE;
Sys_Ck,                                          {Pin 1}
A16(ireg), A15(ireg), A14(ireg),                 {Pins 2 thru 4}
SIZE1(ireg), SIZE0(ireg),                        {Pins 5 and 6}
RD(ireg), DXFER(node = 9,ireg),                  {Pins 7 and 9}
                                                 {Pin 8 is GND}
!OE(node = 14),                                  {Pin 14 is out enb}
!DOE0(nenbpt), !DOE1(nenbpt), !DOE2(nenbpt),     {Pins 15 thru ..}
!DOE3(nenbpt), !DOE4(nenbpt),                    {.. 19}
!IFMEMx(node = 23,nenbpt),                       {Inst Fetch Mem Excp}
!IOE0(nenbpt), !IOE1(nenbpt), !IOE2(nenbpt),     {Pins 24 thru ..}
!IOE3(nenbpt), !IOE4(nenbpt),                    {.. 28}

EQUATIONS;
IOE4 = RD & !DXFER & SIZE1 & !SIZE0 & !A16 & !A15 & !A14;   {A = 000}
IOE3 = RD & !DXFER & SIZE1 & !SIZE0 & !A16 & !A15 & A14;    {A = 001}
IOE2 = RD & !DXFER & SIZE1 & !SIZE0 & !A16 & A15 & !A14;    {A = 010}
IOE1 = RD & !DXFER & SIZE1 & !SIZE0 & !A16 & A15 & A14;     {A = 011}
IOE0 = RD & !DXFER & SIZE1 & !SIZE0 & A16 & !A15 & !A14;    {A = 100}

{Recall that for Inst Fetches only SIZE(1:0) = '10' is allowed}
IFMEMx = RD & !DXFER & !SIZE1 & !SIZE0      {SZ = 00}
       # RD & !DXFER & !SIZE1 & SIZE0       {SZ = 01}
       # RD & !DXFER & SIZE1 & SIZE0;       {SZ = 11}

{DOE's do not depend on SIZE bits, since IU does alignment internally}
DOE4 = RD & DXFER & !A16 & !A15 & !A14;   {A = 000}
DOE3 = RD & DXFER & !A16 & !A15 & A14;    {A = 001}
DOE2 = RD & DXFER & !A16 & A15 & !A14;    {A = 010}
DOE1 = RD & DXFER & !A16 & A15 & A14;     {A = 011}
DOE0 = RD & DXFER & A16 & !A15 & !A14;    {A = 100}


Cache Memory Design
The purpose of this application note is to provide a
general understanding of the attributes and engineering
tradeoffs of various cache designs. The first section discusses the cache design goal and methods of achieving
that goal. Next, several major cache design factors are
described, with an explanation of each factor's advantages
and disadvantages. The application note then explores the
conditions for and techniques used in design of multilevel
cache for uniprocessor and multiprocessor environments.
The first commercial use of a cache memory was in
1969, the year IBM introduced the IBM 360/85. Since
that time, cache memory has spread from mainframes to
minicomputers to microcomputers, thus becoming an accepted design technique for a broad range of computing
machines.
Cache memory is an engineering solution to unacceptably high main-memory access times relative to CPU
cycle time - a difference so great that main-memory access time was severely limiting overall machine performance. The cache acts as a small, high-speed buffer between the CPU and main memory. This buffer is hidden
from the outside world; thus the name cache.
If designed properly, the cache makes the machine
appear to have a large amount of very fast main memory.
As an example of the effectiveness of this approach, consider high-end machines such as the Amdahl 580 or IBM
3090. This caliber of machine has a main-memory access
time of 200 - 500 ns and a cache access time of 20 - 50
ns, yielding an effective memory access time of 30 - 100
ns - a 5 to 7x increase in memory performance.
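One way to see where figures like these come from is to weight the cache and main-memory times by how often each is paid. The sketch below assumes a simple additive model (cache time on every access, plus the main-memory time on the fraction of accesses that miss). The function name and the 5 and 10 percent miss rates are illustrative assumptions, not values taken from the machines named above.

```python
def effective_access_time(t_cache_ns, miss_rate, t_main_ns):
    """Additive model: every access pays t_cache; a miss also pays t_main."""
    return t_cache_ns + miss_rate * t_main_ns


# Fast and slow ends of the ranges quoted in the text, with assumed miss rates.
for miss_rate in (0.05, 0.10):  # assumed values, for illustration only
    fast = effective_access_time(20, miss_rate, 200)
    slow = effective_access_time(50, miss_rate, 500)
    print(f"miss rate {miss_rate:.0%}: {fast:.0f} to {slow:.0f} ns effective, "
          f"{200 / fast:.1f}x to {500 / slow:.1f}x faster than main memory alone")
```

Under these assumptions the results stay inside the 30 - 100 ns range quoted above, which suggests that miss rates of a few percent are what such machines rely on.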
The use of cache memory has become very
widespread, as evidenced by cache being directly supported or included on-chip in a variety of microprocessors: the National Semiconductor 32000 family, the
Motorola 68000 family, the Intel 80386 and 80486, and
all of the currently available RISC families, such as the
Cypress CY7C600 SPARC family.

Cache Design
The objective of cache design is to reduce the effective (or average) memory access time to an acceptable level that is generally determined from cost/performance tradeoff analysis. You can achieve this goal by realizing that most processor reference streams are both highly sequential and highly loop oriented.
Therefore, a cache operates on the principle of spatial and temporal locality of reference. Spatial locality means that information the CPU will reference in the near future is likely to be logically close in main memory to information that is currently being referenced. Temporal locality means that information the CPU is currently referencing is likely to be referenced again in the near future. Through these mechanisms, you can design a cache to ensure a high probability that CPU references are located in the cache.
Spatial locality of reference is serviced in the following manner: If a cache miss occurs (the cache does not contain information requested by the CPU), the cache accesses main memory and retrieves the information currently being requested, as well as several additional locations that logically follow the current reference. This set of information is called a line or block. The next CPU reference now has a high statistical probability of being serviced by the cache, thus avoiding main memory's relatively long access time.
Temporal locality of reference is serviced by allowing information to remain in the cache for an extended period of time, only replacing the line in order to make room for a new one. You can use several algorithms to manage cache line replacement (more on this later). By allowing the information to remain in the cache and with a sufficient cache size, an entire loop of code can fit into the cache, allowing very high speed execution of instructions in the loop.

The Cache Design Goal
The goal of a cache design is to reduce the effective memory access time as seen by the CPU. Effective access time can be expressed as:
$t_{eff} = t_{cache} + m \times t_{main}$
where:
$t_{cache}$ = Effective hit time of cache (i.e., cache access time)
$m$ = Miss rate of cache
$t_{main}$ = Main-memory access time (penalty beyond $t_{cache}$ for main-memory accesses)
Thus, design of a cache revolves around:
Minimizing the time for the cache to service a hit
Maximizing the hit rate (hit rate = 1 - miss rate)
Minimizing the delay due to a cache miss (included in tmain)
Minimizing the overhead delay associated with keeping main memory coherent - in agreement with the data in the cache - especially in multicache configurations (included in tmain)
Generally, all these factors are affected in some way by the design parameters discussed below. To simplify the overall design process, you might find it useful to view cache design from the following macroarchitecture viewpoints, each of which can be broken into one or more microarchitectural parameters:
Cache placement
    physical vs. virtual cache
Cache organization
    cache mapping method
    cache size
    cache line size
    split cache vs. combined cache
Cache management
    main-memory coherence schemes
    line-replacement algorithms
    fetching algorithms
The next several sections examine the microarchitectural aspects of these factors in detail, giving the performance tradeoffs relative to the four cache performance factors identified above. After describing these design parameters, this application note pulls together the critical parameters of cache design and presents a method of calculating estimates for effective cycle time.

Cache Placement
Along with the cache, an address translation unit, usually called a Memory Management Unit (MMU), resides between the CPU and main memory. The MMU maps the virtual addresses generated by a program and used by the CPU to the physical addresses used to access main memory.
Figure 1 shows two ways of arranging the cache and MMU. You choose between these two approaches by answering the question: Where in the system should the MMU delay occur? Traditionally, caches have been referenced with physical addresses, as in Figure 1a. A physical cache is easier to manage but slower than a virtual cache, which is referenced by virtual addresses as in Figure 1b. A physical cache is slower because tcache includes the address translation time; thus, the translation delay occurs on every memory reference. A virtual cache allows address translation to occur in parallel with cache access, thereby shifting the translation time penalty from tcache to tmain. This significantly reduces the translation time's overall negative impact on teff if the hit rate is high.
Figure 1. Cache Placement (1a. Physical Cache System; 1b. Virtual Cache System)
The disadvantage of a virtual cache is that it is more difficult to manage, because support must be included to detect and correct aliases (or synonyms). Aliasing occurs when two virtual addresses translate to the same physical address. This situation can occur, for example, when two different programs in the CPU share pages placed in different locations in the two programs' respective address maps.
Aliasing can be detected and corrected in a number of ways. The most complete solution is to employ a set of virtual cache tags (cache tags are explained later) and a set of physical cache tags, which are used as cross-references to detect and prevent aliasing. The CY7C605 SPARC CMU-MP uses this methodology.
Another, less elegant, aliasing solution is to use an operating system detector that either forces shared data to the same cache line or marks shared data as non-cacheable. The CY7C604 SPARC CMU uses this technique.
Either of the two aliasing solutions described here allows you to take advantage of a virtual cache's faster response. As higher processor speeds place greater demands on cache systems, virtual caching schemes will become more popular.
On the other hand, system designers might not have to choose between physical and virtual cache much longer because, as integration levels increase, more and more microprocessors become available with on-board cache. In fact, several CISC (Complex Instruction Set Computer) chips already contain on-board cache (32000, 68030, 80486), and several RISC architectures have been proposed or introduced as a single chip with on-board cache. As a result, virtual cache vs. physical cache is likely to become a silicon design issue, with system-level designers focusing on methods of designing an efficient second-level cache to back up relatively small on-board cache.
In the event of a multilevel cache hierarchy, you might not have the option of choosing where to place the cache. If the cache is on the processor chip, it will probably force a physical level 2 cache. It is also probable that a

multiple-chip processor family will be partitioned such that it forces a physical level 2 cache.

Cache Organization
Cache organization has four basic parameters: cache mapping method, cache size, cache line size, and split vs. combined cache. Note that for a multilevel cache hierarchy, the organizational tradeoffs associated with cache size, cache mapping method, and cache line size are multidimensional. This is because choices made for the level 1 cache are likely to affect the performance of the level 2 cache and vice versa.
Because a cache can be viewed as a small moving window into portions of a larger main memory, main-memory locations must be mapped to and from locations in the cache. The type of mapping you use affects both cache hit time and miss rate. Generally, an increase in hit rate exacts a penalty on cache hit time. However, recent research supports the idea that if a cache is sufficiently large, the relative difference in miss rate for various mapping methods becomes very small. This indicates that a sufficiently large cache should be mapped according to the scheme that exacts the least penalty on cache hit time.
The most widely used mapping schemes are based on the principle of associativity. A fully associative cache allows any location in main memory to be mapped to any location in the cache. An n-way set-associative cache (typically n = 2, 4, 8, etc.) allows any specific location to be mapped to n locations in the cache. A direct-mapped cache allows any specific location in main memory to be mapped to only one location in the cache; this scheme thus implements a 1-way set-associative cache. The following discussion details each technique, beginning with the least complex (direct-mapped) and finishing with the most complex (fully associative).

Direct Mapping
Figure 2 illustrates direct mapping. Each location in main memory maps to a unique location in the cache. For instance, location 1 in main memory maps to location 1 in the cache. Location 2 in main memory maps to location 2 in the cache. Location m in main memory maps to location m in the cache. Location m+1 in main memory maps to location 1 in the cache, etc.
Figure 2. Direct Mapping
A simplistic direct-mapped cache implementation appears in Figure 3. A direct-mapped cache consists of a data memory, a tag memory, and a comparator. The data memory contains the cached data and instructions; its size is defined as the cache size. The tag memory uses the comparator to determine whether the cache contains the line being addressed by the processor.
Figure 3. A Direct-Mapped Cache
The memory address is split into three fields: a tag field, an index field, and a word-offset field. The tag field consists of the address's higher-order bits. The index field addresses the tag memory to see if the line being accessed is the line the processor wants. This mechanism ensures that, for example, data from the desired main-memory location 2m+4 is retrieved instead of data from main-memory location 4m+4, which would reside in the same location in the cache. The line size is defined as the basic unit of transfer between the cache and main memory and is typically an even binary amount such as 16, 32, 64, or 128 bytes.
The number of bits in each field of the address can be deciphered as follows:
$i = \log_2(\text{number of cache tag entries})$
$w = \log_2(\text{line size})$
$i + w = \log_2(\text{cache size})$
When an address is presented to the cache, the bits of the index field address the tag memory. The tag in the location addressed by the index field is presented at the tag memory's outputs. This tag is compared with the reference tag, while the cache subsystem also checks to see that its status bits (i.e., VALID, DIRTY, etc., explained later) are in appropriate states.
In parallel with the tag access and status check, i + w bits are used to address the data memory; the accessed word is placed in the DATA OUT buffer. If the tags match and the status bits are all correct, the cache subsystem asserts the MATCH OUT signal, indicating that the information retrieved from the data memory is correct (a cache hit). If the tags do not match or the status bits are not correct, MATCH OUT is de-asserted (indicating that the data in DATA OUT is invalid and thus represents a cache miss), and the correct data is retrieved from main memory.
Consequently, a direct-mapped cache has two critical timing paths:
1. Read-data: accessing the data memory and passing the word to the DATA OUT register.
2. Asserting the MATCH OUT signal if the status bits are OK and the retrieved tag matches the reference tag.
Accordingly, the slower of paths 1 and 2 limits a direct-mapped cache's access time.
Figure 4. Set Associative Mapping
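The three-field address split used by a direct-mapped cache can be sketched in a few lines. This is a minimal illustration, assuming byte addresses and power-of-two cache and line sizes; the function name and the 64-Kbyte, 16-byte-line configuration are hypothetical choices, not taken from a specific Cypress part.

```python
def split_address(address, cache_size, line_size):
    """Return (tag, index, word offset) for a direct-mapped cache.

    cache_size and line_size are in bytes and assumed to be powers of two,
    so the arithmetic below selects the same bits as the w-bit offset field,
    the i-bit index field, and the remaining high-order tag field.
    """
    entries = cache_size // line_size          # number of cache tag entries
    offset = address % line_size               # low w bits
    index = (address // line_size) % entries   # next i bits
    tag = address // cache_size                # remaining high-order bits
    return tag, index, offset


# Two addresses exactly one cache size apart: same index, different tag.
for addr in (0x12345678, 0x12345678 + 64 * 1024):
    tag, index, offset = split_address(addr, 64 * 1024, 16)
    print(f"address {addr:#010x}: tag {tag:#x}, index {index:#x}, offset {offset:#x}")
```

Both addresses select the same tag-memory entry, so only the stored tag tells the comparator which of the two lines is actually in the cache.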

Set-Associative Cache Mapping
Figure 4 illustrates how set-associative mapping works for the two-way set-associative case. The cache consists of two sets, or banks, of memory cells, each containing m lines. Location 1 in main memory maps to cache line 1 of either set. Location 2 in main memory maps to cache line 2 of either set. Location m in main memory maps to cache line m of either set. Location m+1 in main memory maps to cache line 1 of either set. Location m+2 in main memory maps to cache line 2 of either set, and so on.
In this manner, each location in main memory has two chances of being in the cache. This scheme allows, for example, main-memory locations m+z and 5m+z (where z is any integer) to coexist in the cache. This is an advantage because it supports the principle of temporal locality of reference very efficiently for small cache sizes.
As an example of how this type of cache works, consider a software loop that m cache lines cannot contain. As the loop executes, the cache begins to fill with instructions and data from the loop, eventually filling m lines of, say, set 1. At this point, rather than replacing cache lines of set 1 with new information from the loop, the cache can begin to fill the lines in set 2, thereby allowing the entire loop to reside in the cache. This results in a performance advantage. This advantage goes away, however, when the cache becomes sufficiently large.
Figure 5 shows an implementation of an n-way set associative cache, where n = 2. Each of the sets contains the same logic as the dashed block in the direct-mapped cache diagram (Figure 3). Additionally, an OR function in the set-associative cache asserts MATCH OUT if either set contains a match. The decode function selects data from the bank containing the match and asserts a control line to the mux; this allows the matched data to propagate to DATA OUT.
Figure 5. A 2-Way Set Associative Cache
This topology can be extended to n-way set associativity by having n sets of memory, an n-input OR function, an n-to-$\log_2 n$ decoder, and an n-to-1 mux.
Additionally, note that this is only one of several topologies. Another way of implementing the mux function is to assert RAM output enables based on the outcome of the matching function. Yet another way is to combine the OR and decode functions into one PLD.
Note as well that a multi-way set-associative cache has more logic levels than a direct-mapped cache. A multi-way set-associative cache contains three critical timing paths:
1. Read data: accessing the cache data memory in each of the sets.
2. Asserting the MATCH OUT signal in one of the sets, if the tag is matched and valid.
3. Select data: selecting the cached data from the set that matches, if there is a match.
Multi-way set-associative caches are slower than a direct-mapped cache because of the added logic delay associated with the select-data path. Therefore, a direct-mapped cache exhibits a faster cache hit time at a lower system cost.
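The benefit of giving each main-memory location two chances of being in the cache can be illustrated with a toy miss counter. This is a sketch under stated assumptions: it tracks whole lines only, it uses least-recently-used replacement among the banks (the text has not yet specified a replacement policy), and the function name, bank size, and trace are hypothetical.

```python
def count_misses(line_numbers, lines_per_bank, ways):
    """Count misses for a sequence of main-memory line numbers.

    The cache has `ways` banks of `lines_per_bank` lines; ways=1 is direct mapped.
    Replacement among the banks is least recently used (an assumption).
    """
    # For each cache line position, the tags currently held, oldest first.
    held = [[] for _ in range(lines_per_bank)]
    misses = 0
    for line in line_numbers:
        position = line % lines_per_bank
        tag = line // lines_per_bank
        tags = held[position]
        if tag in tags:
            tags.remove(tag)          # hit: re-appended below as most recent
        else:
            misses += 1
            if len(tags) == ways:
                tags.pop(0)           # evict the least recently used tag
        tags.append(tag)
    return misses


m, z = 256, 7                          # hypothetical bank size and offset
trace = [m + z, 5 * m + z] * 100       # a loop alternating between two lines
print("direct mapped misses:", count_misses(trace, m, ways=1))
print("two-way misses:      ", count_misses(trace, m, ways=2))
```

With one bank the two lines keep evicting each other, so every reference misses; with two banks only the first reference to each line misses.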

Fully Associative Mapping
Figure 6 illustrates fully associative mapping. With a
fully associative scheme, any location in main memory
can be mapped to any location in the cache. This scheme
theoretically produces the highest hit rate because there is
no possibility of thrashing. Thrashing occurs when two or
more data blocks that map to the same location in the
cache start replacing each other frequently. The end result
of thrashing is a drastic increase in teff due to increased


miss rate. Thrashing becomes statistically unlikely, however, as cache size increases.
Figure 6. Fully Associative Mapping
Figure 7 illustrates a simplistic, fully associative cache. As shown, the address accesses a CAM (Content Addressable Memory) bank, which simultaneously searches all locations for a match. If the CAM finds a valid match, the cache data RAM places the requested information in DATA OUT. If the CAM does not find a match, main memory must be accessed for the correct data.
Figure 7. A Fully Associative Cache
Fully associative caches are very expensive to build because CAM cells are not readily available. Consequently, most caches are designed with direct or set-associative mapping, which can be realized with SRAM technology.

Direct vs. Set-Associative Mapping
The trend in cache design is toward larger caches. In the past, cache sizes of 8 to 16 Kbytes were fairly common. Today, 64 Kbytes is probably the average, with many systems having much larger cache sizes.
As an example, consider the 80386 - a low-end processor - used in combination with the 82385 cache controller. The 82385 directly supports a 32-Kbyte cache and indirectly supports 64- and 128-Kbyte caches. The device supports both direct-mapped and two-way set-associative cache. By coupling the 82385 with two Cypress CY7C184 Cache Data RAMs (designed specifically for this application), you can implement a 32-Kbyte cache with three chips.
As another example, the Cypress CY7C600 SPARC family - a high-end processor family - supports direct-mapped cache in 64-Kbyte clusters. Each cluster consists of one CY7C604 Cache Tag/Cache Controller/Memory Management Unit (CMU) and two CY7C157 16K x 16 Cache Data RAMs. Up to four clusters can be included per processor, implementing a direct-mapped cache as large as 256 Kbytes.
There are two basic reasons for this trend toward larger cache sizes: First, semiconductor technology can now easily support a 64-Kbyte cache size with reasonable chip count and speed. Second, the emergence of RISC (Reduced Instruction Set Computer) architectures has created a demand for higher cache hit rates and faster cache hit times - in other words, a large cache that is designed simply (i.e., fewer logic delays).
The trends toward larger cache sizes and faster hit times tend to favor the easier-to-design direct-mapped cache. The basic tradeoff involves associativity, defined as the number of cache lines in which a given block of data can reside. As associativity decreases, fewer lines are searched on a memory reference. This provides a potential implementation advantage because, as fewer lines are searched, logic delay paths disappear and the cache gets faster. A disadvantage to decreasing associativity is that the number of lines with identical tags that can simultaneously reside in the cache also decreases.
Valid arguments support the use of set-associative mapping over direct mapping - and vice versa. However, most researchers agree that the trend is toward direct mapping.
Two basic arguments are presented against direct mapping: First, a direct-mapped cache has a lower hit rate than a set-associative cache of the same size. This statement is true but is rapidly becoming a Don't Care. Consider Figure 8. For small cache size, direct mapping exhibits considerably higher miss rate than either two-way or four-way set-associative mapping. But for large cache size (64 Kbytes) the miss-ratio difference between direct mapping and set-associative mapping becomes a fraction of 1 percent.
Research presented in Reference 4 shows that, for an 8-Kbyte unified instruction/data cache, the difference in miss rate for a two-way set-associative vs. a direct-mapped cache is around 1.3 percent. That figure drops to about 0.5 percent for a 32-Kbyte cache.
The end result is that for large cache size, the reduced logic delay inherent in direct mapping (specifically, elimination of the select-data path) produces a cache that is faster and displays essentially the same hit rate as a similarly sized set-associative cache. Thus, recent research supports the use of direct mapping.
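The tradeoff between a slightly lower miss rate and a slightly longer hit time can be put in numbers with the effective-access-time expression teff = tcache + m x tmain. The sketch below is illustrative only: the 1.3 and 0.5 percent miss-rate differences are the Reference 4 figures quoted above, while the 300-ns main-memory time and the function name are assumptions.

```python
def break_even_select_delay(miss_rate_gap, t_main_ns):
    """Extra hit time (ns) at which a set-associative cache's lower miss rate
    is exactly cancelled, using t_eff = t_cache + m * t_main."""
    return miss_rate_gap * t_main_ns


T_MAIN_NS = 300  # assumed main-memory access time, for illustration only
for size_kbytes, gap in ((8, 0.013), (32, 0.005)):
    delay = break_even_select_delay(gap, T_MAIN_NS)
    print(f"{size_kbytes}-Kbyte cache: set associativity pays off only if the "
          f"select-data path adds less than {delay:.1f} ns")
```

Under these assumptions, a 32-Kbyte two-way cache must add less than about 1.5 ns to the hit path to come out ahead, which indicates why larger caches favor direct mapping.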

The second argument against direct-mapped cache is that a direct-mapped cache is more prone to thrashing. On the surface, this makes a good deal of sense. But for larger cache size, the statistical likelihood of thrashing is so low that it becomes negligible. Additionally, for real-time applications in which deterministic response time to a memory reference is critical, the possibility of thrashing can be completely eliminated if cache entries can be locked - marked as non-replaceable so they are always in the cache.
Four sound arguments can be presented in support of direct mapping: First, direct-mapped cache is less expensive than set-associative cache due to elimination of the select-data logic. Second, the access time for a direct-mapped cache is faster than for a set-associative cache due to elimination of the select-data logic delays. Third, teff is generally lower for a direct-mapped cache than for a set-associative cache for a sufficiently large cache (generally 32 Kbytes), because tcache is reduced and the (unfavorable) difference in m between the two cache types is negligible. Finally, you do not need to implement a cache line replacement policy for a direct-mapped cache because each main-memory location maps to exactly one cache line (more on cache replacement policies later).
Figure 8. Cache Miss Rate as a Function of Associativity (Transcribed from Reference 1). Plot of miss rate (%) versus cache size (kB) for direct-mapped, 2-way set-associative, and 4-way set-associative caches.

Cache Size
Cache size has perhaps the single largest influence on miss ratio. In terms of miss-ratio impact, cache size is also the most difficult to quantify because it relates so closely to the principle of locality of reference - and therefore the software workload. In general, however, a larger cache has a lower miss ratio. On the other hand, large cache is also significantly more expensive to build given the relatively higher cost of fast SRAMs.
Additionally, mindlessly increasing the size of the cache can actually result in a performance decrease. This effect might result from an increase in output loading due to fan-in/fan-out limitations or the increase in cache-hit processing time due to the added logic delays necessary to manage a larger cache. Given the current state of semiconductor technology, cache sizes of 64 Kbytes are easy to achieve and generally large enough to allow a cache to obtain a 95-percent hit rate.
For multilevel cache hierarchies, a level 2 cache must generally be very much larger than the level 1 cache to be effective. Research results presented in Reference 7 indicate that adding a level 2 cache can provide a worthwhile performance increase, given the proper combination of small level 1 cache and slow main memory.

Cache Line Size
Cache line size is defined as the basic unit of information transfer between the cache and main memory. Line size ranks second right behind cache size as the parameter that most affects cache performance. Proper choice of line size is important because it affects both miss rate and tmain.
Figure 9 presents data transposed from Reference 10. Note that for a given cache size, increasing the line size reduces the miss rate. Eventually, however, the miss rate begins to increase with larger line size, as shown by the 2-Kbyte curve in Figure 9.
Figure 9. Cache Miss Rate as a Function of Line Size (Transposed from Reference 10). Plot of miss ratio versus cache line size (bytes), one curve per cache size.
Cache line size also affects tmain. Too-large line sizes have long transfer times (which increases tmain) and create difficulties in multiprocessing systems by generating excessive bus traffic. These problems especially affect primitive buses that do not support single-address, multiple-data-cycle burst transfers. The burst transfer capabilities of newer bus protocols, such as Futurebus and the SPARC reference standard Mbus (Module-bus), allow larger line sizes with less impact on tmain.
Additionally, larger line sizes tend to cause a degree of memory pollution. This problem occurs when information is loaded into the cache but never referenced by the processor.
For multicache organizations, having a level 2 cache line size greater than that of the level 1 cache has advantages that are not discussed in Reference 7: lower cache

tag cost and increased performance due to the pre-fetch
nature of the line size difference.
If the level 2 cache line size is greater than that of the
level 1 cache, you must consider some additional design
elements. Generally, for example, the line-size ratio of
level 2 to level 1 cache is set at a power of 2. Recall that
line size is defined as the basic size of information transfer between the cache and main memory, or between the
level 1 cache and the level 2 cache. If the line size of the
level 2 cache is not equal to the line size of the level 1
cache, the level 2 cache controller must be able to communicate in two different sizes of data chunks.
A cache such as this level 2 cache is referred to as
sector oriented. This type of cache maintains coherency in
sizes equal to the level 1 cache line size, which is a sub-block of the level 2 cache line.
As a result, the level 2 cache tag entries must include
bits to track the following status parameters for each sub-block:
VALID means that the sub-block contains good data.
DIRTY indicates that the cache line/sub-block has
been written to and is no longer the same as main
memory; i.e., main memory must be updated on replacement of a dirty line.
INCLUSION indicates that the sub-block is present in
the level 1 cache.
To illustrate the operation of a sector-oriented cache,
consider a 16-Kbyte, direct-mapped, level 1 cache with a
16-byte line size that is backed up by a 256-Kbyte, direct-mapped, level 2 cache. If the level 2 cache line size
equals the level 1 cache line size (16 bytes), the level 2
cache has 256K/16, or 16K, cache tag entries. Assuming a
32-bit address, the tag size in bits is then 32 - log2(16K) - log2(16). This expression yields a tag size of 14 bits; adding 3 bits for VALID, DIRTY, and INCLUSION gives a
total length of 17 bits. This length equates to a cache tag
size of 16K x 17, or a 272-Kbit tag size.
If, on the other hand, the level 2 cache line size is set
at 64 bytes, the level 2 cache has 4K tag entries. The tag
size is then 14 bits (32 - log2(4K) - log2(64)) plus the 3 status bits needed for each
of the 4 sub-blocks in the level 2 cache line; this yields a
total of 26 bits of tag. The total tag size is then 4K x 26, or
104 Kbits. This means that the tag for the sector-based
level 2 cache costs roughly 40 percent as much (104/272, or about 38 percent) as the tag for the
non-sector-based cache on a cost/bit basis. Therefore,
in addition to the possible performance benefit associated
with having a level 2 line size greater than the level 1 line
size, the cache is less expensive as well.
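
The tag-size arithmetic above can be checked with a short Python sketch. This is only an illustration of the calculation in the text: the function name l2_tag_array and its parameters are hypothetical, and the sketch assumes a direct-mapped level 2 cache, power-of-2 sizes, and three status bits (VALID, DIRTY, INCLUSION) per sub-block.

```python
def l2_tag_array(cache_bytes, l2_line, l1_line, addr_bits=32, status_bits=3):
    """Return (entries, bits_per_entry, total_bits) for a direct-mapped level 2 tag array.

    All sizes are in bytes and are assumed to be powers of 2.
    """
    entries = cache_bytes // l2_line             # one tag entry per level 2 line
    index_bits = entries.bit_length() - 1        # log2(entries)
    offset_bits = l2_line.bit_length() - 1       # log2(level 2 line size)
    tag_bits = addr_bits - index_bits - offset_bits
    sub_blocks = l2_line // l1_line              # level 1 lines per level 2 line
    bits_per_entry = tag_bits + status_bits * sub_blocks
    return entries, bits_per_entry, entries * bits_per_entry


# 256-Kbyte level 2 cache behind a level 1 cache with 16-byte lines
for l2_line in (16, 64):
    entries, width, total = l2_tag_array(256 * 1024, l2_line, l1_line=16)
    print(f"{l2_line}-byte lines: {entries} x {width} = {total // 1024} Kbits")
```

For the two cases in the text, the sketch reports 16384 x 17 = 272 Kbits and 4096 x 26 = 104 Kbits.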
In summary, three factors influence cache line size
choice:
1. The type of bus protocol used. A protocol that is
capable of burst transfers, such as Futurebus or Mbus,
permits a longer line size with a potential performance increase, due to reduced miss penalty for a
given line size.
2. The structure of main memory. In other words, make
sure that the line size does not create a bottleneck at
the main memory interface.
3. Bus-bandwidth/data-contention considerations, especially in a multiprocessing environment.
The design task boils down to choosing a line size
that is long enough to effect a good miss ratio, but short
enough to minimize tmain. Typically, cache line size is 16,
32, 64, or 128 bytes.

Split vs. Combined Cache
In the past, computers have generally utilized a single
cache for both instructions and data. It is possible, however, to design a system that has separate caches for instructions and data. Generally, as shown in Figure 10, a
unified instruction/data cache results in slightly higher
performance through a lower miss ratio. The advantages
of splitting the cache are:
1. It makes design of the instruction cache easier because the cache's contents do not generally need to be
modified.
2. It might eliminate conflict between data and instruction accesses in a pipelined architecture; this depends
on the overall processor architecture.
There are also advantages to using a unified cache:
1. Cache design is simpler for a unified cache because
both cache-to-main-memory and cache-to-processor
communications are one to one.
2. A unified instruction/data cache tends to make more
efficient use of the cache - which is a limited
resource.


Cache Management
In this context, cache management refers to the
policies governing the movement of information into and
out of the cache. These policies do not relate directly to
cache organization, but they do affect the cache controller's complexity. Specifically, cache management refers to
the policies that
1. Keep main memory coherent relative to cached information.

[Figure 10 plot not reproduced: miss rate vs. cache size (kB) for split and combined caches.]

Figure 10. Miss Rate for Split Cache vs. Combined Cache (Transcribed from Reference 12)


Table 1. Engineering Tradeoffs: Write-through vs. Copy-back
(+ marks an advantage, - marks a disadvantage)

Write-through
+ Main memory always has the most up-to-date version of data - minimizing cache coherency problems for multicache configurations.
+ Easy to implement in the cache controller.
- Without buffering (e.g., posted writes), CPU must wait for write to complete.
- If write buffers are present, extra logic must be included to ensure that data will not be referenced from main memory until it has been stored there.
- Generates increased bus traffic, which is especially bad for multiprocessing systems.

Copy-back
+ Produces a lower miss rate than write-through for some applications.
+ Frees up bandwidth on the main memory bus due to less frequent memory updates.
- Difficult to realize in multiprocessing systems due to cache coherency issues.
- Extra logic needed for DIRTY bit.
- Results in a more complex controller design, because it caches writes in addition to reads.

For multilevel cache systems, reducing the overhead
required to maintain consistency between the level 1
cache, the level 2 cache, and main memory is a critical
design factor. The tradeoff is one of cache controller complexity and the amount of bus bandwidth vs. cost. According to Reference 7, the level 1/level 2 cache coherency
strategy can result in a 15-percent cache system performance differential. In a two-level cache, you can generally choose a write strategy independently for each level.
From highest to lowest performance, the strategies are:
Level 1          Level 2
Copy Back        Copy Back
Copy Back        Write Through
Write Through    Copy Back
Write Through    Write Through

2. Determine when new information should be loaded
into the cache from main memory.
3. Choose the cache line that should be replaced with
the new information being loaded into the cache, if a
choice is available.

Main-Memory Coherence Schemes
When the CPU modifies cached data, main memory
needs to be notified of the change. Whether this notification happens sooner or later depends on the coherency
scheme used. The two mainstream coherency schemes are
called write through and copy back. Each policy has advantages and disadvantages, and each affects both the
complexity of the cache controller and teff.
Using the write-through policy, all writes to cached
locations are immediately written through to main
memory. This policy is the simpler of the two to implement, resulting in a less complex cache controller design.
The write-through approach can result in a performance
decrease, however, because the CPU usually must be held
pending completion of the write. Write through can also
cause problems due to increased bus traffic.
The copy-back policy only updates the cache on CPU
store cycles, updating main memory only when it becomes necessary to replace a modified (or dirty) line in
the cache. This policy requires an extra bit in the cache
tag array to keep track of whether a line is clean or dirty.
The main advantage of copy back is that it generates less
memory bus traffic, resulting in higher performance. The
main disadvantage of copy back is increased complexity
of the cache controller. Table 1 outlines the major advantages/disadvantages of both policies.
Additionally, a system can implement write allocation. This means that on a write miss, the data addressed
by the write miss is loaded into the cache and then
modified. With no write allocation, the data is written to
main memory only, and the cache is not updated.
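
The difference in memory-bus traffic between the two policies can be illustrated with a small Python model. This is a sketch under stated assumptions, not a description of any particular controller: the function memory_writes, the trace format, and the default cache geometry are all hypothetical. Write through is modeled with no write allocation and copy back with write allocation, and dirty lines left in the cache at the end of the trace are counted as one final write each.

```python
def memory_writes(trace, policy, num_lines=4, line_size=16):
    """Count main-memory write transactions for a direct-mapped cache.

    trace  : iterable of (op, byte_address), where op is "r" or "w"
    policy : "write-through" (no write allocate) or "copy-back" (write allocate)
    """
    tags = [None] * num_lines       # tag held by each cache line
    dirty = [False] * num_lines     # DIRTY bit per line (used by copy back only)
    writes = 0
    for op, addr in trace:
        block = addr // line_size
        index, tag = block % num_lines, block // num_lines
        hit = tags[index] == tag
        if policy == "write-through":
            if op == "w":
                writes += 1                 # every store goes to main memory
            elif not hit:
                tags[index] = tag           # read miss loads the line
        else:                               # copy back
            if not hit:
                if dirty[index]:
                    writes += 1             # modified line is written back on replacement
                tags[index], dirty[index] = tag, False
            if op == "w":
                dirty[index] = True         # store updates the cache only
    if policy == "copy-back":
        writes += sum(dirty)                # flush lines still dirty at the end
    return writes


# Eight stores to one location, then a read that replaces the same cache line
trace = [("w", 0)] * 8 + [("r", 64)]
print(memory_writes(trace, "write-through"), memory_writes(trace, "copy-back"))
```

In this trace the write-through model issues one memory write per store, while the copy-back model issues a single write when the dirty line is replaced.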

Line Replacement Algorithms
The line replacement algorithm decides which entry
in the cache to replace when a new line must be loaded
into the cache. For a direct-mapped cache, this task is
straightforward, because each main-memory location
maps to a unique line in the cache. A set-associative
cache permits some latitude in choosing the set in which a
line is replaced.
The most common methods of replacing cache lines
are Least Recently Used (LRU) and First In/First Out
(FIFO). The LRU algorithm keeps track of which line has
gone the longest without being used, and replaces that
line. The FIFO algorithm keeps track of the oldest line,
and replaces that line. You can also use a random line-replacement algorithm, where the set containing the line to
be replaced is chosen at random.
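
The three policies can be compared with a minimal Python sketch. It models only one group of candidate lines that compete for the same cache index; the function simulate_set, the policy names, and the reference stream are hypothetical and are not taken from the references cited here.

```python
import random
from collections import OrderedDict


def simulate_set(refs, ways, policy, seed=0):
    """Return the number of hits for one group of `ways` candidate lines.

    refs   : sequence of tags that all map to the same cache index
    policy : "LRU", "FIFO", or "RANDOM"
    """
    rng = random.Random(seed)       # seeded so that RANDOM runs are repeatable
    lines = OrderedDict()           # insertion order = age (FIFO) or recency (LRU)
    hits = 0
    for tag in refs:
        if tag in lines:
            hits += 1
            if policy == "LRU":
                lines.move_to_end(tag)          # mark as most recently used
            continue
        if len(lines) == ways:                  # all candidate lines are occupied
            if policy == "RANDOM":
                del lines[rng.choice(list(lines))]
            else:
                lines.popitem(last=False)       # oldest (FIFO) or least recently used (LRU)
        lines[tag] = True
    return hits


refs = [1, 2, 1, 3, 1, 2]
for policy in ("LRU", "FIFO", "RANDOM"):
    print(policy, simulate_set(refs, ways=2, policy=policy))
```

A stream this short says nothing about real workloads; it only shows how the bookkeeping differs between the policies.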
Curiously, research presented in Reference 11 shows
that random replacement generally produces higher hit
ratios than either the LRU or FIFO algorithms. Figure 11
uses data from Reference 11 and shows relative hit ratios
for four-way, set-associative cache using both the LRU


cache given various assumptions and design choices. Note
that this methodology only provides "ballpark" figures.
You can obtain more accurate figures by simulating an
actual design - either directly or via a software model.
As presented earlier, the goal of cache design is to
reduce the effective memory access time (teff) as seen by
the CPU. Effective access time is defined as

and random methods; two-way, set-associative cache
using the same two methods; and direct replacement (for
direct-mapped cache).
Two notes of caution: First, this data is fairly old
(1983) and therefore results from use of an unreasonably
small cache by today's standards. Second, the information
was obtained by averaging trace data from three different
C programs running under UNIX on a VAX-11. Thus,
depending absolutely on this data would be inappropriate,
especially for RISC machines.
On the other hand, relative comparisons of each
policy and cache organization are most appropriate. Some
interesting conclusions can be drawn from the data
presented in Figure 11. First, the random replacement algorithm appears to provide nearly the same or better hit
rates compared to LRU. This result is significant because
a random replacement algorithm is much easier to
design into a cache controller and requires less hardware.
The second conclusion is that for an 8-Kbyte (and
presumably larger) cache size, direct-mapped cache offers
nearly the same hit-ratio performance as two-way and
four-way set-associative cache. This result supports the
conclusions drawn in the section on cache mapping techniques.

teff = tcache + m x tmain

The following methodology does not take into account the effects of design choices on tcache or tmain; i.e., these numbers are either already known or estimated.
This methodology does, however, include the miss rate,
which accounts for the following factors:
Cache size
Cache line size
Cache mapping scheme
Main memory coherency algorithm
These factors are included by modeling the miss rate as
m = M x MRM + CF
where:
m   = Cache miss rate
M   = Raw miss rate
MRM = Miss-rate multiplier
CF  = Coherency factor
The raw miss rate represents the miss rate strictly as a
function of cache size and cache line size. Table 2
provides the raw miss rate and assumes direct-mapped
cache. The miss-rate multiplier is essentially a correction
factor that accounts for variations in miss rate between
direct-mapped and set-associative cache organizations;
Table 3 provides this value.
The coherency factor accounts for variations in miss
rate due to the choice of main-memory coherency algorithm. Recall that the write-through policy does not cache
CPU writes; instead write through forces all CPU writes to
immediately pass through to main memory. Thus, CPU
writes to a write-through cache can be regarded as cache

Fetching Algorithms
Most caches use demand fetching, where a new line
is requested from main memory only when a CPU reference results in a cache miss. This method minimizes the
complexity of the cache controller.
An alternate method, called pre-fetching, can produce
higher hit rates in some applications. Pre-fetching makes
use of idle memory cycles to move data into the cache.
Static pre-fetch is implemented at compile time, while
dynamic pre-fetch occurs at run time.
Sequential dynamic pre-fetching can cut the miss rate
in half, according to Reference 11. Reference 5 estimates
a reduction in miss rate of as much as 75 to 80 percent.
This estimate points to a significant performance advantage, but sequential dynamic pre-fetching requires a
large cache size to be effective. This is because dynamic
pre-fetch can result in increased memory pollution; the
statistical likelihood of this happening increases dramatically for decreasing cache size. Thus, if cache size is large
and cache controller complexity is not a major issue, including a dynamic pre-fetch mechanism can result in a
significant performance increase.
A pre-fetch mechanism also provides a way to improve the hit rate of a level 2 cache. Because the level 2
cache hit rate is usually fairly low anyway (generally 50
to 90 percent), memory pollution introduced by pre-fetch
tends to be inconsequential. You can implement pre-fetch
with minimal hardware overhead by making the line size
of level 2 greater than the line size of level 1.
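
One simple form of sequential pre-fetching can be sketched in Python as follows. The model is an assumption made for illustration: on every demand miss to line n it also loads line n + 1, it ignores timing (so it does not model the use of idle memory cycles), and the function count_misses and its parameters are hypothetical.

```python
def count_misses(line_refs, num_lines=8, prefetch=False):
    """Count demand misses in a direct-mapped cache indexed by line number.

    line_refs : sequence of line numbers (byte address // line size)
    prefetch  : if True, a miss on line n also loads line n + 1
    """
    cache = [None] * num_lines
    misses = 0
    for line in line_refs:
        if cache[line % num_lines] != line:
            misses += 1
            cache[line % num_lines] = line
            if prefetch:
                nxt = line + 1
                cache[nxt % num_lines] = nxt    # may displace a useful line (pollution)
    return misses


sequential = list(range(16))
print(count_misses(sequential), count_misses(sequential, prefetch=True))
```

For a purely sequential stream, this model halves the number of demand misses. A stream with little sequential locality gains nothing and can lose useful lines to the pre-fetched ones, which is the pollution effect described above.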

[Figure 11 plot not reproduced: normalized hit ratio vs. cache size (2048, 4096, and 8192 bytes) for four-way and two-way set-associative caches with random (RAN) and LRU replacement, and for a direct-mapped cache. 16-byte line size; all numbers normalized to 4-way SA/RAN.]

Figure 11. Cache Hit Rate as a Function of Replacement Algorithm

Pulling it all together
This section provides a simplistic method of calculating teff and the performance improvement of using a

Table 2. Miss Rate as a Function of Cache Size and Cache Line Size

Cache Size            Cache Line Size (Bytes)
(Kbytes)       8      16     32     64     128    256
    2        0.154  0.116  0.092  0.080  0.084  0.088
    4        0.116  0.086  0.074  0.064  0.061  0.065
    8        0.096  0.073  0.060  0.053  0.050  0.045
   16        0.086  0.064  0.054  0.047  0.044  0.039
   32        0.081  0.060  0.051  0.044  0.041  0.036
   64        0.079  0.057  0.050  0.043  0.040  0.035
  128        0.077  0.056  0.049  0.042  0.039  0.034
  256        0.076  0.055  0.048  0.041  0.038  0.033

misses, meaning that CF > 0. If the cache uses write
through with posted-write capability or uses the copy-back
algorithm, CPU writes can be considered cache hits,
meaning CF = 0. You obtain CF by determining or assuming the percentage of cache references that are writes and
then derating the miss rate by that factor.
As an example, consider a 64-Kbyte, direct-mapped
cache that can be accessed by the CPU in one cycle and
that has a 32-byte line size; the cache uses write through,
and 30 percent of cache references are writes. Assume a
15-cycle main-memory access time. From Table 2, M =
0.050. From Table 3, MRM = 1.000. CF = 0.300 (given).
Then
m = M x MRM + CF
  = (0.050) (1.000) + 0.300
  = 0.350
and
teff = tcache + m x tmain
     = 1 + (0.350) (15)
     = 6.25 cycles
meaning that this system achieves a 2.4x performance increase (15 cycles vs. 6.25 cycles) using the cache described. Note that the same system with a copy-back cache achieves a teff of 1.75 cycles,
resulting in an 8.57x performance improvement. Finally,
consider a two-way set-associative cache using copy back.
Now teff = 1.746 cycles, for a performance improvement
of 8.59x (about 0.2 percent better than a direct-mapped
cache).
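
The worked example can be reproduced with a few lines of Python. This is a minimal sketch of the model as stated in the text; the function names miss_rate and t_eff are hypothetical, and the inputs are the Table 2 and Table 3 values quoted in the example (M = 0.050, MRM = 1.000 or 0.995, CF = 0.300 or 0).

```python
def miss_rate(raw, mrm=1.0, cf=0.0):
    """m = M x MRM + CF."""
    return raw * mrm + cf


def t_eff(t_cache, m, t_main):
    """teff = tcache + m x tmain, in CPU cycles."""
    return t_cache + m * t_main


T_MAIN = 15  # main-memory access time in cycles, as assumed in the example

cases = {
    "direct mapped, write through": miss_rate(0.050, 1.000, 0.300),
    "direct mapped, copy back": miss_rate(0.050, 1.000),
    "two-way set assoc., copy back": miss_rate(0.050, 0.995),
}
for name, m in cases.items():
    t = t_eff(1, m, T_MAIN)
    print(f"{name}: teff = {t:.3f} cycles, speedup = {T_MAIN / t:.2f}x")
```

The speedup is taken relative to an uncached system in which every reference costs tmain cycles, which is how the 2.4x, 8.57x, and 8.59x figures in the text are obtained.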

Multilevel Cache
Recent advances in silicon technology have allowed a
new focus in cache design methodology. Increased gate
densities in integrated circuits have helped make it possible to include a small- to medium-sized cache on the
CPU chip itself. Examples include the Motorola
68030/040, the Intel 80486, and Intel's i860 RISC chip.
Additionally, because ICs available today support multiprocessing in a straightforward manner, multiprocessing
systems will become more common. Examples of these
ICs include the Intel 80486 and the Cypress CY7C600
RISC family.
Both of these developments tend to support a multilevel cache hierarchy. Four major factors support a move
to a multilevel cache hierarchy:
1. The way the on-chip cache is implemented can force
a cache partition. Specifically, a small on-chip cache
with unacceptable or marginally acceptable hit rates
might force you to add a second level of cache off
chip to achieve a design's performance objectives.
However, if the on-chip cache is designed improperly, a multilevel cache might be impossible or impractical. The on-chip cache must have the necessary
hooks to permit communication between the first
level and second level caches. If these hooks are not
present, you will be forced to accept lower performance in return for higher integration. This problem
occurs with the Intel i860, for example.

Table 3. Cache Miss Rate as a Function of Cache Size and Mapping Method

                                    Cache Size (Kbytes)
Mapping Method       2      4      8      16     32     64     128    256    512
Direct Mapped      1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
2-Way Set Assoc.   0.975  0.980  0.986  0.990  0.994  0.995  0.996  0.996  0.996
4-Way Set Assoc.   0.925  0.940  0.958  0.970  0.982  0.985  0.988  0.989  0.989


The balance of this application note focuses on multilevel cache hierarchies for uniprocessor and multiprocessor systems. In both cases, the hierarchy is limited to two
levels.

Multilevel Cache in Single-Processor Systems
Figure 12 illustrates the cache hierarchy discussed in
this section. In this hierarchy, the level 1 cache services
processor references and obtains data on a miss from the
level 2 cache. The level 2 cache services references from
the level 1 cache and obtains data on a cache miss from
main memory. The level 1 cache can be inside or outside
the processor chip.
This hierarchy does not change the design goal and
methods of achieving that goal, but it adds more variables
to the equations. The effective memory access time can be
expressed as:
teff = tL1 + mL1 (tL2 + mL2 x tmain)
where:
tL1   = Level 1 cache access time
tL2   = Level 2 cache access time (penalty beyond tL1)
mL1   = Level 1 cache miss rate
mL2   = Level 2 cache miss rate
tmain = Main memory access time (penalty beyond tL2)
Minimizing the overhead delay associated with maintaining cache consistency is much more complex for multilevel cache hierarchies than for a single-level cache
hierarchy. When beginning a multilevel cache design, you
must carefully consider all the previously discussed design
factors, but these design factors are now multidimensional
problems.
The miss rate approximation presented earlier can be
extended for two levels of cache. As an example, consider
a system with a single-cycle, two-way set-associative, 8-Kbyte, level 1 cache that has a 32-byte line size and uses
copy back; the system also has a single-cycle, direct-mapped, 128-Kbyte, level 2 cache that has a 128-byte line
size and uses write through with 30-percent writes. Main-memory access requires 20 cycles. To evaluate this
scheme's performance, compare the effective memory access time for both level 1 and level 2 cache with the access time for level 1 cache alone.
First, you can calculate mL1 from data in Tables 2
and 3:
mL1 = ML1 x MRML1 + CFL1
    = (0.060) (0.986) + 0
    = 0.059
Then

Figure 12. Multilevel Cache Hierarchy for Single-Processor Systems
2. Detailed study of teff reveals that a multilevel cache
hierarchy can offer higher performance than a single-level cache hierarchy, especially if the difference between processor speed and memory speed is large.
This speed difference might not result solely from increases in CPU speed, but can also result from the
use of larger (and therefore slower) main memory.
3. Creating multiple cache levels opens the possibility of
functionally tuning each cache level for highest performance. For example, you can optimize the first-level cache to minimize teff and the second-level
cache for high hit ratio, reduced cost, or reduced interconnect traffic.
4. Increased usage of multiprocessing might force a
multilevel cache hierarchy. Generally, each processor
needs its own cache - especially if it is a RISC engine - to increase performance and decrease bus
traffic. Bus bandwidth is an especially valuable
resource in multiprocessing systems. Adding a
second-level cache can reduce teff, especially if the
level 1 cache does not meet performance objectives.
In considering these factors, you must resolve the
cost vs. performance tradeoff of multilevel vs. single-level
cache. This tradeoff depends on the processor architecture,
the on-board cache (if any), the main-memory structure,
and the type of connection between the cache and main
memory.
Consequently, there are no set rules to justify inclusion of a multilevel cache hierarchy. However, recall
that cache memory was created to solve performance
problems stemming from extremely fast CPU speeds relative to main-memory access times. Reference 5 states that
these speeds must differ by a factor of 10 to justify use of
a cache and a factor of 40 to justify use of a multilevel
cache. The actual ratio that justifies inclusion of a multilevel cache hierarchy is a personal decision - generally
as much a marketing decision as an engineering decision
- and concrete statements regarding justification are not
valid.

teff (L1 only) = tL1 + mL1 x tmain
               = 1 + (0.059) (20)
               = 2.18 cycles

Next, you can calculate mL2:
mL2 = ML2 x MRML2 + CFL2
    = (0.039) (1.000) + 0.300
    = 0.339


ber of processing elements, you usually implement a bus-based protocol.
Adding a second level of cache tends to aggravate the
consistency problem by introducing another level at which
consistency must be maintained. Because multicache,
multiprocessor topologies have some combination of multiple level 1 caches interfacing to a single level 2 cache
and/or multiple level 2 caches interfacing to a common
global main memory, the effective memory access time
equation must contain a component to account for time
wasted while attempting to gain access to a "parent
memory."
Therefore, the equation for multiprocessing systems
with multilevel cache hierarchies has a contention delay
term for consistency management traffic. The position at
which this delay enters the equation depends on the topology used. Minimizing this and other delays caused by
consistency management is critical to cache design in
multiprocessing systems that have a multilevel cache
hierarchy.
Now consider how this extra level of coherence
management affects system performance. Figure 13
presents three different multiprocessing topologies. Most
authors agree that the level 2 caches should be supersets
of their children caches. In this manner, the coherence
management protocol can be moved as far from the
processing element as possible. This allows the level 2
caches to shield the level 1 caches from unnecessary blind
checks (snoops for data that is not in the cache) and invalidations that might propagate up from main memory.
Adhering to the Multilevel Inclusion (MLI) Principle
- that all the data in the level 1 cache is in the level 2
cache - can minimize snoops, which halt the CPU. The
MLI Principle is defined for set-associative caches in Reference 2: MLI can be achieved if the degree of set associativity of a parent (level 2) cache is greater than or
equal to the product of the number of its children (level 1)
caches, their degree of set associativity, and the ratio of
their block sizes. Expressed mathematically:

Then
teff (L1 & L2) = tL1 + mL1 (tL2 + mL2 x tmain)
               = 1 + 0.059 [1 + 0.339 (20)]
               = 1.46 cycles
Thus, the performance improvement over using only
the level 1 cache is 33 percent. Note also that the evaluation model produces a teff of 1.459 cycles for a two-way
set-associative, level 2 cache, which results in trading a
more complex, more expensive cache controller design for
essentially no performance improvement over a direct-mapped implementation. Additionally, if the level 2 cache
is direct mapped and uses copy back, teff is 1.11 cycles,
resulting in nearly a 50-percent improvement over using
only the level 1 cache.
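
The two-level calculation can be reproduced in Python as a check on the numbers above. The function name t_eff_two_level is hypothetical; the miss rates are the rounded values used in the text (mL1 = 0.059, mL2 = 0.339 for write through, and mL2 = 0.039 for copy back).

```python
def t_eff_two_level(t_l1, m_l1, t_l2, m_l2, t_main):
    """teff = tL1 + mL1 (tL2 + mL2 x tmain), in CPU cycles."""
    return t_l1 + m_l1 * (t_l2 + m_l2 * t_main)


T_MAIN = 20
M_L1 = 0.059

l1_only = 1 + M_L1 * T_MAIN                                  # level 1 cache alone
l2_write_through = t_eff_two_level(1, M_L1, 1, 0.339, T_MAIN)
l2_copy_back = t_eff_two_level(1, M_L1, 1, 0.039, T_MAIN)

for name, t in (("write through", l2_write_through), ("copy back", l2_copy_back)):
    gain = (l1_only - t) / l1_only
    print(f"level 2 {name}: teff = {t:.2f} cycles, {gain:.0%} lower than level 1 only")
```

The improvement is expressed as the reduction in teff relative to the level 1 cache alone, which matches the 33-percent and nearly 50-percent figures in the text.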

Multilevel Cache in Multiprocessing Systems
Multiprocessing systems are becoming increasingly
prevalent in the industry because they allow the growth
rate of computer system technology to be higher than the
growth rate of processor technology. Programmers want
these systems to have a global main memory. At the same
time, the single most performance-limiting factor in multiprocessing systems is maintaining consistency between
the global main memory and multiple processors, each
having its own cache. Adding a second level of cache can
aggravate this consistency problem, and in fact might
cause a degradation in performance. However, a multilevel cache hierarchy can increase performance if implemented properly.
Multicache Consistency in Multiprocessing Systems

In multiprocessing systems, it is generally preferable
for each processor to have a private cache, which minimizes bus traffic, and a common global main memory,
which supports ease of programming. Because a multiprocessing environment generally includes multiple
caches, which provide local windows on a large main
memory, two or more caches can contain the same data. If
this situation occurs, a change in the data in one cache
renders the data in the other caches incoherent. Therefore,
you need a set of rules - a multicache consistency
protocol - to maintain consistency.
As described earlier, maintaining coherence in a
uniprocessor system with a single level of cache is fairly
simple because coherence only needs to be maintained between one cache and main memory. You can achieve this
goal by implementing the copy-back or write-through
protocols. The consistency problem is more complex in
multiprocessing systems, where each processor has a
private cache. This is because consistency must be maintained among a cache, its "sibling" caches, and main
memory.
The consistency problem in this case - while more
complex - is well defined and has well-known solutions.
Typically, for a multiprocessing system with a large number of processing elements, you employ a software consistency protocol. For systems with a small to medium num-

Set Associativity(L2) >= SUM over all L1s of [ Set Associativity(L1) x Line Size(L2) / Line Size(L1) ]
Note that MLI is not a requirement in multicache
designs, and the scheme proposed in Reference 2 is only
one of several ways to achieve MLI. As shown later, MLI
as stated in Reference 2 is very restrictive and results in
an extremely complex and expensive level 2 cache design.
To enforce MLI according to the Reference 2 scheme, for
instance, the system described in the section Multilevel
Cache in Single-Processor Systems would need an eight-way
set-associative, level 2 cache. This might be an unrealistic
goal, because a 128-Kbyte cache of this complexity is expensive to implement.
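
The Reference 2 condition, as paraphrased in this note, reduces to a one-line calculation. The Python sketch below is illustrative only; the function name mli_min_associativity and its argument layout are hypothetical, and line sizes are assumed to be powers of 2 with the level 2 line at least as large as each level 1 line.

```python
def mli_min_associativity(l1_caches, l2_line_size):
    """Minimum level 2 set associativity that satisfies the MLI condition.

    l1_caches : list of (set_associativity, line_size_bytes), one per child level 1 cache
    """
    return sum(assoc * (l2_line_size // line) for assoc, line in l1_caches)


# One two-way level 1 cache with 32-byte lines under a level 2 cache with 128-byte lines
print(mli_min_associativity([(2, 32)], 128))

# Four direct-mapped level 1 caches with 16-byte lines sharing a level 2 cache with 64-byte lines
print(mli_min_associativity([(1, 16)] * 4, 64))
```

The first case is the single-processor system discussed above, for which the result is 8. The second shows how quickly the requirement grows when several level 1 caches share one level 2 cache.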
For Topology A in Figure 13, however, you can implement MLI under the Reference 2 scheme if, for example, you do it this way: The level 1 cache is a direct-mapped, 16-Kbyte cache with 16-byte line size, and the
level 2 cache is a four-way set-associative, 256-Kbyte,
sector-based cache with 64-byte line size. Additionally,
using Topology A, you can implement a simple cache-coherence protocol such as copy back or write through at
the level 1 cache, which is generally small.
Cost effectiveness can dictate a fairly large sector-based level 2 cache. Consistency among the level 2 caches
is maintained on the basis of the level 1 line size. A
private level 2 cache services all level 1 cache misses.
The level 1 cache is disturbed only when the need arises
to replace a sub-block in a level 2 cache whose INCLUSION bit is set.
You can improve performance dramatically if the
system meets two conditions: the bus can support direct
data intervention (more on this later), and the level 2
cache controller has a bus-snooping mechanism that allows it to monitor bus activity and perform invalidations
based on observed bus traffic. The effective memory access time for this topology is:
teff = tL1 + mL1 [tL2 + mL2 (tmain + tbus,L2-main)]

where tbus,L2-main is the time required for a given level 2
cache to acquire the bus. The advantage of this topology
is that it is simple and fairly straightforward to implement.
The main disadvantage is that the level 2 cache is not
shared by several level 1 caches.
Topology B, which depicts a multiport level 2 cache
connected to multiple other level 2 caches via a bus, is
probably the least desirable of the three topologies shown for
several reasons. For example, this topology contains two
points at which contention might occur, resulting in an effective memory access time equation of:

teff = tL1 + mL1 [(tL2 + tcontention,L1-L2) + mL2 (tmain + tbus,L2-main)]

where tcontention,L1-L2 denotes the arbitration/contention
penalty for a level 2 cache to service a level 1 cache.
Thus, this design is slower than Topology A.
Additionally, the logic required for arbitration at level
2 among the several level 1 caches is expensive. Finally,
MLI is very difficult to obtain for this type of system.
Consider a system with four 16-Kbyte, direct-mapped,
level 1 caches that have a 16-byte line size connected to a
256-Kbyte, level 2 cache that has a 64-byte line size. The
scheme proposed by Reference 2 dictates that the level 2
cache be 16-way set associative.
Topology C, a bus-based hierarchy, is probably the
most attractive topology for systems with a small to
medium number of processing elements. Using this topology, MLI is guaranteed through the use of broadcast invalidations - notifications to all caches to invalidate
shared lines that were written by another CPU into a
private cache. The effective memory access time for this
topology is given by:

teff = tL1 + mL1 [(tL2 + tbus,L1-L2) + mL2 (tmain + tbus,L2-main)]

where tbus,L1-L2 is the time required for a given level 1
cache to acquire the bus that connects the level 1 caches
and the level 2 cache. Using well-designed buses, such as
Futurebus or Mbus, reduces bus traffic in this topology to
a minimum.
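
The effect of the extra delay terms can be explored with a short Python sketch. Everything numeric here is an assumption chosen for illustration: the bus and contention delays of 1 and 2 cycles are hypothetical, the miss rates are borrowed from the single-processor example earlier in this note, and the function name t_eff_topology is not from the source.

```python
def t_eff_topology(t_l1, m_l1, t_l2, m_l2, t_main,
                   l1_l2_delay=0.0, l2_main_delay=0.0):
    """teff with optional penalties, in CPU cycles.

    l1_l2_delay   : contention or bus-acquisition time between level 1 and level 2
    l2_main_delay : bus-acquisition time between level 2 and main memory
    """
    return t_l1 + m_l1 * ((t_l2 + l1_l2_delay) + m_l2 * (t_main + l2_main_delay))


base = dict(t_l1=1, m_l1=0.059, t_l2=1, m_l2=0.339, t_main=20)

no_delay = t_eff_topology(**base)                                     # single-processor case
topology_a = t_eff_topology(**base, l2_main_delay=2)                  # private level 2 caches
topology_b = t_eff_topology(**base, l1_l2_delay=1, l2_main_delay=2)   # shared, multiport level 2
topology_c = t_eff_topology(**base, l1_l2_delay=1, l2_main_delay=2)   # bus between levels

print(f"{no_delay:.3f} {topology_a:.3f} {topology_b:.3f} {topology_c:.3f}")
```

Topologies B and C share the same form of equation. They differ in what causes the level 1 to level 2 delay (port arbitration vs. bus acquisition), so equal values are used here only to keep the example small.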

[Figure 13 diagram not reproduced: Topology A, Topology B, and Topology C.]

Figure 13. Multiprocessing Topologies with Multilevel Cache Hierarchies


in Figure 15. This change supports cache sizes from 64 to
256 Kbytes in 64-Kbyte increments. You can also connect
several MP clusters via the Mbus to form a multiprocessing system, as shown in Figure 16.
The CY7C601 IU is fully compliant with the SPARC
reference Instruction Set Architecture. The CY7C601 furnishes full support for eight register windows, a full IEEE
floating point coprocessor interface, and a second generic
(user-defined) coprocessor interface. The device is available at 25, 33, and 40 MHz (scalable to 50 MHz) and is
implemented in a 0.8-micron, dual-layer-metal CMOS
process.
The CY7C602 FPU is a single-chip, SPARC, floating-point processor. It provides full IEEE double-precision
support, a dedicated register file, and 64-bit data paths.
The CY7C602 is available at up to 40 MHz (scalable to
50 MHz).
The CY7C157 Cache RAM is custom designed for
CY7C604 and CY7C605 cache systems. It is still a fairly
generic cache RAM, however. The CY7C157 is a fully
synchronous (self timed) device - much better suited to
cache design than "industry standard" asynchronous
RAMs. The CY7C157 scales in speed, matching the clock
rate of the IU and CMU, and is implemented in 0.8-micron, dual-layer-metal CMOS technology.

Topology C's disadvantages are that it introduces
greater hardware complexity, and that a manageable implementation requires the use of VLSI. (Such VLSI solutions are available in the Cypress CY7C600 family, however.) Additionally, even with a good bus protocol, the
amount of bus traffic limits the number of resources that
can share the bus. Despite these disadvantages, a bus-based multilevel cache hierarchy appears to be the most
promising in terms of cost and performance.

Multilevel Cache in SPARC Multiprocessing
The Cypress CY7C600 RISC microprocessor family
contains full support for multiprocessing, including an excellent bus-based, multicache consistency mechanism.
This section covers the CY7C600 family members that
make up a multiprocessing (MP) cluster: the CY7C601
Integer Unit (IU), the CY7C602 Floating Point Unit
(FPU), the CY7C605 Cache Tag/Cache Controller/Memory Management Unit for Multiprocessing (CMU-MP), and the CY7C157 16K x 16 Cache RAM. This part
of the application note highlights the features of the
CY7C605 CMU-MP that support multicache consistency.
A section also covers Mbus. Finally, a SPARC multiprocessing system is extended to a multilevel cache
hierarchy, which is demonstrated in two topologies. These
topologies are then examined, with a focus on implementation and performance advantages and disadvantages.

The CY7C605 CMU-MP
The CY7C605 CMU-MP includes all the features of
the CY7C604 uniprocessing CMU along with provisions
for multiprocessing. Fully compliant with the SPARC
Reference MMU Architecture Standard, the CY7C605 has
a 32-bit (4 Gbyte) virtual address space and a 36-bit (64
Gbyte) physical address space. In addition to an on-board,
64-entry, fully associative translation lookaside buffer
(TLB), the CY7C605 includes support for 4K
contexts, a 4-Kbyte page size, memory-address protection
checking, hardware table walking, and sparse address
spaces with a three-level page-table map.
For cache control, the CMU contains 2K direct-mapped, virtual cache tag entries and support for a 32-byte line size; these features allow the device to manage a
64-Kbyte direct-mapped cache. The CY7C605 also supports either write through with no write allocate or copy
back with write allocate. Copy back with write allocate
does not degrade performance because the CMU has a full
32-byte cache read buffer.
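The cache and address-space figures quoted for the CMU can be checked with a few lines of arithmetic. The sketch below is illustrative only: the constants come from the text, but the function name and the tag/index/offset split are assumptions based on the usual layout of a direct-mapped cache, not a description of the device's internal address decoding.

```python
# Figures quoted in the text for the CMU.
TAG_ENTRIES = 2 * 1024          # 2K direct-mapped virtual cache tag entries
LINE_BYTES = 32                 # 32-byte cache line
VIRTUAL_ADDRESS_BITS = 32
PHYSICAL_ADDRESS_BITS = 36

# One cache line per tag entry gives the total cache size.
cache_bytes = TAG_ENTRIES * LINE_BYTES

# Both counts are powers of two, so bit_length() - 1 is log2.
offset_bits = LINE_BYTES.bit_length() - 1    # byte-within-line bits
index_bits = TAG_ENTRIES.bit_length() - 1    # tag-entry select bits


def split_virtual_address(va):
    """Split a 32-bit virtual address into (tag, index, offset).

    Hypothetical helper: assumes the conventional direct-mapped layout.
    """
    offset = va & (LINE_BYTES - 1)
    index = (va >> offset_bits) & (TAG_ENTRIES - 1)
    tag = va >> (offset_bits + index_bits)
    return tag, index, offset


print(cache_bytes // 1024, "Kbyte cache")
print(2 ** VIRTUAL_ADDRESS_BITS // 2 ** 30, "Gbyte virtual space")
print(2 ** PHYSICAL_ADDRESS_BITS // 2 ** 30, "Gbyte physical space")
```

Under these assumptions a virtual address uses 5 offset bits and 11 index bits, which leaves 16 bits for the virtual tag.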
The CMU can also perform posted writes via two on-chip, 32-bit write buffers, which support fully buffered
Store Doubles. This capability improves the cache's performance when a write miss is encountered by allowing
the main-memory update to occur in background. The
CY7C605's cache lock mechanism allows entries to be
locked in the cache, enabling deterministic responses for
real-time applications. The device also provides for five
levels of cache flushing. Its 64-bit multiplexed address/data bus provides the interface to Mbus.
The CY7C605 provides full alias detection and correction through use of both a virtual and physical cache
tag array. The physical tags, which are not included in the

The SPARC Multiprocessing Cluster
As presented in Figure 14, the basic SPARC multiprocessing cluster consists of a CY7C601 IU, a CY7C602
FPU, a CY7C605 CMU-MP, and two CY7C157 Cache
RAMs. You can increase the cache size by adding up to
three more CY7C605s and six more CY7C157s, as shown

Figure 14. The SPARC Multiprocessing Cluster (blocks: CY7C601 Integer Unit (IU); CY7C602 Floating Point Unit (FPU); CY7C605 Cache Controller, Tag, and Memory Management Unit (CMU-MP); CY7C157 16K x 16 Cache Data RAM (CRAM); Mbus-MP; main memory)

Figure 15. Fully Extended SPARC Cache

CY7C604 CMU-UP, serve two purposes. First, this
second bank of cache tags acts as a reverse translation
unit, allowing on-chip detection and correction of aliasing.
Second, the physical tag array permits bus snooping to
occur completely independently of the processor, which
interfaces to the virtual cache through the virtual cache
tag array.
Bus snooping is an activity in which the CMU
monitors all activity on the Mbus and responds to invalidation broadcasts or requests for data from other
caches in the system. The key advantage of physical tag
entries is that they enable the bus snooping logic to be
decoupled from processor traffic, resulting in a substantial
performance increase.
The CMU-MP contains full support for the MOESI
(Modified, Owned, Exclusive, Shared, Invalid) cache consistency model. The MOESI model enables multiple
caches to coexist on a single bus and share a global main
memory, while guaranteeing multicache consistency.
Using this methodology, each entry in a cache can be in
one of five states: PRIVATE CLEAN, PRIVATE DIRTY,
SHARED CLEAN, SHARED DIRTY, or INVALID. If an
entry is located in only one cache in the system, it is
either PRIVATE CLEAN or PRIVATE DIRTY.
If more than one cache shares unmodified data, the
entries are all in the SHARED CLEAN state. Once a cache
modifies shared data, it marks the data SHARED DIRTY,
broadcasts an invalidation message informing other caches
with that data to mark their entries INVALID, and immediately becomes responsible for responding to any further
requests for that data. Note that any time a cache holds an entry in
one of the DIRTY states, it becomes the "owner" of the
data and is responsible for servicing any requests for that
data.
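As a small illustration, the five entry states and the ownership rule can be written down as an enumeration. The pairing of each state name with a MOESI letter follows the conventional correspondence and is an assumption here, since the text lists both sets of names without mapping them; the class and function names are hypothetical.

```python
from enum import Enum


class LineState(Enum):
    """Cache entry states named in the text, paired with MOESI names.

    The pairing is the conventional one and is assumed, not quoted.
    """
    PRIVATE_CLEAN = "Exclusive"
    PRIVATE_DIRTY = "Modified"
    SHARED_CLEAN = "Shared"
    SHARED_DIRTY = "Owned"
    INVALID = "Invalid"


def is_owner(state):
    """A cache holding an entry in either DIRTY state owns the data
    and must service requests for it."""
    return state in (LineState.PRIVATE_DIRTY, LineState.SHARED_DIRTY)


for state in LineState:
    print(f"{state.name:14s} {state.value:10s} owner={is_owner(state)}")
```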
Finally, the CMU-MP supports reflective main
memory and direct data intervention. The latter provides a
significant performance increase over indirect data intervention. To illustrate the difference, consider an MP system with a common main memory and, for simplicity, two
caches. Cache A retrieves a line of information from main

Figure 16. A SPARC Multiprocessing System

Figure 17. SPARC Single-Level Cache Extension to Multilevel Cache Topology (Topology 1)

A Cache Hierarchy for SPARC MP Systems

memory and modifies it, thus becoming the owner of the
data. At some point, Cache B requests the same piece of
information.
In a system using indirect data intervention, Cache A
informs Cache B that the miss occurred and that Cache B
should attempt to gain access to the bus later. Cache A
then seizes the bus and updates main memory.
Meanwhile, Cache B tries to gain access to the bus, while
its processor is on hold, awaiting the new data. When
finished updating main memory, Cache A releases the
bus. Cache B gains access, and begins to retrieve the data
from main memory. Eventually, after a considerable number of cycles, processor B is released from hold and permitted to continue.
In a system using direct data intervention, Cache A
supplies the data requested by Cache B directly, resulting
in considerably fewer hold cycles for processor B. Additionally, with a reflective main memory system, main
memory observes the information transfer and updates itself at the same time. With a non-reflective approach,
main memory would contain stale data relative to the
caches.

This section presents two possible multilevel cache
implementations for SPARC multiprocessing systems. For
highest performance, both topologies require a level 2
cache controller that is as complex as the cache controller
in the CY7C605. Specifically, the level 2 cache must support fully concurrent bus snooping and direct data intervention. In addition, it is generally preferable that the
level 2 cache have a larger line size than the level 1
cache. The level 2 cache controller thus needs to be sector
based, which increases the level 2 cache controller's complexity.
Figure 17 shows a single-level cache extension topology, which forces the level 2 cache to manage cache consistency. Consistency management is thus moved as far
away from the processor as possible. This approach improves performance because it tends to cause fewer hold
cycles for the processor. This topology also permits
smaller level 2 caches, a definite advantage if the speed
of the level 2 cache is critical, because small caches are
easier to optimize for speed.
The main disadvantage of this topology is that the
level 2 cache is not shared by several level 1 caches. This
results in higher total system cost because each level 2
cache requires its own controller.
Topology 2 (Figure 18) is a multilayer bus-based
hierarchy. This topology permits a common level 2 cache,
whose single controller keeps costs lower. However, this
topology probably requires a level 2 cache size of 2
Mbyte or more to achieve high system-level performance.
This large cache size generally results in a slower (perhaps multicycle) level 2 cache. If cost of the level 2 cache
is critical, however, this topology is probably the best
choice.

Mbus
Mbus is a fully synchronous, 64-bit, multiplexed address/data bus that supports multiple bus masters and has
a peak transfer rate of 320 Mbyte/s at 40 MHz. All signals
are sampled on rising clock edges and driven active and
inactive. Mbus supports single-address/multiple-data-cycle
bursts of 16, 32, 64, and 128 bytes, with full retry support.
Finally, central arbitration is separate from the master and
slave; the type of arbitration scheme used is completely
up to you. The cache consistency model for the Mbus is
based on the MOESI model.

Figure 18. SPARC Bus-Based Multilevel Cache Topology (Topology 2)
6. Pohm, A.V. and Agrawal, O.P., High Speed
Memory Systems, Reston Publishing, 1983
7. Short, RT. and Levy, H.M., "A Simulation Study
of Two-Level Caches," Proceedings of the 15th Annual
Symposium on Computer Architectures, February 1988,
pp. 81- 88
8. Smith, AJ., "Cache Memories," Computing Surveys, Vol 14, No 3, September 1982, pp. 473 - 530
9. Smith, AJ., "Cache Memory Design: An Evolving
Art," IEEE Spectrum, December, 1987, pp. 40 - 44
10. Smith, A.J., "Line (Block) Size Choice for CPU
Memories," IEEE Transactions on Computers, Vol C-36,
No 9, September 1987, pp. 1063 - 1075
11. Smith, J.E. and Goodman, J.R., "A Study of
Cache Organizations and Replacement Policies," ACM
Computing Surveys, 1983, pp. 132 - 137
12. Smith, CPU Cache Memories, University of
California, Berkeley, 1984

References
1. Agrawal, Hennesy, Horowitz, "Cache Performance
of Operating Systems and Multiprogramming Workloads,"
ACM Transactions on Computer Systems, 11/88, Vol. 6,
No.4
2. Baer, J.L. and Wang, W.H., "On the Inclusion
Properties for Multilevel Cache Hierarchies," Proceedings
of the 15th Annual Symposium on Computer Architectures, February 1988, pp. 81. - 88
3. Gregory, Richard, "Caching Designs Eliminate
Wait States to Relieve Bottlenecks," Computer Design,
October 15, 1988, pp. 65 - 73
4. Hill, Mark D., "A Case for Direct-Mapped
Caches," IEEE Computer, December 1988, pp. 25 - 40
5. Kabakibo, Aiman, et al, "A Survey of Cache
Memory in Modem Microcomputer & Minicomputer ~ys­
terns," IEEE Micro, March 1987, pp.210 - 227

8-64

Synchronous Trap Identification
for CY7C600 Systems
tion or a data access exception. Instruction access exceptions are delayed until the fetched instruction reaches the
execute stage in the CY7C601. Because data accesses are
generated as a result of an instruction that has reached the
execute stage, the exceptions associated with a data access
are recognized immediately. This difference in the timing
of exception recognition causes many of the double fault
cases described in section 4.9.1 of the SPARC RISC
User's Guide.
Upon detecting an instruction or data exception, the
CY7C601 enters the corresponding trap. The two trap
handlers share the task of identifying the synchronous
fault case. The following sections describe the fault cases
that each handler can identify using the contents of the
SFSR and SFAR. Figures 1 and 2 illustrate the decision
tree seen by the data exception handler and the instruction
exception handler, respectively.

This application note discusses the decoding of the
status bits in the synchronous fault status register (SFSR)
of CY7C601 SPARC processor systems. When a memory
access fault occurs, these bits indicate the type of fault.
Due to the pipelined nature of the SPARC processor,
multiple traps can occur before it leaves normal execution
mode and vectors to a trap handler. If a multiple-trap
situation occurs, the information in the SFSR and the
synchronous fault address register (SFAR) might not
reflect the status for the trap to which the CY7C601 first
responds. Although the corrective course of action for the
fault case depends on your system's characteristics, this
application note explains how to interpret the fault so that
it can be corrected.
Section 4.9 in the SPARC RISC User's Guide
describes the operation of the SFSR and SFAR upon encountering a synchronous fault. Reviewing section 4.9
will help you understand the information given in this application note. A brief summary of the SFSR characteristics appears in the last section of this application
note.

Data Exception Fault Groups
Group D1
This group consists of case 14, as described in the
SPARC RISC User's Guide. The CY7C601 traps for the
data memory access fault. The information in the SFAR
and SFSR reflects the instruction translation fault and is
not useful for servicing the initial data access fault. The
address of the data access instruction is not lost, however.
The address is given by the PC stored in r[17], or local
register 1 (l1), of the trap handler window.

Trap Handler Objectives
The objective for the trap handler is to resolve a
memory access error, if possible. In the case of a double
fault occurrence, the first of the two faults is generally,
but not always, the desired fault to be corrected. In one
group of cases, correcting the second fault is preferable,
because the CY7C601 re-executes the instruction that
caused the first fault upon leaving the trap handler.
Errors in address translation are generally nonrecoverable, as they imply a mapping problem in the
MMU virtual page-mapping tables. For these cases, the
identification and recording of the error condition is the
only purpose that the trap handler can serve.
Memory access errors are signaled when the
CY7C604 or the CY7C605 cache and memory management units assert the MEXC signal. This event forces the
CY7C601 to vector to either an instruction access excep-

Group D2
The members of this group are cases 12 and 13. Handling this group is straightforward in that the information
in the SFSR and SFAR reflects the first occurring fault.
However, translation faults in general are a nonrecoverable type of error, as they imply a mapping problem within the page tables. Handling this type of fault
consists of dropping the task altogether and recording the
fault information for system debug.


Figure 1. Data Access Fault Identification (decision tree leading to Group D1: fault case 14; Group D2: fault case 12 or 13; Group D3: fault case 8, 15, 16, or 17; Group D4: fault case 3, 4, or 5)

Figure 2. Instruction Access Fault Identification (decision tree leading to Group I1: fault case 9; Group I2: fault case 2; Group I3: fault case 7, 10, or 11; Group I4: fault case 1)

Bits     Field   Meaning
31 - 14  RSV     Reserved
13       CBT*    Copy-back Translation Error
12       UC      Uncorrectable Error
11       TO      Time Out Error
10       BE      Bus Error
9 - 8    L       Level
7 - 5    AT      Access Type
4 - 2    FT      Fault Type
1        FAV     Fault Address Valid
0        OW      Over Write

(*CY7C604 only; reserved in CY7C605)

Figure 3. CY7C604/605 Synchronous Fault Status Register

Group D3
This group consists of cases 8, 15, 16, and 17. In
cases 8, 16, and 17, the data access translation fault that
the CY7C601 traps on is either the only or the first of
the two occurring faults. The information stored in the
SFSR and SFAR is valid and can be stored in an error
dump for debug purposes.
Case 15 is an instruction fault followed by a data access translation fault. The information for the instruction
access fault is non-recoverable, as the CY7C601 traps on
the data translation fault, which overwrites the instruction
access fault information.

Group I2
This group consists of case 2, which is a triple instruction fault. Note that the SPARC RISC User's Guide
describes this case as a double instruction fault, but the
User's Guide errata makes the correction to a triple fault.
The CY7C601 traps on the first occurring instruction
trap, but the information in the SFSR and SFAR has been
overwritten by the following instruction faults. The address of the first instruction fault can be recovered in the
same manner as Group I1.
Group I3
This group of faults includes cases 7, 10, and 11. All
of these cases are translation faults on an instruction access. Case 7 is a singular occurrence of an instruction access translation fault. Cases 10 through 13 are occurrences
of an instruction access translation fault followed by some
other type of fault. The information in the SFSR and
SFAR reflects the status stored from the first occurring
fault. These faults involve translation errors, which are
generally handled by dropping the task altogether and
recording the fault information for system debug.

Group D4
This group consists of cases 3, 4, and 5. The
CY7C601 traps on the data fault in all of these cases, and
the information in the SFSR and SFAR reflects the information for the data fault. Case 3 is a single data fault, and
is straightforward in its recovery. Case 4 is an instruction
fault that is overwritten by a following data fault. The information for the initial instruction fault is lost, but can be
recovered by correcting the data fault first. The CY7C601
reissues the address for the instruction that caused the initial instruction fault, allowing the fault to be handled.
Case 5 is the occurrence of a data fault followed by an
instruction fault. It should also be handled by correcting
the data fault and allowing the CY7C601 to reissue the
address for the fault-causing instruction.

Group I4
This group consists of case 1, which is a single instruction fault. Information in the SFSR and SFAR is
valid.

SFSR Description

Instruction Exception Fault Groups

The SFSR is described in sections 4.4.11 and 4.9 of
the SPARC RISC User's Guide, but is briefly repeated
here for reference.
Figure 3 gives the bit assignments for the SFSR. The
SFSR's UC, TO, and BE bits are set according to the type
of error signaled to the CY7C604 by an Mbus agent (such
as memory or the Mbus arbiter) in response to an Mbus
transaction. Table 1 gives the encoding for the Mbus
transaction response signals.
Mbus transactions that signal a bus error, time out, or
uncorrectable error set the corresponding bit in the SFSR
of the CY7C604/605, which then responds by asserting
CMER to the interrupt logic. These bits describe Mbus
error cases and do not apply to the synchronous fault

Group I1
The single member of this group is case 9, an instruction access translation fault preceded by an instruction access fault. The CY7C601 traps for the first instruction access fault. The information in the SFAR and SFSR
reflects the instruction translation fault and is not useful
for servicing the initial instruction access fault. The address of the instruction is not lost, however. The instruction address is given by the PC stored in r[17], or local
register 1 (l1), of the trap handler window. Recovery from
this fault case involves using the old PC to re-execute the
fault-causing instruction and attempting correction of the
error condition.
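The data and instruction fault groups described in this note can be collected into a lookup table that a trap handler's diagnostic code might consult. This is only a sketch: the dictionary layout, the field names, and the function name are hypothetical, and the address_from_saved_pc flag simply marks the groups for which the text says the faulting address is recovered from the PC saved in r[17].

```python
# Group membership as listed in this note. The layout is illustrative.
FAULT_GROUPS = {
    "D1": {"handler": "data", "cases": (14,), "address_from_saved_pc": True},
    "D2": {"handler": "data", "cases": (12, 13), "address_from_saved_pc": False},
    "D3": {"handler": "data", "cases": (8, 15, 16, 17), "address_from_saved_pc": False},
    "D4": {"handler": "data", "cases": (3, 4, 5), "address_from_saved_pc": False},
    "I1": {"handler": "instruction", "cases": (9,), "address_from_saved_pc": True},
    "I2": {"handler": "instruction", "cases": (2,), "address_from_saved_pc": True},
    "I3": {"handler": "instruction", "cases": (7, 10, 11), "address_from_saved_pc": False},
    "I4": {"handler": "instruction", "cases": (1,), "address_from_saved_pc": False},
}


def group_for_case(case):
    """Return the fault group name for a User's Guide fault case number."""
    for name, info in FAULT_GROUPS.items():
        if case in info["cases"]:
            return name
    raise KeyError(f"fault case {case} is not assigned to a group in this note")


for case in (14, 9, 2, 3):
    name = group_for_case(case)
    info = FAULT_GROUPS[name]
    print(case, name, info["handler"], info["address_from_saved_pc"])
```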


Table 1. Mbus Transaction Response Signal

MERR  MRDY  MRTY  Action
H     H     H     Nothing
H     H     L     Relinquish and Retry*
H     L     H     Data Strobe
H     L     L     Reserved
L     H     H     Bus Error
L     H     L     Time Out
L     L     H     Uncorrectable Error
L     L     L     Retry

Table 2. SFSR Fault Level

L   Level
0   Entry in Context Field
1   Entry in Level 1 Table
2   Entry in Level 2 Table
3   Entry in Level 3 Table

Table 3. SFSR Access Type

AT  Access Type
0   Load from User Data Space (ASI = 0xA)
1   Load from Supervisor Data Space (ASI = 0xB)
2   Load/Execute from User Instruction Space (ASI = 0x8)
3   Load/Execute from Supervisor Instruction Space (ASI = 0x9)
4   Store to User Data Space (ASI = 0xA)
5   Store to Supervisor Data Space (ASI = 0xB)
6   Store to User Instruction Space (ASI = 0x8)
7   Store to Supervisor Instruction Space (ASI = 0x9)

Table 4. SFSR Fault Type

FT  Fault Type
0   None
1   Invalid Address Error
2   Protection Error
3   Privilege Violation Error (user mode only)
4   Translation Error
5   Bus Access Error
6   Not Generated
7   Reserved

cases described in section 4.9 of the SPARC RISC User's
Guide.
The SFSR's level (L) bits describe the level in which
an incorrect page entry was found for translation faults.
These bits are described in Table 2. Note that they are
irrelevant for non-translation fault errors.
The access type (AT) bits are described in Table 3.
They give the type of access that caused the currently
reported memory access fault.
The fault type (FT) bits describe the type of error
found by the CY7C604/605. Table 4 lists the fault types.
When a table walk correctly finds a page
table entry (PTE) but the access still causes a fault condition, the
access type (AT) is compared against the access protection field of the PTE (ACC bits), and the fault type is set
according to Table 5.
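A trap handler typically unpacks the SFSR into its fields before deciding what to do. The following sketch does that using the bit positions shown in Figure 3 and the encodings of Tables 2 through 4; the function and table names are hypothetical, and the code is meant as an illustration of the layout rather than production handler code.

```python
FAULT_LEVELS = (
    "Entry in Context Field",
    "Entry in Level 1 Table",
    "Entry in Level 2 Table",
    "Entry in Level 3 Table",
)

ACCESS_TYPES = (
    "Load from User Data Space",
    "Load from Supervisor Data Space",
    "Load/Execute from User Instruction Space",
    "Load/Execute from Supervisor Instruction Space",
    "Store to User Data Space",
    "Store to Supervisor Data Space",
    "Store to User Instruction Space",
    "Store to Supervisor Instruction Space",
)

FAULT_TYPES = (
    "None",
    "Invalid Address Error",
    "Protection Error",
    "Privilege Violation Error",
    "Translation Error",
    "Bus Access Error",
    "Not Generated",
    "Reserved",
)


def decode_sfsr(sfsr):
    """Unpack a 32-bit SFSR value into its fields (bit positions per Figure 3)."""
    return {
        "CBT": (sfsr >> 13) & 0x1,   # CY7C604 only; reserved in CY7C605
        "UC": (sfsr >> 12) & 0x1,
        "TO": (sfsr >> 11) & 0x1,
        "BE": (sfsr >> 10) & 0x1,
        "L": (sfsr >> 8) & 0x3,
        "AT": (sfsr >> 5) & 0x7,
        "FT": (sfsr >> 2) & 0x7,
        "FAV": (sfsr >> 1) & 0x1,
        "OW": sfsr & 0x1,
    }


# Example value built by hand: L = 2, AT = 5, FT = 2, FAV = 1.
example = (2 << 8) | (5 << 5) | (2 << 2) | (1 << 1)
fields = decode_sfsr(example)
print(hex(example), fields)
print(FAULT_TYPES[fields["FT"]], "/", ACCESS_TYPES[fields["AT"]],
      "/", FAULT_LEVELS[fields["L"]])
```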

Table 5. Fault Type (FT) for PTE[ET] = 2 (valid PTE)

          ACC
AT    0   1   2   3   4   5   6   7
0     0   0   0   0   2   0   3   3
1     0   0   0   0   2   0   0   0
2     2   2   0   0   0   2   3   3
3     2   2   0   0   0   2   0   0
4     2   0   2   0   2   2   3   3
5     2   0   2   0   2   0   2   0
6     2   2   2   0   2   2   3   3
7     2   2   2   0   2   2   2   0
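The protection check summarized in Table 5 amounts to a two-dimensional lookup indexed by access type and the PTE's ACC field. A minimal sketch, with the table values transcribed from Table 5 and with hypothetical names for the table and function:

```python
# Rows are indexed by AT (0-7), columns by ACC (0-7); values are FT codes.
FT_BY_AT_ACC = (
    (0, 0, 0, 0, 2, 0, 3, 3),   # AT 0: load, user data
    (0, 0, 0, 0, 2, 0, 0, 0),   # AT 1: load, supervisor data
    (2, 2, 0, 0, 0, 2, 3, 3),   # AT 2: load/execute, user instruction
    (2, 2, 0, 0, 0, 2, 0, 0),   # AT 3: load/execute, supervisor instruction
    (2, 0, 2, 0, 2, 2, 3, 3),   # AT 4: store, user data
    (2, 0, 2, 0, 2, 0, 2, 0),   # AT 5: store, supervisor data
    (2, 2, 2, 0, 2, 2, 3, 3),   # AT 6: store, user instruction
    (2, 2, 2, 0, 2, 2, 2, 0),   # AT 7: store, supervisor instruction
)


def fault_type(at, acc):
    """Return the FT code for an access of type `at` to a valid PTE
    whose access protection field is `acc`."""
    return FT_BY_AT_ACC[at][acc]


print(fault_type(4, 0))   # user store, ACC = 0
print(fault_type(0, 6))   # user load, ACC = 6
print(fault_type(5, 7))   # supervisor store, ACC = 7
```

An FT of 0 means the access is permitted, 2 is a protection error, and 3 is a privilege violation, as listed in Table 4.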

An Introduction to Mbus
This application note provides an introduction to
Mbus, a part of the SPARC architectural standard, which
addresses the requirements for interfacing to a processor
system's physical memory space.
In a system supporting virtual memory with cache,
the physical memory and I/O interface are key components of the system architecture. Maintaining bandwidth
and response time is critical to achieving adequate performance levels for the system.

systems that incorporate shared memory with caching.
This application note primarily focuses on level 1 compliant system design.
A complete set of Mbus communication protocols
provides for access to physical memory and I/O channels.

Basic Structure of the Bus
Mbus is a 64-bit bus that can transfer up to 128 bytes
in a single data burst, with support to transfer 8 bytes on
each clock cycle. Elements on the bus operate in a
master/slave relationship, where a master element initiates
a transaction and a slave element responds by either accepting or providing data. A "ready" status line from the
slave element controls data transfers. Data is not transferred until the ready line is asserted. This allows a slave
element to operate even if it is not fast enough to handle
data on each clock cycle.
Bus arbitration is supported on Mbus using a conventional request/grant mechanism, which assumes a centralized arbiter. The protocol enables arbitration to overlap
data transfers. This feature allows the arbitration process
to execute without using any bus cycles dedicated to arbitration. The algorithm for implementing the grant
response to a bus request is user defined for maximum
flexibility.
Several Mbus protocols support error conditions that
can occur in a typical system implementation. These errors include: External Bus Error, Response Timeout, Uncorrectable Memory Error, and Transaction Retry. These
protocols handle most of the error conditions encountered
in a system interface to physical memory and I/O.

Architectural Overview of Mbus
Mbus provides a high-performance interface to the
physical address space in a SPARC system, with facilities
to support the cache coherency requirements of symmetric
multiprocessing. Mbus is intended to operate with SPARC
processors that have local virtual caches, so that access to
the physical address space only occurs in the event of a
cache miss. With reasonable-sized local caches, the Mbus
loading from an individual processor is in the range of 5
to 10 percent. This allows the Mbus to support other
processors and I/O activities without degrading individual
performance.
Bus overhead, which mostly consists of arbitration
and transaction time, is a critical element in determining
overall system performance. Many different bus-arbitration mechanisms are available, with a variety of cost/performance tradeoffs. For this reason, the SPARC architectural standard does not define a specific arbitration
mechanism for Mbus. You thus have complete flexibility
in system design.
Mbus does support bus arbitration that can operate
concurrently with data transactions. When a system can
use overlapped arbitration, bus arbitration incurs no bus
overhead.
The second aspect of bus overhead, transaction time,
is the bus time required to perform the actual data transfer. High bus bandwidth minimizes transaction time on
Mbus, which is capable of peak data rates up to 320
Mbytes/s and 256 Mbytes/s sustained at 40 MHz.
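The peak figure follows directly from the bus width and clock rate. The sketch below reproduces it and also shows one way to arrive at the sustained figure. That second calculation is an assumption: it models a burst as one address cycle followed by one 8-byte data cycle per clock with no wait states, and under that model a 32-byte burst happens to give 256 Mbytes/s. The text itself does not say how the sustained rate was derived.

```python
CLOCK_HZ = 40_000_000          # 40-MHz bus clock
BYTES_PER_DATA_CYCLE = 8       # 64-bit data path

# Peak rate: one 8-byte transfer on every clock.
peak_mbytes = BYTES_PER_DATA_CYCLE * CLOCK_HZ / 1e6


def burst_rate_mbytes(burst_bytes, address_cycles=1):
    """Effective rate of a burst, assuming `address_cycles` clocks of
    address overhead and no wait states (an illustrative model)."""
    data_cycles = burst_bytes // BYTES_PER_DATA_CYCLE
    total_cycles = address_cycles + data_cycles
    return burst_bytes * CLOCK_HZ / total_cycles / 1e6


print(peak_mbytes)
for size in (16, 32, 64, 128):
    print(size, round(burst_rate_mbytes(size), 1))
```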
Two Mbus compliance levels are defined to suit differing system requirements. Level 1 compliance is for
uniprocessor applications, and level 2 for multiprocessing

Multiprocessing Facilities
A significant trend in computer systems is toward
multiprocessing. In a shared-memory multiprocessing system, maintaining local cache coherency without degrading
system performance is a major architectural challenge.
Figure 1 shows the topology of a multiprocessing
system. All processor nodes contain local caches and
operate out of them most of the time. However, when one
processor changes data that is shared, the other processors
need to be made aware that the data has changed, so that
subsequent references can be made without using stale
data. Further, the multiprocessing system needs an efficient mechanism to allow a processor to access the
modified data when it is required.

Figure 1. Multi-Processing Topology
Mbus implements a bus snooping protocol that allows
a processor node to communicate to the other nodes that a
piece of shared data has been modified. Each processor
node responds by marking that datum as invalid in the
node's own cache tag. When a processor node references
that datum, a cache miss occurs because the entry has
been invalidated in the cache. The processor node then
generates a normal Mbus read transaction to access the
datum. Instead of the physical memory element supplying
the datum, the Mbus protocol allows the datum to be supplied by the processor node whose local cache contains
the current datum.
This approach has the advantage that no data transfer
occurs on the bus until the shared data is needed, saving a
considerable amount of bus bandwidth. Further, the performance penalty to access modified data is no worse than
the penalty of a normal cache miss.
When the processor node provides its modified data
on the bus to the requesting node, the Mbus protocol allows the physical memory and other processor nodes to
update their data. This reflective memory feature can save
additional bus bandwidth by requiring that modified
shared data be transferred only once, rather than each time
a different node references the data.
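The sequence just described (invalidate on write, supply from the owning cache on the next miss, reflect the data into memory) can be followed in a toy software model. Everything in this sketch is a simplification made for illustration: the class names are hypothetical, each cache line holds a single value, a write allocates the line without fetching it, and the owner's copy is treated as clean once memory has been updated.

```python
class Node:
    """A processor node with a private cache: address -> [value, dirty]."""

    def __init__(self, name):
        self.name = name
        self.cache = {}


class ToyMbus:
    """Toy model of snooping with invalidation and reflective memory."""

    def __init__(self, memory, nodes):
        self.memory = memory
        self.nodes = nodes
        self.data_transfers = 0      # bus transactions that move data

    def write(self, writer, addr, value):
        # Invalidation broadcast: other nodes drop their copies.
        # No data moves on the bus at this point.
        for node in self.nodes:
            if node is not writer:
                node.cache.pop(addr, None)
        writer.cache[addr] = [value, True]

    def read(self, reader, addr):
        if addr in reader.cache:                 # cache hit, no bus activity
            return reader.cache[addr][0]
        self.data_transfers += 1                 # miss: one read transaction
        owner = next(
            (n for n in self.nodes
             if n is not reader and n.cache.get(addr, [None, False])[1]),
            None,
        )
        if owner is not None:
            value = owner.cache[addr][0]         # owner supplies the data
            self.memory[addr] = value            # memory updates itself too
            owner.cache[addr][1] = False
        else:
            value = self.memory[addr]
        reader.cache[addr] = [value, False]
        return value


memory = {0x100: 1}
a, b = Node("A"), Node("B")
bus = ToyMbus(memory, [a, b])

bus.read(b, 0x100)            # B caches the original value
bus.write(a, 0x100, 7)        # A modifies it; B's copy is invalidated
print(0x100 in b.cache)
print(bus.read(b, 0x100), memory[0x100], bus.data_transfers)
```

In this model the modified data crosses the bus only when node B actually misses on it, and memory picks up the new value during that same transfer.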

system clock, MCLK, and a bus transaction can only be
initiated by the bus master that currently owns the bus.
A transaction consists of an address phase followed
by one or more data transfer cycles. The address phase
provides a 36-bit physical address and a set of control
fields that defines the transaction's nature and size. The
data phase consists of multiple 64-bit transfers that are
synchronous with the bus clock. A simple illustration of a
32-byte transaction appears in Figure 2.
Address Phase
Mbus's 64 data bits are defined as a 36-bit physical
address space and a set of control fields that determine the
type of bus cycle that is being initiated. The master signals the beginning of a cycle by placing the required address and control information on the data bus and asserting an address strobe (MAS) on the bus. The command
fields are:
MAD(36 - 39) Transaction Type (Type)
  0 read
  1 write
  2 coherent invalidate*
  3 read coherent*
  4 coherent write and invalidate*
  5 coherent read and invalidate*
  *Level 2 only
MAD(40 - 42) Transaction Size (Size)
  0 Byte
  1 Halfword
  2 Word
  3 Doubleword
  4 16 Bytes
  5 32 Bytes
  6 64 Bytes
  7 128 Bytes
MAD(43) Memory Cacheable (MC)
This advisory bit indicates whether the address space
for the transaction is cacheable.
MAD(44) Locked Transaction (MLOCK)
This bit signals that the transaction is part of a multitransaction operation that must be indivisible; thus, the
master will not relinquish the bus between transactions.

Mbus Description
Mbus is a fully synchronous bus whose 64-bit data
path (MAD) multiplexes address and data for each data
transaction. All data is sampled on the rising edge of the

Figure 2. Basic Mbus Transaction Timing

Table 1. Transaction Status Encoding

MERR  MRDY  MRTY  Meaning
H     H     H     Idle cycle
H     H     L     Relinquish and Retry
H     L     H     Valid Data Transfer
H     L     L     Undefined (level 1), reserved (level 2)
L     H     H     ERROR1 => Bus Error
L     H     L     ERROR2 => Timeout
L     L     H     ERROR3 => Uncorrectable
L     L     L     Retry

Figure 3. Mbus Read and Write Transaction Timing (read cycle and write cycle)
starting address is not on a block boundary, the address
sequence increments to the block boundary and then
wraps to the block's start boundary. A simple example of
a 32-byte transfer is illustrated below:
Block boundary is at 100000000000
100000010000 starts on the third 8-byte subblock
100000011000
100000000000 wraps to the start of the block
100000001000
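The wrapped address sequence in this example can be generated with simple masking. The sketch below reads the example addresses as binary numbers and assumes an 8-byte-aligned starting address and a power-of-two block size; the function name is hypothetical.

```python
def wrap_sequence(start, block_bytes, beat_bytes=8):
    """Return the addresses of each 8-byte transfer in a wrapped burst.

    The sequence starts at `start`, increments to the end of the block,
    then wraps to the block's start boundary.
    """
    base = start & ~(block_bytes - 1)        # block start boundary
    offset = start & (block_bytes - 1)       # starting offset in the block
    beats = block_bytes // beat_bytes
    return [base | ((offset + i * beat_bytes) % block_bytes)
            for i in range(beats)]


# The 32-byte example from the text, starting on the third 8-byte subblock.
for address in wrap_sequence(0b100000010000, 32):
    print(format(address, "012b"))
```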

MAD(45) Boot Mode/Local (MBL)
This bit signals that the processor is in the boot mode
or that the transaction is in the local space (Address Space
Identifier (ASI) = 01). This is an advisory bit that the system can use, but is not required for compliance.
MAD(46 - 49) Virtual Address
This field contains bits 12 - 19 of the virtual address
being accessed. These bits are used by virtually indexed
secondary caches for synonym elimination, and they are
only required in multiprocessing level 2 compliant systems.
MAD(50 - 59) Reserved
MAD(60 - 63) Module Identifier
These bits contain the ID(0 - 3) for the master initiating the transaction. Used by slave elements to keep track
of which master to reconnect to when implementing
Relinquish and Retry operations, these bits are used only
for multiprocessing level 2 compliance.
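To make the field layout concrete, the sketch below assembles a 64-bit address-phase word from the fields listed in this section. The function and constant names are hypothetical. Placing the physical address on MAD(0 - 35) is inferred from the 36-bit address width and the position of the first control field, and the virtual address field is deliberately left at zero here.

```python
# Transaction size codes, keyed by transfer size in bytes.
SIZE_CODES = {1: 0, 2: 1, 4: 2, 8: 3, 16: 4, 32: 5, 64: 6, 128: 7}

TYPE_READ = 0
TYPE_WRITE = 1


def pack_address_phase(pa, ttype, size_bytes, mc=0, mlock=0, mbl=0, mid=0):
    """Build the 64-bit MAD word driven during the address phase."""
    if not 0 <= pa < (1 << 36):
        raise ValueError("physical address must fit in 36 bits")
    word = pa                                   # MAD(0 - 35), assumed
    word |= (ttype & 0xF) << 36                 # MAD(36 - 39) Type
    word |= SIZE_CODES[size_bytes] << 40        # MAD(40 - 42) Size
    word |= (mc & 0x1) << 43                    # MAD(43) MC
    word |= (mlock & 0x1) << 44                 # MAD(44) MLOCK
    word |= (mbl & 0x1) << 45                   # MAD(45) MBL
    word |= (mid & 0xF) << 60                   # MAD(60 - 63) Module ID
    return word


# A cacheable 32-byte read from module 3.
word = pack_address_phase(0x123456789, TYPE_READ, 32, mc=1, mid=3)
print(format(word, "016x"))
```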

Data Control Lines
Mbus provides two multiplexed data control lines:
MAS Address Strobe
The current bus master asserts this line for one clock
cycle when a bus transaction's address phase is executed.
The slave occupying the specified address in the physical
memory space is expected to capture the address and
command fields on the bus when MAS is asserted.
MRDY Ready
Slave elements use this line to signal to a master that
requested data is ready for a read or data has been accepted for a write. A master monitors MRDY to know
when a slave is ready for the next cycle in a data transaction.

Data Phase
The element that occupies the physical address
defined during the address phase responds to the request
by either accepting 64 bits of data for a write or providing
data for a read. The slave signals the master its readiness
to complete a data transfer by asserting a ready status on
the bus. This provision allows a slave to operate at a data
rate slower than that available on Mbus.
The Mbus command protocol supports up to 16 successive data transfers. This allows up to 128 bytes to be
transferred in 17 system clock cycles.
Figure 3 illustrates a simple read and write transaction. For transactions that require multiple data phases
(more than 8 bytes), Mbus supports an address wrap feature within the block being transferred. An address wrap
is accomplished by specifying a burst starting address that
is not on a block boundary. This feature can be useful for
cache line transfers, where the CPU is waiting for a
specific word. This word is transferred first, allowing the
CPU to proceed while the balance of the cache line is
transferred.
Block wrapping is implemented by not allowing the
addresses accessed to cross a block boundary. When the

Mbus Transaction Status and Encoding
MRDY combines with the MRTY and MERR control
lines to encode the current status of a transaction's data
phases. The slave element controls the status lines and
thus determines how the current data phase cycle is terminated. The status encoding appears in Table 1.
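For reference, the encoding in Table 1 can be expressed as a lookup keyed by the three status line levels. This is a sketch with hypothetical names; levels are written as "H" and "L" exactly as they appear in the table.

```python
# (MERR, MRDY, MRTY) levels -> meaning, transcribed from Table 1.
TRANSACTION_STATUS = {
    ("H", "H", "H"): "Idle cycle",
    ("H", "H", "L"): "Relinquish and Retry",
    ("H", "L", "H"): "Valid Data Transfer",
    ("H", "L", "L"): "Undefined (level 1), reserved (level 2)",
    ("L", "H", "H"): "Bus Error",
    ("L", "H", "L"): "Timeout",
    ("L", "L", "H"): "Uncorrectable",
    ("L", "L", "L"): "Retry",
}


def decode_status(merr, mrdy, mrty):
    """Return the meaning of one sampled combination of status lines."""
    return TRANSACTION_STATUS[(merr, mrdy, mrty)]


print(decode_status("H", "H", "H"))
print(decode_status("H", "L", "H"))
print(decode_status("L", "L", "L"))
```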
The rest of this section describes the Mbus transaction activities.
The Idle cycle occurs when a slave element is not yet
ready to transfer data to or from the master. The cycle
occurs when the slave does not assert any of the status
lines. The idle cycle thus effectively operates as a wait
cycle on the bus. Note that this encoding also appears on
the bus when there are no transactions currently being executed.
The Data Transfer cycle, executed by the slave element asserting MRDY for one bus cycle, indicates to the
master that the slave is ready for the requested data transfer for the current data phase cycle.
The Retry cycle causes the master to restart the full
bus transaction with the address and all data phases
repeated. This cycle is often useful for memory modules


executing an ECC data correction that needs additional
time.
The Relinquish and Retry cycle operates the same as
a Retry cycle, except that the master must release the bus
and re-arbitrate before starting the transaction cycle again.
A Relinquish and Retry cycle typically is used for devices
that have a long data latency or when the module is busy
and cannot respond.
The Bus Error status is typically used to signal that
an external bus error has occurred. This could be a bus
parity error or invalid status. Note that this encoding is
only a suggested definition. You can use the error encoding as a system-specific error if desired.
The Bus Timeout Error is generated by an external
watchdog timer to signal that the time allotted for a full
bus transaction has expired. It is important to note that
this error applies to transactions requiring from one to 16
data phases, and the time limit chosen must accommodate
the transaction requiring the greatest time. The suggested
timeout interval for Mbus is 200 µs. This encoding to
identify a timeout error is only a suggested definition.
You can use the error encoding as a system-specific error
if desired.
An Uncorrectable Error is typically generated by
memory elements that encounter an uncorrectable error,
such as parity or a multi-bit ECC error, in the data being
accessed. This encoding to identify an uncorrectable error
is only a suggested definition. You can use the encoding
as a system-specific error if desired.

rent data transaction to complete before ownership is
transferred.
This protocol places several requirements on the arbiter and the bus masters:
The current master must deassert MBB after the completion of a data transaction.
The current master must have its MBG active to initiate a new data transaction. The arbiter signals a master
that it no longer has bus ownership by deasserting the
MBG.
The arbiter is not allowed to re-arbitrate new requests
after a new grant until MBB is deasserted.
Details of the control algorithm for bus arbitration appear in Figure 4, which is a state flowchart for a bus
master arbitration state machine.
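The request, grant, and busy handshake described in this note can be modeled from one master's point of view. The sketch below is an assumption-laden illustration and is not a transcription of Figure 4: the state names, the structure fields, and the function master_next_state are hypothetical, and every signal is modeled as a boolean that is true when asserted.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    MASTER_IDLE,          /* no request pending                       */
    MASTER_REQUESTING,    /* MBR asserted, waiting for MBG            */
    MASTER_GRANTED_WAIT,  /* grant seen, MBR released, MBB still busy */
    MASTER_OWNER          /* this master drives MBB and the bus       */
} master_state_t;

typedef struct {
    bool want_bus;  /* master has a transaction pending               */
    bool mbg;       /* grant from the arbiter to this master          */
    bool mbb;       /* bus busy, driven by the current owner          */
    bool done;      /* this master's transaction completed this cycle */
} master_inputs_t;

/* One clock step of the hypothetical master-side state machine. */
master_state_t master_next_state(master_state_t s, master_inputs_t in)
{
    assert(s >= MASTER_IDLE && s <= MASTER_OWNER);

    switch (s) {
    case MASTER_IDLE:
        return in.want_bus ? MASTER_REQUESTING : MASTER_IDLE;
    case MASTER_REQUESTING:
        return in.mbg ? MASTER_GRANTED_WAIT : MASTER_REQUESTING;
    case MASTER_GRANTED_WAIT:
        /* Ownership is not taken until the previous owner releases MBB. */
        return in.mbb ? MASTER_GRANTED_WAIT : MASTER_OWNER;
    case MASTER_OWNER:
        /* MBB must be deasserted when the transaction completes. */
        return in.done ? MASTER_IDLE : MASTER_OWNER;
    }
    return MASTER_IDLE;
}
```

The model omits the case in which the arbiter withdraws a grant before the bus becomes free; a real design would need to handle it.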

Module Identification and Configuration
An optional facility allows the CPU to identify and
configure modules attached to the Mbus. This facility
provides up to 16 logical positions on the bus, with the
requirement that each logical position contain a small
memory space dedicated to that logical position. An Mbus
module supporting the configuration facility can incorporate any control or status registers required within its
assigned memory space. Four dedicated lines are provided
for each Mbus module (ID[0:3]) to identify the logical
position the module occupies on the Mbus.
In a typical configuration, each slot on the Mbus has
a unique value hard-wired on its ID control lines. A
module decodes its configuration map space in a specific
slot by using the ID value.
An Mbus Port Register (MPR) - a single 32-bit
word at location FFFFFCh in the configuration space - is
defined with a standard format to allow a uniform identification mechanism for a module. The format of the Port
Register is defined in Figure 5, and the configuration address map for the 16 ID values is defined in Table 2.
The MPR fields are defined as follows:
MDEV - Mbus Device Identification Number
This field contains a unique vendor-defined identification number for the Mbus device being addressed.
MREV - Mbus Device Revision Number
This field contains the revision or configuration number for the Mbus device being addressed.
MVEND - Mbus Vendor Number
This field contains a unique vendor identification
number for the Mbus device being addressed. The current
vendor number assignments are:
0 Fujitsu
1 Cypress
2 (reserved)
3 LSI Logic
4 Texas Instruments
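The configuration addressing and the port register fields can be illustrated with a short C sketch. The function and type names are hypothetical. The address arithmetic follows Table 2 and the FFFFFCh register offset; the bit positions (MDEV in bits 15-8, MREV in bits 7-4, MVEND in bits 3-0) are an assumption read from the bit markers that survive in Figure 5.

```c
#include <assert.h>
#include <stdint.h>

/* Base of the configuration space for a given ID: 0xFFn000000 (Table 2). */
uint64_t mbus_config_base(unsigned id)
{
    assert(id <= 0xFu);
    return 0xFF0000000ULL | ((uint64_t)id << 24);
}

/* Port register location: offset FFFFFCh within that ID's space. */
uint64_t mbus_mpr_address(unsigned id)
{
    return mbus_config_base(id) + 0xFFFFFCULL;
}

typedef struct {
    unsigned mdev;   /* device identification number */
    unsigned mrev;   /* device revision number       */
    unsigned mvend;  /* vendor number                */
} mpr_fields_t;

/* Split a 32-bit port register value into its assumed fields. */
mpr_fields_t mbus_mpr_decode(uint32_t mpr)
{
    mpr_fields_t f;
    f.mdev  = (mpr >> 8) & 0xFFu;
    f.mrev  = (mpr >> 4) & 0xFu;
    f.mvend = mpr & 0xFu;
    return f;
}
```

Under these assumptions, a register value of 0x00001231 would decode as device 0x12, revision 3, vendor 1 (Cypress).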
Note that on reset, a processor begins execution at
location 0FF000000h. This is the same memory space as
the first logical position in the configuration space. Thus,
the ID = 0h logical position must be treated as predefined

Interrupt Support
Mbus provides four dedicated lines, IRL[0:3], for
feeding the current interrupt level to the processor. These
lines typically connect directly to the CPU's interrupt inputs. An external interrupt controller is expected to drive
the interrupt lines.
The four lines operate as an encoded, 16-level,
priority interrupt request, ranging from no interrupt pending (0000) to non-maskable interrupt request (1111). The
system is expected to include a separate interrupt request
encoder to drive the IRL lines for Mbus.

Arbitration Mechanisms
Transfer of bus ownership on Mbus is accomplished
using dedicated request and grant control lines from a
central arbiter to the system's bus masters. The current
master controls a busy status line (MBB) to signal that the
bus is in use.
Arbitration between masters can occur concurrently
with data transactions. This is accomplished in the following manner: When a master requires the bus, the master
asserts its bus request (MBR) to the arbiter. The arbiter
responds by asserting the bus grant (MBG) for the requesting master and deasserting the MBG for the current
master. The new master deasserts its MBR on the next
system clock cycle. When the requesting master detects
the grant, that master does not take ownership of the bus
until the bus busy (MBB) is inactive. This allows the cur-

An Introduction to Mbus
[Register layout at 0xFFnFFFFFC: bits 31-16 implementation specific, bits 15-8 MDEV, bits 7-4 MREV, bits 3-0 MVEND]

Figure 5. M-Bus Port Register Configuration
modules from multiple vendors. These modules are typically for memory, I/O devices, and bus adapters. Modules
can be configured to be mounted either parallel or perpendicular to the mother board.
Cypress has developed a family of processor modules
that incorporate the standard Mbus connector. These
modules include a uniprocessor cluster module, a multiprocessor cluster module, and a dual multiprocessor
cluster. A cluster is considered to be an integer unit, floating point unit, memory management unit, and a 64K
cache.
The Mbus connector is available as a standard component from Amp Incorporated and has 100 signal pins in
a dual row on 0.05-inch centers. The signal interconnects
through the connector are a constant 50Ω impedance.
Separate power and ground blades minimize the supply-rail impedance. Table 5 defines the connector's pin assignments.

System Design Considerations
Virtually any Mbus-based system requires several
support elements, including clock generator, watchdog
timer, interrupt controller, and bus arbiter. The following
sections examine the functional requirements and design
considerations for each of these support elements. The
support functions are relatively straightforward, and can
be implemented with four PLDs, a TTL buffer and possibly a flip-flop.
Clock Generator
The system clock is derived from the clock generator,
which should be crystal referenced. For many applications, a simple crystal-controlled oscillator module performs very well. On the other hand, operating with a 2x
clock followed by a toggle flip-flop might be useful if the
application requires true and complement clocks. Note
that Mbus does support a true and complement clock distribution, although it is not required.
Clock distribution on the bus should be implemented
using a single printed circuit trace with no stubs and characteristic impedance of 50 to 75Ω. The line must be

Figure 4. Bus Arbitration Flowchart

and considered the logical position for the Mbus boot
PROM module.

AC Timing Parameters
Because the Mbus is fully synchronous, with all data
sampled on MCLK's rising edge, all AC parametrics are
specified as set-up and hold times with respect to this
edge. The signals are grouped into two categories: data
path (MAD) and control (CNTRL). The AC specifications
are provided in Table 3. Table 4 summarizes the DC characteristics and reflects the assumption that the maximum
loading per module is a single CMOS load per line.

Table 2. Mbus Address Configuration Map
Configuration Spaces            Mbus Identifier
0xFF0000000 to 0xFF0FFFFFF      Range for ID=0x0
0xFF1000000 to 0xFF1FFFFFF      Range for ID=0x1
0xFF2000000 to 0xFF2FFFFFF      Range for ID=0x2
...                             ...
0xFFF000000 to 0xFFFFFFFFF      Range for ID=0xF

Processor Modules
Another part of the Mbus standard defined by
SPARC International is a physical connector that allows
you to take advantage of a wide variety of standard

Table 3. Level 1 DC Characteristics
Level 1 DC Characteristics and Pin Capacitance (Ta = 0-70°C)
Symbol  Description                 Conditions    min    max     unit
Vih     Input High Voltage level                  2.1    Vcc     V
Vil     Input Low Voltage level                   0.0    .8      V
Iil     Input Leakage                                    +-1.0   uA
Iih     Input High Current                               10      uA
Iilo    Input Low Current                                -10     uA
Voh     Output High Voltage         Ioh = -2mA    2.4    Vcc     V
Vol     Output Low Voltage          Iol = 8mA     0.0    0.5     V
Cin     Input Capacitance                                10      pF
Cout    Output Capacitance                               12      pF
Cio     Input/Output Capacitance                         15      pF


When RUN is asserted, the reset state machine is
quiescent, MBB* is not asserted, and the counter chain is
held reset. When MBB* asserts, the counter chain begins
counting toward terminal count. In normal operation,
MBB* is deasserted long before terminal count is reached,
and the timer returns to the reset state. If MBB* remains
asserted until terminal count is reached, however, MERR*
and MRDY* are asserted for one clock cycle. The current
master is expected to respond to this condition by terminating the transaction and deasserting MBB*.
In the case of a reset condition, it cannot be predicted
if MBB* will be asserted at the start of the reset interval.
It is therefore necessary to gate-out MBB* from the
timing block during a reset interval. This is accomplished
with GATE from the reset state machine.

properly terminated. The requirements for clock distribution dictate the use of a low-propagation-delay buffer with
the ability to drive the transmission line. If complementary clock distribution is required, the buffers must also
have low delay skew.

Watchdog Timer
The watchdog timer provides the timing reference required for bus timeout error detection. The Mbus recommendation for the timeout interval in a 40-MHz system is
200 µs. The actual value chosen for an application
depends on the system clock rate and the worst-case transaction time of any element on the bus. Normally a value
between 100 and 500 µs is adequate.
While a transaction is in process, the current master
asserts the Bus Busy status line (MBB), which serves as
the controlling status for the watchdog timer. Each time
MBB is asserted, the timer is triggered. If the timer
reaches terminal count before MBB is deasserted, a Bus
Timeout Error is generated.
The watchdog timer can also be used to generate
timing for the bus reset strobe (MRST). This is possible
because the watchdog function does not have to operate
during reset. The additional logic required to support both
functions is minimal.
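Because the timeout interval depends on both the counter length and the system clock rate, it helps to work the arithmetic explicitly. The C sketch below is illustrative only; the function names are hypothetical. It models two cascaded counters, such as the modulo-250 and modulo-40 counters used in the 22V10 implementation described in this note, and converts the resulting cycle count to a time.

```c
#include <assert.h>
#include <stdint.h>

/* Clock cycles needed for two cascaded counters to reach terminal count. */
uint32_t watchdog_cycles(uint32_t mod_first, uint32_t mod_second)
{
    assert(mod_first > 0u && mod_second > 0u);
    return mod_first * mod_second;
}

/* Timeout in nanoseconds for a given cycle count and clock rate in MHz. */
uint64_t watchdog_timeout_ns(uint32_t cycles, uint32_t clock_mhz)
{
    assert(clock_mhz > 0u);
    return (uint64_t)cycles * 1000u / clock_mhz;
}
```

With a 40-MHz clock (25 ns per cycle), 250 x 40 = 10,000 cycles works out to 250 µs, while a 200 µs interval corresponds to 8,000 cycles. Both fall inside the 100 to 500 µs range.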

Table 4. Levell AC Characteristics
Parameter       min    max    Unit
Tcp             25     25     ns
Tch             11     14     ns
Tcl             11     14     ns
Tsi(MAD)        3      -      ns
Thi(MAD)        2      -      ns
Tdo(MAD)        -      18     ns
Tho(MAD)        4      -      ns
Tsi(CNTRL)      3      -      ns
Thi(CNTRL)      2      -      ns
Tdo(CNTRL)      -      18     ns
Tho(CNTRL)      4      -      ns
* All times are for a capacitive load of 100 pF

You can implement the watchdog timer in a pair of
22V10 PALs. Figure 6 shows a block diagram of the
function, with the flowchart for the reset state machine
shown in Figure 7. The design incorporates a single
counter, the watchdog timer, and the reset function.
Two counters make it possible to implement the function in two 22V10s. The modulo 40 counter uses a
synchronous count enable connected to the terminal count
of the modulo 250 counter. Thus, the timer takes 250 x 40 = 10,000 clock
cycles to reach terminal count. At 40 MHz (a 25-ns clock period),
this value corresponds to a 250 µs timeout interval, which lies
within the 100 to 500 µs range that is normally adequate.

When a reset occurs, RUN is deasserted. The state
machine deasserts GATE to disable MBB* and hold the
counter in the reset state. When RUN goes active,
COUNT is asserted. This enables the counter, disables
MERR* and MRTY*, and causes MRST* to assert. The
state machine remains in this state until terminal count
(TC) from the counter is detected. The state machine then
asserts GATE and deasserts COUNT. The latter deasserts
MRST* and enables MBB* to control the triggering of
the timing chain. The state machine remains in this state
until another reset occurs.
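The reset sequence just described maps naturally onto a three-state machine. The C sketch below is an interpretation of the prose, not a transcription of Figure 7: the state names and functions are hypothetical, and each signal is modeled as a boolean that is true when asserted (so mrst stands for MRST* being driven active).

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    RST_HOLD,   /* RUN deasserted: GATE off, counter held in reset  */
    RST_COUNT,  /* RUN active: COUNT and MRST asserted, timing reset */
    RST_RUN     /* normal operation: GATE on, MBB triggers the timer */
} reset_state_t;

typedef struct {
    bool gate;   /* enables MBB into the timing block */
    bool count;  /* enables the counter chain         */
    bool mrst;   /* bus reset strobe                  */
} reset_outputs_t;

/* Next state from the RUN input and the counter's terminal count (TC). */
reset_state_t reset_next(reset_state_t s, bool run, bool tc)
{
    assert(s == RST_HOLD || s == RST_COUNT || s == RST_RUN);

    if (!run)
        return RST_HOLD;              /* a reset overrides everything */
    switch (s) {
    case RST_HOLD:  return RST_COUNT;
    case RST_COUNT: return tc ? RST_RUN : RST_COUNT;
    case RST_RUN:   return RST_RUN;
    }
    return RST_HOLD;
}

/* Outputs depend only on the current state. */
reset_outputs_t reset_outputs(reset_state_t s)
{
    reset_outputs_t o;
    o.count = (s == RST_COUNT);
    o.mrst  = (s == RST_COUNT);       /* MRST follows COUNT, per the text */
    o.gate  = (s == RST_RUN);
    return o;
}
```

Tying MRST to COUNT reflects the statement that asserting COUNT causes MRST* to assert; the level of MRST* while RUN is deasserted is not stated in the text and is left inactive here as an assumption.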

Table 5. Mbus Connector Pin Assignments

Interrupt Control

Interrupt processing for Mbus-based systems requires
a simple priority encoder that uses individual interrupt requests to determine the priority level. The interrupt controller then drives the bus's four Interrupt Status Lines
(ISL 0 - 3). System elements that generate interrupts are
expected to assert their individual interrupt request line
and hold it asserted until the processor takes action to
clear the interrupt condition. You can easily implement
this function in a single 22V10 with 16 inputs and four
outputs.
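The behavior of such a priority encoder can be sketched in C. This is a behavioral illustration rather than PLD source: the function name irl_encode is hypothetical, and the sketch assumes that request bit n (1 through 15) corresponds to interrupt level n, with level 0 reserved for "no interrupt pending" as described in the Interrupt Support section.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the 4-bit interrupt level for a set of pending requests.
 * Bit n of 'requests' (n = 1..15) is the request line for level n;
 * bit 0 is unused in this sketch. The highest pending level wins.
 */
unsigned irl_encode(uint16_t requests)
{
    assert((requests & 1u) == 0u);

    for (unsigned level = 15u; level >= 1u; --level) {
        if (requests & (1u << level))
            return level;
    }
    return 0u;  /* 0000: no interrupt pending */
}
```

Because each source holds its request line asserted until the processor clears the condition, the encoded level simply drops to the next-highest pending request once the current one is serviced.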
Bus Arbitration

Most Mbus-based systems require some type of bus
arbitration. In addition to the processor requiring access to
the bus, I/O devices such as disk drives require access to
the memory space for data transfer. Thus, the system
needs at least a simple arbitration mechanism to allow the
processor and the I/O device to share the bus.
You can implement many different arbitration
strategies in an Mbus system. These strategies include
fixed priority, round robin, dynamic assignment, or random priority. System performance requirements largely
dictate the arbitration strategy for a specific application.
The arbiter must, in any case, conform to the interface
protocol defined by Mbus.
For a good example of Mbus arbitration, see the
Cypress application note "Using the CY7C330 as a Multichannel Mbus Arbiter." This application note shows how
to implement two different arbitration algorithms in a
single CY7C330 PLD. Note that the design requires the
availability of a 2X clock.

DRAM Memory Module Design
Several issues must be resolved in defining an Mbus
memory module. The module's capacity, the required performance level, and cost are the basic constraints that dictate the module's design.
For reasonable performance, the memory must support Mbus's full 64-bit access per memory cycle. This implies a minimum capacity of 8 Mbytes for a 1M x 1
DRAM design. This is a reasonable minimum size and
capacity increment. Alternatives include a 1M x 4, which
reduces parts count but increases cost; a 256K x 4, which
also reduces parts count and the minimum capacity to 2
Mbytes; and finally a 4M x 4, which increases the mini-


[Block diagram labels: RUN, VCC, STATE MACHINE, GATE, MERR, MBB]

Figure 6. Watchdog Timer Block Diagram
for a given performance level to determine if an approach
is cost effective.
Table 6 illustrates several performance-cost design
points for 40-, 33-, and 25-MHz systems. The small performance difference between the lowest-cost design for a
specific clock frequency and the highest-performance
design makes the low-cost implementation quite attractive
from a cost-performance standpoint.

mum capacity to 32 Mbytes with no increase in parts
count.
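The capacity figures quoted for the different DRAM organizations follow directly from the 64-bit access width. The C sketch below reproduces that arithmetic; the function names are hypothetical and the sketch assumes a single bank that is exactly 64 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Number of DRAM devices needed to build one 64-bit-wide bank. */
unsigned dram_chip_count(unsigned width_bits)
{
    assert(width_bits > 0u && 64u % width_bits == 0u);
    return 64u / width_bits;
}

/*
 * Minimum module capacity in bytes: every addressable location in the
 * bank holds 64 bits (8 bytes), regardless of the device width.
 */
uint64_t dram_min_capacity_bytes(uint64_t depth_words)
{
    assert(depth_words > 0u);
    return depth_words * 8u;
}
```

A 1M x 1 design needs 64 devices for 8 Mbytes, a 1M x 4 design needs 16 devices for the same 8 Mbytes, a 256K x 4 design gives 2 Mbytes, and a 4M x 4 design gives 32 Mbytes with the same 16 devices.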
Figure 8 shows a curve relating a system's relative
performance to Mbus wait cycles. The curve is derived for
32-byte cache line replacements and a 95 percent cache
hit ratio. Note that the relative performance does not
strongly depend on the number of wait cycles. For example, doubling the transaction time for a cache line replacement to five wait cycles reduces performance by
only 13.5 percent.
Thus, the curve indicates that the incremental improvement in performance for a reduction in the number
of wait cycles is somewhat marginal, with only a 2- to
3-percent increase in performance for each wait cycle
eliminated. You must therefore evaluate the relative cost

An Example Memory Design
To better understand Mbus memory module design,
consider an example of a design for a 25-MHz system
with 8-Mbyte capacity and 128K of boot PROM. The
design supports the module identification facility. This example illustrates the design requirements without the additional issues involved in a full-speed, 40-MHz module. A
block diagram of the module appears in Figure 9 and consists of three major functional blocks: interface decode,
DRAM, and PROM/identification generation.
Interface Decode
The interface decode block decodes Mbus commands,
generates control signals, and supports the DRAM
and PROM blocks for data transfer across the data bus.
The decode block thus detects Mbus commands directed

[Plot: relative performance (vertical axis, 0.80 to 1.00) versus number of Mbus wait cycles]

Figure 8. Relative Performance vs M-Bus Wait States

Figure 7. Reset State Machine Flowchart


decode registers. This is initiated from either the DRAM
or PROM block through the CLRSTB signals.
The address decode block decodes DRAM addresses
by comparing the Mbus address to the map position for a
match (more on the map position later). The PROM
decode is a simple decode for address 000000000 to
00003FFFF. Two match signals for DRAM and PROM
are implemented to avoid the additional delay that would
occur from ANDing the decode outputs from the two
PLDs. The two signals are ANDed in the DRAM and
PROM control elements with no additional delay overhead.
The MPR block implements the write portion of the
Mbus configuration facility. The decode PLD detects the
configuration address space and matches the ID field to
ID(0 - 3) for a module match. This condition is signaled
with CMAT. The map position register is loaded from the
data bus when CMAT is asserted and the transaction type
is a write. The PROM block is responsible for the transaction termination via the CLRSTB.
The auxiliary decode block is a simple decode PLD
that captures the transaction size and read/write status.
Note that BOOT and LOCK are decoded in the auxiliary
block, but are not used in this design example.
DRAM Block
The DRAM block is implemented using 70-ns RAMs
operating in page mode with two wait states to initial access and zero wait states for up to 32 bytes. For transactions requiring more than 32 bytes, an additional wait
state is required after every fourth transfer cycle. The additional wait state is necessary because the DRAM
operates in page mode at a 50-ns cycle time. This causes
the data access to skew out 10 ns per cycle, and an additional cycle allows the data access to resynchronize with
MCK. Figure 11 shows the basic timing for a 32-byte
read transaction.
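The need for a resynchronizing wait state follows from simple arithmetic on the clock period and the page-mode cycle time. The C sketch below illustrates it; the function names are hypothetical, and the sketch assumes that no extra wait state is needed after the final transfer of a transaction.

```c
#include <assert.h>

/*
 * Number of page-mode transfers that can run before the accumulated
 * skew equals one full clock period and a wait state must be inserted.
 */
unsigned transfers_per_wait_state(unsigned clock_ns, unsigned page_cycle_ns)
{
    assert(page_cycle_ns > clock_ns);
    unsigned skew_ns = page_cycle_ns - clock_ns;  /* skew added per transfer */
    return clock_ns / skew_ns;
}

/* Extra wait states in a burst of 'transfers' data phases. */
unsigned extra_wait_states(unsigned transfers, unsigned per_wait)
{
    assert(transfers > 0u && per_wait > 0u);
    return (transfers - 1u) / per_wait;
}
```

At 25 MHz the clock period is 40 ns, so a 50-ns page cycle slips 10 ns per transfer and four transfers fit before a wait state is due. A 32-byte transaction (four transfers) needs none, while a 128-byte transaction (16 transfers) would need three under these assumptions.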
Among the numerous approaches to DRAM control
design, the implementation required for an Mbus page-mode controller has no special peculiarities except for the

at the module; informs the DRAM, PROM or ID generation block of the request; and supports the transaction by
providing control of the data transceiver, generating the
MRDY as needed, and terminating the transaction when
complete.
The decode block can locate the DRAM anywhere in
the system's physical address space on a 1 Mbyte boundary. This is accomplished using the Mbus configuration
facility to load the DRAM position in the module's port
register.
A block diagram of the decode function appears in
Figure 10. The block consists of a high-speed bus decode
PLD, a pair of 22V10s for the address decode, the Mbus
port register that contains the DRAM map position, and
an auxiliary decode PLD for control decode within the
module. The interface decode also provides transaction
control for the data-bus buffer. The buffer is an FCT648
transceiver/register, which can be configured with a combinational or registered data path in either direction. The
DRAM design requires a registered datapath on a read
operation.
The bus decode block provides the buffered clock for
the module and generates MRDY out to the Mbus when
the module is active. The ready signal is derived from the
DRAM and PROM blocks' RDYSTB signals.
The block generates LDSTB to the decode registers
when MAS is asserted on Mbus. The block detects that
the module is active by monitoring the match signals from
the address decode.
The decode block controls the bus buffer via BUSDIR and BUSSEL. BUSDIR nonnally causes data to go
from Mbus into the module but reverses direction when
the module is accessed and the transaction is a read operation. BUSSEL controls the type of output data path from
the module for the transaction. When a DRAM read access occurs, the data path is registered; when a PROM or
configuration port read access is executed, the datapath is
combinational. These transactions are decoded using the
match decodes and RD/WT. The bus decode function terminates the transaction by using CLR to clear the block's

Table 6. Memory System Performance/Cost Analysis
Clock                                               Wait    Absolute     Relative     Relative    Relative
MHz    Description                                  Cycles  Performance  Performance  Complexity  Cost
40     35ns 1Mx1 BiCMOS, Non-multiplexed            4       0.88         0.88         0.8         2.0
40     70ns 256Kx4 Static Col, 2 way interleave     5       0.86         0.86         1.5         1.5
40     60ns 1Mx1 DRAM, Fast Page Mode               7       0.83         0.83         1.0         1.2
33     35ns 1Mx1 BiCMOS, Non-multiplexed            2       0.77         0.94         0.8         2.0
33     70ns 1Mx1 DRAM, Fast Page Mode               4       0.72         0.88         1.0         1.0
25     45ns 1Mx1 BiCMOS, Non-multiplexed                    0.60         0.97         0.8         1.7



[Block diagram labels: MRDY, MCK, MAS, ID 0-3, MAD, DRAM ARRAY, ID GEN]

Figure 9. Mbus Memory Module Block Diagram

[Block diagram of the interface decode function. Blocks: bus decode, address decode (two 22V10s), map position register, and auxiliary decode (22V10). Signals: CLK, MAS, RDYSTB, CLRSTB, RMAT, PMAT, CMAT, MRDY, MCK, BUSDIR, BUSSEL, CLR, LDSTB, RD/WT, ID 0-3. Match conditions: PROM match when ADR = 00000H; RAM match when ADR = map position; configuration match when ADR = FFn, where n = ID.]

-> l timeLoop, 31
timeLoop:
008f9c30 033fffff sethi %hi(0xffffffb8), %g1
008f9c34 820063b8 add %g1, %lo(0xffffffb8), %g1
008f9c38 9de38001 save %sp, %g1, %sp
008f9c3c 90102064 mov 0x64, %o0
008f9c40 d027bffc st %o0, [%fp - 0x00000004]
008f9c44 c027bff8 clr [%fp - 0x00000008]
008f9c48 d207bff8 ld [%fp - 0x00000008], %o1
008f9c4c d407bffc ld [%fp - 0x00000004], %o2
008f9c50 80a2400a cmp %o1, %o2
008f9c54 16800007 bge 0x008f9c70
008f9c58 01000000 nop
008f9c5c d607bff8 ld [%fp - 0x00000008], %o3
008f9c60 9602e001 add %o3, 1, %o3
008f9c64 d627bff8 st %o3, [%fp - 0x00000008]
008f9c68 10bffff8 b 0x008f9c48
008f9c6c 01000000 nop
008f9c70 90102000 clr %o0
008f9c74 b0100008 mov %o0, %i0
008f9c78 81c7e008 ret
008f9c7c 81e80000 restore
timeMain:
008f9c80 033fffff sethi %hi(0xffffffa0), %g1
008f9c84 820063a0 add %g1, %lo(0xffffffa0), %g1
008f9c88 9de38001 save %sp, %g1, %sp
008f9c8c 110023e7 sethi %hi(timeLoop), %o0
008f9c90 90122030 or %o0, %lo(timeLoop), %o0
008f9c94 7ffc8cab call timexN
008f9c98 01000000 nop
008f9c9c 90102000 clr %o0
008f9ca0 b0100008 mov %o0, %i0
008f9ca4 81c7e008 ret
008f9ca8 81e80000 restore

Sample Applications
The development system can debug, test, run, and
benchmark real-time applications. You begin the development process by creating an application in C using the
normal UNIX environment. Compile the application code
by typing:
cc -c -O -I/usr/vw/h
The -c flag suppresses linking with the UNIX C
libraries and leaves the undefined externals unresolved.
The VxWorks linking loader resolves these externals. The optional -O flag optimizes the code. The -I
flag tells the compiler where to find the VxWorks header
files.
The MIZAR/VxWorks system is linked to the
Cypress Ethernet network. To access the system, type
rlogin mizar
The VxWorks user shell displays a -> prompt. The shell retains the last 20 commands issued, and you can recall and edit them
by issuing vi-like commands. This interactive shell
evaluates and executes virtually any C expression. For example, the command
-> printf("hello world")
produces the response

hello world
You can get help by typing any of the help commands shown in Table 6. Refer to Table 7 for a list of
other useful commands and their explanations.
System developers can easily benchmark code and
time context switches within the VxWorks operating system. You create an application first in the host UNIX environment. The sample program used for illustration purposes is a C program containing little more than a loop
that iterates 100 times. This program measures its own execution time - a necessary feature for benchmarking user
applications. Figure 1 shows the source code for this application.

Figure 2. Disassembled Un-Optimized Object Code
The timexN() subroutine in this program repeatedly
executes the subroutine or function passed to it until the
subroutine's execution time is known to within ±2 percent. VxWorks supplies this routine along with an extensive library of other routines, which perform tasks ranging
from network communication to device drivers and
linked-list manipulation. These UNIX C-compatible
routines are optimized for speed and real-time operation.
The following user login session assumes that Joe has
logged into the UNIX environment with a home directory
called /home/joe and a benchmark working directory
called bench. All UNIX operating system prompts, therefore, begin with Cypress/home/joe/bench, while all
MIZAR/VxWorks operating system prompts begin with
->. To compile the source code (timetest), type the following at the UNIX prompt:
Cypress/home/joe/bench: cc -c -I/usr/vw/h timetest.c

#include "vxWorks.h"
timeLoop()
{
    int loops,i;
    loops = 100;
    for(i=0;i<loops;++i);
}

timeMain()
{
    timexN(timeLoop);
}

Figure 1. Sample C Test Code


Real-Time Embedded System Development
Table 8. Useful Debugger Commands
-> dbgHelp
dbgHelp                          Print this list
dbgInit                          Install debug facilities
b                                Display breakpoints
b        addr[,task[,count]]     Set breakpoint
bd       addr[,task]             Delete breakpoint
bdall    [task]                  Delete all breakpoints
c        [task[,addr]]           Continue from breakpoint
cret     [task]                  Continue to subroutine return
s        [task[,addr]]           Single step
so       [task]                  Single step/step over subroutine
l        [adr[,nInst]]           List disassembled memory
tt       [task]                  Do stack trace on task

To initialize the debugging facility, which allows you to
disassemble code, set breakpoints, step through code, and
perform other tasks (Table 8), type
-> dbgInit
For example, Figure 2 shows the disassembled code
for this test program. This code was produced using the
debugger command "l" (list disassembled memory). The
listing shows the hex memory address, the hex instruction
code, and the instruction itself. To execute the program,
type
-> timeMain
The operating system responds with
timex: 7500 reps, time per rep = 90 +/- 2 (2%) microsecs
This response indicates that timexN, called from timeMain,
executed timeLoop 7500 times and measured 90 ± 2
µs per execution. timeMain is the name of the subroutine
that represents the main part of the program. You can also
get the timing of this code by typing
-> timexN(timeLoop)
The system responds with

-> l timeLoop, 13
_timeLoop:
008f9bd8 9a102000 clr %o5
008f9bdc 9a036001 add %o5, 1, %o5
008f9be0 80a36064 cmp %o5, 0x64
008f9be4 26bfffff bl,a 0x008f9be0
008f9be8 9a036001 add %o5, 1, %o5
008f9bec 81c3e008 retl
008f9bf0 90002000 add 0, %o0
timeMain:
008f9bf4 9de3bfa0 save %sp, 0xffffffa0, %sp
008f9bf8 110023e6 sethi %hi(0x008f9800), %o0
008f9bfc 7ffc8cd1 call timexN
008f9c00 901223d8 or %o0, 0x3d8, %o0
008f9c04 81c7e008 ret
Figure 3. Disassembled Optimized Object Code

This command produces unoptimized, unlinked code
and searches for header files in the directory /usr/vw/h. To log
into the Mizar system, type
Cypress/home/joe/bench: rlogin mizar
To switch the Mizar operating system root directory to
that of the user (joe), type
-> iam "joe"
Next, switch to the working directory:
-> cd "~/bench"
To link and load the test program, type
-> ld < timetest.o

timex: 7725 reps, time per rep = 91 +/- 2 (2%) microsecs

To exit the Mizar system type
-> "".

Real-time programs can also take advantage of the C
compiler's optimization features. To create a fully optimized version of the code, type
Cypress/home/joe/bench: cc -c -O4 -I/usr/vw/h timetest.c
The significantly optimized code appears in Figure 3.
This code also has a significantly better execution time, as
shown by typing
-> timeMain

timex: 46250 reps, time per rep = 14 +/- 0 (0%) microsecs


SPARC as a Real-Time Controller
In addition to giving an overview of real-time system characteristics, this application note shows how the
Cypress SPARC chip set supports real-time operations.
Special attention is given to operating models that
either reduce procedure call overhead or minimize the
time needed for a context switch.
A real-time system must react to external events as
they happen. These systems are, by nature, event driven
as they respond to external, asynchronous stimuli and
must do so in a timely manner. If both logical correctness and timing correctness are not satisfied, severe
consequences can result. Although the need for logical
correctness is obvious, the need for timing correctness
arises from the possible physical impact of the controlling system's activities. If a computer controlling a
satellite does not respond to an external event in time,
for example, the satellite might collide with a foreign
object and be knocked out of orbit.
At the highest level, you can view a real-time system as one that acquires data and detects the occurrence of events by means of hardware inputs. These inputs are then processed and the results transmitted to
hardware outputs. An embedded computer can be used
to process the data, with a real-time operating system
controlling the computer.
When defining a real-time system, it is essential to
partition the functions to be performed into individual
units called tasks. Each task is implemented as a
software module that can be invoked to perform a
specific function. Although many tasks are usually associated with a real-time system, only a limited number
of processors is generally available to execute these
tasks. This application note concentrates on the
simplest case, where a single processor is involved.
Because multiple tasks compete for use of a limited
resource, the processor, it is crucial that tasks be
prioritized. The highest-priority task that is ready to run
at any given time must actually be running. This requirement often leads to a case where a higher-priority
task becomes ready while a lower-priority task is executing. In this case, the lower-priority task must immediately be pre-empted, and the higher-priority task must
take control of the processor. This concept of pre-emptive scheduling is essential in all real-time systems.
The real-time systems design considerations
described so far deal with the general behavior of a
real-time system. To put these generalities into perspective, consider the following example.

Dealing with Overhead
In this example, several tasks are defined in
prioritized order (task 1 through task 6). Included in
this system is a real-time clock that generates an interrupt to the processor every 500 µs. Table 1 lists the
CPU requirements for this example.
Tasks 1 through 5 all have specific jobs that require
a fixed amount of time. Task 6 checks for user commands, and thus the amount of time it needs varies
depending on whether a user command is present.
Based on the data in Table 1, the CPU time requirements for a second of processing time for each
task appear in Table 2.
Tasks 1 through 5 use 743 ms, which leaves 257 ms
for the background task to execute. This means that the
background task executes at a worst-case rate of 1.3
times per second. In the best case, the background
task's frequency is 25 times per second. This rate allows
display updating 25 times per second, while user commands can only be processed at the rate of 1.3 per
second.
So far, this example has not accounted for the overhead associated with switching processor contexts between tasks. This overhead includes several operations.
Specifically, the state of the processor at the time of
pre-emption is saved with each context switch. Then the
scheduler determines the next task to run. Finally, the
state of the new task is loaded into the processor. In
commercially available real-time operating systems, the
time required for a task switch generally ranges from 25
µs to over 100 ms for some processors.
Including a 25-µs task switch overhead, the CPU
usage during 1 second breaks down as shown in Table 3.
More than 14 percent of the total CPU time is spent on
overhead; no useful work is done during that time.
In this case, the background task only runs at a
best-case frequency of 11 times per second, while the
worst-case frequency is only once every other second.
Increasing the context switch overhead to 35 µs
produces an interesting effect associated with real-time
systems, as shown in Table 4. Although it seems as if the
system works, critical timing parameters have been violated. For example, task 5 is scheduled to run a second
time when it has not received enough CPU time to
complete its first run. To help compute context switch
overhead, you can use the example C program that appears in Appendix A.

Table 1. CPU Requirements

Task    Duration    Operating Speed
1       35 µs       2000 Hz
2       100 µs      1000 Hz
3       1 ms        333 Hz
4       200 µs      200 Hz

Table 2. CPU Time Per Second

Task         Time/Invocation    Invocations    Total Time
1            35 µs              2000           70 ms
2            100 µs             1000           100 ms
3            1 ms               333            333 ms
4            200 µs             200            40 ms
Tasks 1-5                                      743 ms

Table 3. 25-µs Context Switch Overhead

Tasks 1-5             743 ms
Switch Overhead       25 µs
Number of Switches    5733
Overhead              143 ms

Table 4. 35-µs Context Switch Overhead

Tasks 1-5             743 ms
Switch Overhead       35 µs
Number of Switches    5733
Overhead              200 ms

Interrupt Latency
The need to meet externally imposed deadlines lies
at the heart of a real-time system. In real-time computing, the correctness of the system depends not only on
the logical result of the computation, but also on the
time at which the results are produced. A system must
be fast as well as predictable.
The parameter used to specify a system's predictability is its worst-case interrupt latency. This parameter
is defined as the maximum amount of time a system
takes before responding to an external event. Interrupt
latency usually indicates a specific processor's worthiness as a real-time controller.
Interrupt latency directly affects two key system
performance factors: the guaranteed response time to
an event and the guaranteed response time of any individual task. The latter is the maximum amount of time
it takes to pass control from a lower-priority task to a
pre-empting higher-priority task.
You can think of the response time to an event as
the maximum amount of time that elapses before the
system can identify that an event has occurred and
respond with the necessary action. In the case of detecting meltdown in a nuclear power plant, the processor
could use the instructions directly from the interrupt
handler to perform the critical actions necessary to shut
the reactor down. This avoids the time penalty of a context switch.
Table 5 shows the effect of interrupt latency in a
real-time system. Many factors contribute to this effect.
The processor itself has a worst-case interrupt response
time, and the memory subsystem might also contribute
to interrupt latency. The operating system might be required to disable interrupts during critical sections of
code, thus adding to interrupt latency.
Interrupt response time varies among processors.
Some processors are designed such that they save the
entire state of the machine when an interrupt occurs. In
this case, the interrupt handler starts executing without
regard to the context of the interrupted task. Although
this practice might be convenient for the person writing
the interrupt handler, it adds to the system's overhead
and slows interrupt response time.
Other processors vector to the interrupt handler
and make the interrupt routine responsible for saving
any part of the interrupted task's state that the handler
might use. The state of the interrupted task must then be
restored upon exit from the interrupt handler. This is a
good approach because it does not introduce any unnecessary overhead.
The best approach in minimizing interrupt latency
at the processor level is to employ a dedicated set of
registers reserved for interrupt handlers. With this approach, the interrupt handler need not be concerned
with saving and restoring the interrupted task's working
registers.
Another factor you must account for is memory
system latency. In a design using dynamic memory, the
interrupt latency includes the worst-case memory-cycle
timing for fetching interrupt handler instructions. In a
cache system, the worst-case timing includes the time
penalties of a cache miss. With processors running in
the 25- to 40-MHz range, failure to consider these latencies can have drastic effects.
Just as important as the time taken to switch tasks
or respond to interrupts is the time window during
which the operating system is unable to do these things.
An operating system's ability to do a context switch in
10 µs is not useful if the operating system disables context switching for 50 ms or more while doing something
else.
An operating system might disable interrupts to
place a task in a ready queue or to access a critical
region while doing inter-task communication, resource
allocation, or task synchronization. When accessing a
critical region, a real-time system must provide a way to
get uninterrupted access to a shared variable. Some
processors support this requirement in hardware; however, the example under Access to Shared Variables shows the overhead involved when hardware does not support uninterruptable
access to shared variables.

Table 5. Effect of Interrupt Latency

Event                Worst-Case Time
Task Switch          35 µs
Interrupt latency    25 µs
Response to event    25 µs

Access to Shared Variables
This example defines two tasks (Table 6). Task 1
counts the number of input pulses from an input
stream. Task 2 reads the total number of pulses every
second, clears the count variable, and performs a series
of operations based on the total number of pulses. If
special care is not taken in accessing the shared count
variable, the following might occur:

1. Task 1 has control /* count is at 200 */
   count->register
   register+1->register
   interrupt occurs

2. Task 2 gets control (One second has elapsed)
   count->register
   0->count
   execute based on count

3. Task 1 resumes
   register->count

Table 6. Format of Tasks

Task 1                  Task 2
count->register         count->register
register+1->register    0->count
register->count

A serious problem has occurred: The variable
count contains a value of 201 when the count should be
1. This is a common problem that must be overcome in
a multitasking environment. The key to eliminating the
problem is uninterruptable updating of shared variables.
In processors without hardware support for this
capability, the only way to update a shared variable
without the possibility of pre-emption is to disable interrupts. Table 7 shows the modifications. (Note that this
solution is valid only for single-processor systems. In a
multiprocessor system, some form of hardware lockout
is essential.)

Table 7. Modified Format of Tasks

Task 1                  Task 2
disable interrupts      disable interrupts
count->register         count->register
register+1->register    0->count
register->count         enable interrupts
enable interrupts

Although this solution works, and the maximum
amount of time in which interrupts are disabled is minimal, everything is not as it seems. The main problem is
that interrupts can only be disabled in supervisor mode.
This means that a software trap must be executed, the
processor must branch to a trap vector, change into supervisor mode, execute the few uninterruptable instructions, then go back to the original point. You must consider the time during which the processor is uninterruptable
when calculating worst-case interrupt latency.

SPARC as a Real-Time Controller
As real-time systems vary widely in requirements, it
is important that a specific processor chip set provide
the flexibility to meet the needs of specific applications.
It does not make sense to pay for a processor that has a
built-in floating point unit to do strictly integer operations. The same holds true for a processor with a built-in MMU when you use only a physical memory system.
The Cypress SPARC chip set is specifically designed to
meet the needs of individual applications without forcing you to buy something you do not need. Table 8
shows the SPARC family of chips. You can use these
parts in any combination to create a system that fits
your application.

Table 8. RISC 600 Family of SPARC Chips

Device     Description
CY7C601    Integer Unit
CY7C602    Floating Point Processor
CY7C604    Cache Tag-Controller/MMU

Processor Interrupt Response Time
The CY7C601 SPARC integer unit minimizes interrupt latency at the processor level. The processor dedicates
eight of its 136 registers strictly for use by interrupt handlers. When an interrupt occurs, the interrupt
routine automatically gets a new set of eight registers
with which to work. On an interrupt, the processor
switches to supervisor mode, gets the new set of
registers, and completes execution of the first instruction in the interrupt routine in a worst-case time of 14
clock cycles. At 40 MHz that time equals 350 ns.
Two of the CY7C601's interrupt-handling registers
automatically save the program counter and next program counter of the interrupted task, with the remaining six registers at the disposal of the interrupt routine.
Upon return from the interrupt, the processor automatically restores the state of the interrupted task; this is
done in two clock cycles, or 50 ns at 40 MHz.

Achieving Deterministic Response Time
The CY7C604 CMU has two special features that
help guarantee deterministic response for systems using
either virtual or physical addressing, with or without
cache memory. The MMU allows selected pages to be
locked into the Translation Lookaside Buffer (TLB).
This capability ensures that critical memory pages are
always in main memory, avoiding the delay associated
with a table walk.
In systems using cache memory, the CY7C604 allows the cache to be locked. You can load the cache
with time-critical code, such as interrupt handlers and
time-critical tasks, and be sure that these routines will
always be present in the cache. With these features,
memory latency is no longer a problem, and predictability is guaranteed.

Semaphore Support in Hardware
Included in the CY7C601's instruction set are two
instructions that provide uninterruptable access to an
external memory location. The SWAP instruction exchanges the contents of a selected register with the contents of the addressed memory location. The atomic
load-store instruction moves a byte from memory into
the selected register and then rewrites the same byte in
memory to all ones. The CY7C601 executes both instructions without allowing intervening asynchronous
traps.
You can use either of these instructions to create a
semaphore for accessing a critical region without the
need to enter supervisor mode and disable interrupts.
The SWAP instruction can be used for counting
semaphores, and the atomic load-store is appropriate
for a simple semaphore for critical regions.

Alternate Register Models For SPARC
The Cypress CY7C601 has a total of 136 32-bit
registers, which are divided into a set of 128 local
registers and eight globals. The use of these registers is
configurable by accessing a processor register called the
Current Window Pointer (CWP). Two common operating models are supported by commercially available
compilers and operating systems: The standard register
windowing model is optimized to minimize procedure
call overhead, and an alternate model significantly
reduces the time required for a context switch.

Register Windowing Model
For the register windowing model, the register file
is divided into a set of eight overlapping register windows. Each window contains a set of 24 local registers.
The registers in each window are divided into three sets
of eight registers referred to as INS, LOCALS, and
OUTS. At any given time, the processor can access only
one window and the eight globals. The windows are
joined together in a circular stack, with each window
sharing its INS and OUTS with adjacent windows. Two
instructions provide for rotating the windows among
procedures.
A save instruction is used with a procedure call to
allocate the next window for the called procedure.
Before executing the save instruction, the calling procedure stores the parameters to be passed in its OUT
registers. Upon execution of the save instruction, the
register set is rotated such that the called procedure has
access to the passed parameters in its IN registers.
A restore instruction is used with a return from
procedure to restore the register set of the calling procedure. Before executing the restore instruction, the
called procedure stores in its IN registers the
parameters to be returned to the calling procedure.
Upon execution of the restore instruction, the register
set is rotated back to its previous position with the
returned parameters in the caller's OUT registers.
Because the processor logically provides new LOCALS and OUTS with each procedure call, local
register values need not be saved and restored across
calls. The overlapping registers also minimize the overhead of passing and returning procedure parameters
because the parameters are passed in registers instead
of the main memory stack.

Fast Task Switch Register Model

For the fast task switch register model, the register
set is divided into four non-overlapping sets of 24
registers. Three of the four register sets are dedicated
to the three highest-priority or time-dependent tasks.
All the remaining tasks share the other set of registers.
Associated with each register set is a set of eight independent registers for use by interrupt handlers. These
registers also store the state of the processor on a task
switch.
Using this register model, the processor can do a
task switch to any of the three highest-priority tasks in
under a microsecond. A task switch to one of the other
tasks can be done in less than 3 µs.
When an interrupt occurs, the processor automatically switches register sets to access the interrupt
registers corresponding to the new task. If the interrupt
initiates a task switch, the state of the processor is saved
in the interrupt registers. If the new task is one of the
three high-priority tasks, the task's state is loaded from
its dedicated interrupt registers, and execution begins
immediately. In this case, the state of the machine is
merely the PSR, PC, NPC and possibly a few other control registers. The general-purpose registers are not affected, as they are dedicated to general-purpose tasks.
If the new task shares a set of registers, the state of
the task previously using that register set is saved to
memory and the new task's state is loaded into the
processor. This state includes the minimal processor
state as well as the 24 general-purpose registers.
To understand this model's task switching behavior,
consider two examples:

Example 1: Switching to a higher-priority task
1) Interrupt occurs
   Automatically switches to interrupt registers
   PC and NPC saved in interrupt registers
2) Save PSR and any other control registers to interrupt registers
3) Load the pointer to the new task's interrupt registers into the CWP
4) Restore new task's PSR and any other control registers
5) Execute RETT (return from trap)

Example 2: Switching to a lower-priority task
1) Interrupt occurs
   Automatically switches to interrupt registers
   PC and NPC saved in interrupt registers
2) Save PSR and any other control registers to interrupt registers
3) Load pointer to the shared set of working registers into the CWP
4) Save the registers to memory
   (these are the registers of the previous task using the window)
5) Restore the working registers of the new task from memory
6) Update the CWP to point to the shared task's interrupt registers
7) Save to memory the eight interrupt registers containing the
   state of the previous task running out of these registers
8) Restore the state of the new task
9) RETT (return from trap)

Each register model has certain advantages. Using
register windowing significantly reduces both procedure-call overhead and data-bus traffic as parameters
are passed in registers. This approach also has the effect of caching local variables because each procedure
gets a new set of local registers. The price paid for this
advantage lies in the context switch overhead. On a context switch, the processor must save and restore all the
used registers (up to 120, as determined by the Window Invalid Mask (WIM), a processor status register).
When using the fast context switch register model,
on the other hand, you do not get the ultra-fast procedure calls that result from register windowing. You do
get the benefit of four separate register files and very
fast context switching, however. In this model,
parameters are passed on the stack as is done on most
other architectures. Each task's allocation of 24 general-purpose local registers and eight global registers is the
same as the total number of registers in most other architectures.
Because register usage in the CY7C601 is configurable by software, you can mix these models to
achieve the benefits of both. The SPARC register set
and the entire Cypress chip set have been designed to
cover a wide range of applications efficiently.

Appendix A. Sample C Program to Compute Context Switch Overhead
/*****************************************************************************/
/*                                                                           */
/* This program is used for determining the overhead of context switching   */
/* in a real-time system. This simulation does not take into account        */
/* interrupt latency, memory latency, or any of the other many possible     */
/* forms of overhead associated with a real-time system, but these can      */
/* easily be added. The current version should be sufficient to give a good */
/* idea of how much time the kernel is spending on context switching.       */
/*                                                                           */
/*****************************************************************************/
#include <math.h>
#include <stdio.h>
#define BCKGRND 100
FILE *fp; int openfile; char fname[35]; int numtasks;
main (argc, argv)
int argc;
char *argv[];
{
    int   i,j;
    int   iterations;
    int   curr_task;
    int   time[100];
    int   duration[100];
    int   frequency[100];
    int   total;
    float background;
    int   swtime;
    int   switchh;
    int   temp;
    int   temp1;
    int   sampfreq;
    float tempflt;
    float cs_time;

    create_file();
    temp = 0;

    /* get number of simulation points per second */
    while (temp == 0)
    {
        place (7,4,"Enter the sampling rate in Hz (100 - 10000) : ");
        locate (7,58);
        ceol();
        iterations = 0;
        temp = getchar();
        while (temp != 0xa)
        {
            /* accumulate decimal digits, ignore anything else */
            if ((temp >= 0x30) && (temp <= 0x39))
            {
                temp = temp - 0x30;
                iterations = iterations * 10;
                iterations = iterations + temp;
            }
            temp = getchar();
        }
        temp = 1;
        locate (22,5);
        ceol();
        if (((iterations % 100) != 0) || (iterations < 100) || (iterations > 10000))
        {
            temp = 0;
            place (22,5,"Error must be >= 100 or <= 10,000 and a mult of 100");
        }
    }

    /* time in microseconds of one clock tick */
    sampfreq = 10000 / (iterations / 100);

    place (8,4,"Enter the context switch overhead in microseconds : ");
    locate (8,58);
    swtime = 0;
    temp = getchar();
    while (temp != 0xa)
    {
        if ((temp >= 0x30) && (temp <= 0x39))
        {
            temp = temp - 0x30;
            swtime = swtime * 10;
            swtime = swtime + temp;
        }
        temp = getchar();
    }
    temp = 0;
    while (temp == 0)
    {
        place (9,4," Enter the number of tasks (100 max) : ");
        locate (9,58);
        ceol();
        numtasks = 0;
        temp = getchar();
        while (temp != 0xa)
        {
            if ((temp >= 0x30) && (temp <= 0x39))
            {
                temp = temp - 0x30;
                numtasks = numtasks * 10;
                numtasks = numtasks + temp;
            }
            temp = getchar();
        }
        temp = 1;
        locate (20,5);
        ceol();
        if (numtasks > 100)
        {
            temp = 0;
            place (20,5,"Maximum number of tasks is 100");
        }
    }
    /* tasks numbered 0 to n */
    numtasks = numtasks - 1;
    for (i = 0; i <= numtasks; i++)
    {
        temp = 0;
        while (temp == 0)
        {
            locate (i+11,4);
            printf ("Enter the frequency of task %d in Hz",i);
            locate (i+11,60);
            ceol();
            frequency[i] = 0;
            temp = getchar();
            while (temp != 0xa)
            {
                if ((temp >= 0x30) && (temp <= 0x39))
                {
                    temp = temp - 0x30;
                    frequency[i] = frequency[i] * 10;
                    frequency[i] = frequency[i] + temp;
                }
                temp = getchar();
            }
            locate (20,5);
            ceol();
            locate (21,5);
            ceol();
            if (frequency[i] > 0)
            {
                if ((iterations % frequency[i]) != 0)
                {
                    locate (20,5);
                    printf (" Warning: %d and the simulator frequency: %d are not multiples",frequency[i],iterations);
                    place (21,5," Would you like to re-enter the value (not mandatory) (y/n) : ");
                    locate (21,70);
                    temp = getchar();
                    temp1 = getchar();    /* CR */
                    if ((temp == 'Y') || (temp == 'y'))
                        temp = 0;
                    else
                        temp = 1;
                }
            }
            else
            {
                place (20,5," Frequency must be greater than zero ");
                temp = 0;
            }
        }
        /* frequency[i] will be used with modulo operator to see when task ready */
        frequency[i] = iterations / frequency[i];    /* integer divide */
    }
    locate (20,5);
    ceol();
    locate (21,5);
    ceol();
    for (i = 0; i <= numtasks; i++)
    {
        locate (i+numtasks+14,4);
        printf ("Enter the duration of task %d in microseconds",i);
        locate (i+numtasks+14,60);
        duration[i] = 0;
        temp = getchar();
        while (temp != 0xa)
        {
            if ((temp >= 0x30) && (temp <= 0x39))
            {
                temp = temp - 0x30;
                duration[i] = duration[i] * 10;
                duration[i] = duration[i] + temp;
            }
            temp = getchar();
        }
    }
    /* initialize current task */
    curr_task = BCKGRND;    /* init current task, task switch needed for 1st task */
    /* background task time of execution */
    background = 0;
    /* number of context switches */
    switchh = 0;
    /* init total time left in this time slice */
    total = 0;
    /* check to see whether a disk file is to be opened */
    if (openfile == 1)
        init_file();
    cls();
    /* init time spent in individual tasks */
    for (i = 0; i <= numtasks; i++)
        time[i] = 0;
    /* iterations start at 0 */
    iterations = iterations - 1;
    /* main simulation loop */
    for (j = 0; j <= iterations; j++)    /* number of samples */
    {
        /* screen output to show system didn't die */
        if ((j % 100) == 0)
        {
            locate (10,6);
            printf (" Doing simulation loop %d of %d \n",j,iterations+1);
        }
        total = total + sampfreq;    /* increment clock time for each time slice */
        /* scheduling of tasks */
        for (i = 0; i <= numtasks; i++)

        {
            /* check if task is scheduled to execute */
            /* ASSUMPTION: the statement for this check is missing from the */
            /* listing; the two lines below are a guess based on the        */
            /* frequency[] comment above, not the original code             */
            if ((j % frequency[i]) == 0)
                time[i] = time[i] + duration[i];
            /* check if there is time to run the task */
            if (total > 0)
            {
                /* is this particular task ready to run */
                if (time[i] > 0)
                {
                    /* does a context switch actually take place */
                    if (i != curr_task)
                    {
                        total = total - swtime;    /* context switch time */
                        switchh = switchh + 1;     /* # of context switches */
                    }
                    /* can task time slice be completed */
                    if (total >= time[i])
                    {
                        if (openfile == 1)
                            fprintf(fp,"%d",time[i]);
                        total = total - time[i];   /* time left in slice */
                        time[i] = 0;               /* update ready list */
                        curr_task = i;             /* mark as last task to run */
                    }
                    /* can run portion of task */
                    else
                    {
                        /* use remaining time available in simulation slice */
                        if (total > 0)
                        {
                            /* time still required by the task */
                            time[i] = time[i] - total;
                            total = 0;             /* time slice has expired */
                        }
                        /* mark in sim file that a context switch has started for */
                        /* one task but a higher priority task has become ready   */
                        /* and has pre-empted the scheduled task                  */
                        else
                            if (openfile == 1)
                                fprintf(fp,"X");
                        /* mark state of processor */
                        curr_task = i;
                    }
                }
                else
                    if (openfile == 1)
                        fprintf(fp,"-");
            }
            else
                if (openfile == 1)
                    fprintf(fp,"-");
        }
        /* background */
        /* if time left after all scheduled tasks have run, let background task run */
        if (total > 0)
        {
            /* check to see if background was last to use the processor */
            if (curr_task != BCKGRND)
            {
                switchh = switchh + 1;
                total = total - swtime;
            }
            curr_task = BCKGRND;    /* set curr_task to background */
            if (total > 0)
            {
                if (openfile == 1)
                    fprintf(fp,"%d",total);
                /* add to background task execution time */
                background = background + total;
                total = 0;          /* background takes all remaining time */
            }
            else
                if (openfile == 1)
                    fprintf(fp,"X");
        }
        else
            if (openfile == 1)
                fprintf(fp,"-");
        if (openfile == 1)
            fprintf(fp,"\n");
    }
    /* screen output */
    cls();
    for (i = 0; i <= numtasks; i++)
    {
        tempflt = (((float)iterations + 1) / (float)frequency[i]) * (float)duration[i];
        tempflt = tempflt / 1000;
        locate (i+4,6);
        printf (" Total execution time for task %d : %6.2f ms",i,tempflt);
    }
    locate (numtasks + 6,6);
    printf (" There were %d context switches",switchh);
    cs_time = ((float) swtime * (float) switchh) / 1000;
    locate (numtasks + 8,6);
    printf (" Context switch overhead : %6.2f ",cs_time);
    locate (numtasks + 10,6);
    printf (" Time available for background tasks: %6.2f ",background / 1000);
    locate (numtasks + 15,6);
    printf (" For more info look at simulation file ");
    locate (22,1);
    if (openfile == 1)
        fclose(fp);
}

/* screen utilities supported with ansi.sys */

/* clear screen utility */
cls()
{
    printf ("%c[2J",27);
}

/* clear to end of line */
ceol()
{
    printf ("%c[K",27);
}

/* move the cursor to row,col */
locate (row,col)
int row,col;
{
    printf ("%c[%d;%dH",27,row,col);
}

/* print text at row,col */
place (row,col,text)
int row,col; char text[];
{
    locate (row,col);
    puts (text);
}
/* ask for an optional output file; openfile is set to 1 if one was opened */
create_file()
{
    int temp,i;
    openfile = 0;
    for (i = 0; i < 35; i++)    /* clear all of fname[35] (loop bound assumed) */
        fname[i] = 0;
    cls();
    place (5,4,"Enter file to be created (Return for no file): ");
    locate (5,51);
    i = 0;
    temp = getchar();
    if (temp != 0xa)
    {
        while (temp != 0xa)
        {
            fname[i] = temp;
            i++;
            temp = getchar();
        }
        fp = fopen(fname,"w");
        openfile = 1;
    }
}

/* write the column headings of the simulation file */
init_file()
{
    int i;
    fprintf(fp,"\n\n\n\n Simulation Results \n\n\n\n\n");
    fprintf(fp," Tick ");
    for (i = 0; i <= numtasks; i++)
        fprintf(fp,"task%d ",i);
    fprintf(fp,"background \n\n");
}


Memory Protection and Address Exception
Logic for the CY7C611 SPARC Controller
This application note describes an address validity
check circuit for the Cypress CY7C611 SPARC-compatible RISC controller. The design provides validity
checks on 32-bit word boundaries for the entire 24-bit
CY7C611 address space. If an address falls outside a
valid boundary, the check circuit generates a memory
exception.
The absence of a memory management unit
(MMU) often distinguishes an embedded microprocessor from a central processing unit. This does not mean
that some MMU functions are not desired for embedded applications, but these applications usually do
not need the full range of such functions (mapping, access protection, validity check, etc.). However, a circuit
that performs an address validity check definitely has
applications in embedded systems, provided that the
circuit can be implemented with a reasonable number
of components. The circuit described here is implemented in two Cypress EPLDs: a CY7C332 and a
CY7C361.
The circuit contains two functional blocks: the address-checking circuit and the memory-exception generator (Figure 1). The address-checking circuit checks the
SPARC processor's most significant 22 address bits
against an arbitrary memory map. The memory map
used for this design appears in Table 1. If an address
exception occurs, the memory-exception generator
sends a memory exception to the processor.

[Figure 1: block diagram showing the CY7C611 integer unit, the CY7C332 address check circuit (ACC), and the CY7C361 memory exception generator (MEG); signals shown: ADDRESS, /WE, /NULL, WRT, /MHOLDA, /MHOLDB, /MDS, /MEXC]
Figure 1. Block Diagram

Table 1. System Memory Map

ADDRESS            DESCRIPTION               EXCEPTION
000000 - 07FFFF    Boot PROM                 N
080000 - 0FFFFF    Unused                    Y
100000             I/O Status Reg.           N
100004             I/O Control Reg.          N
100008 - 1FFFFF    Unused                    Y
200000 - 5FFFFF    4Mb RAM                   N
600000 - BFFFFF    Unused                    Y
C00000 - DFFFFF    I/O Interface             N
E00000 - FFFFFF    Reserved for expansion    Y
These circuits offer an example of how the logic
functions built into the CY7C332 and CY7C361 can implement designs that would otherwise be very difficult
to implement. The CY7C332's transparent latch mode
permits the design to make the most of the 10-ns address setup time provided by the CY7C611 SPARC
controller. The CY7C361 combinatorial input configuration acts upon the exception information without
incurring any extra clock delay, while the single-registered configuration holds CY7C611 bus transaction
information. The CY7C361 Mealy input inhibits
memory exceptions that might occur because of a nullified address from the CY7C611, and the termination
macrocell configuration inhibits further exceptions
before the current exception completes.

CY7C611 Memory Interface
The CY7C611 sends most of its memory interface
signals out unlatched. Thus, these signals are only valid
a short time before and after the system clock's rising
edge. To be used, they must be latched outside the
processor.

[Figure 2: timing diagram; signals shown: CLK, A(23:0), D(31:0), /NULL, WRT, AEXC, MHOLDx, MDS, MEXC]
Figure 2. Load Bus Cycle

The processor sends out a few latched signals.
/NULL is the only one of these signals used in this
design. The processor holds /NULL until the next rising
clock edge, and the signal is not designed to be latched
externally.
The Load, Load Double, Store, and Store Double
bus cycles for the CY7C611 appear in Figures 2, 3, 4,
and 5, respectively. Figure 6 illustrates a Load with
Memory Exception, and Figure 7 illustrates a Store with
Memory Exception. Note that in both cases, MHOLD
becomes active more than one clock after the address
causing the memory exception leaves the bus. This is acceptable because the data corresponding to that address is just being clocked into the CY7C611's fetch
pipeline stage, and thus can be easily invalidated.

Signal Description
The signals used in the design are:
1. /MHOLD(A/B) - Memory Hold (CY7C611 inputs)
These two signals are ORed together inside the
processor. Either is asserted to freeze the processor's
pipeline, which is the first thing that must be done to
generate a memory exception. The CY7C611's outputs
revert to and maintain the value they had at the clock's
rising edge in the cycle in which either signal was asserted. These inputs are sampled at the processor
clock's falling edge.
This design's state machines use both /MHOLDA
and /MHOLDB. The store exception state machine
uses /MHOLDA, and the load exception state machine
uses /MHOLDB (Figures 8 and 9).
2. /MDS - Memory Data Strobe (CY7C611 input)
The memory-exception generator asserts Memory
Data Strobe to strobe the /MEXC memory exception
signal into the CY7C611. /MDS can only be asserted
when the pipeline is frozen via assertion of /MHOLDA
or /MHOLDB. /MDS must be de-asserted before or
simultaneously with the release of memory hold (Figures
6 and 7).
3. /MEXC - Memory Exception (CY7C611 input)
Assertion of this signal initiates an instruction access or data access exception trap and indicates to the
processor that an attempt was made to access an invalid
address. /MEXC serves as a qualifier for the /MDS signal and must be asserted when both /MHOLD(A/B)
and /MDS are already asserted. When /MEXC is
generated with /MDS, the contents of the data bus are
ignored. /MEXC is latched on the clock's rising edge
and is used in the subsequent cycle. /MEXC must be
released in the same cycle that memory hold is released
(Figures 6 and 7).
4. /NULL - Instruction Nullify (CY7C611 output)
The processor asserts /NULL to indicate that the
current memory access is being nullified. The signal is
asserted in the same cycle in which the address being
nullified is active, although the address is no longer on
the address bus. The address is held in external latches.
/NULL is used to disable memory exception generation
for the current memory access. This means /NULL
should not be asserted during a memory exception.
/NULL is asserted under the following conditions:
During the second data cycle of any store instruction, to nullify the second occurrence of the store address, i.e., if the address was valid the first time, it is still valid the second time
On all traps, to nullify the third instruction fetch after the trapped instruction
On a load in which the hardware interlock is activated
On JMPL and RETT instructions
/NULL is used as a Mealy input to the memory-exception generator to inhibit memory exception generation during nullified load memory accesses. The signal
has no effect on the store exception state machine
(Figures 2 through 7).
5. /WE - Write Enable (CY7C611 output)
/WE is asserted during the cycle(s) in which store
data is on the data bus. The address-checking circuit
uses this signal to inhibit generation of address exceptions after the store has begun (Figures 4, 5, and 7).
6. WRT - Advanced Write (CY7C611 output)
WRT is asserted in two cases: during the first store
address cycle of integer single or double store instructions and during the second load/store address cycle of
atomic load/store instructions. The memory-exception
generator uses this signal to enable either the store
(WRT = 1) or load (WRT = 0) state machines
(Figures 2 through 7).
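
The ordering rules stated above for /MHOLD, /MDS, and /MEXC can be summarized as a small trace checker. The following is a minimal sketch in Python, not part of the original design: the `Cycle` record and `check_exception_handshake` function are hypothetical names, each record represents one clock cycle, and `True` means the (active-Low) signal is asserted. It checks only the cycle-level rules given in the text.

```python
from dataclasses import dataclass


@dataclass
class Cycle:
    """One clock cycle of a hypothetical trace; True means 'asserted'."""
    mhold: bool  # /MHOLDA or /MHOLDB asserted (pipeline frozen)
    mds: bool    # /MDS asserted
    mexc: bool   # /MEXC asserted


def check_exception_handshake(trace):
    """Return a list of (cycle_index, message) for every rule violation."""
    problems = []
    for i, c in enumerate(trace):
        # /MDS may only be asserted while the pipeline is frozen.
        if c.mds and not c.mhold:
            problems.append((i, "/MDS asserted without /MHOLD"))
        # /MEXC qualifies /MDS, so it needs both /MHOLD and /MDS.
        if c.mexc and not (c.mhold and c.mds):
            problems.append((i, "/MEXC asserted without /MHOLD and /MDS"))
    return problems


# A well-formed exception: freeze, strobe the exception, release together.
good_trace = [
    Cycle(False, False, False),
    Cycle(True, False, False),
    Cycle(True, True, True),
    Cycle(False, False, False),
]

# A malformed trace: /MDS strobed while the pipeline is still running.
bad_trace = [Cycle(False, True, False)]
```

Calling `check_exception_handshake(good_trace)` yields an empty list, while the malformed trace reports a violation in cycle 0.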

Address-Checking Circuit
The address-checking circuit fits completely into a
single CY7C332. This PLD was chosen because it has
the required number of I/O pins, a very narrow capture
window, and inputs that are configurable as latches.
Configuring the inputs as latches allows you to make
maximum use of the CY7C611's 10-ns address setup
before the system clock's rising edge (more on this
later).

The inputs to the address-checking circuit are the
22 most significant bits of the processor address bus,
the system clock (SCLK), and /WE. The output of the
address-checking circuit is a single line: Address
EXCeption (/AEXC). /AEXC is inhibited when /WE is
active. Figures 4 and 5 show that /WE is active only
when store data is on the data bus, i.e., after the first
address cycle of a store. At this point, because it is too
late to stop a store and generate a memory exception,
/AEXC is inhibited. Note that /AEXC is inhibited only
on the data portions of store bus cycles.
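
The behavior of the address-checking circuit can be sketched in software as follows. This is an illustrative Python model only: the function name `address_exception`, the 24-bit address width (taken from the A(23:0) label in the bus-cycle figures), and the valid range in the example are assumptions, since the actual protected regions are defined by the PLD equations.

```python
def address_exception(address, we_active, valid_ranges):
    """Model of /AEXC: True when an address exception should be raised.

    address      -- processor address (assumed 24 bits wide)
    we_active    -- True while /WE is asserted (store data on the bus)
    valid_ranges -- hypothetical list of (low, high) word-address pairs
    """
    # /AEXC is inhibited on the data portion of a store cycle.
    if we_active:
        return False
    # Only the 22 most significant bits (A23..A2) are examined.
    word_address = (address & 0xFFFFFF) >> 2
    return not any(low <= word_address <= high for low, high in valid_ranges)


# Hypothetical memory map: the first megabyte is the only legal region.
EXAMPLE_RANGES = [(0x000000 >> 2, 0x0FFFFF >> 2)]
```

With this map, an access to `0x000010` is legal, an access to `0x200000` raises the exception, and the same illegal address is ignored once `/WE` is active.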

Memory-Exception Generator
The memory-exception generator occupies roughly
1/3 of a CY7C361 PLD and must accomplish two
things. First, the circuit must respond to address exceptions generated by the address-checking circuit. Second,
the memory-exception generator must know when not
to respond to memory exceptions generated by the address-checking circuit.
The second case requires the use of a Mealy
input/output pair in the CY7C361. The CY7C361 was
chosen for its Mealy I/O capability and its input configurability. Each input can be configured as single
registered, double registered, or combinatorial. This
design uses both single-registered and combinatorial
inputs.
At first glance, /NULL looks like the perfect signal
to inhibit memory-exception generation and reset the
memory-exception generator to its initial state. But as
Figure 7 shows, if the store's first address cycle causes
the address exception, /MHOLDx is asserted just after

[Timing diagram: CLK, A(23:0), D(31:0), /NULL, /WE, WRT, /MEXC]
Figure 3. Load Double Bus Cycle


[Timing diagram: CLK, A(23:0), D(31:0), /NULL, /WE, WRT, /AEXC, /MHOLDx, /MDS, /MEXC]
Figure 4. Store Bus Cycle
the next SCLK rising edge. At this point, /NULL inhibits /MHOLDx and resets the exception circuit before
it can generate an exception - an undesired chain of
events.
To avoid this problem, the memory-exception generator is actually two state machines - one for stores
and one for loads. The load-exception state machine
has a Mealy output connected to the CY7C611's
/MHOLDB input, and the store-exception state
machine has a regular CY7C361 output connected to
the /MHOLDA input. Thus, /NULL can inhibit nullified load transactions but has no effect on stores.
The equivalent function for stores is accomplished
in the address-checking circuit with the /WE input.
When /WE is active, the address-checking circuit cannot generate address exceptions. Memory exceptions

iffi RST;
ICKEN = ; {always on}

{*************************************************************
S states are states for Store operations, R states are states for Read
operations. DCLK is used to ensure sampling only when inputs are valid.
**************************************************************}
S0 =  WRT * AEXC * !DCLK * !S1;   {These conditions start state machine}
S1 = <prod> S5;                   {S1 is term, so S5 terminates this output}

R0 =  !WRT * AEXC * DCLK * !R1 * !R2;   {These conditions start state machine}
R1 =  R0 * I!NULL * !DCLK;              {R2 is term, so R5 terminates this output}
R2 = <prod> R5;                         {R5 turns off the state machine}
MHOLDA = <inv sum> !S1 * !S0;                {Active for 0, 1}
MDS    = <inv sum> !S4 * !S5 * !R4 * !R5;    {Active for 4 and 5}
MEXC   = <inv sum> !S4 * !S5 * !R4 * !R5;    {"      "}
MHOLDB = <inv sum> !R2 * !R1 * !R0;          {Active for 0, 1, 2}

Section Contents
Bus Products
Features of the VIC068 VMEbus Interface Controller
Interfacing the VIC068 to the MC68020

Features of the VIC068
VMEbus Interface Controller
This application note describes some features of the
Cypress VIC068 and provides information on how to use
the device.
The VIC068 was designed by a consortium of
VMEbus manufacturers in partnership with Cypress. The
major goals of this consortium were to achieve a standardized, reasonably priced VMEbus interface that was not
dominated by any board manufacturer. Manufacturing
this specialized chip requires a high-speed process (125
MHz) and high-power I/O pins (64 and 48 mA).
The VIC068 adheres to the ANSI/IEEE Standard
1014, which minimizes the problems of interfacing among
the VMEbus boards of various manufacturers. A block
diagram detailing the device's functional blocks appears
in Figure 1.

VIC068 Highlights
With very precise timing, based on a 64-MHz clock
that is used internally to make decisions on 8-ns intervals,
you can reach the theoretical limits of the VMEbus transfer rates - a block transfer rate of 40 MBytes/s.
Because all logic resides in a single chip, the VIC068
greatly reduces the board space necessary to interface to
the VMEbus. Even a highly sophisticated interface with
an A32/D32 system controller and block transfer support
requires no more than 60 square cm or 20 percent of a
double eurocard (6U card).
Special care has been taken to speed up the VIC068's
VMEbus access. Although many of today's CPU boards
use megabytes of high-speed local RAM to limit the number of VMEbus accesses, the accesses that do occur for
I/O or data reads and writes must be done efficiently to
avoid slowing the rest of the system.
For both types of data transfers, the VIC068 offers
special support. For single-write cycles, you can program
the VIC068 to operate in the so-called master or slave
write-posting mode. In the master write-posting mode
(Figure 2), the local VMEbus write cycle is terminated
locally as soon as data is latched in the VMEbus latches.
This allows the local CPU to continue with instruction
fetches or other operations while the VIC068 transfers
data over the VMEbus.
In slave write-posting mode (Figure 3), the same
function happens with write cycles from the VMEbus to
the local bus. As soon as the data is latched, the VMEbus
cycle is terminated and the local cycle can finish independently of further VMEbus traffic. Both modes reduce
CPU overhead and VMEbus utilization, providing higher
bandwidth in single-cycle writes.
The VMEbus prohibits a similar function in single-cycle reads because every read cycle on the VMEbus
could turn out to be a read-modify-write (RMW) cycle.
This cannot be foreseen because the only difference is that
the address strobe is held Low between the two cycles.
Therefore, if the VMEbus address strobe were released
during the two portions of the same RMW cycle, another
VMEbus master could break into that cycle and modify
the same data.
To move blocks of data over the VMEbus, the
VIC068 uses the block transfer mode. In its standard
form, this mode allows a processor to transfer up to 256
bytes with just one starting address supplied to the
VMEbus. Additionally, the VIC068 uses a type of pipelining to accelerate VMEbus throughput. On a block transfer
read cycle, the slave VIC068 automatically prefetches the
n+1 data byte during the same read. The nth data byte is
transferred across the VMEbus, and the n-1 byte is latched
in local RAM. As shown in Figure 4, this operation uses
all three buses in overlapped and parallel operation to
speed up the transfer. Write transfers use the same
mechanism.
The limiting factor on the VMEbus transfer rate is
either the VMEbus's many timing restrictions or the
source or destination memories. If the memory consists of
dynamic RAM, the restriction is probably the cycle time
of the chips used, often as slow as 200 ns. To overcome
this limitation, the VIC068 offers a programmable access
mode so that attached DRAM can be used in page mode.
After a starting row address cycle (RAS), all subsequent cycles need only a column address (CAS) to
reduce the access time, often by as much as half. For a
slave interface, the VIC068 contains all the necessary
counters and timing elements for local AS, DS, and address generation.
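
The three-way overlap used during a block transfer read (item n+1 being prefetched by the slave, item n crossing the VMEbus, item n-1 being written into the master's local RAM) can be visualized with a short simulation. The following Python sketch is illustrative only; the function `block_read_pipeline` and its step-based timing are assumptions made for the example and do not model actual VIC068 timing.

```python
def block_read_pipeline(num_items):
    """Return one (prefetched, on_vmebus, written_to_ram) tuple per step.

    Each entry is the index of the data item handled by that stage in
    that step, or None when the stage is idle (pipeline fill or drain).
    """
    def item_or_none(index):
        return index if 0 <= index < num_items else None

    steps = []
    # Two extra steps are needed to drain the last items out of the pipe.
    for step in range(num_items + 2):
        steps.append((
            item_or_none(step),      # slave prefetches item n+1
            item_or_none(step - 1),  # item n is on the VMEbus
            item_or_none(step - 2),  # master writes item n-1 to local RAM
        ))
    return steps


for prefetched, on_bus, written in block_read_pipeline(3):
    print(prefetched, on_bus, written)
```

In this model a transfer of N items completes in N + 2 steps rather than the 3N steps a fully serialized sequence would need, which is the benefit of overlapping the three buses.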

[Block diagram: interprocessor communications registers and switches, VME buffer control logic, control signal transformation, and related functional blocks]
Figure 1. VIC068 Functional Block Diagram

[Timing diagram: local AS, VMEbus access, VMEbus AS, local DTACK, VMEbus DTACK]
Figure 2. Master Write Posting

[Timing diagram: VMEbus AS/DS, slave select, local AS/DS, VMEbus DTACK, local DTACK]
Figure 3. Slave Write Posting

A master block transfer needs two or three additional
latches for the higher address lines during the local DMA
part of the block transfer. Thus, even with low cost
DRAMs, the VIC068's block transfer rate can reach 40
MBytes/s, limited only by the VMEbus specification and
the physical characteristics of the VMEbus.
This transfer rate decreases the time needed to load
programs or move data to graphics boards, as well as increasing the VMEbus's bandwidth, thereby allowing more
CPUs to work together in a multiprocessor system.
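
The relationship between memory cycle time and block transfer rate can be checked with simple arithmetic. The sketch below is a worked example in Python, assuming 32-bit (4-byte) transfers and 1 MByte = 10^6 bytes; the function name `transfer_rate_mbytes_per_s` is hypothetical. It uses the 200-ns DRAM cycle time mentioned in this note and the roughly halved access time attributed to page mode.

```python
def transfer_rate_mbytes_per_s(bytes_per_transfer, cycle_time_ns):
    """Sustained rate in MBytes/s for back-to-back transfers."""
    # bytes per nanosecond times 1000 gives MBytes per second.
    return bytes_per_transfer * 1000.0 / cycle_time_ns


# Conventional DRAM cycle of 200 ns with longword (4-byte) transfers.
normal_rate = transfer_rate_mbytes_per_s(4, 200)

# Page mode cutting the access time to about half (100 ns).
page_mode_rate = transfer_rate_mbytes_per_s(4, 100)

print(normal_rate, page_mode_rate)
```

Under these assumptions, a 200-ns cycle supports about 20 MBytes/s, and halving the cycle to 100 ns is what makes the quoted 40 MBytes/s reachable.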

Interrupt Generator
The VIC068 handles up to seven simultaneous pending IRQs with separate vectors. The VIC068 also provides
independent local IRQ vectors, if external IRQs are
served.

Miscellaneous Features
The VIC068 furnishes several features for VMEbus
support:
SYSFAIL generation
Software reset
ACFAIL
BERR register for detailed information
For local support, the VIC068 provides these features:
Seven local IRQ sources, all level, polarity, edge
and vector programmable.
Local bus time out (2 - 512 ms)
With/without VMEbus request time included
31 different local IRQ vectors
VIC ID register
In addition to the VIC068, the following parts or
equivalents are required for a minimum hardware interface:
Three address latches and drivers (74xx543)
Three data latches and drivers (74xx543)
Four isolation buffers (74xx245)
You might also need the following:
One to two PLDs for slave address decoding
Two to three latches for master block transfer
1/2 PLD for block transfer glue logic

Mailbox Signaling
To add greater capability to multiprocessor systems,
the VIC068 has four interprocessor communication global
switches (ICGSs) and four interprocessor communication
module switches (ICMSs). These are all byte-wide mailbox registers that generate a local interrupt when accessed
from the VMEbus. The ICGSs of one group reside at the
same address and are accessed with a write cycle, which
behaves as a broadcast to all members of the group. Because the ICMSs are at different addresses, one dedicated
processor can be activated with a local interrupt request
(IRQ).
A processor can inform a logical group of processors
about a new task via a broadcast using the ICGSs and can
then communicate with single processors about the task
using the ICMSs.
Eight byte-wide interprocessor communication
registers (ICRs) are also available. Five of these registers
serve as general-purpose read/write registers, and three are
dedicated to control local activities (Halt, Reset, Mask ID,
etc.). The ICRs can be read and written from the local
side or the VMEbus without interfering with each other.
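
The difference between the global and module switches can be sketched with a small software model. This Python example is purely illustrative: the `Board` class and the `write_icgs` and `write_icms` helpers are hypothetical names and do not correspond to VIC068 register addresses or to any driver API.

```python
class Board:
    """A processor board that records the local interrupts it receives."""

    def __init__(self, name):
        self.name = name
        self.interrupts = []

    def raise_local_irq(self, source):
        self.interrupts.append(source)


def write_icgs(group, switch):
    """Global switch: all boards of the group share one address, so a
    single VMEbus write acts as a broadcast and interrupts every member."""
    for board in group:
        board.raise_local_irq(("ICGS", switch))


def write_icms(board, switch):
    """Module switch: each board has its own address, so the write
    interrupts only the one board that was addressed."""
    board.raise_local_irq(("ICMS", switch))


boards = [Board("cpu0"), Board("cpu1"), Board("cpu2")]
write_icgs(boards, 0)     # announce a new task to the whole group
write_icms(boards[1], 2)  # then talk to one processor about it
```

After these two writes, every board has seen the broadcast, and only `cpu1` has the additional module-switch interrupt.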

[Diagram: master CPU 1 and slave CPU 2 connected by the VMEbus, with longwords n-1, n, and n+1 in flight]
Figure 4. Block Transfer Read Cycle

Interfacing
To connect a processor other than the 680x0 to the
VIC068, it is often easiest to map the processor control
signals into the control signals available on a 680x0 type
of processor. This type of transition interface offers the
advantage of compatibility with a large family of 680x0-compatible peripheral parts, which you can then use elsewhere in the design.
Figure 5 shows a sample interface, whose four address latches store the multiplexed M-bus of the MC88000
processor. Four data latches store the data bytes after the
acknowledge of the 680x0 bus and then start calculating
parity for the processor's M-bus. The reason for this approach lies in some older peripheral I/O chips, which
change their data lines when they should remain stable
(i.e., transmit data buffer empty, etc.).
Two other data latches emulate the MC68020's
dynamic bus sizing. The last buffer, between D0 - D7 of
the 680x0 bus and AD16 - AD23 of the M-bus, emulates
the 680x0 bus's IRQ cycles with normal read cycles of
the MC88000.


Acknowledgment
Cypress Semiconductor wishes to thank Jürgen Bullacher of Eltec GmbH and Eltec International S.A.R.L. for
submitting this article.

Figure 5. Sample Interface


Interfacing the VIC068 to the MC68020
This application note explains some of the features of
the Cypress VIC068 and provides the first-time VIC068
user with simple implementations of these features. The
VIC068 offers the most highly integrated VMEbus interface available today. It reduces the number of parts
needed and saves board space. The emphasis in this application note is on interfacing the VIC068 as VMEbus
A24/A16 D16/D08(EO) master/slave to the Motorola
68020.

Reset Operation
The VIC068 performs three distinct reset operations:
Internal reset - activated by the IRESET pin, which initializes most of the internal registers
System reset - essentially the same as IRESET, but is activated by writing ($F0) to the system reset register, or by asserting IRESET when the VIC068 is the VMEbus controller (SCON pin asserted)
Global reset - initializes all the VIC068 registers
After a reset, the 680X0 processor reads its initial
stack pointer (SSP) and program counter (PC) from addresses $0 through $7. One way to handle this is to remap
the boot-up ROMs to the low addresses for the first few
cycles of the processor.
Figure 1 shows a circuit you can use to do this. The
circuit uses a serial-in/parallel-out shift register (the
74HC164) to generate the MAP signal. This active-Low
signal can be used with address-decode logic to force boot
ROM access to the lower addresses during initial power
up. Asserting the 74HC164 CLEAR pin drives all the
parallel outputs Low, which asserts the selected MAP signal. With the two serial inputs tied High, each Low-to-High transition of the 68020 AS clocks the High through
the shift register and out each of the parallel outputs. By
picking the proper output for the MAP signal, you can
decode from 1 to 8 of the initial processor cycles. You
can use the MAP signals on memory configurations that
are 8, 16, or 32 bits wide by using the QH, QD, or QB
outputs, respectively.

Using The Processor RESET Instruction
The OR gate in Figure 1 ensures that the 74HC164 is
cleared only when HALT and RESET are both asserted.
This allows the use of the 68020 RESET instruction
without inadvertently re-asserting MAP. An alternative to
this approach is to use two small-signal diodes (1N4148)
and a pull-down resistor in place of the OR gate. This
change reduces the design's parts count by eliminating the
74HC32.
A ROM remapping circuit must be used whether the
RESET instruction is issued or not because of the way the
VIC068 arbitrates local bus contention between the 68020
and the VMEbus. Contention occurs when both master
and slave operations are requested concurrently (MWB asserted and SLSEL0, SLSEL1, or IFCSEL asserted). The
VIC068 indicates this contention by asserting DEDLK.
You can deal with the condition by setting bit 4 of the
VIC068's interface configuration register ($AF) to assert
HALT along with LBERR when DEDLK occurs (68020
bus retry sequence). The VIC068 then waits for the 68020
to de-assert the MWB input. Once this happens, the
VIC068 releases LBERR but continues to assert HALT to
keep the 68020 off the local bus. The VIC068 then allows
the slave operation to complete and deasserts HALT. The
68020 can now retry the contested bus cycle.

[Schematic: 74HC164 shift register with both serial inputs tied to VCC, clocked by the VIC/68020 AS and cleared through a 74HC32 OR gate from the VIC/68020 RESET and HALT lines; MAP is taken from QB for 32-bit memory, QD for 16-bit memory, or QH for 8-bit memory]
Figure 1. ROM Remapping Circuit

[Schematic: power-up RC network (1N4148, 47K, 0.1 uF) with 74HC14 inverters driving IPL(0) and IRESET on the VIC068]
Figure 2. Global Reset Circuit
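
The behavior of the ROM remapping circuit can be checked with a small model of the shift register. The Python sketch below is an illustration only, not part of the original design: the class `ShiftRegister74HC164` and the function `map_active_cycles` are hypothetical names. It models the 74HC164 as described in this note (clear drives all outputs Low, each rising clock edge shifts in the AND of the two serial inputs) and counts how many initial bus cycles see MAP asserted for a given output tap.

```python
class ShiftRegister74HC164:
    """Behavioral model of an 8-bit serial-in/parallel-out shift register."""

    OUTPUT_NAMES = "ABCDEFGH"  # QA .. QH

    def __init__(self):
        self.q = [False] * 8

    def clear(self):
        """Asynchronous clear: all parallel outputs go Low."""
        self.q = [False] * 8

    def clock(self, a=True, b=True):
        """Rising clock edge: shift (A AND B) into QA, QA into QB, ..."""
        self.q = [a and b] + self.q[:-1]

    def output(self, name):
        return self.q[self.OUTPUT_NAMES.index(name)]


def map_active_cycles(output_name, cycles=10):
    """Count the initial bus cycles during which MAP (active-Low) is asserted."""
    register = ShiftRegister74HC164()
    register.clear()  # reset: HALT and RESET both asserted
    active = 0
    for _ in range(cycles):
        if not register.output(output_name):  # Low output means MAP asserted
            active += 1
        register.clock()  # AS goes Low-to-High at the end of the cycle
    return active


for tap in ("B", "D", "H"):
    print("Q" + tap, map_active_cycles(tap))
```

The model gives 2, 4, and 8 remapped cycles for QB, QD, and QH. That matches the number of bus cycles a 32-, 16-, or 8-bit memory needs to supply the eight bytes at addresses $0 through $7.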

Internal Reset
At first glance, the IRESET might seem the logical
choice for implementing the power-on reset. Because the
IRESET input has some built-in hysteresis, a simple RC
circuit would be appropriate for applying the power-on
signal.
IRESET does not initialize the local bus timing
register nor any of the slave select registers, however. Additionally, the VIC068 powers up with the DRAM refresh
option enabled (bit 4 of the arbiter/requester configuration
register $B3 High). This condition is acceptable if you are
using DRAM but adversely affects the external reset circuit in Figure 1. Specifically, for the first DRAM refresh
cycle, the VIC068 deasserts RESET but maintains HALT
in the active (Low) state and toggles AS. This action
causes shift operations in the 74HC164. You can activate
DRAM refresh after reset by writing a 1 to bit 4 of the
arbiter/requester configuration register ($B3).

System Reset
The assertion of SYSRESET on the VMEbus typically activates system reset, but only when a global reset is
not taking place. When the VIC068 is configured as the
system controller (SCON pin asserted), it drives the SYSRESET pin for the required 200 ms during an internal or
global reset.

Global Reset
The global reset is the most useful for power-up purposes because it places all the VIC068 registers in a
known state. You initiate a global reset by asserting
IPL(0) concurrent with or just after asserting IRESET. Because IPL(0) is also one of the encoded interrupt lines for
the 68020, you must assert this signal with an open-collector device.
Figure 2 shows a typical power-up circuit for asserting IRESET and IPL(0). By using a device such as the
74HC14, you get the hysteresis necessary for the shallow
charging slope of the RC circuit connected to IRESET.
And because of the 74HC14's inherent propagation delay,
you can easily meet the requirement for asserting IPL(0)
after IRESET.
In using global reset, bear in mind that when the
VIC068 powers up it ignores the VMEbus SYSRESET.
The VIC068 releases HALT and RESET after the 200-ms
time out even if the current VMEbus master asserts SYSRESET past this required minimum time. This automatic
release is a useful feature because it eliminates reliance on
the system controller to release SYSRESET to start the
power-up sequence.
The VIC068 generates a LBERR if you try to access
the VMEbus or any of the VIC068 registers before SYSRESET is de-asserted. One solution to this problem is to
structure the software so that the VIC068 registers are set
up as late as possible in the power-up sequence. You can
also temporarily point the 68020 BERR exception vector
to an address containing an RTE instruction and let the
68020 cycle in a BERR/RTE loop until SYSRESET is deasserted. The latter approach provides an opportunity to
be the first board in a system to request VMEbus mastership.

Connecting The Bus Lines
Figure 3 shows the standard buffer configuration for
an A24/D16 VMEbus connection. This design also supports A16 and D08(EO) operation.


The D16/D08(EO) Data Bus
Connect the VIC068 to the 68020 as you would any
16-bit peripheral device. The 74FCT543 data buffer connects between the 68020 data bus's upper byte (D31 - 24)
and the VMEbus D15 - 8 data lines. The lower byte (LD7
- LD0) is buffered through the VIC068 to the VMEbus
low byte (D7 - D0). Several control signals connect
directly from the VIC068 to the 74FCT543: DENO (data
enable out) to OEAB (output enable A-to-B), LWDENIN
(lower word data enable) to OEBA (output enable B-to-A), LEDO (latch enable data out) to LEAB (latch enable
A-to-B), and LEDI (latch enable data in) to LEBA (latch
enable B-to-A).

[Schematic: latch/buffer devices connecting the local address and data buses to the VMEbus P1 connector pins]
Figure 3. Address and Data Bus Connections


The Address Bus
The A24/A16 configuration requires the use of two
more 74FCT543 devices to buffer and control the
VMEbus A23 through A8 signals. The 74FCT543 LEAB,
LEBA, and OEAB inputs connect directly to the VIC068
LADO (latch address out control), LADI (latch address in
control), and ABEN (enable address out control) outputs,
respectively. The output of the VIC068 LAEN (local-address enable control) must be connected to the 74FCT543
OEBA input through an inverter because LAEN is an active-High output and OEBA is an active-Low input.

Connecting The DSACK Lines
During the normal local bus operation, the 68020's
slave devices (i.e., memory, UART, parallel port) must
tell the processor the size of their data bus. This is done
by asserting the DSACK1 input, which tells the 68020
that the port is a 16-bit device. Asserting DSACK0 instead indicates that the port is an 8-bit device. Asserting
both DSACK1 and DSACK0 indicates that the port is 32
bits wide. To configure the VIC068 as a 16-bit port, simply connect the 68020 DSACK1 to the VIC068 DSACK1.
So long as you have no requirement for
VMEbus access to 8-bit devices on the local bus, you do
not need to do anything with the VIC068 DSACK0 pin
except terminate it (pull it High).
When you do need to access 8-bit devices, a small
problem arises with the way the VIC068 acknowledges
register accesses and interrupt-acknowledge cycles.
During these cycles, the VIC068 always asserts both
DSACK1 and DSACK0, whether the WORD input is asserted or not. And in VMEbus master cycles, when talking to an 8-bit device on the VMEbus, the VIC068
responds with DSACK0 to acknowledge the 8-bit transfer
completion.
The solution to the DSACK0 problem is simple but
can be complicated to implement: You must break the
DSACK0 connection between the VIC068 and the 68020
during interrupt acknowledge or VIC068 register access
(CS) cycles. The circuit needed to do this is a bidirectional, open-collector buffer between the VIC068 and 68020.
The buffer should be inactive in both directions only
when the VIC068 FCIACK or CS inputs are asserted. In
Figure 4's PAL equations, the DSACK0_020 and VIC_DSACK0
equations illustrate how to handle the DSACK0
connection.

Slave Operation
The VIC068 can provide full VMEbus slave operation by dual-porting local memory with little or no 68020
overhead. The normal slave access operation starts by
providing SLSEL0 or SLSEL1 through VMEbus address
decoding. The circuits in Figures 3 and 5 use a 22V10
PAL for this purpose. Always qualify VMEbus address
decoding with the AS and/or DS1-0.
Decoding SLSEL0, SLSEL1, and IFCSEL
Figure 5 illustrates a typical PAL specification that
you can use to provide address decoding for SLSEL0,
SLSEL1, and IFCSEL. The VIC068 uses all the address
modifier lines (AM5 - 0) to qualify the access mode. Address decoding can ignore these inputs. The VIC068 then
decides if the access mode is legal and completes the
cycle or generates the VMEbus BERR signal, depending
on the value programmed in the slave select registers. You
can also qualify the select outputs with the address
modifiers and let the initiating device time-out if the access is not legal.
The IFCSEL input gives the VMEbus access to some
of the VIC068 control registers and the interprocessor
communication registers. These registers are available
only through an A16 privileged-mode access.

Master Operation
VMEbus master operation with the VIC068 is easily
accomplished with the use of the MWB (module-wants-bus) input. The VMEbus can be requested at any level (0
- 3). The request level can also be changed dynamically via
the arbiter/requester configuration register ($B3), which
eliminates the need for hardware jumpers. All VMEbus
release modes are supported through the release control
register ($D3). Support for write posting means that the
local processor can write to the VMEbus without having
to wait for the current bus master to release the bus or for
the arbitration logic to assert the correct BGIN (bus
grant in) line. The VIC068 takes care of this overhead for
the local processor, improving system throughput.
To request VMEbus mastership, the 68020 asserts the
MWB input. You can think of MWB as a VMEbus chip
select.
When interfacing to the VMEbus as an A24 or A16
device, you can have access to the whole VMEbus address space by decoding a 32-Mbyte area of the 68020
address space for VMEbus operations. The ASIZ1-0 pins
tell the VIC068 whether the current cycles represent an
A32, A24, or A16 operation. You can use the upper 16-Mbyte address space (A24 High) for VMEbus A24 operation and the lower half (A24 Low) for VMEbus A16
operation by following three steps: decode A31 through
A25 to generate MWB, tie the ASIZ1 input High, and
connect the 68020 A24 address line to the VIC068's
ASIZ0 input. Figure 4 demonstrates this way of decoding
MWB.
When the VIC068 recognizes a valid slave access,
the device asserts LBR (68020 BR input) and waits for
LBG assertion (68020 BG output). Once the VIC068
receives LBG, the device becomes the local bus master at
the conclusion of the current cycle and completes the requested VMEbus slave operation. If the VIC068 is the
only DMA device on the local bus, there is no need to
generate BGACK (bus grant acknowledge) for the 68020.
But if any other devices are capable of local bus mastership, you have to provide the arbitration logic and the
BGACK signal for the 68020. Keep in mind, too, that
other DMA devices must be able to recognize and deal
appropriately with the 68020 bus-cycle retry operation
(BERR and HALT asserted).
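
The three-step master decode described above (A31 through A25 select the VMEbus window, ASIZ1 is tied High, and A24 drives ASIZ0) can be sketched as follows. This Python model is illustrative only: the function `decode_vme_access` is a hypothetical name, and the placement of the 32-Mbyte window at the top of the 68020 address space is an assumption, because the actual location is set by the PAL equations.

```python
# Hypothetical window: A31..A25 all High, i.e. the top 32 Mbytes.
VME_WINDOW_A31_A25 = 0b1111111


def decode_vme_access(address, window=VME_WINDOW_A31_A25):
    """Return the MWB/ASIZ levels and VMEbus space for a 68020 address.

    Boolean values represent logical assertion (mwb) or a High level
    (asiz1, asiz0), not electrical polarity.
    """
    mwb = ((address >> 25) & 0x7F) == window  # decode A31..A25
    a24_high = bool((address >> 24) & 1)      # A24 is wired to ASIZ0
    if not mwb:
        space = None                          # purely local cycle
    else:
        space = "A24" if a24_high else "A16"
    return {"mwb": mwb, "asiz1": True, "asiz0": a24_high, "space": space}


print(decode_vme_access(0xFF123456))  # upper half of the window
print(decode_vme_access(0xFE001234))  # lower half of the window
print(decode_vme_access(0x00001000))  # outside the window
```

With the assumed window, addresses starting at `0xFF000000` map to VMEbus A24 cycles, addresses starting at `0xFE000000` map to A16 cycles, and everything else stays on the local bus.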

module CYCLE_DECODE;
Cycle_decode device 'P22V10';
VCC,GND pin 24,12;

"inputs (15)
A31,A30,A29,A28,A27,A26,A25,A19 pin 1,2,3,4,5,6,7,8;
SLSEL1,SLSEL0 pin 9,10;
FC2,FC1,FC0,AS,LBG pin 13,14,15,16,17; "f
"outputs (6)
VIC_DSACK0,DSACK0_020
VIC_CYCLE
FCIACK
PRE_MWB,MWB