1989_TI_SN74ACT8800_Family_Data_Manual 1989 TI SN74ACT8800 Family Data Manual
Page Count: 754
TEXAS INSTRUMENTS
SN74ACT8800 Family
32-Bit CMOS Processor
Building Blocks
1989
Overview
SN74ACT8818
16-Bit Microsequencer
SN74ACT8832
32-Bit Registered ALU
SN74ACT8836
32- x 32-Bit Parallel Multiplier
SN74ACT8837
64-Bit Floating Point Processor
SN74ACT8841
Digital Crossbar Switch
SN74ACT8847
64-Bit Floating Point/Integer Processor
Support
Mechanical Data
IMPORTANT NOTICE
Texas Instruments (TI) reserves the right to make changes to or
to discontinue any semiconductor product or service identified
in this publication without notice. TI advises its customers to
obtain the latest version of the relevant information to verify,
before placing orders, that the information being relied upon is
current.
TI warrants performance of its semiconductor products to current
specifications in accordance with TI's standard warranty. Testing
and other quality control techniques are utilized to the extent TI
deems necessary to support this warranty. Unless mandated by
government requirements, specific testing of all parameters of
each device is not necessarily performed.
TI assumes no liability for TI applications assistance, customer
product design, software performance, or infringement of patents
or services described herein. Nor does TI warrant or represent that
any license, either express or implied, is granted under any patent
right, copyright, mask work right, or other intellectual property
right of TI covering or relating to any combination, machine, or
process in which such semiconductor products or services might
be or are used.
Copyright © 1988, Texas Instruments Incorporated
First edition: March 1988
First revision: June 1988
Second revision: June 1989
INTRODUCTION
In this manual, Texas Instruments presents technical information on the TI
SN74ACT8800 family of 32-bit processor "building block" circuits. The
SN74ACT8800 family is composed of single-chip VLSI processor functions, all of which
are designed for high-complexity processing applications.
This manual includes specifications and operational information on the following
high-performance advanced-CMOS devices:
• SN74ACT8818 16-bit microsequencer
• SN74ACT8832 32-bit registered ALU
• SN74ACT8836 32- x 32-bit parallel multiplier
• SN74ACT8837 64-bit floating point processor
• SN74ACT8841 Digital crossbar switch
• SN74ACT8847 64-bit floating point/integer processor
These high-speed devices operate at or above 20 MHz, while providing the low power
consumption of TI's advanced one-micron EPIC™ CMOS technology. The EPIC™ CMOS
process combines twin-well structures for increased density with one-micron gate
lengths for increased speed.
The SN74ACT8800 Family Data Manual contains design and specification data for
all six devices previously listed and includes additional programming and operational
information for the '8818, '8832, and '8837/'8847. Two application notes,
"Chebyshev Routines for the SN74ACT8847" and "High-speed Vector Math and 3D
Graphics Using the SN74ACT8837/8847 Floating Point Unit" are also included.
Introductory sections of the manual include an overview of the '8800 family and a
summary of the software tools and design support TI offers for the chip-set. The general
information section includes an explanation of the function tables, parameter
measurement information, and typical characteristics related to the products listed
in this volume.
Package dimensions are given in the Mechanical Data section of the book in metric
measurement (and parenthetically in inches).
Complete technical data for any Texas Instruments semiconductor product is available
from your nearest TI field sales office, local authorized TI distributor, or by calling Texas
Instruments at 1-800-232-3200.
EPIC is a trademark of Texas Instruments Incorporated.
Overview
Introduction
Texas Instruments' SN74ACT8800 family of 32-bit processor building blocks has been
developed to allow the easy, custom design of functionally sophisticated,
high-performance processor systems. The '8800 family is composed of single-chip, VLSI
devices, each of which represents an element of a CPU.
Geared for computationally intensive applications, SN74ACT8800 devices include
high-performance ALUs, multipliers, microsequencers, and floating point processors.
The '8800 chip set provides the performance, functionality, and flexibility to fill the
most demanding processing needs and is structured to reduce system design cost
and effort. Most of these high-speed processor functions operate at 20 MHz and above,
and, at the same time, provide the power savings of TI's advanced, 1-µm EPIC™ CMOS
technology.
The family's building block approach allows the easy, "pick-and-choose" creation of
customized processor systems, while the devices' high level of integration provides
cost-effectiveness.
Designed especially for high-complexity processing, the devices in the '8800 family
offer a range of functional options. Device features include three-port architecture,
double-precision accuracy, optional pipelined operation, and built-in fault tolerance.
Array, digital signal, image, and graphics processing can be optimized with '8800
devices. Other applications are found in supermini and fault-tolerant computers, and
I/O and network controllers.
In addition to the high-performance, CMOS processor functions featured in this data
manual, the family includes several high-speed, low-power bipolar support chips. To
reduce power dissipation and ensure reliability, these bipolar devices use TI's proprietary
Schottky Transistor Logic (STL) internal circuitry.
At present, TI's '8800 32-bit processor building block family comprises the following
functions:
• SN74ACT8818 16-bit microsequencer
• SN74ACT8832 32-bit registered ALU
• SN74ACT8836 32- x 32-bit parallel multiplier
• SN74ACT8837 64-bit floating point processor
• SN74ACT8841 digital crossbar switch
• SN74ACT8847 64-bit floating point and integer processor
Bipolar Support Chips
• SN74AS8838 32-bit barrel shifter
• SN74AS8839 32-bit shuffle/exchange network
• SN74AS8840 16 x 4 crossbar switch
20 MIPS and Low CMOS Power Consumption
With instruction cycle times of 50 ns or less and the low power consumption of EPIC™
CMOS, the '8800 chip set offers an unrivaled speed/power combination. Unlike
traditional microprocessors, which require multiple cycles to perform an operation,
the 'ACT8800 processors typically can complete instructions in a single cycle.
The 'ACT8832 registered ALU and 'ACT8818 microsequencer together create a
powerful 20-MHz CPU. Because instructions can be performed in a single cycle, the
'8832/'8818 combination is capable of executing over 20 million instructions per second
(MIPS).
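The 20-MIPS figure follows directly from the 50-ns cycle time and single-cycle execution. The short Python sketch below works through that arithmetic; the function name and its parameters are illustrative only and do not come from this manual.

```python
def peak_mips(cycle_time_ns: float, cycles_per_instruction: int = 1) -> float:
    """Peak instruction rate, in millions of instructions per second.

    cycle_time_ns and cycles_per_instruction are illustrative parameters;
    the manual quotes a 50-ns cycle and single-cycle instructions.
    """
    if cycle_time_ns <= 0 or cycles_per_instruction < 1:
        raise ValueError("cycle time must be positive and CPI at least 1")
    instructions_per_second = 1e9 / (cycle_time_ns * cycles_per_instruction)
    return instructions_per_second / 1e6


if __name__ == "__main__":
    # Single-cycle execution at a 50-ns cycle time
    print(peak_mips(50.0))
    # The same clock with a hypothetical four-cycle instruction, for comparison
    print(peak_mips(50.0, cycles_per_instruction=4))
```

A shorter cycle time raises the peak rate proportionally, which is why the text says "over 20 MIPS" for cycle times of "50 ns or less".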
For math-intensive applications, the 'ACT8836 fixed-point multiplier/accumulator
(MAC), 'ACT8837 64-bit floating point processor, and 'ACT8847 64-bit floating point
and integer processor offer unprecedented computational power.
The exceptional performance of the 'ACT8800 family is made possible by TI's EPIC™
CMOS technology. The EPIC™ CMOS process combines twin-well structures for
increased density with one-micron gate lengths for increased speed.
Customized Solution
The '8800 family is designed with a variety of architectural and functional options
to provide maximum design flexibility. These device features allow the creation of
"customized" solutions with the '8800 chipset.
A building block approach to processing allows designers to match specialized hardware
to their specific design needs. The '8818/8832 combination forms the basis of the
system, a high-speed CPU. For applications requiring high-speed integer multiplication,
the 'ACT8836 can be added. To provide the high precision and large dynamic range
of floating point numbers, the 'ACT8837 or 'ACT8847 can be employed.
To ensure speed and flexibility, each component of the '8800 family has three data
ports. Each data port accommodates 32 bits of data, plus four parity bits. This
architecture eliminates many of the I/O bottlenecks associated with traditional
single-I/O microprocessors.
The three-port architecture and functional partitioning of the '8800 chip set open
the door to a variety of parallel processing applications. Placing the math and shifting
functions in parallel with the ALU permits concurrent processing of data. Additional
processors can be added when performance needs dictate.
The 'ACT8800 building block processors are microprogrammable, so that their
instruction sets can be tailored to a specific application. This high degree of
programmability offers greater speed and flexibility than a typical microprocessor and
ensures the most efficient use of hardware.
A separate control bus eliminates the need for multiplexing instructions and data, further
reducing processing bottlenecks. The microcode bus width is determined by the
designer and the application.
Another source of design flexibility is provided by the pipelined/flowthrough operation
option. Pipelining can dramatically reduce the time required to perform iterative, or
sequential, calculations. On the other hand, random or nonsequential algorithms require
fast flowthrough operations. The '8800 chip set allows the designer to select the mode
(fully pipelined, partially pipelined, or nonpipelined) most suited to each design.
Scientific Accuracy
The '8800 family is designed to support applications which require double-precision
accuracy. Many scientific applications, such as those in the areas of high-end graphics,
digital signal processing, and array processing, require such accuracy to maintain data
integrity. In general-purpose computing applications, floating point processors must
often support double-precision data formats to maintain compatibility with existing
software.
To ensure data integrity, '8800 devices (excluding the barrel shifter and
microsequencer) support parity checking and generation, as well as master/slave error
detection. Byte parity checking is performed on the input ports, and a parity generator
and a master/slave comparator are provided at the output. Fault tolerance is built into
the processors, ensuring correct device operation without extra logic or costly software.
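As an illustration of byte parity on a 32-bit port with four parity bits, the Python sketch below generates one parity bit per byte and checks a received word against them. It is a sketch under stated assumptions: this overview does not give the parity polarity, so even parity is assumed by default, and the function names are hypothetical.

```python
def byte_parity_bits(word: int, even: bool = True) -> int:
    """Return four parity bits (bit i covers byte i) for a 32-bit word.

    ASSUMPTION: even parity by default; the overview text does not state
    which polarity the devices use.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("word must fit in 32 bits")
    parity = 0
    for byte_index in range(4):
        byte = (word >> (8 * byte_index)) & 0xFF
        bit = bin(byte).count("1") & 1  # 1 when the byte has an odd number of ones
        if not even:
            bit ^= 1
        parity |= bit << byte_index
    return parity


def port_parity_ok(word: int, parity: int, even: bool = True) -> bool:
    """Check a received 32-bit word against its four parity bits."""
    return byte_parity_bits(word, even) == parity


if __name__ == "__main__":
    data = 0xFF00FF01
    p = byte_parity_bits(data)
    print(f"parity bits: {p:04b}")
    print(port_parity_ok(data, p))          # clean transfer
    print(port_parity_ok(data ^ 0x10, p))   # single-bit error in byte 0
```

Per-byte parity detects any single-bit error within a byte and also indicates which byte was affected.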
The SN74ACT8800 Building Block Processor System
Some of the high-performance '8800 devices are described in the following paragraphs.
SN74ACT8818 16-Bit Microsequencer
In a high-performance microcoded system, a fast microcode controller is required to
control the flow of instructions. The SN74ACT8818 is a high-speed, versatile 16-bit
microsequencer capable of addressing 64K words of microcode memory. The
'ACT8818 can address the next instruction fast enough to support a 50-ns system
cycle time.
The 'ACT8818 65-word-deep by 16-bit-wide stack is useful for storing subroutine
return addresses, top of loop addresses, and loop counts. Addresses can be selected
from eight different sources: the three I/O ports, the two register counters, the
microprogram counter, the stack, and the 16-way branch.
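To make the role of the stack concrete, the following Python sketch models a 65-word by 16-bit last-in, first-out store of the kind described above. The class name and the error handling on overflow and underflow are assumptions made for illustration; the device's actual behavior at the stack limits is not described in this overview.

```python
class MicroStack:
    """Software model of a 65-word by 16-bit LIFO stack (illustrative only)."""

    DEPTH = 65          # words, from the text
    WORD_MASK = 0xFFFF  # 16-bit entries, from the text

    def __init__(self):
        self._words = []

    def __len__(self):
        return len(self._words)

    def push(self, value: int) -> None:
        # ASSUMPTION: pushing onto a full stack is treated as an error here.
        if len(self._words) >= self.DEPTH:
            raise OverflowError("stack full")
        self._words.append(value & self.WORD_MASK)

    def pop(self) -> int:
        # ASSUMPTION: popping an empty stack is treated as an error here.
        if not self._words:
            raise IndexError("stack empty")
        return self._words.pop()


if __name__ == "__main__":
    stack = MicroStack()
    stack.push(0x0123)   # e.g. a subroutine return address
    stack.push(0x0040)   # e.g. a top-of-loop address
    print(hex(stack.pop()), hex(stack.pop()))
```

Return addresses, loop-top addresses, and loop counts all share the same store, so nesting depth is bounded by the 65 entries.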
SN74ACT8832 Registered ALU
The SN74ACT8832 is a 32-bit registered ALU that operates at approximately 20 MHz.
Because instructions can be performed in a single cycle, the 'ACT8832 is capable of
executing 20 million microinstructions per second. An on-board 64-word register file
is 36 bits wide to permit the storage of parity bits. The 3-operand register file increases
performance by enabling the creation of an instruction and the storage of the previous
result in a single cycle. To facilitate data transfer, operands stored in the register file
can be accessed externally while the ALU is executing. To support the parallel
processing of data, the 'ACT8832 can be configured to operate as four 8-bit ALUs,
two 16-bit ALUs, or a single 32-bit ALU. The 'ACT8832 incorporates 32-bit shifters
for double-precision shift operations.
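The effect of configuring one 32-bit datapath as four 8-bit, two 16-bit, or one 32-bit ALU can be sketched in Python as a lane-partitioned addition. This is only a behavioral illustration: the function name is hypothetical, and dropping the carry at each lane boundary is an assumption about how independent slices behave, not a statement about the 'ACT8832 carry logic.

```python
def partitioned_add(a: int, b: int, lane_bits: int) -> int:
    """Add two 32-bit words as independent lanes of 8, 16, or 32 bits.

    ASSUMPTION: carries are discarded at lane boundaries, so each lane
    wraps around on its own.
    """
    if lane_bits not in (8, 16, 32):
        raise ValueError("lane_bits must be 8, 16, or 32")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 32, lane_bits):
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        result |= (lane_sum & lane_mask) << shift
    return result


if __name__ == "__main__":
    a, b = 0x0000FFFF, 0x00000001
    for width in (8, 16, 32):
        print(width, f"{partitioned_add(a, b, width):08X}")
```

The same operands give three different results depending on the lane width, because the carry out of the low byte or halfword either propagates or is dropped.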
SN74ACT8836 32- x 32-Bit Integer MAC
The SN74ACT8836 is a 32-bit integer multiplier/accumulator (MAC) that accepts two
32-bit inputs and computes a 64-bit product. The device can also operate as a 64-bit
by 64-bit multiplier. An onboard adder is provided to add or subtract the product or
the complement of the product from the accumulator.
When pipelined internally, the 1-µm CMOS parallel MAC performs a full 32- x 32-bit
multiply/accumulate in a single 36-ns clock cycle. In flowthrough mode (without any
pipelining), the 'ACT8836 takes 60 ns to multiply two 32-bit numbers. The 'ACT8836
performs a 64- x 64-bit multiply/accumulate, outputting a 64-bit result, in 225 ns.
The 'ACT8836 can handle a wide variety of data types, including two's complement,
signed, and mixed. Division is supported via the Newton-Raphson algorithm.
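Newton-Raphson division replaces the divide with repeated multiply and subtract steps, which is why it suits a multiplier/accumulator. The Python sketch below shows the textbook reciprocal iteration x(n+1) = x(n) * (2 - d * x(n)) in floating point. It is not the 'ACT8836 microcode: the seed formula, the iteration count, and the use of floating point rather than the device's integer formats are all assumptions chosen for a self-contained example.

```python
import math


def reciprocal_newton(d: float, iterations: int = 5) -> float:
    """Approximate 1/d for d > 0 using Newton-Raphson iteration.

    ASSUMPTIONS: floating-point arithmetic and a standard linear seed;
    the manual does not describe the seed or scaling used on the device.
    """
    if d <= 0:
        raise ValueError("this sketch handles positive divisors only")
    # Scale the divisor so that 0.5 <= mantissa < 1, which bounds the seed error.
    mantissa, exponent = math.frexp(d)
    x = 48.0 / 17.0 - (32.0 / 17.0) * mantissa  # linear first guess for 1/mantissa
    for _ in range(iterations):
        x = x * (2.0 - mantissa * x)            # each step roughly doubles the correct bits
    return math.ldexp(x, -exponent)             # undo the scaling


def divide(numerator: float, denominator: float) -> float:
    """Divide by multiplying with the Newton-Raphson reciprocal."""
    return numerator * reciprocal_newton(denominator)


if __name__ == "__main__":
    print(divide(1.0, 3.0))
    print(divide(355.0, 113.0))
```

Each iteration uses only multiplications and one subtraction, so the cost of a divide is a small, fixed number of multiplier passes.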
SN74ACT8837 64-Bit Floating Point Unit
The SN74ACT8837 is a high-speed floating point processor. This single-chip device
performs 32- or 64-bit floating point operations.
More than just a coprocessor, the 'ACT8837 integrates on one chip a double-precision
floating point ALU and multiplier. Integrating these functions on a single chip reduces
data routing problems and processing overhead. In addition, three data ports and a
64-bit internal bus architecture allow for single-cycle operations.
The 'ACT8837 can be pipelined for iterative calculations or can operate with input
registers disabled for low latency.
SN74ACT8841 Digital Crossbar Switch
The SN74ACT8841 is a single-chip digital crossbar switch. The high-performance
device cost-effectively eliminates bottlenecks to speed data through complex bus
architectures.
The 'ACT8841 is ideal for multiprocessor applications, where memory bottlenecks
tend to occur. The device has 64 bidirectional I/O ports that can be configured as 16
4-bit ports, 8 8-bit ports, or 4 16-bit ports. Each bidirectional port can be connected
in any conceivable combination. Any single input port can be broadcast to any
combination of output ports. The total time for data transfer is 20 ns.
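The connection behavior described above, where any port can drive any combination of other ports, can be modeled as a simple selection table. The Python sketch below is a behavioral illustration only: the `select` list is a hypothetical stand-in for the device's control flip-flops, whose actual encoding is not given in this overview.

```python
from typing import List, Optional


def crossbar(inputs: List[int], select: List[Optional[int]]) -> List[Optional[int]]:
    """Route input ports to output ports.

    select[i] names the input port that drives output port i, or None if
    output i is left undriven. ASSUMPTION: this table is a hypothetical
    abstraction of the on-chip control registers.
    """
    if len(select) != len(inputs):
        raise ValueError("one select entry is required per port")
    outputs: List[Optional[int]] = []
    for source in select:
        if source is None:
            outputs.append(None)
        elif 0 <= source < len(inputs):
            outputs.append(inputs[source])
        else:
            raise ValueError(f"no such input port: {source}")
    return outputs


if __name__ == "__main__":
    ports = [0xA, 0xB, 0xC, 0xD]             # four 4-bit ports, for brevity
    print(crossbar(ports, [0, 0, 0, 0]))      # broadcast port 0 to every output
    print(crossbar(ports, [3, 2, 1, 0]))      # reverse permutation
    print(crossbar(ports, [1, None, 1, 2]))   # partial connection
```

Because each output has its own selector, a broadcast is just several outputs naming the same source.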
The control sources for ten separate switching configurations are on-chip, including
eight banks of programmable control flip-flops and two hard-wired control circuits.
The EPIC™ CMOS SN74ACT8841 and its predecessor, SN74AS8840, are based on
the same architecture, differing in power consumption, number of control registers,
and pin-out. Microcode written for the 'AS8840 can be run on the 'ACT8841.
SN74ACT8847 64-Bit Floating Point Unit
The SN74ACT8847 is a high-speed 64-bit floating point processor. The device is fully
compatible with IEEE standard 754-1985 for addition, subtraction, multiplication,
division, square root, and comparison. Division and square root operations are
implemented via hardwired control.
The SN74ACT8847 FPU also performs integer arithmetic, logical operations, and logical
shifts. Registers are provided at the inputs, outputs, and inside the ALU and multiplier
to support multilevel pipelining. These registers can be bypassed for nonpipelined
operations.
When fully pipelined, the 'ACT8847 can perform a double-precision floating point or
32-bit integer operation in under 40 ns. When in flowthrough mode, the 'ACT8847
takes less than 100 ns to perform an operation.
Bipolar Support Chips
The SN74AS8838 high-speed, 32-bit barrel shifter can shift up to 32 bits in a single
instruction cycle of under 25 ns. Five basic shifts can be programmed: circular left,
circular right, logical left, logical right, and arithmetic right. The 'AS8838 offloads the
responsibility for shifting operations from the ALU, which increases shifter functionality
and system throughput.
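The five basic shifts named above can be stated precisely in a few lines of Python. This sketch defines their arithmetic meaning on a 32-bit word; the mode names are hypothetical labels, not 'AS8838 instruction mnemonics, and the handling of a shift count of 32 is an assumption.

```python
MASK32 = 0xFFFFFFFF


def barrel_shift(value: int, amount: int, mode: str) -> int:
    """Shift or rotate a 32-bit word by 0 to 32 positions.

    Modes (hypothetical labels): 'rol', 'ror' (circular left/right),
    'shl', 'shr' (logical left/right), 'sar' (arithmetic right).
    """
    if not 0 <= amount <= 32:
        raise ValueError("amount must be between 0 and 32")
    value &= MASK32
    if mode == "rol":
        a = amount % 32
        return ((value << a) | (value >> (32 - a))) & MASK32
    if mode == "ror":
        a = amount % 32
        return ((value >> a) | (value << (32 - a))) & MASK32
    if mode == "shl":
        return (value << amount) & MASK32   # zeros enter from the right
    if mode == "shr":
        return value >> amount              # zeros enter from the left
    if mode == "sar":
        # Interpret as two's complement so the sign bit is replicated.
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    word = 0x80000001
    for mode in ("rol", "ror", "shl", "shr", "sar"):
        print(mode, f"{barrel_shift(word, 1, mode):08X}")
```

The only difference between the logical and arithmetic right shifts is what enters at the top: zeros for logical, copies of the sign bit for arithmetic.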
The SN74AS8839 is a 32-bit shuffle/exchange network. The high-speed device can
perform data permutations on one 32-bit, two 16-bit, four 8-bit, or eight 4-bit data
words in a single instruction cycle of under 25 ns. The shuffle/exchange network is
designed primarily for use in digital signal processing applications.
SN74ACT8818
16-Bit Microsequencer
• Addresses Up to 64K Locations of Microprogram Memory
• CLK-to-Y = 30 ns (tpd)
• Low-Power EPIC™ CMOS
• Addresses Selected from Eight Different Sources
• Performs Multiway Branching, Conditional Subroutine Calls, and Nested Loops
• Large 65-Word by 16-bit Stack
• Cascadable
Continue
Continue and Pop
Continue and Push
Branch (Example 1)
Branch (Example 2)
Sixteen-Way Branch
Conditional Branch
Three-Way Branch
Thirty-Two-Way Branch
Repeat
Repeat on Stack
Repeat Until CC = H
Loop Until Zero
Conditional Loop Until Zero
Jump to Subroutine
Conditional Jump to Subroutine
Two-Way Jump to Subroutine
Return from Subroutine
Conditional Return from Subroutine
Clear Pointers
Reset
List of Illustrations
Figure  Title
1       'ACT8818 GC Package
2       'ACT8818 FN Package
3       'ACT8818 Logic Symbol
4       'ACT8818 Functional Block Diagram
5       Continue
6       Continue and Pop
7       Continue and Push
8       Branch Example 1
9       Branch Example 2
10      Sixteen-Way Branch
11      Conditional Branch
12      Three-Way Branch
13      Thirty-Two-Way Branch
14      Repeat
15      Repeat on Stack
16      Repeat Until CC = H
17      Loop Until Zero
18      Conditional Loop Until Zero (Example 2)
19      Jump to Subroutine
20      Conditional Jump to Subroutine
21      Two-Way Jump to Subroutine
22      Return from Subroutine
23      Conditional Return from Subroutine
24      Clear Pointers
List of Tables
Table   Title
1       'ACT8818 Pin Grid Allocation
2       'ACT8818 Pin Functional Description
3       Response to Control Inputs
4       Y Output Controls (MUX2-MUX0)
5       Stack Controls (S2-S0)
6       Register Controls (RC2-RC0)
7       Decrement and Branch on Nonzero Encodings
8       Call Encodings without Register Decrements
9       Call Encodings with Register Decrements
10      Return Encodings without Register Decrements
11      Return Encodings with Register Decrements
Introduction
The SN74ACT8818 microsequencer is a low-power, high-performance microsequencer
implemented in TI's EPIC™ Advanced CMOS technology. The 16-bit device addresses
up to 64K locations of microprogram memory and is compatible with the SN74AS890
microsequencer.
The 'ACT8818 performs a range of sequencing operations in support of TI's family
of building block devices and special-purpose processors such as the SN74ACT8847
Floating Point Unit (FPU).
Understanding the 'ACT8818 Microsequencer
The 'ACT8818 microsequencer is designed to control execution of microcode in a
microprogrammed system. The basic architecture of such a system usually incorporates
at least the microsequencer, one or more processing elements such as the 'ACT8847
FPU or the SN74ACT8832 Registered ALU, microprogram memory, a microinstruction
register, and status logic to monitor system states and provide status inputs to the
microsequencer.
The 'ACT8818 combines flexibility and high speed in a microsequencer that performs
multiway branching, conditional subroutine calls, nested loops, and a variety of other
microprogrammable operations. The 'ACT8818 can also be cascaded to provide
additional register/counters or addressing capability for more complex microcoded
control functions.
In this microsequencer, several sources are available for microprogram address
selection. The primary source is the 16-bit microprogram counter (MPC), although
branch addresses may be input on the two 16-bit address buses, DRA and DRB. An
address input on the DRA bus can be pushed on the stack for later selection.
Register/counters RCA and RCB can store either branch addresses or loop counts as
needed, either for branch operations or for looping on the stack.
The selection of address source can be based on external status from the device being
controlled, so that three-way or multiway branching is supported. Once selected, the
address which is output on the Y bus passes to the microprogram memory, and the
microinstruction from the selected location is clocked into the pipeline register at the
beginning of the next cycle.
It is also possible to interrupt the 'ACT8818 by placing the Y output bus in a
high-impedance state and forcing an interrupt vector on the Y bus. External logic is
required to place the bus in high impedance and load the interrupt vector. The first
microinstruction of the interrupt handler subroutine can push the address from the
Interrupt Return register on the stack so that proper linkage is preserved for the return
from subroutine.
Microprogramming the 'ACT8818
Microinstructions for the 'ACT8818 select the specific operations performed by the
Y output multiplexer, the register/counters RCA and RCB, the stack, and the
bidirectional DRA and DRB buses. Each set of inputs is represented as a separate field
in the microinstructions, which control not only the microsequencer but also the ALU
or other devices in the system.
The 3-port architecture of the 'ACT8818 facilitates both branch addressing and
register/counter operations. Both register/counters can be used to hold either loop
counts or branch addresses loaded from the DRA and DRB buses. Register/counter
operations are selected by control inputs RC2-RC0.
Similarly, the 65-word by 16-bit stack can save addresses from the DRA bus, the
microprogram counter (MPC), or the Interrupt Return register, depending on the settings
of stack controls S2-S0 and related control inputs. Flexible instructions such as Branch
DRA else Branch to Stack else Continue can be coded to take advantage of the
conditional branching capability of the 'ACT8818.
Multiway branching (16- or 32-way) uses the B3-B0 inputs to set up a 16-way branch
address on DRA or DRB by concatenating B3-B0 with the upper 12 bits of the DRA
or DRB bus. The resulting branch addresses DRA' (DRA15-DRA4::B3-B0) and DRB'
(DRB15-DRB4::B3-B0) are selected by the Y output multiplexer controls MUX2-MUX0.
A Branch DRB' else Branch DRA' instruction can select up to 32 branch addresses,
as determined by the settings of B3-B0.
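The concatenation DRA15-DRA4::B3-B0 amounts to replacing the low four bits of the bus value with the B inputs. The Python sketch below shows that bit manipulation; the function name is illustrative, and the example base address is arbitrary.

```python
def multiway_branch_address(bus_value: int, b_inputs: int) -> int:
    """Form DRA' or DRB': upper 12 bits of the bus, lower 4 bits from B3-B0."""
    if not 0 <= bus_value <= 0xFFFF:
        raise ValueError("bus value must fit in 16 bits")
    if not 0 <= b_inputs <= 0xF:
        raise ValueError("B3-B0 is a 4-bit field")
    return (bus_value & 0xFFF0) | b_inputs


if __name__ == "__main__":
    base = 0x1234  # arbitrary example value on the DRA bus
    targets = [multiway_branch_address(base, b) for b in range(16)]
    print([f"{t:04X}" for t in targets])
```

For one bus value the sixteen possible B3-B0 settings select sixteen consecutive microprogram addresses, so a 16-way dispatch table occupies one aligned block of 16 words. With two bus values (DRA' and DRB') the count doubles to 32.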
Design Support
TI's '8818 16-bit microsequencer is supported by a variety of tools developed to aid
in design evaluation and verification. These tools will streamline all stages of the design
process, from assessing the operation and performance of the '8818 to evaluating
a total system application. The tools include a functional model, behavioral model,
and microcode development software and hardware. Section 8 of this manual provides
specific information on the design tools supporting TI's SN74ACT8800 Family.
Systems Expertise
Texas Instruments VLSI Logic applications group is available to help designers analyze
TI's high-performance VLSI products, such as the '8818 16-bit microsequencer. The
group works directly with designers to provide ready answers to device-related
questions and also prepares a variety of applications documentation.
The group may be reached in Dallas, at (214) 997-3970.
[Figure 1. 'ACT8818 GC Package: pin grid allocation, top view (rows A through L, columns 1 through 11)]
Table 1. 'ACT8818 Pin Grid Allocation
PIN NO.  NAME      PIN NO.  NAME      PIN NO.  NAME      PIN NO.  NAME
A2       RC2       C2       RC0       F3       RBOE      J10      S1
A3       Y1        C3       GND       F9       B0        J11      STKWRN/RER
A4       Y3        C5       GND       F10      B1        K1       DRB0
A5       Y5        C6       Y7        F11      MUX2      K2       SELDR
A6       Y6        C7       Y10       G1       DRB6      K3       DRA14
A7       Y8        C9       GND       G2       DRB5      K4       DRA12
A8       Y11       C10      VCC       G3       GND       K5       DRA10
A9       Y13       C11      RE        G9       CLK       K6       DRA7
A10      NC        D1       DRB12     G10      MUX0      K7       DRA5
B1       DRB15     D2       DRB13     G11      MUX1      K8       DRA3
B2       RC1       D9       GND       H1       DRB4      K9       DRA0
B3       Y0        D10      COUT      H2       DRB3      K10      S0
B4       Y2        D11      INC       H10      CC        K11      S2
B5       Y4        E1       DRB9      H11      ZEROUT    L2       DRA15
B6       YOE       E2       DRB10     J1       DRB2      L3       DRA13
B7       Y9        E3       DRB11     J2       DRB1      L4       DRA11
B8       Y12       E9       INT       J3       VCC       L5       DRA9
B9       Y14       E10      B3        J5       GND       L6       DRA8
B10      Y15       E11      B2        J6       RAOE      L7       DRA6
B11      ZEROIN    F1       DRB7      J8       DRA1      L8       DRA4
C1       DRB14     F2       DRB8      J9       GND       L9       DRA2
                                                         L10      OSEL
Table 2. 'ACT8818 Pin Functional Description
PIN NAME    GC NO.  FN NO.  I/O  DESCRIPTION
B0          F9      22      I    Input bits for branch addressing (see Table 3)
B1          F10     23
B2          E11     24
B3          E10     25
CLK         G9      18      I    System clock
COUT        D10     28      O    Incrementer carry-out. Goes high when an attempt is
                                 made to increment microprogram counter beyond
                                 addressable micromemory.
CC          H10     15      I    Condition code
DRA0        K9      9       I/O  Bidirectional DRA data port. Outputs data from
DRA1        J8      8            stack or register/counter A (RAOE = 0) or inputs
DRA2        L9      7            external data (RAOE = 1).
DRA3        K8      6
DRA4        L8      5
DRA5        K7      4
DRA6        L7      3
DRA7        K6      2
DRA8        L6      84
DRA9        L5      83
DRA10       K5      82
DRA11       L4      80
DRA12       K4      79
DRA13       L3      78
DRA14       K3      77
DRA15       L2      76
DRB0        K1      73      I/O  Bidirectional DRB data port. Outputs data from
DRB1        J2      72           register/counter B (RBOE = 0) or inputs external data
DRB2        J1      71           (RBOE = 1).
DRB3        H2      70
DRB4        H1      69
DRB5        G2      67
DRB6        G1      66
DRB7        F1      65
DRB8        F2      63
DRB9        E1
DRB10       E2      61
DRB11       E3      60
DRB12       D1      59
DRB13       D2      58
DRB14       C1      57
DRB15       B1      56
GND         C3      10           Ground pins. All pins must be used.
GND         C5      30
GND         C9      33
GND         D9      46
GND         G3      52
GND         J5      68
GND         J9      81
INC         D11     27      I    Incrementer control pin
INT         E9      26      I    Selects INT RT register to stack, active low (see
                                 Table 3)
MUX0        G10     19      I    MUX control for Y output bus (see Table 4)
MUX1        G11     20
MUX2        F11     21
OSEL        L10     11      I    DRA output MUX select. Low selects RCA, high
                                 selects stack.
RAOE        J6      1       I    DRA output enable, active low
RBOE        F3      64      I    DRB output enable, active low
RC0         C2      55      I    Controls for register/counters A and B
RC1         B2      54
RC2         A2      53
RE          C11     29      I    INT RT register enable, active low. A high input holds
                                 INT RT register while a low input passes Y to INT RT
                                 register (see Table 3).
S0          K10     12      I    Stack controls
S1          J10     13
S2          K11     14
SELDR       K2      75      I    Selects data source to DRA bus and DRB bus (see
                                 Table 3)
STKWRN/RER  J11     16      O    Stack warning signal flag
VCC         C10     31           Supply voltage (5 V)
VCC         J3      74
Y0          B3      51      I/O  Bidirectional Y data port
Y1          A3      50
Y2          B4      49
Y3          A4      48
Y4          B5      47
Y5          A5      45
Y6          A6      44
Y7          C6      43
Y8          A7      41
Y9          B7      40
Y10         C7      39
Y11         A8      38
Y12         B8      37
Y13         A9      36
Y14         B9      35
Y15         B10     34
YOE         B6      42      I    Y output enable, active low
ZEROIN      B11     32      I    Forces internal zero detect high
ZEROUT      H11     17      O    Outputs register/counter zero detect signal
'ACT8818 Specification Tables
absolute maximum ratings over operating free-air temperature range (unless
otherwise noted)†
Supply voltage, VCC ................................. -0.5 V to 6 V
Input clamp current, IIK (VI < 0 or VI > VCC) ....... ±20 mA
Output clamp current, IOK (VO < 0 or VO > VCC) ...... ±50 mA
Continuous output current, IO (VO = 0 to VCC) ....... ±50 mA
Continuous current through VCC or GND pins .......... ±100 mA
Operating free-air temperature range ................ 0°C to 70°C
Storage temperature range ........................... -65°C to 150°C
†Stresses beyond those listed under "absolute maximum ratings" may cause permanent damage to the device.
These are stress ratings only and functional operation of the device at these or any other conditions beyond
those indicated under "recommended operating conditions" is not implied. Exposure to absolute maximum
rated conditions for extended periods may affect device reliability.
recommended operating conditions
PARAMETER                                    MIN   NOM   MAX   UNIT
VCC    Supply voltage                        4.5   5     5.5   V
VIH    High-level input voltage              2           VCC   V
VIL    Low-level input voltage               0           0.8   V
IOH    High-level output current                         -8    mA
IOL    Low-level output current                          8     mA
VI     Input voltage                         0           VCC   V
VO     Output voltage                        0           VCC   V
dt/dv  Input transition rise or fall rate    0           15    ns/V
TA     Operating free-air temperature        0           70    °C
electrical characteristics over recommended operating free-air temperature
range (unless otherwise noted)
PARAMETER  TEST CONDITIONS             VCC     TYP (TA = 25°C)  MIN    MAX    UNIT
VOH        IOH = -20 µA                4.5 V   4.48                           V
                                       5.5 V   5.46
           IOH = -8 mA                 4.5 V   4.15             3.76
                                       5.5 V   4.97             4.76
VOL        IOL = 20 µA                 4.5 V   0.014                          V
                                       5.5 V   0.014
           IOL = 8 mA                  4.5 V   0.15                    0.45
                                       5.5 V   0.13                    0.45
II         VI = VCC or 0               5.5 V                           ±1     µA
ICC        VI = VCC or 0               5.5 V   98                      200    µA
Ci         VI = VCC or 0               5 V     3                              pF
ΔICC†      One input at 3.4 V, other   5.5 V                           1      mA
           inputs at 0 or VCC
†This is the increase in supply current for each input that is at one of the specified TTL voltage levels rather
than 0 V or VCC.
maximum switching characteristics
PARAMETER
(INPUT)
CC
CLK
tpd
TO
FROM
(OUTPUT)
V
ZEROUT
ORB
STKWRN
24
16
25
27
30 t
23
ORB15-0RBO
22
MUX2-MUXO
22
RC2-RCO
26
S2-S0
25
B3-BO
19
OSEL
25
ZEROIN
25
SELOR
23
23 t
18
19
ns
20
INC
20
Y
ten
16
16
RAOE
18
YOE
tdis
ns
17
RBOE
RAOE
COUT
23
ORA15-0RAO
YOE
UNIT
ORA
14
13
RBOE
ns
14
†Decrementing register/counter A or B and sensing a zero.
setup and hold times
PARAMETER
FROM (INPUT)
TO (OUTPUT)
CC
Stack
15
Stack
9
DRA15-DRAO
DRB15-DRBO
RCA
6
INT RT
9
RCB
INT RT
7
MPC
Stack
7
Stack
15
OSEl
B3-BO
SElDR
ZEROIN
RCA, RCB
6
INT RT
16
Stack
13
INT RT
13
Stack
12
INT RT
13
Stack
8
INT RT
14
Stack
10
INT RT
10
Stack
14
INT RT
13
Y
MPC
6
RE
INT RT (ClK)
7
MUX2-MUXO
INT RT
12
Any
Any
Input
Destination
th
UNIT
7
INT
S2-S0
MAX
11
INC
RC2-RCO
tsu
MIN
ns
0
ns
clock requirements

      PARAMETER                     MIN   MAX   UNIT
tw1   Pulse duration, clock low     7           ns
tw2   Pulse duration, clock high    9           ns
tc    Clock cycle time              33          ns
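As a quick arithmetic check on the clock requirements, the minimum cycle time can be converted to a maximum clock frequency, and a proposed clock waveform can be tested against the three minimums. This is only a sketch: the function names are hypothetical, and the check ignores setup, hold, and propagation constraints given in the other tables.

```python
# Limits from the clock requirements table (all in nanoseconds).
T_CYCLE_MIN_NS = 33  # tc, minimum clock cycle time
T_LOW_MIN_NS = 7     # tw1, minimum clock-low pulse duration
T_HIGH_MIN_NS = 9    # tw2, minimum clock-high pulse duration


def max_clock_mhz(t_cycle_ns: float = T_CYCLE_MIN_NS) -> float:
    """Convert a cycle time in ns to a frequency in MHz."""
    return 1000.0 / t_cycle_ns


def clock_ok(period_ns: float, high_ns: float) -> bool:
    """Check a clock period and high time against the three minimums."""
    low_ns = period_ns - high_ns
    return (period_ns >= T_CYCLE_MIN_NS
            and high_ns >= T_HIGH_MIN_NS
            and low_ns >= T_LOW_MIN_NS)


print(round(max_clock_mhz(), 1))  # about 30.3 MHz for tc = 33 ns
```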
Architecture
The 'ACT8818 microsequencer is designed with a 3-port architecture similar to the
bipolar SN74AS890 microsequencer. Figure 4 shows the architecture of the
'ACT8818. The device consists of the following principal functional groups:
1. A 16-bit microprogram counter (MPC) consisting of a register and
incrementer which generates the next sequential microprogram address
2. Two register/counters (RCA and RCB) for counting loops and iterations,
storing branch addresses, or driving external devices
3. A 65-word by 16-bit LIFO stack which allows subroutine calls and interrupts
at the microprogram level and is expandable and readable by external
hardware
5. B3-B0, whose contents can replace the four least significant bits of the
DRA and DRB buses to support 16-way and 32-way branches
6. An external input onto the bidirectional Y port to support external
interrupts.
Use of controls MUX2-MUX0 is explained further in the later section on
microprogramming the 'ACT8818.
Microprogram Counter
Based on system status and the current instruction, the microsequencer outputs the
next execution address in the microprogram. Usually the incrementer adds one to the
address on the Y bus to compute the next address plus one, which is stored in the
microprogram register at the beginning of the subsequent instruction cycle.
During the next instruction, this 'continue' address will be ready at the Y output MUX
for possible selection as the source of the subsequent instruction. The incrementer
thus looks two addresses ahead of the address in the instruction register to set up
a continue (increment by one) or repeat (no increment) address.
Selecting INC from status is a convenient means of implementing instructions that
must repeat until some condition is satisfied; for example, Shift ALU Until MSB = 1,
or Decrement ALU Until Zero. The MPC is also the standard path to the stack. The
next address is pushed onto the stack during a subroutine call, so that the subroutine
will return to the instruction following that from which it was called.
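The MPC update described above can be sketched in a few lines. This is a behavioral illustration only: the function name is hypothetical, and the 16-bit wraparound is an assumption based on the 16-bit width of the MPC, not a statement from the text.

```python
MASK16 = 0xFFFF  # the MPC is 16 bits wide; wraparound is assumed here


def next_mpc(y_address: int, inc_high: bool) -> int:
    """Value latched into the MPC register on the next clock edge.

    INC high: Y + 1 (continue address).
    INC low:  Y     (repeat address).
    """
    return (y_address + (1 if inc_high else 0)) & MASK16


# With address 0x0010 on the Y bus:
continue_address = next_mpc(0x0010, inc_high=True)   # 0x0011
repeat_address = next_mpc(0x0010, inc_high=False)    # 0x0010
```

Selecting INC from status then amounts to choosing between these two values each cycle, which is how the repeat-until-condition instructions mentioned above are formed.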
Register/Counters
Addresses or loop counts may be loaded directly into register/counters RCA and RCB
through the direct data ports DRA15-DRA0 and DRB15-DRB0. The values stored in
these registers may either be held, decremented, or read. Independent control of both
the registers during a single cycle is supported with the exception of a simultaneous
decrement of both registers.
Stack
The positive edge clocked 16-bit address stack allows multiple levels of nested calls
or interrupts and can be used to support branching and looping. Seven stack operations
are possible:
1. Reset, which pulls all Y outputs low and clears the stack pointer and read
pointer
2. Clear, which sets the stack pointer and read pointer to zero
3. Pop, which causes the stack pointer to be decremented
4. Push, which puts the contents of the MPC, interrupt return register, or
DRA bus onto the stack and increments the stack pointer
5. Read, which makes the address indicated by the read pointer available
at the DRA port
6. Hold, which causes the address of the stack and read pointers to remain
unchanged
7. Load stack pointer, which inputs the seven least significant bits of DRA
to the stack pointer.
Stack Pointer
The stack pointer (SP) operates as an up/down counter; it increments whenever a push
occurs and decrements whenever a pop occurs. Although push and pop are two-event
operations (store then increment SP, or decrement SP then read), the 'ACT8818
performs both events within a single cycle.
Read Pointer
The read pointer (RP) is provided as a tool for debugging microcoded systems. It permits
a nondestructive, sequential read of the stack contents from the DRA port. This
capability provides the user with a method of backtracking through the address
sequence to determine the cause of overflow without affecting program flow, the status
of the stack pointer, or the internal data of the stack.
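The push, pop, and read behavior described in this section can be illustrated with a small software model. It is a sketch under stated assumptions, not a description of the silicon: the class and method names are hypothetical, the read pointer is assumed to equal the stack pointer after every push or pop, and a read is assumed to mirror pop (decrement the read pointer, then return the entry).

```python
class MicroStack:
    """Behavioral sketch of the 65-word by 16-bit LIFO stack."""

    DEPTH = 65

    def __init__(self) -> None:
        self.mem = [0] * self.DEPTH
        self.sp = 0  # stack pointer
        self.rp = 0  # read pointer

    def push(self, address: int) -> None:
        # Store, then increment SP. A push onto a full stack is ignored,
        # so the old address in location 64 is retained.
        if self.sp < self.DEPTH:
            self.mem[self.sp] = address & 0xFFFF
            self.sp += 1
        self.rp = self.sp  # assumed: RP tracks SP on push

    def pop(self) -> int:
        # Decrement SP, then read. Popping an empty stack is not modeled
        # beyond leaving SP at zero.
        if self.sp > 0:
            self.sp -= 1
        self.rp = self.sp  # assumed: RP tracks SP on pop
        return self.mem[self.sp]

    def read(self) -> int:
        # Nondestructive read: only RP moves; SP and contents are untouched.
        if self.rp > 0:
            self.rp -= 1
        return self.mem[self.rp]


stack = MicroStack()
for return_address in (0x0010, 0x0020, 0x0030):
    stack.push(return_address)
trace = [stack.read(), stack.read()]  # backtrack without disturbing SP
```

After the two reads in the usage lines, `stack.sp` is still 3, which is the property that makes the read pointer useful for debugging.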
Stack Warning/Read Error Pin
A high signal on the STKWRN/RER pin indicates a potential stack overflow or underflow
condition. STKWRN/RER becomes active under two conditions. If 62 of the 65 stack
locations (0-64) are full (the stack pointer is at 62) and a push occurs, the STKWRN/RER
pin outputs a high signal to warn that the stack is approaching its capacity and will
be full after two more pushes.
The STKWRN/RER signal will remain high if hold, push or pop instructions occur, until
the stack pointer is decremented to 62. If a push instruction is attempted when the
stack is full, the new address will be ignored and the old address in stack location
64 will be retained.
The STKWRN/RER pin will go high when the stack pointer is less than or equal to one
and a pop or read from stack is coded on the S2-S0 pins. The pin will go high after
reading the next to the bottom stack address (1). When the S2-S0 pins are set to pop
or read the last address (0) or to pop or read an empty stack, the STKWRN/RER pin
will go high. The pin depends only on the setting of the S2-S0 pins and the stack pointer,
not on the clock.
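One possible reading of the two warning conditions is sketched below. It is an interpretation, not a specification: the function name and the string encoding of the stack operation are hypothetical, and the overflow case is modeled as "stack pointer above 62", which is how the text's "remains high until the stack pointer is decremented to 62" is understood here.

```python
def stack_warning(sp: int, op: str) -> bool:
    """Sketch of STKWRN/RER as a function of the stack pointer and the
    stack operation coded on S2-S0 ("push", "pop", "read", or "hold")."""
    # Overflow side: a push with SP at 62 moves SP to 63, and the warning
    # is assumed to stay high for as long as SP is above 62.
    overflow = sp >= 63
    # Underflow side: pop or read coded while SP is one or zero.
    underflow = sp <= 1 and op in ("pop", "read")
    return overflow or underflow
```

Because the underflow term depends only on `sp` and `op`, it reflects the statement that the pin does not depend on the clock.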
Interrupt Return Register
Unlike the MPC register, which normally gets next address plus one, the interrupt return
register simply gets next address. This permits interrupts to be serviced with zero
latency, since the interrupt vector replaces the pending address.
The interrupting hardware disables the Y output and forces the vector onto the
microaddress bus. This event must be synchronized with the system clock. The first
address of the service routine must program INT low and perform a push to put the
contents of the interrupt return register on the stack.
Microprogramming the 'ACT8818
Microprogramming is unlike programming monolithic processors for several reasons.
First, the width of the microinstruction word is only partially constrained by the basic
signals required to control the sequencer. Since the main advantage of a
microprogrammed processor is speed, many operations are often supported by or
carried out in special purpose hardware. Lookup tables, extra registers, address
generators, elastic memories, and data acquisition circuits may also be controlled by
the microinstruction.
The number of slices in a bit-slice ALU is user-defined, which makes the microinstruction
width even more application dependent. Types of instructions resulting from
manipulation of the sequencer controls are discussed below. Examples of some
commonly used instructions can be found in the later section on microinstructions and
flow diagrams. The following abbreviations are used in the tables in this section:
BR A          Y ← DRA
BR A'         Y ← DRA'
BR B          Y ← DRB
BR B'         Y ← DRB'
BR S          Y ← STK
CALL A        Y ← DRA; STK ← MPC; SP ← SP + 1; RP ← RP + 1
CALL B        Y ← DRB; STK ← MPC; SP ← SP + 1; RP ← RP + 1
CALL A'       Y ← DRA'; STK ← MPC; SP ← SP + 1; RP ← RP + 1
CALL B'       Y ← DRB'; STK ← MPC; SP ← SP + 1; RP ← RP + 1
CALL S        Y ← STK; STK ← MPC; SP ← SP + 1; RP ← RP + 1
CLR SP, RP    SP ← 0; RP ← points to TOS register
CONT/RPT      Y ← MPC + 1 if INC = H; Y ← MPC if INC = L
DRA           Bidirectional data port (can be loaded externally or from RCA)
DRA'          DRA15-DRA4::B3-B0
DRB           Bidirectional data port (can be loaded externally or from RCB)
DRB'          DRB15-DRB4::B3-B0
MPC           Microprogram counter
POP           SP ← SP - 1; RP ← RP - 1
PUSH          STK ← operand; SP ← SP + 1; RP ← RP + 1
RCA           Register/counter A
RCB           Register/counter B
READ          DRA ← STK; RP ← RP - 1; SP ← SP - 1
RESET         Y ← 0; SP ← 0; RP ← points to TOS register
RP            Read pointer
SP            Stack pointer
STK           Stack
Address Selection
Y-output multiplexer controls MUX2-MUX0 select one of eight 3-source branches as
shown in Table 4. The states of CC and ZERO determine which of the three sources
is selected as the next address. ZERO is set at the beginning of any cycle in which
a register/counter will decrement to zero. This applies to both internal ZERO and external
ZEROUT signals.
Table 4. Output Controls (MUX2-MUX0)

                           Y OUTPUT SOURCE
MUX2-MUX0   RESET   CC = L,     CC = L,     CC = H
                    ZERO = L    ZERO = H
XXX         Yes     All Low     All Low     All Low
LLL         No      STK         MPC         DRA
LLH         No      STK         MPC         DRB
LHL         No      STK         DRA         MPC
LHH         No      STK         DRB         MPC
HLL         No      DRA         MPC         DRB
HLH         No      DRA'†       MPC         DRB'‡
HHL         No      DRA         STK         MPC
HHH         No      DRB         STK         MPC

†DRA15-DRA4::B3-B0
‡DRB15-DRB4::B3-B0
By programming CC high or low without decrementing registers, only one outcome
is possible; thus, unconditional branches or continues can be implemented by forcing
the condition code. Alternatively, CC can be selected from status, in which case Branch
A on Condition Code Else Branch B instructions are possible, where A and B are the
address sources determined by MUX2-MUX0.
Decrement and Branch on Nonzero instructions, creating loops that repeat until a
terminal count is reached, can be implemented by programming CC low and
decrementing a register/counter. If CC is selected from status and registers are
decremented, more complex instructions such as Exit on Condition Code or End of
Loop are possible.
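The three-way selection in Table 4 can be expressed as a lookup. The sketch below is illustrative only: the dictionary and function names are hypothetical, the entries are transcribed from Table 4 as read here, and the reset row (all outputs low) is not modeled.

```python
# (CC = L and ZERO = L, CC = L and ZERO = H, CC = H) for each MUX2-MUX0 code.
Y_SOURCE = {
    "LLL": ("STK", "MPC", "DRA"),
    "LLH": ("STK", "MPC", "DRB"),
    "LHL": ("STK", "DRA", "MPC"),
    "LHH": ("STK", "DRB", "MPC"),
    "HLL": ("DRA", "MPC", "DRB"),
    "HLH": ("DRA'", "MPC", "DRB'"),
    "HHL": ("DRA", "STK", "MPC"),
    "HHH": ("DRB", "STK", "MPC"),
}


def select_y_source(mux: str, cc_high: bool, zero_high: bool) -> str:
    """Return the name of the source driven onto Y for one cycle."""
    cc_low_zero_low, cc_low_zero_high, cc_is_high = Y_SOURCE[mux]
    if cc_high:
        return cc_is_high  # ZERO is not tested when CC is high
    return cc_low_zero_high if zero_high else cc_low_zero_low


# Forcing CC high with MUX2-MUX0 = LLL gives an unconditional branch to DRA.
unconditional = select_y_source("LLL", cc_high=True, zero_high=False)
# CC low with a decrementing counter gives Decrement and Branch on Nonzero:
loop_body = select_y_source("HHL", cc_high=False, zero_high=False)  # DRA
loop_exit = select_y_source("HHL", cc_high=False, zero_high=True)   # STK
```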
When MUX2-MUX0 = HLH, the B3-B0 inputs can replace the four least significant
bits of DRA or DRB to create 16-way branches or, when CC is based on status, to
create 32-way branches.
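The address formed for a 16-way branch is the concatenation of the upper 12 bits of the DRA or DRB bus with B3-B0. A minimal sketch, using a hypothetical function name:

```python
def branch_16way(bus_value: int, b3_b0: int) -> int:
    """Form DRA' or DRB': bits 15-4 of the bus value, then B3-B0."""
    return (bus_value & 0xFFF0) | (b3_b0 & 0x000F)


# With 0040 (hex) on the bus, the sixteen targets are 0040 through 004F.
targets = [branch_16way(0x0040, status) for status in range(16)]
```

When CC is also taken from status, it selects between the DRA' and DRB' groups, which is where the 32 possible targets come from.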
Stack Controls
As in the case of the MUX controls, each stack-control coding is a three-way choice
based on CC and ZERO (see Table 5). This allows push, pop, or hold stack operations
to occur in parallel with the aforementioned branches. A subroutine call is accomplished
by combining a branch and push, while returns result from coding a branch to stack
with a pop.
Table 5. Stack Controls (S2-S0)

                        STACK OPERATION
S2-S0   OSEL   CC = L,        CC = L,        CC = H
               ZERO = L       ZERO = H
LLL     X      Reset/Clear    Reset/Clear    Reset/Clear
LLH     X      Clear SP/RP    Hold           Hold
LHL     X      Hold           Pop            Pop
LHH     X      Pop            Hold           Hold
HLL     X      Hold           Push           Push
HLH     X      Push           Hold           Hold
HHL     X      Push           Hold           Push
HHH     H      Read           Read           Read
HHH     L      Hold           Hold           Hold
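The stack controls in Table 5 follow the same three-way pattern as the output controls and can be decoded the same way. The sketch below is illustrative: the names are hypothetical and the entries are transcribed from Table 5 as read here.

```python
# (CC = L and ZERO = L, CC = L and ZERO = H, CC = H) for each S2-S0 code.
STACK_OP = {
    "LLL": ("Reset/Clear", "Reset/Clear", "Reset/Clear"),
    "LLH": ("Clear SP/RP", "Hold", "Hold"),
    "LHL": ("Hold", "Pop", "Pop"),
    "LHH": ("Pop", "Hold", "Hold"),
    "HLL": ("Hold", "Push", "Push"),
    "HLH": ("Push", "Hold", "Hold"),
    "HHL": ("Push", "Hold", "Push"),
}


def stack_operation(s: str, osel_high: bool, cc_high: bool,
                    zero_high: bool) -> str:
    """Return the stack operation for one cycle."""
    if s == "HHH":
        # OSEL matters only for this code: high selects Read, low Hold.
        return "Read" if osel_high else "Hold"
    cc_low_zero_low, cc_low_zero_high, cc_is_high = STACK_OP[s]
    if cc_high:
        return cc_is_high
    return cc_low_zero_high if zero_high else cc_low_zero_low


# A return pairs a branch to stack with S2-S0 = LHH and CC low, giving Pop.
return_op = stack_operation("LHH", osel_high=False, cc_high=False,
                            zero_high=False)
```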
A branch or jump to a given microaddress can also be coded several ways. RCA, DRA,
RCB, DRB, and STK are possible sources for branch addresses (see Table 4). Branches
to register or stack are useful whenever the branch address could be stored to reduce
overhead.
The simplest branches are to DRA and DRB, since they require only one cycle and
the branch address is supplied in the microinstruction. Use of registers or stack requires
an initial load cycle (which may be combined with a preceding instruction), but may
be more practical when an entry point is referenced over and over throughout the
microprogram, for example, in error-handling routines. Branches to stack or register
also enhance sequencing techniques in which a branch address is dynamically
computed or multiple branches to a common entry point are used, but the entry point
varies according to the system state. In this case, the state change might require
reloading the stack or register.
In order to force a branch to DRA or DRB, CC must be programmed high or low. A
branch to stack is only possible when CC is forced low (see Table 4).
When CC is low, the ZERO flag is tested, and if a register decrements to zero the
branch will be transformed into a Decrement and Branch on Nonzero instruction.
Therefore, registers should not be decremented during branch instructions using
CC = 0 unless it is certain the register will not reach terminal count. Call (Branch and
Push MPC) instructions and Return (Branch to Stack and Pop) instructions are discussed
in later sections.
Conditional Branch Instructions
Perhaps the most useful of all branches is the conditional branch. The 'ACT8818
permits three modes of conditional branching: Branch on Condition Code; Branch
16-Way from DRA or DRB; and Branch on Condition Code 16-Way from DRA Else
Branch 16-Way from DRB. This increases the versatility of the system and the speed
of processing status tests because both single-bit and 4-bit status are allowed.
Testing single bit status is preferred when the status can be set up and selected through
a status MUX prior to the conditional branch. Four-bit status allows the 'ACT8818
to process instructions based on Boolean status expressions, such as Branch if Overflow
and Not Carry if Zero or if Negative. It also permits true n-way branches, such as If
Negative then Branch to X, Else if Overflow, and Not Carry then Branch to Y. The
tradeoff is speed versus program size. Since multiway branching occurs relatively
infrequently in most programs, users will enjoy increased speed at a negligible cost.
Call (Branch and Push MPC) instructions and Return (Branch to Stack and Pop)
instructions are discussed in later sections.
Loop Instructions
Up to two levels of nested loops are possible when both counters are used
simultaneously. Loop count and levels of nesting can be increased by adding external
counters if desired. The simplest and most widely used of the loop instructions is
Decrement and Branch on Nonzero, in which CC is forced low while a register is
decremented. As before, many forms are possible, since the top-of-loop address can
originate from RCA, DRA, RCB, DRB, or the stack (see Table 4). Upon terminal count,
instruction flow can either drop out of the bottom of the loop or branch elsewhere.
When loops are used in conjunction with CC as status, B3-BO as status and/or stack
manipulation, many useful instructions are possible, including Decrement and Branch
on Nonzero else Return, Decrement and Call on Nonzero, and Decrement and Branch
16-Way on Nonzero. Possible variations are summarized in Table 7. Call (Branch and
Push MPC) instructions and Return (Branch to Stack and Pop) instructions are discussed
in later sections.
Another level of complexity is possible if CC is selected from status while looping.
This type of loop will exit either because CC is true or because a terminal count has
been reached. This makes it possible, for example, to search the ALU for a bit string.
If the string is found, the match forces CC high. However, if no match is found, it
is necessary to terminate the process when the entire word has been scanned. This
complex process can then be implemented in a simple compact loop using Conditional
Decrement and Branch on Nonzero.
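The bit-string search described above has the shape of a loop with two exits: one on the condition code and one on terminal count. The sketch below is a software analogy only, not microcode for the device; the function name, the shift-and-compare search, and the way the loop count is derived are all assumptions made for illustration.

```python
def find_pattern(word: int, pattern: int, pattern_bits: int,
                 word_bits: int = 32):
    """Return the bit offset of the first match, or None if not found."""
    mask = (1 << pattern_bits) - 1
    count = word_bits - pattern_bits + 1  # value loaded into the loop counter
    shift = 0
    while True:
        cc = ((word >> shift) & mask) == pattern  # status test sets CC
        if cc:
            return shift      # exit because CC is high (match found)
        count -= 1            # decrement the register/counter
        if count == 0:
            return None       # exit on terminal count (ZERO)
        shift += 1            # otherwise branch back to the top of the loop


offset = find_pattern(0b1011000, 0b1011, pattern_bits=4)
```

The two `return` statements correspond to the two ways the loop can terminate; in microcode both tests are made by a single instruction.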
S2-S0
OSEl
CC - H
BR A
CALL A
HLL
HLH
X
CALL A
CONT/RPT
HLL
HHL
X
CALL A
CONT/RPT
CALL B
HLH
HLH
X
CALL A' (16-way)
CONT/RPT
BR B' (16-way)
HLH
HHL
X
CALL A' (16-way)
CONT/RPT
CALL B' (16-way)
HHL
HLH
X
CALL A
BR S
CONT/RPT
HHL
HHL
X
CALL A
BR S
CONT/RPT: PUSH
HHH
HLH
X
CALL B
BR S
CONT/RPT
HHH
HHL
X
CALL B
BR S
CONT/RPT: PUSH
Subroutine Returns
A return from subroutine can be implemented by coding a branch to stack with a pop.
Since pop is also conditional on CC and ZERO, the complex forms discussed previously
also apply to return instructions: Decrement and Return on Nonzero; Return on
Condition Code; Branch on Condition Code Else Return. Return encodings are
summarized in Tables 10 and 11.
Table 10. Return Encodings without Register Decrements

MUX2-MUX0   S2-S0   OSEL   CC = L   CC = H
LLL         LHH     X      RET      BR A
LLH         LHH     X      RET      BR B
LHL         LHH     X      RET      CONT/RPT
LHH         LHH     X      RET      CONT/RPT
Table 11. Return Encodings with Register Decrements

MUX2-MUX0   S2-S0   OSEL   CC = L,     CC = L,     CC = H
                           ZERO = L    ZERO = H
LLL         LHH     X      RET         CONT/RPT    BR A
LLH         LHH     X      RET         CONT/RPT    BR B
LHL         LHH     X      RET         BR A        CONT/RPT
LHH         LHH     X      RET         BR B        CONT/RPT
HHL         LHL     X      BR A        RET         CONT/RPT: POP
HHH         LHL     X      BR B        RET         CONT/RPT: POP
Reset
Pulling the S2-S0 pins low clears the stack and read pointers, and zeroes the Y output
multiplexer (See Table 5).
Clear Pointers
The stack and read pointers may be cleared without affecting the Y output multiplexer
by setting S2-S0 to LLH and forcing CC low (see Table 5).
Read Stack
Placing a high value on all of the stack inputs (S2-S0) and OSEL places the 'ACT8818
into the read mode. At each low-to-high clock transition, the address pointed to by
the read pointer is available at the DRA port and the read pointer is decremented. The
bottom of the stack is detected by monitoring the stack warning/read error pin
(STKWRN/RER). A high appears on the STKWRN/RER output when the stack contains
one word and a read instruction is applied to the S2-S0 pins. This signifies that the
last address has been read.
The stack pointer and stack contents are unaffected by the read operation. Under
normal push and pop operations, the read pointer is updated with the stack pointer
and contains identical information.
Interrupts
Real-time vectored interrupt routines are supported for those applications where polling
would impede system throughput. Any instruction, including pushes and pops, may
be interrupted. To process an interrupt, the following procedure should be followed:
1. Place the bidirectional Y bus into a high-impedance state by forcing YOE high.
2. Force the interrupt entry point vector onto the Y bus. INC should be high.
3. Push the current value in the Interrupt Return register on the stack as the
execution address to return to when interrupt handling is complete.
The first instruction of the interrupt routine must push the address stored in the interrupt
return register onto the stack so that proper return linkage is maintained. This is
accomplished by setting INT and B1 low and coding a push on the stack.
Sample Microinstructions for the 'ACT8818
Representative examples of instructions using the 'ACT8818 are given below. The
examples assume a one-level pipeline system, in which the address and contents of
the next instruction are being fetched while the current instruction is being executed,
and an ALU status register contains the status results of the previous instruction.
Since the incrementer looks two addresses ahead of the address in the instruction
register to set up some instructions such as continue or repeat, a set-up instruction
has been included with each example. This shows the required state of both INC and
CC. CC must be set up early because the status register on which Y-output selection
is typically based contains the results of the previous instruction.
Flow diagrams and suggested code for the sample microinstructions are also given
below. Numbers inside the circles are microword address locations expressed as
hexadecimal numbers. Fields in microinstructions are binary numbers except for inputs
on DRA or DRB, which are also in hexadecimal. For a discussion of sequencing
instructions, see the preceding section on microprogramming.
Continue
To Continue (Instruction 10), INC and CC must be programmed high one cycle ahead
of instruction 10 for pipelining.
Address    Instruction   MUX2-MUX0   S2-S0   R2-R0   OSEL   CC   INC   DRA    DRB
(Set-up)                 XXX         XXX     XXX     X      1    1     XXXX   XXXX
10         Continue      110         111     XXX     0      X    X     XXXX   XXXX
Continue and Pop
To Continue and decrement the stack pointer (Pop), INC and CC are forced high in
the previous instruction.
Address    Instruction    MUX2-MUX0   S2-S0   R2-R0   OSEL   CC   INC   DRA    DRB
(Set-up)                  XXX         XXX     XXX     X      1    1     XXXX   XXXX
10         Continue/Pop   110         010     XXX     X      X    X     XXXX   XXXX
Continue and Push
To Continue and push the microprogram counter onto the stack (Push), INC and CC
are forced high one cycle ahead of Instruction 10 for pipelining.
Address    Instruction     MUX2-MUX0   S2-S0   R2-R0   OSEL   CC   INC   DRA    DRB
(Set-up)                   XXX         XXX     XXX     X      1    1     XXXX   XXXX
10         Continue/Push   110         100     XXX     0      X    X     XXXX   XXXX
Figure 5. Continue
Figure 6. Continue and Pop
Figure 7. Continue and Push
Branch (Example 1)
To Branch from address 10 to address 20, CC must be programmed high one cycle
ahead of Instruction 10 for pipelining.
Address    Instruction   MUX2-MUX0   S2-S0   R2-R0   OSEL   CC   INC   DRA    DRB
(Set-up)                 XXX         XXX     XXX     X      1    X     XXXX   XXXX
10         BR A          000         111     XXX     0      X    X     0020   XXXX
Branch (Example 2)
To Branch from address 10 to address 20, CC is programmed low in the previous
instruction; as a result, a ZERO test follows the condition code test in Instruction 10.
To ensure that a ZERO = H condition will not occur, registers should not be
decremented during this instruction.
Address    Instruction   MUX2-MUX0   S2-S0   R2-R0   OSEL   CC   INC   DRA    DRB
(Set-up)                 XXX         XXX     XXX     X      0    X     XXXX   XXXX
10         BR A          110         111     000     0      X    X     0020   XXXX
Sixteen-Way Branch
To Branch 16-Way, CC is programmed high in the previous instruction. The branch
address is derived from the concatenation DRB15-DRB4::B3-BO.
Address
Instruction
(Set-up)
10
BR B'
MUX2-MUXO 52-SO
XXX
101
XXX
111
R2-RO 05EL
XXX
XXX
CC
INC
X
X
X
X
o
DRA
ORB
XXXX XXXX
XXX X 0040
*no register decrement
Figure 8. Branch Example 1
Figure 9. Branch Example 2
Figure 10. Sixteen-Way Branch
Conditional Branch
To Branch to address 20 Else Continue to address 11, INC is set high in the preceding
instruction to set up the Continue.
Address
(Set-up)
10
Instruction
MUX2-MUXO 52-SO R2-RO OSEL CC' INC
XXX
110
BR A else
Continue
xxx
111
XXX
000
X
x
o
X
X
ORA
ORB
XXXX XXXX
0020 XXXX
Three-Way Branch
To Branch 3-Way, this example uses an instruction from Table 7 with BR A in the
ZERO = L column, CONT/RPT in the ZERO = H column and BR B in the CC = H
column. To enable the ZERO = H path, register A must decrement to zero during this
instruction (see Table 6 for possible register operations). INC is programmed high in
Instruction 10 to set up the Continue.
Address
(Set-up)
10
11
Instruction
MUX2-MUXO 52-SO R2-RO OSEL CC INC
Continue and
Load Reg A
Decrement Reg A;
Branch 3-Way
XXX
XXX
XXX
X
110
111
010
0
t
100
111
001
0
X
ORA
ORB
XXXX XXXX
XXXX XXXX
X
0020 0030
tSelected from external status
Thirty-Two-Way Branch
To Branch 32-Way, the four least significant bits of the DRA' and DRB' addresses
must be input at the B3-B0 port; these are concatenated with the 12 most significant
bits of DRA and DRB to provide new addresses DRA' (DRA15-DRA4::B3-B0) and DRB'
(DRB15-DRB4::B3-B0).
Address
Instruction
(Set-up)
10
32-way Branch
MUX2-MUXO 52-SO R2-RO OSEL CC INC
XXX
XXX
101
111
XXX
000
X
0
X
X
X
ORA
ORB
XXXX XXXX
0040 0030
*no register decrement
Figure 11. Conditional Branch
Figure 12. Three-Way Branch
*no register decrement
Figure 13. Thirty-Two-Way Branch
Repeat
To Repeat (Instruction 10), INC must be programmed low and CC high one cycle ahead
of Instruction 10 for pipelining.
Address
Instruction
(Set-up)
10
Continue
MUX2-MUXO 52-SO R2-RO OSEL CC INC
XXX
110
XXX
111
XXX
XXX
a
X
a
X
X
ORA
ORB
XXXX XXXX
XXX X XXXX
Repeat on Stack
To Continue and push the microprogram counter onto the stack (Push), INC and CC
must be forced high one cycle ahead for pipelining.
To Repeat (Instruction 12), a BR S instruction with ZERO = L is used. To avoid a
ZERO = H condition, registers are not decremented during this instruction (see Table 6
for possible register operations). CC and INC are programmed high in Instruction 12
to set up the Continue in Instruction 11.
Address
Instruction
(Set-up)
10
11
12
Continue/Push
Continue
BR Stack
INC-O
MUX2-MUXO 52-SO R2-RO OSEL CC INC
XXX
110
110
010
XXX
100
111
111
~-MIIIf---t
XXX
XXX
XXX
000
X
X
a
a
1
a
CC-1
>-...;;L'---_ IMPOSSIBLE
Y-MPC
Figure 14. Repeat
2-46
1
X
ORA
ORB
XXXX
XXXX
XXXX
XXXX
XXXX
XXXX
XXXX
XXXX
'no register decrement
Figure 15. Repeat on Stack
2-47
Repeat Until CC = H
To Continue and push the microprogram counter onto the stack (Push), INC and CC
must be forced high one cycle ahead for pipelining.
To Repeat Until CC = H (Instruction 12), use a BR S instruction with CC = L and a
CONT/RPT: POP instruction with CC = H. To avoid a ZERO = H condition, registers
are not decremented (see Table 6 for possible register operations). CC and INC are
programmed high in Instruction 12 to set up the Continue in Instruction 11. A
consequence of this is that the instruction following 13 cannot be conditional.
Address
»
("")
(Set-up)
10
11
12
Instruction
Continue/Push
Continue
BR Stack else
Continue
MUX2-MUXO 52-SO R2-RO OSEL CC INC
XXX
110
110
XXX
100
111
xxx
XXX
XXX
X
X
0
010
010
000
X
1
t
ORA
ORB
XXXX XXXX
XXXX XXXX
XXXX XXXX
XXXX XXXX
t Selected from external status
Loop Until Zero
To Continue and push the microprogram counter onto the stack (Push), INC and CC
are forced high one cycle ahead for pipelining. Register A is loaded with the loop counter
using a Load A instruction from Table 6.
To decrement the loop count, a decrement register A and hold register B instruction
from Table 6 is used. To Repeat Else Continue and Pop (decrement the stack pointer),
an instruction from Table 7 with BR S in the ZERO = L column and CONT/RPT: POP
in the ZERO = H column is used. CC is programmed low in Instruction 11 to
force the ZERO test in Instruction 12; it is programmed high in Instruction 12 to set
up the Continue in Instruction 11.
Address
(Set-up)
10
11
12
Instruction
Continue/Push
Continue/Load
Reg A
Decrement Reg A;
BR 5 else
Continue: Pop
MUX2-MUXO 52-SO R2-RO OSEL CC INC
XXX
110
XXX
100
XXX
XXX
X
0
110
111
010
0
000
010
001
ORA
ORB
XXXX XXXX
XXXX XXXX
0
XXXX XXXX
XXXX XXXX
Address
(Set-up)
10
Instruction
Call A else
Continue
MUX2-MUXO 52-SO R2-RO OSEL CC' INC
XXX
XXX
XXX
X
t
110
101
000
X
X
ORA
ORB
XXXX XXXX
X
0020
XXXX
†Selected from external status
Two-Way Jump to Subroutine
To perform a Two-Way Call to Subroutine at address 20 or address 30, this example
uses an instruction from Table 8 with CALL A in the CC = L column and CALL B
in the CC = H column. In this example, CC is generated by external status during
the preceding (set-up) instruction. INC is programmed high in the preceding instruction
to set up the Push. To avoid a ZERO = H condition, registers should not be decremented
during Instruction 10.
Address
(Set-up)
23
Instruction
Call A else
Call B
t Selected from external status
MUX2-MUXO 52-SO R2-RO OSEL
CC
XXX
XXX
XXX
X
t
100
110
000
X
X
INC
ORA
ORB
XXXX XXXX
X
0020
0030
Figure 19. Jump to Subroutine
*no register decrement
Figure 20. Conditional Jump to Subroutine
*no register decrement
Figure 21. Two-Way Jump to Subroutine
Return from Subroutine
To Return from a subroutine, this example uses an instruction from Table 10 with RET
in the CC = L column. CC is programmed low in the previous instruction. To
avoid a ZERO = H condition, registers are not decremented during Instruction 23.
Address    Instruction   MUX2-MUX0   S2-S0   R2-R0   OSEL   CC   INC   DRA    DRB
(Set-up)                 XXX         XXX     XXX     X      0    X     XXXX   XXXX
23         Return        010         011     000     X      X    X     XXXX   XXXX
Conditional Return from Subroutine
To conditionally Return from a Subroutine, this example uses an instruction from
Table 10 with RET in the CC = L column and CONT/RPT in the CC = H column.
CC is selected from external status in the previous instruction. To avoid a ZERO = H
condition, registers are not decremented during Instruction 23.
Address
(Set-up)
23
Instruction
MUX2-MUXO S2-S0 R2-RO QSEL
Return else
Continue
CC
XXX
XXX
XXX
X
t
010
011
000
X
X
INC
ORA
ORB
XXX X XXX X
X
XXXX
XXXX
t Selected from external status
Clear Pointers
To Continue (Instruction 10), INC must be high; CC must be programmed high in the
previous instruction. To Clear the Stack and Read Pointers and Branch to address 20
(instruction 11), CC is programmed low in instruction 10 to set up the Branch. To avoid
a ZERO = H condition, registers are not decremented during Instruction 11.
Address
(Set-up)
10
11
Instruction
Continue
BR A and Clear
SP/RP
MUX2-MUXO S2-S0 R2-RO OSEL CC INC ORA
ORB
XXX
XXX
X
XXXX XXXX
XXX
110
111
XXX
X 0020 XXXX
0
0
110
001
000
X
X
X
XXXX XXXX
Reset
To Reset the 'ACT8818, pull the S2-S0 pins low. This clears the stack and read pointers
and places the Y bus into a low state.
Address    Instruction   MUX2-MUX0   S2-S0   R2-R0   OSEL   CC   INC   DRA    DRB
10         Reset         XXX         000     XXX     X      X    X     XXXX   XXXX
SN74ACT8832
CMOS 32-Bit Registered ALU
• 50-ns Cycle Time
• Low-Power EPIC™ CMOS
• Three-Port I/O Architecture
• 64-Word by 36-Bit Register File
• Simultaneous ALU and Register Operations
• Configurable as Quad 8-Bit or Dual 16-Bit Single Instruction, Multiple Data Machine
• Parity Generation/Checking
The SN74ACT8832 is a 32-bit registered ALU that can operate at 20 MHz and
20 MIPS (million instructions per second). Most instructions can be performed
in a single cycle. The 'ACT8832 was designed for applications that require high-speed
logical, arithmetic, and shift operations and bit/byte manipulations.
The 'ACT8832 can act as host CPU or can accelerate a host microprocessor.
In high-performance graphics systems, the 'ACT8832 generates display-list
memory addresses and controls the display buffer. In I/O controller applications,
the 'ACT8832 performs high-speed comparisons to initialize and end data
transfers.
A three-operand, 64-word by 36-bit register file allows the 'ACT8832 to create
an instruction and store the previous result in a single cycle.
EPIC is a trademark of Texas Instruments Incorporated.
Contents
Introduction
  Understanding Microprogrammed Architecture
  'ACT8832 Registered ALU
  Support Tools
  Design Support
  Systems Expertise
  'ACT8832 Pin Descriptions
  'ACT8832 Specification Tables
'ACT8832 Registered ALU
  Architecture
    Data Flow
    Architectural Elements
      Three-Port Register File
      R and S Multiplexers
      Data Input and Output Ports
      ALU
      ALU and MQ Shifters
      Bidirectional Serial I/O Pins
      MQ Register
      Conditional Shift Pin
      Master/Slave Comparator
      Divide/BCD Flip-Flops
      Status
      Input Data Parity Check
      Test Pins
  Instruction Set Overview
    Arithmetic/Logic Instructions with Shifts
    Other Arithmetic Instructions
    Data Conversion Instructions
    Bit and Byte Instructions
    Other Instructions
    Configuration Options
      Masked 32-Bit Operation
      Shift Instructions
      Bit and Byte Instructions
      Status Selection
N
~
CO
~
«
'lit
~
en
Contents (Continued)
Page
Instruction Set .................................. 3-52
ABS .............................................. 3-53
ADD .............................................. 3-55
ADDI ............................................. 3-57
AND .............................................. 3-59
ANDNR ............................................ 3-61
BADD ............................................. 3-63
BAND ............................................. 3-65
BCDBIN ........................................... 3-67
BINCNS ........................................... 3-70
BINCS ............................................ 3-72
BINEX3 ........................................... 3-74
BOR .............................................. 3-76
BSUBR ............................................ 3-78
BSUBS ............................................ 3-80
BXOR ............................................. 3-82
CLR .............................................. 3-84
CRC .............................................. 3-85
DIVRF ............................................ 3-88
DNORM ............................................ 3-90
DUMPFF ........................................... 3-92
EX3BC ............................................ 3-94
EX3C ............................................. 3-96
INCNR ............................................ 3-99
INCNS ............................................ 3-101
INCR ............................................. 3-103
INCS ............................................. 3-105
LOADFF ........................................... 3-107
LOADMQ ........................................... 3-109
MQSLC ............................................ 3-111
MQSLL ............................................ 3-113
MQSRA ............................................ 3-115
MQSRL ............................................ 3-117
NAND ............................................. 3-119
NOP .............................................. 3-121
NOR .............................................. 3-123
OR ............................................... 3-125
PASS ............................................. 3-127
Contents (Concluded)
Page
SDIVI ............................................ 3-129
SDIVIN ........................................... 3-131
SDIVIS ........................................... 3-133
SDIVIT ........................................... 3-135
SDIVO ............................................ 3-137
SDIVQF ........................................... 3-139
SEL .............................................. 3-141
SET0 ............................................. 3-143
SET1 ............................................. 3-145
SLA .............................................. 3-147
SLAD ............................................. 3-149
SLC .............................................. 3-151
SLCD ............................................. 3-153
SMTC ............................................. 3-155
SMULI ............................................ 3-157
SMULT ............................................ 3-159
SNORM ............................................ 3-161
SRA .............................................. 3-163
SRAD ............................................. 3-165
SRC .............................................. 3-167
SRCD ............................................. 3-169
SRL .............................................. 3-171
SRLD ............................................. 3-173
SUBI ............................................. 3-175
SUBR ............................................. 3-177
SUBS ............................................. 3-179
TB0 .............................................. 3-181
TB1 .............................................. 3-183
UDIVI ............................................ 3-185
UDIVIS ........................................... 3-187
UDIVIT ........................................... 3-189
UMULI ............................................ 3-191
XOR .............................................. 3-193
Figure 1. Microprogrammed System Block Diagram
[Diagram labels: microinstruction bus, tested status, status]
The configuration of this processor enhances processing throughput in arithmetic
and radix conversion. Internal generation and testing of status results in fast processing
of division and multiplication algorithms. This decision logic is transparent to the user;
the reduced overhead assures shorter microprograms, reduced hardware complexity,
and shorter software development time.
Support Tools
Texas Instruments has designed a family of low-cost, real-time evaluation modules
(EVM) to aid with initial hardware and microcode design. Each EVM is a small
self-contained system which provides a convenient means to test and debug simple
microcode, allowing software and hardware evaluation of components and their
operation.
At present, the 74AS-EVM-8 Bit-Slice Evaluation Module has been completed, and
16- and 32-bit EVMs are in advanced stages of development. EVMs and support tools
for other devices in the 'ACT8800 family are also planned for future development.
Design Support
TI's '8832 32-bit registered ALU is supported by a variety of tools developed to aid
in design evaluation and verification. These tools will streamline all stages of the design
process, from assessing the operation and performance of the '8832 to evaluating
a total system application. The tools include a functional model, behavioral model,
and microcode development software and hardware. Section 8 of this manual provides
specific information on the design tools supporting TI's SN74ACT8800 Family.
Systems Expertise
Texas Instruments VLSI Logic applications group is available to help designers analyze
TI's high-performance VLSI products, such as the '8832 32-bit registered ALU. The
group works directly with designers to provide ready answers to device-related
questions and also prepares a variety of applications documentation.
The group may be reached in Dallas, at (214) 997-3970.
'ACT8832 Pin Descriptions
Pin descriptions and grid allocations for the 'ACT8832 are given on the following pages.
Figure 2. SN74ACT8832 . . . GB Package
[Top view of the pin grid: rows A, B, C, D, E, F, G, H, J, K, L, M, N, P, R, S, T; columns 1 through 17]
Figure 3. SN74ACT8832 . . . Logic Symbol
[Logic symbol for the 32-bit registered ALU. Signal groups shown: CLK, RFCLK, Cn, SSF, EA, EB0-EB1, SIO0-SIO3, CF0-CF2, SELMQ, TP0-TP1, SELRF1-SELRF0, WE3-WE0, A5-A0, B5-B0, C5-C0, I0-I7, OEA, OEB, OEY0-OEY3, OES, DA0-DA31, DB0-DB31, Y0-Y31, PA0-PA3, PB0-PB3, PY0-PY3, PERRA, PERRB, PERRY, MSERR, N, C, ZERO, OVR, BYOF3-BYOF0]
Table 1. SN74ACT8832 Pin Grid Allocation
PIN           PIN           PIN           PIN           PIN           PIN
NO.  NAME     NO.  NAME     NO.  NAME     NO.  NAME     NO.  NAME     NO.  NAME
A1   Y7       C2   Y5       E3   Y0       J15  DA28     P1   DA5      S1   DB10
A2   Y13      C3   OEY0     E4   Y4       J16  DA27     P2   DB8      S2   DB15
A3   Y15      C4   Y9       E14  Y30      J17  DA29     P3   DB12     S3   DA10
A4   BYOF1    C5   Y11      E15  TP0      K1   DB6      P4   DA9      S4   DA13
A5   SIO3     C6   Y14      E16  I2       K2   DB7      P5   DA15     S5   PERRA
A6   SIO2     C7   OEY1     E17  I3       K3   DA0      P6   A5       S6   A3
A7   IESIO1   C8   GND      F1   EB1      K4   GND      P7   A1       S7   WE0
A8   IESIO0   C9   VCC      F2   Cn       K14  GND      P8   VCC      S8   WE3
A9   SIO0     C10  C        F3   CLK      K15  DA24     P9   GND      S9   RFCLK
A10  N        C11  PERRY    F4   CF2      K16  DA25     P10  C4       S10  B4
A11  OES      C12  Y17      F14  OEY3     K17  DA26     P11  PERRB    S11  B2
A12  SSF      C13  Y22      F15  I1       L1   PB0      P12  GND      S12  C3
A13  Y18      C14  OEY2     F16  I4       L2   DA2      P13  DB22     S13  C0
A14  Y20      C15  Y28      F17  I6       L3   VCC      P14  DA16     S14  DB17
A15  Y23      C16  PY3      G1   DB0      L4   GND      P15  DA18     S15  DB20
A16  Y24      C17  BYOF3    G2   EA       L14  GND      P16  DA22     S16  DB23
A17  Y25      D1   CF1      G3   EB0      L15  VCC      P17  DB27     S17  DA21
B1   Y6       D2   Y1       G4   GND      L16  DB30     R1   PA0      T1   DB14
B2   BYOF0    D3   Y3       G14  GND      L17  PB3      R2   DB11     T2   DA8
B3   Y10      D4   PY0      G15  I5       M1   DA1      R3   PB1      T3   DA12
B4   Y12      D5   Y8       G16  I7       M2   DA4      R4   DA11     T4   DA14
B5   PY1      D6   GND      G17  PA3      M3   DA7      R5   PA1      T5   OEA
B6   IESIO3   D7   GND      H1   DB2      M4   GND      R6   A4       T6   A2
B7   IESIO2   D8   GND      H2   DB1      M14  PA2      R7   A0       T7   WE1
B8   SIO1     D9   VCC      H3   VCC      M15  DB26     R8   WE2      T8   SELRF1
B9   Z        D10  GND      H4   GND      M16  DB28     R9   VCC      T9   SELRF0
B10  OVR      D11  GND      H14  GND      M17  DB31     R10  B1       T10  B5
B11  MSERR    D12  GND      H15  VCC      N1   DA3      R11  C2       T11  B3
B12  Y16      D13  BYOF2    H16  DA31     N2   DA6      R12  OEB      T12  B0
B13  Y19      D14  Y27      H17  DA30     N3   DB9      R13  DB18     T13  C5
B14  Y21      D15  Y31      J1   DB3      N4   DB13     R14  DB21     T14  C1
B15  PY2      D16  TP1      J2   DB4      N14  DA19     R15  PB2      T15  DB16
B16  Y26      D17  I0       J3   DB5      N15  DA23     R16  DA20     T16  DB19
B17  Y29      E1   SELMQ    J4   VCC      N16  DB25     R17  DB24     T17  DA17
C1   Y2       E2   CF0      J14  VCC      N17  DB29
Table 2. SN74ACT8832 Pin Description
PIN NAME  PIN NO.  I/O  DESCRIPTION
A0        R7       I    Register file A port read address select
A1        P7
A2        T6
A3        S6
A4        R6
A5        P6
B0        T12      I    Register file B port read address select
B1        R10
B2        S11
B3        T11
B4        S10
B5        T10
BYOF0     B2
BYOF1     A4
BYOF2     D13
BYOF3     C17
C         C10      O    Status signal representing carry out condition
C0        S13      I    Register file write address select
C1        T14
C2        R11
C3        S12
C4        P10
C5        T13
CF0       E2       I    Configuration mode select: single 32-bit, two
CF1       D1            16-bit, or four 8-bit ALUs
CF2       F4
Cn        F2       I    ALU carry input
CLK       F3       I    Clocks synchronous registers on positive edge
DA0       K3       I/O
DA1       M1
DA2       L2
DA3       N1
DA4       M2
DA5       P1
DA6       N2
DA7       M3
DA8       T2
DA9       P4
...
Y0        E3       I/O  Y port data bus
Y1        D2
Y2        C1
Y3        D3
Y4        E4
Y5        C2
Y6        B1
Y7        A1
Y8        D5
Y9        C4
Y10       B3
Y11       C5
Y12       B4
Y13       A2
Y14       C6
Y15       A3
Y16       B12
Y17       C12
Y18       A13
Y19       B13
Y20       A14
Y21       B14
Y22       C13
Y23       A15
Y24       A16
Y25       A17
Y26       B16
Y27       D14
Y28       C15
Y29       B17
Y30       E14
Y31       D15
Z         B9       O    Output status signal represents zero condition
'ACT8832 Specification Tables
absolute maximum ratings over operating free-air temperature range
(unless otherwise noted)†
Supply voltage, VCC ..................................... -0.5 V to 6 V
Input clamp current, IIK (VI < 0 or VI > VCC) ........... ±20 mA
Output clamp current, IOK (VO < 0 or VO > VCC) .......... ±50 mA
Continuous output current, IO (VO = 0 to VCC) ........... ±50 mA
Continuous current through VCC or GND pins .............. ±100 mA
Operating free-air temperature range .................... 0°C to 70°C
Storage temperature range ............................... -65°C to 150°C
† Stresses beyond those listed under "absolute maximum ratings" may cause permanent damage to the device.
These are stress ratings only and functional operation of the device at these or any other conditions beyond
those indicated under "recommended operating conditions" is not implied. Exposure to absolute-maximum-rated
conditions for extended periods may affect device reliability.
Table 3. Recommended Operating Conditions
PARAMETER                                  MIN   NOM   MAX   UNIT
VCC    Supply voltage                      4.5   5.0   5.5   V
VIH    High-level input voltage            2           VCC   V
VIL    Low-level input voltage             0           0.8   V
IOH    High-level output current                       -8    mA
IOL    Low-level output current                        8     mA
VI     Input voltage                       0           VCC   V
VO     Output voltage                      0           VCC   V
dt/dv  Input transition rise or fall rate  0           15    ns/V
TA     Operating free-air temperature      0           70    °C
Figure 4. 'ACT8832 32-Bit Registered ALU
[Block diagram. Blocks shown: 64 x 36 register file, ALU, ALU shifter, MQ shifter, MQ register, divide/BCD flip-flops, parity generate, parity check and compare, master/slave compare, status. Signals shown: A5-A0, B5-B0, C5-C0, WE3-WE0, SELRF1-SELRF0, RFCLK, CLK, EA, EB1-EB0, DA31-DA0, DB31-DB0, PA3-PA0, PB3-PB0, PY3-PY0, Y31-Y0, I7-I0, CF2-CF0, Cn, SSF, SIO3-SIO0, IESIO3-IESIO0, SELMQ, TP1-TP0, OEB, PERRB, PERRY, MSERR, VCC, GND]
SELRF1  SELRF0  SOURCE
0       0       External DA input
0       1       External DB input
1       0       Y-output MUX
1       1       External Y port
R and S Multiplexers
ALU inputs are selected by the R and S multiplexers. Controls which affect operand
selection for instructions other than those using constants or masks are shown in
Table 9.
Table 9. ALU Source Operand Selects
R-BUS OPERAND  S-BUS OPERAND
SELECT         SELECT         RESULT
EA             EB1-EB0        DESTINATION ← SOURCE OPERAND
0                             R bus ← Register file addressed by A5-A0
1                             R bus ← DA port
               0 0            S bus ← Register file addressed by B5-B0
               1 0            S bus ← DB port
               X 1            S bus ← MQ register
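The selects in Table 9 can be read as two small multiplexers. The following is a minimal behavioral sketch in Python, not a device model; the function name and argument names are illustrative assumptions, and the register file is represented as a plain list indexed by the 6-bit addresses.

```python
def select_operands(ea, eb1, eb0, reg_file, a_addr, b_addr, da, db, mq):
    """Return the (R, S) operands chosen by EA and EB1-EB0 as listed in Table 9."""
    # R bus: EA = 0 selects the register file word at A5-A0, EA = 1 the DA port.
    r = da if ea else reg_file[a_addr]
    # S bus: EB0 = 1 selects the MQ register regardless of EB1 (the "X 1" row).
    if eb0:
        s = mq
    elif eb1:
        s = db                    # EB1-EB0 = 10: DB port
    else:
        s = reg_file[b_addr]      # EB1-EB0 = 00: register file word at B5-B0
    return r, s
```

For example, `select_operands(1, 0, 1, rf, 0, 0, da, db, mq)` returns the DA port value on R and the MQ register on S.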
Table 10. Destination Operand Select/Enables
REGISTER
FILE
WRITE
ENABLE
WE
Y BUS
OUTPUT
ENABLE
OEY
FILE
SELECT
MOSEL
0
0
0
0
0
0
0
0
0
0
0
1
X
X
X
X
X
1
REGISTER
YMUS
DA
DB
PORT
PORT
OUTPUT OUTPUT
SELECT
RFSEL 1-RFSELO
ENABLE
ENABLE
OEA
OEB
----
SOURCE
X
X
Y/PY
y/py
ALU shifter/parity generate
X
0
0
Y/PY, RF
ALU shifter/parity generate
Y/PY, RF
MQ register/parity generate
RF
External Y/PY
RF
External DA/PA
X
RF
External DB/PB
0
DA/PA
R bus register file output
DA/PA
Hi-Z
0
X
0
--
-
X
1
0
0
~-
RESULT
DESTINATION
---------
MQ register/parity generate
DB/PB
S bus register file output
DB/PB
Hi-Z
Data Input and Output Ports
The DA and DB ports can be used to load the S and/or R multiplexers from an external
source or to read S or R bus outputs from the register file. The Y port can be used
to load the register file and to output the next address selected by the Y output
multiplexer. Tables 9 and 10 describe the MUX and output controls which affect DA,
DB, and Y.
ALU
The ALU can perform seven arithmetic and six logical instructions on the two 32-bit
operands selected by the R and S multiplexers. It also supports multiplication, division,
normalization, bit and byte operations and data conversion, including excess-3 BCD
arithmetic. The 'ACT8832 instruction set is summarized in Table 15.
The 'ACT8832 can be configured to operate as a single 32-bit ALU, two 16-bit ALUs,
or four 8-bit ALUs (see Figures 6 and 7). It can also be configured to operate on a
32-bit word formed by adding leading zeros to the 12 least significant bits of R bus
data. This is useful in certain IBM relative addressing schemes.
Figure 6. 16-Bit Configuration
[Diagram: two 16-bit ALU sections producing Y31-Y16 and Y15-Y0, with byte overflow outputs BYOF3 and BYOF1 and the Z, C, OVR, N status outputs]
Figure 7. 8-Bit Configuration
[Diagram: four 8-bit ALU sections producing Y31-Y24, Y23-Y16, Y15-Y8 and Y7-Y0, with SIO3-SIO0, byte overflow outputs BYOF3-BYOF0, the Z, C, OVR, N status outputs, and OES]
Configuration modes are controlled by three CF inputs as shown in Table 11. These
signals also select the data from which status signals other than byte overflow will
be generated.
Table 11. Configuration Mode Selects
CONTROL INPUTS                     DATA FROM WHICH STATUS OTHER
CF2  CF1  CF0   MODE SELECTED      THAN BYOF WILL BE GENERATED
0    0    0     Four 8-bit         Byte 0
0    0    1     Four 8-bit         Byte 1
0    1    0     Four 8-bit         Byte 2
0    1    1     Four 8-bit         Byte 3
1    0    0     Two 16-bit         Least significant 16-bit word
1    0    1     Two 16-bit         Most significant 16-bit word
1    1    0     One 32-bit         32-bit word
1    1    1     Masked 32-bit      32-bit word
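As a quick cross-check of Table 11, the mapping from the three CF inputs to the operating mode and status source can be written as a lookup. This is a minimal sketch; the names `CF_MODES` and `decode_cf` are hypothetical and exist only for illustration.

```python
# Index is the 3-bit value CF2 CF1 CF0; each entry is (mode, status source),
# copied from Table 11.
CF_MODES = [
    ("Four 8-bit", "Byte 0"),
    ("Four 8-bit", "Byte 1"),
    ("Four 8-bit", "Byte 2"),
    ("Four 8-bit", "Byte 3"),
    ("Two 16-bit", "Least significant 16-bit word"),
    ("Two 16-bit", "Most significant 16-bit word"),
    ("One 32-bit", "32-bit word"),
    ("Masked 32-bit", "32-bit word"),
]

def decode_cf(cf2, cf1, cf0):
    """Return (mode, status source) for the given CF2, CF1, CF0 levels (0 or 1)."""
    return CF_MODES[(cf2 << 2) | (cf1 << 1) | cf0]
```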
ALU and MQ Shifters
The ALU and MQ shifters are used in all of the shift, multiply, divide and normalize
functions. They can be used independently for single precision or concurrently for
double precision shifts. Shifts can be made conditional, using the Special Shift Function
(SSF) pin.
Bidirectional Serial I/O Pins
Four bidirectional SIO pins are provided to supply an end fill bit for certain shift
instructions. These pins may also be used to read bits that are shifted out of the ALU
or MQ shifters during certain instructions. Use of the SIO pins as inputs or outputs
is summarized in Table 17.
The four pins allow separate control of end fill inputs in configurations other than 32-bit
mode (see Table 12 and Figure 4).
Table 12. Data Determining SIO Input
          CORRESPONDING WORD, PARTIAL WORD OR BYTE
SIGNAL    32-BIT MODE    16-BIT MODE               8-BIT MODE
SIO3      -              -                         Byte 3
SIO2      -              most significant word     Byte 2
SIO1      -              -                         Byte 1
SIO0      32-bit word    least significant word    Byte 0
To increase system speed and reduce bus conflict, four SIO input enables
(IESIO3-IESIO0) are provided. A low on these enables will override internal pull-up
resistor logic and force the corresponding SIO pins to the high impedance state
required before an input signal can appear on the signal line. If the SIO enables are not
used, this condition is generated internally in the chip. Use of the enables allows internal
decoding to be bypassed, resulting in faster speeds.
The IESIOs default to a high because of internal pull-up resistors. When an
SIO pin is used as an output, a low on its corresponding IESIO pin would force
SIO to a high impedance state. The output would then be lost, but the internal
operation of the chip would not be affected.
MQ Register
Data from the MQ shifter is written into the MQ register when a low-to-high transition
occurs on clock CLK. The register has specific functions in double precision shifts,
multiplication, division and data conversion algorithms and can also be used as a
temporary storage register. Data from the register file and the DA and DB buses can
be passed to the MQ register through the ALU.
The Y bus contains the output of the ALU shifter if SELMQ is low and the output of
the MQ register if SELMQ is high. If OEY is low, ALU or MQ shifter output will
be passed to the Y port; if OEY is high, the Y port becomes an input to the
feedback MUX.
Conditional Shift Pin
Conditional shifting algorithms may be implemented using the SSF pin under hardware
or firmware control. If the SSF pin is high or floating, the shifted ALU output will be
sent to the output buffers. If the SSF pin is pulled low externally, the ALU result will
be passed directly to the output buffers, and MQ shifts will be inhibited. Conditional
shifting is useful for scaling inputs in data arrays or in signal processing algorithms.
Master/Slave Comparator
A master/slave comparator is provided to compare data bytes from the Y output MUX
with data bytes on the external Y port when OEY is high. If the data are
not equal, a high signal is generated on the master/slave error output pin (MSERR).
A similar comparator is provided for the Y parity bits.
Divide/BCD Flip-Flops
Internal multiply/divide flip-flops are used by certain multiply and divide instructions
to maintain status between instructions. Internal excess-3 BCD flip-flops preserve the
carry from each nibble in excess-3 BCD operations. The BCD flip-flops are affected
by all instructions except NOP and are cleared when a CLR instruction is executed.
The flip-flops can be loaded and read externally using instructions LOADFF and DUMPFF.
SLC
Logical right single precision shift
Circular left single precision shift
Load MQ register
Pass ALU to Y
Table 15. 'ACT8832 Instruction Set (Continued)
GROUP 3 INSTRUCTIONS
INSTRUCTION BITS
I7-I0 (HEX)       MNEMONIC  FUNCTION
08                SET1      Set bit 1
18                SET0      Set bit 0
28                TB1       Test bit (one)
38                TB0       Test bit (zero)
48                ABS       Absolute value
58                SMTC      Sign magnitude/two's complement
68                ADDI      Add immediate
78                SUBI      Subtract immediate
88                BADD      Byte add R to S
98                BSUBS     Byte subtract S from R
A8                BSUBR     Byte subtract R from S
B8                BINCS     Byte increment S
C8                BINCNS    Byte increment negative S
D8                BXOR      Byte XOR R and S
E8                BAND      Byte AND R and S
F8                BOR       Byte OR R and S
Table 15. 'ACT8832 Instruction Set (Continued)
GROUP 4 INSTRUCTIONS
INSTRUCTION BITS
I7-I0 (HEX)       MNEMONIC  FUNCTION
00                CRC       Cyclic redundancy character accumulation
10                SEL       Select S or R
20                SNORM     Single length normalize
30                DNORM     Double length normalize
40                DIVRF     Divide remainder fix
50                SDIVQF    Signed divide quotient fix
60                SMULI     Signed multiply iterate
70                SMULT     Signed multiply terminate
80                SDIVIN    Signed divide initialize
90                SDIVIS    Signed divide start
A0                SDIVI     Signed divide iterate
B0                UDIVIS    Unsigned divide start
C0                UDIVI     Unsigned divide iterate
D0                UMULI     Unsigned multiply iterate
E0                SDIVIT    Signed divide terminate
F0                UDIVIT    Unsigned divide terminate
Table 15. 'ACT8832 Instruction Set (Continued)
GROUP 5 INSTRUCTIONS
INSTRUCTION BITS
I7-I0 (HEX)       MNEMONIC  FUNCTION
0F                LOADFF    Load divide/BCD flip-flops
1F                CLR       Clear
2F                CLR       Clear
3F                CLR       Clear
4F                CLR       Clear
5F                DUMPFF    Output divide/BCD flip-flops
6F                CLR       Clear
7F                BCDBIN    BCD to binary
8F                EX3BC     Excess-3 byte correction
9F                EX3C      Excess-3 word correction
AF                SDIVO     Signed divide overflow test
BF                CLR       Clear
CF                CLR       Clear
DF                BINEX3    Binary to excess-3
EF                CLR       Clear
FF                NOP       No operation
Group 1, a set of ALU arithmetic and logic operations, can be combined with the
user-selected shift operations in Group 2 in one instruction cycle. The other groups contain
instructions for bit and byte operations, division and multiplication, data conversion,
and other functions such as sorting, normalization and polynomial code accumulation.
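The op codes in Table 15 suggest a simple way to classify an 8-bit instruction: Groups 3, 4 and 5 use a low nibble of 8, 0 and F respectively, and any other low nibble is a Group 1 operation paired with the Group 2 shift named by the high nibble. The Python sketch below works under that assumption. The high-nibble shift codes 0 through D are taken from Table 17; E (LOADMQ) and F (PASS) are inferred from the op codes used in the algorithm tables, and Group 1 codes are left as raw nibbles because this section does not list them. All names are illustrative.

```python
GROUP2 = ["SRA", "SRAD", "SRL", "SRLD", "SLA", "SLAD", "SLC", "SLCD",
          "SRC", "SRCD", "MQSRA", "MQSRL", "MQSLL", "MQSLC",
          "LOADMQ", "PASS"]          # last two entries are inferred, not tabulated
GROUP3 = ["SET1", "SET0", "TB1", "TB0", "ABS", "SMTC", "ADDI", "SUBI",
          "BADD", "BSUBS", "BSUBR", "BINCS", "BINCNS", "BXOR", "BAND", "BOR"]
GROUP4 = ["CRC", "SEL", "SNORM", "DNORM", "DIVRF", "SDIVQF", "SMULI", "SMULT",
          "SDIVIN", "SDIVIS", "SDIVI", "UDIVIS", "UDIVI", "UMULI", "SDIVIT", "UDIVIT"]
GROUP5 = ["LOADFF", "CLR", "CLR", "CLR", "CLR", "DUMPFF", "CLR", "BCDBIN",
          "EX3BC", "EX3C", "SDIVO", "CLR", "CLR", "BINEX3", "CLR", "NOP"]

def classify(opcode):
    """Return (group, description) for an instruction byte I7-I0."""
    high, low = (opcode >> 4) & 0xF, opcode & 0xF
    if low == 0x8:
        return "Group 3", GROUP3[high]
    if low == 0x0:
        return "Group 4", GROUP4[high]
    if low == 0xF:
        return "Group 5", GROUP5[high]
    # Any other low nibble: Group 1 ALU operation combined with a Group 2 shift.
    return "Group 1/2", "ALU op 0x%X with %s" % (low, GROUP2[high])
```

With these tables, `classify(0x68)` gives `("Group 3", "ADDI")` and `classify(0xD1)` reports ALU operation 1 combined with MQSLC, which matches the D1 ADD/MQSLC step used in Table 23.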
Arithmetic/Logic Instructions with Shifts
The seven Group 1 arithmetic instructions operate on data from the R and/or S
multiplexers and the carry-in. Carry-out is evaluated after ALU operation; other status
pins are evaluated after the accompanying shift operation, when applicable. Group 1
logic instructions do not use carry-in; carry-out is forced to zero.
Possible shift instructions are listed in Group 2. Fourteen single and double precision
shifts can be specified, or the ALU result can be passed unshifted to the MQ register
or to the specified output destination by using the LOADMQ or PASS instructions.
Table 16 lists shift definitions.
When using the shift registers for double precision operations, the least significant
half should be placed in the MQ register and the most significant half in the ALU for
passage to the ALU shifter. An example of a double-precision shift using the ALU and
MQ shifters is given in Figure 8.
Figure 8. Shift Examples, 32-Bit Configuration
[Two diagrams, each with SIO0 as the serial data input signal: single precision logical right single shift, and double precision logical right single shift]
All Group 2 shifts can be made conditional using the conditional shift pin (SSF). If the
SSF pin is high or floating, the shifted ALU output will be sent to the output buffers,
MQ register, or both. If the SSF pin is pulled low, the ALU result will be passed directly
to the output buffers and any MQ shifts will be inhibited.
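The single and double precision logical right shifts of Figure 8, together with the SSF condition, can be sketched behaviorally as follows. This is a simplified Python model for the 32-bit configuration, not a description of the device internals. It assumes the end-fill polarity stated for the SIO pins (high or floating fills a zero, low fills a one) and assumes that in the double precision case the bit leaving the ALU shifter enters the most significant bit of the MQ shifter. Function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def end_fill(sio_high=True):
    """End-fill bit supplied by an SIO pin: high or floating gives 0, low gives 1."""
    return 0 if sio_high else 1

def srl(alu, sio_high=True, ssf_high=True):
    """Single precision logical right shift of the ALU result by one bit."""
    alu &= MASK32
    if not ssf_high:                 # SSF low: result passes through unshifted
        return alu
    return (end_fill(sio_high) << 31) | (alu >> 1)

def srld(alu, mq, sio_high=True, ssf_high=True):
    """Double precision logical right shift: ALU holds the MSH, MQ the LSH."""
    alu &= MASK32
    mq &= MASK32
    if not ssf_high:                 # SSF low: no ALU shift, MQ shift inhibited
        return alu, mq
    new_alu = (end_fill(sio_high) << 31) | (alu >> 1)
    new_mq = ((alu & 1) << 31) | (mq >> 1)   # ALU bit 0 moves into MQ bit 31
    return new_alu, new_mq
```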
Table 16. Shift Definitions
SHIFT TYPE        NOTES
Left              Moves a bit one position towards the most significant bit
Right             Moves a bit one position towards the least significant bit
Arithmetic right  Retains the sign unless an overflow occurs, in which case, the
                  sign would be inverted
Arithmetic left   May lose the sign bit if an overflow occurs. Zero is filled into
                  the least significant bit unless the bit is set externally
Circular right    Fills the least significant bit in the most significant bit position
Circular left     Fills the most significant bit in the least significant bit position
Logical right     Fills a zero in the most significant bit position unless the bit
                  is forced to one by placing a zero on an SIO pin
Logical left      Fills a zero in the least significant bit position unless the bit
                  is forced to one by placing a zero on an SIO pin
The bidirectional SIO pins can be used to supply external end fill bits for certain Group 2
shift instructions. When SIO is high or floating, a zero is filled; otherwise a one is filled.
Table 17 lists instructions that make use of the SIO inputs and identifies input and
output functions.
Table 17. Bidirectional SIO Pin Functions
INSTRUCTION
BITS I7-I0              SIO
(HEX)        MNEMONIC   I/O   DATA
0*           SRA        O     Shift out
1*           SRAD       O     Shift out
2*           SRL        I     Most significant bit
3*           SRLD       I     Most significant bit
4*           SLA        I     Least significant bit
5*           SLAD       I     Least significant bit
6*           SLC        O     Shifted input to ALU shifter
7*           SLCD       O     Shifted input to MQ shifter
8*           SRC        O     Shifted input to ALU shifter
9*           SRCD       O     Shifted input to MQ shifter
A*           MQSRA      O     Shift out
B*           MQSRL      I     Most significant bit
C*           MQSLL      I     Least significant bit
D*           MQSLC      O     Shifted input to MQ shifter
00           CRC        O     Internally generated end fill bit
20           SNORM      I     Least significant bit
30           DNORM      I     Least significant bit
60           SMULI      O     ALU0
70           SMULT      O     ALU0
80           SDIVIN     O     Internally generated end fill bit
90           SDIVIS     O     Internally generated end fill bit
A0           SDIVI      O     Internally generated end fill bit
B0           UDIVIS     O     Internally generated end fill bit
C0           UDIVI      O     Internally generated end fill bit
D0           UMULI      O     Internal input
E0           SDIVIT     O     Internally generated end fill bit
F0           UDIVIT     O     Internally generated end fill bit
7F           BCDBIN     I     Least significant bit
DF           BINEX3     O     Shifted input to MQ register
Other Arithmetic Instructions
The 'ACT8832 supports two immediate arithmetic operations. ADDI and SUBI
(Group 3) add a constant between 0 and 15 to, or subtract it from, an operand
on the S bus. The constant value is specified in bits A3-A0.
Twelve Group 4 instructions support serial division and multiplication. Signed, unsigned
and mixed multiplication are implemented using three instructions: SMULI, which
performs a signed times unsigned iteration; SMULT, which provides negative weighting
of the sign bit of a negative multiplier in signed multiplication; and UMULI, which
performs an unsigned multiplication iteration. Algorithms using these instructions are
given in Tables 18, 19, and 20. These include: signed multiplication, which performs
a two's complement multiplication; unsigned multiplication, which produces an
unsigned times unsigned product; and mixed multiplication, which multiplies a signed
multiplicand by an unsigned multiplier to produce a signed result.
Table 18. Signed Multiplication Algorithm
OP                CLOCK    INPUT        INPUT         OUTPUT
CODE  MNEMONIC    CYCLES   S PORT       R PORT        Y PORT
E4    LOADMQ      1        Multiplier   -             Multiplier
60    SMULI       N-1†     Accumulator  Multiplicand  Partial product
70    SMULT       1        Accumulator  Multiplicand  Product (MSH)‡
Table 19. Unsigned Multiplication Algorithm
OP                CLOCK    INPUT        INPUT         OUTPUT
CODE  MNEMONIC    CYCLES   S PORT       R PORT        Y PORT
E4    LOADMQ      1        Multiplier   -             Multiplier
D0    UMULI       N-1†     Accumulator  Multiplicand  Partial product
D0    UMULI       1        Accumulator  Multiplicand  Product (MSH)‡
Table 20. Mixed Multiplication Algorithm
OP                CLOCK    INPUT        INPUT         OUTPUT
CODE  MNEMONIC    CYCLES   S PORT       R PORT        Y PORT
E4    LOADMQ      1        Multiplier   -             Multiplier
60    SMULI       N-1†     Accumulator  Multiplicand  Partial product
60    SMULI       1        Accumulator  Multiplicand  Product (MSH)‡
† N = 8 for quad 8-bit mode, 16 for dual 16-bit mode, 32 for 32-bit mode.
‡ The least significant half of the product is in the MQ register.
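The three multiplication algorithms share one shift-and-add structure: the multiplier sits in the MQ register, each iteration conditionally adds the multiplicand to the accumulator, and the accumulator/MQ pair shifts right by one bit. The Python sketch below models only that arithmetic, under the assumption that each iterate step behaves as a textbook add-and-shift and that the terminate step of signed multiplication subtracts instead of adding (the negative weighting of the multiplier sign bit). It is not cycle-accurate, and the function and parameter names are illustrative.

```python
def multiply(multiplicand, multiplier, n=32,
             signed_multiplicand=False, signed_multiplier=False):
    """Shift-and-add product of two n-bit operands, returned as a Python int."""
    mask = (1 << n) - 1
    m = multiplicand & mask
    if signed_multiplicand and m >> (n - 1):
        m -= 1 << n                       # interpret the pattern as two's complement
    acc, mq = 0, multiplier & mask        # LOADMQ step: multiplier into MQ
    for step in range(n):                 # N-1 iterate steps plus one final step
        if mq & 1:
            if signed_multiplier and step == n - 1:
                acc -= m                  # final step: sign bit has negative weight
            else:
                acc += m
        # Shift the accumulator/MQ pair right; accumulator bit 0 enters MQ bit n-1.
        mq = ((acc & 1) << (n - 1)) | (mq >> 1)
        acc >>= 1                         # arithmetic shift keeps the sign
    return (acc << n) + mq                # MSH in accumulator, LSH in MQ
```

Unsigned, signed and mixed multiplication correspond to leaving both flags off, setting both, and setting only `signed_multiplicand`, respectively.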
Instructions that support division include start, iterate and terminate instructions for
unsigned division routines (UDIVIS, UDIVI and UDIVIT); initialize, start, iterate and
terminate instructions for signed division routines (SDIVIN, SDIVIS, SDIVI and SDIVIT);
and correction instructions for these routines (DIVRF and SDIVQF). A Group 5
instruction, SDIVO, is available for optional overflow testing. Algorithms for signed
and unsigned division are given in Tables 21 and 22. These use a nonrestoring
technique to divide a 16N-bit integer dividend by an 8N-bit integer divisor to produce
an 8N-bit integer quotient and remainder, where N = 1 for quad 8-bit mode, N = 2
for dual 16-bit mode, and N = 4 for 32-bit mode.
Table 21. Signed Division Algorithm
OP                CLOCK    INPUT           INPUT     OUTPUT
CODE  MNEMONIC    CYCLES   S PORT          R PORT    Y PORT
E4    LOADMQ      1        Dividend (LSH)  -         Dividend (LSH)
80    SDIVIN      1        Dividend (MSH)  Divisor   Remainder (N)
AF    SDIVO       1        Remainder (N)   Divisor   Overflow test result
90    SDIVIS      1        Remainder (N)   Divisor   Remainder (N)
A0    SDIVI       N-2†     Remainder (N)   Divisor   Remainder (N)
E0    SDIVIT      1        Remainder (N)   Divisor   Remainder§
40    DIVRF       1        Remainder‡      Divisor   Remainder¶
50    SDIVQF      1        MQ register     Divisor   Quotient#
† N = 8 for quad 8-bit mode, 16 for dual 16-bit mode, 32 for 32-bit mode.
‡ The least significant half of the product is in the MQ register.
§ Unfixed
¶ Fixed (corrected)
# The quotient is stored in the MQ register. Remainder can be output at the Y port or stored in
the register file accumulator.
Table 22. Unsigned Division Algorithm
OP                CLOCK    INPUT           INPUT     OUTPUT
CODE  MNEMONIC    CYCLES   S PORT          R PORT    Y PORT
E4    LOADMQ      1        Dividend (LSH)  -         Dividend (LSH)
B0    UDIVIS      1        Dividend (MSH)  Divisor   Remainder (N)
C0    UDIVI       N-1†     Remainder (N)   Divisor   Remainder (N)
F0    UDIVIT      1        Remainder (N)   Divisor   Remainder‡
40    DIVRF       1        Remainder§      Divisor   Remainder§
† N = 8 in quad 8-bit mode, 16 in dual 16-bit mode, 32 in 32-bit mode
‡ Unfixed
§ Fixed (corrected)
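The nonrestoring technique named above can be illustrated with a textbook unsigned version in Python. This is a sketch of the arithmetic only, assuming a 2n-bit dividend split into a most significant half (the running remainder) and a least significant half (held in an MQ-like register that collects quotient bits); it does not reproduce the exact per-instruction behavior of UDIVIS, UDIVI, UDIVIT and DIVRF. The function name and the overflow check are illustrative assumptions.

```python
def nonrestoring_udiv(dividend, divisor, n=32):
    """Divide a 2n-bit unsigned dividend by an n-bit divisor without restoring."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    if (dividend >> n) >= divisor:
        raise OverflowError("quotient does not fit in n bits")
    mask = (1 << n) - 1
    rem = dividend >> n                  # most significant half
    mq = dividend & mask                 # least significant half
    for _ in range(n):
        # Shift the remainder/MQ pair left by one bit.
        rem = (rem << 1) | (mq >> (n - 1))
        mq = (mq << 1) & mask
        # Subtract after a non-negative remainder, add after a negative one.
        if rem >= 0:
            rem -= divisor
        else:
            rem += divisor
        if rem >= 0:
            mq |= 1                      # quotient bit is 1 when the result is >= 0
    if rem < 0:
        rem += divisor                   # final remainder correction
    return mq, rem
```

The final correction plays the role that the tables assign to the remainder fix step: a negative remainder left by the last iteration is brought back into the range 0 to divisor - 1.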
Data Conversion Instructions
Conversion of binary data to one's and two's complement can be implemented using
the INCNR instruction (Group 1). SMTC (Group 3) permits conversion from two's
complement representation to sign magnitude representation, or vice versa. Two's
complement numbers can be converted to their positive value, using ABS (Group 3).
SNORM and DNORM (Group 4) provide for normalization of signed, single- and
double-precision data. The operand is placed in the MQ register and shifted toward the most
significant bit until the two most significant bits are of opposite value. Zeroes are shifted
into the least significant bit, provided SIO is high or floating. (A low on SIO will shift
a one into the least significant bit.) SNORM allows the number of shifts to be counted
and stored in one of the register files to provide the exponent.
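The normalization rule described above (shift toward the most significant bit until the two most significant bits differ, counting the shifts) can be sketched in Python as follows. This is a behavioral illustration of single-length normalization only; the loop bound of n-1 shifts is an assumption added so that operands such as zero terminate, and the function name is hypothetical.

```python
def snorm(value, n=32, sio_high=True):
    """Normalize an n-bit two's complement value; return (result, shift count)."""
    mask = (1 << n) - 1
    fill = 0 if sio_high else 1          # SIO high or floating fills zeros
    v = value & mask
    count = 0
    # Shift left while the two most significant bits are equal.
    while count < n - 1 and ((v >> (n - 1)) & 1) == ((v >> (n - 2)) & 1):
        v = ((v << 1) | fill) & mask
        count += 1
    return v, count
```

The shift count returned here corresponds to the value the text says can be accumulated in the register file to provide the exponent.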
Data stored in binary-coded decimal form can be converted to binary using BCDBIN
(Group 5). A routine for this conversion, given in Table 23, allows the user to convert
an N-digit BCD number to a 4N-bit binary number in 4N + 8 clock cycles.
Table 23. BCD to Binary Algorithm
OP                 CLOCK    INPUT         INPUT         OUTPUT
CODE  MNEMONIC     CYCLES   S PORT        R PORT        DESTINATION
E4    LOADMQ       1        BCD operand   -             MQ reg.
D2    SUBR/MQSLC   1        Accumulator   Accumulator   Accumulator/MQ reg.
D2    SUBR/MQSLC   1        Mask reg.     Mask reg.     Mask reg/MQ reg.
D1    MQSLC        2        Don't care    Don't care    MQ reg.
68    ADDI (15)    1        Accumulator   Decimal 15    Mask reg.
REPEAT N-1 TIMES†
DA    AND/MQSLC    1        MQ reg.       Mask reg.     Interim reg/MQ reg.
D1    ADD/MQSLC    1        Accumulator   Interim reg.  Interim reg/MQ reg.
7F    BCDBIN       1        Interim reg.  Interim reg.  Accumulator/MQ reg.
7F    BCDBIN       1        Accumulator   Interim reg.  Accumulator/MQ reg.
END REPEAT
FA    AND          1        MQ reg.       Mask reg.     Interim reg.
D1    ADD/MQSLC    1        Accumulator   Interim reg.  Accumulator
† N = Number of BCD digits
BINEX3, EX3BC, and EX3C assist binary to excess-3 conversion. Using BINEX3, an
N-bit binary number can be converted to an N/4-digit excess-3 number. For an
algorithm, see Table 24.
Table 24. Binary to Excess-3 Algorithm
OP                    CLOCK    INPUT          INPUT          OUTPUT
CODE  MNEMONIC        CYCLES   S PORT         R PORT         DESTINATION
E4    LOADMQ          1        Binary number  -              MQ reg.
02    SUBR            1        Accumulator    Accumulator    Accumulator
08    SET1 (33 hex)   1        Accumulator    Mask (33 hex)  Accumulator
REPEAT N TIMES†
DF    BINEX3          1        Accumulator    Accumulator    Accumulator/MQ reg.
9F    EX3C            1        Accumulator    Internal data  Accumulator
END REPEAT
† N = Number of bits in binary number
Bit and Byte Instructions
Four Group 3 instructions allow the user to test or set selected bits within a byte.
SET1 and SET0 force selected bits of a selected byte (or bytes) to one and zero,
respectively. TB1 and TB0 test selected bits of a selected byte (or bytes) for ones
and zeros. The bits to be set or tested are specified by an 8-bit mask formed by the
concatenation of register file address inputs C3-C0 and A3-A0. The register file
addressed by B5-B0 is used as the destination operand for the set bit instructions.
Register writes are inhibited for test bit instructions. Bytes to be operated on are
selected by forcing SIOn low, where n represents the byte position and 0 represents
the least significant byte. A high on the zero output pin signifies that the test data
matches the mask; a low on the zero output indicates that the test has failed.
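The mask formation and byte selection described above can be illustrated with a short sketch. It assumes that C3-C0 forms the upper nibble of the mask and that a TB1 "match" means every masked bit is one in each selected byte; both are interpretations, and the function names are hypothetical.

```python
def byte_mask(c_nibble, a_nibble):
    """8-bit mask from C3-C0 (assumed upper nibble) and A3-A0 (lower nibble)."""
    return ((c_nibble & 0xF) << 4) | (a_nibble & 0xF)

def set1(word, mask, selected_bytes):
    """Force masked bits to one in each selected byte (byte 0 = least significant)."""
    for n in selected_bytes:
        word |= mask << (8 * n)
    return word & 0xFFFFFFFF

def set0(word, mask, selected_bytes):
    """Force masked bits to zero in each selected byte."""
    for n in selected_bytes:
        word &= ~(mask << (8 * n))
    return word & 0xFFFFFFFF

def tb1(word, mask, selected_bytes):
    """True (zero output high) if all masked bits are one in every selected byte."""
    return all(((word >> (8 * n)) & mask) == mask for n in selected_bytes)
```

For instance, applying a mask of 33 (Hex) to all four bytes of a cleared word yields 33333333 (Hex).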
Individual bytes of data can also be manipulated using eight Group 3 byte
arithmetic/logic instructions. Bytes can be added, subtracted, incremented, ORed,
ANDed and exclusive ORed. Like the bit instructions, bytes are selected by forcing
SIOn low, but multiple bytes can be operated on only if they are adjacent to one another;
at least one byte must be nonselected.
Other Instructions
SEL (Group 4) selects one of the ALU's two operands, S or R, depending on the state
of the SSF pin. This instruction could be used in sort routines to select the larger or
smaller of two operands by performing a subtraction and sending the status result
to SSF. CRC (Group 4) is designed to verify serial binary data that has been transmitted
over a channel using a cyclic redundancy check code. An algorithm using this instruction
is given in Table 25.
Table 25. CRC Algorithm
OP CODE | MNEMONIC | CLOCK CYCLES | INPUT S PORT | INPUT R PORT | OUTPUT DESTINATION
E4 | LOADMQ | 1 | Vector c'(x)† | - | MQ reg.
F6 | INCR | 1 | - | Polynomial g(x) | Poly reg.
F2 | SUBR | 1 | Accumulator | Accumulator | Accumulator
REPEAT n/BN TIMES†
00 | CRC | 1 | Accumulator | Poly reg. | Accumulator
E4 | LOADMQ | 1 | Vector c'(x)† | - | MQ reg.
END REPEAT
†N = Number of bits in binary number
n = Length of the code vector
CLR forces the ALU output to zero and clears the internal BCD flip-flops used in excess-3
BCD operations. NOP forces the ALU output to zero, but does not affect the flip-flops.
Configuration Options
The 'ACT8832 can be configured to operate in 8-bit, 16-bit, or 32-bit modes, depending
on the setting of the configuration mode selects (CF2-CF0). Table 11 shows the control
inputs for the four operating modes. Selecting an operating configuration other than
32-bit mode affects ALU operation and status generation in several ways, depending
on the mode selected.
Masked 32-Bit Operation
Masked 32-bit operation is selected to reset to zero the 20 most significant bits of
the R Mux input. The 12 least significant bits are unaffected by the mask. Only Group
1 and Group 2 instructions can be used in this operating configuration. Status
generation is similar to unmasked 32-bit operating mode.
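As a small illustration of the masking just described, the effect on the R Mux input can be modeled as a bitwise AND that keeps the 12 least significant bits. The function name is hypothetical; this is only a sketch of the stated behavior.

```python
def masked_r_input(r):
    """Clear the 20 most significant bits of a 32-bit R input, keep the low 12."""
    return r & 0x00000FFF
```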
Shift Instructions
Shift instructions operate similarly in 8-bit, 16-bit, and 32-bit modes. The serial I/O
(SIO3-SIO0) pins are used to select end-fill bits or to shift bits in or out, depending
on the operation being performed. Table 12 shows the SIO signals associated with
each byte or word in the different modes, and Table 17 indicates the specific function
performed by the SIO pins during shift, multiply, and divide operations.
Figures 9 and 10 present examples of logical right shifts in 16-bit and 8-bit
configurations.
Figure 9. Shift Examples, 16-Bit Configuration
(a) Single-Precision Logical Right Single Shift, 16-Bit Configuration
(b) Double-Precision Logical Right Single Shift, 16-Bit Configuration
Bit and Byte Instructions
The 'ACT8832 performs bit operations similarly in 8-bit, 16-bit, and 32-bit modes.
Masks are loaded into the R MUX on the A3-A0 and C3-C0 address inputs, and the
bytes to be masked are selected by pulling their SIO inputs low. Instructions which
set, reset, or test bits are explained later.
Byte operations should be performed in 32-bit mode to get the necessary status
outputs. While byte overflow signals are provided for all four bytes (BYOF3-BYOF0),
the other status signals (C, N, Z) are output only for the word selected with the
configuration control signals (CF2-CF0).
Status Selection
Status results (C, N, Z, and overflow) are internally generated for all words in all modes,
but only the overflow results (BYOF3-BYOF0) are available for all four bytes in 8-bit
mode or for both words in 16-bit mode. If a specific application requires that the four
status results are read for two or four words, it is possible to toggle the configuration
control signals (CF2-CF0) within the same clock cycle and read the additional status
results. This assumes that the necessary external hardware is provided to toggle
CF2-CF0 and collect the status for the individual words before the next clock signal
is input.
Figure 10. Shift Examples, 8-Bit Configuration
(a) Single-Precision Logical Right Shift, 8-Bit Configuration
(b) Double-Precision Logical Right Shift, 8-Bit Configuration
Instruction Set
The 'ACT8832 instruction set is presented in alphabetical order on the following pages.
The discussion of each instruction includes a functional description, list of possible
operands, data flow diagram, and notes on status and control bits affected by the
instruction. Microcoded examples are also shown.
Mnemonics and opcodes for instructions are given at the top of each page. Opcodes
for instructions in Groups 1 and 2 are four bits long and are combined into eight-bit
instructions which select combinations of arithmetic, logical, and shift operations.
Opcodes for the other instruction groups are all eight bits long.
An asterisk in the left side of the opcode box for a Group 1 instruction indicates that
a Group 2 opcode is needed to complete the instruction. An asterisk in the right side
of a box indicates that a Group 1 opcode is required to combine with the Group 2
opcode in the left side of the box.
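A short sketch of how two four-bit opcodes combine into one eight-bit instruction code may help. It assumes the Group 2 opcode occupies the upper nibble (I7-I4) and the Group 1 opcode the lower nibble (I3-I0); the function name is hypothetical.

```python
def combine_opcode(group2_nibble, group1_nibble):
    """Pack a Group 2 nibble (I7-I4) and a Group 1 nibble (I3-I0) into I7-I0."""
    for nibble in (group2_nibble, group1_nibble):
        if not 0 <= nibble <= 0xF:
            raise ValueError("each opcode must fit in four bits")
    return (group2_nibble << 4) | group1_nibble

# Instruction codes of the form used in the microcoded examples:
add_code = combine_opcode(0xE, 0x1)   # 1110 0001
and_code = combine_opcode(0xF, 0xA)   # 1111 1010
```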
Absolute Value
ABS
Opcode: 48
FUNCTION
Computes the absolute value of two's complement data on the S bus.
DESCRIPTION
Two's complement data on the S bus is converted to its absolute value. The carry
must be set to one by the user for proper conversion. ABS causes S' + Cn to be
computed; the state of the sign bit determines whether S or S' + Cn will be selected
as the result. SSF is used to transmit the sign of S.
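The selection between S and S' + Cn can be sketched as follows. This is an illustrative model under the stated description (Cn programmed high), not a definitive account of the hardware; the function name is hypothetical.

```python
def abs_instruction(s, cn=1, width=32):
    """Return S if the sign bit is clear, else S' + Cn (two's complement negate)."""
    mask = (1 << width) - 1
    s &= mask
    sign = (s >> (width - 1)) & 1
    complemented = ((~s & mask) + cn) & mask  # S' + Cn
    return complemented if sign else s
```

Note that the most negative input, 80000000 (Hex), maps to itself, which corresponds to the overflow condition listed for this instruction.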
Available R Bus Source Operands
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
No | No | No | No
Available S Bus Source Operands
RF (B5-B0) | DB-Port | MQ Register
Yes | Yes | Yes
Available Destination Operands
RF (C5-C0) | RF (B5-B0) | Y-Port
Yes | No | Yes
Shift Operations
ALU | MQ
None | None
Control/Data Signals
Signal | User Programmable | Use
SSF | No | Inactive
SIO0 | No | Inactive
SIO1 | No | Inactive
SIO2 | No | Inactive
SIO3 | No | Inactive
Cn | Yes | Should be programmed high for proper conversion.
Status Signals
ZERO = 1 if result = 0
N = 1 if MSB (input) = 1
OVR = 1 if input of most significant byte is 80 (Hex) and inputs (if any) in all other bytes are 00 (Hex)
C = 1 if S = 0
EXAMPLES (assumes a 32-bit configuration)
Convert the two's complement number in register 1 to its positive value and store
the result in register 4.
Instr Code I7-I0 | Oprd Addr A5-A0 | Oprd Addr B5-B0 | Oprd Sel EA | Oprd Sel EB1-EB0 | Dest Addr C5-C0 | SELMQ | WE3-WE0 | SELRF1-SELRF0 | OEA | OEB | OEY3-OEY0 | OES | Cn | CF2-CF0
0100 1000 | XX XXXX | 00 0001 | X | 00 | 00 0100 | 0 | 0000 | 10 | X | X | XXXX | 0 | 1 | 110
Example 1: Assume register file 1 holds F6D81340 (Hex):
Source       1111 0110 1101 1000 0001 0011 0100 0000   S ← RF(1)
Destination  0000 1001 0010 0111 1110 1100 1100 0000   RF(4) ← S' + Cn
Example 2: Assume register file 1 holds 09D527C0 (Hex):
Source       0000 1001 1101 0101 0010 0111 1100 0000   S ← RF(1)
Destination  0000 1001 1101 0101 0010 0111 1100 0000   RF(4) ← S
ADD
Add with Carry (R + S + Cn)
Opcode: * 1
FUNCTION
Adds data on the R and S buses to the carry-in.
DESCRIPTION
Data on the R and S buses is added with carry. The sum appears at the ALU and MQ
shifters.
*The result of this instruction can be shifted in the same microcycle by specifying a shift instruction in the
upper nibble (I7-I4) of the instruction field. The result may also be passed without shift. Possible instructions
are listed in Table 15.
Available R Bus Source Operands
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
Yes | No | Yes | No
Available S Bus Source Operands
RF (B5-B0) | DB-Port | MQ Register
Yes | Yes | Yes
Available Destination Operands
RF (C5-C0) | RF (B5-B0) | Y-Port
Yes | No | Yes
Shift Operations
ALU | MQ
Yes | Yes
Control/Data Signals
Signal | User Programmable | Use
SSF | No | Affect shift instructions programmed in bits I7-I4 of instruction field.
SIO0 | No | Inactive
SIO1 | No | Inactive
SIO2 | No | Inactive
SIO3 | No | Inactive
Cn | Yes | Increments sum if set to one.
Status Signals†
ZERO = 1 if result = 0
N = 1 if MSB = 1
OVR = 1 if signed arithmetic overflow
C = 1 if carry-out = 1
†C is ALU carry out and is evaluated before shift operation. ZERO and N (negative) are evaluated
after shift operation. OVR (overflow) is evaluated after ALU operation and after shift operation.
EXAMPLES (assumes a 32-bit configuration)
Add data in register 1 to data on the DB bus with carry-in and pass the result to the
MQ register.
Instr Code I7-I0 | Oprd Addr A5-A0 | Oprd Addr B5-B0 | Oprd Sel EA | Oprd Sel EB1-EB0 | Dest Addr C5-C0 | SELMQ | WE3-WE0 | SELRF1-SELRF0 | OEA | OEB | OEY3-OEY0 | OES | Cn | CF2-CF0
1110 0001 | 00 0001 | XX XXXX | 0 | 10 | XX XXXX | 0 | 1111 | 10 | X | X | XXXX | 0 | 0 | 110
Assume register file 1 holds 0802C618 (Hex) and DB bus holds 1E007530 (Hex):
Source       0000 1000 0000 0010 1100 0110 0001 1000   R ← RF(1)
Source       0001 1110 0000 0000 0111 0101 0011 0000   S ← DB bus
Destination  0010 0110 0000 0011 0011 1011 0100 1000   MQ register ← R + S + Cn
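The sum and its status flags can be modeled in a few lines. This is a sketch that assumes no shift is applied (result passed unshifted), so all four flags are taken directly from the ALU result; the function name and the dictionary layout are illustrative choices.

```python
def add_with_carry(r, s, cn, width=32):
    """Compute R + S + Cn and the ZERO, N, OVR, C flags for an unshifted result."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    r &= mask
    s &= mask
    total = r + s + (cn & 1)
    result = total & mask
    status = {
        "ZERO": int(result == 0),
        "N": int(bool(result & sign)),
        # Signed overflow: operands share a sign that differs from the result's.
        "OVR": int(bool(~(r ^ s) & (r ^ result) & sign)),
        "C": total >> width,
    }
    return result, status
```

With the operands of the example above (0802C618 and 1E007530, Cn low), the sketch returns 26033B48 (Hex) with all four flags clear.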
ADDI
ADD Immediate
Opcode: 68
FUNCTION
Adds four-bit immediate data on A3-A0 with carry to S-bus data.
DESCRIPTION
Immediate data in the range 0 to 15, supplied by the user at A3-A0, is added with
carry to S.
Available R Bus Source Operands (Constant)
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
No | Yes | No | No
Available S Bus Source Operands
Logically AND the contents of register 3 and register 5 and store the result
in register 5.
Instr Code I7-I0 | Oprd Addr A5-A0 | Oprd Addr B5-B0 | Oprd Sel EA | Oprd Sel EB1-EB0 | Dest Addr C5-C0 | SELMQ | WE3-WE0 | SELRF1-SELRF0 | OEA | OEB | OEY3-OEY0 | OES | Cn | CF2-CF0
1111 1010 | 00 0011 | 00 0101 | 0 | 00 | 00 0101 | 0 | 0000 | 10 | X | X | XXXX | 0 | X | 110
Assume register file 3 holds F617D840 (Hex) and register file 5 holds 15F6D842 (Hex):
Source       1111 0110 0001 0111 1101 1000 0100 0000   R ← RF(3)
Source       0001 0101 1111 0110 1101 1000 0100 0010   S ← RF(5)
Destination  0001 0100 0001 0110 1101 1000 0100 0000   RF(5) ← R AND S
ANDNR
Logic AND Negative R (R' AND S)
Opcode: * E
FUNCTION
Computes the logical expression S AND NOT R.
DESCRIPTION
The logical expression S AND NOT R is computed. The result appears at the ALU and
MQ shifters.
*The result of this instruction can be shifted in the same microcycle by specifying a shift instruction in the
upper nibble (I7-I4) of the instruction field. The result may also be passed without shift. Possible instructions
are listed in Table 15.
Available R Bus Source Operands
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
Yes | No | Yes | No
Available S Bus Source Operands
RF (B5-B0) | DB-Port | MQ Register
Yes | Yes | Yes
Available Destination Operands
RF (C5-C0) | RF (B5-B0) | Y-Port
Yes | No | Yes
Shift Operations
ALU | MQ
Yes | Yes
Control/Data Signals
Signal | User Programmable | Use
SSF | No | Affect shift instructions programmed in bits I7-I4 of instruction field.
SIO0 | No | Inactive
SIO1 | No | Inactive
SIO2 | No | Inactive
SIO3 | No | Inactive
Cn | No | Inactive
Status Signals†
ZERO = 1 if result = 0
N = 0
OVR = 0
C = 0
†C is ALU carry out and is evaluated before shift operation. ZERO and N (negative) are evaluated
after shift operation. OVR (overflow) is evaluated after ALU operation and after shift operation.
EXAMPLE (assumes a 32-bit configuration)
Invert the contents of register 3, logically AND the result with data in register 5
and store the result in register 10.
Instr Code I7-I0 | Oprd Addr A5-A0 | Oprd Addr B5-B0 | Oprd Sel EA | Oprd Sel EB1-EB0 | Dest Addr C5-C0 | SELMQ | WE3-WE0 | SELRF1-SELRF0 | OEA | OEB | OEY3-OEY0 | OES | Cn | CF2-CF0
1111 1110 | 00 0011 | 00 0101 | 0 | 00 | 00 1010 | 0 | 0000 | 10 | X | X | XXXX | 0 | X | 110
Assume register file 3 holds 15F6D840 (Hex) and register file 5 holds F617D842 (Hex):
Source       0001 0101 1111 0110 1101 1000 0100 0000   R ← RF(3)
Source       1111 0110 0001 0111 1101 1000 0100 0010   S ← RF(5)
Destination  1110 0010 0000 0001 0000 0000 0000 0010   RF(10) ← R' AND S
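The two logical operations shown in the AND and ANDNR examples reduce to single bitwise expressions. The sketch below, with hypothetical function names, reproduces the register values quoted in those examples for a 32-bit word.

```python
MASK32 = 0xFFFFFFFF

def logic_and(r, s):
    """R AND S on 32-bit operands."""
    return (r & s) & MASK32

def logic_andnr(r, s):
    """R' AND S: invert R, then AND with S."""
    return (~r & s) & MASK32
```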
BADD
Byte Add R to S with Carry
Opcode: 88
FUNCTION
Adds S with carry-in to a selected byte or selected adjacent bytes of R.
DESCRIPTION
SIO3-SIO0 are used to select bytes of R to be added to the corresponding bytes of
S. A byte of R with SIO programmed low is selected for the computation of
R + S + Cn. If the SIO signal for a byte of R is left high, the corresponding byte
of S is passed unaltered. Multiple bytes can be selected only if they are adjacent to
one another. At least one byte must be nonselected.
Available R Bus Source Operands
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
Yes | No | Yes | No
Available S Bus Source Operands
RF (B5-B0) | DB-Port | MQ Register
Yes | Yes | Yes
Available Destination Operands
RF (C5-C0) | RF (B5-B0) | Y-Port
Yes | No | Yes
Shift Operations
ALU | MQ
None | None
Control/Data Signals
Signal | User Programmable | Use
SSF | No | Inactive
SIO0 | Yes | Byte select
SIO1 | Yes | Byte select
SIO2 | Yes | Byte select
SIO3 | Yes | Byte select
Cn | Yes | Propagates through nonselected bytes; increments selected byte(s) if programmed high.
Status Signals
ZERO = 1 if result (selected bytes) = 0
N = 0
OVR = 1 if signed arithmetic overflow (selected bytes)
C = 1 if carry-out (most significant selected byte) = 1
EXAMPLE (assumes a 32-bit configuration)
Add bytes 1 and 2 of register 3 with carry to the contents of register 1 and store the
result in register 11.
Instr Code I7-I0 | Oprd Addr A5-A0 | Oprd Addr B5-B0 | Oprd Sel EA | Oprd Sel EB1-EB0 | Dest Addr C5-C0 | SELMQ | WE3-WE0 | SELRF1-SELRF0 | OEA | OEB | OEY3-OEY0 | OES | Cn | CF2-CF0 | SIO3-SIO0 | IESIO3-IESIO0
1000 1000 | 00 0011 | 00 0001 | 0 | 00 | 00 1011 | 0 | 0000 | 10 | X | X | XXXX | 0 | 1 | 110 | 1001 | 0000
Assume register file 3 holds 2C018181 (Hex) and register file 1 holds 7A8FBE3E (Hex):
Source       0010 1100 0000 0001 1000 0001 1000 0001   Rn ← RF(3)n
Source       0111 1010 1000 1111 1011 1110 0011 1110   Sn ← RF(1)n
ALU          1010 0110 1001 0001 0100 0000 1100 0000   Fn ← Rn + Sn + Cn
Destination  0111 1010 1001 0001 0100 0000 0011 1110   RF(11)n ← Fn or Sn†
†F = ALU result
n = nth byte
Register file 11 gets F if byte selected, S if byte not selected.
Byte AND R AND S (Byte Logical AND R AND S)
BAND
Opcode: E8
FUNCTION
Evaluates the logical AND of selected bytes of R-bus and S-bus data.
DESCRIPTION
Bytes with their corresponding SIO signals programmed low compute R AND S. Bytes
with SIO signals programmed high pass S unaltered. Multiple bytes can be selected
only if they are adjacent to one another. At least one byte must be nonselected.
Available R Bus Source Operands
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
Yes | No | Yes | No
Available S Bus Source Operands
RF (B5-B0) | DB-Port | MQ Register
Yes | Yes | Yes
Available Destination Operands
RF (C5-C0) | RF (B5-B0) | Y-Port
Yes | No | Yes
Shift Operations
ALU | MQ
None | None
Control/Data Signals
Signal | User Programmable | Use
SSF | No | Forced low
SIO0 | Yes | Byte select
SIO1 | Yes | Byte select
SIO2 | Yes | Byte select
SIO3 | Yes | Byte select
Cn | No | Inactive
Status Signals
ZERO = 1 if result (selected bytes) = 0
N = 0
OVR = 0
C = 0
EXAMPLE (assumes a 32-bit configuration)
Logically AND bytes 1 and 2 of register 3 with input on the DB bus; store the result
in register 3.
Instr Code I7-I0 | Oprd Addr A5-A0 | Oprd Addr B5-B0 | Oprd Sel EA | Oprd Sel EB1-EB0 | Dest Addr C5-C0 | SELMQ | WE3-WE0 | SELRF1-SELRF0 | OEA | OEB | OEY3-OEY0 | OES | Cn | CF2-CF0 | SIO3-SIO0 | IESIO3-IESIO0
1110 1000 | 00 0011 | XX XXXX | 0 | 10 | 00 0011 | 0 | 0000 | 10 | X | X | XXXX | 0 | X | 110 | 1001 | 0000
Assume register file 3 holds 398FBEBE (Hex) and input on the DB port is 4290BFBF
(Hex):
Source       0011 1001 1000 1111 1011 1110 1011 1110   Rn ← RF(3)n
Source       0100 0010 1001 0000 1011 1111 1011 1111   Sn ← DBn
Destination  0100 0010 1000 0000 1011 1110 1011 1111   RF(3)n ← Fn or Sn†
†F = ALU result
n = nth byte
Register file 3 gets F if byte selected, S if byte not selected.
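The byte-select behavior shared by the byte logical instructions (a low SIO input selects a byte for the operation, a high SIO input passes the S byte) can be sketched as below. The function name is hypothetical, and `sio` is assumed to be a four-bit value in SIO3-SIO0 order, as in the microcode examples.

```python
from operator import and_, xor

def byte_logic(op, r, s, sio):
    """Apply op to each byte whose SIO bit is 0; pass the S byte otherwise."""
    result = 0
    for n in range(4):  # byte 0 is the least significant byte
        r_byte = (r >> (8 * n)) & 0xFF
        s_byte = (s >> (8 * n)) & 0xFF
        selected = ((sio >> n) & 1) == 0  # SIOn is active low
        out = (op(r_byte, s_byte) & 0xFF) if selected else s_byte
        result |= out << (8 * n)
    return result
```

Calling `byte_logic(and_, 0x398FBEBE, 0x4290BFBF, 0b1001)` selects bytes 1 and 2 and gives 4280BEBF (Hex); passing `xor` instead models the byte exclusive OR in the same way.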
BCDBIN
BCD to Binary
Opcode: 7F
FUNCTION
Converts a BCD number to binary.
DESCRIPTION
This instruction allows the user to convert an N-digit BCD number to a 4N-bit binary
number in 4(N-1) plus 8 clocks. The instruction sums the R and S buses with carry.
A one-bit arithmetic left shift is performed on the ALU output. A zero is filled into bit 0
of the least significant byte unless SIO0 is set low, which would force bit 0 to one.
Bit 7 of the most significant byte is dropped.
Simultaneously, the contents of the MQ register are rotated one bit to the left. Bit
7 of the most significant byte is rotated to bit 0 of the least significant byte.
Recommended R Bus Source Operands
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
Should be programmed low for proper conversion.
ZERO = 1 if result = 0
N = 1 if MSB = 1
OVR = 1 if signed arithmetic overflow
C = 1 if carry-out = 1
ALGORITHM
The following code converts an N-digit BCD number to a 4N-bit binary number in 4(N-1)
plus 8 clocks. This is one possible user generated algorithm. It employs the standard
conversion formula for a BCD number (shown here for 32 bits):
ABCD = [(A x 10 + B) x 10 + C] x 10 + D.
The conversion begins with the most significant BCD digit. Addition is performed in
radix 2.
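The conversion formula can be sketched in software using only add and shift steps. This sketch assumes that each BCDBIN step computes (R + S) shifted left by one bit, so that two successive steps multiply the interim value by ten; the function names and the register roles (`acc`, `interim`) are illustrative, not taken from the microcode.

```python
MASK32 = 0xFFFFFFFF

def bcdbin_step(r, s):
    """One add-then-shift step: (R + S) shifted left one bit, 32-bit result."""
    return ((r + s) << 1) & MASK32

def bcd_to_binary(bcd, digits):
    """Convert a packed BCD value of `digits` digits, most significant digit first."""
    acc = 0
    for i in range(digits - 1):
        shift = 4 * (digits - 1 - i)
        digit = (bcd >> shift) & 0xF         # extract one digit (mask of 15)
        interim = (acc + digit) & MASK32     # add digit to running total
        acc = bcdbin_step(interim, interim)  # 4 x interim
        acc = bcdbin_step(acc, interim)      # (4 + 1) x 2 = 10 x interim
    return (acc + (bcd & 0xF)) & MASK32      # add the final digit, no multiply
```

For example, `bcd_to_binary(0x1234, 4)` evaluates [(1 x 10 + 2) x 10 + 3] x 10 + 4.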
PSEUDOCODE
LOADMQ NUM                 Load MQ with BCD number.
SUB ACC, ACC, SLCMQ        Clear accumulator; circular left shift MQ.
SUB MSK, MSK, SLCMQ        Clear mask register; circular left shift MQ.
SLCMQ                      Circular left shift MQ.
SLCMQ                      Circular left shift MQ.
ADDI ACC, MSK, 15          Store 15 in mask register.
Repeat N-1 times: (N = number of BCD digits)
AND MQ, MSK, R1, SLCMQ     Extract one digit; circular left shift MQ.
ADD ACC, R1, R1, SLCMQ
Source       0100 0000 1000 1111 1011 1110 1011 1110   Sn ← RF(7)n
ALU          0100 0000 1000 1111 1011 1111 1011 1110   Fn ← Sn +
Destination  0100 0000 1000 1111 1011 1111 1011 1110   RF(2)n ←
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
Yes | No | Yes | No
Available S Bus Source Operands
RF (B5-B0) | DB-Port | MQ Register
Yes | Yes | Yes
Available Destination Operands
RF (C5-C0) | RF (B5-B0) | Y-Port
Yes | No | Yes
Shift Operations
ALU | MQ
None | None
Control/Data Signals
Signal | User Programmable | Use
SSF | No | Inactive
SIO0 | Yes | Byte select
SIO1 | Yes | Byte select
SIO2 | Yes | Byte select
SIO3 | Yes | Byte select
Cn | Yes | Propagates through nonselected bytes; should be set high for two's complement subtraction.
BSUBR
Byte Subtract R from S with Carry
Opcode: A8
Status Signals
ZERO = 1 if result (selected bytes) = 0
N = 0
OVR = 1 if signed arithmetic overflow (selected bytes)
C = 1 if carry-out (most significant selected byte)
EXAMPLE (assumes a 32-bit configuration)
Subtract bytes 1 and 2 of register 1 with carry from bytes 1 and 2 of register 3.
Concatenate with bytes 0 and 3 of register 3, storing the result in register 11.
Instr Code I7-I0 | Oprd Addr A5-A0 | Oprd Addr B5-B0 | Oprd Sel EA | Oprd Sel EB1-EB0 | Dest Addr C5-C0 | SELMQ | WE3-WE0 | SELRF1-SELRF0 | OEA | OEB | OEY3-OEY0 | OES | Cn | CF2-CF0 | SIO3-SIO0 | IESIO3-IESIO0
1010 1000 | 00 0001 | 00 0011 | 0 | 00 | 00 1011 | 0 | 0000 | 10 | X | X | XXXX | 0 | 1 | 110 | 1001 | 0000
Assume register file 1 holds 091B5858 (Hex) and register file 3 holds 703A9898 (Hex):
Source       0000 1001 0001 1011 0101 1000 0101 1000   Rn ← RF(1)n
Source       0111 0000 0011 1010 1001 1000 1001 1000   Sn ← RF(3)n
ALU          0110 0111 0001 1111 0100 0000 0100 0000   Fn ← R'n + Sn + Cn
Destination  0111 0000 0001 1111 0100 0000 1001 1000   RF(11)n ← Fn or Sn†
†F = ALU result
n = nth package
Register file 11 gets F if byte selected, S if byte not selected.
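A byte-wise model of R' + S + Cn may clarify how the carry interacts with byte selection. The sketch assumes that the carry-in passes unchanged through nonselected bytes and ripples between adjacent selected bytes; that carry model and the function name are assumptions made for illustration.

```python
def byte_subtract_r_from_s(r, s, sio, cn=1):
    """Compute S - R (as R' + S + Cn) in bytes whose SIO bit is 0; pass S elsewhere."""
    result = 0
    carry = cn & 1
    for n in range(4):  # byte 0 is the least significant byte
        r_byte = (r >> (8 * n)) & 0xFF
        s_byte = (s >> (8 * n)) & 0xFF
        if ((sio >> n) & 1) == 0:            # SIOn low: byte selected
            total = s_byte + (r_byte ^ 0xFF) + carry
            carry = total >> 8               # ripple into the next selected byte
            out = total & 0xFF
        else:
            out = s_byte                     # carry is assumed to pass through
        result |= out << (8 * n)
    return result
```

With R = 091B5858, S = 703A9898, SIO3-SIO0 = 1001 and Cn high, the sketch gives 701F4098 (Hex).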
BSUBS
Byte Subtract S from R with Carry
Opcode: 98
FUNCTION
Subtracts S from R in selected bytes.
DESCRIPTION
Bytes with SIO inputs programmed low compute R + S' + Cn. Bytes with SIO inputs
programmed high pass S unaltered. Multiple bytes can be selected only if they are
adjacent to one another. At least one byte must be nonselected.
Available R Bus Source Operands
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
Yes | No | Yes | No
Available S Bus Source Operands
RF (B5-B0) | DB-Port | MQ Register
Yes | Yes | Yes
Available Destination Operands
RF (C5-C0) | RF (B5-B0) | Y-Port
Yes | No | Yes
Shift Operations
ALU | MQ
None | None
Control/Data Signals
Signal | User Programmable | Use
SSF | No | Inactive
SIO0 | Yes | Byte select
SIO1 | Yes | Byte select
SIO2 | Yes | Byte select
SIO3 | Yes | Byte select
Cn | Yes | Propagates through nonselected bytes; should be set high for two's complement subtraction.
Status Signals
ZERO = 1 if result (selected bytes) = 0
N = 0
OVR = 1 if signed arithmetic overflow (selected bytes)
C = 1 if carry-out (most significant selected byte)
EXAMPLE (assumes a 32-bit configuration)
Subtract bytes 1 and 2 of register 3 with carry from bytes 1 and 2 of register 1.
Concatenate with bytes 0 and 3 of register 3, storing the result in register 11.
Instr Code I7-I0 | Oprd Addr A5-A0 | Oprd Addr B5-B0 | Oprd Sel EA | Oprd Sel EB1-EB0 | Dest Addr C5-C0 | SELMQ | WE3-WE0 | SELRF1-SELRF0 | OEA | OEB | OEY3-OEY0 | OES | Cn | CF2-CF0 | SIO3-SIO0 | IESIO3-IESIO0
1001 1000 | 00 0001 | 00 0011 | 0 | 00 | 00 1011 | 0 | 0000 | 10 | X | X | XXXX | 0 | 1 | 110 | 1001 | 0000
Assume register file 1 holds 5288B8B8 (Hex) and register file 3 holds 143A9898 (Hex):
Source       0101 0010 1000 1000 1011 1000 1011 1000   Rn ← RF(1)n
Source       0001 0100 0011 1010 1001 1000 1001 1000   Sn ← RF(3)n
ALU          0011 1110 0100 1110 0010 0000 0010 0000   Fn ← Rn + S'n + Cn
Destination  0101 0010 0100 1110 0010 0000 1011 1000   RF(11)n ← Fn or Sn†
†F = ALU result
n = nth byte
Register file 11 gets F if byte selected, S if byte not selected.
Byte XOR R and S
(Byte Exclusive OR R and S)
BXOR
Opcode: D8
FUNCTION
Evaluates R exclusive OR S in selected bytes.
DESCRIPTION
Bytes with SIO inputs programmed low evaluate R exclusive OR S. Bytes with SIO
inputs programmed high pass S unaltered. Multiple bytes can be selected only if they
are adjacent to one another. At least one byte must be nonselected.
Available R Bus Source Operands
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
Yes | No | Yes | No
Available S Bus Source Operands
RF (B5-B0) | DB-Port | MQ Register
Yes | Yes | Yes
Available Destination Operands
RF (C5-C0) | RF (B5-B0) | Y-Port
Yes | No | Yes
Shift Operations
ALU | MQ
None | None
Control/Data Signals
Signal | User Programmable | Use
SSF | No | Inactive
SIO0 | Yes | Byte select
SIO1 | Yes | Byte select
SIO2 | Yes | Byte select
SIO3 | Yes | Byte select
Cn | No | Inactive
Status Signals
ZERO = 1 if result (selected bytes) = 0
N = 0
OVR = 0
C = 0
EXAMPLE (assumes a 32-bit configuration)
Exclusive OR bytes 1 and 2 of register 6 with bytes 1 and 2 on the DB bus; concatenate
the result with DB bytes 0 and 3, storing the result in register 10.
Instr Code I7-I0 | Oprd Addr A5-A0 | Oprd Addr B5-B0 | Oprd Sel EA | Oprd Sel EB1-EB0 | Dest Addr C5-C0 | SELMQ | WE3-WE0 | SELRF1-SELRF0 | OEA | OEB | OEY3-OEY0 | OES | Cn | CF2-CF0 | SIO3-SIO0 | IESIO3-IESIO0
1101 1000 | 00 0110 | XX XXXX | 0 | 10 | 00 1010 | 0 | 0000 | 10 | X | X | XXXX | 0 | 1 | 110 | 1001 | 0000
Assume register file 6 holds 938FBEBE (Hex) and the DB bus holds 4190BEBE (Hex):
Source       1001 0011 1000 1111 1011 1110 1011 1110   Rn ← RF(6)n
Source       0100 0001 1001 0000 1011 1110 1011 1110   Sn ← DBn
Destination  0100 0001 0001 1111 0000 0000 1011 1110   RF(10)n ← Fn or Sn†
†F = ALU result
n = nth package
Register file 10 gets F if byte selected, S if byte not selected.
I F It
1
CLEAR
FUNCTION
Forces ALU output to zero and clears the BCD flip-flops.
DESCRIPTION
ALU output is forced to zero and the BCD flip-flops are cleared.
†This instruction may also be coded with the following opcodes:
[2][F], [3][F], [4][F], [6][F], [B][F], [C][F], [E][F]
Available R Bus Source Operands
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
No | No | No | No
Available S Bus Source Operands
RF (B5-B0) | DB-Port | MQ Register
No | No | No
Available Destination Operands
RF (C5-C0) | RF (B5-B0) | Y-Port
Yes | No | Yes
Shift Operations
ALU | MQ
None | None
Status Signals
ZERO = 1
N = 0
OVR = 0
Cn = 0
CLR
CRC
Cyclic Redundancy Character Accumulation
Opcode: 00
FUNCTION
Evaluates R exclusive OR S for use with cyclic redundancy check codes.
DESCRIPTION
Data on the R bus is exclusive ORed with data on the S bus. If MQ0 XNORed with
S0 is zero (MQ0 is the LSB of the MQ register and S0 is the LSB of S-bus data), the
result is sent to the ALU shifter. Otherwise, data on the S bus is sent to the ALU shifter.
A right shift is performed; the MSB is filled with R0 (MQ0 XOR S0), where R0 is the
LSB of R-bus data. A circular right shift is performed on MQ data.
Recommended R Bus Source Operands
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
Yes | No | No | No
Recommended S Bus Source Operands
RF (B5-B0) | DB-Port | MQ Register
Yes | Yes | No
Recommended Destination Operands
RF (C5-C0) | RF (B5-B0) | Y-Port
Yes | No | No
Shift Operations
ALU | MQ
Right | Right
Control/Data Signals
Signal | User Programmable | Use
SSF | No | Inactive
SIO0 | No | Inactive
SIO1 | No | Inactive
SIO2 | No | Inactive
SIO3 | No | Inactive
Cn | No | Inactive
Status Signals
ZERO = 1 if result = 0
N = 0
OVR = 0
Cn = 0
CYCLIC REDUNDANCY CHARACTER CHECK
DESCRIPTION
Serial binary data transmitted over a channel is susceptible to error bursts. These bursts
may be detected and corrected by standard encoding methods such as cyclic
redundancy check codes, fire codes, or computer generated codes. These codes all
divide the message vector by a generator polynomial to produce a remainder that
contains parity information about the message vector.
If a message vector of m bits, a(x), is divided by a generator polynomial, g(x), of order
k-1, a k bit remainder, r(x), is formed. The code vector, c(x), consisting of m(x) and
r(x) of length n = m + k is transmitted down the channel. The receiver divides the
received vector by g(x).
After m divide iterations, r(x) will be regenerated only if there is no error in the message
bits. After k more iterations, the result will be zero if and only if no error has occurred
in either the message or the remainder.
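The encode-and-check principle described above can be demonstrated with generic modulo-2 polynomial division. This sketch is not a model of the CRC instruction's data path: it processes one bit at a time, and it assumes the generator is written with its leading term explicit and of degree k (so the remainder has k bits). All names are hypothetical.

```python
def poly_mod(value, nbits, gen, k):
    """Remainder of an nbits-long vector divided by generator gen (degree k) over GF(2)."""
    for i in range(nbits - 1, k - 1, -1):
        if (value >> i) & 1:
            value ^= gen << (i - k)  # subtract (XOR) the aligned generator
    return value

def encode(msg, m, gen, k):
    """Append the k-bit remainder to an m-bit message to form the code vector."""
    remainder = poly_mod(msg << k, m + k, gen, k)
    return (msg << k) | remainder

def received_ok(vector, n, gen, k):
    """Receiver check: a zero remainder indicates no detected error."""
    return poly_mod(vector, n, gen, k) == 0
```

As a usage example with the small generator x^3 + x + 1 (binary 1011), the four-bit message 1101 encodes to 1101001, which divides to a zero remainder; flipping any single bit of that vector leaves a nonzero remainder.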
ALGORITHM
An algorithm for a cyclic redundancy character check, using the 'ACT8832 as a
receiver, is given below:
LOADMQ VEC(X)              Load MQ with first 32 message bits of received vector c'(x).
LOAD POLY                  Load register with polynomial g(x).
CLEAR SUM                  Clear register acting as accumulator.
REPEAT (n/32) TIMES:
  SUM = SUM CRC POLY       Perform CRC instruction where R Bus = POLY, S Bus = SUM.
                           Store result in SUM.
  LOADMQ VEC(X)            Load MQ with next 32 message bits of received vector c'(x).
(END REPEAT)
SUM now contains the remainder [r'(x)] of c'(x). A syndrome generation routine may
be called next, if required.
Note that the most significant bit of
$g(x) = g_{k-1}x^{k-1} + g_{k-2}x^{k-2} + \cdots + g_0x^0$
is implied and that POLY(0) is set to zero if the length of g(x) requires fewer bits than
are in the machine word width.
DIVRF
Divide Remainder Fix
Opcode: 40
FUNCTION
Corrects the remainder of a nonrestoring division routine if correction is required.
DESCRIPTION
DIVRF tests the result of the final step in nonrestoring division iteration: SDIVIT (for
signed division) or UDIVIT (for unsigned division). An error in the remainder results
when it is nonzero and the signs of the remainder and the dividend are different.
The R bus must be loaded with the divisor and the S bus with the most significant
half of the previous result. The least significant half is in the MQ register. The Y bus
result must be stored in the register file for use during the subsequent SDIVQF
instruction.
DIVRF tests to determine whether a fix is required and evaluates:
Y ← S + R' + 1 if a fix is necessary
Y ← S + R + 0 if a fix is unnecessary
Overflow is reported to OVR at the end of the division routine (after SDIVQF).
Recommended R Bus Source Operands
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
Yes | No | No | No
Recommended S Bus Source Operands
RF (B5-B0) | DB-Port | MQ Register
Yes | Yes | No
Recommended Destination Operands
RF (C5-C0) | RF (B5-B0) | Y-Port
Yes | No | No
Shift Operations
ALU | MQ
None | None
Control/Data Signals
Signal | User Programmable | Use
SSF | No | Inactive
SIO0 | No | Inactive
SIO1 | No | Inactive
SIO2 | No | Inactive
SIO3 | No | Inactive
Cn | Yes | Should be programmed high
Status Signals
ZERO = 1 if remainder = 0
N = 0
OVR = 0
Cn = 1 if carry-out = 1
RF (A5-A0) | A3-A0 Immed | DA-Port | C3-C0 :: A3-A0 Mask
No | No | No | No
Recommended S Bus Source Operands (MSH)
RF (B5-B0) | DB-Port | MQ Register
Yes | No | No
Recommended Destination Operands
RF (C5-C0) | RF (B5-B0) | Y-Port
Yes | No | No
Shift Operations (conditional)
ALU | MQ
Left | Left
DNORM
Double-Length Normalize
Opcode: 30
Control/Data Signals
Signal | User Programmable | Use
SSF | No | Inactive
SIO0 | Yes | When low, selects a one end-fill bit in LSB
SIO1 | No | Passes internally generated end-fill bits
SIO2 | No |
SIO3 | No |
Cn | No |
Status Signals
ZERO = 1 if result = 0
N = 1 if MSB = 1
OVR = 1 if MSB XOR 2nd MSB
Cn = 0
EXAMPLE (assumes a 32-bit configuration)
Normalize a double-precision number.
(This example assumes that the MSH of the number to be normalized is in register 3
and the LSH is in the MQ register. The zero on the OVR pin at the end of the instruction
cycle indicates that normalization is not complete and the instruction should be
repeated).
Instr Code I7-I0 | Oprd Addr A5-A0 | Oprd Addr B5-B0 | Oprd Sel EA | Oprd Sel EB1-EB0 | Dest Addr C5-C0 | SELMQ | WE3-WE0 | SELRF1-SELRF0 | OEA | OEB | OEY3-OEY0 | OES | Cn | CF2-CF0
0011 0000 | XX XXXX | 00 0011 | X | 00 | 00 0011 | 0 | 0000 | 10 | X | X | XXXX | 0 | X | 110
Assume register file 3 holds FA75D84E (Hex) and MQ register holds 37F6D843 (Hex):
Source       1111 1010 0111 0101 1101 1000 0100 1110   ALU shifter ← RF(3)
Source       0011 0111 1111 0110 1101 1000 0100 0011   MQ shifter ← MQ register
Destination  1111 0100 1110 1011 1011 0000 1001 1101   RF(3) ← Result (MSH)
Destination  0110 1111 1110 1101 1011 0000 1000 0110   MQ register ← Result (LSH)
OVR ← 0†
†Normalization not complete at the end of this instruction cycle.
DUMPFF
Output Divide/BCD Flip-Flops
Opcode: 5F
FUNCTION
Output contents of the divide/BCD flip-flops.
DESCRIPTION
The contents of the divide/BCD flip-flops are passed through the MQ register to the
Y output multiplexer.
Available R Bus Source Operands
C3-CO
en
2
'-I
RF
A3-AO
(A5-AO) Immed
DA-Port
..
A3-AO
Mask
No
No
No
No
~
»(")
-f
CO
CO
W
N
Available S Bus Source Operands
RF
MQ
DB-Port
(B5-BO)
Register
No
No
No
Available Destination Operands
RF
RF
(C5-CO) (B5-BO)
No
No
Status Signals
IZER~
=
=
0
0
OVR = 0
Cn
3-92
= 0
Shift Operations
V-Port
ALU
MQ
Ves
None
None
DUMPFF
I5IFI
Output Divide/BCD Flip-Flops
EXAMPLES (assumes a 32-bit configuration)
Dump divide/BCD flip-flops to Y output.
Oprd
Addr
Instr
Code
17-10
0101 1111
A5-AO
XX
Oprd
Addr
B5-BO
Oprd Sel
EB1EAEBO
Dest
Addr
C5-CO
xxxx xx xxxx x xx xx
-WE3-
Destination Selects
SELRF1-
SELMa WEO SELRFO
XXXX
1
XXXX
XX
0EY3'15EA 15Es 0Ev0 DES en
x
x
0000
x
X
CF2CFO
110
Assume divide/BCD flip-flops contain 2A055470 (Hex):
Source
0010101000000101 0101 01000111 0000
I MQ register
+-
Destination
0010101000000101 0101 01000111 0000
I Y output
MQ register
+-
Divide/BCD flip-flops
N
M
00
00
~
(.)
«qr"
2
en
3-93
I8 IF
Excess-3 Byte Correction
EX3BC
FUNCTION
Corrects the result of excess-3 addition or subtraction in selected bytes.
DESCRIPTION
This instruction corrects excess-3 additions or subtractions in the byte mode. For
correct excess-3 arithmetic, this instruction must follow each add or subtract. The
operand must be on the S bus.
Data on the S bus is added to a constant on the R bus determined by the state of
the BCD flip-flops and the previous overflow condition reported on the SSF pin. Bytes with
SIO inputs programmed low evaluate the correct excess-3 representation. Bytes with
SIO inputs programmed high or floating pass S unaltered.
Available R Bus Source Operands
C3-CO
A3-AO
RF
(A5-AO) Immed
DA-Port
..
A3-AO
Mask
No
No
No
No
Available S Bus Source Operands
MQ
RF
DB-Port
(B5-BO)
Register
Yes
No
No
Available Destination Operands
RF
RF
(C5-CO) (B5-BO)
Yes
Shift Operations
Y-Port
ALU
MQ
No
No
No
No
Control/Data Signals
Signal   User Programmable   Use
SSF      No                  Inactive
SIO0     Yes                 Byte select
SIO1     Yes                 Byte select
SIO2     Yes                 Byte select
SIO3     Yes                 Byte select
Cn       No                  Inactive
3-94
EX3BC
F
8
Excess-3 Byte Correction
Status Signals
ZERO
o
N
o
if arithmetic signed overflow
OVR
if carry-out = 1
Cn
EXAMPLE (assumes a 32-bit configuration)
Add two BCD numbers and store the sum in register 3. Assume data comes in on
DB bus.
1. Clear accumulator (SUB ACC, ACC)
2. Store 33 (Hex) in all bytes of register (SET1 R2, H/33/)
3. Add 33 (Hex) to selected bytes of first BCD number (BADD DB, R2, R1)
4. Add 33 (Hex) to selected bytes of second BCD number (BADD DB, R2, R3)
5. Add selected bytes of registers 1 and 3 (BADD, R1, R3, R3)
6. Correct the result (EX3BC, R3, R3)
Instr
Op.d
Op.d
Op.d Sol
Dest
Code
Add.
Add.
EB1-
Add.
-WE3-
17-10
AS-AD
BS-8O,
CS-CO
SELMQ WEO SELRFO
Eli EBO
XX XXXX 0
00001000 00 0010 XX XXXX 0
10001000 00 0010 XX XXXX 0
1000 1000 00 0010 XX XXXX 0
XX
XX
000010
10
1000 1000 000001
11110010 00 0010
1000 1111
000011
XX XXXX 000011
SELRF1-
0
0000
10
00 0010
0
0000
10
00 0001
0
0000
10
10
000011
0
0000
10
00
000011
0
0000
10
X 00
000011
0
0000
10
0
- --
Destination Selects
0eYa0eA 0Eii iiEYO 0eS
X
X
X
X
X
X
X XXXX
X XXXX
X XXX X
X XXX X
X XXXX
X XXXX
CF2- 5103- IESI03Cn CFO
SiOo iEsiOo
1
110
0
XXXX XXXX
X 110 XXXX XXXX
0
0
110 1100
0000
0
0
110 1100
0000
0
0
110 1100
0000
0
0
110 1100
0000
0
Assume DB bus holds 51336912 at third instruction and 34867162 at fourth
instruction.
000000000000 0000 0000 0000 0000 0000
I
RF(2)
+-
0
2
0000 0000 0000 0000 0011 0011 0011 0011
RF(2)
+-
00003333 (Hex)
3
01010001001100111001110001000101
RF(1)
+-
RF(2) +DB
4
0011 0100 1000 0110 1010 0100 1001 0101
RF(3)
+-
RF(2)
5
0011 010010000110010000001101 1010
I
6
0011 0100 1000 0110 0100 0000 0111 0100
I RF(3)n
RF(3)n
+ DB
+ RF(3)n
+-
RF(1)n
+-
Corrected RF(3)n result
3-95
I9 IF
Excess-3 Word Correction
EX3C
FUNCTION
Corrects the result of excess-3 addition or subtraction.
DESCRIPTION
This instruction corrects excess-3 additions or subtractions in the word mode. For
correct excess-3 arithmetic, this instruction must follow each add or subtract. The
operand must be on the S bus.
Data on the S bus is added to a constant on the R bus determined by the state of
the BCD flip-flops and the previous overflow condition reported on the SSF pin.
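The digit-level effect of an excess-3 add followed by a correction can be sketched in software. This is a minimal illustration of standard excess-3 arithmetic, not a model of the device internals: the helper names (`to_excess3`, `excess3_add`, `from_excess3`) are hypothetical, and the sketch assumes the correction constant is +3 for a digit that produced a carry and -3 for a digit that did not, merging the add and the correction into one function.

```python
def to_excess3(bcd, ndigits=8):
    """Add the excess-3 bias (3) to every digit of a packed BCD word."""
    out = 0
    for i in range(ndigits):
        digit = (bcd >> (4 * i)) & 0xF
        out |= ((digit + 3) & 0xF) << (4 * i)
    return out


def from_excess3(x, ndigits=8):
    """Remove the excess-3 bias from every digit, giving packed BCD."""
    out = 0
    for i in range(ndigits):
        digit = (x >> (4 * i)) & 0xF
        out |= ((digit - 3) & 0xF) << (4 * i)
    return out


def excess3_add(a, b, ndigits=8):
    """Add two excess-3 words and correct each digit.

    Returns (corrected excess-3 sum, carry out of the top digit).
    Assumption: +3 is applied where a digit carried, -3 where it did not.
    """
    result, carry = 0, 0
    for i in range(ndigits):
        total = ((a >> (4 * i)) & 0xF) + ((b >> (4 * i)) & 0xF) + carry
        carry = total >> 4
        nibble = total & 0xF
        nibble = (nibble + 3) & 0xF if carry else (nibble - 3) & 0xF
        result |= nibble << (4 * i)
    return result, carry


# Four-digit illustration: BCD 6912 + BCD 7162
a = to_excess3(0x6912, 4)
b = to_excess3(0x7162, 4)
total, carry = excess3_add(a, b, 4)
bcd_sum = from_excess3(total, 4)
```

The final bias removal corresponds to the "subtract the excess-3 bias" step that is needed when a plain BCD result is wanted.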
Available R Bus Source Operands
C3-CO
RF
A3-AO
(A5-AO) Immed
DA-Port
..
A3-AO
Mask
No
No
No
No
Available S Bus Source Operands
RF
MQ
DB-Port
(B5-BO)
Register
Yes
No
No
Available Destination Operands
RF
RF
(C5-CO) (B5-80)
Yes
No
Shift Operations
Y-Port
ALU
MQ
Yes
No
No
Control/Data Signals
Signal   User Programmable   Use
SSF      No                  Inactive
SIO0     No                  Inactive
SIO1     No                  Inactive
SIO2     No                  Inactive
SIO3     No                  Inactive
Cn       No                  Inactive
3-96
EX3C
I9IF
Excess-3 Word Correction
Status Signals
o
ZERO
N
1 if MSB =
if arithmetic signed overflow
OVR
en
if carry-out = 1
EXAMPLE (assumes a 32-bit configuration)
Add two BCD numbers and store the sum in register 3. Assume data comes in on
DA bus.
1. Clear accumulator (SUB ACC, ACC)
2. Store 33 (Hex) in all bytes of register (SET1 R2, H/33/)
3. Add 33 (Hex) to all bytes of first BCD number (ADD DB, R2, R1)
4. Add 33 (Hex) to all bytes of second BCD number (ADD DB, R2, R3)
5. Add the excess-3 data (ADD, R1, R3, R3)
6. Correct the excess-3 result (EX3C, R3, R3)
7. Subtract the excess-3 bias to go to BCD result.
Instr
Oprd
Oprd
Oprd Sel
Code
17-10
Addr
A5-AO
Addr
B5-80
EA EBO
11110010
00 0010
00001000
11110001
11110001
111.10001
000010
000010
000010
000001
1001 1111 XX
11110010 000010
xxx x
EB1-
Dest
Addr
C5-CO
Destination Selects
WE3- SELRF1SELMa
0
0
xx
xx
000010
000010
0
0
0
0
10
10
000001
000011
0
0
000011
000011
0
00
X 00
000011
000011
000011
0
000011
0
0
0
XX
XX
XX
XX
XXXX
XXXX
XXXX
XXXX
00
WEO
SELRFO
10
0000
0000
10
10
0000
10
0000
0000
0000
0000
10
10
10
X
X
X
X
X
X
X
....
U
(")
-I
00
CO
Co\)
N
3-110
CF2-
0eY0 0eS
RF(1)
R +
en
0
Cn CFO
0 110
MQSLC
Pass (Y ← F) with Circular Left MQ Shift
I0 I
*
I
FUNCTION
Passes the result of the ALU instruction specified in the lower nibble of the instruction
field to Y MUX. Performs a circular left shift on MQ.
DESCRIPTION
The result of the arithmetic or logical operation specified in the lower nibble of the
instruction field (I3-I0) is passed unshifted to Y MUX.
The contents of the MQ register are rotated one bit to the left. The MSB is rotated
out and passed to the LSB of the same word, which may be 1, 2, or 4 bytes long.
The shift may be made conditional on SSF. If SSF is high or floating, the shift result
will be sent to the MQ register. If SSF is low, the MQ register will not be altered.
* A list of ALU operations that can be used with this instruction is given in Table 15.
Shift Operations
Available Destination Operands IALU Shifter)
RF
RF
(C5-CO)
185-80)
Yes
No
Y-Port
Yes
Control/Data Signals
Signal   User Programmable   Use
SSF      Yes                 Passes shift result if high or floating; retains MQ without shift if low.
SIO0     No                  Inactive
SIO1     No                  Inactive
SIO2     No                  Inactive
SIO3     No                  Inactive
Cn       No                  Affects arithmetic operation programmed in bits I3-I0 of instruction field.
3-111
10 I *
Pass (Y ← F) with Circular Left MQ Shift
MQSLC
Status Signals†
ZERO = 1 if result = 0
N    = 1 if MSB of result = 1
       0 if MSB of result = 0
OVR  = 1 if signed arithmetic overflow
C    = 1 if carry-out = 1
†C is ALU carry-out and is evaluated before shift operation. ZERO and N (negative)
are evaluated after shift operation. OVR (overflow) is evaluated after ALU operation
and after shift operation.
EXAMPLE (assumes a 32-bit configuration)
Add data in register 1 to data on the DB bus with carry-in and store the unshifted
result in register 1. Circular shift the contents of the MQ register one bit to the left.
Inst'
Code
17·10
11010001
Op,d
Add,
A5·AO
00 0001
Op,d
Add,
B5·BO
XX
Op,d Sel
Dest
Add,
EB1·
EAEBO
C5·CO
10 00 0001
xxxx a
Destination Selects
SELRF1·
SELMa
SELRFO i5EA 0Ee
10
X
X
0
0000
WE3.
WEO
Offi·
0EY0 OES
xxxx a
CF2·
Cn CFO
I
110
Assume register file 1 holds 2508C618 (Hex), DB bus holds 11007530 (Hex), and
MQ register holds 4DA99A0E (Hex).
Source        0010 0101 0000 1000 1100 0110 0001 1000    R ← RF(1)
Source        0001 0001 0000 0000 0111 0101 0011 0000    S ← DB bus
Destination   0011 0110 0000 1001 0011 1011 0100 1001    RF(1) ← R + S + Cn
Source        0100 1101 1010 1001 1001 1010 0000 1110    MQ shifter ← MQ register
Destination   1001 1011 0101 0011 0011 0100 0001 1100    MQ register ← MQ shifter
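The arithmetic in this example can be checked with a short sketch. It assumes a 32-bit configuration and an ADD with carry-in in the lower nibble; the function names `rotl32` and `mqslc_add` are illustrative only and are not part of the device documentation.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x):
    """Rotate a 32-bit word one bit to the left (MSB wraps to LSB)."""
    return ((x << 1) | (x >> 31)) & MASK32


def mqslc_add(r, s, cn, mq, ssf=1):
    """Return (unshifted ALU result R + S + Cn, new MQ contents).

    MQ is rotated only when ssf is 1 (high or floating); otherwise it is kept.
    """
    f = (r + s + cn) & MASK32
    new_mq = rotl32(mq) if ssf else mq
    return f, new_mq


f, mq = mqslc_add(0x2508C618, 0x11007530, 1, 0x4DA99A0E)
print(f"{f:08X} {mq:08X}")
```

With SSF low the same call leaves MQ at its original value while the ALU result is still passed.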
MQSLL
Pass (Y ← F) with Logical Left MQ Shift
FUNCTION
Passes the result of the ALU instruction specified in the lower nibble of the instruction
field to Y MUX. Performs a left shift on MQ.
DESCRIPTION
The result of the arithmetic or logical operation specified in the lower nibble of the
instruction field (I3-I0) is passed unshifted to Y MUX.
The contents of the MQ register are shifted one bit to the left. A zero is filled into
the least significant bit of each word unless the SIO input for that word is programmed
low; this will force the least significant bit to one. The MSB is dropped from each word,
which may be 1, 2, or 4 bytes long, depending on the configuration selected.
The shift may be made conditional on SSF. If SSF is high or floating, the shift result
will be sent to the MQ register. If SSF is low, the MQ register will not be altered.
* A list of ALU operations that can be used with this instruction is given in Table 15.
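A minimal sketch of the per-word logical left shift described above, for word lengths of 1, 2, or 4 bytes. The function name, the `fill_ones` parameter, and the word numbering (word 0 is the least significant) are assumptions made for illustration; how the SIO pins map onto words in a given configuration is not modeled here. The input value is arbitrary.

```python
def mq_shift_left_logical(mq, word_bytes=4, fill_ones=(), ssf=1):
    """Shift each word of a 32-bit MQ value left by one bit.

    word_bytes: word length in bytes (1, 2, or 4).
    fill_ones:  indices of words whose LSB is filled with one
                (models an SIO input programmed low for that word).
    ssf:        0 leaves MQ unaltered; 1 (high or floating) applies the shift.
    """
    if not ssf:
        return mq & 0xFFFFFFFF
    bits = 8 * word_bytes
    word_mask = (1 << bits) - 1
    out = 0
    for w in range(4 // word_bytes):
        word = (mq >> (w * bits)) & word_mask
        shifted = (word << 1) & word_mask  # MSB of the word is dropped
        if w in fill_ones:
            shifted |= 1                   # end-fill bit forced to one
        out |= shifted << (w * bits)
    return out


print(f"{mq_shift_left_logical(0x80C0A0F1):08X}")
print(f"{mq_shift_left_logical(0x80C0A0F1, word_bytes=1):08X}")
```

Shorter word lengths drop the MSB of every word independently, which is why the one-byte and four-byte results differ.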
Inst,
Code
17-10
10110001
Op,d
Add,
A5-AO
000001
Op,d
Op,d Sel
Dest
Add,
Add,
EB1B5-BO
EA EBO C5-CO
XX XXXX
0 10 00.. 0001
Destination Selects
SELRF1OffiSELMa WEo SELRFO 0eA 0Eii 0EY0
0
0000
10
X
X
XXXX
WE3-
OES
0
CF2Cn CFO
1 110
Assume register file 1 holds 5608C618 (Hex), DB bus holds 14007530 (Hex), and
MQ register holds 98A99A0E (Hex).
Source        0101 0110 0000 1000 1100 0110 0001 1000    R ← RF(1)
Source        0001 0100 0000 0000 0111 0101 0011 0000    S ← DB bus
Destination   0110 1010 0000 1001 0011 1011 0100 1001    RF(1) ← R + S + Cn
Source        1001 1000 1010 1001 1001 1010 0000 1110    MQ shifter ← MQ register
Destination   0100 1100 0101 0100 1100 1101 0000 0111    MQ register ← MQ shifter
NAND
Logical NAND (R NAND S)
*
IcI
FUNCTION
Evaluates the logical expression R NAND S.
DESCRIPTION
Data on the R bus is NANDed with data on the S bus. The result appears at the ALU
and MQ shifters.
*The result of this instruction can be shifted in the same microcycle by specifying a shift instruction in the
upper nibble (I7-I4) of the instruction field. The result may also be passed without shift. Possible instructions
are listed in Table 15.
Available R Bus Source Operands
RF (A5-A0)   A3-A0 Immed   DA-Port   C3-C0::A3-A0 Mask
Yes          No            Yes       No
Available S Bus Source Operands
en
MQ
RF
DB-Port
(85-BO)
Register
Yes
Yes
Yes
Available Destination Operands
RF
RF
(C5-CO) (B5-BO)
Yes
Y-Port
Yes
No
ALU
MQ
Shifter
Shifter
Yes
Yes
Control/Data Signals
Signal
User
Use
Programmable
SSF
No
Affect shift instructions programmed in bits 17-14 of
5100
No
instruction field.
5101
No
5102
No
5103
No
Cn
Inactive
3-119
I * Ie I
Logical NAND (R NAND S)
NAND
Status Signals†
ZERO = 1 if result = 0
N    = 1 if MSB = 1
OVR  = 0
C    = 0
†C is ALU carry-out and is evaluated before shift operation. ZERO and N (negative)
are evaluated after shift operation. OVR (overflow) is evaluated after ALU operation
and after shift operation.
EXAMPLE (assumes a 32-bit configuration)
Logically NAND the contents of register 3 and register 5, and store the result
in register 5.
Inst'
Code
17-10
Op,d
Add,
Op,d
Op,d Sel
Add,
A5-AO
1111 1100
000011
B5-BO
000101
EB1·
EA EBO
0
00
Dest
Add,
C5-CO
000101
Destination Selects
SELRF1OEY3CF2SELMQ
SELRFO OEA OEB 0eY0 OES Cn CFO
X
X XXXX
X 110
0
0000
10
0
WE3.
'Weii
Assume register file 3 holds 60F6D840 (Hex) and register file 5 holds 13F6D377 (Hex).
Source        0110 0000 1111 0110 1101 1000 0100 0000    R ← RF(3)
Source        0001 0011 1111 0110 1101 0011 0111 0111    S ← RF(5)
Destination   1111 1111 0000 1001 0010 1111 1011 1111    RF(5) ← R NAND S
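The example result, together with the ZERO and N status values it would produce, can be reproduced with a few lines. The function name `nand32` is illustrative, and the status logic simply follows the status-signal definitions given for this instruction (result equal to zero, MSB of result).

```python
MASK32 = 0xFFFFFFFF


def nand32(r, s):
    """Return (R NAND S, ZERO, N) for 32-bit operands."""
    f = ~(r & s) & MASK32
    zero = int(f == 0)   # ZERO = 1 if result = 0
    n = f >> 31          # N = 1 if MSB = 1
    return f, zero, n


f, zero, n = nand32(0x60F6D840, 0x13F6D377)
print(f"{f:08X} ZERO={zero} N={n}")
```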
NOP
No Operation
F
F
FUNCTION
Forces ALU output to zero.
DESCRIPTION
This instruction forces the ALU output to zero. The BCD flip-flops retain their old value.
Note that the clear instruction (CLR) forces the ALU output to zero and clears the BCD
flip-flops.
Available R Bus Source Operands
C3-CO
RF
A3-AO
(A5-AO) Immed
DA-Port
..
A3-AO
N
M
CO
CO
Mask
No
No
No
No
....
()
ct
Available S Bus Source Operands
~
I'
RF
MO
DB-Port
(B5-BO)
Register
No
No
No
Available Destination Operands
RF
RF
(C5-CO) (B5-BO)
Yes
No
Z
en
Shift Operations
Y-Port
ALU
MO
Yes
None
None
Status Signals
IZER~
OVR
C
o
o
o
3-121
IFIF
No Operation
NOP
EXAMPLE (assumes a 32-bit configuration)
Clear register 12.
Inst,
Code
17-10
11111111
Op,d
Add,
AS-AO
XX XXXX
Dl:Istination
en
2:
"~
o
-t
co
co
W
N
3-122
I
Op,d
Md,
B5-6O
XXXX
xx
Op,d Sel
EB1EA EBO
x xx
Destination Selects
Dest
Add,
WE3- SELRF1C5-CO
SELMa WeB' SELRFO OEA DEe
001100
0
0000
10
X
X
0000 0000 0000 0000 0000 0000 0000 0000
I RF(12) -
0
OEY3-
0Ev0
xxxx
CF2OES Cn CFO
X
0
110
Logical NOR (R NOR S)
NOR
*
I0
FUNCTION
Evaluates the logical expression R NOR S.
DESCRIPTION
Data on the R bus is NORed with data on the S bus. The result appears at the ALU
and MQ shifters.
*The result of this instruction can be shifted in the same microcycle by specifying a shift instruction in the
upper nibble (I7-I4) of the instruction field. The result may also be passed without shift. Possible instructions
are listed in Table 15.
Available R Bus Source Operands
C3-CO
RF
A3-AO
DA-Port
(A5-AO) Immed
N
..
('I)
00
00
A3-AO
....
Mask
Yes
No
~
No
Yes
I"
Available S Bus Source Operands
Z
C/)
RF
(B5-BO)
Yes
DB-Port
MO
Register
Yes
Yes
Available Destination Operands
RF
RF
(C5-CO) (85-BO)
Yes
No
Y-Port
Yes
ALU
MO
Shifter
Shifter
Yes
Yes
Control/Data Signals
Signal
User
Use
Programmable
SSF
No
Affect shift instructions programmed in bits 17-14 of
5100
No
instruction field.
5101
No
5102
No
5103
No
Cn
No
Inactive
3-123
I * 10
Logical NOR (R NOR S)
NOR
Status Signals†
ZERO = 1 if result = 0
N    = 1 if MSB = 1
OVR  = 0
C    = 0
†C is ALU carry-out and is evaluated before shift operation. ZERO and N (negative)
are evaluated after shift operation. OVR (overflow) is evaluated after ALU operation
and after shift operation.
EXAMPLE (assumes a 32-bit configuration)
Logically NOR the contents of register 3 and register 5, and store the result
in register 5.
Ins~r
Code
17-10
11111011
Oprd
Addr
A5-AO
000011
Oprd
Addr
B5-BO
000101
Oprd Sel
Dest
EB1Addr
EAEBO
C5-CO
0 00 000101
Destination Se!!3cts
SELRF1SELMO
SELRFO 0eA OEB
X
X
0
0000
10
We3WEci
0eYaOEYO
OES
XXXX
0
CF2Cn CFO
X 110
Assume register file 3 holds 60F6D840 (Hex) and register file 5 holds 13F6D377 (Hex).
Source        0110 0000 1111 0110 1101 1000 0100 0000    R ← RF(3)
Source        0001 0011 1111 0110 1101 0011 0111 0111    S ← RF(5)
Destination   1000 1100 0000 1001 0010 0100 1000 1000    RF(5) ← R NOR S
OR
Logical OR (R OR S)
*
IBI
FUNCTION
Evaluates the logical expression R OR S.
DESCRIPTION
Data on the R bus is ORed with data on the S bus. The result appears at the ALU
and MQ shifters.
*The result of this instruction can be shifted in the same microcycle by specifying a shift instruction in the
upper nibble (I7-I4) of the instruction field. The result may also be passed without shift. Possible instructions
are listed in Table 15.
Available R Bus Source Operands
C3-CO
RF
A3-AO
(AS-AO) Immed
Yes
DA-Port
No
N
M
CO
CO
..
A3-AO
Mask
I-
No
«
q-
Yes
CJ
r-.
Available S Bus Source Operands
2
en
RF
MQ
DB-Port
(BS-BO)
Register
Yes
Yes
Yes
Available Destination Operands
RF
RF
(CS-CO! (BS-BO)
Yes
No
Y-Port
Yes
ALU
MQ
Shifter
Shifter
Yes
Yes
Control/Data Signals
Signal
User
Use
Programmable
SSF
No
Affect shift instructions programmed in bits 17-14 of
5100
No
instruction field.
5101
No
5102
No
5103
No
Cn
No
Inactive
3-12S
Logical OR (R OR S)
OR
Status Signals†
ZERO = 1 if result = 0
N    = 1 if MSB = 1
OVR  = 0
C    = 0
†C is ALU carry-out and is evaluated before shift operation. ZERO and N (negative)
are evaluated after shift operation. OVR (overflow) is evaluated after ALU operation
and after shift operation.
EXAMPLE (assumes a 32-bit configuration)
Logically OR the contents of register 5 and register 3, and store the result in
register 3.
Code
17-10
Oprd
Addr
A5-AO
Oprd
Addr
B5-BO
1111 1011
000101
000011
Instr
Oprd Sel
EA
EB1·
EBO
Dest
Addr
C6-CO
0
00
000011
Destination Selects
SELRF1·
SELMO
SELRFO 0eA OEB
0000
10
X
X
0
"We3.
WeD
om·
0eY0 DES
Cn
XXXX
X
0
CF2·
CFO
110
Assume register file 5 holds 60F6D840 (Hex) and register file 3 holds 13F6D377 (Hex).
Source        0110 0000 1111 0110 1101 1000 0100 0000    R ← RF(5)
Source        0001 0011 1111 0110 1101 0011 0111 0111    S ← RF(3)
Destination   0111 0011 1111 0110 1101 1011 0111 0111    RF(3) ← R OR S
PASS
F
Pass (Y ← F)
FUNCTION
Passes the result of the ALU instruction specified in the lower nibble of the instruction
field to Y MUX.
DESCRIPTION
The result of the arithmetic or logical operation specified in the lower nibble of the
instruction field (I3-I0) is passed unshifted to Y MUX.
* A list of ALU operations that can be used with this instruction is given in Table 15.
Available Destination Operands
RF
RF
(C5-CO)
(85-80)
Yes
No
Y-Port
Yes
ALU
MQ
Shifter
Shifter
None
None
Control/Data Signals
Signal   User Programmable   Use
SSF      No                  Inactive
SIO0     No                  Inactive
SIO1     No                  Inactive
SIO2     No                  Inactive
SIO3     No                  Inactive
Cn       No                  Affects arithmetic operation specified in bits I3-I0 of instruction field.
Status Signals†
ZERO = 1 if result = 0
N    = 1 if MSB of result = 1
       0 if MSB of result = 0
OVR  = 1 if signed arithmetic overflow
C    = 1 if carry-out condition
†C is ALU carry-out and is evaluated before shift operation. ZERO and N (negative)
are evaluated after shift operation. OVR (overflow) is evaluated after ALU operation
and after shift operation.
3-127
PASS
Pass (Y ← F)
EXAMPLE (assumes a 32-bit configuration)
Add data in register 1 to data on the DB bus with carry-in and store the unshifted
result in register 10.
Instr
Op,d
Op,d
Code
17-10
Add,
Add,
Op,d Sel
EB1-
Dest
Add,
A5-AO
B5-BO
Eli: EBO
C5-CO
SELMQ
WeD
SELRF1SELRFO
0eA
QEij
11110001
000001
XX XXXX
00 1010
0
0000
10
X
X
0
10
Destination Selects
WEJ-
0Ev3OEYO 0eS
xxxx 0
CF2Cn CFO
1
110
Assume register file 1 holds 9308C618 (Hex) and DB bus holds 24007530 (Hex).
Source        1001 0011 0000 1000 1100 0110 0001 1000    R ← RF(1)
Source        0010 0100 0000 0000 0111 0101 0011 0000    S ← DB bus
Destination   1011 0111 0000 1001 0011 1011 0100 1001    RF(10) ← R + S + Cn
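The add in this example, along with the four status values defined for PASS, can be sketched as follows. The function name `add_with_status` is illustrative, and the signed-overflow test uses the usual two's complement rule (operands share a sign that the result does not), which is an assumption about how OVR is derived rather than a statement of the device logic.

```python
MASK32 = 0xFFFFFFFF


def add_with_status(r, s, cn):
    """Return (F, status) for F = R + S + Cn on 32-bit operands."""
    total = r + s + cn
    f = total & MASK32
    status = {
        "ZERO": int(f == 0),                       # 1 if result = 0
        "N": f >> 31,                              # MSB of result
        "OVR": ((~(r ^ s) & (r ^ f)) >> 31) & 1,   # signed overflow
        "C": total >> 32,                          # carry-out
    }
    return f, status


f, status = add_with_status(0x9308C618, 0x24007530, 1)
print(f"{f:08X}", status)
```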
SDIVI
Signed Divide Iterate
IAI0 I
FUNCTION
Performs one of N-2 iterations of nonrestoring signed division by a test subtraction
of the N-bit divisor from the 2N-bit dividend. An algorithm using this instruction is
given in the "Other Arithmetic Instructions" section.
DESCRIPTION
SDIVI performs a test subtraction of the divisor from the dividend to generate a quotient
bit. The test subtraction passes if the remainder is positive and fails if negative. If
it fails, the remainder will be corrected during the next instruction.
SDIVI checks the pass/fail result of the test subtraction from the previous instruction,
and evaluates
F ← R + S            if the test fails
F ← R' + S + Cn      if the test passes
A double precision left shift is performed; bit 7 of the most significant byte of the MQ
shifter is transferred to bit 0 of the least significant byte of the ALU shifter. Bit 7 of
the most significant byte of the ALU shifter is lost. The unfixed quotient bit is circulated
into the least significant bit of the MQ shifter.
The R bus must be loaded with the divisor, the S bus with the most significant half
of the result of the previous instruction (SDIVI during iteration or SDIVIS at the beginning
of iteration). The least significant half of the previous result is in the MQ register.
Carry-in should be programmed high. Overflow occurring during SDIVI is reported to OVR
at the end of the signed divide routine (after SDIVQF).
Available R Bus Source Operands
C3-CO
RF
A3-AO
(A5-AO) Immed
Yes
No
DA-Port
Yes
..
A3-AO
Mask
No
Recommended S Bus Source Operands
RF
MQ
D8-Port
(85-80)
Register
Yes
Yes
No
Recommended Destination Operands
RF
RF
(C5-CO) (85-80)
Yes
No
Shift Operations
Y-Port
ALU
MQ
Yes
Left
Left
3-129
~
U
C
(')
-4
CO
CO
Co\)
N
3-134
1 if intermediate result = 0
o
o
1 if carry-out
SOIVIS
SDIVIT
Signed Divide Terminate
I EI0 I
FUNCTION
Solves the final quotient bit during nonrestoring signed division. An
algorithm using this instruction is given in the "Other Arithmetic Instructions" section.
DESCRIPTION
SDIVIT performs the final subtraction of the divisor from the remainder during
nonrestoring signed division. SDIVIT is preceded by N-2 iterations of SDIVI, where
N is the number of bits in the dividend.
The R bus must be loaded with the divisor, and the S bus must be loaded with the
most significant half of the result of the last SDIVI instruction. The least significant
half lies in the MQ register. The Y bus result must be loaded back into the register
file for use in the subsequent DIVRF instruction. Carry-in should be programmed high.
SDIVIT checks the pass/fail result of the previous instruction's test subtraction and
evaluates:
Y ← R + S            if the test fails
Y ← R' + S + Cn      if the test passes
The contents of the MQ register are shifted one bit to the left; the unfixed quotient
bit is circulated into the least significant bit.
Overflow during this instruction is reported to OVR at the end of the signed division
routine (after SDIVQF).
Available R Bus Source Operands
C3-CO
A3-AO
RF
(A5-AO) Immed
DA-Port
..
A3-AO
Mask
Yes
No
Yes
No
Recommended S Bus Source Operands
RF
(B5-BO)
Yes
DB-Port
Yes
MQ
Register
No
Recommended Destination Operands
RF
RF
(C5-CO) (B5-BO)
Yes
No
Shift Operations
Y-Port
ALU
MQ
Yes
Left
Left
IEI0
Signed Divide Terminate
Control/Data Signals
Signal   User Programmable   Use
SSF      No                  Inactive
SIO0     No                  Pass internally generated end-fill bits.
SIO1     No
SIO2     No
SIO3     No
Cn       Yes                 Should be programmed high
Status Signals
ZERO = 1 if intermediate result = 0
N    = 0
OVR  = 0
C    = 1 if carry-out
SDIVIT
SDIVO
Signed Divide Overflow Test
IAI
F
FUNCTION
Tests for overflow during nonrestoring signed division. An algorithm using this
instruction is given in the "Other Arithmetic Instructions" section.
DESCRIPTION
This instruction performs an initial test subtraction of the divisor from the dividend.
If overflow is detected, it is preserved internally and reported at the end of the divide
routine (after SDIVQF). If overflow status is ignored, the SDIVO instruction may be
omitted.
The divisor must be loaded onto the R bus; the most significant half of the previous
SDIVIN result must be loaded onto the S bus. The least significant half is in the MQ
register.
The result on the Y bus should not be stored back into the register file; WE' should
be programmed high.
Carry-in should also be programmed high.
u
C
n
Use
Programmable
-I
CO
CO
Co\)
to..)
3-138
1 if divisor = 0
o
o
1 if carry-out
SDiVO
SDIVQF
Signed Divide Quotient Fix
I5I0I
FUNCTION
Tests the quotient result after nonrestoring signed division and corrects it if necessary.
An algorithm using this instruction is given in the "Other Arithmetic Instructions"
section.
DESCRIPTION
SDIVQF is the final instruction required to compute the quotient of a 2N-bit dividend
by an N-bit divisor. It corrects the quotient if the signs of the divisor and dividend are
different and the remainder is nonzero.
The fix is implemented by incrementing S:
Y ← S + 1      if a fix is required
Y ← S + 0      if no fix is required
The R bus must be loaded with the divisor, and the S bus with the most significant
half of the result of the preceding DIVRF instruction. The least significant half is in
the MQ register.
Source        0000 1111 0000 1111 0000 1111 0000 1111    Rn ← C3-C0::A3-A0
Source        1010 0000 1000 0011 1011 1110 1011 1110    Sn ← RF(1)n
ALU           1010 0000 1000 0011 1011 1111 1011 1110    Fn ← Sn OR Rn
Destination   1010 0000 1000 0011 1011 1111 1011 1110    RF(1)n ← Fn or Sn†
†F = ALU result
n = nth byte
Register file 1 gets F if byte selected, S if byte not selected.
3-146
Si03- iE'Si03SiOo iEsiOo
116 HOl
0000
SLA
Arithmetic Left Single Precision Shift
FUNCTION
Performs arithmetic left shift on result of ALU operation specified in lower nibble of
instruction field.
DESCRIPTION
The result of the ALU operation specified in instruction bits I3-I0 is shifted one bit
to the left. A zero is filled into bit 0 of the least significant byte of each word unless
the SIO input is programmed low; this will force bit 0 to one. Bit 7 is dropped from
the most significant byte in each word, which may be 1, 2, or 4 bytes long, depending
on the configuration selected.
The shift may be made conditional on SSF. If SSF is high or floating, the shift result
will be sent to Y MUX. If SSF is low, the ALU result is passed unshifted.
* A list of ALU operations that can be used with this instruction is given in Table 15.
Shift Operations
ALU Shifter
Arithmetic Left
Available Destination Operands (ALU Shifter)
RF
RF
(C5-CO)
(85-80)
Yes
No
V-Port
Yes
Control/Data Signals
Signal   User Programmable   Use
SSF      Yes                 Passes shift result if high; passes ALU result if low.
SIO0     Yes                 Fills a zero in LSB of each word if high; fills a one in LSB if low.
SIO1     Yes
SIO2     Yes
SIO3     Yes
Cn       No                  Affects arithmetic operation programmed in bits I3-I0 of instruction field.
3-147
SLA
Arithmetic Left Single Precision Shift
Status Signals†
ZERO = 1 if result = 0
N    = 1 if MSB of result = 1
       0 if MSB of result = 0
OVR  = 1 if signed arithmetic overflow or if MSB XOR MSB-1 = 1 before shift
C    = 1 if carry-out condition
†C is ALU carry-out and is evaluated before shift operation. ZERO and N (negative)
are evaluated after shift operation. OVR (overflow) is evaluated after ALU operation
and after shift operation.
EXAMPLE (assumes a 32-bit configuration)
Perform the computation A = 2(A + B), where A and B are single-precision, two's
complement numbers. Let A be stored in register 1 and B be input via the DB bus.
Instr
Oprd
Oprd
Oprd 5el
Dest
Code
Addr
Addr
EB1-
Addr
17-10
AS-AO
BS-BO
01000001
00 0001
XX XXXX
EA EBO
0
10
Destination Selects
WE3- SELRF1-
OEV3-
CF2- S103-
CS-CO
SELMa
WEO
SELRFO
OEA
OEB
OEVO
OES
000001
0
0000
10
X
X
XXXX
0
Cn CFO
0
IESI03-
SiOo iESiOo
110 1110
0000
Assume register file 1 holds 1308C618 (Hex), DB bus holds 44007530 (Hex).
Source                0001 0011 0000 1000 1100 0110 0001 1000    R ← RF(1)
Source                0100 0100 0000 0000 0111 0101 0011 0000    S ← DB bus
Intermediate Result   0101 0111 0000 1001 0011 1011 0100 1000    ALU Shifter ← R + S + Cn
Destination           1010 1110 0001 0010 0111 0110 1001 0001    RF(1) ← ALU shift result
SSF
1
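A short sketch reproduces the SLA example values for a 32-bit word: the ALU sum is formed first, then shifted left with the end-fill bit selected by SIO0. The function name `sla32` and the `sio0_low` flag are illustrative names, not device terminology.

```python
MASK32 = 0xFFFFFFFF


def sla32(f, sio0_low=False, ssf=1):
    """Shift a 32-bit ALU result left by one bit.

    sio0_low: True models SIO0 programmed low, which fills a one into the LSB.
    ssf:      0 passes the ALU result unshifted.
    """
    if not ssf:
        return f & MASK32
    return ((f << 1) & MASK32) | (1 if sio0_low else 0)


alu = (0x1308C618 + 0x44007530 + 0) & MASK32   # R + S + Cn with Cn = 0
print(f"{alu:08X} {sla32(alu, sio0_low=True):08X}")
```

The trailing one in the example destination comes from SIO0 being low in the control word; with SIO0 high the LSB would be zero.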
SLAD
Arithmetic Left Double Precision Shift
15 I
*
I
FUNCTION
Performs arithmetic left shift on MQ register (LSH) and result of ALU operation (MSH)
specified in lower nibble of instruction field.
DESCRIPTION
The result of the ALU operation specified in instruction bits I3-I0 is used as the upper
half of a double-precision word, the contents of the MQ register as the lower half.
The contents of the MQ register are shifted one bit to the left. A zero is filled into
bit 0 of the least significant byte of each word unless the SIO input for the word is
set to zero; this will force bit 0 to one. Bit 7 of the most significant byte in the MQ
shifter is passed to bit 0 of the least significant byte of the ALU shifter. Bit 7 of the
most significant byte in the ALU shifter is dropped.
The shift may be made conditional on SSF. If SSF is high or floating, the shift result
will be sent to the Y MUX and MQ register. If SSF is low, the ALU output and MQ
register will not be altered.
* A list of ALU operations that can be used with this instruction is given in Table 15.
Shift Operations
ALU Shifter
MQ Shifter
Arithmetic Left
Arithmetic Left
Available Destination Operands (ALU Shifter)
RF
RF
(C5-CO)
(B5-BO)
Yes
No
Y-Port
Yes
Control/Data Signals
Signal   User Programmable   Use
SSF      Yes                 Passes shift result if high; passes ALU result if low.
SIO0     Yes                 Fills a zero in LSB of each word if high; fills a one in LSB if low.
SIO1     Yes
SIO2     Yes
SIO3     Yes
Cn       No                  Affects arithmetic operation specified in bits I3-I0 of instruction field.
3-149
I5 I*
Arithmetic Left Double Precision Shift
SLAD
Status Signals†
ZERO = 1 if result = 0
N    = 1 if MSB of result = 1
       0 if MSB of result = 0
OVR  = 1 if signed arithmetic overflow or if MSB XOR MSB-1 = 1 before shift
C    = 1 if carry-out condition
†C is ALU carry-out and is evaluated before shift operation. ZERO and N (negative)
are evaluated after shift operation. OVR (overflow) is evaluated after ALU operation
and after shift operation.
EXAMPLE (assumes a 32-bit configuration)
Perform the computation A = 2(A + B), where A and B are two's complement numbers.
Let A be a double precision number residing in register 1 (MSH) and the MQ register
(LSH). Let B be a single precision number which is input through the DB bus.
Instr
Oprd
Oprd
Oprd Sel
Dest
Code
Addr
Addr
EB1-
Addr
17-10
A5-AO
B5-BO
-
0101 0001
00 0001
XX XXXX
0
10
EA EBO
Destination Selects
WE3- SELRF1-
C5·CO
SELMa
000001
0
-
WEO
SELRFO
OEA
OEB
0000
10
X
X
OEY3-
CF2-
Si03- iESiOO-
0EYci DES en
CFO
SIOO
IESIOO
SSF
XXXX
110 1110
0000
1
0
0
Co\)
N
Assume register file 1 holds 2408C618 (Hex), DB bus holds 26007530 (Hex), and
MQ register holds 50A99A0E (Hex).
MSH
Source                0010 0100 0000 1000 1100 0110 0001 1000    R ← RF(1)
Source                0010 0110 0000 0000 0111 0101 0011 0000    S ← DB bus
Intermediate Result   0100 1010 0000 1001 0011 1011 0100 1000    ALU Shifter ← R + S + Cn
Destination           1001 0100 0001 0010 0111 0110 1001 0000    RF(1) ← ALU shift register
LSH
Source                0101 0000 1010 1001 1001 1010 0000 1110    MQ shifter ← MQ register
Destination           1010 0001 0101 0011 0011 0100 0001 1101    MQ register ← MQ shift result
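The double-precision shift can be sketched as a pair of linked 32-bit shifts: the MSB of MQ moves into the LSB of the ALU half, and the MQ end-fill bit is selected by SIO0. The function name `slad32` and the `sio0_low` flag are illustrative; the values are those of the example above.

```python
MASK32 = 0xFFFFFFFF


def slad32(f, mq, sio0_low=False, ssf=1):
    """Shift the 64-bit pair (F = MSH, MQ = LSH) left by one bit.

    Returns (new MSH, new LSH). With ssf = 0 both halves are left unaltered.
    """
    if not ssf:
        return f & MASK32, mq & MASK32
    new_f = ((f << 1) & MASK32) | (mq >> 31)       # MQ MSB enters ALU LSB
    new_mq = ((mq << 1) & MASK32) | (1 if sio0_low else 0)
    return new_f, new_mq


alu = (0x2408C618 + 0x26007530 + 0) & MASK32       # R + S + Cn with Cn = 0
msh, lsh = slad32(alu, 0x50A99A0E, sio0_low=True)
print(f"{msh:08X} {lsh:08X}")
```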
SLC
Circular Left Single Precision Shift
FUNCTION
Performs circular left shift on result of ALU operation specified in lower nibble of
instruction field.
DESCRIPTION
The result of the ALU operation specified in instruction bits I3-I0 is rotated one bit
to the left. Bit 7 of the most significant byte in each word is passed to bit 0 of the
least significant byte in the word, which may be 1, 2, or 4 bytes long.
The shift may be made conditional on SSF. If SSF is high or floating, the shift result
will be sent to Y MUX. If SSF is low, F is passed unaltered.
* A list of ALU operations that can be used with this instruction is given in Table 15.
Shift Operations
ALU Shifter
MQ Shifter
Circular Left
None
Instr
Oprd
Dest
Addr
Oprd
Addr
Oprd Sel
Code
EB1-
Addr
17-10
AS-AD
B5-BO
EA EBO
01100110
000110
XXXXXX
0
00
Destination Selects
WE3- SELRF 1-
C5-CO
SELMQ
WEO
SELRFO
OEA
OEB
000001
o
0000
10
X
X
"Ci"EY35EYo OES
XXX X
CF2Cn CFO
0
Assume register file 6 holds 3788C618 (Hex).
Source                0011 0111 1000 1000 1100 0110 0001 1000    R ← RF(6)
Intermediate Result   0011 0111 1000 1000 1100 0110 0001 1000    ALU Shifter ← R + Cn
Destination           0110 1111 0001 0001 1000 1100 0011 0000    RF(1) ← ALU shifter result
0
110
SSF
1
SLCD
Circular Left Double Precision Shift
7
FUNCTION
Performs circular left shift on MQ register (LSH) and result of ALU operation specified
in lower nibble of instruction field (MSH).
DESCRIPTION
The result of the ALU operation specified in instruction bits I3-I0 is used as the upper
half of a double-precision word, the contents of the MQ register as the lower half.
The contents of the MQ and ALU registers are rotated one bit to the left. Bit 7 of the
most significant byte in the MQ shifter is passed to bit 0 of the least significant byte
of the ALU shifter. Bit 7 of the most significant byte of the ALU shifter is passed to
bit 0 of the least significant byte in the MQ shifter.
The shift may be made conditional on SSF. If SSF is high or floating, the shift result
will be sent to Y MUX. If SSF is low, F is passed unaltered and the MQ register is
not changed.
* A list of ALU operations that can be used with this instruction is given in Table 15.
Perform a circular left double precision shift of data in register 6 (MSH) and MQ (LSH),
and store the result back in register 6 and the MQ register.
Instr Code (I7-I0)      0111 0110
Oprd Addr (A5-A0)       00 0110
Oprd Addr (B5-B0)       XX XXXX
Oprd Sel EA             0
Oprd Sel EB1-EB0        00
Dest Addr (C5-C0)       00 0110
SELMQ                   0
WE3-WE0                 0000
SELRF1-SELRF0           10
OEA                     X
OEB                     X
OEY3-OEY0               XXXX
OES                     0
Cn                      0
CF2-CF0                 110
SSF                     1

Assume register file 6 holds 3708C618 (Hex) and MQ register holds 50A99A0E (Hex).

MSH
Source               0011 0111 0000 1000 1100 0110 0001 1000   R ← RF(6)
Intermediate Result  0011 0111 0000 1000 1100 0110 0001 1000   ALU Shifter ← R + Cn
Destination          0110 1111 0001 0001 1000 1100 0011 0000   RF(6) ← ALU shifter result

LSH
Source               0101 0000 1010 1001 1001 1010 0000 1110   MQ shifter ← MQ register
Destination          1010 0001 0101 0011 0011 0100 0001 1100   MQ register ← MQ shift result
SMTC
Sign Magnitude/Two's Complement
I5I8I
FUNCTION
Converts data on the S bus from sign magnitude to two's complement or vice versa.
DESCRIPTION
The S bus provides the source word for this instruction. The number is converted by
inverting S and adding the result to the carry-in, which should be programmed high
for proper conversion; the sign bit of the result is then inverted. An error condition
will occur if the source word is a negative zero (negative sign and zero magnitude).
In this case, SMTC generates a positive zero, and the OVR pin is set high to reflect
an illegal conversion.
The sign bit of the selected operand in the most significant byte is tested; if it is high,
the converted number is passed to the destination. Otherwise the operand is passed
unaltered.
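The conversion rule above can be sketched in Python for a 32-bit word. This is a minimal illustration, not the device's implementation: the function name `smtc` and the returned `(result, ovr)` pair are assumptions made for the example, and the carry-in defaults to 1 as the text recommends.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

def smtc(s, cn=1):
    """Sketch of SMTC on a 32-bit word: returns (result, ovr)."""
    s &= MASK32
    if not s & SIGN32:
        # Sign bit low: the operand is passed unaltered.
        return s, False
    # Invert S, add the carry-in, then invert the sign bit of the result.
    result = ((~s & MASK32) + cn) & MASK32
    result ^= SIGN32
    # Negative zero (80000000 Hex) converts to positive zero and flags OVR.
    ovr = (s == SIGN32)
    return result, ovr

print(hex(smtc(0xC3F6D840)[0]))  # two's complement to sign magnitude
```

The same function converts in both directions, since applying it twice to a negative operand returns the original word.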
Available R Bus Source Operands
  RF (A5-A0)   A3-A0 Immed   DA-Port   C3-C0::A3-A0 Mask
  No           No            No        No

Available S Bus Source Operands
  RF (B5-B0)   DB-Port   MQ Register
  Yes          Yes       Yes

Available Destination Operands
  RF (C5-C0)   RF (B5-B0)   Y-Port
  Yes          No           Yes

Shift Operations
  ALU    MQ
  None   None
Control/Data Signals
  Signal   User Programmable   Use
  SSF      No                  Inactive
  SIO0     No                  Inactive
  SIO1     No                  Inactive
  SIO2     No                  Inactive
  SIO3     No                  Inactive
  Cn       Yes                 Should be programmed high for proper conversion

Status Signals
  ZERO = 1 if result = 0
  N    = 1 if MSB = 1
  OVR  = 1 if input of most significant byte is 80 (Hex) and results in all other
         bytes are 00 (Hex).
  C    = 1 if S = 0
EXAMPLES (assumes a 32-bit configuration)
Convert the two's complement number in register 1 to sign magnitude representation
and store the result in register 4.

Instr Code (I7-I0)      0101 1000
Oprd Addr (A5-A0)       XX XXXX
Oprd Addr (B5-B0)       00 0001
Oprd Sel EA             X
Oprd Sel EB1-EB0        00
Dest Addr (C5-C0)       00 0100
SELMQ                   0
WE3-WE0                 0000
SELRF1-SELRF0           10
OEA                     X
OEB                     X
OEY3-OEY0               XXXX
OES                     0
Cn                      1
CF2-CF0                 110

Example 1: Assume register file 1 holds C3F6D840 (Hex).

Source       1100 0011 1111 0110 1101 1000 0100 0000   S ← RF(1)
Destination  1011 1100 0000 1001 0010 0111 1100 0000   RF(4) ← S' + Cn

Example 2: Assume register file 1 holds 550927C0 (Hex).

Source       0101 0101 0000 1001 0010 0111 1100 0000   S ← RF(1)
Destination  0101 0101 0000 1001 0010 0111 1100 0000   RF(4) ← S
SMULI
Signed Multiply Iterate
I6I0 I
FUNCTION
Computes one of N-1 signed or N mixed multiplication iterations for computing an
N-bit by N-bit product. Algorithms for signed and mixed multiplication using this
instruction are given in the "Other Arithmetic Instructions" section.
DESCRIPTION
SMULI checks to determine whether the multiplicand should be added with the present
partial product. The instruction evaluates:
  F ← R + S + Cn    if the addition is required
  F ← S             if no addition is required
A double precision right shift is performed. Bit 0 of the least significant byte of the
ALU shifter is passed to bit 7 of the most significant byte of the MQ shifter; carry-out
is passed to the most significant bit of the ALU shifter.
The S bus should be loaded with the contents of an accumulator and the R bus with
the multiplicand. The Y bus result should be written back to the accumulator after
each iteration of SMULI. The accumulator should be cleared and the MQ register loaded
with the multiplier before the first iteration.
Available R Bus Source Operands
  RF (A5-A0)   A3-A0 Immed   DA-Port   C3-C0::A3-A0 Mask
  Yes          No            Yes       No

Recommended S Bus Source Operands
  RF (B5-B0)   DB-Port   MQ Register
  Yes          Yes       No

Recommended Destination Operands
  RF (C5-C0)   RF (B5-B0)   Y-Port
  Yes          No           No

Shift Operations
  ALU     MQ
  Right   Right
Control/Data Signals
  Signal   User Programmable   Use
  SSF      No                  Inactive
  SIO0     No                  Passes LSB from ALU shifter to MSB of MQ shifter.
  SIO1     No
  SIO2     No
  SIO3     No
  Cn       Yes                 Should be programmed low

Status Signals
  ZERO = 1 if result = 0
  N    = 1 if MSB = 1
  OVR  = 0
  C    = 1 if carry-out
SMULT
Signed Multiply Terminate
I7I0 I
FUNCTION
Performs the final iteration for computing an N-bit by N-bit signed product. An algorithm
for signed multiplication using this instruction is given in the "Other Arithmetic
Instructions" section.
DESCRIPTION
SMULT checks the present multiplier bit (the least significant bit of the MQ register)
to determine whether the multiplicand should be added with the present partial product.
The instruction evaluates:
  F ← R' + S + Cn   if the addition is required
  F ← S             if no addition is required
with the correct sign in the product.
A double precision right shift is performed. Bit 0 of the least significant byte of the
ALU shifter is passed to bit 7 of the most significant byte of the MQ shifter.
The S bus should be loaded with the contents of a register file holding the previous
iteration result; the R bus must be loaded with the multiplicand. After executing SMULT,
the Y bus contains the most significant half of the product, and MQ contains the least
significant half.
Available R Bus Source Operands
  RF (A5-A0)   A3-A0 Immed   DA-Port   C3-C0::A3-A0 Mask
  Yes          No            Yes       No

Recommended S Bus Source Operands
  RF (B5-B0)   DB-Port   MQ Register
  Yes          Yes       No

Available Destination Operands
  RF (C5-C0)   RF (B5-B0)   Y-Port
  Yes          No           No

Shift Operations
  ALU     MQ
  Right   Right
Control/Data Signals
  Signal   User Programmable   Use
  SSF      No                  Inactive
  SIO0     No                  Passes LSB from ALU shifter to MSB of MQ shifter.
  SIO1     No
  SIO2     No
  SIO3     No
  Cn       Yes                 Should be programmed low

Status Signals
  ZERO = 1 if result = 0
  N    = 1 if MSB = 1
  OVR  = 0
  C    = 1 if carry-out
SNORM
Single-Length Normalize
I2I0 I
FUNCTION
Tests the two most significant bits of the MQ register. If they are the same, shifts
the number to the left.
DESCRIPTION
This instruction is used to normalize a two's complement number in the MQ register
by shifting the number one bit position to the left and filling a zero into the LSB (unless
the SIO input for that word is low). Data on the S bus is added to the carry, permitting
the number of shifts performed to be counted and stored in one of the register files.
The shift and the S bus increment are inhibited whenever normalization is attempted
on a number already normalized. Normalization is complete when overflow occurs.
Perform the computation A = (A + B)/2, where A and B are single-precision numbers.
Let A reside in register 1 and B be input via the DB bus.
Instr Code (I7-I0)      0000 0001
Oprd Addr (A5-A0)       00 0001
Oprd Addr (B5-B0)       XX XXXX
Oprd Sel EA             0
Oprd Sel EB1-EB0        10
Dest Addr (C5-C0)       00 0001
SELMQ                   0
WE3-WE0                 0000
SELRF1-SELRF0           10
OEA                     X
OEB                     X
OEY3-OEY0               XXXX
OES                     0
Cn                      0
CF2-CF0                 110
SSF                     1

Assume register file 1 holds 6A08C618 (Hex) and DB bus holds 51007530 (Hex).

Source                 0110 1010 0000 1000 1100 0110 0001 1000   R ← RF(1)
Source                 0101 0001 0000 0000 0111 0101 0011 0000   S ← DB bus
Intermediate Result †  1011 1011 0000 1001 0011 1011 0100 1000   ALU Shifter ← R + S + Cn
Destination            0101 1101 1000 0100 1001 1101 1010 0100   RF(1) ← ALU shift result

† After the intermediate operation (ADD), overflow has occurred and OVR status signal is set high. When the
arithmetic right shift is executed, the sign bit is corrected (see Table 16 for shift definition notes).
SRAD
Arithmetic Right Double Precision Shift
1
I
*
FUNCTION
Performs arithmetic right shift on MQ register (LSH) and result of ALU operation (MSH)
specified in lower nibble of instruction field.
DESCRIPTION
The result of the ALU operation specified in instruction bits I3-I0 is used as the upper
half of a double precision word, the contents of the MQ register as the lower half.
The contents of the ALU are shifted one bit to the right. The sign bit of the most
significant byte is retained unless the sign bit is inverted as a result of overflow. Bit 0
of the least significant byte in the ALU shifter is passed to bit 7 of the most significant
byte of the MQ register. Bit 0 of the MQ register's least significant byte is dropped.
The shift may be made conditional on SSF. If SSF is high or floating, the shift result
will be sent to the Y MUX. If SSF is low, the ALU result will be passed unshifted to
the Y MUX.
* A list of ALU operations that can be used with this instruction is given in Table 15.
Shift Operations
  ALU Shifter        MQ Shifter
  Arithmetic Right   Arithmetic Right

Available Destination Operands (ALU Shifter)
  RF (C5-C0)   RF (B5-B0)   Y-Port
  Yes          No           Yes
Control/Data Signals
  Signal   User Programmable   Use
  SSF      Yes                 Passes shifted output if high; passes ALU result if low.
  SIO0     No                  LSB of ALU shifter is passed to MSB of MQ shifter,
                               and LSB of MQ shifter is dropped.
  SIO1     No
  SIO2     No
  SIO3     No
  Cn       No                  Affects arithmetic operation specified in bits I3-I0 of
                               instruction field.
Status Signals †
  ZERO = 1 if result = 0
  N    = 1 if MSB of result = 1
         0 if MSB of result = 0
  OVR  = 0
  C    = 1 if carry-out condition
† C is ALU carry-out and is evaluated before shift operation. ZERO and N (negative) are evaluated
after shift operation. OVR (overflow) is evaluated after ALU operation and after shift operation.
EXAMPLE (assumes a 32-bit configuration)
Perform the computation A = (A + B)/2, where A and B are two's complement numbers.
Let A be a double precision number residing in register 1 (MSH) and MQ (LSH). Let
B be a single precision number which is input through the DB bus.

Instr Code (I7-I0)      0001 0001
Oprd Addr (A5-A0)       00 0001
Oprd Addr (B5-B0)       XX XXXX
Oprd Sel EA             0
Oprd Sel EB1-EB0        10
Dest Addr (C5-C0)       00 0001
SELMQ                   0
WE3-WE0                 0000
SELRF1-SELRF0           10
OEA                     X
OEB                     X
OEY3-OEY0               XXXX
OES                     0
Cn                      0
CF2-CF0                 110
SSF                     1

Assume register file 1 holds 4A08C618 (Hex), DB bus holds 51007530 (Hex),
and MQ register holds 17299A0F (Hex).

MSH
Source                 0100 1010 0000 1000 1100 0110 0001 1000   R ← RF(1)
Source                 0101 0001 0000 0000 0111 0101 0011 0000   S ← DB bus
Intermediate Result ‡  1001 1011 0000 1001 0011 1011 0100 1000   ALU Shifter ← R + S + Cn
Destination            0100 1101 1000 0100 1001 1101 1010 0100   RF(1) ← ALU shift result

LSH
Source                 0001 0111 0010 1001 1001 1010 0000 1111   MQ shifter ← MQ register
Destination            0000 1011 1001 0100 1100 1101 0000 0111   MQ register ← MQ shift result

‡ After the intermediate operation (ADD), overflow has occurred and OVR status signal is set high. When the
arithmetic right shift is executed, the sign bit is corrected (see Table 16 for shift definition notes).
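The (A + B)/2 computation, including the sign correction after overflow, can be modeled with a short Python sketch. The function name `srad_add` and its return tuple are hypothetical; the overflow test and the corrected sign bit follow the behavior described in the footnote, under the assumption of a 32-bit configuration.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

def srad_add(r, s, mq, cn=0):
    """Sketch: ADD followed by an arithmetic right double precision shift.

    Returns (alu_shift_result, mq_shift_result, overflow_after_add).
    """
    total = r + s + cn
    f = total & MASK32
    # Signed overflow: operands share a sign and the sum's sign differs.
    ovr = bool(~(r ^ s) & (r ^ f) & SIGN32)
    # The shifted-in sign is the sum's sign, inverted if overflow occurred.
    sign = ((f >> 31) ^ ovr) & 1
    alu = (sign << 31) | (f >> 1)
    # Bit 0 of the ALU result moves into the MSB of MQ; MQ bit 0 is dropped.
    new_mq = ((f & 1) << 31) | (mq >> 1)
    return alu, new_mq, ovr

alu, mq, ovr = srad_add(0x4A08C618, 0x51007530, 0x17299A0F)
print(hex(alu), hex(mq), ovr)
```

Passing 0 for `mq` and ignoring the second return value gives the single precision case.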
SRC
Circular Right Single Precision Shift
I8I
*
FUNCTION
Performs circular right shift on result of ALU operation specified in lower nibble of
instruction field.
DESCRIPTION
The result of the ALU operation specified in instruction bits I3-I0 is shifted one bit
to the right. Bit 0 of the least significant byte is passed to bit 7 of the most significant
byte in the same word, which may be 1, 2, or 4 bytes long depending on the selected
configuration.
The shift may be made conditional on SSF. If SSF is high or floating, the shift result
will be sent to the Y MUX. If SSF is low, the ALU result will be passed unshifted to
the Y MUX.
* A list of ALU operations that can be used with this instruction is given in Table 15.
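A minimal Python sketch of the circular right shift follows, showing how the rotation stays inside each word for 1-, 2-, and 4-byte configurations. The function name `src` and the `word_bytes` parameter are illustrative assumptions, not part of the device interface.

```python
def src(value, word_bytes=4):
    """Sketch: rotate each word of a 32-bit value right by one bit.

    word_bytes is 1, 2, or 4, mirroring the selectable configurations.
    """
    bits = 8 * word_bytes
    mask = (1 << bits) - 1
    out = 0
    for shift in range(0, 32, bits):
        word = (value >> shift) & mask
        # Bit 0 of the word wraps around to the word's most significant bit.
        rotated = (word >> 1) | ((word & 1) << (bits - 1))
        out |= rotated << shift
    return out

print(hex(src(0x3788C618)))      # 32-bit word
print(hex(src(0x3788C618, 1)))   # four independent bytes
```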
Shift Operations
  ALU Shifter      MQ Shifter
  Circular Right   None

Available Destination Operands (ALU Shifter)
  RF (C5-C0)   RF (B5-B0)   Y-Port
  Yes          No           Yes
Control/Data Signals
  Signal   User Programmable   Use
  SSF      Yes                 Passes shift result if high; passes ALU result if low.
  SIO0     No                  Rotates LSB to MSB of the same word, which may
                               be 1, 2, or 4 bytes long depending on configuration
  SIO1     No
  SIO2     No
  SIO3     No
  Cn       No                  Affects arithmetic operation specified in bits I3-I0 of
                               instruction field.
Status Signals †
  ZERO = 1 if result = 0
  N    = 1 if MSB of result = 1
         0 if MSB of result = 0
  OVR  = 1 if signed arithmetic overflow
  C    = 1 if carry-out condition
† C is ALU carry-out and is evaluated before shift operation. ZERO and N (negative) are evaluated
after shift operation. OVR (overflow) is evaluated after ALU operation and after shift operation.
EXAMPLE (assumes a 32-bit configuration)
Perform a circular right shift of register 6 and store the result in register 1.
Instr Code (I7-I0)      1000 0110
Oprd Addr (A5-A0)       00 0110
Oprd Addr (B5-B0)       XX XXXX
Oprd Sel EA             0
Oprd Sel EB1-EB0        XX
Dest Addr (C5-C0)       00 0001
SELMQ                   0
WE3-WE0                 0000
SELRF1-SELRF0           10
OEA                     X
OEB                     X
OEY3-OEY0               XXXX
OES                     0
Cn                      0
CF2-CF0                 110
SSF                     1

Assume register file 6 holds 3788C618 (Hex).

Source               0011 0111 1000 1000 1100 0110 0001 1000   R ← RF(6)
Intermediate Result  0011 0111 1000 1000 1100 0110 0001 1000   ALU Shifter ← R + Cn
Destination          0001 1011 1100 0100 0110 0011 0000 1100   RF(1) ← ALU shift result
SRCD
Circular Right Double Precision Shift
I9I
*
FUNCTION
Performs circular right shift on MQ register (LSH) and result of ALU operation (MSH)
specified in lower nibble of instruction field.
DESCRIPTION
The result of the ALU operation specified in instruction bits I3-I0 is used as the upper
half of a double precision word, the contents of the MQ register as the lower half.
The contents of the ALU and MQ shifters are rotated one bit to the right. Bit 0 of the
least significant byte in the ALU shifter is passed to bit 7 of the most significant byte
of the MQ shifter. Bit 0 of the least significant byte of the MQ shifter is passed to
bit 7 of the most significant byte of the ALU shifter.
The shift may be made conditional on SSF. If SSF is high or floating, the shift result
will be sent to the Y MUX and MQ register. If SSF is low, the Y MUX and MQ register
will not be altered.
* A list of ALU operations that can be used with this instruction is given in Table 15.
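The double precision rotation can be sketched in Python by treating the ALU result and the MQ register as the two halves of a 64-bit word. The function name `srcd` and the boolean `ssf` argument are assumptions for illustration; the operand values used below are illustrative 32-bit patterns.

```python
def srcd(alu, mq, ssf=True):
    """Sketch: circular right double precision shift of ALU (MSH) and MQ (LSH)."""
    if not ssf:
        # SSF low: the ALU result is passed unshifted and MQ is retained.
        return alu, mq
    # Bit 0 of MQ wraps to the MSB of the ALU shifter;
    # bit 0 of the ALU result moves to the MSB of the MQ shifter.
    new_alu = ((mq & 1) << 31) | (alu >> 1)
    new_mq = ((alu & 1) << 31) | (mq >> 1)
    return new_alu, new_mq

alu, mq = srcd(0x3708C618, 0x50A99A0F)
print(hex(alu), hex(mq))
```

Because the operation is a pure 64-bit rotation, applying it 64 times returns the original pair.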
Shift Operations
  ALU Shifter      MQ Shifter
  Circular Right   Circular Right

Available Destination Operands (ALU Shifter)
  RF (C5-C0)   RF (B5-B0)   Y-Port
  Yes          No           Yes
Control/Data Signals
  Signal   User Programmable   Use
  SSF      Yes                 Passes shift result if high; passes ALU result and
                               retains MQ register if low.
  SIO0     No                  Rotates LSB of ALU shifter to MSB of MQ shifter,
                               and LSB of MQ shifter to MSB of ALU shifter
  SIO1     No
  SIO2     No
  SIO3     No
  Cn       No                  Affects arithmetic operation specified in bits I3-I0 of
                               instruction field.
Status Signals †
  ZERO = 1 if result = 0
  N    = 1 if MSB of result = 1
         0 if MSB of result = 0
  OVR  = 1 if signed arithmetic overflow
  C    = 1 if carry-out condition
† C is ALU carry-out and is evaluated before shift operation. ZERO and N (negative) are evaluated
after shift operation. OVR (overflow) is evaluated after ALU operation and after shift operation.
EXAMPLE (assumes a 32-bit configuration)
Perform a circular right double precision shift of the data in register 6 (MSH) and MQ
(LSH), and store the result back in register 6 and the MQ register.

Instr Code (I7-I0)      1001 0110
Oprd Addr (A5-A0)       00 0110
Oprd Addr (B5-B0)       XX XXXX
Oprd Sel EA             0
Oprd Sel EB1-EB0        XX
Dest Addr (C5-C0)       00 0110
SELMQ                   0
WE3-WE0                 0000
SELRF1-SELRF0           10
OEA                     X
OEB                     X
OEY3-OEY0               XXXX
OES                     0
Cn                      0
CF2-CF0                 110

Assume register file 6 holds 3708C618 (Hex) and MQ register holds 50A99A0F (Hex).

MSH
Source               0011 0111 0000 1000 1100 0110 0001 1000   R ← RF(6)
Intermediate Result  0011 0111 0000 1000 1100 0110 0001 1000   ALU shifter ← R + Cn
Destination          1001 1011 1000 0100 0110 0011 0000 1100   RF(6) ← ALU shift result

LSH
Source               0101 0000 1010 1001 1001 1010 0000 1111   MQ shifter ← MQ register
Destination          0010 1000 0101 0100 1100 1101 0000 0111   MQ register ← MQ shift result
SRL
Logical Right Single Precision Shift
FUNCTION
Performs logical right shift on result of ALU operation specified in lower nibble of
instruction field.
DESCRIPTION
The result of the ALU operation specified in instruction bits I3-I0 is shifted one bit
to the right. A zero is placed in bit 7 of the most significant byte of each word
unless the SIO input for the word is programmed low; this will force the sign bit to
one. The LSB is dropped from the word, which may be 1, 2, or 4 bytes long depending
on selected configuration.
The shift may be made conditional on SSF. If SSF is high or floating, the shift result
will be sent to the Y MUX. If SSF is low, the ALU result will be passed unshifted to
the Y MUX.
* A list of ALU operations that can be used with this instruction is given in Table 15.
Shift Operations
R ← DA bus
ALU Shifter ← R + Cn
RF(1) ← ALU shift result
SRLD
Logical Right Double Precision Shift
FUNCTION
Performs logical right shift on MQ register (LSH) and result of ALU operation (MSH)
specified in lower nibble of instruction field.
DESCRIPTION
The result of the ALU operation specified in instruction bits I3-I0 is used as the upper
half of a double precision word, the contents of the MQ register as the lower half.
The ALU result is shifted one bit to the right. A zero is placed in the sign bit of the
most significant byte unless the SIO input for that word is programmed low; this will
force the sign bit to one. Bit 0 of the least significant byte of the ALU shifter is
passed to bit 7 of the most significant byte of the MQ shifter. Bit 0 of the least
significant byte of the MQ shifter is dropped.
The shift may be made conditional on SSF. If SSF is high or floating, the shift result
will be sent to the Y MUX and MQ register. If SSF is low, the ALU result and MQ
register will not be altered.
* A list of ALU operations that can be used with this instruction is given in Table 15.
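A short Python sketch of the logical right double precision shift follows, including the end-fill bit selected by the SIO input. The function name `srld` and the `sio_low` flag are illustrative assumptions for a 32-bit configuration.

```python
def srld(alu, mq, sio_low=False, ssf=True):
    """Sketch: logical right double precision shift of ALU (MSH) and MQ (LSH)."""
    if not ssf:
        # SSF low: ALU result and MQ register are not altered.
        return alu, mq
    # A zero fills the sign bit unless the SIO input is programmed low,
    # in which case a one is forced into the sign bit.
    fill = 1 if sio_low else 0
    new_alu = (fill << 31) | (alu >> 1)
    # Bit 0 of the ALU result moves into the MSB of MQ; MQ bit 0 is dropped.
    new_mq = ((alu & 1) << 31) | (mq >> 1)
    return new_alu, new_mq

alu, mq = srld(0x2DA8C615, 0x50A99A0E, sio_low=True)
print(hex(alu), hex(mq))
```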
Shift Operations
  ALU Shifter     MQ Shifter
  Logical Right   Logical Right

Available Destination Operands (ALU Shifter)
  RF (C5-C0)   RF (B5-B0)   Y-Port
  Yes          No           Yes
Control/Data Signals
  Signal   User Programmable   Use
  SSF      Yes                 Passes shift result if high; passes ALU result and
                               retains MQ if low.
  SIO0     Yes                 Fills a zero in MSB if high or floating;
  SIO1     Yes                 fills a one in MSB if low.
  SIO2     Yes
  SIO3     Yes
  Cn       No                  Affects arithmetic operation specified in bits I3-I0 of
                               instruction field.
Status Signals †
  ZERO = 1 if result = 0
  N    = 1 if MSB of result = 1
         0 if MSB of result = 0
  OVR  = 1 if signed arithmetic overflow
  C    = 1 if carry-out condition
† C is ALU carry-out and is evaluated before shift operation. ZERO and N (negative) are evaluated
after shift operation. OVR (overflow) is evaluated after ALU operation and after shift operation.
EXAMPLE (assumes a 32-bit configuration)
Perform a logical right double precision shift of the data in register 1 (MSH) and MQ
(LSH), filling a one into the most significant bit, and store the result back in register 1
and the MQ register.

Instr Code (I7-I0)      0011 0110
Oprd Addr (A5-A0)       XX XXXX
Oprd Addr (B5-B0)       00 0001
Oprd Sel EA             X
Oprd Sel EB1-EB0        00
Dest Addr (C5-C0)       00 0001
SELMQ                   0
WE3-WE0                 0000
SELRF1-SELRF0           10
OEA                     X
OEB                     X
OEY3-OEY0               XXXX
OES                     0
Cn                      0
CF2-CF0                 110
SIO3-SIO0               1110
IESIO3-IESIO0           0000

Assume register file 1 holds 2DA8C615 (Hex) and MQ register holds 50A99A0E (Hex).

MSH
Source               0010 1101 1010 1000 1100 0110 0001 0101   R ← RF(1)
Intermediate Result  0010 1101 1010 1000 1100 0110 0001 0101   ALU Shifter ← S + Cn
Destination          1001 0110 1101 0100 0110 0011 0000 1010   RF(1) ← ALU shift result

LSH
Source               0101 0000 1010 1001 1001 1010 0000 1110   MQ shifter ← MQ register
Destination          1010 1000 0101 0100 1100 1101 0000 0111   MQ register ← MQ shift result
SUBI
Subtract Immediate
I7I8I
FUNCTION
Subtracts four-bit immediate data on A3-A0 with carry from S-bus data.
DESCRIPTION
Immediate data in the range 0 to 15, supplied by the user at A3-A0, is inverted and
added with carry to S.
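The invert-and-add step can be sketched in Python for a 32-bit word. The function name `subi` and the returned `(result, carry_out)` pair are assumptions for illustration; the carry-in defaults to 1, which gives a two's complement subtraction.

```python
MASK32 = 0xFFFFFFFF

def subi(imm4, s, cn=1):
    """Sketch of SUBI: zero-extend the 4-bit immediate, invert it, add S and Cn."""
    r = imm4 & 0xF
    total = (~r & MASK32) + (s & MASK32) + cn
    return total & MASK32, total >> 32

result, carry = subi(0xC, 0x24000100)
print(hex(result), carry)
```

With `cn=0` the same expression yields S minus the immediate minus one.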
Available R Bus Source Operands (Constant)
  RF (A5-A0)   A3-A0 Immed   DA-Port   C3-C0::A3-A0 Mask
  No           Yes           No        No

Available S Bus Source Operands
  RF (B5-B0)   DB-Port   MQ Register
  Yes          Yes       Yes

Available Destination Operands
  RF (C5-C0)   RF (B5-B0)   Y-Port
  Yes          No           Yes

Shift Operations
  ALU    MQ
  None   None
Control/Data Signals
  Signal   User Programmable   Use
  SSF      No                  Inactive
  SIO0     No                  Inactive
  SIO1     No                  Inactive
  SIO2     No                  Inactive
  SIO3     No                  Inactive
  Cn       Yes                 Two's complement subtraction if programmed high.
Status Signals
  ZERO = 1 if result = 0
  N    = 1 if MSB = 1
  OVR  = 1 if signed arithmetic overflow
  C    = 1 if carry-out
EXAMPLE (assumes a 32-bit configuration)
Subtract the value 12 from data on the DB bus, and store the result into register file 1.
Instr Code (I7-I0)      0111 1000
Oprd Addr (A5-A0)       00 1100
Oprd Addr (B5-B0)       XX XXXX
Oprd Sel EA             X
Oprd Sel EB1-EB0        10
Dest Addr (C5-C0)       00 0001
SELMQ                   0
WE3-WE0                 0000
SELRF1-SELRF0           10
OEA                     X
OEB                     X
OEY3-OEY0               XXXX
OES                     0
Cn                      1
CF2-CF0                 110

Assume bits A3-A0 hold C (Hex) and DB bus holds 24000100 (Hex).

Source       0000 0000 0000 0000 0000 0000 0000 1100   R ← A3-A0
Source       0010 0100 0000 0000 0000 0001 0000 0000   S ← DB bus
Destination  0010 0100 0000 0000 0000 0000 1111 0100   RF(1) ← R' + S + Cn
Subtract R with Carry (R' + S + Cn)
SUBR
FUNCTION
Subtracts data on the R bus from S with carry.
DESCRIPTION
Data on the R bus is subtracted with carry from data on the S bus. The result appears
at the ALU and MQ shifters.
* The result of this instruction can be shifted in the same microcycle by specifying a shift instruction in the
upper nibble (I7-I4) of the instruction field. The result may also be passed without shift. Possible instructions
are listed in Table 15.
Available R Bus Source Operands
C3-CO
RF
A3-AO
(A5-AO) Immed
DA-Port
..
A3-AO
Mask
Yes
Yes
No
No
Available S Bus Source Operands
  RF (B5-B0)   DB-Port   MQ Register
  Yes          Yes       Yes

Available Destination Operands
  RF (C5-C0)   RF (B5-B0)   Y-Port   ALU Shifter   MQ Shifter
  Yes          No           Yes      Yes           Yes
Control/Data Signals
  Signal   User Programmable   Use
  SSF      No                  Affect shift instructions programmed in bits I7-I4 of
  SIO0     No                  instruction field.
  SIO1     No
  SIO2     No
  SIO3     No
  Cn       Yes                 Two's complement subtraction if programmed high.
Status Signals †
  ZERO = 1 if result = 0
  N    = 1 if MSB = 1
  OVR  = 1 if signed arithmetic overflow
  C    = 1 if carry-out
† C is ALU carry-out and is evaluated before shift operation. ZERO and N (negative) are evaluated after shift
operation. OVR (overflow) is evaluated after ALU operation and after shift operation.
EXAMPLE (assumes a 32-bit configuration)
Subtract data in register 1 from data on the DB bus, and store the result in the MQ
register.
Instr Code (I7-I0)      1110 0010
Oprd Addr (A5-A0)       00 0001
Oprd Addr (B5-B0)       XX XXXX
Oprd Sel EA             0
Oprd Sel EB1-EB0        10
Dest Addr (C5-C0)       XX XXXX
SELMQ                   1
WE3-WE0                 XXXX
SELRF1-SELRF0           XX
OEA                     X
OEB                     X
OEY3-OEY0               XXXX
OES                     0
Cn                      1
CF2-CF0                 110

Assume register file 1 holds 150084D0 (Hex) and DB bus holds 4900C350 (Hex).

Source       0001 0101 0000 0000 1000 0100 1101 0000   R ← RF(1)
Source       0100 1001 0000 0000 1100 0011 0101 0000   S ← DB bus
Destination  0011 0100 0000 0000 0011 1110 1000 0000   MQ register ← R' + S + Cn
Subtract S with Carry (R + S' + Cn)
SUBS
FUNCTION
Subtracts data on the S bus from R with carry.
DESCRIPTION
Data on the S bus is subtracted with carry from data on the R bus. The result appears
at the ALU and MQ shifters.
* The result of this instruction can be shifted in the same microcycle by specifying a shift instruction in the
upper nibble (I7-I4) of the instruction field. The result may also be passed without shift. Possible instructions
are listed in Table 15.
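The two subtract-with-carry forms differ only in which operand is inverted. A minimal Python sketch for a 32-bit word is shown here; the function names `subr` and `subs` mirror the mnemonics, while the helper `_add3` and the `(result, carry_out)` return value are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF

def _add3(a, b, cn):
    """Add two 32-bit words and a carry-in; return (result, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + cn
    return total & MASK32, total >> 32

def subr(r, s, cn=1):
    """Sketch of SUBR: R' + S + Cn (S minus R when Cn is high)."""
    return _add3(~r, s, cn)

def subs(r, s, cn=1):
    """Sketch of SUBS: R + S' + Cn (R minus S when Cn is high)."""
    return _add3(r, ~s, cn)

print([hex(x) for x in subr(0x150084D0, 0x4900C350)])
print([hex(x) for x in subs(0x150084D0, 0x4900C350)])
```

A carry-out of 1 indicates that no borrow occurred, which is the usual convention for subtraction performed as addition of the complement.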
Available R Bus Source Operands
  RF (A5-A0)   A3-A0 Immed   DA-Port   C3-C0::A3-A0 Mask
  Yes          No            Yes       No

Available S Bus Source Operands
  RF (B5-B0)   DB-Port   MQ Register
  Yes          Yes       Yes

Available Destination Operands
  RF (C5-C0)   RF (B5-B0)   Y-Port   ALU Shifter   MQ Shifter
  Yes          No           Yes      Yes           Yes
Control/Data Signals
  Signal   User Programmable   Use
  SSF      No                  Affect shift instructions programmed in bits I7-I4 of
  SIO0     No                  instruction field.
  SIO1     No
  SIO2     No
  SIO3     No
  Cn       Yes                 Two's complement subtraction if programmed high.
Status Signals †
  ZERO = 1 if result = 0
  N    = 1 if MSB = 1
  OVR  = 1 if signed arithmetic overflow
  C    = 1 if carry-out
† C is ALU carry-out and is evaluated before shift operation. ZERO and N (negative) are evaluated
after shift operation. OVR (overflow) is evaluated after ALU operation and after shift operation.
EXAMPLE (assumes a 32-bit configuration)
Subtract data on the DB bus from data in register 1, and store the result in the MQ
register.
Instr Code (I7-I0)      1110 0011
Oprd Addr (A5-A0)       00 0001
Oprd Addr (B5-B0)       XX XXXX
Oprd Sel EA             0
Oprd Sel EB1-EB0        10
Dest Addr (C5-C0)       XX XXXX
SELMQ                   1
WE3-WE0                 XXXX
SELRF1-SELRF0           XX
OEA                     X
OEB                     X
OEY3-OEY0               XXXX
OES                     0
Cn                      1
CF2-CF0                 110

Assume register file 1 holds 150084D0 (Hex) and DB bus holds 4900C350 (Hex).

Source       0001 0101 0000 0000 1000 0100 1101 0000   R ← RF(1)
Source       0100 1001 0000 0000 1100 0011 0101 0000   S ← DB bus
Destination  1100 1011 1111 1111 1100 0001 1000 0000   MQ register ← R + S' + Cn
TB0
Test Bit (Zero)
3
8
FUNCTION
Tests bits in selected bytes of S-bus data for zeros using mask in C3-C0::A3-A0.
DESCRIPTION
The S bus is the source word for this instruction. The source word is passed to the
ALU, where it is compared to an 8-bit mask, consisting of a concatenation of the C3-C0
and A3-A0 address ports (C3-C0::A3-A0). The mask is input via the R bus. The test
will pass if the selected byte has zeros at all bit locations specified by the ones of
the mask. Bytes are selected by programming the SIO inputs low. Test results are
indicated on the ZERO output, which goes to one if the test passes. Register write
is internally disabled during this instruction.
Available R Bus Source Operands
  RF (A5-A0)   A3-A0 Immed   DA-Port   C3-C0::A3-A0 Mask
  No           No            No        Yes

Available S Bus Source Operands
  RF (B5-B0)   DB-Port   MQ Register
  Yes          Yes       Yes
Control/Data Signals
  Signal   User Programmable   Use
  SSF      No                  Inactive
  SIO0     Yes                 Byte Select
  SIO1     Yes                 Byte Select
  SIO2     Yes                 Byte Select
  SIO3     Yes                 Byte Select
  Cn       No                  Inactive
Status Signals
  ZERO = 1 if result (selected bytes) = Pass
  N    = 0
  OVR  = 0
  C    = 0
EXAMPLE (assumes a 32-bit configuration)
Test bits 7, 6 and 5 of bytes 0 and 2 of data in register 3 for zeroes.
Instr Code (I7-I0)      0011 1000
Mask (LSH) A3-A0        0000
Oprd Addr (B5-B0)       00 0011
Oprd Sel EA             X
Oprd Sel EB1-EB0        00
Mask (MSH) C3-C0        1110
SELMQ                   X
WE3-WE0                 XXXX
SELRF1-SELRF0           XX
OEA                     X
OEB                     X
OEY3-OEY0               XXXX
OES                     0
Cn                      X
CF2-CF0                 110
SIO3-SIO0               1010
IESIO3-IESIO0           0000

Assume register file 3 holds 881CD003 (Hex).

Source  1110 0000 1110 0000 1110 0000 1110 0000   Rn ← Mask (C3-C0::A3-A0)
Source  1000 1000 0001 1100 1101 0000 0000 0011   Sn ← RF(3)n †
Output  1                                         ZERO ← 1

† n = nth byte
Test Bit (One)
TB1
2
8
FUNCTION
Tests bits in selected bytes of S-bus data for ones using mask in C3-C0::A3-A0.
DESCRIPTION
The S bus is the source word for this instruction. The source word is passed to the
ALU, where it is compared to an 8-bit mask, consisting of a concatenation of the C3-C0
and A3-A0 address ports (C3-C0::A3-A0). The mask is input via the R bus. The test
will pass if the selected byte has ones at all bit locations specified by the ones of the
mask. Bytes are selected by programming the SIO inputs low. Test results are indicated
on the ZERO output, which goes to one if the test passes. Register write is internally
disabled for this instruction.
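Both test-bit instructions can be modeled with a few lines of Python. The sketch below assumes a 32-bit word, a 4-bit `sio` value in which a low bit n selects byte n, and that the ZERO output requires every selected byte to pass; the function names `tb0` and `tb1` and the helper are hypothetical.

```python
def _selected_bytes(word, sio):
    """Return the bytes of a 32-bit word whose SIO input is low (0)."""
    return [(word >> (8 * n)) & 0xFF for n in range(4) if not (sio >> n) & 1]

def tb0(word, mask, sio):
    """Sketch of TB0: pass if masked bits are all zero in every selected byte."""
    return all((b & mask) == 0 for b in _selected_bytes(word, sio))

def tb1(word, mask, sio):
    """Sketch of TB1: pass if masked bits are all one in every selected byte."""
    return all((b & mask) == mask for b in _selected_bytes(word, sio))

# Mask 1110 0000 tests bits 7, 6 and 5 of each selected byte.
print(tb0(0x881CD003, 0xE0, 0b1010))  # bytes 0 and 2 selected
print(tb1(0x881CD003, 0xE0, 0b1001))  # bytes 1 and 2 selected
```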
Available R Bus Source Operands
  RF (A5-A0)   A3-A0 Immed   DA-Port   C3-C0::A3-A0 Mask
  No           No            No        Yes

Available S Bus Source Operands
  RF (B5-B0)   DB-Port   MQ Register
  Yes          Yes       Yes
Control/Data Signals
  Signal   User Programmable   Use
  SSF      No                  Inactive
  SIO0     Yes                 Byte Select
  SIO1     Yes                 Byte Select
  SIO2     Yes                 Byte Select
  SIO3     Yes                 Byte Select
  Cn       No                  Inactive
Status Signals
  ZERO = 1 if result (selected bytes) = Pass
  N    = 0
  OVR  = 0
  C    = 0
EXAMPLE (assumes a 32-bit configuration)
Test bits 7, 6 and 5 of bytes 1 and 2 of data in register 3 for ones.
Instr Code (I7-I0)      0010 1000
Mask (LSH) A3-A0        0000
Oprd Addr (B5-B0)       00 0011
Oprd Sel EA             X
Oprd Sel EB1-EB0        00
Mask (MSH) C3-C0        1110
SELMQ                   X
WE3-WE0                 XXXX
SELRF1-SELRF0           XX
OEA                     X
OEB                     X
OEY3-OEY0               XXXX
OES                     0
Cn                      X
CF2-CF0                 110
SIO3-SIO0               1001
IESIO3-IESIO0           0000

Assume register file 3 holds 881CD003 (Hex).

Mask    1110 0000 1110 0000 1110 0000 1110 0000   Rn ← Mask (C3-C0::A3-A0)
Source  1000 1000 0001 1100 1101 0000 0000 0011   Sn ← RF(3)n †
Output  0                                         ZERO ← 0

† n = nth byte
UDIVI
Unsigned Divide Iterate
IcI0 I
FUNCTION
Performs one of N-2 iterations of nonrestoring unsigned division by a test subtraction
of the N-bit divisor from the 2N-bit dividend. An algorithm using this instruction can
be found in the "Other Arithmetic Instructions" section.
DESCRIPTION
UDIVI performs a test subtraction of the divisor from the dividend to generate a quotient
bit. The test subtraction may pass or fail and is corrected in the subsequent instruction
if it fails. Similarly a failed test from the previous instruction is corrected during
evaluation of the current UDIVI instruction (see the "Other Arithmetic
Instructions" section for more details).
The R bus must be loaded with the divisor, the S bus with the most significant half
of the result of the previous instruction (UDIVI during iteration or UDIVIS at the
beginning of iteration). The least significant half of the previous result is in the MQ
register.
UDIVI checks the result of the previous pass/fail test and then evaluates:
  F ← R + S           if the test is failed
  F ← R' + S + Cn     if the test is passed
A double precision left shift is performed; bit 7 of the most significant byte of the
MQ shifter is transferred to bit 0 of the least significant byte of the ALU shifter. Bit 7
of the most significant byte of the ALU shifter is lost. The unfixed quotient bit is
circulated into the least significant bit of the MQ shifter.
Available R Bus Source Operands
  RF (A5-A0)   A3-A0 Immed   DA-Port   C3-C0::A3-A0 Mask
  Yes          No            Yes       No

Recommended S Bus Source Operands
  RF (B5-B0)   DB-Port   MQ Register
  Yes          Yes       No

Recommended Destination Operands
  RF (C5-C0)   RF (B5-B0)   Y-Port
  Yes          No           Yes

Shift Operations
  ALU    MQ
  Left   Left
Control/Data Signals
  Signal   User Programmable   Use
  SSF      No                  Inactive
  SIO0     No                  Passes internally generated end-fill bit.
  SIO1     No
  SIO2     No
  SIO3     No
  Cn       Yes                 Should be programmed high.

Status Signals
  ZERO = 1 if result = 0
  N    = 0
  OVR  = 0
  C    = 1 if carry-out
UDIVIS
Unsigned Divide Start
IBI0 I
FUNCTION
Computes the first quotient bit of nonrestoring unsigned division. An
algorithm using this instruction is given in the "Other Arithmetic Instructions" section.
DESCRIPTION
UDIVIS computes the first quotient bit during nonrestoring unsigned division by
subtracting the divisor from the dividend. The resulting remainder due to subtraction
may be negative; the subsequent UDIVI instruction may have to restore the remainder
during the next operation.
The R bus must be loaded with the divisor and the S bus with the most significant
half of the remainder. The result on the Y bus should be loaded back into the register
file for use in the next instruction. The least significant half of the remainder is in the
MQ register.
UDIVIS computes:
  F ← R' + S + Cn
A double precision left shift is performed; bit 7 of the most significant byte of the
MQ shifter is transferred to bit 0 of the least significant byte of the ALU shifter. Bit 7
of the most significant byte of the ALU shifter is lost. The unfixed quotient bit is
circulated into the least significant bit of the MQ shifter.
Available R Bus Source Operands
  RF (A5-A0)   A3-A0 Immed   DA-Port   C3-C0::A3-A0 Mask
  Yes          No            Yes       No

Recommended S Bus Source Operands
  RF (B5-B0)   DB-Port   MQ Register
  Yes          Yes       No

Recommended Destination Operands
  RF (C5-C0)   RF (B5-B0)   Y-Port
  Yes          No           Yes

Shift Operations
  ALU    MQ
  Left   Left
UDIVIT
Status Signals
ZERO    1 if intermediate result = 0
N       0
OVR     0
C       1 if carry-out
UMULI
Unsigned Multiply Iterate
o o
FUNCTION
Performs one of N unsigned multiplication iterations for computing an N-bit by N-bit
product. An algorithm for unsigned multiplication using this instruction is given in the
"Other Arithmetic Instructions" section.
DESCRIPTION
UMULI checks to determine whether the multiplicand should be added with the present
partial product. The instruction evaluates:
F ← R + S + Cn    if the addition is required
F ← S             if no addition is required
A double precision right shift is performed. Bit 0 of the least significant byte of the
ALU shifter is passed to bit 7 of the most significant byte of the MQ shifter; carry-out
is passed to the most significant bit of the ALU shifter.
The S bus should be loaded with the contents of an accumulator and the R bus with
the multiplicand. The Y bus result should be written back to the accumulator after
each iteration of UMULI. The accumulator should be cleared and the MQ register loaded
with the multiplier before the first iteration.
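The iteration described above can be modelled in a few lines of Python. This is a sketch under assumptions, not device microcode: the function name `umul_iterative` is hypothetical, Cn is assumed to be programmed low (0), and the accumulator and MQ register are modelled as plain integers of width `n`.

```python
def umul_iterative(multiplicand, multiplier, n=32):
    """Illustrative model of n UMULI-style shift-and-add iterations."""
    mask = (1 << n) - 1
    r = multiplicand & mask      # R bus: multiplicand
    acc = 0                      # accumulator, cleared before the first iteration
    mq = multiplier & mask       # MQ register, loaded with the multiplier
    for _ in range(n):
        # Add the multiplicand only when the current multiplier bit is 1.
        f = acc + r if (mq & 1) else acc
        carry = f >> n           # carry-out of the n-bit addition
        f &= mask
        # Double-precision right shift: ALU bit 0 goes to the MQ MSB,
        # carry-out goes to the ALU MSB.
        mq = (mq >> 1) | ((f & 1) << (n - 1))
        acc = (f >> 1) | (carry << (n - 1))
    # After n iterations the accumulator holds the upper half of the
    # product and MQ holds the lower half.
    return (acc << n) | mq
```

After N iterations the multiplier has been shifted out of MQ entirely and replaced by the low half of the product, which is why no separate result register is needed.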
R Bus Source Operands
RF (A5-A0)    A3-A0 Immed    DA-Port    C3-C0 :: A3-A0 Mask
Yes           No             Yes        No

Recommended S Bus Source Operands
RF (B5-B0)    DB-Port    MQ Register
Yes           Yes        No

Recommended Destination Operands        Shift Operations
RF (C5-C0)    RF (B5-B0)    Y-Port      ALU      MQ
Yes           No            Yes         Right    Right
SN74ACT8836 32-Bit by 32-Bit
Multiplier/Accumulator
The SN74ACT8836 is a 32-bit integer multiplier/accumulator (MAC) that accepts
two 32-bit inputs and computes a 64-bit product. An on-board adder is provided
to add or subtract the product or the complement of the product from the
accumulator.
To speed up calculations, many modern systems off-load frequently performed
multiply/accumulate operations to a dedicated single-cycle MAC. In such an
arrangement, the 'ACT8836 MAC can accelerate 32-bit microprocessors,
building block processors, or custom CPUs. The 'ACT8836 is well suited for
digital signal processing applications, including fast Fourier transforms, digital
filtering, power series expansion, and correlation.
SN74ACT8836
32-BIT BY 32-BIT MULTIPLIER/ACCUMULATOR
D3046, JANUARY 1988
• Performs Full 32-Bit by 32-Bit Multiply/Accumulate in Flow-Through Mode in 60 ns (Max)
• Can be Pipelined for 36 ns (Max) Operation
• Performs 64-Bit by 64-Bit Multiplication in Five Cycles
• Supports Division Using Newton-Raphson Approximation
• Signed, Unsigned, or Mixed-Mode Multiply Operations
• EPIC™ (Enhanced-Performance Implanted CMOS) 1-μm Process
• Multiplier, Multiplicand, and Product Can be Complemented
• Accumulator Bypass Option
• TTL I/O Voltage Compatibility
• Three Independent 32-Bit Buses for Multiplicand, Multiplier, and Product
• Parity Generation/Checking
• Master/Slave Fault Detection
• Single 5-V Power Supply
• Integer or Fractional Rounding
description
The 'ACT8836 is a 32-bit by 32-bit parallel multiplier/accumulator suitable for low-power, high-speed
operations in applications such as digital signal processing, array processing, and numeric data processing.
High speed is achieved through the use of a Booth and Wallace Tree architecture.
Data is input to the chip through two registered 32-bit DA and DB input ports and output through a registered
32-bit Y output port. These registers have independent clock enable signals and can be made transparent
for flowthrough operations.
The device can perform two's complement, unsigned, and mixed-data arithmetic. It can also operate as
a 64-bit by 64-bit multiplier. Five clock cycles are required to perform a 64-bit by 64-bit multiplication
and multiplex the 128-bit result. Division is supported using Newton-Raphson approximation.
A multiply/accumulate mode is provided to add or subtract the accumulator from the product or the
complement of the product. The accumulator is 67 bits wide to accommodate possible overflow. A warning
flag (ETPERR) indicates whether overflow has occurred.
A rounding feature in the 'ACT8836 allows the result to be truncated or rounded to the nearest 32 bits.
To ensure data integrity, byte parity checking is provided at the input ports, and a parity generator and
master/slave error detection comparator are provided at the output port.
The SN74ACT8836 is characterized for operation from 0°C to 70°C.
EPIC is a trademark of Texas Instruments Incorporated
ADVANCE INFORMATION documents contain information on new products in the
sampling or preproduction phase of development. Characteristic
data and other specifications are subject to change
without notice.
TEXAS INSTRUMENTS
POST OFFICE BOX 655012 • DALLAS, TEXAS 75265
Copyright © 1988, Texas Instruments Incorporated
logic symbol
[Figure: logic symbol of the 74ACT8836 32 x 32 multiplier/accumulator. Signal names and pin numbers are listed in the pin assignment and pin description tables.]
functional block diagram (positive logic)
[Figure: functional block diagram showing the DA and DB input ports with parity inputs (PA3-PA0, PB3-PB0) and parity error outputs (PERRA, PERRB), multiplier/adder stage 1, the pipeline register, multiplier/adder stage 2, and the Y output port with PERRY, MSERR, ETPERR, YETP2-YETP0, and PY3-PY0.]
GB PIN-GRID-ARRAY PACKAGE
(TOP VIEW)
[Figure: pin-grid-array package outline, rows A through R, columns 1 through 15.]
GB PACKAGE PIN ASSIGNMENTS
NO.  NAME      NO.  NAME      NO.  NAME      NO.  NAME      NO.  NAME      NO.  NAME
A1   Y8        B12  YETP1     D14  PY0       H12  COMPL     M3   DB18      P5   DB25
A2   Y10       B13  YETP0     D15  ETPERR    H13  FT0       M7   PB1       P6   DB29
A3   Y11       B14  YETP2     E1   SELREG    H14  EA        M8   PA0       P7   DB31
A4   Y13       B15  PY3       E2   Y3        H15  CKEA      M10  DA6       P8   PERRA
A5   Y14       C1   Y0        E3   GND       J1   DB2       M13  DA16      P9   PA2
A6   Y16       C2   Y4        E13  GND       J2   DB3       M14  DA17      P10  DA2
A7   Y18       C3   EB        E14  PY2       J3   DB5       M15  DA25      P11  DA8
A8   Y19       C4   Y5        E15  RND1      J4   DB7       N1   DB10      P12  DA12
A9   Y21       C5   VCC       F1   SFT0      J12  DA26      N2   DB19      P13  DA14
A10  Y23       C6   GND       F2   Y1        J13  DA24      N3   DB20      P14  DA11
A11  Y25       C7   Y15       F3   GND       J14  DA30      N4   DB21      P15  DA21
A12  Y27       C8   GND       F13  GND       J15  DA31      N5   DB23      R1   DB14
A13  Y28       C9   Y22       F14  MSERR     K1   DB4       N6   DB27      R2   DB26
A14  Y30       C10  GND       F15  DASGN     K2   DB9       N7   VCC       R3   DB28
A15  PY1       C11  VCC       G1   SELD      K3   DB11      N8   GND       R4   DB30
B1   Y2        C12  CKEY      G2   SGNEXT    K13  DA22      N9   DA0       R5   PB0
B2   Y6        C13  OEY       G3   WELS      K14  DA28      N10  DA4       R6   PB2
B3   SELY      C14  ACC0      G4   SFT1      K15  DA29      N11  DA10      R7   PB3
B4   Y7        C15  PERRY     G12  RND0      L1   DB6       N12  DA13      R8   PERRB
B5   Y9        D1   WEMS      G13  DBSGN     L2   DB15      N13  DA15      R9   PA1
B6   Y12       D2   TP1       G14  CKEI      L3   DB13      N14  DA19      R10  PA3
B7   Y17       D3   TP0       G15  FT1       L13  DA18      N15  DA23      R11  DA1
B8   Y20       D7   GND       H1   CLK       L14  DA20      P1   DB12      R12  DA3
B9   Y26       D8   VCC       H2   CKEB      L15  DA27      P2   DB16      R13  DA5
B10  Y29       D9   Y24       H3   DB0       M1   DB8       P3   DB24      R14  DA7
B11  Y31       D13  ACC1      H4   DB1       M2   DB17      P4   DB22      R15  DA9
PIN
NAME     NO.    I/O    DESCRIPTION
ACC0     C14    I      Accumulate mode opcode (see Table 2)
ACC1     D13    I
CLK      H1     I      System clock
CKEA     H15    I      Clock enable for A register, active low
CKEB     H2     I      Clock enable for B register, active low
CKEI     G14    I      Clock enable for I register, active low
CKEY     C12    I      Clock enable for Y register, active low
COMPL    H12    I      Product complement control; high complements multiplier result, low passes
                       multiplier unaltered to accumulator.
DA0      N9     I      DA port input data bits 0 through 31
DA1      R11
DA2      P10
DA3      R12
DA4      N10
DA5      R13
DA6      M10
DA7      R14
DA8      P11
DA9      R15
DA10     N11
DA11     P14
DA12     P12
DA13     N12
DA14     P13
DA15     N13
DA16     M13
DA17     M14
DA18     L13
DA19     N14
DA20     L14
DA21     P15
DA22     K13
DA23     N15
DA24     J13
DA25     M15
DA26     J12
DA27     L15
DA28     K14
DA29     K15
DA30     J14
DA31     J15
DASGN    F15
PIN
NAME     NO.    I/O    DESCRIPTION
Y0       C1     I/O    Y port data bus. Outputs data from Y register (OEY = L); inputs data to
                       master/slave comparator (OEY = H).
Y1       F2
Y2       B1
Y3       E2
Y4       C2
Y5       C4
Y6       B2
Y7       B4
Y8       A1
Y9       B5
Y10      A2
Y11      A3
Y12      B6
Y13      A4
Y14      A5
Y15      C7
Y16      A6
Y17      B7
Y18      A7
Y19      A8
Y20      B8
Y21      A9
Y22      C9
Y23      A10
Y24      D9
Y25      A11
Y26      B9
Y27      A12
Y28      A13
Y29      B10
Y30      A14
Y31      B11
YETP0    B13    I/O    Data bus for extended precision product. Outputs three most significant bits
YETP1    B12           of the 67-bit multiplier core result; inputs external data to master/slave
YETP2    B14           comparator.
TABLE 1. INSTRUCTION INPUTS
Signal    High                                           Low
DASGN     Identifies DA input data as two's complement   Identifies DA input data as unsigned
DBSGN     Identifies DB input data as two's complement   Identifies DB input data as unsigned
RND0      Rounds integer result                          Leaves integer result unaltered
RND1      Rounds fractional result                       Leaves fractional result unaltered
COMPL     Complements the product from the multiplier    Passes the product from the multiplier to the
          before passing it to the accumulator           accumulator unaltered
ACC0      See Table 2                                    See Table 2
ACC1      See Table 2                                    See Table 2
TABLE 2. MULTIPLIER/ADDER CONTROL INPUTS
ACC1    ACC0    EA    EB    Operation
0       0       X     X     ±(R x S) + 0
0       1       X     X     ±(R x S) + ACC
1       0       X     X     ±(R x S) - ACC
1       1       0     0     ±1 x 1 + 0
1       1       0     1     ±1 x DB + 0
1       1       1     0     ±DA x 1 + 0
1       1       1     1     ±DA x DB + 0
ACC is the data stored in the accumulator
TABLE 3. SHIFTER CONTROL INPUTS
SFT1    SFT0    Shifter Operation
L       L       Pass data without shift
L       H       Shift one bit left; fill with zero
H       L       Swap upper and lower halves of temporary register
H       H       Shift 32 bits right; fill with sign bit
TABLE 4. FLOWTHROUGH CONTROL INPUTS
Control Inputs      Registers Bypassed
FT1     FT0         Pipeline    Y      I      B      A
L       L           Yes         Yes    Yes    Yes    Yes
L       H           Yes         No     No     No     No
H       L           Yes         Yes    No     No     No
H       H           No          No     No     No     No
TABLE 5. TEST PIN CONTROL INPUTS
TP1    TP0    Operation
L      L      All outputs and I/Os forced low
L      H      All outputs and I/Os forced high
H      L      All outputs placed in a high impedance state
H      H      Normal operation (default state)
data flow
Two 32-bit input data ports, DA and DB, are provided for input of the multiplicand and multiplier to registers
A and B and the multiplier/adder. Input data can be clocked to the A and B registers before being passed
to the multiplier/adder if desired. Two multiplexers, R and S, in conjunction with a flowthrough decoder
select the multiplier operands from DA and DB inputs, A and B registers, or the temporary register. Data
is supplied to the temporary register from a shifter that operates on external DA/DB data or a previous
multiplier/adder result. The 67-bit multiplier/adder result can be output through the Y port or passed through
the shifter to the accumulator.
External DA and DB data is also available to the accumulator via the shifter. This 64-bit data can be extended
with zeros or the sign bit. The 64 least significant bits from the shifter may also be latched in the 64-bit
temporary register and input to the multiplier through the R and S multiplexers. A swap option allows the
most significant and least significant 32-bit halves of temporary register data to be swapped before being
made available to the R and S multiplexers. This allows either 32-bit half of the temporary register to be
used as a multiplier.
FIGURE 1. TEMPORARY REGISTER AND ACCUMULATOR
[Figure: D multiplexer (SGNEXT, SELD), accumulator, temporary register (SELREG, WEMS, WELS), A and B registers, R and S multiplexers, multiplier/adder stages 1 and 2 with pipeline register, and the 67-bit result path.]
shifter
The shifter can be used to multiply by two for Newton-Raphson operations or perform a 32-bit shift for
double precision multiplication. The shifter is controlled by two SFT inputs, as shown in Table 3.
Y register
Final or intermediate multiplier/adder results will be clocked into Y register when CKEY is low.
Results can be passed directly to the Y output multiplexer using flowthrough decoder signals to bypass
the register (see Table 4).
Y multiplexer and Y output multiplexer
The Y multiplexer allows the 64-bit result or the contents of the Y register to be switched to the Y bus,
depending upon the state of the flowthrough control outputs. The upper 32 bits are selected for output
when the Y output multiplexer control SELY is high; the lower 32 bits are selected for output when SELY
is low. Note that the Y output multiplexer can be switched at twice the clock rate so that the 64-bit result
can be output in one clock cycle.
flowthrough decoder
To enable the device to operate in pipelined or flowthrough modes, on-chip registers can be bypassed using
flowthrough control signals FT1 and FTO. Up to three levels of pipeline can be supported, as shown in
Table 4.
FIGURE 3. OUTPUT ERROR CONTROL
[Figure: Y register (CKEY), Y output multiplexer (SELY), output enable (OEY), and the error outputs PERRY, ETPERR, and MSERR with YETP2-YETP0, Y31-Y0, and PY3-PY0.]
extended precision check
Three extended product outputs, YETP2-YETPO, are provided to recover three bits of precision during
overflow. An extended precision check error signal (ETPERR) goes high whenever overflow occurs. If sign
controls DASGN and DBSGN are both low, indicating an unsigned operation, the extended precision bits
66-64 are compared for equality. Under all other sign control conditions, bits 66-63 are compared for
equality.
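The comparison rule above can be expressed as a short Python sketch. This is an assumption-laden illustration, not a description of the internal logic: the function name `etperr` is hypothetical, the 67-bit result is passed as a nonnegative integer holding the raw bit pattern, and the flag is assumed to go high exactly when the compared bits are not all equal.

```python
def etperr(result67, dasgn, dbsgn):
    """Illustrative overflow check on a 67-bit result bit pattern.

    dasgn/dbsgn: True when the corresponding operand is two's complement.
    Returns True (flag high) when the compared upper bits are not all equal.
    """
    if not dasgn and not dbsgn:
        hi, lo = 66, 64      # unsigned operation: compare bits 66-64
    else:
        hi, lo = 66, 63      # all other sign modes: compare bits 66-63
    width = hi - lo + 1
    all_ones = (1 << width) - 1
    field = (result67 >> lo) & all_ones
    return not (field == 0 or field == all_ones)
```

For example, a signed result whose bit 63 differs from bits 66-64 no longer fits in 64 bits as a two's complement value, so the sketch reports an overflow.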
master/slave comparator
A master/slave comparator is provided to compare data bytes from the Y output multiplexer with data
bytes on the external Y port when OEY is high. A comparison of the three extended precision bits of the
multiplier/adder result or Y register output with external data in the YETP2-YETP0 port is performed
simultaneously. If the data is not equal, a high signal is generated on the master/slave error output pin
(MSERR). A similar comparison is performed for parity using the PY3-PY0 inputs. This feature is useful
in fault-tolerant design where several devices vote to ensure hardware integrity.
test pins
Two pins, TP1-TPO, support system testing. These may be used, for example, to place all outputs in a
high-impedance state, isolating the chip from the rest of the system (see Table 5).
data formats
The 'ACT8836 performs single-precision and double-precision multiplication in two's complement, unsigned
magnitude, and mixed formats for both integer and fractional numbers.
Input formats for the multiplicand (R) and multiplier (S) are given below, followed by output formats for
the fully extended product. The fully extended product (PRDT) is 67 bits wide. It includes the extended
product (XTP) bits YETP2-YETP0, the most significant product (MSP) bits Y63-Y32, and the least significant
product (LSP) bits Y31-Y0.
This can be represented in notational form as follows:
PRDT = XTP :: MSP :: LSP
or
PRDT = YETP2-YETP0 :: Y63-Y0
Table 6 shows the output formats generated by two's complement, unsigned and mixed-mode
multiplications.
TABLE 6. GENERATED OUTPUT FORMATS
                      Two's Complement    Unsigned Magnitude
Two's Complement      Two's Complement    Two's Complement
Unsigned Magnitude    Two's Complement    Unsigned Magnitude
examples
Representative examples of single-precision multiplication, double-precision multiplication, and division using
Newton-Raphson binary division algorithm are given below.
single-precision multiplication
Microcode for the multiplication of two signed numbers is shown in Figure 1. In this example, the result
is rounded and the 32 most significant bits are output on the Y bus. A second instruction (SELY = 0)
would be required to output the least significant half if rounding were not used.
Unsigned and mixed mode single-precision multiplication are executed using the same code. (The sign
controls must be modified accordingly.) Following are the input and output formats for signed, unsigned,
and mixed mode operations.
[Format diagrams. The bit weights recoverable from the diagrams are summarized below.]
Two's Complement Integer Inputs (Input Operand A, Input Operand B)
bit 31 (sign) = -2^31, bit 30 = 2^30, bit 29 = 2^29, ..., bit 2 = 2^2, bit 1 = 2^1, bit 0 = 2^0
Unsigned Integer Inputs (Input Operand A, Input Operand B)
bit 31 = 2^31, bit 30 = 2^30, bit 29 = 2^29, ..., bit 2 = 2^2, bit 1 = 2^1, bit 0 = 2^0
Two's Complement Fractional Inputs
bit 31 (sign) = -2^0, bit 30 = 2^-1, bit 29 = 2^-2, ..., bit 2 = 2^-29, bit 1 = 2^-30, bit 0 = 2^-31
Integer Outputs
Extended Product (YETP2-YETP0) = bits 66-64; Most Significant Product (Y63-Y32) = bits 63-32;
Least Significant Product (Y31-Y0) = bits 31-0; bit n carries weight 2^n (for example, bit 32 = 2^32, bit 31 = 2^31)
Two's Complement Fractional Outputs
Extended Product (YETP2-YETP0): bit 66 (sign) = -2^4, bit 65 = 2^3, bit 64 = 2^2
Most Significant Product (Y63-Y32): bit 63 = 2^1, bit 62 = 2^0, bit 61 = 2^-1, ..., bit 32 = 2^-30
Least Significant Product (Y31-Y0): bit 31 = 2^-31, bit 30 = 2^-32, ..., bit 0 = 2^-62
Unsigned Fractional Outputs
Extended Product (YETP2-YETP0): bit 66 = 2^2, bit 65 = 2^1, bit 64 = 2^0
Most Significant Product (Y63-Y32): bit 63 = 2^-1, bit 62 = 2^-2, bit 61 = 2^-3, ..., bit 32 = 2^-32
Least Significant Product (Y31-Y0): bit 31 = 2^-33, bit 30 = 2^-34, ..., bit 0 = 2^-64
double-precision multiplication
To simplify discussion of double-precision multiplication, the following example implements an algorithm
using one 'ACT8836 device. It should be noted that even higher speeds can be achieved through the use
of two 'ACT8836s to implement a parallel multiplier.
The example is based on the following algorithm where A and B are 64-bit signed numbers.
Let
Am = as, a62, a61, ..., a32    (as = sign bit)
and
Al = a31, a30, a29, ..., a0    (a0 = LSB)
Therefore:
A = (Am x 2^32) + Al
Likewise:
B = (Bm x 2^32) + Bl
Thus:
A x B = [(Am x 2^32) + Al] x [(Bm x 2^32) + Bl]
      = (Am x Bm) x 2^64 + (Am x Bl + Al x Bm) x 2^32 + Al x Bl
Therefore, four products and three summations with rank adjustments are required.
Basic implementation of this algorithm uses a single 'ACT8836. The result is a two's complement 128-bit
product. Microcode signals to implement the algorithm are shown in Figure 4.
The first instruction cycle computes the first product, Al x Bl. The least significant half of the result is
output through the Y port for storage in an external RAM or some other 32-bit register; this will be the
least significant 32-bit portion of the final result.
The instruction also uses the shifter to shift the Al x Bl product 32 bits to the right in order to adjust
for ranking in the next multiplication-addition sequence. The least significant half of the shift result is stored
in the lower 32-bit portion of the accumulator; the upper 32 bits contain the zero fill.
The second instruction produces the second product, Al x Bm, adds it to the contents of the accumulator,
and stores the result in the accumulator for use in the third instruction.
Instruction 3 computes Am x Bl, adds the result to the accumulator, and outputs the least significant
32 bits of the addition for use as bits 63-32 of the final product.
This instruction also shifts the result 32 bits to the right to provide the necessary rank adjustment and
stores the shift result (the most significant half of the addition result) in the lower 32 bits of the accumulator.
Bits ACC63-ACC32 are filled with zeros; the sign is extended into the three upper bits (ACC66-ACC64).
Instruction 4 computes the fourth product (Am x Bm), adds it to the accumulator, and outputs the least
significant half at the Y port for use as bits 95-64 of the final product.
This example assumes that the chip is operating in feed-through mode. A fifth instruction is therefore required
to perform the fourth iteration again so that bits 127-96 of the final product can be output.
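The five-instruction sequence can be checked against a small software model. The sketch below is an illustration only: the function name `mul64x64` and the list `words` are hypothetical, Python's unbounded integers stand in for the 67-bit accumulator, and an arithmetic right shift stands in for the shifter's rank adjustment. The upper halves Am and Bm are treated as signed and the lower halves Al and Bl as unsigned, which is an assumption consistent with the decomposition given above.

```python
MASK32 = 0xFFFFFFFF

def mul64x64(a, b):
    """Illustrative 64 x 64 signed multiply built from four 32 x 32 products."""
    am, al = a >> 32, a & MASK32     # signed upper half, unsigned lower half
    bm, bl = b >> 32, b & MASK32
    words = []                       # 32-bit output words, least significant first

    acc = al * bl                    # instruction 1: Al x Bl
    words.append(acc & MASK32)       # bits 31-0 of the final product
    acc >>= 32                       # rank adjustment (shift 32 bits right)

    acc += al * bm                   # instruction 2: add Al x Bm
    acc += am * bl                   # instruction 3: add Am x Bl
    words.append(acc & MASK32)       # bits 63-32
    acc >>= 32                       # rank adjustment, sign preserved

    acc += am * bm                   # instruction 4: add Am x Bm
    words.append(acc & MASK32)       # bits 95-64
    words.append((acc >> 32) & MASK32)   # instruction 5: bits 127-96

    raw = sum(w << (32 * i) for i, w in enumerate(words))
    # Interpret the assembled 128-bit pattern as two's complement.
    return raw - (1 << 128) if raw >> 127 else raw
```

Comparing the assembled result with Python's native product for a few operands is a quick way to confirm that four partial products and two rank adjustments are sufficient.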
Newton-Raphson binary division algorithm
The following explanation illustrates how to implement the Newton-Raphson binary division algorithm using
the 'ACT8836 multiplier/accumulator. The Newton-Raphson algorithm is an iterative procedure that
generates the reciprocal of the divisor through a convergence method.
Consider the equation Q = A/B. This equation can be rewritten as Q = A x (1/B). Therefore, the quotient
Q can be computed by simply multiplying the dividend A by the reciprocal of the divisor (B). Finding the
divisor reciprocal 1/B is the objective of the Newton-Raphson algorithm.
To calculate 1/B, the Newton-Raphson equation Xi+1 = Xi(2 - BXi) is calculated in an iterative process.
In the equation, B represents the divisor and X represents successively closer approximations to the
reciprocal 1/B. The following sequence of computation illustrates the iterative nature of the Newton-Raphson
algorithm.
Step 1    X1 = X0(2 - BX0)
Step 2    X2 = X1(2 - BX1)
Step 3    X3 = X2(2 - BX2)
Step n    Xn = Xn-1(2 - BXn-1)
The successive approximation of Xi, for all i, approaches the reciprocal 1/B as the number of iterations
increases; that is
lim Xi = 1/B
i → n
The iterative operation is executed until the desired tolerance or error is reached. The required accuracy
for 1/B can be determined by subtracting each Xi from its corresponding Xi+1. If the difference |Xi+1 - Xi|
is less than or equal to a predetermined round-off error, then the process is terminated. The desired
tolerance can also be achieved by executing a fixed number of iterations based on the accuracy of the
initial guess of 1/B stored in RAM or PROM.
The initial guess, X0, is called the seed approximation. The seed must be supplied to the Newton-Raphson
process externally and must fall within the range of 0 < X0 < 2/B if B is greater than 0 or 2/B < X0 < 0
if B is less than 0.
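The iteration and its stopping rule can be tried out numerically. The following Python sketch uses floating-point arithmetic rather than the device's fixed-point fractional format, and the names `reciprocal_newton`, `tol`, and `max_iter` are hypothetical; it is meant only to show the convergence behaviour, not the microcode.

```python
def reciprocal_newton(b, x0, tol=1e-12, max_iter=50):
    """Illustrative Newton-Raphson reciprocal: X(i+1) = Xi * (2 - B * Xi).

    Stops when |X(i+1) - Xi| <= tol or after max_iter iterations.
    Returns the approximation of 1/b and the number of iterations used.
    """
    x = x0
    for i in range(max_iter):
        x_next = x * (2.0 - b * x)
        if abs(x_next - x) <= tol:
            return x_next, i + 1
        x = x_next
    return x, max_iter

def divide(a, b, x0):
    """Quotient Q = A * (1/B), using the reciprocal sketch above."""
    recip, _ = reciprocal_newton(b, x0)
    return a * recip
```

With a divisor in the range 1/2 ≤ B < 1 and a seed of 1.0, the error roughly squares on every pass, so only a handful of iterations are needed; a better seed reduces the count further.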
To perform the Newton-Raphson binary division algorithm using the 'ACT8836, the divisor, B, must be
a positive fraction. As a positive fraction, B is limited within the range of 1/2 ≤ B < 1.
Since Xi from Newton-Raphson must lie between 0 < Xi < 2/B and since the range of the positive fraction
B is 1/2 ≤ B < 1, then the limits of Xi become 1 ≤ Xi < 2.
The range of -BXi will therefore be -2 ≤ -BXi ≤ -1/2.
The limits of -BXi are shown in Table 7 as they would appear in the 'ACT8836 extended bit, binary fraction
format.
TABLE 7. LIMITS OF -BXi IN 'ACT8836 EXTENDED BIT FORMAT
         Extended Bits
         66   65   64     63   62   61   ......   2   1   0
-2        1    1    1      0    0    0   ......   0   0   0
-1/2      1    1    1      1    1    0   ......   0   0   0
The diagram indicates that -BXi is always of the form:
1 1 1 d0 . d1 d2 ............ dn-2 dn-1
absolute maximum ratings over operating free-air temperature range (unless otherwise noted)†
Supply voltage, VCC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . -0.5 V to 6 V
Input clamp current, IIK (VI < 0 or VI > VCC) . . . . . . . . . . . . . . . . . . ±20 mA
Output clamp current, IOK (VO < 0 or VO > VCC) . . . . . . . . . . . . . . . . . ±50 mA
Continuous output current, IO (VO = 0 to VCC) . . . . . . . . . . . . . . . . . . ±50 mA
Continuous current through VCC or GND pins . . . . . . . . . . . . . . . . . . . ±100 mA
Operating free-air temperature range . . . . . . . . . . . . . . . . . . . . . 0°C to 70°C
Storage temperature range . . . . . . . . . . . . . . . . . . . . . . . . -65°C to 150°C
† Stresses beyond those listed under "absolute maximum ratings" may cause permanent damage to the device. These are stress ratings
only and functional operation of the device at these or any other conditions beyond those indicated under "recommended operating
conditions" is not implied. Exposure to absolute-maximum-rated conditions for extended periods may affect device reliability.
recommended operating conditions
                                              MIN    NOM    MAX    UNIT
VCC      Supply voltage                       4.5    5      5.5    V
VIH      High-level input voltage             2             VCC    V
VIL      Low-level input voltage              0             0.8    V
IOH      High-level output current                          -8     mA
IOL      Low-level output current                           8      mA
VI       Input voltage                        0             VCC    V
VO       Output voltage                       0             VCC    V
dt/dv    Input transition rise or fall rate   0             15     ns/V
TA       Operating free-air temperature       0             70     °C
electrical characteristics over recommended operating free-air temperature range (unless otherwise
noted)
PARAMETER    TEST CONDITIONS                  VCC      TA = 25°C            TA = 0°C to 70°C    UNIT
                                                       MIN    TYP    MAX    MIN     MAX
VOH          IOH = -20 μA                     4.5 V    4.4                  4.4                 V
                                              5.5 V    5.4                  5.4
             IOH = -8 mA                      4.5 V    3.8                  3.7
                                              5.5 V    4.8                  4.7
VOL          IOL = 20 μA                      4.5 V                  0.1            0.1         V
                                              5.5 V                  0.1            0.1
             IOL = 8 mA                       4.5 V                  0.32           0.4
                                              5.5 V                  0.32           0.4
II           VI = VCC or 0                    5.5 V                  0.1            ±1.0        μA
ICC          VI = VCC or 0, IO                5.5 V                  50             100         μA
Ci           VI = VCC or 0                    5 V                    10             10          pF
ΔICC†        One input at 3.4 V,              5.5 V                  1              1           mA
             other inputs at 0 or VCC
IOZH         VI = VCC or 0                    5.5 V                  0.5            5           μA
IOZL         VI = VCC or 0                    5.5 V                  -0.5           5           μA
setup and hold times
PARAMETER                            MIN    MAX    UNIT
tsu1    Instruction before CLK↑      14            ns
tsu2    Data before CLK↑             12
tsu3    CKEA before CLK↑             14
tsu4    CKEB before CLK↑             14
tsu5    CKEI before CLK↑             10
tsu6    CKEY before CLK↑             19
tsu7    SELREG before CLK↑           12
tsu8    WEMS before CLK↑             11
tsu9    WELS before CLK↑             11
th1     Instruction after CLK↑       0
th2     Data after CLK↑              0
th3     CKEA after CLK↑              0
th4     CKEB after CLK↑              0
th5     CKEI after CLK↑              0
th6     CKEY after CLK↑              0
th7     SELREG after CLK↑            0
th8     WEMS after CLK↑              0
th9     WELS after CLK↑              0
switching characteristics over recommended ranges of supply voltage and free-air temperature (see
Figure 2 for load circuit and voltage waveforms)
PARAMETER    FROM       TO          FT MODE      MIN    TYP    MAX    UNIT
             (INPUT)    (OUTPUT)    (FT1-FT0)
tpd1†        CLK        PIPE        11                         36     ns
tpd2†        PIPE       Y REG       11                         36
tpd3†        PIPE       ACCUM       11                         36
tpd4†        Y REG      Y           All modes                  18
tpd5         SELY       Y           All modes                  18
tpd6†        CLK        Y REG       01                         54
tpd7†        CLK        ACCUM       10 or 01                   67
tpd8         CLK        Y           10                         67
tpd9         DATA       Y           00                         60
tpd10†       DATA       ACCUM       00                         56
tpd11        CLK        YETP        11 or 10                   18
tpd12        CLK        ETPERR      11 or 10                   18
tpd13        CLK        YETP        00                         67
tpd14        CLK        ETPERR      01                         67
tpd15        DATA       YETP        00                         60
tpd16        DATA       ETPERR      00                         60
tpd17        PA         PERRA       All modes                  20
tpd18        DA         PERRA       All modes                  20
tpd19        PB         PERRB       All modes                  20
tpd20        DB         PERRB       All modes                  20
tpd21        PY         PERRY       All modes                  20
tpd22        Y          MSERR       All modes                  22
tpd23        YETP       MSERR       All modes                  22
ten1         OEY        YETP        All modes                  20
ten2         OEY        Y           All modes                  20
tdis1        OEY        YETP        All modes                  15
tdis2        OEY        Y           All modes                  15
FIGURE 5. FULL FLOWTHROUGH MODE (FT = 00)
[Figure: timing waveforms for OEY, SELY, and Y31-Y0 (LSP, MSP) showing ten2, tpd9, tpd5, and tdis2.]
FIGURE 6. FULL FLOWTHROUGH MODE, ACCUMULATOR MODE (FT = 00)
[Figure: timing waveforms for CLK, CKEA, CKEB, CKEI, CKEY, INSTR, DATA, SELREG, WEMS, WELS, sum-of-product, ACCUM, SELY, OEY, and Y31-Y0 (LSP, MSP) showing tsu7-tsu9, th7-th9, tpd10, tpd9, tpd5, ten2, and tdis2.]
PARAMETER MEASUREMENT INFORMATION
FIGURE 7. FLOWTHROUGH PIPE ONLY VOLTAGE WAVEFORMS (FT = 01)
[Figure: timing waveforms for CLK, CKEA, CKEB, CKEI, CKEY, INSTR, DATA, PRODUCT, Y REG, SELY, OEY, and Y31-Y0 (LSP, MSP) showing tsu1-tsu6, th1-th6, tpd6, tpd4, tpd5, ten2, and tdis2.]
ClK
I
I
CKEA. CKEB
CKEI. CKEY
INSTR
I
_"S)-~~'
-., :+-~:;;:;;;;~--!------!-------j-----th3::t.;;i::t;::=::::f
tsu3- t su6
~ I
-.:
*:±=====J
:.-tsu 1
:
~th1~
DATAS!<:
--.:
SElREG
WEMS,WELS
~tsu2
*::::====:J
A.a
:
:
_th2~
I
I
I
I
SSS(SSSSSSS»
I
tsu7-tsu9
.1
I
I
I
I+-- t h7-t h9---+i
:
SUM-OFPRODUCT
I
I
SSSSSSSSSSSSSSSSS,*
I
I
en
z-.oJ
~
»
(")
-I
CO
CO
I
I
I
II
I
I
I
I
I
I
SSSSSSSSSSSSSSSSSK I
I
I
'--:======~'=======::±'====
x..
I
ACCUM.
i+---tpd7 ------..,I
PRODUCT
I
I
I
I
,rIZZ????Z?Z?ZZ?Z???????????????????????
I.
VREG
I I
I
xq:=======:I~======:::t====
:
Io--tpdS-----oI :
:
:
SElY ----------t:----~.-( I
I
I
OEY
I
I
SSSSSSSSSSSSSSSS)..1
en
I
I
: :
I
I
1/ ZZZZVZZZZZZZZZZZZZZZZZ
I I
I
I
I
I
I
II
I I
I
:
I
I
:
I
I
I
I
I
SSSSSSSSSSSSSSSSSSSS SX=:Jl!!SP~1=:*==:J!:MS~P=+':::::)j(~2Z:LZ~7~Z~Z2:ZZZZ2Z2/2Z22:LZ:L2~2~Z~/~Z~Z2:Z2:ZZ/Z
I
ten2
Co\)
'I
I I
l~tpd4
Y31-YO
I
....... SfSSSSSSSSSSSSS1SSSSSSS
I.
01
I
Io----<+-tpd5
~tdis2
FIGURE 8. FLOWTHROUGH PIPE ONLY. ACCUMULATOR MODE (FT = 01)
»c
<
»
z
(")
m
Z
."
olJ
s:
»
::t
oZ ________________________________________________
4-30
TEXAS . "
INSTRUMENTS
POST OFFICE BOX 655012 • DALLAS, TEXAS 75265
[Timing waveforms. Signals: CLK; CKEA, CKEB, CKEI; INSTR; DATA (A, B); SELY; OEY; Y31-Y0 (LSP, MSP). Parameters: tsu1, tsu2, tsu3-tsu5, th1, th2, th3-th5, tpd5, ten2, tdis2.]
FIGURE 9. FLOWTHROUGH PIPE AND Y ONLY (FT = 10)
[Timing waveforms. Signals: CLK; CKEA, CKEB, CKEI; INSTR; DATA (A, B); SELREG; WEMS, WELS; SUM-OF-PRODUCT; ACCUM.; PRODUCT; PIPE; YREG; SELY; Y31-Y0 (LSP, MSP). Parameters: tsu1, tsu2, tsu3-tsu5, tsu7-tsu9, th1, th2, th3-th5, th7-th9, tpd1, tpd4, tpd5, tpd7, ten2, tdis2.]
FIGURE 11. ALL REGISTERS ENABLED (FT = 11)
[Timing waveforms. Signals: CLK; CKEA, CKEB, CKEI, CKEY; INSTR; DATA (A, B); INTR. PRODUCT; SELREG; WEMS, WELS; PRODUCT; SUM-OF-PRODUCT; ACCUM.; PIPE; YREG; SELY; OEY; Y31-Y0 (LSP1, MSP1, LSP2, MSP2). Parameters: tsu1, tsu2, tsu7-tsu9, th1, th2, th3-th6, th7-th9, tpd1, tpd2, tpd3, tpd4, tpd5, ten2, tdis2.]
FIGURE 12. ALL REGISTERS ENABLED, ACCUMULATOR MODE (FT = 11)
[Load circuit. Labels: VCC, S1, TEST POINT, FROM OUTPUT UNDER TEST, RL, CL, S2]

PARAMETER        RL      CL      S1       S2
ten    tPZH      1 kΩ    50 pF   OPEN     CLOSED
       tPZL                      CLOSED   OPEN
tdis   tPHZ      1 kΩ    50 pF   OPEN     CLOSED
       tPLZ                      CLOSED   OPEN
tpd              -       50 pF   OPEN     OPEN
PIN NAME   NO.   I/O   DESCRIPTION
DENORM     B16   I/O   Status pin indicating a denormal output from the ALU or a wrapped output from the multiplier. In FAST mode, causes the result to go to zero when DENORM is high.
ENRA       M2    I     When high, enables loading of RA register on a rising clock edge if the RA register is not disabled (see PIPES0 below).
ENRB       M1    I     When high, enables loading of RB register on a rising clock edge if the RB register is not disabled (see PIPES0 below).
FAST       E3    I     When low, selects gradual underflow (IEEE mode). When high, selects sudden underflow, forcing all denormalized inputs and outputs to zero.
GND        D4, D6, D7, D9, D10, D12, D13, E4, E14, F4, F14, H4, H14, K4, K14, L14, M4
                       Ground pins. NOTE: All ground pins should be used and connected.
HALT       R2    I     Stalls operation without altering contents of instruction or data registers. Active low.
I0         E2    I     Instruction inputs
I1         D1
I2         E1
I3         F2
I4         G3
I5         F1
I6         G2
I7         G1
I8         H3
I9         H1
INEX       C14   I/O   Status pin indicating an inexact output
Table 2. 'ACT8837 Pin Functional Description (Continued)
PIN NAME   NO.   I/O   DESCRIPTION
IVAL       A15   I/O   Status pin indicating that an invalid operation or a nonnumber (NaN) has been input to the multiplier or ALU.
MSERR      E17   O     Master/Slave error output pin
NC         A1, A2, A16, A17, B1, B17, H2, J15, P1, S1, T1, T16, T17
                       No internal connection. Pins should be left floating.
OEC        G15   I     Comparison status output enable. Active low.
OES        F17   I     Exception status and other status output enable. Active low.
OEY        F16   I     Y bus output enable. Active low.
OVER       B14   I/O   Status pin indicating that the result is greater than the largest allowable value for specified format (exponent overflow).
PA0        L17   I     Parity inputs for DA data
PA1        K15
PA2        K16
PA3        K17
PB0        S2    I     Parity inputs for DB data
PB1        P4
PB2        R3
PB3        T2
PERRA      F15   O     DA data parity error output. When high, signals a byte or word has failed an even parity check.
PERRB      C1    O     DB data parity error output. When high, signals a byte or word has failed an even parity check.
PIPES0     P2    I     When low, enables instruction register, RA and RB input registers. When high, puts instruction register, RA and RB registers in flowthrough mode.
PIPES1     R1    I     When low, enables pipeline registers in ALU and multiplier. When high, puts pipeline registers in flowthrough mode.
Table 2. 'ACT8837 Pin Functional Description (Continued)
PIN NAME   NO.   I/O   DESCRIPTION
PIPES2     N4    I     When low, enables status register, product (P) and sum (S) registers. When high, puts status register, P and S registers in flowthrough mode.
PY0        A13   I/O   Y port parity data
PY1        C12
PY2        B13
PY3        A14
RESET      P3    I     Clears internal states and status with no effect to data registers. Active low.
RND0       F3    I     Rounding mode control pins. Select four IEEE rounding modes (see Table 18).
RND1       D2
RNDCO      B15   I     When high, indicates the mantissa of a wrapped number has been increased in magnitude by rounding.
SELMS/LS   G16   I     When low, selects LSH of 64-bit result to be output on the Y bus. When high, selects MSH of 64-bit result.
SELOP0     J3    I     Select operand sources for multiplier and ALU (see Tables 6 and 7)
SELOP1     J2
SELOP2     J1
SELOP3     K1
SELOP4     K2
SELOP5     K3
SELOP6     L1
SELOP7     L2
SELST0     H17   I     Select status source during chained operation (see Table 16)
SELST1     H16
SRCC       J16   I     When low, selects ALU as data source for C register. When high, selects multiplier as data source for C register.
SRCEX      C16   I/O   Status pin indicating source of status, either ALU (SRCEX = L) or multiplier (SRCEX = H)
STEX0      D16   I/O   Status pins indicating that a nonnumber (NaN) or denormal number has been input on A port (STEX1) or B port (STEX0).
STEX1      D15
TP0        H15   I     Test pins (see Table 19)
TP1        G17
UNDER      C13   I/O   Status pin indicating that a result is inexact and less than minimum allowable value for format (exponent underflow).
UNORD      D17   I/O   Comparison status pin indicating that the two inputs are unordered because at least one of them is a nonnumber (NaN).
Table 2. 'ACT8837 Pin Functional Description (Concluded)
PIN NAME   NO.   I/O   DESCRIPTION
VCC        D5, D8, D11, D14, G4, G14, J4, J14, L4, M14
                       5-V power supply
Y0-Y31           I/O   32-bit Y output data bus
  Y0-Y7      C2,  D3,  B2,  C3,  B3,  A3,  C4,  B4
  Y8-Y15     A4,  C5,  B5,  A5,  C6,  B6,  A6,  C7
  Y16-Y23    B7,  A7,  C8,  B8,  A8,  A9,  B9,  C9
  Y24-Y31    A10, B10, C10, A11, B11, A12, C11, B12
'ACT8837 Specification Tables
absolute maximum ratings over operating free-air temperature range
(unless otherwise noted)†
Supply voltage, VCC ............................ -0.5 V to 6 V
Input clamp current, IIK (VI < 0 or VI > VCC) ........ ±20 mA
Output clamp current, IOK (VO < 0 or VO > VCC) ....... ±50 mA
Continuous output current, IO (VO = 0 to VCC) ........ ±50 mA
Continuous current through VCC or GND pins .......... ±100 mA
Operating free-air temperature range ............ 0°C to 70°C
Storage temperature range .................... -65°C to 150°C
†Stresses beyond those listed under "absolute maximum ratings" may cause permanent damage
to the device. These are stress ratings only and functional operation of the device at these or
any other conditions beyond those indicated under "recommended operating conditions" is
not implied. Exposure to absolute-maximum-rated conditions for extended periods may affect
device reliability.
recommended operating conditions
                                                SN74ACT8837
PARAMETER                                       MIN    NOM    MAX    UNIT
VCC     Supply voltage                          4.75   5.0    5.25   V
VIH     High-level input voltage                2             VCC    V
VIL     Low-level input voltage                               0.8    V
IOH     High-level output current                             -8     mA
IOL     Low-level output current                              8      mA
VI      Input voltage                           0             VCC    V
VO      Output voltage                          0             VCC    V
dt/dv   Input transition rise or fall rate      0             15     ns/V
TA      Operating free-air temperature          0             70     °C
electrical characteristics over recommended operating free-air
temperature range (unless otherwise noted)
PARAMETER
TEST CONDITIONS
10H
=
10L
=
-8 mA
II
ICC
Ci
VI
VI
Vi
3.76
5.5 V
4.76
TYP MAX
UNIT
5.5 V
10
V
.<
...•'0',\'\;>.
5.5 V
4.5 V
mA
= VCC or 0
= VCC or 0,
= VCC or 0
4.5 V
4.5 V
= 20 p.A
=8
MIN
5.5 V
VOL
10L
SN74ACT8837
TA - 25°C
MIN TYP MAX
4.5 V
-20 p.A
VOH
10H
VCC
J'
\
';:"
.O(\i)~i
'
0.45
.
V
0.45
5.5 V
±1
p.A
5.5 V
200
p.A
5V
pF
switching characteristics (see Note)
PARAMETER
SN74ACT8837-65
MIN
MAX
Propagation delay from DAIDB/I inputs
tpd1
to Y output
Propagation delay from input register to
tpd2
output buffer
tpd5
output
buff~r
Propagation delay from SELMS/LS to Y output
<$.,F;,:§i'
':;
.
Propagation delay from input register to
td1
output register
Delay time', input register to pipeline register or
td2
pipeline register to output register
ns
118
ns
~.~..\\'l.(.,.'~
70
ns
30
ns
32
ns
95
ns
,....'<-\;.
output buffer
Propagatipn delay from output register to
tpd4
125
,\
Propagation delay from pipeline register to
tpd3
UNIT
65
ns
Note: Switching data must be used with timing diagrams for different operating modes.
5-25
setup and hold times
SN74ACT8837-65
MIN
MAX
PARAMETER
tsu1
Setup time. Instruction before ClK!
18
tsu2
Setup time. data operand before ClK!
18
tsu3
for double-precision operation (input register
UNIT
.,~' ~. ns
£>1).,,,,"Vi ,.
ns
f'\
;O'\) .J
Setup time. data operand before second ClK I
~O
ns
0
ns
not enabled)
th1
Hold time. Instruction input after ClK I
clock requirements
SN74ACT8837-65
PARAMETER
tw
Pulse duration
Clock period
5-26
I ClK high
I ClK low
MIN
15
~t.
~,tc1 1(T"
~Jv
~J.lNIT
ns
ns
SN74ACT8837 FLOATING POINT UNIT
The SN74ACT8837 is a high-speed floating point unit implemented in TI's
advanced 1-µm CMOS technology. The device is fully compatible with IEEE
Standard 754-1985 for addition, subtraction and multiplication operations.
The 'ACT8837 input buses can be configured to operate as two 32-bit data buses
or a single 64-bit bus, providing a number of system interface options. Registers are
provided at the inputs, outputs, and inside the ALU and multiplier to support multilevel
pipelining. These registers can be bypassed for nonpipelined operation.
A clock mode control allows the temporary register to be clocked on the rising edge
or the falling edge of the clock to support double precision operations (except
multiplication) at the same rate as single precision operations. A feedback register with
a separate clock is provided for temporary storage of a multiplier result, ALU result
or constant.
To ensure data integrity, parity checking is performed on input data, and parity is
generated for output data. A master/slave comparator supports fault-tolerant system
design. Two test pin control inputs allow all I/Os and outputs to be forced high, low,
or placed in a high-impedance state to facilitate system testing.
Floating point division using a Newton-Raphson algorithm can be performed in a
sum-of-products operating mode, one of two modes in which the multiplier and ALU operate
in parallel. Absolute value conversions, floating point to integer and integer to floating
point conversions, and a compare instruction are also available.
Data Flow
Data enters the 'ACT8837 through two 32-bit input data buses, DA and DB. The buses
can be configured to operate as a single 64-bit data bus for double precision operations
(see Table 7). Data can be latched in a 64-bit temporary register or loaded directly
[Block diagram. Signal labels recoverable from the figure: SELMS/LS, SELST1-SELST0, instruction register, PY3-PY0, Y31-Y0, MSERR, UNORD, AGTB, AEQB, IVAL, INEX, OVER, UNDER, DENORM, DENIN, RNDCO, SRCEX, CHEX, STEX1-STEX0]
Figure 1. 'ACT8837 Floating Point Unit
A parity check can also be performed on the entire input data word by setting BYTEP
low. In this mode, PA0 is the parity input for DA data and PB0 is the parity input for
DB data.
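The two parity-check granularities can be sketched in Python. This is a minimal illustration, assuming that even parity means the data bits plus the parity bit contain an even number of ones. The function names and the pairing of parity bit n with data byte n are hypothetical and are not taken from the data sheet.

```python
def even_parity_bit(value: int) -> int:
    """Return the parity bit that makes the total count of ones even."""
    return bin(value).count("1") & 1


def word_parity_ok(data: int, parity_bit: int) -> bool:
    """Whole-word check (BYTEP low): one parity bit covers all 32 data bits."""
    return even_parity_bit(data & 0xFFFFFFFF) == (parity_bit & 1)


def byte_parity_ok(data: int, parity_bits: int) -> bool:
    """Per-byte check (BYTEP high): bit n of parity_bits is assumed to cover byte n."""
    for n in range(4):
        byte = (data >> (8 * n)) & 0xFF
        if even_parity_bit(byte) != (parity_bits >> n) & 1:
            return False
    return True
```

A failed check corresponds to the condition that the PERRA or PERRB output reports for the DA or DB bus.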
Temporary Input Register
A temporary input register is provided to enable double precision numbers on a single
32-bit input bus to be loaded in one clock cycle. The contents of the DA bus are loaded
into the upper 32 bits of the temporary register; the contents of DB are loaded into
the lower 32 bits. A clock mode signal (CLKMODE) determines the clock edge on which
the data will be stored in the temporary register. When CLKMODE is low, data is loaded
on the rising edge of the clock; when CLKMODE is high, data is loaded on the falling
edge.
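As a rough illustration of the packing described above, the following Python sketch places DA in the upper half and DB in the lower half of a 64-bit value and reports the clock edge selected by CLKMODE. The function names are hypothetical, and the model ignores all timing.

```python
import struct

MASK32 = 0xFFFFFFFF


def load_temp_register(da: int, db: int) -> int:
    """DA fills the upper 32 bits of the temporary register, DB the lower 32."""
    return ((da & MASK32) << 32) | (db & MASK32)


def temp_load_edge(clkmode: int) -> str:
    """CLKMODE low selects the rising clock edge, CLKMODE high the falling edge."""
    return "falling" if clkmode else "rising"


def as_double(bits64: int) -> float:
    """Interpret a 64-bit pattern as an IEEE double-precision value."""
    return struct.unpack(">d", bits64.to_bytes(8, "big"))[0]
```

For example, `as_double(load_temp_register(0x3FF00000, 0))` evaluates to `1.0`, since `0x3FF0000000000000` is the double-precision encoding of one.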
RA and RB Input Registers
Two 64-bit registers, RA and RB, are provided to hold input data for the multiplier
and ALU. Data is taken from the DA bus, DB bus and the temporary input register,
according to configuration mode controls CONFIG1-CONFIG0 (see Tables 3 and 5).
The registers are loaded on the rising edge of clock CLK. For single-precision operations,
CONFIG1-CONFIG0 should ordinarily be set to 0 1 (see Table 4).
Table 3. Double-Precision Input Data Configuration Modes
LOADING SEQUENCE
DATA LOADED INTO TEMP REGISTER ON FIRST CLOCK AND RA/RB REGISTERS ON SECOND CLOCK†
DATA LOADED INTO RA/RB REGISTERS ON SECOND CLOCK
Table 8. Independent ALU Operations, Single Operand (I9 = 0, I6 = 0)
BIT(S)   FIELD               SETTING
I9       Chained operation   0 = Not chained
I8       Precision RA        0 = A(SP), 1 = A(DP)
I7       Precision RB        0 = B(SP), 1 = B(DP)
I6       Output source       0 = ALU result
I5       Operand type        1 = Single operand
I4       Absolute value A    0 = A, 1 = |A|
I3-I0    ALU operation       See list below

I3-I0    RESULT
0000     Pass A operand
0001     Negate A operand
0010     Integer to floating point conversion†
0011     Floating point to integer conversion
0100     Undefined
0101     Undefined
0110     Floating point to floating point conversion‡
0111     Undefined
1000     Wrap (denormal) input operand
1001     Undefined
1010     Undefined
1011     Undefined
1100     Unwrap exact number
1101     Unwrap inexact number
1110     Unwrap rounded input
1111     Undefined
†The precision of the integer to floating point conversion is set by I8.
‡This converts single precision floating point to double precision floating point and vice versa. If the I8 pin is low to indicate a single-precision input, the result
of the conversion will be double precision. If the I8 pin is high, indicating a double-precision input, the result of the conversion will be single precision.
Table 9. Independent ALU Operations, Two Operands (I9 = 0, I5 = 0)
BIT(S)   FIELD               SETTING
I9       Chained operation   0 = Not chained
I8       Precision RA        0 = A(SP), 1 = A(DP)
I7       Precision RB        0 = B(SP), 1 = B(DP)
I6       Output source       0 = ALU result
I5       Operand type        0 = Two operands
I4       Absolute value A    0 = A, 1 = |A|
I3       Absolute value B    0 = B, 1 = |B|
I2       Absolute value Y    0 = Y, 1 = |Y|
I1-I0    ALU operation       See list below

I1-I0    RESULT
00       A + B
01       A - B
10       Compare A, B
11       B - A
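The field layout of the independent ALU instructions can be illustrated with a small Python decoder. This is a sketch under the assumption that the instruction word is held as a 10-bit integer with I9 as the most significant bit. The function and dictionary names are hypothetical.

```python
ALU_SINGLE_OPERAND = {
    0b0000: "Pass A operand",
    0b0001: "Negate A operand",
    0b0010: "Integer to floating point conversion",
    0b0011: "Floating point to integer conversion",
    0b0110: "Floating point to floating point conversion",
    0b1000: "Wrap (denormal) input operand",
    0b1100: "Unwrap exact number",
    0b1101: "Unwrap inexact number",
    0b1110: "Unwrap rounded input",
}
ALU_TWO_OPERAND = {0b00: "A + B", 0b01: "A - B", 0b10: "Compare A, B", 0b11: "B - A"}


def decode_independent_alu(instr: int) -> dict:
    """Decode I9-I0 for an independent ALU operation (I9 = 0, I6 = 0)."""
    def bit(n: int) -> int:
        return (instr >> n) & 1

    if bit(9) or bit(6):
        raise ValueError("not an independent ALU instruction (needs I9 = 0, I6 = 0)")
    fields = {
        "a_precision": "DP" if bit(8) else "SP",
        "b_precision": "DP" if bit(7) else "SP",
        "abs_a": bool(bit(4)),
    }
    if bit(5):  # I5 = 1: single-operand group, operation in I3-I0
        fields["operation"] = ALU_SINGLE_OPERAND.get(instr & 0xF, "Undefined")
    else:       # I5 = 0: two-operand group, operation in I1-I0
        fields["abs_b"] = bool(bit(3))
        fields["abs_y"] = bool(bit(2))
        fields["operation"] = ALU_TWO_OPERAND[instr & 0x3]
    return fields
```

For example, `decode_independent_alu(0b0100000001)` reports a double-precision A operand and the operation `A - B`.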
Table 10. Independent Multiplier Operations (I9 = 0, I6 = 1)
BIT(S)   FIELD               SETTING
I9       Chained operation   0 = Not chained
I8       Precision RA        0 = A(SP), 1 = A(DP)
I7       Precision RB        0 = B(SP), 1 = B(DP)
I6       Output source       1 = Multiplier result
I5                           0
I4†      Absolute value A    0 = A, 1 = |A|
I3†      Absolute value B    0 = B, 1 = |B|
I2†      Negate result       0 = Y, 1 = -Y
I1       Wrap A              0 = Normal format, 1 = A is a wrapped number
I0       Wrap B              0 = Normal format, 1 = B is a wrapped number
†See Table 15.
Table 11. Independent Multiplier Operations Selected by I4-I2 (I9 = 0, I6 = 1)
I4 (ABSOLUTE VALUE A): 0 = A, 1 = |A|
I3 (ABSOLUTE VALUE B): 0 = B, 1 = |B|
I2 (NEGATE RESULT):    0 = Y, 1 = -Y

I4-I2    RESULT
000      A * B
001      -(A * B)
010      A * |B|
011      -(A * |B|)
100      |A| * B
101      -(|A| * B)
110      |A| * |B|
111      -(|A| * |B|)
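The eight results in Table 11 follow directly from applying the three control bits in order. A short Python sketch of that behavior, using ordinary Python floats in place of the device's operands and a hypothetical function name:

```python
def multiplier_result(a: float, b: float, i4: int, i3: int, i2: int) -> float:
    """Apply I4 (|A|), I3 (|B|) and I2 (negate result) as listed in Table 11."""
    if i4:
        a = abs(a)
    if i3:
        b = abs(b)
    y = a * b
    return -y if i2 else y
```

With A = -2 and B = 3, the setting I4-I2 = 100 gives |A| * B = 6, and I4-I2 = 101 gives -(|A| * B) = -6.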
Table 12. Operations Selected by I8-I7 (I9 = 0, I6 = 1)
I8   PRECISION OF RA INPUT          I7   PRECISION OF RB INPUT          PRECISION OF RESULT
0    Single                         0    Single                         Single
0    Single (converted to double)   1    Double                         Double
1    Double                         0    Single (converted to double)   Double
1    Double                         1    Double                         Double
Master/Slave Comparator
A master/slave comparator is provided to compare data bytes from the Y output
multiplexer and the status outputs with data bytes on the external Y and status ports
when OEY, OES and OEC are high. If the data bytes are not equal, a high signal is
generated on the master/slave error output pin (MSERR).
Status and Exception Generator/Register
A status and exception generator produces several output signals to indicate invalid
operations as well as overflow, underflow, nonnumerical and inexact results, in
conformance with IEEE Standard 754-1985. If output registers are enabled
(PIPES2 = 0), status and exception results are latched in a status register on the rising
edge of the clock. Status results are valid at the same time that associated data results
are valid. Status outputs are enabled by two signals, OEC for comparison status and
OES for other status and exception outputs. Status outputs are summarized in
Tables 14 and 15.
During a compare operation in the ALU, the AEQB output goes high when the A and
B operands are equal. When any operation other than a compare is performed, either
by the ALU or the multiplier, the AEQB signal is used as a zero detect.
Table 13. Chained Multiplier/ALU Operations (I9 = 1)
BIT(S)   FIELD                     SETTING
I9       Chained operation         1 = Chained
I8       Precision RA              0 = A(SP), 1 = A(DP)
I7       Precision RB              0 = B(SP), 1 = B(DP)
I6       Output source             0 = ALU result, 1 = Multiplier result
I5       Add zero                  0 = Normal operation, 1 = Forces B2 input of ALU to zero
I4       Multiply by one           0 = Normal operation, 1 = Forces B1 input of multiplier to one
I3       Negate ALU result         0 = Normal operation, 1 = Negate ALU result
I2       Negate multiplier result  0 = Normal operation, 1 = Negate multiplier result
I1-I0    ALU operations            See list below

I1-I0    RESULT
00       A + B
01       A - B
10       2 - A
11       B - A
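The 2 - A operation in the chained group is the term that appears in a Newton-Raphson reciprocal iteration, x(n+1) = x(n) * (2 - b * x(n)), which is one common way to build division from multiplies and subtractions. The Python sketch below shows only the arithmetic. The seed value, the iteration count and the function names are assumptions, and the actual instruction sequence used on the device is not described in this section.

```python
def reciprocal(b: float, seed: float, iterations: int = 5) -> float:
    """Approximate 1/b by Newton-Raphson, starting from a rough seed."""
    x = seed
    for _ in range(iterations):
        t = b * x          # multiplier step: b * x
        x = x * (2.0 - t)  # ALU step (2 - A), followed by another multiply
    return x


def divide(a: float, b: float, seed: float, iterations: int = 5) -> float:
    """Compute a / b as a * (1 / b)."""
    return a * reciprocal(b, seed, iterations)
```

The relative error is squared on every pass, so a seed within about 10 percent of 1/b converges to double-precision accuracy in a handful of iterations.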
Table 14. Comparison Status Outputs
SIGNAL   RESULT OF COMPARISON (ACTIVE HIGH)
AEQB     The A and B operands are equal. (A high signal on the AEQB output indicates a
         zero result from the selected source except during a compare operation in the
         ALU.)
AGTB     The A operand is greater than the B operand. (Only during a compare operation
         in the ALU)
UNORD    The two inputs of a comparison operation are unordered, i.e., one or both of
         the inputs is a NaN.
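The three comparison outputs can be mimicked in Python, whose float comparisons already follow IEEE 754 rules for NaN. This is an illustrative sketch with a hypothetical function name. That AEQB and AGTB stay low for unordered inputs is an assumption drawn from IEEE comparison semantics, not a statement from the table.

```python
import math


def compare_status(a: float, b: float) -> dict:
    """Model AEQB, AGTB and UNORD for a compare of A against B."""
    return {
        "AEQB": a == b,   # False whenever either operand is a NaN
        "AGTB": a > b,    # False whenever either operand is a NaN
        "UNORD": math.isnan(a) or math.isnan(b),
    }
```

When all three outputs are low for ordered operands, A is less than B.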
Table 15. Status Outputs
SIGNAL   STATUS RESULT
CHEX     If I6 is low, indicates the multiplier is the source of an exception during a
         chained function. If I6 is high, indicates the ALU is the source of an exception
         during a chained function.
DENIN    Input to the multiplier is a denorm. When DENIN goes high, the STEX pins
         indicate which port had a denormal input.
DENORM   The multiplier output is a wrapped number or the ALU output is a denorm. In
         the FAST mode, this condition causes the result to go to zero.
INEX     The result of an operation is not exact.
IVAL     A NaN has been input to the multiplier or the ALU, or an invalid operation
         (0 * ∞ or ±∞ ∓ ∞) has been requested. When IVAL goes high, the STEX
         pins indicate which port had a NaN.
OVER     The result is greater than the largest allowable value for the specified format.
RNDCO    The mantissa of a wrapped number has been increased in magnitude by
         rounding and the unwrap round instruction can be used to unwrap properly
         the wrapped number (see Table 8).
SRCEX    The status was generated by the multiplier. (When SRCEX is low, the status
         was generated by the ALU.)
STEX0    A NaN or a denorm has been input on the B port.
STEX1    A NaN or a denorm has been input on the A port.
UNDER    The result is inexact and less than the minimum allowable value for the
         specified format. In the FAST mode, this condition causes the result to go to
         zero.
In chained mode, status results to be output are selected based on the state of the
I6 (source output) pin (if I6 is low, ALU status will be selected; if I6 is high, multiplier
status will be selected). If the nonselected output source generates an exception, CHEX
is set high. Status of the nonselected output source can be forced using the SELST
pins, as shown in Table 16.
Table 16. Status Output Selection (Chain Mode)
SELST1-SELST0   STATUS SELECTED
0 0             Invalid
0 1             Selects multiplier status
1 0             Selects ALU status
1 1             Normal operation (selection based on result source specified by I6 input)
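The selection rule for chained-mode status can be written as a small Python function. This is a sketch only. The function name and the use of a two-bit integer for SELST1-SELST0 are illustrative choices.

```python
def selected_status_source(selst: int, i6: int) -> str:
    """Return which unit's status is output, given SELST1-SELST0 and I6."""
    if selst == 0b00:
        raise ValueError("SELST1-SELST0 = 00 is invalid")
    if selst == 0b01:
        return "multiplier"
    if selst == 0b10:
        return "ALU"
    # SELST1-SELST0 = 11: normal operation, follow the output source bit I6
    return "multiplier" if i6 else "ALU"
```

With SELST1-SELST0 = 11 and I6 low, the ALU status is output and CHEX flags any exception raised by the multiplier.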
Flowthrough Mode
To enable the device to operate in pipelined or flowthrough modes, registers can be
bypassed using pipeline control signals PIPES2-PIPES0 (see Table 17).
Table 17. Pipeline Controls (PIPES2-PIPES0)
PIPES2-PIPES0   REGISTER OPERATION SELECTED
X X 0           Enables input registers (RA, RB)
X X 1           Disables input registers (RA, RB)
X 0 X           Enables pipeline registers
X 1 X           Disables pipeline registers
0 X X           Enables output registers (P, S, Status)
1 X X           Disables output registers (P, S, Status)
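Each PIPES bit controls one register level independently, which a short Python sketch makes explicit. The function name and dictionary keys are hypothetical. The three-bit argument is ordered PIPES2, PIPES1, PIPES0 from most to least significant bit.

```python
def enabled_registers(pipes: int) -> dict:
    """Map PIPES2-PIPES0 to the register levels that are enabled (bit low = enabled)."""
    return {
        "input (RA, RB)": not (pipes & 0b001),
        "pipeline": not (pipes & 0b010),
        "output (P, S, status)": not (pipes & 0b100),
    }
```

`enabled_registers(0b000)` describes the fully pipelined configuration and `enabled_registers(0b111)` describes full flowthrough.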
FAST and IEEE Modes
The device can be programmed to operate in FAST mode by asserting the FAST pin.
In the FAST mode, all denormalized inputs and outputs are forced to zero.
Placing a zero on the FAST pin causes the chip to operate in IEEE mode. In this mode,
the ALU can operate on denormalized inputs and return denormals. If a denorm is input
to the multiplier, the DENIN flag will be asserted, and the result will be invalid. If the
multiplier result underflows, a wrapped number will be output.
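The effect of FAST mode on a single-precision value can be approximated in Python by inspecting the IEEE bit pattern. This is a sketch under stated assumptions: the function names are hypothetical, and returning positive zero regardless of the input sign is a simplification, since this section does not state the sign of the flushed result.

```python
import struct


def is_denormal_sp(x: float) -> bool:
    """True if x, stored as IEEE single precision, has a zero exponent and nonzero fraction."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return exponent == 0 and fraction != 0


def fast_mode_flush(x: float) -> float:
    """Model sudden underflow: denormalized values are replaced by zero."""
    return 0.0 if is_denormal_sp(x) else x
```

In IEEE mode the same value would instead be kept as a denormal (gradual underflow).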
Rounding Mode
The 'ACT8837 supports the four IEEE standard rounding modes: round to nearest,
round towards zero (truncate), round towards infinity (round up), and round towards
minus infinity (round down). The rounding function is selected by control pins RND1
and RND0, as shown in Table 18.
Table 18. Rounding Modes
RND1-RND0   ROUNDING MODE SELECTED
0 0         Round towards nearest
0 1         Round towards zero (truncate)
1 0         Round towards infinity (round up)
1 1         Round towards negative infinity (round down)
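The difference between the four modes is easiest to see on a value that lies exactly between two representable results. The Python sketch below uses the standard `decimal` module to round to an integer rather than to a binary mantissa, so it illustrates the rounding directions only. The assignment of RND1-RND0 codes to modes is an assumption that follows the order in which the modes are listed in the text, and IEEE round to nearest is taken to resolve ties to even.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR, ROUND_HALF_EVEN

# Assumed code assignment: 00 nearest, 01 zero, 10 +infinity, 11 -infinity.
ROUNDING = {
    0b00: ROUND_HALF_EVEN,  # round towards nearest (ties to even)
    0b01: ROUND_DOWN,       # round towards zero (truncate)
    0b10: ROUND_CEILING,    # round towards infinity (round up)
    0b11: ROUND_FLOOR,      # round towards negative infinity (round down)
}


def round_to_integer(value: str, rnd: int) -> int:
    """Round a decimal string to an integer using the mode selected by rnd."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUNDING[rnd]))
```

For the input -2.5, modes 00, 01 and 10 all give -2, while mode 11 gives -3.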
Test Pins
Two pins, TP1-TP0, support system testing. These may be used, for example, to place
all outputs in a high-impedance state, isolating the chip from the rest of the system
(see Table 19).
Table 19. Test Pin Control Inputs
TP1-TP0   OPERATION
0 0       All outputs and I/Os are forced low
0 1       All outputs and I/Os are forced high
1 0       All outputs are placed in a high impedance state
1 1       Normal operation
Summary of Control Inputs
Control input signals for the 'ACT8837 are summarized in Table 20.
Table 20. Control Inputs
BYTEP
  HIGH: Selects byte parity generation and test
  LOW:  Selects single bit parity generation and test
CLK
  HIGH: Clocks all registers except C
  LOW:  No effect
CLKC
  HIGH: Clocks C register
  LOW:  No effect
CLKMODE
  HIGH: Enables temporary input register load on falling clock edge
  LOW:  Enables temporary input register load on rising clock edge
CONFIG1-CONFIG0
  HIGH: See Table 3 (RA and RB register data source selects)
  LOW:  See Table 3 (RA and RB register data source selects)
ENRA
  HIGH: If register is not in flowthrough, enables clocking of RA register
  LOW:  If register is not in flowthrough, holds contents of RA register
ENRB
  HIGH: If register is not in flowthrough, enables clocking of RB register
  LOW:  If register is not in flowthrough, holds contents of RB register
FAST
  HIGH: Places device in FAST mode
  LOW:  Places device in IEEE mode
HALT
  HIGH: No effect
  LOW:  Stalls device operation but does not affect registers, internal states, or status
OEC
  HIGH: Disables compare pins
  LOW:  Enables compare pins
OES
  HIGH: Disables status outputs
  LOW:  Enables status outputs
OEY
  HIGH: Disables Y bus
  LOW:  Enables Y bus
PIPES2-PIPES0
  HIGH: See Table 17 (pipeline mode control)
  LOW:  See Table 17 (pipeline mode control)
RESET
  HIGH: No effect
  LOW:  Clears internal states and status but does not affect data registers
RND1-RND0
  HIGH: See Table 18 (rounding mode control)
  LOW:  See Table 18 (rounding mode control)
SELOP7-SELOP0
  HIGH: See Tables 6 and 7 (multiplier/ALU operand selection)
  LOW:  See Tables 6 and 7 (multiplier/ALU operand selection)
SELMS/LS
  HIGH: Selects MSH of 64-bit result for output on the Y bus
  LOW:  Selects LSH of 64-bit result for output on the Y bus (no effect during single precision operation)
SELST1-SELST0
  HIGH: See Table 16 (status output selection)
  LOW:  See Table 16 (status output selection)
SRCC
  HIGH: Selects multiplier result for input to C register
  LOW:  Selects ALU result for input to C register
TP1-TP0
  HIGH: See Table 19 (test pin control inputs)
  LOW:  See Table 19 (test pin control inputs)
INSTRUCTION SET
Configuration and operation of the 'ACT8837 can be selected to perform single- or
double-precision floating-point calculations in operating modes ranging from
flowthrough to fully pipelined. Timing and sequences of operations are affected by
settings of clock mode, data and status registers, input data configurations, and
rounding mode, as well as the instruction inputs controlling the ALU and the multiplier.
The ALU and the multiplier of the 'ACT8837 can operate either independently or
simultaneously, depending on the setting of instruction inputs I9-I0 and related controls.
Controls for data flow and status results are discussed separately, prior to the
discussions of ALU and multiplier operations. Then, in Tables 22 through 25, the
instruction inputs to the ALU and the multiplier are summarized according to operating
mode, whether independent or chained (ALU and multiplier in simultaneous operation).
Loading External Data Operands
Patterns of data input to the 'ACT8837 vary depending on the precision of the operands
and whether they are being input as A or B operands. Loading of external data operands
is controlled by the settings of CLKMODE and CONFIG1-CONFIG0, which determine
the clock timing and register destinations for data inputs.
Configuration Controls (CONFIG1-CONFIG0)
Three input registers are provided to handle input of data operands, either single
precision or double precision. The RA, RB, and temporary registers are each 64 bits
wide. The temporary register is only used during input of double-precision operands.
When single-precision or integer operands are loaded, the ordinary setting of
CONFIG1-CONFIG0 is LH, as shown in Table 4. This setting loads each 32-bit operand in the
most significant half (MSH) of its respective register. The operands are loaded into
the MSHs and adjusted to double precision because the data paths internal to the device
are all double precision. It is also possible to load single-precision operands with
CONFIG1-CONFIG0 set to HH, but two clock edges are required to load both the A
and B operands on the DA bus.
Double-precision operands are loaded by using the temporary register to store half
of the operands prior to inputting the other half of the operands on the DA and DB
buses. As shown in Tables 3 and 5, four configuration modes for selecting input sources
are available for loading data operands into the RA and RB registers.
CLKMODE Settings
Timing of double-precision data inputs is determined by the clock mode setting, which
allows the temporary register to be loaded on either the rising edge (CLKMODE = L)
or the falling edge of the clock (CLKMODE = H). Since the temporary register is not
used when single-precision operands are input, clock modes 0 and 1 are functionally
equivalent for single-precision operations.
The setting of CLKMODE can be used to speed up the loading of double-precision
operands. When the CLKMODE input is set high, data on the DA and DB buses are
loaded on the falling edge of the clock into the MSH and LSH, respectively, of the
temporary register. On the next rising edge, contents of the DA bus, DB bus, and
temporary register are loaded into the RA and RB registers, and execution of the current
instruction begins. The setting of CONFIG1-CONFIG0 determines the exact pattern
in which operands are loaded, whether as MSH or LSH in RA or RB.
Double-precision operation in clock mode 0 is similar except that the temporary register
loads only on a rising edge. For this reason the RA and RB registers do not load until
the next rising edge, when all operands are available and execution can begin.
A considerable advantage in speed can be realized by performing double-precision ALU
operations with CLKMODE set high. In this clock mode both double-precision operands
can be loaded on successive clock edges, one falling and one rising, and the ALU
operation can be executed in the time from one rising edge of the clock to the next
rising edge. Both halves of a double-precision ALU result must be read out on the Y
bus within one clock cycle when the 'ACT8837 is operated in clock mode 1.
Internal Register Operations
Six data registers in the 'ACT8837 are arranged in three levels along the data paths
through the multiplier and the ALU. Each level of registers can be enabled or disabled
independently of the other two levels by setting the appropriate PIPES2-PIPES0 inputs.
The RA and RB registers receive data inputs from the temporary register and the DA
and DB buses. Data operands are then multiplexed into the multiplier, ALU, or both.
To support simultaneous pipelined operations, the data paths through the multiplier
and the ALU are both provided with pipeline registers and output registers. The control
settings for the pipeline and output registers (PIPES2-PIPES1) are registered with the
instruction inputs I9-I0.
A seventh register, the constant (C) register, is available for storing a 64-bit constant
or an intermediate result from the multiplier or the ALU. The C register has a separate
clock input (CLKC) and input source select (SRCC). The SRCC input is not registered
with the instruction inputs. Depending on the operation selected and the settings of
PIPES2-PIPES0, an offset of one or more cycles may be necessary to load the desired
result into the C register.
Status results are also registered whenever the output registers are enabled. Duration
and availability of status results are affected by the same timing constraints that apply
to data results on the Y output bus.
Data Register Controls (PIPES2-PIPESO)
Table 17 shows the settings of the registers controlled by PIPES2-PIPES0. Operating
modes range from fully pipelined (PIPES2-PIPES0 = LLL) to flowthrough
(PIPES2-PIPES0 = HHH).
In flowthrough mode all three levels of registers are disabled, a circumstance which
may affect some double-precision operations. Since double-precision operands require
two steps to input, at least half of the data must be clocked into the temporary register
before the remaining data is placed on the DA and DB buses.
When all registers (except the C register) are enabled, timing constraints can become
critical for many double-precision operations. In clock mode 1, the ALU can perform
a double-precision operation and output a result during every clock cycle, and both
halves of the result must be read out before the end of the next cycle. Status outputs
are valid only for the period during which the Y output data is valid.
Similarly, double-precision multiplication is affected by pipelining, clock mode, and
sequence of operations. A double-precision multiply requires two cycles to execute,
depending on the settings of PIPES2-PIPES0. The output may be valid for one or two
cycles, depending on the precision of the next operation.
Duration of valid outputs at the Y multiplexer depends on settings of PIPES2-PIPES0
and CLKMODE, as well as whether all operations and operands are of the same type.
For example, when a double-precision multiply is followed by a single-precision
operation, one open clock cycle must intervene between the dissimilar operations.
C Register Controls (SRCC, CLKC)
The C register loads from the P or the S register output, depending on the setting of
SRCC, the load source select. SRCC = H selects the multiplier as input source.
Otherwise the ALU is selected when SRCC = L. In either case the C register only loads
the selected input on a rising edge of the CLKC signal.
The C register does not load directly from an external data bus. One method for loading
a constant without wasting a cycle is to input the value as an A operand during an
operation which uses only the ALU or multiplier and requires no external data inputs.
Since the B operand can be forced to zero in the ALU or to one in the multiplier, the
A operand can be passed to the C register either by adding zero or multiplying by one,
then selecting the input source with SRCC and causing the CLKC signal to go high.
Otherwise, the C register can be loaded through the ALU with the Pass A Operand
instruction, which requires a separate cycle.
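The load behavior described above can be modeled in a few lines. This is a behavioral sketch under stated assumptions, not a description of the device internals: the class CRegister and the helper load_constant_via_alu are hypothetical names, floating-point values stand in for 64-bit register contents, and pipeline delays are ignored.

```python
class CRegister:
    """Behavioral model: loads P (SRCC = 1) or S (SRCC = 0) on a rising CLKC edge."""

    def __init__(self):
        self.value = None
        self._clkc = 0  # previous CLKC level, used to detect a rising edge

    def clock(self, clkc, srcc, p_reg, s_reg):
        rising = self._clkc == 0 and clkc == 1
        self._clkc = clkc
        if rising:
            # SRCC high selects the multiplier result, low selects the ALU result
            self.value = p_reg if srcc == 1 else s_reg
        return self.value


def load_constant_via_alu(c_reg, constant):
    """Pass the A operand through the ALU by adding zero, then clock it into C."""
    s_reg = constant + 0.0                # B operand forced to zero in the ALU
    c_reg.clock(0, 0, None, s_reg)        # CLKC low
    return c_reg.clock(1, 0, None, s_reg) # rising edge with SRCC = L (ALU source)


if __name__ == "__main__":
    c = CRegister()
    print(load_constant_via_alu(c, 3.5))
```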
Operand Selection (SELOP7-SELOPO)
As shown in Tables 6 and 7, data operands can be selected from five possible sources,
including external inputs from the RA and RB registers, feedback from the P and S
registers, and a stored value in the C register. Contents of the C register may be selected
as either the A or the B operand in the ALU, the multiplier, or both. When an external
input is selected, the RA input always becomes the A operand, and the RB input is
the B operand.
Feedback from the ALU can be selected as the A operand to the multiplier or as the
B operand to the ALU. Similarly, multiplier feedback may be used as the A operand
to the ALU or the B operand to the multiplier.
Selection of operands also interacts with the selected operations in the ALU or the
multiplier. ALU operations with one operand are performed only on the A operand.
Also, depending on the instruction selected, the B operand may optionally be forced
to zero in the ALU or to one in the multiplier.
Rounding Controls (RND1-RND0)
Because floating point operations may involve both inherent and procedural errors,
it is important to select appropriate modes for handling rounding errors. To support
the IEEE standard for binary floating-point arithmetic, the 'ACT8837 provides four
rounding modes selected by RND1-RND0.
Table 18 shows the four selectable rounding modes. The usual default rounding mode
is round to nearest (RND1-RND0 = LL). In round-to-nearest mode, the 'ACT8837
supports the IEEE standard by rounding to even (LSB = 0) when two nearest
representable values are equally near. Directed rounding toward zero, infinity, or minus
infinity is also available.
Rounding mode should be selected to minimize procedural errors which may otherwise
accumulate and affect the accuracy of results. Rounding to nearest introduces a
procedural error not exceeding half of the least significant bit for each rounding
operation. Since rounding to nearest may involve rounding either upward or downward
in successive steps, rounding errors tend to cancel each other.
In contrast, directed rounding modes may introduce errors approaching one bit for
each rounding operation. Since successive rounding operations in a procedure may
all be similarly directed, each introducing up to a one-bit error, rounding errors may
accumulate rapidly, especially in single-precision operations.
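The difference between the modes can be illustrated with a short sketch using Python's decimal module. This is only an analogy: it rounds decimal values to whole units rather than binary significands to the LSB, the names MODES, round_to_unit, and accumulated_error are hypothetical, and no mapping to the RND1-RND0 codes of Table 18 is implied.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR

# The four rounding behaviors named in the text, expressed as decimal modes.
MODES = {
    "nearest": ROUND_HALF_EVEN,      # ties go to the even neighbor
    "toward zero": ROUND_DOWN,
    "toward +infinity": ROUND_CEILING,
    "toward -infinity": ROUND_FLOOR,
}


def round_to_unit(value: str, mode: str) -> Decimal:
    """Round a decimal string to a whole unit using the named mode."""
    return Decimal(value).quantize(Decimal("1"), rounding=MODES[mode])


def accumulated_error(values, mode: str) -> Decimal:
    """Sum of (rounded - exact) over a sequence of values."""
    return sum(round_to_unit(v, mode) - Decimal(v) for v in values)


if __name__ == "__main__":
    data = ["0.3", "0.7", "1.2", "1.8"]
    for mode in MODES:
        print(mode, accumulated_error(data, mode))
```

For this sample data the round-to-nearest errors cancel, while rounding toward zero accumulates an error of two full units over four operations.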
Status Exceptions
Status exceptions can result from one or more error conditions such as overflow,
underflow, operands in illegal formats, invalid operations, or rounding. Exceptions may
be grouped into two classes: input exceptions resulting from invalid operations or
denormal inputs to the multiplier, and output exceptions resulting from illegal formats,
rounding errors, or both.
To simplify the discussion of exception handling, it is useful to summarize the data
formats for representing IEEE floating-point numbers which can be input to or output
from the FPU (see Table 21). Since procedures for handling exceptions vary according
to the requirements of specific applications, this discussion focuses on the conditions
which cause particular status exceptions to be signalled by the FPU.
Figure 2. Single-Precision Operation, All Registers Disabled
(PIPES = 111, CLKMODE = 0)
The second example shows a microinstruction causing the ALU to compare absolute
values of A and B. Only the input registers are enabled (PIPES2-PIPES0 = 110) so
the result is output in one clock cycle.
CLKMODE = 0    PIPES = 110    Operation: Compare |A|, |B|

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
00 0001 1010     0        01        110     xxxx 1111    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     1     1     0       1       0    0    0     x       11       1     1     11

Figure 3. Single-Precision Operation, Input Registers Enabled
(PIPES = 110, CLKMODE = 0)
Input and output registers are enabled in the third example, which shows the subtraction
B - A. Two clock cycles are required to load the operands, execute the subtraction,
and output the result (see Figure 4).
CLKMODE = 0    PIPES = 010    Operation: Subtract B - A

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
00 0000 0011     0        01        010     xxxx 1111    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     0     0     0       1       0    0    0     x       11       1     1     11
Double-Precision ALU Operations with CLKMODE = 0
The first example shows that, even in flowthrough mode, a clock signal is needed
to load the temporary register with half the data operands (see Figure 6). The selected
operation is executed without a clock after the remaining half of the data operands
are input on the RA and RB buses:
CLKMODE = 0    PIPES = 111    Operation: Add A + |B|

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
01 1000 1000     0        11        111     xxxx 1111    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     1     1     0       x       0    0    0     x       11       1     1     11

Figure 6. Double-Precision ALU Operation, All Registers Disabled
(PIPES = 111, CLKMODE = 0)
In the second example the input register is enabled (PIPES2-PIPES0 = 110). Operands
A and B for the instruction, |B| - |A|, are loaded using CONFIG = 00 so that B is
loaded first into the temporary register with MSH through the DA port and LSH through
the DB port. On the second clock rising edge, the A operand is loaded in the same
order directly to the RA register while B is loaded from the temporary register to the RB
register (see Figure 7).
CLKMODE = 0    PIPES = 110    Operation: |B| - |A|

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
01 1001 1011     0        00        110     xxxx 1111    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     1     1     0       x       0    0    0     x       11       1     1     11

Figure 7. Double-Precision ALU Operation, Input Registers Enabled
(PIPES = 110, CLKMODE = 0)
Both the input and output registers are enabled (PIPES2-PIPES0 = 010) in the third
example. The instruction sets up the ALU to wrap a denormalized number on the DA
input bus. The wrapped output can be fed back from the S register to the multiplier
input multiplexer by a later microinstruction. Timing for this operation is shown in
Figure 8.
CLKMODE = 0    PIPES = 010    Operation: Wrap Denormal Input

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
01 1010 1000     0        01        010     xxxx 11xx    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     1     1     0       x       0    0    0     x       11       1     1     11

Figure 8. Double-Precision ALU Operation, Input and Output Registers Enabled
(PIPES = 010, CLKMODE = 0)
In the fourth example with CLKMODE = L, all three levels of internal registers are
enabled. The instruction converts a double-precision integer operand to a
double-precision floating-point operand. Figure 9 shows the timing for this operating mode.
CLKMODE = 0    PIPES = 000    Operation: Convert Integer to Floating Point

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
01 1010 0010     0        11        000     xxxx 1100    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     1     1     0       x       0    0    0     x       11       1     1     11

Figure 9. Double-Precision ALU Operation, All Registers Enabled
(PIPES = 000, CLKMODE = 0)
Double-Precision ALU Operations with CLKMODE = 1
The next four examples are similar to the first four except that CLKMODE = H so that
the temporary register loads on the falling edge of the clock. When the ALU is operating
independently, setting CLKMODE high enables loading of both double-precision
operands on successive falling and rising clock edges.
In this clock mode a double-precision ALU operation requires one clock cycle to load
data inputs and execute, and both halves of the 64-bit result must be read out on
the 32-bit Y bus within one clock cycle. The settings of PIPES2-PIPES0 determine
the number of clock cycles which elapse between data input and result output.
In the first example all registers are disabled (PIPES2-PIPES0 = 111), and the addition
is performed in flowthrough mode. As shown in Figure 10, a falling clock edge is needed
to load half of the operands into the temporary register prior to loading the RA and
RB registers on the next rising clock.
CLKMODE = 1    PIPES = 111    Operation: Add A + |B|

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
01 1000 1000     1        11        111     xxxx 1111    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     1     1     0       x       0    0    0     x       xx       1     1     11

Figure 10. Double-Precision ALU Operation, All Registers Disabled
(PIPES = 111, CLKMODE = 1)
The second example executes subtraction of absolute values for both operands. Only
the RA and RB registers are enabled (PIPES2-PIPES0 = 110). Timing is shown in
Figure 11.
CLKMODE = 1    PIPES = 110    Operation: Subtract |B| - |A|

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
01 1001 1011     1        11        110     xxxx 1111    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     1     1     0       x       0    0    0     x       xx       1     1     11

Figure 11. Double-Precision ALU Operation, Input Registers Enabled
(PIPES = 110, CLKMODE = 1)
The third example shows a single denormalized operand being wrapped so that it can
be input to the multiplier. Both input and output registers are enabled
(PIPES2-PIPES0 = 010). Timing is shown in Figure 12.
CLKMODE = 1    PIPES = 010    Operation: Wrap Denormal Input

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
01 1010 1000     1        11        010     xxxx 11xx    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     1     0     0       x       0    0    0     x       xx       1     1     11

Figure 12. Double-Precision ALU Operation, Input and Output Registers Enabled
(PIPES = 010, CLKMODE = 1)
The fourth example shows a conversion from integer to floating point format. All three
levels of data registers are enabled (PIPES2-PIPES0 = 000) so that the FPU is fully pipelined
in this mode (see Figure 13).
CLKMODE = 1    PIPES = 000    Operation: Convert Integer to Floating Point

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
01 1010 0010     1        11        000     xxxx 1100    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     1     1     x       x       0    0    0     x       xx       1     1     11

Figure 13. Double-Precision ALU Operation, All Registers Enabled
(PIPES = 000, CLKMODE = 1)
Double-Precision Multiplier Operations
Independent multiplier operations may also be performed in either clock mode and with
various registers enabled. As before, examples for the two clock modes are treated
separately. A double-precision multiply operation requires two clock cycles to execute
(except in flowthrough mode) and from one to three other clock cycles to load the
temporary register and to output the results, depending on the setting of
PIPES2-PIPES0.
Even in flowthrough mode (PIPES2-PIPES0 = 111) two clock edges are required, the
first to load half of the operands in the temporary register and the second to load the
intermediate product in the multiplier pipeline register. Depending on the setting of
CLKMODE, loading the temporary register may be done on either a rising or a falling
edge.
Double-Precision Multiplication with CLKMODE = 0
In this first example, the A operand is multiplied by the absolute value of B operand.
Timing for the operation is shown in Figure 14:
CLKMODE = 0    PIPES = 111    Operation: Multiply A * |B|

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
01 1100 1000     0        11        111     1111 xxxx    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     x     x     x       x       0    0    0     x       xx       1     1     11

Figure 14. Double-Precision Multiplier Operation, All Registers Disabled
(PIPES = 111, CLKMODE = 0)
The second example assumes that the RA and RB input registers are enabled. With
CLKMODE = 0 one clock cycle is required to input both the double-precision operands.
The multiplier is set up to calculate the negative product of the |A| and B operands:
CLKMODE = 0    PIPES = 110    Operation: Multiply -(|A| * B)

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
01 1101 0100     0        11        110     1111 xxxx    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     1     1     x       x       0    0    0     x       xx       1     1     11

Figure 15. Double-Precision Multiplier Operation, Input Registers Enabled
(PIPES = 110, CLKMODE = 0)
Enabling both input and output registers in the third example adds an additional delay
of one clock cycle, as can be seen from Figure 16. The sample instruction sets up
calculation of the product of |A| and |B|:
CLKMODE = 0    PIPES = 010    Operation: Multiply |A| * |B|

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
01 1101 1000     0        10        010     1111 xxxx    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     1     1     x       x       0    0    0     x       xx       1     1     11

Figure 16. Double-Precision Multiplier Operation, Input and Output Registers Enabled
(PIPES = 010, CLKMODE = 0)
With all registers enabled, the fourth example shows a microinstruction to calculate
the negated product of operands A and B:
CLKMODE = 0    PIPES = 000    Operation: Multiply -(A * B)

I9-I0         CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
01 1100 0100     0        01        000     1111 xxxx    00

FAST  ENRA  ENRB  SRCC  SELMS/LS  OEY  OEC  OES  BYTEP  SELST1-0  RESET  HALT  TP1-0
 0     1     1     x       x       0    0    0     x       xx       1     1     11
MULTIPLIER/ALU OPERATIONS: PSEUDOCODE
A -> RA, B -> RB
C -> RA, D -> RB
A * B -> P(AB)
P(AB) + 0 -> S(AB)
E -> RA, F -> RB
C * D -> P(CD)
S(AB) + P(CD) -> S(AB + CD)
G -> RA, H -> RB
E * F -> P(EF)
S(AB + CD) + P(EF) -> S(AB + CD + EF)
G * H -> P(GH)
S(AB + CD + EF) + P(GH) -> S(AB + CD + EF + GH)
A microcode sequence to generate this sum of products is shown in Table 28. Only
three instructions in chained mode are required, since the multiplier begins the
calculation independently and the ALU completes it independently.
Table 28. Sample Microinstructions for Single-Precision Sum of Products

I9-I0       CLKMODE  CONFIG1-0  PIPES2-0  SELOP7-0   RND1-0
0001000000     0        01        010     1111 xxxx    00
1001100000     0        01        010     1111 xxxx    00
1000000000     0        01        010     1111 1010    00
1000000000     0        01        010     xxxx 1010    00
0000000000     0        01        010     xxxx 1010    00

(Settings for the remaining control inputs, FAST through TP1-0, are not legible in this copy.)
Fully Pipelined Double-Precision Operations
Performing fully pipelined double-precision operations requires a detailed understanding
of timing constraints imposed by the multiplier. In particular, sum of products and
product of sums operations can be executed very quickly, mostly in chained mode,
assuming that timing relationships between the ALU and the multiplier are coded
properly.
Pseudocode tables for these sequences are provided (Table 29 and Table 30), showing
how data and instructions are input in relation to the system clock. The overall patterns
of calculations for an extended sum of products and an extended product of sums
are presented. These examples assume FPU operation in CLKMODE 0, with the CONFIG
setting HL to load operands by MSH and LSH, all registers enabled
(PIPES2-PIPES0 = LLL), and the C register clock tied to the system clock.
In the sum of products timing table, the two initial products are generated in
independent multiplier mode. Several timing relationships should be noted in the table.
The first chained instruction loads and begins to execute following the sixth rising
edge of the clock, after the first product P1 has already been held in the P register
for one clock. For this reason, P1 is loaded into the C register so that P1 will be stable
for two clocks.
On the seventh clock, the ALU pipeline register loads with an unwanted sum, P1 + P1.
However, because the ALU timing is constrained by the multiplier, the S register will
not load until the rising edge of CLK9, when the ALU pipe contains the desired sum,
P1 + P2. The remaining sequence of chained operations then executes in the desired
manner.
Table 29. Pseudocode for Fully Pipelined Double-Precision Sum of Products
(CLKM = 0, CONFIG = 10, PIPES = 000, CLKC-SYSCLK)
ClK
I1
I2
I3
I4
I5
I6
I7
IS
I9
IlO
I11
S12
DA
DB
TEMP
INS
INS
RA
RB
MUl
P
C
ALU
BUS
BUS
REG
BUS
REG
REG
REG
PIPE
REG
REG
PIPE
A1 *B1
A1
B1
A2 MSH B2 MSH A2.B2MSH A2*B2 A1 *B1
A1
B1
A1 *B1
A2 LSH
A2
B2
A1 *B1
A2
B2
A2*B2
P1
A3
B3
A2*B2
P1
P1
A3
B3
A3*B3
P2
P1
P1 +P1
A4
B4
A3*B3
P2
P1
P1 +P1
A4
B4
A4*B4
P3
P2
S1 +P2
S1
A5
B5
A4*B4
P3
P3
S1 +P3
S1
A5
B5
A5*B5
P4
P3
XXXXX
S2
S
REG BUS
A1 MSH B1 MSH A1.B1MSH A1 *B1
A1 LSH
B1 LSH A1.B1MSH A1 *B1
B2 LSH A2.B2MSH A2*B2 A2*B2
PR+CR
A3 MSH B3 MSH A3.B3MSH
A3 LSH
B3 LSH A3.B3MSH
A4 MSH B4 MSH A4.B4MSH
A4 LSH
B4 LSH A4.B4MSH
A5 MSH B5 MSH A5.B5MSH
A5 LSH
B5.LSH A5.B5MSH
A6 MSH B6 MSH A6.B6(M)
---------- ...
----
A3*B3
A2*B2
PR+CR PR+CR.
A3*B3 A3*B3
PR+SR PR+SR.
A4*B4 A3*B3
PR+SR PR+SR.
A4*B4 A4*B4
PR+SR PR+SR.
A5*B5 A4*B4
PR+SR PR+SR.
A5*B5 A5*B5
PR+SR PR+SR.
A6*B6 A5*B5
-
- _.. _-
---
-
~
--
y
Table 30. Pseudocode for Fully Pipelined Double-Precision Product of Sums
(CLKM = 0, CONFIG = 10, PIPES = 000, CLKC-SYSCLK)
CLK
Sl
S2
S3
S4
S5
DA
DB
TEMP
INS
INS
RA
RB
MUL
P
C
ALU
BUS
BUS
REG
BUS
REG
REG
REG
PIPE
REG
REG
PIPE
Al(M)
Bl(M)
Al,Bl(M)
Al +Bl
Al(L)
Bl (L)
Al,Bl(M)
Al +Bl
Al +Bl
Al
Bl
A2(M)
B2(M)
A2,B2(M)
A2+B2 Al +Bl
Al
Bl
Al +Bl
A2(L)
B2(L)
A2,B2(M)
A2+B2 A2+B2
A2
B2
Al +Bl
51
A2+B2
A2
B2
51
A2+B2
51
CR*SR CR*SR
A3+B3 A3+B3
A3
B3
51
A2+B2
52
A3
B3
51 *52
51
A3+B3
52
PR*SR CR*SR ENRA=L ENRB=L
51 *52
A4+B4 A3+B3
A3
B3
51
A3+B3 XXX
A3(M)
B3(M)
A3,B3(M)
Sa
A3(L)
B3(L)
A3,B3(M)
S7
XXX
XXX
XXX
sa
A4(M)
B4(M)
A4,B4(M)
S9
A4(L)
B4(L)
A4,B4(M)
SlO
XXX
XXX
XXX
S11
A5(M)
B5(M)
A5,B5(M)
S12
A5(L)
B5(L)
A5,B5(M)
CR*SR
A3+B3
SP Add
CR*SR
A3+B3
PR*SR PR*SR
A4
B4
XXX
Pl
51
XXX
53
A4
B4
Pl *53
Pl
51
A4+B4
53
PR*SR PR*SR ENRA=L ENRB=L
Pl *53 XXX
A4
B4
A5+B5 A4+B4
51
A4+B4 XXX
PR*SR PR*SR
A5+B5 A5+B5
51
A4+B4 A4+B4
SPAdd
PR*SR
A4+B4
A5
85
XXX
U1
eX!
co
S
NOTE: On CLK 7 and CLK10, put 0000000000 (Single-Precision Add) on the instruction bus.
SN74ACT8837
P2
XXX
V
REG BUS
54
In the product of sums timing table, the two initial sums are generated in independent
ALU mode. The remaining operations are shown as alternating chained operations
followed by single-precision adds. The SP adds are necessary to provide an extra cycle
during which the multiplier outputs the current intermediate product. The current sum
and the latest intermediate product are then fed back to the multiplier inputs for the
next chained operations. In this manner, a double-precision product of sums is
generated in three system clocks, as opposed to two clocks for a double-precision
sum of products.
Mixed Operations and Operands
Using mixed-precision data operands or performing sequences of mixed operations
may require adjustments in timing, operand precision, and control settings. To simplify
microcoding sequences involving mixed operations, mixed-precision operands, or both,
it is useful to understand several specific requirements for mixed-mode or
mixed-precision processing.
Calculations involving mixed-precision operands must be performed as double-precision
operations (see Table 12). The instruction settings (I8-I7) should be set to indicate
the precision of each operand from the RA and RB input registers. (Feedback operands
from internal registers are also double-precision.) Mixed-precision operations should
not be performed in chained mode.
Timing for operations with mixed-precision operands is the same as for a corresponding
double-precision operation. In a mixed-precision operation, the single-precision operand
must be loaded into the upper half of its input register.
Most format conversions also involve double-precision timing. Conversions between
single- and double-precision floating point format are treated as mixed-precision
operations. During integer to floating point conversions, the integer input should be
loaded into the upper half of the RA register.
In applications where mixed-precision operations are not required, it is possible to tie
the I8-I7 instruction inputs together so that both controls always select the same
precision.
Sequences of mixed operations may require changes in multiple control settings to
deal with changes in timing of input, execution, and output of results. Figure 22 shows
a simplified timing waveform for a series of mixed operations:
CLOCK CYCLE           1     2     3     4     5     6     7     8     9     10    11    12    13
FUNCTION AND DATA     A,B   A,B   C,D   C,D         E,F   G,H   G,H   I,J   I,J   K,L   M,N
RESULTS AND STATUS                XXXX  A,B   XXXX  C,D   E,F   E,F   G,H   G,H   I,J   K,L   M,N

A,B,C,D - double precision multiply; E,F - single precision operation; G,H,I,J - double
precision add; K,L - single precision operation. A double precision number is not required to
be held on the outputs for two cycles unless it is followed by a like double precision function.
If a double precision multiply is followed by a single precision operation, there must be one open
clock cycle.
Figure 22. Mixed Operations and Operands
(PIPES2-PIPES0 = 110, CLKMODE = 0)
In this sequence, the fifth cycle is left open because a single-precision multiply follows
a double-precision multiply. If the SP multiply were input during the period following
the fourth rising clock edge, the result of the preceding operation would be overwritten,
since an SP multiply executes in one clock cycle. To avoid such a condition, the FPU
will not load during the required open cycle.
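The open-cycle rule can be sketched as a small helper that a microcode generator might apply to an instruction stream. This is an illustrative sketch only: the function name insert_open_cycles and the tuple layout (name, kind, precision) are hypothetical, the output is an issue order rather than a cycle-accurate schedule, and only the single rule stated in this section (a double-precision multiply followed by a single-precision operation) is modeled.

```python
def insert_open_cycles(ops):
    """Return the issue order for ops, with None marking a required open cycle.

    ops is a list of (name, kind, precision) tuples, where kind is "multiply"
    or "add" and precision is "double" or "single".
    """
    slots = []
    for index, (name, kind, precision) in enumerate(ops):
        slots.append(name)
        following = ops[index + 1] if index + 1 < len(ops) else None
        if (
            kind == "multiply"
            and precision == "double"
            and following is not None
            and following[2] == "single"
        ):
            slots.append(None)  # open cycle so the DP result is not overwritten
    return slots


if __name__ == "__main__":
    sequence = [
        ("A*B", "multiply", "double"),
        ("C*D", "multiply", "double"),
        ("E*F", "multiply", "single"),
        ("G+H", "add", "double"),
    ]
    print(insert_open_cycles(sequence))
```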
Because the sequence of mixed operations places constraints on output timing, only
one cycle is available to output the double-precision (C * D) result. By contrast, the
SP multiply (E * F) is available for two cycles because the operation which follows
it does not output a result in the period following the seventh rising clock edge. In
general, the precision and timing of each operation affect the timing of adjacent
operations.
Control settings for CLKMODE and registers must also be considered in relation to
precision and speed of execution. In Figure 23, a similar sequence of mixed operations
is set up for execution in fully pipelined mode:
[Figure 23 timing waveform: clock cycles on the horizontal axis; traces for
FUNCTION AND DATA inputs and RESULTS AND STATUS outputs, operand pairs A,B through Q,R.]
A,B,C,D - double-precision multiply; E,F - single-precision operation; G,H - double-precision
add; I,J,K,L,M,N - single-precision operation; O,P,Q,R - double-precision multiply. In clock
mode 1, a double-precision result is two cycles long only when a double-precision multiply is
followed by a double-precision multiply.
Figure 23. Mixed Operations and Operands
(PIPES2-PIPES0 = 000, CLKMODE = 1)
Although the data operands can be loaded in one clock cycle with CLKMODE set high,
enabling two additional internal registers delays the (A,B) result one cycle beyond
the previous example. Again, an open cycle is required after the (C,D) operation
because the next operation is single precision. The result of the (C,D) multiply is
available for one cycle instead of two, also because the following operation is single
precision. With this setting of CLKMODE and PIPES2-PIPES0, a double-precision result
is only available for two clock cycles when one DP multiply follows another DP multiply.

Matrix Operations
The 'ACT8837 floating point unit can also be used to perform matrix manipulations
involved in graphics processing or digital signal processing. The FPU multiplies and
adds data elements, executing sequences of microprogrammed calculations to form
new matrices.
Representation of Variables
In state representations of control systems, an n-th order linear differential equation
with constant coefficients can be represented as a sequence of n first-order linear
differential equations expressed in terms of state variables:
$$\frac{dx_1}{dt} = x_2, \;\ldots,\; \frac{dx_{n-1}}{dt} = x_n$$
For example, in vector-matrix form the equations of an nth-order system can be
represented as follows:
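$$\frac{d}{dt}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} +
\begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nn} \end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$$
or, $X = ax + bu$, where $X = dx/dt$.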
Expanding the matrix equation for one state variable, dx1/dt, results in the following
expression:
$$X_1 = (a_{11} x_1 + \cdots + a_{1n} x_n) + (b_{11} u_1 + \cdots + b_{1n} u_n)$$
where $X_1 = dx_1/dt$.
Sequences of multiplications and additions are required when such state space
transformations are performed, and the 'ACT8837 has been designed to support such
sum-of-products operations. An n x n matrix A multiplied by an n x n matrix X yields
an n x n matrix C whose elements cij are given by this equation:
$$c_{ij} = \sum_{k=1}^{n} a_{ik} \, x_{kj} \qquad \text{for } i = 1, \ldots, n; \; j = 1, \ldots, n \tag{1}$$
For the cij elements to be calculated by the 'ACT8837, the corresponding elements
aik and xkj must be stored outside the 'ACT8837 and fed to the 'ACT8837 in the
proper order required to effect a matrix multiplication such as the state space system
representation just discussed.
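As a minimal sketch of equation (1), the following Python fragment forms each cij as a running sum of products, the same multiply-and-accumulate pattern the text assigns to the FPU. The function name `matrix_product` and the list-of-rows layout are illustrative assumptions and are not part of the 'ACT8837 documentation; in a real system the operands would be fetched from external memory in this order and presented to the device.

```python
def matrix_product(a, x):
    """Return C = A * X for two n x n matrices given as lists of rows."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0                      # running sum of products for c[i][j]
            for k in range(n):
                acc += a[i][k] * x[k][j]   # one multiply and one add per term
            c[i][j] = acc
    return c


if __name__ == "__main__":
    print(matrix_product([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each element costs n multiplies and n adds, which is why the operand ordering matters more than the arithmetic itself when the work is microprogrammed.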
Sample Matrix Transformation
The matrix manipulations commonly performed in graphics systems can be regarded
as geometrical transformations of graphic objects. A matrix operation on another matrix
representing a graphic object may result in scaling, rotating, transforming, distorting,
or generating a perspective view of the image. By performing a matrix operation on
the position vectors which define the vertices of an image surface, the shape and
position of the surface can be manipulated.
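The generalized 4 x 4 matrix for transforming a three-dimensional object with
homogeneous coordinates is shown below:
$$T = \left[\begin{array}{ccc|c} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ \hline m & n & o & p \end{array}\right]$$
The matrix T can be partitioned into four component matrices, each of which produces
a specific effect on the resultant image:
$$\left[\begin{array}{c|c} 3 \times 3 & 3 \times 1 \\ \hline 1 \times 3 & 1 \times 1 \end{array}\right]$$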
The 3 x 3 matrix produces linear transformation in the form of scaling, shearing and
rotation. The 1 x 3 row matrix produces translation, while the 3 x 1 column matrix
produces perspective transformation with multiple vanishing points. The final single
element 1 x 1 produces overall scaling. Overall operation of the transformation matrix
T on the position vectors of a graphic object produces a combination of shearing,
rotation, reflection, translation, perspective, and overall scaling.
The rotation of an object about an arbitrary axis in a three-dimensional space can be
carried out by first translating the object such that the desired axis of rotation passes
through the origin of the coordinate system, then rotating the object about the axis
through the origin, and finally translating the rotated object such that the axis of rotation
resumes its initial position. If the axis of rotation passes through the point P = [a b c 1],
then the transformation matrix is representable in this form:
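$$[x \; y \; z \; h] = [x \; y \; z \; 1]
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -a & -b & -c & 1 \end{bmatrix}}_{\text{translation to origin}}
\underbrace{[R]}_{\text{rotation about origin}}
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ a & b & c & 1 \end{bmatrix}}_{\text{translation back to initial position}} \tag{2}$$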
where R may be expressed as:
$$R = \begin{bmatrix}
n_1^2 + (1 - n_1^2)\cos\phi & n_1 n_2 (1 - \cos\phi) + n_3 \sin\phi & n_1 n_3 (1 - \cos\phi) - n_2 \sin\phi & 0 \\
n_1 n_2 (1 - \cos\phi) - n_3 \sin\phi & n_2^2 + (1 - n_2^2)\cos\phi & n_2 n_3 (1 - \cos\phi) + n_1 \sin\phi & 0 \\
n_1 n_3 (1 - \cos\phi) + n_2 \sin\phi & n_2 n_3 (1 - \cos\phi) - n_1 \sin\phi & n_3^2 + (1 - n_3^2)\cos\phi & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}$$
and
$$n_1 = q_1 / (q_1^2 + q_2^2 + q_3^2)^{1/2} = \text{direction cosine for x-axis of rotation}$$
$$n_2 = q_2 / (q_1^2 + q_2^2 + q_3^2)^{1/2} = \text{direction cosine for y-axis of rotation}$$
$$n_3 = q_3 / (q_1^2 + q_2^2 + q_3^2)^{1/2} = \text{direction cosine for z-axis of rotation}$$
n = (n1 n2 n3) = unit vector for Q
Q = vector defining axis of rotation = [q1 q2 q3]
φ = the rotation angle about Q
A general rotation using equation (2) is effected by determining the [x y z] coordinates
of a point A to be rotated on the object, the direction cosines of the axis of rotation
[n1, n2, n3], and the angle φ of rotation about the axis, all of which are needed to
define matrix [R]. Suppose, for example, that a tetrahedron ABCD, represented by
the coordinate matrix below, is to be rotated about an axis of rotation RX which passes
through a point P = [5 -6 3 1] and whose direction cosines are given by unit vector
[n1 = 0.866, n2 = 0.5, n3 = 0.707]. The angle of rotation φ is 90 degrees (see
Figure 24). The rotation matrix [R] becomes
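$$\begin{bmatrix} 2 & -3 & 3 \\ 1 & -2 & 2 \\ 2 & -1 & 2 \\ 2 & -2 & 2 \end{bmatrix}
\begin{array}{c} A \\ B \\ C \\ D \end{array}
\qquad
R = \begin{bmatrix} 0.750 & 1.140 & 0.112 & 0 \\ -0.274 & 0.250 & 1.220 & 0 \\ 1.112 & -0.513 & 0.500 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$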
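The numeric matrix [R] can be checked against the general expression for R. The sketch below is an illustration only: the function name `rotation_matrix` is hypothetical, and it applies the formula to the direction cosines exactly as the manual prints them, without normalizing them.

```python
import math


def rotation_matrix(n1, n2, n3, phi):
    """Build the 4 x 4 matrix R for a rotation of phi radians about (n1, n2, n3).

    Row-vector convention, as in equation (2): a point is transformed as [x y z 1] * R.
    """
    c, s = math.cos(phi), math.sin(phi)
    v = 1.0 - c
    return [
        [n1 * n1 + (1.0 - n1 * n1) * c, n1 * n2 * v + n3 * s, n1 * n3 * v - n2 * s, 0.0],
        [n1 * n2 * v - n3 * s, n2 * n2 + (1.0 - n2 * n2) * c, n2 * n3 * v + n1 * s, 0.0],
        [n1 * n3 * v + n2 * s, n2 * n3 * v - n1 * s, n3 * n3 + (1.0 - n3 * n3) * c, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


if __name__ == "__main__":
    # Sample values from the text: n = [0.866, 0.5, 0.707], phi = 90 degrees.
    for row in rotation_matrix(0.866, 0.5, 0.707, math.radians(90.0)):
        print(["%.3f" % value for value in row])
```

With a 90-degree angle the cosine terms vanish, so each diagonal entry reduces to the square of a direction cosine and each off-diagonal entry to a product plus or minus a single cosine.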
[Figure 24 drawing: X, Y, Z axes with the tetrahedron vertices before and after each
operation (labels include AT, BT, DT, BR, A', B', C', D'), the rotation axis Q, and the
point P (5, -6, 3).]
(1) THIS ARROW DEPICTS THE FIRST TRANSLATION
(2) THIS ARROW DEPICTS THE 90° ROTATION
(3) THIS ARROW DEPICTS THE BACK TRANSLATION
Figure 24. Sequence of Matrix Operations
The point transformation equation (2) can be expanded to include all the vertices of
the tetrahedron as follows:
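$$\begin{bmatrix} x_a & y_a & z_a & h_1 \\ x_b & y_b & z_b & h_2 \\ x_c & y_c & z_c & h_3 \\ x_d & y_d & z_d & h_4 \end{bmatrix} =
\begin{bmatrix} 2 & -3 & 3 & 1 \\ 1 & -2 & 2 & 1 \\ 2 & -1 & 2 & 1 \\ 2 & -2 & 2 & 1 \end{bmatrix}
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -5 & 6 & -3 & 1 \end{bmatrix}}_{\text{translation to origin}}
\underbrace{\begin{bmatrix} 0.750 & 1.140 & 0.112 & 0 \\ -0.274 & 0.250 & 1.220 & 0 \\ 1.112 & -0.513 & 0.500 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{\text{rotation about origin}}
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 5 & -6 & 3 & 1 \end{bmatrix}}_{\text{translation back to initial position}} \tag{3}$$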
The 'ACT8837 floating-point unit can perform matrix manipulation involving
multiplications and additions such as those represented by equation (1). The matrix
equation (3) can be solved by using the 'ACT8837 to compute, as a first step, the
product of the coordinate matrix and the first translation matrix on the right-hand
side of equation (3), in that order. The second step postmultiplies that product by the
rotation matrix. The third step implements the back-translation by postmultiplying
the result of the second step by the second translation matrix of equation (3).
Details of the procedure to produce a three-dimensional rotation
about an arbitrary axis are explained in the following steps:
Step 1
Translate the tetrahedron so that the axis of rotation passes through the origin. This
process can be accomplished by multiplying the coordinate matrix by the translation
matrix as follows:
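$$\begin{bmatrix} 2 & -3 & 3 & 1 \\ 1 & -2 & 2 & 1 \\ 2 & -1 & 2 & 1 \\ 2 & -2 & 2 & 1 \end{bmatrix}
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -5 & 6 & -3 & 1 \end{bmatrix}}_{\text{translation to origin}} =
\begin{bmatrix} (2-5) & (-3+6) & (3-3) & 1 \\ (1-5) & (-2+6) & (2-3) & 1 \\ (2-5) & (-1+6) & (2-3) & 1 \\ (2-5) & (-2+6) & (2-3) & 1 \end{bmatrix} =
\underbrace{\begin{bmatrix} -3 & +3 & 0 & 1 \\ -4 & +4 & -1 & 1 \\ -3 & +5 & -1 & 1 \\ -3 & +4 & -1 & 1 \end{bmatrix}}_{\text{vertices of translated tetrahedron}}
\begin{array}{c} AT \\ BT \\ CT \\ DT \end{array}$$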
The 'ACT8837 could compute the translated coordinates AT, BT, CT, DT as indicated
above. However, an alternative method resulting in a more compact solution is
presented below.
Step 2
Rotate the tetrahedron about the axis of rotation which passes through the origin after
the translation of Step 1. To implement the rotation of the tetrahedron, postmultiply
the translated coordinate matrix from Step 1 by the rotation matrix [R]. The resultant
matrix represents the rotated coordinates of the tetrahedron about the origin as follows:
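$$\begin{bmatrix} -3 & 3 & 0 & 1 \\ -4 & 4 & -1 & 1 \\ -3 & 5 & -1 & 1 \\ -3 & 4 & -1 & 1 \end{bmatrix}
\underbrace{\begin{bmatrix} 0.750 & 1.140 & 0.112 & 0 \\ -0.274 & 0.250 & 1.220 & 0 \\ 1.112 & -0.513 & 0.500 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{\text{rotation about origin}} =
\underbrace{\begin{bmatrix} -3.072 & -2.670 & 3.324 & 1 \\ -5.208 & -3.047 & 3.932 & 1 \\ -4.732 & -1.657 & 5.264 & 1 \\ -4.458 & -1.907 & 4.044 & 1 \end{bmatrix}}_{\text{rotated coordinates}}$$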
Step 3
Translate the rotated tetrahedron back to the original coordinate space. This is done
by postmultiplying the resultant matrix of Step 2 by the translation matrix. The following
calculation produces the final coordinate matrix of the transformed object:
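$$\begin{bmatrix} -3.072 & -2.670 & 3.324 & 1 \\ -5.208 & -3.047 & 3.932 & 1 \\ -4.732 & -1.657 & 5.264 & 1 \\ -4.458 & -1.907 & 4.044 & 1 \end{bmatrix}
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 5 & -6 & 3 & 1 \end{bmatrix}}_{\text{translate back}} =
\underbrace{\begin{bmatrix} 1.928 & -8.670 & 6.324 & 1 \\ -0.208 & -9.047 & 6.932 & 1 \\ 0.268 & -7.657 & 8.264 & 1 \\ 0.542 & -7.907 & 7.044 & 1 \end{bmatrix}}_{\text{final rotated coordinates}}$$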
A more compact solution to these transformation matrices is a product matrix that
combines the two translation matrices and the rotation matrix in the order shown in
equation (3). Equation (3) will then take the following form:
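$$\begin{bmatrix} x_a & y_a & z_a & h_1 \\ x_b & y_b & z_b & h_2 \\ x_c & y_c & z_c & h_3 \\ x_d & y_d & z_d & h_4 \end{bmatrix} =
\begin{bmatrix} 2 & -3 & 3 & 1 \\ 1 & -2 & 2 & 1 \\ 2 & -1 & 2 & 1 \\ 2 & -2 & 2 & 1 \end{bmatrix}
\underbrace{\begin{bmatrix} 0.750 & 1.140 & 0.112 & 0 \\ -0.274 & 0.250 & 1.220 & 0 \\ 1.112 & -0.513 & 0.500 & 0 \\ -3.730 & -8.661 & 8.260 & 1 \end{bmatrix}}_{\text{transformation matrix}}$$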
The newly transformed coordinates resulting from the postmultiplication of the
coordinate matrix of the tetrahedron by the transformation matrix can be computed
using equation (1), which was cited previously:
$$c_{ij} = \sum_{k=1}^{n} a_{ik} \, x_{kj} \qquad \text{for } i = 1, \ldots, n; \; j = 1, \ldots, n \tag{1}$$
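The three-step procedure and the compact single-matrix form can be cross-checked numerically. The following sketch is an illustration under stated assumptions: the names `matmul`, `VERTICES`, `TO_ORIGIN`, `ROTATION`, `BACK`, `COMBINED`, and `FINAL` are hypothetical, and the rotation entries are the rounded three-decimal values printed in the manual.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (sum of products, equation (1))."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Tetrahedron vertices A, B, C, D in homogeneous row-vector form.
VERTICES = [[2, -3, 3, 1], [1, -2, 2, 1], [2, -1, 2, 1], [2, -2, 2, 1]]
TO_ORIGIN = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [-5, 6, -3, 1]]
ROTATION = [[0.750, 1.140, 0.112, 0],
            [-0.274, 0.250, 1.220, 0],
            [1.112, -0.513, 0.500, 0],
            [0, 0, 0, 1]]
BACK = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [5, -6, 3, 1]]

# Compact form: fold the two translations and the rotation into one matrix.
COMBINED = matmul(matmul(TO_ORIGIN, ROTATION), BACK)
FINAL = matmul(VERTICES, COMBINED)

if __name__ == "__main__":
    for label, row in zip("ABCD", FINAL):
        print(label, ["%.3f" % value for value in row])
```

Folding the three matrices into one means each vertex needs a single 1 x 4 by 4 x 4 product instead of three, which is the saving the compact solution is after.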
PSEUDOCODE
a11 → RA, x11 → RB
p1 = a11 * x11
a12 → RA, x21 → RB
p2 = a12 * x21
p1 → P(p1)
MULTIPLIER/ALU
OPERATIONS
Load a11, x11
SP Multiply
The h-scalars h1, h2, h3, and h4 are equal to 1. The number of clock cycles to generate
each 4-tuple can then be decreased from 16 to 13 cycles. Total number of clock cycles
to calculate all four vertices is reduced from 66 to 54 clocks. Figure 25 summarizes
the overall matrix transformation.
[Figure 25 drawing: X, Y, Z axes showing the tetrahedron before and after the overall
transformation, the 90° rotation, and the point P (5, -6, 3).]
Figure 25. Resultant Matrix Transformation
This microprogram can also be written to calculate sums of products with all pipeline
registers enabled so that the FPU can operate in its fastest mode. Because of timing
relationships, the C register is used in some steps to hold the intermediate sum of
products. Latency due to pipelining and chained data manipulation is 11 cycles for
calculation of the first coordinate, and four cycles each for the other three coordinates.
After calculation of the first vertex, 16 cycles are required to calculate the four
coordinates of each subsequent vertex. Table 33 presents the sequence of calculations
for the first two coordinates, xa and ya.
Table 33. Fully Pipelined Sum of Products (PIPES2-PIPES0 = 000)
(Bus or Register Contents Following Each Rising Clock Edge)

CLOCK  I    DA   DB   I    RA   RB   MUL   ALU   P    S    C    Y
CYCLE  BUS  BUS  BUS  REG  REG  REG  PIPE  PIPE  REG  REG  REG  BUS
  0    Mul  x11  a11
  1    Mul  x21  a12  Mul  x11  a11
  2    Chn  x31  a13  Mul  x21  a12  p1
  3    Mul  x41  a14  Chn  x31  a13  p2          p1
  4    Chn  x12  a11  Mul  x41  a14  p3    s1    p2
  5    Chn  x22  a12  Chn  x12  a11  p4    †     p3   s1   p2
  6    Chn  x32  a13  Chn  x22  a12  p5    s2    p4   †    p2
  7    Chn  x42  a14  Chn  x32  a13  p6    s3    p5   s2   p2
  8    Chn  x13  a11  Chn  x42  a14  p7    s4    p6   s3   s2
  9    Chn  x23  a12  Chn  x13  a11  p8    xa    p7   s4   p6
 10    Chn  x33  a13  Chn  x23  a12  p9    s5    p8   xa   p6   xa
 11    Chn  x43  a14  Chn  x33  a13  p10   s6    p9   s5   p6
 12    Chn  x14  a11  Chn  x43  a14  p11   s7    p10  s6   s5
 13    Chn  x24  a12  Chn  x14  a11  p12   ya    p11  s7   p10
 14    Chn  x34  a13  Chn  x24  a12  p13   s8    p12  ya   p10  ya
 15    Chn  x44  a14  Chn  x34  a13  p14   s9    p13  s8   p10

†Contents of this register are not valid during this cycle.
Products in Table 33 are numbered according to the clock cycle in which the operands
and instruction were loaded into the RA, RB, and I registers, and execution of the
instruction began. Sums indicated in Table 33 are listed below:

s1 = p1 + 0         s5 = p5 + p7         s9 = p10 + p12
s2 = p1 + p3        s6 = p6 + p8         xa = p1 + p2 + p3 + p4
s3 = p2 + p4        s7 = p9 + 0          ya = p5 + p6 + p7 + p8
s4 = p5 + 0         s8 = p9 + p11
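The grouping of partial sums listed above can be sketched in a few lines of Python. This is an illustration only: the function name `chained_sum_of_products` and the sample data are assumptions, and the final step (adding the two partial sums) is inferred from xa = p1 + p2 + p3 + p4 rather than stated as a separate sum in the list. The sample operands are vertex A of the tetrahedron and the combined transformation matrix from the matrix example.

```python
def chained_sum_of_products(a_row, x_col):
    """Form one coordinate from four products using the pairing in the sum list."""
    p1, p2, p3, p4 = (a * x for a, x in zip(a_row, x_col))
    s1 = p1 + 0.0        # first product passes through the adder unchanged
    s2 = s1 + p3         # s2 = p1 + p3
    s3 = p2 + p4         # second partial sum, built from the held product
    return s2 + s3       # p1 + p2 + p3 + p4


A_ROW = [2.0, -3.0, 3.0, 1.0]                      # vertex A: a11..a14
X = [[0.750, 1.140, 0.112, 0.0],
     [-0.274, 0.250, 1.220, 0.0],
     [1.112, -0.513, 0.500, 0.0],
     [-3.730, -8.661, 8.260, 1.0]]                 # combined transformation matrix

if __name__ == "__main__":
    xa = chained_sum_of_products(A_ROW, [row[0] for row in X])
    ya = chained_sum_of_products(A_ROW, [row[1] for row in X])
    print("xa = %.3f, ya = %.3f" % (xa, ya))
```

Pairing alternate products lets each addition start as soon as its second operand leaves the multiplier, which fits the role the text gives the C register as a holding place for an intermediate value.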
SAMPLE MICROPROGRAMS FOR BINARY DIVISION AND
SQUARE ROOT
The SN74ACT8837 Floating Point Unit supports binary division and square root
calculations using the Newton-Raphson algorithm. The 'ACT8837 performs these
calculations by executing sequences of floating-point operations according to the
control settings contained in specific microprogrammed routines. This implementation
of the Newton-Raphson algorithm requires that a seed ROM provide values for the
first approximations of the reciprocals of the divisors.
This application note presents several microprograms for floating-point division and
square root using the Newton-Raphson algorithm. Each sample program is analyzed
briefly to show details of the floating-point procedures being performed.
Binary Division Using the Newton-Raphson Algorithm
Binary division can be performed as an iterative procedure using the Newton-Raphson
algorithm. For a dividend A, divisor B, and quotient Q, this procedure calculates a value
for 1/B which is then used to evaluate the expression Q = A * 1/B. The calculation
can be performed with either single- or double-precision operands, and examples of
each precision are shown.
The basic algorithm calculates the value of a quotient Q by approximating the reciprocal
of the divisor B to adequate precision and then multiplying the dividend A by the
approximation of the reciprocal:
$$Q = A/B = A \cdot X_n$$
where $X_n$ = the value of X after the nth iteration
n = the number of iterations to achieve the desired precision
Intermediate values of X are calculated using the following expression:
$$X_{i+1} = X_i \cdot (2 - B \cdot X_i)$$
where $X_0$ approximates 1/B for the range $0 < X_0 < 2/B$
To illustrate a program using the Newton-Raphson algorithm, the sequence of
calculations is presented in detail. For double-precision operations, three iterations are
needed to achieve adequate precision in the value of 1/B. A value for the seed X0
(approximately equal to 1/B) is assumed to be given, and the following operations are
performed to evaluate Q from double-precision inputs:
$$\begin{aligned}
X_1 &= X_0 (2 - B X_0) \\
X_2 &= X_1 (2 - B X_1) = X_0 (2 - B X_0) \, (2 - B X_0 (2 - B X_0)) \\
X_3 &= X_2 (2 - B X_2) = X_0 (2 - B X_0) \, (2 - B X_0 (2 - B X_0)) \, (2 - B X_0 (2 - B X_0)(2 - B X_0 (2 - B X_0))) \\
Q &= A/B = A \cdot 1/B = A \cdot X_3 \\
  &= A \cdot X_0 (2 - B X_0) \, (2 - B X_0 (2 - B X_0)) \, (2 - B X_0 (2 - B X_0)(2 - B X_0 (2 - B X_0)))
\end{aligned}$$
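The recurrence is easier to follow as a loop than as the fully expanded product. The sketch below is a plain Python illustration, not the microprogram: the function name `newton_raphson_divide` is hypothetical, and the sample values (A = 22, B = 7, X0 = 1.140625 * 2^(-3)) are the ones the manual uses for its sample division.

```python
def newton_raphson_divide(a, b, x0, iterations=3):
    """Approximate a / b by refining a reciprocal estimate x0 of 1 / b."""
    if not 0.0 < x0 < 2.0 / b:
        raise ValueError("seed must satisfy 0 < X0 < 2/B for the iteration to converge")
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - b * x)      # X(i+1) = X(i) * (2 - B * X(i))
    return a * x                   # Q = A * Xn


if __name__ == "__main__":
    seed = 1.140625 * 2.0 ** -3    # sample seed for 1/7
    print(newton_raphson_divide(22.0, 7.0, seed))
    print(22.0 / 7.0)
```

The error term 1 - B * Xi is squared by each pass, so a seed that is good to about 9 bits reaches roughly 18, 36, and then 72 bits after three passes, which is why three iterations are enough for a double-precision result.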
Table 34 presents decimal and hexadecimal values for A, B, and X0, which are used
in the sample calculation. The computed value of the quotient Q is also included,
showing the representations of the results of this sample division.
Table 34. Sample Data Values and Representations

TERM  VALUE  DECIMAL REPRESENTATION       IEEE HEXADECIMAL
             (MANTISSA * 2^EXPONENT)      REPRESENTATION
A     22     1.375 * 2^4                  40360000 00000000
B     7      1.75 * 2^2                   401C0000 00000000
X0    1/7    1.140625 * 2^(-3)            3FC24000 00000000
Q     22/7   1.5714285714285713 * 2^1     40092492 49249249
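The hexadecimal column can be checked by decoding each pair of 32-bit words as an IEEE double-precision number. The helper name `decode_double` is an assumption for illustration; it relies only on Python's standard `struct` module.

```python
import struct


def decode_double(msh, lsh):
    """Decode two 8-digit hex strings (most and least significant halves) as a double."""
    return struct.unpack(">d", bytes.fromhex(msh + lsh))[0]


if __name__ == "__main__":
    for term, msh, lsh in [("A", "40360000", "00000000"),
                           ("B", "401C0000", "00000000"),
                           ("X0", "3FC24000", "00000000"),
                           ("Q", "40092492", "49249249")]:
        print(term, decode_double(msh, lsh))
```

The seed decodes to 0.142578125, close to but not equal to 1/7, which is what one expects from a ROM that stores only a few mantissa bits.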
In Table 35, the sequence and timing of this procedure is shown exactly as performed
by the 'ACT8837. This example shows the steps in a double-precision division requiring
three iterations to achieve the desired accuracy. In this table each operation is
sequenced according to the clock cycles during which the instruction inputs for that
operation are presented at the pins of the 'ACT8837. Operations are accompanied
by a pseudocode summary of the operations performed by the 'ACT8837 and the clock
cycle when an operand is available or a result is valid.
Each line of pseudocode indicates the operands being used, the operations being
performed, the registers involved, and the clock cycles when the results appear. Each
register is represented by its usual abbreviation (RA, RB, P, S, or C) followed by the
number of the clock cycle when an operand will be valid or available at the register.
For example, "P.4" refers to the contents of the Product Register after the fourth
clock cycle.
Table 35. Binary Division Using the Newton-Raphson Algorithm

CLOCK
CYCLES   OPERATIONS              PSEUDOCODE
1, 2     B * X0                  B → RA.2, X0 → RB.2, RA.2 * RB.2 → P.4
3, 4     2 - B * X0              2 - P.4 → S.6
5, 6     X1 = X0(2 - B * X0)     RB.2 * S.6 → P.8
7, 8     B * X1                  RA.2 * P.8 → P.10
9, 10    2 - B * X1              P.8 → C.9, 2 - P.10 → S.12
11, 12   X2 = X1(2 - B * X1)     C.9 * S.12 → P.14
13, 14   B * X2                  RA.2 * P.14 → P.16
15, 16   2 - B * X2              P.14 → C.15, 2 - P.16 → S.18
17, 18   X3 = X2(2 - B * X2)     A → RA.18, C.15 * S.18 → P.20
19, 20   A * X3                  RA.18 * P.20 → P.22
21, 22   Output MSH              P.22.MSH → Y
The sequence of operations can be microcoded for execution exactly as listed in the
table above. Sample microprograms (with data and parity fields provided) are given
below. To make the programs easier to follow, comment lines have been included to
indicate clock timing, calculation performed by the instructions being loaded, and
operations being represented, in the same pseudocode as in the preceding table. The
fields in the microinstruction sequences presented below are arranged in the following
order:
19 0 0 040 0 0 2 EF 0 0 0 0 1 1 0 0 0 0 3 1 3 00000000 00000000 0 0
20 1 0 040 0 0 2 EF 0 0 0 0 1 1 0 0 0 0 3 1 3 00000000 00000000 0 0
;Lines 21-22   Operation: P.22 → Y
21 0 0 020 0 0 2 EF 0 0 0 0 1 1 0 0 0 0 3 1 3 00000000 00000000 0 0
22 1 0 020 0 0 2 EF 0 0 0 0 1 1 0 0 0 0 3 1 3 00000000 00000000 0 0
Double-Precision Newton-Raphson Binary Division
If the value of B is given as a double-precision number and X0 is looked up in a
double-precision seed ROM, no conversions are required prior to performing a double-precision
division using the Newton-Raphson algorithm. Three iterations are used in the
double-precision example (n = 3). The following formula represents the sequence of
calculations to be performed:
$$A/B = A \cdot X_0 \, (2 - B X_0) \, [2 - B X_0 (2 - B X_0)] \, (2 - B X_0 (2 - B X_0) [2 - B X_0 (2 - B X_0)])$$
Table 37 shows a double-precision division using a double-precision seed ROM. The
example divides 22 by 7.
Table 37. Double-Precision Newton-Raphson Binary Division
;Lines 1-4    Calculation: B * X0
              Operations: B → RA.4, X0 → RB.4, RA.4 * RB.4 → P.8
01 0 0 1C0 0 0 2 FF 0 0 0 0 1 1 0 0 0 0 3 1 1 3 3FC24000 00000000 0 0
02 1 0 1C0 0 0 2 FF 0 0 0 0 1 1 0 0 0 0 3 1 1 3 3FC24000 00000000 0 0
03 0 0 1C0 0 0 2 FF 0 0 1 1 1 1 0 0 0 0 3 1 1 3 401C0000 00000000 0 0
04 1 0 1C0 0 0 2 FF 0 0 1 1 1 1 0 0 0 0 3 1 1 3 401C0000 00000000 0 0
;Lines 5-8    Calculation: 2 - (B * X0)
              Operation: 2 - P.8 → S.12
$$A = B \cdot 0.5 \cdot X_1 \cdot [3 - B (X_1^2)] \quad \text{and} \quad X_1 = 0.5 \cdot X_0 \cdot [3 - B (X_0^2)]$$
$$A = B \cdot 0.5 \cdot 0.5 \cdot X_0 \cdot [3 - B (X_0^2)] \cdot [3 - B (0.5 \cdot X_0 \cdot [3 - B (X_0^2)])^2]$$
Table 38. Single-Precision Binary Square Root
;Lines 1-2
01 0 0 026 1
02 1 0 026 1
;Lines 3-4
Calculation: B s.p .... d.p.
Operations: B'" RA.1, (s.p. to d.p.l(RA.11 ... S.2
3 FF 0 0 1 0 1 1 0 0 0 0 3 1
3 FF 0 0 1 0 1 1 0 0 0 0 3 1
Calculation: Load XO
Operation:
XO'" RA.4
03 0 0 126 1 0 2 FF 0 0 1 0 1 1 0 0 0 0 3 1
04 1 0 126 1 0 2 FF 0 0 1 0 1 1 0 0 0 0 3 1
;Lines 5-6
0 0 0 0 3 1
0 0 0 0 3 1
*
*
"
* C.7'" P.10
3 40000000 00000000 0 0
3 40000000 00000000 0 0
«~
"
*
0 0 0 0 3 1 1 3 40400000 00000000 0 0
0 0 0 0 3 1 1 3 40400000 00000000 0 0
*
Calculation: 3 - (B
XO 2)
S.12 - P. 12 ... S. 14
Operation:
11 0 0 003 0 0 2 FA 0 0 0 0 1 1 0 0 0 0 3 1
12 1 0 003 0 0 2 FA 0 0 0 0 1 1 0 0 0 0 3 1
Calculation: B
XO 2
C.7'" P.12, 3'" RA.10'" S.12
Operations: P.10
09 0 0 260 0 0 2 6F 0 0 1 0 1
10 1 0 260 0 0 2 6F 0 0 1 0 1
;Lines 11-12
3 3FE6AOOO 00000000 0 0
3 3FE6AOOO 00000000 0 0
Calculation: Load B, B
XO
Operations: S.6'" C.7, B'" RB.8, RB.8
07 0 1 040 1 0 2 7F 0 0 0 1 0 1 0 0 0 0 3 1
08 1 0 040 1 0 2 7F 0 0 0 1 0 1 0 0 0 0 3 1
;Lines 9-10
3 3FE6AOOO 00000000 0 0
3 3FE6AOOO 00000000 0 0
Calculation: XO d.p .... s.p.
Operations: (d.p. to s.p.l(RA.41 ... S.6
05 0 0 126 1 0 2 FF 0 0 1 0 1
06 1 0 126 1 0 2 FF 0 0 1 0 1
;Lines 7-8
3 40000000 00000000 0 0
3 40000000 00000000 0 0
3 00000000 00000000 0 0
3 00000000 00000000 0 0
Table 38. Single-Precision Binary Square Root (Continued)
;Lines 13-14
Calculation: XO
Operations: C.7
* (3 - (B * XO
* 8.14 ...... P.16,
2))
1/2 ...... RA.14 ...... 8.16
13 0 0 260 0 0 2 9F 0 0 1 0 1 1 0 0 0 0 3 1
14 1 0 260 0 0 2 9F 0 0 1 0 1 1 0 0 0 0 3 1
;Lines 15-16
*
*
*
3 3FOOOOOO 00000000 0 0
3 3FOOOOOO 00000000 0 0
*
Calculation: 1/2
XO
(3-(B
XO 2)) ...... X 1
P.16 ...... P.18, 0 ...... RA.16,
Operations: 8.16
RA.16 + RB.8 8.18
1 5 0 0 240 0 0 2 AF 0 0 1 0 1 1 0 0 0 0 3 1
16 1 0 240 0 0 2 AF 0 0 1 0 1 1 0 0 0 0 3 1
;Lines 17-18
*
3 00000000 00000000 0 0
3 00000000 00000000 0 0
Calculation: B
X1
Operations: 8.18
P. 18 ...... P.20
*
1 7 0 0 040 0 0 2 AF 0 0 0 0 1 1 0 0 0 0 3 1 1 3 00000000 00000000 0 0
1 8 1 0 040 0 0 2 AF 0 0 0 0 1 1 0 0 0 0 3 1 1 3 00000000 00000000 0 0
;Lines 19-20
*
Calculation: B
X1 2
Operations: P.18 ...... C.19, P.20
C.19 ...... P.22,
3 ...... RA.20 ...... 8.22
*
19 0 1 260 0 0 2 6F 0 0 1 0 1 1 0 0 0 0 3 1
20 1 0 260 0 0 2 6F 0 0 1 0 1 1 0 0 0 0 3 1
;Lines 21-22
*
Calculation: 3 - (B
X1 2)
Operations: 8.22 - P.22 ..... 8.24
21 0 0 003 0 0 2 FA 0 0 0 0 1 1 0 0 0 0 3 1
22 1 0 003 0 0 2 FA 0 0 0 0 1 1 0 0 0 0 3 1
;Lines 23-24
*
3 00000000 00000000 0 0
3 00000000 00000000 0 0
*
Calculation: X1
(3 - (B
X1 2))
8.24 ...... P.26, 1/2 ..... RA.24 ..... 8.26
Operations: C.19
*
23 0 0 260 0 0 2 9F 0 0 1 0 1 1 0 0 0 0 3 1
24 1 0 260 0 0 2 9F 0 0 1 0 1 1 0 0 0 0 3 1
3 40400000 00000000 0 0
3 40400000 00000000 0 0
3 3FOOOOOO 00000000 0 0
3 3FOOOOOO 00000000 0 0
Table 38. Single-Precision Binary Square Root (Concluded)
;Lines 25-26
*
*
*
25 0 0 240 0 0 2 AF 0 0 1 0 1 1 0 0 0 0 3 1
26 1 0 240 0 0 2 AF 0 0 1 0 1 1 0 0 0 0 3 1
;Lines 27-28
*
3 00000000 00000000 0 0
3 00000000 00000000 0 0
Calculation: B
X2 ... A
Operations: 5.28
P.28 ... P.30
*
27 0 0 040 0 0 2 AF 0 0 0 0 1 1 0 0 0 0 3 1
28 1 0 040 0 0 2 AF 0 0 0 0 1 1 0 0 0 0 3 1
;Lines 29-30
*
Calculation: 1/2
Xl
(3 - (B
Xl 2))'" X2
Operations: 5.26
P.26 ... P.28, 0'" RA.26,
RA.26 + RB.8 5.28
3 00000000 00000000 0 0
3 00000000 00000000 0 0
Calculation: NOP
Operation:
Y ... Output
29 0 1 OOA 0 0 2 FF 0 0 0 0 1 1 0 0 0 0 3 1
30 1 0 OOA 0 0 2 FF 0 0 0 0 1 1 0 0 0 0 3 1
3 00000000 00000000 0 0
3 00000000 00000000 0 0
Double-Precision Square Root
The value of B is given as a double-precision number so XO can be looked up from
a double-precision seed ROM without conversion from one precision to the other. Three
iterations (n = 3) are required in the double-precision calculation, and the following
formula for sqrt(B) is to be evaluated:
$$A = B \cdot 0.5 \cdot 0.5 \cdot 0.5 \cdot X_0 \cdot [3 - B (X_0^2)] \cdot [3 - B (0.5 \cdot X_0 \cdot [3 - B (X_0^2)])^2] \cdot [3 - B (0.5 \cdot 0.5 \cdot X_0 \cdot [3 - B (X_0^2)] \cdot [3 - B (0.5 \cdot X_0 \cdot [3 - B (X_0^2)])^2])^2]$$
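The nested formula is three passes of one recurrence, X(i+1) = 0.5 * X(i) * [3 - B * X(i)^2], followed by a multiply by B. A minimal Python sketch follows; the function name `newton_raphson_sqrt` is hypothetical, and the sample seed 0.70703125 is taken to be the value encoded by the hexadecimal word 3FE6A000 00000000 that appears in the sample microcode, with B = 2 assumed.

```python
import math


def newton_raphson_sqrt(b, x0, iterations=3):
    """Approximate sqrt(b) by refining an estimate x0 of 1 / sqrt(b)."""
    x = x0
    for _ in range(iterations):
        x = 0.5 * x * (3.0 - b * x * x)   # X(i+1) = 0.5 * X(i) * [3 - B * X(i)^2]
    return b * x                          # A = B * Xn = B / sqrt(B)


if __name__ == "__main__":
    print(newton_raphson_sqrt(2.0, 0.70703125))
    print(math.sqrt(2.0))
```

Iterating on the reciprocal square root avoids any division inside the loop, so the whole calculation uses only the multiply and subtract operations the FPU provides.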
Table 39. Double-Precision Binary Square Root
*
;Lines 1-4
01
02
03
04
0
1
0
1
0 3EO
03EO
0 3EO
0 3EO
XO
Calculations: Load B, Load XO, B
Operations: B -+ RB.4~ XO -+ RA.4, RA.4
RA.4 -+ 5.8 -+ C.1 0
0
0
0
0
0
0
0
0
0
1
0
1
0
0
0
0
3EO
3EO
3EO
3EO
FF
FF
FF
FF
0 0 0 0 1
00001
0 0 1 1 1
0 0 1 1 1
1
1
1
1
*
;Lines 5-8
05
06
07
08
2
2
2
2
0 0 0 0 3
00003
0 0 0 0 3
0 0 0 0 3
Calculations: B
XO 2
Operations: P.8
5.8
0
0
0
0
0
0
0
0
2
2
2
2
AF
AF
AF
AF
0
0
0
0
0
0
0
0
0
0
1
1
0
0
0
0
1
1
1
1
*
1
1
1
1
0
0
0
0
0
0
0
0
-+
0
0
0
0
1
1
1
1
P.12, 3
0
0
0
0
3
3
3
3
1
1
1
1
* RB.4
3
3
3
3
40000000
40000000
3FE6AOOO
3FE6AOOO
-+
RA.8
3
3
3
3
00000000
00000000
40080000
40080000
3
3
3
3
00000000
00000000
00000000
00000000
-+
-+
P.8
00000000
00000000
00000000
00000000
0
0
0
0
0
0
0
0
00000000
00000000
00000000
00000000
0
0
0
0
0
0
0
0
00000000
00000000
00000000
00000000
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
5.12
*
;Lines 9-12
XO 2)
Calculations: 3 - (B
Operations: 5.12 - p.12 -+ 5.16
09
10
11
12
0
1
0
1
0
1
0
0
183
183
183
183
0
0
0
0
0
0
0
0
2
2
2
2
FA
FA
FA
FA
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
3
3
3
3
1
1
1
1
1
1
1
1
;Lines 13-16
13
14
15
16
0
1
0
1
0
0
0
0
3EO
3EO
3EO
3EO
*
*
Calculations: XO
(3 - (B
XO 2))
Operations: C.10
5.16 -+ P.20, 1/2
0 0 2 9F 0
0 0 2 9F 0
0 0 2 9F 0
0 0 2 9F 0
0
0
0
0
0
0
1
1
0
0
0
0
1
1
1
1
1
1
1
1
*
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
3
3
3
3
1
1
1
1
1
1
1
1
3
3
3
3
-+
RA.16
00000000
00000000
3FEOOOOO
3FEOOOOO
-+
5.20
00000000
00000000
00000000
00000000
Table 39. Double-Precision Binary Square Root (Continued)
* *
*
;Lines 17-20
17
18
19
20
0
1
0
1
0
0
0
0
3CO
3CO
3CO
3CO
0
0
0
0
0
0
0
0
0
1
0
1
0
0
0
0
1 CO
1 CO
1CO
1CO
2
2
2
2
AF
AF
AF
AF
0
0
0
0
0
0
0
0
0
0
1
1
0
0
0
0
1
1
1
1
1
1
.1
1
*
;Lines 21-24
21
22
23
24
*
Calculations: 1/2
XO
(3-(B
XO 2)) -+ X 1
P.20 -+ P.24 -+ C.25, 0 -+ RA.20,
Operations: 5.20
RA.20 + RB.4 -+ 5.24
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Calculations: B
Xl
Operations: 5.24
P.24
0
0
0
0
0
0
0
0
2
2
2
2
AF
AF
AF
AF
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
*
1
1
1
1
*
;Lines 25-28
0
0
0
0
0
0
0
0
0
0
0
0
-+
0
0
0
0
Calculations: B
Xl 2
Operations: P.28
C.25
*
3
3
3
3
3
3
3
3
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
0
0
0
0
0
0
0
0
3
3
3
3
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
P.28
3
3
3
3
-+
1
1
1
1
1
1
1
1
P.32, 3
-+
RA.28
-+
5.32
"
(\')
25
26
27
28
0
1
0
1
1
0
0
0
3EO
3EO
3EO
3EO
0
0
0
0
0
0
0
0
2
2
2
2
6F
6F
6F
6F
0
0
0
0
0
0
0
0
0
0
1
1
0
0
0
0
1
1
1
1
1 0 0 0 0 3 1
1 000 0 3 1
1 0 0 0 0 3 1
1000031
1
1
1
1
3
3
3
3
00000000
00000000
40080000
40080000
00000000
00000000
00000000
00000000
$$m'' = 2/m - \text{delta}$$
where delta = 2^(-8)
So m" = 2/m, and e" = (-e) - 1.
Since IEEE exponents are represented in excess 1023 notation, a formula for X" must
be determined, given that X is the IEEE exponent. As an IEEE exponent,
X = e + 1023, so e = X - 1023 and X" = e" + 1023. So, for X" in terms of X,
$$\begin{aligned}
X'' &= e'' + 1023 \\
    &= (-e) - 1 + 1023 \\
    &= (-(X - 1023)) + 1022 \\
    &= 1023 - X + 1022 \\
    &= 2045 - X
\end{aligned}$$
So given the 11 bits of X as address of the seed exponent, the value stored at address
X is
$$X'' = 2045 - X \tag{2}$$
Given that the mantissa seed ROM uses 10 bits of the mantissa to determine the seed,
each seed Xm will be used for some range of mantissas, Bm to (Bm + 2 * delta).
The formula for Xm is from formula (1).
$$2/B_m \rightarrow X_m$$
$$2/(B_m + 2 \cdot \text{delta}) \rightarrow X_m$$
where delta = 2^(-11)
This value is used since the actual Xm should be generated by the mantissa in the
center of the given range:
$$X_m = 2/(B_m + \text{delta})$$
This would result in a more accurate seed on the average. Therefore, the formula used
to generate the mantissa part of the seed is
$$X_m = 2/(B_m + 2^{-11}) \tag{3}$$
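Formulas (2) and (3) together describe how a reciprocal seed could be assembled. The sketch below models that in Python under stated assumptions: the function names `divide_exponent_seed` and `divide_seed` are hypothetical, the input is a positive normalized number, and the mantissa is truncated to its top 10 fraction bits to stand in for the ROM address. It models the arithmetic only, not the PROM contents.

```python
import math


def divide_exponent_seed(x):
    """Formula (2): biased exponent of the seed from the biased exponent X of B."""
    return 2045 - x


def divide_seed(b):
    """Assemble a seed for 1/b from formulas (2) and (3); b must be positive and normal."""
    m, e = math.frexp(b)              # b = m * 2**e with 0.5 <= m < 1
    m, e = m * 2.0, e - 1             # IEEE-style mantissa: 1 <= m < 2
    x_seed = divide_exponent_seed(e + 1023)
    bm = math.floor(m * 1024) / 1024  # keep the top 10 fraction bits (ROM address)
    xm = 2.0 / (bm + 2.0 ** -11)      # formula (3): seed from the center of the range
    return xm * 2.0 ** (x_seed - 1023)


if __name__ == "__main__":
    print(divide_seed(7.0), 1.0 / 7.0)
```

For B = 7 the biased exponent is 1025, so the seed exponent is 1020 (hexadecimal 3FC), which agrees with the leading digits of the sample seed 3FC24000.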
Square Root PROMs
The seed for the square root, X0, is actually the reciprocal of the square root of the
data, B:
$$X_0 = 1 / B^{1/2}$$
Given B = m * 2^e and X0 = m' * 2^e', the expression for X0 can be evaluated
by substitution and reduction:
$$\begin{aligned}
X_0 &= 1 / (m \cdot 2^{e})^{1/2} \\
    &= 1 / (m^{1/2} \cdot 2^{e/2}) \\
    &= m^{-1/2} \cdot 2^{-e/2}
\end{aligned}$$
Then m' and e' may be written as m' = m^(-1/2) and e' = -e/2.
Next, it is necessary to verify that the above m' and e' form a valid normalized IEEE
number. When e is an odd number, e' is not an integer and, therefore, it is not a valid
IEEE exponent. If the above expression is separated into two cases, e' can be
represented in terms of a valid IEEE exponent, e":
$$e' = \begin{cases} -e/2 & \text{for } e \text{ even} \\ e'' + 1/2 & \text{for } e \text{ odd} \end{cases}$$
Rewriting e" in terms of e produces this expression:
$$e'' = e' - 1/2 = (-e/2) - 1/2 \qquad \text{for } e \text{ odd}$$
Then a valid IEEE exponent, e", can be written for all e as
$$e'' = \begin{cases} -e/2 & \text{for } e \text{ even} \\ (-e/2) - 1/2 & \text{for } e \text{ odd} \end{cases}$$
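This is equivalent to e" = int(-e/2) for all e, where int() rounds down to the next
lower integer. However, the 1/2 affects the mantissa:
$$\begin{aligned}
X_0 &= m' \cdot 2^{e'} \\
X_0 &= m' \cdot 2^{(e'' + 1/2)} && \text{for odd } e \\
X_0 &= m' \cdot 2^{1/2} \cdot 2^{e''} && \text{for odd } e
\end{aligned}$$
Since X0 = m" * 2^e", m" can be rewritten as
$$m'' = \begin{cases} m' & \text{for even } e \\ m' \cdot 2^{1/2} & \text{for odd } e \end{cases}$$
In terms of m,
$$m'' = \begin{cases} m^{-1/2} & \text{for even } e \\ m^{-1/2} \cdot 2^{1/2} & \text{for odd } e \end{cases}$$
Simplifying m" for odd e,
$$m'' = (1/m^{1/2}) \cdot 2^{1/2} = (2/m)^{1/2} \qquad \text{for odd } e$$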
Just as the divide exponent needed to be converted to excess 1023 notation, so the
same must be done for the square root:
$$\begin{aligned}
X'' &= e'' + 1023 \\
X &= e + 1023 \\
X'' &= \mathrm{int}(-e/2) + 1023 \\
X'' &= \mathrm{int}((1023 - X)/2) + 1023
\end{aligned}$$
The IEEE bits for the exponent seed, X", can be expressed in terms of the IEEE bits
for the exponent of B, X:
$$X'' = \mathrm{int}((1023 - X)/2) + 1023$$
Because the formula for m" depends on the least significant bit of e, that bit must
be used as an address line to the mantissa.
Since X = e + 1023, an odd value of e will result in an even value of X, and an even
value of e will result in an odd value of X. Therefore,
$$m'' = \begin{cases} m^{-1/2} & \text{for odd } X \\ (2/m)^{1/2} & \text{for even } X \end{cases}$$
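The exponent and mantissa rules for the square root seed can be combined in a short Python sketch. The function names `sqrt_exponent_seed` and `sqrt_seed` are hypothetical, the input is assumed to be a positive normalized number, and the mantissa is used at full precision rather than truncated to a ROM address, so the result is the ideal seed value rather than the stored one. Python's floor division `//` plays the role of int() here because it rounds toward minus infinity.

```python
import math


def sqrt_exponent_seed(x):
    """Biased exponent of the seed: X'' = int((1023 - X) / 2) + 1023."""
    return (1023 - x) // 2 + 1023     # floor division rounds down, as int() must


def sqrt_seed(b):
    """Ideal seed for 1/sqrt(b) built from the exponent and mantissa rules."""
    m, e = math.frexp(b)              # b = m * 2**e with 0.5 <= m < 1
    m, e = m * 2.0, e - 1             # IEEE-style mantissa: 1 <= m < 2
    x = e + 1023                      # biased exponent of b
    if x % 2 == 1:                    # odd X, that is, even e
        m_seed = m ** -0.5
    else:                             # even X, that is, odd e
        m_seed = (2.0 / m) ** 0.5
    return m_seed * 2.0 ** (sqrt_exponent_seed(x) - 1023)


if __name__ == "__main__":
    for value in (2.0, 4.0, 6.0, 0.5):
        print(value, sqrt_seed(value), 1.0 / math.sqrt(value))
```

The parity test on X is the software counterpart of routing the least significant exponent bit to the mantissa ROM as an extra address line.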
SN74ACT8841
Digital Crossbar Switch
The SN74ACT8841 is a single-chip digital crossbar switch that cost-effectively
eliminates bottlenecks to speed data through complex bus architectures.
The 'ACT8841 has 16 four-bit bidirectional ports which can be connected in
any conceivable combination. Total time for data transfer is 14-ns flowthrough.
The 'ACT8841 is ideal for multiprocessor applications, where memory bottlenecks
tend to occur. For example, four 32-bit buses can be easily connected by two
'ACT8841 devices. System architectures based on the 16-port 'ACT8841 can
include up to 16 switching nodes (i.e., processors, memories, or bus interfaces).
Larger processor arrays can be built with multistage interconnect schemes.
SN74ACT8841
DIGITAL CROSSBAR SWITCH
JUNE 1988
• High-Speed Programmable Switch for Parallel Processing Applications
• Dynamically Reconfigurable for Fault-Tolerant Routing
• 64 Bidirectional Data I/Os in 16 Nibble (Four-Bit) Groups
• Data I/O Selection Programmable by Nibble
• Eight Banks of Control Flip-Flops for Storing Configuration Programs
• Two Selectable Hard-Wired Switching Configurations
• Selectable Stored-Data or Real-Time Inputs
• 156-Pin Grid-Array Package
• CMOS 1-µm EPIC™ Process
• Single 5-V Power Supply
[GB PACKAGE (TOP VIEW): 156-pin grid array, rows A through P, columns 1 through 15]
description
The SN74ACT8841 is a flexible, high-speed digital crossbar switch. It is easily microprogrammable to
support user-definable interconnection patterns. This crossbar switch is especially suited to multiprocessor
interconnects that are dynamically reconfigurable or even reprogrammable after each system clock. The
'ACT8841 is built in Texas Instruments advanced 1-µm EPIC™ CMOS process to enhance performance
and reduce power consumption. The switch requires only a 5-V power supply.
Because the 'ACT8841 is a 16-port device, system architectures based on the 'ACT8841 can include
up to 16 switching nodes, which may be processors, data memories, or bus interfaces. Larger processor
arrays can be built with multistage interconnection schemes. Most applications will use the crossbar switch
as a broadband bus interface controller, for example, between closely coupled processors which must
exchange data with very low propagation delays.
The 'ACT8841 has ten selectable control sources, including eight banks of programmable control flip-flops
and two hard-wired control circuits. The device can switch from 1 to 16 nibbles (4 to 64 bits) of data
in a single cycle.
The 64 I/O pins of the 'ACT8841 are arranged in 16 switchable nibbles (see Figure 1). A single input nibble
can be broadcast to any combination of 15 output nibbles, or even to 16 nibbles (including itself) if operating
off registered data. Multiple input nibbles can be switched to multiple outputs, depending on the programmed
configurations available in the control flip-flops.
The digital crossbar switch is intended primarily for multiprocessor interconnection and parallel processing
applications. The device can be used to select and transfer data from multiple sources to multiple
destinations. Since it can be dynamically reprogrammed, it is suitable for use in reconfigurable networks
for fault-tolerant routing.
EPIC is a trademark of Texas Instruments Incorporated
PRODUCT PREVIEW documents contain information
on products in the formative or design phase of
development. Characteristic data and other
specifications are design goals. Texas Instruments
reserves the right to change or discontinue these
products without notice.
Copyright © 1988, Texas Instruments Incorporated
TEXAS INSTRUMENTS
POST OFFICE BOX 655012 • DALLAS, TEXAS 75265
The 'ACT8841 and the bipolar SN74AS8840 share the same architecture. Microcode for the 'AS8840
can be run on the 'ACT8841 if the additional control inputs to the 'ACT8841 are properly terminated.
However, because the 'ACT8841 is a CMOS device with six additional control inputs, the 'AS8840 and
the 'ACT8841 are not socket-compatible and cannot be used interchangeably. A summary of the differences
between the SN74AS8840 and the SN74ACT8841 is provided in the 'AS8840 and 'ACT8841
FUNCTIONAL COMPARISON at the end of the data sheet.
The SN74ACT8841 is characterized for operation from 0°C to 70°C.
Table 1. 'ACT8841 Pin Grid Allocation
PIN NO. and PIN NAME, listed by package row
Row A: A1 GND, A2 GND, A3 D37, A4 D35, A5 D33, A6 WE, A7 CRADR1, A8 CNTR7, A9 CNTR4, A10 OED7, A11 D29, A12 D27, A13 D25, A14 GND, A15 GND
Row B: B1 GND, B2 GND, B3 D39, B4 D36, B5 D34, B6 OED8, B7 CRADR0, B8 CRSRCE, B9 CNTR5, B10 D30, B11 D28, B12 D26, B13 D24, B14 GND, B15 GND
Row C: C1 D41, C2 D40, C3 GND, C4 D38, C5 OED9, C6 D32, C7 VCC, C8 CRCLK, C9 CNTR6, C10 D31, C11 OED6, C12 VCC, C13 GND, C14 D23, C15 D21
Row D: D1 D43, D2 D42, D3 VCC, D7 GND, D8 VCC, D9 GND, D13 D22, D14 D20, D15 D19
Row E: E1 D45, E2 D44, E3 OED10, E13 OED5, E14 D18, E15 D17
Row F: F1 OED11, F2 D46, F3 D47, F13 D16, F14 OED4, F15 CRSEL3
Row G: G1 CNTR8, G2 CNTR9, G3 CNTR10, G4 GND, G12 GND, G13 CRSEL2, G14 CRSEL1, G15 CRSEL0
Row H: H1 CNTR11, H2 SELDMS, H3 MSCLK, H4 VCC, H12 VCC, H13 LSCLK, H14 SELDLS, H15 CNTR3
Row J: J1 OEC, J2 CRWRITE0, J3 CRWRITE1, J4 GND, J12 GND, J13 CNTR2, J14 CNTR1, J15 CNTR0
Row K: K1 CRWRITE2, K2 OED12, K3 D48, K13 D15, K14 D14, K15 OED3
Row L: L1 D49, L2 D50, L3 OED13, L13 OED2, L14 D12, L15 D13
Row M: M1 D51, M2 D52, M3 D54, M7 GND, M8 VCC, M10 GND, M13 VCC, M14 D10, M15 D11
Row N: N1 D53, N2 D55, N3 GND, N4 VCC, N5 OED14, N6 D63, N7 CNTR13, N8 CREAD0, N9 VCC, N10 D0, N11 D3, N12 D6, N13 GND, N14 D8, N15 D9
Row P: P1 GND, P2 GND, P3 D56, P4 D58, P5 D60, P6 D62, P7 CNTR12, P8 CNTR15, P9 TP0, P10 OED0, P11 D2, P12 D4, P13 D7, P14 GND, P15 GND
Row R: R1 GND, R2 GND, R3 D57, R4 D59, R5 D61, R6 OED15, R7 CNTR14, R8 CREAD1, R9 CREAD2, R10 TP1, R11 D1, R12 OED1, R13 D5, R14 GND, R15 GND
Table 2. 'ACT8841 Pin Functional Description
PIN NAME (PIN NO.), I/O, DESCRIPTION

CNTR0 (J15), CNTR1 (J14), CNTR2 (J13), CNTR3 (H15), CNTR4 (A9), CNTR5 (B9), CNTR6 (C9), CNTR7 (A8),
CNTR8 (G1), CNTR9 (G2), CNTR10 (G3), CNTR11 (H1), CNTR12 (P7), CNTR13 (N7), CNTR14 (R7), CNTR15 (P8)
    I/O. Control I/O. Inputs four control words to the control flip-flops on each CRCLK cycle. As outputs,
    the same addresses can be used to read the flip-flop settings.

CRADR0 (B7), CRADR1 (A7)
    I. Control register address. Selects 16 bits of control flip-flops as a source/destination for
    outputs/inputs on CNTR0-CNTR15 (see Table 7).

CRCLK (C8)
    I. Control register clock. Clocks CNTR0-CNTR15 into the control flip-flops on a low-to-high transition.

CREAD0 (N8), CREAD1 (R8), CREAD2 (R9)
    I. Selects one of eight banks of control flip-flops to read out on CNTR0-CNTR15 in 16-bit words
    addressed by CRADR1-CRADR0.

CRSEL0 (G15), CRSEL1 (G14), CRSEL2 (G13), CRSEL3 (F15)
    I. Selects one of ten control configurations (see Table 4).

CRSRCE (B8)
    I. Load source select. When low, selects CNTR inputs; when high, selects DATA inputs.

CRWRITE0 (J2), CRWRITE1 (J3), CRWRITE2 (K1)
    I. Destination select. Selects one of eight control banks (see Table 6).

D0 (N10), D1 (R11), D2 (P11), D3 (N11), D4 (P12), D5 (R13), D6 (N12), D7 (P13),
D8 (N14), D9 (N15), D10 (M14), D11 (M15), D12 (L14), D13 (L15), D14 (K14), D15 (K13),
D16 (F13), D17 (E15), D18 (E14), D19 (D15), D20 (D14), D21 (C15), D22 (D13), D23 (C14),
D24 (B13), D25 (A13), D26 (B12), D27 (A12), D28 (B11), D29 (A11), D30 (B10), D31 (C10)
    I/O. I/O data bits 0 through 31 (data bits 0 through 31 are the least significant half).

D32 (C6), D33 (A5), D34 (B5), D35 (A4), D36 (B4), D37 (A3), D38 (C4), D39 (B3),
D40 (C2), D41 (C1), D42 (D2), D43 (D1), D44 (E2), D45 (E1), D46 (F2), D47 (F3),
D48 (K3), D49 (L1), D50 (L2), D51 (M1), D52 (M2), D53 (N1), D54 (M3), D55 (N2),
D56 (P3), D57 (R3), D58 (P4), D59 (R4), D60 (P5), D61 (R5), D62 (P6), D63 (N6)
    I/O. I/O data bits 32 through 63 (data bits 32 through 63 are the most significant half).

GND (A1, A2, A14, A15, B1, B2, B14, B15, C3, C13, D7, D9, G4, G12, J4, J12, M7, M10, N3, N13,
P1, P2, P14, P15, R1, R2, R14, R15)
    Ground (all pins must be used).

LSCLK (H13)
    I. Clocks the least significant half of data inputs into the input registers on a low-to-high transition.

MSCLK (H3)
    I. Clocks the most significant half of data inputs into the input registers on a low-to-high transition.

OEC (J1)
    I. Output enable for control flip-flops, active low.

OED0 (P10), OED1 (R12), OED2 (L13), OED3 (K15), OED4 (F14), OED5 (E13), OED6 (C11), OED7 (A10),
OED8 (B6), OED9 (C5), OED10 (E3), OED11 (F1), OED12 (K2), OED13 (L3), OED14 (N5), OED15 (R6)
    I. Output enables for data nibbles, active low.

SELDLS (H14)
    I. When low, selects the stored, least significant data input to the main internal bus. When high,
    real-time data is selected.

SELDMS (H2)
    I. When low, selects the stored, most significant data input to the main internal bus. When high,
    real-time data is selected.

TP0 (P9), TP1 (R10)
    I. Test pins. High during normal operation (see Table 8).

VCC (C7, C12, D3, D8, H4, H12, M8, M13, N4, N9)
    5-V supply.

WE (A6)
    I. Write enable for control flip-flops, active low.
overview
The 64 I/O pins of the 'ACT8841 are arranged in 16 nibble (four-bit) groups, where each set of four pins
serves as bidirectional inputs to and outputs from a nibble multiplexer. During a switching operation, each
nibble passes four bits of either stored or real-time data to the main internal 64-bit data bus. Each output
multiplexer independently selects one of the 16 nibbles from this 64-bit data bus.
Data nibbles are organized into two groups: the least significant half (D31-D0) and the most significant
half (D63-D32). Stored versus real-time data inputs can be selected separately for the LSH and the MSH.
Two clock inputs, LSCLK and MSCLK, are available to latch LSH and MSH data inputs, respectively, into
the data register.
The pattern of output nibbles resulting from the switching operation is determined by a selectable control
source, either one of eight banks of programmable control flip-flops or one of two hard-wired switching
configurations. Inputs to the control flip-flops can be loaded either from the data bus or from the control I/Os.
A separate clock (CRCLK) is provided for loading the banks of control flip-flops.
logic symbol
[FIGURE 1. Logic symbol of the 'ACT8841 digital crossbar switch. Control section: WE, CREAD2-CREAD0 (read select), CRWRITE2-CRWRITE0 (destination select), CRCLK, CRSRCE (load), CRSEL3-CRSEL0, CRADR1-CRADR0 (control register address), OEC, and CNTR15-CNTR0 in four groups of four. Data section: LSCLK, MSCLK, SELDLS, SELDMS, TP1-TP0, and 16 nibble multiplexers numbered 0 through 15 (0-7 in the LSH, 8-15 in the MSH), each with its own output enable (OED0-OED15) and four data I/Os (D3-D0 through D63-D60).]
architecture
The 'ACT8841 digital crossbar switch has its 64 data I/Os arranged in 16 multiplexer logic blocks, as shown
in Figure 2. Each nibble multiplexer logic block handles four bits of real-time input and four bits of stored-data
input, and either input can be passed to the common data bus.
Two input multiplexer controls are provided to select between stored and real-time inputs. SELDLS controls
input data selection for the LSH (D31-D0) of the 64-bit data input, and SELDMS for the MSH (D63-D32).
The input register clocks, LSCLK and MSCLK, are grouped in the same way and are used to clock data
into the registers in the multiplexer logic blocks. The 16 data input nibbles make up the 64 data bits on
the internal main bus.
This common bus supplies 16 data nibbles to a 16-to-1 output multiplexer in each multiplexer logic block
(see Figure 3). As determined by one of ten selectable control sources, the 16-to-1 output multiplexer
selects a data nibble to send to the outputs via the three-state output driver.
Control of the input and output multiplexers determines the input-to-output pattern for the entire crossbar
switch. Many different switching combinations can be set up by programming the control flip-flop
configurations to determine the outputs from the 16-to-1 multiplexers.
For example, the switch can be programmed to broadcast one data input nibble through the other 15 nibbles
(60 outputs). Conversely, a 15-to-1 nibble multiplexer can be configured by programming the switch to
select and output a single data nibble from the 64-bit bus. Several examples are described in more detail
in a later section.
functional block diagram
[FIGURE 2. Functional block diagram (signals legible in the diagram include SELDLS and OEC).]
[FIGURE 3. DATA NIBBLE MULTIPLEXER LOGIC. Signals shown: SELDLS or SELDMS, LSCLK or MSCLK, OEDx, the 64-bit data bus, the nibble from the data bus, data I/Os Dxx-Dxx, CRSEL3-CRSEL0, CREAD2-CREAD0, CRCLK, CRSRCE, the control flip-flop nibble input, and the output control nibble.]
multiplexer logic group
There are 16 multiplexer logic blocks, one for each nibble. External data flows from four data I/O pins
into a logic block. A block diagram of the multiplexer logic is shown in Figure 3. The data inputs are either
clocked into the data register or passed directly to the main internal bus. The 64 bits of data from the
main bus are presented to a 16-to-1 multiplexer, which selects the data nibble output.
Each of the 16 nibble multiplexer logic blocks contains eight control flip-flop (CF) groups, one for each
of the control banks. A control bank stores one complete switching configuration. Each CF group consists
of four D-type edge-triggered flip-flops. In Figure 3, the CF groups are shown as CFXX0 to CFXX7, where
XX indicates the number of the nibble multiplexer logic group (0 <= XX <= 15). CFXX0 represents the
16 CF groups (one from each logic block) that make up flip-flop control bank 0, CFXX1 the 16 CF groups
in bank 1, etc.
In addition to the eight banks of programmable flip-flops, two hard-wired switching configurations can
be selected. The MSH/LSH exchange directs the input nibbles from each half of the switch to the data
outputs directly opposite. This switching pattern is shown in Table 3 below. For example, data input on
D11-D8 is output on D43-D40, and data input on D43-D40 is output on D11-D8.
Table 3. MSH/LSH Exchange
LSH        MSH
D3-D0      D35-D32
D7-D4      D39-D36
D11-D8     D43-D40
D15-D12    D47-D44
D19-D16    D51-D48
D23-D20    D55-D52
D27-D24    D59-D56
D31-D28    D63-D60
The second hard-wired configuration, a read-back function, causes all 64 bits to be output on the same
I/Os on which they were input. Neither of the hard-wired control configurations affects the contents of
the control banks.
The control source select, CRSEL3-CRSEL0, determines which switching pattern is selected, as shown
in Table 4.
Table 4. 16-to-1 Output Multiplexer Control Source Selects
CRSEL3  CRSEL2  CRSEL1  CRSEL0  CONTROL SOURCE SELECTED
L       L       L       L       Control bank 0 (programmable)
L       L       L       H       Control bank 1 (programmable)
L       L       H       L       Control bank 2 (programmable)
L       L       H       H       Control bank 3 (programmable)
L       H       L       L       Control bank 4 (programmable)
L       H       L       H       Control bank 5 (programmable)
L       H       H       L       Control bank 6 (programmable)
L       H       H       H       Control bank 7 (programmable)
H       X       X       L       MSH/LSH exchange*
H       X       X       H       Read-back (output echoes input)*
*Hard-wired switching configuration
X = don't care
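The decoding in Table 4 can be sketched in a few lines of Python. This is only an illustrative model, not part of the device documentation: the function names `control_source` and `exchange_partner` are hypothetical, and the 4-bit integer argument is assumed to hold CRSEL3 in bit 3 down to CRSEL0 in bit 0.

```python
def control_source(crsel: int) -> str:
    """Decode CRSEL3-CRSEL0 (bit 3 = CRSEL3, bit 0 = CRSEL0) per Table 4."""
    if not 0 <= crsel <= 0b1111:
        raise ValueError("CRSEL is a 4-bit field")
    if crsel & 0b1000:
        # CRSEL3 high: hard-wired configuration; CRSEL2 and CRSEL1 are don't-care.
        return "read-back" if crsel & 0b0001 else "MSH/LSH exchange"
    # CRSEL3 low: CRSEL2-CRSEL0 give the programmable bank number.
    return f"control bank {crsel & 0b0111}"


def exchange_partner(nibble: int) -> int:
    """Nibble directly opposite in the other half (Table 3): 0<->8, 1<->9, ..."""
    if not 0 <= nibble <= 15:
        raise ValueError("nibble index must be 0..15")
    return nibble ^ 8


print(control_source(0b0101))   # control bank 5
print(control_source(0b1110))   # MSH/LSH exchange
print(exchange_partner(2))      # 10, i.e. D11-D8 pairs with D43-D40
```

Treating the hard-wired exchange as a flip of the top bit of the nibble index mirrors the pairing listed in Table 3.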
control words
A CF group can store a four-bit control word (CFN3-CFN0) to select the output of the 16-to-1 multiplexer
for that nibble port. One control word is loaded in each CF group. A total of 16 words, one per multiplexer
logic block, are loaded in a bank to configure one complete switching pattern. Table 5 lists the control
words and the input data each selects.
Each control word can be stored in a CF group and sent as an internal control signal to select the output
of a 16-to-1 multiplexer in a nibble logic block. For example, any CF group loaded with the word "LHHH"
will select the data input on D31-D28 as the outputs of the associated nibble. If all 16 CF groups in a
bank were loaded with "LHHH," the same input (D31-D28) would be output by every nibble of the switch.
Table 5. 16-to-1 Output Multiplexer Control Words
INTERNAL SIGNALS              INPUT DATA SELECTED AS
CFN3  CFN2  CFN1  CFN0        MULTIPLEXER OUTPUT
L     L     L     L           D3-D0
L     L     L     H           D7-D4
L     L     H     L           D11-D8
L     L     H     H           D15-D12
L     H     L     L           D19-D16
L     H     L     H           D23-D20
L     H     H     L           D27-D24
L     H     H     H           D31-D28
H     L     L     L           D35-D32
H     L     L     H           D39-D36
H     L     H     L           D43-D40
H     L     H     H           D47-D44
H     H     L     L           D51-D48
H     H     L     H           D55-D52
H     H     H     L           D59-D56
H     H     H     H           D63-D60
loading control configurations
CRWRITE2-CRWRITE0 select which control bank is being loaded, as shown in Table 6.
Table 6. Control Flip-Flops Load Destination Select
CRWRITE2  CRWRITE1  CRWRITE0  DESTINATION
L         L         L         Control bank 0
L         L         H         Control bank 1
L         H         L         Control bank 2
L         H         H         Control bank 3
H         L         L         Control bank 4
H         L         H         Control bank 5
H         H         L         Control bank 6
H         H         H         Control bank 7
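As a rough behavioral model of the control words in Table 5, the sketch below treats a control bank as a list of 16 four-bit words, one per output nibble, where each word is simply the index of the input nibble to route. The names `nibble_pins` and `route` are hypothetical and are not taken from the data sheet; the model ignores output enables, registers, and timing.

```python
def nibble_pins(n: int) -> str:
    """Data pins that make up nibble n, e.g. nibble 7 -> 'D31-D28'."""
    if not 0 <= n <= 15:
        raise ValueError("nibble index must be 0..15")
    return f"D{4 * n + 3}-D{4 * n}"


def route(inputs: list, bank: list) -> list:
    """Apply one control bank: output nibble k carries inputs[bank[k]].

    inputs: 16 four-bit values present on the internal data bus.
    bank:   16 control words (CFN3-CFN0 as integers 0..15), one per output nibble.
    """
    if len(inputs) != 16 or len(bank) != 16:
        raise ValueError("expected 16 input nibbles and 16 control words")
    return [inputs[word & 0xF] for word in bank]


# Every CF group loaded with LHHH (0b0111) selects the input on D31-D28.
bus = list(range(16))            # nibble k carries the value k, for illustration
outputs = route(bus, [0b0111] * 16)
print(nibble_pins(0b0111))       # D31-D28
print(outputs)                   # sixteen copies of 7
```

The control word value equals the index of the selected input nibble, which is why the rows of Table 5 count up in binary.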
The control words for a bank can be loaded either 16 bits at a time on the control I/O pins (CNTR15-CNTR0)
or all 64 bits at once on the data inputs (D63-D0). If the control load source select, CRSRCE, is high, the
words are loaded from the data inputs. When CRSRCE = L, the CNTR inputs are used.
When a control bank is loaded from the data inputs, WE, CRSRCE, CRWRITE2-CRWRITE0, and the control
register clock CRCLK are used in combination to load all 16 control words (64 bits) in a single cycle. An
MSH/LSH exchange like that shown in Table 3 is used to load the flip-flops on a rising CRCLK clock edge.
For example, data inputs D3-D0 go to the data bus and then to the CF group that selects the data outputs
for D35-D32. CRWRITE2-CRWRITE0 select the control bank that is loaded (see Table 6).
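Because the data-input load path crosses the two halves, the 64-bit word presented on D63-D0 is not a straight concatenation of the control words. The following sketch builds that word from a list of 16 control words. It is a minimal illustration under two assumptions that the text does not state explicitly: the function name `data_word_for_bank` is hypothetical, and within a nibble the highest-numbered data pin is assumed to feed CFN3.

```python
def data_word_for_bank(bank: list) -> int:
    """Build the 64-bit value to drive on D63-D0 when CRSRCE is high.

    bank[k] is the control word wanted in the CF group of output nibble k.
    The load path swaps halves, so that word must be presented on the data
    nibble directly opposite, (k + 8) mod 16.
    Assumption: bit 3 of each control word goes on the highest pin of the nibble.
    """
    if len(bank) != 16:
        raise ValueError("expected 16 control words")
    word = 0
    for cf_group, control_word in enumerate(bank):
        source_nibble = (cf_group + 8) % 16
        word |= (control_word & 0xF) << (4 * source_nibble)
    return word


# An MSH/LSH exchange pattern: output nibble k selects input nibble (k + 8) mod 16.
exchange = [(k + 8) % 16 for k in range(16)]
print(hex(data_word_for_bank(exchange)))   # 0xfedcba9876543210
```

For the exchange pattern the two swaps cancel, so each data nibble simply carries its own index.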
The CNTR15-CNTR0 inputs can also be used to load the control banks. The bank is selected by
CRWRITE2-CRWRITE0 (see Table 6). Four control words per CRCLK cycle can be input to the CF groups
(CFXX) that make up the bank. The CF groups loaded are selected by CRADR1-CRADR0, as shown in
Table 7. Four CRCLK cycles are needed to load an entire control bank.
Table 7. Loading Control Flip-Flops from CNTR I/Os
                               CF GROUPS LOADED BY CONTROL (CNTR) I/O NUMBERS
WE  CRADR1  CRADR0  CRCLK      15-12   11-8    7-4     3-0
L   L       L       ↑          CF12    CF8     CF4     CF0
L   L       H       ↑          CF13    CF9     CF5     CF1
L   H       L       ↑          CF14    CF10    CF6     CF2
L   H       H       ↑          CF15    CF11    CF7     CF3
H   X       X       X          Inhibit write to flip-flops
↑ = low-to-high transition
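The addressing in Table 7 interleaves the CF groups across the four CNTR fields, which is easy to get wrong by hand. The sketch below packs a 16-word bank into the four 16-bit words to present on CNTR15-CNTR0 (one per CRADR1-CRADR0 address) and unpacks them again, as would be needed when reading a bank back. The function names `pack_bank` and `unpack_bank` are hypothetical helpers, not anything defined by the data sheet.

```python
def pack_bank(bank: list) -> list:
    """Return the four CNTR15-CNTR0 words for CRADR = 0, 1, 2, 3.

    At address a, CNTR3-0 carries CF(a), CNTR7-4 carries CF(4 + a),
    CNTR11-8 carries CF(8 + a), and CNTR15-12 carries CF(12 + a).
    """
    if len(bank) != 16:
        raise ValueError("expected 16 control words")
    words = []
    for addr in range(4):
        word = 0
        for field in range(4):              # field 0 is CNTR3-0
            word |= (bank[4 * field + addr] & 0xF) << (4 * field)
        words.append(word)
    return words


def unpack_bank(words: list) -> list:
    """Inverse of pack_bank: recover the 16 control words from four reads."""
    if len(words) != 4:
        raise ValueError("expected 4 CNTR words")
    bank = [0] * 16
    for addr, word in enumerate(words):
        for field in range(4):
            bank[4 * field + addr] = (word >> (4 * field)) & 0xF
    return bank


exchange = [(k + 8) % 16 for k in range(16)]
print([format(w, "016b") for w in pack_bank(exchange)])
```

For the MSH/LSH exchange pattern the first packed word is 0100 0000 1100 1000, which matches the first CNTR word listed in Example 5.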
To read out the control settings, the same address signals can be used, except that no CRCLK signal is
needed and OEC is pulled low. CREAD2-CREAD0 select the bank to be read; the format is the same as
for CRWRITE2-CRWRITE0, shown in Table 6.
Using the control I/Os to read the control bank settings can be valuable during debugging or diagnostics.
Control settings are volatile and will be lost if the 'ACT8841 is powered off. An external program controlling
switch operation may need to read the control bank settings so that it can save and restore the current
switching configurations.
test pins
TP1-TP0 test pins are provided for system testing. As Table 8 shows, these pins should be maintained
high during normal operation. To force all outputs and I/Os low, low signals are placed on TP1-TP0 and
all output enables (OED15-OED0 and OEC). To force all outputs and I/Os high, TP1 and all output enables
are pulled low, and TP0 is driven high. When TP0 is left low and a high signal is placed on TP1, all outputs
on the 'ACT8841 are placed in a high-impedance state, isolating the chip from the rest of the system.
Table 8. Test Pin Inputs
TP1  TP0  OED15-OED0  OEC  RESULT
L    L    L           L    All outputs and I/Os forced low
L    H    L           L    All outputs and I/Os forced high
H    L    X           X    All outputs placed in a high-impedance state
H    H    X           X    Normal operation (default state)
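For test software that drives these pins, Table 8 can be captured as a small lookup. The sketch below is illustrative only; `decode_test_pins` is a hypothetical name, and the "unspecified" result covers pin combinations that Table 8 does not list (TP1 low while some output enable is high).

```python
def decode_test_pins(tp1: int, tp0: int, all_enables_low: bool) -> str:
    """Return the Table 8 result for the given TP1/TP0 levels.

    all_enables_low is True when OED15-OED0 and OEC are all driven low.
    """
    if tp1 and tp0:
        return "normal operation"
    if tp1 and not tp0:
        return "all outputs high-impedance"
    if not all_enables_low:
        return "unspecified"      # combination not listed in Table 8
    return "all outputs forced high" if tp0 else "all outputs forced low"


print(decode_test_pins(1, 1, False))   # normal operation
print(decode_test_pins(0, 1, True))    # all outputs forced high
```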
examples
Most 'ACT8841 switch configurations are straightforward to program, involving few control signals and
procedures to set up the control words in the banks of flip-flops. Control signals and procedures for loading
and using control words are shown in the following examples.
broadcasting a nibble
Any of the 16 data input nibbles can be broadcast to the other 15 data nibbles for output. For ease of
presentation, input nibble D63-D60 is used in this example. Example 1 presents the microcode sequence
for loading flip-flop bank 0 and executing the nibble broadcast.
The low signal on CRSRCE selects CNTR15-CNTR0 as the input source, and the low signals on
CRWRITE2-CRWRITE0 select flip-flop bank 0 as the destination. Table 5 shows that to select data on
D63-D60 as the output nibble, the four bits in the control word CFN3-CFN0 must be high; therefore the
CNTR15-CNTR0 inputs are coded high. The first four microcode instructions shown in Example 1 load the same
control word from CNTR15-CNTR0 into all 16 CF groups of bank 0.
Once the control flip-flops have been loaded, the switch can be used to broadcast nibble D63-D60 as
programmed. The microcode instruction to execute the broadcast is shown as the last instruction in
Example 1. WE is held high and the data to be broadcast is input on D63-D60. The high signal on SELDMS
selects a real-time data input for the broadcast. MSCLK and LSCLK (not shown) can be used to load the
input registers if the input nibble is to be retained. No register clock signals are needed if the input data
is not being stored.
The banks of control flip-flops not selected as a control source can be loaded with new control words
or read out on CNTR15-CNTR0 while the switch is operating. For example, the MSH data inputs can be
used to load flip-flop bank 1 of the LSH while bank 0 of the LSH is controlling data I/O.
Example 1. Programming a Nibble Broadcast
INST.          CRWRITE  CRADR  CNTR I/O NUMBERS      CRSEL                                        SELDMS
NO.   CRSRCE   2 1 0    1 0    15-12 11-8 7-4  3-0   3 2 1 0  WE  OEC  OED15-OED0           SELDLS  CRCLK
1     0        0 0 0    0 0    1111  1111 1111 1111  X X X X  0   1    XXXX XXXX XXXX XXXX  X  X    ↑
2     0        0 0 0    0 1    1111  1111 1111 1111  X X X X  0   1    XXXX XXXX XXXX XXXX  X  X    ↑
3     0        0 0 0    1 0    1111  1111 1111 1111  X X X X  0   1    XXXX XXXX XXXX XXXX  X  X    ↑
4     0        0 0 0    1 1    1111  1111 1111 1111  X X X X  0   1    XXXX XXXX XXXX XXXX  X  X    ↑
5     X        X X X    X X    XXXX  XXXX XXXX XXXX  0 0 0 0  1   1    1000 0000 0000 0000  1  X    None
Comments
1  Loads CF12, CF8, CF4, CF0 of bank 0
2  Loads CF13, CF9, CF5, CF1 of bank 0
3  Loads CF14, CF10, CF6, CF2 of bank 0
4  Loads CF15, CF11, CF7, CF3 of bank 0
5  Selects bank 0 for switching control. Selects real-time data inputs.

Example 2. Programming an MSH/LSH Exchange on CNTR Inputs
INST.          CRWRITE  CRADR  CNTR I/O NUMBERS      CRSEL                                        SELDMS
NO.   CRSRCE   2 1 0    1 0    15-12 11-8 7-4  3-0   3 2 1 0  WE  OEC  OED15-OED0           SELDLS  CRCLK
1     0        1 1 1    0 0    0100  0000 1100 1000  X X X X  0   1    XXXX XXXX XXXX XXXX  X  X    ↑
2     0        1 1 1    0 1    0101  0001 1101 1001  X X X X  0   1    XXXX XXXX XXXX XXXX  X  X    ↑
3     0        1 1 1    1 0    0110  0010 1110 1010  X X X X  0   1    XXXX XXXX XXXX XXXX  X  X    ↑
4     0        1 1 1    1 1    0111  0011 1111 1011  X X X X  0   1    XXXX XXXX XXXX XXXX  X  X    ↑
5     X        X X X    X X    XXXX  XXXX XXXX XXXX  0 1 1 1  1   1    0000 0000 0000 0000  0  0    None
Comments
1  Loads CF12, CF8, CF4, CF0 of bank 7
2  Loads CF13, CF9, CF5, CF1 of bank 7
3  Loads CF14, CF10, CF6, CF2 of bank 7
4  Loads CF15, CF11, CF7, CF3 of bank 7
5  Selects bank 7 for switching control. Selects registered data inputs.

X = don't care, ↑ = low-to-high transition
programming an MSH/LSH exchange
A second, more complicated example involves programming the switch to swap corresponding nibbles
between the MSH and the LSH (first nibble in the LSH for first nibble in the MSH, and so on). This swap
can be implemented using the hard-wired logic circuit selected when CRSEL3 is high and CRSEL0 is low.
Programming this swap without using the MSH/LSH exchange logic requires loading a different control
word into each mux logic block. This is described below for purposes of illustration.
Each nibble in one half, either LSH or MSH, selects as output the registered data from the corresponding
nibble in the other half. The registered data from D35-D32 is to be output on D3-D0, the registered data
from D3-D0 is output on D35-D32, and so on for the remaining nibbles. As shown in Table 5, the flip-flops
for D3-D0 have to be set to 1000 and the flip-flop inputs for D35-D32 must all be low (0000). The CF groups
and control words involved in this switching pattern are listed in Table 9.
Table 9. Control Words for an MSH/LSH Exchange
CF      CNTR INPUTS TO      CONTROL WORD   RESULTS
GROUP   LOAD FLIP-FLOPS     LOADED         (INPUT -> OUTPUT)
CF15    CNTR15-CNTR12       0111           D31-D28 -> D63-D60
CF14    CNTR15-CNTR12       0110           D27-D24 -> D59-D56
CF13    CNTR15-CNTR12       0101           D23-D20 -> D55-D52
CF12    CNTR15-CNTR12       0100           D19-D16 -> D51-D48
CF11    CNTR11-CNTR8        0011           D15-D12 -> D47-D44
CF10    CNTR11-CNTR8        0010           D11-D8 -> D43-D40
CF9     CNTR11-CNTR8        0001           D7-D4 -> D39-D36
CF8     CNTR11-CNTR8        0000           D3-D0 -> D35-D32
CF7     CNTR7-CNTR4         1111           D63-D60 -> D31-D28
CF6     CNTR7-CNTR4         1110           D59-D56 -> D27-D24
CF5     CNTR7-CNTR4         1101           D55-D52 -> D23-D20
CF4     CNTR7-CNTR4         1100           D51-D48 -> D19-D16
CF3     CNTR3-CNTR0         1011           D47-D44 -> D15-D12
CF2     CNTR3-CNTR0         1010           D43-D40 -> D11-D8
CF1     CNTR3-CNTR0         1001           D39-D36 -> D7-D4
CF0     CNTR3-CNTR0         1000           D35-D32 -> D3-D0
With this list of control words and the signals in Table 7, the 16-bit control inputs on CNTR15-CNTR0
can be arranged to load the control flip-flops in four cycles. Example 2 shows the microcode instructions
for loading the control words and executing the exchange.
In Example 2, bank 7 of flip-flops is being programmed. Bank 7 is selected by taking CRWRITE2-CRWRITE0
high (see Table 6) and leaving CRSRCE low when the control words are loaded on CNTR15-CNTR0. With
WE held low, CRCLK is used to load the four sets of control words. Once the flip-flops are loaded,
data can be input on D63-D0 and the programmed pattern of output selection can be executed. A
microinstruction to select registered data inputs and bank 7 as the control source is shown as the last
instruction in Example 2. The data must be clocked into the input registers, using LSCLK and MSCLK,
before the last instruction is executed.
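The rows of the exchange table follow a simple rule, so they can be generated rather than typed. The sketch below derives, for each CF group, the CNTR field that loads it, the control word, and the resulting input-to-output routing. The helper names `pins` and `exchange_rows` are hypothetical and exist only for this illustration.

```python
def pins(nibble: int) -> str:
    """Data pins of a nibble, e.g. 8 -> 'D35-D32'."""
    return f"D{4 * nibble + 3}-D{4 * nibble}"


def exchange_rows() -> list:
    """One row per CF group, CF15 first: (group, CNTR field, word, input, output)."""
    rows = []
    for cf in range(15, -1, -1):
        source = (cf + 8) % 16       # input nibble in the opposite half
        field = cf // 4              # CF12-CF15 share CNTR15-12, CF8-CF11 share CNTR11-8, ...
        cntr = f"CNTR{4 * field + 3}-CNTR{4 * field}"
        rows.append((f"CF{cf}", cntr, format(source, "04b"), pins(source), pins(cf)))
    return rows


for row in exchange_rows():
    print("{:5} {:14} {}  {} -> {}".format(*row))
```

The CRADR1-CRADR0 address at which each group is written is the group number modulo 4, which is how the 16 words fall into the four load instructions.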
The control flip-flops could also have been loaded from the data input nibbles in one CRCLK cycle. Input
nibbles from one half are mapped onto the control flip-flops of the other half. All control words to set up
a switching pattern should be loaded before the bank of flip-flops is selected as control source. The
microcode instruction to load bank 1 with the 16 control words in one cycle is presented in Example 3.
Example 3. Loading the MSH/LSH Exchange from Data Inputs
CRSRCE  CRWRITE2  CRWRITE1  CRWRITE0  WE  SELDMS  SELDLS  OED15-OED0           CRCLK
1       0         0         1         0   1       1       1111 1111 1111 1111  ↑
These control nibbles may be loaded from the input as a 64-bit real-time input word or as two 32-bit words
stored previously. To use stored control words, MSCLK and LSCLK are used to load the MSH and LSH
input registers with the correct sequence of control nibbles. Whenever the flip-flops are loaded from the
data inputs, all 64 bits of control data must be present when CRCLK is used so that all control nibbles
in a program are loaded simultaneously. Example 4 presents the three microcode instructions to load the
MSH and LSH input registers and then to pass the registered data to flip-flop bank 2.
Example 4. Loading Control Flip-Flops from Input Registers
INST.                                                               OED15-
NO.   CRSRCE  CRWRITE2  CRWRITE1  CRWRITE0  WE  SELDMS  SELDLS  OED0    CRCLK  MSCLK  LSCLK  COMMENTS
1     X       X         X         X         1   X       X       1       None   ↑      None   Load inputs D63-D32
2     X       X         X         X         1   X       X       1       None   None   ↑      Load inputs D31-D0
3     1       0         1         0         0   0       0       1       ↑      None   None   Load control bank 2
The control words in a program can also be read back from the flip-flops using the CNTR outputs. Four
instructions are necessary to read the 64 bits in a bank of flip-flops out on CNTR15-CNTR0. WE is held
high and OEC is taken low. No CRCLK signal is required. CREAD2-CREAD0 select bank 2 of flip-flops,
and CRADR1-CRADR0 select in sequence the four addresses of the 16-bit words to be read out on the
CNTR outputs. Example 5 shows the four microcode instructions.
Example 5. Reading Control Settings on CNTR Outputs
INST.                                                     CNTR I/O NUMBERS
NO.   CREAD2  CREAD1  CREAD0  OEC  CRADR1  CRADR0  WE     15-12 11-8 7-4  3-0    COMMENT
1     0       1       0       0    0       0       1      0100  0000 1100 1000   Read CF12, CF8, CF4, CF0
2     0       1       0       0    0       1       1      0101  0001 1101 1001   Read CF13, CF9, CF5, CF1
3     0       1       0       0    1       0       1      0110  0010 1110 1010   Read CF14, CF10, CF6, CF2
4     0       1       0       0    1       1       1      0111  0011 1111 1011   Read CF15, CF11, CF7, CF3
absolute maximum ratings over operating free-air temperature range (unless otherwise noted)†
Supply voltage, VCC .......................................... -0.5 V to 6 V
Input clamp current, IIK (VI < 0 or VI > VCC) ................ ±20 mA
Output clamp current, IOK (VO < 0 or VO > VCC) ............... ±50 mA
Continuous output current, IO (VO = 0 to VCC) ................ ±50 mA
Continuous current through VCC or GND pins ................... ±100 mA
Operating free-air temperature range ......................... 0°C to 70°C
Storage temperature range .................................... -65°C to 150°C
†Stresses beyond those listed under "absolute maximum ratings" may cause permanent damage to the device. These are stress ratings
only and functional operation of the device at these or any other conditions beyond those indicated under "recommended operating
conditions" is not implied. Exposure to absolute-maximum-rated conditions for extended periods may affect device reliability.
recommended operating conditions
PARAMETER                                  MIN   NOM   MAX   UNIT
VCC    Supply voltage                      4.5   5.0   5.5   V
VIH    High-level input voltage            2           VCC   V
VIL    Low-level input voltage             0           0.8   V
IOH    High-level output current                       -8    mA
IOL    Low-level output current                        8     mA
VI     Input voltage                       0           VCC   V
VO     Output voltage                      0           VCC   V
dt/dv  Input transition rise or fall rate  0           15    ns/V
TA     Operating free-air temperature      0           70    °C
electrical characteristics over recommended operating free-air temperature range (unless otherwise
noted)
PARAMETER
TEST CONDITIONS
10H
~
- 20 ,A
~
-8 mA
VOH
10H
~
IOL
20 ,A
VOL
10L ~ 8 mA
10Z
Vo ~ Vee or 0
II
VI = Vee or 0
VI - Vee or 0, 10
lee
el
tThis is the increase
VI
In
~
Vee or 0
vee
25·C
TA MIN
TYP
MAX
4.5 V
MIN
TYP
MAX
5.5 V
5.4
4.5 V
3.8
3.7
5.5 V
4.8
4.7
V
4.5 V
0.1
5.5 V
0.1
4.5 V
0.32
0.4
5.5 V
0.32
0.4
5V
5.5 V
UNIT
4.4
±O.5
0.1
5.5 V
V
±O.S
,A
±1
,A
100
"A
pF
5V
supply current for each input that is at one of the specified TTL voltage levels rather than 0 V or
Vee.
switching characteristics over recommended ranges of supply voltage and operating free-air temperature
(unless otherwise noted)
PARAMETER  FROM             TO            MIN  TYP†  MAX  UNIT
tpd        Data in          Data out           7     14   ns
           MSCLK, LSCLK     Data out           10    18
           SELDMS, SELDLS   Data out           9     15
           CRCLK            Data out           12    19
           CRSEL3-CRSEL0    Data out           12    19
           CRCLK            CNTRn              10    18
           CREAD2-CREAD0    CNTRn              10    18
           CRADR1, CRADR0   CNTRn              8     16
           TP1, TP0         All outputs        10    19
ten        TP1, TP0         All outputs        10    15   ns
           OED              Data out           7     12
           OEC              CNTRn              8     14
tdis       TP1, TP0         All outputs        10    15   ns
           OED              Data out           5     8
           OEC              CNTRn              6     10
†All typical values are at VCC = 5 V, TA = 25°C.
timing requirements over recommended ranges of supply voltage and operating free-air temperature
(unless otherwise noted)
PARAMETER
'w
Pulse duration
MIN
LSCLK. MSCLK. CRCLK h;gh or low
7
Data
7
7
CNTRn
'su
Setup time before CRClK
SELDMS. SELDLS
9
CRADR1.CRADRO
8
CRSRCE. CRWRITE2-CRWRITEO
8
LSCLK. MSCLK
'su
'h
Hold time after CRCLK
C/)
:2
.....
'h
B
Data
0
7
CNTRn
0
SELDMS. SELDLS
0
CRADR1. CRADRO
0
CRSRCE. CRWRITE
0
WE
0
0
Hold time, data after LSCLK or MSCLK
~
l>
(")
-I
00
00
~
~
TEXAS . "
INSTRUMENTS
POST OFFICE BOX 655012 • DALLAS. TEXAS 75265
6-24
UNIT
ns
ns
10
WE
Setup time, data before LSCLK or MSCLK
MAX
ns
ns
ns
'AS8840 AND 'ACT8841 FUNCTIONAL COMPARISON
differences between the SN74AS8840 and the SN74ACT8841
The SN74AS8840 and the SN74ACT8841 digital crossbar switches essentially perform the same function.
The SN74AS8840 and the SN74ACT8841 are based on the same 16-port architecture, differing in the
number of control registers, power consumption, and pin-out.
One difference is in the number of programmable control flip-flop banks available to configure the switch.
The 'AS8840 has two programmable control banks, while the 'ACT8841 has eight. Both have two
selectable hard-wired switching configurations.
The increased number of control banks in the 'ACT8841 requires six additional pins not found on the
'AS8840. These are: CRWRITE2, CRWRITE1, CREAD2, CREAD1, CRSEL3, and CRSEL2. CREAD and
CRWRITE on the '8840 become CREAD0 and CRWRITE0 on the '8841. On the '8840, CRSEL1 selects
the hardwired control functions when high. This function is performed by the CRSEL3 signal on the '8841.
Therefore, CRSEL2 and CRSEL1 are actually the added signals.
The 'ACT8841 is a low-power CMOS device requiring only 5-V power. Because of its STL internal logic
and TTL I/Os, the 'AS8840 requires both 2-V and 5-V power.
Both the 'AS8840 and the 'ACT8841 are in 156-pin grid-array packages; however, the two devices are
not pin-for-pin compatible. Control signals were added to the 'ACT8841 and the 2-V VCC pins ('AS8840
only) were assigned other functions in the 'ACT8841.
changing 'AS8840 microcode to 'ACT8841 microcode
Since only six signals have been added to the 'ACT8841, changing existing 'AS8840 microcode to
'ACT8841 microcode is straightforward. CRSEL3 on the 'ACT8841 is functionally equivalent to CRSEL1
on the 'AS8840. CREAD2, CREAD1, CRWRITE2, CRWRITE1, CRSEL2, and CRSEL1 bits must be added.
These can always be 0 if no additional control banks are needed. Additional control configurations can
be stored by programming these bits.
All other signals in the 'AS8840 microcode remain the same when converting to 'ACT8841 microcode.
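The conversion rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a microcode word is modeled as a dictionary keyed by signal name, and the function name is hypothetical. A real microcode word layout is design-specific and is not given in this manual.

```python
def convert_as8840_to_act8841(word):
    """Map one 'AS8840 microcode word (dict of signal -> bit) to 'ACT8841 form.

    Hypothetical helper: the dict representation is an assumption made for
    illustration, not a format defined by the data manual.
    """
    out = dict(word)  # leave the caller's word untouched

    # CRSEL1 on the 'AS8840 (hardwired-function select) is CRSEL3 on the 'ACT8841.
    out["CRSEL3"] = out.pop("CRSEL1", 0)

    # CREAD and CRWRITE become CREAD0 and CRWRITE0.
    out["CREAD0"] = out.pop("CREAD", 0)
    out["CRWRITE0"] = out.pop("CRWRITE", 0)

    # The added bits are 0 when no additional control banks are needed.
    for name in ("CREAD2", "CREAD1", "CRWRITE2", "CRWRITE1", "CRSEL2", "CRSEL1"):
        out[name] = 0

    return out


old_word = {"CRSEL1": 1, "CRSEL0": 0, "CREAD": 1, "CRWRITE": 0}
new_word = convert_as8840_to_act8841(old_word)
```

All other signals (CRSEL0 in this example) pass through unchanged, as the text requires.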
SN74ACT8847
64-Bit Floating Point/Integer Processor
SN74ACT8847
64-Bit Floating Point Unit
• Meets IEEE Standard for Single- and Double-Precision Formats
• Performs Floating Point and Integer Add, Subtract, Multiply, Divide, Square Root, and Compare
• 64-Bit IEEE Divide in 11 Cycles, 64-Bit Square Root in 14 Cycles
• Performs Logical Operations and Logical Shifts
• Superset of TI's SN74ACT8837
• 30-ns, 40-ns and 50-ns Pipelined Performance
• Low-Power EPIC™ CMOS
The SN74ACT8847 is a high-speed, double-precision floating point and integer
processor. It performs high-accuracy, scientific computations as part of a
customized host processor or as a powerful stand-alone device. Its advanced
math processing capabilities allow the chip to accelerate the performance of both
CISC- and RISC- based systems.
High-end computer systems, such as graphics workstations, mini-computers and
32-bit personal computers, can utilize the single-chip 'ACT8847 for both floating
point and integer functions.
EPIC is a trademark of Texas Instruments Incorporated.
Figure                                                                                                      Page
27  Single-Precision Independent ALU Operation, All Registers Enabled (PIPES2-PIPES0 = 000, CLKMODE = X) . . . 7-101
28  Double-Precision Independent ALU Operation, All Registers Disabled (PIPES2-PIPES0 = 111, CLKMODE = 0) . . . 7-102
29  Double-Precision Independent ALU Operation, Input Registers Enabled (PIPES2-PIPES0 = 110, CLKMODE = 0) . . . 7-103
30  Double-Precision Independent ALU Operation, Input and Output Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = 1) . . . 7-104
31  Double-Precision Independent ALU Operation, All Registers Enabled (PIPES2-PIPES0 = 000, CLKMODE = 0) . . . 7-105
32  Single-Precision Independent Multiplier Operation, All Registers Disabled (PIPES2-PIPES0 = 111, CLKMODE = X) . . . 7-106
33  Single-Precision Independent Multiplier Operation, Input Registers Enabled (PIPES2-PIPES0 = 110, CLKMODE = X) . . . 7-107
34  Single-Precision Independent Multiplier Operation, Input and Output Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = X) . . . 7-108
35  Single-Precision Independent Multiplier Operation, All Registers Enabled (PIPES2-PIPES0 = 000, CLKMODE = X) . . . 7-109
36  Double-Precision Independent Multiplier Operation, All Registers Disabled (PIPES2-PIPES0 = 111, CLKMODE = 0) . . . 7-110
37  Double-Precision Independent Multiplier Operation, Input Registers Enabled (PIPES2-PIPES0 = 110, CLKMODE = 1) . . . 7-111
38  Double-Precision Independent Multiplier Operation, Input and Output Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = 0) . . . 7-112
List of Illustrations (Continued)
Figure                                                                                                      Page
39  Double-Precision Independent Multiplier Operation, All Registers Enabled (PIPES2-PIPES0 = 000, CLKMODE = 0) . . . 7-113
40  Single-Precision Floating Point Division, Input Registers Enabled (PIPES2-PIPES0 = 110, CLKMODE = X) . . . 7-114
41  Single-Precision Floating Point Division, Input and Pipeline Registers Enabled (PIPES2-PIPES0 = 100, CLKMODE = X) . . . 7-114
42  Single-Precision Floating Point Division, Input and Output Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = X) . . . 7-115
43  Single-Precision Floating Point Division, All Registers Enabled (PIPES2-PIPES0 = 000, CLKMODE = X) . . . 7-115
44  Double-Precision Floating Point Division, Input Registers Enabled (PIPES2-PIPES0 = 110, CLKMODE = 0) . . . 7-116
45  Double-Precision Floating Point Division, Input and Pipeline Registers Enabled (PIPES2-PIPES0 = 100, CLKMODE = 0) . . . 7-116
46  Double-Precision Floating Point Division, Input and Output Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = 1) . . . 7-117
47  Double-Precision Floating Point Division, All Registers Enabled (PIPES2-PIPES0 = 000, CLKMODE = 1) . . . 7-117
48  Integer Division, Input Registers Enabled (PIPES2-PIPES0 = 100, CLKMODE = X) . . . 7-118
49  Integer Division, Input and Pipeline Registers Enabled (PIPES2-PIPES0 = 100, CLKMODE = X) . . . 7-118
50  Integer Division, Input and Output Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = X) . . . 7-119
51  Integer Division, All Registers Enabled (PIPES2-PIPES0 = 000, CLKMODE = X) . . . 7-119
52  Single-Precision Floating Point Square Root, Input Registers Enabled (PIPES2-PIPES0 = 110, CLKMODE = X) . . . 7-120
List of Illustrations (Continued)
Figure                                                                                                      Page
53  Single-Precision Floating Point Square Root, Input and Pipeline Registers Enabled (PIPES2-PIPES0 = 100, CLKMODE = X) . . . 7-120
54  Single-Precision Floating Point Square Root, Input and Output Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = X) . . . 7-121
55  Single-Precision Floating Point Square Root, All Registers Enabled (PIPES2-PIPES0 = 000, CLKMODE = X) . . . 7-121
56  Double-Precision Floating Point Square Root, Input Registers Enabled (PIPES2-PIPES0 = 110, CLKMODE = 1) . . . 7-122
57  Double-Precision Floating Point Square Root, Input and Pipeline Registers Enabled (PIPES2-PIPES0 = 100, CLKMODE = 0) . . . 7-122
58  Double-Precision Floating Point Square Root, Input and Output Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = 1) . . . 7-123
59  Double-Precision Floating Point Square Root, All Registers Enabled (PIPES2-PIPES0 = 000, CLKMODE = 0) . . . 7-123
60  Integer Square Root, Input Registers Enabled (PIPES2-PIPES0 = 110, CLKMODE = X) . . . 7-124
61  Integer Square Root, Input and Pipeline Registers Enabled (PIPES2-PIPES0 = 100, CLKMODE = X) . . . 7-124
62  Integer Square Root, Input and Output Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = X) . . . 7-125
63  Integer Square Root, All Registers Enabled (PIPES2-PIPES0 = 000, CLKMODE = X) . . . 7-125
64  Single-Precision Chained Mode Operation, All Registers Disabled (PIPES2-PIPES0 = 111, CLKMODE = X) . . . 7-126
65  Single-Precision Chained Mode Operation, Input Registers Enabled (PIPES2-PIPES0 = 110, CLKMODE = 1) . . . 7-127
66  Single-Precision Chained Mode Operation, Input and Output Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = X) . . . 7-128
List of Illustrations (Concluded)
Figure                                                                                                      Page
67   Single-Precision Chained Mode Operation, All Registers Enabled (PIPES2-PIPES0 = 000, CLKMODE = X) . . . 7-129
68   Double-Precision Chained Mode Operation, All Registers Disabled (PIPES2-PIPES0 = 111, CLKMODE = 0) . . . 7-130
69   Double-Precision Chained Mode Operation, Input Registers Enabled (PIPES2-PIPES0 = 110, CLKMODE = 1) . . . 7-131
70   Double-Precision Chained Mode Operation, Input and Output Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = 0) . . . 7-132
71   Double-Precision Chained Mode Operation, All Registers Enabled (PIPES2-PIPES0 = 000, CLKMODE = 0) . . . 7-133
72   Sequence of Matrix Operations . . . 7-154
73   Resultant Matrix Transformation . . . 7-161
74   SN74ACT8837 Floating Point Unit . . . 7-225
75   SN74ACT8847 Floating Point Unit . . . 7-226
76   Creating a 3-D Image . . . 7-245
77   View Volume . . . 7-246
78a  Model of Procedure for Creating a 3-D Graphic . . . 7-247
78b  Model of Creating and Transforming a 3-D Graphic . . . 7-247
79   Viewing Pyramid Showing Six Clipping Planes . . . 7-251
B Comparison Function Table .................. .
Data Flow for Accept/Reject Testing ................ .
Data Flow for the X Processor. . . . . . . . . . . . . . . . . . . . . .
Program Listing for the X Processor ................. .
Summary of Graphics Systems Performance .......... .
Available Options for Graphic System Designs ......... .
7-257
7-258
7-259
7-260
7-262
7-262
7-263
7-263
Overview
Using a top-down approach, this user guide contains the following major sections:
Introduction (to Microprogrammed Architectures and the 'ACT8847)
SN74ACT8847 Architecture
Microprogramming the 'ACT8847
Easy-to-Access Reference Guide
Application Notes
The SN74ACT8847 combines a multiplier and an arithmetic-logic unit in a single
microprogrammable VLSI device. The 'ACT8847 is implemented in Texas Instruments
one-micron CMOS technology to offer high speed and low power consumption with
exceptional flexibility and functional integration. The FPUs can be microprogrammed
to operate in multiple modes to support a variety of floating point applications.
The 'ACT8847 is fully compatible with the IEEE standard for binary floating point
arithmetic, STD 754-1985. This FPU performs both single- and double-precision
operations, integer operations, logical operations, and division and square root
operations (as single microinstructions).
Understanding the 'ACT8847 Floating Point Unit
To support floating point processing in IEEE format, the 'ACT8847 may be configured
for either single- or double-precision operation. Instruction inputs can be used to select
three modes of operation: independent ALU operations, independent multiplier
operations, or simultaneous ALU and multiplier operations.
Three levels of internal data registers are available. The device can be used in
flowthrough mode (all registers disabled), pipelined mode (all registers enabled), or
in other available register configurations. An instruction register, a 64-bit constant
register, and a status register are also provided.
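The register configurations can be illustrated by decoding the PIPES2-PIPES0 controls. The sketch below is inferred from the figure titles elsewhere in this manual (111 = all registers disabled, 110 = input registers enabled, 100 = input and pipeline registers enabled, 010 = input and output registers enabled, 000 = all registers enabled). It assumes that each bit, when high, independently places one register level in flowthrough; the function and key names are illustrative only.

```python
def decode_pipes(pipes):
    """Return which register levels are enabled for a 3-bit PIPES2-PIPES0 value.

    Assumption (inferred from the figure titles): a 0 in a bit position
    enables the corresponding register level, a 1 makes it flowthrough.
    """
    if not 0 <= pipes <= 0b111:
        raise ValueError("PIPES2-PIPES0 is a 3-bit field")
    return {
        "input_registers": (pipes & 0b001) == 0,     # PIPES0
        "pipeline_registers": (pipes & 0b010) == 0,  # PIPES1
        "output_registers": (pipes & 0b100) == 0,    # PIPES2
    }


flowthrough = decode_pipes(0b111)      # all registers disabled
fully_pipelined = decode_pipes(0b000)  # all registers enabled
```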
Each FPU can handle three types of data input formats. The ALU accepts data operands
in integer format or IEEE floating point format. A third type of operand, denormalized
numbers, can also be processed after the ALU has converted them to "wrapped"
numbers, which are explained in detail in a later section. The 'ACT8847 multiplier
operates on normalized floating point numbers, wrapped numbers, and integer
operands.
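As a reminder of what distinguishes a denormalized operand from a normalized one, the following sketch classifies an IEEE 754 single-precision bit pattern by its exponent and fraction fields. It illustrates the IEEE format only, not the device's internal logic, and it does not model "wrapped" numbers; the function names are illustrative.

```python
import struct


def float_to_bits(x):
    """Return the 32-bit IEEE single-precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def classify_single(bits):
    """Classify a 32-bit IEEE 754 single-precision pattern."""
    exponent = (bits >> 23) & 0xFF  # 8-bit biased exponent
    fraction = bits & 0x7FFFFF      # 23-bit fraction
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "nan"
    return "normalized"


kind_of_one = classify_single(float_to_bits(1.0))
kind_of_tiny = classify_single(0x00000001)  # smallest positive denormal
```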
Microprogramming the 'ACT8847
The 'ACT8847 is a fully microprogrammable device. Each FPU operation is specified
by a microinstruction or sequence of microinstructions which set up the control inputs
of the FPU so that the desired operation is performed.
Support Tools
Texas Instruments has developed functional evaluation models of the 'ACT8847 in
software which permit designers to simulate operation of the FPU. To evaluate the
functions of an FPU, a designer can create a microprogram with sample data inputs,
and the simulator will emulate FPU operation to produce sample data output files, as
well as several diagnostic displays to show specific aspects of device operation. Sample
microprogram sequences are included in this section.
Design Support
Texas Instruments Regional Technology Centers, staffed with systems-oriented
engineers, offer a training course to assist users of TI LSI products and their application
to digital processor systems. Specific attention is given to the understanding and
generation of design techniques which implement efficient algorithms designed to
match high-performance hardware capabilities with desired performance levels.
Information on VLSI devices and product support can be obtained from the following
Regional Technology Centers:
Atlanta
Texas Instruments Incorporated
3300 N.E. Expressway, Building 8
Atlanta, GA 30341
404/662-7945

Chicago
Texas Instruments Incorporated
515 Algonquin
Arlington Heights, IL 60005
312/640-2909

Boston
Texas Instruments Incorporated
950 Winter Street, Suite 2800
Waltham, MA 02154
617/895-9100

Dallas
Texas Instruments Incorporated
10001 E. Campbell Road
Richardson, TX 75081
214/680-5066

Northern California
Texas Instruments Incorporated
5353 Betsy Ross Drive
Santa Clara, CA 95054
408/748-2220

Southern California
Texas Instruments Incorporated
17891 Cartwright Drive
Irvine, CA 92714
714/660-8140
Design Expertise
Texas Instruments can provide in-depth technical design assistance through
consultations with contract design services. Contact your local Field Sales Engineer
for current information or contact VLSI Systems Engineering at 214/997-3970.
'ACT8847 Logic Symbol
[Logic symbol for the 'ACT8847 64-Bit Floating Point Unit. Signals shown:]
Clocks and configuration: CLK (master clock, except C register), CLKC (C register clock), CLKMODE (clock edge), BYTEP (parity generation), CONFIG1-0, RND1-0 (rounding mode), TP1-0, FAST (sudden or gradual underflow), HALT (stalls operation), RESET (clears states and status)
Register and source control: PIPES2-0, ENRA (load RA register), ENRB (load RB register), SRCC, ENRC, FLOWC, SELOP7-0 (operand source), SELST1-0 (status source), SELMS/LS
Instruction: I10-0
Data and parity: DA31-DA0, DB31-DB0, Y31-Y0, PA3-0, PB3-0, PY3-0, PERRA, PERRB, MSERR
Output enables: OEY (Y31-Y0, PY3-PY0), OEC (comparison status), OES (exception and other status)
Comparison status: UNORD, AGTB, AEQB
Exception and other status: ED, DIVBY0, IVAL, INEX, OVER, UNDER, DENORM, DENIN, RNDCO, SRCEX, CHEX, STEX1-0, NEG, INF
'ACT8847 Specifications
absolute maximum ratings over operating free-air temperature range
(unless otherwise noted)†
Supply voltage, VCC . . . . . . . . . . . . . . . . . . . . . -0.5 V to 6 V
Input clamp current, IIK (VI < 0 or VI > VCC) . . . . . . . . ±20 mA
Output clamp current, IOK (VO < 0 or VO > VCC) . . . . . . . ±50 mA
Continuous output current, IO (VO = VCC) . . . . . . . . . . ±50 mA
Continuous current through VCC or GND pins . . . . . . . . . ±100 mA
Operating free-air temperature range . . . . . . . . . . . . 0°C to 70°C
Storage temperature range . . . . . . . . . . . . . . . . . . -65°C to 150°C
†Stresses beyond those listed under "absolute maximum ratings" may cause permanent damage
to the device. These are stress ratings only and functional operation of the device at these or
any other conditions beyond those indicated under "recommended operating conditions" is
not implied. Exposure to absolute-maximum-rated conditions for extended periods may affect
device reliability.
recommended operating conditions
                                                    SN74ACT8847
       PARAMETER                                 MIN     NOM     MAX     UNIT
VCC    Supply voltage                            4.75    5.0     5.25    V
VIH    High-level input voltage                  2               VCC     V
VIL    Low-level input voltage                   0               0.8     V
IOH    High-level output current                                 -8      mA
IOL    Low-level output current                                  8       mA
VI     Input voltage                             0               VCC     V
VO     Output voltage                            0               VCC     V
dt/dv  Input transition rise or fall rate                        15      ns/V
TA     Operating free-air temperature            0               70      °C
electrical characteristics over recommended operating free-air
temperature range (unless otherwise noted)
PARAMETER
TEST CONDITIONS
IOH = -20 µA
VOH
IOH = -8 mA
IOL = 20 µA
VOL
IOL = 8 mA
VCC
4.75 V
5.25 V
4.74
4.55
5.24
5.05
5.25 V
4.7
0.01
5.25 V
4.75 V
0.01
10Z
VI = VCC or 0, 10
5.25 V
ICCQ
Ci
VI = VCC or 0, 10
5.25 V
5V
UNIT
V
,
,,;;;::s:"
?~"
0.10
0.10
d(t:Y
,
0.45
....')"
5.25 V
5.25 V
TYP MAX
3.7
4.75 V
VI = VCC or 0
MIN
4.75 V
II
Vi = VCC or 0
SN74ACT8847
TA - 25°C
MIN TYP MAX
V
0.45
{~J
,;;;+('
10
±5
µA
±10
µA
200
µA
pF
switching characteristics
                                                               PIPELINE CONTROLS   SN74ACT8847-30
NO.  PARAMETER  FROM (INPUT)   TO (OUTPUT)                     PIPES2-PIPES0       MIN     MAX        UNIT
1    tpd1       DA/DB/Inst     Y OUTPUT                        111                         †          ns
2    tpd2       INPUT REG      Y OUTPUT                        110                         70         ns
                INPUT REG      STATUS                          110                         70
3    tpd3       PIPELN REG     Y OUTPUT                        10X                         48         ns
                PIPELN REG     STATUS                          10X                         48
4    tpd4       OUTPUT REG     Y OUTPUT                        0XX                         20         ns
                OUTPUT REG     STATUS                          0XX                         20
5    tpd5       SELMS/LS       Y OUTPUT                        XXX                         18         ns
6    tpd6       CLK↑           Y OUTPUT INVALID                all but 111         3.0                ns
7    tpd7       CLK↑           STATUS INVALID                  all but 111         3.0                ns
8    tpd8       SELMS/LS       Y OUTPUT INVALID                XXX                 1.5                ns
9    td1‡       CLK↑           CLK↑                            010                 56                 ns
10   td2‡       CLK↑           CLK↑                            000                 30                 ns
11   td3        Delay time, CLKC after CLK to ensure data captured in C register
                is data clocked into sum or product register by that clock
                (PIPES2-PIPES0 = 0XX)                                              12      td - 0§    ns
12   ten1       OEY            Y OUTPUT                        XXX                         12         ns
13   ten2       OEC, OES       STATUS                          XXX                         12         ns
14   tdis1      OEY            Y OUTPUT                        XXX                         12         ns
15   tdis2      OEC, OES       STATUS                          XXX                         12         ns
†This parameter no longer tested and will be deleted on next Data Manual revision.
‡Minimum clock cycle period not guaranteed when operands are fed back using FLOWC to bypass
the C register and operands are used on the same clock cycle.
§td is the clock cycle period.
setup and hold times
                                                  PIPELINE CONTROLS   SN74ACT8847-30
NO.  PARAMETER                                    PIPES2-PIPES0       MIN     MAX     UNIT
16   tsu1   Inst/control before CLK↑              XX0                 12              ns
17   tsu2   DA/DB before CLK↑                     XX0                 11              ns
18   tsu3   DA/DB before 2nd CLK↑ (DP)            XX1                 40              ns
19   tsu4   CONFIG1-0 before CLK↑                 XX0                 12              ns
20   tsu5   SRCC before CLKC↑                     XXX                 10              ns
21   tsu6   RESET before CLK↑                     XX0                 12              ns
22   th1    Inst/control after CLK↑               XXX                 1               ns
23   th2    DA/DB after CLK↑                      XXX                 1               ns
24   th3    SRCC after CLKC↑                      XXX                 1               ns
25   th4    RESET after CLK↑                      XX0                 6               ns
CLK/RESET requirements
                                      SN74ACT8847-30
PARAMETER                             MIN     MAX     UNIT
tw   Pulse duration   CLK high        10              ns
                      CLK low         10              ns
                      RESET           10              ns
switching characteristics
                                                               PIPELINE CONTROLS   SN74ACT8847-40
NO.  PARAMETER  FROM (INPUT)   TO (OUTPUT)                     PIPES2-PIPES0       MIN     MAX        UNIT
1    tpd1       DA/DB/Inst     Y OUTPUT                        111                         †          ns
2    tpd2       INPUT REG      Y OUTPUT                        110                         90         ns
                INPUT REG      STATUS                          110                         90
3    tpd3       PIPELN REG     Y OUTPUT                        10X                         60         ns
                PIPELN REG     STATUS                          10X                         60
4    tpd4       OUTPUT REG     Y OUTPUT                        0XX                         24         ns
                OUTPUT REG     STATUS                          0XX                         24
5    tpd5       SELMS/LS       Y OUTPUT                        XXX                         20         ns
6    tpd6       CLK↑           Y OUTPUT INVALID                all but 111         3.0                ns
7    tpd7       CLK↑           STATUS INVALID                  all but 111         3.0                ns
8    tpd8       SELMS/LS       Y OUTPUT INVALID                XXX                 1.5                ns
9    td1‡       CLK↑           CLK↑                            010                 72                 ns
10   td2‡       CLK↑           CLK↑                            000                 40                 ns
11   td3        Delay time, CLKC after CLK to ensure data captured in C register
                is data clocked into sum or product register by that clock
                (PIPES2-PIPES0 = 0XX)                                              16      td - 0§    ns
12   ten1       OEY            Y OUTPUT                        XXX                         16         ns
13   ten2       OEC, OES       STATUS                          XXX                         16         ns
14   tdis1      OEY            Y OUTPUT                        XXX                         16         ns
15   tdis2      OEC, OES       STATUS                          XXX                         16         ns
†This parameter no longer tested and will be deleted on next Data Manual revision.
‡Minimum clock cycle period not guaranteed when operands are fed back using FLOWC to bypass
the C register and operands are used on the same cycle.
§td is the clock cycle period.
setup and hold times
                                                  PIPELINE CONTROLS   SN74ACT8847-40
NO.  PARAMETER                                    PIPES2-PIPES0       MIN     MAX     UNIT
16   tsu1   Inst/control before CLK↑              XX0                 14              ns
17   tsu2   DA/DB before CLK↑                     XX0                 13              ns
18   tsu3   DA/DB before 2nd CLK↑ (DP)            XX1                 52              ns
19   tsu4   CONFIG1-0 before CLK↑                 XX0                 14              ns
20   tsu5   SRCC before CLKC↑                     XXX                 14              ns
21   tsu6   RESET before CLK↑                     XX0                 14              ns
22   th1    Inst/control after CLK↑               XXX                 3               ns
23   th2    DA/DB after CLK↑                      XXX                 3               ns
24   th3    SRCC after CLKC↑                      XXX                 3               ns
25   th4    RESET after CLK↑                      XX0                 6               ns
CLK/RESET requirements
                                      SN74ACT8847-40
PARAMETER                             MIN     MAX     UNIT
tw   Pulse duration   CLK high        15              ns
                      CLK low         15              ns
                      RESET           12              ns
switching characteristics
                                                               PIPELINE CONTROLS   SN74ACT8847-50
NO.  PARAMETER  FROM (INPUT)   TO (OUTPUT)                     PIPES2-PIPES0       MIN     MAX        UNIT
1    tpd1       DA/DB/Inst     Y OUTPUT                        111                         †          ns
2    tpd2       INPUT REG      Y OUTPUT                        110                         120        ns
                INPUT REG      STATUS                          110                         120
3    tpd3       PIPELN REG     Y OUTPUT                        10X                         75         ns
                PIPELN REG     STATUS                          10X                         75
4    tpd4       OUTPUT REG     Y OUTPUT                        0XX                         36         ns
                OUTPUT REG     STATUS                          0XX                         36
5    tpd5       SELMS/LS       Y OUTPUT                        XXX                         24         ns
6    tpd6       CLK↑           Y OUTPUT INVALID                all but 111         3.0                ns
7    tpd7       CLK↑           STATUS INVALID                  all but 111         3.0                ns
8    tpd8       SELMS/LS       Y OUTPUT INVALID                XXX                 1.5                ns
9    td1‡       CLK↑           CLK↑                            010                 100                ns
10   td2‡       CLK↑           CLK↑                            000                 50                 ns
11   td3        Delay time, CLKC after CLK to ensure data captured in C register
                is data clocked into sum or product register by that clock
                (PIPES2-PIPES0 = 0XX)                                              16      td - 0§    ns
12   ten1       OEY            Y OUTPUT                        XXX                         20         ns
13   ten2       OEC, OES       STATUS                          XXX                         20         ns
14   tdis1      OEY            Y OUTPUT                        XXX                         20         ns
15   tdis2      OEC, OES       STATUS                          XXX                         20         ns
†This parameter no longer tested and will be deleted on next Data Manual revision.
‡Minimum clock cycle period not guaranteed when operands are fed back using FLOWC to bypass
the C register and operands are used on the same cycle.
§td is the clock cycle period.
setup and hold times
                                                  PIPELINE CONTROLS   SN74ACT8847-50
NO.  PARAMETER                                    PIPES2-PIPES0       MIN     MAX     UNIT
16   tsu1   Inst/control before CLK↑              XX0                 16              ns
17   tsu2   DA/DB before CLK↑                     XX0                 16              ns
18   tsu3   DA/DB before 2nd CLK↑ (DP)            XX1                 75              ns
19   tsu4   CONFIG1-0 before CLK↑                 XX0                 18              ns
20   tsu5   SRCC before CLKC↑                     XXX                 16              ns
21   tsu6   RESET before CLK↑                     XX0                 16              ns
22   th1    Inst/control after CLK↑               XXX                 3               ns
23   th2    DA/DB after CLK↑                      XXX                 3               ns
24   th3    SRCC after CLKC↑                      XXX                 3               ns
25   th4    RESET after CLK↑                      XX0                 6               ns
CLK/RESET requirements
                                      SN74ACT8847-50
PARAMETER                             MIN     MAX     UNIT
tw   Pulse duration   CLK high        15              ns
                      CLK low         15              ns
                      RESET           15              ns
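The minimum clock cycle periods td1 (PIPES2-PIPES0 = 010) and td2 (PIPES2-PIPES0 = 000) in the switching characteristics translate directly into a maximum clock rate for each speed grade. The short sketch below does that arithmetic; the dictionary layout and function name are illustrative, and the result ignores the footnoted case in which operands are fed back through FLOWC and used on the same cycle.

```python
# Minimum clock cycle period in ns, keyed by speed grade and PIPES2-PIPES0.
# Values are td1 (010) and td2 (000) from the switching characteristics.
MIN_CYCLE_NS = {
    "-30": {"000": 30, "010": 56},
    "-40": {"000": 40, "010": 72},
    "-50": {"000": 50, "010": 100},
}


def max_clock_mhz(grade, pipes):
    """Return the maximum clock frequency in MHz for a grade and PIPES setting."""
    return 1000.0 / MIN_CYCLE_NS[grade][pipes]


fastest = max_clock_mhz("-30", "000")  # fully pipelined, fastest grade
slowest = max_clock_mhz("-50", "010")  # input and output registers only
```

For the fully pipelined -30 device this gives roughly 33 MHz, that is, one new operation every 30 ns.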
'ACT8847 Load Circuit
The load circuit for the 'ACT8847 is shown in Figure 1.
[Schematic: output under test, switch S1, load capacitance CL, and tester pin electronics.]
TIMING PARAMETER      CL†      IOL      IOH       VL       S1
ten    tPZH, tPZL     50 pF    1 mA     -1 mA     1.5 V    CLOSED
tdis   tPHZ, tPLZ     50 pF    16 mA    -16 mA    1.5 V    CLOSED
tpd                   50 pF    -        -         -        OPEN
†CL includes probe and test fixture capacitance.
NOTE: All input pulses are supplied by generators having the following characteristics:
PRR ≤ 1 MHz, ZO = 50 Ω, tr ≤ 6 ns, tf ≤ 6 ns.
Figure 1. Load Circuit
[Timing diagram, input side. Signals shown: clocks CLK and CLKC; data input buses DA31-0 and DB31-0; data input controls CONFIG1-0, PIPES2-0, CLKMODE, ENRA, and ENRB; input register contents RA and RB; instruction/controls I10-0 and SELOP7-0.]
NOTES:
Assume the following mixed precision operation.
Single precision OP0 + OP1 = RA + RB → SUM1 → CREG, where OP0 is SP and OP1 is SP.
Mixed precision OP3 • OP2 = RA • RB → PRODUCT1, where OP3 is SP and OP2 is DP.
NOP (must be inserted).
Mixed precision (OP3 • OP2) + (OP0 + OP1) = PREG + CREG → SUM2 (DP), and then convert to SP.
Assume valid control signals for FAST, HALT = 1, PIPES2-0 = 000 (fully pipelined mode), RESET = 1, RND1-0, SELST1-0 = 11, TP1-0 = 11.
Figure 2a. Timing Diagram for: SP ALU → DP MULT → DP ALU → Convert DP to SP
[Timing diagram, output side. Signals shown: CREG controls SRCC, ENRC, and FLOWC; internal register contents PRODUCT1 (DP), SUM1 (SP), SUM2 (DP), and SUM2 (SP); output controls and output buses.]
NOTES:
Assume the following mixed precision operation.
Single precision OP0 + OP1 = RA + RB → SUM1 → CREG, where OP0 is SP and OP1 is SP.
Mixed precision OP3 • OP2 = RA • RB → PRODUCT1, where OP3 is SP and OP2 is DP.
NOP (must be inserted).
Mixed precision (OP3 • OP2) + (OP0 + OP1) = PREG + CREG → SUM2 (DP), and then convert to SP.
Assume valid control signals for FAST, HALT = 1, PIPES2-0 = 000 (fully pipelined mode), RESET = 1, RND1-0, SELST1-0 = 11, TP1-0 = 11.
Figure 2b. Timing Diagram for: SP ALU → DP MULT → DP ALU → Convert DP to SP
[Timing diagram, input side. Signals shown: clocks CLK and CLKC; data input buses DA31-0 and DB31-0 (operands loaded as MSH and LSH); data input controls CONFIG1-0, PIPES2-0, CLKMODE, ENRA, and ENRB; input register contents RA and RB; instruction/controls I10-0 and SELOP7-0.]
NOTES:
Assume the following double precision operation.
OP0 + OP1 = RA + RB → SUM1 → CREG
(OP0 + OP1) • OP2 = SREG • RB → PRODUCT1
[(OP0 + OP1) • OP2] + (OP0 + OP1) = PREG + CREG → SUM2
Assume valid control signals for FAST, HALT = 1, PIPES2-0 = 000 (fully pipelined mode), RESET = 1, RND1-0, SELST1-0 = 11, TP1-0 = 11.
Figure 3a. Timing Diagram for: DP ALU → DP MULT → DP ALU
[Timing diagram, output side. Signals shown: CREG controls SRCC and ENRC; internal register contents (ALU pipe, CREG, PREG, SREG); output controls OEY, SELMS/LS, OEC, and OES; output buses Y31-0 and STATUS.]
NOTES:
Assume the following double precision operation.
OP0 + OP1 = RA + RB → SUM1 → CREG
(OP0 + OP1) • OP2 = SREG • RB → PRODUCT1
[(OP0 + OP1) • OP2] + (OP0 + OP1) = PREG + CREG → SUM2
Assume valid control signals for FAST, HALT = 1, PIPES2-0 = 000 (fully pipelined mode), RESET = 1, RND1-0, SELST1-0 = 11, TP1-0 = 11.
Figure 3b. Timing Diagram for: DP ALU → DP MULT → DP ALU
[Timing diagram, input side. Signals shown: CLK and CLKC; DA31-0; CLKMODE; ENRA and ENRB; input register contents RA and RB; SELOP7-0 = 1101 1011; instruction I10-0 (RA * CREG, PREG + RB).]
NOTES:
Assume the following single precision operations.
(K * OP0) + OP1 = PRODUCT1 + OP1 → SUM1
(K * OP2) + OP3 = PRODUCT2 + OP3 → SUM2
(K * OP4) + OP5 = PRODUCT3 + OP5 → SUM3
(K * OP6) + OP7 = PRODUCT4 + OP7 → SUM4
Assume valid control signals for FAST, HALT = 1, PIPES2-0 = 010, RESET = 1, RND1-0, SELST1-0 = 11, TP1-0 = 11.
Figure 4a. Timing Diagram for: SP [(Scalar * Vector) + Vector]
[Timing diagram, output side. Signals shown: SRCC, ENRC, and FLOWC; register contents CREG (K, constant), PREG (PRODUCT1 to PRODUCT4), and SREG (SUM1 to SUM4); output controls OEY, SELMS/LS, OEC, and OES; output buses Y31-0 (SUM1 to SUM4) and STATUS.]
NOTES:
Assume the following single precision operations.
(K * OP0) + OP1 = PRODUCT1 + OP1 → SUM1
(K * OP2) + OP3 = PRODUCT2 + OP3 → SUM2
(K * OP4) + OP5 = PRODUCT3 + OP5 → SUM3
(K * OP6) + OP7 = PRODUCT4 + OP7 → SUM4
Assume valid control signals for FAST, HALT = 1, PIPES2-0 = 010, RESET = 1, RND1-0, SELST1-0 = 11, TP1-0 = 11.
Figure 4b. Timing Diagram for: SP [(Scalar * Vector) + Vector]
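The arithmetic carried out in Figure 4 can be modeled in software as follows. This is a sketch under stated assumptions, not a model of the device timing: it assumes round-to-nearest and rounds the product to single precision before the addition. On the device the rounding mode is selected by RND1-0; the helper names here are illustrative.

```python
import struct


def to_single(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def scalar_vector_plus_vector(k, xs, ys):
    """Compute (k * x) + y element by element with single-precision rounding."""
    k = to_single(k)  # the constant K is held in single precision
    sums = []
    for x, y in zip(xs, ys):
        product = to_single(k * to_single(x))             # multiplier result
        sums.append(to_single(product + to_single(y)))    # ALU result
    return sums


result = scalar_vector_plus_vector(2.0, [1.0, 2.0], [0.5, 0.25])
```

Each loop iteration corresponds to one PRODUCTn/SUMn pair in the notes to Figure 4.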
SN74ACT8847 64-Bit Floating Point Unit
Introduction
Designing with the SN74ACT8847 floating point unit (FPU) requires a thorough
understanding of computer architectures, microprogramming, and IEEE floating point
arithmetic, as well as a detailed knowledge of the 'ACT8847 itself. This introduction
presents a brief overview of the 'ACT8847 and discusses a number of issues that arise when
designing and programming with this FPU.
Major Architectural Features
The overall architecture for a floating point system is determined by a combination
of design factors. The principal consideration is the set of performance targets that
the floating point processor has to achieve, usually expressed in terms of clock cycle
period, operating mode (vector or scalar), and operand precision (32 bit, 64 bit, or
other). Of almost equal importance are design constraints of cost, complexity, chip
count, power consumption, and requirements for interfacing to other processors.
The architecture of the 'ACT8847 is optimized to satisfy several processing and
interface requirements. The FPU has two 32-bit input buses, the DA and DB data buses,
and one 32-bit output bus, the Y bus. This three-port design provides much greater
I/O bus bandwidth than can be achieved by a single-port device (one 32-bit I/O bus).
Two single-precision inputs can be simultaneously loaded on the input buses while
a result is being output on the Y bus.
Internally, the 'ACT8847 FPU consists of two main functional blocks: the multiplier
and the ALU (see Figure 5). Either the multiplier or the ALU can operate independently,
or the two functional units can be used simultaneously in "chained" mode. When
operating independently, each block of the FPU performs a separate set of arithmetic
or logical functions. The multiplier supports multiplication, division and square roots.
The ALU supports addition, subtraction, format conversions, logical operations, and
shifts. Integer division and integer square root require both the multiplier and the ALU;
the final result comes from the ALU.
In chained mode, a multiplier operation executes in parallel with an ALU operation.
Possible examples include calculations of a sum of products (multiply and accumulate)
or a product of sums (add and then multiply). The sum of products computation requires
a total of four operands: two new inputs to be multiplied, the sum of previous products,
and the current product to be added to the sum, as shown in Table 3.
[Block diagram: the DB31-DB0 and DA31-DA0 input buses (32 bits each) pass through the configuration logic into two 64-bit input registers; the input registers feed the multiplier and the ALU over 64-bit paths; the Y MUX selects the 32-bit output on Y31-Y0.]
Figure 5. High Level Block Diagram
Table 3. Sum of Products Calculation

MULTIPLIER OPERATION    ALU OPERATION
A * B                   -
C * D                   (A * B) + 0
E * F                   (C * D) + (A * B)
...                     ...
Because the 'ACT8847 has multiple internal data paths and data registers, this sum
of products can be generated by simultaneous operations on new bus data and internal
feedback, without the necessity of storing either the previous accumulation or the
current product off chip. Data flow for the sum of products calculation is shown in
Figure 6.
[Data flow diagram: the multiplier forms A * B while the ALU forms PREG + SREG.]
Figure 6. Multiply/Accumulate Operation
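The overlap described above can be sketched in software. The following is a minimal behavioral model, not a description of the device's timing: the names `preg` and `sreg` stand in for the P and S registers, and the function name is hypothetical. Each loop iteration represents one step in which the ALU adds the previous product to the running sum while the multiplier forms the next product, as in Table 3.

```python
def chained_sum_of_products(pairs):
    """Accumulate sum(a * b) in the order Table 3 steps through it."""
    preg = 0.0  # stands in for the product (P) register
    sreg = 0.0  # stands in for the sum (S) register
    for a, b in pairs:
        # Both units work in the same step: the ALU adds the previous
        # product to the running sum while the multiplier forms the next one.
        sreg, preg = preg + sreg, a * b
    # One extra ALU step folds the last product into the sum.
    return preg + sreg


if __name__ == "__main__":
    print(chained_sum_of_products([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]))
```

Note that the first ALU step adds zero, matching the "(A * B) + 0" entry in Table 3, and that one trailing ALU step is needed after the last multiply.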
Data Flow in Pipelined Architectures
Several levels of internal data registers are available to segment the internal data paths
of the 'ACT8847. The most basic choice is whether to use the device in flowthrough
mode (with no internal registers enabled) or whether to enable one or more registers.
When none of the internal registers are enabled, the paths through the multiplier and
the ALU are not segmented. In this case, the delay from data input to result output
is the longest.
Enabling one or more registers divides the data paths so that data can be clocked into
internal registers, instead of from an external source to an external destination. Enabling
the input registers permits data and instruction inputs to be registered on chip. Also,
the hardware division and square root operations which the 'ACT8847 performs require
that the input registers be enabled.
In the main data paths, three sets of internal registers are available in the 'ACT8847:
input registers, pipeline registers in the multiplier and ALU logic blocks, and output
registers to capture results from the multiplier and the ALU. When all three levels of
data registers are enabled, the register-to-register delay inside the device is minimized.
This is the fastest operating mode, and in this configuration the 'ACT8847 is said
to be "fully pipelined." While one instruction is executing, the next instruction along
with its associated operands may be input to the device so that overlapped operations
occur (see Figure 7).
The selection of operating mode, from flowthrough to fully pipelined, determines the
latency from input to output, the number of clock cycles required for inputs to be
processed and results to appear. For each register level enabled in the data path, one
clock cycle is added to the latency from input to output.
[Timing diagram: rows for CLK, instruction inputs, data inputs, input register contents, pipeline register contents, and output sum register contents, with the operations A + B, C + D, and E + F advancing one register level per clock.]
Figure 7. Example of Fully Pipelined Operation
Control Architectures for High-Speed Microprogrammed Architectures
A separate control circuit is required to sequence the operation of the 'ACT8847. A
sequencer function within the control circuit controls both the sequencer and FPU as
determined by FPU status outputs. Either a standard microsequencer such as the
SN74ACT8818, or a custom controller such as a PLA or gate array can be used to
control the FPU. Figure 8 shows an example block diagram for a PLA control circuit.
If a standard microsequencer is used, execution addresses for routines stored in the
microprogram memory are generated by the microsequencer. As its name implies,
microprogram memory stores the sequences of microinstructions which control FPU
execution. The 'ACT8847 can be programmed by generating all control bits in a given
microinstruction to select an FPU operation.
One possible control circuit for the 'ACT8847 consists of a microsequencer,
microprogram memory, and one or more microinstruction registers, together with status
logic as required to support a specific floating point implementation. A control circuit
without an instruction register is typically too slow for use with the 'ACT8847. At
least one microinstruction register is used to hold the current instruction being executed
by the FPU and sequencer (see Figure 9).
Inclusion of the microinstruction register divides the critical path from the sequencer
through the program memory to the FPU control inputs, permitting much faster
execution times. However, when all the internal registers of the FPU are enabled, FPU
operation may be fast enough to require a second register in the control circuit. In
this case, a register on the output bus of the sequencer captures each microprogram
address, and the microinstruction register captures each microinstruction (see
Figure 10).
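[Block diagram: a programmable logic array (PLA) receives external control/status and the FPU STATUS outputs and controls the FPU, which has DA and DB inputs and a Y output.]
Figure 8. PLA Control Circuit Example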
[Block diagram: the microsequencer supplies the microcode address to the microprogram memory; the memory output is held in the instruction register, which drives the 'ACT8847 FPU (32-bit DA, DB, and Y buses); FPU STATUS returns to the microsequencer through the status logic.]
Figure 9. Microprogrammed Architecture
[Block diagram: as in Figure 9, with an address register placed between the microsequencer and the microprogram memory; the instruction register drives the 'ACT8847 FPU (32-bit DA, DB, and Y buses), and FPU STATUS returns to the microsequencer through the status logic.]
Figure 10. Microprogrammed Architecture with Address Register
Introducing registers in the FPU data paths and the control circuit complicates I/O
timing, status output timing, the status logic and the microprogram for the FPU and
the sequencer. These timing relationships affect branches, jumps to subroutine, and
other operations depending on FPU status. Some of these programming issues are
discussed below.
Microprogram Control of an 'ACT8847 FPU Subsystem
A microprogram to control the 'ACT8847 must take into account not only the FPU
operation but also the sequencer operation, especially when the system is performing
a branch on status or handling an exception.
Several options are available for dealing with such exceptions. The 'ACT8847 can
be programmed to discard operands in invalid formats, and some exceptions caused
by illegal operations. In general, though, the microprogram should be designed to handle
a range of status results or exceptions. Hardware timing considerations such as pipeline
delays in both control and data paths must be studied to minimize the difficulty of
performing branches to status exception handlers.
Later sections of the 'ACT8847 user guide present detailed examples of
microinstructions and timing waveforms, along with interpretations of status outputs
and the choices involved in handling IEEE status exceptions.
'ACT8847 Data Formats
The 'ACT8847 accepts operands as normalized IEEE floating point numbers
(ANSI/IEEE Standard 754-1985), unsigned 32-bit integers, or 2's complement integers.
Floating point operands may be either single precision (32 bits) or double precision
(64 bits).
IEEE formats for floating point operands, both single and double precision, consist of
three fields: sign, exponent, and fraction, in that order. The leftmost (most significant)
bit is the sign bit. The exponent field is 8 bits long in single-precision operands and
11 bits long in double-precision operands. The fraction field is 23 bits in single precision
and 52 bits in double precision. The value of the fraction contains a hidden bit, an
implicit leading "1", as shown below:
1.fraction
The representation of a normalized floating point number is:
(-1)^s * 1.f * 2^(e - bias)
where the bias is either 127 for single-precision operands or 1023 for double-precision
operands.
The formats for single-precision and double-precision numbers are shown in Figure 11
and Figure 12, respectively. Further details of IEEE formats and exceptions are provided
in the IEEE Standard for Binary Floating Point Arithmetic, ANSI/IEEE Std 754-1985.
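As an illustration of the three fields and the normalized-number formula, the sketch below splits a single-precision value into sign, exponent, and fraction and rebuilds it. It uses Python's standard `struct` module on the host; the function names are hypothetical and nothing here models the 'ACT8847 itself.

```python
import struct

SP_BIAS = 127          # single-precision exponent bias
SP_FRACTION_BITS = 23  # width of the single-precision fraction field


def decode_single(x):
    """Return the (sign, exponent, fraction) fields of x as a 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> SP_FRACTION_BITS) & 0xFF
    fraction = bits & ((1 << SP_FRACTION_BITS) - 1)
    return sign, exponent, fraction


def normalized_value(sign, exponent, fraction):
    """Evaluate (-1)^s * 1.f * 2^(e - bias) for a normalized operand."""
    significand = 1 + fraction / (1 << SP_FRACTION_BITS)  # hidden bit is 1
    return (-1) ** sign * significand * 2.0 ** (exponent - SP_BIAS)


if __name__ == "__main__":
    fields = decode_single(-6.5)
    print(fields, normalized_value(*fields))
```

For -6.5 the fields are sign 1, exponent 129 (that is, 2 + 127), and fraction 0x500000 (binary .101), since 6.5 = 1.625 * 2^2.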
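  31  30        23  22                     0
+----+------------+------------------------+
| s  |     e      |           f            |
+----+------------+------------------------+
s: sign of fraction
e: 8-bit exponent biased by 127
f: 23-bit fraction
Figure 11. IEEE Single-Precision Format

  63  62        52  51                     0
+----+------------+------------------------+
| s  |     e      |           f            |
+----+------------+------------------------+
s: sign of fraction
e: 11-bit exponent biased by 1023
f: 52-bit fraction
Figure 12. IEEE Double-Precision Format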
The 'ACT8847 also handles two other operand formats which permit operations with
very small floating point numbers. The ALU accepts denormalized floating point
numbers, that is, floating point numbers so small that they could not be normalized.
If these denormal operands are input to the multiplier, they will cause status exceptions.
Denormals can be passed through the ALU to be "wrapped," and the wrapped
operands can then be input to the multiplier.
A denormalized input has the form of a floating point number with a zero exponent,
a nonzero mantissa, and a zero in the leftmost bit of the mantissa (hidden or implicit
bit). Using single precision, a denorm is equal to:
(-1)^s * 2^(-126) * fraction
For double precision, a denorm is equal to:
(-1)^s * 2^(-1022) * fraction
In both cases the fraction is read as 0.f, since the hidden bit is zero.
A denormalized number results from decrementing the biased exponent field to zero
before normalization is complete. Since a denormalized number cannot be input to
the multiplier, it must first be converted to a wrapped number by the ALU. A wrapped
number is a number created by normalizing a denormalized number's fraction field and
subtracting from the exponent the number of shift positions (minus one) required to
do so. The exponent is encoded as a two's complement negative number. When the
mantissa of the denormal is normalized by shifting it left, the exponent field decrements
from all zeros (wraps past zero) to a negative two's complement number (except in
the case of 0.1XXX..., where the exponent is not decremented).
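The wrapping rule can be worked through numerically. The sketch below is an arithmetic model for single precision only, written from the description above: it shifts the 23-bit fraction left until the hidden bit becomes 1 and subtracts (shifts - 1) from a zero exponent, encoding the result as an 8-bit two's complement value. The function names are hypothetical, and the model does not claim to reproduce the device's internal bit-level handling.

```python
from fractions import Fraction

SP_FRACTION_BITS = 23
SP_BIAS = 127


def wrap_single(fraction):
    """Wrap a single-precision denormal given its nonzero 23-bit fraction.

    Returns (exponent_field, wrapped_fraction), where exponent_field is the
    8-bit two's complement encoding of the wrapped exponent.
    """
    if not 0 < fraction < (1 << SP_FRACTION_BITS):
        raise ValueError("fraction must be a nonzero 23-bit value")
    # Number of left shifts needed to move the leading 1 into the hidden bit.
    shifts = SP_FRACTION_BITS + 1 - fraction.bit_length()
    mantissa = fraction << shifts                      # 24 bits, hidden bit = 1
    wrapped_fraction = mantissa & ((1 << SP_FRACTION_BITS) - 1)
    exponent = -(shifts - 1)                           # 0 for the 0.1XXX... case
    return exponent & 0xFF, wrapped_fraction


def wrapped_value(exponent_field, wrapped_fraction):
    """Magnitude of a wrapped number: 1.f * 2^(e - 127), e in two's complement."""
    exponent = exponent_field - 256 if exponent_field & 0x80 else exponent_field
    significand = 1 + Fraction(wrapped_fraction, 1 << SP_FRACTION_BITS)
    return significand * Fraction(2) ** (exponent - SP_BIAS)


def denormal_value(fraction):
    """Magnitude of a denormal: 0.f * 2^(-126)."""
    return Fraction(fraction, 1 << SP_FRACTION_BITS) * Fraction(2) ** -126


if __name__ == "__main__":
    exponent_field, wrapped = wrap_single(1)           # smallest denormal
    print(hex(exponent_field), hex(wrapped))
    print(wrapped_value(exponent_field, wrapped) == denormal_value(1))
```

For the smallest denormal (fraction 000...001), 23 shifts are needed, so the exponent becomes -22, which is EA in 8-bit two's complement.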
Floating point formats handled by the 'ACT8847 are presented in Table 4.
Table 4. IEEE Floating Point Representations

TYPE OF                     EXPONENT (e)         FRACTION (f)  HIDDEN  VALUE OF NUMBER REPRESENTED
OPERAND                     SP (HEX)  DP (HEX)   (BINARY)      BIT     SP (DECIMAL)†                   DP (DECIMAL)†
Normalized Number (max)     FE        7FE        All 1's       1       (-1)^s (2^127) (2 - 2^-23)      (-1)^s (2^1023) (2 - 2^-52)
Normalized Number (min)     01        001        All 0's       1       (-1)^s (2^-126) (1)             (-1)^s (2^-1022) (1)
Denormalized Number (max)   00        000        All 1's       0       (-1)^s (2^-126) (1 - 2^-23)     (-1)^s (2^-1022) (1 - 2^-52)
Denormalized Number (min)   00        000        000...001     0       (-1)^s (2^-126) (2^-23)         (-1)^s (2^-1022) (2^-52)
Wrapped Number (max)        00        000        All 1's       1       (-1)^s (2^-127) (2 - 2^-23)     (-1)^s (2^-1023) (2 - 2^-52)
Wrapped Number (min)        EA        7CD        All 0's       1       (-1)^s (2^-(22 + 127)) (1)      (-1)^s (2^-(51 + 1023)) (1)
Zero                        00        000        Zero          0       (-1)^s (0.0)                    (-1)^s (0.0)
Infinity                    FF        7FF        Zero          1       (-1)^s (infinity)               (-1)^s (infinity)
NaN (Not a Number)          FF        7FF        Nonzero       N/A     None                            None

† s = sign bit.
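The standard IEEE rows of Table 4 (normalized, denormalized, infinity, and NaN) can be cross-checked on any host with IEEE arithmetic. The sketch below assembles single-precision bit patterns from the exponent and fraction fields and compares them with the tabulated values; `single_from_fields` is a hypothetical helper built on Python's `struct` module. The wrapped-number rows are specific to the 'ACT8847 and are not covered by this check.

```python
import math
import struct


def single_from_fields(sign, exponent, fraction):
    """Build a single-precision value from its sign, exponent, and fraction fields."""
    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]


if __name__ == "__main__":
    all_ones = 0x7FFFFF  # 23-bit fraction, all 1's
    # Normalized number (max): exponent FE, fraction all 1's
    print(single_from_fields(0, 0xFE, all_ones) == (2 - 2.0 ** -23) * 2.0 ** 127)
    # Normalized number (min): exponent 01, fraction all 0's
    print(single_from_fields(0, 0x01, 0) == 2.0 ** -126)
    # Denormalized number (max) and (min): exponent 00
    print(single_from_fields(0, 0x00, all_ones) == (1 - 2.0 ** -23) * 2.0 ** -126)
    print(single_from_fields(0, 0x00, 1) == 2.0 ** -23 * 2.0 ** -126)
    # Infinity and NaN: exponent FF
    print(math.isinf(single_from_fields(1, 0xFF, 0)))
    print(math.isnan(single_from_fields(0, 0xFF, 0x400000)))
```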
Status Outputs
Status flags are provided to signal both floating point and integer results. Integer status
is provided using AEQB for zero, NEG for sign, and OVER for overflow/carryout.
Status exceptions can result from one or more error conditions such as overflow,
underflow, operands in illegal formats, invalid operations, or rounding. Exceptions may
be grouped into two classes: input exceptions resulting from invalid operations or
denormal inputs to the multiplier, and output exceptions resulting from illegal formats,
rounding errors, or both.
SN74ACT8847 Architecture
Overview
The SN74ACT8847 is a high-speed floating point unit implemented in TI's advanced
1-µm CMOS technology. The device is fully compatible with IEEE Standard 754-1985
for addition, subtraction, multiplication, division, square root, and comparison.
The 'ACT8847 FPU also performs integer arithmetic, logical operations, and logical
shifts. Absolute value conversions, floating point to integer conversions, and integer
to floating point conversions are also available. The ALU and multiplier are both included
in the same device and can be operated in parallel to perform sums of products and
products of sums (see Figure 13).
[Block diagram: control inputs shown include ENRA, SRCC, FLOWC, HALT, BYTEP, CLK, PIPES0, CLKMODE, RESET, TP1-TP0, SELMS/LS, and SELST1-SELST0; outputs include PY3-PY0, Y31-Y0, and status flags such as IVAL, INEX, OVER, UNDER, DENORM, DENIN, CHEX, STEX1-STEX0, and NEG.]
Figure 13. 'ACT8847 Detailed Block Diagram
IEEE formatted denormal numbers are directly handled by the ALU. Denormal numbers
must be wrapped by the ALU before being used in multiplication, division, or square
root operations. A fast mode in which all denormals are forced to zero is provided
for applications not requiring gradual underflow.
The 'ACT8847 input buses can be configured to operate as two 32-bit data buses
or as a single 64-bit bus, providing a number of system interface options. Registers
are provided at the inputs, outputs, and inside the ALU and multiplier to support
multilevel pipelining. These registers can be bypassed for nonpipelined operation.
A clock mode control allows the temporary input register to be clocked on the rising
edge or the falling edge of the clock to support double-precision ALU operations at
the same rate as single-precision operations. A feedback register (C register) with a
separate clock is provided for temporary internal storage of a multiplier result, ALU
result or constant.
Four multiplexers select the multiplier and ALU operands from the input registers, C
register or previous multiplier or ALU result. Results are output on the 32-bit Y bus;
a Y output multiplexer selects the most significant or least significant half of the result
if a double-precision number is being output.
To ensure data integrity, parity checking is performed on input data, and parity is
generated for output data. A master/slave comparator supports fault-tolerant system
design. Two test pin control inputs allow all I/Os and outputs to be forced high, low,
or placed in a high-impedance state to facilitate system testing.
Pipeline Controls
Six data registers in the 'ACT8847 are arranged in three levels along the data paths
through the multiplier and the ALU. Each level of registers can be enabled or disabled
independently of the other two levels by setting the appropriate PIPES2-PIPES0 inputs.
When enabled, data is latched into the register on the rising edge of the system clock
(CLK). A separate instruction pipeline register stores the instruction bits corresponding
to the operation being executed at each stage.
The levels of pipelining are shown in Figure 14. The first set of registers, the RA and
RB input registers, are controlled by PIPES0. These registers may be used as inputs
to the ALU, multiplier, or both.
The pipeline registers are the second register set. When enabled by PIPES1, these
registers latch intermediate values in the multiplier or ALU.
The results of the ALU and multiplier operations may optionally be latched into two
output registers by setting PIPES2 low. The P (product) register holds the result of
the multiplier operation; the S (sum) register holds the ALU result.
Table 5 shows the settings of the registers controlled by PIPES2-PIPES0. Operating
modes range from fully pipelined (PIPES2-PIPES0 = 000) to flowthrough
(PIPES2-PIPES0 = 111). The instruction pipeline registers are also set accordingly.
[Block diagram: PIPES0 enables the two input registers and the instruction register; PIPES1 enables the multiplier and ALU pipeline registers and an instruction pipeline register; PIPES2 enables the multiplier product register, the ALU sum register, and an instruction pipeline register. All registers are clocked by CLK.]
Figure 14. Pipeline Controls
Table 5. Pipeline Controls (PIPES2-PIPES0)

PIPES2  PIPES1  PIPES0  REGISTER OPERATION SELECTED
X       X       0       Enables input registers (RA, RB)
X       X       1       Makes input registers (RA, RB) transparent
X       0       X       Enables pipeline registers
X       1       X       Makes pipeline registers transparent
0       X       X       Enables output registers (PREG, SREG, Status)
1       X       X       Makes output registers (PREG, SREG, Status) transparent
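The active-low encoding of Table 5 is easy to get backwards, so a small decoder may help when writing microcode tools. The sketch below is a hypothetical helper, not part of any TI software: it maps a 3-bit PIPES2-PIPES0 value to the enabled register levels and counts them, on the stated assumption (from the discussion of pipelined data flow) that each enabled level adds one clock cycle of latency. Extra cycles for double-precision operands are not modeled.

```python
def decode_pipes(pipes):
    """Map a 3-bit PIPES2-PIPES0 value to the register levels it enables.

    A 0 bit enables the corresponding level; a 1 bit makes it transparent.
    """
    if not 0 <= pipes <= 0b111:
        raise ValueError("PIPES2-PIPES0 is a 3-bit value")
    return {
        "input": (pipes & 0b001) == 0,     # PIPES0: RA, RB
        "pipeline": (pipes & 0b010) == 0,  # PIPES1: multiplier/ALU pipeline
        "output": (pipes & 0b100) == 0,    # PIPES2: PREG, SREG, Status
    }


def register_levels_enabled(pipes):
    """Count enabled levels (assumed to be one clock of latency each)."""
    return sum(decode_pipes(pipes).values())


if __name__ == "__main__":
    for pipes in (0b000, 0b010, 0b111):
        print(format(pipes, "03b"), decode_pipes(pipes), register_levels_enabled(pipes))
```

Here 000 is fully pipelined (three levels), 111 is flowthrough (none), and 010 enables the input and output registers while leaving the pipeline registers transparent.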
In flowthrough mode all three levels of registers are transparent, a circumstance which
may affect some double-precision operations. Since double-precision operands require
two steps to input, at least half of the data must be clocked into the temporary register
before the remaining data is placed on the DA and DB buses.
When all registers (except the C register) are enabled, timing constraints can become
critical for many double-precision operations. In clock mode 1, the ALU can perform
a double-precision operation and output a result during every clock cycle, and both
halves of the result must be read out before the end of the next cycle. Status outputs
are valid only for the period during which the Y output data is valid.
Similarly, double-precision multiplication is affected by pipelining, clock mode, and
sequence of operations. A double-precision multiply may require two cycles to execute
and two cycles to output the result, depending on the settings of PIPES2-PIPES0.
Duration of valid outputs at the Y multiplexer depends on settings of PIPES2-PIPES0
and CLKMODE, as well as whether all operations and operands are of the same type.
For example, when a double-precision multiply is followed by a single-precision
operation, one clock cycle must intervene between the dissimilar operations. The
instruction inputs are ignored during this clock cycle.
Temporary Input Register
A temporary input register is provided to enable loading of two double-precision
numbers on two 32-bit input buses in one clock cycle. The contents of the DA bus
are loaded into the upper 32 bits of the temporary register; the contents of DB are
loaded into the lower 32 bits.
A clock mode signal (CLKMODE) determines the clock edge on which the data will
be stored in the temporary register. When CLKMODE is low, data is loaded on the
rising edge of the clock. With CLKMODE set high, the temporary register loads on
a falling edge and the RA and RB registers can then be loaded on the next rising edge.
The temporary register loads during every clock cycle.
RA and RB Input Registers
Two 64-bit registers, RA and RB, are provided to hold input data for the multiplier
and ALU. Data is taken from the DA bus, DB bus, and the temporary input register.
The registers are loaded on the rising edge of clock CLK if the enables ENRA and ENRB
are set high. PIPES0 must be low.
Data input combinations to the 'ACT8847 vary depending on the precision of the
operands and whether they are being input as A or B operands. Loading of external
data operands is controlled by the settings of CLKMODE and CONFIG1-CONFIG0,
which determine the clock timing for loading and the registers that are used (see
Figure 15).
Configuration Controls
Three input registers are provided to handle input of data operands, either single
precision or double precision. The RA, RB, and temporary registers are each 64 bits
wide. The temporary register is (ordinarily) used only during input of double-precision
operands.
Double-precision operands are loaded by using the temporary register to store half
of the operands prior to inputting the other half of the operands on the DA and DB
buses. As shown in Table 6, four configuration modes for selecting input sources are
available for loading data operands into the RA and RB registers.
[Block diagram: the DA and DB buses load the MSH and LSH of the temporary register; CONFIG1 and CONFIG0 select the sources for the MSH and LSH of the RA and RB input registers, which are enabled by ENRA and ENRB.]
Figure 15. Input Register Control
Table 6. Double-Precision Input Data Configuration Modes

                     LOADING SEQUENCE
                     DATA LOADED INTO TEMP REGISTER     DATA LOADED INTO RA/RB
                     ON FIRST CLOCK AND RA/RB           REGISTERS ON SECOND CLOCK
                     REGISTERS ON SECOND CLOCK†
CONFIG1  CONFIG0     DA               DB                DA               DB
0        0           B operand (MSH)  B operand (LSH)   A operand (MSH)  A operand (LSH)
0        1           A operand (LSH)  B operand (LSH)   A operand (MSH)  B operand (MSH)
1        0           A operand (MSH)  B operand (MSH)   A operand (LSH)  B operand (LSH)
1        1           A operand (MSH)  A operand (LSH)   B operand (MSH)  B operand (LSH)

† On the first active clock edge (see Clock Mode Settings), data in this column is loaded into the temporary
register. On the next rising edge, operands in the temporary register and the DA/DB buses are loaded into
the RA and RB registers.
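The four loading patterns of Table 6 can be captured as a lookup table. The sketch below is a hypothetical host-side model (for example, for a test bench that drives the buses): `bus_words` splits two 64-bit operands into the 32-bit words to place on DA and DB at each clock, and `load_registers` reassembles RA and RB from those words. The table entries are transcribed from Table 6; the function names and data layout are assumptions made for illustration.

```python
MSH, LSH = "MSH", "LSH"

# CONFIG1-CONFIG0 -> ((DA, DB) on first clock, (DA, DB) on second clock).
# Each slot names the operand and the half carried on that bus (Table 6).
CONFIG_MODES = {
    0b00: ((("B", MSH), ("B", LSH)), (("A", MSH), ("A", LSH))),
    0b01: ((("A", LSH), ("B", LSH)), (("A", MSH), ("B", MSH))),
    0b10: ((("A", MSH), ("B", MSH)), (("A", LSH), ("B", LSH))),
    0b11: ((("A", MSH), ("A", LSH)), (("B", MSH), ("B", LSH))),
}


def bus_words(config, a, b):
    """Return [(DA, DB) first clock, (DA, DB) second clock] for 64-bit a and b."""
    operands = {"A": a, "B": b}

    def word(operand, half):
        value = operands[operand]
        return (value >> 32) & 0xFFFFFFFF if half == MSH else value & 0xFFFFFFFF

    return [tuple(word(*slot) for slot in clock) for clock in CONFIG_MODES[config]]


def load_registers(config, clocks):
    """Rebuild the 64-bit (RA, RB) contents from the bus words of both clocks."""
    regs = {"A": 0, "B": 0}
    for slots, words in zip(CONFIG_MODES[config], clocks):
        for (operand, half), value in zip(slots, words):
            regs[operand] |= (value << 32) if half == MSH else value
    return regs["A"], regs["B"]


if __name__ == "__main__":
    a, b = 0x4008000000000001, 0x3FF0000000000002
    for config in CONFIG_MODES:
        clocks = bus_words(config, a, b)
        print(format(config, "02b"), [tuple(hex(w) for w in c) for c in clocks])
        assert load_registers(config, clocks) == (a, b)
```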
When single-precision or integer operands are loaded, the ordinary setting of
CONFIG1-CONFIG0 is 01, as shown in Table 7. This setting loads each 32-bit operand
in the most significant half (MSH) of its respective register. Single-precision operands
are loaded into the MSHs and adjusted to double precision because the data paths
internal to the device are all double precision. It is also possible to load single-precision
operands with other CONFIG settings, but two clock edges are required to load both
the A and B operands on the DA bus. The operands are input as the MSHs of the A
and B operands (see Table 6). For example, to load single-precision operands using
CONFIG1-CONFIG0 = 10, the A and B operands are input one active clock edge before
the instruction.
Table 7. Single-Precision Input Data Configuration Mode

                     DATA LOADED INTO RA/RB
                     REGISTERS ON FIRST CLOCK
CONFIG1  CONFIG0     DA          DB
0        1           A operand   B operand

NOTE
This mode is ordinarily used for single-precision operations.
Clock Mode Settings
Timing of double-precision data inputs is determined by the clock mode setting, which
allows the temporary register to be loaded on either the rising edge (CLKMODE = 0)
or the falling edge of the clock (CLKMODE = 1). Since the temporary register is not
used when single-precision operands are input, clock modes 0 and 1 are functionally
equivalent for single-precision operations using CONFIG1-CONFIG0 = 01.
The setting of CLKMODE can be used to speed up the loading of double-precision
operands. When the CLKMODE input is set high, data on the DA and DB buses are
loaded on the falling edge of the clock into the MSH and LSH, respectively, of the
temporary register. On the next rising edge, contents of the DA bus, DB bus, and
temporary register are loaded into the RA and RB registers, and execution of the current
instruction begins. The setting of CONFIG1-CONFIG0 determines the exact pattern
in which operands are loaded, whether as MSH or LSH in RA or RB.
Double-precision operation in clock mode 0 is similar except that the temporary register
loads only on a rising edge. For this reason, the RA and RB registers do not load until
the next rising edge, when all operands are available and execution can begin.
A considerable advantage in speed can be realized by performing double-precision
operations with CLKMODE set high. In this clock mode, both double-precision operands
can be loaded on successive clock edges, one falling and one rising. If the instruction
is an ALU operation, then the operation can be executed in the time from one rising
edge of the clock to the next rising edge. Both halves of a double-precision ALU result
must be read out on the Y bus within one clock cycle when the 'ACT8847 is operated
in clock mode 1.
The discussion above assumes that the system is able to furnish two sets of operands
in one cycle (one set on the falling edge of the clock and the other set on the next
rising edge). This assumption may not be valid, since the system is required to "double
pump" the input data buses.
Even for a system that is not able to double pump the input data buses, using clock
mode 1 can reduce microcode size substantially, resulting in increased system
throughput. To illustrate, take the case of an operation where the operand(s) are
furnished by one or more of the feedback registers (refer to Table 8). Since the input
data buses are not being used to furnish the operands, the data on the buses at the
time of the instruction is unimportant. By setting CLKMODE high, the instruction begins
after the first cycle, resulting in a savings of one cycle.
Table 8a. Double-Precision CREG + PREG Using CLKMODE = 0, PIPES2-0 = 010

CYCLE  CLKMODE  DA BUS  DB BUS  TEMP REG  INSTR BUS  RA REG  RB REG  S REG
1      0        X       X       X         C + P      X       X       X
2      0        X       X       X         C + P      X       X       X
3      X        X       X       X         X          X       X       C + P

Table 8b. Double-Precision CREG + PREG Using CLKMODE = 1, PIPES2-0 = 010

CYCLE  CLKMODE  DA BUS  DB BUS  TEMP REG  INSTR BUS  RA REG  RB REG  S REG
1      1        X       X       X         C + P      X       X       X
2      X        X       X       X         X          X       X       C + P
Going one step further, take the case of an operation where only one operand needs
to be furnished by the input data buses (refer to Table 9). To take advantage of clock
mode 1, set the CONFIG lines so that the external operand comes directly from the
DA and DB bus, as opposed to coming from the temporary register. Since the temporary
register is not used to provide an operand, the data latched into it is inconsequential.
It naturally follows then that the clock edge used to load the temporary register is
unimportant. So by setting CLKMODE high, a double-precision instruction will begin
after one cycle, instead of two cycles.
Table 9a. Double-Precision PREG + RB Using CLKMODE = 0, PIPES2-0 = 010

CYCLE  CLKMODE  DA BUS  DB BUS  TEMP REG  INSTR BUS  RA REG  RB REG  S REG
1      0        X       X       X         P + RB     X       X       X
2      0        RB(M)   RB(L)   RB        P + RB     X       RB      X
3      X        X       X       X         X          X       X       P + RB

Table 9b. Double-Precision PREG + RB Using CLKMODE = 1, PIPES2-0 = 010

CYCLE  CLKMODE  DA BUS  DB BUS  TEMP REG  INSTR BUS  RA REG  RB REG  S REG
1      1        RB(M)   RB(L)   RB        P + RB     X       RB      X
2      X        X       X       X         X          X       X       P + RB
Operand Selection
Four multiplexers select the multiplier and ALU operands from the RA and RB registers,
the previous multiplier or ALU result, or the C register (see Figure 16). The multiplexers
are controlled by input signals SELOP7-SELOP0 as shown in Tables 10 and 11. For
division and square root operations, operands must be sourced from the input registers
RA and RB.
Table 10. Multiplier Input Selection

A1 (MUX1) INPUT                          B1 (MUX2) INPUT
SELOP7  SELOP6  OPERAND SOURCE†          SELOP5  SELOP4  OPERAND SOURCE†
0       0       Reserved                 0       0       Reserved
0       1       C register               0       1       C register
1       0       ALU feedback             1       0       Multiplier feedback
1       1       RA input register        1       1       RB input register

† For division or square root operations, only RA and RB registers can be selected as sources.
[Block diagram: four multiplexers, controlled by SELOP7-6, SELOP5-4, SELOP3-2, and SELOP1-0, select the multiplier and ALU operands over 64-bit paths from the RA and RB registers (enabled by ENRA and ENRB), the C register, the product register, and the sum register.]
Figure 16. Operand Selection Multiplexer
Table 11. ALU Input Selection

A2 (MUX3) INPUT                          B2 (MUX4) INPUT
SELOP3  SELOP2  OPERAND SOURCE†          SELOP1  SELOP0  OPERAND SOURCE†
0       0       Reserved                 0       0       Reserved
0       1       C register               0       1       C register
1       0       Multiplier feedback      1       0       ALU feedback
1       1       RA input register        1       1       RB input register

† For division or square root operations, only RA and RB registers can be selected as sources.
As shown in Tables 10 and 11, data operands can be selected from five possible
sources, including external inputs from the RA and RB registers, feedback from the
P (Product) and S (Sum) registers, and a stored value in the C register. Contents of
the C register may be selected as either the A or the B operand in the ALU, the multiplier,
or both. When an external input is selected, the RA input always becomes the A
operand, and the RB input is the B operand.
Feedback from the ALU can be selected as the A operand to the multiplier or as the
B operand to the ALU. Similarly, multiplier feedback may be used as the A operand
to the ALU or the B operand to the multiplier. During division or square root operations,
operands may not be selected except from the RA and RB input registers
(SELOP7-SELOP0 = 11111111).
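The operand selections of Tables 10 and 11 can be decoded from a single 8-bit SELOP7-SELOP0 value. The sketch below is a hypothetical decoder for use in microcode tooling; the dictionary contents follow the two tables, while the names and the return format are assumptions.

```python
# Two-bit field -> operand source, from Tables 10 and 11.
MULTIPLIER_A1 = {0b00: "reserved", 0b01: "C register",
                 0b10: "ALU feedback", 0b11: "RA input register"}
MULTIPLIER_B1 = {0b00: "reserved", 0b01: "C register",
                 0b10: "multiplier feedback", 0b11: "RB input register"}
ALU_A2 = {0b00: "reserved", 0b01: "C register",
          0b10: "multiplier feedback", 0b11: "RA input register"}
ALU_B2 = {0b00: "reserved", 0b01: "C register",
          0b10: "ALU feedback", 0b11: "RB input register"}


def decode_selop(selop):
    """Decode SELOP7-SELOP0 into the four operand sources."""
    if not 0 <= selop <= 0xFF:
        raise ValueError("SELOP7-SELOP0 is an 8-bit value")
    return {
        "A1": MULTIPLIER_A1[(selop >> 6) & 0b11],  # SELOP7-6
        "B1": MULTIPLIER_B1[(selop >> 4) & 0b11],  # SELOP5-4
        "A2": ALU_A2[(selop >> 2) & 0b11],         # SELOP3-2
        "B2": ALU_B2[selop & 0b11],                # SELOP1-0
    }


def valid_for_divide_or_sqrt(selop):
    """Division and square root require RA and RB on every multiplexer."""
    return selop == 0b11111111


if __name__ == "__main__":
    # Multiply new RA * RB inputs while accumulating P feedback + S feedback.
    print(decode_selop(0b11111010))
    print(valid_for_divide_or_sqrt(0b11111111))
```

The value 11111010 in the usage example corresponds to the multiply/accumulate arrangement discussed earlier: the multiplier takes RA and RB, and the ALU adds multiplier feedback to ALU feedback.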
Selection of operands also interacts with the selected operation in the ALU or the
multiplier. ALU operations with one operand are performed only on the A operand (with
the exception of the Pass B operation). Also, depending on the instruction selected,
the B operand may optionally be forced to zero in the ALU or to one in the multiplier.
If an operation uses one or more feedback registers as operands, the unused bus(es)
can be used to preload operand(s) for a later operation. The data is loaded into the
RA or RB input register(s); when the data is needed as an operand, the SELOP pins
are set to select the RA or RB register(s), but the register input enables (ENRA, ENRB)
are not asserted. The one restriction on preloading data is that the operation being
performed during the preload MUST use the same data type (single-precision,
double-precision, or integer) as the data being loaded. Operands cannot be preloaded within
square root or divide instructions.
C Register
The 64-bit constant (C) register is available for storing the result of an ALU or multiplier
operation before feedback to the multiplier or ALU. The C register has a separate clock
input (CLKC), input source select (SRCC), and write enable (ENRC, active low).
The C register loads from the P or the S register output, depending on the setting of
SRCC. SRCC = 1 selects the multiplier as the input source; SRCC = 0 selects the
ALU. The SRCC input is not registered with the instruction inputs.
Depending on the operation selected and the settings of PIPES2-PIPES0, an offset
of one or more cycles may be necessary to load the desired result into the C register.
The register loads only on a rising edge of CLKC when ENRC is low (see Figure 17).
[Timing diagram: CLKC and CLK waveforms, with the interval td† marked.]
† td is the clock cycle period.
Figure 17. C Register Timing
A separate control (FLOWC) is available to bypass the C register when feeding an
operand back on the C register feedback bus. When FLOWC is high, the output of
the P or S register (as selected by SRCC) bypasses the C register without affecting
the C register's contents. Direct P or S feedback is unaffected by the FLOWC setting.
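The interaction of SRCC, ENRC, and FLOWC can be summarized with a small behavioral model. The class below is a sketch written from the description above, with hypothetical method names; it treats each call to `clock` as one rising edge of CLKC and ignores the cycle offsets that depend on PIPES2-PIPES0.

```python
class CRegisterModel:
    """Behavioral sketch of the C register and its bypass path."""

    def __init__(self):
        self.value = 0

    @staticmethod
    def _selected(p, s, srcc):
        # SRCC = 1 selects the multiplier (P); SRCC = 0 selects the ALU (S).
        return p if srcc == 1 else s

    def clock(self, p, s, srcc, enrc):
        """Rising edge of CLKC: load only when ENRC (active low) is 0."""
        if enrc == 0:
            self.value = self._selected(p, s, srcc)

    def feedback(self, p, s, srcc, flowc):
        """Value seen on the C register feedback bus.

        With FLOWC high, the selected P or S output bypasses the register
        and the stored contents are left unchanged.
        """
        if flowc == 1:
            return self._selected(p, s, srcc)
        return self.value


if __name__ == "__main__":
    c = CRegisterModel()
    c.clock(p=6.0, s=5.0, srcc=1, enrc=0)              # store the product
    print(c.feedback(p=7.0, s=8.0, srcc=0, flowc=0))   # stored value: 6.0
    print(c.feedback(p=7.0, s=8.0, srcc=0, flowc=1))   # bypass with S: 8.0
    print(c.value)                                     # still 6.0
```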
Pipelined ALU
The pipelined ALU contains a circuit for floating point addition and/or subtraction of
aligned operands, a pipeline register, an exponent adjuster and a normalizer/rounder
as shown in Figure 18. An exception circuit is provided to detect denormal inputs;
these can be flushed to zero if the FAST input is set high. If the FAST input is low,
the ALU accepts a denormal as input. A denorm exception flag (DENORM) goes high
when the ALU output is a denormal.
Integer processing in the ALU includes both arithmetic and logical operations on either
two's complement numbers or unsigned integers. The ALU performs addition,
subtraction, comparison, logical shifts, logical AND, logical OR, and logical XOR.
The ALU may be operated independently or in parallel with the multiplier. Possible ALU
functions during independent operation are given in Table 12.
[Functional blocks: exponent subtracter, prealignment, integer ALU, normalizer, rounder.]
Figure 18. Functional Diagram for ALU
Table 12. Independent ALU Operations

SINGLE OPERAND               TWO OPERANDS
Pass                         Add
Move                         Subtract
Format Conversions           Compare
Wrap Denormalized Number     AND
Unwrap                       OR
Shift                        XOR
Pipelined Multiplier
The pipelined multiplier (see Figure 19) performs a basic multiply function, division
and square root. The operands can be single-precision or double-precision floating point
numbers and can be converted to absolute values before multiplication takes place.
Integer operands may also be used. Independent multiplier operations are summarized
in Table 13.
If the operands to the multiplier are double precision or mixed precision (i.e., one single
precision and one double precision), then one extra clock cycle is required to get the
product through the multiplier pipeline. This means that for PIPES1 = 1, one clock
cycle is required for the multiplier pipeline; for PIPES1 = 0, two clock cycles are required
for the multiplier pipeline.
[Figure 19 block diagram not reproduced. Blocks shown: recoder, multiplier/divider, converter, normalizer.]
Figure 19. Functional Diagram for Multiplier
Table 13. Independent Multiplier Operations
SINGLE OPERAND
Square Root
TWO OPERANDS
Multiply
Divide
An exception circuit is provided to detect denormalized inputs; these are indicated
by a high on the DENIN signal. Denormalized inputs must be wrapped by the ALU before
multiplication, division, or square root. If results are wrapped (signaled by a high on
the DENORM status pin), they must be unwrapped by the ALU.
The multiplier and ALU can be operated simultaneously by setting the I10 instruction
input high. Division and square root are performed as independent multiplier operations,
even though both multiplier and ALU are active during divide and SQRT operations.
Data Output Controls
Selection and duration of results from the Y output multiplexer may be affected by
several factors, including the operation selected, precision of the operands, registers
enabled, and the next operation to be performed. The data output controls are not
registered with the data and instruction inputs. When the device is microprogrammed,
the effects of pipelining and sequencing of operations should be taken into account.
Two particular conditions need to be considered. Depending on which registers are
enabled, an offset of one or more cycles must be allowed before a valid result is available
at the Y output multiplexer. Also, certain sequences of operations may require both
halves of a double-precision result to be read out within a single clock cycle. This is
done by toggling the SELMS/LS signal in the middle of the clock period.
When a single-precision result is output, the SELMS/LS signal has no effect. The
SELMS/LS signal is set low only to read out the LSH of a double-precision result (see
Figure 20). To read out a result on the Y bus, the output enable OEY must be low.
OEY is an asynchronous signal.
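As a rough illustration of the Y output controls just described, the sketch below models the 32-bit Y bus as a function of SELMS/LS and OEY. The function name is hypothetical, a disabled bus is represented by None, and a single-precision result is simply passed in as a 32-bit value because this section does not say where it sits internally.

```python
from typing import Optional


def y_bus(result: int, double_precision: bool, selms_ls: int, oey: int) -> Optional[int]:
    """Return the 32-bit value on the Y bus, or None when the bus is disabled."""
    if oey != 0:
        # OEY is active low: a high level disables the Y bus.
        return None
    if not double_precision:
        # SELMS/LS has no effect on a single-precision result.
        return result & 0xFFFFFFFF
    if selms_ls == 1:
        return (result >> 32) & 0xFFFFFFFF  # most significant half (MSH)
    return result & 0xFFFFFFFF              # least significant half (LSH)
```

Reading both halves of a double-precision result within one clock cycle then corresponds to calling this with SELMS/LS at one level and again at the other.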
[Figure 20 block diagram not reproduced. Labels shown: product register (64), sum register (64), I6 from the instruction register, SELMS/LS, Y bus.]
Figure 20. Y Output Control
Parity Checker/Generator
When BYTEP is high, internal even parity is generated for each byte of input data at
the DA and DB ports and compared to the PA and PB parity inputs respectively. If
an odd number of bits is set high in a data byte, a parity check can also be performed
on the entire input data word by setting BYTEP low. In this mode, PA0 is the parity
input for DA data and PB0 is the parity input for DB data.
Even parity is generated for the Y multiplexer output, either for each byte or for each
word of output, depending on the setting of BYTEP. When BYTEP is high, the parity
generator computes four parity bits, one for each byte of the Y multiplexer output.
Parity bits are output on the PY3-PY0 pins; PY0 represents parity for the least significant
byte. A single parity bit can also be generated for the entire output data word by setting
BYTEP low. In this mode, PY0 is the parity output.
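The byte and word parity modes can be sketched as follows. This is a minimal illustration under the usual definition of even parity (the parity bit is 1 when the data contains an odd number of 1 bits); the function names are hypothetical.

```python
from typing import List


def even_parity_bit(value: int) -> int:
    """Return 1 when value has an odd number of 1 bits, so data plus parity is even."""
    return bin(value).count("1") & 1


def y_parity(word32: int, bytep: int) -> List[int]:
    """Parity outputs for a 32-bit Y word.

    BYTEP = 1: returns [PY0, PY1, PY2, PY3], one bit per byte, PY0 for the
    least significant byte. BYTEP = 0: returns [PY0] for the whole word.
    """
    word32 &= 0xFFFFFFFF
    if bytep == 1:
        return [even_parity_bit((word32 >> (8 * i)) & 0xFF) for i in range(4)]
    return [even_parity_bit(word32)]
```

The same per-byte or per-word computation applies on the input side, where the generated parity is compared with the PA and PB inputs.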
Master/Slave Comparator
A master/slave comparator is provided to compare data bytes from the Y output
multiplexer and the status outputs with data bytes on the external Y and status ports
when OEY, OES and OEC are high. If the data bytes are not equal, a high signal is
generated on the master/slave error output pin (MSERR).
Figure 21 shows an example master/slave circuit. Two 'ACT8847 slave devices verify
the data/status integrity of the 'ACT8847 master.
[Figure 21 block diagram not reproduced. Labels shown: arbitration/control logic, Y out, status out, output.]
Figure 21. Example of Master/Slave Operation
Status and Exception Generation
A status and exception generator produces several output signals to indicate invalid
operations as well as overflow, underflow, non-numerical and inexact results, in
conformance with IEEE Standard 754-1985. If output registers are enabled
(PIPES2 = 0), status and exception results are latched in the status register on the rising
edge of the clock. Status results are valid at the same time as associated data results
are valid.
Duration and availability of status results are affected by the same timing constraints
that apply to data results on the Y bus. Status outputs are enabled by two signals,
OEC for comparison status and OES for other status and exception outputs. Status
outputs are summarized in Tables 14 and 15.
Table 14. Comparison Status Outputs
SIGNAL    RESULT OF COMPARISON (ACTIVE HIGH)
AEQB      The A and B operands are equal. A high signal on the AEQB output indicates a
          zero result from the selected source except during a compare operation in the ALU.
          During integer operations, indicates zero status output.
AGTB      The A operand is greater than the B operand.
UNORD     The two inputs of a comparison operation are unordered, i.e., one or both of the
          inputs is a NaN.
During a compare operation in the ALU, the AEQB output goes high when the A and
B operands are equal. When any operation other than a compare is performed, either
by the ALU or the multiplier, the AEQB signal is used as a zero detect.
Table 15. Status Outputs
SIGNAL    STATUS RESULT
CHEX      If I6 is low, indicates the multiplier is the source of an exception during a
          chained function. If I6 is high, indicates the ALU is the source of an exception
          during a chained function.
DENIN     Input to the multiplier is a denorm. When DENIN goes high, the STEX pins
          indicate which port had the denormal input.
DENORM    The multiplier output is a wrapped number or the ALU output is a denorm. In
          the FAST mode, this condition causes the result to go to zero. It also indicates
          an invalid integer operation, i.e., PASS (-A) with unsigned integer operand.
DIVBY0    An invalid operation involving a zero divisor has been detected by the multiplier.
ED        Exception detect status signal representing logical OR of all enabled exceptions
          in the exception disable register.
INEX      The result of an operation is not exact.
INF       The output is the IEEE representation of infinity.
IVAL      A NaN has been input to the multiplier or the ALU, or an invalid operation
          [(0 * ∞) or (+∞ - ∞) or (-∞ + ∞)] has been requested. This signal also goes
          high if an operation involves the square root of a negative number. When IVAL
          goes high, the STEX pins indicate which port had the NaN.
NEG       Output value has negative sign.
OVER      The result is greater than the largest allowable value for the specified format.
RNDCO     The mantissa of a number has been increased in magnitude by rounding. If the
          number generated was wrapped, then the unwrap round instruction must be used
          to properly unwrap the wrapped number (see Table 8).
SRCEX     The status was generated by the multiplier. (When SRCEX is low, the status was
          generated by the ALU.)
STEX0     A NaN or a denorm has been input on the B port.
STEX1     A NaN or a denorm has been input on the A port.
UNDER     The result is inexact and less than the minimum allowable value for the specified
          format. In the FAST mode, this condition causes the result to go to zero.
In chained mode, results to be output are selected based on the state of the I6 (source
output) pin (if I6 is low, ALU status will be selected; if I6 is high, multiplier status
will be selected). If the nonselected output source generates an exception, CHEX is
set high. Status of the nonselected output source can be forced using the SELST pins,
as shown in Table 16.
[Figure 22 block diagram not reproduced. Labels shown: multiplier status register (18), ALU status register (18), SELST1-0 and I6 select inputs, OES, status output, OEC, comparison status output.]
Figure 22. Status Output Control
Table 16. Status Output Selection (Chained Mode)
SELST1-SELST0    STATUS SELECTED
0 0              Logical OR of ALU and multiplier exceptions (bit by bit)
0 1              Selects multiplier status
1 0              Selects ALU status
1 1              Normal operation (selection based on result source specified by I6 input)
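The selection in Table 16 can be written as a short function. This is only a sketch: the function name is hypothetical, and each status word is treated as an integer bit field whose layout is not specified here.

```python
def select_status(selst: int, i6: int, alu_status: int, mult_status: int) -> int:
    """Choose the status word driven out in chained mode (per Table 16)."""
    if selst == 0b00:
        return alu_status | mult_status   # bit-by-bit OR of both exception sets
    if selst == 0b01:
        return mult_status                # force multiplier status
    if selst == 0b10:
        return alu_status                 # force ALU status
    if selst == 0b11:
        # Normal operation: follow the result source chosen by I6
        # (I6 low selects the ALU, I6 high selects the multiplier).
        return mult_status if i6 == 1 else alu_status
    raise ValueError("SELST1-SELST0 must be a 2-bit value")
```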
An exception detect mask register is available to mask out selected exceptions from
the multiplier, ALU, or both. Multiply status is disabled during an independent ALU
instruction, and ALU status is disabled during multiplier instructions. During chained
operation, both status outputs are enabled.
When the exception mask register has been loaded with a mask, the mask is applied
to the contents of the status register to disable unnecessary exceptions. Status results
for enabled exceptions are then ORed together and, if true, the exception detect (ED)
status output pin is set high (see Figure 23). Individual status outputs remain active
and can be read independently from mask register operations.
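The masking step can be illustrated with a few lines of code. Everything specific here is an assumption made for the sketch: the bit positions are invented, and a 1 in the mask is taken to mean "exception enabled", since this section does not give the mask register layout or polarity.

```python
# Hypothetical bit positions, chosen only for this sketch.
STATUS_BITS = {"IVAL": 0, "OVER": 1, "UNDER": 2, "INEX": 3, "DIVBY0": 4, "DENORM": 5}


def status_word(*flags: str) -> int:
    """Build a status or mask word from flag names."""
    word = 0
    for name in flags:
        word |= 1 << STATUS_BITS[name]
    return word


def exception_detect(status: int, enabled_mask: int) -> int:
    """ED is the logical OR of all status bits that the mask leaves enabled."""
    return 1 if (status & enabled_mask) != 0 else 0
```

For instance, with only OVER and UNDER enabled, an INEX result alone leaves ED low, while the individual INEX output would still be readable.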
[Figure 23 block diagram not reproduced. Labels shown: exception detect mask, multiplier, ALU, SELST1-0 and I6 select inputs, OES, ED.]
Figure 23. Exception Detect Mask Logic
Microprogramming the 'ACT8847
Because the 'ACT8847 is microprogrammable, it can be configured to operate on either
integer or single- or double-precision data operands, and the operations of the registers,
ALU, and multiplier can be programmed to support a variety of applications. The
following sections present not only control settings but the timings of the specific
operations required to execute the sample instructions.
Control Inputs
Control inputs to the 'ACT8847 are summarized in Table 17 below. Several of the
inputs have already been discussed; refer to the page listed in the table for detailed
information.
The remaining inputs are discussed in the following sections. All control signals and
their associated tables are also listed in the 'ACT8847 Reference Guide to provide
a complete, easy-to-access reference for the programmer already familiar with
'ACT8847 operation.
Table 17. Control Inputs
SIGNAL           HIGH                                 LOW                                  PAGE NO.
BYTEP            Selects byte parity generation and   Selects single bit parity generation 7-75
                 test                                 and test
CLK              Clocks all registers (except C) on   No effect                            7-62
                 rising edge
CLKC             Clocks C register on rising edge     No effect                            7-70
CLKMODE          Enables temporary input register     Enables temporary input register     7-66
                 load on falling clock edge           load on rising clock edge
CONFIG1-CONFIG0  See Table 6 (RA and RB register      See Table 6 (RA and RB register      7-65
                 data source selects)                 data source selects)
ENRC             No effect                            Enables C register load when CLKC    7-70
                                                      goes high
ENRA             If register is not in flowthrough,   If register is not in flowthrough,   7-65
                 enables clocking of RA register      holds contents of RA register
ENRB             If register is not in flowthrough,   If register is not in flowthrough,   7-65
                 enables clocking of RB register      holds contents of RB register
FAST             Places device in FAST mode           Places device in IEEE mode           7-84
FLOWC            Causes output value to bypass C      No effect                            7-72
                 register and appear on C register
                 output bus
HALT             No effect                            Stalls device operation but does     7-85
                                                      not affect registers, internal
                                                      states, or status. C register
                                                      loading is not disabled
OEC              Disables compare pins                Enables compare pins                 7-77
OES              Disables status outputs              Enables status outputs               7-77
OEY              Disables Y bus                       Enables Y bus                        7-74
PIPES2-PIPES0    See Table 5 (Pipeline Mode Control)  See Table 5 (Pipeline Mode Control)  7-62
RESET            No effect                            Clears internal states, status,      7-86
                                                      internal pipeline registers, and
                                                      exception disable register. Does
                                                      not affect other data registers
RND1-RND0        See Table 18 (Rounding Mode Control) See Table 18 (Rounding Mode Control) 7-84
SELOP7-SELOP0    See Tables 10 and 11 (Multiplier/ALU See Tables 10 and 11 (Multiplier/ALU 7-68
                 operand selection)                   operand selection)
SELMS/LS         Selects MSH of 64-bit result for     Selects LSH of 64-bit result for     7-74
                 output on the Y bus (no effect on    output on the Y bus (no effect on
                 single-precision operands)           single-precision operands)
SELST1-SELST0    See Table 16 (Status Output          See Table 16 (Status Output          7-78
                 Selection)                           Selection)
SRCC             Selects multiplier result for input  Selects ALU result for input to      7-70
                 to C register                        C register
TP1-TP0          See Table 20 (Test Pin Control       See Table 20 (Test Pin Control       7-86
                 Inputs)                              Inputs)
Rounding Modes
The 'ACT8847 supports the four IEEE standard rounding modes: round to nearest,
round towards zero (truncate), round towards infinity (round up), and round towards
minus infinity (round down). The rounding function is selected by control pins RND1
and RND0, as shown in Table 18.
Table 18. Rounding Modes
RND1-RND0    ROUNDING MODE SELECTED
0 0          Round towards nearest
0 1          Round towards zero (truncate)
1 0          Round towards infinity (round up)
1 1          Round towards negative infinity (round down)
Rounding mode should be selected to minimize procedural errors which may otherwise
accumulate and affect the accuracy of results. Rounding to nearest introduces a
procedural error not exceeding half of the least significant bit for each rounding
operation. Since rounding to nearest may involve rounding either upward or downward
in successive steps, rounding errors tend to cancel each other.
In contrast, directed rounding modes may introduce errors approaching one bit for
each rounding operation. Since successive rounding operations in a procedure may
all be similarly directed, each introducing up to a one-bit error, rounding errors may
accumulate rapidly, especially in single-precision operations.
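The accumulation effect can be demonstrated with a small experiment. The sketch below is a decimal analogy, not a model of the device: it uses Python's decimal module with a short precision (three digits by default) in place of binary floating point, and maps the RND1-RND0 codes of Table 18 onto the corresponding decimal rounding modes. The function name and parameters are hypothetical.

```python
from decimal import (Context, Decimal, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN)

# RND1, RND0 codes from Table 18 mapped to decimal rounding modes.
RND_MODES = {
    (0, 0): ROUND_HALF_EVEN,  # round to nearest
    (0, 1): ROUND_DOWN,       # round towards zero (truncate)
    (1, 0): ROUND_CEILING,    # round towards infinity
    (1, 1): ROUND_FLOOR,      # round towards negative infinity
}


def accumulate(term: str, count: int, rnd1: int, rnd0: int, digits: int = 3) -> Decimal:
    """Add `term` to a running total `count` times, rounding after every addition."""
    ctx = Context(prec=digits, rounding=RND_MODES[(rnd1, rnd0)])
    step = Decimal(term)
    total = Decimal(0)
    for _ in range(count):
        total = ctx.add(total, step)  # each sum is rounded to `digits` digits
    return total
```

Comparing accumulate("0.1234", 1000, 1, 1) and accumulate("0.1234", 1000, 1, 0) with the exact sum 123.4 shows the two directed modes bracketing the true value from below and above, since every rounding step pushes the total in the same direction.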
FAST and IEEE Modes
The device can be programmed to operate in FAST mode by asserting the FAST pin.
In the FAST mode, all denormalized inputs and outputs are forced to zero.
Placing a zero on the FAST pin causes the chip to operate in IEEE mode. In this mode,
the ALU can operate on denormalized inputs and return denormals. If a denorm is input
to the multiplier, the DENIN flag will be asserted, and the result will be invalid. Denormal
numbers must be wrapped before being input to the multiplier. If the multiplier result
underflows, a wrapped number will be output.
Handling of Denormalized Numbers (FAST)
The FAST input selects the mode for handling denormalized inputs and outputs. When
the FAST input is set low, the ALU accepts denormalized inputs but the multiplier
generates an exception when a denormal is input. When FAST is set high, the DENIN
status exception is disabled and all denormalized numbers, both inputs and results,
are forced to zero.
A denormalized input has the form of a floating point number with a zero exponent,
a nonzero mantissa, and a zero in the leftmost bit of the mantissa (hidden or implicit
bit). A denormalized number results from decrementing the biased exponent field to
zero before normalization is complete. Since a denormalized number cannot be input
to the multiplier, it must first be converted to a wrapped number by the ALU. When
the mantissa of the denormal is normalized by shifting it left, the exponent field
decrements from all zeros (wraps past zero) to a negative two's complement number
(except in the case of 0.1XXX..., where the exponent is not decremented).
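The wrapping step can be made concrete for a single-precision operand. The sketch below is one reading of the paragraph above, not the device's documented bit layout: it shifts the 23-bit fraction left until the hidden bit is set and reports the resulting exponent as a signed integer (zero for the 0.1XXX... case, negative otherwise). The function name is hypothetical.

```python
def wrap_single_denormal(bits: int):
    """Normalize a single-precision denormal given as a 32-bit pattern.

    Returns (sign, wrapped_exponent, fraction): the 23-bit fraction left
    after normalization and the exponent as a signed integer, which in
    hardware would be held in two's complement form.
    """
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent != 0 or fraction == 0:
        raise ValueError("not a denormalized number")
    shift = 0
    while (fraction >> 23) & 1 == 0:  # shift until the hidden-bit position is set
        fraction <<= 1
        shift += 1
    # A denormal is scaled as if its exponent field were 1, so one shift lands
    # on exponent 0 and every further shift moves it below zero.
    wrapped_exponent = 1 - shift
    return sign, wrapped_exponent, fraction & 0x7FFFFF
```

For the smallest denormal (bit pattern 0x00000001) the fraction needs 23 shifts, giving a wrapped exponent of -22; for 0x00400000, whose fraction is 0.1000..., a single shift suffices and the exponent stays at zero.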
Exponent underflow is possible during multiplication of small operands even when the
operands are not wrapped numbers. Setting FAST = 0 selects gradual underflow so
that denormal inputs can be wrapped and wrapped results are not automatically
discarded. When FAST is set high, denormal inputs and wrapped results are forced
to zero immediately.
When the multiplier is in IEEE mode and produces a wrapped number as its result,
the result may be passed to the ALU and unwrapped. If the wrapped number can be
unwrapped to an exact denormal, it can be output without causing the underflow status
flag (UNDER) to be set. UNDER goes high when a result is an inexact denormal, and
a zero is output from the FPU if the wrapped result is too small to represent as a
denormal (smaller than the minimum denorm). Table 19 describes the handling of
wrapped multiplier results and the status flags that are set when wrapped numbers
are output from the multiplier.
Table 19. Handling Wrapped Multiplier Outputs
TYPE OF RESULT          STATUS FLAGS SET           NOTES
                        DENORM   INEX   RNDCO
Wrapped, exact          1        0      0          Unwrap with 'Wrapped exact' ALU instruction
Wrapped, inexact        1        1      0          Unwrap with 'Wrapped inexact' ALU instruction
Wrapped, increased in   1        1      1          Unwrap with 'Wrapped rounded' ALU instruction
magnitude
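Table 19 amounts to a lookup from the three status flags to the unwrap instruction. A minimal sketch of that lookup is shown below; the function name is hypothetical and the returned strings are just the instruction names quoted in the table, not opcode values.

```python
from typing import Optional

# (DENORM, INEX, RNDCO) -> unwrap instruction named in Table 19.
UNWRAP_BY_FLAGS = {
    (1, 0, 0): "Wrapped exact",
    (1, 1, 0): "Wrapped inexact",
    (1, 1, 1): "Wrapped rounded",
}


def unwrap_instruction(denorm: int, inex: int, rndco: int) -> Optional[str]:
    """Pick the ALU unwrap instruction for a multiplier result, if one is needed."""
    if denorm == 0:
        return None  # result is not wrapped, nothing to unwrap
    try:
        return UNWRAP_BY_FLAGS[(denorm, inex, rndco)]
    except KeyError:
        raise ValueError("flag combination not listed in Table 19") from None
```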
When operating in chained mode, the multiplier may output a wrapped result to the
ALU during the same clock cycle that the multiplier status is output. In such a case
the ALU cannot unwrap the operand prior to using it, for example, when accumulating
the results of previous multiplications. To avoid this situation, the FPU can be operated
in FAST mode to simplify exception handling during chained operations. Otherwise,
wrapped outputs from the multiplier may adversely affect the accuracy of the chained
operation, because a wrapped number may appear to be a large normalized number
instead of a very small denormalized number.
Because of the latency associated with interpreting the FPU status outputs and
determining how to process the wrapped output, it is necessary that a wrapped operand
be stored external to the FPU (for example, in an external register file) and reloaded
to the A port of the ALU for unwrapping and further processing.
Stalling the Device
Operation of the 'ACT8847 can be stalled nondestructively by means of the HALT
signal. Bringing the HALT input low causes the device to inhibit the next rising clock
edge. Register contents are unaltered when the device is stalled, and normal operation
resumes at the next low clock period after the HALT signal is set high.
Stalling the device does not stall the C register. If ENRC is low, CLKC will clock in
data from the source selected by SRCC.
For some operations, such as a double-precision multiply with CLKMODE = 1, setting
the HALT input low may interrupt loading of the RA, RB, and instruction registers,
as well as stalling operation. In clock mode 1, the temporary register loads on the falling
edge of the clock, but the HALT signal going low would prevent the RA, RB, and
instruction registers from loading on the next rising clock edge. It is therefore necessary
to have the instruction and data inputs on the pins when the HALT signal is set high
again and normal operation resumes.
RESET
The RESET input is an active-low signal that asynchronously clears the internal states,
status, and exception disable mask. Internal pipeline registers are cleared, but the RA,
RB, and C registers are not. Operation resumes when RESET goes high again.
Test Pins
Two pins, TP1-TPO, support system testing. These may be used, for example, to place
all outputs in a high-impedance state, isolating the chip from the rest of the system
(see Table 20).
Table 20. Test Pin Control Inputs
TP1-TP0    OPERATION
0 0        All outputs and I/Os are forced low
0 1        All outputs and I/Os are forced high
1 0        All outputs are placed in a high impedance state
1 1        Normal operation
Independent ALU Operations
Configuration and operation of the 'ACT8847 can be selected to perform single- or
double-precision floating point and integer calculations in operating modes ranging from
flowthrough to fully pipelined. Timing and sequences of operations are affected by
settings of clock mode, data and status registers, input data configurations, and
rounding mode, as well as the instruction inputs controlling the ALU and the multiplier.
Three modes of operation can be selected with inputs I10-I0, including independent
ALU operation, independent multiplier operation, or simultaneous (chained) operation
of ALU and multiplier. Each of these operating modes is treated separately in the
following sections.
The ALU executes single- and double-precision operations which can be divided
according to the number of operands involved, one or two. Tables 21 and 22 show
independent ALU operations with one operand, along with the inputs I10-I0 which
select each operation. Conversions from one format to another are handled in this mode,
with the exception of adjustments to precision during two-operand ALU operations.
The wrapping and unwrapping of operands is also done in this mode.
Most format conversions involve double-precision timing. Conversions between single-
and double-precision floating point format are treated as mixed-precision operations
requiring two cycles to load the operands. A single-precision number is loaded in the
upper half (MSH) of its input register. During integer to floating point conversions,
the integer input should be loaded into the upper half of the RA register. If converting
from integer to double precision, then two cycles are required.
Logical shifts can be performed on integer operands using the instructions shown in
Table 22. The data operand to be shifted is input from any valid operand source and
the number of bit positions the operand is to be shifted is input only from the DB bus.
The shift number on the DB bus should be in positive 32-bit integer format, although
only the lowest eight bits are used. The shift number cannot be selected from sources
other than the RB register, and the shift number must be loaded on the same cycle
as the instruction.
[Timing diagram not reproduced. Waveforms shown include SELMS/LS and OUT(31,0), STATUS(18,0).]
NOTE: Assume PIPES2-0=110, CLKMODE=0, CONFIG1-0=00, ENRA=1, ENRB=1, OEY=0, OEC=OES=0, RESET=HALT=1, TP1-0=11
Figure 29. Double-Precision Independent ALU Operation, Input Registers Enabled
(PIPES2-PIPES0 = 110, CLKMODE = 0)
[Timing diagram not reproduced. Waveforms shown: CLK; INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs; SELMS/LS; OUT(31,0), STATUS(18,0). Steps marked: load half and rest of each operand pair, begin first and second operations, load output.]
NOTE: Assume PIPES2-0=010, CLKMODE=1, CONFIG1-0=11, ENRA=1, ENRB=1, OEY=0, OEC=OES=0, RESET=HALT=1, TP1-0=11
Figure 30. Double-Precision Independent ALU Operation, Input and Output Registers Enabled
(PIPES2-PIPES0 = 010, CLKMODE = 1)
[Timing diagram not reproduced. Waveforms shown: CLK; INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs; SELMS/LS; OUT(31,0), STATUS(18,0). Steps marked: load half and rest of each operand pair, begin first and second operations, load pipeline, load output.]
NOTE: Assume PIPES2-0=000, CLKMODE=0, CONFIG1-0=11, ENRA=1, ENRB=1, OEY=0, OEC=OES=0, RESET=HALT=1, TP1-0=11
Figure 31. Double-Precision Independent ALU Operation, All Registers Enabled
(PIPES2-PIPES0 = 000, CLKMODE = 0)
Sample Independent Multiplier Microinstructions
The following independent multiplier timing diagram examples show five register
settings, ranging through fully pipelined. Examples for divide and square root are
included in this section. X = don't care.
[Timing diagram not reproduced. Waveforms shown: INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs; OUT(31,0), STATUS(18,0), with first and second operands and results.]
NOTE: Assume PIPES2-0=111, CONFIG1-0=01, ENRA=X, ENRB=X, SELMS/LS=X, OEY=0,
OEC=OES=0, RESET=HALT=1, TP1-0=11
Figure 32. Single-Precision Independent Multiplier Operation, All Registers
Disabled (PIPES2-PIPES0 = 111, CLKMODE = X)
[Timing diagram not reproduced. Waveforms shown: INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs; OUT(31,0), STATUS(18,0). Steps marked: load first and second operands, begin first and second operations.]
NOTE: Assume PIPES2-0=110, CONFIG1-0=01, ENRA=1, ENRB=1, SELMS/LS=X, OEY=0,
OEC=OES=0, RESET=HALT=1, TP1-0=11
Figure 33. Single-Precision Independent Multiplier Operation, Input
Registers Enabled (PIPES2-PIPES0 = 110, CLKMODE = X)
[Timing diagram not reproduced. Waveforms shown: CLK; INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs; OUT(31,0), STATUS(18,0). Steps marked: load first and second operands, begin first and second operations.]
NOTE: Assume PIPES2-0=010, CONFIG1-0=01, ENRA=1, ENRB=1, SELMS/LS=X, OEY=0,
OEC=OES=0, RESET=HALT=1, TP1-0=11
Figure 34. Single-Precision Independent Multiplier Operation, Input and Output
Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = X)
[Timing diagram not reproduced. Waveforms shown: CLK; INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs; OUT(31,0), STATUS(18,0). Steps marked: load first through fifth operands, begin first through fifth operations, load pipeline, load output.]
NOTE: Assume PIPES2-0=000, CONFIG1-0=01, ENRA=1, ENRB=1, SELMS/LS=X, OEY=0, OEC=OES=0, RESET=HALT=1, TP1-0=11
Figure 35. Single-Precision Independent Multiplier Operation, All Registers Enabled
(PIPES2-PIPES0 = 000, CLKMODE = X)
[Timing diagram not reproduced. Waveforms shown: INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs; SELMS/LS; OUT(31,0), STATUS(18,0).]
NOTE: Assume PIPES2-0 = 111, CLKMODE = 0, CONFIG1-0 = 11, ENRA = X, ENRB = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11
Figure 36. Double-Precision Independent Multiplier Operation, All Registers
Disabled (PIPES2-PIPES0 = 111, CLKMODE = 0)
[Timing diagram not reproduced. Waveforms shown: CLK; INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs; SELMS/LS; OUT(31,0), STATUS(18,0). Steps marked: load half and rest of each operand pair, begin first and second operations, load pipeline.]
NOTE: Assume PIPES2-0 = 110, CONFIG1-0 = 11, ENRA = 1, ENRB = 1, OEY = 0, OEC = OES = 0,
RESET = HALT = 1, TP1-0 = 11
Figure 37. Double-Precision Independent Multiplier Operation, Input Registers
Enabled (PIPES2-PIPES0 = 110, CLKMODE = 1)
[Timing diagram not reproduced; the extracted waveform labels are not recoverable.]
NOTE: Assume PIPES2-0 = 110, CONFIG1-0 = 01, ENRA = 1, ENRB = 1, OEY = 0, OEC = OES = 0,
RESET = HALT = 1, TP1-0 = 11
Figure 44. Double-Precision Floating Point Division
(PIPES2-PIPES0 = 110, CLKMODE = 0)
[Timing diagram not reproduced; CLK cycle labels recovered from the source run to 14.]
NOTE: Assume PIPES2-0 = 100, CONFIG1-0 = 01, ENRA = 1, ENRB = 1, OEY = 0, OEC = OES = 0,
RESET = HALT = 1, TP1-0 = 11
Figure 45. Double-Precision Floating Point Division
(PIPES2-PIPES0 = 100, CLKMODE = 0)
[Timing diagram not reproduced; CLK cycle labels recovered from the source run to 14. Waveforms shown: CLK and INST.]
NOTE: Assume PIPES2-0 = 010, CONFIG1-0 = 01, ENRA = 1, ENRB = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11
Figure 46. Double-Precision Floating Point Division
(PIPES2-PIPES0 = 010, CLKMODE = 1)
[Timing diagram not reproduced; CLK cycle labels recovered from the source run to 14.]
NOTE: Assume PIPES2-0 = 000, CONFIG1-0 = 00, ENRA = 1, ENRB = 1, OEY = 0, OEC = OES = 0,
RESET = HALT = 1, TP1-0 = 11
Figure 47. Double-Precision Floating Point Division, All Registers Enabled
(PIPES2-PIPES0 = 000, CLKMODE = 1)
[Timing diagram not reproduced; CLK cycle labels recovered from the source run to 16.]
NOTE: Assume PIPES2-0 = 110, CONFIG1-0 = 01, ENRA = 1, ENRB = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11. The result appears in the SREG.
Figure 48. Integer Division, Input Registers Enabled
(PIPES2-PIPES0 = 110, CLKMODE = X)
[Timing diagram not reproduced; CLK cycle labels recovered from the source run to 16.]
NOTE: Assume PIPES2-0 = 100, CONFIG1-0 = 01, ENRA = 1, ENRB = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11. The result appears in the SREG.
Figure 49. Integer Division, Input and Pipeline Registers Enabled
(PIPES2-PIPES0 = 100, CLKMODE = X)
[Timing diagram not reproduced; CLK cycle labels recovered from the source run to 17.]
NOTE: Assume PIPES2-0 = 010, CONFIG1-0 = 01, ENRA = 1, ENRB = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11. The result appears in the SREG.
Figure 50. Integer Division, Input and Output Registers
Enabled (PIPES2-PIPES0 = 010, CLKMODE = X)
[Timing diagram not reproduced; CLK cycle labels recovered from the source run to 17. Waveforms shown: CLK and INST.]
NOTE: Assume PIPES2-0 = 000, CONFIG1-0 = 01, ENRA = 1, ENRB = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11. The result appears in the SREG.
Figure 51. Integer Division, All Registers Enabled
(PIPES2-PIPES0 = 000, CLKMODE = X)
[Timing diagram not reproduced; CLK cycle labels recovered from the source run to 11.]
NOTE: Assume PIPES2-0 = 110, CONFIG1-0 = 01, ENRA = 1, ENRB = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11
Figure 52. Single-Precision Floating Point Square Root, Input
Registers Enabled (PIPES2-PIPES0 = 110, CLKMODE = X)
NOTE: Assume PIPES2-0 = 100, CONFIG1-0 = 01, ENRA = 1, ENRB = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11
Figure 53. Single-Precision Floating Point Square Root, Input and Pipeline
Registers Enabled (PIPES2-PIPES0 = 100, CLKMODE = X)
NOTE: Assume PIPES2-0 = 010, CONFIG1-0 = 01, ENRA = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11
Figure 54. Single-Precision Floating Point Square Root, Input and Output
Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = X)
NOTE: Assume PIPES2-0 = 000, CONFIG1-0 = 00, ENRA = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11
Figure 55. Single-Precision Floating Point Square Root, All Registers Enabled
(PIPES2-PIPES0 = 000, CLKMODE = X)
NOTE: Assume PIPES2-0 = 110, CONFIG1-0 = 11, ENRA = 1, OEY = 0, OEC = OES = 0,
RESET = HALT = 1, TP1-0 = 11
Figure 56. Double-Precision Floating Point Square Root, Input
Registers Enabled (PIPES2-PIPES0 = 110, CLKMODE = 1)
NOTE: Assume PIPES2-0 = 100, CONFIG1-0 = 01, ENRA = 1, OEY = 0, OEC = OES = 0,
RESET = HALT = 1, TP1-0 = 11
Figure 57. Double-Precision Floating Point Square Root, Input and Pipeline
Registers Enabled (PIPES2-PIPES0 = 100, CLKMODE = 0)
NOTE: Assume PIPES2-0 = 010, CONFIG1-0 = 10, ENRA = 1, OEY = 0, OEC = OES = 0,
RESET = HALT = 1, TP1-0 = 11
Figure 58. Double-Precision Floating Point Square Root, Input and Output
Registers Enabled (PIPES2-PIPES0 = 010, CLKMODE = 1)
NOTE: Assume PIPES2-0 = 000, CONFIG1-0 = 00, ENRA = 1, OEY = 0, OEC = OES = 0,
RESET = HALT = 1, TP1-0 = 11
Figure 59. Double-Precision Floating Point Square Root, All
Registers Enabled (PIPES2-PIPES0 = 000, CLKMODE = 0)
NOTE: Assume PIPES2-0 = 110, CONFIG1-0 = 01, ENRA = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11. The result appears in the SREG.
Figure 60. Integer Square Root, Input Registers Enabled
(PIPES2-PIPES0 = 110, CLKMODE = X)
NOTE: Assume PIPES2-0 = 100, CONFIG1-0 = 00, ENRA = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11. The result appears in the SREG.
Figure 61. Integer Square Root, Input and Pipeline Registers Enabled
(PIPES2-PIPES0 = 100, CLKMODE = X)
NOTE: Assume PIPES2-0 = 010, CONFIG1-0 = 01, ENRA = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11. The result appears in the SREG.
Figure 62. Integer Square Root, Input and Output Registers Enabled
(PIPES2-PIPES0 = 010, CLKMODE = X)
NOTE: Assume PIPES2-0 = 000, CONFIG1-0 = 00, ENRA = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11. The result appears in the SREG.
Figure 63. Integer Square Root, All Registers Enabled
(PIPES2-PIPES0 = 000, CLKMODE = X)
Sample Chained Mode Microinstructions
The following chained mode timing diagram examples show four register settings,
ranging from fully flowthrough to fully pipelined.
Signals shown: INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs;
OUT(31,0), STATUS(18,0)
NOTE: Assume PIPES2-0 = 111, CONFIG1-0 = 01, ENRA = X, ENRB = X, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11
Figure 64. Single-Precision Chained Mode Operation, All Registers Disabled
(PIPES2-PIPES0 = 111, CLKMODE = X)
Signals shown: CLK; INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs;
OUT(31,0), STATUS(18,0)
NOTE: Assume PIPES2-0 = 110, CONFIG1-0 = 11, ENRA = 1, ENRB = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11
Figure 65. Single-Precision Chained Mode Operation, Input Registers Enabled
(PIPES2-PIPES0 = 110, CLKMODE = 1)
Signals shown: CLK; DATA(31,0) A and B inputs; OUT(31,0), STATUS(18,0)
NOTE: Assume PIPES2-0 = 010, CONFIG1-0 = 01, ENRA = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11
Figure 66. Single-Precision Chained Mode Operation, Input and Output Registers
Enabled (PIPES2-PIPES0 = 010, CLKMODE = X)
Signals shown: CLK; INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs;
OUT(31,0), STATUS(18,0)
NOTE: Assume PIPES2-0 = 000, CONFIG1-0 = 01, ENRA = 1, ENRB = 1, SELMS/LS = X, OEY = 0,
OEC = OES = 0, RESET = HALT = 1, TP1-0 = 11
Figure 67. Single-Precision Chained Mode Operation, All Registers Enabled
(PIPES2-PIPES0 = 000, CLKMODE = X)
Signals shown: CLK; INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs;
SELMS/LS; OUT(31,0), STATUS(18,0)
NOTE: Assume PIPES2-0 = 111, CONFIG1-0 = 11, ENRA = 1, ENRB = 1, OEY = 0, OEC = OES = 0,
RESET = HALT = 1, TP1-0 = 11
Figure 68. Double-Precision Chained Mode Operation, All Registers Disabled
(PIPES2-PIPES0 = 111, CLKMODE = 0)
Signals shown: CLK; INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs;
SELMS/LS; OUT(31,0), STATUS(18,0)
NOTE: Assume PIPES2-0 = 110, CONFIG1-0 = 11, ENRA = 1, OEY = 0, OEC = OES = 0,
RESET = HALT = 1, TP1-0 = 11
Figure 69. Double-Precision Chained Mode Operation, Input Registers Enabled
(PIPES2-PIPES0 = 110, CLKMODE = 1)
Signals shown: CLK; INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs;
SELMS/LS; OUT(31,0), STATUS(18,0)
NOTE: Assume PIPES2-0 = 010, CONFIG1-0 = 10, ENRA = 1, ENRB = 1, OEY = 0, OEC = OES = 0,
RESET = HALT = 1, TP1-0 = 11
Figure 70. Double-Precision Chained Mode Operation, Input and Output Registers
Enabled (PIPES2-PIPES0 = 010, CLKMODE = 0)
Signals shown: CLK; INSTRUCTION: FUNC(10,0), RND(1,0), FAST; DATA(31,0) A and B inputs;
SELMS/LS; OUT(31,0), STATUS(18,0)
NOTE: Assume PIPES2-0 = 000, CONFIG1-0 = 01, ENRA = 1, ENRB = 1, OEY = 0, OEC = OES = 0,
RESET = HALT = 1, TP1-0 = 11
Figure 71. Double-Precision Chained Mode Operation, All Registers Enabled
(PIPES2-PIPES0 = 000, CLKMODE = 0)
Instruction Timing
The following table details the number of clock cycles required to complete an operation
in different pipelined modes. For more detail, see the sample microinstructions shown
in the previous section.
Clock duration and output delay depend on the pipeline mode selected. See the note
in the table and timing parameters listed at the beginning of this document.
Table 31. Number of Clocks Required to Complete an Operation

                                    PIPES2-0   PIPES2-0   PIPES2-0   PIPES2-0   PIPES2-0
OPERATION                           = 000      = 100      = 110      = 111      = 010
                                    (tpd4)     (tpd3)     (tpd2)     (tpd1)     (tpd4)
Single-Precision Floating Point
  ALU Operation or Multiply‡        3          2          1          0          2
  Divide                            8          7          7          X          8
  Square Root                       11         10         10         X          11
Double-Precision Floating Point
  ALU Operation†                    4          3          2          1          3
  Multiply‡                         5          4          3          2          4
  Divide                            14         13         13         X          14
  Square Root                       17         16         16         X          17
Integer
  ALU Operation or Multiply‡        3          2          1          0          2
  Divide                            16         15         15         X          16
  Square Root                       20         19         19         X          20

Y output and status valid following this tpd delay after the designated number of clocks
† Includes every conversion involving double-precision (DP ↔ SP or DP ↔ Integer)
‡ Includes all chained mode operations
X = invalid
When using fast cycle times and double-precision operations, two cycles may be
required to output and capture both halves of a double-precision result. To ensure the
result remains valid for two cycles, a NOP instruction may need to be inserted between
the operations. Table 32 shows the number of NOPs necessary to insert into the
instruction stream for fully pipelined operation (PIPES2-PIPES0 = 000).
Table 32. NOPs Inserted to Guarantee That Double-Precision Results Remain
Valid for Two Clock Cycles (PIPES2-PIPES0 = 000)
1 ST OPERATION
DP -
32 BIT
32 BIT -
DP
32 BIT OP
DP ALU
DP Multiply
FOLLOWED BY
2ND OPERATION
# NOPs INSERTED
BETWEEN OPERATIONS
# CYCLES RESULT
IS VALID
2
2
1
2
2
2
2
DP
32
32
DP
DP
DP
DP
- 32 BIT
BIT - DP
BIT OP
ALU
Multiply
Sqrt
Divide
0
0
0
0
0
0
0
DP
32
32
DP
DP
DP
DP
- 32 BIT
BIT -+ DP
BIT OP
ALU
Multiply
Sqrt
Divide
0
0
DP
32
32
DP
DP
DP
DP
-+ 32 BIT
BIT -+ DP
BIT OP
ALU
Multiply
Sqrt
Divide
0
0
0
0
0
0
0
2
2
DP
32
32
DP
DP
DP
DP
-+ 32 BIT
BIT -+ DP
BIT OP
ALU
Multiply
Sqrt
Divide
0
0
2
2
2
2
2
2
2
DP
32
32
DP
DP
DP
DP
-+ 32 BIT
BIT -+ DP
BIT OP
ALU
Multiply
Sqrt
Divide
1
0
0
0
0
1
0
0
0
0
1
1
2t
1
0
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
NOTE: 32-bit operation refers to a single-precision floating point or integer ALU operation or multiply, except
conversion to or from double-precision. This assumes the instruction following a double-precision divide
may begin loading on the 12th clock cycle, following a double-precision square root on the 15th cycle.
† The device will not load a single-precision operation on the first clock edge following this operation, so any
single-precision instruction may be used. A NOP is recommended. The second instruction must be a NOP.
Table 32. NOPs Inserted to Guarantee That Double-Precision Results Remain
Valid for Two Clock Cycles (PIPES2-PIPES0 = 000) (Continued)

                 FOLLOWED BY       # NOPs INSERTED       # CYCLES RESULT
1ST OPERATION    2ND OPERATION     BETWEEN OPERATIONS    IS VALID
DP Sqrt          DP → 32 BIT       1                     2
                 32 BIT → DP       1                     2
                 32 BIT OP         2†                    2
                 DP ALU            1                     2
                 DP Multiply       0                     2
                 DP Sqrt           0                     2
                 DP Divide         0                     2
DP Divide        DP → 32 BIT       1                     2
                 32 BIT → DP       1                     2
                 32 BIT OP         2†                    2
                 DP ALU            1                     2
                 DP Multiply       0                     2
                 DP Sqrt           0                     2
                 DP Divide         0                     2

NOTE: 32-bit operation refers to a single-precision floating point or integer ALU operation or multiply, except
conversion to or from double-precision. This assumes the instruction following a double-precision divide
may begin loading on the 12th clock cycle, following a double-precision square root on the 15th cycle.
† The device will not load a single-precision operation on the first clock edge following this operation, so any
single-precision instruction may be used. A NOP is recommended. The second instruction must be a NOP.
Exception and Status Handling
Exception and status flags for the 'ACT8847 were listed previously in Tables 14 and 15.
Output exception signals are provided to indicate both the source and type of the
exception. DENORM, INEX, OVER, UNDER, and RNDCO indicate the exception type,
and CHEX and SRCEX indicate the source of an exception. SRCEX indicates the source
of a result as selected by instruction bit I6, and SRCEX is active whenever a result
is output, not only when an exception is being signalled. The chained-mode exception
signal CHEX indicates that an exception has been generated by the source not selected
for output by I6. The exception type signalled by CHEX cannot be read unless status
select controls SELST1-SELST0 are used to force status output from the deselected
source.
Output exceptions may be due either to a result in an illegal format or to a procedural
error. Results too large or too small to be represented in the selected precision are
signalled by OVER and UNDER. When INF is high, the output is the IEEE representation
of infinity. Any ALU output which has been increased in magnitude by rounding causes
INEX to be set high. DENORM is set when the multiplier output is wrapped or the ALU
output is denormalized. DENORM is also set high when an illegal operation on an integer
is performed. Wrapped outputs from the multiplier may be inexact or increased in
magnitude by rounding, which may cause the INEX and RNDCO status signals to be
set high. A denormal output from the ALU (DENORM = 1) may also cause INEX to
be set, in which case UNDER is also signalled.
Ordinarily, SELST1-SELST0 are set high so that status selection defaults to the output
source selected by instruction input I6. The ALU is selected as the output source when
I6 is low, and the multiplier when I6 is high.
When the device operates in chained mode, it may be necessary to read the status
results not associated with the output source. As shown in Table 16, SELST1-SELST0
can be used to read the status of either the ALU or the multiplier regardless of the
I6 setting.
Status results are registered only when the output (P and S) registers are enabled
(PIPES2 = 0). Otherwise, the status register is transparent. In either case, to read
the status outputs, the output enables (OES, OEC, or both) must be low.
Status flags are provided to signal both floating point and integer results. Integer status
is provided using AEQB for zero, NEG for sign, and OVER for overflow/carryout.
Several status exceptions are generated by illegal data or instruction inputs to the FPU.
Input exceptions may cause the following signals to be set high: IVAL, DIVBY0, DENIN,
and STEX1-STEX0. If the IVAL flag is set, either an invalid operation, such as the square
root of -|X|, has been requested or a NaN (Not a Number) has been input. When
DENIN is set, a denormalized number has been input to the multiplier. DIVBY0 is set
when the divisor is zero. STEX1-STEX0 indicate which port (RA, RB, or both) is the
source of the exception when either a denormal is input to the multiplier (DENIN = 1)
or a NaN (IVAL = 1) is input to the multiplier or the ALU.
NaN inputs are all treated as IEEE signalling NaNs, causing the IVAL flag to be set.
When output from the FPU, the fraction field from a NaN is set high (all 1s) and the
sign bit is 0, regardless of the original fraction and sign fields of the input NaN.
When the 'ACT8847 outputs a NaN, it is always in the form of a signalling NaN along
with the IVAL (Invalid) and appropriate STEX flag set high (except for the MOVE A
instruction which passes any operand as is without setting exception flags).
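The NaN output format described above (sign bit 0, fraction all 1s) can be sketched as bit patterns. This is a minimal illustration, assuming the standard IEEE field widths (8-bit exponent and 23-bit fraction for single precision, 11 and 52 for double) and an all-ones exponent; the constant and helper names are hypothetical and not part of the device documentation.

```python
import math
import struct

# Assumed output NaN patterns: sign 0, exponent all 1s, fraction all 1s.
SP_NAN = 0x7FFFFFFF            # 1 sign + 8 exponent + 23 fraction bits
DP_NAN = 0x7FFFFFFFFFFFFFFF    # 1 sign + 11 exponent + 52 fraction bits

def sp_bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def dp_bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE double-precision value."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

# Both patterns decode as NaN on the host.
print(math.isnan(sp_bits_to_float(SP_NAN)), math.isnan(dp_bits_to_float(DP_NAN)))
```

The host only confirms that the patterns decode as NaN; the IVAL and STEX flags are device outputs and are not modeled here.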
Certain operations involving floating point zeros and infinities are invalid, causing the
'ACT8847 to set the IVAL flag and output a NaN. Operations involving zero and infinity
are detailed below.
A floating point zero is represented by an all zero exponent and fraction field. The sign
bit may be 0 or 1, to represent +0 or -0 respectively.
Zero divided by zero is an invalid operation. The result is a NaN with the IVAL and
DIVBY0 flags set. Any other number divided by zero results in the appropriately signed
infinity with the DIVBY0 flag set.
For operations with floating point zeros: ±0 multiplied by any number is the
appropriately signed 0.
+0 + (-0) = +0
+0 + (+0) = +0
-0 + (-0) = -0
-0 + (+0) = +0
+0 - (-0) = +0
+0 - (+0) = +0
-0 - (-0) = +0
-0 - (+0) = -0
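The signed-zero results listed above match what IEEE arithmetic produces in the default round-to-nearest mode, so they can be checked on a host machine. The following is a sketch only, assuming the table reads as first operand, operator, second operand, result; the helper names are hypothetical, and the host's floating point unit stands in for the device.

```python
import math

def zero_sign(x: float) -> str:
    """Return '+0' or '-0' for a floating point zero."""
    return "-0" if math.copysign(1.0, x) < 0 else "+0"

# (first operand, operator, second operand, result as listed in the table)
ZERO_TABLE = [
    (+0.0, "+", -0.0, "+0"),
    (+0.0, "+", +0.0, "+0"),
    (-0.0, "+", -0.0, "-0"),
    (-0.0, "+", +0.0, "+0"),
    (+0.0, "-", -0.0, "+0"),
    (+0.0, "-", +0.0, "+0"),
    (-0.0, "-", -0.0, "+0"),
    (-0.0, "-", +0.0, "-0"),
]

def apply(a: float, op: str, b: float) -> float:
    """Evaluate a + b or a - b with host IEEE arithmetic."""
    return a + b if op == "+" else a - b

for a, op, b, listed in ZERO_TABLE:
    host = zero_sign(apply(a, op, b))
    print(f"{zero_sign(a)} {op} ({zero_sign(b)}) = {host} (table: {listed})")
```

Other rounding modes can change the sign of an exact zero sum under IEEE rules, so the comparison holds only for round-to-nearest on the host.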
Floating point infinity is represented by an all 1 exponent field with an all 0 fraction
field. The sign bit determines positive or negative infinity (0 or 1 respectively).
Infinity divided by infinity is an invalid operation, setting the IVAL flag and resulting
in a NaN output. Division of infinity by any other number results in the appropriately
signed infinity. Division of any number (except infinity or zero) by infinity results in
an appropriately signed zero. Infinity divided by zero results in the appropriately signed
infinity with the DIVBY0 flag set.
For invalid operations with infinity listed below, the output is a signalling NaN with
the IVAL flag set.
± infinity multiplied by ± 0
± infinity divided by ± 0
+ infinity + (- infinity)
- infinity + (+ infinity)
+ infinity - (+ infinity)
- infinity - (- infinity)
Any other number added to or multiplied by infinity results in the appropriately signed
infinity as output.
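The sums, differences, and products of infinities listed above behave the same way in host IEEE arithmetic, which makes them easy to reproduce. This sketch uses Python floats as a stand-in for the device: it covers only addition, subtraction, multiplication, and infinity divided by infinity (Python raises an exception for division by zero rather than returning a value), and it does not model the IVAL or DIVBY0 flags.

```python
import math

INF = math.inf

# Operations listed as invalid: each produces NaN on the host.
INVALID = {
    "+inf * 0": INF * 0.0,
    "-inf * 0": -INF * 0.0,
    "+inf + (-inf)": INF + (-INF),
    "-inf + (+inf)": -INF + INF,
    "+inf - (+inf)": INF - INF,
    "-inf - (-inf)": -INF - (-INF),
    "inf / inf": INF / INF,
}

# Operations described as valid: signed infinity or signed zero results.
VALID = {
    "inf / 2": INF / 2.0,
    "-inf / 2": -INF / 2.0,
    "3 / inf": 3.0 / INF,
    "inf + 1": INF + 1.0,
    "-inf * 2": -INF * 2.0,
}

for name, value in {**INVALID, **VALID}.items():
    print(f"{name:>15} -> {value}")
```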
'ACT8847 Reference Guide
Instruction Inputs
Operations are summarized in Tables 33 through 41.
Table 33. Independent ALU Operations, Single Floating Point Operand

ALU OPERATION ON A OPERAND                              INSTRUCTION INPUTS I10-I0
Pass A operand                                          00x x01x 0000
Pass -A operand                                         00x x01x 0001
Convert from 2's complement integer to
  floating point†                                       00x x010 0010
Convert from floating point to 2's complement
  integer†                                              00x x01x 0011
Move A operand (pass without NaN detect or
  status flags active)                                  00x x01x 0100
Pass B operand                                          00x x01x 0101
Convert from floating point to floating point
  (adjusts precision of input: SP → DP, DP → SP)‡       00x x01x 0110
Floating point to unsigned integer conversion†          00x x01x 0111
Wrap denormal operand                                   00x x01x 1000
Unsigned integer to floating point conversion†          00x x01x 1010
Unwrap exact number                                     00x x01x 1100
Unwrap inexact number                                   00x x01x 1101
Unwrap rounded input                                    00x x01x 1110

NOTES
x = Don't care
I8 selects precision of A operand: 0 = A (SP), 1 = A (DP)
I7 selects precision of B operand and must equal I8.
I4 selects absolute value of A operand: 0 = A, 1 = |A|
During integer to floating point conversion, |A| is not allowed as a result.
† During this operation, I8 selects the precision of the result. If the conversion involves double-precision, the
operation requires 2 cycles to load.
‡ Requires 2 cycles to load the operation, even if input is SP.
Table 34. Independent ALU Operations, Two Floating Point Operands

ALU OPERATIONS AND OPERANDS    INSTRUCTION INPUTS I10-I0
Add A + B                      00x x000 0x00
Add |A| + B                    00x x001 0x00
Add A + |B|                    00x x000 1x00
Add |A| + |B|                  00x x001 1x00
Subtract A - B                 00x x000 0x01
Subtract |A| - B               00x x001 0x01
Subtract A - |B|               00x x000 1x01
Subtract |A| - |B|             00x x001 1x01
Compare A, B                   00x x000 0x10
Compare |A|, B                 00x x001 0x10
Compare A, |B|                 00x x000 1x10
Compare |A|, |B|               00x x001 1x10
Subtract B - A                 00x x000 0x11
Subtract B - |A|               00x x001 0x11
Subtract |B| - A               00x x000 1x11
Subtract |B| - |A|             00x x001 1x11

NOTES
x = Don't Care
I8 selects precision of A operand: 0 = A (SP), 1 = A (DP)
I7 selects precision of B operand: 0 = B (SP), 1 = B (DP)
I2 selects either Y or its absolute value: 0 = Y, 1 = |Y|
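The bit patterns in Table 34 suggest a simple field layout, which the following sketch encodes. It assumes, from the listed codes and notes, that I1-I0 select the operation, I2 selects |Y|, I3 and I4 select |B| and |A|, and I7 and I8 select the B and A operand precision, with I10, I9, I6, and I5 all 0. The function and dictionary names are hypothetical.

```python
# Assumed I1-I0 operation codes, read from the last two bits of each table entry.
ALU2_OPS = {"add": 0b00, "sub_a_b": 0b01, "compare": 0b10, "sub_b_a": 0b11}

def encode_alu2(op: str, a_dp: bool = False, b_dp: bool = False,
                abs_a: bool = False, abs_b: bool = False,
                abs_y: bool = False) -> int:
    """Build an 11-bit I10-I0 word for a two-operand floating point ALU operation."""
    word = ALU2_OPS[op]          # I1-I0: operation
    word |= int(abs_y) << 2      # I2: 1 = |Y|
    word |= int(abs_b) << 3      # I3: 1 = |B| (inferred from the table codes)
    word |= int(abs_a) << 4      # I4: 1 = |A| (inferred from the table codes)
    word |= int(b_dp) << 7       # I7: 1 = B is double precision
    word |= int(a_dp) << 8       # I8: 1 = A is double precision
    return word                  # I10, I9, I6, I5 stay 0

def show(word: int) -> str:
    """Format a word in the manual's 3-4-4 bit grouping."""
    bits = format(word, "011b")
    return f"{bits[:3]} {bits[3:7]} {bits[7:]}"

print(show(encode_alu2("sub_b_a", abs_b=True)))   # Subtract |B| - A, single precision
```

With the don't-care bits set to 0, the printed word corresponds to the table entry 00x x000 1x11.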
Table 35. Independent ALU Operations, One Integer Operand

ALU OPERATION ON A OPERAND               INSTRUCTION INPUTS I10-I0
Pass A operand                           010 xx10 0000
Pass -A operand (2's complement)‡        010 xx10 0001
Negate A operand (1's complement)        010 xx10 0010
Pass B operand                           010 xx10 0101
Shift left logical†                      010 xx10 1000
Shift right logical†                     010 xx10 1001
Shift right arithmetic†                  010 xx10 1101

NOTES
x = Don't Care
I7 selects format of A or B integer operand:
0 = Single-precision 2's complement
1 = Single-precision unsigned integer
I8 must equal I7
† B operand is number of bit positions A is to be shifted and must be input on the same cycle as the instruction.
‡ Pass (-A) of unsigned integer takes 1's complement.
Table 36. Independent ALU Operations, Two Integer Operands

ALU OPERATIONS AND OPERANDS    INSTRUCTION INPUTS I10-I0
Add A + B                      010 x000 0000
Subtract A - B                 010 x000 0001
Compare A, B                   010 x000 0010
Subtract B - A                 010 x000 0011
Logical AND A, B               010 x000 1000
Logical AND A, NOT B           010 x000 1001
Logical AND NOT A, B           010 x000 1010
Logical OR A, B                010 x000 1100
Logical XOR A, B               010 x000 1101

NOTES
x = Don't Care
I7 selects format of A and B operands:
0 = Single-precision 2's complement
1 = Single-precision unsigned integer
Table 37. Independent Floating Point Multiply Operations

MULTIPLIER OPERATION AND OPERANDS    INSTRUCTION INPUTS I10-I0
Multiply A * B                       00x x100 00xx
Multiply -(A * B)                    00x x100 01xx
Multiply A * |B|                     00x x100 10xx
Multiply -(A * |B|)                  00x x100 11xx
Multiply |A| * B                     00x x101 00xx
Multiply -(|A| * B)                  00x x101 01xx
Multiply |A| * |B|                   00x x101 10xx
Multiply -(|A| * |B|)                00x x101 11xx

NOTES
x = Don't Care
I8 selects A operand precision (0 = SP, 1 = DP)
I7 selects B operand precision (0 = SP, 1 = DP)
I1 selects A operand format (0 = Normal, 1 = Wrapped)
I0 selects B operand format (0 = Normal, 1 = Wrapped)
Table 38. Independent Floating Point Divide/Square Root Operations

MULTIPLIER OPERATION AND OPERANDS†   INSTRUCTION INPUTS I10-I0
Divide A / B                         00x x110 0xxx
SQRT A                               00x x110 1xxx
Divide |A| / B                       00x x111 0xxx
SQRT |A|                             00x x111 1xxx

NOTES
x = Don't Care
I8 selects A operand precision and I7 selects B operand precision (0 = SP, 1 = DP)
I2 negates multiplier result (0 = Normal, 1 = Negated)
I1 selects A operand format and I0 selects B operand format (0 = Normal, 1 = Wrapped)
† I7 should be equal to I8 for square root operations
Table 39. Independent Integer Multiply/Divide/Square Root Operations

MULTIPLIER OPERATION AND OPERANDS†   INSTRUCTION INPUTS I10-I0
Multiply A * B                       010 x100 0000
Divide A / B                         010 x110 0000
SQRT A                               010 x110 1000

NOTES
x = Don't care
I7 selects operand format:
0 = SP 2's complement
1 = SP unsigned integer
† Operations involving absolute values, wrapped operands, or negated results are valid only when floating point
format is selected (I9 = 0).
Table 40. Chained Multiplier/ALU Floating Point Operations†

CHAINED OPERATIONS        OUTPUT        INSTRUCTION
MULTIPLIER    ALU         SOURCE        INPUTS I10-I0
A * B         A + B       ALU           10x x000 xx00
A * B         A + B       Multiplier    10x x100 xx00
A * B         A - B       ALU           10x x000 xx01
A * B         A - B       Multiplier    10x x100 xx01
A * B         2 - A       ALU           10x x000 xx10
A * B         2 - A       Multiplier    10x x100 xx10
A * B         B - A       ALU           10x x000 xx11
A * B         B - A       Multiplier    10x x100 xx11
A * B         A + 0       ALU           10x x010 xx00
A * B         A + 0       Multiplier    10x x110 xx00
A * B         0 - A       ALU           10x x010 xx11
A * B         0 - A       Multiplier    10x x110 xx11
A * 1         A + B       ALU           10x x001 xx00
A * 1         A + B       Multiplier    10x x101 xx00
A * 1         A - B       ALU           10x x001 xx01
A * 1         A - B       Multiplier    10x x101 xx01
A * 1         2 - A       ALU           10x x001 xx10
A * 1         2 - A       Multiplier    10x x101 xx10
A * 1         B - A       ALU           10x x001 xx11
A * 1         B - A       Multiplier    10x x101 xx11
A * 1         A + 0       ALU           10x x011 xx00
A * 1         A + 0       Multiplier    10x x111 xx00
A * 1         0 - A       ALU           10x x011 xx11
A * 1         0 - A       Multiplier    10x x111 xx11

NOTES
x = Don't Care
I8 selects precision of RA inputs: 0 = RA (SP), 1 = RA (DP)
I7 selects precision of RB inputs: 0 = RB (SP), 1 = RB (DP)
I3 negates ALU result: 0 = Normal, 1 = Negated
I2 negates multiplier result: 0 = Normal, 1 = Negated
† The I10-I0 setting 1xx xx1x xx10 is invalid, since it attempts to force the B operand of the ALU to both
0 and 2 simultaneously.
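The chained-mode codes in Table 40 can be read as independent bit fields, which the following decoder sketches. It assumes, from the table entries and notes, that I6 selects the output source, I5 forces the ALU B operand to 0, I4 forces the multiplier B operand to 1, and I1-I0 select the ALU operation. The function name and returned keys are hypothetical, and settings the table does not list are rejected rather than guessed.

```python
ALU_OPS = {0b00: "A + B", 0b01: "A - B", 0b10: "2 - A", 0b11: "B - A"}
ALU_OPS_ZERO_B = {0b00: "A + 0", 0b11: "0 - A"}   # only these are listed with I5 = 1

def decode_chained_fp(word: int) -> dict:
    """Decode an 11-bit chained floating point instruction (I10-I9 = 10)."""
    def bit(n: int) -> int:
        return (word >> n) & 1

    if bit(10) != 1 or bit(9) != 0:
        raise ValueError("not a chained floating point instruction")
    op = word & 0b11
    if bit(5):
        if op not in ALU_OPS_ZERO_B:
            # Includes 1xx xx1x xx10, which the footnote marks as invalid.
            raise ValueError("ALU B operand forced to 0 with an unlisted operation")
        alu = ALU_OPS_ZERO_B[op]
    else:
        alu = ALU_OPS[op]
    return {
        "multiplier": "A * 1" if bit(4) else "A * B",
        "alu": alu,
        "output_source": "multiplier" if bit(6) else "ALU",
        "negate_alu": bool(bit(3)),
        "negate_multiplier": bool(bit(2)),
        "ra_double": bool(bit(8)),
        "rb_double": bool(bit(7)),
    }

print(decode_chained_fp(0b10001010011))   # table entry 10x x101 xx11 with x = 0
```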
Table 41. Chained Multiplier/ALU Integer Operations

CHAINED OPERATIONS        OUTPUT        INSTRUCTION
MULTIPLIER    ALU         SOURCE        INPUTS I10-I0
A * B         A + B       ALU           110 x000 0000
A * B         A + B       Multiplier    110 x100 0000
A * B         A - B       ALU           110 x000 0001
A * B         A - B       Multiplier    110 x100 0001
A * B         2 - A       ALU           110 x000 0010
A * B         2 - A       Multiplier    110 x100 0010
A * B         B - A       ALU           110 x000 0011
A * B         B - A       Multiplier    110 x100 0011
A * B         A + 0       ALU           110 x010 0000
A * B         A + 0       Multiplier    110 x110 0000
A * B         0 - A       ALU           110 x010 0011
A * B         0 - A       Multiplier    110 x110 0011
A * 1         A + B       ALU           110 x001 0000
A * 1         A + B       Multiplier    110 x101 0000
A * 1         A - B       ALU           110 x001 0001
A * 1         A - B       Multiplier    110 x101 0001
A * 1         2 - A       ALU           110 x001 0010
A * 1         2 - A       Multiplier    110 x101 0010
A * 1         B - A       ALU           110 x001 0011
A * 1         B - A       Multiplier    110 x101 0011
A * 1         A + 0       ALU           110 x011 0000
A * 1         A + 0       Multiplier    110 x111 0000
A * 1         0 - A       ALU           110 x011 0011
A * 1         0 - A       Multiplier    110 x111 xx11

NOTES
x = Don't Care
I7 selects format of A and B operands:
0 = SP 2's complement
1 = SP unsigned integer
Input Configuration
CONFIG1-CONFIG0 control the order in which double-precision operands are loaded,
as shown in Table 42.
Table 42. Double-Precision Input Data Configuration Modes

LOADING SEQUENCE
                      DATA LOADED INTO TEMP REGISTER
                      ON FIRST CLOCK AND RA/RB            DATA LOADED INTO RA/RB
                      REGISTERS ON SECOND CLOCK†          REGISTERS ON SECOND CLOCK
CONFIG1   CONFIG0     DA               DB                 DA               DB
0         0           B operand (MSH)  B operand (LSH)    A operand (MSH)  A operand (LSH)
0         1‡          A operand (LSH)  B operand (LSH)    A operand (MSH)  B operand (MSH)
1         0           A operand (MSH)  B operand (MSH)    A operand (LSH)  B operand (LSH)
1         1           A operand (MSH)  A operand (LSH)    B operand (MSH)  B operand (LSH)

† On the first active clock edge (see CLKMODE), data in this column is loaded into the temporary register.
On the next rising edge, operands in the temporary register and the DA/DB buses are loaded into the RA
and RB registers.
‡ Use CONFIG1-0 = 01 as normal single-precision input configuration.
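The loading sequences in Table 42 can be expressed as a small lookup that tells a host which 32-bit halves to drive onto the DA and DB buses on each of the two clocks. This is a sketch under the assumption that the table reads as (DA, DB) on the first clock followed by (DA, DB) on the second clock for each CONFIG1-0 value; the names `LOAD_ORDER`, `split_dp`, and `bus_schedule` are hypothetical.

```python
def split_dp(value: int) -> dict:
    """Split a 64-bit operand into most and least significant 32-bit halves."""
    return {"MSH": (value >> 32) & 0xFFFFFFFF, "LSH": value & 0xFFFFFFFF}

# CONFIG1-0 -> ((DA, DB) on first clock, (DA, DB) on second clock)
LOAD_ORDER = {
    0b00: ((("B", "MSH"), ("B", "LSH")), (("A", "MSH"), ("A", "LSH"))),
    0b01: ((("A", "LSH"), ("B", "LSH")), (("A", "MSH"), ("B", "MSH"))),
    0b10: ((("A", "MSH"), ("B", "MSH")), (("A", "LSH"), ("B", "LSH"))),
    0b11: ((("A", "MSH"), ("A", "LSH")), (("B", "MSH"), ("B", "LSH"))),
}

def bus_schedule(config: int, a: int, b: int) -> list:
    """Return [(DA, DB) for clock 1, (DA, DB) for clock 2] for the given CONFIG1-0."""
    halves = {"A": split_dp(a), "B": split_dp(b)}
    return [tuple(halves[operand][half] for operand, half in clock)
            for clock in LOAD_ORDER[config]]

a, b = 0x1111111122222222, 0x3333333344444444
for da, db in bus_schedule(0b10, a, b):
    print(f"DA = {da:08X}  DB = {db:08X}")
```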
Operand Source Select
Multiplier and ALU operands are selected by SELOP7-SELOP0 as shown in Tables 43
and 44.
Table 43. Multiplier Input Selection

A1 (MUX1) INPUT                               B1 (MUX2) INPUT
SELOP7   SELOP6   OPERAND SOURCE†             SELOP5   SELOP4   OPERAND SOURCE†
0        0        Reserved                    0        0        Reserved
0        1        C register                  0        1        C register
1        0        ALU feedback                1        0        Multiplier feedback
1        1        RA input register           1        1        RB input register

† For division or square root operations, only RA and RB registers can be selected as sources.
Table 44. ALU Input Selection

A2 (MUX3) INPUT                               B2 (MUX4) INPUT
SELOP3   SELOP2   OPERAND SOURCE†             SELOP1   SELOP0   OPERAND SOURCE†
0        0        Reserved                    0        0        Reserved
0        1        C register                  0        1        C register
1        0        Multiplier feedback         1        0        ALU feedback
1        1        RA input register           1        1        RB input register

† For division or square root operations, only RA and RB registers can be selected as sources.
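The four operand multiplexers each take two SELOP bits, so an 8-bit SELOP7-SELOP0 value can be decoded with four lookups. The sketch below assumes the pairings read from Tables 43 and 44 (SELOP7-6 for A1, SELOP5-4 for B1, SELOP3-2 for A2, SELOP1-0 for B2); the function name and dictionary keys are hypothetical.

```python
A1_SOURCES = {0b00: "Reserved", 0b01: "C register",
              0b10: "ALU feedback", 0b11: "RA input register"}
B1_SOURCES = {0b00: "Reserved", 0b01: "C register",
              0b10: "Multiplier feedback", 0b11: "RB input register"}
A2_SOURCES = {0b00: "Reserved", 0b01: "C register",
              0b10: "Multiplier feedback", 0b11: "RA input register"}
B2_SOURCES = {0b00: "Reserved", 0b01: "C register",
              0b10: "ALU feedback", 0b11: "RB input register"}

def decode_selop(selop: int) -> dict:
    """Map SELOP7-SELOP0 to the operand source of each multiplexer."""
    return {
        "multiplier_A1": A1_SOURCES[(selop >> 6) & 0b11],   # SELOP7-6
        "multiplier_B1": B1_SOURCES[(selop >> 4) & 0b11],   # SELOP5-4
        "alu_A2": A2_SOURCES[(selop >> 2) & 0b11],          # SELOP3-2
        "alu_B2": B2_SOURCES[selop & 0b11],                 # SELOP1-0
    }

# All operands taken from the RA and RB input registers.
print(decode_selop(0b11111111))
```

Per the table footnote, division and square root operations would require the all-registers setting shown in the usage line.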
Pipeline Control
Pipelining levels are turned on by PIPES2-PIPES0 as shown below.
Table 45. Pipeline Controls (PIPES2-PIPES0)

PIPES2   PIPES1   PIPES0   REGISTER OPERATION SELECTED
X        X        0        Enables input registers (RA, RB)
X        X        1        Makes input registers (RA, RB) transparent
X        0        X        Enables pipeline registers
X        1        X        Makes pipeline registers transparent
0        X        X        Enables output registers (PREG, SREG, Status)
1        X        X        Makes output registers (PREG, SREG, Status) transparent
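Each PIPES bit is active low and controls one register level, which the following sketch decodes. It assumes the bit assignment implied by the figure captions earlier in this section (for example, PIPES2-0 = 110 for input registers enabled and 010 for input and output registers enabled); the function name is hypothetical.

```python
def decode_pipes(pipes: int) -> dict:
    """Report which register levels are enabled for a 3-bit PIPES2-PIPES0 value."""
    return {
        "input_registers": not (pipes & 0b001),     # PIPES0 = 0 enables RA, RB
        "pipeline_registers": not (pipes & 0b010),  # PIPES1 = 0 enables pipeline
        "output_registers": not (pipes & 0b100),    # PIPES2 = 0 enables PREG, SREG, Status
    }

for setting in (0b111, 0b110, 0b100, 0b010, 0b000):
    print(format(setting, "03b"), decode_pipes(setting))
```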
Round Control
RND1-RND0 select the rounding mode as shown in Table 46.
Table 46. Rounding Modes

RND1   RND0   ROUNDING MODE SELECTED
0      0      Round towards nearest
0      1      Round towards zero (truncate)
1      0      Round towards infinity (round up)
1      1      Round towards negative infinity (round down)
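The effect of the four rounding modes can be illustrated on the host by rounding decimal values to integers with Python's `decimal` module. This is only an analogy for the direction of rounding, not a model of the device's floating point datapath, and it assumes that "round towards nearest" breaks ties to even as in the IEEE default; the names `RND_MODES` and `round_to_int` are hypothetical.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN)

# RND1-RND0 -> host rounding mode used for illustration
RND_MODES = {
    0b00: ROUND_HALF_EVEN,  # towards nearest (ties to even is an assumption)
    0b01: ROUND_DOWN,       # towards zero (truncate)
    0b10: ROUND_CEILING,    # towards infinity (round up)
    0b11: ROUND_FLOOR,      # towards negative infinity (round down)
}

def round_to_int(value: str, rnd: int) -> int:
    """Round a decimal string to an integer using the mode selected by RND1-RND0."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=RND_MODES[rnd]))

for rnd in RND_MODES:
    print(format(rnd, "02b"), round_to_int("2.3", rnd), round_to_int("-2.5", rnd))
```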
Status Output Selection
SELST1-SELST0 choose the status output as shown below.
Table 47. Status Output Selection (Chained Mode)

SELST1   SELST0   STATUS SELECTED
0        0        Logical OR of ALU and multiplier exceptions (bit by bit)
0        1        Selects multiplier status
1        0        Selects ALU status
1        1        Normal operation (selection based on result source specified by I6 input)
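The status selection in Table 47 amounts to a four-way multiplexer, sketched below. The status words are modeled as plain integers whose bits stand for individual flags, which is an illustrative assumption; the function name and the example flag values are hypothetical. I6 is taken as 0 for the ALU and 1 for the multiplier, as stated in the exception handling text.

```python
def select_status(selst: int, i6: int, alu_status: int, mult_status: int) -> int:
    """Return the status word selected by SELST1-SELST0 in chained mode."""
    if selst == 0b00:
        return alu_status | mult_status   # bit-by-bit OR of both exception sets
    if selst == 0b01:
        return mult_status                # multiplier status
    if selst == 0b10:
        return alu_status                 # ALU status
    # 0b11: normal operation, follow the output source chosen by I6
    return mult_status if i6 else alu_status

alu, mult = 0b0010, 0b0100   # example flag patterns, not real device values
print(format(select_status(0b00, 0, alu, mult), "04b"))
```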
Test Pin Control
Testing is controlled by TP1-TP0 as shown below.
Table 48. Test Pin Control Inputs

TP1   TP0   OPERATION
0     0     All outputs and I/Os are forced low
0     1     All outputs and I/Os are forced high
1     0     All outputs are placed in a high impedance state
1     1     Normal operation
Miscellaneous Control Inputs
The remaining control inputs are shown in Table 49.
Table 49. Miscellaneous Control Inputs

BYTEP
  HIGH: Selects byte parity generation and test
  LOW:  Selects single bit parity generation and test
CLKMODE
  HIGH: Enables temporary input register load on falling clock edge
  LOW:  Enables temporary input register load on rising clock edge
ENRC
  HIGH: No effect
  LOW:  Enables C register load when CLKC goes high.
ENRA
  HIGH: If register is not in flowthrough, enables clocking of RA register
  LOW:  If register is not in flowthrough, holds contents of RA register
ENRB
  HIGH: If register is not in flowthrough, enables clocking of RB register
  LOW:  If register is not in flowthrough, holds contents of RB register
FAST
  HIGH: Places device in FAST mode
  LOW:  Places device in IEEE mode
FLOW_C
  HIGH: Causes output value to bypass C register and appear on C register output bus.
  LOW:  No effect
HALT
  HIGH: No effect
  LOW:  Stalls device operation but does not affect registers, internal states, or status
OEC
  HIGH: Disables compare pins
  LOW:  Enables compare pins
OES
  HIGH: Disables status outputs
  LOW:  Enables status outputs
OEY
  HIGH: Disables Y bus
  LOW:  Enables Y bus
RESET
  HIGH: No effect
  LOW:  Clears internal states, status, internal pipeline registers, and exception disable
        register. Does not affect other data registers.
SELMS/LS
  HIGH: Selects MSH of 64-bit result for output on the Y bus (no effect on single-precision operands)
  LOW:  Selects LSH of 64-bit result for output on the Y bus (no effect on single-precision operands)
SRCC
  HIGH: Selects multiplier result for input to C register
  LOW:  Selects ALU result for input to C register
Glossary
Biased exponent - The true exponent of a floating point number plus a constant called
the exponent field's excess. In IEEE data format, the excess or bias is 127 for
single-precision numbers and 1023 for double-precision numbers.
Denormalized number (denorm) - A number with an exponent equal to zero and a
nonzero fraction field, with the implicit leading (leftmost) bit of the fraction field being 0.
NaN (not a number) - Data that has no mathematical value. The 'ACT8847 produces
a NaN whenever an invalid operation such as 0 * ∞ is executed. The output format
for a NaN is an exponent field of all ones, a fraction field of all ones, and a zero sign
bit. Any number with an exponent of all ones and a nonzero fraction is treated as a
NaN on the input.
Normalized number - A number in which the exponent field is between 1 and 254
(single precision) or 1 and 2046 (double precision). The implicit leading bit is 1.
Wrapped number - A number created by normalizing a denormalized number's fraction
field and subtracting from the exponent the number of shift positions required to do
so. The exponent is encoded as a two's complement negative number.
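The glossary definitions can be illustrated with a small sketch that classifies a 32-bit single-precision bit pattern. The function names are hypothetical, and the "zero" and "infinity" cases are standard IEEE encodings added here for completeness; they are not defined in the glossary above.

```python
def classify_single(bits: int) -> str:
    """Classify a 32-bit IEEE single-precision pattern (hypothetical helper)."""
    exponent = (bits >> 23) & 0xFF      # 8-bit biased exponent field
    fraction = bits & 0x7FFFFF          # 23-bit fraction field
    if exponent == 0xFF:
        # All-ones exponent: nonzero fraction is treated as a NaN on input.
        return "NaN" if fraction else "infinity"
    if exponent == 0:
        # Zero exponent with nonzero fraction: implicit leading bit is 0.
        return "denormalized" if fraction else "zero"
    # Exponent field between 1 and 254: implicit leading bit is 1.
    return "normalized"


def true_exponent_single(bits: int) -> int:
    """True exponent = biased exponent minus the single-precision bias of 127."""
    return ((bits >> 23) & 0xFF) - 127


print(classify_single(0x3F800000), true_exponent_single(0x3F800000))
print(classify_single(0x00000001))
print(classify_single(0x7FFFFFFF))  # NaN output format: zero sign, all-ones exponent and fraction
```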
SN74ACT8847 Application Notes
Sum of Products and Product of Sums
Performing fully pipelined double-precision operations requires a detailed understanding
of timing constraints imposed by the multiplier. In particular, sum of products and
product of sums operations can be executed very quickly, mostly in chained mode,
assuming that timing relationships between the ALU and the multiplier are coded
properly.
Pseudocode tables for these sequences are provided (Table 50 and Table 51), showing
how data and instructions are input in relation to the system clock. The overall patterns
of calculations for an extended sum of products and an extended product of sums
are presented. These examples assume FPU operation in CLKMODE 0, with the CONFIG
setting 10 to load operands by MSH and LSH, all registers enabled
(PIPES2-PIPES0 = 000), and the C register clock tied to the system clock.
In the sum of products timing table, the two initial products are generated in
independent multiplier mode. Several timing relationships should be noted in the table.
The first chained instruction loads and begins to execute following the sixth rising
edge of the clock, after the first product P1 has already been held in the P register
for one clock. For this reason, P1 is loaded into the C register so that P1 will be stable
for two clocks.
On the seventh clock, the ALU pipeline register loads with an unwanted sum, P1 + P1.
However, because the ALU timing is constrained by the multiplier, the S register will
not load until the rising edge of CLK9, when the ALU pipe contains the desired sum,
P1 + P2. The remaining sequence of chained operations then executes in the desired
manner.
Table 50. Pseudocode for Fully Pipelined Double-Precision Sum of Products †
(CLKMODE = 0, CONFIG1-CONFIG0 = 10, PIPES2-PIPES0 = 000)
ClK
I
I
I
I
TEMP
INS
INS
RA
RB
MUl
P
C
ALU
REG
BUS
REG
REG
REG
PIPE
REG
REG
PIPE
S
2
A1 l5H
A1 *B1
A1
B1
3
A2 M5H B2 M5H A2.B2M5H A2*B2 A1 *B1
A1
B1
A1 *B1
4
A2 l5H
A2
B2
A1 *B1
A2
B2
A2*B2
P1
A3
B3
A2*B2
P1
P1
A3
B3
A3*B3
P2
P1
P1 +P1
A4
B4
A3*B3
P2
P1
P1 +P2
A4
B4
A4*B4
P3
P2
51 +P2
51
A5
B5
A4*B4
P3
P2
51 +P3
51
A5
B5
A5*B5
P4
P2
XXXXX
52
P4
P2
B1 l5H A1.B1l5H
B2 l5H A2.B2L5H
A1 *B1
A2*B2 A2*B2
PR+CR
I
5
A3 M5H B3 M5H A3.B3M5H
I
6
A3 L5H
I
7
A4M5H B4 M5H A4.B4M5H
I
8
A4 LSH
I
9
A5 M5H B5 M5H A5.B5M5H
A5 L5H
B3 L5H A3.B3L5H
B4 L5H A4.B4L5H
B5 L5H A5.B5L5H
A6 MSH B6 M5H A6.B6M5H
† PR = Product Register
  SR = Sum Register
  CR = Constant (C) Register
A3*B3
A2*B2
PR+CR PR+CR.
A3*B3 A3*B3
PR+5R PR+5R.
A4*B4 A3*B3
PR+5R PR+5R.
A4*B4 A4*B4
PR+5R PR+5R.
A5*B5 A4*B4
PR+5R PR+5R.
A5*B5 A5*B5
PR+5R PR+5R,
A6*B6 A5*B5
V
REG BUS
A1 M5H B1 M5H A1.B1M5H A1 *B1
I11
I 12
--
DB
BUS
1
I10
';'I
DA
BUS
52
Table 51. Pseudocode for Fully Pipelined Double-Precision Product of Sums †
(CLKMODE = 0, CONFIG1-CONFIG0 = 10, PIPES2-PIPES0 = 000)
CLK
I
I
I
I
I
DA
DB
TEMP
INS
INS
RA
RB
MUL
P
C
ALU
S
V
BUS
BUS
REG
BUS
REG
REG
REG
PIPE
REG
REG
PIPE
REG
BUS
1
A1M5H
B1M5H A1,B1M5H A1 +B1
2
A1L5H
B1L5H A 1 ,B1 L5H
A1 +B1
A1
B1
3
A2M5H
B2M5H A2,B2M5H A2+B2 A1 +B1
A1
B1
4
A2L5H
B2L5H A2,B2L5H
A2
B2
5
A3M5H
B3M5H A3,B3M5H
I
6
A3L5H
B3L5H A3,B3L5H
I
7
XXX
I
8
A4M5H
B4M5H A4,B4M5H
I 9
I10
A4L5H
B4L5H A4,B4L5H
XXX
XXX
XXX
XXX
XXX
I11
A5M5H
B5M5H A5,B5M5H
I12
A5L5H
B5L5H A5,B5L5H
A1 +B1
A2+B2 A2+B2
CR*5R
A2+B2
A2
B2
CR*5R CR*5R
A3+B3 A3+B3
A3
B3
A3
B3
A3+B3
NOP
PR*5R
A4+B4
CR*5R
A3+B3
NOP
PR*5R PR*5R
A4+B4 A4+B4
NOP
PR*5R
A5+B5
PR*5R
A4+B4
NOP
PR*5R PR*5R
A5+B5 A5+B5
ENRA=O ENRB=O
NOP instruction is 011 0000 0000.
Product Register
Sum Register
Constant (C) Register
A1 +B1
51
A2+B2
51
51
A2+B2
52
51 *52
51
A3+B3
52
51 *52
51
ENRC=O
51
XXX
A3
B3
A4
B4
XXX
P1
51
XXX
53
A4
B4
P1 *53
P1
51
A4+B4
53
51
A4+B4 XXX
ENRA=O ENRB=O
A4
B4
A5
B5
----- -------
NOTE:
t PR =
SR =
CR =
A1 +B1
P1 *53 XXX
XXX
P2
------- - -
51
---------
X
54
Matrix Operations
The 'ACT8847 floating point unit can also be used to perform matrix manipulations
involved in graphics processing or digital signal processing. The FPU multiplies and
adds data elements, executing sequences of microprogrammed calculations to form
new matrices.
Representation of Variables
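In state representations of control systems, an nth-order linear differential equation
with constant coefficients can be represented as a sequence of n first-order linear
differential equations expressed in terms of state variables:

\[ \frac{dx_1}{dt} = x_2,\ \ldots,\ \frac{dx_{n-1}}{dt} = x_n \]

For example, in vector-matrix form the equations of an nth-order system can be
represented as follows:

\[
\frac{d}{dt}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots &        &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
+
\begin{bmatrix}
b_{11} & \cdots & b_{1n} \\
\vdots &        & \vdots \\
b_{n1} & \cdots & b_{nn}
\end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}
\]

or, \( \dot{x} = ax + bu \).

Expanding the matrix equation for one state variable, dx1/dt, results in the following
expression:

\[ \dot{x}_1 = (a_{11} x_1 + \cdots + a_{1n} x_n) + (b_{11} u_1 + \cdots + b_{1n} u_n) \]

where \( \dot{x}_1 = dx_1/dt \).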
Sequences of multiplications and additions are required when such state space
transformations are performed, and the 'ACT8847 has been designed to support such
sum-of-products operations. An n x n matrix A multiplied by an n x n matrix X yields
an n x n matrix C whose elements cij are given by this equation:

\[ c_{ij} = \sum_{k=1}^{n} a_{ik}\, x_{kj} \qquad \text{for } i = 1,\ldots,n;\ j = 1,\ldots,n \tag{1} \]

For the cij elements to be calculated by the 'ACT8847, the corresponding elements
aik and xkj must be stored outside the 'ACT8847 and fed to the 'ACT8847 in the
proper order required to effect a matrix multiplication such as the state space system
representation just discussed.
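As a minimal sketch of equation (1), the loop below forms each element of C as a running sum of products, feeding the operands in the order an external sequencer would have to supply them. The function name is hypothetical and the code models only the arithmetic, not the device timing.

```python
def matmul_sum_of_products(a, x):
    """Compute C = A * X element by element, as in equation (1)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0                      # running sum for c[i][j]
            for k in range(n):
                acc += a[i][k] * x[k][j]   # one multiply and one add per term
            c[i][j] = acc
    return c


print(matmul_sum_of_products([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```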
Sample Matrix Transformation
The matrix manipulations commonly performed in graphics systems can be regarded
as geometrical transformations of graphic objects. A matrix operation on another matrix
representing a graphic object may result in scaling, rotating, transforming, distorting,
or generating a perspective view of the image. By performing a matrix operation on
the position vectors which define the vertices of an image surface, the shape and
position of the surface can be manipulated.
The generalized 4 x 4 matrix for transforming a three-dimensional object with
homogeneous coordinates is shown below:

\[
T = \begin{bmatrix}
a & b & c & d \\
e & f & g & h \\
i & j & k & l \\
m & n & o & p
\end{bmatrix}
\]
The matrix T can be partitioned into four component matrices, each of which produces
a specific effect on the resultant image:

\[
T = \left[\begin{array}{c|c}
3 \times 3 & 3 \times 1 \\ \hline
1 \times 3 & 1 \times 1
\end{array}\right]
\]
The 3 x 3 matrix produces linear transformation in the form of scaling, shearing, and
rotation. The 1 x 3 row matrix produces translation, while the 3 x 1 column matrix
produces perspective transformation with multiple vanishing points. The final single
element 1 x 1 produces overall scaling. Overall operation of the transformation matrix
T on the position vectors of a graphic object produces a combination of shearing,
rotation, reflection, translation, perspective, and overall scaling.
The rotation of an object about an arbitrary axis in a three-dimensional space can be
carried out by first translating the object such that the desired axis of rotation passes
through the origin of the coordinate system, then rotating the object about the axis
through the origin, and finally translating the rotated object such that the axis of rotation
resumes its initial position. If the axis of rotation passes through the point P = [a b c 1],
then the transformation matrix is representable in this form:

\[
[x\ y\ z\ h] = [x\ y\ z\ 1]
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
-a & -b & -c & 1
\end{bmatrix}}_{\text{translation to origin}}
\underbrace{\,[R]\,}_{\text{rotation about origin}}
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
a & b & c & 1
\end{bmatrix}}_{\text{translation back to initial position}}
\tag{2}
\]
where R may be expressed as:

\[
R = \begin{bmatrix}
n_1^2 + (1 - n_1^2)\cos\phi & n_1 n_2 (1 - \cos\phi) + n_3 \sin\phi & n_1 n_3 (1 - \cos\phi) - n_2 \sin\phi & 0 \\
n_1 n_2 (1 - \cos\phi) - n_3 \sin\phi & n_2^2 + (1 - n_2^2)\cos\phi & n_2 n_3 (1 - \cos\phi) + n_1 \sin\phi & 0 \\
n_1 n_3 (1 - \cos\phi) + n_2 \sin\phi & n_2 n_3 (1 - \cos\phi) - n_1 \sin\phi & n_3^2 + (1 - n_3^2)\cos\phi & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

and

n1 = q1/(q1^2 + q2^2 + q3^2)^(1/2) = direction cosine for x-axis of rotation
n2 = q2/(q1^2 + q2^2 + q3^2)^(1/2) = direction cosine for y-axis of rotation
n3 = q3/(q1^2 + q2^2 + q3^2)^(1/2) = direction cosine for z-axis of rotation
n = (n1 n2 n3) = unit vector for Q
Q = vector defining axis of rotation = [q1 q2 q3]
φ = the rotation angle about Q
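The expression for R can be checked numerically. The sketch below builds the 4 x 4 rotation matrix from the direction cosines and the angle, following the row-vector convention of equation (2). The function name is hypothetical, and the sample values n = (0.866, 0.5, 0.707) and φ = 90 degrees are those of the tetrahedron example in this note, used exactly as printed and not renormalized.

```python
import math


def rotation_matrix(n1, n2, n3, phi):
    """4x4 rotation about an axis through the origin with direction cosines n."""
    c, s = math.cos(phi), math.sin(phi)
    v = 1.0 - c
    return [
        [n1 * n1 + (1 - n1 * n1) * c, n1 * n2 * v + n3 * s, n1 * n3 * v - n2 * s, 0.0],
        [n1 * n2 * v - n3 * s, n2 * n2 + (1 - n2 * n2) * c, n2 * n3 * v + n1 * s, 0.0],
        [n1 * n3 * v + n2 * s, n2 * n3 * v - n1 * s, n3 * n3 + (1 - n3 * n3) * c, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


R = rotation_matrix(0.866, 0.5, 0.707, math.radians(90))
for row in R:
    print(["%.3f" % value for value in row])
```

With these inputs the entries agree with the printed rotation matrix of the example to within rounding of the third decimal place.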
A general rotation using equation (2) is effected by determining the [x y z] coordinates
of a point A to be rotated on the object, the direction cosines of the axis of rotation
[n1, n2, n3], and the angle φ of rotation about the axis, all of which are needed to
define matrix [R]. Suppose, for example, that a tetrahedron ABCD, represented by
the coordinate matrix below, is to be rotated about an axis of rotation RX which passes
through a point P = [5 -6 3 1] and whose direction cosines are given by unit vector
[n1 = 0.866, n2 = 0.5, n3 = 0.707]. The angle of rotation φ is 90 degrees (see
Figure 72). The rotation matrix [R] becomes
\[
\begin{bmatrix}
2 & -3 & 3 & 1 \\
1 & -2 & 2 & 1 \\
2 & -1 & 2 & 1 \\
2 & -2 & 2 & 1
\end{bmatrix}
\begin{matrix} A \\ B \\ C \\ D \end{matrix}
\qquad
R = \begin{bmatrix}
0.750 & 1.140 & 0.112 & 0 \\
-0.274 & 0.250 & 1.220 & 0 \\
1.112 & -0.513 & 0.500 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

[Figure 72. Sequence of Matrix Operations. Arrow (1) depicts the first translation,
arrow (2) depicts the 90° rotation about the axis through P (5, -6, 3), and arrow (3)
depicts the back translation.]
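The point transformation equation (2) can be expanded to include all the vertices of
the tetrahedron as follows:

\[
\begin{bmatrix}
x_a & y_a & z_a & h_1 \\
x_b & y_b & z_b & h_2 \\
x_c & y_c & z_c & h_3 \\
x_d & y_d & z_d & h_4
\end{bmatrix}
=
\begin{bmatrix}
2 & -3 & 3 & 1 \\
1 & -2 & 2 & 1 \\
2 & -1 & 2 & 1 \\
2 & -2 & 2 & 1
\end{bmatrix}
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
-5 & 6 & -3 & 1
\end{bmatrix}}_{\text{translation to origin}}
\underbrace{\begin{bmatrix}
0.750 & 1.140 & 0.112 & 0 \\
-0.274 & 0.250 & 1.220 & 0 \\
1.112 & -0.513 & 0.500 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}}_{\text{rotation about origin}}
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
5 & -6 & 3 & 1
\end{bmatrix}}_{\text{translation back to initial position}}
\tag{3}
\]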
The 'ACT8847 floating point unit can perform matrix manipulation involving
multiplications and additions such as those represented by equation (1). The matrix
equation (3) can be solved by using the 'ACT8847 to compute, as a first step, the
product matrix of the coordinate matrix and the first translation matrix of the
right-hand side of equation (3) in that order. The second step involves postmultiplying
the product matrix by the rotation matrix. The third step implements the back-translation
by postmultiplying the matrix result from the second step by the second translation
matrix of equation (3). Details of the procedure to produce a three-dimensional rotation
about an arbitrary axis are explained in the following steps:
Step 1
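Translate the tetrahedron so that the axis of rotation passes through the origin. This
process can be accomplished by multiplying the coordinate matrix by the translation
matrix as follows:

\[
\begin{bmatrix}
2 & -3 & 3 & 1 \\
1 & -2 & 2 & 1 \\
2 & -1 & 2 & 1 \\
2 & -2 & 2 & 1
\end{bmatrix}
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
-5 & 6 & -3 & 1
\end{bmatrix}}_{\text{translation to origin}}
=
\begin{bmatrix}
(2-5) & (-3+6) & (3-3) & 1 \\
(1-5) & (-2+6) & (2-3) & 1 \\
(2-5) & (-1+6) & (2-3) & 1 \\
(2-5) & (-2+6) & (2-3) & 1
\end{bmatrix}
=
\underbrace{\begin{bmatrix}
-3 & 3 & 0 & 1 \\
-4 & 4 & -1 & 1 \\
-3 & 5 & -1 & 1 \\
-3 & 4 & -1 & 1
\end{bmatrix}}_{\text{vertices of translated tetrahedron}}
\begin{matrix} A_T \\ B_T \\ C_T \\ D_T \end{matrix}
\]

The 'ACT8847 could compute the translated coordinates AT, BT, CT, DT as indicated
above. However, an alternative method resulting in a more compact solution is
presented below.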
Step 2
Rotate the tetrahedron about the axis of rotation which passes through the origin after
the translation of Step 1. To implement the rotation of the tetrahedron, postmultiply
the translated coordinate matrix from Step 1 by the rotation matrix [R]. The resultant
matrix represents the rotated coordinates of the tetrahedron about the origin as follows:

\[
\begin{bmatrix}
-3 & 3 & 0 & 1 \\
-4 & 4 & -1 & 1 \\
-3 & 5 & -1 & 1 \\
-3 & 4 & -1 & 1
\end{bmatrix}
\underbrace{\begin{bmatrix}
0.750 & 1.140 & 0.112 & 0 \\
-0.274 & 0.250 & 1.220 & 0 \\
1.112 & -0.513 & 0.500 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}}_{\text{rotation about origin}}
=
\underbrace{\begin{bmatrix}
-3.072 & -2.670 & 3.324 & 1 \\
-5.208 & -3.047 & 3.932 & 1 \\
-4.732 & -1.657 & 5.264 & 1 \\
-4.458 & -1.907 & 4.044 & 1
\end{bmatrix}}_{\text{rotated coordinates}}
\]
Step 3
Translate the rotated tetrahedron back to the original coordinate space. This is done
by postmultiplying the resultant matrix of Step 2 by the second translation matrix. The
following calculation produces the final coordinate matrix of the transformed object:

\[
\begin{bmatrix}
-3.072 & -2.670 & 3.324 & 1 \\
-5.208 & -3.047 & 3.932 & 1 \\
-4.732 & -1.657 & 5.264 & 1 \\
-4.458 & -1.907 & 4.044 & 1
\end{bmatrix}
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
5 & -6 & 3 & 1
\end{bmatrix}}_{\text{translate back}}
=
\underbrace{\begin{bmatrix}
1.928 & -8.670 & 6.324 & 1 \\
-0.208 & -9.047 & 6.932 & 1 \\
0.268 & -7.657 & 8.264 & 1 \\
0.542 & -7.907 & 7.044 & 1
\end{bmatrix}}_{\text{final rotated coordinates}}
\]
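A more compact solution to these transformation matrices is a product matrix that
combines the two translation matrices and the rotation matrix in the order shown in
equation (3). Equation (3) will then take the following form:

\[
\begin{bmatrix}
x_a & y_a & z_a & h_1 \\
x_b & y_b & z_b & h_2 \\
x_c & y_c & z_c & h_3 \\
x_d & y_d & z_d & h_4
\end{bmatrix}
=
\begin{bmatrix}
2 & -3 & 3 & 1 \\
1 & -2 & 2 & 1 \\
2 & -1 & 2 & 1 \\
2 & -2 & 2 & 1
\end{bmatrix}
\underbrace{\begin{bmatrix}
0.750 & 1.140 & 0.112 & 0 \\
-0.274 & 0.250 & 1.220 & 0 \\
1.112 & -0.513 & 0.500 & 0 \\
-3.730 & -8.661 & 8.260 & 1
\end{bmatrix}}_{\text{transformation matrix}}
\]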
The newly transformed coordinates resulting from the postmultiplication of the
coordinate matrix of the tetrahedron by the transformation matrix can be computed
using equation (1), which was cited previously:

\[ c_{ij} = \sum_{k=1}^{n} a_{ik}\, x_{kj} \qquad \text{for } i = 1,\ldots,n;\ j = 1,\ldots,n \tag{1} \]
For example, the coordinates may be computed as follows:

xa = c11 = a11 * x11 + a12 * x21 + a13 * x31 + a14 * x41
         = 2 * 0.750 + (-3) * (-0.274) + 3 * 1.112 + 1 * (-3.73)
         = 1.5 + 0.822 + 3.336 - 3.73
         = 1.928
ya = c12 = a11 * x12 + a12 * x22 + a13 * x32 + a14 * x42
         = 2 * 1.140 + (-3) * 0.250 + 3 * (-0.513) + 1 * (-8.661)
         = 2.28 - 0.75 - 1.539 - 8.661
         = -8.67
za = c13 = a11 * x13 + a12 * x23 + a13 * x33 + a14 * x43
         = 2 * 0.112 + (-3) * 1.220 + 3 * 0.500 + 1 * 8.260
         = 0.224 - 3.66 + 1.5 + 8.260
         = 6.324
h1 = c14 = a11 * x14 + a12 * x24 + a13 * x34 + a14 * x44
         = 2 * 0 + (-3) * 0 + 3 * 0 + 1 * 1
         = 0 + 0 + 0 + 1
         = 1

A' = [1.928 -8.67 6.324 1]

The other rotated vertices are computed in a similar manner, giving the remaining rows
of the final coordinate matrix of Step 3:

B' = [-0.208 -9.047 6.932 1]
C' = [0.268 -7.657 8.264 1]
D' = [0.542 -7.907 7.044 1]
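The complete transformation can be cross-checked in a few lines. This sketch multiplies the translation, rotation, and back-translation matrices into one transformation matrix and applies it to the four vertices. The helper names are hypothetical, and the rotation matrix entries are the three-decimal values printed in this note, so results match the worked figures only to that precision.

```python
def matmul(a, b):
    """Plain matrix product using the sum of products of equation (1)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def translation(dx, dy, dz):
    """Row-vector translation matrix: the offsets sit in the last row."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [dx, dy, dz, 1]]


R = [[0.750, 1.140, 0.112, 0],
     [-0.274, 0.250, 1.220, 0],
     [1.112, -0.513, 0.500, 0],
     [0, 0, 0, 1]]

vertices = [[2, -3, 3, 1],   # A
            [1, -2, 2, 1],   # B
            [2, -1, 2, 1],   # C
            [2, -2, 2, 1]]   # D

# Translate P = (5, -6, 3) to the origin, rotate, then translate back.
transform = matmul(matmul(translation(-5, 6, -3), R), translation(5, -6, 3))
rotated = matmul(vertices, transform)

for name, row in zip("ABCD", rotated):
    print(name + "'", [round(value, 3) for value in row])
```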
Microinstructions for Sample Matrix Manipulation
The 'ACT8847 FPU can compute the coordinates for graphic objects over a broad
dynamic range. Also, the homogeneous scalar factors h1, h2, h3, and h4 may be made
unity due to the availability of large dynamic range. In the example presented below,
some of the calculations pertaining to vertex A' are shown, but the same approach
can be applied to any number of points and any vector space.
The calculations below show the sequence of operations for generating two
coordinates, xa and ya, of the vertex A' after rotation. The same sequence could be
continued to generate the remaining two coordinates for A' (za and h1). The other
vertices of the tetrahedron, B', C', and D', can be calculated in a similar way.
Table 52 presents a pseudocode description of the operations, clock cycles, and register
contents for a single-precision matrix multiplication using the sum-of-products sequence
presented in an earlier section. Registers used include the RA and RB input registers
and the product (P) and sum (S) registers.
Table 52. Single-Precision Matrix Multiplication (PIPES2-PIPES0 = 010)

CLOCK  MULTIPLIER/ALU
CYCLE  OPERATIONS        PSEUDOCODE
  1    Load a11, x11     a11 -> RA, x11 -> RB
       SP Multiply       p1 = a11 * x11
  2    Load a12, x21     a12 -> RA, x21 -> RB
       SP Multiply       p2 = a12 * x21
       Pass P to S       p1 -> P(p1)
  3    Load a13, x31     a13 -> RA, x31 -> RB
       SP Multiply       p3 = a13 * x31, p2 -> P(p2)
       Add P to S        P(p1) + 0 -> S(p1)
  4    Load a14, x41     a14 -> RA, x41 -> RB
       SP Multiply       p4 = a14 * x41, p3 -> P(p3)
       Add P to S        P(p2) + S(p1) -> S(p1 + p2)
  5    Load a11, x12     a11 -> RA, x12 -> RB
       SP Multiply       p5 = a11 * x12, p4 -> P(p4)
       Add P to S        P(p3) + S(p1 + p2) -> S(p1 + p2 + p3)
  6    Load a12, x22     a12 -> RA, x22 -> RB
       SP Multiply       p6 = a12 * x22, p5 -> P(p5)
       Pass P to S       P(p4) + S(p1 + p2 + p3) -> S(p1 + p2 + p3 + p4)
       Output S
  7    Load a13, x32     a13 -> RA, x32 -> RB
       SP Multiply       p7 = a13 * x32, p6 -> P(p6)
       Add P to S        P(p5) + 0 -> S(p5)
  8    Load a14, x42     a14 -> RA, x42 -> RB
       SP Multiply       p8 = a14 * x42, p7 -> P(p7)
       Add P to S        P(p6) + S(p5) -> S(p5 + p6)
  9    Next operands     A -> RA, B -> RB
       Next instruction  pi = A * B, p8 -> P(p8)
       Add P to S        P(p7) + S(p5 + p6) -> S(p5 + p6 + p7)
 10    Next operands     C -> RA, D -> RB
       Next instruction  pj = C * D, pi -> P(pi)
       Output S          P(p8) + S(p5 + p6 + p7) -> S(p5 + p6 + p7 + p8)
A microcode sequence to generate this matrix multiplication is shown in Table 53.
Table 53. Microinstructions for Sample Matrix Multiplication
I I
10-0
C CC
L 00 P P
K NN I I
M FF PP
0 II EE
DGGSS
E 1-02-0
SS
EE
LL
00
PP
7-0
RR
NN
DD
1-0
S
E
L
M
S S
BEE R
S
Y L L EH
FEE S /
ANNR
OOOTSSSATT
SRRCLEEEETTELPP
TAB C S Y C S P 1 -0 T T 1-0
000 0100 0000
10001100000
100 0000 0000
100 0000 0000
100 0000 0000
o 01
o 01
o 01
o 01
o 01
0101111 xxxx
01 0 1111 xxxx
01011111010
01011111010
01011111010
00
00
00
00
00
o1
o1
o1
o1
o1
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
xx
xx
xx
xx
xx
11
11
11
11
11
100 0110 0000
100 0000 0000
100 0000 0000
100 0000 0000
10001100000
o 01
o 01
o 01
o 01
o 01
01 0 1111 xxxx
01011111010
01011111010
01011111010
01 0 1111 xxxx
00
00
00
00
00
o1
o1
o1
o1
o1
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
xx
xx
xx
xx
xx
11
11
11
11
11
Six cycles are required to complete calculation of xa, the first coordinate, and after
four more cycles the second coordinate ya is output. Each subsequent coordinate can
be calculated in four cycles, so the 4-tuple for vertex A' requires a total of 18 cycles
to complete.
Calculations for vertices B', C', and D' can be executed in 48 cycles, 16 cycles for
each vertex. Processing time improves when the transformation matrix is reduced,
i.e., when the last column has the form [0 0 0 1] (written as a column), so that
the h-scalars h1, h2, h3, and h4 are equal to 1. The number of clock cycles to generate
each 4-tuple can then be decreased from 16 to 13 cycles. Total number of clock cycles
to calculate all four vertices is reduced from 66 to 54 clocks. Figure 73 summarizes
the overall matrix transformation.
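The cycle totals quoted above follow from simple bookkeeping, sketched here. The function name is hypothetical. The 15-cycle figure for the first vertex in the reduced case is not stated in the text; it is inferred from the stated total of 54 clocks and 13 cycles for each later vertex.

```python
def total_cycles(first_vertex, later_vertex, vertices=4):
    """Cycles for the first vertex plus the remaining vertices."""
    return first_vertex + (vertices - 1) * later_vertex


# Full 4 x 4 transformation: 6 cycles for xa, then 4 cycles per remaining coordinate.
first_vertex_full = 6 + 3 * 4                    # 18 cycles for vertex A'
full_total = total_cycles(first_vertex_full, 16)

# Reduced last column: 13 cycles per later vertex; 15 for the first is inferred.
reduced_total = total_cycles(15, 13)

print(first_vertex_full, full_total, reduced_total)
```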
[Figure 73. Resultant Matrix Transformation. The figure shows the original tetrahedron
and the rotated tetrahedron A'B'C'D' after a 90° rotation about the axis through
P (5, -6, 3).]
This microprogram can also be written to calculate sums of products with all pipeline
registers enabled so that the FPU can operate in its fastest mode. Because of timing
relationships, the C register is used in some steps to hold the intermediate sum of
products. Latency due to pipelining and chained data manipulation is 11 cycles for
calculation of the first coordinate, and four cycles each for the other three coordinates.
After calculation of the first vertex, 16 cycles are required to calculate the four
coordinates of each subsequent vertex. Table 54 presents the sequence of calculations
for the first two coordinates, xA and yA.
Products in Table 54 are numbered according to the clock cycle in which the operands
and instruction were loaded into the RA, RB, and I register, and execution of the
instruction began. Sums indicated in Table 54 are listed below:

s1 = p1 + 0        s4 = p5 + 0        s7 = p9 + 0
s2 = p1 + p3       s5 = p5 + p7       s8 = p9 + p11
s3 = p2 + p4       s6 = p6 + p8       s9 = p10 + p12
xA = p1 + p2 + p3 + p4
yA = p5 + p6 + p7 + p8
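One way to read the list of sums is that each coordinate is accumulated as two interleaved partial sums that are combined at the end, for example xA = s2 + s3. That pairing is an inference from the list rather than something the text states. The sketch below applies it to the four products of xa from the worked tetrahedron example; the function name is hypothetical.

```python
def interleaved_sum(products):
    """Accumulate four products as two interleaved partial sums."""
    p1, p2, p3, p4 = products
    s1 = p1 + 0.0       # first product enters the accumulator alone
    s2 = s1 + p3        # s2 = p1 + p3
    s3 = p2 + p4        # s3 = p2 + p4, formed from the alternate products
    return s2 + s3      # combined result, p1 + p2 + p3 + p4


xa = interleaved_sum([1.5, 0.822, 3.336, -3.73])
print(round(xa, 3))
```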
Table 54. Fully Pipelined Single-Precision Sum of Products (PIPES2-PIPESO = 000)
CLOCK
CYCLE
0
1
2
I
BUS
3
Mul
4
5
6
7
8
9
10
11
12
13
14
15
Chn
Chn
Chn
Chn
Chn
Chn
Chn
Chn
Chn
Chn
Chn
Chn
Mul
Mul
Chn
DA
BUS
x11
x21
x31
x41
x12
x22
x32
x42
x13
x23
x33
x43
x14
x24
x34
x44
DB
BUS
a11
a12
a13
a14
a 11
a12
a13
a14
a11
a12
a13
a14
a11
a12
a13
a14
I
REG
RA
REG
RB
REG
MUL
PIPE
Mul
Mul
x11
x21
x31
x41
x12
x22
x32
x42
x13
x23
x33
x43
x14
x24
x34
a11
a12
a13
a14
a11
a12
a13
a14
a11
a12
a13
a14
a11
a12
a13
p1
p2
p3
p4
p5
p6
p7
p8
p9
p10
p11
p12
p13
p14
Chn
Mul
Chn
Chn
Chn
Chn
Chn
Chn
Chn
Chn
Chn
Chn
Chn
ALU
PIPE
51
t
52
53
54
xA
55
56
57
VA
58
59
P
REG
p1
p2
p3
p4
p5
p6
p7
p8
p9
p10
p11
p12
p13
S
REG
C
Y
REG BUS
51
p2
p2
p2
52
p6
p6 xA
p6
55
p10
p10 VA
p10
t
52
53
54
xA
55
56
57
VA
58
† Contents of this register are not valid during this cycle.
Chebyshev Routines for the SN74ACT8847 FPU
Introduction
Using the SN74ACT8847, very efficient routines can be developed for the
implementation of transcendental functions. A high degree of accuracy can be achieved
by taking advantage of the 'ACT8847's ability to perform calculations using
double-precision floating point operands.
This application note describes how to use the 'ACT8847 to implement seven different
transcendental functions. TIM (Texas Instruments Meta-Macro Assembler) assembly
files have been written for all seven functions and these files are available upon request
from Texas Instruments. The algorithm chosen to implement these functions is the
Chebyshev expansion method [1]. Table 55 lists the functions that have been
implemented, along with the number of cycles required and the time required to perform
the calculations. Also listed in the table are the cycle count and time required to perform
the same calculation using the Motorola MC68881 Floating Point Coprocessor and
the Intel 80387 Numeric Processor Extension.
The Chebyshev expansion method was chosen rather than some of the more well
known methods, such as the Taylor series and Newton-Raphson approximation, for
a variety of reasons. The primary advantage of Chebyshev's method is that it provides
a uniform convergence rate in the number of terms required to achieve the desired
accuracy. Thus the range of the input value will have little effect on the accuracy of
the result. Another advantage is that the number of terms required to calculate the
approximation is relatively small. This provides for faster execution. Also, Chebyshev's
method can be applied to any function which is continuous and of bounded variation.
Lastly, tables are available which contain the constants necessary to implement
Chebyshev's method.
In order that this application note be useful to the largest audience, only those
instructions and features common to all 'ACT8847 versions have been used to
implement the routines.
Contact Texas Instruments VLSI Logic applications group at (214) 997-3970 for a
copy of the seven TIM assembly files.
Table 55. Cycle Count and Execution Speed for the Seven Chebyshev Functions

                         CYCLE COUNT†                EXECUTION SPEED‡ IN MICROSECONDS
FUNCTION          'ACT8847  MC68881  80387           'ACT8847  MC68881  80387
Sine                 51       416    122 to 771        1.53     25.0    7.32 to 46.3
Cosine               51       416    123 to 772        1.53     25.0    7.38 to 46.3
Tangent              84       498    191 to 497        2.52     29.9    11.5 to 29.8
ArcSine              68       606    Not Avail.        2.04     36.4    Not Avail.
ArcCosine            68       650    Not Avail.        2.04     39.0    Not Avail.
ArcTangent          104       428    314 to 487        3.12     25.7    18.8 to 29.2
Exponentiation       52       522    Not Avail.        1.56     31.3    Not Avail.

† For MC68881 cycle count refer to 'MC68881 Floating Point Coprocessor User's Manual', Document No.
  MC68881UM/AD, Page 6-13. For 80387 cycle count refer to '80387 Programmer's Reference Manual',
  Document No. 231917-001, Page E-36.
‡ 'ACT8847 cycle speed is 30 ns, 33 MHz
  MC68881 cycle speed is 60 ns, 16.6 MHz
  80387 cycle speed is 40 ns, 25 MHz
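The execution speeds in Table 55 are the cycle counts multiplied by the cycle time. The sketch below reproduces a few of the 'ACT8847 and MC68881 entries from the footnoted cycle speeds; the function and dictionary names are hypothetical.

```python
CYCLE_NS = {"ACT8847": 30, "MC68881": 60}   # cycle speeds from the table footnote


def execution_time_us(cycles, cycle_ns):
    """Execution time in microseconds for a given cycle count."""
    return cycles * cycle_ns / 1000.0


sine_act = execution_time_us(51, CYCLE_NS["ACT8847"])
sine_mc = execution_time_us(416, CYCLE_NS["MC68881"])
arctan_act = execution_time_us(104, CYCLE_NS["ACT8847"])
print(sine_act, round(sine_mc, 1), arctan_act)
```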
Overview of Chebyshev's Expansion Method
If f(x) is continuous and of bounded variation over the interval -1 ≤ x ≤ 1, then
f(x) may be approximated by the following equation:

\[ f(x) = {\sum_{r=0}^{\infty}}' \, a_r T_r(x) \]

where the prime on the summation indicates that the r = 0 term is halved, as in the
expansion that follows.
Note that the range for x is between -1 and 1. For most functions, this restriction
requires that the input, x, be range reduced before the calculation begins. Range
reducing an argument means to scale the argument down to a certain range. In the
case of Chebyshev approximations, the range is usually -1 ≤ x ≤ 1, or 0 ≤ x ≤ 1.
In the equation for f(x) above, the constants represented by a_r are known as Chebyshev
coefficients. The variables represented by T_r are known as Chebyshev polynomials
and can be derived from the following relationship and values:

\[ T_{r+1}(x) - 2x\,T_r(x) + T_{r-1}(x) = 0, \qquad T_0(x) = 1, \qquad T_1(x) = x \]
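The recurrence can be sketched directly. The function below returns the values of the first `count` Chebyshev polynomials at a point x, starting from T0 = 1 and T1 = x; the function name is hypothetical.

```python
def chebyshev_polynomials(x, count):
    """Return [T0(x), T1(x), ..., T(count-1)(x)] using the recurrence."""
    t = [1.0, x]
    for r in range(1, count - 1):
        # T(r+1) = 2x * T(r) - T(r-1)
        t.append(2.0 * x * t[r] - t[r - 1])
    return t[:count]


values = chebyshev_polynomials(0.5, 7)
print(values)
```

At x = 0.5 the seventh value agrees with the closed form 32x^6 - 48x^4 + 18x^2 - 1 for T6.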
To illustrate Chebyshev's expansion method, the procedure to approximate function
f(x) using the first seven polynomials is now covered. Let

\[ f(x) = \tfrac{1}{2}a_0 + a_1 T_1(x) + a_2 T_2(x) + a_3 T_3(x) + a_4 T_4(x) + a_5 T_5(x) + a_6 T_6(x) \]

Substituting in the expressions for the polynomials,

\[
\begin{aligned}
f(x) = \tfrac{1}{2}a_0 &+ a_1 (x) + a_2 (2x^2 - 1) + a_3 (4x^3 - 3x) + a_4 (8x^4 - 8x^2 + 1) \\
&+ a_5 (16x^5 - 20x^3 + 5x) + a_6 (32x^6 - 48x^4 + 18x^2 - 1)
\end{aligned}
\]

Rearranging the expression by grouping powers of x,

\[
\begin{aligned}
f(x) = {} & x^0 (\tfrac{1}{2}a_0 - a_2 + a_4 - a_6) + x^1 (a_1 - 3a_3 + 5a_5) + x^2 (2a_2 - 8a_4 + 18a_6) \\
& + x^3 (4a_3 - 20a_5) + x^4 (8a_4 - 48a_6) + x^5 (16a_5) + x^6 (32a_6)
\end{aligned}
\]
Next make the following substitutions:

\[
\begin{aligned}
c_0 &= \tfrac{1}{2}a_0 - a_2 + a_4 - a_6 \\
c_1 &= a_1 - 3a_3 + 5a_5 \\
c_2 &= 2a_2 - 8a_4 + 18a_6 \\
c_3 &= 4a_3 - 20a_5 \\
c_4 &= 8a_4 - 48a_6 \\
c_5 &= 16a_5 \\
c_6 &= 32a_6
\end{aligned}
\]

Substituting the c's into the last equation for f(x),

\[ f(x) = c_0 x^0 + c_1 x^1 + c_2 x^2 + c_3 x^3 + c_4 x^4 + c_5 x^5 + c_6 x^6 \]

Applying Horner's Rule yields

\[ f(x) = (((((c_6 x + c_5)x + c_4)x + c_3)x + c_2)x + c_1)x + c_0 \]

In the remainder of the paper, the above equation will be referred to as Cseries_f(x).
Therefore,

\[ \mathrm{Cseries\_f}(x) = (((((c_6 x + c_5)x + c_4)x + c_3)x + c_2)x + c_1)x + c_0 \]

The last step prior to approximating f(x) is to calculate the c's by substituting the values
for the Chebyshev coefficients into the equations for c0 through c6.
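The substitution and Horner evaluation can be sketched as follows. The function names and the sample coefficients are hypothetical. The check compares the Horner form against the direct seven-term Chebyshev sum, using the identity T_r(x) = cos(r arccos x) on -1 ≤ x ≤ 1.

```python
import math


def chebyshev_to_power(a):
    """Map Chebyshev coefficients a0..a6 to power-series coefficients c0..c6."""
    a0, a1, a2, a3, a4, a5, a6 = a
    return [
        0.5 * a0 - a2 + a4 - a6,
        a1 - 3 * a3 + 5 * a5,
        2 * a2 - 8 * a4 + 18 * a6,
        4 * a3 - 20 * a5,
        8 * a4 - 48 * a6,
        16 * a5,
        32 * a6,
    ]


def horner(c, x):
    """Evaluate c0 + c1*x + ... by Horner's Rule, highest power first."""
    acc = 0.0
    for coeff in reversed(c):
        acc = acc * x + coeff
    return acc


def chebyshev_sum(a, x):
    """Direct sum with the r = 0 term halved; T_r(x) = cos(r * arccos(x))."""
    theta = math.acos(x)
    return 0.5 * a[0] + sum(a[r] * math.cos(r * theta) for r in range(1, 7))


a = [1.0, -0.5, 0.25, 0.125, -0.0625, 0.03, 0.01]   # arbitrary sample values
c = chebyshev_to_power(a)
print(horner(c, 0.4), chebyshev_sum(a, 0.4))
```

Horner's form needs one multiply and one add per coefficient, which maps naturally onto alternating multiplier and ALU operations.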
Format for the Remainder of the Application Note
Each of the seven functions will be covered in a separate section. Each section will
include the following information:
1. General steps required to perform the calculation including a description of
any preprocessing and/or postprocessing
2. An algorithm for each of the above steps
3. What system intervention, if any, is required; this intervention may take the
form of branching based on comparison status generated by the 'ACT8847,
or storing and then later retrieving intermediate results
4. The number of 'ACT8847 cycles required to calculate f(x)
5. A listing of the c's
6. Pseudocode table showing how the calculation is accomplished. The
pseudocode tables list the contents of all the relevant 'ACT8847 registers
and buses for each instruction.
7. Microcode table listing the instructions
References
[1] C. W. Clenshaw, G. F. Miller, and M. Woodger, "Algorithms for Special
Functions I," Numerische Mathematik, Vol 4, 1963, pages 403 through 419.
[2] C. W. Clenshaw, "Chebyshev Series for Mathematical Functions," Vol 5
of the Mathematical Tables of the National Physical Laboratory, Department
of Scientific Industrial Research, England, 1960.
Cosine Routine Using Chebyshev's Method
All floating point inputs and outputs are double precision. The input is in radians.
Steps Required to Perform the Calculation
STEP 1 - Preprocessing; range-reduce the input, X, to a range of [-1,1]. Next
square this range-reduced value, multiply it by 2.0, and finally subtract
1.0. X3 is the range-reduced input value; it must be stored externally.
'TRUNC' means to truncate.
X1 ← X*(2.0/pi)
X2 ← (4*(TRUNC(0.25*(X1 + 2.0)))) - X1 + 1.0
If X2 > 1.0
Then X3 ← 2.0 - X2
Else X3 ← X2
X4 ← 2.0*(X3*X3) - 1.0
STEP 2 - Core Calculation; X4 in Step 1 will be referred to as 'x' in the core
calculation.
X5 ← Cseries_cos
   ← (((((((c8*x + c7)*x + c6)*x + c5)*x +
      c4)*x + c3)*x + c2)*x + c1)*x + c0
STEP 3 - Postprocessing; multiply the output of the core calculation times X3.
Cosine(X) ← X5*X3
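The preprocessing of Step 1 can be sketched and sanity-checked on its own. The check relies on the identity cos(X) = sin((pi/2)*X3), which follows from the arithmetic of Step 1 but is not stated in the text. The function name is hypothetical, and the sketch is exercised only for non-negative X, since truncation toward zero behaves differently for sufficiently negative inputs.

```python
import math


def range_reduce(x):
    """Step 1 preprocessing: return (X3, X4) for an input X in radians."""
    x1 = x * (2.0 / math.pi)
    x2 = 4.0 * math.trunc(0.25 * (x1 + 2.0)) - x1 + 1.0
    x3 = 2.0 - x2 if x2 > 1.0 else x2     # X3 lies in [-1, 1]
    x4 = 2.0 * (x3 * x3) - 1.0            # argument for the core series
    return x3, x4


for angle in [0.0, 0.3, 1.0, 2.5, 4.0, 6.0, 9.0]:
    x3, x4 = range_reduce(angle)
    print(angle, x3, x4, math.sin(math.pi / 2 * x3), math.cos(angle))
```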
Algorithms for the Three Steps
Step 1 perform the preprocessing:
T1 ← X*(2.0/pi)               2.0/pi entered as a constant
T2 ← T1 + 2.0
T3 ← 0.25*T2 and              CREG ← T1; T3 and T4 result
T4 ← 1.0 - CREG               from a chained instruction
T5 ← INT(T3)                  round controls set to truncate
T6 ← 4*T5                     CREG ← T4
T7 ← DOUBLE(T6)               convert from integer to double
T8 ← T7 + CREG
CMP (1.0,T8)                  CREG ← T8
If (T8 > 1.0)
Then T9 ← 2.0 - CREG          T9 is X3 in Step 1, must
Else T9 ← CREG                be stored externally
T10 ← CREG*CREG               CREG ← T9
T11 ← T10*2.0
T12 ← T11 - 1.0               T12 is X4 in Step 1, the
                              input to the core routine
Step 2 perform the core calculation:
T13 ← c8*CREG                 CREG ← T12
T14 ← T13 + c7
T15 ← T14*CREG
T16 ← T15 + c6
T17 ← T16*CREG
T18 ← T17 + c5
T19 ← T18*CREG
T20 ← T19 + c4
T21 ← T20*CREG
T22 ← T21 + c3
T23 ← T22*CREG
T24 ← T23 + c2
T25 ← T24*CREG
T26 ← T25 + c1
T27 ← T26*CREG
T28 ← T27 + c0
Step 3 perform the postprocessing:
Cosine(X) ← T28*T9
Required System Intervention
As seen in the algorithm for Step 1, the 'ACT8847 performs a compare. The results
of this compare determine which one of two calculations is to be performed. The
system of which the 'ACT8847 is a part must make the decision as to which of the
two calculations is to be performed. In addition, the system must store X3 and then
later furnish X3 as an input to the 'ACT8847.
Number of 'ACT8847 Cycles Required to Calculate Cosine(x)
Calculation of Cosine(x) requires 46 cycles. In addition, it is assumed that five additional
cycles are required due to the compare instruction and resulting system intervention.
Therefore, the total number of cycles to perform the Cosine(x) calculation is 51.
Listing of the Chebyshev Constants (c's)
The constants are represented in IEEE double-precision floating point format.
c8 = 3D19D46B7D4C8F32
c7 = BD962909C5C01ED6
c6 = 3E0D53517735F927
c5 = BE7CC930FD0ADA9D
c4 = 3EE3E0AF61F7677F
c3 = BF41E5FDEF25C403
c2 = 3F92A9FB40C119ED
c1 = BFD23B03366AA0C9
c0 = 3FF4464BCC8CBA1F
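The three steps of the cosine routine can be put together in a short software sketch. It decodes the listed constants from their IEEE double-precision hex patterns and evaluates the series with Horner's Rule. Two assumptions are worth stating: the hex strings are the listed values with scan artifacts cleaned up (letter O read as digit 0, embedded spaces removed), and the function name is hypothetical. The code models the arithmetic only, not the 'ACT8847 pipeline, and is exercised only for non-negative inputs.

```python
import math
import struct

# c0 .. c8 as IEEE double-precision bit patterns (cleaned-up transcription).
C_HEX = [
    "3FF4464BCC8CBA1F", "BFD23B03366AA0C9", "3F92A9FB40C119ED",
    "BF41E5FDEF25C403", "3EE3E0AF61F7677F", "BE7CC930FD0ADA9D",
    "3E0D53517735F927", "BD962909C5C01ED6", "3D19D46B7D4C8F32",
]
C = [struct.unpack(">d", bytes.fromhex(h))[0] for h in C_HEX]


def chebyshev_cos(x):
    """Cosine of x (radians) following Steps 1 to 3 of the routine."""
    # Step 1: preprocessing (range reduction).
    x1 = x * (2.0 / math.pi)
    x2 = 4.0 * math.trunc(0.25 * (x1 + 2.0)) - x1 + 1.0
    x3 = 2.0 - x2 if x2 > 1.0 else x2
    x4 = 2.0 * (x3 * x3) - 1.0
    # Step 2: core calculation, Horner's Rule from c8 down to c0.
    acc = C[8]
    for coeff in reversed(C[:8]):
        acc = acc * x4 + coeff
    # Step 3: postprocessing.
    return acc * x3


for angle in [0.0, 0.5, 1.0, 2.0, 3.0, 5.0, 8.0]:
    print(angle, chebyshev_cos(angle), math.cos(angle))
```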
Pseudocode Table for the Cosine(x) Calculation
Table 56. Pseudocode for Chebyshev Cosine Routine (PIPES2-0 = 010, RND1-0 =00)
ClK
DA
BUS
DB
BUS
RB
REG
0
RA2.RB2
X is the input
2DIVPI is a constant
representing 2.0/pi
X l5H
2
2DIVPI
M5H
2DIVPI
l5H
X
2DIVPI
0
RA2.RB2
3
1.0 M5H
1.0 l5H
X
2DIVPI
0
PR4+RB4
4
2.0 M5H
2.0 l5H
1.0
2.0
0
PR4+RB4
1.0
0.25
1
5R5.RB5
RA5-CR5
6
1.0
0.25
0
DP2I(PR7)
7
1.0
0.25
0
DP2I(PR7)
0.25 l5H
ALU
PIPE
P
C
REG REG
V
S
REG BUS
INSTR
X M5H
0.25 M5H
MUl
PIPE
ClK
MODE
1
5
Pl
Pl
5R5.RB5
51
RA5-CR5
Double precision - integer
P2
4
0
5R8.RB8
1.0
4
1
12DP(PR9)
10
1.0
4
1
CR10+5Rl0
54
11
1.0
4
1
COMPARE
RA11,5Rll
55
1.0
4
0
NOP
1.0
2.0
1
RB13-CR13
1.0
4
1
PA5(CR13)
12
13a
2.0 M5H
2.0 l5H
13b
14
15
16
2.0 M5H
2.0 l5H
1.0
2.0
or 4
1
CR14.CR14
1.0
2.0
or 4
0
RA16.PR16
2.0
2.0
or 4
0
RA16.PR16
-'
0)
(0
SN74ACT8847
Cycles 6,7 set RND1, 0 = 01
52
1.0
4
COMMENT
Preload RA with 1.0 for
use in cycles 5 and 11
RA2.RB2
9
8
';'I
RA
REG
52
53
P3
Integer - double-precision
If 5Rll > RAll then 13a
If 5Rll s RA 11 then 13b
Wait for system response
55
Execute 13a or 13b
Pass contents of CREG
56
CR14.CR14
56
P4
56
56 is either RB13-CR13 or
CR13 from PA55 CR13, and
must be stored externally
for use in cycle 43
56
Output 56 in cycles 14 and
15
Table 56. Pseudocode for Chebyshev Cosine Routine (PIPES2-0 = 010, RND1-0 = 00) (Continued)
CLK
DA
BUS
DB
BUS
17
RB
REG
CLK
MODE
INSTR
MUL
PIPE
2.0
2.0
or 4
0
PR18+RB18
RA16.PR16
RA
REG
18
-1.0 MSH
-1.0 LSH
2.0
-1.0
0
PR18+RB18
19
c8 MSH
c8 LSH
2.0
c8
1
SR19.RB19
2.0
c8
0
PR21 +RB21
c7 MSH
c7 LSH
2.0
c7
0
PR21 +RB21
22
2.0
c7
1
SR22.CR22
23
2.0
PR24+RB24 SR22.CR22
20
21
SR19.RB19
c7
c6
0
PR24+ RB24
25
2.0
c6
1
SR25.CR25
26
2.0
c6
0
PR27+RB27 SR25.CR25
2.0
c5
0
PR27+RB27
28
2.0
c5
1
SR28.CR28
29
2.0
c5
0
PR30+RB30 SR28.CR28
2.0
c4
0
PR30+ RB30
2.0
c4
1
SR31.CR31
2.0
c4
0
PR33+RB33 SR31.CR31
2.0
c3
0
PR33+RB33
SR34.CR34
PR36+RB36 SR34.CR34
27
30
c5 MSH
c4 MSH
c6 LSH
c5 LSH
c4 LSH
31
32
33
c3 MSH
c3 LSH
34
2.0
c3
1
35
2.0
c3
0
2.0
c2
36
c2 MSH
c2 LSH
PR36+RB36
0
_._--
-
-----
C
S
S7
2.0
c6 MSH
P
V
REG REG REG BUS
COMMENT
!
P5
0
24
ALU
PIPE
S7
Start core calculation
S7 is input to core calc.
P6
S8
P7
S9
P8
S10
,
P9
S11
Pl0
S12
I
I
Pll
----~
Table 56. Pseudocode for Chebyshev Cosine Routine (PIPES2-0 = 010, RND1-0 = 00) (Concluded)
ClK
RA
REG
RB
REG
37
2.0
c2
1
5R37.CR37
38
2.0
c2
0
PR39+RB39
2.0
c1
0
PR39+RB39
40
2.0
c1
1
5R40.CR40
41
2.0
c1
0
PR42+RB42
39
DA
BUS
c1 M5H
DB
BUS
c1 l5H
ClK
MODE
INSTR
MUl
PIPE
ALU
PIPE
P
C
S
Y
REG REG REG BUS
COMMENT
513
5R37.CR37
P12
514
5R40.CR40
42
co M5H
co l5H
2.0
co
0
PR42+RB42
43
56 M5H
56 l5H
2.0
56
1
5R43.RB43
515
44
2.0
56
0
DUMMY
5R43.RB43
45
2.0
56
0
NOP
P14
P14 Output MSH of answer
46
2.0
56
0
NOP
P14
P14 Output L5H of answer
~
-..I
SN74ACT8847
P13
Begin postprocessing
Instruction is double·
precision RA + RB, allows
time for answer to
propagate to the Y bus
-
LV88.L::l'VvLNS
....';'I
Microcode Table for the Cosine(x) Calculation
N
All numbers are in hex. Any field with a length that is not a multiple of 4 is right justified and zero filled. For the microcode
table, the value of X has been chosen to be 1/2 pi.
-...J
P
A
D
A
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
3FF921FB
3FE45F30
3FFOOOOO
40000000
3FDOOOOO
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
40000000
00000000
00000000
BFFOOOOO
3D19D46B
00000000
BD962909
D
B
54442D18
6DC9C883
00000000
00000000
00000000
00000000
00000000
00000004
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
7D4C8F32
00000000
C5C01 ED6
PEE C P C C
B N N L I L 0
A B K P K N
C EMF
SOl
D G
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
00_
1 1 _
00_
1 1 _
0 1J
00_
00_
0 1 S
00_
0 0 _
00_
0 0 S
00_
00_
0 0 S
1 0 _
00_
0 1 _
0 1 _
0 0 S
0 1 _
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
o
o
o
o
1
o
o
o
1
1
o
1
o
o
o
o
1
o
o
3
3
3
3
3
3
3
1
3
3
3
3
3
3
3
3
3
3
3
3
3
s R H
E
L
0
P
E F
E A N L
S L C 0
E T
W
T
C
FF
FF
FB
FB
1
BD
o
FB
1
FB
o
BF
FB
F6
1
FE
1
FF
o
F7
1
5F
1
EF
o
EF
1
FB
1
FB
BF
1
FB
1 0
FB 1 1 1
o
o
o
o
0
o
o
0
o
o
o
0
o
o
0
o
o
o
o
0
o
N
S
T
R
R F S B S T S 555
N A RYE E E E E E
D S C T L SLY S C
T C EST V
P T
1CO o 0 0 0 3 3
1CO o 0 0 0 3 3
180 o 0 0 0 3 3
180 o 0 0 0 3 3
581 0 0 1 0 3 3
1A3
00033
1A3 1 000 3 3
240 o 0 0 0 3 3
1A2 o 0 0 0 3 3
180 o 0 0 0 3 3
182 o 0 0 0 3 3
300 o 0 0 0 3 3
1AO o 0 0 033
1CO o 0 0 0 3 3
1CO o 0 0 0 3 3
1CO o 0 0 0 3 3
180 o 0 0 0 3 3
180 o 0 0 0 3 3
1CO o 0 003 3
180 o 0 0 0 3 3
180 0 0 0 0 3 3
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
Microcode Table for the Cosine(x) Calculation (Continued)
p
';'I
~
-.J
w
A
D
A
D
B
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
00000000
00000000
3EOD5351
00000000
00000000
BE7CC930
00000000
00000000
3EE3EOAF
00000000
00000000
BF41E5FD
00000000
00000000
3F92A9FB
00000000
00000000
BFD23B03
00000000
00000000
3FF4464B
00000000
00000000
00000000
00000000
00000000
00000000
7735F927
00000000
00000000
FDOADA9D
00000000
00000000
61F7677F
00000000
00000000
EF25C403
00000000
00000000
40C119ED
00000000
00000000
366AAOC9
00000000
00000000
CC8CBA 1F
00000000
00000000
00000000
00000000
SN74ACT8847
PEE C
B N N L
A B K
C
P C C s
I L 0 E
P K N L
EMF 0
SOl P
D G
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
00_
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
0 1 _
0 0 _
0 0 _
0 0 _
II
1 3 9F
0 3 FB
3 FB
1 3 9F
0 3 FB
0 3 FB
1 3 9F
0 3 FB
0 3 FB
1 3 9F
3 FB
0 3 FB
1 3 9F
0 3 FB
0 3 FB
1 3 9F
0 3 FB
0 3 FB
1 3 9F
0 3 FB
0 3 FB
1 3 BF
0 -3 FF
0 3 FF
0 3 FF
R H
E F
E A N L
S L C 0
E T
W
T
C
1 1
1
1
1 1
o
1
o
1 1 1
1 1
1
1
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
I
N
S
T
R
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
300
300
aaa
R F S B S T S
N A RYE E E E E E
D S C T L SLY S C
T C EST Y
P T
o
o
o
o
o
o
o
o
o
o
o
o
o
o
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
00003
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
o0 0 0 3
o0 0 0 3
o0 0 0 3
o0 0 0 3
o0 0 0 3
o
o
o
o
o
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
1
1
1
1
0
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
0 0 0
Sine Routine Using Chebyshev's Method
All floating point inputs and outputs are double precision. The input is in radians.
Steps Required to Perform the Calculation
STEP 1 - Preprocessing; range reduce the input, X, to a range of [-1,1]. Next
square this range-reduced value, multiply it by 2.0, and finally subtract
1.0. X3 is the range-reduced input value; it must be stored externally.
'TRUNC' means to truncate.
X1 ← X*(2.0/pi)
X2 ← X1 - (4*(TRUNC(0.25*(X1 + 1.0))))
If X2 > 1.0
Then X3 ← 2.0 - X2
Else X3 ← X2
X4 ← 2.0*(X3*X3) - 1.0
STEP 2 - Core calculation; X4 in Step 1 will be referred to as 'x' in the core
calculation.
X5 ← Cseries_sin ← (((((((c8*x + c7)*x + c6)*x + c5)*x + c4)*x
     + c3)*x + c2)*x + c1)*x + c0
STEP 3 - Postprocessing; multiply the output of the core calculation by X3.
Sine(X) ← X5*X3
Algorithms for the Three Steps
Step 1 perform the preprocessing:
T1  ← X*(2.0/pi)           2.0/pi entered as a constant
T2  ← T1 + 1.0             CREG ← T1
T3  ← 0.25*T2
T4  ← INT(T3)              round controls set to truncate
T5  ← 4*T4
T6  ← DOUBLE(T5)           convert from integer to double
T7  ← CREG - T6
CMP (1.0,T7)               compare 1.0 to T7
If (T7 > 1.0)              CREG ← T7
Then T8 ← 2.0 - CREG       T8 is X3 in Step 1, must
Else T8 ← CREG             be stored externally
T9  ← CREG*CREG            CREG ← T8
T10 ← T9*2.0
T11 ← T10 - 1.0            T11 is X4 in Step 1 above, the input to
                           the core routine; T11 = 'x' from Step 2 above
Step 2 perform the core calculation:
T12 ← c8*CREG              CREG ← T11
T13 ← T12 + c7
T14 ← T13*CREG
T15 ← T14 + c6
T16 ← T15*CREG
T17 ← T16 + c5
T18 ← T17*CREG
T19 ← T18 + c4
T20 ← T19*CREG
T21 ← T20 + c3
T22 ← T21*CREG
T23 ← T22 + c2
T24 ← T23*CREG
T25 ← T24 + c1
T26 ← T25*CREG
T27 ← T26 + c0
Step 3 perform the postprocessing:
Sine(X) ← T27*T8
Required System Intervention
As seen in the algorithm for Step 1, the 'ACT8847 performs a compare. The result
of this compare determines which one of two calculations is to be performed. The
system of which the 'ACT8847 is a part must decide which of the two calculations
is to be performed. In addition, the system must store X3 and later furnish X3
as an input to the 'ACT8847.
Number of 'ACT8847 Cycles Required to Calculate Sine(x)
Calculation of Sine(x) requires 46 cycles. In addition, it is assumed that five additional
cycles are required due to the compare instruction and resulting system intervention.
Therefore, the total number of cycles to perform the Sine(x) calculation is 51.
Listing of the Chebyshev Constants (c's)
The constants are represented in IEEE double-precision floating point format.
c8 = 3D19D46B7D4C8F32
c7 = BD962909C5C01ED6
c6 = 3E0D53517735F927
c5 = BE7CC930FD0ADA9D
c4 = 3EE3E0AF61F7677F
c3 = BF41E5FDEF25C403
c2 = 3F92A9FB40C119ED
c1 = BFD23B03366AA0C9
c0 = 3FF4464BCC8CBA1F
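The three steps and the constant listing can be checked on a host machine with a short sketch. The following is an illustration only, assuming ordinary host double-precision arithmetic rather than the 'ACT8847 pipeline; the names `dp`, `SINE_C`, and `chebyshev_sine` are hypothetical and do not appear in the manual. Because TRUNC truncates toward zero, the sketch only reduces inputs of about -pi/2 and above into range.

```python
import math
import struct


def dp(hex_word):
    """Decode 16 hex digits as a big-endian IEEE double (hypothetical helper)."""
    return struct.unpack(">d", bytes.fromhex(hex_word))[0]


# Constants c8 .. c0 from the listing, highest order first.
SINE_C = [dp(h) for h in (
    "3D19D46B7D4C8F32", "BD962909C5C01ED6", "3E0D53517735F927",
    "BE7CC930FD0ADA9D", "3EE3E0AF61F7677F", "BF41E5FDEF25C403",
    "3F92A9FB40C119ED", "BFD23B03366AA0C9", "3FF4464BCC8CBA1F",
)]


def chebyshev_sine(x_in):
    # Step 1: preprocessing (range reduction to [-1, 1]).
    x1 = x_in * (2.0 / math.pi)
    x2 = x1 - 4.0 * math.trunc(0.25 * (x1 + 1.0))
    x3 = 2.0 - x2 if x2 > 1.0 else x2
    x4 = 2.0 * (x3 * x3) - 1.0
    # Step 2: core calculation, nested multiply-add in x = X4.
    x5 = 0.0
    for c in SINE_C:
        x5 = x5 * x4 + c
    # Step 3: postprocessing.
    return x5 * x3
```

Calling `chebyshev_sine(math.pi / 2)` exercises the same input that the microcode table uses (X = 1/2 pi).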
Pseudocode Table for the Sine(x) Calculation
Table 57. Pseudocode for Chebyshev Sine Routine (PIPES2-0 = 010, RND1-0 = 00)
ClK
DA
BUS
DB
BUS
1
X MSH
X lSH
2
2DIVPI
MSH
2DIVPI
lSH
3
RA
REG
RB
REG
ClK
MODE
INSTR
0
RA2.RB2
X is the input
i
2DIVPI is a constant
representing 2.0/pi
i
X
2DIVPI
0
RA2.RB2
X
2DIVPI
0
PR4+RB4
4
1.0 MSH
1.0 lSH
X
1.0
0
PR4+RB4
5
0.25 MSH
0.25 lSH
X
0.25
1
SR5.RB5
6
1.0 MSH
1.0 lSH
X
0.25
0
DP2I(PR7)
1.0
0.25
0
DP2I(PR7)
1.0
4
0
SR8.RB8
7
4
8
MUl
PIPE
ALU
PIPE
P
C
REG REG
S
Y
REG BUS
RA2.RB2
P1
P1
S1
Double precision
P2
4
1
12DP(PR9)
1.0
4
1
CR10-SR10
S3
11
1.0
4
1
COMPARE
RA11,SR11
S4
12
1.0
4
0
NOP
1.0
2.0
1
RB13-CR13
1
PAS(CR13)
2.0 lSH
1.0
4
1.0
2.0
or 4
1
CR14.CR14
1.0
2.0
or 4
0
RA16.PR16
16
2.0
2.0
or 4
0
RA16.PR16
17
2.0
2.0
or 4
0
PR18+RB18
2.0
-1.0
0
PR18+RB18
13b
14
15
2.0 MSH
18 -1.0 MSH
2.0 lSH
-1.0 lSH
integer
Cycles 6,7 set RND1,O = 01
1.0
-
-+
I
S2
9
2.0 MSH
!
SR5.RB5
10
13a
COMMENT
Integer
P3
-+
double precision
I
I
•
If SR11 -+ RA 11 then 13a
If SR11 :s RA11 then 13b
S4
Execute 13a or 1 3b
Pass contents of CREG
S5
CR14.CR14
S5
P4
RA16.PR16
P5
I
Wait for system response
S5
S5 is either RB13-CR13 or
CR13 from PASS CR13, and
must be stored externally
for use in cycle 43
S5
Output S5 in cycles 14 and
15
.
Table 57. Pseudocode for Chebyshev Sine Routine (PIPES2-0
CLK
DA
BUS
DB
BUS
RA
REG
RB
REG
c8 MSH
c8 LSH
2.0
CLK
MODE
INSTR
MUL
PIPE
c8
1
SR19.RB19
2.0
c8
0
PR21 +RB21
2.0
c7
0
PR21 +RB21
22
2.0
C7
1
SR22.CR22
23
2.0
c7
0
PR24+RB24 SR22.CR22
2.0
19
20
21
24
c7 MSH
c7 LSH
c6
0
PR24+RB24
2.0
c6
1
5R25.CR25
2.0
c6
0
PR27+RB27
2.0
c5
0
PR27+RB27
2.0
c5
1
SR28.CR28
2.0
c5
0
PR30+RB30
2.0
c4
0
PR30+RB30
31
2.0
c4
1
SR31.CR31
32
2.0
c4
0
PR33+RB33
2.0
c3
0
PR33+RB33
34
2.0
c3
1
5R34.CR34
35
2.0
c3
0
PR36+RB36
2.0
c6 MSH
c6 L5H
25
26
27
c5 M5H
c5 L5H
28
29
30
33
36
c4 MSH
c3 MSH
c2 M5H
c4 L5H
c3 L5H
c2 L5H
37
38
39
40
c1 M5H
c1 L5H
c2
0
PR36+RB36
2.0
c2
1
5R37.CR37
2.0
c2
0
PR39+RB39
2.0
c1
0
PR39+RB39
2.0
c,
1
5R40.CR40
~
-.J
-.J
SN74ACT8847
ALU
PIPE
010, RND1-0
P
C
S
00) (Continued)
y
REG REG REG BUS
S6
SR19.RB19
COMMENT
Start core calculation
S7 is input to core calc.
S6
P6
S7
P7
S8
5R25.CR25
P8
59
5R28.CR28
P9
510
5R31.CR31
P10
511
5R34·CR34
P11
S12
SR37.CR37
P12
513
L t7BB.i::n1t7LNS
Table 57. Pseudocode for Chebyshev Sine Routine (PIPES2-0
';'l
-.J
co
ClK
DA
BUS
DB
BUS
41
RA
REG
RB
REG
ClK
MODE
INSTR
MUl
PIPE
2.0
c1
0
PR42+RB42
SR40.CR40
42
Co MSH
co LSH
2.0
cO
0
PR42+RB42
43
S5 MSH
S5 LSH
2.0
S5
1
SR43.RB43
AlU
PIPE
010, RND1-0
P
C
S
00) (Concluded)
y
REG REG REG BUS
COMMENT
P13
S14
Begin postprocessing
Instruction is doubleprecision RA + RB, allows
time for answer to
44
2.0
S5
0
DUMMY
SR43.RB43
45
2.0
S5
0
NOP
P14
P14
Output MSH of answer
46
2.0
S5
0
NOP
P14
P14
Output LSH of answer
propagate to the Y bus
Microcode Table for the Sine(x) Calculation
All numbers are in hex. Any field with a length that is not a multiple of 4 is right justified and zero filled. For the microcode
table, the value of X has been chosen to be 1/2 pi.
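Each double-precision operand appears in the table as two 32-bit hex words, the most significant half (MSH) and the least significant half (LSH), matching the 'X MSH' and 'X LSH' bus entries of the pseudocode table. A minimal sketch of that split, assuming standard IEEE double encoding on the host; `split_double` is a hypothetical helper name:

```python
import math
import struct


def split_double(value):
    """Return (MSH, LSH): the upper and lower 32 bits of an IEEE double."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    return bits >> 32, bits & 0xFFFFFFFF


msh, lsh = split_double(math.pi / 2)
print(f"{msh:08X} {lsh:08X}")  # 3FF921FB 54442D18
```

The printed pair is the first DA/DB entry of the table, the chosen input X = 1/2 pi.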
p
A
0
A
0
B
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
3FF921FB
3FE45F30
00000000
3FFOOOOO
3FOOOOOO
3FFOOOOO
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
40000000
00000000
00000000
BFFOOOOO
3D 19046B
00000000
B0962909
54442018
60C9C883
00000000
00000000
00000000
00000000
00000000
00000004
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
704C8F32
00000000
C5C01 E06
PEE C
B N N L
A B K
C
P C C s
I L 0 E
P K N L
EMF 0
SOl P
o
~
~
-..J
co
SN74ACT8847
o
E F
E A N L
S L C 0
E T
W
T
C
I
N
S
T
R
R F S B S T S 555
N A RYE E E E E E
o S C T L SLY S C
T C EST Y
P T
G
F 0 0 _ 2 0 3
F 1 1 _ 2 0 3
F 0 0 _ 2 0 3
F 0 1 _ 2 0 3
F01.r2 1 3
F 0 0 _ 2 0 3
F 1 0 _ 2 0 3
F01_201
F 00_ 2 1 3
F 00_ 2 1 3
F 00_ 2 1 3
F 0
2 0 3
F 00_ 2 1 3
F 00_ 2 1 3
FOO.I2
3
F 1 0 _ 2 0 3
F 0 0 _ 2 0 3
F 0 1 _ 2 0 3
F 0 1 _ 2 1 3
F 0 0 I
2 0 3
F 0 1 _ 2 0 3
o.r
R H
FF
FF
FB
FB
BF
FB
FB
BF
FB
F6
FE
FF
F7
5F
EF
EF 1
FB
FB
BF
FB
FB
1
1
o
o
o
o
o
0
1
1
1
1
o
o
o
o
o
o
1
1
o
o
1
1
o
o
o
o
1
o
o
o
o
0
0
0
1CO
1CO
180
180
1CO
1A3
1A3
240
1A2
181
182
300
1AO
1CO
1CO
1CO
180
180
1CO
180
180
o
o
o
o
0 0 0 3 3
000
0 0 0 3 3
000
0 0 0 3 3
000
0 0 0 3 3
000
001 033
000
1 000 3 3
000
1 000 3 3
000
0 0 0 3 3
000
0 0 0 3 3
000
0 0 0 3 3
000
0 0 0 3 3
000
0 0 0 3 3
000
0 0 0 3 3
000
0 0 0 3 3
000
000 3 3
000
0 0 0 3 3 1 000
0 0 0 3 3
000
0 0 0 3 3
000
000 0 3 3
000
0 0 0 3 3
000
0 0 0 3 3
000
o
o
o
o
o
o
o
o
o
o
o
o
o
LV88.1:lVvLNS
;J
00
0
-
Microcode Table for the Sine(x) Calculation (Continued)
~
A
,0
A
0
B
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
00000000
00000000
3E005351
00000000
00000000
BE7CC930
00000000
00000000
3EE3EOAF
00000000
00000000
BF41E5FO
00000000
00000000
3F92A9FB
00000000
00000000
BF023B03
00000000
00000000
3FF4464B
3FFOOOOO
00000000
00000000
00000000
00000000
00000000
7735F927
00000000
00000000
FDOADA90
00000000
00000000
61F7677F
00000000
00000000
EF25C403
00000000
00000000
40C119EO
00000000
00000000
366AAOC9
00000000
00000000
CC8CBA 1 F
00000000
00000000
00000000
00000000
p
PEE C
B N N L
A B K
C
P C C S
I L 0 E
P K N L
EMF 0
SOl P
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
o
00_
00_
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
00_
0 1 _
00_
0 0 _
0 1 _
00_
00_
0 1 _
0 1 _
0 0 _
0 0 _
0 0 _
o
o
o
E F
E A N L
S L C 0
E T
W
T
C
I
N
S
T
R
R F S B S T S 555
N A RYE E E E E E
o S C T L SLY S C
T C EST Y
P T
G
1 3
3
0 3
1 3
0 3
3
1 3
0 3
0 3
1 3
0 3
0 3
1 3
3
0 3
1 3
0 3
0 3
1 3
3
0 3
1 3
0 3
0 3
0 3
o
R H
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
BF
FF
FF
FF
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1 1
1 1
1 1
1 1
1 1
1
1
1
1
1
1
1
1
1
1
1 1
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
300
300
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
003
0 0 3
0 0 3
0 0 3
0 0 3
003
0 0 3
0 0 3
0 0 3
0 0 3
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3 1 000
3 1 000
3 1 000
3 1 000
3 1 000
3 1 000
3 1 000
3 1 000
3 1 000
3 1 000
3 1 000
300 0 0
Tangent Routine Using Chebyshev's Method
All floating point inputs and outputs are double precision. The input is in radians.
Steps Required to Perform the Calculation
STEP 1 - Preprocessing; range reduce the input, X, to a range of [-1,1]. Next
square this range-reduced value, multiply it by 2.0, and finally subtract
1.0. X3 is the range-reduced input value; it must be stored externally.
'TRUNC' means to truncate. If X2 > 1.0, then in the postprocessing
part of the routine, the answer is the reciprocal of X5*X3.
X1 ← X*(4.0/pi)
X2 ← X1 - (4*(TRUNC(0.25*(X1 + 1.0))))
If X2 > 1.0
Then X3 ← 2.0 - X2
Else X3 ← X2
X4 ← 2.0*(X3*X3) - 1.0
STEP 2 - Core Calculation; X4 in Step 1 will be referred to as 'x' in the core
calculation.
X5 ← (((((((((((((c14*x + c13)*x + c12)*x + c11)*x + c10)*x
     + c9)*x + c8)*x + c7)*x + c6)*x + c5)*x + c4)*x + c3)*x
     + c2)*x + c1)*x + c0
STEP 3 - Postprocessing; multiply the output of the core calculation by
X3. If X2 > 1.0, then the reciprocal of X5*X3 is the answer; if
X2 ≤ 1.0, then X5*X3 is the answer.
Tangent(X) ← X5*X3 (or reciprocal of X5*X3)
Algorithms for the Three Steps
Step 1 perform the preprocessing:
T1  ← X*(4.0/pi)           4.0/pi entered as a constant
T2  ← T1 + 1.0             CREG ← T1
T3  ← 0.25*T2
T4  ← INT(T3)              round controls set to truncate
T5  ← 4*T4
T6  ← DOUBLE(T5)           convert from integer to double
T7  ← CREG - T6
CMP (1.0,T7)
If (T7 > 1.0)              CREG ← T7
Then T8 ← 2.0 - CREG       T8 is X3 in Step 1, must
Else T8 ← CREG             be stored externally
T9  ← CREG*CREG            CREG ← T8
T10 ← T9*2.0
T11 ← T10 - 1.0            T11 is X4 in Step 1, the
                           input to the core routine
Step 2 perform the core calculation:
T12 ← c14*CREG             CREG ← T11
T13 ← T12 + c13
T14 ← T13*CREG
T15 ← T14 + c12
T16 ← T15*CREG
T17 ← T16 + c11
T18 ← T17*CREG
T19 ← T18 + c10
T20 ← T19*CREG
T21 ← T20 + c9
T22 ← T21*CREG
T23 ← T22 + c8
T24 ← T23*CREG
T25 ← T24 + c7
T26 ← T25*CREG
T27 ← T26 + c6
T28 ← T27*CREG
T29 ← T28 + c5
T30 ← T29*CREG
T31 ← T30 + c4
T32 ← T31*CREG
T33 ← T32 + c3
T34 ← T33*CREG
T35 ← T34 + c2
T36 ← T35*CREG
T37 ← T36 + c1
T38 ← T37*CREG
T39 ← T38 + c0
Step 3 perform the postprocessing:
T40 ← T39*T8
If X2 (in Step 1) > 1.0
Then Tangent(X) ← 1.0/T40
Else Tangent(X) ← T40
Required System Intervention
As seen in the algorithm for Step 1, the 'ACT8847 performs a compare. The result
of this compare determines which one of two calculations is to be performed. The
system of which the 'ACT8847 is a part must decide which of the
two calculations is to be performed. In addition, the system must store X3 and
later furnish X3 as an input to the 'ACT8847. Finally, the system must determine
whether it is necessary to take the reciprocal of the final product (T40 in the algorithm for
Step 3) to yield the answer. If the reciprocal is required, the system
must route T40 from the 'ACT8847's output bus to the input
buses, because operands for division instructions must be provided by the RA
and RB registers; feedback is not an option.
Number of 'ACT8847 Cycles Required to Calculate Tangent(x)
Calculation of Tangent(x) requires 79 cycles. In addition, it is assumed that five
additional cycles are required for system intervention due to the compare instruction.
Therefore, the total number of cycles required to perform the Tangent(x) calculation
is 84.
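The cycle totals quoted for the sine and tangent routines can be reproduced with simple bookkeeping. The sketch below is an interpretation, not a formula stated in the text: the split into preprocessing cycles, three cycles per polynomial term, and postprocessing cycles is read off the 'Start core calculation' and 'Begin postprocessing' comments in Tables 57 and 58, and `routine_cycles` is a hypothetical name.

```python
def routine_cycles(pre, degree, post, cycles_per_term=3):
    """Cycles = preprocessing + core (one multiply-add per term) + postprocessing."""
    return pre + cycles_per_term * degree + post


SYSTEM_INTERVENTION = 5  # extra cycles assumed for the compare and system response

sine_cycles = routine_cycles(pre=18, degree=8, post=4)        # cycles 1-18, 19-42, 43-46
tangent_cycles = routine_cycles(pre=18, degree=14, post=19)   # cycles 1-18, 19-60, 61-79

print(sine_cycles, sine_cycles + SYSTEM_INTERVENTION)         # 46 51
print(tangent_cycles, tangent_cycles + SYSTEM_INTERVENTION)   # 79 84
```

The longer tangent postprocessing phase reflects the optional division and the NOP cycles spent waiting for its result.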
Listing of the Chebyshev Constants (c's)
The constants are represented in IEEE double-precision floating point format.
c14 = 3D747D842210CC35
c13 = 3DA1D66636043991
c12 = 3DCCD078F52B3A73
c11 = 3DF938F9CDDFF864
c10 = 3E2620430E99B5B7
c9  = 3E535C2C953CE515
c8  = 3E80F07AFC099D7F
c7  = 3EADA4D789EB45C4
c6  = 3ED9F03D4C51A771
c5  = 3F06B236DE4D014C
c4  = 3F33DBFB01B3F415
c3  = 3F6160DE701F3A53
c2  = 3F8E70A18736FC10
c1  = 3FBAEA2653199611
c0  = 3FEC14B2675B10BA
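As a host-side check of the tangent steps and constants, the following sketch evaluates the routine in ordinary double-precision arithmetic, including the reciprocal taken when X2 > 1.0. It is an illustration under that assumption, not a model of the 'ACT8847 timing or rounding; `dp`, `TAN_C`, and `chebyshev_tangent` are hypothetical names. Inputs for which X3 is zero while X2 > 1.0 (odd multiples of pi/2) are not handled.

```python
import math
import struct


def dp(hex_word):
    """Decode 16 hex digits as a big-endian IEEE double (hypothetical helper)."""
    return struct.unpack(">d", bytes.fromhex(hex_word))[0]


# Constants c14 .. c0 from the listing, highest order first.
TAN_C = [dp(h) for h in (
    "3D747D842210CC35", "3DA1D66636043991", "3DCCD078F52B3A73",
    "3DF938F9CDDFF864", "3E2620430E99B5B7", "3E535C2C953CE515",
    "3E80F07AFC099D7F", "3EADA4D789EB45C4", "3ED9F03D4C51A771",
    "3F06B236DE4D014C", "3F33DBFB01B3F415", "3F6160DE701F3A53",
    "3F8E70A18736FC10", "3FBAEA2653199611", "3FEC14B2675B10BA",
)]


def chebyshev_tangent(x_in):
    # Step 1: preprocessing (range reduction to [-1, 1]).
    x1 = x_in * (4.0 / math.pi)
    x2 = x1 - 4.0 * math.trunc(0.25 * (x1 + 1.0))
    x3 = 2.0 - x2 if x2 > 1.0 else x2
    x4 = 2.0 * (x3 * x3) - 1.0
    # Step 2: core calculation, nested multiply-add in x = X4.
    x5 = 0.0
    for c in TAN_C:
        x5 = x5 * x4 + c
    # Step 3: postprocessing; take the reciprocal when X2 > 1.0.
    t40 = x5 * x3
    return 1.0 / t40 if x2 > 1.0 else t40
```

`chebyshev_tangent(math.pi / 3)` follows the reciprocal path, the same case the microcode table works through for X = 1/3 pi.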
Pseudocode Table for the Tangent(x) Calculation
Table 58. Pseudocode for Chebyshev Tangent Routine (PIPES2-0
CLK
DA
BUS
DB
BUS
RA
REG
RB
REG
X
4DIV
PI
INSTR
0
RA2.RB2
X is the input
0
RA2.RB2
4DIVPI is a constant
representing 4.0/pi
X MSH
X LSH
2
4DIVPI
MSH
4DIVPI
LSH
X
4DIVPI
0
PR4+RB4
4
1.0 MSH
1.0 LSH
X
1.0
0
PR4+RB4
5
0.25 MSH
0.25 LSH
X
0.25
1
SR5*RB5
6
1.0 MSH
1.0 LSH
X
0.25
0
DP2I(PR7)
1.0
0.25
0
DP21(PR7)
1.0
4
0
SR8.RB8
7
4
8
ALU
PIPE
P
C
S
RA2.RB2
P1
P1
S1
Double precision
SR5·RB5
P2
1.0
4
1
12DP(PR9)
4
1
CR10-SR10
S3
11
1.0
4
1
COMPARE
RA11,SR11
S4
1.0
4
0
NOP
1.0
2.0
1
RB13-CR13
1.0
4
1
PAS(CR13)
1.0
2.0
or 4
1
CR14.CR14
1.0
2.0
or 4
0
RA16·PR16
16
2.0
2.0
or 4
0
RA16.PR16
17
2.0
2.0
or 4
0
PR18 + RB18
2.0
-1.0
0
PR18 + RB18
2.0 LSH
13b
integer
= 01
S2
1.0
2.0 MSH
~
Cycles 6,7 set RND1,0
9
13a
COMMENT
REG REG REG BUS
10
12
y
CLK
MODE
1
3
MUL
PIPE
010, RND1-0 = 0)
P3
Integer
~
double precision
If SR 11 > RA 11 then 13a
If SR 11 ,,; RA 11 then 13b
S4
Wait for system response
Execute 13a or 13b
Pass contents of Creg
S5 is either RB13-CR13 or
14
15
18
2.0 MSH
-1.0 MSH
2.0 LSH
-1.0 LSH
S5
CR14·CR14
S5
P4
RA16·PR16
P5
S5
S5
CR13 from PASS CR13, and
must be stored externally
for use in cycle 61
Output S5 in cycles 14 and
15
Table 58. Pseudocode for Chebyshev Tangent Routine (PIPES2-0 ... 010. RND1-0
CLK
19
DA
BUS
DB
BUS
RA
REG
RB
REG
c14 MSH
c14 LSH
2.0
c14
1
SR19.RB19
2.0
c14
0
PR21 +RB21
2.0
20
21
MUL
PIPE
0
PR21 + RB21
c13
1
SR22.CR22
23
2.0
c13
0
PR24+RB24 SR22.CR22
2.0
c12
0
PR24+RB24
c12
1
SR25.CR25
c12 MSH
c13 LSH
c12 LSH
25
2.0
26
2.0
c12
0
PR27+RB27
2.0
c"
0
PR27+RB27
2S
2.0
cll
1
SR2S.CR2S
29
2.0
cll
0
PR30+RB30
2.0
cl0
0
PR30+RB30
31
2.0
clO
1
SR31.CR31
32
2.0
clO
0
PR33+RB33
2.0
c9
0
PR33+RB33
30
33
cll MSH
clO MSH
c9 MSH
cl1 LSH
clO LSH
c9 LSH
34
2.0
c9
1
SR34.CR34
35
2.0
c9
0
PR36+RB36
2.0
36
Cs
0
PR36+RB36
37
2.0
cs
1
SR37.CR37
3S
2.0
Cs
0
PR39+RB39
2.0
c7
0
PR39+RB39
40
2.0
c7
1
SR40.CR40
41
2.0
c7
0
PR42+RB42
2.0
c6
0
PR42+RB42
39
42
cs MSH
c7 MSH
c6 MSH
Cs LSH
c7 LSH
c6 LSH
U1
SN74ACT8847
PIPE
P
C
S
y
REG REG REG BUS
S6
c13
c13 MSH
ALU
SR19.RB19
2.0
27
";'l
INSTR
22
24
IX)
CLK
MODE
S6
0) (Continued)
COMMENT
Start core calculation
S7 is input to core calc.
P6
S7
P7
SB
SR25.CR25
PS
S9
SR2S.CR2S
P9
S10
SR31.CR31
Pl0
Sll
SR34.CR34
Pll
I
S12
SR37·CR37
P12
S13
SR40.CR40
P13
,
Lv88.l:>"vLNS
Table 58. Pseudocode for Chebyshev Tangent Routine (PIPES2-0
;J
.....
00
Ol
RA
REG
RB
REG
43
2.0
c6
1
5R43.CR43
44
2.0
PR45+ RB45
CLK
DA
BUS
DB
BUS
CLK
MODE
INSTR
c6
0
2.0
c5
0
PR45+ RB45
46
2.0
c5
1
5R46.CR46
47
2.0
c5
0
PR48+RB48
2.0
c4
0
PR48+ RB48
49
2.0
c4
1
5R49.CR49
50
2.0
c4
0
PR51 +RB51
2.0
c3
0
PR51 +RB51
1
5R52.CR52
45
48
51
c5 L5H
c5 M5H
c4 L5H
c4 M5H
c3 M5H
c3 L5H
52
53
54
c2 M5H
c2 L5H
2.0
c3
2.0
c3
0
PR54+RB54
2.0
c2
0
PR54+RB54
c2
1
5R55.CR55
2.0
55
2.0
c2
0
PR57+RB57
2.0
c1
0
PR57+RB57
58
2.0
c1
1
5R58.CR58
59
2.0
PR60+RB60
56
57
c1 M5H
c1 L5H
c1
0
60
co M5H
Co L5H
2.0
co
0
PR60+RB60
61
55 M5H
55 L5H
2.0
55
1
5R61>RB61
2.0
62
-
55
0
DUMMY
MUL
PIPE
ALU
PIPE
010. RND1-0
P
C
S
V
REG REG REG BUS
0) (Concluded)
COMMENT
514
5R43.CR43
P14
515
5R46.CR46
P15
516
5R49.CR49
P16
517
5R52.CR52
P17
51B
5R55.CR55
P18
519
5R5B.CR5B
P19
520
5R61.RB61
Begin postprocessing
Instruction is RA + RB, used
to allow time for result
to propagate to Y bus
Table 58. Pseudocode for Chebyshev Tangent Routine (PIPES2-0 .. 010. RND1-0
ClK
DA
BUS
DB
BUS
63
RA
REG
2.0
RB
REG
S5
ClK
MODE
0
INSTR
MUl
PIPE
AlU
PIPE
NOP
P
C
S
Y
REG REG REG BUS
P20
0) (Continued)
COMMENT
Output MSH. if cycle 13b
was executed then P20 is
the answer; if cycle 13a
P20
was executed then the
answer is 1.0/P20. which
is calculated next
64
1.0 M5H
1.0 l5H
2.0
55
0
DIV
65
P20 M5H
P20 L5H
1.0
P20
0
DIV
Operands for Division must
come from RA and RB.
feedback is not an option
P20 Output l5H
66
1.0
P20
0
NOP
Wait for Division result
67
1.0
P20
0
NOP
Wait for Division result
68
1.0
P20
0
NOP
Wait for Division result
69
1.0
P20
0
NOP
Wait for Division result
70
1.0
P20
0
NOP
Wait for Division result
71
1.0
P20
0
NOP
Wait for Division result
72
1.0
P20
0
NOP
Wait for Division result
73
1.0
P20
0
NOP
Wait for Division result
74
1.0
P20
0
NOP
Wait for Division result
75
1.0
P20
0
NOP
Wait for Division result
76
1.0
P20
0
NOP
Wait for Division result
77
1.0
P20
0
NOP
78
1.0
P20
0
NOP
79
1.0
P20
0
NOP
...~
CO
-.J
SN74ACT8847
Wait for Division result
-----------
P21
P21
Output M5H of answer
P21
P21
Output L5H of answer
L~88.l::>-V~lNS
';'I
.....
(10
(10
Microcode Table for the Tangent(x) Calculation
All numbers are in hex. Any field with a length that is not a multiple of 4 is right justified and zero filled. For the microcode
table, the value of X has been chosen to be 1/3 pi.
P
A
D
A
D
B
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
3FFOC152
3FF45F30
00000000
3FFOOOOO
3FDOOOOO
3FFOOOOO
00000000
00000000
00000000
00000000
00000000
00000000
40000000
00000000
40000000
00000000
00000000
BFFOOOOO
3D747D84
00000000
3DA1D666
382D7365
6DC9C883
00000000
00000000
00000000
00000000
00000000
00000004
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
2210CC35
00000000
36043991
.p EE C
B N N L
A B K
C
P C C s
I L 0 E
P K N L
£ M F 0
SOl P
D G
F 0 0 _ 2 0 3
F 1 1 _ 2 0 3
F 0 0 _ 2 0 3
F 0 1
2 0 3
F 0 1 S
2 1 3
F 0 0 _ 2 0 3
F 1 0 _ 2 0 3
F01_201
F 00_ 2 1 3
F 00_ 2 1 3
F 00_ 2 1 3
F 0 0 J
2 0 3
F 0 1
2 1 3
F 00_ 2 1 3
F 0 0 I
2 0 3
F 1 0 _ 2 o 3
F 0 0 _ 2 0 3
F 0 1 _ 2 0 3
2 1 3
F 0 1
F 0 O..f" 2 0 3
F 0 1 _ 2 0 3
FF
FF
FB
FB
BF
FB
FB
BF
FB
F6
FE
FF
F7
5F
EF
EF
FB
FB
BF
FB
FB
R H
E F
E A N L
S L C 0
E T
W
T
C
1
1
1
1
o
o
o
o
o
0
1
1
1
1
1
1
o
o
o
o
o
o
1
1
o
o
o
0
1
o 0
1
1 o
·1
1 o
1 1 1 o
1
1 o
o 0
1
1
1 o
I
N
S
T
R
RF
N A
D S
T
S B S T S 0 0 0
RYE E E E E E
C T L SLY S C
C EST Y
P T
1CO o 0 0 0 3 3
1CO o 0 0 0 3 3
180 o 0 0 0 3 3
180 o 0 0 0 3 3
1CO 0 0 1 0 3 3
1A3 1 000 3 3
1A3 1 000 3 3
240 o 0 0 0 3 3
1A2 o 0 0 0 3 3
181 .0 0 0 0 3 3
182 o 0 0 0 3 3
300 o 0 0 0 3 3
183 o 0 0 0 3 3
1CO o 0 0 0 3 3
1CO o 0 0 0 3 3
1CO o 0 0 0 3 3
180 o 0 0 0 3 3
180 o 0 0 03 3
1CO o 0 0 0 3 3
180 o 0 0 0 3 3
180 o 0 0 0 3 3 1
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
Microcode Table for the Tangent(x) Calculation (Continued)
p
A
~
.....
co
(l)
D
A
F 00000000
F 00000000
F 3DCCD078
F 00000000
F 00000000
F 3DF938F9
F 00000000
F 00000000
F 3E262043
F 00000000
F 00000000
F 3E535C2C
F 00000000
F 00000000
F 3E80F07A
F 00000000
F 00000000
F3EADA4D7
F 00000000
F 00000000
F 3ED9F03D
F 00000000
F 00000000
F 3F06B236
D
B
00000000
00000000
F52B3A73
00000000
00000000
CDDFF864
00000000
00000000
OE99B5B7
00000000
00000000
953CE515
00000000
00000000
FC099D7F
00000000
00000000
89EB45C4
00000000
00000000
4C51 A 771
00000000
00000000
DE4D014C
PEE C
B N N L
A B K
C
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
SN74ACT8847
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
s RH
P C C
I L 0 E
P K N L
EMF 0
SOl P
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
o
G
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
o
E F
E A N L
S L C 0
E T
W
T
C
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB 1
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
1
1
1
1 1
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
I
N
S
T
R
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
R F S B S T S 000
N A RYE E E E E E
o S C T L SLY S C
·T C EST Y
P T
o
o
o
o
o
o
o
o
o
o
o
o
o
0 003 3
0 0 0 3 3
0 0 0 3 3
0 003 3
0 0 0 3 3
0 0 0 3 3
0 0 0 3 3 1
0 0 0 3 3
0 0 0 3 3
0 0 0 3 3
0 0 0 3 3
0 003 3
0 0 0 3 3
00003 3
0 0 0 3 3
0 0 0 3 3
0 0 0 3 3
00003 3
0 003 3
0000331
0 003 3
0 0 0 3 3
0 0 0 3 3
0 0 0 3 3
o
o
o
o
o
o
o
o
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
Lv88.l:lVvLNS
....~
0
-
Microcode Table for the Tangent(x) Calculation (Continued)
<0
P
A
0
A
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
00000000
00000000
3F330BFB
00000000
00000000
3F61600E
00000000
00000000
3F8E70A1
00000000
00000000
3FBAEA26
00000000
00000000
3FEC14B2
3FE55555
00000000
00000000
3FFOOOOO
3FE279A7
00000000
00000000
00000000
00000000
0
B
00000000
00000000
01 B3F415
00000000
00000000
701F3A53
00000000
00000000
8736FC10
00000000
00000000
53199611
00000000
00000000
675B10BA
55555555
00000000
00000000
00000000
45903310
00000000
00000000
00000000
00000000
PEE C
B N N L
A B K
C
P C C s
I L 0 E
P K N L
EMF 0
SOl P
o G
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
00_
0 0 _
0 1
00_
0 0 _
0 1
00_
0 0
0 1
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
0 1
0 0 _
0 0
0 0 _
1 1 _
0 0 _
0 0 _
0 0 _
0 0 _
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
0
0
0
0
0
0
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
BF
FF
FF
FF
FF
FF
FF
FF
FF
R H
E F
E A N L
S L C 0
E T
W
T
C
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
I
N
S
T
R
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
300
1EO
1EO
0300
300
300
300
o
o
o
R F S B S T S 000
N A RYE EE E E E
o S C T L SLY S C
T C EST Y
P T
000 0 3 3
000
0 0 0 3 3
000
0 0 0 3 3 1 000
000 0 3 3 1 000
0 0 0 331 000
0 0 0 0 3 3 1 000
0 0 0 3 3 1 000
0 0 0 3 3 1 000
0 0 0 3 3 1 000
0 0 0 3 3 1 000
0 0 0 3 3 1 000
0 0 0 3 3 1 000
0 0 0 3 3 1 000
0 0 0 3 3 1 000
000 3 3 1 000
0 0 0 3 3 1 000
0 0 0 3 3 1 000
0 0 0 3 3 1 000
0 0 0 3 300 0 0
0 0 0 3 3 1 000
0 0 0 3 3 1 000
0 0 0 3 3 1 000
0 0 0 3 3 1 000
0 0 0 3 3 1 000
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
Microcode Table for the Tangent(x) Calculation (Concluded)
p
A
D
A
F 00000000
F 00000000
F 00000000
F 00000000
F 00000000
F 00000000
FOOOOOOOO
F 00000000
F 00000000
F 00000000
D
B
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
PEE C
B N N l
A B K
C
pee
I l 0
P K N
EMF
SOl
D G
s
E
l
0
P
R H
F
F
F
F
F
F
F
F
F
F
2
2
2
2
2
2
2
2
2
2
FF
FF
FF
FF
FF
FF
FF
FF
FF
FF
1
1
1
1
1
~
~
CD
SN74ACT8847
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
_
_
_
_
_
_
_
_
_
_
0
0
0
0
0
0
0
0
0
0
3
3
3
3
3
3
3
3
3
3
E F
E A N l
S leo
E T
W
T
C
1
1
o
o
o
o
o
o
o
o
o
o
I
N
S
T
R
300
300
300
300
300
300
300
300
300
300
R F S B S T S 000
N A RYE E E E E E
D seT l SLY S C
TeE STY
P T
o
o
o
o
o
o
o
o
o
o
0
0
0
0
0
0
0
0
0
0
003
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
3
000
3
000
3
000
3
000
3 1 000
3
000
3
000
3
000
3 1 000
3 0 0 0 0
ArcSine & ArcCosine Routine Using Chebyshev's Method
All floating point inputs and outputs are double precision. The output is in radians.
Steps Required to Perform the Calculation
STEP 1 - Preprocessing; range reduction is not needed, because an input, X,
outside the range of [-1,1] indicates an error. This routine requires
that X squared (X*X) be less than or equal to 1/2. The first operation to be
performed is to square X, then multiply it by 4.0, and finally subtract
1.0.
X1 ← 4.0*(X*X) - 1.0
STEP 2 - Core Calculation; X1 in Step 1 will be referred to as 'x' in the core
calculation.
X2 ← Cseries_asin&acos ← (((((((((((((((((c18*x + c17)*x + c16)*x
     + c15)*x + c14)*x + c13)*x + c12)*x + c11)*x + c10)*x
     + c9)*x + c8)*x + c7)*x + c6)*x + c5)*x + c4)*x + c3)*x
     + c2)*x + c1)*x + c0
STEP 3 - Postprocessing; multiply the output of the core calculation by
SQRT(2.0), then multiply this product by X, the original input. This
yields ArcSine(X). To calculate ArcCosine(X), the following identity
is used:
ArcCosine(X) = pi/2 - ArcSine(X)
X3 ← X2*SQRT(2.0)
ArcSine(X) ← X3*X
ArcCosine(X) ← pi/2 - ArcSine(X)
Algorithms for the Three Steps
Step 1 perform the preprocessing:
T1 ← X*X
T2 ← 4.0*T1
T3 ← T2 - 1.0              T3 is X1 in Step 1, the input to the core
                           routine
Step 2 perform the core calculation:
T4  ← c18*CREG             CREG ← T3
T5  ← T4 + c17
T6  ← T5*CREG
T7  ← T6 + c16
T8  ← T7*CREG
T9  ← T8 + c15
T10 ← T9*CREG
T11 ← T10 + c14
T12 ← T11*CREG
T13 ← T12 + c13
T14 ← T13*CREG
T15 ← T14 + c12
T16 ← T15*CREG
T17 ← T16 + c11
T18 ← T17*CREG
T19 ← T18 + c10
T20 ← T19*CREG
T21 ← T20 + c9
T22 ← T21*CREG
T23 ← T22 + c8
T24 ← T23*CREG
T25 ← T24 + c7
T26 ← T25*CREG
T27 ← T26 + c6
T28 ← T27*CREG
T29 ← T28 + c5
T30 ← T29*CREG
T31 ← T30 + c4
T32 ← T31*CREG
T33 ← T32 + c3
T34 ← T33*CREG
T35 ← T34 + c2
T36 ← T35*CREG
T37 ← T36 + c1
T38 ← T37*CREG
T39 ← T38 + c0
Step 3 perform the postprocessing:
T40 ← X*T39
ArcSine(X) ← T40*SQRT(2.0)          SQRT(2.0) entered as a constant
ArcCosine(X) ← pi/2 - ArcSine(X)
Required System Intervention
There is no system intervention required to calculate ArcSine(X) and ArcCosine(X).
Number of 'ACT8847 Cycles Required to Calculate ArcSine(x) and ArcCosine(x)
The total number of cycles required to perform the ArcSine(x) and ArcCosine(x)
calculation is 68.
Listing of the Chebyshev Constants (c's)
The constants are represented in IEEE double-precision floating point format.
c18 = 3DA4A49F8CCD9E73
c17 = 3DC05DFE52AAD200
c16 = 3DCCF31E26F94C8D
c15 = 3DE86CDA3C8CAEB0
c14 = 3E0768D9F4E950EA
c13 = 3E2383A37598FC80
c12 = 3E403E4B2F65F0DE
c11 = 3E5BAFC8245ABDF8
c10 = 3E77E3333AFF1AB4
c9  = 3E94E3A4D4220C9C
c8  = 3EB296DD4C084ACB
c7  = 3ED0E913F5F9D496
c6  = 3EEFA74E896F8FA8
c5  = 3F0EC76B7832DBB6
c4  = 3F2F978698C8B2E4
c3  = 3F519B1087542073
c2  = 3F7696895FFC05A0
c1  = 3FA375CA61D2988C
c0  = 3FE7B20423D1D930
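The ArcSine and ArcCosine steps can likewise be sketched on a host machine. This is an illustration assuming ordinary double-precision arithmetic, not a model of the 'ACT8847 pipeline; `dp`, `ASIN_C`, `chebyshev_arcsine`, and `chebyshev_arccosine` are hypothetical names. The sketch does not enforce the requirement that X squared be at most 1/2; that check is left to the caller.

```python
import math
import struct


def dp(hex_word):
    """Decode 16 hex digits as a big-endian IEEE double (hypothetical helper)."""
    return struct.unpack(">d", bytes.fromhex(hex_word))[0]


# Constants c18 .. c0 from the listing, highest order first.
ASIN_C = [dp(h) for h in (
    "3DA4A49F8CCD9E73", "3DC05DFE52AAD200", "3DCCF31E26F94C8D",
    "3DE86CDA3C8CAEB0", "3E0768D9F4E950EA", "3E2383A37598FC80",
    "3E403E4B2F65F0DE", "3E5BAFC8245ABDF8", "3E77E3333AFF1AB4",
    "3E94E3A4D4220C9C", "3EB296DD4C084ACB", "3ED0E913F5F9D496",
    "3EEFA74E896F8FA8", "3F0EC76B7832DBB6", "3F2F978698C8B2E4",
    "3F519B1087542073", "3F7696895FFC05A0", "3FA375CA61D2988C",
    "3FE7B20423D1D930",
)]


def chebyshev_arcsine(x_in):
    # Step 1: preprocessing, x = 4*X*X - 1 (valid for X*X <= 1/2).
    x1 = 4.0 * (x_in * x_in) - 1.0
    # Step 2: core calculation, nested multiply-add in x = X1.
    x2 = 0.0
    for c in ASIN_C:
        x2 = x2 * x1 + c
    # Step 3: postprocessing, multiply by X and by SQRT(2.0).
    return (x_in * x2) * math.sqrt(2.0)


def chebyshev_arccosine(x_in):
    # ArcCosine(X) = pi/2 - ArcSine(X)
    return math.pi / 2 - chebyshev_arcsine(x_in)
```

`chebyshev_arcsine(1 / math.sqrt(2.0))` corresponds to the input used in the microcode table.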
Pseudocode Table for the ArcSine(x) and ArcCosine(x) Calculation
Table 59. Pseudocode for Chebyshev ArcSine and ArcCosine Routine (PIPES2-0 '" 010, RND1-0
ClK
DA
BUS
DB
BUS
1
X MSH
X LSH
2
X MSH
X LSH
3
4.0 MSH
4.0 LSH
X
X
ClK
MODE
INSTR
0
RA2.RB2
0
RA2.RB2
X
X
0
RA4.PR4
4.0
X
0
RA4.PR4
5
4.0
X
0
PR6+RB6
6
-1.0 MSH
-1.0 LSH
4.0
-1.0
0
PR6+RB6
7
c18 MSH
c18 LSH
4.0
c18
1
SR7.RB7
4.0
c18
0
PR9+RB9
9
4.0
c17
0
PR9+RB9
10
4.0
c17
1
SR10·CR10
11
4.0
c17
0
PR12+RB12
4.0
c16
0
PR12+RB12
13
4.0
c16
1
SR13·CR13
14
4.0
c16
0
PR15+RB15
4.0
c15
0
PR15+RB15
16
4.0
c15
1
SR16.CR16
17
4.0
c15
0
PR18+RB18
4.0
c14
0
PR18+RB18
c14
1
SR19.CR19
12
15
18
c17 MSH
c16 MSH
c15 MSH
c14 MSH
c17 LSH
c16 LSH
c15 LSH
c14 LSH
MUl
PIPE
y
COMMENT
P2
Sl
SR7.RB7
S2
SR10.CR10
P4
S3
SR13.CR13
P5
S4
SR16·CR16
P6
c14
0
4.0
c13
0
PR21 +RB21
22
4.0
c13
1
SR22.CR22
23
4.0
c13
0
PR24+RB24 SR22.CR22
Start core calculation
S 1 is input to core calc.
Sl
P3
20
SN74ACT8847
S
P1
PR21 +RB21
(11
C
RA4.PR4
4.0
c13 LSH
P
REG REG REG BUS
RA2.RB2
4.0
c13 MSH
ALU
PIPE
X is the input
19
21
co
RB
REG
4
8
i"
RA
REG
00)
S5
SR19.CR19
P7
S6
Lv88.L~nfvLNS
....~
co
0)
Table 59. Pseudocode for Chebyshev ArcSine and ArcCosine Routine (PIPES2-0 = 010, RND1-0 = 00) (Continued)
ClK
24
DA
BUS
c12 MSH
DB
BUS
RA
REG
c12 lSH
4.0
RB
REG
ClK
MODE
INSTR
c12
0
PR24+RB24
25
4.0
c12
1
SR25.CR25
26
4.0
cl2
0
PR27+RB27
4.0
27
c11 MSH
cll LSH
2S
29
30
clO MSH
clO LSH
c11
0
PR27+RB27
4.0
cll
1
SR2S.CR2S
4.0
cll
0
PR30+RS30
4.0
ClO
0
PR30+RS30
SR31·CR3'1
31
4.0
ClO
1
32
4.0
ClO
0
PR33+RB33
4.0
C9
0
PR33+RB33
34
4.0
c9
1
SR34.CR34
35
4.0
c9
0
PR36+RB36
4.0
cs
0
PR36+RB36
37
4.0
cs
1
SR37.CR37
3S
4.0
cs
0
PR39+RS39
4.0
33
36
c9 MSH
Cs MSH
c9 LSH
Cs LSH
c7
0
PR39+RB39
4.0
c7
1
SR40·CR40
4.0
c7
0
PR42+RS42
4.0
c6
0
PR42+RB42
43
4.0
c6
1
SR43.CR43
44
4.0
c6
0
PR45+RB45
4.0
c5
0
PR45+RB45
46
4.0
c5
1
SR46.CR46
47
4.0
c5
0
PR4S+RB4S
4.0
c4
0
PR4S+RB4S
39
c7 MSH
c7 LSH
40
41
42
45
4S
c6 MSH
c5 MSH
c4 MSH
c6 LSH
c5 LSH
c4 LSH
MUl
PIPE
ALU
PIPE
P
C
S
y
REG REG REG BUS
PS
S7
SR25.CR25
P9
SS
SR2S.CR2S
P1Q
S9
SR31.CR31
Pll
SIO
SR34.CR34
P12
S11
SR37.CR37
P13
S12
SR40.CR40
P14
S13
SR43.CR43
P15
S14
SR46.CR46
P16
COMMENT
Table 59. Pseudocode for Chebyshev ArcSine and ArcCosine Routine (PIPES2-0 = 010, RND1-0 = 00) (Concluded)
ClK
DA
BUS
DB
BUS
RB
ClK
REG . MODE
. INSTR
49
4.0
c4
1
SR49*CR49
50
4.0
c4
0
PR51 +RB51
4.0
c3
0
PR51 +RB51
52
4.0
c3
1
SR52.CR52
53
4.0
c3
0
PR54+RB54
4.0
51
54
c3 MSH
c3 lSH
c2
0
PR54+RB54
55
4.0
c2
1
SR55.CR55
56
4.0
c2
0
PR57+RB57
4.0
c1
0
PR57+RB57
4.0
c1
1
SR58.CR58
57
c2 MSH
c1 MSH
c2 LSH
c1 LSH
58
59
4.0
c1
0
PR60+RB60.
LSH
4.0
cO
0
PR60+RB60
X MSH
X LSH
4.0
X
1
SR61.RB61
SORT(2)
MSH
SORT(2)
LSH
4.0
X
0
RA63.PR63
63
SORT
2
X
0
RA63.PR63
64
SORT
2
X
0
DUMMY
SORT
2
pi/2
1
RB66-PR66
67
SORT
2
pi/2
0
NOP
68
SORT
2
pi/2
0
NOP
60
Co
61
62
66
':"
CD
-.J
RA
REG
MSH
pi/2 MSH
Co
pi/2 LSH
MUl
PIPE
AlU
PIPE
P
REG
C
REG
S
REG
Y
BUS
COMMENT
S15
SR49.CR49
P17
S16
SR52.CR52
P18
S17
SR55.CR55
P19
S18
SR58"CR58
P20
Begin postprocessing
S19
SORT(2) is the real value
of square root of 2.0
SR61.RB61
P21
Instruction is doubleprecision RA + RB, prevents
ArcCosine from overwriting ArcSine result
RA63.PR63
P22
P22
Output LSH of ArcSine
S20
S20
Output MSH of ArcCosine
S20
S20
Output LSH of ArcCosine
Microcode Table for the ArcSine(x) and ArcCosine(x) Calculation
All numbers are in hex. Any field with a length that is not a multiple of 4 is right justified and zero filled. For the microcode
table, the value of X has been chosen to be 1/(SQRT(2.0)).
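The DA and DB columns of the microcode table carry the most significant half (MSH) and least significant half (LSH) of each double-precision operand. As a rough illustration, the following Python sketch splits an IEEE double into those two 32-bit hex words. The helper name split_double is hypothetical and is not part of the manual.

```python
import math
import struct


def split_double(value):
    """Return (MSH, LSH) of an IEEE double as two 8-digit hex strings.

    Hypothetical helper for illustration only: MSH is the upper 32 bits
    (sign, exponent, top of the fraction), LSH is the lower 32 bits.
    """
    msh, lsh = struct.unpack(">II", struct.pack(">d", value))
    return f"{msh:08X}", f"{lsh:08X}"


if __name__ == "__main__":
    # X = 1/SQRT(2.0), the operand chosen for the microcode table
    print(split_double(math.sqrt(0.5)))
    # Constants that also appear in the table: 4.0, -1.0 and pi/2
    for constant in (4.0, -1.0, math.pi / 2):
        print(split_double(constant))
```

The words produced for 1/SQRT(2.0), 4.0, -1.0 and pi/2 can be compared against the DA and DB entries of the table when checking a transcription.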
(I)
p
A
D
A
D
B
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
3FE6A09E
3FE6A09E
40100000
00000000
00000000
BFFOOOoo
3DA4A49F
00000000
3DC05DFE
00000000
00000000
3DCCF31 E
00000000
00000000
3DE86CDA
00000000
00000000
3E0768D9
00000000
00000000
667F3BCD
667F3BCD
00000000
00000000
00000000
00000000
8CCD9E73
00000000
52AAD200
00000000
00000000
26F94C8D
00000000
00000000
3C8CAEBO
00000000
00000000
F4E950EA
00000000
00000000
PEE C
B N N L
A B K
C
P C C S
I L 0 E
P K N L
EMF 0
SOl P
D G
F 0 0 _ 2
F 1 1 _ 2
F 00_ 2
F 1 0 _ 2
F 00_ 2
F 0 1 _ 2
F 0 1 _ 2
FOOI2
F 0 1 _ 2
F 00_ 2
F 00_ 2
F 0 1 _ 2
F 00_ 2
F 00_ 2
F 0 1 _ 2
F 00_ 2
F 00_ 2
F 0 1 _ 2
F 00_ 2
F 00_ 2
o
o
o
o
o
o
1
o
o
1
o
o
1
o
o
1
o
o
1
o
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
FF
FF
EF
EF
FB
FB
BF
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
R H
E F
E A N L
S L C 0
E T
W
T
C
o
o
o
o
o
o
o
I
N
S
T
R
1CO
1CO
1CO
1CO
180
1
180
1
1CO
0 180
180
1
1
1CO
1
180
180
1
1CO
1
1
180
1
180
1
1eO
180
1
180
1CO
180
o
o
o
o
o
o
o
o
o
o
o
o
o
R F S B S T S 555
N A RYE E E E E E
D S C T L SLY S.C
T C EST Y
P T
000 0
0 0 0
000
0 0 0
000 0
0 0 0
0 0 0
000 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
000 0
0 0 0
000 0
0 0 0
0 0 0
0 0 0
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
000
000
000
000
000
000
000
000
000
000
0' 0 0
0 0 .
000
000
000
000
000
000
000
000
o
Microcode Table for the ArcSine(x) and ArcCosine(x) Calculation (Continued)
p
A
...';'I
(0
(0
D
A
F 3E2383A3
F 00000000
F 00000000
F 3E403E4B
F 00000000
F 00000000
F 3E5BAFC8
F 00000000
F 00000000
F 3E77E333
F 00000000
F 00000000
F 3E94E3A4
F.OOOOOOOO
F 00000000
F 3EB296DD
F 00000000
F 00000000
F 3EDOE913
F 00000000
F 00000000
F 3EEFA74E
F 00000000
F 00000000
D
B
7598FC80
00000000
00000000
2F65FODE
00000000
00000000
245ABDF8
00000000
00000000
3AFF1AB4
00000000
00000000
D4220C9C
00000000
00000000
4C084ACB
00000000
00000000
F5F9D496
00000000
00000000
896F8FA8
00000000·
00000000
PEE C
B N N L
A B K
C
P C C s
I L 0 E
P K N L
EMF 0
SOl P
D G
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
SN74ACT8847
0 1
00_
0 0 _
0 1 _
00_
00_
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
0 0 _
0 0 ~
o
1
0
0
1
o
o
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
aa
R H
E F
E A N L
S L C 0
E T
W
T
C
1 1
1
1
1
1
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
N
S
T
R
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
R F S B S T SO
N A RYE E E E E E
D S C T L SLY S C
T C EST Y
P T
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
0 0 3
003
0 0 3
0 0 3
0 0 3
0 0 3
3
000
3
000
3
000
3
000
3
000
3 1 000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
0
0
Microcode Table for the ArcSine(x) and ArcCosine(x) Calculation (Concluded)
p
A
0
A
0
B
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
3FOEC76B
00000000
00000000
3F2F9786
00000000
00000000
3F519B10
00000000
00000000
3F769689
00000000
00000000
3FA375CA
00000000
00000000
3FE7B204
3FE6A09E
3FF6A09E
00000000
00000000
00000000
3FF921FB
00000000
00000000
78320BB6
00000000
00000000
98C8B2E4·
00000000
00000000
87542073
00000000
00000000
5FFC05AO
00000000
00000000
6102988C
00000000
00000000
2301 0930
667F3BCO
667F3BCO
00000000
00000000
00000000
54442018
00000000·
00000000
PEE C
B N N L
A B K
C
P C C s
I L 0 E
P K N L
EMF 0
SOl P
o G
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
0 1 _
0 0 _
1 0 _
0 0 _
0 0 --.:.
0 1 ~
0 0 _
0 0 _
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
0
0
1
0
0
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
R H
E F
E A N L
S L C 0
E T
W
T
C
FB
9F
FB
FB
9F
FB
FB 1
9F 1
FB 1
FB 1
9F 1
FB 1
FB
9F
FB
FB
BF
EF
EF
FF
FF1
FB 1
FF 1
FF 1
1
1
1
1
1
1
1
1
1
1
1
1
1
,
1
1
1
1
1
1
1
1
1
1
1
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
I
N
S
T
R
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
1CO
lCO
180
300
183
300
300
R F S B S T S 000
N A RYE E E E E E
D S C T L SLY S C
T C EST Y
P T
o
o
0 0 0 3
0 0 0 3
000 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 003
0 0 0 3
0 003
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3 1 000
3 1 000
3 1 000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3 1 000
3 000 0
3 1 000
30 0 0 0
ArcTangent Routine Using Chebyshev's Method
All floating point inputs and outputs are double precision. The output is in radians.
Steps Required to Perform the Calculation
STEP 1 - Preprocessing; If the magnitude of the input, X, is greater than 1.0,
then the reciprocal must be taken. If the magnitude of X is not greater
than 1.0, then pass X. Let this number (either X or 1.0/X) be referred
to as X1. Next multiply X1 times 2.0, then multiply this resulting
number by X1. Finally, subtract 1.0 from this last product.
If |X| > 1.0
Then X1 ← 1.0/X
Else X1 ← X
X2 ← X1*2.0*X1 - 1.0
STEP 2 - Core Calculation; X2 in Step 1 will be referred to as 'x' in the core
calculation.
X3 ← Cseries_atan(x)
   ← ((((((((((((((((((c19*x + c18)*x + c17)*x + c16)*x + c15)*x +
     c14)*x + c13)*x + c12)*x + c11)*x + c10)*x + c9)*x
     + c8)*x + c7)*x + c6)*x + c5)*x + c4)*x + c3)*x + c2)*x
     + c1)*x + c0
STEP 3 - Postprocessing; multiply the output of the core calculation times X1.
Let this number be referred to as X4. The next computation will yield
the answer. If X was greater than 1.0, then subtract X4 from pi/2.
If X was less than -1.0, then subtract X4 from -pi/2. If neither of
the two conditions above is true, then X4 is the answer.
X4 ← X3*X1
If X > 1.0
Then ArcTangent(X) ← pi/2 - X4
Else If X < -1.0
Then ArcTangent(X) ← -pi/2 - X4
Else ArcTangent(X) ← X4
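The three steps above can be sketched in Python as follows. This is only an illustration of the range reduction and postprocessing: the coefficients c19 through c0 are not reproduced here, so the core series is replaced by a stand-in, core_stand_in, which returns atan(X1)/X1 (the quantity the series would have to approximate for Step 3 to work). Both function names are hypothetical.

```python
import math


def core_stand_in(x2):
    """Stand-in for Cseries_atan (assumption, not the manual's series).

    Recovers |X1| from x2 = 2*X1*X1 - 1 and returns atan(X1)/X1, the value
    the Chebyshev series would need to approximate for Step 3 to hold.
    """
    x1 = math.sqrt(max(0.0, (x2 + 1.0) / 2.0))
    return 1.0 if x1 == 0.0 else math.atan(x1) / x1


def arctangent(x, core=core_stand_in):
    # Step 1: preprocessing (reciprocal when |X| > 1.0)
    x1 = 1.0 / x if abs(x) > 1.0 else x
    x2 = x1 * 2.0 * x1 - 1.0
    # Step 2: core calculation on x2, which lies in [-1, 1]
    x3 = core(x2)
    # Step 3: postprocessing
    x4 = x3 * x1
    if x > 1.0:
        return math.pi / 2 - x4
    if x < -1.0:
        return -math.pi / 2 - x4
    return x4


if __name__ == "__main__":
    # X = SQRT(3.0) is the operand used in the manual's microcode table
    print(arctangent(math.sqrt(3.0)), math.pi / 3)
```

Passing a real polynomial evaluator as the core argument would turn the sketch into a software model of the routine. The stand-in only shows that the reduction to [-1, 1] and the pi/2 corrections are consistent.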
Algorithms for the Three Steps
Step 1 perform the preprocessing:
If |X| > 1.0
Then T1 ← 1.0/X          T1 is X1 in Step 1, must be stored externally
     T2 ← T1*2.0         CREG ← T1
     T3 ← T2*CREG
     T4 ← T3 - 1.0
Else T1 ← X
     T2 ← T1*2.0
     T3 ← T2*T1
     T4 ← T3 - 1.0
Step 2 perform the core calculation:
T5  ← c19*CREG           CREG ← T4
T6  ← T5 + c18
T7  ← T6*CREG
T8  ← T7 + c17
T9  ← T8*CREG
T10 ← T9 + c16
T11 ← T10*CREG
T12 ← T11 + c15
T13 ← T12*CREG
T14 ← T13 + c14
T15 ← T14*CREG
T16 ← T15 + c13
T17 ← T16*CREG
T18 ← T17 + c12
T19 ← T18*CREG
T20 ← T19 + c11
T21 ← T20*CREG
T22 ← T21 + c10
T23 ← T22*CREG
T24 ← T23 + c9
T25 ← T24*CREG
T26 ← T25 + c8
T27 ← T26*CREG
T28 ← T27 + c7
T29 ← T28*CREG
T30 ← T29 + c6
T31 ← T30*CREG
T32 ← T31 + c5
T33 ← T32*CREG
T34 ← T33 + c4
T35 ← T34*CREG
T36 ← T35 + c3
T37 ← T36*CREG
T38 ← T37 + c2
T39 ← T38*CREG
T40 ← T39 + c1
T41 ← T40*CREG
T42 ← T41 + c0
Step 3 perform the postprocessing:
T43 ← T42*T1             CREG ← T43
If X > 1.0
Then ArcTangent(X) ← pi/2 - CREG
Return
If X < -1.0
Then ArcTangent(X) ← -pi/2 - CREG
Return
ArcTangent(X) ← CREG
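The core calculation in Step 2 is a Horner evaluation: one multiply by CREG followed by one add of the next constant, repeated until c0 has been added. A minimal Python sketch of that chain, which records each intermediate under the same T labels, is given below. The function horner_trace and its first_t parameter are hypothetical names used only for illustration.

```python
def horner_trace(coeffs_high_to_low, creg, first_t=5):
    """Evaluate a polynomial as an alternating multiply/add chain.

    coeffs_high_to_low lists the constants from the highest order down to c0.
    creg is the value held in the constant register (the series input 'x').
    Returns the final value and a list of (label, operation, value) tuples.
    """
    ops = []
    t = first_t
    acc = coeffs_high_to_low[0]
    for c in coeffs_high_to_low[1:]:
        acc = acc * creg          # multiplier step, e.g. T5 = c19*CREG
        ops.append((f"T{t}", "mul", acc))
        t += 1
        acc = acc + c             # ALU step, e.g. T6 = T5 + c18
        ops.append((f"T{t}", "add", acc))
        t += 1
    return acc, ops


if __name__ == "__main__":
    # Small example: 2*x*x - 3*x + 5 at x = 4
    value, ops = horner_trace([2.0, -3.0, 5.0], 4.0)
    for label, kind, result in ops:
        print(label, kind, result)
    print("result:", value)
```

With 20 constants (c19 through c0) the chain consists of 19 multiplies and 19 adds, which corresponds to the labels T5 through T42 in the listing above.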
Required System Intervention
As seen in the algorithm for Step 1, the 'ACT8847 performs a compare. The results
of this compare determine what kind of preprocessing is to be performed. In Step 3,
there are two more compare operations. The system must therefore perform additional
decision making. In addition, the system must store T1, and later (in the postprocessing)
provide this value to the 'ACT8847.
Number of 'ACT8847 Cycles Required to Calculate ArcTangent(x)
Calculation of ArcTangent(x) requires at most 89 cycles (including the divide
instruction). In addition, it is assumed that 15 additional cycles are required due to
the compare instructions and the resulting system intervention. Therefore, the total number
of cycles to perform the ArcTangent(x) calculation is 104.
1.0 then execute
80
X MSH
X LSH
20rX
Tl
0
83 through 86, otherwise
skip to 83b. In either case
COMPARE
X,1.0
--
--
-----
--'--
,_execute 80_through 82
Table 60. Pseudocode for Chebyshev ArcTangent Routine (PIPES2-0 = 010, RND1-0 = 00) (Concluded)
ClK
C
S
y
REG
REG
BUS
REG
RB
REG
ClK
MODE
1.0 MSH
1.0 LSH
X
1.0
0
82
X
1.0
0
NOP
83
X
1.0
0
RB84-CR84
X
pi/2
0
RB84-CR84
85
X
pi/2
0
NOP
S21a
S21a
86
X
pi/2
0
NOP
S21a
S21a
0
COMPARE
-1.0,X
81
84
pi/2 MSH
pi/2 LSH
INSTR
COMPARE
AlU
PIPE
P
DB
BUS
RA
MUl
PIPE
REG
DA
BUS
COMMENT
P23
X,l.0
Wait for system response
P23
Execute if X
Output MSH of answer
Output LSH of answer
The calculation is done
Execute if X
83b -1.0 MSH -1.0 lSH
X
1.0
0
NOP
P23
0
RB87-CR87
-pi/2
0
RB87-CR87
-1.0
-pi/2
0
NOP
S21b
89b
-1.0
pi/2
0
NOP
S21b
86c
-1.0
X
1
PASS(CR86)
87c
-1.0
X
0
NOP
88c
-1.0
X
0
NOP
0
85b
-1.0
X
86b
-1.0
X
-1.0
88b
87b
X LSH
-pi/2
-pi/2
MSH
LSH
1.0.
skip to 86c. In either case
execute 83b thru 85b
P23
X
X MSH
s
If - 1.0 > X then execute
86b through 89b, otherwise
COMPARE
-1.0,X
-1.0
84b
> 1.0
Wait for system response
Execute if - 1.0
> X
S21b Output MSH of answer
S21b
Output LSIo-l of answer.
The calculation is done.
Execute if X is within the
range [- 1 ,11, Pass CREG
"r:..,
oCO
SN74ACT8847
_._-
S21c
S21c
Output MSH of answer
S21c
S21c
Output LSH of answer
Microcode Table for the ArcTangent(x) Calculation
All numbers are in hex. Any field with a length that is not a multiple of 4 is right justified and zero filled. For the microcode
table, the value of X has been chosen to be SQRT(3.0).
p
A
D
A
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
3FFOOOOO
3FFBB67 A
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
40000000
00000000
00000000
00000000
00000000
BFFOOOOO
BDC4D6CC
D
B
00000000
E8584CAB
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
6308553F
PEE C
B N N L
A B K
C
P C C S
I L 0 E
P K N L
EMF 0
SOl P
D G
F 0 0 _
F 1 1 _
F 0 0 _
F 00_
F 0 0 _
F 0 0 _
F 0 0 _
F 0 0 _
F 0 0 _
F 0 0 _
F 0 0 _
F 0 0 _
F 0 0 _
F 0 0 _
F 0 0 _
F 0 0 _
FlO _
F 0 O..r
F 0 0 _
F 0 0 _
F 0 1
F 0 1 _
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
FF
FF
FF
FF
FF
FF
FF
FF
FF
FF
FF
FF
FF
FF
FF
EF
EF
6F
6F
FB
FB
BF
R H
E F
E A N L
S L C 0
E T
W
T
C
o
o
o
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
1
1
1
1
N
S
T
R
18A
18A
300
OlEO
300
300
300
300
300
300
300
300
300
300
300
lCO
lCO
0 lCO
lCO
180
180
lCO
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
R F S B S T S 555
N A RYE E E E E E
D S C T L SLY S C
T C EST Y
P T
o
o
o
o
o
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
00003
000 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 003
0 0 0 3
00003
0 0 0 3
00 0 3
0 0 0 3
00003
00103
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
o
o
o
o
o
o
o
o
o
o
o
o
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
000
000
000
000
000
000
000
000
000
000
000
000
000
1 000
1'0 0 0
1 000
0 000
000
000
000
000
000
Microcode Table for the ArcTangent(x) Calculation (Continued)
p
A
D
A
D
B
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
00000000
3DDFFD56
00000000
00000000
BDE88078
00000000
00000000
3E040967
00000000
00000000
BE237C82
00000000
00000000
3E3F1358
00000000
00000000
BE587CD2
00000000
00000000
3E73D238
00000000
00000000
BE9028E9
00000000
FCFD2315
00000000
00000000
2D99D071
00000000
00000000
OCB71218
00000000
00000000
39249B77
00000000
00000000
EC1D6ACO
00000000
00000000
5F4AFBED
00000000
00000000
8BOB8A86
00000000
00000000
21 CA6A94
PEE C
B N N l
A B K
C
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
SN74ACT8847
0 O.r
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
P C C s
I l 0 E
P K N l
EMF 0
SOl P
D G
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
0
0
1
0
0
1
0
o
1
0
o
1
0
0
1
0
0
1
0
0
1
0
0
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
R H
E F
E A N l
S l C 0
E T
W
C
T
FB
FB
9F
FB 1
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB 1
o
1
I
N
S
T
R
0 180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
R F S B S T S 555
N A RYE E E E E E
D S C T l SLY S C
T C EST Y
P T
o
o
o
o
o
o
o
o
o
o
o
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 003
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
000 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
0 003
0 003
0 0 0 3
0 0 0 3
0 0 0 3
0 0 0 3
o
o
o
o
o
o
o
o
o
o
o
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
Microcode Table for the ArcTangent(x) Calculation (Continued)
N
P
A
D
A
F 00000000
F 00000000
F 3EAA8149
F 00000000
F 00000000
F BEC5EDAD
F 00000000
F 00000000
F 3EE256E5
F . 00000000
F 00000000
F BEFF171F
F 00000000
F 00000000
F 3F1 ACFA9
F 00000000
F 00000000
F BF37A846
F 00000000
F 00000000
F 3F558DF7
F 00000000
F 00000000
F BF749B3E
D
B
00000000
00000000
97A38D4E
00000000
00000000
9A21FE5F
00000000
00000000
7BA07FAE
00000000
00000000
48FDF707
00000000
00000000
F95CAODF
00000000
00000000
4221 D994
00000000
00000000
. A83283C9
00000000
00000000
2E433683
PEE C
B N N L
A B K
C
P C C s
I L 0 E
P K N L
EMF 0
SOl P
D G
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
00_
00_
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
00_
0 0 _
0 1 _
1 3
3
0 3
1 3
0 3
0 3
1 3
0 3
0 3
1 3
0 3
0 3
1 3
0 3
0 3
1 3
0 3
0 3
1 3
0 3
0 3
1 3
0 3
o3
o
R H
E F
E A N L
S L C 0
E T
W
T
C
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB 1
o
o
o
o
o
o
1
1
1
1
1 1
1 1
1
1
1
1
1 1
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
I
N
S
T
R
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
1CO
180
180
aaa
R F S B S T S
N A RYE E E E E E
D S C T L SLY S C
T C EST Y
P T
o
o
o
o
o
o
o
0 0 0 3
0 0 0 3
0 003
0 0 0 3
0 0 0 3
0 003
0 0 0 3
000 0 3
o0 0 0 3
o0 0 0 3
o 0 003
00003
o 0 003
o0 0 0 3
o 0 003
0 0 0 3
0 0 0 3
000 3
o 0 003
o0 0 0 3
0 0 0 3
o0 0 0 3
0 0 0 3
o0 0 0 3
o
o
o
o
o
3
000
3 1 000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3
000
3 1 000
3 1 000
3
000
3
000
3
000
3
000
3
000
Microcode Table for the ArcTangent(x) Calculation (Concluded)
p
A
D
A
D
B
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
00000000
00000000
3F955A30
00000000
00000000
BFBA 1494
00000000
00000000
3FEBDA 7A
3FE279A7
3FFBB67A
3FFOOOOO
00000000
00000000
3FF921FB
00000000
00000000
00000000
00000000
OBFB8078
00000000
00000000
C19FADD4
00000000
00000000
85BD40CB
4590331C
E8584CAB
00000000
00000000
00000000
54442D18
00000000
00000000
PEE C P C C
B N N L
L 0
A B K P K N
C EMF
SOl
D G
F 0 0 _ 2 1 3
F 0 0 _ 2 o 3
F 0 1
2 o 3
F 0 0 _ 2 1 3
F 0 0
2 o 3
F 0 1
2 o 3
F 0 0 _ 2 1 3
F 0 0 _ 2 o 3
2 o 3
F 0
F 0 1 _ 2 1 3
F 0 0 _ 2 o 3
F 1 1 _ 2 o 3
FOO.I2 o 3
F 0 0
2 o 3
F 0 1 _ 2 o 3
F 0 0 _ 2 o 3
F 0 0 _ 2 o 3
-..j
N
w
SN74ACT8847
s RH
E F
E E A N L
L S L C 0
0 E T
W
P T
C
9F
FB
FB
9F
FB
FB
9F
FB
FB
BF
FF
FF
FF
F7
F7
FF
FF
o
o
o
o
o
o
o
o
o
o
o
o
I
N
S
T
R
R F S B S T S 000
N A RYE E E E E E
D S C T L SLY S C
T C EST Y
P T
1CO o 0 003
180 o 0 0 0 3
180 o 0 0 0 3
1CO o 0 0 0 3
180 o 0 0 0 3
180 o 0 0 0 3
1CO o 0 0 0 3
180 o 0 0 0 3
180 o 0 0 0 3
1CO o 0 0 0 3
182 o 0 0 0 3
1
182 o 0 0 0 3
o 0 300 0 0 1 0 3
o 183 o 0 003
o 183 o 0 0 0 3
o 300 o 0 0 0 3
o 300 o 0 0 0 3
3
000
3
000
3
000
3
000
3
000
3
000
000
3
3
000
3
000
3
000
000
3
3
000
3
000
3
000
3 1 000
3 1 000
300 0 0
Exponential Routine Using Chebyshev's Method
All floating point inputs and outputs are double precision.
Steps Required to Perform the Calculation
STEP 1 - Preprocessing; first multiply the input, X, by log2e (yielding X1). Next,
convert this product to an integer, using truncate mode (yielding X2).
Form the variable EX by adding 1024 to X2. EX is used in the
postprocessing part of the routine. Subtract 1023 from EX to find
the variable N (N is actually X2 incremented by 1). Convert N to a
floating point number (yielding X3). Subtract X1 from X3, multiply
this difference by 2.0, and then finally subtract 1.0. This last
computation is the input to the core routine.
X1 ← X*log2e
X2 ← TRUNC(X1)
EX ← 1024 + X2
N  ← EX - 1023
X3 ← DOUBLE(N)
X4 ← 2.0*(X3 - X1) - 1.0
STEP 2 - Core Calculation; X4 in Step 1 will be referred to as 'x' in the core
calculation.
X5 ← Cseries_exp(x)
   ← ((((((((((c11*x + c10)*x + c9)*x + c8)*x + c7)*x + c6)*x +
     c5)*x + c4)*x + c3)*x + c2)*x + c1)*x + c0
STEP 3 - Postprocessing; multiply the output of the core calculation times 2^N.
To generate 2^N, perform the following: shift left logical 20 positions
(bits) the variable EX (which was calculated in Step 1). The resulting
bit pattern will be the double-precision floating point representation
of 2^N. However, the 'ACT8847 will not at this point recognize the
bit pattern as a floating point number. So this number must be output
from the Y bus, and then input (declaring the input to be a double-
precision floating point number) on the input bus. Now the 'ACT8847
will process 2^N as a double float, and so the core output, X5, can
be multiplied by 2^N to produce the final result. 'SLL' means to shift
left logical.
X6     ← EX SLL by 20 bits
Y bus  ← X6
DA bus ← Y bus
Exp(X) ← X5 * X6
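The shift in Step 3 works because the biased exponent of an IEEE double occupies bits 30 through 20 of the most significant half. EX equals N + 1023, so placing EX in that field with a zero fraction gives 2^N. The Python sketch below illustrates this. It assumes the least significant half is zero, and the function name is hypothetical.

```python
import struct


def two_to_the_n_from_ex(ex):
    """Build the double 2**N from EX = N + 1023 by shifting EX left 20 bits.

    Assumption for illustration: the shifted word is used as the most
    significant half and the least significant half is all zeros.
    """
    msh = (ex << 20) & 0xFFFFFFFF
    lsh = 0
    (value,) = struct.unpack(">d", struct.pack(">II", msh, lsh))
    return msh, value


if __name__ == "__main__":
    # X = 6.25 gives X1 = 9.01..., X2 = 9 and EX = 1033 (hex 409)
    msh, value = two_to_the_n_from_ex(1033)
    print(f"{msh:08X}", value)
```

For EX = 1033 the shifted word is 40900000 and the reinterpreted value is 1024.0, that is 2^10 with N = EX - 1023 = 10.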
Algorithms for the Three Steps
Step 1 perform the preprocessing:
T1 ← X*log2e             log2e entered as a constant
T2 ← INT(T1)             round controls set to truncate
T3 ← 1024 + T2           T3 is EX in Step 1, must be stored externally, CREG ← T1
T4 ← T3 - 1023
T5 ← 1*T4                makes T4 available to A2 MUX
T6 ← DOUBLE(T5)          convert from integer to double
T7 ← T6 - CREG
T8 ← 2.0*T7
T9 ← T8 - 1.0            T9 is X4 in Step 1, the input to the core routine
Step 2 perform the core calculation:
T10 ← c11*CREG           CREG ← T9
T11 ← T10 + c10
T12 ← T11*CREG
T13 ← T12 + c9
T14 ← T13*CREG
T15 ← T14 + c8
T16 ← T15*CREG
T17 ← T16 + c7
T18 ← T17*CREG
T19 ← T18 + c6
T20 ← T19*CREG
T21 ← T20 + c5
T22 ← T21*CREG
T23 ← T22 + c4
T24 ← T23*CREG
T25 ← T24 + c3
T26 ← T25*CREG
T27 ← T26 + c2
T28 ← T27*CREG
T29 ← T28 + c1
T30 ← T29*CREG
T31 ← T30 + c0
Step 3 perform the postprocessing:
T32 ← T3 SLL by 20 bits  Shift T3 20 bits left
Y bus ← T32              Output and then input T32, CREG ← T31
DA bus ← Y bus (= T32)   Two cycles required to input both halves of T32
Exp(X) ← T32*CREG
Required System Intervention
The system is required to store the variable EX, and then later provide this variable.
In addition, the system is required to route the variable T32 (in Step 3) from the Y
bus to the DA bus.
Number of 'ACT8847 Cycles Required to Calculate Exp(x)
Calculation of Exp(x) requires 52 cycles. Since there are no decisions which the system
is required to perform, the total number of cycles to perform the Exp(X) calculation is 52.
Listing of the Chebyshev Constants (c's)
The constants are represented in IEEE double-precision floating point format.
c11 = BD45A7FC05D3B501
c10 = 3D957BFD2DBF487C
c9  = BDE351B821AC16D5
c8  = 3E2F5B0E17440879
c7  = BE769E51EE631E87
c6  = 3EBC8D7530548DD5
c5  = BEFEE4FD234A4926
c4  = 3F3BDB696E8987AC
c3  = BF741839EB88156E
c2  = 3FA5BE298ADF0369
c1  = BFCF5E46537AB906
c0  = 3FE6A09E667F3BCC
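As a cross-check of the routine, the steps and constants above can be modeled in software. The Python sketch below decodes the twelve hex constants, applies the Step 1 preprocessing, evaluates the series by Horner's rule, and forms 2^N from EX shifted left 20 bits. It is a software illustration only, not the 'ACT8847 microcode. The names are hypothetical, and it has been reasoned through only for non-negative X, where truncation keeps the series input within (-1, 1].

```python
import math
import struct

# c11 down to c0, as listed above (IEEE double-precision, hex)
C_HEX = [
    "BD45A7FC05D3B501", "3D957BFD2DBF487C", "BDE351B821AC16D5",
    "3E2F5B0E17440879", "BE769E51EE631E87", "3EBC8D7530548DD5",
    "BEFEE4FD234A4926", "3F3BDB696E8987AC", "BF741839EB88156E",
    "3FA5BE298ADF0369", "BFCF5E46537AB906", "3FE6A09E667F3BCC",
]


def from_hex(hex_string):
    """Decode a 16-digit hex string into a Python float (IEEE double)."""
    return struct.unpack(">d", bytes.fromhex(hex_string))[0]


COEFFS = [from_hex(h) for h in C_HEX]


def chebyshev_exp(x):
    """Software model of Steps 1 to 3 (assumes x >= 0)."""
    # Step 1: preprocessing
    x1 = x * math.log2(math.e)
    x2 = math.trunc(x1)           # truncate mode
    ex = 1024 + x2
    n = ex - 1023
    x3 = float(n)
    x4 = 2.0 * (x3 - x1) - 1.0
    # Step 2: core calculation, Horner evaluation from c11 down to c0
    x5 = COEFFS[0]
    for c in COEFFS[1:]:
        x5 = x5 * x4 + c
    # Step 3: 2**N from EX shifted left 20 bits (LSH assumed zero)
    x6 = struct.unpack(">d", struct.pack(">II", ex << 20, 0))[0]
    return x5 * x6


if __name__ == "__main__":
    # X = 6.25 is the operand used in the manual's microcode table
    print(chebyshev_exp(6.25), math.exp(6.25))
```

Comparing the result with math.exp for a few inputs is a quick way to confirm that a transcription of the constants is free of digit errors.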
Pseudocode Table for the Exp(x) Calculation
Table 61. Pseudocode for Chebyshev Exponential Routine (PIPES2-0
ClK
DA
BUS
DB
BUS
1
X MSH
X LSH
2
L092e
MSH
Lo92e
LSH
X
ClK
MODE
INSTR
0
RA2.RB2
Lo92e
0
RA2.RB2
RB
REG
3
X
Lo92e
0
DP2I(PR4)
4
X
Lo92e
0
DP2I(PR4)
1024
Lo92e
0
RA5+SR5
-1023 Lo92e
0
RA6+SR6
5
1024
6
-1023
-1023
1
0
SR7.RB7
8
-1023
1
1
12DP(PR8)
9
-1023
1
1
SR9-CR9
-1023
2.0
1
SR10.RB10
-1023
2.0
0
PR12+RB12
0
PR12+RB12
SR13.RB13
7
10
1
2.0 MSH
2.0 LSH
11
12
-1.0 MSH -1.0 LSH -1023 -1.0
13
cll MSH
-1023
cll
1
14
-1023
cll
0
PR15 + RB15
15
-1023
cl0
0
PR15+RB15
16
-1023
cl0
1
SR16.CR16
17
-1023
cl0
0
PR18+RB18
-1023
cg
0
PR18+RB18
18
'-I
RA
REG
cl0 MSH
c9 MSH
cll LSH
cl0 LSH
Cg LSH
r:,
19
-1023
'-I
20
-1023
c9
1
SR19.CR19
c9
0
PR21 +RB21
MUl
PIPE
AlU
PIPE
010. RND1-01
P
C
S
y
REG
REG
REG
BUS
COMMENT
X is the input
RA2.RB2
Double-precision - integer
Pl
Pl
Sl
S2
S2
Store S2. which is the
variable EX. for use in
cycle 46
S3
P2
Integer - double-precision
S4
S5
SR10.RB10
P3
S6
SR13.RB13
S6
P4
S7
SR16.CR16
P5
S8
SR19.CR19
Start core calculation.
S6 is the input to the
core calculation
L1788.l::l"17LNS
-.J
Table 61. Pseudocode for Chebyshev Exponential Routine (PIPES2-0 ... 010. RND1-0) (Continued)
~
(XI
ClK
DA
BUS
DB
BUS
RA
REG
RB
REG
ClK
MODE
INSTR
Cs MSH
cs LSH
-1023
Cs
0
PR21 +RB21
22
-1023
cs
1
SR22.CR22
23
-1023
cs
0
PR24+RB24
-1023
c7
0
PR24+RB24
25
-1023
c7
1
SR25.CR25
26
-1023
c7
0
PR27+RB27
-1023
c6
0
PR27+RB27
2S
-1023
C6
1
SR2S.CR2S
29
-1023
c6
0
PR30+RB30
-1023
21
24
27
c7 MSH
c6 MSH
c7 LSH
c6 LSH
c5
0
PR30+RB30
-1023
c5
1
SR31.CR31
-1023
c5
0
PR33+RB33
-1023
c4
0
PR33+RB33
34
-1023
c4
1
SR34.CR34
35
-1023
c4
0
PR36+RB36
-1023
c3
0
PR36+RB36
37
-1023
c3
1
SR37.CR37
3S
-1023
c3
0
PR39+ RB39
-1023
c2
0
PR39+RB39
40
-1023
c2
1
SR40·CR40
41
-1023
c2
0
PR42+RB42
-1023
cl
0
PR42+RB42
cl
1
SR43.CR43
30
c5 MSH
c5 LSH
31
32
33
36
39
42
43
c4 MSH
c3 MSH
c2 MSH
cl MSH
c4 LSH
c3 LSH
c2 LSH
cl LSH
-1023
MUl
PIPE
ALU
PIPE
P
C
S
y
REG
REG
REG
BUS
P6
S9
SR22.CR22
P7
S10
SR25.CR25
PS
Sll
SR2S.CR2S
P9
S12
SR31.CR31
Pl0
S13
SR34.CR34
Pll
S14
SR37.CR37
P12
S15
SR40.CR40
P13
S16
COMMENT
Table 61. Pseudocode for Chebyshev Exponential Routine (PIPES2-0 - 010, RND1-0) (Concluded)
ClK
DA
BUS
DB
BUS
44
45
46
co MSH
S2
47
48
49
S18
20
RB
REG
ClK
MODE
INSTR
MUl
PIPE
-1023
c1
0
PR45+RB45
SR43*CR43
-1023
Co
0
PR45+ RB45
S2
20
0
SLL
RA46,RB46
S2
20
0
NOP
S2
20
0
RA48.CR48
AlU
PIPE
P
REG
C
REG
S
REG
y
BUS
COMMENT
P14
Begin post processing.
S2 is the variable EX, and
was calculated in cycle 5.
Shift left logical S2
20 bit positions
S17
S18
S18
Allows time for S18 to be
output from the Y bus and
input to the DA bus
RA holds S18', which is
the double precision
floating point equivalent
of 2 N, where N was
calculated in cycle 6
S18'
20
0
RA48.CR48
S18'
20
0
DUMMY
51
S18'
20
0
NOP
P15
P15
Output MSH of answer
52
S18'
20
0
NOP
P15
P14
Output LSH of answer
50
0
Co lSH
RA
REG
-.I
~
co
SN74ACT8847
Instruction is RA + RB, used
to allow time for result
to propagate to Y bus
RA48.CR48
Microcode Table for the Exp(x) Calculation
All numbers are in hex. Any field with a length that is not a multiple of 4 is right justified and zero filled. For the microcode
table, the value of X has been chosen to be 6.25.
p
A
D
A
D
B
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
40190000
3FF71547
00000000
00000000
00000400
FFFFFC01
00000000
00000000
00000000
40000000
00000000
BFFOOOOO
BD45A7FC
00000000
3D957BFD
00000000
00000000
BDE351B8
00000000
00000000
3E2F5BOE
00000000
652B82FE
00000000
00000000
00000000
00000000
00000001
00000000
00000000
00000000
00000000
00000000
05D3B501
00000000
2DBF487C
00000000
00000000
21AC16D5
00000000
00000000
17440879
PEE C
B N N L
A B K
C
P C C s
I L 0 E
P K N L
EMF 0
SOl P
D G
F 0 0 _ 2 0 3
F 1 1
2 0 3
F 0 0 _ 2 0 3
F 0 0 _ 2 0 3
F10...r201
F10_201
F01_201
F 00_ 2 1 3
F 00_ 2 1 3
F 0 1 _ 2 1 3
F 0 0 _ 2 0 3
F 0 1 _ 2 0 3
F 0 1 _ 2 1 3
F 0 OS 2 0 3
2 0 3
F 0 1
F 00_ 2 1 3
F 00_ 2
3
F 0 1 _ 2 0 3
F 00_ 2 1 3
F 0 0 _ 2 0 3
F 0 1 _ 2 0 3
o
FF
FF
FB
FB
FE
FE
BF
FB
F6
BF
FB
FB
BF
FB
FB
9F
FB
FB
9F
FB
FB
R H
E F
E A N L
S L C 0
E T
W
T
C
o
o
o
o
1
1
1
1
1
1
1
1
1
N
S
T
R
1CO
1CO
1 1
1A3
1A3
1 1
1 0 0 200
1 1
200
1 1
240
1A2
1
183
1CO
180
180
1
1CO
0 180
1
180
1CO
180
180
1CO
1
1
180
1
1
180
1
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
R F S B S T S 000
N A RYE E E E E E
D S C T L SLY S C
T C EST Y
P T
o
o
0 003 3
0 003 3
0 0 3 3
1 0 0 0 3 3 1
0010331
0 0 0 3 3 1
0 0 0 3 3 1
0 0 0 3 3 1
0 0 0 3 3 1
0 0 0 3 3
0 0 0 3 3
0 0 0 3 3
0 0 0 3 3
0 0 0 3 3
0 0 0 3 3
0 0 0 3 3
0 0 0 3 3
0 003 3
0 003 3
0 0 0 3 3
00003 3
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
Microcode Table for the Exp(x) Calculation (Continued)
p
-..J
N
N
~
A
D
A
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
F
00000000
00000000
BE769E51
00000000
00000000
3EBC8D75
00000000
00000000
BEFEE4FD
00000000
00000000
3F3BDB69
00000000
00000000
BF741839
00000000
00000000
3FA5BE29
00000000
00000000
BFCF5E46
00000000
00000000
3FE6A09E
D
B
00000000
00000000
EE631E87
00000000
00000000
30548DD5
00000000
00000000
234A4926
00000000
00000000
6E8987 AC
00000000
00000000
EB88156E
00000000
00000000
8ADF0369
00000000
00000000
537 AB906
00000000
00000000
667F3BCC
PEE C P C C
B N N L I L 0
A B K P K N
C EMF
SOl
D G
F 00_ 2 1 3
F 00_ 2 o 3
F 0 1
2 0 3
F 00_ 2 1 3
F 0 0 _ 2 0 3
F 0 1 _ 2 o 3
F 00_ 2 1 3
F 00_ 2 o 3
F 0 1 _ 2 o 3
F 00_ 2 1 3
F 0 0 _ 2 0 3
F 0 1 _ 2 o 3
F 00_ 2 1 3
F 00_ 2 o 3
F 0 1
2 o 3
F 00_ 2 1 3
F 0 0 _ 2 0 3
F 0 1 _ 2 0 3
F 00_ 2 1 3
F 0 0 _ 2 0 3
F 0 1 _ 2 0 3
F 00_ 2 1 3
F 00_ 2 o 3
F 0 1 _ 2 o 3
SN74ACT8847
s R H E F
E E A N L
L S L C 0
0 E T
W
P T
C
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
9F
FB
FB
1
1
1 1
1
1
1
1 1 1
1
1 1 1
1
1
1
1 1
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
o
N
S
T
R
R F S B S T S 000
N A RYE E E E E E
D S C T L SLY S C
T C EST Y
P T
1CO o 0 0 0 3 3
000
180 000 0 3 3
000
180 o 0 0 0 3 3 1 000
1CO o 0 0 0 3 3 1 000
180 o 0 0 0 3 3
000
180 o 0 0 0 3 3
000
1CO o 0 0 0 3 3
000
180 o 0 0 0 3 3
000
180 o 0 0 0 3 3
000
1CO o 0 0 0 3 3
000
180 o 0 0 0 3 3
000
180 o 0 0 0 3 3 1 000
1CO o 0 0 0 3 3
000
180 o 0 0 0 3 3
000
180 o 0 0 0 3 3
000
1CO o 0 0 0 3 3
000
180 o 0 0 0 3 3
000
180 o 0 0 0 3 3 1 000
1CO o 0 0 0 3 3
000
000
180 o 0 0 0 3 3
180 o 0 0 0 3 3
000
1CO o 0 0 0 3 3 1 000
000
180 o 0 0 0 3 3
000
180 o 0 0 0 3 3
Microcode Table for the Exp(x) Calculation (Concluded)
P
A
D
A
F
F
F
F
F
F
F
00000409
00000000
40900000
00000000
00000000
00000000
00000000
D
B
00000014
00000000
00000000
00000000
00000000
00000000
00000000
PEE C
B N N L
A B K
C
F
F
F
F
F
F
F
P C C s
I L 0 E
P K N L
EMF 0
SOl P
D G
11_201
0 1 .J 2 0 3
0 0 _
2 ·0 2
1 0 _
2 0 2
0 0 _
2 0 3
0 0 _
2 0 3
0 0 _
2 0 3
R H
E F
E A N L
S L C 0
E T
W
T
C
FF
FF
DF
DF 1
FF
FF
FF 1
1
o
1 1
1
1
1 1
1 1
o
o
o
o
o
o
I
N
S
T
R
228
0 300
1CO
1CO
180
300
300
R F S B S T S 000
N A RYE E E E E E
D S C T L SLY S C
T C EST Y
P T
o
o
o
o
o
o
o
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
3
3
3
3
3
3
3
3
000
3
000
3
000
3
000
3 1 000
3 1 000
3 0 000
High-Speed Vector Math and 3-D Graphics
Introduction
Texas Instruments SN74ACT8837 and SN74ACT8847 floating point units (FPU) are
designed to execute high-speed, high-accuracy mathematical computations. The
devices are especially suited for matrix manipulations such as those used in graphics or
digital signal processing. These FPUs multiply and add data elements by executing
sequences of microprogrammed calculations to form new matrices. Each device may be
configured for either single- or double-precision operation. Single-precision operation is
assumed throughout this report.
The 'ACT8847 is a functional superset of the 'ACT8837 and operates at higher clock
rates (up to 33 MHz) than the 16-MHz '8837. Unlike the 'ACT8837, the 'ACT8847 can
perform integer and logical operations and has built-in, hardwired algorithms for division
and square root operations.
This application report outlines the timing, data flow, and programming for several
common data vector calculations and matrix transformations. Further, it illustrates some
of the programming "tricks" resulting in fastest operation. Throughout, this document
compares the timing schemes for programs in which all registers, including the ALU and
multiplier internal pipeline registers, are enabled ("pipelined" mode) with those for
equivalent programs in which the internal pipeline registers are disabled ("unpiped"
mode). Equations are provided to help the programmer select the more efficient mode,
and performance figures are included for both devices, with times given for 15-MHz and
30-MHz operations.
This report begins by covering simple vector arithmetic operations, which are
categorized as "computational" or "compare" functions for convenience. This document
then compares these operations as they are used in graphics applications to perform
three-dimensional coordinate transformations, perspective viewing, and clipping.
SN74ACT8837 and SN74ACT8847 Floating Point Units
Both the 'ACT8837 and 'ACT8847 floating point units (FPU) combine a multiplier and an
arithmetic-logic unit (ALU) in a single microprogrammable VLSI device. These devices
are implemented in TI's advanced one-micron CMOS technology and are fully
compatible with the IEEE standard for binary floating point arithmetic, STD 754-1985, for
either single- or double-precision operation.
Instruction inputs can select independent ALU operation, independent multiplier
operation, or simultaneous ALU/multiplier operation. Each FPU can handle three types
of data input formats. The ALU accepts data operands in integer format or IEEE floating
point format. In the 'ACT8837, integers are converted to normalized floating point
numbers with biased exponents prior to further processing. A third type of operand,
denormalized numbers, can also be processed after the ALU has converted them to
"wrapped" numbers, which are explained in detail in the SN74ACT8800 Family Data
Manual. The 'ACT8837 multiplier operates only on normalized floating point numbers or
wrapped numbers. The 'ACT8847 multiplier also operates on integer operands.
Data enters the 'ACT8837 or 'ACT8847 through two 32-bit data buses, DA and DB (see
Figures 74 and 75), which can be configured to operate as a single 64-bit data bus for
double-precision operations. Data can be latched in a 64-bit temporary register or
loaded directly into the input registers, RA and RB, which pass data to the multiplier and
ALU.
A clock-mode control allows the temporary register to be clocked on the rising or falling
edge of the clock to support double-precision ALU operations at the same rate as
single-precision operations. Using the temporary register, double-precision numbers on a
single 32-bit input bus can be loaded in one clock cycle.
The input registers RA and RB are the first of three levels of internal data registers.
Additionally, the ALU and multiplier each have an internal pipeline register and an output
register. The ALU's output register is denoted by "S" (sum), and the multiplier's output
register is denoted by "P" (product). Any or all of these internal registers may be
bypassed.
A 64-bit constant register (C) with a separate clock is provided for temporary storage of a
multiplier result, ALU result, or constant for feedback to the multiplier and ALU. An
instruction register and a status register are also included.
Four multiplexers select the multiplier and ALU operands from the input, C, S, or
P registers. Results are output on the 32-bit Y bus; a Y output multiplexer selects the
most or least significant half of the result for output.
In addition to add, subtract, and multiply functions, the 'ACT8837 can be programmed to
perform floating point division using a Newton-Raphson algorithm. Absolute value
conversions, floating point-to-integer and integer-to-floating point conversions, and a
compare instruction are also available.
The 'ACT8847 FPU is fully compatible with IEEE Standard 754-1985 for addition,
subtraction, multiplication, division, square root, and comparison. The 'ACT8847 FPU
also performs integer arithmetic, logical operations, and logical shifts. Additionally,
absolute value conversions and floating point-to-integer and integer-to-floating point
conversions are available.
Figure 74. SN74ACT8837 Floating Point Unit (block diagram: DA and DB input buses with parity check, temporary register, RA and RB input registers, multiplier and ALU with pipeline registers, P, S, and C registers, instruction and status registers, and Y output multiplexer)
Figure 75. SN74ACT8847 Floating Point Unit (block diagram: DA and DB input buses with parity check, temporary register, RA and RB input registers, multiplier and ALU with pipeline registers, P, S, and C registers, instruction and status registers, and Y output multiplexer)
For both the 'ACT8837 and 'ACT8847, the ALU and multiplier can operate in parallel to
perform sums of products and products of sums. Detailed information regarding the
instruction inputs for the various 'ACT8837 and 'ACT8847 configurations and operations
is given in the SN74ACT8800 Family Data Manual.
Mathematical Processing Applications
TI's SN74ACT8837 and SN74ACT8847 high-speed floating point units (FPU) are
designed to perform high-accuracy, computationally-intensive mathematical operations.
In particular, these FPUs can meet the computational demands of high-end graphics
workstations and advanced signal processing. Both applications involve repetitive
computations on arrays of data typically expressed as vector arithmetic operations.
For example, the calculation of the sum of products, or multiply-accumulate function, is
frequently used in both signal and graphics processing. In general form, the sum of
products equation is:
$$S = \sum_{i=1}^{n} k_i x_i, \quad \text{for coefficients } k_i \text{ and data } x_i.$$
This sum of products is the central function involved in multiplying matrices. Such
matrices might represent a system of linear differential equations or the geometrical
transformation of a graphic object. Specifically, an n x n matrix A multiplied by an n x m
matrix B yields an n x m matrix C whose elements Cij are given by:
$$c_{ij} = \sum_{k=1}^{n} a_{ik} \, b_{kj}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, m.$$
The 'ACT8837 and 'ACT8847 are designed to handle efficiently this kind of parallel
multiplication and addition.
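As an illustration only, the two equations above can be sketched in Python. The function names are hypothetical, and the loops model only the arithmetic (one multiply and one add per term), not the timing of the FPU's chained mode.

```python
def sum_of_products(k, x):
    """Multiply-accumulate: S = k[0]*x[0] + k[1]*x[1] + ..."""
    acc = 0.0
    for ki, xi in zip(k, x):
        acc += ki * xi  # one multiply and one add per term
    return acc


def matmul(a, b):
    """C[i][j] is the sum of products of row i of A and column j of B."""
    rows, cols = len(a), len(b[0])
    return [
        [sum_of_products(a[i], [b_row[j] for b_row in b]) for j in range(cols)]
        for i in range(rows)
    ]
```

Each element of the result is an independent sum of products, which is why the operation maps onto a multiplier and an adder working in parallel.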
Graphics Applications
The basic principle of graphics processing is that any object can be reduced to a
combination of points, lines, and polygons and then defined as a collection of points in
three-dimensional space. Because points, planes, transformation matrices and other
common data structures are vectors, most of the computations involved in graphics
processing are vector operations.
Computations for a 3-D graphics display are highly involved due to the complexity
introduced by the z-axis. Viewing an object from a particular perspective involves
transforming the object's world coordinates, or its coordinates in the model space, into
viewing, or eyepoint, coordinates. A series of translations and rotations map the viewing
system axes onto the world coordinate axes. Each individual point must be translated,
rotated and, if necessary, scaled in a proper order. Once the coordinate transformation is
complete, the coordinates are clipped to a viewing volume. Clipping algorithms employ
arithmetic operations to determine whether an object, or part of an object, is inside or
outside a pyramidal volume. Hidden surface routines may then be employed to delete
surfaces that fall behind a "nearer" surface from the viewer's perspective.
Matrix arithmetic is required for scaling, rotating, translating, or shearing an object, as
well as for the final process of projecting its visible parts to a two-dimensional frame
buffer. Any sequence of these transformations can be represented as a single matrix
formed by concatenating the matrices for the individual operations. The generalized
4 x 4 matrix for transforming a three-dimensional object is shown below, partitioned into
four component matrices, each of which produces a specific effect on the image. The
3 x 3 matrix produces linear transformation in the form of scaling, shearing, and rotation.
The 1 x 3 row matrix produces translation, while the 3 x 1 column matrix produces
perspective transformation with multiple vanishing points. The final single-element 1 x 1
matrix produces overall scaling.
Overall operation of the matrix T on the position vectors of a graphics object produces a
combination of shearing, rotation, reflection, translation, perspective, and overall
scaling.
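The partitioning described above can be sketched as follows, assuming the row-vector convention [x y z 1] x T that a 1 x 3 translation row implies. The helper names are illustrative and are not taken from the manual; the sketch shows concatenation of a scaling matrix and a translation matrix into a single 4 x 4 matrix.

```python
def matmul4(a, b):
    """Concatenate two 4 x 4 transforms (a is applied first under row vectors)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def scaling(sx, sy, sz):
    # Upper-left 3 x 3 block: linear transformation (here, pure scaling).
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


def translation(tx, ty, tz):
    # Bottom 1 x 3 row: translation.
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [tx, ty, tz, 1]]


def transform(point, t):
    """Apply [x y z 1] x T, then divide by the homogeneous coordinate."""
    row = [point[0], point[1], point[2], 1.0]
    out = [sum(row[k] * t[k][j] for k in range(4)) for j in range(4)]
    return tuple(c / out[3] for c in out[:3])


# Scale by 2, then translate by (1, 2, 3), as one concatenated matrix.
combined = matmul4(scaling(2, 2, 2), translation(1, 2, 3))
```

With this convention, a value other than 1 in the bottom-right element scales the whole result by its reciprocal, and nonzero entries in the right-hand 3 x 1 column make the divisor depend on the point's position.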
Vector Arithmetic
Programs that require repetitive computations on multiple sets of operands lend
themselves to vector-processing algorithms, in which the operands are viewed as
succeeding elements of long "data vectors." The next two sections outline the
programming for commonly-used vector operations. Most of these examples conclude
with a comparison of program timing for pipelined (internal pipeline registers enabled)
and unpiped (internal pipeline registers disabled) operation. For convenience, the
operations are labeled "computational," which includes simple and compounded adds,
multiplies, and divides, or "compare," which can be used to select maximum or minimum
values from succeeding pairs of numbers or from a list.
Computational Operations on Data Vectors
This section covers the following vector operations: vector add, vector multiply, vector
divide, sum of products (also called inner, scalar, or dot product), and product of sums.
Since matrix multiplication is composed of a sequence of sum of products operations,
these two functions are discussed in the same section. In some cases, a whole class of
operations is covered under one heading. For example, the vector add operation
includes sums and differences of Ai, Bi, |Ai|, and |Bi| in all combinations.
Vector Add
The vector add operation adds corresponding components of data vectors to obtain the
components of the output vector. Hence, for input vectors A and B and output vector Y,
each with N components,
$$Y_i = A_i + B_i, \quad 1 \le i \le N.$$
The 'ACT8837 and 'ACT8847 perform this calculation in unchained, independent ALU
mode.
Table 62 shows the contents of the data registers at successive clock cycles for N = 6
with the FPU operating in pipelined mode. Since the data travels by way of the internal
pipeline register, two cycles pass before the first sum appears in the S register. The
contents of the internal pipeline register are not given in the flow.
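Table 62. Data Flow for Pipelined Single-Precision Vector Add, N = 6
| CLK | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| RA | A1 | A2 | A3 | A4 | A5 | A6 | | | |
| RB | B1 | B2 | B3 | B4 | B5 | B6 | | | |
| S | | | A1+B1 | A2+B2 | A3+B3 | A4+B4 | A5+B5 | A6+B6 | |
| P | | | | | | | | | |
| C | | | | | | | | | |
| Y | | | Y1 | Y2 | Y3 | Y4 | Y5 | Y6 | |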
Data transfers and operations for each clock cycle are summarized in the program listing
in Table 63. Detailed information on the instruction inputs required to perform each
operation is included in sections 5 and 7. Note that the selection of the output source (in
this case, the S register), which is determined by the I6 instruction bit, is programmed
along with the ALU or multiplier operation that generates the output.
Table 63. Program Listing for Pipelined Single-Precision Vector Add, N = 6
| STEP | REGISTER TRANSFERS | ALU OPERATION | MULTIPLIER OPERATION |
|---|---|---|---|
| 1. | LOAD RA, RB; Y ← S | ADD(RA,RB) | |
| 2. | LOAD RA, RB; Y ← S | ADD(RA,RB) | |
| 3. | LOAD RA, RB; Y ← S | ADD(RA,RB) | |
| ... | | | |
| 6. | LOAD RA, RB; Y ← S | ADD(RA,RB) | |
Timing and programming are similar for other independent ALU operations involving two
operands, such as (A - B), (B - A), and compare (A,B). However, when the compare
function is used, two status bits must be generated before numeric values can be output
(see "Compare Operations on Data Vectors").
Because the vector add program closely parallels that for vector multiplication, pipelined
and unpiped modes for both vector add and multiply are compared in the next section.
Vector Multiply
The vector multiply operation multiplies corresponding elements of data vectors to
obtain the components of the output vector. Hence, for input vectors A and B and output
vector Y, each with N components,
$$Y_i = A_i \times B_i, \quad 1 \le i \le N.$$
The 'ACT8837 and 'ACT8847 perform this calculation in unchained, independent
multiplier mode.
Pipelined Mode
Table 64 shows the contents of the data registers at successive clock cycles for N = 6
with the FPU operating in pipelined mode. The product may be replaced by a variety of
other independent multiplier operations, such as -(A x B), A x |B|, -(A x |B|), |A| x |B|,
and -(|A| x |B|). Data transfers and operations for each clock cycle are
summarized in the program listing in Table 65.
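Table 64. Data Flow for Pipelined Single-Precision Vector Multiply, N = 6
| CLK | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| RA | A1 | A2 | A3 | A4 | A5 | A6 | | | |
| RB | B1 | B2 | B3 | B4 | B5 | B6 | | | |
| S | | | | | | | | | |
| P | | | A1xB1 | A2xB2 | A3xB3 | A4xB4 | A5xB5 | A6xB6 | |
| C | | | | | | | | | |
| Y | | | Y1 | Y2 | Y3 | Y4 | Y5 | Y6 | |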
Table 65. Program Listing for Pipelined Single-Precision Vector Multiply, N = 6
| STEP | REGISTER TRANSFERS | ALU OPERATION | MULTIPLIER OPERATION |
|---|---|---|---|
| 1. | LOAD RA, RB; Y ← P | | MULT(RA,RB) |
| 2. | LOAD RA, RB; Y ← P | | MULT(RA,RB) |
| 3. | LOAD RA, RB; Y ← P | | MULT(RA,RB) |
| ... | | | |
| 6. | LOAD RA, RB; Y ← P | | MULT(RA,RB) |
Unpiped Mode
Table 66 shows the contents of the data registers at successive clock cycles during a
vector multiply operation for N = 6 with the FPU operating in unpiped mode. The vector
add operation progresses similarly. Since there is no "single-clocked storage" in the
internal pipeline register, each product or sum is performed in one cycle.
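Table 66. Data Flow for Unpiped Single-Precision Vector Multiply, N = 6
| CLK | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| RA | A1 | A2 | A3 | A4 | A5 | A6 | | | |
| RB | B1 | B2 | B3 | B4 | B5 | B6 | | | |
| S | | | | | | | | | |
| P | | A1xB1 | A2xB2 | A3xB3 | A4xB4 | A5xB5 | A6xB6 | | |
| C | | | | | | | | | |
| Y | | Y1 | Y2 | Y3 | Y4 | Y5 | Y6 | | |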
Comparison of Pipelined and Unpiped Modes
For both vector add and vector multiply operations carried out in pipelined mode, results
are output to the Y bus on clocks 3, ... , N + 2. In unpiped mode, results are output to the
Y bus on clocks 2, ... , N + 1, thereby saving a cycle. Unfortunately, it is necessary to
operate at a lower clock rate in unpiped mode than in pipelined mode. The following
equation can be used to determine which of the two modes provides the faster
performance in a particular application. Pipelined operation is faster if:
$$(N + 2)/F_p < (N + 1)/F_u,$$
where Fp and Fu are the clock rates in pipelined and unpiped modes, respectively. As of
publication, pipelined mode provides faster performance for input vectors with N > 2.
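The break-even test above can be sketched as a small helper. The function names are hypothetical, and the clock rates used in the example are placeholder values chosen only to exercise the inequality; they are not device specifications. The comparison is cross-multiplied so that it uses exact integer arithmetic.

```python
def vector_op_cycles(n, pipelined):
    """Cycles for an N-element vector add or multiply (results end at N+2 or N+1)."""
    return n + 2 if pipelined else n + 1


def pipelined_is_faster(n, f_pipelined_hz, f_unpiped_hz):
    """True if (N + 2)/Fp < (N + 1)/Fu, evaluated without division."""
    return (vector_op_cycles(n, True) * f_unpiped_hz
            < vector_op_cycles(n, False) * f_pipelined_hz)


# Placeholder clock rates (assumptions, not data-sheet values).
FP_HZ = 30_000_000
FU_HZ = 20_000_000
faster_for = [n for n in range(1, 6) if pipelined_is_faster(n, FP_HZ, FU_HZ)]
```

With these placeholder rates the extra pipeline cycle is exactly offset at N = 1, and pipelined operation wins for every longer vector.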
Product of Sums
The product of sums operation adds corresponding elements of data vectors and
multiplies the resulting sums. For input vectors A and B, each with N components, the
product of sums operation yields a single output Y defined as follows:
$$Y = \prod_{i=1}^{N} (A_i + B_i)$$
The product of differences can be computed by simply making the ALU operation
(A - B) or (B - A). The 'ACT8837 and 'ACT8847 perform this calculation in chained
mode so that concurrent operation of the ALU and multiplier is possible. The data flow
and program listing for the product of sums are identical to those for the sum of products,
except that the roles of add and multiply are reversed. The criteria used to decide
between pipelined and unpiped modes are also identical to those previously given.
Vector Divide
The vector divide operation divides corresponding elements of data vectors to obtain the
components of the output vector. Hence, for vectors A and B and output vector Y, each
with N components,
$$Y_i = A_i / B_i, \quad 1 \le i \le N.$$
The 'ACT8837 and 'ACT8847 perform this calculation using the Newton-Raphson
iterative method. This algorithm, which is described in detail in the SN74ACT8800 Family
Data Manual, calculates the value of a quotient Y by approximating the reciprocal of the
divisor B and then multiplying the dividend A by that approximation.
The following sections review the vector divide programs for the 'ACT8837 and the
'ACT8847. In the 'ACT8847, the divide algorithm is built-in.
SN74ACT8837 Vector Divide
For division using single-element inputs A and B, the value of the reciprocal of B,
denoted by X, is determined iteratively using the following equation:
$$X_{i+1} = X_i (2 - B \times X_i)$$
The seed approximation, X0, is assumed to be given. The iteration stops when X is
determined to the desired level of precision. Assuming the presence of a seed ROM
providing 4-bit accuracy, three iterations are necessary to correctly determine a
single-precision result X. Given the seed for 1/B = X0, Xi+1 = Xi (2 - B x Xi). A is eventually
multiplied by the final value of X.
An 8-bit seed ROM is commonly employed and gives single-precision accuracy in only
two iterations and double-precision accuracy in three iterations. Instructions for
implementing an 8-bit seed ROM are included in the SN74ACT8800 Family Data Manual.
This example assumes that a 4-bit seed is used to develop the program.
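The iteration can be sketched in a few lines of Python. This is a numerical illustration only: the seed model below (the exact reciprocal perturbed by a relative error of 2^-bits) is an assumption standing in for a seed ROM, and the function name is hypothetical. Each pass roughly doubles the number of correct bits, which is why a 4-bit seed needs three passes and an 8-bit seed needs two to reach single precision.

```python
def newton_raphson_divide(a, b, seed_bits=4, iterations=3):
    """Approximate a / b by refining a reciprocal estimate of b."""
    # Assumed seed model: 1/b with a relative error of 2**-seed_bits.
    x = (1.0 / b) * (1.0 + 2.0 ** -seed_bits)
    for _ in range(iterations):
        x = x * (2.0 - b * x)  # X(i+1) = Xi (2 - B x Xi)
    return a * x               # the dividend multiplies the final reciprocal


def relative_error(a, b, seed_bits, iterations):
    exact = a / b
    return abs(newton_raphson_divide(a, b, seed_bits, iterations) - exact) / exact
```

Comparing `relative_error(1.0, 3.0, 4, 2)` with `relative_error(1.0, 3.0, 4, 3)` shows the error dropping from about 2^-16 to about 2^-32, on either side of the 24-bit single-precision significand.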
Pipelined Mode
The 'ACT8837 performs the vector divide in chained mode. Table 70 shows the data flow
for pipelined operation. The value of (2 - B x Xi) is denoted as Ti. Note that the value X3
does not appear, per se, in the table, but is expressed in terms of X2 to save
unnecessary calculations. The output Y is determined from the calculation of (A x X2)
x T2 in cycle 17, which is equivalent to A x X3, since X3 = X2 x T2.
In order to keep Xi available for the final calculation of Xi+1, a few programming "tricks"
are employed to keep the original value of each Xi within the chip while it is being altered
in the calculation of (2 - B x Xi). First, Xi is stored in the S register by adding 0 to it. Then,
when the S register is needed, Xi is moved to the P register by multiplying it by 1.
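Table 70. Data Flow for 'ACT8837 Pipelined Single-Precision Vector Divide, N = 1
| CLK | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| RA | X0 | | | | | | B | | | |
| RB | B | | | | | | | | | |
| S | | | X0 | | T0 | | | | X1 | |
| P | | | BxX0 | | X0 | | X1 | | BxX1 | |
| C | | | | | | | | | | |
| Y | | | | | | | | | | |

| CLK | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| RA | | | B | | | | | | | |
| RB | | | | | A | | | | | |
| S | T1 | | | | X2 | | T2 | | | |
| P | X1 | | X2 | | BxX2 | | AxX2 | | Y | |
| C | | | | | | | | | | |
| Y | | | | | | | | | Y | |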
Data transfers and operations are summarized in the program listing in Table 71.
Because no operations begin on even-numbered cycles, only the odd-numbered clock
cycles are shown.
Table 71. Program Listing for 'ACT8837 Pipelined Single-Precision Vector Divide, N = 1
| STEP | REGISTER TRANSFERS | ALU OPERATION | MULTIPLIER OPERATION |
|---|---|---|---|
| 1. | LOAD RA, RB | ADD(RA,0) | MULT(RA,RB) |
| 3. | | ADD(2,-P) | MULT(S,1) |
| 5. | | | MULT(S,P) |
| 7. | LOAD RA | ADD(P,0) | MULT(RA,P) |
| 9. | | ADD(2,-P) | MULT(S,1) |
| 11. | | | MULT(S,P) |
| 13. | LOAD RA | ADD(P,0) | MULT(RA,P) |
| 15. | LOAD RB | ADD(2,-P) | MULT(S,RB) |
| 17. | Y ← P | | MULT(S,P) |
In steps 1, 7, and 13, 0 is added to Xi so that Xi appears two cycles later in the S register.
In steps 3 and 9, the Xi value in the S register is multiplied by 1 so that it appears in the P
register two cycles later. In step 15, Xi (from the S register) is multiplied by the dividend A
just input to RB.
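The register shuffling in the program listing can be checked with a small behavioral model. The sketch below is an assumption-laden illustration, not a description of the device: it treats every ALU or multiplier operation issued on one clock as landing in S or P two clocks later, lets registers hold their values in between, and uses hypothetical names throughout.

```python
def run_divide_program(a, b, x0):
    """Replay the 17-step pipelined divide and return the value left in P."""
    reg = {"RA": 0.0, "RB": 0.0, "S": 0.0, "P": 0.0}
    loads = {1: {"RA": x0, "RB": b}, 7: {"RA": b}, 13: {"RA": b}, 15: {"RB": a}}
    alu = {
        1: lambda r: r["RA"] + 0.0,    # ADD(RA,0): park Xi in S
        3: lambda r: 2.0 - r["P"],     # ADD(2,-P): Ti = 2 - B x Xi
        7: lambda r: r["P"] + 0.0,     # ADD(P,0)
        9: lambda r: 2.0 - r["P"],
        13: lambda r: r["P"] + 0.0,
        15: lambda r: 2.0 - r["P"],
    }
    mult = {
        1: lambda r: r["RA"] * r["RB"],  # MULT(RA,RB): B x X0
        3: lambda r: r["S"] * 1.0,       # MULT(S,1): move Xi to P
        5: lambda r: r["S"] * r["P"],    # MULT(S,P): X(i+1) = Ti x Xi
        7: lambda r: r["RA"] * r["P"],   # MULT(RA,P): B x X1
        9: lambda r: r["S"] * 1.0,
        11: lambda r: r["S"] * r["P"],
        13: lambda r: r["RA"] * r["P"],
        15: lambda r: r["S"] * r["RB"],  # MULT(S,RB): A x X2
        17: lambda r: r["S"] * r["P"],   # T2 x (A x X2) = A x X3
    }
    in_flight = {}  # clock -> register values arriving on that clock
    for clk in range(1, 20):
        reg.update(in_flight.pop(clk, {}))  # results issued two clocks ago land
        reg.update(loads.get(clk, {}))      # operands loaded on this clock
        results = {}
        if clk in alu:
            results["S"] = alu[clk](reg)
        if clk in mult:
            results["P"] = mult[clk](reg)
        if results:
            in_flight[clk + 2] = results
    return reg["P"]  # the quotient reaches P on clock 19
```

For example, `run_divide_program(1.0, 3.0, 0.3125)` starts from a seed that is accurate to about four bits and returns a value within single-precision accuracy of 1/3.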
Because no operations begin on even cycles, two vector divide operations may
be interleaved, calculating two quotients in 20 cycles. Table 72 shows the data flow
for computing two quotients, Y1 and Y2, where Y1 = A/B and Y2 = C/D. The
approximation for 1/B is denoted by Wi, and the approximation for 1/D is denoted by Xi.
Ti = (2 - B x Wi), and Qi = (2 - D x Xi).
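Table 72. Data Flow for 'ACT8837 Pipelined Single-Precision Interleaved Vector Divide, N = 2
| CLK | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| RA | W0 | X0 | | | | | B | D | | |
| RB | B | D | | | | | | | | |
| S | | | W0 | X0 | T0 | Q0 | | | W1 | X1 |
| P | | | BxW0 | DxX0 | W0 | X0 | W1 | X1 | BxW1 | DxX1 |
| C | | | | | | | | | | |
| Y | | | | | | | | | | |

| CLK | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| RA | | | B | D | | | | | | |
| RB | | | | | A | C | | | | |
| S | T1 | Q1 | | | W2 | X2 | T2 | Q2 | | |
| P | W1 | X1 | W2 | X2 | BxW2 | DxX2 | AxW2 | CxX2 | Y1 | Y2 |
| C | | | | | | | | | | |
| Y | | | | | | | | | Y1 | Y2 |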
The program listing for an interleaved vector divide is similar to that for a single divide
operation, with functions listed in each odd line and duplicated in the next even line for
the second operation.
As previously stated, the time needed to compute two single-precision divide operations
starting with a 4-bit seed ROM is 20 clock cycles. Since a new pair of divides can start at
CLK = 19, the time required to perform the vector divide operation on two N-dimensional
vectors is given by the following equation:
TIME = [18 x CEILING(N/2) + 2] cycles,
where the ceiling function rounds to the next highest integer for fractional values. With an
8-bit seed ROM, the time reduces to [12 x CEILING(N/2) + 2] cycles, which equals
2.5 million divides per second at 15 MHz.
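As a quick check of the timing equations above, the following sketch evaluates them directly. The function names are hypothetical; the per-pair cycle counts (18 and 12) and the 15 MHz figure come from the text, and the throughput helper deliberately ignores the two-cycle tail so that it reflects the steady-state rate.

```python
import math

CYCLES_PER_PAIR = {4: 18, 8: 12}  # keyed by seed ROM width in bits


def interleaved_divide_cycles(n, seed_bits=4):
    """Cycles for an N-element pipelined vector divide, two quotients at a time."""
    return CYCLES_PER_PAIR[seed_bits] * math.ceil(n / 2) + 2


def steady_state_divides_per_second(clock_hz, seed_bits=4):
    """Two quotients are completed every CYCLES_PER_PAIR clock cycles."""
    return clock_hz * 2 / CYCLES_PER_PAIR[seed_bits]
```

For N = 2 the first function returns the 20 cycles quoted above, and at 15 MHz with an 8-bit seed the second returns 2.5 million divides per second.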
Unpiped Mode
Table 73 shows the data flow for a vector divide in unpiped, chained mode.
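Table 73. Data Flow for 'ACT8837 Unpiped Single-Precision Vector Divide, N = 1
| CLK | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| RA | X0 | | | B | | | B | | | |
| RB | B | | | | | | | A | | |
| S | | X0 | T0 | | X1 | T1 | | X2 | T2 | |
| P | | BxX0 | X0 | X1 | BxX1 | X1 | X2 | BxX2 | AxX2 | Y |
| C | | | | | | | | | | |
| Y | | | | | | | | | | Y |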
This program uses the same methods as the pipelined version to keep Xi within the chip.
The time needed to compute a vector divide of two N-element vectors is (9N + 1) cycles
with a 4-bit seed ROM and (6N + 1) cycles with an 8-bit seed ROM.
Comparison of Pipelined and Unpiped Modes
Using a 4-bit seed ROM, pipelined mode is faster if:
[18 x CEILING(N/2) + 2]/Fp < (9N + 1)/Fu,
where Fp and Fu are the clock rates in pipelined and unpiped modes. As of publication,
pipelined mode provides faster performance for input vectors with N > 1.
A General Principle
The vector divide example illustrates a general programming principle that should be
considered whenever a program begins a new instruction every other cycle. In cases
where the C register is not used, it is simple to interleave another program, even one not
performing the same function.
Interleaving programs is not as easy if the C register is used because the C register is the
only nonpiped register. However, even using the C register, programs may often be
interleaved by staggering one against the other so that their use of the C register does
not overlap in time. Many of the programs so far discussed can be thought of as two such
interleaved programs, with the C register being used to delay the first result until it can be
combined with the second. (See, for example, the sum of products operation.)
SN74ACT8847 Vector Divide
Since the 'ACT8847 has a built-in algorithm for divide, the microprogram is simpler
than that for the 'ACT8837. Table 74 shows the data flow for pipelined operation. Data
transfers and operations are summarized in the program listing in Table 75.
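Table 74. Data Flow for 'ACT8847 Pipelined Single-Precision Vector Divide
| CLK | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| RA | A1 | | | | | | A2 | | | |
| RB | B1 | | | | | | B2 | | | |
| S | | | | | | | | | | |
| P | | | | | | | | A1/B1 | | |
| C | | | | | | | | | | |
| Y | | | | | | | | Y1 | | |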
Table 75. Program Listing for 'ACT8847 Pipelined Single-Precision Vector Divide
| STEP | REGISTER TRANSFERS | ALU OPERATION | MULTIPLIER OPERATION |
|---|---|---|---|
| 1. | LOAD RA, RB; Y ← P | | DIVIDE |
| 7. | LOAD RA, RB; Y ← P | | DIVIDE |
| 13. | LOAD RA, RB; Y ← P | | DIVIDE |
Note that the microinstructions are presented on the steps indicated (1, 7, 13, ...), with a
six-cycle lapse before the next operands can be input to RA and RB. Performing a vector
divide of two N-element single-precision vectors takes (6N + 2) cycles in pipelined
mode. M such pairs of vectors would require [6(N x M) + 2] cycles in pipelined mode. In
unpiped mode, the equation is 7(N x M).
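The cycle counts quoted above are easy to tabulate. The helper below simply restates the two formulas from the text; its name and argument layout are hypothetical.

```python
def act8847_divide_cycles(n, m=1, pipelined=True):
    """Cycles for M pairs of N-element single-precision vector divides."""
    if pipelined:
        return 6 * n * m + 2  # one divide issued every six cycles, plus a 2-cycle tail
    return 7 * n * m          # unpiped mode, per the formula in the text


# Issue clocks for successive operand pairs in pipelined mode: 1, 7, 13, ...
issue_clocks = [1 + 6 * i for i in range(3)]
```

The issue clocks generated here match the step numbers in the program listing.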
Compare Operations on Data Vectors
In independent ALU mode (unchained), two operands may be compared for equality
(A = B) and order (A > B). Additionally, the absolute values of either or both operands
may be compared. The compare function uses two status bits, the AGTB and AEQB
output signals. (When any operation other than a compare is performed, either
by the ALU or the multiplier, the AEQB signal is used as a zero detect. Hence, numerical
results cannot be output in the same cycle in which comparison status is output.)
For greatest efficiency, programs for compare operations should be written without
requiring conditional branches in the sequencer. If branches can be avoided, the
microcoding is simplified and the programs are immediately scalable to SIMD systems
employing many 'ACT8837 or 'ACT8847 chips.
This section covers vector max/min and list max/min operations.
Vector MAX/MIN
The vector max/min operations compare corresponding elements of data vectors and
select the maximum or minimum value to obtain the components of the output vector.
Hence, for input vectors A and B and output vector Y, each with N components,
$$Y_i = \mathrm{MAX/MIN}(A_i, B_i), \quad 1 \le i \le N.$$
Pipelined Mode
Table 76 shows the suggested data flow for a pipelined vector MAX operation, where Yi
is set to the max of (Ai, Bi) for all i. Included are rows to indicate the setting of the chain
mode instruction bit (I9 for the 'ACT8837, I10 for the 'ACT8847) and the status bit being
sensed.
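Table 76. Data Flow for Pipelined Single-Precision Vector MAX
| CLK | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| CHAIN | N | Y | Y | Y | N | Y | Y | Y | N | Y |
| RA | A1 | A1 | | B1 | A2 | A2 | | B2 | A3 | A3 |
| RB | B1 | | | | B2 | | | | B3 | |
| S | | | | A1 | | B1 | | A2 | | B2 |
| P | | | | | | A1 | | | | A2 |
| C | | | | | | | | | | |
| Y | | | | | | Y1 | | | | Y2 |
| STATUS | | | A>B | | | | A>B | | | |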
A comparison starts at CLK = 1, 5, etc., when the chain-mode instruction bit is low. The
result appears at CLK = 3, 7, etc., indicated by the AGTB and AEQB signals. AGTB is
saved off-chip for use as instruction bit I6 (output source) at CLK 4, 8, etc. This value for
I6 selects the output source, either the multiplier or the ALU result, at CLK 6, 10, etc. For
example, if a comparison result is A > B, the AGTB signal goes high and is used to set I6
high. I6 then selects the multiplier result (Ai) to output. Similarly, if A ≤ B, AGTB and I6
are low, and the ALU result (Bi) is output. The circuitous route taken by Ai on the way to
the P register is necessary because it is not possible to pass RA or RB through the
multiplier in parallel with passing the other through the ALU.
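The selection scheme can be mimicked in software by letting the comparison status index a two-entry table rather than steer a branch. This is a sketch of the idea only; the function name is hypothetical and the tuple stands in for the two output sources.

```python
def vector_max(a, b):
    """Element-wise maximum, with the compare status used as an output select."""
    out = []
    for ai, bi in zip(a, b):
        agtb = int(ai > bi)      # status flag produced by the compare step
        sources = (bi, ai)       # index 0: ALU result (Bi); index 1: multiplier result (Ai)
        out.append(sources[agtb])  # the saved flag plays the role of the output-source bit
    return out
```

Because every element follows the same instruction sequence regardless of the comparison outcome, the same microcode can drive many devices in lockstep.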
The program is not particularly well-packed and produces the vector max of a pair of
vectors of length N in (4N + 2) cycles. For M pairs of vectors of length N, the total time is
(4MN + 2) cycles. The program can be improved by applying the interleaving principle
previously discussed. The steps are rearranged so that a new operation begins every
other cycle, thus allowing two compare programs to be interleaved. Table 77 shows the
suggested data flow for a pipelined vector min/max operation, where Yi = MAX/MIN(Ai,
Bi) and Zi = MAX/MIN(Ci, Di).
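Table 77. Data Flow for Pipelined Single-Precision Interleaved Vector MAX/MIN
| CLK | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CHAIN | N | N | Y | Y | Y | Y | N | N | Y | Y | Y | Y | N | N |
| RA | A1 | C1 | A1 | C1 | B1 | D1 | A2 | C2 | A2 | C2 | B2 | D2 | | |
| RB | B1 | D1 | | | | | B2 | D2 | | | | | | |
| S | | | | | A1 | C1 | B1 | D1 | | | A2 | C2 | B2 | D2 |
| P | | | | | | | A1 | C1 | | | | | A2 | C2 |
| C | | | | | | | | | | | | | | |
| Y | | | | | | | Y1 | Z1 | | | | | Y2 | Z2 |
| STATUS | | | A>B | A>B | | | | | A>B | A>B | | | | |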
Again, Ai (and Ci) reaches the P register by an indirect route. However, this tighter
program performs M vector comparisons, two vector comparisons at a time, in
[6 x N x CEILING(M/2) + 2] cycles. (As previously defined, the ceiling function rounds
to the next highest integer for fractional values.) In this example, two separate vector
comparisons on two-dimensional vectors are performed, giving 6 x 2 x 1 + 2 = 14
cycles. For M = 2 pairs of vectors, all of length N, the second program is as good as the
first. For M > 2, the interleaved program performs increasingly better as M gets larger.
This second program requires more off-chip logic, since the status outputs at CLK 3 and
4 must be saved separately off-chip for use at CLK 5 and 6, respectively. This problem
can easily be avoided by starting the calculations on the second pair of vectors two
cycles later than shown (i.e., at CLK 4). The time necessary to perform the vector MAX
operation on M pairs of N-dimensional vectors, two pairs concurrently, then increases to
[6 x N x CEILING(M/2) + 4] cycles.
Data transfers and operations for the odd lines only are summarized in the program
listing in Table 78. The complete program is obtained by repeating the equivalent of
each odd-numbered line in the next even line for the second pair of vectors.
Table 78. Program Listing for Pipelined Single-Precision Interleaved Vector MAX/MIN
| STEP | REGISTER TRANSFERS | ALU OPERATION | MULTIPLIER OPERATION |
|---|---|---|---|
| 1. | LOAD RA, RB | COMPARE(RA,RB) | |
| 3. | LOAD RA | ADD(RA,0) | |
| 5. | LOAD RA; Y ← P/S | ADD(RA,0) | MULT(S,1) |
Unpiped Mode
Table 79 shows the data flow for an unpiped vector MAX operation.
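Table 79. Data Flow for Unpiped Single-Precision Vector MAX
| CLK | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| CHAIN | N | Y | Y | N | Y | Y | N | Y | Y |
| RA | A1 | A1 | B1 | A2 | A2 | B2 | A3 | A3 | B3 |
| RB | B1 | | | B2 | | | B3 | | |
| S | | | A1 | B1 | | A2 | B2 | | A3 |
| P | | | | A1 | | | A2 | | |
| C | | | | | | | | | |
| Y | | | | Y1 | | | Y2 | | |
| STATUS | | A>B | | | A>B | | | A>B | |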
The status bit is saved off-chip at CLK = 2, 5, etc., and used at CLK = 3, 6, etc., as
the I6 bit of the instruction. I6 selects either the multiplier or ALU result to output to the
Y bus at CLK = 4, 7, etc.
The program computes the vector comparison of M pairs of vectors of length N in
[(3 x M x N) + 1] cycles.
Comparison of Pipelined and Unpiped Operation
Pipelined operation is faster if:
[6 x N x CEILING(M/2) + 2]/Fp < (3 x M x N + 1)/Fu,
where Fp and Fu are the clock rates in pipelined and unpiped modes, respectively. As of
publication, pipelined mode provides faster performance for M > 1.
List MAX/MIN
The list max/min operations select the maximum or minimum value, Z, of a list of N
elements. Hence, for input vector A with N components and output Z,
$$Z = \mathrm{MAX/MIN}(A_i), \quad 1 \le i \le N.$$
List min/max is an essential operation in computer graphics because it is used to find the
"extents" of a polygon or polyhedron. The extents are the maximum values of X, Y, and Z
among the list of vertices for the object in question. Many forms of comparison are
possible since the absolute value of either or both ALU operands may be employed.
However, the example in this section assumes that the largest element of a list of
N elements is desired.
Pipelined Mode
Table 80 shows the data flow for a pipelined list MAX operation,
where M1 = MAX(A1, A2); Mi = MAX[M(i-1), A(i+1)], 2 ≤ i ≤ N - 1.
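Table 80. Data Flow for Pipelined Single-Precision List MAX
| CLK | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CHAIN | Y | N | Y | Y | Y | N | Y | Y | Y | N | Y | Y | Y | Y | Y | Y |
| RA | A1 | A1 | A2 | | | A3 | A3 | | | A4 | A4 | | | | | |
| RB | | A2 | | | | | | | | | | | | | | |
| S | | | A1 | | A2 | | | | M1 | | | | M2 | | | M3 |
| P | | | | | A1 | | | | A3 | | | | A4 | | | |
| C | | | | | | M1 | M1 | | | M2 | M2 | | | M3 | | |
| Y | | | | | | | | | | | | | | | | M3 |
| STATUS | | | | A>B | | | | A>B | | | | A>B | | | | |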
As with vector comparison, the max/min of the absolute values is available, since the
chip operates in independent ALU mode on the comparison steps. The comparison is
between the RA register and the RB register in step 2 and between RA and C in steps 6,
10, etc. In these steps, the chip is switched into unchained, independent ALU mode. The
status is saved off-chip and used to set the SRCC signal, which selects whether the P or
S data goes into the C register in steps 5, 9, etc.
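The running-maximum scheme can be sketched in Python as follows. The variable `c` stands in for the C register, and the comparison flag chooses which of two candidates is written back, in the spirit of the select signal described above. The function names and the vertex data in the usage note are illustrative assumptions.

```python
def list_max(values):
    """Running maximum: M1 = MAX(A1, A2), then Mi = MAX(M(i-1), A(i+1))."""
    c = values[0]                  # stand-in for the C register
    for element in values[1:]:
        agtb = int(element > c)    # compare the new element with the held maximum
        c = (c, element)[agtb]     # the flag selects which candidate is kept
    return c


def extents(vertices):
    """Maximum X, Y, and Z over a list of (x, y, z) vertices."""
    return tuple(list_max([v[axis] for v in vertices]) for axis in range(3))
```

For instance, `extents([(0, 1, 2), (5, -1, 0), (2, 2, 2)])` gives the largest coordinate along each axis of a three-vertex object.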
When the list max is in the C register, at CLK = 4N - 2, the C register contents must
then be passed through one of the functional units to the output. The MAX/MIN of an
N-element list therefore takes 4N cycles. M such vectors can be processed in
[M(4N - 1) + 1] cycles.
Data transfers and operations for the list max operation are summarized in the program
listing in Table 81. The program is carried out in pipelined mode, alternating between
unchained and chained modes. The list max reaches the output in cycle 4N.
Table 81. Program Listing for Pipelined Single-Precision List MAX
| STEP | REGISTER TRANSFERS | ALU OPERATION | MULTIPLIER OPERATION |
|---|---|---|---|
| 1. | LOAD RA | ADD(RA,0) | |
| 2. | LOAD RA, RB | COMPARE(RA,RB) | |
| 3. | LOAD RA | ADD(RA,0) | MULT(S,1) |
| 4. | | | |
| 5. | C ← P/S | | |
| 6. | LOAD RA | COMPARE(RA,C) | |
| 7. | LOAD RA | ADD(C,0) | MULT(RA,1) |
| 8. | | | |
| 9. | C ← P/S | | |
| | REPEAT STEPS 6 THROUGH 9 UNTIL STEP 4N - 2 IS REACHED, THEN: | | |
| 4N - 2 | Y ← S | ADD(C,0) | |
Comparison of Pipelined and Unpiped Modes
The equivalent unpiped program takes [M(3N - 1) + 1] cycles. Pipelined mode is faster
if:
[M(4N - 1) + 1]/Fp < [M(3N - 1) + 1]/Fu,
where Fp and Fu are the clock rates in pipelined and unpiped modes, respectively. As of
publication, pipelined mode provides faster performance for all M and N.
Graphics Applications
This section summarizes the concepts related to creating a three-dimensional image
and examines a few of the matrix operations used in three-dimensional graphics
processing. These operations include coordinate transformations and clipping
operations. Additionally, this section illustrates some of the programming techniques
used to perform these operations.
Creating a 3-D Image
Conceptually, translating 3-D images to 2-D display screens involves defining a view
volume that limits the scope of the vista the viewer can see at one time. For simplicity, a
standardized frame of reference, in which the viewer's eye is located at the origin of the
coordinate system, is adopted in this example.
As illustrated in Figures 76a and 76b, the arbitrary world coordinates of the objects under
scrutiny are transformed into normalized "viewing" or "eye" coordinates that reflect this
frame of reference. Once the normalizing transformation is complete, the images within
the view volume are projected onto a 2-D view plane, which is assumed to be located,
like a projection screen, at a suitable relative distance from the viewer (see Figures 76c
and 77).
A basic model for creating a 3-D view, illustrated in Figure 78a, transforms arbitrary world
coordinates to normalized viewing coordinates and then "clips" the image to remove
lines that do not fall within the normalized view volume. Clipping is followed by projecting
the image to the 2-D projection plane (or "window"). The image is then mapped onto a
canonical 2-D viewport display and from there onto the physical device.
To incorporate image transformations, another model must be adapted (see Figure 78b).
After clipping, instead of projecting to the view plane, a perspective transformation is
performed on the clipped viewing coordinates, transforming the view volume into a 3-D
viewport, the "screen system" in which image transforms are performed. Then the image
is projected to the 2-D viewport display and onto the physical device.
In both models, the clipping operation is performed on coordinates in the viewing
system. This approach is referred to as "clipping in the eye system." In practice, clipping
is often performed after transformation to the screen system. A trivial accept/reject test is
performed on viewing coordinates, the image is transformed to the screen system, and
then clipping is performed.
Figure 76a. In the sequence of transformations, the world coordinate positions for the house are
transformed into the normalized viewing coordinate system (also called the eye system). For clarity,
the house is pictured outside the view volume. Also shown are the direction vectors VUP (view up),
VPN (view normal), and VUP' (the projection of VUP parallel to VPN onto the view plane).
Figure 76b. After a series of translations,
rotations, and shearing and scaling
operations, the view volume becomes the
canonical perspective projection view volume,
which is a truncated pyramid with apex at the
origin, and the house has been transformed
from the world to the viewing coordinate
system.
Figure 76c. This figure illustrates the
projection of the house from the perspective
of the viewer, with eye located at the origin of
the coordinate system.
Figure 76. Creating a 3-D Image
J. D. Foley and A. Van Dam, Fundamentals of Interactive Computer Graphics, Addison-Wesley Publishing
Company, Reading, MA, 1982,291-293. Reprinted with permission.
The following sections illustrate programming techniques used in both of these
approaches to normalizing, clipping, and transforming a 3-D image. The operations are
grouped as "3-D Coordinate Transforms," "Clipping in the Eye System," and "Clipping in
the Screen System."
[Figure 77 labels: world coordinate system; view volume; projection plane; viewing (eye) coordinate system]
Figure 77. View Volume
Adapted with permission from a paper by Stephen R. Black entitled "Digital Processing of 3-D Data to Generate
Interactive Real-Time Dynamic Pictures" from Volume 120 of the 1977 SPIE journal "Three Dimensional
Imaging."
[Figure 78a flow: 3-D world coordinates -> transform to eye system -> viewing coordinates -> clip -> project to window -> 2-D viewing coordinates -> transform to 2-D viewport -> 2-D normalized device coordinates -> transform to physical device]
Figure 78a. Model of Procedure for Creating a 3-D Graphic
[Figure 78b flow: 3-D world coordinates -> transform to eye system -> viewing coordinates -> clip -> transform to screen system -> 3-D normalized device coordinates -> 3-D image transform -> projection to 2-D -> 2-D normalized device coordinates -> transform to physical device]
Figure 78b. Model for Creating and Transforming a 3-D Image
Three-Dimensional Coordinate Transforms
One of the computationally intensive functions of a 3-D computer graphics system is that
of transforming points within the object space, such as translating an object or rotating
an object about an arbitrary axis. Equally complex is the transformation of points within
the object space (or "world coordinate system") into points defined by a particular
perspective and located within the viewing space (or "eye coordinate system"). This
latter process, known as the viewing transformation, generates points in a left-handed
Cartesian system with the eye at the origin and the z-axis pointing in the direction of view.
The arbitrary world-system view volume and the objects therein are translated, rotated,
sheared, and scaled to match the predefined, canonical view volume of the eye system.
For a "realistic" image, the canonical view volume will be a truncated pyramid that mimics
the cone of vision available to the human eye. Alternatively, the volume can be a unit
cube. The series of operations that make up each transformation differ, but if
homogeneous coordinates are used, either transformation can be expressed as a
simple matrix multiply.
For each point (X, Y, Z) in the world system, a projection in homogeneous coordinates is
denoted by (Xh, Yh, Zh, Wh) where
(Xh, Yh, Zh, Wh) = (X x Wh, Y x Wh, Z x Wh, Wh),
and Wh is simply a scale factor, typically unity when floating point numbers are used.
(With fixed point values, nonunity values of Wh are used to maximize use of the numeric
range.) To transform a point in homogeneous coordinates, it is post-multiplied by a 4 x 4
transform matrix:
[Xh', Yh', Zh', Wh'] = [Xh, Yh, Zh, Wh] x | A11 A12 A13 A14 |
                                          | A21 A22 A23 A24 |
                                          | A31 A32 A33 A34 |
                                          | A41 A42 A43 A44 |
The transformed point can later be converted back to 3-space by dividing by Wh.
The transform matrix is constructed by multiplying together a sequence of matrices,
each of which performs a simple task. The product of 4 or 5 elementary matrices may be
used to perform some complex overall operation on a set of points representing an
object or an entire scene. Once constructed, the transform matrix is used on each point
of the object to be transformed.
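The row-vector convention described above can be sketched in a few lines of Python. This is only an illustration, not the microcode used on the '8837 or '8847: the function name `transform_point` and the example matrix `TRANSLATE` are hypothetical, and the division at the end uses the scale factor of the transformed point.

```python
def transform_point(point, matrix, wh=1.0):
    """Post-multiply the homogeneous row vector of `point` by a 4 x 4 matrix."""
    x, y, z = point
    row = [x * wh, y * wh, z * wh, wh]            # (Xh, Yh, Zh, Wh)
    out = [sum(row[k] * matrix[k][j] for k in range(4)) for j in range(4)]
    w = out[3]                                    # scale factor after the transform
    return (out[0] / w, out[1] / w, out[2] / w)   # back to 3-space


# Hypothetical example: translation by (1, 2, 3). With row vectors the
# translation terms sit in the fourth row (A41, A42, A43).
TRANSLATE = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 2.0, 3.0, 1.0],
]
```

Because the result is divided by the scale factor, the same 3-space point comes back whatever value of `wh` is chosen for the input.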
This section describes two approaches to the viewing transformation: the general case
and the specific yet typical case in which a reduced version of the transform matrix may
be used. Performance times are given for 15-MHz and 30-MHz frequencies, which
roughly correspond to the operating speeds of the '8837 and '8847, respectively.
Operation with General Transform Matrix
Table 82 shows part of the data flow for the pipelined and chained program for the
product of the homogeneous point [X, Y, Z, W] and the 4 x 4 transform matrix A.
Table 82. Partial Data Flow for Product of [X, Y, Z, W] and General Transform Matrix
CLK   1     2     3     4     5     6     7     8     9     10
RA    X     Y     Z     W     X     Y     Z     W     X     Y
RB    A11   A21   A31   A41   A12   A22   A32   A42   A13   A23
[Rows P, S, C, and Y carry the products P1(1) through P4(2), the sums S1(1) through S1(2), and the result T1; their cycle alignment is not recoverable from this copy.]
The technique is that already illustrated for the sum of products operation. The numbers
in parentheses indicate which column of the transform matrix is involved in the operation.
Here, P1(i) = X x A1i, P2(i) = Y x A2i, etc. S1(i) = P1(i) + 0, S3(i) = S1(i) + P3(i), S4(i)
= P2(i) + P4(i), and Ti = S3(i) + S4(i). T1 = X', T2 = Y', T3 = Z', T4 = W'. As in the sum
of products illustration, in order to make the most efficient use of the S register, P2 is
used directly instead of summing by 0 to form S2.
The time to transform N points in a system is 16N + 6 cycles. The system can transform
approximately 0.94 million points per second at a clock rate of 15 MHz and 1.875 million
points per second at a clock rate of 30 MHz.
Operation with the Reduced Transform Matrix and Wh = 1
Because viewing transformations are frequently carried out using a single-vanishing-point
perspective, the 3 x 1 column that performs perspective transformations with
multiple vanishing points is often not used. Additionally, with Wh = 1, the 1 x 1 scale
factor is often equal to one. In these cases, the transform matrix takes the following form:
| ... 0 |
| ... 0 |
| ... 0 |
| ... 1 |
With multiple vanishing points, and in other graphics operations such as clipping, 4 x 4
matrices are used with nonzero values in the fourth column. The transform matrix is
termed "reduced" when its fourth column is the same as that previously shown. In such
cases, the transform of each point requires only 9 multiplications and 9 additions.
Table 83 shows part of the data flow for the reduced matrix program.
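Table 83. Partial Data Flow for Product of [X, Y, Z, W] and Reduced Transform Matrix
[Data-flow table with rows RA, RB, S, P, and Y against clock cycles 1 through 9. RB is loaded with A11, A21, A31, A41, A12, A22, A32, A42, and A13 in successive cycles, with X, Y, and Z presented on RA alongside the first-, second-, and third-row elements. The remaining rows carry P1(1) through P3(2), S1(1), S2(1), and T1 = X'; their cycle alignment is not recoverable from this copy.]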
Again, the numbers in parentheses refer to the column of the transform matrix involved in
the operation. In this case, however, only the first three columns are used. Hence, for
1 <= i <= 3, P1(i) = X x A1i, P2(i) = Y x A2i, etc. S1(i) = P1(i) + A4i, S2(i) = P2(i) + P3(i),
and Ti = S1(i) + S2(i). T1 = X', T2 = Y', T3 = Z'. Note that W values are not calculated
since they are all 1.
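The reduced-matrix sequence can be written out directly from the definitions of P, S, and T given above. The sketch below is an illustration in Python rather than the chained '8837/'8847 program; the function name `reduced_transform` is hypothetical, and the matrix is indexed from zero, so `a[3][i]` stands for A4i.

```python
def reduced_transform(point, a):
    """Transform (X, Y, Z) with W = 1 by a 4 x 4 matrix whose fourth
    column is (0, 0, 0, 1); only the first three columns are evaluated."""
    x, y, z = point
    result = []
    for i in range(3):          # columns 1 through 3 of the matrix
        p1 = x * a[0][i]        # P1(i) = X x A1i
        p2 = y * a[1][i]        # P2(i) = Y x A2i
        p3 = z * a[2][i]        # P3(i) = Z x A3i
        s1 = p1 + a[3][i]       # S1(i) = P1(i) + A4i
        s2 = p2 + p3            # S2(i) = P2(i) + P3(i)
        result.append(s1 + s2)  # Ti = S1(i) + S2(i)
    return tuple(result)        # (X', Y', Z'); W' stays 1
```

Each pass through the loop uses three multiplications and three additions, which gives the 9 multiplications and 9 additions per point quoted for the reduced matrix.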
The time to transform N points in a system is (12N + 5) cycles. The system can transform
1.25 million points per second at 15 MHz and 2.5 million points per second at 30 MHz.
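The quoted rates follow from the per-point cycle counts: 16N + 6 cycles for the general matrix (stated earlier in this section) and 12N + 5 for the reduced matrix. The short Python sketch below reproduces the arithmetic; the helper names are hypothetical, and the fixed start-up overhead is ignored for the steady-state rate, which appears to be how the figures in the text were obtained.

```python
def total_cycles(n_points, cycles_per_point, overhead):
    """Cycle count for n_points, e.g. 16N + 6 or 12N + 5."""
    return cycles_per_point * n_points + overhead


def points_per_second(clock_hz, cycles_per_point):
    """Steady-state rate, ignoring the fixed start-up overhead."""
    return clock_hz / cycles_per_point


GENERAL = (16, 6)   # (cycles per point, overhead) for the general matrix
REDUCED = (12, 5)   # same for the reduced matrix
```

For example, `points_per_second(15e6, GENERAL[0])` gives 937,500 points per second, which the text rounds to 0.94 million.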
Three-Dimensional Clipping
Once an image is transformed into viewing coordinates, it must be clipped so that lines
extending outside the view volume are removed. There are several approaches to
clipping, some more efficient than others. This section surveys the most commonly used
techniques and estimates the throughput of several single- and multi-processor
arrangements.
First considered is the technique of fully clipping the line segments to fit within the
viewing pyramid in the eye coordinate system. This technique is commonly referred to
as "clipping before division."
Clipping in the screen system is considered second. This method eliminates lines that
are obviously invisible in the eye system; the rest are clipped after projection to the
screen.
Clipping in the Eye System
If an object is composed of straight line segments and a perspective view is to be taken,
the viewing volume is a pyramid defined by the following plane equations:
X = K x Z, X = -K x Z, Y = K x Z, Y = -K x Z,
where K is a constant to be defined below. Thus, -KZ < (X, Y) < KZ. Two other clipping
planes are usually employed at Z = N and Z = F, where N and F are the near and far
limits, respectively, of the view. This gives:
N < Z < F.
Looking in the direction of the z-axis (see Figure 79), the eye can imagine a screen
located at a distance N from the eye. K is formed from the half-screen height divided
by N. A specific line segment might intersect any or all of the six clipping planes. One
common approach to this problem is to use six processors in a pipeline, each clipping
the line to one plane.
[Figure 79 labels: screen at distance N from the eye; X and Y clipping planes at +KZ and -KZ; far viewing limit]
Figure 79. Viewing Pyramid Showing Six Clipping Planes
Consider the case of clipping the line defined by the points P1 = (X1, Y1, Z1) and
P2 = (X2, Y2, Z2) against the Z = N plane. First computed are (Z1 - N) and (Z2 - N). If
both are negative, the line is invisible, and a notation meaning an empty line is passed
on. If both are positive, both ends of the line are on the visible side of the Z = N plane,
and the line is passed on unclipped.
When one of these computed values is negative and the other positive, the line must be
clipped and the new values for its endpoints passed down the rest of the pipeline. To do
so, a parameter t that indicates what fraction of a segment Z1Z2, and therefore of P1P2
as a whole, lies on the P1 side of the Z = N plane, is computed as follows:
t = (Z1 - N)/(Z1 - Z2).
In general, the value of the parameter is derived as described in Newman and Sproull,1
using the following equations of the line: X = X1 + (X2 - X1)u; Y = Y1 + (Y2 - Y1)u;
Z = Z1 + (Z2 - Z1)u. These equations are each inserted into the corresponding plane
equation. In the current example, N = Z1 + (Z2 - Z1)t.
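The three cases for the Z = N plane (reject, pass, clip) can be sketched as follows. This is a plain Python illustration under stated assumptions, not the pipelined sequence used on the parts: the function name `clip_to_near_plane` is hypothetical, `None` stands in for the "empty line" notation, and the clipped endpoint is obtained by substituting t into the line equations quoted above.

```python
def clip_to_near_plane(p1, p2, n):
    """Clip the segment p1-p2 against the plane Z = n.

    Returns the visible part as a pair of points, or None if the whole
    segment lies on the invisible side of the plane.
    """
    d1 = p1[2] - n                      # Z1 - N
    d2 = p2[2] - n                      # Z2 - N
    if d1 < 0 and d2 < 0:
        return None                     # invisible: pass on an empty line
    if d1 >= 0 and d2 >= 0:
        return p1, p2                   # fully visible: pass on unclipped
    t = d1 / (p1[2] - p2[2])            # t = (Z1 - N)/(Z1 - Z2)
    hit = tuple(a + (b - a) * t for a, b in zip(p1, p2))
    # The sign of Z1 - N selects which end is replaced by the intersection.
    return (hit, p2) if d1 < 0 else (p1, hit)
```

The same value of t serves for either endpoint because it is measured from P1 along the segment.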
Since N is between Z1 and Z2, t is always positive, and the signs of Z1 - N and Z2 - N
are used to determine which end to clip. If Z1 - N is negative, the P1 end is clipped,
[Table residue: microinstruction sequence for clipping against the Z = N plane, with columns for operand loads (LOAD RA, RB), register transfers, ALU operation (ADD), and multiplier operation (MULT). Individual entries are not recoverable from this copy.]
In pipelined mode, computing (Z1 - Z2) takes 2 cycles. This value is passed off-chip and
used to get the first approximation to 0.5/(Z1 - Z2) from an 8-bit seed ROM. Iteration to
correctly determine the value begins in the 4th cycle, with subsequent operations
starting on even-numbered cycles. The computations of H1' and H2' are interleaved with
the divide algorithm and are completed before it.
(X2 - X1), (Y2 - Y1), and (Z2 - Z1) are also computed during the divide. The values of
t1 and t2 are ready in steps 18 and 19. New values of X1, X2, Y1, Y2, Z1, and Z2 are all
computed and output by step 28. Each chip, therefore, clips against one clipping plane
in 28 cycles. With a two-cycle overlap, the next line segment can be presented in cycle
26.
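The passage does not spell out the iteration that refines the seed value. The sketch below assumes a Newton-Raphson reciprocal refinement, which is a common choice for this kind of seeded divide but is an assumption here, not a statement of the parts' algorithm. The seed ROM is imitated by cutting the mantissa to 8 bits; `seed_reciprocal` and `reciprocal` are hypothetical names, and the factor of 0.5 in the text is left to the caller.

```python
import math


def seed_reciprocal(d, bits=8):
    """Imitate a seed-ROM lookup: 1/d with the mantissa cut to `bits` bits."""
    if d == 0.0:
        raise ValueError("reciprocal of zero")
    mantissa, exponent = math.frexp(abs(d))     # abs(d) = mantissa * 2**exponent
    quantized = math.floor(mantissa * (1 << bits)) / (1 << bits)
    return math.copysign(math.ldexp(1.0 / quantized, -exponent), d)


def reciprocal(d, iterations=3):
    """Refine the seed with Newton-Raphson steps (assumed iteration)."""
    x = seed_reciprocal(d)
    for _ in range(iterations):
        x = x * (2.0 - d * x)                   # relative error is squared each step
    return x
```

With a seed good to roughly 7 bits, each step about doubles the number of correct bits, so a small fixed number of iterations suffices.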
For the two X and two Y clipping planes, the calculations are slightly more complicated.
For the X = KZ plane, the two parameters ti are defined in terms of the values W1 = KZ1,
W2 = KZ2 and H1 = W1 - X1, H2 = W2 - X2 as follows:
t1 = |H1'/[2(H1 - H2)]| and t2 = |H2'/[2(H1 - H2)]|,
where, as before, Hi' = Hi - |Hi|. The equations for the new endpoints, (X1', Y1', Z1')
and (X2', Y2', Z2'), are the same as before. It is still possible to compute the new
endpoints in under 30 cycles. At 15 MHz, a six-chip '8837 system would clip 577,000 line
segments per second.
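The branch-free form of t1 and t2 can be checked numerically. In the Python sketch below, `plane_parameters` is a hypothetical name and the handling of H1 = H2 is an added assumption (the segment never crosses the plane in that case, so no clipping fraction is needed). Since Hi' is 2Hi when Hi is negative and zero otherwise, each parameter is nonzero only for an endpoint that lies beyond the X = KZ plane.

```python
def plane_parameters(p1, p2, k):
    """Return (t1, t2) for clipping segment p1-p2 against X = K x Z."""
    h1 = k * p1[2] - p1[0]          # H1 = W1 - X1 with W1 = K x Z1
    h2 = k * p2[2] - p2[0]          # H2 = W2 - X2 with W2 = K x Z2
    h1p = h1 - abs(h1)              # H1' = 2 x H1 if H1 < 0, else 0
    h2p = h2 - abs(h2)              # H2' likewise
    denom = 2.0 * (h1 - h2)
    if denom == 0.0:
        return 0.0, 0.0             # assumption: no crossing, nothing to clip
    return abs(h1p / denom), abs(h2p / denom)
```

As an interpretation rather than something the text states, t1 is the fraction of the segment to remove starting from P1, so a clipped X1 would be X1 + t1 x (X2 - X1), and similarly for t2 measured from P2.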
In the '8847 a similar process is employed, but the built-in divide instruction is used
beginning in step 7 and ending in step 15. t1 and t2 are calculated by step 18, and the
entire operation completes in step 27, one cycle shorter than for the '8837. The data flow
is shown in Table 86. A six-processor '8847 system operating at 30 MHz would clip
1.2 million line segments per second with a new operation beginning every 25 cycles.
Table 86. Data Flow for Clipping a Line Segment Against the Z = N Plane Using the SN74ACT8847
[Data-flow table with rows RA, RB, S, P, C, Y, and STATUS against clock cycles 1 through 18; steps 20 through 28 are the same as for the '8837. Individual entries are not recoverable from this copy.]
Since the performance levels obtained from the six-chip systems described above are
slower than the rate of endpoint transformation by a single-chip system, some further
speed improvement is desirable. Hence, rather than going through the code for clipping
to the X and Y planes, another approach is proposed.
Clipping to All Six Planes at a Time
The "window edge clipping method" derived in Newman and Sproull can be used to clip
to all six planes at once. Recall that the viewing volume for a perspective view is a
pyramid defined by the following plane equations:
X = K x Z, X = -K x Z, Y = K x Z, Y = -K x Z, Z
To take advantage of this speedup, the only change in the sequence given above is that
while computing Q and R, the logical AND and OR is formed for the signs of the
corresponding pairs of values, Qi and Ri. This is best performed off-chip if the '8837 is
being used but may be done using independent ALU (unchained) mode in the '8837 or a
logical operation in the '8847. For the '8837, with two operands Qi and Ri, Table 89
shows the A > B status bit for an A > B comparison on A = -Qi x |Ri| and B = |Qi| x Ri
for all signs of Qi and Ri.
Table 89. A > B Comparison Function Table
Sign Qi   Sign Ri   Sign A = -Qi x |Ri|   Sign B = |Qi| x Ri   A > B   A = B
   -         -               +                    -              T       F
   -         +               +                    +              F       T
   +         -               -                    -              F       T
   +         +               -                    +              F       F
The A > B status provides the needed AND function of the sign bits of Qi and Ri. In
computing these A > B values, if A > B is TRUE, the sequencer branches to code that
rejects the line as invisible. A comparison A > B of A = (Qi x |Ri|) and B = (-|Qi| x Ri)
gives the logical AND of the complement of the sign bits. It is TRUE when both Qi and Ri
are positive. If all six values are TRUE, the sequencer can branch to code that passes the
line segment unclipped.
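The way a magnitude comparison yields a logical AND of sign bits can be verified by trying all four sign combinations. The Python sketch below uses hypothetical function names. The first comparison follows the A and B of Table 89. For the second, B is taken as -|Qi| x Ri, an assumption chosen because it is the form that is TRUE only when both values are positive.

```python
def both_negative(q, r):
    """A > B with A = -Q x |R| and B = |Q| x R: AND of the sign bits."""
    a = -q * abs(r)
    b = abs(q) * r
    return a > b


def both_positive(q, r):
    """A > B with A = Q x |R| and B = -|Q| x R (assumed sign placement):
    AND of the complemented sign bits."""
    a = q * abs(r)
    b = -abs(q) * r
    return a > b
```

In both cases |A| equals |B|, so A > B can only hold when A is positive and B is negative, and that happens for exactly one of the four sign combinations.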
For a three-processor parallel system, lockstep operation with a single sequencer is still
possible since all three processors are working on the same line segment, and the
branch options apply equally to them all. The estimated time for a three-processor
system is 56 cycles; not much interleaving is possible.
Now that the operations have been reduced to a minimum, the remaining steps are
necessarily sequential. Rejecting invisible or passing totally visible line segments without
division, however, is still beneficial.
Clipping in the Screen System
In most graphics systems, full line clipping is not performed in the eye system. Instead, a
trivial accept/reject test is performed, in which the line segments are simply tested
against the six clipping planes. If a line has both ends on the invisible side of any one of
the clipping planes, it is rejected. Lines surviving this test may still be outside the viewing
pyramid. In any case, the lines are transformed to the screen coordinate system and
then clipped against a cube defined by the simple plane equations -1 < (X, Y, Z) < 1.
The next three sections describe this process.
Trivial Accept/Reject Test
In the eye system, the clipping planes are:
X = W, X = -W, Y = W, Y = -W, Z = N, and Z = F,
where W = K x Z. After -W1 and -W2 are computed, a sequence of comparison
operations is performed, summarized as follows:
with X1 in RB and -W1 in P,    P > RB (i.e., -W1 > X1)
with X1 in RA and -W1 in C,    RA > |C| (i.e., X1 > W1)
with Y1 in RB and -W1 in C,    C > RB
with Y1 in RA,                 RA > |C| comparison
with Z1 in RB and N in RA,     RA > RB (i.e., N > Z1)
with Z1 in RA and F in RB,     RA > RB (i.e., Z1 > F).
These six operations are carried out in successive cycles and then repeated for (X2, Y2,
Z2). The two six-tuples are saved off-chip and a bit-wise AND is carried out. If any one of
the resulting six boolean values is TRUE, the line is rejected. This entire operation takes
only 16 cycles, thereby providing a speed of 1,071,000 line segments per second at
15 MHz and 2,143,000 line segments per second at 30 MHz (one line segment every
14 cycles). The data flow for an accept/reject test is given in Table 90. Accept/reject
testing of individual points takes only 8 cycles.
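The six comparisons per endpoint and the bit-wise AND of the two six-tuples amount to the following test. This is a Python sketch with hypothetical names, not the Table 90 data flow. The `trivially_accepted` helper is an added assumption for the complementary case in which no comparison is TRUE for either endpoint.

```python
def six_tuple(point, k, near, far):
    """Six comparisons for one endpoint, with W = K x Z."""
    x, y, z = point
    w = k * z
    return (-w > x, x > w, -w > y, y > w, near > z, z > far)


def trivially_rejected(p1, p2, k, near, far):
    """TRUE if both ends lie on the invisible side of any one plane."""
    c1 = six_tuple(p1, k, near, far)
    c2 = six_tuple(p2, k, near, far)
    return any(a and b for a, b in zip(c1, c2))   # bit-wise AND, then OR


def trivially_accepted(p1, p2, k, near, far):
    """Assumed companion test: neither end is outside any plane."""
    return not any(six_tuple(p1, k, near, far) + six_tuple(p2, k, near, far))
```

A segment that is neither rejected nor accepted straddles at least one plane and goes on to the later clipping stage.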
Table 90. Data Flow for Accept/Reject Testing
[Data-flow table with rows CHAIN, RA, RB, S, P, C, Y, and STATUS against clock cycles 1 through 16. The STATUS row reports, in order: -W1 > X1, X1 > W1, -W1 > Y1, Y1 > W1, N > Z1, Z1 > F, -W2 > X2, X2 > W2, -W2 > Y2, Y2 > W2, N > Z2, Z2 > F. Other entries are not recoverable from this copy.]
Transformation to the Screen System
After the line segments have passed the trivial accept/reject test, they are transformed to
the screen coordinate system. The following transformation is first applied to the Z
coordinate in order to scale its clipping planes to Z' = -W and Z' = W:
Z' = [-W x (F + N)]/(F - N) + (2 x W x Z)/(F - N).
The value of 1/(F - N) is constant for all line segments and is therefore computed only
once. In fact, two constants, a = 2K/(F - N) and b = -(F + N)/2, can be available so that
Z' = Z x a x (b + Z). (Note that other transformations on Z can also be used.)
After the trivial accept/reject test, the following transformation to the screen system
occurs:
Xs = X/W, Ys = Y/W, Zs = Z'/W.
The clipping planes then have these equations:
Xs = -1, Xs = 1, Ys = -1, Ys = 1, Zs = -1, Zs = 1.
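The screen-system mapping can be checked by confirming that points on the eye-system clipping planes land on the faces of the cube. The Python sketch below uses the constants a = 2K/(F - N) and b = -(F + N)/2 with W = K x Z; the function name `to_screen` is hypothetical, and the division is written directly rather than as a multiplication by a precomputed reciprocal 1/W.

```python
def to_screen(point, k, near, far):
    """Map an eye-system point to screen coordinates (Xs, Ys, Zs)."""
    x, y, z = point
    w = k * z                        # W = K x Z
    a = 2.0 * k / (far - near)       # constant for all line segments
    b = -(far + near) / 2.0          # constant for all line segments
    z_prime = z * a * (b + z)        # Z' = Z x a x (b + Z)
    return (x / w, y / w, z_prime / w)
```

With K = 0.5, N = 1, and F = 9, a point on the near plane gives Zs = -1 and a point on the far plane gives Zs = 1, while points with X = W or Y = -W give Xs = 1 or Ys = -1.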
Z1' and Z2' can be formed in 8 cycles. Only two reciprocals, 1/W1 and 1/W2, need to be
computed, and they can be interleaved and completed in 13 cycles in an '8837 if an 8-bit
seed ROM is employed and in 12 cycles in an '8847. The line segment is transformed to
the screen system in a further 6 cycles. The total is 26 cycles for the 'ACT8847 and
27 cycles for the 'ACT8837. A single-processor system would transform 600,000 line
segments per second with a 15 MHz clock and 1.2 million line segments per second at
30 MHz.
Note that the above projection does not preserve planarity. See Newman and Sproull for
perspective projections that do preserve planes.
The Clipping Operation
The final operation on line segments is to clip them to the cube:
Xs = 1, Xs = -1, Ys = 1, Ys = -1, Zs = 1, and Zs = -1.
It is important to realize that the required resolution of Xs, Ys, and Zs may only be 10 or
11 bits. Any divisions needed in an '8837 implementation at this stage could feasibly be
done entirely by table look-up. It would certainly not be necessary to perform more than
one iteration if an 8-bit seed ROM is employed. Two divisions can therefore be
interleaved and completed in 7 cycles. However, three iterations are assumed in this
example to give full single-precision accuracy.
Consider a three-processor pipeline, with each processor clipping against two parallel
planes. The first will clip against the X planes -1 < X < 1. For clipping the P1 end of the
line segment, Q = (1 + X1, 1 - X1) is computed and Q' is formed, where Qi' = Qi - |Qi|,
i.e.,
Q1' = 2(1 + X1), if (1 + X1) < 0; Q1' = 0 otherwise.
Q2' = 2(1 - X1), if (1 - X1) < 0; Q2' = 0 otherwise.
At least one of Qi' will be zero; the other will be negative. Hence, MIN(Q1', Q2') = Q1'
+ Q2' = [(1 + X1) - |1 + X1|] + [(1 - X1) - |1 - X1|]. Therefore, MIN(Q1', Q2') = (1
- |X1|) - |1 - |X1||. So, t = |(m1 - |m1|)/2d| and s = |(m2 - |m2|)/2d|, where
mi = 1 - |Xi|, and d = X1 - X2. Note that only one reciprocal is required per processor.
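The identity for MIN(Q1', Q2') and the resulting parameters t and s can be checked with a few values. The Python sketch below is an illustration with hypothetical names; the handling of d = 0 is an added assumption (the segment does not cross the X planes in that case), and reading t as the fraction to remove from the P1 end is an interpretation rather than something the text states.

```python
def min_identity_holds(x1):
    """Check MIN(Q1', Q2') = Q1' + Q2' = (1 - |X1|) - |1 - |X1||."""
    q1p = (1 + x1) - abs(1 + x1)
    q2p = (1 - x1) - abs(1 - x1)
    m = 1 - abs(x1)
    return min(q1p, q2p) == q1p + q2p == m - abs(m)


def x_clip_parameters(x1, x2):
    """Return (t, s) for clipping against the planes X = -1 and X = 1."""
    m1 = 1.0 - abs(x1)
    m2 = 1.0 - abs(x2)
    d = x1 - x2
    if d == 0.0:
        return 0.0, 0.0              # assumption: nothing to clip in X
    t = abs((m1 - abs(m1)) / (2.0 * d))
    s = abs((m2 - abs(m2)) / (2.0 * d))
    return t, s
```

For X1 = 2 and X2 = 0, the sketch gives t = 0.5 and s = 0, and moving P1 halfway toward P2 puts its X coordinate on the plane X = 1.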
A three-processor parallel system would have each processor work on one dimension,
[Package outline drawing: body 35.1 (1.380) / 32.5 (1.280) square; pins on 2.54 (0.100) T.P. centers, columns 1 through 13; pin diameter 0.406 (0.016) typical; 30.5 (1.200) REF; see Note A.]
ALL POSSIBLE PIN LOCATIONS ARE SHOWN. SEE APPLICABLE PRODUCT
DATA SHEETS FOR ACTUAL PIN LOCATIONS USED.
NOTE A: Pins are located within 0,13 (0.005) radius of true position relative to each other at
maximum material condition and within 0,381 (0.051) radius relative to the center of
the ceramic.
ALL LINEAR DIMENSIONS ARE IN MILLIMETERS AND PARENTHETICALLY IN INCHES
13 x 13 GC pin grid array ceramic package
[Package outline drawing: body 35.1 (1.380) / 32.5 (1.280) square with index corner mark or chamfer; pins on 2.54 (0.100) T.P. centers, columns 1 through 13; pin diameter 0.406 (0.016) typical; 30.5 (1.200) REF; see Note A.]
ALL POSSIBLE PIN LOCATIONS ARE SHOWN. SEE APPLICABLE PRODUCT
DATA SHEETS FOR ACTUAL PIN LOCATIONS USED.
NOTE A: Pins are located within 0,13 (0.005) radius of true position relative to each other at
maximum material condition and within 0,381 (0.051) radius relative to the center of
the ceramic.
ALL LINEAR DIMENSIONS ARE IN MILLIMETERS AND PARENTHETICALLY IN INCHES
15 x 15 GB pin grid array ceramic package
[Package outline drawing: body 40.1 (1.580) / 37.6 (1.480) square with index corner; a 4.95 (0.195) dimension is shown. Remaining detail is not recoverable from this copy.]