ILLIAC_IV_System_Study_Progress_Report_No_2_Oct66 ILLIAC IV System Study Progress Report No 2 Oct66

User Manual: ILLIAC_IV_System_Study_Progress_Report_No_2_Oct66

Open the PDF directly: View PDF PDF.
Page Count: 74

DownloadILLIAC_IV_System_Study_Progress_Report_No_2_Oct66 ILLIAC IV System Study Progress Report No 2 Oct66
Open PDF In BrowserView PDF
OCTOBER 13, 1966

illlAC IV

SYSTEM STUDY
PROGRESS REPORT NO. 2

SUBMITTED TO

UNIVERSiTY OF ILLINOIS

UNIVERSITY OF ILLINOIS PURCHASE ORDER NO. 09852-B

CONTENTS

Page

1-1

SECTION I

INTRODUCTION

•

SECTION II

THE ILLlAC IV SYSTEM
ROUTING
Shifting at the PE Level.
Shifting at the Quadrant Level.
Inter-Quadrant Transfers •
Machine Instruction •
THE MULTIPLY ALGORITHM
EIGHT-BIT WORD LENGTHS
COMMON DATA PATHS.
Input-Output Buffering •. •
From Control Unit to Proces sing Element.
From Processing Element to Control Unit.
Evaluation .
MODE.
Introductory Considerations
The Problem
Back-up Storage in the Control Unit.
Back-up Storage in Processing Element
Functional Capability.
Comparison.

2-1
2-1
2-3
2-3
2-7
2-8
2-8
2-12
2-14
2-14
2-15
2-16
2-16
2-17
2-17
2-18
2-18
2-19
2-20
2-20

SECTION III

REVISED PE LOGIC DESIGN •
PROCESSING ELEMENT DESCRIPTION
MEMORY DA TA REGISTER (MDR) •
OPERAND SELECT GATES (OSG) •
A-B REGISTER (ACCUNIULA TOR)
LOOIC UNIT
BARREL SWITCH.
LEADING ONES DETECTOR •
MULTIPLJCAND SELECT GATES

3-1
3-1
3-1
3-4
3-4
3-4
3-5
3-5
3-5

•

- iii-

CONTENTS (Cont. )

Page
(Continued)
PSEUDOADDER TREE
CARRY PROPAGATE ADDER (CPA)
MODE REGISTER.
ADDRESS REGISTER.
PE INDEX REGISTER
MEMORY ADDRESS REGISTER (MAR) .
SPECIAL REGISTER (SR)

3 -5
3 -5
3-6
3-6
3-6
3-7
3-7

SECTION IV

THE I/O SUBSYSTEM
GENERAL CONSIDERATIONS.
DISK STORAGE.

4-1
4-1
4-2

SECTION V

PROGRAMMING
REVISED PE INSTRUCTIONS .
CONTROL UNIT INSTRUCTIONS.
Elements of the Control Unit
Discussion of the Instructions.

5 -1
5 -1
5-4
5-6
5-6

SECTION VI

ILLIAC IV APPLICATIONS STUDY.
DESCRIPTION OF THE COOLEY-TUKEY ALGORITHM
MACHINE I:=\lPLEMENTA TION
IMPLEMENTATION ON ILLIAC IV .
STORAGE OF THE COEFFICIENTS A(k)
COMPUTATION AND STORAGE OF INTERMEDIATE
RESULTS.
COMPUTA TIONS REQUIRED

6-1
6-1
6-3
6-4
6-6

CIRCUIT DESIGN - THE ECL CIRCUIT

7 -1

SECTION III

.

SECTION VII

-iv-

6-6
6-10

LIST OF ILLUSTRA TIONS

Figure

2-1
2-2
2-3
2-4

Page

2-2
2-2
2-6

PE Level Shifting •
Quadrant Nearest Neighbor Connections.,
Pseudo Physical Layout of a Quadrant •
"Nearest Neighbors" Arrangement of Full Array,
Showing Quadrant Subarrays •
Logic Elements Involved in Multiply Algorithm •
System for Handling Data From Control Unit to
Processing Element, Block Diagram •

2-10

3-1

Revised PE Logic Design, Block Diagram.

3-2

6-1

Interpretation of k as a Binary Number to Define
Storage Location of A(k}
The Numbering of the PEls Within a Quadrant
The Dis tribution of the Coefficients A(k} in the Array Initially

•

6-5
6-5
6-6

•

7-2
7-2
7-2
7-4

2-5
2-6

6-2
6-3

7-1
7-2
7-3
7-4
7-5

ECL Gate, Schematic Diagram
Driver, Schematic Diagram.
Receiver, Schematic Diagram.
ECL Driver-Receiver, Balanced Signals, Schematic Diagram
Use of Drivers and Receivers for Distributing Control
. Signals from CU to PEls

2-6
2-10

7-7

Table

2-1
2-2

Shift Table, Showing Shifts by One's and Shifts by Eight's.
Relationship of Gate Size to Multiplier Size

2-5
·2-11

-v-

LIST OF ILLUSTRA TIONS (Cont'd)

Table

-vi-

Page

3-1

PE Logic Requirements

3-3

4-1

512-Bitl Parallel Organizations for the Librascope 4802

4-6

6-1
6-2

Shifts Required for Combinations of Bits j 2 and k
•
rm-r
Shifting Required for the Final Eight Iterations.

6-8

7-1

Signals Requiring Driver and Receiver

7-6

6-9

SECTION I
INTRODUCTION

This report is the second of three reports to be submitted during the Phase I effort
of the ILLIAC IV Program. At this measured milestone in the development of the
ILLIAC IV system real progress is indeed evident.
There exists now a detailed specification of the Processing Element for which a
fully compatible and complete instruction repertoire has been developed. In addition,
the data transfer paths establishing the links between the Input-Output and the
Array, between the Control Unit and the Array, and between the elements of the
Array have been specified.
Progress in defining the system hardware has also been made. A family of logic
circuits has been selected and is currently undergoing an intense packaging effort.
The size, type and speed of the memory system for the Array ha's been selected.
,Progress in defining the applications areas, particularly that progress made at the
University of Illinois, is especially evident.
The contents of this report in conjunction with the first report contain much of the
rationale for the present system definition.
For the remainder of Phase I, much work remains. Although most of the functions
and specifications of the 110 and the Control Units have been identified some
additional work is needed.
In the area of packaging, both at the cabinet level and the logic level, detail design
remains. Such items as power distribution, system cooling and general installation
detail must be specified. Finally the entire procurement must be matched to a
comprehensive program plan of schedule and delivery which is mutually acceptable.

1-1

SECTION II
THE ILLIAC IV SYSTEl'vl

This section contains discussions of specific systerns problems which are considered to be of major importance in the design of the ILLIA C IV System.

ROUTING
One of the instructions in ILLiAC IV is to transfer data residing in the PE array
to new locations in that array. This is called the "routing" instruction. One
version of this instruction will take the data from the nth PE and transfer it to
the (n + m)th PE~ where m is an indexable variant contained within the instructionJ
and n runs over all PEr s. Disabled PEr s are not to receive any new data or lose
any of the data they are currently holding as a result of this instruction.
The immediate reaction to such a requirement is to implement all the required
various paths by brute force. Such a solution is not only expensive~ but unnecessary, and of inferior performance. To find a superior method of inlplementing
routing, it will help to consider individual properties of the routing process. From
a functional viewpoint the four properties of concern for routing are the level~
the timing, the modulo~ and the increment.
In the array of 256 PE's the levels of shifting with'which we are concerned may
be divided into shifts between quadrants, shifts between the same elements of a
quadrant and shifts within a Processing Element.
The timing refers to the execution time of the shift and its concurrency with other
array instructions.
The modulo is the end-around size of the shift which can be variable up to a given
maximum size. For instance a 64- bit shifter may be designed to shift 64 bits
end-around or eight 8-bit bytes end-around depending on the modulo control.
The increment is the smallest shifting amount of which all shifts are a multiple.
Bit shifts would have an increment of 1" byte shifts would require 8, etc.

2-1

PE 1

PE 2

PE 3

A

A

A

I

-

I

..

I

I

I

I
t

BYT E

I

CON TRO~

I

t

BARREL SWITCH

I
I

BA RREL SWITCH

,

t

BARREL SWITCH

I
I

ORF UNCTION

,

B
~

'---

I
I
t
I

B

-

-

B

I

Figure 2-1.

PE

PE

1

2

PE Level Shifting

- - - - - - - - - - - - ----+---.!

PE
8

NORTH

PE
9

SOUTH
EAST
WEST

QUADRANT GATE COUNT
4
><
64
X
64 = 16.000 Gat s/Quadr nt
Gates/Bit X Bits/WD >< PE s/Q

PE

PE

57

58

Figure 2-2.

2-2

14--- -

--- -

-- -

-

- - - -~~

Quadrant Nearest Neighbor Connections

PE
64

The variations on these properties are many, and since the function involves a very
large number of bits the different variations involve substantial differences in
cost.
Shifting at the PE Level
The first level of consideration is the shifting of operands within the PEe Here
a barrel switch is provided which will shift in increments of 1-bit-multiples at
the 64- bit-operand level.
A modulo control may be employed by adding an additional logic input to the first
level of gating in the barrel switch and an additional gate to the receiving register input. Such modulo capability may include 8-, 32-, and 64- bit bytes and
be extended to link PEts by a full word transfer at the end of the switching cycle
(figure 2 -1).
The execution time of this shifting will require one pass through the barrel
switch for modulo 64 shifts. two passes through the barrel switch for end off
switching. and four passes for end- around. Otherwise a transfer of Register
B to the neighboring Register A will extend this capability between PE's. For
the most part this logic capability exists within.a PE, and there is no concurrency of execution at this level.

'Shifting at the Quadrant Level
At this level of shifting, considering the variety of desirable shifting properties
available, there are many possibilities. Two seemingly practical forms at the
Quadrant level are nearest neighbor connections and the quadrant barrel switch.
These two shifting approaches represent the practical minimum and maximum
cost for the ILLIA C IV System.

NEAREST NEIGHBOR CONNECTIONS -Figure 2-2 shows the necessary data
paths and gates to implement this shift approach. It has the capability of shifting left or right 4096 bits in increments of 64 and 512 bits. Increments smaller
than this may be shifted with the local barrel switch if desirable since concurrency is not possible here.
All shifting must be done in sequence with all arithmetic and logic instructions;
all shifted amounts are combinations of the basic two increments. Shifting of
a modulo size smaller than the whole quadrant is done by mode control of the
PEts.

THE QUADRANT BA RREL SWITCH - The main advantage of the quadrant barrel
switch is its ability to execute any shifted amount (in increments of 8) in two,
passes through it. Each shift can be executed concurrently with other instructions
2-3

in the PEe Modulo control is accomplished by controlling the contents of the
Routing Register between partial shifts.

A COMPARISON OF THE TWO APPROACHES -

1. The two method s can produce identical results with differences in
execution time and cost. Ignoring time and cost the methods are functionally
equivalent.
2.
There is an approximate cost difference of six to one. If we ignore
the requirement for special circuits to transmit signals long distances, the
simple gate count for the quadrant barrel amounts to 200/0 of the total system.
3.
Physical distance is an important consideration. Figure 2- 3 depicts
a psuedo physical layout of the quadrant in which the interconnected PE for the
Nearest Neighbor Connections may be reasonably close. Perhaps a maximum
distance of 6 feet may be realized.
With the quadrant barrel however some paths are long. Considering a centrally
placed barrel switch in the middle of a 27 foot quadrant cabinet row, one row
per quadra~t, the worst- case distance would be about 30 feet.
This means that every shift, however short, will involve 30 feet of cable delay
plus logic. The following computation shows anticipated delay times.
Barrel Switch
Cable:

= 51. 0 nsecs.
8 gates X 3 nsecsj levels = 24. 0

30 1 X 1. 7 nsecsj ft.

Logic:

75. 0 nsecj shift

Nearest Neighbor
Cable:

6 1 X 1. 7 nsecsj ft.

= 10.2 nse·cs.

Logic:

3 gates X 3 nsecsj gate

=

Avera~e

Barrel Switch Time

Avera~e

Nearest Neighbor Time = 4 X 19. 2 =76. 8 nsecs.

*Refer to table 2-1,
2-4

9. 0
19.2 nsecsj shift

= 75.0

*

page 2- 5.

nsecs.

Table 2-1.

Shift

Table~

Desired
Shift

Shift
by One's

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31

0
+1
+2
+3
+4
-3
-2
-1
0
+1
+2
+3
+4
-3
-2
-1
0
+1
+2
+3
+4
-3
-2
-1
0
+1
+2
+3
+4
-3
-2
-1

Showing Shifts by One's and Shifts by Eight's
Shift
by Eight's

0
1
2
3
4
4
3
2
1
2
3
4
5
5
4
3
2
3
4
5
6
6
5
4
3
4
5
6
7
7
6
5

0
0
0
0
0
+1
+1
+1
+1
+1
+1
+1
+1
+2
+2
+2
+2
+2
+2
+2
+2
+3
+3
+3
+3
+3
+3
+3
+3
4
4
4

Average number of shifts =

Total

2~~

Desired
Shift
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63

Shift
by One's
0
+1
+2
+3
-4
-3
-2
-1
0
+1
+2
+3
-4
-3
-2
-1
0
+1
+2
+3
-4
-3
-2
-1
0
+1
+2
+3
-4
-3
-2
-1

Shift
by Eight's Total
4
4
5
4
6
4
7
4
-3
7
-3
6
-3
5
-3
4
-3
3
-3
4
-3
5
-3
6
-2
6
-2
5
-2
4
-2
3
-2
2
-2
3
-2
4
-2
5
-1
5
-1
4
-1
3
-1
2
-1
1
-1
2
-1
3
-1
4
4
0
0
3
2
0
1
0

= 4

4. However, real problems are not random with average shifts. The
data to be shifted often falls into patterns which, with proper programming, can
be made to somewhat fit the available shifting patterns of the partial shifting
scheme, and save time over the average time taken for transfer of randomly
placed data. Some examples of such savings follow.
(a) The Cooley- Tukey algorithm for spectrum analysis in one form
involves swapping data between PE's which are half an array apart at the first
swap, a quarter of an array apart at the second swap, an eighth at the third,
and so on. Over each 64 -PE quadrant, these transfers are accomplished with
four, two, and one partial shifts respectively, or an average of 2. 333 partial
shifts for each actual data transfer. This is considerably faster than the four
partial shifts required for a random data transfer, and is faster than the data
transfer effected through the a 11- possible-paths scheme.
(b) Another use for the data transfer paths which cover the array is
in the computation of global variables ("Global" here means a variable which
acquires its definition from data which covers all the PE's). Maximum, minimum,
logical AND~ across-word parity, sum, product, are examples of functions
which one might want to perform on corresponding· words in all PE's, more or
less in parallel acro"ss the array, producing a one-word result. Since PE's are
only capable of two-operand operations, a combination of the· 64 corresponding
variables from the 64 PEs of one quadrant into one resulting variable must
take at least six operational steps, ·consisting of 32 operations on the variables
by' pairs, then a pairwise combining of the 16 resulting pairs, and so on. Between

2-5

Figure 2-3.

0

17

16

Pseudo Physical Layout ofa Quadrant

2

15

3

14

4

13

5

12

6

11

7

10

360 377 361 376 362 375 363 374 364 373 365 372 366 371 367 370
20

37

21

36

22

35

23

34

24

33

25

32

26

31

27

30

340 357 341 356 342 355 343 354 344 353 345 352 346 351 347 350
40

57

41

56

42

55

43

54

44

53

45

52

46

51

47

50

320 337 321 336 322 335 323 334 324 333 325 332 326 331 327 330
741 64 73 65 72 66 71 67 70
1
300 317 301 316 302 315 303 3141304 313 305 312 306 311 307 310
60

77

61

76

62

75

63

---------------r-----~--------

100 117 101 116 102 115 103 1141 104
1
260 277 261 276 262 275 263 2741264
1
120 137 121 136 122 135 123 134/ 124
1
240 257 241 256 242 255 243 254: 244

113 105 112 106 111 107 110
273 265 272 266 271 267 270
133 125 132 126 131 127 130
253 245 252 246 251 247 250

,

140 157 141 156 142 155 143 154: 144 153 145 152 146 151 147 150
220 237 221 236 222 235 223 234 1224 233 225 232 226 231 227 230
160 177 161 176 162 175 163 174:164 173 165 172 166 171 167 170
200 217 201 216 202 215 203 214:204 2 13 205 2 12 206 2 11 207 210

Figure 2-4.

2-6

"Nearest Neighbors" Arrangement of Full Array,
Showing Quadrant Subarrays

each operation a transfer of data occurs to place it in position for the next operation. Thus a global function takes six binary operations and five data transfer
times when implemented within the system which has all possible paths. The
above description of a sequence of steps is correct even if a single instruction
calls forth the whole sequence.
Global functions are in the same category as the Cooley- Tukey algorithm as far
as data transfer is concerned. The data is shifted by 2 and combined with an
unshifted copy, and so on up to a shift of 32# for the 64- element subarray. For
the nearest neighbors scheme~ an average of 2. 33 partial shifts per actual
shift is called for. The nearest neighbor connections would thus seem to have
a slight advantage in speed over the quadrant barrel in executing global functions
such as maximum, minimum, global AND, global OR, and across-word parity.
5.
The conclusion is that we do not buy anything by implementing
schemes which tranfer data across from one side of the array to the other ·with
a single step. The time it takes for the data to travel across the array is sufficient to permit several logical operations. It is possible to find a set of partial
data transfer paths which result in simpler mechanization and lesser wiring,
and which do not degrade performance. Furthermore, by avoiding the barrel,
we also avoid the necessity for allowing maximum delay on each transfer, and
when the data is ordered, or partially ordered, a speedup {s accomplished
which would be impossible in the more expensive system.

Inter-Quadrant Transfers

Shifting between quadrants is dependent on the type of intraquadrant shifting
selected, but the considerations at the interface are similar. The choice of
transferring all words of a quadrant as a column (row) may be made for either
case.
Here also a similar layout of PE ~s was done for the quadrant (figure 2-3) can
make edge switching faster than quadrant switching. Figure 2-4 shows an
arrangement of the whole array, 256 PE's, with the same properties for the
whole array that figure 2-3 has for the 64 PEl sof the single quadrant. PE's
which differ in number by ±1 or ±16 from a subject PE are physica.l neighbors
of it, just as in figure 2-3, where PE's which differ in number by ±1 or ±8 of
a given PE are physical neighbors thereof. Furthermore, each quarter of the
diagram of figure 2- 4, as indicated by the dotted lines, is a copy of figure 2~ 3.
To allow one to see this latter point, the PE's in figure 2-4 are numbered from
o to 377 in octal. To translate from an entire array PE numbering, as shown
in figure 2-4, to quadrant numbering, one may suppress the first and fifth bit
of the binary equivalent of the octa.l PE number. When this is done, each quarter
of figure2-4 is like figure 2-3, with those on the right reflected about the vertical axis, and those on the bottom reflected about the horizontal axis. The lower
right quadrant, for example,is like figure 2-3 both upside down and backwards.

2-7

Machine Instruction
The machine instruction to use this equipment thus req uires considerable equipment to translate the instruction into the var ious shifts required. The number of
shifts will differ, depending upon how far around a given quadrant, or whole array,
the data is to be shifted. Variants of the shift instruction will be able to call upon
the intraquadrant shift even when the entire array is being used. This is of
advantage during one possible implementation of the Cooley- Tukey algorithm,
for example. A shift of one in the 32-bit-word mode also involves a transfer of
half-words betwee'ri.'neighboring PE's, since each one is now acting as two halfword PEls.
The shift instruction therefore calls up, possibly an internal barrel shift in the
PE, possibly a half-word transfer between neighboring PE's, a variable number
of east-west neighbor shifts, and a variable number of north- south shifts. To
preserve the contents of the A and B registers of the disabled PE's, intervening
steps must avoid the A and B registers, which may be only an initial source of
data, or the destination at the final shift. Either the MDR or the S register,
therefore, is involved in routing, with strong preference to the MDR, since
it is desirable, even though maybe not as absolutely necessary, to save the S
register as well as the A and B.

THE MULTIPLY ALGORITHM
The most important single logic function from the standpoint of both performance
and cost is the multiply. The emphasis placed on this instruction in its design
and application singles it out for special discussion.
When first considering the many different ways to implement the multiply the
ILLIA C array itself offers the first direction. There is a class of algorithms
which takes advantage of the statistical nature of the ONE and ZERO trains in
the multiplier. The average execution time of such a multiply is always less
than a worst-case pattern of ONE and ZERO in the multiplier and, therefore,
in the course of a program run, the multiply time is the average multiply time.
However, because of the lock- step synchronous operation of the Array which
handles up to 256 pairs of operands simultaneously, the average execution time
becomes the worst-case time. Such methods therefore are not applicable to the
ILLIA C system.
Following the above, the next decision to be made is how many bits of the multiplier are to be examined and disposed of simultaneously during a single step in
the" cyclic multiply sequence. Apart from circuit speeds, this is the single basis
for determining the speed of the multiply.

2-8

Two seoarate, yet interdeoendent, techniaues, are available to do this. One
technique is to encode fields of the m ultiDlier into a larger number base* ; and the
second techniaue is to combine manifold summands selected by the conditions of
the multiolier bits. In practice these two techniaues are combined into a single
comprehensive design. The following general relation is useful in dete"rmining
the siz e of the partial multiplier:
Execution time: (MB +3)

G·

6 t

XB
Where:
MB

Nunlber of bits in the mantissia

XB

Number of bits in the partial multiplier

G

Number of gates in typical delay chain

6 t

Nominal gate delay including loading, wire length, etc.

Reasonable values for the above paramete rare:
MB

48 bits

G

10 gates selected as typical PE logic chain

6. t

=

3 nanoseconds/ gate

Execution time = memory cycle time = 250 nanoseconds.
Therefore: XB = 8 bits.
Table 2-2 shows the relative speed-cost relation for the range of possible partial
multiply sizes. The important feature in this table is the gate count differences
for the different selected sizes. In terms of a 10K-gate Processing Element
the 8- bit selection appears to be a reasonable maximum hardware investment.

*MacSorley,

O. L., "High- Speed Arithmetic in Binary Computer. "
Proceedings of the IRE, pp. 67 -71, January 1961.

Wallace, C. S., "A Suggestion for a Fast Multiplier, "IEEE
Transactions on Electronic Computers, V. EC-13, February 1964.

2-9

.MEMORY DATA REGISTER

1

~

OPERAND SELECT REG.

OPERAND SELECT GATE

t

~

!

t

MULTIPLIER DECODE

1

CARRY SA VE TREE

B REGISTER

~

l

FULL ADDER

1

t

I--

BARREL SWITCH

+

t

A REGISTER

J
Figure 2-5.

Logic Elements Involved in Multiply Algorithm

Memory Access
Buffer (512 bits)

Memory Access
Buffer (512 bits)

1

1

8 words

,
Control
Unit
1

8 words

P.E.
Array

P.E.
Array
2

1

INTER

,
Control
Unit
2

-QUADRAN~

EXCHANGE

Memory Access
Buffer (512 bits)

t

Control
Unit
3

Figure 2-6.

-I

18

8 words

P.E.
Array
3

P.E.
Array
4

words
Control
Unit
4

SystelTI for Handling Data From Control Unit to
Processing Element Block Diagram
l

2-10'

Memory Access
Buffer (512 bits)

Table·2-2.

Relationship of Gate Size to Multiplier Size
Logic Gate Count for
Multiply Only

Estimated Execution Time
(nanoseconds)

6

2,000

330

8

2,800

270

12

4,800

210

Number of
Partial Multiplier Bits

Having decided what constitutes a reasonable number of multiplier bits to handle
in a single cycle (in this case 8 bits), the next consideration is to determine
how to maintain a 10- gate delay for the worst- case logic chain. To evaluate
this criteria the following substeps in the multiply must be completed within a
10-gate chain:
1.

Decode the partial multiplier.

2.

Add the next summand to the partial product.

3.

Shift the multiplier and the partial product.

4.

Normalize the result.

Figure 2- 5 is a block diagram of the logic elements involved in the execution of
this algorithm. The delay chain involved is the time to go from register to
register.
The multiply algorithm selected first decodes the 8 bits of the multiplier into a
base- 4 representation into which 0, + I, -1, and +2 values of the operand are
selected. This decode is stored in the Operand Select Register (OSR). The
OSR is 12 bits long for storing the fully decoded four conditions for each of
four operands.
Because the 8 bits of the multiplier are taken as bit pairs, four summands must
be added to the partial product in a single cycle. The carry save tree, which
contains three full length carry save adders (2 less than the total number of
summands) combines four summands plus the partial product into a new sum and
carry value. The full adder now combine s this sum and carry into a new partial
product.
In terms of worst-case delay, the logic chain through the operand select gate, .
the carry save tree, and the full adder represents the worst-case delay.
Detailed logic design analysis has shown that a nominal 30-nanosecond delay
through the chains is possible. A breadboard of the actual hardware must be
built and operated to obtain a true final result.
2-11

EIGHT-BIT WORD LENGTHS
The ILLIAC IV is to be built with several different word lengths. Each 64-bit
PE is capable of being split, effectively, into at least two 32-bit PE's. Another
desirable word-length breakpoint is at eight 8- bit effective PEl s within any
given PEe To evaluate this possibility we require, first; to know what it costs
to expand PE capability to handle eight independent 8-bit words, and second, we
need to estimate the worth of the extra capability produced by such an expansion
over and above the capability already inherent in the 64- bit PE for handling 8- bit
~ pieces.
The conclusion is that all logical operations, probably even add and
. subtract, are easily programmed for the 64- bit PE so that they make effective
use of the parallelism of the 64- bit machine. The only feature that is lacking
is the independent mode control of the 8-bit sections of th~ 64-bit PEe
The price for 8-bit words is mainly in the fragmentation of the controls which
results. Existing circuit design contemplates a buffer capable of driving 24
loads. With independent mode control on every 8- bit slice, each 8 bits of a
64-gate transfer command would have to be independently controlled, thus requiring 8 gates instead of the 3 gates required by straightforward implenlentation
of the 64- bit PEe Since there are estimated to be 150 command lines, and the
64-gate transfer command is typical of them, we estimate that 750 gates per PE
are added by the fragmentation of the command lines. In addition, the barrel
controls multiply, from the three control signals per level required by only
one word size, to 24 control signals per level. However, only two levels of
control are used in the module eight barrel, so that 42 additional gates are added
from this account.
The extra mode register bits for 8- bit operation imply extra lines to the control
unit, and extra drivers and receivers for them
Twenty- eight such bits per
PE will require 28 drivers and 28 receivers in the PE itself, and 3, 584 extra
drivers and receivers in the control unit. The design difficulties of cable
bundles of this bulkiness are considerable at the speeds under consideration.
The 8-bit operation contemplated here assumes we still have no more than one
index register per PEe To get eight 8- bit words out of independently specified
memory addresses requires eight memory accesses, a situation which is exactly
the same as though 8- bit fields were programmatically extracted from64- bit
words in 64- bit operation.
The most promising design of the memory currently contemplated for the ILLIA C IV,
representing the best compromise between cost and cycle time, is a nondestructively read memory. W~iting a small field, such as 8 bits, into a memory
word will require two memory cycles, one to read those portions of the word
which are not to'be changed, and one to write back the word,· one 8-bit field of
which has been changed. A store instruction in 8- bit operation, if. independent
addressing is called for, on each 8-bit word, will require, therefore) 16 memory
cycles ..

2-12

The above hardware operations are hardly better than programmatic implementation
in 64- bit mode. A pparently, the chief virtue of implementing 8- bit operations in
the PE hardware is that mode register control over the individual operations can
be achieved.
All the above circuitry amounts to an additional 800 gates in the PE for 8- bit
operation. Probably 1000 gates is a better estimate. From this we conclude
that adding 8-bit operation to the 64- bit PE add s about 10o/c to the equivalent
gate count, and presumably also adds 10o/c to the estimated cost of the PE
exclusive of memory.
In addition to the direct cost, there is degradation of the performance in normal
64-bit mode. Were the equipment packaged as a solid volu.me, lengths would
be increased by 3. 2o/c as a result of the 10o/c increase in components. In a wellmatched system, equally sensitive to memory and logic speeds, the result
would be a net 1. 6o/c slowdown. Power and coollng requirements also follow the
component count.
The opposite question is to enquire to what extent can 8- bit operations 1::>e carried
on in parallel in a 64- bit PE without the assistance of specific 8- bit operations
in the hardware.
Certain operations" which are expected to be frequent operations in 8-bit programming, are independent of word length, such as all bit- by- bit Boolean
operations" compare all words against zero, set to specified value, read from
memory, store to memory, and others.
Certain operations are programmable in such a way as to rnake use of much of
the parallelism of the 64- bit machine even though the entities being manipUlated
are only 8 bits long. Add, subtract, and routing, are examples of this class of
operation. Add, for example, on words of 7 bits plus sign each, can be handled
by removing the sign bit and using the resulting space for overflow control, thus
allowing as many adds in parallel as the adder is long. Routing can be partially
implemented on all eight 8-bit words at once by means of the barrel. End-around
shifts can be implemented by two shifts, which are then partially blanked in
accordance with a mask which could be held in the S register, and the two shifted
words then ORed together.
The only operations of any signiflgance which are not included in the above
paragraph are either things such as multiply or independent indexibllity, which
were never intended for 8- bit operation because of their expense, and mode
control.
_To program mode control, without actually having it in the hardware, requires
manufacturing masks" which blank out whole 8- bit segments of the 64- bit word,
in response to decision bits which usually will show up in the sign- bit position
of the 8- bit words.

2-13

This programming, while facilitated by the one- clock-time barrel by the complete set of Boolean operations and by the fast single-bit instructions, will not
be trivial.
l

l

COMMON DATA PATHS

This section describes the various paths by which information is transferred
between the PE and its memory both to the input-output interface, and to the
·control unit. A s a result of the .conIlicting requirements on transfer rates,
frequency of use and destination, there are three segments of the equipment
provided for data transfer. These segments are an input- output buffer a
memory access buffer with each control unit, and direct lines 'from the control
unit to and from the PE. The rest of this section describes this hardware in
terms of its use in each class of signals which must be handled into and out of
each PE.
l

l

!.nput- Output Buffering
Present plans are to provide an input-output buffer of 4096 bits for each quadrant of the array. Each bit of the I/O buffer has connection to one bit of the
memory data register of one PE. When data is to be transferred to or from
a PE memory from the I/O buffer, one memory cycle is taken from PE operations and all 64 PE's insert one data word into their memory at that memory
cycle.
l

Data is transferred across the external interface of the I/O buffer in word sizes
appropriate to the device found at that interface. The external device may be
a buffer memory with very long words say 4096 bits each. In this case, I/O
operations are accomplished by stealing one PE memory cycle from the array
for each cycle of the external buffer memory. On the other hand, the external
device could conceivably be a device of lesser word size, 512/bits per word, for
example, in which case an independent loaqing and unloading control is required
to assemble the shorter external words into the 4096- bit word in the 110 buffer.
This independent control communicates with the array control unit both to steal
memory cycles, and to report I/O complete when a whole biock of data has been
successfully transferred.
l

The design of this interface is almost independent of the rest of ILLIA C IV.
It is not the intent of this section of this report to discuss the design of the independent I/O control unit, as this design is dependent upon matters discussed
els.ewhere in this report.

2-14

From Control Unit to Processing Element
Data for the use of the PE, data to be stored in the PE memory for the control
unit's own use, and commands for the PE are transmitted from the control unit
to the processing element. Common data lines from the control unit are used
for data which goes from the control unit to the PE, whether this data is for PE
use or for the control unit's private use. To avoid continually interrupting PE
operations for data fetches and stores which are related to control unit purposes,
many words should be fetched and stored in parallel. A reasonable compromise
between amount of hardware and interference with PE operation appears to be
no more than 32 words of data transmitted to the PE array in parallel. Further
discussion of the buffer size is found below. We plan to provide 32 words of
buffering. Of these 32 words, eight are assigned to each quadrant. A block
diagram of the system for handling data is shown in figure 2- 6 (page 2-10).
It is simplest to describe operation by starting with the isolated quadrant. Eight
words of data are transmitted to the memory access buffer from the control unit.
This transmission is overlapped with PE operations provided that the program is
such that the requirement for transmitting data can be recogni7ed far enough ahead
of time. This will generally always be recognizable ahead of time when the data
. is simply to be stored in memory. One clock time per word is expected to be
required, since the data paths within the control unit are expected to be typically
one word wide. Each word in the memory access buffer has access to a column
in the array. The eight words have simultaneous access to one element in every
column, namely a row. As a result, the eight words can be transmitted in parallel to every row in the array_ If data is to be stored in memory, one row accepts
the data.
If data is being broadcast, rows receive the data.

In this case, the eight words
of data are eight copies of a single word whicl} originated in the control unit.

When the array is operating as a coherent whole, all four control units operate
on t~e same program str ing and data in parallel, and have identical internal
states. Out of a package of 32 words to be sent, the f,irst eight can be derived
from information supplied by the fir st control unit, the second eight can be
derived from information supplied by the second control unit, and soon. Likewise, when broadcasting data, the four control units will have four identical
copies of the word. to be broadcast, each of the four is copied eightfold into the
memory access buffers, and each of the resulting 32 copies is then used to
drive eight PE's.
Also transmitted from the control unit to all PE's in parallel are control signals .
. These control signals, are identical for all PE's. Retiming considerations will
demand that there be a flip-flop in the control unit for each such line. Present
planning calls for a receiver per cabinet for each such signal, and eight PE's
per cabinet.

2-15

Addresses must also be issued from the control unit to all PE's. These will use
the least significant 12 bits of each word in the memory access register, and the
broadcast data bus.
Issuing of addresses and broadcasting of data require a fixed delay of several
clock times. The design of the microsequences as manufactured by the control
unit is such as to allow for this constant delay.

From Processing Element to Control Unit
When data is fetched from PE to control unit in the isolated quadrant, the
mechanism is the reverse of the process described above for disseminating data.
Namely, the information is collected in the form of a single word from each
single column of the quadrant, and the eight words from each row are then
transmitted to the control unit. For the instruction" store to broadcast register, "
the eight words are ORed together to form a single word in the control unit,
very like a broadcast in reverse.
When data, or program string, are being fetched in 32-word packages from the
united array an additional complication sets in, since each one of the memory
access buffers contains only a quarter (eight words) of the 32-word package and
each control units wants all 32 word s. In this case it will be necessary to transfer data around among the memory access buffers until each quarter package of
eight words has shown up once in each of the eight- word memory access buffers.
Single word fetching makes use of the same paths by disabling all but one of
the words being received.
Also entering the control unit are lines from each of the mode register flip-flops
in the PE' s. To the control unit of one quadrant, 'a bit in the mode register
appears as a 64-bit register to the control unit, which may be set, read, compared with other registers available to the control unit, etc, When operating in
united mode, control units must cooperate in the sensing of the state or mode
registers. For example, a "jump on any bit equal to ONE" means to jump if
any bit in anyone of the four 64- bit registers involved is ONE. Since all four
control units run the same program instructions at the same time, only twelve
wires, one to communicate between each pair of control units ,are required to
secure the necessary cooperation.
Evaluation
Some of the above described data-transferring procedures take more time than
one might at first expect. One 250-nsec. memory cycle is required to load the
memory access buffer. This memory cycle interferes with the action of the
PE only insofar as the PE memory or memory data buffer is required.

2-16

The memory access buffer for a single quadrant can be unloaded into the appropriate area in program lookahead or local data buffer in eight clock times. However,
these potentially interfere with other used of program lookahead or local data
buffer. With proper implementation of the controls, these eight clock times can
be largely hidden by being taken from otherwise idle time in the two buffer areas
in the control unit.
When the array is in united operation, one must count not only 32 clock times
rather than 8, one also must transfer the data from one memory access buffer
to another.
If cables of up to 30 or 40 feet long separate the memory. access buffers, then
the time to transfer data from one memory access buffer to another may well
be three clock times, and three transfers are needed. The conclusion is that
as much as 41 clock times are needed to transfer the 32 words, read in one
memory cycle time, into the appropriate areas of the four control units. Seven
clock cycles in each memory cycle is a likely design choice. In this case, 41
clock cycles represents 1. 46 microseconds. This 1. 46 microseconds is overlapped with other operations as long as a reservoir of instructions for PE
operation can be maintained in the control unit.
However, this 1. 46 microseconds often gets in the way when loading the program
lookahead. It is a penalty to be taken whenever the program jumps to a location
not contained in program lookahead. Further, the coarser block size means that
loops between 32 and 64 words long will less often be found within the program
lookahead, and program fetching will take place more often when the block which
is fetched is larger. Large block size also interferes with broadcast operations,
since a larger delay occurs between the instruction to fetch a block of data to the
data buffer and the first opportunity to broadcast one of those words, when blocks
are larger.

~

Optimum block size is that which finds the best tradeoff between interference
with the operation of the PE's in the array, and slowing of operations in the
control unit. Block sizes of 8, 16, and 32 words are, easily available with minor
modifications of the structure here described, and a choice of a smaller block
size than the 32 words here proposed can be easily made during the early, design
months of the next contract.

MODE
Introductory Considerations
. Each PE is supplied with a mode register, which it can change as the result of
tests, and which the control unit can set at will. Instructions will be executed
or not as a result of the setting of the mode register. This arrangement is in
lieu of branching at the PE level, since all PE's must share the common
instruction stream, and therefore cannot independently execute transfers of
control.

2-17

Modes should be remembered and recoverable. At the very least, each routine
which one enters must be supplied with fresh capabiHty for setting and changing
its mode, while remembering the mode of the calling sequence. Actually, it
seems that considerably more flexibility is desirable. In the limit, one could
specify a particular mode register, out of some set of mode registers, for each
instruction.
The Problem
At issue is the question of the location of the backup storage of the noncurrent
modes. The choice is between supplying back-up mode register storage in the
PE's, or of supplying the control unit with the capability of reading, storing,
and restoring the PE modes. The implementation of mode control is considerably different, depending on the results of this choice, so'that we are
really choosing between two different systems for controlling PE operations
in response to mod e.
Back- Up Storage In The Control Unit
The first system to discuss is that with back-up storage for other than current
modes in the control unit. In this system, only one mode register's worth
of flip-flops are in the PE, and the setting of the mode register, for which a
given instruction will be executed in a specific PE is known at the time of that
instruction. Either four bits accompany the instruction, specifying which of
the possible modes perInit the execution of the instruction, or else a pre- existing
decision, based on the content of the mode register, controls the execution 01'
nonexecution of instructions in the PE. Bit economy in the instruction stream
favors the latter, if mode is not to change at every instruction. A decision
based on speed also favors separation of the mode decision from the execution
of the instruction, since then less control gating is involved for the individual
instruction at the PE.
One of the savings in speed of the ILLIA C IV type of computer ought to lie in the
fact that the subcommand matrix of a normal computer finds itself mostly in the
control unit, so that the instruction decoding and timing wpich consumes one
clock time per instruction in most computers can be spent in the control unit,
overlapped with useful arithmetic work in the PE. For most complete overlap,
the PE has an "on- off" flip-flop to control the execution or nonexecution of the
next instruction. Individual instructions in such a scheme will never wait for
the decoding of mode information before they can start and noninterferin~
microseQuences can freely overlap. Occasionally a one- clock-time instruction
would be needed to change the setting of the" on- off" flip-flop in response to
some new interpretation of the mode register.
The "on-off" flip-flop also must respond (sometimes) to arithmetic overflow
as well as to programmed tests which change the mode bits. There is an overflow flip-flop which appears to the control unit much like third mode bit.

2-18

In this system there are therefore four flip-flops per effective PE with mode-like
functions. Thus there are four flip-flops per 32-blt word, or eight flip-flops
per actual PE, two programmatically specifiable mode bits, the overflow flip-flop,
and the "enable-disable" flip-flop. There are PE instructions to set or reset
each mode bit in response to programmed tests. Each bit in the PE appears
to the control unit as a 64- bit word, since there are 64 PEl s per control unit.
There are instructions to read, save, and set these word s, the instructions
being control unit instructions rather than PE instructions. The control unit
is provided with high- speed register storage for such saving. It is also provided with logical instructions to manipulate the modes, namely AND, OR, and
COMPLEMENT instructions which can operate on the words formed from the
mode register bits. Jump instructions in the control unit would test mode bits
(either "any mode bit" or "all mode bits ").
They would also test old modes
stored in the control unit.
This system thus has a reservoir of old modes which can be reactivated on
short notice. This back-up is in the control unit. Response to old mode settings
is effected in one clock time by setting the "on-off" flip-flop in each. ,?E. The
instruction stream has no mode field per instruction, but does have mode- manipulating instructions, which are decoded by the control unit in parallel with the
issuing of instructions to the PE.
Storage of old modes in the control unit is backed up further by the storage of
words from old mode registers back to memory. It is assumed that the control
unit has some means of addreSSing memory, both for read and wr ite.
Back- Up Storage In The Processing Element
The second system we discuss is that with back-up storage of old modes assigned
to the PE's. In this system, where several modes are stored in the PE, a method
must be chosen for choosing the applicable mode for any given stream of instructions~ A settable pointer could be used, and the pointer setting changed
on command from the control unit, in between actual processing instructions.
Using the pointer, the operation would be fully equivalent to operation with
back-up storage in the control unit except for the location of the stored bits.
Whe,n a mode register address is lissued with each instruction, more flexibility
is obtained. A s described to us, the system was used with six mode bits with
each instruction: Two bits of mode register address, and four bits to interpret
the contents of that register.
Arithmetic overflows are lost in the implementation unless tested for immediately
by means of a jump instruction in the control unit. They cannot influence mode
directly, although, clearly, when tested, they can be used to control modes.
Even with back-up s,torage of modes in the PEl s, some means of setting new
modes under control unit control is required. Whether a path directly from
control unit to PE for loading the mode registers is needed, or whether indirect
methods suffice, has not yet been determined.

2-19

Each instruction in the PE must therefore look to the mode register for conditions against which execution is made conditionaL Some instructions are made
one clock time longer becuase of this. Furthermore, no overlapping of
non-interfering microsequences appears possible.
Jump instructions are made dependent on a bit, one per PE, returned to the
control unit. These instructions are, like any other, conditional on one of the
mode registers, so that jumps can easily be made conditional on mode register
settings. This capability is similar to that achieved in the competing system
by testing old modes in the control unit.
The system has a limited reservoir of old modes which can be reactivated on
short notice. This reservoir is limited in length, but very fast of access. The
instruction stream has a 6- bit mode field per instruction.
Storage of additional modes is in PE memory. If they are to be packed effciently in memory, a short mod e- packing routine is necessary.
Functional Capability
Both competing systems can execute the same functions.
capabilities follows ..

A partial list of

• Quick change of control.from one mode register to another.
•

Storage 6f several different mode registers per PEe

• Transfer of control possible in response to mode register setting.
Comparison
A list of features in which the suggested methods of supplying back-up mode
registers differs reveals none in which the functional capabilities made available by the one cannot also be made available by the other. However, there
are other differences.
"When back-up mode storage is in the PE every microsequence requires the
operation of mode gating in the PE, and overlapping of microsequences is not
possible. Six bits of every instruction are expended on mode information.
"When back-up mode storage is in the control unit, every change of mode requires
a one t-time operation interleaved between the regular arithmetic operations.
Each change of mode takes a short instruction, say 16 bits.
What is needed, to compare the advantages and disadvantages listed above, is
some estimate of the number of instructions, on the average, executed without change of mode setting, or executed while ignoring mode. Even if average ..

2-20

the string length is only two or three. The speed advantage would appear to lie
with the system of storing modes in the CD.
Another difference is in the PE hardware. Considerable savings are expected by
placing the back -up mode storage in the CD, whose long registers are much
cheaper per bit than PE short registers.
At a very detailed machine language level of coding there are differences in
the progran1.ming of mode control. With back-up modes stored in short registers
in the PE, a different mode with every instruction is appropriate, but programming compl~xity mounts when mode registers above the fourth are used. With
control unit back-up storage, programming complexity remains essentially
constant for any depth of mode storage, although producing'memory interference
when depth of storage exceeds that available in the high speed registers of the
control unit. The same assembly language could be used in either case, if
desired, putting the above differences solely into the assembly program.

2 -21

SECTION III
REVISED PE LOGIC DESIGN

This section describes the mechanization and hardware requirements for the PE.
A block diagram is included containing the data paths for each item required to
mechanize a PE. The logic requirements per unit are tabulated in table 3-1.

PROCESSING ELEMENT DESCRIPTION
The revised PE logic design is shown in figure 3 -1. A PE is essentially a threeregister system which can execute a complete general purpose computer order
code as described in Section VII of TR66-3 or otherwise modifieq in this report.
A fourth register was added to store intermediate results. Capabilities of some
units may be increased or decreased to vary operation code execution times. A
summary of each unit in the PE follows.

MEMORY DATA REGISTER (MDR)
This unit is abuffer register between a PE and its PE's thin film stack and the
outside world. The register serves as an intermediate buffer when performing
array shifts or memory stores and fetches. This register is accessible even
though the PE's mode status is disabled. The MDR receives operands from the
: Common Data Bus, A Register, Operand Select Gates, Address Adder and its PE's
.memory. The register's output is connected to the Common Data Bus, Address
Adder, PE's memory and Operand Select Gates located within its PE's boundaries
or four nearest neighbors.

3-1

TO I/O REGISTERS

1

COMMON DA TA BUS
1

3
CONTROL
UNIT

-

MODE
DRIVERs'rr
REGISTER
RECEIVERS

I

DRIVERS &
RECEIVERS

J

,

l

DRIVERS &
RECEIVERS

t

RECEIVERS

I

r--=-

•

,_~1

SEW

t + ••

2&3

t

1

PE
ENABLE
SIGNALS
RECEIVERS

N
3

I

:

I
t--

A REGISTER
SELECT GA TES

I

I

•

I

1
t

j

1

1

J

1
t
LEADING
ONEs
DETECTOR

l
f
l

~

I

----1

1

-'-

I

DRIVERS

1

+ +

N

S

•
E

t
W

NOTES:
1.

The blocks iabeled number 1 will be implemente d
cabinet or half Cabinet BaSiS, rather than a
PE basis.
The blocks labeled number 2 are implemented to
only 2 of the Signals .
The blocks labeled number 3 will be implemente d
as required on a PE basis.

DDa

2.

I

A REGISTER

I

1---

2 & 3

,J

I

~

I

I
I
I

I

B REGISTER

I

I

I

I

~

I

I
I
I

I
I

MEMORY
ADDRESS
REGISTER
(MAR)

r rI
PE
INDEX
REGISTER

AND OPERAND
SELECT GA TES

CARRY

Figure 3-1.

I

B REGISTER

I

PE
iMEMORY

I

~

PRO PA GA TE ADDER
(CPA')

I
I

r-r--

.-

~

PSEUOADDER
TREE

r=:l

I
I
I

ADDRESS
ADDER

S REGISTERS

I

I
I

I
I

OPERAND
SELECT GATES
(OSG)

1

I

I

+

J

MULTIPLICAND
SELECT GATES
(MSG)

I
I

I
I

MEMORY DATA REGISTER
(MDR)

H

__,

•
LOGIC
UNIT

,
I

BARREL
SWITCH

I
I

3.

1

Revised PE Logic Design, Block Diagram

Table 3-l.

PE Logic Requirements

FUNCTIONS

UNITS
Memory Data Register

Operand Select Gates

Buffer register between thin film
stack and logic
Address Adder inputs, bit positions
52 through 63
Drivers and Receivers to and from
Common Data Bus
Drivers to adjacent PE Operand
Select Gates
Drivers and Receivers to Input
Output Registers
Selects an Operand for PE
processing
Receivers from two adjacent numbered PE's
Selects 4-56 bit words based on
8 multiplier bits
Control word enable signal generation, some fan out

Multiplicand Select

Carry Propagate
Adder

Address Adder

BIT
POSITIONS

GATES/
BIT

TOTAL
GATES

FLIP
FLOPS
64

64

6

384

12

1

12
64/64
64/00
64/64

64

6

384
00/128

56/ word

16

896
288

49

648
96

12

2

48

3

144

12
16
16
16

4
5
3
3

216
48
80
48
48

Consists of six adder sections:
Receives mantissa field inputs from
A-B Registers, Pseudoadder Tree
and Operand Select Gates.
Gate and Flip-flop requirement for
six sections
Input Gates: A Register
B Register and Operand
Select Gates

48

Consists of two adder sections: Perform address modification, exponent
summation and supplementary sections to Carry Propagate Adder
Gate and Flip-flop requirement for
two sections
Input Gates: Address Modification
Exponent Summation
Control Inputs
Output Gates:

4

PE's Index Register

Stores Address variable

12

12

Memory Address
Register

Buffer register between thin film
stack and logic

12

12

A Register

Main operand store and Overflow
flip-flop

64

7

448

65

B Register

Auxiliary operand store
Address Adder input:,: pOSitions
o through 15

64

6

384

64

16

1

16

S Register

Intermediate operand store

64

Pseudoadder Tree

Converts 4 Words + Partial Product
into CPA inputs

56

Logic Unit

Performs logic functions between
A-B Registers

Barrel Switch

Performs shifting functions.
Barrel Switch Control Logics

Mode Register

Buffer register between a PE and the
Control Unit
Mod.e control input gates
Drivers and Receivers to Control Unit

Leading One's
Detector

Detects position of leading one,
mantissa field

PE Control
Signal Rec.

Receives apprOximately 200 Signals
frqm the Control Unit

64
27

1512

64

6

396

65

14

910
48

1.

2.
Total gates used.

1

8

8
38

8/8
184
000/200
7228

A ssurnptions:

DRIVERS/
RECEIVERS

355

200/264

The flip flops are constructed of 4 gates per element.
The Drivers and receivers are constructed of one logic gate.
7228
1420
664
9312

3-3

OPERAND SELECT GATES (OSG)
This unit consists of one logic level of decoding gates. The enabled gate selects
an incoming operand from its MDR or nearest neighbor's MDR for array transfer
operations or PE processing. During an array transfer operation, . the OSG output is stored temporarily in the MDR. PE processing requirements are that the
selected operand be connected to the B Register input gates and Multiplicand Select
Gates. Depending upon the instruction an operand, or multiple thereof, is loaded
into the recipient register.

A-B REGISTER (ACCUMULATOR)
The A Register and B Register provide main operand store and auxiliary operand
store, respectively. The Barrel Switch output is connected to both registers.
These register inputs are used to execute operation codes that require main or
auxiliary operand shifts, or logical combinations thereof. Both register outputs
are routed to the Carry Propagate Adder (CPA) and Logic Unit.
In addition the A Register receives inputs from the CPA and the Modified Address
Adder.
The CPA output is gated into the A Register's mantissa field. This
output, depending on the algorithm, could be a partial product or remainder, a
summed mantissa or just an operand transfer from the B Register or the OSG.
During an operand transfer, the appropriate exponent is gated directly into the
register's exponent field. The Modified Address Adder output is a summed value
loaded into the exponent field. The A Register's outputs are connected to the
Leading ONEs De tector, MDR, Special Register (SR) and the Modified Address
Adder. The B Register receives additional inputs from the CPA; OSG, and SR.
The CPA input is a completed 8 -bit product obtained in a multiplication microsequence step. The OSG input is the incoming operand from this PE's or an
adjacent PE's MDR. The SR input is the incoming .operand obtained from a previous
calculation. Other B Register's outputs are connected to the Multiplicand Select
Gates and Modified Address Adder. The Multiplicand Select Gate inputs are
multiplier bits decoded into enable signals for the next multiply step. The Modified
Address Adder input is the exponent field to be summed in the adder.

*

LOGIC UNIT
This unit performs logic functions between the A Register and B Register. The
unit receives outputs from both accumulator registers. The various logic functions
are performed as specified in Section VII of TR66 -3 or otherwise modified in this
report. This unit provides the input gating function to the Barrel Switch unit.

*The Modified A.ddress Adder term refers to the Address Adder extended 4 bit
positions.

3-4

BARREL SWITCH
The Barrel Switch is a matrix of symmetrical gates which shifts a 64 -bit parallel
input any number of places to the left or right either end-off or end-around. Operation is started upon receipt of a 64 -bit parallel input and appropriate control
signals. The output is connected to both the A and B Registers.

LEADING ONES DETECTOR
The input to this unit consists of the A Register's mantissa bit positions. This
unit detects the location of the most significant set bit, decodes its position as a
radix 2 power and enables appropriate Barrel Switch displa'cement control signals.
The unit also senses the absence of a set bit, thereby detecting a zero value. This
decoded signal is used to zero the exponent field of the A Register.

MULTIPLICAND SELECT GA TES
These gates select four multiplicand words each based on 2 bits of the multiplier
mantissa only. The Multiplicand Select Gates are partitioned into two parts, one
which receives the next set of multiplier bits to be decoded into multiplicand word
enable signals and the other which gates the appropriate multiplicand (word) into
the Pseudoadder Tree. During multiplication, four 2 -bit pairs of multiplier bits
are received and decoded in advance to form the enable signal that gates the
required words into the Pseudoadder Tree for the next microsequence step. The
decoded signal may select multiplicand multiples of times one, times minus one,
or times two. This word selection enable signal is stored in flip -flops at the
termination of the current microsequence step.
During an operand transfer, the word selected' will enter the Pseudoadder Tree so
that its output is positioned at the A Register mantissa input gates.

PSEUDOADDER TREE
The Pseudoadder Tree consists of three carry save adders per bit position. During
multiplication, this unit receives inputs consisting of four words from the multiplicand Select Gates and the current partial product from the A Register. The
carry save adders reduce these five inputs to two outputs, one being the summand
sum and the other the summand carry. These outputs are gated into the CPA
in their true and complemented form.

CARRY PROPAGATE ADDER (CPA)
Upon receipt of an input signal set from the Pseudoadder Tree or the A -B Registers, or the A register and Operand Select Gates, the CPA adds the inputs in a

3-5

microsequence step. When executing an arithmetic instruction, the extended
CPA's output bit positions 0 thru 47 are gated into the A Register mantissa field.
When executing the multiplication algorithm, the CPA's output positions 48
thru 53 are gated into the B Registers eight most significant locations, and
positions 0 thru 47 are gated into the A Registers mantissa field.
During multiplication the CPA functions in two distinct modes: group carry save
additions, and extended mantissa addition. The saved group carries are displaced 8 bit positions to the right and re -entered into that particular adder group.
The group carry save addition is performed on those adder sections whose output
is entered into the A Register's mantissa field. Additional adder sections are
required to perform this algorithm. These adder sections receive their inputs
from the most significant 16 bit positions of the B Register and group carry save
bits from the previous microsequence step. The least significant section
performs section carry save addition. The saved carry is re -entered into this
adder displaced eight bit positions to the right. The last step in the multiplication algorithm requiring product summation is performed by the CPA operate mode of extended mantissa addition .. The additional adder sections are
connected to the CPA to form the· extended tna,ntissa adder. This microsequence
step allows the carry to propagate throughout the adder to form the completed
product.
MODE REGISTER
The Mode Register is a buffer register between the PE and the Control Unit. The
PE status (operative or inoperative) is controlled by two bits of this register.
These two bits are controlled exclusively by the Control Unit. The PE controls
other register flip -flops when executing specific instructions.

ADDRESS ADDER
The Address Adder inputs are received from the Common Data Bus, MDR, PE's
Index Register and the Memory Address Register. These added sums are connected to the input gates to the MDR, PE's Index Register and the Memory Address
Register. The Modified Address Adder inputs are received from the exponent
bit positions of the A -B Registers and the OSG. This added output is either decoded
to form Barrel Switch control signals or entered into the A Register's exponent
field.

PE INDEX REGISTER
This unit receives inputs from the Address Adder. Its output is connected to the
Address Adder where compare operation or address modifications are performed.

3-6

MEMORY ADDRESS REGISTER (MAR)
The MAR is a buffer register between a PE and its thin film stack. The MAR output is the thin film stack operand address location. Depending upon the instruction
the MAR can be loaded or modified with Address Adder inputs. The Address
Adder input is gated into the MAR at the start of a memory cycle.

SPECIAL REGISTER (SR)
The SR Register is used to store intermediate results obtained during PE processing. The operand input is received from the A Register. The SR register's
output is gated into the B Register, thus providing a buffer 'store for a previously
computed operand without a memory fetch.

3-7

SECTION IV
THE INPUT-OUTPUT SUBSYSTEM

GENERAL CONSIDERATIONS
The following design considerations are pertinent to the ILLIAC IV I/O Subsystem.
1. A disk storage system will provide the principal on-line backup storage
for the ILLIAC IV System. Present memory state of art singles out disk file storage as the only medium with the necessary volume and cost parameters to satisfy
the ILLIAC IV requirements. Requirements for the disk file system are further
considered at the end of this section.

2. I/O word transfers to and from the PE Array will be in the form of a
4096 -bit word. This capability makes maximum use of the interrupt time of a
quadrant and keeps I/O interference with the Array program to a minimum. It
also provides a separate 1/ a path to each PE to accommodate applications which
require asynchronous direct inputs to the PE memories. Some real-time problems
involving large array sensor systems are typical of this application.
3. Capability to perform interlaced I/O transfers from simple descriptor
operation is desirable. Considering the variety and number of peripheral devices
the system may ultimately incorporate, the capability to store and rapidly fetch at
least 64 I/O descriptors to control the interlaced I/O transfers is necessary.
4. Much routine I/O processing such as data packing and unpacking,
descriptor assembly and updating, etc., must be performed external to and independent of the Array or Control Unit. Therefore, a processing 'device similar to
a medium scale computer module would be most suitable. There are, however,
some 1/ a programs such as code and format conversion which are eminently
suitable to Array processing.

4-1

5. The existence of an economical core memory system separate and
distinct from the Array memories is necessary for the following reasons:
a) Transfers from tape to disk, or card to disk, or disk to printer, must
have a buffer area available to collect reasonable block sizes before making transfers to the' disk in order to minimize latency time overhead.
b) Frequently used subroutines which overflow array storage can be kept
in random access, zero latency time memory to avoid millisecond delays in
processing.
c) To permit I I 0 lookahead transfers from disk to the array to overlap
latency time with processing, a separate buffer memory is desirable as a staging
area for such transfers.
It is possible that the buffer memory required to interface the disk mass memory

and the PE array memory may be very large. Anum ber of core memory manufacturers are being surveyed to identify possible candidates for core memories
in the capacity range up to 16 million bits (modules independently accessible up
to 8 million bits), with word-lengths in the range of 1024 to 4096 bits, and cycle
times in the range of 2 to 8 microseconds. In the indicated capacity range the
ferrite core memories are currently dominant from the cost standpoint. Memory
organizations involving only 2 wires per core, such as linear select or variations
of "2 1/2 D, " are generally indicated to minimize mat fabrication cost. This is
also a significant consideration here but in this case it is likely that the electronics
will remain a very significant portion of the cost. For the longer word lengths
(2K - 4K bits) it appears that" 2 1 / 2 D" would offer little advantage over linear
select and the large number of sense amplifiers and data bit drivers would reflect
heavily on the cost per bit. It also appears that practical considerations may
limit word drivers to something in the order of 1000 cores per line and hence might
require multiple driver and switch matrices for the longer word lengths. Necessarily tentative extrapolations of fairly recent cost data suggests that it may be
difficult to obtain costs much lower than 3 cents per bit for a core memory for
this requirement.
At this time several core memory manufacturers, including the Burroughs Component Division, have been contacted about this requirement. ' The unusual geometry
of this unit precludes the use of an existing product line item. This fact has delayed
detailed responses beyond publication of this report.

DISK STORAGE
Effort to determine candidates for a disk file mass memory offering high data
transfer rates and economic storage in capacities in the range of 300 million to 1
billion bits has continued. Obtaining the high data transfer rates desired in this
applicat~on (greater than 300 megabits / second) requires a high degree of parallelism
in the transfer, e. g. a large data storage word. The parallel mode of operation

4-2

increases the amount of read-write electronics required and it is desirable to keep
this increase to a minimum. It is possible to reduce the number of read -write
channels below one per bit if basic bit rate capability can be traded to effect. the
reduction. This tradeoff is indeed a necessity if fewer heads are simultaneously
accessible than the number of bits required in parallel.
The reduction of the number of required read -write channels is effected by using a
multiple zone storage format in which several bits per word time are recorded
serially in the outer zones. In general a two -zone format allows the greatest
. reduction in the number of read-write channels for a given sacrifice of maximum
bit rate while formats of three or more zones afford greater capacity utilization.
More than three zones are seldom warranted since the required increase in the
number of clock rates tends to cancel the attractiveness of the small, further increments to utilization efficiency. The following relations enable specific disk
organizations to be evaluated. For a three-zone format the number of tracks in
each zone must be related as follows:

2n

1

+ n = N [nb + 2 '- K]
2

T

~

and for a two -zone format

where
n
n

1
2

= the

number of tracks in the outer zone

= the number of tracks in the !Y'tddle zone

NT = total number of tracks per disk face
K

= number of bits per track per word stored serially in the outer zone.
N
n

= the number of bits per word per face

f

N

= the number of bits per storage word

n

= the number of disk faces used per word

f

4-3

The number of read-write channels required is:
n

c

= n. n

n

f

where nh is the number of heads per face which are simultaneously selected. For a
reduction of read -write channels below one per bit of the parallel storage word it
is required that

In order that the available capacity of the disk file be used efficiently it is also
required that nh be an integral submultiple of the total number of tracks per disk
face, NT' and that nf shall be an integral submultiple of the total number of faces
per file, N(
For the usual ranges of interest the maximum packing density occurs in the innermost track of the outer zone. It is at this critical radius, R , that the maximum bit
packing density, B, applies.
C

where.
R2 = outermost track radius

T

= the radial track density.

The number. of storage words per data cylinder, that is, the number of words
accessible in one disk revolution without head switching, is given by:

~=

2TT RcB
K

The word transfer rate,

UW,

• _
(RPMt
~ - nW \60)-

TT

is then determined in terms of the disk rpm as follows:

RcB (RPM)

30K

The total file capacity is;

bits/ file.

4-4

The ideal packing efficiency# a measure of capacity utilization referred to the
maximum possible capacity which would be available at uniform packing density
in all tracks# is given by:

where R2 and R1 are the radius of the outermost and innermost tracks respectively.
A related packing factor based on the maximum possible capacity at a constant
number of bits per track for all tracks is:

Note that this factor is in general greater than unity because the multizone format
utilizes available disk area more efficiently than would be possible with a single
clock rate for the entire disk (as would be required for a straight parallel
configuration).
In the multiple zone format every track in the same zone contains the same number
of bits# but each zone requires a different clock rate. The clock rates for a twozone format# for example, would be:
K

nW

in the outer zone

(K -1) nW in the inner zone.

As previously reported the Librascope 4800 disk file series is potentially attractive
because of the large number of heads per face. The model 4802 is of particular
interest because the increased packing density offers higher basic bit rates and
more capacity in the same basic unit. The salient characteristics are summarized
here.
J

No. disk per file - 6 (48" dia. )
Tracks per face

- 432

Packing density

- 2000 bits/inch

Track density

- 48 tracks I inch

RPM

- 900 (35 ms avg. access)

4-5

Table 4-1.

512-Bit, Parallel Organizations for the Librascope 4802
No Head
Modifications

4-6

All Selected Heads on
One Disk

One Face

Bits
Bits
Format (Out er Z one: Inner Zone)

3:2

4:3

3:2

4:3

Head Groups/Head Assembly
Bits/Face/Word
Heads/Sel. /Face/Word
Heads/Group
Faces Sel. /Unit

1
86
36
12
6 of 12

1
128
36
12
4 of 12

3
256
108
4
2 of 12

4
512
144
3
1 of 12

Ideal Packing Efficiency

83%

86. 3%

83%

86.3%

No. Stg. Words/Data Cylinder
32-Bit Segments/Data Cylinder

83840
2620

58000
1815

84500
2640

58000
1815

Word Transfer Rate, MHz

1. 26

O. 87

1. 27

0.87

Read-Write Channels

216

·144

216

144

Data Cylinders /Unit

24

36

24

36

6
Total Stg. Cap. Words/File (10 )

2.0

2.09

2.02

2.09

6
Total Stg. Cap. Bits/File (10 )

1024

1070

1030

1070

Clock Rate, Inner, MHz
Clock Rate, Outer, MHz

2. 52
3. 78

2. 62
3. 48

2. 54
3.81

2. 62
3.48

No. Tracks, Outer Zone
No. Tracks, Inner Zone

168
264

240
192

160
272

240
192

Head Groups, Outer Zone
Head Groups, Inner Zone

14
22

20
16

40
68

80
64

Head Sticks, Outer Zone
Head Sticks, Inner Zone

14
22

20
16

13 + 1/3
22 + 2/3

20
16

Clock Channels /Unit

12

8

4

2

The large number of accessible heads permit a number of possible disk organizations.
Without sacrificing capacity the following might represent limiting transfer rate
capabilities.
1.

Up to 128 bits per face or 1024 bits per file at a word
transfer rate of approximately o. 8 megaHertz without
modifications to head stick wiring.

2.

Up to 512 bits per face or 2048 bits per file at a word
transfer rate of approximately o. 8 megaHertz if head
stick wiring is revised to enable simultaneous
selection of four heads per stick.

3.

Up to 256 bits per face or 1024 bits per file at a word
transfer rate of approximately 1. 2 megaHertz if head
wiring is revised to enable simultaneous selection of
three heads per stick.

Still larger word lengths would be possible by revising head stick wiring to permit
simultaneous access to all heads but the resulting increase in the nun1.ber of head
wires from 15 to 39 would likely be troublesome. With no fewer than 3 heads per
group the number of wires per stick can be held to 23 for completely flexible headtrack sparing, or to 14 wires per head stick if restriction to group sparing is
acceptable. At the word lengths of interest it is necessary to either subdivide the
head stick wiring or concurrently select heads over several diskfaces. Both have
some potential disadvantages; the latter from the possible necessity for skew correction due to relative timing errors between several faces and/ or head plates,
and the additional clock channels required. At a word length of 512 bits either
approach is a possibility. Table 4-1 presents a summary for four different organizations: two involve no head stick revisions, and two require head revisions but
involve only one disk or one disk face, respectively. Two different two -zone formats (indicated in the table as the number of serial bits per word time in outer
zone: the number of serial bits per word time in the inner zone) offer words transfer rates of O. 8 to 1. 2 megaHertz. At the initial contact with General Precision,
Inc., they indicated a tentative preference for organizations involving concurrent
head selection on a single face because they did not think they had adequate
information at that time with respect to the skew problem.

4-7

SECTION V
PROGRAMMING

PE OPERA TIONS
This section describes modifications and additions to the list of instructions executed, at least in part, by the logic within Processing Elements. A11 instructions not mentioned here are functionally unchanged from those described in Section VII of Progress Report No.1 (TR-66-3; August 26, 1966). However, some
changes in instruction forma.ts are contemplated to accommodate certain new instructions. In particular, some of the new instructions will be ma.de var iants of
old instructions instead of new op- codes and these old instructions will have their
formats changed to incorporate variant fields.
In 32-bit mode, bits 0 and 32 are the sign-bits of the two operands in either the
A or the B Register; bits 1-7 and 33-39 are the exponents; bits 8-31 and 40- 63
are the mantissa magnitudes. Fixed- point operations always involve the mantissa
magnitudes and the mantissa signs only. Thus, in 32- bit mode fixed- point, operations are performed on 24-bit-plus-sign quantities. When an operation uses
- operands from both the A and the B Registers, corresponding bits from the two
registers are involved. A s an example, a double-length arithmetic shift (SHD)
treats bits 8- 31 of the A and B Registers as one 48- bit quantity and bits 40- 63 of
the two registers as another 48-bit quantity. Shlft counts are interpreted modulo
32 in 32-blt mode.
The PE Index Register is not affected by the word-length mode. For example,
Load Index from A Register (LXA) always transmits bits 52- 63 of the A Register
'to the Index Register.
.
There are now nine mode-bits per PE, designated as the \V, E, E1, F, F1, G,
I, and J bits. The \\1'- Bit is always set or reset by the Control Unit and designates
the word-length (ZERO implies 64-bit operands, ONE implies 32-bit operands').

5-1

In 64-bit mode the E-Bit enables or disables the changing of full operand registers
and the E1- Bit is not available for program use. In 32 - bit mode, the E- Bit controls the changing of bits 0- 31 of the registers and the E1- Bit controls bits 32- 63:
In test instructions executed by PEts, any register bit which is disabled from
being changed is also disabled from being tested to generate a change in any "mode
bit. In 64-bit mode, the F-Bit is set to ONE whenever an arithmetic fault occurs
in the PE and the F1-Bit is not available for program use. In 32-bit mode, the
F-Bit is set to ONE whenever a fault occurs in arithmetic performance on bits
0-31 of operands and the F1-bit is set to ONE whenever a fault occurs in bits
32-63. In 64-bit mode, the instruction list is augmented to permit specification
of anyone of the G, H, I, and J bits to hold the result of any test. For example,
the tests for the A-Register being zero now include GAZ and HAZ (Set G/H if A
Equals Zero) as well as IAZ and JAZ. The Control Unit can enable or disable a
PE as a result of the condition of the G and H bits in a manner identical to the
previous capability with the I and J bits. In 32-bit mode, the IAZ instruction sets
the G-Bit if bits 0-31 of the A-Register are zero and sets the I-Bit if bits 32-63
of the A-Register are zero. Of course, if the E-Bit equals ZERO, the G-Bit
is unchanged and if the E1-Bit equals ZERO, the I-Bit is unchanged. Similarly,
JAZ affects both the Hand J bits in 3~-bit mode. The operations of instructions
calling for setting of the G and H bits (e. g. GAZ, HAZ) are presently undefined
for 32-bit mode.
The new register in each PE is designated the ItS Register" (for Save Operand
Register or Special Register). The contents of the S Register are not involved
or altered by any arithmetic or shift instruction. Three instructions have been
defined involving the S Register:
SAS

Store A Register in S Register

Copy all of the A Register into the S Register. If in 64- bit mode and
E = ZERO, do not change the S Register. If in 32-bit mode and E = ZERO,
do not change bits 0-31 of the S Register. If in 32- bit mode and
E1 = ZERO, do not change bits 31-63 of the S Register.
IBS

Load B Register from S Register

SWAPS

Swap A, Band S Registers
A Goes to S
B Goes to A
S Goes to B

There is now a set of instructions which permit operand transmissions to begin
and/ or end at the MDR without involving the A and B Registers. These instructions are designed so that intermediate PE's can participate in multi- PE routing.
without having their operand registers altered.. The Control Unit has a single
instruction which causes a sequence of transfers among neighboring PE's to
accomplish the desired multi-PE routing. The instructions transmitted to the
PE's by the Control Unit in response to the single instruction executed by the·

5-2

Control Unit are also available for the program to call upon individually. These
instructions include MDR-to-MDR transmission with direction specified, storage
from the A or the B Register to the MDR of the same PE, and leading of the A
or B Register from the MDR of the same PEe A disabled PE does not execute
those instructions which load its A or B Register but does participate in the
others. This control of which PEls finally receive multi-PE transmissions,
coupled with the programmed control of the path distance and directions, permit
the single Control Unit instruction required. In effect, all PE' s originate
transmissions but, after the path control has been counted down, only enabled
PE's accept the result of the transmission. The operands which arrive at
disabled PE' s are retained in their MDR' s and may be discarded by the next
instruction causing their replacement. However, some saving of transmission
time may be accomplished, for certain routing patterns, if the first multi-PE
transmission is followed by a new transmission that starts with MDR-to-MDR
transmissions and which terminates with a different set of PEls being enabled
to accept the transmission results from their MDR's into their A and B Registers.
There are now instructions which cause the clearing of the exponent field of the
A or the B Register. Clearing of the mantissa field has always been available
as an end-off arithmetic shift with any shift count exceeding the length of the
mantissa. With the Barrel, shifting to clear a field takes no longer than the
execution of a special clear field instruction. Obviously, in assembly language
a clear mantissa instruction may exist which would assemble as a shift.
There is now a round variant on each of the pertinent arithmetic instructions.
In floating-point, rounding is accomplished before normalization to preserve
the prope r significance.
There are now instructions which add address fields to the previous contents of
the Memory Address Register, with the address fields being optionally provided
by fields within the instruction or the PE Index Register or both. This permits
straightforward, multi-level, indirect addressing when the level is known.
There is now an instruction which stores an operand from the A, B or Memory
Data Register of an enabled PE to the Common Data. Bus. When only one PE is
enabled, this provides the required transmission of an operand from a PE to the
Control Unit without requiring memory access. When two or more PEls are
enabled, the result of this instruction is undefined.
There are now instructions which carry an indexable- 6-bit field specifying one
bit within either the A or the B Registers. The bit number is interpreted modulo
64 in 64-bit mode.. In 32-bit mode, the bit number is interpreted modulo 32 and
as modulo 32 plus 32, allowing specification of corresponding 'bits in both
halves of the Register. The operations on and with the specified hit are:
• Set either the G, H, I or J bit to equal the specified bit.
In 32-bit ~ode, only I and J are defined. )

5-3

• Change the specified bit.
• Set the specified bit to ZERO.
• Set the specified bit to ONE.
There are now instructions which perform arithmetic on the magnitudes of the
operands.
There are now instructions which compare the magnitude of the A Register with
the magnitude of the B Register. Also, the magnitude of either register may
be compared with the Common Value. Comparison with the magnitude of the
Common Value is not included as a separate PE inst ruction since this may be
accomplished by having the Control Unit broadcast the magnitude only of the
Common Value.
The instructions which operate upon or compare the value of the PE Index
Register have been augmented so that the Index Register may be stored in,
loaded from,or compared with the least significant twelve bits of the B Register
or a memory word. When a memory word is involved in any of these instructions, the address in memory is indexable.

CONTROL UNIT INSTRUCTIONS
In the following numeric list of instructions, the first syllable is given in octal.
Op-code "000" is interpreted by the CU as a halt. Op-codes "001-177" are executed in part by the CU and part by the PE array. Op -codes" 200 -377" are
interpreted fully by the CD and no direct PE action results.
In ulti -syllabic instructions, the following abbreviations are used to indicate the
coding of the syllables other than the first:
A bit which does not affect the operation of the instruction being
described.

5-4

a

A bit which is part of an address or literal field.

b

A bit which is part of a field which designates a bit- number within
a register.

c

A bit which is part of a shift count ( in some instructions a
bit-number).

C

An eight-bit syllable used to designate an address within CU
local memory.

d

A bit which designates shift direction.

e

A bit which distinguishes between end -off and end -around shifts.
End -around shifts are mnemonically referred to as "Rotate. II

i

A variant bit which is defined as ONE for the specific variaI1t being
discussed.

o

A variant bit which is defined as ZERO for the variant being discussed.

L

An eight -bit syllable which is part of a literal field.

M

An eight -bit syllable which is part of an address field used by the CD
to address IVlain Memory.

v

A variant bit which has meaning in defining a variant described
elsewhere. Examples of this are given following this list.

x

A variant bit which controls indexing, by the PE Index Register,
of the address, shift-count or bit-number field given in the instruction.

As an example of the use of these abbreviations in the instruction list, consider
the instructions which transmit data between neighboring PE's. Each of these
instructions starts with an op -code syllable equal to octal 120 and has a second
syllable specifying variants. The variants permit the choice of the A Register,
the B Register or the Memory Data Register as the transmission origin; the same
three registers, or the Memory Address Register, may be the destination; the
transmission-direction may be North, East, South, or West. Two bits are used
for each of these variants, leaving two undefined bits in the variant syllable. The
two most significant bits specify the originating register, the next two bits specify
the destination register and the two least Significant bits specify the direction.
Transmission from the Memory Data Register to the Memory Data Register of the
North neighbor is indicated by coding" 10" for each of the register -designation
bits and "00" for the direction: 1010 .. 00, where the periods indicate the undefined
bits. In the instruction list this would appear "ioio .. 00". However, to avoid
listing each possible variant separately, the list contains the following t.hree
entries:
120 iovv .. vv

D-T-

120 vvio .. vv

-DT-

120 vvvv ..

--TN

00

The first entry denotes the coding of the bits that specify the origin and indicates
that the other variant bits have no affect on the origin. Similarly, the second entry
denotes destination and the third entry denotes direction. The mnemonic abbreviations show the characters that represent the variants and hyphens are used for
character positions that are used to denote other variants. In the specific example
being considered, "DDTN" means MDR -to-MDR transmission North and is encoded
"ioio .. 00".

5-5

Elements of the Control Unit
The control unit is conveniently described in terms of its registers and other
functional entities. The registers consist of:
• Eight mode function registers, E l' E , F l' F , G, H, I and J
2
although physically located in the processing e~ements, are addressed
by program as though they were registers of the control unit
• Index· registers, each 32 bits long
• 64-bit registers in high speed scratchpad storage.
these can be used as the broadcast buffer.

Sixty-four of

• Program counter
• Interrupt register and mask, or interrupt control, register
• Local register address pointer, 8 bits of register address.
The above registers are addressible uniformly using an 8 -bit register address
in the instructions. In addition, there are several registers which are not
addressible by the 8 -bit field, but are implicity addressed by all instructions
which are relevant to their use. They are:
'. Program lookahead, for holding a reservoir of program steps
independently of memory accessing.
• Address register, which serves as an accummulator for addresssized fields. It is 24 bits long.
•

A ccumulator (for want of a better name), a common register
referenced by all data manipulating instructions. It is 64 bits long.

• A queue of instructions and q,ata, which decouples the operations
which are solely within the CU from those that refer to the PE's.
This queue, like that in the B8500, has no effect on the operations
of the instructions except to permit a certain amount of parallelism,
and therefore is not discussed further.
Discussion of the Instructions
All registers within the CU are uniformly addressed by an 8 -bit field, which is indexable. The operation of transferring the I Register to the E Register, which is
"enable those PE's which had a true result on the most recent test involving the
I-Bit, " is accomplished by the same sequence of instructions as transferring
register number 24 to register number 42 which merely moves data around the
scratchpad. Similarly, a jump is accomplished by storing a new value in the
numbered register which is the program counter.

5-6

A bonus which comes from this approach is that capabili ty which is invented for
the use of one special case, such as reading a PE number from the E register, can
be used on any data within the CD, thus increasing programming flexibility.
The CUAccummulator is central to most CU operations. Its use is implied in
most CD data transmission instructions. Whenever an instruction is transmitted
from the CD to the PE array, the CD Accummulator may modify the PE instruction
or otherwise participate in the operation of the PEt For example, the PE instruction "Load A Register from Common Value" (LAC) transmits the contents of
The CD Accummulator to the A Registers of all enabled PE's. The accummulator
also receives the transmission from the PE in the "Store A Register to Common
Data Bus" Instruction (SAC).
When an instruction with an address, shift-count, or bit-number field is transmitted to the PE's, the contents of the CU Accumnlulator are added to this field
before transmission. Thus, multiple indexing with CU index registers is accomplished by ordinary addition to the CD Accummulator before issuance of the PE
instruction.
The CU also has_ one special index register for its own use in addressing w:ithin
its local memory. The act of placing a value in this index automatically causes
the addition of this value to the next instruction with a CD register number. This
register is always cleared after its use.

000

HALT

Halt All Operations

002

CHSA

Change Sign of A Register

003

CHSB

Change Sign of B Register'

004

SAP

Set A Register Positive

005

SBP

Set B Register Positive

006

SAN

Set A Register Negative

007

SBN

Set B Register Negative

010

CEA

Clear Exponent of A Register

011

CEB

Clear Exponent of B Register

013

CMB

Complement B Register

015

CLB

Clear B Register

016

NORM

Normalize A Register

020

SAD

Store from A Register to Memory Data Register

021

SBD

Store from

022

LAD

Load A Register From Memory Data Register

i3

Register to Memory Data Register

5-7

5-8

023

LBD

Load B Register from Memory Data Register

024

SAC

Store from A Register to Common Data Bus

025

SBC

Store from B Register to Comn1on Data Bus

026

LAC

Load A Register from Common Value

027 .

LBC

Load B Register from COlnmon Value

030

LMA

Load Memory Address Register from A Register

031

LMB

Load Memory Address Register from B Register

032

LMC

Load Memory Address Register from Common Value

033

LMD

Load Memory Address Register from Memory Data Register

034

LXA

Load Index Register From A register

035

LXB

Load Index Register from B Register

036

LXC

Load Index Register from Common Value

037

LXD

Load Index Register from Memory Data Register

040

SXA

Store Index Register in A Register

041

SXB

Sotre Index Register in B Register

042

SXC

Store Index Register to Common Data Bus

043

SXD

Store Index Register in l\1emory Data Register

044

ADAX

Add A Register to Index Register

045

ADBX

Add B Register to Index Register

046

ADCX

Add Common Value to Index Register

047

-ADDX

050

LDAM

Load A Register as Designated by Memory Address Register

051

LDBM

Load B Register as Designated by Memory Address Register

052

STAM

Store A Register as Designated by Memory Address Register

053

STBM

Store B Register as Designated by Memory Address Register

054

LDMM

Load Memory Address Register as Designated by Memory
Address Register

055

LDDM

Load Memory Data Register as Designated by Memory
Address Register

06_0

SAS

Store A Register in S Register

061

LBS

Load B Register from S Register

062

SWAP

Swap A and BRegisters

063

SWS

Swap with S Register (A to S: S to B: B to A)

064

DBA

Duplicate B Register from A Register

Add Memory Data Register to Index Register

100

xdcc cccc

SHA

Shift A Register Mantissa

101

xdcc cccc

SHB

Shift B Register Mantissa

102

xdcc cccc

SAL

Shift A Register Logical

103

xdcc cccc

SBL

Shift B Register Logical

104

xdcc cccc

RAL

Rotate A Register Logical

105

xdcc cccc

RBL

Rotate B Register Logical

106

xdcc cccc

SHD

Shift Double -Length Mantissa

107

xdcc cccc

SDL

Shift Double -Length Logical

110

x. bb bbbb

CHBA Change Specified Bit of A Register

111

x. bb bbbb

CHBB Change Specified Bit of B Register

112

x. bb bbbb

SBA

Set Specified Bit of A Register

113

x. bb bbbb

SBB

Set Specified Bit of B Register

114

x. bb bbbb

CLBA Clear Specified Bit of A Register

115

x. bb bbbb

CLBB Clear Specified Bit of B Register

120

oovv .. vv

A-T-

Inter -PE Transmission frorn A Register

120

oivv .. vv

B-T-

Inter -PE Transmission from B Register

120

iovv •. vv

D-T-

Inter-PE Transmission from Memory Data Register

120

vvoo •. vv

-AT-

Inter -PE Transmission to A Register

120

vvoi .. vv

-BT-

Inter-PE Transmission to B Register

120

vvio .. vv

-DT- Inter -PE Transmission to Memory Data Register

120

vvii .. vv

-MT-

Inter - PE Transmission to l\1emory Addres s Re gister

120

vvvv

--TN

Inter-PE Transmission North

120

vvvv · . oi

--TE

Inter - PE Transmis sion East

120

vvvv · . io

--TS

Inter-PE Transmission South

120

vvvv · . ii

--TW

Inter-PE Transmission West

121

oioo ioio

Clear A Register
CLA
BOOOO Boolean Function 0000

121

oHi ioio

A AND B; Result to A Register
AND
BOO01 Boolean Function 0001

121

oiii ioii

NIMP Not (A Implication B); Result to A Register
B0010 Boolean Function 0010

121·

oiii iiio

NRIMP Not(B Implication A); Result to A Register
BO 100 Boolean Function 0100

•• 00

5-9

5-10

121

oooi

ioio

DAB
'B0101

Duplicate A Register fromB Register
Boolean Function 0101

121

ioii

ioio

EOR
BOlIO

A Exclusive OR B; Result to A Register
Boolean Function 0110

121

ooii

ioio

OR
BOllI

A OR B; Result to A Register
Boolean Function 0111

121

o iii

iiii

NOR
BIOOO

Not (A or B); Result to A Register
Boolean Function 1000

121

ioii

ioii

MEQ
B100l

A Material Equivalence B; Result to B Register
Boolean Function 1001

121

oooi

ioii

NOTB
B1010

Complement of B Register Transmitted to A Register
Boolean Function 1010

121

ooii

ioii

RIMP
BI011

B Implication A; Result to A Register
Boolean Function 1011

121

ooio

iiio

CMA
Bl100

Complement A Register
Boolean Function 1100

121

ooli

iiio

IMP
BIlOl

A Implication B; Result to A Register
Boolean Function 1101

12 1

iiii

iiii

NAND
B1ll0

Not (A AND B); Result to A Register
Boolean Function 1110

122

oovv vvvv

122

oivv

vvvv

----M

Perform Arithmetic on Magnitude Only

122

vvvv vvvo

----R

No Rounding of Result

122

vvvv vvvi

----R

Round Result

122

vvoo ooov

ADD

Fixed Point Add A to B; Result to A Register

122

vvoo ooiv

UFAD

Unnormalized Floating Add A to B; Result to
A Register

122

vvoo oiov

SUB

Fixed Point Subtract B from A; Result to A Register

122

vvoo oiiv

UFSU

Unnormalized Floating Subtract B from A;
Result to A Register

122

vvoo ioov

MUL

Fixed Point Multiply A by B; Result to A & B

122

vvoo ioiv

UFMU

Unnormalized Floating Multiply A by B; Result
toA & B

122 . vvoo iiov

DIV

Fixed Point Divide A by B; Quotient to
Remainder to B

122

vvoo iiiv

UFDV

Unnormalized Floating Divide A by B; Quotient to
Remainder to B

122

vvoi

FAD

Float:ing A dd A to B; Result to A

ooiv

Perform A rithmetic on Sign and Magnitude

A~

A~

122

:vvoi oiiv

FSU

Floating Subtract B from A; Result to A

122

vvoi ioiv

FMU

Floating Multiply A by B; Result to A & B

122

vvoi iiiv

FDV

Floating Divide A by B; Quotient to A, Remainder, to B

122

vvio ooiv

EAD

Extend Precision After Floating Add; Extension of
Sum to A Register

122

vvio oiiv

ESU

Extend Precision After Floating Subtract; Extension
of Difference to A Register

122

vvio iiov

IDV

Integer Divide A by B; Quotient to A, Remainder to B

123

iiii

iiio

GR8

Te st 8- bit Bytes for A Greater than B; Result to A

123

iiii

ioii

LS8

Test 8- Bit Bytes for A Less than B; Result to A

123

ioii

ioii

EQ8

Test 8-Bit Bytes for A Equal to B; Result to A

130

oovv vvvv

G---

Set G-Bit as Result of Comparison of A Register
with B Register

130

oivv vvvv

H---

Set H-Bit as Result of Comparison of A Register with
B Register

130

iovv vvvv

1---

Set I-Bit as Result of Comparison of A Register with
B Register

130

iivv vvvv

J---

Set J -Bit as Result of Comparison of A Register with
B Register

130

vvoo vvvv

130

vvoi vvvv

- M- - -

Compare Magnitude only of A Register with Specified
State of B Register

130

vvio vvvv

- L---

Compare Logical Value of A Register with Specified
State of B Register

130

vvvv oovv

-- -- -

Compare Sign and Magnitude of B Register with
Specified State of A Register

130

vvvv oivv

- - -- M

Compare Magnitude Only of B Register with Specified
State of A Register

130

vvvv iovv

----L

Compare Logical Value of B Register with Specified
State of A Register

130

vvvv vvoo

-- EQ-

Test for A Equal to B

130

vvvv vvoi

-- LS-

Test for A Less than B

130

vvvv vvio

--GR-

Test for A Greater than B

131

oovv vv ..

G---

Set G-Bit as Result of Test of A or B Register

131

oivv vv ..

H---

Set H-Bit as Result of Test of A or B Register

Compare Sign and Magnitude of A Register with
Specified State of B Register

5-11

5-12

131

iovv vv .•

l---

Set 1- Bit as Result of Test of A or B Register

131

iivv

J---

Set J - Bit as Result of Test of A or B Register

131

vvov vv ..

-A--

Test A Register

131

vviv vv ..

-B--

Test B Register

131

vvoo oi. .

--Z

Test for Plus or Minus Zero

131

vvvo io ..

--PZ

Test for Plus Zero

131

vvvo ii. .

--0

Test for ALL ONES (Minus Full Scale)

131

vvvi

--S

Copy Sign- Bit to Designated Test Bit

132

oovv vv ..

GX--

Set G-Bit as Result of Index-Register Test

132

oivv vv ..

HX··-

Set H-Bit as Result of Index-Register Test

132

iovv vv ..

IX--

Set I-Bit as

132

iivv vv ..

JX--

Set J -Bit as Result of Index- Register Test

132

vvoo vv ..

-XE-

Compare Index Register for Equality with Specified
Quantity

132

vvoi vv ..

-XL-

Test for Index Register Less than Specified Quantity

132

vvio vv ..

-XG-

Test for Index Register Greater than Specified
Quantity

132

vvii vv ..

-XZ

Test for Index Zero

132

vvvv

-X-A

Compare Index Register with A Register

132

vvvv oi. .

-X-B

Compare Index Register with B Register

132

vvvv io ..

-X-C

Compare Index Register with Common Value

132

vvvv ii ..

-X-D

Compare Index Regsiter with Memory Data Register

134

xobb bbbb

GBA

Set G-Bit to Designated Bit of A Register

134

xibb bbbb

GBB

Set G-Bit to Designated Bit of B Register

135

xobb bbhb

HBA

Set H-Bit to Designated Bit of A Register

135

xibb bbbb

HBB

Set H - Bit to Designated Bit of B Register

136

xobb bbbb

lBA

Set I-Bit to Designated Bit of A Register

136

xibb bbbb

lBB

Set l- Bit to Designated Bit of B Register

137

xobb bbbb

JBA

Set J - Bit to Designated Bit of A Register

137

xibb

bbbb

JBB

Set J -Bit to Designated Bit of B Register

VV ••

00 ••

00 ••

Result of Index- Register Test

140

oovv aaaa

aaaa aaaa

GX-L

Set G-Bit as Result of Comparison of
Index Register with Literal

140

oivv aaaa

aaaa aaaa

HX-L

Set H-Bit as Result of Comparison of
Index Register with Literal

140

iovv aaaa

aaaa aaaa

IX-L

Set I- Bit as Result of Comparison of
Index Register with Literal

140

iivv aaaa

aaaa aaaa

JX-L

Set J -Bit as Result of Comparison of
Index Register with Literal

140

vvoo aaaa

aaaa aaaa

-XEL

Test for Index Equal to Literal

140

vvoi aaaa

aaaa aaaa

-XLL

Test for Index Less than Literal

140

vvio aaaa

aaaa aaaa

-XGL

Test for Index Greater than Literal

141

· ovv aaaa

aaaa aaaa

L-L

Load Specified Register with Literal

141

· ivy aaaa

aaaa aaaa

ADL-

Add Literal to Specified Register

141

· voo aaaa

aaaa aaaa

LALj
ADLA

Specify A Register

LBLj
ADLB

Specify B Register

LXLj
ADLX

Specify Index Register

141
141

· vol aaaa
• via aaaa

aaaa aaaa
aaaa aaaa

150

x.

aaaa

aaaa aaaa

STA

Store A Register

150

x.oi aaaa

aaaa aaaa

STB

Store B Register

150

x. io aaaa

aaaa aaaa

STD

Store Memory Data Register

150

x.ii aaaa

aaaa aaaa

STX

Sotre Index Register

00

.....

151

x.oo aaaa

aaaa aaaa

LDA

Load A Register

151

v.oi aaaa

aaaa aaaa

LDB

Load B Register

151

x. io aaaa

aaaa aaaa

LDD

Load :LyIemory Data Register

151

x.ii aaaa

aaaa aaaa

LDX

Load Index Register

152'

xooo aaaa

aaaa aaaa

LDM

Load Memory Address Register

152

xioo aaaa

aaaa aaaa

ADM

Add to Memory Address Register

152

xiii aaaa

aaaa aaaa

ADMX

A dd from Memory to Index Register

DUP

Duplicate Non-Zero Half of CU
A ccummulator

.201

FULL

Enter Full- Word (64-bit) Mode

202

HALF

Enter Half- Word Mode

204

ZEROF If CU Accumulator Is Not All ZEROS~
Skip Until the Next "UNSKIpl!
Instruction

200

5-13

205

ZEROT If CU A ccumulator is All ZEROS, Skip
Until the Next "UNSKI P" Instruction

206

ONESF If CU Accumulator Is Not All ONES, Skip
Until the Next "UNSKIP" Instruction

207

ONEST If CU A ccum ulator is all ONES, Skip
Until the Next "UNSKIP" Instruction

210

SKIPF

If the Result of the Last Test was False,

Skip Until the Next "UNSKIP" Instruction
211

SKIPT If the Result of the Last Test was True,
Skip Until the Next "UNSKIP" Instruction

212

UNSKIP Resume Executing All Instructions

220

LEA DO

Find Leading ONE in the CU Accumulator; Put Bit- Number in CU
Accumulator

221

LEADZ

Find Leading ZERO in the CU Accumulator; Put Bit- Number in CU
Accumulator

230

CCL

Clear CU A ccumulato!'

231

CCOM

Complement CU Accumulator

232

XCUA

Index by CU Accumulator

240

aaaa aaaa

STL

Store CU A ccumulator in CU Local
Memory

241

aaaa aaaa

STLC

Store CU A ccumulator in CU Local
Memory, Complemented

242

aaaa aaaa

LDL

Load CU f.. ccumulator from CU Local
Memory

244

aaaa aaaa

EXCH

Exchange CU. A ccumulator with CU
Local Memory

245

aaaa aaaa

EXCHC

Exchange Complement of CU Accumulator
with CU Local Memory

246

aaaa aaaa

CADD

Add to CU Accumulator

247

aaaa aaaa

CSUB

Substract from CU Accumulator

250

aaaa aaaa

CAND

AND to CU Accumulator

251

aaaa aaaa

COR

OR to CU Accumulator

252

aaaa aaaa

CEOR

Exclusive OR to CU Accumulator

5-14

Clear Designated Bit

260

oobb bbbb

CCLB

260

oibb bbbb

CSBO

Set Designated Bit to ONE

260

iobb bbbb

DDHB

Change De signated Bit

261

edcc cccc

CSHIFT Shift

262

oobb bbbb

CTSBZ

Test Bit for ZERO

262

oibb bbbb

CTSBO

Test Bit for ONE

270

aaaa aaaa

EQUALT

271

aaaa aaaa

EQUALF

272

aaaa aaaa

GRTRT

273

aaaa aaaa

GRTRF

274

aaaa aaaa

LESST

275

aaaa aaaa

LESSF

276

aaaa aaaa

XADD

Add to CU Index

277

aaaa aaaa

SUB

Subtract from CU Index

30U

ooVY

nnnn nnnn

RTA-

Route from A Registers

300

oivv

nnnn nnnn

RTB-

Route from B Register s

300

iovv

nnnn nnnn

RTD-

Route from Memory Data Registers

300

vvoo

nnnn nnnn

RT-A

Route to A Registers

300

vvoi

nnnn nnnn

RT-B

Route to B Registers

300

vvio

nnnn nnnn

RT-D

Route to Memory Data Registers

300

vvii

nnnn nnnn

RT-M

Route to Memory Addr(;ss Registers

310

LLL

SLIT

Short (24- bit) Literal to CU Accumulator

311

MMM

STO

Store One "Vord from CU Accumulator
to Main Memory

312

MMM

LOAD

Load One Word from Main Memory to
CU Accumulator

320

CMMM

BIN

Block Transfer into CU Memory from
Main Memory

321

CMMM

BOUT

Block Transfer Out from CU Memory
to Main Memory

3"30

LLLLLLLL

LIT

Full- Word Literal to CU Accumulator

5-15

SECTION VI
ILLIAC IV APPLICATIONS STUDY

The implementation on ILLIAC IV of the Cooley-Tukey algorithm for the calculation
of complex Fourier series has been studied. A method is described in this section.

DESCRIPTION OF THE COOLEY-TUKEY ALGORITHM
This is a method for evaluating the function X{j) for N values of the argument
= 0,1, ... ,N-1) when we are given the N complex coefficients A(k), k = 0,1, ••. ,
N -I, appearing in the Fourier sum that is used to define the function X(j).

(j

N-1
X (j)

=

I

j

A (k). W k

j

= O.

1•••. , N-1.

(1)

k=O
Here W is defined to be the principal N -th root of unity.

w

= e

27Ti N
/ I

(i

=J-T).

(2)

rhe inverse problem can also be solved by the same method. for if the N values
of X(j) corresponding to j = 0.1 •••.• N-1 are given. then the Fourier coefficients
appearing in equation (1) are defin ed by

A(k)

=;

N-1

L

X(j)

W~jk

k = O. 1 ••••• N-1

(3)

j= 0

which is similar in form to equation (1).
6-1

The following discussion is concerned with the problem as stated in the form given
in equation (1).
The straightforward use of equation (1) is equivalent to pre-multiplying the Ncomponent vector A(k) by the NxN matrix Wjk to obtain the N -component solution
vector X(j). This is easily implemented on ILLIAC IV which computes the components of X in parallel. The total number of operations required would be about
N2 where an operation is considered to be a complex multiplication followed by a
complex addition. The algorithm des cribed by Cooley and Tukey* can achieve the
same result with much less computation. The number of operations, in the most
favorable case, is proportional to N. log N rather than to N2. It is also economical
in st'orage requirements. These features make it a highly desirable method for
this problem, especially for large values of N.
Cooley and Tukey* showed that choosing N to be a power of 2 (N = 2m) has particular advantages for computation on a binary machine. "Vith this choice, the
algorithm takes the form of generating iteratively a sequence of m N-component
vectors. The first member of the sequence is derived by iteration on the vector
A(k) and the final member is the required vector X(j).

It is assumed throughout what follows that the choice N

=2m

has been made.

To define the sequence of vectors the indices are written in binary form.

(4a, b)

The equation (1) can then be written

, ... , L
k

m-1

By evaluating the indicated sums sequentially, the following definition of the sequence
Ar (r = 1, 2, ••• , m) is obtained. The notation used is that of Cooley and Tukey
except that the iteration parameter is represented by r instead of Z,. for ease of
typing.

*J. W.

Cooley and J. W. Tukey: An Algorithm for the Machine Calculation of
Complex Fourier Series. Mathematics of Computation, Vol. 19 (1965). pp. 297 - 301.

6-2

(5)

>·

I

A r- 1 (jo' • • • , j r- 2' k m-r , • • • , k O

A r (jo'· •• ' j r- l' k m-r- l' • • ., k o >:= k

Pr
W

m-r
where
p

1

:=

j

0

• k

m-1

• 2

m 1
(6)

p

r

:=

(j

r-1

+ J. ) k

• 2 r -1 +, ••• ,

O

• 2m - r

m-r

Writing out the two terms of the sum in equation (5) we obtain

A r (jo, •.• ,j r- l,k m-r- 1,···,k ):=A r- l(jO,···,j r- 2: 0, k m-r- I'···' kO) +
O
( - 1 )j r -1 A

r-

1 {j 0' • . • ,j

r

r-

:=

2' 1, k

m-r-

(7)

l' • • • , k 0>. W qr

2, 3, ••• , m

where
r
m r
qr -- (J' r - 2 • 2 -2 + , . . . , + J' 0 ). 2 -

(8 )

The desired components of X are then defined by the last menlber of the sequence.
(9)

MACHINE IMPLEMENTA TION
It was the suggestion of Cooley and Tukey that the value of
A. r (j 0' • • ., j r- I' k m-r- l' • • . , k O)

calculated by means of equation (7) be stored in a location whose address is
.• 2m-1 +
m-r
m-r-l
' • • • t + j r- 1· 2
+ k m-r- 1· 2
+ , .•
J.O

q

+ kO·
6-3

When this is done the storage requirements are minimized and the last array calculated gives the desired Fourier sum s, equation (9), in such an order that the
index of an X must have its binary bits put in. reverse order to yield its index in
array Am.
On any iteration, the components of Ar may be computed in parallel since the calculations defined by equation (7) may be carried 0 ut with all values of jo' . • ., j r- 2
and kO' • •. , k
1 simultaneously.
m-rIMPLEMENTATION ON ILLIAC IV
In order to perform the calculations indicated on ILLIAC IV, it is necessary, on the
r-th cycle, to have the values of both
A r- 1 (jo' •• • '. j r- 2' 0, k m-r- l' ••• , kO)
and

q
available in the same PE memory. The value of W r must also be available to
the PE. One then computes, according to equation (7), the value of Ar with the
same two indices.

To obtain the values of A
it may be necessary to

with the desired pair of indices in the same PE memory
PE to PE during the course of the calculations.

slln'l data from

The constant powers of W required by each PE, during the entire course of the
computation, are predetermined by the method in which the original coefficients
A{k) are distributed within the PE memories and by the scheme adopted for shifting
data between PE memories as the compu tation proceeds. These details will now
be discuss ed.

STORAGE OF THE COEFFICIENTS A(k)
Use is made of the fact that N has been chosen to be a power of 2: N:: 2m. It is
further assumed that m is greater than 8. To determine the location in which a
particular A(k) is stored, the representation of k as a binary number as in
equation (4b) is used. Let the last eight bits (k 7 , ••• ,kO) of this binary representation of k define the PE in which A(k) is stored. The las t two of these bits (k 1, kO)
d.etermine the number o~ the quadrant, the preceeding six bits (k , ••• ,k ) deter7
2
mine the number of the PE within the quadrant.

6-4

The remaini.ng m - S bits (km-I, •.• , kS) may be interpr eted as the add res s of a
storage location within the PE memory. To allow for the storage of both real and
imaginary parts of A(k), the real parts can be stored in the indicated location while
the imaginary part can be stored in the corresponding location of another block of
m emory, congruent to the block in which the real parts are stored. The size of
each of these blocks of m emory will be 2m- a words.
The interpretation of the binary representation of the index k is thus as shown in
figure 6-1.

Storage Location Withill PE

!
I

PE Number Within Quadrant
(

km - 1

ka

k7

k2

m-S bits

Figure 6-1.

6 bits

Ikl

If" Quadrant

kO

I

2 bits

Interpretation of k as a Binary Number to Define Storage
Location of A(k).

The quadrants of the ILLIAC IV array are numbered 0, 1, 2, 3. The numbering of
the PE's within the quadrant is shown in figure 6- 2. If p is the PE number then the
last 3 bits of the binary representation of p are the column and the first 3 bits are the
row numbers. When the initial data is stored as described, its distribution throughout the arrays is shown in figure 6-3. The number shown are k mod 256.

~
Row

0

1

2

3

4

5

6

0

0

1

2

3

4

5

6

1

8

9

10

11

12

13

14

2

16

3

24

4

32

5

40

6

48

7

56

7

.

7

15

Row

I
3 bits

57

- Figure 6-2.

58

59

60

61.

62

Col

;

3 bits

63

The Numbering of the PElS withIn a Quadrant

6-5

o

/

0

4

8

32

36

40

64

68

12

16

20

24

28

96

1

5

9

33

37

41

65

69

13

17

21

25

29

97

1

128
160
192
224

228

232

236

240

244

248

252

225

253

2

6

10

14

18

22

26

30

3

7

11

43

34

38

35

39

66

70

67

71

98

226

Figure 6-3.

15

19

23

27

31

254

227

I

3

99

255

The .Distribution of the Coefficients A (k) in the A rray Initially

COMPUTATION AND STORAGE OF INTERMEDIATE RESULTS
The calculations fall naturally into two parts, the first m- 8 iterations and the
final 8 iterations. In the following de scr iption, the binary bit s (always n~ in number)
of the locations referred to are to be interpreted in the same manner as the bi~ary .representation of k just described.

6-6

\

QUADRANT
NUMBERS

r

The first m-S cycles

1.

Calculate A

r

=

1,2, ... ,m-S.

(j, ... , j
I' k
l' ... kO)' by use of equation (7).
0
rm- r-

l' k
. 1'" . ,k )' i. e., at
rm-rO
address (j " " , j l' k
1" •. , k ) of PE (k , ... , kO)' Note that on the (nl-S / th
o
. rm-rS
7
cycle it is address (jo' ... , jm- 9) of PE (k , ... , kO) that is meant.
7

2.

Store the result in location (jo' ... ,j

The calculation is performed for all values of (jo' ... ,j -1) and of (k _ _ l' ... , kO)'
m
It is seen from equation (7) that for the computation of tbe A for any Inaex, the
storage locations defined for the quantities on both the left ~~d right hand sides of
the equation are in the same PE memory (the last S bits are the salne). Hence no
transfer of data between PE's is required. Thus all PE'sScompute in parallel and,
mon each cycle, each PE computes the value of A for 2
different values of the
. d ex.
r
In
The final S cycles r
1. ShiftA

=

m-7, m- 6, ... , n1.

(jo, ... ,j
r- l(jO,···,j r- 2,k m- r , ... ,k O) from location
.
m- 10,j r- 2)
j r- 3' k m-r
k O> to location (j 0' ... , j m- 10' k m-r ) in P~

in PE (j m- 9 I • • • I
I • • .,
(j m- 9' ... , j r- 2' k m-r- l' ... , kO)' Note that on the (m-7 )th cycle the shift meant

is from location (jo' ... , jm- 9) in PE (k , ... , kO) to location (jo' ... , jm-l 0' k7)
7
inPE (jm- 9' k , ... , kO)'
Note also, on the mth cycle the shift meant is from
6
location (jo' ... , jm-lO' jm-2) in PE (jm- 9' ... , jm-3' kO) to location (jo' ... , jm-l 0'
kO) in PE (j m- 9' ... , j m- 2)'
Calculate A (jo"'" j l' k
I' ... , k > by use of equation (7).
r
rm-rO
Note on the mth cycle it is A (jo"'" j 1) that is calculated.
2.

r

3.

r-

Store the result in location (jo' ... , j

m-

10' j

r-

1) in PE (j

m-

9' ... ,

j r- 2' k m-r- I' ... ,k )' Note on the mth cycle it is location (jo' ... ,. j m- 10' j m- 1 >
O
in PE (j
9' ... , j
2) that is meant.
mm-

Again the calculation is performed for all values of (jo' ••• ' jr-l) and of
(k
. m-r- I" • • • I kO). The shift called for in step 1 is designed to bring together in
the same PE the values of A 1 with the two indices. They differ only in the r-th

r-

bit appearing on the right-hand side of equation (7).

Once the shifts have been

accomplished all PEls compute in parallel and each computes the values of A for
..
r
2 ffi S values of the index.

6-7

The shifts required in step 1 are determined by the values of the bits j 2 and
k
. Since only four possible combinations of the values of these blt§-are possillrl, the corresponding shifts may be tabulated.

Table 6-1.

Shifts Required for Combinations of Bits j

Bit
jr-2

r-

2 and k

m-r

Shift
k

m-r

0

0

0

1

1

0

No shift.
Shift to PE of lower number (_2 m - r ).
Increase location by 1.
Shift to PE of higher number (+2 m - r).
Decrease location by 1.

1

1

No shift.

'----.

Since the possible combinations of bit values occur with equal frequency it is seen
from the above table that on any cycle (r) precisely half the data has to be shifted
by inter-PE shiftin£. Of the data that is shifted, half goes to a PE of higher
m r
m r
number (+2 - ) and half goes to a PE of lower number (_2 - ). This shifting is
such that on cycle r, two PE's whose number in the array differ only in the r- (m- 8)
bit position exchange a word of data, for each two word s they contain.
The required shifting of data between PE memories is accomplished by one or two
routing instructions. In the fir st cycle requiring such a shift each PE in the block
of 128 PE's (those numbered 0-127) sends and receives words from the corresponding PE in the second block of 12~ PE's (128- 255). Examination of the PE
numbering system of figure 6-3 shows that the first four rows of PE's in each
quadrant exchange words with the second four rows. The end-around cylindrical
connection of the PE's on the North and South edges of each quadrant are such that
sending and receiving can be accomplished in one instruction, since all PE's in a
quadrant shift a word 4 rows South, end- around, simultaneously.
In the second cycle requiring a shift, each PE in 2 blocks of 64 PE's (0- 63; 128-191)
exchanges words with the corresponding PE in the corresponding block of the remaining 2 blocks of 64 PEls (64-127; 192-255). This requires the first two rows
of PEls within a quadrant to exchange words with the second two rows and the
third pair of rows to exchange words with the fourth pair of rows. In this case

6-8

the end- around connection cannot be used and the exchange takes place under
mode control in the two parts. First, rows I, 2, 5, 6 transmit data two rows
South to rows 3, 4, 7, 8 respectively. Second, rows 3, 4, 7, 8 transmit data
two rows North to rows I, 2, 5, 6 respectively.
In the third cycle information is exchanged between adjacent rows.
The following three cycles repeat the same pattern of shifts but between columns
rather than rows. Next a shift of 8 rows is required. This involves the whole
array as data moves from quadrant to quadrant for the first time. Use could be
. made of the end- around cyclindinal connection of the whole array, as in the first
shift described. The final shift is similar, being of distaI1ce 8 between columns.
This pattern of shifts is listed in table 6-2.

Table 6- 2.

Shifting Required for the Final Eight Iterations

Block s of PEl s
Cycle
Number

Size

Distance of
Shift Required
Nearest Number
4*

NS

64

2

NS

8

32

1

NS

m-4

16

16

4*~~

WE

m-3

32

~

2

WE

m-2

64

4

1

WE

m-l

128

2

8*

NS

m

256

1

8** WE

m-7

2

128

m-6

4

m-5

* Denotes that end- around connectivity may be used

to allow both of the shifts to

occur simultaneously.
**

Shifts can also use end- around connectivity, but will require one more step in
routing than "*" shifts, because of the differences in edge connectivity patterns.

6-9

We now consider the powers of W that have to be available in each PE. According
to equation (7) the computation
on the r-th iteration, of A (jo"'" j -1' k _ -1"'"
qr
k ) requires the use of W
where q is given by equation f8). With 1he dJra r
sPorage scheme described the addre§s of A within the PE determines the power
. r
.
of W required in its calculation. According to equation (8), on the first m-8
He rations, put the first r-1 bits of the address in reverse and m.ultiply the resulm r
tant number by 2 - • This gives the required powerOof W. On the first iteration,
no bits are selected by this rule--corresponding to W • For these iterations all
PEls require the same power of the W at the same time. Thus, these powers should
be broadcast, and not stored repetitively in each PE.
On the final 8 iterations one has Ar_1 (jo, .•. , jr-2 k m - s ' •.• , k O> stored in location
jo,".' jnl-10' k m _ r > in PE (jm-9,"" jr-2' k m - r - 1, •. ·, k O)" After the required
shifting has taken place, take the r-1 bits that have been underlined (the first
m-9 in the memory address followed by the first r-m + 8 in the PE number), invert
these bits and multiply the number obtained by 2m-r. This is the power of W
required. Since these powers of W depend on the PE number they should be stored
within the PE memories.
On each iteration one power of W for each pair of indices of Ar within the PE is
required. The number of such pairs is 2 m - 9 and the number of iterations in this
mode is 8. Thus 2m - 6 values of powers of Ware required by each PE.
A s in the standard algorithm, at the completion of the m-th iteration, to f.ind the
location of X(j) in the A
array, one interprets the bits of j in reverse as a location in the PE memorTes. However, after reversing the bits of j it is necessary
to shift the final bit (j -1) left eight plac es (to between j -10 and j -9> before
terpreting the locatio!?as in figure 6-1.
m
m

COMPUTA TIONS REQUIRED
To give some idea of the magnitudes involved some figures are given here for
N = 4096 = 212. (m = 12).

=

1.

Number of iterations required (m)

12

2.

Number of coefficients A (k) in each PE(2 m- 8) = 16

3.

Number of values of W stored in each PE (2

m-6

) = 64

m

4.
Computation required for each pair of coefficients (2 - 9 = 8) per
P~, is indicated by equation (7), 1 complex multiplication and 2 complex additions.
This in terms of real arithmetic, amounts to 4 multiplications and 6 additions for
each pair of coefficients. On the final 8 iterations data transfers occur and 2
words (real and imaginary part of one of the pair) are transmitted and 2 words
received for this amount of calculation. Since there are 8 pairs of coefficients

6-10

in each PE, for each iteration a PE:

1.

Transmits 16 word s of data

2.

Receives 16 words of data

3.

Performs 32 multiplications

4.

Performs 48 additions.

}

Not required on first
4 of the 12 iterations

During the first 4 iterations one is effectively solving 256 problems in parallel
(one in each PE), each problem being of size N = 16. Duripg the final 8 iterations one is solving equivalently, 8 problems, each problem being of size N = 512.
These last problems are distributed uniformly throughout the array.

6-11

SECTION VII
CIRCUIT DESIGN - THE ECL CIRCUIT

The basic logical circuit for ILLIAC IV is the current switch, or ECL circuit,
shown in figure 7 -1. Logical operations can be performed in this circuit by supplying input transistors in parallel, by tying collectors together into a common
collector resistor, and by connecting together the output ernitters. Gating on the
input transistors appears as "not OR" for positive -going signals when seen at the
inverting output "b, " or as OR when seen at the noninverting output, Ila." For
negative -going signals, gating at the input transistors appears as "not AND" and
AND, respectively, at the inverting and noninverting outputs. Logic gating can also
be done by sharing a single resistor among otherwise independent collectors. Such
sharing produces an OR for negative signals or an AND for positive signals, and
would appear to be a way of adding another gate to the logic without adding any
components or any delay_ However, one must take care of the case that two collectors are delivering current into the resistor simultaneously, either by clamping
the voltage, which costs components, or by ensuring in the logical design that no
more than one current flows at anyone time. Mixing outputs by tying the emitters
together performs exactly the same function as mixing inputs of the next stage in
the input transistor, namely negative AND or positive OR. Implementation of any
logic equation using such gates can always be found by translating the AND - OR
description implicit in the 19ical equation into a NAND-NAND description, level for
level. As reference to the logical design of the processing element elsewhere in
this report will show, designs can often be simplified considerably from the
directly translated version.
The use of ECL circuits differs depending upon the amount of wiring which must be
driven and the speed to be obtained. When integrating within the semiconductor
array, freedom to use the ECL gates just described is virtually' unlimited. When
_ any wiring is involved, however, the low impedance point, namely the emitter
outputis used as the source when signals must be sent along conductors from one
circuit package to another. Further, fanout suffers on signals which must leave
the circuit package because of the need to supply damping and terminating
resistors for the wiring.

7 -1

v cc

-0

(b)

Inverting Output
(a)

IN o---;--t

Non- Invertlng Output

V

ee

Figure 7-1.

EeL

Gate~

Schematic Diagram

I

I1 +3.5v

200

I

+1. 2v

I

1____ ---.

61

--l---l

I

In
From
Logic

I
I

I

0--+--1

I

-3.

v

I
I

183

Line

36

I
I

-2. Ov

o-----~------------I~--------~

I

Equivalent
Load

Note: All Resistance Values in Ohms.
Figure 7-2.

Driver~

Schematic Diagram

I

+1. 2v

120

I

__ ..J

I~
From
Line

V

ref

320

2K

-3.5v

Note: All Resistance Values in Ohms.
Figure 7-3.

7-2

Receiver~

Schematic Diagram

Out
to
Logic

A t the collector, signal levels are approximately +1. 2v and +0. 4v. The swing is
set essentially by the current in the emitter, it is related therefore to V , V
and resistor ratios. The more positive level departs from V
by an arrrcrunt cc
determined by base current in the outp.1 t transistor, a few peF8ent at most of the
"on" current. At the II output" emitter in figure 7 -I, the voltage is downshifted by
an amount equal to the base -to-emitter voltage of the output transistor. The outp.1 t
therefore swings from +0. 4v approximately to -0.4v approximately. These output
voltages are made meaningful with respect to the nominally zero volt input threshold whatever gate receives the.
Signals which must travel considerable distance, such as between cabinets, for
instance, may need more margin than that supplied in the signals at the output of
the ECL gates. The extra margin is needed to overcome distortions in the signal
due to imperfect impedance matching in the wiring, unwanted components due to
crosstalk, noise from external sources, and discrepancies in temperature and
perhaps even in "zero volts" between the two cabinets. The requirements for
these non -EeL signals appear to be as follows:
•

Compatibility with ECL levels at the input of the driver and the
output of the receiver

•

Larger signal swing (3. Ov based upon previous Burroughs experience)

f)

•

Drive capability for several transmission line characteris impedances
in parallel (at least two; or 36 ohms load .on each signal, if 72 -ohm
line is used)
Fanout of 64 (from control unit to all 64 P. E' s); this implies an input
inlpedance of over 2. 3k ohms per receiver if 32 receivers are to be
attached to a single 72 -ohm line. )

A driver circuit which satisfies these requirements is shown in figure 7 -2. Note
that the portion of the circuit to the left of the dotted line in the driver circuit is
identical (except for a somewhat lower "impedance .level) to our standard EeL gate.
When this driver circuit is implemented as a portion of a large-scale integrated
array, the portion of the circuit on the left, which is "the same" as the standard
EeL circuit, is ·available for performing some logic function. Only the noninverted output, which feeds the large -swing signal, would be unavailable for
normal EeL use.
A receiver circuit which satisfies these requirements is shown in figure 7 -3.
Note that the portion of the circuit to the right of the dotted line in the receiver
circuit is identical to our standard EeL gate. When this receiver circuit is implemented as a portion of a large -scale intregrated array, the portion of the
circuit on the right, which is the same as the standard EeL circuit, is available
for performing logic functions within the array.

7 -3

+ 1.2v
-3.5v

'---+---o OUT

IN

OUT
~FALSE)

LONG LINE

-3.5v
-3.5v
DRIVER

Figure 7-4.

RECEIVER

EeL Driver-Receiver, Balanced Signals, Schematic Diagram

The signal across the interface, in this system, swings from +2. 5v to -0. 4v.
Threshold is at 1. 2v, at room temperature, and has a slight negative temperature
coefficient. Since rise times no faster than 15 ns are of interest, the gain of the
transistors at 25 mc or thereabout controls the input impedance of the receiver. With
transistors whose cutoffs are in the hundreds of n'legacycles, gains of 30 are easily
available. The +3. 5v supply can be disabled to control intercabinet transfers.
An alternative dirver -receiver circuit has been suggested in which each signal
is transmitted balanced with respect to ground.
This circuit is shown in figure
7 -4. It has a signal swing of 1. 6v, the differential signal between the two outputs,
but makes up for lowered signal swing with increased noise immunity. The noise
immunity is almost equal to half the minimum signal swing, compared to a noise
immunity of 30% to 350/0 of the signal swing in the system exemplified by the driver
and receiver of figures 7 -2 and 7 -3. This scheme has a further advantage in
rejecting noise, in that some noise sources tend to induce a common mode component in the line. This single -ended scheme rejects common mode noise
essentially perfectly up to some maximum amplitude, while the scheme depends
on the coupling between the two conductors of the line to induce a noise component in one conductor equal to the noise component induced on the other by some
external source. The latter scheme is extreluely effective, but not as effective
as balanced signaJs.
An advantage of the balanced signal driver and receiver is that they are identical in design and fabrication with standard EeL gates.
Disadvantages of the balanced signal scheme are severe for certain proposed
uses. In particular, it is not possible to put more than one driver on one signal
wire. For the connections from the PE's back to the memory access buffer,
it would be necessary to give each PE its own expensive wire back to its own
private receiver at the memory access buffer. Using the unbalanced, 3.0 v
signal, one can combine all eight drivers on a single data line. This defect
will be general, whenever several data sources converge on a common destination.
A second defect of the balanced signal scheme arises because the wiring must
be balanced with respect to ground. Thus one must use pairs of wires, either
twisted pair, or shielded twisted pair. Twisted pair is inferior to coaxial cable
or ribbon cable in terms of crosstalk; shielded twisted pair is considerably more
expensive both to buy and to install than coaxial cable and ribbon cable. Even
unshielded twisted pair, when procurred in belted form, can be surprisingly
expensive.
A third limitation of the balanced signal scheme arises because of the difficulty
in designing for low .output impedance. As described to us, the drive capability
of the balanced driver was only sufficient to drive a single transmission line. In
the case of data being transmitted from one PE to neighboring PE's, the data
transmission paths go in both directions from the transmitting PEa For optirEum
per.formance, it is necessary to drive two transmission lines, one going in each

7 -5

direction. Therefore, two balanced driver circuits are required to handle the
signal. Greater fan -out at the receiver end is also available with the unbalanced design.
The conclusion is that either driver and receiver design is satisfactory for use
in cables up to some maximum length, where not more than one driver per signal set is required. Ordinary twisted pair can probably be driven by the balanced
driver design in medium length runs of up to 10 or 30 feet where the unbalanced
driver would require coaxial cable or the equivalent. Beyond that, in either
case, higher quality wire is required, such as ribbon cable using three wires
per signal, coaxial cable, or shielded twisted pair. For this last class of signals,
single -ended signals would be more economical of wiring, would be directly
compatible with popular types of "foreign" logical circuitry which is likely to
be found in external equipment, and would be more compatible with any requirement such as r. f. filtering. A table of "long lines" signals is in table 7-1.
Table 7 -1 contains our conclusions on the implementation of such "long lines"
Signals. Either balanced or unbalanced signals will be acceptable within each
quadrant, assuming each quadrant to be packaged within a single unit, a set of

Table 7 -1.

Signals Requiring Driver and Receiver
Estimated
Suitable
Length (feet) Driver Design

Class of Signals
Pe to PE (same cabinet)

no driver needed, standard ECL signals suitable

short

6

either

- - --

PE to 110 Buffer

30

either

- - --

PE to memory access buffer

30

single ended

30

either

- ---

CD to CD, different quadrants 80

single ended

economically more
attractive wiring

200

single ended

compatibility with
foreign signals.

Pe to PE (different cabinets)"

I

PE from CD, control signals
and mode control

CD to peripheral devices

7-6

Comments

drivers must be aRable

cabinets bolted together.
For longer runs, the lower price of coaxial c;~::., ..
the greater compatibility of unbalanced signals at a "foreign" interface ~.,(,.'::,' ,
call for the use of 3. Ov unbalanced signals between quadrants, and frGrn th . :
array processor to the outside.
Figure 7 -5 shows an example of how the unbalanced system nlight be used ~('
distribute control signals from the CU to the 64 PE's assunling tha t the I'~: :';: :,"
receiver in each PE. A fanout of 64, as shown, may be beyond the C'~qnL~l: ~:'
of either scheme at the required speed, especially the balanced signal SC!:(·~ ....
However, the grouping of the PE s is such that one receiver should do for ;1<'.::'
or eight PE' s.

32 Receivers
(in PE's)

r----4f-+-----1~---_t>_+-------

-

-

-

-

-

-

----e---f---

r - - - - * - - - - - * - - - - - + - - - - - - - - - - - - - - - - - - - - - - - -...- ...-... -..

Drive r

32 Receivers

(in CU)

(in PE's)

L.-...1f---Clf-+-----1~---~>_+------ -

-

-

-

-

-

----.e---t--

L........._________-----+----------.--------->-----~-."

Figure 7-5.

Use of Drivers and Receivers for Distributing COc.tI>,.)1
Signals from C U to PEl S

..'" ..



Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.3
Linearized                      : No
XMP Toolkit                     : Adobe XMP Core 4.2.1-c043 52.372728, 2009/01/18-15:56:37
Create Date                     : 2014:03:01 14:44:47-08:00
Modify Date                     : 2014:03:01 14:00:20-08:00
Metadata Date                   : 2014:03:01 14:00:20-08:00
Producer                        : Adobe Acrobat 9.55 Paper Capture Plug-in
Format                          : application/pdf
Document ID                     : uuid:4bf7a417-ada1-a64c-b6b6-1239b08bd0e7
Instance ID                     : uuid:b0197436-cfcf-9e4e-af8c-1a1c12bbbe8c
Page Layout                     : SinglePage
Page Mode                       : UseOutlines
Page Count                      : 74
EXIF Metadata provided by EXIF.tools

Navigation menu