AFIPS
CONFERENCE
PROCEEDINGS
VOLUME 36

1970
SPRING JOINT
COMPUTER
CONFERENCE
May 5-7, 1970
Atlantic City, New Jersey

The ideas and opinions expressed herein are solely those of the authors and are not
necessarily representative of or endorsed by the 1970 Spring Joint Computer Conference Committee or the American Federation of Information Processing Societies.

Library of Congress Catalog Card Number 55-44701
AFIPS PRESS
210 Summit Avenue
Montvale, New Jersey 07645


©1970 by the American Federation of Information Processing Societies, Montvale,
New Jersey 07645. All rights reserved. This book, or parts thereof, may not be
reproduced in any form without permission of the publisher.

Printed in the United States of America

CONTENTS
GRAPHICS-TELLING IT LIKE IT IS
An algorithm for producing half-tone computer graphics presentations
  with shadows and movable light sources ............................ 1
    J. Bouknight, K. Kelley
The case for a generalized graphic problem solver ................... 11
    E. H. Sibley, R. W. Taylor, W. L. Ash

PATENTS AND COPYRIGHTS
(A Panel Session-No Papers in this Volume)
MULTIPROCESSORS FOR MILITARY SYSTEMS
(A Panel Session-No Papers in this Volume)
THE INFORMATION UTILITY AND SOCIAL CHOICE
(A Panel Session-No Papers in this Volume)
ANALOG-HYBRID
A variance reduction technique for hybrid computer generated random
  walk solutions of partial differential equations .................. 19
    E. L. Johnson
Design of automatic patching systems for analog computers ........... 31
    T. J. Gracon
18 bit digital to analog conversion ................................. 39
    J. Raamot
A hybrid computer method for the analysis of time dependent river
  pollution problems ................................................ 43
    R. Vichnevetsky, A. Tomalesky

TOPICS OF SPECIFIC INTEREST
Programmable indexing networks ...................................... 51
    K. Thurber
The debugging system AIDS ........................................... 59
    R. Grishman
Sequential feature extraction for waveform recognition .............. 65
    S. Yau, W. J. Steingrandt
Pulse amplitude transmission system (PATSY) ......................... 77
    N. Walters

PROGRAM TRANSFERABILITY
(Panel Session-No Papers in this Volume)
COMPUTING IN STATE GOVERNMENT
(Panel Session-No Papers in this Volume)

ALGORITHMIC STRUCTURES
Termination of programs represented as interpreted graphs ........... 83
    Z. Manna
A planarity algorithm based on the Kuratowski theorem ............... 91
    N. Gibbs
Combinational arithmetic systems for the approximation of functions .. 95
    P. Mei, C. Tung, A. Avizienis

OPERATING SYSTEMS
Operating systems architecture ...................................... 109
    H. Katzan, Jr.
Computer resource accounting in a time sharing environment .......... 119
    L. Selwyn
Multiple consoles-A basis for communication growth in large systems .. 131
    D. Andrews, R. Radice
Hardware aspects of secure computing ................................ 135
    L. Molho
TICKETRON-A successfully operating system without an operating
  system ............................................................ 143
    H. Dubner, J. Abate
Manipulation of data structures in a numerical analysis problem
  solving system-NAPSS .............................................. 157
    L. Symes

MICROPROGRAMMING
A study of user-microprogrammable computers ......................... 165
    C. Ramamoorthy, M. Tsuchiya
Firmware sort processor with LSI components ......................... 183
    H. Barsamian
System/360 model 85 microdiagnostics ................................ 191
    N. Bartow, R. McGuire
Use of read only memory in ILLIAC IV ................................ 197
    H. White, E. K. C. Yu

LESSONS OF THE SIXTIES
(Panel Session-No Papers in this Volume)

DIGITAL SIMULATION APPLICATIONS
A model and implementation of a universal time delay simulator for
  large digital nets ................................................ 207
    A. Szygenda, D. Rouse, E. Thompson
UTS-1: A macro system for traffic network simulation ................ 217
    H. Morgan
Real time space vehicle and ground support systems software
  simulator for launch programs checkout ............................ 223
    H. Trauboth, C. Rigby, P. Brown, R. Gerard
Remote real-time simulation ......................................... 237
    O. Serlin
MARSYAS-A software system for the digital simulation of physical
  systems ........................................................... 251
    H. Trauboth, N. Prasad

COMPUTERS IN EDUCATION: MECHANIZING HUMANS OR
HUMANIZING MACHINES
(A Panel Session-No Papers in this Volume)
PROPRIETARY SOFTWARE-in the 1970's
(A Panel Session-No Papers in this Volume)

HUMANITIES
Picturelab-An interactive facility for experimentation in picture
  processing ........................................................ 267
    W. Bartlett, E. Arthurs, D. Ladd, R. Salmon, J. Whipple
Power to the computers-A revolution in history? ..................... 275
    S. Hackney
Music and the computer in the sixties ............................... 281
    R. Erickson
Natural language processing for stylistic analysis .................. 287
    H. Donow

INFORMATION MANAGEMENT SYSTEMS-FOUNDATION
AND FUTURE
An approach to the development of an advanced information
  management system ................................................. 297
    J. Myers, S. Chooljian
The dataBASIC language-A data processing language for
  non-professional programmers ...................................... 307
    P. Dressen
LISTAR-Lincoln Information Storage and Associative Retrieval system .. 313
    A. Armenti, S. Galley, R. Goldberg, J. Nolan, A. Sholl
All-automatic processing for a large library ........................ 323
    N. Prywes, B. Litofsky
Natural language inquiry to an open-ended data library .............. 333
    G. Potts
SYSTEM ARCHITECTURE
Computer instruction repertoire-Time for a change ................... 343
    C. Church
The PMS and ISP descriptive systems for computer structures ......... 351
    C. Bell, A. Newell
Reliability analysis and architecture of a hybrid-redundant digital
  system: Generalized triple modular redundancy with self-repair .... 375
    F. Mathur, A. Avizienis
The architecture of a large associative processor ................... 385
    G. Lipovski

NUMERICAL ANALYSIS
Application of invariant imbedding to the solution of partial
  differential equations by the continuous-space discrete-time method .. 397
    P. Nelson, Jr.
An initial value formulation for the CSDT method of solving partial
  differential equations ............................................ 403
    V. Vemuri
An application of Hockney's method for solving Poisson's equation ... 409
    R. Colony
Architecture of a real-time fast Fourier radar signal processor ..... 417
    R. Reynolds, S. Wong, A. Zukin
An improved generalized inverse algorithm for linear inequalities and
  its applications .................................................. 437
    L. Geary, C. Li

SON OF SEPARATE PRICING
(Panel Session-No Papers in this Volume)

SOCIAL IMPLICATIONS
The social impact of computers ...................................... 449
    O. Dial

COMPUTER SYSTEM MODELING AND ANALYSIS
A continuum of time-sharing scheduling algorithms ................... 453
    L. Kleinrock
The management of a multi-level non-paged memory system ............. 459
    F. Baskett, J. Browne, W. Raike
A study of interleaved memory systems ............................... 467
    G. Burnett, E. Coffman, Jr.

MEDICAL-DENTAL APPLICATIONS
A computer system for bedside medical research ...................... 475
    S. Wixson, E. Strand, H. Perlis
Linear programming in clinical dental education ..................... 485
    C. Crandell
Automatic computer recognition and analysis of dental x-ray film .... 487
    D. Levine, H. Hopf, M. Shakun

PROGRAMMING LANGUAGES
A translation grammar for ALGOL 68 .................................. 493
    V. Schneider
BALM-An extendable list-processing language ......................... 507
    M. Harrison
Design and organization of a translator for a partial differential
  equation language ................................................. 513
    A. Cardenas, W. Karplus
SCROLL-A pattern recording language ................................. 525
    M. Sargent III
AMTRAN-An interactive computing system .............................. 537
    J. Reinfelds, N. Eskelson, H. Kopetz, G. Kratky

RESOURCE SHARING COMPUTER NETWORKS
Computer network development to achieve resource sharing ............ 543
    L. Roberts
The interface message processor for the ARPA computer network ....... 551
    F. Heart, R. Kahn, S. Ornstein, W. Crowther, D. Walden
Analytic and simulation methods in computer network design .......... 569
    L. Kleinrock
Topological considerations in the design of the ARPA computer
  network ........................................................... 581
    H. Frank, I. Frisch, W. Chou
HOST-HOST communication protocol in the ARPA network ................ 589
    S. Carr, S. Crocker, V. Cerf

REQUIREMENTS FOR DATA BASE MANAGEMENT
(Panel Session-No Papers in this Volume)

MAN-MACHINE INTERFACE
A comparative study of management decision-making from
  computer terminals ................................................ 599
    C. Jones, J. Hughes
An interactive keyboard for man-computer communication .............. 607
    L. Wear
Linear current division in resistive areas: Its application to computer
  graphics .......................................................... 613
    J. Turner, G. Ritchie
Remote terminal character stream processing in Multics .............. 621
    J. Ossanna, J. Saltzer

ARTIFICIAL INTELLIGENCE
A study of heuristic learning methods for optimization tasks requiring
  a sequence of decisions ........................................... 629
    L. Huesmann
Man-machine interaction for the discovery of high-level patterns .... 649
    D. Foster
Completeness results for E-resolution ............................... 653
    R. Anderson

DATA COMMON CARRIERS FOR THE SEVENTIES
(A Panel Session-No Papers in this Volume)

MINICOMPUTERS-THE PROFILE OF TOMORROW'S COMPONENTS
A new architecture for mini-computers-the DEC PDP-11 ................ 657
    G. Bell, R. Cady, H. McFarland, B. Delagi, J. O'Laughlin,
    R. Noonan, W. Wulf
A systems approach to minicomputer I/O .............................. 677
    F. Coury
A multiprogramming, virtual memory system for a small computer ...... 683
    C. Christensen, A. Hause
Applications and implications of mini-computers ..................... 691
    G. Hendrie, C. Newport

BUSINESS, COMPUTERS, AND PEOPLE?
Teleprocessing systems software for a large corporation information
  system ............................................................ 697
    H. Liu, D. Holmes
The selection and training of computer personnel at the Social Security
  Administration .................................................... 711
    E. Coady

PROCESS CONTROL
(A Panel Session-No Papers in this Volume)

Editor's Note:
Due to the recent embargo of mail, several papers went to press without author and/or proofreader corrections.

An algorithm for producing half-tone computer graphics
presentations with shadows and movable
light sources
by J. BOUKNIGHT and K. KELLEY
University of Illinois
Urbana, Illinois

INTRODUCTION

In the years since the introduction of SKETCHPAD, an increasing number of graphics systems for line drawing have been developed. Software packages are now available to do such things as picture definition, rotation and translation of picture data, and production of animated movies and microfilm. Automatic windowing, three-dimensional figures, depth cueing by intensity, and even stereo line drawing are now feasible and, in some cases, available in hardware.

Even with all these capabilities, however, representation of three-dimensional data is not quite satisfactory. Representing a solid object by lines which define its edges leads to the computer generated unreality of being able to see through solid objects. In recent years, research centered around means for computer graphical display of structural figures and data has begun to move from display of "wire-frame" structures, where the "wires" represent the edges of the surfaces of the structures, to the display of structures using surface definition techniques to enhance the three-dimensional appearance of the final result. Several efforts have been concentrated on producing graphical output which is similar to the half-tone commercial printing process.

The work of Evans, et al., at the University of Utah1 established the feasibility of using a computer to produce half-tone images. Their algorithm processes structures whose surfaces are made up of planar triangles. The algorithm employs a raster scan and examines crossing points of the boundaries of the triangles by the scanning ray. A significant feature of their method is that the increase in computing time is linear as the resolution of the picture increases.

John Warnock's algorithm for half-tone picture representation employs a different technique.2 He divides the scene recursively into quarters until all detail in a given square is known or the smallest size square is reached. The result is a set of "key squares", that is, intensity change points, along the visible edges in the scene. The time required for this algorithm varies linearly as the total length of the visible edges in the picture, but varies also as the square of the raster size. An important feature of the Warnock algorithm is that it handles the occurrence of the intersection of two planes without having to precalculate the line of intersection.

At the General Electric Electronics Research Laboratory in Syracuse, a system which combines both hardware and software to produce color half-tone images in real time has been developed for NASA as a simulator for rendezvous and docking training. This device can hold a 600 x 600 raster point picture of up to 240 edges, in color, and change the picture as quickly as the beam scans the screen.3

The work of the computer group at the Coordinated Science Laboratory began as an effort to add some realism to line drawings of structures being generated by R. Resch, who, while working in the laboratory, was also a member of the faculty of the Department of Architecture. Through his acquaintance with J. Warnock, we were able to implement a version of the Warnock algorithm which operates on the CDC 1604. After several revisions of the implementation and some fine tuning of the CRT display hardware, black and white half-tone images of the Resch structures were exhibited at the Computer Graphics Conference at the University of Illinois in April of 1969.

In discussions with J. Warnock and Robert Schumacher of General Electric, we envisioned a hidden surface algorithm using a scanline technique combining the recursiveness of the Warnock algorithm with the hardware techniques used in the NASA simulator.

These discussions were the impetus for the development
of the LINESCAN algorithm.4 It was for this implementation that laboratory engineers added a raster
scan operation to the display hardware. As a result,
the algorithm does not have to provide the scope with
an item of data for each and every point. Only the
location of an intensity change on the scanline and the
new magnitude of the intensity are needed. In addition,
the display hardware was modified to allow 256 levels
of intensity under program control.
Neither of the previously mentioned algorithms for
half-tone images represented the picture with a light
source located away from the observer position, although both regard it as a next step in development.
Moving the light source away from the observer position presents the problem of cast shadows. Arthur
Appel's work approaches the shadow problem and the
hidden line or surface problem simultaneously. 5 His
algorithm also scans the picture description in a linescan
manner. The question of which parts of which surfaces
are visible is answered by a technique called "quantitative invisibility".6 His structures are composed of
planar polygonal surfaces. Appel also includes the
ability to handle multiple illumination sources and the
shadows cast due to those sources. His shadow boundaries are computed by projecting points incrementally
along the edge of a shading polygon to the surface
which will be shaded.
The work of the Computer Group at the Coordinated
Science Laboratory to move the ·light away from the
observer began shortly after the completion of the
LINESCAN algorithm. Augmentation of the original
LINESCAN method with a dual scan for shadows cast
upon the surfaces presents two questions.
First, the number of projected shadows to be calculated must be kept to a minimum; but the technique
for narrowing the set of polygon pairs must be simple.
For a single illumination source, we are constrained by
the fact that for n polygons, there are n(n - 1) pairs
to be considered. The method chosen to narrow the
set of shadow casting and receiving polygons was to
project the polygons onto a sphere centered at the
light source and make some gross comparisons of maximum and minimum Euclidean coordinates of the points
so projected. 7 The transformation to the sphere was so
devised that no trigonometric functions or square roots
were used. The comparisons used are not intended to
discard all pairs that do not cause shadows. The point
is to discard as possible shadow pairs all cases in which
it is obvious that a shadow is not cast on one by the
other. Some nonshadow-producing pairs do slip through
the first set of tests; this is allowed because the second
set of tests can check these cases with less overall
programming effort and execution time.

The second question which the algorithm answers is
how to handle the most prevalent situation of shadows
cast by one polygon only partially falling on another
polygon. It is not necessary to compute the boundaries
of the intersection of a polygon with a shadow cast
upon its plane. The decision to reduce intensity as the
scan enters the shadowed portion of a polygon is left
to the final picture-producing stage of the process.
Also in the present version no computation is wasted
to see if the cast shadow is visible to an observer.
Shadows are output with tags to tell which polygon is
being shadowed and which polygon is casting the
shadow. The final step of the process responds to
shadow information by making appropriate intensity
changes only in the case that the shadowed polygon is
the same as the current visible polygon.
THE LINESCAN ALGORITHM AND ITS
ADAPTATION FOR SOLVING THE
SHADOW PROBLEM
The LINESCAN algorithm presents itself as a likely
candidate for extension to a system for solving the
shadow problem in half-tone image processing because
of its speed of operation and because it is directly
suited to processing a shadow-space which is structured
in the same manner as the associated three-space polygonal surface structures. Shadows cast by one polygon
onto another by point illumination sources are themselves polygons in the same three-space. The resulting shadow three-space can be projected onto the viewing plane in the same way as the original three-space structure (see Figure 1). Thus, the extension of the LINESCAN algorithm involves only the addition of a second scanning process for keeping track of shadows on each scanline.

Figure 1-Shadow and object projections (light source, planar polygon in three-space, projection of square on polygon plane, window on viewing plane, observer position)
A brief description at this point will serve to orient
the reader to the actual mechanisms of the LINESCAN
algorithm. The LINESCAN algorithm processes a
graphical image into a half-tone final image from two
data sets derived from the three-space structure: (1) the
set of all plane equation coefficients for the polygonal
surfaces of the structure and (2) the perspective projections of the edges of the surfaces on the viewing
plane.
The construction of the final half-tone image is done
in a television-like manner where a CRT beam scans
across the image line-by-line and exposes a raster of
points. As the beam moves across the scanline, the
intersection points on the scanline corresponding to the
viewable edges of the original structure will dictate
changes in the tone (intensity) of the scanning beam
from that point to the next intersection. These intersection points, which are output to the final image producing routine as "key squares", are the primary output
data produced by the LINESCAN algorithm.
"Key squares" are generated in two types of locations on the viewing plane during a linescan operation. The primary type of location is the intersection
of the current scanline with the projection of an edge
of the three-space structure. This location will cause a
"key square" to be produced only if the intersection
is visible to the observer.
The second type of location on the viewing plane
which can cause a "key square" to be produced is the
intersection of an "implicitly defined line" and the
current scanline. An "implicitly defined line" is the
projection on the viewing plane of the intersection of
two or more polygons in three-space. Polygon intersections are allowed in the theoretical world of the
computer even though they violate the law of Nature
that only one object may occupy a given amount of
space. Because of the implicit nature of these intersections, special operations must be performed to detect
and process them to produce the correct final result.
For any given scanline, a linked list is present containing all intersections of projected edges with that
scanline, ordered in the direction of the scanning movement. The LINESCAN algorithm moves from intersection
projections are entered and which ones are exited by
the scanning ray. At each intersection, a "depth sort"
is performed on those polygons being pierced by the


scanning ray to find the closest or visible polygon at
that intersection on the scanline.
The decision to produce a "key square" at a given
intersection point is based primarily upon the relation
of the depth of the edge associated with the point and
the visible polygon determined by the "depth sort" at
the intersection. If the edge is visible, then consideration
is given to whether a polygon projection will be entered
as this edge is crossed or whether it will be exited.
For the entering case, the "key square" will denote
the new polygon for control of the CRT scanning
beam at that intersection. For the exiting case, the
"depth sort" polygon will be denoted.
Two special problems arise concerning the output of
multiple "key squares" for a given point on the final
image raster. The first requires that constant checking
be performed to see if the integer value of successive
intersection points on the scanline are equal. If this
occurs, any "key square" action which would be taken
for any given intersection in the group will be deferred
until the last member of the group has been processed.
Thus, only one "key square" will actually be produced.
The polygon to be denoted by the resulting "key
square" may change from the beginning of the group
to the end; but in any case, the last visible polygon
will control the result.
The second special case of close intersections occurs
when coincident edges occur in the specification of the
three-space structure. Performing a "depth sort" on
the associated polygons at the common intersection
would normally fail because their depths would be the
same. Determination of the visible polygon of the
group is performed by actually moving the scanning
ray a small increment forward in the scanning direction
and computing the "depth sort" at that point. This
will yield the polygon which will be visible just after
the scanning ray leaves the coincident intersection
point.
"Implicitly defined lines" are detected when the
visible polygon denoted at two successive intersection
points is different. The procedure used in searching
for the projected intersection involves finding which
polygons are intersecting and using their plane equation
coefficients to calculate their intersection's projection
and its intersection with the scanline. An iterative procedure is used in order to detect the possibility of
multiple pairs of intersecting polygons which might
yield more than one "implicitly defined line". "Key
squares" will be produced for the calculated intersection points on the scanline subject to the same constraints about multiple "key squares" for the same
raster points in the final image.
The extended version of the LINESCAN algorithm
for solving the shadow problem includes two scanning


operations. The primary scanning movement is the
original scan operation where the three-space structure
data is processed to provide the final image structure.
The additional or secondary scanning operation processes, in a parallel manner, the shadow three-space
structure to produce data which will be combined with
the primary scan data to form the scanline intensity
data for the final image. The data output from the
secondary scanning operation, which we call "shadow
key squares", affects the intensity patterns of the final
image only. Only the primary scan data defines the
structure.
In order to keep the changes in the LINESCAN
algorithm to a minimum and for any changes made to
have minimum influence in computation speed of the
implementation, it was decided that as much of the
processing operations for the final image as was possible
would be shifted to the final output routine from the
LINESCAN routine. This was because the original
final output routine, PIXSCANR, had been input-output bound during the processing of the "key squares"
data file from the LINESCAN routine. The additional
operation impressed on the new version of PIXSCANR
was the keeping track of which shadow polygon projections were being pierced by the scanning ray at any
point in the processing of a scanline. Thus, the only
change to the LINESCAN algorithm involved the addition of the secondary scan which simply detected
the crossings of the scanline by projected edges of the
shadow three-space structure and issued "shadow key
squares" at every occurrence.
The dichotomy imposed on the shadow processing
responsibilities between the LINESCAN and PIXSCANR routines has an additional advantage. There
will always be the possibility of cast shadows falling
outside the visible portions of the associated polygons.
Additionally, some polygonal surfaces of the original
surface will not appear in the final image and therefore,
neither will their shadows. We shall see in our discussion of shadow pair detection that it is most economical to allow shadow projections of these kinds to be
processed in the same manner as all other shadow
projections. Their data items will be passed on to the
PIXSCANR routine where their occurrence will be
duly noted. No effect will be registered on the final
image, however, since the associated three-space surface
polygon will not appear in the final image or at least
not in conjunction with the projection of the extraneous
shadow.
Once the mechanism for producing the proper final
image of the half-tone presentation was established, it
remained to develop the proper procedure for comparing all possible pairs of polygons with respect to the
illumination source, and in an economically feasible man-

ner discard as many extraneous shadow pairings as
possible. Economy of computation speed relative to
total scanline processing time was the main concern.
SHADOW DETECTION
The primary task to be accomplished in shadow detection is not so much the actual projection of shadows
as it is the elimination of the need for calculating projections and storing shadow polygons unnecessarily.
The number of possible shadows cast is equal to the
number of possible pairs of polygons in the structure.
Since this number increases rapidly as the complexity
of the structure increases, it is extremely important to
be able to identify useful shadow pairs with a minimum
of computation and to store this information in a
compact form.
The shadow pairs are stored in a chained list, with
subchains linking all polygons that may shadow a
given polygon. The procedure for narrowing the set
of all possible pairs of polygons to a near minimal set
of shadow producing pairs consists of two distinct
steps. In the first, the polygons are projected onto a
sphere centered at the light source and are checked in
an approximate fashion for interference with respect to
the light source. In the second step, pairs of polygons
which seem to occlude one another are further examined to determine which polygons may shadow the
others.
The light sphere projection is a device for culling
out certain pairs of polygons which can in no way
interfere with respect to a given light source. It is
only a gross test, intended to ease the burden of computing projections of one polygon onto the plane of
another. The test throws out polygon pairs only if it

Figure 2-Projection of polygon on sphere

Algorithm for Producing Half-Tone Computer Graphics

is obvious that no interference takes place. The light
sphere projection is in no way used to compute intersections of polygon projections on the sphere, nor is it
used to compute intersections of shadow polygons with
the polygons being shadowed.
Every vertex of the three-space structure has to be
projected onto the sphere centered at the light source
in order to make initial interference tests. The sheer
magnitude of the number of operations necessary restricts the projection in a number of ways. Namely, it
would be preferred if the job could be done without
computing any trigonometric functions and if possible,
without computing very many square roots, since each
of these requires much computer time. The projection
is given by:

    X_s = sgn(X_1) * X_1^2 * K^2 / DELTA
    Y_s = sgn(Y_1) * Y_1^2 * K^2 / DELTA
    Z_s = sgn(Z_1) * Z_1^2 * K^2 / DELTA

where X_1, Y_1, Z_1 are the coordinates of a point with
respect to an origin at the light source, K is a constant (the radius of the sphere), and DELTA
is the square of the distance from the point to the light
source (see Figure 2). This transformation to the light
sphere is a composite of four transformations which are
done algebraically to arrive at the final transformation:
(1) transform the points to a Euclidean 3-space
with origin at the point of light;
(2) transform these coordinates to polar coordinates;
(3) map these points to the sphere by setting the radial coordinate ρ to a
constant for each point; and
(4) transform the points on the sphere back into the
Euclidean 3-space with origin at the light source.
The algebraic derivation of these transforms yields
a final form that involves some square roots in the
numerator and denominator. However, since only the
relative magnitude is used in the comparison operations,
these results are all squared; and the sign is preserved,
yielding the final transformation.
In order to use the transformed points to determine
which polygons interfere with each other with respect
to the light source, the maximum and minimum of the
X_s, Y_s, Z_s, and DELTA values are saved for each
polygon, in addition to the transformed points. Also
the coefficients of the equations of the planes in the
light source space are computed and saved for possible
shadow computation.
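As an illustration, the squared, sign-preserving projection can be written in a few lines. This is a hypothetical modern Python sketch, not the original implementation; the parameter k2 plays the role of K^2 and only scales the comparisons.

```python
def sign(v):
    """Sign function used to preserve orientation after squaring."""
    return -1.0 if v < 0 else 1.0

def light_sphere_point(p, light, k2=1.0):
    """Project point p onto the sphere about the light source with no
    trigonometric functions or square roots: each component is squared,
    scaled by K^2/DELTA, and its sign preserved.
    Returns (X_s, Y_s, Z_s, DELTA)."""
    x1, y1, z1 = (p[i] - light[i] for i in range(3))
    delta = x1 * x1 + y1 * y1 + z1 * z1      # squared distance to light
    return (sign(x1) * x1 * x1 * k2 / delta,
            sign(y1) * y1 * y1 * k2 / delta,
            sign(z1) * z1 * z1 * k2 / delta,
            delta)

def polygon_extremes(vertices, light):
    """Per-polygon (min, max) of X_s, Y_s, Z_s and DELTA, saved for the
    interference comparisons described in the text."""
    pts = [light_sphere_point(v, light) for v in vertices]
    return [(min(c), max(c)) for c in zip(*pts)]
```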
The first check made for each polygon is to see if it


is self-shadowed, that is, to see if the observer and
light source are on opposite sides of the plane of the
polygon. The procedure is to substitute both the light
source point and the observer point into the equation
of the plane. If the two results have different signs, the
polygon is self-shadowed and no shadows cast on it
will be computed. However, shadows cast by the self-shadowed polygon must still be considered.
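The self-shadow test translates directly into code; a small sketch with our own naming follows.

```python
def plane_side(plane, point):
    """Value of A*x + B*y + C*z + D; its sign tells which side of the
    plane the point lies on."""
    a, b, c, d = plane
    x, y, z = point
    return a * x + b * y + c * z + d

def is_self_shadowed(plane, light, observer):
    """A polygon is self-shadowed when the light source and the observer
    lie on opposite sides of its plane (the two values differ in sign)."""
    return plane_side(plane, light) * plane_side(plane, observer) < 0
```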
If a polygon is not self-shadowed, then it is compared
to each remaining polygon in the list to see if it is
obvious that interference does not occur. The criterion
is as follows:
For all pairs of polygons P_i and P_j, if the points
transformed to the sphere are separated in X_s, Y_s,
or Z_s, then the polygons do not interfere with each
other with respect to the light source.
This criterion amounts to simply examining the orthographic projection of the points on the sphere onto the
coordinate planes and looking for separation by comparing maximums and minimums in each direction.
In the event that the projection of a polygon is so
oriented on the sphere as to wrap around a coordinate
axis, then the maximum or minimum in some direction
does not occur at a vertex. In this case, for the purpose
of this comparison, the associated maximum or minimum is replaced by the absolute maximum or minimum
coordinate value on the sphere.
When a pair of polygons are not separated enough
for this test to detect the separation, then the maximum
and minimum distances to the light source are compared. If the maximum distance of the vertices of
polygon I from the light source is less than the minimum
such distance on polygon J, then it is clear that polygon
I may cast a shadow on polygon J but not vice versa.
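Combining the two comparisons gives the first-stage culling pass. The sketch below (our naming and data layout) consumes the per-polygon (min, max) extremes saved above and builds the subchains of polygons that may shadow each polygon; as the text notes, surviving pairs are not guaranteed to cast shadows, only not obviously separated.

```python
def separated_on_sphere(ext_i, ext_j):
    """Gross test: a pair cannot interfere with respect to the light if
    their sphere projections are disjoint in X_s, Y_s or Z_s."""
    for axis in range(3):
        (lo_i, hi_i), (lo_j, hi_j) = ext_i[axis], ext_j[axis]
        if hi_i < lo_j or hi_j < lo_i:
            return True
    return False

def shadow_chains(extremes):
    """chains[j] lists the polygons that may cast a shadow on polygon j.
    extremes[k][3] is the (min, max) squared distance DELTA of polygon k
    from the light; comparing squared distances preserves the ordering."""
    n = len(extremes)
    chains = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if separated_on_sphere(extremes[i], extremes[j]):
                continue
            (min_i, max_i), (min_j, max_j) = extremes[i][3], extremes[j][3]
            if max_i < min_j:
                chains[j].append(i)          # i strictly nearer the light
            elif max_j < min_i:
                chains[i].append(j)
            else:
                chains[i].append(j)          # no clear ordering: keep both
                chains[j].append(i)          # and let later tests discard
    return chains
```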
The tests of projections on the light source sphere
eliminate many possible shadow pairs and thus reduce
the total amount of storage and computing time required. This set of tests, however, fails to eliminate
certain other shadow pairings which will not affect the
final image. Among these are shadow pairings of polygons with common vertices and of polygons which completely overlap on the sphere. In neither case is there
clear separation on the light source sphere. In the
latter case the sizes of the polygons may be so disparate
as to nullify the usefulness of vertex distance comparisons.
Figure 3 illustrates one of the cases in which the projection of points of polygon I onto the plane of polygon J indicates the presence of a shadow, while the shadow cast in no way falls within the bounds of polygon J. Since there is no separation between the two polygons, both possible shadow pairings would be noted, and both would be computed, but neither would be present in the final picture.

Figure 3-Orientation of polygon pairs-Case 1 (apparent shadows on the extended plane)
We can eliminate this case and several others like it (see Figure 4) by defining and appropriately testing two relations:

    I ~ J: "The planar polygon I is entirely on one side of the plane of planar polygon J."

    I s J: "Each point of planar polygon I lies between the light source and the plane of planar polygon J."

In the case of Figure 3, for example, we see that I ~ J, J ~ I, I s J, and J s I are all true, and therefore the shadows are cast only upon the extended planes of the two polygons. As a result, neither shadow pairing is added to the list of possible shadows.

Figure 4-Polygon orientations (Cases 2 through 8)
All of the decisions about possible shadow pairs are
computed in advance of the start of the LINESCAN
operation. As the LINESCAN operates, it has a linked
list for each polygon of all polygons which may cast a
shadow upon it. It is at the point where the first line
of a polygon is processed that the shadows cast upon
it are computed and stored in a list. When the polygon
is no longer active, we purge the shadow information
from the list. Thus, shadow information is calculated
only when it is first needed and discarded when the
need for it ceases.
Shadows are computed by projecting the vertices of
one polygon onto the plane of another. The parametric
form of the equation of a line is used to calculate this
projection. Given two points P_1(X_1, Y_1, Z_1) and P_2(X_2, Y_2, Z_2), the set of points P(X, Y, Z) that lie on a line joining P_1 and P_2 is given by:

    (X - X_1)/(X_2 - X_1) = (Y - Y_1)/(Y_2 - Y_1) = (Z - Z_1)/(Z_2 - Z_1)

Setting each of these ratios equal to a parameter r yields the parametric form:

    X = X_1 + r(X_2 - X_1)
    Y = Y_1 + r(Y_2 - Y_1)
    Z = Z_1 + r(Z_2 - Z_1)

P_1 and P_2 are so chosen that P_1 is the light source position and thus the origin of the system. This reduces the equations to:

    X = r X_2,    Y = r Y_2,    Z = r Z_2

The parameter r has the following useful properties:

    r > 1       P is on the extension of P_1 P_2
    r = 1       P is identical to P_2
    0 < r < 1   P is between P_1 and P_2
    r = 0       P is identical to P_1
    r < 0       P is on the extension of P_2 P_1

In addition to providing the coordinates of projected points, r can be used in establishing the truth value of the relations I s J and I ~ J. As we see in Figure 5, it is necessary to use the values of r for each vertex to see if the shadow makes sense. In this case polygon π_i projects onto the plane of polygon π_j as P_1', P_2', P_3', P_4'. The routine has to check for such situations and instead use P_1', P_2', P_3'', P_4''. We do not, however, make any checks to see whether the shadow as cast is visible. The fact that two or more polygons may cast shadows that overlap on a given polygon has no effect on the computation. The final stage of the process takes care of such contingencies.

Figure 5-Shadow projection

We feel that in further implementations it would be useful to defer actual computation of "shadow key squares" until a complete scanline is processed. In this manner, it would be possible to introduce and compute shadows cast on a polygon only in the case that the polygon was, in fact, visible at some point on the scanline. This technique would eliminate a large number of "shadow key squares" that are, in fact, not needed at all in the production of the picture. Another extension being considered is to allow polygons to have a degree of translucency. However, self-shadowing polygons then would have shadows visible on them and this would cause a large increase in the number of shadow lines.
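The projection of a vertex and the computation of r reduce to a line-plane intersection with the light source at the origin. A Python sketch, with our own naming, assuming the plane is stored as the coefficients of A*x + B*y + C*z + D = 0:

```python
def project_vertex(vertex, plane):
    """Project a vertex onto a polygon's plane along the ray from the
    light source (the origin): P = r * P2, with r solved from the plane
    equation. The returned r feeds the tests above: r > 1 puts the plane
    beyond the vertex, 0 < r < 1 between the light and the vertex, and
    r < 0 behind the light."""
    a, b, c, d = plane
    x2, y2, z2 = vertex                      # vertex in light coordinates
    denom = a * x2 + b * y2 + c * z2
    if denom == 0:
        return None, None                    # ray parallel to the plane
    r = -d / denom                           # a(r*x2)+b(r*y2)+c(r*z2)+d = 0
    return (r * x2, r * y2, r * z2), r
```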
THE FINAL OUTPUT PROCESS
The final data set for the half-tone image consists of
two parts. The first part contains the three-space plane
equation coefficients for the surfaces of the three-space
structure and the position data for the illumination
source. The second part contains the linear string of
scan control data items: "key squares", "shadow key
squares" and "self-shadow key squares". It is the function of the PIXSCANR routine to assimilate these
two masses of data and to produce the final half-tone
image.
In order to couple our results closely to the equipment
that was available for our use, we modified the display
hardware to provide a special raster scanning operation
in which the equipment automatically performs the
function of stepping across the raster, and our data
input specifies what intensity levels will be used in
various sections of the scan. The raster is plotted from
left-to-right and bottom-to-top on the display screen.
A data item initializes the scan to the starting position
and gives the initial intensity value for the CRT beam.
In addition, the stepping increment δ is given. When
the scanning operation comes to the end of a scanning
line, the value of the x coordinate is reset to 0 and the
y coordinate is incremented by δ.
The remaining data items presented to the display
hardware consist of an x coordinate value and an
intensity value. As the scanning operation proceeds,
the current x coordinate value of the scan is compared
to the x coordinate of the next data item. If agreement
is achieved, then the intensity of the beam is adjusted
to the new value and the scan continues. Once set, the
beam intensity does not change.
This raster scan operation allows the final image to
be exposed with a minimum number of actual data
items being sent to the display hardware. A moderately
complex picture might have, for example, an average
of 20 intensity changes per scanline. If it were necessary
to send display information for every point in the
picture, each scanline would have 512, 1024, 2048, or
more data items. Another
benefit gained from this condensed data format is that
the amount of data needed per picture varies in a


linear manner with the size of raster being used. Computation speed also varies in a linear manner.
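A hypothetical encoder for this condensed format makes the data economy plain; the tuple layout is ours, since the paper describes the hardware interface only informally.

```python
def encode_scanline(changes, initial_intensity):
    """Produce the condensed (x, intensity) stream: one data item per
    intensity change rather than one item per raster point.

    changes: list of (x, intensity) pairs sorted by x along the scanline.
    """
    items, current = [], initial_intensity
    for x, intensity in changes:
        if intensity != current:             # send an item only on change
            items.append((x, intensity))
            current = intensity
    return items

# A scanline with ~20 intensity changes thus sends ~20 items instead of
# 512, 1024, or more, so data volume grows linearly with raster size.
```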
The section of PIXSCANR which processes the first
part of the data file from the shadow-tone algorithm
establishes the intensity functions associated with the
polygons of the three-space structure. A basic assumption made in our current implementation is that the
intensity of light reflected from a given planar surface
is uniform over the entire surface. Although this does
not hold true in the physical world, it is close enough
for our purposes.
We selected a cosine function for the intensity of the
reflected light from a given surface. A ray emanating
from the center of the illumination source and passing
through the "centroid" of the surface defines the angle
of incidence of the light for the entire plane. The
"centroid" is calculated by finding the average value
of the x and y coordinates of the vertices of the polygon
and solving for the corresponding z using the equation
of the associated plane. The cosine of the angle between
the surface of the polygon and the ray drawn to the
illumination source is then given by:
    cos θ = |Aa + Bb + Cc| / (√(A² + B² + C²) · √(a² + b² + c²))

where the A, B, C are coefficients of the plane equation and a, b, c are direction numbers of the ray. The intensity of the reflected light from the surface of a polygon not in shadow is given by:

    I_i = |cos θ| · R_i · RANGE + IMIN

RANGE and IMIN are parameters controlled by the user which specify the total range of intensity to be used in the half-tone image and a translation of that range along the scale of the display hardware. R_i is a pseudo-reflectivity coefficient specified by the user for each polygon surface to allow some differentiation between surfaces. Those polygons indicated by "self-shadow key squares" are assigned a special intensity due to "ambient" light. This intensity is given by:

    I_ss = 0.2 · C / √(A² + B² + C²) · RANGE + IMIN

where A, B, C are coefficients of the equation of the plane of the self-shadowed polygon.

Once the intensity functions have been calculated, the processing of the "key squares" data set begins. In the original version of PIXSCANR used for non-shadow half-tone image presentations, it was a simple matter to transpose the "key squares" directly into data items to send to the display hardware. For the shadow half-tone system, the addition of the "shadow key squares" and the "self-shadow key squares" to the data set complicates the process immensely. Recall that the function of keeping track of which shadows are being pierced by the scanning ray on a scanline is now a proper operation to be performed by PIXSCANR.

Shadow tracking is accomplished in an n × n binary array, in which the (i, j)th position is a 1 if polygon j is casting a shadow on polygon i. As the "shadow key squares" are processed from the data file, the associated positions in the binary array are flipped from the in-shadow state to the out-of-shadow state. When a structure "key square" is processed, the intensity of the beam will be set at that point. Otherwise, the shadow will be indicated by using the minimum value of intensity for the image (IMIN).
The remainder of the operations performed by the PIXSCANR routine are concerned with the outputting of the final image on various photographic media. At CSL, we have the option of photographic recording on either Polaroid 3000 speed black and white film or 70mm RAR type 2479 recording film. The PIXSCANR routine also provides for inversion of the image in a complementing operation on the intensity functions. Further output capability is provided for making animation sequences on a 16mm animation camera.

RESULTS OF THE ALGORITHM

Figure 6-Two presentations of a three-space structure

The two photographs of Figure 6 compare the "wire-frame version" of a three-space structure with the shadow half-tone presentation of the same structure. The object consists of three parts, all arranged to fit interlockingly within one another. The computation time for our implementation on the CDC 1604 computer was about 2 minutes, 20 seconds. Time for computation of the non-shadow half-tone presentation ran about 45 seconds.

Figure 7a-A-Frame cottage with no shadows
Figure 7b-A-Frame cottage with shadows

Figure 7 shows the same view of an A-frame summer cottage, first with no cast shadows in part a and then with cast shadows in part b. Non-shadow half-tone computation took 13.5 seconds and the shadow half-tone computation required about 27.0 seconds. Both cases indicate that the time required for shadow half-tone computations is about twice the time required for the non-shadow case.

Figure 8-A-Frame with different light source positions

As an example of how the same scene appears with the light source in various locations, Figure 8 shows the A-frame cottage in three different appearances. When only the light source position changes, only the shadow


pairings and their subsequent computations change.
Thus, future implementation of shadow half-tone algorithms may be able to save computation time by
passing the "key square" data from scene to scene and
computing only the "shadow key squares" and "self-shadow key squares".
If the only change made from one presentation to
another is a movement of the observer position, the
converse of the above occurs. The shadow data does

not change and only the "key square" data must be
computed. Both of these attempts to reduce computation time by borrowing from past results will require
increased amounts of storage and techniques for merging
the old and new data sets to generate the final half-tone
image.
Our final presentation of Figure 9 shows a torus in
free space back-lighted with respect to the observer
position. The torus is constructed of 225 planar polygons. The time for computation of the non-shadow case
was one minute, 25 seconds. The shadow half-tone
image required about three minutes of computation.
Approximately 700 shadow pairings were found to be
useful by the detection stage of the algorithm. Only
a small number were actually detected in the final
image. In fact, only by back-lighting the torus could
the complete image be processed, because the number of visible shadows exceeded the limits of storage available during execution of the program. Efficient computation of the final image data will depend upon the
availability of sufficient amounts of direct address core
storage or an auxiliary storage medium which can be
accessed in speeds approaching main memory access
time.
Figure 9-Back-lighted torus

BIBLIOGRAPHY

1 C WYLIE G ROMNEY D EVANS A ERDAHL
Half-tone perspective drawings by computer
Proc of the Fall Joint Computer Conference Vol 31 49-58
1967
2 J WARNOCK
A hidden line algorithm for half-tone picture presentation
Tech Report 4-5 University of Utah Salt Lake City Utah
May 1969
3 BELSON
Color TV generated by computer to evaluate space borne systems
Aviation Week and Space Technology October 1967
4 J BOUKNIGHT
An improved procedure for generation of half-tone computer
graphics presentations
Report R-432 Coordinated Science Laboratory University
of Illinois Urbana Illinois September 1969
5 A APPEL
Some techniques for shading machine renderings of solids
Proc of the Spring Joint Computer Conference Vol 32 p
37-49 1968
6 A APPEL
The notion of quantitative invisibility and the machine
rendering of solids
Proc ACM Vol 14 p 387-393 1967
7 M KNOWLES
A shadow algorithm for computer graphics
Department of Computer Science File No 811 University of
Illinois Urbana Illinois 1969

The case for a generalized graphic problem solver*
by E. H. SIBLEY, R. W. TAYLOR, and W. L. ASH
University of Michigan
Ann Arbor, Michigan

INTRODUCTION

Not so many years ago, SKETCHPAD1 and DAC2 set the whole world of computing on a philosophical bender. People, who either knew little of the subject or else should have known better, started talking up a storm. The whole of engineering was about to be revolutionized, and everyone should prepare now or be sunk, to drown in their own ignorance.

Unfortunately, even though we, in computation, have been regularly beset by super-salesmen who keep on telling us "how good it's going to be," we still are suckers for a good line. We sat at the edge of our chairs listening to the prophets, and later scurried around trying to learn more about the wonders of the future, and bought expensive hardware (which we didn't yet know how to use) so that we should be ready.

Fortunately, some business managers finally asked why the expensive equipment was sitting there, and as a result, many people moved away from "research-like" operations to a more reasonable "application" approach. This meant that the people in graphics became divided into two camps (almost mutually exclusive) who either tried rather unsuccessfully to implement a generalized graphic system, or else tried to produce a working application program. The latter set of investigators have produced many useful packages, or we should all have watched the graphics hardware being converted into TV sets. Why then has the well promoted "generalized graphic system package" proved so elusive? In this article, we shall try to show that this is due to lack of knowledge on the part of the prophets: that, in fact, they were proposing systems which needed, as their core, a generalized problem solver, which is, after all, only asking for a really good artificial intelligence package, and that field has been having its own problems for years, too.

The rest of this article focuses on what has been done, and what reasonably could soon be done, towards providing a useful package for generalized graphic problem solving. The techniques adopted in SKETCHPAD's constraint satisfactions are discussed, with conclusions made about their generalization and their wider application in other fields.

Finally, some of those scientific, engineering, and mathematical modeling techniques which normally use graphics for either visualization, analysis, or numerical computation of the solution are examined in some detail. The conclusions suggest that there is still something that can be achieved in the near future, although it is probably less than many of the early optimists had predicted.

BACKGROUND AND JUSTIFICATION
Before starting a discussion on the merits or deficiencies of a generalized graphic problem solver, we
must define that term. The use of graphics (other than
symbols) for the solution of problems is common
throughout much of engineering and science, and even
in some fields of mathematics. The concept is often
associated with the idea of a model which involves a
topological or physically scaled picture from which the
original problem is solved. Sometimes the picture itself
is the solution (e.g., in the case of computer generated
art, some phases of architecture, etc.), but more often
the picture is either an immediate model from which
algebraic or numeric equations are produced (these are
solved to give the answer) or else the picture is later
used to generate information (e.g., for architecture, the
original drawings may later be examined to give material and cost information). Now if we consider a
software computer system which aids the engineer,
scientist, or mathematician in the formulation and

*The work reported in this paper was supported in part by the
Concomp Project, a University of Michigan Research Program
Sponsored by the Advanced Research Projects Agency, under
Contract Number DA-49-083 OSA-3050.


analysis of a range of problems, using the heuristics of
the man to formulate a reasonable model, and the
computer to aid in and augment the analysis, etc.,
then this software could be termed a "generalized
graphic problem solver".
At the other end of the scale from the generalized
graphics processor, is the specialized graphic package.
Here the classical input-output devices (e.g., card
reader and printer) are replaced by graphic devices
(e.g., light pen and CRT) so that the user has an
easier time stating his problem or understanding his
solution, probably because it is two dimensional in
form, or nearer to the engineers' medium, viz, pen
and paper.
The first question must then be: "Is generalized
graphics desirable?"
Obviously all the early graphic-systems/computeraided design prophets thought that it was. To quote
one:3
"In the near future-perhaps.Jwithin five and surely
within 10 years-a handful of engineer-designers
will be able to sit at individual consoles connected
to a large computer complex. They will have the full
power of the computer at their fingertips and will be
able to perform with ease the innovative functions of
design and the mathematical processes of analysis,
and they will even be able to effect through the computer, the manufacture of the product of their
efforts . ... "
Recently, there has been some retrenchment: 4
"Suddenly a new wor~d seemed to have sprung
into being, in which engineers and architects could
sit in front of a screen ... and conjure up automobiles or hospitals complete in every detail, in the
course of an afternoon. Unfortunately, reality turned
out to be more elusive than some people expected . ...
What emerges from the above is a requirement for a
general system for building models, to which can be
applied transformations and algorithmic procedures . ... "
On the whole though, there is still much to be said
for generalized graphics systems research. To begin, we
can state the rather overoptimistic argument that:
there is bound to be some practical or more efficient
fallout from research in general systems; thus we will
have better specialized systems even if we never get a
really generalized one. In this argument, we are probably on reasonable ground, since the special systems of
today have nearly all been spin-offs from the past
generalized systems. But this is not enough.
The important point seems to be that a reasonable
research effort can and will produce further steps towards a generalized system, or at least a "less specialized" system.

A second question is:
If we had a generalized graphic system today, could
industry afford to use it? Unfortunately, with our best
hopes, we still have to admit that the answer is NO.
This is mainly due to the fact that specialized systems
are normally cheaper, and often easier to run than a
generalized system. This could be compared to the
difference between a "FORTRAN machine" which provides a useful but ad hoc language, and a Turing machine, which is certainly general, but is by no means as
easy to use.
How, then, do we hope to succeed? The first way is
by developing more powerful techniques, which, though
general, are easy to use and well interfaced to the user.
The second is the normal effect of engineering progress
and the economies of scale. As time goes on, we might
hope to see both cheaper hardware and a ready pool
of useful routines from a user community which can
be integrated into a total, but generalized system.

THE SKETCHPAD METHOD OF
CONSTRAINT SATISFACTION
Since picture meaning and solution will form the
heart of any generalized system, we felt that a deeper
understanding of the "graphical constraint problem"
was necessary. Naturally, we started with the
SKETCHPAD approach. We had the following questions in mind:
1. Were SKETCHPAD's methods as general as some
claimed or were people misconstruing some exciting beginnings?
2. Why hadn't further extensions of the graphical
constraint problem appeared?
3. How might we extend the SKETCHPAD work
on graphical constraints?
Some answers which led to our general conclusion will
appear in the next few sections. It will be helpful to
consider the SKETCHPAD method in some detail for
background purposes.
When SKETCHPAD was commanded to satisfy a
set of constraints on a picture, it did so by using numerical measures of how much certain "variables"
(usually points) were "out of line" (our phrase). Constraint satisfaction was therefore a matter of reducing
the various numerical error measures to zero. The
errors were computed by calling, for each constraint
type and for each degree of freedom restricted by that
constraint type, an error computing subroutine. Each
subroutine would compute an error "nearly proportional to the distance by which a variable was removed
from its proper position". Thus if the components of a

Case for Generalized Graphic Problem Solver

variable were displaced slightly and the error subroutine called for each displacement, a set of linear
equations could be found. These equations had the
form

where Xi is a component of a variable, E is the computed
error, and the subscript 0 denotes an initial value.
This set of equations could then be solved by Least-Mean-Squared Error Techniques to yield a new value
for each component involved.
The general constraint procedure was thus based on
this numerical measure, and the most general algorithm
available was relaxation. Thus a variable was chosen
and re-evaluated using the LMS technique such that
the total error introduced by all constraints in the
system was reduced. The process continued iteratively
and eventually terminated when the total computed
error became minimal.
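Restated in present-day terms, this relaxation loop is a coordinate-wise least-squares descent. The minimal Python sketch below is not from SKETCHPAD itself; the two toy constraints, the finite-difference step h, and all function names are illustrative assumptions. It re-evaluates one scalar component at a time, differencing each constraint's error subroutine to recover the linear equations and applying the LMS correction:

    import numpy as np

    def relax(variables, constraints, h=1e-4, sweeps=50, tol=1e-10):
        # Sketchpad-style relaxation: re-evaluate one component at a time,
        # reducing the total squared constraint error with an LMS step.
        for _ in range(sweeps):
            if sum(c(variables)**2 for c in constraints) < tol:
                break
            for v in range(len(variables)):
                e0 = np.array([c(variables) for c in constraints])
                variables[v] += h                  # displace the component slightly
                e1 = np.array([c(variables) for c in constraints])
                variables[v] -= h
                g = (e1 - e0) / h                  # error gradients, one per constraint
                if g @ g > 0:
                    variables[v] -= (g @ e0) / (g @ g)   # least-squares re-evaluation
        return variables

    # Toy picture: two points constrained to distance 5 and to a horizontal line.
    pts = [0.0, 0.0, 3.0, 4.0]
    constraints = [
        lambda p: np.hypot(p[2] - p[0], p[3] - p[1]) - 5.0,
        lambda p: p[3] - p[1],
    ]
    print(relax(pts, constraints))

As in SKETCHPAD, the loop simply stops when the total computed error becomes minimal; nothing in it is selective about constraint priority, which is exactly the drawback discussed below.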
Obviously, the relaxation method, with a set of
equations to be computed and solved many times, was
slow. Thus, before using relaxation, SKETCHPAD
employed a heuristic, which will be discussed in some
detail because we believe many similar techniques will
be necessary. The object of the heuristic was to find
an order in which a set of variables could be re-evaluated
such that no subsequent re-evaluation would affect a
former one. If this order could be found, then the constraint satisfaction process could proceed much faster
because the variables involved would need only one
re-evaluation.
To perform this search, the user picked a starting
variable. SKETCHPAD then considered the constraints
on that variable and formed the set of all variables
which participated with the starting variable in the
constraints involved. If some of these variables had
sufficient degrees of freedom to satisfy the constraints,
then one could be sure that they would not affect
previous constraint re-evaluations. Thus such constraints could be removed from the constraint set on
the starting variable, possibly allowing it to be easily
re-evaluated. The technique was extended to build
chains of constrained variables until sufficient free ones
were found. Of course, such a se,t of free variables did
not always exist, and it was necessary to have the
relaxation method as a back-up.
COMMENTS ABOUT SUTHERLAND'S
METHOD
In the conclusion to Reference 1, Sutherland stated
"It is worthwhile to make drawings on a computer
only if you get more out of the drawing than just a
drawing" (p. 67). While this statement might arouse
some debate (are plotters useless?), it nevertheless reflects the key point that one must be able to associate
meaning, and thus constraints, with a picture. When
the Sketchpad method is evaluated relative to arbitrary
picture semantics, several observations can be made.
First, all constraints in SKETCHPAD were eventually related to a numerical error definition having to do
with distance. This technique was clearly oriented toward the LMS equation solution method, but in addition had several advantages. It allowed new constraint types to be added quickly since all one had to
do was write a new set of error computing subroutines;
the solution machinery was already present in the overall system. The solution technique itself, though it
involved a preliminary heuristic, could almost guarantee
a solution in reasonable if not entirely pleasing time.
In addition, the numerical approach together with
relaxation yielded results for a class of problems that
were of particular interest in the author's field. The
approach was therefore eminently successful for certain
operations.

In a more general context, the approach has some
drawbacks. The relaxation method depends critically
on the results of a previous iteration. Thus once having
applied it, it is a non-trivial matter to restore the picture to its previous status; one might have to make a
copy of parts of storage, for instance. Design by trial
and error, with recovery from undesirable conditions,
is thus hampered. Furthermore, the relaxation method
is not generally selective in picking some priority in
variable re-evaluation. One constraint is as strong as
another and all are broken with equal abandon. This
runs counter to most realistic concepts of a constraint.
These drawbacks were known to the author: "There
is much room for improvement in the relaxation process
and in making 'intelligent' generalizations that permit
humans to capitalize on symmetry and eliminate redundancy" (p. 55).
A final criticism of the numerical definition of constraints centers on the degree to which they are "picture
oriented." A review of the available constraint types in
SKETCHPAD (Appendix A) quickly reveals their
obvious correlation with "picture transformations".
Certainly such picture transformations are a desirable
part of man-machine communication. However, if the
system is limited to constraint satisfaction methods
involving such simple picture transformations, the
system capabilities are undoubtedly themselves limited.
Examples will be presented later. The point is that
constraints are a function of picture semantics, and
only incidentally of picture geometries.
It should be noted that the above criticisms are

[Figure 1 -- Generalized (graphic) problem solver: output device (picture, formulas, numbers); input device (typewriter, light pen, tablet); input procedure (parsing, setting values, relationships, commands).]

centered on the mapping from constraints to numbers.
The philosophy of relaxation and especially the preliminary heuristic do not depend critically on the numerical error computing subroutines. They are, rather,
general strategies for solution. Relaxation could be
rephrased as:
"Transform object A into object B"
"Reduce the difference D between object A and
object B"
"Apply operation Q to object A."
which the reader may recognize as a strategy in the
Newell and Simon GPS program. 7 In such a context,
the preliminary heuristic becomes a method for ordering
how the transformations will be effected. GPS has
similar, though less dynamic, ordering strategies.
Thus while the Sketchpad approach was an exciting
and useful beginning in the area of graphical constraints,
it should not be taken as the last word. The author
himself has stated, "Much room is left in Sketchpad
for improvements.... A method should be devised for
defining and applying changes which involve removing
some parts of the object drawing as well as adding new
ones" (p. 70).
The last sentence of the quotation suggests the
generality discussed above. It implies more general
picture operations, macroscopic in nature; it is a short
step to view these macros as a "subproblem" strategy.
THE GENERALIZED GRAPHIC
PROBLEM SOLVER
In analyzing the difference between special graphic
systems and generalized graphic problem solvers, we
found that the information flow chart for both systems,
and indeed for many other types of computer-aided
design systems, could be represented by Figure 1.

Although this figure bears strong family resemblance
to work of others,5,6 it has special new features, and
leads to certain unique conclusions.
First let us consider the work of a simple graphic
"drawing" program, where the user can merely input a
restricted set of graphic entities (such as lines, arcs,
etc.), and then view them on a CRT. In essence,
he is working with the restricted system A → B → C → D → E → F → A; he is unaware of the data structure
representing his picture, and is unable to either affect
it to produce a new drawing, or analyze its meaning
(except, possibly, as an artistic entity). The particular
blocks that he uses are:
B: An input device such as a light pen, or tablet,
or even a string of characters from a teletype
which represent predefined graphical items.
C: An input procedure, not necessarily highly sophisticated, nor particularly general in its parsing
rules, which stores the information.
D: A data structure, normally unsophisticated, which
represents the input information in a somewhat
compacted form, possibly in hierarchic classes
called pictures.
E: An output procedure, which can take information
about a given picture from the data structure,
and format it so that the output device will be
able to use it (e.g., place it, in correct format,
into a "display file").
F: An output device, probably a digital or analog
driven cathode ray tube.
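Viewed as code, the restricted loop is a short pipeline. The Python sketch below is only an illustration of blocks C through F; the class and function names, and the display-file tuples, are assumptions rather than anything from the systems described here:

    from dataclasses import dataclass, field

    @dataclass
    class Line:                        # a restricted graphic entity
        x1: float
        y1: float
        x2: float
        y2: float

    @dataclass
    class DataStructure:               # block D: compacted picture storage
        pictures: dict = field(default_factory=dict)

    def input_procedure(ds, picture, token):       # block C: parse and store
        ds.pictures.setdefault(picture, []).append(Line(*token))

    def output_procedure(ds, picture):             # block E: build a "display file"
        return [("MOVE", l.x1, l.y1, "DRAW", l.x2, l.y2)
                for l in ds.pictures.get(picture, [])]

    def output_device(display_file):               # block F: stand-in for the CRT
        for command in display_file:
            print(command)

    ds = DataStructure()
    input_procedure(ds, "sketch-1", (0, 0, 10, 5))  # token from input device (block B)
    output_device(output_procedure(ds, "sketch-1"))

The user of such a program sees only the drawing; the data structure and its meaning remain out of his reach, which is the point made above.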
If we now consider what additional information flows
in SKETCHPAD, we see that the loop ABCDEFA is
still needed, with much greater sophistication in blocks
C and D, but also that other parts must exist. Besides
the drawing function, we have the constraint function;
e.g., two lines are stated to be parallel. The additional
functional requirements are therefore:

G: A procedure which interprets (or parses) the constraining command
H: A process which (possibly heuristically) determines the particular set of rules which are to
apply for this additional constraint
I: The set of rules that can apply for the particular
branch of science or mathematics being modeled.
In this case, only geometry.
J: A procedure which takes the new rule, and, in
conjunction with all previous rules laid down by
the user, produces the final result, thereby potentially
changing the data structure (D).
As an example, suppose we enter two lines into
SKETCHPAD. This will cause two loops through
ABCDEFA. We now state that these must be parallel.
The arc ABCGH determines that there is a given rule
in I to satisfy the condition "parallel"; i.e., that the
slopes of the lines be the same. J now determines the
error metric, and minimizes this, finally producing new
definitions for the two lines in D. The new result to
the user (A) is given via E, F.
Ross5,6 working from a similar system conception,
argued that it would be pointless to write a family of
special purpose packages, since these could never satisfy
a general user population. Nevertheless, he realized
that boxes H, I, and J were the key to any useful
system. He therefore proposed (and has since built) a
system for building systems. The idea is that a sophisticated user, provided with general packages for building
boxes B, C, D, E, and F, and further provided with a
general language for building boxes H, I, and J, could
produce specialized packages economically. Our conclusions differ from Ross', but before examining them,
we should draw parallels to other (non-graphic) systems.
For a numeric problem, e.g., the solution of supersonic
flow in a divergent nozzle, we have no automatic
display loop. The information flow is essentially A, B, C,
(D and G), (H and I), J, D, E, F, though some of the
steps are either eliminated or merged with one another;
e.g., G, H, and I are merely reformatting the input so
that it can be worked on by a special purpose program J
(which is a differential equation solver and special
purpose output formatter).
For algebraic analysis, we have a surprisingly similar
set of procedures. The input (BC) and output (EF)
programs are oriented towards equations, and the
parsing in the input procedures (from C to D) may be
quite complex, but the constraints on the solution (G)
are now the selection of a subset of the equations,
probably by user command, with some criteria (such
as elimination of a variable) for the solution. The
analyzer and rule applier (H) now must select, from
the set of system laws (I) or axioms or rules, that set
of rules and their order of application so that a reasonable solution can be generated when they are used by
the problem solver (J) to produce the required set of
operations to provide a solution.
Finally, we are ready to make our generalization,
which is that graphic problem solvers, like general
problem solvers, are in need of good heuristic solution
techniques. Thus the conclusions are:
(i) The constraint satisfaction problem, though
capable of solution for simple graphic pictures
with a few constraints by algorithmic or simple
maze solving techniques, is in fact a specialization (though a not much easier problem) of
the general problem solver.
(ii) That if a graphic system were built with the
GHJ of Figure 1 replaced by a General Problem
Solver, then the "constraint problem" would
only be one of many useful operations available,
and that it would then be possible to use this
GPS, examining the data structure and a set
of rules (possibly for topology and model building in electronic circuits) to analyze the physics
of the model, either numerically or symbolically.
(iii) It then would be possible for the user to be
truly in charge of his model. In contradistinction
to other writers,5,6 we believe the correct interface between the user and his modeling is at
"B" as a user of a given system, or at "I" when
he is altering his system rules or setting up his
system for a new type of model (e.g., mechanical
engineering instead of electrical, etc.).
Examples of use of a generalized graphic problem solver

Let us consider the operation of such a generalized
(graphic) problem solver on two problems from engineering (Figure 2). The first of these· represents an
electric circuit problem solved using a "phasor diagram". Naturally, this is a problem which is "easily"
solved using a special purpose program. First, we construct the circuit diagram by using a series of lines to
connect resistors and capacitors which have been pro-

[Figure 2 -- Examples for a generalized (graphic) problem solver: (a) an electric circuit with its phasor diagram; (b) a four-link mechanism with its velocity diagram.]

vided as "subroutines" by previous users or system
designers. We associate with the resistor the "numerical" value R, and with the capacitor the value C. If
the idea of "three phase balanced voltage" has been
previously introduced, we associate 1, 2, 3 with the
lines and give V as the line voltage. All of the drawing
and associations have been entered either by light pen
or typewriter key strokes. The picture has been drawn
using loop ABCDEF, and all associations are passive
constraints which have not caused any "problem solution" up to this point.
It is, of course, possible that the engineer, wishing
to be "neat" positioned the lines horizontally or vertically, but this was either a constraint during drawing,
and hence automatically satisfied, or else the constraint
was later applied, and the GPS loops of (C, D), G,
(H, I), J, D will have no real difficulty in resolving the
problem since the problem is not overconstrained, and
any one constraint can be selected to be applied first
with no effect on the outcome.
If "phasor diagrams" and the laws of impedance
are already a part of the electrical-bag-of-tricks, then
the diagram to the right could immediately be constructed, with the value of I12 set as (V/R) in phase
with V 12, and the value of 123 set at· (VwC) leading
V23 by 90° (where w is the angular velocity of the
applied voltage). All that remains is the need for
Kirchhoff's law to produce 12 as the vector sum of 1 23
and (-1 12) ••• this also requires the concept of a triangle for solution.
If the previous stored experience (the system laws)
does not include phasor diagrams, the user may still
produce a solution by applying the graphic construction
phase as follows: Draw V12 horizontally, define its length
to be V (non-numeric scales being allowed), and the
other voltages V23 and V31 at 120° and 240° respectively
anticlockwise. In the same way, I12 would be defined
parallel to V12 (with scaling) and I23 perpendicular to
V23, clockwise.
If the values of V, R, C, ω are known, then actual
numeric values of the current (and phase angle) can
be computed. If the user wishes, he could use an
algebraic solver to obtain I12 as (V/R)∠0°, I23 as
(VωC)∠−30°, and even I2 as V{(√3/2)ωC − R⁻¹ − i(ωC/2)},
where i is the 90° operator (assuming that
this notation was available).
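The algebraic result is easy to verify with ordinary complex arithmetic. In the Python sketch below, the sample values of V, R, C, and ω are assumptions chosen only for illustration:

    import cmath, math

    V, R, C, w = 120.0, 10.0, 1e-4, 377.0        # assumed sample values

    I12 = cmath.rect(V / R, 0.0)                  # (V/R) at 0 degrees, in phase with V12
    I23 = cmath.rect(V * w * C, math.radians(-30.0))   # (V w C) at -30 degrees
    I2 = I23 - I12                                # Kirchhoff: vector sum of I23 and (-I12)

    closed_form = V * ((math.sqrt(3) / 2) * w * C - 1 / R - 1j * (w * C / 2))
    print(I2, closed_form)                        # the two values agree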
As an extension of this type of program, we can introduce topological considerations to include the concept
of parallel and series with their associated algebraic or
numerical transforms. It would also be possible to
apply more complicated transformation rules, such as
the T → π transformation and its inverse. Obviously, the simple rules of
complex number manipulation should be included in
this "circuit package".

The second example of Figure 2 involves the position
and velocity of a simple four-link mechanism. First, we
draw the mechanism, constraining each length fixed,
point C as moving horizontally, point E as moving on
AB, and D as fixed between B and C. The generalized
graphic problem solver could produce a "picture" of
the mechanism for any given angle of AB from the
horizontal. As a result, we could plot if we wish the
curve of displacement-v-angle. Alternatively, if we have
an algebraic/trigonometric processor available, we could
produce equations for the positions of the various points
of the mechanism (e.g., B = l cos θ + il sin θ, etc.).
This would probably also introduce the angles of BC
and DE from the horizontal, with "constraint" equations
for determining these as functions of the various
lengths and θ (e.g., BC/sin θ = BA/sin α).
One common method of solving for the velocity is to
draw a diagram, as shown, with lengths proportional
to the velocity, and the relationships of the angles
either along or perpendicular to the special diagram.
Thus ab = ωAB to scale, and perpendicular to AB.
To find c, we intersect bc (drawn ⊥ BC) and ac (drawn
along the direction of constrained piston motion, i.e.,
horizontally). The position of d is determined by preserving
ratios, i.e., bd/bc = BD/BC. Thus the velocity
diagram can be constructed, and the velocity of all
parts found.
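The graphical construction amounts to a two-by-two linear solve in velocity space. A minimal Python sketch follows; the link coordinates, the angular velocity of AB, and the ratio BD/BC are assumed sample values:

    import numpy as np

    A = np.array([0.0, 0.0])
    B = np.array([1.0, 1.0])          # assumed positions of the pins
    C = np.array([2.5, 0.0])          # piston pin, constrained to move horizontally
    omega = 2.0                       # assumed angular velocity of AB

    ab = omega * np.array([-(B - A)[1], (B - A)[0]])   # ab = omega*AB, perpendicular to AB

    # c lies on the line through b perpendicular to BC and on the horizontal
    # line through a: solve ab + s*perp(BC) = t*(1, 0) for s and t.
    perpBC = np.array([-(C - B)[1], (C - B)[0]])
    s, t = np.linalg.solve(np.column_stack([perpBC, [-1.0, 0.0]]), -ab)
    vc = np.array([t, 0.0])                            # velocity of C
    vd = ab + 0.4 * (vc - ab)                          # bd/bc = BD/BC, assumed 0.4
    print(ab, vc, vd)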
Obviously, another way to solve this problem, given
the trigonometric rules, laws, etc., above, and also a
simple set of differentiation rules, would be to solve
for the time derivative of the displacements, and hence
obtain a closed form solution.

CONCLUSIONS
We have seen, in the two examples, that there are
several ways that the same problem can be tackled,
given a generalized graphic problem solver. But indeed,
this should not be surprising, because we have already
assumed that a significant feature of the generalized
graphic problem solver is a general problem solver. We
also know that much of engineering graphics in the
past was aimed at either visualizing the model in such
a way that it could be solved, or else providing a way
for obtaining numerical solutions by scaled drawings.
This may now lead us to two important facts. The
first is: scientists and engineers have, in the past, made
significant use of graphic diagrams to solve mechanical,
electrical and other engineering and scientific problems.
Is it reasonable to use these techniques, when they
represent only a man oriented solution, and when a
machine might make the solution more simply using
the algebraic formulation? In some ways, this question
is academic, since any algorithm, which is easily defined,
should be welcome to the computer user. In fact, these
sorts of techniques are highly man-machine or symbiotically oriented. Strangely enough, the algebraic approach would be much less "man controlled", needing
heuristics to determine how to solve the series of equations (i.e., to determine the order of applying the laws
or rules). Thus we see, once again, that the future
betokens more applied artificial intelligence.
The second is:
Although many investigators have looked on the
world as man-machine oriented, Figure 1 suggests that
this cannot be the case if we plan to expand towards
generalized systems. This is because the user only
appears in one loop (ABCDEFA), and is excluded from
the other (GHJD). When the problem is being solved,
the user may be able to assist, but more often, the
machine representation of the problem and its present
state of solution may be unintelligible or untranslatable
to the man. This does not mean that there are no
places where the man could help, but it suggests that
there is no single successful technique where a man can
help. An illustration in Reference 8 shows that if
graphic procedures are called from within other procedures, then the human decision maker could be con-
fused by a "question" or call-for-aid generated in a
low-level subroutine, since the user may not even be
aware of the conflict, let alone what it means.
Finally, to end on an optimistic note:
The graphic systems which are specialized are significantly
different from the generalized systems, but
they are, nonetheless, similar in many parts to the
generalized system described here. Many of the routine
manipulations in a special system are still needed in
the more general. The total general system involves
generalized problem solvers which, though being developed
in several locations in the country, are still
very primitive; however, recent work on both the
theoretical and practical level9,10 suggests their ultimate
utility. But should all else fail for the next few years,
we can still fall back on the semi-automatic procedures of the engineer/scientist, like those discussed in
the last section and shown in Figure 2. Although this
is not a large step forward, we have plenty of room for
research even in this cut-down version, while we are
waiting for AI to develop.
REFERENCES
1 I E SUTHERLAND
SKETCHPAD: A man-machine graphical communication
system
Lincoln Laboratory Technical Report 296 Lexington
Massachusetts January 1963
2 E L JACKS
A laboratory for the study of graphical man-machine
communications
Proc of the Fall Joint Computer Conference Vol 26 Part I
p 343 1964
3 S A COONS
Computers in technology
In Information W H Freeman and Company San
Francisco California 1966
4 J C GRAY
Compound data structure for computer-aided design: A survey
Proc 22nd National Conference ACM p 355 1967
5 D T ROSS
The AED approach to generalized computer-aided design
Proc 22nd National Conference ACM p 367 1967
6 D T ROSS J E RODRIGUEZ
Theoretical foundations for a computer-aided design system
Proc of the Spring Joint Computer Conference p 305-322
1963
7 A NEWELL H A SIMON
GPS, a program that simulates human thought
In Feigenbaum and Feldman Computers and Thought
p 279-293
8 E H SIBLEY
The use of a graphic language to generate graphic procedures
Proceedings of Second Illinois Conference on Pertinent
Concepts in Computer Graphics April 1969
9 G W ERNST
Sufficient conditions for the success of GPS
JACM Vol 16 No 4 October 1969
10 C C GREEN
Application of theorem proving to problem solving
IJCAI Washington May 1969

A variance reduction technique for hybrid
computer generated random walk
solutions of partial differential equations
by DR. EVERETT L. JOHNSON
The Boeing Company
Wichita, Kansas

INTRODUCTION

The work done to date on the analog/hybrid Monte
Carlo solutions of partial differential equations can be
summarized by reviewing three works: one research
report and two Ph.D. theses.
Chuang, Kazda, and Windeknecht were the first to
demonstrate the feasibility of a Monte Carlo solution
of a class of partial differential equations on an analog
computer.1 The boundary value problems for which
their stochastic solution technique is applicable belong
to a family of generalized Dirichlet problems of the
form

D1 (∂²φ/∂x1²) + D2 (∂²φ/∂x2²) − K1(x1, x2)(∂φ/∂x1) − K2(x1, x2)(∂φ/∂x2) = 0   (1)

where K1(x1, x2) and K2(x1, x2) are arbitrary functions
of x1 and x2. The boundary, c, is an arbitrary, finite
closed curve -- a Jordan curve.
The boundary-value function φ(x1, x2) is a bounded,
single-valued piecewise continuous function of x1 and
x2. D1 and D2 are constants.
The method developed is based on the direct relation
that exists between partial differential equations and
the random process that arises in the analysis of electric
circuits subjected to random excitations.2
The electrical equations to be simulated for the
solution of Equation 1 are

dx1/dt + K1(x1, x2) = N1(t)
dx2/dt + K2(x1, x2) = N2(t)   (2)

where N1(t) and N2(t) are noise generators with Gaussian
amplitude distribution and power spectral densities
D1 and D2 respectively. Figure 1 demonstrates the
simulation technique.

[Figure 1 -- Continuous random walk program: noise generator driving the integrators, boundary detection, analog mode control logic, and averaging device.]

Fundamental to the research of Chuang, et al, is the
theorem of Petrowsky3 which guarantees the convergence
of the stochastically obtained solution to that
of the generalized Dirichlet problem, Equation 1.
The solutions to several one and two dimensional
problems were given in the paper by Chuang, et al.
The solutions were obtained as follows: An analog
program as shown in Figure 1 was patched. The initial
conditions of integrators 1 and 2 were set to the coordinates
of the point for which the solution was desired.
Boundary crossings were detected by using an
oscilloscope, bounded region mask, and photo tube.
Upon detecting a boundary crossing, the value of the
boundary-value function at the point of intersection
was recorded and the process repeated. The approximate
solution was then given by

f(ξ) = (1/N) Σ (n=1 to N) φ(Sn)   (3)

where Sn is the coordinate of the nth crossing, ξ the
solution point, and N is the number of repetitions of
the procedure. The simulation equipment allowed up
to 2000 random walks per hour. Errors were in the
.5 percent range and were attributed primarily to
statistical variations, presumably in the noise source.
Little4 extended the class of partial differential equations
for which the methods developed by Chuang,
et al, are applicable while developing a technique of
solution utilizing an analog computer linked to a digital
computer. Little solved three types of partial differential
equations: parabolic, elliptic, and non-homogeneous.
Little used the analog computer for simulating an
electric circuit excited by random noise. The digital
computer collected and averaged the resulting boundary
values and controlled the modes of the analog
components. With his hybrid system, EAI 231R-V
analog computer and Logistics Research Alwac III-E
digital computer, he was able to obtain 10 random
walks per second. The Monte Carlo solutions compared
favorably with the analytical solutions of the example
problems.
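In modern terms these are first-passage Monte Carlo methods. The short Python sketch below is a digital stand-in for the analog program of Figure 1, specialized to the Laplace case K1 = K2 = 0; the time step, the relation σ² = 2D·Δt between step variance and spectral density, and the example region are all illustrative assumptions:

    import random, math

    def walk_to_boundary(x, y, inside, dt=1e-4, D1=0.5, D2=0.5):
        # Integrate dx1/dt = N1(t), dx2/dt = N2(t) until the region is left;
        # Gaussian increments stand in for the analog noise generators.
        while inside(x, y):
            x += random.gauss(0.0, math.sqrt(2 * D1 * dt))
            y += random.gauss(0.0, math.sqrt(2 * D2 * dt))
        return x, y

    def dirichlet_solution(x0, y0, inside, phi, n_walks=2000):
        # Equation 3: average the boundary-value function over first crossings.
        return sum(phi(*walk_to_boundary(x0, y0, inside))
                   for _ in range(n_walks)) / n_walks

    # Unit square with phi = x on the lower edge and zero elsewhere.
    inside = lambda x, y: 0.0 < x < 1.0 and 0.0 < y < 1.0
    phi = lambda x, y: x if y <= 0.0 else 0.0
    print(dirichlet_solution(0.5, 0.5, inside, phi))

The estimate converges only like 1/sqrt(N), which is what motivates the variance reduction developed in this paper.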
Handler5 demonstrated that, by use of a high speed
repetitive operation analog computer with parallel logic
capability, the ASTRAC II, the Monte Carlo
solution techniques developed by Chuang, et al, and
Little could be made competitive with more conventional
finite difference digital solution techniques. In
fact, he demonstrated the ability to plot continuously
and directly the solution to partial differential equations.
This was accomplished by slowly changing the
coordinates of the point for which the solution was
desired while performing 1000 random walks per second.
The averaging of the intersected boundary values was
done by a simple analog averaging circuit.
A. W. Marshall6 has stated that if a random sampling
method is to be used to solve a problem, attention
should be turned to three topics.
(1) Choosing or modeling the probability process to be
sampled (in some cases this means choice of the
analog; in others a choice between alternative probability models of the same process).
(2) Deciding how to generate random variables from
some given probability distributions in an efficient
way.
(3) Variance reduction techniques, i.e., ways of increasing the efficiency of the estimates obtained
from the sampling process.

In summary:

(1) Petrowsky has defined a class of stochastic processes
which can be used for obtaining a solution
to the Dirichlet problem.
(2) Wang, et al, have established a random process, in
the class defined by Petrowsky, to be sampled.
The first two topics suggested by Marshall have been
considered previously and satisfactory results obtained.
The third topic is the subject of this paper.
The means toward the end will be an examination
and implementation of a technique of stratified sampling
using the Green's function for a rectangle and for
a circle.
The technique is implemented by choosing an approximating
region, Ra, which is totally contained within
the region, R, for which a solution is desired. Portions
of the boundary of Ra may be coincident with the
boundary of R. For a point contained in Ra the solution
is found by performing walks which originate from the
boundary of Ra. The number of walks, Nm, which
originate from the mth segment is given by the negative
of the normal derivative of the Green's function for Ra
integrated over the mth segment. A substantial reduction
in variance results from the use of the technique.
A VARIANCE REDUCTION TECHNIQUE FOR
THE CONTINUOUS RANDOM WALK
For a solution of Laplace's equation for the region

R of Figure 2, the continuous walk technique moves

[Figure 2 -- Region of solution R with the approximating region Ra shown dotted.]

continuously from the point of interest, ξ, in the region
R until a boundary is intersected. The value of the
boundary function φ(s) is recorded and after N walks
the solution is

φ(ξ) = (1/N) Σn φ(Sn)   (4)

If the Green's function for Laplace's equation and the
region R are known then the solution can be written7 as

φ(ξ) = − ∫c φ(s) (∂G/∂n)(ξ, s) ds   (5)

where ξ represents the coordinates of the point for
which a solution is desired, s is the variable on c, the
boundary of the region R, and (∂G/∂n)(ξ, s) is the
derivative of the Green's function with respect to the
normal vector of the boundary of R.
In Figure 2 consider the region Ra contained by R
with the boundary represented by a dotted line. Each
walk leaving the point ξ must intersect the boundary
of Ra at least once before intersecting the boundary
of R. If a boundary function φa(sa) were given for Ra,
where sa is the variable on the boundary of Ra, then
φa(sa) evaluated at the points of first intersection with
the boundary of Ra allows the solution for Laplace's
equation in Ra to be written

φa(ξ) = (1/N) Σ (i=1 to N) φa(sai).   (6)

It was reasoned that if Ra were a region with a known
solution this information might be used to determine
the points of intersection on the boundary of Ra for
walks originating at ξ. Due to the Markovian nature
of the random walk process, walks originating from the
boundary of Ra with the correct distribution for continuous
walks from ξ should give the solution to
Laplace's equation for R. The information sought can
be obtained from the normal derivative of the Green's
function for Laplace's equation in Ra. The probability
density function properties of the Green's function are
well established.8 A heuristic argument is given in
Appendix A to support the use of the Green's function
to find the proportion of the N walks with origin at ξ
which intersect any segment of the boundary of Ra.
If the boundary intersection coordinates are stored
during a random walk solution of Laplace's equation
for the region R then these values can be used for the
solution to Laplace's equation for various boundary
functions. The set of coordinate points is then an approximation
to the Green's function for the region R
and the walk origin ξ. It is important that these intersection
coordinates have as nearly as possible the same
statistical parameters as the Green's function. The im-

portance is obvious if the boundary function is expressed
as a power series

φ(s) = a0 + a1 s + a2 s² + ...   (7)

The solution to Laplace's equation is then seen to be a
function of the moments of s

φ(ξ) = ∫s (a0 + a1 s + a2 s² + ...) Gn(s) ds.   (8)

The possibility of obtaining a variance reduction in
the sample average by originating continuous walks
from theoretically determined origins on the boundary
of Ra prompted the research reported in this paper.
A STRATIFIED SAMPLING TECHNIQUE
USING GREEN'S FUNCTIONS
Stratified sampling is a sampling technique which
gives a reduction in sample variance if the population
from which samples are to be made can be divided into
sub groups which have variances smaller than the
original population. The sub group selection and the
number of samples to be made from each sub group
must be selected such that the parameters to be determined from the samples are the same for the stratified
sampling as for samples taken from the original population.
The use of Green's functions to determine the
distribution of walk origins on an approximating region
Ra contained by R permits the original sample
population that exists at ξ to be divided into
sub groups with variances smaller than that of the
population at ξ. The technique described below was
implemented and shown to give a variance reduction
before its recognition as an application of stratified
sampling.
The technique consists of performing walks from the
boundary of the approximating region Ra which contains
the point, ξ, for which a solution is sought. The
boundary of Ra is divided into M segments, ΔAi. The
proportion, pm, of the total number of walks, N, to be
originated from the mth segment of Ra is determined by

pm = − ∫ΔAm Gna(ξ, sa) dsa   (9)

where Gna(ξ, sa) is the normal derivative of the Green's
function for Ra. Note that pm is the probability of a
walk originating at ξ intersecting the boundary segment
ΔAm. The segments must be small enough that the
solution to Laplace's equation at the mid points of two
adjacent segments does not differ greatly.
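For a circular Ra the normal derivative of the Green's function is known in closed form (it is the Poisson kernel of the disk), so the pm of Equation 9 can be evaluated directly. A minimal Python sketch of the allocation step follows; the segment count, the midpoint quadrature, and the rounding are illustrative assumptions consistent with the circular Ra used in the verification below:

    import math

    def poisson_kernel(rho, theta, R):
        # -dG/dn for a disk of radius R: the density of first crossings on the
        # circle, for a walk starting at interior polar point (rho, 0).
        return (R * R - rho * rho) / (
            2 * math.pi * R * (R * R - 2 * R * rho * math.cos(theta) + rho * rho))

    def allocate_walks(rho, R, M, N):
        # Equation 9: walks to originate from each of M equal boundary segments.
        dtheta = 2 * math.pi / M
        p = [poisson_kernel(rho, (m + 0.5) * dtheta, R) * R * dtheta
             for m in range(M)]
        scale = N / sum(p)                    # absorb midpoint-quadrature error
        return [round(pm * scale) for pm in p]

    print(allocate_walks(rho=0.2, R=0.45, M=36, N=1000))

Each segment then contributes Nm walks whose boundary values are averaged exactly as before; only the starting points change.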
Consider the solution of Laplace's equation using
the technique described above for a region R, Green's
function with normal derivative Gn(ξ, s), and with
boundary function φ(s) = s on a portion of the boundary,
ΔS, and zero elsewhere. Divide the boundary of
the approximating region into M equal segments ΔAm.
For the problem described, the solution is

S = (1/N) Σ (k=1 to N) Sk   (10)

where S is the average value of the intersection coordinates
on ΔS. The variance of the solution is9, with Ra,

V(S) = (1/N) Σ (m=1 to M) pm σm²   (11)

and without Ra

(12)

giving a variance reduction

δ(V) = (1/N) [Q σ² + Σ (m=1 to M) pm (Um − U)² − Σ (m=1 to M) Qm σm² pm]   (13)

where

pm = the probability of intersecting ΔAm
σm² = the variance of the average value of intersection
      coordinates from walks originating at ξm, the
      mid point of ΔAm
Um = the expected value of the intersection coordinates
      resulting from walks originating at ξm
U = the expected value of the intersection coordinates
      resulting from walks originating at ξ
Q = − ∫ΔS Gn(ξ, s) ds
Qm = − ∫ΔS Gn(ξm, s) ds.

The wavy line under S in Equation 12 denotes the
fact that in order to get the equation in terms of
Equation 11 it was necessary to assume that any walk
originating from ξ and intersecting a segment of the
boundary of Ra continued its motion from the mid
point of the segment. For sufficiently large M the
difference in results is negligible. For the experimental
verification given below a value of M = N was used.
For verification of the above results, Laplace's equation
was solved for the region and boundary values
given in Figure 3.

[Figure 3 -- Square region of solution: boundary value φ(0, x) = x on the lower edge, φ = 0 on the remaining edges.]

A circle with a radius of .45 centered at (.5, .5) was
used for Ra. Solutions were obtained for x = y = .5.
Approximately 100 walks per second were made -- this
excludes the time required to compute the number of
walks made from each segment.
For each value of N, ten solutions were made. The
ten values were used to compute the standard deviation
of the sample average. The sample average is the solution
to the given problem. The experimental values
were plotted along with theoretical values for σ for the
cases with and without Ra using the square root of
Equations 11 and 12.

[Figure 4 -- Standard deviation for X = .5, Y = .5, with Ra.]
The problem was programmed on an EAI 690 Hybrid
Computer. The digital portion of the hybrid system
computed the initial coordinate values for the walks,
the number of walks to be made from a segment, recorded the boundary intersection coordinates, computed the average, and controlled the modes of the
analog computer. The analog portion integrated the
noise to generate the walk, detected boundary intersections, and performed the filtering for the noise
generators.
Figure 4 shows the results for x = .5, y = .5 with
Ra. Figure 5 gives the results for the same problem
without Ra. The data points would have been closer to
the theoretical curves if more than ten solutions had
been made at each value of N. However, the improvement
in standard deviation using Ra is apparent.

[Figure 5 -- Standard deviation for X = .5, Y = .5, without Ra.]

By using Ra for the solution of Laplace's equation
the following benefit is realized:
for x = y = .5, a 58% reduction in variance, or the
same variance as without Ra with 42% as many
walks.
For the problem of Figure 3, Figure 6 gives the theoretical
change in variance for x = .5 and .05 ≤ y ≤ .95
with a radius of .45, and Figure 7 shows the theoretical
change in the variance in the sample average as the
radius of the approximating circle is varied. Note in
Figure 6 that the variance is a function of position and
that for a given variance in sample average the number
of walks required is not a constant.

[Figure 6 -- Variance for X = .5 using a circular Ra centered at X = Y = .5 with a .45 radius.]

[Figure 7 -- Variance with Ra as a function of radius of circular Ra centered at X = Y = .5, for the point X = Y = .5.]

The technique described and demonstrated in this
section gives a substantial reduction in error in the
continuous random walk solution of Laplace's equation.
For a given number of walks the amount of error
reduction is dependent on the position of the point
for which a solution is sought. The next section contains
an extension to the technique which in many cases
increases the variance reduction.

VARIANCE IMPROVEMENT USING
BOUNDARY COINCIDENCE

The continuous random walk solution of Laplace's
equation with boundary function φ(s) for a region R
with approximating region Ra can be written as

φ(ξ) = (1/N) Σ (m=1 to M) Σ (i=1 to Nm) φ(Sim)   (14)

or

φ(ξ) = Σ (m=1 to M) pm φ(ξm).   (15)

φ(ξm) is the sample average at the mid point of the
mth segment, Nm is the number of walks intersecting
the mth segment of Ra, and Sim is the coordinate of the
intersection of the ith walk from the mth segment.
Equation 15 is the continuous random walk solution
of Laplace's equation in Ra with boundary function
φ(ξm) on ΔAm. Since pm may be computed using the
approximating region Green's function, Equation 15
can be written

φ(ξ) = − Σ (m=1 to M) ∫ΔAm φ(ξm) Gna(ξ, s) ds   (16)

where Gna is the normal derivative of the Green's
function for Ra. If Ra can be positioned such that
portions of its boundary are coincident with the boundary
of R, then the boundary function on the coincident
portions of Ra need not be approximated by φ(ξm) but
is the same as the boundary function for R; splitting
Equation 16 into coincident and non-coincident portions
gives

φ(ξ) = − ∫Se φ(s) Gna(ξ, s) ds − Σ (m=1 to L) ∫ΔAm φ(ξm) Gna(ξ, s) ds.   (17)

Equation 17 can then be written

φ(ξ) = − ∫Se φ(s) Gna(ξ, s) ds + Σ (m=1 to L) pm φ(ξm)   (18)

where Se is the coincident portion of the boundaries
and L is the number of segments not coincident.
The second term of Equation 18 can be written

(1/N) Σ (m=1 to L) Nm φ(ξm)   (19)

which can be written

(1/N) Σ (i=1 to K) φ(Si)   (20)

where Si is the ith intersection coordinate, and

K = Σ (m=1 to L) Nm.   (21)

K is the number of walks actually made and is dependent
on the extent of the boundary coincidence
and the coordinates of the point for which a solution
is sought. N will now be referred to as the base number
of walks.
Equation 11 gives the variance when Ra is used,
which is the weighted sum of the variance from the
M segments of Ra. The variance of the coincident segments
is zero, giving an additional decrease in the
variance.

EXAMPLE PROBLEM

The solution to Laplace's equation is desired for the
region shown in Figure 8 and boundary conditions:

φ(0, β) = β
φ(α, B) = B
φ(A, β) = β
φ(α, β) = 0 elsewhere

where α and β refer to coordinates of points on the
boundary.

[Figure 8 -- Region for which solution to Laplace's equation is desired, with dimensions A, B, D, H and approximating rectangles 1234 and 4567.]

The Green's function for a rectangle is10

G(x, y, α, β) = 2 Σ (m=1 to ∞) sinh(mπy/a) sinh[(mπ/a)(b − β)] sin(mπx/a) sin(mπα/a) / [mπ sinh(mπb/a)]   (22)

for y ≤ β, and

G(x, y, α, β) = 2 Σ (m=1 to ∞) sinh(mπβ/a) sinh[(mπ/a)(b − y)] sin(mπx/a) sin(mπα/a) / [mπ sinh(mπb/a)]   (23)

for β ≤ y ≤ b, where 0 ≤ x ≤ a and 0 ≤ β ≤ b. The
coordinates of the point being solved for are (x, y).
Equation 23 can also be expressed as a Fourier
series in y by interchanging a and b, x and y, and α and
β. Both representations give the same results but, to
obtain faster convergence, the Fourier series in x should
be used at points for which

|y − β|/a > |x − α|/b   (24)

and the Fourier series in y when the reverse of Equation
24 is true. The exponential form of Equation 23 was
used for developing the digital representations of the
equation.
Using the Green's function for a rectangle and the
two approximating regions 1234 and 4567 shown in
Figure 8, the problem was programmed on an EAI 690
hybrid computer. The digital-analog problem split was
as follows:
The digital computer calculated which of the two
approximating regions would require the least random
walks, the contribution the coincident boundary portions
made to the solution, and the number of walks
to be made from each non-coincident segment. In addition,
the digital computer controlled the modes of
operation of the analog computer, monitored sense lines
signaling boundary intersections, stored the boundary
intersection coordinate values, computed the proper
boundary function values, and set the initial conditions
for the x and y integrators.
The analog computer performed the random walks
by integrating the noise inputs, provided shaping filters
for the noise, and with analog comparators and parallel
logic components simulated the boundary and flagged
the digital computer when an intersection occurred.
Table I contains the results for the region shown in
Figure 8 with dimensions:

A = 200
B = 40
D = 10
H = 10,

a 5-unit boundary segment for region 1234, a 2-unit
segment for region 4567, and a base walk number of
1000. The solutions both with and without an approximating
region are compared to the results obtained
using finite differences and a relaxation type numerical
solution.
Table II gives the results using an approximating
region and a base walk number of 10,000. Solutions for
many other points were made11 with similar reductions
in error. Note in Tables I and II that less than 25% of
the base number of walks is actually made.

TABLE I -- Solution for Laplace's Equation for Region of
Figure 8 with Base Walk Number of 1000 (X = 100)

                SOLUTION USING Ra            WITHOUT Ra
 Y    DIGITAL   HYBRID  |ERROR|  #WALKS    HYBRID  |ERROR|
 36    35.33     35.19    .14      49       35.60    .28
 32    30.62     30.54    .08      97       31.56    .94
 28    25.83     25.89    .06     143       26.52    .31
 26    23.38     23.55    .17     161       24.08    .70
 24    20.89     20.72    .17     178       20.56    .33
 22    18.33     18.25    .08     190       18.76    .43
 20    15.69     15.63    .06     212       15.76    .07
 18    12.93     13.20    .27     201       13.00    .07
 16    10.02      9.89    .13     191       10.76    .75
 14     6.90      7.11    .21     154        8.20   1.30
 12     3.54      3.42    .12      91        4.48    .94

TABLE II -- Solution for Laplace's Equation for Region of
Figure 8 with Base Walk Number of 10,000 (X = 100)

                SOLUTION USING Ra
 Y    DIGITAL   HYBRID  |ERROR|  #WALKS
 36    35.33     35.34    .01     490
 32    30.62     30.61    .00     970
 28    25.83     25.71    .12    1430
 26    23.38     23.40    .02    1610
 24    20.89     20.82    .07    1780
 22    18.33     18.33    .00    1900
 20    15.69     15.66    .03    2120
 18    12.93     12.99    .06    2010
 16    10.02     10.02    .00    1910
 14     6.90      6.93    .03    1540
 12     3.54      3.61    .07     910

QUICK LOOK CAPABILITY

One advantage of the continuous random walk with
or without Ra is the ability to find the solution for just
one point in the region, while the solution by conventional
digital techniques requires the solution over
the whole region to obtain the solution at a particular
point. Using the continuous random walk technique a
solution for φ can be found at a point or points as the
geometry is changed, allowing a quick look at the effect
of a geometry change at some critical point. For the
problem of the preceding section Figure 9 shows the
resultant change in φ at the point x = 100, y = 20,
as H varies from 0 to 20 with a segment increment of 5
and 1000 for the base number of walks. The data for
the curve was taken in less than 15 minutes, the time
required for one conventional digital solution.

[Figure 9 -- Solution of Laplace's equation for X = 100, Y = 20 in the region of Figure 8 as H is varied.]

SOLUTION OF POISSON'S EQUATION WITH
CONSTANT FORCING FUNCTION

Handler12 proposed solving Poisson's equation with
constant forcing function by making use of the fact
that the average time, T, for a random walk satisfies
the equation

D1 (∂²T(x, y)/∂x²) + D2 (∂²T(x, y)/∂y²) = −1,   T(s) = 0   (25)

and that the solution for Poisson's equation

∂²φ(x, y)/∂x² + ∂²φ(x, y)/∂y² = −A   (26)

can be found as the sum of the solutions of

∂²φ1(x, y)/∂x² + ∂²φ1(x, y)/∂y² = 0,   φ1(s) = φ(s)   (27)

and

∂²φ2(x, y)/∂x² + ∂²φ2(x, y)/∂y² = −A,   φ2(s) = 0   (28)

where φ2 = AT. Random walks were performed with a
value proportional to the average walk time being
added to the resulting average of the intersected boundary
values.
The technique described in the previous section is
also applicable to the solution of Poisson's equation
with constant forcing function.
Using the Green's function, Equation 28 can be
solved for Ra giving the average time for a random
walk to the boundary of Ra.
The solution for Poisson's equation is then

φp(x, y) = φL(x, y) + AT(x, y) + (AD/N) Σ (i=1 to K) Ti   (29)

where φL(x, y) is given by Equation 18, AT(x, y) is
given by φ2(x, y) of Equation 28, and the last term is
the average time contribution from the walks actually
made from the non-coincident portion of the boundary
of Ra. D is the power spectral density of the noise
source. N is the base number of walks and K is the
actual number of walks performed.
Poisson's equation with A equal to 1 was solved for
the region and boundary values shown in Figure 10.
Two approximating rectangles were used,

0 ≤ x ≤ 1.0,  0 ≤ y ≤ .9

and

0 ≤ x ≤ .6,  0 ≤ y ≤ 1.0.

[Figure 10 -- Rectangular region for which a solution to Poisson's equation is desired: ∇²φ = −1 on the unit square, with φ = 1 on the upper boundary and φ = 0 on the remaining boundaries; dashed lines mark the two approximating rectangles.]

The solution for x = y = .5 is given in Figure 11. A
base number of 1000 walks was used. Approximately
100 walks were performed per second.
The time required for the K walks actually made
was measured by counting a one-hundred kHz clock
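Handler's walk-time device extends the first-passage sketch given earlier in a few lines. In the Python fragment below the Euler stepping and the region are illustrative assumptions; the factor D in the time term follows Equation 29:

    import random, math

    def poisson_walk(x, y, inside, dt=1e-4, D=0.5):
        # Walk to the boundary, returning the exit point and the elapsed time;
        # by Equation 25 the mean time T solves D*(Txx + Tyy) = -1, T(s) = 0.
        t = 0.0
        while inside(x, y):
            x += random.gauss(0.0, math.sqrt(2 * D * dt))
            y += random.gauss(0.0, math.sqrt(2 * D * dt))
            t += dt
        return x, y, t

    def poisson_solution(x0, y0, inside, phi, A=1.0, D=0.5, n_walks=1000):
        # Equations 26-28: boundary-value average plus A*D times the walk time.
        total = 0.0
        for _ in range(n_walks):
            xb, yb, t = poisson_walk(x0, y0, inside, D=D)
            total += phi(xb, yb) + A * D * t
        return total / n_walks

    # Unit square of Figure 10: phi = 1 on the top edge, 0 elsewhere, A = 1.
    inside = lambda x, y: 0.0 < x < 1.0 and 0.0 < y < 1.0
    phi = lambda x, y: 1.0 if y >= 1.0 else 0.0
    print(poisson_solution(0.5, 0.5, inside, phi))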

18 BIT DIGITAL TO ANALOG CONVERSION

[Figure 1 -- A 4 bit digital to analog converter with the binary input 1011 (+ voltage = logical 1, − voltage = logical 0; most significant bit at left).]

[Figure 2 -- Reference amplifier with negative reference voltage E (secondary voltage standard); the circuit acts as a parallel current regulator at node (1).]

[Figure 3 -- DAC switching waveforms between 377 777₈ and 400 000₈ at 5 us/cm and 10 v/cm. Top: DAC output. Bottom: power source switching, voltage not to scale.]
In a noise-free analysis, the voltage at node (1) is:

E = −Er + (2/A)(1 + Ro/Rs) Er + (2/A)(Ro/Rs) Es   (1)

Equation 1 shows only the first order error terms. It
is significant to recognize that the voltage E at node
(1) is independent of the power source, Es, and of the
resistors Ro and Rs, provided that the amplifier gain, A,
is sufficiently large.
A further analysis yields the result that a change in
E due to a change in amplifier gain is proportional to
A⁻². This suggests that if the gain is about 10⁴ and the
other terms are held constant to 1%, then it is possible
to obtain a stability of 1 part per million.
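The claim is quick to check numerically. In the Python fragment below the nominal values are assumptions consistent with the text:

    # Error terms in Equation 1 have the form (2/A)(...), so their sensitivity
    # to the gain A scales as A**-2.
    A = 1e4
    term = 2.0 / A                      # size of a first-order error term, ~2e-4
    d_term = 2.0 / (1.01 * A) - term    # gain held constant to 1 percent
    print(term, abs(d_term))            # ~2e-4 and ~2e-6: about 1 part per million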
The transient response of the circuit can be calculated
by assuming the initial condition that all voltages are
zero, and at zero time a step voltage of one volt is
applied at the power source, Es. Another assumption,
that the amplifier frequency characteristic is equivalent
to a low-pass RC network, results in the following
transient voltage at node (1):

E(t) = −(Ro/Rs) e^(−πFt)   (2)

Again, only the first order terms are shown. In
equation 2, F is the gain-bandwidth product of the
amplifier.
The implicit time constant and settling time of the
above model are in the nanosecond range. Unfortunately,
the practical transient response is determined
by propagation delay through the amplifier and by the
slewing rate.
Likewise, the practical accuracy limitation is due to
amplifier noise. A noise generator, En, representing the
amplifier noise at its input will appear at node (1) as a
voltage of −2En superimposed on the voltage E.
Even with the above practical limitations, it is easy
to realize the construction of an 18 bit digital to analog
converter by the use of the regulation circuit of Figure 2.
Parallel current voltage regulators are limited in
range by the current source. The circuit shown in
Figure 2 is limited in range by the maximum available
current from the amplifier. This property is put to good
use in digital to analog conversion.
If the power source, Es, is within regulation range,
then the voltage, E, has the desired accuracy. If the
power source is out of regulation range, then no regulation
takes place. It is possible to switch the power
source between two states, where for each state there
is a regulation circuit that is within range. Thus, two
regulation circuits are required for each high order bit
of the digital to analog converter. This arrangement is
driven by a reasonably fast power switching circuit.
It is necessary to design an 18 bit digital to analog
converter for direct drive at the final voltage and
current level. Contrary to common practice, there
should be no amplifier in the converter output circuit.
Any output amplifier would necessarily degrade the
converter performance.
For the same reason it is difficult to evaluate the
performance of an 18 bit converter. Figure 3 shows the
transient response of an 18 bit converter at mid-range
when all bits switch state. The least significant bit
change is not resolved in this photograph. On the same
picture are shown the power source switching waveforms.
Figure 4 shows the same waveforms at a more sensitive
oscilloscope range, but at less bandwidth. Here the
least significant bit change is resolved. Figure 5 shows
the dc resolution of least significant bit changes of the
18 bit converter.

[Figure 4 -- DAC switching waveforms between 377 777₈ and 400 000₈ at 500 us/cm and 1 mv/cm. Top: DAC output. Bottom: power switching, voltage not to scale.]

[Figure 5 -- DAC output from 377 774₈ to 400 003₈ at 1 min/in and 1 mv/10 divisions.]
The converter has an accuracy of 10 parts per million
and has a stability of better than 1 part per million per
day. This performance is obtained in an unprotected
laboratory environment. Only the standard cell, that is
used as the primary reference, is in a temperature controlled oven.
The stability and resolution obtained for the 18 bit
converter suggest that the same converter could be
extended to 20 bits. A true evaluation of the performance of high precision digital to analog converters
can only be made in the final system where they are
needed.
The technique of adding high order bits to existing
digital to analog converters is very successful. It may
appear to be expensive to construct the additional bits
by having a separate regulation circuit for each state
of each bit. However, this is the practical way to
achieve high resolution.

A hybrid computer method for the analysis
of time dependent river pollution problems
by R. VICHNEVETSKY
Electronic Associates, Inc. and Princeton University
Princeton, New Jersey

and
ALLAN W. TOMALESKY
Electronic Associates, Inc.
Princeton, New Jersey

INTRODUCTION

This paper is devoted to the description of work done
in the hybrid computer simulation of polluted rivers
and estuaries. Our attention in this paper is restricted
to the solution of the pollutant concentration equation.
The computer method used to perform the integration
is essentially a continuous-space discrete-time method
of lines. We have, in a previous paper,1 described a continuous-space-discrete-time
computer method for the
analysis of flows and velocities in a one-dimensional
river or estuary. Hence, these two programs, which
may be exercised simultaneously, must be viewed as
part of the same problem, since the pollutant diffusion
parameters in a river (as described in the present
paper) may be derived as explicit functions of the
river geometry and water flow.
The kind of problems in partial differential equations
to which river flows and pollution studies belong are
as a rule computer time-consuming. It is therefore
desirable to place emphasis on techniques by which
truncation error-correction methods may lead to larger
grid sizes in the finite differences processes of approximation.
Such a truncation error characterization and
correction method is embodied in the present paper,
which permits the truncation error induced by larger
time steps in the computer simulation to be (in the
first approximation) corrected for in a semi-exact
fashion.

PROBLEM STATEMENT

A simplified analysis of a one-dimensional river in
terms of the polluting species is given mathematically
as a partial differential equation representing the mass
balance on the pollutant in one space dimension and
time (see, e.g., Reference 6 for a derivation).

∂c/∂t = ∂/∂x (k ∂c/∂x) − ∂(V·c)/∂x − D(c) + f   (1)

where:

c = c(x, t)  pollutant concentration
V = V(x, t)  river velocity (ft./sec.)
D(c) = pollutant degradation or decay function
       (We shall assume for simplicity of the
       ensuing discussion that D(c) = D·c,
       where D is a constant.)
k = k(x, t)  diffusion constant (ft²/sec.)
f = f(x, t)  pollutant source function
x = river length variable (ft.)
t = time variable (sec.)

The boundary conditions associated with this problem
are discussed in a later section.

COMPUTER ANALYSIS

The CSDT approximation

The hybrid continuous-space-discrete-time method
of approximation consists in expressing the solution
along equi-distant lines parallel to the x axis in the
(x, t) plane.
Call c^j(x) the approximation of c(x, tj) where
tj = j·Δt; j = 0, 1, 2, ....
Then equation (1) may be approximated by the sequence
of ordinary differential equations (for j = 0, 1, 2, ...):

44

Spring Joint Computer Conference, 1970

\frac{c^{j+1} - c^{j}}{\Delta t} = \theta \left[ \frac{d}{dx}\left(k^{j+1} \frac{dc^{j+1}}{dx}\right) - \frac{d(V^{j+1} c^{j+1})}{dx} - D c^{j+1} + f^{j+1} \right] + (1 - \theta) \left[ \frac{d}{dx}\left(k^{j} \frac{dc^{j}}{dx}\right) - \frac{d(V^{j} c^{j})}{dx} - D c^{j} + f^{j} \right]    (2)

where θ is a constant which must be chosen in the
interval 1/2 < θ ≤ 1 to ensure stability in the time-marching
process.2

To produce a recurrence relation for the time-marching
solution of (2), we solve that equation for c^{j+1}:

\frac{d}{dx}\left(k^{j+1} \frac{dc^{j+1}}{dx}\right) - V^{j+1} \frac{dc^{j+1}}{dx} - \left(\frac{1}{\theta\,\Delta t} + \frac{dV^{j+1}}{dx} + D\right) c^{j+1} = -\frac{c^{j}}{\theta\,\Delta t} - f^{j+1} - \frac{1-\theta}{\theta} \left[ \frac{d}{dx}\left(k^{j} \frac{dc^{j}}{dx}\right) - \frac{d(V^{j} c^{j})}{dx} - D c^{j} + f^{j} \right]    (3)

Let the right-hand side equal:

-\frac{\bar{c}^{\,j}}{\theta\,\Delta t} - f^{j+1}

where:

\bar{c}^{\,j} = c^{j} + (1 - \theta)\,\Delta t \left[ \frac{d}{dx}\left(k^{j} \frac{dc^{j}}{dx}\right) - \frac{d(V^{j} c^{j})}{dx} - D c^{j} + f^{j} \right]    (4)

It is easily shown that \bar{c}^{\,j} satisfies the recurrence
relation:3,4

\bar{c}^{\,j+1} = \frac{1}{\theta}\, c^{j+1} - \frac{1-\theta}{\theta}\, \bar{c}^{\,j}    (5)

For convenience, we call R^j the entire right-hand side
of equation (3):

R^{j} = -\frac{\bar{c}^{\,j}}{\theta\,\Delta t} - f^{j+1}

We can see that equation (3) can be rewritten as:

\frac{d}{dx}\left(k^{j+1} \frac{dc^{j+1}}{dx}\right) - V^{j+1} \frac{dc^{j+1}}{dx} - \left(\frac{1}{\theta\,\Delta t} + \frac{dV^{j+1}}{dx} + D\right) c^{j+1} = R^{j}    (6)

R^j is a known function of x, and this equation can now
be solved at each time step, together with the algebraic
calculation of \bar{c}^{\,j+1} as expressed by (5).

[Figure 1-Typical propagation of pollutant profile (pollutant concentration vs. distance x, in 1000 ft, at t = 0, 25, and 50)]
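A hedged sketch of the time-marching bookkeeping implied by equations (3) through (6) may help fix ideas. It is illustrative Python only, not the authors' analog implementation; solve_eq6 is a hypothetical stand-in for the spatial solution of (6), which is described next:

    def march_one_step(c_bar, theta, dt, f_next, solve_eq6):
        """One CSDT time step: form R^j, solve (6), update c-bar by (5).

        c_bar and f_next may be scalars or arrays over x; solve_eq6 is a
        hypothetical stand-in for the spatial solve (e.g., the method of
        decomposition described below).
        """
        R = -c_bar / (theta * dt) - f_next                        # RHS of (3)/(6)
        c_next = solve_eq6(R)                                     # equation (6)
        c_bar_next = c_next / theta - (1 - theta) / theta * c_bar # recurrence (5)
        return c_next, c_bar_next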

Stability problem and application of the method of
decomposition

Equation (6) is of the second order in x. For constant
V and k, its characteristic equation is:

k\gamma^2 - V\gamma - \left(\frac{1}{\theta\,\Delta t} + D\right) = 0    (7)

or

\gamma = \frac{V}{2k} \pm \sqrt{\frac{V^2}{4k^2} + \frac{1}{k}\left(\frac{1}{\theta\,\Delta t} + D\right)}    (8)

These two values of γ are real and of opposite sign.
Thus, direct integration of equation (6) as an initial
value problem of the second order will have unstable
error propagation properties which may impair the
validity of the computer results.

The Method of Decomposition (Vichnevetsky2,3)
consists in avoiding the difficulty by transforming this
second order differential equation into two first order
ordinary differential equations, for which directions of
stable x-integration may be chosen independently.
This is obtained as follows: the second order operator

L(\,\cdot\,) = \frac{d}{dx}\,k\,\frac{d}{dx} - V\,\frac{d}{dx} - \left(\frac{1}{\theta\,\Delta t} + \frac{dV}{dx} + D\right)    (9)

appearing in equation (6) is (arbitrarily) decomposed
into the product of two first order operators,
L_B and L_F, which are intended to yield stable
integrations in the backward and forward directions,
respectively:*

L(\,\cdot\,) = L_B(L_F(\,\cdot\,)), \qquad L_F(c) = k\,\frac{dc}{dx} - A_F\,c, \qquad L_B(y) = \frac{dy}{dx} - A_B\,y    (10)

By identification of (10) with (9), we find:

A_B = \frac{V - A_F}{k}    (11)

and:

\frac{dA_F}{dx} - A_F A_B = \frac{1}{\theta\,\Delta t} + \frac{dV}{dx} + D    (12)

Conditions for the stable integration in these respective
directions are A_B ≥ 0 and A_F ≤ 0.

A_F may be obtained by the integration of the Riccati
equation:

\frac{dA_F}{dx} - \frac{A_F\,(V - A_F)}{k} = \frac{1}{\theta\,\Delta t} + \frac{dV}{dx} + D    (13)

and A_B subsequently obtained by the application of
equation (11).

Now, a particular solution of equation (6) is obtained
by the following sequence of computer integrations:

(a):  L_B(y(x)) \equiv \frac{dy}{dx} - A_B\,y = R^{j}(x)    (14)

(b):  L_F(c^{j+1}(x)) \equiv k\,\frac{dc^{j+1}}{dx} - A_F\,c^{j+1} = y(x)    (15)

Indeed, that c^{j+1} satisfies (6) is easily shown:

L(c^{j+1}) = L_B \cdot L_F(c^{j+1}) = L_B(L_F(c^{j+1})) = L_B(y) = R^{j},  q.e.d.

* The operator L_F(·) is said to be forward-stable if all solutions
of the equation L_F(v) = 0 are stable in the classical sense. The
operator L_B(·) is said to be backward-stable if all solutions of
the equation L_B(v) = 0 are stable in the classical sense when the
integration variable (−dx) is used instead of dx. An operator
L(·) is said to be unstable when it is neither forward-stable nor
backward-stable.

[Figure 2-Computing sequence block diagram: backward stable integration for A_F(x) and y(x), forward stable integration for c^{j+1}(x)]

In summary, the sequence of equations solved at
each time step in this problem is that of Figure 2.
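A minimal discrete sketch of this computing sequence follows; it is illustrative Python in which crude Euler steps stand in for the analog x-integrators of Figure 2, the boundary values anticipate equation (17) of the next section, and all names are illustrative rather than the authors':

    import numpy as np

    def decomposition_solve(x, R, k, V, dVdx, D, theta, dt, c0):
        """Solve equation (6), L(c) = R, by the method of decomposition.

        x is a uniform grid; k, V, dVdx and R are arrays sampled on it;
        c0 is the inlet concentration c(0, t).
        """
        n, dx = len(x), x[1] - x[0]

        # (13): backward stable integration of the Riccati equation for
        # A_F, with the end-boundary condition A_F(x_max) = 0
        AF = np.zeros(n)
        for i in range(n - 1, 0, -1):
            slope = AF[i] * (V[i] - AF[i]) / k[i] + 1.0 / (theta * dt) \
                    + dVdx[i] + D
            AF[i - 1] = AF[i] - dx * slope

        AB = (V - AF) / k                      # (11)

        # (14): backward stable integration of dy/dx - A_B*y = R,
        # with y(x_max) = 0
        y = np.zeros(n)
        for i in range(n - 1, 0, -1):
            y[i - 1] = y[i] - dx * (AB[i] * y[i] + R[i])

        # (15): forward stable integration of k*dc/dx - A_F*c = y,
        # starting from the inlet boundary value
        c = np.zeros(n)
        c[0] = c0
        for i in range(n - 1):
            c[i + 1] = c[i] + dx * (AF[i] * c[i] + y[i]) / k[i]
        return c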


BOUNDARY CONDITIONS

The equation (1) is of the first order in time and
second order in space. Hence, solutions are specified by
one initial condition function (i.e., the initial pollutant
concentration profile c(x, 0)) and by two spatial
boundary conditions. One of these boundary conditions,
c(0, t), is well defined (the pollutant concentration at
the inlet of the river section under analysis), while
the second boundary condition, c(x_max, t), is not easily
defined in terms of the problem formulation. However,
that end-boundary condition has, mathematically, a
very small influence on the solution c(x, t) for x in
[0, x_max] except for a small region which is close to
x_max. Hence, one may choose this end-boundary condition
to take any convenient form. The one chosen in
the computer implementation described hereafter is
that:*

\left.\frac{\partial c}{\partial x}\right|_{x_{max}} = 0    (16)

This condition can be seen to be automatically satisfied
by choosing as boundary conditions for the A_F and
y equations (equations (13) and (14), respectively):

A_F(x_{max}) = 0, \qquad y(x_{max}) = 0    (17)

i.e., from equation (15):

\left.\frac{dc^{j+1}}{dx}\right|_{x_{max}} = \frac{1}{k}\left[\,y(x_{max}) + A_F(x_{max})\,c^{j+1}\,\right] = 0,  q.e.d.

* This assumption is not a limitation of the method described
in this paper. Any other boundary condition c(x_max, t) could be
chosen. This then would require the independent calculation of
solutions of the homogeneous equation L(w) = 0, as shown for
instance in Reference 4.

ERROR ANALYSIS AND FIRST ORDER CORRECTION OF THE CSDT APPROXIMATION

Analysis

Application of the CSDT approximation to the
simple fluid transport equation:

\frac{\partial u}{\partial t} = -V\,\frac{\partial u}{\partial x}    (18)

introduces a truncation error which has the effect of
"dispersing" the solution u(x, t) by a diffusion-like
phenomenon. Hence, one may look at the CSDT approximation
of equation (1) as an approximation process in
which the diffusion coefficient k(x, t) results from that
which is introduced explicitly by the computing process
described in an earlier section of this paper, plus a
spurious k*(x, t) which is introduced by the CSDT
approximation itself. If the spurious part of the diffusion
coefficients (i.e., k*(x, t)) can be predicted, then
it becomes an easy matter to correct for this factor by
subtracting it from the desired value before entering
into the computing sequence of the third section of
this paper.

The remainder of this section is an analysis of the
truncation-induced diffusion effect of the CSDT approximation
of equation (18), followed by an experimental
computer verification of the applicability of
these theoretical results to the more general CSDT
approximation of the transport-diffusion equation (1),
as described earlier. The partial differential equation
(18) describes a pure fluid transport phenomenon.
The CSDT approximation of equation (18) is expressed by:

\frac{u^{j+1} - u^{j}}{\Delta t} = -V \left[ \theta\,\frac{du^{j+1}}{dx} + (1 - \theta)\,\frac{du^{j}}{dx} \right]    (19)

The solution of (19) approximates that of a transport-diffusion
equation of the form:

\frac{\partial u}{\partial t} = -V\,\frac{\partial u}{\partial x} + k^{*}\,\frac{\partial^2 u}{\partial x^2}    (20)

where the diffusion constant k* is a spurious diffusion
coefficient, introduced strictly by the approximation
process, and which depends on the parameters appearing
in (19).

An equivalent value of k* may be estimated analytically.
To that effect, we express the different terms of
(19) in a Taylor Series around the point u^j(x):

u^{j+1} = u^{j} + \frac{\partial u}{\partial t}\,\Delta t + \frac{\partial^2 u}{\partial t^2}\,\frac{\Delta t^2}{2} + \cdots    (21)

\frac{du^{j+1}}{dx} = \frac{\partial u^{j+1}}{\partial x} = \frac{\partial u^{j}}{\partial x} + \frac{\partial^2 u^{j}}{\partial x\,\partial t}\,\Delta t + \cdots    (22)

Hence, upon substitution of these relations in (19),
that equation becomes (we may now delete the superscripts):

\frac{\partial u}{\partial t} + \frac{\Delta t}{2}\,\frac{\partial^2 u}{\partial t^2} + \cdots = -V \left[ \theta \left( \frac{\partial u}{\partial x} + \frac{\partial^2 u}{\partial x\,\partial t}\,\Delta t + \cdots \right) + (1 - \theta)\,\frac{\partial u}{\partial x} \right]    (23)

For the exact solution, we have the relation:

\frac{\partial u}{\partial t} = -V\,\frac{\partial u}{\partial x}    (24)

which yields:

\frac{\partial^2 u}{\partial t^2} = V^2\,\frac{\partial^2 u}{\partial x^2} \qquad \text{and} \qquad \frac{\partial^2 u}{\partial x\,\partial t} = -V\,\frac{\partial^2 u}{\partial x^2}    (25)

Thus, after using these relations, (23) becomes:

\frac{\partial u}{\partial t} = -V\,\frac{\partial u}{\partial x} + \left(\theta - \frac{1}{2}\right) V^2\,\Delta t\,\frac{\partial^2 u}{\partial x^2} + \cdots    (26)

By identification of (26) with (20), we find the equivalent
diffusion constant:

k^{*} = \left(\theta - \frac{1}{2}\right) V^2\,\Delta t    (27)

For θ < 1/2, k* becomes negative. It is of interest to
note that this corresponds exactly to the values of θ
for which the CSDT approximation is unstable.2

Computer verification of the analysis

Computer verification of the preceding analysis has
confirmed its first-order validity within a range of
parameters which applies to river pollution problems.
In order to perform this verification, the homogeneous
equation:

\frac{\partial c}{\partial t} = -V\,\frac{\partial c}{\partial x} + k_c\,\frac{\partial^2 c}{\partial x^2}    (28)

was integrated on the computer in the manner described
in an earlier section, with initial conditions c(x, 0)
corresponding to a Gaussian distribution; i.e.,

c(x, 0) = \frac{A}{\sigma(0)}\,\exp\left(-\frac{(x - x_p)^2}{2\,\sigma(0)^2}\right)    (29)

where A is a positive constant, x_p the point where the
peak of the distribution occurs at t = 0, and σ(0) the
initial standard deviation of the c(x, 0) distribution.
The exact solution of (28) with (29) as initial condition
is (at least if the boundaries are assumed to be far
enough not to have any effect upon the solution):

c(x, t) = \frac{A}{\sigma(t)}\,\exp\left(-\frac{[\,x - (x_p + V t)\,]^2}{2\,\sigma(t)^2}\right)    (30)

where σ(t) is the solution of:

\frac{d\sigma}{dt} = \frac{k}{\sigma}

which yields:

\sigma(t)^2 - \sigma(0)^2 = 2 k\, t    (31)

and, generally, between two instants of time t_1 and
t_2 > t_1:

\sigma(t_2)^2 - \sigma(t_1)^2 = 2 k\,(t_2 - t_1)    (32)

Equation (30) expresses the fact that the "peak"
moves with the flow at the velocity V, that the Gaussian
distribution property remains preserved in time,
and that the standard deviation σ(t) grows as the square
root of t. Experimental measurement of σ(t) is easily
achieved, either by measuring the "peak" of the solution:

c_{max}(t) = \frac{A}{\sigma(t)}    (33)

or by measuring the "2σ" of c(x, t) at 1/sqrt(e) of the
peak (for c = c_max · e^{-1/2}, x = x_peak ± σ). A typical
input is shown in Figure 1.

For the purpose of this study, the computer program
described in an earlier section was utilized for equation
(28), where k_c was chosen "small" (specifically equal
to .01 ft²/sec), and the results shown on Figure 3 are,
in effect:

k^{*} = k_{measured} - k_c \approx k_{measured} \qquad (k_c\ \text{small})

(Note that the range of k of interest for river
studies is 500 and over.)

Experimental results are shown in Figure 3. Consideration
of this figure shows that there is a reasonably
good agreement between the predicted and experimental
values of the truncation-induced diffusion constant k*.

[Figure 3-k* vs. θ for V = 4 and Δt = .50]
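As a small numeric illustration of (27) and of the first-order correction described earlier (θ is an assumed sample value; V and Δt are taken from the caption of Figure 3; k is chosen from the river-study range quoted above):

    # Spurious diffusion induced by the time discretization, eq. (27),
    # and the first-order correction: subtract k* from the desired k.
    theta, V, dt = 0.75, 4.0, 0.5        # theta assumed; V, dt per Figure 3
    k_star = (theta - 0.5) * V**2 * dt   # 2.0 ft^2/sec of numerical diffusion
    k_desired = 500.0                    # a typical river-study value of k
    k_c = k_desired - k_star             # enter the computation with 498.0
    print(k_star, k_c)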
Induced diffusion in the transport-diffusion equation

The preceding analysis and computer verification of
k* is concerned with the simplified transport equation:

\frac{\partial c}{\partial t} = -V\,\frac{\partial c}{\partial x}

or at least with the transport-diffusion equation with
"small" values of the diffusion constant k.

We are in practice interested in equations of the
form:

\frac{\partial c}{\partial t} = -V\,\frac{\partial c}{\partial x} + k\,\frac{\partial^2 c}{\partial x^2}

where k is not "small" in the sense previously described.
The question thus arises as to what happens to k*
as a function of θ, for non-small values of k. An answer
to this question was sought experimentally, by performing
experiments similar to those described in the
preceding section, but with k as an additional free
parameter. In this analysis, it was first recognized
that an analysis of the dimensionless "induced" diffusion
constant η* = k*/(V²·Δt) could be obtained as a
function of the dimensionless "explicit" diffusion constant
η_c = k_c/(V²·Δt), thereby providing a relationship
where, for a fixed value of θ, η* would be a function of
η_c alone.

(The argument here is that there is no reason why
Buckingham's π principle cannot be applied to the
behavior of computer program solutions as it applies
to the fields of mechanics and thermodynamics. Hence,
relationships between dimensionless parameters must
be absolute, save for round-off phenomena, if and
where they occur.)

Experience confirmed this theory, and Figure 4 shows
a dimensionless chart of the induced η* = k*/(V²·Δt)
as a function of θ and η = k/(V²·Δt).

[Figure 4-η* vs. θ]

We are reminded at this point that any practical
computer program will entail a fixed value for θ, and
that η* will thereby become a function of η alone.
Figure 5 shows this more useful relationship for
various values of θ.

[Figure 5-η_c as a function of η]

DIFFUSION-CORRECTED COMPUTER METHOD

The "diffusion-corrected" method simply consists
in deriving the function η_c(η) for the particular value
of θ being used from Figure 5 and introducing the corrected
diffusion constant k_c = η_c V²Δt into the procedure
discussed earlier.

This correcting method is applied continuously as a
function of time and space, as shown in Figure 6.

[Figure 6-Diffusion-corrected computer method]

CONCLUSIONS

The method of simulation of river pollution described
in this paper has been implemented as a computer
program. Experimental results have confirmed the usefulness
of the diffusion correction of Section 6 as a
means of allowing larger time steps to be utilized. It
has also been found that the general "hybrid approach"
which consists in approximating the problem in the
form of ordinary differential equations offers a convenient
way to implement pure-numerical simulations.

REFERENCES

1 R VICHNEVETSKY
Computer integration of hyperbolic partial differential equations
by a method of lines
Proc Fourth Australian Computer Conference Adelaide
South Australia August 1969
2 R VICHNEVETSKY
Hybrid computer methods for partial differential equations
Course Notes for the In-Service Seminar on Engineering
Applications of Hybrid Computation Pennsylvania State
University June 19-21 1969
3 R VICHNEVETSKY
Application of hybrid computers to the integration of partial
differential equations of the first and second order
Proceedings of the IFIP Congress 68 Edinburgh Scotland
August 5-10 1968
4 R VICHNEVETSKY
A new stable computing method for the serial hybrid computer
integration of partial differential equations
Proceedings SJCC Conference Vol 32 Thompson Book
Company Washington DC 1968
5 LANGHAAR
Dimensional analysis and theory of models
J Wiley and Sons New York New York 1951
6 R BIRD W STEWART E LIGHTFOOT
Transport phenomena
J Wiley and Sons New York New York 1960

Programmable indexing networks
by KENNETH JAMES THURBER
Honeywell Incorporated

St. Paul, Minnesota

INTRODUCTION
One of the most important functions that must be
performed in a digital machine is the handling and
routing of data. This may be done in routing logic
(computers), in permutation switching networks (computers and telephone traffic), sorting networks, etc. In
some parallel processing computers being envisioned
the handling of large blocks of data in a parallel fashion
is a very important function that must be performed.
For a special-purpose machine a fixed-wire permutation
network could be acceptable for the handling of data;
however, for a general-purpose machine more sophisticated reprogrammable networks are required.
The permutation network problem has been previously
studied by Benes,2 Kautz et al.,3 Waksman,4
Thurber,5 and Batcher.1 This paper introduces and
defines a new network to be considered. This is the
generalized indexing network. This network can perform
an arbitrary mapping function and is easily reprogrammable
to perform any other arbitrary map with n inputs
and m outputs, and has many potential areas of
use. The most interesting possible area of application is
the processing of data while routing the data. If the
network is used as routing logic, it can perform many
simple data manipulation routines while routing the
data, e.g., matrix transposition.

Some of the solutions presented are significant improvements
on the shift register permuters suggested
by Mukhopadhyay.7 The solutions suggested here are
programmable (utilizing the output position mask),
as fast, and utilize less hardware than the previously
suggested shift register permuters.

FORMULATION OF THE PROBLEM

Previously, most researchers have considered the
problem of permuting a set of n input lines X1, X2, …,
Xn-1, Xn onto a set of n output lines Y1, Y2, …, Yn-1,
Yn by means of a device called a permuter. A permuter
produces a one-to-one mapping from the n input lines
to the n output lines of the network. The permuter
can perform a very limited set of functions. As currently
studied, permutation networks can only transfer
lines of data. In this paper the networks will be utilized
to transfer words of data.

Limitations of permutation networks are that input
words cannot be repeated or deleted at the output.
Also, blanks cannot be inserted into the output, and
the number of input words and the number of output
words must be equal. The indexing network* differs from
the permuter in that input words can be repeated or
deleted and blanks can be inserted in the output. Also,
for an indexing network the number of input words (n)
has no special relation to the number of output words
(m). The non-blank output words may appear in
many contiguous subsets of the output words (these
subsets could be empty). Figure 1 shows some examples
of possible permutation networks. Figure 2 shows some
examples of possible indexing networks.

* The terminology indexing network and generalized indexing
network will be taken to have the same meaning.

[Figure 1-Permuter: inputs X1…X4 mapped onto outputs Y1 = X2, Y2 = X1, Y3 = X4, Y4 = X3, with the OPM cell for each output shown alongside]

[Figure 2-Indexing network and its OPM]

In this paper Xi means a word of input data (instead
of an input line) and Yi means a word of output data
(instead of an output line). The blank word is designated
by 0. The actual storage device containing or
receiving the word of information (or the number of
bits in the word) is not shown, and the inputs and outputs
to the network are still only pictured as one line
for each Xi and Yi. (When in reality each line may
symbolically represent p parallel input lines (for a p-bit
word) for the parallel transfer of each word into
and out of the network.) Each storage device for one
word of information is called a cell.

It should be noted that the permutation network
problem is a sub-problem of the generalized indexing
network problem.

If N is a network with n inputs and m outputs then
the output position mask (OPM) is a vector containing
m distinct cells with log2(n + 1) binary bits per cell.**
Each cell contains the binary code corresponding to the
input value desired in the corresponding output cell.
Log2(n + 1) bits are needed since the n inputs and
the 0 must have a code so that they can be specified
as output values if desired. Figures 1 and 2 show several
networks, along with their corresponding output position
masks. Each cell consists of a shift register capable
of delivering its contents (in parallel) onto the appropriate
control lines of the network.

** Where log2(n + 1) is understood to be rounded up to the next
larger integer if log2(n + 1) is not an integer; e.g., log2(7 + 1) = 3
and log2(10 + 1) = 4.
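In software terms, the OPM is just a vector of input selectors. A minimal sketch of the mapping it programs (illustrative Python, not the hardware; code 0 denotes the blank word, rendered here as None):

    def apply_indexing(inputs, opm):
        """Produce the m output words selected by an output position mask.

        inputs : the n input words; code i + 1 refers to inputs[i],
                 code 0 denotes the blank word (None here).
        opm    : m cell codes, each in 0..n.
        """
        return [None if code == 0 else inputs[code - 1] for code in opm]

    # Repeats, deletions and blanks are all allowed:
    print(apply_indexing(['X1', 'X2', 'X3', 'X4', 'X5'], [0, 3, 2, 4]))
    # -> [None, 'X3', 'X2', 'X4'], i.e., (0 X3 X2 X4) as in Figure 4(b)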

A SHIFT REGISTER SOLUTION

This is the first of several "shift register solutions"
to be presented in this paper. The name shift register
solution has been used for simplicity; however, what is
actually used is a set of shift registers (each contains
or receives one word of information) which can perform
a parallel transfer of its contents to its neighbor. The
transfers are arranged such that a transfer pulse to the
input set of registers causes the simultaneous parallel
cyclic transfer of the contents of the registers; i.e.,
n → n−1, n−1 → n−2, …, 2 → 1, 1 → 0, and
0 → n simultaneously. A transfer pulse to the output
set of registers (and to the OPM) causes the simultaneous
parallel transfer of the contents of the registers; i.e.,
n → n−1 (OPM(n) → OPM(n−1)), …, 2 → 1
(OPM(2) → OPM(1)), and 1 → n (OPM(1) →
OPM(n)). The previously specified functions are performed
by the Input Cyclic Control (ICC) and Output
Cyclic Control (OCC) respectively. The Transfer
Control (TC) performs the function of transferring
data from input position 0 to output position 1. There
is no output position 0.

Figure 3 shows the clocking hardware used to read
the OPM and produce the desired control pulse for the
TC. It is assumed that the clocking hardware contains
a clock with clock rate c/p, where c is the clock rate of
the sorter and p is a suitable positive integer. Binary
constants c2, c1, and c0 placed on the input lines to the
network produce an output from the network after
(c2(4) + c1(2) + c0(1)) units of delay. One unit of
delay is equal to the time period between indexing
clock pulses (the clock rate of the indexing network is
c, so a unit delay is 1/c second). The clocking hardware
is used to advance the input registers to a position
selected by the OPM.

Figure 4 shows a general setup for an indexing network
and a complete indexing network for n = 5 and
m = 4. The words are 4-bit words in this example. The
indexing network consists of an input set of registers
and associated ICC and TC hardware, an output set
of registers and the associated OCC and OPM hardware,
and the clocking and control hardware.

[Figure 3-Clocking hardware for obtaining delays from 0 to 7 time units of delay (a marked box indicates a delay of one time unit; if the indexing network clock rate is c, this is equivalent to a time delay of 1/c)]

[Figure 4(a)-Generalized shift register indexing network]

[Figure 5-Indexing network]

The clock rate of the indexing network is c per second
and the clock rate of the clocking hardware is c/8 per
second. In general the clock rate for the clocking hardware
is c/(n + 3) per second.* No provisions have been
shown for connecting the network to other hardware,
but this should be obvious. A blank (binary 0) is
placed in register 0 of the set of input registers.
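As a quick check of the delay selection performed by the Figure 3 clocking hardware (a sketch of the arithmetic only; the hardware realizes it with gated delay elements):

    # Delay selected by the three binary constants fed to the clocking
    # hardware: c2*4 + c1*2 + c0*1 indexing-clock periods.
    def delay_units(c2, c1, c0):
        return c2 * 4 + c1 * 2 + c0 * 1

    assert delay_units(1, 0, 1) == 5   # e.g., code 101 -> 5 units of delay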


The operation of the network is easily explained. Assume
the input registers are full and the first clock pulse
is produced (in both the clocking hardware clock and
the indexing network simultaneously). The binary
value in the OPM causes the pulse to the ICC and TC
to be delayed a number of time periods equal to its
value. Meanwhile the input is being cycled. When the
correct input register has moved into position 0, the
transfer pulse arrives, inhibiting further cycling and
causing the transfer (a non-destructive read) from input
0 to output 1 to occur. The input is still inhibited and
the output is shifted one position by the OCC. The
input register then is cycled to its original state and the
process begins again. After m cycles the output registers
are all filled and back in their correct position, so that
the indexing operation has been completed.
This type of indexing network can be configured
in many different ways depending upon the speed
desired and the hardware available. Figure 5 shows
the manner in which the network could be set up for
faster operation. The network in Figure 5 requires
twice as much hardware as the network in Figure 4,
but is twice as fast. Figure 6 is an indexing network
that operates approximately n times as fast as the

[Figure 4(b)-Indexing network with n = 5, m = 4, and word lengths of 4 bits, with the OPM set to produce (0 X3 X2 X4)]

* c/(n + 3) is needed instead of c/n because (1) a time period is
needed for shifting n + 1 input values instead of just n input
values, (2) a time period is needed to transfer the data, and
(3) a time period is needed to shift the output registers and OPM.

[Figure 6-High-speed indexing network]


The details of the operation are as follows:
(1) The input data and the OPM are inputted into
the network.
(2) The input data is cycled until the input code
equals the current value of the OPM.
(3) The input word is transferred to the output
register.
(4) The output register and the OPM are advanced
one position unless the output register is full, in
which case go to (6).
(5) Go to (2).
(6) Output the data in the output register.
(7) Stop.
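A hedged software model of steps (1) through (7) follows (illustrative Python; the list rotation stands in for the cyclic shift of the input registers, and codes are assumed to be the positions 0..n):

    def indexing_operation(words, opm):
        """Model of steps (1)-(7): cycle the input ring and compare
        position codes until each OPM cell is satisfied."""
        ring = [(0, None)] + [(i + 1, w) for i, w in enumerate(words)]
        output = []
        for want in opm:                      # (1) data and OPM are in
            while ring[0][0] != want:         # (2) cycle until codes match
                ring = ring[1:] + ring[:1]
            output.append(ring[0][1])         # (3) transfer the word
            # (4)-(5): advance to the next OPM cell until the output is full
        return output                         # (6)-(7): output and stop

    print(indexing_operation(['X1', 'X2', 'X3', 'X4', 'X5'], [0, 3, 2, 4]))
    # -> [None, 'X3', 'X2', 'X4']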

[Figure 7-Comparator indexing network for n = 6 and m = 4]

network in Figure 4. As can be easily seen, this solution
to the generalized indexing network problem can be
easily configured to account for many different hardware
and speed requirements. In Figure 6, less logic
is required in parts of the network and the clock rate
of the clocking hardware is different than the rate of
the network in Figure 4. This is because the set of output
registers does not have to be shifted to their next
receiving positions, since the network is a "parallel"
indexing network and the output is available after
(n + 2)/c seconds.

A SOLUTION UTILIZING SIMPLIFIED
CLOCKING HARDWARE

The purpose of this section is to introduce another
version of a generalized indexing network which utilizes
shift registers to perform the indexing operation. This
solution utilizes the OPM to program the network.
Figure 7 shows the solution for n = 6, m = 4. An extra
set of log2(n + 1) bits has been added to the input
register. These bits contain the input position of the
input data and are utilized to select the appropriate
output value.

EXTENSIONS OF THE SOLUTION GIVEN IN
PREVIOUS SECTION

The solution given in the previous section is interesting
in that there are several other methods by which
it can be implemented in a more sophisticated manner.
Since the solution given previously does not require as
much hardware as some of the other solutions, it is
interesting to consider what can be done with the addition
of some extra hardware.

As with the solution given in the third part of this
paper, the solution given in the previous section can be
implemented in a form such as in Figures 5 and 6. Also,
it could be implemented in any form that "lies" between
the solutions given in Figures 5 and 6.

The following solutions require that the set of input
registers be able to shift cyclically backwards (0 → 1,
1 → 2, …, n − 1 → n, n → 0) as well as forwards
(1 → 0, 2 → 1, …, n → n − 1, 0 → n).
One method of improving the solution given previously
is to make more than just a comparison of the
two numbers for equality. A solution is to check and
see whether the number contained in the OPM is
greater than, equal to, or less than the number designating
the current state of the input. If the OPM
number is larger, shift the input register forward; if the
OPM number is smaller, shift the input register backwards;
and if the numbers are equal, then transfer the
information. The actual shifting can be implemented
as in the previous section (a comparison after every
input shift) or as in the third section (this would require
a subtraction to determine the number of needed
periods of delay) using the clocking hardware in Figure
3 to produce the transfer pulse.
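A minimal sketch of this three-way decision (illustrative Python; the hardware would realize it as a magnitude comparator):

    def next_action(opm_code, input_code):
        """Compare the OPM number against the current input state and
        decide whether to shift forward, shift backward, or transfer."""
        if opm_code > input_code:
            return "shift forward"
        if opm_code < input_code:
            return "shift backward"
        return "transfer"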
Another improvement that can be made is based
upon the following observation; i.e., if the set of registers
can cycle both forwards and backwards then there are
cases where it is shorter time wise to go around one of


the "ends" of the set of input registers. For example,
if n = 10 and the network is at 9 and needs to go to 1,
then the shortest way is 9 → 10, 10 → 1 (instead of
9 → 8, 8 → 7, …, 2 → 1). This solution can be implemented
by calculating and comparing n + 1 − |p − q|
to |p − q|, where p and q are the current location and
the desired location. Again this solution could
be built as in the previous section (comparison after
each input shift) or as in the third section (using clocking
delays); however, it is probably best implemented
using clocking hardware (such as in Figure 3) because
the minimum of n + 1 − |p − q| and |p − q| gives
the number of time delays to be produced by the clock.
Therefore, after the comparison has been made, the
minimum value can be used as input data into the clocking
hardware and the register cycled in the proper direction
(forward or backward).

[Figure 8-General arrangement of a splitter register]

[Figure 10-Section of the splitter used to produce a permutation]

THE SPLITTER

This section presents a solution to the generalized
data indexing problem based upon an input decision
called the input position map. This solution utilizes a
modular construction and seems most interesting in the
case in which many different indexings must be produced
in rapid succession. A major advantage of this
type of network is that it is capable of processing
many indexings at the same time.

The input position map (IPM) is a set of binary
codes associated with the input data of a network that

specifies the position (or positions) that the data is to
be transferred to in the set of output registers. In the
case of the design of a splitter it will be assumed that
the input data and the binary code contained in the
IPM associated with that input data are contained in
an extended register as shown in Figure 8.

[Figure 9-Use of splitters to perform a permutation for n = 2^k]
Figure 11 shows the general block diagram of several
splitter networks organized to perform a permutation
function. Each module in the splitter takes the n inputs
(assume n is even) and groups these n inputs
into two n/2-input groups based upon the mapping
information contained in the mapping information
portion of the node. The splitter is most useful in constructing
sorting networks that have n = 2^k.

The permutation network shown in Figure 10 can
be built in various sizes so that it can be configured as
shown in Figure 9. The mapping information inputted
to this network would be the binary value of the position
in the set of output registers that the data was
destined for, so that an arbitrary input register would
contain DATA and DESTINATION OF DATA, where
the destination of the data is between 0 and n − 1. The
first splitter encountered (n → n/2) would sort the
information based upon the binary value contained in
the highest order digit; whereas, the last group of


[Figure 11-General splitter module]

splitters (2 → 1) would read the lowest order digit.
The values being read would be inputted to the AND
gates as shown in Figure 10. The full word of data
would be transferred to the appropriate output register
in parallel, and the appropriate output register and the
input register would be advanced one position each.
The next word is then processed in the identical manner.
To split n elements into two n/2-element groups requires
n clock periods. The bit that the AND gate
reads is different at each level, but begins with the high
order digit and proceeds to the low order digit.
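The following sketch shows how repeated splitting on destination bits, high order first, realizes a permutation (illustrative Python assuming distinct destinations; the hardware does this with AND gates and transfer logic):

    def splitter_permute(items, k):
        """Route n = 2**k words to their destinations by repeated splitting.

        items: list of (data, dest) pairs, dest in 0..n-1, all distinct.
        Each stage examines one destination bit, high order first, as in
        Figures 9 and 10.
        """
        groups = [items]
        for bit in range(k - 1, -1, -1):
            next_groups = []
            for g in groups:
                low = [it for it in g if not (it[1] >> bit) & 1]
                high = [it for it in g if (it[1] >> bit) & 1]
                next_groups += [low, high]
            groups = next_groups
        # after k stages each group holds one word, in output order
        return [g[0][0] for g in groups]

    print(splitter_permute([('A', 2), ('B', 0), ('C', 3), ('D', 1)], 2))
    # -> ['B', 'D', 'A', 'C']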
The IPM for a permuter constructed by the splitter
method is just the binary output destinations of the
data. It is a little harder to construct a generalized indexing
network using this concept. The permuter was
easy because it needed a one-to-one and onto mapping
function. A generalized indexing network is a little
harder but not impossible. It will be slightly harder to
compute the IPM than it was for the permuter, but
the following method and the hardware shown in
Figure 11 configured as in Figure 9 will produce a
generalized indexing network. One modification of the
network is that in the first splitter, the data must be
broken from n into two groups of m/2 elements. From
that point on each group of m/2^p elements is split into
two groups of m/2^(p+1) elements. The mapping information
for the network can be furnished by the following
observations. Each element of input data can be
categorized as to where it is transferred by means of a
two-bit binary map (byte). The high order byte

specifies the split n → m/2; whereas, the low order byte
specifies the split 2 → 1. There are exactly four distinct
possibilities that can happen to a piece of data; i.e.,
the data not transferred to either output register, the
data transferred to one but not the other output
register (two possible cases), or the data transferred to
both output registers. These are indicated in Figure 12
and the necessary hardware shown in Figure 11. This
design allows design of a generalized indexing network
if the output registers are all set to the blank (0) value
before they receive any data. In order to make the
splitter work utilizing two-bit bytes, the mapping information
must be introduced at each stage of the
process as shown in Figure 13. If the mapping information
was completely specified with the data in stage 1,
there would be no way to produce the indexing (X4 0 0
X4) because the second byte would have to be 10 and
01 simultaneously. (X4 0 0 X4) could be produced by
the map 11 associated with X4 at stage 1, the map 10
associated with the value of X4 in stage 2 (A′), and
the map 01 associated with X4 at stage 2 (B′) in Figure
13. The difficulty encountered in constructing the
13. The difficulty encountered in .constructing the
[Figure 12-Possible data transfer operations: byte value 00 = no transfer, 01 or 10 = transfer to one of the two output registers, 11 = transfer to both]


maps for the splitter is balanced by two advantages of
the splitter; i.e., (1) the designer can get by with only
two bits of mapping information in each data word at
every stage of the process (this has not been done in
Figure 13, but the reader can clearly see why it can be
done by looking at Figure 13), and (2) since previously
used mapping information is no longer needed, many
different indexings can be in process at the same time.

The IPM can be constructed by tracing the desired
data output back through the network. Figure 13 is a
generalized indexing network for n = 8, m = 4 constructed
using the splitter concept. It is conceivable to
combine the networks using only one portion to replace
the portions marked AA′ and BB′, thereby eliminating
some transfer hardware and A (A′) and B (B′) at the
expense of more complex clocking and logic. By changing
the size of the bytes it is conceivable to construct
many different IPMs, but the previously explained
IPM seems to be a very good one to use.

This network can be built to provide very high rates
of throughput, since the level m/2 splitter takes half as
much time to operate as the m level splitter. With
some sophisticated clocking it is conceivable to "time
share" the m/2 level splitter with two m level splitters
and thereby maximize throughput.

DISTRIBUTED INDEXING NETWORKS
This section presents the final two solutions to the
indexing network problem considered in this paper.
These two networks are characterized by a highly
parallel operation, high speed, and unique timing arrangement. Each network has one comparator (or
[Figure 13-Generalized indexing network (n = 8, m = 4, built from splitters; mapping information is introduced at each stage)]

[Figure 14-High-speed comparator indexing network set to produce (0 0 X2 X1 0): input registers plus extra blank registers feed a distributed OPM with comparators into the output registers]

clocking hardware unit) and one transfer control unit
for each word of desired output. In some cases (m >
n + 1) this requires the addition of several extra blank
input registers, as in Figure 14. It is assumed that the
clock controlling the cycling of the input register has a
long enough time between pulses to allow the comparison
and transfer of data.

Both of these networks are based upon the observation
that in a complete cycling of the input registers, all
data passes through every register. When the correct
word is recognized it is immediately transferred.

In the comparator solution in Figure 14, at every
clock period the data currently occupying input positions
0, 1, 2, …, m − 1 is compared to the OPM and
the appropriate transfers made. This solution (and
the solution shown in Figure 15) requires the larger of
m or n + 1 clock pulses for the indexing of the input
data.

The solution shown in Figure 15 requires a modification
of the OPM. The value of the ith position of the
OPM is not the binary value of the input data desired,
but the number of clock pulses before the input data
is in the ith position; i.e., if Yi = Xj then

OPM(i) = j − i                 if j ≥ i
       = m − (i − j)           if j < i and m > n + 1
       = n + 1 − (i − j)       if j < i and n + 1 ≥ m
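A sketch of this modified OPM computation (illustrative Python, directly transcribing the three cases above):

    def distributed_opm_entry(i, j, n, m):
        """Clock pulses until input X_j occupies position i, for the
        modified OPM of the Figure 15 network."""
        if j >= i:
            return j - i
        if m > n + 1:
            return m - (i - j)
        return n + 1 - (i - j)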

[Figure 15-High-speed indexing network]

CONCLUSION

A new class of networks was presented in this paper.
These networks have the ability to arbitrarily reorder a
set of n input cells into m output cells with the repetition
or deletion of any cell allowed. Blank cell values may be
arbitrarily placed in any of the output cells, thereby
allowing the construction of arbitrary contiguous sets
of data separated by blanks in the output. These networks
have as special cases previously studied permutation
and sorting networks. The networks described
here are extremely general in nature, and should have
many different areas of application, particularly in
areas needing networks for routing and transferring
data.

The ease of programmability of the indexing networks
described is a feature that is unique.
Almost all previously studied permutation and sorting
networks have long set-up and programming times
that tend to make them useless in problems in which
the destination of the data has to be changed between
each set of data inputted. The manner in which the
programs are inputted into the network and the
simplicity of the program are other features that are
unique to the approach followed in this paper. Another
unique feature of the solutions presented is the range
of tradeoffs they cover. The designer can easily make
tradeoff comparisons between the solutions and has
many possible different ways to configure each type of
network to obtain various speed and hardware comparisons.
Hybrid solutions may be extremely attractive.
An interesting solution to consider is utilization of the
splitter to go from n to n/2^k followed by utilization of
2^k non-splitter networks (like a comparator network).
In this manner the large input block of data can be
broken down for high-speed "parallel" sorting by other
networks.

It is suggested that future research consider construction
of high-speed routing networks utilizing the previously
described sorting networks. These networks seem
to be particularly attractive for the routing and rearranging
of data in parallel processors. Another topic
that might be of interest is the investigation of the
possibilities of performing logic operations on the data
while it is being routed (indexed). Consideration might
be given to the use of these networks as memories.
Some of the logic might be able to be used to convert
from an indexing network to a memory.

Some possible applications for the generalized indexing
networks are: sorting of data, routing of data,
permutation networks, multi-access memories with the
number of words of memory accessed a controllable
variable, associative memories, multi-access associative
memories, reconfigurable multi-processors for real-time
users, associative multiprocessors, and any other
applications which require the manipulation and reconfiguration
of large amounts of data.
BIBLIOGRAPHY
1 K E BATCHER
Sorting networks and their applications
AFIPS Conference Proceedings pp 307-314 SJCC 1968
2 V E BENES
Mathematical theory of connecting networks and telephone
traffic
Academic Press New York 1965
3 W H KAUTZ K N LEVITT A WAKSMAN
Cellular interconnection arrays
IEEE Trans Electronic Computers Vol C-17 pp 443-451
May 1968
4 A WAKSMAN
A permutation network
Journal of the ACM Vol 15 pp 159-163 January 1968
5 K J THURBER
Design of cellular data switching networks
Submitted for Publication
7 A MUKHOPADHYAY G SCHMITZ K J THURBER
K K ROY
Minimization of cellular arrays
Final Report NSF Grant GJ-158 September 1969 Montana
State University
Bozeman Montana

The debugging system AIDS
by RALPH GRISHMAN*
New York University
New York, New York

In comparison with the growth of procedural languages
over the past decade, the advances in facilities
for debugging compiled code have been small.1 The debugging
services offered today on most conversational
systems have not advanced fundamentally from the
design of DDT for the PDP-1.† Batch systems have
added a potpourri of other aids (in particular, systems
with a machine simulator have included a variety of
traces) but, in general, selective tracing and program
checks of even slight complexity have been quite
messy to invoke, if they were available at all.††
The object of the AIDS project has been to provide
a debugging system for FORTRAN and assembly language code on the Control Data 6600 which includes a
flexible and reasonably comprehensive set of tools for
program tracing and checkout, suitable for both batch
and on-line use. A large variety of traces and checks
can be invoked through a special "debug language"
syntactically similar to FORTRAN. A system of such
breadth is really practicable only on a machine with
the power and memory capacity of a CDC 6600; such a
large debugging system would be difficult to implement
on some of the smaller machines on which the earlier
interactive debugging systems were developed. At the
same time, it is precisely the large, complex programs

and supporting systems for machines of this size which
make powerful debugging facilities so valuable.
HISTORY
The story of AIDS may be traced back to early
1965, when Prof. J. Schwartz initiated the development
of a debugging system for the CDC 6600, which
was soon to be delivered to New York University. This
system, dubbed the WATCHR, was developed and
expanded over the next two years by E. Draughon into
a working debugging system.4 As this system developed,
several fundamental difficulties came to light. First,
as the options proliferated, calling sequences became
more complex, to the point where users not only could
not possibly remember the calling sequences, but often
would not attempt to invoke some of the more powerful
WATCHR features. Second, although WATCHR was
adapted for use on the New York University time-sharing
system, it was clearly not designed for interactive
use. Symbols were not kept at run time, so the
user had to refer to his program in terms of absolute
addresses; lengthy calling sequences were particularly
cumbersome at a teletype.

Thus, in 1967 development was begun on a new debugging
system, designed from the outset for conversational
as well as batch use, to be invoked through a
special procedural language rather than subroutine
calls. Design and coding lasted through mid-1968, and
distribution of the program began in the spring of 1969.

Several basic requirements were established for the
implementation: First, to facilitate maintenance, the
same program was to be useable in both batch and interactive
modes. Second, to facilitate distribution, the
system had to be useable without any modification to
the operating system, and have a simple input-output
interface adaptable to a variety of environments.

* Currently with the Department of Physics of Columbia
University, New York, New York.
† The most notable exception of which the author is aware is the
debugging system recently developed for TSS/360;2 readers are
referred to this paper for a more detailed discussion of the need
for more powerful debugging systems.
†† TESTRAN, the system provided with OS/360 for debugging
assembly language routines,3 includes several features for program
checks and conditional traces; however, because the debug
commands are macro calls, their format is severely restricted, and
consequently test conditions which do not fall into one of several
predetermined forms can be quite complicated to encode.


Third, to facilitate use, simple commands had to be
provided for the most common debugging requirements.

PROGRAM ORGANIZATION

AIDS, the All-purpose Interactive Debugging System,
is a main program with three input files: the object
code of the user's program, the listing generated by the
compilation (or assembly) of the program, and a
"debug file" containing the commands to AIDS for
tracing and testing the user's program. AIDS may be
divided into three sections corresponding to these
three files: the listing reader, which extracts from the
compiler and assembler listings the attributes and
addresses of the identifiers in the source program; the
command translator, which transforms the statements
in the debug file into entries in the AIDS trap tables;
and the simulator, which simulates, monitors, and
traces the user's program.
The listing reader is entirely straightforward, and
only one point bears mentioning, namely, that the
alternative, modifying the compiler and assembler to
output the needed information, was rejected for several
reasons. At the time of inception of the project, new
FORTRAN compilers were being issued so often by
Control Data that reimplementing such a modification
on each new compiler would have been a full time effort
by itself. In addition, installations with their own compilers would have had to modify them in order to use
AIDS, a step many installations might have been
hesitant to take.
All debugging information is supplied through a
special debug language; absolutely no modifications
are required to the user's program to run under AIDS.
This debug language will be described in some detail
below, after which a few of the techniques used in
implementing AIDS will be discussed.

DEBUG LANGUAGE

The three basic syntactic entities of the debug language
are the tag, the expression, and the event. The
tag designates a fixed location or block of memory,
and may be an octal address, statement number, variable
name, or subroutine name. The expression specifies
a value, and is constructed according to the same rules
as a FORTRAN IV expression, including full mixed
modes, and logical, relational, and arithmetic operators;
only function references are excluded. The event specifies
a particular occurrence in the user's program, and
can take one of five forms:

OPCODE[S] [(opcode) [TO (opcode)]]
AT [(tag list)]
LOAD[S] [[FROM] (tag list)]
STORE[S] [[TO] (tag list)]
CALL[S] [(tag list)]

where (tag list) ::= (tag) | ((tag) [, (tag)] ...)
In his debug statements, a user can refer to all the
identifiers of his source program: variable and subroutine
names and statement numbers. Array elements
can be referenced with subscripted variables, or the
entire array designated by the array name alone (the
latter feature is useful, for example, in tracing stores to
any element of an array). All hardware registers may
be used in arithmetic expressions on an equal footing
with other variables. Additional variables may be
created at run time for use as counters or switches,
and new labels may be assigned to points in the user's
program not associated with any identifier in the
source text.
The principal statement in the debug language is the
trap statement, which has the form

{WHEN | BEFORE | AFTER} (event), (trap sequence)

where (trap sequence) ::= (trap command) [, (trap
command)] ... This statement directs that immediately
before or after (WHEN is synonymous with
BEFORE) the occurrence of the specified event* in the
simulated program, the commands in the trap sequence
are to be executed. The possible trap commands are:
(1) assignment statement-exactly as in FORTRAN
(2) IF ((logical expression))
which causes subsequent trap commands to be executed
only if the logical expression is true,
(3) SUSPEND {OPCODE | CONTROL | LOAD |
STORE | CALL} TRAPS
RESUME {OPCODE | CONTROL | LOAD |
STORE | CALL} TRAPS
which turns off/on all traps of a given type,
(4) GO TO (tag)
which causes a transfer of control in the simulated
program,
(5) PRINT (print item) [, (print item)] ...
where (print item) ::= (tag) | "(text not including ")
which prints the contents of the tag (in a format appropriate
to its mode in the user's program), or the
specified text, and

* AT (tag) denotes the execution of the first instruction at
location (tag). An event without any opcode specification or tag
list denotes every opcode, any store, any call, etc.

TAGS

(tag) ::= (subscripted variable identifier) | (statement identifier) | (global identifier) | (octal identifier)
(subscripted variable identifier) ::= (variable identifier) [((subscript) [, (subscript)] ...)]
(variable identifier) ::= (variable name) [$ (subprogram name)]
(statement identifier) ::= (statement number)S [$ (subprogram name)]
(global identifier) ::= [$] (global name)

EVENTS

(event specifier) ::= (opcode event specifier) | (control event specifier) | (load event specifier) | (store event specifier) | (call event specifier)
(opcode event specifier) ::= OPCODE[S] [(opcode) [TO (opcode)]]
(control event specifier) ::= AT [(tag list)]
(load event specifier) ::= LOAD[S] [[FROM] (tag list)]
(store event specifier) ::= STORE[S] [[TO] (tag list)]
(call event specifier) ::= CALL[S] [(tag list)]
(tag list) ::= (tag) | ((tag) [, (tag)] ...)

TRAP STATEMENT

{WHEN | BEFORE | AFTER} (event), (trap sequence)
(trap sequence) ::= (trap command) [, (trap command)] ...

TRAP COMMANDS

(left side) = (arithmetic expression)
  where (left side) ::= (global identifier) | (variable identifier) [((arithmetic expression) [, (arithmetic expression)] ...)]
IF ((logical expression))
SUSPEND (trap type) [(trap word)]
RESUME (trap type) [(trap word)]
  where (trap type) ::= OPCODE | CONTROL | LOAD | STORE | CALL
  (trap word) ::= TRAP[S] | TRACE[S]
GO TO (tag)
PRINT (print element) [, (print element)] ...
  where (print element) ::= (tag) | "(text not containing ")
TRACE

COMMANDS NOT VALID IN TRAP SEQUENCES

WHAT [IS] (tag)
WHERE [[IS] (tag)] **
$ (subprogram name) *
RETREAT (decimal integer)
LABEL (octal identifier) ***
MAP {ON | OFF | TRACE}
STEP {(decimal integer) | OFF}
= (variable identifier) = (arithmetic expression)
BREAK OUT [AT (tag)]
BREAK IN AT (tag)

* establishes new local (default) subprogram for identifiers
** finds nearest symbolic label
*** defines (assigns an address to) a symbol

Figure 1-The syntax of the AIDS debug language

(6) TRACE
which prints a line describing the event which caused
the trap. Since the statement

WHEN (event), TRACE

is one of the most often used, the natural abbreviation

TRACE (event)

has been allowed. A few sample trap statements:

TRACE STORES TO A
WHEN CALL TEST, IF (I**3+J .GT. 27)
PRINT "INVALID ARGUMENTS TO TEST,"
I, J
WHEN AT 10S, I=I+1, IF (I .GT. 100) PRINT
"LOOP EXECUTED 100 TIMES, EXIT FORCED",
GO TO 100S

(the S after the numbers in the last statement indicates
that they are statement numbers rather than integers).
The user has at his disposal quite a few other control

and informational commands; these are enumerated in
Figure 1. A few of these deserve special mention:
The MAP feature provides a simple means of tracing
the flow of control in his program. The MAP TRACE
command makes AIDS print out pairs of addresses
between which instructions were executed without any
transfers. If the user does not want a continuous map,
he can still get the last 25 such pairs printed at any
time by typing MAP.
The user can step forward through his program, a
fixed number of instructions at a time, with the
command
STEP (integer)
More interestingly, with the command
RETREAT (integer)
he can step backwards through his program by a fixed
number of instructions; his program is restored to
exactly the same status it had earlier. Although this
process is limited to a few thousand instructions, it is


The program to be debugged:

      PROGRAM PRIME (INPUT, OUTPUT)
C     PROGRAM DETERMINES IF A NUMBER IS PRIME
   10 PRINT 20
   20 FORMAT(*YOUR NUMBER, PLEASE-*)
      READ 40, NUM
   40 FORMAT(I4)
      IF (NUM .LE. 0) CALL EXIT
      ISQRT = NUM ** (1/2) + .1
   70 DO 90 J = 2, ISQRT
   80 IF (NUM/J*J .EQ. NUM) GO TO 130
   90 CONTINUE
      PRINT 110
  110 FORMAT (* NUMBER IS PRIME*)
      GO TO 10
  130 PRINT 140
  140 FORMAT (* NUMBER IS NOT PRIME*)
      GO TO 10
      END

A log of the debug session, with an explanation of each step at the right:

TYPE PROGRAM NAME-lgo
BEHEST-when at 10s, pause           pause each time before first print
BEHEST-go                           start execution
PAUSE.                              have reached statement 10
BEHEST-go                           keep going
YOUR NUMBER, PLEASE-9               try program with 9
NUMBER IS PRIME                     program doesn't work
PAUSE.                              are back at statement 10
BEHEST-trace stores to j            watch DO-loop index
BEHEST-go                           and try again
YOUR NUMBER, PLEASE-9
STORE TO J = 2
NUMBER IS PRIME                     why didn't it try J = 3?
PAUSE.
BEHEST-what is isqrt?               check limit of DO loop
1                                   aha! realize that formula for
                                    ISQRT is wrong
BEHEST-after store to isqrt,        fix it
fnum = num, isqrt = fnum**0.5 +.1
BEHEST-go
YOUR NUMBER, PLEASE-9
STORE TO J = 2
STORE TO J = 3
NUMBER IS NOT PRIME                 seems to work now
PAUSE.
BEHEST-go                           try one more
YOUR NUMBER, PLEASE-5
STORE TO J = 2
NUMBER IS PRIME
PAUSE.
BEHEST-quit

NOTE: Input from user appears above in lower case; output
from AIDS appears in standard upper case; output from the
user's program appears in italicized upper case. The comments on
the right would not be a part of an actual debugging session.
"BEHEST-" is the prompt given by AIDS when it expects input.

Figure 2-A trivial example of an on-line session with AIDS

generally very helpful in determining the source of
difficulty when an error condition occurs.
Programs running under AIDS are normally simulated rather than executed directly; the trap commands
described above are in effect only while the program is
being simulated. Simulation, however, greatly increases
the time required for program execution (by a factor
of 60 or more), so that programs which run into difficulties only after several minutes of execution cannot
be debugged by simulation alone. For users who believe
they can localize the source of their difficulties, or who
only require the AIDS trap facilities at specific points
in their program, the commands
BREAK OUT AT (tag)
and
BREAK IN AT (tag)
have been provided. These commands direct AIDS to
change from simulation to direct program execution,

and to revert to simulation at arbitrary points in a
program.
To illustrate the use of a few of these commands in
the interactive mode, a trivial debugging example is
given in Figure 2.
INTERNAL DESIGN

The most important decision in designing a debugging
system is whether to process the source language
directly (by adding debugging statements to a
compiler, or interpreting the source text) or to work
from the object code. The author firmly believes that
both types of debugging aid should be included in
standard programming support, particularly for time-sharing
systems. A system using only the object code
and symbol table of a program cannot offer the simplicity
of code modification possible with an interpreter
or incremental compiler; nor can it provide several
types of error checking which are easy to perform at
the source language level, such as subscript in range
and agreement in type of formal and actual parameters.
However, several considerations dictated development
of a system running from the object code. First, a
large fraction of users have assembly language subroutines in their FORTRAN programs; running such
programs interpretively would in effect mean assembling the source code and then simulating the
machine instructions. Second, some of the most elusive
bugs are due to compiler and system routine errors;
such bugs can clearly only be found by a system which
runs from the compiled code. (Interestingly enough,
some of the first bugs found by AIDS were in the compiler,
loader, FORTRAN coded output routine, and
the routine which generates FORTRAN execution-time
error messages.)
The next choice to be made is whether to simulate
or execute the object code. In contrast to most debugging systems, AIDS offers the user the ability to do
either. Simulation provides a far richer set of traces
and checks than could a system which executes the
object code; in particular, it provides a simple solution
to what appears to be the most common plight of the
desperate user, "What part of my program stored
that?" On the other hand, when a particular routine
can be isolated as the source of a program error, only
that routine need be simulated, with the rest of the
code executed; in this case, the program can run at
nearly normal speed.
The trap system is entirely straightforward, using
for each type of trap (load, store, etc.) a list of addresses
which is checked regularly during simulation. To
avoid possible "side-effects" (e.g., instruction modification at a location where a breakpoint is stored) absolutely no modifications are made to the user's program
during simulation. Trap commands are checked
syntactically and translated into an internal form on
input, and are interpreted \yhenever a trap occurs.
Whenever a store is performed· by the simulated
program, the old contents of the referenced memory
location are saved in a circular buffer. At two points in
the circuit of the circular buffer, the contents of all the
simulated hardware registers are saved. When a
RETREAT is requested, the user's program is first
reset to its status at one of these two earlier points;
memory is restored by working backwards through the
circular buffer from its current position to the earlier
point. The program is then stepped forward to the
point to which the user wanted to retreat in the first
place.
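The mechanism is easy to picture in code. The following Python fragment is only a sketch of the scheme just described, with every name invented for the illustration; it is not the AIDS implementation (which, as noted below, was FORTRAN and assembly language).

class History:
    def __init__(self, size):
        self.size = size
        self.buf = [None] * size      # circular buffer of (addr, old value)
        self.pos = 0
        self.snapshots = {}           # buffer position -> register snapshot

    def record_store(self, memory, registers, addr, new_value):
        # Two fixed points in the buffer's circuit carry full register
        # snapshots, so a RETREAT always has an earlier state to reach.
        if self.pos in (0, self.size // 2):
            self.snapshots[self.pos] = dict(registers)
        self.buf[self.pos] = (addr, memory[addr])   # save the old contents
        self.pos = (self.pos + 1) % self.size
        memory[addr] = new_value

    def retreat(self, memory, to_pos):
        # Restore memory by working backwards through the buffer from the
        # current position to an earlier snapshot point (to_pos must be
        # one of the two snapshot positions).
        while self.pos != to_pos:
            self.pos = (self.pos - 1) % self.size
            addr, old_value = self.buf[self.pos]
            memory[addr] = old_value
        return dict(self.snapshots[to_pos])         # registers at that point

The simulator would then re-simulate forward from the restored point to the step the user actually asked for.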
AIDS consists of about 6000 source cards, and occupies a minimum of 44000 (octal) words of memory. With
the exception of the simulation routine, which was
coded in assembly language for efficiency, the entire


system was written in FORTRAN. This was no doubt
a factor in getting the system coded and largely debugged in less than one man-year of programming
effort.
CONCLUSION
In evaluating the results of the AIDS project, it is
necessary to ask two separate questions: Is such a
powerful debugging system worthwhile? and Has this
implementation been successful, in particular with
respect to the three points mentioned towards the
beginning of this paper?
The latter question I believe can be answered in the
affirmative; as regards the three specific points:
1. The identical program has been used for both
batch and conversational debugging. In general the
system appears to be flexible enough to satisfy the debugging styles of both types of user: the selective
traces and automatic program checks required by the
batch user and the conditional trapping desired by the
time-sharing user. *
2. In large part because most of AIDS is coded in
FORTRAN, it has been converted for use under two
batch and three conversational systems with relative
ease. In addition, the modular design has made it
possible for the author of a subsequent CDC 6600 debugging system to incorporate major sections from
AIDS.5
3. The only "abbreviation" included is the TRACE
command (in place of WHEN ... , TRACE). Short of a
general redesign of the command structure to reduce
the amount of typing required, no other particular
sequences of commands seemed to be frequent enough
to merit abbreviation.

The more general question, whether such a powerful
debugging system is worth the cost, is more difficult
to answer. There is, of course, the increased cost in
processor time and memory space, but these items generally represent only a small part of the cost of debugging; as these costs decrease further, it is safe to
assume that nearly any significant saving of a programmer's time at the expense of computer time will
represent a net savings.
Thus the fundamental question is, does AIDS save
the programmer time in debugging? In one aspect it
clearly does not: since it is such a large system, it
takes quite a while to learn all its capabilities. Indeed,

* It has been suggested that the ability to jump around within the
deck of commands to AIDS may be desirable to give the batch
user even greater control over the debugging process; such a
facility may soon be added.


potential one-time or occasional users have been dissuaded by the thought of reading a 23-page manual.
As a result, most AIDS users until now have been
systems programmers or user consultants. One user
has suggested, however, that it is precisely these experienced users who are most in need of such a system
and for whom the system should be designed; in this
case the time required to become familiar with the
system is not such a critical factor.
So, finally: Does AIDS save time in the actual task
of debugging, in comparison with simpler debugging
systems? In the batch mode, where the primary object
is to collect as much useful information as possible
from each run, I am confident that the answer is yes.
In conversational debugging, on the other hand,
brevity and ease of typing are important factors;
these aspects clearly favor the simple debugging systems, where considerable effort has been expended in
this area,6 over the syntactically complex AIDS. It is
the author's impression, however, that the few most
difficult program bugs, those in which a very powerful system like AIDS can be expected to be of the most help, are the ones which consume most of a programmer's
time and cause most of his ulcers. In any event, a

good deal more experience with the on-line use of
AIDS and similar debugging systems will be required
to find the best balance of brevity, simplicity, and
power.
REFERENCES
1 T G EVANS D L DARLEY
On-line debugging techniques: A survey
FJCC Proceedings 1966
2 W A BERNSTEIN J T OWENS
Debugging in a time-sharing environment
FJCC Proceedings 1968
3 System/360 operating system TESTRAN
IBM Form No C28-6648-1
4 E DRAUGHON
WATCHR III-A program analyzing and debugging system
for the CDC 6600, user's manual
AEC Research and Development Report NYO-1480-58
5 H E KULSRUD
Helper-An interactive extensible debugging system
IDA-Communications Research Division Working Paper
No 258
6 P T BRADY
Writing an on-line debugging program for the experienced user
CACM Vol 11 No 6 p 423 June 1968

Sequential feature extraction for waveform recognition
by W. J. STEINGRANDT* and S. S. YAU
Northwestern University
Evanston, Illinois

INTRODUCTION
Many practical waveform recognition problems involve
a sequential structure in time. One obvious example is
speech. The information in speech can be assumed to
be transmitted sequentially through a phonetic structure. Other examples are seismograms, radar signals,
or television signals. We will take advantage of this
sequential structure to develop a means of feature extraction and recognition for waveforms. The results
will be applied to speech recognition.
An unsupervised learning (or clustering) algorithm
will be applied as a form of data reduction for waveform
recognition. This technique will be called sequential
feature extraction. The use of sequential feature extraction allows us to represent a given waveform as a
sequence of symbols aσ1, ..., aσk from a finite set A = {a1, ..., aM}. This method of data reduction has
the advantage of preserving the sequential structure of
the waveform. The problem of waveform recognition
can be transformed into a vector recognition problem
by expanding the waveform using orthogonal functions.1
However, in this case the sequential structure is masked
because the expansion operates on the waveform as a
whole. Data reduction can also be carried out by time
sampling, and storing the samples as a vector. In this
case the dimension of the vector is usually large. The
data produced by sequential feature extraction is more
compact. We will formalize the concept of sequential
feature extraction and develop a performance criterion
for the resulting structure. An unsupervised learning
algorithm, which will optimize this structure with respect to the performance criterion, is presented. This
algorithm, which can be applied to waveform recognition as well as vector recognition, represents an improvement over existing clustering algorithms in many
respects. This method will allow unbounded strings of
sample patterns for learning. The samples are presented

* Presently with IBM Corporation, Rochester, Minnesota.

to the algorithm one at a time so that the storage of
large numbers of patterns is unnecessary.
The assumption of known probability measures is
extremely difficult to justify in most practical cases.
This assumption has been made in a number of
papers,2-5 but no such assumption is made here. That
is, the requirement for convergence is only that the
measures be smooth in some sense. Braverman's algorithm6 has been shown to have these advantages.
However, he assumes that there are only two clusters,
which, after a suitable transformation, can be strictly
separated by a hyperplane. These assumptions are too
restrictive for the practical applications considered in
this work. In the clustering algorithm to be presented
here, any number of clusters is allowed, the form of the
separating surfaces is not as restricted, and strict
separability of the clusters is not assumed. This algorithm is considerably more general than existing
clustering algorithms in that it applies to time varying
as well as time invariant patterns.
We will assume that the waveform is vector valued,
i.e., x(t) is in a set Ω = {x(t) : ‖ẋ(t)‖ < M for all t ∈ [0, T_x]}, where ẋ(t) is the componentwise time derivative of x(t). It is assumed that each pattern class
has some unknown probability measure on this set.
A unified model for waveform recognition and vector
recognition will be presented. It will be shown that the
recognition of a vector pattern can be considered as a
special case of waveform recognition. This will be done
by observing that the pattern space of n-vectors v is
isomorphic to the space of all constant functions
x(t) = v.
Recognition of real functions of time will be possible
by defining a transformation to the space Ω or by assuming that x(t) is one-dimensional. The problem of waveform recognition will be carried out in the space Ω,
where the dimension of x(t) is most likely greater than
one.
The experiments on speech will show an interesting
relationship between the sequential features and the standard linguistic phonetic structure for English. A
recognition algorithm using sequential machines will be
given that will accept symbol strings aσ1, ..., aσk to
classify spoken words.
SEQUENTIAL FEATURE EXTRACTION

Figure 1-Assumed process producing pattern waveforms

Figure 1 shows the process that is assumed to produce
the vector waveform x (t). It is emphasized that this
model may not represent an actual physical process
as described. It is included as a means of demonstrating the assumptions about the sequential structure
on Ω. In the figure it is assumed that there is some state
of nature or intelligence such that pattern class i is
present. The pattern classes are represented by the
symbols ui, i = 1, ..., R. There exists a second set of symbols A = {a1, ..., aM} called the phoneme set. Each
ai is called a phoneme (while the terminology is suggestive of speech and language, there may be little
relation to the speech recognition problem). The second
step converts ui into a finite sequence of phonemes aσ1, ..., aσk, where σi is the index of the ith phoneme in the sequence. The process of encoding ui into aσ1, ..., aσk is most likely unknown and is probably nondeterministic. That is, the sequence generated by a given ui may not be unique.
Each sequence is then assumed to go through an
encoding process into a real waveform w(t) ∈ W, where W is the set of all continuously differentiable real waveforms such that ẇ(t) and the time duration are bounded.
This process is also most likely nondeterministic. For
the most part, this encoding process is unknown; but
some assumptions can ·be made. It is assumed that
there is some unique behavior of w(t) for each ai. As each aσi from aσ1, ..., aσk is applied to the encoder, the behavior of w(t) changes in some manner. This behavior is detected by using a transformation to a vector function of time x(t) ∈ Ω. This transformation
can be considered to be described by some differential
equation of the form
ẋ(t) = f[x(t), w(t)],   (1)

where f: R^n × R → R^n is a bounded continuous function. The explicit form for this equation may not be
known, but the system that it describes is assumed to
be determined from the physical process producing
w(t). If this differential equation is properly chosen, then the value of x(t) at any time t is some pertinent measure of the recent past behavior of w(t).
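As one concrete (and purely hypothetical) instance of (1), a first-order lag makes x(t) a running measure of the recent past of w(t); the Python fragment below integrates such an equation by Euler's method. The coefficient a and all names are assumptions for the illustration, not values from the paper.

import numpy as np

def integrate_state(w_samples, dt, a=20.0):
    # Euler integration of one possible f in (1): x_dot = a*(w(t) - x(t)).
    # With this choice x(t) tracks a short-term average of w(t).
    x = 0.0
    xs = []
    for w in w_samples:
        x += dt * a * (w - x)       # Euler step of (1)
        xs.append(x)
    return np.array(xs)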
We will shortly present a clustering algorithm on Ω which is a generalization of the usual concept of clustering. Clustering for the time invariant case will first be reviewed. It is assumed that there exists a metric ρ that
measures the similarity between patterns, where the patterns are assumed to be fixed points in R^n. ρ is such that the average intra-class distance is small, while the average inter-class distance is large. The method of cluster centers used by Ball and Hall7 will be used to detect the clusters. It is assumed that the number of clusters is fixed, say at M, and there are si ∈ R^n, i = 1, ..., M, such that each si has minimum mean distance to the points in its respective cluster. These si can be found by minimizing the performance criterion E_x min_i ρ(si, x), where the expectation is with respect to the probability measure on R^n.
These assumptions will now be generalized for patterns that are time varying. Here the phonemes ai play
the part of the pattern class for the time invariant case.
That is, the time invariant pattern vectors are assumed
to be the same as the time varying case except that the
phoneme sequence producing the vector is always of
length one, and x(t) is the constant function.
We will describe the general case in more detail.
Here, as before, it is assumed that there is a similarity
metric ρ on R^n. This metric measures the similarity of the behavior of w(t) at any given time t1 to that at any other time t2. This is done by measuring the distance ρ[x(t1), x(t2)], where it is understood that x(t) and w(t) satisfy (1). The assumption is that (1) and ρ are such that if ai was applied to the waveform encoder both at time t1 and t2, then ρ[x(t1), x(t2)] is small. On the other hand, if aj was applied during t1 and ai during t2, then ρ[x(t1), x(t2)] is large for i ≠ j. In other words, each ai produces behavior in w(t) such that the corresponding values for x(t) tend to cluster in distinct regions of R^n. Thus, the ai are represented by clusters in R^n. It is assumed that each ai has a cluster center si associated with it. This implies that for each ai there is a point si ∈ R^n such that when ai is applied to the waveform encoder, the function x(t) tends to pass close to si.
It will also be assumed that x(t) spends most of its time in those regions that are close to the si. In other words, the more important features of w(t) are of longer duration. The example shown in Figure 2 illustrates the foregoing assumptions. The figure shows the action of x(t) under the application of a1, a2, a3 to the
encoder. In the figure the width of the path is inversely proportional to ‖ẋ(t)‖.
This model is necessarily somewhat vague because
we are unwilling to make assumptions about the probability measures on Ω. If such assumptions were made,
then a more formal definition of a cluster might be
possible. For most practical problems such as speech
recognition, these types of assumptions cannot be made.
Assuming ρ and the si were known, they could be used to reconstruct an estimate of the sequence aσ1, ..., aσk for an unknown waveform x(t) in the following manner. Referring to Figure 3, each of the quantities ρ[si, x(t)], i = 1, ..., M, is continuously calculated and the minimum continuously indicated. That is, suppose there exist times t1 = 0, t2, ..., t_{k+1} = T_x such that ρ[sσi, x(t)] ≤ ρ[sj, x(t)] for all j ≠ σi and all t ∈ [ti, t_{i+1}]; then it is assumed that the phoneme sequence most likely to have produced x(t) is aσ1, ..., aσk. Note that no adjacent phonemes in the sequence are ever the same. It is also apparent that the output sequence is independent of time scale changes in x(t).
If ρ and (1) are fixed, then for a given set of the si, i = 1, ..., M, there is a transformation defined by Figure 3. This transformation will be called T_s: Ω → P, where P is the set of all finite sequences of symbols from A, s = (s1′, ..., sM′)′, and the prime of a matrix denotes its transpose.
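In sampled form the transformation T_s reduces to a nearest-center assignment with repeated symbols collapsed. The Python sketch below assumes a squared euclidean ρ and invented names; it illustrates the definition and is not the authors' program.

import numpy as np

def extract_sequence(x_samples, centers):
    # T_s: map sampled x(t) (T x n array) and centers s_1..s_M
    # (M x n array) to a phoneme index sequence.
    seq = []
    for x in x_samples:
        d = np.sum((centers - x) ** 2, axis=1)   # rho(s_i, x), squared euclidean
        i = int(np.argmin(d))                    # minimum selector
        if not seq or seq[-1] != i:              # collapse repeats
            seq.append(i)
    return seq

Because only changes of nearest center are recorded, the output is unchanged by a change of time scale in x(t), as observed above.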


Figure 3-Implementation of a phonetic structure

The pair (A, T_s) defines a sequential structure on Ω. This sequential structure is extracted by the transformation T_s defined in Fig. 3. Thus, the terminology
sequential feature extraction has been used.
This definition of sequential feature extraction is
unique in that it puts sequential structures in waveform
recognition on a more formal basis. Gazdag8 has suggested a somewhat similar structure in what he calls
machine events. His method involves linear discriminant
functions, and he gives no method for determination of
the structure.
The objective of the learning algorithm will be to
determine (A, T_s) by determining the composite vector s. The differential equation in (1) and ρ are assumed to be determined from a study of the physical process producing w(t). It is obvious that any random choice for s will define a sequential structure. The learning algorithm will be required to find that s which is optimum with respect to some performance function.
This performance function is generalized from that
mentioned previously for time invariant patterns.
Based on the previous discussion, the performance
function for this case is
E_x C(s, x) = E_x ( (1/T_x) ∫_0^{T_x} min_i ρ[si, x(t)] dt ),   (2)

where C(s, x) is a function called the confidence function for a given waveform x(t). The smaller C(s, x) is for a given x(t), the more confidence, on the average, can be placed in the resulting sequence of phonemes. Taking the statistical expectation over the entire population Ω gives us the performance function.

Figure 2-Example of waveform x(t) produced by sequence a1, a2, a3
The object of the learning rule will be to find an s* such that E_x C(s*, x) is at least a local minimum for


E_x C(s, x). It is obvious from (2) that direct evaluation
of the performance function is not possible because the
probability measures are not known. Using stochastic
approximation, it can be shown that if a learning rule
of the form
s^{n+1} = s^n − a_n ∇_s C(s^n, x_n)   (3)

is used, then under certain conditions the sequence {s^n} converges almost surely to a saddle point or local optimum s*, where s^n is the value for s at the nth stage of learning, x_n is the nth sample waveform, and a_n is a sequence of scalars satisfying certain convergence conditions. Note that x_n is unlabeled, i.e., no pattern
class information is used in the learning rule.
It can easily be seen that if x(t) = v and T_x = 1,
then the performance function in (2) reduces to that
for the time invariant case.
We are now in a position to calculate ∇C(s, x) for a given pattern x(t). Define
A(si) = {x ∈ R^n : ρ(si, x) < ρ(sj, x) for all j ≠ i}.   (4)

Each region A(si) corresponds to a phoneme ai. For each x(t), the sequence aσ1, ..., aσk is simply a list of the regions A(si) through which x(t) passes. The t1, ..., t_{k+1} are then the times at which x(t) passes from one region to the next. Using this, we can write

C(s, x) = (1/T_x) Σ_{i=1}^{k} ∫_{t_i}^{t_{i+1}} ρ[sσi, x(t)] dt.   (5)

Taking the gradient and canceling terms, we have

∇_{s_j} C(s, x) = (1/T_x) ∫_0^{T_x} χj[x(t)] ∇_{s_j} ρ[sj, x(t)] dt,   (6)

where ∇_{s_j} is the gradient with respect to sj and χj is the characteristic function of A(sj). It is also understood that the integral of a vector function is meant to be the vector of integrals of each of the individual components. The learning rule in (3) becomes

s_j^{n+1} = s_j^n − a_n ∇_{s_j} C(s^n, x_n),   (7)

where

Σ_{n=1}^∞ a_n = ∞,   Σ_{n=1}^∞ a_n^2 < ∞,   (8)

x_n(t) is the nth sample waveform, and s_j^n is the value of sj at the nth step of learning. An equivalent form is

s_j^{n+1} = s_j^n − (a_n/T_{x_n}) ∫_0^{T_{x_n}} χj[x_n(t)] ∇_{s_j} ρ[s_j^n, x_n(t)] dt.   (9)

Example 1. Assume that ρ is the squared euclidean metric, i.e.,

ρ(x, y) = Σ_{i=1}^n (x_i − y_i)^2.

The learning rule in this case becomes

s_j^{n+1} = s_j^n − (a_n/T_{x_n}) ∫_0^{T_{x_n}} χj[x_n(t)] [s_j^n − x_n(t)] dt.   (10)

AUTOMATIC SPEECH RECOGNITION

The automatic recognition of speech has received much attention since the advent of the digital computer. Most of the previous work9-12 in speech recognition has made use of the phonetic structure of speech. Almost all of these studies use the standard linguistic phonetic structure. Here we investigate the applicability of sequential feature extraction to the speech recognition problem. A sequential structure will be developed using a limited vocabulary. It will be seen that the resulting structure is related to the standard English phonetic structure. Because of this relationship to speech, we will refer to sequential feature extraction as a machine phonetic structure.

In order to represent the speech waveform w(t) as a vector function of time we will use the common method13 of a bank of bandpass filters. In the experiments 50 filters were spaced from 250 to 7000 Hz. Each filter was envelope detected and sampled by an A/D converter and multiplexor. Therefore, x(t) is a 50-dimensional vector function of time.

Kabrisky14 has shown that a neuron network similar to that found in the brain is capable of performing correlation calculations. Based on this we assume that the similarity metric defined by

ρ(x, y) = 1 − x′y/(‖x‖ ‖y‖) = (1/2) ‖x/‖x‖ − y/‖y‖‖^2   (11)

is valid for speech sounds. Note that ρ(ax, by) = ρ(x, y) for all a, b, x, y, i.e., the metric ρ is invariant to amplitude changes in the signal. Using this metric we have the following learning rule:

s_j^{n+1} = s_j^n − (a_n/T_{x_n}) ∫_0^{T_{x_n}} χj[x_n(t)] ∇_{s_j} ρ[s_j^n, x_n(t)] dt   (12)

  = s_j^n + (a_n/T_{x_n}) (1/‖s_j^n‖) (I − s_j^n s_j^n′/‖s_j^n‖^2) ∫_0^{T_{x_n}} χj[x_n(t)] x̂_n(t) dt,   (13)

where

x̂_n(t) = x_n(t)/‖x_n(t)‖,   (14)

and I is the n × n identity matrix.
If we normalize x(t) as part of the preprocessing and normalize each si after each step of learning, then we can write the learning rule as

s_j^{n+1} = s_j^n + (a_n/T_{x_n}) ∫_0^{T_{x_n}} χj[x(t)] x(t) dt.   (15)

This rule was used to develop the phonetic structure
presented in the next section on the experimental
results.
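In sampled form, rule (15) pulls each winning center toward the normalized samples assigned to it and then renormalizes. The Python fragment below is a minimal rendering under those assumptions; the names and the constant step size a_n are invented, and it is a sketch of (15), not the authors' program.

import numpy as np

def learn_step(x_samples, centers, a_n):
    # One step of rule (15): accumulate the normalized samples won by each
    # center (the sampled integral of chi_j[x(t)] x(t) dt), step, renormalize.
    update = np.zeros_like(centers)
    for x in x_samples:
        x = x / np.linalg.norm(x)            # preprocessing normalization
        i = int(np.argmax(centers @ x))      # nearest center under metric (11)
        update[i] += x
    centers = centers + a_n * update / len(x_samples)
    return centers / np.linalg.norm(centers, axis=1, keepdims=True)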
MACHINE PHONETIC STRUCTURE
EXPERIMENTAL RESULTS
This section describes the results of experiments using
the data acquisition equipment previously described.
The basic goals of the experiments were
(1) test convergence of the algorithm,
(2) determine effects of local optimums,
(3) provide output for use in speech recognition, and
(4) determine relationship to the standard linguistic phonetic structure, if any.

There were two sets of data used for the tests. One
set consisted of 50 utterances each of the words "one",
"four", "oaf", "fern", "were". These words were chosen
because they contained a small number of sounds, both unvoiced and voiced. One speaker was
used for all utterances. It was found that the speaker's
voice had enough variation to adequately test the
algorithm. If the algorithm had been tested with many
speakers, the variance would have been much larger.
This would have lengthened the convergence times
beyond what was necessary for a sufficient test.
The larger data set consisted of 40 utterances of each of the ten digits "one", "two", ..., "nine", "oh". These were all spoken by the same person. These words contain a wide variety of sounds: voiced, unvoiced, diphthongs, plosives, etc. This set was used to
give a somewhat more severe test of convergence and
to provide data for speech recognition. We will now
consider the four goals of the experiments separately.

Figure 4-Improvement of the performance function

Convergence: Many runs with the small data set were made. Different starting points were chosen, and other conditions were varied. In all cases the algorithm showed a strong convergence.
Because there was only a finite set of samples, the convergence properties in (8) were academic. In order to better determine convergence, the sequence {a_n} was chosen to be constant over many steps of learning. If convergence was apparent under these conditions, then convergence under decreasing step increments can be assumed.
Figure 4 shows an example of the convergence of C(s, x) using the large data set. Due to the variance of the data, a direct plot of C(s, x) at each step of learning shows very little. The individual points for C(s, x) are so scattered that convergence is difficult to see. Figure 4 shows the plot after data smoothing. The solid curve represents averages of ten successive values of C(s, x). The dotted line represents further data smoothing. It can be seen that the performance function is not improved at each step but is improving over many samples. In order to demonstrate that the components of s^n were converging as well as C(s, x), the plot in Figure 5 was made.

Figure 5-Convergence of the 10th component of s6

This is a plot of the tenth channel for s6 of the small data set. The computer listing for each step of learning was examined to find a rapidly changing component. This component is typical of the convergence of these values. Note that at the beginning there are rapid and random changes in its value due to the large value of a_n and the fact that the structure is rapidly changing. The learning then appears to enter a phase where the structure is rapidly descending toward a minimum. The last part of the learning seems to be random fluctuation about the optimum. Note that convergence appears to take only about 150 steps.

Figure 6-Improvement of E_x C(s, x)

Local Optimums: It was found that there definitely was more than one optimum. By choosing different starting points, the algorithm converged to different optimums. To see this, examine Figure 6. This is a plot of the smoothed data for two runs with the small data set. Each learning run was made with the same data except that the starting points were different. It can be seen from the figure that the initial point one converged to a local optimum that was not as good as that for the initial point two. We can be fairly certain that the first point will never converge to the second, since more than twice the number of learning steps were run for point one than for point two.

The Standard Phonetic Structure: The output strings from sample words were inspected for similarities to standard phonetic spellings.15 It was found that the two structures were similar in many respects. A one-to-one correspondence could be made between certain standard phonemes and machine phonemes. This was particularly true for consonants such as [s] or [f]. The two structures were not equivalent for vowels or glides with time changing spectra. In this case the machine structure appeared to develop phonemes that represented transitions between the standard phonemes.

RECOGNITION OF PHONEME STRINGS

In this section we present a means of classifying the phoneme strings that are produced as a result of sequential feature extraction. For completeness, we shall restate the recognition problem here. There is a set of symbols A = {a1, ..., aM} called phonemes. The pattern space P is the set of all finite sequences of symbols from A. A typical pattern from P will be denoted either by the sequence aσ1, ..., aσk or by q. There are R pattern classes; each class has some characteristics associated with its sequences that differentiate it from the other classes.
If one were to use a Bayes decision procedure, the
following would be needed. According to decision theory,
in order to minimize the probability of error, the discriminant functions
g_i(q) = p(q | i) p(i),   i = 1, ..., R,   (16)

are needed, where q ∈ P, p(q | i) is the probability of q given pattern class i, and p(i) is the a priori probability of class i. If it can be assumed that the p(i) are all equal, then they can be dropped from (16). The problem is then to estimate p(q | i) for all q and i. It is obvious that even if the length of the strings is bounded, the estimation of all the probabilities in (16) is an almost impossible task for a phoneme set of any size. For example, if there are ten phonemes and the strings are assumed to be no longer than length 5, then the number of probabilities is greater than 10^5. The amount
of data to estimate these probabilities is too large to
obtain practically. Therefore, a Bayes decision procedure for this case is impractical. A decision procedure
that does not require the estimation of all possible
probabilities will have to be found.
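The count behind this remark is elementary: with ten phonemes the number of distinct strings of length at most five is already

\sum_{k=1}^{5} 10^{k} = 10 + 100 + 1000 + 10000 + 100000 = 111110 > 10^{5},

and one probability p(q | i) would have to be estimated for every such string, for each of the R classes.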
In order to motivate the development that follows,
we will outline the basic approach used for this recognition problem. A concept of the storage of prototype
pattern strings is extended to what is called a generalized
prototype string. This concept will be used in the pattern recognition problem as follows. The generalized prototype string will be defined as a truncated Markov chain. R of these Markov chains are defined. These Markov chains produce finite strings of symbols in P.
The probability measures p (q I i) on these strings are
assumed to approximate the probability measures for
each of the R pattern classes. These generalized prototype strings are used to define sequential machines for

recognition. These sequential machines will accept an unknown string q and calculate p(q | i). This will then
be used to classify q according to (16).
The need for generalized prototype strings comes
from the fact that the intra-class variance of the strings
is large. If this variance is small, then a straightforward
method of recognition exists. This method would be to
store the most common output strings for each class.
Each unknown pattern string q would then be matched
against the stored strings. If there is a match with one
of the stored strings for class i, then q will be put in
pattern class i. If, however, the strings within a pattern
class show a large variance, too many strings will have
to be stored in order to recognize a reasonable number
of patterns. To reduce the storage requirements in this
case, the following concept of a generalized prototype
string has been formulated.
In order to simplify notation, we will work only
with the indices of the strings and omit the symbols a.
In other words, if we have a string aσ1, ..., aσk, then we will describe this string as σ1, ..., σk. This will
cause no confusion.
If there are M phonemes in A, then the possible
indices for the symbols ai run from 1 to M. Assume
that there is a new symbol aM+1 that represents string
termination. That is, using the notation introduced
above, each string is of the form σ1, ..., σk, M + 1.
This will be useful when the truncated Markov chains
are defined.
Suppose we have a prototype string n_1, ..., n_m, n_{m+1} = M + 1. This string will be used to define a Markov chain that terminates when M + 1 appears. To do this, assume that there exist probabilities p(i), i = 1, ..., m, and p(j | k), j = k + 1, ..., m + 1, k = 1, ..., m. These probabilities, along with the sequence defined above, can now be used to define a Markov chain. This chain will produce subsequences of n_1, ..., n_{m+1}. If n_{i_1}, ..., n_{i_k}, n_{m+1} is such a subsequence, then, using the above probabilities, the Markov property allows us to write

p(n_{i_1}, ..., n_{i_k}, n_{m+1}) = p(i_1) p(m+1 | i_k) Π_{j=2}^{k} p(i_j | i_{j−1}),   (17)

where p(n_{i_1}, ..., n_{i_k}, n_{m+1}) is the probability that this subsequence occurs, p(i_1) is the probability that index i_1 is the first index in the subsequence, and p(i_j | i_{j−1}) is the probability that index i_j follows i_{j−1}. Note that if at any time i_j = m + 1, the string terminates. Also note that the subsequence preserves the order of the original sequence. That is, p(j | k) = 0 for j ≤ k.
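Equation (17) is a product of a starting probability, the interior transition probabilities, and a termination probability, and can be evaluated directly. The Python fragment below is a sketch with invented names; a subsequence is given as its list of position indices i_1, ..., i_k, with the terminator handled separately.

def subsequence_probability(idx, p_start, p_trans, m):
    # Equation (17): p(i1) * p(m+1 | ik) * product over j of p(ij | ij-1).
    # idx: positions i1..ik (terminator excluded); p_start: dict i -> p(i);
    # p_trans: dict (j, k) -> p(j | k), taken as zero when absent.
    p = p_start.get(idx[0], 0.0)
    for prev, cur in zip(idx, idx[1:]):
        p *= p_trans.get((cur, prev), 0.0)   # p(ij | ij-1)
    p *= p_trans.get((m + 1, idx[-1]), 0.0)  # termination, p(m+1 | ik)
    return p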
In accordance with the above discussion we have the
following definition.


Definition 4. A sequence n_1, ..., n_m, n_{m+1} = M + 1, together with the probabilities p(i), p(j | i), j = i + 1, ..., m + 1, i = 1, ..., m, is called a generalized prototype string S. The string S is said to be generated by n_1, ..., n_m, n_{m+1}.
Definition 5. The range of a generalized prototype string S is that set of subsequences Q such that a subsequence n_{i_1}, ..., n_{i_k}, n_{m+1} is in Q if and only if p(n_{i_1}, ..., n_{i_k}, n_{m+1}) > 0.

Thus, the generalized prototype string is actually a probability measure on P. Suppose that σ1, ..., σk, M + 1 is a string in P. If this sequence is in the range of S, then there is a subsequence n_{i_1}, ..., n_{i_k}, n_{m+1} such that σj = n_{i_j} for all j. The probability measure on P is defined in the following manner.
Definition 6. If σ1, ..., σk is a sequence in P, then define

p(σ1, ..., σk) = 0, if σ1, ..., σk is not in the range of S;
p(σ1, ..., σk) = p(n_{i_1}, ..., n_{i_k}, n_{m+1}), if σ1, ..., σk is in the range of S,   (18)

where n_{i_1}, ..., n_{i_k}, n_{m+1} is such that σj = n_{i_j} for all j, and p(n_{i_1}, ..., n_{i_k}, n_{m+1}) is defined in (17). The probability in (18) will be called the measure associated with S.
It can now be seen that the usual notion of a prototype string is a special case of the generalized prototype string. If p(1) = 1 and p(i | i − 1) = 1 for all i, then the resulting range of S consists only of the string n_1, ..., n_{m+1}. This corresponds to the method described at the beginning of this section.
The following approach to the recognition problem
will now be taken. Each pattern class i has a probability measure p(q | i) associated with it. It is assumed that there exist generalized prototype strings Si, i = 1, ..., R, such that p(q | i) is the measure associated with Si. In other words, we are using the concept of the generalized prototype string to approximate the measure p(q | i). The learning procedure will require that each Si be determined. A recognition procedure must also be developed that will allow each of the p(q | i) to be evaluated for an unknown pattern. To simplify notation, write p_i(q) = p(q | i).
The calculation of p_i(q) associated with each Si can be implemented by the use of sequential machines. It was shown that the measure on the range of Si was the result of a truncated Markov chain. Let n_1^i, ..., n_{m_i}^i, M + 1 be a string that generates Si. Assume for the moment that this string and the associated prob-


abilities have been completely determined. The sequential machine that implements Si contains m_i + 2 states. These states are labeled (0), 1, 2, ..., m_i, m_i + 1, where (0) is the reset or power-on state and m_i + 1 is the terminal state. In other words, each machine state j is associated with the corresponding term n_j^i, for j = 1, ..., m_i + 1. The (0) state corresponds to the start of the sequence. Each state transition is defined in the following manner.
Let p_i(j), p_i(k | j), k = j + 1, ..., m_i + 1, j = 1, ..., m_i, be the probabilities used to define Si. Suppose the sequential machine M_i is in state (0). If p_i(j) ≠ 0, define a transition to state j for input symbol n_j. If p_i(k) = 0 for some k, there is no transition from (0) to state k. Now suppose the sequential machine is in state k. If p_i(j | k) ≠ 0, define a state transition from state k to state j for input n_j. Continue for all such states from 1 to m_i. This process completely defines the sequential machines M_i.


Definition 7. The sequential machine M_i is said to accept a sequence σ1, ..., σk if this sequence is contained in the range of Si.
We are now in a position to see how the above sequential machines M_i can calculate the probabilities p_i(q). Recall that each state transition was defined using one of the probabilities p_i(j) or p_i(j | k). Thus, each state transition has an associated probability. If machine M_i accepts pattern string q, then there is a sequence of state transitions leading to the final state m_i + 1. Each state transition has an associated probability. If the product of all these probabilities is formed, then it is seen that the result is the product in (17). But it has been seen that this is the desired probability p_i(q) from Definition 6.
Therefore, the recognition procedure is as follows. The unknown string q is applied to all the sequential machines M_i, i = 1, ..., R. If none of the machines accept q, then it is rejected as unrecognizable. For each machine M_i that accepts q the probability p_i(q) is calculated in the following manner. An accumulator register is initialized to the value 1 at the start of q. As each state transition is made during the application of q, the probability associated with that transition is multiplied by the contents of the accumulator, and the result is stored back in the accumulator. After the machine reaches the final state m_i + 1, the desired probability p_i(q) is in the accumulator. These calculated probabilities are then used to classify q in the sense of (16).
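The procedure can be sketched compactly: run each machine from state (0), multiply the accumulator by the probability attached to each transition taken, reject when no transition exists or the final state is not reached, and pick the accepting machine with the largest product. The Python fragment below is an illustration with invented names, equal a priori probabilities assumed per (16), and at most one transition per (state, symbol) pair.

def classify(q, machines, terminal):
    # q: string ending in the terminator M+1; machines: list of dicts
    # (state, symbol) -> (next_state, probability), start state 0;
    # terminal: per-machine final state m_i + 1.
    best, best_p = None, 0.0
    for i, trans in enumerate(machines):
        state, acc = 0, 1.0                  # accumulator initialized to 1
        for sym in q:
            if (state, sym) not in trans:
                acc = 0.0                    # machine rejects q
                break
            state, p = trans[(state, sym)]
            acc *= p                         # multiply in transition probability
        if state == terminal[i] and acc > best_p:
            best, best_p = i, acc
    return best, best_p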
Two examples will now be given that will clarify the
above development.
Figure 7-(a) State transition diagram for pattern class 1 with associated probabilities, and (b) State transition diagram for pattern class 2 with associated probabilities

Example 2. Suppose there are three phonemes in A. That is, A = {1, 2, 3}. Assume there are two pattern classes and that the prototype strings for these two classes are

1, 2, 3, 4
2, 3, 1, 4
The transition probabilities are as follows:

p_1(1) = .5     p_1(2 | 1) = .3     p_1(3 | 2) = .9
p_1(2) = .5     p_1(3 | 1) = .6     p_1(4 | 2) = .1
p_1(3) = 0.0    p_1(4 | 1) = .1     p_1(4 | 3) = 1.0

p_2(1) = 1.0    p_2(2 | 1) = 1.0    p_2(3 | 2) = .9
p_2(2) = 0.0    p_2(3 | 1) = 0.0    p_2(4 | 2) = .1
p_2(3) = 0.0    p_2(4 | 1) = 0.0    p_2(4 | 3) = 1.0

These sequences and probabilities can be used to design
the sequential machines shown in the state transition
diagrams in Fig. 7. By inspection of the state transition
diagrams it can be seen that the range for S1 consists

of the strings

1234
124
134
14
24
234

The measure associated with S1 is then seen to be

p_1(1234) = .5 × .3 × .9 × 1.0 = .135
p_1(124) = .5 × .3 × .1 = .015
p_1(134) = .5 × .6 × 1.0 = .3
p_1(14) = .5 × .1 = .05
p_1(24) = .5 × .1 = .05
p_1(234) = .5 × .9 × 1.0 = .45

In the same manner, the range for S2 is

234
2314

The measure associated with S2 is

p_2(234) = 1.0 × 1.0 × .1 = .1
p_2(2314) = 1.0 × 1.0 × .9 × 1.0 = .9

Note that the ranges for S1 and S2 overlap in that 234 is common to both. But 234 will be put in class 1 since p_1(234) > p_2(234).
There is a subtle point about the application of the Markov chain. While the strings in the range of a generalized prototype string are assumed to be produced as a result of a Markov process, the strings themselves are not Markov. The next example will illustrate this point.

Example 3. Consider the generalized prototype string shown in Fig. 8 (sequential machines will be used to define the Si from this point on since the notation is more compact). The range of this generalized prototype string is

1324
134
234
2324

(here, A is the same as in Example 2). If these strings were produced by a Markov process, then p(4 | 13) = p(4 | 23) and p(2 | 13) = p(2 | 23). But, from the figure it can be seen that this is not the case. For p(4 | 13) = .3 and p(4 | 23) = .6. Also, p(2 | 13) = .7 and p(2 | 23) = .4. Thus, while the range of S is produced by a Markov process on the states of the sequential machine, the resulting strings in the range are not Markov.

Figure 8-Example of prototype string whose range is not Markov

TABLE I-Example of Table for Sequential Structure (original strings)

Unfortunately, the method for learning the Si is
not as formal as the preceding. The prototype strings
were determined from the tables of sample strings. For
example, consider the sample strings in Table 1. Each
string has been listed in the table so that each column
contains only one phoneme. The strings have been arranged so that the order of the phonemes is unchanged
and the number of columns is minimized. These tables
were formed by a manual procedure, and at present
this procedure cannot be written as a sequence of steps.
Once these tables have been determined, the prototype
sequences can be defined. Each of the columns in the
tables contains a distinct phoneme. These phonemes
are taken to be the prototype string that generates
Si for each class i. This is best described through the
use of an example. Consider the table for "one" in Table II. For this case

A = {1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 22}.
The prototype string that generates S1 for the class
"one" is 10, 22, 20, 17, 16, 11, 17, 11, 1, 15, 5, 3, 1, 15, 3.
TABLE II-Samples of Output Strings for "One"

Crude estimates of the probabilities can also be made by counting the transitions between states as defined by rows in the table. Using these probabilities and the above prototype string, we have the generalized prototype string for "one" in Figure 9.

Figure 9-State transition diagram for "ONE"
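The counting estimate just mentioned can be stated as a small routine. The Python sketch below assumes each sample string has already been aligned against the table, i.e., rewritten as its sequence of column positions ending with the terminator m + 1; the names are invented for the illustration.

from collections import Counter

def estimate_probabilities(paths):
    # paths: lists of column positions, one per sample string, each ending
    # with the terminator. Returns crude relative-frequency estimates of
    # the starting probabilities p(i) and transitions p(j | i).
    starts, trans, leaves = Counter(), Counter(), Counter()
    for path in paths:
        starts[path[0]] += 1
        for a, b in zip(path, path[1:]):
            trans[(b, a)] += 1               # position j = b follows i = a
            leaves[a] += 1
    n = len(paths)
    p_start = {i: c / n for i, c in starts.items()}
    p_trans = {(j, i): c / leaves[i] for (j, i), c in trans.items()}
    return p_start, p_trans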
A computer program was written that simulated the
entire recognition system. The program accepted isolated words, computed the phoneme strings, and implemented the sequential machines. The sequences in the
training set were recognized using the sequential machines. Using this method, the recognition rate was one
hundred percent for the 250 patterns in the training set.
In order to further demonstrate the power of the
algorithm, the recognition was run using a restricted
system. The association of probabilities with state transitions was removed. Recall that the calculated probabilities for an input pattern string were used only if
more than one machine accepted the string. In order
to provide a more severe test for the concept of using
sequential machines, the use of the calculated probabilities was dispensed with. In this case, only one
pattern from the 250 was accepted by more than one
machine. One sample pattern for "three" was accepted
by both the machine for "three" and "two". Thus,
under these circumstances, the sequential machines
performed most satisfactorily.
It was desirable to continue to simplify the algorithm
in order to see when the performance began to degenerate. Therefore, the following additional simplification
was tried. The assumption that each input string had
phoneme M + 1 in the terminal position was dropped.
Using the same definition of acceptance, this implies that a sequential machine will accept an input string
even if the machine is not driven to its final state.
It is stressed here that these changes in the algorithm
were not made to try to improve performance, but were
made to see how far the algorithm could be
degraded and still achieve good results.
Recognition was attempted using the strings without
the terminal symbol described above. The sequential


machines with no probabilities were used. If more than
one machine accepted a pattern, then it was assumed
to be an error. Under these conditions, the error rate
was 4%. That is, there were ten error patterns out of
the 250 presented to the system. The confusion matrix
for this test is given in the table below.

[Confusion matrix: pattern class 1-9, 0 (rows) versus decision class 1-9, 0 (columns), 25 utterances per class]

This algorithm represents considerable simplification of the full algorithm. Under these conditions the error rate was still low. We can conclude that the potential
of the complete algorithm is such that further work is
highly desirable.
The conclusions that can be made from this section
are as follows. It has been seen that sequential feature
extraction has considerable utility for use in the two
stage recognition procedure presented here. The structure produces phoneme strings that can easily be used
to design the sequential machine for the second stage
of recognition. The recognition results in the second
stage were encouraging. The initial motivation for development of the second stage recognition was to
demonstrate the capabilities of the machine phonetic
structure. However, the experimental results indicate
that this has potential for solving recognition problems
independent of the sequential feature extraction.
CONCLUSIONS
In this study we have presented a means of detecting
sequential structures in waveforms for recognition. This


process is called sequential feature extraction. The
waveforms were assumed to be produced by a random
process that was unknown. A learning algorithm that
automatically generated a structure for sequential feature extraction was presented. This learning rule was
unsupervised, and was shown to be a generalization of
previously unsupervised learning rules for the time
invariant case.
It was shown that sequential feature extraction could
be considered as a transformation T_s from W to the set of all finite sequences of symbols from a set A called the phoneme set. A structure on T_s was developed so that the transformation was dependent on
a set of parameters s. This allowed us to find an s
that was optimum with respect to a performance
function.
This algorithm was applied to a problem in speech
recognition. Experimental results were given that
showed interesting relationships between the standard
phonetic structure and the structure developed by sequential feature extraction. It was concluded that the
automatically developed structure was related to the
linguistic structure, but that there were significant
differences due to the continuously time changing
character of speech.
A new concept called the generalized prototype string
was presented. This was a generalization to the probabilistic case of the method of storage of prototype
strings. Each generalized prototype string was seen to
be a means of approximating the probability measures
on P. Once the generalized prototype strings were
found, it was possible to design sequential machines
for recognition. These sequential machines were seen
to implement the calculation of estimates of the individual pattern class probabilities Pi(q), where q E P.
Using these probabilities, the pattern could be classified
according to Bayes decision theory.
The sequential feature extraction presented in this
study represents a new approach to waveform recognition and unsupervised learning. For the case of
speech, it was seen that sequential feature extraction
was related to a phonetic structure. While phonetic
structures are not new, the concept of using unsupervised learning to automatically develop a phonetic
structure is new. The sequential structure in speech or
other types of waveforms can be detected by using this
algorithm. Because the algorithm is automatic, there
is no bias due to previous results from linguistics. This
is a particular advantage if the algorithm is to be
applied to applications other than speech.
The restriction of the unsupervised learning algorithm
to the time invariant case showed that the algorithm
had advantages over current methods. No knowledge
of the probability measures is required, strict separability of clusters is not required, the class of allowed
metrics is large, and there is no requirement that the
sample patterns be stored for processing since the
algorithm will accept patterns one at a time.

REFERENCES
1 B P LATHI
Signals, systems and communication
Wiley New York 1965
2 S C FRALICK
Learning to recognize patterns without a teacher
IEEE Trans Inf Th Vol 13 pp 57-64 January 1967
3 D B COOPER P W COOPER
Adaptive pattern recognition without supervision
Proc IEEE International Convention 1964
4 E A PATRICK J C HANCOCK
Nonsupervised sequential classification and recognition of
patterns
IEEE Trans on Inf Th Vol 12 July 1966
5 C G HILBORN D G LAINIOTIS
Optimal unsupervised learning multicategory dependent
hypothesis pattern recognition
IEEE Trans Inf Th Vol 14 May 1968
6 E M BRAVERMAN
The method of potential functions in the problem of training
machines to recognize patterns without a teacher
Automation and Remote Control Vol 27 October 1966
7 G H BALL D J HALL
ISODATA-An iterative method of multivariate analysis and
pattern classification

International Communications Conference Philadelphia
Pennsylvania June 1966
8 J GAZDAG
A method of decoding speech
University of Illinois AD 641 132 June 1966
9 K W OTTEN
Simulation and evaluation of phonetic speech recognition
techniques-Vol III Acoustical characteristics of speech sounds
systematically arranged in the form of tables
NCR Company AD 601422 March 1964
10 I LEHISTE
Acoustical characteristics of selected English consonants
Int J Am Linguistics Vol 30 July 1964
11 W F MEEKER A L NELSON P B SCOTT
Voice to teletype code converter research program. Part II-Experimental verification of a method to recognize phonetic
sounds
Technical Report ASD-TR61-666 Part II AD 288099
September 1962
12 K W OTTEN
Simulation and evaluation of phonetic speech recognition
techniques Vol II-Segmentation of continuous speech into phonemes
NCR Company AD 601423 March 1964
13 J L FLANAGAN
Speech analysis, synthesis and perception
Springer-Verlag New York 1965
14 M KABRISKY
A proposed model for visual information processing in the
human brain
University of Illinois Press Urbana Illinois 1966
15 J S KENYON T A KNOTT
A pronouncing dictionary of American English
G & C Merriam Springfield Massachusetts 1953

Pulse-Amplitude Transmission System (PATSY)*
by NEAL L. WALTERS
IBM Corporation
Research Triangle Park, North Carolina

SUMMARY

A new type of pulse-amplitude transmission system (PATSY) has been developed to transmit numeric data asynchronously over relatively short distances. The "transmitting station" requires no power or data set. As many as 32 transmitting stations can be attached to a single "receiving station." Other advantages are simplicity, flexibility, and low cost.
The low-speed system has similarities to a Touch-Tone** telephone, using resistors instead of tone oscillators. Resistors are less expensive and may be more reliable. The adapter in the sending station operates similarly to an ohmmeter which reads resistances in either polarity.

INTRODUCTION

A new type of pulse-amplitude transmission has been developed to transmit numeric data asynchronously at low rates* over short distances (up to 1,000 feet). The transmission is unique in that the terminating "data entry unit" requires no power or data set. Other advantages are two-wire lines, which are easy to install and change, a small amount and simplicity of hardware, and low cost.
The pulse-amplitude transmission system (PATSY) is being used to transmit data on personnel and machine performance, materials, job status, etc., between data entry units and area stations. Up to 32 data entry units can be attached to one area station.
The data entry units are polled (scanned) by the area stations, at a rate of 250 units a second, to determine when one is ready to send data. Line turnaround is relatively unimportant because lines are short and data flow is primarily one-way (from the terminating data entry units into the area stations). The only transmissions out to the data entry units are a signal to start the unit reading, when the scanner in an area station finds a data entry unit requiring service, and an acknowledgment of a data entry indicating whether it has been satisfactorily received.
The low-speed system has similarities to a Touch-Tone telephone except that resistors have been substituted for the tone oscillators (Figure 2). Resistor-diode circuits are less expensive and may be more reliable than oscillator circuits. They are arranged, with two diodes, into two banks. By sampling with alternate positive and negative pulses, one particular resistance value (and, thus, character) out of the character set can be identified. Alternate pulsing makes it possible to use half as many resistors with twice the separation between values, and thus obtain more reliable identification.
Therefore, the operation of the PATSY adapter in the area station is similar to that of an ohmmeter, capable of reading resistances in either polarity. When a data entry unit is ready, it informs the area station by placing a short circuit on its pair of wires.
The area station then sends a voltage pulse as a start command to the data entry unit, which begins placing combinations of resistances on the wires, interspersed with "open-circuit" periods. The area station uses the open-circuit condition to provide timing information between characters, and reads the resistance levels to determine what characters are being transmitted. When the transmission has been completed, the area station sends another voltage pulse to the data entry unit to indicate that the message has been received.

* PATSY is not a pulse-amplitude transmission system in the sense that the transmitting station sends pulses to the receiving station. However, the data appearing on the transmission lines takes the form of amplitude-encoded pulses.
** Trademark of Bell Telephone Company.
* At 40 cps in the IBM 2790 Communication System for which it was developed (Figure 1), but could be used for higher rates.


Figure 1-IBM 2790 communication system

THE PATSY INTERFACE

To distinguish between the two wires connecting the Data Entry Unit and the PATSY adapter, one wire will be called the "high line" and the other the "low line." Values of resistance placed across the lines by the Data Entry Unit will be referred to as follows: resistances measured with the high line positive with respect to the low line will be labeled "polarity 1" (P1) and resistances measured with opposite polarity will be labeled "polarity 2" (P2). There are five resistance levels in the PATSY code. They are: R0, an open circuit; R1, R2, and R3 in order of descending resistance; and R4, a short circuit. Transmission over PATSY occurs in the sequence shown in the following paragraphs.

Figure 2-Examples of PATSY characters

Inactive (used for diagnostic testing)

In the inactive state, a data entry unit will have R3 with polarity 2 across the lines. The polarity 1 resistor is not used, but the polarity 2 resistance allows the system to check the wiring and connections to the unit in a diagnostic-test mode by having the adapter read that P2 resistance. In this way, shorted or open lines and missing units can be detected.

Request for service

When a data entry unit is ready to send data, it makes a request for service by an R4, P1. This resistor must remain on the line until sometime after the start command. The P2 impedance is not read.

Start command

The start command sent by the PATSY adapter to the data entry unit is a P2 voltage pulse; that is, the low line is positive with respect to the high line. Pulse amplitude is 30 volts with a duration of approximately 60 milliseconds, and it is able to drive a 100-ohm load.
Data transmission

The data entry unit is required to provide a P1 open circuit continuously for a minimum of three milliseconds preceding each character. For the first character, this open-circuit time begins after the start command pulse has finished. There is no limit on how long an open circuit can last (within the constraints of the time-out criteria of the particular program controlling the system).
After the open-circuit time, the data entry unit places the P1 resistance on the line (which will be in the normal high-line positive state for open-circuit time and the first part of the character reading). After approximately five milliseconds, the polarity on the line will be reversed by the adapter and the P2 resistance will be read. After six more milliseconds, the line will be reversed to wait for the next P1 open-circuit condition.
These two resistances, of opposite polarity, constitute a character. They may be placed across the line at the same time and left across the line for a minimum of 12 milliseconds. The maximum time that they can be left on the line is only limited by system time-out constraints. Opening the circuit by removing the P1 re-

TABLE 1-2790 5-Level PATSY Code
(rows: 1st polarity reading, P1; columns: 2nd polarity reading, P2)

P1 \ P2    R0      R1      R2      R3      R4
R1         *       *       =       *       *
R2         5       *       4       7       6
R3         1       -       Space   ETX     2
R4         9       *       0       3       8

* Unassigned

Opening the circuit by removing the P1 resistor indicates to the receiving station that character transmission has been completed.

Table 1 shows the coding used for PATSY characters. The rows represent the P1 half of a character, the columns the P2 half. R0 is illegal as the P1 half of a character since a P1 open circuit is reserved for the timing information used to separate characters. The 10 numbers, dash, equal sign, space and ETX (end of text) are used, leaving six unassigned characters.


End request

The particular combination of resistances, R3, with polarities 1 and 2, constitutes the end request from a data entry unit, and is sent as the last data character. It is not necessary that the end request be followed by an open circuit. When the PATSY ETX character is detected, reading ceases and the receiving station waits for instructions from the system controller on how to answer the data entry unit.

A normal end command will return the data entry unit to its inactive state and is the same as the start command except that it lasts twice as long. An error end command has the same amplitude and duration as the normal end command but has the opposite polarity. The error end command inhibits further operation of the data entry unit until the operator resets the terminal. He is informed of the error by a red button popping up.
RESISTANCE READING

Resistance values are sensed by the circuit shown in Figure 3. The plus voltage is used as the transmission voltage as well as the circuit supply. In the block diagram (Figure 3a), the reference resistors of block 3 provide the high and low parts of a voltage divider with the data-entry-unit resistor plus wire and other unknown impedances. The matched 200-ohm reference resistors were chosen for their relatively low impedance, which minimizes noise pickup, and to keep the line balanced. The polarity switch and low-pass filter are omitted from Figure 3b for simplicity.
This combination of impedances produces a difference voltage, Vd, which is the input to a differential amplifier feeding a set of threshold detectors (θ1, θ2, etc.). The threshold values θ1, θ2, etc., are determined by the transmission-resistor values. (See Appendix.) The number of threshold detectors is the same as the number of discrete resistor values to be read, and hence determines the size of the character set. In addition, threshold detector one indicates the open-circuit condition, since any resistance too large to satisfy the first threshold is defined as an open circuit.
On the data-entry-unit side of the reference impedances are a polarity switch and a low-pass filter which rolls off at 600 Hz, about ten times the maximum character rate. The purpose of the switch is to read the data-entry-unit resistance value in both directions. In block one of Figure 3a, normally point a is connected to b, and c to d, but the connection can be reversed so that point c is connected to b, and a to d, all under control of the logic. Point b on the polarity switch is always positive with respect to point d, which simplifies the low-pass filter and difference amplifier.
As stated previously, an uninterrupted period when
there is a positive open circuit must precede each
character. When this open circuit is ended, a reading
sequence begins under logic timing control. Table II
shows the steps in reading a character.
Figure 4 shows the waveforms on the high and low
lines for a complete message transmission from a data
entry unit to a PATSY adapter.

_"',.
Figure 4-High-and low-line PATSY waveforms

POLLING AND LINE MULTIPLEXING OF
DATA ENTRY UNITS
The polling and line configuration is shown in Figure
5. Up to 32 data entry units can be attached to a single
PATSY adapter. There are two groups of multiplex
switches, the high-line and low-line switches.
For polling (scanning) data entry units, all the high-line switches are closed and all the low-line switches are open. The resistance-sensing circuit is used in an unbalanced mode since there is no connection to the low-line input. Each low line has a scanning transistor. When one of these transistors is turned on, a current path exists from the high line of the resistance-reading circuit, through the particular data entry unit being polled, and to ground through the scanning transistor. When a data entry unit desires service, it places a P1 short circuit on its wires. When that data entry unit is polled, the voltage on the high line of the resistance-reading circuit is pulled down toward ground, making Vd approximately equal to 0, satisfying the lowest threshold and indicating a request.

Figure 5-Polling and addressing of data entry units
Polling is under the control of a binary counter. When polling, the counter is advanced by clock pulses every four milliseconds. A counter decode is used to turn on the scanning transistors sequentially, 250 data entry units being scanned each second. When a request for service is detected, the clock pulses are degated from the counter for the duration of the time that that data entry unit is being read. Thus, the binary number in the counter becomes the data entry unit address, and the transmission of an address by the data entry unit is unnecessary. The identification of the transmitting data entry unit by the pair of wires to which it is attached is a novel feature in PATSY.
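The following Python fragment is an illustrative software model of this addressing scheme (the real adapter implements it with a hardware counter and decode); the function name and the request_pending callback are hypothetical:

# Model of PATSY polling: a counter advanced every 4 ms selects one
# scanning transistor; when the polled unit shows a P1 short (a request),
# the clock is degated and the counter value itself is the unit's address.
def poll(request_pending, counter=0):
    """Scan the 32 units until one requests service; return its address."""
    while True:
        if request_pending(counter):        # P1 short circuit sensed
            return counter                  # clock degated; counter = address
        counter = (counter + 1) % 32        # advance every 4 ms (250 polls/s)

# Example: unit 17 has cocked its commutator and shorted its pair of wires.
print(poll(lambda addr: addr == 17))  # -> 17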
When a request is detected, polling ceases and the scanning transistors are no longer used. However, the decode now is used to close the proper high-line and low-line switches between the data entry unit and the receiving station to provide a dedicated connection.

Figure 6 shows the location of the 30-volt switch circuits which provide the Start Command to initiate a transmission and the End Command to terminate a transmission. The resistance-reading circuit is disconnected from the lines when the 30 volts is applied. Identical circuits attached to both the high and low lines allow the 30 volts of either polarity to be applied to the data entry unit.
TRANSMITTING STATION

PATSY permits the attachment of a wide variety of data entry units to an adapter in a data collection system. The data entry units can be either mechanical, using brush commutators, or employ solid-state technology such as silicon-controlled rectifiers (SCR's). There is enough power in the start and answer pulses to pick relays or energize solenoids. In addition, there is no restriction on how slowly the data can be transmitted, permitting live keyboards to be intermixed with fixed-rate devices on the same adapter. In addition, the receiving station is transparent to message length. The first character transmitted is a transaction code, telling the system what type of message will be transmitted. The second character tells the system what type of data entry unit is sending, indicating the message length to expect.
Figure 7 is a schematic of a mechanical data entry unit used in the IBM 2790 Communication System. A brush commutator is on the left; a card/badge reader in matrix form is on the right. When a card or badge is to be read, there is one electrical connection made on each row. The commutator is cocked in the request position. When the receiving station sends a start command, the magnet is picked, releasing the commutator to scan through the rows, alternately placing the character resistors and an open circuit across the lines. The commutator is spring powered. Note the simplicity of the transmission components required to transmit the 12-character set of this station: six resistors and 15 diodes.

Figure 6-30-volt start and answer circuits

Figure 7-Data entry unit schematic


APPENDIX-SELECTION OF PATSY TRANSMISSION RESISTOR VALUES

Figure 8-Voltage band for VD

Figure 10-Placement of threshold levels

Neglecting the effects of the polarity switch, the circuit used to read PATSY transmissions is shown in Figure 3b, where RREF = 200 ohms, RPATSY is the transmission-resistor value, and RSERIES is all other resistance, i.e., wire resistance, diode drop, etc. Worst-case VD maximum and minimum can be expressed as:

V̄D = +Ecc (R̄PATSY + R̄SERIES) / (2R̲REF + R̄PATSY + R̄SERIES)   (1)

V̲D = +Ecc (R̲PATSY + R̲SERIES) / (2R̄REF + R̲PATSY + R̲SERIES)   (2)

where underlined values are minimums and overlined values are maximums. The plot of V̄D and V̲D versus RPATSY yields the curves shown in Figure 8. The PATSY transmission-resistor values were chosen to give the largest difference between the V̲D associated with one resistor value and the V̄D associated with the next smallest resistor value. The plot in Figure 9 shows the PATSY values on the VD plot.

Figure 9-Location of PATSY resistors in voltage band
The voltage bands VR1, VR2, etc., are separated by guard-band voltages of approximately 1.5 volts. When one of the PATSY resistors is being read, the difference voltage VD will fall into one of the voltage bands VRn. The threshold voltages θ of the resistance-reading circuit are then placed in the guard bands as shown in Figure 10.

The threshold voltages are placed below the center of the guard bands because of the nature of the reading. When a resistance is being read, VD is sampled for a period of three milliseconds, and the lowest threshold which is crossed in that time is considered to be the reading. Thus, the thresholds are offset to allow more noise rejection.

Also, it can be seen from the above diagram that four thresholds can be used to detect five levels, where θ1 indicates R1 or lower if crossed and R0 if not crossed.
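The appendix procedure can be tried numerically. The following sketch assumes illustrative values for Ecc, RSERIES, the nominal resistor levels, and a 5 percent tolerance; none of these figures are taken from the paper:

# Compute VD bands from resistor tolerances and place each threshold in
# the guard band between adjacent bands, offset below center as the text
# describes. All numeric values here are assumptions for illustration.
def vd(r_patsy, r_series=5.0, r_ref=200.0, ecc=12.0):
    """Divider voltage: Ecc across RREF + (RPATSY + RSERIES) + RREF."""
    r_mid = r_patsy + r_series
    return ecc * r_mid / (2.0 * r_ref + r_mid)

nominal = {"R4": 0.0, "R3": 100.0, "R2": 300.0, "R1": 900.0}  # assumed ohms
tol = 0.05                                                    # assumed 5%

bands = {name: (vd(r * (1 - tol)), vd(r * (1 + tol)))         # (min, max) VD
         for name, r in nominal.items()}

order = ["R4", "R3", "R2", "R1"]
for low, high in zip(order, order[1:]):
    gap_lo, gap_hi = bands[low][1], bands[high][0]            # the guard band
    theta = gap_lo + 0.4 * (gap_hi - gap_lo)                  # below center
    print(f"threshold between {low} and {high}: {theta:.2f} V")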

Termination of programs represented
as interpreted graphs *
by ZOHAR MANNA**
Carnegie-Mellon University
Pittsburgh, Pennsylvania

INTRODUCTION

This work is concerned with the termination problem of interpreted graphs. An interpreted graph can be considered as an abstract model of algorithms; it consists of a directed graph, where:

1. With each vertex v, there is associated a domain Dv, and
2. With each arc a leading from vertex v to vertex v', there is associated a total test predicate Pa (Dv → {T, F}) and a total function fa (Dv* → Dv'), where Dv* = {x | x ∈ Dv ∧ Pa(x) = T}.

Let us represent by a state vector x the current values of the variables during an execution of an interpreted graph IG. An execution of IG may start from any vertex v with any initial state vector x0 ∈ Dv. If during execution we reach vertex v with state vector x, Pa(x) represents the condition that arc a (leading from v) may be entered, and fa represents the operation of changing the state vector x to fa(x) when control moves along arc a. Execution will halt on vertex v, with state vector x, if and only if no predicate on any arc leading from v is true for x. An interpreted graph terminates if and only if all the executions of IG terminate.

Our main result is a sufficient condition for the termination of interpreted graphs defined by means of well-ordered sets. This result has applications in proving the termination of various classes of algorithms. Floyd1 has discussed the use of well-ordered sets for proving the termination of programs.

* This work is based on the author's PhD Thesis.2 The work was supported by the Advanced Research Projects Agency of the Office of the Secretary of Defense (SD-146).
** Present address: Computer Science Department, Stanford University, Stanford, California.

WELL-ORDERED SETS

A pair (S, ≻) is called an ordered set, provided that S is a set and ≻ is a relation defined for every pair of distinct elements a and b of S, and satisfies the following two conditions:

1. If a ≠ b, then either a ≻ b or b ≻ a;
2. If a ≻ b and b ≻ c, then a ≻ c (i.e., the relation is transitive).

A well-ordered set W is an ordered set (S, ≻) in which every non-empty subset has a first element; equivalently, in which every decreasing sequence of elements a ≻ b ≻ c ··· has only finitely many elements. For example,

1. I1+ (the set of all non-negative integers) is well-ordered by its natural order, i.e., {0, 1, 2, 3, ...}.
2. In+ (the set of all n-tuples of non-negative integers for some fixed n, n ≥ 1) is well-ordered by the usual lexicographic order, i.e., (a1, a2, ..., an) ≻ (b1, b2, ..., bn) if and only if a1 = b1, a2 = b2, ..., ak−1 = bk−1, ak > bk for some k, 1 ≤ k ≤ n.
DIRECTED GRAPHS
A directed graph G (graph, for short) is an ordered triple (V, L, A) where:

1. V is a non-empty finite set of elements called the vertices of G;
2. L is a non-empty set of elements called the labels of G; and
3. A is a non-empty set of ordered triples (v, l, v') ∈ V × L × V called the arcs of G.

Note that L and A may be infinite sets. If L and A are finite sets, G is called a finite directed graph.


A finite path of a graph G (path, for short) is a finite sequence of n, n ≥ 1, arcs of G of the form

(vi1, l1, vi2), (vi2, l2, vi3), ..., (vin, ln, vin+1).

We say that:

1. The path joins the vertices vi1 and vin+1 and meets the vertices vi1, vi2, ..., vin+1.
2. The path is elementary if the vertices vi1, vi2, ..., vin+1 are distinct.
3. The path is a cycle if the vertex vi1 coincides with the vertex vin+1; it is an elementary cycle if in addition the vertices vi1, vi2, ..., vin are distinct.

We define a cut set of a graph G as a set of vertices having the property that every cycle of G meets at least one vertex of the set.

A graph G is said to be strongly connected if there is a path joining any ordered pair of distinct vertices of G.

Let G be a graph (V, L, A). We define a subgraph G1 = (V1, L, A1) of G as the triple consisting of V1, L and A1, where V1 is a subset of V and A1 is defined by A1 = A ∩ (V1 × L × V1).

A subgraph G1 = (V1, L, A1) of G is said to be a strongly connected component of G if:

1. G1 is strongly connected, and
2. For all subsets V2 ⊆ V such that V2 ≠ V1 and V2 ⊃ V1, the subgraph G2 = (V2, L, A2) is not strongly connected.

A tree T = (V, L, A, r) is a directed graph (V, L, A) with a distinguished root r ∈ V, such that for every v ∈ V (v ≠ r) there is at least one path joining r and v.*

* Note that the standard definition of a tree has the restriction that for every v ∈ V (v ≠ r) there must be exactly one path joining r and v.

INTERPRETED GRAPHS

An interpreted graph IG consists of a directed graph (V, L, A), where

1. With each vertex v ∈ V, there is associated a domain Dv, and
2. With each arc a = (v, l, v') ∈ A, there is associated a total test predicate Pa which maps Dv into {T, F}, and a total function fa which maps Dv* into Dv', where Dv* = {x | x ∈ Dv ∧ Pa(x) = T}.

Let (v0, x0) ∈ V × Dv0 be an arbitrary pair of an interpreted graph IG. A (v0, x0)-execution sequence of IG is a (finite or infinite) sequence of the form

(v(0), x(0)) →l(0) (v(1), x(1)) →l(1) (v(2), x(2)) →l(2) ···,

where

1. v(j) ∈ V, l(j) ∈ L and x(j) ∈ Dv(j) for all j ≥ 0;
2. (v(0), x(0)) is (v0, x0);
3. If (v(j), x(j)) →l(j) (v(j+1), x(j+1)) is in the sequence, then there exists an arc a = (v(j), l(j), v(j+1)) ∈ A such that Pa x(j) = T and fa x(j) = x(j+1);
4. If the sequence is finite and the last pair in the sequence is (v(n), x(n)), then for all arcs a ∈ A leading from v(n): Pa x(n) = F.

The definition of an interpreted graph IG allows the existence of a vertex v ∈ V, an x ∈ Dv, and two distinct arcs a, b ∈ A leading from v such that both Pa x = T and Pb x = T; i.e., the predicates on all arcs leading from the vertex v are not necessarily mutually exclusive. It follows that for a fixed pair (v0, x0) ∈ V × Dv0, there may exist many distinct (v0, x0)-execution sequences of IG. For this reason, the execution process of an interpreted graph, starting with the pair (v0, x0), is best described by a tree.

The execution tree T(v0, x0) of IG is the tree (V', L, A', (v0, x0)), where:

1. The set of vertices V' is the set of all pairs (v, x) ∈ V × Dv occurring in some (v0, x0)-execution sequence of IG;
2. L is the set of labels of IG;
3. The set of arcs A' is the set of all triples ((v, x), l, (v', y)) ∈ V' × L × V', such that (v, x) →l (v', y) occurs in some (v0, x0)-execution sequence of IG; and
4. (v0, x0) ∈ V' is the root vertex of the tree.

Example

Let us consider the interpreted graph IG* (Figure 1), where Dv1 = Dv2 = {the integers}. There are three (v1, −4)-execution sequences of IG*, namely

··· → (v2, 2) → (v1, −1) → (v2, 1) → (v1, 0).

The execution tree T(v1, −4) of IG* is presented in Figure 2.

Figure 1-The interpreted graph IG*
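As an illustration of these definitions, the following Python sketch executes a small interpreted graph encoded as (v, l, v', Pa, fa) tuples; both the encoding and the one-vertex toy graph are our own, not Manna's IG*:

import random

# Arcs of a toy interpreted graph on one vertex v: x decreases while positive.
ARCS = [
    ("v", "a", "v", lambda x: x > 0, lambda x: x - 1),
]

def execute(v, x, arcs=ARCS):
    """Produce one (v, x)-execution sequence; halt when no predicate holds."""
    seq = [(v, x)]
    while True:
        enabled = [(l, f, v2) for (v1, l, v2, p, f) in arcs
                   if v1 == v and p(x)]
        if not enabled:                   # condition 4: no arc leads on
            return seq
        l, f, v = random.choice(enabled)  # predicates need not be exclusive
        x = f(x)
        seq.append((v, x))

print(execute("v", 3))  # [('v', 3), ('v', 2), ('v', 1), ('v', 0)]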

Definition

An interpreted graph is said to terminate if all its execution sequences are finite, i.e., for every pair (v0, x0) ∈ V × Dv0 all the (v0, x0)-execution sequences are finite.

Notations

Let α = (a1, a2, ..., aq), where aj = (v(j), l(j), v(j+1)) ∈ A for 1 ≤ j ≤ q, be any path of an interpreted graph. Then let

1. fα(x) stand for faq(···(fa2(fa1(x)))···), and
2. Pα(x) stand for

x ∈ Dv(1) ∧ Pa1(x) ∧ ⋀ (j = 2 to q) Paj(faj−1(faj−2(···(fa2(fa1(x)))···)))

TERMINATION OF INTERPRETED GRAPHS

Theorem 1

Let IG be an interpreted graph. If there exist:

1. A cut set V* of the vertices V of IG, and
2. For every vertex v ∈ V*, a well-ordered set Wv = (Sv, ≻v) and a total function Fv which maps Dv into Sv,

such that

3. For every cycle α of IG:

v(1) →l(1) v(2) →l(2) v(3) ··· v(q−1) →l(q−1) v(q) →l(q) v(1)

(where v(1) ∈ V* and v(k) ≠ v(1) for all 1 < k ≤ q), and for every x such that Pα(x) = T:

Fv(1)(x) ≻v(1) Fv(1)(fα(x)),

then IG terminates.*

* In Manna2 it is proved, by the use of König's Infinity Lemma, that if IG consists of a finite directed graph, then this is also a necessary condition for the termination of IG.

Proof

Proof by contradiction. Let us assume that IG does not terminate, i.e., there exists an infinite execution sequence γ in IG,

γ: (v(0), x(0)) →l(0) (v(1), x(1)) →l(1) (v(2), x(2)) →l(2) ···.

Let γ' be the infinite path

γ': v(0) →l(0) v(1) →l(1) v(2) →l(2) ···.

Since IG, by definition, contains a finite set of vertices, and V* is a cut set, it follows that there exists a vertex v* ∈ V* that occurs infinitely many times in γ'. Let v(n1), v(n2), v(n3), ··· (0 ≤ nj < nj+1 for j ≥ 1) be the infinite sequence of all occurrences of the vertex v* in γ'. Therefore, the infinite execution sequence γ can be written as

γ: (v(0), x(0)) →l(0) ··· (v(n1), x(n1)) →l(n1) ··· (v(n2), x(n2)) →l(n2) ···.

Then, by condition (3) it follows that

Fv*(x(n1)) ≻v* Fv*(x(n2)) ≻v* Fv*(x(n3)) ≻v* ···,

i.e., there is an infinite decreasing sequence in Wv*. But this contradicts the fact that Wv* is a well-ordered set. q.e.d.

The following corollaries follow directly from Theorem 1.

Corollary 1

Let IG be an interpreted graph. If there exist:

1. A cut set V* of the vertices V of IG,
2. A well-ordered set W = (S, ≻), and
3. For every vertex v ∈ V*, a total function Fv that maps Dv into S,

such that

4. For every elementary path α of IG:

v(1) →l(1) v(2) →l(2) v(3) ··· v(q−1) →l(q−1) v(q)

(where v(1), v(q) ∈ V* and v(j) ∉ V* for all j, 1 < j < q), and for every x such that Pα(x) = T:

Fv(1)(x) ≻ Fv(q)(fα(x)),

then IG terminates.

Corollary 2

Let IG be an interpreted graph which has a vertex v* common to all its (elementary) cycles. If there exist a well-ordered set W = (S, ≻) and a total function F which maps Dv* into S, such that for every elementary cycle α: v* → ··· → v* and for every x such that Pα(x) = T: F(x) ≻ F(fα(x)), then IG terminates.

Definition

Let IG be an interpreted graph constructed from the directed graph G. Then a strongly connected component IG' of IG consists of a strongly connected component G' = (V', L, A') of G, where:

1. With each vertex v ∈ V', there is associated the domain Dv of IG, and
2. With each arc a ∈ A', there is associated the test predicate Pa and the function fa of IG.

Theorem 2

An interpreted graph IG terminates if and only if all its strongly connected components terminate.

Proof

(⇒) Follows directly from the definition of termination of interpreted graphs.

(⇐) Proof by contradiction. Let us assume that IG does not terminate, i.e., there exists an infinite execution sequence γ in IG,

γ: (v(0), x(0)) →l(0) (v(1), x(1)) →l(1) (v(2), x(2)) →l(2) ···.

Let γ' be the infinite path

γ': v(0) →l(0) v(1) →l(1) v(2) →l(2) ···.

Since IG, by definition, contains a finite set of vertices, it follows that there exist finitely many vertices of G that meet γ' only a finite number of times. Let v(n1), v(n2), ..., v(nq) (0 ≤ nj < nj+1 for 1 ≤ j < q) be the list of their occurrences in γ'. It follows that all the vertices v(j) (j > nq) of γ' are in some strongly connected component G' of G. This implies that there exists a strongly connected component IG' of IG such that the infinite subsequence of γ

(v(nq+1), x(nq+1)) →l(nq+1) (v(nq+2), x(nq+2)) →l(nq+2) ···

is an infinite execution sequence of IG', i.e., IG' does not terminate. Contradiction. q.e.d.

APPLICATIONS

The results of the preceding section can be used for proving the termination of various classes of algorithms. In this section we shall illustrate the use of the results for proving the termination of programs and recursively defined functions.


+ - - - - 0 : { U~ ks n)
1 A (k, n integers)

a
(i, j, k)-(k + 1, j, k)

USkSn_l)
A (2s is n + 1)

!

r.:-- --,D-D·
akk~J
,_____

A (i, k, nintegers)

0, j, k)+-O + 1, j, k)
d

(lsksn-1)
--Q2: 1\ (2sisn + 1)
{ 1\ (i, k,n integers)

0, j, k)-O, n, k)
USkSn_u
A (2sisn)
{ A (k:s j S n)
A (i, j, k, n integers)

i_i+l
(l~k~n-1)

--Q3:

yes

{

(i, j, k)-(i, j -1, k)

1\ (2sisn)
1\ (k +b:jsn +1)
1\ (i, j, k,n integers)

e
Figure 4-The interpreted graph fG l

Figure 3-A program for evaluating a determinant I aijl of order
n, n 2:: 1, by Gaussian elimination

Example 1:

Consider the program (Figure 3)* for evaluating a determinant |aij| of order n, n ≥ 1, by Gaussian elimination.** Here n is an integer constant, (aij), 1 ≤ i, j ≤ n, a real array, D a real variable, and i, j, k integer variables. We want to show that the program terminates for every positive integer n.

Since neither D nor any aij occurs in a test box or affects the value of any variable that occurs in a test box, it is clear that by erasing the three assignments, denoted by dashed boxes in Figure 3, we do not change the termination properties of the program.

One can verify easily that the set of predicates attached to the test boxes of the flowchart is a valid interpretation with respect to the initial predicate "n positive integer"; i.e., starting with any initial positive integer n, whenever the flow of control through the flowchart reaches the test box Bi the current values of the variables satisfy the predicate qi (see Floyd1).

Let us construct now, from the reduced program (Figure 3 without the dashed boxes), the appropriate interpreted graph IG1 (Figure 4), such that each vertex vi, 1 ≤ i ≤ 3, of Figure 4 corresponds to the test box Bi of Figure 3, and its domain Di is exactly the valid interpretation qi. Note that we have used Theorem 2 here, since we consider only the strongly connected component of our graph.

It is clear that, if the interpreted graph IG1 terminates, then the given program terminates for every positive integer n. But the termination of IG1 follows from Corollary 1, where

V* = {2, 3} is the cut set,
W = I3+ is the well-ordered set,
F2(i, j, k) = (n − 1 − k, n + 1 − i, n + 1) is the mapping of D2 into W, and
F3(i, j, k) = (n − 1 − k, n + 1 − i, j) is the mapping of D3 into W.

* Ignore for a moment the predicates q1, q2 and q3 associated with the test boxes.
** We consider the division operator over the real domain as a total function. (Interpret, for example, r/0 as r/10^−10 for every real r.)


Figure 5-The interpreted graph IG2 (arc α: x > y; arc β: (x ≤ y) ∧ (rem(y, x) ≠ 0), with (x, y) ← (rem(y, x), x))

Example 2:

Consider the function gcd(x, y) (McCarthy3). gcd(x, y) computes the greatest common divisor of x and y (where x and y are positive integers), and is defined recursively using the Euclidean Algorithm by

gcd(x, y) = if x > y then gcd(y, x)
            else if rem(y, x) = 0 then x
            else gcd(rem(y, x), x),

where rem(u, v) is the remainder of u/v. We want to show that for every pair (x, y) of positive integers, the recursive process for computing gcd(x, y) terminates.

Consider the interpreted graph IG2 (Figure 5), where D = {positive integers} × {positive integers}. By considering the vertex v as representing the start of the computation of gcd, it follows that the recursive process for computing gcd(x, y) terminates for every pair of positive integers (x, y) if and only if the interpreted graph IG2 terminates.
Since IG2 consists of only one vertex, we may use Corollary 2 to show its termination. So, let W = I1+ be the well-ordered set, and F(x, y) = rem(y, x) the mapping of D into W. Since*

1. Pα(x, y) = T ⇒ (x, y) ∈ D ∧ (x > y)
   ⇒ (rem(y, x) = y) ∧ (y > rem(x, y) ≥ 0)
   ⇒ rem(y, x) > rem(x, y)
   ⇒ F(x, y) ≻ F(y, x), and

2. Pβ(x, y) = T ⇒ (rem(y, x), x) ∈ D
   ⇒ rem(y, x) > rem(x, rem(y, x))
   ⇒ F(x, y) ≻ F(rem(y, x), x),

it follows by Corollary 2 that the interpreted graph IG2 terminates, which implies the desired result.

* Note that for every non-negative integer x, and for every positive z: z > rem(x, z) ≥ 0.
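The two implication chains can be spot-checked mechanically. The following sketch (an illustration, not part of the proof) verifies that F(x, y) = rem(y, x) decreases across both arcs for all pairs up to a bound:

def rem(u, v):
    return u % v

def check(limit=60):
    """Verify the ranking F(x, y) = rem(y, x) drops on every enabled arc."""
    for x in range(1, limit):
        for y in range(1, limit):
            F = rem(y, x)
            if x > y:                                 # arc alpha
                assert F > rem(x, y), (x, y)          # F(x,y) > F(y,x)
            elif rem(y, x) != 0:                      # arc beta
                assert F > rem(x, rem(y, x)), (x, y)  # F(x,y) > F(rem(y,x),x)
    print("ranking decreases on every enabled arc up to", limit)

check()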


Example 3:

Ackermann's function A(x, y), where x and y are non-negative integers, is defined recursively by:

A(0, y) = y + 1
A(x + 1, 0) = A(x, 1)
A(x + 1, y + 1) = A(x, A(x + 1, y)).

We want to show that for every pair (x, y) of non-negative integers, the recursive process for computing A(x, y) terminates.

Let us consider the interpreted graph IG3 (Figure 6), where D = {non-negative integers} × {non-negative integers}. The arc a represents infinitely many arcs a0, a1, a2, ··· leading from vertex v to vertex v, and with each arc ai, i ≥ 0, there is associated the test predicate x ≠ 0 ∧ y ≠ 0 and the function (x, y) ← (x − 1, i). In other words, 'any' represents all the non-negative integers and therefore includes all possible values of A(x + 1, y). It follows that, for every pair (x, y) of non-negative integers, the execution tree T(v, (x, y)) of IG3 (i.e., execution starts from v with (x, y)) contains the real execution tree of A(x, y) as a "subtree". This is illustrated in Figure 7 for A(1, 2).

Figure 6-The interpreted graph IG3

Figure 7(a)-The execution tree T(v, (1, 2)); (b)-The real execution tree of A(1, 2)

This implies that if the interpreted graph IG3 terminates, then the recursive process for computing A(x, y) terminates (for every pair (x, y) of non-negative integers). But the termination of the interpreted graph follows clearly from Corollary 2, where

W = I2+ is the well-ordered set, and
F(x, y) = (x, y) is the mapping of D into W.
ACKNOWLEDGMENTS

I am indebted to Robert W. Floyd for his help and encouragement throughout this research. I am also grateful to Stephen Ness for his detailed reading of the manuscript.

REFERENCES

1 R W FLOYD
Assigning meanings to programs
Proceedings of Symposia in Applied Mathematics American Mathematical Society Vol 19 pp 19-32 1967
2 Z MANNA
Termination of algorithms
PhD Thesis Computer Science Department Carnegie-Mellon University April 1968
3 J McCARTHY
Recursive functions of symbolic expressions and their computation by machine part I
Communications of the ACM Vol 3 No 4 pp 184-195 April 1960

A planarity algorithm based on the
Kuratowski theorem *
by PENG-SIU MEI
Honeywell, Inc.
Wellesley Hills, Massachusetts

and
NORMAN E. GIBBS
College of William and Mary
Williamsburg, Virginia

INTRODUCTION

In the layout of integrated circuits and printed circuits, one often wants to know if a particular electrical network is planar, i.e., can be imbedded in the plane without having any line crossing another line. Our algorithm, when given a finite graph G, can decide if G is planar. The algorithm was implemented in Fortran together with the Cycle Generation Algorithm for Finite Undirected Linear Graphs of Gibbs3 and used extensively to test the planarity of a large number of graphs. The distinguishing characteristic of this algorithm is its conceptual simplicity and its ease of implementation on a computer. The computer program took but a few days to write and debug.

In contrast to some of the recent work on the same subject,1,2,6,7 this planarity algorithm is a direct application of the Kuratowski Theorem. It is based on the observation that a Kuratowski graph can be spanned by the union of two of its circuits. This algorithm can be used in conjunction with existing algorithms which generate all the circuits of a given graph G. After obtaining all the circuits of G, our algorithm examines the subgraphs spanned by pairs of circuits to see if these subgraphs contain a Kuratowski graph.

The paper begins with the necessary notation and definitions. This is followed by the presentation of the algorithm and a few brief comments.

* This work was supported in part by the National Science Foundation Grant GJ-120 and a Purdue David Ross Research Grant.

Definition 1: Let V be a finite non-empty set and E ⊆ {{v1, v2} | v1, v2 ∈ V and v1 ≠ v2}, then G = (V, E) is a finite undirected graph without loops or multiple edges, or more simply, a graph.

Definition 2: A subgraph G' = (V', E') of a graph G = (V, E) is a graph where V' ⊆ V and E' ⊆ E.

Definition 3: Let G = (V, E) be a graph and X a non-empty subset of E, then SG(X) = (V', X), the subgraph of G spanned by X, is the subgraph of G where V' = {v | v ∈ V and for some x ∈ X, v ∈ x}.

Definition 4: A non-empty subset C of edges of a graph G is a circuit (or cycle) of G if SG(C) = (V', C) is such that for each v ∈ V', there are exactly two elements of C which contain v, and C does not properly contain any other circuit of G. C is said to be of length k if it has k elements.

Definition 5: The class of all subgraphs of G which are spanned by the union of two distinct circuits of G will be denoted by TC(G), i.e.,

TC(G) = {SG(C1 ∪ C2) | C1 and C2 are distinct circuits of G}

Definition 6: Let G = (V, E) be a graph; we define an open simple path of G inductively. (φ, {e}), where e ∈ E, is an open simple path. If (V', E') is an open simple path, then so is (V' ∪ {v}, E' ∪ {e}) where

1. e ∈ E − E'
2. v ∈ V − V'
3. there is some e' ∈ E' such that v ∈ e ∩ e'
4. for all v' ∈ V', v' ∉ e.

Figure 1 shows two examples of open simple paths.

Figure 1

Definition 7: Let (V, E) be an open simple path of G and (V1, E) = SG(E). Then V1 − V has exactly two elements, u and v, and we say u and v are connected by the open simple path (V, E).
Definition 8: Two open simple paths (V', E') and (V'', E'') are disjoint if and only if V' ∩ V'' = E' ∩ E'' = φ.

Figure 3

Definition 9: A K5* graph is a graph which can be constructed by taking a set V of five vertices and connecting every pair of distinct elements of V by an open simple path such that these open simple paths are pairwise disjoint.

Definition 10: A K3,3* graph is a graph which can be constructed by taking two disjoint sets, V1 and V2, of three vertices each and connecting every member of V1 to every member of V2 by an open simple path such that these open simple paths are pairwise disjoint.

Figure 2 shows examples of the simplest K5* and K3,3* graphs. Note that every K5* (every K3,3*) graph may be obtained from that of Figure 2 by replacing the set of edges by a set of pairwise disjoint open simple paths.

Theorem (Kuratowski5): A graph is planar if and only if it does not have a subgraph which is a K5* or a K3,3* graph.

Observation (J. R. Büchi): A graph G is planar if and only if TC(G) does not contain any K5* or K3,3* graphs.

In fact this follows from the Kuratowski Theorem because each K5* and each K3,3* can be spanned by two circuits. The union of the two circuits in Figure 3 spans the K5* graph of Figure 2. The union of the two circuits of Figure 4 spans the K3,3* graph of Figure 2.

We are now ready to state our algorithm, which may be programmed in conjunction with a circuit generation algorithm, for example, the algorithms of Gotlieb and Corneil4 and of Gibbs,3 to determine whether or not a graph G is planar from its vertex-adjacency matrix (vertices vs. vertices). Given two circuits C1 and C2, let SG(C1) = (V1, C1) and SG(C2) = (V2, C2). In brief, steps 2 and 3 of the algorithm check to see if SG(C1 ∪ C2) is a K5* graph. If SG(C1 ∪ C2) is not a K5* graph, V1 ∩ V2 has more than five elements, and C1 ∩ C2 has more than two elements, then steps 5 through 8 of the algorithm essentially eliminate all the vertices of degree 2 of SG(C1 ∪ C2) and then check to see if the resultant graph is K3,3, the simplest of the K3,3* graphs.

Figure 2 (V1 = {v1, v3, v5}, V2 = {v2, v4, v6})

Figure 4

ALGORITHM

1. Given a graph G, generate all the circuits of length five or greater.

2. Given two circuits C1 and C2, let SG(C1) = (V1, C1) and SG(C2) = (V2, C2). If V1 ∩ V2 has exactly five elements and C1 ∩ C2 = φ, go to the next step; otherwise, go to step 4.

3. Trace SG(C1) in one direction and let (vi1, vi2, vi3, vi4, vi5) be the elements of V1 ∩ V2 ordered in this cyclic order. Check to see if these elements can be placed in a cyclic order (vi1, vi3, vi5, vi2, vi4) when SG(C2) is traced. If the answer is "yes," SG(C1 ∪ C2) is a K5* graph and G is non-planar. If the answer is "no," go to step 9.

4. If V1 ∩ V2 has more than five elements and C1 ∩ C2 has more than two elements, go to the next step; otherwise, go to step 9.

5. Form the vertex-adjacency matrix M = (mij) of SG(C1 ∪ C2) as follows:

mij = 0 if {vi, vj} ∉ C1 ∪ C2
mij = 1 if {vi, vj} ∈ C1 − C2
mij = 2 if {vi, vj} ∈ C2 − C1
mij = 3 if {vi, vj} ∈ C1 ∩ C2


6. Go through the matrix row by row once, doing the following: If row k has exactly two non-zero entries (note that these must be equal), say mki and mkj are not zero, then add mki to mij and mji, and set mki, mik, mkj, and mjk to zero. Otherwise, go to the next row.

7. After the last row, if there remain exactly six rows with non-zero entries and each of these rows has exactly three non-zero entries, go to step 8; otherwise, go to step 9.

8. The resultant matrix is the vertex-adjacency matrix of a cubic graph G' = (V', E') with six vertices. Let C1' be the circuit of G' consisting of the six edges labeled by a "1" or a "3" and let C2' be the circuit of G' consisting of the six edges labeled by a "2" or a "3." Note that SG'(C1' ∪ C2') = G'. Let (vi1, vi2, vi3, vi4, vi5, vi6) be the elements of V' in the cyclic order obtained by tracing SG'(C1') in one direction with the edge {vi1, vi2} ∈ C1' ∩ C2'. Now start with vi1 and go to vi2 and continue tracing SG'(C2'). If the resultant cyclic order of V' is (vi1, vi2, vi5, vi6, vi3, vi4), then SG(C1 ∪ C2) contains a K3,3* graph; otherwise, go to step 9.

9. The graph G is planar if there are no more pairs of circuits to be considered. Otherwise, select another pair of circuits C1 and C2 of G, and go to step 2.
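As an illustration of steps 2 and 3, the following Python sketch tests whether a pair of circuits, each given as a list of vertices in cyclic order, spans a K5* graph; the encoding of circuits as vertex lists is our own simplification of the paper's edge sets:

# K5* test of steps 2-3: five shared vertices, edge-disjoint circuits,
# and the "pentagram" cyclic order (v1, v3, v5, v2, v4) along C2,
# checked up to rotation and reflection of either traversal direction.
def k5_star_pair(c1, c2):
    def edges(cycle):
        return {frozenset((cycle[i], cycle[(i + 1) % len(cycle)]))
                for i in range(len(cycle))}

    shared = set(c1) & set(c2)
    if len(shared) != 5 or edges(c1) & edges(c2):   # step 2
        return False
    o1 = [v for v in c1 if v in shared]             # order met along C1
    o2 = [v for v in c2 if v in shared]             # order met along C2
    target = [o1[i] for i in (0, 2, 4, 1, 3)]       # step 3 pattern
    doubled = target * 2
    return any(doubled[k:k + 5] == seq
               for seq in (o2, o2[::-1]) for k in range(5))

# Example: the outer pentagon and the pentagram of K5 span all 10 edges.
print(k5_star_pair([1, 2, 3, 4, 5], [1, 3, 5, 2, 4]))  # True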

CONCLUSION

It may take the algorithm a relatively long time to find out that a large planar graph is indeed planar, but the relative ease with which the algorithm can be programmed should render it suitable for testing a small number of graphs or graphs that do not have a large number of circuits. Although a relatively large computer was used in our implementation, the algorithm is simple enough to be implemented on a computer of almost any size. Step 1 (the generation of circuits) of the algorithm can be executed first and the generated circuits can be stored on some form of auxiliary storage. The check for planarity can then be executed separately.

REFERENCES

1 L AUSLANDER S V PARTER
On imbedding graphs in the sphere
J Math and Mech Vol 10 pp 517-523 1961
2 G J FISHER O WING
An algorithm for testing planar graphs from the incidence matrix
Proc 7th Midwest Symp on Circuit Theory Ann Arbor Michigan May 1964
3 N E GIBBS
A cycle generation algorithm for finite undirected linear graphs
Journal of the ACM Vol 16 No 4 pp 564-568 October 1969
4 C C GOTLIEB D G CORNEIL
Algorithms for finding a fundamental set of cycles for an undirected linear graph
Communications of the ACM Vol 10 No 12 pp 780-783 December 1967
5 C KURATOWSKI
Sur le problème des courbes gauches en topologie
Fund Math Vol 15 pp 271-283 1930
6 A LEMPEL S EVEN I CEDERBAUM
An algorithm for planarity testing of graphs
Theory of Graphs International Symposium at Rome Gordon and Breach New York 1967
7 P M LIN
On methods of detecting planar graphs
Proc 8th Midwest Symposium on Circuit Theory Colorado State University June 1966
8 S MACLANE
A combinatorial condition for planar graphs
Fund Math Vol 28 pp 22-32 1937
9 H WHITNEY
Non-separable and planar graphs
Trans Am Math Society Vol 34 pp 339-362 1932

Combinational arithmetic systems for
the approximation of functions*
by CHIN TUNG
IBM Research Laboratory
San Jose, California

and
ALGIRDAS AVIZIENIS
University of California
Los Angeles, California

INTRODUCTION

The concepts of arithmetic building blocks (ABB) and combinational arithmetic (CA) nets as well as their applications have been previously reported in References 3, 4, and 5. The unique ABB, resulting from the efforts of minimizing the set of building blocks in Reference 3, is designed at the arithmetic level, employing the redundant signed-digit number system,2 and is to be implemented as one package by LSI techniques. The ABB performs arithmetic operations on individual digits of radix r > 2 and its main transfer functions are: the sum (symbol +) and product (symbol *) of two digits, the multiple sum of m digits (m ≤ r + 1) (symbol Σ), and the reconversion to a non-redundant form (symbol RS).

A single ABB may serve as the arithmetic processor of a serially organized computer. Many ABB's can be interconnected to form parallel arrays called combinational arithmetic (CA) nets which compute sums, products, quotients, or evaluate more complex functions: trigonometric, exponential, logarithmic, gamma, etc. Because of the use of signed-digit numbers, the parallel addition and multiplication speed is independent of the length of operands. A design procedure has been developed for CA nets:5 a given algorithm is initially represented by a directed graph (algorithm graph, or A-graph), which is then converted to an interconnected diagram of ABB's (hardware graph, or H-graph). The delay through one ABB is defined to be one time unit.

A simple example, the evaluation of polynomials, is used here to illustrate the concept of CA nets. The method suggested by Estrin,7 computing

Pn(x) = a0 + a1x + x²(a2 + a3x) + x⁴(a4 + a5x + x²(a6 + a7x)) + ···,

permits the fastest evaluation when CA nets are used. This is shown in Figure 1 with n = 3; the extension to higher values of n is evident. In general, the delay through such a net is ⌈log2 n⌉ + 1 multiplication-addition times.

This paper summarizes our study of applying a particular version of CA nets, i.e., pipelined CA nets, to approximating functions. Involved are not only the topological layout of pipelined CA nets for approximating functions but also the computational complexity. Throughout this paper, we will use the minimally redundant radix 16 signed-digit number representation whose allowed digit values are {−9, −8, ···, −1, 0, 1, ···, 9}.
APPROXIMATION OF FUNCTIONS
The basic capability of a typical digital computer is limited to simple algebraic manipulations. As a result of this inherent limitation, approximation is inevitably involved in the practical computational procedure if the numerical approach is to apply to the evaluation of functions at all. The discrepancy between the approximated and the approximating values is required to be adjusted to a certain tolerable degree as individual cases demand. There are two general approaches in the theory of approximation: polynomial approximation and rational approximation.8

* The work was sponsored by AEC-AT(U-1) Gen. 10, Project 14.

The representation of functions by polynomials is an old art. The Taylor series has been one of the cornerstones of analytical research. If a series has no other purpose than numerical evaluation of the function, the degree of convergence has to be investigated. The Taylor expansion may converge in the entire plane or within a given circle only, and it may diverge even at every point. With the development of the theory of orthogonal expansions, the realization came that occasionally power expansions whose coefficients are not determined according to the scheme of Taylor expansion can operate more effectively than the Taylor series itself. Such expansions are not based on the process of successive differentiation but on integration. A large class of functions which are not sufficiently analytic to allow a Taylor expansion can be represented by such orthogonal expansions. These expansions belong to a given definite real realm of the independent variable x, and it is aimed to approximate a function in such a way that the error shall be of the same order of magnitude all over the range. Rapidly convergent power expansions are of practical importance. Mere convergence of an expansion, valuable as it is from the purely analytical standpoint, is of little practical use if the number of terms demanded for a reasonable accuracy is very large.9

In light of the above consideration, Chebyshev polynomials, defined by

Tn(x) = cos(n cos−1 x)   for −1 ≤ x ≤ +1,   (1)

emerge as a promising potential candidate for approximating functions.

With the speed of division rapidly increased in conventional computers, the superiority of rational approximations seems to be generally recognized.9 Roughly speaking, one may say that the "curve-fitting ability" of the rational function

R(x) = (a0 + a1x + a2x² + ··· + amx^m) / (b0 + b1x + b2x² + ··· + bnx^n)

is approximately equal to that of a polynomial of degree n + m. In competing with the polynomial of degree n + m, R(x) has an unsuspected advantage in that the computation of R(x) for a given x does not require n + m additions, n + m − 1 multiplications, and one division as might be surmised at first. By transforming R(x) into a continued fraction

R(x) = P1(x) + c2/(P2(x) + c3/(P3(x) + ···)),

we achieve a significant reduction in the number of multiplications and divisions for evaluating any R(x), to n or m, whichever is larger. The continued-fraction form of a rational function not only lends itself to a faster execution but also, sometimes, avoids a disadvantage suffered by rational functions: the coefficients depend on the degrees of the numerator and denominator.

Figure 1-Evaluation of 3rd degree polynomial with Estrin's method (T = ⌈log2 3⌉ + 1 = 2)

A practical application of CA nets to approximating a given function should involve three basic criteria: speed, accuracy, and cost. The overall speed on a machine is governed by two factors: the speed of signals physically going through circuit components and the speed of the computational algorithm in terms


of logical steps. We are primarily concerned with the latter. The inherent unique property of a totally combinational arithmetic net clearly allows as many parallel computations to be done simultaneously as are mathematically permissible. Therefore, the delay from the presence of given data to the interpretation of the result can be minimized in a CA net. As the evaluation of a given function through the numerical technique inevitably introduces approximations, the problem of accuracy is twofold. First, how accurate is the approximating formula? Second, how can the error, thus incurred, be estimated, adjusted, and controlled? Cost is given a restricted meaning here. It is a measure of the number of building blocks needed in the implementation.

If f(x) is continuous and of bounded variation in [−1, 1] then f(x) can be expressed in terms of the Chebyshev series6

f(x) = Σ′ (i = 0 to ∞) ai Ti(x)   (3)

This series can be truncated after any term, say the nth, to give an approximation to f(x). The truncation error is then

Σ (i = n+1 to ∞) |ai Ti(x)| ≤ Σ (i = n+1 to ∞) |ai|   (4)

because the magnitude of Ti(x) is bounded by unity. It has been shown that Chebyshev expansions are the most strongly convergent of a wide class of expansions in orthogonal polynomials.9 Therefore, the truncation error of the Chebyshev approximation is ascertainable at a glance. Further, the partial sum of the Chebyshev series

Σ′ (i = 0 to n) ai Ti(x)   (5)

is a good approximation to the best polynomial of degree n in [−1, 1]. Arguments supporting this assertion can be found in Reference 6.

Even though explicit polynomials can be evaluated on a maximally parallel CA net, as shown in the previous section, they have some drawbacks. First, the power form given by

f(x) = c0,n + c1,n x + ··· + cn,n x^n   (6)

has coefficients which are functions of n, so that a change in the order of approximation requires a new set of coefficients. The second drawback stems from the ill-determination of the coefficients ci,n when n is large, which frequently occurs when a function is represented to high accuracy over a long range.

Recent developments have demonstrated that rational approximations can give higher accuracy than Chebyshev approximations of the same computational complexity.10 However, approximations of the form

R(x) = P(x)/Q(x) = f(x)   (7)

share one of the disadvantages of explicit polynomials; the coefficients of P(x) and Q(x) depend on the degrees of P(x) and Q(x). Continued fractions derived from the form (7) can overcome this drawback and have shown a promising prospect in numerical computation on conventional computers. Nevertheless, continued fractions are still impaired by some shortcomings, at least as far as the application of CA nets is concerned. The most serious problem in this respect is that the evaluation of a continued fraction involves a series of divisions, and division is rather complicated in a CA net.11,12 The other shortcoming, of less importance, is that integration and differentiation cannot be done on a continued fraction as easily as on a Chebyshev series.

The fact that continued fractions involve many divisions has forced us to choose polynomial approximations, which have no division at all, rather than rational approximations in the design of the combinational arithmetic system for the approximation of functions. This choice is more or less unique to our system and may not be justified in many other cases. Further improvement of this system may alter this basic decision.

PIPELINED COMBINATIONAL ARITHMETIC NETS FOR EVALUATING CHEBYSHEV SERIES

In between the two extremes, totally serial and totally concurrent (e.g., Figure 1), a pipelined CA (PCA) net serves as a compromise. A PCA net, in general, consists of both sequential and combinational circuits. Different compositions of these two kinds of circuits give the resultant PCA net a wide spectrum of performance versus cost. A designer is thus endowed with more freedom to choose a particular composition to meet his requirements. The study we made shows that the PCA net is particularly attractive for evaluating Chebyshev series. The general concept of pipelining techniques has been successfully applied to modern information processing systems in order to obtain a much improved performance at the cost of a very moderate increase of hardware.1

Figure 2-An abstract model of pipelining

An abstract notion of pipelining

The concept of pipelining technique relevant to the evaluation of Chebyshev series and polynomials can be abstracted with the following simplified model. Consider

f = Σ (i = 0 to n) fi   (8)

Assume not only that the computations of the fi's are independent of one another but also that the computation pattern of each fi is essentially the same, or can be made the same by introducing dummy operations if necessary. The block identified by "loop-free ckt" in Figure 2 assumes the responsibility of computing the fi's; the raw data of the fi's are fed into it one after the other. The computed results of the fi's are then accumulated in the "accumulating ckt."

Let t0,i be the instant at which the raw data for computing fi are fed into the pipelined circuit (at point A), t1,i the instant at which fi is computed and accumulated as the partial result (at point B), ΔT the amount of time needed to compute fi, i.e., ΔT = t1,i − t0,i, and T the total amount of time required to evaluate f. Clearly, assuming t0,0 = 0, then

t1,i = t0,i + ΔT
T = t1,n = t0,n + ΔT   (9)

In order to decrease T one must decrease t0,n or ΔT, or both. To decrease ΔT is to shorten the longest information flow path; to decrease t0,n is to minimize the time spacings between consecutive t0's. The minimum possible value of t0,n is n · Δt.

It is interesting to investigate the effect on T of variations of ΔT and of the time spacing between consecutive t0's. Suppose ΔT is increased by Δt, due to the addition of more circuitry in the longest information flow path, i.e., ΔT' = ΔT + Δt; then the corresponding total computation time T' becomes

T' = t0,n + ΔT' = t0,n + ΔT + Δt = T + Δt   (10)

In most cases, T is much greater than Δt, hence T' is approximately the same as T.

On the other hand, suppose the time spacing between consecutive t0's is increased by Δt, i.e.,

t0,0' = t0,0
t0,1' = t0,1 + Δt
t0,i' = t0,i + i · Δt
t0,n' = t0,n + n · Δt   (11)

then

T' = t0,n' + ΔT = t0,n + n · Δt + ΔT = T + n · Δt   (12)

A comparison of Equations (10) and (12) clearly shows that the effect of a variation of the time spacing between consecutive t0's is n times greater than that of a variation of ΔT with the same magnitude. Therefore, it is more desirable to decrease the time intervals between consecutive t0's than to shorten the longest information flow path.

Ideally, one would like the input data for computing the fi's to be fed into the pipelined circuit one after the other with the least possible delay. In this case, with t0,0 = 0 we have t0,i = i · Δt, t0,n = n · Δt, and

T = n · Δt + ΔT   (13)
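The trade-off expressed by Equations (10) and (12) is easy to tabulate. In the toy model below the parameter values (n = 15, Δt = 1, and ΔT = 17, the last matching the PCA-7 delay quoted in the appendix) are merely illustrative:

# Total time to evaluate f = sum(fi) when raw-data sets enter every dt
# and each fi takes dT through the net, per Equations (9)-(13).
def total_time(n, dt, dT, extra_depth=0, extra_spacing=0):
    """T = t0,n + dT with t0,i = i*(dt + extra_spacing), dT' = dT + extra_depth."""
    t0_n = n * (dt + extra_spacing)
    return t0_n + dT + extra_depth

n, dt, dT = 15, 1, 17
base = total_time(n, dt, dT)                           # Eq. (13): n*dt + dT = 32
print(base)
print(total_time(n, dt, dT, extra_depth=1) - base)     # Eq. (10): costs only 1
print(total_time(n, dt, dT, extra_spacing=1) - base)   # Eq. (12): costs n = 15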

In Equation (13), with Δt fixed, the interactions of T, n, and ΔT can be briefly summarized in the following. As the circuit complexity increases, most likely ΔT will be lengthened and the computational power of the circuit will be enhanced. If Equation (8) can be reorganized as

f = Σ (i = 0 to n') fi'   (14)

with the complexity of fi' greater than that of fi, then we expect n' < n. The effect on T of the increase of ΔT and the decrease of n cannot be specified without detailed information. It remains to be investigated.

Layout of PCA nets

With the knowledge of the above section, we can now begin the layout of PCA nets for evaluating Chebyshev series. The functional block diagram of a PCA-W net is shown in Figure 3. It consists of three subnets, namely, the CA-W subnet, the self-multiplication CA (SMCA) subnet, and the pipelined sequential CA (PSCA) subnet. A CA net with the capacity of computing polynomials Pn(x), n ≤ w, asynchronously without the necessity of segmenting Pn(x) is said to have a width w and is denoted by CA-W net. The meaning of SMCA and PSCA will be clear in the later text.

Figure 3-Functional block diagram of the PCA-W net

The Chebyshev series used to approximate a given function f(x) assumes the following forms:

f(x) = Σ′ (i = 0 to n) ai Ti(x),   −1 ≤ x ≤ 1   (15)

Ti(x) = Σ (j = 0 to i/2) c2j x^2j   for i = 0, 2, 4, ···,

Ti(x) = Σ (j = 0 to (i−1)/2) c2j+1 x^(2j+1)   for i = 1, 3, 5, ···   (16)

SMCA assumes the responsibility of computing the powers of x. Using Estrin's method, the CA-W specializes in evaluating Ti(x) when i ≤ w. If i > w, then Ti(x) must be broken into several segments. Each segment can be evaluated on the CA-W net with Estrin's method. The input data for computing these segments are fed into the CA-W one immediately after the other. The outputs from both CA-W and SMCA arrive in the PSCA to form Ti(x). At PSCA, the coefficient ai is multiplied with Ti(x), and aiTi(x) must then be accumulated.

One of the inherent properties of Chebyshev polynomials, as can be seen from Equation (16), is that Ti(x) contains only even terms when i is even and only odd terms when i is odd. Due to this inhomogeneity, the CA-W subnet can be simplified by removing half of the storage vertices as well as all the pairs of vertices for evaluating the sub-expressions cj + cj+1 x. For instance, a full CA net for evaluating a normal 4th degree polynomial

P4(x) = c0 + c1x + c2x² + c3x³ + c4x⁴   (17)

is shown in Figure 4.

Figure 4-CA net for evaluating 4th degree polynomial

Since

(18)

below. Assume

f(x) = Σ (i = 0 to 15) ai Ti(x)

for i = 0, 2, ···
for i = 1, 3, ···   (19)

then the contents of S0,1, S0,2, S0,3, and S0,4 are given in Figure 6 as time increases from 0; the exact timing as to when these c's should be fed into the PCA-W net will be seen in the appendix.

The output of the CA-7 subnet is multiplied at *4,1 by an appropriate factor coming from SMCA. For example, the output of CA-7, Ti(x) for i ≤ 7 or the first segment of Ti(x) for i > 7, is multiplied at *4,1 by the unity coming from S3,1 through M4,1. The second, third, and fourth segments, if they exist, of Ti(x) for i > 7, are multiplied by x⁸, x¹⁶, and x²⁴ respectively. The partial result of each Ti(x) is accumulated at

Figure 5-PCA-7 net for evaluating Chebyshev series-A-level

It should be emphasized that in the following discussion, for all H-level vertices, the output gates are always open and clocks are always effective unless otherwise stated.
When i ≤ 7, Ti(x) need not be segmented. Assume that at t = t0, x is fed to S0,0 and at t = t0 + 4 all coefficients cj of Ti(x) are in the first storage layer (S0,1, ···, S0,4). At t = t0 + 12, Ti(x) is ready at +3,1 and +3,1', the output of the CA-7 subnet. Ti(x) must be multiplied with an ai at t = t0 + 14, hence at t = t0 + 11 the gate of *4,2 is closed to produce ai. Only at t = t0 + 16 is the clock of Σ5,13 effective, such that the partial result, aiTi(x), can be registered there. Since there is no extra delay needed while accumulating the result in Σ5,13, Ti+1(x) can come right after Ti(x). According to the criterion outlined in the text, the minimum delay for evaluating

f(x) = Σ (i = 0 to n) ai Ti(x)   (A-1)

with a given PCA-W net is

T = nΔt + ΔT   (A-2)

if ΔT is proved to be the shortest possible delay through the net for computing one segment of the given function. Referring to Figure 7, we see ΔT is 17 time units, while the shortest possible delay is 16 time units (four multiplications have to take place in series). The extra delay is caused by +4,4. The passage through +4,4 seems redundant, but it is actually indispensable, as will be seen in the following discussion. Therefore, the current design requires the minimum amount of time to evaluate Equation A-1. Meantime, the gates of S3,1, +3,0, +4,0, +5,0, and +4,11 are closed because they are not needed for computing Ti(x) with i ≤ 7.
When i > 7, Ti(x) must be segmented as shown in Figure 6. Let us denote the segments of Ti(x) by Ti,0(x), Ti,1(x), ..., Ti,k(x). In order to minimize the overall delay through the net, the input data for Ti,k(x) must be fed into the net immediately after the input data for Ti,k-1(x). Whether this requirement can be met depends on a subtle arrangement of the vertices *4,1, Σ4,11, Σ4,12, Σ4,13, +4,1, and +4,4.
T8(x) is broken into T8,0(x) and T8,1(x). With the gate of +3,1 closed from now on, T8,0(x) and T8,1(x) will be ready in +3,1' at t = t0 + 12 and t = t0 + 13 respectively. Only at t = t0 + 11 and t = t0 + 12 will the gates of S3,1 and +3,0 be open, respectively, to insure proper multiplication of 1·T8,0(x) and x^8·T8,1(x) in *4,1 at t = t0 + 13 and t = t0 + 14. At t = t0 + 16, T8(x) is formed in Σ4,13. At t = t0 + 17, Σ4,13 holds the sum of T8(x) and T9,0(x)/x with the gate of +4,1 still closed. At t = t0 + 18, the gate of +4,1 is permitted to open for one time unit such that T8(x) will be in +4,4. At the same time, Σ4,13, holding the sum of T8(x) and T9,0(x)/x, receives T9,1(x)/x from Σ4,11 and Σ4,12 and the complemented value of T8(x) from +4,1. The net effect at Σ4,13 is to produce T9(x)/x at t = t0 + 18.
Assume at t = t1, T16,0(x), T16,1(x), and T16,2(x) are in Σ4,11 (and Σ4,12), *4,1, and +3,1' respectively, while the contents of Σ4,13 is T15(x)/x. The gates of S3,1, +3,0, and +4,0 are permitted to open at t = t1 - 3, t1 - 2, and t1 - 1, respectively. At t = t1 + 1, Σ4,13 holds the sum of T15(x)/x and 1·T16,0(x), and +4,1 holds T15(x)/x with its gate still closed. At t = t1 + 2, the gate of +4,1 is open; Σ4,13 now holds T15(x)/x + T16,0(x) + x^8·T16,1(x) - T15(x)/x = T16,0(x) + x^8·T16,1(x). With the gate of +4,1 closed again at t = t1 + 3, the contents of Σ4,13 become T16,0(x) + x^8·T16,1(x) + x^16·T16,2(x) = T16(x). The transition from 3-segment Ti(x) to 4-segment Ti(x), and the transitions from 4 to 5 and so forth, are done in the same manner.
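The recombination that the Σ4,11-Σ4,13, +4,1, +4,4 arrangement performs over time amounts to the identity p(x) = seg0(x) + x^8·seg1(x) + x^16·seg2(x) + .... A minimal sketch, assuming eight-coefficient segments (inferred from the x^8, x^16, x^24 factors above) and invented function names:

    # Hedged sketch of the segmentation identity used for i > 7:
    # a polynomial of degree > 7 is split into 8-coefficient segments,
    #   p(x) = seg_0(x) + x**8 * seg_1(x) + x**16 * seg_2(x) + ...
    # The net forms each segment in the CA-7 subnet and scales it by the
    # power of x supplied by the SMCA subnet; here we do the same serially.
    def eval_segment(coeffs, x):
        # Horner evaluation of one segment (at most 8 coefficients).
        result = 0.0
        for c in reversed(coeffs):
            result = result * x + c
        return result

    def eval_segmented(coeffs, x, width=8):
        # Evaluate p(x) by 'width'-coefficient segments, low degree first.
        total, scale = 0.0, 1.0
        for k in range(0, len(coeffs), width):
            total += scale * eval_segment(coeffs[k:k + width], x)
            scale *= x ** width          # x**8, x**16, x**24, ...
        return total

    # The segmented result matches direct evaluation:
    c = [float(j) for j in range(17)]    # degree-16 polynomial, 3 segments
    direct = sum(cj * 0.9 ** j for j, cj in enumerate(c))
    assert abs(eval_segmented(c, 0.9) - direct) < 1e-9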
The preparation of ai (aix), the multiplication of ai and Ti(x) (aix and Ti(x)/x) at *5,1, and the registration and accumulation at Σ5,13 are done exactly in the same way as they were in the case of aiTi(x) for i ≤ 7. The existence of +4,4 is necessary because one of the inputs of *5,1 must come from either +3,1 or +4,1, and +4,4 must be there to serve as a selector. Since the input data for all segments of Ti(x) can be fed into the net one immediately after the other, as we expected, and ΔT is also held to a minimum, the Chebyshev series is evaluated on this PCA-7 net at the fastest possible pace, according to Equation (13).
To insure proper and reliable operations in the PCA-7 net shown in Figure 7, the gates and clocks of the following H-level vertices (or, equivalently, ABB's) must be carefully studied, while for the remaining H-level vertices the following assumption holds: "Output gates are always open and clocks are always effective."

Let Vh be the H-level vertex in the SMCA subnet where x^(h(w+1)) is formed; then in our example (Figure 7), V0: S3,1; V1: +3,0; V2: +4,0; V3: +5,0. Their clocks are always effective; gates open only at certain instants, which are defined by the flow-chart shown in Figure A-1, in which tj,i refers to the instant at which aiTi(x) is in the last H-level vertex of the net, i.e., +5,1 in Figure 7.

Figure A-1-Determination of gate conditions in SMCA subnet

For +4,1, clocks are always effective while its gate opens only at t = tj,i - 4. The gate of +3,1 opens only for the first w + 4(⌊log2 w⌋ + 1) + 1 time units, with the clocks always activated. The gate of Σ5,13 always opens, while its clocks are activated only at t = tj,i - 1. At t = tj,i - 6, the gate of S3,2 (*4,2) opens (closes) for an even Chebyshev polynomial and the gate of *4,2 (S3,2) opens (closes) for an odd Chebyshev polynomial.

With the knowledge obtained so far, we draw a control timing chart in Figure A-2 for evaluating

f(x) = Σ_{i=0}^{16} ai Ti(x)

on a PCA-7 net.

Figure A-2-Control timing chart for Figure 7

Operating systems architecture
by HARRY KATZAN, JR.
Pratt Institute
Brooklyn, New York

INTRODUCTION


Operating systems architecture refers to the overall
design of hardware and software components and their
operational effectiveness as a whole. To be effective,
however, an operating system must not only be cognizant of the collection of hardware and software modules, but must also be designed in light of the programs and data which the system processes and the people whom it serves. The absence of formal theory on operating systems and the lack of standard terminology have caused much confusion among users. The
problem is particularly apparent when comparing
systems where the same terms are applied to a variety
of concepts and levels of implementation.
The purposes of this paper are threefold: (1) to
present the basic properties with which operating
systems can be grouped, classified, and evaluated; (2)
to identify the major categories into which operating
systems can be classed and to give concrete examples
of each; and (3) to discuss resource management in
operating systems with an emphasis on storage allocation and processor scheduling.
First, seven properties, used to classify operating
systems, are briefly described. Then, the major categories into which operating systems can be classed
are given and their most significant attributes are
noted. Lastly, the most significant factors in operating
systems design, i.e., storage allocation and processor
scheduling, are treated in detail.


PROPERTIES OF OPERATING SYSTEMS
An operating system tends to be classified, informally,
on the basis of how the facilities of the system are
managed or allocated to the user. Accordingly, the
number of properties, or combinations of properties,
which contribute to the classification, is very large.
Seven of these properties dominate the remainder and
are introduced in the following paragraphs.

Access

Access is concerned with how the user interacts with the system. Does he access the system via a remote terminal or does he submit his work in a batch processing environment? If the user is at a remote location, what is the nature of his terminal device? Is it an RJE/RJO work station or is it a keyboard or CRT type device? Is a command system available so that the user can enter into a dialogue with the operating system? Can the user initiate batch tasks or query their status when at a remote terminal? Does the facility exist for conversing with a problem program from a terminal device?


Utilization
Utilization is concerned with the manner in which
the system is used. Is the system closed so that the user
is limited to a specific programming language or is the
system open allowing the user access to all of the
system facilities? How must the user structure his
programs-planned overlay, dynamic segmentation,
single-level store? Can the user prepare and debug
programs on line or is he limited to querying the system? What facilities are available for data editing and
retrieval? In the area of data management, what access
methods, file organization, and record types are permitted? Does the data management system have provisions for using the internal and external storage
management facilities of the system? Lastly, what
execution-time options are permitted by the operating
system at run time as compared to compile time?
Performance
Performance deals with the quality of service to the
installation and to the user. Does the operating system
design philosophy attempt to maximize the use of


system resources, maximize throughput, or guarantee
a given level of terminal response? What is the probability that the system will be available to the user
when needed? Does the user lose his data sets if the
system fails (data set integrity)? What facilities are
available for system error recovery?
Scheduling
Scheduling determines how processing time is allocated to the jobs (or tasks) which reside in some form in the system. What scheduling philosophy is used-sequential, natural wait, priority, time slicing, demand? What is the nature of the scheduling algorithm-round robin, exponential with priority queues, table driven?


Storage management
Storage management is concerned with how main
storage and external storage are allocated to the users.
Is main storage fragmented, divided into logical regions,
allocated on a page basis or are programs swapped?
Is external storage allocated in fixed-size increments or
on a demand basis with secondary allocations, as
required?
Sharing
Sharing is the functional capability of the system to
share programs, data, and hardware devices. Does the
system permit read-only and re-entrant code so that
programs can be shared during execution? Can data
sets be shared without duplicating them? At what level
of access? Can hardware devices be shared among
users giving each the illusion that he has a logical
device to himself?
Configuration management
Configuration management is concerned with the real
physical system and the logical system as seen by the
user. Physically, how is the system organized and how
can this organization be varied? Does the capability
exist of partitioning off a maintenance subsystem?
Similarly, can a failing CPU, core box, channel, or
I/O device be removed from the system? Logically,
does the user have a machine to himself, a large virtual
memory, a fixed partition?
Obviously, the properties are not exhaustive in the sense that all operating systems, real or hypothetical, can be automatically classified. The properties do form a basis for comparison and are used in the next section to identify the major categories of operating systems.

CATEGORIES OF OPERATING SYSTEMS

An operating system1 is an integrated set of control programs and processing programs designed to maximize the overall operating effectiveness of a computer system. Early operating systems increased system performance by simplifying the operations side of the system. Current operating systems additionally attempt to maximize the use of hardware resources while maintaining a high level of work throughput or providing a certain level of terminal response. A multitude of programmer services are usually provided, as well.

Multiprogramming

A multiprogramming system2,3 is an operating system designed to maintain a high level of work throughput while maximizing the use of hardware resources. As each job enters the system, an internal priority, which is a function of external priority and arrival sequence, is developed. This internal priority is used for processor scheduling. During multiprogramming operation, the program with the highest internal priority runs until a natural wait is encountered. While this wait is being serviced, processor control is turned over to the program with the next highest priority until the first program's wait is satisfied, at which time processor control is returned to the high priority program, regardless of whether the second program can still make use of the system. The first job has, in a sense, demanded control of the system. The concept is usually extended to several levels and is termed the level of multiprogramming. A multiprogramming system is characterized by: (1) Limited access, traditionally limited to tape or card SYSIN and printer or tape SYSOUT. RJE/RJO may be implemented, but on-line real-time processing usually requires a specially written problem program. (2) Utilization is most frequently restricted to batch type operations with data management facilities being provided by the system. Planned overlay is usually required for large programs, with most debugging being done off line. (3) Performance is oriented towards high throughput and maximum utilization of hardware resources. A given level of service is not guaranteed and the processing of jobs is determined by operational procedures. (4) Scheduling of work usually involves priority, natural wait, and demand techniques. In some systems, a unit of work may spawn other units of work, providing parallel processing to some degree.
(5) Storage management techniques vary between systems, with most systems using a fixed partition size or a logical region for problem programs. Paging techniques with dynamic address translation have been used with some success. (6) Sharing of system routines is frequently provided, while the sharing of problem programs is a rarity. When a central catalog of data set information is provided, data set sharing, at various levels of access, is available. Otherwise, data set sharing is accomplished on an ad hoc basis. (7) Configuration management is usually limited to the existing physical system, with the users being given a portion, fixed or variable, of actual storage.
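A minimal sketch of the dispatching rule just described (highest internal priority runs until a natural wait, and reclaims the processor the moment its wait is satisfied); the Job fields and the internal-priority formula are assumptions of the example, not taken from any particular system:

    # Hedged sketch of multiprogramming dispatch by internal priority:
    # the ready job with the highest internal priority always gets the
    # processor, so a high-priority job reclaims control the moment its
    # natural wait (e.g., an I/O operation) is satisfied.
    class Job:
        def __init__(self, name, external_priority, arrival_seq):
            self.name = name
            # Internal priority as a function of external priority and
            # arrival sequence (one plausible formula, not a standard one).
            self.internal_priority = external_priority * 1000 - arrival_seq
            self.waiting = False     # True while a natural wait is pending

    def dispatch(jobs):
        # Return the job to run next: highest internal priority, not waiting.
        ready = [j for j in jobs if not j.waiting]
        if not ready:
            return None              # processor idles until an I/O completes
        return max(ready, key=lambda j: j.internal_priority)

    jobs = [Job("A", 9, 1), Job("B", 5, 2)]
    assert dispatch(jobs).name == "A"    # A runs first
    jobs[0].waiting = True               # A hits a natural wait...
    assert dispatch(jobs).name == "B"    # ...so B runs
    jobs[0].waiting = False              # A's wait is satisfied
    assert dispatch(jobs).name == "A"    # control returns to A at once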

Hypervisor multiprogramming

One of the problems frequently faced by installation management involves running two different operating systems, each of which requires a dedicated but identical machine. A hypervisor is a control program that, along with a special hardware feature, permits two operating systems to share a common computing system. A relatively small hypervisor control program (see Figure 1) is required which interfaces the two systems. Although only one processor is involved, a hardware prefix register divides storage into two logically separate memories, each of which is utilized by an operating system. I/O channels and devices are dedicated to one or the other operating system and use the hardware prefix register to know to which half of storage to go. All interrupts are indirectly routed to a common interrupt routine which decides which operating system should receive the most recent interrupt. Processor control is then passed to the hypervisor control program for dispatching. The hypervisor control program loads the prefix register and usually dispatches processor control to the operating system which received the last interrupt. Alternate dispatching rules are to give one operating system priority over another or to give one operating system control of the processor after a fixed number of interrupts have been received by the other side. Hypervisors are particularly useful when it is necessary to run an emulator and an operating system at the same time. Similar to multiprogramming systems, a hypervisor is characterized by: (1) limited access; (2) batch utilization; (3) high throughput performance; (4) priority, natural wait, and demand scheduling; (5) basic storage management techniques; (6) limited sharing facilities; and (7) configuration management determined by the operating systems that are run as subsystems.
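The dispatching rules above admit a compact sketch. The prefix values, system names, and default rule below are illustrative assumptions, not details of any actual hypervisor:

    # Hedged sketch of hypervisor dispatching: every interrupt funnels
    # through a common routine, which loads the prefix register for one
    # of the two guest operating systems and gives it the processor.
    # The default rule dispatches to the system that took the last
    # interrupt; the values below are invented for illustration.
    PREFIX = {"OS-1": 0x0000, "OS-2": 0x8000}   # halves of real storage

    class Hypervisor:
        def __init__(self):
            self.interrupt_counts = {"OS-1": 0, "OS-2": 0}

        def on_interrupt(self, target_os):
            # Common interrupt routine: record the interrupt, then dispatch.
            self.interrupt_counts[target_os] += 1
            return self.dispatch(target_os)

        def dispatch(self, last_interrupted_os):
            # Default rule: run the OS that received the last interrupt.
            prefix = PREFIX[last_interrupted_os]
            return (last_interrupted_os, prefix)

    hv = Hypervisor()
    assert hv.on_interrupt("OS-2") == ("OS-2", 0x8000)
    assert hv.on_interrupt("OS-1") == ("OS-1", 0x0000)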

Figure 1-Hypervisor multiprogramming

Time sharing


Although time-sharing is used in a variety of contexts, it most frequently refers to the allocation of hardware resources to several users in a time dependent fashion. More specifically, a time-sharing system concurrently supports multiple remote users engaging in a series of interactions with the system to develop or debug a program, run a program, or obtain information from the system. The basic philosophy behind time sharing is to give the remote user the operational advantages of having a machine to himself by using his think, reaction, or I/O time to run other programs. Operation of a time-sharing system is summarized as follows:*


Time-shared operation of a computer system
permits the allocation of both space and time
on a temporary and dynamically changing
basis. Several user programs can reside in
computer storage at one time while many
others reside temporarily on auxiliary storage
such as disc or drum. Computer control is
turned over to a resident program for a scheduled time interval or until the program reaches

a relay point (such as an I/O operation), depending upon the priority structure and control algorithm. At this time, processor control is turned over to another program. A nonactive program may continue to reside in computer storage or may be moved to auxiliary storage, to make room for other programs, and subsequently be reloaded when its next turn for machine use occurs.

* See reference [1], p. 190.
A time-sharing system is characterized by: (1) Remote
access with keyboard or CRT devices and possibly
RJE/RJO work stations. (2) Varied utilization ranging
from a closed system such as QUIKTRAN4, APL/3605, or BASIC6, to an open system such as MULTICS7 or TSS/3608,9. In most open systems, a single-level store,
on-line debugging facilities, and an extensive file system
are also available. (3) Performance in most time sharing
systems is mainly centered around dividing processor
time among the users and providing fast response to
terminal requests. Management of other resources in a
time-sharing system is usually towards this end. (4) A
given level of user service is maintained by giving
users a short slice of processor time at frequent intervals
according to a scheduling algorithm. The most frequently used scheduling algorithms are round robin
and exponential with priority queues. (5) Varied
storage management techniques are used depending
upon the hardware and the sophistication of the software. Swapping and paging techniques have been used
with great success. In the latter category, direct-access
storage devices and large capacity core storage have
both been used as paging devices. (6) Most open time-sharing systems permit code sharing during execution
and language processors, data management routines,
and command system programs are frequently shared.
If public storage is provided, then data set sharing is
also available. In closed systems, the level of sharing
is determined by the programming language used and
its method of implementation. (7) In a utility class
time-sharing system, configuration management facilities
are required for preventative maintenance and for the
repairing of faulty equipment. Multiple processors,
storage units, and data channels are provided with
many large time-sharing systems; thus the hardware
resources are available for configuring the system to
meet operational needs. In some time-sharing systems,
the user has a logical machine to himself provided
through a combination of hardware and software
facilities. This topic is covered in the next section on
virtual machines.
Virtual systems

A virtual system is one which provides a logical
resource which does not necessarily have a physical
counterpart. Virtual storage systems7,8,9,10,11 (see Figure
2) are widely known and provide the user with a large
single level store achieved through a combination of
hardware and software components. A virtual storage
system is characterized by the fact that real storage contains only that part of a user's program which need be there for execution to proceed. The basic philosophy of virtual storage lends itself to paging (Figure 3) and is usually associated with dynamic address translation, as introduced later in the paper.

Figure 2-Virtual storage

Figure 3-Loaded virtual storage
A virtual machine is an extension to the virtual
storage concept which gives the user a logical replica of
an actual hardware system. Whereas in a virtual
storage system, a user could run programs, in a virtual
machine, a user or installation can run complete operating systems. In addition to using the virtual storage
concept, a virtual machine system contains a control
program12 which allocates resources to the respective
virtual machines and processes privileged instructions
which are issued by a particular operating system.
Although virtual systems are usually associated with
time-sharing, the concept is more general and applies
equally well to multiprogramming systems. Virtual
systems tend to be most effective in operating environments where dynamic storage allocation, dynamic
program relocation, simple program structure, and
scheduling algorithms are of concern. Virtual systems
using fixed size pages and dynamic address translation
also lend themselves to sharing and most systems using
this design philosophy have implemented code sharing
during execution to some extent.
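A minimal sketch of the trap-and-service division of labor such a control program implies; the two-instruction privileged repertoire is invented for the example:

    # Hedged sketch of a virtual machine control program: unprivileged
    # instructions run directly, while privileged ones trap to the control
    # program, which performs them on the virtual machine's behalf.
    # The instruction names here are invented for illustration.
    PRIVILEGED = {"SET_STORAGE_KEY", "START_IO"}

    def run_instruction(vm_state, instruction):
        if instruction in PRIVILEGED:
            # Trap: the control program simulates the instruction using
            # only the resources allocated to this virtual machine.
            vm_state.setdefault("trapped", []).append(instruction)
        else:
            vm_state.setdefault("executed", []).append(instruction)

    vm = {}
    for op in ["LOAD", "START_IO", "ADD"]:
        run_instruction(vm, op)
    assert vm["trapped"] == ["START_IO"] and vm["executed"] == ["LOAD", "ADD"]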

Tri-level operating systems

In a conventional operating system (Figure 4), two levels of control are available, each of which corresponds to a segment of core storage. Level one contains the supervisor program and all associated routines for job control and data management. User programs, language processors, and utility programs run at level two and are regarded as problem programs by the supervisor program. Control is passed from level two to level one by hardware facilities, usually termed an interrupt, and level one services are able to completely sustain level two needs.

Figure 4-Conventional operating system

In a virtual system, another level of complexity is required. Logical as well as physical resources must be maintained and allocated. Thus, in virtual systems, allocation of resources* is relegated to the supervisor or control program and typical job control and data management routines are included as a job monitor program (Figure 5) which exists as a second level.

Figure 5-Tri-level operating system


Processing programs then execute at a third level. The need for communication between level three and level two exists and is satisfied by a virtual interrupt, implemented in software, analogous in a real sense to the hardware interrupt discussed earlier. Level one control programs are characterized as follows: one per system, executes in the supervisor state, runs in the non-relocated mode, is not time sliced, and is core resident. Similarly, level two monitor programs are characterized by: one per user, executes in the problem state, runs in the relocated mode, is time sliced, and is pageable.

* Such as CPU time, core storage, and I/O facilities.
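The level-three-to-level-two path can be suggested in a few lines. The queue-based rendering below is a hedged stand-in for the software mechanism described; the names are invented:

    # Hedged sketch of a virtual interrupt: a level-three processing
    # program signals its level-two job monitor through a software queue,
    # mirroring the hardware interrupt that connects level two to level one.
    from collections import deque

    class JobMonitor:                # level two: one per user, pageable
        def __init__(self):
            self.pending = deque()

        def virtual_interrupt(self, request):
            # Called by level-three code; analogous to a hardware interrupt.
            self.pending.append(request)

        def service(self):
            while self.pending:
                request = self.pending.popleft()
                print("monitor services:", request)

    monitor = JobMonitor()
    monitor.virtual_interrupt("OPEN data set")   # issued at level three
    monitor.service()                            # satisfied at level two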


RESOURCE MANAGEMENT
In modern operating systems, the allocation of hardware resources among users is a major task. Two resource management functions directly affect performance and utilization: storage management and scheduling. Both topics were
introduced earlier. The most widely used implementation techniques are discussed here.
Storage management

In either a 2-level or 3-level operating system, available storage is divided into two areas: a fixed area
for the supervisor program and a dynamic area for the
user programs. If no multiprogramming or time sharing
is done, then a user program executes serially in the
dynamic area. When he has completed his use of the
CPU, then the dynamic area is allocated to the next
user.
When more than one user shares the dynamic area,
such as in multiprogramming or time sharing, then
storage management becomes a problem for which
various techniques have been developed. They are
arbitrarily classed as multiprogramming techniques or
time-sharing techniques although the point of departure
is not well-defined. Multiprogramming techniques include fixed partition, region allocation, and roll in/roll
out. Time-sharing techniques include core resident,
swapping, and paging.
In a fixed partition system, the dynamic storage area
is divided into fixed sub-areas called partitions. As a
job enters the system, it specifies how much storage it
needs. On the basis of the space requirements specified,
it is assigned to a fixed partition and must operate
within that area using planned program structure
whenever necessary. In a region allocation system, a
variable number of jobs may use the system. Just
before a job is initiated, a request is made to dynamically allocate enough storage to that job. Once a job is
initiated, however, it is constrained to operate within
that region. In a logical sense, fences are created within
the dynamic area. Roll in/roll out is a variation of region
allocation which effectively enables one job to borrow
from another job if space requirements can not be fulfilled from the dynamic area. The borrowed region is
rolled back in and returned to the original owner
whenever he demands the CPU or when the space is
no longer needed by the borrower.
The most fundamental technique for storage management in time sharing is core resident. In a core
resident system, all active tasks are kept in main
storage. This method reduces system overhead and
I/O activity but is obviously limited by the size of


core storage. Large capacity storage (LCS) is frequently
used in a hierarchical sense with main storage and provides a cost effective means of increasing the number
of potential users. Large capacity storage is sufficiently
fast to satisfy the operational needs of a user at a
remote terminal. Swapping is the most frequently used
method of storage management in time sharing. At the
end of a time slice, user A's program is written out to
auxiliary storage and user B's is brought in for execution. All necessary control information is saved between
invocations. In the above case, the system would have
to wait while user B's program was brought in for
execution. Thus, two or more partitions can be used
for swapping to reduce the I/O wait. The use of several
partitions permits other user programs to be on their
way in or on their way out while one user's program is
executing. This method reduces wait time but increases
the amount of system housekeeping and overhead. A
variation to the single partition approach is the onion-skin method used with the CTSS system at M.I.T.13
With this method, only enough of user A's program is
written out to accommodate user B. In a sense, user
A's program is peeled back for user B's program. If
user C requires still more space than B, then A is
peeled back even more. In a paging system, main
storage is divided into fixed-size blocks called pages.
Pages are allocated to users as needed and a single
user's program need not occupy consecutive pages, as
implied in Figure 3. Thus a translation is required
between a user's virtual storage, which is contiguous,
and real storage, which is not. A technique called
dynamic address translation is employed that uses a
table look up, implemented in hardware, to perform
the translation. First, the address field is segmented
to permit a hierarchical set of look up tables (Figure 6).
Then, each effective computer address goes through an
address translation process (Figure 7) before operands
are fetched from storage. The process is usually speeded
up with a small associative memory (Figure 8). When a
user program references a page that is not in main
storage, a hardware interrupt is generated. The interrupt is fielded by the supervisor program which brings
the needed page in for execution. Meanwhile, another
user can use the processor. Look up tables (Figure 7)
are maintained such that when a page is brought into


main storage, an entry is made to correspond to its relative location in the user's virtual storage.

Figure 6-Segmentation (address field divided into segment, page, and byte)

Figure 7-Dynamic address translation

Figure 8-Associative memory
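The table walk of Figure 7 and the short-circuit of Figure 8 can be mimicked directly. In the sketch below the field widths, table contents, and fault behavior are invented assumptions, chosen only to trace one address through the translation:

    # Hedged sketch of dynamic address translation (cf. Figures 6-8):
    # a virtual address is split into segment, page, and byte fields; the
    # segment table yields a page table, the page table yields the real
    # page location, and an associative memory short-circuits the walk.
    PAGE_BITS, SEG_BITS = 12, 8      # byte field: 12 bits; page field: 8

    segment_table = {0: {"page_table": 0}}       # segment -> page table id
    page_tables = {0: {0: 0x127, 1: 0x121}}      # page -> real page frame
    associative = {}                              # (seg, page) -> frame

    def translate(virtual_address):
        byte = virtual_address & ((1 << PAGE_BITS) - 1)
        page = (virtual_address >> PAGE_BITS) & ((1 << SEG_BITS) - 1)
        seg = virtual_address >> (PAGE_BITS + SEG_BITS)
        if (seg, page) in associative:           # fast path (Figure 8)
            frame = associative[(seg, page)]
        else:                                    # full walk (Figure 7)
            table = page_tables[segment_table[seg]["page_table"]]
            if page not in table:
                raise LookupError("page fault: supervisor brings page in")
            frame = table[page]
            associative[(seg, page)] = frame
        return (frame << PAGE_BITS) | byte

    real = translate((0 << 20) | (1 << 12) | 0x019)  # seg 0, page 1
    assert real == (0x121 << 12) | 0x019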
The methods vary, obviously, in complexity. An eventual choice of which technique to employ depends solely on the sophistication of the operating system; the access, performance, and utilization required; and the underlying hardware.

Scheduling

In modern operating systems, the supervisor program assumes the highest priority and essentially processes and does the housekeeping for interrupts generated by problem programs and external and I/O devices. In this sense, the supervisor (or the system) is interrupt driven. It is generally hoped that the processing done by the supervisor is kept to a minimum. When the supervisor has completed all of its tasks, it must decide to whom the processor should be allocated. In a single job system, the running program simply retains control of the processor. In a multi-job batch environment, where the system is performance oriented but not response oriented, the processor is usually given to the highest priority job that demands it. This philosophy is generally termed multiprogramming, as discussed previously.
In a time-sharing environment, performance is measured in terms of terminal response, and processor scheduling is oriented towards that end. Thus, a user is given a slice of processor time on a periodic basis-frequently enough to give him the operational advantage of having a machine to himself. The scheduling philosophy is influenced by the user environment (i.e., compute-bound jobs, small jobs, response-oriented jobs) and the method of storage management. Three strategies have been used frequently enough to warrant consideration. The most straightforward method is round robin. Jobs are ordered in a list on a first-in-first-out basis. Whenever a job reaches the end of a time slice or it can no longer use the processor for some reason, it is placed on the end of the list and the next job in line is given a slice of processor time. A strict round robin strategy favors "compute" jobs and "terminal response" jobs equally and tends to be best suited to a core resident storage management system. With an exponential scheduling strategy, several first-in-first-out lists are maintained, each with a given priority. As a job enters the system, it is assigned to a list on the basis of its storage requirements-with lower storage


requirements being assigned a higher priority since they facilitate storage management. The scheduling lists are serviced on a priority basis; no list is serviced unless higher priority lists have been completed. Terminal (or response) oriented jobs are kept in the highest priority list, thus assuring rapid terminal response. If a job is computing at the end of its time slice, then it is placed at the end of the next lowest priority list. However, lower priority lists are given longer time slices, of the order 2t, 4t, 8t, ..., so that once in execution, a compute-bound job stays in execution longer. Exponential scheduling has "human factors" appeal in that a terminal-oriented user, who gets frequent time slices, is very aware of his program behavior, whereas the program behavior of a compute-bound user is generally transparent to him. One of the biggest problems in processor scheduling is the difficulty of developing an algorithm to satisfy all users. The schedule table strategy is an attempt to do that. Each user is given a profile in a schedule table. When a job enters the system, it is assigned default values. As the job develops a history, however, the table values are modified according to the dynamic nature of the program. The scheduler is programmed to use the schedule table in allocating the processor while satisfying both user and installation objectives. The schedule table approach is particularly useful in a paging environment where certain programs require an excess of pages for execution. Once the required pages have been brought into main storage, then the job can be given an appropriate slice of processor time.
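A minimal sketch of the exponential strategy just described, with several first-in-first-out lists and time slices of the order t, 2t, 4t, 8t; the list count and base quantum are arbitrary choices for the example:

    # Hedged sketch of exponential scheduling: several FIFO lists, each
    # with a priority and a time slice that doubles as priority drops.
    # A job still computing at the end of its slice moves one list down.
    from collections import deque

    BASE_QUANTUM = 1            # "t" time units for the top list
    NUM_LISTS = 4               # quanta: t, 2t, 4t, 8t

    queues = [deque() for _ in range(NUM_LISTS)]

    def enqueue(job, level=0):
        queues[level].append(job)    # response-oriented jobs enter at top

    def schedule():
        # Pick the next job: highest-priority non-empty list first.
        for level, q in enumerate(queues):
            if q:
                job = q.popleft()
                quantum = BASE_QUANTUM * (2 ** level)
                return job, level, quantum
        return None

    def quantum_expired(job, level):
        # Still computing at slice end: demote one level (longer slice).
        enqueue(job, min(level + 1, NUM_LISTS - 1))

    enqueue("editor")                    # terminal-oriented job
    enqueue("matrix-inversion")          # compute-bound job
    job, level, quantum = schedule()     # -> ("editor", 0, 1)
    job, level, quantum = schedule()     # -> ("matrix-inversion", 0, 1)
    quantum_expired(job, level)          # demoted; next slice will be 2t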
Scheduling strategies differ to the extent that a
different one probably exists for each installation that
is developing one. As such, scheduling algorithms continue to be the object of mathematical description and
analysis by simulation.
THE LITERATURE
There is a wealth of good papers on operating systems in the computer literature. In fact, the volume
is so great that a literature survey would invariably do
injustice to a great many competent authors. In spite
of this initial disadvantage, a sample of interesting
papers will be mentioned.
Dynamic storage allocation, storage hierarchy, and
large capacity storage have been studied in detail by
Randell and Kuehner14, Freeman15, Lauer16, and Fikes, Lauer, and Vareha17.
Performance, program behavior, and the analysis of system characteristics have been reported by Belady18, Fine, Jackson, and McIsaac19, Wallace and Mason20, Coffman and Varian21, Randell22, Dennis23, Denning24,25,26, Stimler27, Madnick28, Habermann29, Estrin and Kleinrock30, Shulman31, and Belady, Nelson, and Shedler32.
In addition to those already referenced, many operating systems have been implemented for experimental and productive purposes. Still other papers
give a survey of multiprogramming and time-sharing
techniques. Representative literature in these areas is:
Wegner33,34, O'Neill35, Arden, Galler, O'Brien, and Westervelt36, Badger, Johnson, and Philips37, Schwartz, Coffman, and Weissman38, Mendelson and England39, and Kinslow40.
Eventually, all use of an operating system reduces
to a problem in man-machine communication and two
important papers by McKeeman41 and Perlis42 should
be listed in a survey such as this.
Lastly, two compendiums of papers on time-sharing
systems have been published by General Electric and
IBM. The GE collection entitled, A New Remote-Accessed Man-Machine System, describes the MULTICS system at M.I.T. and contains the following
papers:
Corbato, F. J., and V. A. Vyssotsky, "Introduction and Overview of the MULTICS
System."
Glaser, E. L., J. F. Couleur, and G. A. Oliver,
"System Design of a Computer for Time
Sharing Applications."
Vyssotsky, V. A., F. J. Corbato, and R. M.
Graham, "Structure of the MULTICS
Supervisor."
Daley, R. C., and P. G. Neumann, "A General-Purpose File System for Secondary
Storage."
Ossanna, J. F., L. E. Mikus, and S. D. Dunten,
"Communications and Input/Output Switching in a Multiplex Computing System."
David, E. E., and R. M. Fano, "Some
Thoughts about the Social Implications of
Accessible Computing."
The IBM collection entitled, TSS/360 Compendium,
contains the following papers and reports:
Lett, A. S., and W. L. Konigsford, "TSS/360:
A Time Shared Operating System."
Martinson, J. R., "Utilization of Virtual Memory in Time Sharing System/360."

Operating Systems Architecture

McKeehan, J. B., "An Analysis of the TSS/
360 Command System II."
Johnson, O. W., and J. R. Martinson, "Virtual
Memory in Time Sharing System/360."
Lett, A. S., "The Approach to Data Management in Time Sharing System/360."
SUMMARY
Seven properties were introduced for the description,
classification, and comparison of operating systems:
access, utilization, performance, scheduling, storage
management, sharing, and configuration management.
On the basis of these properties, the following types of
operating system were identified: multiprogramming,
hypervisor multiprogramming, time sharing, virtual
systems, and tri-level operating systems. Lastly, two
major areas of resource management were discussed:
storage management and scheduling. Generally, storage
management techniques can be classified as to whether
they apply to multiprogramming or time-sharing, although the dividing line is not well-defined. Multiprogramming techniques presented were fixed partition, region allocation, and roll in/roll out. Time-sharing techniques included core resident, swapping,
and paging. Scheduling methods are similarly related
to either multiprogramming or time sharing. After a
brief discussion, the following time-sharing scheduling
philosophies were introduced: round robin, exponential,
and the schedule table.
Although formal methods have not been applied to
any great extent to operating systems, the interest
level is high and many related papers exist in the
literature. Operating system technology continues as
one of the more challenging areas in the field of computer science.
REFERENCES
1 H KATZAN
Advanced programming: Programming and operating systems
Van Nostrand Reinhold Company 1970
2 A J CRITCHLOW
Generalized multiprogramming and multiprogramming systems
Proceedings of the Fall Joint Computer Conference 1963
3 S ROSEN
IBM operating system/360 concepts and facilities
Programming Systems and Languages McGraw-Hill Book
Company 1967
4 T M DUNN J H MORRISEY
Remote computing-An experimental system. Part I: External
specifications
Proceedings of the Spring Joint Computer Conference 1964


5 A D FALKOFF K E IVERSON
APL/360 user's manual
IBM Thomas J. Watson Research Center Yorktown Heights
New York 1968
6 J G KEMENY T E KURTZ
Basic
Dartmouth College Computation Center Hanover New
Hampshire 1965
7 F J CORBATO V A VYSSOTSKY
Introduction and overview of the MULTICS system
Proceedings of the Fall Joint Computer Conference 1965
8 W T COMFORT
A computing system design for user service
Proceedings of the Fall Joint Computer Conference 1965
9 C T GIBSON
Time-sharing in the IBM system/360: Model 67
Proceedings of the Spring Joint Computer Conference 1966
10 A S LETT W L KONIGSFORD
TSS/360: A time-shared operating system
Proceedings of the Fall Joint Computer Conference 1968
11 N WEIZER G OPPENHEIMER
Virtual memory management in a paging environment
Proceedings of the Spring Joint Computer Conference 1969
12 An introduction to CP-67/CMS
IBM Cambridge Scientific Center Report 320-2032
Cambridge Massachusetts 1969
13 F J CORBATO et al
The compatible time-sharing system
The MIT Press Cambridge Massachusetts 1963
14 B RANDELL C J KUEHNER
Dynamic storage allocation systems
Communications of the ACM May 1968
15 D N FREEMAN
A storage-hierarchy system for batch processing
Proceedings of the Spring Joint Computer Conference 1968
16 H C LAUER
Bulk core in a 360/67 time-sharing system
Proceedings of the Fall Joint Computer Conference 1967
17 R E FIKES H C LAUER A L VAREHA
Steps toward a general-purpose time-sharing system using large
capacity core storage and TSS/360
Proceedings of the 1968 ACM National Conference
18 L A BELADY
A study of replacement algorithms for a virtual storage computer
IBM Systems Journal Volume 5 No 2 1966
19 G H FINE C W JACKSON P V MC ISAAC
Dynamic program behavior under paging
Proceedings of the 1966 ACM National Conference
20 W L WALLACE D L MASON
Degree of multiprogramming in page-on-demand systems
Communications of the ACM June 1968
21 E G COFFMAN L C VARIAN
Further experimental data on the behavior of programs in a
paging environment
Communications of the ACM July 1968
22 B RANDELL
A note on storage fragmentation and program segmentation
Communications of the ACM July 1969
23 J B DENNIS
Segmentation and the design of multiprogrammed computer
systems
Journal of the ACM Volume 12 No 4 1965
24 P J DENNING
The working set model for program behavior
Communications of the ACM May 1968


25 P J DENNING
A statistical model for console behavior in multiuser computers
Communications of the ACM September 1968
26 P J DENNING
Thrashing: Its causes and prevention
Proceedings of the Fall Joint Computer Conference 1968
27 S STIMLER
Some criteria for time-sharing system performance
Communications of the ACM January 1969
28 S MADNICK
Multi-processor software lockout
Proceedings of the 1968 ACM National Conference
29 A N HABERMANN
Prevention of system deadlocks
Communications of the ACM July 1969
30 G ESTRIN L KLEINROCK
Measures, models, and measurements for time-shared computer
utilities
Proceedings of the 1967 ACM National Conference
31 F D SHULMAN
Hardware measurement device for IBM system/360 time-sharing
evaluation
Proceedings of the 1967 ACM National Conference
32 L A BELADY R A NELSON G S SHEDLER
An anomaly in space-time characteristics of certain programs running in a paging machine
Communications of the ACM June 1969
33 P WEGNER
Machine organization for multiprogramming
Proceedings of the 1967 ACM National Conference

34 P WEGNER
Programming languages, information structures, and machine
organization
McGraw-Hill Book Company 1968
35 R W O'NEILL
Experience using a time-sharing multiprogramming system
with dynamic address relocation hardware
Proceedings of the Spring Joint Computer Conference 1967
36 B W ARDEN B A GALLER T C O'BRIEN
F H WESTERVELT
Program and addressing structure in a time-sharing environment
Journal of the ACM Volume 13 No 1 1966
37 G F BADGER E A JOHNSON R W PHILIPS
The Pitt time-sharing system for the IBM systems 360
Proceedings of the Fall Joint Computer Conference 1968
38 J I SCHWARTZ E G COFFMAN C WEISSMAN
A general purpose time-sharing system
Proceedings of the Spring Joint Computer Conference 1964
39 M J MENDELSON A W ENGLAND
The SDS SIGMA 7: A real-time, time-sharing computer
Proceedings of the Fall Joint Computer Conference 1966
40 H A KINSLOW
The time-sharing monitor system
Proceedings of the Fall Joint Computer Conference 1964
41 W M MCKEEMAN
Language directed computer design
Proceedings of the Fall Joint Computer Conference 1967
42 A J PERLIS
The synthesis of algorithmic systems
Proceedings of the 1966 ACM National Conference

Computer resource accounting in a time
sharing environment
by LEE L. SELWYN*
Boston University
Boston, Massachusetts

INTRODUCTION
The past several years have witnessed major stages in
the evolution of time sharing service suppliers toward
the (perhaps) ultimate establishment of a computer
utility or utilities that will, presumably, resemble other
public utilities in many ways. This paper is concerned
with the development of managerial accounting techniques that will enable suppliers to broaden their range
of services and allow some of them to evolve into
vertically integrated information service organizations.

Background

The early time sharing suppliers provided a relatively narrow range of services; the general-purpose
systems usually provided access to but a small number
of languages and virtually no proprietary software.
Indeed, if the latter was available, access to it was provided to customers of the service at no additional
charge. Other services provided only access to some
proprietary applications package, and usually did not
offer the generality of access to a programmable service.
However, with respect to this latter type of supplier, a
charge was indeed imposed for access to the applications software, although it was embedded in the overall
cost of the service.
The particular pricing policy established by any
one supplier was, perhaps as often as not, forced upon
it by some limitation in the computer resource accounting mechanism associated with the time sharing operating system. Hence, many firms "lived with" some

schedule of charges that was almost certainly suboptimal. This, in many instances, resulted in substantial limitations upon the overall variety and type of computing services they could provide.

* The author was a research participant at Project MAC, Massachusetts Institute of Technology, which provided partial support for much of the work reported here. He is presently Assistant Professor, College of Business Administration, Boston University.
In an earlier paper,1 the author, along with D. S.
Diamond, considered the implications of various types
of services upon the pricing policies of a time sharing
service supplier. Several possible strategies were considered at that time. These included:
• Transaction-based charges for access to some applications package or data base, where the approximate quantity of computing power required for a given transaction was either (a) reasonably predictable, (b) an insignificant part of the total costs associated with provision of the service, or (c) negligible with respect to the "value" of the service to its end-user.
• Prices based upon resource usage (e.g., cpu time, connect time, core residence, etc.) for (a) general-purpose programmable services, or (b) access to a proprietary applications package where the quantity of computing resources consumed is unpredictable, highly variable, and a major component of costs. (Of course, the resource prices for access via the proprietary package could be higher than for general access to the system; a sketch of such a computation appears after this list.)
• Flat rate for unlimited access to the system, or perhaps a variation on flat rate, such as elapsed connect time. This may be appropriate in cases where the nature of use of the system was sufficiently limited that this type of rate structure would not result in abuse, or in cases where the charge for a unit of service was so small that the cost of accounting for service was prohibitive. (It should be noted that, with the exception of a poorly designed operating system, such conditions are rather difficult to imagine in practice.)
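Returning to the resource-usage alternative noted in the second item, such a charge reduces to a rate schedule applied to a usage vector. The rates, resource names, and proprietary-package surcharge below are invented for illustration:

    # Hedged sketch of a resource-usage charge: a rate schedule applied
    # to measured consumption, with a higher schedule for access through
    # a proprietary package. All rates and names are invented.
    GENERAL_RATES = {"cpu_seconds": 0.40, "connect_hours": 2.00,
                     "core_kword_hours": 0.05}
    PACKAGE_RATES = {k: 1.5 * v for k, v in GENERAL_RATES.items()}

    def bill(usage, rates):
        # Sum rate * quantity over every metered resource.
        return sum(rates[r] * q for r, q in usage.items())

    session = {"cpu_seconds": 90.0, "connect_hours": 1.5,
               "core_kword_hours": 40.0}
    general_charge = bill(session, GENERAL_RATES)    # 36 + 3 + 2 = 41.00
    package_charge = bill(session, PACKAGE_RATES)    # 1.5 x that = 61.50
    assert abs(package_charge - 1.5 * general_charge) < 1e-9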
Whatever pricing structure is eventually adopted

must be the result of consideration along a number of
dimensions. The pricing structure must be "market
oriented" so as to satisfy the needs and objectives of
customers. The user must be made to feel as if he is
paying only for his share of the computer, and that activities of other users do not affect his charges. He
wants to be able to predict, with some degree of accuracy, what his costs will be. He will base his purchase
decision, presumably, upon the value of the proposed
computing service to his organization. A second dimension is "operations oriented." The pricing structure
must induce users to behave in a manner that is consistent with the best interests of the operators of the
facility. If it is found desirable to have a relatively
level load on the system, then lower prices must prevail
in the less desirable times of the day or week. Users
must not be permitted to "hog" the machine, even if
they have the funds to pay for the service, since this
could cost the computing service other customers who
balk at the (perhaps temporary) degradation of service
they receive.
Thus, a pricing structure for a computer utility
must be flexible enough to handle a variety of service
types, must be accurate to a point that satisfies customers' . needs for fairness an.d predictability, must
encourage the use of proprietary software with royalties
or other payments accruing to the owner, and must be
consistent with the requirements of a well oiled operating procedure. The present paper considers the requirements of a managerial accounting and control

mechanism to support such a pricing structure, with
the stated requirements.
The next section considers the management information requirements of time sharing firms, in light
of the current evolution of the industry. Specific design
objectives of such an accounting mechanism are then
examined in the following section. Finally, the paper
concludes with discussions of implementations of such
systems by the author-one on the IBM 7094 Compatible Time Sharing System at Project MAC, MIT,
and the other on the PDP-10 system operated by
Codon Computer Utilities, Inc.

EVOLVING NATURE OF THE TIME SHARING
BUSINESS

Services offered by time sharing firms
Where the early time sharing service suppliers provided only a limited range of services, it is becoming
increasingly clear that the suppliers of the future will
offer a much wider range of services at a much broader
level. In effect, a time sharing firm may be thought of as a vertically integrated information service organization, with activities ranging from the production of raw computing power, through the development of applications programs and other software and the maintenance of that software, to the retail marketing of its product to the end-user. Figure 1 illustrates a possible organization of an information service firm that provides all of these types of services.

Figure 1-Organization of an information service firm
One of the more significant developments in the time
sharing industry has been the entry of firms that specialize in some subset of these four major activities.
Thus, a software developer may only write an applications package, and may then turn over its maintenance,
marketing and operation to an information service
organization. Alternatively, the same software developer may perform his own maintenance and marketing, and use the time sharing service only as a source of
computing power. The time sharing service, on the
other hand, may choose to establish its own retail
marketing outlets, or may instead sell its services to a
retailer who will assume the marketing risks and
rewards within, perhaps, a particular geographical
region.
As the industry continues to develop along these
lines, the nature of particular arrangements made
between its members will grow increasingly more
complex. The software developer that chooses to do
his own maintenance and marketing will perhaps pay
the time sharing supplier for the computing resources
he has consumed and then go on to charge his own


customers at whatever rate is appropriate. Alternatively, he may license a package to a time sharing
house either on a flat rate or a royalty basis; in either
case the two firms must somehow keep track of the
use of the subsystem by the ultimate customer.
The necessary record keeping associated with these
various levels of activity could, of course, be done at
each level; the computer operations area (or firm)
could simply measure the quantity of computing
resources consumed by each of its customers, which
could be end users but might also be software owners'
packages, independent marketing departments or firms,
etc., and render statements accordingly. The software
owner would then develop his own accounting system
to measure the use of his subsystem by each of his
customers; the marketing organization would similarly
have to account for resources used by each of its end-users, and so on. As an alternative, a single information
system may be constructed that provides for appropriate managerial and financial control at all levels.
The author began work on the development of such
a system while at Project MAC at the Massachusetts
Institute of Technology, and has since designed a
more complete information system structure for the
DEC PDP-10 while serving as a consultant to Codon
Computer Utilities, Inc., of Waltham, Massachusetts.
The principal design objectives, features and capabilities of these systems are described in the following
section.
DESIGN OBJECTIVES
We have already suggested a rationale for the development of an information system for time sharing
services that takes account of the vertically integrated
nature of the business. Such a system must be based
upon several key design objectives, which will be discussed presently.
Computer resource allocation

The system must provide a mechanism whereby it
is possible to allocate access to computer resources
among the various end-users, subsystems, retailers,
and in-house software development efforts.
Even the largest time sharing computers in operation today can support but a mere handful of simultaneous users; in most cases under 50 and in virtually
no instance over 100. As a result, the actions of any
one user on the system can, and often do, have a
noticeable impact upon all of the other users in the
community. In an economic sense, the users are
"oligopsonists," i.e., there are relatively few buyers


(of time sharing service) in the (captive) market associated with the single "monopolistic" machine. Of
course, this description of the market structure does
not hold for even the medium run, let alone the long
run. There are now numerous time sharing service
suppliers, and numerous customers, such that something more closely resembling a competitive market
exists. However, in the very short run, a given user is,
in effect, captured by a given system. As an oligopsonist,
his actions affect both the system and the other users.
Hence, his access to the system must somehow be
controlled, irrespective of his ability to pay. As an
example, a user desirous of obtaining the entire machine
for a period of, perhaps even five minutes would create
much unrest among the other users. As a result, the
time sharing monitor will normally use some sort of
round-robin or related method of scheduling jobs so as
to prevent a user from obtaining this much service in
this short a time. But what of the user who requires
some large number of simultaneous lines and many·
connect hours, plus perhaps some very large quantity
of m~ss storage, over a relatively short period of time,
perhaps a week or two. This could place an unnecessarily heavy load on the system and once again cause
some unrest among the other users. This would, of
course, be perfectly reasonable, from the point of view
of system management, if this new heavy demand were
permanent; but if it is only a very temporary thing,
then the system, and the remaining customers, must
be protected. In effect, some rationing scheme is indicated. By such a mechanism, a user reserves, in advance, that quantity of system resources he is likely
to require during some time interval, perhaps a month.
By this technique, system management may allocate
available resources among its customers in an attempt
to even the load on the system. For example, the day
may be divided into several periods, such as peak and
off-peak, and different rations might apply to each
such period for any customer. By this technique, the
user can be assured of more even levels of service at
all times, and system management may more accurately
forecast its facilities requirements.
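A rationing mechanism of this kind can be suggested briefly. In the sketch below, the period names and quantities are invented; each customer reserves a ration per period, and use beyond it is refused:

    # Hedged sketch of resource rationing: a customer reserves, in
    # advance, a quantity of a resource for each period (peak/off-peak),
    # and the system checks consumption against the ration.
    rations = {("acme", "peak"): 20.0,
               ("acme", "off-peak"): 60.0}        # cpu hours, invented
    used = {}

    def consume(customer, period, cpu_hours):
        # Record usage if within the period's ration; otherwise refuse.
        key = (customer, period)
        if used.get(key, 0.0) + cpu_hours > rations.get(key, 0.0):
            return False             # over the reserved ration: refused
        used[key] = used.get(key, 0.0) + cpu_hours
        return True

    assert consume("acme", "peak", 15.0)       # within ration
    assert not consume("acme", "peak", 10.0)   # would exceed 20 hours
    assert consume("acme", "off-peak", 10.0)   # separate off-peak ration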

Flexibility in pricing

Although the resource management system will use
the firm's pricing structure as a basis for its record-keeping, it should by no means be bound or limited by
the specific pricing mechanism that has been selected
by management. There are several reasons for this.
Perhaps the most obvious is that the marketing strategy
of the firm, and hence its pricing policy, may be subject
to some modification as time goes by. Moreover, if in


fact the firm is going to deal with independent software developers, independent retailers, or both, it will not want to impose its pricing structure upon these firms in their dealings with their own customers. Hence, the system should be capable of supporting a variety of pricing policies at the same time. It should have the capability of charging the intermediary at prices established by mutual negotiation or any other means, and then permit the intermediary to impose virtually any sort of charge upon its own customers. This separation of wholesale and retail types of charges should be reflected in all other parts of the system, from resource allocation to billing.

Decentralized management
Since the time sharing supplier will be dealing with
some intermediaries, it is necessary that the latter be
provided with some resource management and administration tools, thereby enabling control over the activities of the intermediary's customers. Moreover,
the actual customers of both the time sharing firm and
its intermediaries can greatly benefit from such local
resource management facilities. Thus, a customer's
project leader should be able to directly manage use
of the computer among the several participants in his
project. He should be permitted to allocate some set
of resources given to him among users in his project.
Similarly, an intermediary should be allocated some
pool of resources and should be permitted to directly
allocate these among its customers according to whatever method it wishes.

Access to system resources and services

The foregoing implies a fairly tight method of controlling access to the computer. This is especially true
where price differences apply to different classes and
types of services. Thus, a user of a proprietary subsystem who pays a higher price for system resources
than does a general time sharing service user must be
restricted to use the system only through the account
specifically established for this purpose. Similarly, a
general purpose time sharing service user must be prevented from gaining access to any proprietary service
for which some unusual charge applies, without first
establishing an account for the use of such services.
For any end-user, the operating system must know (1)
who is responsible for charges (i.e., a customer or an
intermediary), (2) what type of service this user has
subscribed for, (3) what resources have been allocated
to this end-user, and (4) which price schedule applies
to the user and which to the responsible account
(which may, of course, be the same).
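
As a sketch of how these four items might be carried in an account record (all field names here are hypothetical, not from any actual system):

from dataclasses import dataclass

@dataclass
class UserAccount:
    user_id: str
    responsible_account: str     # (1) who is billed: a customer or an intermediary
    service_type: str            # (2) e.g., "general" or "proprietary-subsystem"
    allocations: dict            # (3) resources allocated to this end-user
    user_price_schedule: str     # (4) schedule applied to the user...
    billing_price_schedule: str  # ...and to the responsible account

def may_use(account: UserAccount, requested_service: str) -> bool:
    """Restrict each account to the class of service it subscribed for."""
    return account.service_type == requested_service

acct = UserAccount("jones", "retailer-7", "proprietary-subsystem",
                   {"cpu_min": 20}, "retail-A", "wholesale-B")
print(may_use(acct, "general"))                  # False: must use subsystem account
print(may_use(acct, "proprietary-subsystem"))    # True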

Detail of system usage

It is possible for a time sharing system to keep a
detailed log of all transactions that take place within
it. It is, for example, possible for every system command to create a message, stored somewhere in the
system, that indicates the time, date, name of the
command, the identity of the user, and the quantity
of resources used in the execution of the command.
This would, however, be a fairly costly procedure, in
that system overhead would be increased to handle
the creation and storage of each of these messages. From a
managerial point of view this may be quite unnecessary.
However, there are certain types of individual services
which must be accounted for in great detail, both from
the standpoint of billing and collection of revenues,
and from the standpoint of operations analysis and
control. Thus, services provided by proprietary software that is to be charged on the basis of transactions
performed rather than computer resources consumed
must be maintained in great detail so that the customer
may know how he has spent his money and the
software developer may examine the relative efficiency
of his programs. Moreover, the time sharing system
management will want this information to compute
that portion of the subsystem's total revenues it is to retain
for providing service.
Subsystem usage accounting

Proprietary subsystems that offer services to be
charged on a "value of service" or transaction basis
necessarily have pricing structures radically different
from any resource-based rate schedule. In fact, just
about the only similarity between any two such subsystems is that they both compute their rates according
to some unique set of rules determined by the developer
of the application package. The resource accounting
mechanism must not only cope with such a variety
of structures, but should additionally encourage innovation on the part of software developers to use any
arbitrary mechanism that they desire. Hence, the accounting system should permit the subsystem to compute its charges and to report to the accounting mechanism these charges together with some identification
of the nature of the service provided to the end-user.
The accounting system should be able to record these
subsystem-imposed charges in the end-user's account,
retain the details of the transactions for billing and
analysis purposes, and impose upon the subsystem
itself a charge based upon the resources consumed in
completing the user-initiated transaction. In effect,
the subsystem becomes the customer of the time sharing
system, and the end user is charged by the time sharing
system on behalf of the subsystem. (That is, the time
sharing firm offers, as a service to software developers,
a means by which it will handle all of the paper work
associated with the software developer's customers'
accounts.)
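
A minimal sketch of this reporting arrangement follows (function and account names, and the resource cost figure, are invented; the payroll-check example echoes one given later in this paper):

transaction_log = []          # retained detail, for billing and analysis
balances = {"enduser-1": 100.00, "subsystem-payroll": 0.00}

def report_transaction(subsystem, end_user, description, charge, resource_cost):
    """The subsystem computes its own 'value of service' charge and
    reports it; the accounting mechanism posts it to the end-user,
    retains the detail, and bills the subsystem for resources consumed."""
    balances[end_user] -= charge           # charged on the subsystem's behalf
    balances[subsystem] -= resource_cost   # subsystem pays for resources it used
    transaction_log.append((subsystem, end_user, description, charge))

report_transaction("subsystem-payroll", "enduser-1",
                   "40 payroll checks @ 20 cents", 8.00, 1.15)
print(balances, transaction_log)
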
IMPLEMENTATION
Many of the features mentioned here were included
in a resource accounting system implemented by the
author on the IBM 7094 Compatible Time Sharing
System developed at Project MAC at the Massachusetts Institute of Technology.2,3 The "BUYTIM"
resource allocation system was designed to operate
wholly within the CTSS operating system, and required no monitor-level modifications to the CTSS
software. The CTSS implementation is described below.
When Project MAC started charging for CTSS
usage in January of 1968, a need arose for some changes
in the mechanism by which users requested additions
to or changes in their set of allotments of computer
resources. Formerly, these resources were allocated
directly-that is, by specific allocations to each user,
of time, divided into five shifts, and disk records for
storage of user files. Under the new scheme, users
receive dollar budget allocations, from some sponsored
project, and use these funds to purchase resources on
the CTSS System. The mechanism described here
serves to give the individual user more direct control
over the means by which his available funds are spent.
The system provides for computer resource management at a number of levels, ranging from top-level
administration down to the level of the individual
user. Under the pre-existing resource management
system this function was assumed, within the MIT
Information Processing Center, at two levels: Administration and User Group Leader. The Administration
held ultimate responsibility for computer resource
allocation; it classified all users into one of about twenty
user groups, and determined the particular mix of
resources that were to be made available to each of
the groups. The Group Leader was responsible for apportioning the resources allotted to his group among
its members. This responsibility is not altered under
the BUYTIM set-up. However, what is provided to
the Group Leader is a means for further delegating his
responsibility-and power-to individual members of
his group at two principal levels: the Problem Number
Leader and the Individual User.
The Problem Number Leader is afforded the capability of apportioning resources allotted, by the
Group Leader, to his problem number among its members, a relationship vis-a-vis the Group Leader not unlike the latter's relationship to CTSS Administration.
The Problem Number Leader is further afforded the
capability of delegating some of his authority to individual members of the problem group, with several
levels of decision-making capability. That is, the
problem number leader may designate another user
in the problem group to have identical capabilities as
himself, or he may choose to limit these capabilities
in some way. At the limit, the individual may be given
the ability to make changes in only his own account,
or may be denied even this capability.
User preferences for computer resource allocations
are subject to two types of limitations: pricing and
rationing. Resource prices have been established by
CTSS administration and govern the allocation of
time and disk, as well as of several other "commodities."
Only the time and disk are covered by this system. In
addition to the constraints of the costs of CTSS resources, users are further constrained by several restrictions which limit their ability to spend their allocation of funds as they might please. First, each user
group is given a set of Group Limits containing the
maximum amounts of each commodity that may be
allocated among the group's members. Under the
BUYTIM system, the group leader may further break
this set of limits down among individual problem numbers in the group, or may have all problem numbers
share the common pool of resources available to the
group, or a combination of these techniques. To further
restrict the user's freedom, the group leader may establish a maximum increment by which a user may increase his allocation over his actual usage to date. At
the beginning of each calendar month individual user
time allocations are reduced to nominal beginning of
month levels, also established by the group leader for
each user, thus requiring the latter to repurchase additional requirements for the new month.
The BUYTIM system communicates with the CTSS
Administrative management programs by updating
the Group Allocation File (GAF), a method used by
all of the individual user groups within CTSS. As a
result, it is not necessary for all user groups to subscribe to and implement the BUYTIM system; its use
is totally transparent to the pre-existing allocation
mechanism.
Two new file types were created for the purposes of
effecting communication between the individual user
and the GAF, and a number of programs were written
to permit suitably formatted and protected modifications of these files. The principal file in the system is
the 'TIMACT' file. There is one such file for each problem number in a subscribing user group. In addition,
there is a file of the form "GRPXX ALLOC" for each
user group. All of these files are maintained in a special
directory in Private, Protected mode. Hence, these
files are accessible directly to the Group Leader, and
are made accessible to users only through a special
privileged CTSS command which may modify these
files. Figure 3 illustrates the file structure in this system.

[Figure 3-BUYTIM system files and programs]

The use of these TIMACT files makes it possible for
the Group Leader to subdivide the overall Group allocations of time and disk into blocks available to each
problem number within the group. However, in some
cases, particularly where the problem number contains
only one or a few programmers, this feature may be
undesirable, since the overall group limits would be
segmented to a point where flexibility to meet individual
user needs, within the pricing mechanism, would be
seriously restricted. In such cases, the Group Leader
may assign a particular problem number to a common
pool of resources. This means that, instead of getting
problem number limits from the TIMACT file, the
limits come from the file GRPXX ALLOC.
The BUYTIM command provides an on-line mechanism for requesting changes in CTSS time and disk
allocations. It permits a user to "spend" the funds
allotted to him, at the prevailing prices for the several
commodities as contained in the file SYSTEM PRICES,
subject to availability, and also enables the user to
change his password. In addition, BUYTIM provides
varying capabilities for project leaders to make changes
in funds and computer resource allocations of other
users within the same problem number group. A wide
range of access privileges are available for this purpose.
Although the individual CTSS user and his project
leader are provided with the ability to trade-off among
the various CTSS resources at the established prices,
this capability is limited to the extent that such resources are available to the project group.

Classes of use of BUYTIM
There are seven distinct classes of use of BUYTIM,
each of which affords the classified user certain
privileges, in terms of what types of changes he may
request via the command. The class codes for each user
are contained in the TIMACT file, which also contains
the other account information about the user. The
class designations are as follows:

0. No access to BUYTIM whatsoever.
1. User may change his own time and disk.
2. User may change his own time, disk, and password.
3. User may change his own time, disk and password,
and may change time and disk of other users in the
problem number group.
4. User may change his own time, disk and password,
and the time, disk and passwords of other users in the
same problem number group.
5. User may change his own time, disk, and password, and the time, disk, and funds of other users in
the same problem number group.
6. User may change time, disk, password and funds
allocations of himself and all other users in the same
problem number group.

Change codes

There are nine change codes available, plus a termination code, as follows:

Code     Description
t1       Shift 1 time
t2       Shift 2 time
t3       Shift 3 time
t4       Shift 4 time
t5       Shift 5 time
disk     Disk records
pass     Password changes
funds    Change in funds allocation
prpass   Print user password (classes 4 and 6 only)

The '*' is used as a termination code, to denote the completion of a series of changes.

To change time, disk or funds allocations, a user types the appropriate change code followed by the amount of the change in integral minutes of time, disk records, or dollars of funds allocation. The amount of the change may be a signed or unsigned number. If signed (e.g., +25, -40) the present level will be changed by that amount. If unsigned, the number is assumed to be the new level of the allocation, and will thus replace the old one. For example:

TYPE CHANGES:
t1 +10
t3 -5
t4 20
disk +50
funds +100
*

This will increase shift 1 time by 10 minutes, reduce shift 3 by 5 minutes, set the shift 4 allocation to 20 minutes, increase the disk allocation by 50 records and increase the funds allotment by $100.

Charges are levied by BUYTIM on the basis of allocation, rather than usage. An unused allocation may be returned for full credit at the prevailing prices at the time of the return. In the case of time, the charge imposed (or credited) is 1/60 of the prevailing hourly rate for each integral minute of time allocation purchased (or returned). In the case of disk space, the charge (or credit) is based upon 1/30 of the prevailing rate per disk record per month times the number of days left in the month. Thus, a disk allotment is paid through the end of the month, and credits are figured from the date the space is released until the end of the month. BUYTIM does not consider the other charges, such as those for the monthly account maintenance fee and the U.F.D. charge. These should be estimated by the Group Leader and subtracted from the funds balance appearing in TIMACT.

Because BUYTIM charges on the basis of allocation rather than usage, the dollar balances obtained from these alternate methods will usually differ somewhat. However, note that BUYTIM is provided for the convenience of users and Group Leaders, and where a discrepancy exists the figures based upon usage, as provided by CTSS charge statements, will prevail.

Allocation restrictions

BUYTIM will reject several kinds of transactions:

1. Unauthorized use of BUYTIM (class 0 user).
2. Attempt to change account of another user (classes 1 and 2).
3. Attempt to change specific items of another user or of own account not permitted by class designation.
4. Attempt to increase allocation of time or disk above maximum increment set by Group Leader or Problem Leader.
5. Attempt to reduce allocation below current usage.
6. Insufficient funds.
7. Increase in allocation exceeds available resources.
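
For concreteness, the two charging rules just described can be written out as follows (a sketch; the rates used are invented for illustration):

def time_charge(minutes, hourly_rate):
    # 1/60 of the prevailing hourly rate per integral minute purchased
    # (negative minutes model a returned allocation, credited in full)
    return minutes * hourly_rate / 60.0

def disk_charge(records, monthly_rate_per_record, days_left_in_month):
    # 1/30 of the monthly per-record rate, times days left in the month,
    # so a disk allotment is paid through the end of the month
    return records * monthly_rate_per_record / 30.0 * days_left_in_month

print(time_charge(+10, hourly_rate=360.00))                # buy 10 shift minutes
print(disk_charge(+50, monthly_rate_per_record=0.06,
                  days_left_in_month=12))                  # buy 50 disk records
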
There are several capabilities available to the Group
Leader that are not available to the individual user or
problem number leader within the BUYTIM command.
Several other programs were written to facilitate these
functions by the Group Leader. For example, the
group leader may add or delete users, may assign
various types of access privileges and restrictions, and
may apportion the group resource limits among the
individual problem numbers in the group.
All changes made by users, problem leaders, and
group leaders are recorded in the appropriate TIMACT
file (although the group leader may occasionally
modify the Group Allocation File directly). The
modified TIMACT file records must be posted to the
Group Allocation File in order for the CTSS resource
allocation programs to recognize the changes. This is
accomplished by means of a self-rescheduling Foreground Initiated Background job run each evening by
the group leader. (A mechanism in CTSS permits such
jobs to be scheduled and run automatically, without
the presence of a user at the time the job is run.) Hence,
changes made during any given day cannot be recognized by the time sharing system until they have,
first, been posted to the Group Allocation File, and
second, been copied from the Group Allocation File
into the primary system accounting files. This latter
activity usually occurs at midnight, also via a self-rescheduling job. Thus, changes made via the BUYTIM
mechanism will usually take effect at or around midnight following the change request.


PDP-10 implementation

A major extension of the concepts developed in the
BUYTIM system has been designed for the PDP-10
time sharing system in operation at Codon Computer
Utilities, Inc.4 The new features of this extension are
described here. It should be noted, however, that unlike
the CTSS version, several monitor-level system modifications were required for the PDP-10 design. As a
result, the system is not transparent to the operating
system, but forms an integral part of it. Besides extending his own concepts, the author wishes to acknowledge the work of Thomas H. Van Vleck of MIT, who
developed the overall CTSS accounting mechanism
(within which BUYTIM operated), for a number of the
ideas incorporated into the PDP-10 version.
Unlike the CTSS case, the newer version was built
directly into the time sharing monitor, replacing the
manufacturer-supplied resource accounting mechanism,
rather than simply operating within it. As a result, any
alterations to the user accounts take effect immediately,
rather than at some later time when the changes might
be posted to the user accounts, as in the CTSS case.
Several important new features were, additionally,
added.

Dynamic pricing

Under the CTSS version, the principal control on
usage, during real-time operation of the time sharing
system, was the central processor time consumed by
an individual user. Thus, a user might have received
an allocation of, say, 20 minutes of prime shift processor time for the current month. When that allocation
is exhausted, the user is automatically logged off the
computer by the monitor, and is prevented from logging
back on during that shift for the duration of the
month, or until he can secure from the group leader,
or through the use of BUYTIM, additional allocation
for the shift. In a commercial environment, the operator
of a computer utility may desire to control other
resources besides the amount of straight processor time
employed by his customers. Moreover, his pricing
mechanism may be non-linear, in that lower unit
prices may apply for larger quantities of a given
resource consumed. Further, depending upon the
nature of the customer (e.g., an end user or an intermediary), the nature of his application, and any special
terms negotiated with him in contracting for service,
it is conceivable that several different rate schedules
may have to be devised and used simultaneously
during the real-time operation of the computer. In the

PDP-10 version, four distinct types of charges may
apply to a user account during a console session.
Central Processor, or "Computation"
Transaction service usage
Connect time
I/O device usage
The computation charge is based upon the processor
time and the core residence during execution of the
user's job. The applicable rate may be non-linear, in
that as core residence increases, the unit charge for a
space-time unit (kilo-core-second) may vary.
Certain services of a computer utility may be
marketed in terms of the "service" they provide,
rather than the quantity of system resources they
consume. Such "transaction" services are charged at
varying rates, the exact charge being determined by
the particular proprietary program that provides the
service. The charge may be based upon what is being
done, how much of it is being done, and, perhaps in
some cases, who is doing it.
Connect time is the elapsed time between login and
logout. It may vary according to the type of terminal
(e.g., use of the system from a high-speed CRT display
terminal may be charged at a higher connect-time rate),
and perhaps at a different rate depending upon whether
the job is in an "attached" state or in a "detached" state, wherein no console is physically connected to the computer for the detached job.
Use of I/O devices, such as line printers, magnetic
tape drives, card readers and punches, etc., is charged
for at rates applicable to each device. Further, a set-up
or minimum charge may also apply in some instances.
In each of the four types of charges, the specific
structure may vary among classes of users, as well as
with the time-shift in which the service is provided.
Under the dynamic pricing technique, each user is
given money allocations for each applicable time shift,
and is free to spend his allocated limits on any of the
four types of services just described. Whenever some
service is provided, except in the case of transaction
services, the system computes the quantity consumed,
determines the applicable rate structure ,for the customer in the current time shift, and proceeds to charge
the account the money cost of the service. If this charge
brings the balance for the shift below zero, the user
may be logged off the computer, with an appropriate
message explaining the reason for this action. In the
case of transaction services, the transaction program
itself computes the applicable charge and, by a suitable
monitor call, conveys this information to the monitor.
Specific resource usage (computation, connect time,
and I/O device usage) is charged by the monitor to
the transaction service, in a special account maintained
for this purpose, and not to the user. No negative
balance check is made against the transaction service
account. When the transaction service informs the
monitor of a charge to the user, the user's account is
charged and a negative balance check is made. Control
is returned to the transaction service in any event but,
where a negative balance condition exists, the transaction program is so informed and is expected to take
action. Thus, there is a very important assumption
made about the nature of a transaction program. It is
considered to be a well debugged program that is fully
responsible for aIr' accounting interactions with the
monitor on behalf of the user. It must compute the
charge, inform the monitor of its conclusion, and take
appropriate action in the event the user's funds have
been exhausted.
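
A sketch of this monitor-side charging step, under the rules just described (names, rates and balances are invented for illustration):

shift_balance = {("user-1", "prime"): 5.00}

def log_off(user, reason):
    print(f"{user} logged off: {reason}")

def charge(user, shift, dollars, is_transaction_service=False):
    """Debit the user's balance for the shift; on a negative balance,
    log off a directly charged user, but merely inform a transaction
    service program, which is expected to take its own action."""
    key = (user, shift)
    shift_balance[key] -= dollars
    if shift_balance[key] < 0:
        if is_transaction_service:
            return "negative-balance"   # control returns to the subsystem
        log_off(user, reason="funds for this shift exhausted")
    return "ok"

charge("user-1", "prime", 4.00)    # ok; balance now 1.00
charge("user-1", "prime", 2.00)    # balance now negative: user logged off
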
Under the PDP-10 time sharing system, it is possible
for the same user to be logged on the system several
times simultaneously. Thus, it becomes necessary to
coordinate the charges incurred by each of the several
possible jobs in simultaneous operation in a single
funds balance. Thus, as soon as any one of the jobs
logged in under the same project-programmer number
does something that results in a negative balance, all
of the jobs subject to this balance will be logged off the
system.
Bills are rendered to a responsible account, rather
than directly to the user. Of course, these may be the
same person. However, by this mechanism, an independent retailer may render his own bills to his customers. Alternatively, the billing system may prepare
such bills for the retailer. The flexible price structure
mechanism enables the individual user to be charged,
during real-time use of the computer, under a rate
structure that may differ from the one for which
services will be billed to the responsible account. Thus,
an independent retailer may convey to the computer
utility his own rate structure, based upon which the
utility will charge and prepare bills for the retailer's
customers. The wholesale prices charged by the utility
to the retailer need not be the same, either in level or
structure, as the latter's retail prices.

Accounting for services used

The large variety of things that a user may be charged
for in a computer utility system requires an accounting
mechanism that can collect, maintain, display and
summarize the detail of individual user activities.
Moreover, in the case of proprietary transaction
services, detail as to the nature of individual types of
services provided, their quantities and applicable
charges, is most desirable from the point-of-view of the
customer. Further, system and application subsystem
management will want such detail so as to best analyze
the relative efficiencies of the various services offered,
to perform market demand studies, and other management analyses.
The accounting system designed for Codon seeks
to provide these features. Each user account record
maintains a breakdown of the dollar value of resources
used in each of the four principal categories and in
each of the applicable time shifts. A user may obtain
this information for himself from a logged in console
by a suitable monitor command.
Thus, during a console session, the monitor maintains
a record of the consumption of the four "temporal"
resources and, upon logging off the system, reports
these figures back to the user account record. Moreover,
a record of the' individual console session is created,
containing the resource usage data, time and date, and
other relevant information, and is maintained for
later processing and auditing purposes.
In addition to the temporal resources (temporal in
that they are consumed over time) the system accounts for use of "spatial" resources, such as mass
storage occupancy. The technique here is quite analogous to the scheme employed within CTSS, as described earlier. Besides accounting for disk storage, the
monitor will not permit a file to be opened for writing
if the quota of disk blocks has been exceeded. When
such a condition occurs, the user must either delete
some files to free up some space, or have his allocation
of disk space increased.
Specific transaction services are recorded as they
are provided, by means of records that contain data
on the user's identification, the type and quantity of
transaction services provided, and the applicable
charge. (E.g., preparation of 40 payroll checks @ 20¢
each, $8.00.) At the same time, the charge to be imposed upon the user is also added to the transaction
service's account for auditing purposes. At the end of
an accounting period (e.g., a month) the transaction
service records are sorted by user and summary statements are prepared showing the basis for the aggregate
transaction service charge.
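
A sketch of that end-of-period pass (the record layout is invented for illustration):

from itertools import groupby
from operator import itemgetter

# (user, type of service, quantity, charge) -- one record per transaction
records = [
    ("jones", "payroll check", 40, 8.00),
    ("smith", "invoice", 12, 3.00),
    ("jones", "invoice", 5, 1.25),
]

# Sort the period's records by user, then print a summary statement
# per user showing the basis for the aggregate transaction charge.
for user, items in groupby(sorted(records, key=itemgetter(0)), key=itemgetter(0)):
    items = list(items)
    total = sum(charge for _, _, _, charge in items)
    print(f"{user}: {len(items)} transactions, total ${total:.2f}")
    for _, kind, qty, charge in items:
        print(f"  {qty} x {kind} = ${charge:.2f}")
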
Usage of input/output devices is also handled by a
similar detailed recording procedure. A record is maintained for each access to a particular device, indicating
the duration of access and the applicable charge for
the service. These records may also be summarized at
the close of an accounting period and presented in a detailed statement to the user. System management may
also be provided with a detailed picture of the relative
demand for access to the various peripherals on the
computer.
Thus, the billing process will generate a summary
bill, which provides aggregate charges for each of the
four temporal resources, the principal spatial resource,
disk, and any other charges related to the account.
Further, a detailed statement of specific transaction
services supplied may be produced, as may a similar
detailed breakdown of I/O device access. Finally, an
historical record of the individual console sessions may
be generated by the accounting system.
Applications subsystem owners, whose software is
offered on a transaction basis by the computer utility,
may be provided with a statement of the resources
consumed by their respective programs, as well as the
revenues generated by their subsystem's customers. A
similar type of detail may be provided in the case of
usage of peripheral devices.
Finally, it should be pointed out that the accounting
system incorporates all transactions between the
customer and the utility, and provides a convenient
mechanism for posting miscellaneous charges and
credits, such as a charge for consulting service or a
credit for a system failure, directly to the user's account
record maintained on the system.

System access

The nature of access to the computer, by individual
users, may be restricted by the system administrators
in several ways. In the simplest case, it may be desired
to restrict access to only some of the time shifts during
which the machine is in operation. This may be handled
quite simply by setting the funds allocations in the
restricted shifts to zero. The system will not permit a
user to log in during these times, since the balance remaining in his account, for the shift, is zero.
Where a user subscribes to the computer utility for
a specific proprietary applications subsystem, it may be
desired to restrict access to that subsystem. Moreover,
access to the subsystem programs must be restricted
to only those users who have subscribed to its service.
These objectives may be accomplished by a fairly
straightforward procedure. First, applications subsystems are accessed by special system commands
which perform a login procedure for the user invoking
them. If the user is to be restricted to a subsystem, the
name, or some other unique identification, of that subsystem is placed in the account record. The normal
system login procedure checks to see that no such
restriction exists; if it does, login as a general time
sharing user is not permitted. The login procedure in
each subsystem must verify that the subsystem restriction matches its own requirements. For
example, a subsystem might accept several legal subsystem identifications in the user account record, but

assign different levels of service privileges to the user
based upon the particular code that is present in his
account. (Of course, a subsystem might not perform
any such check, allowing any subscriber to access it
and purchase its services.)
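
A sketch of these two login checks (account fields and subsystem codes are invented for illustration):

accounts = {
    "jones": {"subsystem": "PAYROLL"},   # restricted to one subsystem
    "smith": {"subsystem": None},        # general time sharing user
}

def general_login(user):
    """The normal login checks that no subsystem restriction exists."""
    if accounts[user]["subsystem"] is not None:
        return "refused: user restricted to a proprietary subsystem"
    return "logged in (general service)"

def subsystem_login(user, subsystem, accepted_codes):
    """Each subsystem's own login verifies the code in the account;
    different codes may carry different service privileges."""
    code = accounts[user]["subsystem"]
    if code in accepted_codes:
        return f"logged in to {subsystem} with privilege code {code}"
    return "refused: no subscription to this subsystem"

print(general_login("jones"))                            # refused
print(subsystem_login("jones", "PAYROLL", {"PAYROLL"}))  # accepted
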
On occasion, it may be desirable to limit the number
of simultaneous jobs that may be active for a group of
users, e.g., a number of individual users all associated
with the same customer of the computer utility. This
capability permits the utility to offer guaranteed
access for a stated number of individual users to a large
customer organization. This concept may be implemented in two ways: a guaranteed minimum number
of lines, or a guaranteed total number of lines. In the
former, the customer would be guaranteed that he
could always have some specified number of lines; if
he were using all of these, he could obtain additional
lines on a contention basis with other customers who
do not have any guaranteed access. In the latter case,
the customer cannot exceed his guaranteed number of
lines; additional users will be prevented from logging
in until another user in the group has logged out.
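
The two schemes differ only in how the guarantee bounds a login attempt; a sketch (the numbers are illustrative):

def may_log_in(active, guaranteed, total_lines_free, scheme):
    """active: jobs the customer group already has logged in."""
    if scheme == "guaranteed-minimum":
        # Always allowed up to the guarantee; beyond it, contend for
        # whatever unguaranteed lines remain on the system.
        return active < guaranteed or total_lines_free > 0
    if scheme == "guaranteed-total":
        # Never more than the guaranteed number of lines.
        return active < guaranteed
    raise ValueError(scheme)

print(may_log_in(5, 5, total_lines_free=2, scheme="guaranteed-minimum"))  # True
print(may_log_in(5, 5, total_lines_free=2, scheme="guaranteed-total"))    # False
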
We have already considered the case of multiple jobs
for the same user account. The system has the capability
of placing an upper limit on the number of times a
user may be logged in. However, for a user in a guaranteed access group, each of his multiple logins on the
same user account will count toward the access guarantee, even though only a single user is logged in.

Resource management

The preceding sections have described the control
capabilities of the PDP-10 design. In order to administer the system, a mechanism has been provided for
communicating with the accounting structure at a
number of levels, similar, but with some significant
extensions, to the BUYTIM system on CTSS described
earlier. The program that enables such communication
is implemented as an application subsystem that may
be made available to virtually all users of the computer,
but subject to several distinct levels of access. The
same type of management levels that exist in the CTSS
version are available here-utility administration, user
group leader, project leader (analogous to problem
number leader in CTSS) an individual user. In the
CTSS version the group leader could allocate individual
sets of resource limits to each problem number, or place
some or all in a common pool. A similar capability
exists here, except that any number of resource pools
may be established within the same user group, instead
of only the one available in CTSS. Generally, each
level of management has greater direct control capabilities than were available within the CTSS version.
The following table summarizes the principal capabilities of each level of resource management, under
the PDP-10 version.
RESOURCE MANAGEMENT CAPABILITIES

Level: User
Capabilities:
May have no access to subsystem, or
May do any of the following, as determined by project leader:
Alter own resource limits, password, or funds.
Alter any or all of the above for other users in same project.

Level: Project Leader
Capabilities:
May do any of the above within his project.
May assign any of the above levels of permission to users in his project.
Add or delete users in his project.
Examine the accounts, including passwords, of users in his project.
Designate specific applications subsystem access to any user in his project, subject to restrictions imposed by Group Leader.

Level: Group Leader
Capabilities:
May do any of the above.
Designate project leaders for projects within his group.
Assign and remove individual project numbers to and from resource pools.
Change allocation limits for projects and resource pools within the group.

Level: Utility Administration
Capabilities:
May alter rate structure applicable to individual user accounts.
Alter guaranteed access group assignment for individual users, projects and customer groups.
Change overall resource limits for individual customer groups.
May add and remove project numbers to or from a customer group.
May create and destroy customer groups.
May alter the number of lines, as well as the nature of, a guaranteed access group.
May enter any extraordinary charge or credit to any individual user account.
May set, for each user, the number of times he may be simultaneously logged in.
May establish and modify the list of subsystems available to the user group.

Security of the system

Several procedures and mechanisms have been incorporated into the design of the resource management
system to provide protection from accidental or
deliberate sabotage by users. All accounting files are
maintained in a manner such that access to them is
only possible by means of one of several system commands. Further, these commands are responsible for
restricting access to the accounting files according to
the level of authority of the user invoking the command.
This permission information is maintained in the user
account record. When executing the accounting commands, the user is prevented from returning to monitor
level until all files have been closed and access has
been terminated. Moreover, should a user accidentally
be able to get to monitor level, the system will not
permit him to do anything except log off the computer.
Similar restrictions apply in the case of use of an
application subsystem. Once access has been gained,
control cannot be returned to the time sharing monitor
unless and until the program so desires. User-initiated
interrupts are intercepted by the monitor and control
is passed back to the proprietary subsystem then in
execution. The subsystem is, of course, responsible for
taking whatever action it considers appropriate. A
normal exit from such a subsystem implies a logout. If
a user wishes to use both the subsystem and the general
purpose time sharing service, he must establish separate
accounts for each purpose.
We have attempted to present a description of how
a managerial accounting information system for a
computer utility can significantly expand the scope of
activities of such an organization. The approach to
system design has been the result of actual administrative experience with such systems over a period of
several years. This experience has shown the importance
of such capabilities.
As time sharing service organizations become more
complex in their structure and diversified in their
activities, the need for a well-structured information
management mechanism will no doubt become more
critical. This implies that time sharing operating systems will, more and more, have to be designed with the
necessities of system administration in mind. Operating
systems deficient in this respect will find it difficult to
provide the range of services necessary for survival in
an increasingly competitive industry environment.
REFERENCES

1 D S DIAMOND, L L SELWYN
Considerations for computer utility pricing policies
Proceedings of 23rd National Conference, Association for Computing Machinery, pp 189-200, Brandon/Systems Press, Princeton, New Jersey, 1968
2 P A CRISMAN (Ed)
The compatible time sharing system: A programmer's guide
Second Edition, MIT Press, Cambridge, Massachusetts, 1965
3 L L SELWYN
BUYTIM: A system for CTSS resource control
Unpublished memorandum, Project MAC, MIT, Rev July 1969
4 Codon resource allocation system: User's guide
Codon Computer Utilities, Codon Corporation, Waltham, Massachusetts, 1970

Multiple consoles: A basis for communication
growth in large systems
by DENNIS W. ANDREWS and RONALD A. RADICE
International Business Machines Corporation
Kingston, New York

INTRODUCTION
The intent of this paper is to discuss the development
of multiple consoles which on the surface appears to
be a simple and straightforward concept. This concept,
however, should not be considered an ending in itself,
it should be viewed as a basis from which a communication network can grow. Justification for more than one
console is supported to show that multiple-consoles are
necessary components in most if not all large systems.
Beyond the basic and obvious requirements of a multiple console environment lie many different philosophies concerning the utilization of consoles, some of
which are explored in this paper.
The recently announced Multiple Console Support
(MCS) Option of the IBM System/360 Operating
System constitutes one significant approach. Within
its own environment MCS leaves room for meaningful
growth in the communications network.
THE MULTIPLE CONSOLE CONCEPT
As one looks back over the history of computer
evolution the evidence of development in system
features shows a steady growth. Increased diversification, throughput capabilities and physical size have all
contributed to overloading the communication network.
The discontinuous flow of system messages and the
proportionate increase in the number of messages with
the growth pattern in system features made the task
of the operator demanding to the point of physical
and mental exhaustion.
An informal study revealed that three messages per
minute based on an average message mix is the maximum number an operator can handle without delaying
system throughput. Information-only messages were
the easiest for the operator to handle, because all he
had to do was remember what was outputted to him
regarding the system status. But as the number of
decision and action type messages increased, so did
the complexity of the operator's job. His role in the
man-computer communication network became more
vital as the system and the number of system messages
continued to grow. It soon became obvious that any breakdown in the response time of the operator would only
degrade the power and throughput of the system.
What was a user to do? He wanted more system
features and was aware that with this increased power
the system would need a corresponding increase in
operator-computer communications. As one user described it, "With priority scheduling and with multipartition or multiprocessor operations, the operator at
the console is in the position of trying to drink from a
firehose." Another user found that their IBM 1052
typewriter console output was approximately 4 million
characters a month based on a 24-hour day, 7 days per
week operation, or 100 characters per minute. And considering that the maximum rate of a 1052 is 14.8 characters per second and allowing 500 milliseconds for a
carrier return, this meant that the console was busy on
an average of one-eighth of each minute. However,
since console output rates follow a probability distribution rather than a steady average,
the situation arose too often where the console had all
it could do to keep up with message output.
times, the operator was kept abnormally busy trying
to interpret what was going on. Then, if required
operator communication requests are considered, such
as displaying information about job activity on the
IBM System/360 Models 65 and up, chances were
that what the operator got back in reply was a history
of what existed rather than what was presently happening in the system. The IBM 2250 Display Console
helped somewhat in offsetting the time required to display these types of informational messages.
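
Those figures are easy to verify (a quick check, assuming a 30-day month and a couple of carrier returns per minute of output):

chars_per_month = 4_000_000
minutes_per_month = 30 * 24 * 60                  # 24-hour day, 7 days a week
print(chars_per_month / minutes_per_month)        # ~92.6, i.e., ~100 chars/min

# 100 characters at 14.8 char/s, plus ~0.5 s per carrier return:
busy_seconds = 100 / 14.8 + 2 * 0.5
print(busy_seconds, busy_seconds / 60)            # ~7.8 s, ~1/8 of each minute
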
Another problem was that even if the operator, or
operators with the larger systems, could keep up with
the flood of output messages, they were kept active
trying to satisfy peripheral requests made by the
system. Add to this the fact that some users find it either
necessary or desirable to have peripheral equipment
located in other rooms or even on other floors, and the inconvenience of satisfying peripheral requests is compounded. Voice communication between operators was
previously the only way of handling many of these
distant peripheral demands made by the system. But
voice communication proved faulty either through unintelligible operator enunciations or because of the lack
of readable copy for the peripheral operator who might
forget a request when his area might be overly busy.
The result was needless I/O delay.
Another significant point was that all communication was dependent on one relatively inexpensive console unit. If for some reason this device should fail,
all communication would stop and the system would
eventually stop.
The solution to this overloaded operator and overloaded console problem was to have multiple operators
at multiple consoles. Now, for example, all the operator
stationed in the tape pool had to do was scan the console output near his station for messages related to the
tape pool. Clearly this was a trivial solution, but it
was nonetheless a solution. The inherent problem, of
course, was that each operator received all system
messages. We shall see later how a specific implementation of the multiple console concept can eliminate this
problem of message duplication at all active consoles.

DESIRABLE ASPECTS OF A SOLUTION
What was needed to satisfy the growth in system
power was an encompassing communications network
that would be versatile enough to accommodate future
system growth while affording immediate solutions to
the present day problems inherent in man-computer
interactions.
At this point then, let us enumerate what should be
the desirable aspects of such a solution:
(1) Reliability-A system component as vital as the
communication network should afford safeguards
against failure whenever possible. Also, there
should be a guarantee that every generated message
gets delivered.
(2) Flexibility-The network should have the capability of molding to fit the immediate needs of
the system, and allow room for future growth.
(3) Standardization-To guard against the heretofore
haphazard growth in the communication area,
formalized procedures should be developed to

eliminate needless duplication of messages and
guard against multiple formats of messages.
(4) Communication Power-Presently, a major deficit
is in speed of communication. The solution should
offer a definite increase in output capability. This,
to have value, would have to be coupled with a
routing capability which could deliver messages to
their most appropriate point.
(5) Auditability-The network should be able to
maintain a trail of past communications to aid in
system analysis and evaluations.
(6) Security-A user desiring to keep his computer
interactions private should be given that capability.

MULTIPLE CONSOLE SUPPORT
One answer to the multiple console concept was provided by IBM with the Multiple Console Support
(MCS) Option, made available in Release 18 of the
IBM System/360 Operating System. Previous attempts
at multiple consoles within IBM were made through
the ASP-HASP spooling systems, but the MCS Option
was the first supported package that not only made
multiple consoles a working idea but provided a communications network that could grow with the system.
Let us now look at some of the highpoints of this MCS
Option.
As its name implies, MCS enables the Operating
System to be configured with multiple consoles, with
each console performing one or more dedicated
functions.
The primary means of routing messages to their appropriate point is via 'routing codes' assigned to each
message. One or more routing codes may be specified
to indicate the functional areas to which a message is to be
sent.
Descriptor codes provide multiple means of message
presentation or message deletion from display type
devices. These codes provide the individual console
device support with the means of determining how a
message is to be printed or displayed, and how a
message may be deleted from a display device.
The user specifies the console configuration when
building the system. He may dynamically alter the
configuration during system operation, however. One
console must be specified as the Master Console, where
all commands are valid. All other consoles are specified
as secondary consoles with each console having a
command entering authority and assigned routing
codes.
Secondary Consoles are additional consoles (local or
remote) to which selected messages are routed. One
console may handle more than one routing code, and
the same routing code may be handled by many consoles. The user specifies which operator commands and
routing codes will be authorized for each secondary
console when the system is built.
Alternate consoles provide backup capability when
the original console device is inoperative. An alternate
console can be a secondary console or the Master
Console. MCS requires that an alternate console be
specified for the Master Console. If an alternate console
is not assigned to a secondary console, the Master
Console will be assigned as the alternate. This alternate
console concept enhances the network reliability by
ensuring that messages are not lost when one console
goes down or offline. Initially, each console's alternate
is assigned during system definition but can be dynamically changed during operation.
Six console types are presently supported by MCS:
(1) IBM 1052 Printer Keyboard Model 7 with a 2150
Console.
(2) IBM 1052 Printer Keyboard Model 7 with a 1052
Adapter.
(3) Composite Reader/Printer or Reader/Punch combinations.
(4) IBM 2250 on IBM System/360 Models 50, 65, 75,
and 91 using MVT. (Display Screen)
(5) IBM System/360 Model 85 Integrated Operator's
Console (220C). (Display Screen)
(6) IBM 2740 Communication Terminal Model 1.
Console switching, the act of moving one console's
capabilities to another console, can be done automatically, dynamically, and manually. Automatically
switching to an alternate console occurs when the
console is determined to be inoperative by the software.
An operator command has been provided for dynamic
console switching and console reconfiguration. The
external interrupt key on the operator's panel provides
manual switching to a new Master Console. The
facility for the DISPLAY of the console configuration
is provided through the Display Consoles operator
command.
HARD COPY LOG, ROUTING CODES, AND
TIME STAMPING: MCS provides the capability to
have buffered or immediate hard copy. Specification of
the hard copy device is provided at system definition
and at system initialization. It eliminates the loss of
information when graphic console operators delete
messages and operator commands from their screen.
Hard Copy collects and records all messages that have
routing codes which intersect an assigned set. This set
can be dynamically changed by the Master Console
operator. Messages when sent to Hard Copy are prefixed by their routing codes and a time stamp from the
system clock.
User Exit: An option is provided to enable the inclusion of a resident, user-written exit routine. This
routine receives in a separate buffer a copy of each
message before it is routed. Available to the routine
are the following:
1. Message Text (read only)
2. Routing Codes (modify or suppress)
3. Descriptor Codes (modify or suppress)
This allows a user to impose his own functional routing
mechanism.
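
A sketch of such an exit routine (routing codes are modeled here as a set of small integers; the "TP" job-naming convention is invented, echoing the job-name routing idea discussed later in this paper):

TAPE_POOL, SECURITY = 3, 9        # illustrative routing code numbers

def user_exit(message_text, routing_codes, descriptor_codes):
    """Return (routing_codes, descriptor_codes), possibly modified;
    message text is read-only. An empty routing set suppresses
    the message."""
    job_name = message_text.split()[0]   # MCS puts the job name on each message
    if job_name.startswith("TP"):        # shop convention: tape-pool jobs
        routing_codes = routing_codes | {TAPE_POOL}
    return routing_codes, descriptor_codes

# A console receives a message when its assigned codes intersect
# the message's codes; Hard Copy collection works the same way.
def delivered_to(console_codes, message_codes):
    return bool(console_codes & message_codes)

codes, _ = user_exit("TP0042 MOUNT TAPE 12-3", {SECURITY}, set())
print(delivered_to({TAPE_POOL}, codes))     # True
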
These individual facts about MCS together form a
multiple console environment which accomplishes some
of the aforementioned 'desirable aspects.'
(a) Reliability-The extensive alternate chaining and
automatic console switching combined with a wide
variety of device types insures much greater dependability. Now one or more of the consoles can
fail without a noticeable delay in system
functioning.
(b) Communication Power-More consoles immediately increase message output capability by a
factor almost equal to the number of consoles. The
new devices supported such as the IBM 2250 and
the IBM System/360 Model 85 Integrated Operator's Console 220C also deliver increased speed
in terms of time between message issuance and appearance on the console.
(c) Auditability-The Hard Copy Log concept is a
direct answer to this problem and affords a flexible
means of recording the system's daily performance.
(d) Flexibility-The changing internal and external
system environment can be coped with through
the new operator's commands. The console configuration can be easily changed by bringing up
new consoles or removing others from operation.
Operating consoles can have their functional areas
and command authorities changed. Flexibility on a
shop level is given by the User Exit routine which
allows a user to tailor a routing algorithm more in
keeping with his own specific job types.

GROWTH IN THIS NETWORK
MCS clearly follows the lines of the solution. Although not a panacea for all the mentioned points, it
does afford a basis upon which the remaining points
could be obtained.
MCS routes messages through the use of predetermined codes attached to messages and consoles defined
to receive specific message codes. This allows for
splitting messages between the consoles but does not
allow for messages that might be considered priority.
Thus, if a volume of messages occurs, the operator will
attack the output serially and he may lose time responding to a number of less important messages
before he gets to the priority message. MCS does provide, however, a mechanism for flagging messages that
are considered to need action on the part of the operator, but still a priority message may be tucked at the
end of the queue. This problem of priority messages
will grow as the system continues to grow and the
number of output messages and operator responses
increase. Although priority routing is a valid point it
should be noticed that a multiple console environment
that properly uses functional routing minimizes problems in this area.
The security aspect has not been addressed during
the MCS discussion. Certain abilities for security
handling and monitoring exist through the USER
EXIT and HARD COPY mechanism, and a routing
code is assigned for security messages. Beyond this
there is no defined mechanism for a security console to
monitor system functions that may affect the security
user. Since the master console always has the authority
to issue all commands and other consoles may have
limited command issuing authorities, either may inadvertently affect the security user by cancelling his
job, halting his job, or varying system conditions. The
security user should be able to monitor all system functions that may affect the status of his jobs.
MCS is device independent within the range of
devices/ consoles it presently supports and in some
cases devices not presently supported can be hooked
into an MCS environment if the user modifies internal
restrictions that disallow such devices. A mechanism
should be provided to allow total device independence.
Thus, a user should be able to allow for a desired
number of consoles, and later, say at system initialization time, decide what devices he wants to use as
consoles.
Message standardization perhaps does not fall
within the range of implementing a multiple console
concept. However, standardizing message formats and
text would have great impact on an MCS environment
since the USER EXIT could have more effective text
analysis. This would indeed make available more
means of routing to the individual user. For example,
MCS puts the job name on each message. Hence the
USER EXIT could feasibly route by job name by in-

corporating a standard naming technique within the
given shop.

OTHER PHILOSOPHIES
The last two sections dealt with one specific case,
namely MCS, which should be recognized as just one
example. It is not meant to be representative since any
application is a function more of its environment than
any other single factor and consequently cannot be
considered a general solution. It (MCS) does however
point out that more can be derived from a multiple
console environment than meets the eye. The extremely
obvious and trivial concept of more than one console,
can play a signifi~ant role in large system enhancement
if implemented in a meaningful manner. The basis
upon which messages are routed seems to be the major
point of difference between philosophies. No one
method can be labeled as the best since different environments lend themselves to different routing
algorithms.
Some systems have more than one Direct Access,
Tape, or Unit Record pool so that functional routing
as afforded by MCS would result in needless messages
at certain locations. A more suitable algorithm in this
case might be to route by unit address. In this way, a
console could request all information concerning certain
devices. This method also makes implementation of
security measures more feasible since any operations
concerning a particular volume could be monitored by
receiving all messages concerning the unit upon which
the volume is mounted.
Another routing algorithm which arises from the
user who wishes to monitor his own program is the
routing by job class or partition. With this concept a
console would receive all messages referring to a particular job class. By properly assigning classes then,
one could go to the appropriate console to watch his
own job being processed.
These algorithms are for the most part batch oriented.
A time-sharing environment would place different
demands on the communication network and proper
implementation of multiple consoles no doubt implies
different routing algorithms.

SUMMARY
Beneath the existing multi-console answer to computer
communication network problems lies the more significant matter of meaningful implementation. Proper implementation cannot be stereotyped since it depends
for the most part on the environment.


Hardware aspects of secure computing
by LEE M. MOLHO
System Development Corporation

Santa Monica, California

INTRODUCTION
It makes no sense to discuss software for privacy-preserving or secure time-shared computing without
considering the hardware on which it is to run. Software
access controls rely upon certain pieces of hardware.
If these can go dead or be deliberately disabled without
warning, then all that remains is false security.
This paper is about hardware aspects of controlled-access time-shared computing.* A detailed study was
recently made of two pieces of hardware that are required for secure time-sharing on an IBM System 360
Model 50 computer: the storage protection system and
the Problem/Supervisor state control system.1 It uncovered over a hundred cases where a single hardware
failure will compromise security without giving an
alarm. Hazards of this kind, which are present in any
computer hardware which supports software access
controls, have been essentially eliminated in the SDC
ADEPT-50 Time-Sharing System through techniques
described herein.2
Analysis based on that work has clarified what
avenues are available for subversion via hardware; they
are outlined in this paper. A number of ways to fill
these security gaps are then developed, including methods applicable to a variety of computers. Administrative policy considerations, problems in security certification of hardware, and hardware design considerations for secure time-shared computing also receive
comment.

FAILURE, SUBVERSION, AND SECURITY
Two types of security problem can be found in computer hardware. One is the problem of hardware failure.

*The relationship between "security" and "privacy" has been
discussed elsewhere.3,4 In this paper "security" is used to cover
controlled-access computing in general.


This includes not only computer logic that fails by
itself, but also miswiring and faulty hardware caused
by improper maintenance ("Customer Engineer" activity), including CE errors in making field-installable
engineering changes.
The other security problem is the cloak-and-dagger
question of the susceptibility of hardware to subversion
by unauthorized persons. Can trivial hardware changes
jeopardize a secure computing facility even if the software remains completely pure? This problem and the
hardware failure problem, which will be considered in
depth, are related.

Weak points for logic failure

Previous work involved an investigation of portions
of the 360/50 hardware.1 Its primary objective was to
pinpoint single-failure problem locations. The question
was asked, "If this element fails, will hardware required
for secure computing go dead without giving an alarm?"
A total of 99 single-failure hazards were found in the
360/50 storage protection hardware; they produce a
variety of system effects. Three such logic elements
were found in the simpler Problem/Supervisor state
(PSW bit 15) logic. A failure in this logic would cause
the 360/50 to always operate in the Supervisor state.
An assumption was made in finding single-failure
logic problems which at first may seem more restrictive
than it really is: A failure is defined as having occurred
if the output of a logic element remains in an invalid
state based on the states of its inputs. Other failure
modes certainly exist for logic elements, but they reduce
to this case as follows: (1) an intermittent logic element
meets this criterion, but only part of the time; (2) a
shorted or open input will cause an invalid output
state at least part of the time; (3) a logic element which
exhibits excessive signal delay will appear to have an
invalid output state for some time after any input
transition; (4) an output wire which has been connected
to an improper location will have an invalid
output state based on its inputs at least part of the
time; such a connection may also have permanently
damaged the element, making its output independent
of its input. It should be noted that failure possibilities
were counted; for those relatively few cases where a
security problem is caused whether the element gets
stuck in "high" or in "low" state, two possibilities were
counted.
A situation was frequently encountered which is considered in a general way in the following section, but
which is touched upon here. Many more logic elements
besides those tallied would cause the storage protection
hardware to go dead if they failed, but fortunately
(from a security viewpoint) their failure would cause
some other essential part of the 360/50 to fail, leading
to an overall system crash. "Failure detection by faulty
system operation" keeps many logic elements from
becoming security problems.

Circumventing logic failure

Providing redundant logic is a reasonable first suggestion as a means of eliminating single failures as
security problems. However, redundancy has some
limits which are not apparent until a close look is
taken at the areas of security concern within the Central
Processing Unit (CPU). Security problems are really
in control logic, such as the logic activated by a storage
protect violation signal, rather than in multi-bit data
paths, where redundancy in the form of error-detecting
and error-correcting codes is often useful. Indeed, the
360/50 CPU already uses an error-detecting code extensively, since parity checks are made on many multi-bit
paths within it.
Effective use of redundant logic presents another
problem. One must fully understand the system as it
stands to know what needs to be added. Putting it
another way, full hardware certification must take
place before redundancy can be added (or appreciated,
if the manufacturer claims it is there to begin with).
Lastly, some areas of hardware do not lend themselves too easily to redundancy: There can be only one
address at a time to the Read-Only-Storage (ROS) unit
whose microprograms control the 360/50 CPU.5,6 One
could, of course, use such a scheme as triple-modular
redundancy on all control paths, providing three copies
of ROS in the bargain. The result of such an approach
would not be much like a 360/50.
Redundancy has a specialized, supplementary application in conjunction with hardware certification. After
the process of certification reveals which logic elements
can be checked by software at low overhead, redundant

logic may be added to take care of the remainder. A
good example is found in the storage protection logic.
Eleven failure possibilities exist where, upon failure, protection interrupts would cause an incorrect microprogram
branch. These failure possibilities arise in part
from the logic elements driven by one control signal
line. This signal could be provided redundantly to
make the hardware secure.
Software tests provide another way to eliminate
hardware failure as a security problem. Code can be
written which should cause a protection or privileged-operation interrupt; to pass the test the interrupt must
react appropriately. Such software must interface with the
operating system software for scheduling and storage-protect lock alteration, but must execute in Problem
state to perform its tests. There is clearly a tradeoff
between system overhead and rate of testing. As previously mentioned, hardware certification must be performed to ascertain what hardware can be checked by
software tests, and how to check it.
Software testing of critical hardware is a simple and
reasonable approach, given hardware certification; it is
closely related to a larger problem, that of testing for
software holes with software. Software testing of hardware, added to the SDC ADEPT-50 Time-Sharing
System, has eliminated over 85 percent of present
single-failure hazards in the 360/50 CPU.
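To make the flavor of such a test concrete, the following toy model sketches its logic (Python is used purely for illustration; the actual ADEPT-50 test code, the 360/50 key mechanism, and every name below are simplified or invented). The test deliberately provokes a protection violation and treats the absence of an interrupt as an alarm:

```python
# Toy model of a software check on storage-protection hardware.
# All names are hypothetical; only the test logic is illustrated:
# provoke a violation on purpose, and alarm if no interrupt occurs.

class ProtectionInterrupt(Exception):
    """Models the protection interrupt the hardware should raise."""

class Storage:
    def __init__(self, size, key):
        self.cells = [0] * size
        self.keys = [key] * size      # one storage key per cell (per block on a real machine)
        self.protect_enabled = True   # a stuck-at failure would silently make this False

    def store(self, addr, value, psw_key):
        # The hardware compares the program's PSW key with the storage key.
        if self.protect_enabled and self.keys[addr] != psw_key:
            raise ProtectionInterrupt(addr)
        self.cells[addr] = value

def protection_self_test(storage, my_key, foreign_addr):
    """Return True if the protect hardware still traps an illegal store."""
    try:
        storage.store(foreign_addr, 0xDEAD, my_key)   # must NOT succeed
    except ProtectionInterrupt:
        return True                    # interrupt occurred: hardware alive
    return False                       # silent success: failed-open hardware

mem = Storage(size=64, key=0)          # region owned under key 0
assert protection_self_test(mem, my_key=3, foreign_addr=10)
mem.protect_enabled = False            # simulate a dead protect circuit
assert not protection_self_test(mem, my_key=3, foreign_addr=10)
```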
Microprogramming could also be put to work to
combat failure problems. A microprogrammed routine
could be included in ROS which would automatically
test critical hardware, taking immediate action if the
test were not passed. Such a microprogram could either
be in the form of an executable instruction (e.g., TEST
PROTECTION), or could be automatic, as part of
the timer-update sequence, for example.
A microprogrammed test would have much lower
overhead than an equivalent software test performed
at the same rate; if automatic, it would test even in
the middle of user-program execution. A preliminary
design of a storage-protection test that would be exercised every timer update time (60 times per second)
indicated an overhead of only 0.015 percent (150 test
cycles for every million ROS cycles). Of even greater
significance is that microprogrammed testing is specifiable. A hardware vendor can be given the burden of
proof of showing that the tests are complete; the vendor
would have to take the testing requirement into account
in design. The process of hardware certification could
be reduced to a design review of vendor tests if this
approach were taken.
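The overhead figure quoted above follows directly from the stated ratio of test cycles to total ROS cycles; a one-line check (values from the text):

```python
# Overhead of the proposed automatic microprogrammed test:
# the text allows 150 test cycles out of every million ROS cycles.
overhead = 150 / 1_000_000
print(f"{overhead:.3%}")   # -> 0.015%, the figure quoted above
```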
Retrofitting microprogrammed testing in a 360/50
would not involve extensive hardware changes, but
some changes would have to be made. Testing microprograms would have to be written by the manufacturer;
new ROS storage elements would have to be
fabricated. A small amount of logic and a large amount
of documentation would also have to be changed.
Logic failure can be totally eliminated as a security
problem in computer hardware by these methods. A
finite effort and minor overhead are required; what
logic is secured depends upon the approach taken. If
microprogram or software functional testing is used,
miswiring and dead hardware caused by CE errors will
also be discovered.
Subversion techniques

It is worthwhile to take the position of a would-be
system subverter, and proceed to look at the easiest
and best ways of using the 360/50 to steal files from
unsuspecting users. What hardware changes would have
to be made to gain access to protected core memory
or to enter the Supervisor state?
Fixed changes to eliminate hardware features are
obvious enough; just remove the wire that carries the
signal to set PSW bit 15, for example. But such changes
are physically identical to hardware failures, since something is permanently wrong. As any functional testing
for dead hardware will discover a fixed change, a potential subverter must be more clever.
In ADEPT-50, a user is swapped in periodically for
a brief length of time (a "quantum"). During his
quantum, a user can have access to the 360/50 at the
machine-language level; no interpretive program comes
between the user and his program unless, of course,
he requests it. Thus, a clever subverter might seek to
add some hardware logic to the CPU which would
look for, say, a particular rather unusual sequence of
two instructions in a program. Should that sequence
appear, the added logic might disable storage protection for just a few dozen microseconds. Such a small
"hole" in the hardware would be quite sufficient for
the user to (1) access anyone's file; (2) cause a system
crash; (3) modify anyone's file.
User-controllable changes could be implemented in
many ways, with many modes of control and action
besides this example (which was, however, one of the
more effective schemes contemplated). Countermeasures to such controllable changes will be considered
below, along with ways in which a subverter might try
to anticipate countermeasures.
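As a purely conceptual illustration of such a controllable change (a software simulation, not an actual design; the trigger opcodes, window length, and all names are invented), consider logic that watches the instruction stream and briefly drops protection when a rare two-instruction sequence appears:

```python
# Toy simulation of a user-controllable subversion circuit: it watches
# the instruction stream for a rare two-opcode sequence and, when seen,
# disables storage protection for a brief window. All values invented.

TRIGGER = ("ZAP", "NOPR")    # hypothetical "unusual" opcode pair
WINDOW = 5                    # instructions during which protection is off

def run(instr_stream):
    prev = None
    window_left = 0
    for op in instr_stream:
        protect_on = window_left == 0
        if window_left:
            window_left -= 1
        if (prev, op) == TRIGGER:
            window_left = WINDOW      # open the "hole"
        prev = op
        yield op, protect_on

stream = ["L", "A", "ZAP", "NOPR", "MVC", "ST", "L", "A", "MVC"]
for op, protect_on in run(stream):
    print(f"{op:5s} protection={'on' if protect_on else 'OFF'}")
# The instructions right after the trigger execute with protection OFF,
# yet any functional test run outside that window sees healthy hardware.
```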
Countermeasures to subversion

As implied earlier, anyone who has sufficient access
to the CPU to install his own "design changes" in the
hardware is likely to put in a controllable change, since
a fixed change would be discovered by even a simple
software test infrequently performed. A user-controllable change, on the other hand, would not be discovered by tests outside the user's quantum, and
would be hard to discover even within it, as will become
obvious.
The automatic microprogrammed test previously discussed would have a low probability of discovering a
user-controllable hardware change. Consider an attempt by a user to replace his log-in number with the
log-in number of the person whose file he wants to
steal. He must execute a MOVE CHARACTERS instruction of length 12 to do this, requiring only about
31 microseconds for the 360/50 CPU to perform. A
microprogrammed test occurring at timer interrupts (once each 16 milliseconds) would have a low probability of discovering such a brief security breach. Increasing the test rate, though it raises the probability,
raises the overhead correspondingly. A test occurring
at 16 microsecond intervals, for example, represents a
15 percent overhead.
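A little arithmetic reproduces these figures. The sketch below assumes, as a simplification, that a breach is caught just when it overlaps a test, so the detection probability for a breach shorter than the test interval is roughly their ratio (the 31-microsecond breach and the 16-millisecond and 16-microsecond intervals are from the text):

```python
# Probability that a periodic test catches a brief breach, assuming the
# breach start is uniformly distributed over the test interval.
breach = 31e-6                     # MOVE CHARACTERS of length 12: ~31 microseconds

for interval in (16e-3, 16e-6):    # timer-interrupt rate vs. an aggressive rate
    p_detect = min(1.0, breach / interval)
    print(f"test every {interval*1e6:>8.0f} us: P(detect) ~ {p_detect:.1%}")
# -> test every    16000 us: P(detect) ~ 0.2%    (the "low probability" case)
# -> test every       16 us: P(detect) ~ 100.0%  (but at the 15 percent overhead
#    the text cites for that rate)
```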
A reasonable question is whether a software test
might do a better job of spotting user-controllable
hardware changes. One would approach this task by
attempting to discover changes with tests inserted in
user programs in an undetectable fashion. One typical
method would do this by inserting invisible breakpoints
into the user's instruction stream; when they were
encountered during the user's quantum, a software test
of storage protection and PSW bit 15 would be performed.
A software test of this type could be written, and as
will be discussed, such a software test would be difficult
for a subverter to circumvent. Nevertheless, the drawbacks of this software test are severe. Reentrant code
is required so that the software test can know (1) the
location of the instruction stream, and (2) that no
instructions are hidden in data areas. Requiring reentrant programs would in turn require minor changes
to the ADEPT-50 Jovial compiler and major changes
to the F-level Assembler. A small microprogram change
would even be required, so that software could sense
the difference between a fetch-protect interrupt and
an execute-protect interrupt. Changes would be required to the ADEPT-50 SERVIS, INTRUP, DEBUG, and SKED modules. Were such a software test
implemented, run-time overhead would likely be rather
high for frequent breakpoint-insertions, since each
breakpoint inserted would require execution of 50 or
more instructions at run time. Lastly, programmers
might not always like having to write reentrant code.
These implementation problems, though described
for one specific test approach, are likely to extend to
any test which attempts to interrupt running user programs via software, since any such test must isolate
and augment every user's instruction stream.
Although the above does not demonstrate that detecting user-controllable subversion hardware is impossible, it should indicate that by no means is such
detection trivial. It is even harder than the above may
suggest, for, as the next section indicates, while the
subverter is busily building his controllable changes
into the CPU he may add insult to injury by inactivating anti-failure and anti-subversion hardware.

Defeat of countermeasures

Hardware countermeasures that are easy to install
are easy to remove. Redundant logic can be disabled
with the same wire cutters or wire-unwrapper as other
logic. This is also true of state-testing and parity-check
logic. Alternatively, a clever subverter may add logic
that picks up a test signal and uses it to inactivate his
own logic while the test is performed. The subverter
may, alternatively, arrange things so that tests always
pass during his quantum.
Microprogrammed countermeasures are somewhat
more difficult for the subverter to inactivate, since he
must decode particular ROS addresses to know that a
microprogrammed test routine has begun. Once he has
done this, though, he may proceed as if he were dealing
with hardware, since on a short-term basis, microprogram contents and addresses do not change. The net
result of putting countermeasures in ROS rather than
in logic is to raise the "work function" for subversion;
that is, the subverter must do more work to achieve
his goal, but he can reach it.
On the other hand, a thorough software test, such
as the breakpoint test previously described, would be
nearly impossible to defeat by hardware techniques.
This is because of the transient nature of software:
a subverter would have to decode program variables
such as addresses via hardware; then even minor
changes in the breakpoint-placing routines would make
that hardware useless. One must recall, however, the
large implementation and overhead problems inherent
in a user-interrupting software test. In summary,
countermeasures can be devised which have a high
"work function," but they entail major costs in implementation and system efficiency.
Two assumptions have been inherent in this discussion; namely, that the subverter has both knowledge
of system hardware (including subversion countermeasures) and means of changing the hardware. This
need not be the case, but whether it is depends on
administrative rather than technical considerations.
Administrative considerations are the next subject.

Administrative policy

Special handling of hardware documentation and
engineering changes may be worthwhile when commercial lines of computers are used for secure time-sharing. First, if hardware or microprograms have been
added to the computer to test for failures and subversion
attempts, the details of the tests should not be obtainable from the computer manufacturer's worldwide network of sales representatives. The fact that testing is
done and the technical details of that testing would
seem to be legitimate security objects, since a subverter
can neutralize testing only if he knows of it. Classification of those documents which relate to testing is a
policy question which should be considered. Likewise,
redundant hardware, such as a second copy of the
PSW bit 15 logic, might be included in the same
category.
The second area is that of change control. Presumably
the "Customer Engineer" (CE) personnel who perform
engineering changes have clearances allowing them
access to the hardware, but what about the. technical
documents which tell them what to do? A clever subverter could easily alter an engineering-change wire list
to include his modifications, or could send spurious
change documentation. A CE would then unwittingly
install the subverter's "engineering change." Since it
is asking too much to expect a CE to understand on a
wire-by-wire basis each change he performs, some new
step is necessary if one wants to be sure that engineering
changes are made for technical reasons only. In other
words, the computer· manufacturer's engineering
changes are security objects in the sense that their
integrity must be guaranteed. Special paths of transmittal and post-installation verification by the manufacturer might be an adequate way to secure engineering
changes; there are undoubtedly other ways. It is clear
that a problem exists.
Finally, it should be noted that the 360/50 ROS
storage elements, or any equivalent parts of another
manufacturer's hardware that contain all system microprogramming, ought to be treated in a special manner,
such as physically sealing them in place as part of
hardware certification. New storage elements containing
engineering changes are security objects of even higher
order than regular engineering-change documents, and
should be handled accordingly, from their manufacture
through their installation.
GENERALIZATIONS AND CONCLUSIONS
Some general points about hardware design that
relate to secure time-sharing and some short-range and
long-range conclusions are the topics of this section.


Fail-secure vs. fail-soft hardware

Television programs, novels, and motion pictures
have made it well known that if something is "fail-safe,"
it doesn't blow up when it fails. In the same vein,
designers of high-reliability computers coined the term
"fail-soft" to describe a machine that degrades its
performance when a failure occurs, instead of becoming
completely useless. It is now proposed to add another
term to this family: "Fail-secure: to protect secure
information regardless of failure."
The ability to detect failures is a prerequisite for
fail-secure operation. However, all system provisions
for corrective action based on failure detection must be
carefully designed, particularly when hardware failure
correction is involved. Two cases were recently de. scribed wherein a conflict arose between hardware and
software that had been included to circumvent failures. *
Automatic correction hardware could likewise mask
problems which should be brought to the attention of
the System Security Officer via security software.
Clearly, something between the extremes of system
crash and silent automatic correction should occur
when hardware fails. Definition of what does happen
upon failure of critical hardware should be a design
requirement for fail-secure time-sharing systems. Failsoft computers are not likely to be fail-secure computers, nor vice versa, unless software and hardware
have been designed with both concepts in mind.

*At the "Workshop on Hardware-Software Interaction for
System Reliability and Recovery in Fault-Tolerant Computers,"
held July 14-15, 1969 at Pacific Palisades, California, J. W.
Herndon of Bell Telephone Labs reported that a problem had
arisen in a developmental version of Bell's "Electronic Switching
System." It seems that an elaborate setup of relays would begin
reconfiguring a bad communications channel at the same time
that software in ESS was trying to find out what was wrong.
R. F. Thomas, Jr. of the Los Alamos Scientific Laboratory,
having had a similar problem with a self-checking data acquisition system, agreed with Herndon that hardware is not clever
enough to know what to do about system failures; software
failure correction approaches are preferable.

Failure detection by faulty system operation

Computer hardware logic can be grouped by the
system operation or operations it helps perform. Some
logic (for example, the clock distribution logic) helps
perform only one system operation. Other logic, such
as the read-only storage address logic in the 360/50,
helps perform many system operations, from floating
point multiplication to memory protection interrupt
handling. When logic is needed by more than one system
operation, it is cross-checked for proper performance:
Should an element needed for system operations A and
B fail, the failure of system operation B would indicate
the malfunction of this portion of operation A's logic.
Such interdependence is quite useful in a fail-secure
system, as it allows failures to be detected by faulty
system operation, a seemingly inelegant error detection
mechanism, yet one which requires neither software nor
hardware overhead. Some ideas on its uses and limitations follow.
The result of a hardware logic failure can usually be
defined in terms of what happens to the system operations associated with the dead hardware. Some logic
failure modes are detectable, because they make logic
elements downstream misperform unrelated system
operations. Analysis will also reveal failure modes which
spoil only the system operation which they help perform. These failures must be detected in some other
way. There are also, but more rarely, cases where a
hardware failure may lead to an operation failure that
is not obvious. In the 360/50, a failure could cause
skipping of a segment of a control microprogram that
wasn't really needed on that cycle. Such failures are
not detectable by faulty system operation at least part
of the time.
Advantage may be taken of this failure-detection
technique in certifying hardware to be fail-secure as
well as in original hardware design. In general, the
more interdependencies existing among chunks of logic,
the more likely are failures to produce faulty system
operation. For example, in many places in a computer
one finds situations as sketched in Figure 1. Therein,


Figure 1-Inhibit logic vs sequencing logic
[Both panels show System Operations A and B driving Logic
Groups 1 and 2, which feed a shared Logic Group 3 producing
the output. Panel (a) shows the "inhibiting" scheme, with
cross-coupled inhibit paths between Logic Groups 1 and 2;
panel (b) shows the "sequencing" scheme, with separate logic
defining when A and B are active.]



System Operation A needs the services of Logic Group
1 and Logic Group 3, while System Operation B needs
Logic Group 2 and Logic Group 3. Note at this point
that, as above, if System Operation A doesn't work
because of a failure in Logic Group 3, we have concurrently detected a failure in the logic supporting
System Operation B.
A further point is made in Figure 1. Often System
Operations A and B must be mutually exclusive; hardware must be added to prevent simultaneous activation
of A and B. Two basic design approaches may be taken
to solve this problem. An "inhibiting" scheme may be
used, wherein logic is added that inhibits Logic Group 1
when Logic Group 2 is active, and vice versa. This
approach is illustrated by Figure 1(a). Alternatively,
a "sequencing" scheme may be used, wherein logic not
directly involved with 1 or 2 (such as system clock,
mode selection logic, or a status register) defines when
A and B are to be active. This approach is illustrated
by Figure 1(b).
Now, "inhibit" logic belongs to a particular System
Operation, for its function is to asynchronously, on
demand, condition the hardware to perform that System
Operation. It depends on nothing else; if it fails by
going permanently inactive, only its System Operation
is affected, and no alarm is given. On the other hand,
"sequencing" logic feeds many areas of the machine;
its failure is highly likely to be detected by faulty
system operation.

A further point can be made here which may be
somewhat controversial: that an overabundance of
"inhibit" -type asynchronous logic is a good indicator
of sloppy design or bad design coordination. While a
certain amount must exist to deal with asynchronous
pieces of hardware, often it is put in to "patch" problems that no one realized were there till system checkout
time. Evidence of such design may suggest more
thorough scrutiny is desirable.
System Operations can be grouped by their frequency
of occurrence: some operations are needed every CPU
cycle, some when the programmer requests them, some
only during maintenance, and so on. Thus, some logic
which appears to provide a cross-check on other logic
may not do so frequently or predictably enough to
satisfy certification requirements.
To sum up, the fact that a system crashes when a
hardware failure occurs, rather than "failing soft" by
continuing to run without the dead hardware, may be
a blessing in disguise. If fail-soft operation encompasses
hardware that is needed for continued security, such
as the memory protection hardware, fail-soft operation
is not fail-secure.

Data checking and control signal errors

Control signals which direct data transfers will often
be checked by logic that was put in only to verify
data purity. The nature and extent of this checking is
dependent on the error-detection code used and upon
the length of the data field (excluding check bits).
What happens is that if logic fails which controls a
data path and its check bits, the data will be forced to
either all zeros or all ones. If one or both of these cases
is illegal, the control logic error will be detected when
the data is checked. (Extensive parity checking on the
360/50 CPU results in much control logic failure detection capability therein.) Table 1 demonstrates an
example of this effect; Table 2 describes the conditions
for which it exists for the common parity check.

TABLE 1-Control Signal Error Detection by Odd Parity
Check on Odd-Length Data Field

DATA BITS
0 1 2  P     MEANING
0 0 0  0     data error or control logic error*
0 0 0  1     0
0 0 1  0     1
0 0 1  1     data error
0 1 0  0     2
0 1 0  1     data error
0 1 1  0     data error
0 1 1  1     3
1 0 0  0     4
1 0 0  1     data error
1 0 1  0     data error
1 0 1  1     5
1 1 0  0     data error
1 1 0  1     6
1 1 1  0     7
1 1 1  1     data error or control logic error**

*Control logic incorrectly set all bits to zero.
**Control logic incorrectly set all bits to one.

TABLE 2-Control Signal Error Detection by Parity Checking

DATA FIELD                    CONTROL LOGIC ERROR CAUSES:
LENGTH:       PARITY:         all zeros       all-ones
even          odd             CAUGHT          MISSED
even          even            MISSED          CAUGHT
odd           odd             CAUGHT          CAUGHT
odd           even            MISSED          MISSED
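Both tables can be checked mechanically. The sketch below enumerates the four cases of Table 2, using one even and one odd field length as representatives; the odd-length, odd-parity row is exactly the situation of Table 1, where both failure modes are caught:

```python
# Verify Table 2: does a parity check catch a control failure that
# forces a data field (plus its check bit) to all zeros or all ones?
def caught(field_len, parity, forced_bit):
    ones = (field_len + 1) * forced_bit           # ones in data bits + check bit
    valid = (ones % 2 == 1) if parity == "odd" else (ones % 2 == 0)
    return not valid                               # an invalid codeword is caught

print("length  parity  all-zeros  all-ones")
for field_len in (2, 3):                           # any even / odd length behaves alike
    for parity in ("odd", "even"):
        zeros = "CAUGHT" if caught(field_len, parity, 0) else "MISSED"
        ones = "CAUGHT" if caught(field_len, parity, 1) else "MISSED"
        kind = "even" if field_len % 2 == 0 else "odd"
        print(f"{kind:6s}  {parity:6s}  {zeros:9s}  {ones}")
```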


CONCLUSIONS
From a short-range viewpoint, 360/50 CPU hardware
has some weak spots in it but no holes, as far as secure
time-sharing is concerned. Furthermore, the weak spots
can be reinforced with little expense. Several alternatives in this regard have been described.
From a longer-range viewpoint, anyone who contemplates specifying a requirement for hardware certification should know what such an effort involves. As
reference, some notes are appropriate as to what it
took to examine the 360/50 memory protection system
to the level required for meaningful hardware certification. The writer first obtained several publications
which describe the system. Having read these, the
writer obtained the logic diagrams, went to the beginning points of several operations, and traced logic
forward. Signals entering a point were traced backward
until logic was found which would definitely cause
faulty machine operation outside the protection system
if it failed. During this tedious process, discrepancies
arose between what had been read and what the logic
diagrams appeared to show. Some discrepancies were
resolved by further study; some were accounted for
by special features on the SDC 360/50; some remain.
After logic tracing, the entire protection system was
sketched out on eight 8½ × 11 pages. This drawing
proved to be extremely valuable for improving the
writer's understanding, and enabled failure-mode charting that would have been intractable by manual means
from the manufacturer's logic diagrams.
For certifying hardware, documentation quality and
currency are certainly a problem. The manufacturer's
publications alone are necessary but definitely not
sufficient, because of version differences, errors, oversimplifications, and insufficient detail. Both these and
machine logic diagrams are needed.
Though the hardware certification outlook is bleak,
an alternative does exist: testing. As previously described, it is possible to require inclusion of low-overhead functional testing of critical hardware in a secure
computing system. The testing techniques, whether
embedded in hardware, microprograms, or software,
could be put under security control if some protection
against hardware subversion is desired. Furthermore,
administrative security control procedures should extend to "Customer Engineer" activity and to engineering change documentation to the extent necessary to
insure that hardware changes are made for technical
reasons only.
Careful control of access to computer-based information is, and ought to be, of general concern today.
Access controls in a secure time-sharing system such
as ADEPT-50 are based on hardware features.7 The
latter deserve scrutiny.

REFERENCES

1 L MOLHO
Hardware reliability study
SDC N-(L)-24276/126/00 December 1969
2 R LINDE C WEISSMAN C FOX
The ADEPT-50 time-sharing system
Proceedings of the Fall Joint Computer Conference Vol 35 p 39-50 1969
Also issued as SDC document SP-3344
3 W H WARE
Security and privacy in computer systems
Proceedings of the Spring Joint Computer Conference Vol 30 p 279-282 1967
4 W H WARE
Security and privacy: Similarities and differences
Proceedings of the Spring Joint Computer Conference Vol 30 p 287-290 1967
5 S G TUCKER
Microprogram control for System/360
IBM Systems Journal Vol 6 No 4 p 222-241 1967
6 G C VANDLING D E WALDECKER
The microprogram control technique for digital logic design
Computer Design Vol 8 No 8 p 44-51 August 1969
7 C WEISSMAN
Security controls in the ADEPT-50 time-sharing system
Proceedings of the Fall Joint Computer Conference Vol 35 p 119-133 1969
Also issued as SDC document SP-3342

TICKETRON-A successfully operating system
without an operating system
by HARVEY DUBNER and JOSEPH ABATE
Computer Applications Incorporated

New York, New York

INTRODUCTION
In recent years, industry has witnessed the proliferation of complex on-line systems. More and more, computer management is recognizing the need to employ
scientific methods to assist in the complex tasks of
hardware/software selection and evaluation. This is
especially true for real-time computer systems. As is
well known, the distinguishing feature of real-time
systems is that they are prone to the most spectacular
failures ever witnessed in the computer industry. In
many installations, real-time systems have become
"hard-time" systems. The specter of potential failure
has caused users to realize the importance of designing
first, installing later. The sophisticated user has become
aware of the fact that the rules of thumb and intuition
that adequately described simple batch-type systems
do not suffice when one is concerned with real-time
systems. Real-time automation demands a certain
amount of expertise on the part of the designer and
implementor. In fact, systems which have been installed without adequate pre-analysis, more often
than not, wind up with:
• Too expensive a central processor
• Too many ancillary components
• The wrong number of I/O channels
• Too elaborate a Supervisory System
• Poor communications interface

Clearly, the salient point we wish to establish is
that real-time systems have a tendency to cost far in
excess of what is necessary. Typically, the inefficient use of
hardware is the staggering cost factor which most
dramatically degrades the performance per dollar of a
real-time system. Of course, our concern for performance
per dollar would be an academic issue if the effect of
improper design were to cause increases in hardware
costs of the order of 10%. However, we maintain that
the situation is much more drastic and such systems
suffer excessive hardware costs in the order of 100%.
The reason for this state of affairs is that the design
of real-time systems is not an art but rather a scientific
discipline.1,2 One must bring analysis to bear on the
problems. To be sure, it is not the purpose of this
paper to give a full treatment of this discipline. Rather,
it is the purpose of this paper to demonstrate certain
techniques3,4,5 and their application to a real-life
system, TICKETRON.
TICKETRON is a real-time ticket reservation and
distribution system for the entertainment industry.
In many respects it resembles most other real-time
systems, therefore, the discussions concerning this·
system are by no means unique to it. That is to say,
the approach and attitudes developed in the design
and implementation of TICKETRON represent our
philosophy toward real-time systems in general. We
believe that using a successful system such as TICKETRON as the vehicle for presenting our philosophy
concerning real-time systems adds substance to our
arguments.
The ultimate aim of our arguments is the concern
for maximum performance per dollar of a system.
TICKETRON is successful because it did achieve excellent performance per dollar. Specifically, the "industry standard" for this type of system priced the
central facility hardware at over $60,000 per month.
Through proper design, TICKETRON was able to
accomplish better performance for less than $30,000
per month.
At the heart of the problem is the frenzy associated
with multiprogramming in real-time systems, causing
the need for supervisory programs. There has been a
tendency in the past few years to implement operating
systems which are so elaborate that the amount of
computer time used for message processing can be
matched or exceeded by the amount of time required
by the supervisor to maintain the job flow. In addition
to the exorbitant overhead in time, there is the extra
hardware cost associated with the inordinately large
amount of dedicated core storage required by such
operating systems. Further, a typical, modern-day,
operating system presents itself as a labyrinth to the
user who is required to make his application programs
function in the unfamiliar and complex environment
of the operating system. In most instances, the added
burden on the user to cope with this labyrinth during
program development and debugging is so costly in
terms of manpower effort that it would have been far
cheaper for him to have avoided trying to take advantage of the "standard supervisory package."
In short, these problems reflect a major paradox associated with third generation computer systems:
"How can an operating system that costs nothing be
so expensive?"
At this point, the reader might feel that we have
overstated our position. No doubt he is able to point
to many systems having constraints such that they
require an elaborate operating system. We agree.
Certainly a large on-line system which must perform
a multitude of tasks cannot function without a complex supervisory system. Our point, however, is that
too often simple systems are designed as if they were
complex systems.
To summarize our approach, we believe that simplicity is the keynote of a good system design. If there
is no need to multiprogram, don't! This is why
TICKETRON is a successfully operating system without an operating system!
It is the intent of this paper to put forth the system
design story for TICKETRON. The second section
presents the design and the third section explains the
design. The fourth section analyzes the design.
SYSTEM OVERVIEW
In addition to giving a functional description of the
system, it is the purpose of this section to broadly
specify the architecture of the system.
To begin, what is TICKETRON? It is a fully computerized ticket reservation and distribution system
offering actual printed tickets at remote terminals.
In short, it provides access to box offices from remote
locations. In that sense, TICKETRON is an extension
of the box office. It was originally intended to sell
tickets for the entertainment industry. However,
today it is also selling train tickets for the Penn-Central Metroliner. The system is practical in any
application which involves the issuing of tickets.
Remote sales terminals are installed at high-traffic

points such as shopping centers, department stores,
etc., and, of course, at box offices. It is a nationwide
service having separate systems, each serving a geographical area. There are three central facilities at
present: New York, Chicago and Los Angeles. Each
central facility can support almost 900 terminals which
can accommodate sales of 50,000 tickets per hour without any difficulty, under certain conditions (see the
fourth section).
A remote terminal consists of a dedicated keyboard,
a ticket printer and a receive-only teletype. A customer
desiring tickets approaches a remote station and makes
an inquiry concerning the availability of a performance.
The terminal operator interrogates the system via the
dedicated keyboard and receives a response in seconds
at the teletype. The teletype message indicates what
seats are available, if any. Then, if the customer is
pleased with the selection, the operator will cause the
system to "sell" the seats. Within seconds the actual
tickets are printed out by the ticket printer. These are
real tickets and the customer pays for them as he would
at a box office. Therefore, in a genuine sense the remote
station is an extension of the box office. Direct-access
to the total ticket inventory guarantees remote buyers
the best available seats at time of purchase (this is
done automatically by a seat selection program). In
addition to selling tickets, the system provides certain
key reports for management information and also accurate accounting of ticket sales.
TICKETRON is a typical real-time system in that
it is composed of four major constituents:
(1) the remote terminals with their communications
network
(2) the line controller and buffers
(3) the processing unit and associated core storage
(4) the auxiliary storage with its connecting data
channels
Knowing that these functional elements are required
in the system, one must then determine what hardware is best suited for the job. Hopefully, this selection
should be made on a performance per dollar basis. In
short, this is what systems design is all about.
In the third section, we discuss certain procedural
concepts that we consider important for accomplishing
an effective system design. Further, we present some
findings obtained by executing these procedures for the
TICKETRON system. The remainder of this section
will be devoted to an overview of the hardware and
software architecture of the system.
An important result is the actual hardware configuration that was decided upon for TICKETRON.
It was found that the system should be dedicated
solely to the on-line, real-time tasks required of it.

Figure 1-Central facility hardware configuration of TICKETRON
[The original figure is a block diagram of the duplexed central
facility: two CDC-1700 CPUs (32K words, 1.1 µsec) joined by
two CDC-1716 coupling data channels, with a CDC-1713
typewriter console, a CDC-1751 drum (512K words, rotation
17 ms, 175K wps), disks (3.1M words, seek 75 ms, rotation
25 ms, 70K wps), a CDC-1742 line printer (300 lpm), a
CDC-1722/1724 paper tape reader and punch (400/120 cps),
two CDC-601 mag tapes, a CDC-1729 card reader (100 cpm),
and a communication controller built around a CDC-1774 CPU
(20K words).]

Further, it was found that the tasks were such that a
process control type computer afforded the best performance per dollar in this situation. The computer
system selected was a Control Data Corporation 1700.
Figure 1 shows the central facility configuration. Reliability considerations made it necessary that essential hardware
be duplexed. The application is such that the system
must be operative at certain critical times, for example,
the peak ticket selling period just before a ballgame.
A result of the design shown in Figure 1 indicates
that the TICKETRON system has two processors;
one processor acts as a communications controller,
while the other processes the messages. The front-end
only handles the communication functions and contains the input and output line buffers. It does not
examine the contents of the message. This last function
is done by the message processing program which is
resident in the central processor, which requests messages from the front-end. The communications program
is resident in the front-end.
In addition to specifying the hardware, Figure 1
indicates the approximate characteristics of each device.
The total monthly rental for the central facility hardware
(including duplexing and maintenance) is about
$30,000. We maintain that this is an achievement of
understanding the characteristics of real-time systems
and their design consequences.
As previously stated, the hardware configuration is
a result of our design analysis. To be sure, the software
design is not divorced from the performance analysis.
In fact, one establishes certain programming considerations by analyzing their effect on the performance of
the system. As a result, we decided on a single-thread
program design. That is, at any one time there will be
no more than one message in the system which is
partially processed. There may be additional messages
in the system, but these will be in one of two states;
either awaiting processing (in input queue) or having
completed processing (in output queue). We can use a
single-thread programming concept because of the
timings involved.
There are three major program modules: the communications program, the message processing program
and an on-line utility program. The software design is
such that each subroutine calls the next required
subroutine. In essence, the system has one big program.
In considering the flow of a message through the
system, we have the processing program receiving its
messages from the input queue and after processing,
delivering them to the output queue. The processing
program is a single-thread program which deals with
one message at a time. That is, it only accepts another
message for processing after it has delivered one to the
output queue. The processing program determines the
next message to be processed as follows: it scans the
input queue. When it finds a full buffer, it first checks
the output queue to see if a buffer corresponding to
that line is empty; if not, it will carry out the procedure
for another line. If the processing program finds no
candidates to be processed, it will then exit to what one
might call a main scheduler program which does nothing
but loop-the-loop. When the processing program has
completed a message, the procedure is for it to loop
back on itself. When it has work, it starts its cycle over
again and does not return control to the main scheduler.
That is, during a busy period, the processing program
is continually looping; further, it is in complete charge
since it has no open branches. In fact, during this time,
the processing program has all the characteristics of
an executive routine. However, in actuality the concept
of "the executive" is foreign to this system.
The communications program, which is resident in
the front-end processor, simply fills and empties the
input and output buffers, respectively, for each communications line. The details of its operation are given
in a later section, because the communication discipline
was an integral part of the system design.


All programming on the system was done in FORTRAN because it offered the following advantages:
(a) Minimize programming costs and time.
(b) Machine independent.
(c) Partially self-documenting, no patching.
(d) Easy to modify.
(e) Easy communication between subroutines (and between programmers writing the subroutines).

History and evolution of the system

The TICKETRON system was conceived in January,
1967. A pilot system using a CDC 160A computer and
modified teletype as ticket printer was used to demonstrate the feasibility of the system and to gain practical
experience with automated ticket selling. The pilot
ran from July to November, 1967.
Although TICKETRON has not varied much in
concept over the years, the size and requirement of
the system have undergone evolutionary changes. As a
result, there have been three different computer equipment configurations.
1. Initially, TICKETRON was aimed primarily at
the New York legitimate theater and advance sale for
some sporting events, with 50 to 100 terminals selling
50,000 tickets per day. The first operational on-line
configuration consisted of a CDC 1700 with 16K of
core, 2 disks each with 3.1 million words, 2 tapes, a
console teletype, and communications hardware sufficient for interfacing 16 voice grade (1200 baud) lines.
This equipment was duplicated for reliability, with a
printer and paper tape reader-punch added for offline work. This system went operational in March,
1968.
2. It soon became apparent that sporting events
were more important than originally anticipated, and
that more terminals would be required. The number of
lines was increased from 16 to 32 and the core increased
from 16K to 32K. This allowed the system to handle
over 500 terminals and well over 100,000 tickets per
day. In addition, a drum with 256K words was added
for high frequency files, in particular for the inventory
needed for selling same day sporting events. This
system became operational in January, 1969.
3. As terminals become more distant from the computer center, communication costs can be dramatically
reduced by using communication concentrators. These
city buffers are actually computers which perform
"intelligent" functions such as formatting tickets. Since
this required redesigning of the whole communications
program, we took advantage of new technology and
replaced the communication hardware by a computer
front end with 20K of core at no increase in cost. This

configuration can handle the equivalent of 56 phone
lines and almost 900 terminals. Also, to accommodate
more events two more disks are being added and the
size of the drum increased to 512K words. This system
will be operational in the spring of 1970. The analysis
performed in this paper assumes this configuration.
SYSTEM DESIGN
We maintain that a successful system design is
achieved by understanding the characteristics of realtime systems and their design consequences. The procedure for accomplishing this is essentially an iteration
scheme:
(1) Specify a configuration; that is, define a prototype model of the system.
(2) Evaluate the configuration as to its operational
characteristics. Essential to this step is a performance
analysis which determines the capabilities and limitations of the prototype system.
(3) Make design recommendations on the basis of
the evaluation.
(4) Are the recommendations substantial? If yes,
continue. If no, end.
(5) Incorporate the recommendations by updating
the system model. Then start again.
In this section, we shall present some results obtained
by executing the above system design procedure for
TICKETRON. At the heart of the procedure is the
performance analysis.
TICKETRON typifies the operational characteristics of a real-time system. Namely, it is representative
of a stochastic service system. The situation encountered is that the inputs to the system occur randomly
and generate service requests of the central processor
which are varied in type. In a poorly designed system,
these random phenomena cause queueing and congestion problems. Therefore, a performance analysis which
takes account of the random phenomena is essential
to the design effort. The considerations of this approach
manifest themselves as follows: in a steady-state operation,
the throughput of the system is defined as the average
number of input messages to the system per second.
Certainly, then, the throughput requirement is an important design criterion. Concurrent with the
throughput considerations is the requirement of a
tolerable response time to each input message. In a
given system, the situation usually encountered is
that of having a desirably high throughput which, in
turn, causes a system request queue to build up, thereby
degrading response time. Therefore, pertinent to the
system design is a knowledge of throughput versus
response time, which will provide the basis for a
practical tradeoff.
In fact, the analysis of congestion forms the basis for
design considerations of both the hardware and software; e.g., the analysis can answer basic questions such
as "How many terminals can a particular system configuration support?"
Most systems design at present is based on the intuition and experience of the designer with plans to
"check-out" the final system performance by simulation. Historically, this approach has led to poor systems. For example, if one determines storage requirements of a real-time system by a consideration of the
average input loading, then the system would certainly
fail during peak traffic loads. On the other hand, if one
tried to design around the peak traffic, then one would
have a system which is grossly over-designed. Obviously, an optimum system can only be achieved by a
consideration of the entire loading distribution. That
is, one must consider the traffic level that is not exceeded 90% of the time, 99% of the time, etc., in
order to realize an efficient system design. Once such
considerations are introduced, predictions essential to
adequate design can be made accurately without the
haphazard quality of intuition or reliance on previous
experience which is not truly applicable.
Communications program

Because a TICKETRON facility must be able to
support over 500 remote terminals, the nucleus of the
system is the communication function. Therefore, most
of the design effort was concentrated on the communication program. In fact, the success of TICKETRON is
mostly attributable to its ability to efficiently handle
so many terminals. We now briefly present some of the
communications design.
To begin, the communication program administers
the handling of message traffic to and from the system
by performing the following functions.
1. To initiate polls by sequencing through a polling
list which is dynamically maintained.
2. To transfer messages between the terminals and
the main processing program and to do this in an
efficient manner.
3. To maintain the physical status of terminals
connected to the system.
4. To distinguish message errors of a physical
nature (as opposed to errors in message content).

The communications program is resident in, and executed by, a special processor as shown in Figure 1.
This front-end processor is a CDC-1774 CPU, which

147

is essentially a stripped-down CDC-1700 computer.
It has 20K words of core memory available for the
program and I/O line buffers. (In this system a computer storage word is composed of 18 bits: 16 data bits,
1 parity bit and 1 program protect bit.) Associated
with each line are essentially four buffers, two for input
and two for output. Actually, one of the input buffers
is in an area which contains ten buffers shared by all
the lines. As we shall see below, the system will only
issue a poll on the line if one of the input buffers is
available. That is, you only take a message in if you
have room for it. As was previously discussed, the
message processing program will only start processing
a message if there is an output buffer available. Hence,
at any one time the system generally will contain no
more than four messages for a particular line. Therefore, because of this "throttling" effect, one need not
program for, nor worry about the problems associated
with excessive internal queueing.
Each line is full duplex and transmission is done
asynchronously. The system outputs 9-bit characters
(1 start bit, 6 data bits, 1 parity bit, 1 stop bit) at
1200 baud or 7.50 msec per character, and inputs 7-bit characters (1 start bit, 4 data bits, 1 parity bit, 1
stop bit) at 800 baud or 8.75 msec per character. The
reason why only 4 data bits are required on the input
side is that all input messages are in the form of a
restricted fixed format. The terminal input device is a
dedicated keyboard with dedicated columns allowing
the entry of such pertinent information as: event code,
performance date and time, and certain seat qualifiers.
Associated with this data is one of three function codes:
inquiry, buy, or buy alternate. The buttons on the keyboard are such that they only permit one per column
or function to be depressed at any one time. Therefore,
every input message to the system is of fixed size, 19
characters. The advantages of this scheme as to programming and operation are obvious.
The communication program uses a polling technique
that can uniquely address each terminal in the system.
The poll message uses four characters. The system uses
no "hand-shaking" characters such as ACK or N ACK.
A poll to a keyboard causes it to transmit if the transmit
button is depressed. This is accomplished in about 200
milliseconds. If the transmit button is not depressed
when it receives a poll, it sends no response. The communication program will allow the terminal 200 milliseconds to respond before it infers a non-acknowledgment.
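The 200-millisecond figure is consistent with the character timings given earlier; a quick check (character counts and rates are from the text, ignoring line turnaround and keyboard latency):

```python
# Rough poll-and-respond timing from the stated character formats.
poll_ms = 4 * 7.50     # 4-character poll at 7.50 msec per output character
reply_ms = 19 * 8.75   # 19-character fixed-format input at 8.75 msec per character
print(poll_ms, reply_ms, poll_ms + reply_ms)   # 30.0 166.25 196.25 -- about 200 msec
```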
The communication program will perform the following logic for each communication line on a periodic
execution cycle.
1. Check disposition of the "receive" line buffer,

148

Spring Joint Computer Conference, 1970

three possibilities exist: free, full, or busy (200 milliseconds have not elapsed since last poll).
2. If free, check if there is available space for a
message in an input message buffer. If the space exists
for an input message, then prepare a poll message for
the next non-busy terminal on that line. Go to 5.
3. If full, check for transmission errors, then move
the message to the input message queue (space will
always be available). Once this is done, the "receive"
line buffer is free, go to 2.
4. If busy, go to 5.
5. Check disposition of the "send" line buffer, two
possibilities exist: free or busy.
6. If free, check for message in the output buffer
for this line, if there is one, send it, otherwise done.
(Start algorithm again at 1 for next line.)
7. If busy, done. (Start algorithm again at 1 for
next line.)
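The seven steps translate almost directly into code. The sketch below (Python standing in for the front-end's actual program; the buffer states, class names, and poll mechanics are invented stand-ins, and the shared ten-buffer input pool is represented only by a callable) follows the cycle for each line:

```python
# Per-line cycle of the communications program, following steps 1-7 above.
# All structures are invented stand-ins for the front-end's line buffers.

class Buffer:
    def __init__(self):
        self.state, self.data = "FREE", None

class Line:
    def __init__(self, terminals):
        self.receive, self.send = Buffer(), Buffer()
        self.input_queue, self.output_queue = [], []
        self.terminals, self.next_terminal = terminals, 0

    def poll_next_terminal(self):
        t = self.terminals[self.next_terminal]
        self.next_terminal = (self.next_terminal + 1) % len(self.terminals)
        print("poll ->", t)

def service_line(line, pool_space_available):
    if line.receive.state == "FULL":                # step 3: pass message inward
        line.input_queue.append(line.receive.data)  # space is always available
        line.receive.state, line.receive.data = "FREE", None
    if line.receive.state == "FREE":                # step 2: poll only if room
        if pool_space_available():
            line.poll_next_terminal()
            line.receive.state = "BUSY"             # 200 ms response window opens
    # step 4: if BUSY, fall through to the send side
    if line.send.state == "FREE" and line.output_queue:   # step 6
        line.send.data = line.output_queue.pop(0)
        line.send.state = "BUSY"
    # step 7: if the send buffer is BUSY, we are done with this line

lines = [Line(["T1", "T2"]), Line(["T3"])]
for line in lines:                                  # one periodic execution cycle
    service_line(line, pool_space_available=lambda: True)
```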
We purposely did not clutter the algorithm with implementation details, since that would only mar
the simplicity of the scheme. For instance, the output
required for the printing of a ticket is transmitted in
four segments. Interspersed between the segments
may be poll, light and TTY messages for other terminals on the line. Further, it is clear that the scheme
requires use of certain kept tables which reflect terminal
and line activity.
The basic philosophy in the given design of the communication program has been to maintain the integrity
of its true function. That function being, that it is
simply an intermediary between the message processing
program and the terminals. Speaking loosely, it should
be synchronized with the actions of the message processing program. In fact, since it is ultimately the
responsibility of the processing program to respond to
the terminals, then the communication controller
should only react to the needs of the message processing program. For example, if at any given time the
message processing program has enough work, then
there is no need for the communication program to poll
terminals for the purpose of bringing in more messages.
To do otherwise would be illogical.

PERFORMANCE ANALYSIS

In this section, we give the results of a queueing analysis of the system in order to determine its capabilities and limitations. As argued in the second section, response time versus throughput is the basis in terms of which to measure the performance of the system. The throughput capability of any system is determined by certain utilization factors.

Utilization factor is a well defined mathematical concept of queueing theory. Given a facility which has
some random arrival pattern for requests such that the
average input rate is λ arrivals per second, then let
each arrival place a demand on (tie-up) the facility
for some average time, Ts seconds. That is, during the
time Ts (service time), the facility is not available to
any other arrival. Then we have the utilization factor
of the facility defined by

U = λTs    (1)

In short, it represents the percentage of time the
facility is tied-up. Obviously, U should not exceed
100%!
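As a worked illustration of equation (1) (the numbers here are arbitrary example values, not system figures):

```python
# Utilization U = lambda * Ts: arrival rate times mean service time.
arrival_rate = 2.0     # lambda: messages per second (example value only)
service_time = 0.25    # Ts: seconds of facility time per message (example only)
U = arrival_rate * service_time
print(f"U = {U:.0%}")  # -> U = 50%: the facility is tied up half the time
```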
For TICKETRON, the throughput is measured in
terms of the number of tickets per hour that the system
is capable of outputting. These determinations start
with a specification of the input traffic to the system.
We distinguish two types of terminals: box office
and remotes. Remotes print what we call full-tickets
(308 ch of data). In addition, box office terminals are
capable of also selling half-tickets (119 ch of data),
which are useful for same-day events. Remotes can
only buy tickets after an inquiry has been previously
executed, whereas, a box office may execute a direct
buy. The reason for this is that box office attendants
are more familiar with their own inventory and therefore, have little need to make inquiries. The characteristics of remotes are such that they average 1Y2 inquiries for each buy transaction. In contrast, at a box
office you have on the average about % of an inquiry
per buy transaction. A buy transaction requires on the
average the printing of three tickets. Table I gives a
distribution of the number of tickets sold per transaction. Because most tickets are bought in pairs, the distribution is "tight" about the average of three, as
verified by the small squared coefficient of variation

TABLE I-Distribution of Various Types of Ticket Sales

Average number of tickets sold per buy transaction equals 2.95, with a standard deviation of 1.72.

    Number of tickets sold      Distribution
    per buy transaction         (percent of occurrence)
            1                           10%
            2                           49%
            3                           10%
            4                           18%
            5                            4%
            6                            4%
            7                            1%
            8                            2%
            9                            2%

TABLE II-The Processing Service Times for Various Types of Transactions

                                   Processing Time in Seconds
    Type of Transaction           same-day events   future events
    Inquiry                            .135             .338
    Buy following an Inquiry           .099             .248
    Direct Buy                         .180             .450

[Figure 2-Histograms of hourly rate of ticket sales (panels: box office sales and remote sales; tickets per hour, plotted from 9 AM to 9 PM).]

Therefore, for calculational purposes, we may consider this a constant distribution and make use of the simple queueing formulas associated with constant service.
A typical operational day for TICKETRON is represented by sales of about 120,000 tickets, which is equivalent to 40,000 transactions. Unfortunately, this traffic is not distributed evenly throughout the day; hence, the system must be able to accommodate peak traffic loads. Figure 2 depicts histograms of the hourly rate of ticket sales, which we shall use to establish peak traffic loads. We note that the remote sales are evenly distributed at 4,000 tickets per hour over a ten hour day, and they account for ⅓ of the total load. Of these 4,000 tickets per hour, we estimate that one-tenth, or 400, are for same-day events while 3,600 are for future events. In contrast, the box offices have sharp peaks for 1½ hour periods just before afternoon and evening performances. It is estimated that the box offices sell 2,000 tickets per hour for future events evenly over a ten hour period, which accounts for 20,000 tickets; the other 60,000 are for same-day events and are sold over a three hour period at an even rate of 20,000 per hour. These same-day sales are usually direct buys; hence, on the average we estimate that they cause only about .2 inquiries per sale. We note that since about 10/11 of box office sales average .2 inquiries per sale and the remaining 1/11 average 1.5 inquiries per sale, on the average a box office has about ⅓ of an inquiry per buy transaction. In summary, during a peak hour the box offices sell 22,000 tickets while the remotes sell 4,000.

high speed drums for fast retrieval. The processing service times for the various types of transactions are given in Table II. These timings are almost constant, independent of the number of tickets; therefore, we will assume them constant. The timings include all I/O times which are not overlapped with the processing, since the main processing program is a single-thread routine.
At this point, it is of interest to demonstrate how the processing times limit the throughput of the system. As argued in an earlier section, the inputs to the system are random; in fact, we maintain that they generate a Poisson arrival stream to the processor. This phenomenon causes queueing of the inputs. Therefore, the total processing time, or cpu response time, must include waiting time for the cpu. The simple queueing formula which determines the average waiting time for a single-server queue with Poisson input rate λ, mean service time Ts, and second moment of the service time b2, is given by

    W = [U/(2(1 - U))] (b2/Ts)               (2)

where U is the utilization factor which is determined
by eq. (1). To be sure, W becomes intolerably large
as U approaches 100%, which is the limitation that
governs the capability of the system. Hence, to obtain
the cpu response time for a particular type of input,
we just add its service time to the waiting time W. Let
us now determine the average cpu response time for
three different operational environments.
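The three environments below can be checked numerically. The following sketch (ours, not from the paper) evaluates eqs. (1) and (2) for a transaction mix built from the service times of Table II:

    # Sketch: evaluate U = lam*Ts and W = U/(2*(1-U)) * (b2/Ts)
    # for a stream composed of several transaction types.
    def cpu_utilization_and_wait(buy_rate, mix):
        """mix: list of (relative frequency per buy, service time in seconds)."""
        total = sum(f for f, _ in mix)
        lam = buy_rate * total                        # total input rate
        Ts  = sum(f * t for f, t in mix) / total      # mean service time
        b2  = sum(f * t * t for f, t in mix) / total  # second moment
        U   = lam * Ts                                # eq. (1)
        W   = U / (2 * (1 - U)) * (b2 / Ts)           # eq. (2)
        return U, W

    # Case II mix: per buy, 1.5 inquiries (.338 s) and 1 buy after inquiry (.248 s).
    U, W = cpu_utilization_and_wait(buy_rate=1.0, mix=[(1.5, .338), (1.0, .248)])
    # Gives Ts = .302 s, b2 = .093 s^2, and U = .755 per unit buy rate, as in
    # eq. (4a); the response time for a buy is then W + .248, as in eq. (4b).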
Case I. (Box office peak hour)

Assume for this case that all inputs to the system are direct buy transactions from box offices for same-day events. One may envision this situation prevailing for a period of about an hour on a day when every baseball team has an afternoon game and the remotes are closed. (Memorial Day is an example of such a day.) It is reasonable that in this environment there will be a negligible number of inquiries and negligible selling for future events. From Table II, we have the cpu service time for this case as Ts = .180 seconds. Since the service time is constant, the second moment is equal to the mean squared. Let λ equal the average number of input transactions per second; in this case it is equal to the average rate of buy transactions. Therefore, the cpu utilization and response time are given by

    U = λ(.180)                              (3a)

    Tcpu = [U/(2(1 - U))](.180) + (.180)     (3b)

Figure 3 is a graph of this result. Because each buy transaction represents an average sale of three tickets, and since there are 3,600 seconds in an hour, (10,800)λ represents the number of tickets sold per hour. This quantity is also given in Figure 3. We observe that in this environment the system can sell on the order of 50,000 tickets per hour.

[Figure 3-Average CPU response time for a direct buy transaction as a function of CPU utilization and tickets sold per hour, for an input traffic generated only by box offices for same-day events (Case I).]

Case II. (Remote peak hour)

In this case, let us assume that all the inputs to the system are generated by remote terminals and are sales for future events. This situation may very well occur on certain rare days which have very few events, during times when box offices are usually not active. Since remote terminals cannot make a direct buy, the input transactions in this case will consist of inquiries and buys (only after an inquiry). Hence, there will be at least one inquiry for each buy; in fact, we maintain that on the average there will be 1½ inquiries for each buy transaction. Therefore, if λ is the number of buys per second, then (1.5λ + λ) equals the rate of input transactions to the system. From Table II, we calculate the mean cpu service time for a transaction to be (.6)(.338) + (.4)(.248) = .302 seconds. Hence, the cpu utilization in this case is given by

    U = (2.5λ)(.302) = λ(.755)               (4a)

As before, the number of tickets per hour equals (10,800)λ. The second moment of the service time equals .093 seconds-squared. Therefore, the average cpu response time to a buy transaction in this environment is given by

    Tcpu = [U/(2(1 - U))](.093/.302) + (.248)    (4b)

Figure 4 is a graph of this result. We observe that in this environment the system can sell on the order of 10,000 tickets per hour.

[Figure 4-Average CPU response time for a buy transaction as a function of CPU utilization and tickets sold per hour, for an input traffic generated only by remote terminals for future events (Case II).]

It is interesting to note the contrast of this result with that given for the environment in Case I. We may consider these two cases as extremes to which the system can respond. Let us now turn to a study of the system in the more realistic environment to which it is subjected more often.
TABLE III-Distribution of Transaction Types for a Realistic Peak Hour

λ is the average rate of buy transactions.

                                        Average Input Rates                Processing Service
    Type of Transaction           from box offices    from remotes        Times in Seconds

    Same-day Events
      Direct Buy                    .8(20/26)λ          --                     .180
      Inquiry                       .2(20/26)λ          1.5(.4/26)λ            .135
      Buy following an Inquiry      .2(20/26)λ          1(.4/26)λ              .099

    Future Events
      Inquiry                       1.5(2/26)λ          1.5(3.6/26)λ           .338
      Buy following an Inquiry      1(2/26)λ            1(3.6/26)λ             .248

Case III. (Realistic Peak Hour)

Let us assume an input traffic mix as determined from a peak hour of the histograms given in Figure 2. Specification of this mix, as discussed above in this section, establishes the profile given in Table III. Again, λ equals the average rate of buy transactions. We find that the total number of inputs per second is 1.5λ. Also, we find that the mean cpu service time for this traffic is .209 seconds. Therefore, the cpu utilization is given by

    U = (1.5λ)(.209) = λ(.314)               (5a)

As before, the number of tickets per hour equals (10,800)λ. The second moment of the service time equals .050 seconds-squared. Therefore, the average cpu waiting time for processing in this environment is given by

    Wcpu = [U/(2(1 - U))](.050/.209)         (5b)

In Figure 5, we graph the two functions Wcpu + .099 and Wcpu + .248, which represent the average cpu response times to a buy following an inquiry for a same-day event and a future event, respectively.

Figure 5 shows that, as far as cpu processing is concerned, the system can accommodate peak hour sales of 26,000 tickets in the environment specified by the histograms of Figure 2, which represent a possible traffic distribution for sales of 120,000 tickets in a day. In a sense, this situation is to be expected because typically the limitations for a system of this sort are determined not by the cpu processing capability but rather by the congestion of the communication lines. Further, since the output transmission dominates the communication load in such systems, it is the traffic on the output lines and the number of such lines which truly govern the throughput capability of the system. For example, since the output transmission time required to print a full ticket is about 4 seconds, the theoretical maximum that a line can output is 900 tickets per hour. Hence, if the system had only 16 lines, it could not achieve the throughput levels required. Therefore, due to these considerations, we shall now investigate the system in terms of the activity on a single line.

vi) ... (X↑2 + C), ... = (X < 0), ... C ← -(X↑2)

vii) H(X, Y)[5, -2] = G'(X) + ∫ X↑A (X ← 0 TO Y)
viii) S ← "I AM A NAPSS STRING"
ix) R[1, 1] ← S || "ARRAY ELEMENT"

The left arrow operator (←) indicates that the arithmetic expression on the right is to be evaluated and its value is to be assigned to the variable on the left. The value assigned to D is either a scalar or an array, depending upon the operands in the expression on the right, while the value assigned to ARRAY is a 3 by 3 array.
The equals sign (=) has the more mathematical meaning. Statement three establishes that a future occurrence of E is equivalent to the expression V1 + V2↑2. Values are only substituted for the variables in the expression on the right of the = when a value of the variable on the left is needed. Thus if the value of V1 or V2 should change between the definition of E and the use of E, this is reflected in the value of E. Variables defined to the left of an = are referred to as equals variables, and variables defined to the left of an ← are called left arrow variables, or simply variables.
Statements four and five illustrate that a symbolic function may be assigned different definitions on different domains. The difference between statements four and five is similar to the difference between statements two and three. In the definition of F the variables A, B, and C have their current values substituted for them, while in the definition of G they do not. Values are only substituted for A, B, and C when a value of the function G is needed. Functions defined to the left of an = sign are called equals functions, and functions defined to the left of an ← are called left arrow functions.
Statements six and seven illustrate how arrays of functions are defined. All the elements in an array of functions must have the same number of arguments, and they all must be either left arrow or equals functions.
Statement eight assigns a string to S, and statement nine assigns a string to an element of an array.
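The contrast between the two assignment operators can be mimicked in an ordinary programming language by binding a value eagerly versus deferring its evaluation. The following fragment is ours and is not NAPSS syntax:

    # Left arrow (eager): the right side is evaluated once, at assignment.
    V1, V2 = 2.0, 3.0
    D = V1 + V2 ** 2          # D is fixed at 11.0

    # Equals (deferred): the expression is re-evaluated whenever a value
    # of E is needed, so later changes to V1 or V2 are reflected.
    E = lambda: V1 + V2 ** 2
    V2 = 4.0
    print(D, E())             # prints 11.0 18.0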
Although NAPSS is intended primarily as a problem statement language, the features of a procedural language have been included to increase its power for the user who wishes to create a personal library of NAPSS routines. External and internal procedures may be written in NAPSS. The use of these facilities is optional. The casual user need not be concerned with the rules that procedures introduce, for he can employ the system on what is called console level.
On console level the user does not set up any procedures. Statements are entered without any initial set-up, and are normally executed as they are received.


OVER-ALL STRUCTURE OF THE SYSTEM
The NAPSS system currently running on the Control Data 6500 at Purdue University consists of four main modules: the supervisor, the compiler, the interpreter and the editor. These modules are composed of 115 different routines, which are combined into 28 overlays. Almost all of the system is written in FORTRAN, with the exception of a few machine dependent operations which are restricted to "black-box" modules coded in assembly language. This is done to aid the goal of machine independence for the system.
The supervisor controls the flow into each of the three other modules. It distinguishes between NAPSS source statements, which are processed by the compiler, and edit statements, which are processed by the editor. The supervisor is also responsible for invoking the interpreter when a NAPSS statement is to be executed.
NAPSS source statements are transformed by the compiler into an internal text which the interpreter processes. This scheme was adopted for several reasons. First, the complexity of the elements to be manipulated and the absence of declarations require execution-time decoding of operands. Second, it easily allows for extensions to the system. Third, it gives the user incremental execution. Fourth, it permits extensive error diagnostics and permits error corrections without having to recompile the whole program. Fifth, statements which are repeatedly executed are translated only once into internal text.
The internal and source text for each statement is stored in secondary storage. When a statement is to be executed, a copy of the internal text is passed to the interpreter. This considerably reduces the core storage required for a user's program. Since the system is intended for use in an incrementally executing mode, no reference to secondary storage is normally required to obtain the internal text of a statement.
The system operates in one of two modes: suppress
mode or execute mode. In the suppress mode, each
statement is compiled into internal text and the internal
and source text is saved on secondary storage for later
execution. Suppress mode is entered by typing the
statement .SUPPRESS. A block of statements which
have been compiled in suppress mode may be executed
at any time by typing the statement .GO.
The normal mode of execution is execute mode. Here,
each statement is executed immediately after it has been
compiled and a copy of its internal and source text saved
in secondary storage. The system automatically enters
suppress mode when the user starts a compound
statement (a FOR statement) or a procedure. This is
necessary because a compound statement cannot be
executed until the whole statement is received and a
procedure is only executed when invoked. The system
re-enters execute mode automatically as soon as the
compound statement or procedure is completed.
The memory of a NAPSS program is made up of a few pages of real memory, which reside in core, and a larger number of virtual pages of virtual memory, which reside in secondary storage and are brought in and out of real memory. Two vectors (one dealing with virtual and the other with real memory) and several pointers are used to keep track of real and virtual memory.
Each element in the virtual memory vector is subdivided into three twenty-bit bytes. The first byte contains a flag indicating what type of information is stored in the page. The second byte is a switch, used when a page is in real memory, to indicate whether or not a copy of the page also resides in secondary storage. The third byte contains the number of the real page the virtual page occupies when it is in real memory.
The elements of the virtual memory vector which
denote available pages are linked together. Initially, the
element for virtual page one points to the element for
virtual page two and the last element contains a zero.
When a page of virtual memory is returned to the
system its element is again linked to the top of the list
of available virtual pages.
The real memory vector contains one entry per real page. This entry is the number of the virtual page occupying it (zero if it is free). This pointer from real memory to virtual memory is used when a new virtual page is placed in real memory: the virtual page currently in the real page must be copied out to secondary storage if a copy of it is not already there.
The amount of core assigned to real memory is
dynamic. Pages are removed from the top and bottom
of real memory in order to obtain contiguous blocks of
storage. Pages are removed from the top of real memory
for two purposes: first, to expand the name table, and
second, to obtain space for the work pool. Pages are
removed from the bottom of real memory to obtain
space for local name control blocks during the evaluation
of left arrow functions. See Figure 1.
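In outline, the two bookkeeping vectors behave as below. The class names are ours, the 20-bit packing is flattened into ordinary fields, and the free-list threading follows the description above:

    # Illustrative model of the NAPSS paging bookkeeping (not the actual code).
    class VirtualEntry:
        def __init__(self):
            self.kind = 0        # byte 1: type of information on the page
            self.clean = True    # byte 2: copy also resides in secondary storage?
            self.real_page = 0   # byte 3: real page number, when resident

    class Memory:
        def __init__(self, n_virtual, n_real):
            self.virtual = [VirtualEntry() for _ in range(n_virtual)]
            # Free virtual pages form a linked list threaded through the
            # vector: entry i names the next free page, and 0 ends the list.
            self.free_head = 1
            self.next_free = list(range(2, n_virtual + 1)) + [0]
            # Real memory vector: the virtual page occupying each real page
            # (zero if the real page is free).
            self.real = [0] * n_real

        def release_virtual(self, v):
            # A returned page is linked back at the top of the free list.
            self.next_free[v - 1] = self.free_head
            self.free_head = v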
The work pool is used to hold arrays when performing array arithmetic. Requests for work pool space are always made in terms of words; however, the amount of real memory assigned to the work pool is always an integral number of pages. When a request is made for work pool space and the work pool is empty, the space supplied is zeroed. When space is requested and the work pool is not empty, one of two situations arises. First, the space requested is less than the current size of the work pool; if the difference between the space requested and the current size of the work pool amounts to one or more pages, a corresponding number of pages is returned to real memory from the bottom of the work pool. Second, the space requested exceeds the current size of the work pool; if additional pages are obtained from real memory to satisfy the request, they are zeroed.
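The resize rule may be stated compactly as follows; PAGE_WORDS and the function name are our inventions for the sketch:

    PAGE_WORDS = 512          # assumed page size in words, for illustration

    def resize_work_pool(pool_pages, request_words):
        """Return the new page count for the work pool (sketch of the rule)."""
        needed = -(-request_words // PAGE_WORDS)   # ceiling division
        if needed < pool_pages:
            # Shrink: whole surplus pages go back to real memory
            # from the bottom of the pool.
            return needed
        if needed > pool_pages:
            # Grow: the pages newly obtained from real memory are zeroed
            # by the caller before use.
            return needed
        return pool_pages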
[Figure 1-NAPSS memory organization: the interpreter's recursive variables (AEPDA), internal text, the name table (temporary name control blocks, result name control block, global name table; array AENCBS), real memory pages 2 through N-3, the left arrow function local name table, and the real and virtual memory vectors.]

Virtual pages are assigned to real pages sequentially. Thus a virtual page is not removed until all real pages
are assigned a virtual page. This sequential process may
be broken whenever space is assigned to the work pool
or to hold the local name control blocks for a left arrow
function, since, after the space request is satisfied, the
next real page to receive a virtual page may no longer
belong to real memory. When this occurs the pointer to
the next real page to receive a virtual page is reset to the
first page now in real memory.
The algorithm for bringing virtual pages into real
memory is further modified when the work pool returns
a page to real memory. Since the page returned is
empty, a virtual page may be placed in it directly,
avoiding the possibility of having to save the virtual
page currently there in secondary storage. Thus the
normal sequential process is interrupted until all pages
returned to real memory by the work pool are reused.
The system does not assign all of real memory to either the work pool or the space for left arrow functions' local name control blocks. A request for real memory space is honored as long as two pages remain in real memory after the request is satisfied. If more space is requested than can be supplied, the request is modified to correspond to the maximum amount of space available. This permits the system to continue if this is adequate.
Two pages are required in real memory to facilitate
the linking of virtual pages. With two pages in real
memory the above algorithm guarantees that the
previous and current virtual pages referenced remain in
real memory. Thus they may be linked together if
necessary, without having to save pointers and reread
a virtual page to fill in link information.
Associated with each procedure is a name table containing entries for each variable, label and constant in that procedure. The entries, called name control blocks, are created during compilation when the name or constant is introduced. At this time the entry contains the name of the variable and some basic attributes describing how the variable appears in the program. During execution the name control block is used to hold values, pointers to values and a complete set of attributes for the variable.
This double usage of the name control block entries poses no problem if compilation and execution are performed separately. But in NAPSS the normal mode of operation is to execute each statement as soon as it is compiled. Thus, three situations are possible when a variable is entered in the name table. First, the variable may never have been used before in the program. Second, the variable may have appeared before in the program but have no value assigned to it; it is therefore just as it was when the compiler last saw it. Here a limited compatibility check is made between the two uses of the variable in the program. For example, the use of a name as a label and as a variable in an arithmetic expression is illegal. Third, the variable has appeared before in the program and has been assigned a value and a complete set of attributes. This enables more checking to be performed. However, the name table routine must not disrupt any of the attribute flags, for if any of them are changed the attributes may no longer correspond to the value associated with the name control block.
The name table is constructed sequentially. This method requires a minimum amount of space and permits the name table to grow dynamically. The name table is expanded by removing pages permanently from real memory. This method of name table construction does require that the name table be searched sequentially. The search goes through the name table from bottom to top, because frequently the greatest percentage of references to a variable occur in the immediate vicinity of its definition.
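That is, the search examines the most recently added entries first; in outline (ours):

    def lookup(name_table, name):
        # Scan from the newest entry toward the oldest, since references
        # tend to cluster near a variable's definition.
        for entry in reversed(name_table):
            if entry.name == name:
                return entry
        return None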


A variable which is declared to be global in N different
procedures has N + 1 name control blocks associated
with it. There is a name control block for the variable in
the name table of each of the procedures in which it
appears. Only compile-time information and a pointer to the N + 1st copy are contained in these name control blocks. The N + 1st copy is in the global variable name table and contains a complete set of attributes for the variable and its value, or a pointer to its value.
The N + 1st copy of a global variable's name control
block is placed in the global name table when the first
procedure is invoked in which the global variable
appears, or when the variable is declared global on the
console level (the portion of the program not contained
in a procedure). When a global variable is added to the global name table and it already appears there, a check is made on the compatibility of the attributes; an error results when they conflict. Otherwise a pointer to the N + 1st copy is placed in the procedure's copy of the variable's name control block.
A count is kept in the global name control block of the
number of procedures referencing the global variable.
When a global variable is no longer referenced, then its
name control block is removed from the global name
table and the storage associated with it is returned to
the system.
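The global-variable bookkeeping amounts to a reference-counted registry. The following sketch uses our own names and stands in for pointers with table keys:

    class GlobalNameTable:
        """Sketch of the N+1st-copy bookkeeping for global variables."""
        def __init__(self):
            self.entries = {}                 # name -> (attributes, value, refcount)

        def attach(self, name, attributes):
            # Called when a procedure referencing the global is invoked.
            if name in self.entries:
                attrs, value, refs = self.entries[name]
                if attrs != attributes:
                    raise TypeError("conflicting attributes for " + name)
                self.entries[name] = (attrs, value, refs + 1)
            else:
                self.entries[name] = (attributes, None, 1)
            return name                       # stands in for the pointer

        def detach(self, name):
            # When no procedure references the global, its entry is removed
            # and the associated storage returned to the system.
            attrs, value, refs = self.entries[name]
            if refs == 1:
                del self.entries[name]
            else:
                self.entries[name] = (attrs, value, refs - 1)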
A procedure is compiled when it is defined. To permit it to be linked into the program, the text generated uses only relative pointers to name table entries, and all linking between entries in a procedure's name table is done with relative pointers. This allows procedure A, for example, to be compiled as an external procedure and to be invoked either directly from the console level or from another procedure which itself is invoked from the console level. When procedure A is invoked, its name table is placed after the last entry presently in the name table and a base address is set up.
Variables which are not declared to be either local or global in an internal procedure are assumed to be known in the containing block.* After the procedure is compiled and a copy of its name table saved, a pass is made through the procedure's name table. This pass goes through the name table from top to bottom and places a copy of the name control block for each variable not declared to be either local or global in the name table of the containing block. If the variable has appeared in the containing block, a compatibility check is made between the attributes.
During execution only one name control block is used
for the value and attributes of a variable which is not
declared to be local or global. This is the name control
block entry in the outermost block. The name control
* A block is either a procedure or the console level routine.


block in the internal procedure is linked to this when the internal procedure is invoked. The linkage is constructed so that only one step is required to obtain the value of the variable, regardless of the depth of the procedure.
There are three types of name control blocks in
different memory areas: ordinary, local for left arrow
functions, and temporary. See Figure 1. Temporary
name control blocks are used to hold temporary results
during the evaluation of an arithmetic expression.
A central routine is used to decode variable name control blocks during execution. This routine determines the type of the name control block and handles the linkage between global, and non-local, non-global, name control blocks. Three things are returned when a name control block is decoded: the attribute number, the data pointer field, and the index in the array AENCBS of the first word of the data pointer portion of the name control block. See Figures 1 and 2.
DATA STRUCTURES
A name control block is the basic unit of all data structures in the system. In some cases it holds the actual values of the variable, and in others it contains a pointer to the actual values and descriptive information. A name control block is made up of seven sixty-bit words, or twenty-one twenty-bit bytes. See Figure 2.
A name control block which denotes a numeric scalar
contains the value of the scalar in its data portion. One
or two words of the data portion are used depending
upon whether the value is single precision real, double
precision real or single precision complex.
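For orientation, the block can be modeled as a record. The field names follow Figure 2; the exact byte offsets are illustrative only:

    # Sketch of a name control block: 7 sixty-bit words = 21 twenty-bit bytes.
    class NameControlBlock:
        def __init__(self, name):
            self.name = name              # the variable's name
            self.iteration_pointer = 0    # a twenty-bit byte
            self.data_pointer = 0         # a twenty-bit byte
            self.attribute_flags = 0      # attribute flags for the value
            self.data = [0, 0]            # data portion: one word for a single
                                          # precision real scalar, two words for
                                          # double precision or complex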
[Figure 2-The layout of a name control block: name, iteration pointer, data pointer, attribute flags, and the data portion.]

When a name control block denotes a numeric array, the data portion of the name control block contains the actual bounds for the array, the declared bounds for the
array (these may or may not have been specified by the programmer), and the number of dimensions in the array. The data pointer byte of the name control block points to where the actual array is stored, by rows, as a contiguous block. The array is stored as a contiguous block to speed up array operations.
If the data pointer byte of the name control block is nonzero, a copy of the array exists in secondary storage in the array file. The data pointer is then the number of the record used to store the array and an index into the vector AEPAR.
The vector AEPAR contains additional information about the array. Each word in AEPAR is subdivided into three bytes. The first byte contains the reference count for the array. This is incremented by one each time the array appears in a left arrow function definition. The values of all non-parameter variables are fixed when a left arrow function is defined; the use of a reference count for arrays permits only one copy of the actual array to be kept, and if the non-parameter array variable is assigned a new value, the value of the function will not change. The second byte contains the number of dimensions in the array. The third byte contains the number of words in the array. The number of words is equal to the number of elements in the array times the number of words in each element; this factor is one for a single precision real array and two for a double precision real array or a single precision complex array.
If the data pointer byte of the array's name control block is zero, the only copy of the array exists in the work pool, and the array is the result of the last array operation performed.
The work pool can contain anywhere from zero to three arrays. A counter is kept of the number of arrays in the work pool. In addition, for each array in the work pool, the index of the first word of the array, the index of the first word of the data portion of the array's name control block, and the information contained in the array's AEPAR entry are kept. When an array operation is to be performed, a check is made to see if any of the arrays involved already exist in the work pool. If they do, no reference to secondary storage needs to be made to obtain the operands. A check is also made to determine whether the result of the previous array operation is an operand of the current array operation; if it is not, the previous result array must be stored temporarily in secondary storage.
A name control block which denotes an equals
variable contains the virtual page number of the first
page used to store the internal text for the expression in
its data pointer byte. The first word of each virtual page
is used for linkage. The link contains the virtual page
number of the next page used to hold the text of the
expression or zero if the page is the last. When an equals

variable is an operand of an arithmetic expression this
internal text is evaluated to obtain a value for the
equals variable.
If a name control block denotes a scalar symbolic left
arrow function, the data pointer contains the page
number of the first virtual page used to store the internal
text of the arithmetic expression for the first domain of
definition. The first byte of the fourth word of the data
portion contains the number of arguments of the
function.
The first four words of the first virtual page used to store the internal arithmetic expression text for each domain contain a set of pointers. The first word is used to link together the pages required to store the internal text for the arithmetic expression for the domain; it contains the virtual page number of the next virtual page used, and a zero link denotes the last page. The next three words are subdivided into nine bytes. The first byte contains the number of words of internal text in the boolean expression for the domain. This is used when the boolean expression is being moved prior to its evaluation. The second byte contains the reference count for the function. If the function appears in the definition of another left arrow function, this count is increased by one, so that only one copy of this function needs to be kept. The third byte is the virtual page number of the first page used to hold the text for the boolean expression for the domain; this byte is zero if the domain has no boolean expression. The fourth byte contains the number of virtual pages that are required to hold the local name table for the domain. The local name table contains a name control block for each non-parameter variable appearing in the boolean and arithmetic expressions for the domain; this is necessary so that the values of these variables can be fixed when the function is defined. Byte five is unused. Byte six contains the virtual page number of the first page used to hold the local name table for the domain. Byte seven contains the number of words of internal text in the arithmetic expression for the domain. Byte eight is unused, and byte nine contains the virtual page number of the first page of internal arithmetic expression text for the next domain; if this byte is zero, there is not another domain defined for the function.
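Collected as a record, the nine-byte domain header reads as follows (our names; byte positions as described above):

    # Sketch of the per-domain header for a left arrow function (bytes 1-9 of
    # words 2-4 of the domain's first virtual page; word 1 is the page link).
    from dataclasses import dataclass

    @dataclass
    class DomainHeader:
        boolean_text_words: int     # byte 1: words of boolean-expression text
        reference_count: int        # byte 2: +1 per enclosing left arrow function
        boolean_first_page: int     # byte 3: first page of boolean text (0 = none)
        name_table_pages: int       # byte 4: pages holding the local name table
        # byte 5 unused
        name_table_first_page: int  # byte 6: first page of the local name table
        arith_text_words: int       # byte 7: words of arithmetic-expression text
        # byte 8 unused
        next_domain_page: int       # byte 9: first page of the next domain (0 = last)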
The virtual pages used to store the text for a boolean
expression or a local name table are linked together by
the first word of each' page. A zero link specifies the last
page.
The name control block for a scalar symbolic equals
function contains the same information as a scalar
symbolic left arrow function. The text for the function is
also stored in a similar fashion, except that in the first virtual page used to store the internal text for a domain's arithmetic expression, bytes two, four and six are not

Manipulation of Data Structures

used. There is no local name table required for an equals
function since all non-parameter variables appearing in
the function definition assume their current values when
the function is evaluated. There is no reference count
because if an equals function appears in the definition
of a left arrow function, a copy of the equals function
must be created. While the copy is being made the
equals function is transformed into a left arrow function
to insure that the values of all non-parameter variables
are fixed.
If a name control block denotes an array of symbolic
functions, it contains the same information as a numeric
array name control block. In addition the first byte of
the fourth word of the data portion contains the number
of arguments in each of the functions.
The array is treated as if it is an array of real single
precision numbers. Each element contains the virtual
page number of the first page used to store the arithmetic
expression text for the first domain of the element's
definition. If an element is not defined, its value is zero.
The text for the definition of each element is linked
together in the same manner as a scalar symbolic
function.
NAPSS is not designed for string processing, but it does allow the user to create strings, concatenate them and assign them to variables. This is done to permit the programmer to label his output. The data pointer byte of a string valued variable's name control block contains the number of the string. The string number is the index of an entry in the string relocation vector. Each entry is subdivided into three bytes. Byte one contains the index of the start of the actual string description in the string picture table. The second byte contains the reference count for the string; the reference count designates the number of times the string variable has been concatenated to form another string, plus one. The third byte contains the index of the first word of the data portion of the name control block for the string variable.
The string picture table contains a description of each
string. Several entries compose the description of a
string. Each entry denotes either a literal string, a
reference to a previously defined string variable, or the
end of a string picture. An entry in the string picture
table is subdivided into three bytes.
If byte one is not zero the entry describes a literal.
Byte one is the number of characters in the literal, byte
three is the number of the virtual page in which the
literal is stored, and byte two is the displacement on
that page to where the literal begins.
Each word in a virtual page used to hold string
literals is subdivided into three bytes. A literal is
divided into segments of three characters. Each segment
is stored in a byte. If a string literal will not fit in the


current string page, the literal is broken. As many
segments of the literal as possible are placed in the
current page and the remainder are placed in a new
string page. When this occurs two entries are placed in
the string picture table. This avoids the problem of
linking pages used to hold string literals. The maximum
length of one string literal is 576 characters.
If byte one of a string picture table entry is 1313,
then the entry denotes the null string. It has no length
and does not require any storage, so byte two and three
are unused.
If byte one is zero and byte three is not 501, the entry
denotes a reference to a previously defined string
variable. So that a new copy of the previously defined
variable's string is not created, byte three contains the
index of its entry in the string relocation table. When
this occurs the reference count in the relocation table
for the variable is increased by one.
If byte one is zero and byte three is 501, the entry
denotes the end of a string picture.
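The three kinds of entry decode as follows. The function and its tags are ours, while the sentinel values 1313 and 501 are those given above:

    # Sketch: decoding one string picture table entry (three 20-bit bytes).
    NULL_MARK, END_MARK = 1313, 501     # sentinels given in the text

    def decode_entry(b1, b2, b3):
        if b1 == NULL_MARK:
            # The null string: no length, no storage; b2 and b3 unused.
            return ("null",)
        if b1 != 0:
            # Literal: b1 characters, stored on virtual page b3 at offset b2.
            return ("literal", b1, b3, b2)
        if b3 == END_MARK:
            return ("end",)             # end of this string picture
        # Otherwise a reference to a previously defined string variable:
        # b3 indexes its entry in the string relocation table.
        return ("reference", b3)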
When a name control block denotes an array of
strings, it contains the same information as a numeric
array. The array is treated as a single precision real
array. The elements of the array contain the indices of
the entries in the string relocation table for the string
descriptions. If an element is undefined, its value is zero.
As can be seen from the descriptions of the various data structures, the primary concerns in their design have been to facilitate their use as operands while at the same time reducing the amount of physical storage required.
ACKNOWLEDGMENT
The work was supported in part by NSF Contract
GP-05850.

A study of user-microprogrammable computers
by C. V. RAMAMOORTHY and M. TSUCHIYA
The University of Texas
Austin, Texas

PART ONE-USER-MICROPROGRAMMABLE COMPUTER

The user-microprogrammable computer, as the fourth generation computer, is investigated from the user's point of view. In the first part of the paper, microprogramming and its concept, as well as the problems and requirements arising from its use in various applications, are discussed. The current status of the microprogrammed computer is also studied to indicate the differences of philosophy in microprogramming. A number of suggestions are made for the design of fourth generation user-microprogrammable computers. An algorithm for determining the optimum size of the microprogram store is presented in the second part of the paper.
INTRODUCTION
In many ways the evolution of computer architecture
can be likened to that of the human species on this
earth. Possibly the current computers represent those
fossilized representatives of the prehistoric times with
enormous bulk and little ability for adaptation to the
environment. But nature's evolutionary processes
always endeavored to adapt the species to their environment. In a similar sense, the evolution of computer architecture has been towards its adaptation to its use, i.e., to the problems the computers are intended to solve.
In a sense, the different generations in the architecture of computers can be distinguished as follows. The
separation of procedure and data as exemplified by
Babbage and Aiken represented the first generation.
Their integration in one storage device and the classification and consolidation of different functions into
distinct functional units signaled the second or von
Neumann generation of computers. Typically the latter
exemplified the permanent irrevocable wiring in all


functional tasks (machine language instructions). The third generation, which saw the emergence of multiprogramming and time-sharing, made the control logic more flexible by storing the sequences of elementary functional operations in a read-only memory (ROM) called a microprogrammed storage or control storage. Primarily this simplified the engineering design of the control unit and the testing and maintenance functions, and provided the advantage of a large instruction repertoire, which was required for reasons of compatibility in a family of computers, e.g., the IBM 360. The fourth generation, yet to come, has opened up another dimension: adapting a computer to the problem environment efficiently. One of the basic techniques for achieving this is via user-generated and user-alterable microprogramming.
In this paper we shall take a cursory look at the problems and prospects of a new generation of machines which could provide dynamic adaptation to the user's instantaneous needs, particularly when the resources in the system are shared by a multiplicity of users. Since the day of massive computer utilities is not far off,8 the type of computer we are considering here could be a basic building block of the computer utility of the future. It is in this context that we shall examine what we believe is the first step towards problem-adaptable computers, namely the user-microprogrammable computer.

Non-microprogrammed and microprogrammed computers

Wilkes,15,16,17,18 the father of microprogramming, suggested microprogramming as an orderly way of designing the control sequences that execute machine instructions, one which used many common programming techniques to advantage, such as program branching and the sharing of common sequences amongst machine instructions (the subroutine concept), to provide tremendous flexibility. In all computers programmed instructions


reside in a memory which can be altered. In non-microprogrammed machines, the control unit which sequences all the instructions is implemented with hard-wired components. A microprogrammed computer incorporates a stored-program rather than a hard-wired implementation of the control unit. Thus the control unit's operating characteristics (architecture) can be changed without changing the physical implementation of the hardware. Whereas a fixed control machine can be efficient for one class of applications but less efficient for others, the microprogrammable machine can be adapted easily to different problem-oriented environments.
BASIC CONCEPTS
A computer generally translates a sequence of programmed orders into a sequence of machine codes, which is the only form that the machine can recognize. The computer then interprets and executes the sequence of machine codes. When a machine code operation is performed, transfers of information occur among the functional components (e.g., registers, memory, adder, etc.) of the computer. The communication between functional components in turn is controlled by a set of primitive machine operations which consist of opening and closing gates and circuits between registers and basic logic elements within the control store of the computer.
Conventional fixed control computers

In these, the machine code is interpreted and executed by a completely wired-in set of circuits in the control unit of the computer. Thus the computer executes its particular, but small, set of machine codes (i.e., its machine language) efficiently. It accepts only one machine language, however, and, as discussed later, it is inflexible in terms of its applicability. As the vocabulary of the language increases, the hardware complexity also increases.
Any elemental operation, such as a register-to-register transfer of information, performed during the execution of a machine instruction is called a micro-operation. A micro-instruction specifies one or more micro-operations that can be performed in a fixed time interval. The micro-instruction has predetermined formats which specify internal data flows. Generally it is stored in one or more locations in a fast memory called a microprogram memory (sometimes also called a control store). A microprogram is a set of micro-operations used to effect a single machine code of the user machine. Every machine code then is considered to be programmed by the proper arrangement of micro-operations, similar to the concept of machine code programming. In this context, if microprograms are stored in a modifiable (writable) memory rather than hard-wired, the machine codes may be altered and redefined by changing the arrangement of micro-instructions to suit particular needs.
Microprogrammable computer

The microprogrammable computer is a multi-lingual, multi-purpose, flexible computer. Each of its machine instructions is performed by a microprogrammed stored subroutine. Although there may be a loss of computing speed when stored micro-instructions are fetched from a memory, it is offset by the use of high-speed memory and the simplicity of the hardware construction.
The most significant advantage of the microprogrammable computer is the flexibility it offers to the user. It is a multipurpose (note the difference from "general purpose") computer and is flexible enough to be particularized according to the user's application environment. It is then the user's responsibility to take full advantage of the microprogrammable computer.
The level of control

[Figure 1-Function/field type micro-instruction format: the instruction is divided into fields, each controlling a set of gates.]

[Figure 2-Machine-code type micro-instruction format: op code, register 1, register 2, flag.]

There are typically two types of micro-instruction formats, which characterize the level of control exercised over the elementary operations. In one type (the function/field type), the fields of the micro-instruction control the gates on the individual data paths of certain elemental functions (Figure 1). Each field of the instruction constitutes a micro-operation. The micro-instruction thus controls data flows within the machine at the lowest level (e.g., IBM 360/50). The more complex the machine structure becomes, however, the larger becomes the number of fields in the micro-instruction. Although microprogramming in this instruction format can be tedious because of the fineness of control, the specification of a macro-instruction can be compact and efficient. A multiplicity of fields in a micro-instruction which is executed in one machine cycle permits the microprogrammer to specify and fully use the parallel processing opportunities available in the system control. This type of microprogram control is particularly useful in specifying the operating system functions of a large computer because (1) efficient and compact implementation of highly active operating system functions can reduce wasteful overhead and (2) such functions, once implemented, become building blocks for others.
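As an illustration of the function/field idea, a micro-instruction word can be split into independently applied gate-control fields. The widths and field meanings below are invented for the sketch and are not those of any particular machine:

    # Sketch: a function/field micro-instruction as packed gate-control fields.
    FIELDS = (("alu_op", 4), ("a_bus_gate", 3), ("b_bus_gate", 3),
              ("dest_gate", 3), ("next_addr", 7))

    def decode(word):
        """Split a 20-bit micro-instruction word into its fields."""
        out, shift = {}, 0
        for name, width in reversed(FIELDS):   # last-listed field is low-order
            out[name] = (word >> shift) & ((1 << width) - 1)
            shift += width
        return out

    # Every field is applied in the same machine cycle, so independent
    # data paths can be exercised in parallel.
    print(decode(0b0010_101_010_001_0000011))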
In the other type (the machine code type), the micro-instructions, or for our discussion "mini"-instructions, are sequences of commonly occurring elementary functions (Figure 2). The format of a mini-instruction is similar to that of a conventional machine code instruction: one operation code field, one or more operand fields, and, optionally, a flag field for special conditions. Operands in mini-instructions are, in general, the originating and/or destination registers of some information to be transferred. A mini-instruction may be characterized as follows: each mini-instruction represents an elementary functional task, and any macro-instruction (i.e., machine code) can be built up from mini-instructions. A mini-operation differs from a micro-operation in that the former represents a sequence of gating operations requiring a number of basic clock cycles while the latter requires one clock cycle. The mini-operation may be considered as consisting of a few micro-operations. In fact many mini-instructions resemble basic machine instructions in some simple computers such as the IBM 650. The "ADD" mini-instruction of the Interdata Model 4 computer, for example, is performed in three micro-steps as follows:1
(1) transfer to the adder the contents of the A
register;
(2) transfer to the adder the contents of the register
specified by the Source (S) field of the instruction;
(3) transfer the contents of the adder (i.e., sum) to
the register specified by the Destination (D) field of
the instruction.
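In present-day code the three micro-steps read as follows; the register file and its keys are our own illustration:

    # Sketch of the Interdata Model 4 "ADD" mini-instruction's micro-steps.
    def add_mini(regs, s_field, d_field):
        """regs: register file; s/d: the source and destination fields."""
        adder = regs["A"]          # step 1: adder receives the A register
        adder += regs[s_field]     # step 2: add the register named by S
        regs[d_field] = adder      # step 3: the sum goes to the register named by D

    regs = {"A": 5, "R2": 7, "R3": 0}
    add_mini(regs, "R2", "R3")     # regs["R3"] becomes 12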

In the instruction, the registers are designated by alphanumeric names which, in turn, are translated into and represented by the four-bit S- and D-fields (Figure 8). Mini-operations are either wired into the machine or activate microprograms in the control memory. In the latter case, the instruction execution time is longer simply because of additional memory references. Although mini-instructions are not suitable for specifying overlapping operations, data flows within an individual mini-instruction may be performed in parallel. The mini-programs are generally user-oriented and easier to write and debug. As in the case of high level languages, the storage requirements for mini-programs are smaller than for their microprogram counterparts (by up to 20 per cent), but additional control memory or wired-in logic is necessary for their interpretation.

Users

The motives for the user to use microprogramming are many. In order to discuss problems from the user's viewpoint, we classify computer users into two broad categories according to their main motives and interests in using microprograms. The first, called the owner-user, consists of those who own the computer system and maintain, manage and schedule its use. The administrator of a computer center, the systems programmer, etc., are in this category. They do not necessarily design or run application programs. They are mainly interested in the maximum utilization of the computer resources by the customer programs.
The second is called the customer-user, who actually uses the computer to run his programs. Any one individual could be both owner-user and application-user, depending on his interest and the type of job he runs at the moment. This category includes those who use the system to perform computational tasks not related to the management and upkeep of the computer system. The customer-user is mainly interested in maximum convenience and dependability, such as better turn-around time or response time, for processing his jobs.
The difference of purpose and interest in computer usage necessarily differentiates the users' interests in microprogramming.

MICROPROGRAMMING SITUATIONS
The two types of users try to restructure the computing system in two different ways: the owner-user tries to maximize the utilization of his equipment by the customers, and the customer-user tries to maximize the performance of the equipment in solving his problem.


Microprogramming for owner-user

The primary motive for the owner-user wanting access to the microprogram store, for modification of its contents, is to accommodate system expansion or change.
Emulation

One primary area is emulation, which reduces the reprogramming costs due to change-over to new systems having many similar instructions and addressing structures.13 Emulation can be loosely defined as a combined software/hardware interpretation of the machine instructions of one machine by another. In a conventional non-microprogrammed machine, for the emulation to be successful the instruction formats and repertoires must be very similar. A microprogrammed machine, on the other hand, can emulate a wider variety of machines. Experience indicates, however, that the word length of the host computer must be the same as, or some convenient multiple of, that of the target computer being emulated. It has also been found desirable that the number of arithmetic registers in the host machine be at least equal to or greater than that of the target machine for efficient emulation. Another desirable feature in the host microprogrammable machine is a number of toggles (flip-flops) which can be used for setting and branching by appropriate conditional micro-operations.
Expansion of I-O and Memory

Possibly the greatest benefit to the owner-user is accrued when microprogramming is used to accommodate the addition of new I-O devices, or system adjuncts like associative memories, or additional memory capacity. One can categorize the added chores that the system has to accommodate due to the expansion of I-O and memory. In the case of added I-O, they can be increased interrupt activities, special I-O formatting, manipulation of variable length fields, new I-O commands (if new devices are added), and special error recovery procedures. In the case of increased memory capacity, there is the problem of addressing associated with the increase in the number of pages in the paging schemes. New microprograms can be written to reinterpret old I-O commands to take the change into account.
Adding New Processing Versatility to System

The owner-user, to accommodate scientific customers, may provide a specialized array of new commands or fast subroutines (which can be partly microprogrammed) not provided originally in the system architecture of the computer, e.g., commands for multiprecision floating-point arithmetic, radix conversion, etc.
Redistribution of Operating System Functions

Occasionally a redistribution of system functions between hardware, software and firmware may be in order. Principal reasons may be new applications and new devices added to the system. For example, the introduction of a real time control system may require periodic monitoring and instant servicing of high priority interrupts. Frequent use of a function, or the immediacy of response, can be a good reason for microprogramming part or all of the function, rather than leaving it in the software.
System reconfiguration due to an application need or the failure of some system device is another instance where redistribution of operating system functions may be desirable. The ensuing changes can also be met by modifying the microprograms of some functions.
Microprogramming for customer-user

To the customer-user, access to microprogram control has many advantages, though these are fraught with headaches for the owner-user and the system. We shall briefly outline a few avenues of benefit to the customer-user.
Real time environment

Microprogramming may be a solution when machine instructions and assembly-language programming cannot keep up with the process, where the process is primarily CPU-limited.
Application orientation

Where it is desirable to execute the problem in a manner natural to the process, as in certain problem-oriented machine languages,6 it becomes desirable to interpret application languages and data structures in their natural form to facilitate on-line debugging, modifying, and program building. In these cases, the statements in the application language may be executed directly via microprogrammed interpretation. To carry this further, by having access to a dynamically alterable microprogram store, one can particularize the computer dynamically to match the varying problem types and environments as encountered, for example, in a job-shop environment. In this manner, the process-oriented, higher-level language can be executed easily via a microprogram interpreter.
via a microprogram interpreter.
Performance enhancement of production programs

When some procedures of a production-type program which is run frequently on different sets of data, e.g., payroll, are executed time and again, they can be microprogrammed to enhance the performance of the program and reduce the overall execution time. Statistics gathering for the usage frequency and execution time of procedures can be performed concurrently with execution, again by the use of microprogrammed adjuncts. By analyzing the usage statistics it is thus possible to microprogram selected parts of the program, as well as to selectively store machine-language instruction strings in high-speed control memories to improve the execution time of the program. In the second part of this paper, we shall develop an algorithm for this purpose.
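As a rough software illustration of the statistics gathering just described (our sketch, not the paper's microprogrammed mechanism; the function names, time source, and threshold are invented), the following tallies invocation counts and accumulated execution time per procedure and ranks the heaviest ones as candidates for microprogramming:

import time
from collections import defaultdict

# Tally call counts and accumulated execution time per procedure.
stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})

def profiled(fn):
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            s = stats[fn.__name__]
            s["calls"] += 1
            s["seconds"] += time.perf_counter() - t0
    return wrapper

def microprogramming_candidates(top=3):
    # procedures consuming the most total time are moved into the
    # fast control memory first
    return sorted(stats, key=lambda k: stats[k]["seconds"], reverse=True)[:top]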
ADJUNCTS FOR USER APPLICATIONS

We shall review some key user applications and consider what other hardware-software aids are needed in addition to dynamically alterable microprogram storage.
Realtime environment

In the area of real-time control applications where vigilance to high-priority interrupts and immediacy of response are essential, the detection and classification of interrupts is conveniently performed by a fast associative memory. An associative memory is indicated here since it provides the flexibility of assigning any priority to the interrupt lines; having so assigned them, identification and servicing can proceed accordingly. Real-time applications also need access to a clock (timer) for purposes of sampling, etc. Amongst the software adjuncts there is a need for a local monitor (executive) to provide "a better collaboration" and communication between the real-time process and the system executive. Interrupt monitoring and control of the associative memory could be more beneficially microprogrammed.
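The parallel match that such an associative memory performs can be modeled behaviorally. The sketch below is an illustration only (the cell layout and routine names are invented): every cell is interrogated at once, and the highest assigned priority among the responders wins.

# Behavioral model of associative interrupt classification.
# Each cell holds (mask, pattern, priority, service routine name).
def classify_interrupt(lines, cells):
    responders = [c for c in cells if (lines & c[0]) == c[1]]
    if not responders:
        return None
    return max(responders, key=lambda c: c[2])[3]   # highest priority wins

cells = [
    (0b0001, 0b0001, 10, "power_fail"),   # priorities are freely assignable;
    (0b0010, 0b0010,  5, "io_done"),      # reassignment just rewrites a cell
]
print(classify_interrupt(0b0011, cells))  # -> power_fail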

Simulation

In the area of simulation of dynamic processes, frequently used subroutines such as function generators, random number generators, and transcendental function generators are best microprogrammed. Real-time clocks and associative memories form useful hardware adjuncts in this area also.

Pattern recognition

The area of pattern/symbol recognition has developed many picture-processing languages. Many of them require extensive matching and bit manipulation. In particular, they involve neighborhood processing of clusters and adjacent bits. Here again, microprogramming with associative processing could ease the task.
Alterable microprogram store

The best benefits of microprogramming are accrued when the user has access to certain specified portions of the microprogram store and has the ability to modify its contents. The system can use a relatively fast, possibly unalterable read-only memory to interpret basic machine operation codes and certain frequently used subroutines like radix converters, etc. The dynamically alterable microprogram memory can be shared between the system (owner-user) and the customer-user. Preferably the customer-user portion should be paged. For most common functions, a system with a modest instruction repertoire would need on the order of 10^5 bits, or 2K words of approximately 48 to 64 bits per word. The access time of the microprogram memory must be a small fraction of that of the main memory. The user microprogram store must be paged, or otherwise it may impose costly allocation and garbage-collection problems with multiple concurrent users. The cost of renting the microprogram memory must be reasonable to the customer-user. Of course, because of its very nature, the user must be knowledgeable at least to the extent of using it to better his cost-performance ratio. It is obvious that its use will be favored by production-type, processor-oriented problems rather than otherwise. Also, the system owner must allow a larger quantum of computational time to users using microprogramming, for the very reason that since these memories are expensive their cost advantage lies in their frequent usage.
Micro-instruction structure


To be readily usable, the micro-instruction structure must be simple. Some third-generation machines such as the IBM 360/50 require so much knowledge of the sophisticated internal organization of the machine control that microprogramming for them is a nightmare. It is essential for easier microprogramming that the microprocessor of the "visible" machine which the microprogrammer sees and uses be simple.
This also brings up the need for an assembly language or a higher-level language for microcode. The customer-user, unlike the owner-user, would use microcode only if it is simple and convenient to write and debug. Micro-instructions must be syntactically sound and should not contain any "unnatural" funnies. Assemblers and compilers, on the other hand, tend to reduce the "tightness" or compactness of the final microprograms. Thus, there is a possibility of writing microprograms at two levels: micro-level and mini-level. The micro-level operations are the most elemental operations beyond which the machine cannot be controlled. They may represent basic gate-level controls. Microprogramming is the method of describing these micro-level operations to execute certain computational tasks. This mode is beneficial to the system owner-user in implementing I-O controls and certain frequent operating-system functions. The key issue is that efficient code is essential here, and programs once written become building blocks for others. The mini-operations are sequences of commonly occurring micro-operations which are characterized by the following: (a) each mini-operation represents an elemental functional task that the system can perform, and (b) any general-purpose instruction repertoire can be built up from combinations of mini-operations. Specifically, the mini-operations will be user-oriented. The motive for introducing mini-operations is twofold. Mini-programs are easier to write than microprograms and hence easier to debug. Just as in the case of a higher-level language, the amount of storage space needed to store mini-programs will be vastly smaller than that required by microprograms (as much as 20 per cent). Of course, the execution time of mini-programs will be longer than that of microprograms due to an additional level of interpretation. Mini-level operations could provide parallel-processing capabilities, whereas microprograms can only be sequential. One version of mini-instructions can be subsets of Iverson's APL operators, which make array representation and parallel manipulation possible. This again provides economy in notation and storage.
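A toy two-level interpreter may make the distinction concrete. In this sketch (all operation names are invented), micro-ops are the elemental controls, a mini-op is a stored sequence of them, and running a mini-op costs one extra level of interpretation, as noted above.

# Micro-ops: elemental, gate-level controls.
micro_ops = {
    "GATE_A_TO_BUS": lambda st: st.__setitem__("bus", st["A"]),
    "GATE_BUS_TO_B": lambda st: st.__setitem__("B", st["bus"]),
    "ADD_B_TO_A":    lambda st: st.__setitem__("A", st["A"] + st["B"]),
}

# Mini-ops: commonly occurring micro-op sequences, each an elemental
# functional task; instruction repertoires are built from combinations.
mini_ops = {
    "MOVE_A_TO_B": ["GATE_A_TO_BUS", "GATE_BUS_TO_B"],
    "DOUBLE_A":    ["GATE_A_TO_BUS", "GATE_BUS_TO_B", "ADD_B_TO_A"],
}

def run_mini(name, state):
    for micro in mini_ops[name]:   # the extra interpretation level
        micro_ops[micro](state)

state = {"A": 3, "B": 0, "bus": 0}
run_mini("DOUBLE_A", state)
print(state["A"])                  # -> 6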
It appears that re-entrant types of microprograms with physically separate high-speed scratch-pad storage areas may be advantageous in the future. To summarize, the goodies required for multi-user microprogram sharing and use are similar to those in a conventional multi-programmed area.
Another requirement would be the addressing of arrays and instructional information stored in the microprogram memory. Since it is believed that 80 per cent of all numerical calculations involve matrix manipulation, such processor-limited computations can benefit from matrix-type addressing facilities, which reduce the user storage requirements of the microprogram memory. Since most of the addressing within a microprogram memory is always within close proximity of the address of the current micro-instruction, a method of "proximity" addressing (e.g., incremental addressing) can be helpful. Since logic is cheap, arithmetic transformation of addresses may be very helpful as against a random transformation as in paging.
Privilege among users

It is obvious that not all micro-operations must be available to the customer-users, whereas the system (owner-user) should have access to all. Thus the system operates in a privileged mode, having access to all vital controls in the system. In the restricted privileged mode, the customer-user will not have access to:
(1) any micro-operations dealing with address transformation, memory, or I-O barricades;
(2) any micro-operations dealing with hardware functions of the executive, like interrupt handling and polling mechanisms, etc.
Let us now consider approaches for implementing these restrictions. One obvious solution is to have two microprogram memories: one which can execute the unrestricted operations of the system, and the other the restricted subset allowed to the user. The operation decoder will not execute any system micro-operations which come from user portions of program memory.
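The decoder-side check can be sketched as follows (a minimal illustration; the operation names are invented): a micro-operation supplied by the user portion of control store is simply refused if it belongs to the privileged set.

# Privileged micro-ops are refused when fetched from the user store.
PRIVILEGED = {"MAP_ADDRESS", "SET_MEM_BARRICADE", "POLL_INTERRUPTS"}

def decode(op, source):
    # source is "system" or "user", according to which microprogram
    # memory supplied the operation
    if source == "user" and op in PRIVILEGED:
        raise PermissionError("user store may not issue " + op)
    return op

decode("ADD", "user")             # allowed
# decode("MAP_ADDRESS", "user")   # would raise PermissionError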
Parallel surveillance and debugging

When multiple users are using the microprogrammed control memory concurrently, it is essential for the system to provide close surveillance. The possibility of one program clobbering another unintentionally is high when the program being executed is new or being debugged. Tangible approaches to the problem include any or all of the following:
(1) restricting the users to an unprivileged subset of micro-operations;
(2) sequencing each machine instruction in two modes: normal mode and debugging mode. In the debugging mode each instruction will be scanned for possible syntax errors, conflicts in address space, etc. Also, sentinels, breakpoints, and test-address information will be serviced. Any desired intermediate


computations will be saved. In the normal mode the surveillance operations will be suppressed.
In the second approach, instead of interweaving the surveillance and execution sequences, one can have two distinct microprogram sequencing units: one unit for normal instruction execution, and the other for overseeing and interrogating the execution sequence. This, of course, involves the availability of multiple microprogram control units. The use of a supervisory microprogram monitoring each machine-code operation is equivalent to each program having its own individual supervisor.
Relocatability

Relocatability of a program implies that it can be put into any contiguous set of addresses in a memory and executed with minor reinterpretation of its address fields. Specifically, it implies that the addressing is relative to a reference. By having relocatable features, the micro- or mini-programs can be stored contiguously in the microprogram memory, and hence it is possible to serve a number of users of microprograms by swapping them in and out of the expensive microprogram storage.
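A minimal sketch of such base-relative placement (our illustration; the control store is modeled as a simple dictionary): loading needs no re-assembly, and every internal branch target is resolved against the load base.

def load_relocatable(store, routine, base):
    # place the routine at any free contiguous region of control store
    for offset, word in enumerate(routine):
        store[base + offset] = word

def resolve(base, relative_addr):
    # all addressing is relative to the reference (the load base)
    return base + relative_addr

store = {}
load_relocatable(store, ["mop1", "mop2", "branch 0"], base=512)
print(resolve(512, 0))   # -> 512; the branch lands on "mop1" wherever loaded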
PERIPHERAL AREA
The first good use of microprograms came in the peripheral area when IBM developed the I/O channel for their pre-System/360 computers. The channel is a small microprogrammed computer itself; it communicates with the central processor and controls the various I/O devices. The channel transfers the required amount of data between locations in main memory and I/O devices, protects against unauthorized transfer of information into main memory, signals the processor units of I/O operation status by means of interruption, and permits concurrent activities of the central processor and I/O devices (i.e., multiprogramming). Microprograms of the channels thus provide flexibility, since they handle a wide variety of I/O devices as well as complex communications with memory and the processor.
Microprograms are also used in satellite computers of a large computer system. The satellite computer is mainly used to control the activity of various I/O devices. In this sense, it functions as a channel to the main computer. It can transfer information to and from the central computer, check word parity, and store information in the specified location of the specified storage device. Microprogramming is employed


to enhance the efficiency and flexibility of satellite
computers, and to control a variety of I/O devices.
FUNCTIONAL REQUIREMENTS
Micro-operations available to the users

All microprograms available to the system are classified into two categories: normal micro-operations and privileged micro-operations. The normal micro-operations are those that may be used by the customer-user in his microprogram. The privileged micro-operations are those that may be used only by the system-user.
Protection may be realized by using two microprogram memories: one for the customer-user and the other for the system-user. If they are used strictly by their designated users, encroachment is fairly easily prevented.
Visible machine

The visible machine, that is, the machine organization as seen by the microprogrammer, must be simple. The internal structure of the machine should be well organized so that the data flow among functional units may be seen easily via micro-operations.
Functional congruity

Incongruity among functional units requires many housekeeping operations, causes clogging of internal data flow, and therefore must be minimized. Consider, for example, a functionally incongruous machine with a 2-byte-wide memory data path, 2-byte-wide registers, and a 1-byte-wide adder. It is apparent that for any simple 2-byte addition, a value has to be divided into high- and low-order parts in order to conform to the adder; computation must then be performed on each part sequentially. Every 2-byte addition requires two passes through the adder.
A microprogrammed computer must be designed such that a set of common microprogrammed sequences can be used in more than one way. To illustrate this view, consider now a computer which has an adder that can be set to add/subtract either in binary or in binary-coded decimal in groups of 4 bits.
Then the following two algorithms will perform multiplication and division in either the binary or the decimal number system, depending on the setting of the adder. In other words, they have identical sequencing for decimal and binary multiply and divide procedures.


The multiplication algorithm, which gives a two-word product when a word A is multiplied by a word B, is as follows.
1. Store A.
2. Compute and store 4A.
3. Consider the lowest-order four-bit group of B as being 4α + β, where α and β are each two-bit numbers. Add 4A α times and A β times to form a partial product.
4. Shift both B and the partial product four places to the right.
5. Repeat steps 3 and 4 for each next-higher-order four-bit group of B. At the end of the procedure, a two-word product is obtained.
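A behavioral sketch of these steps follows (our illustration, shown for the binary setting of the adder and an assumed word size; shifting the addend left here is arithmetically equivalent to shifting the double-length partial product right as in step 4).

def multiply(a, b, word_bits=16):
    # two-word product of one-word a and b, taking the multiplier
    # in 4-bit groups 4*alpha + beta (alpha, beta two-bit numbers)
    a4 = a << 2                      # step 2: store 4A
    product = 0
    for g in range(word_bits // 4):
        group = b & 0xF              # step 3: lowest-order 4-bit group
        alpha, beta = group >> 2, group & 0x3
        product += (a4 * alpha + a * beta) << (4 * g)
        b >>= 4                      # step 4: advance to the next group
    return product

print(multiply(13, 11, word_bits=4))   # -> 143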
The following algorithm for division may be microprogrammed advantageously to divide a two-word number A by a one-word number B, yielding a one-word quotient and a one-word remainder.
1. Store B, then compute and store -4B.
2. Add -4B to the high-order part of A α times, until the accumulator goes negative. α will be in the range of one to four.
3. Add B to the accumulator β times, until the accumulator goes positive. β will be in the range of one to four.
4. Record the first digit of the quotient, the high-order two bits being α - 1 and the low-order two being 4 - β.
5. Shift the diminished A four places to the left and repeat steps 2, 3, and 4 as many times as the number of 4-bit groups in one word. The digits yielded are the successively lower-order digits of the quotient, and the number finally left in the accumulator is the remainder.
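The division steps admit the same kind of sketch (again our illustration for the binary adder setting; the two-word accumulator is modeled as a Python integer, and the step 5 shift is applied at the top of each digit cycle so that the dividend lines up with the high-order position of -4B, which is equivalent).

def divide(a, b, word_bits=16):
    # two-word a divided by one-word b -> (one-word quotient, remainder);
    # each 4-bit quotient digit comes from two radix-4 phases
    assert 0 <= a < (b << word_bits), "quotient must fit in one word"
    acc, q = a, 0
    for _ in range(word_bits // 4):
        acc <<= 4                        # step 5 shift, done up front
        alpha = 0
        while acc >= 0:                  # step 2: add -4B until negative
            acc -= 4 * (b << word_bits)
            alpha += 1                   # alpha falls in 1..4
        beta = 0
        while acc < 0:                   # step 3: add B back until positive
            acc += b << word_bits
            beta += 1                    # beta falls in 1..4
        q = (q << 4) | ((alpha - 1) << 2) | (4 - beta)   # step 4
    return q, acc >> word_bits           # remainder sits in the high word

print(divide(1000, 7))   # -> (142, 6)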

These processes are applicable when negative quantities are represented by their two's (ten's) complements. The reader may verify that the algorithm is valid for both binary and BCD arithmetic. It is obviously more economical in terms of microprogram memory space than having two separate microprograms, one each for binary and decimal operation. Microprograms arranged in this manner will not only economize the space and use of microprogram memory, but will simplify the computer organization.
Parallel surveillance on all user operations

In a multi-user, multi-processor environment, concurrent operations can encroach on each other's working resources. It will be necessary to barricade each user from the innocent mistakes of others, particularly in their microprograms. The microprogramming sequences can be designed so that intrusions and deadlocks are prevented or, if they occur, the damage to other programs is either recoverable or minimal.

Allotment of microprogram memory space among multiple users

Allotment of modifiable microprogram memory (MMM) space among multiple users is a problem for the operating system. The operating system assigns each job a priority for the use of the MMM, and when some MMM space becomes available it allots the space to the incoming jobs according to their priority.
Generally, microprograms that perform elementary operations can be shared by all user programs to prevent unnecessary duplication. Therefore, commonly and frequently used microsequences should be coded as macros and stored in a designated space to which all users have access. This reduces the amount of MMM space allocated to an individual job, and certain jobs can be processed within the scope of the microprogrammed macros.
Addressing the microprogram memory

A user program may be coded into regular machine code, a set of microprograms, or a combination of both. If it is coded into machine code, it is stored in the core memory store. If, on the other hand, it is coded into a set of microprograms, it should be brought into the control memory. When parts of a program are coded into machine code and others are microprogrammed, a communication linkage for control, i.e., a uniform addressing scheme between the two types of memory, must be established. In order for control to be shifted between the two memories, the starting address of a microprogram and the return address of the program in core memory must be specified.

Protection within control memory

When a number of user microprograms reside in the modifiable microprogram memory, there arises the possibility of innocent and unintentional encroachments into one another's microprogram space. A number of schemes prevalent in the area of current time-sharing computer systems can be used to prevent unauthorized memory encroachments. We briefly list them as follows:
1. the procedures can be written in "re-entrant" code, with the modifiable part of the microprogram located at a distinct location;
2. the microprogram memory space can be "paged," with access to each page checked for user identification and protection.


Simpler schemes, described below, can also be adequate.
1. Restrict the customer-user's access to a certain area of the microprogram store. In this way, the vital functions implemented in the owner-user's microprogram store will be protected.
2. Process programs in two modes: debug mode and normal mode. New programs are run in the debug mode first. In the debug mode, the microprograms sequencing the user's program will check for violations of his address space, etc. Once fully debugged, the programs can be run and executed in the normal mode.

CONTEMPORARY MICROPROGRAMMED
COMPUTERS
Currently, a large number of microprogrammed computers are available on a production basis. They are not intended, however, to allow the user flexible use of microprograms. In fact, with a few exceptions, their microprograms are unalterable. The VIC-I computer of RCA, for example, has an unalterable read-only memory for its microprograms. Its main design objective is high reliability for aerospace applications. Every macro-instruction (machine code) is performed by a set of basic micro-operations and is capable of being executed in a variety of ways through various combinations of microprograms. Microprograms are also effectively used to implement a provision for "graceful degradation" through error-sensing circuits and automatic rerouting.
The Micro 800 is a small-scale microprogrammed computer with a fast read-only memory of 220-nanosecond cycle time. In this computer, good uses of microprograms are observed. The macro-instruction set is fully dependent on the microprograms and is electrically alterable within the capabilities of the hardware. Main-memory word length (in multiples of 8, 9, or 10 bits) and I/O interrupt servicing are also controlled by microprograms. A special micro-memory board can be inserted to perform system diagnostics. The manufacturer provides a Micro 800 simulator written in Fortran IV for microprogram logic design and debugging on a variety of computer systems.
The IC-9000 of Standard Computer Corporation is a small-scale computer and so far the most versatile and powerful with respect to microprogramming capabilities. It is a relatively expensive microprocessor with many sophisticated features as well as a fast read-only microprogram memory (with fast writable memory available at increased cost). Macro-instructions are


microprogrammed, with the exception of optional "Language Boards" which perform preliminary decoding of target-language instructions, generating a transfer address in the microprogram and setting various conditions. The IC-9000, as a versatile microprogrammed data processor, has a number of advanced capabilities such as micro-subroutine nesting, limited instruction overlap, many high-speed registers, and efficient I/O interfaces.
In the following sections, two currently available microprogrammed computers, the Interdata Model 4 of Interdata Corporation and the IBM 360/50 of IBM Corporation, will be studied in some detail to illustrate some differences. Among other things, they differ in their microprogramming philosophy.
Interdata 4

The Interdata 4 is a small-scale, multi-purpose computer with a prewired, nondestructive read-only memory (ROM) of 400-nanosecond cycle time.2 The logic of the ROM is wired on a pluggable circuit card containing a 1024-word U-core ferrite transformer array with wires winding through it to determine the contents of the store. It can be altered by rewiring with some special equipment.
There are four types of microprogram instruction format.1,2 An instruction consists of 16 bits: 4 bits for the operation-code field and 12 bits for the various fields according to the format. Figure 3 shows an example of the instruction format. The format closely resembles that of a conventional machine-code instruction, which makes for easier microprogramming. A microprogram assembler as well as a simulator for testing microprograms is supplied, and wiring is done automatically by a machine through punched paper tape.
With this type of instruction format it is easy to program, but only one operation can be specified, and overlapping of operations is not possible. This causes a performance loss and is a considerable disadvantage relative to the IBM 360/50, which has the capability of specifying overlapping micro-operations.

Figure 3-Interdata 4: Add, subtract, exclusive OR, AND, inclusive OR, and load instruction


Figure 4-An IBM 360/50 micro-instruction

The significance of the Interdata 4 is that it is a user-microprogrammable computer with a limited facility for writing microprograms. It meets the requirements by providing a fast-read and very-slow-write control memory.
IBM 360/50

The IBM 360/50 is a medium-scale, multipurpose computer. It operates with a 2815-word capacitor read-only store (CROS). Each word consists of 90 bits and controls the gates and control lines of the system for one 500-nanosecond machine cycle.
Unlike that of the Interdata 4, the format of the microprogram instructions is somewhat complex (Figure 4). A 90-bit instruction word is divided into a number of fields of various lengths. Each field has a predetermined function. An instruction word is read out of ROS and set into the read-only storage data register (ROSDR) at every machine cycle. The address of the next micro-instruction is composed partly of the 90-bit micro-instruction in use and partly from the results of the current machine cycle. Specific bits from the ROSDR fields are combined with the CPU or I/O mode by the CROS-word decode logic to activate control lines. Therefore, a number of micro-operations are performed concurrently in one machine cycle, and the speed of machine performance is increased.
The Control Automation System (CAS) was developed by IBM as an aid to the programmer in microprogramming for the IBM 370. CAS accepts a listing of the micro-instructions prepared on the logic sketch sheet, produces the 90-bit pattern for each control word, and assigns the address for each ROS word. This reduces the microprogrammer's job to preparing the logic sketch sheet. It is not a simple task to prepare it, however, since there are fixed formats with which the microprogrammer must comply.

Before a microprogram is converted into the bit pattern for the ROS wiring, it is tested in a cycle-by-cycle simulation with various sets of initial conditions and traced step by step for the effects of instructions on the various components of the system. This eliminates system malfunctions caused by ill-formed microprograms.
With these facilities, the pitfall of complex timing and gating restrictions is avoided and effective use of the IBM 360 capabilities may be realized. Although it still seems somewhat cumbersome, the complexity of microprogramming is considerably reduced.
The two contemporary computers studied here are equipped with writable ROS. Although their microprogramming and microprogram-loading processes are still cumbersome, various equipment has been developed as an aid to easier microprogramming. Although, in principle, the concept of alterable microprogramming is observed in the philosophy of the hardware design, manufacturers are still reluctant to relinquish microprogramming to general users.
COMMENTS ON ORGANIZATION

In spite of its flexibility, the microprogrammed computer is basically slow. The idea of two-level microprogramming discussed earlier suggests ways for improvement. The frequently used management functions can be implemented by the lowest, gate-level microprograms. At this level, well-established speed-up techniques such as microprogram memory interlacing, overlapping of the various execution phases of a micro-instruction, and the exploitation of control parallelism will improve the efficiency of execution. These techniques, however, require detailed knowledge of processor timing and internal data flows. The user will use the next higher level, i.e., the mini-level, for his microprogram applications. At this level, instructions


are less machine-dependent and, therefore, much easier to use.
If the computer has more than one microprogrammable instruction-execution unit, the mode of their utilization will depend on the nature of the computations. The microprogrammed control units would need access to common storage registers, for example, where parallel computations are performed on common, shared data areas. Synchronization between control units must also be established. Examples of such computations are automatic numerical error analysis, significance checking, performance measurements, and so forth.
Microprogrammed control has been successfully exploited in newer designs of functional subsystems. The use of such special-purpose units for instruction reformatting, address generation, data-structure interpretation, and certain types of simple array processing may be suggested.

FINAL REMARKS
To conclude, further areas of actual and potential microprogram application are listed. These are not exhaustive, of course, but should lead future microprogram users to further study of microprogram applications.
1. Compilers that map higher-level language source programs to microprograms via intermediate languages.
2. Control of time-sharing systems (special macro-operations for data handling, queue manipulation, scheduling, and allocation algorithms).
3. Interrupt servicing, queuing, status interpretation.
4. Facilitation of incremental compilers for time sharing.
5. Control of parallel computer organization.
6. Increased reliability through diagnosis of parts not being used in the current instruction.
7. Increased ease and accuracy in fault detection through diagnostics on the microprogramming level.
8. Control of "degraded performance" capability for aerospace and medical applications.
9. Direct execution of process-oriented languages.
10. Emulation of machine languages.
11. Image identification.
12. Cryptanalysis.
13. Control of associative memories (control of multiple operand, mask, and result registers, load and dump circuitry).
14. Information retrieval, especially in connection with associative memories.


The possibilities of microprogram application are limitless. As microprogramming and hardware techniques advance, a wider variety of possibilities can be expected to develop. We hope that our study of user-microprogrammable computers will motivate the future microprogram user to further study the area of microprogramming and its applications. It is also our hope that this paper will incite those who are in charge of the design of fourth-generation computer systems to consider the full impact of microprogramming on users.
PART TWO-TRADE OFF ALGORITHM
INTRODUCTION
The optimal allocation of resources to maximize computing throughput is one of the most important problems in computer design. The throughput of a computing system is a function of many parameters. One important problem in designing a microprogrammable computer system is the determination of the optimal size of the high-speed and expensive alterable microprogram memory, as well as of the other types in a hierarchy of memories, given the total resources allocated for memory. An algorithm for designing a memory hierarchy should answer the following:
1. The optimum sizes of the microprogram memory and the other components in a memory hierarchy, in order to minimize the average access time for the user's activity profile under a given cost constraint; and the optimum amount of information transferred to the microprogram memory.
2. The optimum set of memory types, including microprogram memory, in a hierarchy, with regard to the number of memory types, cost, and access time.
3. The cost versus average access time tradeoffs for a memory hierarchy for a given activity profile: the change in the minimum average access time for an expenditure of some "x" dollars on the memory.
DEFINITIONS AND NOTATIONS
When a computer program is run, certain blocks of
information are accessed more frequently than others.
The activity profile of a given set of programs is the
relative frequencies with which blocks of addresses are
accessed when that set of programs is run.
The numbers of blocks accessed at frequencies $F_1, F_2, F_3, \ldots, F_m$ are denoted by $W_1, W_2, W_3, \ldots, W_m$, respectively. Activity is defined as directly proportional to the access frequency, normalized so that

$$\sum_{i=1}^{m} P_i W_i = 1.$$

An $m$-dimensional vector $P = (P_1, P_2, \ldots, P_m)$, where $P_i < P_{i+1}$, denotes the activities of a program. Associated with $P$, there is an $m$-dimensional vector $W$ such that its $i$th component $W_i$ represents the number of blocks of information accessed at activity $P_i$. The activity profile then is defined by an ordered pair of vectors $(P, W)$. In practice, it can be determined either analytically or by simulation and experimentation.7,11

Let us assume that $n$ types of memory devices are available. Furthermore, let $T_i$ denote the access time and $C_i$ the cost of one block of memory type $i$. With these definitions and notations, the problem is now stated.

STATEMENT OF PROBLEM

Given:
1) the maximum permissible cost $G$ for the entire storage system;
2) $n$ different types of memory available, where the cost per block and the average time to access one block of type $i$ are $C_i$ and $T_i$ respectively;
3) the activity profile of the information to be stored in the hierarchy, given by the $2 \times m$ matrix

$$\begin{pmatrix} P_1 & P_2 & \cdots & P_m \\ W_1 & W_2 & \cdots & W_m \end{pmatrix}, \quad P_i < P_{i+1}, \quad \sum_{i=1}^{m} P_i W_i = 1;$$

determine the sizes of the microprogram memory as well as the other memories, and the location of the information blocks in the storage, such that
1) the total cost does not exceed $G$, and
2) the average access time to any information block stored in the hierarchy is minimized.

Without losing generality, we may assume that $G_0$ is the cost of the mass storage or Type 0 memory, that it is one of the least costly, and that it is large enough to accommodate all the blocks in the program. We assume that one block of information occupies one unit of memory space and that a block is divisible between two memory types.

LINEAR PROGRAMMING FORMULATION

Let $V_{ik}$ be the number of information blocks of activity $P_i$ stored in memory type $k$ $(1 \le k \le n)$, and let

$$V_{i0} = W_i - \sum_{k=1}^{n} V_{ik} \qquad (1)$$

so that the problem becomes

$$G_0 + \sum_{k=1}^{n} \sum_{i=1}^{m} C_k V_{ik} \le G \qquad (2)$$

$$\sum_{k=0}^{n} V_{ik} = W_i \quad \text{for } i = 1, 2, \ldots, m$$

$$V_{ik} \ge 0 \quad \text{for } 1 \le i \le m, \ 0 \le k \le n;$$

minimize

$$T = \sum_{i=1}^{m} \sum_{k=0}^{n} P_i T_k V_{ik}. \qquad (3)$$

The size of memory type $k$, denoted by $U_k$, is then

$$U_k = \sum_{i=1}^{m} V_{ik}.$$

A set $V_{ik}$ satisfying these conditions for $1 \le i \le m$ and $1 \le k \le n$ is an optimal solution.

The following theorem allows the problem to be applied to a subset of all the available memory types.

THEOREM 1. Given three memory types, 1, 2, and 3, such that

$$C_3 > C_2 > C_1 \qquad (4)$$

and

$$T_3 < T_2 < T_1 \qquad (5)$$

and

$$\frac{T_1 - T_3}{C_3 - C_1} > \frac{T_1 - T_2}{C_2 - C_1} \qquad (6)$$

for any activity profile, then there are no blocks of information stored in memory type 2 in an optimal solution.

The proof is straightforward. An interested reader may refer to Reference 12 for the proof.
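To make the formulation concrete, the sketch below (our illustration, not part of the algorithm developed in this paper) solves a small instance of (1)-(3) with an off-the-shelf solver; the activity profile and memory parameters are invented, and SciPy is assumed to be available.

import numpy as np
from scipy.optimize import linprog

P = np.array([0.001, 0.004, 0.015])   # activities, P_i < P_(i+1)
W = np.array([400.0, 150.0, 30.0])    # blocks accessed at each activity
C = np.array([0.0, 1.0, 8.0])         # cost/block, type 0 (mass) .. type n
T = np.array([100.0, 10.0, 1.0])      # access time/block
G = 500.0                             # cost budget (G_0 folded in)

m, n1 = len(P), len(C)                # n1 = n + 1 types including type 0
c = np.repeat(P, n1) * np.tile(T, m)  # objective (3): sum P_i T_k V_ik

A_eq = np.zeros((m, m * n1))          # sum_k V_ik = W_i for each i
for i in range(m):
    A_eq[i, i * n1:(i + 1) * n1] = 1.0

A_ub = np.tile(C, m).reshape(1, -1)   # cost constraint (2)

res = linprog(c, A_ub=A_ub, b_ub=[G], A_eq=A_eq, b_eq=W,
              bounds=(0, None))       # V_ik >= 0; blocks are divisible
V = res.x.reshape(m, n1)
print("U_k (memory sizes):", V.sum(axis=0))
print("average access time:", res.fun)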
ALGORITHM
By an application of Theorem 1, a new hierarchy is derived from the original memory hierarchy. While data require the same amount of microprogram memory space as of non-microprogram memory space when transferred, note that a macro-instruction in non-microprogram memories is represented by several micro-operations when transferred into a microprogram memory. This means that the contents of non-microprogram memory require several times more microprogram memory space when transferred. This fact is reflected in the algorithm through the cost of memory.


Another consideration is that a macro-instruction is performed faster when it is represented by a microprogram and directly executed. Besides the high speed of the microprogram memory, this is largely due to the elimination of instruction-fetch cycles at execution time, so that an instruction is executed in fewer machine cycles.
These two facts concerning a transfer of information from a non-microprogram memory to a microprogram memory must be reflected in the algorithm. Adjustments are made in deriving a new memory hierarchy, in the transfer values, and in the remainder of the algorithm.
(1) Determine memory type j such that for

Figure 2-Memory allocation and data flow

than the length of the machine word, and this is the case for the majority of sorting jobs.
The analysis of the benchmark problems on sorting reveals that up to 40 percent of the total sorting time (see a later section of this paper) is for CPU operations.
The functional description

The Sort Processor (SP) is an internally programmed, firmware special-purpose processor dedicated to performing the sort routine outside of the computer's CPU. The SP shares the common MM with the CPU on a lower-priority basis and has the simplest possible interface with the CPU (Figure 1).
The START signal informs the SP that a Control Word (CW) is available on the MM bus. The CW consists of function and address fields. The function code indicates the type of operation to be performed by the SP, e.g., sort (ascending or descending), transfer status, resume, terminate. The address field specifies the starting address (a0) of the initial parameters and boundary conditions table required for the sorting. This table, set by the sort-merge control program, contains the following parameters (Figure 2):
β0 and βn are the initial addresses of the first and the last records in the work area.
φ0 and φm are the initial and final addresses of the string list buffer.
(Instead of βn and φm, the number of records in the work area (n) and the size of the string list (m) can be given.)


Figure 3-System configurations with sort processor

γ0 = address of the key word of the first record
l = length of the key
r = length of the record (considering fixed-length records)

The operating sequence of the SP, after receiving the CW, typically proceeds in the following manner:
1. Reads the initial parameters table at the address a0 and stores it in its Register File (RF).
2. Generates the effective addresses of the consecutive keys by computing γi = γ0 + ir, starting with i = 0 and continuing with i = 1, 2, 3, ..., n.
3. Reads the key (Ki). Meanwhile generates the initial addresses of each record (βi = β0 + ir), and stores the codes Kiβi in the consecutive locations of its Search Memory (SM). The capacity of the SM can be smaller than the number of records in the work area, so that on the initial load the SM will be filled at i < n.
4. Locates the first desired (the highest or lowest) key (K0) in the SM and stores the corresponding address (β0) in the φ0 location.
5. Replaces the vacancy in the SM with the next (Ki+1)(βi+1) code available from the work area.
6. Searches for the next desired key, using the key Kj-1 located in the previous (j - 1) cycle as the base key for the comparison, stores the address βj corresponding to the newly located Kj into location φj, and returns to step 5.

The iteration of steps 5 and 6 continues until the end of the string. This may occur when one of the following conditions arises:
a. βi = βn (or i = n), indicating that the work area is exhausted.
b. φj = φm (or j = m), indicating that the string list is exhausted.
c. No more successful searches in the SM are possible: the Kj found was the last desired key, i.e., the SM does not contain any more keys that are greater than (or less than) the current base key.
Now the SP interrupts the CPU and stays idle until a new control word is received.
The SP continues functioning if it is preassigned to control the peripheral file (Figure 3, Systems C and D), and may:
a. transfer the sorted string (via the string list) from the MM into the peripheral file, and load new raw data into the MM;
b. reorganize the memory map and move the buffer areas;
c. exchange status information with the Systems Supervisor, and resume sorting operations.
The degree of complexity of these control functions depends upon the computer system's architecture and the preestablished functional duties of the SP. Although the algorithm described appears to be optimum for the proposed systems organization, other sorting methods can be implemented.
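The operating sequence above amounts to a replacement-selection pass driven by the search memory, and can be modeled behaviorally as follows (our sketch of an ascending sort only; the memory is a simple dictionary, and keys smaller than the current base key stay behind for the next string, ending the current one by condition c).

def sp_sort(memory, beta0, n, r, gamma_off, capacity):
    sm = []                              # SM cells: (key, record address)
    feed = 0
    while feed < min(n, capacity):       # step 3: initial SM load
        addr = beta0 + feed * r
        sm.append((memory[addr + gamma_off], addr))
        feed += 1
    string_list, base = [], None
    while True:
        # steps 4 and 6: locate the next desired (lowest >= base) key
        candidates = [c for c in sm if base is None or c[0] >= base]
        if not candidates:
            break                        # end of string, condition c
        best = min(candidates)
        sm.remove(best)
        string_list.append(best[1])      # store the record address
        base = best[0]
        if feed < n:                     # step 5: refill the vacancy
            addr = beta0 + feed * r
            sm.append((memory[addr + gamma_off], addr))
            feed += 1
    return string_list

mem = {100 + 4 * i: k for i, k in enumerate([5, 2, 9, 1])}
print(sp_sort(mem, beta0=100, n=4, r=4, gamma_off=0, capacity=2))
# -> [104, 100, 108]; key 1 arrived after base key 5, so it starts a new string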
The hardware structure

The SP consists of three major functional blocks, all designed with MOS LSI components: the search memory, the register file, and the microprogram storage. The block diagram of the SP is illustrated in Figure 4.
Search Memory (SM)

The SM is divided into two sectors. The KEY sector, designated for storing and searching key words, includes logic for comparing keys. The ADDRESS sector stores the initial addresses of the records. The number of bits per word in this sector equals log2 M, M being the size of the computer's MM in words directly accessible to the SP. Both sectors are independently expandable in their bit directions, and the whole SM is expandable in the word direction in modules. These capabilities allow the SP to meet various sorting applications and to be integrated in computer systems that have different MM sizes.


Two types of MOS LSI memories for the SM are considered:
a. Associative Memory (AM). A modular AM with LSI components can be organized using monolithic or hybrid technology. The logic to perform the "next greater" and "next smaller" functions is integrated into the AM chips. This could allow the SM to locate the next desirable key in one interrogation cycle, which for current MOS technology is in the range of a few microseconds. However, the integration of these functions, because of their complexity, would result in a low yield of the AM chips. Also, because of the specific nature of these functions, such an AM chip might have a limited marketplace. For these reasons, the integration of only the "equality" function appeared to be a more reasonable approach. To locate the next desired key in an "equality" search, the current key is modified (incremented or decremented) and compared continuously until an "equality" response is detected. The average number of these comparisons is equal to one-half of the key length in binary bits. Such a sacrifice in searching speed seems to be justifiable for the economic reasons mentioned.
b. Recirculating Memory (RM). The RM is organized with recirculating MOS dynamic shift registers. The initial key is compared with the contents of the RM for "equality." After the response is detected, the initial key is modified and compared continuously until the next desired key is detected. The search is performed by comparing sequentially each word in the RM with the base key, using a single comparator for the whole RM, as opposed to the AM, which contains


TABLE I-Comparison table for search memories

Type of SM    MOS Components Per Cell    I/O Pins Per Cell Ratio    Relative Search Time
AM            10                         1:4                        1
RM-1          6                          1:13                       5
RM-2          6                          1:18                       20

a comparator for each word. The search time for the RM depends not only on the length of the key word, but also on the length of the shift registers composing the RM.
Based on the state of the art of MOS technology, a comparison is made between the AM utilizing the "equality" function only and two types of RM: the RM-1 with dual 64-bit, and the RM-2 with 256-bit dynamic shift registers. The frequency range (4 MC) and the other electrical characteristics of the chips are identical for both RM-1 and RM-2. The comparison is made for a 256-word SM, with keys eight bytes (characters) long. The RM-1 has four external comparators functioning in parallel, one for each 64-word module, while the RM-2 has only one external comparator for the whole 256-word module.
The results of this comparison are summarized in Table I. It is evident from this comparison that RM's offer slower performance at lower cost. The worst-case search time of the RM's can be estimated by the following formula:

T_SR = (W × b / f) × 10^-6 sec

where: W = number of words in the RM module (or the length of the shift register) served by a single comparator
b = length of the key in binary bits
f = frequency of the shift register in megacycles
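Plugging the comparison's parameters into this formula gives a quick check (using the 256-word SM with eight-byte keys and 4 MC registers described above; W here is the words served by each comparator):

def worst_case_search_us(words_per_comparator, key_bits, f_mc):
    # T_SR = (W * b / f) microseconds, per the formula above
    return words_per_comparator * key_bits / f_mc

print(worst_case_search_us(64, 64, 4))    # RM-1: four 64-word modules -> 1024
print(worst_case_search_us(256, 64, 4))   # RM-2: one comparator -> 4096

The 4:1 ratio between the two results matches the relative search times of RM-1 and RM-2 in Table I.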

Figure 4-Sort processor block diagram

The search time can be decreased by using shorter shift registers, if the cost of the additional hardware is justified. More dramatic improvements are achievable through an increase in the frequency f. Current MOS technology already reaches the 20 MC range for shift registers, and further improvements are expected in the characteristics of MOS associative memories and shift registers.
SM's of various searching speeds can be organized for a given cost-performance criterion. However, for the SP as a low-priority background processor in a


computer system, the critical issue seems to be the economic factor, not the inherent speed. Accordingly, the use of dynamic shift registers for the SM presently appears to be more reasonable.
Register File (RF)

This is a scratch-pad memory used to store the initial parameters and boundary conditions as described earlier. Several temporary storage registers for indexing and counting are also included in the RF.
To make the RF more uniform and functionally flexible, all registers have the same log2 M length. Sixteen registers in the RF appear to be sufficient.


Microprogram Storage (µS)

This random-access memory stores the microprogram of the SP. Either read-only (ROM) or read-write (RWM) memories can be used for the µS. Comparing ROM vs. RWM, the following factors must be considered:
a. Because of the nondestructive read-out of semiconductor memories, the ROM does not offer significant speed or economic advantages.
b. A higher yield can be achieved in a ROM of a given substrate size. However, this may not result in a decisive advantage, because the production of ROM requires a certain degree of customization, while the RWM is an established off-the-shelf product.
c. The RWM allows greater flexibility and simplicity in microprogram alteration, debugging, and maintenance.
Thus, the LSI RWM appears to be more desirable for the µS. The SP can then perform not only various sort algorithms but also complementary functions such as table look-up, file maintenance, and list processing, by simply reloading the µS with the desired microprogram.
Three basic microroutines reside in the µS:
a. the search microroutine, which controls the SM and generates the MM addresses of the sorted records;
b. the MM interface control routine, which performs all the communications between the MM and the SP;
c. the peripheral-file interface control routine, for controlling the I/O operations if the SP needs to communicate directly with the peripheral file (see Figure 3, System Configurations C and D).

Each of the microroutines is stored in an individual LSI memory unit and monitored by a common synchronizer. Such a partitioning of the control medium allows:
• Simultaneous execution of the search and interface-control microroutines.
• More efficient use of LSI technology.
• Easy integration of the SP with almost any computer system by simply altering the microroutines.
• Easier maintenance and diagnostics.
From the economic standpoint, this approach does not cause any cost increase. Unlike magnetic memories, the size of an LSI semiconductor memory does not affect the bit price. Roughly 4096 bits of RWM, organized 128 × 32, are required for the µS.

SYSTEM CHARACTERISTICS AND
CONFIGURATIONS
The SP as a "black box" can be integrated practically
with any computer system and relieves the computer's
CPU of the burden of sorting. It is applicable for computing systems operating in different processing modes.
In the conventional batch processing systems, the SP
functions as a stand-alone, low priority processor. In
real-time or time-sharing systems, the SP functions as
a background in-house processor. Substituting for the
sort routine only, the SP does not cause any structural
changes in the computer system architecture.
The system characteristics of the SP are summarized
as follows:
a. The SP is easily connected to the MM channel of
the computer and does not require any specific
and/ or additional hardware provisions from the
computer (it behaves as any peripheral controller).
b. In a multiprocessing environment, the SP shares
the common MM with the other processing units
on a preestablished priority basis.
c. The interface between the SP and the M1\1 is
asynchronous and operates on the requestacknowledgement basis.
d. The SP requires the simplest software support.
Statements like SORT ASCENDING, SORT
DESCENDING, RESU1\1E, TERMINATE,
TRANSFER STATUS on the systems language
level must be compiled into a single control word
format which sets the SP to the appropriate
operational state. Further, the SP performs the
specified function autonomously.
e. The sort-merge control program performs overall
supervision and interaction of the SP with the


system. Data preparation and MM allocation required for sorting can also be performed.
The SP can be integrated with the computer system in several configurations. Four typical system configurations are illustrated in Figure 3 and are described as follows:

SYSTEM A is the simplest configuration, where the SP shares the MM with the CPU on a lower-priority basis. The sorting time for this configuration is relatively long.
SYSTEM B allows the SP more freedom in accessing the appropriate MM bank. Although the SP remains a low-priority processor, this configuration results in higher sorting speed.
In both system configurations A and B, data transfer between the MM and the peripheral file for sorting is accomplished through the conventional I/O channel and is controlled by the appropriate software routine.

SYSTEM C has the same MM sharing scheme as System A; in addition, the SP shares the peripheral file with the CPU. The control of the data exchange between the MM and the peripheral file, required for the sort-merge operations, is performed by the appropriate microroutine of the SP.
SYSTEM D combines the MM sharing scheme of System B and the peripheral-file sharing scheme of System C. System D provides fully parallel processing capabilities and offers the highest sorting efficiency.
In all of these configurations, the logic structure and the basic functional blocks of the SP remain practically unchanged. The specific interface characteristics of Systems B, C, and D are easily programmed into the microprogram storage. The choice of a configuration depends upon the applications spectrum of the given computer system. Configurations C and D seem more applicable to business computer systems, where large amounts of data are to be processed and the I/O portion of the sort-merge operations is of significant magnitude. Configurations A and B can be used in scientific-engineering applications, where a relatively small number of files are to be sorted. The trade-off between the desired degree of sorting efficiency and the cost of the features for sharing the MM and/or the peripheral file should be decided at the user's level.
EFFICIENCY AND PERFORMANCE ANALYSIS
The efficiency of the SP depends upon the applications orientation, the size, and the basic functional characteristics of the computer system.
Figure 5-Statistical curves of sorting parameters

It is very difficult to give generalized analytical expressions that correlate computer-system parameters and sorting, because of the diversity and inconsistency of the numerous variables involved.
The diagrams of Figure 5 illustrate the correlation between the main sorting parameters: the sorting time T (in minutes), the main memory capacity C (in kilocharacters), and the transfer rate R of the peripheral file (in kilocharacters per second).
These diagrams are derived by analyzing and combining the statistical data for two typical models of computer systems performing a sort.3,6,7,8
The following conclusions can be derived:
a. The CPU time (S) spent for sorting generally does not depend upon the size of the MM.
b. An increase in the transfer rate R causes the total sorting time T to decline hyperbolically; S remains unchanged.
c. For R = 60 KC, the ratio between the I/O time (T - S) and S is 75 percent to 25 percent. This ratio is 60 percent to 40 percent for R ≥ 120 KC. This is the prevailing range of transfer rates for the present magnetic files used in small-to-medium and larger computer systems.
Considering that at least 25 percent of the computer's workload in a business-oriented system involves sorting, and that 40 percent of that workload is the burden of the CPU, it is evident that the Sort Processor can release up to 10 percent of the CPU's overall working time.
The modular logic structure of the SP is highly adaptable to further advances in LSI technology. Larger, faster, and cheaper LSI chips (MOS or bipolar) can easily be utilized in the SP, improving its cost-performance index and increasing its overall efficiency in computer systems.
The estimates show that the SP, designed with


today's off-the-shelf MOS LSI components, can save
considerable amounts of user money.
SUMMARY
The semiconductor technology presently offers LSI
components (specifically MOS memory chips) that
have a very attractive price-performance index (less
than ten cents per bit and around 100 nanoseconds
access time). During the coming years this index will
be subject to continuous and dramatic improvement
thus setting up broader technical and economical
grounds for hardware-software trade-offs. The purpose
of these trade-offs is the simplification of the software
sector of computer systems and the increase of the
overall systems productivity for the user.
The Sort Processor designed with LSI components relieves the CPU of the burden of performing the tedious and time-consuming sorting operations. It behaves like a low-priority peripheral processor and does not cause any structural changes in the architecture of the computer system. For small-to-medium and larger computer systems the Sort Processor can release up to 10 percent of the CPU's total workload. The techniques of the search memory and dynamic microprogramming allow use of the Sort Processor for algorithmic functions other than sorting.
ACKNOWLEDGEMENT

The author expresses his appreciation to the NCR-ED Research Department for encouraging the work on this project, and his gratitude to his colleagues: to Mr. A. G. Hanlon for stimulating discussions and advice, to Mr. D. W. Rork for his constructive engineering work in designing the breadboard of the Sort Processor, and to Mr. F. Sherwood for early discussions on sorting algorithms.
Special thanks are due to Mrs. Ann Peralta, who performed the tedious job of typing and retyping this paper.
REFERENCES
1 The Diebold Research Program
Technology Series September 1968
2 I FLORES Computer sorting
Prentice-Hall Inc 1969
3 Computer characteristics digest
Auerbach April 1969
4 D A BELL The principles of sorting
Computer Journal Vol 1 No 2 June 1958
5 R R SEEBER Associative self-sorting memory
Proceedings of EJCC Vol 18 pp 179-187 1960
6 IBM System/360 disc and tape operating system. Sort/merge program specification
File No S360-33 Form C24/3444
7 The National 315 electronic data processing system sorting tables, magnetic tapes
The National Cash Register Company Dayton Ohio
8 Magnetic tape sort generator
Reference manual The National Cash Register Company Dayton Ohio

System/360 model 85 microdiagnostics

by NEIL BARTOW and ROBERT McGUIRE

International Business Machines Corporation
Kingston, New York

INTRODUCTION

System/360 Model 85 is a large central processing unit (CPU) with a machine cycle of 80 nanoseconds and a main storage access of 1.2 microseconds. It is capable of executing 12,500,000 register-to-register add instructions per second. Its major parts are the Instruction Preparation Unit (I Unit), Instruction Execution Unit (E Unit), and Storage Control Unit (SCU). In addition to these three main parts, there is also another portion of the hardware dedicated to maintenance controls. The IBM Model 85 computer has high-speed buffer storage and hardware capable of initiating and executing Instruction Retry. There are two major control storage elements: Read Only Storage (ROS) and Writeable Control Storage (WCS). ROS consists of 2,048 (decimal) control words, while WCS consists of 512 control words for a standard Model 85, or 1,024 control words for a Model 85 with an emulator feature. Each control word is 128 bits long and consists of 33 control fields, as illustrated in Figure 1-Model 85 Control Word. Approximately 450 micro-orders have been implemented in the Model 85 for use in microprogramming.

[Figure 1-Model 85 control word: 128-bit word with fields controlling the shifter, adders, the A, B, C, D registers, storage address registers, next address, storage request, local storage, miscellaneous controls, and emulators]

DEFINITIONS

Microprogram-a computer based program whose microinstruction set is geared to one or more logical hardware functions which are executable in one machine cycle. Two or more microinstructions are normally required for the execution of one instruction of the standard instruction set.

Microdiagnostic-a microprogram designed specifically to test a predefined portion of hardware.

Microdiagnostic Example

Figure 2-Microdiagnostic Test (STAT 'A')-is an illustration of a microdiagnostic test. The test uses the Control Automated System (CAS) output for its descriptive representation. Each block of the test, of which there are four, represents one machine cycle. The purpose of the test is to confirm that the hardware required to set a latch called STAT 'A' is working properly and, if it is not, to stop at control storage address A02.
Cycle 1 is defined by the control word located at control storage address A12 (hex). This control word will reset a latch called STAT 'A' and fetch the control

[Figure 2-Microdiagnostic test (STAT 'A'): four-cycle flow through control storage addresses A12, A0C, B21, and A02]

word located at control storage address A0C to the control registers. Cycle 2 is defined by this control word at control storage address A0C. This control word will set the latch called STAT 'A' and cause the control word at control storage address B21 to be fetched to the control register. Cycle 3 is defined by the control word at address B21, which will set the 12th bit of the next address field to 1, making the next address A03, if the latch called STAT 'A' is in the set state; the 12th bit will remain 0 if the latch is in the reset state. Assuming STAT 'A' is reset, the control word at control storage address A02 will be fetched to the control register. Cycle 4 is defined by the control word at A02 and will set the 8th bit of the next address field to 1 if the latch called STAT 'C' is in the set state, making the next address A12; the 8th bit will remain 0 if STAT 'C' is in the reset state, causing the next address to be A02.
STAT 'C' can be set or reset by a toggle switch labeled 'Loop Test' on the maintenance console. Since diagnostics are run with the loop test switch in the off position, this test will stop at address A02 if STAT 'A' fails to set, and the test can be looped simply by setting the loop test switch.
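The next-address arithmetic is easy to check: with the 12th (low-order) bit set, A02 becomes A03, and with the 8th bit set, A02 becomes A12. The following C fragment is a minimal sketch of that four-cycle branch logic; the control storage addresses come from the text, while the latch variables and the fault switch are hypothetical scaffolding, not the Model 85 hardware interface.

#include <stdio.h>

/* Sketch of the STAT 'A' microdiagnostic described above. */
static int stat_a;                    /* latch under test */
static int stat_c;                    /* mirrors the 'Loop Test' switch */
static int stat_a_hardware_ok = 0;    /* 0 simulates a failing latch */

static unsigned next_address(unsigned addr)
{
    switch (addr) {
    case 0xA12: stat_a = 0;                     /* cycle 1: reset STAT 'A' */
                return 0xA0C;
    case 0xA0C: stat_a = stat_a_hardware_ok;    /* cycle 2: set STAT 'A' */
                return 0xB21;
    case 0xB21: return stat_a ? 0xA03 : 0xA02;  /* cycle 3: A02 | 0x001 = A03 */
    case 0xA02: return stat_c ? 0xA12 : 0xA02;  /* cycle 4: A02 | 0x010 = A12 */
    default:    return addr;                    /* A03: test passed */
    }
}

int main(void)
{
    unsigned addr = 0xA12;
    for (int cycle = 0; cycle < 8 && addr != 0xA03; cycle++)
        addr = next_address(addr);
    if (addr == 0xA03)
        printf("STAT 'A' sets correctly: test passed\n");
    else
        printf("stopped at %X: STAT 'A' failed to set\n", addr);
    return 0;
}

With the loop test switch off and a failing latch, the sketch loops at A02, exactly the stopping behavior the test is designed to give the maintenance engineer.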

MICROPROGRAMMING USAGE

Base machine functions

The System/360 Model 85's basic machine functions are defined by control words contained in Read Only Storage. They consist of control words for machine instruction execution and for sequencing of manual control functions initiated from the maintenance console. Other functions include control words for the retry of failing instructions, interrupt sequencing, and provisions for handling invalid instruction operation (op) codes.

Load WCS routine

WCS loading for microdiagnostics is handled from a
routine in ROS and is designed to load 512 control
words from main storage into WCS. This routine can
be executed from one of two entry points-either by
using the address contained in the double word starting
with main storage address 8 or by establishing a value
in one of the internal working registers and bypassing
that part of the routine which fetches main storage
address 8.
LMP instruction

WCS loading for purposes other than microdiagnostics is normally handled by the Load Microprogram
Instruction. This instruction is a privileged member of
the System/360 instruction set. It is capable of loading
one to four control words into WCS from main storage.
The control words are indexed by the operand field of
the instruction.
Emulators

The IBM 7090/7094 Emulator is a prime application of System/360 Model 85 microprogramming. When emulators are installed on the machine, it is necessary to modify the instruction preparation unit in order to handle the additional operation codes required for emulator instructions. WCS must be expanded to two times its basic size, that is, 1,024 decimal control words, and must be loaded with the control words required for the execution of each emulator instruction.
Multiply Algorithm

The low speed multiply algorithm contained in WCS is an alternate way of executing the multiply instruction when the high speed multiply feature is installed on the machine. The high speed multiply feature requires its own dedicated circuitry in the E unit. The low speed multiply algorithm is used when there is a failure or malfunction in the high speed multiply hardware; it is activated by setting a system mode latch, normally done via the Diagnose instruction.
Hybrid diagnostics

Hybrid diagnostics are another form of microprogram usage with the System/360 Model 85, as shown in the assembly-language listing that accompanied this section.

[Assembly-language listing (IOSET routine) not reproducible]

MICRODIAGNOSTICS

Test setup

Microdiagnostics are microprograms specifically designed to test a given hardware function. Their main purpose is to detect basic machine malfunctions and to isolate the failing components. There are three parts to microdiagnostics:

Load WCS
Execute WCS Code


Use of Read Only Memory in ILLIAC IV

The instruction in FOR is examined to determine what parts of the Processing Elements will be
used when the instruction is transferred to the FINST
Instruction Register (FIR). This information is stored
in the 1st level of a Busy Register. The Busy Register
also contains the same PE usage information, in level 2, for the next instruction to be executed, which is in FIR, and contains like information, in level 3, for the instruction presently being executed, which is in FIAR (the ROM address register associated with FIR).
The instruction in FOR is also examined to see if it is a
candidate for instruction overlap. If it is, it is decoded
into a microsequence which addresses the overlap section of the ROM (250 addresses) using the ROM address register FOAR. If the required sections of the PE
are busy, the microsequence is inhibited until the PE
sections are free.
FIAR addresses a 470 word section of the ROM
which is sufficient to execute all instructions. When the
instruction in FIAR is fully executed, the instruction in
FIR is transferred to FIAR and the instruction pointers
to the queue are incremented one position. This shifts
the instruction from FOR to FIR and puts a new instruction in FOR. At the same time, the Busy Register is updated to determine if instruction overlap is possible.
To achieve the required high speed operation, the PE busy bits in level one are set while the instruction is in FOR. As the instruction moves from FOR to FIR to FIAR, the busy bits are transferred from level 1 to level 2 to level 3. Busy bits are reset via two paths. The normal path is via CU control enables out of the ROM. This path takes the longest (3 clock times) when the instruction in FIR is transferred to FIAR. Because this may delay the start of an instruction overlap, early resets are generated in FIR and enabled when the instruction transfers to FIAR.
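The level-shifting behavior is simple enough to state in code. The sketch below models the three-level Busy Register in C; the PE section names and bit assignments are illustrative, since the text does not give them.

#include <stdint.h>

/* Three-level Busy Register: one bit per PE section. */
enum { SEC_ADDER = 1 << 0, SEC_SHIFTER = 1 << 1, SEC_REGS = 1 << 2 };

typedef struct {
    uint8_t level[3];   /* level[0]: FOR, level[1]: FIR, level[2]: FIAR */
} busy_reg;

/* When the queue advances (FIR -> FIAR, FOR -> FIR), the busy bits
   move with their instructions and the newly decoded usage of the
   instruction entering FOR is loaded into level 1. */
static void advance_queue(busy_reg *b, uint8_t new_for_usage)
{
    b->level[2] = b->level[1];
    b->level[1] = b->level[0];
    b->level[0] = new_for_usage;
}

/* An overlap microsequence is inhibited until the PE sections it
   needs are free at every level. */
static int overlap_may_start(const busy_reg *b, uint8_t needed)
{
    return ((b->level[0] | b->level[1] | b->level[2]) & needed) == 0;
}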

[Figure 4-FINST instruction queue and three-level Busy Register feeding the ROM (250-address overlap section, 470-address instruction section) and the FCR, which broadcasts to the 64 PE's]

INSTRUCTION DECODING

Instruction decoding is the same whether it occurs in the overlap or instruction stations. The format of the twelve bit instruction word is shown in Figure 5(a). Bit 0 indicates whether the operation is in 32-bit or 64-bit mode. Bits 9, 10 and 11 are operations on the data word associated with the instruction, such as address indexing when the data word is an address. Bits 1 thru 8 contain the OP CODE. Each instruction is decoded into a microsequence (microprogram) used to address the ROM. Each microsequence consists of from one to 69 microsteps (microinstructions). Generally each microstep is an ROM word. In some operations, such as divide, the same word is addressed many times in succession. However, each time the word is addressed it is considered a microstep. Since a word is addressed

[Figure 5-(a) Instruction word format: mode bit, OP CODE (bits 1-8), data operations (bits 9-11); (b) portion of a typical microsequence built from "D" type flip flops addressing words N through N + 6]

every clock cycle, microsteps are synonymous with clock times. When many words are addressed simultaneously to achieve control enable ORing, this is also considered a single microstep. Each microstep generates a full set of control enables which are stored in the FINST Control Register (FCR). From FCR they are broadcast to the 64 PE's in the quadrant (see Figure 4). Generally from one to 50 enables are active for each microstep.
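As an aside, the stated field layout translates directly into code. The following C sketch decodes the 12-bit instruction word; treating bit 0 as the most significant bit follows the left-to-right numbering of Figure 5(a) and is an assumption of this sketch.

#include <stdint.h>

/* Decoding the 12-bit FINST instruction word: bit 0 = 32/64-bit
   mode, bits 1-8 = OP CODE, bits 9-11 = data word operations. */
typedef struct {
    unsigned mode64;    /* 1 = 64-bit mode */
    unsigned opcode;    /* bits 1-8        */
    unsigned data_ops;  /* bits 9-11       */
} finst_word;

static finst_word decode(uint16_t w)    /* low 12 bits hold the word */
{
    finst_word f;
    f.mode64   = (w >> 11) & 0x1;
    f.opcode   = (w >> 3)  & 0xFF;
    f.data_ops =  w        & 0x7;
    return f;
}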
The decoded instruction contains the starting address of the microsequence and all information for decision making, such as branching, within the microsequence. Typical branches are one of the six ways to do signed


and unsigned arithmetic operations. Figure 5b illustrates a portion of a typical microsequence. The seven flip flops shown are "D" type flip flops, i.e., the "1" output is in the same state after the clock pulse as the "D" input was during the clock pulse. Each "1" output is designated by the ROM word it addresses. Related clock times are shown in parentheses, e.g., (T2). The flip flops are part of the address register (FOAR or FIAR). Under control of the decoded instruction, the microsequence proceeds from flip flop to flip flop each clock time. Referring to Figure 5b, at time T1, word N is addressed. At time T2, either word N + 1 or N + 2 is addressed, depending on the state of control bit 1. At time T3, word N + 3 is addressed. At time T4, both words N + 4 and N + 6 are addressed to obtain enable ORing. Finally, word N + 5 is addressed at time T5.
The following microsteps are the microsequence for a floating point add and are accomplished in 250 ns (five clock times):
(1) Fetch Operand. Transfer to the B register (RGB), in the PE, the operand identified by the address field of the instruction.
(2) Difference Exponent. Subtract the exponent fields of the operands in the PE A register (RGA) and RGB.
(3) Mantissa Alignment. Shift the mantissa of the operand in RGA or RGB by the amount determined in step 2.
(4) Add Mantissa. Add the mantissa fields of the operands in RGA and RGB.
(5) Normalize. Normalize the sum in RGA.
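Rendered as straight-line code, the sequence looks like the sketch below. The exponent and mantissa widths are invented for illustration (ILLIAC IV's actual word format is not given in this excerpt), positive operands are assumed, and carry-out renormalization is omitted.

/* One step of code per microstep of the floating add above. */
typedef struct { int exp; unsigned long mant; } fpreg;   /* RGA/RGB */

static fpreg fadd(fpreg rga, fpreg operand)
{
    fpreg rgb = operand;                 /* (1) fetch operand into RGB */
    int diff = rga.exp - rgb.exp;        /* (2) difference exponents   */
    if (diff > 0) {                      /* (3) align smaller mantissa */
        rgb.mant >>= diff;  rgb.exp += diff;
    } else {
        rga.mant >>= -diff; rga.exp  = rgb.exp;
    }
    rga.mant += rgb.mant;                /* (4) add mantissas          */
    while (rga.mant != 0 && (rga.mant >> 40) == 0) {   /* (5) normalize */
        rga.mant <<= 1;                  /* 41-bit mantissa assumed    */
        rga.exp  -= 1;
    }
    return rga;                          /* sum left in RGA            */
}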

BASIC ROM REQUIREMENTS

A Read Only Memory (ROM) may be thought of as a numerical conversion table, i.e., the selection of one of the input lines will present a set of predetermined numbers at the output of the memory. It is a simple matter to design a conversion table to read out one set of numbers corresponding to the selection of either one or more than one input number. In block diagram form, such an ROM is shown in Figure 6. However, in this paper, only

[Figure 6-ROM block diagram: M input lines into the input register (MIR), M word lines through the ROM, and the output register (MOR) delivering the output bits, with clock timing]


[Figure 7-gate-per-bit ROM design: 720 word lines (words 1-720) feeding 280 output bit gates (bits 1-280)]

the functional block which performs the numerical conversion is called the Read Only Memory. The other functional blocks are considered to be peripheral logical functions. The logic design of the ILLIAC IV calls for an ROM having a cycle time of 50 nanoseconds, i.e., the ability to read out a set of numbers every 50 nanoseconds. Referring to Figure 6, the timing from the input register (MIR) to the output register (MOR) is 50 nsec. The memory must accept 720 input lines and present an output of 280 bits. To be useful, an ROM must be alterable either electrically or mechanically. Note that an electrically alterable memory is essentially a read/write memory and is, as a rule, more difficult to construct than a memory that is altered by mechanical means.
Because of its large size and fast cycle time, only an ROM in matrix form was found satisfactory for use in ILLIAC IV. It may be noted that an ROM can be constructed with sufficient speed by employing ECL gates with typically 2.5 nanosecond propagation delays. Such a design is shown schematically in Figure 7. In principle, the ROM is simply P gates, where each gate is one output bit of the memory, and M line drivers, corresponding to the M input words. The gates are selectively connected to the input word lines, such that the activation of any one or more word lines provides a predetermined output bit pattern through the gates. The design of Figure 7 was considered early in the ILLIAC IV development. But careful examination of the design reveals that there must be 720 input lines and a minimum of 280 output gates. Each output gate must have 720 inputs. For gates with only 9 inputs (the maximum available), each of the output gates must be built from 80 gates in parallel to accept the 720 possible input lines. The 80 gates produce only one output bit, and must be buffered with multiple stages of OR-gates to provide that one bit output. Similarly, multiple buffering or amplifying stages must be used to drive the large number of output gates. Because of the large number of gates involved, inter-connection wiring is complex. An almost insurmountable difficulty in such a design is to change the content of the memory, mechanically or otherwise. By using a matrix design, the number of printed circuit boards is reduced and the memory is readily altered.

ROM DESIGN DETAILS

The ROM schematic is shown in Figure 8. The form
is a standard, transistor cross point matrix consisting
of m (720) word lines by n (280) bit output lines. Only
those bit lines that are transistor coupled to a word line
will switch when that word line switches. Bit line switching is then detected by sense amplifiers on the bit lines.
Each word line is powered by a line driver which must
tolerate the variations in word line loading caused by
the variation in the number of bit lines coupled to the
word line. For convenience, the word line drivers and bit line sense amplifiers are standard devices used elsewhere in ILLIAC IV. The line drivers are level converters that convert standard ECL levels of ±0.4 volts to CTL compatible levels of +3.0 and 0.0 volts. The sense amplifiers are standard ECL, 9 input, negative NAND gates.
For coupling word lines to bit lines, transistors offer
several performance advantages compared to diodes or
resistors. Multiple leakage paths thru resistively
coupled cross points would be prohibitive in a memory

[Figure 8-ROM matrix schematic: transistor cross points coupling word lines to bit lines 1 through 280, with 280 bit line sensors]

Use of Read Only l\Iemory in ILLIAC IV

of this size and speed. Compared to diodes, transistors
isolate bit line capacity and dc loading from the word
lines which alleviates word line driving problems.


Bit lines
The required nominal input levels at the sense amplifiers are ±0.4 volts. In order to conserve power, the coupling transistors are biased to be cut off for a low level word line. Because a bit line may be driven by a coupling transistor at any location along the bit line, each bit line is terminated at both ends in its 50 ohm characteristic impedance. The bit line terminating resistors R02 and R03 have a Thevenin equivalent of 50 ohms returned to -0.8 volts and quiescently bias the bit line at -0.4 volts. At a high bit line level of +0.4 volts, the coupling transistor must supply 8 ma to R01 and 16 ma to R02 and R03, for a total of 24 ma. Since the Vbe drop is about 0.8 volts, the word line is required to swing between 0 and +1.2 volts. The collector supply voltage is fed thru diodes D1 and D2, from the +4.8 volt supply, in order to reduce power dissipation in the coupling transistors.


Word lines
As explained before, because of the convenience of using available devices, the word line driver provides an output level that swings from 0 V to 3.0 V. As shown in Figure 8, R1 is inserted in the word line to attenuate the 3 volt signal to the desired 1.2 V level. Because of the large number of bit lines crossing the word lines, the length of the word line is electrically long, and it is terminated at the far end in 50 ohms. The word lines are 50 ohm microstrip lines. To minimize the word line time delay, each word line is divided into two segments, with each segment coupled to a maximum of 140 bit lines. Each segment is designed as shown in Figure 8 and has its own set of line drivers and line terminations.
Although there are 140 bit lines crossing each word line segment, system design requires that only a maximum of 25 bit lines be coupled to any one word line segment. Each coupling transistor introduces a capacitive loading of about 1.5 pf, delays the signal propagation by about 0.15 nanosecond, and produces a peak negative reflection of about 30 mv. To prevent an excessively large reflection, no more than 3 transistors are located in succession along any word line or any bit line.

Memory partitioning

Because of speed considerations and space limitations, each word line is divided into two segments.

[Figure 9-partitioned ROM: 12 matrix boards (each 120 word lines by 140 bit lines) and 8 sense boards]

Similarly, the bit lines are divided into 6 segments. The result is to divide the ROM into 12 matrix boards. Each board contains 120 word lines and 140 bit lines. The sense amplifiers and output register are mounted on 8 sensing boards. Eight boards are required because of pin limitations. The partitioned ROM is shown in Figure 9. Note that only the sense amplifiers and the bit line terminating resistors (R01) are on the sensing boards. The remaining components are mounted on the 12 matrix boards. The matrix board is a special design. The sensing boards are standard 12 layer boards used throughout the ILLIAC IV Control Unit.
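The board count is consistent with these segment sizes: the two segments of each word line correspond to the two 140-bit-line groups, and the six segments of each bit line correspond to the six 120-word-line groups, so

\[
\frac{720\ \text{word lines}}{120\ \text{per board}} \times \frac{280\ \text{bit lines}}{140\ \text{per board}} = 6 \times 2 = 12\ \text{matrix boards.}
\]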

Mechanical description of the matrix board

The bit lines are 50 ohm strip lines, which gives the advantages of low parallel line cross-talk and close impedance control. The word lines are 50 ohm microstrip lines. Address selection lines, which are the input lines to the word line drivers, are nominally 100 ohm microstrip lines to simplify board construction. The


address selection lines are constructed in two layers, with both layers using the same ground plane as the signal return path. Consequently, the two layers of address lines are slightly different in line impedance. The lines in the top layer are about 110 ohms and the lines in the bottom layer are about 90 ohms. The matrix board cross section is shown in Figure 10. The board dimensions are otherwise the same as a standard CU board, i.e., 18 inches X 20 inches.

[Figure 10-matrix board cross section: driver-input layers (about 110 and 90 ohms), word lines, 140 sense lines, and ground and Vcc planes]

The transistors are cubes of 80 mils on an edge. The transistors are spaced 100 mils apart, as are the holes and mounting pads for the emitter, collector, and base leads. The tight spacing of the transistors requires a special mounting technique, as shown in Figure 11.

[Figure 11-transistor mounting on the matrix board]

CHECKOUT OF MATRIX BOARDS

The coupling transistors are located on the matrix boards in accordance with the system instruction table, i.e., the output bit pattern for every word selected. The simplest checkout fixture is to apply a +0.4 volt input level to each word line driver and read the output of every bit line with a voltmeter. Such a manual checkout would require 320 man hours for the 12 matrix boards. A more automatic checkout system is shown in the flow chart of Figure 12. The design table of the ROM is transferred onto 80-column cards. Each card holds 80 bits of 12 words; therefore 2 of the 80-column cards are required to complete the contents of 12 words by 140 bits, and 20 cards are required for one of the 12 matrix boards. A card reader reads one card at a time and compares the reading with the bit outputs. Any mismatches are detected and displayed by indicating lamps. With this system the matrix boards can be checked out at the rate of 9 boards per hour, or 2 man-hours for 12 boards. The cost of the checkout system is about $3,000.
In ILLIAC IV, neither of the above approaches is used. The Test and Maintenance Unit (TMU) area of the Control Unit has a 64-bit comparator which can be used to check out not only the matrix board but the entire area of instruction decoding. Via the I/O interface,

[Figure 12-flow chart of the card-compare checkout: the design table keypunched onto 80-column cards, read and compared against the word line and bit line outputs]

[Figure 13-Control Unit layout: control unit and PUC cabinets, combined preregulator and shunt package, pivoting Test and Maintenance Unit, monitor unit, test and maintenance panel, keyboard, logic unit, card racks and backplane, signal distribution buses, TMU cable, carrier area, Vcc and Vee buses, logic package assembly, contactor assembly, control module, and +4.8 V and -3.8 V/+1.2 V regulators]

instruction decoding and ROM addressing can be controlled programmatically. The program generates the FINST instruction to be decoded and loads a TMU register with the expected ROM response. The comparator compares the expected and actual responses, and the program proceeds if they agree. Because only 64 bits can be compared at a time, five compares are required to check each full 280 bit output of the ROM. If there is an error, the inputs being compared can be displayed on a CRT display in the TMU (Item 6, Figure 13).
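The chunked comparison is easy to picture in code. The sketch below checks one 280-bit response in five 64-bit compares; the arrays stand in for the real I/O-interface accesses, which are not specified in the paper.

#include <stdint.h>
#include <stdio.h>

#define CHUNKS 5   /* ceil(280 / 64) compares per ROM word */

static int check_word(const uint64_t actual[CHUNKS],
                      const uint64_t expected[CHUNKS])
{
    for (int c = 0; c < CHUNKS; c++) {
        if (actual[c] != expected[c]) {
            /* the real system would display the operands on the
               TMU's CRT; here we just report the mismatch */
            printf("chunk %d: expected %016llx, got %016llx\n", c,
                   (unsigned long long)expected[c],
                   (unsigned long long)actual[c]);
            return 0;
        }
    }
    return 1;   /* the program proceeds when all compares agree */
}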
APPLICATION OF MSI

For the entire function of decoding instructions into microsequences, storing enable patterns in the ROM, and sensing and storing the ROM output, the present system uses a total of 28 multilayer boards, each 18 inches by 20 inches. As the following example shows, an MSI approach would result in about a 50% reduction in volume. An all-MSI ROM is composed of MSI cells where each cell is M words by N bits. To reduce interconnections and conserve pinouts, x-y addressing is used to address the M words in the cell. To eliminate buffers between cells, the N bits of one cell are "wire OR'ed" to the N bits of another cell.
Control Enable ORing is no longer possible because x-y addressing generates unwanted addresses if more than one x or more than one y input is active. The number of required ROM words is increased to about 1200 because this ORing cannot be used. The amount of


logic required to generate microsequences also increases. However, the generation of microsequences is now more straightforward and better adapted to MSI. Based on the average number of words required by a microsequence, the optimum number of words in a cell is determined. Assume the optimum number is 16 words. Therefore, four address lines plus one address enable line are required. The address enable permits more than one cell, i.e., more than 16 addresses, to be used in a microsequence. Because many microsequences do not use whole multiples of 16 addresses, more than 1200 addresses are required. Assume the ROM size is increased by 20% to 1440 words to account for this effect. Assume each cell has 64 outputs. This requires less than 100 pins per cell, which is consistent with present MSI packaging. To obtain 280 outputs, five cells are required for every word. Therefore a total of 450 cells is required to make up the complete ROM. Allowing four square inches of surface area per cell and sufficient space for terminating resistors, bypass capacitors and connectors, a total of 8 boards, each 18 inches by 20 inches, is required to make up the ROM. If the remaining 16 boards required for microsequence generation and ROM output sensing and storing can be reduced to 6 boards, then the original 28 boards are reduced to 14 boards. It is important to note that in order to achieve an ROM cycle time of 50 nsec (i.e., register to register), the cell access time (i.e., address input to bit output) will have to be approximately 20 nsec.
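The 450-cell figure follows directly from the assumed cell size:

\[
\frac{1440\ \text{words}}{16\ \text{words/cell}} \times \left\lceil \frac{280\ \text{outputs}}{64\ \text{outputs/cell}} \right\rceil = 90 \times 5 = 450\ \text{cells.}
\]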
ACKNOWLEDGMENT

The final ROM is the result of the efforts of many people, and the authors gratefully acknowledge their contributions. Especially noteworthy is the work by E. S. Sternberg in the area of logic design and T. E. Gilligan in the area of circuit design.


A model and implementation of a universal time delay
simulator for large digital nets
by STEPHEN A. SZYGENDA, DAVID M. ROUSE and EDWARD W. THOMPSON*

University of Missouri
Rolla, Missouri

INTRODUCTION

Although simulation of logic circuits has been attempted in the past, only those simulators which handle completely combinational circuits have performed with any degree of success for various types of logic. This is primarily due to the number of simplifying assumptions that can be made for combinational circuits. For example, in combinational circuits, one need not consider timing of the signals, since any given input vector will always propagate to a stable state value. Also, since purely combinational circuits do not contain internal states, the user need not define or initialize these values, as must be done (and done meaningfully) for sequential circuits. A systems study of simulation and diagnosis for large digital computing systems has been performed.1 The results of that study have led to the implementation to be described. The model which has been adopted permits the user to select the level of detail most appropriate to his requirements, and thus does not hamper him with overly restrictive assumptions. The advantages of this model are the following:
1. It does not require the specification of feedback
loops in sequential circuits.
2. The ability to reset all feedback lines to any value
at any· time is not required.
3. It provides for the detection of hazards and races.
4. Simulation is not restricted to particular types of
circuits.
5. Besides having the capability of simulating gate
level circuits, it can also simulate at a functional
level. This results in a savings of time and storage,
permitting the simulation of large circuits.
Actually, three models are presented, two being subsets of the other. It is felt that these models will be

*Present Address: TELPAR, Inc., Dallas, Texas


useful tools in analyzing logic circuits, generating tests, and providing experience in determining a desirable level of detail for simulation.
In the next section of this paper, features are defined which appear to be desirable for a general purpose system simulator. This is followed by a description of the means utilized to achieve these desired features. In particular, a detailed discussion of the models adopted for the simulation is provided. It is felt that the heart of any simulator is the basic simulation model. The effectiveness of the simulation is directly related to the ability of the model to accurately describe the physical systems being simulated. Therefore, the adopted models are of extreme importance and will be described in sufficient detail to substantiate their effectiveness.
A description of the simulator implementation follows the simulation model discussion. (This implementation is used in a total simulation and diagnosis software system.2) Modes of operation and simulation optimization techniques are also described. Since a table driven simulator was implemented, a discussion of its implementation will be given.

DESIRED FEATURES OF A SIMULATION MODEL

A primary goal of this simulation package is that it possess the ability to simulate any of the common modes of logic operation. The handling of asynchronous sequential circuits presents the most difficulty, in that circuit timing must be accurately described. This rules out the unit delay assumption used by many existing simulators (in particular, all space-leveled, compiler-driven simulators). Therefore, this model allows a variable time delay for the elements being simulated.
In the model used by Seshu,3 which is frequently used for sequential circuits, the only race analysis performed is that of checking feedback line values. All


feedback lines are assumed to be broken at some point
and a race is declared if, during any pass through the
circuit, more than one feedback line changes value.
One feedback line at a time is declared a winner and
its value is then propagated through the circuit to
determine its effect on the outputs. The race becomes
critical if different stable states are reached, depending
upon the order in which the feedback values were
changed.
This type of model has three major deficiencies:
1. The system is very sensitive to the selection of
feedback lines and the point at which they are
broken.
2. A few feedback loops can produce numerous races,
requiring excessive time and storage for evaluation.
3. Race analysis performed in this manner does not
detect static or essential hazards.
Also, many of the races which may be detected are physically impossible. One of the goals of the model being presented will be to overcome these deficiencies.
The model used by Seshu also makes the assumption
that feedback lines can be reset to any desired value at
any time, even in the presence of faults. It is felt that
this assumption is not necessarily valid, and the assumption is not made in the model being presented.
The speed of simulation (compilation and execution)
is an important measure of a good simulator. Therefore,
maximum speed is also an objective. However, the
requirement for maximum speed can, and should, be
sacrificed for more accurate simulation results, when
required. Therefore, the speed of this simulator is a
function of the level of detail required of the simulation.
Another simulation goal is the use of a minimum
amount of storage. The importance of this goal is twofold. First, minimization of storage requirements will
enable the simulator to handle larger circuits; and
second, its use will not be limited to large computing
facilities. Appropriate segmentation can help reduce
the storage requirements, but only with a sacrifice in
speed. For these reasons, both minimization of storage
requirements and software system modularity are of
primary concern.
The simulation package should be as flexible and
versatile as possible, but at the same time it should be
easy to understand and implement. Therefore, one
need only be concerned with those options which are
pertinent to the task at hand. For example, the option
of simulating faults is available with or without race
analysis.
An additional goal is the capability of easy adaptation
of new element types. Hence, one need only supply a
new description for element evaluation, and the program then adds to, or updates, the existing specifi-

cations. Therefore, this method should not require a
large amount of reprocessing.
The system should also have the capability of multiple fault insertion, as well as fault insertion on inputs
and outputs of functional modules.

MEANS OF ACHIEVING THE DESIRED
FEATURES
There are two approaches that can be taken for digital simulation. One is the approach of modeling the entire circuit as one unit and, therefore, with one macro-model. This would be the technique used for a
compiled simulator, since a compiled simulator levels
and transforms the circuit into a form that can be
dealt with collectively. The other approach is that of
modeling the entire circuit by breaking it into smaller
blocks, which can be individually modeled according
to their type. This approach can be accomplished with
a table driven simulator. A table driven simulator deals
directly with elements, in that the circuit description
is explicitly specified during simulation. Therefore, it
determines, during simulation, what elements are to be
evaluated next and then uses one generalized routine
to evaluate all elements of anyone type.
The second method was chosen for this simulator
since the first contains undesirable features, such as
the need of pre-leveling, location and breaking of feedback loops, inability to handle various sequential circuits effectively, etc. Although the second approach
is more general, and can handle a large majority of
circuit types, it does have the disadvantages of being
slower and requiring more storage for certain cases.
Therefore, it is these latter two disadvantages that are
of great concern in this simulator structure. Solutions
to these problems will be discussed later.
The ability to simulate sequential circuits is inherent
in a table driven simulator, in that it dynamically
levels the circuit during simulation. This is done by
determining, from the circuit description, what elements
need to be reevaluated due to a change in the value
of a signal.
The ability to simulate asynchronous circuits relies
on the ability to accurately represent the time associated with evaluation of signal values. By including the
propagation time of each element in its description,
and then using this parameter during simulation to
order the evaluation procedure, this time factor can be
accurately represented. This means that one must have
the ability to accept propagation delays of different
lengths instead of making a unit delay assumption.
Another side feature of a table driven simulator is
that of not being required to consider feedback lines

by special, cumbersome, and inaccurate methods. These
methods include explicit specification of what lines are
feedback lines and the ability to reset these to any
value. These two conditions are produced when an
attempt is made to represent an entire sequential circuit
by one model, as is done in' a compiled simulator structure. The feedback specification problem is not present
in a table driven simulator, in that no special case
need be made concerning feedback lines. The latter case, which is commonly referred to as a reset assumption, is avoided since simulation occurs directly from a given accessible state without requiring reinitialization of the state during simulation.
As mentioned earlier, speed and storage will be a
primary concern in this simulator. Three techniques
will be employed to reduce these problems to an acceptable level. They are: (1) selective trace simulation,
(2) parallel simulation, and (3) functional simulation.
Selective trace is a technique used in conjunction
with table driven simulators which provides the ability
to evaluate only those elements which have a potential
of changing. For example, one need not reevaluate a
gate output if all the input signals are the same as
they were when it was last evaluated. Thus, simulation
becomes a process of tracing changes, and their effects,
through the circuit.
Signal values can be stored in one of three ways.
First, one bit of a machine word could be used to
represent the value of a signal. Second, each bit of a
machine word could be used to represent a different
signal value. Third, each bit of a machine word could
represent different values of the same signal for different
input vectors or different fault conditions.
The first of these three techniques is extremely inefficient in storage handling. The second approach is hard
to execute in Fortran (the implementation language for
the system) since it would require bit manipulation.
Therefore, the third approach was taken. For this
technique, n different input vectors (where n is the
number of bits in the machine word length), or fault
conditions, can be simulated with the same speed and
storage required for the first approach. This is referred
to as parallel simulation, since n unique simulations
occur in parallel. The effect of this approach is to
divide the required simulation time by a factor of n.
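One word-wide logic operation per gate is all the parallelism costs. The C sketch below illustrates the packing; the 32-machine word width and the fault-mask helper (which anticipates the fault mask table discussed later) are illustrative choices, not the paper's Fortran representation.

#include <stdint.h>

/* Bit k of a packed word holds a signal's value in the k-th
   concurrent simulation (input vector or fault condition), so one
   machine operation advances all n = 32 simulations at once. */
typedef uint32_t packed;

static packed and_gate(packed a, packed b) { return a & b; }
static packed or_gate (packed a, packed b) { return a | b; }
static packed inverter(packed a)           { return ~a;    }

/* apply a stuck-at-0 fault to a signal in subject machine k only */
static packed stuck_at_0(packed v, unsigned k)
{
    return v & ~((packed)1u << k);
}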
Another important implementation feature is called
functional simulation. This is the grouping of a number
of logic elements together and then expressing the group
by its function. Thus, one need only store and evaluate
the function in order to simulate the represented logic.
An example of functional simulation would be the
representation of an adder by storing and executing a
simple add instruction, instead of storing and executing
the large number of logic elements used to form an


actual adder circuit. Therefore, it can be seen that functional simulation enhances simulation speed and reduces storage. The ability to implement functional simulation is compatible with a table driven simulator structure, since it models the circuit by modeling elements, regardless of the evaluation procedure used to model the element. For this reason, changing or adding element types is a simple task which involves changing only the evaluation procedure and its respective pointer.
Fault insertion is also simplified since it now becomes
a matter of simply providing elements having the same
characteristic as a faulty element.
When faults are automatically inserted, fault collapsing is used to reduce the number of possible faults.
This is done by inserting only one fault of a group of
faults which always produce the same outputs for any
input combination. For example, all stuck-at-0's on the input of an "and" gate appear the same as a stuck-at-0 on the output of that gate. Therefore, the stuck-at-0's on the input need not be simulated if a stuck-at-0 on the output is simulated, since they all
produce the same response and are therefore repetitious.
To further increase the accuracy of simulation, an
ambiguity interval can be associated with each signal.
This is a result of an inability to specify exactly when
a given signal will actually make a transition from one
state to the next. The requirement of an ambiguity
interval comes from the fact that gates of the same
type could have different propagation times. Therefore,
the time delay of a gate would be represented as a
minimum value plus an ambiguity region. By considering this ambiguity, race and hazard analysis can
be performed during simulation.
It is not always desirable to perform race analysis,
since it requires greater simulation time and storage.
Therefore, there are different modes of simulation, including straight simulation and simulation with race
analysis. Straight simulation will be referred to as the
Mode 1 simulator, and simulation with race and hazard
analysis will be referred to as the Mode 3 simulator.
In order to obtain modularity in simulation and consistency in circuit modeling, Mode 1 will be implemented
as a subset of Mode 3. A Mode 2 simulation is also
available. This is a three valued simulation which can
be used for simulation initialization. It indicates and
propagates information as to whether or not a signal is
defined at a given time. This is accomplished with the
use of a third value. An example of the Mode 2 simulation is given in Figure 1. Here, each signal can be 1, 0, or I (I indicates Indeterminate). Before time 0, all signals are unknown and are, therefore, in the indeterminate (I) state. At time 0, A and B are changed to a 1. As a result of this change in A and B, the values of C and D must be reevaluated. C is evaluated to be 1,



[Figure 1-Mode 2 simulation: waveforms of A, B, C, and D, with indeterminate intervals before the signals become defined]

at t = 5. The change in A at t = 0 causes a reevaluation of D. However, since C = I at t = 0, D = I at t = 10, because the change in C has not yet propagated to D. An evaluation table for C and D is indicated in Figure 2. The change in C at t = 5 causes D to be evaluated again; hence, D becomes 1 at t = 15, since both C and A are then known.
An example of the Mode 3 simulation is depicted in
Figure 3. A and B are input signals of initial value 0
and 1, respectively. C and D are the output of inverters,
which have a propagation delay of 4 and an ambiguity
of 2. E, which is the set signal to an S-R Flip Flop, is
the logical "and" of C and D. At t = 0, A changes to
1 which produces a change in C to 0 through an ambiguity region from t = 4 to t = 6. This means that
the change of C could occur sometime between t = 4
and t = 6. If B changes at t = 1 (possibly due to an
ambiguity in the circuit feeding B), then D changes
value as indicated. Notice that the value of E, as a
result of C and D being 1 between t = 5 and t = 6,
is a potential error region. This, along with ambiguity
and minimum delay of the "and" gate, produces the results indicated for E. Since E is setting the Flip Flop, this potential error region sets the Flip Flop to a potential error value, which is the resulting state of Q. From this example, it can be seen how the ambiguity in propagation delay is handled, as well as how Mode 3 simulation handles potential error regions and how these potential error regions can result in essential hazards.
In general, the Mode 3 simulator is used to propagate potential error regions to provide determination of the existence of essential hazards. The unique characteristic of the Mode 3 simulator, one that does not exist in the other modes, is that it carries regions instead of single values.

a) Logical "and":

and | 0   1   I
 0  | 0   0   0
 1  | 0   1   I
 I  | 0   I   I

b) Logical "or":

or | 0   1   I
 0 | 0   1   I
 1 | 1   1   1
 I | I   1   I

Figure 2-Logic table for gate evaluation in a three value simulation
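The tables translate directly into lookup arrays. Below is a minimal C rendering; encoding I as a third enum value is a choice of this sketch, not the paper's Fortran representation.

/* Three-valued gate evaluation following the Figure 2 tables; I is
   the indeterminate value used by Mode 2 for initialization. */
typedef enum { V0, V1, VI } val3;        /* 0, 1, I */

static const val3 AND3[3][3] = {
    /*        0   1   I  */
    /* 0 */ { V0, V0, V0 },
    /* 1 */ { V0, V1, VI },
    /* I */ { V0, VI, VI },
};

static const val3 OR3[3][3] = {
    /*        0   1   I  */
    /* 0 */ { V0, V1, VI },
    /* 1 */ { V1, V1, V1 },
    /* I */ { VI, V1, VI },
};

static val3 not3(val3 a) { return a == VI ? VI : (a == V0 ? V1 : V0); }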


A technique similar to this has been described by D. L. Smith.4 A spread is depicted with the use of a New Value (NV), as well as a Current Value (CV), along with the Potential Error value (PE). During simulation, this ambiguity region is represented by simulating the earliest possible transition point (the CV) and the latest possible transition point (the NV). The potential error is then a logical function of the CV, NV, and the PE. With respect to the simulator, this evaluation procedure simply appears as another element type.
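A sketch of that element type in C follows. The field names follow the text; the particular PE function shown for "and" is a plausible choice for illustration only, since the paper states just that PE is a logical function of CV, NV, and PE.

/* A signal region: value at the earliest possible transition point
   (CV), at the latest (NV), and a potential error flag (PE). */
typedef struct { unsigned cv, nv, pe; } region;   /* each 0 or 1 */

static region and_region(region a, region b)
{
    region r;
    r.cv = a.cv & b.cv;      /* value at the earliest transition */
    r.nv = a.nv & b.nv;      /* value at the latest transition   */
    /* flag a potential error when either operand is already in
       error or both operands are changing inside the region     */
    r.pe = a.pe | b.pe | ((a.cv ^ a.nv) & (b.cv ^ b.nv));
    return r;
}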
[Figure 3-Mode 3 simulation: A and B driving inverters C and D (delay 4, ambiguity 2), whose "and" E sets an S-R flip flop; shaded intervals are potential error regions]


[Figure 4-A transition table and state table for a simple sequential circuit]

Sequential logic circuits containing global feedback loops are extremely difficult to simulate, even for the simplest of design philosophies. Whether accurate simulation is achieved most often depends either upon the design employing a special type of sequential action which is consistent with the particular simulator being used, or upon the person simulating the circuit having a very intimate understanding of the circuit operation. These two circumstances are more often the exception than the rule. In many design environments, as a result of practicality as well as necessity, these conditions are rigidly forced upon the user by the simulation structures. In order to alleviate this problem, the largest possible user flexibility was a goal for this simulator. One of these degrees of freedom is presented in the following example, which shows how race analysis of an asynchronous sequential circuit can be performed.
A Transition Table and State Table are given in Figure 4 for a simple sequential circuit which has been implemented in Figure 5. A race can be seen to exist between states C and D for the input vector 01 and stable state B. Whether this race is a result of improper design, a characteristic of a circuit containing a faulty element, or an intentional risk is unimportant. The important thing is that the simulation of this circuit is capable of revealing sufficient information to determine how the circuit will, or could, act when physically implemented. Figure 6 shows the response to X1 changing from a 1 to a 0 according to the unit delay assumption (the circuit is in stable state 01, with X1X2 = 11). From Figure 6, it can be observed that the circuit makes a transition from state B to state D, a seemingly definite and satisfactory result.
By considering the circuit response as indicated in Figure 7, which uses more accurate delay time information (as given by the minimum delays in Figure 5) for each gate, it is observed, through this type of simulation, that all is not as simple as indicated by the previous simulation. It can be seen that, as the accuracy of the propagation times becomes closer to that of the physical circuit, the race between states D and C comes closer to actuality. It appears here as the simultaneous transition of Y1 and Y2. This type of simulation is one which might be performed by Mode 1 simulation.


[Figure 5-Example sequential circuit (gates annotated with minimum delay plus ambiguity)]

[Figure 7-Results for circuits with variable time delays]

The curiosity raised by using a more accurate representation for the delay time can be satisfied by also considering an ambiguity time (as indicated in Figures 5 and 8) associated with each gate. The state variable Y1 could change anywhere between t = 10 and t = 13. This is a result of the possible variation in the time delays

[Figure 6-Results for circuits with unit time delays]

[Figure 8-Results for circuits with variable time delays and ambiguity]


of the inverter, "and" gate, and "or" gate along the propagation path from X1. Similarly, Y2 could change between t = 10 and t = 12. The value of F is essentially the "and" of Y1 and Y2, since X1 appears as a constant 1. However, the "and" of the two ambiguity regions for Y1 and Y2 is not only another ambiguity region in F; it is also a potential error region. In actuality, F may or may not produce the momentary 1 spike between t = 14 and t = 17. Note that a transition region is concerned with the question of when a transition will take place, whereas a potential error region is concerned with whether a transition could take place.
Thus, from this example, it can be seen that a variation in propagation delay is enough to produce a critical race condition from a seemingly stable design. This condition is detected in Mode 3 simulation when the potential error flag is set for the state variable Y2, as indicated by the shaded area in Figure 8.
This example demonstrates some of the problems
that could be encountered when simulating sequential
circuits. It also shows how these problems can be
handled through the various modes of simulation available in this simulator.
To use these three modes of simulation, one need
only specify which mode is desired, so that the appropriate evaluation routines will be used. Since the
only difference is in the evaluation routines, the same
circuit description can be used for Mode 1, 2, or 3
simulation.
One important feature of this approach, with respect to race analysis, is that race analysis occurs concurrently for nested races. Therefore, only one simulation must be performed for n nested races, as compared to as many as 2^n simulations for other approaches.

SYSTEM IMPLEMENTATION
The first major implementation decision was the
choice of a programming language in which the simulator would be written. Assembler, Fortran and PL/1 were considered. Utilizing an assembler language could result in slightly faster execution, with somewhat less storage required. However, Fortran was chosen since it would be easier to implement and is considerably more machine independent. Although PL/1 has some seemingly desirable features, they were sacrificed for the more commonly accepted Fortran and the small decrease in execution time and storage. It was also desirable for this simulator to be acceptable for use on smaller machines, which have limited storage and compiler facilities. For these reasons Fortran was considered more desirable than PL/1.
The basic simulator consists of three tables, the Time


Queue Table (TQ), the System Description Table
(SDT), and a table which contains the Current Value
of each signal (CV). The time queue table contains
events that occur at time t, where t is the index of the
time queue table. The system description table contains
pointers to the evaluation routines used to determine
the output values of the element, pointers to the fan in
and fan out, and also contains the number of fan outs
for each signal.
Using these three tables, simulation is performed
as follows:
1. All values that exist in the time queue, at the current
simulation time, are transferred to the current value
table, thus causing any projected changes in value
to take effect.
2. If the new value entered in the current value table
is different from the old value, then all elements
that are immediately affected by this change are
reevaluated. (This is accomplished by following a
fan out list.)
3. The results of these reevaluations are projected into
the time queue at the current time plus the minimum
propagation delay of the signal.
4. The current time is incremented until an entry is
found in the time queue, and then the process is
repeated again.
This process is restated in a flow chart form in
Figure 9.
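A minimal C rendering of this loop follows. The table names follow the paper (TQ, CV, with evaluate() standing in for the SDT's dispatch to an evaluation routine), but the sizes, the two-input "and" gate model, and the event layout are simplified assumptions of this sketch.

#define TMAX 1024                 /* time slots in the TQ   */
#define NSIG 64                   /* signals in the circuit */
#define NFAN 4                    /* fan outs per signal    */

static int cv[NSIG];              /* current value table            */
static int tq[TMAX][NSIG];        /* projected value, -1 = no event */
static int fanout[NSIG][NFAN];    /* -1 terminated fan out lists    */
static int fanin[NSIG][2];        /* two-input gates in this sketch */
static int delay[NSIG];           /* minimum propagation delay      */

static int evaluate(int g)        /* stand-in for SDT dispatch      */
{
    return cv[fanin[g][0]] & cv[fanin[g][1]];
}

static void simulate(int tstop)   /* tq[][] preloaded, -1 elsewhere */
{
    for (int t = 0; t < tstop; t++) {                 /* step 4 */
        for (int s = 0; s < NSIG; s++) {
            int v = tq[t][s];
            tq[t][s] = -1;
            if (v < 0 || v == cv[s]) continue;        /* selective trace */
            cv[s] = v;                                /* step 1 */
            for (int i = 0; i < NFAN && fanout[s][i] >= 0; i++) {
                int g = fanout[s][i];                 /* step 2 */
                tq[(t + delay[g]) % TMAX][g] = evaluate(g);  /* step 3 */
            }
        }
    }
}

Note how the skip when the projected value equals the current value realizes the selective trace described earlier: only changes, and their effects, are traced through the circuit.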
In addition to this basic structure, other tables are
used for optimization of both speed and storage. For
example, functions are evaluated indirectly, via the
Function Description Table (FDT), which also specifies
additional parameters used in the evaluation routine.
These parameters are: function type, time delay, number of inputs, and bus length. This minimizes the number of evaluation routines that must be provided for simulation. By use of the FDT table, the same
routine would be used for a 2 input "and" gate with a
time delay of 5, as would be used for an 8 input "and"
gate with a time delay of 8.
To keep the size of the TQ from becoming prohibitively large, a Macro Time Queue Table (MTQ) was
implemented to store events which occur at large time
intervals, relative to the largest propagation delay for
any gate. The Time Queue was then made cyclic in
coordination with the MTQ, where each cycle of the
TQ advances the MTQ one step.
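The slot arithmetic for the cyclic queue is simple; the sketch below shows one way to file an event, with the span constant chosen illustratively (the paper gives no specific sizes).

/* Events within TQ_SPAN of the current time are filed at
   (time mod TQ_SPAN); events farther ahead are parked in the
   macro time queue (MTQ) and re-filed when the TQ wraps. */
#define TQ_SPAN 256   /* chosen larger than any gate delay */

static unsigned tq_slot(unsigned event_time)
{
    return event_time % TQ_SPAN;
}

static int belongs_in_mtq(unsigned now, unsigned event_time)
{
    return event_time - now >= TQ_SPAN;
}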
Some gate level elements, such as flip-flops, as well
as functional elements, have multiple outputs. To be
able to simulate this type of element, an additional
entry is provided in the SDT Table, which is used to chain the outputs together.
[Figure 9-Flow chart of simulator: (1) update values indicated in the time queue at the current time (CT); (2) if a value changed, evaluate the outputs of gates indicated by the fan out of the changing gate; (3) project the results into the time queue at CT + TD (time delay); (4) update CT in the TQ and repeat]

An ultimate goal of this system simulator is to

possess the capability of simulating elements other than actual gates, such as functional modules.5 Since functional modules are just as apt to be dealing with busses as with single lines, the ability to specify bus lines collectively makes functional module specification an easier, as well as a more meaningful, task. For this reason, the capability of specifying busses collectively has been implemented with the use of a paging scheme, where Bus Value (BV) is a group of pages and Bus Value State (BVS) is a table which indicates use, length and location. Therefore, the CV of a bus is an indirect pointer to a page, which contains the actual values of the bus signals.
It can be seen that the System Simulator is independent of the type of function used to evaluate the
output signals of the element. For this reason, any
type of element can be simulated which can be described
in a discrete value system. Therefore, the power of
module simulation is directly proportional to the kinds
of module descriptions which are permitted.
In an attempt to cover the widest range of module
descriptions, five types of descriptions are permitted.
These are: (1) gate elements, (2) standard functional
modules, (3) compiled gate modules, (4) computer design language modules, and (5) Fortran modules.
Evaluation procedures for gate elements are defined by gate type. Thus, one can specify gate elements to the system simulator by giving each gate's fan in and fan out. This would be used primarily for circuits which can be specified at the gate level.
Similar to the gate modules are the standard functional modules, in that they are predefined routines which can be used by specifying an element as a standard functional module type. An example would be an n-bit 2's complement adder. Thus, to define a complete adder, one need only give its function type, the number of bits being added, and its time delay. This feature was provided in order to eliminate much of the trivial task of redefining common functional modules, and also to provide faster, more efficient system routines.
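For instance, a predefined adder routine needs only the parameters just named (a Python sketch; the masking convention for n-bit 2's complement wraparound is the standard one, not a detail given in the paper):

def adder_module(n_bits, time_delay):
    # Standard functional module: an n-bit 2's complement adder,
    # parameterized only by width and delay.
    mask = (1 << n_bits) - 1
    def evaluate(a, b, current_time):
        s = (a + b) & mask          # wrap around at n bits
        return s, current_time + time_delay
    return evaluate

add16 = adder_module(16, 12)               # invented example values
value, ready_at = add16(40000, 30000, 0)   # 70000 wraps to 4464 at t = 12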
To permit the initial design specification to become
the initial functional representation for design verification on a macro-level, a computer design language is
allowed for module description. This is done by compiling this description into an equivalent Fortran subroutine, which has the inputs equivalenced to the
appropriate current value table, and the outputs queued
in a scratch array. In order to provide sequential control
modules at a design language level, sequential modules
can be described in a flow table form of expression.
Thus, sequential control variables can be generated by
such a sequential module.
Also, to provide the capability of compiled simulation, one can specify modules at the gate level and have these transformed into compiled code for simulation. This is desirable for purely combinational modules, which can be simulated faster, or with less storage, in a compiled fashion.
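The gain comes from translating the module once into straight-line code and calling that on every pass instead of interpreting tables. A small Python sketch of the idea (the two-gate netlist and all names are invented; the netlist is assumed to be listed in topological order):

# netlist: output name -> (gate type, input names), assumed acyclic
NETLIST = {"n1": ("and", ("a", "b")),
           "out": ("or", ("n1", "c"))}

OPS = {"and": " and ", "or": " or "}

def compile_module(netlist, outputs):
    # Emit one Python function that evaluates the whole module.
    lines = ["def module(a, b, c):"]
    for name, (gate, ins) in netlist.items():
        lines.append(f"    {name} = {OPS[gate].join(ins)}")
    lines.append(f"    return {', '.join(outputs)}")
    scope = {}
    exec("\n".join(lines), scope)   # "compile" the module once
    return scope["module"]

module = compile_module(NETLIST, ["out"])
print(module(True, False, True))    # -> True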
Fortran module descriptions can be used as a means of generating new, efficient standard functional modules. They can also be used to generate special modules whose function cannot be described easily by a high level system design language.
To make this system more usable, one must have a means of redefining modules without requiring the reprocessing of the complete system description. This is done by automatically defining boundary elements for each module of the system.
A boundary element is an element which is placed at each input and output of a module to isolate that module from the rest of the circuit. Each boundary element has one input and one output, and the output value equals the input value. Taking this approach permits changes in the module definition without changing descriptions outside the module. This is depicted in Figure 10, where B1, B2, B3, and B4 are boundary elements for module M2. Gate G1 fans out to G2 and B1 when module M2 is expressed functionally. But if module M2 is expressed at the gate level without boundary elements, then the fan-out of G1 would be to G2, G3, and G4. Without boundary elements, it would be necessary to change the fan-out description of G1 (which is outside the module) if the user wants to change M2 to a gate representation. But with the boundary elements inserted, the fan-out of G1 (to G2 and B1) remains the same independent of the type of expression used for M2. This could be of extreme importance when a number of design groups are using the same high level description for the complete system, while considering their own particular section at the gate level.
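A sketch of the insertion step in Python (the fan-out representation is an assumption made for this example):

def insert_boundary(fanout, port_signal):
    # Splice a one-input/one-output identity element onto a module
    # port so that elements outside the module (e.g., G1) keep the
    # same fan-out no matter how the module is later re-expressed.
    # `fanout` maps a signal to the list of elements it drives.
    b = "B_" + port_signal
    fanout[b] = fanout.pop(port_signal, [])  # boundary inherits old targets
    fanout[port_signal] = [b]                # the port feeds only the boundary
    return b

fanout = {"G1": ["G2", "M2_in"], "M2_in": ["G3", "G4"]}
insert_boundary(fanout, "M2_in")
# G1 still drives ["G2", "M2_in"]; re-expressing M2 now changes only
# the fan-out of the boundary element B_M2_in, never that of G1.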
For fault insertion, two tables are used. The Fault Table (FT) indicates the type of fault and the leads of the gate with which it is concerned. The Logical Fault Mask Table (LFMT) indicates in which bit (or subject machine) the fault is present.
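The bit-per-subject-machine idea can be illustrated in Python (the word width and helper name are assumptions): each bit position of a signal word belongs to one simulated machine, so a single bitwise operation injects a fault only into the machines selected by the LFMT entry:

def apply_fault(value, fault_type, lfmt_mask):
    # FT says what kind of fault and on which lead; the LFMT mask
    # says in which bit (subject machine) the fault is present.
    if fault_type == "s-a-1":
        return value | lfmt_mask     # force the lead to 1 in those machines
    if fault_type == "s-a-0":
        return value & ~lfmt_mask    # force the lead to 0 in those machines
    return value

# 2-input AND gate, all subject machines evaluated at once;
# machine 3 carries a stuck-at-1 fault on input a:
a, b = 0b0010, 0b1110
a = apply_fault(a, "s-a-1", 1 << 3)  # a becomes 0b1010
out = a & b                          # 0b1010; bit 3 is 1, whereas the
                                     # fault-free value of bit 3 is 0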
During Mode 1 simulation, the value of each signal
is stored in the full word array CV. The values in the
CV are updated by transferring the appropriate value,
which is indicated indirectly in the time queue, to the
CV, at the time indicated by the time queue description.
The same simulation structure exists for Mode 2
simulation, with the use of different evaluation routines
for element output evaluation. An Indeterminate Value
Array (IV) is used for storing the third value necessary
for Mode 2 simulation.
Because time intervals must be processed in Mode 3 simulation, a few minor modifications must be made in the simulator structure. This, however, is not apparent to the simulator user. Storage
must also be provided for the Current Value (CV),
New Value (NV), and the Potential Error (PE), along
with the more detailed evaluation procedures used to
determine these quantities for each element.

CONCLUDING REMARKS

The simulator described in this paper has been programmed in Fortran IV on an IBM 360/50, with 256K bytes of core storage, at the University of Missouri-Rolla. The size of a system that can be simulated is a function of the available memory capacity of the host machine. For 256K bytes of storage, approximately 3000 elements could be simulated using the Mode 1 phase of the system. (An element can range from simple gate elements to complex functional elements.) This would be reduced to approximately 2000 elements for Mode 3 simulation. For trial simulation runs, a running time of 100 microseconds per pass-fault-element has been obtained utilizing Mode 1 simulation. This, however, is fairly circuit dependent.
It is felt that this simulator represents a uniform
systems approach to simulation and diagnosis. Versatility of the models utilized has resulted in this capability. The system not only allows various types of
simulation for different hardware implementations, but
also provides the ability to handle the total system as a
collection of subsystems. Thus, each subsystem can be
simulated according to the particular type of circuit
involved, the requirements imposed upon the circuit,
and the most applicable simulation technique for the
particular subsystem.

Figure 10-Module boundary elements

REFERENCES

1 S A SZYGENDA
A software diagnostic system for test generation and simulation of large digital systems
Proceedings of the National Electronics Annual Conference December 1969
2 S A SZYGENDA
TEGAS-a diagnostic test generation and simulation system for digital computers
Proceedings of the Third Hawaii International Conference on System Sciences January 1970
3 S SESHU
The logic organizer and diagnosis programs
Report R-226 Coordinated Science Laboratory University of Illinois Urbana Illinois (AD605927) 1964
4 D L SMITH
Models and data structures for digital logic simulation
Masters Thesis Massachusetts Institute of Technology 1966
5 D M ROUSE S A SZYGENDA
Models for functional simulation of large digital systems
Digest Record of the Joint Conference on Mathematical and Computer Aided Design October 1969

UTS-I: A macro system for traffic network simulation

by HOWARD LEE MORGAN
Cornell University
Ithaca, New York

INTRODUCTION
One of the major crises which our cities must face is the problem of traffic congestion. Already, many areas are so congested that new roads seem to be the only answer. In densely populated areas, however, new roads are often unwanted because of the valuable land they use, and they are not the solution to today's problems because of the long time delay between planning and construction. Therefore, it is essential that cities obtain maximum throughput from existing roads before new construction is tried. A general urban traffic network simulation system has been designed and programmed to assist traffic engineers and planners in studying alternative solutions to traffic problems. The implementation consists of a set of macros which are used to describe the network, and a set of subroutines which perform the actual simulation.
Many analytical approaches to traffic control problems have been tried with varying degrees of success.1 Queuing and other applied probability models have led to the progression system of light control, one of the more successful methods of smoothing traffic flows. Mathematical programming has been used to optimize some of the criteria for smooth flow, e.g., number of stops or total delay time. Both of these techniques, however, have limited usefulness when applied to specific large networks because of complicating real world factors such as garages emptying onto streets or variations in driver behavior. These and other analytical
methods have usually been applied to systems which
had only static methods for signal light control. The
technology now exists for interactive computer controlled signal lights. Sensors placed on the streets can
report to a central computer which can analyze the
traffic pattern and send control signals back to the
lights in real-time. Mathematical analysis of these systems is extremely difficult.


The UTS-I system is a general traffic network simulator which has been developed to permit one to examine the effects both of "firefighting" techniques such
as one way streets, elimination of turns at congested
intersections, reversing directions of streets during peak
hours, and major improvements such as adding new
roads or installing computer controlled signal systems.
UTS-I is a microscopic simulation model in that the
basic transaction unit is an individual vehicle. This
type of model more closely approximates the real world
behavior of a specific traffic network than the macroscopic model, which would consider groups of vehicles
as one unit. While the macroscopic model may be more
efficient with respect to run length when general systems
are being studied, the microscopic model should yield
more insight into particular trouble spots in a specific
real system.
Any combination of intersections and distances between intersections can be specified. Such features as
three-way and larger intersections, stop or yield signs,
timed or sensor controlled traffic signals, or any combination thereof can be specified at any particular intersection. Traffic light sequences can be set for individual
intersections, or light controls can be interconnected
with other intersections and traffic flows. Other traffic simulation programs2,3,4 have included some of the above features, but rarely all in one system. In addition, describing the network to be simulated has been as much a problem as writing the simulator itself.
The basic flow of a car through the system is simple.
A car appears from a source and advances to an interaction point (sensor, intersection, end of queue, or
slower moving vehicle). Eventually, the car steps up
to the intersection and a random number is generated
to determine if a turn will be made. When the car has
the right of way and the intersection is not blocked by
another vehicle, it advances through the intersection
and onto the next road segment. It is then assigned a
new velocity which it maintains until it reaches the
next interaction point. In this way a vehicle is stepped
through the system until it departs on one of the exit
roads.
The inputs to the system are arrival distributions at
all points where cars may enter the system and turning
percentages at each point where a directional option is
available to the vehicle. Information related to driver
behavior, e.g., wave delay time, velocity, etc., must
also be input. A portion of this information may be
common to all intersection models while some of the
information may depend on specific details of a particular system.
IMPLEMENTATION
UTS-I is actually a GPSS/360 program which makes heavy use of the MACRO facility. GPSS was chosen for the initial UTS implementation because of the availability of the processor and the fact that experienced GPSS programmers were available. Because of the modular design, the simulator can be easily reprogrammed in any digital simulation language, and it is now being rewritten in SIMSCRIPT-II, which offers more flexibility in the use of macros, in input and output formatting, and in the basic simulation scheduling philosophy than does GPSS.
The system consists of several subroutines, each of
which simulates a particular subsystem of the network,
two important matrices which drive these subroutines,
and a description of the flow through the system provided through the use of macros.
The main routines are:

1. Road Segment System-simulated by LINK
2. Intersection System-simulated by ISECT
3. Interaction Point System-simulated by IACTP
4. Intermediate Segment System-simulated by INTER

The vehicle enters the road segment. This routine checks to see if the vehicle has caused a segment overflow. If it has, the previous intersection is blocked and will remain blocked until the segment overflow condition is alleviated. The vehicle then proceeds to the end of the segment's queue at its own preferred velocity or at the velocity of a slower vehicle preceding it (after catching up to the vehicle preceding it and remaining at a reasonable distance behind).
Upon entering the intersection system the vehicle
enters the road segment's queue and waits. As the
vehicles preceding it move through the intersection,
this vehicle moves up in the queue. Eventually, it is
first on line, i.e., it is in "control" of the segment's
queue. At this time the vehicle checks to see if it has

the right of way. If it does not, it waits until it does.
When it does have the right of way, the vehicle proceeds
across the intersection. This causes a wave of car
movements to be propagated through the queue. If the
segment overflow condition exists, the previous intersection will eventually be unblocked. The vehicle, after moving through the intersection, turns onto the next road segment.
The vehicle may come to an interaction point, which
is a point on its lane where the vehicle interacts with
vehicles on other lanes. The vehicle will not reach the
interaction point until its position in the queue reaches
that point along the segment. As vehicles leave the
road segment, this vehicle moves up until it crosses the
interaction point. At this time, the proper interaction
is caused. Interaction points may be used to simulate
flow at a point where one lane splits into two lanes.
The intermediate segment system is used in the same
manner as the Road Segment System, except that it is
used only between two interaction points or an interaction point and the end of the segment's queue.
The input data for UTS is organized into two matrices. These are the intersection entry matrix and the distance matrix. A vehicle crossing an intersection will have to cross a number of areas at which conflicts with other traffic flows could occur. Each of these areas is called an Intersection Conflict Cell (ICC) and is assigned an identification number. The ICC's for each possible path across the intersection are stored in successive columns of a given row of the intersection entry matrix. Vehicles waiting at "STOP" or "YIELD" signs interact with vehicles on the road to which they must yield. If the vehicle at the stop or yield sign is to enter the intersection, a certain area on the road to which it must yield must not contain a vehicle. This area is called a Continuous Interaction Area (CIA). Each CIA has an identification number which is also stored in the intersection entry matrix. The same scheme used for ICC's and CIA's is used for traffic lights. The number of lights which must be green and the location of the first of these are stored in the intersection entry matrix.
The distance matrix is used by the model for storing
the following data:
1. Lane length.
2. Distance to interaction points from the beginning
of the lane.
3. Velocity of the vehicles which have just passed the
interaction points.
4. The length of the vehicle which is at the head of
the queue.
5. The total length of all vehicles which have not
yet reached the queue.

Figure 1-SEGMT subroutine (GPSS/360 listing). The routine saves the road ID and car velocity, tests whether the arriving vehicle should block other roads (sending off a blocking transaction if so), links the vehicle onto the lane's FIFO user chain until the road is clear, advances it toward the queue while resetting its velocity at each interaction point, and finally adjusts the queue size, readjusts the additional length, clears its portion of the road, and returns.

The method of implementation will be made clearer
by a more detailed look at the macro and subroutine
package, and by an application.
MACRO AND SUBROUTINE PACKAGE
This section deals with the macros and subroutines
which make up the UTS-I system. The reader who is
unfamiliar with GPSS/360 may skip this section without loss of continuity.
The four UTS-I macros, LINK, ISECT, INTER, and IACTP, all generate a similar sequence of
GPSS code. This code consists of saving the values of
some parameters, setting some new parameters for the
particular transaction, and calling an appropriate subroutine. For example, the LINK macro, which is used
for vehicles which have just turned onto a road segment,
might be written as follows:

LINK MACRO #A, #B, #C, #D, #E

where #A is the lane ID; #B is the preferred velocity; #C is the expected next interaction point order number (1 if a queue is expected next; 2, 3, or 4 if any other interaction point is expected); #D is the ICC ID to be used if the lane overflows; and #E is the time which it would take for the vehicle to start up and move from ICC #D, if it were backed up into ICC #D. This usage of the macro would generate the following GPSS code:

SAVE MACRO #A, #B, #C, #D, #E   (saves parameter values)
TRANSFER SBR,SEGMT,6            (transfers to subroutine)
The SEGMT subroutine is shown in Figure 1. The flow through this subroutine is simple. First, it tests to see whether or not this vehicle will cause the lane to overflow. If so, certain other feeder roads are blocked by sending out transactions to block these roads. When the road is clear, the vehicle is advanced to either the end of the queue or to the next interaction point along this road segment. The queue is maintained as a user chain for efficiency. When the vehicle enters the queue, its length is added to the length of the queue, and the available distance on the road is adjusted accordingly. Each of the other macros calls a similar subroutine to handle the actions associated with it. Listings of these subroutines and macros are available from the author.
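Restated in Python (a sketch of the flow just described, not the GPSS original; the capacity bookkeeping and callback names are invented):

from collections import deque

class RoadSegment:
    def __init__(self, length):
        self.length = length          # usable road length
        self.queue = deque()          # FIFO of (vehicle, vehicle length)
        self.used = 0                 # total length of queued vehicles
        self.blocking = False         # previous intersection blocked?

    def enter(self, vehicle, veh_len, block_feeders):
        if self.used + veh_len > self.length:     # segment overflow?
            self.blocking = True
            block_feeders()                       # block the feeder roads
        self.queue.append((vehicle, veh_len))     # join end of queue
        self.used += veh_len

    def depart(self, unblock_feeders):
        vehicle, veh_len = self.queue.popleft()   # head of queue crosses
        self.used -= veh_len
        if self.blocking and self.used <= self.length:
            self.blocking = False
            unblock_feeders()                     # overflow alleviated
        return vehicle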
AN APPLICATION
UTS-I has been used to study traffic flow in a three-intersection, 24-lane network near the Cornell University campus. Figure 2 is a schematic diagram of this bottleneck area.
Arrival rates were determined by counting cars in continuous two minute intervals between four-thirty and five-fifteen p.m. Each road was measured in the same manner so as to keep any bias constant. The percent of the vehicles turning in each direction was determined by counting absolute numbers of cars following each distinct path at each intersection. Segment and queue distances were measured alongside the road with a fifty foot tape measure.
The variables related to driver behavior were estimated as follows:

MWAVE-the time for the vehicle startup wave to reach a specific interaction point or the previous intersection was estimated as

$$2\ \text{seconds} + \frac{L - D_i}{150\ \text{ft./sec.}}$$

SWAVE-the time for the vehicle stopping wave to reach a specific interaction point or the previous intersection was estimated as

$$2\ \text{seconds} + \frac{L - D_i}{200\ \text{ft./sec.}}$$

where $L$ = the road length and $D_i$ = the distance from the beginning of the road to interaction point $i$.

Figure 2-Diagram of the simulated system: the three-intersection bottleneck area around Dryden Road (Route 366) and Judd Falls Road, with stop signs and the north-south orientation marked

Figure 4-Simulation for road number 1: a GENERATE block and ASSIGN blocks create vehicles for the road, followed by LINK, IACTP, INTER, IACTP, INTER, and ISECT macro calls (road labels R001, R002, R003) and LEAVE, TRANSFER, TABULATE, and TERMINATE blocks as the vehicle leaves the system model

Figure 3 shows the system which was simulated with
all lane numbers, ICC's, and CIA's marked. From this
diagram the intersection entry matrix can be filled in.
For example, a vehicle in lane 1 may cross the intersection to either lane 3 or lane 7. If it goes to lane 3, it must cross ICC 30. If it goes to lane 7, it must cross ICC's 30, 31, and 32. One can make the matrix more compact if 30 is used for both types of crossing. Thus we get the following values:

IEM(1,1) = 30
IEM(1,2) = 31
IEM(1,3) = 32
The flow of vehicles along the roads in the system is
specified by writing a sequence of macros (with some
interleaved GPSS) which generate subroutine calls to
the appropriate routines. For example, the road marked
#1 in Figure 3 is simulated by the code shown in
Figure 4.
The following outputs have been obtained from the model:

1. Statistics for transit times for the entire system.
2. Average waiting times at STOP signs.
3. Queue statistics.
4. Lane utilization.
5. Graphs of lane utilization, histogram of transit times, and cumulative distribution of transit times.

Figure 3-Intersection conflict cells and lane numbering

This output is supplied at intervals of 20 simulated
minutes for one hour of simulated time.
An example of the lane utilization graph is shown in

Figure 5.

Figure 5-Lane utilization graphs: for each lane number, the proportion of simulation time (0.00 to 1.00) during which one or more vehicles were in the lane

Lane utilization is the total time vehicles
were "in control" of the lane divided by the total
simulation time up to that point. This figure is thus
the proportion of time that vehicles occupied the lane
and the lane was blocked. This and other statistics
have been compared with real world data and the
model has proved a valid representation of the real
system. Experiments which are now under way are attempting to discover a good traffic control algorithm
for this specific network. Sensors are installed on these
streets and a computer controlled network will be
simulated.
USER INTERFACES
The user interface is one of the most important
parts of any system. Since simulation programs are
often quite sophisticated in their use of list processing
and other non-FORTRAN oriented techniques, the
average engineer who is not a computer specialist often
has a hard time constructing or using a simulation
model. While the macro subroutine system described
above makes traffic network simulation easier to program, the author feels that much can still be done to
improve the usability of the system both by traffic
engineers, and computer programmers.
To this end, a simulation generator program is now being written. The input to this program will be a
map of the traffic network to be simulated, with needed
parameters marked directly on this map. The output
of the program will be the two matrices and the sequence of macros which are used to describe the system.
An example of the input form which will be used is
shown in Figure 6. At present, this will be keypunched
by laying a grid over the form, and then translated by
the program into the required network.
Figure 6-Sample input map form: arrows indicate direction of flow; a crossing indicates an intersection; X blocks further travel on a path; each symbol is one distance unit

During the seventies, when graphical input/output

devices are more readily available, it is hoped that
this map can actually be displayed on a CRT, and
parameters altered during the simulation. Systems of
this type have been discussed,5 although not in the
traffic context.
The current output interface, namely, graphs, has
been a useful adjunct to the tabular output which is
produced automatically by GPSS or SIMSCRIPT
systems.
SUMMARY AND CONCLUSION
An important criterion in determining the usefulness of a simulator is its running efficiency. UTS-I, operating on an IBM 360/65 computer, in a 200K partition under OS/MFT-2, runs in about 5% of real time for the
application described above. A large portion of this
time is spent in the GPSS scheduling routine.
The problem of how to synchronize event scheduling
is an especially important one in a microscopic network
simulation model. Conway6 has presented some guidelines for choosing between the two standard methods of variable time (or next most imminent event) and fixed increment timing routines. GPSS provides only the variable timing mechanism, which may be a drawback in a traffic network simulation model, where the event densities would lead one to try fixed increment timing.
Working with G. Siegel, the author is currently implementing a mixed timing method in SIMSCRIPT-II. This will dynamically change the synchronization method from fixed increment to variable increment timing as the simulation proceeds. This should make the entire run more efficient than if either variable or fixed timing were used alone.
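A minimal sketch of such a mixed timing routine in Python (the density threshold, step size, and event encoding are invented; this is not the SIMSCRIPT-II implementation):

import heapq
from itertools import count

def run(events, horizon, dense_threshold=4, step=1):
    # Next-most-imminent-event timing while the event set is sparse,
    # fixed-increment stepping while it is dense.
    seq = count()
    heap = [(t, next(seq), act) for t, act in events]
    heapq.heapify(heap)
    clock = 0
    while heap and clock < horizon:
        pending = sum(1 for t, _, _ in heap if t < clock + step)
        if pending >= dense_threshold:        # dense: fixed increment
            clock += step
            while heap and heap[0][0] <= clock:
                _, _, act = heapq.heappop(heap)
                act()
        else:                                 # sparse: jump to next event
            clock, _, act = heapq.heappop(heap)
            act()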
UTS-I was programmed in one man-month after approximately 3 man-months of design. These times are for the general simulator. The application was programmed using the UTS-I system in one man-week after approximately three man-weeks of design and one man-week of data collection. The relatively short time needed to program the specific application indicates the utility of the macro approach to a traffic simulation system.
In conclusion, UTS-I, a general traffic network simulation system, composed of a set of macros and a subroutine package, has been designed and programmed. This system, and other macro systems for modeling the subsystems of our society, should prove to be valuable tools in the analysis and optimization of our environment during the coming decade.
ACKNOWLEDGMENTS
The author wishes to acknowledge the assistance of M. Feldman, who with R. Atwood, S. Prensky and A. Hodges coded UTS-I as part of a Master of Engineering project in the Department of Operations Research at Cornell University. The author also wishes to acknowledge the help of G. Siegel, who is assisting the author in reprogramming UTS-I in SIMSCRIPT-II.
The UTS-I project has had the financial assistance of the Department of Transportation, Federal Highway Administration, Bureau of Public Roads under Contract No. FH-11-6913. The opinions, findings, and conclusions expressed in this publication are those of the author and not necessarily those of the Bureau of Public Roads.
REFERENCES

1 D R DREW
Traffic flow theory and control
McGraw-Hill Inc 1968
2 A M BLUM
A general purpose digital traffic simulator
Document No 680167 Advanced Systems Development Division IBM
3 F BARNES F WAGNER JR
Case study in the application of a traffic network simulation model
Paper presented at the 34th National Meeting of the Operations Research Society of America in Philadelphia 1968
4 T SAKAI M NAGAO
Simulation of traffic flows in a network
Comm ACM 12 No 6 June 1969 pp 311-318
5 HUNTER SHERMAN J REITMAN
GPSS/360-NORDEN a partial conversational GPSS
Digest of the Second Conference on Applications of Simulation December 2-4 1968
6 R W CONWAY
Some tactical problems in digital simulation
Mgmt Sci 10 No 1 October 1962

MARSYAS-A software system for the digital
simulation of physical systems
by H. TRAUBOTH
Marshall Space Flight Center
Alabama

and
N. PRASAD
Computer Applications, Incorporated
New York, New York

INTRODUCTION
New aerospace systems which are likely to be developed in the next decade, such as the Space Shuttle Vehicle and Space Station, will be used for more versatile missions; they will be more autonomous and independent from ground operations; and therefore, their design will be more complex than that of present space flight systems. In order to design these new systems optimally for all mission phases, extensive design analyses, evaluations, and trade-off studies have to be performed before a design can be finalized. This means that many simulations of various degrees of depth have to be run to test all possible mission conditions. Thereafter, the integrated hardware and software systems have to undergo extensive testing and checkout before they are flight ready.
For the Apollo Program, several enormous simulation facilities have been installed at contractor and NASA sites consisting of various types of simulators such as special purpose hardware simulators, flight trainers, and analog, hybrid, and digital computers.1,2 Though as early as 1955 first attempts were made to simulate analog computer block diagrams on a digital computer,3 analog computers and special purpose simulators have played a major role in simulation until a few years ago. Since then, the great flexibility of modern digital computers has been explored in a number of developments of digital simulation languages, particularly for non-real time analyses.4 In order to direct the development of digital simulation languages, a standard language was introduced by the SCi Committee. This language, CSSL, is particularly suitable for the simulation of analog
computer-like block diagrams mixed with FORTRAN subroutines and statements.5 Most of the common digital simulators are pre-compilers which generate FORTRAN code and order the integrator statements automatically so that the numerical integration for each integrator can be performed in the proper sequence. In comparison, the digital simulation system described in Reference 6 interprets linear block diagrams of transfer functions and converts them into a matrix equation whose coefficients are determined by the numerical convolution for each transfer function block.7
A blueprint of the digital simulation system described in this paper was given in Reference 8. This simulation system addresses itself to the engineer who has little experience in simulation and in computer programming and who wants to simulate large physical systems. It could be used for a variety of applications such as design analysis and evaluation, checkout, and malfunction analysis. It could also serve as a storage and retrieval system for models, being a basis for a "model configuration control" system on a central time-shared computer. The development of models is costly, and therefore, they should be utilized by as many people as possible. The language allows a standard description of models and easy modification of already stored models, assuming the physical system is described by multiple input/output blocks or hierarchies of blocks and their interconnections, using names as they appear in engineering drawings. The outputs of the simulation system are not only time-responses but also other analysis data such as stability parameters, frequency response, etc.
During the operation of a system of this magnitude
one has to reckon with certain alterations of the operational features of the system. Therefore, the simulation system's software is designed in a modular form to keep the impact of possible design changes to a minimum.
To provide the additional analysis capability and to
obtain efficient computation speed, analysis techniques
of modern control theory have been employed for the
mathematical foundation of the simulation system. The
differential equations generated from the block diagram
are in the form of vector-matrix state equations, which
do not need to be ordered for their numerical integration.
We named this simulation system MARSYAS which
stands for Marshall System for Aerospace Systems
Simulation. The description of the user language, the
mathematical foundations, and the software structure
can be only briefly given in this paper. However,
separate papers are being prepared to cover these areas
more exhaustively.

SIMULATION CAPABILITY
Before a physical system can be simulated, a model of its functions has to be generated. In most cases, the derivation of a model requires human judgment and, therefore, is a manual process. Only in special cases, as in electrical circuits, does a direct relationship exist between the physical components network and the mathematical description of its functions. In such a case the physical network can be described directly to the computer, which then generates the mathematical model and solves it.9,10 For the design of MARSYAS, a model of the guidance and control system of the Saturn V and a model of the propulsion system of the S-IVB stage were used as test models representative of other space vehicles' electrical, mechanical, and hydraulic systems. The models referred to in this paper represent the continuous and discrete dynamics of physical systems which can be mathematically described by ordinary differential equations, algebraic equations, and logical functions.
The engineer prefers to describe a model by block diagrams because their graphical representation is visually comprehensible. The blocks of the diagram can have multiple inputs and multiple outputs, and a block can contain other blocks within itself, i.e., block diagrams can be built in hierarchies; in other words, they can be nested at several levels. At the lowest level, where the block cannot be broken down further, we call the block an element. We distinguish between linear and non-linear elements. A linear element is represented by a transfer function or, more generally,
by a linear differential equation. A nonlinear element is represented by an algebraic equation, by a logical or switching function, or by a nonlinear differential equation. Figure 1 depicts an example of such a model.

Figure 1-Example of a model described by a block diagram of multiple input/output blocks and a nested block
Elements which are used frequently are called standard
elements and are available in a Standard Elements
List (Table I). This list is not fixed; it can be updated
easily using MARSYAS statements. For infrequently
used special elements, a FORTRAN subroutine can be
submitted. Thus, the block diagram can contain analog
computer elements, transfer functions, algebraic equations, and nonlinear ordinary differential equations.
A block diagram is specified to the computer only by the names of the blocks, inputs, and outputs; by the names and values of the element parameters; and by the unidirectional interconnections of the blocks. The names can be up to 36 characters long so that the
same names as found in engineering documentation
can be used. The model thus described can be stored
permanently in the Functional Data Base (FDB) or an
already stored model can be modified. Only authorized
personnel having the access-key can write into the
FDB, whereas everybody can read out and use models
of the FDB.
For a simulation run, the input signals or excitation
functions into the total model can be pre-stored analytical functions such as exponential, sinusoidal, time
functions or digitized signals recorded on magnetic
tape. These recorded signals may be measured signals
or output signals generated by a previous simulation
run. The outputs of the simulation can be manifold.
Any connection point in the block diagram can be
chosen for obtaining a systems output signal.
The dynamic systems output signals can be plotted

TABLE I-Extract from list of standard elements: for each element class (e.g., transfer-function block BL, constant multiplier CM, adder AD, output slope limiter LM, sample-and-hold SH, multiplier ML, Boolean relays BR0 and BR1, resolver RE) the table gives the block diagram symbol, the number of inputs and outputs, the mnemonic, the input-output relation, and the list of parameters in the order in which they appear in the ELEMENTS statement

ENGINEER-ORIENTED LANGUAGE
The language is designed so that the user transmits to the computer only information which is essential to describe the model and specify the simulation run, and does not concern himself with programming the computer. If an engineer has no knowledge of modelling, he can call up a pre-stored model and run a simulation after specifying only the model input signals. On the other hand, if the engineer has FORTRAN programming knowledge, the language gives him some capability for special control of the simulation.
The MARSYAS language is divided into modules
which describe independent functions of the simulation.
These language modules are the:

Description Module,
Modification Module,
Simulation Module,
Continuation Module,
Post-Processing Module, and
Analysis Module.

Within these modules, several MARSYAS statements are available. Statements are written in free
format and need no special ordering. However, the
ordering of the modules has to follow simple logical
rules, e.g., a Simulation Module has to be preceded by
a Description Module because the model must be described before it can be simulated. A statement consists
of an 'operator' and an 'argument field' which is composed of several arguments.
Figure 2-MARSYAS program of the example in Figure 1 (it is assumed that model GIMBAL-3 is stored in the functional data base). The program MARSYAS PROGRAM X describes MODEL: CONTROL SYSTEM-X 1 with INPUTS: X, Y and OUTPUTS: HORIZONTAL, VERTICAL, using ELEMENTS, SUBMODEL (ACTUATOR-STAGE 1-3; GIMBAL-3), and CONNECT statements, and then runs SIMULATE: CONTROL SYSTEM-X 1 with INITS, EXCITE (X, FSTEP, 5.0: Y, FSIN, 1, 3000, 0), STOP IF, TIME .GT. 2.00, PRINT, and FORMAT (FMT, 4F13.8) statements

The Description Module is used to describe a model given in the form of a block diagram. It is headed by the operator MODEL. The ELEMENTS-statement contains the name of the element, the type of Standard Element (to be found in the List of Standard Elements), and the parameters. The parameters are written in the

proper format for the particular element type and are either the numerical values or names. The numerical value of a named parameter is given by the PARAMETER-statement. If the element is not in the Standard Element List, a FORTRAN subroutine carrying the same name as the element is given. A model stored in the Functional Data Base (FDB) is called by a SUBMODEL-statement containing the embedded model name and its input and output names. The CONNECT-statement lists strings of inputs and outputs of elements, submodels, system inputs, or system outputs to be connected to form the model block diagram. For elements or submodels having a single input/output, only the name of the element or submodel appears in the CONNECT-statement. The INPUT-statement designates names of the inputs of the model, and the OUTPUT-statement designates names of the outputs of the model. The STORE-statement carrying
the proper key-code transfers the Description Module into the permanent FDB. Thus, the Description Module can be used for storing and retrieving of models as well as for describing the model for a subsequent simulation run. Figure 2 illustrates the MARSYAS program for the example in Figure 1.
The Modification Module allows inserting, deleting, and disconnecting of elements and submodels through the use of the SUBSTITUTE-, DELETE-, and DISCONNECT-statements. The UPDATE-statement allows additions to the Standard Elements List. Statements of the Description Module are used to specify the elements, parameters, and interconnections which are to be modified. The Modification Module can be used for modifying models of the FDB or for a subsequent simulation.
The Simulation Module is used to define the course of the simulation. The INITS-statement specifies initial conditions for the outputs of blocks. The EXCITE-statement tells the MARSYAS processor what excitation functions or record tapes are fed into which system inputs of the model. If a numerical integration method other than the standard method is to be used, the INTMODE-statement specifies the integration method and the relative truncation error or integration step-size (for fixed step-size integration methods). The STOP or HOLD statements determine the condition under which the simulation should terminate or hold, e.g., if a certain time or certain amplitudes of certain output signals have been reached. If the simulation calls for the repetition of a simulation run with modified parameters, the CHANGE statement specifies these parameters and their values.
The Continuation Module is used to re-start a simulation run that has been terminated by a HOLD statement. This module is particularly useful when the user wishes to insert check points in a lengthy simulation run at which he can obtain intermediate outputs. (He can also change the integration mode at these check points.) Upon these outputs he can decide whether it is worthwhile to continue the run.
In the Post-Processing Module the user indicates which output signals he wishes to print or plot and the format and labels of the output. The FORMAT-statement resembles the FORMAT statement in FORTRAN.
The Analysis Module allows the user to designate the
type of analysis output he wishes, e.g., the FREQUENCY RESPONSE-statement calls for the frequency response within the specified frequency range.
Other analyses performed by the system are the determination of steady-state response, power spectrum,
stability, and sensitivity for which special statements
are available.


MATHEMATICAL FOUNDATION
Analytical formulation

In the formulation of the mathematical process which converts the block diagram into an internal format acceptable to the computer, we distinguish between three parts of the model: (1) the "dynamic" elements, (2) the "non-dynamic" elements, and (3) their interconnections. The output of a "dynamic" element is a function of all past input, while that of a "non-dynamic" element depends only on the instantaneous input. The 'constant multiplier' (or ideal amplifier) and the 'summer' are linear "non-dynamic" elements. The linear elements 'time-delay', 'sample-and-hold', and 'differentiator' are treated as pseudo-non-linear elements. For explaining the mathematics it is assumed that the model consists of interconnected "dynamic" and "non-dynamic" elements of various types but of no nested submodels. By some software processing the MARSYAS processor has already unwrapped these nested submodels.
The linear "dynamic" element 'transfer function' is
characterized by the following relationship:

(1)

with ak and bk being constant coefficients and n the
order of the differential equation (or number of poles
in the complex frequency domain). It is assumed that
q < n. If n = q, the 'transfer function' element can
simply be split into one with a < n and one 'constant
multiplier' and 'summer' element. oCt) is the output
signal and i(t) the input signal of the element. Using
the method as described in References 11 and 12 this
differential equation can be converted into a state variable matrix equation. For the jth element we obtain
$$\dot{X}^{(j)}(t) = A^{(j)}X^{(j)}(t) + P^{(j)}i^{(j)}(t) \qquad (2a)$$

$$o^{(j)}(t) = C^{(j)}X^{(j)}(t) \qquad (2b)$$

where

$$X^{(j)}(t) = \begin{bmatrix} x_1^{(j)}(t) \\ \vdots \\ x_n^{(j)}(t) \end{bmatrix}$$

is a state vector of the jth element, and $A^{(j)}$, $C^{(j)}$, and $P^{(j)}$ are constant real matrices of the dimensions $n \times n$, $1 \times n$, and $n \times 1$, which can be obtained from the $a_k$ and $b_k$ of equation (1); $A^{(j)}$ is in companion form, with ones on the superdiagonal, and

$$C^{(j)} = [1 \;\; 0 \;\; 0 \;\cdots\; 0] \qquad (2c)$$

For a collection of m "dynamic" elements we can write

$$\dot{X}(t) = AX(t) + PI(t) \qquad (3a)$$

$$O(t) = CX(t) \qquad (3b)$$

where

$$X(t) = \begin{bmatrix} X^{(1)}(t) \\ \vdots \\ X^{(m)}(t) \end{bmatrix}, \qquad I(t) = \begin{bmatrix} i^{(1)}(t) \\ \vdots \\ i^{(m)}(t) \end{bmatrix}, \qquad O(t) = \begin{bmatrix} o^{(1)}(t) \\ \vdots \\ o^{(m)}(t) \end{bmatrix}$$

and A, P, and C are the block-diagonal matrices

$$A = \operatorname{diag}(A^{(1)}, \ldots, A^{(m)}), \qquad P = \operatorname{diag}(P^{(1)}, \ldots, P^{(m)}), \qquad C = \operatorname{diag}(C^{(1)}, \ldots, C^{(m)}).$$
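As a concrete illustration of this block structure (a numpy sketch, not part of MARSYAS; the element matrices are supplied by the caller), the composite A, P, and C of equations (3a) and (3b) are simply block-diagonal stacks of the per-element matrices:

import numpy as np

def block_diag(mats):
    # Place each matrix on the diagonal of one larger zero matrix.
    rows = sum(m.shape[0] for m in mats)
    cols = sum(m.shape[1] for m in mats)
    out = np.zeros((rows, cols))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r += m.shape[0]
        c += m.shape[1]
    return out

def assemble(elements):
    # elements: list of (A_j, P_j, C_j) triples, one per "dynamic" element.
    A = block_diag([e[0] for e in elements])
    P = block_diag([e[1] for e in elements])
    C = block_diag([e[2] for e in elements])
    return A, P, C

# Two first-order elements (simple lags), for example:
e1 = (np.array([[-1.0]]), np.array([[1.0]]), np.array([[1.0]]))
e2 = (np.array([[-2.0]]), np.array([[2.0]]), np.array([[1.0]]))
A, P, C = assemble([e1, e2])   # A = diag(-1, -2), and so on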

We now assume that the "dynamic" elements are connected in any way through 'constant multipliers' and
'summers' to form a linear model. We then can write
the following linear interconnection matrix equations:
$$I(t) = E\,O(t) + F\,U(t) \qquad (4a)$$

$$W(t) = G\,O(t) + H\,U(t) \qquad (4b)$$

In the above

U(t) = vector of inputs (excitations) into model
W(t) = vector of outputs from model
m = number of "dynamic" elements in model
k = number of inputs into model
l = number of outputs from model

and E, F, G and H are matrices having appropriate dimensions.
The coefficient $e_{ij}$ in E means the total constant gain along the path from the output $o_j$ of "dynamic" element j to the input $i_i$ of "dynamic" element i. The coefficient $f_{ij}$ in F is the total constant gain along the path from the model input $u_j$ to the input $i_i$. The coefficient $g_{ij}$ in G is the total constant gain from the output $o_j$ to the model output $w_i$. The coefficient $h_{ij}$ in H is the total constant gain from the model input $u_j$ to the model output $w_i$.
There may be linear systems that do not have the interconnection equations (4a and 4b), but wherever possible the MARSYAS processor generates these interconnection equations.
By substituting equation (4a) into equation (3a)
and equation (3b) into equation (4b) we obtain the
model overall matrix equations:

$$\dot{X}(t) = A^{*}X(t) + P^{*}U(t) \qquad (5a)$$

$$W(t) = C^{*}X(t) + D^{*}U(t) \qquad (5b)$$

where

$$A^{*} = A + PEC, \qquad C^{*} = GC, \qquad P^{*} = PF \quad \text{and} \quad D^{*} = H. \qquad (5c)$$

We now include nonlinear elements of the form

$$Y(t) = F(R(t)) \qquad (6)$$

together with the nonlinear interconnection matrix equations

$$I(t) = E\,O(t) + F\,U(t) + K\,Y(t) \qquad (7a)$$

$$R(t) = E'O(t) + F'U(t) + K'Y(t) \qquad (7b)$$

where the vectors Y(t) and R(t) represent the collection of the output vectors and input vectors of all nonlinear elements of the model. The output vector W(t) for the model becomes

$$W(t) = G\,O(t) + H\,U(t) + K''Y(t)$$

and

$$\dot{X}(t) = A^{*}X(t) + P^{*}U(t) + N(O, U, t) \qquad (8a)$$

$$W(t) = C^{*}X(t) + D^{*}U(t) + M(O, U, t) \qquad (8b)$$

with $A^{*}$, $P^{*}$, $C^{*}$ and $D^{*}$ being the same matrices as in equation (5c). $N(O, U, t)$ and $M(O, U, t)$ are the nonlinear column vectors $(n_1(O, U, t),\ n_2(O, U, t),\ \ldots,\ n_m(O, U, t))$ and $(m_1(O, U, t),\ m_2(O, U, t),\ \ldots,\ m_l(O, U, t))$ respectively. It is assumed here that there are no "algebraic loops" in the model. An overview diagram of the mathematical process is given in Figure 3.

Figure 3-Overview diagram of the mathematical process converting the block diagram of linear "dynamic", "non-dynamic", and nonlinear elements into the overall model matrix equations

The matrices $A^{*}$, $P^{*}$, $C^{*}$, and $D^{*}$ are characteristic for a linear model and they can be used for a number of analyses.
The stability of the system may be determined from a knowledge of the eigenvalues of matrix $A^{*}$, i.e., all
Remote Real-Time Simulation

Figure 4-Throughput decrease vs. RTS core requirements. Since RTS replaces RJE packages requiring 10K(8), the curve passes through zero decrease at that point.

TABLE III-Results of 6600 Job Mix Sample Throughput Tests

                     Run Time (minutes)                    CPU           Throughput Degradation
Test        1st Shift Job Mix   24-Hour Job Mix   Utilization   1st Shift Job Mix   24-Hour Job Mix
Condition   (Case A)            (Case B)          (Case B)      (Case A)            (Case B)
1           33.55               75.28             60.4%         --                  --
2           43.00               94.50             50.1%         21.0%               25.6%
3           45.43               100.75            47.6%         27.8%               33.7%
4           47.50               113.00            73.1%         33.6%               50.1%
5           55.30               131.40            83.6%         54.6%               74.6%
6           61.80               142.22            78.5%         74.0%               88.9%
7           41.65               103.30            N/A           17.4%               37.2%

Figure 5 is plotted for a particular configuration of the 6600 system; i.e., no Export/Import operation, RTS assigned a dedicated PP, and RTS assigned 40K(8) memory locations (which leaves 300K(8) for batch work). The CPU time is also assumed to be required on a 20 millisecond interrupt basis. However, the results obtained from case 7 are included. Since the real time job was assigned only 20K(8) for case 7, and there is good correlation between the results of case 7 and the curves of Figure 5, Figure 5 is probably applicable over a wide range of real time job core requirements.
Figure 6 shows the average total 6600 CPU usage (in percent) by both the batch jobs and the real time simulation while the Job Mix Sample tests were running. Ideally, if every job used the CPU only when no other job needed it (i.e., during the other jobs' I/O wait periods), then 100% CPU utilization could be attained. As this figure shows, while there has been a dramatic improvement in CPU utilization due to the introduction of RTS, there are still occasions when, because of I/O or memory conflicts, no job is ready to use the CPU.
One unexpected result of the throughput tests was
the amount of CPU overhead we experienced in getting
the real time simulation on and off the CPU. In theory
the "on" delay is not accountable as real time job CPU
time and the "off" delay, which is accountable, averages
about 300 microseconds each time the real time job
releases the CPU. The accountable CPU overhead
measured during the throughput tests averaged 900
microseconds per frame. We have not been able to
identify the source of this overhead.

Figure 5-Throughput decrease vs. RTS CPU requirements, plotted against the real time simulation CPU requirement (%). Solid lines refer to first shift conditions, dashed lines to the 24-hour case.

Figure 6-Average CPU utilization vs. RTS CPU requirement, for the first shift job mix and the 24-hour job mix. Although percent utilization has increased considerably, it falls short of the theoretically possible 100%. That goal can be attained only if the queue managing algorithms are continuously adjusted to eliminate the condition where, because of I/O waits or memory conflicts, no job is ready to use the CPU.

Using the results

Knowing the extent to which a real time application will affect the batch throughput on a central facility is one thing, but determining whether the operation could tolerate the degradation is quite another. A host of variables, such as "spare" capacity on the central processor, the percentage of the time during which the real time job is inactive ("STANDBY"), relative economies and flow time associated with alternate approaches, as well as the importance of the real time application, make sweeping generalizations rather difficult. It is possible, however, to identify several situations where the remote application concept would be an attractive solution to a real time computing requirement.
A central facility operating at less than 85% or 90% capacity (critical resource capacity) is, of course, an excellent candidate for implementation of the remote simulation capability. Once a production facility is operating above that level, the relative merits of batch
vs. real time work come into play; but several conditions could result in the real time job taking precedence. Some of these conditions might be:


1. Limited duration of the real time application

a. to support a "crash" effort (e.g., short-deadline
proposal)
b. to conduct a one-shot series of tests
c. to allow time for purchase of additional computing equipment.
2. Availability of "outside" facilities to run batch backlog created by the real time application.
3. Restricting the real time application to certain hours
of the day.
Special test facilities or hybrid simulation labs that
are currently in the planning stages can quite easily be
structured to take advantage of the remote simulation
concept, since remote terminal processors that can
double as special purpose stand-alone digital computers
are available. In a lab such as this, much of the real
time application work would be accomplished using
the remote terminal processor only, but the speed and
power of a large central processor would be available
for those applications that required additional computing capacity.

Figure 7-Typical strip chart recordings from the simulation. Roll and pitch commands are generated by a "joystick" and transmitted to the central 6600. Other traces represent results of the 6600 computation transmitted back to the remote site.

CONCLUSION
A successful demonstration of remote real-time simulation was carried out in January, 1970. The demonstration consisted of "flying" the airplane simulation
discussed in this report from the developmental center
(location of the 1700) in Seattle, Washington. The
math model was implemented on a 6600 in the Boeing
Renton facility and communication between the 1700
and 6600 was via 48 KHz lines. Each mode of operation
discussed earlier was shown to be operational. Figure 7
is a typical set of strip chart recorder traces obtained
during the demonstration. Traces of the pitch and roll
commands as well as computed roll, pitch, sideslip
angle, and angle of attack are shown. This technique is
presently being considered as the primary simulation
tool for possible forthcoming work such as the B-1
effort.
The remote simulation concept, therefore, appears to
offer important new possibilities in the design of new
simulation facilities and in the manner of utilization of
existing ones.
ACKNOWLEDGMENTS
Messrs. A. Ayres and Dennis Robertson, of the Boeing
Company, coded the mathematical model for the CDC
6600 central computer.


Mr. Leo J. Sullivan, assisted by Mrs. Roberta
Toomer, both of Control Data Corporation, coded the
software for the CDC 1700 remote terminal and the
CDC 1500 analog-digital link. Mr. Sullivan and one of
the authors (O. Serlin) performed the actual checkout
of the system.
Mr. Robert Betz, of Boeing, performed the throughput tests on the 6600 and assisted in analyzing the
results.
Of the authors, O. Serlin is responsible for the conceptual design of the software, as well as the coding of
the 6600 central control program, operating system
modifications, and PPU program. R. C. Gerard planned
and analyzed the throughput testing experiments, conducted the demonstration, and coordinated the Boeing
Company's efforts on this project.


Many other persons contributed to the project in
other than the technical areas. Among these, Messrs.
John Madden, of Boeing, and Tim W. Towey, of
Control Data, deserve special recognition.

Real time space vehicle and ground support
systems software simulator for launch
programs checkout
by H. TRAUBOTH
Marshall Space Flight Center
Huntsville, Alabama

and
C. O. RIGBY and P. A. BROWN
Computer Sciences Corporation
Huntsville, Alabama

INTRODUCTION
The launch of a Saturn V vehicle is preceded by a
complicated chain of checkout operations, which involve
a large system of checkout and launch equipment in
the Launch Control Center and in the Mobile Launch
Facility at the Kennedy Space Center to assure the
integrity of the flight systems. This checkout and
launch system consists of manual checkout panels,
ground support equipment (GSE), telemetry stations,
data links, and two RCA-110A launch control computers. Commands initiated in the Launch Control
Center are transferred by these computers to the launch
vehicle under checkout. The computer sends out stimuli
and receives responses which are evaluated based on
predicted values stored in the computer memory.
Sending out stimuli and monitoring the responses is
done in a controlled sequence by test programs residing
in the launch computers. These test programs must be
thoroughly checked out before they are allowed to run
at the launch facility. The rigid testing of the launch
computer programs is done at simulation facilities which
imitate as closely as possible the environment of the
launch computers, i.e., the functions of the vehicle, of
the GSE, and of the checkout system. Most of the
checkout is done with hardware simulators such as the
"Saturn V-Breadboard" which uses partly actual flight
hardware and simulates certain mechanical and hydraulic equipment by electrical circuits.
In order to aid the checkout engineer in the design
and evaluation of test programs, two major software
simulators have been developed by MSFC. These software simulators simulate the on-off functions of discrete
networks by evaluating large sets of Boolean equations
including discrete time-delays for pickup and dropout
of relays, valves, etc. They evaluate the equations in
non-real time and are driven by pre-determined sequences of states of switches and stimuli as generated
by test programs.
More than three years ago, a joint effort between the
Astrionics Laboratory and Computation Laboratory
began to define a simulation system in which a digital
computer would simulate in real-time the vehicle and
GSE functions in response to stimuli sent from the two
launch computers. The objective of this project was to
find a new way to perform major functions of the
Saturn V-Breadboard with a more flexible digital computer, so that RCA-110A launch computer programs
could be checked out, test programs could be evaluated,
and the effect of malfunctions could be investigated
without having to use and possibly damage expensive
hardware. The primary design objectives were to insure
that:
1. The simulator would act in such a way that the test
programs of the two launch computers would think
they were working with the actual vehicle and GSE
in real-time.
2. The prime portion of the simulator, the software,
should be structured in such a way that no reprogramming would be necessary when a configuration
other than Saturn V had to be simulated, as long
as the hardware components could be described by
the same nomenclature for the data base.


Figure 1-Saturn V launch computer complex configuration (the launch complex computers, in the blockhouse and on the mobile launcher, exchange discrete signals with the vehicle: up to 2040 discrete outputs (DO) and up to 3024 discrete inputs (DI))

3. The engineer would be provided with the capability of communicating directly with the computer when using the simulator.
It was determined that the feasibility of this approach could best be demonstrated by using the Saturn
V configuration as a test bed.
The emphasis of this paper is on describing the software of the simulator. The operation of the simulator
facility and the form of the mathematical models which
are input into the computer are described in detail in
Reference 5. However, to understand the structure and
problem areas of the software, it is necessary to also
understand the configuration of the hardware.
SCOPE OF SIMULATION
A simplified diagram of the Saturn V-Launch Computer Complex configuration is shown in Figure 1.
The launch computers in the Launch Control Center
(LCC) and in the Mobile Launch Facility (MLF) send
out discrete signals (up to 2040 "Discrete Out or DO")
to the Vehicle through the GSE. The sequence and
addresses of these signals are determined by the test
programs. The vehicle then sends discrete and analog
responses (measurements) back to the computers. Most
of the discrete measurements (up to 3024 "Discrete
In or DI," i.e., open/closed relay contacts, valves,
switches, or gates) are fed through the GSE, while all
the analog measurements and a few digital measurements are transmitted through the digital data acquisition system (DDAS) or telemetry system into the
DDAS Computer Interface. The DDAS is the whole
collection of equipment which lies between the sensors
and the DDAS Computer Interface, i.e., per vehicle

stage a transmitter; a line driver and receiver; and a
digital receiver station. The transmitter consists of a
scanner, digital and analog multiplexer, A/D converters,
generator of identification codes, and modulators; the
line driver and receiver contain amplifiers and demodulators. The digital receiver station converts the
de multiplexed measurement information into synchronized data words and address words and sends them to
the Computer Interface. The Computer Interface is
mainly a digital memory that can store up to 8192
words, and a special controller which stores one measurement word every 278 μsec in proper sequence and
which allows the launch computers to retrieve stored
data at random under several modes. The data are
stored in the Interface according to their identification
number containing the stage, channel, frame, multiplexer, and master frame numbers. The controller insures that the data request from the RCA-110A com-
puter is properly decoded to find the requested measurement. For a measurement which has to be scanned at
a higher rate, the scanner, moving with constant speed,
accesses the sensor several times, and therefore, stores
it at several places. The RCA-110A computer can
access the data in the Interface in several modes, e.g.,
the request can be synchronized with the incoming
data, or locked at a specific measurement. Up to 4500
DDAS measurements can be handled by the Interface.
The launch computers themselves are connected
through a data link for exchange of information. The
test conductor controls the launch checkout through
the Saturn V-display which is driven by a smaller
DDP 224 computer. The coordination of the many
test programs, display programs, and control programs
for the peripheral equipment (printer, card reader, etc.)
is done by the RCA-110A Operating System.
The simulator performs the functions of the equipment shown in the upper portion of Figure 1.
Figure 2-Real-time simulator systems configuration (the simulator, in place of the vehicle and GSE, connected to the launch complex computers)


Figure 3-Example of a typical discrete/analog circuit (relay logic driven by valve open/close commands from the launch computer, a pressure transducer, an analog time function E4001(T), and the combined logical/analog equations for the analog and discrete signals returned to the computer)

The hardware portion of the simulator comprises the SDS-930 digital computer with 32K words of core memory
and its peripheral equipment and bulk memory devices,
and a special purpose interface (SDS Interface) which
is similar in size to a small computer (Figure 2). This
interface performs the functions of the DDAS Computer Interface but does not contain a memory, since
a portion of the SDS-930 memory is dedicated to these
functions. The SDS Interface contains counters, special
registers, and controllers which enable the two launch
computers to communicate with the SDS-930 computer
in the same modes as in their actual launch complex
environment.

Data base

The functions of the vehicle and its ground support
equipment as seen by the test programs can be described by large sets of logical equations and by analog
time functions which are described by polynomials or
tables. The logical or Boolean equations are time-dependent in the sense that they consider pick-up and
drop-out time as a time-delay. The logical equations
consist of AND and OR terms (* and +) and negations
of a single variable (-). Special relays such as lock-out,
lock-up, and latching relays can be expressed by
equivalent circuits of regular relays. Figure 3 shows a
simplified example of a typical discrete/analog circuit.
For more detailed information see References 5 and 7.
There are basically two types of equations possible:

E(i) = Y11(i) * Y21(i) * ... * YK1,1(i)
     + Y12(i) * Y22(i) * ... * YK2,2(i)
     + ...
     + Y1a(i) * Y2a(i) * ... * YKa,a(i)

where

Ypq(i) = (-)Zpq(i) (Ppq(i), Dpq(i))   or   Ypq(i) = (-)Zpq(i) //Ppq(i), Dpq(i)//

Ppq(i) = pick-up time (amplitude) of element Zpq(i)
Dpq(i) = drop-out time (amplitude) of element Zpq(i)

and i, p, q, and a are unlimited index integers 1, 2,
3, ..., i.e., the number of equations, OR-terms, and
AND-terms is not limited. Pick-up time for a relay
means the time between activation of the coil and the
closure of an associated contact. Drop-out time is the
time between deactivation of the coil and opening of
an associated contact. Generally, pick-up time is the
time-delay between cause and effect, and drop-out
time the time-delay of the reverse action. Or mathematically, if t1 is the time instant of activation (deactivation), the discrete variable Zpq(i) is
Zpq(i) = 0 for time t < t1 + Ppq(i)   (t ≥ t1 + Dpq(i))

For a relay which has a pick-up/drop-out time of
less than 10 msec the time-delay is ignored because
the delay does not have an effect on slower mechanical
devices. Thus, relay races between fast relays cannot
be simulated, and it is not intended to detect them
because the test programs do not check for them. For
most relays, the bracket term can be deleted.
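To make the pick-up/drop-out convention concrete, the following sketch (written in Python purely for illustration; the class and variable names are invented and this is not the MSFC implementation) evaluates one delayed element Zpq(i) according to the rule above:

    # Illustrative sketch only: a relay-like element with pick-up and
    # drop-out delays, per the rule Zpq(i) = 0 for t < t1 + Ppq(i).
    class DelayedElement:
        def __init__(self, pickup, dropout):
            self.pickup = pickup    # delay from coil energize to contact close
            self.dropout = dropout  # delay from coil de-energize to contact open
            self.coil = 0           # present coil command (0 or 1)
            self.t1 = 0.0           # time instant of the last coil transition
            self.z = 0              # present contact state Zpq(i)

        def drive(self, value, t_now):
            # record the activation (deactivation) instant t1
            if value != self.coil:
                self.coil, self.t1 = value, t_now

        def value(self, t_now):
            # the contact follows the coil only after the appropriate delay
            if self.coil and t_now >= self.t1 + self.pickup:
                self.z = 1
            elif not self.coil and t_now >= self.t1 + self.dropout:
                self.z = 0
            return self.z

    # an OR-term is the AND of such element values (or their negations),
    # and E(i) is the OR of its OR-terms, e.g., E(i) = K1 * -K2:
    k1 = DelayedElement(pickup=0.020, dropout=0.050)
    k2 = DelayedElement(pickup=0.015, dropout=0.040)
    k1.drive(1, t_now=0.0)
    k2.drive(1, t_now=0.0)
    e_i = (k1.value(0.1) and not k2.value(0.1)) or 0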
The value of a logical variable may depend on the
amplitude of an analog value, e.g., the pressure in a
line, instead of a time delay. Then we write the terminators //...// instead of parentheses ( ).
The discrete variables can have different physical
meanings. We distinguish between a "DO" (Discrete
Output signal from RCA computer), "DI" (Discrete
Input signal to the RCA computer), digital DDAS,
manual switch, power bus, and an "IV". An "IV" is
an internal variable which is needed for internal computation when a circuit stores a signal.
Combined logical/analog equation

A logical equation can be combined with an analog
function. However, each analog function can be associated with only one OR-term of the logical equation.
An analog function can be described in eight different
formats such as a polynomial, table, cyclic function,
etc.5 In any case, the analog function is described to
the computer by a one-letter code F designating the
type of format and a variant set of parameters, all of
which are enclosed with the terminators /.../. These
parameters also contain the sampling rate for that
particular analog variable, the maximum and minimum
amplitude limits, the maximum time, and the period
of the time function. The general form of a combined
logical/analog equation is:

A(i) = logical OR-term 1 /F, analog function parameters 1/
     + logical OR-term 2 /F, analog function parameters 2/

where A(i) is an analog value.
The interpretation of this equation is as follows:
Assuming that only one OR-term generates a "1" at
one time (exclusive OR-terms), then the analog function
of that particular OR-term is evaluated at the specified
sampling rate.

The data base for the total Saturn V including the
GSE amounts to about 17,000 equations of various
length. (See Table I.)
If systems other than Saturn V are to be simulated,
only another data base has to be established; no reprogramming is necessary as long as the functions of the
physical system can be described by the same types of
equations. The data base is initially set up via the card
reader; modifications to it and control commands for
the simulation are input via teletypewriter.

TABLE I-Magnitude of Equations

Discrete equations: 2429 in all, distributed over DO (247), Switch (557), DI (791), and the bus, IV, and DDAS discrete categories (69, 186, and 579 among them)
Analog equations: 505
Discrete and analog equations: 2934 in all; variables: about 10,000
Max. computer capability: about 20,000 variables in all (category capabilities of 1625, 4032, 1000, 6048, 3000, 3000, and 3000); capability for DDAS (analog): 4000

Length of equations (rough estimate): about 40% smaller than 10 OR-terms, about 35% between 10 and 100 OR-terms, about 20% larger than 100 OR-terms, and some up to 1000 OR-terms.

SIMULATOR SOFTWARE

General

Figure 4-General flow of simulation processor (data flow and logic flow)

The software for the simulator can be divided into
three major areas: (1) Interface Support Software,
(2) Simulation Processor, and (3) Simulator Diagnostics.
The Interface Support Software controls the input into
the DDAS-tables of the SDS-930 core memory and the
output from it to the RCA-110A computer supported
by the hardware of the SDS Interface. It also controls
the transmission of the DO's, DI's, the analog values,
various counters, and clocks. The Simulation Processor
generates the data base in the computer from the card
input, evaluates the equations during the simulation
run, and controls the selective print-out of the simulation results. The Simulator Diagnostics checks all
hardware units of the SDS Interface such as counters,
data link control signals, and data transfer registers,
and the communication between the SDS-930 and
RCA-110A computers. The diagnostics check especially
for critical timing.

The design of the software is modular so that modifications can be made relatively easily. The total software
excluding the simulator diagnostics consists of approximately 315 subroutines and 26,000 instructions. These
figures should give a feel for the magnitude of the
software effort.
Interface support software

As stated previously, the simulator must provide to
the launch complex computers the same data input
values as those provided by the launch vehicle, the
Ground Support Equipment, manual checkout panels,
and mechanical and pneumatic systems at the Saturn V
Launch Computer Complex. Additionally, the simulator
must accept input data from the launch complex computers and provide the necessary stimuli to them.
These stimuli, in the form of Discrete Out (DO) signals,
Discrete In (DI) signals, and DDAS signals, are provided
by the interplay between the Interface Support Software and the SDS Interface.
Each discrete signal (DI and DO) is represented in
fixed locations of the SDS-930 memory by the presence
or absence of a signal bit in the DI- and DO-Status
Table, thus allowing twenty-four discretes to be represented in each twenty-four-bit computer word.
Software is also provided to support the input/output
requirements to Direct Access Communication Channels, Time Multiplexed Channels, etc., for the storage
and retrieval of data from mass storage and the recording of data on magnetic tape.

Commutated DDAS data must be provided to both
RCA-110A checkout computers. This commutated
data must be in the same format as the data provided
by the Launch Computer Complex. This allows
the Simulator to provide information for the RCA-110A
through the commutation processing of the SDS Interface. The DDAS data represent both analog and discrete
data. The analog data is represented in ten bits of a
twenty-four-bit SDS-930 computer word. There are
ten discretes represented in each SDS-930 computer
word, with each discrete being represented by the
presence or absence of a single bit.
DDAS simulation requires three DDAS memory
tables within the SDS-930 computer for use by the
Simulator and the SDS Interface (Figure 5). The
DDAS data words which are the results of the evaluation of the combined discrete/analog equations are
stored in a block of memory of the SDS-930 computer
which is referred to as the DDAS Commutation Data
Table. The address of this data word is stored in the
DDAS Commutation Address Table according to the
sampling rate required for this measurement. As a
final step in the commutation process, the data words
are stored by the SDS Interface in the appropriate
locations in the DDAS Commutation Output Table
where they can be accessed by either RCA-110A computer via the SDS Interface.
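The interplay of the three tables might be pictured as follows (a Python sketch under assumed names and a simplified scan loop; the actual stepping is done by the SDS Interface hardware, one word every 278 μsec):

    # Illustrative sketch of the three DDAS memory tables.
    commutation_data = {}       # measurement id -> 10-bit result of equation
                                # evaluation (DDAS Commutation Data Table)
    commutation_address = []    # scan schedule: ids repeated in proportion to
                                # their sampling rate (Commutation Address Table)
    commutation_output = {}     # slot -> data word readable by the RCA-110A
                                # (DDAS Commutation Output Table)

    def store_result(meas_id, value):
        commutation_data[meas_id] = value & 0x3FF     # keep ten bits

    def commutate_once(slot_of):
        # one full scan: copy every scheduled measurement into the output
        # table slot derived from its identification number
        for meas_id in commutation_address:
            commutation_output[slot_of(meas_id)] = commutation_data.get(meas_id, 0)

A measurement scanned at a higher rate would simply appear several times in the address table, mirroring the real scanner storing it at several places.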


In Phase 1 of the Pre-Simulation Phase, equation cross reference information is generated and
output to magnetic tape for use in later processing.
Phase 2 of the Pre-Simulation Phase completes the
development of the Cross Reference File (Figure 7)
and the Transfer Equation File. This is accomplished
by merging the cross reference information with all
related equations and generating the Cross Reference
File and the Transfer Equation File of the updated
Master Equation Tape (Figure 7).

Figure 8-DDAS assignment file

A ← (B > C).D + (B <= C).(-D) places the contents of D into A if B is greater
than C; otherwise -D is placed in A.
Of particular note is the fact that if the variable on
the left of the assignment arrow refers to a picture
Plane, the result is stored in all the points within the
picture; if a Plane is mentioned to the right of the
arrow, the execution utilizes the corresponding picture
data for all the points in the picture in the execution
of the statement. This capability, coupled with the capacity to loop within a Process, provides the system
user with a very powerful on-line programming capability for processing picture data. This routine provides
a happy compromise between the flexibility of an interpreter and the speed of executable code.
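In present-day terms, the plane-wide effect of such a statement is an element-wise conditional over whole picture arrays; the following numpy fragment is only an analogy, not Picturelab code:

    import numpy as np

    # analogy to A <- (B > C).D + (B <= C).(-D) applied to a whole Plane:
    # every point of A receives D or -D according to the point-wise test
    B = np.array([[5, 1], [3, 9]])
    C = np.array([[2, 2], [3, 3]])
    D = np.array([[7, 7], [7, 7]])
    A = np.where(B > C, D, -D)        # A == [[ 7, -7], [-7,  7]]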


SUMMARY
To summarize, Picturelab is an interactive system intended for research in processing digitized pictures. The system contains a Symbol Table for symbolically handling variable information, a Data Area
for in-core storage of pictures to be processed, and
provision for files containing lists of symbolic commands.
System action is directed by commands entered into
a Command Processor. These commands manipulate
the Symbol Table, call into execution picture processing
Routines, or manipulate files of commands. In addition,

Picturelab

the user may gain access to data on secondary storage
through an I/O system which permits reading picture
information into and out of the Data Area.
Visual interaction with the picture information in the
Data Area is accomplished by means of the SIGHT
console and Stromberg Datagraphix Microfilm output.
ACKNOWLEDGMENTS
The Picturelab system is a result of the effort of a
number of people. Not included in the list of authors
are Mr. K. J. Busch who contributed substantially to
the design and Miss K. M. Keller who implemented
most of the Command Processor.

APPENDIX
Sample Picturelab session

This Appendix presents a sample of the use of
Picturelab commands. The operations described below


are shown in Table 1 at the end of the Appendix. Lines
preceded by a star were printed by the system.
A. Three planes are defined to be global variables.
The numbers give the starting bit and number of
bits within a 36-bit word. The masks defined are
then listed.
B. Control is passed to the I/O system, the Data
Area cleared, an input file defined, and the label
on the file is printed. A picture fragment of size
128 x 128 starting at row 641, column 636 is
requested. The picture is read and control returned to the Command processor.
C. A routine applying a constant threshold to the
picture in plane INPLAN is executed with the
results going into plane OUTPLA. Then the
original picture is displayed followed by the
threshold results.
D. A Routine applying an adaptive threshold is executed and the results displayed.
E. Control is passed to the I/O system in order to
save the picture on a file, SAVE THRESH,
labeled "Results of Thresholding". The values
saved are from the plane OUTPLA created by
the adaptive thresholding routine.
F. Then a Process is built containing commands to
perform an adaptive threshold and then display
the results. This is filed under the name ADTHRE.
G. This Process is then executed with a tracing
capability enabled to permit monitoring the flow
of control.


TABLE I-Sample Console Session

* PICTURELAB. COMMAND PROCESSOR. TYPE COMMAND, MAP, ROUTINE OR PROCESS NAME
GLOBAL INPLAN PLANE 31 6
GLOBAL OUTPLA PLANE 15 1
GLOBAL PLDISP PLANE 32 3
PHVAL INPLAN OUTPLA PLDISP
* INPLAN PLANE 31 6 000000000077
* OUTPLA PLANE 15 1 000010000000
* PLDISP PLANE 32 3 000000000034
(A. Define planes)

IOSYS
* I/O SYSTEM
CLEARD
INPUT 21 1
* ENGINEERING DRAWING #1 FROM MH8361
WINDOW 128 128
SUBPIC 641 636
RDPCT INPLAN
PEEL
* COMMAND PROCESSOR.
(B. Enter I/O system, read picture fragment)

THRESH INPLAN OUTPLA 15
DISPLA PLDISP
DISPLA OUTPLA
(C. Apply constant threshold, display)

ADAPT INPLAN OUTPLA
DISPLA OUTPLA
(D. Adaptive threshold)

IOSYS
* I/O SYSTEM
OLABEL "RESULTS OF THRESHOLDING"
OUTPUT SAVE THRESH
WRPCT OUTPLA
PEEL
* COMMAND PROCESSOR.
(E. Save output of adaptive threshold)

BUILD
PARS PLIN PLOUT
10 ADAPT PLIN PLOUT
20 DISPLA PLOUT
FILE ADTHRE
(F. Build a Process to do adaptive threshold)

TRACE ON
ADTHRE INPLAN OUTPLA
* ENTERING ADAPT
* ENTERING DISPLA
* COMMAND PROCESSOR
(G. Execute Process)

PEEL
BYE
(H. Terminate)

 0 THEN 160
PRINT NAME
PRINT
LET A = NAME
GOSUB 120 B
NEXT
LET C = C - 1
RETURN

Output when the procedure is executed would be

"YOUNG"
"JONES"
"MASON"
"ALLEN"
"SMITH"

Figure 12

A terminal oriented data processing language has been
defined and some of the more interesting capabilities
discussed. The language is designed to minimize learning
time. In supporting this concept, all data descriptions
have been removed and are handled procedurally. In
its basic form, the dataBASIC language is a procedural
language for on-line record storage, retrieval, and display. Additional capabilities such as field name and
field value manipulation, conditional file restoring, and
recursive subroutines are provided for the more proficient user. This paper has discussed only those aspects
of the dataBASIC language which are visible to the
user. Underlying data structure and procedure7 required
to implement the language capabilities, including on-line update and retrieval and data descriptors defined
during execution of the program, are subjects we leave
for discussion at a later date.

REFERENCES
1 Basic language reference manual
IPS-202026A Rev Information Services Department
General Electric Company January 1967
2 C W BACHMAN
Data structure diagrams
Data Base Quarterly of SIGBDP Summer 1969
3 C W BACHMAN
The DataBASIC System-A direct access system with dynamic
data structuring capabilities
Internal Paper 1970
4 C W BACHMAN S B WILLIAMS
A general purpose programming system for random access
memories
Proceedings of FJCC San Francisco California October 1964
5 G G DODD
APL-A language for associative data handling in PL/I
FJCC 1966
6 Data base task group report
Made to the CODASYL Programming Language Committee
October 1969
7 R S JONES
Search path selection for inverted files
Internal Paper 1970

LISTAR-Lincoln Information Storage and Associative
Retrieval System*
by A. ARMENTI, S. GALLEY, R. GOLDBERG, J. NOLAN and A. SHOLL
Massachusetts Institute of Technology
Lexington, Massachusetts

INTRODUCTION
This paper describes an information storage and retrieval system called LISTAR (Lincoln Information
Storage and Associative Retrieval) being implemented
on Lincoln Laboratory's IBM 360/67 computer to run
under the IBM CP/CMS time-sharing system. An
experimental version of LISTAR designed to test its
main features was implemented on the IBM 7094 under
the MIT Compatible Time Sharing System (CTSS).
This version was described in some detail in Lincoln
Laboratory Technical Report 377.1 Because of its experimental nature, the CTSS version of LISTAR limited
the total space for data files to the non-program space
available in core memory. The current version allows
the file space to extend beyond core memory to auxiliary
storage. The logical limit of this space is determined
by the addressing capacity of the system. File space
size is currently fixed at 2^30 (1000 million) bytes. This
could be increased by relatively minor program changes.
The current version also introduces new techniques in
space management to deal with this extended file
space.

LISTAR is written almost entirely in FORTRAN in
order to render the system somewhat machine independent. The basic input-output routines and a small
number of other basic programs were written in assembly language.
GENERAL FEATURES
LISTAR is primarily an on-line interactive system
which permits a user to define, search, modify, and
cross associate data files to suit his own special interests
*This work was sponsored in part by the Department of the Air
Force and by the Public Health Service, Department of Health,
Education and Welfare.


and needs. The system assumes an open-ended library
of users' files which can be called from auxiliary storage
for processing by a name designation. A collection of
files which have a common directory (Master File) is
called a "file set." Each file set contains, in addition
to its data files, the directory information required for
their interpretation and processing. The user is free to
define, modify, augment, delete, and cross associate
data in files as his interests dictate. In addition to
direct support of the terminal user, the system allows
information storage and retrieval functions to be performed at the request of independent task programs.
A file in LISTAR is a set of entries. Each entry consists of a set of data fields which describe the objects
covered by the file name. For example, if the subject
matter of the file is books, then the data fields might
be title, author, date of publication, publisher, and so
forth. The data fields ascribed to a file apply to every
entry in the file. The number of data fields per entry,
the number of entries per file, and the number of files
per file set are all variable and the user is free to define
as many as he wishes. Data field values may be any
character string, an integer, a floating point number,
etc., or a list of anyone of these. Entries are ordered
on the data value in a user specified field called the
chief field. Information which describes the structure
of files is maintained in a directory called the Master
File. The Master File is structured like any other file
in the system and contains information describing all
files including itself.
The system has been designed to permit the user to
create an association between any two files or parts of
files in a file set by defining a relation between them
and giving it a name. This is a system feature which
was implemented and tested in the earlier version of
LISTAR and is being implemented in modified form
for the current version. A relation associates each entry
of the first file, called the "parent" file, and a set of


entries from the second file, called the "linkee" file. An
entry from the parent file is called the "parent" entry,
and the subset of entries from the linkee file with which
it is associated is called its "subfile". The user must
also explicitly identify the entries in the linkee file
which are to be associated with each entry of the
parent file.
When a relation is defined between two files, the
system associates the subfile entries according to some
ordering rule on one of the data fields of the subfile.
The user is asked by the system to give the ordering
rule and the field on which the ordering is to be done.
The ability of a user to relate files in the system and
to search on these relations is one of the powerful features of the system. It gives the user great latitude in
establishing and using cross references between files.
The user is free to create as many files and as many
relations as he chooses. He can therefore cross reference
the same files in many different ways if this serves
his purpose. Each relation is independent of the others.
A file may take part in any number of different relations
either as parent or linkee. The same file can be both
parent and linkee in a relation. Multiple users can define relations on the same file set without disturbing
the data base.
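As a rough sketch of this parent/linkee arrangement (Python, with invented names and data; the real system stores these structures in file-set blocks), a relation can be modeled as a named mapping from parent entries to ordered subfiles:

    # Illustrative model: a relation maps each parent entry to a subfile
    # of linkee entries kept ordered on a user-chosen field.
    relations = {}

    def define_relation(name, ordering_field):
        relations[name] = {"field": ordering_field, "subfiles": {}}

    def link(name, parent_key, linkee_entry):
        rel = relations[name]
        subfile = rel["subfiles"].setdefault(parent_key, [])
        subfile.append(linkee_entry)
        subfile.sort(key=lambda entry: entry[rel["field"]])  # ordering rule

    define_relation("WROTE", "title")            # hypothetical relation name
    link("WROTE", "author-17", {"title": "B"})
    link("WROTE", "author-17", {"title": "A"})   # subfile stays sorted: A, B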

LISTAR SUPERVISOR AND COMMAND
LANGUAGE
The supervisor program for LISTAR, called the
"Resident Interpreter," is a Fortran main program. All
input to the LISTAR programs passes through the supervisor (except for input of a large number of entries
from a bulk storage medium). The supervisor accepts
command lines (which the user may issue on a variety
of input units), scans them for command names and
parameters and places these latter in a list-structured
buffer area.
The command language has been designed with an
eye to user convenience and flexibility. In addition, the
command language interfaces with the command functions in such a way as to permit relatively simple
system modifications. Important features of the supervisor and command language are summarized below:
(1) The terminal user communicates to LISTAR
entirely by way of simple commands which have
the same format-free structure.
(2) The LISTAR command set is open-ended and
indefinitely expandable through the addition of
command subroutines.
(3) Since command subroutines are independent of
each other, they may be grouped into system

modules or segments to be executed as needed
to service the user.
(4) The supervisor and command language programs are independent of the command function
subroutines to permit non-terminal task programs to execute LISTAR functions directly.
(5) The terminal user searches files or relations on
files by moving markers which he creates and
positions during a session. He is free to create up
to ten markers and may move these markers up
and down a file as needed. Markers are independent of each other and may be erased individually
whenever their usefulness has ended. *
Figure 1 lists a sample set of the commands in the
system. The command name and identifiers required by
the command are typed in upper case; parameter labels
and a brief explanation of the command are typed in
lower case. Where the user may choose among several
alternative modes of the same command, the alternatives are placed in parentheses. For these cases, the
user must select one of the alternatives. Optional
parameters are set off by the paired set of symbols
" <" and ">". Parameters are delimited by spaces.
Parameter names more than one word long are hyphenated. A sequence of parameters of indefinite length
is represented by a string containing the name of the
first and last parameter separated by three dots (... ).
Figure 2 is an example of a session using a number of
the commands in Figure 1. For our purpose, we have
chosen a file which is part of an information retrieval
system called MEDLARS currently used by the National Library of Medicine at Bethesda, Maryland.
The example in Figure 2 was generated by a terminal
user applying LISTAR to a file called "MESH" consisting of medical subject headings and stored on the
Lincoln Laboratory time sharing facility.
SPACE MANAGEMENT AND FILE STRUCTURE
Space allocation and management

LISTAR enables users to reference data stored in a
large number of files each consisting of a large number
of entries. These entries are imagined to reside in a virtual file-set space of up to 230 (one thousand million)
bytes.
The virtual file set space is mapped into a virtual
memory which is 256K bytes in size at present, but
which can go as high as 16M bytes (the addressability

*The use of markers in LISTAR is patterned after the scheme
developed by K. C. Knowlton for the Bell Laboratories' L6
language. See Reference 2.


1. STOP
'STOP' terminates listar and returns the user to cms.

2. LOAD fileset-name (fileset-type)
'LOAD' reads the file set from the user's disk storage into virtual memory.

3. DEFINE FILE file-name (level-0-entry-name) field-name-1 field-type-1 field-length-1 ... fn-n ft-n fl-n
'DEFINE FILE' creates a file description for the named file having the specified data fields.

4. BULK FORMAT format-name record-length records-per-entry
begin-column-1 end-column-1 field-name-1
...
begin-column-n end-column-n field-name-n

5. BULK INPUT ( TAPE / cms-filename cms-filetype / TERMINAL ) listar-file-name <format-name>
'BULK INPUT' causes entries to be read from the given bulk medium into a LISTAR file according to the bulk format.

6. PUT marker-name ( FILE file-name / RELATION relation-name <file-name> )

7. MOVE marker-name <count> <field-name <condition value>>
'MOVE' moves the marker down a file, until either the number of entries given by 'COUNT' have been examined or the condition has been met.

8. FORMAT format-name file-name ( NORM field-name-1 ... field-name-n / TAB field-name-1 column-no-1 ... field-name-n column-no-n )
'FORMAT' is used to specify formats for printing the contents of entries.

9. PRINT marker-name format-name
'PRINT' causes selected data-field values (in the entry at which the marker is positioned) to be printed according to the format.

10. CHANGE marker-name field-name-1 value-1 ... field-name-n value-n
'CHANGE' puts values into the specified fields in the entry at which the given marker is positioned.

11. DELETE ENTRY marker-name
'DELETE ENTRY' removes from the file the entry at which the marker is positioned.

Figure 1


===>load meshfile
READY 0.03
===>put m1 file mesh
READY 0.02
===>format f1 mesh norm 'eng main hdg' tally 'tag word 1'
READY 0.02
===>move m1 10 tally gt 4000
READY 0.21
===>print m1 f1
ENG MAIN HDG    CORONARY DISEASE
TALLY           4665
TAG WORD 1      CS.26.12
READY 0.04
===>bulk format cd1 80 3 1 40 'eng main hdg' 81 95 'tag word 1' 161 170 tally
READY 0.50
===>bulk input terminal mesh cd1
1==>adult
2==>g1.4
3==>81452
READY 0.21
===>put m1 file mesh
READY 0.02
===>move m1 10 tally gt 4000
READY 0.20
===>print m1 f1
ENG MAIN HDG    ADULT
TALLY           81452
TAG WORD 1      G1.4
READY 0.04
===>change m1 'tag word 1' g1.4.13
READY 0.04
===>print m1 f1
ENG MAIN HDG    ADULT
TALLY           81452
TAG WORD 1      G1.4.13
READY 0.04
===>stop
T= 1.54/9.34 17:19:36

Figure 2


limit of the IBM 360/67). The size of virtual memory
is a CP/CMS parameter. The current version of CP/
CMS maps virtual memory into a real core memory of
512K bytes.3
LISTAR files are list structured and processing of
the files can result in essentially random access within
virtual memory with an extreme paging-to-computation
ratio during execution. In order to minimize this ratio,
LISTAR maintains a disciplined check on the use of
virtual space by way of space allocation and management algorithms.
Space allocation and management takes three forms
in LISTAR. The first is "block" management which
includes creation, deletion, and moving of blocks, the
unit of data exchange in LISTAR between virtual
memory and auxiliary storage. The second is "cell"
management which concerns management of available
storage within a block. The third is "entry" management, consisting 'of assigning and updating a virtual
address to an entry. An "entry number" is a number
assigned to each entry of a file at the time it is created.
The number not only uniquely identifies the entry
throughout its lifetime, it also specifies the current
location of the entry within virtual memory space.
Over its lifetime an entry may be moved from one
part of the file to another so that the primary linking
order may be maintained. The entry number contains
within it all the information necessary to locate the
entry in the file.

Block management

A file set is created by linking blocks of free space
representing regions of virtual file space. The block is
significant in two respects. First, it represents the unit
exchange of data for LISTAR to and from auxiliary
memory. LISTAR programs read blocks of data from
auxiliary memory into the user's virtual memory space
as required and, similarly, write blocks of data into
auxiliary memory to store or update files. The block
size is an integral number of 360/67 "pages" where a
page is 4096 bytes. The number of pages per block has
been fixed at one page for LISTAR. This number can be
changed with relatively little program modification.
A new block is created whenever space is needed to
enlarge a file in the file set and the current available
space is insufficient to satisfy this need. A block is
released when all data entries on the block have been
deleted.
Every block is assigned a number ranging from 1 to
2^18 - 1 at the time it is created. The number uniquely
identifies the block. Reference to blocks residing in
auxiliary storage is always made by block number.


LISTAR files will, in general, extend over many blocks
and will dynamically vary in extent. At any one time,
however, a relatively small number of blocks will lie
in a virtual memory area called the "file area". The
size of the file area is determined by the amount of nonprogram space in virtual memory (where program
space includes the space taken up by the CMS supervisor as well as LISTAR programs). For example, the
file area might take up 20 pages of a 256K byte virtual
memory. A table called the "virtual memory table"
(VM table) is maintained in LISTAR program space
which records the block number and location of those
blocks currently in virtual memory. As blocks are
needed, they are read into locations specified by the VM
table. If necessary, a block is written out to auxiliary
memory to make room for one that is more urgently
needed.

Cell management

When a block is created it is divided into smaller
units called "cells". Cells may range in size from one
double word to a quarter page (1024 bytes or 128
double words) .
Maintenance of free space follows the scheme used by
K. C. Knowlton in the Bell Laboratories' L6 language.4
The first word of every cell contains a key and cell link.
The first bit of the key indicates whether the cell is
free or in use. The next three bits identify the cell size
in double words as log2 n. If a cell consists of 2^n double
words, then n is a three-bit index on the cell size where
the index may range from 0 to 7. Free space on a block
is implemented as eight chains of cells, one for each cell
size. The first 2-cell of every block is reserved for a free
space header called Header 1. Each word of Header 1
contains a pointer to the first cell of the free cell chain
of the designated size. A header word (4 bytes) is assigned to each cell size beginning with size 7 and ending
with size O. The free-space chains are doubly linked
lists, that is each free cell has a pointer to its predecessor
in the chain as well as to its successor. The double
linking is an aid in maintaining the proper linkage as
cells are acquired from and restored to the chains.
When a cell of index n is released, the corresponding
twin to that cell is immediately checked to determine
if it is also free. A cell is the twin of another cell if both
have the same index n, are adjacent on the block and if
the combined cell with index n + 1 falls on an n + 1
cell boundary. If the twin is also free, they are combined into a free n + 1 cell and the process is repeated.
In this fashion, the free space within a block is dynamically accumulated into the maximum size cells.
Allocation of new cells is made by first checking the


First Cell of an Entry:
Key | Cell Link
Key | Entry Link
Entry Number
Key | Ascend Link
Data

Key bit assignments:
Cell key: bit 0 = free/used; bits 1-3 = cell size index (log2 number of double words); bits 4-7 = link type.
Other keys: bits 0-6 = link type; bit 7 = in-block/out-of-block indicator.

Figure 3

count for the chain of the desired size and, if it is zero,
proceeding to the chain of cells of next larger size which
is non-empty to decompose a larger cell. Note that
Header 1 will never be selected as a free cell since its
busy bit is always set.
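This twin-combining discipline is essentially a binary buddy system. The following is a compact sketch (Python; offsets in double words, the free chains reduced to plain sets, and the Header 1 bookkeeping omitted), not the LISTAR code itself:

    # Buddy-style sketch of cell allocation and release with twin combining.
    MAX_INDEX = 7                     # cells of 2**0 .. 2**7 double words
    free_chains = {n: set() for n in range(MAX_INDEX + 1)}

    def release(offset, n):
        # combine with the twin for as long as the twin is also free
        while n < MAX_INDEX:
            twin = offset ^ (1 << n)  # same size, adjacent, and the combined
                                      # cell falls on an (n + 1)-cell boundary
            if twin not in free_chains[n]:
                break
            free_chains[n].remove(twin)
            offset = min(offset, twin)
            n += 1
        free_chains[n].add(offset)

    def allocate(n):
        # take a free cell of index n, decomposing a larger cell if needed
        for m in range(n, MAX_INDEX + 1):
            if free_chains[m]:
                offset = free_chains[m].pop()
                while m > n:          # split off the unused upper halves
                    m -= 1
                    free_chains[m].add(offset + (1 << m))
                return offset
        raise MemoryError("no free cell on this block; create a new block")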
Entry management
Entry Structure

Entries are composed of cells chained together by
cell links. Since an entry will describe one object from
a set of similar objects, the number and relative location
within the entry of data fields which characterize the
particular object are specified for the entire file by
specifying the format for a typical entry of the file. This
descriptive information will be described later. Although
the entry format for each file is unique to that file, certain characteristics of the format are constant for all
entries in the system.
Space is dedicated in the first cell of each entry for a
cell key and link, a "descend" entry key and link, an
entry number and an "ascend" key and link. A key is
always associated with a link. A link is a pointer which
always points to the first byte of a cell or entry and is
either a displacement in bytes relative to the start of
the block or an entry number which is convertible to a
block number and byte displacement relative to the
start of the block. A key is one byte long. The four
high-order bits of a key in a cell link specify the cell
availability and cell size as described above; for other
in-block links, these bits are not used. The four low
order bits of all keys specify a key value representing a
link type. Five basic types are distinguished: a link to

an empty list, a branch link to a sublist, a descend link
to a successor cell or entry, an ascend link to a predecessor cell or entry and a return link from a sublist to
its parent entry in the main list. The lowest order bit
indicates whether the link is pointing to a cell or entry
on the same block as itself or to an entry on another
block. In this latter case, the link is called an "out-of-block" link. Figure 3 illustrates the format of the first
cell of an entry in which both the entry link and ascend
link point to in-block entries. If either link were an out-of-block link, it would be the entry number of the entry
to which it points. LISTAR convention does not allow
an entry composed of more than one cell to cross a block
boundary. A cell link, therefore, will never be an out-of-block link, i.e., an entry number.
An entry in a LISTAR file may be "simple" or "complex." A simple entry consists of one or more cells
linked by the cell link in decreasing cell size. The total
size of a simple entry is determined initially by the space
required to store the data values for the data fields
specified by the user at the time the file was defined.
Additional cells are acquired whenever new data fields
are added and the free space internal to the entry has
been exhausted. Similarly, cells are released from the
entry when all the data fields assigned to the cell have
been deleted. The expansion and contraction of an entry
in a file is applied equally to every entry in the file.
A complex entry is a collection of simple entries
which are linked together in a hierarchical structure.
For convenience in exposition, we will refer to the
simple entries that make up a complex entry as "sub-entries". The user determines whether his file will consist of simple entries or complex entries at the time he
defines the file to the system or subsequently, if he
wishes to modify the definition. He does this by giving
the name of each sub-entry class, the data fields which
make up each sub-entry and the parent entry to which
each sub-entry is to be linked. The sub-entry class name
is used by LISTAR to label the linkage between each
parent entry and its list of sub-entries.
Entries on a block form a chained list which is linked
on the entry link of the highest level parent. Entries
are ordered in this list by the values that appear in a
specially designated field called the chief field.* As
entries are added to a file, they are inserted so as to
retain this logical sequence. Under some circumstances
this may require moving an entry to another block or
creating a new block to acquire space from the free
space chains. The algorithm for managing the addition
and movement of entries is as follows:

*The chief field is determined by the user at the time the file is
defined or described to LISTAR. See earlier section.

(1) The current block is inspected to determine if
space is available for the new entry. If space is
found it is acquired and the new entry added.
(2) If space is not available and the new entry must
be added to the end of the file, then a new block
is created for the entry. Otherwise the predecessor block is inspected and free space is
acquired for the new entry.
(3) If space is not available on the predecessor block,
a new block is created and free space is acquired
for the new entry.
(4) If the predecessor block or a new block must be
used and the logical location of the new entry
comes after the first entry on the current block,
then the first entry is moved and the released
space is used for the new entry. The entry to be
moved will either be moved to the predecessor
block or the newly created block.
The algorithm used by D. G. Bobrow and D. L.
Murphy in their version of LISP is very similar to the
one devised for LISTAR and described here.5,6 This
algorithm seeks to minimize the accessing of blocks
outside virtual memory for the most frequently employed storage and retrieval operations.
Out-of-block referencing

An out-of-block pointer or entry number has the following format:

| A (4 bits) | B (4 bits) | A (14 bits) | C (10 bits) |

where the 18 bits marked A form a block number ranging from 1 to 2^18 - 1; the four bits marked B
represent an out-of-block key value,* the 10 bits marked
C represent an index on a table called an "entry table"
which resides on the block whose number is given in A.
An entry table is formed on a new block when it is
created. The table serves as a directory for storing the
file set location (block and displacement) of an entry at
the time it is added to the file. Whenever the entry is
moved, its location is updated in its table. If an entry
is deleted, the deletion is noted in the entry table. By
this device, the entry file set location of an entry need
only be kept in one place, regardless of its actual location in the file. When an out-of-block pointer or entry
number is encountered by a LISTAR routine, the
entry number is decomposed into its 18 bit block num-

ber and its 10 bit index. The block number specifies
the block on which the entry table for the entry resides.
The index is used to determine the slot in the entry
table where the file set location is stored.
The format of the entry table is shown below:

Entry Table
Each slot: Av (1 bit) | Usage Count (5 bits) | Block # (18 bits) | Disp (8 bits)
Slots are numbered 0, 1, 2, ..., n-1, n.
Each slot in the table is one word (4 bytes) in length.
An entry table can range in size from 64 slots up to the
maximum number that can fit on a block. Normally an
entry table will have 64 or 128 slots. In the unusual
event that all the slots of a table have been used and
none are available for a new entry, a special entry table
is created which fills all the space on another block
exclusive of Header 1 and Header 2, i.e., 1008 slots.
The block containing such a table is called a "table
block". Table blocks are created as needed. In most
cases a slot will be available for a new entry so that the
need for a table block should be very infrequent.
An entry table is formed by acquiring a cell from the
free space chains. The first bit of the cell is an availability bit (Av in the diagram) and is set to zero to
indicate the cell is in use. The remaining bits of a slot
are used to designate the usage count of the slot, the
block number and the relative block displacement of
the entry identified by the entry number. The relative
block displacement is stored in units of 16 bytes
(double-double words). If an entry is deleted, its entry
number is retired, and its slot is marked as vacant. The
slot is then available for use by a new entry. The number of times a slot may be re-used is determined by the
table size. This number will be referred to as the "modulus". If s is the table size and m is the modulus then:

s X 2^m = 1024 for m = 2, 3, 4 or 5
and
s X 2^m = 1008 for m = 0

*See above section.


Figure 4-Entries from the Master File, Block 1 (the File Index, Entry Index, and Field Index sub-entries describing the Master File itself and the Relation Index)

Legend. Field identifiers: EI = entry index (link); EL = entry link; FI = field index (link); FLDN = field name; FN = file name; FS = file start (link); LEN = field length; LOC = field location; SEN = sub-entry name; STR = structure of sub-entry; T = field type. Encoding types: C = character; L = link; F = integer; S = sub-entry; U = user supported; X = hexadecimal.

The usage count is recorded in the five bit field following the availability bit of the entry table. The usage
count also appears as the low order m bits of the index
field of the entry number, where m is the entry table
modulus. Information necessary to access the entry
table is recorded in a second header at the top of the
block called Header 2.
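Packing and checking such an entry number might look as follows (a sketch; the field widths are the 4-4-14-10 split described above, and the function names are invented):

    # 4 + 4 + 14 + 10 = 32 bits: the 18-bit block number is carried in
    # the two fields marked A, the key in B, the entry-table index in C.
    def make_entry_number(block_no, key, index):
        a_high, a_low = block_no >> 14, block_no & 0x3FFF
        return (a_high << 28) | (key << 24) | (a_low << 10) | index

    def split_entry_number(number):
        a_high = (number >> 28) & 0xF
        key = (number >> 24) & 0xF
        a_low = (number >> 10) & 0x3FFF
        index = number & 0x3FF
        return (a_high << 14) | a_low, key, index

    def is_stale(number, slot_usage_count, m):
        # the low m bits of the index repeat the slot's usage count, so a
        # retired entry number no longer matches a re-used slot
        _, _, index = split_entry_number(number)
        mask = (1 << m) - 1
        return (index & mask) != (slot_usage_count & mask)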

FILE DESCRIPTION
There are three principal system files maintained for
each file set. These are called the Master File, the Relation Index and the Path File. The Master File contains a description of every file in the file set including
the Master File itself and the other system files. The
descriptions of the Master File and Relation Index are
always the first and second entries in the first block of
a file set. Figure 4 shows a simplified illustration of a
typical first block. The block has an entry which describes the Master File itself and a second entry which
describes the Relation Index. We will explain the make-

up of the first entry on the block (the entry which describes the Master File itself) here and explain the
second entry (Relation Index) in the next section when
we take up relations.
All entries in the Master File are complex entries.
The Master File entry consists of three sub-entry classes
called "File Index", "Entry Index" and "Field Index".
The sub-entries are numbered 1 to 14 to represent
entry numbers. For this illustration links are shown as
entry number pointers. A number of the fields, such as
the key field, have been omitted for the sake of simplicity.
In practice, free space might exist within some of the
sub-entries and would be available for additional fields
if needed. This free space is not shown in the figure.

The field location and length in the Field Index
specify the position of a field relative to the start of
the entry of which it is a part and its length in bytes.
The numbers in the figure are illustrative only and refer
to positions in the diagram. Omitted values are indicated by a dash (-). The field type specifies the internal coding format of a data value stored in the
designated field. LISTAR accepts 8 types: integer,
floating point, character, decimal, binary, link, sub-entry and user. The "sub-entry" type identifies the
linkage between a parent entry and its sub-entry list.
The "user" type identifies a type defined by the user.
It provides the user with a means of storing information which is coded in a form that is especially meaningful to him. Input and output of data values stored in a
"user" type field are handled by an I/O program written by the user which interfaces with the LISTAR
routines. The I/O program would normally be prepared
by the user before defining his file.
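As a reading aid for the figure, a minimal sketch of a field descriptor follows (Python; the record and function names are ours, not LISTAR's, and the byte-level layout is simplified):

from dataclasses import dataclass

@dataclass
class FieldDescriptor:
    name: str       # FLDN: field name
    loc: int        # LOC: byte offset from the start of the owning entry
    length: int     # LEN: field length in bytes
    type_code: str  # T: encoding type, e.g. 'C', 'L', 'F', 'S', 'U' or 'X'

def extract_field(entry_image: bytes, fd: FieldDescriptor) -> bytes:
    # Slice a field's raw bytes out of an entry image, using the location
    # (relative to the entry start) and length from the Field Index.
    return entry_image[fd.loc:fd.loc + fd.length]

# Example: a 2-byte character field beginning 3 bytes into the entry.
assert extract_field(b"xyzPTxxx", FieldDescriptor("PT", 3, 2, "C")) == b"PT"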
Figure 5 illustrates a second block containing the
file description of three other files. Two of the files,
MESH and PIC, are data files; the third is a special
file created by LISTAR to implement a relation called
"ZEUS".
As indicated above MESH is a file of standardized
medical subject headings. PIC is a special vocabulary
of medical terms appropriate to Parkinson's disease.
ZEUS is a relation which maps PIC terms to MESH
terms. The relation ZEUS will be explained in the
following section. The descriptive information for each
of these files is stored as complex entries in the Master
File. The entry link of the last entry of block 1, "Relation Index," points to the first entry on block 2,
"MESH", by way of an out-of-block pointer which is
here represented by a block and entry number (2-1).
The last entry of block 2 is the last entry of the Master
File and points back to the first entry on block 1 (1-1).
Figure 6 illustrates the makeup of the PIC and
MESH files. The PIC file contains one field for the PIC
term (PT). The MESH file contains three fields: the
main heading term, Eng Main Hdg (EMH), the Tally
and the classification term, Tag Word 1 (TW1). The
entries are linked in alphabetical order on the entry
link (EL) and each returns to its respective descriptive
entry in the Master File.

RELATIONS

[Figure 5 here: sample entries from the Master File, block 2. The File Index, Entry Index and Field Index sub-entries (continued from Figure 4) describe the MESH and PIC data files and the ZEUS path file, whose entries carry the fields BACKPOINTER, BRANCH, RETURN and VALUE. Legend - EMH: Eng Main Hdg (MESH term); PT: PIC term; TALLY: tally; TW1: Tag Word 1; other identifiers and encoding types as in Figure 4.]

Figure 5

As indicated earlier, the user can create an association
between any two files in a file set or between entries of the
same file by defining a relation between them. Relations are implemented as a chain of out-of-block pointers (entry numbers) which link the entries taking
part in the relation.
The entries of a relation chain are distinct from the
entries in the files or file on which the relation is defined,
and they are stored in a separate file called a "path
file" identified by the relation name. The entries in a
path file are abstracts of the entries in the main files
being related. A path file links abstract entries from the
parent file to a set of abstract entries from the linkee
file. Abstracts are created only for those entries specifically relevant to the relation. Each entry of a path

[Figure 6 here: sample entries from the PIC file (block 3) and the MESH file (block 4). Each PIC entry carries a single PT field (terms such as ABDUCENT NUCLEUS, CAVUM EPIDURALE and FASCICULUS CUNEATUS); each MESH entry carries an EMH, a TALLY and a TW1 field (headings such as ABDUCENT NERVE, ANGINA PECTORIS, CORONARY DISEASE, DURA MATER, HEART DISEASES, MEDULLA OBLONGATA, PONS, SPINAL CANAL and SPINAL CORD, with tallies and classification codes such as A8.30.13.1). The entries are chained alphabetically on the entry link and return to their descriptive entries in block 2.]

Figure 6

file, whether it be an abstract entry from the parent
file or from the linkee file, has four data fields in addition to the standard fields of an entry: "back pointer",
"branch link", "return link" and "value". The back
pointer is the entry number of the entry from which the
abstract was formed. The branch link connects a
parent entry with its sub-list of linkees. The linkees
are chained on their entry link. The last entry link of
the sub-list contains a key value indicating a return
to its parent entry in the path file. The return link
provides an additional link from the sub-list to its
parent entry if one is needed for purposes of more
rapid search. The value field contains a copy of the
data value from each entry in the main files which takes
part in the relation. The parent and linkee entries in
the path file are ordered on the value field.
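The four fields reduce to a simple record; here is a hedged Python sketch (the field names are ours, entry numbers are shown as plain integers, and the chaining of linkees on the ordinary entry link is omitted):

from dataclasses import dataclass
from typing import Optional

@dataclass
class PathEntry:
    back_pointer: int           # entry number of the main-file entry abstracted here
    branch_link: Optional[int]  # for a parent entry: head of its sub-list of linkees
    return_link: Optional[int]  # optional shortcut from a sub-list back to its parent
    value: str                  # copy of the ordering-field value from the main file

# A hypothetical parent abstract for the PIC term ABDUCENT NUCLEUS, whose
# branch link points at the first entry of its MESH linkee sub-list.
parent = PathEntry(back_pointer=301, branch_link=407, return_link=None,
                   value="ABDUCENT NUCLEUS")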
At the time a relation is created the user must specify
the parent entries and the linkee entries that are to be
associated. He must also specify the fields in the main
files on which the entries in the relation file are to be
ordered. The data values from each of the entries
designated are copied into the value field of the relation file entries, and the entry number of the entry
from which a value is copied is stored in the back
pointer field of the relation file entry. The path file
entries then contain just those data values which are
relevant to the relation and each entry is an abstract
of its correlate in the main file.
Information describing a relation is stored in the
Master File and in a system file called the Relation
Index. The name assigned by the user identifies a
particular path file. Each such file has a description in
the Master File. For example the relation ZEUS mentioned in the preceding section identifies a relation
which maps terms from the PIC vocabulary to associated terms in the MESH vocabulary. The entry in
the Master File (Figure 5) gives a description of the
path file entries that take part in the relation. More
specific information on each relation is stored in the
Relation Index. A description of the Relation Index
as it would appear in the Master File is illustrated in
Figure 4. Figure 7 shows, in simple form, the entry for
the relation ZEUS as it would appear in the Relation
Index. The Relation Index entry contains the name of
the relation (RN), the entry number of the parent
file (PN), the entry number of the linkee file (LN), the
entry number of the ordering field for linkee entries
(LOF), the entry number of the path file description
(P) and the ordering rule (OR). All values for these
fields except the relation name and ordering rule are
entry numbers which point to Master File entries.
The ordering rule determines whether the ordering in
the linkee entries is alphabetic, logical, numeric, or user
defined and is specified by a preset code.

[Figure 7 here: sample entries from the path file for relation ZEUS (block 6) and the sample entry for ZEUS in the Relation Index (block 5). The path entries pair PIC terms such as ABDUCENT NUCLEUS, CAVUM EPIDURALE and FASCICULUS CUNEATUS with their MESH linkees (ABDUCENT NERVE, DURA MATER, SPINAL CANAL, PONS, MEDULLA OBLONGATA, SPINAL CORD); the Relation Index entry shows the RN, PN, LN, LOF, P and OR fields and the NXP and NXL chain links.]

Figure 7

In addition to the standard chaining on the entry
link, entries in the Relation Index are chained on two
other links. One of these connects all Relation Index
entries which have the same parent file in the relation
(NXP). The other connects all entries which have the
same linkee file in the relation (NXL).
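Collected into one record, a Relation Index entry might be sketched as follows (Python; the names and the use of bare integers for entry numbers are our illustrative assumptions):

from dataclasses import dataclass
from typing import Optional

@dataclass
class RelationIndexEntry:
    rn: str                    # relation name
    pn: int                    # entry number of the parent file description
    ln: int                    # entry number of the linkee file description
    lof: int                   # entry number of the ordering field for linkee entries
    p: int                     # entry number of the path file description
    ordering_rule: int         # OR: preset code (alphabetic, logical, numeric, user defined)
    nxp: Optional[int] = None  # next Relation Index entry with the same parent file
    nxl: Optional[int] = None  # next Relation Index entry with the same linkee file

# A hypothetical entry for the ZEUS relation of the running example.
zeus = RelationIndexEntry(rn="ZEUS", pn=15, ln=16, lof=22, p=17, ordering_rule=1)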
When the user defines a relation he specifies the
files to be related, the fields on which the parent and
linkees are to be ordered, the ordering rule and the name
he wishes to assign to the relation. With this information, LISTAR creates an entry in the Master File for
the relation file and an entry in the Relation Index.
Once the relation has been defined the user may then
generate the path file by directing LISTAR to associate
specific entries from the linkee file and the parent file.
A particular path file may be generated incrementally
by the user or, if an algorithmic association applies,
by a global procedure. The user searches a relation in
the same way he searches a file, by employing the commands PUT and MOVE.

The relation implementation scheme adopted for
this version of LISTAR is one of several choices made
possible by the LISTAR design rules. It has the disadvantage of requiring redundancy in the storage of
field values, but this is offset by advantages in programming. Under this scheme the same system functions can be applied equally to the processing of relations and files with little or no need to treat relations
as specialized structures.

CONCLUSION

LISTAR was designed primarily to maximize facility
and flexibility in file management. Empirical evidence
is insufficient at this point to determine what the cost in
speed of searching might be in attaining these goals. It
is, however, clear that the system is very easy to use
and gives the user great latitude in creating, modifying
and searching files. Ease of use has been achieved by
providing a relatively simple English-like command
language which requires the user to have no programming experience and very little knowledge of file organization. The self-defining character of the Master
File and the fact that the same structuring rules apply
to the Master File as to data files make it possible to
modify the Master File as easily as data files. Finally,
the ability to create relations provides the user with a
powerful tool for associating data entries in ways that
are especially meaningful to him and which offer more
rapid searching than direct searching of his primary
files might allow.
REFERENCES
1 J F NOLAN A W ARMENTI
An experimental on-line data storage and retrieval system
Lincoln Laboratory MIT (TH-377) September 24 1965
2 K C KNOWLTON
A programmer's description of L6
Communications of the ACM Vol 9 No 7 August 1966
3 R U BAYLES ET AL
Control program-67/Cambridge monitor system (CP-67/CMS)
IBM Cambridge Scientific Center May 1 1968
4 K C KNOWLTON
A fast storage allocator
Communications of the ACM Vol 8 No 10 October 1965
5 Lincoln Laboratory Internal Technical Memorandum
October 24 1966
6 D G BOBROW D L MURPHY
Structure of a LISP system using two level storage
Communications of the ACM Vol 10 No 3 March 1967

All-automatic processing for a large library*
by NOAH S. PRYWES
University of Pennsylvania
Philadelphia, Pennsylvania

and

BARRY LITOFSKY
Bell Telephone Laboratories
Holmdel, New Jersey

INTRODUCTION
Our concept of what is considered large-library processing changes with the growth of published information and with the progress of the relevant data
processing technology. The size of the library may be
characterized by the number of entities that it concerns and the average number of retrieval terms that
index the information about each entity. This applies
to the processing of bibliographic services in preparation of recurring bibliographies of periodical literature
and to the processing inherent in acquisition and custody of a library collection and communicating information regarding the collection to the library's users.
In this context, a large library may be considered to
have from 50,000 to tens of millions of individual
publications with each publication characterized by
from 10 to 100 retrieval terms. Numerous existing
libraries and bibliographic services fall in this range.
Cost, personnel availability and service quality
problems provide the justification for developing all-automatic processing methodology for a large library.
This is similar to the justification for mechanizing any
other industrial, commercial or service function. These
three problem areas are particularly acute in the
library field. The cost of present library processing is
very high. A major component of the budget of libraries covers the processing functions which include
indexing, cataloging, vocabulary maintenance, and
communicating information to the library's users.

* The research reported has received support from the Information Systems Program of the Office of Naval Research under
Contract 551(40) and from the Bell Telephone Laboratories.

There is generally a shortage of qualified personnel
even where funds are available. In any one of the large
libraries there are thousands of monographs and serials
that are waiting to be indexed and cataloged. These
often lie unused because of the dearth of competent
indexers and catalogers, especially those expert in
particular subjects and languages. The increased
amount of material that is being disseminated requires
substantial increases in staff. Staff with such competence is extremely scarce; low salaries and monotony
of processing work discourage young people from
entering the library field. Finally the services are not
satisfactory, as indicated by the very low utilization of
library resources by the scientific community. Furthermore, the libraries generally operate at a low, almost
unacceptable, retrieval effectiveness and the library
user requiring specific information is overwhelmed
with much irrelevant information.
There are three major processing functions: (1)
indexing and classifying; (2) thesaurus (vocabulary)
and classification maintenance; and (3) user query
interpretation. In developing an approach to the
carrying out of these functions, there is a choice between the semi-automatic and all-automatic processing
approach. Either approach requires direct interaction
of the staff and users of the library with the automatic
system. However, in a semi-automatic approach, the
staff shares with the automatic system the minute
decisions in carrying out the above functions. Since
such sharing of decisions and functions will continue
to require expenditure in funds and highly trained
staff, the semi-automatic approach will not respond
fully to the problem areas delineated above. The
following therefore is devoted to exploring the all-automatic methodology for processing in a large library.


The methods employed in libraries for a century
such as indexing and classification have proven of
lasting value and serve as a foundation for the all-automatic library processing as well. Furthermore,
methods of content analysis for indexing and classified
concordance preparation, which require merely clerical
procedures, have been proposed for centuries.33

To cope in a practical manner with a mass of data,
the traditional approaches can be performed automatically with only minimal guidance provided by
the library staff. Thus, in the all-automatic processing
of a library the indexing of documents can be performed entirely by the computer following content
analysis algorithms which select index terms from title,
abstract, table of contents or references included in a
document. This procedure also does away with the
vocabulary maintenance work through maintaining
an open-ended index word vocabulary which is
uncontrolled.
Next, an automatic process can be applied which
generates a library classification system for the library
collection. The classification system represents a
scheme for placing documents on shelves, in microforms, in bibliographic publications, or in the computer,
as appropriate. The classification system created automatically differs however from conventional classification systems employed in libraries. The prevalent approach to creating a classification system is that of
applying human judgment a priori to divide the library collection into progressively more specific classes.
By contrast, automatic classification systems are generated a posteriori from the information in the documents in the respective collection; namely, the index
terms are extracted automatically from each document
as it enters the collection. The division of the collection
into progressively more specific classes is then performed automatically, based on the index terms extracted from the respective documents. The notable
advantage of such a classification system is that, being
generated automatically by a computer, it can always
be brought to reflect an up-to-date situation incorporating new documents or using new algorithms in reindexed documents. Thus the significance of new
classes in the library collection is incorporated in the
classification system.
Finally, placing publications (or bibliographic citations) according to a classification (similar to manual
classification) provides the ability to browse in a
library and find documents on related subjects placed
together. This browsing capability can be retained in
bibliographic publications (as will be shown) as well as
in library shelves, in microform storage, or in the
memory of the computer: wherever the documents or
the surrogate information are stored. The retrieval

procedure then consists of reference to a classification
schedule which points to the respective areas where
relevant information is found. The look up of respective
index terms (or their conjunctions or disjunctions) provides the respective classification numbers, which can
be algorithmically translated into storage locations.
This contrasts with coordinate indexing retrieval
systems, where the user's needs must be interpreted
into a logical formula which should incorporate all
the relevant index terms including appropriate synonyms. Such an interpretation requires expenditure of
much time by highly skilled staff and by the computer.
The remainder of this chapter deals with three of
the functions which are considered to be the major
ones in an all-automatic processing of a library. These
are the automatic indexing which is the subject of the
following section, the automatic classification which is
the subject of the third section and the retrieval
through browsing with the aid of a classification schedule which is the subject of the fourth section. However,
the principal and novel aspect reported here is the
generation and application of the Automatic Classification in the third and fourth sections. The discussion of
Automatic Indexing, in the following section, is therefore brief and of a review nature.
AUTOMATIC INDEXING
A variety of automatic indexing approaches have
been described by Stevens.36
Of interest here are only those indexing methods
which do not involve human control of the indexing
vocabulary. In selecting these methods there is an apparent lack of direct concern with the concepts, things,
or people behind the mechanically selected index
words. This is indeed not so, since to conceive anything
is to represent it in symbolic form, which in the context
here means representation in words which are selected.
The assimilation of new publications into the collection consists of text analysis and extraction of words
through a clerical procedure. These words are then
entered into a concordance of index terms. The concordance may be further processed to omit terms in
some algorithmic way (based on frequency, for instance)
and to form a classification system. The new publication is thus assigned a classification number and accordingly allocated storage space. Retrieval queries
are similarly processed, where words in the query are
extracted and used to reference the classification
schedule; thereby determining the respective areas of
the collection which are of interest.
The language analysis in deriving automatically the
index terms may be based on the entire text, on citation and header information, such as table of contents
or references, or merely on the title. The cost of transcription of the publications into machine readable
form decreases greatly as the amount transcribed is
reduced; however, this also reduces the eventual retrieval effectiveness. There is an indication, however,
that effectiveness of retrieval by subject area increases
considerably (20% to 25%) when the transcription of
the abstract is added to the transcription of the bibliographic citation.9,31 Content analysis of full text has
not proven to sufficiently improve the effectiveness of
retrieval to warrant the considerably greater cost of
transcription.
Similarly, increasingly more complex language analysis procedures may be employed. However, again, the
increased complexity of the algorithms may contribute
only very little to the eventual retrieval effectiveness.
The simplest procedure is to analyze a text to recognize and generate stems of words encountered in the
input material and treat these word-stems as candidate
index terms. This involves only recognizing the suffixes
of words and elimination of highly common words.
Suffix editing procedures are simple; a procedure for
English has been described by Stone, et al.,37 and a
procedure for French has been described by Gardin.16
Similar procedures have been developed by numerous
other investigators.32 Such simple procedures, where
stem words are derived from title or abstract, without
reference to a thesaurus, have proven effective for retrieval
in situations where the user is satisfied with retrieval
of one or a few relevant documents. In a library arranged
by subject, according to a classification, additional
documents on relevant subjects will be found in adjacent storage areas. This simple indexing procedure can
therefore serve for the all-automatic large library
processing.
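A toy Python sketch of such a procedure follows; the suffix and common-word lists are illustrative stand-ins, not the published procedures of Stone et al.37 or Gardin.16

COMMON_WORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "on", "by"}
SUFFIXES = ("ations", "ation", "ing", "ness", "es", "ed", "s")  # illustrative only

def stem(word):
    # Strip the first matching suffix, keeping at least a short stem.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def candidate_terms(text):
    # Lower-case the text, drop highly common words, and reduce the rest
    # to word-stems, yielding the candidate index terms.
    words = (w.strip(".,;:()\"'") for w in text.lower().split())
    return {stem(w) for w in words if w and w not in COMMON_WORDS}

print(candidate_terms("Indexing and classification of chemical reactions"))
# {'index', 'classific', 'chemical', 'reaction'}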
More sophisticated procedures employing syntactic
analysis of sentences and semantic analysis involving
look-up in dictionaries may be employed in the automatic indexing process. Natural language processing
and machine translation research are relevant, as
many of the algorithms developed there are directly
applicable to automatic indexing.30
The aggregate of the index terms extracted from the
incoming or re-indexed documents constitutes an all-inclusive directory or concordance of the index terms.
These directories are generated a posteriori from the
documents themselves. An important step in deriving a
usable thesaurus is the elimination of the very high and
very low frequency words.15 More sophisticated processing of the index word vocabulary may be employed.
For instance, a smaller thesaurus may be obtained by
including only terms with high frequency of use in
retrieval queries, or index terms of documents which
have been retrieved frequently. Analysis of queries
may also serve as a guide regarding important relationships among terms. Statistics about frequencies of co-occurrence of terms may be used to combine terms into
phrases which will be used in their entirety as a single
term. Finally, the automatic generation of a classification, described later, may provide further information
about grouping and sub-grouping of terms so that
separate thesauri for some specific subject areas may be
prepared where relationships between index words are
established in the context of the subject areas.
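The frequency screen mentioned above is a one-pass computation; a minimal Python sketch follows (the cutoff values are arbitrary placeholders, since the paper gives none):

from collections import Counter

def prune_vocabulary(term_stream, min_count=2, max_fraction=0.1):
    # Drop very low frequency terms (fewer than min_count occurrences) and
    # very high frequency terms (more than max_fraction of all occurrences);
    # the remainder forms the working index vocabulary.
    counts = Counter(term_stream)
    total = sum(counts.values())
    return {term for term, n in counts.items()
            if n >= min_count and n / total <= max_fraction}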

AUTOMATIC CREATION OF A LIBRARY
CLASSIFICATION
A posteriori creation of a classification

The basic aim of a classification system for a collection of documents is to group "like" documents together into categories. A posteriori classification does
this by setting up the categories only after the documents are available. Thus, a posteriori classification, as
opposed to a priori can optimize the categories with
respect to the documents actually existing in the collection. Coupled with the automatic nature of the process,
this leads to a large degree of flexibility and ability to
maintain up-to-date classification schedules.
Lance and Williams18,19 divide a posteriori classification strategies into hierarchical and clustering types.
They further subdivide hierarchical strategies into
agglomerative and divisive types. In agglomerative
strategies the hierarchy is formed by combining documents, groups of documents, and groups of groups of
documents until all documents are in one large group:
the entire collection itself. The hierarchy being thus
formed, all that remains is to select some criterion,
such as category size, at which one cuts off the bottom
of the hierarchy. Experiments using such a method
were performed by Doyle12,13 using the Ward grouping
program.38 Prywes25,26,27 has also devised a system of
this type, with only small scale work done on this
algorithm,39 because of computational difficulties.

Divisive techniques have long been thought the
realm of philosophers and other designers of a priori
breakdowns of knowledge. With this technique, one
starts with the entire collection and successively subdivides it until appropriately sized categories are obtained. Doyle14 has proposed a system of this type (see
Dattola11 for preliminary experiments). However, this
system requires some a priori categories as a starting
point at each level of classification. Another classification algorithm of the divisive hierarchical type is that
of "CLASFY." This algorithm was devised by Lefkovitz.20


TABLE I - Sample Nodes of Hierarchy

Node 1.5, CHEMISTRY: chemical reactions, chemical analysis, reaction kinetics, absorption, stability, solutions, separation process, uranium, impurities, thermodynamics, decomposition, labelled compounds, thorium oxides, oxidation, electric potential, adsorption, lattices, cations, spectroscopy, polymers, salts, solubility, organic acids, chromatography, phenyl radicals, organic chlorine compounds, alcohols, phenols

Node 1.5.1, ORGANIC CHEMISTRY: organic compounds, organic nitrogen compounds, organic sulfur compounds, organic bromine compounds, organic fluorine compounds, methyl radicals, propyl radicals, isomers, amines, benzene, ethanol, ethers, urea, ammonia, acetic acid, nitric acid, heterocyclics, solvent extraction, polymerization, alkyl radicals, oxygen compounds, cycloalkanes, hydroxides, catalysis, amides, hexane

Node 1.2.3, NUCLEAR EXPLOSIONS: nuclear explosion, radioactivity, radioisotopes, contamination, environment, detection, gamma radiation, temperature, analysis, pressure, computers, radiation protection, safety, economics

Node 1.2.3.2, FISSION PRODUCTS: fission products, filters, decontamination, waste solutions, standards

Preliminary experiments have been reported1,21
and an in-depth investigation of this algorithm along
with extensive (50,000 document descriptions) testing
of it has been recently (1969) performed by Litofsky.22
Some results of these experiments will be discussed
later in this chapter.

Clustering systems involve a wide variety of classification techniques which seek to group index terms or
documents with high association factors together into
"clusters,"3,17,29 "clumps"10,23,24,34,35 or "factors"2,4,5,7,8
without trying to obtain a hierarchy. Most of these
methods require matrix manipulation, though it should
be added that the precise manner of these manipulations varies widely with the particular scheme used.
A tabular summary of automatic classification experiments reported upon through 1968 is presented by
Litofsky.22

Regardless of the quality of the categories produced
by a classification algorithm, the algorithm must do its
task in a reasonable period of time for large collections
in order to be practical for use in libraries. In most
automatic classification systems being considered today
(clustering types), classification time is proportional to


the square, or even the cube, of the number of documents in the system. This is because of the need to
compare every document (or partial category) with
every other document (or partial category) or to generate and manipulate matrices whose sides are proportional to the number of documents and/or the number
of discrete keywords in the system (see Doyle,14
"Breaking the Cost Barrier in Automatic Classification"). This means that the cost of classification per
document goes up at least linearly with the number of
documents. Considering collections numbering in the
millions of documents, it is evident that systems with
the above characteristics are unacceptable.

There are two systems which are known to break
this N² effect (N documents in the collection). These
are the two aforementioned algorithms (that described
by Doyle,14 and CLASFY) of the hierarchic, divisive
type. In both, the time proportionality factor is approximately N log N, where the logarithmic base is
the number of branches at each node of the hierarchy.
With appropriate selection of this node stratification
number, the classification time (and hence cost) per
document for these two systems can be held to a constant. Using CLASFY, Litofsky22 has estimated
classification times, using third generation computers,
of about .04 seconds per document, independent of
the number of documents in the collection.
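To make the cost argument concrete, here is a generic divisive sketch in Python. It is emphatically not the published CLASFY algorithm (for which see Lefkovitz20); the seed selection and overlap heuristic are our placeholders. Each level does work proportional to N and there are roughly log N levels (base equal to the fanout), which is the source of the N log N behavior.

import random

def divide(docs, fanout=5, max_size=200, label="1"):
    # docs: list of (doc_id, frozenset_of_keywords) pairs.
    # Returns a dict mapping node labels such as "1.3.2" to lists of doc_ids.
    if len(docs) <= max_size:
        return {label: [doc_id for doc_id, _ in docs]}
    seeds = random.sample(docs, fanout)  # crude seed choice, for illustration
    groups = [[] for _ in range(fanout)]
    for doc in docs:
        # Assign each document to the seed sharing the most keywords with it.
        best = max(range(fanout), key=lambda i: len(doc[1] & seeds[i][1]))
        groups[best].append(doc)
    if max(len(g) for g in groups) == len(docs):
        # Degenerate split: fall back to arbitrary chunks to guarantee progress.
        groups = [docs[i::fanout] for i in range(fanout)]
    result = {}
    for i, group in enumerate(groups):
        if group:
            result.update(divide(group, fanout, max_size, f"{label}.{i + 1}"))
    return result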

The resulting classification schedule

Classification schedules are required in order to be
able to make use of a hierarchically classified document
collection. These schedules consist of what shall be
called here a "node-to-key" table (see for instance
Table I) and a "key-to-node" table (Table II).
The node-to-key table is analogous to the Dewey
decimal classification schedule where "node" 621.3
points to "key" Electrical Engineering. The key-to-node table performs the inverse function, that of producing node numbers corresponding to given keys.
Because the systems under consideration here are of
the a posteriori type, the index terms, referred to here
as keywords, rather than a priori titles (such as Electrical Engineering) are present in these tables. These
tables are produced by first forming a hierarchy of
keywords.
The hierarchy of keywords is formed from the
bottom to the top. It should be noted that this keyword hierarchy is not used as a semantic hierarchy in a
thesaurus in order to obtain descriptors for documents,
but comes about a posteriori. Initially, the terminal
nodes, or categories, are assigned the keywords which
result from the union of the keyword surrogates of the
documents in that category. The keywords of the terminal nodes under a parent (next level up the hierarchy)

TABLE II - Sample Portion of a Key-to-Node Table

MUTATIONS: 1.1.1.1.3.2, 1.1.1.1.3.3, 1.1.1.1.5, 1.1.1.2.1, 1.1.1.2.4, 1.1.1.3.1, 1.1.1.4.1, 1.1.1.4.3, 1.1.1.5, 1.1.3.1.1, 1.1.3.1.4, 1.1.3.5, 1.1.4, 1.1.5, 1.2.5.1, 1.5.5

BARLEY: 1.1.1.3.1, 1.1.1.3.2, 1.1.1.3.3, 1.1.1.4.2, 1.1.1.5, 1.1.5.1, 1.1.5.4, 1.3.1.4, 1.5.3

(In the original table a "c" beside a node number marks a terminal category; the markers could not be recovered reliably here.)


node are then intersected and those resulting keywords
are assigned to the parent node. The keyword sets of
the original nodes are then deleted of the keywords
assigned to the parent node. This process is continued
until the top node is reached. In this way, it is guaranteed that the keywords of each document description
are wholly contained in the set of keywords consisting
of the keywords at the nodes in the direct path from
the top of the hierarchy to the terminal node, or category, which contains that document. In addition, each
keyword will appear at most once in any given path
from the top of the hierarchy to a terminal node.
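The construction just described amounts to a single bottom-up pass; the Python sketch below is our own rendering of that union/intersection rule (the node and field names are ours):

from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    keys: Set[str] = field(default_factory=set)  # terminal node: union of its documents' keywords

def build_schedule(node: Node) -> Set[str]:
    # Bottom-up: each parent receives the intersection of its children's
    # keyword sets, and those keywords are then deleted from the children.
    if not node.children:
        return node.keys
    child_keys = [build_schedule(child) for child in node.children]
    common = set.intersection(*child_keys)
    for child in node.children:
        child.keys -= common
    node.keys = common
    return node.keys

a = Node("1.1", keys={"barley", "mutations", "genetics"})
b = Node("1.2", keys={"wheat", "mutations", "genetics"})
root = Node("1", children=[a, b])
build_schedule(root)
assert root.keys == {"mutations", "genetics"} and a.keys == {"barley"}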
By this means the keywords at a node represent
somewhat of an abstract of the fields of knowledge contained beneath that node. It is evident that as one
goes up in the hierarchy, one will encounter more frequent or generic terms while finding the more infrequent or specific terms lower in the hierarchy. Just
how this information can be used to aid browsing will
be covered later.
Table I shows some sample nodes of a hierarchy
generated by Litofsky22 using CLASFY. They are
part of a classification of almost 50,000 document
descriptions (subject matter was nuclear science) into
265 categories. The capitalized words in Table I are
manually assigned titles for the nodes. The node numbering scheme used is such that node 1.5.1 is directly
under node 1.5 (the node stratification number equals
five) and 1.2.3.2 is under 1.2.3. The node-to-key table
consists of lists such as the ones shown in Table I. The
key-to-node table is the inverse of this. Thus, for this
example, node 1.5 would appear in the key-to-node
table under "chemical reactions." Figure 1 shows a
portion of the hierarchy tree (c indicates a terminal
node or category). The keywords themselves are not
shown due to lack of space.

Of course, the classification schedules are produced
automatically by computer at the time the document
file is classified.

[Figure 1 here: a portion of the hierarchy tree; c marks the terminal nodes (categories). One interior node, 1.1.4 (Environment and Metabolism), is labeled.]

Figure 1-Portion of hierarchy


Evaluation of a classification

Evaluation of a classification is not an easy matter.
Until recently, almost all value judgments for classification systems were sorely hampered by dependence on
human judgment. One example of this can be found by
examining the automatically produced nodes of Table
I. The keywords for each node seem (at least to the
authors) to represent reasonably coherent areas of
knowledge. However, this is a qualitative, not quantitative, measure and as such makes it very difficult
to compare this system with others.

A number of quantitative measures have been described by Litofsky.22 Two of these will be discussed
here. The "likeness" of two documents can be measured, to some extent, by the number of common index
terms of these documents. Extending this notion, a
measure of the quality of a classification system is how
well the classification algorithm minimizes the average
number of different keywords per category.
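Computed directly, the measure is an average over category keyword unions; the short Python sketch below (with an invented toy input) makes it precise:

def avg_keys_per_category(categories):
    # categories: dict mapping a category id to the list of keyword sets of
    # the documents filed in that category. The measure is the average
    # number of distinct keywords per category (lower = more homogeneous).
    sizes = [len(set().union(*doc_sets)) for doc_sets in categories.values()]
    return sum(sizes) / len(sizes)

toy = {"1.1": [{"barley", "mutations"}, {"barley", "genetics"}],
       "1.2": [{"pons", "medulla"}]}
assert avg_keys_per_category(toy) == 2.5  # (3 + 2) / 2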
Figure 2 presents curves for this measure applied to
almost 50,000 document descriptions for CLASFY, an
existing manual classification system (a priori, documents classified by subject specialists) and, for control
purposes, a randomly ordered file. The parallelogram
represents the theoretical boundaries for these
curves.22,28 Besides indicating closer content in categories, a lower curve also represents smaller storage requirements for the classification schedules. It is evident
that CLASFY outperforms the manual systems with
respect to this measure.

[Figure 2 here: keys per category versus number of categories, on logarithmic scales.]

Figure 2-Keys per category, 46821 documents
The second measure directly affects retrieval time
for on-line retrieval systems. Most mass storage devices
have two components to the time required to retrieve
a record. The larger component is the time required
for the read mechanism to approach the vicinity of the
desired information (or vice versa). This is called the
access time and is itself made up of two components,
motion access and latency (usually averaging one-half
revolution of the recording media). The smaller component of the retrieval time is the actual data transmission time. In general:

(1) Once the access time has been "spent," it costs
relatively little more to read additional data as
long as another access time is not involved.
(2) An appreciable time savings can be made by
reducing the required number of memory accesses.

These points are very pertinent to manual as well as
computerized on-line retrieval systems because the
lack of the ability to batch queries leads to a large
number of memory accesses. Automatic classification
takes advantage of item (1) by grouping like documents
(i.e., categories) into cells which are segments of
memory (shelves, pages, tracks, cylinders, etc.) which
do not require more than one memory access. Thus, it
costs little more in time to retrieve an entire cell than
it would to retrieve a single document.
In addition, classification reduces the number of
memory accesses required by the very fact that the
documents in a given cell are close to each other in
content. This "likeness" increases the probability that
multiple retrievals for a given query would appear in
the same cell. This in turn reduces the number of cells
accessed per query and hence the number of memory
accesses required.
This reduction in memory accesses can be translated
into greater capacity for a system. Alternatively, it
might speed operations up enough to justify slower,
but less costly mass storage devices.
Thus, the second measure is the number of cells
searched (accessed) for a given number of retrieval
requests. Figure 3 shows the numbers of cells searched

[Figure 3 here: number of cells (categories) searched versus number of cells, on logarithmic scales.]

Figure 3-Cells (categories) searched, 46821 documents


in response to 165 real retrieval requests. Once again,
CLASFY does better than the manual classification
system.
RETRIEVAL
Browsing in the collection

Retrieval by conjunctions and disjunctions of keywords is performed by executing the proper boolean
function on the node lists of the key-to-node table.
This is done in a manner similar to that of inverted
files, with the difference being that instead of resulting
in individual documents, the results here are categories.
In physical form, these categories could be books on
shelves or microfilm. Once pointed toward a particular
shelf or microfilm reel, the user could browse through
the documents in that category to find pertinent
documents.

If desired, the computer could serially scan the
descriptions of the documents in a category to find the
precise documents which satisfy the query.
In "The Conceptual Foundations of Information
Systems," Borko 6 notes:
"The user searches for items that are interesting, original, or stimulating. No one can find
these for him; he must be able to browse
through the data himself. In a library, he
wanders among the shelves picking up documents that strike his fancy. An automated
information system must provide similar
capabilities. "
The abiiity to browse through parts of a collection
should be an essential portion of every IS&R system.
There are many times when one has only a vague idea
of the type of document desired. Browsing can help
channel pseudo-random thoughts towards the information actually desired.
Browsing in the schedule

Effective browsing demands a hierarchical classification system in order to enable one to start with broad
categories and work towards specifics. Automatic
classification can produce such hierarchical sets of
categories. In a priori systems, nodes are given names
and index numbers. However, in a posteriori systems
the node names are generated automatically and consist of the sets of keywords (see Table I). If a set of
keywords is too large, humans or preferably automatic
processes can be employed to condense the set and
provide a suitable title for the node.


Naturally, automated browsing can only be effective
in on-line computer systems through man-machine
interaction. The user can enter nodes through keying
conjunctions or disjunctions of keywords. The system
would then display the nodes (showing the respective
parts of the node-to-key table) beneath the original
ones, as well as some statistics, such as how many documents there are beneath each node. When the user
selects branches, the cycle repeats with the new nodes.
If desired, one could backtrack up the hierarchy or
jump to completely different portions. Once the user
has narrowed his search, he can demand retrieval of
some or all of the documents by specifying keywords
and/or categories.
Another way of browsing in a classified set of documents is to start at the very bottom. Assume one has
a specific query in mind and upon submitting it to the
system, obtains only one document. If this is insufficient one could broaden the search by requesting the
display of other documents in the category of the one
retrieved. Since these documents are close in content
to the original, they might also be satisfactory or their
keywords might suggest ways for the user to refine his
query in order to reference other nodes and retrieve
other documents of interest.
None of these modes of browsing could be utilized
by files with strict serial or inverted file organization.

Off-line browsing

An on-line computer is not a necessity for browsing
in the schedule (though if computers are used, they
should be on-line). The schedules could be made in
book form and the browsing done by hand. The procedures would be similar to those outlined above, but
somewhat slower due to page turning.
Table II represents a sample portion of a key-to-node table. It will be used to illustrate manual use of
the classification schedules for a query consisting of
the conjunction of MUTATIONS and BARLEY. A finger is
placed at the first node entry under each of the terms.
The numeric classification number under BARLEY
(1.1.1.3.1) is higher than that under MUTATIONS
(1.1.1.1.3.2); therefore, the listing of MUTATIONS is
scanned until 1.1.1.3.1 is reached. This indicates that
both keywords are contained in category 1.1.1.3.1.
The scanning is continued on the list with the lower
number under one's finger until 1.1.1.5 is reached.
Since this node is not a category, both keywords appear in all categories starting with 1.1.1.5 (in this
example there happen to be five such categories).
Scanning is continued until 1.1.5 is encountered on the
MUTATIONS list while 1.1.5.1 is found under BARLEY.


Since MUTATIONS is found in all categories under 1.1.5,
the conjunction can be found in 1.1.5.1 (and 1.1.5.4).
Continuing the scan does not result in any more common categories.
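The same finger-scan can be phrased as set expansion: expand each keyword's node list into the categories it covers (a non-category node standing for every category beneath it) and intersect. The Python sketch below replays the MUTATIONS and BARLEY example on an invented miniature category set:

def categories_for(node_list, categories):
    # Expand a key-to-node list into the terminal categories it covers: a
    # node that is itself a category matches exactly, while a higher node
    # such as 1.1.5 stands for every category whose number begins with it.
    covered = set()
    for node in node_list:
        if node in categories:
            covered.add(node)
        else:
            covered.update(c for c in categories if c.startswith(node + "."))
    return covered

def conjunction(list_a, list_b, categories):
    return categories_for(list_a, categories) & categories_for(list_b, categories)

# Invented miniature schedule: 1.1.1.3.1 is a category; 1.1.1.5 and 1.1.5
# are higher nodes covering the categories beneath them.
cats = {"1.1.1.3.1", "1.1.1.5.1", "1.1.1.5.2", "1.1.5.1", "1.1.5.4"}
mutations = ["1.1.1.3.1", "1.1.1.5", "1.1.5"]
barley = ["1.1.1.3.1", "1.1.1.5", "1.1.5.1", "1.1.5.4"]
print(sorted(conjunction(mutations, barley, cats)))
# ['1.1.1.3.1', '1.1.1.5.1', '1.1.1.5.2', '1.1.5.1', '1.1.5.4']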
This process has resulted in a number of categories
which would now be browsed through. The major
advantage of this method is that in the indicated
categories one is likely to find relevant documents
which were not indexed by both keywords in addition
to those documents which were indexed by both
MUTATIONS and BARLEY.
Thus, an automated classification system can start
relatively simple and grow in complexity (and cost).
The classification can be done in batch on a computer
with retrieval done by hand from printed classification
schedules. When collection size and system utilization
warrant, the retrieval function could be converted to
an on-line computer with little conceptual difference
from the user's point of view.
Browsing simplicity

The methods of browsing outlined above are strikingly similar to those employed in traditionally organized libraries. They do not require users to be
familiar with a thesaurus, its structure or with relations between terms. They do not require the formulation
of query formulae or an understanding of how a computerized
system processes such formulae. In short, these
methods do not require highly trained staff to interpret
user queries but allow the user direct browsing along
traditional browsing patterns.
REFERENCES
1 T ANGELL
Automatic classification as a storage strategy for an information storage and retrieval system
Master's Thesis The Moore School of Electrical Engineering University of Pennsylvania 1966
2 G N ARNOVICK J A LILES J S WOOD
Information storage and retrieval-Analysis of the state of the art
Proceedings of the SJCC 537-61 1964
3 R E BONNER
On some clustering techniques
IBM Journal 8 22-32 January 1964
4 H BORKO
The construction of an empirically based mathematically derived classification system
Proceedings of the SJCC 279-80 1962
5 H BORKO
Measuring the reliability of subject classification by men and machines
American Documentation 268-73 October 1964
6 H BORKO
The conceptual foundations of information systems
Systems Development Corporation Report No SP-2057 1-37 May 6 1965
7 H BORKO M BERNICK
Automatic document classification
JACM 10 151-62 April 1963
8 H BORKO M BERNICK
Automatic document classification Part II: Additional experiments
JACM 11 138-51 April 1964
9 C W CLEVERDON et al
Factors determining the performance of indexing systems Vol 1 design part 1 text
ASLIB Cranfield Research Project 1966
Also
C W CLEVERDON
Report on testing and analysis of an investigation into the comparative efficiency of indexing systems
ASLIB Cranfield Project October 1962
10 A G DALE N DALE
Some clumping experiments for associative document retrieval
American Documentation 5-9 January 1965
11 R T DATTOLA
A fast algorithm for automatic classification
In Information Storage and Retrieval Cornell University Dept of Computer Science Report No ISR-14 Section V October 1968
12 L B DOYLE
Some compromises between word grouping and document grouping
Symposium on Statistical Association Methods for Mechanized Documentation 15-24 1964
13 L B DOYLE
Is automatic classification a reasonable application of statistical analysis of text?
JACM 12 473-89 October 1965
14 L B DOYLE
Breaking the cost barrier in automatic classification
AD 636837 July 1966
15 J S EDWARDS
Adaptive man-machine interaction in information retrieval
Ph.D. Dissertation The Moore School of Electrical Engineering University of Pennsylvania December 1967
16 J C GARDIN
Syntol Vol II
Rutgers State University 1965
17 R T GRAUER M MESSIER
An evaluation of Rocchio's clustering algorithm
In Information Storage and Retrieval Cornell University Dept of Computer Science Report No ISR-12 Section VI June 1967
18 G N LANCE W T WILLIAMS
A general theory of classificatory sorting strategies I Hierarchical systems
Computer Journal 9-4 373-80 February 1967
19 G N LANCE W T WILLIAMS
A general theory of classificatory sorting strategies II Clustering systems
Computer Journal 10-3 271-7 November 1967
20 D LEFKOVITZ
File structures for on-line systems
Spartan Books March 1969 See Appendix B
21 D LEFKOVITZ T ANGELL
Experiments in automatic classification
Computer Command and Control Company Report No 85-104-6 to the Office of Naval Research Contract NOnr 4531(00) December 31 1966
22 B LITOFSKY
Utility of automatic classification systems for information storage and retrieval
Doctoral Dissertation University of Pennsylvania 1969
23 R M NEEDHAM
Application of the theory of clumps
Mechanical Translation and Computational Linguistics 8 113-27 1965
24 R M NEEDHAM K S JONES
Keywords and clumps
J Documentation 20 5-15 March 1964
25 N S PRYWES
Browsing in an automated library through remote access
Computer Augmentation of Human Reasoning 105-30 June 1964
26 N S PRYWES
An information center for effective R&D management
Proceedings 2nd Congress on Information Systems Science 109-16 November 1964
27 N S PRYWES
Man-computer problem solving with multilist
Proceedings IEEE 54 1788-1801 December 1966
28 N S PRYWES
Structure and organization of very large data bases
Proceedings Symposium on Critical Factors in Data Management UCLA March 1968
29 J J ROCCHIO JR
Document retrieval systems optimization and evaluation
Doctoral Dissertation Division of Engineering and Applied Physics Harvard University 1966
30 N SAGER
A syntactic analyzer for natural language
Report on the String Analysis Programs Department of Linguistics University of Pennsylvania 1-41 March 1966
31 G SALTON
Scientific report No ISR-11 and No ISR-12
In Information Storage and Retrieval Dept of Computer Science Cornell University June 1966 and June 1967 respectively
32 G SALTON
Content analysis
Paper given at Symposium on Content Analysis University of Pennsylvania November 1967
33 W C B SAYERS
A manual of classification for librarians and bibliographers
Second edition Grafton and Company 1944
34 K S JONES D JACKSON
Current approaches to classification and clump-finding at the Cambridge Language Research Unit
Computer J 29-37 May 1967
35 K S JONES R M NEEDHAM
Automatic term classifications and retrieval
Information Storage Retrieval 4-2 91-100 June 1968
36 M E STEVENS
Automatic indexing: A state of the art report
National Bureau of Standards Monograph 91 1965
37 P S STONE et al
The general inquirer: A computer approach to content analysis
The MIT Press 1966
38 J H WARD JR M E HOOK
Application of a hierarchical grouping procedure to a problem of grouping profiles
Education and Psychological Measurement 23 69-92 1963
39 M S WOLFBERG
Determination of maximally complete subgraphs
The Moore School of Electrical Engineering University of Pennsylvania Report No 65-27 May 1965

Natural language inquiry to an
open-ended data library
by GEORGE W. POTTS

Meta-Language Products, Inc.
New York, New York

INTRODUCTION
New technologies often create an effect similar to
children bragging about their new Christmas toys.
This "me-too-ism," as it exists in the proprietary software community, is unfortunate because it robs credibility from this emerging industry at a very crucial
time. In this paper a new computer system and language, "MUSE,"* is presented with the intention not
to follow this pattern.
"MUSE" could be said to belong to that family of
languages called "non-procedural" in that it is not
necessary to produce a sequential flow of programming
logic to force output. This is a somewhat ambiguous
concept in that "MUSE" does incorporate a capability
for the user to embed "procedures." It is expected that
this (among other features) will enable "MUSE" to be
used as an information system for management as well
as other disciplines that require easy access to and
manipulation of large volumes of data.
In the development of "MUSE" an attempt has been
made to refrain from producing anything of merely
academic interest. It is the child of a very informal
set of circumstances in ~he authors' experience which
has consistently shown the need for an unstructured
dialog between those who have non-routine problems
and the computer that can contribute so much toward
a solution.
BACKGROUND
Among the spectrum of new technologies four seem
to be travelling convergent courses. They are:
-Time-sharing
-Natural programming languages

* "MUSE," an acronym for Machine-User Symbiotic Environment, is a trademark of Meta-Language Products, Inc.

-Data management
-Management information systems
Each of these subjects has had a good bit of exposure
in computer and business publications recently ...
much of it groping. There follow brief discourses, not
explanations, on the above subjects with an attempt
to offer a few preparatory insights. Following that,
there is a description of "MUSE" and a few of its
more interesting features that tie these subjects together.
Time-sharing

Computer time-sharing, although it provides remote
access and real-time capability, is primarily a medium
for nonroutine data processing. Here the interactive
environment is the message. It is not where or when
creative interaction takes place, it is that it take place.
A good bit of controversy exists just because of confusion on this first point. It is not surprising that the
cult of time-sharing purists take issue when a remote
polling capability is called "time-sharing." Polling
algorithms are created to provide routine, low-level
input and inquiry. The advantage of true time-sharing
is creative interaction. The system designers who have
diligently labored to develop a general purpose time-sharing capability must be complimented on their restraint when their work is confused with a reservation
system or message switching.
Another point of confusion regarding time-sharing is
due to a changing optimization emphasis. The entire
orientation of batch processing is to push jobs through
the computer as fast as possible. In time-sharing, as
Dartmouth's Dr. Kemeny likes to point out, the optimization is more for the user. It is very difficult for some
to accept the premise that machine time should be
wasted so that optimum use can be made of the machine-user pair. The motivation of some of the recent
studies of comparative productivity between programming in a time-shared vs. a "batch" environment
appears tenacious in the very same sense. While currently the worst case might be only slight increases in
productivity, the potential of this type of creativity is
huge using time-sharing while with batch it has about
run out.
There are at least three major types of effort that
lend themselves to the creative use of time-sharing:

1. Preprogrammed aids for scientists, engineers, etc.

2. The development and debugging of computer programs for both the batch and time-shared environments.
3. Information systems for inquiry and interpretation.
Notice that only the first of these has been exploited
by the time-sharing vendors. The second use is probably
a major objective of the large computer networks currently under construction. The last use is the reason
for this paper.
Natural programming languages

English or "natural" programming languages have
had a somewhat orderly development from the first
halting steps, using mnemonics for binary machine
codes, through FORTRAN and COBOL to the "natural" languages. It is interesting to note that BASIC,
a language only slightly simpler than FORTRAN, is
used by over 60% of all users of time-sharing.
Languages that had to be debugged in a batch
processing environment developed quirks of form that
have a remarkable staying power in the languages now
being used in time-sharing. Slowly the older languages
are being modified to permit interactive debugging
without eliminating the compiler phase. This permits
the compiled code still to be run in batch. Interpretive
languages have gained popularity commensurate with
the increased use of time-sharing. These are languages
that are not reduced to a machine language level but
rather to encoded forms that are never given machine
control but are interpreted by other operations code.
Interpreted code is usually slower running than compiled code and thus never had much use in batch shops.
Not surprisingly, however, the first interpreted time-sharing languages are very similar to those that were
formerly compiled (e.g. QUIKTRAN). The advantage
of both interpretive languages and interactive compilation is that they make error detection and correction a creative and involving experience.
The current step in language development appears
to be the attempted elimination of all code sequences
except the problem statement itself (e.g. APL).1 With
refinement the user ignores data input/output detail,
compiler directives and any explicit statement of the
sequential flow of logical events within the machine.
Here then is the threshold of real natural language
interaction. As the problem statement can be made to
look more like English, so can the interactive experience
be made more universal.

Data management

Data has been referred to as the fifth economic
factor of production (as vital as land, labor, capital and
management). * It is clear that a great deal of this
data is numeric information and that manipulation of
this data with computers is becoming indispensable in
institutional operations.
The management of this data, or their organization
for quick and easy reference, is a discipline with a
checkered past. It would not be stretching a truth too
far to say that data processing has followed the course
determined by the development of hardware for interfacing with external data files. Fortunately for the
computer industry the first technology of data referencing, sequential access, is most easily applied to
routine data processing (that with the most obvious
cost benefits). The other major mode of data referencing,
direct access, has matured along with the growth of
time-sharing. Now direct access devices are becoming
available with data rates, capacities and reliability far
more suited to nonroutine, unplanned data referencing.
Again, notice the subtle misapplication of this new
potential due to lagging system software development.
Many languages in time-sharing still reference data
files sequentially, even though they are on a direct
access device. This is forgivable when referencing small
private data files, but intolerable if reference is attempted to very large, common data libraries (which,
conveniently, also cannot be referenced with most
languages in time-sharing). It would seem that a large,
open-ended data library that can be directly accessed,
simultaneously, by a large number of time-sharing
simultaneously, by a large number of time-sharing
users would be the answer to most of the logistics
problems of data management.
The only rub is that this structure, which is so well
suited for non-routine data manipulation, is unsuitable
for routine data processing, where throughput rates are
far more critical. A simple solution, then, would be to
have two data structures, one for each orientation. In
most cases, routine data processing would use sequential
access (except where transaction levels are small) and
non-routine data processing would maintain its own
directly-accessed data library. The non-routine environment would accept data from the routine environment
but not, in general, return any new data in the opposite
direction. It is as though there were a semi-permeable
membrane between the two structures. This suggested
solution would probably hold until far cheaper, larger
and faster direct access devices are available.

* I have lost this reference.

? WHAT WERE SALES FOR THE LAST QUARTER OF THIS YEAR?
206.8 MILLION DOLLARS
? WHAT IS THE DEFINITION OF "SALES"?
SALES - THE DOLLAR VALUE OF GOODS SHIPPED TO PURCHASERS BEFORE DISCOUNTS
? HOW DO I GET SALES OF PHYSICAL UNITS?
THE DEFINITION OF SHIPMENTS IS THE NUMBER OF PHYSICAL UNITS SHIPPED TO
PURCHASERS. THIS MIGHT HELP.
? GIVE ME SALES, SHIPMENTS AND ALSO SALES/SHIPMENTS FOR THE LAST FOUR
QUARTERS FOR CONSUMER PRODUCTS IF THEIR % OF MARKET IS GREATER THAN 25
AND THEIR CURRENT UNIT PRICE IS NOT GREATER THAN THEIR UNIT PRICE FOR
1/1/68.
TITLE THIS REPORT "SALES ANALYSIS SHEET"!

Figure 1

Management information systems
Management information systems might be said to
have a checkered present, as there seems to be no
obsolescence of approach. The first attempts, on computers, to provide management with the necessary data
for decision making culminated in "exception reporting." This resulted not because it was the best way of
capturing interesting data from the routine processing
environment, but because it was the easiest. As a
consequence, exception reporting has two major drawbacks:
1. Someone must decide, a priori, how decisions are to
be made and so design exception reports that contain
statistics highlighting this methodology. This is a
dangerous approach because it eliminates flexibility
at the very point it is most needed.
2. To generate enough statistics to cover all eventualities is to generate too much paper to be conveniently read. It is a common experience to see
stacks of exception reports used more as evidence
of concern than as a decision-making tool.
The individual manager should be permitted, in
his own way, to have access to all the data that could
affect his operations, and to be able to form constructions
from this data in the way best suited to the problem at
hand.
The recent proliferation in the marketplace of data
management systems must indicate an intuitive dissatisfaction with the old methodologies. However, it is
not plain that many of them offer a new alternative.

THE SYSTEM
A computer language has now been developed that:
-exists in a time-shared environment
-uses the English sentence form as its basis for
man-machine communication
-incorporates a simple data-capturing tool and
provides reference to a large common data
library

-is oriented toward the creative involvement of
managers and other non-technical types with
the computer
This language is "MUSE." "MUSE" is not just the
"MUSE" language. It is an interactive information
system for non-routine problem solving and creative
decision making.
"MUSE" is not a time-sharing system. It is a subsystem that interfaces with a general purpose time-sharing system. It is built in such a manner for transferability to different time-shared computers of different manufacturers.
"MUSE" can be easily understood in terms of five
functional modules: normal interaction module; teaching module, data loading and maintenance module;
meta-language processor; and report generator. Each
of these modules, in turn, is constructed of submodules,
many of which are used in common by the larger,
functional modules. A description, with examples, of
these modules follows:
Normal interaction module
The "MUSE" language, as it presents itself to the
terminal user, is simple English sentences-questions,
commands and declaratives. These sentences are composed individually or in paragraph form. A small
sample of a dialog is shown to indicate its general
interactive nature (Figure '1).
Questions are used for two main purposes: first, to
recall and manipulate data; and second, to permit free
form queries about "l\rfUSE's" capabilities.
Commands, in general, are the user's control over
the dialog process. This includes modifying sentences
or words, listing of all or part of a dialog, inserting or
removing sentences or words, updating data or definitions, creating new language elements, requesting
report output, etc.
Declaratives are the user's statement of what a report
should contain.
These three types of sentences may be intermixed in

the dialog as the user wishes. However, declaratives
are the only ones preserved when the dialog is recorded.
Dialogs may also reference other dialogs and are identified in the same manner as any other entity in the
system-up to 5 words of 10 alpha characters each.

Figure 2-Simple qualification: "SALES FOR G.M. FOR 1965" (equivalently, "SALES FOR 1965 FOR G.M." or "G.M.'S SALES FOR 1965")

Figure 3-Lists (set creating): "SALES, EARNINGS FOR G.M., FORD FOR 1965"

Figure 4-Class names (universal sets): "SALES FOR ALL COMPANIES FOR 1965"

Figure 5-Arithmetic formulae: "SALES + OTHER INCOME FOR G.M., FORD FOR 1965"

In order to explain the data referencing and manipulative capabilities of "MUSE" there follows a series of
eleven illustrations which represent the relationship
between dialog and output. The text excerpts that produce the output may comprise part of a declarative or
interrogative sentence. To visualize this, preface these
excerpts with "INCLUDE" or "WHAT ARE" respectively. Also the output produced is very stylized
and need not be of three dimensions.
In Figure 2, the word "for", or one of its synonyms, is used
in "MUSE" as an operator to order the qualification
process. This process locates data in a data library of
N dimensions by figuratively intersecting planes passing
through identifier-located points on every necessary
coordinate axis. The intersection of these planes produces a unique disk address for direct data reference.
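
The addressing idea can be made concrete with a small sketch in present-day Python (MUSE itself was not written this way, and the axis names and identifiers below are invented for illustration): one identifier per axis selects a point, and the intersection reduces to a single linear offset on the direct-access device.

# A minimal sketch (not MUSE's actual implementation) of how a
# qualification such as "SALES FOR XEROX FOR 1965" can be turned
# into a unique address in an N-dimensional data library.
# The axis names and sample identifiers here are hypothetical.

AXES = [
    ("item",    ["SALES", "EARNINGS", "SHIPMENTS"]),
    ("company", ["XEROX", "G.M.", "FORD"]),
    ("year",    ["1964", "1965", "1966"]),
]

def address(*identifiers):
    """Intersect one identifier per axis to get a unique linear offset."""
    offset = 0
    for (name, values), ident in zip(AXES, identifiers):
        offset = offset * len(values) + values.index(ident)
    return offset  # e.g., a record number on a direct-access device

print(address("SALES", "XEROX", "1965"))  # one unique data element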
In Figure 3, identifiers of like class may be explicitly
grouped into lists. This normally produces a vector of
data elements on output.
In Figure 4, these lists can be given identifiers and
used implicitly (see Figure 8) or the identifier for a
class of identifiers implies the universal set.

337

VECTOR FUNCTIONS
SALES, EARNINGS FOR G.M. FOR THE AVERAGE OF
1965, 1964, 1963

AVERAGE OF
1965, 1964, 1963

G.M.

EARNINGS
(S)

~--------------~~-----.-

xxxx.x

xx.xx

Figure 7

In Figure 5, normal arithmetic capability is available
within a class of identifiers or between classes (using
parentheses).
In Figure 6, functions are available to transform
individual data elements on a one-for-one basis.

Figure 6-Point functions: "LOG OF SALES FOR G.M., FORD FOR 1965/1964"

Figure 8-Language equivalences:
1 FOR 1 (SYNONYMS)
EQUATE "GENERAL MOTORS INC." WITH "G.M."!
EQUATE "PLUS" WITH "+"!
EQUATE "N.F.C." WITH "NET FOR COMMON"!
1 FOR N (EXPRESSIONS)
EQUATE "AUTO INDUSTRY COMPANIES" WITH "FORD, G.M., CHRYSLER, AMERICAN MOTORS"!
EQUATE "VALUE ADDED" WITH "SALES - COST OF GOODS SOLD"!
N FOR N (PHRASE STRUCTURES)
EQUATE "ADD EXPRESSION TO EXPRESSION" WITH "EXPRESSION + EXPRESSION"!
EQUATE "DEFINE EXPRESSION AS EXPRESSION" WITH "EQUATE EXPRESSION WITH EXPRESSION"!

Figure 9-Set expressions: "SALES FOR THE AUTO INDUSTRY COMPANIES WHICH ARE IN THE FORTUNE TOP 10 FOR 1965"

Figure 11-Qualification clause (including Boolean operator): "VALUE ADDED FOR 1966, 1967 FOR ALL COMPANIES WHOSE SALES FOR 1965 > SALES FOR 1966 AND EARNINGS FOR 1967 < $10.00"

In Figure 7, functions are available to reduce a vector
of values to a parameter.
In Figure 8, new identifiers or identifier groupings
can be equated with old identifiers or identifier/operator
groupings.
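
A hedged sketch of such language equivalences, in Python with invented entries rather than MUSE's dictionary machinery, shows how 1-for-1 synonyms and 1-for-N expressions can both be handled as rewriting:

# A minimal sketch, not MUSE's code: a dictionary of user-defined
# equivalences applied as a rewriting pass over an input sentence.
# All entries here are illustrative.

EQUIVALENCES = {
    "GENERAL MOTORS INC.": "G.M.",             # 1 for 1 (synonym)
    "PLUS": "+",                               # 1 for 1
    "AUTO INDUSTRY COMPANIES":                 # 1 for N (expression)
        "FORD, G.M., CHRYSLER, AMERICAN MOTORS",
    "VALUE ADDED": "SALES - COST OF GOODS SOLD",
}

def expand(sentence: str) -> str:
    """Rewrite until no equivalence applies (longest keys first)."""
    changed = True
    while changed:
        changed = False
        for key in sorted(EQUIVALENCES, key=len, reverse=True):
            if key in sentence:
                sentence = sentence.replace(key, EQUIVALENCES[key])
                changed = True
    return sentence

print(expand("VALUE ADDED FOR THE AUTO INDUSTRY COMPANIES"))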

In Figure 9, explicit or implicit lists can be combined
using standard Venn criteria.
In Figure 10, explicit or implicit lists can be culled
for members which, with further qualification, meet a
relational criterion.
In Figure 11, relational tests may be combined with
Boolean operators.
Figure 10-Simple qualification clause (with relational operator): "SALES, EARNINGS FOR 1966 FOR ALL COMPANIES WHOSE SALES FOR 1965 >= $100 MILLION"

Figure 12-Assignment: "PROFIT FOR 1970 FOR THE PARTS DIVISION IF THE TURNOVER IS .5%"

And in Figure 12, identifiers that exist without definitions (variables) in dialogs or language equivalences
may be assigned values (symbolic or numeric) with a
form of the verb "to be."
Syntax analysis in "MUSE" primarily checks operator nesting structures, juxtaposition of language tokens
and parenthesis balancing.
With "MUSE," then, a satisfactory data referencing
algorithm seems to be available within a sentence form
that seems "natural" to its user. The latitude of possible
forms is expanded by inserting language equivalences
into the dictionary.
Teaching module

"MUSE" permits a question and answer interaction
that bears no relation to data or calculations. This is
an unstructured dialog where information is requested
about how "MUSE" works. The system responds with
answers that also suggest other questions. In such a
manner, the user picks up the "MUSE" technique at
his own speed.
This feature's first implementation is very similar in
approach to that taken in the HELP system developed
at Berkeley.2 It analyzes how, when and why questions
with a key word scheme and produces prewritten
answers to anticipated questions or question sequences.
A sample of this form of dialog is provided (Figure 13).
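
The flavor of such a key-word scheme can be sketched as follows (Python; the key words and canned answers are invented, not the HELP system's or MUSE's actual tables):

# A sketch in the spirit of the key-word teaching module described
# above; all names and answer text are invented here.

ANSWERS = {
    ("how", "question"): "A QUESTION IS A SENTENCE ENDING IN A QUESTION MARK (?).",
    ("how", "command"):  "A COMMAND IS A SENTENCE ENDING IN AN EXCLAMATION POINT (!).",
}

def teach(query: str) -> str:
    words = set(query.lower().replace("?", "").split())
    for keys, answer in ANSWERS.items():
        if all(k in words for k in keys):  # simple key-word match
            return answer
    return "TRY ASKING HOW, WHEN, OR WHY ABOUT A SPECIFIC FEATURE."

print(teach("HOW DO I TYPE A COMMAND?"))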


?HOW DO I BEGIN USING "MUSE"?
YOU MAY TYPE QUESTIONS, COMMANDS OR DECLARATIVE SENTENCES ON
YOUR TERMINAL WHENEVER YOU SEE A QUESTION MARK (?) AT THE FAR LEFT.
?WHAT IS A QUESTION?
A QUESTION IS A SENTENCE ENDING IN A QUESTION MARK (?), AS THE ONE YOU
JUST TYPED, WHICH REQUESTS IMMEDIATE INFORMATION FROM THE COMPUTER.
THERE ARE TWO TYPES OF QUESTIONS ... FORMAL ONES AND INFORMAL ONES.
?HOW IS A COMMAND DIFFERENT FROM A QUESTION?
A COMMAND IS A SENTENCE ENDING IN AN EXCLAMATION POINT (!)

Figure 13

2. IS A PHYSICAL INPUT REQUIRED? YES
2.1 ON WHAT DEVICE? TAPE
2.1.1 HOW IS THE INPUT ENCODED? BCD
2.1.1.1 WHAT IS THE PARITY? YES
PLEASE ANSWER FROM AMONG THE FOLLOWING: ODD, EVEN
2.1.1.1 WHAT IS THE PARITY? EVEN
2.1.2 ARE THERE INITIAL (HEADER) RECORDS TO BE SKIPPED? YES
2.1.2.1 HOW MANY PHYSICAL RECORDS ARE TO BE SKIPPED? 1
2.1.2.2 MAXIMUM SIZE OF SKIPPED RECORD(S)? 150
2.1.3 PHYSICAL RECORD SIZE IS? 150
2.1.4 LOGICAL RECORD SIZE IS? 300
2.2 IS PHYSICAL INPUT REQUIRED FOR DATA INPUT? NO
2.3 IS DATA STRUCTURE (ALL/PART) DERIVED FROM INPUT? YES

Figure 14

Data-loading and maintenance module

This module could easily (and probably will) be the
subject of an entire paper. The idea that there can be
a generalized interface to the real world of data input
is the dipole of the earlier concept of generalized data
output with RPGs.
Data loading in the "MUSE" system is accomplished
by first engaging in a unique question and answer
dialog. Throughout this interchange the system is asking
the questions and the user is providing the answers.

An excerpt of this form of dialog is provided
(Figure 14).
These systems queries must obviously be answered
by persons familiar with data processing. They will be
passing on to the "MUSE" system information regarding the physical form of the data file, its logical
form, the identifiers used to reference the data and
other attributes of the data. This is, in effect, the bulk
of the documentation normally associated with every
batch processing data file.
This dialog is used to actually start the loading
process and convert the data into the form used by the
rest of the "MUSE" system. These dialogs are preserved
for the purpose of updating with similar data files.
The only assumption made by this data-loading
mechanism is that the data is in computer acceptable
form and that it has been formatted for data processing
(as opposed to typography, for instance).

Meta-language processor

Figure 15-Excerpt of the report produced from the declarative dialog: SALES, SHIPMENTS and SALES/SHIPMENTS ($/FL. OZ.) by quarter for DEODORANT, TOOTHPASTE and HAIR SPRAY

Report generator

The report reproduced (Figure 15) is part of the one
developed from the declarative sentences displayed under Normal interaction module. Notice the symmetric
structure reflecting the dimensional ordering of output.
The "for" or qualifying operator has served not only
to arrange the output but also to unravel the sequence
of references to the data library.
The "MUSE" user, you may have noticed, has made
no statement as to what is to go where in the report,
how headers are to be displayed, what units are displayed, or even where the decimal point is to be placed.
These were all developed, by "MUSE," from the
dialog. There is no user intervention required to organize
and produce a report from a declarative dialog. He can,
however, if it be necessary, change the units and scaling
of data, rotate the report axis, remove or insert extra
spacing and output the report to a variety of devices.
The reason why there is such a degree of initiative
on the part of the system in formatting output is that
"MUSE" was designed with the assumption that it is
to be used for non-routine data processing. Doing this
has caused a reversal in the normal temperament of
programming. Now the language is far more artistic
and the output far more functional.
BUILDING BLOCK CONCEPTS
The creation of "MUSE" unequivocally depended
upon the refinem~nt of, and the belief in a small set of
fundamentals. They are given here to present another
lens with which to inspect the system:
Primary data

The "MUSE" system is designed to operate most
effectively entirely with stored primary data. This is
defined as source data, or sampling data, or data to
which the construction key has been lost. It is the
opposite of constructed or secondary data which are

arithmetic combinations of two or more primary data
elements. This does not mean that secondary data is
not available through "MUSE." It just means that it
is not stored. What are also stored are the declarative
procedures that can reference primary data, combine
it arithmetically, and display it as though it had been
stored.
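
A minimal sketch of the idea, with invented series names (Python, not MUSE's storage format): only the primary series are stored, and the declarative definition is evaluated whenever the secondary series is referenced.

# Only primary series are stored; secondary series are stored as
# declarative definitions and computed on reference. All names and
# numbers here are illustrative.

PRIMARY = {
    "SALES":              [100.0, 110.0, 121.0],
    "COST OF GOODS SOLD": [ 60.0,  63.0,  70.0],
}
DEFINITIONS = {
    "VALUE ADDED": ("SALES", "-", "COST OF GOODS SOLD"),
}

def series(name):
    if name in PRIMARY:
        return PRIMARY[name]
    a, op, b = DEFINITIONS[name]          # the declarative procedure
    xs, ys = series(a), series(b)
    return [x - y if op == "-" else x + y for x, y in zip(xs, ys)]

print(series("VALUE ADDED"))  # computed as though it had been stored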
The advantages of this approach are as follows:
-The primary data together with declarative procedures take far less direct-access storage than
the combination of primary and secondary data.
-There is no complex back indexing necessary to
adjust primary data given changes in secondary
data.
-There is no chance of inconsistency of result if
the primary data is updated without the secondary.
-The user has far more flexibility in changing the
construction of secondary data elements, or
creating new ones. In fact, he may even remove
entire declarative constructions without performing violence on the system.
The dimensionality of data

The realization that most numeric data, as it is
organized for data processing, can assume, in this
organization, a logical structure similar to a regular
N-dimensional array has been fundamental in the design
of "MUSE."
What has tended to obscure this point has been the
great attention given to physical data structures. These
physical structures become laborious due to the storing
of textual information along with numeric, the great
disparity between the size and number of the vector
coordinates of the logical structure, and the size and
dimension constraints of the physical storage medium.
The advantages of this logical form of data structure
are:
-There is much less indexing information necessary to permit random retrieval. So much less
that this indexing information can be stored on
faster storage (e.g., drum vs. disk).
-This, of course, allows much faster accessing of
data.
-It greatly simplifies the language needed for data
referencing. For example, the qualification sequence "SALES FOR XEROX FOR 1965" is
all that is needed to delimit a unique data
element.
-It permits the direct loading and interfacing
with the current data library of virtually any
new data file.


Bootstrapping language

"MUSE" is an extendable language. When the
"MUSE" system is installed it has a dictionary, with
definitions, of approximately 250 entries. This dictionary is expanded as a result of two classes of activity:
1. The loading of data and the resultant adding of
new identifiers, definitions, secondary data constructions, and identifier groupings.
2. The inserting into the dictionary of synonyms and
more complex language equivalences.
The concept of a bootstrapping language is interesting
because:
-The dictionary is almost entirely user-built.
-It provides the user the opportunity to communicate in his own idiom.
-The multitude of possible language forms can
make the system seem very "forgiving."
-It permits both verbose and shorthand notation.
Information about information

In any single arithmetic computation there is really
more than one thing going on. Not only are numbers
being combined but also the attributes of numbers are
(or should be) similarly resolved. The development of
"MUSE" has taken into account this parallelism of
calculation and provides the following levels of computation with every simple arithmetic operation:
-The scaling of both numbers is combined to
provide the scaling of the answer.
-A count of significant digits for output is maintained.
-The units of both values are compared or combined to provide the answer units.
-Attempts are made to preserve the integer form
of any numbers.
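
This parallel computation can be sketched as follows (Python; the combination rules shown are simplified assumptions, not MUSE's actual ones):

# Carrying "information about information" through one arithmetic
# operation: value, scale, units, and significant digits travel
# together. All sample values are invented.

from dataclasses import dataclass

@dataclass
class Datum:
    value: float
    scale: int        # power of ten, e.g., 3 for "thousands"
    units: str        # e.g., "$"
    sig_digits: int

def divide(a: Datum, b: Datum) -> Datum:
    return Datum(
        value=a.value / b.value,
        scale=a.scale - b.scale,                     # scales combine
        units=f"{a.units}/{b.units}",                # units combine
        sig_digits=min(a.sig_digits, b.sig_digits),  # precision is limited
    )

sales     = Datum(921.6, 3, "$", 4)
shipments = Datum(5421.2, 3, "FL.OZ.", 4)
print(divide(sales, shipments))  # answer carries $/FL.OZ., scale 0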
Human engineering

In the design of "MUSE" effort has been expended
to make it simpatico with its user. If non-technicians
are to be brought into dialog with a computer they
must be appreciated for what they are, not what they
might be. The adjustment must be made in the electronics and not the emotion. The following are some of
"MUSE's" features that were motivated by the above
realizations:
-The translation process takes place in a series of
levels. Each one notes different user errors and
permits correction of these errors. This is not
precisely the same as incremental compilation,
which translates, completely, one statement at a
time. In "MUSE," each statement goes through
functional phases of translation which, in some
cases, may be separated in time.
-"MUSE" provides full line, word, and character
editing capabilities at all levels of the dialog
process.
-The syntax of commands permits construction
of user requests which expand on the standard
VERB-OBJECT form.
-The "MUSE" system from time to time will
give suggestions or warnings to the user. These
are not errors, but potential errors.
-The "MUSE" system uses the Teletype character set in a standard way. "$" is meant to
mean "dollars" and not some special notation
to help the system designers through a rough
spot.
-Definitions are maintained for every language
token in the system. These definitions can be
recalled at will by the user. The entire dictionary, with definitions, is also available.
Efficient systems architecture

The "l\1:USE" system has incorporated many efficiencies of structure:
-Modular system construction is used for across-computer implementation and ease of upgrading
capabilities.
-Assembly language coding permits economies of
size and speed.
-The "MUSE" dialogs are, in fact, applications
programs for the sake of reproducibility of
output.
-As a subsystem, "MUSE" borrows features of
the time-sharing system which condenses its
size.
-Advanced system programming techniques are
used throughout.
CONCLUSION
The general objective of "MUSE" has been to enhance the time-shared computer environment by providing a natural language for machine-user communication. It is designed to provide the manager with a
medium of interaction with a large common data base,
loaded and maintained by the "MUSE" system.
"MUSE" is capable of performing as a simulation
model builder, statistical tool, data screening and sorting aid, and report organizer for non-routine, creative
use. It is not particularly intended to handle routine
data processing problems requiring high-resolution data
and non-symmetric, highly formatted output such as
invoicing, payroll, billing, accounts payable and the
like. This is why it is created in a time-shared environment where the machine-user interaction, and not the
material flow, is the focus of optimization.
ACKNOWLEDGMENTS
Many thanks are given to David Homrighausen, Peter
Kaiser, Harold Kurfehs and Susan Mazuy of Meta-

Language Products, Inc. for their assistance
paring this paper.

REFERENCES
1 K E IVERSON
A programming language
John Wiley & Sons Inc New York New York 1962
2 C S CARR
Help-An on line computer information system
Document No 40 20 30 Office of Secretary of Defense
Advanced Research Projects Agency Washington DC
January 19 1966

Computer instruction repertoire-Time for a change
by CHARLES C. CHURCH
Litton Systems, Inc.
Van Nuys, California

INTRODUCTION


A famous slogan a few years ago from a famous computer manufacturer was a single, simple word-THINK.
And we in the computer profession must have taken
this slogan to heart. One look at the voluminous material printed year after year on computer technology
since the advent of the computer age-a quarter of a
century ago-will convince anyone that a lot of thinking
concerning computers has preceded us. The proceedings
for this conference alone will cover two thick volumes,
if we can judge from previous experience.
I hardly need mention the technological change in
computer system hardware, sizes, speed, weight, cost,
languages and other similar changes that have been
wrought from those primitive days twenty-five years
ago. But some things have not altered, or only slightly
so; like the Model T, they have remained invariant,
upright, slow, inefficient and immune to the winds of
progressive technological improvement. And if they did
get us from point A to point B, if not in comfort, at
least we got there. I am referring to the basic form and
rigid format of our instruction sets. I deliberately said
our instruction sets because we have all inherited them
from one original source.
With all due respect for our heritage, I think it is
time to rethink our requirements for an adequate instruction repertoire.


THE PROBLEM:
The symptoms of the problem are obvious:
• Programming costs are high and increasing.
• Program implementation times are long.
• Program modification is difficult and time consuming.
• Hardware systems are getting larger and larger.

It is legitimate to blame many of these difficulties
on the increased complexity of problems that computers
are being asked to solve. It is legitimate, but not
helpful. Many of these problems could be alleviated if
we modified our computer tool so that its capabilities
more closely matched the job that needs to be done.
Following this logic, we did a study at Litton to
roughly determine how the current machine instructions
were being used. Tables I and II show some of our
results.

TABLE I-Instruction Occurrence by Instruction Class

Instructions      Percent      Total
Arithmetic          8.3        1,146
Data Moves         39.4        5,437
Logic               2.7          372
Shifts              2.3          312
Transfers          23.9        3,303
Jumps              12.0        1,655
I/O                 0.7           99
Miscellaneous      10.7        1,480
Total                         13,804

TABLE II-Instruction Execution by Instruction Class

Instructions     Percent      Total    Program Monitor    Navigation Program    Track Correlation Program
Arithmetic         10.5     100,206          177               12,370                  87,659
Data Moves         38.4     369,645        6,035               21,228                 342,382
Logic               1.4      14,064           66                3,160                  10,838
Shifts              1.3      12,642            1                1,490                  11,151
Transfers          26.6     256,589        4,172               18,601                 233,816
Jumps              17.1     165,121        3,893                8,668                 152,560
I/O                 0.1         803           97                  706                       0
Miscellaneous       4.6      44,387          410                4,780                  39,197
Total             100.0     963,457       14,851               71,003                 877,603


The high correlation between instruction occurrence
and execution was interesting. Of real significance,
however, was the small percentage of the repertoire
devoted to accomplishing the job, as compared to the
large percentage devoted to placating the computer's whims.
In instruction occurrence we found arithmetic 8.3
percent and jumps 12 percent. What are we doing with
the rest of the commands? Obviously, we need the
"Data Moves" function, but do flow charts call for
anything near 40 percent? And what of the transfers?
My flow charts do not call for anything near 23 percent
of the problem to be involved in transferring.
It becomes obvious that the majority of the programmer's required work is involved with placating
the computer, while a relatively small remaining portion
of the work is actually applied to the job.
What is the problem? Basically, it's that current
machines have machine-oriented instructions. We must
design computers that are more problem-oriented.

TWO SOLUTIONS TO THE PROBLEM
There are two major areas that must be attacked if
a job-oriented computer is ever to exist:
a. Excessive Editing*
b. Fixed Length Instructions
Litton has been experimenting with solutions to these
problem areas, and as a result has developed a technique of Automatic Editing for the first area, and a
system of Variable Length Commands for the second.
These Litton techniques are described in subsections
which follow.
Excessive editing

Excessive editing largely results from two basic and
common weaknesses in modern computers:
a. Inflexible word or character addressing
b. Excessive register orientation
Data fields do not align to the word or character
boundary, and consequently a large portion of the
program is devoted to converting input data into intermediate formats. This approach results in several undesirable conditions. The first, and most obvious, is
that intermediate storage is required, and instructions

* Editing is that work required to ready a data field for use. For
example, we want to add a memory field to a data register. The
field starts at bit 2 and is 14 bits long. The work required to
convert the field to a data format that can be added to the
register is defined as editing.

a. Increased memory requirements
b. Decreased throughput speed

Automatic editing

The negative effects of excessive editing can be dramatically reduced by the following computer features:
a. Address to the bit.
b. Automatic crossing of word boundaries.
c. Automatic editing for memory-to-memory operations.
d. Automatic editing for register operations.

Address to the bit and automatic crossing of word
boundaries combine to provide field addressing. It is
perfectly legitimate for a 33-bit field to start on bit 17
and cross a word boundary, and it is past the time that
computer designers should recognize this. Field addressing negates the necessity for most intermediate
storage (this is not meant to include tables, such as
track stores). In addition, the majority of the LOAD,
AND, OR, and SHIFT commands are no longer
needed. This results in reduced memory requirements
and increased execution speed.
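
Field addressing can be pictured with a short sketch (a Python model over 32-bit words; the word size and helper name are assumptions for illustration, not Litton's design):

# Field addressing "to the bit" with automatic word-boundary
# crossing, modeled over 32-bit words.

WORD = 32

def read_field(memory, bit_addr, width):
    """Read `width` bits starting at absolute bit address `bit_addr`,
    crossing word boundaries as needed (e.g., a 33-bit field at bit 17)."""
    value = 0
    for i in range(width):
        b = bit_addr + i
        word, off = divmod(b, WORD)
        bit = (memory[word] >> (WORD - 1 - off)) & 1  # bits numbered from the left
        value = (value << 1) | bit
    return value

mem = [0x12345678, 0x9ABCDEF0]
print(hex(read_field(mem, 17, 33)))  # a 33-bit field starting at bit 17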
Automatic editing for memory-to-memory operations
would be obvious if it were generally recognized that
memory-to-memory operations were required. Up to
now memory-to-memory operations have been word or
character oriented and generally slow. Memory-to-memory operations don't have to be ineffectual, but
it is possible to design them that way. Addressing must
be to the field. Memory move, or compare commands,


Figure 1-Problem 1: Variable length field reformatting, Format 1. Given three words with the indicated fields starting at location A, reformat the fields into four words at location B.

Figure 3-A signed 13-bit memory field (SYYYYYYYYY.XXX) and the process register to which it is to be added

must look at memory words in parallel, not some
inconsequential subportion in sequence. In this way
these commands can become effective and the programmer can forget the time-consuming memory juggling of the registers.
To illustrate these points we will use two reformatting
problems (Figures 1 and 2). We coded these problems
on a conventional processor and on Litton's new Polyprocessor. The conventional processor uses its registers
for editing, while the Polyprocessor uses memory-to-memory commands with automatic editing.
Problems 1 and 2 contain exactly the same fields.
The difference in the two problems is that Problem
One (Figure 1) has no word boundary crossovers whereas Problem Two (Figure 2) has two boundary crossovers. The added complexity of Problem Two imposes
on the conventional processor a requirement for 14
more instructions, an increase of almost 50 percent.
In contrast, the number of instructions required by the
Polyprocessor remained the same, and the only added
operations were two extra memory accesses.
In these problems the Polyprocessor does well in the
important aspects of memory required and execution
speed, while the memory for the conventional processor
is dramatically larger (124 versus 61 bytes, and 180 versus 61). The reason Litton's Polyprocessor executes this
much faster is simply that it does not access memory as
often.
The requirement for automatic editing for register
operations is not as obvious now that the data formatting functions have been removed from the registers.
In Litton's Polyprocessor the registers are basically
used for:
a. Arithmetic computations
b. Indexing
The arithmetic area requires substantial automatic
editing; however, in moving data to and from the
registers the field capability removes the necessity for
editing. It is now possible to say "Add the 13-bit field
that starts at bit 23 to the process register."
The arithmetic process should have automatic alignment and overflow handling when required. For example, assume
that we want to add a signed 13-bit field to a process
register (Figure 3).
The ADD command will:
a. Isolate and edit the 13-bit memory field into register format (Figure 4).
b. Align the two arithmetic fields (Figure 5).

c. Add the two fields. If the result overflows, it will be
shifted to the right one place so that it will fit.
d. Return the result to the register.

Figure 2-Problem 2: Variable length field reformatting, Format 2. Given three words at location BLK1, reformat the fields into three words at location BLK2.

Figure 4-The signed 13-bit memory field isolated and edited into register format

Figure 5-The two arithmetic fields aligned for the add

In this case automatic editing for register operation
has accomplished several things for us. It has removed
the usually required LOAD, SHIFT, AND, Test Sign
and Jump on Positive, and Set Register Negative commands. It has also removed the "clean up" computation
commands (i.e., testing for overflow and the associated
patch-up commands). Finally, intermediate storage,
and its associated bookkeeping, for values like this is
no longer required because the value is easy to access
within its input stream format.
What do all of these advantages of automatic editing
mean? Very briefly we can say that they will provide
you with:
a. Less programming complexity
b. Less memory required for the problem
c. Higher execution speeds because less memory is accessed

Variable length commands

The concept of fixed length commands is probably
the major reason that computer instruction repertoires
have made so little progress. There is no optimal
command length. Timid recognition of this fact is evident in some of our modern computers, but two or
three command lengths still cannot solve the problem.
For examples, let's consider the following command
functions:
a. Register to register command
b. Memory to register command
c. Memory to memory command
d. Scan characters under table control

Register to register command

This command can be accomplished with one character if the Polish Notation concept is used for register
addressing. If the register were called out it would take
two characters. For example, add register 3 to register 5.

Memory to register command

This command requires 3 to 6 characters depending
upon its implementation. For example, add location
1500 to register 5.

Memory to memory command

This command will probably require 6 to 10 characters. If we move memory to memory the command
will require 2 memory addresses, a count and an
operation code.
Scan characters under table control

Here we want to determine the length of the next
field in a string of characters. The field can be
terminated by many characters, hence a table is required to quickly identify which character will cause
the scan to stop. The required instruction parameters
are:
a. Data address
b. Length of character string
c. Character size
d. Table address
e. Transfer condition
f. Transfer address
g. Optional mask
h. Optional pattern address
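
A sketch of the scan operation (Python; the terminator table and the omission of the transfer and mask parameters are simplifications, not Litton's instruction):

STOP = set(",;. ")  # terminator table: one entry per character class

def scan(data: str, start: int) -> int:
    """Return the length of the field beginning at `start`."""
    i = start
    while i < len(data) and data[i] not in STOP:
        i += 1
    return i - start  # transfer condition/address handling omitted

print(scan("SALES,EARNINGS FOR G.M.", 0))  # -> 5 ("SALES")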

This command is obviously greater than ten characters
in length.
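
To see why the byte counts spread so widely, here is a hedged sketch of possible encodings (Python; the opcodes and field widths are invented for illustration, not the Polyprocessor's actual format):

import struct

def reg_to_reg(op, r1, r2):
    # One byte holds two 4-bit register numbers, plus a one-byte opcode.
    return bytes([op, (r1 << 4) | r2])                         # 2 bytes

def mem_to_reg(op, addr, r):
    return bytes([op, r]) + struct.pack(">I", addr)            # 6 bytes

def mem_to_mem(op, src, dst, count):
    return bytes([op]) + struct.pack(">IIH", src, dst, count)  # 11 bytes

for cmd in (reg_to_reg(0x01, 3, 5),
            mem_to_reg(0x02, 1500, 5),
            mem_to_mem(0x03, 0x1000, 0x2000, 64)):
    print(len(cmd), cmd.hex())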
The wide variation in the memory requirements for
these commands demonstrates why problem-oriented
instruction repertoires have not appeared. The general
cry is that instructions with this range of capability
will cost a fortune. Our experience at Litton says that
this is not true. In fact, because smaller memories are
required, we find that the total systems cost less. The
cost of this type of instruction repertoire compares
very favorably with common computer features, such
as instruction overlap and local memories.
In summary, I can say that we at Litton are convinced that the variable length instruction is required
if the computer instructions are to be problem-oriented.
Further, we believe that this problem-oriented concept
must be pursued, in the face of programming response
time and implementation costs which have risen so
high that they must be reduced. As a consequence of
these convictions, variable length instructions have
been included in Litton's new Polyprocessor.
CONCEPT VERIFICATION
To assure ourselves that we were moving in the right
direction, three bench mark problems, which were previously programmed for a conventional computer, were
recoded in the proposed Litton Polyprocessor instruction set. Coding for both machines was in assembly
language. Also, a comparison for each problem was
made as to the number and type of instructions, number
of memory accesses, program memory requirements
including tables, and time of execution.
The execution times for the Polyprocessor were of
necessity estimated (hopefully on the conservative
side) with the best information available to us at the
time the study was done. The conventional computer
has a 1.0-microsecond memory. The Polyprocessor has
a 0.5-microsecond memory.
Our conclusion (see Tables III and IV) was that in
all cases the Polyprocessor could accomplish the same
result with fewer instructions and a diminished program
space requirement. You will note, in particular, the
complete lack of shift, logical, and move type instructions with the Polyprocessor instruction set, and the
reduction in the computer "setup" instructions which
are grouped under the heading of Other. The memory
space saving is most apparent when comparing the
space allocated to the tables.

Bench mark program 1

There are 512 different random sets of coordinate
information. Half of these sets contain polar coordinates, and the other half contain rectangular coordinates. Problem 1 is in two parts, a and b:
Problem 1a: Convert the 256 polar coordinates to
rectangular coordinates using the following formulas:

X = R cos θ
Y = R sin θ
Problem 1b: Convert the 256 rectangular coordinates
to polar coordinates using the following formulas:
R = √(X² + Y²)
θ = arctan (X/Y)
A minimum end accuracy of 12 bits plus sign per
coordinate position is maintained throughout the problem. Double precision arithmetic is used whenever
necessary.
The parameters shown in Table V are used by both
problem 1a and problem 1b.
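
Stated directly in present-day terms (Python, not the benchmark's assembly code; note that the benchmark's arctan(X/Y) convention corresponds to atan2(x, y)):

import math

def polar_to_rect(r, theta):
    return r * math.cos(theta), r * math.sin(theta)

def rect_to_polar(x, y):
    return math.hypot(x, y), math.atan2(x, y)  # theta = arctan(X/Y)

print(rect_to_polar(*polar_to_rect(100.0, 0.5)))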
Problem 1: coding results

Tables VI and VII display the number of instructions
and other data required of each computer program to
accomplish Bench Mark Program 1.

TABLE III-Program Summary

                                          Conventional Computer    Polyprocessor    Percentage
Instruction Total                                  242                  100             41.0
Total Memory Required in Bytes
  Actual                                        20,852               10,182             48.8
  If Tables Optimally Compact                   11,968               10,182             85.1
Time (Microseconds)                           85,024.4             22,460.4             26.4

TABLE IV-Instruction Summary

                 Conventional Computer    Polyprocessor    Percentage
Loads                     77                   50              64.9
Stores                    11                    8              72.7
Shifts                    28                    0               0
Arithmetics               60                   39              65.0
Logical                    9                    0               0
Others                    57                   16              28.1

TABLE V-Program 1: Parameters

Field    Description                 No. of Bits    Parameter Range                      Input or Output
X        Rectangular X Coordinate        13         ±640 RM (1 Radar Mile = 2000 yds)    Both
Y        Rectangular Y Coordinate        13         ±640 RM                              Both
R        Polar Range                     12                                              Both
θ        Polar Angle                     12         0 to 359                             Both

TABLE VI-Program 1: Summary

                                          Conventional Processor    Polyprocessor
Instructions                                       116                   34
Memory Required in Bytes                           464                   85
Actual Constants and Tables                      8,260                3,200
Optimally Compact Constants and Tables           4,164                3,200
Time (microseconds)                                301

TABLE VII-Program 1: Instruction Summary

Computer                  Loads    Stores    Shifts    Arithmetics    Logical    Other    Totals
Conventional Processor      37        4        15           26            6        26       114
Polyprocessor               20                                                               34

Bench mark problem 2

TABLE VIII-Program 2: Summary

                                          Conventional Processor    Polyprocessor
Instructions                                       107                   46
Memory Required in Bytes                           421                  108
Actual Constants and Tables                      7,212                4,634
Optimally Compact Constants and Tables           5,100                4,634
Time (microseconds)                           77,770.1             20,213.4

TABLE IX-Program 2: Instruction Summary

Computer                  Loads    Stores    Shifts    Arithmetics    Logical    Other    Totals
Conventional Processor      35        5        13           27            2        25       107
Polyprocessor               17        2         -           23            -         4        46

TABLE X-Program 3: Parameters

Field    Description                    Parameter Range                              Input or Output
X        Position in X                  ±640 RM                                      Input
Y        Position in Y                  ±640 RM                                      Input
FX       Battery Position in X          Actual Weapon System Position in X and Y     Input
FY       Battery Position in Y          Actual Weapon System Position in X and Y     Input
TARG     Friendly or Foe Indicator      0 or 1                                       Input and Output
TIME     Time of hostile to battery     N/A                                          Output
VEL      Velocity of hostile            1875 yards/sec.                              Constant

TABLE XI-Program 3: Summary

                                Conventional Processor    Polyprocessor
Instructions                             21                   20
Memory Required in Bytes                 84                   62
Actual Tables                         4,384                2,093
Optimally Compact Tables              2,704                2,093
Time (microseconds)                 6,952.6              2,115.1

There are 256 distinct tracks in the system. Half of
these tracks are local and the rest are remote. Problem
2 updates each track's ground position and slant range
upon each radar scan.
Each radar scan is subdivided into 20-degree sectors.
All tracks within a sector are linked and are updated
before tracks of another sector are considered.
Track identification for determining the actual tracks
falling within a sector is provided by Track Process.
This is an input array which contains a chain that
links tracks in geographical proximity.
All of the 256 track files are in the same format.
Problem 2 determines the output parameters using the
following formulas:

Xg = Xpg + ẊT(p − T) + 1/8(ΔT · ΔA)
Yg = Ypg + ẎT(p − T) + 1/8(ΔT · ΔA)
Xs = (Xg² + Yg² + H²)(Xg)/(Xg² + Yg²)
Ys = (Xg² + Yg² + H²)(Yg)/(Xg² + Yg²)
Problelll 2: coding results

Tables VIII and IX display data required for each
computer program to accomplish Bench Mark Program 2.
Bench mark program 3
There are 256 distinct tracks in the system. Sixty-four
of the tracks are hostile and are randomly dispersed
among the remaining 192 friendly tracks. If a track is
hostile, the designation time of the hostile to a friendly
battery is computed. Table X presents the track parameters. All of the 256 track files are in the same
format.
Problem 3 determines if a track is hostile. This is
the case when the TARG friendly-or-foe parameter

is non-zero. Then the flight time of the hostile track to
the battery is computed by the following formula:

TIME = √((X − FX)² + (Y − FY)²) / VEL
The TIME and battery position are stored in the
target array. If the track is friendly, no action is to be
taken.
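
The computation, stated directly (Python; the velocity constant is the 1875 yards/sec of Table X):

import math

def flight_time(x, y, fx, fy, vel=1875.0):  # vel in yards/sec
    """Flight time of a hostile track to the battery position."""
    return math.hypot(x - fx, y - fy) / vel

print(flight_time(1000.0, 2000.0, 0.0, 0.0))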
Problem 3: coding results

Tables XI and XII display the data required of each
computer program to accomplish Bench Mark
Problem 3.

TABLE XII-Program 3: Instruction Summary

Computer                  Loads    Stores    Arithmetics    Logical    Other    Totals
Conventional Processor      5         2           7             1         6       21
Polyprocessor               6         2           7             -         5       20
SUMMARY
As demonstrated by our coding experience, Litton has
found that Automatic Editing and Variable Length
Instructions result in:
a. Dramatic reduction in instructions required to accomplish programming jobs.
b. Dramatic reduction in memory required to hold instructions and data.
c. Substantially increased internal processing speed because fewer memory accesses are required.
Litton has incorporated these features in its Polyprocessor computer. We expect that these features,
along with others, will provide users of our computer
with a substantial increase in capability.
ACKNOWLEDGMENTS
The author wishes to thank Robert D. Bernstein and
the staff of Data Systems Division of Litton Industries
for their assistance in the preparation and editing of this
paper.

The PMS and ISP descriptive systems
for computer structures*
by C. GORDON BELL and ALLEN NEWELL
Carnegie-Mellon University
Pittsburgh, Pennsylvania

INTRODUCTION
In this paper we propose two notations for describing
aspects of computer systems that currently are handled
by a melange of informal notations. These two notations emerged as a by-product of our efforts to produce
a book on computer structures (Bell and Newell, 1970).
Since we feel it is slightly unusual to present notations
per se, outside of the context of particular design or
analysis efforts that use them, it is appropriate to
relate some background history.
The higher levels of computer structure-roughly,
all aspects above the level of logical design-are
becoming increasingly complex and, as a result, developing into serious domains of engineering design.
By serious we mean the growth of techniques of
analysis and synthesis, with a body of codified technique and design experience which professional designers must assimilate. In the present state, most
knowledge about the technologies for computer architecture is embedded in particular studies for particular
computer systems. Nothing exists comparable to the
array of textbooks and systematic treatments of
logical design or circuit design.
We started off simply to produce a set of readings
in computer systems, motivated by this lack of systematic treatment and the inaccessibility of good examples. As we gathered material we became impressed
(depressed is actually a better term) with the diversity

* This paper is taken from Chapters 1 and 2, substantially
compressed and rewritten, of a book, Computer Structures,
Readings and Examples (Bell and Newell, McGraw-Hill, 1970),
which is about to be published. All figures in the paper have been
which is about to be published. All figures in the paper have been
reproduced with the permission of McGraw-Hill. The research in
this paper was supported by the Advanced Research Projects
Agency of the Office of the Secretary of Defense (F 44620-67-C0058) and is monitored by the Air Force Office of Scientific
Research. This document has been approved for public release
and sale; its distribution is unlimited.


of ways of describing these higher levels. The amount
of clumsy description-downright verbosity-even in
purely technical manuals acted as a further depressant.
The thought of putting such a congeries of descriptions
between hard covers for one person to peruse and
study was almost too much to contemplate. We began
to rewrite, and condense many of the descriptions. As
we did so, a set of common notations developed.
Becoming aware of what was happening, we devoted a
substantial amount of attention and effort to creating
notational systems that have some consistency and,
hopefully, some chance of doing the job required.
These are the PMS descriptive system for what we
will call the PMS level of computer structure (essentially the information flow level), and the ISP descriptive system for defining the programming level in
terms of the logic level (actually, the register-transfer
level).
Thus, these two notations were developed to do a
descriptive task-to be able to write down the information now given in the basic machine manual in a
systematic and uniform way for all current computers.
They were to provide a complete defining description
for complete systems, such as the IBM 7090 or the
SDS 930. Hence, the essential constraints for the
notations to satisfy were ones of completeness, flexibility, and brevity (i.e., high informational density).
We think the two notations meet these requirements.
They have not yet been used in a way that meets
additional requirements that we would all put on
notational systems; namely, that there be analysis and
synthesis techniques developed in terms of them. *

* There is currently a thesis in progress establishing a substantial
amount of standard analysis at the PMS level. In addition, there
exists at least one simulation system at the register-transfer level
(Darringer, 1969) that bears a close kinship to ISP. Finally, one
new computer, the DEC PDP-11, reported in this conference
(Bell, et al., 1970), was designed using PMS and ISP as the
working notations.

However, the gains to the computer field simply from
the use of good descriptive notations are immense.
Thus, we think that these two notations should be
put forward to the computer community, both for
criticism and as one concrete proposal for the adoption
of a uniform notation.** The present notations are
quite new and have hardly been thoroughly tested in
use by many people. Thus, they are undoubtedly
imperfect in a number of ways (even beyond the usual
questions of taste in notation, which always prevents
uniform agreement and satisfaction).

** A standards committee might be set up for dealing with these
system levels and their description.
By way of justification let us simply note the many
places where pure descriptions (without analysis or
synthesis techniques) are critical to the operation of
the computer field. The programming manual serves
as the ultimate educational and reference document
for all programmers. Professional papers reporting on
new computing systems give descriptions of the overall
configuration; currently these are done by informal
block diagrams. Each manufacturer adopts descriptive
names of its own choosing, often for marketing purposes,
to describe the components of its systems in sales
brochures-e.g., selector, channel, peripheral processor,
adapter, bus. During negotiations for the purchase or
sale of computer system, overall descriptions (at the
PMS level, as it turns out) are passed between manufacturer and p<5tential customer. Large amounts of
rough performance analyses are based on such abbreviated system descriptions. In. the classroom (and
elsewhere) systems are presented briefly to make
particular points about design features. A user, even
though he knows the order code of a machine, needs to
learn the configuration available at a given installation
(which, again, is a description at the PMS level). The
list could be extended somewhat further, but perhaps
the point is made. There is a substantial need for a
uniform way of describing the upper levels of computer
structures, not just for computer design, but for
innumerable other purposes of marketing, use, comparison, education, etc.
With this preamble, let us describe the two notations. Notations are' not theoretically neutral. That is,
they are based on a particular view of the systems to
be described. Thus, to understand PMS and ISP we
give briefly this view of computer systems. This· material is elementary and known, at least implicitly, to
all computer designers. But it serves to provide the
rationale for the notations and to locate them with
respect to other descriptions of computer systems.
After we have given some of this background, we will
describe, first, PMS and then ISP. The two descriptive
systems have a common base of conventions, but it is
simpler to treat them separately, especially when
being informal. We will use the PDP-8 as an example
for both PMS and ISP, since it is small enough to be
completely described within the confines of this paper.
At the end, in order to give some indication of generality, we will treat briefly the CDC 6600.
Our treatment here of these notations is essentially
informal and heuristic. A complete treatment, as well
as many examples, can be found in the forthcoming
book (Bell and Newell, 1970).
HIERARCHICAL SYSTEM LEVELS
A computer system is complex in several ways.
Figure 1 shows the most important. There are at least
four levels that can be used in describing a computer.
These are not alternative descriptions. Each level
arises from abstraction of the levels below it.
Figure 1-Hierarchy of computer structures (circuit, logic, program, and PMS levels; PMS components: processors/P, memories/M, switches/S, controls/K, transducers/T, data operators/D, links/L)

A system (at any level) is characterized by a set of
components, of which certain properties are posited,
and a set of ways of combining components to produce
systems. When formalized appropriately, the behavior
of the systems is determined by the behavior of its
components and the specific modes of combination
used. Elementary circuit theory is the prototypic
example. The components are R's, L's, C's and voltage
sources. The mode of combination is to run wires
between the terminals of components, which corresponds to an identification of current and voltage at
these terminals. The algebraic and differential equations of circuit theory provide the means whereby the
behavior of a circuit can be computed from the properties of its components and the way the circuit is constructed.
There is a recursive or nested feature to most system
descriptions. A system, composed of components
structured in a given way, may be considered a component in the construction of yet other systems. There
are primitive components whose properties are not
explicable as the resultant of a system of the same
type. For example, a resistor is usually not explained
by a subcircuit, but is taken as a primitive. Sometimes there are no absolute primitives, it being a matter
of convention what basis is taken. For example, one
can build logical design systems from many different
primitives (AND and NOT; NAND; OR and NOT;
etc.).
A system level, as we have used the term in Figure
1, is characterized by a distinct language for representing and analyzing the system (that is, the components, modes of combination, and laws of behavior).
These distinct languages reflect special properties of
the types of components and of the way they combine.
Within each level there exists a whole hierarchy of
systems and subsystems. However, as long as these
are all described in the same language-e.g., a subroutine hierarchy, all given in machine assembly
language-they do not constitute separate system
levels.
The circuit level, and the combinatorial switching
and sequential switching sublevels of the
logic level, are clearly defined in the current art. The
register-transfer level is still uncertain because there is
neither substantial agreement on the exact language
to be used for the level, nor on the techniques of
analysis and synthesis that go with it. However, there
are many reasons to believe it is emerging as a distinct
system level.
In the register-transfer level the system undergoes
discrete operations, whereby the values of various
registers are combined according to some rule, and
then stored in another register (thus "transferred").
The law of combination may be almost anything, from
the simple unmodified transfer (A ← B) to logical
combination (A ← B ∧ C) to arithmetic (A ← B + C).
Thus, a specification of the behavior, equivalent to
the boolean equations of sequential circuits or the
differential equations of the circuit level, is a set of
expressions (often called productions) which give the
conditions under which such transfers will be made.
There have been a number of efforts to construct
formalized register-transfer systems. Most of them
are built around the construction of a programming
system or language that permits computer simulation
of systems on the RT level (e.g., Chu, 1962; Darringer,
1969). Although there is agreement on the basic
components and types of operations, there is much
less agreement on the representation of the laws of
the system.
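To convey the flavor of such productions, here is a minimal simulation sketch in present-day Python (an editorial illustration; the register names and the single production are invented, and no particular RT language is being reproduced):

    # Registers are named cells; each production pairs a condition with a
    # set of transfers that fire together when the condition holds.
    registers = {"A": 5, "B": 3, "GO": 1}

    productions = [
        # New values are computed from the old state, so all transfers in
        # one step take effect "simultaneously".
        (lambda r: r["GO"] == 1,
         lambda r: {"A": r["B"], "B": r["A"], "GO": 0}),
    ]

    def step(r):
        updates = {}
        for condition, action in productions:
            if condition(r):
                updates.update(action(r))
        r.update(updates)

    step(registers)        # exchanges A and B, then drops GO
    print(registers)       # {'A': 3, 'B': 5, 'GO': 0}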
The state system representation is also at the logic
level, but it has been put off to one side in Figure 1.
The state system is the most general representation of
a discrete system available. A system is represented as
capable of being in one of a set of abstract states at
any instant of time. (For digital systems the set is
finite or enumerable.) Its behavior is specified by a
transition function that takes as arguments the current
state and the current input and determines the next
state (and the concomitant output). A digital computer
is, in principle, representable as a state system, but
the number of states is far too large to make it useful
to do so. Instead, the state system becomes a useful
representation in dealing with various subparts of
the total machine, such as the sequential circuit that
controls a magnetic tape. Here the number of states is
small enough to be tractable. Thus, we have placed
state systems off to one side as an auxiliary to the
logic level.
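As a concrete, editorially added illustration of such a tractable subpart, the transition function can be written directly as a table; the states, inputs, and outputs below are invented labels in the Si, Ij, Ol style used in the text:

    # A state system: the table maps (current state, input) to
    # (next state, output), exactly as the transition function demands.
    table = {
        ("S1", "I1"): ("S2", "O1"),
        ("S1", "I2"): ("S1", "O2"),
        ("S2", "I1"): ("S1", "O2"),
        ("S2", "I2"): ("S2", "O1"),
    }

    state = "S1"
    for inp in ("I1", "I2", "I1"):
        state, output = table[(state, inp)]
        print(state, output)    # S2 O1, then S2 O1, then S1 O2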
The program level is not only a unique level of
description for digital technology (as was the logic
level), but it is uniquely associated with computers,
namely, with those digital devices that have a central
component that interprets a programming language.
There are many uses of digital technology, especially
in instrumentation and digital controls which do not
require such an interpretation device and hence have
a logic level but no program level.
The components of the program level are a set of
memories and a set of operations. The memories hold
data structures which represent things both inside
and outside of the memory, e.g., numbers, payrolls,
molecules, other data structures, etc. The operations
take various data structures as inputs and produce
new data structures, which again reside in memories.
Thus the behavior of the system is the time pattern
of data structures held in its memories. The unique
feature of the program level is the representation it
provides for combining components-that is, for
specifying what operations are to be executed on what
data structures. This is the program, which consists of a sequence of instructions. Each instruction specifies
that a given operation (or operations) be executed on
specified data structures. Superimposed on this is a
control structure that specifies which instruction is to
be interpreted next. Normally this is done in the order
in which the instructions are given, with jumps out of
sequence specified by branch instructions.
In Figure 1 the top level is called the Processor-Memory-Switch level, or PMS level for short. The
name is not recognized, nor is any other, since the
level exists only informally. Nevertheless, its existence
is hardly in doubt. It is the view one takes of a computer system when one considers only its most aggregate behavior. It then consists of central processors,
core memories, tapes, discs, input/output processors,
communication lines, printers, tape controllers, busses,
Teletypes, scopes, etc. The system is viewed as processing a medium, information, which can be measured in
bits (or digits, characters, words, etc.). Thus the
components have capacities and flow rates as their
operating characteristics. All details of the program
are suppressed, although many gross distinctions of
encoding and information type remain, depending on
the analysis. Thus, one may distinguish program from
data, or file space from resident monitor. One may
remain concerned with the fact that input data is in
alphameric and must be converted into binary, or is in
bit serial and must be converted to bit parallel.
We might characterize this level as the "chemical
engineering view of a digital computer," which likens
it more to a continuous process petroleum distilling
plant than to a place where complex FORTRAN
programs are applied to matrices of data. Indeed, this
system level is more nearly an abstraction from the
logic level than from the program level, since it returns
to a simultaneously operating flow system.
One might question whether there was a distinct
systems level here. In the early days of computers
almost all computer systems could be represented as in
the diagram in MIT's Whirlwind Computer programming manual in Figure 2: the four classic boxes of
memory (storage), control, arithmetic, and input/
output (separated, in the figure). But current time-sharing and multiprocessing systems are orders of magnitude more complex than this, and it is known
that the structure at this level has a determining
influence on system performance. (See the PMS diagram
for the 6600 in Figure 6, by no means the most complex
of current systems.)
With this total view of the various systems levels
we can locate both PMS and ISP. PMS is, of course, a
systems level of its own, namely, the top one. ISP is a
notation for describing the components and modes of
combination of the programming level in terms of the

[Figure 2-Simplified computer block diagram, Whirlwind I (courtesy of M.I.T.). The diagram shows the classic boxes: arithmetic element, control, and storage element, with input and output at either side.]

next level down, i.e., in terms of the register transfer
level. That is, the instructions, operations and interpretation cycle are the defining components of the
programming level and must be given in terms of a
more basic systems level. The programming level
itself consists of programs written in the machine code
of the system. In essence, a register-transfer description
of a processor is an interpreter program for interpreting
the instruction set. The interpreter describes the actual
hardware of the processor. By carefully structuring a
register-transfer description of a processor, instructions
are precisely defined.
Thus, ISP is an interface language. Similarly, interface definitions exist at all levels of a system hierarchy,
e.g., between the circuit level and the logic level.
Normally, it is not necessary to have a special language
for the interface; e.g., one simply writes a circuit
description of an AND-gate. But with the programming
level, it is most useful not to use simply a register
transfer language, but to introduce a special notation
(i.e., ISP). This will become clear when we describe
ISP.
PMS and ISP are also strongly related in that ISP
statements express the behavior of PMS components.
Thus, for every PMS component there are constructs
in ISP that express its behavior; and each ISP statement implies particular PMS structures.
A word should be said about antecedents. The PMS
descriptive system is close to the way we all talk
informally about the top level of computer systems;
no one effort in the environment stands out as a predecessor. Some notations, such as CPU (for central
processing units), have become widespread. We clearly
have assimilated these. Our modifications, such as Pc
instead of CPU, are dictated entirely by the attempt
to build a consistent notation over the whole range of
computer systems. With respect to ISP, we have been
heavily influenced by the work on register transfer
languages.* The one that we used most as a kernel from which to grow ISP was the work of Darringer
and Parnas (Darringer, 1968). In particular, their
decision to work within the framework of ALGOL
suited our own sensibilities, even though the final
version of ISP departs from a sequential algorithmic
language in a number of respects.
PMS LEVEL OF DESCRIPTION
Digital systems are normally characterized as
systems that at any time exist in one of a discrete set
of states, and which undergo discrete changes of state
with time. Nothing is said about what physical state
corresponds to a system state; or the behavior of
components that transform the system from one state
to another. The states are given abstract labels: S1,
S2, .... The transitions are provided by a state-transition table (or state diagram) of the form: if the system
is in state Si and the input is Ij, then the system is
transformed to Sk and evokes output Ol. The "state-system" view captures what is meant by a discrete (or
digital) system. Its disadvantage is its comprehensiveness, which makes it impossible to deal with large
systems because of their immense number of states (of
the order of 10^(10^7) states for a big computer).
Existing digital computers can be viewed as discrete
state systems that are specialized in three ways. First,
the state is realized by a medium, called information,
which is stored in memories. Thus, a processor has all
its states made explicit in a set of registers: an accumulator, an address register, an instruction register,
status register, etc. No permanent information is
kept in digital devices except as encoded in bits or
some other information unit base in a memory. Sequential logic circuits that carry out operations in the
system may have intermediate non-staticized states
(e.g., during a multiply instruction), but these are
only temporary. Second, the current digital computer
systems consist of a small number of discrete subsystems linked together by flows of information. The
natural representation of a digital computer system is
as a graph which has component systems at the nodes
and information flows as branches. This representation
as an information flow network with functionally
specialized nodes is a real specialization. Finally, each
component in a digital system has associated with it a
small number of discrete operations for changing its
own state or the state of neighboring components. The total behavior of the system is built up from the repeated execution of the operations as the conditions for their execution become realized by the results of prior operations.

* We have not been influenced in a direct way by the work of Iverson (Falkoff, Iverson and Sussenguth, 1964) in the sense of patterning our notation after his. Nevertheless, his creation of a full description of the IBM System/360 system in APL stands as an important milestone in moving toward formal descriptions of machines.
To summarize, we want a way of describing a system
of an interconnected set of components, which are
individual devices that have associated with them a set
of operations that work on a medium of information,
measured in bits (or some other base). For the PMS
level we ignore all the fine structure of information
processing and consider a system consisting of components that work on a homogeneous medium called
information. Information comes in packets, called
i-units (for information units) and is measured in bits
(or equivalent units, such as characters). I-units have
the sort of hierarchical structure indicated by the
phrase: a record consists of 300 words; a word consists
of 4 bytes; a byte consists of 8 bits. A record, then,
contains 300 × 4 × 8 = 9600 bits. Each of these
numbers-300, 4, 8-is called a length.
Other than being decomposable into a hierarchy of
factors, i-units have no other structure at the PMS
level. They do have a referent-that is, a meaning. At
the PMS level we are not concerned with what is
referred to, but only with the fact that certain components transform i-units, but do not modify their meaning. These meaning-preserving operations are the most basic information processing operations of all-and provide the basic classification of computer
components.
PMS primitives

There are seven basic component types, each distinguished by the kinds of operations it performs:
Memory, M. A component that holds or stores
information (i.e., i-units) over time. Its operations
are reading i-units out of the memory, and writing
i-units into the memory. Each memory that holds
more than a single i-unit has associated with it an
addressing system by means of which particular
i-units can be designated or selected. A memory can
also be considered as a switch to a number of submemories. The i-units are not changed in any way
by being stored in a memory.
Link, L. A component that transfers information
(i.e., i-units) from one place to another in a computer
system. It has fixed terminals. The operation is
that of transmitting an i-unit (or a sequence of
them) from the component at one terminal to the
component at the other. Again, except for the change
in spatial position, there is no change of any sort in
the i-units.


Control, K. A component that evokes the operations
of other components in the system. All other components are taken to consist of a set of discrete operations, each of which-when evoked-accomplishes
some discrete transformation of state. With the
exception of a processor, P, all other components
are essentially passive and require some other active
agent (a K) to set them into small episodes of activity.
Switch, S. A component that constructs a link
between other components. Each switch has associated with it a set of possible links, and its operations consist of setting some of these links and
breaking others.
Transducer, T. A component that changes the i-unit
used to encode a given meaning (i.e., a given
referent). The change may involve the medium
used to encode the basic bits (e.g., voltage levels to
magnetic flux, or voltage levels to holes in a paper
card) or it may involve the structure of the i-unit
(e.g., bit-serial to bit-parallel). Note that T's are
meaning preserving, but not necessarily information
preserving (in number of bits), since the encodings
of the (invariant) meaning need not be equally
optimal.
Data-operation, D. A component that produces i-units with new meanings. It is this component that accomplishes all the data operations, e.g., arithmetic, logic, shifting, etc.

Processor, P. A component that is capable of interpreting a program in order to execute a sequence of operations. It consists of a set of operations of the types already mentioned-M, L, K, S, T and D-plus the control necessary to obtain instructions from a memory and interpret them as operations to be carried out.

Computer model (in PMS)

Components of the seven types can be connected to
make stored program digital computers, abbreviated by
C. For instance, the classical configuration for a computer is:
C := Mp - Pc - T - X


Here Pc indicates a central processor and Mp a primary
memory, namely, one which is directly accessible from
a P and holds the program for it. T (input/output
device) is a transducer connected to the external
environment, represented by X. (The colon-equals
(:=) indicates that C is the name of what follows to
the right.)
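(As an editorial aside, this structure can also be held in executable form as a small graph, components at the nodes and information links as edges; the attribute vocabulary below is illustrative only:)

    # The classical configuration C := Mp-Pc-T-X as a graph. PMS
    # suppresses program detail; only component types and information
    # flows are represented.
    components = {
        "Mp": {"type": "M", "function": "primary"},
        "Pc": {"type": "P", "function": "central"},
        "T":  {"type": "T"},
        "X":  {"type": "external environment"},
    }
    links = [("Mp", "Pc"), ("Pc", "T"), ("T", "X")]
    print(len(components), "components,", len(links), "links")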
The classic diagrams had four components, since they decomposed the Pc into a control and an arithmetic unit:

[Diagram: Mp-K-T-X, with D attached to K; Pc := the D and K taken together. Heavy lines carry data and instructions; dotted lines signify control.]

where the heavy information carrying lines are for instructions and their data, and the dotted lines signify control. Often logic operations were lumped with control, instead of with data operations-but this no longer seems to be the appropriate way to functionally decompose the system. Now we associate local control of each component with the appropriate component to get:

[Diagram: Mp-Pc-T-X, with dotted control lines running from K(Mp) and K(T) to their components.]

where the heavy lines carry the information in which we are interested, and the dotted lines carry information about when to evoke operations on the respective components. The heavy information carrying lines between K and Mp are instructions. Now, suppressing the K's, then lumping the processor state memory, the data operators, and the control of the data operators and processor state memory to form a central processor, we again get:

Mp-Pc-T-X
Computer systems can be described in PMS at
varying levels of detail. For instance, we did not
write in the links (L's) as separate components. These
would be of interest only if the delays in transmission
were significant to the discussion at hand, or if the
i-units transmitted by the L were different from those
available at its terminals. Similarly, often the encoding
of information into i-units is unimportant; then there
is no reason to show the T's. The same statement
holds for K's-sometimes one wants to show the
locus of control, say when there is one control for

many components, as in a tape controller; but often
this is not of interest. Then, there is no reason to show
K's in a PMS diagram.
As a somewhat different case, it turns out that D's
never occur in PMS diagrams of computers, since in
the present design technology D's occur only as subcomponents of P's. If we were to make PMS-type
diagrams of analog computers, D's would show up extensively as multipliers, summers, integrators, etc. There
would be few memories and variable switches. The
rather large patchboard would be represented as a
very elaborate manually fixed switch.
Components are themselves decomposable into
other components. Thus, most memories are composed
of a switch-the addressing switch-and a number of
submemories. Thus a memory is recursively defined as
either a memory or a switch to other memories. The
decomposition stops with the unit-memory, which is
one that stores only a single i-unit, hence requires no
addressing. Likewise, a switch is often composed of a
cascade of 1-way to n-way switches. For example, the
switch that addresses a word on a multiple-headed
disk might look like:

M.disk := S(random)-S(random)-S(linear)-S(cyclic)-M(word)

The first S(random) selects a specific Ms.disk_drive unit; the second S(random) is a switch with random addressing that selects the head (the platter and side); S(linear) is a switch with linear accessing that selects the track; and S(cyclic) is a switch with cyclic addressing that finally selects the M(word) along the circular recurring track. Note that the switches are realized by differing technologies. The first two S(random)'s are generally electronic (AND-OR gates) with selection times of 10 ~ 100 microseconds, or perhaps electromechanical (relay). The S(linear) is the electromechanical action of a stepping motor or a pneumatically driven arm which holds the read-write heads-the selection time for a new track is 50 ~ 500 milliseconds. Finally, the S(cyclic) is determined by the rotation time of the disk and requires from 16 ~ 60 milliseconds, depending on the speed (3600 ~ 1000 revolutions/minute). This decomposition capability allows us to describe components with varying precision and accuracy.
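(The cascade invites a back-of-the-envelope access-time estimate, added here editorially; the averaging rules are our own simplification of the figures just quoted:)

    # Expected access time of the M.disk cascade: the electronic
    # S(random)'s are negligible; S(linear) contributes a seek and
    # S(cyclic) an average of half a revolution.
    def disk_access_ms(seek_range_ms=(50, 500), rotation_range_ms=(16, 60)):
        seek = sum(seek_range_ms) / 2          # midpoint of the seek range
        half_rev = sum(rotation_range_ms) / 4  # midpoint rotation, halved
        return seek + half_rev

    print(disk_access_ms())    # 294.0 ms, dominated by the seek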
The control element of a computer is often shown as being associated with the processor-but not so the control of a disk or magnetic tape; such a K is often
more complex. When we suppress detail, controls often
disappear from PMS diagrams. Alternatively, when we
agglomerate primitive components (as we did above
when combining Mp and K(Mp) to be just Mp) into
the physically distinct sub-parts of a computer system,
a separate control, K, often appears. The functionally
and physically separate control* has evolved in the
last decade. These controls, often larger than a Pc,
are sometimes computers with stored control programs.
When we decompose such a control there are: data
operations (D) for calculating addresses or for error
detection and error correction data; transducers (T)
for changing logic signal levels and information flow
widths; memory (M) as it is used in D, T, K, and for
buffering; and finally a large control (K) which coordinates the activities of all the other primitives.

* A variety of names for K's are used, e.g., controller, adapter,
selector, interface, buffer multiplexor, etc. Often these names
reflect other functions performed by the device.


The components are named according to the function
they perform and they can be composed of many
different types of components. Thus, a control (K)
must have memory (M) as a subcomponent, and a
memory, M, may have a transducer (T) as well as a
switch (S) as subcomponents. All of these subcomponents, of course, exist to accomplish the total function
of the component, and do not make the component
also some other type. For instance, the M that does a
transduction (T) from voltages on its input wires to
magnetism in its cores and a second transduction from
magnetism to voltages on its output wires does not
thereby become a transducer as far as the total system
functioning is concerned. To the rest of the system
all the M can do is to remember i-units, accepting and
delivering them in the same form (voltages). We
define for each component type both a simple component and a compound component, reflecting in
part the fact that complex subsystems can be put
together to perform a single function from the viewpoint of the total system. For example, a typewriter
may have 4 ~ 6 simple information transduction channels using video, tactile, auditory, and paper information carriers.

PMS notation

Various notational conventions designate specifications for a component, e.g., Mp for a functional classification, and S(cyclic) for a type of switch access
function in the case of rotating memory devices like
drums. There are many other additional specifications
one wants to give. A single general way of providing
additional specifications is used so that if X is a component, we can write:
X(a1:v1; a2:v2; ...)
to indicate that X is further specified by attribute a1 having value v1, attribute a2 having value v2, etc. Each parameter (as we call the pair ai:vi) is well defined independently of what other parameters are given; hence,
there is no significance to the order in which they are
written, or to the number which have to be written.
According to this notation we should have written
M(function:primary) or S(access-function:random)
rather than Mp or S(random). There are conventions
for abbreviating and abstracting parameters to avoid
such a lengthy description. Alternative ways of writing
Mp are:
M(function: primary)   complete specification
M(primary)             drop the attribute, function, since it can be inferred from the value
M.primary              use the value outside the parenthesis, concatenated with a dot
M.p                    use an explicitly given abbreviation, namely, primary/p (only if it is not ambiguous)
Mp                     drop the concatenation marker (the dot), if it is not needed to recover the two parts (all components are given by a single capital letter-here M)

Each of these rules corresponds to a natural tendency
to abbreviate when redundant information is given;
each has as its condition that recovery must be possible.
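(To show that the recovery condition is mechanical, here is an editorial sketch of an expander that maps the abbreviated forms back to the full parameter form; the alias and attribute tables are small invented samples:)

    # Expand "Mp", "M.p", "M.primary", or "M(primary)" back into
    # "M(function:primary)" following the abbreviation rules above.
    aliases = {"p": "primary", "s": "secondary"}
    value_to_attribute = {
        "primary": "function",
        "secondary": "function",
        "random": "access-function",
        "cyclic": "access-function",
    }

    def expand(name):
        component, rest = name[0], name[1:].lstrip(".").strip("()")
        value = aliases.get(rest, rest)            # undo primary/p
        return "%s(%s:%s)" % (component, value_to_attribute[value], value)

    for form in ("Mp", "M.p", "M.primary", "M(primary)", "S(random)"):
        print(expand(form))    # each prints the full specification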
In the full description (Bell and Newell, 1970) each
component is defined and given a large number of
parameters, i.e., attributes with their domain of values.
Throughout, the slash (/) is used to introduce abbreviations and aliases as we go.* Any list of parameters
does not exhaust those aspects of a component that
one might ever conceivably want to talk about. For
instance, there are many quite distinct dimensions for
any component in addition to the information dimension: packaging, physical size, physical location,
energy use, cost, weight, style and color, reliability,
maintainability, etc. Furthermore, each of these
dimensions includes an entire set of parameters, just
as the information dimension breaks out into the set
of parameters illustrated in the figures. Thus the
descriptive system is an open one and new parameters
are definable at any occasion.
The very large number of parameters provides one
of the major challenges to creating a viable scheme to
describe computer systems. We have responded to this
in part by providing automatic ways in which one can
compress the descriptions by appropriate abbreviation
-while still avoiding a highly cryptic encoding of
each separate aspect. Abstraction is another major
area in which some conventions can help to handle
the large numbers of parameters. For instance, one
attribute of a processor is the time taken by its operations. This attribute can be defined with a complex
value:
Pc(operation-times: add: 4 μs, store: 4 μs, load: 4 μs, multiply: 16 μs, ...)

That is, the value is a list of times for each separate operation. One might also give only the range of these numbers; this is done by indicating that the value is a range:

Pc(operation-time: 4 ~ 16 μs).

* There is no difficulty distinguishing this use from the use of
slash as division sign-the latter takes priority, since it is the
more specific use of the slash.


Similarly, one could have given typical and average times (under some assumed frequency mix of instructions):

Pc(operation-times: 4 μs)
Pc(operation-times: average: 8.1 μs)

The advantage of this convention, which permits descriptions of values to be used in place of actual values whenever desired, is that it keeps the number of attributes that have to be defined small.

A PMS example using the DEC PDP-8

Figure 3 gives the detailed PMS diagram of an actual, small, general purpose computer, the DEC LINC-8-338, which is a PDP-8 with a LINC processor and a type 338 display processor.

[Figure 3-DEC LINC-8-338 PMS diagram. The full diagram shows: Mp(core; 1.5 μs/w; 4096 w; (12 + 1) b) modules on S('Memory Bus); Pc(1 ~ 2 w/instruction; data: w, i, bv; 12 b/w; antecedents: PDP-5; descendants: PDP-8s, PDP-8I, PDP-L); S('I/O Bus; from: Pc; to: 64 K) serving controls and transducers for the console Teletype (10 char/s; 8 b/char; 64 char), paper tape (reader, 300 char/s; punch, 100 char/s; 8 b/char), incremental point plot (300 point/s), card reader (200|800 card/min), card punch (100 card/min), line printer (300 line/min; 120 col/line; 64 char/col), CRT display with light pen, Dataphone (1.2 ~ 4.8 kb/s), analog outputs K(#1:10; 0 ~ -10 volts), multiplexed analog inputs (#0:63), and 64 Teletype lines (110|180 b/s); S('Data Multiplexor; radial; from: 7 P, K; to: Mp) giving direct memory access ('Data Break) to Ms('DECtape; #0:7; addressable magnetic tape; 133 μs/w; length: 260 ft; 350 char/in), Ms(magnetic tape; #0:7; 36|45|75|112.5 in/s; 200|556|800 b/in), and Ms(fixed head disk; #0:3; (66 μs/w; 32768 w) | (16 μs/w; 262144 w); (12, 1 parity) b/w); P(display; '338) with T(#0:3; CRT; display; area: 10 × 10 in²) and T(#0:3; light pen); and Pc('LINC) with its LINCtape Ms, T(#0:15; knobs; analog; input), T(CRT; display; 5 × 5 in²), and digital input/output.]

We will concentrate
on the notation, rather than discussing substantive
features of the system. A simplified PMS diagram of
the system shows its essential structure:

[Diagram: Mp-S-Pc-S-T.console, with the second S also fanning out to T-, Ms-, P.display-, and Pc('LINC)-Ms-.]

This shows the basic Mp-Pc-T structure of a C with
the addition of secondary memory (Ms) and two
processors, one of which, Pc('LINC), has its own Ms.
Two switches are used: the I/O-bus which permits
access to all the devices, and a direct access path to
Mp via Pc for high data rate devices. There are many other switches in the actual system as one can see from Figure 3; for example, Mp is really 1 to 8 separate modules connected by a switch S to Pc. Also
there are many T's connected to the input-output
switch, S, which we collapsed as a single compound T;
and similarly for S (direct memory access).
Consider the Mp module. The specifications assert
that it is made with core technology, that its word
size is 13 bits (12 data bits plus one other with a
different function); that its size is 4096 words; and
that its operation time is 1.5 μs. We could have written the same information as:

M(function: primary; technology: core; operation-time: 1.5 μs; size: 4096 w; word: (12 + 1) b)
In Figure 3 we wrote only the values, suppressing the
attributes, since moderate familiarity with memories
permits an immediate inference about what attributes
are involved. As another example, we did not specify the function of the additional bit in the word when we wrote (12 + 1) b. Informed readers will assume this to be a parity bit, since this is the common reason for having an extra bit in a word. If the extra bit had some unusual function, then we would have needed to define it. That is, in the absence of additional information, the most common interpretation is to be assumed. In fact, we could have been even more cryptic and still communicated with most readers:

M.core(1.5 μs/w; 4 kw; 12 b),
corresponding to the phrase, "A 12 bit, 1.5 μs, 4k core store". 4 kw stands for 4 × 1024 = 4096; however, if someone who was less familiar took it to be 4 × 1000 = 4000 no real harm would be done.
4 X 1000 = 4000 no real harm would be done.
Consider the magnetic tapes for Pc. Since there are
eight possible tapes that make use of the same controller, K, through a switch, S, we label them #0
through #7. Actually, # is an abbreviation for the
index attribute whose values are integers. Since the
attribute is a unique character, we do not have to
write #:3 (although we could). The additional parameters give information about the physical attributes of
the encoding. These are alternative values and any
tape has only one of them. A vertical bar (|) indicates this (as in BNF notation for grammars). Thus, 75|112
in/s says that one can have a tape with a speed of 75
inches per second or one with 112 inches per second,
but not a tape which can be switched dynamically to
run at either speed.
For many of the components no further information
is given. Thus, knowing that M.magnetic_tape is
connected to a control and from there to the Pc tells
generally what that K does. It is a "tape controller"
which evokes all the actions of the tape, such as read,
write, rewind; and therefore these actions do not have
to be done by Pc. The fact that there is only one K
for many Ms's implies that only one tape can be
accessed at a time. Other information could be given,
although that just provided is all that is usual in
specifying a controller in an overall description of a
system.
We have used several different ways of saying the
same thing in Figure 3 in order to show the range of
descriptive notations. Thus, the 64 Teletypes are
shown by describing a single connection through a
switch and putting the number of links in the switch
above the connecting line.
Consider, finally, the Pc in Figure 3. We have given
a few parameters: the number of data types, the number of instructions, and the number of interrupts.
These few parameters hardly define a processor. Several
other important parameters are easily inferred from
the Mp. The basic operation time in a processor is a
small multiple of the read time of its Mp. Thus it is
predictable that Pc stores and reads information in
2 × 1.5 μs (one for instruction fetch, one for data
fetch). Again, where this is not the case (as in the CDC
6600) it is necessary to say so. Similarly, the word
size in the Pc is the same as the word size of the Mp-12 data bits. More generally, the Pc must have instructions that take care of evoking all the components of
the PMS structure. These instructions of course do
not use the switches and controls as distinct entities;
rather, they speak directly to the operation of the
M's and T's connected via these switches and controls.


Other summary parameters could have been given
for the Pc. None would come close to specifying its
behavior uniquely, although to those knowledgeable
in computers still more can be inferred from the
parameters given. For instance, knowing both the
data types available in a Pc and the number of instructions, one can come very close to predicting exactly
what the instructions are. Nevertheless, the way to
describe a Pc in full detail is not to add larger and larger numbers of summary parameters. It is more
direct and more revealing to develop a description at
the level of instructions, which is the ISP description.
In summary, a descriptive scheme for systems as
complex and detailed as digital computers must have
the ability to range from extremely complete to highly
simplified descriptions. It must permit highly compressed descriptions as well as extensive ones and
must permit the selective suppression or amplification
of whatever aspects of the computer system are of
interest to the user. PMS attempts to fulfill these
criteria by providing simple conventions for detailed
description with additional conventions that permit
abbreviation and abstractions, almost without limit.
The result is a notation that may seem somewhat
fluid, especially on first contact in such a brief introduction as this. But once assimilated, PMS seems to
allow some of the flexibility of natural language within
enough notational controls to enhance communication
considerably.
ISP LEVEL OF DESCRIPTION
The behavior of a processor is determined by the
nature and sequence of its operations. This sequence
is determined by a set of bits in Mp, called the program, and a set of interpretation rules, realized in the
processor, that specify how particular bit configurations
evoke the operations. Thus, if we specify the nature
of the operations and the rules of interpretation, the
actual behavior of the processor depends solely on the
particular program in Mp (and also on the initial state
of data). This is the level at which the programmer
wants the processor described-and which the programming manual provides-since he himself wishes to
determine the program. Thus the ISP (Instruction
Set Processor) description must provide a scheme for
specifying any set of operations and any rules of
interpretation.
Actually, the ISP descriptive scheme need only be
general enough to cover some broad range of possibilities adequate for past and current generations of
machines along with their likely descendants. As with the PMS level, there are certain restrictions that can
be placed on the nature of a computer system, specializing it from the more general concept of a discrete
state system. For the PMS level, it processes a medium,
called information; it is a system of discrete components
linked together by information transfers; and each
component is characterized by a small set of operations.
Similarly, for the ISP level we can add two more such
restrictions, which will in turn provide the shape of
its descriptive scheme.
The first specialization is that a program can be
conceived as a distinct set of instructions. Operationally, this means that some set of bits is read from the
program in Mp to a memory within P, called the
instruction register, M.instruction/M.i. This set of
bits then determines the immediately following sequence of operations. Only a single operation may be
determined, as in setting a bit in the internal state of
the P; or a substantial number of operations may be
determined, as in a "repeat" instruction that evokes
a search through Mp. In a typical one or two address
machine the number of operations per instruction
ranges from 2 to 5. In any event, after this sequence
of operations has occurred, the next instruction to be
fetched from Mp is determined and obtained. Then,
the entire cycle repeats itself.
The above cycle of activity is just the interpretation
cycle, and the part of the P that performs it is the
interpreter. The effect of each instruction can be expressed entirely in terms of the information held in
memories at the end of the cycle (plus any changes
made to the outside world). During execution, operations may have internal states of their own as sequential circuits which are not represented as bits in memories. But by the end of the interpretation cycle, whatever effect is to be carried on to a later time has been
staticized in bits in some memory. *
* This description holds true for a P with a single active control (the interpreter). Some P's (e.g., the CDC 6600) have several active controls and get involved in "overlapping" several instructions and in reordering operations according to the data and devices available. With these, a more complex statement is required to express the same general restriction we have been stating for simple P's: that the program can be decomposed into a sequence of bit sets (the instructions), each of which has local control over the behavior of the P for a limited period of time, with all inter-instruction effects being staticized as bits in M's.

The second additional specialization is on the data operations. A processor's total set of operations can be divided into two parts. One part contains those necessary to operate other components given in the PMS diagram-links, switches, memories, transducers, etc. The operations associated with these components and the extent to which they can be indirectly controlled from P are highly constrained by the basic nature of the
components and their controls. The second part contains those operators associated with a processor's D
component. So far we have said nothing at all about
them, except to exclude them completely from all PMS
components except P. These are the operations that
produce bit patterns with new meaning-that do all
the "real" processing-or changing of information. *
If it weren't for data operators, the system would
only just transmit information. As we noted in our
original definitions, a P (including a D) is the only
component capable of directly changing information.
A P can create, modify, and destroy information in a
single operation. As we noted earlier, D's are like the
primitive components in an analog computer. Later,
when we express instruction sets as simple arithmetic
expressions, the D's are the primitive operators, e.g.,
+, -, ×, /, ×2^n, ∧, ∨, ⊕, and concatenation (□),
which are evoked by the instruction set interpreter
part of a processor.

* In principle, this view that only D components do "real" processing is false. It can be shown that a universal Turing Machine can be built from M, S, L, and K components. The key operation is the write operation into M, which suffices to construct arbitrary bit patterns under suitably controlled switches. Hence, arbitrary data operations can be built up. The stated view is correct in practice in that the data operations provided in a P are highly efficient for their bit transformations. Only the foolish add integers in a modern computer by table lookup.
The specialization is that all the data operations
can be characterized as working on various data-types.
For example, there is a data-type called the signed-integer, and there are data operations that add two
signed-integers, subtract them, multiply them, take
their absolute value, test for which of two is the greater,
etc. A data-type is a compound of two things: the
referent of the bit pattern (e.g., that this set of bits
refers to an integer in a certain range); and the representation in the bit pattern (e.g., that bit 31 is the
sign, and bits 30 to 0 are the coefficients of successive
powers of 2 in the binary representation of the integer).
Thus, a processor may have several data-types for
representing numbers: unsigned-integers, signed-integers, single-precision-floating-point, double-precision-floating-point, etc. Each of these requires distinct
operations to process it. On occasion, operations for
several data-types may all be encoded into a single
instruction from the programmer's viewpoint, as when
there is an add instruction with a data-type sub-field
that selects whether the data is fixed or floating point.
The operations are still separate, no matter how
packaged, and so their data-types remain distinct.
With these two additional specializations-instructions and data-types-we can define an ISP description of a processor. A processor is completely described at
the ISP level by giving its instruction-set and its
interpreter in terms of its operations, data-types and
memories.
Let us first give the instruction-set. The effect of
each instruction is described by an instruction-expression, which has the form:
condition → action-sequence.

The condition describes when the instruction will be
evoked, and the action-sequence describes what transformations of data take place between what memories.
The right arrow (→) is the control action (of a K) of
evoking an operation.
Since all operations in a computer system result in
modifications of bits in memories, each action in a
sequence has the form:
memory-expression ← data-expression

The left arrow (←) is the transmit operation of a
link, and corresponds to the ALGOL assign operation.
The left side must describe the memory location that
is affected; the right side must describe the information pattern that is to be placed in that memory
location. The details of data-expressions and memory
expressions are patterned on standard mathematical
notation, and are communicated most easily by examples. The same is true of the condition, which is a
standard expression involving boolean values and
relations among memory contents.
There are two important features of the action-sequence. The first is that each action in the sequence
may itself be conditional; i.e., of the form, "condition
~ action-sequence." The second is that some actions
are sequentially dependent on each other, because the
result of one is used as an input to the other; on other
occasions a set of actions are independent, and can
occur in parallel. The normal situation is the parallel
one. For example, if A and B are two registers, then

(A ← B; B ← A);

exchanges the contents of A and B. When sequence is
required, the term 'next' is used; thus,
(A ← B; next B ← A);

transfers the contents of B to A and then transfers it back to B, leaving both A and B holding the original contents of B (equivalent to A ← B).
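(In executable terms, and as an editorial illustration only, the two sequencing rules differ in whether the right-hand sides read the old state or the updated one:)

    # Parallel actions: all source expressions are evaluated against the
    # old state before any destination is written, so (A <- B; B <- A)
    # is an exchange.
    regs = {"A": 1, "B": 2}
    new = {"A": regs["B"], "B": regs["A"]}
    regs.update(new)
    print(regs)            # {'A': 2, 'B': 1}

    # With 'next', the first transfer completes first, so
    # (A <- B; next B <- A) leaves both registers holding the old B.
    regs = {"A": 1, "B": 2}
    regs["A"] = regs["B"]  # A <- B; next
    regs["B"] = regs["A"]  # B <- A
    print(regs)            # {'A': 2, 'B': 2}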
An ISP example using the DEC PDP-8 Pc

The memories, operations, instructions, and data-types all need to be declared for a processor. Again these are most easily explained by example, although
full definitions exist (Bell and Newell, 1970). Consequently, let us examine the ISP description of the Pc
of the PDP-8, given in Figure 4.


Processor state

We first need to specify the memories of the Pc in detail, providing names for the various bits. Thus,

AC<0:11>    the accumulator

is a memory called AC, with 12 bits, labeled from 0 to 11 from the left. Comments are given in italics*-in this case that AC is called the accumulator (by the designers of the PDP-8). Alternatively, we could have used the alias or abbreviation convention:

AC<0:11>/Accumulator<0:11>.

* There are a few features of the notation, such as the use of italics, which are not easily carried over into current computer character sets. Thus, the ISP of Figure 4 is a publication language.

DEC PDP-8 ISP Description

Pc State
AC<0:11>                accumulator
L                       link bit/AC extension for overflow and carry
PC<0:11>                program counter
Run                     1 when Pc is interpreting instructions or "running"
Interrupt_state         1 when Pc can be interrupted; under programmed control
IO_pulse_1; IO_pulse_2; IO_pulse_4      pulses to IO devices

Mp State                extended memory is not included
M[0:7777₈]<0:11>
Page_0[0:177₈]<0:11> := M[0:177₈]<0:11>         special array of directly addressed memory registers
Auto_index[0:7]<0:11> := Page_0[10₈:17₈]<0:11>  special array which, when addressed indirectly, is incremented by 1

Pc Console State        keys for start, stop, continue, examine (load from memory), and deposit (store in memory) are not included
Data_switches<0:11>     data entered via console

Instruction Format
instruction/i<0:11>
op<0:2> := i<0:2>                       op code
indirect_bit/ib := i<3>                 0, direct; 1, indirect memory reference
page_0_bit/p := i<4>                    0 selects page 0; 1 selects this page
page_address<0:6> := i<5:11>
this_page<0:4> := PC'<0:4>
PC'<0:11> := (PC - 1)                   page of the current instruction
io_select<0:5> := i<3:8>                selects a T or Ms device
io_p1_bit := i<11>; io_p2_bit := i<10>; io_p4_bit := i<9>
                                        these 3 bits control the selective generation of 3 0.4 μs pulses to IO devices
sma := i<5>; sza := i<6>; snl := i<7>   skip on minus AC, skip on zero AC, skip on non-zero L (operate 2 group)

Effective Address Calculation
z<0:11> := (¬ib → z'; ib → M[z'])       direct or indirect address
z'<0:11> := (page_0_bit → this_page□page_address; ¬page_0_bit → 0□page_address)

Instruction Interpretation Process
Run ∧ ¬(Interrupt_request ∧ Interrupt_state) → (        no interrupt interpreter
    instruction ← M[PC]; PC ← PC + 1; next              fetch
    Instruction_execution);                             execute
Run ∧ Interrupt_request ∧ Interrupt_state → (           interrupt interpreter
    M[0] ← PC; Interrupt_state ← 0; PC ← 1)

Instruction Set and Instruction Execution Process
Instruction_execution := (
    and (:= op = 0) → (AC ← AC ∧ M[z]);                 logical and
    tad (:= op = 1) → (L□AC ← L□AC + M[z]);             two's complement add
    isz (:= op = 2) → (M[z] ← M[z] + 1; next            index and skip if zero
        (M[z] = 0) → (PC ← PC + 1));
    dca (:= op = 3) → (M[z] ← AC; AC ← 0);              deposit and clear AC
    jms (:= op = 4) → (M[z] ← PC; next PC ← z + 1);     jump to subroutine
    jmp (:= op = 5) → (PC ← z);                         jump
    iot (:= op = 6) → (                                 in out transfer, microprogrammed to generate up to 3
        io_p1_bit → (IO_pulse_1 ← 1); next              pulses to an io device addressed by io_select
        io_p2_bit → (IO_pulse_2 ← 1); next
        io_p4_bit → (IO_pulse_4 ← 1));
    opr (:= op = 7) → Operate_execution)                the operate instruction is defined below

Operate Instruction Set
The microprogrammed operate instructions: operate group 1, operate group 2, and extended arithmetic are defined as a separate instruction set.

Operate_execution := (
    cla (:= i<4> = 1) → (AC ← 0);                       clear AC; common to all operate instructions
    opr_1 (:= i<3> = 0) → (                             operate group 1
        cll (:= i<5> = 1) → (L ← 0); next               clear link
        cma (:= i<6> = 1) → (AC ← ¬AC);                 complement AC
        cml (:= i<7> = 1) → (L ← ¬L); next              complement L
        iac (:= i<11> = 1) → (L□AC ← L□AC + 1); next    increment AC
        ral (:= i<8:10> = 2) → (L□AC ← L□AC × 2 {rotate});    rotate left
        rtl (:= i<8:10> = 3) → (L□AC ← L□AC × 2² {rotate});   rotate twice left
        rar (:= i<8:10> = 4) → (L□AC ← L□AC / 2 {rotate});    rotate right
        rtr (:= i<8:10> = 5) → (L□AC ← L□AC / 2² {rotate}));  rotate twice right
    opr_2 (:= i<3,11> = 10₂) → (                        operate group 2
        (skip_condition ⊕ (i<8> = 1)) → (PC ← PC + 1); next   AC, L skip test
        skip_condition := ((sma ∧ (AC < 0)) ∨ (sza ∧ (AC = 0)) ∨ (snl ∧ L));
        osr (:= i<9> = 1) → (AC ← AC ∨ Data_switches);  "or" switches
        hlt (:= i<10> = 1) → (Run ← 0));                halt or stop
    EAE (:= i<3,11> = 11₂) → EAE_instruction_execution) optional EAE description

Figure 4-DEC PDP-8 ISP description

AC corresponds to an actual register in the Pc. However, the ISP does not imply any particular implementation, and names may be assigned to various sets of bits purely for descriptive convenience. The colon is used to denote a range or list of values. Alternatively, we could have listed each bit, separating the bit names by commas, as:

AC<0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11>.

Having defined a second memory, L (which has only a single bit), one could define a combined register, LAC, in terms of L and AC, as:

LAC<L, 0:11> := L□AC.

The colon-equal (:=) is used for definition, and the middle square box (□) denotes concatenation. Note that the bit named L of register LAC corresponds to the 1 bit L register.

Memory state

In dealing with addressed memory, either Mp or various forms of working memory within the processor, we need to indicate multidimensional arrays. Thus,

Mp[0:7777₈]<0:11>

gives the primary memory as consisting of 7777₈ (i.e., base 8) words of 12 bits each, being addressed as indicated. Such an address does not necessarily reflect the switching structure through which the address occurs, though it often will. (Needless to say, it reflects only addressing space, and not how much actual M is available in a PMS structure.) In general, only memory within the processor will occur as operands of the processor's operators. The one exception is primary memory (Mp), which is defined as a memory external to a P, but directly accessible from it.

In writing memories it is natural to use base 10 for all numbers and to consider the basic i-unit of the memory to be a bit. This is always assumed unless otherwise indicated. Since we used base 8 numbers above for specifying the addressing range, we indicated the change of number base by a subscript, in standard fashion. If a unit of information other than the bit were to be used, we would subscript the angle brackets. Thus,

Mp[0:7777₈]<0:1>₆₄

reflects the same memory. The choice carries with it, of course, some presumption of organization in terms of base 64 characters-but this would show up in the specification of the operators (and is not true, in fact, of the PDP-8). We can also have multi-dimensional memories (i.e., arrays), though no examples are used in Figure 4. These just add the extra dimensions with an extra pair of brackets. For example, a more precise description would have used:

Mp[0:7][0:31][0:127]<0:11>

to mean 8 memory fields, each field with 32 pages, each page with 128 words and each word with 12 bits.
By making the ISP description a general expression for evoking action-sequences we obtain the generality needed to cover all variations. (Indeed, you will notice that the PDP-8 ISP is a single expression, and that it incorporates two microprogrammed instructions with no difficulty.) For the action-sequence standard mathematical infix notation is used. Thus we write AC ~ AC /\ M[z] This indicates that the word in Mp at address z (determined by the expression on page 1 of Figure 4) is anded with the accumulator and the result left in the accumulator. Each processor will have a basic set of operations. that work on data-types of the machine. Here the data-type is simply the 12 bit word viewed as an array of bits. Operators need not necessarily involve memories actually within the Pc (the processor state). Thus, Mp[z] f- Mp[z] + 1 expresses a change in a word in Mp directly. That this must be mechanized in the PDP-8 by means of some temporary register in Pc is irrelevant to the ISP description. We also use functional notation, e;g., AC ~ abs(AC) replaces the contents of the AC with its absolute value. Effective address calculation In the examples just given we used z as the address in Mp. This is the effective address (simplified) and is defined as a conditional expression (in the manner of ALGa L or LISP) : z(O: 11) : = (n indirect-bit ~ z'; indirect-bit ~ Mp[z']) The right arrow (~) is the same conditional sign used in the main instruction, similar to the "if. .. then ... " of ALGOL. The parentheses are used to indicate grouping in the usual fashion. However, we arrange expressions on the page to make reading easier. As the expression for z shows, we permit conditional within conditionals, and also the nesting of definitions (z is defined in terms of a variable z'). Again, we should emphasize that the structure of such definitions may reflect directly the underlying hardware organization, but it need not. When describing existing processors the ISP description often does or can be forced to reflect the hardware. But if one were designing a processor, then ISP expressions would be put down as design objectives to be implemented in a register transfer structure, which might differ considerably. Special note should be taken of the opr instruction (op = 6) in Figure 4, since it provides a microprogramming feature. There are two separate options depending on instruction(3) being 0 or 1. But common to both of these is the operation of clearing the AC (or not), associated with instruction (4). Then, within one option (instruction (3 ) = 0) there are series of independently executable actions (following the clearing of L); within the other (instruction(3) = 1), there are three independently settable control actions. The nested conditionals and the use of 'next' to force sequential behavior make it easy to see exactly what is going on (in fact a good deal easier than describing it in natural' language, as we have been doing). a: The instruction interpreter From the hardware point of view, an interpreter consists of the mechanisms for fetching a new instruction, for decoding that instruction and executing the operations so designated, and for determining the next instruction. A substantial amount of this total job has already been taken care of in the part of the ISP that we have just explained. Each instruction carries with it a condition that amounts to one fragment of the decoding operation. 
The instruction interpreter

From the hardware point of view, an interpreter consists of the mechanisms for fetching a new instruction, for decoding that instruction and executing the operations so designated, and for determining the next instruction. A substantial amount of this total job has already been taken care of in the part of the ISP that we have just explained. Each instruction carries with it a condition that amounts to one fragment of the decoding operation. Likewise, any further decoding of the instruction that might be done in common by the interpreter (rather than by the individual operation circuits) is implied in the expressions for each instruction, and by the expression for the effective address.

The interpreter then fetches the next instruction and executes it. In a standard machine, there is a basic principle that defines operationally what is meant by the "next instruction." Normally the current instruction address is incremented by one, but other principles are used (e.g., on a processor with a cyclic Mp). In addition, several specific operations exist in the repertoire that can affect what program is in control. The basic principle acts like a default condition: if nothing specific happens to determine program control, the normal "next" instruction is taken. Thus, in the PDP-8 we get an interpretation process that is the classic fetch-execute cycle:

Run → (instruction ← Mp[PC]; PC ← PC + 1; next Instruction␣execution)

The sequence is evoked so long as Run is true (i.e., its bit value is 1). The processor will simply cycle through the sequence, fetching, then executing the instruction. In the PDP-8 there exists a halt operation that sets Run to 0, and the console keys can, of course, stop the computer. It should be noted that this ISP description does not include console behavior, although it could.

The ISP description does not determine the way the processor is to be organized to achieve this sequencing, or to take advantage of the fact that many instructions lead to similar sequences. All it does is specify what operations must be carried out for a program in Mp. The ISP description does specify the actual format of the instruction and how it enters into the total operation, although sometimes indirectly. For example, in the case of the and operation (op = 0), the definition of AC shows that the AC does not depend on the instruction, and the definition of z shows that z does depend on other fields of the instruction (indirect␣bit, page␣0␣bit, page␣address). Likewise, the form of the ISP expression shows that AC and PC both enter into the instruction implicitly. That is, in the ISP description all dependence on memory is explicit.*

* This is not correct, actually. In physically realizing an ISP description, additional memories may be utilized (they may even be necessary). It can be said that the ISP description has these memories implicitly. However, it is the case that a consistent and complete description of an ISP can be made without use of these additional memories; whereas with, say, a single address machine, it does not seem possible to describe each instruction without some reference to the implicit memories, as we see in the effective address calculation procedures, where definitions look much like registers.
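Returning to the fetch-execute cycle above, the following Python fragment is a minimal sketch of it (not from the paper; the PDP8 class, its toy instruction actions, and the treatment of op = 6 as a plain halt are illustrative assumptions, not the full repertoire of Figure 4):

```python
# A minimal sketch (not from the paper) of the PDP-8 interpretation
# process: Run -> (instruction <- Mp[PC]; PC <- PC + 1;
#                  next Instruction_execution).

class PDP8:
    def __init__(self):
        self.Mp = [0] * 4096      # 12-bit primary memory
        self.PC = 0
        self.AC = 0
        self.run = True

    def step(self):
        instruction = self.Mp[self.PC]       # instruction <- Mp[PC]
        self.PC = (self.PC + 1) % 4096       # PC <- PC + 1
        self.execute(instruction)            # next Instruction_execution

    def execute(self, instruction):
        op = (instruction >> 9) & 0o7
        z = instruction & 0o177              # simplified: direct page-0 address only
        if op == 0:                          # and -> (AC <- AC ∧ M[z])
            self.AC &= self.Mp[z]
        elif op == 6:                        # stand-in: treat the whole opr group as halt
            self.run = False
        # ... the remaining operation codes would be decoded the same way

c = PDP8()
c.Mp[0] = 0o6000                             # "halt" in this toy encoding
while c.run:
    c.step()
print("halted with PC =", c.PC)
```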
Data-types and data operations

Each data-type has a set of operations that are proper to it. Add, subtract, multiply and divide are all proper to any numerical data-type, as well as absolute value and negation. Not all of these need exist in a computer just because it has the data-type, since there are several alternative bases, as well as some levels of completeness. For instance, notice that the PDP-8 first of all does not have multiply and divide (unless one has its special option), thus having a relatively minimal level of arithmetic operations; and second, it does not have a subtract operation, using a two's complement add, which permits negation (−AC) to be accomplished by complementation (¬AC) followed by add 1.

The PDP-8, unlike larger C's, does not have several data representations for what is, externally considered, the same entity. An operator that does a floating add and one that does an integer add are not the same. However, we denote both by the same symbol (in this case, +), indicating the difference parenthetically after the expression. Alternatively, the specification of the data-type can be attached to the data. Thus, in the IBM 7094 we would see the following add instructions:

Add/ADD → (AC ← AC + M[e]);
Add and Carry Logical/ACL → (AC ← AC + M[e]{sl});
Floating add/FAD → (AC ← AC + M[e]{sf});
Un-normalized floating add/UFA → (AC ← AC + M[e]{suf});
Double precision floating add/DFAD → (AC○MQ ← AC○MQ + M[e]○M[e + 1]{df});
Double precision un-normalized floating add/DUFA → (AC○MQ ← AC○MQ + M[e]○M[e + 1]{duf})

The braces { } differentiate which operation is being performed. Thus above, the data-type* is enclosed in the braces and refers to all the memory elements (operands) of the expression. Alternatively, we also use braces as a modifier to signify the encoding of the i-unit. For example, a fixed point to floating point data conversion operation would be given:

AC{floating} ← AC{fixed}.

We also use the braces as a modifier for the operation type. For example, shifting (left or right) can be a multiplication or division by a base, but it is not always an arithmetic operation. In the PDP-8, for instance, we had

L○AC ← L○AC × 2 {rotate};

where the end bits L and AC(11) are connected when a shift occurs (the operator is also referred to as a circular shift), or equivalently (L○AC ← L○AC × 2; AC(11) ← L).

* The convention for naming data-types is a concatenation of a precision, a name and a structure. Examples include i/integer; di/double integer; div/double integer vector; sf/single floating; suf/single unnormalized floating; bv/boolean vector; ch.string/character string.

In general, the nature of the operations used in processors is sufficiently familiar that no definitions are required, and they can all be taken as primitive. It is only necessary to have agreed upon conventions for the different data representations used. In essence, a data-type is made up recursively of a concatenation of subparts, which themselves are data-types. This concatenation may be an iteration of a data-type to form an array. If required, an operation can be defined in terms of other (presumably more primitive) operations. It is necessary, of course, first to define the data format explicitly (including perhaps some additional memory). Variables for the operands are permitted in the natural way. For example, binary single precision floating point multiplication on a 36 bit machine could be defined in terms of the data fields as follows:

sf mantissa/mantissa(0:27)
sf sign/sign(0)
sf exponent/exponent(28:35)
sf exponent␣sign(28)

x1 ← x2 × x3 {sf} := (
    x1 mantissa := x2 mantissa × x3 mantissa;
    x1 exponent := x2 exponent + x3 exponent;
    next x1 := normalize(x1){sf})

where normalize is:

x1 ← normalize(x2){sf} := (
    (x2 mantissa = 0) → (x1 exponent := 0);
    (x2 mantissa ≠ 0) ∧ (x2(0) = x2(1)) → (
        x1 mantissa := x2 mantissa × 2;
        x1 exponent := x2 exponent − 1;
        next x1 := normalize(x1){sf}))

Three additional aspects need to be noted with respect to data-types; two substantive, one notational. First, not everything one does with an item of data makes use of all the properties of its data-type. For example, numbers have to be moved from place to place. This operation is not a numerical operation, and does not depend on the item being a number. Second, one can often embed one kind of operation in another, so as to coalesce data-types. An example is encoding the Mp addresses into the same integer data-type as is used for regular arithmetic. Then there need be no separate data-type for addresses.*

* However logical such a course may seem, it is not always done this way. For example, the IBM 7090 (and other members of that family) have a 15 bit address data-type and a 36 bit integer data-type, with separate operations for each.
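Read operationally, the normalize definition above shifts the mantissa left and decrements the exponent until the sign bit and the leading mantissa bit differ. The following Python fragment is a minimal sketch of that reading (not from the paper; the 28 bit two's complement mantissa layout is an assumption carried over from the sf format above):

```python
# A minimal sketch (not from the paper) of the recursive sf normalize:
# while the mantissa is nonzero and its top two bits agree
# (x(0) = x(1)), double the mantissa and decrement the exponent.

MANT_BITS = 28

def bit(mantissa, i):
    """Bit i of the mantissa, bit 0 being the sign (leftmost) bit."""
    return (mantissa >> (MANT_BITS - 1 - i)) & 1

def normalize(mantissa, exponent):
    if mantissa == 0:
        return 0, 0                       # (x2 mantissa = 0) -> exponent := 0
    if bit(mantissa, 0) == bit(mantissa, 1):
        shifted = (mantissa << 1) & ((1 << MANT_BITS) - 1)
        return normalize(shifted, exponent - 1)
    return mantissa, exponent

m, e = normalize(1 << 10, 100)            # a small positive mantissa
print(bin(m), e)                          # leading 1 moved up, exponent 84
```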
The notational aspect is our use in ISP of a mnemonic abbreviation scheme for data-types. We have already used sf for single-precision-floating-point. More generally, an abbreviation is made up of a letter showing the length, a letter showing the type, and a letter showing the structure. The simple naming convention does not take into account all we know about a data-type. The information carrier for the data is only partially included in the length characteristic. Thus the carrier should also include the data base and the sign convention for representing negative numbers (e.g., sign-magnitude).

PMS structure of the CDC 6600 series

A simplified PMS structure of the C('6400 | '6600) is given in Figure 5. Here we see the C(io; #1:10), each of which can access the primary memory (Mp) of the central computer (Cc). Figure 5 shows why one considers the 6600 to be a network. Each Cio (actually a general purpose, 12 bit C) can easily serve the specialized Pio function for Cc. The Mp of Cc is an Ms for a Cio, because the Cio cannot execute programs from this memory. By having a powerful Cio, more complex input-output tasks can be handled without Cc intervention. These tasks can include data-type conversion, error recovery, etc. The K's which are connected to a Cio can also be less complex.

[Figure 5-CDC 6600 PMS diagram (simplified): the ten Cio(#1:10), each consisting of an Mp(12 b/w) and a Pc, connect through an S both to the Mp of the central computer Cc and, through K's, to the periphery. Diagram not reproduced.]

A detailed PMS diagram for the C('6400, '6416, '6500, and '6600) is given in Figure 6. The interesting structural aspects can be seen from this diagram. The four configurations, 6400 through 6600, are included just by considering the pertinent parts of the structure.
[Figure 6-CDC 6400/6416/6500/6600 detailed PMS diagram; the diagram and its legend are not reproduced here. Among its recoverable details: an M('Barrel; working; 10 w; 51 b/w; .1 μs/w) holding the ten Pc states; a T('Dead Start Console); an Mp(core; 1.0 μs/w; 4096 w; 12 b/w) for each of the ten Pc('Peripheral and Control Processor/PCP; time multiplexed; .1 μs/w; 1 address/instruction; 12 b/w); read and write "pyramids" buffering 12 b/w into 60 b/w; an Ms('Extended Core Storage/ECS; 3.2 μs/w; 60 b/w) behind an 'Extended Core Coupler; and the Pc('6600; 15, 30 b/instruction; transistor technology; 1964; data-types si, bv, w, sf, df) with its ten D's ('Boolean, 'Shift, 'Increment (#1:2), 'Branch, 'Add (0.3 μs), 'Long Add, 'Multiply (#1:2; 1 μs), 'Divide (2.9 μs)), a K('Scoreboard), and an M.instruction('Instruction Stack; content addressable; flip-flop; 8 w; 60 b/w). There is no C('Central) in the CDC 6416, and the CDC 6400 and 6500 do not have the K('Scoreboard), separate D's, or M('Instruction Stack).]

That is, a 6416 has no large Pc; a 6400 has a single straightforward Pc; a 6500 has two Pc's; and the 6600 has a single powerful Pc. The 6600 Pc has 10 D's, so that several parts of a single instruction stream can be interpreted in parallel. A 6600 Pc also has considerable M.buffer to hold instructions so that the Pc need not wait for Mp fetches.

6600 PMS Diagram

The implementation of the 10 Cio's can be seen from the PMS diagram (Figure 6). Here, only 1 physical processor is used, on a time-shared basis. Each 0.1 μs a new logical P is processed by the physical P. The 10 Mp's are phased so that a new access occurs each 0.1 μs. The 10 Mp's are always busy. Thus, the information rate is (10 × 12) b/μs, or 120 megabits/s. This structure for shifting a new Pc state into position each 0.1 μs has been likened by CDC to a barrel.

The T's, K's and M's are not given in the figures, although it should be mentioned that the following units are rather unique: a K for the management of 64 telegraph lines to be connected to a Cio; an Ms(disk) with four simultaneous access ports, each at 1.68 megachar/s data transfer rate, and a capacity of 168 megachar; an Ms(magnetic tape) with a K(#1:4) and S to allow simultaneous transfers to 4 Ms; a T(direct; display) for monitoring the system's operation; K's to other C's and Ms's; and conventional T's (card reader, punch, line-printer, etc.).
ISP OF THE CDC 6600

The ISP description of the Pc is given in Figure 7. The Pc has a straightforward scientific-calculation instruction set; its operation consists mostly of read accesses to Mp for instructions and arguments, followed by occasional write accesses to store results. Cc has provisions for multiprogramming in the form of a protection and relocation address. The mapping is given in the ISP description for both Mp' and Ms'; the Ms('Extended Core Storage/ECS) itself is not described.

[Figure 7-CDC 6400, 6500, 6600 Central Processor ISP description: a multi-page listing, reproduced here only in outline. Pc state: Run; the program counter P⟨17:0⟩; operand registers X[0:7]⟨59:0⟩; address registers A[0:7]⟨17:0⟩; index registers B[0:7]⟨17:0⟩ with B[0] := 0; and exit mode bits EM⟨17:0⟩, whose address-out-of-range, operand-out-of-range, and indefinite-operand modes allow alarm conditions to trap the Pc at Mp[RA]. X[1:5] are implicitly loaded from Mp' when A[1:5] are loaded, and X[6:7] are implicitly stored into Mp' when A[6:7] are loaded. Mp state: Mp[0:777777₈]⟨59:0⟩ (2¹⁸ words of core) and Ms (Extended Core Storage, between which and Mp a program can only transfer data; programs cannot be executed from Ms). RA⟨17:0⟩ and FL⟨17:0⟩ are the reference (relocation) address and field-length (bounds) registers, with RAECS and FLECS serving ECS. Memory mapping: Mp'[x] := ((x < FL) → Mp[x + RA]; (x ≥ FL) → Address␣out␣of␣range). The exchange jump instruction swaps the Pc state with a 16-word package in Mp. Instruction format: instruction⟨29:0⟩, with fm⟨5:0⟩ := instruction⟨29:24⟩ the operation code (extended to fmi⟨8:0⟩ := fm○i); i, j, k⟨2:0⟩ designating registers, a function, or an extension to the op code; jk⟨5:0⟩ a shift constant; and K⟨17:0⟩ an 18 bit address-size constant. Instructions are 15 or 30 bits, packed into 60 bit words and fetched from Mp'[P] with a quarter-word pointer p = 3, 2, 1, or 0; a 30 bit instruction cannot be stored across word boundaries. The instruction set comprises the set-address "SAi" (fm = 50-57), set-index "SBi" (fm = 60-67), and set-operand "SXi" (fm = 70-77) transmission groups; program stop "PS" and pass "NO"; the unconditional jump, conditional jumps on X[j] (zero, nonzero, plus, negative, in/out of range, definite/indefinite) and on B-register comparisons, and the return jump "RJ"; ECS block read/write "REC"/"WEC", subject to the bounds checks and the Ms', Mp' mapping; fixed point sum and difference; boolean product, sum, and difference, with complements, and a bit-count; left and right shifts, nominal and rotating; and floating point sum, difference, product, and divide in single, double precision, and rounded variants, with normalize, round-and-normalize, pack, and unpack.]
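The Mp' mapping quoted in the outline above is ordinary base-and-bounds relocation. The following Python fragment is a minimal sketch of it (not from the paper; mp_read and AddressOutOfRange are illustrative names, while RA and FL follow the ISP fields):

```python
# A minimal sketch (not from the paper) of the CDC 6600 memory mapping
# Mp'[x] := ((x < FL) -> Mp[x + RA]; (x >= FL) -> address out of range):
# RA relocates a logical address, FL bounds the program's extent.

class AddressOutOfRange(Exception):
    pass

def mp_read(Mp, RA, FL, x):
    if x >= FL:                  # bounds check against the field length
        raise AddressOutOfRange(oct(x))
    return Mp[x + RA]            # relocation by the reference address

Mp = [0] * 0o100000
Mp[0o4000 + 0o17] = 42
print(mp_read(Mp, RA=0o4000, FL=0o1000, x=0o17))   # 42
# mp_read(Mp, RA=0o4000, FL=0o1000, x=0o1000)      # raises AddressOutOfRange
```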

SUMMARY

We have introduced two notations for two aspects of the upper levels of computer systems: the topmost information-flow level, here called the PMS level (there being no other common name); and the interface between the programming level and the register transfer level, called ISP. We were induced to create these notations as an aid in writing a book describing the architecture of many different computers, which served to make us painfully aware of the (dysfunctional) diversity that now exists in our ways of describing systems. It would have been preferable to have notational systems constructed around techniques of analysis or synthesis (i.e., simulation languages). But our immediate need was for adequate descriptive power to present computer systems for a text. Considering the amount of effort it has taken to make these notational systems reasonably polished, it seems to us they should be presented to the computer profession, for criticism and reaction.

The main source of experience with the notation so far is the aforementioned book, where we have developed PMS diagrams for 22 systems* and ISP descriptions for 14 systems.** The level of detail in all of these is as adequate as the programming manual, i.e., as complete as the description of the PDP-8 example given here. In addition, at least one new machine, the DEC PDP-11 (these proceedings), has made use of the notation at the formulation and design stage.

* ARPA network; Burroughs B5500, B6500; CDC 6600; LGP 30; ComLogNet; DEC LINC-8-338, PDP-11; English Electric Deuce, KDF-9; IBM 1800, 1401, 7094, System/360 (Models 30-91), ASP network; LRL network; MIT's Whirlwind I; NBS's Pilot; RW 40; SDS 910, 930; UNIVAC 1108.

** The computers and the associated number of description pages (enclosed in parentheses) are CDC 160A (2), 6600 PPU (2), 6600 CPU (4¾); DEC PDP-8 (2, 3 with options), PDP-11 (5), 338 (5); IBM 1800 (3½), 1401 (3½), 7094 CPU (7), 7094 Data Channel (6½); LINC (3); RW 40 (2½); SDS 92 (3), 930 (4).

REFERENCES

1 C G BELL A NEWELL Computer structures: Readings and examples In press McGraw-Hill Book Company 1970
2 Y CHU Digital computer design fundamentals McGraw-Hill Book Company 1962
3 J A DARRINGER The description, simulation, and automatic implementation of digital computer processors Thesis for Doctor of Philosophy degree College of Engineering and Science Department of Electrical Engineering Carnegie-Mellon University Pittsburgh Pennsylvania May 1969
4 A D FALKOFF K E IVERSON E H SUSSENGUTH A formal description of System/360 IBM Systems Journal Vol 3 No 3 pp 198-261 1964
5 T B STEEL JR A first version of UNCOL Proceedings WJCC pp 371-377 1961
Reliability analysis and architecture of a hybrid-redundant digital system: Generalized triple modular redundancy with self-repair

by FRANCIS P. MATHUR*
Jet Propulsion Laboratory
Pasadena, California

and

ALGIRDAS AVIZIENIS
University of California
Los Angeles, California

"A random series of inept events
To which reason lends illusive sense, is here,
Or the empiric Life's instinctive search,
Or a vast ignorant mind's colossal work . . ."
Savitri, B.I.C.2 - Sri Aurobindo1

INTRODUCTION: FAULT-TOLERANT COMPUTING

The objective to attain fault-tolerant computing has been gaining an increasing amount of attention in the past several years. A digital computer is said to be fault-tolerant when it can carry out its programs correctly in the presence of logic faults, which are defined as any deviations of the logic variables in a computer from the design values. Faults can be either of transient or permanent duration. Their principal causes are: (1) component failures (either permanent or intermittent) in the circuits of the computer, and (2) external interference with the functioning of the computer, such as electric noise or transient variations in power supplies, electromagnetic interference, etc.

Protective redundancy in the computer system provides the means to make its operation fault-tolerant. It consists of additional programs, additional circuits, and additional time of operation that would not be necessary in a perfectly fault-free system. The redundancy is deliberately incorporated into the circuits and/or software of the computer in order to provide either masking of or recovery from the effects of some types of faults which are expected to occur in the computer. Repetition of programs provides time redundancy. Programmed reasonableness checks and diagnostic programs are forms of software redundancy. Finally, monitoring circuits, error-detecting and error-correcting codes, structural redundancy of logic circuits (component quadding, channel triplication with voting, etc.), replication of entire computers, and self-repair by the switching-in of standby spares (replacement systems) are the most common forms of hardware redundancy.

The historical perspective shows that the study and use of hardware redundancy, which began nearly 20 years ago,2,3 has been steadily increasing in the past decade. A very strong reason for this has been the evolution of integrated circuit technology. The inclusion of redundant circuitry is now economically more feasible. The large cost and size of diagnostic software in today's complex computer systems also motivates the relegation of as much checking as possible to special hardware. This special hardware is required to interact with a supervisory program to provide fault-tolerance and recovery without interaction with the human operator. The presently existing computer systems with extensive use of hardware redundancy are found in applications with extreme reliability requirements.

* This work was done in partial fulfillment towards the Ph.D. in the Computer Science Department of the University of California, Los Angeles. A preliminary version of this paper was presented as a working paper at the IEEE Computer Group Workshop on "Reliability and Maintainability of Computing Systems," Lake of the Ozarks, Missouri, October 20-22, 1969.
The most interesting illustration is the SATURN V launch vehicle computer, which employs triple-modular redundancy (TMR) with voting elements in its central processor and duplication in the main memory.4 Subsequent studies of fault-tolerance in manually nonaccessible computers with life requirements of over 10 years have shown that replacement systems with standby spares of entire computer subsystems offer advantages over complete triplication.5 These studies have led to the design and construction of an experimental Self-Testing-And-Repairing (STAR) computer.6 This computer is presently in operation at the Jet Propulsion Laboratory. It is being used as an experimental vehicle to study and refine self-repair techniques which incorporate fault-detection and recovery by repetition of programs and/or by automatic replacement of faulty subsystems.

Many systems with hardware redundancy (including the STAR computer and other replacement-repair systems) share the common problem of a "hard core." This "hard core" consists of logic circuits which must continue to function in real time in order to assure the proper fault detection and recovery of the entire system. The purpose of this paper is to present the results of a general study of the architecture and reliability analysis of a new class of digital systems which are suitable to serve as the "hard core" of fault-tolerant computers. These systems are called hybrid-redundant systems and consist of the combination of a multiplexed system with majority voting (providing instant internal fault-masking) and of standby spare units (providing an extended mean life over the purely multiplexed system). The new quantitative results demonstrate that hybrid systems possess advantages over purely multiplexed systems in the relative improvement of reliability and mean life with respect to a nonredundant reference system. It is also possible that the continuing miniaturization of computers will make hybrid redundancy applicable at the level of an entire computer serving as the nonredundant reference unit. The hybrid-redundant multicomputer system may then serve as the hard core of very large and complex data handling systems, such as those required for spacecraft, automated telephone exchanges, digital communication systems, automated hospital monitoring systems, and time-sharing-utility centers.

TABLE OF SYMBOLS AND NOTATION

λ : Failure rate of a non-redundant active unit (λ ≥ 0).
μ : Failure rate of a non-redundant standby-spare unit (μ ≤ λ).
K : Ratio of λ to μ (= λ/μ), 1 ≤ K ≤ ∞.
S : Total number of standby-spare units (S ≥ 0).
N : Total number of active redundant units (= 2n + 1).
n : Degree of active redundancy (= (N − 1)/2).
C : Total number of units in a system (= N + S).
T : Mission time (≥ 0).
t or τ : Dummy variables for time (0 ≤ t or τ ≤ T).
(A choose B) : Combinatorial notation for A!/[(A − B)! B!].
DD : Disagreement detector.
SU : Switching unit.
R-S-D unit : An abbreviation for the unit which incorporates the restoring organ, switching unit, and the disagreement detector.
Simplex system : A non-redundant unit or system.
TMR system : Triple-modularly redundant system (N = 3).
NMR system : N-tuple-modularly redundant system.
Hybrid (N, S) system : A hybrid redundant system having a total of N + S units, of which N units are active and S units are standby-spares.
H(N, S) : An abbreviation for Hybrid (N, S).
H(N, 0) : A reduced case of H(N, S) which yields a system equivalent to basic NMR under the assumption of fail-proof R-S-D unit and voter elements.
H(3, 0) : A reduced case of H(N, 0) which yields a system equivalent to basic TMR.
R("System Characterization")["time"] : The format of a compact notation for simplifying the writing of reliability equations. Here R, the reliability, is followed in parentheses by the "system characterization," such as (N, S), (NMR), (TMR) or (Simplex), and is then succeeded in square brackets by the parameter "time." The parameter "time" is usually the mission time T, and this term may be omitted if it is unambiguous to do so. If the "system characterization" refers to a simplex system, then both the "system characterization" term and the "time" term may be omitted. Thus, R(N, S)[T] is the reliability of a hybrid redundant system H(N, S) for a mission time of duration T.
THE N-TUPLY MODULAR REDUNDANT SYSTEM

The basic TMR types of systems are first reviewed and are illustrated in Figure 1. A simplex or non-redundant system having reliability R is shown in Figure 1(a). The reliability of the basic triple-modular or TMR system as shown in Figure 1(b) is given (under the worst-case assumption that no compensating failures occur) by the following well known equation:

R(TMR) = R^3 + 3R^2(1 − R)    (1)

The generalization of the TMR concepts7 to an N-tuply modular system utilizing N = 2n + 1 units and having an (n + 1)-out-of-(2n + 1) restoring organ is illustrated in Figure 1(c) and is therein designated as the NMR system; its reliability equation is

R(NMR) = Σ_{i=0}^{n} (N choose i) (1 − R)^i R^(N−i)    (2)

where the combinatorial notation (N choose i) = N!/[(N − i)! i!]. The family of curves illustrating its behavior is shown in Figure 2, with reliability plotted as a function of normalized time λT. The underlying failure law throughout this paper is assumed to be exponential.8 Thus the simplex reliability R is given by exp(−λT), where λ is the failure rate of the non-redundant system when it is active. In the ensuing development of the probabilistic model for the Hybrid(N,S) systems, the assumption of statistical independence of failures has been made.

[Figure 1-Basic TMR-type systems: (a) simplex, (b) TMR, (c) NMR with N = 2n + 1 units.]
[Figure 2-Reliability of NMR-type systems vs normalized time λT.]
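Equation (2) is directly computable. The following Python fragment is a minimal sketch (not from the paper; r_nmr is an illustrative name) that reproduces the behavior of Figure 2, including the property that every NMR curve crosses the simplex line at R = 0.5 (normalized time λT = ln 2 ≈ 0.693):

```python
# A minimal sketch (not from the paper) evaluating equation (2),
# R(NMR) = sum_{i=0..n} C(N,i) (1-R)^i R^(N-i), with R = exp(-lambda*T).

from math import comb, exp

def r_nmr(R, N):
    n = (N - 1) // 2
    return sum(comb(N, i) * (1 - R)**i * R**(N - i) for i in range(n + 1))

for lt in (0.1, 0.693, 1.0):
    R = exp(-lt)
    print(f"lambda*T={lt:5.3f}  R={R:.3f}  " +
          "  ".join(f"N={N}: {r_nmr(R, N):.3f}" for N in (1, 3, 5, 9)))
# At lambda*T = 0.693 (R = 0.5) every column prints 0.500: the crossover.
```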
THE HYBRID(N,S) SYSTEM

The Hybrid(N,S) system concept (Figure 3) consists of an NMR core with an associated bank of S spare units, such that when one of the N active units fails, a spare unit replaces it and restores the NMR core to the all-perfect state. The active NMR units have a failure rate designated by λ, while the standby-spare units, which are said to be in a dormant mode,9 have a failure rate designated by μ (μ ≤ λ), with the corresponding reliability Rs = exp(−μT).

[Figure 3-Hybrid (N, S) system concept: N = 2n + 1 active units of reliability R feeding the restoring organ, plus a bank of S standby spares of reliability Rs.]

The physical realization of such a system is shown in Figure 4, where the disagreement detector (DD) compares the system output from the restoring organ with the outputs of each one of the active 2n + 1 units. When a disagreement occurs, a signal is transmitted to the switching unit (SU), which replaces the unit that disagreed by switching it out and switching in one of the spares. If the spare were to fail in the dormant mode and was switched in on demand from the DD unit, the disagreement would still exist and the SU would again replace it by one of the spares. The Hybrid(N,S) system reduces to a simple NMR system when all the spares have been exhausted, and the whole system fails upon the exhaustion of all the spares and the failure of any n + 1 of the basic 2n + 1 units. In the special case where N = 3, the Hybrid(N,S) system reduces to a Hybrid(3,S) system.10 In the case of zero spares the Hybrid(3,S) system then reduces to Hybrid(3,0), which is the basic TMR system.

[Figure 4-Hybrid (N, S) system block diagram: the N basic NMR units and the S spares, the switching unit, the restoring organ, and the disagreement detector.]

The Hybrid(N,S) system concept has been considered by other researchers from the architectural standpoint.11,12 A derivation13 of the reliability equation when dormancy of the S spare units is not considered (i.e., when all the S + 3 units in the system are considered to have identical failure rates) yields

R(3, S) = 1 − (1 − R)^(S+2) [1 + (S + 2)R]

which is simply the probability that at least any two of the total S + 3 units survive the mission duration, when the assumption is made that the majority organ and associated detection and switching logic are fail-proof.
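The closed form just quoted can be checked directly against its stated meaning. The following Python fragment is a minimal sketch of that check (not from the paper; the function names are illustrative):

```python
# A minimal sketch (not from the paper) checking the no-dormancy result
# R(3,S) = 1 - (1-R)^(S+2) (1 + (S+2)R) against the probability that
# at least two of the S+3 identical units survive the mission.

from math import comb

def r3s_closed(R, S):
    return 1 - (1 - R)**(S + 2) * (1 + (S + 2) * R)

def r3s_binomial(R, S):
    m = S + 3
    return sum(comb(m, k) * R**k * (1 - R)**(m - k) for k in range(2, m + 1))

for S in range(4):
    print(S, round(r3s_closed(0.9, S), 6), round(r3s_binomial(0.9, S), 6))
# The two columns agree for every S.
```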
ON) °1~ ______ __________________ ~ ~IT N t] dt Summing the above three cases yields (11)./ '--(s1) R(N, 1) [T] = RN[T]R8[T] Figure B shows that the active units (aI, a2, ••• aN) which were good at time = 0 are still good at time = T and likewise for the dormant unit s1. This event has the probability RN . Rs. Case (ii). The spare unit is the first unit to fail: 14 ~I" HYBRID (N, 1) NMR ~I t 01 iT lSI '-(11) At some time t (0 ~ t ~ T) the spare unit sl fails, leaving the system in basic NMR, i.e., Hybrid(N,O), for the unelapsed time [T - t]. The probability of this event is T jTo e-(NHp.)t • R(N, O)[T - t] • dt (3) Similarly it may he shown that for the case of two spares R(N, 2) [T] = RN[T]R82[T] + (N';.. + 2J.L) jT e-(NH2p.)t • R(N, 1) [T - t] • dt (4) o and, in general, for S spares / ( 0 1 , 0 2 , . . . oN) jo + (N';.. + J.L) e- NXt • J.Le-p.t • R(N, 0) [T - t] dt R(N, S) [T] = RN[T]RsS[T] + (N';.. + SJ.L) jo e-(NHSp.)t T .R(N,S-I)[T-t].dt which may be rewritten by letting T (5) = T - t as = RN[T]RsS[T] . {1 + (N). + Sp,) f.T e(NA+S.). • R(N, S - 1)[TJ· dT) (6) where R(N, O)[T - t] = L n i=O (N). . (1 - R[T - t])i t The recursive integral equation for the case of one spare (S = 1) has the solution R(N, 1)[TJ = RNR. [1 + (NK + 1) • RN-i[T - t] which is the reliability of the basic NMR system for a mission time [T - t]. ~ i . (i) l EG) (-1) i-l ( 1 )] (Kl + 1) RsRl - 1 (7) 380 Spring Joint Computer Conference, 1970 > and the general solution for (S R(N, S)[T] = RNRs8 o (~ oI: i 1-0 , (-l)~-l e E(NK++ S) 8-2 + j 1 l)i+l + t~=O (~) (NKS+ S) ~ _ Rs (i)l 1 symbols in the above equations are summarized in Table 1. In the derivation of the above equations it was assumed that the restoring organ, the switching unit, and the disagreement detector (jointly referred to as the R-8-D unit) are fail-proof. In order to incorporate the reliability of these units, they may be assigned a lumped parameter R v , reflecting their reliability; and with the simplifying assumption that the R-S-D unit has a series reliability relative to the ideal Hybrid (N, S) configuration, the term Rv may be used as a product term to directly modify the reliability equations derived here. 1) is given by {(-1- 1) -I: (Kl + S) DISCUSSION OF THE MODEL BEHAVIOR 8-2 ') Z; RfR' ° +1 j j=O 1- 1 (Rs )i+l} (8) where K = A/ p,; p, ~ A and 1 ~ K < 00. For the special situation of non-failing spares, we have K = 00, (i.e., p, = 0) and the solutions (7) and (8) reduce to: (i) for S = 1 R(N, 1)[TJ RN {I = (N) + XNT( -I)" (~) i) +NI: ~. I:,J n i=1 (ii) for S > The application of redundancy in general does not necessarily guarantee improvement in reliability. This is especially evident from the characteristic reliability curves of the simple NMR system as shown in Figure 2. It is to be noted that if R (the reliability of the nonredundant unit) is less than 0.5 (i.e., AT > 0.697) then the system is worse off with redundancy. Furthermore the application of higher orders of redundancy (larger value of N) makes the system progressively worse. Also one of the characteristics of such a system is that the cross-over point where the redundant system reliability is equal to the non-redundant system reliability does not vary with the order of redundancy N. It sets a large lower bound on the reliability of the original system amenable to improvement by the application of the basic NMR form of redundancy technique. i ( 1.00 (-1) i - j (~_ . J j=1 R' )} 1 (7a) 3 0.80 1 _ N R(N,S)[T] - R {8-1 I: (NAT) ., &' i 2. i=O + ~ (NAT)s( -l)N Z , 0.60 ;;; 0 8. 
z < ° (2n) + N8 ~ (N). ~ (i)J. ( i n ~=1 n ~ ~ ..; 0.<40 ?it -1) i - j 3=1 . 0.20 o [~ (~ js Rj _ 1) _8-1E~]} js-l l! (8a) The proof that equations (7) and (8) are the solutions to the recursive integral equation (6) may be verified by inserting them on the righthand side of (6) with parameter equal to S - 1. The meanings of all 0.20 0.«) 0.60 R(SIMPLEX) 0.80 1.00 Figure 5-Comparative reliability curves of H (3, S), NMR, and simplex systems Reliability Analysis and Architecture 1.00 381 ---"'2 l\'WiII:""'I=::::""'O:~--'-------r-----r------, 1.00 ,....------,---....,------,.----:::::::;;. . . HY8RID(N,2) 0.80 1-----4--~_+-~._-+----_1'---__t O.~r--~~~r-~~--+-------r-------~ t C-s 0.60 C-7 ~ C"'9 ::; ~ :::i 0.«> 1-----4--.,~J>MI7IF-----+-----_t_---__t ILl '" O.20I----~~'-+--+----+-----t------t 0.20 .OO~~~~ 0.01 ~ _____________ L____________ ~--------~~ 0.-40 0.60 0.80 R(SIMPLfX) ... EXp(-LAM8DA x n 1.00 Figure 8-Reliability R (N, 2) vs R (Simplex) Figure 6-Reliability comparison of a H (3, S) and NMR systems vs normalized time XT The effects of hybridization (i.e., the addition of standby spares) on the NMR system with the replacement form of redundancy as analytically expressed by equation (8) are shown graphically in Figure 5 through Figure 10. The reliability of the Hybrid(3,S) system for the case of N = 3, K = 1 and with several values of S (the number of spares) is shown in Figure 5 and Figure 6. Also alongside for comparative purposes the reliability curves of the NMR system and the nonredundant system are also shown. In Figure 7 and Figure 8 are shown the reliability of the Hybrid(N,S) system versus the reliability R of the non-redundant unit for several values of N. They illustrate the effect ______ 0.20 of the variation of the order of redundancy N in the NMR core. In Figure 9 and Figure 10 are shown reliability curves for K = 1 and K = 10 respectively for various values of the number of spares S. The improvement in reliability of the Hybrid(N,S) system over the NMR system is readily seen from the curves. It is to be noted that the well-known crossover point, which in NMR systems occurs at a reliability of 0.5 is significantly reduced in the Hybrid(N,S) system. With N = 3 and S = 1 the crossover point occurs at R = 0.233 for the value of K = 1, and rapidly diminishes with higher allocation of the number of spares (S > 1). The shift in the crossover point is also 1.00 1.00 HYBRID(N,1) -• 0.80 0.80 K -1 lit % i ';;t ~ 0.60 ~ ....>- ::; iii « ::; i 0.60 0.«1 ~ '"~ Ca .. Cz6 C=8 ::; 0.«) iii « ::; ILl '" 0.20 0.20 .00 0.-40 0.60 R(SIMPLEX) • EXp( -LAM8DA Ie 0.80 n Figure 7-Reliability R (N, 1) vs R (Simplex) 1.00 0.01 0.20 0.«) 0.60 R(SIMPLfX) - EXP(-LAMlDA Ie 0.10 n Figure 9-R (3, S) vs R (Simplex) with K = 1 1.00 382 Spring Joint Computer Conference, 1970 1.00 HYBRID(3,S) ~ O.80I----,t---lh~---+~-~1_-~~~---_; II ~ ::t: I- ~ 0.60 0.20 0.80 0.60 0.«1 R(SIMPLEX) 1.00 =EXp( -LAMBDA x T) Figure lO-R (3. S) vs R (Simplex) with K = 10 sensitive to variations in the value of K. The effect of changes in values of J( on the system reliability and the shifts of the crossover point become very slight when K exceeds the value of 10. The decision as to how to allocate redundancy for a given total number of units C = N + S, where N is the number of active redundant units in the NMR core and S is the number of standby spares, is resolved by the curves shown in Figure 7 and Figure 8. Since N is always an odd number it follows that if C is odd then S is even and vice versa. 
The possible allocation policies are then as tabulated below. With one spare, S = 1, as shown in Figure 7 the improvement in reliability in going to higher order N of active redundancy is restricted to the range 0.58 < R < 1. When the number of spares is increased to two, S. = 2, with N as the variable, the range of improved TABLE II ALL POSSIBLE ALLOCATION POLICIES OF C Ce is even Co is odd N = 3 N = 5 N=N N = Co - 2 S = Co - 3 S = Co - 5 N = 3 N = 5 S = Co - N N = N S N = 2 _N_=_C_o_-L_S_=_O___---'. _N S = Ce S = Ce = Ce = e C S = Ce - 3 S = 3 - 1 IS = 1 - - - 3 5 N reliability is further restricted to 0.65 < R < 1. Also, within this shrinking range (as a function of increased S), the improvement in reliability due to larger values of N also tends to become less significant. This indicates that the order of massive redundancy N should be kept at a minimum in the NMR core (i.e., N = 3). Maximum redundancy should be inserted in the spares bank, thus in practical implementation N should equal three, with S as variable to suit the desired level of mission reliability. Hardware utilization and hence cost is another major advantage of the Hybrid (N,S) redundant system. Efficient hardware utilization over comparable NMR systems is due to the fact that for an equal number of total N units the NMR system will tolerate failures of only (N - 1) /2 units whereas the Hybrid (3,S) system will tolerate as many as N - 2 failures. Thus when an NMR system fails it leaves behind n good units while in the Hybrid(3,S) system only one good unit remains upon system failure. In general the Hybrid (N,S) system upon exhaustion of all spares and subsequent failure of the system leaves (N - 1)/2 good operating units which is a minimum when N = 3. Thus another argument for keeping the parameter N confined to the value three in Hybrid(N,S) system is this of efficient hardware utilization. ACKNOWLEDGMENTS The authors wish to thank William F. Scott, John J. Wedel, and George R. Hansen of the Flight Computers and Sequencers Section of the Astrionics Division of the Jet Propulsion Laboratory for their constant encouragement and for providing the atmosphere conducive to this research. Thanks are also due to Prof. Leonard Kleinrock of the University of California, Los Angeles for his advice on the subject of queueing theory; the notation used herein to describe the dynamics of replacement has been adapted from similar notation used to describe the behavior of queues. This paper represents in part research which has been carried out at the Jet Propulsion Laboratory under NASA Contract NAS7-100. 
REFERENCES

1 S AUROBINDO Savitri-A legend and a symbol Sri Aurobindo International University Centre Collection Vol II Pondicherry India 1954
2 J VON NEUMANN Probabilistic logics and the synthesis of reliable organisms from unreliable components In Automata Studies pp 43-98 Princeton University Press Princeton New Jersey 1956
3 E F MOORE C E SHANNON Reliable circuits using less reliable relays J of the Franklin Institute Vol 262 Pt I pp 191-208 and Vol 262 Pt II pp 281-297 1956
4 J E ANDERSON F J MACRI Multiple redundancy application in a computer Proc 1967 Annual Symposium on Reliability pp 553-562 Washington 1967
5 A A AVIZIENIS Design of fault-tolerant computers AFIPS Conference Proceedings Vol 31 pp 733-743 1967
6 A A AVIZIENIS J ROHR F P MATHUR D RENNELS Automatic maintenance of aerospace computers and spacecraft information and control systems Proc of the AIAA Aerospace Computer Systems Conference Paper 69-966 Los Angeles September 8-10 1969
7 J K KNOX-SEITH Improving the reliability of digital systems by redundancy and restoring organs PhD thesis Electrical Engineering Stanford University August 1964
8 R F DRENICK The failure law of complex equipment J Soc Ind Appl Math Vol 8 No 4 pp 680-690 December 1960
9 F P MATHUR Reliability study of fault-tolerant computers In Supporting Research and Advanced Development Space Programs Summary 37-58 Vol III pp 106-113 Jet Propulsion Laboratory Pasadena California August 31 1969
10 F P MATHUR Reliability modeling and analysis of a dynamic TMR system utilizing standby spares Proc of the Seventh Annual Allerton Conference on Circuit and System Theory October 20-22 1969
11 J GOLDBERG K N LEVITT R A SHORT Techniques for the realization of ultrareliable spaceborne computers Final Report Phase I Project 5580 Stanford Research Institute Menlo Park California October 1967
12 J GOLDBERG M W GREEN K N LEVITT H S STONE Techniques for the realization of ultrareliable spaceborne computers Interim Scientific Report 2 Project 5580 Stanford Research Institute Menlo Park California October 1967
13 J P ROTH W G BOURICIUS W C CARTER P R SCHNEIDER Phase II of an architectural study for a self-repairing computer International Business Machines Corporation Report SAMSO TR-67-106 November 1967

The architecture of a large associative processor

by GERALD JOHN LIPOVSKI
University of Florida
Gainesville, Florida

INTRODUCTION

This paper will describe features of architectural significance to the segmentability of a processor; it is not intended to be a detailed description of a processor for Information Storage and Retrieval. We regret that the incorporation of some features cannot be defended here because of the length of this paper. They are presented in a report.16 We first state the types of problems to be processed. This will lead to the overall organization of the processor.

In Information Storage and Retrieval, a processor should have the capability to store data which is formatted as ordered sets or unordered sets, and to retrieve all such sets having a specified subset. An unordered set search for a given subset S retrieves all sets containing S. An ordered set search for a given ordered subset S retrieves all ordered sets containing S. A string search for a given string S retrieves all ordered sets (strings) having a substring S. For example, if S = (s1, s2, s3) and S1 = (s1, a, s2, s3), S2 = (a, b, s1, s2, s3, c, d), S3 = (s2, s1, s3) and S4 = (s1, a, b, s2), then an unordered set search for S would retrieve S1, S2, and S3; an ordered set search for S would retrieve S1 and S2; and a string search for S would retrieve S2.
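The three retrieval operations just defined can be stated compactly in sequential form. The following Python fragment is a minimal sketch of them (not from the paper, which performs these searches associatively in parallel; the function names are illustrative):

```python
# A minimal sketch (not from the paper) of the three searches defined
# above: unordered set (subset), ordered set (subsequence), and
# string (contiguous substring) search, run on the paper's example.

def unordered_search(stored, s):
    return set(s) <= set(stored)          # stored contains the subset s

def ordered_search(stored, s):
    it = iter(stored)
    return all(x in it for x in s)        # s is a subsequence of stored

def string_search(stored, s):
    s = tuple(s)
    return any(stored[i:i + len(s)] == s
               for i in range(len(stored) - len(s) + 1))

S  = ('s1', 's2', 's3')
sets = {'S1': ('s1', 'a', 's2', 's3'),
        'S2': ('a', 'b', 's1', 's2', 's3', 'c', 'd'),
        'S3': ('s2', 's1', 's3'),
        'S4': ('s1', 'a', 'b', 's2')}
for name, t in sets.items():
    print(name, unordered_search(t, S), ordered_search(t, S), string_search(t, S))
# S1, S2, S3 pass the unordered search; S1, S2 the ordered; only S2 the string.
```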
By storing ordered sets and unordered sets and allowing pointers to another set to be stored as elements of a set, one can store data formatted as a colored relational graph.22 By permitting ordered sets or unordered sets to be elements of another ordered set or unordered set, one can store data formatted as one of several types of trees. Thus, the capability to store ordered and unordered sets is sufficient to store most useful types of data. The retrieval capability of this processor extends beyond set, ordered set, and string searches, but this feature will not be discussed.

A class of processors which are well suited to the problems discussed above are those having a linear array of associative memory cells (linear array processors). Each cell has a fixed size word of memory, W, and a comparator. All cells simultaneously receive a word C and a mask M, which are broadcast in the channel; normally, a cell is said to match if 1 = ∧/((C = W) ∨ M) (Iverson notation11). A set is generally stored in a collection of contiguous cells (aggregate) with one element of the set in each cell. Contiguous cells are consecutively numbered in Figure 1. (The numbers are for descriptive purposes in this paper; they are not addresses.) One or more rails between cells are used to combine the results of matches from various cells in the aggregate to detect the existence of a given ordered subset, or substring, in the ordered set stored in the aggregate, or the existence of a given subset in the set stored in the aggregate.

A linear array processor was first used by Lee and Paull for string searches.14 Gaines and Lee9 found it useful for ordered set searches, and Savitt et al.23 found it useful for unordered set searches. Sturman28 showed it possible to store and broadcast instructions from these associative memory cells, and therefore dispense with the need for a central processing unit, and Smathers25 has added several practical improvements to this iterative processor model. These approaches were consolidated into an iterative processor designed for set, ordered set, and substring searches.16 For information retrieval the processor should be large, to efficiently handle large data sets, and it should have the capability to be loaded and unloaded quickly.

Since the same cell can be made to store data or broadcast an instruction, it is clear that an iterative processor based on any of the previously discussed linear array processors can be segmented into several independently acting collections of cells, each executing a different program or different parts of the same program. We have found that the capability to broadcast instructions and segment a processor increases the "cost" (number of gates) in a basic associative memory cell by 18% in a possible realization of a processor which was studied. Therefore, if 36% of all programs run on the processor can be executed in pairs of segments simultaneously in the processor, the capability to segment a processor is economically justified. This is expected to be the case in large processors. A high degree of parallel programming for subprograms of the same program is made available, and each segment can be connected to a different I/O device for parallel, more efficient, loading and unloading of information in a segmentable processor.
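The masked match condition 1 = ∧/((C = W) ∨ M) says that a cell matches when every bit position either agrees with the comparand C or is masked by M. The following Python fragment is a minimal sketch of one cell's comparator (not from the paper; bits are modeled as integers, a 1 in M meaning "don't care"):

```python
# A minimal sketch (not from the paper) of the match rule
# 1 = AND/((C = W) OR M) for a 12-bit associative cell.

WIDTH = 12

def cell_matches(W, C, M):
    agree = ~(C ^ W) & ((1 << WIDTH) - 1)    # positions where C = W
    return (agree | M) == (1 << WIDTH) - 1   # AND-reduce over all positions

cells = [0o1234, 0o1334, 0o7777]
C, M = 0o1034, 0o0700                        # compare all but the masked digit
print([cell_matches(w, C, M) for w in cells])  # [True, True, False]
```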
- Word Cell 1 /' Register Comparator I Mff 1"'-' THE TREE CHANNEL PROCESSOR I a.anne! (connects to all cells in general) ~ Word ~ Rsster Cell 2 ~ Comparator I Mff I I' I"" It ---- Rail (interconnects pairs of cells) A linear array processor (Figure 1) suffers from excessive propagation delay and a succeptability to faults, especially where a cell output amplifier is stuck-on-one. Propagation delay on the rail, where a signal may have to propagate through many cells in one clock period, is especially slow. Figure 2 shows a better connection scheme (tree channel processor). The word, C, normally is broadcast to all cells "simultaneously" via the channel in the tree branches shown, and rails are used to communicate the results of Word ~ - R!iI!ster Comparator I Mff t I / Cell 3 I I + Word -"" , to other cells Register Cell n-1 jm:ator l ¥ I' Word -""- , Register Cell n Comparator I Mff I + Figure 1-A linear array processor / Note: Cell numben are for descriptive purposes in this report; they are not addr. Cell numbering is determined by relative position of the cellon the rail. The root cell always has the largest number. Figure 2-A dubtree of a tree channel processor Architecture of Large Associative Processor searches in aggregates just as in the previous connection scheme. This processor has two rails, each connected as the rail in Figure 2. Note that the rail connects consecutively numbered cells in Figure 2 just as in Figure 1. (Cell i is said to be above cell j, j > i and below cell k, k < i.) However, if a signal at point "a" on the rail must propagate to and through cells 1, 2, and 3 to point "b" and beyond in Figure 2, it will "short cut" directly through cell 3. This decreases the delay time. The ~aximum propagation delay time through gates. and through transmission lines in the channel or rails determines the clock rate of the processor. For a processor having n cells, it grows like a log(n) + b(n)lOg72 in a 7-way homogenous tree,* which IS conveniently realizeable in three dimensions. * Iverson Notationll The tree structure is economically segmented. In Figure 3, by setting a flip-flop in, say, cell 3, the channel can be disconnected between cell 3 and cell 7 (delimit the channel at cell 3). This forms a subtree, cells 1, 2, and 3. In each subtree both rails are effectively reconnected between pairs of cells to provide a "linear array," as in Figure 2 or 7. Within each subtree, in one clock time unit, some cell broadcasts its word into the channel to all cells in that subtree. Each cell amplifies the channel signal and propagates it as soon as possible. All cells obey this instruction in the same time unit. Each subtree operates independently and simultaneously. Because these subtrees define the extent of broadcast of instructions, they are called instruction domains. (ID's) While ID's normally act independently, they can issue instructions to delimit the channel or reconnect it (processor management). Input output is provided at ~.I----='" various leaves of the tree, as in Figure 3. An ID, for example, "A" in Figure 3, gains control of an I/O channel channel below cell 8 by merging with ID's "B" and "C". It does this by changing the flip-flops that delimited the channel in cells 3 and 8. It is clearly possible to simultaneously connect ID "F" to the other I/O unit and load data into cells in "F" while "A" is also being loaded. This architecture has two basic drawbacks. 
This architecture has two basic drawbacks. The placement of I/O channels is fixed; for example, it is inconvenient for ID "A" to utilize the I/O channel below cell 17 (Figure 3), and impossible, while this I/O transaction is taking place, for ID "F" to gain access to the I/O channel below cell 8. Further, as we show in a later section, for each I/O transaction, programs in all ID's are temporarily stopped while some ID's are merged in order to connect an I/O channel to the ID requesting it. Later, programs in all ID's must again be stopped to restore the previous arrangement of ID's. In a large processor, the "overhead" to begin and terminate each I/O will be high. To circumvent these difficulties, a switching network (SW-structure) is used at the root of the processor tree. The resulting architecture has some properties of a computer network; it will be discussed later.

ESSENTIAL CHARACTERISTICS OF THE CELL

The description of the cell given here is incomplete for two reasons. Firstly, a complete discussion of the instruction set for the processor which was studied would be too long and would detract from the study of the segmentation of a processor. Secondly, the results obtained here apply to various iterative processor architectures in which instructions are broadcast in a channel.

Figure 3-A processor with I/O

Figure 4-The essential cell

TABLE I-Cell, Instruction Domain, and Response Modes

Cell modes and flip-flop assignment:

  Cell mode          RC  SC  IC  FC
  Independent root    1   1   0   0
  Dependent root      1   0   0   0
  R-instruction       0   0   1   0
  S-instruction       0   1   1   0
  R-data              0   0   0   0
  S-data              0   1   0   0
  Faulty              -   -   -   1

Instruction domain modes: Supervisor (SM = 1, TM = 0); Transfer enable (SM = 0, TM = 1); Run (SM = 0, TM = 0).

Response modes:

  Cell mode          Supervisor    Transfer enable   Run
  Independent root   Data          B-delimit         B-delimit
  Dependent root     Data          R-delimit         R-delimit
  R-instruction      Passive       Passive           Instruction
  S-instruction      Instruction   Passive           Passive
  R-data             Passive       Transfer          Data
  S-data             Data          Passive           Passive
  Faulty             Passive       Passive           Passive

The essential elements of the cell are shown in Figure 4. The flip-flops FCff, TMff, ICff, RCff, and SCff, and the lines MTr and SMr, are used to control the processor as we shall now describe. In the following discussion of modes and mode changes, for the sake of concreteness and clarity, a tentative assignment of states, or modes, to these flip-flops and lines will be made.

The tree channel processor possesses instruction domain modes. These modes are indicated in each cell by SMr and TMff (Figure 4 and Table I). In a given ID, all cells are in the same instruction domain mode. The run mode enables the normal execution of instructions for information retrieval. The transfer enable mode provides for efficient loading and unloading of words in the word register of cells by means of the channel. The supervisor mode enables channel delimiting cells to be set up or changed. The operation of these three modes will be explained in the next three sections.

All cells are essentially identical in construction and capability in this iterative processor. Each cell, however, has a cell mode fixed by RCff, SCff, and ICff (Figure 4 and Table I) to assign to it a specific modus operandi. For a given instruction domain mode and a given cell mode, a cell has a response mode (Table I). The data mode, or data response mode, permits a cell to store data and interpret instructions that search or write in data cells. The instruction mode, or instruction response mode, permits a cell to issue instructions.
In the passive mode a cell will not issue instructions, it cannot be written in, and none of its control flip-flops can be changed directly by instructions in the channel. Rails to the cell immediately above it are connected to the cell immediately below it, and the cell itself does not broadcast a "one" signal on either rail. Cells detected to be faulty can have FCff set; they then become permanently passive. Both the B-delimiting (bidirectional delimiting) and R-delimiting (rootward delimiting) modes are similar to the passive mode except that they cause a cell to delimit the channel. A B-delimiting cell completely separates the channels of two ID's, while an R-delimiting cell permits leaf-ward broadcasting through it but delimits rootward broadcasting through it. In either case, the rail is reconnected to connect consecutively numbered cells in each instruction domain, set up as in Figure 2. For example, in Figure 3, if cell 3 is B-delimiting, the instruction in the channel of ID "A" must be broadcast from some cell of ID "A." On the other hand, if cell 3 is R-delimiting, the instruction in the channel of ID "A" is the bit-wise OR of the instruction broadcast by a cell in ID "B" and the instruction broadcast by a cell in ID "A." This type of connection is particularly useful in diagnosing cells.

OPERATION IN THE RUN MODE

In the run mode, R-data cells are in the data mode and R-instruction cells are in the instruction mode. The generation of instructions in a segmentable processor poses some problems. It is possible, of course, to merge two ID's, both broadcasting their own sequences of instructions, into one ID, in which only one sequence of instructions is broadcast. The machinery for selecting a cell in an ID to broadcast must be able to automatically resolve which sequence broadcasts and when. The Z-propagating rail26 does not satisfy this requirement.

An instruction is any command that a programmer can effect by causing some cell to broadcast its stored word. In each ID, at each clock pulse, one instruction is read into the channel and is broadcast to all cells in the ID. The mechanism for selecting this broadcasting cell is the broadcast queue flip-flop, Bff, and the broadcast priority rail, BPr, in a conventional priority-determining circuit. The set of data or instruction cells with B = 1 constitutes the broadcast queue. By means of BPr, which broadcasts a "one" signal from cell i, if B = 1 in cell i, to BPr in all cells j, j > i, a B-prior cell will be selected; it will have B = 1, BP = 0. This cell will broadcast its word in the channel and reset Bff. Thus, at the next clock pulse, another cell becomes B-prior and broadcasts in the channel. For example, if B = 1 in cells 2, 3, 4, and 6 in Figure 5, cell 2 would set BP = 1 in cells 3-7; cell 2 only would broadcast. When cell 2 resets Bff, cell 3 broadcasts, then cell 4, then cell 6 in turn.

Cells in the data response mode are assumed to obey at least the following instructions. A Match instruction resets Mff in data cells, then sets Mff if the comparator detects a match between the contents of the channel and the word stored in the cell. The match result rail has a switch controlled by some flip-flop in the cell (not shown); by means of this rail, the value in Mff in a cell is broadcast so that string searches, at least, can be carried out.
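The Bff/BPr queue mechanism described above drains in cell order; a short sketch (ours, not the hardware circuit; broadcast_order is an illustrative name) models it:

```python
# Among cells with B = 1, the lowest-numbered cell sees BP = 0 on the
# broadcast priority rail, broadcasts, and resets its Bff; the next
# B-prior cell then broadcasts at the next clock pulse.

def broadcast_order(B):
    """B[i] is the broadcast-queue flip-flop of cell i+1; return the
    order in which cells broadcast, one per simulated clock pulse."""
    queue = list(B)
    order = []
    while any(queue):
        i = queue.index(True)        # B = 1, BP = 0: the B-prior cell
        order.append(i + 1)          # cells are numbered from 1 in the text
        queue[i] = False             # the B-prior cell resets its Bff
    return order

# The example of Figure 5: B = 1 in cells 2, 3, 4 and 6.
flags = [False, True, True, True, False, True, False]
print(broadcast_order(flags))        # [2, 3, 4, 6]
```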
We also require an instruction or program for selecting the one cell i with M = 1 and smallest integer i for which M = 1 (priority instruction). A write instruction is assumed, to change the word in a cell. Several processors have these basic instructions.14,9,22,16 We have included a flip-flop, Sff, in our cell for the transfer enable mode; we require some instruction to load Sff from Mff. Other instructions will be explicitly mentioned in the following sections.

Instruction mode cells can be made to broadcast their word if Bff is set. This setting is accomplished with the help of the match result rail (Figure 4). Broadcasting the JUMP instruction clears Bff in all instruction cells and loads the value, U, on the match result rail into Bff. Suppose, in Figure 5, that the switch in the match result rail is open only in cell 7. If the most recent match instruction set Mff in cell 1, a data cell, then U = 1 in cells 2-7; a JUMP instruction sets B = 1 in cells 2, 3, 4, and 6 and clears Bff in any other instruction cell. Cell 2 broadcasts next. If the most recent match instruction set Mff in cell 5, then U = 1 in cells 6 and 7; a JUMP sets B = 1 in cell 6 only and clears Bff in cells 2-4 and any other instruction cells. This simple mechanism can be used in programming a variety of conditional jumps as well as unconditional jumps.

Figure 5-An ID with instructions

Figure 6-A tree

It will be shown that all cells are initially in the R-data mode. In this mode, an addressing scheme for locating cells to become channel delimiting is necessary to segment the processor in a predictable way. In Figure 3, for example, one must be able to locate cell 9 to set up instruction domain B. An address for each cell is neither practical nor desirable. Instead, only leaf cells (cells in level 4 in Figure 6) are marked. An instruction, LOCATE LEAF, sets M = 1 in all these leaf cells and resets Mff in all other cells. Consider the 2-way homogeneous tree in Figure 6. A cell n is in level $\ell - k$ if cell n - k is a leaf cell and all cells m, n - k < m ≤ n, are not leaf cells. For example, cell 7 is in level $\ell - 2$ since cells 7 - 0 = 7 and 7 - 1 = 6 are not leaf cells, and cell 7 - 2 = 5 is a leaf cell. It is clearly possible to replace the match instruction with a LOCATE LEAF instruction in a string search in order to set Mff in all cells of level $\ell - k$. The exact mechanism for a specific instruction set depends on the way string searches are programmed with that instruction set. The priority instruction can then be used to "count down" cells in level $\ell - k$ to choose a specific cell. Repeated use of this method can be used to locate any cell in the tree in a short time. For example, to locate cell 9 in Figure 6, we use the above method to locate cell 14, then in the subtree below cell 14, we use this method to locate cell 10, and in its subtree, we locate cell 9. This method reduces the amount of "counting" using the priority instruction.
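The level-numbering rule behind LOCATE LEAF can be checked with a small sketch (ours; the name level_offsets and the boolean-list encoding are illustrative):

```python
# In rail order (root numbered last), cell n lies in level l - k where
# k is the distance back to the nearest leaf cell at or before cell n.

def level_offsets(is_leaf):
    """is_leaf[n-1] marks cell n as a leaf; return k for each cell."""
    ks, k = [], None
    for leaf in is_leaf:
        k = 0 if leaf else k + 1   # every rail-ordered tree starts at a leaf
        ks.append(k)
    return ks

# The 15 cells of the 2-way tree of Figure 6, in rail order:
leaves = [True, True, False, True, True, False, False,
          True, True, False, True, True, False, False, False]
ks = level_offsets(leaves)
print(ks[7 - 1])    # cell 7: k = 2, i.e., level l - 2 as in the text
print(ks[15 - 1])   # the root, cell 15: k = 3, i.e., level 1
```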
The maximum number of program steps to locate any cell grows logarithmically with the number of cells in the processor. Once the cell has been located, a unique word can be stored in its word register so that it can later be found with a simple match instruction.

Since all cells are initially in the R-data mode, some instructions are required to change cell modes. To change cells found by the technique just described to channel delimiting cells, they are first changed into S-data cells in the run mode, and then to root cells in the supervisor mode (SM = 1). One must be able to convert R-data cells to R-instruction cells, and vice versa, to compile and then execute instructions. These mode changes are carried out with instructions given in Table II. Note that ICff is reset or set from the signal on the match result rail, similar to the JUMP instruction.

TABLE II-Mode Changing Instructions

  Instruction   Action
  SET IC        In data cells where U = 1, set ICff
  RESET IC      In instruction cells where U = 1, reset ICff
  SET SC        In data cells where M = 1, set SCff
  RESET SC      In data cells where M = 1, reset SCff
  SET RC        In the supervisor mode, in data cells where M = 1, set RCff
  RESET RC      In the supervisor mode, in data cells where M = 1, reset RCff

Lastly, instructions are required to change instruction domain modes. These, the TRANSFER CALL and SUPERVISOR CALL, will be examined in turn.

OPERATION IN THE TRANSFER MODE

A transfer is the operation of broadcasting a word either from one R-data cell or from an input channel and writing that word in an R-data cell or sending it to an output channel. An efficient transfer mechanism is required to load large data bases into or out of a large processor. This is provided by the transfer enable mode, which is described in simplified terms below.

This mode is entered when the instruction TRANSFER CALL is broadcast. In the instruction domain, R-instruction cells become passive, and R-data cells become transfer cells. The contents of Sff, which were obtained earlier from Mff, are loaded into Bff in all R-data cells at this time. Whatever sequence of instructions was being broadcast from R-instruction cells is temporarily halted since these cells become passive, and the contents of the channel are not interpreted as instructions even though they would be legitimate instructions in the run mode. The combination, Bff and BPr, is used to select words to broadcast, as in the run mode, and Mff and the match result rail, with all switches closed (Figure 4), are used to select a cell to be written in, in a similar way. If any transfer cell has B = 1, the transfer cell with B = 1, BP = 0 broadcasts its word into the channel and resets Bff. If any transfer cell has M = 1, the transfer cell with M = 1 and U = 0 writes the word in the channel into its register and resets Mff.

If an I/O channel is connected to the instruction domain in the transfer enable mode, words broadcast from cells will be sent to the I/O channel. To output some words, we set S = 1 in cells containing these words, we reset Mff in all R-data cells so these words are not written in other cells, and we merge ID's to include the I/O channel. Some transfer cells above these cells contain channel commands in their word register; we set S = 1 in these cells too. Then, in the transfer mode, the channel commands are first broadcast to select an output device and format, and the words to be output are broadcast. To input some words, we use the transfer mode twice. The first time, we output channel commands to select an input device and format. Then, we reset Sff in all R-data cells and set M = 1 in cells to be written in. When we re-enter the transfer mode, words are read from the input device until an end-of-file is encountered. While the I/O channel broadcasts words into the channel, it broadcasts a "one" signal on BPr to the root cell of the ID (see Figure 7).
Mff and the match result rail select one cell each time to write the word broadcast from the input channel. The transfer enable mode is terminated when neither the I/O channel nor any cell is broadcasting. In this condition, BP = 0 in the root cell of the ID, and that cell broadcasts MT = 1 to all cells in the ID (see Figure 7). These cells reset TMff. The ID returns to the run mode, and R-instruction cells, no longer passive, continue broadcasting the instruction sequence exactly where they left off when the transfer enable mode was entered.

Figure 7-Control lines in the transfer enable mode

OPERATION IN THE SUPERVISOR MODE

The supervisor mode is provided for processor management. In it, dependent and independent root cells are set up or changed; thereby, ID's are changed (see Figure 8). These root cells and S-data cells are in the data mode, and can be searched, written in, or changed. S-instruction cells, which are in the instruction mode, are created only when the supervisor mode is entered.

If some ID in the processor wishes to enter the supervisor mode to change the segmentation structure, it broadcasts the instruction SUPERVISOR CALL. This sets SCff in all R-instruction cells in that ID. These become S-instruction cells. Whenever any S-instruction cell exists, it continuously broadcasts a "one" signal on the supervisor mode line, SMr, to all cells in the processor (Figure 7). This signal causes all cells, and thus all ID's, to simultaneously enter the supervisor mode, and remain in it as long as SM = 1 (Figure 8). All R-data cells, and all R-instruction cells in ID's which did not broadcast SUPERVISOR CALL, become passive. Therefore, if some ID were interrupted while executing some program because some other ID broadcast a SUPERVISOR CALL, all information pertinent to that program is "frozen" in passive cells. If that ID returns to the run mode, it will continue executing the program where it left off. This is also true if the ID were in the transfer enable mode when the supervisor mode was entered.

Figure 8-Control lines in the supervisor mode

A programmer can therefore ignore the possible interruption of his program by an entry into the supervisor mode unless the ID in which he is programming is changed during the interruption.

In the supervisor mode, the instruction sequence is provided by S-instruction cells. Data cells are dependent and independent root cells, and S-data cells, all of which could not be changed while an ID was in the run mode. The entire processor appears to be a single instruction domain, although almost all cells are passive. All the search and write instructions that were available in the run mode are available in this mode to identify root cells or write in them, and the instructions SET RC and RESET RC (Table II) can be used to set them up or change them.

It is possible for two ID's in the run mode to simultaneously broadcast SUPERVISOR CALL. The instruction sequence for the supervisor mode will be a mixture from the S-instruction cells in both ID's. Further, two ID's may wish to change the processor segmentation structure in contradictory ways at approximately the same time. To accommodate these possibilities, some software conventions are necessary. In one of these, a separate ID is designated the executive.
It alone will keep an updated account of the processor structure. In the supervisor mode, it alone will change root cells. Other ID's that wish to change the structure will set flags in some R-data cell and change this cell to an S-data cell while the ID is still in the run mode. The executive will then periodically inspect these S-data cells when in the supervisor mode to determine what requests are impending.

When no more instructions are broadcast in the supervisor mode, BP = 0 in the root cell of the processor (see Figure 8). This cell broadcasts MT = 1 to all cells. S-instruction cells reset SCff when MT = SM = 1, thereby becoming R-instruction cells. Since S-instruction cells no longer exist, no "one" signal is broadcast on SMr to all cells. All ID's then return to the run mode or transfer mode that they were in just before they entered the supervisor mode.

We remark that some mechanism must exist to start a sequence of instructions broadcasting in a new ID when it is set up. One way is to let the word 00…0 in the channel be the code word for JUMP. Before the new ID is segmented away from the "parent" ID, the latter can broadcast a MATCH instruction to set U = 1 in selected R-instruction cells in what will be the new ID. After segmentation has been completed, the first instruction in the run mode in the channel will be JUMP because no cell will broadcast. This will cause a sequence of instructions to be broadcast from the selected cells in the new ID.

SOME REMARKS ON THE TREE STRUCTURE

The architecture of the tree array processor has two unrelated properties that are, nevertheless, quite important. Some hints of these properties appeared earlier. We now discuss the capability of a tree array processor to test for and operate in the presence of faulty cells, and the propagation delay in a physical realization of a tree array processor.

Before initializing a processor or instruction domain, it is expedient to test it for faulty cells. The test will change the word stored in the word register. While the exact nature of these tests depends upon the hardware realization of the cell finally selected, we can sketch some points of architectural significance here. First, by means of an extra line, we force all cells to become dependent root cells (FC = SC = IC = 0, RC = 1). In this situation, an output amplifier stuck-on-one in a cell will not prevent correct testing of most other cells because the "one" signal it generates can travel only leaf-wards to cells in its subtree. For similar reasons, for a tree of $\ell$ levels, after $\ell$ cycles no cell will be broadcasting. All cells are then simultaneously diagnosed. Under the control of additional lines, the channel, word register, and comparator are checked out by writing the same words in all cells and by matching for these words and for variations of them that would be caused by errors. Then all the rails are checked out. Cells found to be faulty have FC set in them. (We are assuming an idealized fault detection mechanism where the fault detection circuitry is itself free of faults.) If a channel amplifier is stuck-on-one, all cells leafward from it will be diagnosed faulty. If, for a given cell, all cells connected next to it in the leafward direction are faulty, the whole subtree below the cell is cut off by making the cell permanently B-delimiting, and the cell itself sets FC.
For example, in Figure 2, if cells 1 and 2 are faulty, then cell 3 is made permanently B-delimiting, and cells 1, 2, and 3 become inaccessible. The processor can run in the presence of the faulty cells.

The physical arrangement of cells in a processor will now be considered. The primary goal is to estimate the propagation delay time, and thus determine the clock rate, to evaluate the cost of programming. A secondary goal is to find a spatial realization of the tree structure. In this section, we shall consider a good realization of a 7-way homogeneous tree. (A similar realization exists for 25-way or 50-way homogeneous trees. However, a tree fanout of seven permits electrical fanouts of rail amplifiers that are reasonable for integrated circuits.)

The following heuristic procedure generates a tree structure by beginning at the tree leaf cells. Put cells in the center and at seven of the corners of a cube. Mark the remaining corner "A." Link each corner to the center cell. This is a leaf module. In general, connect seven identical modules so that their "A" corners coincide, put a cell where these coincide, and put a link from that cell to the free corner, corner "B," of the larger module just created (Figure 9). This procedure can be repeated by connecting seven identical modules of the kind just made so that their "B" corners coincide, and so on. Any 7-way homogeneous tree can be realized by repeating the above process.

Figure 9-Connection of leaf modules

The clock rate for such a structure is determined by the propagation time in any line or rail through, at most, 2(L - 1) cells, each with bulk delay $\beta$ due to gate delays in amplifiers, and by the transmission time of the pulse on links in the structure. The longest transmission path in this structure is on a straight line through opposite corners of the largest module (Figure 9). To find this length, suppose that a cube with edge E will house a processor cell and provide enough room for a mechanical mole to test or replace the cell. The leaf module can be housed in a cube with edge 2E. (To do this, the center cell of the leaf module is moved into the available space towards the "A" corner of it.) A tree of L levels can be put into a cube with edge $2^{L-1}E$. (Again, the center cell can be moved into unoccupied space in each module.) The longest transmission path is $(\sqrt{3}/2)\,2^L E$. The clock period is therefore approximately

$$\alpha\,(\sqrt{3}/2)\,2^L + 2\beta(L - 1),$$

where $\alpha$ is the transmission delay of a pulse on a transmission line of length E.

Consider an estimated propagation time of a typical processor. A 7-way homogeneous tree with 8 levels has 960,800 cells. If E is three inches, if pulses propagate at 1 ns/foot, $\beta$ is 20 nanoseconds, and 100 ns is required to decode the instruction in each cell, then the clock period of a processor with nearly one million cells is expected to be about 436 nsec.
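The estimate can be checked directly; this short calculation (ours) plugs the figures quoted above into the clock-period expression:

```python
# Clock period = alpha*(sqrt(3)/2)*2**L + 2*beta*(L - 1), plus the
# 100 ns instruction-decode time quoted in the text.
import math

L = 8                                   # levels
cells = (7**L - 1) // 6                 # cells in a 7-way homogeneous tree
E_ft = 3.0/12.0                         # E = 3 inches, in feet
alpha = 1.0*E_ft                        # ns to cross one link of length E (1 ns/ft)
beta = 20.0                             # ns bulk delay per cell
period = alpha*(math.sqrt(3)/2)*2**L + 2*beta*(L - 1) + 100.0
print(cells, round(period, 1))          # 960800 cells, ~435.4 ns ("about 436 nsec")
```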
The former mechanism is suitable for running interdependent programs concurrently in instruction domains; the ability to merge or divide instruction domains in this mechanism appears to offer considerable programming flexibility. However, as the processor size increases, the number of instruction domains that request I/O increases, and the resulting "bottleneck" slows down the processor. A mechanism will be presented that circumvents this difficulty. A connection network is desired to connect processors together. It must provide for correct interconnection of all rails and lines used in the processor. The maximum propagation delay time must be' kept low, yet the interconnections possible in this network must be many. I t would be further desirable that the network could be reduced to a tree structure, so that it could be physically realized in a structure with low transmission delay times. The SW-structure is a connection network that is derived from a tree. In Figure 10, all of the tree above level 2 is reproduced and attached to the original tree at -' M u '" I5'.,..., f\......IX R. 1/ 'y(/ \ 1/ I~)'.\ \ a.t~I.2 ~\ u_. I \ _. t.t 1.2 1.1 I I 1A .. I' I I -- Figure lO-Formation of an SW-structure 393 level 2. In that figure, a tree structure can be restored by disconnecting links between levels one and two. In fact two tree structures can be obtained. For' example, by cutting links between nodes (3, 2) and (2, 2) and again between nodes (3,4) and (2, 4), one tree has nodes (3, 4), (2, 2), (1, 1) and (1, 2), and the other tree has nodes (3, 2), (2, 4), (1, 3) and (1, 4). In Figure 10c, all of the tree above level 3 has been reproduced again. Here, four tree structures can be obtained. In fact, for each partitioning of the nodes (1, 1), (1, 2), (1, 3) and (1, 4), a collection of disjoint trees can be found such that each tree has the nodes of each block of the partition. In essence, then, and SW-structure is a connection network that partitions a set of nodes, here at level 3, into blocks, and provides a tree structure for each block. This process of reproducing can be done at each level of a tree having.£ levels. It can be generalized to a tree whose fanout is I, just as easily as to the binary tree of Figure 10. Further, more than one reproduction, say s - 1 reproductions, can be made at each level. The resulting structure is a (uniform) SW-structure of -llevels, with fanout I and spread s. This structure will have Il-l base nodes (at the bottom of the structure) and Sl-l apex nodes (at the top of the structure). By means of such an SW-structure, the set of Il-l base nodes can be partitioned into Sl-l or fewer partition blocks. (We note that for s = .£ = 2, the structure is similar to a permutation switching network. 12 ) The nodes of the SW-structure are cells. Base cells occupy base nodes, apex cells occupy apex nodes, and connection cells occupy the remaining nodes of the structure (Figure 10). The links are communication channels; each base or connection cell essentially connects at most one link towards the apex from it. Base and connection cells each have a switching state SS. SS = 0 if no link apex-ward link is connected, and SS = i if the ith apex-ward link i = 1, 2, ... S, is connected. There connected links form one or more trees in the SW-structure. The SW-structure is used to connect together processors and I/O channels, as in Figure 11, to form a processor system. 
Each base cell of the SW-structure connects to a root cell of a processor, or to a single cell which controls access to an I/O channel. A set of processors and I/O channels connected together in the SW-structure is a processor block. The entire processor block has the interconnection pattern of a tree (see the heavy line in Figure 11). In this tree, SW-structure cells appear to be permanently passive cells in their functional relation to cells in the processor; they have the channel and rail amplifiers that such a passive cell would have. In particular, the perimeter rails go around this tree, part of which is within the SW-structure, to provide a linear ordering of cells in processors. Connection cells may have some hardware to detect faulty operation in them, since the SW-structure can operate in the presence of faulty cells, but this is not investigated here.

Figure 11-A processor system

The SW-structure is manipulated by a control unit (Figure 11). Commands are sent to this unit as though it were an output channel. Exactly one instruction domain, then, will be connected to this device, and it will send commands to it by sending out a string of data words while it is in the transfer enable mode, as though the control unit were an I/O device. Meanwhile, other processor blocks operate without interruption. We consider the connection and then the disconnection of a tree structure in the following discussion.

Commands are sent to the control unit to connect a tree structure in the SW-structure. The three steps in connecting a tree are: (1) selecting some base nodes to which a suitable collection of processors and I/O units are attached, (2) choosing an apex node that can become a root cell of the tree structure which has all the selected base nodes in it, and finally, (3) arranging the switching state, SS, in base and connection cells to connect the tree structure.

The first step can be done in many ways. For example, each base node might have an address, and the control unit might use a small channel, to base cells only, in order to select base cells one at a time by addressing them. The second step must prevent an unwanted disconnection of trees which are being used to connect present processor blocks when the tree for this processor block is connected in step 3. For the second step, we provide a line in each link of the SW-structure. One at a time, each selected base cell broadcasts a "one" signal apex-ward on these lines. An SW-structure cell delimits this broadcast if it is already being used in a tree connecting together some processor block (SS ≠ 0), and is therefore not available. All apex cells receiving "one" on this line have a chain of unused SW-structure cells to the selected base cell. When this procedure has been repeated for each base cell, of those apex cells that are connectable to each selected base cell, a hardware priority circuit among apex cells selects one such cell to be the root of the tree. For the third step of the problem, we use a second line in each link of the SW-structure. The selected apex cell broadcasts base-ward on this line, while on the line used in step 2, all selected base cells simultaneously broadcast apex-ward. Any SW-structure cell receiving a "one" from both an apex and a base cell will set its switching state, SS, to connect the branch apex-ward from it on which the "one" signal from the apex cell arrived. If a tree structure exists to connect the processor block, this technique will connect it up.
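Steps 2 and 3 can be sketched in software; the toy model below is ours (the names path_to and connect_block are illustrative, and the base-to-apex chain is assumed unique, as in the uniform SW-structure):

```python
# up[n] lists the apex-ward neighbours of cell n; ss holds the
# switching states; a cell with SS != 0 is in use by another tree
# and delimits the step-2 broadcast.

def path_to(up, ss, start, goal):
    """Chain of unused cells from `start` apex-ward to `goal`, or None."""
    if start == goal:
        return [start]
    if ss.get(start, 0) != 0:
        return None
    for m in up.get(start, []):
        rest = path_to(up, ss, m, goal)
        if rest is not None:
            return [start] + rest
    return None

def connect_block(up, ss, bases, apexes):
    """Pick an apex reachable from every selected base cell (step 2),
    then set SS along the chains to connect the tree (step 3)."""
    candidates = [a for a in apexes
                  if all(path_to(up, ss, b, a) is not None for b in bases)]
    if not candidates:
        return None
    root = max(candidates)              # stand-in for the hardware priority circuit
    for p in [path_to(up, ss, b, root) for b in bases]:
        for n, nxt in zip(p, p[1:]):
            ss[n] = up[n].index(nxt) + 1   # SS = i: i-th apex-ward link connected
    return root

up = {"b1": ["c1", "c2"], "b2": ["c1", "c2"], "c1": ["a1"], "c2": ["a2"]}
ss = {}
print(connect_block(up, ss, ["b1", "b2"], {"a1", "a2"}), ss)
# a2 {'b1': 2, 'c2': 1, 'b2': 2}
```

Disconnection, described next, amounts to resetting SS to 0 in exactly the cells of one tree.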
Commands are sent to the control unit to disconnect a tree structure in a processor block which has completed its program. A third line is provided in each link of the SW-structure. By this line, any base cell can broadcast to exactly those cells used to connect the tree structure. To disconnect the tree, a command selects some base cell in the tree, and a second command causes this base cell to broadcast on this third line. All cells receiving a "one" signal on this line set SS, the switch state, to 0, thereby disconnecting the tree. These cells now become available for connecting new tree structures.

We remark that a processor system has some useful properties of a computer network. Suppose that information for each area in the 2-D grid of Figure 12 were processed in a separate processor. By means of an SW-structure, the processor for area i, j could be connected to the processor for area i + 1, j, then i - 1, j, then i, j + 1, then i, j - 1. Meanwhile, the processor for area i + 1, j could be connected to i, j, then i + 2, j, then i + 1, j + 1, then i + 1, j - 1. Two-dimensional differential equations, or picture processing, or other iterative parallel programs could be executed in parallel in this system. Further, higher degree grids can also be handled. This system appears to offer many advantages of a Holland space.10

Figure 12-A 2D grid

CONCLUSIONS

A processor for Information Storage and Retrieval apparently must be large to be practical. We have presented two solutions to several problems associated with large processors. One solution is the segmentation of a processor, and the other is a system of processors. The architectures have low propagation delay, provide for fast loading and unloading of data in parallel from different I/O channels, and permit highly parallel programming.

The diagnosis of cells was only lightly considered here. There are many indications that this diagnosis will be relatively simple. It is apparent that both architectures are capable of operating in the presence of some cells found to be faulty. Further, a graceful degradation of performance occurs when most cells are faulty.

The results presented here apply to a number of associative memory processors, and may also apply to future iterative processors that broadcast instructions in a channel, even though they do not use the associative search operation as associative processors do.

ACKNOWLEDGMENTS

The author wishes to acknowledge the excellent criticism of his advisor, Professor R. P. Preparata, and the suggestions of Professor R. T. Chien, Professor S. Ray, and his fellow students at the Coordinated Science Laboratory, Urbana, Illinois. The author further wishes to thank the University of Illinois fellowship committee and the Joint Services Electronics Program, contract DAAB-07-67-C-199, for their financial support. Finally, the author wishes to thank Keith L. Doty, at the University of Florida, for reading the manuscript and offering many valuable suggestions.
REFERENCES

1 B A CRANE
Path finding with associative memory
IEEE Trans Computers Vol C-17 pp 691-693 July 1968
2 B A CRANE J A GITHENS
Bulk processing in distributed logic memory
IEEE Trans on Electronic Computers Vol EC-14 pp 186-196 April 1965
3 P M DAVIES
Design of an associative computer
Proc WJCC pp 109-117 March 1963
4 G ESTRIN R H FULLER
Algorithms for content-addressable memories
Ibid pp 118-130
5 J A FELDMAN P D ROVNER
An Algol-based associative language
Stanford Artificial Intelligence Project Memo AI-66 August 1968
6 C C FOSTER
Determination of priority structure in associative memories
IEEE Trans Computers (Short Notes) Vol C-17 pp 788-789 August 1968
7 R H FULLER R M BIRD
An associative parallel processor with application to picture processing
1965 Fall Joint Computer Conference pp 105-116
8 R H FULLER G ESTRIN
Some applications for content addressable memories
Proc FJCC pp 495-505 November 1963
9 R S GAINS C Y LEE
An improved cell memory
IEEE Trans Electronic Computers Vol EC-14 pp 72-75 February 1965
10 J H HOLLAND
A universal computer capable of executing an arbitrary number of subprograms simultaneously
Proc FJCC pp 108-113 1959
11 K E IVERSON
A programming language
Wiley 1962
12 W H KAUTZ K N LEVITT A WAKSMAN
Cellular interconnection arrays
IEEE Trans on Computers Vol C-17 pp 443-451 May 1968
13 C Y LEE
Intercommunicating cells, basis for a distributed logic computer
Proc of FJCC pp 130-136 1962
14 C Y LEE M C PAULL
A content addressable distributed logic memory with applications to information retrieval
Proc IEEE Vol 51 pp 924-932 June 1963
15 M H LEVIN
Retrieval of ordered lists from a content addressed memory
RCA Rev Vol 23 pp 215-229 June 1962
16 G J LIPOVSKI
The architecture of a large distributed logic associative processor
Coordinated Science Laboratory R-424 July 1969
17 B T McKEEVER
The associative memory structure
Proc FJCC Part I pp 371-388 1965
18 H O McMAHON A E SLADE
A cryotron catalog memory system
Proc FJCC p 120 December 1956
19 R MORRIS
Scatter storage techniques
Communications of the ACM No 11 pp 38-44 January 1968
20 N K NATARAJAN P A V THOMAS
A multiaccess associative memory
IEEE Trans on Computers Vol C-18 pp 424-428 May 1969
21 L G ROBERTS
Graphical communication and control languages
Second Congress on Information Systems Sciences Hot Springs Virginia 1964
22 D A SAVITT H H LOVE R E TROOP
ASP-A new concept in language and machine organization
Proc SJCC pp 87-102 1967
23 D A SAVITT H H LOVE R E TROOP
Association-storing processor study
Hughes Aircraft Technical Report No TR-66-174 (AD-488538) June 1966
24 A E SLADE
The woven cryotron memory
Proc Int Symp on the Theory of Switching Harvard University Press 1959
25 J E SMATHERS
Distributed logic memory computer for process control
PhD Dissertation Oregon State University June 1969
26 J N STURMAN
An iteratively structured general-purpose digital computer
IEEE Trans on Computers Vol C-17 pp 2-9 January 1968
27 J N STURMAN
Asynchronous operation of an iteratively structured general-purpose digital computer
Ibid pp 10-17
28 J N STURMAN
An iteratively structured computer
PhD Dissertation Cornell University Ithaca New York September 1966
29 P WESTON S M TAYLOR
Cylinders-A data structure concept based on rings
Coordinated Science Laboratory Report R-393 September 1968
30 S S YAU C C YANG
A nonbulk addition technique for associative processors
IEEE Trans on Electronic Computers Vol EC-15 pp 938-941 December 1966

Application of invariant imbedding to the solution of partial differential equations by the
continuous-space discrete-time method

by PAUL NELSON, JR.
Oak Ridge National Laboratory*
Oak Ridge, Tennessee

* Operated by Union Carbide Corporation for the U. S. Atomic Energy Commission

INTRODUCTION

The continuous-space discrete-time (CSDT) method1 of solving initial-boundary value problems for partial differential equations leads to two-point boundary-value problems for a system of ordinary differential equations. In order to solve such problems on an analog computer it is necessary to find an algorithm which expresses the desired solution in terms of initial-value problems. Various methods for accomplishing this are discussed in a recent survey article by Vichnevetsky.2 The invariant imbedding technique, which originally arose in connection with radiative transport problems, is essentially a method for converting a two-point boundary-value problem for a linear system of ordinary differential equations to an equivalent initial-value problem for an associated nonlinear system of ordinary differential equations. The purpose of this paper is to suggest the possibility of using invariant imbedding within the CSDT method, and to preliminarily explore some of the ramifications of this suggestion. Extensive references to work in invariant imbedding are given in the book by Wing,3 and the articles by Bellman, Kalaba, and Wing,4 by Bailey and Wing,5 and by Nelson and Scott.6

For clarity and ease of exposition we shall attempt to illustrate the application of invariant imbedding to the CSDT method within the context of a simple example. The example selected is the standard one of time-dependent heat diffusion in one spatial dimension, but with rather special boundary conditions. The last section of the paper contains a discussion of possible extensions, as well as limitations, with pertinent references.

THE CSDT METHOD

We follow Vichnevetsky7,8 in describing the application of the CSDT method to the time-dependent heat diffusion equation in one spatial dimension, x. The specific heat and conductivity are assumed independent of x, time, t, and temperature, u. The unit of time is selected so that the ratio of conductivity to specific heat is unity, and the unit of length is selected to have x varying between 0 and 1. The corresponding diffusion equation is

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + S(x, t), \tag{1}$$

where the known function S determines the internal heat source. The temperature, u(x, t), is required to satisfy the initial condition

$$u(x, 0) = u_0(x), \tag{2}$$

and the boundary conditions

$$u(0, t) = 0, \tag{3}$$

$$u(1, t) = U(t), \tag{4}$$

where $u_0$ and U are given.

We introduce a positive discrete time step, $\Delta t$, and write $u_i(x)$ for $u(x, i\,\Delta t)$ and $S_i(x)$ for $S(x, i\,\Delta t)$. The fundamental idea of the CSDT method is to replace (1) by the system of equations

$$\frac{u_{i+1} - u_i}{\Delta t} = \theta\left[\frac{d^2 u_{i+1}}{dx^2} + S_{i+1}(x)\right] + (1 - \theta)\left[\frac{d^2 u_i}{dx^2} + S_i(x)\right], \qquad i = 0, 1, \ldots, \tag{5}$$

where $\theta$ is some constant. Equation (5) can be rewritten in the form

$$\frac{d^2 u_{i+1}}{dx^2} - \frac{u_{i+1}}{\theta\,\Delta t} = -\tilde{S}_i(x), \qquad i = 0, 1, \ldots, \tag{6}$$

where

$$\tilde{S}_i(x) = \frac{u_i(x)}{\theta\,\Delta t} + S_{i+1}(x) + \frac{1 - \theta}{\theta}\left[\frac{d^2 u_i}{dx^2} + S_i(x)\right]. \tag{7}$$

Since $u_0$ is known from (2), we can find $\tilde{S}_0$ from (7), thence $u_1$ from (6) and the boundary conditions (3) and (4), which then determines $\tilde{S}_1$ from (7), etc. More practical methods of determining the $\tilde{S}_i$ are discussed by Vichnevetsky.7,8
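A minimal numerical sketch of one CSDT step may help fix ideas; it is ours, not the paper's analog procedure (the name csdt_step, the spatial grid, and the sin(πx) starting profile are illustrative choices, and equation (6) is solved by a dense finite-difference system rather than an analog sweep):

```python
# One CSDT step: equation (6) is a two-point boundary value problem in x
# for u_{i+1}; here it is discretized on a grid and solved directly.
import numpy as np

def csdt_step(u_i, S_i, S_ip1, x, dt, theta, a):
    """Solve u'' - u/(theta*dt) = -S_tilde_i(x), u(0) = 0, u(1) = a."""
    dx = x[1] - x[0]
    d2u_i = np.gradient(np.gradient(u_i, dx), dx)   # crude u_i''; unused when theta = 1
    S_t = u_i/(theta*dt) + S_ip1 + (1 - theta)/theta*(d2u_i + S_i)   # (7)
    n = len(x)
    A = np.zeros((n, n))
    b = -S_t.copy()
    A[0, 0] = 1.0;  b[0] = 0.0          # boundary condition (3): u(0) = 0
    A[-1, -1] = 1.0; b[-1] = a          # boundary condition (4): u(1) = a
    for j in range(1, n - 1):
        A[j, j - 1] = A[j, j + 1] = 1.0/dx**2
        A[j, j] = -2.0/dx**2 - 1.0/(theta*dt)
    return np.linalg.solve(A, b)

x = np.linspace(0.0, 1.0, 101)
u = np.sin(np.pi*x)                     # an example u_0 from (2)
S = np.zeros_like(x)                    # no internal source
u = csdt_step(u, S, S, x, dt=0.01, theta=1.0, a=0.0)   # backward-Euler step
```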
THE IMBEDDING FUNCTIONS

Inasmuch as the time index i plays no essential role for a while, we shall omit it until further notice. Thus we consider the second-order ordinary differential equation

$$\frac{d^2 u}{dx^2} - \frac{u}{\theta\,\Delta t} = -S(x), \tag{8}$$

S(x) given, and u to satisfy the boundary conditions

$$u(0) = 0, \tag{9}$$

$$u(1) = a, \tag{10}$$

where the constant a is given. This problem can, of course, be solved explicitly, up to an integral of S(x), but our purpose here is to illustrate a technique rather than to solve a problem.

A well-known result, quite fundamental to our development, is that, for $\theta > 0$ and $y \neq 0$, the only solution of

$$\frac{d^2 u}{dx^2} - \frac{u}{\theta\,\Delta t} = 0 \tag{11}$$

satisfying

$$u(0) = u'(y) = 0 \tag{12}$$

is the trivial solution $u(x) \equiv 0$. In order to use this result we henceforth assume $\theta > 0$. Let $\mathcal{U}$ be the class of nontrivial (i.e., not identically zero) functions satisfying (11) and (9). If $u, \bar{u} \in \mathcal{U}$, then it follows from the fundamental result stated above that $\bar{u}(x) = cu(x)$ for some constant c. Consequently the functions

$$R(x) = u(x)/u'(x), \qquad u \in \mathcal{U}, \tag{13}$$

$$T(x) = u'(0)/u'(x), \qquad u \in \mathcal{U}, \tag{14}$$

are well defined; i.e., the values on the right-hand side are independent of the particular element $u \in \mathcal{U}$.

In order to solve the problem (8)-(10), we need to introduce two additional auxiliary functions. Our introduction of these functions is motivated by the work of Wing,9 who first rigorously applied invariant imbedding to inhomogeneous problems. We henceforth regard S(x) as fixed, and defined for $0 \le x \le 1$. Consider that solution of (8) which satisfies the boundary conditions (12), where $y \in [0, 1]$ is arbitrary, and denote this function by $\hat{u}(x, y)$ to indicate its dependence on the parameter y. Existence and uniqueness of $\hat{u}(x, y)$, for arbitrary fixed y, is a standard result in the theory of ordinary differential equations.10 We denote the partial derivatives of functions of two variables by subscripts, the subscripts 1 and 2 indicating partial differentiation with respect to the first and second arguments, respectively. Let $E_r(x)$ and $E_l(x)$ be defined for $0 \le x \le 1$ by

$$E_r(x) = \hat{u}(x, x), \tag{15}$$

$$E_l(x) = \hat{u}_1(0, x). \tag{16}$$

The functions R, T, $E_r$, and $E_l$ can be shown to satisfy the differential equations

$$R'(x) = 1 - \frac{R^2(x)}{\theta\,\Delta t}, \tag{17a}$$

$$T'(x) = -\frac{R(x)\,T(x)}{\theta\,\Delta t}, \tag{17b}$$

$$E_r'(x) = R(x)\left[-\frac{E_r(x)}{\theta\,\Delta t} + S(x)\right], \tag{17c}$$

$$E_l'(x) = T(x)\left[-\frac{E_r(x)}{\theta\,\Delta t} + S(x)\right], \tag{17d}$$

and to have the initial values

$$T(0) = 1, \tag{18a}$$

$$R(0) = E_r(0) = E_l(0) = 0. \tag{18b}$$

The initial values (18) are easy consequences of the above definitions, as are the differential equations (17a-b) for R and T. The derivation of (17c-d) is somewhat lengthy and difficult to motivate, although basically quite simple. It is outlined in the appendix, in order to avoid having at this point a lengthy digression from our main purpose, namely solution of (8)-(10).
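Since (17)-(18) is an initial-value problem, it can be marched forward directly; the Euler sketch below (ours; imbedding_functions is a hypothetical name, and the simple Euler rule stands in for the analog integrators) transcribes the four equations:

```python
# Forward integration of (17a)-(17d) from x = 0 to 1 with the
# initial values (18a)-(18b); S is a callable source S(x).
import numpy as np

def imbedding_functions(S, theta, dt, n=10000):
    """Return arrays R, T, Er, El on the n+1 grid points of [0, 1]."""
    h = 1.0/n
    c2 = theta*dt
    R = np.zeros(n + 1); T = np.ones(n + 1)       # (18b), (18a)
    Er = np.zeros(n + 1); El = np.zeros(n + 1)
    for k in range(n):
        x = k*h
        R[k+1]  = R[k]  + h*(1.0 - R[k]**2/c2)            # (17a)
        T[k+1]  = T[k]  + h*(-R[k]*T[k]/c2)               # (17b)
        Er[k+1] = Er[k] + h*R[k]*(-Er[k]/c2 + S(x))       # (17c)
        El[k+1] = El[k] + h*T[k]*(-Er[k]/c2 + S(x))       # (17d)
    return R, T, Er, El

R, T, Er, El = imbedding_functions(S=lambda x: 1.0, theta=1.0, dt=0.01)
```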
SOLUTION OF EQUATIONS (8)-(10)

We now suppose the functions R, T, $E_r$, and $E_l$ are known, and attempt to construct the solution of the original problem (8)-(10) from these functions. Let u(x) be the solution of (8)-(10), suppose $\hat{u}(x, y)$ is as above, with y anywhere in [0, 1], and define $\varphi(x, y)$ by

$$\varphi(x, y) = u(x) - \hat{u}(x, y). \tag{19}$$

Then $\varphi$, considered as a function of x for fixed y, is either identically zero, or is in the class $\mathcal{U}$ defined above. In either event the identities

$$\varphi(x, x) = \varphi_1(x, x)R(x) \tag{20}$$

and

$$\varphi_1(0, x) = \varphi_1(x, x)T(x) \tag{21}$$

follow from the definitions (13) and (14) and the fundamental result stated at the beginning of the preceding section.

From (19), (20), (19) (again), the definition of $\hat{u}$, and (15), we obtain the identity

$$u(x) = \varphi(x, x) + \hat{u}(x, x) = \varphi_1(x, x)R(x) + \hat{u}(x, x) = [u'(x) - \hat{u}_1(x, x)]R(x) + \hat{u}(x, x) = u'(x)R(x) + E_r(x). \tag{22}$$

Similarly the identity

$$u'(0) = u'(x)T(x) + E_l(x) \tag{23}$$

is easily established.

The identities (22) and (23) are the key results which enable us to solve the original problem. First note that taking x = 1 in (22) and taking account of (10) enables us to express u'(1) in terms of known quantities, to wit

$$u'(1) = [a - E_r(1)]/R(1). \tag{24}$$

With this result in hand, one can solve the original problem as an initial value problem, the initial data being given at x = 1 by (10) and (24). This is the algorithm suggested by Wing,3 and used extensively by Rybicki and Usher.11 If the original problem (8) were computationally stable in the backward direction, then this would probably be the best procedure; however, one of the solutions of the homogeneous equation associated with (8) is $\exp(-x/\sqrt{\theta\,\Delta t})$, which shows that (8) is very unstable in the backward direction for the small values of $\Delta t$ which are needed to keep the time discretization error reasonably small. There is a second way to find u(x), due essentially to Scott12 (see also Nelson and Scott6), which avoids this difficulty, and which we now describe.

Letting x = 1 in (23), we find the equality

$$u'(0) = u'(1)T(1) + E_l(1), \tag{25}$$

which expresses u'(0) in terms of known quantities, u'(1) being known from (24). Equation (23) then gives u'(x) in terms of known quantities, and (22) yields an expression for u(x) in terms of knowns. The final expression is

$$u(x) = a\,\frac{R(x)}{R(1)}\,\frac{T(1)}{T(x)} - E_r(1)\,\frac{R(x)}{R(1)}\,\frac{T(1)}{T(x)} + [E_l(1) - E_l(x)]\,\frac{R(x)}{T(x)} + E_r(x). \tag{26}$$

The reason for this peculiar grouping of terms will become apparent in the next section.

COMPUTATIONAL CONSIDERATIONS

On a digital computer it would be quite feasible to obtain u from (26), with R, T, $E_r$, and $E_l$ obtained by integrating (17) subject to (18). In fact, similar approaches have been implemented digitally, and have been shown to be quite stable computationally.13 However, this formula is not appropriate for analog solution of (17)-(18), at least in the context of the CSDT method. The reason is that (26) contains ratios which must be known accurately to yield an accurate value of u(x), while the quantities entering these ratios are too small to be determined accurately from analog solution of (17)-(18).

In order to see this let us look somewhat more closely at (17)-(18). The function R(x), which is determined by (17a) and (18b), will be an increasing function, with slope approximately unity near x = 0, and the slope decreasing as x increases, with R(x) asymptotically approaching $\sqrt{\theta\,\Delta t}$ as $x \to \infty$. In fact, R(x) should be, for practical purposes, approximately equal to $\sqrt{\theta\,\Delta t}$ for x greater than a few multiples of $\sqrt{\theta\,\Delta t}$. (This follows from the inequalities

$$\sqrt{\theta\,\Delta t}\,e^{-2x/\sqrt{\theta\,\Delta t}} \le \sqrt{\theta\,\Delta t} - R(x) \le \sqrt{\theta\,\Delta t}\,e^{-x/\sqrt{\theta\,\Delta t}},$$

which can be established fairly easily.) Now $\theta$ is of the order of unity, and $\Delta t$ must be taken fairly small, say certainly $\Delta t \le .01$, in order to keep the time discretization error reasonably small. Thus we conclude that R(x) will rise quite rapidly from 0 at x = 0 to (almost) $\sqrt{\theta\,\Delta t}$ as x increases, and that $R(x) \approx \sqrt{\theta\,\Delta t}$ for $\epsilon \le x \le 1$, where $\epsilon$ is small.
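This saturation is easy to confirm numerically; the few lines below (ours) integrate (17a) for a representative $\theta\,\Delta t$:

```python
# Integrating (17a) with theta*dt = 0.01 shows R(x) hugging
# sqrt(theta*dt) = 0.1 everywhere except a thin layer near x = 0.
import math

c2 = 0.01                          # theta*dt
c = math.sqrt(c2)
R, h = 0.0, 1.0e-4
for k in range(10000):             # march x from 0 to 1
    R += h*(1.0 - R*R/c2)
    if (k + 1) % 2500 == 0:
        print(f"x = {(k + 1)*h:.2f}   R = {R:.6f}   sqrt(theta*dt) = {c}")
# R reaches ~0.1 within a few multiples of 0.1, as the inequalities predict
```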
(28) Then (3 is determined by the initial value problem ~.~.~-------------- ANALOG DIGITAL (3' ( x) = - R (1 - x) (3 ( x ) Ie At, (29) (3 (0) = 1. ( 30) In order to solve (29)-(30), R(x) must be available for 0 ~ x ~ 1 from previous solution of (17a) subject to R(x) = O. Analog computer integration of (29)(30) will produce reliable values of (3(1 - x) for x near 1, which is precisely where this factor is most important in (26). At first look it might be thought that similar difficulties would be associated with the factor 'Y (x) (31) in (26). However a little further study shows that 'Y (x) is actually a decreasing function of x, and that El(X) approaches E l (l) somewhat faster than T(x) approaches zero. Consequently 'Y (x) will be zero, within analog resolution, before T (x), and the proper computational procedure is to set 'Y (x) = 0 for larger x by internal logic. In terms of (3 and 'Y, (26) becomes o FOLLOW PROCEDURE OUTLINED IN PRECEDING SECTION FOR COMPUTING Yn(x) b WITH a= U(nt.tl C FOR ALTERNATE METHODS OF COMPUTING Sn(x), SEE VICHNEVETSKy7,8 R(x) u(x) Figure 1-Block diagram for the application of invariant imbedding to the hybrid solution of Equations (1)-(4) by the CSDT method Considering now (17b) and (18a) we find, SInce R(x) ~ veAt for most values of x, that (27) the approximation being quite good for x greater than a few factors of veAt, and certainly for x near 1. Now (27) shows that the largest values of the ratio T (1) IT (x), which appears in the first two terms of (26), occur near x = 1. But (27) also shows that both T (1) and T (x), x near 1, are down at least 3-4 orders of magnitude from the maximum value T (0) = 1. Consequently, because of the low intensity resolution of analog computers, these quantities will both essentially be reported as zero by analog solution of (17)(18). Thus the most inaccurate values of T(I)IT(x) are obtained for exactly those x at which accuracy is most important. R(x) = a R(I) (3(1 - x) - Er(l) R(I) (3(1 - x) Block Diagram for the Application of Invariant Imbedding to the Hybrid Solution of Equations (1) -(4) by the CSDT Method. T(x) ~ exp (-xl v eAt) , = [El ~.i.) - El(X) JIT(x) + 'Y(x)R(x) + Er(x). (32) We note in passing the interesting fact that the first three terms in (32) are important only in relatively thin boundary layers, near x = 1, x = 1, and x = 0, respectively. The first term represents the effect of the imposed boundary condition at x = 1, and the remaining terms stem from the source function. The behavior near x = 0 is do~inated by the third term, and near x = 1 by the first term, except when a = 0 the combined second and fourth terms dominate near x = 1, in spite of the fact that they cancel exactly at x = 1. The behavior far (i.e., a few multiples of veAt) from either ~oundary is dominated by the fourth term, Er(x) . HYBRID IMPLEMENTATION Figure 1 shows a block diagram of one possible hybrid implementation of the method presented here. We have written E/n) and El(n) for Er and Ez corresponding to S = Sn, and 'Yn for 'Y corresponding to Ez = El(n). Note that R, T, and (3 are retained permanently in Application of Invariant Imbedding digital storage, but that they are generated anew in the analog section each time (17c-d) are to be integrated to obtain E/n) and El(n). This procedure is intended to minimize the time consumed by D / A data transmission, and to make available accurate information regarding the high frequency components of R and T during the integration of (17 c-d). 
The latter consideration is particularly important near x = 0, where all of the imbedding functions are changing quite rapidly. It is true that the low frequency pass band of the digital section does not permit' accurate knowledge of the high frequency components of 8 n in integrating (17c-d), but these are probably relatively less important in most cases than are the high frequency components of Rand T. As a programming note we remark that the procedure indicated in Figure 1 never requires Eland '¥ to be available at the same time, and therefore these two variables can occupy the same locations in digital storage. The same comment holds for the variables u and E r • 401 posed for solving the ordinary differential equations of the CSDT method share this defect, except for the shooting method, and the latter is well known to have serious stability defects.' Vichnevetsky 14 has suggested that nonlinear problems be solved by an iterative predictor-corrector technique, with the decomposition method to be used to solve an approximating problem linearized about the predicted solution. In a similar vein, but with digital application in mind, Allen, Wing, and Scott15 have considered the idea of solving nonlinear problems by the application of invariant imbedding to an appropriate sequence of linearized problems. In conclusion, we believe that the method presented here shows sufficient promise to warrant further in-: vestigation. Such investigations should, include a quantitative comparison of the present method with other commonly used techniques for solving the CSDT equations, with due regard to both effectiveness and computer requirements for frequently occurring types of problems. We intend to pursue such a study, and hope to communicate the results elsewhere. EXTENSIONS AND LIMITATIONS The invariant imbedding technique may be thought of, at least with regard to its application in the CSDT method, as fundamentally applying to problems of the form u' (x) = A (x) u (x) + B (x) v (x) + 8 1 (x) , (33a) v' (x) = C (x) u (x) + D (x) v (x) + 8 (x) , (33b) 2 with two-point boundary conditions of the type + (3, = ,¥u(xo) + o. u(O) = exv(O) (34a) v(xo) (34b) Here x is to range between 0 and Xo, u and v are vectors of finite (but not necessarily equal) length, A, B, etc., and ex, (3, etc. are respectively matrix functions and constant matrices of the appropriate sizes. The problem (8)-(10) is of this form after the substitution v = u'. Details of the imbedding functions and their application in solving (33) - (34) are given by Scott12 and by Nelson and Scott. 6 The statements of the preceding paragraph imply that the invariant imbedding solution of the CSDT equations can be applied, at least in principle, to any linear partial differential equation, provided the associated boundary conditions in the continuous variable can be put in the linear inhomogeneous form (34). The method does not apply directly to problems in which either the differential equation or the boundary conditions are nonlinear. 
However the other methods2 pro- REFERENCES 1 S H JURY Solving partial differential equations Industrial and Engineering Chemistry 53 p 177-180 1961 2 R VICHNEVETSKY Analog hybrid solution of partial differential equations in the nuclear industry Simulation 11 p 269-2811968 3 G M WING A n introduction to transport theory John Wiley and Sons Inc N ew York 1962 4 R BELLMAN R KALAB A G M WING Invariant imbedding and mathematical physics I-Particle processes Journal Math Phys 1 p 280-308 1960 5 P B BAILEY G M WING Some recent developments in invariant imbedding with applications Journal Math Phys 6 p 453-462 1965 6 P NELSON JR M R SCOTT Internal values in particle transport by the method of invariant imbedding SC-RR-69-344 Sandia Corporation Albuquerque New Mexico 1969 Submitted to Journal Math Phys 7' R VICHNEVETSKY A new stable computing method for the serial hybrid computer integration of partial differential equations Proceedings SJCC Atlantic City New Jersey Thompson Book Company 1968 8 R VICHNEVETSKY Application of hybrid computers to the integration of partial differential equations of the first and second order IFIP Edinburgh A68-A75 1968 402 Spring Joint Computer Conference, 1970 9 G M WING Invariant imbedding and transport problems with internal sources Journal Math Anal Appl13 p 361-369 1966 10 E A CODDINGTON N LEVINSON Theory of ordinary differential equations McGraw-Hill Company esp chapter 7 1955 11 G B RYBICKI P D USHER The generalized Riccati transformation as a simple alternative to invariant imbedding Ap J 146 p 871-879 1966 12 M R SCOTT Invariant imbedding and the calculation of internal values J Math Anal Appl28 p 112-119 1969 13 M R SCOTT Numerical solution of unstable initial value problems by invariant imbedding SC-RR-69-343 Sandia Laboratories Albuquerque New Mexico 1969 14 R VICHNEVETSKY Serial solution of parabolic partial differential equations: The decomposition method for non-linear and space-dependent problems Simulation 13 p 47-48 1969 15 R CALLEN JR G M WING M R SCOTT Solution of a certain class of nonlinear two-point boundary value problems J Comp Phys 4 p 250-257 1969 can be put in the form U2l1(x, y) - U2~~tY) = 0, after using equality of mixed partials. Equation (A-2) gives U2(0, y) o. = The last two equations imply that, for any y E [0, IJ, either U2 (x, y) as a function of x is in the class 'ti defined above, or U2 (x, y) is identically zero. In either case we have the identity _ U2 (x, y) _ u(x) u E 'ti, = U21 (x, y) u' (x) , in x, y E [0, 1]. If we set y = x in this identity, and recall (13), then we find U2(X, x) = U21(X, x)R(x). But differentiation of the identity Ul (x, x) = 0 (A-2), and application of (A-I) and (15) gives III -Ull(X, x) -u(x, x) ()~t APPENDIX -Er(x) + Sex) + S(x). ()~t Recall that u (x, y) is defined to satisfy the differential equation (8) and the boundary conditions (12), as a function of x. These defining conditions can be written respectively as u(x, y) Uu (x, y) - - - - = -Sex), ()~t (A-I) E/ (x) = U12(0, x) _ and the identities u(O, y) The two equations immediately preceding give an expression for U2(X, x) in terms of R, E r , and S. If this expression is substituted into (A-3), the result is the differential equation (17 c) for E r • A similar development shows that u' (0) = U21(X, x) u'(x) = Ul(Y, y) = o. 
(A-2) To obtain a differential equation for E r , first note that (15) and (A-2) yield (A-3) If we could express U2 (x, x) in terms of Er and known functions of x, then (A-3) would give a differential equation for E r • This is the objective in the next paragraph. If (A-I) is differentiated with respect to y, the result _ u' (0) = -Ull(X, x) u'es) = [sex) _ u(x, X)] u'(O) = ()At u' (x) 7' ( x) [ - :~; x) + S ( x) ] , where U is an arbitrary element of 'ti. This gives the equation (17d) for E l. An initial value formulation of the CSDT method of solving partial differential equations by VENKATESWARARAO VEMURI Purdue University Lafayette, Indiana INTRODUCTION In practice, however, considerable difficulties were encountered in obtaining dependable results using the CSDT method. 2 ,3 The major difficulty is that the CSDT methods are inherently unstable. Methods that were proposed to circumvent this stability problem are either conceptually wrong or impose additional computational burdens making their efficiency debatable. A second difficulty with the CSDT methods is that the basic spatial sweep from boundary to boundary, at each discrete time level, yields a two point boundary value problem (TPBVP) which in turn has to be solved iteratively. It is not clear, at the outset, whether any advantage gained by time-sharing of the analog hardware is really tangible when compared to the price paid in solving a TPBVP. A third disadvantage is that the CSDT method is essentially limited to handle problems in one space dImension only. This paper suggests a new alternative which still adopts the basic CSDT procedure but results in an initial-value problem. By this formulation the first two difficulties cited in the preceding paragraph are eliminated. This paper still treats a one-space-dimensional problem and no attempt was made here to extend the concept to higher dimensions. However, it is not quite inconceivable to extend this technique to higher dimensions by using this in conjunction with an alternating direction iterative method. Numerical methods of solving partial differential equations (PDEs) using analog or hybrid computers fall into three broad categories. Assuming, for concreteness, that one of the independent variables is time and the rest are spatial, the continuous-space and discrete-time (or CSDT) methods envisage to keep the space-like variable continuous and discretize the time-like variable. Similarly, the terms discrete-space and continuous time (DSCT) and discrete-space and discrete-time (DSDT) approximations are self-explanatory. For a one-space dimensional PDE, for instance, both the CSDT and DSCT approximations yield a set of ordinary differential equations while the DSDT approximations lead to a set of algebraic equations. Because of the inherent need to handle a continuous variable, both CSDT and DSCT approximations lend themselves well for computation on analog or hybrid computers. Indeed, several analog and hybrid computer implementations of all these three methods are currently in vogue each method claiming to be superior in some respect to the others. However, it was the CSDT method that showed great promise and produced little results. The purpose of this paper is to present another alternative to this problem. One of the fundamental advantages of the CSDT method over others is its ability to handle moving boundaries. 
This can be readily achieved by controlling STATEMENT OF THE PROBLEM the analog computer's integration interval since the problem space variable is represented by computerConsider the simple diffusion equation time. A second advantage is that the analog hardware requirements of the CSDT method are very modest because a relatively small analog circuit is time-shared u = U(x, t) to solve the entire problem. With the advent of modern high-speed iterative analog and hybrid computers the with the initial conditions above promises of the CSDT method appeared to be almost within the reach. 1 ,2 U(x,O) = Uo(x) = f(x); 0 ~ x ~ 1 403 (1) (2) 404 Spring Joint Computer Conference, 1970 and without loss of generality5,7 with the homogeneous boundary conditions U(O, t) = O} L = d2/ dx 2 that satisfies the homogeneous boundary conditions in (6) can be written as X (1 (3) K(x, y) U(l, t) = 0 A CSDT approximation to (1), (2) and (3) can be written, as usual, by using a backward difference approximation for the time-derivative. Specifically, at time t = tk, using a simple difference scheme, Eq. (1) can be approximated by where L~d2/dx2 is a differential operator and (At) is the size of the time step taken. With this approximation, the auxilary conditions (2) and (3) take the form (5) = { 0 ~ x ~ '1/ ~ 1 - '1/); (7) 0 ~ '1/ ~ x ~ 1· '1/(1 - x); It is important to note that the Green's function has one form for x < '1/ and another for '1/ < x and that in each semi-interval it has a structure of the product of a function of x alone and a function of '1/ alone. Such a structure is called semi-degenerate, which can greatly simplify the problem. If the Green's function K(x, '1/) obtained is not degenerate or semi-degenerate, it can always be approximated, to any desired degree of accuracy, by a semi-degenerate kernel using standard techniques. 5- 7 Therefore, the procedure presented here is not good for any nondegenerate kernel. Solution of the TPBVP described by (4), (5) and (6) can now be written in terms of the Green's function (7) as (6) The classical method of implementing the CSDT method is to solve (4) on an analog computer with the initial condition (5) and the boundary conditions (6). However, equations (4), (5) and (6) constitute a TPBVP as such Uk (x) for 0 ~ x ~ 1 at any time level t = tk cannot be obtained in a single computer run; an iterative procedure is required to determine Uk (x) at each t = tk. This iterative procedure is often performed using either a trial and error procedure or by using a search technique such as the steepest descent method. Under such circumstances; scaling limitations of analog computers place severe restrictions on the region of search making them unattractive. Coupled with the inherent instability of the analog computer circuit solving (4), this necessity to solve a TPBVP at each time level is therefore the major drawback of the conventional CSDT method. FORMULATION OF INTEGRAL EQUATION The initial value formulation starts once again with equations (4), (5) and (6). Instead of solving them as a TPBVP, equations (4) through (6) are first transformed into an equivalent integral equation of the Fredholm type. The first step of this procedure, which can be found in any standard work,4-7 is to determine a Green's function of the differential operator L in (4) that also satisfies the homogeneous boundary conditions in (6). 
Specifically, a Green's function for the operator 1 - (At) 11 0 K(x, ~) Uk(~) d~ (8) or 1 c Uk(x) = !k-l(X) +A o K(x, ~) Uk(~) d~ (9) where !k-1 (x) is 'the first term on -the right hand side of (8) and can be explicitly evaluated because U k-1 (x) represents a solution obtained at the preceding time level t = tk-1. The terms A and cin (9) are defined by and are introduced merely for convenience and generality. Equation (9) is a Fredholm integral equation of the second kind in its most familiar format. In (9), !k-1(X) is called the free term, A the parameter and U(x,~) is called the kernel. Without going into the details of a proof, let it be stated that for a well-posed problem, the solution U(x, t) of the given PDE can he approximated by the sequence of functions Uk (x) which are indeed the solution of the above integral equation. This procedure of transforming the given PDE into an equivalent Fredholm type integral equation was apparently suggested also by Chan8 in a recent paper but he adopts an iterative procedure to solve the integral equation. An Initial Value Formulation of the CSDT Method SOLUTION OF THE INTEGRAL EQUATION and The next computational step is to solve the integral equation presented in (9) for Uk (x). Classical methods of solving (9) are essentially iterative in nature9 and so are not suitable for real-time operation. Furthermore, analog computers are ideally suited for solving problems with prescribed initial conditions. It, therefore, is logical to search for methods of transforming integral equations into sets of ordinary differential equations with prescribed initial conditions. Such a method was recently suggested by Kalaba. lo Kalaba's method is essentially one of treating the interval of integration (0, c) as a variable rather than as a constant. By regarding the solution at a fixed point as a function of the interval of integration (now being treated as a variable), a set of ordinary differential equations with a complete set of initial conditions can be obtained. With a knowledge of the solution for one interval length, it is now easy to generate solutions for other interval lengths or for any interval length using this equation as a vehicle. Furthermore, the set of ordinary differential equations with prescribed initial conditions can be solved very easily on an analog computer. Equation (9) is the starting point for the formulation of the initial-value problem. Treating the interval (0, c) as a variable, (9) can be rewritten as dJ (x, T) dT Uk(x, T) = !k-I(X) + iT K(x, ~) Uk(~, T) d~; o o~ It is assumed that (10) has a solution for sufficiently small and x ~ T ~ T (10) c. For T (11) the solution Uk(x, T) of (10) can be proved (see appendix) to be identical to the solution of the set of equations defined by (12) through (20). G(T) ~ (1 - T) d~~) ~ [G (T ) + Tr(T) (12) J2 (13) de(T) ~ ~ G(T) elk-leT) + (1 - T)e(T) J; T with the initial conditions at reT e(T = = 0) 0) = = T >0 (14) = 0 given by reO) e(O) = = 0 (15) 0 (16) = G (T ) [T • J (x, T) J; T > X dUk(x, T) = elk-leT) dT + Te(T)]J(x, T) with the initial condition at T 405 (17) • T; T> X(18) = x given by + xr(x) (19) Uk(x, T = x) = !k-I(X) = xe(x) (20) J(x, T = x) = (1 - x) and COMPUTATIONAL PROCEDURE Equations (12) through (20) can now be solved using an analog computer or a hybrid computer. The various computational stages are indicated below. Step 1. Solve (13) and (14) on an analog computer over the interval 0 ~ T ~ x by treating 'F as computer time. 
Initial conditions for this computer run are given by (15) and (16) respectively. Step 2. After integrating until time T = x, the analog computer is placed in HOLD mode and the solutions rand e, at T = x, obtained in step 1 are used to evaluate the expressions in (19) and (20). These values will be useful as initial conditions while solving (17) and (18) in the next step. Step 3. At time T = x, and after (19) and (20) are evaluated equations (17) and (18) are adjoined to the original set (13) and (14) and both sets are integrated over the interval x ~ T ~ c by putting the analog computer back in COMPUTE mode. During this phase of integration, the initial conditions of the additional set are the values of J and Uk evaluated not at T = 0 but at T = x. This is precisely the reason and purpose of the computation in step 2. Step 4. The output of the integrator solving equation (18) in Uk(x, T) and this is the solution of (9) at the argument x. This is also the solution of the PDE (1) at a particular time level t = tk. DISCUSSION Initial-value problems are conceptually simple, computationally easy to solve and are susceptible for simulation studies. Simulation inherently involves trial and error experimentation in which the validity of a model is verified; sensitivity to environment is explored and variation of performance due to parameter changes evaluated. Such problems come under the classical heading of inverse problems-that is, problems where a sy~tem's performance is known from a measured set 406 Spring Joint Computer Conference, 1970 of observations and the nature of the system is to be determined. While solving such inverse problems by using such search techniques as gradient methods, it is often necessary to solve not only the dynamic equation of the system, such as (1), but also an additional equation called the derived equation. This is not a mere doubling of computational effort as it appears at first sight. The computational effort required in the evaluation of the gradient increases very fast if the derived equation is an adjoint equation posed as a final value problem. It is precisely in bottleneck situations like this that an initial-value formulation comes in handy. A second possible application of this method would be in on-line control or identification of distributed parameter systems. Implementation of this method, particularly when the kernel has no simple structure requires some degree of sophistication in the analog system. If the Green's function (or Kernel) contains, or is approximated by, expressions that are sums of products of a large number of terms then the analog circuit generally contains a large number of multipliers. This may make the scaling a little more difficult. Finally, computation from step 2 to step 3 requires a degree of sophistication in the analog switching system. Many present generation hybrid computer systems can indeed handle most of these requirements. No attempt was made in this paper to present a procedure that can be applied to any partial differential equation. Similarly no assumptions were made that would restrict the procedure to the simple case presented. In the general case, an easy procedure is required to obtain equations (12) through (20) from (10). Material filling these gaps and results supporting this procedure will be presented in a subsequent paper. 
2 R VICHNEVETSKY A new stable computing method for the serial hybrid computer integration of partial differential equations Proc Spring Joint Computer Conference Vol 33 Pt 1 pp 565-574 1968 3 H H HARA W J KARPLUS Application of functional optimization techniques for the serial hybrid computer solution of partial. differential equations Proc Fall Joint Computer Conference Vol 33 pt 1 pp 565.:...574 1968 4 R E BELLMAN Introduction to the mathematical theory tff control processes Academic Press 1967 Vol 1 5 F B HILDEBRAND Methods of applied mathematics Prentice Hall 1952 6 H H KAGIWADA R E KALAB A A practical method for determining Green's function using Hadamard's variational formula J Optimization Theory Appl Vol 1 7 F G TRICOMI Integral equations Interscience 1963 8 S-K CHAN The serial solution of the diffusion equation using nonstandard hybrid techniques IEEE Transactions on Computers Vol 1-8 No 9 pp 786-799 September 1969 9 G A BEKEY W J KARPLUS Hybrid computation Wiley 1968 10 H H KAGIWADA R E KALAB A An initial value theory for Fredholm integral equations with semi-degenerate kernels Rand Corporation Memorandum RM-5602-PR April 1968 APPENDIX Outline of the initial-value formulation Step 1: The proof starts with a realization that if cI>(x, r) is a solution of the integral equation ACKNOWLEDGMENTS The author wishes to express his thanks to Professor R. Kalaba who first introduced the procedure presented in the appendix, to solve an integral equation. He also wishes to express his gratitude to Professor George A. Bekey of the University of Southern California who provided support during the initial stages of this work. provided support during the initial stages of this work from AFOSR-1018-67C. cI>(x, r) = K(x, r) o o~ 1 H S WITSENHAUSEN Hybrid solution of initial value problems for partial differential equations MIT Electronics Systems Lab DSR 9128 Memo No 8 August 1964 x ~ r (AI) then W(x, r) ~ cI>(x, r) U(r, r) (A2) is a solution of the equation defined by W(x, r) REFERENCES + fT K(x, ~)(x, r)U(r, r); 0 ~ x ~ where (x, r) is the solution of (AI). Step 3: Directing attention once again on (AI) and replacing the kernel K (x, r) by its semi-degenerate approximation, namely r); ~ 11' (1 o l (1 - x)r; (A6) o x ~ + 11' K(x, ~)(~, r) d~ r (A14) - ~)J'(~, r) d~, (A1S) + 11' (1 - ~)cp(~, r)J(r, r) d~ (A16) The value of in (A16). r'(r) = (~, r) from (AS) can now be substituted (1 - r)J(r, r) + r 11' (1 - ~)J(~, r)J(r, r) d~ o (A17) (A9) o using the definition of r(r) from (A14) r'(r) = (1 - r)J(r, r) Step 4: If J (x, r) can be determined, then using (AS) the function (x, r) can be obtained which in turn will aid in getting Uk (x, r) from (AI 0). The procedure to get J (x, r) is very similar to the one used to get (AS). Differentiating (A9) with respect to r and using the same principle indicated in Step 1, one gets J'(x, r) = (x, r)J(r, r) ~ Substituting (A10) in (A1S) where J (x, r) is defined by the integral equation (1 - x) + 11' (1 r (AS) ~ 0 o (A7) J(x, r) ~)J(~, r) d~; Step 6: The value of r (r ) can be determined by using, once again, a procedure similar to that used in Step 2. Differentiating (A14) with respect to r r'(r) = (1 - r)J(r, r) equation (AI) can be written as + 11' K(x, ~)(~, r) d~; - o K(x, r) = ~ (x, r) = r(l - x) r(r) r'(r) = (1 - r)J(r, r) rx(l - (A13) where r (r) is defined by (AS) r - o (A10) Step 5: In order to get J (r, r), one goes back to + rJ(r, r)r(r) = [G(r)J2 (A1S) where G(r) is defined in (12) Thus, the differential equation for r (r) is obtained. 
The initial conditions for this differential equation can be obtained readily from (A14) as r(r = 0) = reO) = 0, The procedure to obtain other equations in the text is similar. A more rigorous and elaborate proof can b~ found in Reference 10. An application of Hockney's method for solving Poisson's equation by R. COLONY and R. R. REYNOLDS The Boeing Company Seattle, Washington INTRODUCTION cp (x, y) such that The classical techniques of separation of variables and eigenfunction expansions apply to a wide variety of boundary value problems in partial differential equations. Analogous procedures exist for certain partial difference equations that arise from discretization of the differential equations. The coefficients in the expansions associated with difference equations are defined by finite sums involving the eigenfunctions and the unknown function. Under the conditions specified in the next section, the coefficients for a discretized form of Poisson's equation satisfy a tridiagonal system of linear algebraic equations which are easily solved. Hockney 5 has detailed a direct method for solving the five-point difference equation for Poisson's equation \72cp = p when boundary conditions are periodic. We have worked out the complete development when the values of cp are given on the boundary of a rectangle (Dirichlet Problem) and the finite difference network is composed of rectangles. The restrictions illustrated in Figure 1 are removed in a later section simply by redefining p. The paper is self-contained and thus repeats some of Hockney's formulas. . \72cp(x, y) = (x, y) E R - B, p(x, y), cp(x, y) = f(x, y), (x, y) E B. The rectangle oriented on a coordinate system as in Figure 1 is indicated by the coordinates of the corners. The point (x, y) E B if (x, y) is on the perimeter of the rectangle. The point (x, y) E R if (x, y) is inside the rectangle or if (x, y) E B. The solution of the Dirichlet problem to be described here is restricted to the rectangle shown in Figure 1. A method involving Fourier series may be used to great advantage if certain restrictions are placed on f (x, y). These restrictions are cp(x, 0) = bo(x) cp(O,y) = 01 cpO, = OJ 1 ~O ~ x ~ l, cp (x, m) = bm ( x) J y) ~O ~W !~ 5 4 !cn )(w ecn 2cn 2 ~! "' \.. 6 ""- 'I\. 3 "\.."- ~ I 0.1 0.2 0.3 0.4 0.6 0.8 I 2 3 ~ 4 6 8 10 MEMORY CYCLE. MICROSECOND Figure 6-Maximum number of passes through memory for 6.84 microseconds for all accesses to each sample * Here "just after" means when one more sample has been digitized. 423 did not result in either reducing the number of subsequent passes through the memory or in· a significant simplification of the vector processor. (If a last pass through the vector processor were performed as data left the memory, the use of an input pass would have been profitable. The scheme, which was rejected, ** would have used 4-Y2 passes through the memory. Processing would be base-2 at input and base-2 at output; three base-8 processing steps would occur between input and output; 2 X 83 X 2 = 2,048 = number of samples per subband per frame.) Because the use of fractional passes was rejected, the selected memory was that for the next integral number of passes below the 4.6 (that is four) allowed by a 1.5-microsecond memory (see Figure 6). Since one of the four passes is required to load samples into the ,memory and simultaneously unload filters from the memory, this resulted in allowing three passes for processing. 
If the same base is used for all three steps with a total of 2,048 samples per subband, the base used must be (2048)1/3 = 12.7. A practical compromise is to use 16, 8 and 16 for the bases of the three passes. It is preferable that the passes be in the sequence listed since the symmetry (that is, having the first and last of the three passes use the same base) avoids unnecessary complexity in the address sequence control. Similarly for the 1,024 samples in the 7-millisecond case, bases of 16, 4 and 16 have been selected. A base-8 processor* requires a total of five passes through the memory which must provide a cycle time of 1.4 microseconds and eliminates registers for eight words or 8 X 32 bits within the vector processor; this tradeoff is about even in dollars, but for a new design favors the slower memory in schedule risk and development cost. A base-4 processor** requires a memory cycle of 0.98 microsecond and eliminates registers for 12 words or 12 X 32 bits within the vector processor and might also save six 8-bit (scalar) multipliers, but requires much faster circuitry (arithmetic and otherwise) throughout the system. Thus no cost advantage appears to accrue to the lower base even though it requires a faster memory. Considering all such factors, the selected system appears optimum for its intended application since it is at worst no more expensive than the alternatives and is the most easily designed and developed. ** The use of a processing pass concomitant with data input appears to complicate the timing and does make the data flow more complex. *Actually 8/8/4/8 or 8/4/8/8 (2,048 pulses per subband). ** Actually five passes of base-4 and one of base-2 (2,048 pulses per subband). 424 Spring Joint Computer Conference, 1970 Burst processing The concept of burst processing allows uninterrupted use of a period equivalent to five memory cycles for the operations within the vector processor. Thus it permits acceptably slow arithmetic circuits within the processor without a disproportionate increase in the number of buffer registers. The read-write cycle associated with loading a new sample into the memory occurs once for each sample and occurs three times during processing of each sample; i.e., of each four memory cycles three are required for processing alone. Thus, while 16 memory cycles are used to read a new set of 16 operands per subband into the processor and transfer a set of 16 results per subband from the processor, the average time span during which these 16 memory cycles occur is actually 4/3 X 16 = 21-~ memory cycles. On the other hand, as noted previously, if an internal base of two is used, processing in the base-16 vector processor cannot start until ,the ninth operand is available and cannot start until the thirteenth operand is available if an internal base of four is used. A more stringent limitation results from the decision to always exchange memory data; each time an operand is read from the memory, there must be a result ready to put back in the memory. If this decision is retained, two undesirable alternatives result: to perform all processing in one memory cycle using very fast circuits or to provide a redundant set of registers for 16 words. The latter would allow a procedure similar to that which is used with a random access memory large enough for two frames of data and in which one-half the storage is used for input-output while the other half is used for processing. 
Since neither alternative is pleasant, it is highly desirable that a method be conceived to increase the allowed processing time to four or five uninterrupted memory cycles. The scheme adopted was to perform the transfer* of the 16 operands of the two subbands from the memory (through the vector multiplier) to the vector processor in one burst of 16 memory cycles. During the 16 cycles, new samples destined for the memory from the analog/digital converter are stored in an integrated circuit buffer of six words each of which represents one sample from each of the two subbands. The six-word buffer may store either five or six words, as illustrated in the timing diagram of Figure 2. As shown in that figure, four pairs of samples are obtained during the first 16-memory-cycle burst used ¥ As each operand is transferred from memory to processor, a result (from a set of 16 previous operands) is transferred from the processor into the same memory position. to communicate with the vector processor and a fifth pair is obtained soon thereafter. The five pairs are then transferred, in a burst, to the memory as an equal number of results are read out of the memory through the vector multiplier to the threshold circuits. The procedure is repeated in bursts as depicted in the figure with every third input burst transferring six sample pairs rather than five. SYSTEM DESCRIPTION Address sequence control The functions of the address sequence control (see Figure 1) are: 1. To determine the locations (addresses) in the random access memory (RAM) from which data is read and into which data is written. 2. To provide the sine/cosine table with a measure of the rotation angles (unit vectors) to be used to rotate vectors leaving the memory for the vector processor. IZlolz'lz'II'1 ~ Jo Oft k. JI OR kl J. Oft ko kl ~ k. J./kl~~ k, +1 ~ Iz'lz·lzllzol_MEMOItY ADDRESS REGISTER _ CONTENT 0# REGISTER I+- INPUT ~ (al LOAD (lNPUTI (b I lASE - II STE" INPUT ~ ., ~ J. ~~ INPUT J, ~ I~~ i, k. J. i. IIII'UTJ ~ .. ~ ~ ~ II, It. J. ~~ ~ II. ~ J+-INPUT J J (el lASE -. STE" (dl lASE -II.STEP ('1 OUTPUT If I LOAD ~INl'UT (tl lASE - II J. ;J (h I lASE - • STEP ~.CLJe-f J, ~ ~ J, ~~ Illl'UT j,/II. $TEl' (J I IlASE - II STEP IIII'UT ~ JI, j.-INPUT (kl OUTPUT Figure 7-Address counter configurations for 2,048 samples persubband Radar Signal Processor To avoid confusing the reader with a mathematical notion (of the fast Fourier transform procedure) involving many changeable subscripts, the explanation* of the address sequence control is in terms of exemplifying logic block diagrams for the 2,048 sample per subband case. Description of processing sequence Let 1 2047 f( j) = - 2048 I: mke iejk vi - where 0 = 21r/2048 and i = (4) k=O 1. Let j = j 2128 + j 116 + jo (5) k = k2128 + k l16 + ko (6) where j2, jo, k2, ko = 0, 1, 2, •• ·15 (7) (8) 1 15 = 16 I: exp [inC j2128 + j 116 + jo)koJ ko=O 425 Address control for readout sequence Equations (9), (10) and (11) constitute the three steps in the algorithm adopted for this processor and Equations (7) and (8) define the ranges of the j's and k's. The first step sorts the data according to the lowest digit of the frequency representation. Since the summation is over k2, there are 128 different f( jo, k1, ko) for each jo. If a counter is organized in three sections corresponding to k2, k1 andko, in terms of the original input data, one counts in ko and carries from ko to k1 and thence to k2 • This is shown in Figure 7a. 
To form the set of f( jo, k1' k o) in Equation (9) (Step 1), the summation requires a count of k2 and then progresses to cover all combinations of k1 and k o. It is accomplished by inj ecting the count pulse into the k2 section of the counter and letting the carry flow to ko and thence to k 1• At the end of this step, the data has been sorted according to jo as shown in Figure 7 (a) . The next two steps of the algorithm (Equations (10) and (11)) evaluate the sets f( jo, j1, k o) and f( jo, j1, j2) and proceed similarly with the counter inputs as shown in Figures 7 (c) and (d), respectively. Due to special system requirements, the output must be read out in order of frequency. With the counter significance, in terms of j's shown in Figure 7 (d), the carry path must be reconnected in order to read out in the proper frequency sequence. This is shown in Figure 7 (e). Since this is by design* also the next input sequence, the connection for the output sequence in Figure 7 (e) is the same for the next input sequence; thus Figure 7 (f) is the same as Figure 7 (e with j's replaced by k's. In Figure 7 (b), k2 carried to ko although carrying to either k1 or ko was proper as long as all the combinations of kl and ko are eventually gone through. Similarly, in the step of Figure 7 (g), k2 carries to k1 and thence to ko to again use the original counter configuration. As before, after the first step, the k2 counter has the significance of jo. The next two steps are again similarly performed as shown in Figures 7 (h) and 7 (j ) . The output sequence shown in Figure 7 (k) is identical to Figure 7 (a), thus completing one cycle of counter configurations. r 15 I: 1 16 mk2128+k116+ko exp [iOjol28k2J k2=0 Step 1 (Base-16) Compute . 1 f( Jo, k1, ko) = 16 15 I: mk2128+k116+ko exp [iOjol28k 2J (9) k2=0 Step 2 (Base-8) Compute f( jo, j., 1<0) = ~ .E f( jo, k., 1<0) exp [i8( j.16 + jo) 16k.] (10) Melllory data exchange cycling Step 3 (Base-16) Compute The previous section explains the operation of the readout cycle and ignores the control cycle for returning processed data to the memory. The design described operates by exchanging a set of 16 (or eight) previ- * Also see appendix. * To avoid a split-cycle memory or a fa.'5ter memory. 426 Spring Joint Computer Conference, 1970 ously processed data from the vector processor for a set of 16 (or eight) unprocessed data from the memory. This way, only 16 (or eight) memory cycles are required for each batch. Thus, the particular counter configuration of Figure 7 displaces the data after processing, by 16 memory positions. For example, after processing, the 16 pieces of data from k2 = 0, 1; 2, ... 15, kl = 0, ko = as shown in Figure 7 (b), are returned to k2 = 0, 1, 2, ... 15, kl = 0, ko = 1. Finally, the 16 pieces of data from k2 = 0, 1, 2, ... 15, kl = 7, ko = 15 are returned to k2 = 0, 1, 2, ... 15, kl = 0, ko = 0, which was the source of the first 16 pieces. . This last step requires an extra 16 (or eight) memory cycles in addition to the 2,048 cycles required for each pass of the Cooley-Tukey algorithm. A simple concept to keep track of the memory address precession is through the description of index registers. Consider a memory with addresses enumerated 0, 1, 2, . . . k, . . . 211 - 1. This transfer may be hidden from outside world if an index adder is connected as shown in Figure 8. Similarly a sequence of such transfers such as mk to mk+p· followed by mk+Pl to mk+P2' etc., can be disguised with an index register. 
For the purpose of the address sequence generator, the index register in Figure 8 can be made in the form of an accumulating register so that it always contains 2:Pi' modulo 211. Therefore, this configuration can keep track of any number of address shifts. In Figure 7, if the counters shown are considered as equivalent (to the outside world) to the memory address register and if the modulo 211 adder and index registers are interposed as shown in Figure 8, the corrections are made .as follows: TO _MOllY .MOIIY AIIORESS REGISTER AS SEEN IY MEMORY !INDEXED I '-r-...-----..,~ ° After Step Phase 1 Phase 2 Add to Index Register l(b) 20 1 (c) 27 l(d) 24 1 24 l(h) 27 1 (j) .ouTSi: 20 It is noted that once initialized ·the index register is never set to zero. There are two alternate frames to be controlled. If the two occupy separate areas in the memory from addresses 0 to 211 - 1 and from 211 to 212 - 1, the same address control may be used as identical opera- INDEX REGISTER (MODULO -1"1 :..:'tL.-___ ---I Figure 8-Memory address indexing tions delayed by one frame are performed in memory addresses that differ by 211. If we add 211 to the index register for the second frame, the general operational area of the memory will be shifted to the other half. The simplest way to cope with the two alternate frames is to double the equipment for the address counters and index registers while sharing the index adder. In this way, the two sequences can be generated independently. Phase shifter coefficients The three processing steps Equations (9), (10) and (11) may be written as: Step 1 (Base-16) f(jo, kl' ko) = 1 u .. i6 Eo mk2128+kll6+ko exp [~Jo(-n/8)k2] (12) Step 2 (Base-8) f( jo, j" ko) l(g) 1 MEMORY ADDRESS REIISTER (UN-INDEXEDI = ~ f. {f( jo, k" ko) exp [ijo( ../64)k,JI exp [ijl('1I/4)kl ] (13) Step 3 (Base-16) f( jo, jl, j2) = ~ 16 t {f( jo, jl, ko) exp [ijo('11/1024)ko] ko=O exp [ijl('11/64)k o]} exp [ij2(?r/8)ko] (14) These equations are implemented in a base-16 vector processor that provides vector rotation in multiples of ?r/8 (and hence ?r/4). This corresponds to the factored out exponent. The. operations inside the big brackets in steps 2 and 3 are one operation to each "f" so that they may be performed as information transits from Radar Signal Processor 427 cosine of the rotation angle at a similar rate from the trigonometric function table; in the "filter magnitude" mode (see Figure 10) used to obtain the squared magnitude no other input is required. In both modes of operation, each 32-bit word coming from the memory is comprised of a 16-bit datum for each of the two subbands. The 16-bit quantity has an 8-bit real portion and an 8-bit imaginary portion. The real and imaginary parts of the A and B subband data are loaded (Figures 9 and 10) into the four 8-bit registers Areal, B real , A imag and Bimag. In the phase rotation mode (Figure 9) operation occurs in four sequences (one after another) to give (in the a and (3 registers) : r-----, I ____ IIIDIOIIY ...J L. Figure 9-Vector multi pIer shown in phase rotation mode 1. the memory to the vector processor. The coefficients are functions of the address counter. In Step 2, the rotations are in units of 71"/64. There are 128 possible entries in a table. The product of jokl can range from 0 to 105. If a 7-bit number is used for representing this product, the first two digits may be used to indicate the quadrant so that the table is reduced to 32 places. In Step 3, there are two exponents in the bracket. 
The multiples of 71"/64 can be treated the same as before except the multiplier is now jlko. The other exponent involves multiples of 71"/1024. Since there are 226 different products of joko, a table of 226 places will be sufficient. The two vector multiplications may be performed in cascade. 2. areal aimag = Real [(Areal = cos (J Areal {3real 4. {3imag - A imag = Imag [(Areal = 3. + iAimag) (COS (J + i sin (J) ] = Areal B real (J - = B real sin (J + iAimag) (COS (J + i sin (J) ] + A imag cos (J sin (J cos sin (J B imag sin (J + B imag cos (J Within each sequence, the two required multiplications occur simultaneously in the two (scalar) mUltipliers (Mpier) and the products are immediately added in the adder/subtractor. A complete memory cycle, less only setting time for the A and B registors, is available for four sequences. An allowance of 400 nanoseconds for each sequence seems realistic. Vector multiplier The vector multiplier receives memory words at a rate of one each 1.71 microseconds and in the "phase rotation" mode (see Figure 9) receives the sine and r-""---, I ____ IIIDIOIIY ...J L. o FROM VECTOR "ULTIPLIER I I FILTER ARITH .. ETIC PROCESSOR '.ASE 4/2) (INCLUDING WiltED FUNCTION TA.LEOf'SINlCOS Of' ..,. ANO "'4) I I Ij-_ _ - ___ .....JI I I cosJJ.., I ~ iiiii-' I LT..TJ I r--.L--, I COIITIIOL...JI L ____ Filgure 10-Vector multiplier shown in filter magnitude mode L..-_ _--.J.....- - - - -.... ~~ ..ORY Figure 11-Vector processor 428 Spring Joint Computer Conference, 1970 TABLE II-Operation Sequences Within Filter Arithmetic Processor Operation of the vector multiplier in the filter magnitude mode is essentially similar (Figure 10)' except that only two sequences are performed. These result in the following a and {3 register contents: 1. areal = Areal 2 + A imal Vector processor Within the vector processor, fast Fourier processing employs either a base-4 or base-2 procedure. For an (externally viewed) overall base of 16, two base-4 steps occur and for a base-8 procedure, a base-4 and base-2 step are combined. Conceptually and equipment-wise, the vector processor (Figure 11) can be viewed as two 16-register sets of 16-bit registers R Ao to R A15 and R Bo to R BI5 • Each set stores the 16 data associated with the processing of a one-subband group of 16 operands; thus the 16 16-bit registers can contain one G-16 (see Summary System Description) or two G-8's or four G-4's. (Only the G-16 case is described.) As a new G-16 enters the vector processor from the vector multiplier, it is appropriately loaded into the 32 Radar Signal Processor 429 TABLE III-Rewritten Operation Sequences Within Filter Arithmetic Processor registers by the loading and set-transfer switches. * At the same time the old G-16 (the result of the previous operation) is transferred out of the registers and into the random access memory where it replaces the G-16 being read in. Transfer from the memory occurs under control of the output sequence switches. Thus, during a 16 memory cycle burst, the processed (old) G-16 in the vector processor is exchanged for a (new) G-16 from the memory; as it passes through the vector multiplier, the data of the new G-16 are multiplied by the proper unit vectors. During the subsequent five memory cycles, ** the ~ctual processing of the G-16 occurs. 
In this process a sequence of fout:-register-sets is selected by the set selection switches, transferred to the filter arithmetic processor (FAP) where they are subjected to a base-4 fast Fourier procedure and then returned to the four * The switches are a part of the control and are considered separately only to facilitate the description. ** When six memory cycles are available for processing, the sixth cycle is a "dead period." registers from which they came. In order to process the G-16, a total of four sets must "go through" the FAP per base-4 step per subband. This is illustrated by the base-16 column at the right of Table II. In that table, it is seen that the first base-4 step uses (for each subband) operands 00, 0 4, Os and 0 12 to give results Ro, R 4 , Rs and R12 and uses operands 0 1 , Os, 0 9 and 0 13 to give results R 1, R s, R9 and R 13 , and so on. The second step is described by the lower half of the base-16 column of Table II. Table II presents the fast Fourier arithmetic in its most conventional form. The actual equations chosen to be mechanized are shown in Table III in which the equations have been rewritten to reduce the number of multiplier circuits required. The F AP itself is shown in Figure 12. As demonstrated by Table III, the operations performed on the (input vector) operands and on combinations of the operands consist of multiplication by ±1 and ±i (= vi=l), addition and subtraction and multiplication by the sine of 7r/4 and/or the sine and cosine of 7r/8. All operations except multiplication by 430 Spring Joint Computer Conference, 1970 Sine/ cosine table TO lOADING AND SET-TRANSfER SWITCHES I I I L ____________ r-~~ -j """"'01. L I I I I I NOTE CONTROlLED ADDERS MULTIPLY EACH OPERA~BY+I.·I.+I.OA-,8£FORE AOOING AlL. BRACKETEO TEAMS CW TABLE 2 ARE AVAILABlE AT INPUT TO ~CONTROi..lEO ADDERS NO ," f-------l _ _ ...J Figure 12-Filter arithmetic processor the sines and the cosine occur in the "controlled adders" of Figure 12. In the controlled adders, pairs of vectors are added to or subtracted from each other either directly or after prior multiplication by ± 1 or ±i. The results of the addition or subtraction may also be mUltiplied by ± 1 or ±i. Multiplication of the appropriate information by functions of 7r/4 and 7r/8 is performed in the six 8-bit (scalar) multipliers shown in Figure 12; these multipliers are similar to those used in the vector multiplier. Since each word entering the vector processor contains one operand per subband (and there are two subbands) and sin = F7 eil47r/8 = 0.707 - iO.707 X l6 e = 0.924 - iO. 383 = = Xu = Fa = Xu = Fu = + iO = 1 = 0.924 + iO .383 i27r/8 e = 0.707 + iO.707 ei37r/S = 0.383 + iO.924 e i47r IS = 0 + i = i e i57r IS = - 0 .383 + iO. 984 ei67r/8 = -0.707 + iO.707 e i77r / S = -0.924 + iO.383 eiS7r/8 = -1 + iO = -1 eiO = F lj il57r IS 0 . 924 - iO. 383 = - i 435 An improved generalized inverse algorithm for linear inequalities and its applications by L. C. GEAR y* and C. C. LI University of Pittsburgh Pittsburgh, Pennsylvania when each (cosh !Yi) 2 and each Yi 2 are respectively minimized, one can simply compare J (Yi) and J hk(Yi) , the convex functions of one variable only. Taking the gradients of J (Yi) and J hk(Yi) with respect to Yi, one obtains INTRODUCTION A great amount of research for the solution of linear inequalities has been undertaken in the past ten years. One of the reasons for this research is the development of linear separation approaches to pattern recognitionl - 5 ,8-16 and threshold logic problems. 
6 ,7,9 Both of these problems require the determination of a decision function or decision functions which, in the case of linear separation, involve a system of linear inequalities. In this paper, an improved iterative algorithm will be developed for the solution of the set of linear inequalities which is written in the following equation: Aw> o. and aJ hk(Yi) aYi - - - = 2Yi. It is clear that the absolute value of aJ (Yi) / aYi is greater than the absolute value of aJhk(Yi)/aYi everywhere except at Yi = 0 where they are equal. In gen~ral, the gradient aJ (y) / ay is greater than the gradient aJ hk (y) / ay everywhere except at the origin Y = O. Since the gradient descent procedure is used in both algorithms, and since Y and b, or Y and w, are linearly related, it is conceivable that the proposed algorithm may have a higher convergence rate for a solution w. As mentioned previously, J(y) reaches a mi!limum when each "term (cosh !Yi)2, (i = 1, ... , N), is minimized. For each (cosh !Yi)2 to be a minimum, each Yi, (i = 1, .... ,N), must equal zero and Y = 0 gives a desired solution. Since the b/s are only constrained to be positive, J(y) can be minimized with respect to both wand b subject to the condition that b > O. Note that it is not necessary to attain the minimum value of J (y); in fact, a solution w* is obtained whenever Y 2:: 0 with b > 0 from which follows A w* 2:: b > o. (1) This algorithm is an improvement of the Ho-Kashyap algorithm by choosing a criterion function N J(y.) = 4 L (cosh !Yi)2 (2) i=l to be minimized where Yi is the ith component of the N by 1 vector y defined below Y = Aw - b, b > O. (3) The improvement lies in an acceleration of the HoKashyap algorithm caused by a steeper gradient of J (y ) as can be seen when a comparison is made between the two criterion functions. Let J hk (y) designate the criterion function used in the Ho-Kashyap algorithm, N Jhk(y) = II Y 112 = LYi 2. (6) (4) i=l DEVELOPMENT OF THE TWO-CLASS ALGORITHM Since J (y) and J hk (y) reach their respective minimum Let the matrix A, whose transpose is *Presently with Gulf Research & Development Company, Pittsburgh, Pa. 437 ·438 .. Spring Joint Computer Conference, 1970 As can be shown later, p(k) may be chosen to equal be represented as al2 ... aI.] ~l ~2 • • • aNI aN2 [an A = where U-2n (7) , = Ymax(k) = Max I Yi(k) aNn I. (16) lXi + r aJ(y) aw - - = 2A ts(y) (8) where st(y) = [sinh YI, ••• , sinh YN], and the gradient of J (y) with respect to b is given by a~~) = (9) -2s(y), where the derivative of a scalar with respect to a column vector is a column vector. Since w is not constrained in any way aJ(y)/aw = 0 implies s(y) = 0 which, in turn, implies Yi = 0 for all i = 1, 2, ... , N. Therefore, for a fixed b > 0, minimizing J (y) with respect to w gives Y = Aw - b = O. Solving the above equation for w, one obtains w= A~b (10) where A ~ is the generalized inverse of A. On the other hand, for a fixed w, aJ (y) / ab = 0 with b > 0 dictates a descent procedure of the following form, with k denoting the iteration number: b(k + 1) = b(k) + db(k) where the components of dbi(k), i i1b (k) are governed by ( 1 - Abi(k) (15) where is an n by 1 augmented pattern vector, 1, and N = nl + n2. The gradient of J (y) with respect to w is given by n 1 cosh Ymax (k) p(k) ex: (11) = 1, 2, ... 
, N, of Substituting (13) into (11) and, from (10), writing w(k + 1) = w(k) + p(k)A~h(k), (17) one obtains the following algorithm: w(O) = A~b(O), b (0) > 0 but otherwise arbitrary y(k) = Aw(k) - b(k) b(k 1) = b(k) p(k)h(k) (18) ( w(k 1) = w(k) p(k)A~h(k) + + + + where h (k) and p (k) are given by equations (14) and (15) respectively. Note that in this algorithm p (k ) varies at each step and is a nonlinear function of Y (k ) . A recursive relation in y(k) can also be obtained from (18) , y(k + 1) = y(k) + p(k) (AA~ - J)h(k). (19) Just like the Ho-Kashyap algorithm, it can be shown that the above algorithm (18) converges to a solution w* of the system of linear inequalities in a finite number of steps provided that a solution exists, and simultaneously acts as a test for the inconsistency of the linear inequalities. These properties are formally stated in Theorem I given in the next section. THEOREM I Before discussing the main theorem, a lemma to be used in the proof of the theorem will be given first. Lemma 1: Let one consider the set of linear inequalities (1) and the algorithm (18) to solve this set. Then 1) y(k) ::I> 0 for any k; aJ(Y(k) )) . b = 2 smh Yi a . If Yi > 0, if Yi ~ and (12) i o O. 2) if the set of linear inequalities is consistent, then y(k) 0, the elements of the vector y(k) cannot be all non-positive. Theorem I: Consider the set of linear inequalities (1) and the algorithm (18) to solve these inequalities, and let V[y(k)J = II y(k) 112. 1) If the set of linear inequalities is consistent then a) ~V[y(k)J ~ V[y(k + I)J - V[y(k)J <0 and lim V[y (k) J = 0 implying convergence AA*y(k) = 0, ~ V[y(k) J= ~ V[y(k) 2) If the set of linear inequalities is inconsistent, then there exists a positive integer k* such that J<0 ~V[y(k) J = 0 ~V[y(k) for k < k* for k ~ k*, for k < k* y(k) = y(k*) ~ 0 for k ~ k* and Further simplification leads to ~V[y(k)J = -[y(k) + I y(k) I Jt[p(k)R(k) + p2(k)R(k) (AA* - I)R(k) J[y(k) = for k ~ k* b(k) = b(k*) for k ~ k*. + p (k ) R (k) - p2 (k ) R2 (k ) . where R . [sinh = d lag - -Yl, 000, Yl sinh YN] YN - - (22) • For ~ V[y(k) J to be negative semidefinite, ~ V[y(k) J = = 0 or y(k) ~ 0, the matrix o only if y(k) [p2(k)R(k)AA*R(k) + P (k)R(k) for all In other words, the occurrence of a nonpositive vector y (k ) at any step terminates the algorithm and indicates the inconsistency of the given set of linear inequalities. - p2(k)R2(k) J >0 i = 1,2, 000, N. (23) Since ruCk) = sinh Yi/Yi > 0 for all i and p(k) is restricted to be positive, the above condition reduces to the condition, 1 - p(k)ru(k) >0 for all i = 1,2, • 00, N. (24) For p(k) chosen in equation (15), Proof: Part 1: Since the algorithm (18) can be rewritten as a recursive relation in y (k ) given by (19), and 8 V[y(k) J = II y(k) W> 0 for all y(k) ¢ 0 (20) V[y (k ) J can be considered as a Liapunov function for the nonlinear difference equation (19). Thus ~ V[y(k) J ~ V[y(k = yt(k + 112 -lly(k) l)y(k + W sinh Yi(k) + p2(k)h t (k) (AA* - I) t(AA* - I) h(k). is hermitian idempotent, Yi2rf(k) n=O (2n + 1) ! -----<1. Ymax 2n (k) n=O (2n) ! t + p(k)yt(k) (AA* - I)h(k) I) 1 sinh Yi(k) p(k)ru(k) = -h--(-k) .(k) cos Ymax y~ t 1) - yt(k)y(k) = p(k)ht(k) (AA* - I) ty(k) (AA* - 1 p(k) = - - - cosh Ymax (k ) + 1)]- V[y(k) J = Ily(k+ 1) Since + I y(k) I J -II y(k) + I y(k) I II 2p2(k)R(k)AA*R(k) [p (k ) r u ( k) - p2 ( k ) r ii2 (k ) J w(k) = w(k*) (21) + p2(k)ht(k) (I - AA*)h(k). must be positive definite. 
AA* is positive semidefinite because AA* is hermitian idempotent, xtAA*x ~ 0 for any x; it follows that ztRAA*Rz ~ 0 for any z; hence RAA*R is also positive semidefinite. Now one can choose a p(k) such that [p(k)R(k) - p2(k)R2(k) J is positive definite. [p(k)R(k) - p2(k)R2(k) J is positive definite if and y (k) O. This completes the proof of Part 1 (a) . To prove the convergence of the algorithm (18) in a finite number of steps, one notes that b(k ) is a nondecreasing vector. Let bt(O) = [1, 1, ..• , 1J, then bt(k) ~ bt(O) ~ k* As a consequence, one obtains Then Ll V[y (k ) J has the desired property of negative semidefinite for p(k) = l/cosh Ymax(k) and for any finite y(k). From equation (22) one notes that Ll V[y (k ) J equals zero if and only if y(k) = 0 or y(k) ::::; O. Since it is assumed that the set of linear inequalities (1) is consistent, and from the lemma y (k) « 0, therefore Ll V[y(k) J ~ for all k [1,1, •.. , 1J for any k > o. Since Aw(k) = b(k) + y(k), I yt(k) I < [1,1, •.. , 1J implies A w* (k) > 0 when a solution w* is reached. But V[y(k) J ::::; 1 impli~s I yt(k) I < [1, 1, ..• , 1]. Since V[y(k) J converges to zero in infinite time, it must converge to the region V[y(k) J = 1 in finite time, hence I yt(k) I < [1, 1, ... , 1J, AW(k) > 0, and a solution w* = w (k) is obtained in a finite number of steps. This completes the proof of Part 1 (b) . Ll V[y(k) J = 0 'for all k ~ k* h(k) = 0 for all k ~ k* w(k) = w(k*) for all k ~ k* b(k) = b(k*) for all k ~ k* This completes the proof of the theorem. A n Optimum Choice of the Scalor p (k ) The choice of p(k) = l/cosh Ymax(k) in the previous section is only one of many possible choices of p (k ) for the convergence of the algorithm (18). The convergence rate may be further improved by choosing a p (k) such that the decrease in the Lyapunov function V[y(k) J is maximized at every step, that is, -~V[y(k)J is maximized with respect to p(k). Taking the partial derivative of ~ V[y (k ) J in equation (22) with respect to p(k) leads to an optimum value of p(k) given by p(k) + + I y(k) I J [y(k) I y(k) I JtR(k) [y(k) 2[y(k) + I y (k) I JtR(k) • [/ - AA*JR(k) [y(k) + I y(k) I J (26) provided that / - AA # > O. For this value of p (k) , ~ V[y(k) J is negative definite in [y(k) + I y(k) I J which is required in the convergence proof of the algorithm (18). A flow chart summarizing the above procedure is shown in Figure 1. EXAMPLES Part 2: It has been proved in Part 1 that V[y(k) J is negative semidefinite independent of the consistency of the linear inequalities. Now, if the set of linear inequalities (1) is inconsistent, one notes that y (k) cannot be 0 and hence V[y (k ) J cannot become zero for any k > O. There must exist a value of k, called k*, such that Ll V[y(k) J y(k) <0 for 0::::; k =0 for k «0 for 0::::; k < k* = k*, < k*. But V[y(k*) J = 0 if either y(k*) = 0 or y(k*) ::::; O. Since y(k*) =;t. 0, this implies y(k*) ::::; 0 and hence, The algorithm (18) has been applied to pattern recognition and switching theory problems. For switching theory problems the generalized inverse of the N by n pattern matrix A is simplified to A* = 2-(n-l)At. Two example problems will be presented, one in switching theory and the other in pattern recognition. Example 1: Consider a Boolean function of eight binary variables which corresponds to the separation of the two classes: Class C1 /= (127, 191, 215, 217 to 255) Class C2 = (0 to 126, 128 to 190, 192 to 214, 216). 
Improved Generalized Inverse Algorithm 441 was observed, that for 0.5 ~ bi(O) ~ 0.001 and p(k) given by equation (26), for all examples tried by the authors that the number of iterations was less than or equal to the number of iterations required by the HoKashyap algorithm. In some cases the number of iterations was reduced by a factor of 25. 17 yes yes Problem is not linearly separable Problem is linearly separable p - equation (15) or equation (26) Figure l-Flow chart of the proposed 2-class algorithm + Here m = 2r = 256 and n = r 1 = 9, where r is the number of binary variables. For bt(O) Example 2: The proposed algorithm was also applied to a preliminary study of a biomedical pattern recognition problem. The problem is to investigate whether or not a change exists in the diurnal cycle of an individual person upon a change in his environmental condition or physiological state and if such a change may be used to diagnose physical ailments under strictly controlled conditions by measuring the amounts of electrolytes present in urine samples every three hours. IS The data used in this example consisted of thirteen sample patterns under two different conditions. Each pattern lias eight components which represent the mean excretion rates of an electrolyte for each three-hour period of the twenty-four hour cycle. Thus N = 13 and n = r + 1 = 8 + 1 = 9; the size of the pattern matrix A is 13 by 9. The pattern matrix A is shown in Table 1. Let bt(O) = [0.1,0.1, ···,0.1]' For this problem the Ho-Kashyap algorithm with p = 1 required 7 iterations to determine the separability. However, the proposed algorithm with p(k) given by equation (26) required only two iterations, where p(l) = 5.270684 and p(2) = 3.197152. The problem is linearly separable and a solution weight vector w obtained by the proposed algorithm is wt(2) = [-13.6089, 2.5915, 1.6847, 2.2314, 0.3414, = [.1, .1, .1, •.• , .1, .1, .1J and p(k) given in equation (26), the algorithm terminates after the tenth iteration and gives a solution weight vector w for the switching function, w t = [0.3732, 0.2278, 0.2278, 0.1654, 0.0769, 0.0569, 0.0247, 0.0247, 0.0247J, The same example was solved using the Ho-Kashyap algorithm. s It required 229 iterations with the same initial b(O). The solution weight vector w for the HoKashyap algorithm is w t = [0.5741, 0.3447, 0.3447, 0.2425, 0.1155, 0.1080, 0.0436,0.0436,0.0436]' The computing time for the proposed algorithm was 50 seconds on IBM 7090 with a cost of $1.50, while the Ho-Kashyap algorithm required 80 minutes with a cost of $23.50. Thus the proposed algorithm not only reduced the number of required iterations but also the computing time and cost to solve the problem. It 3.0077, 1.8428, 1.6559, 0.0096J EXTENSION TO THE MULTICLASS ALGORITHM The problem of multiclass patterns classification is that it must be determined to which of the R different classes, Cl , C2 , ••• , CR, a given pattern vector, x, belongs. If the R-class patterns are linearly separable, there exist R weight vectors Wj to construct R discriminant functions gj (x), (j = 1, 2, ... , R), such that gj(x) = XtWj> XtlVi = gi(X) for all i ¢ j, x E Cj. (27) Chaplin and LevadPo have formulated another set of inequalities which can be considered as a representation of linear separation of R-class patterns. 
This set of inequalities is II xtU - e/ II < II xtU - ei t II for alIi ¢j, x E C j (28) for allj = 1,2 ••• ·, R 442 Spring Joint Computer Conference, 1970 TABLE 1-The Pattern Matrix A for Example 2 Xo Xl 1.00 1.00 1.00 1.00 1.00 1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 .96 .75 .80 .66 2.04 1.02 - .48 -.55 -.87 -.09 -1.12 -1.20 -1.43 X2 X3 1.19 1.19 1.13 1.40 1.14 1.32 -1.01 -.55 -.79 -.70 -1.75 -1.47 -1.79 X4 1.35 1.35 .85 1.25 1.10 1.06 - .68 -1.04 -1.34 -.67 - .51 -.60 -.68 .75 1.06 .90 1.09 .57 1.03 - .72 - .91 -.86 -.80 -.72 -.96 -.75 where U is an n X (R - 1) weight matrix and the vectors e/s are the vertex vectors of a R - 1 dimensional equilateral simplex with its centroid at the origin. If each ej is associated with one class, x is classified according to the nearest neighborhood of the mapping xtU to the vertices. Inequalities (28) are, in fact, equivalent to inequalities (27) with (j = 1, 2, 00 0, R) (29) Let the N X n pattern matrix A be defined in the following manner, A= Aj ~ lxtj X6 Xs 1.12 1.07 1.14 1.54 .62 1.07 -1.76 -1.40 - .44 -1.93 -1.25 -1.13 -.82 X7 .73 .94 .97 1.27 .79 .66 1.16 -1.25 -1.17 -2.15 -1.29 -.46 -.89 -.56 .81 1.01 .27 .47 .77 - .62 -1.28 -.82 -1.14 -.89 -.74 -.94 .97 .81 .88 .00 1.39 .57 -1.47 -1.09 -.74 -1.39 -1.29 -1.00 -1.04 where Aj is an nj X n submatrix having as its rows nj transposed pattern vectors of class Cj, (l = 1, 2, zxtj, 000, nj), where the right subscript denotes the pattern class and the left subscript denotes the lth pattern in that class, and N = nl n2 nR. Designate the n X (R - 1) weight matrix U as composed of (R - 1) column vectors U q , (q = 1, 2, R - 1), + + 0 0 + 0 000, U = [Ulo 0 °Uqo 0 OUR-I]. (31) Also define an N X (R - 1) matrix B as lb tl lb t j (30) B= Bj (32) ~ njb tj njxti AR BR IxtR Xs lb t Improved Generalized Inverse Algorithm whose row vectors (j = 1, 2, ••• , R; l = 1, 2, .•• , lbtj, in this paper is nj), correspond to the class groupings in the A matrix AjU(ej - ei) and satisfy the following inequalities for all i ~ j B j is an nj X (R - 1) submatrix of B, j = 1,2, ... , R. Let an N X (R - 1) matrix Y be defined as Y ~ AU - B. > for all i Yj(ej - ei) = (AjU - B j ) (ej - ei) for all i (38) j ~ >0 (39) j for all j = 1, 2, •.• , R or (IXtjU - zb tj ) (ej - ei) >0 for all i (35) or an array of N row vectors l Y j, (j = 1, 2, ... , R; l' = 1, 2, .•• , nj), corresponding to the class groupings in ,the A matrix, ~ Associated with it is another set of linear inequalities (34) The representation of Y may be in the form of either an array of (R - 1) column vectors, Yq, (q = 1, 2, ••• , R - 1), 0 for all j = 1, 2, ..• , R (33) for all j = 1, 2, .•• , R. 443 ~ j foraH j = 1,2, ··R for all l = 1, 2, • ·nj. Since, by (33), Bj(ej -=- ei) is constrained to have positive components for all i ~ j, inequalities (39) implies the inequalities (38) and hence (27) or (28). When inequalities (38) are satisfied for all i ~ j and for all j = 1, 2, ..• , R, a solution weight matrix U is reached which will give linear classification of R-class patterns; that is, if for all i ~ j then x is classified as of class Cj. lY j Y= Yj ~ (36) njYj DEVELOPMENT OF THE MULTI-CLASS ALGORITHM For the notational simplicity in the derivation of the gradient function to be developed below, let the matrices A, U, B, and Y in equations (30), (31), (32), and (35) be represented respectively as YR (40) lY R u= (41) where Y j is an nj X (R - 1) submatrix of Y, Yj = AjU - B j (37) (42) or j = 1,2, •.• , R l = 1, 2, ••• , nj. 
The row vectors lb_j^t are required to satisfy

lb_j^t(e_j - e_i) > 0   for all i ≠ j, for all j = 1, 2, ..., R.   (33)

Let an N x (R - 1) matrix Y be defined as

Y ≜ AU - B.   (34)

The representation of Y may be in the form of either an array of (R - 1) column vectors y_q, (q = 1, 2, ..., R - 1),

Y = [y_1 ... y_q ... y_{R-1}]   (35)

or an array of N row vectors ly_j, (j = 1, 2, ..., R; l = 1, 2, ..., n_j), corresponding to the class groupings in the A matrix,

Y = [Y_1; ...; Y_j; ...; Y_R]   (36)

where Y_j is an n_j x (R - 1) submatrix of Y,

Y_j = A_j U - B_j,   j = 1, 2, ..., R.   (37)

The set of linear inequalities which will be discussed in this paper is

A_j U(e_j - e_i) > 0   for all i ≠ j, for all j = 1, 2, ..., R.   (38)

Associated with it is another set of linear inequalities

Y_j(e_j - e_i) = (A_j U - B_j)(e_j - e_i) ≥ 0   for all i ≠ j, for all j = 1, 2, ..., R   (39)

or

(lx_j^t U - lb_j^t)(e_j - e_i) ≥ 0   for all i ≠ j, for all j = 1, 2, ..., R, for all l = 1, 2, ..., n_j.

Since, by (33), B_j(e_j - e_i) is constrained to have positive components for all i ≠ j, inequalities (39) imply inequalities (38) and hence (27) or (28). When inequalities (38) are satisfied for all i ≠ j and for all j = 1, 2, ..., R, a solution weight matrix U is reached which will give linear classification of the R-class patterns; that is, if x^t U(e_j - e_i) > 0 for all i ≠ j, then x is classified as of class C_j.

DEVELOPMENT OF THE MULTI-CLASS ALGORITHM

For notational simplicity in the derivation of the gradient function developed below, let the matrices A, U, B, and Y in equations (30), (31), (32), and (35) be represented respectively as

A = [a_ik],  U = [u_kj],  B = [b_ij],  Y = [y_ij].   (40)-(43)

Substituting these into equation (34), one obtains

y_ij = Σ_{k=1}^{n} a_ik u_kj - b_ij.   (44)

Let C(Y) be an N x (R - 1) matrix defined by

C(Y) = [c_ij] ≜ [cosh ½y_ij]   (i = 1, ..., N; j = 1, ..., R - 1).   (45)

The criterion function J(Y) to be minimized is chosen as the trace of 4C^t(Y)C(Y),

J(Y) ≜ Tr(4C^tC) = Σ_{i=1}^{N} Σ_{j=1}^{R-1} J_ij(Y),   J_ij(Y) = 4(cosh ½y_ij)^2.   (46)

Determine the gradients of J(Y) with respect to both U and B,

∂J(Y)/∂U = 2A^t S(Y)   (47)
∂J(Y)/∂B = -2S(Y)   (48)

where S(Y) is an N x (R - 1) matrix with the representation

S(Y) ≜ [sinh y_ij]   (i = 1, 2, ..., N; j = 1, ..., R - 1)   (49)

which may also be grouped into blocks S_j(Y) of n_j rows each; the l-th row of S_j(Y) is the row vector

ls_j(Y) = [ls_j1(Y), ls_j2(Y), ..., ls_j(R-1)(Y)],   ls_jq(Y) = sinh ly_jq   (50)

where ly_jq is the q-th component of ly_j. Since U is not constrained in any manner, ∂J(Y)/∂U = 0 implies that S(Y) = 0, which in turn implies that sinh y_ij = 0 and hence y_ij = 0 for all i = 1, ..., N and j = 1, 2, ..., R - 1. Therefore, for ∂J(Y)/∂U = 0 and a fixed B,

AU - B = 0   and   U = A#B,   (51)

which gives a least-squares solution of AU = B. On the other hand, for a fixed U and the constraint B_j(e_j - e_i) > 0 for all i ≠ j as given in (33), one might attempt to increment B according to the following gradient descent procedure to reduce J(Y) at each step,

B(k + 1) = B(k) + δB(k)   (52)

where the q-th element δ[lb_jq(k)] of δ[lb_j^t(k)] in δB_j(k) is given by

δ[lb_jq(k)] = -ρ(k)[∂J(Y(k))/∂b_jq] = 2ρ(k) ls_jq(Y(k))   if ly_j(k)(e_j - e_q) > 0 for q ≠ j
            = 0                                            if ly_j(k)(e_j - e_q) ≤ 0 for q ≠ j.   (53)

However, ly_j(k)(e_j - e_q) > 0 does not imply ls_j(Y(k))(e_j - e_q) > 0. In order to make δ[lb_j^t(k)](e_j - e_q) ≥ 0, so that (33) can be satisfied at each step, a modified gradient descent procedure, similar to the one adopted in Teng and Li's generalization of the Ho-Kashyap algorithm,16 is to be used. Let the (R - 1) x (R - 1) non-singular matrix E_j be defined as16

E_j = [e_j - e_1, ..., e_j - e_{j-1}, e_j - e_{j+1}, ..., e_j - e_R]   for all j = 1, 2, ..., R.

Also define

Z_j ≜ Y_j E_j.   (54)

The increment is then given in terms of lz_jq(k), the q-th component of lz_j(k) = ly_j(k)E_j:

δ[lb_j^t(k)E_j]_q = 2ρ(k) ls_jq(Z(k)) = ρ(k)[ls_jq(Z(k)) + lλ_jq(k)]   if lz_jq(k) = ly_j(k)(e_j - e_q) > 0
                 = 0                                                   if lz_jq(k) = ly_j(k)(e_j - e_q) ≤ 0   (55)

where, following (50),

ls_jq(Z(k)) = sinh lz_jq(k)   (56)

and lλ_jq(k) ≜ |ls_jq(Z(k))|; since sinh lz_jq has the sign of lz_jq, the bracketed sum in (55) vanishes exactly when lz_jq(k) ≤ 0. Putting (55) into vector representation,

δ[lb_j(k)] = ρ(k)[ls_j(Z(k)) + lλ_j(k)]E_j^{-1} = ρ(k) lh_j(Y(k))   (59)

where

lh_j(Y(k)) ≜ [ls_j(Z(k)) + lλ_j(k)]E_j^{-1}.   (60)

Stacking the rows lh_j gives the n_j x (R - 1) submatrices H_j(Y(k)) = [S_j(Z(k)) + Λ_j(k)]E_j^{-1}, and the N x (R - 1) matrix17

H(Y(k)) = [H_1(Y(k)); ...; H_R(Y(k))] = [h_1(Y(k)) ... h_q(Y(k)) ... h_{R-1}(Y(k))].   (61)

Also define the diagonal matrix R(lz_j) with elements

r_qq(lz_j) ≜ (sinh lz_jq)/lz_jq ≥ 1,   (q = 1, ..., R - 1)   (67)-(68)

so that ls_j(Z) = lz_j R(lz_j). It follows from the construction of δ[lb_j(k)] that

δ[lb_j(k)](e_j - e_i) ≥ 0   for all i ≠ j and for all j.

Substituting into (52), one has

B(k + 1) = B(k) + ρ(k)H(Y(k)).   (62)

Using (62) in (51), one has

U(k + 1) = A#B(k + 1) = A#{B(k) + ρ(k)H[Y(k)]} = U(k) + ρ(k)A#H[Y(k)].   (63)

A recursive relation in Y(k) is also obtained as follows:

Y(k + 1) = Y(k) + ρ(k)(AA# - I)H[Y(k)].   (71)

The initial B matrix, B(0), may be chosen as

B^t(0) = β[e_1, ..., e_1 | ... | e_j, ..., e_j | ... | e_R, ..., e_R],   β > 0,   (70)

in which e_j appears n_j times. Therefore, an iterative algorithm to solve for U can be proposed in the following:

U(0) = A#B(0)
Y(k) = AU(k) - B(k),   Z_j(k) = Y_j(k)E_j
H_j(Y(k)) = [S_j(Z(k)) + Λ_j(k)]E_j^{-1}
B(k + 1) = B(k) + ρ(k)H[Y(k)]
U(k + 1) = U(k) + ρ(k)A#H[Y(k)]   (64)

where ρ(k) is a positive step size, chosen as in equation (65) so as to maximize the decrease of the criterion function at each step, provided condition (66) is satisfied so that ρ(k) > 0. This algorithm is a convergent algorithm for the solution U of the set of linear inequalities (38). The nonlinear separability of the multi-class patterns can also be detected: if at a certain step k*, ly_j(k*)(e_j - e_i) ≤ 0 for all i ≠ j, for all l and for all j, the algorithm terminates and the R-class patterns are not linearly separable.
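The sketch below illustrates one way to realize iteration (64) with numpy. The simplex-vertex construction and the fixed step size are stand-ins (the paper's adaptive ρ(k) of equation (65) is not reproduced), so this is a hedged illustration of the structure of the algorithm, not the authors' implementation.

```python
import numpy as np

def simplex_vertices(R):
    """Vertices e_1..e_R of an equilateral simplex in R^(R-1),
    centroid at the origin (one standard construction, our choice)."""
    centered = np.eye(R) - np.ones((R, R)) / R       # project out the centroid
    basis = np.linalg.qr(centered.T)[0][:, :R - 1]   # basis of the hyperplane
    return centered @ basis                          # row j is vertex e_j

def multiclass_gi(patterns, rho=0.5, max_iter=500, eps=1e-9):
    """Sketch of iteration (64). `patterns` is a list of (n_j x n) arrays,
    one per class, of augmented pattern vectors."""
    R = len(patterns)
    vert = simplex_vertices(R)
    A = np.vstack(patterns)
    A_pinv = np.linalg.pinv(A)                       # A#
    sizes = [P.shape[0] for P in patterns]
    # E_j = [e_j - e_i] for i != j, as columns; precompute inverses.
    E = [np.column_stack([vert[j] - vert[i] for i in range(R) if i != j])
         for j in range(R)]
    E_inv = [np.linalg.inv(Ej) for Ej in E]
    # B(0) per equation (70), with beta = 0.1 (our choice).
    B = np.vstack([np.tile(vert[j], (sizes[j], 1)) for j in range(R)]) * 0.1
    U = A_pinv @ B
    for k in range(max_iter):
        Y = A @ U - B
        H_blocks, Z_blocks, sep, row = [], [], True, 0
        for j in range(R):
            Zj = Y[row:row + sizes[j]] @ E[j]        # Z_j = Y_j E_j, eq. (54)
            Sj = np.sinh(Zj)                         # S_j(Z), eq. (56)
            H_blocks.append((Sj + np.abs(Sj)) @ E_inv[j])  # [S + Lambda] E_j^-1
            Z_blocks.append(Zj)
            AjUE = A[row:row + sizes[j]] @ U @ E[j]  # A_j U (e_j - e_i)
            sep = sep and np.all(AjUE > eps)         # inequalities (38)
            row += sizes[j]
        if sep:
            return U, True, k                        # linearly separable
        if all(np.max(Zj) <= eps for Zj in Z_blocks):
            return U, False, k                       # nonseparability indicated
        H = np.vstack(H_blocks)
        B = B + rho * H                              # equation (62)
        U = U + rho * (A_pinv @ H)                   # equation (63)
    return U, None, max_iter
```

A pattern x would then be assigned to the class j maximizing x^t U e_j, in accordance with (29).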
CONVERGENCE PROOF OF THE MULTI-CLASS ALGORITHM

The convergence of the proposed multi-class algorithm can be proved in the following steps.

Lemma 2. Consider the set of inequalities (38) and the algorithm (64) to solve them. Observing

lz_j(k) = ly_j(k)E_j ≤ 0   for all l and for all j = 1, 2, ..., R

at any step terminates the algorithm and indicates the nonlinear separability of the R-class patterns.

Part 1(a). Let V[Y(k)] be a measure of the distance of Y(k) from the solution Y = 0, and consider its change ΔV[Y(k)] under the recursion (71). By (67) and (68), [R^{-1}(lz_j) - ρ(k)I] is positive definite and has real eigenvalues; but it is not certain that (E_j^t)^{-1}[R^{-1}(lz_j) - ρ(k)I]E_j^t can be positive definite for all j and all l. Let ρ(k) be so chosen as to maximize -ΔV[Y(k)] at each step; one obtains the choice of ρ(k) given in (65), provided the condition (66) is satisfied to make sure that ρ(k) > 0. For this value of ρ(k), ΔV(Y(k)) ≤ 0, since

[lz_j R(lz_j) + lλ_j](E_j^t E_j)^{-1} R^{-1}(lz_j)[lz_j R(lz_j) - lλ_j]^t ≥ 0   for all j and all l.   (69)

Hence ΔV[Y(k)] is negative definite in [lz_j R(lz_j) + lλ_j]. Note that lz_j R(lz_j) + lλ_j = 0 for all j and all l only if lz_j ≤ 0, that is, only if Y(k) = 0 or ly_j(k)(e_j - e_i) ≤ 0 for all i ≠ j and for all j. Since it is assumed that the set of inequalities (38) is consistent, the lemma rules out the latter case; therefore,

ΔV[Y(k)] < 0 for Y(k) ≠ 0,   ΔV[Y(k)] = 0 for Y(k) = 0,

and the solution Y = 0 of equation (71) can be reached asymptotically, that is,

lim_{k→∞} ‖Y(k)‖^2 = 0,

which corresponds to a solution U** with AU** = B such that A_j U**(e_j - e_i) = B_j(e_j - e_i) > 0 for all i ≠ j and for all j. This is the proof of Part 1(a).

Part 1(b). For a sufficiently large but finite k, V[Y(k)] has decreased so far that ‖ly_j(k)‖^2 < 1 for all l and j. Since B(k) increases monotonically from B(0), with B_j(k)E_j ≥ (1 + δ)B_j(0)E_j for some δ > 0, it follows that

A_j U(k)E_j = B_j(k)E_j + Y_j(k)E_j ≥ (1 + δ)B_j(0)E_j + Y_j(k)E_j > B_j(0)E_j > 0   for all j,

which indicates that a solution U* = U(k) is obtained in a finite number of iteration steps. This is the proof of Part 1(b). Part 2 can be proved in the same way as that in the Ho-Kashyap theorem.17

CONCLUSION

A new generalized inverse algorithm for R-class pattern classification is proposed which is parallel to the one given by Teng and Li.16
In the case of R = 2, the algorithm reduces to the improved dichotomization algorithm developed in the beginning, except that here A_2 is composed of transposes of augmented pattern vectors without change of sign and B_2 is a column vector whose elements all equal e_2 = -1. This corresponds to the reformulation of the Ho-Kashyap algorithm as mentioned by Wee and Fu.15 The proposed 2-class algorithm has a higher rate of convergence than previous methods for a certain range of initial b vectors. A comparison has been made between this improved algorithm with ρ(k) given by equation (26) and the Ho-Kashyap algorithm with ρ = 1; the convergence rate may be greatly increased for 0.001 ≤ b_i(0) ≤ 0.5 (i = 1, 2, ..., N), as verified by the computer results of several switching theory and pattern classification problems. For problems where a large number of iterations (for example, greater than twenty) was required by the Ho-Kashyap algorithm, the proposed algorithm reduced this number of iterations by a factor of 20 or more. Even though the cost per iteration for the proposed algorithm is 10 to 20 percent greater than for the Ho-Kashyap algorithm, the total cost is reduced. For problems where a small number of iterations (less than twenty) was required by the Ho-Kashyap algorithm, the proposed algorithm reduced the number of iterations by as much as 30 percent. Experimental results suggest that the proposed algorithm is most advantageous for problems requiring a large number of iterations by the Ho-Kashyap algorithm.

ACKNOWLEDGMENT

This work is based partially on a Ph.D. dissertation submitted by the first author in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the University of Pittsburgh. During the course of this study, the first author was supported by a NASA Traineeship as well as by the Learning Research and Development Center at the University. This work was also supported in part by NASA Grant NsG-416. The authors would like to thank Professor J. G. Castle, Professor T. W. Sze, Dr. T. L. Teng and Major T. E. Brand for their helpful discussions.
REFERENCES

1 N J NILSSON Learning machines McGraw-Hill Inc New York N Y 1965
2 G S SEBESTYEN Decision-making processes in pattern recognition Macmillan Co New York N Y 1962
3 A G ARKADEV E M BRAVERMAN Computers and pattern recognition Thompson Book Co Washington D C 1967
4 L UHR Pattern recognition John Wiley and Sons Inc New York N Y 1966
5 C K CHOW An optimum character recognition system using decision functions IRE Transactions on Electronic Computers Vol EC-6 No 4 1957
6 R O WINDER Threshold logic Ph D dissertation Princeton University Princeton N J 1962
7 P M LEWIS C L COATES Threshold logic John Wiley and Sons New York N Y 1967
8 Y C HO R L KASHYAP An algorithm for linear inequalities and its applications IEEE Transactions on Electronic Computers Vol EC-14 No 5 1965
9 R L KASHYAP Pattern classification and switching theory Ph D dissertation Harvard University Cambridge Mass 1965
10 W G CHAPLIN V S LEVADI A generalization of the linear threshold decision algorithm to multiple classes Computer and Information Sciences-II edited by J T Tou Academic Press New York N Y 1967
11 C C BLAYDON Recursive algorithms for pattern recognition Ph D dissertation Harvard University Cambridge Mass 1967
12 K S FU W G WEE On generalizations of adaptive algorithms and application of the fuzzy sets concept to pattern classification TR-EE67-7 School of Electrical Engineering Purdue University Lafayette Ind 1967
13 I P DEVYATERIKOV A I PROPOI Y Z TSYPKIN Iterative learning algorithms for pattern recognition Automation and Remote Control Vol 28 No 1 1967
14 W G WEE Generalized inverse approach to adaptive multiclass pattern classification IEEE Transactions on Computers Vol C-17 No 12 1968
15 W G WEE K S FU An extension of the generalized inverse algorithm to multiclass pattern classification IEEE Transactions on Systems Science and Cybernetics Vol SSC-4 No 2 1968
16 T L TENG C C LI On a generalization of the Ho-Kashyap algorithm to multiclass pattern classification Proceedings of the Third Annual Princeton Conference on Information Sciences and Systems 1969
17 L C GEARY An improved algorithm for linear inequalities in pattern recognition and switching theory Ph D dissertation University of Pittsburgh Pittsburgh Pa 1968
18 A G NIELSEN A H VAGNUCCI C C LI Application of pattern recognition to electrolyte circadian cycles Proceedings of the 8th International Conference on Medical and Biological Engineering Chicago Ill 1969

The social impact of computers

by O. E. DIAL
Baruch College
New York, New York

    People are afraid of computers,
    But they shouldn't be.
    Computers are good guys!
                        MADMAN KRONENBERG

During two recent years at MIT, I had a teletypewriter assigned to my use. The teletype gave me instant access to a large computer complex in which I had various sets of data stored. The teletype was located in a basement room which was generally dark except for the light focused over my teletype and small desk. It was always silent in that room except for the noise of the teletype. In this setting, I would conduct analysis of my data by the hour. I would sort along particular variables, intersect those which seemed promising, and in this way be led from one avenue of investigation to another. I was in effect carrying on a dialogue with the computer. I asked a question and I got an answer. The answer led me to other questions. Sometimes the computer would complain that I had not made my inquiry in the correct form and it would suggest that I try again.
It kept me informed of the time I had used and how much I had remaining, what data sets I had placed on file and what analysis I had completed. I could cuss it (and often did), thank it, wait impatiently the few seconds it sometimes required to respond, and get excited about what it was telling me. Given all of this, it should not seem strange that this machine came to be human to me for long periods of time. It had personality, value, integrity, and it carried on conversations with me alone. I understand Kronenberg when he says "computers are good guys."1

But of course computers are neither good nor bad. They are neutral instruments which are used for good and bad purposes. But their stated purposes often do not take account of other effects they are having in society. The purpose of this paper is to speculate on some of those effects.

First let me recount a few statistics which will serve to suggest the magnitude of the subject. After doing so, I believe that you will conclude with me that the most impressive thing about computers is the future, not the past. Over 71,000 computers have been installed in the United States by the time of this writing. Equally important, there are back orders for over 15,000 more.2 This means that there are back orders in this year alone for more than one-fifth of the total number of computers installed during the past, say, twenty-five years. Demand for computers is changing, becoming stronger.

So much for numbers of computers. How about changes in the computer itself? Paul Armer recently had occasion to report these changes in terms of orders of magnitude.3 He says that the speed with which computers operate has increased by an order of magnitude about every four years. The size of the computer has decreased an order of magnitude in the last ten years, and it will shrink another three orders of magnitude in the next ten. The cost of computation, too, has declined, by an order of magnitude every four years. To summarize at this point: computers are becoming more numerous, faster, smaller, and cheaper.

But other changes are taking place. For one, the computer industry is serviced by our system of higher education. In the brief span of years since the term "Computer Scientist" was invented, the system has produced computer scientists at a rate which has already wedged out 2% of America's scientific manpower.4 At a lower order of preparation, the successive tides of graduates from the countless programming schools in every large city of the country have even now not met demand. Perhaps the best evidence of this unsatisfied demand is the recruiting piracy which has become commonplace in many computer hardware and software companies.

Another area to be noted here is that of computer applications. These have increased along an exponential curve in both breadth and depth.
From earlier employment in scientific computation, we now find computers in use by virtually every size and level of public and private activity, and for an incredible range of applications: from trash-collection routing to optimization of freight hauling and warehousing; from the operation of traffic control systems to the optimal stationing of emergency vehicles; from the automation of library inventories and ordering to the management of military supply and equipment inventories; from hospital patient monitoring to the conduct of regional health planning; and from matching the unemployed with available jobs to matching the lonely heart with available dates. We find similar variety in the users of systems: the clergy, politician, physician, professor, builder, manager, government official, soldier, sportsman, and on to an endless list of persons who just a few years back could not have anticipated their own involvement with computers. For example, a new tide of computer-related technology has made multiple-access networks commonplace, and already we are looking for its marriage to CATV in bringing the blessings of the computer to the housewife. This application will not only require the ultimate in user-oriented languages, but some change in the rules as to who has the last word as well.

In any event, it must be perfectly clear that computers are so well entrenched in every segment of our society that it is merely academic to discuss their impact. There is not much we could do about it at this point even if we tried. But it does make an interesting subject for speculation, which is what I want to try to do at this time.

When a computer is performing in a particular application, its first-order effect is assessed fully in terms of how well it is performing with respect to that application. But taken as a whole, it must be obvious that the computer industry has created a whole range of second- and third-order effects. Second-order effects might include a wide assortment of contributions, e.g., contribution to GNP, contribution to efficiency in production and administration, contribution to improved scientific and technological capabilities, and a substantial contribution to an improved potential for the collection, storage, and selective retrieval of important data and all that this implies. Third-order effects flow from these contributions. For example, a municipality is for the first time able to create an informational-decision system which is useful for the conduct of its operations and the planning and evaluation of its programs. This can yield enlightenment in goal formulation and improvements in the quality of life. Whether this potential will be exploited remains to be seen, but it is there, as an indirect effect of computer technology.

But it is not these effects that I am interested in for purposes of this paper. I believe that computer technology has had a number of impacts upon society where the causal relations are even more remote than the examples I have enumerated, and thus more difficult to trace and prove. They must be considered speculative.

First of all, there is a growing persuasion to the systems approach in the belief that it is the only profitable method of inquiry. Of course, there is disenchantment in some quarters, but not enough to slow the movement. This persuasion is understandable given the fact that all computer programs are in fact systems. Input is processed to output.
Computer programs are discrete and capable of precise specification. Their processes are clearly visible for inspection and verification. Furthermore, much of a program is modular, thus permitting hierarchical structuring of sub-systems into larger systems. The complex can and must become simple with this approach. All mystery is removed. The problem of the social sciences, for example, is merely to isolate and relate the variables in the social system. We can begin at simple levels and build toward the complex. If we are successful, given a variety of inputs for purposes of testing their effects, we can simulate processes within systems at all levels.

Second, I think, is the pervasiveness of programming, or perhaps I should say the universality of programming. Witness its spread from the computer to becoming a methodology for pedagogy. Considerable attention has been given to the wonders of the programmed textbook, the programmed plan, the programmed career, and so on in a list challenged only by the innovative limits of the entrepreneur. The extent to which programs already govern our thought processes is most appropriate for inquiry. It carries with it a subtle reinforcement of rationality as a value in our society, but rationality as defined in terms of programming. All options are reduced to the program's world of mutually exclusive IF STATEMENTS. Computers and programs are absolutely rational, and because of this they can solve infinitely complex problems with great accuracy, provided that the unravelling can reduce the problem to additions no more complex than a value of one or zero to a value of one or zero. That which cannot be reduced to such algorithms is merely held in abeyance until its true nature can be understood. Understanding is equivalent to order. The reduction of phenomena to specific variables is essential; nothing else will compute. A corollary to this spells the decline of intuition and belief as positive values in society.

Third, I think, developments in computer technology are encouraging streams of reevaluation as to the feasibility of keeping and using historical records of all types. Record reductions may increasingly be based on entire statistical populations, as opposed to sampling. This can permit, in fact encourage, the collection of environmental and social data on a scale never before contemplated. This may be amassed longitudinally in such quantities and periods as to permit real headway in social sciences research. Such headway has profound implications for the close monitoring of the behavior, activities, and welfare of increasingly more numerous segments of society and its institutions. With knowledge and monitoring can come control.

Fourth, I think, the developments I have just discussed will precipitate an increasingly tense confrontation with the individual's right to privacy as a tradeoff with society's right to know. To paraphrase Alan Westin, the practical boundaries of privacy, as we knew them before the age of the computer, are being redefined in the onslaught of the greatest data-generating society in human history.5 Where this will take us is of course unknown, but I deeply suspect that it will be in the direction of acceptance of progressive reductions of the data trails which we now hold to be private. The urgency of the crises presented by over-population and environmental pollution will demand (and we will accede to) planning controls.
These are planning and controls which could never have been contemplated without computer technology. The masses of data which are prerequisite would quickly have inundated manual processes of data collection, retrieval, and massage. It may well be that privacy is going the way of the skirt length: ever more revealing of the subject it covers.

The remaining areas of impact of computer technology on society seem to me to be relatively trivial, but nonetheless worthy of note. We can anticipate increasingly insistent pressures to articulate the parameters of highly repetitive and routine decisions to the end that they may be automated. This should elevate decision-making in which true judgment is involved to higher orders of application. The lingering worry is, of course, that in our anxiety to do this work we will force the articulation of these parameters, ruling in a measure of cases, however small, in which judgment should remain a factor. This has implications for the demise of concepts of the importance of the individual and of the justice which must be granted him. Where we explicitly settle for validity in two or three sigmas, we are in fact writing off the cases beyond that as unworthy of concern.

While this list of speculations is by no means exhaustive, it should serve to illustrate at least some of the impacts of computer technology upon society. The odd thing is that we shall not really know what the second- and third-order effects are until we have applied computers on a considerable scale to the search. I don't think that this will be done for many years. The difficulty is, of course, that society does not value information sufficiently at the margin. When computers are employed for public purposes, we demand that their application have a short payoff. Political feasibility is not tested by the automation of personnel accounting systems because savings achieved over manual systems are quickly realized. On the other hand, the development of comprehensive information systems for purposes of collection and storage of environmental and social information has long-run payoffs, and hence does not meet the test of political feasibility. When the time comes to allocate the substantial funds that are required, or to make the organizational, jurisdictional, political, and private compromises that are a part of the cost, we effectively reject comprehensive information systems. And yet the problems of our society today are of such a nature that these systems are considerably more important than systems which merely achieve economies of time and dollars. It is quite possible that we should be talking of survival.

I must conclude, therefore, that we will learn of the social impact of computer technology much as we learned of the profound impact of the automotive industry: considerably after the fact.

REFERENCES

1 R TODD You are an interfacer of black boxes Atlantic p 68 March 1970
2 EDP industry report pp 6-7 August 6 1969
3 P ARMER The individual: His privacy self-image and obsolescence Panel on Science and Technology Eleventh Meeting Committee on Science and Astronautics U S House of Representatives pp 1-2 January 28 1970
4 American science manpower National Science Foundation NSF 70-5 January 1970
5 A F WESTIN Privacy and freedom Atheneum New York 1968

A continuum of time-sharing scheduling algorithms*

by LEONARD KLEINROCK
University of California
Los Angeles, California

* This work was supported by the Advanced Research Projects Agency of the Department of Defense (DAHC15-69-C-0285).

INTRODUCTION

The study of time-sharing scheduling algorithms has now reached a certain maturity.
One need merely look at a recent survey by McKinney1 in which he traces the field from the first published paper in 19642 through a succession of many papers during these past six years. Research currently taking place within the field is of such a nature that many of the important theoretical questions will be sufficiently well answered in the very near future as to question the justification for continuing extensive research much longer without first studying the overall system behavior. Among the scheduling algorithms which have been studied in the past are the round robin (RR), the feedback model with N levels (FBN), and variations of these.1 The models introduced for these scheduling algorithms gave the designer some freedom in adjusting system performance as a function of service time but did not range over a continuum of system behaviors. In this paper we proceed in that direction by defining a model which allows one to range from the first come first served algorithm all the way through to a round robin scheduling algorithm. We also find a variety of other models within the given family which have yet to be analyzed. Thus the model analyzed in this paper provides the designer a degree of freedom whereby he may adjust the relative behavior for jobs as a function of service time; in the past such a parameter was not available. Moreover, the method for providing this adjustment is rather straightforward to implement and is very easily changed by altering a constant within the scheduler.

A GENERALIZED MODEL

In an earlier paper3 we analyzed a priority queueing system in which an entering customer from a particular priority group was assigned a zero value for priority but then began to increase in priority linearly with time at a rate indicative of his priority group. Such a model may be used for describing a large class of time-sharing scheduling algorithms. Consider Figure 1. This figure defines the class of scheduling algorithms which we shall consider. The principle behind this class of algorithms is that when a customer is in the system waiting for service his priority (a numerical function) increases from zero (upon his entry) at a rate α; similarly, when he is in service (typically with other customers sharing the service facility simultaneously with him, as in a processor-shared system4) his priority changes at a rate β. All customers possess the same parameters α and β. Figure 1 shows the case where both α and β are positive although, as we shall see below, this need not be the case in general. The history of a customer's priority value then would typically be as shown in Figure 1, where he enters the system at time t0 with a zero value of priority and begins to gain priority at a rate α. At time t1 he joins those in service after having reached a value of priority equal to α(t1 - t0). When he joins those in service he shares on an equal basis the capacity of the service facility and then continues to gain priority at a different rate, β. It may be that a customer is removed from service before his requirement is filled (as may occur when one of the slopes is negative); in this case, his priority then grows at a rate α again, etc. At all times, the server serves all those with the highest value of priority. Thus we can define a slope for priority while a customer is queueing and another slope for priority while a customer is being served as
queueing slope = α   (1)
serving slope = β.   (2)

Figure 1-Behavior of the time-varying priority

A variety of different kinds of scheduling algorithms follow from this model depending upon the relative values of α and β. For example, when both α and β are positive and when β ≥ α, it is clear that customers in the queue can never catch up to the customer in service, since he is escaping from the queueing customers at least as fast as they are catching up to him; only when the customer in service departs after his completion will another customer be taken into service. This new customer to be taken into the service facility is the one with the highest value of priority. Thus we see that for the range

0 ≤ α ≤ β   (3)

we have a pure first come first served (FCFS) scheduling algorithm. This is indicated in Figure 2, where we show the entire structure of the general model. Now consider the case in which

0 ≤ β ≤ α.   (4)

This is the case depicted in Figure 1. Here we see that the group of customers being served (which act among themselves in a processor-shared round robin fashion) is attempting to escape from the group of customers in the queue; their attempt is futile, however, and it is clear from this range of parameters that the queueing customers will eventually each catch up with the group being served. Thus the group being served is selfishly attempting to maintain the service capacity for themselves alone, and for this reason we refer to this system as the selfish round robin (SRR).

Figure 2-The structure of the general model

What happens in this case is that entering customers spend a period of time in the queue and, after catching up with the serving group, proceed to be served in a round robin fashion. The duration of the time they spend in the queue depends upon the relative parameters α and β, as we shall see below. It is clear, however, that for β = 0 we have the case in which customers in service gain no priority at all. Thus any newly entering customer would have a value of priority exactly equal to that of the group in service and so will immediately pass into the service group. Since all serving customers share equally, we see that the limiting case β = 0 is a processor-sharing round robin (RR) scheduling algorithm! It happens that SRR yields to analysis very nicely (whereas some of the other systems mentioned below are as yet unsolved) and the results of this analysis are given in the next section.

Another interesting range to consider is that for which

α ≤ β < 0.   (5)

Here we have the situation in which queueing customers lose priority faster than serving customers do; in both cases, however, priority decreases with time, and so any newly entering customer will clearly have the highest priority and will take over the complete service facility for himself. This most recent customer will continue to occupy the service facility until either he leaves due to a service completion or some new customer enters the system and ejects him. Clearly what we have here is a classical last come first served (LCFS) scheduling algorithm, as is indicated in Figure 2.

Now consider the range

α < 0 < β.   (6)

In this case a waiting customer loses priority whereas a customer in service gains priority.
When an arriving customer finds a customer in service who has a negative value of priority, this new customer preempts the old customer and begins service, while at the same time his priority proceeds to increase at a rate β; from here on no other customer can catch him, and this customer will be served until completion. Upon his completion, service then reverts to the customer with the largest value of priority. Since customers lose priority with queueing time, all customers in the system when our lucky customer departed must have negative priority. One of these will be chosen and will begin to gain priority; if now he is lucky enough to achieve a positive priority during his service time, then he will seize the service facility and maintain possession until his completion. Thus we call this range LCFS with seizure (see Figure 2). In the special case

α = 0 < β   (7)

we have the situation in which a newly emptied service facility will find a collection of customers who have been waiting for service and who have been kept at a zero value of priority. Since all of these have equal priority, they will all be taken into service simultaneously and will then begin to gain priority at a rate β > 0. Any customers arriving thereafter must now queue in bulk fashion since they cannot catch up with the current group in service. Only when that group finishes service completely will the newly waiting group be taken into service. We refer to this case as bulk service. The last case to consider is the range

β < 0,  β < α.   (8)

In this case a customer being served always loses priority, whereas a queueing customer loses priority at a slower rate or may in fact gain priority. Consequently, serving customers will tend to "run into" queueing customers and pick them up into the service facility, at which point the entire group continues to decrease in priority at rate β. We refer to this region as LCFS with pickup (see Figure 2).

Thus Figure 2 summarizes the range of scheduling algorithms which this two-parameter priority function can provide for us; the case analysis is transcribed in the sketch below. We have described a number of regions of interest for this class of algorithms. The FCFS, LCFS, and RR systems, of course, are well known and solved. The three regions given by Equations 4, 6, and 8 are as yet unsolved. As mentioned before, the SRR system yields very nicely to analysis, and that analysis is given in this paper. This system has the interesting property that we may vary its parameters and pass smoothly from the FCFS system through the SRR class to the familiar RR system. The others (LCFS with seizure and LCFS with pickup) are as yet unsolved and appear to be more difficult to solve than the SRR. Of course other generalizations to this scheme are possible, but these too are yet to be studied. Among these generalizations, for example, is the case where each customer need not have the same α and β; also one might consider the case where growth (or decay) of priority is a non-linear function of time. Of all these cases we repeat again that the SRR has been the simplest to study, and its analysis follows in the next section.
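As a compact summary of Figure 2, the mapping from the slope pair (α, β) to a scheduling discipline can be written directly from ranges (3) through (8). The function below is an illustrative transcription of that case analysis; the string labels are ours, not the paper's.

```python
def discipline(alpha, beta):
    """Classify the (alpha, beta) slope pair per ranges (3)-(8) of the text."""
    if alpha == 0 and beta > 0:            # range (7): queue frozen at zero
        return "bulk service"
    if 0 <= alpha <= beta:                 # range (3): service escapes the queue
        return "FCFS"
    if beta == 0 and alpha > 0:            # limiting case of (4)
        return "RR (processor sharing)"
    if 0 <= beta <= alpha:                 # range (4)
        return "SRR (selfish round robin)"
    if alpha <= beta < 0:                  # range (5)
        return "LCFS"
    if alpha < 0 < beta:                   # range (6)
        return "LCFS with seizure"
    if beta < 0 and beta < alpha:          # range (8)
        return "LCFS with pickup"
    return "unclassified"
```

Note that the boundary cases are checked first: α = 0 with β > 0 is the bulk-service limit of the FCFS range, and β = 0 with α > 0 is the RR limit of the SRR range, exactly as the text describes.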
Thus Figure 2 summarizes the range of scheduling algorithms which this two-parameter priority function can provide for us. We have described a number of regions of interest for this class of algorithms. The FCFS, LCFS, and RR systems, of course, are well known and solved. The three regions given by Equations 4, 6, and 8 are as yet unsolved. As mentioned 455 P == A/fJ. (13) For the range of a, {3 under consideration it is clear that once a customer enters the service facility he will not leave until his service is complete. Consequently, we may consider the system as broken into two parts: first, a collection of queued customers; and second, a collection of customers in service. Figure 3 depicts· this situation where we define* Tw = E[time spent in queue box] Ts = E[time spent in service box] (14) (15) N w = E[number in queue box] (16) Ns = E[number in service box] (17) * The notation E[x] reads as "the expectation of x." 456 Spring Joint Computer Conference, 1970 system is unaware of the customer's service time until he departs from the system, it is clear that the time he spends in the queue box must be independent of this service time and therefore r--------------------.I I I I I ARRIVALS --+--~ I QUEUE SERVICE BOX BOX I I DEPARTURES (24) I I I I .L - ______________________ J I T, N Figure 3-Decomposition of the SRR system We further define T = Tw ~ = Nw + Ts = E[time in system] + Ns = E[number in system] (18) (19) Due to the memoryless property of the exponential service time distribution, it is clear that the average number in system and average time in system are independent of the order of service of customers; this follows both from intuition and from the conservation law given in Reference .5. Thus we have immediately T = I/p. (20) 1- p N=-P1- P (21) For our purposes we are interested in solving for the average response time for a customer requiring t seconds of service; that is for a customer requiring t seconds of complete attention by the server or 2t seconds of service from the serVer when he is shared between two customers, etc. Recall that more than one customer may simultaneously be sharing the attention of the service facility and this is just another class of processorsharing systems. 4- Thus our goal is to solve for Let us now solve for Ts (t). We make this calculation by following a customer, whom we shall refer to as the I "tagged" customer, through the system given that this . customer requires t seconds of service. His time in the queue box will be given by Equation 24. We now assume that this tagged customer has just entered the service box and we wish to calculate the expected time he spends there. This calculation may be made by appealing to an earlier result. In Reference 4, we studied the case of the processor-shared round robin system (both with and without priorities). Theorem 4 of that paper gives the expected response time conditioned on service time and we .may use that result here since the system we are considering, the service box, appears like a round robin system. However, the arrival rate of customers to the service box conditioned on the presence of a tagged customer in that box is no longer A, but rather some new average arrival rate A'. In order to calculate A' we refer the reader to Figure 4. In this figure we show that two successive customers arrive at times tl and t2 where the average time between these arrivals is clearly I/A. 
The service group moves away from the new arrivals at a rate (3 and the new arrivals chase the service group at a rate a; as shown in Figure 4, these two adjacent arrivals catch up with the service group where the time between their arrival to the service box is given by I/A'. Recall that the calculation we are making is conditioned on the fact that our tagged customer remains in the service box during the interval of interest; therefore the service box is guaranteed not to empty over the period of our T (t) = E[ response time for customer requiring t seconds of service] (22) where by response time we mean total time spent in the system. The average of this conditional response time without regard to service time requirement is given by Equation 20. Due to our decomposition we can write immediately T(t) = Tw (t) + Ts(t) )- t~ o ~ 0.. (23) where T w (t) is the expected time spent in the queue box for customers requiring t seconds of service and T8(t) is the expected time spent in the service box for customers requiring t seconds of service. Since the Figure 4-Calculation of the conditional arrival rate to the service box Contiuum of Time-Sharing Scheduling Algorithms calculations. 'A' is easily calculated by recognizing that the vertical offset y may be written in the following two ways (~) ~ y = and so T=T10 +~ 1 _ p' (32) Using Equation 20 we have the result Tw = lip, _~ 1- p 1 - p' (25) T(t) = +t - lip, 1 - p (recall that for the SRR system (3 venience we now define = 'A'I p, ~ a). For con- (26) We may now apply Theorem 4 of Reference 4 and obtain the quantity we are seeking, namely, T 8 (t) t 1_ p' (27) The only difference between Equation 27 and the referenced theorem is that here we use p' instead of p since in all cases we must use the appropriate utilization factor for the system under consideration. That theorem also gives us immediately that p' N (28) =-8 1 _ p' This last equation could be derived from Equation 27 and the application of Little's result6 which states that (29) and where T=~ 1 _ p' (30) 8 We may now substitute Equation 27 into Equation 23 to give T(t) = Tw t +-, 1- p (31) In order to evaluate T10 we form the average with respect to t over both sides of Equation 31 to obtain lip, 1 - p' (34) Another convenient form in which to express this result is to consider the average time wasted in this SRR system where wasted time is any extra time a customer spends in the system due to the fact that he is sharing the system with other~ customers. Thus, by definition, we have VI(t) =-- (33) Upon SUbstituting Equation 33 into Equation 31 we obtain our final result as and so we may solve for 'A' as follows p' 457 = T(t) - t (35) and this results in Wet) = pip, 1 - p + (t - IIp,)p' 1 - p' (36) In both Equations 34 and 36 we observe for the case of a customer whose service time is equal to the average service time (II p,) that his average response time and average wasted time are the same that he would encounter for any SRR system; thus his performance is the same that he would receive, for example, in a FCFS system. We had observed that correspondence between the RR system and the FCFS system in the past; here we show that it holds for the entire class of SRR systems. In Figure 5 below we plot the performance of the class of SRR systems by showing the dependence of the wasted time for a customer whose service time is t seconds as a function of his service time. We show this for the case p = %: and p, = 1. 
The truly significant part regarding the behavior of the SRR system is that the dependence of the conditional response time upon the service time is linear.' Once observed, this result is intuitively pleasing if we refer back to Figure 3. Clearly, the time spent in the queue box is some constant independent of service time. However, the time spent in the service box is time spent in a round robin system since all customers in that box share equally the capability of the server; we know that the response time for the round robin system is directly proportional to service time required (in fact, as shown in Reference 8, this statement is true even for arbitrary service time). Thus the total time spent in the SRR system must be equal to some con- 458 Spring Joint Computer Conference, 1970 t.el/l' t.--- Figure 5-Performance of the SRR system stant plus a second term proportional to service time as in fact our result in Equation 34 indicates. Again we emphasize the fact that customers whose service time requirements are greater than the average service time requirement are discriminated against in the SRR system as compared to a FCFS system; conversely, customers whose service time requirement is less than the average are treated preferentially in the SRR system and compared to the FCFS system. The degree of this preferential treatment is controlled by the parameters ex and {j giving the performance shown in Figure 5. CONCLUSION In this paper we have defined a continuum of scheduling algorithms for time-shared systems by the introduction of two new parameters, ex and {j. The class so defined is rather broad and its range is shown in Figure 2. We have presented the analysis for the range of parameters that is given in Equation 4 and refer to this new system as the selfish round robin (SRR) scheduling algorithm. Equation 34 gives our result for the average response time conditioned on the required service time and we observed that this result took the especially simple form of a constant plus a term linearly dependent upon the service time. Moreover, we observe that the parameters ex and {j appear in the solution only as the ratio {j/ ex. This last is not overly surprising since a similar obser.vation was made in the paper3 which was our point of departure for the model described herein; nameiy, there too the slope parameters appeared only as ratios. Thus in effect we have introduced one additional parameter, the ratio {j/ ex, and it is through the use of this parameter that the designer of a time-sharing scheduling algorithm is provided a degree of freedom for adjusting the extent of discrimination based upon service time requirements which he wishes to introduce into his algorithm; the implementation of this degree of freedom is especially simple. The range of the algorithm is from the case where there is zero discrimination base(- on service time, namely the FCFS system, to a case where there is a strong degree of discrimination, namely the RR system. The'mathematical simplicity of the SRR algorithm is especially ·appealing. Nevertheless, the unsolved systems referred to in this paper should be analyzed since they provide behavior distinct from the SRR. In any event, this continuum of algorithms is simply implemented in terms of the linear parameters ex and {j, and the scheduling algorithm can easily choose the desired behavior by a~justing ex and {j appropriately. 
REFERENCES 1 J M MCKINNEY A survey of analytical time-sharing models Computing Surveys Vol 1 No 2 pp 105-116 June 1969 2 L KLEINROCK Analysis of a time-shared processor Naval Research Logistics Quarterly Vol 11 No 1 pp 59-73 March 1964 3 L KLEINROCK A delay dependent queue discipline Naval Research Logistics Quarterly Vol 11 No 4 pp 329-341 December 1964 4 L KLEINROCK Time-shared systems: A theoretical treatment JACM Vol 14 No 2 pp 242-261 April 1967 5 L KLEINROCK A conservation law for a wide class of queueing disciplines Naval Research Logistics Quarterly Vol 12 No 2 pp 181-192 June 1965 6 J D C LITTLE A proof of the queueing formula L = AW Operations Research Vol 9 pp 383-387 1961 7 L KLEINROCK Distribution of attained service in time-shared systems J of Computers and Systems Science Vol 3 pp 287-298 October 1967 8 M SAKATA S NOGUCHI J OIZUMI A nalysis of a processor shared queueing model for time-sharing systems Proc of the Second Hawaii International Conference on Systems Science pp 625-628 University of Hawaii Honolulu Hawaii January 22-24 1969 The management of a multi-level non-paged memory system by FOREST BASKETT, J. C. BROWNE and WILLIAM M. RAIKE The Univer8ity of Texa8 Austin, Texas INTRODUCTION There is a clear tendency for large-scale and, especially time-sharing computer systems to have several levels of random access memory with gradations in access time, degree of address ability, and functional capability. In our configuration at The University of Texas at Austin these are a high-speed magnetic core memory, an extended core memory of magnitude 4 times the size of the main memory, and 4 large, fast disks. An extensive literature1 ,2,3,4 has already developed on the management of multi-level systems where the main memory is structured in pages, usually with an extended logical addressing space. The management of multi-level memory systems where the main memory is not paged has received much less attention. 5 ,6,7 Certain problems are characteristic of systems which can assign main memory to a given process only in a single contiguous block. These problems become performance-limiting factors when the computer system supports a multi-programming batch system and a substantial interactive load. We discuss some models for memory management where both a multiprogramming batch system and heavy interactive usage compete for the resources of a three-level memory system with a non-paged main memory. These models are based on detailed measurements8 of system performance and job characteristics for the current operation of a CDC 6600 computer system. We pay special attention to the competition between batch and interactive jobs for memory resources and on the costs of data flow between levels of random access memory. Fuchel and Heller5 have studied the general characteristics of Control Data's extended core storage (ECS) as a swapping medium and a storage medium for active files. Fuchel, Campbell, and Heller15 have studied in detail the use of ECS as a buffering device for active files with the motive of increasing CPU efficiency. Such uses of ECS in our configuration does little to improve CPU efficiency. As predicted by their 459 analysis, the use of two dual-channel 6638 disks (effectively, four independent disks), together with 131,000 words o'f main memory, allows us to attain central processor efficiency in the range 85 to 90 percent. 
Our intention is to use ECS to provide a significant interactive computing capability without significantly degrading the current high level of system performap.ce. The following analysis will show how that is possible. SYSTEM CHARACTERISTICS The particular system which we use to parameteriz--e our models is a CDC 6600 with 131,000 60-bit words of high-speed main memory, 524,000 words of extended core storage (ECS), and four 6-million-word disks. For future reference, we shall summarize certain characteristics of the CDC ECS: (a) Transfers between ECS and main memory proceed at the acceptance rate of main memory (10 words per microsecond) after a transfer is initiated. (b) Average transfer initiation time is 3.4 microseconds. (c) Transfers are initiated by central processor instructions and hold the central processor until the transfer is complete. (d) ECS is internally structured in 8-word records. A peripheral processor may interrupt an ECS transfer at th~ end of an 8-word record. One main memory word may be read or written in one microsecond by the peripheral processor before the transfer is automatically resumed with an additional start-up time. Each main memory access by a peripheral processor during an ECS transfer will be delayed an average of 400 nanoseconds and will increase the total transfer time by 4.4 microseconds. (e) ECS is word addressable by the central processor for data transfers between it and main memory, but instructions and data cannot be fetched directly to CPU registers. Thus ECS cannot be used as a direct logical extension of main memory, as IBM Large Core Storage can, but must be considered as auxiliary storage. 460 Spring Joint Computer Conference, 1970 ~ ::~ ~ HISTOGRRM Of DISK 110 TIMES MEDIAN MEAN = 38. = -'6. ~ ~~ u •• ~. ~ ~ . ~ p . / g • ~.oo . 60.00 10.00 100.00 I 0.00 140.00 160.00 110.00 200.00 T I ME IN MI LL I-SECONDS Figure I-Histogram of disk I/O times The operating system for The University of Texas at Austin 6600 system supports both local site and remote batch job entry as well as conversational interaction. The operating system is locally written, having been derived from an early (1966) version of the standard CDC (SCOPE 2.0) operating system. We shall describe those features which are essential to the management of memory resources. The operating system divides main memory into not more than eight blocks or work spaces. One work space is a fixed length block for system tables and monitor code. Up to seven variable length work spaces may be allocated to active processes. Each assigned work space is associated with a control point. One control point is used to control and drive the peripheral input and output ~quipment and the remote computers. Another is assigned to the management of. the remote on-line terminals. Five control points and 85,000 words of main memory are left for user programs. Administrative policy constrains batch jobs to 73,000 words or less and 'interactive jobs to 32,000 words or less. Files are assigned to the four disks in a round-robin fashion. The next file to be assigned is assigned to the next disk in the round-robin. No space is allocated on that disk until the file is actually written. The disks have moveable arms but only 32 different positions. The basic allocation unit on a disk is called a halftrack, 3072 words, comprising every other sector (64 words) of a particular arm position and head select. Each arm position covers 64 half-tracks. Half-tracks are numbered according to their physical order on the disk .. 
At the moment in time when a file needs more space, the lowest numbered half-track available on the disk to which the file is assigned is allocated to that file. The effect of these facts and allocation policies is that currently active files are distributed over the avail- able disks and active files that are on the same disk tend to be interleaved under a given arm position. This minimizes the principal difficulty with moveable arm disks, namely arm motion. Figure 1 is a histog·ram of user disk I/O times under these policies. The relevant data on the 6638 disks with respect to this figure is that the average rotational latency is 25 milliseconds and arm motion requires between 20 and 100 milliseconds, depending on the distance involved. For practical purposes, we take this structure as given and we consider next the allocation and scheduling strategies within this structure which affect system performance. ALLOCATION OF RESOURCES BETWEEN BATCH AND INTERACTIVE USAGE It is desired to allocate the resources of the system; access to the CPU, main memory, ECS, etc. so as to insure rapid (one second or less) response time to a substantial number (e.g., about 30-40) of interactive users while maintaining a fast batch system throughput and a high (~80%) central processor efficiency. The primary factors in determining the allocation strategy are job load characteristics, the characteristics and capacity of the swapping media, the swapping overhead, and the competition between the batch and interactive job streams. These problems and the allocation of CPU activity between control points are discussed in the next sections. The interference between the batch and interactive systems is primarily a competition for main memory. The factors dominating this competition are interference with disk I/O due to main memory lockout during swapping, the scheduling of batch jobs for loading into main memory, and the "memory compacting" problem. A separate section is devoted to each of these factors. JOB-LOAD CHARACTERISTICS The job-load characteristics have been determined by measurement of more than 50,000 jobs. The measured mean size of user batch programs is 21,000 words. The mean interactive program size is 12,800 words. Let S denote the siz~on an arbitrarily chosen active job and F s ((3) = P[ S ~ (3J be the distribution function of job sizes. Figure 2 shows F s ((3) for batch and interactive jobs. At any given time, the probability that any given batch job in execution will be 21,000 words or less in size is one-half. The core-size distribution is such that five control points are active 20 percent of the time and four are active 65 percent of the time. }\1anagement of Multi-Level Non-Paged Memory System The mean I/O wait time (t1O) is 46 milliseconds. The distribution function of tlO is extremely compact as can be seen from Figure 1. Disk channels are active 22 percent of the time per channel. This means that a disk I/O request is queued because of conflicting channel usage only a small percentage of the time. The compactness of this distribution is produced by the disk space allocation strategies described in the previous section. The mean time (t e ) a batch job computes before requesting an I/O operation is 48 milliseconds. The _mean CPU time per interaction is not much above one millisecond. The measured median "think time" for an interactive user is 10 seconds (12). This figure is very close to that found in studies of other systems. 
THE ALLOCATION OF THE CPU TO ACTIVE JOBS

Modeling and analysis done by Gaver [17] has shown that there are four principal effects on CPU efficiency in a multiprogramming computer system: (1) the ratio of t_e, the average interval of execution between I/O operations for any single given job, to t_IO, the average length of an I/O operation; (2) J, the degree of multiprogramming, or the number of jobs in main memory executing, doing I/O, or awaiting either; (3) I, the number of I/O units; and (4) s^2, the variance of the compute time distribution. It is interesting to note that the "shape" of the compute time distribution curve seems relatively insignificant compared to the value of s^2.

In order to increase the degree of multiprogramming, we make use of "memory compacting." The operating system will move the contiguous block of memory assigned to a control point from one absolute location to another.

Figure 2-Probability distribution of job core sizes

This increase in the degree of multiprogramming is achieved by the job scheduling policy developed in a following section. The cost of memory compacting under the resulting policy depends on the packing policy used. The last section describes a policy which makes this cost very small (less than one percent of the CPU time) with respect to the increased CPU efficiency gained from the increased degree of multiprogramming, even with a heavy interactive job load. In actual practice we observe a CPU efficiency between 85 and 90 percent.

The variance, s^2, of the CPU compute time distribution can be affected by CPU scheduling strategies. Gaver's analysis [17] considered no CPU scheduling other than first come, first served. However, by switching the CPU among jobs ready to compute, we can effectively lower the variance of the compute time distribution. Lowering s^2 will increase the CPU efficiency as well as increase the I/O rate. A full analysis of this effect has not yet been completed, but preliminary results on a round-robin servicing discipline indicate that there is an optimum quantum size for a given distribution of I/O operations and compute times and a given cost of switching the CPU from one job to another. This optimum is most affected by the cost of switching. For a CDC 6600, this cost is approximately 32 microseconds. A quantum size of five milliseconds seems to achieve the best results with respect to increasing the I/O rate and increasing the CPU efficiency.

ALLOCATION OF ECS

Despite the speed of the disks with respect to the I/O demands of the batch system, it is easy to see that they are extremely slow swapping devices for a non-paged system, especially compared to ECS. For the mean interactive job size of 12,800 words, a disk transfer would require at least 250 milliseconds exclusive of any queueing and positioning time. The transfer time for ECS depends on the block size used. Figure 3 shows the actual transfer time as a percentage of the maximum theoretical transfer time as a function of the block size. It should be noted that the limiting percentage of 64 percent is due to the PP break-ins experienced in a running system. This curve makes clear the desirability of a large block size to take maximum advantage of the speed of ECS. Thus we want B, the block size, to be as large as possible. However, if S is the mean program size and B is much smaller than S, then the waste per program is B/2. We now require that the waste fraction, B/(2S), be less than 2 percent.
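The arithmetic behind the block-size choice can be sketched in a few lines. The code below is illustrative (ours, not the paper's): it uses the 10 words-per-microsecond transfer rate and 3.4-microsecond start-up given earlier, the mean interactive program size S = 12,800 words, and the observations from Figure 3. The start-up-only model is optimistic; PP break-ins cap the measured curve at 64 percent, and the measured value at B = 512 is near 50 percent.

    # Illustrative sketch (not from the paper): screen candidate ECS block
    # sizes B against the two constraints, using the published constants.
    S_MEAN = 12_800   # mean interactive program size, words

    def waste_fraction(B, S=S_MEAN):
        # expected internal fragmentation per program, B/(2S)
        return B / (2 * S)

    def startup_utilization(B):
        # fraction of the theoretical 10 words/us achieved when only the
        # 3.4 us start-up is charged; PP break-ins push the real figure
        # lower (Figure 3 saturates at 64% and reads about 50% at B = 512)
        return (B / 10.0) / (3.4 + B / 10.0)

    for B in (64, 128, 256, 512, 1024):
        print(B, f"waste={waste_fraction(B):.2%}",
              f"start-up-only util={startup_utilization(B):.0%}")
    # Only B = 512 meets the 2% waste bound while staying near the top of
    # the utilization curve; B = 1024 would double the waste to 4%.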
B = 512 achieves this objective while giving an ECS transfer utilization rate near 50 percent. Since B is much smaller than S for this value, the result is valid.

Figure 3-ECS transfer speed utilization

Because of the unsuitability of the disks for interactive swapping, the memory capacity of ECS effectively defines the "natural capacity" of the system for handling interactive jobs. The interactive job-size distribution shows the ECS capacity to be about 42 users. Recalling that the measured median think time for a given interactive user is 10 seconds and the mean CPU time per interaction is on the order of one millisecond, it is clear that with the CPU scheduling policy described above, a single control point could service interactive demands up to the "natural capacity" of the system with response times of less than one second. A CPU quantum size of five milliseconds insures CPU allocation to the interactive control point at least once every 25 milliseconds, so that there is no appreciable delay due to CPU queueing. The interaction rate of once every 250 milliseconds for a set of 40 interactive users and a mean swap time (in and out) of 5.2 milliseconds will give a CPU overhead in this case of approximately 2 percent. We consider this to be a very reasonable price to pay for the attained interactive service.

INTERFERENCE WITH DISK I/O DUE TO SWAPPING

The most serious effect of swapping interactive jobs through ECS is the effect on disk I/O. The alternate-sector coding technique used on the disks requires that the peripheral processors doing disk I/O have sufficient access to main memory between alternate sectors to access 64 words. The transfer itself between main memory and peripheral processor memory requires 320 microseconds. The total available time between alternate sectors is 500 microseconds. Because of the bookkeeping to be done, there is very little time to spare. Delays in accessing main memory will cause the peripheral processors to miss the next sector and have to wait a full revolution of the disk to access the missed sector. Since ECS transfers with a block size as small as 8 words cause sufficient delay (in main memory access for the peripheral processors) to generate this problem, it is important to assess the level of ECS usage and the amount of this disk I/O degradation. A missed sector adds 50 milliseconds to the I/O service time for an I/O request that encounters this problem. The level of ECS usage predicted in the preceding analysis will cause an increase of approximately 15 percent in the average I/O service time. This will cause an increase in the probability that a process is in an I/O wait state and thus an increase in the expected CPU idle time. This effect may require a change in the alternate-sector coding technique used on the disks. On CDC 6638 disks, without ECS interference, the alternate-sector coding technique is so close to optimal that a high level of ECS usage could be counterproductive; however, preliminary studies indicate a decrease in expected CPU efficiency of about 5 or 6 percent at peak periods. Thus the total expected overhead and increased CPU idle time is less than 10 percent, a figure we consider acceptable.

BATCH JOB SCHEDULING

To schedule jobs, the basic administrative policy is to give fast turnaround to jobs with small resource requirements. We can justify this policy on the basis of job load.
For example, 90 percent of all jobs use less than 20 percent of all CPU time charged to users. The scheduler has available the same swapping mechanism for batch jobs as for interactive jobs. A scheduling strategy which pre-empts a long batch job when a short job arrives in the queue and resumes the long job after the short job terminates can be used. This is commonly called a pre-emptive-resume type of scheduling strategy. We desire to find a scheduling strategy which makes "optimal" use of the available space with respect to the above policy. To describe the situation formally, we make use of a simple type of mathematical programming model. The scheduling problem consists, essentially, of examining the n jobs which are currently awaiting execution and selecting some or all of these for loading. Let x_i be a variable assuming the values 0 or 1, denoting respectively the decision not to load, or to load, job i. Each job i has a space requirement s_i, and at scheduling time has a time-remaining requirement t_i, the total time requirement for job i minus the elapsed execution time. If the main memory available for batch jobs is S, and if each job is assumed to have a "utility" u_i = 1/t_i (which represents the anticipated completion rate for job i), one way of formulating the scheduling problem is to determine values for the variables x_1, ..., x_n which solve

    maximize   sum(i=1..n) u_i x_i
    subject to sum(i=1..n) s_i x_i <= S        (1)
    and        x_i = 0 or 1.

Since u_i and s_i are strictly positive for all i, this is the familiar "knapsack problem" of mathematical programming. The actual scheduling algorithm will not be determined by solving (1), for several reasons. First, the available space S is not constant over the time interval within which a given scheduling decision will be effective and thus is not known with certainty. This is a result of the rapidly changing space requirements of the control point assigned to interactive jobs. Second, the discrete character of the variables x_i makes an optimal schedule determined by (1) unduly sensitive to fluctuations in S, an undesirable lack of robustness in the solution. For this reason, we reformulate (1) to take into account the uncertain nature of S. Specifically, we suppose S is a random variable with known distribution function F_S. This function can be determined from the data presented graphically in Figure 1. In view of the "policy" character (versus "resource" character) of the space constraint, a particularly appealing way to encompass the intent of (1) is to express the space restriction as a "chance constraint" [16]. We rewrite (1) as follows:

    maximize   sum(i=1..n) u_i x_i
    subject to P[sum(i=1..n) s_i x_i <= S] >= a        (2)
    and        0 <= x_i <= 1.

Here a is the confidence with which we require the space constraint to be satisfied. A typical value for a, which is specified a priori and is not a variable whose value is to be determined, might be .9. We shall see below that the relaxation of the explicit integrality condition on x_i does not depart radically from the desired interpretation of x_i, since for some confidence level close to a an optimal solution to (2) will automatically assign only 0-1 values to the x_i. Since there is nothing sacred about .9, for example, we shall in practice begin with a "judicious" choice of a. To proceed with the derivation of an optimal scheduling algorithm from (2), we note that the chance constraint simply states that

    F_S(sum(i=1..n) s_i x_i) <= 1 - a

when the distribution function F_S is given by F_S(t) = P[S < t].
Since the x_i are to be chosen as constants and not as functions of S, they are "zero-order" decision rules in the usual terminology of chance-constrained programming. It is not difficult to prove, using the monotonicity of F_S and the definition of F_S^-1 (given by F_S^-1(1 - a) = sup{t : F_S(t) <= 1 - a}), that the set of values for the x_i which satisfy the chance constraint of (2) is the same as the set of values for which

    sum(i=1..n) s_i x_i <= F_S^-1(1 - a)        (3)

where F_S^-1(1 - a) denotes the 1 - a fractile of the distribution function of S. Note that this is true even if F_S is discontinuous or not one-to-one. In view of this, (2) is equivalent to the ordinary linear programming problem

    maximize   sum(i=1..n) u_i x_i
    subject to sum(i=1..n) s_i x_i <= F_S^-1(1 - a)        (4)
    and        0 <= x_i <= 1.

Two observations are pertinent here. First, F_S^-1(1 - a) provides a "certainty equivalent" for the random space S which will actually be available. The structure of the linear programming problem (4) is so special that an optimal solution is obtainable by inspection, as is well known. Such an optimal solution is determined as follows. Order the variables x_i such that u_1/s_1 >= u_2/s_2 >= ... >= u_n/s_n and take x_1 = 1, x_2 = 1, ... until x_k = 1 would violate (3). x_k should then be set to the fractional value

    [F_S^-1(1 - a) - sum(i=1..k-1) s_i x_i] / s_k

in order to just use up the remaining space. However, if a happened to be such that

    F_S^-1(1 - a) = sum(i=1..k-1) s_i x_i,

we can see that no variable x_i would need to be set to a fractional value. In view of the approximate nature of the policy restriction in (2), it is not unreasonable to expect this to be the case for some confidence level close to a if not for a itself. We therefore implement the following scheduling method: schedule jobs in decreasing order of their u_i/s_i values until the available space [F_S^-1(1 - a)] is used up. In practice, each job has associated with it a "job cost" c_i = s_i t_i. In this terminology, the scheduling order is determined by the order of increasing costs c_i, since u_i/s_i = 1/(t_i s_i) = 1/c_i; this rule is particularly simple in form. Our formulating the space requirement as a chance constraint has another motivation. If batch jobs were never permitted to delay interactive jobs because of conflicting space demands, corresponding to a choice of a = 1, the result would be a high incidence of unused memory space, a smaller number of active batch jobs, and a resulting impairment of CPU efficiency. However, a small batch job (e.g., the last one scheduled) can be swapped to ECS when interactive needs dictate, with a modest reduction in ECS capacity for interactive jobs and in interactive response speed. In practice, when the loaded batch jobs are confronted with a demand for space by an interactive job (corresponding to a violation of the space limitation), the last batch job scheduled is swapped to ECS. In view of a choice of a approximately equal to .9, this will happen only about 10 percent of the time, which is in accord with the policy toward interactive response time and efficient utilization of the CPU.
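The resulting rule is compact enough to state directly in code. The sketch below is ours, not the installation's scheduler; the job representation and names are hypothetical, and F_S^-1(1 - a) is taken as a precomputed number.

    # Illustrative sketch of the derived rule (our code, not the system's):
    # load jobs in order of increasing cost c_i = s_i * t_i until the
    # certainty-equivalent space F_S^-1(1 - alpha) is used up.
    def schedule(jobs, space_fractile):
        """jobs: list of (job_id, s_i, t_i); space_fractile: F_S^-1(1-a)."""
        loaded, used = [], 0
        for job_id, s, t in sorted(jobs, key=lambda j: j[1] * j[2]):
            if used + s <= space_fractile:   # greedy fill by cheapest cost
                loaded.append(job_id)
                used += s
        return loaded

    # Example with a hypothetical 40,000-word certainty equivalent:
    print(schedule([("a", 21_000, 30), ("b", 8_000, 5), ("c", 15_000, 2)],
                   40_000))
    # -> ['c', 'b']: the two cheapest jobs fit; 'a' must wait.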
In summary, by a formulation of the scheduling problem as the maximization of the sum of the completion rates 1/t_i of the jobs scheduled, subject to permissible interference with interactive jobs if this interference is sufficiently infrequent, an optimal scheduling policy with extremely simple form can be achieved. In the next section we consider the effect of this scheduling policy on the memory compacting problem.

MEMORY COMPACTING PROBLEM

In a non-paged memory system with only a single bounds-checking facility per process, the memory allocated to a given job must be contiguous. When a job terminates in a multiprogramming non-paged system, the total memory available to the next job scheduled is potentially the sum of all unallocated memory regions. This is the amount assumed in the previous discussion of batch job scheduling; i.e., we assumed that memory compacting was done whenever required. Making this potential total actually available frequently requires that other jobs, which are still running, be moved. We now consider the factors which affect the frequency with which these storage moves are necessary and the memory management policies used to minimize the total cost of compacting. Memory compacting may be necessary whenever a job ceases to occupy its memory. For a batch job, this situation can arise because of a termination or a pre-emption. For an interactive job, the situation is caused primarily by a terminal wait condition or a time-slice pre-emption. In the current system, the memory requirements of the batch system change about once every 10 seconds. In the worst case, this will require approximately 30 milliseconds of compacting with one block of ECS dedicated to this application. The effects of the changing memory requirements of the interactive partition on the necessity for memory compacting could be much more serious. If these changes require that batch jobs be moved, the overhead will be substantial. In addition, in the previous section we discussed the possibility of swapping a batch job to and from ECS in order to fill in the valleys of interactive storage requirements and increase the average number of active processes in order to decrease the CPU idle time. If the swapping of this batch job requires that other batch jobs be moved, the overhead could also be large. In order to avoid these problems most of the time and minimize the cost of memory compacting, we use the following ordering policy for active processes. All batch jobs are packed into one end of main memory in the order scheduled. Since jobs are scheduled in order of increasing costs, the batch job likely to be swapped because of interactive memory demands normally is last, i.e., closest to the unallocated region. Hence swapping of this job will not require moving other batch jobs. The interactive control point is placed last. It is next to the unused memory. This unused memory is sufficient to satisfy its demands 90 percent of the time, as described in the previous section.

Figure 4-Main memory allocation transition

Figure 4 illustrates the insertion policy used by the scheduler to maintain this ordering. Job a, the lowest cost batch job running, terminates. The scheduler decides to load job e, whose cost is between the cost of job d and job c. Thus jobs b and c must be moved down to fill the gap left by a, and job d must, in general, be moved to either make room for job e or fill an additional gap created by job e. It should be noted that the cost of a job varies with time, so that the cost ordering of jobs in memory may change. In practice, this rate of change is very small with respect to the scheduling rate for the interactive control point. When it does happen, the highest cost job will be pre-empted when a pre-emption is necessary, regardless of the ordering in main memory.
If this job is then rescheduled, it will be put in the proper place to restore the cost ordering in memory. This policy minimizes moves due to changes in the memory demands of the interactive system but maximizes moves generated by completions in the batch system. However, the average rate of completion of batch jobs is one every 10 seconds. Thus this overhead is still less than .3 percent in practice.

CONCLUSION

A non-paged multiprogramming computer system required to support a time-sharing system and a batch-processing system faces the problems of memory compacting and memory demand interference. Proper memory management policies can minimize these difficulties. Measurements of the system and job characteristics provide the basis for an adequate design. We have shown that such a design for a CDC 6600 and CDC Extended Core Storage can support an interactive load of approximately 40 users with a response time of less than a second at very small cost to a highly efficient batch processing system.

ACKNOWLEDGMENTS

This research was sponsored in part by the Control Data Corporation under a research grant to the Computation Center at The University of Texas at Austin and by the National Science Foundation under grant GJ-741.

REFERENCES

1 J FOTHERINGHAM Dynamic storage allocation in the Atlas computer including an automatic use of a backing store Communications of the ACM October 1961
2 B W ARDEN B A GALLER T C O'BRIEN F H WESTERVELT Program and addressing structure in a time-sharing environment Journal of the ACM January 1966
3 P DENNING Resource allocation in multi-process computer systems PhD Dissertation Mass Inst of Tech June 1968
4 R E FIKES H C LAUER A L VAREHA Steps toward a general purpose time-sharing utility using large capacity core storage and TSS/360 Proc of Nat Conf ACM 1968
5 K FUCHEL S HELLER Considerations on the design of a multiple computer system with extended core storage Communications of the ACM November 1968
6 W ANACKER C P WANG Performance evaluation of computing systems with memory hierarchies IEEE Trans on Electronic Computers December 1967
7 D N FREEMAN A storage-hierarchy system for batch processing Proc AFIPS SJCC 1968
8 H D SCHWETMAN J C BROWNE A study of central processor inactivity on the CDC 6600 To be published 1970
9 H N CANTRELL A L ELLISON Multiprogramming system performance measurement and analysis Proc AFIPS SJCC 1968
10 U N DE MEIS N WEIZER Measurement and analysis of a demand paging time sharing system Proc ACM 1969
11 B W ARDEN D BOETTNER Measurement and performance of a multiprogramming system Proc Second Symposium on Operating Systems Principles Princeton New Jersey 1969
12 H D SCHWETMAN J R DELINE An operational analysis of a remote console system Proc AFIPS FJCC 1969
13 A L SCHERR An analysis of time-shared computer systems Research Monograph No 36 MIT Press Cambridge Massachusetts 1967
14 G E BRYAN JOSS: 20,000 hours at the console-A statistical summary Proc AFIPS FJCC 1967
15 K FUCHEL G CAMPBELL S HELLER The use of extended core storage in a multiprogramming operating system Proc Third International Symposium on Computer and Information Sciences Miami Beach Florida 1969
16 A CHARNES W W COOPER Deterministic equivalents for optimizing and satisficing under chance constraints Operations Research Vol 11 1963
17 D P GAVER JR Probability models for multiprogramming computer systems Journal of the ACM July 1967

A study of interleaved memory systems

by G. J. BURNETT
Index Systems, Inc.
Boston, Massachusetts

and

E. G. COFFMAN, JR.
Princeton University
Princeton, New Jersey

INTRODUCTION

There is frequently a severe mismatch between achievable processor and memory speeds in today's computer systems. For example, the CDC-7600 has a 27 ns (nanosecond) processor cycle time and a 270 ns memory cycle time [1]; the IBM-360/91 has a 60 ns processor cycle time and a 750 ns memory cycle time [2]. In order to obtain the desired increase in the effective memory speed, an efficient memory system must use such techniques as interleaving memory modules and implementing an automatic level in a memory hierarchy (e.g., a slave memory [3] as in the IBM-360/85 [4] or 195 [5] and the CDC-7600). In the past, interleaving was often studied by simulation using a random address generating source to obtain memory requests [6, 7]. This paper discusses results of mathematical analyses of models of interleaved memory systems. In these investigations the properties of addresses generated by instructions and data have been distinguished. Interleaving is achieved by dividing the memory into separate, independent modules that can be in simultaneous operation. Information is then stored in the memory with sequential items residing in modules that are consecutive, modulo the number of memory modules; equivalently, the low-order address bits specify the memory module number, and the high-order bits specify the word within a module. We will first present a model of interleaved memory systems. In the analysis of this model we obtain a figure of merit for such systems, viz., the average number of memories in operation on data or on instructions during a memory cycle. Results of numerical investigations of this figure of merit, which we call the average memory bandwidth, will be displayed both individually for instruction and data requests and for a combination of these requests into a system structure utilizing interleaving.

ANALYSIS OF A MATHEMATICAL MODEL OF INTERLEAVED MEMORY SYSTEMS

Model and terminology

The model is pictured in Figure 1. There are n identical modules, each capable of reading or writing one word per memory cycle. We shall assume that the modules operate synchronously and with identical memory cycle times. In practice the Request Queue contains conventional instruction and data storage addresses; however, for our purposes only the module number from the address is of interest. Thus, we will consider the requests r_i, i = 1, 2, ..., to be integers from the set S_n = {0, 1, ..., n - 1}. The Scanner operates by admitting new requests to service until it attempts to assign a request to a busy memory module. To do this, prior to the start of a given memory cycle, i.e., during the previous memory cycle, the Scanner inspects the Request Queue beginning with r_1 and determines the maximum-length sequence of distinct module requests. That is, it scans the queue to the first repetition of a module request. The memory requests in this maximum-length sequence are then sent to the appropriate memory modules so that they will be active in the next memory cycle. The maximum-length sequences found in this manner are called request sequences, and their lengths can be from 1 to n requests. We shall assume that the Request Queue always contains at least n items when inspected.
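The Scanner's rule translates directly into code. The following sketch is ours, not the authors': it extracts one request sequence from the head of a saturated queue and, under the uniform random-address assumption analyzed below, estimates the average bandwidth by repetition.

    import random

    # Illustrative sketch of the Scanner rule (our code, not the paper's):
    # take requests from the head of the queue until a module number
    # repeats; the requests taken form one request sequence.
    def request_sequence(queue):
        seen, seq = set(), []
        for r in queue:
            if r in seen:          # first repetition ends the sequence
                break
            seen.add(r)
            seq.append(r)
        return seq

    def average_bandwidth(n, cycles=100_000):
        # saturated queue of uniformly random module requests; n+1 items
        # guarantee a repetition, so sequences have length 1..n
        total = 0
        for _ in range(cycles):
            queue = [random.randrange(n) for _ in range(n + 1)]
            total += len(request_sequence(queue))
        return total / cycles

    print(average_bandwidth(8))    # close to 8**0.56, about 3.2, for n = 8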
In effect, the queue will always be saturated and the system operating at capacity. More formally then, a sequence of requests r_1, r_2, ..., r_k at the head of the request queue is a request sequence if and only if (1) for all i, j (1 <= i <= k, 1 <= j <= k), i != j implies r_i != r_j, and (2) there exists a p (1 <= p <= k) such that r_p = r_{k+1}. We will assume that the Scanner can process n requests per memory cycle; this is equivalent to assuming that the processor can handle n words per memory cycle. (We shall shortly consider the case where the Scanner can process M <= n requests per cycle.) It is then clear that the effectiveness of an interleaved memory system is determined by the probability density function (pdf) for the lengths of request sequences. Let P(k), k = 1, 2, 3, ..., denote this pdf. Then B_n = sum(k>=1) k P(k) denotes the mean value of this pdf. B_n will also be called the average memory bandwidth, with the units of words/memory cycle. In practice, repetitions in the Request Queue occur sufficiently often that B_n is considerably less than n, the maximum value for B_n. We will thus be interested in considering systems with M processor cycles per memory cycle where 1 <= M <= n. (Clearly M > n is useless in terms of obtaining information from the memory, since we are assuming we have only n memory modules, each capable of accessing one word per memory cycle.) Such a system allows a slower processor to be used and yet may offer almost the same average memory bandwidth as a system with M = n. The average memory bandwidth for such a system will be denoted by B_M and will also be calculated directly from the P(k) mentioned above.

Figure 1-Interleaving model (Request Queue, Scanner, n memory modules, return path)

To compute the P(k) (and therefore B_n), it is necessary to know the properties of the sequences r_1, r_2, r_3, ... in the Request Queue. Hellerman [8] has analyzed this model under the assumption that for all i, Pr[r_i = j] = 1/n for all j in S_n, and that this probability is stationary (i.e., the same for every memory cycle). A simple analysis for this model shows that the probability of a string of k distinct integers followed by a repetition of one of these k is given by

    P(k) = (n - 1)(n - 2) ... (n - k + 1) k / n^k = k (n - 1)_{k-1} / n^k        (1)

where we use the notation [9]

    (i)_j = i (i - 1)(i - 2) ... (i - j + 1)   (1 <= j <= i),   and   (i)_0 = 1.

Hence,

    B_n = sum(k=1..n) k P(k) = sum(k=1..n) k^2 (n - 1)_{k-1} / n^k        (2)

Hellerman has carried out curve fitting to get the approximation B_n ~ n^0.56, 1 <= n <= 45, which is accurate to within about 4 percent. In his model, Hellerman assumed that instruction and data requests were intermixed in the Request Queue. Inasmuch as successive instruction requests tend to have more serial correlation than successive data requests, we have chosen to represent instruction and data request sequences separately in our model. In addition we will investigate a system structure in which these requests are handled separately. Thus, we consider the model of Figure 1 to contain two separate queues, the Instruction Request Queue and the Data Request Queue. In this paper, we will only consider a system structure that alternates instruction and data cycles. That is, for one memory cycle the Scanner obtains all requests from the Instruction Queue, and for the next cycle only the Data Queue is scanned. We refer to this system structure as the Instruction Data Cycle Structure, IDCS.
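Equations (1) and (2) can be evaluated directly; the sketch below (ours, for illustration) computes B_n exactly and compares it with Hellerman's n^0.56 fit.

    # Evaluate equations (1) and (2) numerically and compare with
    # Hellerman's curve fit B_n ~ n**0.56 (illustrative code, ours).
    def falling(i, j):
        """(i)_j = i (i-1) ... (i-j+1), with (i)_0 = 1."""
        out = 1
        for m in range(j):
            out *= i - m
        return out

    def bandwidth(n):
        # B_n = sum over k = 1..n of k^2 (n-1)_{k-1} / n^k   -- equation (2)
        return sum(k * k * falling(n - 1, k - 1) / n**k
                   for k in range(1, n + 1))

    for n in (2, 4, 8, 16, 32):
        print(n, round(bandwidth(n), 3), round(n**0.56, 3))
    # The exact values track n**0.56 to within a few percent, as reported.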
The above operation enables us to carry out separate mathematical analyses in order to determine the average memory bandwidth for an instruction cycle, IB_n, and for a data cycle, DB_n. B_n is obtained from a weighted sum of IB_n and DB_n. The weighting depends on the assumed percentage of the instructions that request data and as a result on the actual values of IB_n and DB_n. Studies of program composition [10] suggest that approximately 80 percent of all instructions require an operand (data) reference to storage. We will make the somewhat more conservative assumption that 80 percent of the instructions executed, excluding branch instructions, request data. We shall use this as an assumption to calculate B_n from DB_n and IB_n.

Instruction model

The efficient use of an interleaved memory in a single processor system is predicated on the existence of a processor fast enough to handle several instructions per memory cycle. In order to obtain this multiple-instruction capability, each memory cycle the processor requests a number of instructions beyond the present instruction counter address, i.e., it looks ahead. Therefore, such a system must have buffering for instructions, for data (operands), and for storage addresses. For the purpose of determining the memory bandwidth we only need to assume that the instruction buffer holds the maximum number of instructions that can be obtained from the memory during a memory cycle, n, and that the instruction decoding unit can decode all these instructions in one memory cycle. (In other words, the system is capable of decoding one instruction per processor cycle.) We also assume that a branch instruction requested during one memory cycle will be decoded in the next memory cycle and will immediately affect the instruction stream. Under the foregoing assumptions, our system operates as shown in Figure 2 and as described below.

Figure 2-Instruction processing. One memory cycle spans M processor cycles. Instruction cycle: 1. n instructions are being fetched from the memory; 2. data for the next cycle is requested. Data cycle: 3. instructions return from memory and are decoded one per processor cycle, with instructions after a branch ignored (internal waste); 4. instructions for the next cycle are requested. Note that all n sequential instructions can be requested at once, since they are for sequential words that are located in sequential memory modules.

1. At the beginning of an instruction cycle the Scanner requests the n sequential instructions following the present instruction counter address. Thus, n memory modules are busy during this cycle.

2. If an instruction requesting a branch is decoded during the next data cycle, the n sequential requests for the next instruction cycle will start from the branch address, i.e., the instruction counter will be loaded with the branch address. If a branch is not found, the next instruction cycle will request n sequential words starting with the word following the last requested instruction.

With the Instruction Queue we associate the parameter L, the stationary and independent probability that any given instruction generates a branch. Observe that in the data cycle following an instruction cycle in which a branch was requested, all instructions that were requested after the branch will not be decoded. (This is all right, since the look-ahead mechanism only guessed that n sequential instructions would be used.) The instructions that are requested but not decoded in a given memory cycle are referred to as internal waste. With L given, internal waste is accounted for in P(k), the probability that k of the n instruction words obtained in an instruction cycle are actually decoded:

    P(1) = L
    P(k) = (1 - L)^{k-1} L,   1 < k < n
    P(n) = (1 - L)^{n-1}
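This P(k) is a geometric distribution truncated at n, so the expected number of instructions actually decoded per instruction cycle has the closed form (1 - (1 - L)^n)/L. The sketch below is ours (the paper's full development of IB_n is not reproduced at this point in the scan); it checks the closed form against the definition.

    # Expected instructions decoded per instruction cycle under the branch
    # model above (illustrative code, ours). P(k) is geometric with
    # parameter lam, truncated at n.
    def expected_decoded(n, lam):
        direct = sum(k * (1 - lam)**(k - 1) * lam for k in range(1, n)) \
                 + n * (1 - lam)**(n - 1)
        closed = (1 - (1 - lam)**n) / lam   # closed form of the same sum
        assert abs(direct - closed) < 1e-9
        return closed

    for lam in (0.05, 0.10, 0.20):
        print(lam, round(expected_decoded(16, lam), 2))
    # With a 1-in-10 branch probability, look-ahead over 16 words yields
    # only about 8 useful instructions per cycle: internal waste is large.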
In the translation of array bounds in ALGOL 68, one has to take into account the possibility of undefined array limits indicated by the "[ ]" sequence or by "flex" in place of a definite upper or lower bound, as in the following translation rules:

    ... ] (declarator) => | [ ] (declarator) => | union ((declaratorpack)) =>        (13)-(14)

As handled in the translation scheme above, all arrays translate into lists. Hence, a lower index bound of 1 always exists for each dimension of an array, whether or not the programmer asks for that lower bound. In addition, arrays with no bounds specified for some dimension contain one-element undefined sublists for that dimension. Translation of variables declared to be of type union is simple and direct, since our run-time system treats all variables as though they were capable of storing any legal data type. The translator merely keeps a record of which variables are of a particular data type.

A (clausesequence) is translated by the following rules:

    (clausesequence) -> (clause) => .$v(clause)$.^ | (clausesequence), (clause) => (clausesequence), .$v(clause)$.^        (17)

A typical procedure definition in the translation of (clausesequence) would look like the sequence .$v(clause)$.^. Here, the ".$" command tells the run-time system that what follows is a procedure definition. It therefore leaves as its value the value of the translated-program location counter that begins the translated (clause). Since the procedure is not activated when the procedure definition is assigned to some variable, the ".$" command is followed by a jump to the code directly following the procedure definition. The "$." command is part of the code in the procedure definition. The "$." is executed as a return-jump command that looks up a return label in a table of return jumps and transfers control back to that point in the program where the procedure was called.

Finally, for our particular choice of intermediate language, subscripting is accomplished by the ")" command. This ")" command assumes that the topmost operand of the run-time operand stack is an integer number and that the next-to-top operand is a reference to a list cell. Thus, we have the translated sequence (subscriptvar) (subscripts) ) given in rules (16), and the following rules for translation of (subscripts):

    (subscripts) -> (sum) => | (subscripts), (sum) => (subscripts) ) (sum)        (18)

To complete this description of subscripting, we introduce the next layer of rules above rules (17) in the system of precedence:

    (selection) -> (subscriptvar) => | (name) of (selection) => (selection) .3[(name)] )        (19)-(20)

The translated sequence ".3[(name)] )" in rules (20) represents the translator-supplied subscript number followed by the subscripting command. This subscripting capability of course implies that the translator must itself keep track of structures and pointers from one element of a structure to another structure. In this way, the translator must have a list-processing capability for tracing lists and sublists to any desired depth of nesting.

DATA CONSTANTS AND PROCEDURE DEFINITIONS IN ALGOL 68

At this point, it is convenient to introduce the data types used in the language. Data of type char (character) is denoted by the following syntax:

    (charprim) -> "(alphameric)" => .*(alphameric)        (21)

The sequence ".*(alphameric)" stores that symbol on top of the run-time operand stack. Here, (alphameric) is whatever set of symbols is available for a particular computer. By convention, a quote symbol (") is represented by a pair of quotes ("") in the language. Thus, the assignment v := """"; stores a single quote in character variable v. Data of type bool (logical) is denoted by the following syntax:

    (logicalprim) -> true => $TRUE | false => $FALSE        (22)

The $TRUE ($FALSE) command stores the internal representation of logical truth (or falsity) on top of the run-time operand stack.
Although ALGOL 68 allows real and integer numbers, as well as multiple-precision versions of numbers, we have for simplicity translated all numbers into single-precision floating point:

    (number) -> (integer) => $NUMBR 3[(integer)] | (integer).(integer) => $NUMBR 3[(integer).(integer)]        (23)

In both cases of rules (23), the command "$NUMBR" is followed by the internal floating-point representation of the appropriate character strings. "$NUMBR" serves to place the following translated word on top of the run-time operand stack. Thus, in our syntax, the numerical subscripting of (16) takes precedence over the logical subscripting of (19) because of the ordering of rules. When a variable is logically subscripted as described in (19), the translator provides a numerical subscript to the translated program, and this numerical subscript corresponds to the position in its own structure of the logical subscript that is used.

VARIABLES AND SUBSCRIPTING IN ALGOL 68

From the syntax in the preceding section, we see that the declaration

    struct (real x, [1:10, 1:5] int y) z;

is legal in ALGOL 68. This declaration causes z to refer to a two-element structure of values whose second element is a two-dimensional array that stores fifty integers. In order to store values into z or extract values from z, these elements of z are referred to by name; e.g.,

    x of z := (y of z) [8, 3];

In this statement, the forty-third integer in the second element of z is converted to a real number and stored into the first element of z. Thus, we have two methods for subscripting variables in ALGOL 68, and these methods can be used separately or in combination. Another feature of variable usage in ALGOL 68 is that variables can be "selected" before being subscripted. Thus, the statement

    a := if g1 then b else c fi [5, 3];

is valid in ALGOL 68 if a, b, c and g1 are appropriately declared, and the subscripts "[5, 3]" are within the bounds of b and c. Another, essentially similar, usage is one in which a case statement (similar to the case statement in ALGOL W) is used for selecting a variable:

    if g1 then d else e fi := case i in a, b, c esac [2, 4];

This use of conditional expressions for selecting variables in ALGOL 68 leads to a translation difficulty in which the translator is unable to decide whether it is translating a chain of variable references or expressions yielding values, because either interpretation is correct for a conditional expression. To get around this difficulty, we have decided to write our translation grammar so as to treat every expression as though it were a variable reference until the last possible minute.
Thus, we obtain the following strange-looking syntax of subscripted variables:

    (name) -> (letter) => | (name)(letter) => | (name)(digit) =>
    (subscriptvar) -> (name) | (block) | if (clause)(1) then (clause)(2) else (clause)(3) fi | case (clause) in (clausesequence) esac | (subscriptvar) [(subscripts)]        (15)

In rules (15), a table-lookup mechanism informs the translator that some (name) is actually a declared variable. Then, rules (16) are applied:

    (subscriptvar) => $VARBL (name)
      | (clause)(1) $IFv (clause)(2) $THEN^v (clause)(3)^
      | (clause) . (.$ (clausesequence) $.) . $VARBL $CASE $IN
      | (subscriptvar) (subscripts) )        (16)

In rules (16) above, it should be explained that the translator must have an operand stack that permits it to keep track of data types associated with variables and expressions. With this operand stack, the translator will only allow subscripting to follow an expression whose value is a variable reference. In translating the conditional statement of rules (16), the translator also uses a stacking mechanism to supply program labels to the conditional branch command "$IF" and the unconditional branch command "$THEN". These labels are indicated schematically in the rules by the "v ... ^" notation. Finally, a case statement is translated into a two-parameter procedure call on the system procedure "$CASE". The first parameter is an expression that has a subscript as its value, and the second parameter is a run-time list of parameterless procedure definitions that is to be subscripted by the first parameter in the process of calling one of the list elements. This list of procedure definitions is constructed by the (clausesequence) rules (17).

For convenience in initializing small arrays, ALGOL 68 allows structures similar in appearance to lists. So, we will call them lists, instead of using the ALGOL 68 name for them:

    (listprim) -> ((clause), (listend) => .((clause), (listend)
    (listend) -> (clause)) => (clause)) . | (clause), (listend) => (clause), (listend)        (24)

Here, if we use the declaration "[1:2] real x" and follow this by the assignment "x := (1.0, 3.5)", we can see how the lists are used. Thus, these list primaries translate directly into our own notation, with the exception that lists of zero or one elements cannot be represented using list notation. This is to avoid an ambiguity, e.g., in which the sequence "(1)" is interpreted as being both a (block) and a (listprim). In an analogous vein, ALGOL 68 has provisions for representing strings of symbols:

    (stringprim) -> "(alphameric)(stringend) => .(.*(alphameric), (stringend)
    (stringend) -> (alphameric)" => .*(alphameric)) . | (alphameric)(stringend) => .*(alphameric), (stringend)        (25)

Thus, a string translates into an array of characters in the same fashion as in ALGOL 68. Here again, strings of length zero and one cannot be directly represented, to prevent ambiguity. As we will see, it is possible to invent a data type called string whose variables all consist of character arrays with flexible upper bounds. Finally, we include procedure definitions in this section because procedure definitions can be assigned to variables in the same way as other data constants. We have chosen to use an older version of the ALGOL 68 procedure definitions, because the keyword "expr" used in these older versions is subject to fewer usages in the language than the ":" that replaces expr in the newer syntax. We have also chosen to force the programmer to be explicit about his choice of procedure definitions in our version.
Thus, the assignment statement

    v1 := if g1 then expr v2 else expr v3 fi;

(Assign procedure definition v2 to v1 if g1 is true; otherwise assign procedure definition v3.) is legal in our version of ALGOL 68 as well as the "full" version of the language. But we do not allow the alternate assignment statement

    v1 := if g1 then v2 else v3 fi;

even though this is also legal in ALGOL 68 and is equivalent to the first statement. We thus have the syntax of procedure definitions in rules (26):

    (procdef) -> expr (assignment) => .$v(assignment)$.^
      | (indication) expr (assignment) => .$v(assignment)$.^
      | ((p.list)) expr (assignment) => .$v(p.list)(assignment)$.^
      | ((p.list)) (indication) expr (assignment) => .$v(p.list)(assignment)$.^        (26)

These procedure definitions are of the same form as described in rules (18). However, here the procedure definition symbols ".$" and "$." surround what will be seen to be single statements. In order for a procedure to be permitted to return a value in ALGOL 68, the type of its returned value must be indicated in the procedure definition. Thus, all the procedure definitions having an (indication) before the expr keyword behave like ALGOL 60 or FORTRAN functions, while the remaining definitions are similar to procedures in those languages. Finally, we have the translation syntax of the parameter list:

    (p.list) -> (type)(name) => $FORMA (name) | (p.list), (name) => $FORMA (name) (p.list) | (p.list), (type)(name) => $FORMA (name) (p.list)        (27)

Here, the formal parameters are translated into a reverse sequence. The command "$FORMA" has the effect of entering the following (name) of a parameter onto the run-time name table. $FORMA then fetches the topmost operand on the run-time operand stack and stores its value and type into the newly declared parameter (name). Because of this method for passing parameters to procedures, the old call-by-name and call-by-value conventions in ALGOL 60 procedures have been abandoned in favor of allowing the programmer to pass procedure definitions, pointers to variables, and actual values into his procedure calls. As an illustration, if v1 is a reference variable, v2 is an integer variable, and v3 is an appropriate procedure name, then the procedure call

    v3 (v1, int expr v2, 5)

passes as parameters to the procedure definition of v3 a pointer stored in v1, a procedure definition whose body "calls" v2, and an integer value. As we will see, the procedure calling mechanism puts these three parameters onto the operand stack in the right sequence for allowing the $FORMA commands to pick off their values for the formal parameters.

PROCEDURE CALLS AND EXPRESSION PRIMARIES

In a number of preceding rules, we implicitly assumed the existence of a procedure calling mechanism in our intermediate language. This mechanism works as follows. A typical procedure call is of the form

    (value 1) (value 2) ... (value n) $VARBL (name) $IN

The n values are stored in sequence on the run-time operand stack, and then the command sequence "$VARBL (name) $IN" passes control to the procedure. This procedure is assumed to have n formal parameters, and therefore n "$FORMA (name)" sequences at the beginning.
These "$FORMA (name)" sequences pick off the values stored on the operand stack in the reverse of the sequence in which they were stored. Hence, any procedure calling system must have a mechanism for matching actual and formal parameters. When we discuss the extendible operations feature of ALGOL 68, we will justify the necessity for our use of this procedure calling command sequence. The mechanism needed for matching actual and formal parameters is further complicated by the legality of ALGOL 68 procedure calls such as in the following example:

    if g1 then p1 else p2 fi [3, 9] (4, real expr p3);

Here, one of two procedure arrays is chosen by a conditional statement. Then, the chosen procedure array is subscripted, and finally the actual parameters are supplied when that procedure element is called. To explain our method for translating such ALGOL 68 procedure calls, we give the following syntax of procedure calls:

    (procall) -> (subscriptvar) ((a.p.list)) => (subscriptvar) . (a.p.list) ) . $VARBL $XEQUT $IN        (28)

We see that a program procedure call with parameters is translated into a call on a system procedure named "$XEQUT". This is accomplished by transforming the program procedure call into two parameters. The first parameter is a (possibly subscripted or selected) pointer to where the procedure call is stored on the run-time name table. The second parameter is a link to a list constructed from the actual parameter list of the program by rules (28) and (29):

    (a.p.list) -> (assignment) => | (a.p.list), (assignment) =>        (29)

The listing of the $XEQUT procedure can be found in Appendix 1. A procedure call without parameters can be treated syntactically like an ordinary value primary of the language. This is because the sequence (selection) $IN is sufficient to execute procedures as well as to bring values referenced by (selection) pointers to the operand stack. We can thus begin to give a syntax for expression primaries in ALGOL 68:

    (prim) -> (procall) => | (procdef) => | (stringprim) => | (listprim) => | (number) => | (logicalprim) => | (charprim) => | (selection) => (selection) $IN | (referenceprim) => | val (prim) => (prim) $IN        (30)

In rules (30) above, the $IN command is supplied by the translator to fetch values in two instances. For the first instance to apply, the translator program must determine by inspection of its operand stack that (selection) appears after the left part of an assignment statement such that the left part is not of type reference. This is because the assignment statement a := b;, where "a" is a variable of type reference, is legal in ALGOL 68. Thus, "b" must be treated as a (referenceprim), and the translator must determine by context (using its own operand stack) whether or not some (selection) is a (referenceprim) as in rule (31) or a value-bearing (prim) as in rules (30):

    (referenceprim) -> (selection) =>        (31)
These operators can be redefined by the programmer, who may change their precedences, introduce new procedures that describe their actions, and introduce new operators to suit his convenience. The syntax for such redefinable operations must take into account this new facility: ~ (op 10) (prim) => (prim) $VARBL (op 10) $IN I go to (prim) => (prim) $GOTO (complex) ~ (unary) => I I (complex) (op 9) (unary) => (complex) (unary)· $VARBL (op 9) $IN (exponent) ~ (complex) => I I (exponent) (op 8) (complex) => (exponent) (complex) $VARBL (op 8) $IN (product) ~ (exponent) => I I (product) (op 7) (exponent) => (product) (exponent) $VARBL (op 7) $IN (sum) ~ (product) => I I (sum) (op 6) (product) => (sum) (product) $VARBL (op 6) $IN (inequality) ~ (sum) => I I (sum)(l) (op 5) (sum)(2) => (sum)(l) (sum)(2) $VARBL (op 5) $IN (confrontation) ~ (inequality) => I I (inequality)(l) (op 4) (inequality)(2) => (inequality)(l) (inequality)(2) $VARBL (op 4) $IN (conjunction) ~ (confrontation) => I I (conjunction) (op 3) (confrontation)=> (conjunction) (confrontation) $VARBL (op 3) $IN (disjunction) ~ (conjunction) => I I (disjunction) (op 2) (conjunction) => (disjunction) (conjunction) $VARBL (op 2) $IN (assignment) ~ (disjunction) => I I (selection) (op I) (assignment) => (selection) (assignment) $VARBL (op 1) $IN The complete table of standard ALGOL 68 operators is given in section 8.4.2 of (11). In rules (32) above, we use operator categories (op 1), ... , (op 10) to replace the usual arithmetic, logical, and relational operators that appear in similar grammars. For the translator to know which category an operator belongs to, it must have a table of legal operators similar to its nametable, and with each operator will be an associated level number. To each of these operators there corresponds either a standard intermediate-language operation (in which case, the intermediate language operation is written into the translated program) or a procedure definition (in which case the procedure call "$VARBL (OPn) $IN" is written into the translated program). Procedures defining the .standard operations and their effects when executed are given in section 10.2 of (11). It should be mentioned that we would include several operations that are not in the ALGOL 68 table. For example, the standard operations of (con- (32) frontation). are the relational "=" and "~". In addition to these standard operators at that level, the ALGOL 68 conformity symbol "::" (which checks whether two expressions are of the same mode) and the ALGOL 68 identity symbol ": = :" (which asks whether two expressions yield references to the same (name» are included because they are used in essentially the same way as "=" and "~". Note also that the definition of a jump instruction in ALGOL 68 is put into the (unary) rule of (32) because its precedence in the language is compatible with that level of the grammar. However, the "go to" operation is most emphatically not redefinable, and so is listed separately. OPERATOR DEFINITIONS AND DECLARATIONS Now that we have seen a-syntax for expressions, we can discuss the syntax of operator declarations and A Translation Grammar for ALGOL 68 priority declarations in ALGOL 68. A priority declaration has the form (priority dec!.) ~ priority (operator) = (priority) I (priority dec!.), (operator) = (priority) (33) A priority declaration is not translated, since its role is (operator dec!.) 503 to provide information to the operator table of the translator. 
Naturally, the (priority) is some (integer) from 1 to 10: (priority) ~ 1 I2 I ••• 110 (34) An operator declaration takes the form ~ op (operator) = (p.list») (indication) expl' (assignment) ==? $NEW (operator) $VARBL (operator) .$'( (p.list) (assignment) $.1 (3.5) In (35), the translator enters the new (operator) name onto its operator table and translates the operator declaration as a new procedure definition. Although the (p. list) l!lechanism is used for simplicity in the translation process, an operator declaration is meaningless unless the (p. list) consists of only one or two parameters. Moreover, as in ALGOL 68, it is assumed that all unary operators have a non-redefinable priority of 10. This is because of the ambiguity that would result if a programmer attempted to redefine the precedence of unary "+" or "-", where the binary addition and subtraction have the same denotations. As an example of an operator declaration, we give here our version of subtraction with real operands in ALGOL 68: variable, and the translation of (indication) becomes its value. This means that, along with its name table and operator table, the translator must have a table of indicants that stores links at translation to any structures that may be assigned- to indicants. Thus, when a variable is logically subscripted whose mode is a structure, the translator can find its own prototype copy of that structure and supply appropriate numerical subscripts in t!!e translation. Because the (indicant) is treated as a program variable by the run-time system (and as a mode declarator by the translator), the translation of (indicant) given in rules (5) presents to the $COPY procedure a list which is copied and assigned to all variables later declared to be of the mode given by the (indicant). op- = (ref real a, b) real expr val a minus val b; op- = (ref real a, real b) real expr val a minus b; op- = (real a, ref real b) real expr a minus val b; op- = (real a, b) real eXl!r a minus b; PROGRAM STRUCTURE OF ALGOL 68 Here, the operator "-" is translated into the intermediate-language comm:and for subtracting the two topmost values on the run-time operand stack. This definition should be compared with definition 10.2.4 (g) of the ALGOL 68 report (11), where the definition of subtraction is given in words and then addition and negation are programmed in terms of this subtraction operator. MODE DECLARATIONS As a complement to the facility for defining new expression operators, ALGOL 68 allows the definition of new data types and structures of data types. This is accomplished by the mode declaration: (mode dec!.) ~ mode (indicant) = (indication) ==?$NEW (indicant )$VARBL (indicant) (indication) = (36) Here, the translation of .(indication) produces some initial value, array, or structure. The (indicant) is translated as though it were a newly declared program At this point, we can complete our description of ALGOL 68 program structure and fill in some remaining details concerning implementation of the language: First, we can draw together the different declarations outlined so far: (declaration) ~ (mode dec!.) ==? I I (operator dec!. ) ==? I I (priority dec!. ) ==? I I (type dec!.) ==? I (37) We can then complete our definition of a program (statement) : (statementsequence) ~ (labelstat.) ==? I I (statementsequence); (labelstat.) ==? I (38) The effect of the semicolon in rules (38) should not be ignored. Its effect as an intermediate-language command is to unstack the topmost operand of the runtime operand stack. 
Because of the semicolon, the (block) (x := x + 1; 2 + x) causes the intermediate value "x + 1" to be unstacked, but leaves the value of "2 + x" on the operand stack as the value of the (block). This same mechanism is used for returning function values, since the last value on the operand stack before exit from a procedure is its value. Following the scheme of precedence in our syntax, we next define labeled statements:

    (labelstat) -> (statement) => | (name): (labelstat) => $LABEL (name) ^ (labelstat)        (39)

Here, the sequence "(name):" invokes a label declaration in the translated program. The notation "^" means that the value of the translated-program location counter plus one is inserted following "$LABEL (name)" in the translated program. Strictly speaking, the translator will also have a mechanism for writing out $LABEL (name) whenever a go to statement is translated before the appropriate label is encountered. In addition, it should be noted that program labels are treated as variable names by the syntax of (15), (16), (19), (30) and (32). Next, we have a syntax for (statement):

    (statement) -> (assignment) =>
    (statement) -> for (selection) from (clause)(1) by (clause)(2) to (clause)(3) do (statement) => (selection) (clause)(1) (clause)(2) (clause)(3) .$v(statement)$.^ $VARBL $FORSL $IN
    (statement) -> from (clause)(1) by (clause)(2) to (clause)(3) do (statement) => (clause)(1) (clause)(2) (clause)(3) .$v(statement)$.^ $VARBL $FOR $IN
    (statement) -> while (clause) do (statement) => .$v(clause)$.^ .$v(statement)$.^ $VARBL $WHILE $IN        (40)

Listings of the system procedures "$FORSL", "$FOR", and "$WHILE" can be found in Appendix 1. With these forty sets of rules, we have completed a description of the essential features of ALGOL 68. Missing from the syntax is any built-in procedure for input-output, as well as any description of formatting. As the language is described in this report, formatless input and output procedures for this language can be written that are quite similar to the procedures given in section 10.5.2 of the ALGOL 68 report [11]. Such input-output routines have been programmed for the CDC-6500 computer at Purdue University, and are currently being tested together with the remaining components of the ALGOL 68 translator.
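The run-time behavior described above (values accumulating on an operand stack, the semicolon unstacking, the last value surviving as the value of the block) can be demonstrated with a toy interpreter. The sketch below is ours and deliberately simplified: in the actual system, ":=" and "+" would themselves be procedure calls through "$VARBL (op) $IN", but here they are built in.

    # A toy run-time interpreter (ours, purely illustrative) for a
    # fragment of the intermediate language: enough to execute the
    # translated form of the block (x := x + 1; 2 + x).
    def run(code, names):
        stack = []
        for cmd, arg in code:
            if cmd == "$NUMBR":   stack.append(arg)           # push constant
            elif cmd == "$VARBL": stack.append(("ref", arg))  # push reference
            elif cmd == "$IN":                                # fetch value
                ref = stack.pop(); stack.append(names[ref[1]])
            elif cmd == "+":
                b, a = stack.pop(), stack.pop(); stack.append(a + b)
            elif cmd == ":=":                                 # store via ref
                val = stack.pop(); ref = stack.pop()
                names[ref[1]] = val; stack.append(val)        # yields a value
            elif cmd == ";":      stack.pop()                 # unstack (38)
        return stack

    names = {"x": 5}
    block = [("$VARBL", "x"), ("$VARBL", "x"), ("$IN", None),
             ("$NUMBR", 1), ("+", None), (":=", None), (";", None),
             ("$NUMBR", 2), ("$VARBL", "x"), ("$IN", None), ("+", None)]
    print(run(block, names))   # [8], the block's value; x is now 6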
BIBLIOGRAPHY

1 R W FLOYD  A descriptive language for symbol manipulation  JACM Vol 8 pp 579-584 October 1961
2 E T IRONS  A syntax-directed compiler for ALGOL 60  Communications of the ACM Vol 4 pp 51-55 January 1961
3 P M LEWIS  R E STEARNS  Syntax-directed transduction  JACM Vol 15 pp 465-488 July 1968
4 D L MILLS  The syntactic structure of MAD/1  Defense Documentation Center Report No AD-671-683 June 1968
5 P NAUR editor  Revised report on the algorithmic language ALGOL 60  Communications of the ACM Vol 6 pp 1-17 January 1963
6 V B SCHNEIDER  The design of processors for context-free languages  National Science Foundation Memorandum  Department of Industrial Engineering and Management Science  Northwestern University  Evanston Illinois  August 1965
7 V B SCHNEIDER  Pushdown-store processors of context-free languages  Doctoral Dissertation  Northwestern University  Evanston Illinois  1966
8 V B SCHNEIDER  A system for designing fast programming language translators  Technical Report 68-76  Computer Science Center  The University of Maryland  College Park Maryland  July 1968  Also in Proc Spring Joint Computer Conference 1969
9 V B SCHNEIDER  A translator system for the EULER programming language  Technical Report 69-85  Computer Science Center  The University of Maryland  College Park Maryland  January 1969
10 V B SCHNEIDER  Some syntactic methods for specifying extendible programming languages  Proc Fall Joint Computer Conference 1969
11 A VAN WIJNGAARDEN editor  Report on the algorithmic language ALGOL 68  Mathematisch Centrum Report MR-101  Amsterdam  February 1969
12 J WEIZENBAUM  A symmetric list processor  Communications of the ACM Vol 6 p 524 September 1963
13 N WIRTH  A generalization of ALGOL  Communications of the ACM Vol 6 pp 547-554 September 1963
14 N WIRTH  H WEBER  EULER: A generalization of ALGOL and its formal definition: Parts I & II  Communications of the ACM Vol 9 pp 13-25 and 89-99 January-February 1966

APPENDIX 1: SYSTEM PROCEDURES USED

The following "system procedures," with the exception of the $XEQUT routine, are written in Wirth and Weber EULER.14 It is understood that their translated versions are supplied by the ALGOL 68 translator to the run-time interpreter system as part of the (standard prelude) of a translated (program). Hence, these procedures are globally defined in every translated ALGOL 68 program.

$WHILE <- 'formal clause; formal stat; if clause then (stat; $WHILE ('clause', 'stat')) else Omega';

$FORSL <- 'formal var; formal from; formal by; formal to; formal stat; begin new sign; label cycle; sign <- if by < 0 then -1 else 1; var. <- from; cycle: if (to - var.) X sign > 0 then begin stat; var. <- var. + by; go to cycle end else Omega end';

$FOR <- 'formal from; formal by; formal to; formal stat; begin new index; $FORSL (@ index, from, by, to, 'stat') end';

$CASE <- 'formal subscript; formal statementlist; statementlist [subscript]';

$COPY <- 'formal structure; formal var; var. <- if islist structure then begin new dimension; new index; dimension <- list (length structure); $FORSL (@ index, 1, 1, length structure, '$COPY (structure [index], @ dimension [index])'); dimension end else structure';

$ARRAY <- 'formal boundslist; formal value; begin new dimension; new index; dimension <- list boundslist [1]; if length (boundslist) = 1 then $FORSL (@ index, 1, 1, boundslist [1], '$COPY (value, @ dimension [index])') else $FORSL (@ index, 1, 1, boundslist [1], 'dimension [index] <- $ARRAY (tail boundslist, value)'); dimension end';

If it were legal in EULER to omit the semicolon between statements, thus leaving the values of preceding statements on the operand stack without erasing them, the $XEQUT procedure could be written in EULER as follows:

$XEQUT <- 'formal var; formal paramlist; (paramlist [1] if length (paramlist) > 1 then $XEQUT (var, tail paramlist) else var.)';

The effect of the procedure above is to place all the parameters of the procedure call onto the run-time operand stack, and then call the procedure using the "var." statement. Since the semicolon is missing between "paramlist [1]" and "if", the effect of the procedure call is to recursively place the first element of paramlist onto the operand stack, and successively "pop off" the top of the paramlist in each recursive use of paramlist in the call "$XEQUT (var, tail paramlist)." In our intermediate language, the $XEQUT procedure can be (correctly) written as follows:

$VARBL $XEQUT .$37 $FORMA PARLST $FORMA VAR $VARBL PARLST $NUMBR 1 $IN $VARBL PARLST $IN $LENGT $NUMBR 1 $GT $IF 13 $VARBL VAR $IN $VARBL PARLST $IN $TAIL $VARBL $XEQUT $IN $THEN $VARBL VAR $IN $IN $.
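The stepping rule of $FORSL above can be mirrored in a few lines of Python. This is a loose rendering and an assumption for illustration, not the EULER original: a one-element list stands in for the "@ index" reference, and the body is passed as a callable to imitate call-by-name.

def forsl(var_ref, frm, by, to, stat):
    sign = -1 if by < 0 else 1
    var_ref[0] = frm
    while (to - var_ref[0]) * sign >= 0:   # ">=" keeps the final value; the
        stat()                             # scanned listing reads ">" here
        var_ref[0] += by

i = [0]                                    # one-element list plays "@ index"
forsl(i, 1, 1, 5, lambda: print(i[0]))     # prints 1 2 3 4 5

Because the body is re-evaluated on each cycle with the current value of the controlled variable, the same mechanism serves $FOR, $COPY and $ARRAY, which all pass quoted statements to $FORSL.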
BALM-An extendable list-processing language*

by MALCOLM C. HARRISON
New York University
New York, New York

* This work was done under AEC contract no. AT(30-1)-1480.

INTRODUCTION

The LISP 1.5 programming language1 has emerged as one of the preferred languages for writing complex programs,2 as well as an important theoretical tool.3,4 Among other things, the ability of LISP to treat programs as data and vice versa has made it a prime choice as a host for a number of experimental languages.5,6 However, even the most enthusiastic LISP programmers admit that the language is cumbersome in the extreme. A couple of attempts7,8 have been made to permit a more natural form of input language for LISP, but these are not widely available. The most ambitious of these, the LISP 2 project, bogged down in the search for efficiency. The system described here is a less ambitious attempt to bring list-processing to the masses, as well as to create a seductive and extendable language. The name BALM is actually an acronym (Block And List Manipulator) but is also intended to imply that its use should produce a soothing effect on the worried programmer. The system has the following features:

1. An Algol-like input language, which is translated into an intermediate language prior to execution.
2. Data-objects of type list, vector and string, with a simple external representation for reading and printing and with appropriate operations.
3. The provision for changing or extending the language by the addition of new prefix or infix operators, together with macros for specifying their translation into the intermediate language.
4. Availability of a batch version and a conversational version with basic file editing facilities.

The intermediate language is actually a form of LISP 1.5 which has been extended by the incorporation of new data-types corresponding to vector, string and entry-point. The interpreter is a somewhat smoother and more general version of the LISP 1.5 interpreter, using value-calls rather than an association-list for looking up bindings, and no distinction between functional and other bindings. The system is implemented in a mixture of Fortran (!) and MLISP, a machine-independent macro-language similar to LISP which is translated by a standard macro-assembler. New routines written in Fortran or MLISP can be added by the user, though if Fortran is used a certain amount of implementation knowledge is necessary. The description given here is of necessity incomplete because of the flexible nature of the system. In practice it is expected that a number of different dialects will evolve, with different sets of statement forms, operators, and procedures. What is described here is a fairly natural implementation of basic features of the intermediate language which will probably form the basis from which other dialects will grow. We will illustrate the facilities by example rather than by giving a formal description, which can hopefully be obtained from the manual.9

OVERVIEW OF BALM FEATURES

A BALM program consists of a sequence of commands separated by semi-colons. Each command will be executed before the next one is read. The user can submit his program either as a deck of cards, or type it in directly from a teletype. When submitted as a card deck, any data required by the command should follow the command immediately, and on the output a listing of the cards will appear, interspersed with any printed output resulting from a command. When a teletype is used, just the output requested will appear. Variables in BALM do not have a type associated with them, so each variable can be assigned any value. The command:

A = 1.2;
The intermediate language is actually a form of LISP 1.5 which has been extended by the incorporation of new data-types corresponding to vector, string and entry-point. The interpreter is a somewhat smoother and more general version of the LISP 1.5 interpreter, using value-calls rather than an association-list for looking up bindings, and no distinction between func~ tional and other bindings. The system is implemented in a mixture of Fortran (!) and MLISP, a machineindependent macro-language similar to LISP which is translated by a standard macro-assembler. New routines wri!ten in Fortran or MLISP can be added by the user, though if Fortran is used a certain amount of implem~ntation knowledge is necessary. The description given here is of necessity incomplete because of the flexible nature of the system. In practice it is expected that a number of different dialects will evolve, with different sets of statement forms, operators, and procedures. What is described here is a fairly natural implementation of basic features of the intermediate language which will probably form the basis from which other dialects will grow. We will illustrate the facilities by example rather than by giving a formal description, which can hopefully be obtained from the manua1. 9 OVERVIEW OF BALM FEATURES A BALM program consists of a sequence of commands separated by semi-colons. Each command will be executed before the next one is read. The user can submit his program either as a deck of cards, or type it in directly from a teletype. When submitted as a card deck, any data required by the command should follow the command immediately, and on the output! a listing of the cards will appear, interspersed-with any printed output resulting from a command. When ~ teletype is used, just the output requested will appear., Variables in BALM do not have a type associated· with them, so each variable can be assigned any value. The command: * This work was done under AEC contract no. AT(30-1)-1480. A = 1.2; .507 508 Spring Joint Computer Conference, 1970 would assign the value 1.2 to A, while: would print: [2 3] PRINT(A); would print out: 1.2 Any expression can be indexed so that repeated indexing can be used to extract elements of matrices. Thus: PRINT (W[2] [1]) ; Arithmetic operations are expressed in the usual way, so: x = 2 A + 3; PRINT(X); would print out: would print: Assignments to vector elements are straightforward: * 2 5.4 W[2][1] = "(A B); Automatic type conversion is done where necessary. A " symbol is used to allow the input of lists. Thus: A whole vector or list can be assigned from one variable to another variable in a single assignment, of course, but then any operation which changes a component of one will change a component of the other. If this is not desired, the vector or list should be copied before the assignment: A = "(A(B C)D); PRINT(HD TL A); would print: (B C) z = COpy (W); The prefix operators HD and TL have the same effect as the functjons CAR and CDR in LISP, giving respectively the first element of a list and the list without its first element. The LISP CONS operator is available either as a procedure, or as an infix colon associating to the right. Thus: subsequent changes to Z will then not affect W. An arbitrary structure can be broken up into its constituent parts by the procedure BREAKUP. This takes two arguments, a structure whose elements are constants or variables, and a structure to be broken up. 
An arbitrary structure can be broken up into its constituent parts by the procedure BREAKUP. This takes two arguments, a structure whose elements are constants or variables, and a structure to be broken up. Parts of the second structure corresponding to variables in the first structure are assigned as the values of those variables, while constants must match. If the structures cannot be matched, the BREAKUP procedure is terminated and gives the value NIL. Otherwise it has the value TRUE. For example:

BREAKUP("(A B), "((C C) (D D)));

will give the value TRUE and will assign "(C C) to A and "(D D) to B. Either structure can involve vectors, and constants in the first structure are specified by preceding them with the quote mark ("). Thus:

BREAKUP("[A "B C], "[[X X] B [Y Y]]);

will have the value TRUE and will assign "[X X] to A and "[Y Y] to C. The converse of BREAKUP is CONSTRUCT, which is given a single structure whose elements are variables, and which will construct the same structure but with variables replaced by their values. Thus:

X = "(A B); Y = "[C D];
PRINT(CONSTRUCT("(X Y)));

will print ((A B) [C D]). These two procedures allow convenient forms such as:

IF BREAKUP("(A "+ B), X) THEN RETURN(CONSTRUCT("[A B "PLUS]));

BREAKUP and CONSTRUCT are quite efficient, and should be used in preference to the more primitive operations whenever possible.
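A rough Python sketch of the BREAKUP/CONSTRUCT idea follows; the representation (strings for names, a leading " for quoted constants) and the helper names are assumptions made for illustration, not the BALM implementation.

def breakup(pattern, struct, env):
    if isinstance(pattern, str) and pattern.startswith('"'):
        return pattern[1:] == struct          # quoted constant must match
    if isinstance(pattern, str):
        env[pattern] = struct                 # variable: bind the part
        return True
    if isinstance(struct, list) and len(pattern) == len(struct):
        return all(breakup(p, s, env) for p, s in zip(pattern, struct))
    return False                              # shapes differ: no match

def construct(pattern, env):
    if isinstance(pattern, str):
        return env.get(pattern, pattern)      # variables become their values;
    return [construct(p, env) for p in pattern]  # unbound names pass through

env = {}
ok = breakup(["A", "B"], [["C", "C"], ["D", "D"]], env)
print(ok, env["A"], env["B"])    # True ['C', 'C'] ['D', 'D']
print(construct(["A", "B"], env))  # [['C', 'C'], ['D', 'D']]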
Character strings of arbitrary length can be specified:

C = (EXAMPLE OF A STRING);

They can be concatenated, or have substrings extracted. Thus:

D = C --+ (AND ANOTHER);
E = SUBSTR(D, 9, 4);
PRINT(E);

would print (OF A). The BALM system allows the user to assign properties to variables. A property consists of a name and a value. For example, the command:

"VAR PROP "ABCD = (STR);

assigns the property called ABCD with an associated value of (STR) to the variable VAR. Similarly:

X = "VAR PROP "ABCD;

will set the value of X to the value of the property ABCD of variable VAR. A variable can have any number of properties and any number of variables can have the same property. There is complete garbage collection of all inaccessible objects in the system, so the user does not need to keep track of particular lists or vectors. Procedures are available for creating lists or vectors with values of expressions as their elements, with storage being allocated dynamically:

LL = LIST(Z + Q, ABC, S(XY));
VV = VECTOR(X + W, ABC, S(XY));

A procedure in BALM is simply another kind of data-object which can be assigned as the value of a variable. The variable can then be used to invoke the procedure in the usual way. The statement:

SUMSQ = PROC(X, Y), X ^ 2 + Y ^ 2 END;

assigns a procedure which returns as its value the sum of the squares of its two arguments. The translator translates the PROC ... END part into the appropriate internal form, which is assigned to SUMSQ. In fact this is simply a list, which could equally well have been calculated as the value of an expression. The procedure can subsequently be applied in the usual way. For example:

PRINT(5 + SUMSQ(2, 3) + 0.5);

would print:

18.5

Instead of assigning a procedure as the value of a variable, we can simply apply it, so that:

X = 5 + PROC(X, Y), X ^ 2 + Y ^ 2 END (2, 3) + 0.5;

would assign 5 + 13 + 0.5 = 18.5 as the value of X. Note that a procedure can accept any data-object as an argument, and can produce any data-object as its result, including vectors, lists, strings and procedures. Thus it is possible to write:

M = MSUM(M1, MPROD(M2, M3));

where M1, M2, M3, and M are matrices. Procedures can be recursive, of course. Analogous to procedures we can also compute with expressions. The statement:

E = EXPR A + B END;

would assign the expression A + B, not its value, to E. Subsequently, values could be assigned to A and B, and E evaluated:

A = 1; B = 2.2; V = EVAL(E);

EVAL(E) could also have been written as $E. A procedure is simply an expression with certain variables specified as arguments. The most useful expression for procedure definitions is the block, which is similar to that used in ALGOL, but can have a value. The statement:

REVERSE = PROC(L),
BEGIN(X),
COMMENT (FIRST TEST FOR ATOMIC ARGUMENT)
IF ATOM(L) THEN RETURN(L),
COMMENT (OTHERWISE ENTER REVERSING LOOP)
X = NIL,
COMMENT (EACH TIME ROUND REMOVE ELEMENT FROM L, REVERSE IT, AND PUT AT BEGINNING OF X)
NXT, IF NULL(L) THEN RETURN(X),
X = REVERSE(HD L):X,
L = TL L,
GO NXT
END END;

shows the use of a block delimited by BEGIN and END in defining a procedure REVERSE which reverses a list at all levels. The COMMENT operator can follow any infix operator, and will cause the following data-item to be ignored. As well as an IF ... THEN ... statement there is an IF ... THEN ... ELSE ..., as well as an IF ... THEN ... ELSEIF ... THEN ... etc. Looping statements include a FOR ... REPEAT ... as well as a WHILE ... REPEAT .... A label should be regarded just as a local variable whose value is the internal representation of the statements following it. Accordingly, assignments to labels, and transfers to variables or expressions are legal, and can give the effect of a switch. A compound statement without local variables or transfers can be written DO .., .., .. END. Of course any of these statements can be used as an expression, giving the appropriate value. Note that a comma is used to separate statements and labels within a block and a compound statement. The semicolon is interpreted as an end-of-command by the system (unless it occurs within a string), even if it occurs within parentheses or brackets. Any unpaired parentheses or brackets will be paired automatically, with a warning message being issued.

EXTENDABILITY

The TRANSLATE procedure used by BALM to translate statements into the appropriate internal form is particularly simple, consisting of a precedence analysis pass followed by a macro-expansion pass. Built-in syntax is provided only for parenthesized subexpressions, comments, the quote operator, the NOOP operator, procedure calls, and indexing. All other syntax information is provided in the form of three lists which are the values of the variables UNARYLIST, INFIXLIST, and MACROLIST. The user can manipulate these lists as he wishes, by adding, deleting, or changing operators or macros. Operators are categorized as unary, bracket, or infix, and have precedence values, and a procedure (or macro) associated with them. Examples of unary operators are -, HD and IF, while infix operators include +, THEN, and =.
Bracket operators are similar to unary operators but require a terminating infix operator which is ignored. Examples of bracket operators are BEGIN and PROC, which both can be terminated by the infix operator END. New operators can be defined by the procedures UNARY, BRACKET, or INFIX. These add appropriate entries onto UNARYLIST or INFIXLIST. For example the statement:

UNARY("PR, 150, "PRINT);

would establish the unary operator PR with priority 150 as being the same as the procedure PRINT. Thus we could subsequently write PR A instead of PRINT(A). Similarly we could define an infix operator by:

INFIX("~, 49, 50, "APPEND);

to allow an infix append operation. The numbers 49 and 50 are the precedences of the operator when it is considered as a left-hand and right-hand operator respectively, so that an expression such as A ~ B ~ C will be analyzed as though it were A ~ (B ~ C). The output of the precedence analysis is a tree expressed as a list in which the first element of each list or sublist is an operator or macro. For example, the statement:

SQ = PROC(X), X * X END;

would be input as the list:

(SQ = PROC (X) , X * X END)

and would be analyzed into:

(SETQ SQ (PROC (COMMA X (TIMES X X))))

This would then be expanded by the macro-expander, giving:

(SETQ SQ (QUOTE (LAMBDA (X) (TIMES X X))))

the appropriate internal form. This would then be evaluated, having the same effect as the statement:

SQ = "(LAMBDA (X) (TIMES X X));

which would in fact be translated into the same thing. The macro-expander is a function EXPAND which is given the syntax tree as its argument. It is actually defined as:

EXPAND = PROC(TR),
BEGIN(Y),
IF ATOM(TR) THEN RETURN(TR),
Y = LOOKUP(HD TR, MACROLIST),
IF NULL(Y) THEN RETURN(MAPCAR(TR, EXPAND)),
RETURN(Y(TR))
END END;

That is, if the top-level operator is a macro, it is applied to the whole tree. Otherwise EXPAND is applied to each of the subtrees recursively. Most operators will not require macros because the output of the precedence analysis is in the correct form. However, operators such as IF, THEN, FOR, PROC, etc. require their arguments to be put in the correct form for the interpreter. For instance, the IF macro can be defined:

MIF = PROC(TR),
BEGIN(X),
X = HD TL TR,
IF HD X == "THEN THEN RETURN("COND : LIST(EXPAND(HD TL X), EXPAND(HD TL TL X)) : NIL),
RETURN("COND : EXPAND(X))
END END;

where recursive calls to EXPAND are used to transform subtrees in the appropriate way. The statement:

MACRO("IF, MIF);

would associate the macro MIF with the operator IF. We can think of this expansion procedure as top-down, in the sense that a higher level macro in the tree is expanded before a lower level macro. In fact the higher level macro can process the tree in any way. This may include not processing the tree at all (as is done by the QUOTE macro), or expanding selected subtrees in a standard or non-standard way. A macro can even act as a translator of a special-purpose sublanguage which is quite different from BALM. For example, the expression:

SNOBOL "(X (ABC) ARB Y PP(I) = Y :F(FAIL))

is perfectly legitimate in BALM, and could be translated into the appropriate internal form by a macro associated with the prefix operator SNOBOL.
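A compact Python mirror of this top-down expansion pass may make the control flow plainer. It is illustrative only, an assumption modeled on the BALM text rather than the BALM code itself; trees are nested lists whose first element names the operator.

MACROLIST = {}

def expand(tree):
    if not isinstance(tree, list):
        return tree                         # atoms pass through unchanged
    macro = MACROLIST.get(tree[0])
    if macro is None:
        return [expand(t) for t in tree]    # no macro: expand the subtrees
    return macro(tree)                      # a macro sees the whole tree

# a QUOTE-like macro that declines to process its subtree at all
MACROLIST["QUOTE"] = lambda tree: tree
# an IF-like macro rewriting (IF c e) into (COND (c e))
MACROLIST["IF"] = lambda tree: ["COND", [expand(tree[1]), expand(tree[2])]]

print(expand(["IF", ["NULL", "L"], ["QUOTE", ["X"]]]))
# ['COND', [['NULL', 'L'], ['QUOTE', ['X']]]]

As in BALM, the higher-level macro decides what happens to everything beneath it: the IF macro expands its subtrees explicitly, while the QUOTE macro suppresses expansion entirely.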
One particular outcome of this expansion procedure is the ability to write other than simple variables on the left-hand-side of assignment statements. These are conveniently handled by a macro associated with the assignment operator which checks the expression on the left-hand-side and modifies the syntax tree accordingly. It is this mechanism which permits an element of a vector to appear on the left-hand-side, and also such statements as:

HD X = Y;

which will be translated as though it had been written:

RPLACA(X, Y);

The assignment macro currently in use looks up the top level operator found on the left-hand-side in a list LMACROLIST, applying any macro associated with the operator to the tree representing the assignment statement. The set of expressions which can be handled on the left-hand-side can easily be extended by adding entries to LMACROLIST. For example:

LMACRO("PROC, MPROC);

could be used to add the left-hand-side macro MPROC to permit assignments such as:

PROC PPP(X, Y) = EXPR ... END;

as an alternative way of defining a procedure. Note that the essential properties of the system are those of the intermediate language, the most important of which is its ability to treat data as program, and thus to preprocess its program. Even the TRANSLATE procedure described above can be ignored and the user's own translator substituted. Of course this will require a different level of expertise on the part of the programmer than simply the addition of new operators. However, the translator, which takes about 2000 words on the CDC 6600, is only about 250 cards, and quite straightforward, so this is not an unlikely possibility. In summary, the BALM system permits extendability in a number of different ways:

1) By the addition of user-defined procedures.
2) By the definition of unary or infix operators.
3) By the definition of macros.
4) By the use of a user-defined translator.

Procedures, macros and translators can be written with the full power of BALM, or in MLISP or assembly language.

ACKNOWLEDGMENTS

Much of the coding of the current version of BALM has been done by Douglas Albert and Jeffrey Rubin, both of whom have made substantial contributions to its development.

REFERENCES

1 J McCARTHY et al  LISP 1.5 programmers manual  MIT Press 1962
2 M MINSKY  Semantic information processing  MIT Press 1968
3 J PAINTER  Semantic correctness of a compiler for an Algol-like language  A I Memo 44  Stanford University 1967
4 P LANDIN  The mechanical evaluation of expressions  Computer Journal January 1964
5 D BOBROW  J WEIZENBAUM  List processing and extension of language facility by embedding  IEEE Trans on Elec Comp EC-13 Aug 1964
6 C ENGLEMAN  MATHLAB-A program for on-line machine assistance in symbolic computations  Proc FJCC 1965
7 L P DEUTSCH  940 LISP reference manual  University of California Berkeley February 1966
8 P ABRAHAMS et al  The LISP 2 programming language and system  Proc FJCC 1966
9 M HARRISON  BALM users manual  Courant Inst Math Sci New York University

Design and organization of a translator for a partial differential equation language

by ALFONSO F. CARDENAS and WALTER J. KARPLUS
University of California
Los Angeles, California

INTRODUCTION

In recent years a variety of techniques for the design and implementation of translators for problem-oriented programming languages has been developed. A number of these employ a high-level programming language such as FORTRAN as the target language, rather than translating directly into assembly language or machine code. The purpose of the present paper is to demonstrate the unique advantages which can be realized by making PL/1 the target language and by utilizing the so-called preprocessor (or compile-time) facilities of PL/1. This approach has been successfully used in the design of a translator for PDEL, a special-purpose language for the solution of the partial differential equations of classical physics. Details regarding the language and its application have been presented in an earlier paper1 and are, therefore, only briefly summarized in the next section. The overall structure and philosophy of the translator are described in the third section, while more detailed aspects of syntax analysis and code generation are described in the fourth section. In the final section a quantitative evaluation of the translator is presented.

REVIEW OF THE BASIC FEATURES OF PDEL

PDEL was designed at the University of California at Los Angeles and implemented in its basic form in 1968. Its purpose is to facilitate the solution of those partial differential equations which are of particular importance to engineers. These include particularly the elliptic equations which characterize potential fields, the parabolic equations which characterize heat transfer and diffusion, the hyperbolic equations which characterize wave phenomena, and the biharmonic equations which arise in elasticity problems. The classes of problems which could be handled by the original translator are shown in Table 1. Subsequent extensions of the translator have been directed toward permitting the treatment of fields in three space-dimensions, the inclusion of a wider variety of boundary conditions, and the handling of singularities, moving boundaries, and other special features.

The numerical treatment of such partial differential equations most often proceeds from finite difference approximations. The time-space continuum is replaced by an array of regularly-spaced points in one, two, three, or four dimensions, and sets of algebraic equations are solved simultaneously to provide solutions. In the case of elliptic equations, the solutions for all points are obtained simultaneously; in the case of transient field problems, solutions are obtained sequentially for successive steps in the time domain. A variety of algorithms are available for the solution of the difference equations. In order to obviate the problem of computational stability, the catastrophic accumulation of round-off errors, so-called implicit methods are usually preferred. Even so, the numerical analyst must make a judicious choice from among several practical algorithms, a choice which sometimes has a decisive effect upon computer execution time. This choice depends, of course, upon the specific type of equation under study, but is also influenced by the geometry (whether the field has regular boundaries), the parameter distribution (whether the field is linear or nonlinear, or constant or variable parameter), the required solution accuracy, and by the size of available fast memory. A language designed to solve partial differential equations must, therefore, provide various algorithms for the different types of equations; these algorithms may necessarily involve the construction of lengthy subroutines. To avoid waste of computer time, only the subroutine corresponding to the problem at hand should be produced and compiled. The preprocessor facilities of PL/1 are exceptionally well-suited to this end.2,3
Accordingly, PL/1 was selected as the target language of the translator, the translator was written in preprocessor PL/1, and all legal PDEL statements were designed so as to be legal PL/1 preprocessor statements. A typical PDEL program may contain the following statements, expressed in a syntax chosen to be readily comprehensible to engineers:

1. Definition of the equation to be solved, i.e., the mathematical form of the equation including parameter identifiers, the order of the partial derivatives, Poissonian terms, etc.;
2. Parameter specification, which may be constants or functions of the dependent and independent problem variables;
3. Specifications of the finite difference grid spacing for the space and the time variables;
4. Description of the geometry of the field, that is the coordinates in the space domain of the field boundaries;
5. Boundary conditions, which may be of the Dirichlet, Neumann or Fourier types;
6. Selection of one of the available algorithms to be employed;
7. Bounds on the number of iterations or the iteration error for iterative algorithms;
8. Description of the type and nature of the printout desired.

Figure 1-Processing of a PDEL program (syntactic analysis of PDEL program; generation of PL/1 program; processing by PL/1 compiler)

Default conditions are provided for most of these statements, so that the programmer may choose to omit certain specifications in the program; in this case the translator will make the selection for him. An example of a PDEL program for a simple partial differential equation is presented in Appendix 1, together with a typical printout. To date, over a hundred partial differential equations have been programmed and solved using PDEL.

GENERAL TRANSLATION APPROACH

As indicated in the preceding section, a translator for a partial differential equations language must contain a selection of algorithms, each suitable for a different class of partial differential equations, different geometries, etc. A study of digital computer programs corresponding to a number of these algorithms indicated that for a given class of equations, approximately 70% to 95% of the total number of program statements was common to all programs. The other 5% to 30% of the statements dealt with the description of parameters, initial and boundary conditions, and geometries which vary from problem to problem. The greater the complexity of the geometry of the field, the greater the number of special statements required for characterizing the particular boundary configuration. A key to the successful and economic translation of partial differential equations is the avoidance of the generation of unnecessary code. That is, it is important to assure that only those portions of the translator required for the specific problem to be solved are selected for compilation, and then combined with user-generated statements corresponding to the 5% to 30% of the program which is specific to the problem being solved.

Figure 2 (Part 1), (Part 2)-Conceptual flow chart of the PDEL translator
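The module-selection strategy can be sketched schematically in Python. This is an illustration of the idea only, not PL/1 preprocessor code; the module names and the FIXED_MODULES table are invented for the example.

FIXED_MODULES = {
    ("elliptic", 2): ["declare_grid", "sor_iteration", "convergence_test"],
    ("parabolic", 1): ["declare_grid", "implicit_step", "time_loop"],
}

def translate(equation_class, dimensions, problem_specific):
    # the fixed modules are the 70% to 95% common to the class...
    modules = FIXED_MODULES[(equation_class, dimensions)]
    # ...and the generated statements are the problem-specific 5% to 30%
    generated = ["generated: " + stmt for stmt in problem_specific]
    return modules + generated

program = translate("elliptic", 2,
                    ["boundary (*,0) = 0", "boundary (*,30) = 100"])
print("\n".join(program))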
Among the major problem-oriented languages currently in wide use, only PL/1, by virtue of its compile-time facilities, permits full control over which portions of a program are to be compiled. The PL/1 preprocessor language is a subset of PL/1, with many significant features particularly suited for character-string manipulation and generation, and hence for language translation.2,3 As the name indicates, the preprocessing phase immediately precedes the actual PL/1 compilation. During this phase, the program is scanned for all statements which contain the % identifier, and only those statements are executed. The PDEL translator is in effect a preprocessor program which automatically chooses groups or modules of fixed PL/1 statements corresponding to the equation to be solved and the algorithms to be employed. These modules of fixed PL/1 code are stored in secondary storage devices (e.g., disc, data cells), and constitute the 70% to 95% of the statements common to all programs of a given class. The other 5% to 30% of the code is generated during the preprocessing phase by specially designed code generation routines which are a part of the PDEL translator. If desired, the programmer may also include PL/1 statements (without the % identifier) in his PDEL program. These statements are unaffected by the preprocessing and proceed directly to the PL/1 compiler. The overall translation process proceeds as shown in Figure 1. The preprocessor phase, which results in the generation of a PL/1 program, is performed in two stages: syntactic analysis and code generation. This permits the diagnosis of illegal statements, programming errors, and calls for modules not presently available, prior to generation of PL/1 code. The syntactic analysis in turn proceeds in two steps:

1. The standard PL/1 preprocessor analyzes the PDEL program to determine that it is a valid PL/1 preprocessor program.
2. The syntactic analyzers of the PDEL translator analyze the strings on the right hand side of each PDEL statement, so as to detect violations of PDEL syntax.

Figure 2 is a generalized conceptual flow-chart of the PDEL translator. Syntactic analysis is represented by blocks A1 and A2. Blocks B determine whether the required PL/1 modules are available in secondary storage, while block C contains all the PL/1 code generators. The approximate size of the PDEL program is illustrated in Figure 3, which also indicates the sequence of operations. The two INCLUDE statements appear in the original PDEL program and serve to call the PDEL translator. A finer view of the operation of the translator is given in the next section.
The statement is then sent to the PL/I compiler. The primed j;.OW. DO J. 1 TO I Jl1AX-lIl 00 j-lEIIJI TO RIGlIjl; -lf2IJ"'Tlj-~1"'2IJf WHilE- Ilf'2"tJi -'j "'---,ou- :i.f 1 statement numbers in Figure 6 identify statements which are the result of such a replacement activity. C4: These statements are sent directly to the PL/I compiler. They include: (a) output statements to print out the tabular solution of the equation in the manner specified in the PDEL output statements, and (b) other statements which complete the PL/I program. C.I): If a plot of the solution is ordered in the PDEL program, the module containing the appropriate PL/I subroutine to plot two-dimensional solutions is retrieved and sent to the PL/I compiler. Thus the complete PL/I program is produced. It is then compiled and executed like any regular PL/I program to produce the solution to the equation. The number of PL/I statements involved in this example is indicated in Figure 4. EPS.· IN '.IT.' n£IIAnn"s . . . . 0. fl'" UtH --GRin poililTliiDICAhil, IT __ ~_IPI61_1_ ._. __ . ______________ _ is., . . -- -- - ----- -- .- _!JJ"_~.R,_!!.,_~!!lIh~I,AI EVALUATION OF THE PDEL TRANSLATOR Fignre 6-PL/l code sent to the PL/l compiler. Thii'l ('ode corresponds to the t.wo-dimensional elliptic problem in Appendix 1. As yet no generally accepted criteria for the evaluation and comparison of translators are available in the computer software area. Qualitative discussions gen- 520 Spring Joint Computer Conference, 1970 erally center on: compatibility, diagnostic capability, and efficiency. Some comments as to the characteristics of the PDEL translator as far as these three qualities are concerned, appears in order. Compatibility with other computers The PDEL translator is compatible with any computer for which there exists a standard PL/l compiler and sufficient external random access memory. The translator itself is written in preprocessor PL/l, a highlevel language, and makes no reference to unique hardware features. Diagnostic capability The preprocessor PL/I syntax analyzer (furnished with the PL/l compiler) and the PDEL syntax analyzer function effectively to detect any programming errors involving the violation of syntactic rules. Errors which can be detected only at execution time (for example, overflow and requests for excessively large arrays) create difficulties, difficulties which arise in the use of most higher-level programming languages. Efficiency A working PDEL translator capable of solving the problems indicated in Table 1, has been in operation since 1968. It contains approximately 3,000 cards. The preprocessing, compilation, and execution time for four arbitrarily selected field problems are summarized in Table II together with the number of PDEL and PL/l statements required in each case. Improvements in some of these figures have been effected recently by evolutionary changes in the translator. 
REFERENCES

1 A F CARDENAS  W J KARPLUS  PDEL-A language for partial differential equations  Communications of the ACM 1970
2 PL/1 language specifications  IBM Systems Reference Library Form C28-8201 1968
3 R L GAUTHIER  PL/1 compile time facilities  Datamation pp 32-34 December 1968
4 J A FELDMAN  D GRIES  Translator writing systems  Communications of the ACM pp 77-113 February 1968
5 N WIRTH  H WEBER  EULER: A generalization of ALGOL and its formal definition-Part I and Part II  Communications of the ACM pp 13-25 and 89-99 February 1966
6 J A LEE  The anatomy of a compiler  Reinhold Publishing Corporation 1967
7 F R HOPGOOD  Compiling techniques  American Elsevier Publishing Company 1969
8 L PRESSER  Translation of programming languages  Survey of Computer Science edited by A F Cardenas  M A Marin and L Presser  To be published
9 J CEA  B NIVELET  L SCHMIDT  G TERRINE  Techniques numeriques de l'approximation variationnelle des problemes elliptiques  Tome 1  Institut Blaise Pascal  Paris France  April 1966 and Tome 3 March 1967
10 S M MORRIS  W E SCHIESSER  SALEM-A programming system for the simulation of systems described by partial differential equations  Proc of Fall Joint Computer Conference  December 9-11 1968  San Francisco California
11 C C TILLMAN  EPS-An interactive system for solving elliptic boundary-value problems with facilities for data manipulation and general-purpose computation  Department of Mechanical Engineering  Massachusetts Institute of Technology  June 1969

APPENDIX 1

The problem is to solve the two dimensional elliptic equation

d/dx (sigma dphi/dx) + d/dy (sigma dphi/dy) = k

in which

sigma = 10 + y,  k = 0

for the following hollow quadratic field with the indicated Dirichlet boundary conditions:

Figure 7-Appendix 1-Hollow quadratic field with the indicated Dirichlet boundary conditions (boundary potentials 0, 100, and (10/3)y - 100; overall dimension 60 in., with 18 in. offsets to the hollow core)

THE SOLUTION CONVERGES WITHIN 5.00000E-02 IN 3.50000E+01 ITERATIONS, AND, FOR EACH GRID POINT INDICATED, IT IS:

Figure 8-Appendix 1-Tabular printout of the field potential for the two-dimensional elliptic problem (several pages of grid-point values PN(i,j) in E-format, not reproduced here)

This system could be a hollow square pipe carrying a fluid whose thermal conductivity is nonuniform, with the walls being subjected to the indicated temperatures. With the understanding that PDEL solves equations by finite difference techniques in rectangular coordinates, the user of the language then has to indicate the following, in addition to specifying unambiguously the equation, parameters and conditions:

1. the rectangular grid to be used
2. the spacing between each grid point
3. the form of the printout

and, assuming that the method used to solve two dimensional elliptic equations is the successive point overrelaxation, the following must also be specified:

4. the overrelaxation factor (if not specified, the optimum is used)
5. the error tolerance
6. the maximum number of iterations that are allowed

Assuming that the user specifies the following conditions:

1. a 30 by 30 grid
2. the spacing between each point in either direction is 2.0
3. the solution at each grid point is to be printed out, and also a discrete plot of the solution is desired
4. the overrelaxation factor is 1.70
5. the maximum error allowed is 0.05
6. the maximum number of iterations allowed is 100

then the following PDEL program is written to solve the two dimensional elliptic field:

PDEL Program                                                  Statement #

% INCLUDE $PDEL(INITIAL);                                     1
/* THIS PROBLEM SOLVES A 2 DIMENSIONAL ELLIPTIC EQUATION
   IN A HOLLOW QUADRATIC FIELD BY SUCCESSIVE
   OVERRELAXATION */                                          2
% DECLARE (PARAM1, PARAM2) CHARACTER;                         3
% EQUATION = 'PARAM1 * PX, PX, PHI + PARAM1 * PY, PY, PHI
   = PARAM2';                                                 4
% PARAM1 = '10 + Y';  % PARAM2 = '0';                         5, 6
% DIMENSION = '2';                                            7
% GRIDPOINTSX = '30';  % GRIDPOINTSY = '30';                  8, 9
% DELTAX = '2.0';  % DELTAY = '2.0';                          10, 11
% GEOMETRY = '(1:29,1:8); (1:29,22:29);
   (1:8 & 22:29,9:21)';                                       12
% BCOND = '(*,0) = 0; (*,30) = 100; (0,0:15) = 0;
   (30,0:15) = 0; (0,16:29) = (10/3) * Y - 100;
   (30,16:30) = (10/3) * Y - 100; (9:21,9) = 100;
   (9:21,21) = 100; (9,9:21) = 100; (21,9:21) = 100';         13
% MAXERROR = '.05';                                           14
% ITERATE = '100';                                            15
% ORF = '1.70';                                               16
% PRINTINTX = '1.0';  % PRINTINTY = '1.0';                    17, 18
% PLOT = 'YES';                                               19
% INCLUDE $PDEL(HEART);                                       20

Ahead of statement #1 of the program the user has to place the appropriate job control cards to give the operating system of the computer the necessary information about the job and where in secondary storage the PDEL translator is residing.
The statements indicated play the following role:

1: calls from the computer system that part of the PDEL translator which performs all the necessary initialization. It must be the first statement in a program involving a new equation. $PDEL(INITIAL) is assumed to be the name of the data set where the initialization part of the translator is stored.
2: a comment statement
3: declares the names to be used for the parameters sigma and k
4: defines the equation to be solved
5, 6: define the parameters sigma and k
7: indicates that the field is 2 dimensional
8, 9: indicate that a 30 by 30 grid is to be used
10, 11: indicate that the distance between grid points is 2.0 and equal in either direction
12: indicates the geometry of the approximated field by specifying the points interior to the field
13: indicates the boundary conditions
14: indicates the maximum tolerable error in the solution
15: indicates the maximum number of iterations allowed
16: indicates the overrelaxation factor to be used
17, 18: indicate how often in space (grid point-wise, not space unit wise) the solution is to be printed out
19: indicates that a scaled plot of the potential distribution, a contour plot, is to be printed out
20: calls from the computer system that part of the PDEL translator which performs the processing of the PDEL program. $PDEL(HEART) is assumed to be the name of the data set where this key part of the translator is stored.

Solution printed out

The solution at each grid point for only a part of the field is shown. In the contour plot, areas of smaller potential are represented by characters A, B, C, ... and those of larger potential by X, Y, and Z. The grid point(s) with the largest potential is (are) represented by the character >, while each letter represents a band (BAND) of potential equal to the difference between the maximum and the minimum potential in the field divided by 26 (the number of letters in the alphabet). The contour plot is elongated because the computer printer prints out characters leaving more space between each row than between each column.

Figure 9-Appendix 1-Contour plot of the field potential for the two-dimensional elliptic problem (PHIMAX = 1.00000E+02; PHIMIN = 0.00000E+00; BAND = 3.84615E+00)
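For concreteness, the successive point overrelaxation iteration that the generated PL/1 program performs can be sketched in Python. This is a simplified rendering under stated assumptions, not the translator's output: unit grid spacing, a solid rectangular field rather than the hollow one of Appendix 1, and a constant parameter sigma, so the variable-coefficient terms drop out.

def sor(phi, orf=1.70, max_error=0.05, max_iter=100):
    ny, nx = len(phi), len(phi[0])
    for it in range(max_iter):
        worst = 0.0
        for i in range(1, ny - 1):
            for j in range(1, nx - 1):
                gs = 0.25 * (phi[i-1][j] + phi[i+1][j] +
                             phi[i][j-1] + phi[i][j+1])   # Gauss-Seidel value
                change = orf * (gs - phi[i][j])           # overrelaxed update
                phi[i][j] += change
                worst = max(worst, abs(change))
        if worst < max_error:        # the MAXERROR test of statement 14
            return it + 1
    return max_iter                  # the ITERATE bound of statement 15

grid = [[0.0] * 31 for _ in range(31)]   # a 30 by 30 grid of intervals
for j in range(31):
    grid[30][j] = 100.0                  # one Dirichlet edge held at 100
print(sor(grid), "iterations")

The overrelaxation factor orf = 1.70 and the tolerance 0.05 correspond to statements 16 and 14 of the PDEL program above.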
SCROLL-A pattern recording language*

by MURRAY SARGENT III
University of Arizona
Tucson, Arizona

* Most of the work discussed herein was completed at Bell Telephone Laboratories, Inc., Holmdel, New Jersey.

INTRODUCTION

A number of routines have been developed recently to facilitate labeling of computer plotted output. One of the more versatile programs is that written by Freeman1 which is capable of plotting characters in sequence including sub and superscripts, over and underscoring, using italics, changing fonts and returning to a saved coordinate. Programs of related nature have been written specifically for the purpose of text editing such as the IBM TEXT 360 and CALL 360 which are primarily printer oriented. Along these lines, all conversational programming systems have editing facilities. The routines share a common feature: the interpretation of character strings containing substrings specifying the desired output and other substrings specifying control functions. The substrings are separated typically by a break character such as a dollar sign or slash followed by a character representing the purpose of the substring. The use of a single character to represent a word or idea is as old as language itself (& for and, $ for dollar, etc.) and the characters so used are called logograms or logographs. One of the early programming languages to use logograms is APL, although the purpose of that language is very different from the string languages involved above. A clear advantage of the logogrammatic language is its brevity. However this can be a confusing factor as well. In this paper, we present a new logogrammatic language called SCROLL which extends the string language of Freeman to allow nesting of the sub-superscript, over-underscore and backward reference facilities and, most important, to include recursive procedure and measurement capabilities hitherto absent in plotting languages.* An abstract pattern can be defined by such a procedure and invoked as desired with specification of appropriate arguments to yield a specific pattern. Hence a mathematical fraction procedure measures the dimensions of the plotted output corresponding to its two arguments, one for the numerator and one for the denominator. This procedure then centers the numerator with respect to the fraction bar and denominator, while positioning the pattern elements vertically to prevent intersections. The arguments consist of any allowable sentences of the language and can in particular reference the procedure to which the arguments themselves belong. This allows one to draw, for example, a fraction in the numerator of another fraction. The semantics and syntax of SCROLL are given in the next section of the paper together with numerous examples of plot output. A detailed discussion of procedures and measuring functions is given with examples in the third section. The utility of the language is discussed in the last section. In the appendices, SCROLL syntax is specified using a meta-language similar to that used in the COBOL report, and definitions of built-in SCROLL procedures are given.

* SCROLL is an acronym for String and Character Recording Oriented Logogrammatic Language. The language has been incorporated into a general plotting system described in Bell Telephone Laboratories memorandum MM 69-1254-11 by M. Sargent III. Details of the SCROLL implementation can be found there.

SEMANTICS AND SYNTAX

SCROLL sentences are composed of plot and control statements which, syntactically, can be mixed together in any order such that the final statement is a termination control statement.*

* The reader may prefer the schematic definition of SCROLL syntax given in Appendix A to that given here.

TABLE I-Semantics of logograms for IBM 360, CDC 6000 and machine independent versions of SCROLL. The definitions tabulated are: font change, case inversion, subscript, superscript, sub-superscript return, italics, under-overscoring, ending sentence, carriage control, column control, linewidth, character size, omitting output, coordinate changes, procedure calls, diagnostics, plotting $, changing semantics, process statements, null, and backspace. (The three logogram columns of the table are illegible in this scan; the individual logograms are given with their definitions in the text that follows.)
A plot statement consists of any string of characters (a Hollerith literal) terminated by and not including a dollar sign. The action implied by a plot statement is the plotting of the characters constituting the statement. Hence the string 'ABC' means "plot 'ABC';." This statement cannot by itself constitute an entire sentence; it must be followed by a control statement which terminates the sentence. Control statements are also character strings, but invariably begin with a dollar sign and end according to context as described below. The first nonblank character following the dollar sign stands for the type of action demanded by the control statement. As such the character is a logogram and gives SCROLL its logogrammatic nature. More characters may be included in the statement depending on its purpose. Blanks occurring between the initial $ and the final character of the control statement are ignored unless they belong to a plot statement which is the argument of a SCROLL procedure or function call. The simplest SCROLL sentence consists of the single control statement '$.' which means "terminate the sentence." Combined with the plot statement above, one has the sentence 'ABC$.' which means "plot 'ABC'." Most control statements have fixed length. Those having variable length are terminated either by a per cent sign (%) or the $ of the next control statement. In the remainder of this section, the control statements are defined and illustrated by figures containing prints of unretouched output obtained using an IBM 360/65 computer in conjunction with a Stromberg-Carlson 4060 microfilm recorder. The semantics given are for the IBM 360 version; Table I summarizes these semantics, the CDC 6000 version* and a machine independent set. Additional examples of SCROLL sentences are given in Appendix B which gives the definitions of the built-in procedures.

* Debugged at the University of Arizona Computer Center.
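The division of a sentence into plot and control statements can be illustrated with a toy Python scanner. The structure below is an assumption for illustration only; real SCROLL control statements vary in length and context, as the list of control statements that follows shows.

def scan(sentence):
    out, i = [], 0
    while i < len(sentence):
        j = sentence.find("$", i)
        if j < 0:
            break                                  # no terminating control
        if j > i:
            out.append(("PLOT", sentence[i:j]))    # plot statement
        logogram = sentence[j + 1]
        out.append(("CONTROL", logogram))          # e.g. '.', '+', '-', '('
        if logogram == ".":
            break                                  # '$.' ends the sentence
        i = j + 2
    return out

print(scan("ABC$+QR$=S$."))
# [('PLOT', 'ABC'), ('CONTROL', '+'), ('PLOT', 'QR'),
#  ('CONTROL', '='), ('PLOT', 'S'), ('CONTROL', '.')]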
Figure 2-Examples of SCROLL sentences illustrating control statements for (a) italics, (b) bold face, (c) over and underscoring, (d) returning to a saved plot position and (e) plotting a dollar sign

2. Shifting case for one character only (Figure 1b)

$a where a is any letter of the alphabet inverts the shift for one character only: if $A is encountered and a lower case font is set up, "A" will be plotted (instead of "a") and succeeding characters will be plotted in lower case. Note: The case may be shifted for one or more characters by changing to the appropriate font or by typing parts of SCROLL sentences in lower case when an upper case font is set up.

3. Subscripting and superscripting (Figure 1c)

$- enter subscript mode, $+ enter superscript mode, $= return to previous sub or superscript mode.

4. Italics (Figure 2a)

$/ enter italics mode and $| leave italics mode.

5. Under and overscoring (Figure 2c)

$( remember where to start drawing a line, under or overscoring; $)+ or $)- overscore (+) or underscore (-) the characters between this $) and the corresponding $( ($( and $) are treated as a pair the same way right and left parentheses are treated in FORMAT statements); $) followed by anything else draws a line between the current plot coordinates and those saved by the corresponding $(. See also the $_ facility for drawing lines.

6. Ending sentence (see next section for examples)

$. (or $;) if encountered in a procedure or argument sentence, return to the calling sentence; otherwise return to the plot system. $, the interpretation of $, starts one on the next line.

7. Starting a new line (Figure 3a)

$,n advance "carriage" according to the value of n: n = plus, go to beginning of current line; n = 1, go to next frame; n = 0, skip a line; n = 2-9, advance (1/n)th of current frame; n = anything else, go to next line. If the control requests plotting below the frame, the frame is automatically advanced.

Figure 3-Examples of SCROLL sentences illustrating control statements for (a) skipping to next line, (b) changing column, (c) backspacing and (d) drawing lines

8. Specifying plot column (Figure 3b)

$¢n (n = COLNO, a variable in PLTPRM described in the plot system memo) go to column n with respect to the left side of the film (ignoring the XORG and X positioning variables). The variable COLNO determines the number of columns assumed on a frame and has the default value 80.0. Hence unless COLNO is reset larger than 100.0, the control sequence '$¢100' will require plotting outside screen boundaries and error messages will be issued. The letter n can be the name of a SCROLL process variable containing the desired column number. Hence one can set "tabs" for printing text.
9. Changing linewidth (Figure 2b)

$#n for n = 1 to n = 4, use standard line density and change the line width to n, where width is proportional to 2**n (gives varying degrees of bold-face type); for n = 5 to n = 8, change the linewidth to n - 4 and use light density.

10. Changing character size

$@n change character size to SIZE*n/2, where SIZE is the nominal character size; $@0 go back to previous size.

11. Omitting output

$¬ omit output until $¬; is encountered; $¬; restore normal output.

12. Saving and returning to a point (Figure 2d)

$< save the current plot coordinates; $> go back to the coordinates saved by the $< corresponding to this $> ($< and $> are treated as a pair the way right and left parentheses are treated in FORTRAN FORMAT statements).

13. Calling and defining SCROLL procedures (see next section)

$:an call the procedure named by the upper case alphabetic character a and numeral n (1-9 or null),

$:an(list) call the procedure named an with arguments given by list,

$:(an, sentence) define a procedure named an by the SCROLL sentence sentence.

$:? read characters on the next card in the FORTRAN input stream into SCROLL storage starting at this $ and continue interpretation. A 0-2-8 punch does the same thing and can occur anywhere in a SCROLL sentence.

14. Backspacing (Figure 3c)

$&n backspace n (1 ≤ n ≤ 9) characters exclusive of control characters, $&0 backspace 10 plotted characters, $& followed by anything else, backspace one plotted character and process the next character as usual. Up to ten characters can be backspaced over at a given time unless fewer than ten positions have been established. To backspace more generally, one must use the $< and $> facility (see 12).

15. Rotating output

$"[exp1][,exp2]% where exp1 and exp2 are SCROLL arithmetic expressions. The azimuthal angle (angle of rotation from the x axis in the x-y plane) is set equal to the value of exp1 if present, and the polar angle (angle of rotation from the z axis towards the x-y plane) is set equal to the value of exp2 if it appears. The two angles have default values 0° and 90° respectively. Note that in SCROLL Version I, all previous measurement information is destroyed by this control statement. Hence one cannot, for example, draw an unrotated rectangle around a rotated pattern.

16. Shifting plot position and drawing lines (Figure 3d)

$_field[_field]...% shift plot position and/or draw lines. A field contains one or more coordinate sets separated by semicolons, and the sets themselves are SCROLL arithmetic expressions separated by commas. If a field consists of a single set, the plot position is shifted such that the first coordinate is added to the x position and the second to the y position. If a third coordinate appears, it is used as the z (depth) coordinate in a projected drawing. If an equal sign precedes the x coordinate, the set specifies an absolute location on the plot screen. If more than one set appears in a field, lines are drawn between the points they determine relative (no =) to the current plot position. If a '>' is the first character of a field, the coordinates are treated as vectors, that is, displacements from the current plot position; lines are drawn and shifts occur. If a coordinate is omitted, it is assumed to be zero; if a single coordinate (no comma) appears, it is assumed to be x. Examples of shifting are given in another section and examples of line drawing in the "box" procedure. Shifts can be made in process statements also.

17. Writing diagnostic information

$?; terminate printing plot diagnostics, $?n turn on the flag for diagnostic class n (see the discussion of PLTDBG in the next section of the plot system memo for details), $? followed by anything else causes descriptions of the type of actions resulting from subsequent control strings to be printed.
18. Plotting dollar sign (Figure 2e)

$$ plot dollar sign ($).

19. Changing SCROLL semantics

$!n use control semantics n (1 or 2), where semantics refers to the meanings of the logograms. $!1 causes the semantics described in this section to be used; $!2 causes semantics presumably specified by the user to be used (perhaps a second set given in Table I).

20. Executing process statements (see Appendix B for examples)

$*...% execute the process statements given by the ellipsis (...) (see next section for further details).

21. Null statement

$% statement is ignored.

Further examples of SCROLL sentences

To demonstrate a little more of the power of SCROLL we consider the first equation in Figure 4,

    ρ̇ = -i[H, ρ]_ - ½[Γ, ρ]+.

This resulted from interpretation of the sentence

'$:T($4Q$0$.) = -$I[H, $4Q]$--$= -$:F1(1$.2$.)[$C,Q]$-+$.'

Here the string '$:T($4Q$0$.)' is a procedure call (see the next section) to the time derivative procedure T which centers a dot above the plot output given by the argument, '$4Q$0$.'. The argument itself is a SCROLL sentence in which '$4' switches to the lower case Greek font, 'Q' plots the letter ρ, '$0' returns to the previous font (the upper case English font) and '$.' ends the sentence. The next four characters ' = -' plot an equal sign with blanks on either side followed by a minus sign. The '$I' plots i (the dollar sign inverts the case for I alone), '[H,' plots itself, '$4' switches to the lower case Greek font and 'Q]' plots ρ]. The '$--' subscripts a minus sign, '$=' returns to on-line mode, '-' plots a minus sign with blanks on either side, '$:F1(1$.2$.)' calls the simple fraction procedure to plot the fraction one half, the '$C' plots Γ, '$-+' subscripts a plus and '$.' ends the SCROLL sentence. In the IBM implementation for fonts 1 through 4, [ is given by #1 and ] by #2.

The second equation resulted from interpretation of the sentence

'$:B($:P(A$. B$. $N$.) = $:R($:F(A-B$. B$+Q$-$N$=$=$.)$.) $:P(A$. B$. $N-1$.)$.)$.'

Here the "box" procedure B plots a rectangle tailored to fit the plot output resulting from its argument. Similarly, the partial derivative (P), square root (R) and fraction (F) procedures shift their arguments into place and draw lines of appropriate length.
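The nesting of the sub- and superscript logograms can be pictured as a stack of baseline offsets. The following sketch (ours, not the paper's; the 0.4 character-size shift is an assumed value) traces the vertical position at which each plot statement of a sentence would be drawn:

    # Illustrative sketch: $-, $+ and $= nest like a stack of baseline shifts.
    SUB, SUP = -0.4, +0.4    # assumed shifts, in character sizes

    def baseline_trace(items):
        """items as produced by a splitter like the one shown earlier."""
        stack = [0.0]
        trace = []
        for kind, text in items:
            if kind == 'plot':
                trace.append((text, stack[-1]))
            elif text == '-':
                stack.append(stack[-1] + SUB)   # enter subscript mode
            elif text == '+':
                stack.append(stack[-1] + SUP)   # enter superscript mode
            elif text == '=' and len(stack) > 1:
                stack.pop()                     # return to previous mode
        return trace

    items = [('plot', 'A'), ('control', '+'), ('plot', '2'),
             ('control', '='), ('plot', 'B')]
    print(baseline_trace(items))    # [('A', 0.0), ('2', 0.4), ('B', 0.0)]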
SCROLL PROCEDURES

In Figure 5 a complex interpretive font character is plotted, namely the Yale seal.* One often wants to plot complicated groups of characters, such as a mathematical fraction, and could, as for the Yale seal, define the desired pattern as an interpretive font character. In general this is an exceedingly tedious procedure. Instead one would like to define recurrent patterns by procedures and invoke particular patterns using appropriate arguments in the procedure calls. This extremely useful possibility is part of the SCROLL language. Specifically, the call is a control statement having one of the forms (1) '$:an', (2) '$:an(list)', where a (any letter of the alphabet) and n (1-9 or null) identify the procedure, and list is a SCROLL paragraph whose sentences comprise the procedure's arguments. The procedure facility is recursive, that is, a procedure can call itself, and procedures can define other procedures. Note that a SCROLL procedure differs from a SCROLL function in that the latter returns a value such as the length of a sentence as contrasted to the execution of plot and control statements.

* The author is indebted to D. Barth for the coordinates of the Yale seal and other characters used herein for plotting. His Yale seal appears slightly differently in his article in Computer Programs for Chemistry, Ed. D. F. DeTar, Vol. 3, W. A. Benjamin, Inc., New York (1969).

Figure 4-Plot output resulting from interpretation of SCROLL sentences

Figure 5-The Yale seal, example of a complex interpretive font character. Note that this figure is static: only its size can be changed. The SCROLL procedures allow one to define equally complicated figures which are dynamic, that is, the user can change parts of a figure merely by changing arguments to a procedure

The fraction procedure

In plotting mathematical equations one pattern which occurs frequently is the fraction. A procedure named F has been defined to plot a fraction. It is called by the statement '$:F(n, m)' which plots n over m, where n and m are SCROLL sentences. In particular the call '$:F(A-B$. A$+C$-$N$=$=$.)' plotted the fraction within the square root in Figure 4.

Every SCROLL sentence constitutes a procedure definition. The nth argument in the procedure call is inserted whenever the characters '&n' (1 ≤ n ≤ 9) are encountered. For example, the fraction procedure F might be defined by the sentence

'$($($_,.5S%&1$)-$)$_,-1.6S%&2$_,.6S$.'

Here the first $( saves the current x position for under or overscoring, the second $( saves the x position for later reference, the $_,.5S adds half a character size to the vertical (y) coordinate, &1 plots the first argument, $)- underscores this argument, $) returns to the x coordinate saved by $(, $_,-1.6S subtracts 1.6*SIZE from the vertical (y) coordinate, &2 plots the second argument, $_,.6S restores the vertical coordinate and $. returns to the calling string. Simple fractions can be plotted by the procedure defined by

'$+$(&1$)-$&$=$-&2$=$.'

However, neither definition centers the numerator above the denominator or performs shifts to prevent intersection of lines. For this, one must use the process statement facility described in the SCROLL Process Statements section.

Partial derivative and other procedures

An example of a procedure calling another procedure is the partial derivative procedure, which plots a partial derivative when called by the statements '$:P($2Y$. X$. N$.)'. Its definition is

'$:F(#D$+&3$=&1$. #D&2$+&3$=$.)$.',

where #D retrieves ∂ in fonts 1-4. Use of this procedure was made in drawing Figure 4. An ordinary derivative procedure is available and is named D. A "time derivative" procedure named T centers a dot above the character(s) specified by the argument. These and other procedures are built into the language (and plot system), are illustrated in Figures 6-8 and are defined in Appendix B. The user can define his own procedures simply by including statements of the form '$:(an, sentence)' in a call to the plot system. Here a and n identify the procedure and sentence is the SCROLL sentence comprising the procedure's definition.

Figure 6-Examples of SCROLL procedure calls illustrating the "bead" (oval), box, circle, diamond, hexagon and arrow procedures. The bead, box, circle, diamond and hexagon procedures produce figures of just the correct size to enclose the plot output resulting from their arguments

Figure 7-Examples of SCROLL procedure calls illustrating the fraction and simple fraction procedures, the square bracket, curly brace and parenthesis procedures, the ordinary and partial derivative procedures. The fraction and bracketing procedures are constructed with precisely the correct sizes to fit the plot output resulting from their arguments

Figure 8-Examples of SCROLL procedure calls illustrating the summation, product and square root procedures
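The '&n' argument mechanism behaves much like recursive macro substitution. The sketch below (ours) imitates it by textual replacement; real SCROLL interprets the body and arguments rather than splicing strings, so this shows the nesting behavior only in spirit:

    import re

    def call(body, args):
        # replace each '&n' with the nth argument of the call
        return re.sub(r'&([1-9])', lambda m: args[int(m.group(1)) - 1], body)

    F1 = '$+$(&1$)-$&$=$-&2$=$.'     # simple fraction body, from the text
    half = call(F1, ['1', '2'])       # the fraction one half
    print(call(F1, [half, 'X']))      # a fraction whose numerator is 1/2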
SCROLL process statements

Particularly in the execution of procedures it may be necessary to calculate the dimensions of a substring or argument. For example, if the numerator of a fraction has a different length than the denominator, the bar of the fraction should be as long as the larger of the two and the smaller should be centered with respect to the larger. Furthermore, both should be shifted far enough away from the bar to prevent intersection. Clearly the dimensions of the numerator and denominator vary from call to call and cannot be successfully defaulted ahead of time. One needs a processing facility to calculate dimensions on the spot. The $*...% control statement serves this purpose and has the form

'$*statement[{,|;}statement]...%'

Here "statement" can be

(1) a branch statement:
)±n meaning branch to the present character position ±n,
)n meaning branch to the nth position in the current sentence, where n can be either an integer constant, e.g., 10, or a SCROLL variable (defined below) whose value has been assigned by a q: (see (2));
) followed by anything else causes a return to the calling sentence (procedure or argument);

(2) an assignment statement

[q:] a[,b]... = expression

where the (optional) q: assigns the SCROLL variable q the current location counter (for later branching), and where a[,b]... can be SCROLL variables (see below); X, Y or Z, which add the value of the expression to the current horizontal, vertical and depth plot positions respectively; nP, which stores into the nth variable in the labeled common PLTPRM (see plot system memo, op. cit.); or 1, 2 or 3, which store the expression's value into the plot limit variables containing the x maximum, y maximum and y minimum respectively if that value exceeds the one already established (this allows one to measure plot output dimensions quickly);

(3) a conditional statement

? logical expression {, statement}...

where the logical expression has a form described below and the statements are any process statements. The logical expression dominates all statements represented by the ellipsis. A null logical expression is true if and only if plotting is not suppressed.

SCROLL variables

SCROLL variables are named by single letters of the alphabet, A through W, and letters A-Z preceded by # (i.e., #A). Those named by the letters A through W are local and automatic, that is, the values they assume in different procedures are unrelated and are lost upon return from the procedure in which they were set. Those named by #A through #Z are global and static, that is, their values are known to all procedures and can be modified in any procedure. When a SCROLL variable is used in an arithmetic context, it is assumed to be floating-point. When used in a logical context, a non-zero value is interpreted as true; a zero value as false.
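A sketch (ours) of the two storage classes just described: the letters A-W behave like automatic locals, one fresh set per procedure activation, while #A-#Z behave like statics shared by all activations. Reading an unset variable as 0.0 is our assumption, not a rule stated in the paper:

    class Scroll:
        def __init__(self):
            self.globals = {}            # '#A' .. '#Z'
            self.frames = [{}]           # one dict of 'A'..'W' per activation

        def get(self, name):
            store = self.globals if name.startswith('#') else self.frames[-1]
            return store.get(name, 0.0)  # unset variables read as 0.0 (assumed)

        def set(self, name, value):
            store = self.globals if name.startswith('#') else self.frames[-1]
            store[name] = float(value)

        def call(self, proc):
            self.frames.append({})       # locals are created per call...
            try:
                proc(self)
            finally:
                self.frames.pop()        # ...and lost on return

    s = Scroll()
    s.set('A', 1); s.set('#A', 1)
    s.call(lambda s: (s.set('A', 99), s.set('#A', 99)))
    print(s.get('A'), s.get('#A'))       # 1.0 99.0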
A SCROLL arithmetic expression consists of a single arithmetic primary, a unary arithmetic operator followed by an arithmetic primary, or two or more such expressions separated by binary arithmetic operators. An arithmetic primary can be a fixed point constant, e.g., 0.5 or 3, a SCROLL variable, a variable in common PLTPRM (nP retrieves the nth variable), the letters X, Y, Z which return the current x, y and z coordinates, a function reference or a parenthesized arithmetic expression. In arithmetic context, fixed-point constants and SCROLL variables are assumed to have floating-point format. Values of SCROLL variables and fixed point constants have units of the plot screen. However, if a fixed point constant is followed by the letter S, it is multiplied by the current character size before being used.

There are four binary arithmetic operators, + (addition), - (subtraction), * (multiplication) and / (division). There are two unary operators, + (used for emphasis only) and - (negation). Binary operators must occur between two primaries or between a primary and a unary operator which precedes a primary. Expressions are evaluated left to right subject to the following hierarchy. Functions are evaluated first, negation (unary -) second, multiplication and division third, and addition and subtraction last. The unary plus is ignored. Primaries consisting of parenthesized expressions are evaluated before arithmetic operations are performed and can be used as in algebra to override the hierarchy. Examples of legal arithmetic expressions are

    +A*B/(C-D)
    F+E(A$-Q$=X$.)*-D
    B($I$-Q$.)

The second and third examples contain function references described below. Note that A*-D is a legal expression, for the minus sign is interpreted as a unary operator signifying arithmetic negation rather than the binary operator indicating subtraction. Examples of illegal expressions are

    AB - C      either an operator is missing between A and B, or a SCROLL variable is misnamed (only one letter is allowed)
    (A*B*(C+D)  unpaired parenthesis
    A - *B      two binary operators must be separated by a primary
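The arithmetic hierarchy above can be demonstrated with a small recursive-descent evaluator (ours, not part of SCROLL). It supports only single-letter variables, single-digit constants, parentheses and the four binary operators; the S suffix, nP references and function calls are omitted:

    def evaluate(expr, env):
        toks = [c for c in expr if c != ' ']
        i = [0]

        def peek():
            return toks[i[0]] if i[0] < len(toks) else None

        def take():
            i[0] += 1
            return toks[i[0] - 1]

        def primary():
            if peek() == '(':
                take()
                v = addsub()
                take()               # consume the closing ')'
                return v
            if peek() == '-':        # unary minus: negation
                take()
                return -primary()
            if peek() == '+':        # unary plus is ignored, as in SCROLL
                take()
                return primary()
            ch = take()
            return env[ch] if ch.isalpha() else float(ch)

        def muldiv():                # * and / bind tighter than + and -
            v = primary()
            while peek() in ('*', '/'):
                if take() == '*':
                    v *= primary()
                else:
                    v /= primary()
            return v

        def addsub():
            v = muldiv()
            while peek() in ('+', '-'):
                if take() == '+':
                    v += muldiv()
                else:
                    v -= muldiv()
            return v

        return addsub()

    env = {'A': 6.0, 'B': 2.0, 'C': 5.0, 'D': 3.0}
    print(evaluate('+A*B/(C-D)', env))  # 6.0
    print(evaluate('A*-D', env))        # -18.0: this '-' is unary negation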
A SCROLL logical expression consists of a single logical primary, a logical negation operator followed by a logical primary, or two or more such expressions separated by binary logical operators. A logical primary has the value true or false and can be a fixed-point constant as above, a SCROLL variable, a logical expression enclosed in parentheses or two arithmetic expressions separated by a relational operator. In logical context, fixed-point constants and SCROLL variables are considered true if and only if they are nonzero. Zero values are false. The relational operators are > (greater than), < (less than), >= (greater than or equal to), <= (less than or equal to), = (equal to), and ¬= (not equal to). The binary logical operators are | (logical or) and & (logical and). There is one unary logical operator, ¬ (logical negation). The total hierarchy of operations for logical expressions is as follows

    Operation                    Precedence
    Evaluation of functions      8 (highest)
    Unary + and -                7
    * and /                      6
    + and -                      5
    >, <, >=, <=, = and ¬=       4
    ¬                            3
    &                            2
    |                            1 (lowest)

As with arithmetic expressions, parentheses can be used to override the hierarchy. Examples of valid logical expressions are as follows

    A>=B(ABC$.)
    F
    G&(A|B)|¬C

Process functions

Five functions are defined for the measurement of the plotted output resulting from SCROLL sentence interpretation: B returns the bottom and length of its argument; D returns the length, height, bottom and difference height-bottom of its argument; E returns the length alone; G is the same as D except that plotting can take place concurrently with measurement; and H returns the height and length of the argument. When used in an arithmetic statement other than a simple assignment, only one value (the first) is returned; in simple assignment statements, the first value returned is stored in the variable indicated, the second in the next variable in the alphabet, etc. The dimensions are all given in plot screen units. Vertical measurements (height and bottom) are made relative to the bottom of an on-line (not sub-superscripted) period. The functions are recursive. In addition a maximum function M is defined which returns the maximum value given by the SCROLL variables appearing as arguments (see the example below).

Example of use

As an example of process statement use, consider an extended definition of the fraction procedure F

'$* A=B(&1$.); C=H(&2$.); C=C+.2S; E=M(B,D);
$_,.5S$<$<$_(E-B)/2,-A+.2S
%&1$>$_0;E_(E-D)/2,-C
%&2$>$_E,-.5S$.'

Here for the sake of readability, numerous blanks have been included; ordinarily one deletes blanks to save space and execution time. The first statement stores in A and B the bottom and length of the first argument (the numerator), the second statement stores in C and D the height and length of the second argument (the denominator) and the third statement adds 0.2 of a character size to C. The fourth statement stores in E the larger of the lengths stored in B and D. The control statement $_,.5S in line 2 shifts the plot position up one half a character size. The two $< statements save the plot position for future reference and the $_(E-B)/2,-A+.2S shifts the plot position so that the numerator will be centered with respect to the length E and the bottom of the numerator will be 0.2 of a character size above the bar of the fraction. The percent sign in line 3 terminates the shift field, and &1 plots the numerator. The $> then returns to the plot position established before the numerator was shifted into place and the $_0;E_(E-D)/2,-C draws the bar of the fraction (length E) and then shifts so that the denominator is 0.2 of a character size below the bar of the fraction. The percent sign in line 4 terminates the shift field, &2 plots the denominator, $> returns to the saved plot position and $_E,-.5S shifts to the original vertical position and a horizontal position on the right of the fraction. Note that by saving and returning to a known reference position, one does not have to know where the numerator and denominator finish.
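The measurement arithmetic of this extended fraction procedure can be paraphrased as follows (a sketch, ours; coordinates are in plot-screen units with s the character size, and the variables mirror the stores A, B, C, D, E described above):

    def fraction_layout(num_bottom, num_len, den_height, den_len, s=1.0):
        a, b = num_bottom, num_len             # A, B = B(&1)
        c, d = den_height + 0.2 * s, den_len   # C, D = H(&2); C = C + .2S
        e = max(b, d)                          # E = M(B, D): bar length
        bar_y = 0.5 * s                        # the $_,.5S shift up
        return {
            'bar_from_to': ((0.0, bar_y), (e, bar_y)),
            'numerator_at': ((e - b) / 2, bar_y - a + 0.2 * s),
            'denominator_at': ((e - d) / 2, bar_y - c),
        }

    print(fraction_layout(num_bottom=0.0, num_len=3.0,
                          den_height=1.0, den_len=1.0))
    # the shorter denominator is centered under a bar sized to the numerator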
CONCLUSIONS

The SCROLL language has been used extensively in the preparation of lecture slides such as that printed in Figure 4 and in the labeling of graphs for publication as shown in Figure 9. The cost involved is dramatically less than that incurred when a draftsman is used. Specifically, the first equation in Figure 4 ran on an IBM 360/65 (using FORTRAN IV level G) for 0.226 seconds, the second for 0.64 seconds and the complete labeled graph in Figure 9 for one second. A second costs about ten cents on a typical IBM 360/65, and the cost of a frame on the Stromberg-Carlson 4060 microfilm recorder varies between eight and sixty cents according to load. This yields a cost for either slide of about thirty cents. Furthermore, the turn around time for the slides is generally less than a day. Compared to a draftsman, this is orders of magnitude cheaper and faster. Inasmuch as SCROLL with its recursive procedure and measurement capabilities now provides the user with the power hitherto only available from a draftsman, it is felt that SCROLL represents an important advance in the preparation of figures and, more generally, a substantial advance in the computer preparation of manuscripts for publication.

Figure 9-A graph labeled using simple SCROLL sentences

ACKNOWLEDGMENTS

The author is indebted to numerous colleagues at Bell Telephone Laboratories for helpful discussions. In particular, he would like to thank A. G. Fox, C. T. Schlegel, D. A. Vaughan and D. P. Allen for editorial suggestions and J. F. Gimpel and I. P. Polonsky for illuminating conversations about programming languages. The author is especially grateful to R. E. Griswold for helpful suggestions of all kinds and for finding numerous errors in the SCROLL implementation, to G. C. Freeman of the Yale Computer Center whose MAP-coded plotting routines served as starting points for some of the principal subprograms, to J. D. Beyer for very helpful discussions about syntactic analysers and to D. Barth of Yale University for the coordinates of most characters in the interpretive type fonts.

APPENDIX A-SCROLL SYNTAX

In this appendix, a formal definition of SCROLL syntax is given in terms of a meta-language much like that used in the IBM PL/I Reference Manual (C28-8201).

A.1 The Syntax Meta-language

The meta-language used below to define the syntax of SCROLL consists itself of literals, variables, expressions and operators much as SCROLL or any other language does. More specifically, a literal consists of any character which is preceded by a bar (|), a blank, a left square bracket ([) or a left curly brace ({) and is followed by a bar, a blank, a right square bracket (]) or a right curly brace (}). A variable is named by a lower-case letter of the English alphabet followed by any non-zero combination of such letters and underscores (_). A primary is a constant or a variable or a bracketed expression. Square brackets are used when the expression is optional; curly braces when required. A unit is a primary optionally followed by an ellipsis (...), the latter indicating that the primary is repeated zero or more times. An expression is one of three things: 1) a single unit, 2) a unit followed first by one or more blanks and then by another expression, or 3) a unit followed first by a bar (|) optionally preceded and followed by blanks and then by another expression. In the second case the values of the unit and expression are concatenated; in the third case they are considered to be alternatives. A variable is given the value of an expression when its name is followed first by a colon (:), then by one or more blanks and finally by the expression. A variable is given the value of an English phrase when the variable is followed by a colon, then one or more blanks and finally by the phrase. A phrase is distinguished from an expression in that the former contains undefined variables and always makes sense as a phrase. When a syntactic symbol | [ ] { } is to be used as a literal, it is underlined.
Hence | is a literal | rather than the operator separating alternatives. In particular the syntax of this meta-language is defined in terms of the language itself as follows:

    l_letter:   a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
    literal:    A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|l_letter|.|[|]|{|}|0|1|2|3|4|5|6|7|8|9|,|:|;|'|"|?|+|-|*|/|<|>|(|)|%|=|$|!|¢|@|&|#
    variable:   l_letter {l_letter|_}...
    primary:    literal | variable | [ expression ] | { expression }
    unit:       primary [...]
    expression: unit [[blank]... {| | blank} [blank]... unit]...
    definition: variable : [blank]... expression | an English phrase

A.2 SCROLL Syntax

    digit:           0|1|2|3|4|5|6|7|8|9
    integer:         digit...
    letter:          A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
    character:       letter|blank|digit|.|+|-|*|/|=|,|<|>|(|)|?|!|;|:|'|%|a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|&|¢|"
    simple_logogram: digit|letter|+|-|=|/|.|;|$|<|>|(|%|,
    delimiter:       % | null if a control statement follows
    constant:        integer [.] | [integer] . integer
    variable:        A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|#letter
    factor:          constant|variable|X|Y|Z|integer P
    function:        letter (sentence) | letter (variable [, variable]...)
    a_primary:       factor | (a_expression) | function
    a_term:          [+|-]... a_primary
    a_expression:    a_term [{+|-|*|/} a_term]...
    relation:        a_expression {>|<|=|<=|>=|¬=} a_expression
    l_primary:       factor | (l_expression) | relation
    l_term:          [¬]... l_primary
    l_expression:    l_term [{&||} l_term]...
    expression:      l_expression | a_expression
    angle_set:       a_expression | [a_expression] , a_expression
    coordinate_set:  [=] [a_expression] [, [a_expression]] [, a_expression]
    field:           [coordinate_set ;]... coordinate_set
    procedure_name:  letter [1|2|3|4|5|6|7|8|9]
    lhs:             integer P | variable | X | Y | Z
    process_statement: ) [[{+|-}] {integer|variable}] | [variable:]... lhs [, lhs]... = expression | ? l_expression {, process_statement}...
    control_field:   simple_logogram | , [blank|digit|+] | ¢ {integer|variable} | " angle_set delimiter | # {0|1|2|3|4} | @ {0|1|2|3|4} | ) [+|-] | & [digit] | ? [digit|;] | ! {1|2} | ¬ [;] | : procedure_name [( paragraph )] | :( procedure_name, sentence ) | :? | _ field [_ field]... delimiter | * process_statement [; process_statement]... delimiter
    control_statement: $ control_field
    plot_statement:  {character | # {character|#}}...
    sentence:        [plot_statement | control_statement]... $ .
    paragraph:       sentence [[blank] sentence]...

APPENDIX B-PROCEDURE DEFINITIONS

In this appendix, the built-in SCROLL procedures are defined. The name of the procedure is given first, then the call indicating the number of arguments for the procedure, a brief description of the procedure and finally the definition itself.

1. ALIGN-$:A9(&1 &2 &3), where &1 is to be centered below &3 and &2 is to be centered above &3 (used for summations and products).
'$*A=D(&3$.)$-$*D=D(&1$.); G=D(&2$.); J=M(A,D,G)$=$*C=C-E-.2; B=B-I+.2; ?¬,X,1=J,2=B+H,3=C+F,)$<$<$<$_(J-A)/2%&3$-$>$_(J-D)/2,C%&1$>$_(J-G)/2,B%&2$=$>$_J$.'

2. ARROW-$:A(&1), where an arrow of length &1 is to be drawn. Plotting terminates at the arrow's tip; the arrow is horizontal, pointing to the right if &1 is positive and to the left if &1 is negative.
'$*A=&1; B=.5; ?A>0,B=-B; $_>A_;B,.3_;B,-.3$.'

3. ARROW1-$:A1(&1), where an arrow of length &1 is to be drawn.
Plotting terminates at the arrow's tip; the arrow is vertical, pointing up if &1 is positive and pointing down if &1 is negative.
'$*A=&1; B=.3; ?A>0,B=-B; $_>,A_;.5,B_;-.5,B$.'

4. BEAD-$:B5(&1), where &1 is to be enclosed within a bead (oval).
'$*A=D(&1$.); D=D+.4; 2,H=D/2; 1,E=A+D; ?¬,3=-H,X=E,); I=14P; 14P=D$<$_,-H$($@2#E$)$_H$($_,D;A,D_)A%#F$)$@0$*14P=I; Y=H-(B+C)/2%&1$>$_E$.'

5. BOX-$:B(&1), where &1 is to be plotted with a box (rectangle) around it.
'$*A=D(&1$.); H=(B+C)/2; 2,F=HC+.4; 1,A=A+.8; ?¬,3=-F,X=A,)$_,-F;,F;A,F;A,-F;,-F$<$_.4,-H%&1$>$_A$.'

6. BRACES-$:B2(&1), where &1 is to be enclosed in curly braces.
'$:B9(&1$. #A$. #B$.)$.'

7. GENBRK-$:B9(&1 &2 &3), where &1 is to be enclosed in the bracketed pair given by &2 and &3.
'$*A=D(&1$.); 1,H=A+D/2+.6; ?¬,2=B+D/10,3=C-D/10,X=H,); I=14P; 14P=D; E=D/4+.2$<$<$_,C$<$@2&2$>$_H-E%&3$@0$*14P=I$>$_E&1$>$_H$.'

8. BRACKET-$:B1(&1), where &1 is to be enclosed in square brackets.
'$*A=D(&1$.); E=D/6+.1; 1,A=A+2*E+.6; 2,B=B+.2; 3,C=C-.2; ?¬,X=A,)$<$_E,C;,C;,B;E,B_A-E,B;A,B;A,C;A-E,C_E+.3%&1$>$_A$.'

9. CIRCLE-$:C(&1), where &1 is to be encircled.
'$*A=D(&1$.); D=M(A,D); 1,D=1.4*D+.4; 3,E=-D/2; ?¬,2=-E,X=D,); I=14P; 14P=D$<$<$_,E$@2#0$@0$>$*14P=I; X=(D-A)/2; Y=-(B+C)/2%&1$>$_D$.'

10. ORDINARY DERIVATIVE-$:D(&1 &2 &3), where the &3th derivative of &1 with respect to &2 is to be plotted.
'$:F(#8$+&3$=&1$. #8&2$+&3$=$.)$.'

11. DIAMOND-$:D1(&1), where &1 is to be enclosed within a diamond.
'$*A=D(&1$.); I=.75*D+A/2+.4; 2,J=2*I/3; 1,E=I+I; ?¬,3=-J,X=E,)$<$_;I,J;E;I,-J;_I-A/2,-(B+C)/2%&1$>$_E$.'

12. SIMPLE FRACTION-$:F1(&1 &2), where the ratio of &1 to &2 is to be plotted. No measurement of arguments is performed: they should have equal widths, no undershoots and full height.
'$($+$(&1$)-$=$)$-&2$=$.'

13. GENERAL FRACTION-$:F(&1 &2), where the ratio of &1 to &2 is to be plotted. The arguments can have different dimensions.
'$*A=D(&1$.); E=D(&2$.); I=M(A,E); ?¬,X,1=I,2=D+.7,3=.3-H,)$_,.5$<$<$_(I-A)/2,-C+.2%&1$>$_0;I_(I-E)/2,-F-.2%&2$>$_I,-.5$.'

14. GRID-$:G(&1 &2 &3 &4), where a rectangle is to be drawn containing &1 by &2 boxes each with length &3 and height &4. Plotting starts and terminates at the upper left-hand corner of the grid.
'$*E=&3; H=&4; 1,A=&1*E; 3,B=-&2*H; ?¬,); I,J=0$_,J;A,J$*J=J-H; ?B-J>.1,)-31$_I;I,B$*I=I+E; ?I-A>.1,)-27$.'

15. HEXAGON-$:H(&1), where &1 is to be enclosed within a hexagon.
'$*A=D(&1$.); 2,H=D/2+.4; F=H/3; A=A+.8; 1,G=2*F+A; ?¬,3=-H,X=G,)$<$_>F,H;A,;F,-H;-F,-H;-A,;-F,H_F+.4,-(B+C)/2%&1$>$_G$.'

16. INTEGRAL-$:I(&1 &2 &3), where &1 gives the lower limit of integration, &2 the upper limit and &3 the variable of integration.
'$@4#3$@0$<$_-.8,-.8$-&1$>$_-.3,1.8&2$=$_,-1#8&3$.'

17. MIDDLE-$:M(&1 &2), where &2 is to be centered in the box &1 characters long. One starts in the middle of the left side of the box and ends in the middle of the right side.
'$*E=&1; ?¬,X=E,); A=D(&2$.); X=(E-A)/2; Y=-(B+C)/2%&2$>$_E$.'

18. PAREN-$:P1(&1), where &1 is to be enclosed in parentheses.
'$:B9(&1$. ($. )$.)$.'

19. PARTIAL DERIVATIVE-$:P(&1 &2 &3), where the &3th partial derivative of &1 with respect to &2 is to be plotted.
'$:F(#D$+&3$=&1$. #D&2$+&3$=$.)$.'

20. SQUARE ROOT-$:R(&1), where the square root of &1 is to be plotted.
'$*A=D(&1$.); A=A+.4; E=.2$_,C+D/2_>E,.1;D/12+E,-D/2-E-.1;D/6,D+2*E_;A;A,-E_E,-B-E$*?¬,X=A,)$<&1$>$_A$.'
21. SUMMATION-$:S(&1 &2), where the summation of &1 to &2 is to be plotted. Note that &1 must contain the equal sign if desired, e.g., if one wants A = 1 below the sigma, &1 should equal 'A = 1$.'.
'$:A9(&1$. &2$. #6$.)$.'

22. TIME DERIVATIVE-$:T(&1), where a dot is to be plotted above the argument.
'$*A=G(&1$.); I,X=(A-E(.$.))/2; Y,E=B+.2%.$_I,-E$.'

23. PRODUCT-$:X(&1 &2), where the product (Π) is to be plotted from &1 to &2. &1 must contain the equal sign if desired, as for summation.
'$:A9(&1$. &2$. #9$.)$.'

AMTRAN-An interactive computing system

by JURIS REINFELDS, NIELS ESKELSON, HERMANN KOPETZ and GERHARD KRATKY

University of Georgia
Athens, Georgia

INTRODUCTION

The AMTRAN system (Automatic Mathematical TRANslator) is a multiterminal conversational mode computing system which enables the mathematically oriented user to interact directly with the computer in a natural mathematical language. The first version of AMTRAN was developed at the George C. Marshall Space Flight Center in Huntsville,1 and was implemented on IBM 1620 and 1130 computers and, as a time-sharing version, on a Burroughs 5500 computer. A modified 1620 console version is currently in use at the University of Georgia. In connection with the project of implementing a multiconsole version on an IBM 1130 computer, the AMTRAN language has been revised and formally defined at the University of Georgia Computer Center.2,3

The following objectives have been of primary importance in the development of the AMTRAN systems:

First, the initial use of a computer by the mathematically oriented nonprogrammer and scientist should be made easy by using a simple language as similar to mathematical notation as possible. The language should be designed for incremental learning so that a user may successfully use the system without knowing all of its details.

Second, the system should provide powerful programming capabilities for the solution of medium and large scale problems and complicated algorithms by the more experienced user or professional programmer.

Third, since AMTRAN was conceived as a special purpose language for mathematical and scientific use, the system should provide more flexibility in programming, debugging, and turnaround time compared to conventional computing systems.

Fourth, the new design and definition of the AMTRAN language should have as few restrictions, exceptional rules to remember, and departures from the well-known semantics of algebra as possible without reducing the power of the system. The system should be fully recursive and there should be no practical limitation to the length of variable and program names, the number of defined variables, the dimensions of arrays, etc.
DEFINITION OF THE PROGRAMMING LANGUAGE AMTRAN

The BNF notation is the only widely used formal method for the definition of a computer language. It has been derived from strictly structural considerations. The use of bracketed English words as metasymbols has made the notation look less formidable but, at the same time, has introduced semantic aspects which originally were not present. As long as these semantic aspects served only to characterize necessary structural categories, the method was as originally intended. As soon as strict semantic categories were established, with no structural characteristics, difficulties arose. This can be explained by a simple example. The ALGOL 60 report5 states:

    <variable identifier> ::= <identifier>
    <simple variable> ::= <variable identifier>

The introduction of the categories <simple variable> and <variable identifier> is necessary to distinguish between a strictly formal alphanumeric string (<identifier>) and a special type of a variable. This distinction is based only on the meaning assigned to a particular string; the structure remains the same. Therefore, the difference is not really expressed by the above method, since the distinction between the categories is left to the arbitrary interpretation of the metasymbols <identifier> and <simple variable> and is by no means formally defined. Omitting all these questionable 'semantic categories' would diminish the content of such a language definition considerably.

Realizing these difficulties, a new general method of formal definition of a mathematically oriented computer language was developed at the University of Georgia Computer Center.2 By introducing different levels of the language, one structural and several semantic levels, it is possible to distinguish between the structure of a language and the meaning attached to the structural elements. A large part of the semantics is systematized in notions like type, range, sign, dimension of numerical quantities, and binding power of mathematical operators. It is found to be sufficient to introduce a few well-defined semantic values. Each structural element is assigned a semantic and dimensional characteristic which carries the information associated with the structural unit. A structural unit and its semantic and dimensional characteristic are thus combined to form a constituent of the language. The idea of the new method of definition is to describe the language by setting up production rules for constituents2 rather than for structural units by themselves. This systematization and formalization covers a much larger part of the language than mere 'syntax' definitions or the definition by BNF notation.

Now it is possible to resolve the previous ALGOL example. The structures of <simple variable>, <variable identifier>, and <identifier> are the same, an alphanumeric string. But, since the semantic characteristics of a strictly formal alphanumeric string and of a simple variable are different, they form different constituents of the language. Therefore, the distinction between these categories is not left to the interpretation of the reader but is a part of the language definition.

BASIC FEATURES OF AMTRAN

Language features

The complete definition and a detailed description of the AMTRAN language appear in References 3 and 6. In this chapter, we shall describe the basic features of the language in accordance with the design principles and the aims of the AMTRAN systems as presented in the introduction.

Basic operators and functions

+, -, *, /, power, unary minus, SQRT, LN, ABS, EXP, LOG, SIN, COS, TAN, TANH, ARCTAN. These operators are intrinsic to the system and, by using the well-known semantics of algebra and the familiar names for the functions, they can be used immediately by the nonprogrammer.

Absence of dimension statements

Arrays are created and changed with complete freedom at run time.
Examples:

X = ARRAY (0, 5, 5) creates an array X with the values 0, 1, 2, 3, 4, 5

Y = X + 1 creates an array Y with the values 1, 2, 3, 4, 5, 6

even if Y had been a scalar or an array with another dimension before. Another operator to construct arrays is the concatenate operator &.

X = 0&1&2&3&4&5 creates the same array as array X in the previous example.

Absence of declaration statements

The type of a variable is automatically defined through the assignment statement until it is changed by another assignment statement. At execution time, the control routine for each operator checks the type and range, sign and dimension of the operands. A new concept is used for the handling of integers. They are stored and treated internally as real numbers, but a special rounding routine preserves their integer status through any arithmetical manipulations. Every time an operator requires an integer argument, the system examines the real representation of the value of the operand to determine whether it represents an integer number; an error message is typed if it does not. Thus, AMTRAN will give the right results for (−2.5)² as well as for (−2.5)³*SIN(π/6)+0.5.

Automatic array arithmetic

The basic operators and functions mentioned in 3.1.1 and the relational operators can be used not only for scalars, but also on arrays. Thus, the user may compute directly with the numerical representations of functions without writing loops. For example, the function

    y = (2/√π) e⁻ˣ sin x

is represented in AMTRAN as

    Y = 2/SQRT PI*EXP - X*SIN X

where X can represent an array of 100 equally spaced intervals generated by X = ARRAY (0, PI, 100). The resulting function Y is represented by an array of 101 numbers, where each Y-value is the value of the above function for the corresponding X-value. (A sketch of this array behavior follows at the end of this section.)

Conditional operators: IF, THEN, ELSE
Relational operators: GT, GE, EQ, LE, LT
Boolean operators: NOT, AND, OR

They are basically the same as in ALGOL 60 except that each IF has a corresponding FI (ALGOL 68 style) at the end of the conditional expression to avoid the dangling ELSE problem.

Unconditional branch and loop

The GO TO operator can be used for transfer of control to any numbered statement in a program. The argument of GO TO may be any expression which returns a scalar value. Non-integer values cause a warning message. The REPEAT operator is used to repeat a group of statements a specified number of times. These repeat loops can be nested arbitrarily.

Generality of operands

As a general rule, every operand or parameter in AMTRAN can be an expression, but the result of this expression has to fulfill the semantic requirements its operator asks for. Example: The third parameter of the ARRAY operator (number of intervals) may be an expression, but the result has to be an integer with the dimension one.

Powerful instruction set which can easily be expanded by the user

The philosophy of AMTRAN is that the user should be given a powerful basic set of instructions which are intrinsic to the system, together with a disc file library of instructions which are actually routines written in AMTRAN. In addition, the user can define his own high level operators by writing special AMTRAN routines. Operators for automatic numerical analysis (integration, derivation), satisfactory for routine situations, are also included.
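The ARRAY constructor and automatic elementwise arithmetic described above can be emulated in a present-day language. This sketch is ours, not part of AMTRAN; ARRAY(a, b, n) is taken, per the examples in the text, to produce n+1 equally spaced values from a to b:

    import math

    def ARRAY(a, b, n):
        step = (b - a) / n
        return [a + i * step for i in range(n + 1)]

    def lift(f):
        """Apply f elementwise to arrays, directly to scalars."""
        def g(*args):
            if any(isinstance(x, list) for x in args):
                n = max(len(x) for x in args if isinstance(x, list))
                pick = lambda x, i: x[i] if isinstance(x, list) else x
                return [f(*(pick(x, i) for x in args)) for i in range(n)]
            return f(*args)
        return g

    add = lift(lambda p, q: p + q)

    X = ARRAY(0, 5, 5)        # values 0, 1, 2, 3, 4, 5
    Y = add(X, 1)             # values 1, 2, 3, 4, 5, 6
    print(Y)

    # Y = 2/SQRT PI*EXP - X*SIN X over 100 intervals on [0, pi]:
    X = ARRAY(0.0, math.pi, 100)
    Y = lift(lambda x: 2 / math.sqrt(math.pi) * math.exp(-x) * math.sin(x))(X)
    print(len(Y))             # 101 values, one per grid point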
ASK- and TEACH-operators

The ASK-operator can be used to program a dialogue between the computer and the user. If the user is not satisfied with a system message, he can react with 'ASK,' and the system will respond with a more detailed message. This feature makes AMTRAN a truly conversational system. The experienced user does not have to spend time running through the questions and answers, which are of great importance for the average and beginning AMTRAN-user.

The TEACH-program allows the user to learn how to use AMTRAN directly on the console. The new AMTRAN-user does not have to take a course in programming; he need not study a programming language or learn how to read computer outputs. He can get started with a simple teach program on the console in a few minutes without having to learn complicated rules. If a problem occurs in using AMTRAN, the user can use the TEACH-operator and run the part of the teach program which refers to his problem.

Fully recursive programming capabilities

An example of an inherently recursive function is Ackermann's function A(M, N), defined over the positive integers and zero:

1. M = IN 1, N = IN 2
2. A = IF M EQ 0 THEN N + 1 ELSE IF N EQ 0 THEN A(M - 1, 1) ELSE A(M - 1, A(M, N - 1)) FI FI
3. NAME A

Statement 1 picks up two arguments which have to be provided by the program call. For example, A(2, 4) will give M = 2 and N = 4 upon execution of statement 1. (A transcription of this program into a present-day language is sketched at the end of this section.)

Call by symbol concept

The call by symbol concept allows the passing of executable strings as parameters to subprograms. This symbolic expression can contain variables local to the calling program and variables local to the subprogram. Every time the parameter is invoked within the subprogram, the symbolic expression is evaluated using the actual internal and external variables.

Three modes of operation

1. Execute mode: An interactive system must have an execution mode (or desk calculator mode) where each statement is executed immediately and control is returned to the keyboard. This is the default mode in AMTRAN.

2. Suppressed mode: The suppressed mode (delayed mode) allows the user to construct programs which are syntax checked and stored for execution at a later time.

3. Checking mode: AMTRAN has a third mode, the checking mode, which allows the user to execute parts of suppressed programs while they are being constructed. This is an important aid for online program construction.

Implementation on the IBM 1130 computer

This AMTRAN version was implemented on an IBM 1130 computer with 8K of core and a disc. A typewriter version is currently being tested, and a multiconsole version is under development. Some of the goals for the implementation were high speed, fully dynamic storage allocation, powerful editing and checking capabilities, and a completely re-entrant structure for multiconsole use with short response time. The length of programs, the dimension of arrays, the ratio of program area to variable area, and the number of defined variables and programs can be chosen with complete freedom as long as the available core storage, dependent upon implementation, is not exceeded. A special monitor system, independent of IBM software, has been developed for a more efficient use of the disc to obtain short response times.
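As a cross-check of the recursive Ackermann program above, here is the same definition transcribed into a present-day language (ours); the argument pickups IN 1 and IN 2 become ordinary parameters:

    # Ackermann's function, mirroring the three-statement AMTRAN program.
    def A(m, n):
        if m == 0:
            return n + 1
        if n == 0:
            return A(m - 1, 1)
        return A(m - 1, A(m, n - 1))

    print(A(2, 4))   # 11, for the call A(2, 4) used as an example in the text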
AMTRAN as a tool for pedagogic purposes

The interactive system AMTRAN is highly useful not only for research purposes but also as an educational tool. Lowering the level of difficulty in programming makes the computing facility available for students who are not basically interested in computer science but want to expand their understanding of mathematics or physics. Graphic display capabilities are very well suited for studying and demonstrating the behavior of functions. By interacting directly with the computer, the student also gets a better feeling for the kind of problems involved in programming a computer. A multiconsole system eliminates keypunch problems, and there are none of the time delay, debugging, or control language problems usually found in batch mode.

COMPARISON WITH OTHER HIGH LEVEL LANGUAGES

A comparative study between AMTRAN and other high level languages has to be divided into two parts. Only language features can be compared with batch mode languages, whereas the whole AMTRAN system can be taken into account for a comparison with other interactive systems.

Batch mode languages

Most likely, PL/1, ALGOL, or FORTRAN would be used to solve mathematical, technical or scientific problems in batch mode. A comparison with AMTRAN is not really feasible, as the basic philosophy and design principles of batch mode languages are completely different from AMTRAN's. Since language development goes more and more in the direction of powerful general purpose languages, it becomes more and more difficult, time consuming, and cumbersome for the nonprogrammer to make the first step towards use of a computer. But even for the experienced user, the three languages mentioned above do not provide the convenience and facilities in programming that AMTRAN does. They need type and dimension declarations; the flexibility of changing types and dimensions at run time is lacking; and they do not have AMTRAN's array handling capabilities.

PL/1, with its default philosophy, its various types of storage allocation, and certain automatic array arithmetic features, is close to AMTRAN's facilities and philosophy of programming convenience. On the other hand, it is inconvenient for the user to keep track of storage allocation problems in writing recursive or re-entrant programs or in using arrays with computed origin. PL/1 is truly a general purpose programming language. It is designed for programming needs in science, engineering, data management, and business. AMTRAN, on the other hand, is a special purpose programming language for mathematical, scientific, and technical applications and has not been designed to compete in general with a language like PL/1. It is not intended to handle extensive data; therefore, it does not need powerful I/O capabilities and sophisticated formatting facilities. But it can compete or even perform better within the limits of its special purpose.

Interactive systems

An interactive console system fills the gap between a desk top calculator and conventional batch mode computer programming. On one hand, it has to give immediate answers to simple requests; on the other hand, it has to provide powerful programming capabilities. A milestone in the development of interactive systems was the Culler-Fried System, which strongly influenced the early AMTRAN development. Prof. Culler's system represents a highly powerful multiconsole system. A disadvantage is that it does not stay close to the mathematical notation, and it is not simple and easy to learn.7

CPS is a conversational subset of PL/1. It has a calculator mode and a program mode and is a useful conversational system although it does not have AMTRAN's flexibility and power in array and function handling.

Iverson's language APL (A Programming Language) is a more formal approach to application programming.
It is particularly useful for classical mathematical applications, and it has been implemented as a powerful interactive time-sharing system. The language has features such as array arithmetic, programming capabilities, and a large set of primitive operators including matrix handling operators. An extensive set of special symbols is used instead of keywords; thus, a special typewriter is necessary. The proponents of APL claim that its source code is more efficient per statement than that of any other programming language. On the other hand, it is less readable. One has to learn special symbols instead of using mnemonics. For example, the quad (⎕) in APL is less informative as an output operator than the TYPE in AMTRAN. Major disadvantages are that APL does not follow classical mathematical notation, there is no hierarchy among operators, and the order of execution of statements is from right to left. This means the mathematician and scientific nonprogrammer must convert his formulas, written in normal textbook format, into the APL notation, and the programmer experienced in any other language is even more confused. APL is a language which requires both care and training for simple applications.

CONCLUSIONS

AMTRAN, as described in this paper, can be implemented on a small computer. Such a small computing system is a serious alternative to a console of a commercial time-sharing system. The present developments on the hardware market, a decrease in the price of small computers, make the outlook for problem-solving systems like AMTRAN particularly bright.

ACKNOWLEDGMENT

This work has been supported in part by National Science Foundation grant GJ-39.

BIBLIOGRAPHY

1 J REINFELDS, L A FLENKER, R N SEITZ, P L CLEM
Proceedings of the Annual Meeting of the ACM, p 469, Thompson Press, New York, 1966

2 G KRATKY, H KOPETZ
The semantics of a mathematically oriented computer language
Proceedings of the ACM National Conference, San Francisco, August 1969

3 G KRATKY
The definition of AMTRAN
Computer Center AMTRAN Report, University of Georgia, 1969-4

4 B D FRIED
On the user's point of view
Interactive Systems for Experimental Applied Mathematics, M Klerer and J Reinfelds, eds, Academic Press, New York, 1968

5 P NAUR et al
Revised report on the algorithmic language ALGOL 60
Communications of the ACM, Vol 6, pp 1-17, January 1963

6 N ESKELSON, H KOPETZ
AMTRAN manual for the IBM 1130 computer
To be published, 1969

7 W D ORR
Conversational computers
John Wiley and Sons, Inc, 1968

Computer network development to achieve resource sharing

by LAWRENCE G. ROBERTS and BARRY D. WESSLER

Advanced Research Projects Agency
Washington, D.C.

INTRODUCTION

In this paper a computer network is defined to be a set of autonomous, independent computer systems, interconnected so as to permit interactive resource sharing between any pair of systems. An overview of the need for a computer network, the requirements of a computer communication system, a description of the properties of the communication system chosen, and the potential uses of such a network are described in this paper.

The goal of the computer network is for each computer to make every local resource available to any computer in the net in such a way that any program available to local users can be used remotely without degradation. That is, any program should be able to call on the resources of other computers much as it would call a subroutine. The resources which can be shared in this way include software and data, as well as hardware. Within a local community, time-sharing systems already permit the sharing of software resources. An effective network would eliminate the size and distance limitations on such communities.

Currently, each computer center in the country is forced to recreate all of the software and data files it wishes to utilize. In many cases this involves complete reprogramming of software or reformatting the data files. This duplication is extremely costly and has led to considerable pressure for both very restrictive language standards and the use of identical hardware systems. With a successful network, the core problem of sharing resources would be severely reduced, thus eliminating the need for stifling language standards.
The basic technology necessary to construct a resource sharing computer network has been available since the advent of time-sharing. For example, a time-sharing system makes all its resources available to a number of users at remote consoles. By splicing two systems together as remote users of each other and permitting user programs to interact with two consoles (the human user and the remote computer), the basic characteristics of a network connection are obtained. Such an experiment was made between the TX-2 computer at Lincoln Lab and the Q-32 computer at SDC in 1966 in order to test the philosophy.1

Logically, such an interconnection is quite powerful and one can tap all the resources of the other system. Practically, however, the interconnection of pairs of computers with console grade communication service is virtually useless. First, the value of a network to a user is directly proportional to the number of other workers on the net who are creating potentially useful resources. A net involving only two systems is therefore far less valuable than one incorporating twenty systems. Second, the degradation in response caused by using telegraph or voice grade communication lines for network connections is significant enough to discourage most users. Third, the cost to fully interconnect computers nation-wide, either with direct leased lines or dial-up facilities, is prohibitive. All three problems are a direct result of the inadequacy of the available communication services.

DESIGN OF A NETWORK COMMUNICATIONS SERVICE

After the Lincoln-SDC network experiments, it was clear that a completely new communications service was required in order to make an effective, useful resource-sharing computer network. The communication pipelines offered by the carriers would probably have to be a component of that service but were clearly inadequate by themselves. What was needed was a message service where any computer could submit a message destined for another computer and be sure it would be delivered promptly and correctly. Each interactive conversation or link between two computers would have messages flowing back and forth similar to the type of traffic between a user console and a computer. Message sizes of from one character to 1000 characters are characteristic of man-machine interactions and this should also be true for that network traffic where a man is the end consumer of the information being exchanged. Besides having a heavy bias toward short messages, network traffic will also be diverse. With twenty computers, each with dozens of time-shared users, there might be, at peak times, one or more conversations between all 190 pairs of computers.
Reliability

Communications systems, being designed to carry very redundant information for direct human consumption, have, for computers, unacceptably high downtime and an excessively high error rate. The line errors can easily be fixed through error detection and retransmission; however, this does require the use of some computation and storage at both ends of each communication line. To protect against total line failures, there should be at least two physically separate paths over which to route each message. Otherwise the service will appear far too unreliable to count on, and users will continue to duplicate remote resources rather than access them through the net.

Responsiveness

In those cases where a user is making more or less direct use of a complete remote software system, the network must not cause the total round-trip delay to exceed the human short-term memory span of one to two seconds. Since the time-sharing systems probably introduce at least a one-second delay, the network's end-to-end delay should be less than ½ second. The network response should also be comparable, if possible, to using a remote display console over a private voice grade line, where a 50 character line of text (400 bits) can be sent in 200 ms. Further, if interactive graphics are to be available, the network should be able to send a complete new display page, requiring about 20 kilobits of information, within a second, and permit interrupts (10-100) to get through very quickly, hopefully within 30-90 ms. Where two programs are interacting without a human user being directly involved, the job will obviously get through sooner, the shorter the message delay. There is no clear critical point here, but if the communications system substantially slows up the job, the user will probably choose to duplicate the remote process or data at his site. For such cases, a reasonable measure by which to compare communications systems is the "effective bandwidth" (the data block length for the job divided by the end-to-end transmission delay).
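The "effective bandwidth" measure is simple enough to state exactly. The sketch below is illustrative only (the function name is ours); the numeric scenarios are the ones quoted above, and they show why a channel that is fast but slow to set up scores poorly on short interactive blocks:

    # Effective bandwidth, as defined in the text: the data block length
    # for the job divided by the end-to-end transmission delay.
    def effective_bandwidth(block_bits, delay_seconds):
        return block_bits / delay_seconds

    # Scenarios quoted in the text:
    # a 50 character line of text (400 bits) sent in 200 ms, and a
    # 20 kilobit display page delivered within one second.
    print(effective_bandwidth(400, 0.200))     # 2000 bits/sec
    print(effective_bandwidth(20000, 1.0))     # 20000 bits/sec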
Capacity

The capacity required is proportional to the number and variety of services available from the network. As the number of nodes increases, the traffic is expected to increase more than linearly, until new nodes merely duplicate available network resources. The number of nodes in the experimental network was chosen to: (1) involve as many computer researchers as possible to develop network protocol and operating procedures, (2) involve special facilities, such as the ILLIAC, to distribute their resources to a wider community, (3) involve as many disciplines of science as possible to measure the effect of the network on those disciplines, and (4) involve many different kinds of computers and systems to prove the generality of the techniques developed. The nodes of the network were generally limited to: (1) those centers for which the network would truly provide a cost benefit, (2) government-funded projects, because of the use of special rate government-furnished communications, and (3) ARPA-funded projects, where the problems of intercomputer accounting could be deferred until the network was in stable operation.

The size of the experimental network was chosen to be approximately 20 nodes nation-wide. It was felt that this would be large and diverse enough to be a useful utility and to provide enough traffic to adequately test the network communication system. For a 20 node network, the total traffic by mid-1971 at peak hours is estimated to be 200-800 KB (kilobits per second). This corresponds to an average outgoing traffic per node of 10-40 KB, or an average of 0.5-2 KB traffic both ways between each pair of nodes. Traffic between individual node-pairs, however, will vary considerably, from zero to 10 KB. The total traffic per node will also vary widely, perhaps from 5-50 KB. Variations of these magnitudes will occur in both space and time and, unless the communications system can reallocate capacity rapidly (in seconds), the users will find either the delay or the cost excessive. However, it is expected that the total capacity required for all 20 nodes will be fairly stable, smoothed out by having hundreds of active network users spread out across four time zones.

Cost

To be a useful utility, it was felt that communications costs for the network should be less than 25% of the computing costs of the systems connected through the network. This is in contrast to the rising costs of remote access communications, which often cost as much as the computing equipment. If we examine why communications usually cost so much, we find that it is not the communications channels per se, but our inefficient use of them, the switching costs, or the operations cost. To obtain a perspective on the price we commonly pay for communications, let us evaluate a few methods. As an example, let us use a distance of 1400 miles, since that is the average distance between pairs of nodes in the projected ARPA Network. A useful measure of communications cost is the cost to move one million bits of information, in cents per megabit. In Table I this is computed for each medium. It is assumed for leased equipment and data set rental that the usage is eight hours per working day.

TABLE I-Cost per Megabit for Various Communication Media, 1400 Mile Distance

Telegram            $3300.00   For 100 words at 30 bits/wd, daytime
Night Letter          565.00   For 100 words at 30 bits/wd, overnight delivery
Computer Console      374.00   18 baud avg. use,2 300 baud DDD service line & data sets only
TELEX                 204.00   50 baud teletype service
DDD (103A)             22.50   300 baud data sets, DDD daytime service
AUTODIN                 8.20   2400 baud message service, full use during working hours
DDD (202)               3.45   2000 baud data sets
Letter                  3.30   Airmail, 4 pages, 250 wds/pg, 30 bits/wd
W. U. Broadband         2.03   2400 baud service, full duplex
WATS                    1.54   2000 baud, used 8 hrs/working day
Leased Line (201)        .57   2000 baud, commercial, full duplex
Data 50                  .47   50 KB dial service, utilized full duplex
Leased Line (303)        .23   50 KB, commercial, full duplex
Mail DEC Tape            .20   2.5 megabit tape, airmail
Mail IBM Tape           .034   100 megabit tape, airmail

Special care has also been taken to minimize the cost of the multiplexor or switch. Previous store and forward systems, like DoD's AUTODIN system, have had such complex, expensive switches that over 95% of the total communications service cost was for the switches. Other switch services adding to the system's cost, deemed superfluous in a computer network, were long term message storage, multi-address messages, and individual message accounting. The final cost criterion was to minimize the communications software development cost required at each node site. If the network software could be generated centrally, not only would the cost be significantly reduced, but the reliability would also be significantly enhanced.
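The cents-per-megabit figures in Table I follow directly from the stated usage assumptions. The sketch below reproduces the arithmetic for a leased line; the $725 monthly tariff is a hypothetical figure chosen only to show that the formula lands near the tabulated $.57 for the 2000 baud Leased Line (201), not an actual 1970 tariff:

    # Cost per megabit under the table's assumption of eight hours of use
    # per working day (leased equipment and data set rental).
    def cents_per_megabit(monthly_cost_dollars, bits_per_second,
                          hours_per_day=8, working_days_per_month=22):
        bits_per_month = (bits_per_second * 3600
                          * hours_per_day * working_days_per_month)
        return 100.0 * monthly_cost_dollars / (bits_per_month / 1e6)

    # Hypothetical $725/month for a 2000 baud line over 1400 miles:
    print(cents_per_megabit(725.0, 2000))   # about 57 cents per megabit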
THE ARPA NETWORK

Three classes of communications systems were investigated as candidates for the ARPA Network: fully interconnected point to point leased lines, line switched (dial-up) service, and message switched (store and forward) service. For the kind of service required, it was decided, and later verified, that the message switched service provided greater flexibility, higher effective bandwidth, and lower cost than the other two systems.

The standard message switched service uses a large central switch with all the nodes connected to the switch via communication lines; this configuration is generally referred to as a Star. Star systems perform satisfactorily for large blocks of traffic (greater than 100 kilobits per message), but the central switch saturates very quickly for small message sizes. This phenomenon adds significant delay to the delivery of the message. Also, a Star design has inherently poor reliability, since a single line failure can isolate a node and the failure of the central switch is catastrophic.

An alternative to the Star, suggested by the Rand study "On Distributed Communications,"3 is a fully distributed message switched system. Such a system has a switch or store and forward center at every node in the network. Each node has a few transmission lines to other nodes; messages are therefore routed from node to node until reaching their destination. Each transmission line thereby multiplexes messages from a large number of source-destination pairs of nodes. The distributed store and forward system was chosen, after careful study, as the ARPA Network communications system. The properties of such a communication system are described below and compared with other systems.

A more complete description of the implementation, optimization, and initial use of the network can be found in a series of five papers, of which this is the first. The second paper, by Heart et al.,4 describes the design, implementation and performance characteristics of the message switch. The third paper, by Kleinrock,5 derives procedures for optimizing the capacity of the transmission facility in order to minimize cost and average message delay. The fourth paper, by Frank et al.,6 describes the procedure for finding optimized network topologies under various constraints. The last paper, by Carr et al.,7 is concerned with the system software required to allow the network computers to talk to one another. This final paper describes a first attempt at intercomputer protocol, which is expected to grow and mature as we gain experience in computer networking.

Network properties

The switching centers use small general purpose computers called Interface Message Processors (IMPs) to route messages, to error check the transmission lines, and to provide an asynchronous digital interface to the main (HOST) computer. The IMPs are connected together via 50 Kbps data transmission facilities, using common carrier (AT&T) point to point leased lines. The topology of the network transmission lines was selected to minimize cost, maximize growth potential, and yet satisfy all the design criteria.

Reliability

The network specification requires that the delivered message error rates be matched with computer characteristics, and that the down-time of the communications system be extremely small. Three steps have been taken to insure these reliability characteristics: (1) at least two transmission paths exist between any two nodes, (2) a 24 bit cyclic check sum is provided for each 1000 bit block of data, and (3) the IMP is ruggedized against external environmental conditions and its operation is independent of any electromechanical devices (except fans). The down-time of the transmission facility is estimated at 10-12 hours per year (no figures are currently available from AT&T). The duplication of paths should result in an average down-time between any pair of nodes, due to transmission failure, of approximately 30 seconds per year. The cyclic check sum was chosen based on the performance characteristics of the transmission facility; it is designed to detect long burst errors. The code is used for error detection only, with retransmission on an error. This check reduces the undetected bit error rate to one in 10^12, or about one undetected error per year in the entire network. The ruggedized IMP is expected to have a mean time to failure of 10,000 hours; less than one failure per year. The elimination of mass storage devices from the IMP results in lower cost, less down-time, and greater throughput performance of the IMP, but implies no long term message storage and no message accounting by the IMP. If these functions are later needed, they can be added by establishing a special node in the network. This node would accept accounting information from all the IMPs, and all the traffic destined for HOSTs which are down could also be routed to it. We do not believe these functions are necessary, but the network design is capable of providing them.
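The 24 bit cyclic check sum is the key to the detect-and-retransmit discipline just described. The sketch below shows the idea in software; the generator polynomial is an arbitrary illustrative value, since the paper does not state the one actually used, and the real computation is performed per 1000 bit block in the IMP hardware:

    # Illustrative 24-bit cyclic check sum with retransmission on error.
    # GENERATOR is an example degree-24 polynomial, not the network's own.
    GENERATOR = 0x1800063

    def crc24(data):
        reg = 0
        for byte in data:
            for i in range(7, -1, -1):
                reg = (reg << 1) | ((byte >> i) & 1)
                if reg & (1 << 24):      # degree-24 term set: reduce
                    reg ^= GENERATOR
        for _ in range(24):              # flush so every data bit is reduced
            reg <<= 1
            if reg & (1 << 24):
                reg ^= GENERATOR
        return reg & 0xFFFFFF

    block = b"up to 1000 bits of data"
    check = crc24(block)                 # shipped along with the block
    if crc24(block) != check:            # receiver recomputes on arrival
        pass                             # discard; sender retransmits on timeout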
Responsiveness

The target goal for responsiveness was .5 seconds transit time from any node to any other, for a 1000 bit (or less) block of information. The simulations of the network show a transit time for a 1 kilobit block of .1 seconds, until the network begins to saturate. After saturation the transit time rises quickly because of excessive queuing delays. However, saturation will hopefully be avoided by the net acting to choke off the inputs for short periods of time, reducing the buffer queues while not significantly increasing the delay.

Capacity

The capacity of the network is the throughput rate at which saturation occurs. The saturation level is a function of the topology and capacity of the transmission lines, the traffic distribution between pairs of nodes (the traffic matrix), and the average size of the blocks sent over the transmission lines. The analysis of capacity was performed by Network Analysis Corporation during the optimization of the network topology. As the analysis shows, the network has the ability to flexibly increase its capacity by adding additional transmission lines. The use of 108 and 230.4 KB communication services, where appropriate, considerably improves the cost-performance of the network.

Figure 1-ARPA network initial topology ($49K per node per year; 16 KB per node)

Figure 2-ARPA network expanded topology ($59K per node per year; 23 KB per node)

Configuration

Initial configuration of the ARPA Network is currently planned as shown in Figure 1. The communications circuits for this network will cost $49K per node per year, and the network can support an average traffic of 16 KB per node.
If the traffic builds up, additional communication lines can be added to expand the capacity as required. For example, if 23 KB per node is desired, the network can be expanded to the configuration shown in Figure 2 for an increase of only $10K per node per year. Expansion can be continued on this basis until a capacity of about 60 KB per node is achieved, at which point the IMPs would tend to saturate.

COMPARISON WITH ALTERNATIVE NETWORK COMMUNICATIONS SYSTEMS DESIGNS

For the purpose of this comparison the capacity required was set at 500 baud to 1 KB per node-pair. A minimal buffer for error checking and retransmission at every node is included in the cost of the systems. Two comparisons are made between the systems: the cost per megabit as a function of the delay, and the effective bandwidth as a function of the block size of the data. Several other functions were plotted and compared; the two chosen were deemed the most informative. The latter graph is particularly informative in showing the effect of using the network for short, interactive message traffic.

The systems chosen for the comparison were fully interconnected 2.4 KB and 19 KB leased line systems, Data-50 (the dial-up 50 KB service), DDD (the standard 2 KB voice grade dial-up system), Star networks using 19 KB and 50 KB leased lines into a central switch, and the ARPA Network using 50 KB leased lines.

The graph in Figure 3 shows the cost per megabit versus delay. The rectangle outlines the variation caused by a block size variation of 1 to 10 kilobits and a capacity requirement variation of 500 to 1000 baud. The dial-up systems were used in a way to minimize the line charges while keeping the delay as low as possible. The technique is to dial a system, then transmit the data accumulated during the dial-up (20 seconds for DDD, 30 seconds for Data-50). The dial-up systems are still very expensive and slow as compared with other alternatives. The costs of the ARPA Network are for optimally designed topologies. The 19 KB Star was eliminated because the system saturated just below 1 KB per node-pair, which did not provide adequate growth potential, though the cost was comparable to the ARPA Network. For the 50 KB Star network, the switch is assumed to be an average distance of 1300 miles from every node.

Figure 3-Cost vs delay for potential 20 node network designs (cost per megabit for node-pair average rates of .5 to 1 KB; delay in seconds for messages 1 to 10 kilobits long)

The graph in Figure 4 shows the effective bandwidth (block length/end-to-end delay) versus the block size of the data input to the network. The curves for the various systems are estimated for traffic rates of 500 to 1000 baud. The comparison shows the ARPA Net does very well at small block sizes, where most of the traffic is expected.

Figure 4-Effective bandwidth vs block size (twenty node network)

NETWORK PLANS

Use of the Network is broken into two successive phases: (1) Initial Research and Experimental Use, and (2) External Research Community Use. These phases are closely related to our plans for Network implementation.
The first phase, started in September 1969, involves the connection of 14 sites involved principally in computer research. These sites are current ARPA contractors who are working in the areas of Computer System Architecture, Information System Design, Information Handling, Computer Augmented Problem Solving, Intelligent Systems, as well as Computer Networks. This phase should be fully implemented by November 1970. The second phase involves the extension of the number of sites to about 20, to include ARPA-supported research disciplines.

Initial research and experimental use

During Phase One, the community of users will number approximately 2000 people. This community is involved primarily in computer science research, and all have ARPA-funded on-going research. The major use they will make of the network is the sharing of software resources and the educational experience of using a wider variety of systems than previously possible. The software resources available to the Network include advanced user programs, such as MATHLAB at MIT, Theorem Provers at SRI, Natural Language Processors at BBN, etc., and new system software and languages, such as LEAP, a graphic language at Lincoln Lab, and LC2, an interactive ALGOL system at Carnegie.

Another major use of the Network will be for accessing the Network Information Center (NIC). The NIC is being established at SRI as the repository of information about all systems connected into the Network. The NIC will maintain, update and distribute hard copy information to all users. It will also provide file space and a system for accessing and updating (through the net) dynamic information about the systems, such as system modifications, new resources available, etc.

The final major use of the Net during Phase One is for measurement and experimentation on the Network itself. The primary sites involved in this are BBN, who has responsibility for system development and system maintenance, and UCLA, who has responsibility for the Net measurement and modeling. All the sites will also be involved in the generation of intercomputer protocol, the language the systems use to talk to one another.

External research community use

During the time period after November 1970, additional nodes will be installed to take advantage of the Network in three other ARPA-funded research disciplines: Behavioral Science, Climate Dynamics and Seismology. The use of the Network at these nodes will be oriented more toward the distribution and sharing of stored data, and, in the latter two fields, the use of the ILLIAC IV at the University of Illinois.

The data sharing between data management systems or data retrieval systems will begin an important phase in the use of the Network. The concept of distributed data bases and distributed access to the data is one of the most powerful and useful applications of the network for the general data processing community. As described above, if the Network is responsive in the human time frame, data bases can be stored and maintained at a remote location rather than duplicated at each site where the data is needed. Not only can the data be accessed as if the user were local, but as a Network user he can also write programs on his own machine to collect data from a number of locations for comparison, merging or further analysis.

Because of widespread use of the ILLIAC IV, it will undoubtedly be the single most demanding node in the Network. Users will not only be sending requests for service but will also send very large quantities of input and output data, e.g., a 10^6 bit weather map, over the Net. Projected uses of the ILLIAC include weather and climate modeling, picture processing, linear programming, matrix manipulations, and extensive work in other areas of simulation and modeling.
In addition to the ILLIAC, the University of Illinois will also have a trillion bit mass store. An experiment is being planned to use 10% of the storage (100 billion bits) as archival storage for all the nodes on the Net. This kind of capability may help reduce the number of tape drives and/or data cells in the Network.

FUTURE

There are many applications of computers for which current communications technology is not adequate. One such application is the specialized customer service computer systems in existence or envisioned for the future; these services provide the customer with information or computational capability. If no commercial computer network service is developed, the future may be as follows. One can envision a corporate officer in the future having many different consoles in his office: one to the stock exchange to monitor his own company's and competitors' activities, one to the commodities market to monitor the demand for his product or raw materials, one to his own company's data management system to monitor inventory, sales, payroll, cash flow, etc., and one to a scientific computer used for modeling and simulation to help plan for the future. There are probably many people within that same organization who need some of the same services and potentially many other services. Also, though the data exists in digital form on other computers, it will probably have to be keypunched into the company's modeling and simulation system in order to perform analyses. The picture presented seems rather bleak, but it is just a projection of the service systems which have been developed to date. The organization providing the service has a hard time, too. In addition to collecting and maintaining the data, the service must have field offices to maintain the consoles and the communications multiplexors, adding significantly to its cost. A large fraction of that cost is for communications and consoles, rather than the service itself. Thus, the services which can be justified are very limited.

Let us now paint another picture, given a nationwide network for computer-to-computer communication. The service organization need only connect its computer into the net. It probably would not have any consoles other than for data input, maintenance, and system development. In fact, some of the service's data input may come from another service over the Net. Users could choose the service they desired based on reliability, cleanliness of data, and ease of use, rather than proximity or sole source. Large companies would connect their computers into the net and contract with service organizations for the use of those services they desired. The executive would then have one console, connected to his company's machine. He would have one standard way of requesting the services he desires, with a far greater number of services available to him. For the small company, a master service organization might develop, similar to today's time-sharing services, to offer console service to people who cannot afford their own computer. The master service organizations would be wholesalers of the services and might even be used by the large companies in order to avoid contracting with all the individual service organizations.

The kinds of services that will be available, and the cost and ultimate capacity required for such service, are difficult to predict.
It is clear, however, that if the network philosophy is adopted, and if it is made widely available through a common carrier, the communications system will not be the limiting factor in the development of these services, as it is now.

REFERENCES

1 T MARILL, L ROBERTS: Toward a cooperative network of time-shared computers. AFIPS Conference Proceedings, Nov 1966
2 P E JACKSON, C D STUBBS: A study of multi-access computer communications. AFIPS Conference Proceedings, Vol 34, p 491, 1969
3 P BARAN et al: On distributed communications. RAND Series Reports, Aug 1964
4 F E HEART, R E KAHN, S M ORNSTEIN, W R CROWTHER, D C WALDEN: The Interface Message Processor for the ARPA network. AFIPS Conference Proceedings, May 1970
5 L KLEINROCK: Analytic and simulation methods in computer network design. AFIPS Conference Proceedings, May 1970
6 H FRANK, I T FRISCH, W CHOU: Topological considerations in the design of the ARPA computer network. AFIPS Conference Proceedings, May 1970
7 S CARR, S CROCKER, V CERF: HOST-HOST communication protocol in the ARPA network. AFIPS Conference Proceedings, May 1970

The interface message processor for the ARPA computer network*

by F. E. HEART, R. E. KAHN, S. M. ORNSTEIN, W. R. CROWTHER and D. C. WALDEN
Bolt Beranek and Newman Inc.
Cambridge, Massachusetts

* This work was sponsored by the Advanced Research Projects Agency under Contract No. DAHC 15-69-C-0179.
** A bibliography of relevant references is included at the end of this paper; a more extensive list may be found in Cuadra, 1968.

INTRODUCTION

For many years, small groups of computers have been interconnected in various ways. Only recently, however, has the interaction of computers and communications become an important topic in its own right.**
In 1968, after considerable preliminary investigation and discussion, the Advanced Research Projects Agency of the Department of Defense (ARPA) embarked on the implementation of a new kind of nationwide computer interconnection known as the ARPA Network. This network will initially interconnect many dissimilar computers at ten ARPA-supported research centers with 50-kilobit common-carrier circuits. The network may be extended to include many other locations and circuits of higher bandwidth.

The primary goal of the ARPA project is to permit persons and programs at one research center to access data and use interactively programs that exist and run in other computers of the network. This goal may represent a major step down the path taken by computer time-sharing, in the sense that the computer resources of the various research centers are thus pooled and directly accessible to the entire community of network participants.

Study of the technology and tariffs of available communications facilities showed that use of conventional line switching facilities would be economically and technically inefficient. The traditional method of routing information through the common-carrier switched network establishes a dedicated path for each conversation. With present technology, the time required for this task is on the order of seconds. For voice communication, that overhead time is negligible, but in the case of many short transmissions, such as may occur between computers, that time is excessive. Therefore, ARPA decided to build a new kind of digital communication system employing wideband leased lines and message switching, wherein a path is not established in advance and each message carries an address. In this domain the project portends a possible major change in the character of data communication services in the United States.

In a nationwide computer network, economic considerations also militate against a wideband leased line configuration that is topologically fully connected. In a non-fully connected network, messages must normally traverse several network nodes in going from source to destination. The ARPA Network is designed on this principle and, at each node, a copy of the message is stored until it is safely received at the following node. The network is thus a store and forward system, and as such it must deal with problems of routing, buffering, synchronization, error control, reliability, and other related issues. To insulate the computer centers from these problems, and to insulate the network from the problems of the computer centers, ARPA decided to place identical small processors at each network node, to interconnect these small processors with leased common-carrier circuits to form a subnet, and to connect each research computer center into the net via the local small processor. In this arrangement the research computer centers are called Hosts and the small processors are called Interface Message Processors, or IMPs. (See Figure 1.)

Figure 1-Hosts and IMPs (50 kilobit circuits)

This approach divides the genesis of the ARPA Network into two parts: (1) design and implementation of the IMP subnet, and (2) design and implementation of protocols and techniques for the sensible utilization of the network by the Hosts.

Implementation of the subnet involves two major technical activities: providing 50-kilobit common-carrier circuits and the associated modems; and providing IMPs, along with software and interfaces to modems and Host computers. For reasons of economic and political convenience, ARPA obtained common-carrier circuits directly through government purchasing channels; AT&T (Long Lines) is the central coordinator, although the General Telephone Company is participating at some sites and other common carriers may eventually become involved. In January 1969, Bolt Beranek and Newman Inc. (BBN) began work on the design and implementation of IMPs; a four-node test network was scheduled for completion by the end of 1969, and plans were formulated to include a total of ten sites by mid-1970. This paper discusses the design of the subnet and describes the hardware, the software, and the predicted performance of the IMP. The issues of Host-to-Host protocol and network utilization are barely touched upon; these problems are currently being considered by the participating Hosts and may be expected to be a subject of technical interest for many years to come.

At this time, in late 1969, the test network has become an operating reality.
IMPs have already been installed at four sites, and implementation of IMPs for six additional sites is proceeding. The common carriers have installed 50-kilobit leased service connecting the first four sites and are preparing to install circuits at six additional sites. The design of the network allows for the connection of additional Host sites. A map of a projected eleven-node network is shown in Figure 2. The connections between the first four sites are indicated by solid lines; dotted lines indicate planned connections.

Figure 2-Network map

NETWORK DESIGN

The design of the network is discussed in two parts. The first part concerns the relations between the Hosts and the subnet, and the second part concerns the design of the subnet itself.

Host-subnet considerations

The basic notion of a subnet leads directly to a series of questions about the relationship between the Hosts and the subnet: What tasks shall be performed by each? What constraints shall each place on the other? What dependence shall the subnet have on the Hosts? In considering these questions, we were guided by the following principles: (1) The subnet should function as a communications system whose essential task is to transfer bits reliably from a source location to a specified destination. Bit transmission should be sufficiently reliable and error free to obviate the need for special precautions (such as storage for retransmission) on the part of the Hosts; (2) The average transit time through the subnet should be under a half second, to provide for convenient interactive use of remote computers; (3) The subnet operation should be completely autonomous. Since the subnet must function as a store and forward system, an IMP must not be dependent upon its local Host. The IMP must continue to operate whether the Host is functioning properly or not, and must not depend upon a Host for buffer storage or other logical assistance such as program reloading. The Host computer must not in any way be able to change the logical characteristics of the subnet; this restriction avoids the mischievous or inadvertent modification of the communication system by an individual Host user; (4) Establishment of Host-to-Host protocol, and the enormous problem of planning to communicate between different computers, should be an issue separated from the subnet design.

Messages, links, and RFNMs

In principle, a single transmission from one Host to another may range from a few bits, as with a single teletype character, up to arbitrarily many bits, as in a very long file. Because of buffering limitations in the subnet, an upper limit was placed on the size of an individual Host transmission; 8095 bits was chosen for the maximum transmission size. This Host unit of transmission is called a message. The subnet does not impose any pattern restrictions on messages; binary text may be transmitted. Messages may be of variable length; thus, a source Host must indicate the end of a message to the subnet.

A major hazard in a message switched network is congestion, which can arise either due to system failures or to peak traffic flow. Congestion typically occurs when a destination IMP becomes flooded with incoming messages for its Host. If the flow of messages to this destination is not regulated, the congestion will back up into the network, affecting other IMPs and degrading or even completely clogging the communication service. To solve this problem we developed a quenching scheme that limits the flow of messages to a given destination when congestion begins to occur or, more generally, when messages are simply not getting through.

The subnet transmits messages over unidirectional logical paths between Hosts known as links. (A link is a conceptual path that has no physical reality; the term merely identifies a message sequence.) The subnet accepts only one message at a time on a given link. Ensuing messages on that link will be blocked from entering the subnet until the source IMP learns that the previous message has arrived at the destination Host. When a link becomes unblocked, the subnet notifies the source Host by sending it a special control message known as Ready For Next Message (or RFNM), which identifies the newly unblocked link. The source Host may utilize its connection into the subnet to transmit messages over other links while waiting to send messages on the blocked links. Up to 63 separate outgoing links may exist at any Host site. When giving the subnet a message, the Host specifies the destination Host and a link number in the first 32 bits of the message (known as the leader). The IMPs then attend to route selection, delivery, and notification of receipt. This use of links and RFNMs also provides for IMP-to-Host delivery of sequences of messages in proper order. Because the subnet allows only one message at a time on a given link, Hosts never receive messages out of sequence.
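A Host-side view of this link discipline can be sketched in a few lines. The class below is illustrative only (the names are ours, and the Host-IMP transfer is elided); it enforces the two rules just stated: one outstanding message per link, and up to 63 outgoing links per Host site:

    # One message at a time per link; a link stays blocked until the RFNM
    # for its previous message arrives back from the subnet.
    MAX_LINKS = 63

    class OutgoingLinks:
        def __init__(self):
            self.blocked = set()           # link numbers awaiting a RFNM

        def submit(self, link, message):
            assert 0 <= link < MAX_LINKS
            if link in self.blocked:
                return False               # Host waits, or uses another link
            self.blocked.add(link)
            # ...hand (leader + message) across the Host-IMP interface...
            return True

        def rfnm(self, link):
            self.blocked.discard(link)     # unblocked; next message may go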
Host-IMP interfacing

Each IMP will initially service a single Host. However, we have made provision (both in the hardware and software) for the IMP to service up to four Hosts, with a corresponding reduction in the number of permitted phone line connections. Connecting an IMP to a wide variety of different Hosts requires a hardware interface, some part of which must be custom tailored to each Host. We decided, therefore, to partition the interface such that a standard portion would be built into the IMP, and would be identical for all Hosts, while a special portion of the interface would be unique to each Host. The interface is designed to allow messages to flow in both directions at once. A bit serial interface was designed partly because it required fewer lines for electrical interfacing and was, therefore, less expensive, and partly to accommodate conveniently the variety of word lengths in the different Host computers. The bit rate requirement on the Host line is sufficiently low that parallel transfers are not necessary. The Host interface operates asynchronously, each data bit being passed across the interface via a Ready For Next Bit/There's Your Bit handshake procedure. This technique permits the bit rate to adjust to the rate of the slower member of the pair and allows necessary interruptions, when words must be stored into or retrieved from memory. The IMP introduces between bits a (manually) adjustable delay that limits the maximum data rate; at present, this delay is set to 10 μsec. Any delay introduced by the Host in the handshake procedure further slows the rate.

System failure

Considerable attention has been given to the possible effects on a Host of system failures in the subnet. Minor system failures (e.g., temporary line failures) will appear to the Hosts only in the form of a reduced rate of service. Catastrophic failures may, however, result in the loss of messages or even in the loss of
Each IMP must operate unattended and reliably over long periods with minimal down time for maintenance and repair. We were convinced that it was important for each IMP in the subnet to operate autonomously, not only independently of Hosts, but insofar as possible from other IMPs as well; any dependency between one IMP and another would merely broaden the area jeopardized by one IMP's failure. Th~ need for reliability and autonomy bears directly upon the form of subnet communication. This section describes the process of message communication within the subnet. visible to the Host computers. Figures 3, 4, and 5 illustrate aspects of message handling. The transmitting Host attaches an identifying leader to the beginning of each message. The IMP forms a header by adqing further information for network use and attaches this header to each packet of the message. Each packet is individually routed from IMP-toIMP through the network toward the destination. At each IMP along the way, the transmitting hardware generates initial and terminal framing characters and parity check digits that are shipped with the packet and are used for error detection by the receiving hardware of the next IMP. Errors in transmission can affect a packet by destroying the framing and/or by modifying the data content. If the framing is disturbed in any way, the packet either will not be recognized or will be rejected by the receiver. In addition, the check digits provide protection against errors that affect only the data. The check digits can detect all patterns of four or fewer errors occurring within a packet, and any single error burst of a length less than twenty-four bits. An overwhelming majority of all other possible errors (all but about one in 224) are also detected. Thus, the mean time between undetected errors in the subnet should be on the order of years. As a packet moves through the subnet, each IMP stores the packet until a positive acknowledgment is returned from the succeeding IMP. This acknowledgment indicates that the message was received without error and was accepted. Once an IMP has accepted a packet and returned a positive acknowledgment, it holds onto that packet tenaciously until it in turn receives an acknowledgment from the succeeding IMP. Under no circumstances (except for Host or IMP malfunction) will an IMP discard a packet after it has generated a positive acknowledgment. However, an IMP is always free to refuse a packet by simply not returning a positive acknowledgment. It may do this for any of several reasons: the packet may have Message handling PACKET PACKET ACKI Hosts communicate with each other via a sequence of messages. An IMP takes in a message from its Host computer in segments, forms these segments into packets (whose maximum size is approximately 1000 bits), and ships the packets separately into the network. The destination IMP reassembles the packets and delivers them in sequence to the receiving Host, who obtains them as a single unit. This segmentation of a message during transmission is completely in- IMP ACK 2 ------- READY FOR NEXT MESSAGE __ ~C.!!_ IMP ------- READY FOR NEXT MESSAGE _ _A£.K.l_ Figure 4-RFNMs and acknowledgments The Interface l\lessage Processor been received in error, the IMP may be busy, the IMP buffer storage may be temporarily full, etc. At the transmitting IMP, such discard of a packet is readily detected by the absence of a returned acknowledgment within a reasonable time interval (e.g., 100 msec). 
Such packets are retransmitted, perhaps along a different route. Acknowledgments themselves are not acknowledged, although they are error checked in the usual fashion. Loss of an acknowledgment results in the eventual retransmission of the packet; the destination IMP sorts out the resulting duplication by using a message number and a packet number in the header. The packets of a message arrive at the destination IMP, possibly out of order, where they are reassembled. The header is then stripped off each packet and a leader, identifying the source Host and the link, followed by the reassembled message is then delivered to the destination Host as a single unit. See Figure 3. Routing algorithIn The routing algorithm directs each packet to its destination along a path for which the total estimated transit time is smallest. This path is not determined in advance. Instead, each IMP individually decides onto which of its output lines to transmit a packet addressed to another destination. This selection is made by a fast and simple table lookup procedure. For each possible destination, an entry in the table designates the appropriate next leg. These entries reflect line or IMP trouble, traffic congestion, and current subnet connectivity. This routing table is updated every halfsecond as follows: Each IMP estimates the delay it expects a packet to encounter in reaching every possible destination over each of its output lines. It selects the minimum delay estimate for each destination and periodically (about twice a second) passes these estimates to its immediate neighbors. Each IMP then constructs its own routing table by combining its neighbors' estimates with its own estimates of the delay to that neighbor. The estimated delay to each neighbor is based upon both queue lengths and the recent performance of the connecting communication circuit. For each destination, the table is then made to specify that selected output line for which the sum of the estimated delay to the neighbor plus the neighbor's delay to the destination is smallest. The routing table is consistently and dynamically updated to adjust for changing conditions in the network. The system is adaptive to the ups and downs of lines, IMPs, and congestion; it does not tequire the START F~G 555 END ERROR FR~ ~ SPARE BITS Figure 5-Format of packet on phone line IJJIP to know the topology of the network. In particular, an IMP need not even know the identity of its immediate neighbors. Thus, the leased circuits could be reconfigured to a new topology without requiring any changes to the IMPs. Subnet failures The network is designed to be largely invulnerable to circuit or IMP failure as well as to outages for maintenance. Special status and test messages are employed to help cope with various failures. In the absence of regular packets for transmission over a line, the IMP program transmits special hello packets at half-second intervals. The acknowledgment for a hello packet is an I heard you packet. A dead line is defined by the sustained absence (approximately 2.5 seconds) on that line of either received regular packets or acknowledgments; no regular packets will be routed onto a dead line, and any packets awaiting transmission will be rerouted. Routing tables in the network are adjusted automatically to reflect the loss. We require acknowledgment of thirty consecutive hello packets (an event which consumes at least 15 seconds), before a dead line is defined to be alive once again. 
Subnet failures

The network is designed to be largely invulnerable to circuit or IMP failure, as well as to outages for maintenance. Special status and test messages are employed to help cope with various failures. In the absence of regular packets for transmission over a line, the IMP program transmits special hello packets at half-second intervals. The acknowledgment for a hello packet is an I heard you packet. A dead line is defined by the sustained absence (approximately 2.5 seconds) on that line of either received regular packets or acknowledgments; no regular packets will be routed onto a dead line, and any packets awaiting transmission will be rerouted. Routing tables in the network are adjusted automatically to reflect the loss. We require acknowledgment of thirty consecutive hello packets (an event which consumes at least 15 seconds) before a dead line is defined to be alive once again.

A dead line may reflect trouble either in the communication facilities or in the neighboring IMP itself. Normal line errors caused by dropouts, impulse noise, or other conditions should not result in a dead line, because such errors typically last only a few milliseconds, and only occasionally as long as a few tenths of a second. Therefore, we expect that a line will be defined as dead only when serious trouble conditions occur. If dead lines eliminate all routes between two IMPs, the IMPs are said to be disconnected, and each of these IMPs will discard messages destined for the other. Disconnected IMPs cannot be rapidly detected from the delay estimates that arrive from neighboring IMPs. Consequently, additional information is transmitted between neighboring IMPs to help detect this condition. Each IMP transmits to its neighbors the length of the shortest existing path (i.e., number of IMPs) from itself to each destination. To the smallest such received number per destination, the IMP adds one. This incremented number is the length of the shortest path from that IMP to the destination. If the length ever exceeds the number of network nodes, the destination IMP is assumed to be unreachable and therefore disconnected.

Messages intended for dead Hosts (which are not the same as dead IMPs) cannot be delivered; therefore, these messages require special handling to avoid indefinite circulation in the network and spurious arrival at a later time. Such messages are purged from the network either at the source IMP or at the destination IMP. Dead Host information is regularly transmitted with the routing information. A Host computer is notified about another dead Host only when attempting to send a message to that Host.

An IMP may detect a major failure in one of three ways: (1) A packet expected for reassembly of a multiple packet message does not arrive. If a message is not fully reassembled in 15 minutes, the system presumes a failure. The message is discarded by the destination IMP, and both the source IMP and the source Host are notified via a special RFNM. (2) The Host does not take a message from its IMP. If the Host has not taken a message after 15 minutes, the system presumes that it will never take the message. Therefore, as in the previous case, the message is discarded and a special RFNM is returned to the source Host. (3) A link is never unblocked. If a link remains blocked for longer than 20 minutes, the system again presumes a failure; the link is then unblocked and an error message is sent to the source Host. (This last time interval is slightly longer than the others so that the failure mechanisms for the first two situations will have a chance to operate and unblock the link.)
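The line status mechanism described at the start of this section (hello packets every half second, a line dead after roughly 2.5 seconds of silence, revival after thirty consecutively acknowledged hellos) amounts to a small state machine. The following is an illustrative rendering, not the IMP program:

    class LineStatus:
        DEAD_AFTER = 2.5        # seconds without regular packets or acks
        REVIVE_AFTER = 30       # consecutive acknowledged hello packets

        def __init__(self, now=0.0):
            self.alive = True
            self.last_heard = now
            self.hellos_acked = 0

        def heard(self, now):   # any regular packet or acknowledgment
            self.last_heard = now
            if not self.alive:
                self.hellos_acked += 1          # an "I heard you" came back
                if self.hellos_acked >= self.REVIVE_AFTER:
                    self.alive = True           # at least 15 seconds later

        def check(self, now):   # run at each half-second hello interval
            if self.alive and now - self.last_heard > self.DEAD_AFTER:
                self.alive = False              # reroute any waiting packets
                self.hellos_acked = 0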
To improve hardware test capabilities, we built special crosspatching features into the IMP's interface hardware; these features allow program-controlled connection of output lines to corresponding input lines. These crosspatching features have been invaluable in testing IMPs before and during field installation, and they should continue to be very useful when troubles occur in the operating network. These hardware test features are employed bya special hardware test program and may also be employed by the operational program when a line difficulty occurs. The IMP includes a 512-word block of protected memory that secures special recovery programs. An IMP can recover from an IMP failure in two ways: (1) In the event of power failure, a power-fail interrupt permits the IMP to reach a clean stop before the program is destroyed. When power returns, a special automatic restart feature turns the IMP back on and restarts the program. (We considered several possibilities for handling the packets found in an IMP during a power failure and concluded that no plan to salvage the packets was both practical and foolproof. For example, we cannot know whether the packet in transmission at the time of failure successfully left the machine before the power failed. Therefore, we decided simply to discard all the packets and restart the program.) (2) The second recovery mechanism is a "watchdog timer", which transfers control to protected memory whenever the program neglects this timer for about one minute. In the event of such transfer, the program in unprotected memory is presumed to be destroyed (either through a hardware transient or a software failure). The program in protected memory sends a reload request down a phone line selected at random. The neighboring IMP responds by sending a copy of its whole program back on the phone line. A normal IMP would discard this message because it is too long, but the recovering IMP can use it to reload its program. Everything unique to a particular IMP must thus reside in its protected memory. Only one register (containing the IMP number) currently differs from IMP-to-IMP. The process of reloading, which requires a few seconds, can be tried repeatedly until successful; however, if after several minutes the program has not resumed operation, a later phase of the watchdog timer shuts off all power to the IMP. In addition to providing recovery mechanisms for both network and IMP failures, we have incorporated The Interface l\1essage Processor into the subnet a control center that monitors network status and handles trouble reports. The control center, located at a network node, initiates and follows up any corrective actions necessary for proper subnet functioning. Furthermore, this center controls and schedules any modifications to the subnet. In trospection Because the network is experimental in nature, considerable effort has been allocated to developing tools whereby the network can supply measures of its own performance. The operational IMP program is capable of taking statistics on its own performance on a regular basis; this function may be turned on and off remotely. The various kinds of resulting statistics, which are sent via the network to a selected Host for analysis, include "snapshots", ten-second summaries, and packet arrival times. Snapshots are summaries of the internal status of queue lengths and routing information. 
A synchronization procedure allows these snapshots, which are taken every half second, to occur at roughly the same time in all network IMPs; a Host receiving such snapshot messages could presumably build up an instantaneous picture of overall network status. Ten-second summaries include such IMP-generated statistics as the number of processed messages of each kind, the number of retransmissions, the traffic to and from the local Host, and so forth; this statistical data is sent to a selected Host every ten seconds. In addition, a record of actual packet arrival times on modem lines allows for the modeling of line traffic. (As part of its research activity, the group at UCLA is acting as a network measurement center; thus, statistics for analysis will normally be routed to the UCLA Host.)

Perhaps the most powerful capability for network introspection is tracing. Any Host message sent into the network may have a "trace bit" set in the leader. Whenever it processes a packet from such a message, the IMP keeps special records of what happens to that packet (e.g., how long the packet is on various queues, when it comes in and leaves, etc.). Each IMP that handles the traced packet generates special trace report messages that are sent to a specified Host; thus, a complete analysis of what has happened to that message can be made. When used in an orderly way, this tracing facility will aid in understanding, at a very detailed level, the behavior of the routing algorithms and the behavior of the network under changing load conditions.

Flexibility

Flexibility for modifications in IMP usage has been provided by several built-in arrangements: (1) provision within the existing cabinet for an additional 4K core bank; (2) modularity of the hardware interfaces; (3) provision for operation with data circuits of widely different rates; (4) a program organization involving many nearly self-contained subprograms; and (5) provision for Host-unique subprograms in the IMP program structure.

This last aspect of flexibility presents a somewhat controversial design choice. There are many advantages to keeping all IMP software nearly identical. Because of the experimental nature of the network, however, we do not yet know whether this luxury of identical programs will be an optimal arrangement. Several potential applications of "Host-unique" IMP software have been considered, e.g., using ASCII conversion routines in each IMP to establish a "Network ASCII" and possibly to simplify the protocol problems of each Host. As of now, the operational IMP program includes a structure that permits unique software plug-in packages at each Host site, but no plug-ins have yet been constructed.

THE HARDWARE

We selected a Honeywell DDP-516 for the IMP processor because we wanted a machine that could easily handle currently anticipated maximum traffic and that had already been proven in the field. We considered only economic machines with fast cycle times and good instruction sets. Furthermore, we needed a machine with a particularly good I/O capability that was available in a ruggedized version. The geographical proximity of the supplier to BBN was also a consideration. The basic machine has a 16-bit word length and a 0.96-μsec memory cycle. The IMP version is packaged in a single cabinet, and includes a 12K memory, a set of 16 multiplexed channels (which implement a 4-cycle data break), a set of 16 priority interrupts, a 100-μsec clock, and a set of programmable status lights. Also packaged within this cabinet are special modular interfaces for connecting the IMP to phone line modems and to Host computers; these interfaces use the same kind of 1 MHz and 5 MHz DTL packs from which the main machine is constructed. In addition, a number of features have been incorporated that make the IMP somewhat resilient to a variety of failures.
Also packaged within this cabinet are special modular interfaces for connecting the IMP to phone line modems and to Host computers; these interfaces use the same kind of 1 MHz and 5 MHz DTL packs from which the main machine is constructed. In addition, a number of features that have been incorporated make the IMP somewhat resilient to a variety of failures. Teletypes and high-speed paper tape readers which are attached to the IMPs are used only for mainte- 558 Spring Joint Computer Conference, 1970 Figure 6-The IMP nance, debugging, and system modification; in normal operation, the IMP runs without any moving parts except fans. Within the cabinet, space has been reserved for an additional 4K memory. Figure 6 is a picture of an IMP, and Figure 7 shows its configuration. Ruggedization of computer hardware for use in friendly environments is somewhat unusual; however, we felt that the considerable difficulty that IMP failures can cause the network justified this step. Although the ruggedized unit is not fully "qualified" to MIL specs, it does have greater resistance to temperature variance, mechanical shock and vibration, radio frequency interference, and power line noise. Weare confident that this ruggedization will increase the mean time to failure. Modular Host and modem interfaces allow an IMP to be individually configured for each network node. The modularity, however, does not take the form of pluggable units and, except for the possibility of adding interfaces into reserved frame space, recon- figuration is impractical. Various configurations allow for up to two Hosts and five modems, three Hosts and four modems, etc. Each modem interface requires approximately one-fourth the amount of logic used in the C.P.V. The Host interface is somewhat smaller (about one-sixth of the C.P.V.). Interfaces to the Host and to the modems have certain common characteristics. Both are full duplex, both may be crosspatched under program control to test their operation, and both function in the same general manner. To send a packet, the IMP program sets up memory pointers to the packet and then activates the interface via a programmable control pulse. The interface takes successive words from the memory using its assigned output data channel and transmits them bit-serially (to the Host or to the modem). When the memory buffer has thus been emptied, the interface notifies the program via an interrupt that the job has been completed. To receive information, the program first sets pointers to the allocated space in the memory into which the information is to flow. Vsing a control pulse it then readies the interface to receive. When information starts to arrive (here again bit-serially), it is assembled into 16-bit words and stored into the IMP memory. When either the allocated memory space is full or the end of the data train is detected, the interface notifies the program via an interrupt. The modem interfaces deal with the phone lines in terms of 8-bit characters; the interfaces idle by sending and receiving a sync pattern that keeps them in character sync. Bit sync is maintained by the modems themselves, which provide both transmit .and receive clocking signals to the interfaces. When the program initiates 12 K MEMORY (16 BIT WORD O. 96~s) 100 ~s CLOCK WATCHDOG TIMER STATUS LIGHTS pn~ER FAIL/ AUTO-RESTART Figure 7-IMP configuration The Interface Message Processor 559 transmission, the hardware first transmits a pair of initial framing characters (DLE, STX). 
Next, the text of the packet is taken word by word from the memory and shifted serially onto the phone line. At the end of the data, the hardware generates a pair of terminal framing characters (DLE, ETX) and shifts them onto the phone line. After the terminal framing characters, the hardware generates and transmits 24 check bits. Finally, the interface returns to idle (sync) mode. The hardware doubles any DLE characters within the binary data train (that is, transmits them twice), thereby permitting the receiving interface hardware to distinguish them from the terminal framing characters and to remove the duplicate. (A short sketch of this character-stuffing scheme appears following the program overview below.) Transmitted packets are of a known maximum size; therefore, any overflow of input buffer length is evidence of erroneous transmission. Format errors in the framing also register as errors. Check bits are computed from the received data and compared with the received check bits to detect errors in the text. Any of these errors set a flag and cause a program interrupt. Before processing a packet, the program checks the error flag to determine whether the packet was received correctly.

IMP SOFTWARE

Implementation of the IMPs required the development of a sophisticated operational computer program and the development of several auxiliary programs for hardware tests, program construction, and debugging. This section discusses in detail the design of the operational program and briefly describes the auxiliary software.

Operational program

The principal function of the operational program is the processing of packets. This processing includes segmentation of Host messages into packets for routing and transmission, building of headers, receiving, routing and transmitting of store and forward packets, retransmitting of unacknowledged packets, reassembling received packets into messages for transmission to the Host, and generating of RFNMs and acknowledgments. The program also monitors network status, gathers statistics, and performs on-line testing. This real-time program is an efficient, interrupt-driven, involute machine language program that occupies about 6000 words of memory. It was designed, constructed, and debugged over a period of about a year by three programmers.

The entire program is composed of twelve functionally distinct pieces; each piece occupies no more than one or two pages of core (512 words per page). These programs communicate primarily through common registers that reside in page zero of the machine and that are directly addressable from all pages of memory. A map of core storage is shown in Figure 8.

Figure 8-Map of core storage (1 page = 512 words; buffer space and protected pages are distinguished)

Seven of the twelve programs are directly involved in the flow of packets through the IMP: the task program performs the major portion of the packet processing, including the reassembly of Host messages; the modem programs (IMP-to-Modem and Modem-to-IMP) handle interrupts and resetting of buffers for the modem channels; the Host programs (IMP-to-Host and Host-to-IMP) handle interrupts and resetting of buffers for the Host channels, build packet headers during input, and construct RFNMs that are returned to the source Host during output; the time-out program maintains a software clock, times out unacknowledged packets for retransmission, and attends to infrequent events; the link program assigns and verifies message numbers and keeps track of links.
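The framing and DLE-doubling procedure described under THE HARDWARE above can be made concrete with a minimal sketch. This is our illustration, not BBN's code; the 24 check bits are omitted and well-formed frames are assumed.

```python
# Character stuffing as described for the modem interfaces: frames are
# DLE STX <data, with every DLE doubled> DLE ETX. (Check bits omitted.)
DLE, STX, ETX = 0x10, 0x02, 0x03

def frame(data: bytes) -> bytes:
    stuffed = data.replace(bytes([DLE]), bytes([DLE, DLE]))  # double DLEs
    return bytes([DLE, STX]) + stuffed + bytes([DLE, ETX])

def unframe(f: bytes) -> bytes:
    assert f[:2] == bytes([DLE, STX]) and f[-2:] == bytes([DLE, ETX])
    return f[2:-2].replace(bytes([DLE, DLE]), bytes([DLE]))  # remove duplicates

data = b"\x41\x10\x03\x10\x10"   # contains DLE and ETX bytes in the text
assert unframe(frame(data)) == data
```

Because every DLE inside the data train is doubled, a single DLE followed by ETX can only be the terminal framing pair, which is how the receiving hardware finds the end of a packet of unknown length.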
A background loop contains the remaining five programs and deals with initialization, debugging, testing, statistics gathering and tracing. After a brief description of data structures, we will discuss packet processing in some detail.

Buffer allocation, queues, and tables

The major system data structures (see Table I) consist of buffers and tables.

TABLE I-Program Data Structures

5000 WORDS - MESSAGE BUFFER STORAGE
120 WORDS - QUEUE POINTERS
300 WORDS - TRACE BLOCKS
100 WORDS - REASSEMBLY BLOCKS
150 WORDS - ROUTING TABLES
400 WORDS - LINK TABLES
300 WORDS - STATISTICS TABLES

The buffer storage space is partitioned into about 70 fixed length buffers, each of which is used for storing a single packet. An unused buffer is chained onto a free buffer list and is removed from this list when it is needed to store an incoming packet. A packet, once stored in a buffer, is never moved. After a packet has been successfully passed along to its Host or to another IMP, its buffer is returned to the free list. The buffer space is partitioned in such a way that each process (store and forward traffic, Host traffic, etc.) is always guaranteed some buffers. For the sake of program speed and simplicity, no attempt is made to retrieve the space wasted by partially filled buffers.

In handling store and forward traffic, all processing is on a per packet basis. Further, although traffic to and from Hosts is composed of messages, the IMP rapidly converts to dealing with packets; the Host transmits a message as a single unit but the IMP takes it one buffer at a time. As each buffer is filled, the program selects another buffer for input until the entire message has been provided for. These successive buffers will, in general, be scattered throughout the memory. An equivalent inverse process occurs on output to the Host after all packets of the message have arrived at the destination IMP. No attempt is ever made to collect the packets of a message into a contiguous portion of the memory.

Buffers currently in use are either dedicated to an incoming or an outgoing packet, chained on a queue awaiting processing by the program, or being processed. Occasionally, a buffer may be simultaneously found on two queues; this situation can occur when a packet is waiting on one queue to be forwarded and on another to be acknowledged. There are four principal types of queues (a small sketch of the buffer pool and the task-queue service order follows this list):

Task: Packets received on Host channels are placed on the Host task queue. All received acknowledgments, dead Host and routing information, I heard you and hello packets are placed on the system task queue; all other packets from the modems are placed on the modem task queue. The program services the system task queue first, then the Host task queue, and finally the modem task queue.

Output: A separate output queue is constructed for each modem channel and each Host channel. Each modem output queue is subdivided into an acknowledgment queue, a priority queue, a RFNM queue, and a regular message queue, which are serviced in that order. Each Host output queue is subdivided into a control message queue, a priority queue, and a regular message queue, which are also serviced in the indicated order.

Sent: A separate queue for each modem channel contains packets that have already been transmitted on that line but for which no acknowledgment has yet been received.

Reassembly: The reassembly queue contains those packets that are being reassembled into messages for the Host.
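A toy rendering of this structure, assuming nothing beyond what the text states (the roughly 70-buffer pool and the system/Host/modem service order); the real IMP chains buffers with in-core pointers rather than Python objects:

```python
# Our illustrative sketch of the fixed buffer pool and task-queue
# priorities described above.
from collections import deque

class BufferPool:
    def __init__(self, nbuffers: int = 70):
        self.free = deque(range(nbuffers))   # one fixed-size buffer per packet

    def acquire(self):
        # Packets are never moved once stored; a buffer simply changes queues.
        return self.free.popleft() if self.free else None

    def release(self, buf: int):
        self.free.append(buf)                # packet delivered: buffer reusable

system_task, host_task, modem_task = deque(), deque(), deque()

def next_task():
    """Service order from the text: system, then Host, then modem queue."""
    for q in (system_task, host_task, modem_task):
        if q:
            return q.popleft()
    return None
```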
Tables in core are allocated for the storage of queue pointers, for trace blocks, for reassembly information, for statistics, and for links. Most noteworthy of these is the link table, which is used at the source IMP for assignment of message numbers and for blocking and unblocking links, and at the destination IMP to verify message numbers for sequence control.

Packet flow and program structure

Figure 9 is a schematic drawing of packet processing; the processing programs are described below.

The Host-to-IMP routine (H→I) handles messages being transmitted from the local site. The routine uses the leader to construct a header that is prefixed to each packet of the message. It also creates a link for the message if necessary, blocks the link, puts the packets of the message on the Host task queue for further processing by the task routine, and triggers the programmable task interrupt. The routine then acquires a free buffer and sets up a new input. The routine tests a hardware trouble indicator, verifies the message format, and checks whether or not the destination is dead, the link table is full, or the link blocked. The routine is serially reentrant and services all Hosts connected to the IMP.

The Modem-to-IMP routine (M→I) handles inputs from the modems. This routine consists of several identical routines, one for each modem channel. (Such duplication is useful to obtain higher speed.) This routine sets up an input buffer (normally obtained from the free list), places the received packet on the appropriate task queue, and triggers the programmable task interrupt. Should no free buffers be available for input, the buffer at the head of the modem task queue is preempted. If the modem task queue is also empty, the received packet is discarded by setting up its buffer for input. However, a sufficient number of free buffers are specifically reserved to assure that received acknowledgments, routing packets, and the like are rarely discarded.

The task routine uses the header information to direct packets to their proper destination. The task routine is driven by the task interrupt, which is set whenever a packet is put on a task queue. The task routine routes packets from the Host task queue onto an output queue determined from the routing algorithm. For each packet on the modem task queue, the task routine first determines whether sufficient buffer space is available. If the IMP has a shortage of store and forward buffers, the buffers on the modem task queue are simply returned to the free list without further processing. Normally, however, an acknowledgment packet is constructed and put near the front of the appropriate modem output queue. The destination of the packet is then inspected. If the packet is not for the local site, the routing algorithm selects a modem output queue for the packet. If a packet for the local site is a RFNM, the corresponding link is unblocked and the RFNM is put on a queue to the Host. If the packet is not a RFNM, it is joined with others of the same message on the reassembly queue. Whenever a message is completely reassembled, the packets of the message are put on an output queue to the Host for processing by the IMP-to-Host routine. In processing the system task queue, the task routine returns to the free list those buffers from the sent queue that have been referenced by acknowledgments. Any packets skipped over by an acknowledgment are designated for retransmission.
Routing, I heard you, and hello packets are processed in a straightforward fashion.

The IMP-to-Modem routine (I→M) transmits successive packets from the modem output queue. After completing the output, this routine places any packet requiring acknowledgment on the sent queue.

The IMP-to-Host routine (I→H) sets up successive outputs of packets on the Host output queues and constructs a RFNM for each non-control message delivered to a Host. RFNM packets are returned to the system via the Host task queue.

Figure 9-Internal packet flow

The time-out routine is started every 25.6 msec (called the time-out period) by a clock interrupt. The routine has three sections: the fast time-out routine, which "wakes up" any Host or modem interrupt routine that has languished (for example, when the Host input routine could not immediately start a new input because of a shortage in buffer space); the middle time-out routine, which retransmits any packets that have been too long on a modem sent queue; and the slow time-out routine, which marks lines as alive or dead, updates the routing tables and does long term garbage collection of queues and other data structures. (For example, it protects the system from the cumulative effect of such failures as a lost packet of a multiple packet message, where buffers are tied up in message reassembly.) It also deletes links automatically after 15 seconds of disuse, after 20 minutes of blocking, or when an IMP goes down. These three routines are executed in a fixed repeating pattern, mostly fast executions with middle and slow executions interleaved at longer intervals, and, although they run off a common interrupt, are constructed to allow faster routines to interrupt slower ones should a slower routine not complete execution before the next time-out period. (A schematic sketch of this three-level structure appears under Control organization below.)

The link routine enters, examines, and deletes entries from the link table. A table containing a separate message number entry for many links to every possible Host would be prohibitively large. Therefore, the table contains entries only for each of 63 total outgoing links at any Host site. Hashing is used to speed accessing of this table, but the link program is still quite costly; it uses about ten percent of both speed and space in a conceptually trivial task.

Initialization and background loop

The IMP program starts in an initialization section that builds the initial data structures, prepares for inputs from modem and Host channels, and resets all program switches to their nominal state. The program then falls into the background loop, which is an endlessly repeated series of low-priority subroutines that are interrupted to handle normal traffic.

The programs in the IMP background loop perform a variety of functions: TTY is used to handle the IMP Teletype traffic; DEBUG, to inspect or change IMP core memory; TRACE, to transmit collected information about traced packets; STATISTICS, to take and transmit network and IMP statistics; PARAMETER-CHANGE, to alter the values of selected IMP parameters; and DISCARD, to throw away packets. Selected Hosts and IMPs, particularly the Network Measurement Center and the Network Control Center, will find it necessary or useful to communicate with one or more of these background loop programs. So that these programs may send and receive messages from the network, they are treated as "fake Hosts".
Rather than duplicating portions of the large IMP-to-Host and Host-to-IMP routines, the background loop programs are treated as if they were Hosts, and they can thereby utilize existing programs. The "For IMP" bit or the "From IMP" bit in the leader indicates that a given message is for or from a fake Host program in the IMP. Almost all of the background loop is devoted to running these programs.

The TTY program assembles characters from the Teletype into network messages and decodes network messages into characters for the Teletype; TTY's normal message destination is the DEBUG program at its own IMP; however, TTY can be made to communicate with any other IMP Teletype, any other IMP DEBUG program or any Host program with compatible format.

The DEBUG program permits the operational program to be inspected and changed. Although its normal message source is the TTY program at its own IMP, DEBUG will respond to a message of the correct format from any source. This program is normally inhibited from changing the operational IMP program; local operator intervention is required to activate the program's full power.

The STATISTICS program collects measurements about network operation and periodically transmits them to the Network Measurement Center. This program sends but does not receive messages. STATISTICS has a mechanism for collecting measurements over 10-second intervals and for taking half-second snapshots of IMP queue lengths and routing tables. It can also generate artificial traffic to load the network. When turned on, STATISTICS uses 10 to 20 percent of the machine capacity and generates a noticeable amount of phone line traffic.

Other programs in the background loop drive local status lights and operate the parameter change routine. A thirty-two word parameter table controls the operation of the TRACE and STATISTICS programs and includes spares for expansion; the PARAMETER-CHANGE program accepts messages that change these parameters.

Control organization

It is characteristic of the IMP system that many of the main programs are entered both as subroutine calls from other programs and as interrupt calls from the hardware. The resulting control structure is shown in Figure 10. The programs are arranged in a priority order; control passes upward in the chain whenever a hardware interrupt occurs or the current program decides that the time has come to run a higher priority program, and control passes downward only when the higher priority programs are finished. No program may execute either itself or a lower priority program; however, a program may freely execute a higher priority program. This rule is similar to the usual rules concerning priority interrupt routines. In one important case, however, control must pass from a higher priority program to a lower priority program-namely, from the several input routines to the TASK routine. For this special case, the computer hardware was modified to include a low-priority hardware interrupt that can be set by the program. When this interrupt has been honored (i.e., when all other interrupts have been serviced), the TASK routine is executed. Thus, control is directed where needed without violating the priority rules.

Some routines must occasionally wait for long intervals of time, for example, when the Host-to-IMP routine must wait for a link to unblock. Stopping the whole system would be intolerable; therefore, should the need arise, such a routine is dismissed, and the TIMEOUT routine will later transfer control to the waiting routine.
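The deferred wakeup just described, together with the three-level time-out routine of the previous subsection, can be sketched schematically. This is our illustration, not BBN's code; the cadence of middle and slow executions is an assumption made only so the sketch runs.

```python
# Schematic sketch of the 25.6-msec time-out routine: every tick runs the
# fast section; middle and slow sections run at longer, assumed intervals.
# In the real program faster sections may preempt slower ones; this
# sketch simply runs them in sequence.
tick = 0
waiting_routines = []     # routines dismissed while waiting (e.g., blocked link)

def fast():
    # Wake any languished Host/modem routine and any dismissed waiter.
    while waiting_routines:
        waiting_routines.pop()()

def middle():
    pass                  # retransmit packets too long on a sent queue

def slow():
    pass                  # mark lines alive/dead, update routing, collect garbage

def timeout_interrupt():
    global tick
    tick += 1
    fast()
    if tick % 4 == 0:     # assumed cadence, for illustration only
        middle()
    if tick % 32 == 0:
        slow()

for _ in range(64):       # simulate 64 clock interrupts
    timeout_interrupt()
```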
The control structure and the partition of responsibility among various programs achieve the following timing goals:

1. No program stops or delays the system while waiting for an event.
2. The program gracefully adjusts to the situation where the machine becomes compute-bound.
3. The Modem-to-IMP routine can deliver its current packet to the TASK routine before the next packet arrives and can always prepare for successive packet inputs on each line. This timing is critical because a slight delay here might require retransmission of the entire packet. To achieve this result, separate routines (one per phone line) interrupt each other freely after new buffers have been set up.
4. The program will almost always deliver packets waiting to be sent as fast as they can be accepted by the phone line.
5. Necessary periodic processes (in the time-out routine) are always permitted to run, and do not interfere with input-output processes.

Support software

Designing a real-time program for a small computer with many high rate I/O channels is a specialized kind of software problem. The operational program requires not only unusual techniques but also extra software tools; often the importance of such extra tools is not recognized. Further, even when these issues are recognized, the effort needed to construct such tools may be seriously underestimated. The development of the IMP system required the following kinds of supporting software:

1. Programs to test the hardware.
2. Tools to help debug the system.
3. A Host simulator.
4. An efficient assembly process.

So far, three hardware test programs have been developed. The first and largest is a complete program for testing all the special hardware features in the IMP. This program permits running any or all of the modem interfaces in a crosspatched mode; it even permits operating together several IMPs in a test mode. The second hardware test program runs a detailed phone line test that provides statistics on phone line errors. The final program simulates the modem interface check register whose complex behavior is otherwise difficult to predict.

The software debugging tools exist in two forms. Initially we designed a simple stand-alone debugging program with the capability to do little more than examine and change individual core registers from the console Teletype. Subsequently, we embedded a version of the stand-alone debugging program into the operational program. This operational debugging program not only provides debugging assistance at a single location but also may be used in network testing and network debugging.

Figure 10-Program control structure. Arrows indicate that control is passed with a subroutine call; control will eventually return back down the arrow. Note that the hardware interrupts and the lower priority routines can both call the same programs as subroutines. (Asterisks mark points that set the programmable hardware interrupt.)

The initial implementation of the IMP software took place without connecting to a true Host. To permit checkout of the Host-related portions of the operational program, we built a "Host Simulator" that takes input from the console Teletype and feeds the Host routines exactly as though the input had originated in a real Host. Similarly, output messages for a destination Host are received by the simulator and typed out on the console Teletype.
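A toy stand-in for such a Host simulator, assuming nothing about BBN's implementation beyond the description above; the function names and leader fields here are invented for illustration:

```python
# Our sketch: console text is wrapped in a minimal leader and handed to
# the same entry point a real Host's input would reach; messages headed
# for the "Host" are printed on the console instead.
def host_to_imp(leader: dict, data: bytes):
    print(f"[into IMP] leader={leader} data={data!r}")    # stand-in entry point

def simulator_send(line: str, dest: int = 2, link: int = 0):
    leader = {"dest": dest, "link": link, "trace": False}  # invented fields
    host_to_imp(leader, line.encode())

def simulator_deliver(leader: dict, data: bytes):
    # Output messages for the simulated Host are typed on the console.
    print(f"[to console] from {leader.get('source')}: {data.decode(errors='replace')}")

simulator_send("LOGIN TEST")
simulator_deliver({"source": 3}, b"READY")
```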
Without recourse to expensive additional peripherals, the assembly facilities on the DDP-516 are inadequate for a large program. (For example, a listing of the IMP program would require approximately 20 hours of Teletype output.) We therefore used other locally available facilities to assist in the assembly process. Specifically, we used a PDP-1 text editor to compose and edit the programs, assembled on the DDP-516, and listed the program on the SDS 940 line printer. Use of this assembly process required minor modification of existing PDP-1 and SDS 940 support software.

PROJECTED IMP PERFORMANCE

At this writing, the subnet has not yet been subjected to realistic load conditions; consequently, very little experimental data is available. However, we have made some estimates of projected performance of the IMP program and we describe these estimates below.

Host traffic and message delays

In the subnet, the Host-to-Host transit time and the round-trip time (for RFNM receipt) depend upon routing and message length. Since only one message at a time may be present on a given link, the reciprocal of the round-trip delay is the maximum message rate on a link. The primary factors affecting subnet delays are:

Propagation delay: Electrical propagation time in the Bell system is estimated to be about 10 µsec per mile. Cross country propagation delay is therefore about 30 msec.

Modem transmission delay: Because bits enter and leave an IMP at a predetermined modem bit rate, a packet requires a modem transmission time proportional to its length (20 µsec per bit on a 50-kilobit line).

Queueing delay: Time spent waiting in the IMP for transmission of previous packets on a queue. Such waiting may occur either at an intermediate IMP or in connection with terminal IMP transmissions into the destination Host.

IMP processing delay: The time required for the IMP program to process a packet is about 0.35 msec for a store-and-forward packet.

Because the queueing delay depends heavily upon the detailed traffic load in the network, an estimate of queueing delay will not be available until we gain considerable experience with network operation. In Table II, we show an estimate of the one-way and round-trip transit times and the corresponding maximum message rate per link, assuming the negligible queueing delay of a lightly loaded net. In this table, "minimum" delay represents a short hop between two nearby IMPs, and "maximum" delay represents a cross-country path involving five IMPs. In all cases the delays are well within the desired half-second goal.

TABLE II-Transit Times and Message Rates

                            Minimum     Maximum
SINGLE WORD MESSAGE
  Transit Time              5 msec      50 msec
  Round-trip                10 msec     100 msec
  Max. Message Rate/Link    100/sec     10/sec
SINGLE FULL PACKET MESSAGE
  Transit Time              45 msec     140 msec
  Round-trip                50 msec     190 msec
  Max. Message Rate/Link    20/sec      5/sec
8-PACKET MESSAGE
  Transit Time              265 msec    360 msec
  Round-trip                195 msec    320 msec
  Max. Message Rate/Link    5/sec       3/sec

In a lightly-loaded network with a mixture of nearby and distant destinations, an example of heavy Host traffic into its IMP might be that of 20 links carrying ten single-word messages per second and four more links, each carrying one eight-packet message per second.

Computational load

In general, a line fully loaded with short packets will require more computation than a line with all long packets; therefore the IMP can handle more lines in the latter case.
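A back-of-the-envelope version of this computational-load argument, using the 0.35-msec per-packet processing figure above and the seventeen-character line overhead quoted below; the accounting is our simplification, so it only approximates the utilization quoted for Figure 11:

```python
# Rough estimate (ours, not the paper's exact accounting): a line's CPU
# cost is its packet rate times the per-packet processing time.
LINE_RATE   = 50_000      # bits/sec in one direction
OVERHEAD    = 17 * 8      # line overhead per packet, bits (17 characters)
CPU_PER_PKT = 0.35e-3     # sec of processing per store-and-forward packet

def cpu_fraction(data_bits: int) -> float:
    packets_per_sec = LINE_RATE / (data_bits + OVERHEAD)
    return packets_per_sec * CPU_PER_PKT

# One-word (16-bit) messages on a fully loaded line: about 11.5%, the
# same order as the "slightly over 13 percent" cited for Figure 11 (the
# difference lies in acknowledgment and header detail we have ignored).
print(f"{cpu_fraction(16):.1%}")
```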
In Figure 11, we show a curve of the computational utilization of the IMP as a function of message length for fully-loaded communication lines. For example, a 50-kilobit line fully loaded in both directions with one-word messages requires slightly over 13 percent of the available IMP time. Since a line will typically carry a variety of different length packets, and each line will be less than fully loaded, the computational load per line will actually be much less.

Figure 11-IMP utilization (percent of IMP capacity due to store-and-forward traffic, in and out, versus message length in packets, for 50, 108, and 230.4 kilobit/sec lines)

Throughput is defined to be the maximum number of Host data bits that may traverse an IMP each second. The actual number of bits entering the IMP per second is somewhat larger than the throughput because of such overhead as headers, RFNMs, and acknowledgments. The number of bits on the lines is still larger because of additional line overhead such as framing and error control characters. (Each packet on the phone line contains seventeen characters of overhead, nine of which are removed before the packet enters an IMP.) The computational limit on the IMP throughput is approximately 700,000 bits per second. Figure 12 shows maximum throughput as a function of message length. The difference between the throughput curve and the line traffic curve represents overhead.

Figure 12-IMP throughput (line traffic and maximum throughput versus message length in packets)

DISCUSSION

In this section we state some of our conclusions about the design and implementation of the ARPA Network and comment on possible future directions.

We are convinced that use of an IMP-like device is a more sensible way to design networks than is use of direct Host-to-Host connection. First, for the subnet to serve a store-and-forward role, its functions must be independent of Host computers, which may often be down for extended periods. Second, the IMP program is very complex and is highly tailored to the I/O structure of the DDP-516; building such complex functions into special I/O units of each computer that might need network connection is probably economically inadvisable. Third, because of the desirability of having several Host computers at a given site connect to the network, it is both more convenient and more economic to employ IMPs than to provide all the network functions in each of the Host computers. The whole notion of a network node serving a multiplexing function for complexes of local Hosts and terminals lends further support to this conclusion. Finally, because we were led to a design having some inter-IMP dependence, we found it advantageous to have identical units at each node, rather than computers of different manufacture.

Considering the multiplexing issue directly, it now seems clear that individual network nodes will be connected to a wide variety of computer and terminal complexes. Even the initial ten-node ARPA Network includes one Host organization that has chosen to submultiplex several computers via a single Host connection to the IMP.
We are now studying variants of the IMP design that address this multiplexing issue, and we also expect to cooperate with other groups (such as at the National Physical Laboratory in England) that are studying such multiplexing techniques.

The increasing interest in computer networks will bring with it an expanding interaction between computers and communication circuits. From the outset, we viewed the ARPA Network as a systems engineering problem, including the portion of the system supplied by the common carriers. Although we found the carriers to be properly concerned about circuit performance (the basic circuit performance to date has been quite satisfactory), we found it difficult to work with the carriers cooperatively on the technical details, packaging, and implementation of the communication circuit terminal equipment; as a result, the present physical installations of circuit terminal equipment are at best inelegant and inconvenient. In the longer run, for reasons of economy, performance, and reliability, circuit terminal equipment probably should be integrated more closely with computer input/output equipment. If the carriers are unable to participate conveniently in such integrations, we would expect further growth of a competing circuit terminal equipment industry, and more prevalent common carrier provision of bare circuits.

Another aspect of network growth and development is the requirement to connect different rate communication circuits to IMP-like devices as a function of the particular application. In our own IMP design, although there are limitations on total throughput, the IMP can be connected to carrier circuits of any bit rate up to about 250 kilobits; similarly, the interface to a Host computer can operate over a wide range of bit rates. We feel that this flexibility is very important because the economics of carrier offerings, as well as the user requirements, are subject to surprisingly rapid change; even within the time period of the present implementation, we have experienced such changes.

At this point, we would like to discuss certain aspects of the implementation effort. This project required the design, development, and installation of a very complex device in a rather short time scale. The difficulty in producing a complex system is highly dependent upon the number of people who are simultaneously involved. Small groups can achieve complex optimizations of timing, storage, and hardware/software interaction, whereas larger groups can seldom achieve such optimizations on a reasonable time scale. We chose to operate with a very small group of highly talented people. For example, all software, including software tools for assembly, editing, debugging, and equipment testing as well as the main operational program, involved effort by no more than four people at any time. Since so many computer system projects involve much larger groups, we feel it is worth calling attention to this approach.
Turning to the future, we plan to work with the ARPA Network project along several technical directions: (1) the experimental operation of the network and any modifications required to tune its performance; (2) experimental operation of the network with higher bandwidth circuits, e.g., 230.4 kilobits; (3) a review of IMP variants that might perform multiplexing functions; (4) consideration of techniques for designing more economical and/or more powerful IMPs; and (5) participation with the Host organizations in the very sizeable problem of developing techniques and protocols for the effective utilization of the network.

On a more global level, we anticipate an explosive growth of message switched computer networks, not just for the interactive pooling of resources, but for the simple conveniences and economies to be obtained for many classes of digital data communication. We believe that the capabilities inherent in the design of even the present subnet have broad application to other data communication problems of government and private industry.

ACKNOWLEDGMENTS

The ARPA Network has in large measure been the conception of one man, Dr. L. G. Roberts of the Advanced Research Projects Agency; we gratefully acknowledge his guidance and encouragement. Researchers at many other institutions deserve credit for early interactions with ARPA concerning basic network design; in particular we would like to acknowledge the insight about IMPs provided by W. A. Clark.

At BBN, many persons contributed to the IMP project. We acknowledge the contributions of H. K. Rising, who participated in the subnet design and acted as associate project manager during various phases of the project; B. P. Cosell, who participated significantly in the software implementation; W. B. Barker and M. J. Thrope, who participated significantly in the hardware implementation; and T. Thatch, J. H. Geisman, and R. C. Satterfield, who assisted with various implementation aspects of the project. We also acknowledge the helpful encouragement of J. I. Elkind and D. G. Bobrow. Finally, we wish to acknowledge the hardware implementation contribution of the Computer Control Division of Honeywell, where many individuals worked cooperatively with us despite the sometimes abrasive pressures of a difficult schedule.
REFERENCES

1 P BARAN
On distributed communication networks
IEEE Transactions on Communication Systems Vol CS-12 March 1964
2 P BARAN S BOEHM P SMITH
On distributed communications
Series of 11 reports Rand Corporation Santa Monica California 1964
3 B W BOEHM R L MOBLEY
Adaptive routing techniques for distributed communication systems
Rand Corporation Memorandum RM-4781-PR 1966
4 Initial design for interface message processors for the ARPA computer network
Bolt Beranek and Newman Inc Report No 1763 1969
5 Specifications for the interconnection of a Host and an IMP
Bolt Beranek and Newman Inc Report No 1822 1969
6 G W BROWN J G MILLER T A KEENAN
EDUNET report of the summer study on information networks conducted by the interuniversity communications council
John Wiley and Sons New York 1967
7 S CARR S CROCKER V CERF
HOST-HOST communication protocol in the ARPA network
Proceedings of AFIPS SJCC 1970 In this issue
8 C A CUADRA
Annual review of information science and technology
Interscience Vol 3 Chapters 7 and 10 1968
9 D W DAVIES K A BARTLETT R A SCANTLEBURY P T WILKINSON
A digital communication network for computers giving rapid response at remote terminals
ACM Symposium on Operating System Principles 1967
10 D W DAVIES
The principles of a data communication network for computers and remote peripherals
Proceedings of IFIP Hardware Paper D11 1968
11 D W DAVIES
Communications networks to serve rapid-response computers
Proceedings of IFIP Edinburgh 1968
12 EIN software catalogue
EDUCOM 100 Charles River Park Boston (Regularly updated)
13 R R EVERETT C A ZRAKET H D BENINGTON
Sage-A data processing system for air defense
Proceedings of EJCC 1957
14 Policies and regulatory procedures relating to computer and communication services
Notice of Inquiry Docket No 16979 Washington D C 1966 Federal Communications Commission
15 L R FORD JR D R FULKERSON
Flows in networks
Princeton University Press 1962
16 H FRANK I T FRISCH W CHOU
Topological considerations in the design of the ARPA computer network
Proceedings of AFIPS SJCC 1970 In this issue
17 R T JAMES
The evolution of wideband services
IEEE International Convention Record Part I Wire and Data Communication 1966
18 S J KAPLAN
The advancing communication technology and computer communication systems
Proceedings of AFIPS SJCC Vol 32 1968
19 L KLEINROCK
Communication nets-Stochastic message flow and delay
McGraw-Hill Book Co Inc New York 1964
20 L KLEINROCK
Models for computer networks
Proceedings of International Communications Conference June 1969
21 L KLEINROCK
Optimization of computer networks for various channel cost functions
Proceedings of AFIPS SJCC 1970 In this issue
22 T MARILL
Cooperative networks of time-shared computers
Computer Corporation of America Preliminary Study 1966; also Private Report Lincoln Laboratory MIT Cambridge Massachusetts 1966
23 T MARILL L G ROBERTS
Toward a cooperative network of time-shared computers
Proceedings of AFIPS FJCC 1966
24 Biomedical communications network-Technical development plan
National Library of Medicine June 1968
25 Networks of computers symposium NOC-68
Proceedings of Invitational Workshop Ft Meade Maryland National Security Agency September 1969
26 Networks of computers symposium NOC-69
Proceedings of Invitational Workshop Ft Meade Maryland (in press) National Security Agency
27 M N PERRY W R PLUGGE
American Airlines 'Sabre' electronic reservations system
Proceedings of AFIPS WJCC 1961
28 L G ROBERTS
Multiple computer networks and intercomputer communication
ACM Symposium on Operating System Principles 1967
29 L G ROBERTS
Access control and file directories in computer networks
IEEE International Convention March 1968
30 L G ROBERTS
Resource sharing computer networks
IEEE International Conference March 1969
31 L G ROBERTS B D WESSLER
Computer network development to achieve resource sharing
Proceedings of AFIPS SJCC 1970 In this issue
32 R A SCANTLEBURY P T WILKINSON K A BARTLETT
The design of a message switching centre for a digital communication network
D26 Proceedings of IFIP Hardware Edinburgh 1968
33 K STEIGLITZ P WEINER D J KLEITMAN
The design of minimum cost survivable networks
IEEE Transactions on Circuit Theory Vol CT-16 November 1969
34 R SUNG J B WOODFORD
Study of communication links for the biomedical communication network
Aerospace Report No ATR-69(7130-06)-1 1969
35 W TEITELMAN R E KAHN
A network simulation and display program
Proceedings of 3rd Annual Princeton Conference on Information Sciences and Systems March 1969

Analytic and simulation methods in computer network design*

by LEONARD KLEINROCK
University of California
Los Angeles, California

INTRODUCTION

The Seventies are here and so are computer networks! The time sharing industry dominated the Sixties and it appears that computer networks will play a similar role in the Seventies. The need has now arisen for many of these time-shared systems to share each others' resources by coupling them together over a communication network, thereby creating a computer network. The mini-computer will serve an important role here as the sophisticated terminal as well as, perhaps, the message switching computer in our networks. It is fair to say that the computer industry (as is true of most other large industries in their early development) has been guilty of "leaping before looking"; on the other hand, "losses due to hesitation" are not especially prevalent in this industry. In any case, it is clear that much is to be gained by an appropriate mathematical analysis of performance and cost measures for these large systems, and that these analyses should most profitably be undertaken before major design commitments are made. This paper attempts to move in the direction of providing some tools for and insight into the design of computer networks through mathematical modeling, analysis and simulation. Frank et al.,4 describe tools for obtaining low cost networks by choosing among topologies using computationally efficient methods from network flow theory; our approach complements theirs in that we look for closed analytic expressions where possible. Our intent is to provide understanding of the behavior and trade-offs available in some computer network situations, thus creating a qualitative tool for choosing design options and not a numerical tool for choosing precise design parameters.

* This work was supported by the Advanced Research Projects Agency of the Department of Defense (DAHC15-69-C-0285).

THE ARPA EXPERIMENTAL COMPUTER NETWORK-AN EXAMPLE

The particular network which we shall use for purposes of example (and with which we are most familiar) is the Defense Department's Advanced Research Projects Agency (ARPA) experimental computer network.2 The concepts basic to this network were clearly stated in Reference 11 by L. Roberts of the Advanced Research Projects Agency, who originally conceived this system.
Reference 6, which appears in these proceedings, provides a description of the historical development as well as the structural organization and implementation of the ARPA network. We choose to review some of that description below in order to provide the reader with the motivation and understanding necessary for maintaining a certain degree of self containment in this paper.

As might be expected, the design specifications and configuration of the ARPA network have changed many times since its inception in 1967. In June, 1969, this author published a paper8 in which a particular network configuration was described and for which certain analytical models were constructed and studied. That network consisted of nineteen nodes in the continental United States. Since then this number has changed and the identity of the nodes has changed and the topology has changed, and so on. The paper by Frank et al.,4 published in these proceedings, describes the behavior and topological design of one of these newer versions. However, in order to be consistent with our earlier results, and since the ARPA example is intended as an illustration of an approach rather than a precise design computation, we choose to continue to study and therefore to describe the original nineteen node network in this paper.

Figure 1-Configuration of the ARPA network in Spring 1969

The network provides store-and-forward communication paths between the set of nineteen computer research centers. The computers located at the various nodes are drawn from a variety of manufacturers and are highly incompatible both in hardware and software; this in fact presents the challenge of the network experiment, namely, to provide effective communication among and utilization of this collection of incompatible machines. The purpose is fundamentally for resource sharing where the resources themselves are highly specialized and take the form of unique hardware, programs, data bases, and human talent. For example, Stanford Research Institute will serve the function of network librarian as well as provide an efficient text editing system; the University of Utah provides efficient algorithms for the manipulation of figures and for picture processing; the University of Illinois will provide through its ILLIAC IV the power of its fantastic parallel processing capability; UCLA will serve as network measurement center and also provide mathematical models and simulation capability for network and time-shared system studies. The example set of nineteen nodes is shown in Figure 1. The traffic matrix which describes the message flow required between various pairs of nodes is given in Reference 8 and will not be repeated here.

An underlying constraint placed upon the construction of this network was that network operating procedures would not interfere in any significant way with the operation of the already existing facilities which were to be connected together through this network. Consequently, the message handling tasks (relay, acknowledgment, routing, buffering, etc.) are carried out in a special purpose Interface Message Processor (IMP) co-located with the principal computer (denoted HOST computer) at each of the computer research centers. The communication channels are (in most cases) 50 kilobit per second full duplex telephone lines and only the IMPs are connected to these lines through data sets.
Thus the communication net consists of the lines, the IMPs and the data sets and serves as the store-and-forward system for the HOST computer network. Messages which flow between HOSTs are broken up into small entities referred to as packets (each of maximum size of approximately 1000 bits). The IMP accepts up to eight of these packets to create a maximum size message from the HOST. The packets make their way individually through the IMP network where the appropriate routing procedure directs the traffic flow. A positive acknowledgment is expected within a given time period for each inter-IMP packet transmission; the absence of an acknowledgment forces the transmitting IMP to repeat the transmission (perhaps over the same channel or some other alternate channel). An acknowledgment may not be returned, for example, in the case of detected errors or for lack of buffer space in the receiving IMP. We estimate the average packet size to be 560 bits; the acknowledgment length is assumed to be 140 bits. Thus, if we assume that each packet transmitted over a channel causes the generation of a positive acknowledgment packet (the usual case, hopefully), then the average packet transmission over a line is of size 350 bits. Much of the short interactive traffic is of this nature. We also anticipate message traffic of much longer duration and we refer to this as multi-packet traffic. The average input data rate to the entire net is assumed to be 225 kilobits per second and again the reader is referred to Reference 8 for further details of this traffic distribution. So much for the description of the ARPA network. Protocol and operating procedures for the ARPA computer network are described in References 1 and 6 in these proceedings in much greater detail. The history, development, motivation and cost of this network is described by its originator in Reference 12. Let us now proceed to the mathematical modeling, analysis and simulation of such networks.

ANALYTIC AND SIMULATION METHODS

The mathematical tools for computer network design are currently in the early stages of development. In many ways we are still at the stage of attempting to create computer network models which contain enough salient features of the network so that behavior of such networks may be predicted from the model behavior. In this section we begin with the problem of analysis for a given network structure. First we review the author's earlier analytic model of communication networks and then proceed to identify those features which distinguish computer networks from strict communication networks. Some previously published results on computer networks are reviewed and then new improvements on these results are presented. We then consider the synthesis and optimization question for networks. We proceed by first discussing the nature of the channel cost function as available under present tariff and charging structures. We consider a number of different cost functions which attempt to approximate the true data and derive relationships for optimizing the selection of channel capacities under these various cost functions. Comparisons among the optimal solutions are then made for the ARPA network. Finally in this section we consider the operating rules for computer networks. We present the results of simulation for the ARPA network regarding certain aspects of the routing procedure which provide improvements in performance.
A model from queueing theory-Analysis

In a recent work8 this author presented some computer network models which were derived from his earlier research on communication networks.7 An attempt was made at that time to incorporate many of the salient features of the ARPA network described above into this computer network model. It was pointed out that computer networks differ from communication networks as studied in Reference 7 in at least the following features: (a) nodal storage capacity is finite and may be expected to fill occasionally; (b) channel and modem errors occur and cause retransmission; (c) acknowledgment messages increase the message traffic rates; (d) messages from HOST A to HOST B typically create return traffic (after some delay) from B to A; (e) nodal delays become important and comparable to channel transmission delays; (f) channel cost functions are more complex. We intend to include some of these features in our model below.

The model proposed for computer networks is drawn from our communication network experience and includes the following assumptions. We assume that the message arrivals form a Poisson process with average rates taken from a given traffic matrix (such as in Reference 8), where the message lengths are exponentially distributed with a mean $1/\mu$ of 350 bits (note that we are only accounting for short messages and neglecting the multi-packet traffic in this model). As discussed at length in Reference 7, we also make the independence assumption which allows a very simple node by node analysis. We further assume that a fixed routing procedure exists (that is, a unique allowable path exists from origin to destination for each origin-destination pair).

From the above assumptions one may calculate the average delay $T_i$ due to waiting for and transmitting over the $i$th channel from Equation (1),

$$T_i = \frac{1}{\mu C_i - \lambda_i} \qquad (1)$$

where $\lambda_i$ is the average number of messages per second flowing over channel $i$ (whose capacity is $C_i$ bits per second). This was the appropriate expression for the average channel delay in the study of communication nets7 and in that study we chose as our major performance measure the message delay $T$ averaged over the entire network as calculated from

$$T = \sum_i \frac{\lambda_i}{\gamma}\,T_i \qquad (2)$$

where $\gamma$ equals the total input data rate. Note that the average on $T_i$ is formed by weighting the delay on channel $C_i$ with the traffic, $\lambda_i$, carried on that channel. In the study of communication nets7 this last equation provided an excellent means for calculating the average message delay. That study went on to optimize the selection of channel capacity throughout the network under the constraint of a fixed cost which was assumed to be linear with capacity; we elaborate upon this cost function later in this section.

The computer network models studied in Reference 8 also made use of Equation (1) for the calculation of the channel delays (including queueing) where parameter choices were $1/\mu = 350$ bits, $C_i = 50$ kilobits and $\lambda_i$ = average message rate on channel $i$ (as determined from the traffic matrix, the routing procedure, and accounting for the effect of acknowledgment traffic as mentioned in feature (c) above). In order to account for feature (e) above, the performance measure (taken as the average message delay $T$) was calculated from

$$T = \sum_i \frac{\lambda_i}{\gamma}\left(T_i + 10^{-3}\right) \qquad (3)$$

where again $\gamma$ = total input data rate and the term $10^{-3}$ = 1 millisecond (nominal) is included to account for the assumed (fixed) nodal processing time. The result of this calculation for the ARPA network shown in Figure 1 may be found in Reference 8.
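A small numerical rendering of Equations (1) through (3), with invented traffic figures, may help fix the units ($\lambda_i$ and $\gamma$ in messages per second, capacities in bits per second):

```python
# Our illustration of Equations (1)-(3): per-channel M/M/1 delay and the
# traffic-weighted network average with the 1-msec nodal processing term.
MU = 1 / 350.0               # 1/mean message length (messages per bit)

def channel_delay(lam, C):
    """Equation (1): T_i = 1/(mu*C_i - lambda_i), in seconds."""
    return 1.0 / (MU * C - lam)

def network_delay(lams, Cs, gamma):
    """Equation (3): T = sum (lambda_i/gamma) * (T_i + 1 msec)."""
    return sum((lam / gamma) * (channel_delay(lam, C) + 1e-3)
               for lam, C in zip(lams, Cs))

# Hypothetical three-channel path mix: 50-kilobit lines, 20 msgs/sec on
# each channel, total network input gamma = 30 msgs/sec.
print(network_delay([20, 20, 20], [50_000, 50_000, 50_000], gamma=30))
```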
The computer network model described above is essentially the one used for calculating delays in the topological studies reported upon by Frank, et al., in these proceedings.4 A number of simulation experiments have been carried out using a rather detailed description of the ARPA network and its operating procedure. Some of these results were reported upon in Reference 8 and a comparison was made there between the theoretical results obtained from Equation (3) and the simulation results. This comparison is reproduced in Figure 2 where the lowest curve corresponds to the results of Equation (3). Clearly the comparison between simulation and theory is only mildly satisfactory. As pointed out in Reference 8, the discrepancy is due to the fact that the acknowledgment traffic has been improperly included in Equation (3). An attempt was made in Reference 8 to properly account for the acknowledgment traffic; however, this adjustment was unsatisfactory.

Figure 2-Comparison between theory and simulation for the ARPA network (message delay in msec versus traffic in percent of full data rate; curves, top to bottom: theory corrected and with priorities; theory with correct acknowledge adjustment and propagation delays; simulation; theory without acknowledge adjustment)

The problem is that the average message length has been taken to be 350 bits and this length has averaged the traffic due to acknowledgment messages along with traffic due to real messages. These acknowledgments should not be included among those messages whose average system delay is being calculated and yet acknowledgment traffic must be included to properly account for the true loading effect in the network. In fact, the appropriate way to include this effect is to recognize that the time spent waiting for a channel is dependent upon the total traffic (including acknowledgments) whereas the time spent in transmission over a channel should be proportional to the message length of the real message traffic. Moreover, our theoretical equations have accounted only for transmission delays which come about due to the finite rate at which bits may be fed into the channel (i.e., 50 kilobits per second); we are required however to include also the propagation time for a bit to travel down the length of the channel. Lastly, an additional one millisecond delay is included in the final destination node in order to deliver the message to the destination HOST. These additional effects give rise to the following expression for the average message delay $T$:

$$T = 10^{-3} + \sum_i \frac{\lambda_i}{\gamma}\left[\frac{1}{\mu' C_i} + \frac{\lambda_i}{\mu C_i\,(\mu C_i - \lambda_i)} + P_{L_i} + 10^{-3}\right] \qquad (4)$$
In fact, one can go further and include the effect on message delay of the priority given to acknowledgment traffic in the ARPA network; if one includes this effect, one obtains another excellent fit to the simulation data labeled in Figure 2 as "theory corrected and with priorities." As discussed in Reference 8 one may generalize the model considered herein to account for more general message length distributions by making use of the Pollaczek-Khinchin formula for the delay Ti of a TABLE I-Publicly Available Leased Transmission Line Costs from Reference 3 Speed 45 56 75 2400 41 82 230 1 12 bps bps bps bps KB KB KB MB MB Cost / mile/month (normalized to 1000 mile distance) $ .70 .70 .77 1. 79 15.00 20.00 28.00 60.00 287.50 Analytic and Simulation Methods 573 TABLE 2-Estimated Leased Transmission Line Costs Based on Telpak Rates. * channel with capacity Ci, where the message lengths have mean 1/~ bits with variance 0"2, where Ai is the average message traffic rate and Pi = Ai/ ~C i which states Cost Cost/mile/rrwnth (termination + mileage) (normalized to 1000 mile distance) / month Speed (5) This expression would replace the first two terms in the parenthetical expression of Equation (4)'; of course by relaxing the assumption of an exponential distribution we remove the simplicity provided by the Markovian property of the traffic flow. This approach, however, should provide a better approximation to the true behavior when required. Having briefly considered the problem of analyzing computer networks with regard to a single performance measure (average message delay), we now move on to the consideration of synthesis questions. This investigation immediately leads into optimal synthesis procedures. Optimization for Synthesis various channel cost functions- We are concerned here with the optimization of the channel capacity assignment under various assumptions regarding the cost of these channels. This optimization must be made under the constraint of fixed cost. Our problem statement then becomes:* Select the {C i} so as. to minimize T subject to a fixed cost constraint 150 2400 7200 19.2 50 108 230.4 460.8 1.344 $ 77.50 232 810 850 850 2400 1300 1300 500 + $ . 12/mile + . 35/mile + . 35/mile + 2.1O/mile + 4.20/mile + 4.20/mile + 21.00/mile + 60.00/mile + 75.00/mile $ .20 .58 1.16 2.95 5.05 6.60 22.30 61.30 80.00 *These costs are, in some cases, first estimates and are not to be considered as quoted rates. must now attempt to find an analytic function which fits cost functions of this sort. Clearly that analytic function will depend upon the rate schedule available to the computer network designer and user. Many analytic fits to this function have been proposed and in particular in Reference 3 a fit is proposed of the form: Cost of line = O.IC i o.44 Simile/month (7) Based upon rates available for private line channels, Mastromonaco1o arrives at the following fit for line costs where he has normalized to a distance of 50 miles (rather than 1000 miles in Equation (7)) Cost of line = L08Ci o.316 (6) where, for simplicity, we use the expression in Equation (2) to define T. Weare now faced with choosing an appropriate cost function for the system of channels. We assume that the total cost of the network is contained in these channel costs where we certainly permit fixed termination charges, for example, to be included. In order to get a feeling for the correct form for the cost function let us examine some available data. 
Having briefly considered the problem of analyzing computer networks with regard to a single performance measure (average message delay), we now move on to the consideration of synthesis questions. This investigation immediately leads into optimal synthesis procedures.

OPTIMIZATION FOR SYNTHESIS-VARIOUS CHANNEL COST FUNCTIONS

We are concerned here with the optimization of the channel capacity assignment under various assumptions regarding the cost of these channels. This optimization must be made under the constraint of fixed cost. Our problem statement then becomes:*

    Select the {C_i} so as to minimize T, subject to a fixed cost constraint    (6)

where, for simplicity, we use the expression in Equation (2) to define T.

*The dual to this optimization problem may also be considered: "Select the {C_i} so as to minimize cost, subject to a fixed message delay constraint." The solution to this dual problem gives the optimum C_i with the same functional dependence on \lambda_i as one obtains for the original optimization problem.

We are now faced with choosing an appropriate cost function for the system of channels. We assume that the total cost of the network is contained in these channel costs, where we certainly permit fixed termination charges, for example, to be included. In order to get a feeling for the correct form for the cost function, let us examine some available data. From Reference 3 we have available the costing data which we present in Table 1.

TABLE 1-Publicly Available Leased Transmission Line Costs from Reference 3

    Speed       Cost/mile/month (normalized to 1000 mile distance)
    45 bps      $   .70
    56 bps          .70
    75 bps          .77
    2400 bps       1.79
    41 KB         15.00
    82 KB         20.00
    230 KB        28.00
    1 MB          60.00
    12 MB        287.50

From a schedule of costs for leased communication lines available at Telpak rates we have the data presented in Table 2.

TABLE 2-Estimated Leased Transmission Line Costs Based on Telpak Rates*

    Speed       Cost/month (termination + mileage)   Cost/mile/month (normalized to 1000 mile distance)
    150 bps     $  77.50 + $  .12/mile               $  .20
    2400 bps       232   +    .35/mile                  .58
    7200 bps       810   +    .35/mile                 1.16
    19.2 KB        850   +   2.10/mile                 2.95
    50 KB          850   +   4.20/mile                 5.05
    108 KB        2400   +   4.20/mile                 6.60
    230.4 KB      1300   +  21.00/mile                22.30
    460.8 KB      1300   +  60.00/mile                61.30
    1.344 MB      5000   +  75.00/mile                80.00

*These costs are, in some cases, first estimates and are not to be considered as quoted rates.

We have plotted these functions in Figure 3.

Figure 3-Scanty data on transmission line costs: $/mile/month normalized to 1000 mile distance

We must now attempt to find an analytic function which fits cost functions of this sort. Clearly that analytic function will depend upon the rate schedule available to the computer network designer and user. Many analytic fits to this function have been proposed; in particular, in Reference 3 a fit is proposed of the form

    Cost of line = 0.1 C_i^{0.44}  $/mile/month    (7)

Based upon rates available for private line channels, Mastromonaco10 arrives at the following fit for line costs, where he has normalized to a distance of 50 miles (rather than 1000 miles as in Equation (7)):

    Cost of line = 1.08 C_i^{0.316}  $/mile/month    (8)

Referring now to Figure 3, we see that the mileage costs from Table 2 rise as a fractional exponent of capacity (in fact, with an exponent of .815), suggesting the cost function shown in Equation (9) below:

    Cost of line = A C_i^{0.815}  $/mile/month    (9)

These last three equations give the dollar cost per mile per month where the capacity C_i is given in bits per second. It is interesting to note that all three functions are of the form

    Cost of line = A C_i^{a}  $/mile/month    (10)

It is clear from these simple considerations that the cost function appropriate for a particular application depends upon that application, and therefore it is difficult to establish a unique cost function for all situations. Consequently, we satisfy ourselves below by considering a number of possible cost functions and study the optimization conditions and results which follow from those cost functions. The designer may then choose from among these to match his given tariff schedule. These cost functions will form the fixed cost constraint in Equation (6). Let us now consider the collection of cost functions, and the related optimization questions.

1. Linear cost function. We begin with this case since the analysis already exists in the author's Reference 7, where the assumed cost constraint took the form

    D = \sum_i d_i C_i    (11)

where D = total number of dollars available to spend on channels, d_i = the dollar cost per unit of capacity on the ith channel, and C_i once again is the capacity of the ith channel. Clearly Equation (11) is of the same form as Equation (10) with a = 1, where we now consider the cost of all channels in the system as having a linear form. This cost function assumes that cost is strictly linear with respect to capacity; of course, this same cost function allows the assumption of a constant (for example, termination charges) plus a linear cost function of capacity. This constant (termination charge) for each channel may be subtracted out of the total cost, D, to create an equivalent problem of the form given in Equation (11). The constant, d_i, allows one to account for the length of the channel, since d_i may clearly be proportional to the length of the channel as well as anything else regarding the particular channel involved such as, for example, the terrain over which the channel must be placed. As was done in Reference 7, one may carry out the minimization given by Equation (6) using, for example, the method of Lagrangian undetermined multipliers.5 This procedure yields the following equation for the capacity

    C_i = \frac{\lambda_i}{\mu} + \frac{D_e}{d_i}\,\frac{\sqrt{\lambda_i d_i}}{\sum_j \sqrt{\lambda_j d_j}}    (12)

where

    D_e = D - \sum_i \frac{\lambda_i}{\mu}\, d_i > 0    (13)
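The square root assignment of Equation (12) is easy to compute. The sketch below (Python; parameter values purely illustrative) first gives each channel its average bit rate \lambda_i/\mu and then spreads the excess dollars D_e as in (12); it is only a sketch of the assignment rule, not of the full optimization.

    from math import sqrt

    def square_root_assignment(lam, d, D, mu):
        # lam[i]: channel traffic (msgs/sec); d[i]: dollars per unit capacity
        De = D - sum(l * di / mu for l, di in zip(lam, d))   # excess dollars, Eq. (13)
        assert De > 0, "D cannot even cover the average bit rates"
        norm = sum(sqrt(l * di) for l, di in zip(lam, d))
        return [l / mu + (De / di) * sqrt(l * di) / norm
                for l, di in zip(lam, d)]

    # Three channels, the third twice as costly per unit capacity:
    print(square_root_assignment(lam=[20.0, 10.0, 5.0], d=[1.0, 1.0, 2.0],
                                 D=40000.0, mu=1/350.0))

By construction the resulting capacities satisfy the linear cost constraint (11) exactly, since the surplus terms sum to D_e.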
When we substitute this result back into Equation (2) we obtain that the performance measure for such a channel capacity assignment is

    T = \frac{\bar{n}\left(\sum_i \sqrt{\lambda_i d_i/\lambda}\right)^2}{\mu D_e}    (14)

where

    \bar{n} = \frac{\sum_i \lambda_i}{\gamma} \equiv \frac{\lambda}{\gamma} = \text{average path length}    (15)

The resulting Equation (12) is referred to as the square root channel capacity assignment; this particular assignment first provides to each channel a capacity equal to \lambda_i/\mu, which is merely the average bit rate which must pass over that channel and which it must obviously be provided if the channel is to carry such traffic. In addition, surplus capacity (due to excess dollars, D_e) is assigned to this channel in proportion to the square root of the traffic carried, hence the name. In Reference 7 the author studied in great detail the particular case for which d_i = 1 (the case for which all channels cost the same regardless of length) and considerable information regarding topological design and routing procedures was thereby obtained. However, in the case of the ARPA network a more reasonable choice for d_i is that it should be proportional to the length L_i of the ith channel, as indicated in Equation (10) (for a = 1), which gives the per-mileage cost; thus we may take d_i = A L_i. This second case was considered in Reference 8 and also in Reference 9. The interpretation for these two cases regarding the desirability of concentrating traffic into a few large and short channels, as well as minimizing the average length of lines traversed by a message, was well discussed and will not be repeated here.

We observe in the ARPA network example, since the channel capacities are fixed at 50 kilobits, that there is no freedom left to optimize the choice of channel capacities; however, it was shown in Reference 8 that one could take advantage of the optimization procedure in the following way: The total cost of the network using 50 kilobit channels may be calculated. One may then optimize the network (in the sense of minimizing T) by allowing the channel capacities to vary while maintaining the cost fixed at this figure. The result of such optimization will provide a set of channel capacities which vary considerably from the fixed capacity network. It was shown in Reference 8 that one could improve the performance of the network in an efficient way by allowing that channel which required the largest capacity as a result of optimization to be increased from 50 kilobits in the fixed net to 250 kilobits. This of course increases the cost of the system. One may then provide a 250 kilobit channel for the second "most needy" channel from the optimization, increasing the cost further. One may then continue this procedure of increasing the needy channels to 250 kilobits while increasing the cost of the network and observe the way in which message delay decreases as system cost increases. It was found that natural stopping points for this procedure existed at which the cost increased rapidly without a similar sharp decrease in message delay, thereby providing some handle on the cost-performance trade-off. Since we are more interested in the difference between results obtained when one varies the cost function in more significant ways, we now study additional cost functions.

2. Logarithmic cost functions. The next case of interest assumes a cost function of the form

    D = \sum_i d_i \log_e(a C_i)    (16)

where D again is the total dollar cost provided for constructing the network, d_i is a coefficient of cost which may depend upon the length of the channel, a is an appropriate multiplier, and C_i is the capacity of the ith channel. We consider this cost function for two reasons: first, because it has the property that the incremental cost per bit decreases as the channel size increases; and secondly, because it leads to simple theoretical results. We now solve the minimization problem expressed in Equation (6) where the fixed cost constraint is now given through Equation (16). We obtain the following equation for the capacity of the ith channel
    C_i = \frac{\lambda_i}{\mu}\left[1 + \frac{1}{2\gamma\beta d_i} + \left(\frac{1}{\gamma\beta d_i} + \left(\frac{1}{2\gamma\beta d_i}\right)^2\right)^{1/2}\right]    (17)

In this solution the Lagrangian multiplier \beta must be adjusted so that Equation (16) is satisfied when C_i is substituted in from Equation (17). Note the unusual simplicity of the solution for C_i, namely that the channel capacity for the ith channel is directly proportional to the traffic carried by that channel, \lambda_i/\mu. Contrast this result with the result in Equation (12) where we had a square root channel capacity assignment.

If we now take the simple result given in Equation (17) and use it in Equation (2) to find the performance measure T we obtain

    T = \sum_i \left\{\frac{1}{2 d_i \beta} + \left[\left(\frac{1}{2 d_i \beta}\right)^2 + \frac{\gamma}{d_i \beta}\right]^{1/2}\right\}^{-1}    (18)

In this last result the performance measure depends upon the particular distribution of the internal traffic {\lambda_i/\mu} through the constant \beta, which is adjusted as described above.

3. The power law cost function. As we saw in Equations (7), (8), and (9), it appears that many of the existing tariffs may be approximated by a cost function of the form given in Equation (19) below:

    D = \sum_i d_i C_i^{a}    (19)

where a is some appropriate exponent of the capacity and d_i is an arbitrary multiplier which may of course depend upon the length of the channel and other pertinent channel parameters. Applying the Lagrangian again with an undetermined multiplier \beta, we obtain as our condition for an optimal channel capacity the following non-linear equation:

    C_i^{(a+1)/2}\,(1 - \rho_i) = \left(\frac{\lambda_i}{\mu\gamma\beta a\, d_i}\right)^{1/2}    (20)

where

    \rho_i = \lambda_i/\mu C_i    (21)

Once again, \beta must be adjusted so as to satisfy the constraint Equation (19). It can be shown that the left hand side of Equation (20) represents a convex function and that it has a unique solution for some positive value C_i. We assume that a is in the range 0 < a < 1, as suggested by the data in Figure 3. We may also show that the location of the solution to Equation (20) is not especially sensitive to the parameter settings. Therefore, it is possible to use any efficient iterative technique for solving Equation (20), and we have found that such techniques converge quite rapidly to the optimal solution.
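Bisection is one such iterative technique, since the left side of the condition is monotone in C_i beyond \lambda_i/\mu. The sketch below (Python; \beta and all other values purely illustrative, with Equation (20) rewritten in the equivalent form (\mu C_i - \lambda_i)\,C_i^{(a-1)/2} = [\lambda_i\mu/(\gamma\beta a d_i)]^{1/2}) solves for a single channel; the outer loop adjusting \beta to meet the cost constraint (19) is omitted.

    def solve_capacity(lam_i, d_i, a, beta, mu, gamma, tol=1e-9):
        rhs = (lam_i * mu / (gamma * beta * a * d_i)) ** 0.5
        f = lambda C: (mu * C - lam_i) * C ** ((a - 1.0) / 2.0) - rhs
        lo = lam_i / mu                   # capacity must at least carry the traffic
        hi = 2.0 * lo + 1.0
        while f(hi) < 0.0:                # expand until the root is bracketed
            hi *= 2.0
        while hi - lo > tol * hi:         # f increases in C_i beyond lam_i/mu
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if f(mid) < 0.0 else (lo, mid)
        return 0.5 * (lo + hi)

    print(solve_capacity(lam_i=20.0, d_i=1.0, a=0.815, beta=1e-6,
                         mu=1/350.0, gamma=25.0))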
4. Comparison of solutions for various cost functions. In the last three subsections we have considered three different cost functions: the linear cost function, the logarithmic cost function, and the power law cost function. Of course, we see immediately that the linear cost function is a special case (a = 1) of the power law cost function. We wish now to compare the performance and cost of computer networks under these various cost functions. We use for our example the ARPA computer network as described above. It is not obvious how one should proceed in making this comparison. However, we adopt the following approach in an attempt to make some meaningful comparisons.

Figure 4-Average message delay at fixed cost as a function of data rate for the power law and linear cost functions (average message delay in msec versus percentage of full data rate)

We consider the ARPA network at a traffic load of 100% of the full data rate, namely 225 kilobits per second (denoted by \gamma_0). For the 50 kilobit net shown in Figure 1 we may calculate the line costs from Table 2 (eliminating the termination charges since we recognize this causes no essential …

… but certainly implies the use of thresholds on the percentage change of estimated delays. When these thresholds are crossed in an IMP, then routing information is transferred to that IMP's nearest neighbors. This asynchronous mode of updating implies a large overhead for updating, and it remains to be seen whether the advantages gained through this more elaborate updating method overcome the disadvantages due to software costs and cycle-stealing costs for updating.

We may observe the difference in performance between synchronous and asynchronous updating through the use of simulation, as shown in Figure 7. In this figure we plot the average time delay T versus the average path length for messages under various routing disciplines. We observe immediately that the three points shown for asynchronous updating are significantly superior to those shown for synchronous updating. For a comparison we also show the result of a fixed routing algorithm which was computed by solving for the shortest delay path in an unloaded network; the asynchronous updating shows superior performance to the fixed routing procedure. Moreover, the synchronous updating shows inferior performance compared to this very simple fixed routing procedure if we take as our performance measure the average message delay. It was observed that with synchronous updating it was possible for a message to get trapped temporarily in loops (i.e., traveling back and forth between the same pair of nodes). We suppressed this looping behavior for two synchronous updating procedures with different parameter settings and achieved significant improvement; nevertheless, this improved version remains inferior to those simulated systems with asynchronous updating. As mentioned above, asynchronous updating contains many virtues, but one must consider the overhead incurred for such a sophisticated updating procedure before it can be incorporated and expected to yield a net improvement in performance.

Figure 7-Comparison of synchronous and asynchronous updating for routing algorithms (average message delay versus average path length; labels: asynchronous updating; threshold of 3%, 2½ packets, and 3 packets; loops suppressed; fastest zero-load path fixed routing)

CONCLUSIONS

Our goal in this paper has been to demonstrate the importance of analytical and simulation techniques in evaluating computer networks in the early design stages. We have addressed ourselves to three areas of interest, namely the analysis of computer network performance using methods from queueing theory, the optimal synthesis problem for a variety of cost functions, and the choice of routing procedure for these networks. Our results show that it is possible to obtain exceptionally good results in the analysis phase when one considers the "small" packet traffic only. As yet, we have not undertaken the study of the multi-packet traffic behavior. In examining available data we found that the power law cost function appears to be the appropriate one for high-speed data lines.
We obtained optimal channel capacity assignment procedures for this cost function as well as for the logarithmic cost function and the linear cost function. A significant result issued from this study through the observation that the average message delay for the power law cost function could very closely be approximated by the average message delay through the system constrained by a linear cost function; this holds true in the case when the system cost is held fixed. For the fixed delay case we found that the variation of the system cost under a power law constraint could be represented by the cost variation for a linear cost constraint only to a limited extent.

In conjunction with pure analytical results it is extremely useful to take advantage of system simulation. This is the approach we described in studying the effect of routing procedures and in comparing methods for updating these procedures. We indicated that asynchronous updating was clearly superior to synchronous updating, except in the case where the overhead for asynchronous updating might be severe.

The results referred to above serve to describe the behavior of computer network systems and are useful in the early stages of system design. If one is desirous of obtaining numerical tools for choosing the precise design parameters of a system, then it is necessary to go to much more elaborate analytic models or else to resort to efficient search procedures (such as that described in Reference 4) in order to locate optimal designs.

ACKNOWLEDGMENTS

The author is pleased to acknowledge Gary L. Fultz for his assistance in the simulation studies as well as his contributions to loop suppression in the routing procedures; acknowledgment is also due to Ken Chen for his assistance in the numerical solution for the performance under different cost function constraints.
REFERENCES

1 S CARR  S CROCKER  V CERF
Host to host communication protocol in the ARPA network
These proceedings

2 P A DICKSON
ARPA network will represent integration on a large scale
Electronics pp 131-134 September 30 1968

3 R G GOULD
Comments on generalized cost expressions for private-line communications channels
IEEE Transactions on Communication Technology Vol Com-13 No 3 pp 374-377 September 1965
also R P ECKHERT  P M KELLY
A program for the development of a computer resource sharing network
Internal Report for Kelly Scientific Corp Washington D C February 1969

4 H FRANK  I T FRISCH  W CHOU
Topological considerations in the design of the ARPA computer network
These proceedings

5 F B HILDEBRAND
Methods of applied mathematics
Prentice-Hall Inc Englewood Cliffs N J 1958

6 F E HEART  R E KAHN  S M ORNSTEIN  W R CROWTHER  D C WALDEN
The interface message processor for the ARPA network
These proceedings

7 L KLEINROCK
Communication nets; stochastic message flow and delay
McGraw-Hill New York 1964

8 L KLEINROCK
Models for computer networks
Proc of the International Communications Conference pp 21-9 to 21-16 University of Colorado Boulder June 1969

9 L KLEINROCK
Comparison of solution methods for computer network models
Proc of the Computers and Communications Conference Rome New York September 30-October 2 1969

10 F R MASTROMONACO
Optimum speed of service in the design of customer data communications systems
Proc of the ACM Symposium on the Optimization of Data Communications Systems pp 127-151 Pine Mountain Georgia October 13-16 1969

11 L G ROBERTS
Multiple computer networks and intercomputer communications
ACM Symposium on Operating Systems Principles Gatlinburg Tennessee October 1967

12 L G ROBERTS  B D WESSLER
Computer network developments to achieve resource sharing
These proceedings

Topological considerations in the design of the ARPA computer network*

by H. FRANK, I. T. FRISCH, and W. CHOU
Network Analysis Corporation
Glen Cove, New York

*This work was supported by the Advanced Research Projects Agency of the Department of Defense (Contract No. DAHC15-70-C-0120).

INTRODUCTION

The ARPA Network will provide store-and-forward communication paths between a set of computer centers distributed across the continental United States. The message handling tasks at each node in the network are performed by a special purpose Interface Message Processor (IMP) located at each computer center. The centers will be interconnected through the IMPs by fully duplex telephone lines, of typically 50 kilobit/sec capacity.

When a message is ready for transmission, it will be broken up into a set of packets, each with appropriate header information. Each packet will independently make its way through the network to its destination. When a packet is transmitted between any pair of nodes, the transmitting IMP must receive a positive acknowledgement from the receiving IMP within a given interval of time. If this acknowledgement is not received, the packet will be retransmitted, either over the same or a different channel depending on the network routing doctrine being employed.

One of the design goals of the system is to achieve a response time of less than 0.2 seconds for short messages. A measure of the efficiency with which this criterion is met is the cost per bit of information transmitted through the network when the total network traffic is at the level which yields 0.2 second average time delay. The goal of the network design is to achieve the required response time with the least possible cost per bit. The final network design is subject to a number of additional constraints.
It must be reliable, it must have reasonably flexible capacity in order to accommodate variations in traffic flow without significant degradation in performance, and it must be neatly expandable so that additional nodes and links can be added at later dates. The sequence and allowable variations with which the nodes are added to the network must also be taken into account. At any stage in the evolution of the network, there must be at least one communication path between any pair of nodes that have already been activated. In order to achieve a reasonable level of reliability, the network must be designed so that at least two nodes and/or links must fail before the network becomes disconnected.

To plan the orderly growth of the network, it is necessary to predict the behavior of proposed network designs. To do this, traffic flows must be projected and network routing procedures specified. The time delay analysis problem has been studied by Kleinrock,1,2 who considered several mathematical models of the ARPA Network. Kleinrock's comparison of his analysis with computer simulations indicates that network behavior can be qualitatively predicted with reasonable confidence. However, additional study in this area is needed before all the significant parameters which describe the system can be incorporated into the model. For the present, it appears that a combination of analysis and simulation can best be applied to determine a specific network's behavior.

Even if a proposed network can be accurately analyzed, the most economical networks which satisfy all of the constraints are not easily found. This is because of the enormous number of combinations of links that can be used to connect a relatively small number of nodes. It is not possible to examine even a small fraction of the possible network topologies that might lead to economical designs. In fact, the direct enumeration of all such configurations for a twenty node network is beyond the capabilities of the most powerful present day computer.

TOPOLOGICAL OPTIMIZATION

The design philosophy

As part of NAC's study of computer network design, a computer program was developed to find low cost topologies which satisfy the constraints on network time delay, reliability, congestion, and other performance parameters. This program is structured to allow the network designer to rapidly investigate the tradeoffs between average time delay per message, network cost, and other factors of interest. The inputs to the program are:

1. Existing network configuration (i.e., lines and nodes already installed and ordered)
2. Estimated traffic between nodes
3. Maximum average delay desired for short messages

In addition, the user may specify to the program a maximum cost that no network design will be allowed to exceed. The output of the program is a sequence of low cost networks. Each network is identified by the following information:

1. Network topology
2. Cost per month
3. Maximum throughput
4. Estimated average traffic
5. Message cost per megabit at maximum throughput
6. Average message delay for short messages

Each acceptable network design also conforms to the standard that at least two nodes and/or links must fail before all communication paths between any pair of nodes are disrupted.
APPROACH

The general design problem as stated above is similar to other network design problems for which computationally practical solutions have recently been obtained. These problems include the minimum cost design of survivable networks,3 the minimum cost selection and interconnection of Telpaks in telephone networks,4 the design of offshore natural gas pipeline networks,5 and the classical Traveling Salesman problem.6 These problems have long resisted exact solution; however, recent work on approximate methods has been extremely successful and has led to efficient methods of finding low cost solutions in practical computation times.

By a "feasible" solution, we mean one which satisfies all of the network constraints. By an "optimal" network, we mean the feasible network with the least possible cost. Our goal is to develop a method that can handle realistically large problems in a reasonable computation time and which can find feasible solutions with costs close to optimal.

The method to be used has two main parts called the starting routine and the optimizing routine. The starting routine generates a feasible solution. The optimizing routine then examines networks derived from this starting network by means of local transformations applied to the network topology. When a feasible network with lower cost is found, it is adopted as a new starting network and the process is continued. In this way, a feasible network is eventually reached whose cost cannot be reduced by applying additional local transformations of the type being considered. Such a network is called a locally optimum network. Once a locally optimum network is found, the entire procedure is repeated by again using the starting routine.

The starting routine may incorporate suggestions made by a human designer. For example, the present tentative configurations for the ARPA Network have been used. Alternatively, if desired, the starting routine may generate feasible networks without such advice. At the present time, our starting routine is capable of generating about 100,000 low cost networks. By finding local optima from different starting networks, a variety of solutions can be generated. Figure 1 shows a diagrammatic representation of the process.

Figure 1-Diagrammatic representation of the optimization procedure

The space of feasible solutions is represented by the area enclosed by the outer border of the figure; starting solutions are represented by light circles and local optima by dark circles. The practicality of the approach is based on the assumption that with a high probability some of the local optima found are close in cost to the global optimum. Naturally, this assumption is sensitive to the particular transformation used in the optimizing routine. A block diagram of the optimization procedure is shown in Figure 2; a program sketch of the corresponding search loop follows the next subsection.

Figure 2-Block diagram of optimization procedure (generate starting network; examine next local transformation; accept new network if feasible and of lower cost; when local transforms are exhausted, a local optimum has been found)

Local transformations

A local transformation on a network is generated by identifying a set of links, removing these links, and adding a new set to the network. The method of selection of the number and location of the links to be removed and added determines the usefulness of the transformation and its applicability to the problem in hand. For example, in the problem of economically designing offshore natural gas pipeline networks, dramatic cost reductions were achieved by removing and adding one link at a time.5 On the other hand, in a problem of the minimum cost design of survivable networks, the most useful link exchange consisted of removing and adding two links at a time.3 In general, it is not necessary that the same number of links be added and removed during each application of the transformation.
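In program form the optimizing routine is a straightforward local search. The skeleton below (Python) assumes hypothetical helper functions cost(), feasible(), and neighbors() (the last generating the link exchanges just described); none of these helpers is specified in the paper.

    def local_optimum(start, cost, feasible, neighbors):
        net = start
        improved = True
        while improved:
            improved = False
            for candidate in neighbors(net):            # local transformations
                if feasible(candidate) and cost(candidate) < cost(net):
                    net = candidate                     # adopt as the new starting network
                    improved = True
                    break
        return net                                      # a locally optimum network

    def best_of_starts(starts, cost, feasible, neighbors):
        # Repeat from many starting networks; keep the cheapest local optimum.
        return min((local_optimum(s, cost, feasible, neighbors) for s in starts),
                   key=cost)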
DESIGN CONSTRAINTS

The preceding section has given a general approach for the design of low cost feasible networks. To implement this approach, a number of specific problems must be considered. These include:

1. The distribution of network traffic.
2. Network route selection.
3. Link capacity assignment.
4. Node and link time delays.

Distribution of traffic

At the present time, it is difficult to estimate the precise magnitude and distribution of the Host-to-Host traffic. However, one design goal is that the amount of flow that can be transmitted between nodes should not significantly vary with the locations of sender and receiver. Hence, two users several thousand miles apart should receive the same service as two users several hundred miles apart. A reasonable requirement is therefore that the network be designed so that it can accommodate equal traffic between all pairs of nodes. However, it is known that certain nodes have larger traffic requirements to and from the University of Illinois' Illiac IV than to other nodes. Consequently, information of this type is incorporated into the model.

The magnitude of the network traffic is treated as variable. A "base" traffic requirement of 500·n bits per second (n is a positive real number) between all nodes is assumed. An additional 500·n bits per second is then added to and from the University of Illinois (node No. 9) and nodes 4, 5, 12, 18, 19, and 20. The base traffic is used to determine the flows in each link and the link capacities as discussed in the following sections. n is then increased until the average time delay exceeds .2 seconds. The average number of bits per second per node at average delay equal to .2 seconds is taken as a measure of performance, and the corresponding cost per bit is taken as a measure of efficiency of the network.

Route selection

In order to avoid the prohibitively long computation times required to analyze dynamic routing strategies, a fixed routing procedure is used. This procedure is similar to the one which will be used in the operating network, but it has the advantage that it can be readily incorporated into analysis procedures which do not depend on simulation. The routing procedure is determined by the assumption that for each message a path which contains the fewest number of intermediate* nodes from origin to destination is most desirable.

*A node j ≠ s, t is called an intermediate node with respect to a message with origin s and destination t if the path from s to t over which the message is transmitted contains node j.

Given a proposed network topology and traffic matrix, routes are determined as follows. For each i (i = 1, 2, ..., N = 20):

1. With node i as an initial node, use a labelling procedure7 to generate all paths containing the fewest number of intermediate nodes, to all nodes which have non-zero traffic from node i. Such paths are called feasible paths.
2. If node i has non-zero traffic to node j (j = 1, 2, ..., N, j ≠ i) and the feasible paths from i to j contain more than seven nodes, the topology is considered infeasible.
3. Nodes are grouped as follows:
   (a) All nodes connected to node i.
   (b) All nodes connected to node i by a feasible path with one intermediate node.
   (c) All nodes connected to node i by a feasible path with two intermediate nodes.
   (d) ...
   (e) ...
   (f) All nodes connected to node i by a feasible path with five intermediate nodes.

Traffic is first routed from node i to any node j which is directly connected to i over link (i, j). Consequently, after this stage, some flows have been assigned to the network. Each node in group (b) is then considered. For any node j in this group, all feasible paths from i to j are examined, and the maximum flow thus far assigned to any link in each such path is found. All paths with the smallest maximum flow are then considered. The path whose total length is minimum
(c) All nodes connected to node i by a feasible path with two intermediate nodes. (d) - - - - - - - - - (e) - - - - - - - - - (f) All nodes connected to node i by a feasible path with five intermediate nodes. Traffic is first routed from node i to any node j which is ,directly connected to i over link (i, j). Consequently, after this stage, some flows have been assigned to the network. Each node in group (b) is then considered. For any node j in this group, all feasible paths from i to j are examin.ed, and the maximum flow thus far assigned in any link in each such path is found. All paths with the smallest maximum flow are then considered. The path whose total length is minimum * A node j ~ 8, t is called an intermediate node with respect to a message with origin 8 and destination t if the path from 8 to t over which the message is transmitted contains node j. is then selected and all traffic origin.ating at i and destined for j is routed over this path. * All nodes in group (b) are treated in this matter. The same procedure is then applied to all nodes in group (c), (d), (e) and (f) in that order. Capacity assignment Link capacities could be assigned prior to routing. Then after route selection, if the flow in any link exceeds its assigned capacity, the network would be considered infeasible. On the other hand, link capacities may be assigned after all traffic is routed; we adopt this approach. The capacity of each link is chosen to be the least expensive option available from AT&T which satisfies the flow requirement. The line options which are presently being considered are: 50,000 bits/sec (bps), 108,000 bps, 230,400 bps, and 460,000 bps. Monthly link costs are the sum of a fixed terminal charge and a linear cost per mile. Thus, to satisfy a requirement of 85,000 bps, depending on the length of the link it is' sometimes cheaper to use two 50,000 bps parallel links and sometimes cheaper to use a single 108,000 bps link. The following line options and costs have been investigated: Type Speed Full Group (303 data set) Full Group (304 data set)** Telpak C Telpak D 50KB Cost Per Month $850 108 KB $2400 230.4 KB $1300 460 KB $1300 + $4.20/mile + $4.20/mile + $21.00/mile + $60.00/mile Link and node delays Response time T is defined as the average time a message takes to make its way through the network from its origin to its destination. Short messages are considered to correspond to a single packet which may be as long as 1008 bits or as short as few bits, plus the header. If T i is the mean delay time for a packet passing through the ith link, then M T = r- 1 LYiTi, i=l *It is also possible to divide the traffic from i to j and send it over more than one feasible path, but for uniform traffic this is not an important factor. **Not a standard AT&T offering. Topological Considerations in the Design of the ARPA Computer Network where r is the total IMP-to-IMP traffic rate, Yi is the average traffic rate in the ith link, and M is the total number of links. T i can be approximated with the Pollaczak-Khinchin formula as: where 1/po is the average packet length (in bits), Ci is the capacity of the ith link (in bits/second), a is the coefficient of variance for the packet length. These parameters are evaluated as follows: 1. r is the sum of all elements in the traffic matrix after each element has been adjusted to include headers, parity check and requests for next message (RFNM). 2. Yi is determined by the routing strategy. 3. 
Link and node delays

Response time T is defined as the average time a message takes to make its way through the network from its origin to its destination. Short messages are considered to correspond to a single packet, which may be as long as 1008 bits or as short as a few bits, plus the header. If T_i is the mean delay time for a packet passing through the ith link, then

    T = \frac{1}{r}\sum_{i=1}^{M} y_i T_i

where r is the total IMP-to-IMP traffic rate, y_i is the average traffic rate in the ith link, and M is the total number of links. T_i can be approximated with the Pollaczek-Khinchin formula as

    T_i = \frac{1}{\mu_0 C_i}\left[1 + \frac{(1+a^2)\,\rho_i}{2(1-\rho_i)}\right], \qquad \rho_i = \frac{y_i}{\mu_0 C_i}

where 1/\mu_0 is the average packet length (in bits), C_i is the capacity of the ith link (in bits/second), and a is the coefficient of variance for the packet length. These parameters are evaluated as follows:

1. r is the sum of all elements in the traffic matrix after each element has been adjusted to include headers, parity check and requests for next message (RFNM).
2. y_i is determined by the routing strategy.
3. In calculating 1/\mu_0, we consider three kinds of packets: (a) packets generated by short messages and all other packets (except RFNM's) with length less than 1008 bits; (b) full length packets of 1008 bits belonging to long messages; (c) RFNM's. It is assumed that the packets of part (a) are uniformly distributed with mean length equal to 560 bits. The packet length for part (b) is a constant equal to 1008 bits. The average packet length is then calculated by first estimating the average number of packets with 1008 bits. It is assumed that each long message consists of an average of 4 packets. In many of our computations, we assume that 80% of the messages are short. The number of RFNM packets can then be estimated. Finally, since the average length of each type of packet is known and the number of each type of packet has been estimated, the average packet length can be estimated.
4. y_i is adjusted to include the increased traffic due to acknowledgments. C_i is then selected as already described.
5. The larger the value of a, the larger the delay time. For the exponential distribution a = 1; for a constant, a = 0; and for many distributions 0 < a < 1. Since it is reasonable to assume that the packet length distribution being considered is very close to the combination of a uniform distribution and a constant, the value of a should be less than one. To avoid underestimating T, a is set equal to one in all calculations.

The above analysis is based on the assumption that the number of available buffers is unlimited. When the traffic is low, this assumption is very accurate. For high traffic, adjustments to account for the limitation of buffer space are necessary. There are two roles for buffers in an IMP: one for reassembling messages destined for that IMP's Host and the other for store-and-forward traffic. At the present time, about one-half of the IMP's core is used for the operating program. The remainder contains about 84 buffers, each of which can store a single packet. Up to … of the buffers may be used for reassembly. Buffers not used for reassembly are available for store-and-forward traffic. When no buffer is available for reassembly, any arriving packet which requires reassembly but does not belong to any message in the process of reassembly will be discarded and no acknowledgment returned to the transmitting IMP. This packet must then be retransmitted, and the effective traffic in the link is therefore increased. In addition, each time a packet is retransmitted, its delay time is increased not only by the extra waiting and transmitting time, but also by the 100 ms time-out period. To account for these factors, an upper bound on the probability that no buffer is available is calculated for each IMP. The traffic between IMPs is then increased and the extra delay time for the retransmitted packets is calculated. The increase in delay time is then averaged over all the packets.

When no buffer is available for store-and-forward traffic, all incoming links become inactive. Effectively, the average usable capacities of these links are lower than their actual capacities. The probability that no buffer is available for store-and-forward traffic is set equal to the average of an upper bound and a lower bound; the upper bound is calculated by assuming that the ratio of flow to capacity of each link into the IMP is equal to the maximum ratio for all links at that node, while the lower bound is found by assuming that the ratio of flow to capacity for each link is equal to the minimum such ratio. Link capacities are then reduced to include this effect and the response time is then recalculated.

An example of the effect of the above assumptions is shown in Figure 4. Figure 4 relates average time delay and throughput per node for the network shown in Figure 3. Two curves are shown. One is obtained by assuming that there are an infinite number of buffers at each node. The second curve is obtained by using the actual buffer limitations of the ARPA network.

Figure 3
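The link-delay approximation above is easily computed. The sketch below (Python; the packet length and traffic values are illustrative, and a is taken as the coefficient of variation per the reconstruction of the formula given earlier) evaluates T_i with the conservative choice a = 1.

    def link_delay(y_i, C_i, mean_packet_bits=560.0, a=1.0):
        # mean_packet_bits plays the role of 1/mu_0 (hypothetical value here)
        service = mean_packet_bits / C_i       # mean packet transmission time, sec
        rho = y_i * service                    # link utilization
        assert rho < 1.0, "link is overloaded"
        return service * (1.0 + (1.0 + a * a) * rho / (2.0 * (1.0 - rho)))

    # With a = 1 this equals service / (1 - rho), the exponential (M/M/1) case.
    print(link_delay(y_i=40.0, C_i=50000.0))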
Link capacities are then reduced to include this effect and the response time is then recalculated. An example of the effect of the above assumptions is shown in Figure 4. Figure 4 relates a verage time delay and throughput per node for the network shown in Figure 3. Two curves are shown. One is obtained by assuming that there are an infinite number of buffers at each node. The second curve is obtained by using the actual buffer limitations of the ARPA network. 4 ~ <.::::;;,?-~,~' " loe.... ,8 . loa ....8 7 5 Figure 3 1o",,& , " 586 Spring Joint Computer Conference, 1970 delay time. (in TIts) Figure 5 of PI and a vertical line down from PI. Any point say P 2 which falls within the quadrant defined by the two lines is said to be dominated by PI, since in a sense, node Figure 4 PRELIMINARY COMPUTATIONAL RESULTS The optimization procedures were employed to design many thousand twenty node networks. The parameters of the best of these networks were then plotted as scatter diagrams as indicated in Figure 5. The coordinate of the horizontal axis on the graph is cost in dollars. The coordinate of the vertical axis is the average throughput per node* in bits per second for a specified distribution of traffic. The graph shown is for an average message delay of .2 seconds for short messages. Each point in the graph corresponds to a network generated, evaluated, and optimized by the computer. Interpretation of results Consider any point P.l corresponding to a network N 1• Draw a horizontal line starting at PI to the right * throughput is the average number of bits/second out of each node. network N I is "better than" network N 2. Similarly N 1 is said to be a dominant network. That is, for the same delay N 1 provides at least as much throughput as N 2 at no higher cost. Horizontal and vertical lines can be drawn through certain points PI, ... , P n so that all other points are dominated by at least one of these. PI, ... , P n thus represent, in one sense, the best networks. One must be cautious, however, in that a network which is dominant for one time delay may not be dominant for another. Many networks with this property have been found in our studies. Furthermore, in some cases a network may be dominated but might still be preferable to the network which dominates it because of other factors such as the order of leasing lines and plans for future growth. As an example, PI is a dominant point and yet there are many points which it dominates which are very close to it and might well be preferable. Some other conclusions can be drawn from the graphs. Examining the set of dominant points it appears that there are significant savings due to economies of scale in the range of costs of $64,000 to $80,000. That is, small increases in cost yield large gains in throughput. Similar savings are observed in 'the $90,000-$100,000 cost range for average throughputs in the 30,000 bits/ second range. These savings are due to the utilization of 108 kilobit lines which have the same line cost as 50 kilobit lines but a higher data set cost. This means that for a modest additional cost, the capacities of cross country lines can be more than doubled. To see Topological Considerations in the Design of the ARPA Computer Network 587 nodes on these parameters. Furthermore, alternative routing schemes will be considered as well as the costthroughput tradeoffs that can be obtained by increasing the number of buffers at appropriate nodes. 
REFERENCES

1 L KLEINROCK
Models for computer networks
Proceedings of the International Conference on Communications pp 21.9-21.16 June 1969

2 L KLEINROCK
Analytic and simulation methods in computer network design
See paper this conference

3 K STEIGLITZ  P WEINER  D KLEITMAN
Design of minimum cost survivable networks
IEEE Transactions on Circuit Theory 1970

4 B ROTHFARB  M GOLDSTEIN
Unpublished work

5 H FRANK  B ROTHFARB  D KLEITMAN  K STEIGLITZ
Design of economical offshore natural gas pipeline networks
Office of Emergency Preparedness Report No R-1 Washington D C January 1969

6 S LIN
Computer solutions of the traveling salesman problem
Bell System Tech Journal Vol 44 No 10 pp 2245-2269 December 1965

7 H FRANK  I T FRISCH
Communication, transmission, and transportation networks
Addison-Wesley 1971

HOST-HOST communication protocol in the ARPA network*

by C. STEPHEN CARR
University of Utah
Salt Lake City, Utah
and
STEPHEN D. CROCKER and VINTON G. CERF
University of California
Los Angeles, California

*This research was sponsored by the Advanced Research Projects Agency, Department of Defense, under contracts AF30(602)-4277 and DAHC15-69-C-0285.

INTRODUCTION

The Advanced Research Projects Agency (ARPA) Computer Network (hereafter referred to as the "ARPA network") is one of the most ambitious computer networks attempted to date.1 The types of machines and operating systems involved in the network vary widely. For example, the computers at the first four sites are an XDS 940 (Stanford Research Institute), an IBM 360/75 (University of California, Santa Barbara), an XDS SIGMA-7 (University of California, Los Angeles), and a DEC PDP-10 (University of Utah). The only commonality among the network membership is the use of highly interactive time-sharing systems; but, of course, these are all different in external appearance and implementation. Furthermore, no one node is in control of the network. This has insured generality and reliability but complicates the software.

Of the networks which have reached the operational phase and been reported in the literature, none have involved the variety of computers and operating systems found in the ARPA network. For example, the Carnegie-Mellon, Princeton, IBM network consists of 360/67's with identical software.2 Load sharing among identical batch machines was commonplace at North American Rockwell Corporation in the early 1960's. Therefore, the implementers of the present network have been only slightly influenced by earlier network attempts. However, early time-sharing studies at the University of California at Berkeley, MIT, Lincoln Laboratory, and System Development Corporation (all ARPA sponsored) have had considerable influence on the design of the network. In some sense, the ARPA network of time-shared computers is a natural extension of earlier time-sharing concepts.
The network is seen as a set of data entry and exit points into which individual computers insert messages destined for another (or the same) computer, and from which such messages emerge. The format of such messages and the operation of the network was specified by the network contractor (BB&N), and it became the responsibility of representatives of the various computer sites to impose such additional constraints and provide such protocol as necessary for users at one site to use resources at foreign sites. This paper details the decisions that have been made and the considerations behind these decisions.

Several people deserve acknowledgment in this effort. J. Rulifson and W. Duvall of SRI participated in the early design effort of the protocol and in the discussions of NIL. G. Deloche of Thomson-CSF participated in the design effort while he was at UCLA and provided considerable documentation. J. Curry of Utah and P. Rovner of Lincoln Laboratory reviewed the early design and NIL. W. Crowther of Bolt, Beranek and Newman contributed the idea of a virtual net. The BB&N staff provided substantial assistance and guidance while delivering the network.

We have found that, in the process of connecting machines and operating systems together, a great deal of rapport has been established between personnel at the various network node sites. The resulting mixture of ideas, discussions, disagreements, and resolutions has been highly refreshing and beneficial to all involved, and we regard the human interaction as a valuable by-product of the main effort.

THE NETWORK AS SEEN BY THE HOSTS

Before going on to discuss operating system communication protocol, some definitions are needed. A HOST is a computer system which is part of the network. An IMP (Interface Message Processor) is a Honeywell DDP-516 computer which interfaces with up to four HOSTs at a particular site, and allows HOSTs access into the network. The configuration of the initial four-HOST network is given in Figure 1. The IMPs form a store-and-forward communications network. A companion paper in these proceedings covers the IMPs in some detail.3

Figure 1-Initial network configuration

A message is a bit stream less than 8096 bits long which is given to an IMP by a HOST for transmission to another HOST. The first 32 bits of the message are the leader. The leader contains the following information:

(a) HOST
(b) Message type
(c) Flags
(d) Link number

When a message is transmitted from a HOST to its IMP, the HOST field of the leader names the receiving HOST. When the message arrives at the receiving HOST, the HOST field names the sending HOST. Only two message types are of concern in this paper. Regular messages are generated by a HOST and sent to its IMP for transmission to a foreign HOST. The other message type of interest is a RFNM (Request-for-Next-Message). RFNMs are explained in conjunction with links. The flag field of the leader controls special cases not of concern here.

The link number identifies over which of 256 logical paths (links) between the sending HOST and the receiving HOST the message will be sent. Each link is unidirectional and is controlled by the network so that no more than one message at a time may be sent over it. This control is implemented using RFNM messages. After a sending HOST has sent a message to a receiving HOST over a particular link, the sending HOST is prohibited from sending another message over that same link until the sending HOST receives a RFNM.
The RFNM is generated by the IMP connected to the receiving HOST, and the RFNM is sent back to the sending HOST after the message has entered the receiving HOST. It is important to remember that there are 256 links in each direction and that no relationship among these is imposed by the network.

The purpose of the link and RFNM mechanism is to prohibit individual users from overloading an IMP or a HOST. Implicit in this purpose is the assumption that a user does not use multiple links to achieve a wide band, and to a large extent the HOST-HOST protocol cooperates with this assumption. An even more basic assumption, of course, is that the network's load comes from some users transmitting sequences of messages rather than many users transmitting single messages coincidently.

In order to delimit the length of the message, and to make it easier for HOSTs of differing word lengths to communicate, the following formatting procedure is used. When a HOST prepares a message for output, it creates a 32-bit leader. Following the leader is a binary string, called marking, consisting of an arbitrary number of zeroes, followed by a one. Marking makes it possible for the sending HOST to synchronize the beginning of the text of a message with its word boundaries. When the last bit of a message has entered an IMP, the hardware interface between the IMP and HOST appends a one followed by enough zeroes to make the message length a multiple of 16 bits. These appended bits are called padding. Except for the marking and padding, no limitations are placed on the text of a message. Figure 2 shows a typical message sent by a 24-bit machine.

Figure 2-A typical message from a 24-bit machine (a 32-bit leader, 16 bits of marking, 96 bits of text, and 16 bits of padding added by the interface)
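The marking and padding rules can be made concrete with a short sketch (Python; bits are modeled as character strings purely for clarity, and the minimal-marking policy shown is one reasonable reading of the rule rather than a specification from the paper).

    def frame(leader32, text, word_len=24):
        assert len(leader32) == 32
        mlen = (-33) % word_len + 1       # marking: zeroes then a one, chosen so
        msg = leader32 + "0" * (mlen - 1) + "1" + text   # text starts on a word boundary
        pad_zeros = (16 - (len(msg) + 1) % 16) % 16      # appended by the IMP/HOST interface
        return msg + "1" + "0" * pad_zeros               # padding: a one, then zeroes

    # A 24-bit HOST sending 96 bits of text: 32 + 16 marking + 96 + 16 padding,
    # exactly the 160-bit layout of Figure 2.
    print(len(frame("0" * 32, "1" * 96)))   # -> 160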
DESIGN CONCEPTS

The computers participating in the network are alike in two important respects: each supports research independent of the network, and each is under the discipline of a time-sharing system. These facts contributed to the following design philosophy.

First, because the computers in the network have independent purposes, it is necessary to preserve decentralized administrative control of the various computers. Since all of the time-sharing supervisors possess elaborate and definite accounting and resource allocation mechanisms, we arranged matters so that these mechanisms would control the load due to the network in the same way they control locally generated load. Second, because the computers are all operated under time-sharing disciplines, it seemed desirable to facilitate basic interactive mechanisms. Third, because this network is used by experienced programmers it was imperative to provide the widest latitude in using the network. Restrictions concerning character sets, programming languages, etc., would not be tolerated and we avoided such restrictions. Fourth, again because the network is used by experienced programmers, it was felt necessary to leave the design open-ended. We expect that conventions will arise from time to time as experience is gained, but we felt constrained not to impose them arbitrarily. Fifth, in order to make network participation comfortable, or in some cases, feasible, the software interface to the network should require minimal surgery on the HOST operating system. Finally, we accepted the assumption stated above that network use consists of prolonged conversations instead of one-shot requests.

Those considerations led to the notions of connections, a Network Control Program, a control link, control commands, sockets, and virtual nets.

A connection is an extension of a link. A connection connects two processes so that output from one process is input to the other. Connections are simplex, so two connections are needed if two processes are to converse in both directions.

Processes within a HOST communicate with the network through a Network Control Program (NCP). In most HOSTs, the NCP will be part of the executive, so that processes will use system calls to communicate with it. The primary function of the NCP is to establish connections, break connections, switch connections, and control flow. In order to accomplish its tasks, a NCP in one HOST must communicate with a NCP in another HOST. To this end, a particular link between each pair of HOSTs has been designated as the control link. Messages received over the control link are always interpreted by the NCP as a sequence of one or more control commands. As an example, one of the kinds of control commands is used to assign a link and initiate a connection, while another kind carries notification that a connection has been terminated. A partial sketch of the syntax and semantics of control commands is given in the next section.

A major issue is how to refer to processes in a foreign HOST. Each HOST has some internal naming scheme, but these various schemes often are incompatible. Since it is not practical to impose a common internal process naming scheme, an intermediate name space was created with a separate portion of the name space given to each HOST. It is left to each HOST to map internal process identifiers into its name space. The elements of the name space are called sockets. A socket forms one end of a connection, and a connection is fully specified by a pair of sockets. A socket is specified by the concatenation of three numbers:

(a) a user number (24 bits)
(b) a HOST number (8 bits)
(c) AEN (8 bits)

A typical socket is illustrated in Figure 3.

Figure 3-A typical socket (a 24-bit user number, an 8-bit HOST number, and an 8-bit AEN)

Each HOST is assigned all sockets in the name space which have field (b) equal to the HOST's own identification. A socket is either a receive socket or a send socket, and is so marked by the low-order bit of the AEN (0 = receive, 1 = send). The other seven bits of the
The foreign HOST obtains the user number either by consulting a table at login time, as the home HOST does, or by noticing the identification of the caller. The effect of propagating the user's number is that each user creates his own virtual net consisting of processes he has created. This virtual net may span an arbitrary number of HOSTs. It will thus be easy for a user to connect his processes in arbitrary ways, while still permitting him to connect his processes with those in other virtual nets. The relationship between sockets and processes is now describable (see Figure 4). For each user number at each HOST, there are 128 send sockets and 128 receive sockets. A process may request from the local NCP the use of anyone of the sockets with the same user number; the request is granted if the socket is not otherwise in use. The key observation here is that a socket requested by a process cannot already be in use unless it is by some other process within the same virtual net, and such a process is controlled by the same user. An unusual aspect of the HOST-HOST protocol is that a process may switch its end of a connection from one socket to another. The new socket may be in any virtual net and at any HOST, and the process may initiate a switch either at the time the connection is being established, or later. The most general forms of switching entail quite complex implementation, and are not germane to the ;rest of this paper, so only a limited form will be explained. This limited form of switching provides only that a process may substitute one socket for another while establishing a connection. The new socket must have the same user number and HOST number, and the connection is still established to the same process. This form of switching is thus only a way of relabelling a socket, for no change in the routing of messages takes place. In the next section we document the system calls and control commands; in the section after next, we consider how login might be implemented. SYSTE1Vr CALLS AND CONTROL COMMANDS Here we sketch the mechanics of establishing, switching and breaking a connection. As noted above, the NCP interacts with user processes via system calls and with other NCPs via control commands. We therefore begin with a partial description of system calls and control commands. System calls will vary from one operating system to another, so the following description is only suggestive. We assume here that a process has several input-output paths which we will call ports. Each port may be connected to a sequential I/O device, and while connected, transmits information in only one direction. We further assume that the process is blocked (dismissed, slept) while transmission proceeds. The following is the list of system calls: Init where (port) and (AEN 1)} (port), (AEN 1), (AEN 2), (foreign socket) is part of the process issuing the Init are 8-bit AEN's (see Figure 3) (AEN 2) (foreign socket) is the 40-bit socket name of the distant end of the connection. The first AEN is used to initiate the connection; the second is used while the connection exists. The low-order bits of (AEN 1) and (AEN 2) must agree, and these must be the complement of the loworder bit of (foreign socket). The NCP concatenates (AEN 1) and (AEN 2) each with the user' number of the process and the HOST number to form 40-bit sockets. It then sends a Request for Connection (RFC) control command to the distant NCP. 
SYSTEM CALLS AND CONTROL COMMANDS

Here we sketch the mechanics of establishing, switching and breaking a connection. As noted above, the NCP interacts with user processes via system calls and with other NCPs via control commands. We therefore begin with a partial description of system calls and control commands. System calls will vary from one operating system to another, so the following description is only suggestive. We assume here that a process has several input-output paths which we will call ports. Each port may be connected to a sequential I/O device, and while connected, transmits information in only one direction. We further assume that the process is blocked (dismissed, slept) while transmission proceeds. The following is the list of system calls:

Init (port), (AEN 1), (AEN 2), (foreign socket)

where (port) is part of the process issuing the Init, (AEN 1) and (AEN 2) are 8-bit AENs (see Figure 3), and (foreign socket) is the 40-bit socket name of the distant end of the connection. The first AEN is used to initiate the connection; the second is used while the connection exists. The low-order bits of (AEN 1) and (AEN 2) must agree, and these must be the complement of the low-order bit of (foreign socket). The NCP concatenates (AEN 1) and (AEN 2) each with the user number of the process and the HOST number to form 40-bit sockets. It then sends a Request for Connection (RFC) control command to the distant NCP. When the distant NCP responds positively, the connection is established and the process is unblocked. If the distant NCP responds negatively, the local NCP unblocks the requesting process, but informs it that the system call has failed.

Listen (port), (AEN 1)

where (port) and (AEN 1) are as above. The NCP retains the port and (AEN 1) and blocks the process. When an RFC control command arrives naming the local socket, the process is unblocked and notified that a foreign process is calling.

Accept (AEN 2)

After a Listen has been satisfied, the process may either refuse the call or accept it and switch it to another socket. To accept the call, the process issues the Accept system call. The NCP then sends back an RFC control command.

Close (port)

After establishing a connection, a process issues a Close to break the connection. The Close is also issued after a Listen to refuse a call.

Transmit (port), (addr)

If (port) is attached to a send socket, (addr) points to a message to be sent. This message is preceded by its length in bits. If (port) is attached to a receive socket, a message is stored at (addr). The length of the message is stored first.

Control commands

A vocabulary of control commands has been defined for communication between Network Control Programs. Each control command consists of an 8-bit operation code to indicate its function, followed by some parameters. The number and format of parameters is fixed for each operation code. A sequence of control commands destined for a particular HOST can be packed into a single control message.

RFC (my socket 1), (my socket 2), (your socket), ((link))

This command is sent because a process has executed either an Init system call or an Accept system call. A link is assigned by the prospective receiver, so it is omitted if (my socket 1) is a send socket. There is a distinct advantage in using the same command both to initiate a connection (Init) and to accept a call (Accept). If the responding command were different from the initiating command, then two processes could call each other and become blocked waiting for each other to respond. With this scheme no deadlock occurs, and it provides a more compact way to connect a set of processes.

CLS (my socket), (your socket)

The specified connection is terminated.

CEASE (link)

When the receiving process does not consume its input as fast as it arrives, the buffer space in the receiving HOST is used to queue the waiting messages. Since only limited space is generally available, the receiving HOST may need to inhibit the sending HOST from sending any more messages over the offending connection. When the sending HOST receives this command, it may block the process generating the messages.

RESUME (link)

This command is also sent from the receiving HOST to the sending HOST and negates a previous CEASE.
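To summarize the calls and commands just listed, the sketch below traces the simplest exchange. It is our own illustration: the functions are stubs that merely print what the real NCP calls would do, and the port and AEN values are arbitrary.

    #include <stdio.h>

    /* Stubs mirroring the system calls described above; a real NCP
       would block the caller and exchange RFC/CLS control commands. */
    static void Init(int port, int aen1, int aen2, long foreign_socket)
    {
        printf("Init   port %d, AENs %d/%d -> RFC sent to socket %ld\n",
               port, aen1, aen2, foreign_socket);
    }
    static void Listen(int port, int aen1)
    {
        printf("Listen port %d, AEN %d (blocks until an RFC arrives)\n",
               port, aen1);
    }
    static void Accept(int aen2)
    {
        printf("Accept AEN %d -> answering RFC sent back\n", aen2);
    }
    static void Close(int port)
    {
        printf("Close  port %d -> CLS sent\n", port);
    }

    int main(void)
    {
        Listen(9, 0);      /* callee waits on a receive socket (even AEN) */
        Init(1, 7, 7, 0);  /* caller requests a connection (RFC goes out) */
        Accept(8);         /* callee switches to AEN 8 and answers        */
        Close(1);          /* either side may later terminate with CLS    */
        return 0;
    }

Note how the AEN parity rule is respected: the caller's odd AENs (send side) complement the even AEN of the foreign receive socket.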
If the responding command were different from the initiating command, then two processes could call each other' and become blocked waiting for each other to respond. With this scheme no deadlock His process is blocked, and the NCP at UCLA sends RFC (my socket 1) = X'0100050107', (my socket 2) = X'0100050107', (your socket) = X'0000000200' The logger at SRI is notified when this message received, because it has previously executed Listen (port( = 9, (AEN 1) = O. IS 594 Spring Joint Computer Conference, 1970 (i) .LOGIN@ (ii) • (iii) The process at UCLA is unblocked and notified of the successful Init. Because the SRI logger always initiates a connection to the AEN one less than it has just been connected to, the UCLA process then executes R TELNET@ ESCAPE Listen (port) = 11 (AEN 1) = 6 CHARACTER IS .@ and when unblocked, ( iv) CONNECT TO SRI@ (v) @ENTER ( vi) @CAL!@ (vii) CAL AT YOUR SERVICE@ (viii) >READ FILE FROM NETWRK.@ CARR.@ Accept (AEN 2) = 6. When these transactions are complete, the UCLA process is doubly connected to the logger at SRI. The logger will then interrogate the UCLA process, and if satisfied, create a new process at SRI. This new process will be tagged with the user number X'OI0005', and both connections will be switched to the new process. In this case, switching the connections to the new process corresponds to "passing the console down" in many time-sharing systems. USER LEVEL SOFTWARE ( ix) .. NETWRK: +- OS K:MYFI LE • CAL@ Figure 5-A typical TELNET dialog Underlined characters are those typed by the user The logger then executes Accept (AEN 2) = 88. In response to the Accept, the SRI NCP sends RFC (my socket 1) = X'0000000200' (my socket 2) = X'0000000258' (your socket) = X'OI00050I07' (link) = 37 where the link has been chosen from the set of available links. The SRI logger then executes Init (port) = 10 (AEN 1) = 89, (AEN 2) = 89, (foreign socket) = X'OIOOO.10I06' which causes the NCP to send RFC (my socket 1) = X'0000000259' (my socket 2) = X'0000000259, (your socket) = X'0100050I06' At the user level, subroutines which manage data buffers and format input destined for other HOSTs are provided. It is not mandatory that the user use such subroutines, since the user has access to the network system calls in his monitor. In addition to user programming access, it is desirable to have a subsystem program at each HOST which makes the network immediately accessible from a teletype-like device without special programming. Subsystems are commonly used system components such as text editors, compilers and interpreters. An example of a network-related subsystem is TELNET, which will allow users at the University of Utah to connect to Stanford Research Institute and appear as regular terminal users. It is expected that more sophisticated subsystems will be developed in time, but this basic one will render the early network immediately useful. A user at the University of Utah (UTAH) is sitting at a teletype dialed into the University's PDP-IO/50 time-sharing system. He wishes to operate the Conversational Algebraic Language (CAL) subsystem on the XDS-940 at Stanford Research Institute (SRI) in Menlo Park, California. A typical TELNET dialog is illustrated· in Figure .1. The meaning of each line of dialog is discussed here. (i) The user signs in at UTAH. (ii) The PDP-IO run command starts up the TEL NET subsystem at the user's HOST. 
(iii) The user identifies a break character which causes any message following the break to be interpreted locally rather than being sent on to the foreign HOST.

(iv) The TELNET subsystem will make the appropriate system calls to establish a pair of connections to the SRI logger. The connections will be established only if SRI accepts another foreign user. The UTAH user is now in the pre-logged-in state at SRI. This is analogous to the standard teletype user's state after dialing into a computer and making a connection but before typing anything.

(v) The user signs in to SRI with a standard login command. Characters typed on the user's teletype are transmitted unaltered through the PDP-10 (user HOST) and on to the 940 (serving HOST). The PDP-10 TELNET subsystem will have automatically switched to full-duplex, character-by-character transmission, since this is required by SRI's 940. Full duplex operation is allowed for by the PDP-10, though not used by most Digital Equipment Corporation subsystems.

(vi) and (vii) The 940 subsystem, CAL, is started. At this point, the user wishes to load a CAL file into the 940 CAL subsystem from the file system on his local PDP-10.

(viii) CAL is instructed to establish a connection to UTAH in order to receive the file. "NETWRK" is a predefined 940 name similar in nature to "PAPER TAPE" or "TELETYPE".

(ix) Finally, the user types the break character (#) followed by a command to his PDP-10 TELNET program, which sends the desired file to SRI from Utah on the connection just established for this purpose. The user's next statement is in CAL again.

The TELNET subsystem coding should be minimal, for it is essentially a shell program built over the network system calls. It effectively establishes a shunt in the user HOST between the remote user and a distant serving HOST. Given the basic system primitives, the TELNET subsystem at the user HOST and a manual for the serving HOST, the network can be profitably employed by remote users today.

HIGHER LEVEL PROTOCOL

The network poses special problems where a high degree of interaction is required between the user and a particular subsystem on a foreign HOST. These problems arise due to heterogeneous consoles, local operating system overhead, and network transmission delays. Unless we use special strategies it may be difficult or even impossible for a distant user to make use of the more sophisticated subsystems offered. While these difficulties are especially severe in the area of graphics, problems may arise even for teletype interaction. For example, suppose that a foreign subsystem is designed for teletype consoles connected by telephone, and then this subsystem becomes available to network users. This subsystem might have the following characteristics.

1. Except for echoing and correction of mistyping, no action is taken until a carriage return is typed.
2. All characters except "↑", "←" and carriage return are echoed as the character typed.
3. ← causes deletion of the immediately preceding accepted character, and is echoed as that character.
4. ↑ causes all previously typed characters to be ignored. A carriage return and line feed are echoed.
5. A carriage return is echoed as a carriage return followed by a line feed.

If each character typed is sent in its own message, then the characters

H E L L O ← ← P CR

cause nine messages in each direction. Furthermore, each character is handled by a user level program in the local HOST before being sent to the foreign HOST.
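Rules 1 through 5 amount to a small local line editor. The C sketch below is our own illustration of how such a front end might behave; the characters '<' and '^' stand in for '←' and '↑', and a real program would hand the finished line to the network rather than to the standard output.

    #include <stdio.h>

    #define CHAR_DEL  '<'   /* stands in for the left-arrow of rule 3 */
    #define LINE_KILL '^'   /* stands in for the up-arrow of rule 4   */

    /* Collect one line locally, echoing per rules 1-5; only the
       completed line would then cross the network as one message. */
    static int collect_line(char line[], int max)
    {
        int n = 0, c;
        while ((c = getchar()) != EOF) {
            if (c == '\n' || c == '\r') {      /* rule 5: CR -> CR LF */
                fputs("\r\n", stdout);
                break;
            } else if (c == CHAR_DEL) {        /* rule 3: delete one  */
                if (n > 0)
                    putchar(line[--n]);        /* echoed as that char */
            } else if (c == LINE_KILL) {       /* rule 4: start over  */
                n = 0;
                fputs("\r\n", stdout);
            } else if (n < max) {              /* rule 2: plain echo  */
                line[n++] = (char)c;
                putchar(c);
            }
        }
        return n;
    }

    int main(void)
    {
        char line[120];
        int n = collect_line(line, 120);
        printf("=> one message of %d accepted characters\n", n);
        return 0;
    }

Fed the nine characters of the example above, this sends a single four-character message ("HELP") instead of nine.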
Now it is clear that if this particular example were important, we would quickly implement rules 1 to 5 in a local HOST program and send only complete lines to the foreign HOST. If the foreign HOST program could not be modified so as to not generate echoes, then the local program could not only echo properly, it could also throw away the later echoes from the foreign HOST. However, the problem is not any particular interaction scheme; the problem is that we expect many of these kinds of schemes to occur. We have not found any general solutions to these problems, but some observations and conjectures may lead the way.

With respect to heterogeneous consoles, we note that although consoles are rarely compatible, many are equivalent. It is probably reasonable to treat a model 37 teletype as the equivalent of an IBM 2741. Similarly, most storage scopes will form an equivalence class, and most refresh display scopes will form another. Furthermore, a hierarchy might emerge with members of one class usable in place of those in another, but not vice versa. We can imagine that any scope might be an adequate substitute for a teletype, but hardly the reverse. This observation leads us to wonder if a network-wide language for consoles might be possible. Such a language would provide for distinct treatment of different classes of consoles, with semantics appropriate to each class. Each site could then write interface programs for its consoles to make them look like network standard devices.

Another observation is that a user evaluates an interactive system by comparing the speed of the system's responses with his own expectations. Sometimes a user feels that he has made only a minor request, so the response should be immediate; at other times he feels he has made a substantial request, and is therefore willing to wait for the response. Some interactive subsystems are especially pleasant to use because a great deal of work has gone into tailoring the responses to the user's expectations. In the network, however, a local user level process intervenes between a local console and a foreign subsystem, and we may expect the response time for minor requests to degrade.

Now it may happen that all of this tailoring of the interaction is fairly independent of the portion of the subsystem which does the heavy computing or I/O. In such a case, it may be possible to separate a subsystem into two sections. One section would be the "substantive" portion; the other would be a "front end" which formats output to the user, accepts his inputs, and controls computationally simple responses such as echoes. In the example above, the program to accumulate a line and generate echoes would be the front end of some subsystem.

We now take notice of the fact that the local HOSTs have substantial computational power, but our current designs make use of the local HOST only as a data concentrator. This is somewhat ironic, for the local HOST is not only poorly utilized as a data concentrator, it also degrades performance because of the delays it introduces. These arguments have led us to consider the possibility of a Network Interface Language (NIL) which would be a network-wide language for writing the front end of interactive subsystems. This language would have the feature that subprograms communicate through network-like connections. The strategy is then to transport the source code for the front end of a subsystem to the local HOST, where it would be compiled and executed.
During preliminary discussions we have agreed that NIL should have at least the following semantic properties not generally found in languages.

1. Concurrency. Because messages arrive asynchronously on different connections, and because user input is not synchronized with subsystem output, NIL must include semantics to accurately model the possible concurrencies.

2. Program Concatenation. It is very useful to be able to insert a program in between two other programs. To achieve this, the interconnection of programs would be specified at run time and would not be implicit in the source code.

3. Device Substitutability. It is usual to define languages so that one device may be substituted for another. The requirement here is that any device can be modeled by a NIL program. For example, if a network standard display controller manipulates tree-structures according to messages sent to it, then these structures must be easily implementable in NIL.

NIL has not been fully specified, and reservations have been expressed about its usefulness. These reservations hinge upon our conjecture that it is possible to divide an interactive subsystem into a transportable front end which satisfies a user's expectations at low cost and a more substantial stay-at-home section. If our conjecture is false, then NIL will not be useful; otherwise it seems worth pursuing. Testing of this conjecture and further development of NIL will take priority after low level HOST-HOST protocol has stabilized.

HOST/IMP INTERFACING

The hardware and software interfaces between HOST and IMP are an area of particular concern to the HOST organizations. Considering the diversity of HOST computers to which a standard IMP must connect, the hardware interface was made bit serial and full-duplex. Each HOST organization implements its half of this very simple interface. The software interface is equally simple and consists of messages passed back and forth between the IMP and HOST programs. Special error and signal messages are defined as well as messages containing normal data. Messages waiting in queues in either machine are sent at the pleasure of the machine in which they reside with no concern for the needs of the other computer.

The effect of the present software interface is the needless rebuffering of all messages in the HOST in addition to the buffering in the IMP. The messages have no particular order other than arrival times at the IMP. The Network Control Program at one HOST (e.g., Utah) needs waiting RFNMs before all other messages. At another site (e.g., SRI), the NCP could benefit by receiving messages for the user who is next to be run. What is needed is coding representing the specific needs of the HOST on both sides of the interface to make intelligent decisions about what to transmit next over the channel. With the present software interface, the channel in one direction, once committed to a particular message, is then locked up for up to 80 milliseconds. This approaches one teletype character time and needlessly limits full-duplex, character by character, interaction over the net. At the very least, the IMP/HOST protocol should be expanded to permit each side to assist the other in scheduling messages over the channels.
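As one reading of what "intelligent decisions about what to transmit next" might mean, the sketch below (our own illustration, not any IMP specification) drains a two-class send queue so that RFNMs and other control traffic always go out ahead of queued data.

    #include <stdio.h>

    #define CLASSES 2   /* class 0: RFNMs and control; class 1: data */
    #define SLOTS   8

    static const char *queue[CLASSES][SLOTS];
    static int head[CLASSES], tail[CLASSES];

    static void enqueue(int cls, const char *msg)
    {
        queue[cls][tail[cls]++ % SLOTS] = msg;
    }

    /* Always drain the control class before ordinary data. */
    static const char *next_to_send(void)
    {
        for (int cls = 0; cls < CLASSES; cls++)
            if (head[cls] != tail[cls])
                return queue[cls][head[cls]++ % SLOTS];
        return NULL;
    }

    int main(void)
    {
        enqueue(1, "data message, link 37");
        enqueue(0, "RFNM, link 12");
        enqueue(1, "data message, link 41");

        for (const char *m; (m = next_to_send()) != NULL; )
            printf("send: %s\n", m);   /* the RFNM goes out first */
        return 0;
    }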
CONCLUSIONS

At this time (February 1970) the initial network of four sites is just beginning to be utilized. The communications system of four IMPs and wide band telephone lines has been operational for two months. Programmers at UCLA have signed on as users of the SRI 940. More significantly, one of the authors (S. Carr) living in Palo Alto uses the Salt Lake PDP-10 on a daily basis by first connecting to SRI. We thus have first hand experience that remote interaction is possible and is highly effective.

Work on the ARPA network has generated new areas of interest. NIL is one example, and interprocess communication is another. Interprocess communication over the network is a subcase of general interprocess communication in a multiprogrammed environment. The mechanism of connections seems to be new, and we believe this mechanism is useful even when the processes are within the same computer.

REFERENCES

1 L ROBERTS, The ARPA network, Invitational Workshop on Networks of Computers Proceedings, National Security Agency, p 115 ff, 1968
2 R M RUTLEDGE et al, An interactive network of time-sharing computers, Proceedings of the 24th National Conference, Association for Computing Machinery, p 431 ff, 1969
3 F E HEART, R E KAHN, S M ORNSTEIN, W R CROWTHER, D C WALDEN, The interface message processor for the ARPA computer network, These Proceedings

A comparative study of management decision-making from computer-terminals

by C. H. JONES
Harvard University
Cambridge, Massachusetts

and

J. L. HUGHES and K. J. ENGVOLD
IBM Corporation
Poughkeepsie, New York

INTRODUCTION

The advent of interactive computer systems and cathode ray tube terminals promises great achievements in the area of managerial decision-making. Thus far, however, graphic terminals and light pens have been used mostly to solve scientific, engineering, and mathematical problems. They have been applied relatively little to the types of problems frequently encountered by business management. As a result, objective data on their effectiveness in this environment are largely lacking. In order to provide some data, a study was undertaken to observe how the managerial decision-making process might be improved by a graphic man-computer interface and to measure quantitatively the gains that might be achieved.

Many resource allocation tasks faced by management are combinatorial. Capital expenditure budgets, personnel assignment, project selection, and many production scheduling tasks all have this feature in common. Demonstrating that a computer can be helpful in solving one managerial problem of this kind would therefore suggest its extrapolation to other such problems. Since job shop scheduling has frequently been considered to be the prototype of many complex combinatorial problems faced by industry, it was chosen as the problem for investigation in this study.1,4,8,9

A great deal of effort has been devoted to finding the algorithmic and heuristic solutions to selecting the best sequence for processing jobs through the different machines in a job shop. At present, there is still no feasible optimizing procedure for selecting the best sequence from among the astronomical number of sequences possible even in a medium-sized job shop. In this sense, schedule-making is still an art.13 Since no single technique can satisfactorily provide an optimal answer, some recent effort2,3,5,7,10,11 has been based on the premise that a production scheduler would find it helpful to have a computer assist him in generating and evaluating a number of alternative schedules.
The scheduler could then bring to bear human cognition in further improving these computer-generated schedules and in selecting the best schedule based on his current knowledge of priority, cost, and personnel factors.

In order to study the effect of different computer interfaces on the ability of managers to generate profitable schedules using the symbiotic model described, both a typewriter and a cathode-ray-tube terminal were programmed to control the job shop scheduling model. Because its light pen allowed greater ease and speed of operation, the display terminal promised to provide more effective man-machine interaction than the typewriter terminal. A study was therefore designed to test two hypotheses: (1) that computer-aided job scheduling using either terminal was superior to manual job scheduling, and (2) that a display terminal was superior to a typewriter terminal for job scheduling.

DESCRIPTION OF JOB SHOP MODEL

The shop contains six machines: a lathe, a grinder, two boring machines, and two heat treating furnaces. The scheduler must devise a three-day job schedule for nine jobs presented to him for acceptance or rejection. Each job carries with it a selling price, delivery date, penalty for lateness, fixed sequence of two to five operations on the various machines, and amount of time for each operation. The boring operation includes three different set-ups with varying time requirements for changing set-ups. The scheduler can authorize up to eight hours of overtime on each machine each day. Although in real life a production planner has to concern himself with a complex goal which includes terms relating to customer service, worker satisfaction, machine efficiency, etc., the scheduler in this model is asked only to maximize profits. The calculation of the profit includes the penalty costs of late deliveries and overtime premium costs. The simplification of the goal permitted the results of schedules produced by different methods to be compared objectively. Three different methods were studied: manual, typewriter terminal, and display terminal.

MANUAL JOB SCHEDULING

For several years the job shop problem has been given at the Harvard Business School to graduate students and business executives to be solved with pencil and paper. Most schedulers have followed a similar procedure, consisting essentially of the following five steps:

1. They make general arithmetic calculations based on noting the urgent jobs, profitable jobs, and loads on different machines.

2. They lay out some form of Gantt chart or paper analogue of the shop, usually consisting of a row for each machine and a column for each hour.

3. They fill in blocks of time for job operations on different machines. They are rarely able to verbalize the process by which they decide on which operation to put in next. Their main goal is to lay out some feasible way of getting through the three-day period.

4. They "fine-tune" this schedule by such adjustments as adding a little overtime (e.g., "Job 2 is three hours late. If I work the lathe overtime on Day 1, I can get it out on time."), or swapping two jobs (e.g., "I need to get some work to the furnace earlier, so I'll put Job 4 on the lathe ahead of Job 6 in order to use the furnace on the second day."). This fine-tuning can greatly improve a schedule.

5. They calculate the costs and profits resulting from the final schedule (a sketch of this bookkeeping appears after this list).
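Step 5 is pure bookkeeping: as described in the model section above, profit is the revenue of the accepted jobs less lateness penalties and the overtime premium. The C sketch below is our own rendering of that rule; all figures are invented placeholders, not the model's actual prices or penalties.

    #include <stdio.h>

    /* Illustrative job record; the real model fixes these values
       per job, and the numbers used below are placeholders.      */
    struct job {
        double price;           /* selling price if accepted      */
        int due_day, done_day;  /* promised vs. achieved delivery */
        double late_penalty;    /* penalty per day late           */
    };

    /* Profit = revenue - lateness penalties - overtime premium.  */
    static double profit(const struct job *jobs, int n,
                         double overtime_hours, double premium_rate)
    {
        double p = -overtime_hours * premium_rate;
        for (int i = 0; i < n; i++) {
            int late = jobs[i].done_day - jobs[i].due_day;
            p += jobs[i].price
               - (late > 0 ? late * jobs[i].late_penalty : 0.0);
        }
        return p;
    }

    int main(void)
    {
        struct job accepted[] = {
            { 900.0, 2, 3, 150.0 },   /* one day late */
            { 600.0, 3, 3,  75.0 },   /* on time      */
        };
        printf("profit = $%.0f\n", profit(accepted, 2, 4.0, 12.0));
        return 0;
    }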
There is some looping and cycling through certain of the steps, but the five steps are usually followed in this sequence. The value of the final schedule achieved appears to depend largely upon two factors: the characteristics of the starting schedule and the improvements that can be made in that schedule. Obviously, not all first pass schedules are equally good bases for making improvements. One may involve higher costs than another. One will highlight a critical swap which will make it easy to achieve a larger increase in profit, while another will hide the opportunity for such a swap in a way that requires the patience and skill of a cryptographer to uncover it. The task of laying out a first pass schedule is quite time consuming. It takes the planner approximately forty-five minutes to an hour and a half. The result is that the manual schedulers are frequently stuck with their first pass because they do not have time to construct a second one.

COMPUTER JOB SCHEDULING

From these observations, there appeared to be four ways in which a computer could assist a planner:

1. It could perform a variety of standardized calculations giving him information on such variables as profit/hour for each job, hours of work required on critical machines, slack time on each job, etc.

2. The computer could be programmed to generate different first pass schedules based on rules selected by the scheduler. For example, a brief coded input could tell the computer "show me the schedule that would result if I accepted all jobs, assigned jobs with the earliest promise date first to empty machines, and worked enough overtime to finish the work waiting for machines at the end of each regular shift." Selecting different combinations of rules would allow scores of schedules to be generated in much less time than it takes to generate one manually. The decision maker could then select the best of these for fine-tuning.

3. The computer could present the information in a Gantt chart format to help the decision maker to scan it and to make improvements. It would be programmed to make it easy to enter modifications to the schedule.

4. The computer could carry out the bookkeeping so that the decision maker could quickly determine the profitability of his latest schedule.

TYPEWRITER TERMINAL

A Fortran computer program using an electric typewriter interface (IBM 1050 or 2741) to provide these capabilities for job scheduling has been described elsewhere.7 The typewriter terminal, however, presents several obstacles to an effective "conversation" between the decision maker and the computer which a graphic terminal promises to remove:

1. Printout on the electric typewriter is slow. The time lag involved in printing a 50-line schedule at about 4 seconds a line is noticeable and breaks the concentration of the decision maker. A cathode ray tube can output an entire page of data simultaneously.

2. An electric typewriter is not well suited to making small changes in a schedule because a complete new printout is required each time. On a cathode ray tube, it is possible to change only a few numbers on a page without disturbing the rest.

3. It is cumbersome to type in code identifying the specific job, operation, and machine involved in a proposed change. With a light pen, the same input can be made merely by pointing at a location on a Gantt chart.
4. Choosing a rule with a light pen from a table of alternatives also produces fewer errors than typing coded input, particularly for an inexperienced typist.

5. A cathode ray tube allows the presentation of graphic displays which may enhance the decision maker's understanding of the problem.

GRAPHIC TERMINAL

The IBM 2250 Display Unit was therefore programmed to serve as another I/O device for the job shop scheduling model. At the display, the planner uses a light pen to select three decision rules from among three sets of rules (acceptance, sequence, overtime) displayed on the screen (Figure 1). He then light-pens "RUN" in order to view an operating statement showing the net profit or loss from scheduling, errors in scheduling, and a summary of machine queue and overtime hours (Figure 2). By light-penning "MANUAL", he can examine the detailed hourly job schedule generated by the rules (Figure 3) and rearrange the job operations in the schedule manually with the light pen in an attempt to improve it.

[Figure 1 - Decision rules for job shop scheduler, selected by light pen at IBM 2250 display unit]
[Figure 2 - Operating statement for job schedule produced by selecting decision rules]
[Figure 3 - Job schedule produced by selecting decision rules]

By means of the light pen, operations may be split, moved up or down the job schedule, or moved back and forth between the hourly job schedule on the left and the jobs-to-be-scheduled section at the right. The scheduler moves operations by first light-penning the operation to be moved and then the location to which the operation is to be moved (Figure 4). Instead of selecting decision rules and generating a schedule automatically, the planner can fill in the entire schedule manually by transferring job operations from the list of jobs to be scheduled on the right to the job schedule on the left (Figure 5). This operation is not exercised frequently, however, because generating a schedule by selecting rules is more efficient. The planner can also obtain printed copy of any display by light-penning "PRINT." In addition, he can store highly profitable schedules by light-penning "SAVE" and retrieve them later by light-penning "GET."

The program for the display unit also provides additional information to the planner to aid him in making acceptance and sequencing decisions (Figure 6). If he light-pens the variable in the heading of any column (PRICE, PRICE PER HOUR, etc.) under the Sequence-of-Jobs section at the bottom, the display automatically puts the nine jobs in the order called for by the variable. For example, light-penning "PRICE" lists the jobs in descending order by price. When a satisfactory job sequence is obtained, the sequence and acceptance rules selected are transferred back to the original rules panel (Figure 1) by light-penning "RULES," after which the simulation is run from the rules panel.
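Selecting a sequencing rule on the rules panel amounts to selecting a comparison function for ordering the job list. The sketch below is our own illustration of the idea, using the "earliest promise date first" rule quoted earlier; the job data are invented.

    #include <stdio.h>
    #include <stdlib.h>

    /* One sequencing rule = one comparison function; picking a rule
       on the panel selects which comparator orders the job list.   */
    struct job { int id; int promise_day; double price_per_hour; };

    static int by_promise(const void *a, const void *b)  /* earliest first */
    {
        return ((const struct job *)a)->promise_day
             - ((const struct job *)b)->promise_day;
    }

    int main(void)
    {
        struct job jobs[] = { {2, 3, 41.0}, {4, 1, 28.5}, {6, 2, 33.0} };
        int n = 3;

        /* "Assign jobs with the earliest promise date first ..." */
        qsort(jobs, n, sizeof jobs[0], by_promise);

        for (int i = 0; i < n; i++)
            printf("dispatch job %d (promised day %d)\n",
                   jobs[i].id, jobs[i].promise_day);
        return 0;
    }

A different rule, such as descending price per hour, would need only a second comparator, which is why scores of candidate schedules can be generated so cheaply.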
[Figure 4 - Job schedule, produced by selecting decision rules, being adjusted with the light pen]
[Figure 5 - Empty job schedule and list of jobs to be scheduled manually by light pen]
[Figure 6 - Supplementary information panel to assist scheduler in making acceptance and sequencing decisions]

TABLE I - Performance Data on Preliminary & Maximum Profit Schedules for Manual, Typewriter, and Display Samples

                                Preliminary Schedule        Maximum Profit Schedule
                                Profit  Jobs    Errors      Profit  Jobs    Errors  No.    Increase
                                        Omitted                     Omitted         Runs   in Profit
Manual Sample (n = 5 teams)     $ 914   6       0           $2923   2       0       1      $2009
                                 2546   3       0            3555   1       0       1       1009
                                 1411   4       3            1880   3       2       2        469
                                 1913   3       4            2838   2       1       1        925
                                 3425   1       1            3425   1       1       3          0
                          M      2042   3.4     1.6          2924   1.8     .8      1.6      882
                          SD      982   1.8     1.8           661   .8      .7      .9       748
Typewriter Sample (n = 5 teams)  2739   2       3            3492   1       0       18       753
                                 1662   5       0            3402   1       0       11      1740
                                 1365   5       0            3510   1       0       10      2145
                                 2714   3       1            3573   1       0       10       859
                                 2475   3       4            3188   1       0       11       713
                          M      2191   3.6     1.6          3433   1       0       12      1242
                          SD      636   1.3     1.8           150   0       0       3.4      657
Display Sample (n = 6 teams)     3354   1       2            3444   1       0       26        90
                                 3448   1       0            3448   1       0       14         0
                                  695   2       4            3444   1       0       23      2749
                                 2694   1       2            3576   1       0       51       882
                                 3521   3       1            3501   1       0       22       -20
                                 2737   0       0            3477   *       0       15       740
                          M      2741   1.3     1.5          3482   1       0       25       740
                          SD     1065   1.0     1.5            52   0       0       13.5    1058

* Data missing.

THE EXPERIMENT

Thirty-two production managers and schedulers from thirteen companies in the neighborhood of the IBM Education Center in Poughkeepsie, New York, participated in the study. They were formed into 16 two-man teams and assigned by a randomization procedure (slightly modified by the exigencies of human and computer availability) to one of the three scheduling techniques: manual, typewriter terminal and graphic terminal. The typewriter terminal was connected through public telephone lines to the MIT time-sharing computer (an IBM 7094) in Cambridge, Massachusetts, while the graphic terminal was directly connected to an IBM 360/40 in the IBM Education Center in Poughkeepsie. In the latter, the scheduling program was catalogued as a job under OS/360 and controlled by a display processing and tutorial system developed at the Poughkeepsie Education Center.6 On the MIT computer, the program was written in Fortran and operated under MIT's CTSS. The only difference between the programs stemmed from the special capabilities and limitations of the input-output devices.

Each team received a 55-minute introductory session in which they were presented with a description of the scheduling problem and asked to devise manually a three-day schedule for the hypothetical job shop. After a five-minute break, each team was sent to its assigned experimental method (typewriter or display terminal) to see if it could devise a better schedule. During these one hour and twenty-five minute experimental sessions, the authors explained how to use the terminal and remained present to answer questions. Instructions and descriptions of the scheduling rules were also provided in printed form. After the experimental session, the manual participants were given a chance to try out the computer terminals.
A program was written to collect data on various aspects of each team's performance at the terminals, such as amount of profit, number of schedules generated, number of jobs scheduled, and number of scheduling errors. In addition, the participants responded to written questionnaires asking about their previous experience and their attitude toward the scheduling task and the tools available to them. FINDINGS Table I summarizes the performance data for the schedules produced by the manual, typewriter, and display samples. The differences among the three samples in mean profit for the preliminary and maximum profit schedules were not significant by analysis of variance (ANOVA). The mean increase in profit from preliminary to maximum schedule was also not significant. However, Bartlett's test of homogeneity. of variance for the maximum schedule profits was significant at the .01 level. The display group had a standard deviation of only $52, the typewriter group $150, and the manual group $661 (Table 1). All but one of the eleven display and typewriter teams had a maximum profit above $3,400, but three of the five manual teams did not achieve this figure. Thus, one result of computer scheduling apparently was to help the weaker teams-particularly those in the display sample-to raise their maximum profits closer to those of the top schedulers. On the other hand, there was little difference in profits among the top schedulers in all three samples. On other performance variables (Table I), the expected differences in favor of the computer teams appeared. On the maximum profit schedule, each computer team omitted only one job, while three manual teams failed to schedule two or three jobs. The differences in mean jobs omitted among the three samples were significant at the .05 level by ANOVA. Three of the manual teams also made one or two errors in scheduling, while the computer program prevented the typewriter and display teams from making any errors. Striking differences among the three samples occurred in the number of runs or job schedules generated. The display sample had a mean of 25 schedules, the typewriter sample 12 schedules, and the manual sample 1.6 schedules. The corresponding standard deviations were 13.5, 3.4, and .9. The differences in means and variances were significant at the .01 level by ANOVA and Bartlett's test, respectively. The typewriter and display teams thus not only generated more schedule runs, but showed more variation in the number generated. The size of the difference between the display and typewriter teams indicated the potential advantages of the display over the typewriter iri. ease of selecting decision rules with a light pen and in speed of presenting results on the screen. On the questionnaire, all the schedulers expressed positive reactions to the use of the computer terminals. Seven of the nine participants who had a chance to try both typewriter and display terminals expressed a preference for the latter. DISCUSSION In addition to the objective results cited above, a number of observations of the behavior of the schedulers at the terminals were made. The approaches of the typewriter and display groups varied considerably. Once a schedule was generated by the computer in response to the selection of rules, it could be improved by making minor changes, i.e., "fine-tuning." The electric typewriter sample, however, found making these small adjustments too time consuming. As a result, none of them made more than one or two such changes. 
DISCUSSION

In addition to the objective results cited above, a number of observations of the behavior of the schedulers at the terminals were made. The approaches of the typewriter and display groups varied considerably. Once a schedule was generated by the computer in response to the selection of rules, it could be improved by making minor changes, i.e., "fine-tuning." The electric typewriter sample, however, found making these small adjustments too time consuming. As a result, none of them made more than one or two such changes. On the other hand, the display teams found the sliding and swapping of jobs such fun that they were seduced into a misallocation of time. They tended to spend too much time adjusting the first schedule that they generated and therefore did not have sufficient time later to adjust their best schedule. Experience with other planners has shown that they lost $50-$100 by not taking the time to adjust their final schedule. Because of this situation, the profits attained by the display sample may very well have been understated. With more time to become accustomed to the display, they undoubtedly would have done better.

One of the original goals of the cathode ray tube programming was to provide an electronic analogue for constructing a two-dimensional job schedule by means of pencil and paper. Despite the ease of light-penning job operations into a blank schedule displayed on the screen, the display teams all preferred to use rule combinations to build a starting schedule rather than fitting jobs into the schedule one by one.

If the three approaches to scheduling the shop are treated as a continuous spectrum running from no communication with a fast calculating device (the manual groups) through mediocre communication (the typewriter groups) to easy communication (the display groups), there is an interesting shift in the problem-solving techniques of schedulers in each sample. The manual teams were clearly trying to make the best decision at each decision point. Thus, they would agonize over the decision to put Job 2 or Job 4 on the lathe. They would consider the effect of Job 2's delivery time, the amount of overtime on the lathe, the future requirements on the milling machine, etc. These decisions were usually inconclusive because it is not possible for most human minds to appreciate fully all the ramifications of the decision trees arising in job shop schedules.

The typewriter teams were less bogged down in details. Instead of arguing about a specific sequencing decision, their conversations were concerned with the merit of following decision rules which emphasized machine utilization or delivery performance. In effect they said, "We can't hope to make all the separate decisions perfectly. Let's try to figure out which decision rules should give us the best schedule." Based on existing knowledge, this is not a soluble task. Although good rationales can be given for many different rules, even small changes in the operation times and sequences can cause any rule to provide a much less satisfactory result.

The display tube teams were more pragmatic in that they spent less time discussing alternatives and more time generating schedules. They found it faster and easier to try a combination of rules than to attempt to reason out the logical value of the combination. Their replies to the attitude questionnaire showed an appreciation of the difficulty of reasoning their way to a conclusive answer. They saw the main contribution of the computer as a means of trying out more alternatives. Their colleagues using the typewriter terminals, on the other hand, emphasized the value of the computer in demonstrating the importance of using particular rules.

This study has furnished some evidence of the extent to which an interactive terminal, and particularly a display terminal, can enhance the decision-making skills of a manager engaged in solving a fairly representative kind of business problem.
This approach appears to offer opportunities for improved managerial decision-making in many areas where the ability to try out many alternatives rapidly by computer and to improve the best of these alternatives by human cognition can result in more profitable decisions.

REFERENCES

1 D C CARROLL, Heuristic sequencing of single and multiple component jobs, Unpublished PhD thesis, Massachusetts Institute of Technology, 1966
2 D C CARROLL, Implications of on-line, real-time systems for managerial decision making, Paper prepared for presentation at the Research Conference on the Impact of New Developments in Data Processing on Management Organization and Managerial Work, Sloan School of Management, MIT, March 29-30, 1966
3 D C CARROLL, Man-machine cooperation on planning and control problems, Paper presented at the International Symposium on Long-Range Planning for Management, sponsored by the International Computation Center, Rome; held at UNESCO, Paris, September 20-24, 1965
4 R W CONWAY, W L MAXWELL, Network dispatching by the shortest operation discipline, Operations Research, Vol 10, pp 51-73, 1962
5 J C EMERY, The planning process and its formalization in computer models, Proceedings of the Second Congress of the Information System Sciences, pp 369-389, 1965
6 K J ENGVOLD, J L HUGHES, A general-purpose display processing and tutorial system, Communications of the ACM, Vol 11, pp 697-702, 1968
7 R FERGUSON, C H JONES, A computer-aided decision system, Management Science, in press
8 W S GERE, Heuristics in job shop scheduling, Management Science, Vol 13, No 3, pp 167-190, November 1966
9 B GIFFLER, G L THOMPSON, Algorithms for solving production scheduling problems, Operations Research, Vol 8, pp 489-503, 1960
10 J C R LICKLIDER, Man-computer partnership, International Science and Technology, p 19 ff, May 1965
11 J C R LICKLIDER, Man-computer symbiosis, IRE Transactions on Human Factors in Electronics, HFE-1, No 1, pp 4-11, March 1960
12 H A SCHWARTZ, R J HASKELL JR, A study of computer-assisted instruction in industry, Journal of Applied Psychology, Vol 50, pp 360-363, 1966
13 M SPITZER, The computer art of schedule-making, Datamation, Vol 15, pp 84-86, 1969

An interactive keyboard* for man-computer communication

by LARRY L. WEAR
Hewlett-Packard Company
Mountain View, California

and

RICHARD C. DORF
Ohio University
Athens, Ohio

* Patent Applied For.

INTRODUCTION

With the advent of time-sharing and remote terminals, people who have little or no programming experience are becoming computer users. One of the reasons these people have been attracted to using time-sharing computers is that the programming languages available have been made relatively simple and easy to use. BASIC is an example of the type of language that has become popular in the time-sharing community. Other languages have been developed for such fields as numerical control. Languages such as these, which have been developed using terminology and syntax consistent with the terminology and syntax of a given field of interest, are called natural languages.1 In the past few years there has been a considerable amount of discussion about the use of natural languages for man-computer systems.2 Although many feel that natural languages are the wave of the future,3 some believe that there are problems with the use of natural languages that must be solved before natural languages can become widely used. One of the problems associated with the use of a natural language is the input of statements and commands to the computer. Natural languages usually contain a number of English words and abbreviations.
Because of this, statements written in a natural language often resemble sentences written in English. If an operator is a good typist, entering a natural language statement with a teletype or similar device presents no problem. However, if the operator is not a proficient typist, inputting a statement in a natural language can be a time consuming process. In order to circumvent the problems associated with using a standard teletype, the interactive keyboard described in this paper was developed. Besides eliminating some of the problems associated with the teletype or similar device, the interactive keyboard has two other important features which enhance the performance of a man-computer communication system; they are: (1) error prevention in the form of electrical lockout of keys that would cause incorrect syntax to be generated and (2) visual feedback to the operator in the form of lights under keys which will result in proper syntax generation. It is the visual feedback incorporated in the keyboard which makes the keyboard interactive, as opposed to the static unidirectional information transfer provided by a standard keyboard. These features are described fully in the following section. An experiment which was conducted to obtain a measure of the performance of operators using the interactive keyboard and the results of this experiment are given in a later section of this paper.

A DESCRIPTION OF AN INTERACTIVE KEYBOARD

This section contains a detailed description of the interactive keyboard that was developed by Vincent J. Nicholson, James G. Rudolph and the author at Hewlett-Packard Laboratories. An application for a patent has been filed. The keyboard has three characteristics which make it very useful as an input device for an interactive system: (1) there is no multiple use of keys, (2) lights under the keys are turned on to indicate which keys can be struck to give syntactically correct statements and (3) keys that are not lit are electrically locked out so that inputs which would result in incorrect syntax are prevented. The latter two are the characteristics which make this keyboard unique.

[Figure 1 - A block diagram of the basic elements of the interactive keyboard: keys with lights, a character interpreter, a decoding matrix, state storage registers, and light and enable line drivers]

A block diagram that illustrates the basic elements of the keyboard is shown in Figure 1. The keyboard and its associated electronics can be broken down into four elements: the keys with lights, the decoding matrix, the state storage registers and the light and enable line drivers. The keys, which contain the lights, form the interface with the operator. When a key which has been enabled is pressed by the operator, a character is sent to the processor, which in general would be the CPU of a computer; there the program resident in core will perform whatever action is required by the character. Besides going to the processor, the character is transmitted to the decoding matrix. In the decoding matrix the combination of the character plus the knowledge of the present state of the keyboard is used to determine the next state of the keyboard. The state of the keyboard in this case is determined by which keys are lighted and enabled.
When the next state of the keyboard has been determined, the correct light and enable line drivers are activated. The process can now start over again when the next key is depressed. In the brief description given above no mention was made of the timing required; this is because the purpose of the paper is to discuss the man-computer communication problem and not to go into a detailed analysis of hardware. For the same reason, a description of the components of each of the elements of the keyboard has been omitted.

AN INTERACTIVE KEYBOARD FOR A COMPUTER-AIDED CHECKOUT SYSTEM

The interactive keyboard described in the preceding section could be used with practically any system that required man-machine communications. In this section a description of a specific realization of such a keyboard is given. An interactive keyboard was designed for use with the Hewlett-Packard 9500A programmable checkout system.4 A brief description of the 9500A system is given in the Appendix. The interactive keyboard was used to replace the keyboard on the teletype. The printer portion of the teletype was still used as the computer output device.

Figure 2 contains a layout of the interactive keyboard as it was designed for the 9500A system. The keys were partitioned into three general groups: words associated with standard BASIC statements, words associated with programming instruments, and keys that correspond to variables, operators and miscellaneous functions such as carriage return and escape mode. These groups are enclosed by the dashed lines in Figure 2.

[Figure 2 - Layout of interactive keyboard used with the Hewlett-Packard 9500A programmable checkout system]

The following example is presented to show how the keyboard functions during the input of typical statements. Suppose the operator wanted to input the following statement:

5 PROGRAM DVM FUNCT FREQ RANGE AUTO VAR A

If the keyboard were not initialized, that is, waiting for the first character of a line, the programmer would first press CR for carriage return or ESC for escape mode (these keys are always enabled and lighted). This would initialize the keyboard. When the keyboard is initialized the only keys that are lighted and enabled are the digit keys 0 through 9, CR, ESC, SPACE and the system command keys, RUN, LIST, and SCRATCH. Since it is desired to input the line shown above, the operator would first strike the "5" key. When he had done this, a "5" would be printed on the teletype and the set of keys corresponding to permissible next inputs would be lighted and enabled. At this point the keys corresponding to legal first words of a BASIC statement will be lighted and enabled. The digits will remain lighted and enabled because it is permissible to add to the line number. The system command keys are disabled and their lights go out because if a system command is to be given it must be the first input of a line. Since CR, ESC and SPACE are always lighted and enabled, they will not be mentioned again for the remainder of the example.
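Before continuing the walk-through, note that the decoding matrix behaves as a finite-state machine: the present state plus the struck key select the next set of lighted, enabled keys. The C sketch below is our own simplified illustration of the first few transitions of this example; the state names and key lists are abridged stand-ins, not the actual hardware tables.

    #include <stdio.h>
    #include <string.h>

    /* Each state names the keys that are lighted and enabled; any
       other key is locked out.  A tiny slice of the 9500A grammar. */
    enum state { INIT, STMT_WORD, INSTRUMENT, DVM_FUNC };

    static const char *enabled[] = {
        [INIT]       = "digits CR ESC SPACE RUN LIST SCRATCH",
        [STMT_WORD]  = "digits PROG ... CR ESC SPACE",
        [INSTRUMENT] = "DVM SWT PS OSC CR ESC SPACE",
        [DVM_FUNC]   = "DCV ACV RES FREQ CAL+ CAL- CR ESC SPACE",
    };

    /* Decoding matrix: (state, key) -> next state. */
    static enum state next_state(enum state s, const char *key)
    {
        if (s == INIT && strcmp(key, "5") == 0)         return STMT_WORD;
        if (s == STMT_WORD && strcmp(key, "PROG") == 0) return INSTRUMENT;
        if (s == INSTRUMENT && strcmp(key, "DVM") == 0) return DVM_FUNC;
        return s;   /* unlisted keys are locked out: no transition */
    }

    int main(void)
    {
        enum state s = INIT;
        const char *presses[] = {"5", "PROG", "DVM"};
        for (int i = 0; i < 3; i++) {
            s = next_state(s, presses[i]);
            printf("after %-5s lit: %s\n", presses[i], enabled[s]);
        }
        return 0;
    }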
The operator now presses the "PROG" key. (See Reference 5 for a description of the programming language.) This will cause "PROGRAM" to be output on the teletype and also will cause the next bank of switches to be lighted and enabled. At this point the only syntactically correct inputs are those from the keys that correspond to the programmable instruments: the "DVM," "SWT," "PS" and "OSC" keys. These keys refer to the digital volt meter, the programmable relay bank, the d.c. power supply and the oscillator, respectively. The operator now strikes the "DVM" key. When he does this, "DVM FUNCT" will be typed out and the keys associated with the permissible voltmeter functions, "DCV," "ACV," "RES," "FREQ," "CAL+" and "CAL-," will be lighted and enabled. At this point the operator has two feedback signals from the system. The abbreviation "FUNCT" has been typed out to indicate that a voltmeter function is required next, and the function keys on the keyboard are lighted.

Next, the operator presses the "FREQ" key; this causes "FREQUENCY RANGE" to be typed out and lights the keys "1," "0," ".," and "AUTO." The operator may enter a specific range such as "10000." or "10." or he may choose to use the auto ranging feature of the instrument. In this case the latter option is chosen. After he presses the "AUTO" key, "AUTO VAR" is typed out and the variable keys, "A," "B," "C" and "D," are lighted and enabled. The operator now presses the "A" key; "A" is printed and the keys "0" through "9" are lighted and enabled. The final input required is a carriage return to terminate the line and initialize the keyboard for the next line of input.

[Figure 3 - Illustrations (a) through (g) show which keys are illuminated and enabled during the input sequence required to enter the program statement: 5 PROGRAM DVM FUNCT FREQ RANGE AUTO VAR A. Above the keys the statement being formed is shown]

Figures 3(a) through (g) show the state of the lights for each of the steps described above and the printer output for each step.

AN EXPERIMENT USING THE INTERACTIVE KEYBOARD

An experiment was conducted on the 9500A system to measure the performance of operators using the interactive system. Four engineers who were familiar with BASIC, but who had never used the computer-aided checkout system, were chosen and given approximately an hour's training on the system. Two similar test procedures were given to the subjects. Two of them programmed and executed the tasks required in procedure 1 and the other two programmed and executed the tasks required in procedure 2. The time required to compose, enter and execute the programs and the number of lines required were recorded. In order to have a basis to judge the performance of the interactive keyboard, the subjects also did the required programming on a standard keyboard that had been modified for functional inputs.5

TABLE I - Summary of Data from Check-Out Language Experiments

                 Natural                Interactive
Subject      Time        Lines      Time        Lines
A            21          42         19          42
B            17          45         16          43
C            20          44         14          44
D            22          43         16          44
Total        80 minutes  174 lines  65 minutes  173 lines
A comparison between using a natural language on the teletype modified for functional input and using the interactive keyboard with natural language shows that the teletype method required 23 percent more time than the interactive keyboard method. CONCLUSIONS The main conclusion that can be drawn from the comparison of the data from the experiment described above is that the interactive keyboard provided a significantly improved method of man-machine communication. Two comments made by the test subjects seem worthy of noting: first, two of the subjects said that by the end of the test they were watching the keyboard almost exclusively and not referring to the teletype print-out; second, one subject said that programming with the interactive keyboard was more relaxing than programming with the teletype. Even though the improvements in performance that were calculated above are probably not exactly correct, they do provide a positive indication that the interactive keyboard is a definite improvement over the teletype in man-machine interactive systems. EXTENSIONS The type of interactive terminal described above does not have to be limited to a keyboard method of entry. If one has available a CRT with light pen input the same principles can be applied to the design of the system. In this case rather than just illuminating a key with a word written on it, the word can be written on the face of the CRT and if the operator desires to Interactive Keyboard for Man-Computer Communication input that work he touches the CRT in the area of the word with the light pen. With a little more imagination one could envision a system with a CRT to provide feedback to the operator and an audio processor to convert verbal commands into correct electrical signals. Since work is already being done to convert brain impulses into computer commands it is probably only a matter of time until complicated interactive systems such as these are in general use, J - systems Santa Clara University 1969 '\ - - H-P 2902A DVM J 9liOOA DISTR I PoUT I '''J UNIT - \IAVETEK 157 WAVEFOR~t sY~nHES IZER BIBLIOGRAPHY 1 M HALPERN Foundations of the ease for natural-language programming IEEE Spectrum March 1967 2 J R PIERCE Men, machines and languages IEEE Spectrum July 1968 3 R B MILLER New, simple man/machine interface EDN October 15, 1969 4 R A GRIMM A utomated testing Hewlett-Packard Journal August 1969 5LLWEAR N aturallanguages and functional input devices for man-computer I NTERACT I V~ KEYBOARD H-P 2 I Il'A COMPUTER H-P 6130 DC VOLTAGE SOURCE 611 I----- l T WilT U:!DU; EST I Figure 4-A block diagram of the 9500A system used to conduct the experiment on natural language and functional inputs the Hewlett-Packard 6130 DC voltage source and the Wavetek 157 waveform synthesizer. The 6130 is capable of suppling DC voltages from -50 to +50 volts. When using this instrument, the operator must supply three parameters to the system: 1. The address of the particular 6130 to be used (in this special case there was only one 6130 so that the address was always 1). 2. The desired voltage in volts. 3. The desired current limit on the power supply. APPENDIX Description of the 9500A system and a natural language for computer aided checkout T he particular configuration of the system used to experiments on is shown in Figure 4. For thIS experIment, the unit under test was the system itself. 
Measurements were made on the outputs of the 6130 programmable dc power supply and the 157 programmable oscillator and on resistors placed in the distribution unit. The general purpose digital computer used in the system is an H-P 2114A with 8K of core memory. Each of the instruments and input/output devices is connected to the computer through one of the 2114's interrupt connectors.

The language used by the operator to communicate with the computer and the instruments is a modified version of BASIC. An earlier section of this paper contains a detailed description of the features of the modified portion of BASIC.

There are two instruments in the system that are used to supply stimuli to the unit under test; these are the Hewlett-Packard 6130 DC voltage source and the Wavetek 157 waveform synthesizer. The 6130 is capable of supplying DC voltages from -50 to +50 volts. When using this instrument, the operator must supply three parameters to the system:

1. The address of the particular 6130 to be used (in this special case there was only one 6130, so the address was always 1).
2. The desired voltage in volts.
3. The desired current limit on the power supply.

The 157 is a programmable waveform generator. It is capable of generating sine, square and triangular waveforms with amplitudes of .001 to 10 volts peak-to-peak and with frequencies from .0001 to 1,000,000 hertz. The generator has three modes of operation: program, trigger and search. In the program mode, the specified output is provided continuously from the time the instrument is programmed until a new output is requested. In the trigger mode, the output is supplied only during the time that an external trigger pulse is applied. In the search mode, the frequency of the output is determined by a dc voltage that is supplied to the 157 through a rear panel connector by external equipment. To program the waveform synthesizer the operator must, in general, supply four parameters to the system:

1. The mode of operation
2. The waveform type
3. The frequency
4. The amplitude

The 2402A DVM is the only measurement device in the system used for this experiment. The 2402A is capable of measuring DC voltages, AC voltages, frequency and DC resistance. Two other functions are available to check the calibration of the instrument; they are calibrate + and calibrate -. Besides the desired function, the operator must supply the system with two other parameters:

1. The range for the instrument
2. The identifier under which the measurement value is to be stored in the computer

Figure 5-(a) A typical relay switch bank in the 9400A distribution unit. (b) Interconnections between the stimuli and DVM in the 9500A demonstration system

The 9400A distribution unit is used to connect the supplies and measuring devices to specified pins on the 9400A. The unit contains four banks of relay operated switches. Figure 5(a) shows a typical switch. The instrument connected to the input pin can be switched to any one of the 16 output pins. Figure 5(b) shows how the 9400A was wired for the system used in the experiment. To program the 9400A the operator must supply the desired output pin, 0 through 15, for each switch bank. If the operator does not want to change the output connection of one of the switches, he can input a -1 to the system rather than a number between 0 and 15.

The language used with the 9500A system was a modified version of BASIC. Extensions were made to BASIC so that the system was capable of interpreting the statements used to control the various instruments. When the operator wants to give a command to one of the instruments he enters "PROGRAM" following the line number. The operator then enters the instrument: DVM, OSC, PS or SWT. Following this, modifiers associated with the instrument being used are input. For a complete description of the language see Reference 5.
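The parameter lists above are simple enough to check mechanically before a statement is sent to an instrument. The sketch below does this for the 157, using the mode, waveform and range limits quoted in the text; the function and parameter names themselves are illustrative assumptions, not part of the 9500A software.

    # Range check for programming the Wavetek 157, using the limits
    # quoted in the text; names and call signature are assumed.
    MODES = {"program", "trigger", "search"}
    WAVEFORMS = {"sine", "square", "triangular"}

    def check_157(mode, waveform, frequency_hz, amplitude_vpp):
        assert mode in MODES, "unknown mode"
        assert waveform in WAVEFORMS, "unknown waveform"
        assert 0.0001 <= frequency_hz <= 1_000_000, "frequency out of range"
        assert 0.001 <= amplitude_vpp <= 10, "amplitude out of range"
        return (mode, waveform, frequency_hz, amplitude_vpp)

    print(check_157("program", "sine", 1000, 1.0))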
Linear current division in resistive areas: Its application to computer graphics

by J. A. TURNER and G. J. RITCHIE
University of Essex
Colchester, England

INTRODUCTION

Present-day computer systems employ a variety of sophisticated peripheral equipments for the input and output of information. This paper describes a new method of obtaining (x, y) coordinate position information by means of linear current division in a resistive area.1 The method has applications in many fields but particularly that of Computer Graphics, where it can be used as an input device in the form of a Data Tablet, as an output device in the form of a precision Cathode-Ray-Tube Display and as an alternative to the 'light-pen'.

Linear current division applied to Data Tablets differs radically from approaches used by Rand2 and Sylvania3 in that the operator's electronic pen or stylus injects a constant current into the tablet, the x and y coordinate position information being obtained from individual peripheral connections to the tablet. It has the advantage of being a basically accurate system which is simple to construct and therefore economical to manufacture. In addition, the coordinate information can be sampled at a high rate, 10 KHz being achieved without difficulty on the prototype Data Tablet.

Included in this paper are the experimental results obtained from a Data Tablet; results which can be applied to any system employing the principle of linear current division. Also presented is a theoretical verification of the principle.

CURRENT DIVISION IN A RESISTIVE AREA

If a current is injected at a point in a finite resistive area, the current must emerge at the boundary of the area. The coordinate position of the current source can be described by the distribution of current magnitudes at the boundary of this area, but this distribution generally will be a very complex function of position. Establishing certain symmetrical boundary conditions yields a simple relationship between coordinate position and boundary currents, as follows.

Linear current division for the one-dimensional case

Consider a rectangular area with uniform surface resistivity as shown in Figure 1. The edges at x = 0 and x = x0 are reinforced with low resistivity material and are connected to earth. If a current I is injected into the area at a point (x1, y1), currents I1 and I2 flow in the conducting edges as shown. Symmetry analysis yields the following relationship:

    I2/I = x1/x0    (see Appendix)

Figure 1-Current division in a resistive area

This relationship has been verified experimentally using an x-y table with a constant current probe which could be moved over a rectangular sheet of commercially-available Teledeltos paper, one pair of opposite edges being reinforced with a coating of high conductivity silver paint. The results are shown in Figures 2 and 3. Note particularly that when moving the probe in the y-direction the current I2 is constant, whereas moving the probe in the x-direction produces a current I2 which is linearly proportional to x position. When making these measurements, a random error of 0.25 per cent due to variations in surface resistivity of the Teledeltos paper was observed.

Figure 2-Linearity of current division

Figure 3-Constancy of current division
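Since all of the injected current must leave through the two reinforced edges, the complementary relation for I1 follows directly from current conservation. Stated compactly (a restatement of the relation above, not an additional result of the paper):

    \[
    I_1 + I_2 = I, \qquad \frac{I_2}{I} = \frac{x_1}{x_0}
    \;\Longrightarrow\;
    \frac{I_1}{I} = 1 - \frac{x_1}{x_0}.
    \]

For example, a probe at x1 = 0.3 x0 carrying I = 1 mA yields I2 = 0.3 mA and I1 = 0.7 mA.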
Linear current division for the two-dimensional case

The method for measuring probe position previously described can be extended to a two-coordinate system by presenting the resistive area with alternately conducting and non-conducting opposite pairs of edges. One method by which this may be achieved is illustrated in Figure 4(a). The periphery of the conducting area is connected to four groups of uniformly-spaced diodes, the direction of conduction of the diodes being appropriate to the polarity of the current source. In this case, the anodes of the diodes along a single edge of the resistive area are connected together and also to the emitter of an npn gating transistor (transistors Q1, Q2, Q3 and Q4).

Figure 4(a)-Basic (x, y) system

Consider the waveforms shown in Figure 4(b). During the 'x period,' transistors Q2 and Q4 are gated OFF so that I2 = I4 = 0, whereas Q1 and Q3 are gated ON. The diodes associated with Q1 and Q3 thus set up equipotential edges parallel to the y-axis, so that the source current, I, splits into I1 and I3, giving a voltage VR3 = I3R which is linearly proportional to the x-coordinate position of the source. During the 'y period,' transistors Q1 and Q3 are OFF whereas Q2 and Q4 are ON; thus VR2 = I2R is a voltage linearly proportional to the y-coordinate position of the source.

Figure 4(b)-Voltage drive waveforms

If the gating is repetitive, integration of VR3 and VR2 respectively provide low-frequency analogue x-y information; alternatively, sampling of VR3 and VR2 may be performed during the appropriate gating periods to provide instantaneous measurements.

An alternative method of operating the system is indicated in Figure 5, in which the current source is switched between +I and -I. In this case opposite banks of diodes are arranged to conduct sequentially and are terminated in the low impedance virtual earths of operational current-to-voltage amplifiers. This method has the advantage of eliminating the varying emitter-base voltage drop associated with transistors Q1, Q2, Q3 and Q4.

Figure 5-Alternative (x, y) system
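The repetitive x/y gating just described reduces readout to two sampled voltages per cycle. The sketch below shows the arithmetic involved, assuming idealised linear division; the variable names and the normalisation by I times R are illustrative, not part of the published circuit.

    # Convert the two gated sense voltages to coordinates, assuming the
    # idealised relations VR3 = I3*R (x period) and VR2 = I2*R (y period),
    # with I3/I = x1/x0 and I2/I = y1/y0 for a linear tablet.
    def decode_position(vr3, vr2, i_source, r_sense, x0, y0):
        x = (vr3 / (i_source * r_sense)) * x0
        y = (vr2 / (i_source * r_sense)) * y0
        return x, y

    # Example: 1 mA source, 1 kilohm sense resistor, 100 mm square tablet.
    # A probe at (30 mm, 70 mm) would ideally give VR3 = 0.3 V, VR2 = 0.7 V.
    print(decode_position(0.3, 0.7, 1e-3, 1e3, 100.0, 100.0))  # (30.0, 70.0)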
NON-LINEARITIES

Although the simple one-dimensional system of Figure 1, with its completely reinforced edges, gives an ideal 'y-independent' and 'x-proportional' division of current, the two-dimensional systems described in the second section introduce non-linearities in current division. This is due to the small, discrete contact areas of the diode connections and to the fact that practical diodes have a finite and variable voltage drop when forward-biased, as well as a leakage current when reverse-biased.

The effect of discrete contact areas

In order to separate out the effects of finite diode contact area and diode voltage drop, an experiment was performed in which a constant current probe was moved parallel to and at a fixed small distance, M, from a conducting edge. This conducting edge was provided by small circular discrete contact areas connected together using 'ideal' forward-conducting diodes, i.e., wire. The results are shown in Figure 6. It is seen that the current I2 is subject to regular variations (ripple), each peak corresponding in position to a contact area. This is to be expected because the division of current between two opposite edges is dependent upon the relative impedances of the current paths from the current source contact point to the edges. A consequence of this behaviour is that the ripple should be reduced if the probe is traversed at a greater distance from a conducting edge or if the number of contact areas, n, on each edge is increased. These effects are illustrated in Figures 7 and 8.

Figure 6-Ripple due to edge contacts

Figure 7-Variation of error with edge approach distance

Figure 8-Variation of error with number of edge connections

Since an edge of the resistive area is required to be conducting and non-conducting during successive half-cycles of the x-y coordinate position sampling waveform, two means whereby the ripple amplitude might have been reduced were investigated. Both attempted a closer approach to a reinforced edge than the dot contact structure described in the previous section.

The first method involved extending the dots into lines along the edge of the conducting area, measurements of ripple amplitude being taken for various contact-to-space ratios (see Figure 9). It was found that the ripple remained substantially unchanged for contact-to-space ratios up to 75 per cent but, more important, the current division plot showed a marked deviation from linearity in the form of stepping, even with a contact-to-space ratio as low as 25 per cent (Figure 10). This approach was therefore rejected.

Figure 9-Extension of dot contacts into lines

Figure 10-"Stepping" error due to finite contact dimensions
18 is the reverse leakage current per diode. 1-0 The EC402 diode has a typical 18 as low as 0.45nA at 5 volts reverse-bias, yielding a total error current of ~nA for n = 20. If the source current is ImA, leakage will then account for a typical error of less than 0.002 per cent of full scale. This error is insignificant compared with the other factors under consideration. > Effects due to the resistive area x, o 1·0 X. Figure lO-"Stepping" error due to finite contact dimensions Non-uniform surface resistivity causes the current division between opposite edges to be non-linear. This effect can be seen in Figure 8 in which a random error of 0.25 per cent is present when using teledeltos paper.. It is therefore essential to use a material with a uniform surface resistivity. For a constant current injected into the resistive area, the voltage developed at the point of contact will depend upon the surface resistivity. Since the 618 Spring Joint Computer Conference, 1970 corrected drive analogue spot position information inputs x y comparator Figure 12-Feedback C.R.T. display system 'bowing' non-linearity is due to variable forwardbiased diode voltages, errors -due to 'bowing' can be minimised by using a material with large surface resistivity or injecting a large current. In both cases errors due to 'bowing' will be minimised when most of the voltage is dropped across the resistive area rather than the diodes. CATHODE RAY TUBE DISPLAY SYSTEMS The principle of linear current division can be applied most effectively in a Cathode Ray Tube in which the electron-beam is the current source and the resistive area is the aluminised backing to the phosphor, or a transparent conducting screen inner surface. The coordinates of spot position are obtained from the face of the tube allowing the tube to be included within a feedback loop, thus producing a deflection linearly proportional to input signal independent of the nonlinearities within the loop (Figure 12). A C.R.T. of this type will find applications in professional display systems such as Computer Graphic and Radar Displays. sufficiently fast for hand-written data to be reproduced without observable error. When hand-written information is presented to the Data Tablet, variations in pressure of the stylus input can be accommodated within the voltage swing capability of the input current-source, but intermittent contact will give a false (0, 0) position. In any practical device, a finite peripheral margin will be required in order that the ripple non-linearities may be kept down to a pre-specified level. Under these conditions, a (0, 0) output will indicate that the stylus has been lifted. The lifting may be temporary in the case of intermittent contact, or semi-permanent in the case of finishing a hand-drawn line. Where intermittent contact takes place, the (0, 0) indication is obtained for a few cycles only of the (x, y) sampling waveform-in which case a peak-charging sample-and-hold circuit maintains thE; output at its last non-zero value. A continuous output of (0, 0) for, say, 500mS indicates that the stylus has been lifted. Where digital information of position is required, the sample-and-hold circuit is replaced by an output register. A Data Tablet in which the resistive area is transparent has the capability of being used as a 'light-pen' when the tablet is placed over the face of a CathodeRay-Tube. A tube of the type described in the above section with a linear x-y display is an obvious choice for this application. 
SUMMARY AND CONCLUSIONS

The principle of linear current division in resistive areas has potential application in many fields, but particularly that of Computer Graphics, where it may be used in both input and output devices. Attractive features of the system are:

(a) High accuracy (better than 1%)
(b) Simplicity in concept and construction
(c) (x, y) sampling rate of 10 KHz or greater

The basic accuracy of the Data Tablet on which the experimental results were taken is better than 1%. With a suitable choice of margin and number of diodes per edge, the main sources of error are non-uniform surface resistivity and the effects of forward-biased diode voltages. It is expected that considerable improvement can be made by careful choice of surface resistivity and input current magnitude.

At present, information which has been hand-drawn is lost to the operator unless a graphical output system is employed. It is, however, envisaged that transparent semi-flexible resistive areas in conjunction with a pressure-sensitive hard-copy material will allow the operator to see what he has drawn at the point of contact of the stylus, in addition to obtaining hard-copy.

An area, as yet untouched in the electronics field, and in which linear current division has obvious application, is the extraction of analogue position information from a Cathode-Ray-Tube face. Output currents proportional to position are obtained when the beam current is a constant fixed value. An alternative method of operation requires a divider to evaluate I2/I, in which case a continuous indication of beam position can be obtained even when the brightness of the trace is varying. Circuitry to perform this normalised analogue division is currently under development.

ACKNOWLEDGMENTS

The authors wish to thank the Department of Electrical Engineering Science of the University of Essex, England, for the facilities provided, and Automatic Radio, Melrose, Massachusetts, for development funding. G. J. Ritchie is indebted to the Science Research Council and the Marconi Company Limited for an Industrial Studentship grant. Thanks are also due to Dr. L. F. Lind for many useful discussions regarding the appendix proof.

REFERENCES

1 U.K. Patent Application No. 13341/69
2 T O ELLIS M R DAVIS The Rand tablet: A man-machine communication device Proceedings Fall Joint Computer Conference pp 325-331 1964
3 J F TEIXEIRA R P SALLEN The Sylvania data tablet: A new approach to graphic data input Proceedings Spring Joint Computer Conference pp 315-321 1968
4 J REED G J WHEELER A method of analysis of symmetrical four-port networks IRE Transactions on Microwave Theory and Techniques pp 246-252 October 1956
APPENDIX

Proof of the linearity of current division

Figure 14-Current division in a resistive area

Consider the uniform, resistive, rectangular sheet shown in Figure 14. The edges at x = 0 and x = x0 are reinforced with a highly conductive material and are connected to earth. A current, I, is injected into the sheet at source S(x1, y1), causing currents I1 and I2 to flow in the conducting edges as shown. It is required to establish the relationship between the currents I1 and I2 and the coordinate position of the source. Symmetry analysis (even and odd mode) yields the solution.4

Convention: A downwards arrow denotes the injection of current, whilst an upwards arrow denotes removal of current from the point indicated.

First Cycle: The first line of symmetry is x = x0/2, with S' the image of S (Figure 15). The current, I, can be described by the sum of the even and odd mode contributions.

Figure 15-First-cycle even-mode configuration

Even mode: Equal currents of magnitude I/2 injected at S and S' give, by symmetry, contributions of I/2 to both I1 and I2, irrespective of y1 and of whether S lies in the left-hand or right-hand half-plane.

Odd mode: By symmetry, x = x0/2 is an equipotential at earth potential in both cases of Figure 16, but the magnitudes of the boundary currents are not known. However, these may be established by sub-dividing this first cycle odd mode into further even and odd modes. Since each half-plane of Figure 16 is bounded by two earth equipotentials, the half-plane containing S may be treated as follows.

Figure 16-First-cycle odd-mode configurations

Second Cycle: There are four possible cases for each mode.

Even mode: The axes of symmetry occur at x = x0/4 or x = 3x0/4, with S'' the image (Figure 17). In (a) and (b), for which x1 < x0/2, there is a contribution of +I/4 to I1 and a -I/4 contribution to I2. In (c) and (d), for x1 > x0/2, there is a +I/4 contribution to I2 and a -I/4 contribution to I1.

Figure 17-Second-cycle even-mode configurations

Therefore, if x1 < x0/2, i.e., x1/x0 = 1/2 - 1/4 +/- (other terms), then I2/I = 1/2 - 1/4 +/- (odd mode, second cycle); and if x1 > x0/2, i.e., x1/x0 = 1/2 + 1/4 +/- (other terms), then I2/I = 1/2 + 1/4 +/- (odd mode, second cycle).

Study of the second cycle odd mode leads to the third cycle even mode contribution and introduces +/- I/8 terms in direct correspondence with the description of x1/x0 by the series 1/2 + 1/4 +/- 1/8 +/- (other terms) or 1/2 - 1/4 +/- 1/8 +/- (other terms). The following pattern emerges:

(i) The even modes contribute to I2/I as terms of the form +/- 1/2^n, forming a series 1/2 +/- 1/4 +/- 1/8 ..., each sign being positive if the source lies in the right-hand side or negative if the source lies in the left-hand side of the appropriate sub-division of the plane.

(ii) x1/x0 is also described by this series with a direct correspondence of signs. (The series 1/2 +/- 1/4 +/- 1/8 +/- ... +/- 1/2^n ... is absolutely convergent; therefore this series is convergent regardless of sign choice.)

(iii) The currents are independent of y1 because of symmetry considerations, i.e., the images of S are at the same height as S.

Therefore I2/I = x1/x0, and I1/I is the complement of I2/I.

Thus, division of the current source, I, between the two conducting edges parallel to the y-axis has a linear dependence on the x-coordinate of the source and is independent of the y-position.
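In modern terms, the pattern in (i) and (ii) says that the boundary-current ratio reproduces the signed binary expansion of the source position. A compact restatement (our notation, not the paper's):

    \[
    \frac{x_1}{x_0} \;=\; \frac{1}{2} + \sum_{n=2}^{\infty} \frac{s_n}{2^n},
    \qquad
    \frac{I_2}{I} \;=\; \frac{1}{2} + \sum_{n=2}^{\infty} \frac{s_n}{2^n},
    \qquad s_n \in \{-1, +1\},
    \]

with the same sign sequence s_n appearing in both series, whence I2/I = x1/x0.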
Remote terminal character stream processing in Multics

by J. H. SALTZER
Massachusetts Institute of Technology
Cambridge, Massachusetts

and

J. F. OSSANNA
Bell Telephone Laboratories, Inc.
Murray Hill, New Jersey

INTRODUCTION

There are a variety of considerations which are pertinent to the design of the interface between programs and typewriter-class remote terminal devices in a general-purpose time-sharing system. The conventions used for editing, converting, and reduction to canonical form of the stream of characters passing to and from remote terminals is the subject of this paper. The particular techniques used in the Multics* system are presented as an example of a single unified design of the entire character stream processing interface. The sections which follow contain discussion of character set considerations, character stream processing objectives, character stream reduction to canonical form, line and print position deletion, and other interface problems. An appendix gives a formal description of the canonical form for stored character strings used in Multics.

*Multics is a comprehensive general purpose time-sharing system implemented on the General Electric 645 computer system. A general description of Multics can be found in Reference 1 or 2.

CHARACTER SET CONSIDERATIONS

Although for many years computer specialists have been willing to accept whatever miscellaneous collection of characters and codes their systems were delivered with, and to invent ingenious compromises when designing the syntax of programming languages, the impact of today's computer system is felt far beyond the specialist, and computer printout is (or should be) received by many who have neither time nor patience to decode information printed with inadequate graphic versatility. Report generation has, for some time, been a routine function. Recently, on-line documentation aids, such as RUNOFF,3 Datatext (IBM Corp.) or RAES (General Electric Co.), have attracted many users. Especially for the latter examples it is essential to have a character set encompassing both upper and lower case letters. Modern programming languages can certainly benefit from availability of a variety of special characters as syntactic delimiters, the ingenuity of PL/I in using a small set notwithstanding. Probably the minimum character set acceptable today is one like the USASCII 128-character set4 or IBM's EBCDIC set, with the provision that they be fully supported by upper/lower case printer and terminal hardware.

The definition of support of a character set is almost as important as the fact of support. To be fully useful, one should be able to use the same full character set in composing program or data files, in literal character strings of a programming language, in arguments of calls to the supervisor and to library routines requiring symbolic names, as embedded character strings in program linkage information, and in input and output to typewriters, displays, printers, and cards. However, it may be necessary to place conversion packages in the path to and from some devices, since it is rare to find that all the different hardware devices attached to a system use the same character set and character codes.
CHARACTER STREAM PROCESSING CONSIDERATIONS

The treatment of character stream input and output may be degraded, from a human engineering point of view, unless it is tempered by the following two considerations:

1. If a computer system supports a variety of terminal devices (Multics, for example, supports both the IBM Model 2741,5 and the Teletype Model 37,6) then it should be possible to work with any program from any terminal.
2. It should be possible to determine from the printed page, without ambiguity, both what went into the computer program and what the program tried to print out.

To be fully effective, these two considerations must apply to all input and output to the system itself (e.g., when logging in, choosing subsystems, etc.) as well as input and output from user programs, editors, etc.

As an example of the "device independence" convention, Multics uses the USASCII character set in all internal interfaces and provides standard techniques for dealing with devices which are non-USASCII. When using the GE-645 USASCII line printer or the Teletype Model 37, there is no difficulty in accepting any USASCII graphic for input or output from any user or system program. In order to use non-USASCII hardware devices, one USASCII graphic (the left slant) is set aside as a "software escape" character. When a non-USASCII device (say the IBM 2741 typewriter with an EBCDIC print element) is to be used, one first makes a correspondence, so far as possible, between graphics available on the device and graphics of USASCII, being sure that some character of the device corresponds to the software escape character. Thus, for the IBM 2741, there are 85 obviously corresponding graphics; the EBCDIC overbar, cent sign, and apostrophe can correspond to the USASCII circumflex, left slant, and acute accent respectively, leaving the IBM 2741 unable to represent six USASCII graphic characters. For each of the six missing characters a two character sequence beginning with the software escape character is defined, as shown in Table I. The escape character itself, as well as any illegal character code value, is represented by a four character sequence, namely the escape character followed by a 3-digit octal representation of the character code. Thus, it is possible from an IBM 2741 to easily communicate all the characters in the full USASCII set.

TABLE I-Escape conventions for input and output of USASCII from an EBCDIC typewriter

    ASCII Character Name      ASCII Graphic    Escape Sequence
    Right Square Bracket      ]                ¢>
    Left Square Bracket       [                ¢<
    Right Brace               }                ¢)
    Left Brace                {                ¢(
    Tilde                     ~                ¢t
    Grave Accent              `                ¢'

A similar, though much more painful, set of escape conventions has been devised for use of the Model 33 and 35 Teletypes. The absence of upper and lower case distinction on these machines is the principal obstacle; two printed 2-character escape sequences are used to indicate that succeeding letters are to be interpreted in a specific case shift.

Note that consideration number two above, that the printed record be unambiguous, militates against character set extension conventions based on non-printing and otherwise unused control characters. Such conventions inevitably lead to difficulty in debugging, since the printed record cannot be used as a guide to the way in which the input was interpreted.
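The escape discipline above is mechanical enough to state as code. The sketch below implements it for the bracket pair only; the two-character sequences are the ones shown in Table I, while the treatment of the cent sign as the device graphic for the software escape and the exact decode behaviour are our assumptions for illustration.

    # Encode/decode USASCII text for a device lacking some graphics,
    # using two-character escapes and 3-digit octal escapes as described.
    # Assumes character codes small enough for three octal digits.
    import string

    ESCAPE = "¢"                        # device graphic mapped to left slant
    TO_DEVICE = {"]": "¢>", "[": "¢<"}  # subset of Table I, for illustration
    FROM_DEVICE = {v: k for k, v in TO_DEVICE.items()}

    def encode(text, printable):
        out = []
        for ch in text:
            if ch in TO_DEVICE:
                out.append(TO_DEVICE[ch])
            elif ch == ESCAPE or ch not in printable:
                out.append(ESCAPE + format(ord(ch), "03o"))  # escape + octal
            else:
                out.append(ch)
        return "".join(out)

    def decode(text):
        out, i = [], 0
        while i < len(text):
            if text[i] == ESCAPE:
                if text[i + 1].isdigit():                 # octal escape form
                    out.append(chr(int(text[i + 1:i + 4], 8)))
                    i += 4
                else:                                     # two-char escape
                    out.append(FROM_DEVICE[text[i:i + 2]])
                    i += 2
            else:
                out.append(text[i])
                i += 1
        return "".join(out)

    printable = set(string.printable)
    s = "a[b]" + ESCAPE
    assert decode(encode(s, printable)) == s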
The objective of typewriter device independence also has some implications for control characters. The Multics strategy here is to choose a small subset of the possible control characters, give them precise meanings, and attempt to honor those meanings on every device, by interpretation if necessary. Thus, a "new page" character appears in the subset; on a Model 37 Teletype it is interpreted by issuing a form feed and a carriage return; on an IBM 2741 it is interpreted by giving an appropriate number of new line characters.* Of the 33 possible USASCII control characters, 11 are defined in Multics as shown in Table II.

*This interpretation of the form feed function is consistent with the International Standards Organization option of interpreting the "line feed" code as "new line" including carriage return.

Red and black shift characters appear in the set because of their convenience in providing emphasis in comments, both by system and by user routines. The half-line forward and half-line reverse feed characters were included to facilitate experimentation with the Model 37 Teletype; these characters are not currently interpretable on other devices.

One interesting point is the choice of a "null" or "padding" character used to fill out strings after the last meaningful character. By convention, padding characters appearing in an output stream are to be discarded, either by hardware or software. The USASCII choice of code value zero for the null character has the interesting side effect that if an uninitialized string (or random storage area) is unintentionally added to the output stream, all of the zeros found there will be assumed nulls, and discarded, possibly leaving no effect at all on the output stream. Debugging a program in such a situation can be extraordinarily awkward, since there is no visible evidence that the code manipulating the offending string was ever encountered. In Multics, this problem was considered serious enough that the USASCII character "delete" (all bits one) was chosen as the padding character code. The zero code is considered illegal, along with all other unassigned code values, and is printed in octal whenever encountered.

As an example of a control function not appearing in the character set, the printer-on/printer-off function (to allow typing of passwords) is controlled by a special call which must be inserted before the next call to read information. This choice is dictated by the need to get back a status report which indicates that for the currently attached device, the printer cannot be turned on and off. Such a status report can be returned as an error code on a special call; there would be no convenient way to return such status if the function were controlled by a character in the output stream.**

**The initial Multics implementation temporarily uses the character codes for USASCII ACK and NAK for this purpose, as an implementation expedient. In addition, a number of additional codes are accepted to permit experimentation with special features of the Model 37 Teletype; these codes may become standard if the features they trigger appear useful enough to simulate on all devices.

CANONICAL FORM FOR STORED CHARACTER STRINGS

Probably the most significant impact of the constraint that the printed record be unambiguous is the interaction of that constraint with the carriage motion control characters of the USASCII and EBCDIC sets. Although most characters imply "type a character in the current position and move to the next one," three commonly provided characters, namely backspace, horizontal tab, and carriage return (no line feed), do cause ambiguity. For example, suppose that one chooses to implement an ALGOL language in which keywords are underlined. The keyword for may now be typed in at least a dozen different ways, all with the same printed result but all with different orders for the individual letters and backspaces.
It is unreasonable to expect a translator to accept a dozen different, but equivalent, ways of typing every control word; it is equally unreasonable to require that the typist do his underlining in a standard way, since if he slips there is no way he can tell from his printed record (or later protestations of the compiler) what he has done wrong. A similar dilemma occurs in a manuscript editing system if the user types in underlined words and later tries to edit them.

TABLE II-USASCII Control Characters as Used in Multics

    USASCII NAME   MULTICS NAME   MULTICS MEANING
    BEL            BEL            Sound an audible alarm.
    BS             BS             Backspace. Move carriage back one column. The backspace implies overstriking rather than erasure.
    HT             HT             Horizontal Tabulate. Move carriage to next horizontal tab stop. Default tab stops are assumed to be at columns 11, 21, 31, 41, etc.
    LF             NL             New Line. Move carriage to left edge of next line.
    SO             RRS            Red Ribbon Shift.
    SI             BRS            Black Ribbon Shift.
    VT             VT             Vertical Tabulate. Move carriage to next vertical tab stop. Default tab stops are assumed to be at lines 11, 21, 31, etc.
    FF             NP             New Page. Move carriage to the left edge of the top of the next page.
    DC2            HLF            Half-Line Forward Feed.
    DC4            HLR            Half-Line Reverse Feed.
    DEL            PAD            Padding Character. This character is discarded when encountered in an output line.

An answer to this dilemma is to process all character text entering the system to convert it into a canonical form. For example, on a "read" call Multics would return the string:

    _(BS)f _(BS)o _(BS)r

(where (BS) is the backspace character) as the canonical character string representation of the printed image of the underlined keyword for, independently of the way in which it had been typed.

Canonical reduction is accomplished by scanning across a completed input line, associating a carriage position with each printed graphic encountered, then sorting the graphics into order by carriage or print position. When two or more graphics are found in the same print position, they are placed in order by numerical collating sequence with backspace characters between. Thus, if two different streams of characters produce the same printed image, after canonical reduction they will be represented by the same stored string. Any program can thus easily compare two canonical strings to discover if they produce the same printed image.

No restriction is placed on the human being at his console; he is free to type a non-canonical character stream. This stream will automatically be converted to the canonical form before it reaches his program. (There is also an escape hatch for the user who wants his program to receive the raw input from his typewriter, unprocessed in any way.) Similarly, a typewriter control module is free to rework a canonical stream for output into a different form if, for example, the different form happens to print more rapidly or reliably.
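The reduction rule just stated (sort graphics by print position, then by collating sequence within a position, joined by backspaces) can be shown in a few lines. The sketch below handles only graphics, spaces and backspaces on a single line; tab handling and the full carriage-motion rules of Appendix I are omitted, and the function name is ours.

    # Minimal canonical reduction for one line: only printing graphics,
    # space (advance) and backspace (retreat) are handled.
    def canonicalize(raw):
        positions = {}                    # print position -> set of graphics
        col = 0
        for ch in raw:
            if ch == "\b":
                col = max(0, col - 1)
            elif ch == " ":
                col += 1
            else:
                positions.setdefault(col, set()).add(ch)
                col += 1
        out, prev = [], -1
        for col in sorted(positions):
            out.append(" " * (col - prev - 1))       # intervening motion
            overstruck = sorted(positions[col])      # collating sequence
            out.append("\b".join(overstruck))        # backspaces between
            prev = col
        return "".join(out)

    # Both typing orders of an underlined "f" reduce to the same string:
    assert canonicalize("f\b_") == canonicalize("_\bf")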
In order to accomplish canonical reduction, it is necessary that the typewriter control module be able to determine unambiguously what precise physical motion of the device corresponds to the character stream coming from or going to it. In particular, it must know the location of physical tab settings. This requirement places a constraint on devices with movable tab stops; when the tab stops are moved, the system must be informed of the new settings.

The apparent complexity of the Multics canonical form, which is formally described in Appendix I, is a result of its generality in dealing with all possible combinations of typewriter carriage motions. Viewed in the perspective of present day language input to computer systems, one may observe that many of the alternatives are rarely, if ever, encountered. In fact, for most input the following three statements, describing a simplified canonical form, are completely adequate:

1. A message consists of strings of character positions separated by carriage motion.
2. Carriage motions consist of New Line or Space characters.
3. Character positions consist of a single graphic or an overstruck graphic. A character position representing overstrikes contains a graphic, a backspace character, a graphic, etc., with the graphics in ascending collating sequence.

Thus we may conclude that for the most part, the canonical stream will differ little from the raw input stream from which it was derived.

A strict application of the canonical form as given in Appendix I has a side effect which has affected its use in Multics. Correct application leads to replacement of all horizontal tab characters with space characters in appropriate numbers. If one is creating a file of tabular information, it is possible that the ambiguity introduced by the horizontal tab character is in fact desirable; if a short entry at the left of a line is later expanded, words in that entry move over, but items in columns to the right of that entry should stay in their original carriage position; the horizontal tab facilitates expressing this concept. A similar comment applies to the form feed character. The initial Multics implementation allows the horizontal tab character, if typed, to sneak through the canonical reduction process and appear in a stored string. A more elegant approach to this problem is to devise a set of conventions for a text editor which allows one to type in and edit tabular columns conveniently, even though the information is stored in strictly canonical form. Since the most common way of storing a symbolic program is in tabular columns, the need for simple conventions to handle this situation cannot be ignored.

It is interesting to note that most format statement interpreters, such as those commonly implemented for FORTRAN and PL/I, fail to maintain proper column alignment when handed character strings containing embedded backspaces, such as names containing overstruck accents. For complete integration of such character strings into a system, one should expand the notion of character counts to include print position counts as well as storage position counts. For example, the value returned by a built-in string length function should be a print position count if the result is used in formatting output; it should be a storage location count if the result is used to allocate space in memory.

LINE AND PRINT POSITION DELETION CONVENTIONS

Experience has shown that even with sophisticated editor programs available, two minimal editing conventions are very useful for human input to a computer system. These two conventions give the typist these editing capabilities at the instant he is typing:

1. Ability to delete the last character or characters typed.
2. Ability to delete all of the current line typed up to that point.

(More complex editing capabilities must also be available, but they fall in the domain of editing programs which can work with lines previously typed as well as the current input stream.)

By framing these two editing conventions in the language of the canonical form, it is possible to preserve the ability to interpret unambiguously a typed line image despite the fact that editing was required.

The first editing convention is to reserve one graphic (in Multics, the "number" sign) as the erase character. When this character appears in a print position, it erases itself and the contents of the previous print position. If the erase follows simple carriage motion, the entire carriage motion is erased. Several successive erase characters will erase an equal number of preceding print positions or simple carriage motions. Since erase processing occurs after the transformation to canonical form, there is no ambiguity as to which print position is erased; the printed line image is always the guide. Whenever a print position is erased, the carriage motions (if any) on the two sides of the erased print position are combined into a single carriage motion.

The second editing convention reserves another graphic (in Multics, the "commercial at" sign) as the kill character. When this character appears in a print position, the contents of that line up to and including the kill character are discarded. Again, since the kill processing occurs after the conversion to canonical form, there can be no ambiguity about which characters have been discarded. By convention, kill is done before erase, so that it is not possible to erase a kill character.
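Because both conventions operate on the already-canonical stream, they can be written as two small passes over a list of print positions. A minimal sketch follows, using '#' and '@' as in Multics; erasure of carriage motion, and the merging of the motions around an erased position, are omitted for brevity, and the function name is ours.

    # Kill first, then erase, both over canonical print positions.
    ERASE, KILL = "#", "@"

    def edit(positions):
        # kill: drop everything up to and including the last kill character
        for i in range(len(positions) - 1, -1, -1):
            if KILL in positions[i]:
                positions = positions[i + 1:]
                break
        # erase: each '#' removes itself and the preceding print position
        out = []
        for p in positions:
            if ERASE in p:
                if out:
                    out.pop()
            else:
                out.append(p)
        return out

    print(edit(list("abc#d")))       # ['a', 'b', 'd']
    print(edit(list("junk@hello")))  # ['h', 'e', 'l', 'l', 'o']

Running kill before erase reproduces the stated rule that a kill character cannot itself be erased.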
For example, illegal characters are discarded and graphics no1i available on the output device used are typed with 625 the "overstrike" escapes of Table I, or else left as a blank space so that they may be drawn in by hand. CONCLUSIONS The preceding sections have discussed both the background considerations and the design of the l\1ultics remote terminal character stream interface. Several years of experience in using this interface, first in a special· editor on the 7094 Compatible Time-Sharing System and more recently as the standard system interface for lVlultics, have indicated that the design is implementable, usable and effective. Probably the most important aspect of the design is that the casual user, who has not yet encountered a problem for which canonical reduction, or character set escapes, or special character definitions are needed, does not need to concern himself with these ideas; yet as he expands his programming objectives to the point where he encounters one of these needs, he finds that a method has been latently available all along in the standard system interface. There should be no assumption that the particular set of conventions described here is the only useful set. At the very least, there are issues of taste and opinion which have influenced the design. l\10re importantly, systems with only slightly different objectives may be able to utilize substantially different approaches to handling character streams. ACKNOWLEDGlVIENTS l\1any of the techniques described here were developed over a several year time span by the builders of the 7094 Compatible Time-Sharing System (CTSS) at l\1IT Project l\1AC, and by the implementers of l\1ultics, a cooperative research project of the General Electric Company, the Bell Telephone Laboratories, Inc., and the l\1assachusetts Institute of Technology. The usefulness of a canonical form for stored character strings was independently brought to our attention by E. Van Horne and C. Strachey; they had each implemented simple canonical forms on CTSS and in the TITAN operating system for the ATLAS computer, respectively. F. J. Corbat6 and R. l\10rris developed the pattern of escape sequence usage described here. Others contributing to the understanding of the issues involved in the character stream interface were R. C. Daley, S. D. Dunten, and M. D. l\1cIlroy. Work reported here was supported in part by the advanced Research Projects Agency, Department of *The "raw" m~de is not yet implemented. 626 Spring Joint Computer Conference, 1970 Defense, under Office of Naval Research Contract Nonr-4102(01). Reproduction is permitted for any purpose of the United States Government. 
REFERENCES

1 F J CORBATÓ et al A new remote-accessed man-machine system AFIPS Conference Proceedings 27 1965 FJCC Spartan Books Washington D C 1965 pp 185-247
2 The multiplexed information and computing service: Programmer's manual MIT Project MAC Cambridge Massachusetts 1969 To be published
3 J H SALTZER Manuscript typing and editing In The Compatible Time-Sharing System: A Programmer's Guide 2nd Edition MIT Press Cambridge Massachusetts 1965
4 USA standard code for information interchange X3.4-1968 USA Standards Institute October 1968
5 IBM 2741 communications terminal IBM Systems Reference Library Form A24-3415 IBM Corporation New York
6 Model 37 teletypewriter stations for DATA-PHONE service Bell System Data Communications Technical Reference American Telephone and Telegraph Company New York September 1968
7 PL/I language specifications IBM System Reference Library Form C28-6571 IBM Corporation New York

APPENDIX I

The Multics canonical form

To describe the Multics canonical form, we give a set of definitions of a canonical message. Each definition is followed by a discussion of its implications. PL/I-style formal definitions are included for the benefit of readers who find them useful.7 Other readers may safely ignore them at a small cost in precision. In the formal definitions, capitalized abbreviations stand for the control characters in Table II.

1. The canonical form deals with messages. A message consists of a sequence of print positions, possibly separated by, beginning, or ending with carriage motion.

    message ::= [carriage motion] [[print position]... [carriage motion]]...

Typewriter input is usually delimited by action characters, that is, some character which, upon receipt by the system, indicates that the typist is satisfied with the previous string of typing. Most commonly, the new line character, or some variant, is used for this function. Receipt of the action character initiates canonical reduction.

The most important property of the canonical form is that graphics are in the order that they appear on the printed page, reading from left to right and top to bottom. Between the graphic characters appear only the carriage motion characters which are necessary to move the carriage from one graphic to the next. Overstruck graphics are stored in a standard form including a backspace character (see below).

2. There are two mutually exclusive types of carriage motion, gross motion and simple motion.

    carriage motion ::= {gross motion | simple motion | gross motion simple motion}

Carriage motion generally appears between two graphics; the amount of motion represented depends only on the relative position of the two graphics on the page. Simple motion separates characters within a printed line; it includes positioning, for example, for superscripts and subscripts. Gross motion separates lines.

3. Gross motion consists of any number of successive New Line (NL) characters.

    gross motion ::= {NL}...

The system must translate vertical tabs and form feeds into new line characters on input.

4. Simple motion consists of any number of Space characters (SP) followed by some number (possibly zero) of vertical half-line forward (HLF) or reverse (HLR) characters. The number of vertical half-line feed characters is exactly the number needed to move the carriage from the lowest character of the preceding print position to the highest character of the next print position.
    simple motion ::= {SP}... [[HLF]... | [HLR]...]

The basis for the amount of simple carriage motion represented is always the horizontal and vertical distance between successive graphics that appears on the actual device. In the translation to and from the canonical form, the system must of course take into account the actual (possibly variable) horizontal tab stops on the physical device.

In some systems, a "relative horizontal tab" character is defined. Some character code (for example, USASCII DC1) is reserved for this meaning, and by convention the immediately following character storage position contains a count which is interpreted as the size of the horizontal white space to be left. Such a character fits smoothly into the canonical form described here in place of the successive spaces implied by the definition above. It also minimizes the space requirement of a canonical string. It does require some language features, or subroutines, to extract the count as an integer, to determine its size. It also means that character comparison is harder to implement; equality of a character with one found in a string may mean either that the hoped-for character has been found or it may mean that a relative tab count happens to have the same bit pattern as the desired character; reference to the previous character in the string is required to distinguish the two cases.

5. A print position consists of some non-zero number of character positions, occupying different half-line vertical positions in the same horizontal carriage position. All but the last character position of a print position are followed by a backspace character and some number of HLF characters.

    print position ::= character position [BS [HLF]... character position]...

Note that all possible uses of a backspace character in a raw input stream have been covered by statements about horizontal carriage movements and overstruck graphics.

6. A character position consists of a sequence of graphic formers separated by backspace characters. The graphic formers are ordered according to the USASCII coded numeric value of the graphics they contain. (The first graphic former contains the graphic with the smallest code, etc.) Two graphic formers containing the same graphic will never appear in the same character position.

    character position ::= graphic former [BS graphic former]...

7. A graphic former is a possibly zero-length setup sequence of graphic controls followed by one of the 94 USASCII non-blank graphic characters.

    graphic former ::= [setup sequence] {one of the 94 USASCII graphic characters}

8. A graphic setup sequence is a color shift or a bell (BEL) or a color shift followed by a bell. The color shift only appears when the following graphic is to be a different color from the preceding one in the message.

    setup sequence ::= {{RRS | BRS} [BEL] | BEL}

In the absence of a color shift, the first graphic in a message is printed in black shift. Other control characters are treated similarly to bell. They appear immediately before the next graphic typed, in the order typed. By virtue of the above definitions, the control characters HT, VT, and CR will never appear in a canonical stream.

A study of heuristic learning methods for optimization tasks requiring a sequence of decisions*

by L. ROWELL HUESMANN
Yale University
New Haven, Connecticut

INTRODUCTION

Learning is a broad term covering many different phenomena.
It is convenient to segment learning into three different problems in induction: the collection and use of stochastic information on past performance in order to improve performance, the determination of which variables are relevant to the decisions being made, and the derivation of performance rules in the predicate calculus from the collected data. This study concentrates on the first problem.

*This paper is based on a Ph.D. thesis completed at Carnegie-Mellon University, Pittsburgh, Pa. The project was supported in part by United States Public Health Service grants MH-07722 and MH-30,606-01A1. The author is indebted to Mr. Herbert A. Simon for his advice and assistance.

THE ISSUES FOR INVESTIGATION

(a) Can a digital computer program significantly improve its performance on an optimization task of real-world complexity (and generalize that improvement to other problems of the same type) solely through ordinal feedback from intercomparisons of the solutions it has produced?

Most of the previous work in machine learning dealt with pattern recognition or game playing tasks. Yet these tasks have specific characteristics that differentiate their requirements for a learning mechanism from other tasks' requirements. Both are essentially win-loss or right-wrong tasks. In addition, in pattern recognition, feedback about the success of a decision is usually immediate. Yet many tasks have other than binary outcomes; that is, they are optimization tasks, or problems in finding the "best" solution, according to some objective criterion, from a set of feasible solutions. Usually, the problem solver does not even know how well he can do. Consumer decisions, social decisions, and business decisions are often problems of this type. With many optimization tasks one can obtain interval information about the relative worth of two solutions; however, for others only an ordinal scale of solutions can be found. More important, it is often an order of magnitude easier for a program to decide whether one solution is better than another than for it to decide how much better. Hence, it is desirable to find a mechanism that can improve a program's performance solely from ordinal feedback.

(b) Can significant improvement occur if the task environment is characterized for the program by a vector of relevant stimulus variables (a state vector)?

Another characteristic of much of the previous work in machine learning is that most learning mechanisms have combined the stimulus variables in linear polynomials and selected a response on the basis of the various polynomials' values. Many of these schemes are called stimulus voting procedures because each stimulus votes separately for a response. The limitations of such linear machines are well known and have been analyzed in detail.1,2 What is particularly disappointing is the simplicity of some patterns that cannot be handled by linear machines. For example, consider the association pattern in Table 1. When the values of the two features are the same, response R1 is required; otherwise, R2 is required. Let us now show that linear discriminant functions cannot be used to make this classification.

Theorem: Linear discriminant functions do not exist for some very simple classifications of features. In particular, none exist for the classification shown in Table 1.

Proof: The theorem will be proved by assuming the linear discriminant functions do exist and finding a
Proof: The theorem will be proved by assuming the linear discriminant functions do exist and finding a contradiction.

Let E1 be the linear discriminant function for R1:

E1 = C11 F1 + C12 F2

Let E2 be the linear discriminant function for R2:

E2 = C21 F1 + C22 F2

If linear discriminant functions exist that can make this discrimination, then

For (F1 = 1, F2 = 1) and (F1 = 2, F2 = 2),
E1 - E2 = C11 F1 + C12 F2 - C21 F1 - C22 F2 > 0   (1.1)

For (F1 = 1, F2 = 2) and (F1 = 2, F2 = 1),
E2 - E1 = C21 F1 + C22 F2 - C11 F1 - C12 F2 > 0   (1.2)

Substituting the values of the features gives, from (1.2),

C21 + 2C22 - C11 - 2C12 > 0   (1.3)
2C21 + C22 - 2C11 - C12 > 0   (1.4)

and from (1.1),

C11 + C12 - C21 - C22 > 0   (1.5)
C22 + C21 - C11 - C12 < 0   (1.6)

But adding (1.3) and (1.4) gives

3C21 + 3C22 - 3C11 - 3C12 > 0
C22 + C21 - C11 - C12 > 0   (1.7)

Equation (1.7) contradicts (1.6). Since the conclusion of a correct line of reasoning has been a contradiction, the assumption that a linear discriminant function exists must be false, and the theorem is proved.

TABLE I-A Simple Discrimination That Is Not Realizable with Linear Discriminant Functions

VALUES OF FEATURES                    DESIRED RESPONSES
FEATURE 1 (F1)   FEATURE 2 (F2)       RESPONSE 1 (R1)   RESPONSE 2 (R2)
1                1                    x
2                2                    x
1                2                                      x
2                1                                      x

A mechanism that associated states of the environment with strategies or responses could learn such discriminations. Unfortunately, a state-vector description requires a great deal more computer storage. For example, for R stimulus variables and N values per variable a parsimonious representation of the stimulus state requires on the order of N^R storage cells. On the other hand, one needs only N*R cells to represent the status of each stimulus variable independently of the other variables, and only R cells to represent the stimulus situation as the value of a linear polynomial. (For instance, with R = 3 variables of N = 4 values each, that is 64 cells as against 12.) However, psychological evidence indicates that humans seldom attend to more than a few environmental features at a time,3 so a state vector of low dimensionality might be a reasonable representation for a learning program. This is the representation we adopted.

The learning problem

We view the learning problem as one of associating states of the environment, defined by some set of stimulus variables to which the problem solver is attending, with strategies for performance. The strength of such associations can be represented by the entries in a table of connections, or matrix, whose rows represent stimulus states and whose columns represent strategies. We want to see if significant learning can be accomplished on a very complex optimization task if the stimulus environment is represented by a state vector of some of the most obvious relevant stimulus variables and only ordinal feedback is used.

THE TASK TO BE LEARNED

To avoid spending a majority of the programming effort on a performance program for solving a very general class of optimization tasks, it was decided to restrict the study to one specific task, the project scheduling task. A sample problem for this task is shown in Figure 1. The objective is to complete all the jobs in as short a time as possible by executing them in parallel. It is a difficult real-world task faced by management scientists, but it can be shown to be very similar to other optimization tasks requiring a sequential set of decisions, e.g., finding the minimum number of moves to checkmate, or the Traveling Salesman Problem.
A task very similar to the project scheduling task was used by Fisher and Thompson4 in a study that suggested the learning technique we have used. One can view optimization tasks that require a sequence of decisions as problems in finding the shortest or longest path through a decision tree. A feasible solution is any path from the root of the tree to a terminal or goal node. The branches descending from a node represent possible decisions and the nodes represent the status of the "system" after a decision is made.

THE LEARNING TECHNIQUE

Given a state-vector representation of the task environment and a set of performance strategies, the learning mechanism must create a good (and generalizable) table of connections between stimulus states and strategies. An informal "hill climbing" procedure will be used to construct the table. Viewing learning as constructing a table of connections is not a new idea.5 However, unlike almost all previous learning programs, this one will have no way to make an absolute judgment about the utility of a solution. Since the problems to be attacked are optimization problems themselves, the learning program cannot determine when it has achieved the best solution. How will feedback be obtained? The best previous solution will be designated as a bench mark solution and new solutions will be compared to it. If the new solution is better, the comparison is positive; if it is worse, the comparison is negative. A fairly sophisticated comparison procedure was developed to make comparisons feasible as frequently as possible during construction of a solution. Hence, one comparison corresponds to what is normally called one trial in the learning literature, and one trial on a problem includes a whole series of comparisons. One can show that this technique can be applied (with a few restrictions) to almost any optimization task requiring sequential decisions.

THE DESIGN OF THE PROGRAM

This program, like most learning programs, should be viewed as two closely interacting routines: a performance routine and a learning routine. The routines were written in IPL-5. Let us first discuss the performance routine.

The performance routine

Figure 1-A sample project scheduling problem. The leftmost jobs can be executed initially. The lines indicate the prerequisites for the other jobs: no job can be executed until all the jobs connected to it from the left are completed. The dotted line is the critical path for the execution times given below. This problem has only one resource need limiting how many jobs can be scheduled in parallel.

Job    Units of Time Needed    Units of Resource Occupied
1      4                       5
2      3                       5
3      2                       1
4      3                       3
5      4                       6
6      5                       4
7      3                       4
8      3                       2
9      3                       4
10     3                       6
11     6                       6
12     8                       4
13     7                       4
14     1                       6
15     1                       10

This routine is designed to find the shortest path through the tree of feasible solutions, i.e., feasible job schedules. Each level in the tree corresponds to a different time; each node in the tree specifies what jobs are completed, what jobs are currently being executed, and what jobs remain to be scheduled; each branch indicates the scheduling of a particular set of jobs. Hence, two geometrically different nodes may have the same meaning with different histories. Every path eventually leads to a node specifying that all jobs are complete. The objective of the performance program is to find a path through this tree that ends at the highest level terminal node (minimum time path).
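The tree formulation can be made concrete with a small sketch. The code below is ours, not the study's IPL-5, and the limit of 10 resource units is an assumption made only for illustration: given the jobs that are ready at a node and the free resource units, it generates the sets of jobs that could be started, one set per branch descending from the node.

from itertools import combinations

def branches(ready_jobs, free_units, resource_need):
    """Yield each set of ready jobs whose combined resource demand fits
    within the free units -- one set per branch descending from the node."""
    for r in range(len(ready_jobs), 0, -1):
        for subset in combinations(ready_jobs, r):
            if sum(resource_need[j] for j in subset) <= free_units:
                yield subset

# Jobs 1-4 of Figure 1 (resource demands 5, 5, 1, 3), taken as the ready
# set for illustration, with an assumed limit of 10 free resource units.
resource_need = {1: 5, 2: 5, 3: 1, 4: 3}
for b in branches([1, 2, 3, 4], 10, resource_need):
    print(b)

(The performance routine described next prunes such branches further by always scheduling as many jobs as the resource constraints permit.)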
The performance program uses a "depth first" approach to search. It looks ahead along a path through the tree until it detects a node where the path can be evaluated. Of course, there will be no evaluations during the production of the initial solution, since there is no solution for comparison. At each node encountered in the look ahead process, the program must decide what branch to follow next. This is equivalent to choosing the jobs that should be scheduled at that time. When a node is reached that can be evaluated, the learning program is called in to compare the current path with the bench mark solution. The comparison is either indeterminate, positive (the new path is more desirable), or negative (the present solution is more desirable). If the comparison is indeterminate or positive, the performance routine looks ahead deeper along the current path. In addition, when the evaluation is positive, the current path is merged with the bench mark solution to form a new bench mark solution. On the other hand, if the evaluation is negative, the performance routine abandons search along the current path. It may either return to the top of the solution tree and investigate a new path or look ahead deeper from the corresponding but preferred node on the present bench mark solution's path. By "corresponding node" we mean the node on the solution path that was used in the evaluation.

The tables of connections

In carrying out this procedure the performance program has to make two types of decisions. As mentioned above, following a negative comparison, the program must decide whether to back up to the start or continue from a point on the present bench mark solution; the program must also decide what branch to follow (what jobs to schedule) at each node encountered in the look ahead process. While the former of these decisions requires a general problem-solving strategy, the latter decision requires a task-specific strategy. To select these strategies, the performance routine employs two tables of connections. One table links a state vector composed of characteristics of the current search situation to general strategies, in this case strategies specifying what to do after a negative comparison. The other table connects a state vector of task-relevant variables to a set of task-specific strategies, in this case job selection strategies. Both of these tables are represented by tree structures in the computer's memory. A numerical value associated with Strategy i at State-node j will provide a measure of the past success of that strategy in State j relative to the success of other strategies in that state.

Selecting a strategy

The information in a state node could be used in any of several ways to select a strategy. For example, if one wishes to select a good strategy, one might choose the strategy whose success value is the greatest of all the values at the node, or one might select a strategy probabilistically in proportion to the success values. On the basis of several pilot runs it was decided that the performance program should construct the initial solution on each run by selecting the strategies with the highest success values (ties are broken randomly). During the rest of the run the performance routine would select strategies probabilistically.
Specifically, the probability of choosing Strategy Si, whose success value in the current state is Vi, from n strategies whose values in the current state are V1 ... Vn is given by:

P(Si) = Vi / (V1 + V2 + ... + Vn)

Built-in heuristics

The performance routine was not intended to begin as a completely naive problem solver. Certain general and task-specific heuristics were built into the routine while other heuristics were introduced via the initial entries in the general-strategy table of connections. These heuristics were ones that most human problem solvers would have learned before ever attempting a problem of this type or ones that would be suggested by a cursory glance at the literature on the task.

Foremost among the general heuristics built into the program is sub-goal evaluation. During the look-ahead process, the performance program asks the learning program to evaluate virtually every potential sub-goal, that is, every node on the current path that is on the same level as a node of the solution path. (In look-ahead searches on a typical problem twenty-eight sub-goals were evaluated for every evaluation of a complete path through the tree of feasible solutions.)

A second built-in general heuristic is the program's "next event" approach to search. During the look-ahead process many nodes are encountered where no decisions need to be made. For example, no jobs can be scheduled at a node unless a job terminates there. Hence, the performance program jumps from node to node ignoring intervening nodes where no decisions need to be made. This heuristic speeds performance greatly, but it is dependent upon another heuristic, a task-specific one, included in the performance program. The performance routine always schedules as many jobs as resource constraints permit; so no new job can be scheduled until a job terminates and frees some resources. Such a heuristic is not without its drawbacks. There are a few situations where it prevents the program from searching a slightly superior branch. However, it is a heuristic with strong intuitive appeal, one that reduces the number of branches in the solution tree considerably, and one that permits implementation of the next event search process, reducing the number of nodes to be analyzed during look-ahead.

Three heuristics dealing with search behavior were introduced through the initial values of the success terms in the general-strategy table of connections. These heuristics deal with what the program should do following the discovery that a branch presently being searched can only be inferior to the solution (negative comparison). The probability of "backing up" to the start was made an inverse function of the depth that search had progressed into the solution tree and a direct function of the number of dead-end branches encountered (negative comparisons and nonpositive comparisons of complete paths). Thirdly, whichever "back up" strategy is selected, it should be tried several times consecutively before being abandoned.

Storing a solution

The performance program remembers only the present solution path and the path currently being searched. Both are stored as lists of scheduled jobs separated by time markers. The jobs preceding the nth time marker are the jobs being executed at time n.
Associated with the list representing the path currently being searched is a list of the values in the tables of connections that have been selected during the search, that is, the values corresponding to the state-strategy pairings used to produce the path. Whenever a strategy is used, its value in the current state is added to this list. Hence, all the cells that the learning routine will modify are contained on this list. A gross flow chart of the performance procedure is presented in Figure 2.

Figure 2-A flow chart of the performance routine.

Figure 3-A flow chart showing the steps the learning routine takes when called upon to compare part of a new solution with the bench mark solution. (Recovered box labels from Figures 2 and 3: construct the initial bench mark solution by selecting the "best" strategy in each stimulus state; begin a new path from the beginning of the problem; begin a new path from the preferred node on the bench mark; increment the state-strategy pairs that were used; combine the new path with the bench mark to form a new bench mark.)

The learning routine

The learning program evaluates paths through the tree of feasible solutions by comparing them with the bench mark solution (the best solution so far), and it alters the tables of connections on the basis of these evaluations.

Comparing solutions

How does the program determine which of two paths is preferred? Path Z1 up to node x is preferred to path Z2 up to node y if nodes x and y are at the same level in the tree of feasible solutions and if node x "dominates" node y. A node, one should remember, specifies the set of jobs currently completed and the set of jobs currently being executed. To say node x on Z1 dominates node y on Z2 means that (a) all jobs scheduled on Z1 prior to x are completed by x, (b) all jobs on Z2 that are completed by y or being executed at y are completed by x on Z1 (from (a), any job on Z1 prior to x is completed by x), and (c) there is at least one job completed on Z1 prior to x that is not on Z2 prior to y. From this definition, if any node x on Z1 dominates its corresponding node y on Z2 (the node at the same level), then combining Z1 prior to x with Z2 after y produces a new path at least as short as Z2 and in most cases shorter. Hence, Z1 prior to x is preferred to Z2 prior to y. One should be careful to clearly understand these statements as they are essential to the learning method. They form a task-specific algorithm for judging partial schedules.

To verify that the new path will indeed be no longer, one simply recognizes that Z2 after y can always be added onto Z1 before x without any changes, since all jobs on Z1 are completed at x. Furthermore, at least one job on Z2 after y has already been executed and can be deleted. If this deletion (or deletions) shortens Z2 after y, the new path will be shorter.

TABLE II-These sample schedules illustrate the construction of a new and superior bench mark solution (Z4) out of the old bench mark (Z2) and a new partial solution (Z1). (The table lists, for times 1 through 8, the jobs scheduled under Z1, Z2, and Z3.)

LIST REPRESENTATION OF Z1:
Z1: 0, job 1, job 2, time mark, job 3, job 4, time mark, job 3, job 4, time mark, job 4, job 5, time mark, 0

Consider as an example the schedules Z1 and Z2 in Table 2. At time 4 the node on Z1 dominates the corresponding node on Z2. As a result Z1 and Z2 can be combined into the new schedule Z3.
By deleting "job 5", which had already been completed on Z1, and moving the other jobs up in time, a shorter schedule Z4 was then produced.

The large majority of evaluations turn out to be indeterminate. For example, during training on a typical problem about 94% were indeterminate. When the comparison is negative (the present bench mark solution is preferred), the tables of connections are not altered, and control is returned to the performance routine, which decides whether to look ahead from the corresponding node of the bench mark or back up to the top of the tree. When the comparison is positive (the current path is preferred), the learning program alters the tables of connections and constructs a new bench mark solution.

Altering the memory structures

Altering the tables of connections is fairly trivial. Remember that during look ahead a list is maintained of all the success terms associated with the selected state-strategy pairings. This requires very little storage, only one cell for each decision made since the last positive or negative comparison. To positively reinforce the state-strategy pairings participating in the construction of a better solution, each element of this list is simply incremented. On the basis of pilot studies we selected an increment of 3 over smaller values. Larger values might produce more rapid, but also less stable, learning. Obviously, an entire study could be devoted to finding the optimal value for this increment. With an increment of 3 the probability of selecting each strategy is altered as follows. Let

Pt(Si/R) be the probability at time t of selecting strategy Si in state R,
Vj be the success value associated with the jth strategy in state R at time t, and
n be the total number of strategies.

Then, as mentioned earlier,

Pt(Si/R) = Vi / (V1 + V2 + ... + Vn)

and if strategy Si is reinforced in state R at time t,

Pt+1(Si/R) = (Vi + 3) / (V1 + V2 + ... + Vn + 3)

while for every other strategy Sk (k not equal to i),

Pt+1(Sk/R) = Vk / (V1 + V2 + ... + Vn + 3).

These changes will be called positive reinforcement.

Consistent decision making during learning

This completes the description of the learning routine. One very important addition was made to the learning scheme as a result of some early failures in the pilot studies.

Principle: While exploring a path through the tree of feasible solutions, a performance program used with a learning routine should employ the same strategy every time the same state occurs (make the same decision in the same situation) until the path has been successfully evaluated (positively or negatively).

When this principle is not adhered to, credit assignment becomes almost impossible. Conceivably, all the strategies could be used in the same state before an evaluation occurred. In this case the bad strategies may mask the good strategies, and one has no way to distinguish between them. Hence, it is not sufficient to "select a strategy in proportion to its past successes." One must first check to see if a strategy has already been paired with the current state, and, if so, use that strategy.

SELECTING STRATEGIES AND FEATURES

One might well argue that the major portion of this program's work is done by the programmer when he selects the stimulus features for attention and the potential strategies for use. Yet this is exactly what happens to the human beginner. He generally derives his first ideas about strategies and features from a teacher, a book, or his experience with other similar tasks.
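The machinery just described, proportional selection, the consistency principle, and the increment of 3, can be condensed into a short sketch. The code below is our reconstruction, not the original IPL-5 routines, and the class and method names are hypothetical; the strategy labels S21 to S25 anticipate the five task-specific strategies summarized later in Table 4.

import random

class ConnectionTable:
    def __init__(self, n_states, strategies, initial_value=10):
        # All success values start equal; 10 was the value used in the study.
        self.values = {s: dict.fromkeys(strategies, initial_value)
                       for s in range(n_states)}
        self.pending = []      # state-strategy pairs used since the last evaluation
        self.commitments = {}  # consistency: same state -> same strategy until evaluated

    def select(self, state):
        if state in self.commitments:
            choice = self.commitments[state]
        else:
            row = self.values[state]
            # P(Si) = Vi / (V1 + ... + Vn): selection in proportion to success values.
            choice = random.choices(list(row), weights=list(row.values()))[0]
            self.commitments[state] = choice
        self.pending.append((state, choice))
        return choice

    def evaluate(self, positive, increment=3):
        # After a determinate comparison: a positive one increments every pair
        # on the list; either way the consistency commitments are released.
        if positive:
            for state, strategy in self.pending:
                self.values[state][strategy] += increment
        self.pending.clear()
        self.commitments.clear()

table = ConnectionTable(36, ["S21", "S22", "S23", "S24", "S25"])
table.select(0)
table.evaluate(positive=True)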
The features and strategies that we selected for use were simple ones that would occur to anyone who made a cursory glance at the literature on scheduling problems. Within the program the features and strategies were represented as lists of components in such a way that new strategies or features could be synthesized. Later we will see how this learning mechanism could employ its feedback to eliminate poor strategies or features and introduce new ones.

Five task-specific strategies and three features of 3, 3, and 4 values were used initially. Hence, there were 3*4*3 or 36 state nodes in the task-specific table of connections. Each state, of course, really represented a broad class of stimulus situations. With five strategies per node the total storage requirement of the task-specific table of connections was only 463 IPL-5 cells or 926 32-bit words. All the success values in this table were initially set at 10. Other smaller values were tried during the pilot studies and found to change the performance routine's behavior too radically in early training.

The general-strategy table of connections used in these experiments was employed only to choose between two search strategies. The "previous-strategy" feature thus had two values while the other two features had three value classes. Hence, there were 18 state nodes requiring 133 IPL-5 cells or 266 32-bit words. This means that the two tables' total storage requirement was 1,192 computer words. The initial success values in the general-strategy table were assigned to implement certain heuristics as discussed in the section on the performance routine.

Figure 4.1-The solutions produced during training on Problem 1 with positive reinforcement. The discontinuities are points where all previous solutions were erased from the program's memory, and it began again on the problem from scratch. (Axes: solution length versus computer time in 100-second units, with the base solution marked.)

EXPERIMENT I

To answer questions (a) and (b) we tested this program's learning ability on three project scheduling problems. The dependent variable on which a learning mechanism should be evaluated is improvement in performance, not the quality of performance. We want to demonstrate that the proposed learning mechanism, using only ordinal feedback, can learn what strategy to apply in what state so that the performance program performs significantly better on the training problem and on other problems of the same type.

Fifteen project scheduling problems (unbiased in any obvious manner) were generated randomly by the computer to find three that satisfied hardware and complexity constraints (most were too simple). We will call these Problems 1, 2, and 3. The program was trained on Problem 1, trained more on Problem 2, and tested for ten minutes on Problem 3. Then we retrained the program from scratch on Problem 3 and tested it on Problem 1. No negative reinforcement was applied in this experiment. If the bench mark solution was definitely superior, the new path was abandoned and the program selected a general strategy telling it what to do next.

Figure 4.2-The solutions produced during additional training on Problem 2 with positive reinforcement.
Results

The training significantly improved the performance program after only a moderate number of positive reinforcements. The improvement generalized to problems other than the training problem. Figures 4.1 to 4.3 and 6.1 to 6.2 contain learning curves showing the improvement. The base solution to a problem is the average solution produced by random strategy selection. The improvement can be measured quantitatively by the ten-minute solution rate. The rates are shown in Table 3. A Mann-Whitney U test confirmed a highly significant difference (p < .0001) in rates on training and test trials.

Each segment of the learning curves in Figures 4.1 and 6.1 represents the performance from creation of an initial bench mark by using the highest valued state-strategy pairings until the program has not improved the bench mark in a specified time period. The bench mark is erased before a new segment starts. These segments are called training trials, but within any one of them there are many comparisons of solutions which may result in reinforcements. About 94% of all comparisons were indeterminate, i.e., neither the bench mark nor the current solution was preferred. On the average 9 different state-strategy pairs were evaluated in a determinate comparison. One inevitable characteristic of an ordinal feedback system is that as learning progresses within a trial, positive comparisons become less frequent, and negative comparisons become more frequent. Altogether, during the three training trials on Problem 1, there were 449 positive reinforcements of state-strategy pairs, while, during the training on Problem 3, there were 142 positive reinforcements of state-strategy pairs.

Figure 4.3-The solutions produced during test trials on Problems 1 and 3.

Figure 5.1-The probabilities of the program selecting each of the five strategies as a function of the number of comparisons during learning. The data are for one particular task situation (state) corresponding to average time requirements, average criticality, and the beginning of a solution. The data are from the first training trial on Problem 1 in Experiment I. The strategies are described in Table 4.

Figure 5.2-The number of different state-strategy pairings used as a function of the number of comparisons of partial solutions during learning. The data are from the first training trial on Problem 1 in Experiment I.

Figure 6.1-The solutions produced during training on Problem 3 with positive reinforcement.

The change in one individual row in the table of connections is displayed in Figure 5.1. One can see that an equilibrium was reached early in the trial. The changes in the table of connections can also be measured in terms of the entropy of the table
of connections. The entropy of the table trained on Problem 1 was reduced from 83.6 bits to 71.7 bits during training. This was 93% of the maximum possible reduction for the number of reinforcements. The final table of connections for training on Problem 1 is summarized in Table 4.

On the basis of relatively brief exposure to one optimization problem the performance program's table of connections was changed so that the program produced good solutions significantly more rapidly. This learning generalized to two other problems of the same type. Training from a naive state on these problems in turn improved performance on the original problem. Hence, significant learning is possible on a very complex optimization task with ordinal feedback and a state-vector representation of a reduced task environment.

Figure 6.2-The solutions produced during test trials on Problems 1 and 2.

TABLE III-Solution Rates During Experiment I (Positive Reinforcement)

First training series (Run 1)
                                     Trial   Final Rate   Ten Minute Rate
Training on Problem 1:                 1        0.9            1.7
                                       2        1.2            1.3
                                       3        2.9            4.3
Additional training on Problem 2:      1        2.4            2.4
                                       2        2.9            2.9
Test on Problem 1:                     1                       5.0
Test on Problem 3:                     1                       5.1

Second training series (Run 2)
                                     Trial   Final Rate   Ten Minute Rate
Training on Problem 3:                 1        2.9            1.7
                                       2        2.2            1.7
                                       3        1.5            3.1
Test on Problem 1:                     1                       4.3
Test on Problem 3:                     1                       4.0

TABLE IV-A Summary of the Table of Connections in Experiment I (Both the mean success values and the probabilities of selection are given in each cell.)

FEATURE                   VALUE            S21         S22         S23         S24         S25         SUM
DEPTH IN SOLUTION         BEGINNING        11.8 .133   12.8 .145   13.0 .147   17.8 .201   33.0 .373   88.4
                          MIDDLE           10.5 .108   21.5 .222   32.3 .333   17.0 .175   15.8 .162   97.1
                          END              16.8 .218   11.3 .147   23.8 .309   10.5 .137   14.5 .189   76.9
TIME NEEDS OF             BELOW AVERAGE    12.8 .221   10.0 .173   10.0 .173   10.0 .173   15.0 .257   57.8
EXECUTABLE JOBS           ABOUT AVERAGE    16.3 .117   23.0 .165   43.3 .311   21.3 .153   35.3 .254   139.2
                          ABOVE AVERAGE    10.0 .154   12.5 .193   15.8 .244   13.5 .208   13.0 .199   64.8
VARIANCE OF CRITICALITY   1                16.7 .192   18.7 .215   21.0 .242   11.7 .135   18.7 .215   86.8
                          BELOW AVERAGE    12.3 .138   16.7 .188   27.7 .311   19.7 .221   12.6 .142   89.0
                          ABOUT AVERAGE    11.0 .120   14.0 .153   15.0 .164   11.0 .120   40.3 .441   91.3
                          ABOVE AVERAGE    12.0 .146   11.3 .137   28.3 .344   18.0 .219   12.7 .154   82.3
MEANS                                      13.0 .149   15.2 .173   23.0 .263   15.1 .173   21.1 .242

Strategies:
S21: Schedule job with minimum resource demands
S22: Schedule job with maximum time demand
S23: Schedule job with maximum criticality
S24: Schedule job whose time demand is closest to the remaining time for a scheduled job
S25: Schedule job with maximum resource demands

EXPERIMENT II

(c) Does improvement occur more rapidly if the program changes its structure following its failures to improve its performance as well as after its successes?

Having demonstrated that a learning mechanism based on ordinal feedback and using a state-vector representation will work, we can turn to the central issue in this study: should negative reinforcement (or error correction training) be used in a learning mechanism for optimization tasks? The large majority of trainable pattern classifiers and game playing programs have used error correction training alone or in conjunction with positive reinforcement. This is somewhat surprising since the weight of evidence from psychology seems to indicate that positive reinforcement plays the most important role in learning while negative reinforcement may speed learning slightly by eliminating incorrect responses or may not help at all. Furthermore, we assert that error correction training is useful only if the learning program receives feedback data on an interval or ratio scale; feedback on an ordinal scale, as one receives in optimization tasks, while sufficient for positive reinforcement, is not sufficient for negative reinforcement (error correction training). In fact, error correction training or negative reinforcement should adversely affect the learning ability of a program trying to learn an optimization task.

Before showing why error correction training should hamper this type of learning, we need to review three key features of our ordinal learning program. (a) The program possesses a preference routine that enables it to compare parts of new solutions with a bench mark solution. (b) The program implements positive (or negative) reinforcement by incrementing (or decrementing) those cells in the table of connections (by Cpos or Cneg) that contributed to the new solution. (c) The program uses strategy j in state i with a probability equal to Vij / (Vi1 + Vi2 + ... + Vin), where Vxy is the value of the cell corresponding to state x and strategy y; the summation is over all strategies in state i.

Now we can state the theorem leading to our conclusion that error correction training will fail for any sequential optimization task representable as finding the optimal path through a tree of feasible solutions.

Theorem: For any ordinal-feedback learning procedure possessing characteristics a, b, and c, error correction training will decrease the probability of selecting the "best" strategy or response in each stimulus situation as soon as

(1) the "best" strategy is being used in over 50% of the situations encountered, and
(2) the probability that the bench mark will be preferred to a new solution is greater than Cpos/(Cpos + Cneg).

In less formal terms, when the bench mark solution and the table of connections have both become pretty good, negative reinforcement will begin to make the table of connections worse. Though this theorem will be proved for a program with characteristics a, b, and c, the reader should realize that the theorem (in slightly different form) will hold for viable alternatives to characteristics b and c. The central problem is that ordinal feedback becomes unreliable as the bench mark improves.

Proof: Let

Cpos be the increment for positive reinforcement,
Cneg be the decrement for negative reinforcement,
P be the probability that the bench mark solution is preferred after a determinate comparison,
Vt(i, j) be the entry in the table of connections corresponding to the ith state and the jth strategy (at time t),
Et(V) be the average value of Vt(i, j) over all state-strategy pairs used in constructing a new path, and
q be the percentage of situations in which "best" strategies were used in constructing the new path (the percentage of situations for which "best" strategies exist in which they were used).
Now we can rewrite the two premises in the theorem as

(P.1) q > .50
(P.2) P > Cpos/(Cpos + Cneg), Cpos > 0, Cneg > 0.

From our description of the learning mechanism, we know that after a positive comparison

Et+1(V) = Et(V) + Cpos   (2.1)

and after a negative comparison

Et+1(V) = Et(V) - Cneg   (2.2)

Hence, the overall expectation following a determinate comparison is

Et+1(V) = P*(Et(V) - Cneg) + (1 - P)*(Et(V) + Cpos)
        = Et(V) + Cpos - (Cpos + Cneg)*P   (2.3)

But from P.2 we know P > Cpos/(Cpos + Cneg); therefore

(Cpos + Cneg)*P > Cpos

and from (2.3),

Et+1(V) < Et(V).

In other words, once P > Cpos/(Cpos + Cneg) we can expect the pairs used in constructing new paths to be decremented. As a result those pairs not used on the path will become more likely to be selected. Let

D be the expected decrement in the probability of selecting a "best" strategy that was used on the new path, and
I be the expected increment in the probability of selecting a "best" strategy that was not used on the new path.

Since the probabilities of selection in any state must sum to unity, and since there are more than two strategies per state, and only one is decremented,

D > I   (2.4)

Now we can write an expression for the expected change in the probability of selecting a best strategy. Let dprob be the expected increase in the probability of selecting a "best" strategy after a reinforcement. Then

dprob = -q*D + (1 - q)*I   (2.5)

but from P.1, q > .50 (q < 1), so q > (1 - q). Therefore, using (2.4), we get from (2.5) that

dprob < 0.

Hence, we have shown that the probability of selecting a "best" strategy must decrease, and our theorem is proved.

To test this hypothesis we attempted to train the program again from scratch on the same problems using both positive and negative reinforcement (the decrement for negative reinforcement was 1). The procedure was the same as in Experiment I, but the results were quite different.

TABLE V-A Comparison of Solution Rates in Experiments I and II

                                          (1) Mean 10 Minute    (2) Mean 10 Minute
                                          Rate on First Two     Rate on Test
                                          Learning Trials       Trials(a)            (2) - (1)
Experiment I (Positive Reinforcement)           1.6                  4.0              +2.2**
Experiment II (Positive and Negative
  Reinforcement)                                1.9                  1.5              -0.4
Difference (I) - (II)                          -0.3                 +2.5*

* t = 3.309, df = 10, p < .005
** U = 0, p < .001
(a) Including additional training on Problem 2.

Results

The improvement in performance was significantly less for training with both positive and negative reinforcement than it had been in Experiment I for positive reinforcement alone. In Table 5 the solution rates for training and test trials in Experiments I and II are compared. One can see that the solution rates were significantly inferior when negative reinforcement was included. This is also quite clear from the learning curves shown in Figures 7.1 to 7.3 and 9.1 to 9.2. The performance seems to have improved on the first training trial and then worsened. One can compare the test trials in this experiment with those in the first experiment (Figures 4.1 to 6.2) and note the substantial differences.

Figure 7.1-The solutions produced during training on Problem 1 with positive and negative reinforcement.

Figure 7.2-The solutions produced during additional training on Problem 2 with positive and negative reinforcement.
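The threshold in premise P.2 is easy to examine numerically. With the increment of 3 used for positive reinforcement and the decrement of 1 used here for negative reinforcement, equation (2.3) predicts that the expected value of the reinforced cells falls as soon as P exceeds 3/4. The check below is an illustration of ours, not part of the paper:

# Illustration (ours): expected per-comparison change in the reinforced
# cells, from equation (2.3), for the Cpos = 3 and Cneg = 1 of these runs.
Cpos, Cneg = 3, 1
threshold = Cpos / (Cpos + Cneg)          # P.2's critical value: 0.75

for P in (0.50, 0.70, 0.75, 0.80, 0.90):
    drift = Cpos - (Cpos + Cneg) * P      # E[V(t+1)] - E[V(t)] per (2.3)
    trend = "decreasing" if P > threshold else "non-decreasing"
    print(f"P = {P:.2f}   expected change = {drift:+.2f}   ({trend})")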
The reduction in entropy in the tables of connections trained with negative reinforcement was also less. The reduction in the table trained on Problem 1 was only 13% of the possible. One should not overemphasize this difference, however, since orderliness does not necessarily imply that the desired order has been achieved. Nevertheless, it is interesting to note that 90% of the 13% reduction in entropy during training on Problem 1 occurred during the first training trial. This, of course, is in accord with our hypothesis.

One can see some of the more subtle effects of negative reinforcement more clearly by looking at the within-trial behavior of the program. During the three training trials on Problem 1 and the following two trials on Problem 2 there were 518 positive reinforcements of state-strategy pairs in contrast to 1701 negative reinforcements of state-strategy pairs. Let us look in detail at the behavior of the program within Training Trial 1 on Problem 1 and compare it with the corresponding training trial in Experiment I. The rate of occurrence of negative comparisons was slightly less in this experiment. This is not surprising since negative reinforcements would make it less likely that the program would repeat a series of bad decisions and encounter another negative reinforcement. From the large fluctuations in the sum of the entries in the table of connections during the trial, we could see that positive reinforcements were the dominant influence at the beginning of the trial, but that their effect could have been wiped out by the negative reinforcements later in the trial.

Figure 7.3-The solutions produced during test trials on Problems 1 and 3.

Figure 8.1-The probabilities of the program selecting each of the five strategies as a function of the number of comparisons of partial solutions during learning. The data are for one particular task situation (state) corresponding to average time requirements, average criticality, and the beginning of a solution. The data are from the first training trial on Problem 1 in Experiment II.

Besides preventing the repetition of bad decision sequences, negative reinforcement introduces more variety into the decision making process. In other words, negative reinforcement can move the table of connections off a locally optimal structure to search for a better structure. The greater variety in decision making is best seen by comparing Figure 8.2 with Figure 5.2. However, the total effect of these characteristics of negative reinforcement in changing the structure of the table of connections is best shown by Figure 8.1. With negative reinforcement the changes in probability were more erratic. A new strategy suddenly increased in probability after the trial was half over. This experiment demonstrates that negative reinforcement can never be used as freely as positive reinforcement in learning optimization tasks.
The many previous methods based solely on error correction training would perform poorly on optimization tasks. Nevertheless, one can see that negative reinforcement has some desirable effects: it prevents the table of connections from becoming stranded on local optima and causes a greater variety of decisions to be investigated.

Figure 8.2-The number of different state-strategy pairings used as a function of the number of comparisons of partial solutions during learning. The data are from the first training trial on Problem 1 in Experiment II.

Figure 9.1-The solutions produced during training on Problem 3 with positive and negative reinforcement.

Hence, we would like to find a way to reap some of the benefits of negative reinforcement while avoiding the pitfalls.

EXPERIMENT III

(d) During training should the program always strive to produce the best possible solution?

An implicit assumption in many previous learning mechanisms is that the path to becoming an excellent problem solver is monotonic. However, if this assumption (that the strategies employed by a good problem solver will be worthwhile for an expert) is not always true, then a learning program needs a mechanism for exploring solutions outside of those suggested by its previous learning. In different terms, doesn't a learning program need a way to escape local optimums generated by particular strategies and to experiment with new strategies that may lead to the global optimum?

If one views an optimization problem as the problem of finding an ideal path in a tree of solutions, he can see why a learning program would need such mechanisms. The strategies that generate a path (solution) are reinforced if that path is superior to previous paths. Desirably enough, this somewhat narrows the scope of future search to branches likely to be selected by the same strategies. Eventually, a solution will be reached that cannot be exceeded in a reasonable amount of time. At this point all branches off this path (assuming some entropy in the search process) and all branches likely to be reached with the same strategies should have been tried and found inferior. But there is no guarantee that a radical change in several strategies at some point on the path might not lead to an equal or better path. The danger is that a few radical changes in strategies might consistently produce quite different paths and superior solutions, but these changes would never be investigated because any one of them, alone, coupled with the learned series of strategies, only leads to a branch off the old path and a worse solution.

Therefore, it is suggested that a program that adopts a short period of relatively non-directive search at the end of a learning sequence where improvement has terminated will learn a superior decision structure and eventually perform better than a program that spends all its time searching on the basis of its past experience. Admittedly, such a non-directive search would be time consuming and costly in that performance would be bad during learning. MacKay,8
in fact, has suggested deliberately selecting bad strategies during learning so that they can be eliminated with negative reinforcement. However, such a method would not help much in selecting a strategy of little utility most of the time that is of the highest utility in moving from good to excellent solutions. It is suggested that what is needed is a mechanism for relatively random exploration of strategies whenever it appears that the program is "hung up" on a local optimum. How useful such an addition to a learning procedure would be is the fourth issue for study.

Figure 9.2-The solutions produced during test trials on Problems 1 and 3.

Figure 10.1-The solutions formed during special additional training on Problem 1 in Experiment III. The initial memory structure had been trained with positive reinforcement alone in Experiment I (see Figures 4.1 and 4.2). Both positive and negative reinforcement were used on these three trials. In addition, after the first bench mark was formed, strategies were selected completely at random during Trials 1 and 3.

MOVE (R0)+, R1
MOVE (R0)+, R2
MOVE R1, -(R0)
MOVE R2, -(R0)
MOVE #N, R0

1 Stack pointer has been arbitrarily used as register R0 for this example.

Figure 14-Stack computer instructions and equivalent PDP-11 instructions
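The correspondence in Figure 14 rests on the auto-decrement and auto-increment address modes. The sketch below is a toy model of ours (not DEC code): register R0 plays the stack pointer, the stack grows toward lower addresses, and words are two bytes, so the pointer moves by 2.

class Pdp11Stack:
    """Toy model of stack behavior via R0 and auto-increment/decrement."""
    def __init__(self, base=0o1000):
        self.mem = {}
        self.r0 = base                     # like MOVE #N, R0: set the pointer

    def push(self, value):                 # like MOVE src, -(R0): decrement, then store
        self.r0 -= 2
        self.mem[self.r0] = value

    def pop(self):                         # like MOVE (R0)+, dst: fetch, then increment
        value = self.mem[self.r0]
        self.r0 += 2
        return value

    def add(self):                         # a binary op: pop two operands, push the result
        b, a = self.pop(), self.pop()
        self.push(a + b)

s = Pdp11Stack()
s.push(2)
s.push(3)
s.add()
print(s.pop())                             # prints 5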
The basic hardwiring scheme and the programmed scheme should allow binary program compatibility, assuming there is an interpretive program for the various operators in the Model 20. For example, consider one scheme which would add eight 48 bit registers which are addressable in the extended instruction set. The eight floating registers, F, would be mapped into eight double length Two Address Computer !2!:::l! A +- B; transfer B to A MOVE B,A A +-AiB; add ADD B,A -, x, / (see add) A +- _A; negate NEG A A +-A V B; inclusive or 'SETB,A A +-.., A; not COM jump UIlcOIldit101led .AAfP Test A, and transfer to B Use as a two-address machine 250 microseconds TST A BR (_. ~. >. ~. <. s) B Figure H)-Two address computer instructions and equivalent PDP-ll instructions The DEC PDP-II (32 bit) registers, D. In order to access the various parts of F or D registers, registers FO and FI are mapped onto registers RO to R2 and R3 to R5. Since the instruction set operation code is almost completely encoded already for byte and word length binary ops bop' D S 669 data, a new encoding scheme is nece~sary to specify the proposed additional instructions. This scheme adds two instructions: enter floating. point mode and execute one floating point instruction. The instructions for floating point and double word data would be: op floating point/f and double word/ d f- FMOVE FADD FSUB FMUL FDIV FCMP DMOVE DADD DSUB DMUL DDIV DCMP FNEG DNEG + X / compare unary ops uop' D LOGICAL DESIGN OF S(UNIBUS) AND PC The logical design level is concerned with the physical implementation and the constituent combinatorial and sequential logic elements which form the various computer components (e.g., processors, memories, controls). Physically, these components are separate and connected to the Unibus following the lines of the PMS structure. Bus control Most of the time the processor is bus master fetching instructions and operands from memory and storing results in memory. Bus mastership is determined by the current processor priority and the priority line upon which a bus request is made and the physical placement of a requesting device on the linked bus. Unibuo . Unibus organization ('Bul Addrell) .. Figure 16 gives a PMS diagram of the Pc and the entering signals from the Unibus. The control unit for the Unibus, housed in Pc for the Model 20, is not shown in the figure. The PDP-II Unibus has 56 bi-directional signals conventionally used for program-controlled data transfers (processor to control), direct-memory data transfers (processor or control to memory) and control-toprocessor interrupt. The Unibus is interlocked; thus transactions operate independent of the bus length and response time of the master and slave. Since the bus is bi-directional and is used by all devices, any device can communicate with any other device. The controlling device is the master, and the device to which the master is communicating is the slave. For example, a data transfer from processor (master) to memory (always a slave) uses the Data Out dialogue facility for writing and a transfer from memory to processor uses the Data In dialogue facility for reading. \ '" s' / ... D(ohift) /" t(t adder'loa~Cal ~ ... Mr16 word; ' lntegrated~ 2 circuit-, 8cratchpad', t/ OpI) (processor .. M( 'A; latch)" .. M( 'B; latch)" / . / 8 tate i 8 (temporary; 2 wordl); 16 blu/word M( 'H; 18 bitl,. J'--.....__prace.,or .tate I ' i I _ _-_-J-_--. f(deCoder)-,M('lnotrucUon Reglater/lR; 16 blts)" ~11(~lnatrucuOD "J R9s3:O> I I Control ~ G:'PCSJoO? 
cltlalize f 1( 'Pc BUI Control)a':--_ _ _ _....J 4 ""'ter SynC Bus Buoy J~lcontrOl components to above (statua), A, B, IR, Conaole; M(20 blts) Slave Sync, Parlty Avallable I Set Interpreter: Ihlft, add, M(acratchpad), 1( 'Timlng) -> L( 'Bus C';"trol)3_ t T(c lock)6 !n ... data operations to perform ahiftlna and acldina ... usually cm.binatorial : 3~ ::~~!~8 u:~t:;w:::U!!:i:~t:~i:~h::e~:!c:~::. "s . . awitch. used to gate contents of an H to a let of linea :T - Tranoducer - to encode tlme lnto a loglc (clock) olgnal RotaUon to denote 16 11nel _ed 15,14·••••• 1.0 Figure 16-PDP-ll Pc structure 670 Spring Joint Computer Conference, 1970 The assignment of bus mastership is done concurrent with normal communication (dialogues). Dnibus dialogues Three types of dialogues use the Unibus. All the dialogues have a common protocol which first consists of obtaining the bus mastership (which is done concurrent with a previous transaction) followed by a data exchange with the requested device. The dialogues are: Interrupt; Data In and Date In Pause; and Data Out and Data Out Byte. Interrupt Interrupt can be initiated by a master immediately after receiving bus mastership. An address is transmitted from the master to the slave on Interrupt. Normally, subordinate control devices use this method to transmit an interrupt signal to the processor. Data in and data in pause These two bus operations transmit slave's data (whose address is specified by the master) to the master. For the Data In Pause operation data is read into the master and the master responds with data which is to be rewritten in the slave. memory holds most of the 8-word processor state found in the ISP, and the 8 bits that form the Status word are stored in an 8-bit register. The input to the adder-shift network has two latches which are either memories or gates. The output of the adder-shift network can be read to either the data or address parts of the Unibus, or back to the scratch-pad array. The instruction decoding and arithmetic control are less regular than the above data and state and these are shown in the lower part of the figure. There are two major sections: the instruction fetching and decoding control and the instruction set interpreter (which in effect defines the ISP). The later control section operates on, hence controls, the arithmetic and state parts of the Pc. A final control is concerned with the interface to the Unibus (distinct from the Unibus control that is housed in the Pc). CONCLUSIONS In this paper we have endeavored to give a complete description of the PDP-II Model 20 computer at four descriptive levels. These present an unambiguous specification at two levels (the PMS structure and the ISP), and, in addition, specify the constraints for the design at the top level, and give the reader some idea of the implementation at the bottom level logical design. We have also presented guidelines for forming additional models that would belong to the same family. Data out and ~data out byte These two operations transfer data from the master to the slave at the address specified by the master. For Data Out a word at the address specified by .the address lines is transferred from master to slave. Data Out Byte allows a single data byte to be transmitted. Processor logical design The Pc is designed using TTL logical design components and occupies approximately eight 8" X 12" printed circuit boards. The organization of the logic is shown in Figure 17. 
The Pc is physically connected to two other components, the console and the Unibus. The control for the Unibus is housed in the Pc and occupies one of the printed circuit boards. The most regular part of the Pc, the arithmetic and state section, is shown at the top of the figure. The 16-word scratchpad memory and combinatorial logic data operators, D(shift) and D(adder, logical ops) , form the most regular part of the processor's structure. The 16-word ACKNOWLEDGMENTS The authors are grateful to Mr. Nigberg of the technical publication department at DEC and to the reviewers for their helpful criticism. Weare especially grateful to Mrs. Dorothy Josephson at CarnegieMellon University for typing the notation-laden manuscript. REFERENCES 1 R HALLMARK J R LUCKING Design of an arithmetic unit incorporationg a nesting store Proc IFIP Congress pp 694-698 1962 2 G M AMDAHL G A BLAAUW F P BROOKS JR Architecture of the IBM System/360 IBM Journal Research and Development Vol 8 No 2 pp 87-101 April 1964 3 C G BELL A NEWELL Computer structures McGraw-Hill Book Company Inc New York In press 1970 The DEC PDP-II 4 A W BURKS H H GOLDSTINE J VON NEUMANN Preliminary discussion of the logical design of an electronic computing instrument, Part II Datamation Vol 8 No 10 pp 36-41 October 1962 5 W S ELLIOTT C E OWEN C H DEVON ALD B G MAUDSLEY The design philosophy of Pegasus, a quantity-production computer Proceedings IEEE Pt. B 103 Supp 2 pp 188-196 1956 6 F M HANEY Using a computer to design computer instruction se~s Thesis for Doctor of Philosophy degree College of Engineering and Science Department of Computer Science Carnegie-Mellon University Pittsburgh Pennsylvania May 1968 671 7 W LONERGAN P KING Design of the B5000 system Datamation Vol 7 No 5 pp 28-32 May 1961 8 W D MAURER A theory of computer instructions Journal of the ACM Vol 13 No 2 pp 226-23.5 April 1966 9 S ROTHMAN R /W 40 data processing system International Conference on Information Processing and Auto-math 59 Ramo-Wooldridge (A division of Thompson Ramo Wooldridge Inc) Los Angeles California June 19.59 10 M V WILKES The best way to design an automatic calculating machine Report of Manchester University Computer Inaugural Conference July 1951 (Manchester 1953) APPENDIX 1 DEC PDP-II instruction set processor Description (in ISPL*) The following description is not a detailed description of the instructions. The description omits the trap behavior of unimplemented instructions, references to non-existent primary memory and io devices, SP (stack) overflow, and power failure. Primary Memory State M/Mb/Memory[O:216 -1J(7:0) l\1w[O:215 -1J(15:0) : = l\tl[O:216-1J(7:0) Processor State (9 words) R/Registers[O: 7J (15: 0 ) SP(15:0) : = R[6J(15:0) PC(15:0) : = R[7J(15:0) (byte memory) (word memory mapping) (word general registers) (stack pointer) (program counter) *ISP NOTATION Although the ISP language has not been described in publications, its syntax is similar to other languages. The language is inherently interpreted in parallel, thus to get sequential evaluation the word "next" must be used. Italics are used for comments. The following notes are in order: a: = f( . .. ) equivalence or substitution process used for name and process substitution. For every occurrence of a,f( . .. ) replaces it. a+--f(· .. ) Replacement operator; the contents in register a are replaced by the value of the function. register declaration, e.g., Q[O:I] [0:4095] (15:0) an array of words of two dimensions 2 and 4096; each word has 16 bits denoted 15, 14, 13, ... 
(a:b)n      denotes a range of characters a, a + 1, ..., b to base n; if n is not given, the base is 2
[c:d]      array designation c, c + 1, ..., d
a → b      equivalent to ALGOL if a then b
"next"      sequential interpretation
ADD(:= bop = 0010) → (CC, D ← D + S)      instruction declaration; defines the "ADD" instruction, assigns it a value, and gives its operation. ADD is executed when bop = 0010₂. Equivalent to: ADD → (CC, D ← D + S), where ADD := (bop = 0010); bop has been previously declared
□      concatenation; consider the combined registers as one
operators := (+/add | -/subtract/negate | ×/multiply | //divide | ∧/and | ∨/or | ¬/not | ⊕/exclusive-or | =/equal | >/greater than | ≥ | < | ≤ | ≠ | modulo | etc.)

PS(15:0)      (processor state register)
Priority/P(2:0) := PS(7:5)      (under program control; priority level of the process currently being interpreted; a higher level process may interrupt or trap this process)
CC/Condition_Codes(3:0) := PS(3:0)
Carry/C := CC(0)      (a result condition code indicating an arithmetic carry from bit 15 of the last operation)
Negative/N := CC(3)      (a result condition code indicating last result was negative)
Zero/Z := CC(2)      (a result condition code indicating last result was zero)
Overflow/V := CC(1)      (a result condition code indicating an arithmetic overflow of the last operation)
Trace/T := PS(4)      (under program control; when set, an instruction trace trap occurs after each instruction is executed; used for interpretive and breakpoint debugging)
Undefined(7:0) := PS(15:8)      (unused)
Run      (denotes normal execution)
Wait      (denotes waiting for an interrupt)

Instruction Format

(Bit assignments used in the various instruction formats)

i/instruction(15:0)
bop(3:0) := i(15:12)      (binary operation code)
uop(15:6) := i(15:6)      (unary operation code)
brop(15:8) := i(15:8)      (branch operation code)
sop(15:6) := i(15:6)      (shift operation code)
s/source(5:0) := i(11:6)      (source control byte)
sm(0:1) := s(5:4)      (source mode control)
sd := s(3)      (source defer bit)
sr := s(2:0)      (source register)
d/destination(5:0) := i(5:0)      (destination control byte)
dm(0:1) := d(5:4)
dd := d(3)
dr(2:0) := d(2:0)
offset(7:0) := i(7:0)      (signed 7 bit integer)
address_increment/ai      (implicit bit derived from i to denote byte or word length operations)

Data Types

by/byte(7:0)
w/word(15:0)
by.i/byte.integer(7:0)      (signed integers)
w.i/word.integer(15:0)
by.bv/byte.boolean_vector(7:0)      (boolean vectors (bits))
w.bv/word.boolean_vector(15:0)
d/double_word(31:0)      (*double word)
t/triple_word(47:0)      (*triple word)
f/t.f/triple.floating_point(47:0)      (*triple floating point)

Source/S and Destination/D Calculation

S/Source(15:0) := (
  ¬sd → (      (direct access)
    (sm = 00) → R[sr];      (register)
    (sm = 01) ∧ (sr ≠ 7) → (M[R[sr]]; next R[sr] ← R[sr] + ai);      (auto increment)
    (sm = 01) ∧ (sr = 7) → (M[PC]; PC ← PC + 2);      (immediate)
    (sm = 10) → (R[sr] ← R[sr] - ai; next M[R[sr]]);      (auto decrement)
    (sm = 11) ∧ (sr ≠ 7) → (M[M[PC] + R[sr]]; PC ← PC + 2);      (indexed)
    (sm = 11) ∧ (sr = 7) → (M[M[PC] + PC]; PC ← PC + 2));      (relative)
  sd → (      (indirect access)
    (sm = 00) → M[R[sr]];      (indirect via register)
    (sm = 01) ∧ (sr ≠ 7) → (M[M[R[sr]]]; next R[sr] ← R[sr] + ai);      (indirect via stack, auto increment)
    (sm = 01) ∧ (sr = 7) → (M[M[PC]]; PC ← PC + 2);      (direct absolute)
    (sm = 10) → (R[sr] ← R[sr] - ai; next M[M[R[sr]]]);      (indirect via stack, auto decrement)
    (sm = 11) ∧ (sr ≠ 7) → (M[M[M[PC] + R[sr]]]; PC ← PC + 2);      (indirect, indexed)
    (sm = 11) ∧ (sr = 7) → (M[M[M[PC] + PC]]; PC ← PC + 2)))      (indirect relative)

(The above process defines how operands are determined (accessed) from either memory or the registers. The various length operands, Db (byte), Dw (word), Dd (double) and Df (floating), are not completely defined. The Source/S and Destination/D processes are identical. In the case of jump instructions an address, D', is used instead of the word in location M[D'].)
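The operand rules above are mechanical enough to restate as code. The following C fragment, a sketch added here for illustration and not anything from the original text, interprets the six direct (sd = 0) source modes for word operands with ai = 2; the names reg, mem and mread are invented stand-ins:

    #include <stdint.h>

    uint16_t reg[8];              /* R[0..7]; R[6] is SP, R[7] is PC */
    uint16_t mem[32768];          /* the 2^16-byte memory viewed as words */

    /* Read the word at (even) byte address a. */
    static uint16_t mread(uint16_t a) { return mem[a >> 1]; }

    uint16_t source_word(unsigned sm, unsigned sr)
    {
        uint16_t w;
        switch (sm) {
        case 0:                               /* sm = 00: register */
            return reg[sr];
        case 1:                               /* sm = 01: auto increment; sr = 7 gives immediate */
            w = mread(reg[sr]);
            reg[sr] += 2;                     /* ai = 2 for word operations */
            return w;
        case 2:                               /* sm = 10: auto decrement */
            reg[sr] -= 2;
            return mread(reg[sr]);
        default:                              /* sm = 11: indexed; sr = 7 gives relative */
            w = mread((uint16_t)(mread(reg[7]) + reg[sr]));
            reg[7] += 2;
            return w;
        }
    }

The deferred (sd = 1) column of the calculation is obtained by wrapping one more memory reference around each leg.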
Instruction Interpretation Process

¬Interrupt_rq ∧ Run ∧ ¬Wait → (
  i ← M[PC]; PC ← PC + 2; next      (fetch)
  instruction_execution; next      (execute)
  T → (SP ← SP - 2; next M[SP] ← PS;      (trace bit: store state)
    SP ← SP - 2; next M[SP] ← PC;
    PC ← M[14₈]; PS ← M[16₈]));

Interrupt_rq[j] ∧ (j > P) ∧ Run → (      (interrupt: store state and PC,
  T ← 0; SP ← SP - 2; next M[SP] ← PS;      enter new process)
  SP ← SP - 2; next M[SP] ← PC;
  PC ← M[f(j)];
  PS ← M[f(j) + 2])

The locations M[f(j)] are: reserved instruction = M[10₈]; illegal instruction = M[4]; stack overflow = M[4]; bus errors = M[4].

Instruction Set and the Execution Process

(The following instruction set will be defined briefly and is incomplete. It is intended to give the reader a simple understanding of the machine operation.)

* not hardwired or optional
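The interpretation cycle also restates compactly. The C sketch below follows the ISP text above (vectors at octal 14 and 16, priority in PS(7:5), T in PS(4)); it is illustrative only, and mread, mwrite, execute, pending_level and f are invented stand-ins for the memory, the execution process, and the interrupt hardware:

    #include <stdint.h>

    extern uint16_t mread(uint16_t a);
    extern void     mwrite(uint16_t a, uint16_t w);
    extern void     execute(uint16_t i);       /* Instruction_execution */
    extern int      pending_level(void);       /* highest requesting level, -1 if none */
    extern uint16_t f(int j);                  /* vector address for level j */

    uint16_t pc, sp, ps;                       /* PC, SP, processor state word */
    int run = 1, waiting = 0;

    static void push(uint16_t w) { sp -= 2; mwrite(sp, w); }

    void interpret(void)
    {
        while (run) {
            int j = pending_level();
            if (j >= 0 && j > ((ps >> 5) & 7)) {   /* Interrupt_rq[j] and j > P */
                push(ps); push(pc);                /* store state and PC */
                pc = mread(f(j));                  /* enter new process */
                ps = mread((uint16_t)(f(j) + 2));
                waiting = 0;
                continue;
            }
            if (waiting)                           /* Wait: idle until an interrupt */
                continue;
            uint16_t i = mread(pc); pc += 2;       /* fetch */
            execute(i);                            /* execute */
            if (ps & 020) {                        /* T bit: trace trap */
                push(ps); push(pc);
                pc = mread(014); ps = mread(016);  /* vectors at octal 14 and 16 */
            }
        }
    }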
Instruction_execution := (

MOV(:= bop = 0001) → (CC, D ← S);      (move word)
MOVB(:= bop = 1001) → (CC, Db ← Sb);      (move byte)

Binary Arithmetic: D ← D b S;
ADD(:= bop = 0110) → (CC, D ← D + S);      (add)
SUB(:= bop = 1110) → (CC, D ← D - S);      (subtract)
CMP(:= bop = 0010) → (CC ← D - S);      (word compare)
CMPB(:= bop = 1010) → (CC ← Db - Sb);      (byte compare)
MUL(:= bop = 0111) → (CC, D ← D × S);      (*multiply; if D is a register then a double length operation)
DIV(:= bop = 1111) → (CC, D ← D / S);      (*divide; if D is a register, then a remainder is saved)

Unary Arithmetic: D ← u D;
CLR(:= uop = 050₈) → (CC, D ← 0);      (clear word)
CLRB(:= uop = 1050₈) → (CC, Db ← 0);      (clear byte)
COM(:= uop = 051₈) → (CC, D ← ¬D);      (complement word)
COMB(:= uop = 1051₈) → (CC, Db ← ¬Db);      (complement byte)
INC(:= uop = 052₈) → (CC, D ← D + 1);      (increment word)
INCB(:= uop = 1052₈) → (CC, Db ← Db + 1);      (increment byte)
DEC(:= uop = 053₈) → (CC, D ← D - 1);      (decrement word)
DECB(:= uop = 1053₈) → (CC, Db ← Db - 1);      (decrement byte)
NEG(:= uop = 054₈) → (CC, D ← -D);      (negate)
NEGB(:= uop = 1054₈) → (CC, Db ← -Db);      (negate byte)
ADC(:= uop = 055₈) → (CC, D ← D + C);      (add the carry)
ADCB(:= uop = 1055₈) → (CC, Db ← Db + C);      (add to byte the carry)
SBC(:= uop = 056₈) → (CC, D ← D - C);      (subtract the carry)
SBCB(:= uop = 1056₈) → (CC, Db ← Db - C);      (subtract from byte the carry)
TST(:= uop = 057₈) → (CC ← D);      (test)
TSTB(:= uop = 1057₈) → (CC ← Db);      (test byte)

Shift Operations: D ← D × 2^n;
ROR(:= sop = 060₈) → (C□D ← C□D / 2 {rotate});      (rotate right)
RORB(:= sop = 1060₈) → (C□Db ← C□Db / 2 {rotate});      (byte rotate right)
ROL(:= sop = 061₈) → (C□D ← C□D × 2 {rotate});      (rotate left)
ROLB(:= sop = 1061₈) → (C□Db ← C□Db × 2 {rotate});      (byte rotate left)
ASR(:= sop = 062₈) → (CC, D ← D / 2);      (arithmetic shift right)
ASRB(:= sop = 1062₈) → (CC, Db ← Db / 2);      (byte arithmetic shift right)
ASL(:= sop = 063₈) → (CC, D ← D × 2);      (arithmetic shift left)
ASLB(:= sop = 1063₈) → (CC, Db ← Db × 2);      (byte arithmetic shift left)
ROT(:= sop = 064₈) → (C□D ← C□D × 2^S {rotate});      (*rotate)
ROTB(:= sop = 1064₈) → (C□Db ← C□Db × 2^S {rotate});      (*byte rotate)
LSH(:= sop = 065₈) → (CC, D ← D × 2^S {logical});      (*logical shift)
LSHB(:= sop = 1065₈) → (CC, Db ← Db × 2^S {logical});      (*byte logical shift)
ASH(:= sop = 066₈) → (CC, D ← D × 2^S);      (*arithmetic shift)
ASHB(:= sop = 1066₈) → (CC, Db ← Db × 2^S);      (*byte arithmetic shift)
NOR(:= sop = 067₈) → (CC, D ← normalize(D); R[r'] ← normalize_exponent(D));      (*normalize)
NORD(:= sop = 1067₈) → (Dd ← normalize(Dd); R[r'] ← normalize_exponent(Dd));      (*normalize double)
SWAB(:= sop = 3) → (CC, D ← D(7:0, 15:8));      (swap bytes)

Logical Operations:
BIC(:= bop = 0100) → (CC, D ← D ∧ ¬S);      (bit clear)
BICB(:= bop = 1100) → (CC, Db ← Db ∧ ¬Sb);      (byte bit clear)
BIS(:= bop = 0101) → (CC, D ← D ∨ S);      (bit set)
BISB(:= bop = 1101) → (CC, Db ← Db ∨ Sb);      (byte bit set)
BIT(:= bop = 0011) → (CC ← D ∧ S);      (bit test under mask)
BITB(:= bop = 1011) → (CC ← Db ∧ Sb);      (byte bit test under mask)

Branches and Subroutine Calling: PC ← f;
JMP(:= sop = 0001₈) → (PC ← D');      (jump unconditional)
BR(:= brop = 01₁₆) → (PC ← PC + offset);      (branch unconditional)
BEQ(:= brop = 03₁₆) → (Z → (PC ← PC + offset));      (equal to zero)
BNE(:= brop = 02₁₆) → (¬Z → (PC ← PC + offset));      (not equal to zero)
BLT(:= brop = 05₁₆) → (N ⊕ V → (PC ← PC + offset));      (less than (zero))
BGE(:= brop = 04₁₆) → (N ≡ V → (PC ← PC + offset));      (greater than or equal (zero))
BLE(:= brop = 07₁₆) → (Z ∨ (N ⊕ V) → (PC ← PC + offset));      (less than or equal (zero))
BGT(:= brop = 06₁₆) → (¬(Z ∨ (N ⊕ V)) → (PC ← PC + offset));      (greater than (zero))
BCS/BLO(:= brop = 87₁₆) → (C → (PC ← PC + offset));      (carry set; lower (unsigned))
BCC/BHIS(:= brop = 86₁₆) → (¬C → (PC ← PC + offset));      (carry clear; higher or same (unsigned))
BLOS(:= brop = 83₁₆) → (C ∨ Z → (PC ← PC + offset));      (lower or same (unsigned))
BHI(:= brop = 82₁₆) → (¬(C ∨ Z) → (PC ← PC + offset));      (higher than (unsigned))
BVS(:= brop = 85₁₆) → (V → (PC ← PC + offset));      (overflow)
BVC(:= brop = 84₁₆) → (¬V → (PC ← PC + offset));      (no overflow)
BMI(:= brop = 81₁₆) → (N → (PC ← PC + offset));      (minus)
BPL(:= brop = 80₁₆) → (¬N → (PC ← PC + offset));      (plus)
JSR(:= sop = 0040₈) → (SP ← SP - 2; next M[SP] ← R[sr]; R[sr] ← PC; PC ← D');      (jump to subroutine by putting R[sr], PC on stack and loading R[sr] with PC, and going to subroutine at D')
RTS(:= i = 000200₈) → (PC ← R[dr]; R[dr] ← M[SP]; SP ← SP + 2);      (return from subroutine)

Miscellaneous processor state modification:
RTI(:= i = 2₈) → (PC ← M[SP]; SP ← SP + 2; next PS ← M[SP]; SP ← SP + 2);      (return from interrupt)
HALT(:= i = 0) → (Run ← 0);
WAIT(:= i = 1) → (Wait ← 1);
TRAP(:= i = 3) → (SP ← SP - 2; next M[SP] ← PS; SP ← SP - 2; next M[SP] ← PC; PC ← M[34₈]; PS ← M[36₈]);      (trap to M[34₈]: store status and PC, enter new process)
EMT(:= brop = 82₁₆) → (SP ← SP - 2; next M[SP] ← PS; SP ← SP - 2; next M[SP] ← PC; PC ← M[30₈]; PS ← M[32₈]);      (emulator trap)
IOT(:= i = 4) → (see TRAP);      (I/O trap to M[20₈])
RESET(:= i = 5) → (not described);      (reset to external devices)
OPERATE(:= i(15:5) = 5) → (      (condition code operate)
  i(4) → (CC ← CC ∨ i(3:0));      (set codes)
  ¬i(4) → (CC ← CC ∧ ¬i(3:0)));      (clear codes)

end Instruction_execution

A systems approach to minicomputer I/O

by FRED F. COURY
Hewlett-Packard Company
Cupertino, California

INTRODUCTION

You can tell a lot about a guy by the way he draws a block diagram of a computer system.
If he draws the central processor and memory as small boxes off in a corner, then proceeds to fill the page with an elaborate portrait of the input/output system, he is usually referred to as (among other things) an "I/O type". I have drawn several such diagrams, and I offer this information as a caveat to the reader. In the pages to follow, I shall outline and attempt to justify some of my views on minicomputer I/O, particularly on "where we should be going from here". If some of the suggestions are already being implemented, I think they are steps in the right direction. If, on the other hand, some of the ideas seem too far out, consider the source.

A BIT OF HISTORY

I guess things started the way they did for several reasons. Hardware (relays, vacuum tubes, power supplies, and air conditioners) was very expensive, especially in the large quantities necessary for computing. The resulting machines were so incredibly complex (literally thousands of relays and vacuum tubes) that just getting one to work was a major accomplishment. In spite of the complexity involved, the actual capability of the early machines was limited to large-scale automatic number-crunching. It is not hard to understand that hardware optimization was foremost in the designer's mind. Unfortunately, programming these first machines was quite difficult due to the limited storage available in the machines, and also due to the fact that no programming frills (such as assemblers) were provided. I/O was no real problem, since most of the early machines were clearly compute-bound, especially in number-crunching applications, and most I/O was simple card input, line printer (or card) output.

Engineers took advantage of technological developments (core memories, transistors) to build faster, more powerful machines. Programmers began to apply the new machines to a wide variety of problems (such as writing assemblers) and began to explore the true potential of computers. As the number of machines increased, users (programmers) began to outnumber designers (engineers). They wanted to have something to say about the design of the machines they would be using before it was too late. The engineers made the computers work, but the programmers made the computers do something. It was recognized that the important parameter to optimize was overall system performance. The engineers had to worry not only about how fast a machine could multiply two numbers together, but how efficiently the machine could be programmed to invert a matrix. It is now common practice for computers to be designed by teams of engineers (with programming experience) and systems programmers (with hardware understanding) in order to optimize the overall performance of the resulting hardware/software system. Also, the emphasis is shifting from hardware minimization to people optimization. As the cost of hardware goes down, and the cost of people goes up, the way to minimize cost is to maximize the efficiency of people in the design, production, programming, and eventual use of the system.

A GLIMPSE INTO THE FUTURE

In the near future, especially in some of the new minicomputer markets, the vast majority of computer users will not be programmers. As a matter of fact, these users will not want to program computers. They won't particularly even want to use computers. They will have questions to be answered, problems to be solved, and things to be done.
If a computer offers a better way (or, in some cases, the only way) to do it, people will consider using a computer. Otherwise, they will choose another method, or not do it at all. Let's face it ... the novelty is wearing off. The small computer industry must come of age. We are approaching the same position as the commercial airlines are in now. People don't fly just because they want a plane ride. They want to get somewhere, and flying happens to be the best (fastest, cheapest, most convenient) way to get there. If it's not, they will choose a better way to go or they will stay at home. And most people are no longer interested in "roughing it" (wearing goggles and helping to start the engines). In most cases, the less they are aware of the fact that they are flying, the better they like it. This attitude is reflected in boarding ramps at the airports, and music, drinks, dinners, and movies while in flight. And people are only interested in new developments insofar as they are directly affected. A revolutionary new jet aircraft design is of interest only if it means a faster, quieter, or more comfortable trip. Note what is stressed in the Boeing 747 advertisements. New navigation, propulsion, and control systems are ignored in favor of winding staircases and plush accommodations. Pilots fly planes, people pay to ride in them, and there are a lot more people than pilots.

The same rules will apply to minicomputers. New architectures, bussing structures, and addressing modes are only appreciated in terms of benefits which the user can see. Applications programs will be written for the user, not by him, and he will only be interested in the performance of the entire system as it affects his particular problem.

THE MYTH OF THE ULTIMATE PROCESSOR

But we are continually improving our machines. We are coming up with better performing hardware/software systems every day. I don't think that faster processors and more powerful languages are the whole solution. Let me illustrate by carrying the current trends to their ultimate goal. Suppose a man wants to generate an amortization schedule for a home loan. State of the art in minicomputers has reached the point where he can get a zero-cost, infinitely-fast processor with 4K of memory and a super-powerful new compiler called "ENGLISH". The steps he goes through to generate the amortization schedule may be familiar to many readers:

1. He sits down to tell the computer (in "ENGLISH") to generate a loan amortization schedule. He discovers that no I/O device was provided. So he buys a teletype (with controller) for $2,000.
2. He tries to load the "ENGLISH" compiler paper tape into the machine. Discovers that "ENGLISH" requires 8K of core; he only has 4K. So he buys another 4K of core for $5,000.
3. He is about to load "ENGLISH" when he discovers that the MTBF on the teletype is shorter than the time it takes to load the tape. So he buys a high-speed photoelectric paper tape reader for $3,000.
4. He loads the "ENGLISH" compiler.
5. He types (in "ENGLISH") "GENERATE AMORTIZATION SCHEDULE (CR, LF)".
6. Immediately, the system starts to punch a binary tape. However, halfway through, the teletype punch breaks down. So he buys a high-speed punch for $2,000.
7. He punches the binary tape.
8. He loads the binary tape.
9. He starts the program and types in the amount of loan, interest rate, and term.
10. Immediately, the system starts printing output, one line for each monthly payment. It takes a total of forty-five minutes to print all 360 lines. Meanwhile, the man stands there, with his fingers in his ears, hoping that the teletype printer will not break down before all the output has been printed.
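Step 9 supplies only the amount of loan, interest rate, and term, and step 10 prints 360 lines. For a present-day reader, the whole computation fits in a few lines of C; this sketch is added purely for illustration (the loan figures are invented, and of course nothing like it existed for the machines under discussion):

    #include <stdio.h>
    #include <math.h>

    /* Print a loan amortization schedule: one line per monthly payment. */
    int main(void)
    {
        double principal = 30000.00;     /* amount of loan (invented) */
        double annual_rate = 0.085;      /* interest rate (invented) */
        int years = 30;                  /* term: 30 years = 360 payments */

        int n = years * 12;
        double r = annual_rate / 12.0;
        double payment = principal * r / (1.0 - pow(1.0 + r, -n));
        double balance = principal;

        printf("%5s %10s %10s %10s %12s\n",
               "month", "payment", "interest", "principal", "balance");
        for (int m = 1; m <= n; m++) {
            double interest = balance * r;
            double toward = payment - interest;
            balance -= toward;
            printf("%5d %10.2f %10.2f %10.2f %12.2f\n",
                   m, payment, interest, toward, balance);
        }
        return 0;
    }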
Some may say that the example is an exaggeration. It may be, but I wonder if they have ever tried to generate an amortization schedule using a minicomputer in its "basic configuration". The point is that if one were to substitute zero-cost, infinitely-fast processors into most existing minicomputer systems, the total system cost and overall system throughput would not be significantly affected. The following chart compares the price/performance characteristics at the beginning and at the end of the example:

              Before        After
    Price     $0            $12,000
    Speed     Infinite      10 char/sec

A CALL FOR UNITY

So far, I have tried to make three points:

1. Computers should be designed for the user, not for the designers. The user wants a system to solve his problems, not a computer to program.
2. The best way to optimize the overall performance of a system is to take a unified approach in the design of the system's components in order to optimize their performance together.
3. I/O is by far the weakest link in current minicomputers. The total cost and overall performance of most existing minicomputer systems would not be greatly affected if we substituted a zero-cost, infinitely-fast processor and a super-powerful programming language.
If an incidental operation cannot be eliminated, it should be made transparent or at least as painless as possible. Before we approach the problem of new I/O schemes, let us approach the problem of approaching problems. I think that we often misdirect our efforts due to taking too narl"ow a view of a given problem. It's like struggling to climb over a wall when, if we had stepped back and looked at the whole scene, we would have seen the open gate a short distance away. The important thing is to define the real problem (in this case, to get to the other side of the wall, not to climb over the wall) and to take a sufficiently broad view of the problem so as to include several alternative paths from which to select the best. Don't look for a way to improve existing methods. Rather, carefully define the real problem, then try to find the best way to solve that problem. The best solution may be to improve upon existing methods, but then it may be a totally different approach. Rapid advances in technology necessitate constant reevaluation of goals and methods. Decisions which were valid two years ago may have lost their validity due to technological developments. Let us try to reanalyze some of the basic characteristics of I/O and perhaps suggest some new approaches to minicomputer I/O design in the light of current (and projected) technolo~v I/O operations can be divided into two groups: 1. Those which are intrinsic to the solution of the problem at hand, and 2. Those which are incidental to the solution. Intrinsic 1. Input loan description 2. Output amortization schedule Incidental 1. 2. 3. 4. Load compiler Type program Punch object tape Load program COMPUTERS TA-LK TO PEOPLE Until recently, man/minicomputer communications have been rather poor. The teletype has been by far the predominate minicomputer I/O device, primarily due to an unapproachably low cost for a combined keyboard, printer, tape punch, and tape reader facility. Rather than ask how we can improve upon the teletype, let us ask "What is the best way to talk to 680 Spring Joint Computer Conference, 1970 computers?" The answer is contained in the question. Most interpersonal information is conveyed by speech. ~ven "HAL", the ultimate computer talked and listened to people. ("Yes", you might say, "but look what happened to him.") Notice, however, that not a single teletype was to be seen (or heard) throughout the entire Space Odyssey. Unfortunately, inexpensive spoken communication with minicomputers is not (yet) within the state of the art. So we must ask what is the next best method. Obviously it is visual communications. Man can as simulate visual information very rapidly. Ten characters per second is much too slow, one hundred per second is adequate, and a picture is worth a thousand and twenty-four words. I think that we are on the right track with some of the low-cost CRT terminals which have been and are being developed. One objection which is usually raised about CRT output is the lack of hard copy. True, this may be a limitation in some instances, but how often do you really need hard copy? Suppose you could store scrolls of output in a file somewhere and call them back for CRT display and manipulation very rapidly? Again, the solution space is different for different statements of a problem. N ow, how should man talk to a computer? Remember, most new users will be non-technically oriented. We should attempt to tailor the computer to the people, not vice versa; Let the machine do the work. 
This is in keeping with the trend toward less expensive machines and more expensive people. I firmly believe that the human finger is much better for pointing than for typing. Given a fast CRT output, a very efficient input method is the selection of a reply from a computer-generated "menu". Let the computer guide the user and help, rather than hinder, in the solution of his problems. COMPUTERS ALSO TALK TO MACHINES Peripheral device interfacing is the area where we have had more experience, since we have long been attacking such problems as "How can we make our machine talk to a teletype?" (instead of "How can we make our machine talk to the person sitting at the teletype?") . I think new developments in technology and new applications areas warrant a new look at the area of interfacing peripheral devices to minicomputers. I think we can find ways to design better device controllers, faster and at a much lower cost. We spend most of our time trying to develop integrated processor/software systems. We take advantage of quantity production techniques to lower hardware costs. We do everything we can to minimize engineering and programming time for the basic system, then we design a unique controller (and write new support software) for each new peripheral device. Weare very interested in statistics concerning the amount of time our CPU's are busy, but do we realize how inefficiently our device controllers are used? Most integrated circuit devices can easily run at a ten megacycle clock rate. Yet an I.C. teletype interface runs on a 110 cps clock. A typical photoelectric reader reads 300 characters per second. This means that such device controllers are only active on the order of 0.001 percent of the time. The remaining 99.999 percent of the time, the high speed logic gates are idle and only a few flip-flops are needed to hold some logical state information. To me, this clearly suggests multiplexing, or in some way time-sharing the control logic among several devices. PARTITIONING OF I/O FUNCTIONS The inclusion of I/O design as an intrinsic part of the overall computer system design provides a much larger space over which to distribute the functions necessary for I/O operations. For example, we could choose to implement a full duplex teletype controller using only one flip-flop, a clock, and two level converters, and provide timing and control functions in software. To add a photoreader merely requires device addressing capability, perhaps another flip-flop, and an addition to the software. This argument begins to fall apart when we add too many devices, or hang on a fast device (such as a magnetic tape unit). But does it really fall apart? How much I/O could your minicomputer handle if all it had to do was I/O? How much more could it handle if the I/O routines were in read-only-memory, rather than in core? Minicomputers are commonly being used to handle I/O for larger machines. "But", you say, "separate I/O processors are only warranted for very large machines. They are much too expensive to .be included in a minicomputer." A WAY-OUT IDEA(?) Let us consider this approach before we dismiss it as unrealistic. Suppose we were designing a minicomputer as an integrated CPU/software/I/O system. We could A Systems Approach to Minicomputer I/O choose to include two identical sets of processor logic, each with access to main memory and to each other. We could micro program one to act as a CPU and the other as an I/O processor. 
We could provide the absolute minimum hardware necessary in the device controllers and let the I/O processor do the rest of the work. Would this really be expensive? How much would it cost to add a duplicate set of cards? They have already been designed. Provisions have been made for their production and testing. Programming support has long since been developed. The existing core memory, power supplies, and cabinet can be shared. The amount saved in device controller design and implementation should greatly exceed that spent for the I/O processor. And consider the power of such a system. I/O instructions, block data transfers, virtual memory schemes, multi-level priority interrupts, and special user-defined I/O functions all take on a new dimension. 681 Whether or not such an I/O scheme is feasible requires much further consideration. The important point is that it is an example of what might be possible in an integrated design approach. SUMMARY 1. Minicomputer performance and development is I/O bound, especially in the man/machine interface area. 2. It is time we stood back and looked at minicomputer I/O, not in terms of how we can improve on existing techniques, but by analyzing what we want to do (the total problem) and deciding on the best way to do it (a total solution). 3. I/O should be designed into not onto the system. An integrated CPU/software/I/O design will result in optimum performance. 4. New approaches to existing problems might lead us in exciting new directions. A multiprogramming, virtual memory system for a small computer by C. CHRISTENSEN and A. D. HAUSE Bell Telephone Laboratories, Incorporated Murray Hill, New Jersey OVERVIEW The specific objective of this small computer system is to interface six to eight small graphical terminals l to a large batch-processing computer. The small computer provides the graphical terminals with real-time processing for generating, editing and manipulating graphical or text files. The small computer passes along to the large computer requests for large tasks. Access to the data base in the large computer is provided. Another aspect of this objective is remote concentration. The terminals are connected to the small computer directly or through several DATA-PHONE® 103 data sets. The small computer is connected to the large computer through a single DATA-PHONE® 201 data set. This configuration reduces communication costs for a group of terminals located remotely from the large computation center. The general objective of this system is to investigate memory management strategies in small computers. In particular, can large computer techniques be applied? How big are the required programs? To what extent can high processor speed be substituted for large primary memory size? The hardware configuration for the system is as follows: The computer is a Honeywell DDP-516 with an 8192-word, 16-bit, .96 p,s core memory. Secondary memory is a special fixed-head disk by Data-Di,sk, Inc., which has 64 tracks, packed 8192 words/track, and operates at 30 rps rotational velocity. The disk is connected to the computer through a high-speed Direct Memory Access Channel. The disk is sectored in 8-word blocks (hence, a 16-bit address just suffices for the disk sectors). A Soroban 600 cpm card reader is connected to the computer I/O bus. A special serial transmission system (called the I/O loop) is also interfaced to the I/O bus. Currently, four DATA-PHONE® 103 data sets are connected to the I/O loop, but other I/O hardware may be added relatively easily. 
Low-level software support (primarily, a fancy MACRO assembler) is provided on a GE-635 computer, not on the DDP-516 itself. Loaders and debugging aids have been written for the DDP-516.

The system supports a virtual-memory addressing scheme and a multiprogramming user environment.2 The system manages memory (moves programs and data between core and disk, on demand) and all disk I/O, and provides the low-level interrupt handler for the local teletypewriter, the card reader, and the I/O loop. The system supports virtual addressing by providing a mechanism to convert virtual addresses to real core addresses, a task that requires memory management if the addressed data is not currently in core. The multiprogramming support is provided in the form of the tables and memory management required to automatically switch control from one user to another without interference.

MEMORY MANAGEMENT

Segmentation

In order to free the programmer from the task of memory management, that is, the supervision of the movement of data and programs between primary (core) and secondary (disk) memories, it was decided that the programmer should address a large virtual memory space rather than the physical storage media. The system handles the tasks of converting virtual addresses into physical addresses and of making the data available for processing (moving data into core).

Space is allocated within the virtual memory on the basis of segments.2 As usual, a segment is a named block of storage which contains contiguous words. The first word of a segment is at relative address 0. In our system, a segment may contain any number of words up to 2047. Furthermore, the segment is the physical storage allocation unit; segments are not physically subdivided (paged). A physical segment consists of a logical segment plus a few extra words required by the memory management system. Although segments are permitted to contain 2000 words, in practice most segments are limited to 512 or fewer words. One reason for this limitation is that a DDP-516 memory-reference instruction has a 9-bit address field, so that a reference to a location greater than 511 would require indirect addressing. Moreover, the approximately 4000 words of core storage available for segments would be too easily clogged by larger segments.

For convenience, ID's (segment numbers) rather than segment names are used internally for segment references. An ID is assigned the first time the segment is encountered, e.g., when it is first loaded into the system, or when a data segment is created at run time. Whenever a segment is deleted, its ID is reclaimed for future use. The ID is a 15-bit number, which means that the system can handle in excess of 32,000 segments. Hence, if each segment contained 500 words, the virtual memory space would contain 16,000,000 words, far in excess of our present physical storage capacity. As a practical matter, the current implementation permits just 4096 ID numbers, but this can easily be expanded if required.
Files

In order to facilitate handling large character strings that would not fit into a convenient size segment, we have implemented a higher-level storage classification called the file. A file is a linked group of segments. Each segment in a file has a forward and backward pointer ID to the succeeding or preceding segment in the string. A user has a private file directory which lists his own files by name. The directory provides a link from file name into the file via the ID of the first segment in the file. Note that files may be any size, because there is no imposed limit on the number of segments that may be linked together, other than the total number of ID's available. The system provides routines for accessing files by name and for fetching and storing characters in a file. These routines make the segment boundaries invisible to the user, which is a great programming convenience. The system also provides a public file directory so that data may be shared. One user's private files are inaccessible to other users.
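The paper does not give the layouts, but the linked-segment file structure it describes might be sketched as follows in C. All names, field widths, and sizes here are invented; the routine shown is a guess at how the character-fetching routines hide segment boundaries:

    #include <stdint.h>
    #include <string.h>

    #define MAXSEG 4096                    /* current implementation: 4096 IDs */

    /* One segment of a file, with forward/backward pointer IDs. */
    struct segment {
        uint16_t next_id;                  /* forward pointer, 0 = end of file */
        uint16_t prev_id;                  /* backward pointer */
        uint16_t nbytes;                   /* characters held in this segment */
        char     text[128];                /* data area (size invented) */
    };
    static struct segment segs[MAXSEG];    /* toy backing store; really on disk */
    static struct { char name[12]; uint16_t first; } directory[16];

    static struct segment *seg_by_id(uint16_t id) { return &segs[id]; }

    static uint16_t dir_lookup(const char *filename)   /* name -> first ID */
    {
        for (int i = 0; i < 16; i++)
            if (strcmp(directory[i].name, filename) == 0)
                return directory[i].first;
        return 0;
    }

    /* Fetch the character at position pos in the named file, walking the
       segment chain so the caller never sees a segment boundary. */
    int file_getc(const char *filename, uint32_t pos)
    {
        uint16_t id = dir_lookup(filename);
        while (id != 0) {
            struct segment *s = seg_by_id(id);
            if (pos < s->nbytes)
                return (unsigned char)s->text[pos];
            pos -= s->nbytes;
            id = s->next_id;
        }
        return -1;                         /* past end of file */
    }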
Addressing

The hard-core system program uses conventional DDP-516 addressing, without any special software structure or restrictions. Since the hard-core system is "bolted in," the memory management system does not have to handle it. The segmented programs (those which are handled by the memory manager and use the virtual memory space mentioned previously) require four specialized addressing modes for various purposes. Before these modes are described, it would be helpful to consider two major system requirements that the addressing scheme was designed to support. First, it was required that the memory manager not have to relocate addresses within a segment when that segment moved from one core location to another. This requirement could be removed, but it would significantly increase system overhead (both space and time). Second, it was required that segmented programs not contain internal variable storage, so that such segments could be used concurrently by any number of users without interference. Thus, it is not necessary to provide multiple copies of such "pure procedure" segments, which saves core storage space.

The first specialized address to be considered is the intra-segment pointer, i.e., an address that points to a location within the same segment that contains the pointer. Fortunately, it was possible to satisfy the relocation requirement by pre-empting the DDP-516 index (X) register for use as a base register. This scheme works because the index register can be attached to any address, whether it be in a memory-reference instruction or a full-word indirect address. Furthermore, indexing is controlled independently of indirect addressing. In particular, whenever a segment is in execution, the index register contains the starting address of that segment. Then, all intra-segment references have the index bit "on" and the address field set to the desired relative address within the segment.

The second specialized address is the absolute address. As its name implies, this address points to a fixed core location in sector 0 (otherwise, such an address could not be used in a memory-reference instruction). Hence, the index bit is 0. There are two distinct uses for the absolute address. One use is to refer to fixed information in the system. In particular, a transfer vector is required in order to reach various system subroutines, none of which are located in sector 0. In addition, a pool of generally useful constants is provided so that segments may be spared the necessity of containing their own copies of these constants. The other distinct use of sector 0 is to provide a pool of temporary storage locations for use by segmented programs. This pool of variables (the "thread-save") belongs to the currently executing thread. The thread-saves of all other existing threads occupy other places within core. When another thread is given control, its pool must be moved into sector 0. It should be noted in passing that if our computer had possessed a second index or base register, we would have used it to point to the thread-save and thus avoided the need to physically swap thread-save data when switching threads.

The third specialized address is a software-interpreted address called the virtual address. The virtual address is the general inter-segment address. It consists of two words:

word 1:  0 | ID(15)
word 2:  LL(7) | RA(9)

The first word contains the ID of the desired segment. The second word contains two fields. The low-order 9 bits (RA) of the second word are the relative address of the word within the segment. Note that only the first 512 words of a long segment can be referenced, which matches the limitation of memory-reference instructions. The left 7 bits of the second word (LL) form the "loose link," which is a pointer into the 128-entry table of in-core segments. The loose link need not point to the correct entry in the table, but if it does, conversion from virtual to physical core address is relatively rapid (~25 microseconds). Hence, whenever the system is required to convert a virtual address into a physical address, the loose link is properly set, so that subsequent address conversions will go at maximum speed. Note that the loose link contradicts the "no variable storage within a segment" requirement, because the system can change the loose link. However, such a change is never harmful, because the segment table entry is checked before it is used. Also, since the same segment table is used for all threads, a correct loose link for one thread will be correct for any thread.

The fourth specialized address is a direct address. A direct address is an absolute core address of a location inside a segment. A direct address must be stored within a certain subset of the thread-save, in order that the system be able to find and relocate the address if the segment is shifted to a new core location. Moreover, a segment that is referenced by a direct address is locked into core, else use of the direct address could give a spurious result. The direct address does not add any significantly new addressing capability beyond that provided by the virtual address. However, it is useful because it is fast (hardware interpretation) compared to the virtual address. Use of the direct address does inflict some costs, in particular, the locking into core of the referenced segment, and the system overhead required to relocate the direct addresses during a core shift.
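The loose-link conversion just described can be sketched in a few lines of C. This is an illustration, not the system's code; the names are invented, and pull_from_disk is a toy stand-in for the memory-management path that makes space and reads the disk:

    #include <stdint.h>

    #define TABSIZE 128                    /* the 128-entry table of in-core segments */

    struct seg_entry { uint16_t id; uint16_t base; };  /* segment ID, core base address */
    static struct seg_entry segtab[TABSIZE];

    static uint16_t pull_from_disk(uint16_t id)        /* toy stand-in */
    {
        segtab[TABSIZE - 1].id = id;
        return TABSIZE - 1;
    }

    /* Convert a two-word virtual address to a core address.  word1 holds
       the ID; *word2 packs the loose link LL (left 7 bits) and relative
       address RA (low-order 9 bits).  LL is only a hint, rewritten when
       it misses so the next conversion goes at maximum speed. */
    uint16_t virt_to_core(uint16_t id, uint16_t *word2)
    {
        uint16_t ll = *word2 >> 9;
        uint16_t ra = *word2 & 0777;

        if (segtab[ll].id != id) {         /* hint missed: take the slow path */
            uint16_t i;
            for (i = 0; i < TABSIZE; i++)
                if (segtab[i].id == id) break;
            if (i == TABSIZE)
                i = pull_from_disk(id);    /* not in core: fetch, perhaps pushing */
            ll = i;
            *word2 = (uint16_t)((ll << 9) | ra);   /* reset the loose link */
        }
        return (uint16_t)(segtab[ll].base + ra);
    }

Note that rewriting *word2 is exactly the harmless variable storage inside a segment that the text concedes.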
When core is filled with segments and a new segment is required, one or more of the in-core segments must be pushed (written onto disk or just discarded) to make room for the new segment. The algorithm for choosing which segments to push out of core is simple. A sequential scan of the segment table produces push candidates. The scan begins where the last push scan ended and ends when successful pushes have yielded the desired amount of space. A candidate is pushed if and only if the second of the two segment header words contains, zero, i.e., there are no direct or inter- SEG. TYPE(5) SEG. SIZE(II) INTERRUPT ADDR. COUNT (6) DIRECT ADDR. COUNT (10) BASE ADDR. SEGMENT 686 Spring Joint Computer Conference, 1970 rupt addresses pointing to the segment. The leading bit of the first header word (hence, the leading bit of the segment type) tells whether the segment must be written on disk when it is pushed. Each time the user establishes a direct address to a segment, the direct address count for that segment is incremented. When the direct address is deleted the count is decremented. The. direct addresses stored on the users' call push down lists are also included in this count. When a call is executed a direct address to the called segment is pushed on the list and the count for the called segment is incremented. Upon execution of a return the direct address is popped off the list and the count is decremented. Hence, all segments on user push down lists are locked into core, as are all segments pointed to by direct addresses established by the user. I/O interrupt handlers are also allowed to be segments; when in use they are locked into core by incrementing the interrupt address count (high order six bits of the second segment header word). When enough segments have been pushed out of core to make the desired· amount of space, the holes left by the pushed out segments are gathered at the top of core. This is accomplished by moving all the segments above the holes down over the holes. Moving the segments in core requires that all direct addresses be changed to reflect the core shift. These include the segment addresses in the segment table, the users' call push down lists, direct addresses established by the users and various other system pointers. Since this is a long list and shifting the segments down is a long task (about 50 milliseconds), an attempt is made to free a large block of space (currently 1000 words) instead of just the amount requested. This makes the next few space requests easier to fill since a segment push and core shift are not required. DISK MANAGEMENT The half-million-word disk attached to the DDP-516 is divided into three areas. Eight K (K = 1024 words) is reserved for saving and restoring the hard-core system, 32K is allotted to disk management tables and the remaining 472K is used to store segments that have been pushed out of core. These segments on disk can be accessed by name or ID using the disk name table or disk ID table. Both tables are themselves on disk and comprise the disk management table area. Each entry of the disk ID table contains four words. SEG. DISK ADDR. SEG. HEADER NAME CROSS REF. CHECKSUM When a segment on disk is accessed using its ID the corresponding disk ID table entry is read and the segment's size is extracted from the segment header word (2nd word of entry). Then the system makes sufficient space in core for the segment. 
DISK MANAGEMENT

The half-million-word disk attached to the DDP-516 is divided into three areas. Eight K (K = 1024 words) is reserved for saving and restoring the hard-core system, 32K is allotted to disk management tables, and the remaining 472K is used to store segments that have been pushed out of core. These segments on disk can be accessed by name or ID using the disk name table or disk ID table. Both tables are themselves on disk and comprise the disk management table area. Each entry of the disk ID table contains four words:

SEG. DISK ADDR. | SEG. HEADER | NAME CROSS REF. | CHECKSUM

When a segment on disk is accessed using its ID, the corresponding disk ID table entry is read and the segment's size is extracted from the segment header word (2nd word of entry). Then the system makes sufficient space in core for the segment. Next the segment's disk address (1st word of entry), together with its size and the starting core address of the space just acquired, are given to a routine that reads the segment into core. Each entry of the disk name table also contains four words:

SEGMENT NAME | ID | CHECKSUM

When a segment is accessed using its name, the corresponding disk name table entry is accessed with the aid of a hash coding technique.3 The ID (3rd word) is then used to access the segment as previously described. The cross-referencing words in the two disk tables facilitate conversion from name to ID or ID to name.

When a segment is pulled into core from disk, a flag bit in the segment's header word called the disk restore bit is reset. If any changes are made to the segment while it is in core, this bit is set to 1, which indicates that the segment should be rewritten on disk if it is pushed out of core. Otherwise, the segment is thrown away when pushed, since an up-to-date copy already exists on disk. A new segment is created with the disk-restore bit set; when it is pushed out of core, a zero disk address in its disk ID table entry indicates that space for it must first be allocated on the disk. Segments can also change size while in core. If a segment is too large for its old slot on disk, a new disk slot is allocated and the old one is marked as a hole to be collected later. Disk garbage collection is triggered when the disk allocation pointer approaches the end of disk storage. An autonomous thread is then initiated in the multiprogramming system which relocates the segments on disk, bubbling the holes to the top. This is a long procedure and executes concurrently with other threads.
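A minimal sketch of the name-table lookup in C, assuming an open-addressed scatter store in the spirit of the cited technique; the hash function, probe sequence, and entry layout are all invented here:

    #include <stdint.h>
    #include <string.h>

    #define NAMETAB 4096                   /* disk name table entries (invented) */

    struct name_entry {
        char     name[6];                  /* segment name */
        uint16_t id;                       /* 3rd word: the segment's ID; 0 = empty */
    };
    static struct name_entry nametab[NAMETAB];   /* really resident on disk */

    /* Look a segment name up and return its ID, or 0 if absent. */
    uint16_t name_to_id(const char *name)
    {
        unsigned h = 0;
        for (const char *p = name; *p; p++)        /* invented hash */
            h = h * 31 + (unsigned char)*p;
        for (unsigned probe = 0; probe < NAMETAB; probe++) {
            struct name_entry *e = &nametab[(h + probe) % NAMETAB];
            if (e->id == 0)
                return 0;                  /* empty slot: no such segment */
            if (strncmp(e->name, name, sizeof e->name) == 0)
                return e->id;              /* then used to read the disk ID table */
        }
        return 0;
    }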
MULTIPROGRAMMING

The multiprogramming system uses a pure roadblock strategy, i.e., it gives a thread control and lets it compute until it roadblocks. The next thread is then given control until it roadblocks, etc. There is no fixed maximum slice of time for each thread. A thread can roadblock for several reasons. If the thread requests input or output, a roadblock occurs and the I/O proceeds under interrupt control. When the I/O is complete, the thread is unroadblocked. A thread can also address a segment which is not in core and roadblock until the segment is brought in from disk. If a thread is still roadblocked (I/O not completed yet) when its turn comes around again, it will be skipped. Thus a thread is given control only when its roadblock is removed and its turn comes around.

The heart of the multiprogramming system is the thread table. It contains a six-word entry for each of the possible ten threads:

ROADBLOCK BIT | RESTART ADDRESS
4 THREAD TEMP. DATA CELLS
PTR. TO THREAD SAVE BLOCK

When the system is otherwise idle, it scans this table for an unroadblocked thread (roadblock bit = 0). When a thread does unroadblock, its four temporary data cells are transferred to core sector zero, and the thread is restarted at the address specified by the first word of the thread table entry. This restart address is always within the hard-core system; before control is passed to an outside program segment, the thread's save block is moved into core sector zero. The thread save block is 80 words long and resides in the segment storage part of core. However, it is never pushed out of core.

THREAD SAVE BLOCK
Function                          No. of words
Push down lists                   24
User temp. data                   16
User direct addr.                 8
Sys. temp. data                   8
Sys. direct addr.                 15
Misc. sys. pointers and data      9

The thread save block contains all the data and pointers required by system and user in order to implement pure procedure programs. Before a thread's save block is moved into core sector zero, the save block in core sector zero is restored to the previous thread's save block. This data movement constitutes most of the overhead involved in changing threads and takes about a millisecond. However, most of the roadblocks that occur when a thread is using hard-core system programs require only the four data cells in the thread table entry to be in core sector zero. This brings the thread changing time down to about 50 μsec. For example, when an out-of-core segment is addressed, a thread could be roadblocked three or four times for things like reading the disk ID table, making space for the segment, and finally transferring the segment into core. Only after the segment is in core and the address is about to be computed is the complete thread save block required to be in sector zero.

There are several other interesting roadblocks which can occur. For example, a low usage program may be more compact and simpler if it is not pure procedure (required by multiprogramming). This is allowed by using a GATE statement at the start of the program. The gate allows only one thread to be in the program. Any other threads that tried to enter would be roadblocked until the first thread opens the gate on its way out. Of course, allowing only one thread at a time rules out this technique for high usage programs. A program can also request a thread to give up control. This is a useful technique to break up programs with long execution times, or to wait for some external event to occur without locking the other threads out of the machine.
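The scanning loop itself is simple enough to state in a few lines of C. This is a sketch of the strategy, not the system's code; run_until_roadblock stands for the dispatch and the sector-zero swapping of temp cells and save blocks described above:

    #include <stdint.h>

    #define NTHREADS 10                    /* the possible ten threads */

    struct thread {
        int      roadblocked;              /* roadblock bit */
        uint16_t restart;                  /* restart address within hard core */
        uint16_t temp[4];                  /* four temporary data cells */
        uint16_t save_block;               /* pointer to 80-word thread save block */
    };
    static struct thread threads[NTHREADS];

    extern void run_until_roadblock(struct thread *t);   /* stand-in for dispatch */

    /* Pure roadblock strategy: give an unroadblocked thread control and
       let it compute until it roadblocks; a still-roadblocked thread is
       skipped.  There is no fixed maximum time slice. */
    void scheduler(void)
    {
        int cur = 0;
        for (;;) {
            cur = (cur + 1) % NTHREADS;
            struct thread *t = &threads[cur];
            if (!t->roadblocked)
                run_until_roadblock(t);
        }
    }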
INPUT/OUTPUT

All input/output in the computer is done under the interrupt system with the aid of the I/O table. This table contains a five-word entry for every I/O device attached to the computer:

INTERRUPT HANDLER ADDRESS
BUFFER ADDRESS
MAXIMUM CHAR. POINTER | ESCAPE CHAR.
CURRENT CHAR. POINTER
INITIAL CHAR. POINTER | THREAD TABLE POINTER

The above example is a table entry for a character-oriented device. When input or output is desired, the appropriate program is called with the buffer segment and the escape character supplied as arguments. The called I/O program then fills in the I/O table entry, primes the I/O device, and roadblocks the thread. When an interrupt occurs, the system gives control to the location specified by the interrupt handler address in the I/O table entry for the interrupting device. For input, the characters would then be inserted in the buffer using the buffer address and the current character pointer. About 300 microseconds are required to handle each character interrupt for the teletypes. When an escape character match or full buffer is encountered, the thread table pointer in the I/O table entry enables the program to unroadblock the thread, and the I/O function is complete. Since a large number of different I/O devices are expected to be connected to the computer, with only a few of them active at one time, interrupt handlers are allowed to be program segments. When an I/O device becomes active, its interrupt handler segment is fetched and locked into core for the duration of its activity.

Disk I/O is controlled by the disk I/O queue, which contains twenty entries of five words each:

DISK TRANSFER PROGRAM
DISK ADDRESS
CORE ADDRESS
THREAD TABLE POINTER
OTHER DATA

A disk transfer is initiated by finding an empty queue entry and inserting the address of the appropriate disk transfer program and its arguments. The requesting thread is then roadblocked. The disk I/O handler is an autonomous process which goes on in the background of thread processing. When a disk I/O task is finished, the current disk rotational position is read and the disk addresses on the disk I/O queue are scanned to pick the task with the least latency. Control is then given to the picked entry's disk transfer program, which sets up the disk I/O. Upon completion of the task, the thread is unroadblocked using the thread table entry address in the queue entry.
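The least-latency pick can be sketched in C as follows. This is illustrative only: the rotational position is modeled as a word offset within a track, and the entry layout and names are invented:

    #include <stdint.h>

    #define QLEN 20                        /* disk I/O queue: twenty entries */
    #define TRACK_WORDS 8192               /* words per track on the fixed-head disk */

    struct disk_req {
        int      busy;                     /* entry in use */
        uint32_t disk_addr;                /* target disk address */
        uint16_t core_addr;
        int      thread;                   /* thread table pointer, to unroadblock */
    };
    static struct disk_req queue[QLEN];

    /* Given the current rotational position (as a word offset within a
       track), pick the queued request that will pass under the heads
       soonest; returns -1 if the queue is empty. */
    int pick_request(uint32_t rotpos)
    {
        int best = -1;
        uint32_t best_wait = ~0u;
        for (int i = 0; i < QLEN; i++) {
            if (!queue[i].busy) continue;
            uint32_t target = queue[i].disk_addr % TRACK_WORDS;
            uint32_t wait = (target + TRACK_WORDS - rotpos) % TRACK_WORDS;
            if (wait < best_wait) { best_wait = wait; best = i; }
        }
        return best;
    }

On a fixed-head disk every request is head-ready, so this rotational-position scan is the whole of the scheduling problem; at 30 rps it can save up to a third of a second per pass over first-come-first-served ordering.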
USAGE

A brief description of currently available programs is provided in order to demonstrate some of the system capability. The system does not as yet serve a community of "outside" users or applications programmers.

Log-in

The teletypewriter log-in procedure is controlled by a segmented program. When a user dials the system, the interrupt handler answers the telephone and initiates a thread for the user. The new thread executes the log-in program, which requests a password. The password identifies a user catalog of files. Then the log-in program sends thread execution to the monitor, which permits the user to select an applications program.

Text editor

One currently available applications program is a text editor. This is a very simple, line-oriented editor. It has the ability to enter one or more lines of text anywhere within an existing text file and to delete one or more existing lines. Of course, selected lines may also be printed out on the teletypewriter. The text input mechanism has a tab feature which permits the user to select the tab key as well as to position the tab stops. The command format is a single command letter followed by arguments, if any. The arguments are decimal line numbers for the print and delete commands, for example, or a line of text for the enter-text command. The text editor may create or delete files, as well as attach any existing file. This capability is also provided by other programs, but it is convenient to have these features available from within the editor.

Interpreter

The second available applications program is a TRAC-like interpretive program.4 This program can be used for text editing, but it is a general character-string manipulator. Like the editor, it is a "safe" program in that user errors do not bring down the system. Also, this interpreter provides a high-level programming language in which other applications programs can be written.

Debugging

We have provided a low-level debugging tool for use with segmented programs. This debugger is itself a segmented program, and it is designed to operate in the multiprogramming system environment. This is important, because it allows one user to debug without blocking other users. Also, it aids the programmer by freeing him from the necessity of knowing absolute addresses, which would be a painful requirement when the segments move within core or between core and disk. The debugger allows the user to print out or alter the contents of selected locations within a segment. The user may also print or alter some of the data in his thread-save block. The command format consists of a one-letter command, followed by arguments, if needed. Numerical arguments are given in octal; segments may be referred to by ID or by name, as in the following address:

segname/ra

where segname is the segment name (in ASCII) or ID (in octal), and ra is the relative address (in octal).

SOFTWARE SUPPORT

It is becoming common practice to support small-computer programming on a large computer system. We have available a GE-635 computer in our computation center, and we use it to assemble all of our DDP-516 programs. The most important reason for using this kind of computation center support is that, by using the available assembler (GMAP), we have access to MACRO instructions and many powerful pseudo-operations that are not available in the manufacturer's assembler (DAP). The difference may not be important for small programs, but it makes a vast difference in a comparatively complex project, such as the current effort. Another significant advantage of computation-center assembly is that it uses convenient output peripheral equipment, in particular, card punches and line printers.

Our use of the large computer assembler to generate small computer code is not novel, but apparently it is not widely used. It works well in our case because the two computers (GE-635 and DDP-516) are organized similarly (e.g., they are both word-organized, single-address), and in a rough sense, the smaller computer is almost a "subset" of the larger computer. Moreover, GMAP has provision for redefining old instructions and for defining new ones. Hence, it is relatively easy to get GMAP to accept DDP-516 assembly code (in GMAP format, but with DAP mnemonics). The GMAP binary output cannot be changed so easily, however, so a separate program, called a post processor, has been written to convert the GMAP binary output into a more suitable format for our requirements.

For this project it was convenient to have two assembler/post processor packages, one for segmented programs, and one for system programs. The segment assembler helps the user with the special segment addressing modes, and simplifies access to system programs callable from segments. The segment post processor converts the resulting binary into a form suitable for use by the segment loader. The system assembler is simpler and less restrictive than the segment assembler, and it uses conventional (for GMAP) inter-program linkage features. The system post processor includes a linking, desectorizing loader which loads one or more relocatable programs into a core memory image, then punches the result as an absolute program which can be loaded into the DDP-516 by a simple, compact loader. In both cases, the post processor prints an octal listing of the final output as a debugging aid. We have found it convenient to incorporate octal patch card facilities in both the segment loader and the system loader. Thus, the user is able to patch known errors in his binary decks before he has had a chance to reassemble.

STATUS REPORT

The system described above is currently working stand-alone (the 201 data set link to the large computer has not yet been implemented). The system supports four 103 data sets for communication with teletypewriter consoles. The graphical terminals are not yet available for connection to the system.
STATUS REPORT

The system described above is currently working stand-alone (the 201 data set link to the large computer has not yet been implemented). The system supports four 103 data sets for communication with teletypewriter consoles. The graphical terminals are not yet available for connection to the system. Hence, it is too soon to conclude whether the system can be used as a remote concentrator for graphical or other terminals connected to a large computer system. However, we can comment on the memory-management aspects of the objectives.

The current size of the hard-core system is just under four thousand words. This includes character handling routines, a 103 data set communication package, and a card reader package, in addition to the memory manager and multiprogramming support software. Hence, approximately four thousand words remain for program and data segments. The system has been exercised with four concurrent users; the segmented programs in use during this exercise included the text editor, the interpreter, and the segment debugger, which altogether represent five thousand words of program. The experiment generated and accessed several tens of thousands of data words. Delays due to multiprogramming were scarcely noticeable compared to delays due to disk latency. For example, it took ten seconds to sequentially access every character in a 20,000-character file. Note that this sequential access time is a function of the size of the data segments that make up the file. The data segment for this experiment was 64 words in order to minimize the amount of core required by each user. Doubling the data segment size would halve the access time; it would also increase multiprogramming interference. More experiments are needed in order to explore such trade-offs.

REFERENCES

1 H S McDONALD, W H NINKE, D R WELLER, A direct-view CRT console for remote computing, Digest of Technical Papers, International Solid-State Circuits Conference, Vol 10, pp 68-69, 1967
2 R C DALEY, J B DENNIS, Virtual memory, processes, and sharing in MULTICS, Communications of the ACM, Vol 11, No 5, pp 306-312, May 1968
3 R MORRIS, Scatter storage techniques, Communications of the ACM, Vol 11, No 1, pp 38-44, January 1968
4 C N MOOERS, TRAC, a procedure-describing language for a reactive typewriter, Communications of the ACM, Vol 9, No 3, pp 215-219, March 1966
5 D L MILLS, Multi-programming in a small-systems environment, The University of Michigan, CONCOMP Technical Report 19, May 1969

Applications and implications of mini-computers

by C. B. NEWPORT

Honeywell, Inc.
Framingham, Massachusetts

Over the past four or five years the largest growth segment of the computer industry has been mini-computers, and it appears that this trend will continue into the foreseeable future. Mini-computers have typically been defined by their price rather than by performance. As recently as early in 1969, some observers were classifying mini-computers as those having a price for a minimum system of less than $50,000. Today a more reasonable figure would be $20,000, and some people may even press for $15,000 or even $10,000, but perhaps at this level one is talking of micro-computers. These machines have had a fairly remarkable impact on the computer industry, since in some respects their performance is even better than that of their big brothers which have built up the computer industry over the past 20 years. For instance, many mini-computers have core cycle times and peripheral transfer rates which are considerably better than those of conventional large scale computers. Core cycle times of less than 1 microsecond are common, and some machines are in the region of a fraction of a microsecond. Maximum I/O transfer rates are frequently determined solely by the memory speed, and with 16 bit machines transfer rates of over 2 million characters per second are quite common. It is interesting to compare these parameters with an IBM System 360/50, which has a core cycle time of 2 microseconds and a maximum I/O data rate of 800K bytes per second on the selector channel.
The 360/50 could in no sense be classed as a mini-computer, and of course in other areas such as core size, instruction repertoire, range of peripherals, standard software, etc., it is a far more powerful machine than any mini-computer. Nevertheless this does indicate that for those applications where high speed, minimum complexity processing is required, and rapid I/O transfers are needed, mini-computers may well be more effective than their large brothers. Most large computers were designed basically for batch processing, either scientific or business, and the concept of high speed real time interaction with these machines tends to have been added as an afterthought. Thus, when one attempts to use large machines for systems such as airline reservations, time sharing, message switching, industrial process control, etc., one finds that it is relatively easy to burden the large processor with the simple tasks of handling communication lines, attending to external interrupts, and interrogating large data files, thus leaving no time for the basic computation that may need to be done. The realization of this is leading to the off-loading of the simple jobs handled by large machines on to small peripheral machines (mini-computers) which can be dedicated to high speed but relatively simple tasks. Tasks include the handling of error control, polling, and conventional communication disciplines over a number of communication lines, and then presenting packaged and checked messages to the large processor for subsequent handling. In time sharing, it is becoming clear that the "number crunching" machine used to invert matrices, solve linear programming problems and so on should be isolated from the relatively trivial tasks of sending and receiving messages to and from users, and from the tasks of scheduling and monitoring the performance of the overall system. Mini-computers designed for high speed character manipulation rather than computation can undertake many of these housekeeping chores more effectively than the big machines. In some instances mini-computers have proven to be very effective in taking on complete jobs that were previously thought to be the province of large machines. A good example is in message switching, where mini-computers are showing that they can handle effectively the switching of messages between as many as 100 or 128 low speed communication lines. In this application essentially no calculation is required, but there is an extensive amount of character manipulation, checking of character strings, and elaborate real time housekeeping. Figure 1 shows a diagram of a typical message switch indicating the way in which two computers may be used to provide a switching function, and redundancy in the case of equipment failure.

Figure 1-Message switching system

In a typical application, the CPUs would be DDP-516 class computers with 16K to 32K words of core, and a fixed head disc on each machine would be used for in-transit storage. The long term storage requirements for journaling and intercept would use moving head discs with replaceable disc packs. The incoming communications lines, some having relay interfaces to dc lines, and others having EIA type interfaces to modems, would be connected into individual line termination units.
These feed partially formed characters into the multi-line controllers (MLCs), which completely format the characters, check for parity, and perform other control functions before passing the characters and line identification into the CPUs. Input is taking place in parallel on both machines, so that in normal operation they can both build up identical information on the in-transit disc stores. Communication between the CPUs is via the intercomputer communication unit (ICCU) and allows both machines to insure that they are in step on a message or partial message basis. The watchdog timer (WDT) is an independent hardware device monitoring the performance of both CPUs and providing an alarm and switchover if one of the CPUs should fail. It will be seen that messages are inputted in parallel into both systems but outputted from only one of them. Figure 2 shows a block diagram of the program which would be running in one of the processors. Input characters from the multi-line controller are passed through the input processor and assembled into partial message blocks in the input buffer area. These messages, consisting of heading and text blocks, are then passed to the disc queue and transferred from core to the fixed head disc. As messages are completed on the disc they are transferred back into core one at a time for header analysis and routing. This is undertaken by the message processing program, and completely processed messages are returned to the fixed head disc where they are queued ready for output to the appropriate line. As the lines become free, the output processor program takes the messages off the fixed head disc a block at a time, buffers them temporarily in core, and then transfers them to the multi-line controller. It will be seen that the majority of the processing involved is examining strings of characters for particular sequences, and manipulating blocks of core storage being used for queues. Efficient data handling in both these areas, and also the ability to operate with a high speed fixed head disc, enables mini-computers to handle between one thousand and two thousand characters per second in a typical message switching application. Mini-computers are normally quite limited in the amount of core storage they can have, and it is interesting to note that in most communications applications there is a trade off between the amount of core storage required for input/output buffer blocks and queues, and the speed of the fixed head disc. With a high speed disc, small buffer blocks can be used in core, since these can be unloaded rapidly onto the disc before core saturation occurs. With a disc providing about 100 independent random accesses per second, buffer blocks holding in the region of 64 characters are normally acceptable and do not demand excessive core storage. However, if it were possible to increase the speed of the fixed head disc by, say, 4 times, an approximately equivalent reduction in the size of the buffer blocks could be made, and a corresponding reduction in the amount of core storage and hence in the cost of the system.

Figure 2-Diagram of basic message switching programs
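The core/disc trade-off lends itself to a back-of-the-envelope model. The C sketch below computes the smallest buffer block that can always be unloaded to the disc before it overflows; the steady-state formula and the parameter names are illustrative assumptions, using only the 100-access, 64-character, and 2,000-character-per-second figures from the text.

    #include <stdio.h>

    int main(void)
    {
        double char_rate = 2000.0;    /* aggregate input, chars/sec (from text) */
        double disc_accesses = 100.0; /* independent random accesses/sec        */
        int lines = 100;              /* low speed communication lines          */
        int block = 64;               /* block size actually chosen, with headroom */

        /* Steady state: the disc must absorb blocks at least as fast as
           they fill, so block_size >= char_rate / disc_accesses.         */
        double min_block = char_rate / disc_accesses;      /* = 20 chars */

        /* One block filling per line is the illustrative core estimate. */
        printf("min block %.0f chars; %d-char blocks on %d lines need %d chars of buffer core\n",
               min_block, block, lines, lines * block);
        /* A disc 4 times faster tolerates blocks 4 times smaller,
           cutting the buffer core requirement proportionally.       */
        return 0;
    }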
It is interesting to note that there are essentially 3 different parts of the program: input processing, message processing and output processing. The communication between these 3 basic program segments is almost entirely through data held on the fixed head disc, with the exception of pointers and status words. This leads directly to the consideration of multi-processor systems to handle applications beyond the capability of one mini-computer. One computer would be assigned to each of the major processing areas, and they would all have access to the common fixed head disc. Some simple means of passing limited amounts of status information between the computers would be necessary, but all the major data flow would be through the disc. The amount of core on each processor could be optimized to the task it had to perform, and in principle so could the power of the processors, although in practice it would be simpler to maintain identical processors in all cases. One would not expect the throughput to be as much as 3 times greater than that of a single processor, because of the inability to share spare time between the processors. For instance, if the input is particularly heavy and the input processor is becoming overloaded, it would be very difficult to arrange for, say, the output processor to take some of this load, whereas in a single computer system this can be arranged to happen automatically. While 3 computers might be expected to give somewhat less than 3 times the throughput of one computer, significant economies can be obtained in the redundancy, since now one standby machine can be used to replace any one of the 3 normally operating computers. Thus 4 machines connected in the appropriate way can provide between 2 and 3 times the throughput of 2 machines connected in a normal redundant configuration. This type of configuration clearly gives savings in cost and an increase in reliability over the use of one or two large machines. In addition, it can simplify the programming and the checkout, since separate tasks are confined to separate pieces of hardware. Time sharing is another application area that has until recently been the province of large computers. There are, however, now on the market a number of small time sharing systems based on a single mini-computer. These systems typically provide for 16 or 20 users working with the interactive terminal language BASIC, or sometimes FORTRAN. These systems clearly do not compete with the larger time sharing systems in data storage, power, and the variety of language facilities, library facilities, or ability to undertake extensive mathematical calculations. They do, however, provide a very useful service where simple fast access computation and data retrieval is required. The use of a multiple mini-computer configuration to extend the capability of the smaller systems upwards towards that of the large systems is well illustrated in the H1648 time sharing system.

Figure 3-Timesharing system

Figure 3 shows how the 3 computers are connected together to provide up to 48 simultaneous terminal users with the capability of programming in FORTRAN, BASIC, TEACH or SOLVE. The terminals are handled by a DDP-416 with 4K of core. This machine passes characters to and from the terminals, provides echo-back for transmission verification, provides some buffering, and passes characters one at a time into the control computer. The control computer and the job computer both share moving head disc files for data interchange, and they pass control information through an ICCU.
The control computer is essentially the executive of the system and provides the normal interaction between the user and his programs and data files, which are held on the discs. The user may build up programs and data files upon the discs, and when he requests that these programs be run, the control computer will queue his request for execution on the job computer. When the job computer is ready to execute the program, it will read the necessary files from the disc and will bring in any required system programs from the system disc. It will then compute for a predetermined period of time, in the region of a fraction of a second, and if the job has not been completed, swap it out on to the system disc and bring in the next job for a similar period of time. In this system, the tasks have been divided fairly cleanly between the 3 computers. There is considerable difference in the computing power of the machines, each machine being matched reasonably well to the task required of it. The communications computer is a DDP-416 with 4K of memory, while the control computer and the job computer are both DDP-516s with 32K of memory. The use of the multiple computer configuration has considerably increased the power of this system over that which would be possible with all functions undertaken in one machine, and it has simplified the task of implementing the software and of adding modifications. As a simple example of the independent usage of the computers, the control computer and the job computer can be isolated from the normal time sharing function and used for software development by the system programmers, while the front end communications computer can remain online to the terminal users. Clearly the terminal users cannot do their normal computation, but when they attempt to sign on, the communications computer can reply to them with a standard message informing them of the state of the system and when it will be back on the air. This is much more satisfactory than receiving no reply at all to an attempted sign-on and having to make a telephone call to verify the state and availability of the system. The two applications so far discussed are indicative of how mini-computers can take on tasks normally assigned to larger machines and provide benefits of low cost, separation of programming tasks, and economical provision of redundancy. In examining these and other similar applications of small computers, various questions arise which need answering if these mini-computers are to be as widely used as I believe they should. The main points of discussion are:

1. Shared peripherals versus shared core memories. The two applications mentioned have used shared discs as a means for transferring data between computers, but an alternative is to share core storage, or at least portions of it, between two or more processors. This has the immediate advantage that data is accessible to both machines without any need for an ICCU, and no time is lost in making data transfers. There may, however, be hardware difficulties in that shared core may cause a slowing down of the normal operation and can negate the conceptual speed advantage. It also becomes more difficult to provide redundant backup for the shared core memory, and the attempt to do so almost inevitably causes further slowing down of the effective cycle time. Probably the optimum solution is a large private memory on each processor, a small somewhat slower shared memory with redundant backup to provide data transfer between machines, and a large high speed shared disc to provide the basic bulk data transfer; one such arrangement is sketched below.
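To make the hybrid arrangement concrete, the sketch below shows one conventional way a small shared memory can carry status words and pointers between two processors: a one-slot mailbox that one machine fills and the other drains, while the bulk data moves by disc. It is a C illustration under modern assumptions (C11 atomics, a memory-mapped shared region, one producer and one consumer); the structure and names are hypothetical, not taken from the paper.

    #include <stdatomic.h>
    #include <stdint.h>

    /* One-slot mailbox resident in the small shared memory. */
    struct mailbox {
        atomic_int full;       /* 0 = empty, 1 = holds a message         */
        uint32_t   disc_addr;  /* where on the shared disc the data sits */
        uint32_t   status;     /* message/partial-message status word    */
    };

    /* Producer CPU: post a status word; spins while the box is full. */
    void post(struct mailbox *m, uint32_t disc_addr, uint32_t status)
    {
        while (atomic_load_explicit(&m->full, memory_order_acquire))
            ;                               /* wait for consumer */
        m->disc_addr = disc_addr;
        m->status = status;
        atomic_store_explicit(&m->full, 1, memory_order_release);
    }

    /* Consumer CPU: take a posted status word; spins while empty. */
    void take(struct mailbox *m, uint32_t *disc_addr, uint32_t *status)
    {
        while (!atomic_load_explicit(&m->full, memory_order_acquire))
            ;                               /* wait for producer */
        *disc_addr = m->disc_addr;
        *status = m->status;
        atomic_store_explicit(&m->full, 0, memory_order_release);
    }

Keeping only flags and pointers in the slow shared memory matches the division of labor the text proposes: coordination by status words, bulk transfer by disc.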
2. Efficient, simple, modular operating systems are required. If multiple mini-computers are really destined to effectively challenge the large computer business in the real time application area, it is vital that they develop some sophistication and maturity in their software systems. It is impractical to develop new software for every new application, and an attempt to do so will simply delay the introduction of these machines. Large, all embracing operating systems are not required, but simple, high speed, standard executives are badly needed. These should provide high speed handling of interrupts, simple task dispatching, clearly defined interfaces with application programs, and the ability for the user to start very simply and then elaborate by adding modules as required to do his particular job.

3. Large data storage peripherals. Most mini-computers have only been available with relatively small bulk storage devices, because their designers anticipated that these machines would only be used on small scale applications. The realization that large scale applications are also appropriate for mini-computers means that large bulk storage devices, 10 million to 100 million or more characters, are required. Because of the inherently limited core storage capabilities of mini-computers, these bulk store devices must have fast access. Moving head discs are the only acceptable devices at present, but even these are too slow in many applications. A typical moving head disc provides approximately 10 random accesses per second on an average, but some applications require that this be increased by an order of magnitude. Fixed head discs can be used, but it may also be possible to design some form of hardware queueing into the disc file controller. This would enable the computer to output a series of requests, perhaps 10 or 20, to the disc file controller, which would then compare its current position simultaneously with all the stored access requests. It would then automatically make those transfers which were closest to its current position and so minimize the average disc latency. Assuming requests for access are made at random positions on the disk, the effective reduction in latency would be dependent on the number of requests that could be stored in the controller and searched simultaneously. It seems plausible that an improvement of 5 or 10 times could be made; a software rendering of the idea is sketched below.
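The following C fragment mimics what such a controller would do in hardware: among the stored requests, pick the one whose sector comes up next under the heads. The fixed track length of 64 sectors and the function name are assumptions for illustration.

    #include <stddef.h>

    #define SECTORS 64                /* assumed sectors per revolution */

    /* Among n pending sector addresses, return the index of the request
       with the least rotational distance ahead of the current position;
       this is what the proposed controller would compute in hardware.   */
    size_t next_request(const unsigned req[], size_t n, unsigned current)
    {
        size_t best = 0;
        unsigned best_dist = SECTORS;
        for (size_t i = 0; i < n; i++) {
            /* distance the disc must rotate before req[i] arrives */
            unsigned dist = (req[i] + SECTORS - current) % SECTORS;
            if (dist < best_dist) {
                best_dist = dist;
                best = i;
            }
        }
        return best;
    }

With requests at random positions, a lone request waits about half a revolution on average, while the nearest of n waits roughly 1/(n+1) of one; for 10 to 20 stored requests that is on the order of a 5-to-10-fold improvement, consistent with the estimate above.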
4. Some problems will always require large machines. When a problem requires extensive scientific calculation, long word length, hardware floating point, and extensive demands on core storage, it will always require the use of large machines. This will also be true of problems that cannot reasonably be broken down into small component parts. In these cases mini-computers could still be expected to serve as peripheral processors around the large machine.

5. Manufacturer support. The widespread use of mini-computers will depend on the support available to design and implement actual systems, since ideas for applications will always exceed the supply of those capable of seeing the application through to successful implementation. It will be essential for manufacturers to supply simple and efficient software modules, all useable with each other, and supported with high quality documentation. Also implementation assistance, effective field service, and maintenance support will be needed. System analysis and design may also come from the manufacturers in some cases, but will be more likely to be provided by consultants. Manufacturers must provide meaningful training courses and effective manuals on the equipment, the software, and its potential applications. Too few people know what to do, and those who do must generalize, write it down, and distribute it as widely as possible so that others may learn.

Mini-computers have a great future, limited more by our collective ability to understand how they can be used than by any deficiencies or omissions in the hardware.

Teleprocessing systems software for a large corporate information system

by HO-NIEN LIU and DOUGLAS W. HOLMES

Pacific Gas and Electric Company
San Francisco, California

INTRODUCTION

One of the functions of management is to control the organization in such a way that it responds to changes and deviations in the optimum manner. The magnitude of the deviation from the established goal often depends upon the length of the delay in response; any deviation from the best performance objectives must be quickly detected and corrective measures applied promptly. A fast response corporate information system is designed to accommodate this criterion with the following capabilities:

1. Keeping the Corporate Data Base Freshly Updated

Source data may be transmitted directly into the computer to improve the efficiency of the information flow, thus providing prompt and accurate collection of data from widely dispersed areas. This capability can at least provide the following benefits:
• Reduction in human waiting time.
• Reduction in idle resources.

2. Extending the Usage of the Corporate Data Base

New applications could be added to provide benefits not previously available.
• Direct exchange of information with the corporate data base helps users in diverse locations keep abreast of rapidly changing events. For example:
• • Immediate presentation of operating status aids decision making.
• • Rapid transmission of decisions to the point of execution can be accomplished.
• • Swift distribution of decisions to the associated parties for supplemental decision-making is completed within the time frame.
• • Timely feedback of the results of the decisions allows adjustments to the operating environment in an incremental manner.

A well planned and developed teleprocessing system will provide the backbone of a fast response corporate information system. The remainder of this paper describes the requirements, strategy, facilities, and actual implementation of such a teleprocessing system.

SYSTEM REQUIREMENTS

The following requirements are essential for teleprocessing support of a growth-oriented corporate information system.

1. Support for a Variety of Terminal Types

Each terminal installation must be reviewed to determine the specific terminal type which can best handle the types and volumes of information processing typical of that location. The system must be capable of supporting, in addition to the standard devices, several special devices tailored to satisfy special situations.
• The standard devices will include:
• • Typewriter terminals
• • CRT terminals
• • Low-price, short-message terminals for data entry
• • Card readers, card punches, and line printers for remote locations.
• The special devices could include:
• • Analog transducers
• • Process control computers

2. Centralized Control of Tele-Communications Network

To assure efficient information flow and optimal utilization of the communications network, control of the teleprocessing system should be centralized so that resources can be allocated dynamically to satisfy changing demands. Conventional systems have often allowed segmentation of the system resources into disassociated subsystems between which temporarily unused resources cannot be shared.

Figure 1 (curve labels: permanent deviation, i.e., failure to accomplish goal; delay in response)

For a large scale party-line (multidrop) network, provisions should be made to maintain a network discipline that will ensure increased system efficiency as well as dependable service. For example, continuing to poll a malfunctioning terminal which fails to reply degrades service to other terminals on the same line and wastes central processor time. Such a terminal, therefore, should be deleted logically from the network until it is capable of replying. By contrast, when maintenance personnel are testing a malfunctioning terminal on-line, the system must poll this terminal as usual until testing is completed. The network control program should provide at least the following functions:
• Polling and addressing network discipline.
• Threshold error counters to allow automatic deletion of malfunctioning lines and terminals.
• Diagnostic terminal mode which bypasses the automatic deletion of malfunctioning lines or terminals to allow for on-line hardware maintenance.
• Manual stop and start for lines and terminals to allow the console operator to control the network.

3. Support for a Variety of Message Types

Each application project should be able to select the message types best suited for its needs. The system should support a variety of basic message types which may be used independently or as building blocks for more complex activities. These basic types of messages are:
• INQUIRY: An operator may ask predefined questions by specific transaction codes.
• RETRIEVAL: A user may select and examine information elements from the data base.
• DATA ENTRY: An operator enters new information into the data base, whether update occurs immediately or later.
• JOURNAL: An application project reports the status of transactions previously processed.
• MULTIPLE DESTINATION: A message processing program responds to a single request with messages directed to two or more locations.
A complex activity example:
• DATA CHANGE (INQUIRY + DATA ENTRY): An operator inquires into the data base and then enters changes based on the inquiry response.

4. Versatile and Balanced Message Control Facility

In support of these message types, the message control program should also provide the following services:
• HEADER BUILDING: Identification, time stamping, routing, and classification of messages to permit off-line analysis of message flow in addition to on-line control.
• QUEUE MANAGEMENT: For a large real-time system, the interval between message arrivals is often less than the service time, so that messages cannot be processed serially nor can the system keep up with the demand for its resources.
The resulting backlog of messages must be managed to smooth out peak loads and provide a tolerable response time.
• PRIORITY MANAGEMENT: Certain activities are of such importance that they demand immediate attention regardless of the backlog of other messages. A priority scheduling mechanism would permit such activity to avoid long waits in queues by providing express routes throughout the system.

5. Efficient and Easy Applications Programming

Economic considerations require that an approach be taken which reduces programming, testing, and maintenance costs of message processing programs. The teleprocessing system should present an interface which permits such cost reduction.

6. Testing Provision for Message Processing Programs

Provision must be made to facilitate testing new or modified message processing programs in an actual operational environment without endangering the on-going operations. The system should protect at least the following resources:
• DATA BASE: Retrieval of data elements from the data base should function normally for the testing program, but any attempts to update the data base, directly or indirectly, must be intercepted.
• OPERATIONAL PROGRAM: A different storage protection key should be assigned to the testing programs.
The system should also provide the following functions:
• MESSAGE TRACE: Print or log every work area associated with a testing program to show the message in different stages during processing, as a diagnostic aid. This function should be available by request for operational programs also.
• REFRESHING: After trying a specific condition to which the testing program fails to respond normally, it is desirable to refresh the copy of the testing program from the library so that different conditions can be tested, to speed up the debugging cycle.
• TASK INDEPENDENCE: If one testing program fails, the system must take action for abnormal termination of this individual program (subtask); however, all the other programs in the same region should continue processing independently of this failure.
• TIME LIMITING: A program should be terminated if it does not complete within a specified time limit. This function is of value for operational programs, but especially for testing programs, to break tight programming loops.

7. Data Base Security

To protect the integrity of the corporate information system data base, security measures must be provided against unauthorized update and retrieval of privileged information. Security should be a function of the operator's level of authority, the location of the terminal, and the transaction code, as compared to the sensitivity of the data element.

8. System Reliability

A real-time information processing system must demonstrate its reliability to its users. There are three aspects to reliability in any system:
• ENDURANCE: Protection against failure of its own programs, and graceful degradation of the system under adverse conditions.
• RECOVERY: Provision for restarting the system close to the point of failure after the disturbance has been removed or corrected.
• AUDIT TRAIL: Each day's message log should be retained for a period of time to permit reconstruction of a single event or a sequence of events which led to failure of a program module or the system. After-the-fact analysis is often the only technique possible for problem identification and solution in a real-time environment. Some provision should be included for a computer search of the message log when specific selection criteria permit.
9. Facility to Evaluate System Performance

Usage statistics should be gathered to detect problem areas of the system worthy of special attention, so that solutions can be implemented to improve:
• Main frame through-put
• Network traffic
• Terminal operation efficiency
• Application program proficiency

STRATEGY OF THE SYSTEM

The principal strategy entails the reduction of redundant coding otherwise inherent in the massive application programming effort by shared system modules, wherein the following disciplines should be imposed on the system directly or indirectly:
• Application Program Proficiency
• Network Traffic Efficiency
• Terminal Operating Efficiency
• Main-Frame Throughput Efficiency

Let us define the term "application program" as referring to a message processing program tailored to handle one or more varieties of messages as identified by the transaction codes. The three stages of application program structure described below will demonstrate the progression of teleprocessing software architecture for a large corporate information system.

STAGE 1: Centralized Data Management Functions for All Application Programs

A previous paper (1) has described in detail how to centralize the data management functions to obtain the following benefits:
• Reduction of Core Memory Requirement
• Reduction of Program Loading Time
• Centralized Control of Shared Data Base
• Optimal Allocation of Resources Associated with Shared Data Base
• Flexibility in File Design and Record Layout

After excluding the data management function from an application program (Figure 2), the following six key functions remained to be performed:

1. Input Buffer
Get the message from the message queue as it arrives at the input buffer; the message string is in original terminal form, containing terminal control characters, and varies in length and format.

2. Input Edit
Normalize the input message string to fixed form by interpreting the terminal control characters, replacing absent characters and fields with nulls. Because different types of terminals have unique sets of control characters and logic, an application program that contains this function will always be dependent on the type of terminal and its logic.

3. Data Validation
Validity check the information content of the incoming message string to intercept bad data and send out error messages so the user may correct and reenter the message.

4. Process and Access Data Base
Process message content and exchange information with the corporate data base. Construct the logical response message to the user in fixed form without terminal control characters.

5. Output Editor
Convert the logical response message from internal format to display format, inserting terminal control characters for transmission. If there is more than one message to be sent to different types of terminals, construct different message strings for the corresponding terminals.

6. Output Buffer
Dispatch the message in terminal form.

Figure 2-Stage 1 (message processing functions excluding file management functions: 1 input buffer, message in terminal form; 2 input editor, message normalized in fixed form; 3 data validation, verify input data; 4 process routine, processing logic and file access, exchange information from data base; 5 output editor, message formatted to terminal form; 6 output buffer, message in terminal form)
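Read as a pipeline, the six functions simply chain output to input. The C program below strings them together end to end; the actual system was written in IBM-360 assembly language and COBOL, so this is a toy stand-in, and every name and stub body is invented for illustration.

    #include <stdio.h>
    #include <string.h>

    typedef struct { char text[256]; } msg;

    static msg get_input(void)           /* 1. input buffer: terminal form    */
    { msg m; strcpy(m.text, "I\tSMITH\r"); return m; }

    static msg input_edit(msg m)         /* 2. normalize to fixed form        */
    { for (char *p = m.text; *p; p++) if (*p == '\t' || *p == '\r') *p = ' ';
      return m; }

    static int validate(const msg *m)    /* 3. intercept bad data             */
    { return m->text[0] != '\0'; }

    static msg process(const msg *m)     /* 4. processing + data base access  */
    { msg r; snprintf(r.text, sizeof r.text, "REPLY TO: %s", m->text); return r; }

    static msg output_edit(msg m)        /* 5. insert terminal control chars  */
    { strncat(m.text, "\r\n", sizeof m.text - strlen(m.text) - 1); return m; }

    static void put_output(msg m)        /* 6. output buffer: dispatch        */
    { fputs(m.text, stdout); }

    int main(void)
    {
        msg in = get_input();
        msg norm = input_edit(in);
        if (validate(&norm))
            put_output(output_edit(process(&norm)));
        return 0;
    }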
STAGE 2: Independence of Application Program from Terminal Hardware Characteristics

Shared message editors normalize input messages and format output messages in order to isolate the application programs from the tedious function of terminal control character interpretation. Several advantages are derived from this approach:

Figure 3-Stage 2 (independence of message processing program from terminal hardware characteristics; blocks: operating system facility, network control program, message control program, input message editor, message processing program, output message editor, second region, Nth region)

1. Programming Proficiency
• One application program can handle similar information from several types of terminals, each with a format most suited to its special features.
• Shared message editors permit optimization of terminal characteristics at low programming cost, since they need be programmed only once.
• New terminal types may be added and input/output display formats redesigned without application reprogramming.
• High-level languages, such as COBOL and PL/1, can be easily applied to process fixed format message records.
• Applications programs are easier to design, program, and test.

2. Main-Frame Efficiency
• Static core requirements for application programs and work areas are reduced.
• Application programs process a message more quickly, reducing the dynamic core requirements, measured in bytes occupied per second.

3. Network Efficiency
Optimized use of terminal control characters shortens the message length, conserves message transmission time, reduces line load, and permits an increase in the number of terminals per line; the communications network may then comprise fewer lines, at a sizable reduction in installation and maintenance costs for a given number of terminals.

4. Operator Efficiency
Optimal use of terminal format control characters increases operator efficiency as much as it relates to display readability and input cursor control. Since the application program is truly independent of the display format, it need not be changed when a display format is redesigned or modified. This feature simplifies making improvements in terminal display design formats.
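A shared input editor of this kind is essentially a table walk over per-terminal control characters. The C sketch below normalizes device-specific input into fixed-form fields using a per-terminal-type table; the table contents, control codes, and names are invented for illustration.

    #include <string.h>

    /* Per-terminal-type description of control characters (illustrative). */
    struct term_type {
        char field_sep;     /* character that ends a field on this device */
        char end_of_msg;    /* character that ends the message            */
    };

    static const struct term_type TTY = { '\t', '\r' };
    static const struct term_type CRT = { 0x1f, 0x03 };   /* assumed codes */

    /* Normalize: split device-form text into fixed 16-byte fields,
       blank-filled, so application programs never see control characters. */
    void normalize(const struct term_type *t, const char *in,
                   char fields[][16], int max_fields)
    {
        int f = 0, col = 0;
        memset(fields, ' ', (size_t)max_fields * 16);
        for (; *in && *in != t->end_of_msg && f < max_fields; in++) {
            if (*in == t->field_sep) {      /* advance to next field */
                f++;
                col = 0;
            } else if (col < 16) {
                fields[f][col++] = *in;
            }                               /* overlong input truncated */
        }
    }

A call such as normalize(&TTY, "SMITH\t123\r", rec, 4) and a call with &CRT on that device's form of the same entry produce identical fixed-form records, which is the independence Stage 2 is after.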
STAGE 3: A Single Retrieval Module Replaces Many Application Programs

Progressive reduction of redundant coding from Stage 1 and Stage 2 application programs has already placed the following functions in the shared system modules:
• Network and Message Control
• File Definition and File Access
• Data Definition and Data Retrieval
• Input and Output Buffer Management
• Input Message Normalization and Editing
• Output Message Formatting

Only three functions remain to be performed by the application program:
• Input Data Validation
• Processing logic and the interface with file management programs for data retrieval from the shared data base
• Pattern editing for the output message; i.e., insert decimal point, comma, $, etc.

Expanding the input message editor to perform the function of "input data validation," and the output message editor, "pattern editing," there remains only one function for the message processing program, and even this very last function can be performed by shared system modules:
• A shared system module can obtain information from the message descriptor to request data element retrieval from the data base via the data management modules.
• For most basic message types, such as INQUIRY, RETRIEVAL, DATA ENTRY, JOURNAL, etc., the processing logic can be easily represented by a simple list which defines the processing path through and within the shared system modules.

Therefore, if we create a simple list for each transaction code, the shared system modules can perform the required processing logic without recourse to an application program, except when extraordinary processing logic occurs. The Stage 3 teleprocessing system will add benefits in addition to those previously derived in Stage 2.

Figure 4-Stage 3 (a single retrieval module replaces many message processing programs; blocks: operating system facility, resource management facility, network control program, message control program, input message editor, output message editor, general retrieval module, file management facility, second region, Nth region)

1. Programming Proficiency
• Shared input data validation and output pattern editing permit optimization in program design and efficiency.
• • Input data validation can be designed, coded, and tested in optimal fashion at low programming cost, since it need be programmed only once.
• • Standard error messages can be generated directly.
• Application programming, testing, and debugging for most transactions are eliminated.

2. Main-Frame Efficiency
• Permits the application program to pass certain information back to the system. 4. Process Routing Figure 5-System logic flow The input editor obtains the process routing information from the transaction code table and constructs a control list for the regional resource manager. For the majority of transactions, this control list will make no reference· to an application program. For those special transaction codes requiring complex processing logic, however, a reference to the Teleprocessing Systems Software associated application program will be included in the list. 5. Message Format The message may simply be a request for a message format. The input message editor posts control to the output message editor to generate a message format (captions and control characters) based on the information in the message descriptor. 6. Normalize I nput Message The input message editor processes the elements in an order controlled by the input message descriptor (Appendix I). The input message descriptor sequentially defines the attributes of each message field: • The starting position in vertical and horizontal coordinates. • The maximum length. • A field designated as caption will be eliminated from processing. • A field designated as mandatory infor~ation must be present or the entire message will be rej ected. • The retrieval descriptor index points into the retrieval descriptor so that additional editing may be performed. The retrieval descriptor defines additional attributes for the message field: • The length and location of the area reserved for the message field. • The data class expected; i.e., numeric or alphameric. Based on the above information, the input message editor performs the following functions: • Checks for invalid characters, such as a letter in a numeric field, posts error condition if invalid character found. Deletes punctuation, such as commas in a numeric field. • Aligns to the left or right. • Truncates if the input message descriptor length exceeds the retrieval descriptor length, such as if the operator included too many decimal positions. 7. Data Validation The retrieval descriptor serves for both editing and data validation. As many as 256 data validation routines may be programmed to permit the choic~ of an appropriate validation technique. Some examples of checking. 703 • RANGE: Test the numerical value of a data field against a predetermined range of values. • CODE: If the input data field is a code argument in a table, the system will perform a table look-up to determine if it is a valid argument. • RECORD KEY: If the input data field references a record key in an on-line file, the system can issue a read (key) against that file to determine if it is valid. • FIELD ASSOCIATION: When one or more input fields depend upon the value of another input field, the system can match them against a predefined associative decision table. • DATE: The following tests can be applied to a date field: • • Any valid date. A holiday. A work day. Today's date. • • Test a range of predefined work days from today's date. Test a range of predefined elapsed days from today's date. S. Standard Error Message If any errors have been identified, a standard error message is prepared. The following considerations are taken to design the standard error messages: • FIXED LOCATION: Error messages always appear in the same location to attract the terminal operator's attention. For example, all the error messages will appear on the last two lines of the CRT screens. 
8. Standard Error Message
If any errors have been identified, a standard error message is prepared. The following considerations are taken in designing the standard error messages:
• FIXED LOCATION: Error messages always appear in the same location to attract the terminal operator's attention. For example, all the error messages will appear on the last two lines of the CRT screens.
• MULTIPLE ERRORS: To conserve network efficiency and eliminate unnecessary traffic arising from bad messages, the system will handle up to four errors per message at a time.
• STANDARD PHRASE: The error message will reference the specific input field and indicate the kind of error the system detected for that field.

9. General Retrieval Module
The message may have been a general retrieval request. The retrieval descriptor would then have been a core-resident skeleton sufficient to satisfy the normalization and validation routines. The data identification information (element control numbers) supplied by the operator will be inserted in a copy of the skeleton so that data retrieval may proceed. The retrieval routine builds a list containing the file name, the record key, and one or more data element control number and receiving area pairs, and requests the services of the data control manager. Nulls are returned for a data element when the operator's or terminal's security clearance is less than that assigned to the element. A status code informs the retrieval routine of any abnormality. The retrieval descriptor defines whether an abnormal status code is to be ignored or considered an error.

10. Output Message Editor
The output message editor pattern edits the data field if required, and prefaces whatever control characters are necessary to position the message character string in the terminal format. It has the following three modes of operation (the blank replacement they share is sketched in C after step 12):
• Device With Non-Formatted Memory:
• • Blanks between graphics in the same line and blanks at the front of the line are replaced wherever possible with control characters. Lines are truncated on the right after the last graphic.
• Device With Formatted Memory: When the transaction requires a new format at the device, the memory of the device is cleared.
• • Fields marked as variable in the output message descriptor (device dependent option) are inserted in the message string full size, without blanks suppressed, even if they are completely blank.
• • Blanks in caption and format fields are replaced with control characters wherever possible.
• Device With Pre-Formatted Memory: When the transaction requires the same format as is at the device, caption and format fields are omitted.
• • Blanks in variable fields are replaced with control characters wherever possible.
• • The effect of each control character is considered with respect to the existing format.

11. PUT Message
This routine places the prepared response message in the proper destination queue for dispatch to the receiving terminal.

12. Termination
A privileged transaction code allows the console operator to terminate the teleprocessing system. The system termination module directs the message control region to discontinue polling on all lines as soon as incoming messages have been received. When polling has been discontinued, the input editors are directed to return control to the regional resource manager whenever they find the process queues empty. Normally, they wait a predetermined period of time and then scan the queues again. When all message processing is complete, the regional resource manager terminates the teleprocessing system.
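The blank replacement in step 10 is, in effect, run-length compression of spaces into cursor-positioning controls. A minimal C sketch follows, assuming a device whose positioning order is a two-byte sequence (an invented "set column" code, not any actual terminal's):

    #include <stddef.h>

    /* Compress runs of 3+ blanks into an assumed 2-byte positioning
       control: 0x1B then the target column. Shorter runs stay as blanks. */
    size_t compress_blanks(const char *in, size_t len, char *out)
    {
        size_t o = 0, col = 0;
        for (size_t i = 0; i < len; ) {
            size_t run = 0;
            while (i + run < len && in[i + run] == ' ') run++;
            if (run >= 3) {                 /* worth a control sequence */
                col += run;
                out[o++] = 0x1B;            /* assumed "set column" code */
                out[o++] = (char)col;
            } else {
                for (size_t k = 0; k < run; k++) { out[o++] = ' '; col++; }
            }
            i += run;
            if (i < len) { out[o++] = in[i++]; col++; }
        }
        return o;                           /* compressed length */
    }

Shortening messages this way is exactly the network-efficiency gain claimed for Stage 2: fewer characters per message means more terminals per line.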
SYSTEM FACILITIES

This section will briefly describe some system support functions not mentioned in the previous section.

1. Terminal Start-Up Procedure
An operator must log on before attempting any other activity on a terminal. For his own protection, he should log off when his work is completed or during an interruption in which he leaves the terminal. When log-on occurs, an employee number is entered in the terminal table. This employee number is used to set up individual restrictions on the terminal and to facilitate error and security violation tracing. Each time a log-off is processed, a corresponding log-on is required before business can be resumed. A second operator may log on at a terminal without the previous operator logging off; the second operator's employee number and restriction code replace the first's.

2. Data Base Security
Six modes of operation are supported for terminals:
• Business: Normal mode for business work. Most applications functions are valid in this mode. INQUIRY, RETRIEVAL, DATA ENTRY, DATA CHANGE, MULTIPLE DESTINATION and JOURNAL are available and work as defined.
• Training: Operator training mode for practicing business work.
• • INQUIRY and RETRIEVAL work as defined.
• • DATA ENTRY and DATA CHANGE appear to the operator as defined but fail to update the data base.
• Supervisor: Extended mode for business work. All applications functions are valid in this mode. At the application's discretion, certain transactions may be reserved for supervisor mode, or more information may be passed in this mode. A terminal in supervisor mode may:
• • Put another terminal in the same office in supervisor mode if that terminal is authorized for supervisor mode.
• • Display the employee presently logged on a particular terminal.
• • Copy a message to another terminal in the same office.
• Diagnostic: Systems aid for on-line engineering maintenance; messages are directed to special diagnostic programs which display generated status information on the console terminal.
• • A terminal in diagnostic mode may return itself to any mode authorized for it.
• • It may copy any terminal in training mode, and any terminal may copy it.
• Console: Network control terminal located at the computer console; terminal, line, and transaction code status tables may be altered from this terminal.
• Master: Network monitor terminal; permits dynamic observation of the system for debugging and audit control. Master mode terminals may change the mode status of other terminals to any mode of operation, including master.

3. Network Monitoring
Hardware errors will be analyzed and may cause the following actions to be taken:
• Send an Error Message, which:
• • Describes the error to the console operator.
• • Describes the action being taken by the system.
• • Suggests action which should be followed by the operator.
• Manually start or stop a terminal, line segment, line, or group of lines from operation on request.
• Automatically stop a terminal, line segment, or group of lines, depending on the error which occurred.
• Redirect messages to an alternate terminal (which may be a different type) when the original destination terminal has any type of hardware error which renders it unable to transact business.

4. Formats
For data entry, a formatting facility preformats a terminal's memory with caption material and control characters at the operator's request. The operator indicates the particular transaction format required by suffixing a designated letter to the associated transaction code. The facility responds with the format, and the operator simply fills in the data. If the information content of the message is acceptable for processing, the system restores the variable fields to blanks, and the operator can proceed to the next transaction if it is the same. If it is not, a new format may be requested.
5. Data Collection
A data collection facility stores audit trail records created by application modules which update the data base, and transaction records destined for off-line batch processing programs. These records are sorted and cataloged by off-line programs for easy retrieval at the end of the on-line day.

6. Testing Program Library
A special program module library contains all test status programs. Any module loaded from this library is automatically placed in test status to protect the integrity of the data base from unproved routines.

IMPLEMENTATION OF THE SYSTEM

The teleprocessing system package has been written in IBM-360 Operating System Assembly Language (ALC).
1. It is fully interfaced with the IBM-360 operating system MVT (multiprogramming with a variable number of tasks) environment.
• The application programs and the system programs operate as independent subtasks of the regional resource manager; abnormal termination of a subtask will not stop the remaining subtasks in the region.
• The package is not tied to any particular release of O/S; hence, if a new version is released, there should be little effect on this package.
2. The teleprocessing system package takes full advantage of the existing operating system facilities.
3. It is intended to interface with all the operating system supported languages (COBOL and ALC interfaces have been implemented).
4. The entire package has been designed to be dynamic in nature; that is, all programs are load modules. They are not linkage edited into the application program; thus, the package may be redesigned and improved without any appreciable effect on the application programs.
5. The entire package has been programmed in reentrant code.
6. The system has been coded in a modular fashion. Each routine was individually coded, tested in detail, subgrouped, and finally all routines were combined together.
7. The message control and message processing regions are independent of each other, to permit relocation of the control program to a front-end communications computer when the network size warrants the change.
8. The hardware anticipated over the next several years includes two large central processors with a million bytes of main memory, supported by smaller satellite computers and a score of multi-drive disk storage units. The system is being designed to support several hundred terminals, most of which are expected to be high speed CRT display units.

ACKNOWLEDGMENT

The authors wish to thank Mr. J. R. Kleespies for his encouragement and support; Mrs. G. L. Kenny, Messrs. R. A. DaCosta and P. A. Terry for their dedicated efforts in design and programming; Mrs. L. J. Fiore for her careful typing; Mr. R. P. Kovach, University of California, for his early research and design of the system; and Messrs. F. J. Thomason and J. W. Nixon of Haskins & Sells for their invaluable advice.

APPENDIX I

Input message descriptor (fixed form)                 BITS
1. Retrieval Descriptor Index                            8
2. Message Field Length                                  7
3. Mandatory Field Indicator                             1
4. Line Number (Vertical Spacing)                        6
5. Spare                                                 1
6. Spare                                                 1
7. Position (Horizontal Spacing)                         7
8. Caption Delete                                        1
(Total: 4 BYTES)

Input message descriptor (free form)                  BITS
1. Retrieval Descriptor Index                            8
2. Message Field Length                                  7
3. Spare                                                 1
(Total: 2 BYTES)
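For readers who think in structures, the fixed-form descriptor packs into one 32-bit word. A C rendering with bit-fields follows; the member names are invented, and bit-field ordering is compiler-dependent, so this is a sketch of the layout rather than a byte-exact declaration.

    #include <stdint.h>

    /* Fixed-form input message descriptor, 32 bits (4 bytes) total. */
    struct input_msg_descriptor {
        uint32_t retrieval_index : 8;   /* retrieval descriptor index     */
        uint32_t field_length    : 7;   /* maximum message field length   */
        uint32_t mandatory       : 1;   /* field must be present          */
        uint32_t line            : 6;   /* vertical spacing (line number) */
        uint32_t spare1          : 1;
        uint32_t spare2          : 1;
        uint32_t position        : 7;   /* horizontal spacing             */
        uint32_t caption_delete  : 1;   /* caption field: drop from input */
    };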
Output message descriptor                             BITS
1. Retrieval Descriptor Index                            8
2. Device Dependent Options                              8
3. Line Number (Vertical Spacing)                        6
4. Data Scan Override                                    1
5. Format                                                1
6. Position (Horizontal Spacing)                         7
7. Caption Field                                         1
(Total: 4 BYTES)

Retrieval descriptor (normalize input) (format output)    BITS
1. Data Class                                            3
2. Decimal Alignment (For Arithmetic Class Only, 0 to 7 Places) or Date Verification Decision Table (Replaces Decimal Alignment Table)    3
3. File ID Table Index/Code Table Number                10
4. Verification/Retrieval                                3
5. Date Indicator (Output: Format YYMMDD as MM-DD-YY; Input: Normalize MM-DD-YY as YYMMDD)    1
6. Displacement                                         12
7. Field Length                                          7
8. Pattern Edit Output Field                             1
9. Retrieval Descriptor Cross Index (See Item 4) or Data Verification Range Table Index    8
10. DCM File Element Control Number (Binary Half Word) (See Item 4) or Associative Decision Table Index and Data Verification Routine Index    16

Item 1, Data Class:
a. Arithmetic (Right Numeric, Left Zero Fill, Decimal Alignment (0-7)):
  0 0 0  Binary, Display As Decimal
  0 0 1  Binary, Display As Hex
  0 1 0  Packed
  0 1 1  Zoned
b. Alphameric (Left Alignment, Right Blank Fill):
  1 0 0  Graphics (Terminal's Entire Character Set)
  1 0 1  Alphabetic
  1 1 0  Alphanumeric
  1 1 1  Numeric

Item 2, Date Verification Decision Table:
  0 0 0  Any Valid Date
  0 0 1  Today's Date
  0 1 0  Any Holiday
  0 1 1  Any Working Day
  1 0 0  Prior Date
  1 0 1  Prior Date or Today
  1 1 0  Future Date
  1 1 1  Future Date or Today

Item 4, Verification/Retrieval:
  0 0 0  Bypass Verification/Retrieval
  0 0 1  Date Verification
  0 1 0  Data Verification
  0 1 1  Duplicate (Descriptor Points to Argument; Cross Index Points to Receiving Area Descriptor)
  1 0 0  Verify Code Argument (Descriptor Points to Argument)
  1 0 1  Verify File Key
  1 1 0  Retrieve Code Function (Descriptor Points to Argument)
  1 1 1  Retrieve File Element (Cross Index Points to Receiving Area Descriptor)

Retrieval descriptor (validation)                     BITS
1. Data Class                                            3
2. Date Verification Decision Table                      3
3. File ID Table Index/Code Table Number                10
4. Verification Indicators                               3
5. Date Indicator                                        1
6. Displacement                                         12
7. Field Length                                          7
8. Spare                                                 1
9. Data Verification Range Table Index                   8
10. Associative Decision Table Index and Data Verification Routine Index    16

BIBLIOGRAPHY

1 H N LIU, A file management system for a large corporate information system data bank, FJCC Proceedings, 1968
2 J MARTIN, Design of real-time computer systems, Prentice-Hall, 1967
3 J MARTIN, Programming real-time computer systems, Prentice-Hall, 1965
4 Advances in fast response systems; fast response system design; and auditing fast response systems, EDP Analyzer, February 1967, March 1967, and June 1967
5 C HAUTH, Turnaround time for messages of differing priorities, IBM Systems Journal, Vol 7, No 2, 1968
6 W P MARGOPOULOS and others, Teleprocessing system design, IBM Systems Journal, Vol 5, No 3, 1966
7 M G JINGBERG, Notes on testing real-time system programs, IBM Systems Journal, Vol 4, No 1, 1965
8 J D ARON, Real-time systems in perspective, IBM Systems Journal, Vol 6, No 1, 1967
9 J L EISENBIES, Conventions for digital data communication link design, IBM Systems Journal, Vol 6, No 4, 1967
10 L F WERNER, Software for terminal-oriented systems, Datamation, June 1968
11 J DIEBOLD, Thinking ahead: Bad decisions on computer use, Harvard Business Review, January-February 1969
12 Heading format for data transmission, a USASI tutorial, Communications of the ACM

APPENDIX II

Application message header TPCMSHDR

The 40 byte application message header allows communication between the message editors, MSGIN and MSOUT, and the application module. Use of the information is left to the discretion of the application analyst.

TPCINTM: Arrival Time of Day
A binary clock maintains the time of day in units of 1/150 second (6⅔ milliseconds). The high order byte contains binary zeros. The application may insert this field in the generated transaction records' sort keys to post by arrival sequence. Because the conversion of this field to decimal hours, minutes, and seconds is time consuming, it is not appropriate to do so in the on-line environment.
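The conversion the authors defer to off-line programs is simple but costs divisions, which were expensive on the hardware of the day. For an off-line report it amounts to the following hypothetical C helper (the field is assumed to hold ticks since midnight):

    #include <stdio.h>
    #include <stdint.h>

    /* Convert TPCINTM ticks (1/150 second since midnight, assumed)
       to hours, minutes, and seconds for off-line reports. */
    void ticks_to_hms(uint32_t ticks, unsigned *h, unsigned *m, unsigned *s)
    {
        uint32_t seconds = ticks / 150;     /* 150 ticks per second */
        *h = seconds / 3600;
        *m = (seconds / 60) % 60;
        *s = seconds % 60;
    }

    int main(void)
    {
        unsigned h, m, s;
        ticks_to_hms(6210000, &h, &m, &s);  /* 6210000 ticks = 11:30:00 */
        printf("%02u:%02u:%02u\n", h, m, s);
        return 0;
    }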
Retrieval descriptor (normalize input / format output), 8 bytes:

1. Data Class (3 bits)
2. Decimal Alignment (3 bits; ignore if date indicated)
3. File ID Table Index/Code Table Number (10 bits)
4. Date Indicator (3 bits) (input: normalize MM-DD-YY as YYMMDD; output: format YYMMDD as MM-DD-YY)
5. Pattern Edit Output Field (1 bit)
6. Displacement (12 bits)
7. Field Length (7 bits)
8. Spare (1 bit)
9. and 10. As in the validation descriptor (8 and 16 bits)

TPCUSER: User Status Flags

MSGIN initializes this field to binary zeros. MSOUT logs it in the QTAM message header. QDUMP retrieves it at the end of the day for application analysis. Each application may define its own coding structure. However, the codes should, at the least, describe the TPCSCODE selected and explain why, so that the application analyst can reconstruct the process condition.

TPCMODE: Terminal Mode

A terminal may be placed in one of several operations modes which define how the system will react to messages from it.

1. Training: The application appears normal to the operator, but no transaction records should be generated or posted to the masterfile. An application may desire to maintain special files of pseudo accounts for training and testing and post these in training mode.
2. Business: The application reacts normally to all stimuli.
3. Supervisor: This mode is normally for terminal operator supervision, but on occasion some business work will arrive from a terminal in 'supervisor' mode. The application may handle such work as business or may grant special privileges. Supervisor mode is allowed only for specific terminals and specific employees in supervisory positions.
4. Operator: This mode is normally for systems operation, but on occasion some business work will arrive from a terminal in 'operator' mode.
5. Master: This mode is normally for systems programming, but on occasion some business work will arrive from a terminal in 'master' mode.

TPCSOTRM: Terminal of Origin

Each terminal has a unique five character identifier comprising:

DIVISION   1 byte alphabetic
OFFICE     1 byte alphabetic
LOCATION   1 byte numeric
UNIT       2 bytes numeric

The application may insert this field in the generated transaction records for journal distribution or as a debugging trace.

TPCDSTRM: Terminal of Destination

The name redefines TPCSOTRM. The application module may alter this field to redirect the response to a different terminal. Such a receiving terminal must be a hard copy device. Creation of an invalid terminal identifier will direct the response to a dead letter queue.

TPCINNR: Message Sequence Number In

QTAM maintains an input message sequence number for each terminal. The application may insert this field in the generated transaction records for journal sequencing or as a debugging trace.

TPCDATE: Today's Julian Date, YYDDD+

This field is supplied for the application's convenience. Because the conversion of this field to calendar format, e.g., YYMMDD, is time consuming, it is not appropriate to do so in the on-line environment.
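Off-line, the YYDDD-to-YYMMDD conversion is a table walk over the month lengths. The following COBOL sketch is our own illustration: the names and sample date are invented, and February is fixed at 28 days, i.e., leap years are not handled.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. JULCONV.
      * Convert a TPCDATE-style Julian date (YYDDD) to YYMMDD.
      * Illustrative only; leap years are ignored for brevity.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  JULIAN-DATE.
           05  JUL-YY       PIC 99   VALUE 70.
           05  JUL-DDD      PIC 999  VALUE 125.
       01  MONTH-TABLE-DATA PIC X(24)
               VALUE '312831303130313130313031'.
       01  MONTH-TABLE REDEFINES MONTH-TABLE-DATA.
           05  MLEN         PIC 99 OCCURS 12 TIMES.
       01  MO               PIC 99.
       01  DAYS-LEFT        PIC 999.
       01  CALENDAR-DATE.
           05  CAL-YY       PIC 99.
           05  CAL-MM       PIC 99.
           05  CAL-DD       PIC 99.
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE JUL-DDD TO DAYS-LEFT.
           MOVE 1 TO MO.
           PERFORM STEP-MONTH UNTIL DAYS-LEFT NOT > MLEN (MO).
           MOVE JUL-YY TO CAL-YY.
           MOVE MO TO CAL-MM.
           MOVE DAYS-LEFT TO CAL-DD.
      * 70125 converts to 700505, i.e., May 5, 1970.
           DISPLAY 'CALENDAR DATE ' CALENDAR-DATE.
           STOP RUN.
       STEP-MONTH.
           SUBTRACT MLEN (MO) FROM DAYS-LEFT.
           ADD 1 TO MO.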
TPCSCODE: Transaction Code Modifier
TPCTCODE: Transaction Code

A transaction code identifies an entry from the operator and the related response to the operator. The transaction code modifier X'F0' is assigned to the entry on input and to the standard response for a valid entry on output. The modifiers X'F1', X'F2', X'F3', and X'F4' designate alternate responses selected by the application module processing the entry. They must, however, be designed to fit the display format of the standard response. X'F0' is the standard response; X'F1' is the error description response. For Data Entry and Data Change, X'F1' is the standard acceptance response, which re-initializes the terminal buffer and screen for the next entry from the operator, since the TP System assumes the next entry will be similar to the one just processed. X'F2', X'F3', and, for INQUIRY, X'F1' are available to the application for alternate responses as they require.

COBOL linkage section for application message header

       01  TPCMSHDR.
           03  TPCINTM               PICTURE S9(004) COMPUTATIONAL.
           03  TPCUSER               PICTURE S9(002) COMPUTATIONAL.
           03  FILLER                PICTURE X(010) VALUE SPACE.
           03  TPCMODE               PICTURE X(001).
           03  TPCSOTRM.
               05  TPCTDST.
                   06  TPCTDVS       PICTURE X(001).
                   06  FILLER        PICTURE X(001).
               05  TPCTOFC           PICTURE X(001).
               05  TPCTNUM           PICTURE 9(002).
           03  TPCDSTRM REDEFINES TPCSOTRM
                                     PICTURE X(005).
           03  TPCINNR               PICTURE S9(002) COMPUTATIONAL.
           03  TPCDATE               PICTURE S9(005) COMPUTATIONAL-3.
           03  TPCSCODE              PICTURE 9(001).
           03  TPCTCODE              PICTURE X(004).
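A called application module would address this header through its LINKAGE SECTION and a USING phrase. The skeleton below is our own illustration, not taken from the paper: the program name, the transaction code value, and the paragraph names are invented, and the header is abbreviated to elementary fields. Note that because TPCSCODE is zoned decimal, moving the digit 0 or 1 to it yields exactly the EBCDIC modifier codes X'F0' and X'F1' described above.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. APPMOD01.
      * Hypothetical application module invoked with the
      * application message header; all names are illustrative.
       DATA DIVISION.
       LINKAGE SECTION.
       01  TPCMSHDR.
           03  TPCINTM      PICTURE S9(004) COMPUTATIONAL.
           03  TPCUSER      PICTURE S9(002) COMPUTATIONAL.
           03  FILLER       PICTURE X(010).
           03  TPCMODE      PICTURE X(001).
           03  TPCSOTRM     PICTURE X(005).
           03  TPCINNR      PICTURE S9(002) COMPUTATIONAL.
           03  TPCDATE      PICTURE S9(005) COMPUTATIONAL-3.
           03  TPCSCODE     PICTURE 9(001).
           03  TPCTCODE     PICTURE X(004).
       PROCEDURE DIVISION USING TPCMSHDR.
       MAIN-PARA.
      *    'DPST' is an invented transaction code.
           IF TPCTCODE = 'DPST'
               PERFORM PROCESS-DEPOSIT
           ELSE
      *        Select the error description response, X'F1'.
               MOVE 1 TO TPCSCODE.
           GOBACK.
       PROCESS-DEPOSIT.
      *    Application processing would go here; on success,
      *    select the standard response, X'F0'.
           MOVE 0 TO TPCSCODE.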
The selection and training of computer personnel at the Social Security Administration

by EDWARD R. COADY
Social Security Administration
Baltimore, Maryland

INTRODUCTION

How many computer systems managers have claimed their individual systems and operating environments contain a unique group of applications? I suggest the majority answer yes. The "slow-down" in implementing third generation computer systems is implied in this answer. The dilemma that many systems managers are confronted with in the conversion from second to third generation systems is rooted in the educational process, or the lack of it. This paper will present the social security data processing system, in general terms; the recruitment, selection and training systems for computer personnel; and the future tasks of computers and their programmers at the Social Security Administration.

THE SOCIAL SECURITY DATA PROCESSING SYSTEM

The mission of the Social Security Administration is to operate a social insurance program for the American people. The Bureau of Data Processing and Accounts, which is headquartered in Baltimore, Maryland, maintains the earnings history for each person with covered earnings who is assigned a social security account number. These earnings records are kept so that when it is time to decide on a person's eligibility for benefits and on his benefit amount, his earnings history is available. To handle these tasks we have 50 computer systems and supporting peripheral gear and over 1200 personnel to program and man these systems. I would like to briefly discuss the major EDP functions to provide an overview for recruitment, selection, and training of programmers at Social Security. The EDP activities of the Bureau of Data Processing and Accounts of the Social Security Administration can be classified into the following categories (Figure 1):

Figure 1-EDP applications at SSA
1. NEW ACCOUNT ESTABLISHMENT
2. EARNINGS RECORD MAINTENANCE
3. BENEFIT COMPUTATION
4. REINSTATING
5. BENEFIT MAINTENANCE
6. HEALTH INSURANCE

(1) The New Account Establishment and Correction Process-This process involves the establishment of various records used to identify social security account number holders. Identifying information is maintained on printed listings by account number and on microfilm by name and date of birth. Approximately 250,000,000 names are found in this file. The establishment process also prepares the magnetic tape record to which the worker's earnings information will be posted.

(2) The Earnings Record Maintenance Process-The earnings information of individuals participating in the social security program is maintained in two forms, magnetic tape for computer processing and microfilm for visual examination. Each of these earnings data files is updated four times a year. After the earnings data is converted to magnetic tape, the individual employer reports are balanced in a computer process. Next, the balanced items are processed through a series of sorting operations which provide for the arrangement of items in social security account number sequence. Finally, the current earnings, balanced and sorted, are compared with the summary earnings tape, and those records matching on account number and surname are updated. A new summary record is prepared. A microfilm record of those items which match is prepared as a by-product of this operation.

(3) The File Search-Benefit Computation-Earnings Statement Process-The magnetic tape file containing 185,000,000 summary earnings records is searched daily to obtain the necessary earnings information for benefit computation and earnings and coverage statement requests. The finder items to be located number about 55,000 and are received from several sources. All of the requests concerning earnings information are arrayed on the finder tape, which is processed through editing and sorting operations. The search of the summary earnings records is made, and the records located for claims and statement requests are written out for processing through separate operations. The desired data is prepared on appropriate forms, certified, and forwarded to the requesting individual, organization, or district office.

(4) The Reinstating Process-Each year approximately 312 million earnings items are received in the Bureau of Data Processing and Accounts for posting to individual earnings records. Of this amount nearly 3 million items are reported without an account number and are immediately deleted for correspondence. Of the 309 million items which we attempt to post, slightly over 14 million reject because of an improperly reported name or account number. These rejected items are subjected to a series of computer and manual reinstating operations designed to locate and correct these reporting errors. The series of computer operations involved in these processes is based upon a thorough analysis of repetitive error statistics and the nature of the errors encountered in the reporting of account numbers.

(5) The Benefit Maintenance Process-A master record of all social security beneficiaries is maintained on magnetic tape. This record, arranged in account number sequence, contains complete identification of the beneficiaries, including mailing address, entitlement data, benefit amount, and benefit payment history. The primary uses of the record include adding new beneficiaries to the system, correcting and changing information already in the system, identifying beneficiaries becoming eligible for health insurance protection, updating the actual master tape record, preparing transcripts of the updated record for check printing purposes, and preparing a microfilm of the master record for visual reference purposes.
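The reinstating volumes in item (4) are worth a quick arithmetic check: the 312 million items received, less the roughly 3 million deleted for correspondence (about 1 percent), leaves the 309 million posting attempts, and the 14 million rejects among them amount to a reject rate of roughly 4.5 percent.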
(6) The Health Insurance Identification and Enrollment Process-Monthly, the Bureau of Data Processing and Accounts searches magnetic tape records to identify those social security beneficiaries about to attain age 65. An "Application for Enrollment in Supplementary Medical Insurance" is mailed to each beneficiary identified. A search of the summary earnings record is also made to identify non-beneficiaries about to attain age 65, and every effort is made to develop a claim for social security benefits, including Medicare. The combined identifying and response information is processed through a distribution operation to produce Health Insurance cards showing entitlement to Hospital Insurance and Supplemental Medical Insurance.

In support of these functions, the Bureau employs approximately 9,000 people. Of this number, over 1,200 persons are directly engaged in our EDP activities. In calendar year 1968, these people were responsible for the processing of over 7,000 different computer applications. We maintain a magnetic tape library of over 160,000 reels and process on the average 5,000 reels per day. It is not uncommon to process a file of 500 or more reels in one operation; for example, in the Health Insurance operations, over 900 reels are needed each month to record information that is subsequently converted to a microfilm file. About 82 million earnings items are posted to the 185 million master earnings accounts each quarter. The actual update operation is handled in a batch processing mode and over 250 hours of computer time are used. An analysis of our system usage reflects the following data in terms of major application areas (Figure 2):

(1) 33 percent to the claims operations-from the initial file of a claim through the continuing maintenance of the account for as long as a benefit is paid.
(2) 21 percent to the statistical operations-covering all phases of statistical activities.
(3) 19 percent to the health insurance operations-from the initial placement on the Master Health Insurance file through the continuing maintenance of the account.
(4) 16 percent to the earnings account operations-from the point of establishing the social security account through all of the postings to the master account and the policing of the account when it is in beneficiary status.
(5) 10 percent to miscellaneous functions-these include our own systems software activities, management information, and utility operations.

Figure 2-Computer usage at SSA
1. CLAIMS 33%
2. STATISTICAL 21%
3. HEALTH INSURANCE 19%
4. EARNINGS 16%
5. MISCELLANEOUS 11%

At the present time and during the next two to three years, we will be dedicating all the resources that we can spare from current operating demands in order to exploit the full potential of the third generation.

THE RECRUITMENT AND SELECTION OF COMPUTER PROGRAMMERS

Where do computer programmers come from? Anywhere you can find them. At SSA, we have discovered them within and without our organization: in our headquarters in Baltimore, our payment center in Birmingham, and our district office in Klamath Falls, Oregon. From within the organization, they have come from a variety of occupations: correspondence clerks, secretaries, computer operators, claims examiners, etc. From without SSA, we have hired and lured a small number of experienced programmers from private industry and other government agencies.
Our most lucrative area of new programmer blood has come from the selectees we have hired through the Federal Service Entrance Examination process. These trainees are generally fresh from the college campus and have developed into a cadre of valuable employees. We have been using this recruitment source since 1966.

The selection system, except for the experienced hires, is based on an aptitude test score. Actually, two tests are used, one for the SSA employees and the other for the FSEE trainees. The FSEE examination is a general abilities test which covers vocabulary, reading comprehension, and quantitative reasoning. It is used by most government agencies for entry positions in a variety of career fields. The in-house test, which we call the Organization and Methods examination, was developed by SSA test psychologists and is given for three job categories: management analyst, budget analyst, and computer programmer. There are three parts to the test: verbal, quantitative, and abstract reasoning. Individual test items are statistically related to job success in each part. The emphasis is placed on the job relatedness of the test. Although the test may lack academic flavor, it does allow SSA employees to use their backgrounds to demonstrate their abilities in the test areas. The test was developed after the jobs were studied; a validation of the test items was made and weights assigned based on the correlation of test score to job success. We have found both the FSEE and the O&M tests to be good predictors of success in the training program and on the job. Additionally, we have given the IBM Programmer Aptitude Test to several hundred trainees. The PAT scores also are indicative of success in programming and correlate with the other tests. We feel the aptitude test is an integral part of the recruitment and selection system for programmers.

The mix of inputs for programming positions is also desirable because:

(1) In-house employees generally require less on-the-job training to become productive.
(2) Organizational morale is boosted when the rank and file employee makes the grade as a programmer.
(3) Selecting only in-house personnel, however, would ultimately weaken the organization, so the infusion of new blood results in strengthening the competitive spirit.

The selection for training in computer programming is based on criteria in addition to the aptitude test score (Figure 3). For internal employees we apply numerical weights to the selection elements as follows: the employee appraisal, 38%; the aptitude test score, 35%; related work experience, 12%; education, 7%; incentive awards, 3%; and related outside activities, 5%.

Figure 3-Programmer selection criteria
APPRAISALS                  38%
APTITUDE TEST               35%
EDUCATION                    7%
INCENTIVE AWARDS             3%
RELATED WORK EXPERIENCE     12%
RELATED OUTSIDE ACTIVITIES   5%

To illustrate this point, the following is a breakdown of the employee appraisal and the respective element weights:

Element                         Outstanding  Above Standard  Meets Standard
Productivity                         7             4               2
Work Quality                         7             4               2
Initiative                           7             4               2
Resolution of Problems              10             6               3
Working With Others                  5             2               1
Adjustments to Work Pressures        7             4               2

Total Points = 43
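With these weights, a candidate's composite is a simple weighted sum. The COBOL sketch below is our own illustration, not part of the selection system itself: the paper does not say how each element is scaled before weighting, so we assume here that each is first expressed as a percentage of its own maximum (for the appraisal, points out of the 43 possible), and the sample values are invented.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. SELSCORE.
      * Weighted selection score using the published weights.
      * The 0-100 scaling of each element is an assumption made
      * for this illustration; the sample values are invented.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * An appraisal of 40 of the 43 possible points is 93.02.
       01  APPRAISAL-PCT      PIC 9(3)V99 VALUE 93.02.
       01  APTITUDE-PCT       PIC 9(3)V99 VALUE 88.
       01  EXPERIENCE-PCT     PIC 9(3)V99 VALUE 50.
       01  EDUCATION-PCT      PIC 9(3)V99 VALUE 75.
       01  AWARDS-PCT         PIC 9(3)V99 VALUE 100.
       01  OUTSIDE-PCT        PIC 9(3)V99 VALUE 0.
       01  COMPOSITE-SCORE    PIC ZZ9.99.
       PROCEDURE DIVISION.
           COMPUTE COMPOSITE-SCORE ROUNDED =
               0.38 * APPRAISAL-PCT  + 0.35 * APTITUDE-PCT
             + 0.12 * EXPERIENCE-PCT + 0.07 * EDUCATION-PCT
             + 0.03 * AWARDS-PCT     + 0.05 * OUTSIDE-PCT.
      * This sample candidate scores 80.40 of a possible 100.
           DISPLAY 'COMPOSITE SCORE ' COMPOSITE-SCORE.
           STOP RUN.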
To give you an idea of the great interest in getting into programming work, let me mention some data from our most recent selection process. Nine hundred and sixty-three employees filed applications in response to a bulletin board advertisement for twenty trainee programmer positions. The selection criteria were applied to each applicant. The applicants were ranked by total points. The twenty trainees were selected to attend the training program. These twenty employees had 'A' in the aptitude test, most were college graduates, all had above standard or outstanding on all appraisal elements, and some had several incentive awards. Next, we subject the group to an eight week training program; it will be described in detail later. Our historical data reflects that only thirteen of the twenty will be successful in the training program. This provides support for those employers who are extremely cautious in their programmer selection and hiring practices. We are no exception.

Let me summarize the objectives of our selection system: (a) to identify employees for a career program leading to a senior supervisory systems analyst, (b) to provide the opportunity for in-house employees to enter this job stream, and (c) to provide the infusion of highly qualified people from outside the organization via the Federal Service Entrance Examination.

THE COMPUTER PROGRAMMER TRAINING PROGRAM

Armed with highly qualified internal employees and FSEE candidates as input, we give the candidates a rigorous eight week computer programming training course. The course has three phases. The first three week period consists of presentations on computer fundamentals (we use a hypothetical computer as an example), an introduction to System/360, and assembly language coding for S/360. Frequent review quizzes and an examination at the end of the third week are given. A comprehensive evaluation by the course instructors of each candidate is then made and unsuccessful candidates are cut from the training program. In-house employees return to their former jobs, and FSEE candidates are assigned to other responsible positions in the Administration. Approximately one third of the class is phased out at this point. The second phase of the training course consists of one week of advanced assembly language techniques and three weeks of COBOL coding. Several lectures on operating system techniques and job control language are also covered in this phase. The third phase of the course is a series of briefings by members of the programming staffs on special systems, techniques, administrative writing, operational procedures, standards, etc.

Figure 4-Source of programmers at SSA

                    INPUT    NOW
WITHIN
  Headquarters        572    349
  Field Offices       285    141
WITHOUT
  FSEE                162     10
  Other                81     10

Our history of using the training class as a screening device has been successful (Figure 4). Since 1955, we have selected 1083 employees for training; 507 are programming for us today; 359, or 33%, were phased out or voluntarily withdrew during the training course. The remaining 217 have migrated into other organizations, advanced into management positions in other parts of our organization, retired, or died. In analyzing where our strength is, in terms of the most valuable long term employee, we have experienced less turnover in headquarters people than in field people. Field office personnel seem to have a strain of nomad in them which shows up as soon as enough experience is gained in programming to make them marketable. The cost in attracting the field employee is high, since the costs of travel, per diem while in training, household moves, etc., are paid to lure him to the headquarters installation.
For example, from 1959-1964, we brought 125 field employees into the headquarters for the training program; 46 were cut (37%); of the remaining 79 graduates only 26 (20%) are with us today in programming work.

In summing up the training program, our objectives are: (a) to identify those who can program a digital computer, that is, assimilate, analyze, solve problems, code solutions, and evaluate results; (b) to prepare the trainee for the on-the-job environment through training in assembly language and COBOL and the techniques for using these languages.

The training program is conducted by our own staff of administrative specialists. The class is limited to 30 trainees. The methodology consists of a lecture-problem solving sequence which provides sufficient time for instructor-student counselling and assistance. The manufacturer's manuals are used for reference by the trainees. Several problems in each language are compiled and analyzed during the course. Following the training course, a one year on-the-job training phase takes place. During this period the trainee is evaluated on his programming assignments. The elements used are: (a) ability to absorb and retain information, (b) originality and creative imagination, (c) analytical ability, (d) thoroughness, (e) initiative, (f) industry, (g) working with others, (h) oral expression, and (i) written expression. Even after our refined system of selecting programmers, we have a few trainees that cannot cope with the rigors and frustrations associated with programming work. These employees are phased into other staff or administrative positions.

The total system for selecting, training, and ultimately promoting employees functions under the legal aegis of a training agreement approved by the U.S. Civil Service Commission. The salary range in this program starts at $7,639 at entry and rises to a maximum of $11,233. Advances at one year intervals are provided to $9,320 and then to $10,203. These raises are automatic, provided performance is satisfactory and the time-in-grade requirements are met. Trainees may enter the stream at any of these levels dependent on their present salary level and experience. Beyond the $11,233 level, competitive promotional procedures are used.

THE ADP TRAINING STAFF

For those who can afford their own ADP training staff, I would like to briefly mention the fruitful experience we have had with ours. In the early 1950's, the EAM days, we had the need for a variety of training courses in machine operation and wiring. One training officer was dedicated to the development and tailoring of these courses for SSA personnel. We soon reaped benefits from this arrangement: we incorporated our own procedures in the training and conducted the courses at our convenience at our installation. This experience laid the foundation for using our own people for computer programmer and operator training in 1956. The amount of training needed initially and continuously justified the enlargement of the staff to the six instructors we have today. Last year, over 1,200 employees were trained in 78 courses of instruction. Our instructors spent over 5,000 hours in the classroom in the conduct of these courses. In addition to the initial training course, we conduct courses in FORTRAN, COBOL, OS Concepts, Job Control Language, and operations courses tailored for medium and large scale systems, as well as Operating Systems training for senior operators and job schedulers.
The selection of our instructor staff has been based primarily on the following criteria:

1. desire to instruct
2. high performance in the programmer training program
3. programming ability
4. strong administrative skills

THE FUTURE TASKS

The problems that we face today with the implementation of third generation systems, and will face in the future with the fourth generation, have their roots in the past. The problems of the second generation and how they were solved dictate to a large degree how we must now proceed. A look at the evolution of automatic data processing at Social Security will serve to give an appreciation of our current problems. Keeping abreast of the processing workloads is not enough; we are required to make major changes in our processes each time Congress enacts a change to the Social Security Act, and often the time frame that must be adhered to is not of our choosing. Although we have many large jobs, large data files, and large volumes of detailed transactions, our conversion effort is not limited to the conversion of a few large jobs. Rather, our activities require the running of many jobs, both large and small. As mentioned earlier, last year we processed over 7,000 jobs: some daily, weekly, monthly, quarterly, annually, and some were one-time operations.

With second generation hardware we adhered to the concept of integrating the large computers and the peripheral systems. That is, keep the big machines going with the fastest input/output devices available and burden the smaller equipment with the necessary editing, formatting, printing, and punching. This permeated our every operation; it was a way of life. Most of the almost 500 programmers and systems analysts grew up with this concept. Each of our programmers was imbued with a consciousness of the cost of the operation. He set about to maximize the utilization of the resources at his command, using all of the tape drives and all of the memory in the system to the extent that their use made his operation the most efficient possible. If he didn't use them, those resources would remain idle during the running of that program. At the same time, he was aware of the relatively high cost of processing a reel of tape, and therefore strove to reduce file sizes. Also, he saved tape space with special non-standard labels; by combining, where feasible, more than one data file on a tape reel; by manipulating the memory character to save space when indicators and codes were needed; and by using many sizes of variable length records. For example, in our Master Earnings file, we indicated the quarter-of-coverage pattern by the use of bit codes over the earnings field. I mentioned variable records: our record blocks vary from 15 characters to nearly 18,000 characters. To illustrate the effect of adding additional characters to each record in our large files: if only one character were added to each of the 185 million master earnings accounts that we search daily, the file would be expanded by 25 reels of tape. Since the file is processed daily, it represents substantial time and cost factors.
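The 25-reel figure is easily checked: 185,000,000 added bytes divided by 25 reels is about 7.4 million usable bytes per reel, which is consistent with a 2400-foot reel recorded at 800 bytes per inch once inter-block gaps and blocking overhead are allowed for (the 800-bpi density is assumed here for illustration).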
The preceding are some of the facts and considerations that have led us to where we are today: in the midst of converting to third generation systems. We believe that we had a most efficient second generation installation. We utilized the resources effectively and created a smoothly running program. Now, as we move forward, we have no choice but to live with, and to remain compatible with, what we created in the past, pending redesign of master records and processing systems. The biggest problem that faces us in the conversion to the third generation is the need to keep the social security program running smoothly while making a gradual transition. Each month, 25 million beneficiary checks must be mailed; new claims for OASDI benefits must be processed and added to the beneficiary reels; and the utilization of health insurance benefits must be recorded.

Returning to the training aspect, we find that a properly paced education program is the key to an orderly conversion period from second to third generation systems. To date, virtually our entire programming staff has received training on third generation systems. This training program had some problems of its own. The veteran programmers have a wealth of knowledge and experience with second generation equipment. They are skilled in their own fields. They had to start from scratch. They had to return to class and learn new concepts and jargon, hardware/software, etc.; in short, they (the pros) were trainees. They not only had to learn new programming skills, but had to keep current operations going full tilt. And all this at a time when we are working in a rapidly expanding work environment. One of the problems we encountered in training for the third generation was the scheduling of people for these classes. The people who needed training were also needed to keep our day-to-day activities current with planned modifications, necessary changes, and scheduled commitments. We solved this by conducting half-day training sessions. Without in-house training capability, training and conversion would have been seriously hindered.

We envision that in several years a real-time claims process will be available that will permit "instant updating" of our master files. To do this we will have to eliminate our tape files and the 5,000 mountings required each day in our present system. The real-time files would take the form of large scale random access devices and mass storage devices which are capable of supporting a continuous updating process. Response to inquiry and request for action would be based on the most recent data possible, which would be instantly accessible. For example, we anticipate that a district office will be able to request information over a teleprocessing system and receive a reply the same day. The telecommunications linkage is already available; all that is needed is a means of instantly tapping the file to retrieve and forward the requested data. From 75 to 100 billion bytes of information will have to be accessible. Our earnings file alone will probably require 40 billion bytes. To support this vast information storage and retrieval system there will need to be high speed printers or graphic display and photocopy units in each district office. The same devices will be used in other locations where correspondence is handled or where action decisions are made outside of the automated system and input to the system. The same basic capabilities will enable us to process a large number of claims in the briefest imaginable time. Data will be wired by the district office, the earnings record will be summarized instantly, and complete claims information will be channelled through to a point where, on receipt of a signal that no problem exists, or on receipt of correcting information, the payment of benefits will be started.
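As a rough check, those storage estimates are mutually consistent: 40 billion bytes spread over the 185 million earnings accounts averages a little over 200 bytes per summary record, and subtracting the earnings file from the 75 to 100 billion byte total leaves roughly 35 to 60 billion bytes for the beneficiary, health insurance, and other files.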
To support this system of real-time access and instantly updated input, devices will be needed which place claims application forms and reports requiring action in as direct contact as possible with the EDP system. With proper design of input documents and low equipment cost, the idea of optical scanning devices in district offices will be entertained. These optical scanners will have to handle not only claims application data, but reports prepared by beneficiaries as well. As mentioned earlier, the lines of communication exist today. With the development of random access files and rapid response input/output devices, our ideal system can be attained perhaps within the next several years.

In summary, at present, Automatic Data Processing in the Social Security Administration is in a highly dynamic state of flux from second generation to third generation systems. At the same time our sights must be on the issues and problems we will face with the fourth generation. At the same time, and almost in spite of this, our basic mission must remain ever dominant: the administration of the terms of the Social Security Act, with all its amendments and related legislation, in a timely fashion and with due regard for the rights and needs of that segment of the public whom we serve.

BIBLIOGRAPHY

1 A DRATTELL The people problem Business Automation pp 34-41 November 1968
2 R PASTON Organization of in-house education Proceedings of the 1968 International DPMA Conference pp 225-231
3 A BIAMONTE Predicting success in programmer training Proceedings of the Second Annual Computer Personnel Research Conference pp 9-12

AMERICAN FEDERATION OF INFORMATION PROCESSING SOCIETIES (AFIPS)

OFFICERS and BOARD OF DIRECTORS OF AFIPS

President: Mr. Keith W. Uncapher, The RAND Corporation, 1700 Main Street, Santa Monica, California 90406
Vice President: Dr. Richard I. Tanaka, California Computer Products, Inc., 305 North Muller Street, Anaheim, California 92803
Secretary: Mr. R. G. Canning, Canning Publications, Inc., 134 Escondido Avenue, Vista, California 92083
Treasurer: Dr. Robert W. Rector, Informatics, Inc., 5430 Van Nuys Boulevard, Sherman Oaks, California 91401
Executive Director: Dr. Bruce Gilchrist, AFIPS Headquarters, 210 Summit Avenue, Montvale, New Jersey 07645
Executive Secretary: Mr. H. G. Asmus, AFIPS Headquarters, 210 Summit Avenue, Montvale, New Jersey 07645

ACM Directors: Professor Anthony Ralston, State University of New York Computing Center, 4250 Ridge Lea Road, Amherst, New York; Dr. B. A. Galler, Computing Center, University of Michigan, Ann Arbor, Michigan 48104; Mr. Donn B. Parker, Stanford Research Institute, 333 Ravenswood Avenue, Menlo Park, California 94025

IEEE Directors: Mr. L. C. Hobbs, Hobbs Associates, Inc., P.O. Box 686, Corona del Mar, California 92625; Dr. Robert A. Kudlich, Wayland Laboratory, Raytheon Company, Boston Post Road, Wayland, Massachusetts 01778; Dr. Edward J. McCluskey, Dept. of Electrical Engineering, Stanford University, Palo Alto, California 94305

Simulation Councils Director: Mr. James E. Wolle, General Electric Company, Missile & Space Division, P.O. Box 8555, Philadelphia, Pennsylvania 19101

American Society for Information Science Director: Mr. Herbert Koller, ASIS, 2011 Eye Street, N.W., Washington, D.C. 20006

Association for Computational Linguistics Director: Dr. Donald E. Walker, Head, Language and Text Processing, The Mitre Corporation, Bedford, Massachusetts 01730

Special Libraries Association Director: Mr. Burton E. Lamkin, National Agricultural Library, U.S.
Department of Agriculture, Beltsville, Maryland

Society for Information Display Director: Mr. William Bethke, RADC (EME, W. Bethke), Griffiss Air Force Base, New York 13440

Society for Industrial and Applied Mathematics Director: Dr. D. L. Thomsen, Jr., IBM Corporation, Armonk, New York 10504

AFIPS Committee Chairmen

Awards: Mr. Fred Gruenberger, 5000 Beckley Avenue, Woodland Hills, California 91364
Constitution & Bylaws: Mr. Richard G. Canning, Canning Publications, Inc., 134 Escondido Avenue, Vista, California 92083
Admissions: Dr. Robert W. Rector, Informatics, Inc., 5430 Van Nuys Boulevard, Sherman Oaks, California 91401
Ad Hoc Conference Committee: Dr. Barry Boehm, Computer Science Department, The RAND Corporation, 1700 Main Street, Santa Monica, California 90406
Education: Dr. Melvin A. Shader, CSC-Infonet, 650 N. Sepulveda Blvd., El Segundo, California 90245
Finance: Mr. Walter L. Anderson, General Kinetics, Inc., 11425 Isaac Newton Square So., Reston, Virginia 22070
Harry Goode Memorial Award: Mr. Brian W. Pollard, Radio Corporation of America-NPL, 200 Forest Street, Marlboro, Massachusetts 01752
IFIP Congress 71: Dr. Herbert Freeman, Professor of Electrical Engineering, New York University, University Heights, New York, New York 10453
International Relations: Dr. Richard I. Tanaka, California Computer Products, Inc., 305 North Muller Street, Anaheim, California 92803
JCC Conference: Dr. A. S. Hoagland, IBM Research Center, P.O. Box 218, Yorktown Heights, New York 10598
Information Systems: Miss Margaret Fox, Office of Computer Information, U.S. Department of Commerce, National Bureau of Standards, Washington, D.C.
JCC Technical Program: Dr. David R. Brown, Stanford Research Institute, 333 Ravenswood Avenue, Menlo Park, California 94025

JCC General Chairmen

1970 FJCC: Mr. Robert A. Sibley, Jr., Department of Computer Science, University of Houston, Cullen Boulevard, Houston, Texas 77004
1971 SJCC: Mr. Jack Moshman, RAMSCO, 6400 Goldboro Road, Bethesda, Maryland 20034
1971 FJCC: Mr. Ralph R. Wheeler, Lockheed Missiles and Space Co., Dept. 19-31, Bldg. 151, P.O. Box 504, Sunnyvale, California 94088

1970 SJCC STEERING COMMITTEE

General Chairman: Harry L. Cooke, RCA Laboratories
Vice Chairman: William C. Tarvin, UNIVAC
Technical Program: James H. Bennett, Chairman, Applied Logic Corporation; Horace Fisher, Vice Chairman, Applied Logic Corporation; Paul Chinitz, UNIVAC; Martin Goetz, Applied Data Research; David E. Lamb, University of Delaware; Thomas H. Mott, Rutgers University; Joseph Raben, Queens College of the City of N.Y.; C. V. Srinivasan, Rutgers University; Sheldon Weinberg, Cybernetics International
Public Relations: Conrad Pologe, Chairman, AT&T; J. Bradley Stroup, Vice Chairman, General Electric Co.
Special Activities: Edward A. Meagher, Chairman, Hoffman La Roche Inc.; Tom Carscadden, Vice Chairman, The Schering Corp.
Registration: Richard A. Bautz, Chairman, Axicom Systems Inc.; Mark Ricca, Vice Chairman, Comsul Ltd.
Ladies Activities: Peggy Crossan, Chairman, RCA; Rita Morley, Vice Chairman, RCA
Publications: Donald Prigge, Chairman, UNIVAC; Phillip A. Antonello, Vice Chairman, UNIVAC
Treasurer: James T. Dildine, Price Waterhouse & Co.; Stanley R. Keyser, Vice Treasurer, Price Waterhouse & Co.
Exhibits: Ed Snyder, Chairman, Lou Zimmer Organization; Herbert Richman, Vice Chairman, Data General Corp.
Secretary: Edwin L. Podsiadlo, Dataram Corp.; Ronald T. Avery, Vice Secretary, Dataram Corp.
Local Arrangements: John J. Geier, Chairman, Univac; Edward L. Hartt, Vice Chairman, N. J. Bell Telephone Co.
SCI Representative: Jess Chernak, Bell Telephone Labs
ACM Representative: A. B. Tonik, UNIVAC
IEEE Representative: N. R. Kornfield, Data Engineering Associates

SESSION CHAIRMEN, PANELISTS, DISCUSSANTS, REFEREES

SESSION CHAIRMEN

Saul Amarel Peter Denning Thomas De Marco George Dodd Robert Forest Martin Goetz Julien Green Herbert Greenberg Albin Hastbacka Theodore Hess H. K. Johnson R. A. Kaenel Alvin Kaltman John L. Knupp, Jr. David Lamb James F. Leathrum A. Metaxides John Morrissey Stuart L. Mathison Marvin Paull Howard R. Popper Joseph Raben James Rainey David Ressler Lawrence Roberts William Rogers Hal Sackman William Schiesser John Seed R. E. Utman Sheldon Weinberg Kendall Wright Karl Zinn

PANELISTS

Andrew Aylward Roger M. Bakke Kenneth R. Barbour Ed Berg H. Borko Donald Croteau John C. Cugini S. H. Chasen Steven A. Coons Jack Dennis O. E. Dial Ernest Dieterich Saul Dinman Philip H. Dorn S. Drescher R. A. Dunlop E. S. Dunn, Jr. Joel D. Erdwinn Alfred Ess Robert B. Forest John L. Gentile Jack Goeken A. J. Goldstein David McGonagle Donald N. Graham Frank Greatorex Kelly E. Griffith Herbert J. Grosch Margaret Harper Frank E. Heart Vico Henriques Bertram Herzog Richard Hill L. Kestenbaum Charles Lecht Lawrence I. Lerner William Lewish William R. Lonergan Michael Maccoby John McCarthy Carl Machover Sam Matsa George H. Mealy M. A. Melkanoff Therber Moffett Allen Newell Seymour Papert E. Parker Mike Patterson Ralph Pennington Michael I. Rackman L. John Rankin Carl Reynolds Leonard Rodberg Arthur Rosenberg Robert Rossheim Gordon R. Sanborn Robert Simmons Dan Sinnott William G. Smeltzer, Sr. David M. Smith William D. Stevens Harrison Tellier Frederick B. Thompson Carl Vorlander Larry H. Walker, Jr. Philip M. Walker William E. Ware Frank Wesner

DISCUSSANTS

James P. Fry A. N. Habermann B. Huberman Butler W. Lampson H. W. Lawson Gene Levy R. McClure R. E. Merwin T. Kevin O'Gorman Richard Robnett Jerome H. Saltzer Edgar A. Sibley Dudley Warner Thomas Wills-Sandford

REFEREES

Ralph Alter Stanley Altman Paul Baran T. Benjamin W. A. Beyer Garrett Birkhoff John Bruno Edward Burfine Walter Burkhard Peter Calingaert W. F. Chow C. Christensen J. W. Cooley Peter J. Denning Donald O. Doss William B. Easton R. D. Elbouin Everett Ellin Edward R. Estes George A. Fedde Wallace Feurzeig Tudor R. Finch James Flanagan Franklin H. Fowler, Jr. Margaret R. Fox Al Frazier C. V. Freiman James P. Fry Edward Fuchs L. M. Fulton Adolph Futterweit Reed M. Gardner John Gary James B. Geyer Clarence Giese M. C. Gilliland G. Golub M. H. Gotterer Alonzo G. Grace, Jr. J. Greenfield Donald W. Grissinger George F. Grondin W. B. Groth Frank G. Hagin Murray J. Haims Carl Hammer Fred M. Haney A. G. Hanlon R. Dean Hartwick A. D. Hause John F. Heafner Walter A. Helbig Bertram Herzog Elias H. Hochman R. W. Hockney A. D. C. Holden Robert L. Hooper R. Howe S. Hsu Thomas A. Humphrey Albert S. Jackson Ronald Jeffries R. A. Kaenel Marvin J. Kaitz R. Kalaba Ted Kallner M. S. Kephign Robert E. King Edward S. Kinney Justin Kodner Igal Kohari John Kopf G. A. Korn A. B. Kronenberg Jerome Kurtzberg Kenneth C. Kwan Dominic A. Laiti Richard I. Land W. D. Lansdown D. J. Lasser Gary Leaf Gene Levy W. Wayne Lichtenberger Minna Lieberman J. Lindsay Robert Linebarger Dimitry A. Lukshin R. McDowell Charles M. Malone Carl W. Malstrom Paul Meissner Leslie Mezei Stephen W. Miller Baker A. Mitchell, Jr. M. Moe Robert P. Myers Joseph A. O'Brien T. Kevin O'Gorman Thomas C. O'Sullivan Thomas Pyke Toby Robison G. Rybicki Asra Sasson Elmer Shapiro C. K. Show E. H. Sibley J. R. Splear Mary Stevens Thomas Stockham Fred Tonge R.
Vichnevetsky Dudley Warner Jerome Wiener Calvin Wilcox Lyle C. Wilcox David G. Williams Thomas Wills-Sandford James E. Wolle Sherrell L. Wright Yu-chi Ho

1970 SJCC PRELIMINARY LIST OF EXHIBITORS

ACM Addison-Wesley Publishing Company Addmaster Corporation Addressograph Multigraph Corporation Advance Research, Inc. Advanced Memory Systems, Inc. Advanced Space Age Products, Inc. Advanced Terminals Inc. AFIPS Press Airoyal Mfg. Co. Allen-Babcock Computing, Inc. Allied Computer Systems Inc. Allied Computer Technology Inc. American Data Systems American Regitel Corporation American Telephone & Telegraph AMP Incorporated Ampex Corporation Anderson Jacobson, Inc. Applied Data Research, Inc. Applied Digital Data Systems, Inc. Applied Dynamics Inc. Applied Logic Corporation Applied Magnetics Corporation Applied Peripheral Systems, Inc. Astrocom Corporation Astrodata, Inc. Atlantic Technology Corporation Atron Corporation Auerbach Info, Inc. Auricord-Div. of Scoville Co. Auto-Trol Corporation Axicom Systems, Inc. Beehive Electrotech Inc. Bendix Corp.-Advanced Products Div. Beta Instrument Corporation BIT, Incorporated Boole & Babbage, Inc. Bridge Data Products, Inc. Brogan Associates, Inc. Bryant Computer Products Bucode Inc. Bunker-Ramo Corp.-Bus. & Ind. Div. Business Press International, Inc. California Computer Products, Inc. Cambridge Memories, Inc. Carterfone Communications Corporation Century Data Systems, Inc. Cincinnati Milling Machine Co. Cipher Data Products, Inc. Clary Datacomp Systems, Inc. Codex Corporation Cogar Corporation Cognitronics Corporation Colorado Instruments, Inc. Comcet, Inc. Community Computer Corporation Compat Corporation Compiler Systems Inc. CompuCord, Inc. Computek, Inc. Computer Automation, Inc. Computer Design Publishing Computer Devices, Inc. Computer Digital Systems, Inc. Computer Displays, Inc. Computer Learning & Systems Corporation Computer-Link Corporation Computer Micro-Image Systems, Inc. Computer Operations Computer Optics, Inc. Computer Peripherals Corporation Computer Products, Inc. Computer Sciences Corporation Computer Signal Processors, Inc. Computer Synectics Computer Terminal Corporation Computer Transceiver Systems, Inc. Computervision Corporation Computerworld Consolidated Computer Services Limited Courier Terminal Systems, Inc. Cybermation, Inc. Daedalus Computer Products, Inc. Dasa Corporation Data 100 Corporation Data Action Corporation Data Automation Communications Data Card Corporation Data Computer Systems, Inc. Data Computing, Inc. Datacraft Corporation Data Disc Dataflo Business Machines Corporation Data General Corporation Dataline Inc. Datamate Computer Systems, Inc. Datamation Datamax Corporation Data Printer Corporation Datapro Research Data Processing Magazine Data Products News Data Products Corporation Dataram Corporation Data Systems News DataTerm Inc. Data Terminal Systems Datatrol Corporation Datatrol Inc. Datran Corporation Delta Data Systems Corporation Digi-Data Corporation Digital Equipment Corporation Digital Information Devices, Inc. Digital Scientific Corporation Digitronics Corporation DSI Systems, Inc. Dynelec Systems Corporation Eastman Kodak Co.-Bus. Sys. Mark. Div. EG&G EDP Technology Inc. Edutronics Edwin Industries Corporation Electronic Arrays Components Div. Electronic Arrays Systems Div. Electronic Associates, Inc. Electronic Information Systems, Inc. Electronic Memories Electronic News-Fairchild Publications EMR Computer Engineered Data Peripherals Corporation Fabri-Tek Inc. Facit-Odhner, Inc. Factsystem Inc.
Ferroxcube Corporation Ford Industries Foto-Mem, Inc. General Automation Inc. General Computers, Inc. General Electric Company Gerber Scientific Instrument Co. Gould, Inc.-Graphics Division GRI Computer Corporation Hayden Publishing Company, Inc. Hazeltine Corporation Hewlett-Packard Hitachi Honeywell-CCD Honeywell-EDP Houston Instrument IBM Corporation IEEE IER Corporation Image Systems, Inc. Imlac Corporation Industrial Computer Systems, Inc. Info-Max Inforex, Inc. Information Control Corporation Information Data Systems, Inc. Information International Information Storage Systems, Inc. (ISS) Information Technology Inc. (ITI) Information Technology, Incorporated Infotec, Inc. Infotechnics, Inc. Infoton, Inc. Interdata, Inc. Interface Mechanisms, Inc. International Computer Products, Inc. International Computers Limited International Data Sciences, Inc. Interplex Corporation Iomec, Inc. Jacobi Systems Corporation Kennedy Company Keymatic Data Systems Corporation Kybe Corporation Litton-Automated Business Machines Litton-Datalog Division Lockheed Electronics Company Logic Corporation McGraw-Hill Book Company 3M Company Madatron Corporation MAI Mandate Systems, Inc. Marshall Data Systems Mechanical Enterprises, Inc. Megadata Memorex Corporation Memory Technology, Inc. Merlin Systems Corporation Micro Switch-Div. of Honeywell Micro Systems Inc. Microwave Communications of America, Inc. Milgo Electronic Corporation Modern Data Mohawk Data Sciences Corporation Monitor Data Corporation Monitor Displays Motorola Instrumentation & Control Inc. NCR-Industrial Products Division NCR-Paper Sales Nortec Computer Devices, Inc. Nortronics Company, Inc. Novar Corporation Novation, Inc. Omega-T Systems Inc. Omnitec-A Nytronics Corp. Path Computer Equipment Penta Computer Associates, Inc. Penril Data Communications, Inc. Peripheral Dynamics Inc. Peripheral Equipment Corporation Peripherals General, Inc. Perspective Systems, Inc. Potter Instrument Precision Instrument Company Prentice-Hall, Inc. Press Tech, Inc. Princeton Electronic Products Quantum Science Corporation Quindar Electronics, Inc. Raytheon Company RCA Corp.-Electronic Components RCA Corp.-Graphic Systems Division RCA Corp.-Memory Products Division RCA Ltd.-Divcon Systems Reactive Computer Systems Redcor Corporation Remex Electronics-Div. Ex-Cell-O Corp. Research & Development (F. D. Thompson) Resistors RFL Industries, Inc. Rixon Electronics Inc. Rolm Corporation Royco Instruments, Inc. Sanders Associates, Inc. Sangamo Scan-Data Corporation Scan-Optics, Inc. Scientific Time Sharing Corporation Scope The Service Bureau Corporation Shepard/Div. of Vogue Instrument Corp. Singer-Friden Division Singer-Librascope Singer-Tele-Signal Corp. Sonex, Inc. Spartan Books Spiras Systems, Inc. Standard Memories-Sub. Applied Magnetics Storage Technology Corporation Sycor, Inc. Sykes Datatronics, Inc. Syner-Data, Inc. Sys Associates, Inc. Systematics/Magne-Head-Div. Gen. Instrument Systems Engineering Labs Tally Corporation TDK Electronics Company, Ltd. TEC, Inc. Technitrend Inc. Tektronix, Inc. Teletype Corporation Telex Computer Products Tel-Tech Corporation Tempo Computers Inc. Texas Instruments Timeplex, Inc. Time-Sharing Terminals, Inc. Tracor Data Systems Inc. Trio Laboratories, Inc. Tymshare, Inc. Typagraph Corporation Univac-Div. of Sperry Rand University Computing Company UTE (United Telecontrol) Vanguard Data Systems Varian Data Machines Varisystems Corporation Vermont Research Corporation Versatec, Inc.
Vernitron Corporation Viatron Computer Systems Victor Comptometer Corporation Wang Computer Products, Inc. (WCP) Wang Labs Wanlass Electric Co. Weismantel Associates, Inc. Western Union John Wiley & Sons, Inc. Xerox Corp.-Business Products Group Xynetics, Inc.

AUTHOR INDEX

Abate, J., 143; Anderson, R., 653; Andrews, D. W., 131; Armenti, A., 313; Arthurs, E., 267; Ash, W. L., 11; Avizienis, A., 95, 375; Barsamian, H., 183; Bartlett, W. S., 267; Bartow, N., 191; Baskett, F., 459; Bell, C. G., 351; Bell, G., 657; Bouknight, J., 1; Brown, P. A., 251; Browne, J. C., 459; Burnett, G. J., 467; Cady, R., 657; Cardenas, A. F., 513; Carr, C. S., 589; Cerf, V. G., 589; Chooljian, S. K., 297; Chou, W., 581; Christensen, C., 673; Church, C. C., 343; Coady, E. R., 701; Coffman, E. G., 467; Colony, R., 409; Coury, F. F., 667; Crandall, C. E., 485; Crocker, S. D., 589; Crowther, W. R., 551; Delagi, B., 657; Dial, O. E., 449; Donow, H. S., 287; Dorf, R. C., 607; Dressen, P. C., 307; Dubner, H., 143; Engvold, K. J., 599; Erickson, R. F., 281; Eskelson, N., 539; Foster, D. F., 649; Frank, H., 581; Frisch, I. T., 581; Galley, S., 313; Geary, L. C., 437; Gerard, R. C., 237; Gibbs, N. E., 91; Goldberg, R., 313; Gracon, T. J., 31; Grishman, R., 59; Hackney, S., 275; Harrison, M. C., 507; Hause, A. D., 673; Heart, F. E., 551; Holmes, D. W., 687; Hope, H. H., 487; Huesman, L. R., 629; Hughes, J. L., 599; Johnson, E. L., 19; Jones, C. H., 599; Kahn, R. E., 551; Karplus, W. J., 513; Katzan, H., 109; Kelley, K., 1; Kleinrock, L., 453, 569; Kopetz, H., 539; Ladd, D. J., 267; Levine, D. A., 487; Li, C. C., 437; Lipovski, G. J., 385; Litofsky, B., 323; Liu, H., 687; McFarland, H., 657; McGuire, R., 191; Manna, Z., 83; Mathur, F. P., 375; Mei, P. S., 91; Molho, L. M., 135; Morgan, H. L., 217; Myers, J. E., 297; Nelson, P., 397; Newell, A., 351; Newport, C. B., 681; Nolan, J., 313; Noonan, R., 657; O'Laughlin, J., 657; Ornstein, S. M., 551; Ossanna, J. F., 621; Perlis, H. W., 475; Potts, G. W., 333; Prasad, N., 223; Prywes, N. S., 323; Raamot, J., 39; Radice, R. A., 131; Raike, W. M., 459; Ramamoorthy, C. V., 165; Reinfelds, J., 539; Reynolds, R. R., 409; Rigby, C. O., 251; Ritchie, G. J., 613; Roberts, L. G., 543; Rouse, D. M., 207; Salmon, R. L., 267; Saltzer, J. H., 621; Sargent, M., 525; Schneider, V., 493; Selwyn, L. L., 119; Serlin, O., 237; Shakun, M. L., 487; Sholl, A., 313; Sibley, E. H., 11; Steingrandt, W. J., 65; Strand, E. M., 475; Symes, L. R., 157; Szygenda, S. A., 207; Taylor, R. W., 11; Thompson, E. W., 207; Thurber, K. J., 51; Tomalesky, A. W., 43; Trauboth, H., 223, 251; Tsuchiya, M., 165; Tung, C., 95; Turner, J. A., 613; Vemuri, V., 403; Vichnevetsky, R., 43; Walden, D. C., 551; Walters, N. L., 77; Wear, L. L., 607; Wessler, B. D., 543; Whipple, J. H., 267; White, H. J., 199; Wixson, S. E., 475; Wong, S. Y., 417; Yau, S. S., 65; Yu, E. K. C., 199; Zukin, A. S., 417

