AFIPS
CONFERENCE
PROCEEDINGS
VOLUME 35

1969
FALL JOINT
COMPUTER
CONFERENCE
November 18 - 20, 1969
Las Vegas, Nevada

The ideas and opinions expressed herein are solely those of the authors and are
not necessarily representative of or endorsed by the 1969 Fall Joint Computer
Conference Committee or the American Federation of Information Processing
Societies.

Library of Congress Catalog Card Number 55-44701
AFIPS PRESS
210 Summit Avenue
Montvale, New Jersey 07645

© 1969 by the American Federation of Information Processing Societies, Montvale,
New Jersey, 07645. All rights reserved. This book, or parts thereof, may not be
reproduced in any form without permission of AFIPS Press.

Printed in the United States of America

CONTENTS

OPERATING SYSTEMS
A survey of techniques for recognizing parallel processable streams
  in computer programs ........................................... 1
    M. J. Gonzalez, C. V. Ramamoorthy
Performance modeling and empirical measurements in a system
  designed for batch and time-sharing users ...................... 17
    J. E. Shemer, D. W. Heying
Dynamic protection structures .................................... 27
    B. W. Lampson
The ADEPT-50 time sharing system ................................. 39
    R. R. Linde, C. Weissman, C. Fox
An operational memory share supervisor providing multi-task
  processing within a single partition ........................... 51
    J. E. Braun, A. Gartenhaus

ARRAY LOGIC-LOGIC DESIGN OF THE 70's
Structured logic ................................................. 61
    R. A. Henle, I. T. Ho, G. A. Maley, R. Waxman
Characters-Universal architecture for LSI ........................ 69
    F. D. Erwin, K. J. Thurber
Fault location in cellular arrays ................................ 81
    C. V. Ramamoorthy
Fault multiplication cellular arrays for LSI implementation ...... 89
    S. C. Economides
The pad relocation technique for interconnecting LSI arrays of
  imperfect yield ................................................ 99
    D. F. Calhoun

COMPUTERS FOR CONGRESS
(Panel Session-No papers in this volume)

THE COMPUTER SECURITY AND PRIVACY CONTROVERSY
The application of cryptographic techniques to data processing ... 111
    R. O. Skatrud
Security controls in the ADEPT-50 time-sharing system ............ 119
    C. Weissman
Management of confidential information ........................... 135
    E. V. Comber

PROGRAMMING LANGUAGES AND LANGUAGE PROCESSORS
Some syntactic methods for specifying extendible programming
  languages ...................................................... 145
    V. Schneider
SYMPLE-A general syntax directed macro processor ................. 157
    J. E. Vander Mey, R. C. Varney, R. E. Patchen
An algebraic extension to LISP ................................... 169
    P. Knowlton
An on-line machine language debugger for OS/360 .................. 179
    W. H. Josephs
The Multics PL/1 compiler ........................................ 187
    R. A. Freiburghouse

FORTHCOMING COMPUTER ARCHITECTURES
A design for a fast computer for scientific calculations .........
    P. M. Melliar-Smith
A display processor design ....................................... 209
    R. W. Watson, T. H. Myer, I. E. Sutherland
The system logic and usage recorder .............................. 219
    M. K. Vosbury, R. W. Murphy
Implementation of the NASA modular computer with LSI functional
  characters ..................................................... 231
    J. J. Pariser, H. E. Maurer

DIGITAL SIMULATION OF CONTINUOUS SYSTEMS
Project DARE: Differential analyzer replacement by on-line
  digital simulation ............................................. 247
    G. A. Korn
MOBSSL-UAF: An augmented block structured continuous systems
  simulation language for digital and hybrid computers ........... 255
    D. S. Miller, M. J. Merritt
A hybrid computer programming system ............................. 275
    M. A. Franklin, J. C. Strauss
Hybrid executive-User's approach ................................. 287
    W. L. Graves, R. A. MacDonald

PROBLEMS IN MEDICAL DATA PROCESSING
A system for clinical data management ............................ 297
    R. A. Greenes, A. N. Pappalardo, C. W. Marble, G. O. Barnett
Medical education: A challenge for natural language analysis,
  artificial intelligence, and interactive graphics .............. 307
    J. C. Weber, W. D. Hagamen

ARCHITECTURES FOR LONG TERM RELIABILITY
Design principles for processor maintainability in real-time
  systems ........................................................ 319
    H. Y. Chang, J. M. Scanlon
Effects and detection of intermittent failures in digital
  systems ........................................................ 329
    M. Ball, F. Hardie
Modular computer architecture strategies for long-term missions .. 337
    F. D. Erwin, E. Bersoff
A compatible airborne multiprocessor ............................. 347
    E. J. Dietrich, L. C. Kaye

PUBLISHING VERSUS COMPUTING
(Panel Session-No papers in this volume)

INFORMATION MANAGEMENT SYSTEMS FOR THE 70's
(Panel Session-No papers in this volume)

WHAT HAPPENED TO LSI PROMISES
LSI-Past promises and present accomplishment-The dilemma
  of our industry ................................................ 359
    H. G. Rudenberg
What has happened to LSI-A supplier's view ....................... 369
    C. G. Thornton

TOPICS IN ON-LINE TECHNIQUES
Real-time graphic display of time-sharing system operating
  characteristics ................................................ 379
    J. M. Grochow
A graph manipulator for on-line network picture processing ....... 387
    H. A. DiGiulio, P. L. Tuan
On-line recognition of hand generated symbols .................... 399
    G. M. Miller

MANAGING MONEY WITH COMPUTERS
(Panel Session-No papers in this volume)

DATA BASE AND FILE MANAGEMENT STRATEGIES
Common file organization techniques compared ..................... 413
    N. Chapin
An information retrieval system based on superimposed coding ..... 423
    J. R. Files, H. D. Huskey
Establishment and maintenance of a storage hierarchy for an
  on-line data base under TSS/360 ................................ 433
    J. P. Considine, A. H. Weiss
Resources management subsystem for a large corporate
  information system ............................................. 441
    H. Liu, W. S. Peck, P. T. Pollard
Incorporating complex data structures in a language designed for
  social science research ........................................ 453
    S. Kidd, Jr.

CIRCUIT/MEMORY INNOVATIONS
A nanosecond threshold logic gate ................................ 463
    L. Micheel
Silicon-on-sapphire complementary MOS circuits for high speed
  associative memory ............................................. 469
    J. R. Burns, J. H. Scott
A main frame semiconductor memory for fourth generation
  computers ...................................................... 479
    T. W. Hart, Jr., D. W. Hillis, J. Marley, R. C. Lutz, C. R. Hoffman
A new approach to memory and logic-Cylindrical domain devices .... 489
    A. H. Bobeck, R. F. Fischer, A. J. Perneski
A new integrated magnetic memory ................................. 499
    M. Blanchon, M. Carbonel
Mated film memory-Implementation of a new design and
  production concept ............................................. 505
    L. A. Prohofsky, D. W. Morgan

THE IMPACT OF STANDARDIZATION FOR THE 70's
(Panel Session-No papers in this volume)

USING COMPUTERS IN EDUCATION
A computer engineering laboratory ................................ 515
    D. M. Robinson
Evaluation of an interactive display system for teaching
  numerical analysis ............................................. 525
    P. Oliver, F. P. Brooks, Jr.
Computer based instruction in computer programming: A symbol
  manipulation-List processing approach .......................... 535
    P. Lorton, Jr., J. Slimick

COMPUTER RELATED SOCIAL PROBLEMS: EFFECTIVE ACTION ALTERNATIVES
(Panel Session-No papers in this volume)

DEVELOPING A SOFTWARE ENGINEERING DISCIPLINE
(Panel Session-No papers in this volume)

PROPRIETARY SOFTWARE PRODUCTS
(Panel Session-No papers in this volume)

HARDWARE TECHNIQUES FOR INTERFACING MAN WITH THE COMPUTER
A touch sensitive X-Y position encoder for computer input ........ 545
    A. M. Hlady
A queueing model for scan conversion ............................. 553
    T. W. Clay, Jr.
Character generation from resistive storage of time derivatives .. 561
    M. L. Dertouzos
Economical display generation of a large character set ........... 569
    K. Nezu, S. Naito

COMPUTER-AIDED DESIGN OF COMPUTERS
ISDS: A program that designs computer instruction sets ........... 575
    F. M. Haney
Directed library search to minimize cost ......................... 581
    B. A. Chubb
Computer-aided design for custom integrated systems .............. 599
    W. K. Orr

MANAGEMENT PROBLEMS IN HYBRID COMPUTER FACILITIES
(Panel Session-No papers in this volume)

COMPUTER OUTPUT MICROFILM SYSTEMS
An overview of the computer output microfilm field ............... 613
    D. M. Avedon
The microfilm page printer-Software considerations ............... 625
    S. A. Brown
Computer microfilm: A cost cutting solution to the EDP output
  bottleneck ..................................................... 629
    J. K. Koeneman, J. R. Schwanbeck

THE FUTURE IN DATA PROCESSING WITH COMMUNICATIONS
A case study of a distributed communications-oriented data
  processing system .............................................. 637
    N. Nisenoff
Analysis of the communications aspects of an inquiry-response
  system ......................................................... 655
    J. S. Sykes
A study of asynchronous time division multiplexing for
  time-sharing computer systems .................................. 669

TOPICAL PAPERS
The involved generation: Computing people and the disadvantaged .. 679
    D. B. Mayer
The CUE approach to problem solving .............................. 691
    J. D. McCully
Self-contained exponentiation .................................... 701
    N. W. Clark, W. J. Cody
DCDS digital simulating system ................................... 707
    H. Potash, D. Allen, S. Joseph
Pattern recognition in speaker verification ...................... 721
    S. K. Das, W. S. Mohn

HYBRID TECHNIQUES AND APPLICATIONS
A hybrid/digital software package for the solution of chemical
  kinetic parameter identification problems ...................... 733
    A. M. Carlson
Extended space technique for hybrid computer solution of partial
  differential equations ......................................... 751
    D. J. Newman, J. C. Strauss
Extension and analysis of use of derivatives for compensation of
  hybrid solution of linear differential equations ............... 761
    N. H. Kemp, W. Benson
HYPAC-A hybrid-computer circuit simulation program ............... 771
    P. Balaban, J. P. Fiedler

REAL-TIME HYBRID COMPUTATIONAL SYSTEMS
A time-shared I/O processor for real-time hybrid computation ..... 781
    T. R. Strollo, R. S. Tomlinson, E. R. Fiala
On-line software checkout facility for special purpose computers . 789
    T. H. Witzel, S. S. Hughes
A hybrid frequency response technique and its application to
  aircraft flight flutter testing ................................ 801
    J. M. Simmons

A survey of techniques for recognizing
parallel processable streams in
computer programs*

by C. V. RAMAMOORTHY and M. J. GONZALEZ
The University of Texas
Austin, Texas

INTRODUCTION

State-of-the-art advances, in particular the anticipated advances
generated by LSI, have given fresh impetus to research in the area
of parallel processing. The motives for parallel processing include
the following:

1. Real-time urgency. Parallel processing can increase the speed of
computation beyond the limit imposed by technological limitations.
2. Reduction of turnaround time of high priority jobs.
3. Reduction of memory and time requirements for "housekeeping"
chores. The simultaneous but properly interlocked operations of
reading inputs into memory and error checking and editing can reduce
the need for large intermediate storages or costly transfers between
members in a storage hierarchy.
4. An increase in simultaneous service to many users. In the field
of the computer utility, for example, periods of peak demand are
difficult to predict. The availability of spare processors enables
an installation to minimize the effects of these peak periods. In
addition, in the event of a system failure, faster computational
speeds permit service to be provided to more users before the
failure occurs.
5. Improved performance in a uniprocessor multiprogrammed
environment. Even in a uniprocessor environment, parallel
processable segments of high priority jobs can be overlapped so
that when one segment is waiting for I/O, the processor can be
computing its companion segment. Thus an overall speedup in
execution is achieved.

* This work was supported by NASA Grant NGR 44-012-144.

With reference to a single program, the term "parallelism" can be
applied at several levels. Parallelism within a program can exist
from the level of statements of procedural languages to the level
of micro operations. Throughout this paper, discussion will be
confined to the more general "task" parallelism. The term "task"
(process) generally is intended to mean a self-contained portion of
a computation which once initiated can be carried out to its
completion without the need for additional inputs. Thus the term
can be applied to a single statement or a group of statements.

In contrast to the way the term "level" was used above, task
parallelism can exist at several levels within a hierarchy of
levels. The statements of the main program of a FORTRAN program,
for example, are said to be tasks of the first level. The
statements within a subroutine called by the main program would
then be second level tasks. If this subroutine itself called
another subroutine, then the statements within the latter
subroutine would be of the third level, etc. Thus a sequentially
organized program can be represented by a hierarchy of levels as
shown in Figure 1. Each block within a level represents a single
task; as before, a task can represent a statement or a group of
statements.

Figure 1-Hierarchical representation of a sequentially
organized program

Figure 2-Sequential and parallel execution of a
computational process
Once a sequentially organized program is resolved
into its various levels, a fundamental consideration of
parallel processing becomes prominent-namely that
of recognizing tasks within individual levels which can
be executed in parallel. Assuming the existence of a
system which can process independent tasks in parallel,
this problem can be approached from two directions.
The first approach provides the programmer with
additional tools which enable him to explicitly indicate
the parallel processable tasks. If it is decided to make
this indication independent of the programmer, then
it is necessary to recognize. the parallel processable
tasks implicitly by analysis of the relationship between
tasks within the source program.
After the information is obtained by either of these
approaches, it must still be communicated to and
utilized by the operating system. At this point, efficient
resource utilization becomes the prime consideration.
The conditions which determine whether or not two
tasks can be executed in parallel have been investigated by
Bernstein.1 Consider several tasks, Ti, of a
sequentially organized program illustrated by a flow
chart as shown in Figure 2(a). If the execution of
task T3 is independent of whether tasks T1 and T2 are
executed sequentially as shown in Figure 2(a) or 2(b),
then parallelism is said to exist between tasks T1 and
T2. They can, therefore, be executed in parallel as
shown in Figure 2(c).
This "commutativity" is a necessary but not sufficient condition
for parallel processing. There may exist, for instance, two
processes which can be executed in either order but not in
parallel. For example, the inverse of a matrix A can be obtained
in either of the two ways shown below.

(1) a) Obtain transpose of A
    b) Obtain matrix of cofactors of the transposed matrix
    c) Divide result by determinant of A

(2) a) Obtain matrix of cofactors of A
    b) Transpose matrix of cofactors
    c) Divide result by determinant of A

Thus obtaining the matrix of cofactors and the transposition
operation are two distinct processes which can be executed in
alternate order with the same result. They cannot, however, be
executed in parallel.
Other complications may arise due to hardware
limitations. Two tasks, for example, may need to access
the same memory. In this and similar situations,
requests for service must be queued. Dijkstra, Knuth,
and Coffman2,3,4 have developed efficient scheduling
procedures for using common resources.
In terms of sets representing memory locations,
Bernstein has developed the conditions which must be
satisfied before sequentially organized processes can be
executed in parallel. These are based on four separate
ways in which a sequence of instructions can use a
memory location:

(1) The location is only fetched during the execution of Ti.
(2) The location is only stored during the execution of Ti.
(3) The first operation within a task involves a fetch with
respect to a location; one of the succeeding operations of Ti
stores in this location.
(4) The first operation within a task involves a store with
respect to a location; one of the succeeding operations of Ti
fetches this location.
Assuming a machine model in which processors are
allowed to communicate directly with the memory
and multi-access operations are permitted, the conditions for strictly parallel execution of two tasks or
program blocks can be stated as follows.
(1) The areas of memory which Task 1 "reads"
and onto which Task 2 "writes" should be mutually
exclusive, and vice-versa.
(2) With respect to the next task in a sequential
process, Tasks 1 and 2 should not store information in
a common location.

individual functional units can be assigned to independent
components within a task. The motivation remains the same: a
decrease in execution time of individual tasks. The CDC 6600, for
example, can utilize several arithmetic units to perform several
operations simultaneously. This type of parallelism can be
illustrated by the arithmetic expression which follows.

X = (A+B) * (C-D)

Normally, this expression would be evaluated in a manner similar
to that shown in Figure 3(a). The independent components within
the expression, however, permit parallel execution as shown in
Figure 3(b) with the same results.

Explicit and implicit parallelism
In the explicit approach to parallelism, the programmer himself indicates the tasks within a computational
process which can be executed in parallel. This is
normally done by means of additional instructions in
the programming language. This approach can be
illustrated by the techniques described by Conway,
Opler, Gosden, and others.5,6,7 FORK in the FORK
and JOIN technique6 indicates the parallel processability of a specified set of tasks within a process. The
next sequence of tasks will not be initiated until all

The conditions listed by Bernstein are sufficient to
guarantee commutativity and parallelism of two
program blocks. He has shown, however, that there do
not exist algorithms for deciding the commutativity or
parallelism of arbitrary program blocks.
As an example of what has been discussed here,
consider the tasks shown below which represent FORTRAN statements
for evaluation of three arithmetic expressions.

X = (A+B) * (A-B)
Y = (C-D) / (C+D)
Z = X + Y

Because the execution of the third expression is independent of
the order in which the first two expressions are executed, the
first two expressions can be executed in parallel.
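In modern terms, Bernstein's conditions on the three statements above can be checked mechanically from read and write sets. The sketch below is ours, not the paper's notation; the task representation is an assumption made for illustration.

```python
# Hedged sketch: Bernstein's conditions applied to the three
# FORTRAN-style statements above. A task is modeled as a pair
# (reads, writes) of memory-location (variable) sets.

def can_run_in_parallel(t1, t2):
    """Two tasks may run in parallel when neither reads what the
    other writes, and they do not write a common location."""
    r1, w1 = t1
    r2, w2 = t2
    return (r1.isdisjoint(w2) and
            r2.isdisjoint(w1) and
            w1.isdisjoint(w2))

tx = ({"A", "B"}, {"X"})   # X = (A+B) * (A-B)
ty = ({"C", "D"}, {"Y"})   # Y = (C-D) / (C+D)
tz = ({"X", "Y"}, {"Z"})   # Z = X + Y

print(can_run_in_parallel(tx, ty))  # True: independent statements
print(can_run_in_parallel(tx, tz))  # False: Z reads what X writes
```

The third statement fails the check against either of the first two, matching the essential ordering the text describes.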
Parallelism within a task can also exist when individual components of compound tasks can be executed
concurrently. In the same manner that individual
processors can be assigned to independent tasks,

Figure 3-Illustration of parallelism within a compound
task

the tasks emanating from a FORK converge to a
JOIN statement.
In some instances, some of the parallel operations
initiated by the FORK instruction do not have to be
completed before processing can continue. For example,
one of these branch operations may be designed to
alert an I/O unit to the fact that it is to be used momentarily. The conventional FORK must be modified
to take care of these situations. Execution of an IDLE
statement, for example, permits processors to be
released without initiation of further action.7 The
FORK and JOIN technique is illustrated in
Figure 4.

Figure 4-FORK and JOIN technique
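The FORK and JOIN control structure described above can be sketched with present-day threads; the task bodies and names below are illustrative assumptions, not the paper's example.

```python
# Minimal sketch of FORK and JOIN using Python threads.
import threading

results = {}

def task(name, work):
    results[name] = work()

# FORK: initiate the parallel processable tasks.
t1 = threading.Thread(target=task, args=("T1", lambda: sum(range(100))))
t2 = threading.Thread(target=task, args=("T2", lambda: max(range(100))))
t1.start()
t2.start()

# JOIN: the next sequence of tasks is not initiated until
# all tasks emanating from the FORK have converged.
t1.join()
t2.join()
print(results)  # both results are present after the join
```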
Another example of the explicit approach is the
PARALLEL FOR7 which takes advantage of parallel
operations generated by the FOR statement in ALGOL
and similar constructs in other languages. For example,
the sum of two n × n matrices consists essentially of
n² independent operations. If n processors were available, the
addition process could be organized such that entire rows or
columns could be added simultaneously. Thus the addition of the
two matrices could be accomplished in n units of time. Another
example of this approach is the programming language PL/1 which
provides the TASK option with the CALL statement
which indicates concurrent execution of parallel
tasks.
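The PARALLEL FOR idea of adding whole rows at once can be sketched with one worker per row; the pool-based phrasing below is a modern stand-in for the 1969 construct.

```python
# Sketch of row-parallel matrix addition: with one worker per row,
# the sum of two n x n matrices takes on the order of n parallel
# element additions rather than n^2 sequential ones.
from concurrent.futures import ThreadPoolExecutor

def add_rows(row_a, row_b):
    return [a + b for a, b in zip(row_a, row_b)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

with ThreadPoolExecutor(max_workers=len(A)) as pool:
    # map dispatches one row pair to each worker concurrently
    C = list(pool.map(add_rows, A, B))

print(C)  # [[6, 8], [10, 12]]
```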
An additional way of indicating parallelism explicitly
is to write a language which exploits the parallelism in
algorithms to be implemented by the operating system.
This is the case with TRANQUIL,8,21 an ALGOL-like
language to be utilized by the array processors of
the ILLIAC IV. The situation is unique in that the
language was created after a system was devised to
solve an existing problem. "The task of compiling a
language for the ILLIAC IV is more difficult than
compiling for conventional machines simply because of
the different hardware organization and the need to
utilize its parallelism efficiently." A limitation of this
approach is that programs written in that particular
language can only be run on array-type computers; such programs
are, therefore, heavily machine dependent.
The implicit approach to parallelism does not depend
on the programmer for determination of inherent
parallelism but relies instead on indicators existing
within the program itself. In contrast to the relative
ease of implementation of explicit parallelism, the
implicit approach is associated with complex compiling
and supervisory programs.
The detection of inherent parallelism between a set
of tasks depends on thorough analysis of the source
program using Bernstein's conditions. Implementation
of a recognition scheme to accomplish this detection
is dependent on the source language. Thus a recognizer
which is universally applicable cannot be implemented.
An algorithm developed by Fisher9 approaches the
problem of parallel task detection in a general manner.
His algorithm utilizes the input and output sets of
each task (process) to determine essential ordering
and thus inherent parallelism. Given such information
as the number of processes to be analyzed, the input
and output set for each process, the given permissible
ordering among the processes, and any initially known
essential order among the processes, the algorithm
generates the essential serial ordering relation and the
covering for the essential serial ordering relation. This
covering provides an indication of the tasks within the
overall process which can be executed concurrently.
Basically, this work formalizes in the form of an
algorithm the conditions for parallel processing developed by
Bernstein. The conditions for parallel processing
between two tasks are extended to an overall process.

Detection of task parallelism-A new approach

The next subject covered in this paper involves
implicit detection of parallel processable tasks within
programs prepared for serial execution. An indication
is desired of the tasks which can be executed in parallel
and the tasks which must be completed before the
start of the next sequence of tasks. Thus the problem
can be broken down into two parts: recognizing the
relationships between tasks within a level and using
this information to indicate the ordering between tasks.
The approach presented here is based on the fact
that computational processes can be modeled by
oriented graphs in which the vertices (nodes) represent
single tasks and the oriented edges (directed branches)
represent the permissible transition to the next task
in sequence. The graph (and thus the computational
process) can be represented in a computer by means
of a Connectivity Matrix, C.10,11 C is of dimension
n × n such that Cij is a "1" if and only if there is a
directed edge from node i to node j, and it is "0"
otherwise. The properties of the directed graph and
hence of the computational process it represents can
be studied by simple manipulations of the connectivity
matrix.
A graph consisting of a set of vertices is said to be
strongly connected if and only if any node in it is reachable
from any other. A subgraph of any graph is defined
as consisting of a subset of vertices with all the edges
between them retained. A maximal strongly connected
(M.S.C.) subgraph is a strongly connected subgraph
that includes all possible nodes which are strongly
connected with each other. Given a connectivity matrix
of a graph, all its M.S.C. subgraphs can be determined
simply by well-known methods.10 A given program
graph can be reduced by replacing each of its M.S.C.
subgraphs by a single vertex and retaining the edges
connected between these vertices and others. After
the reduction, the reduced graph will not contain any
strongly connected components.
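The M.S.C. determination from a connectivity matrix can be sketched with a transitive-closure computation: two nodes belong to the same M.S.C. subgraph exactly when each reaches the other. The matrix below is an invented toy example, not the paper's Figure 5.

```python
# Sketch: find maximal strongly connected (M.S.C.) subgraphs from a
# connectivity matrix C by computing the transitive closure R
# (Warshall's method); i and j are in the same M.S.C. subgraph iff
# R[i][j] and R[j][i] both hold.
def msc_components(C):
    n = len(C)
    R = [row[:] for row in C]
    for k in range(n):
        for i in range(n):
            if R[i][k]:
                for j in range(n):
                    R[i][j] = R[i][j] or R[k][j]
    components, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        group = {i} | {j for j in range(n) if R[i][j] and R[j][i]}
        seen |= group
        components.append(sorted(group))
    return components

# Toy graph: 0 -> 1 <-> 2 -> 3; nodes 1 and 2 form an M.S.C. subgraph
# and would be replaced by a single vertex in the reduced graph.
C = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 0, 0]]
print(msc_components(C))  # [[0], [1, 2], [3]]
```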
The paragraphs which follow will describe the sequence of
operations needed to prepare, for parallel
processing in a multiprocessor computer, a program
written for a uniprocessor machine.
(1) The first step is to derive the program graph
which identifies the sequence in which the computational tasks
are performed in the sequentially coded program. Figure 5(a)
illustrates an example program
graph. The program graph is represented in the computer by its
connectivity matrix. The connectivity
matrix for the example is given in Figure 5(b).
(2) By an analysis of the connectivity matrix, the
maximal strongly connected subgraphs are determined
by simple operations.10 This type of subgraph is illustrated by
tasks 2 and 12 in Figure 5. Each M.S.C.
subgraph is next considered as a single task, and the
graph, called the reduced graph, is derived. The reduced graph
does not contain any loops or strongly

connected elements. In this graph, when two or more
edges emanate from a vertex, a conditional branching
is indicated. That is, the execution sequence will take
only one of the indicated alternatives. A vertex which
initiates the branching operation will be called a
decision or branch vertex. The reduced graph for the
example program graph is shown in Figure 6. In this
graph, vertex 3 represents a branch vertex.

Figure 5-Program graph of a serially coded program
and its connectivity matrix
(3) The next step is to derive the final program
graph and its connectivity matrix T. The elements of
T are obtained by analyzing the inputs of each vertex
in the reduced graph. An element, Tij, is a "1" if
and only if the j-th task (vertex) of the reduced graph
has as one of its inputs the output of task i; otherwise
Tij is a "0". Figure 7 illustrates the final program graph for
the example after consideration is given to the input-output
relationships of each task. The connectivity
matrix for the final program graph is shown in Figure 8.
From the sufficiency conditions for task parallelism,
two tasks can be executed in parallel if the input set of
one task does not depend on the output set of the other
and vice versa. The technique outlined in Step 4 detects
this relationship and uses it to provide an ordering
for task execution.
(4) The vertices of the final program graph are

partitioned into "precedence partitions" as follows.
Using the connectivity matrix T, a column (or columns)
containing only zeroes is located. Let this column
correspond to vertex v1. Next delete from T both the
column and the row corresponding to this vertex. The
first precedence partition is P1 = {v1}. Using the remaining
portion of T, locate vertices {v21, v22, ...} which
correspond to columns containing only zeroes. The
second precedence partition P2 thus contains vertices
{v21, v22, ...}. This implies that tasks in set P2 =
{v21, v22, ...} can be initiated and executed in parallel
after the tasks in the previous partition (i.e., P1) have
been completed. Next delete from T the columns and
rows corresponding to vertices in P2. This procedure is
repeated to obtain precedence partitions P3, P4, ..., Pp,
until no more columns or rows remain in the T matrix.
It can be shown that this partitioning procedure is
valid for connectivity matrices of graphs which contain
no strongly connected components.
The implication of this precedence partitioning is
that if P1, P2, ..., Pp correspond to times t1, t2, ..., tp, the
earliest time that a task in partition Pi can be initiated
is ti.

Figure 6-Reduced program graph of the serially coded
program

Figure 7-Final program graph of the parallel
processable program

Figure 8-Connectivity matrix of the final program graph,
with precedence partitions {1}, {2}, {3,8}, {4,5,9,10},
{6,11,12}, {7,13}, {14}
The final program graph contains the following types
of vertices: (1) The branch or decision type vertex
from which the execution sequence selects a task from
a set of alternative tasks. (2) The Fork vertex which
can initiate a set of parallel tasks. (3) The Join vertex
to which a set of parallel tasks converge after their
execution .• (4) The normal vertex which receives its
input set from the outputs of preceding tasks. Figure 7a
indicates the final program graph with the first three
types of vertices indicated by B, F, and J, respectively.
(5) From precedence partitioning and the final
program graph, a Task Scheduling Table can be
developed. This table, shown in Table I, serves as an
input to the operating system to help in the scheduling
of tasks. For example, if the task being executed is a
Fork task, a look-ahead feature of the system can
prepare for parallel execution of the tasks to be initated upon compl~tion of the currently active task.
(6) The precedence partitions of Step 4 provide an
indication of the earliest time at which a task may be
initiated. It is also desirable, however, to provide an
indication of the latest time at which a task may be
initiated. This information can be obtained by performing precedence partitions on the transpose of the
T matrix. This process can be referred to as "row partitions". The implication here is that if task i is in the
partition corresponding to time period tj, then tj is
the latest time that task i can be initiated.
Using both the row and column partitions, the permissible initiation time for each task can be derived as
shown in Table II. Task 4, for example, can be initiated during t4 or t5, depending on the availability of
processors.
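The partitioning procedure can be sketched in Python. This is a hypothetical illustration on a small five-task graph, not the authors' implementation; the latest initiation times are read from the partitions of the transpose with the time scale reversed.

```python
def precedence_partitions(T):
    """Repeatedly remove the tasks whose remaining column in the
    connectivity matrix T contains only zeroes (no unfinished
    predecessors); each wave of removals is one precedence partition."""
    remaining = set(range(len(T)))
    partitions = []
    while remaining:
        wave = sorted(j for j in remaining
                      if all(T[i][j] == 0 for i in remaining))
        if not wave:  # only valid for graphs with no strongly connected components
            raise ValueError("graph contains a cycle")
        partitions.append(wave)
        remaining -= set(wave)
    return partitions

# A small hypothetical graph: T[i][j] == 1 iff task i must precede task j.
T = [[0, 1, 1, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0]]

earliest = precedence_partitions(T)                       # column partitions
rows = precedence_partitions([list(r) for r in zip(*T)])  # row partitions
p = len(earliest)
latest = {task: p - k for k, wave in enumerate(rows) for task in wave}
# earliest == [[0], [1, 2], [3], [4]]; task 2 may start at t2 but need
# not start until t3 (latest[2] == 3), so it has scheduling slack.
```

A scheduler can use the gap between the earliest and latest times of a task as the window within which processor availability may be traded off.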
At this point it is desirable to clarify some possible
misinterpretations of the implications of this method.
The method presented here does not try to determine
whether any or all of the iterations within a loop can
be executed simultaneously. Rather, the iterations
executed sequentially are considered as a single task.

TABLE I-Task scheduling table

TIME   INPUTS TO TASK   TASK NUMBER   TASK TYPE
t1     -                1
t2     1                2             FORK
t3     2                3             BRANCH
t3     2                8             FORK
t4     3                4
t4     3                5
t4     8                9             FORK
t4     8                10
t5     5                6
t5     9                11
t5     9                12
t6     4, 6             7             JOIN
t6     10, 11, 12       13            JOIN
t7     7, 13            14            JOIN

For this reason, the undecidability problem introduced
by Bernstein is not a factor here.
In addition, precedence partitions may place the
successors of a conditional within the same partition.
The interpretation of this is that only one of the successors will be executed, and it can be executed in
parallel with the other tasks within that partition.

The FORTRAN parallel task recognizer
In order to determine the degree of applicability of
the method described above, it was decided to apply
the method to a sample FORTRAN program. This
was accomplished by writing a program whose input
consists of a FORTRAN source program; its output
consists of a listing of the tasks within the first level
of the source program which can be executed in parallel.
The program written to accomplish this parallel task

TABLE II-Permissible task initiation times

COLUMN PARTITIONS
TIME   TASKS
t1     1
t2     2
t3     3, 8
t4     4, 5, 9, 10
t5     6, 11, 12
t6     7, 13
t7     14

ROW PARTITIONS
TIME   TASKS
t1     1
t2     2
t3     3, 8
t4     5, 9
t5     4, 6, 10, 11, 12
t6     7, 13
t7     14

PERMISSIBLE TASK INITIATION PERIODS
TASK   TIME
1      t1
2      t2
3      t3
4      t4, t5
5      t4
6      t5
7      t6
8      t3
9      t4
10     t4, t5
11     t5
12     t5
13     t6
14     t7

detection is known in its final form as a FORTRAN
Parallel Task Recognizer.13
The recognizer, also written in FORTRAN, relies
on indicators generated by the way in which the
program is actually written. Consider the expressions
given below.
X1 = f1(A, B)
X2 = f2(C, D)

Because the right-hand side of the second expression
does not contain a parameter generated by the computation which immediately precedes it, the two expressions can be executed in parallel. If, on the other hand,
the expressions were rewritten as shown below, the

termination of the first computation would have to
precede the initiation of the second.
X1 = f1(A, B)
X2 = f2(X1, C)

The recognizer performs this determination by comparing the parameters on the right-hand side of the equality
sign to outcomes generated by previous statements.
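This comparison can be sketched as follows. The sketch is a simplified, hypothetical illustration restricted to plain assignment statements whose right-hand sides contain only identifiers and arithmetic operators; it is not the recognizer itself.

```python
import re

def connectivity(statements):
    """Build a connectivity matrix: C[i][j] == 1 when statement j reads a
    variable that an earlier statement i writes, i.e., statement i must
    terminate before statement j may be initiated."""
    outs, ins = [], []
    for s in statements:
        lhs, rhs = s.replace(" ", "").split("=", 1)
        outs.append(lhs)                                   # output parameter
        ins.append(set(re.findall(r"[A-Za-z]\w*", rhs)))   # input parameters
    n = len(statements)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if outs[i] in ins[j]:
                C[i][j] = 1
    return C

C = connectivity(["X1 = A + B", "X2 = C + D", "X3 = X1 * X2"])
# Statement 2 reads no output of statement 1 (C[0][1] == 0), so the two
# can be executed in parallel; both must precede statement 3.
```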
Other FORTRAN instructions can be analyzed
similarly. Consider the arithmetic IF:
IF (X - Y) 3,4,5
Here the parameters within the parentheses must be
compared to the outputs of preceding statements in
order to determine essential order.
Other FORTRAN instructions are analyzed in a
similar manner in order to generate the connectivity
matrix for the source program. During this analysis
the recognizer assigns numbers to the executable
statements of the source program. After this is completed, the recognizer proceeds with the method of
precedence partitions described earlier. Precedence
partitions yield a list of blocks which contain the statement numbers which can be executed concurrently.
Figure 9 shows a block diagram of the steps taken by
the recognizer to generate the parallel processable
tasks within the first level of a FORTRAN source
program.
Some statements within the FORTRAN set are
treated somewhat differently. The DO statement, for
example, does not itself contain any input or output
parameters but instead generates a series of repeated
operations. Because of the loop considerations mentioned earlier, and because the rules of FORTRAN
require entrance into a loop only through the DO
statement, all the statements contained within a DO
loop are considered as a single task. A loop, however,
may contain a large number of statements, and a great
amount of potential parallelism may be lost if consideration is not given to the statements within the
loop. For this reason, the recognizer generates a separate connectivity matrix for each DO loop within the
program.
The recognizer itself possesses limitations which
must be eliminated before it can be applied to programs
of a complex nature. For example, only a subset of
the entire FORTRAN set is considered for recognition.
This could be corrected by expanding the recognition
process to include a more complete set of instructions.
[Figure 9 block diagram steps: read the next source program instruction; record the input and output parameters required by this task; if this task is the successor of a branch or transfer operation, record this information; scan executable statements and compare input parameters to outputs of previous statements; when a match is found, make an entry in C, i.e., show a connection from predecessor to successor; after generation of C is complete, using the assigned statement numbers, generate precedence partitions and indicate those tasks within the first level which can be done in parallel.]

Figure 9-Block diagram of the FORTRAN parallel task recognizer

(a) The sample program:

  C     THIS IS A TEST PROGRAM DESIGNED TO CHECK PPS
        DIMENSION A1(10),A2(10),A3(10)
        INTEGER A1,A2,ABC,A2X2,B,C,D
  1     READ 100, (A1(I),I=1,10),B,C,D
  2     READ 100, (A2(I),I=1,10),NS,NST,NSTU
  3     DO 10 I=1,10
        IF (A1(I)-A2(I)) 20,30,40
     20 X1=(A1(I))*(B-C)
     30 X2=D+(B/C)
     40 A3(I)=X1*X2
     10 CONTINUE
  C     THIS IS A TEST COMMENT
  9     PRINT 200,B,C,D
  10    CALL ALPHA(A1,A2,ABC,B4,B5)
  11    PRINT 3057,X1,X2,(A3(I),I=1,10)
  12    CALL BETA(X1,X2,A3,B6)
  13    IF(B4-B5) 50,50,60
  14 50 READ 315,E,F,G,H
  15    X3=(E*F)+(G-H)
  16    X4=B6+G
  17    X5=X3-X4
  18    X6=(B4+B5)*X5
  19    PRINT 4,X3,X4,X5
  20 60 PRINT 52, (A1(I),I=1,10),ABC,C,(A3(I),I=1,10)
    100 FORMAT(10I2,3I3)
    200 FORMAT(1H0,* B C D*,/,3I3)
   3057 FORMAT(1H ,2I3,10F7.1)
    315 FORMAT(4F7.4)
      4 FORMAT(3F7.4)
     52 FORMAT(12I3,10F7.1)
        END

(b) Parallel processable tasks:

(1,2)  (3)  (9,10,11,12)  (13)  (14)  (15,16)  (17)  (18,19,20)

Figure 10-An example of the recognition process

In addition to the DO statement, loops can also be
created by branch and transfer operations such as
the IF and GO TO instructions. To eliminate these
loops, it would be necessary to analyze the connectivity matrix in the manner mentioned earlier before
beginning the process of precedence partitions. The
recognizer does not presently perform this analysis.
Nested DO loops are not permitted, and the source
program size is limited in the number of executable
statements it may have and in the number of parameters any one statement can contain.
Some of these limitations could be eliminated quite
easily; others would require a considerable amount of
effort. To allow a source program of arbitrary size
would require a somewhat more elaborate handling of
memory requirements and associated problems. At the

present time the recognizer consists of a main program
and six subroutines. In its present form the recognizer
consists of approximately 1300 statements.
The recognizer is presently written in such a manner
that it will detect only first level parallelism. The
method it uses, however, can be applied to parallelism
at any level.
The theory of operation of the FORTRAN parallel
task recognizer will be illustrated by applying the
recognition techniques to a sample FORTRAN program.
Figure 10(a) is a listing of the sample program showing
the individual tasks. Figure 10(b) is a listing of the
parallel processable tasks as determined by precedence
partitions. The numbers to the left of the executable
statements are the numbers assigned by the recognizer
during the recognition phase.
Elimination of the limitations mentioned here and
other limitations not mentioned explicitly will be the
subject of future effort.

Observations and comments
Regardless of the manner in which the subject of
parallel processing is approached, common problems
arise. Prominent among these is a need to protect
common data. If two tasks are considered for concurrent execution and one task accesses a memory
location and the other amends it, then strict observance
must be paid to the order in which this is done. The


FORTRAN recognizer, for example, may determine
that two subroutines can be executed in parallel. At
the present time no consideration is given to the fact
that both subroutines may access common data
through COMMON or EQUIVALENCE statements.
In order to truly optimize execution time for a
program which is set up for parallel processing, it
would be highly desirable to determine the time required for execution of the individual tasks within
the process.
the process. It is not enough to merely determine that
two tasks can be executed concurrently; the primary
goal is that this parallel execution result in higher
resource utilization and improved throughput. If the
time required for the execution of one task is 100 times
that of the other, for example, then it may be desirable
to execute the two tasks serially rather than in parallel.
The reasoning here is that no time would be spent
in allocating processors and so forth.
Determination of task execution time, however, is
not a simple matter. Exhaustive measurements of the
type suggested by Russell and Estrin14 would provide
the type of information mentioned here.
Another problem area involves implementation of
special purpose languages such as TRANQUIL. It
was mentioned earlier that programs written in a
language of this type are highly machine-limited. It
would be highly desirable to be able to implement
programs written in these languages in systems which
are not designed to take advantage of parallelism.
Along these lines, the programming generality suggested by Dennis15 may be significant.
It should be pointed out that all the techniques
which have been discussed here will create a certain
amount of overhead. For this reason it is felt that a
parallel task recognizer, for example, would be best
suited for implementation with production programs.
Thus even though some time would be lost initially,
in the long run parallel processing would result in a
significant net gain.

Conclusions
The method of indicating parallel processable tasks
introduced here and illustrated in part by the FORTRAN Parallel Recognizer appears to provide enough
generality that it is independent of the language, the
application, the mode of compilation, and the number
of processors in the system. It is anticipated that this
method will remain as the basis for further effort in
this area.
In addition to the comments made earlier, some
possible future areas of effort include determination of
possible parallelism of individual iterations within a
loop. It is hoped that additional information can be
provided to the operating system other than a mere
indication of the tasks which can be executed in parallel. This would include the measurements mentioned
earlier and an indication of the frequency of execution
of individual tasks.
It is also hoped that a sub-language may be developed which can be added to existing languages to
assist in the recognition process and the development
of recognizer code.

Detection of parallel components within
compound tasks
Several algorithms exist for the detection of independent components within compound tasks.16,17,18,19
These algorithms are concerned primarily with detection of this type of parallelism within arithmetic
expressions. The first three algorithms referenced
above are summarized in [19], where a new algorithm
is also introduced.
The arithmetic expression which will be used as an
example for each algorithm is given below.
A+B+C+D*E*F+G+H
Throughout this discussion the usual precedence
between operators will apply. In order of increasing
precedence, the operators are as follows: + and -;
* and /; and ↑, where ↑ stands for exponentiation.
Hellerman's algorithm
This algorithm assumes that the input string is
written in reverse Polish notation and contains only
binary operators. The string is scanned from left to
right, replacing by temporary results each occurrence
of adjacent operands immediately followed by an
operator. These temporary results will be considered
as operands during the next passes. Temporary results
generated during a given pass are said to be at the
same level and therefore can be executed in parallel.
There will be as many passes as there are levels in the
syntactic tree. The compilation of the expression
listed above is shown in Figure 11.
Although this algorithm is simple and fast, it has
two shortcomings. The first is a possible difficulty in
implementation since it requires the input string to
be in Polish notation; the second is its inability to
handle operators which are not commutative.
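The pass structure can be sketched as follows. This is a hypothetical reconstruction from the description above, restricted to single-character operands; it is not Hellerman's published implementation.

```python
OPS = set("+-*/")

def is_operand(tok, fresh):
    # Temporaries created during the current pass are not yet operands.
    return tok not in OPS and tok not in fresh

def hellerman(tokens):
    """Scan a reverse-Polish token list left to right; each pass replaces
    every operand-operand-operator triple with a temporary result.  The
    assignments produced by one pass form one level and can therefore be
    executed in parallel."""
    levels, n = [], 0
    while len(tokens) > 1:
        fresh, assigns, i = set(), [], 0
        while i <= len(tokens) - 3:
            a, b, op = tokens[i], tokens[i + 1], tokens[i + 2]
            if op in OPS and is_operand(a, fresh) and is_operand(b, fresh):
                n += 1
                r = "R%d" % n
                assigns.append("%s=%s%s%s" % (r, a, op, b))
                tokens[i:i + 3] = [r]   # temporary replaces the triple
                fresh.add(r)
            i += 1
        levels.append(assigns)
    return levels

levels = hellerman(list("AB+C+DE*F*+G+H+"))
# Pass 1 yields R1=A+B and R2=D*E, which lie at the same level and can
# be computed in parallel; five passes reduce the string to R7.
```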

LEVEL   INPUT STRING AFTER THE lth PASS   TEMPORARY RESULTS GENERATED DURING lth PASS
0       AB+C+DE*F*+G+H+
1       R1 C+ R2 F* +G+H+                 R1=A+B, R2=D*E
2       R3 R4 +G+H+                       R3=R1+C, R4=R2*F
3       R5 G+H+                           R5=R3+R4
4       R6 H+                             R6=R5+G
5       R7                                R7=R6+H

Figure 11-Parallel computation of A+B+C+D*E*F+G+H using Hellerman's algorithm

Stone's algorithm

The basic function of this algorithm is to combine
two subtrees of the same level into a level that is one
higher. For example, A and B, initially of level 0, are
combined to form a subtree of level 1. The algorithm
then searches for another subtree of level 1 by attempting to combine C and D. Since precedence relationships between operators prohibit this combination, the
level of subtree (A+B) is incremented by one. The
algorithm now searches for a subtree of level 2 by
attempting to combine C, D, and E. Since this combination is also prohibited, subtree (A+B) is incremented to level 3. The next search is successful, and a
subtree of level 3 is obtained by combining C, D, E
and F. These two subtrees are then combined to form a
single subtree of level 4.
In a similar manner the subtree (G+H), originally
of level 1, is successively incremented until it achieves
a level of 4; at that time it is combined with the other
subtree of the same level to form a final tree of level 5.
The algorithm yields an output string in reverse
Polish which does not expressly show which operations
can be performed in parallel. Even though the output
string is generated in one pass, the recursiveness of
the algorithm causes it to be slow, and at least one
additional pass would be required to specify parallel
computations.

Squire's algorithm
The goal of this algorithm is to form quintuples of
temporary results of the form:
Ri (operand 1, operator, operand 2, start level, end level)
where start level = max [end level of operand 1; end
level of operand 2] and end level = start level + 1.
All temporary results which have the same start level
can be computed in parallel. Initially, all variables
have a start and end level equal to zero.
Scanning begins with the rightmost operator of the
input string and proceeds from right to left until an
operator is found whose priority is lower than that of
the previously scanned operator. In the example the
scan would yield the following substring:
D*E*F+G+H
Now a left to right scan proceeds until an operator is
found whose priority is lower than that of the leftmost operator of the substring. This yields: D*E*F.
At this point a temporary result R1 is available of the
form:

R1 (D, *, E, 0, 1).

The temporary result, R1, replaces one of the operands
and the other is deleted together with its left operator.
The new substring is then:
R1*F+G+H.
The left to right scans are repeated until no further
quintuple can be produced, and at that time, the right
to left scan is re-initiated. The results of the process
are shown in Figure 12.
are shown in Figure 12.
Although the example shows the algorithm applied
to an expression containing only binary operators, the
algorithm can also handle subtraction and division
with a corresponding increase in complexity.
A significant feature of this algorithm is that Polish
notation plays no part in either the input string or
the output quintuples. Because of the many scans and
comparisons the algorithm requires, it becomes more
complex as the length of the expression and the diversity of operators within the expression increase.

INITIAL STRING: A+B+C+D*E*F+G+H

RIGHT TO LEFT SCAN       LEFT TO RIGHT SCAN
D*E*F+G+H                R1*F+G+H
                         R2+G+H
A+B+C+R2+G+H             R3+C+R2+G+H
                         R4+R3+R2+H
                         R4+R5+R2
                         R6+R2
                         R7

QUINTUPLES
      Op.1   OPERATOR   Op.2   START   END
R1    D      *          E      0       1
R2    F      *          R1     1       2
R3    A      +          B      0       1
R4    C      +          G      0       1
R5    H      +          R3     1       2
R6    R4     +          R5     2       3
R7    R2     +          R6     3       4

Figure 12-Parallel computation of A+B+C+D*E*F+G+H using Squire's algorithm

Baer and Bovet's algorithm
The algorithm uses multiple passes. To each pass
corresponds a level. All temporary results which can
be generated at that level are constructed and inserted
appropriately in the output string produced by the
corresponding pass. Then this output string becomes
the input string for the next level, until the whole
expression has been compiled. Thus the number of
passes will be equal to the number of levels in the
syntactic tree. During a pass the scanning proceeds
from left to right, and each operator and operand is
scanned only once.
The simple intermediate language which this algorithm produces is the most appropriate for multiprocessor compilation in that it shows directly all
operations which can be performed in parallel, namely
those having the same level number. The syntactic
tree generated by this algorithm is shown in Figure
13.
A new algorithm
This section will introduce a technique whose goals
are: (1) to produce a binary tree which illustrates the
parallelism inherent in an arithmetic expression; and

Figure 13-Parallel computation of A+B+C+D*E*F+G+H using Baer and Bovet's algorithm

(2) to determine the number of registers needed to
evaluate large arithmetic or Boolean expressions without intermediate transfers to main memory.
This technique is prompted by the fact that existing
computing systems possess multiple arithmetic units
which can contain a large number of active storages
(registers). In addition, the superior memory bandwidths of the next generation of computers will simplify
some of the requirements of this technique.
In the material presented below, a complex arithmetic expression is examined to determine its maximum
computational parallelism. This is accomplished by
repeated rearrangement of the given expression. During
this process the given expression in reverse Polish form
is also tested for "well formation", i.e., errors and
oversights in the syntax, etc.
The arithmetic expression which was used as a model
earlier will also be used here, namely A+B+C+D
*E*F+G+H. The details of the algorithm follow:
(1) The first step is to rewrite the expression in
reverse Polish form and to reverse its order.
+H+G+*F*ED+C+BA
(2) Starting with the rightmost symbol of the string,
assign a weight to each member of the string based on
the following procedure:

Assign to symbol Si the value Vi = V(i-1) + Ri,
i = 1, 2, ..., n, where Ri = 1 - O(Si), given that

O(Si) = 0 if Si is a variable
O(Si) = 1 if Si is a unary operator
O(Si) = 2 if Si is a binary operator

and V0 = 0.
Using this procedure, the following expression results:

i    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1
Si    +  H  +  G  +  *  F  *  E  D  +  C  +  B  A
Vi    1  2  1  2  1  2  3  2  3  2  1  2  1  2  1

Note that for a "well-formed expression" of n symbols,
Vn = 1.
(3) At this point the root node of the proposed
binary tree can be determined. Thus the given string
can be divided into two independent substrings. To
determine the root node, draw a line to the left of the
first symbol with a weight of 1 (i = 11, Si = +, Vi = 1)
that lies to the left of the symbol with the highest weight,
Vm (i = 7, Si = E, Vi = Vm = 3). The two independent
substrings consist of the strings to the left and to the
right of this line. The root node will be the leftmost
member of the string to the left of the line (i = 15,
Si = +, Vi = 1). Note that Vi also equals 3 for i = 9;
however, Vm is chosen from the earliest occurrence of
a symbol with the highest weight.
(4) The next step is to look for parallelism within
each of the new substrings. Consider the rightmost
substring. Form a new substring consisting of the
symbols within the values of Vi = 1 to the right and to
the left of Vm. Transpose this substring with the substring to the right of it whose leftmost member has a
weight of Vi = 1. This procedure is repeated until the
initial Vm occupies the position i = 2 in the substring.
For this example this is already the case. Thus the
rightmost substring is in the proper form.
(5) The transposition procedure of step 4 is applied
next to the leftmost substring. However, since the
leftmost substring of this example consists of only two
operands and one operator, no further operations are
necessary.
(6) The resultant binary tree is shown in Figure 14.
The numbers assigned to each node represent the final
weight Vi of the symbol as determined in steps 1-5
above.
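The weight assignment of step 2 can be sketched as follows. This is a hypothetical illustration rather than the authors' implementation; the final weight Vn = 1 doubles as the well-formation test mentioned above.

```python
def weights(symbols, arity):
    """Assign Vi = V(i-1) + 1 - O(Si), scanning the reversed
    reverse-Polish string from its rightmost symbol (i = 1)."""
    v, out = 0, []
    for s in reversed(symbols):      # rightmost symbol first
        v += 1 - arity.get(s, 0)     # O(Si): 0 variable, 1 unary, 2 binary
        out.append(v)
    out.reverse()                    # report weights in string order
    return out

# Reversed reverse-Polish form of A+B+C+D*E*F+G+H:
s = list("+H+G+*F*ED+C+BA")
arity = {"+": 2, "*": 2}
V = weights(s, arity)
# V == [1, 2, 1, 2, 1, 2, 3, 2, 3, 2, 1, 2, 1, 2, 1]; the leftmost
# symbol (i = n) has weight 1, so the expression is well formed, and
# the highest weight Vm == 3 occurs at E and F.
```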

Some observations and comments on this algorithm
are given below.
(1) The two branches on either side of the root node
can be executed in parallel. Within each main branch,
the transposition procedure of step 4 yields supplementary root nodes. The sub-branches on each side of the
supplementary nodes can be executed in parallel.
(2) The number of levels in the binary tree can be
predicted from the Polish form of the original string:
No. of levels = MAX [number of 1's; Vm]
in the substring (rightmost or leftmost) containing Vm.

Figure 14-Binary tree for parallel computation of A+B+C+D*E*F+G+H
(3) The tree is traversed in a modified postorder
form.20 The resulting expression is
D*E*F+A+B+C+G+H
(4) An added feature of this technique is that the
number of registers required to evaluate this expression
without intermediate STORE and FETCH operations
is obtained directly from the binary tree. This information is provided by the highest weight assigned to
any node within the tree. Thus for this example the
expression could be evaluated using at most two
registers without resorting to intermediate stores and
fetches.
(5) This technique of recognizing parallelism on a
local level has been applied to a single instruction, in
particular, an arithmetic expression. It is worthwhile
mentioning that each variable within the expression
can itself be the result of a processable task. Thus this
technique can be extended to a higher level of parallel
stream recognition, i.e., level parallelism.
In order to implement the techniques mentioned
here for components within tasks and the techniques
mentioned earlier for individual tasks, several system
features are desirable. Schemes for detecting parallel
processable components within compound tasks are
oriented primarily toward arithmetic expressions. For
these situations string manipulation ability would be
highly desirable. Since individual tasks are represented by a graph and its matrix, the ability to manipulate rows and columns easily would be very important. In this same area, an associative memory
could greatly reduce execution time in the implementation of precedence partitions.
ACKNOWLEDGMENTS
The authors would like to thank the referees of the
FJCC for their comments and suggestions which
resulted in improvements of this paper.
REFERENCES
1 A J BERNSTEIN
Analysis of programs for parallel processing
IEEE Trans on EC Vol 15 No 5 757-763 Oct 1966
2 E W DIJKSTRA
Solution of a problem in concurrent programming control
Comm ACM Vol 8 No 9 569 Sept 1965
3 D KNUTH
Additional comments on a problem in concurrent programming control
Comm ACM Vol 9 No 5 321-322 May 1966
4 E G COFFMAN R R MUNTZ
Models of pure time sharing disciplines for resource allocation
Proc 1969 Natl ACM Conf
5 M E CONWAY
A multiprocessor system design
Proc FJCC Vol 23 139-146 1963
6 A OPLER
Procedure-oriented statements to facilitate parallel processing
Comm ACM Vol 8 No 5 306-307 May 1965
7 J A GOSDEN
Explicit parallel processing description and control in programs for multi- and uni-processor computers
Proc FJCC Vol 29 651-660 1966
8 N E ABEL P P BUDNIK D J KUCK Y MURAOKA R S NORTHCOTE R B WILHELMSON
TRANQUIL: A language for an array processing computer
Proc SJCC 57-68 1969
9 D A FISHER
Program analysis for multiprocessing
Burroughs Corp May 1967
10 C V RAMAMOORTHY
Analysis of graphs by connectivity considerations
Journal ACM Vol 13 No 2 211-222 April 1966
11 C V RAMAMOORTHY M J GONZALEZ
Recognition and representation of parallel processable streams in computer programs II (task/process parallelism)
1969 Natl ACM Conf
12 C V RAMAMOORTHY
A structural theory of machine diagnosis
Proc SJCC 743-756 1967
13 M J GONZALEZ C V RAMAMOORTHY
Recognition and representation of parallel processable streams in computer programs
Symposium on Parallel Processor Systems, Technologies and Applications, Ed. L C Hobbs, Spartan Books, June 1969
14 E C RUSSELL G ESTRIN
Measurement based automatic analysis of FORTRAN programs
Proc SJCC 1969
15 J B DENNIS
Programming generality, parallelism and computer architecture
Proc IFIP Congress 68 C1-C7
16 H HELLERMAN
Parallel processing of algebraic expressions
IEEE Trans on EC Vol 15 No 1 Feb 1966
17 H S STONE
One-pass compilation of arithmetic expressions for a parallel processor
Comm ACM Vol 10 No 4 220-223 April 1967
18 J S SQUIRE
A translation algorithm for a multiprocessor computer
Proc 18th ACM Natl Conf 1963
19 J L BAER D P BOVET
Compilation of arithmetic expressions for parallel computation
Proc IFIP Congress 68 B4-B10
20 D KNUTH
The art of computer programming, Vol 1, Fundamental algorithms
Addison-Wesley p 316
21 R S NORTHCOTE
Software developments for the array computer ILLIAC IV
Univ of Illinois Rpt No 313 March 1969

Performance modeling and empirical
measurements in a system designed for
batch and time-sharing users
by JACK E. SHEMER and DOUGLAS W. HEYING
Scientific Data Systems, A Xerox Company
El Segundo, California

INTRODUCTION
If any design goal is common to all computer system
organization schemes, it is that of providing "effective
service" both externally to the user of the computational
facility and internally with respect to utilization of
system resources. Thus, generally speaking, there are at
least two dimensions to this design objective. On the one
hand, effective service is the external satisfaction of a
broad spectrum of user demands. For example, the ideal
system might be visualized as one which economically
provides a large number of programming languages;
machine compatibility with other computers of widely
diverse hardware; and rapid computation. On the other
hand, effective service is the internal utilization of all
system components so as to increase computational
efficiency. In this respect, system structures are implemented which strive to maximize sub-system
simultaneity and system throughput. For example, a
degree of macro-parallelism is attained in many present
day systems by allowing a central processing unit (CPU)
and input/output controller to share the use of a main
memory register, thereby enabling processing and
input/output (I/O) to proceed concurrently (for one or
several independent programs, depending upon the
system software).
In general, external effectiveness is all that the user
sees, and it is therefore of primary interest to him,
whereas the purveyor of the equipment is vitally
concerned with internal utility and coordination.
However, this latter consideration indirectly relates to
the quality of service the user receives (his waiting time
for service completion, the price he is charged for
service, etc.).
The ramifications of hardware and software designs to
achieve such service can be investigated both internally
and externally; yet, a particular design strategy need
not supplement effective service from both viewpoints.
On the contrary, schemes tailored to improve external
utilization often degrade internal service effectiveness
and vice versa. Unfortunately, in confronting these
design trade-offs, the designer often had to rely upon
heuristic and intuitive arguments, since there is a
general lack of design models which quantitatively
relate system variables to reflect a priori performance
estimates. Hence, the design is complicated not only by
trade-offs between the often dissimilar aims of external
and internal effective service, but also by a deficiency of
design tools for investigating various implementation
alternatives.
These problems are especially amplified with the
advent of time-shared computer systems. In time-sharing systems, an ideal goal is to respond to interactive
on-line users such that each user receives the impression
that he has his own computer, yet at a price he can
afford. Thus in these systems, the computer complex is
shared among a number of independent users who are
concurrently communicating with the system, generating programs and interactive service requests via
on-line remote terminal equipment. This action enables
one to achieve economies of scale and distribute the cost

of the system among all users according to their usage
of the facilities. Similarly, the objective of rapid response
is realized by time slicing CPU service and sharing it
among the on-line users. A request for program execution
is not necessarily serviced to completion; rather, jobs
are granted finite intervals (quanta) of processing time.
If a job fails to exhaust its demands during a quantum
allocation, then it is truncated and postponed according
to a scheduling discipline, thereby facilitating rapid
response to short requests.1-4 This preferential treatment
of short jobs increases the programmer's productiveness,
since one-attempt efforts, editing, debugging, and other
typically short interactive demands often encounter
exorbitant turn-around times in batch processing
environments (i.e., in relation to the amount of actual
processing time consumed, due to problems of key
punching, printer output, card stacking, and total
system demand).
However, since computation is not necessarily run to
completion and main memory size is limited (by both
economic and physical reasons), programs must be
swapped into and out of main memory as the CPU
commutates its service from request to request.
Therefore, unless swapping is achieved with no loss in
time, it is obvious that service in the time-sharing sense
is less efficient in CPU utilization than service to
completion. Also, the time spent scheduling, allocating
buffers, and controlling swap input/output represents
overhead or wasted processing time which, due to
incomplete servicing, is greater in time-sharing systems
than batch processing systems. Furthermore, if the
system is dedicated to servicing on-line requests, the
CPU is essentially idle during periods of low on-line
input traffic. Hence, a design compromise must be
attained between external response rapidity and internal
efficiency since system performance, in the general case,
is a function of both response to selected classes of users
and utilization of system resources.
Yet, exploring such problem areas prior to design is
complicated, because any performance investigation is
incorrigibly statistical. Performance is not only a
function of software characteristics such as the input/
output, memory, and processing requirements of each
on-line request together with the occurrence rate of such
requests, but also dependent upon hardware characteristics such as the instruction processing rate and the rates of accessing secondary memory.
This paper presents one approach to mitigating some
of these difficulties. A system design is briefly described
and then analyzed utilizing a mathematical model. The
system is structured to accommodate both batch and
time-sharing users with the goal being to achieve a

balance of system efficiency and responsiveness. A set
of variables are defined which characterize on-line user
demands and the servicing capacity of various units
within the system. These variables are then quantitatively related in a mathematical model to derive salient
performance measures. Examples are given which
graphically display these measures versus various ranges
of the system variables. These a priori performance
estimates are then compared with empirical data
extracted from the system during its actual operation.
Here the emphasis is given to mathematical modeling
because this analysis method is more expedient and
generally less costly than the alternative approach of
simulation. Moreover, since many of the variables are
non-independent and rely upon characterization of user
demands, and since these are difficult to accurately
describe prior to actual operation, the macroscopic and
statistical indications provided by a mathematical model
are perhaps all that one can feasibly obtain.
Design and performance study

System design
The Batch/Time-Sharing Monitor (BTM) is designed to provide SDS Sigma 5 and Sigma 7 users with interactive, on-line time-sharing without disrupting batch operations. For considerations of efficiency, the primary objective of the BTM design is to provide limited time-sharing service while concentrating on throughput of batch jobs: the servicing of time-sharing users is allocated to minimize response for interactive users, with no special service given to the compute bound on-line users (because high-efficiency batch service is available).
Thus, the system is structured with resources for the
batch and time-sharing portions of the system separated
as much as possible. Different areas of main memory are
allocated so that a (compute bound) batch user is
always "ready to run." The file device is common
because files may be shared between batch and time-sharing users. However, the management technique used minimizes the interference from this factor. The swapping Rapid Access Disc (RAD) for time-sharing
users is independent of the file device, thus insuring that
swaps in process do not affect on-going batch programs.
The batch user is kept essentially compute bound by
buffering all of his unit record I/O via a RAD. This
allows the compute portion of each job to follow that
of the previous job without waiting for the printout,
etc., to complete. Thus, there is no need to attempt to reclaim swap time from one time-sharing user to another: a natural claimant, the batch job, is readily available.

Performance Modeling and Empirical Measurements
Hence, a very simple (and low overhead) swapping
and scheduling algorithm can be used. As a particular
user is dismissed, other users are polled in turn to see
who is "ready to run." If someone is found (not the
same user), a replacement swap is initiated and the
CPU is allocated to the batch job. When the swap-out/
swap-in is complete, the new user is given one quantum
(i.e., providing the batch job has already had at least its quantum); then the cycle is repeated.
In this way, batch is guaranteed a certain percentage
of the machine (and typically gets much more), and a
moderate number of time-sharing users receive rapid
response to conversational requests. Yet with this
relatively simple framework, a number of questions are
unavoidable: How do on-line response and batch
throughput vary with the number of on-line users, and
how do other variables such as quantum size and swap
time relate to system performance? Moreover, how
does one characterize system performance and the
variables which influence it?

Parameterizations and performance measures
The subject of "on-line" response is unfortunately
plagued by many interpretations of what constitutes
response (and, moreover, what defines adequate
response). For the purposes of this paper, "typical
on-line requests" are those which require minimal
central processor time, less than one quantum allocation. Thus, the response time C1 to a "typical on-line
demand" is that period elapsing between request
generation (the keying in of a control character such as
"carriage return") and the termination of the first time
quantum * which is allocated to the servicing of the
request. This definition provides the basis upon which
the on-line performance of the BTM system is analyzed
in this paper, since it is assumed that on-line users are
typically in phases of program preparation.** Thus,
providing the quantum is large enough, the great
majority of user interactions (e.g., "open the next
line," "delete source image," "perform syntax check
and insert into text," etc.) can be satisfied with single
quantum allocations.
The mathematical model developed in the Appendix
enables one to characterize the system by selecting
values for the variables:
N = total number of active on-line communication sources (i.e., the number of remote users who are concurrently using the system).

λ = average user interaction rate (frequency at which a single user requests service by the CPU).

μ = mean rate at which on-line requests are serviced by the CPU (1/μ = average amount of CPU time required to complete each request given that the CPU was dedicated to the servicing of the request).

S = the average amount of time required to swap an old user out of core and load a new user (clearly, S is dependent upon the swapping device as well as program size).

qR = time quantum allocated to on-line requests (time-sharing users).

qB = time quantum given to batch requests (background users).

m̄ = the average cumulative quantum extension (for monitor services such as scheduling, file I/O, service calls, etc.) incurred during the period elapsing between successive quantum allocations to on-line jobs.

* Also note that if the scheduling algorithm is round-robin then C1 provides a basis for approximating the response time for a request which requires multiple quanta.
** Note that this is not the case in system environments in which the on-line users run production (compute bound) programs.

To supplement analysis efforts, the BTM system
software is capable of monitoring these (and other)
variables and accumulating their statistical distributions
during actual system operation. This does not impose
any significant overhead since much of this data is
already accumulated in the accounting log, and (as in
many other commercial systems) used as a basis for
charging users.
Upon establishing reasonable values for the above
variables, the model can then be used to derive performance measures. In terms of response, the salient performance index is E[C1], where

E[C1] = the expected response time which "typical on-line demands" experience (see definition given above).
In addition, the model can readily be used to estimate
the percentage of CPU time available for batch jobs; the
percentage of CPU time received by time-sharing users;
utilization of the swapping RAD; expectations of
system revenues; and a variety of other indices obtained
from combinations of the derived parameters.
A priori estimates for some of these performance measures are given in Figures 1-5 for reasonable ranges of the variables N, λ, μ, S, qR, qB, and m̄.

[Figure 1: E[C1] vs. N for μ = 2.5 requests/sec. (1/μ = 400 ms./request); qR = 200 ms.; S = 85, 248 and 443 ms. for the 7212, 7232 and 7204 RADs; λ = 1 request/20 user-sec.; m̄ = 100 ms.; "swap limited" and "batch limited" regions marked]

[Figure 2: E[C1] vs. N for μ = 5 requests/sec. (1/μ = 200 ms./request), with the same remaining parameters]

[Figure 3: Relative batch capability: percent of CPU time available for batch jobs (Pr[B] × 100%) vs. number of concurrent users]

[Figure 4: Nmax (maximum number of concurrent users) vs. CPU speed, log scale]

Obviously, these variables will differ from one environment to another. Therefore, before discussing conclusions which can be drawn from these graphical results, it is appropriate to clarify the parameterizations and assumptions which were used in the calculations:

1. The average swap time S was conservatively calculated assuming that four RAD accesses are required per swap with an average total of 16K words transferred during each swap. (The RAD's are head-per-track rotating memories operating at 1800 rpm; and the SDS model 7204, 7232 and 7212 RADs transfer data at rates of 187 × 10^3 bytes/sec., 384 × 10^3 bytes/sec. and 3 × 10^6 bytes/sec., respectively.)

2. The user interaction rate λ was estimated from statistics gathered at RAND5 and other data extracted from the GE/Dartmouth BASIC system6 and the SDS 940 system.

3. The selection of qR = 200 ms. was established such that the majority of user interactions are satisfied with single quantum allocations. Whereas, selecting qB = 85 ms. and 200 ms. was done merely to demonstrate "swap limited" and "batch limited" operation, respectively.

4. The value of the average monitor time m̄ per on-line/batch quantum cycle was approximated utilizing batch accounting information and timing studies of monitor services.

5. Values of μ were chosen such that the average on-line quantum would be ≈125 ms. to 150 ms. when qR = 200 ms. was allocated. This selection was inferred from data extracted from the SDS 940 System and BTM code traces. (Yet, note that a single parameter μ does not provide a characterization covering the more general case in which the processing time distribution is multi-modal.† However, for purposes of studying interactive response, it provides a good approximation and lends itself to the mathematical analysis.)

† The multi-modal case arises because of a multiplicity of language facilities and the natural division of requests into interactive or compute demands.

[Figure 5: E[C1] vs. qR, the quantum allocation to on-line users, for N = 18 and qB = 85 ms. (i.e., "swap limited"); curves for μ = 2.5 and μ = 5 requests/sec. and for each RAD model]

Mathematical results

Given this framework, let us now turn our attention to the figures. Employing the mathematical model, a priori estimates of average interactive response time E[C1] are displayed versus N in Figure 1 and Figure 2 for μ = 2.5 requests/sec. and μ = 5 requests/sec., respectively. Here, three different curves are plotted in each figure to demonstrate the limiting effects of each swapping device (i.e., "swap limited" operation when the batch quantum qB is less* than the swap time S). Also, note that an additional curve is given for the model 7212 RAD to display the effects of selecting a batch quantum which exceeds the swap time (i.e., "batch limited" operation). This latter curve shows that the fastest swapping device effectively becomes a slower device when qB is set such that operation is "batch limited": the model 7212 RAD is almost equivalent to a model 7232 RAD when qB = 200 ms.

Now since N is the total number of concurrent users (active communication sources), Figures 1 and 2 enable one to estimate a value for the maximum number of users Nmax which the system can simultaneously accommodate by: (1) assuming "swap limited" operation and (2) defining what constitutes adequate response to typical on-line demands. For example, if one assumes that adequate interactive response is achieved if ≈80% of the time a user experiences a delay of less than 5 sec., then, depending upon μ, one concludes:**

i. the model 7204 RAD will accommodate a maximum of 10 to 16 concurrent users for*** μ = 2.5 requests/sec. to μ = 5 requests/sec., respectively;

ii. the model 7232 RAD will accommodate a maximum of 16 to 26 concurrent users for μ = 2.5 requests/sec. to μ = 5 requests/sec., respectively;

iii. the model 7212 RAD will accommodate a maximum of 26 to 38 users for μ = 2.5 requests/sec. to μ = 5 requests/sec., respectively.
* For this situation, the actual batch quantum allocation is the swap time S.
** These conclusions were made by assuming that the probability distribution for response time C1 is such that twice the mean E[C1] is (at least) the 80 percent point. This is a reasonable assumption in light of both the mathematical characterizations used in the model and empirical measurements.
*** Note that reducing μ from 5 requests/sec. to 2.5 requests/sec. is tantamount to reducing processing speed by a factor of 1/2.

However, the actual number of on-line users who
concurrently use the system is a statistical parameter which generally is less than Nmax and varies according to the total number of on-line subscribers, their demands, processing speed, Nmax, etc. In practice, the total number of on-line subscribers typically exceeds Nmax by at least a factor of three.

For the above cases, nominally 50-80% of the CPU time is available for batch jobs. This is shown in Figure 3. Similarly, utilizing this same response criterion, it is interesting to observe the effects of increasing**** CPU speed μ. This is demonstrated in Figure 4 for each of the swapping devices. As CPU speed increases indefinitely, the capacity of the system to service on-line requests approaches a limit established by the swapping device.

Additional insight into system responsiveness is provided by Figure 5. Here, E[C1] is graphically displayed versus the on-line user quantum qR for "swap limited" operation and N = 18 (with all other variables the same as those employed in Figures 1 and 2). Note that the selection of a minimum qR is very critical; however, having established a minimum qR, the variations are not dramatic for a relatively large range above the minimum. Also, notice that as μ is reduced from 5 requests/sec. to 2.5 requests/sec., a model 7232 RAD must be used to achieve what a model 7204 RAD accomplished in the former case; and similarly, a model 7212 RAD is required to equal the performance of a model 7232 RAD.
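The swap times assumed in the figures (85, 248 and 443 ms. for the 7212, 7232 and 7204 RADs) can be checked by direct arithmetic from assumption 1 of the parameterization. The sketch below assumes 32-bit words (4 bytes) and an average rotational delay of half a revolution per access; both are our assumptions, not stated in the paper:

```python
def swap_time_ms(rate_bytes_per_sec, accesses=4, words=16 * 1024,
                 bytes_per_word=4, rpm=1800):
    """Average swap time: per-access rotational latency plus data transfer."""
    latency_ms = accesses * 0.5 * (60_000 / rpm)      # half a revolution each
    transfer_ms = words * bytes_per_word * 1000 / rate_bytes_per_sec
    return latency_ms + transfer_ms

for rate in (3_000_000, 384_000, 187_000):            # models 7212, 7232, 7204
    print(round(swap_time_ms(rate)))
```

Under these assumptions the results come out near 89, 237 and 417 ms., within roughly six percent of the paper's figures, which suggests the published numbers rest on slightly different word-size or latency assumptions.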

Experimental results
Extensive statistics were gathered from the system
(while running typical jobs) with a twofold purpose in
mind. First, it was necessary to substantiate the validity
of the assumptions employed in the model; i.e., establish
that the chosen parameters were indeed consistent with
the actual environment. Secondly, a correlation between
empirically measured performance and the results of the
model would lend credence to the validity of the model,
and therefore allow us to extrapolate and predict
performance for other user environments and system
configurations.
The first objective was accomplished by observing a BTM system which used a model 7212 RAD for swapping with quanta qR = qB = 200 ms. Values for λ, μ, m̄ and program size were tabulated for many different observation periods. For each of these monitoring sessions different average values were obtained, but the values μ = 3.5 requests/sec., λ = 1 request/15 user-sec., S = 85 msec. and m̄ = 100 msec. were found to be quite representative of most samples. The variables μ and λ were most subject to variation and ranged from 2 to 6 requests/sec. and from 1 request/25 user-sec. to 1 request/10 user-sec., respectively. Also, the data indicated that the assumptions of exponentially distributed CPU time and request inter-arrival time provided good approximations of user demands.

Given that the first objective was satisfied, realization of the second objective is buttressed by Figure 6, which plots the average of all sampled values for two of the key performance indications (average response time E[C1] and CPU time available for batch Pr[B]) as a function of the number of users N. Upon comparing these results with the mathematical predictions (also see Figures 1-3), one can infer that (at least for the range of variables considered) the mathematical model is reasonably consistent with actual system operation.
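The kind of calculation behind the model's predictions can be reproduced from the Appendix formulas: guess Pr[R], compute p0 from the stationary distribution, recompute Pr[R] from the cycle-time ratio, and repeat until the two agree. The sketch below uses the representative values above; the simple repeated substitution is our own iteration scheme, standing in for the paper's scale-factor search:

```python
from math import factorial

def solve_btm(N, lam, mu, q_r, q_b_bar, m_bar, iters=500):
    """Fixed-point solution for p0, Pr[R] and Pr[B] (all times in seconds)."""
    pr_r = 0.5                       # initial guess for on-line service fraction
    for _ in range(iters):
        rho = lam / (mu * pr_r)
        # pn = [N!/(N-n)!] rho^n p0: the finite-source queue of the Appendix
        weights = [factorial(N) // factorial(N - n) * rho ** n
                   for n in range(N + 1)]
        p0 = 1.0 / sum(weights)
        idle = p0 / (N * lam)        # expected idle wait per cycle, weighted by p0
        cycle = q_r + q_b_bar + m_bar + idle
        pr_r = q_r / cycle           # fraction of time serving on-line requests
    pr_b = (q_b_bar + idle) / cycle  # batch also absorbs the idle periods
    return p0, pr_r, pr_b

p0, pr_r, pr_b = solve_btm(N=18, lam=1 / 15, mu=3.5,
                           q_r=0.2, q_b_bar=0.085, m_bar=0.1)
```

The iteration is a damped oscillation toward the fixed point, so a few hundred substitutions suffice for the parameter ranges considered here.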
[Figure 6: Empirical results: measured percentage of CPU time available for batch jobs (Pr[B] × 100%) and sampled E[C1] vs. N, with the predictions obtained from the model]

**** Note that this latitude is only possible on a limited basis (e.g., code optimization, faster memory, faster operation unit, multi-processing, etc.)

Comments

The analysis presented above primarily focused attention on the system's capacity to accommodate user demands. Even though no mention was given to cost/performance tradeoffs, the model lends itself to this latter design consideration. For example, the variables N, Pr[B], and μ might be combined to reflect the revenue derived for service to batch jobs and the revenue obtained for servicing interactive users, which could then be weighted against the cost expended to
provide (and maintain) the system complement. This
would provide a basis for the designer to balance CPU
cost/performance with that of other system elements.
The process of selecting and examining performance indexes similar to those discussed here enables the designer to better appraise the many implementation tradeoffs which confront him. Moreover, when supplemented with empirical data, these techniques provide a basis for not only configuring existing systems but also synthesizing new systems. However, it should be emphasized that apart from the mathematical model itself and its macroscopic treatment of the system, the fidelity of the results and conclusions obtained in this analysis (or any analysis of this sort) can only be as good as the accuracy attributed to the independent variables (N, λ, μ, m̄, S). The values possessed by these variables dramatically affect performance and will vary from one environment to another. Therefore, one should be cautious before inferring any explicit and universal characterizations of system performance.

REFERENCES

1 B. KRISHNAMOORTHI, R. C. WOOD
Time-shared computer operations with both interarrival and service times exponential
J A C M Vol 13 317-338 July 1966
2 E. G. COFFMAN JR
Stochastic models of multiple and time-shared computer operations
Report 66-38 Dept of Eng Univ of Calif Los Angeles June 1966
3 L. KLEINROCK
Time-shared systems: A theoretical treatment
J A C M Vol 14 242-261 April 1967
4 J. E. SHEMER
Some mathematical considerations of time-sharing scheduling algorithms
J A C M Vol 14 262-272 April 1967
5 G. E. BRYAN
JOSS: 20,000 hours at a console - a statistical summary
Proc F J C C 769-777 1967
6 H. CANTRELL
Time-sharing data
General Electric Technical Information Series Report R65CD12 December 1965
7 T. L. SAATY
Elements of queueing theory
McGraw-Hill New York 1961

APPENDIX

BTM mathematical model

Consider that the generation of on-line requests on each communication channel is an exponential process with parameter λ. Hence, the time interval x between completion of a request and generation of a new request on a given line is described by the distribution function

A(x) = 1 - e^(-λx)  for x ≥ 0
A(x) = 0            for x < 0

Similarly, assume that the service time t required by each on-line request is exponentially distributed with parameter μ and characterized by the distribution function

B(t) = 1 - e^(-μt)  for t ≥ 0
B(t) = 0            for t < 0

Given that there are N channels, let pn(t) denote the probability that n on-line requests are queued at an arbitrary time t, for n = 0, 1, ..., N; then

dp0(t)/dt = -Nλ p0(t) + μ Pr[R(t)] p1(t)                   for n = 0
dpn(t)/dt = -[(N - n)λ + μ Pr[R(t)]] pn(t)
            + (N - n + 1)λ pn-1(t) + μ Pr[R(t)] pn+1(t)    for 0 < n < N
dpN(t)/dt = -μ Pr[R(t)] pN(t) + λ pN-1(t)                  for n = N

where Pr[R(t)] denotes the probability that at time t the computer is servicing one of the remotely generated on-line requests. Note that in the above equations, the input rate is (N - n)λ when n requests are queued. Thus the model accounts for the natural variations in demand intensity which result because there are a finite number N of input sources.

From these equations, the stationary probability7 that n on-line requests are queued is

pn = [N!/(N - n)!] (λ/(μ Pr[R]))^n p0

where Pr[R] = limit(t→∞) Pr[R(t)] and

p0 = 1 / [1 + Σ(n=1 to N) [N!/(N - n)!] (λ/(μ Pr[R]))^n]

The probability Pr[R] can be estimated by considering
the interval which elapses between successive allocations of a quantum to on-line users. Let Tk denote the total time between the 0th on-line quantum completion and the kth on-line quantum completion. If the kth completion leaves the on-line queue in an empty state, then the expected value of the time ΔTk until the next on-line quantum completion is

E[ΔTk] = qR + q̄B + m̄ + (1/Nλ)

In the case when the kth on-line quantum completion does not leave the interactive user queue empty, then with probability (1 - p0)

E[ΔTk] = qR + q̄B + m̄

The variables q̄B and qR are heavily influenced by quantum periods and swap time. If one assumes that (with the exception of a batch quantum allocation every other quantum) on-line jobs run on a demand basis (i.e., the batch quantum qB is less than the swap time S), then q̄B = S. Hence, the swap time limits the rate at which successive quantum allocations are provided to the on-line requests (i.e., maximum service capacity is given to on-line requests). Whereas, if the batch quantum limits the servicing of on-line requests (qB > S), then q̄B = qB. Therefore, for completeness,

q̄B = qB  if S < qB
q̄B = S   if S ≥ qB

Now let TB, TR, and Tm denote respectively the length of time out of Tk which the system spends servicing batch jobs, on-line jobs, and monitor functions. Then as k goes to infinity, the ratios TB/k, TR/k, and Tm/k converge with probability one to (q̄B + p0/Nλ), qR, and m̄, respectively. Therefore, in the limit, an approximation to the fraction of the time which the system spends servicing on-line requests is

Pr[R] = limit(k→∞) [TR/Tk] = qR / (qR + q̄B + m̄ + p0(1/Nλ))

Here, f denotes an appropriate scale factor introduced to facilitate solving for {pn}, n = 0, 1, ..., N. The numerical technique is to let f increase by some small Δf until a solution for p0 is obtained which is consistent with Pr[R]. The variable f satisfying this criterion will vary dramatically depending upon N, m̄, μ, λ and qB.

Upon solving for p0, the percentage of CPU time available for batch jobs is

Pr[B] = (q̄B + p0(1/Nλ)) / (qR + q̄B + m̄ + p0(1/Nλ))

where q̄B is the average quantum which batch users receive; qR is the expected duration of an on-line (remote user) quantum; (1/Nλ) is the mean time until the generation of the next on-line request; and m̄ is the expected monitor overhead time per batch/on-line quantum cycle. Here, m̄ accounts for any scheduling, I/O overhead, file operations, and any other CPU time pre-empted by the monitor which results during the cycle of a quantum allocation to a batch job followed by a quantum allocation to an on-line job.

The expected number of queued on-line requests is

E[n] = Σ(n=0 to N) n pn

and E[T0] is the expected time remaining subsequent to
the arrival of an on-line request before the next quantum allocation is initiated. The value of E[T0] is difficult to accurately express since it is a function of the probability densities for qB and m̄ together with machine state probabilities; however, it is clear that

0 ≤ E[T0] ≤ qR + q̄B + m̄

At any rate, E[T0] is not a dominant factor in E[C1] unless E[C1] is extremely small (i.e., E[C1] ≈ qR + E[T0], for example). Hence, the precise value of E[T0] is not critical in those cases which are of particular interest (namely, those resulting when the on-line queue tends toward saturation; i.e., E[n] ≈ N).

In addition to the above result for E[C1], since the scheduling discipline is round-robin, it is possible to estimate2-4 the expected total response time E[R|t] for an on-line request which requires a processing time t in excess of a single quantum qR:

E[R|t] ≈ t + ⟨(t - qR)/qR⟩ [E[C1] - (p0 E[T0] + qR) + q̄B + m̄]

where ⟨a/b⟩ is the smallest integer greater than a/b.

Alternate model

Let pmn(Tk) denote the probability that n on-line requests are queued at epoch Tk marking the completion of the kth on-line quantum allocation, given that at epoch Tk-1 there were m on-line requests awaiting service from the system.1,2 Then, independent of k, since the CPU servicing of requests is characterized as an exponential process,

pmn = ∫(0 to γ+qR-ε) Pr[n - m + 1 | m, t] pB+R(t) dt    for 1 ≤ m ≤ n

pmn = 0    for n ≤ m - 2; m ≥ 1

pmn = ∫(0 to γ+qR-ε) Pr[0 | m, t] pB+R(t) dt    for n = m - 1

where ε → 0 and Pr[k | m, t] denotes the conditional probability of generating k new on-line requests in a time interval t given that m requests are queued. For example, with exponential inter-arrivals from the N - m idle sources,

Pr[k | m, t] = [(N - m)!/(k!(N - m - k)!)] (1 - e^(-λt))^k e^(-(N - m - k)λt)

Also, in the above equations,

γ = Smax  if service to on-line customers is swap limited (i.e., qB < S)
γ = qB    if the batch quantum limits on-line service (i.e., qB ≥ S)

Here, pB denotes the probability density function which describes the batch quantum allocation, and pB+R is the convolution of pB with the density function pR defining the distribution of an on-line quantum allocation. Both pB and pR include overhead functions to account for file I/O, monitor overhead, etc.

The density function pB is derived from the swap time distribution when qB < S; whereas, it depicts the CPU servicing of batch requests when S < qB. For example, in the latter case with δ(z) representing the Dirac delta function describing an independent variable z, one could characterize the constant batch allocation interval by

pB(t) = δ(t - (γB + qB))

where the constant γB reflects batch overhead. Similarly, letting γR denote the overhead incurred during an on-line quantum allocation,

pR(t) = μe^(-μt) + e^(-μqR) δ(t - (qR + γR))    for γR ≤ t ≤ γR + qR
pR(t) = 0    for t ≤ γR or t > γR + qR

For completeness, the transitions from the 0-state are assumed to be of the same form as those from the 1-state (i.e., p0n = p1n).

Then, having formulated the state transitions {pmn} and defined the density functions pB(t) and pB+R(t), the problem remains to solve for the steady-state probabilities. This is accomplished by noting that the pmn's define an ergodic Markov chain, whereby in matrix form with P = (pmn) there exists a unique set of numbers {pn}, n = 0, ..., N, such that

p = pP

and

Σ(n=0 to N) pn = 1

The solution of these equations produces the limiting stationary probabilities {pn}, n = 0, ..., N, which could be used in calculating E[n] to provide a more accurate estimate of E[C1]. (That is, providing one can accurately describe pB, pB+R, λ, etc.)
However, since the accuracy of such variables would be highly questionable in the absence of any empirical information, and since this latter model presents a number of non-trivial mathematical difficulties, it was not utilized to derive the results given in this paper. Yet, in the future, as sufficient data is accumulated from the actual operation of BTM systems, the latter model will enable us to extrapolate and better predict the effects of alterations to the system (e.g., improvements resulting from faster swapping devices or increases in CPU speed).
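If the alternate model is pursued numerically, the stationary vector of the transition matrix P = (pmn) can be obtained by simple power iteration once the entries have been tabulated from the densities above. A generic sketch follows; the 3-state matrix is made up for illustration, not derived from the BTM densities:

```python
def stationary(P, iters=2000):
    """Solve p = pP with sum(p) = 1 for a row-stochastic matrix P
    by repeated left-multiplication (power iteration)."""
    n = len(P)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[m] * P[m][j] for m in range(n)) for j in range(n)]
    total = sum(p)
    return [x / total for x in p]

P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
print(stationary(P))   # → approximately [0.25, 0.5, 0.25]
```

For an ergodic chain the iteration converges geometrically at a rate set by the second-largest eigenvalue of P, so a few thousand multiplications are ample for the small state spaces (n ≤ N) arising here.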
ACKNOWLEDGMENT

The authors are indebted to M. Leavitt, D. Cumming, J. Doeppel, T. Martin and G. E. Bryan for their many contributions to the BTM design effort and also wish to extend thanks to all those other individuals at Scientific Data Systems who helped to make this project possible. In particular, the authors are grateful to D. Cota, E. Maso and Dr. R. Spinrad for their guidance in these efforts.

Dynamic protection structures
by B. W. LAMPSON
Berkeley Computer Corporation
Berkeley, California

INTRODUCTION

A very general problem which pervades the entire field of operating system design is the construction of protection mechanisms. These come in many different forms, ranging from hardware which prevents the execution of input/output instructions by user programs, to password schemes for identifying customers when they log onto a time-sharing system. This paper deals with one aspect of the subject, which might be called the meta-theory of protection systems: how can the information which specifies protection and authorizes access itself be protected and manipulated. Thus, for example, a memory protection system decides whether a program P is allowed to store into location T. We are concerned with how P obtains this permission and how he passes it on to other programs.

In order to lend immediacy to the discussion, it will be helpful to have some examples. To provide some background for the examples, we imagine a computation C running on a general multi-access system M. The computation responds to inputs from a terminal or a card reader. Some of these look like commands: to compile file A, load B and print the output double-spaced. Others may be program statements or data. As C goes about its business, it executes a large number of different programs and requires at various times a large number of different kinds of access to the resources of the system and to the various objects which exist in it. It is necessary to have some way of knowing at each instant what privileges the computation has, and of establishing and changing these privileges in a flexible way. We will establish a fairly general conceptual framework for this situation,

and consider the details of implementation in a specific system.

Part of this framework is common to most modern operating systems; we will summarize it briefly. A program running on the system M exists in an environment created by M, just as does a program running in supervisor state on a machine unequipped with software. In the latter case the environment is simply the available memory and the available complement of machine instructions and input/output commands; since these appear in just the form provided by the hardware designers, we call this environment the bare machine. By contrast, the environment created by M for a program is called a virtual or user machine.6 It normally has less memory, differently organized, and an instruction set in which the input/output at least has been greatly changed. Besides the machine registers and memory, a user machine provides a set of objects which can be manipulated by the program. The instructions for manipulating objects are probably implemented in software, but this is of no concern to the user machine program, which is generally not able to tell how a given feature is implemented.
The basic object which executes programs is called
a task or process;6 it corresponds to one copy of the
user machine. What we are primarily concerned with
in this paper is the management of the objects which
a process has access to: how are they identified, passed
around, created, destroyed, used and shared.
Beyond this point, three ideas are fundamental to
the framework being developed:
1. Objects are named by capabilities, which are
names that are protected by the system in the


Fall Joint Computer Conference, 1969

sense that programs can move them around but
not change them or create them in an arbitrary
way. As a consequence, possession of a capability can be taken as prima facie proof of the
right to access the object it names.
2. A new kind of object called a domain is used to
group capabilities. At any time a process is
executing in some domain and hence can exercise
the capabilities which belong to the domain.
When control passes from one domain to another (in a suitably restricted fashion) the capabilities of the process will change.
3. Capabilities are usually obtained by presenting
domains which possess them with suitable
authorization, in the form of a special kind of
capability called an access key. Since a domain
can possess capabilities, including access keys,
it can carry its own identification.
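These three ideas can be made concrete with a small sketch in modern pseudocode (Python). The class and method names here are our own inventions for illustration, not part of the Model I:

```python
# A toy model of capabilities, domains, and access keys.
# All names here (Capability, Domain, grant, ...) are illustrative only.

class Capability:
    """A protected name for an object. User code may hold and pass
    these around, but only the supervisor constructs or alters them."""
    def __init__(self, obj_type, value):
        self.obj_type = obj_type   # e.g., "file", "domain", "access key"
        self.value = value         # e.g., the disc address of a file index

class Domain:
    """A grouping of capabilities. A process executing in a domain
    may exercise exactly the capabilities the domain holds."""
    def __init__(self):
        self.capabilities = []

    def grant(self, cap):
        self.capabilities.append(cap)

    def can_access(self, cap):
        # Possession is taken as prima facie proof of the right of access.
        return cap in self.capabilities

# An access key is itself a capability, so a domain can carry
# its own identification by holding one.
key = Capability("access key", "project-alpha")   # hypothetical key
d = Domain()
d.grant(key)

f = Capability("file", 0x1A2B)   # hypothetical disc address
d.grant(f)
assert d.can_access(f) and d.can_access(key)
```

A process running in d may exercise f and present key as authorization; a process running in a fresh domain can do neither.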
A key property of this framework is that it does not
distinguish any particular part of the computation. In
other words, a program running in one domain can
execute, expand the computation, access files and in
general exercise its capabilities without regard to who
created it or how far down in any hierarchy it is. Thus,
for example, a user program running under a debugging
system is quite free to create another incarnation of
the debugging system underneath him, which may in
turn create another user program which is not aware
in any way of its position in the scheme of things. In
particular, it is possible to reset things to a standard
state in one domain without disrupting higher ones.
The reason for placing so much weight on this property is two-fold. First of all, it provides a guarantee
that programs can be glued together to make larger
programs without elaborate prearrangements about
the nature of the common environment. Large systems
with active user communities quickly build up sizable
collections of valuable routines. The large ones in the
collections, such as compilers, often prove useful as
subroutines of other programs. Thus, to implement
language X it may be convenient to translate it into
language Y, for which a compiler already exists. The X
implementor is probably unaware that Y's implementation involves a further call on an assembler. If the
basic system organization does not allow an arbitrarily
complex structure to be built up from any point, this
kind of operation will not be feasible.
The second reason for concern about extendibility
is that it allows deficiencies in the design of the system
to be made up without changes in the basic system
itself, simply by interposing another layer between the
basic system and the user. This is especially important

when we realize that different people may have different
ideas about the nature of a deficiency.
We now have outlined the main ideas of the paper.
The remainder of the discussion is devoted to filling
them out with examples and explanations. The entire
scheme has been developed as part of the operating
system for the Berkeley Computer Corporation Model
I. Since many details and specific mechanisms are
dependent on the characteristics of the surrounding
system and underlying hardware, we digress briefly
at this point to describe them.
Environment

The BCC Model I is an integrated hardware and software system designed to support a large number (up to
500) of time-sharing users. This system consists of
two central processors, several small processors, a large
central (core and integrated circuit) memory, and rotating magnetic memory. The latter contains more than
500 × 10⁶ bytes, including approximately 12 × 10⁶ bytes
of drum having a transfer rate of more than 5 × 10⁶
bytes per second.
The hardware allows each process more than 512k
bytes of virtual memory. The central processors can
accommodate operands of various sizes including 48-
and 96-bit floating point numbers. The addressing
structure allows characters, part-word fields and array
elements to be referenced directly. The subroutine-calling instruction passes parameters and allocates
stack space automatically. System calls are handled
exactly like ordinary function calls; when arrays or
labels are passed to the system they are checked automatically by the hardware so that they can be used
by the system without further ado.
The memory management system organizes memory
into pages. A page is identified by a 48-bit unique name
which is guaranteed different for each page ever created
in the system. Tables are maintained in the central
memory which allow the page to be found in the various
levels of the memory system. These tables are automatically accessed by the address mapping hardware
the first time the page is referenced after the processor
starts to run a new process. Thereafter its real core
address is kept in fast registers. It is therefore unnecessary for any program other than a small part of the
basic system to be concerned about the location of a
page in the memory system; when it is referenced, it
will be brought into the central memory if it is not
already there. Extensive facilities are provided, however, to allow a process to control the level in the memory hierarchy of the pages it is interested in. The work
of managing the memory is done by a processor with

Dynamic Protection Structures
read-only program memory and data access to the
central memory; this processor has a 100 ns cycle
time, so that it can handle the large amount of computing required to keep up with demands placed on
the memory system. Another small processor handles
the remote terminals, which are multiplexed in groups
of 20 to 100 at remote concentrators and brought
into the system over high-speed lines.
Pages are grouped into files, which are treated as
randomly addressable sequences of pages. The only
mechanism provided to access the data in a file is to
put a page of the file into the virtual memory of a
process. Files and processes are named and have protection information associated with them.
Domains in action

Before plunging into a detailed analysis of capabilities and domains, we will look at some of the practical situations which these facilities are designed to
serve. They all have the same general character: several
programs with different privileges exist. Each program
corresponds to one domain. Some of the domains control others, in the sense that the capabilities of a controlled domain are a subset of those of its controlling
domain. As a first example, consider the command
process CP of an operating system. This program
accepts a command, perhaps from a remote terminal,
and attempts to recognize it as a call on a program X
which CP knows about. If it succeeds, CP calls on X for
execution, passing it any parameters which were included in the command. To do this, CP must set up
a suitable environment for X to function in. In particular, enough memory must be provided for X to
run, X must be loaded properly, and suitable input/
output must be available. When X is finished, it will
return and CP can process a new command.
The key point is that we want CP to be protected
from X, to ensure that the user's commands continue
to be processed even if X has bugs. In particular, we
want to be sure that

1. X does not destroy CP's memory or files, so
that CP can continue to run when X returns.
2. CP can stop X if it goes wild. Usually we want
the ability to set a time limit and also to intervene from the terminal.

In other words, we want CP and X to run in separate
domains, as illustrated in Figure 1 (since this is an
informal discussion, we do not trouble to distinguish
carefully between the program X and the domain in
which it runs). Here we have shown the call from CP
to X in two forms: in the picture on the right, and as
a return capability in X. The reason for the capability
is that X cannot return with a simple branch operation, since it would then be able to start CP running
at any point, which would destroy the protection.

[Figure 1 - A command processor and its command: domain CP, the command processor, holds capabilities for command input, command output, and a directory of commands; it calls domain X, the command, which holds a return capability to CP]

Suppose now that we want to allow X to get additional commands executed. X might, for example, be a
Fortran compiler whose output must be passed
through an assembler. A simple way to do this is to
put the assembler input on a file called, say, FORTRANTEMP, and issue the command

ASSEMBLE FORTRANTEMP, BINARY

This command is just a string, which can easily be
constructed by the compiler X. To get it executed,
however, X must be able to call CP. This situation
is illustrated in Figure 2; note the call capability in X,
which is quite different from the return capability.
We are ignoring for the moment the question of how
CP knows that X is authorized to call the assembler.

[Figure 2 - A recursive command processor: domain CP holds command input, command output, and a directory of commands; domain X holds a return capability to CP and a capability to call CP; a second command Y, called through CP, holds a return capability to CP]

If the idea of the preceding paragraph is pursued, it
suggests the value of being able to switch the source
of command input and the destination of command
output in a flexible way. By these terms we mean the

traffic between a program and the entity by which it
is directed. In a time-sharing system this is normally
a terminal at which the user is sitting; in a non-interactive system it will be a file of control cards. It is
often desirable, however, to switch between the two,
so that routine processing can be done automatically
when the user's attention is elsewhere, yet he can
regain control when things go awry. Again, it is not
uncommon to wish to capture a complete record of a
conversation between user and machine for later
analysis and replay. More radical, it may be of interest
to replace the user at his terminal with a program
which can manipulate the strings of characters which
constitute commands and responses. In this way major
changes in the external appearance of a system can
be obtained with little effort.
All of these things can be accomplished by giving
interactions with the command I/O device the form of
calls to a different domain which acts as a switch. A
generalization to include the possibility of different
command devices for different domains is easy. Thus,
a user may initiate a program in a domain X which,
while continuing to communicate with him, starts a
subsidiary domain Y and feeds it commands. The subsidiary, unaware of the way in which it is being driven,
may iterate the process by creating Z. The key fact
which makes it all work is the isolation of one domain
from others. Thus, Y may decide to close all its files
without disturbing X, since Y has no way of even
knowing about X's files, much less accessing them. Z,
on the other hand, can be an open book to Y. Various
aspects of the situation are illustrated in Figure 3.

[Figure 3a - Switchable control I/O: the domains. Command processors CP1 and CP2, a macro command domain MC, and a user program X each call the domain CIO, which switches the control I/O; CIO holds a directory of commands, capabilities to call CP1, CP2, and MC, and the corresponding return capabilities]

[Figure 3b - Switchable control I/O: the calls. The top-level command processor initiates a command MC which wants to drive another command processor with some pre-stored or computed input. It therefore creates another CP and calls it, telling CIO to use MC for its I/O. The lower CP is given a command to call the user program X. This program needs input, which it gets by calling CIO, the domain which is switching the control I/O. CIO calls the current input source, which is MC]
This section concludes by analyzing a problem of
great practical importance: how to construct a debugging system. This example is a good source of insights
into the facilities required of a protection system because of the great variety of things which can be expected to go wrong during debugging. There are two
domains, one for the debugger D and one for the program X being debugged. We of course want D to be
protected from X. Equally important, we want X to
be completely open to D, so that every object accessible
to X is also accessible to D, and furthermore that D
can find all the objects accessible to X as well as access
them. Otherwise D will not be able to find out what X
has done or to undo any damage. Furthermore, we
want D to be able to imitate any actions which X
can take, so that D can create suitable initial conditions
for debugging parts of X. Thus, D needs operations
which, given a capability for X, allow D to

find all the capabilities in X
copy capabilities between D and X
destroy capabilities in X
enter X at any point with any machine state
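A minimal sketch of these four operations, in illustrative Python (the names are hypothetical; on the Model I they would be supervisor operations exercised through D's capability for X):

```python
# Toy sketch of a debugger domain D controlling a debuggee domain X.
# All names here are our own; they are not Model I operations.

class Domain:
    def __init__(self, name):
        self.name = name
        self.capabilities = {}     # unprotected name -> object

def find_capabilities(x):
    """Enumerate all the capabilities in X."""
    return list(x.capabilities.keys())

def copy_capability(src, dst, name):
    """Copy a capability between D and X (either direction)."""
    dst.capabilities[name] = src.capabilities[name]

def destroy_capability(x, name):
    """Destroy a capability in X, e.g., to undo damage."""
    del x.capabilities[name]

def enter(x, location, machine_state):
    """Enter X at any point with any machine state."""
    x.location, x.machine_state = location, machine_state

d, x = Domain("D"), Domain("X")
x.capabilities["scratch-file"] = object()
copy_capability(x, d, "scratch-file")   # D inspects what X has been doing
destroy_capability(x, "scratch-file")   # ... and can take it away again
enter(x, "breakpoint-7", {"pc": 0})     # hypothetical entry for debugging
```

With these operations a breakpoint in X becomes simply a call on D, as the text goes on to note.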

With these powers, D can also handle domains which
X has created, since it can get hold of X's capabilities
for them. Breakpoints can be inserted in X in the
form of calls on D.

[Figure 5(a) table: capabilities A through F listed with NAME, TYPE, and VALUE fields, each carrying one ownership bit per domain]

Domains and capabilities

The nature of capabilities
As we have already said, a capability is a protected
name of an object. When any object is created, a
capability is created to name it; without the capability
the object might as well not exist, since there is no
way to talk about it. The capability may be thought
of as an ordinary data item enclosed in a box which
prevents tampering with the contents. Thus, for example, it may be convenient to make a capability for
a file consist of simply the disc address of its index.
This is entirely satisfactory, since programs which
handle the capability cannot modify it. If they could,
disaster would ensue, since any program could put
any desired disc address into a file capability, and
there would be no protection at all. If the machine
hardware allows a word to be tagged so that it cannot
be modified except by the supervisor, then we have
precisely what we want for a capability. The situation
is illustrated in Figure 4. It should be possible to load
and store such a word (including the tag bits) in order
to give programs the necessary freedom to manipulate
the names of the objects they are working with.
If this kind of hardware is not available a different
and potentially confusing implementation is required.
The potential can be kept from realization by referring
back to the "pure" implementation of the last paragraph. What is required is to hide the capabilities
away in the supervisor and provide programs with
unprotected names which can be used to refer to them.
When a program running in domain D presents one
of these names, it is necessary to check that it actually
names a capability which belongs to D. This can easily
[Figure 4 - Structure of a capability: TAG, TYPE, and VALUE fields; TAG = read-only except to supervisor, TYPE = FILE, VALUE = disk address of index]

[Figure 5 - Capabilities and unprotected names: (a) capabilities grouped, with bits for ownership; (b) capabilities separate for each domain]

be done, if there are n such capabilities, by using
numbers between 1 and n for the names.3 An attractive
alternative, if domains can be grouped into larger units
which share many capabilities, is to number the
domains from 1 to i and the entire collection of capabilities from 1 to n and to attach a string of i bits to
each capability. Bit d is on exactly when the capability
belongs to domain d. Figure 5 illustrates.
A somewhat more expensive implementation is to
search a table associated with the domain whenever
an unprotected name is used. This scheme shares with
the bit-string idea the advantage that it is easy for
different domains to use the same names for the same
object.
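The bit-string scheme lends itself to a short sketch (illustrative Python; the numbers of domains and capabilities, and the names, are made up):

```python
# Capabilities are numbered 1..n and domains 1..i. Each capability
# carries a string of i bits; bit d is on exactly when the capability
# belongs to domain d. An unprotected name is just an index.

N_DOMAINS = 4      # i, chosen arbitrarily for the sketch

class Capability:
    def __init__(self, value):
        self.value = value
        self.owner_bits = [False] * N_DOMAINS

# hypothetical supervisor table of all capabilities (names 1..n)
capabilities = [Capability(v) for v in ("file-A", "file-B", "page-7")]
capabilities[0].owner_bits[1] = True      # capability 1 belongs to domain 2
capabilities[2].owner_bits[1] = True      # capability 3 belongs to domain 2

def exercise(domain, name):
    """Check that the unprotected name actually names a capability
    belonging to the presenting domain before honoring it."""
    cap = capabilities[name - 1]
    if not cap.owner_bits[domain - 1]:
        raise PermissionError("capability does not belong to this domain")
    return cap.value

assert exercise(2, 1) == "file-A"
# exercise(3, 1) would raise PermissionError: domain 3 does not own it
```

Note that two domains sharing a capability simply both have their bit on, so they use the same name for the same object, the advantage mentioned above.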
There are capabilities for all the different kinds of
objects in the system. On the Model I these are
files
pages of memory
processes
domains
interrupt calls
terminals
access keys

Domains and memory
The nature of a domain is considerably more dependent on the underlying system than is the case
for capabilities, mainly because of the treatment of
memory. From a purist's viewpoint, every access to a

memory word is an exercise of a capability for that
word. A more moderate position, and one which is
quite feasible on suitable hardware, is to view each
access as the exercise of a capability for a segment
which contains the word.2 The mapping hardware
which implements segmentation is thus viewed as part
of the capability system, and a satisfying unity of
outlook is gained. Since a segment is identified by
number, the preceding section applies. We shall not
consider the formidable difficulties which arise if different domains use different names for the same segment.
If segments are accessed through capabilities like
everything else, then a domain consists of nothing more
than a collection of capabilities. On machines not
equipped with the proper hardware a domain has an
address space as well. In the Model I this is a list of
the pages which occupy each of the 64 slots for pages
in the 128k memory which is accessible to a user program.
It is also necessary to deal with the fact that the
hardware does not allow one domain to access the
address space of another one directly. This fact is of
great importance when we consider how data is passed
back and forth between domains, since it implies that
arrays cannot be passed simply by specifying their
addresses. It is therefore extremely convenient to include as part of a call the ability to pass scalar data
items, and essential to include the ability to pass capabilities. From this foundation arbitrarily complex communication can be built, since capabilities for pages,
files and domains can be passed. Thus, if an array needs
to be passed as a parameter, it is sufficient to pass
capabilities for the pages or file containing the array,
together with its base address and length. The called
domain can then put the pages into its address space
and access the array. This is of course much less convenient than passing an entire segment as a parameter,
but it is quite workable.
An alternative approach is to organize the hardware
so that the address space of one domain is a subset of
that of another. This eliminates all problems when the
smaller one calls the larger, although it does not help
at all when we want to share only part of the address
space. A subset organization fits well with a linear or
"ring"-like system4 in which the domains are numbered,
and the capabilities of domain i are a subset of those
of domain i-1. As we shall see, there are good reasons
for wanting a more flexible scheme, but for a great
many applications a linear ordering is quite satisfactory.
To allow these to be handled more efficiently, the
Model I hardware breaks the address space of a process
into three rings:

monitor
utility
user

in decreasing order of strength. The hardware enforces
a restriction that addressing cannot go into a higher
ring. It also provides protected entry points into the
utility and monitor rings and automatically checks
addresses passed into these rings as parameters to
ensure that they are legal in the ring from which they
came.
This simple hardware-implemented structure permits
three domains to transfer control around among each
other and to address each other's memory in a very
convenient and efficient way. The price paid is a rigidity in structure, and a drastic incompatibility with
the main, software-implemented domain mechanism.
The incompatibility is resolved by requiring a change
in ring to be reported to the software, except when the
only processing to be performed before returning to the
original ring can be done with the capabilities of the
original ring. Short calls thus remain cheap, while the
overhead added to longer ones is not excessive.

Domains and processes

The relationship between domains and processes is
another area greatly influenced by the surrounding
system. The logical nature of the two kinds of object
allows a great deal of freedom: in fact, a domain has
much the same appearance to a process that a segment
of memory does. The storage for capabilities provided
by a domain can accommodate many processes, and a
single process can switch from one domain to another
(subject to restrictions which are considered in the
next section).
In the Model I, however, storage is allocated in 2k
pages, and one of these, called the context block, is
used to hold the system-maintained private data for
each process. The cost of having a process is thus high,
and there is considerable incentive to minimize the
number of processes; usually one is enough per computation, if advantage is taken of the interrupt facilities
described later. When the usage of space in the context
block is analyzed, it turns out that there are only two
items which would have to be duplicated to allow
several processes to run with the same address space.
These are a 14-word machine state and a stack used
for local storage when the supervisor is executing in
the process. This stack has a minimum of about 60
words and can grow to several hundred words at certain
points during supervisor execution. It is therefore the

main barrier to the existence of cheap processes. The
problem can be greatly alleviated by allocating stack
space dynamically at each function call and releasing
it at each return, but this would require some major
changes in system organization.
Although processes are expensive, domains are quite
cheap, since the bit-string method is used to assign
capabilities to domains. Each process in the Model I
can have about a dozen domains associated with it.
The process can run in any of its associated domains
but in no others. This implies that two processes never
run in the same domain.
In a system in which processes are cheap, it is possible
to take an entirely different approach which encourages
the creation of processes for every purpose. In such a
system, parallel processing is of course greatly facilitated. In addition, free creation of processes can be
used to give a somewhat different form to many of
the facilities described in this paper.3
It is perhaps worthwhile to point out that a machine
whose addressing is not organized around a stack or
base registers cannot reasonably run several processes
out of the same domain unless they are executing totally disjoint code, because of the problem of address
conflicts.
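The array-passing convention described under "Domains and memory" (pass capabilities for the containing pages together with a base address and length) can be sketched as follows. This is illustrative Python; the names are invented, and only the 2k page size comes from the text:

```python
# Passing an array between domains whose address spaces are disjoint:
# the caller passes capabilities for the pages holding the array plus
# its base address and length; the callee maps the pages and reads it.

PAGE_SIZE = 2048   # the Model I allocates storage in 2k pages

class Page:
    def __init__(self):
        self.words = [0] * PAGE_SIZE

class Domain:
    def __init__(self):
        self.address_space = []          # pages this domain has mapped

    def map_page(self, page_cap):
        self.address_space.append(page_cap)

def call_with_array(callee, page_caps, base, length):
    """Hypothetical calling convention: scalar data (base, length)
    and page capabilities cross the domain boundary; addresses do not."""
    for cap in page_caps:                # callee puts the pages into
        callee.map_page(cap)             # its own address space
    return base, length                  # callee can now address the array

caller_page = Page()
caller_page.words[100:103] = [7, 8, 9]   # the array to be passed
callee = Domain()
base, length = call_with_array(callee, [caller_page], 100, 3)
assert callee.address_space[0].words[base:base + length] == [7, 8, 9]
```

As the text observes, this is clumsier than passing a segment, but it requires nothing beyond the ability to pass scalars and capabilities.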

Transfers of control

Calls
The only reason for creating a domain is to establish
an environment in which a process may execute with
different protection than that provided by any existing
domain. If this objective is to be fulfilled, transfers of
control between domains must be handled with great
care, since they generally imply the acquisition of
new capabilities. If it is possible for a process running
in domain X to suddenly jump into domain Y and
continue execution at any arbitrary point, X can certainly induce Y to damage the objects accessible
through Y's capabilities.
To provide an adequate mechanism for transfers
between domains, we introduce the idea of a protected
entry point or gate, and make the rule that transfer
into a domain is normally allowed only at a gate. A
gate is a new kind of capability which can be created
by anyone with a capability for the domain. It specifies
a location to which control is to go when the gate is
used. Gates can be passed around freely like other
capabilities, and each one may be viewed as conferring
a certain amount of power, namely the power to accomplish whatever the routine entered by the gate is


designed to do. With gates it is possible to selectively
distribute the powers of a domain in a flexible way.
A transfer through a gate usually takes the form of
a subroutine call; some provision must therefore be
made for a return. It is not satisfactory to create
another gate which the called process may return
through, since he might save it away and use it to
return at some later and unexpected time. Instead,
the domain and location to return to are saved on a
call stack in the supervisor, from which the return
operation can retrieve them. It is possible to call a
domain recursively with this mechanism, a feature
which is generally desirable and also quite important
for the trap and interrupt system about to be described.
In order to allow the stack to be reset in case of an
error, or for any of the other reasons which prompt
programmers to reset stacks, a jump-return (n) operation is provided which returns to the domain n levels
back. Protection is maintained by requiring the domain
doing the jump-return to have capabilities for all the
domains being jumped over.
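A sketch of the gate and call-stack discipline (illustrative Python; the real operations are the supervisor calls listed in Table I, and the checking here is only schematic):

```python
# Transfers between domains happen only through gates; returns come
# from a call stack kept in the supervisor, never from a saved gate.

class Gate:
    """A capability specifying a domain and the one location at which
    control may enter it."""
    def __init__(self, domain, location):
        self.domain, self.location = domain, location

call_stack = []          # held by the supervisor, out of reach of user code

def call(current_domain, gate):
    call_stack.append(current_domain)   # record where to return
    return gate.domain, gate.location   # enter only at the gate's location

def ret():
    return call_stack.pop()             # supervisor supplies the return point

def jump_return(n, held_capabilities):
    """Return n levels back; legal only if the jumper holds capabilities
    for all the domains being jumped over."""
    skipped = call_stack[-n:]
    if any(d not in held_capabilities for d in skipped):
        raise PermissionError("no capability for a domain being jumped over")
    del call_stack[-n:]
    return skipped[0]

g = Gate("CP", "command-loop")          # hypothetical gate into CP
dom, loc = call("X", g)                 # X calls CP through the gate
assert (dom, loc) == ("CP", "command-loop")
assert ret() == "X"                     # CP returns via the call stack
```

Because X never holds a gate for an arbitrary point in CP, it can start CP running only at the protected entry, which is exactly the property the text demands.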

Traps
A trap is caused by the occurrence of some unusual
event in the execution of the program which requires
special handling, such as a floating point overflow, a
memory protection violation or an end of file. When a
trap occurs, it forces control to go to a specified place,
where presumably a routine has been put to deal with
the event. Whether any particular event causes a trap
or simply sets a flag which can be tested by the program
is a decision which should be under the programmer's
control. Traps may be initiated by hardware (e.g.,
floating overflow) or may be artifacts of the software;
as with most distinctions between hardware and software implementation, this one is of little importance,
and we expect all traps to be transmitted to the program
in the same form, regardless of their origin.
These are all obvious points which are generally
accepted, and have even become embedded in the
definition of PL/I. What concerns us here is the relationship between traps and domains, which is not
quite so obvious. The basic problem is that the response to a trap must be made to depend on the environment in which it occurs. The occurrence of, say, a
floating overflow is simply a fact, and has nothing to
do with who is running. The action to be taken, on the
other hand, is entirely a function of the situation.
Consider the example in Figure 6. If a floating overflow
occurs with the call stack in state (b), it is clear that

34

Fall Joint Computer Conference, 1969

Name
A

Domain

Traps

B

Statl.stl.cal
package

C

Matrl.x
Inversion
a)

FLTOV,

SINGMTX

I

I

FLTOV

Domains and
enabled traps

o

b)

The call stack
during matrix
inversion

o
~SIN~ o
o

o

CATCHALL

8

0FLTOV

(0
c)

o
o
G

ICommand processor ICATCHALL I

the matrix
inverter processes a
floating overflow

d)

the matrix
inverter returns with
trap-return
(SINGMTX)

e)

the matrix
inverter returns
with trapreturn
(BAD DATA)

Figure 6--Traps and trapreturns

C should have the first chance to handle the trap. If
it is not interested, the domain B which called it should
have the second chance. In state Cc}, on the other hand,
domain B should have the first chance, and then A.
The reasons for this, is that we do not wish to give up
control to a weaker domain when a trap occurs.
The idea is then the following: Each domain is
considered to have a father. When a trap occurs, it is
first directed to the domain S which is running. If S
does not have the trap enabled, the father of S is
tried in the same way. If no one can be found to handle
the trap, there are two possibilities:

to each hardware-generated trap is a standard name.
Software-generated traps can use £tny names, including
the ones for hardware traps. This makes it easy for a
subroutine to simulate the occurrence of a hardware
condition which it may not be convenient to produce.
A simple extension of the return operation. to a
trap-return allows a routine to signal an error without
leaving any traces of itself; the trap-return does a
return and immediately causes the specified trap,
without allowing any execution beyond the return
point. The domain which handles the trap then sees
it as having occurred in the calling routine, which is
exactly what is wanted. Thus in Figure 6 we have n
matrix inversion routine which processes its own
floating overflows, but reflects two other conditions
to its caller with trap-return. Another useful convention is to disable the trap when it occurs. This
makes it much less likely that the program will get
into a loop, especially for such traps as illegal instruction and memory protection violation.

Interrupts
There remains one more way to cause n tlmnsfer
Dynamic Protection Structures

Conceptually, we wish to think of traps as identified
by symbolic names. Each domain must then include a
list of names of the traps it has enabled. Corresponding

ignore it;

generate a catchall trap which any domain that
lacks a father is forced to handle.

If a domain T is found with the trap enabled, it is
called with the name of the trap as argument. It can
then return and allow execution to proceed if it is
able to clear things up. Alternatively, it can do a
jump-return to someone farther back on the call stack
if it finds the situation to be hopeless. An important
property of this scheme is that the trap routine can do
arbitrarily complex processing without disturbing the
situation at the time of the trap.

between domains: the occurrence of an interrupt. This
is not intended to be the normal mechanism for communication between cooperating processes; the basic
block and wake-up mechanisms are expected to perform that function. There are times, however, when it
is desirable to force a process to do something, even
if it is not paying attention. Two obvious reasons for
this are:

a quit signal from the terminal, which indicates
that the user wants to regain control over a process
which has gone into a loop, or perhaps simply
become unnecessarily wordy;

the elapse of a certain amount of time, which
has much the same meaning.

The action required in these two cases is different.
When a timer interrupt is requested (and there may be
two kinds, for real time and CPU time) the desired
action is usually to call a specific domain, often the
one which is setting the timer. If another domain
wants a timer, it will use one which is logically different.
The user's quit signal, on the other hand, is context
dependent like a trap; the desired action is a function
of the routine which is running when the signal arrives.
Thus an iterative root-finder may interpret a quit as
an indication that the solution is accurate enough,
but the debugging system under which it may be
running will curtail its printing when it sees a quit and
await a new command. This analysis suggests a simple
implementation: convert the quit into a trap from the
currently executing domain. Each interrupt, then, will
give rise to a call or a trap, depending on its type as
declared by the programmer.
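The trap search just described can be sketched as a toy model; the names here (Domain, raise_trap, the sample stack) are illustrative assumptions, not part of the paper's system:

```python
# Toy model of trap handling: search the call stack for a domain
# that has enabled the trap; if none is found, fall back to a
# catchall handler. All names are illustrative.

class Domain:
    def __init__(self, name, enabled_traps=()):
        self.name = name
        self.enabled_traps = set(enabled_traps)

def raise_trap(call_stack, trap_name):
    """Search from the most recently called domain backward."""
    for domain in reversed(call_stack):
        if trap_name in domain.enabled_traps:
            # The handler is called with the trap name as argument;
            # it may return, or jump-return farther down the stack.
            return f"{domain.name} handles {trap_name}"
    # No domain enabled the trap: the system could ignore it, or
    # generate a catchall trap; the catchall is modeled here.
    return f"catchall handles {trap_name}"

stack = [Domain("root", {"catchall"}),
         Domain("editor", {"quit"}),
         Domain("solver")]
print(raise_trap(stack, "quit"))      # editor handles quit
print(raise_trap(stack, "overflow"))  # catchall handles overflow
```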
Even when we see how to convert them into operations within the process, interrupts still present one
serious problem which does not arise in the handling
of traps. This is the fact that a program occasionally
needs to be allowed to compute for a while without
losing control. Usually this happens when modifications are being made to a data base; if a quit signal
should appear or a timer run out halfway through this
operation, the data is left in a peculiar state. The
obvious solution is to allow a process to become noninterruptible for a limited period of time. The function
of the limit is to prevent the process from getting into
a state from which it cannot be retrieved; exceeding
it is a programming error and always causes the process
to become interruptible again and an error trap to
occur, regardless of whether an interrupt is actually
pending. The limit is properly measured in real time,
since its primary purpose is to put a bound on the
frustration of the user at his console.
Non-interruptibility is a process-wide condition. It
must be possible, however, for a newly-called domain
to extend the limit exactly once, so that it can function
properly even though its caller is about to exceed his
limit. The limit is thus part of a call stack entry. When
a return occurs, the old limit comes back into force,
and an immediate trap may occur if it has been exceeded.
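The limited non-interruptibility described above might be sketched as follows; the Process class, its method names, and the use of monotonic wall-clock deadlines are assumptions for illustration:

```python
# Sketch of limited non-interruptibility: each call-stack entry may
# carry a real-time limit; exceeding it is a programming error that
# makes the process interruptible again and raises an error trap.
import time

class Process:
    def __init__(self):
        self.limit_stack = []   # one deadline per call-stack entry

    def begin_uninterruptible(self, seconds):
        # A newly-called domain may extend the limit exactly once.
        self.limit_stack.append(time.monotonic() + seconds)

    def end_uninterruptible(self):
        # On return, the old limit comes back into force; an
        # immediate trap occurs if the limit was exceeded.
        deadline = self.limit_stack.pop()
        if time.monotonic() > deadline:
            raise RuntimeError("non-interruptible limit exceeded")

p = Process()
p.begin_uninterruptible(60.0)   # e.g., while updating a data base
# ... critical section: modify the data base ...
p.end_uninterruptible()         # finished within the limit: no trap
```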
Table I summarizes the operations connected with
transfers of control between domains.
TABLE I-Operations for transfers

Operation      Arguments

Call           Gate, Parameters
Return         Parameters
Jump           Gate, Parameters
Jump-return    Depth, Parameters
Trap           Trap number
Trap-return    Trap number

Proprietary programs

The remainder of this paper deals with the protection problems introduced when objects are allowed


to have external, mnemonic names. The examples in
this section are intended to introduce this subject, and
are also of interest in their own right. Suppose then
that a user U has a program executing in domain P
and wishes to perform a circuit analysis. P has generated the input data for the analysis, and intends to
use the results for further calculation. Within the
system M on which P is running, some user V has
written a suitable analysis program A which he has
offered for sale, and U has decided to use V's program.
It happens that U and V are competitors.
Both users in this situation have selfish interests
to protect. First, and most obvious, V does not want
his program stolen. He therefore insists that while it
is executing U must not be allowed to read it. Equally
important, however, is the fact that U does not want
V's program to be able to read the calling program P
and its data; although U may not be trying to market
P, it, and especially its data, contain valuable information about U's current development work which
must be kept from competitors. The relationship
between U and V, and between their programs P and A,
is therefore one of mutual suspicion. Each is willing
to entrust the other with just enough information
to allow the circuit analysis to be completed, and no
more. The system must support this requirement if it
is to be a suitable vehicle for selling programs.
Furthermore, care must be taken beyond the programs. While P is running it needs the ability to access U's files by name, to read input data and record
results. This privilege must certainly not be extended
to A, since it can learn even more about U's secrets
by examining his files than by looking at his program,
not to mention the possibility of modifying them. On
the other hand, A may need access to V's files to obtain
data for the analysis and to collect statistics and accounting information; this access must not be available
to P. The protection mechanisms must therefore provide for isolating P and A at the level of file naming as
well as on the lower levels which have been the subject
of this paper so far.
What is required then is a system facility something
like this. V establishes A as a proprietary program,
specifying the file on which it resides. Another user's
program P may then ask the system to attach this
file. To do this, the system creates a new domain A,
installs the program in it, provides it with some storage,
and returns to P a gate into A. When P wants to call
A, he uses the gate and passes whatever parameters
he thinks are needed for A to function. When A is
finished, he returns. The protection mechanisms we


Fall Joint Computer Conference, 1969

have been discussing prevent undesired interference
between P and A. Safeguards for the files are discussed
below.
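The attach-and-call sequence might be sketched like this; Gate, attach, and the entry convention are hypothetical stand-ins for the facility described above, not its actual interface:

```python
# Sketch of a proprietary program facility: the system installs the
# program in a fresh domain and returns only a gate, so P can call A
# but neither can read the other. All names are illustrative.

class Gate:
    def __init__(self, domain, entry):
        self._domain, self._entry = domain, entry

    def call(self, *parameters):
        # P transfers control through the gate; the domain's
        # contents are never directly readable by the caller.
        return self._entry(self._domain, *parameters)

def attach(program_file):
    # Create a new domain, install the program, give it storage.
    domain = {"program": program_file, "storage": {}}
    def entry(dom, *parameters):
        return f"analysis of {parameters} by {dom['program']}"
    return Gate(domain, entry)

gate = attach("A")                   # P asks the system to attach A
result = gate.call("circuit-data")   # P calls A through the gate
```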
The example above is one of a great variety of similar
situations. The system itself creates many of them. A
LOGOUT command, for example, requires special access to accounting files and to capabilities for destroying
a process, but it would be nice to call it with the
standard command processor. Similarly, driving a
special peripheral like a printer requires special capabilities. If a company maintains a large data base, it
may wish to give different classes of users access to
different parts of it by allowing them to call different
accessing programs. These and many other applications
fall within the general outline established by our proprietary program example. We now proceed to consider
how to handle the file naming problems it presents.
External names

Table II lists the goals of a naming system for objects,
and indicates some of the distinctions between the
use of capabilities in names which have been discussed
in previous sections, and the use of external names,
which are strings of characters such as 'FILEl' or
'CIRCUIT'. In summary, it says that capabilities are
very convenient for use by a program, since they are
cheap and self-validating. On the other hand, they are
very bad for people, since they cannot be typed in or
remembered. Names for people should also have the
property that the same name can refer to many different objects, the distinctions to be made by context.
Thus, Smith's file 'ALPHA' is not the same as Jones'
'ALPHA'.
TABLE II-Goals of a naming system for objects

Goal                                    Achieved by    Achieved by
                                        capabilities   external names

Names are mnemonic                                     X
Names can be relative to other names                   X
Names can be used externally                           X
Possession of name authorizes access    X
Names are cheap to use                  X
Names can be manipulated by programs    X

Techniques for achieving all these goals are well
known. They depend on the introduction of a new kind
of object called a directory, which consists of pairs:
< external name, capability>, and an operation of
opening an object by supplying the name to obtain
the capability. Since the external name is interpreted
relative to a directory, there is a suitable basis for
establishing the context of a name. A tree-structured
naming system is implicit in the scheme, because
directories are themselves objects accessed by capabilities. It is now easy to see how a program in a domain
D accesses the objects belonging to owner U. When D
is created, it is supplied with a capability for U's
directory, which it simply exercises.
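A minimal sketch of directories as <external name, capability> pairs with the opening operation; Capability and Directory are illustrative names, not the system's interface:

```python
# Sketch of directories: each entry pairs an external name with a
# capability, and 'open' trades the name for the capability. Since
# directories are themselves objects reached through capabilities,
# a tree-structured naming system is implicit.

class Capability:
    def __init__(self, obj, access):
        self.obj = obj        # the protected object
        self.access = access  # e.g., 'R', 'W', 'RW'

class Directory:
    def __init__(self):
        self.entries = {}     # external name -> capability

    def enter(self, name, capability):
        self.entries[name] = capability

    def open(self, name):
        # The name is interpreted relative to this directory, which
        # supplies the context for the name.
        return self.entries[name]

# Smith's 'ALPHA' and Jones' 'ALPHA' name different objects:
smith = Directory()
smith.enter("ALPHA", Capability(obj="Smith's file", access="RW"))
jones = Directory()
jones.enter("ALPHA", Capability(obj="Jones' file", access="R"))
```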
There is more controversy over the proper methods
of accessing objects belonging to other users. A popular
approach is to use passwords: a public read-only
directory is filled with capabilities for all other directories which allow the objects in them to be accessed
provided a correct password (usually different for each
object) is supplied as part of the opening operation.
This method is not satisfactory. First, it is inconvenient,
since it requires the person accessing the files to remember the password. Second, it is insecure. If he
writes the password down, or includes it in a program,
the possibility increases that it will become known. It
is bad enough to have to use a password to obtain
entry to the system, but at least only one password is
involved, it is used only once per session, and it can
be changed, if need be after each session, without too
much fuss. None of these things is true of passwords
attached to files: there are many of them, many people
need to know them, and one must be used each time
a file is opened. This scheme has no advantage except
economy of implementation.
A method based entirely on capabilities suffers only
one of these drawbacks: it is inconvenient, but secure.
It is also, however, quite complex. The idea is that if
a file (or anything else) is to be shared, a capability
for it should be passed from its owner to those who
wish to share it. The problem is that a capability,
being a protected object, must be passed through protected channels; it cannot be sent in a letter, even a
registered letter. The solution is illustrated in Figure
7. Every user has (at least) two directories, a private
one which he works with, and a transfer directory. The
public directory PUB, for which every user has a read
capability, contains write capabilities for all the transfer directories. The object is to move the capability
for X from PDA to PDB. Proceed as follows:

Figure 7-Sharing capabilities without access keys. PUB is the
public directory, containing a write-only capability for the
transfer directory of each user.

A moves a capability for TDB into PDA
Using it, A moves his capability for X to TDB
B moves the capability for X from TDB to PDB
Since only B can access TDB, security is preserved. A
malicious user can confuse things by writing random
capabilities into the TDs, but it is easy for B to check
that he has gotten the right thing. Furthermore, if X
is a directory, future communication can be carried
out quite conveniently, since A and B can then communicate through X without any worries about outside interference.
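The three steps can be sketched with plain dictionaries standing in for the directories of Figure 7 (PDA and PDB the private directories, TDB the transfer directory, PUB the public one); this is an illustration, not the system's interface:

```python
# Sketch of capability transfer through a transfer directory.
PDA = {"X": "capability-for-X"}   # A's private directory
PDB = {}                          # B's private directory
TDB = {}                          # B's transfer directory
PUB = {"B": TDB}                  # write capabilities for transfer
                                  # directories, readable by everyone

# 1. A moves a capability for TDB into PDA
PDA["TDB"] = PUB["B"]
# 2. Using it, A moves his capability for X to TDB
PDA["TDB"]["X"] = PDA["X"]
# 3. B moves the capability for X from TDB to PDB
PDB["X"] = TDB.pop("X")

# Only B can read TDB, so security is preserved; B can still check
# that he has gotten the right thing.
```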
A much better method is based on the simple idea
of attaching to a directory entry a list of the users
who are allowed to access it; with each user we can
also specify options, so that Rosenkrantz may be
granted write access to the file while Guildenstern can
only read it. This scheme, which was first used in
CTSS,1 has two drawbacks. The first is that if the list
of users who are authorized to access a file is long, it
takes a lot of space to store it; this problem is especially
annoying if there are several files to be accessed by the
same group of users. The second drawback is that there
is no provision for giving different kinds of access to
different domains of a computation. Both difficulties
can be overcome in a rather straightforward manner.
Before we pursue this point, it is important to notice
why the difficulty encountered above in the capability-passing scheme does not arise here. We can think of
the computation of a logged-in user as possessing a
special kind of capability which identifies it as belonging to him. If SMITH is the user, we will refer to
this capability as SMITH*, meaning that the string
'SMITH' has been enclosed in a tamper-proof box.
When JONES wishes to give SMITH access to his
file ALPHA, he puts the name SMITH on the access
list; JONES can do this since he has a capability for
ALPHA. When a computation presents the capability
SMITH*, the system observes that the string (or user
number) which is the contents of the capability matches
the string on the access list and grants the access.
At no time is it necessary for JONES to have SMITH*
in his possession. He needs only the name SMITH
which, since it is not a protected object, can be communicated to him by shouting across the room. Figure
8 illustrates.

Figure 8-Use of access keys, showing the capabilities for
SMITH's computation before and after opening the file
To generalize the method we need two ideas. One
is that of an access key. This is an object (i.e., it can
be referenced only by using a capability) which consists simply of a bit string of modest length, long
enough that the number of different access keys is
larger than the number of microseconds the system
will be in existence. Any user may ask the system for a
new access key; the system will create one never seen
before and return a capability for it. The object SMITH*


mentioned in the last paragraph is an example of an
access key; one is kept for each user in the system.
Since an access key is an object, capabilities for it
appear in the directories and are protected exactly as
is done for any other object (since the access key is a
small object, it may be convenient for the implementation not to give it any existence independently
of the capabilities for it, i.e., to make the value of the
capability the object itself, rather than a pointer to
it as in the case of files). To give a group of users access
to some files, all we have to do is distribute a new
access key GROUP* to the users and put GROUP
on the access list for each file. The distribution is
accomplished by creating GROUP* and putting all
the users on its access list; once they have copied it
into their directories they can be removed from the
access list, so that no space need be wasted. In practice,
as we have pointed out, numbers of perhaps 64 bits
would be used instead of strings like 'GROUP'.
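The key-against-list check might be sketched as follows; AccessKey and grants are illustrative names, and the sealed string stands in for the tamper-proof box:

```python
# Sketch of access keys: a key is a protected object whose contents
# (a name or number) are matched against a file's access list.

class AccessKey:
    def __init__(self, contents):
        self.__contents = contents   # sealed in a tamper-proof box

    def contents(self):
        return self.__contents       # read only by the system

# JONES puts the names on the access list of file ALPHA, with
# per-user options; he never needs to hold the keys himself.
alpha_access_list = {"GROUP": "R", "SMITH": "RW"}

def grants(access_list, key, requested):
    options = access_list.get(key.contents(), "")
    return requested in options

group_star = AccessKey("GROUP")      # the access key GROUP*
assert grants(alpha_access_list, group_star, "R")
assert not grants(alpha_access_list, group_star, "W")
```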
The second idea is not new at all. It consists of the
observation that since an access key is just an object,
different domains can have different access keys and
hence different kinds of access to the file system. Thus,
for example, a user's computation may be started with
two domains, one for his program with his name as
access key, and the other for system accounting with
an access key which allows it to write into the billing
files. With a single suitable access key, a domain can
easily get hold of an arbitrarily large collection of
other objects which are protected by other keys, since

the first key can be used to obtain other keys from the
directory system.

SUMMARY
We have described a very general scheme for distributing access to objects among the various parts of
a computation in an extremely specific and flexible
way. The scheme allows two domains to work together
with any degree of intimacy, from complete trust to
bitter mutual suspicion. It also allows a domain to
exercise firm control over everything created by it or
its subsidiaries.

REFERENCES

1 P A CRISMAN editor
The compatible time-sharing system: A programmer's guide
MIT Press 2nd ed Cambridge Mass 1965
2 J B DENNIS
Segmentation and the design of multiprogrammed computer systems
J ACM Vol 12 Oct 1965 589
3 J B DENNIS E C VAN HORN
Programming semantics for multiprogrammed computations
CACM Vol 9 No 3 March 1966 143
4 R M GRAHAM
Protection in an information processing utility
CACM Vol 11 No 5 May 1968 368
5 B W LAMPSON
A scheduling philosophy for multiprocessing systems
CACM Vol 11 No 5 May 1968 347
6 B W LAMPSON et al
A user machine in a time-sharing system
Proc IEEE Vol 54 No 12 Dec 1966

The ADEPT-50 time-sharing system
by R. R. LINDE and C. WEISSMAN
System Development Corporation
Santa Monica, California

and

C. E. FOX
King Resources Company
Los Angeles, California

INTRODUCTION
In the past decade, many computer systems intended
for operational use by large military and governmental organizations have been "custom made" to
meet the needs of the particular operational situation
for which they were intended. In recent years, however, there has been a growing realization that this
design approach is not the best method for long term
system development. Rather, the development of
general purpose systems has been promoted that
provide a broad, general base on which to configure
new systems. The concepts of time-sharing and general-purpose data management have been under development for several years, particularly in university
or research settings.1,2,3 These methods of computer
usage have been tested, evaluated, and refined to
the point where today they are ready to be exploited
by a broad user community.
Work on the Advanced Development Prototype
(ADP) contract was begun in January 1967 for the
purpose of demonstrating-in an operational environment-the potential of automatic information handling made possible by recent advances in computer technology, particularly advances in time-sharing executives and general-purpose data management techniques. The result of this work is a large-scale, multi-purpose system known as ADEPT, which

operates on IBM System 360 computers.*
The entire ADEPT system is now being used at
four field installations in the Washington, D. C. area,
as well as at SDC in Santa Monica. The system was
installed at the National Military Command System
Support Center in May 1968, at the Air Force Command Post in August 1968, and at two other government agencies in January 1969. These four field sites
collectively run ADEPT from 80 to 100 hours per
week, providing a total of some 2000 terminal hours
of time-sharing service monthly to their users.
The ADEPT system consists of three major components: a time-sharing executive; a data management system adapted from SDC's Time-Shared Data
Management System (TDMS) described by Bleier;4
and a programmer's package. This paper deals exclusively with the ADEPT Time-Sharing Executive,
and particularly with the more novel aspects of its
architecture and construction. Before examining these
design and hardware configuration of the system.
A general purpose operating system

* Development of ADEPT was supported in part by the Advanced Research Projects Agency of the Department of Defense.

The ADEPT executive is a general-purpose time-sharing system. The system operates on a 360 Model
50 with approximately 260,000 bytes of core memory,
4 million bytes of drum memory, and over 250 million
bytes of disc memory, shown graphically in Figure
1 and schematically in the appendix. With this machine
configuration, ADEPT is designed to provide responsive on-line interactive service, as well as background
service to approximately 10 concurrent user jobs. It
handles a wide variety of different, independent application programs, and supports the use of large
random-access data files. The design-basically a
swapping system-provides for flexibility and expansion of system functions, and growth to more powerful
models in the 360 family.
ADEPT functions both as a batch processor (whereby jobs are accumulated and fed to the CPU for operation one by one) and as an interactive, on-line system
(in which the user controls his job directly in real
time simply by typing console requests).
Viewed as a batch system, ADEPT allows jobs to
be submitted to console operators or submitted from
consoles via remote batch commands (remote job
entry). In either case, jobs are "stacked" for execution
by ADEPT in a first-in/first-out order. The stack is
serviced by ADEPT as a background task, subject
to the priorities of the installation and the demands
of "foreground" interactive users. Viewed as an interactive system, ADEPT allows the user to work with
a typewriter, allowing computer-user dialog in real
time. Via ADEPT console commands, the user identifies himself, his programs, and his data files, and
selectively controls the sequence and extent of operation of his job in an ad lib manner. A prime advantage
of the interactive use of ADEPT is that the system
provides an extendable library of service programs
that permit the user to edit data files, compile or
assemble programs, debug and eliminate program
errors, and generally manage large data bases in a
responsive on-line manner.
System architecture

The architecture of the ADEPT executive is that
of the "kernel and the shell". The "kernel," referred
to as the Basic Executive (BASEX), handles the
major problems of allocating and scheduling hardware resources. It is small enough to be permanently
resident in low core memory, permitting rapid response
to urgent tasks, e.g., interrupt control, memory allocation, and input/output traffic. The "shell," referred to as the Extended Executive (EXEX), provides
the interface between the user's application program
and the "kernel". It contains those non-urgent, large-

Figure 1-Relative capacity of various ADEPT direct-access
storage media available in less than 0.2 seconds: core (.26M
bytes), 2303 drum (3.9M bytes), 2311 disc packs (7.25M bytes
per pack), 2314 disc storage (207M bytes), and 2302 disc storage
(226M bytes). The initial system that operates at SDC utilizes
core, 2303 drum, 2311 and 2314 disc packs, and 2302 disc storage.
The NMCSSC system utilizes 2314 disc storage in lieu of 2311 or
2302 discs. The architecture of the ADEPT executive is such that
it permits any combination of the above types of disc storage in
varying amounts.

task extensions of the basic "kernel" processes that
are user-oriented rather than hardware-oriented;
they may, therefore, be scheduled and swapped.
The version of the ADEPT time-sharing system
thus far developed has multiple levels of control
beyond the two-level "kernel-shell" structure--i.e.,
it can be thought of figuratively as an "onion skin".
Figure 2 shows these relationships graphically.
Beyond EXEX, "object systems" may exist as
subsystems of ADEPT (developed by the user community without modification to EXEX or BASEX.),
thus further distributing and controlling the system
resources for the object programs that form still
another level of the system. The design ideas embodied
in ADEPT parallel those of Dijkstra,5 Corbato,6
and Lampson,7 but differ in techniques of implementation.
The ADEPT Basic Executive operates in the lower
quarter of memory, thereby providing three quarters
of memory for user programs. With the current H
core configuration, ADEPT preempts the first 65,000
bytes of core memory, the bulk of which is dedicated
to BASEX; EXEX must then operate in user memory

Figure 2-Multiple levels of control in ADEPT

in a fashion similar to user programs. ADEPT is
designed to operate itself and user programs as a
collection of 4096-byte pages. BASEX is identified
as certain pages that are fixed in main storage and
that cannot be overlayed or swapped. EXEX and
other programs are identified as sets of pages· that
move dynamically between main storage and swap
storage (i.e., drum). It is necessary to maintain considerably more descriptive information about these
swappable programs than about BASEX. This
descriptive information is carried in a set of system
tables that, at any point in time, describe the current
state of the system and each program.
ADEPT views the user as a job consisting of some
number of programs (up to four for the 360/50H
configuration) that were loaded at the user's request.
These programs may be independent of one another
or, with proper design, different segments of a larger
task. Implicitly, EXEX is considered to be one of
these programs. To simplify system scheduling, communication, and control, only one program in the
user's set may be active (eligible to run) at a time.
When ADEPT scheduling determines that a job may
be serviced, the current job in core is saved on swap
storage, and the active program of the next job is
brought into core from swap storage and executed
for a maximum period of time, called a quantum. The
process then repeats for other jobs. Figures 3 and 4
schematically depict these relationships.

Figure 3-Simple commutation of users' programs. This figure
illustrates the relationship between users' programs, EXEX,
and BASEX. Each spoke represents a user's job, with his EXEX
providing the interface between BASEX and the hardware
resources. The maximum number of interactive jobs for the
IBM 360/50H configuration is ten.

Figure 4-ADEPT's basic sequence of operation. This figure
shows the basic operating system cycle: the idle loop is interrupted
by an external interrupt (an activity request); a program is
scheduled, swapped into core from the drum, and executed;
escape from the execution phase occurs when a quantum termination condition (e.g., time expiration, service or I/O call, error
condition) is met; the program is then swapped out and control
is returned to the idle loop (if no other programs are eligible to
be scheduled).
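The swap-in, run-for-a-quantum, swap-out cycle can be sketched as a round-robin loop; the job names and time units below are illustrative, not taken from the system:

```python
# Sketch of ADEPT's basic cycle: the active program of each job is
# swapped in, run for at most one quantum, then swapped out and
# rescheduled until it finishes.
from collections import deque

def run_cycle(jobs, quantum):
    """jobs: deque of (name, time_needed); returns the run trace."""
    trace = []
    while jobs:
        name, remaining = jobs.popleft()    # swap in from drum
        ran = min(quantum, remaining)       # run until the quantum
        trace.append((name, ran))           # ends or the job is done
        remaining -= ran
        if remaining > 0:
            jobs.append((name, remaining))  # swap out, reschedule
    return trace

print(run_cycle(deque([("A", 5), ("B", 2)]), quantum=3))
# [('A', 3), ('B', 2), ('A', 2)]
```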

Basic executive (BASEX)
Table I lists the BASEX components and their
general functions as of the eighth and latest executive
release. These basic system components form an
integrated, non-reentrant, non-relocatable, permanently-resident, core memory package 16 pages long
(each page is 4096 bytes). They are invoked by hardware interrupts in response to service requests by
users of terminals and their programs. Note the
division of input/output control into cataloged (SPAM
and IOS), terminal (TWRI), and drum (BXEC)
activities to permit local optimization for improved
system performance.
TABLE I-Basic executive components

Component       Function

ALLOC           Drum and core memory allocation.
BXBUG           Debugger for executive programs.
BXEC            Basic sequence and swap control.
BXECSVC         SVC handlers for WAIT, TIME, DEVICE,
                STOP, and DISMISS calls.
EXEX            Linkage routines for EXEX (BASEX/EXEX
                interfaces); also services commands DIALOFF,
                DIALON.
INTRUP          First-level interrupt control.
IOS             Channel-program level input/output
                supervisory control.
RECORD          Records SVC, interrupt activity in BASEX.
SKED            Scheduler.
SPAM            Input/output access methods to cataloged
                storage.
TWRI            Terminal input/output control.
System Tables   Resident system data areas for communication
                table (COMTAB), logged-in user's table (JOB),
                loaded programs table (PQU), drum and core
                status tables (DSTAT, GSTAT), and a variety
                of other tables.

Extended executive (EXEX)
Unlike the tight, closed package of integrated
BASEX components, EXEX is a loose, open-ended
collection of semiautonomous programs. Table II
lists this collection of programs. EXEX is treated
by BASEX as a user program, with certain privileges,
and each user is given his own "copy" of the EXEX.
It is transparent to the user that EXEX is reentrant

TABLE II-Extended executive components

Component   Function

AUDIT       Maintains a real-time recording of all security
            transactions as an accountability log.
BMON        Batch monitor for control of background job
            execution.
CAT         Cataloger for file storage access control; also
            services FORGET command.
DTD         Transfers recording information from drum to
            disc.
DBUG        Debugger for non-executive (user) programs.
LOGIN       User authentication and job creation.
SERVIS      Library of service commands that are reentrant,
            interruptible and scheduled: APPEND, CHANGE,
            CREATE, CYLS, DELETE, DRIVES, INIT,
            LISTF, LISTU, LOAD, LOADD, LOAD and GO,
            OVERLAY, REPLACE, RESTORE, RESTORED,
            SAVE, SEARCH, VARYOFF, VARYON.
RUN         Remote batch job submission control servicing
            commands RUN and CANCEL.
XXTOO       Library of small, fast, executive service
            commands: CPU, BGO, BQUIT, BSTOP, DIAL,
            DRUMS, GO, LOGOUT, QUIT, RESTART,
            SKED, SKEDOFF, STATUS, STOP, TIME,
            USERS.
SYSDEF      Defines input/output hardware configuration at
            time of system start up.
SYSLOG      Defines authorized user/terminal security
            profiles at time of system start up.
TEST        Initializes system tables at time of system
            start up.
SYSDATA     Non-resident, shared, system data table for dial
            messages and other common data, e.g., lists of
            all logged-in users; other non-resident,
            job-specific tables also exist, e.g., job
            environment page, push-down list data page.

and is being shared with other users, except for its
data space. Each job has its own "machine state"
tables saved in its unique set of environment pages.
This structure permits flexible modification and orderly
system expansion in a modular fashion. EXEX is
always scheduled in the same way as other user programs.
Though EXEX components are, in large part,
non-self-modifying reentrant routines and thus could,
at small cost, be relocatable; neither user programs
nor EXEX components are relocated between swaps.
The lack of any mapping hardware on the IBM 360/50
and the design goal and knowledge that most user
programs would be of maximum size made unnecessary
a software provision to relocate programs dynamically.
User programs may be relocated once at load time,
however.
Communication and control techniques used in ADEPT

Communication is the generic term used to cover those
services that permit two (or more) programs to intercommunicate, be they system program, user program,
or both. From this communication vantage point we
shall examine the connective mechanism used between
the Basic and Extended Executives; the techniques
that allow components within the EXEX to make
use of one another; and the system design that permits
an object program to control its own behavior as well
as to communicate with the system and with other
object programs.

The ADEPT job or process
Before we discuss the system mechanics, let us
examine how the system treats each user logically.
A user in the system is assigned a job number. Each
job in the system may be viewed as a separate process,
and each process is, by definition, independent of all
other processes running on the machine. A process-or job-is not a program. It is the logical entity for
the execution of a program on the physical processor,
and it may contain as many as four separate programs.
A program consists of the set of machine instructions
swapped into the processor for execution, and the
Extended Executive is one of these programs.
The ADEPT executive requires a large number of
system tables to permit Basic and Extended Executive communication. Conceptually, the use of descriptive tables defining the condition of a user's process
is analogous to the state vector (or state word) discussed by Lampson and Saltzer.8,9 That is, the collection of information contained by these tables is


sufficient to define an inactive user's process state
at any given moment. By resetting the central processor from the state vector, a user's job proceeds
from an inactive to an active state as if no interruption had occurred. The state vector contains such
items as the program counter, the processor's general
registers, the core and drum map of all the programs
in the job, and the peripheral storage file data. All
of the collective data for each program or task in the
process are contained in the state vector.

Basic and extended executive communication
Each ADEPT user (i.e., any person who initiates
some activity within the system by typing in commands) is given a job number and assigned an entry
in the JOB table. The JOB table contains the system's
top-level bookkeeping on user activity. It contains
the user's identification, his location, his security
clearance, and a pointer to his program queue. Each
user is assigned one entry, or JOB, in the table. Associated with each JOB are the one or more programs
that the user is running.
Top-level bookkeeping on programs is contained
in the Program Queue (PQU) table. Each PQU entry
contains a program identification and some (but not
all) information that describes that program in terms
of its space requirements, its current activity, its
scheduling conditions, and its relationship to other
programs in the PQU that belong to the same JOB.
The detailed descriptive information and the status
of each JOB and its programs are carried in the swappable environment space.
The environment pages (there can be as many as
four) comprise a number of separate tables that contain such information as the contents of the general
registers, the swap storage page numbers where the
balance of the program resides, the program map,
and lists of all active data files. A single environment
page (or pages) is shared by all programs that belong
to the same JOB (user). The system design allows for
environment page overflow at which time additional
pages are assigned dynamically. The environment
pages, PQU table, JOB table, and data pages comprise the state vector of the user's job.
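The relationship among the JOB table, the PQU, and the environment pages can be sketched as follows; the rendering in present-day Python and all field names are illustrative only, not ADEPT's actual table layouts.

```python
# Illustrative sketch (not ADEPT's actual tables): the JOB table entry,
# Program Queue (PQU) entries, and environment pages together form a
# job's state vector, from which an inactive job can be resumed.

class EnvironmentPage:
    def __init__(self):
        self.general_registers = [0] * 16   # saved 360 general registers
        self.swap_page_numbers = []         # drum pages holding the program
        self.program_map = {}               # core/drum correspondence
        self.active_files = []              # lists of all active data files

class PQUEntry:
    """Top-level bookkeeping for one program of a job."""
    def __init__(self, program_id):
        self.program_id = program_id
        self.space_requirements = 0
        self.current_activity = "inactive"

class Job:
    """One JOB-table entry: a user's process, up to four programs."""
    MAX_PROGRAMS = 4

    def __init__(self, job_number, user_id, clearance):
        self.job_number = job_number
        self.user_id = user_id
        self.security_clearance = clearance
        self.program_queue = []               # PQU entries for this job
        self.environment = EnvironmentPage()  # shared by the job's programs

    def add_program(self, program_id):
        if len(self.program_queue) >= self.MAX_PROGRAMS:
            raise RuntimeError("a job may contain at most four programs")
        self.program_queue.append(PQUEntry(program_id))

    def state_vector(self):
        # The collective data sufficient to resume the inactive job.
        return (self.program_queue, self.environment)

job = Job(1, "user-a", "SECRET")
job.add_program("EXEX")
job.add_program("editor")
```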
To permit storage of "global" system variables,
and to allow system components to reference system
data that may be periodically relocated, there exists
a system communication table, which resides in low
core so that it can be referenced without loading a
base register.
The IBM 360 supervisor call (SVC) is used exclusively by EXEX components and object programs to
request BASEX services. Though additional overhead
is incurred in the handling of the attendant interrupt,
the centralization of context switching provided is
of considerable value in system design, fabrication,
and checkout.

Extended executive communication
An EXEX may make use of another EXEX function by use of the SVC call mechanism. To support
the recursive EXEX, an additional SVC processing
routine is required to manage the different recursive
contexts. This routine, called the SVC Dispatcher,
processes calls from user and EXEX functions alike,
manages a swappable data page, and switches to an
interface linkage routine. The data page contains
a system communication stack that consists of a
program's general registers and the Program Status
Word at the time of the SVC. This technique is
analogous to the push-down logic of recursive procedure calls found in ALGOL or LISP language
systems. The stack provides a convenient means of
passing parameters between routines in the EXEX.
Since each job has its own unique data page and environment page, EXEX is both recursive and reentrant.
The environment status table (ESTAT) contains
the swap and core location for each component in
the EXEX and for each program in the job. It resides
in the job environment page. When an EXEX service
is requested, only that particular EXEX program is
brought in from swap storage, rather than the full
service library. The interface linkage routine provides
this management function; it lies as a link between
the SVC Dispatcher and the particular EXEX
function. The interface routine picks up necessary
work pages for the EXEX component involved and
branches to that component after it is brought into
core. The interface routine maintains a separate pushdown stack of return addresses, providing the means
for the EXEX component to properly exit and return
control to its interface routine and then to the system.
The EXEX component called may make additional
EXEX SVC calls before exiting. To provide correct
work page allocation during recursive calls, the interface routine also saves the work page core and drum
page addresses in the push-down stack. Upon completion of a call, the EXEX component returns to
its interface routine; the interface routine releases
all allocated work pages to the system and branches
to a common unwind procedure.
The unwind procedure, like the SVC Dispatcher,
is simply a switching mechanism. It determines, via

the stack, whether to return to a still higher level
EXEX function, or to turn the EXEX off and exit
to the Basic Sequence. This recursive/reentrant control is the most complex portion of ADEPT and is
the "glue" that binds BASEX and EXEX together.
Figure 5 illustrates the recursive process.
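The push-down discipline can be sketched as follows; the rendering in present-day Python is illustrative only (the actual Dispatcher is 360 code, and the names here are hypothetical).

```python
# Sketch of the SVC Dispatcher's push-down stack for recursive EXEX
# calls: each SVC pushes the caller's registers and PSW onto the data
# page stack; the unwind procedure pops one level and decides where
# control returns. Names are illustrative, not ADEPT's.

class SVCDispatcher:
    def __init__(self):
        self.stack = []  # system communication stack on the data page

    def svc_call(self, component, registers, psw):
        # Save the caller's context, then switch to the EXEX component.
        self.stack.append({"component": component,
                           "registers": registers, "psw": psw})
        return len(self.stack)  # current recursion depth

    def unwind(self):
        # Pop one level: return the saved context of a still higher
        # level EXEX function, or None when the stack is empty
        # (turn EXEX off and exit to the Basic Sequence).
        if self.stack:
            return self.stack.pop()
        return None

d = SVCDispatcher()
d.svc_call("A", registers=[1], psw="psw-A")           # user calls EXEX "A"
depth = d.svc_call("B", registers=[2], psw="psw-B")   # "A" calls "B"
frame = d.unwind()                                    # "B" exits back to "A"
```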

Object program communication
One of the more stringent services required of an
operating system is the rapid interchange of large
quantities of data between object programs. The
interchange of even simple arrays, matrices, and tables
via stack parameters or a common file suffers from the
inadequacy of limited capacity or extensive I/O time.
Many operating systems ignore this requirement,
thereby restricting the general-purpose applications.
Yet there are solutions to this problem, and one successful technique employed in the ADEPT system is
that of "shared memory". Shared memory is achieved
by using the basic mechanism for managing reentrancy,
namely the program environment page map. Through
the ADEPT SHARE Page call, an object program
can request that designated pages of another program

Figure 5-Block diagram of EXEX behavior and control

The ADEPT-50 Time-Sharing System


in the job be added to its map. If core page numbers
are passed as parameters in various service calls, whole
pages of data may be passed between programs. EXEX
and many object programs operating under this system
use this method for inter-program communication.
ADEPT operating on the IBM 360/50H restricts
its user programs to 46 active core pages. However,
by utilizing the GETPAGE call, an object program
may acquire up to 128 drum pages and may subsequently activate and deactivate various page sets
by utilizing another service call, ACTDEACT (activate/deactivate). This scheme permits bulk data from
disc storage to be placed on drum and operated upon
at "swap" speeds. Thus skilled system users can
achieve efficient use of time and memory by managing
their own "paging". We consider this the best alternative considering the questionable state of other, automatic paging algorithms.10,11,12,13 Most EXEX components use these calls for just such purposes. For
example, the interface routines mentioned above use
activate calls to "turn on" called components of the
EXEX.
The Allocator component of ADEPT manages the
page map for each program. This software map reflects the correspondence between drum and core
pages, established initially by the SERVIS (service)
component at load time. The Allocator's function is
to inventory available core and drum pages by maintaining two resident system tables: one for core, the
other for drum. Whenever drum pages are released
or obtained, the Allocator updates the page map in
the job's environment page. The Allocator processes
the SHARE (page), GETPAGE, FREEPAGE, and
ACTDEACT calls from EXEX and object programs.
SERVIS allows a program at run time to add data
pages or to overlay program segments from disc or
tape. In so doing, SERVIS makes use of the various
Allocator calls.

Simulating console commands

An important attribute of ADEPT time-sharing
is that nearly all the functions and services that can
be initiated at the user's console can also be called
forth within a user's program. A program designer
can, for example, build a system of programs, which
can operate in batch mode under the control of a program by issuing internal commands in much the same
manner as the user sitting at the console. With this
approach, the ADEPT batch monitor controls background tasks by simulating user terminal requests.
Batch requests can be enqueued by users from any
console and then processed in turn by this supervisor
function.

Armed interrupts and rescue function

The basic design of ADEPT conveniently provides
for processing object program "armed" interrupt
calls. This means that an object program is able to
conditionally start (wakeup) and stop (sleep) the
execution of its own programs, and others as well.
The conditions for employing wakeup calls include
too much elapsed time, or the occurrence of unpredictable but anticipated events, e.g., errors and other
program calls. In "arming" these "software-interrupt" conditions by object program calls, the program
entry point(s) for the various conditions are specified.
When such conditions occur, the operating system
transfers to the specified entry point and gives the
appropriate condition code. (Note that if we take this
call one step further, and permit one object program
to arm the software and hardware interrupts of another
object program, we have the basic control mechanism
necessary to permit the operation of "object systems,"
i.e., subexecutives, another level in the "onion skin"
of ADEPT control.)
User programs interface with the ADEPT system
primarily via the supervisor call (SVC) instruction;
a secondary interface is provided via the program
check interrupt that protects the program and system
after various error conditions. The executive design
allows user programs to trap all such interfaces with
the system via its rescue arming mechanism. This
means that one program can trap and get first-level
control of all occurrences of SVC's and program checks
within a single job. This mechanism also means, then,
that the responsibility and meaning for these interfaces can be redefined at the user program level.
As of this writing, this mechanism is being employed
to construct object systems for an improved batch
monitor, an interface for the proposed ARPA Network,14 and to experiment with automatic translators
for compatibility with other operating systems. Other
uses include improvements in program recovery in
a variety of user tools, e.g., compiler diagnostics.
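A minimal sketch of the arming mechanism, with hypothetical names and rendered in present-day Python for illustration:

```python
# Sketch (hypothetical names) of "armed" software interrupts: an
# object program registers entry points for anticipated conditions;
# when one occurs, the operating system transfers there and passes
# the appropriate condition code. Unarmed conditions are ignored.

class Program:
    def __init__(self):
        self.armed = {}        # condition -> armed entry point
        self.log = []
        self.running = True

    def arm(self, condition, entry_point):
        self.armed[condition] = entry_point

    def sleep(self):
        self.running = False   # conditionally stop execution

    def wakeup(self, condition):
        # System transfers to the armed entry point with the code.
        handler = self.armed.get(condition)
        if handler:
            self.running = True
            handler(condition)

p = Program()
p.arm("ELAPSED-TIME", lambda code: p.log.append(code))
p.sleep()
p.wakeup("ELAPSED-TIME")   # armed condition: program resumes at entry
p.wakeup("ERROR")          # not armed: no effect
```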

Resource allocation, access, and management
ADEPT system design, of course, includes a complete set of resource controls that monitor secondary
storage devices.


Fall Joint Computer Conference, 1969

The cataloger
The Cataloger, an EXEX component, is functionally
analogous to the core/drum Allocator, but is used
for devices accessible by user programs. It maintains
an inventory of all assignable storage devices, assigns
unused storage on the devices, maintains descriptions of the files placed on these devices, controls
access to these files, and, upon authorized request, deletes any file. Specifically, the Cataloger:
• Assigns storage on 2302, 2311 and 2314 discs.
• Assigns tape drives.
• Locates an inventoried file by its name and certain qualifiers that uniquely identify the file.
• Issues tape or disc pack mounting instructions
to the operator when necessary.
• Verifies the mounting of labeled volumes.
• Passes descriptive information to the user program opening a file.
• Allows the user of a file to request more storage
for the file.
• Denies unauthorized users access to files.
• Returns assigned storage to available storage
whenever a file is deleted.
• Maintains a table of contents on each disc volume.
As the largest single component of the ADEPT
Executive (65,000 bytes), the Cataloger was written
in a new, experimental programming language called
MOL-360 (Machine-Oriented Language for the 360).15
It is a "higher-level machine language" developed
under an ARPA-sponsored SDC research project on
metacompilers. It resolved the dilemma involving
our desire for higher-level source language and our
need to achieve flexibility with machine code. The
Cataloger design and checkout, enhanced by the use
of MOL-360, showed simultaneously the validity
of MOL compilers for difficult machine-dependent
programming.
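The Cataloger's inventory role can be sketched as follows; the structures and names are hypothetical, present-day Python standing in for the MOL-360 original.

```python
# Illustrative sketch of the Cataloger: it assigns unused storage on a
# volume, maintains a file description, checks access on open, and
# returns storage to the available pool when a file is deleted.

class Cataloger:
    def __init__(self, volume_capacity):
        self.free = volume_capacity     # unassigned storage units
        self.catalog = {}               # file name -> description

    def create(self, name, owner, size, authorized):
        if size > self.free:
            raise RuntimeError("volume full")
        self.free -= size
        self.catalog[name] = {"owner": owner, "size": size,
                              "authorized": set(authorized) | {owner}}

    def open(self, name, user):
        desc = self.catalog.get(name)
        if desc is None or user not in desc["authorized"]:
            raise PermissionError("access denied")  # unauthorized user
        return desc                                 # descriptive information

    def delete(self, name, user):
        desc = self.open(name, user)    # only authorized users may delete
        self.free += desc["size"]       # assigned storage returned
        del self.catalog[name]

cat = Cataloger(volume_capacity=100)
cat.create("DATA1", owner="alice", size=40, authorized=["bob"])
```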

The SPAM component

SPAM is a BASEX component that permits symbolic, user-oriented I/O. It can be viewed as a special-purpose compiler that compiles symbolic user program
I/O calls into 360 channel programs, and delivers them
to the Input/Output Supervisor (IOS) for execution
via the EXCP (execute channel program) call. The
results of EXCP for the call are "interpreted" by
SPAM and returned to the user program as status information. As such, SPAM represents a more symbolic
I/O capability than the EXCP level. It provides a
relatively simple method for executing the operations
of reading, writing, altering, searching for, and positioning records within ADEPT cataloged and controlled disc-based and tape-based file structures.

Resource management

As of this writing, the computer operator has a set
of commands at his disposal that allow him to control
the system resources. Various privileged on-line commands enable him to monitor the terminal activities
of system users and to control assignment and availability of storage devices. However, there is an increasing need for a "manager" to be given more
latitude in dynamically controlling the system resources and observing the status of system users,
particularly because ADEPT was designed to handle
sensitive information in classified government and
military facilities. To meet these objectives, a design
effort is under way that gives the computer operator
system-manager status, with the ability to observe
and control the actions of system users. The result
will be a program that encompasses some of the management techniques reported by Linde and Chaney16
tailored to present needs.

Swapping and scheduling user programs

Most of the programs that run under ADEPT
occupy all of the core memory that is not used by
the resident Basic Executive (46 pages on the 360/50H).
If the set of needed pages could be reduced,
a considerable reduction in swap overhead could be
expected. One way to achieve this is to mark for swap-out only those pages that were changed during program execution. The hardware needed to automatically
mark changed pages is unavailable for the 360/50;
however, through use of the store-protect feature on
the Model 50, ADEPT software can simulate the effect and produce noteworthy savings in swap time.

Page marking

Whenever a user program is swapped into core, its
pages are set in a read-only condition. As the program
executes, it periodically attempts to store data (write)
in its write-protected pages. The resulting interrupt
is fielded by the system. After satisfying itself that
the store is legal for the program, the executive marks
the target page as "written," turns off write-protect
for that page, and resumes the program's execution.
The situation repeats for each additional page written.
At the completion of the program's time slice, the
swapper has a map of all the program pages that
were changed (implied in the storage keys with no
write protection). Only the changed pages are swapped
out of core. Measurement of this scheme shows that
about 20 percent of the pages are changed; hence,
for every five pages swapped in, only one need be
swapped out, for a total swap of six pages, rather
than the full swap of ten pages (five in, five out). The
scheme makes the drum appear to be 40 percent faster.
The use of the storage protection keys is based on
the functional status of each page rather than on
some user identity. User programs always run with
a program status word key of one, and the bits in
the storage key associated with the programs start
out at zero. After a page has been initially changed,
its key is set to one also. The other bits in the key are
used to indicate: first, a page is transient, not yet
completely moved to or from swap storage; second,
a page is unavailable, i.e., it belongs to someone else;
third, a page is locked and cannot be swapped or
changed; and finally, a page is fetch-protected because
it may contain sensitive information.
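The savings of the scheme can be checked with a small simulation (illustrative Python, not ADEPT code): pages come in write-protected, the first store to a page marks it, and at the end of the time slice only marked pages are swapped out.

```python
# Sketch of ADEPT's changed-page marking: every page is swapped in
# write-protected; the first store to a page raises a (simulated)
# protection interrupt, the executive marks the page "written" and
# clears the protection, and only marked pages are swapped out.

class Page:
    def __init__(self):
        self.write_protected = True
        self.written = False

def store(page):
    if page.write_protected:          # simulated protection interrupt
        page.written = True           # executive marks page as changed
        page.write_protected = False  # further stores proceed normally

def swap_cost(pages):
    swapped_in = len(pages)                          # all pages come in
    swapped_out = sum(1 for p in pages if p.written) # only changed go out
    return swapped_in + swapped_out

pages = [Page() for _ in range(5)]
store(pages[2])   # the program writes one page in five (about 20 percent)
# A full swap would move 10 pages (five in, five out);
# marking cuts the total to 6.
```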

Scheduling algorithm
The scheduling algorithm provides for three levels
of scheduling. Jobs that are in a "terminal I/O complete" state get first preference in the schedule. Jobs
in the second level, or background queue, are run if
there are no level-one jobs to run. A job is placed in
level two when the two-second quantum clock alarm
terminates its operation two consecutive times. Compute and I/O-bound programs are treated alike. A
level-two job, when allowed to run, is given a quantum
interval equal to the basic quantum time multiplied
by the scheduling level (i.e., 2 sec X 2 = 4 sec).
However, a level-two background job may be preempted after two seconds for terminal I/O. Any operation a level-two job makes that terminates its quantum prematurely will return the job to a level-one
status. The batch monitor job is run when the first
two queues are empty. User programs may be written
to overlap execution and I/O activity. Our choice of
scheduling parameters for quantum size and number of service levels was selected empirically and as a
result of prior experience. 17
A command SKED, which is limited to the operator's terminal, has the effect of forcing top priority
for a job (the job stays at level one all the time). Only
one job may run in this privileged scheduling state
at a time.
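The three-level discipline can be sketched as follows (an illustrative model in present-day Python; the names are hypothetical):

```python
# Sketch of ADEPT's three scheduling levels: terminal-I/O-complete
# jobs first, then the background queue, then the batch monitor.
# Two consecutive quantum-clock terminations demote a job to level
# two, whose quantum is the basic quantum times the level.

BASIC_QUANTUM = 2  # seconds

class Scheduler:
    def __init__(self):
        self.queues = {1: [], 2: []}
        self.level = {}       # job -> scheduling level
        self.overruns = {}    # consecutive quantum-clock terminations

    def add(self, job):
        self.level[job] = 1
        self.overruns[job] = 0
        self.queues[1].append(job)

    def quantum_expired(self, job):
        # Two consecutive two-second alarms demote the job to level two.
        self.overruns[job] += 1
        if self.overruns[job] >= 2:
            self.level[job] = 2
        self.queues[self.level[job]].append(job)

    def terminal_io_complete(self, job):
        # Ending a quantum prematurely returns the job to level one.
        self.overruns[job] = 0
        self.level[job] = 1
        self.queues[1].append(job)

    def quantum(self, job):
        return BASIC_QUANTUM * self.level[job]   # 2 sec x 2 = 4 sec

    def next_job(self):
        for lvl in (1, 2):
            if self.queues[lvl]:
                return self.queues[lvl].pop(0)
        return "batch-monitor"   # runs when both queues are empty

s = Scheduler()
s.add("J1")
s.next_job()             # J1 runs
s.quantum_expired("J1")  # first two-second alarm
s.next_job()
s.quantum_expired("J1")  # second consecutive alarm: demoted to level two
```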
Pervasive security controls

Integrated throughout the ADEPT executive are
software controls for safeguarding security-sensitive
information. The conceptual framework is based
upon four "security objects": user, terminal, file,
and job. Each of these security objects is formally
identified in the system and is also described by a
security profile triplet: Authority (e.g., TOP SECRET, SECRET), Need-to-Know Franchise, and
Special Category (e.g., EYES ONLY, CRYPTO).
At system initialization time, user and terminal
security profiles are established by security officers
via the system component SYSLOG. SYSLOG also
permits the association of up to 64 passwords with
each user. At LOGIN time, a user identifies himself
by his unique name, up to 12 characters, and enters
his private password to authenticate his identity. The
LOGIN component of ADEPT validates the user
and dynamically derives the security profile for the
user's job as a complex function of the user and terminal security profiles. The job security profile is
used subsequently as a set of "keys," used when access
is made to ADEPT files. The file security profile is
the "lock" and is under control of the file subsystem.
File access Need-to-Know is permitted for Private,
Semi-Private, and Public use. With the CREATE
command, a list of authorized users and the extent of
their access authorization (i.e., read-only, write-only,
read and write) can be established easily for SemiPrivate files. Newly created files are automatically
classified with the job's "high water mark" security
triplet-a cumulative security profile history of the
security of files referenced by the job. Through judicious use of the CHANGE command, these properties may be altered by the owner of the file.
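A simplified rendering of the lock-and-key check and the high-water-mark rule follows; it is illustrative only, since the paper describes the actual job profile as a complex function of the user and terminal profiles.

```python
# Sketch of the lock-and-key check: the job's security triplet acts
# as the keys, the file's triplet as the lock, and the job accumulates
# a "high water mark" from the files it references. The dominance
# rule shown is an assumed simplification, not ADEPT's exact test.

LEVELS = ["UNCLASSIFIED", "CONFIDENTIAL", "SECRET", "TOP SECRET"]

def dominates(job, file):
    # The job's authority must be at least the file's, and the file's
    # special categories must all be held by the job.
    return (LEVELS.index(job["authority"]) >= LEVELS.index(file["authority"])
            and file["categories"] <= job["categories"])

def high_water_mark(job, file):
    # Accumulate the security history of files referenced by the job.
    job["authority"] = max(job["authority"], file["authority"],
                           key=LEVELS.index)
    job["categories"] |= file["categories"]

job = {"authority": "SECRET", "categories": {"CRYPTO"}}
f1 = {"authority": "CONFIDENTIAL", "categories": set()}
f2 = {"authority": "TOP SECRET", "categories": {"EYES ONLY"}}

ok1 = dominates(job, f1)   # access granted
ok2 = dominates(job, f2)   # access denied
high_water_mark(job, f1)   # job's cumulative profile now covers f1
```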
Security controls are also involved in the control
of classified memory residue. Software and hardware
memory protection is extensively used. Software
memory protection is achieved by interpretive legality checking of memory bounds for I/O buffer
transfers, legality checking of device addresses for
unauthorized hardware access, and checks of other
user program attempts to seduce the operating system
into violating security controls.
The hardware protection keys are used to fetch-protect all address space outside the user program and
data area. Also, newly allocated space to user programs
is zeroed out to avoid classified memory residue.

Typically, the complete system reaches "on the air"
status in less than a minute.
System instrumentation

Many of the parameters built into the scheduling
and swapping of early ADEPT versions were based
upon empirical knowledge. The latest versions of
the Basic and Extended Executives include routines
to record system performance, reliability, and security
locks.
Built into the BASEX is a routine to measure the
overall and the detailed system performance.20 Such
factors as the number of users, file usage, hardware
and software errors, and page transaction response
time are recorded on unused portions of the 2303
drum. These measurements provide a better understanding of the system under a variety of inputs and
give the designers insight into how the hardware and
software components of the system affect the performance of the human user.
An AUDIT program was made part of the EXEX
to record the security interaction of terminals, users,
and files. AUDIT records EXEX activity in the areas
of LOGIN, LOGOUT, and File Manipulation. This
routine strengthens the security safeguards of the
executive. Specific items that are recorded involve:
type of event, user identification, user account number, job security, device identification, time of event,
file identification, file security, and event success. In
addition, this routine provides accounting information and is used as a means of debugging the security
locks of new system releases.
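The shape of an AUDIT record can be sketched as follows; the field names are inferred from the list above, and the Python rendering is illustrative only.

```python
# Illustrative AUDIT record (field names hypothetical): each LOGIN,
# LOGOUT, or file-manipulation event is recorded with the items the
# text lists, supporting security auditing, accounting, and the
# debugging of security locks in new system releases.

def audit_record(event, user, account, job_security, device,
                 time, file_id=None, file_security=None, success=True):
    return {"event": event, "user": user, "account": account,
            "job_security": job_security, "device": device,
            "time": time, "file_id": file_id,
            "file_security": file_security, "success": success}

log = []
log.append(audit_record("LOGIN", "user-a", 4217, "SECRET",
                        "terminal-12", "10:05:00"))
log.append(audit_record("FILE-OPEN", "user-a", 4217, "SECRET",
                        "2311-disc", "10:06:30",
                        file_id="DATA1", file_security="CONFIDENTIAL"))
```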
In addition to the BASEX recording function,
several object programs have been written that simulate various modes of user activity and provide controlled job distributions. These programs, called
"benchmarks," run under controlled conditions and
enhance the means of improving system performance
and throughput, as described elsewhere by Karush.21
The programs are designed to gather performance
measures on the major routines of the executive and
have been of considerable help in system "tuning,"
because they reflect the effect of coding and design
changes to various system routines. The routines in
the executive that are of primary concern are the
swapper, the scheduer ,the terminal read/write pack..
age, and the interrupt handling processes. Attempts
are being made to design a set of benchmarks that
represent a typical job mix. However, we are primarily
interested in measuring the performance of our system
against various modifications of itself and in measuring
its behavior with respect to different job mixes.


SUMMARY
The ADEPT executive is a second-generation, general-purpose time-sharing system designed for IBM 360
computers. Unlike the monolithic systems of the past,1,2
it is structured in modular fashion, employing distributed executive design techniques that have permitted
evolutionary development. This design has not only
produced a flexible executive system but has given the
user the same facilities used by the executive for
controlling the behavior of his programs. ADEPT's
security aspects are unique in the industry, and the
testing and fabrication methods employ a number
of novel approaches to system checkout that contribute to its operational reliability.
It is important to note that this system deals particularly well with size limitation problems of very
large files and very large programs. The provisions
made for multiple programs per job, active/inactive
page status for programs larger than core size, page
sharing between programs, common file access across
programs within jobs, and the commitment of considerable space to active file environment tables (up
to four pages worth) contribute to this success. Nevertheless, all these capabilities are designed to handle
the smaller entities as well. We feel ADEPT-50 is
a significant contribution to the technology of general-purpose time-sharing.

ACKNOWLEDGMENTS
We would like to express our appreciation for the
dedicated efforts of some very adept individuals who
participated in the design and building of this timesharing system. Our thanks go to Mr. Salvador Aranda,
Mr. Peter Baker, Mrs. Martha Bleier, Mr. Arnold
Karush, Mrs. Patricia Kribs, Mr. Reginald Martin,
Mr. Alexander Tschekaloff and all the others who
have followed their lead.

REFERENCES
1 P CRISMAN editor
The compatible time-sharing system: A programmer's guide
MIT Press Cambridge Mass 1965
2 J SCHWARTZ et al
A general-purpose time-sharing system
Proc SJCC Vol 25 1964 397-411 Spartan Books Baltimore
3 E W FRANKS
A data management system for time-shared file-processing
using a cross-index file and self-defining entries
AFIPS Proc Vol 28 1966 79-86 Also available as SDC
document SP-2248 21 April 1966
4 R E BLEIER
Treating hierarchical data structures in the SDC time-shared
data management system (TDMS)
Proc 22nd Nat ACM Conf Thompson Book Co 1967 41-49
5 E W DIJKSTRA
The structure of T.H.E. multi-programming system
C A C M Vol 11 No 5 May 1968
6 F J CORBATO V A VYSSOTSKY
Introduction and overview of the Multics system
Proc FJCC Nov 30 1965 Las Vegas Nevada
7 B W LAMPSON
Time-sharing system reference manual
Working Doc Univ of Calif Doc No 30.1030
Sept 1965 Dec 1965
8 B W LAMPSON
A scheduling philosophy for multi-processing systems
C A C M Vol 11 No 5 May 1968
9 J H SALTZER
Traffic control in a multiplexed computer system
MAC-TR-30 thesis MIT Press July 1966
10 G H FINE et al
Dynamic program behavior under paging
Proc ACM 1966 223-228 Thompson Book Co Wash D C
11 E G COFFMAN L C VARIAN
Further experimental data on the behavior of programs in a
paging environment
C A C M Vol 11 No 7 July 1968 471-474
12 L A BELADY
A study of replacement algorithms for a virtual storage computer
IBM Systems Journal Vol 5 No 2 1966
13 R W O'NEIL
Experience using a time-shared multi-programming system
with dynamic address relocation hardware
Proc SJCC 1967 Vol 30 611-627 Thompson Book Co
Washington D C
14 L G ROBERTS
Multiple computer networks and intercomputer communication
ACM Symposium on Operating System Principles
Oct 1-4 1967 Gatlinburg Tenn
15 E BOOK D C SCHORRE S J SHERMAN
Users manual for MOL-360
SDC Doc TM-3086/003/01
16 R R LINDE P E CHANEY
Operational management of time-sharing systems
Proc ACM 1966 149-159
17 P V McISAAC
Job descriptions and scheduling in the SDC Q-32 time-sharing system
SDC Doc TM-2996 June 1966
18 C WEISSMAN
Security controls in the ADEPT-50 time-sharing system
AFIPS Proc FJCC Vol 35 1969
19 W A BERNSTEIN J T OWENS
Debugging in a time-sharing environment
AFIPS Proc FJCC Vol 33 1968 7-14
20 A D KARUSH
The computer system recording utility: application and theory
SDC Doc SP-3303 Feb 1969
21 A D KARUSH
Benchmark analysis of time-sharing systems
SDC Doc SP-3343 April 1969

APPENDIX A: Advanced development prototype system block diagram.

An operational memory share supervisor providing multi-task processing within a single partition

by J. E. BRAUN
Penna.-N.J.-Md. Interconnection
Philadelphia, Pa.

and

A. GARTENHAUS
Applied Programming Services, Inc.
Philadelphia, Pa.

INTRODUCTION
The real-time digital process control system, of which the Partition Share Supervisor is an operational feature, was designed and implemented to assist in the functions of monitoring, evaluating and controlling an interconnected system of electrical power utility companies. The main processing unit is located at the central control office with teleprocessing communications to remote lower level control centers.
The basic addressable unit within the main processor is the byte (8 data bits + 1 parity bit), with a word consisting of four bytes. There is a storage protect option which is implemented through assignment of storage "keys" to contiguous 2048 byte blocks of memory. A group of memory blocks with matching protect keys comprises a partition or task area. This protection feature permits non-destructive read-out across partition boundaries but will cause termination of any task which attempts to write in another task's memory area.
The arithmetic-logic unit maintains its current status in a program status word which contains such information as whether or not I/O is currently being permitted on each of the data channels, the protect key for the instruction presently being executed, present machine status, length of current instruction, the address of the next instruction to be fetched, etc. There are certain instructions within the instruction set which can only be executed when the machine is in the "supervisor" state, i.e., when the portion of the program status word which indicates machine status is correctly set. These instructions are classified as "privileged" instructions and perform such functions as disabling data channel interrupts, altering storage keys, resetting the program status word, etc.
The ability of the computer to disallow certain of its instructions when operating in the normal problem program state prevents problem programs from inadvertently destroying critical storage areas or causing catastrophic conditions which could lead to system shutdown.
This system utilizes the independent I/O channel concept which permits the main processor to continue execution of program instructions while the channel transfers data from I/O devices into main storage by cycle interleaving.
Fall Joint Computer Conference, 1969

The multi-tasking capability of the manufacturer-supplied software support system permits priority scheduling of several tasks, all utilizing the resources of one processing unit. The design of the real-time control system requires that it perform certain of its functions on a cyclic basis. Therefore, the internal storage has been divided into four task areas (partitions), with time dependent and critical programs placed in partitions with relatively higher priorities. The following task descriptions are listed in order of task priorities:

Task 1 (core requirement = 42K)

Task 1 is dedicated to the manufacturer-supplied operating system (O/S) which contains supervisory routines, data management routines, priority scheduler, etc.

Task 2 (core requirement = 72K)

Task 2 incorporates the process control family of programs. It also includes the remote typewriter/card reader communications programs, since they use little processing time and benefit from both the independence of input/output channel operations and the quick response time available to the task. During power system emergency situations, Task 2 additionally initiates routines which, due to their critical nature, retain system resources and dispatch emergency communications until the disturbance is relieved.

Task 3 (core requirement = 40K)

Task 3 contains special digital console message processing routines, text output generators for programs operational within Task 2, routines for processing card inputs from the telecommunications system, and routines which monitor and control inter-task communications.

Task 4 (core requirement = 6K)

Task 4 is the Partition Share Supervisor (PSS) which causes Tasks 5 and 6 to share the remaining available memory. The detailed description of this task is the subject of this paper.

Task 5 (core requirement = 96K)

Task 5 consists primarily of scientific application programs. These programs are run as required, either on special demand from real-time on-line tasks or periodically, with the length of the period depending on the nature of the program.

Task 6 (core requirement = 96K)

This task is the off-line* task and is dedicated to miscellaneous uses such as compiles, assemblies, accounting routines, etc.

Figure 1 is a functional diagram of the tasks just discussed and shows their relative locations in computer memory.

Figure 1-Initial memory configuration with task functional descriptions and relative locations shown (from high to low memory address: Task 2, 72K; Task 3, 40K; Task 4 PSS, 6K; shared Task 5/Task 6 partition, 96K; Task 1 nucleus, 42K)
General discussion

Task dispatching
Task dispatching is under the control of the operating system. From a conceptual standpoint, the operating system can be considered to be the only main program in storage and all other tasks within the computer as subroutines.

* The term off-line is used in this paper when referring to tasks
which do not directly operate within the real-time environment.
This use is similar to the term "background" which the reader may have previously encountered.

The dispatching function consists of allocating the resources of the processor to the highest priority task which is in the "ready" state. When no tasks are in the ready state, the processor is not working and is in a wait state. When any task reaches a point where it can no longer process until the completion of some event (such as an I/O operation), it relinquishes control of computer facilities to lower priority tasks via the scheduler. It will regain these facilities when the event it is awaiting is completed and there are no higher priority tasks in the ready state.
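The dispatching rule just described (allocate the processor to the highest priority ready task, or enter the wait state when none is ready) can be sketched as follows; the task names, priorities, and list representation are hypothetical, not taken from the system.

```python
# Illustrative sketch of the task dispatcher described in the text.
# A lower priority number means a higher priority; returning None
# models the processor entering the wait state. Names are hypothetical.

def dispatch(tasks):
    """tasks: list of (priority, name, state) tuples."""
    ready = [t for t in tasks if t[2] == "ready"]
    if not ready:
        return None                 # no ready task: wait state
    return min(ready)[1]            # highest priority ready task runs

tasks = [
    (1, "TASK2-realtime", "waiting"),   # blocked awaiting an I/O event
    (2, "TASK3-console", "ready"),
    (3, "TASK4-PSS", "waiting"),
    (4, "TASK5-LPOL", "ready"),
]
```

When the event a higher priority task awaits completes, its state returns to "ready" and it is given the processor on the next dispatch.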

Inter partition communication
The subject real-time system requires that operational tasks be able to communicate for the purpose
of exchanging information such as live data, requests
to run various subtask routines, etc. Tasks which
communicate with other tasks are equipped with intertask communication routines which are considered the
highest priority routines within the individual task. In
this fashion, when the task is dispatched, the internal
task priority scheme allows the communication routines
to be processed first. Furthermore, any task can be
interrupted to allow its communication routines to
operate. Thus tasks can communicate at any time
(asynchronously).

Partition sharing
The Partition Share Supervisor (PSS) is required to
be able to handle three basic functions:
1. Suspend processing of the off-line task when required.
2. Load and process the lowest priority on-line
task (LPOL).
3. Upon completion of (2) above, be able to restore
and restart the off-line task.
There are two conditions under which PSS suspends
off-line processing. One is when the previously set
real-time clock causes an interrupt. This interrupt is
recognized as indicating the LPOL is to be recycled
for a periodic run. The other is when a communication is received from another task indicating that one of the routines within the LPOL task is to be executed.
Figure 1 shows the computer configuration in the
normal mode. Normal mode is considered to be when
the shared partition is occupied by off-line programs.
Note that there are four problem program partitions
(excluding the nucleus).
Figure 2 shows the configuration when the off-line programs are "rolled out" and the LPOL programs are operational. There are now three problem program partitions, and the area dedicated to the PSS and LPOL tasks is one contiguous partition.

Figure 2-Showing memory configuration when low priority on-line (LPOL) task is active (the 6K PSS area and the 96K shared area form one combined single task area of 102K)
Detailed discussion

The following description details the operations involved in reconfiguring the system from that of Figure 1 to that of Figure 2 and returning to that of Figure 1.
As previously stated, the PSS task is initiated for
one of two reasons:
1. Timer interrupt indicating a need to run the
LPOL task for time dependent programs.
2. External interrupt triggered by communication
from another task indicating a need to process
a requested program.

Prior to either type of interrupt, the PSS task is
in a wait state (i.e., the task cannot be dispatched
until the completion of one of the above two events).

Upon being initiated, PSS takes the following steps:
1. Places its own task in the supervisor state in
order to allow execution of privileged instructions
required to modify system control blocks in the
nucleus, override the storage protection feature,
and disable system interrupts at critical times.
2. Allows all outstanding I/O to complete in the off-line partition (quiescing the partition).
3. Erases the boundary between the PSS task and off-line task.
4. Deletes reference to the now non-existent off-line task from operating system control blocks.
5. Writes a copy of the off-line partition, which is now an extension of the memory area of the PSS task, on a disc file.
6. Reads the LPOL task into the vacated area.
7. Executes the LPOL task.
At this point, we have gone from the configuration
shown in Figure 1 to that of Figure 2 and the LPOL
task is now able to process its requests. Upon completion by the LPOL task of all required processing,
the following steps are taken by PSS to return to the
off-line configuration:
8. Writes the LPOL task on a disc file.
9. Reads the off-line task into the vacated area.
10. Re-establishes task boundaries erased in 3.
11. Restores system reference to the off-line task.
12. Places the PSS task in a "wait state" awaiting an interrupt which will cause a recycle.

At this point, the off-line task is fully restored to the system and in a "ready state". It will then be redispatched by the task dispatching routines on a priority basis.
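Steps 5 through 9 above amount to swapping the shared area between core and a disc file. The sequence can be condensed into a sketch, with the shared partition, the disc file, and the saved LPOL image modeled as equal-length byte buffers; all names are hypothetical and the control-block bookkeeping of steps 1-4 and 10-12 is elided.

```python
# Sketch of the PSS roll-out/roll-in cycle (steps 5 through 9 in the
# text). Memory and the disc file are modeled as byte buffers; only
# the order of the transfers is shown. Names are hypothetical.

def run_lpol_cycle(shared_area, disc_file, lpol_image, run_lpol):
    disc_file[:] = shared_area      # 5. roll the off-line partition out
    shared_area[:] = lpol_image     # 6. read LPOL into the vacated area
    run_lpol(shared_area)           # 7. execute the LPOL task
    lpol_image[:] = shared_area     # 8. roll LPOL out, saving its state
    shared_area[:] = disc_file      # 9. roll the off-line task back in
```

After one cycle the off-line core image is back unchanged, while the saved LPOL image carries whatever the LPOL run left in memory, ready for the next cycle.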

System control blocks
Prior to a detailed discussion of PSS mechanics, we
will discuss relevant system control blocks utilized in
effecting partition sharing.
Task Control Block (TCB)
There is a TCB associated with each task. Contained
in the TCB are various boundaries, indicators, etc.,
used in performing task control. Figure 3 shows those
fields (with references labeled as used in this paper)
which are accessed or modified by PSS.
TCB List (TCBLIST)
The TCBLIST is located in the nucleus and is a list of TCB locations in order of task priority. There

Figure 3-Task control block (TCB):
TCBTABB: pointer to task TABB (boundary box; see Figure 5)
TCBPKE: contains storage protection key for the task
TCBIDF: task identification number
TCBTCB: pointer to next lower priority task TCB

is an entry in the list for each task in the system (see Figure 4).
Task Area Boundary Block (TABB)
There is a TABB associated with each task. The
TABB contains addresses defining the upper and lower
boundaries of the task region and also has a pointer
to the first free area label within the task. The format
of a TABB is shown in Figure 5.
Free Area Label (FAL)
There is an FAL which is an integral part of every available free storage area in memory. An FAL is effectively a label for each free storage area which defines the size of it and contains a linkage pointer to the next FAL. The format of an FAL is shown in Figure 6.

Input/Output Request Element (IORE)
There is a chain of IOREs for all outstanding or queued I/O operation requests from any partition. Each IORE contains information used by the system I/O interrupt handling routines as I/O operations are completed. Figure 7 shows the format of an IORE.

System Vector Table (SVT)
The SVT is resident in the nucleus and contains essential pointers required by the operating system. Included is a pointer to the start of the IORE chain. The location of the SVT is retrieved from a fixed memory location which is conditioned with the SVT address during system initialization.

As mentioned under General Discussion, the PSS task is required to run in supervisor state at times. Although the state of the PSS task changes from problem to supervisor and back throughout its execution, these changes of state will not be noted in this discussion. It should be understood that PSS operates in problem state at all times when it is not required to be executing privileged instructions, modifying storage in another partition or the nucleus, or disabling I/O interrupts.

Figure 4-TCB list (TCBLIST):
POINTER TO TCB OF HIGHEST PRIORITY TASK
POINTER TO TCB OF NEXT HIGHEST PRIORITY TASK
...
POINTER TO TCB OF LOWEST PRIORITY TASK

Figure 5-Task area boundary block (TABB):
FALPT: pointer to first free area label (FAL) within task area (see Figure 8)
LOADDR: the address of the low boundary of the task
HIADDR: the address of the high boundary of the task

Figure 6-Free area label (FAL):
FALNXT: pointer to the next FAL in the chain of FALs (if this field is all zeros, this is the last FAL in the chain)
FALCOUNT: amount of free memory available starting at the beginning of this FAL

Figure 7-I/O request element (IORE):
IORESTAT: status indicator for this IORE (the last IORE in the chain has an IORESTAT field with a value of 1)
IOREID: field set to the same ID number as that of the TCBIDF field of the task which initiated the I/O request (see Figure 3)

Quiescing a partition

Prior to rolling out the off-line partition, PSS must
be sure all I/O is quiesced in order to prevent the I/O
supervisor routines from accessing some storage area
which is in a transitory state.
There is an IORE for all outstanding and queued I/O requests. Within each IORE is an identification number field (IOREID; see Figure 7) which links it with the initiating task. When that task is involved in an I/O operation, the TCBIDF field of the TCB (Figure 3) has a task identification number that will match the IOREID field of some active IORE.
As I/O interruptions occur, the I/O Interrupt Handler services the interrupt, removes the appropriate IORE from the chain, and makes it inactive.
Partition quiescing is accomplished by initially disabling I/O interrupts, obtaining the TCBIDF field from the TCB of the task involved, locating the IORE chain by using the pointer in the SVT, and scanning the IOREs checking for IOREID fields which match the TCBIDF field of the TCB. If none are found, there are no IOREs for the task and it is already in a quiescent state. If any are found, then the task has a pending I/O interrupt or outstanding I/O requests. If this is the case, PSS enables interrupts allowing the I/O Supervisor to process, if necessary, and then immediately disables them. If the I/O in question has been completed, the IORE will have been removed from the chain during the time interrupts were enabled.
PSS restarts at the beginning of the chain and checks again, repeating the above steps until it comes to the end of the chain without having found any active elements for the task. When it reaches this point, there are no longer any IOREs associated with the task and it is in fact quiescent.
It should be noted that since the PSS task has a higher priority than the task to be quiesced, it does not allow any new I/O requests to be initiated by that task, since PSS retains the computer resources.
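The scan-and-retry procedure above can be sketched with the IORE chain as a linked list; the interrupt enable/disable machinery is reduced to a callback and all names are hypothetical stand-ins for the real control blocks.

```python
# Sketch of partition quiescing: scan the IORE chain for an element
# whose IOREID matches the task's TCBIDF; if one is found, let the
# I/O supervisor run (a callback standing in for the brief enable and
# re-disable of interrupts), then rescan from the head of the chain.

class IORE:
    def __init__(self, ioreid, nxt=None):
        self.ioreid = ioreid
        self.next = nxt

def quiesce(chain_head, tcbidf, let_io_supervisor_run):
    while True:
        elem = chain_head()          # chain located via the SVT pointer
        while elem is not None and elem.ioreid != tcbidf:
            elem = elem.next
        if elem is None:
            return                   # no matching IORE: task is quiescent
        let_io_supervisor_run()      # enable interrupts, service, disable
```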

Erasing of a partition boundary and
task deletion
There is control information which is received by

the communications routines within the PSS task
which must be accessible to the LPOL task for both reading and writing (such as indications of which LPOL routine is to be run, the replacement value for the next cycle time which is calculated by the LPOL task as a function of its current running time, entry point addresses of routines mutually shared by the PSS and LPOL tasks, etc.). Additionally, task management is greatly facilitated by extending the PSS task area to include the LPOL function while controlling via the PSS Task Control Block (TCB) rather than modifying the off-line task TCB or creating a new one.
In order to make the shared task area a memory extension of the PSS task, the memory areas must be linked. This is achieved by modifying the TABB (see Figure 5) of the PSS task so that the LOADDR field points to the low address of the shared task. Figures 8 and 8a show the pointer relationships before and after these TABB modifications.
The storage protection feature must now be satisfied to make the two storage areas completely contiguous. Since there is a mismatch in storage keys between the PSS and shared tasks, the keys associated with each protected block of memory within the shared task are reset to match those of the PSS task. At this point, the two task areas have become a contiguous block of memory assigned to the PSS task area.

Figure 8a-TABB pointers after modification
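Erasing the boundary thus reduces to two actions: repointing the PSS TABB's LOADDR and rewriting the per-block storage keys. A sketch follows, with the TABBs and the key table as hypothetical dictionaries rather than the real control blocks.

```python
# Sketch of absorbing the shared partition into the PSS task area:
# the PSS TABB's LOADDR is repointed to the shared task's low address,
# and the storage protect key of every 2048-byte block in the absorbed
# area is reset to the PSS key. Structures are hypothetical stand-ins.

BLOCK = 2048

def absorb_partition(pss_tabb, shared_tabb, keys, pss_key):
    """tabb arguments: dicts with 'loaddr'/'hiaddr'; keys: block table."""
    pss_tabb["loaddr"] = shared_tabb["loaddr"]   # extend boundary downward
    lo, hi = shared_tabb["loaddr"], shared_tabb["hiaddr"]
    for blk in range(lo // BLOCK, hi // BLOCK):
        keys[blk] = pss_key                      # make protect keys match
```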
Figure 9 shows how TCBs are linked together within the system. Note that each entry in the TCBLIST points to a TCB and each TCB points to the next lowest priority TCB in the chain. Figure 9a shows the arrangement of the TCBLIST and the TCBTCB field in the next-to-last TCB in the chain after modification to three partitions. This has been done by replacing the pointer to the last TCB in the TCBLIST with a pointer to the next-to-last TCB, and setting the TCBTCB field of the next-to-last TCB to zero. These modifications have additionally made the last task nonexistent to the operating system.

Figure 8-TABB pointers in PSS and off-line task prior to modification

Figure 9-Portion of nucleus showing TCBLIST and TCBTCB pointer relationship prior to modification

Figure 9a-TCBLIST and TCBTCB pointers after modification
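The two pointer changes that delete the task from the operating system's view can be sketched as list surgery on hypothetical stand-ins for the TCBLIST and the TCBTCB chain:

```python
# Sketch of making the lowest priority task nonexistent to the
# operating system: the last TCBLIST entry is replaced with a pointer
# to the next-to-last TCB, and the next-to-last TCB's TCBTCB field is
# zeroed to end the chain. Structures here are hypothetical stand-ins.

def delete_lowest_task(tcblist, tcbtcb):
    """tcblist: TCB ids in priority order; tcbtcb: id -> next lower id."""
    next_to_last = tcblist[-2]
    tcblist[-1] = next_to_last       # last TCBLIST entry repointed
    tcbtcb[next_to_last] = 0         # chain now ends at next-to-last TCB
```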

Rollout/Rollin
The process of rolling out the off-line task and rolling in the LPOL task is a straightforward write/read operation to a disc file. Since storage is divided into 2048 byte units for assignment of storage keys, the task area read or written is some multiple of 2048 bytes in length. Thus the records are read or written in 2048 byte blocks for purposes of simplicity and efficiency.

Free area modification
The PSS and LPOL tasks now occupy the same task area. It is necessary, therefore, to make certain modifications which will cause all requests for work storage to be satisfied from that portion of the task area wholly dedicated to the LPOL task. Although no task boundary exists between LPOL and PSS, if work storage were to be allocated from the PSS domain, it would not be subsequently saved and restored in future cycles, since the PSS area is not included in the dynamic area which is stored on the disc file.
Figures 10 and 10a show how these modifications are accomplished. Initially (Figure 10) the FALPT field of the PSS TABB is pointing to the free area within what was its own task area. This is the normal condition for this pointer when there is an operating off-line task. However, we have modified the configuration to three task areas and we now wish to make the only available free area exist in the LPOL area. Figure 10a shows that the FALPT field of the PSS TABB has been re-pointed to the first FAL within the LPOL task area.
At this point, the LPOL task is ready to process

Figure 10-FALPT relationship with FAL locations prior to modification

Figure 10a-FALPT fields after modification


whatever request caused it to be activated. We have now covered steps 1 through 7 under General Discussion. In returning from the three-partition to the four-partition environment, the steps are essentially the reverse of those detailed.
Upon restoring the off-line task, PSS enters a wait state and will be restarted as previously outlined. The task dispatcher portion of O/S will restart the off-line task as soon as there is available computer time and no higher priority tasks require the computer resources.

Initialization
The initialization process for PSS consists of:
1. Suspending off-line processing.
2. Reconfiguration from four to three partitions.
3. Rolling out the off-line task.
4. Making the off-line task area one contiguous free area.
5. Loading the LPOL task and allowing it to initialize itself.
6. Rolling out the LPOL task.
7. Rolling in and restarting the off-line task.
8. Entering the normal cycle at the wait point.

Step 4 above has not been previously covered in detail. In order to force the initial loading of LPOL into the desired location, the FALs for PSS are initially modified. Figures 10 and 10a show the PSS TABB before and after this is done. The FALPT field of the PSS TABB initially points to the first FAL within the PSS area. The FALPT field of the LPOL TABB points to the first FAL of its task area. By altering the FALPT of the PSS TABB to make it point to the first LPOL FAL, and by altering that FAL both to make it the last FAL in the chain and to indicate one large block of free memory, we have created a large free area available to PSS for loading the LPOL programs.
As the LPOL task acquires and releases memory blocks for work storage, the FALs within the area are modified by the operating system consistent with memory availability. PSS simply saves the pointer to the first LPOL FAL prior to each rollout and restores it after rollin and prior to reinitiating LPOL. Continuity of FAL linking is maintained in this fashion.
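The FAL manipulation described above (one large chain-ending FAL covering the whole LPOL area) can be sketched with FALs as hypothetical dictionaries keyed by the Figure 6 field names; addresses and sizes are illustrative only.

```python
# Sketch of the free area label (FAL) chain of Figure 6: FALNXT links
# to the next FAL (zero ends the chain) and FALCOUNT gives the free
# byte count. make_single_free_area builds the one large FAL used to
# force the initial LPOL load. Addresses and sizes are hypothetical.

def make_single_free_area(start_addr, size):
    return {"addr": start_addr, "falnxt": 0, "falcount": size}

def total_free(first_fal, fal_at):
    """Sum FALCOUNT over a chain; fal_at maps an address to its FAL."""
    total, fal = 0, first_fal
    while fal is not None:
        total += fal["falcount"]
        fal = fal_at(fal["falnxt"]) if fal["falnxt"] else None
    return total
```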
Special handling

There are occasions when the off-line partition cannot be quiesced. This could be caused by a card reader jam, a printer being out of paper, etc., causing an IORE associated with the I/O to remain linked in the chain beyond some reasonable amount of time (presently 10 seconds). These conditions are relatively infrequent; however, provision has been made for them by advising the operator, via the computer console typewriter and an attention bell, that the off-line task is non-quiescent and requires attention.
The memory area actually required by PSS is less than 6K. However, in order to initially load PSS into memory, a large enough partition must be available to furnish the operating system job scheduler routines their required amount of core. This requirement is on the order of 24K. Thus there is a pre-initialization phase during which PSS changes the initial configuration (Figure 11) of 50K and 52K to 6K and 96K for the PSS and off-line tasks, respectively (Figure 1). The technique for doing this will not be detailed; however, the essential steps are as follows:
1. Referring to Figure 12, the initial PSS task area is shown in three segments (B, C, D) and the initial off-line task area is shown in one segment (A). The PSS Pre-Initializer is loaded by the operating system into area B.

Figure 11-Initial task core allocations:
Task 2 (on line): 72K
Task 3 (on line): 40K
Task 4 (PSS): 50K
Task 5/6 (off-line/LPOL): 52K
Task 1 (operating system nucleus): 42K

2. In order to place the PSS main program in the area where it can control storage, it must be forced into area D. To achieve this, the task area boundary block is modified to make area D free and areas B and C unavailable.
3. The PSS main program is loaded into area D.
4. The off-line boundary block is modified to include areas B and C as free areas.
5. Control is passed to PSS main.
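Condensed to its effect on free storage, the pre-initialization amounts to edits of which segments each boundary block offers as free; the segment letters follow Figure 12, while the set representation is a hypothetical stand-in for the boundary blocks.

```python
# Sketch of pre-initialization: the loader places programs in the free
# areas a task's boundary block advertises, so PSS main is forced into
# segment D by leaving only D free; segments B and C are then handed
# over to the off-line task. Segment letters follow Figure 12.

def pre_initialize(pss_free, offline_free):
    """Each argument is the set of free segment names for one task."""
    pss_free.clear()
    pss_free.add("D")                  # only D free: PSS main lands there
    offline_free.update({"B", "C"})    # off-line task gains B and C
```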

The configuration is now that of Figure 1.

Figure 12-PSS and off-line task areas before and after pre-initialization (segments B, C, D comprise the initial 50K PSS task area, with the pre-initialization program in B and the final 6K PSS task in D; segment A is the initial 52K off-line task area; the nucleus occupies 42K)

CONCLUSION

Implementation of PSS has effectively added 96K of additional processor memory to the real-time system of which it is an integral part. This, coupled with the facility to process off-line tasks while having an available stand-by on-line task, has greatly enhanced the capability of the system. The application of PSS has effected a maximal utilization of computer resources by the system.
REFERENCES

1 IBM System/360 operating system control blocks
Form No C28-6628
2 IBM System/360 operating system input/output supervisor
Program Logic Manual Form No Y28-6616
3 IBM System/360 operating system control program with MFT
Program Logic Manual Form No Y27-7128
4 IBM System/360 operating system fixed task supervisor
Program Logic Manual Form No Y28-6612

Structured logic
by R. A. HENLE, I. T. HO, G. A. MALEY
and R. WAXMAN
IBM Components Division
Hopewell Junction, N.Y.

INTRODUCTION

Large-scale integration for computer applications has been predicted for several years, but close examination shows that the progress has been uneven. Memory designers continually demand higher levels of integration for larger and faster memory systems, and new memory concepts are being developed to further exploit the characteristics of large-scale integration. The one-thousand-circuit chip will become nothing more than a milestone.
But what of the logic area? Here, we struggle along hoping to find some high-volume applications for chips with a mere fifty circuits. When we design a medium-sized machine we find that so much unit logic is required that the average level of integration falls below ten. Orderly memory and random logic integrated circuit fabrication procedures are growing so different that thought is being given to building different types of manufacturing facilities. This represents a rather drastic approach and in the authors' opinions may prove unnecessary.
The success to date in memory is encouraging, for it gives direction to logic. Memory products should therefore be examined critically, for they may well hold the key to success for logic products. The salient features of a chip used in a memory product are:

• Well-Defined Function. The memory chip designer knows exactly how his chip fits into the entire memory system. He therefore can optimize on a high level. As examples, he uses special circuits for the latch functions and uses decoders redundantly to save pads.
• Volume. While the initial memory chip design is quite complex, the volume requirement makes the initial design cost nearly negligible. With this ground rule the chip can be highly engineered, and nearly an order of magnitude improvement can be expected and obtained.
• Regularity. Memory arrays are regular in components and wiring. The layout geometry is well defined and can be highly optimized for total chip utilization.
• Low Power. Memory systems are designed and partitioned so that all circuits on a chip do not dissipate maximum power at the same time.

Structured logic, or array logic as it is sometimes called, is an attempt to design logic with more of the characteristics of memory. Many unsuccessful starts have taken place, but we shall discuss some of the more successful efforts. We shall also add some thoughts of our own, but it should be pointed out that the problem is far from solved.

Logic arrays

The basis of all array logic is a matrix of elements with programmable interconnections. Diode structures have been proposed in the past, and a matrix of common collector transistors is of recent interest. The transistor array is programmed in the factory by connecting or not connecting the emitter of each transistor to a common line. (See Figure 1.) We shall use transistor arrays in our examples, for that is what we have been working with, but diode arrays should not be ruled out.
Figure 1-A transistor array

The ROS
The read-only store (ROS) array in its simplest
form uses two decoders to feed the array: one feeds
the horizontal lines and the other the vertical lines,
as shown in Figure 2. A particular grid position in the
array is selected by activating the appropriate horizontal and vertical decoder lines. The addressed cell of the array is located at the intersection of the two activated lines. If the emitter at this address is connected to the horizontal decoder line, then a 1 has been programmed into this particular cell in the array. If the emitter is unconnected, a 0 is said to be programmed into the array. The presence of the programmed 1 or 0 is sensed at the output when that particular cell is addressed. The horizontal output lines are dot-ORed together to produce one common output line, as shown in Figure 3.
Conceptually, the ROS is related directly to a
Karnaugh map, one bit position in the array for each
square in the appropriate Karnaugh map. Figure 4
depicts the four-variable K-map that relates to the
ROS of Figure 2. This relationship proves the universality of a ROS, for any Boolean function that
can be K-mapped can be implemented directly. Universality is the feature of the ROS chip most often
described as an asset, but in practice it is seldom useful except in code translators. The Boolean functions
used in the design of any computer are definitely not random and not evenly distributed among all possible functions of n variables. This fact is well documented in the many failures with other universal
logic blocks (ULB's). The real problem with the ROS
array is that it doubles in size each time an input
variable is added. This doubling in size is necessary
to maintain the dubious value of being universal.
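The size argument can be illustrated with a small behavioral model. The following Python sketch is mine, not the paper's: it treats a ROS as a programmed truth table with one bit per Karnaugh-map square, so an n-input ROS needs 2^n bits and doubles with each added variable.

```python
# Illustrative model, not from the paper: a ROS behaves as a truth table
# holding one bit per Karnaugh-map square, so an n-input ROS needs 2**n
# bits and doubles in size with each added input variable.

def make_ros(truth_table):
    """truth_table: list of 2**n output bits, indexed by the input word."""
    n = len(truth_table).bit_length() - 1
    assert len(truth_table) == 2 ** n, "ROS size must be a power of two"
    def ros(*inputs):
        # The two decoders jointly select one grid position; behaviorally
        # this is just indexing by the concatenated input bits.
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return truth_table[address]
    return ros

# A two-variable ROS programmed (connected/unconnected emitters) as XOR.
xor_ros = make_ros([0, 1, 1, 0])
```

Extending this to a third variable requires an 8-entry table, to a fourth a 16-entry table, and so on; that exponential growth is the price of universality the text describes.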

The ROAM
The read-only associative memory (ROAM) is a

Figure 2-Read-only store

Figure 3-Read-only-store circuits

Structured Logic
Figure 4-Karnaugh map

matrix of common collector transistors that may be
programmed by connecting or not connecting the base
of each transistor to a common line in its own column
(Figure 5). The emitters of each row are commoned
and feed the emitter of an output transistor. Each
row of array transistors and the associated output
transistor form a current switch.
Through phase splitters, each input variable has
both true and complement lines available to the array.
Hence, each variable controls a true line and a complement line (column) in the array. This gives rise

Figure 5-Read-only associative memory


to the word "associative" in the name. By programming each row in the array to a particular pattern
of 1's and 0's, the input word pattern will "associate"
(compare) with the appropriate row in the array. If
there is no match, the outputs will remain logical zeros.
If at least one row has a pattern the same as the input
pattern, there will be a logical one output on that
horizontal line (row).
To program the array, each base is tied to a true
line (column), a complement line (column), or is
left floating. Thus, for a base tied to a true line, a 1
on that input line will yield a 1 at the emitter and a
1 at the output, since the row of emitters effectively
forms a DOT-OR (positive logic). Bases tied to a true
line are equivalent to a logical 1, since a 1 at that input causes a 1 at the output.
Conversely, a base tied to a complement line is
equivalent to a logical 0. A 0 at a particular input
raises the complement line of the phase splitter,
thereby raising to the 1 level all emitters of transistors
in that column that have their bases tied to the complement line (column).
If the base is left floating, that array grid position
is effectively a DON'T CARE. That is, the output
line will not be raised to 1 by either a 1 or 0 at that
transistor's column input.
Figure 6 illustrates the implementation of an adder
position with SUM and CARRY outputs using a
ROAM array. A black triangle connecting a vertical
line and a horizontal line indicates a base connection;
lack of a black triangle indicates a floating base. Note
that if a true line is connected, then the complement
line is not connected, and vice versa for each array
grid position. Thus, at most, only 50 percent of the horizontal and vertical intersections will ever be used.
To conceptually understand the ROAM and relate
it to the Karnaugh map it is convenient to think in
terms of negative logic. Thus, down levels are logical
1, the commoned emitters of each row form a DOT-AND (all emitters down results in a down level, any
emitter up results in an up level), and dotting the output
transistors results in a DOT-OR.
Each row of the ROAM represents a term of a
logical expression in the sum-of-products form. The
logical expression CARRY = B . C + A . B + A . C
is in sum-of-products form, and B . C, A . B, and
A . C are each terms of the expression. Each term
may be implemented on one row of the ROAM. For
example, Figure 6 illustrates the implementation of
the CARRY function. Note that the A true and B
true columns are both connected to a transistor base
in the second row of the ROAM array, yielding the
term A . B. The three rows B . C, A . B, and A . C


are DOT-ORed at the output to yield B . C + A .
B + A . C = CARRY. In forming the term A . B,
the variable C does not have its true or complement
column line connected to a base. CARRY is 1 if A is
1 and B is 1 regardless of the value of C.
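The programming rules above can be rendered as a small behavioral model. The Python sketch below is my own rendering, not the paper's circuit: each row records which columns are tied to true lines and which to complement lines, floating bases act as don't cares, and the rows are DOT-ORed.

```python
# Behavioral sketch of ROAM programming (the model and names are mine,
# not the paper's circuit).  Each row records which columns are tied to a
# true line and which to a complement line; floating bases are don't cares.

def roam_row(true_cols, comp_cols):
    """One product term: associates when every tied position matches."""
    def row(inputs):
        return (all(inputs[i] == 1 for i in true_cols) and
                all(inputs[i] == 0 for i in comp_cols))
    return row

def roam_function(rows):
    """DOT-OR of the rows: 1 if any row associates with the input word."""
    return lambda inputs: any(row(inputs) for row in rows)

# CARRY = B.C + A.B + A.C as three rows (columns A=0, B=1, C=2; in each
# row the third variable's columns are left floating, i.e., don't care).
carry = roam_function([roam_row({1, 2}, set()),
                       roam_row({0, 1}, set()),
                       roam_row({0, 2}, set())])
```

An input word like (1, 1, 0) associates with the A.B row, so CARRY is 1; a word like (1, 0, 0) matches no row and the output remains a logical zero, as the text describes.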
Each term of a logical expression in sum-of-products
form is an "implicant" on a Karnaugh map. An implicant is formed by looping the 1's in the Karnaugh
map and "reading" the loops from the map. Loops
can only contain adjacent 1's, and the number of ones
in a loop must be equal to 1, 2, 4, ..., a power of 2.
This results from the fact that adjacent squares on a
Karnaugh map always differ only by the value of
one variable. Two squares looped yields a term with
n-1 variables (n = number of variables), four squares
looped yields a term with n-2 variables, etc. Thus,
each implicant requires one row in a ROAM. The
bigger the loop of 1's, the fewer connections need be
made in that row. The complete expression is formed
by DOT-ORing the rows, which is the same as ORing
the implicants.
The example of Figure 6 uses three loops of two
1's each to form the CARRY. The SUM is formed
by four loops of one 1 each. In this case three con-

Figure 6-ROAM adder position

TABLE I-Bits required for n variables in ROS and ROAM arrays

                                  VARIABLES
                        2    3    4    5    6    7    8      n
ROS (always
universal), 2^n bits    4    8   16   32   64  128  256    2^n

ROAM, 2·I·n bits for I implicants:
  I = 1                 4    6    8   10   12   14   16    2·n
  I = 2                 8   12   16   20   24   28   32    4·n
  I = 3                12   18   24   30   36   42   48    6·n
  I = 4                16   24   32   40   48   56   64    8·n
  I = 5                20   30   40   50   60   70   80   10·n
  I = 6                24   36   48   60   72   84   96   12·n
  I = 7                28   42   56   70   84   98  112   14·n
  I = 8                32   48   64   80   96  112  128   16·n
  I = 9                36   54   72   90  108  126  144   18·n
  I = 16               64   96  128  160  192  224  256   32·n
  2^(n-1) rows
  (universal)           8   24   64  160  384  896 2048  n·2^n

nections must be made in each of the four required rows
to obtain

SUM = A' . B' . C + A' . B . C' + A . B' . C' + A . B . C

(where the prime denotes the complement). In contrast to the ROS, the ROAM can have universal capability with only one-half the number of
rows as the ROS needs bits for the same number of
variables. Moreover, the ROAM does not need to be
universal to be useful, thus allowing even further
reduction in size. Table I illustrates the difference
brought about by the ROS requiring one bit per K-map
position and the ROAM requiring one row per K-map
implicant.
Historically, computer functions are composed of
about four implicants or terms. The chart shows that
a four-implicant function is cheaper to implement
with a ROAM than with a ROS when the function
contains six variables or more. When the decoders required for the ROS are considered, even four-variable functions with four implicants are more economical in ROAM than in ROS.
Two useful formulas to compare ROS bits required
with ROAM bits required for a given function are:

ROS bits = 2^n
ROAM bits = 2·I·n,

where n = number of variables, I = number of implicants. Thus, it is more economical to build a function
with the ROAM when 2·I·n < 2^n. This does not
consider the cost of the ROS decoders, which add a
factor to the inequality.
If we assume that the decoders for n even take
2n·2^(n/2) bits, and for n odd take [(n+1)·2^((n+1)/2) + (n-1)·2^((n-1)/2)] bits,
then the cases for which ROAM should be used are:

1. n even
   2·I·n < 2^n + 2n·2^(n/2);
2. n odd
   2·I·n < 2^n + (n+1)·2^((n+1)/2) + (n-1)·2^((n-1)/2)

Thus, ROAM is more economical than ROS in most
practical problems.
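The comparison can be checked numerically. This Python sketch encodes the formulas from the text (the helper names are mine); it confirms the claim above that a four-implicant function becomes cheaper in ROAM, decoders ignored, once the function reaches six variables.

```python
# Bit-cost formulas from the text; the helper names are my own.

def ros_bits(n):
    """A ROS needs one bit per K-map square: 2**n bits."""
    return 2 ** n

def roam_bits(n, implicants):
    """A ROAM needs one 2n-bit row per implicant: 2*I*n bits."""
    return 2 * implicants * n

def ros_decoder_bits(n):
    """Decoder cost assumed in the text for even and odd n."""
    if n % 2 == 0:
        return 2 * n * 2 ** (n // 2)
    return (n + 1) * 2 ** ((n + 1) // 2) + (n - 1) * 2 ** ((n - 1) // 2)

# Smallest n at which a four-implicant function is cheaper in ROAM
# (ignoring decoders), matching Table I.
crossover = min(n for n in range(2, 17) if roam_bits(n, 4) < ros_bits(n))
```

At n = 5 the ROAM still costs 40 bits against the ROS's 32; at n = 6 it costs 48 against 64, so the crossover falls at six variables. Adding the decoder term only moves the balance further toward the ROAM.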
A realistic example of control logic for a small machine model has been implemented using the ROAM
array. Table II gives a comparison of the number of
bits required for a ROAM implementation versus the
number of bits required for a ROS implementation.
Note that the ROAM is significantly more economical.
A partitioning of functions could have been devised
for the ROS implementation. The ROAM would still

TABLE II-ROS vs. ROAM: a control logic example

TOTAL NUMBER OF VARIABLES ................................. 14
TOTAL NUMBER OF FUNCTIONS ................................. 6
TOTAL NUMBER OF IMPLICANTS ................................ 12
    One 7-implicant function of 13 variables
    Four 1-implicant functions of 7 variables
    One 1-implicant function of 11 variables

ROAM
    ARRAY SIZE: 28 X 12 ................................... 336 BITS

ROS 1
    ARRAY SIZE/FUNCTION: 2^14 ............................. 16,384 BITS
    6 ARRAYS FOR 6 FUNCTIONS: 6 X 16,384 .................. 98,304 BITS
    SHARED DECODER ........................................ 3,584 BITS
    TOTAL BITS ............................................ 101,888

ROS 2
    ARRAY SIZE FOR 13 VARIABLES: 2^13 ..................... 8,192 BITS
    ARRAY SIZE FOR 7 VARIABLES: 2^7 X 4 ................... 512 BITS
    ARRAY SIZE FOR 11 VARIABLES: 2^11 ..................... 2,048 BITS
    SHARED DECODER ........................................ 3,584 BITS
    TOTAL BITS ............................................ 14,336


be more economical than the ROS, however, especially
when one considers the additional wiring complication
of connecting several small ROS arrays and the additional design time required to effectively partition
the functions.
The optimum size for a ROAM has not been determined, but chips with at least 512 bits on them are
desirable. This capacity would provide between eight
8-variable, 4-implicant functions, and one 64-variable,
4-implicant function (an extreme case, needless to say)
on a chip. The practicality of building and using such
a chip is yet to be determined.

The SLT array
Arrays can be designed so that they may be used for
direct replacement of present logic. The SLT array
performs the function AND-OR-INVERT in negative logic or OR-AND-INVERT in positive logic
and can be used directly to replace SLT logic. While
direct replacement of random logic with array chips
may prove to be the wrong approach in the long run,
it may well be the only way to get array logic started.
The SLT array has the same advantages over ordinary logic that all arrays have: orderliness of design
and layout, and high density with relatively low cost.

In addition, this type of array has a higher bit usage
than other arrays, since it more closely resembles the
familiar random logic, functionally. The SLT array
does not have decoders or phase splitters on its input
lines, as do other types of arrays. This makes the array
less universal than even the ROAM array but more
effective for random logic. It is fair to say that arrays
of this type make poor code translators just as SLT
logic builds poor translators. It is difficult to believe
that any array will be effective in both random logic
and code translation problems.
As already stated, the ROAM array has specific
applications to decoders and associative memory
problems. The SLT array may very well be the element required to do general logic design. The reason
for this is the placement of the inverters as shown in
Figure 7. This movement of the inverters to the output lines may appear a minor modification, but it
should be remembered that there has never been a
useful logic block with inverters on the input lines. It
may pay to have both true and complemented outputs from a current switch logic block. Figure 8 shows
a full adder implementation in SLT logic and in an
SLT array.
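The AND-OR-INVERT behavior can be sketched behaviorally. The grouping below is my own reading of the full-adder position, not the actual SLT circuit: two AOI blocks suffice when the complemented CARRY is fed back as an operand, which also previews the feedback discussed in the next section.

```python
# Hedged behavioral sketch (the gate grouping is my reading, not the
# actual SLT circuit): the array computes AND-OR-INVERT terms, and a
# full-adder position uses the complemented CARRY as a fed-back operand.

def aoi(*groups):
    """AND each group of inputs, OR the group results, then invert."""
    return 0 if any(all(g) for g in groups) else 1

def full_adder(a, b, c):
    not_carry = aoi((a, b), (a, c), (b, c))        # CARRY' = (ab+ac+bc)'
    carry = 1 - not_carry
    # SUM = (a+b+c)·CARRY' + a·b·c, realized as a second AOI plus invert.
    sum_ = 1 - aoi((a, not_carry), (b, not_carry),
                   (c, not_carry), (a, b, c))
    return sum_, carry
```

Having both true and complemented levels available, as the text suggests, is what lets the second AOI block reuse CARRY' directly instead of spending another inversion.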

Array-driving arrays
The SLT array in Figure 8 demonstrates one necessary feature of an array that has yet to be discussed:
Any logic array must be able to drive any other array
in the same family, including itself. Note in Figure
8 the CARRY output fed back into the array. This
line probably will be an external wire. This technique
is required since it is in effect Boolean factoring, a
proven necessity. This type of feedback is also needed
to produce sequential circuits, giving memory to the
arrays.

Figure of merit

Figure 7-SLT array

It is less meaningful to compare array logic with
random logic in each individual term of power consumption, propagation delay time, and silicon area,
since one can usually be traded for the other, such as
power with delay. Instead a comparison is made of
their figures of merit, chosen to be the product of
power consumption P, delay time T, and silicon area
A, all with weight function of one (PTA). Since no
isolation wall is needed between collector transistors,
a ROS or ROAM cell including appropriate interconnections can be laid out on a silicon chip area equivalent to 20-25 percent of that occupied by a transistor
that needs isolation walls. As shown in Figures 5 and 7,


Figure 8-SLT full adder position

the delay time of an array is two levels of current
switch emitter follower (CSEF) independent of the
number of inputs. For sophisticated functions, such
as the one-bit adder shown in Figure 8, more than two
levels of logic may be required.
Some typical comparisons of array logic and random
logic include the sampling design of array logic chips
to perform the same function a random logic chip
would. This comparison helps to partially discover
the merit and the limitation of the array logic. In
comparison with random logic chips that perform
sophisticated functions or have two or more cascading
levels of CSEF's, array logic chips have superior
PTA figures.

CONCLUSIONS

Various array configurations described here suggest
that random logic may be implemented by use of an
array of programmable crosspoints. Comparisons of
array logic with conventional logic indicate that in
many cases the PTA figure of merit is superior for
arrays. The most significant problem with arrays appears to be the limited useful size of a single array,
and the difficulty in standardizing a particular array
configuration. As a minimum achievement at this
time, it appears that arrays will be useful in development of complex functions within a silicon chip.
Array logic will not eliminate the need for a circuit
designer in the future, since specialized designs will
be needed to optimize circuit and component technology. In some of these design cases, the importance of
array logic techniques will be obvious, but in others
it will not be.
At this point, array logic does not appear to strongly
affect the system designer's approach to machine design, and a knowledge of array logic may never be required.
In the future, however, to the extent that array
logic techniques influence the design and optimization
of highly efficient functions, the system designer's
work will be significantly influenced by progress made
in developing array logic techniques.

BIBLIOGRAPHY

1 R RICE
Computers of the future
IBM Research Report RC-151 April 20 1959
2 R RICE
Systematic procedures for digital system realization from logic design to production
Proc IEEE Vol 52 No 12 1691-1702 Dec 1964
3 R C MINNICK
Application of cellular logic to the design of monolithic digital systems
Microelectronics and Large Systems
Spartan Books Wash D C 1965 225-247
4 L C HOBBS
Effects of large arrays on machine organization and hardware-software tradeoffs
Proc FJCC 1966 Vol 29 89-96
5 R C MINNICK
Cutpoint cellular logic
IEEE Transactions on Electronic Computers Dec 1964
6 W E KING III A GUISTI
Can logic arrays be kept flexible?
AFCRL Report 65-547 Aug 1965
7 D C FORSLUND R WAXMAN
The universal logic block (ULB) and its application to logic design
IEEE Conference Record 1966 Seventh Annual Symposium on Switching and Automata Theory 236-250
8 S S YAU C K TANG
Universal logic circuits and their modular realization
Proc SJCC 1968
9 R C MINNICK
A survey of microcellular research
Jour ACM Vol 14 No 2 April 1967 203-241

Characters-Universal architecture for LSI
by F. D. ERWIN and J. F. McKEVITT
Hughes Aircraft Company
Fullerton, California

BACKGROUND

Since the advent of LSI technology, several schemes
have evolved for the utilization of large arrays to their
full potential. A common and straightforward approach
involves the designer restricting himself to the equipment being designed at the moment. Faced with only
a limited set of problems, it is not difficult to specify
a small number of LSI array types which will efficiently
complete the design. While the results are quite encouraging for specific cases,1 the drawbacks of any mass
adoption of these techniques are obvious. This, the
so-called "custom approach," would require the semiconductor manufacturer to be responsive to each customer with numerous low-output production runs of
highly specialized devices. The per-unit cost to the
user, for his own efforts as well as those of the manufacturer, would be quite high due to the inability to
spread initial costs over many devices. In addition,
the complexity of 100-gate-plus arrays is such that it
is difficult to substitute one for another (with efficient
results). This would severely limit the off-the-shelf
capabilities of both user and manufacturer.
An obvious solution to these problems is the introduction of a small set of standard LSI chips. Semiconductor suppliers, making tentative advances into
LSI product marketing, have already proposed such
devices as adders, counters, and shift registers. However, this does not represent the solution to the general
problem. A design heavily committed to the use of these
devices must fall back on MSI or standard IC for the
large remainder of the circuitry. The reason is that
adders, counters, registers and other orderly, well-defined areas represent the regions of the system with
the highest gate-to-pin ratios. After these portions are
lifted out of the system, the remainder is characterized
by very low gate-to-pin ratios (notably control and
data routing functions). Unable to satisfy the LSI
design criteria of high gate-to-pin ratios any longer,
the designer must look to more standard components.
Unfortunately, any proposed solution to the LSI
partitioning problem which lacks a total system approach tends to drift towards this pitfall.
Researchers striving towards partitioning for total
or near-total LSI implementation tend to diverge
along one of two conceptual paths: bit-slicing and
functional partitioning. To illustrate the difference,
consider the data portion of the computer. In functional
partitioning one may specify an adder as one LSI array, registers as another, a shift register as a third, and
so forth. On the other hand, in bit-slicing one would
design an LSI array consisting of a combined one- or
two-bit adder, registers, shift registers, etc., then build
up his system from this chip type according to the desired word length.
The bit-slice approach has resulted in some notable
advantages, particularly the ability to achieve very
high gate-to-pin ratios and implement systems using
a small number of different array types.1,2 However,
bit-sliced modules have the basic flaw of being system-dependent, a drawback described by Pariser in an
early paper.3 This means that behind such bit-slicing
approaches there lie systems, real or implied, for which
the resulting arrays are most efficient. An attempt to
apply the arrays to a significantly different system
results in a poor design. Considering the types of bit-

slice devices being proposed, inefficiencies would most
often be manifest in the design of a simple device in
which the majority of the gates of the array intended
to accomplish complex functions are wasted. Although
this may be acceptable in some situations, it is unlikely that it would satisfy the strict requirements of
size, weight, power, and reliability imposed by aerospace and military systems.
It is the contention of this paper that a judicious
partitioning of digital systems in general, divorced
from bias towards any particular system, results in a
set of LSI devices that can entirely implement many
different computer systems of varying functional complexities and word lengths.
The resulting group of arrays, referred to as a
"character set" and each one individually as a different
"character", is sufficiently small in number (10), with
each type having acceptable size and gate/pin ratio,
to be considered acceptable and desirable in view of its
wide range of applications. These building blocks are
referred to as characters because of the metaphor that
may be made between the building blocks and characters of the alphabet (letters). Letters form words
to express the language whereas building blocks form
units to build the machine. In both cases a closed set
(of characters) is used to produce the desired end.
Although the character set is neither rigidly functionally-partitioned nor bit-sliced, it is biased towards
functional partitioning to give it the versatility to
efficiently implement both complex and simple digital
devices. As an approach, functional partitioning has
a detailed and successful background.3,4 Bit-slicing
considerations give the character set its ability to
implement systems of varying word lengths.
In addition to providing the user with a standard
set of chips to implement many different digital machines, the completeness of the approach (the ability
of the characters to implement the whole machine)
relieves the user of the burden of logic design. These
tasks are reduced to the selection of character types
and word lengths.

Introduction to the character set

A universal conclusion among LSI researchers is
that control functions are more difficult to modularize
than functions related to data operations. Micromemory control technique was chosen as the solution
for LSI implementation for several reasons. A micromemory, meaning here a read-only solid-state memory
with its sequencer and instruction register, is easily
partitioned into the large modules necessary for LSI
implementation. Control functions in this form are
then amenable to reproduction in large quantities
of identical units. Also, design with control centered
in one level of micromemory is more orderly and
straightforward.
The micromemory has been provided with a relatively sophisticated microprogram instruction repertoire. This means that the microprogram contains the
essence of the machine's major mathematical functions, such as multiply and complex sequencing. This
is desirable since it represents an efficient use of hardware for these purposes and also reduces the number of
different array types necessary. Also, a versatile repertoire leaves the designer free to make units which
operate as simply or as complexly as desired. The
degree of flexibility which this repertoire gives the
character set is a major factor in its success. It should
be stressed that the "micro operations" of the character set are as important a factor as its logic design. This
fact, a critical one in all LSI solutions committed to
micromemory control, cannot be overemphasized.
Interest in designing a character set at Hughes was
concurrent with the development of an advanced computer system. The character set itself was developed
with the ultimate objective of implementing all future
Hughes digital data processing equipment with a common family of LSI circuits.
The outcome of that original effort revealed that
computer structures in general are frequently ordered,
or at least amenable to such ordering, as shown in
Figure 1.
The divisions of Figure 1 are functional. That is,
regardless of the hardware characteristics, the computer
philosophy is such that its functions may be identified,
separated, and diagrammed as shown in the figure.
From Figure 1 came the concept of the functional
character set. With the fundamentals of LSI design
in mind, logic was designed to accomplish each computer

(Figure 1 panels: computer control functions; Boolean logic functions, minor (transfer, shift, rotate, complement, increment, logical OR, etc.) and major (add, subtract, exclusive OR, etc.); input/output functions; fast access register storage; auxiliary devices (counters, clocks, scratchpad); and core memory.)
Figure 1-Computer functional organization

(Figure 2 shows the ten characters: MM micro-array, G1 register storage, L1 general logic, L2 arithmetic, L3 input/output, M1 micromemory counter, M2 micro-instruction register, P1 scratch pad memory, P2 up/down counter, and P3 switch, with core memory defined as an I/O-type device.)

Figure 2-Functional character set

function indicated by the picture. Each unique LSI
chip type which resulted was referred to as a different
character type and given an identifying name and
number. Figure 2 shows the character set which resulted from the logic design according to the concepts
outlined in Figure 1.
The character set and repertoire have been through
several improvement cycles and used in the test implementation of a NASA computer to be discussed
later. Current plans include test design of the H4400
(a new Hughes computer) with the improved character
set, implementation of the character set with high-speed MOS circuits, and construction of one computer
using the characters.
These ten LSI characters alone provide the entire
hardware complement for the logic of a broad range of
computers and digital equipment. No extra logic in
the form of either IC, MSI, or custom LSI need be
added to the characters to finish the job. An important
by-product of this is that the user need never consider
logic design. His tasks are reduced to selection of the
necessary characters and the writing of the appropriate
microprograms for them. In fact, it is possible for the
character set to fit into a realistic total design automation procedure as discussed later.

Description of the character set

This section describes each of the ten characters.
They are summarized below for reference.

G1  Register storage
L1  General logic
L2  Arithmetic logic
L3  Input/Output
M1  Micromemory counter
M2  Micro-instruction register
MM  Micro-array
P1  Scratch pad memory
P2  Up/Down counter
P3  Switch

Characters of the same letter are logically grouped
into a common unit as illustrated in Figure 3.

Figure 3-Typical functional character configuration

G1 character

The G1 character provides the bulk of storage for
operands of the microprogram. Each character contains four registers of eight bits each accompanied by
reading and writing selector gates. The storage element
is provided with simultaneous dual reading and
writing capability. The storage flip flop itself is designed
for minimum read-after-write delay.
Each of the two input busses is common to all
registers and carries to the G1 character eight lines
per bus, one line from each bus for each bit of the
register. Input data selection is accomplished at the
memory element by a coincidence of positive information on a particular input bus and register selection
for that bus by destination decoding logic within the
character. The destination decoding logic is duplicated
to provide for writing from the two input busses into
the same character under control of two different microcommands. As will be illustrated later, this is a key
factor for the machine expandability property of the
character set, as it allows G1 to form a data path link
between individual logic units under control of up to
two different micromemories. Different registers in
the character may be written into simultaneously.
Reading of the register is provided by dual source
decoding logic which gates data to independent dual
output busses. This duality provides for information
from any two registers to be simultaneously placed on
two output busses. The conceptual structure of the G1
character is shown in Figure 4.
Several G1 characters placed in parallel provide
registers of more than eight bits in length.
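The dual-bus behavior of the G1 character can be modeled behaviorally. The Python sketch below is my own (class and method names are assumptions, not the paper's): four 8-bit registers with two independent write ports, each with its own destination decode, and two independent read ports.

```python
# Behavioral model of a G1 character (structure from the description in
# the text; the class and method names are my own assumptions).

class G1Character:
    def __init__(self):
        self.registers = [0] * 4          # four registers of eight bits

    def write(self, port, dest, value):
        # Duplicated destination decoding: each of the two input busses
        # (port 0 and port 1) selects its own destination register, so
        # two different registers may be written simultaneously.
        assert port in (0, 1)
        self.registers[dest] = value & 0xFF

    def read(self, source_a, source_b):
        # Dual source decoding gates any two registers onto the two
        # independent output busses at once.
        return self.registers[source_a], self.registers[source_b]

g1 = G1Character()
g1.write(0, 0, 0x3C)   # bus 0 writes register 0
g1.write(1, 2, 0xA5)   # bus 1 writes register 2 in the same cycle
```

Placing several such characters side by side, as the text notes, simply widens each register by eight bits per character.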


L1 character
The L1 character provides the basic logic functions
selectable by microprogram. In addition, input bussing
is provided for nine channels (eight bits/channel).
One channel of the bus is required for each G1, L2,
or L3 character connected to the L1 character. The
logic functions provided consist of the rotates, shifts
(logical), no-operation, complement, and incrementation. Also associated with the L1 character is the decoding logic for these logic operations. The type of
microprogramming used with the functional character
system relies heavily upon the fast and efficient manipulation of bits within the various operands. To this
end, shifts and rotates have been provided which execute from 1 to 31 positions in a single step (as opposed to serial operation). Incrementation is accomplished with the use of a logic register which may also
be used as a simple holding register. The L1 character
is eight bits wide and contains the following logic:
1. Bussing gates
2. Decoding logic
3. Rotate, shift, and complement logic
4. Incrementer
5. L register
6. Gating to output bus

In Figure 5 is shown a block diagram of the L1
character. Several L1 characters may be connected
together to form logic operations on words longer than
Figure 4-G1 character block diagram


Figure 5-L1 character block diagram

one byte. A limit of four bytes exists in order to maintain consistency of definition in the rotates and shifts.
Information entering the L1 card from the various
sources is bussed to form the input bus. Then it is
operated upon and the resultant is bussed to the output bus, where it leaves the character or is optionally
stored in the L register (where it would thus be available
at the next micro-instruction time for use in the increment operation or as an "L" source).
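The single-step rotate and shift behavior can be sketched behaviorally. The following Python model is my own (the 32-bit width corresponds to the four-byte maximum stated above; function names are assumptions):

```python
# Behavioral sketch of L1 rotate/shift/increment paths (names are mine;
# width fixed at the four-byte maximum the text allows).

WIDTH = 32
MASK = (1 << WIDTH) - 1

def rotate_left(word, positions):
    """Rotate 1 to 31 positions in a single step, not serially."""
    assert 1 <= positions <= 31
    return ((word << positions) | (word >> (WIDTH - positions))) & MASK

def shift_left(word, positions):
    """Logical shift: vacated positions fill with zeros."""
    assert 1 <= positions <= 31
    return (word << positions) & MASK

def increment(word):
    """Incrementation via the L register path."""
    return (word + 1) & MASK
```

The point of the single-step behavior is that a 31-position rotate costs the same one micro-instruction time as a 1-position rotate, which is what makes the bit-manipulation-heavy microprogramming style described above efficient.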

L2 character
The L2 character provides the major arithmetic
functions used by the microprogram. The arithmetic
unit provides the 2's complement sum of the contents of the A and B registers. Addition is performed
with carry look-ahead, byte parallel. Control signals
may condition the adder to alternately provide either
of two special results: (a) a mod 2 addition instead
of full addition, or (b) an input carry to the lowest order
bit for full addition (this forced carry in conjunction
with a negated operand accomplishes a 2's complement operand for subtraction). The L2 character
consists of two holding registers for the operands of
the adder, the adder itself, decoding and error logic,
and bussing gates. Figure 6 diagrams, function-wise,
the L2 character.
A typical arithmetic operation using the L2 character might proceed as follows: (1) first operand transferred to B register (from output bus), (2) second
operand transferred to A register, (3) after appropriate
delay, access result and transfer out of L2 character via
the input bus. The error logic provides overflow and
carry-out information.
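The three adder modes can be sketched behaviorally. This 8-bit, single-character Python model is my own rendering (reading "mod 2 addition" as bitwise exclusive OR, i.e., sum without carries):

```python
# Behavioral sketch of the L2 modes (names and the single-byte width are
# my assumptions; "mod 2 addition" is taken as bitwise XOR).

WIDTH = 8
MASK = (1 << WIDTH) - 1

def l2_add(a, b, mod2=False, carry_in=0):
    if mod2:
        return a ^ b                     # mod-2 addition: sum, no carries
    return (a + b + carry_in) & MASK     # 2's complement full addition

def l2_subtract(a, b):
    # Negated operand plus the forced low-order carry: a + ~b + 1 = a - b.
    return l2_add(a, ~b & MASK, carry_in=1)
```

The subtraction path shows why the forced input carry exists: complementing the operand gives the 1's complement, and the extra carry completes the 2's complement.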


Figure 6-L2 character block diagram

L3 character
The L3 character provides input/output capability
for the microprogram machine. For purposes here
input/output includes not only the usual peripherals
but also main memory, scratch pads, real time clocks,
and P-characters, namely all elements of the computer
not directly controlled by the micromemory. The L3
character provides input gating for external devices:
four buffered and three non-buffered channels. The
buffered-input gating may be controlled either by the
microprogram or by the external I/O device itself. Four
I/O output channels are provided. Interrupt signal
storage and interrupt mask storage for four channels are
available. Parity generation and checking along with
odd/even control is provided for the four buffered channels.

Assumptions and definitions

Figure 1 illustrates the interconnection structure of a
Maitra cascade.3 Every cell in the cascade is a two-input,
one-output cell. It is assumed that the Boolean variables
applied to the cascade are numbered as illustrated on the
cascade shown in Figure 1. All testing of the cascade is
accomplished using only the input leads and the output
lead of each cascade (and of arrays). The ability to
measure the functional value produced by a cell by
means of probing a buss connecting two adjacent cells is
not assumed. To minimize the "uncertainties" (the
functional values between cells cannot be measured and
the location of the error is unknown; therefore, the
functional values between cells are uncertain) involved
in testing cascades, it is assumed that cell n is tested first
(see Figure 1), then cell n-1, etc. If an error occurs in
cell n-j, its propagation may be stopped by one of cells
n-1, n-2, ..., n-j+1. Once cell n is tested, it may be
set such that it transmits the output of cell n-1 to the
output terminal of the cascade. In this manner (under
certain error assumptions) the cells may be tested in the
following order until error location results: n, n-1, ..., 1.
The number of tests needed to test a cellular cascade is
O(n)*, where n is the number of cells in the cascade.

* See Definition 6.

It is assumed that only one error (faulty cell) may
appear in a cascade. Also, the interconnections between
cells do not fail; the error is time independent, i.e.,
the error type in cell m has not changed; and the input
and output leads of the cascade do not fail.

It is assumed that the 12 allowable cell functions for a
Maitra cascade are f1, f2, f4, f5, f6, f7, f8, f9, f10, f11, f13,
and f14. (See Definition 1 for an explanation of the notation
fi.) Seven allowable errors are assumed for each cell;
these are f15 (s-a-1; stuck-at-one), f0 (s-a-0; stuck-at-zero),
f15-p (complementation, where p is the cell
function), f12 (the input X), f3 (the complement of the
input X), f10 (the input Y), and f5 (the complement of
the input Y). These seven errors consist of the two
failure types (s-a-0 and s-a-1) usually assumed by
most fault diagnosticians, augmented by f15-p, f12, f3, f10,
and f5. [Note that f10 and f5 have different allowable
error sets; i.e., Ef10 = (f0, f15, f5, f12, f3) and Ef5 =
(f0, f15, f10, f3, f12).]

Definition 1. The cell functions are numbered as
follows:

Xi Yi-1 | f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15
 0  0   |  0  1  0  1  0  1  0  1  0  1  0   1   0   1   0   1
 0  1   |  0  0  1  1  0  0  1  1  0  0  1   1   0   0   1   1
 1  0   |  0  0  0  0  1  1  1  1  0  0  0   0   1   1   1   1
 1  1   |  0  0  0  0  0  0  0  0  1  1  1   1   1   1   1   1

Definition 2. An error occurs in a cell whenever the
cell produces a function that is not the same as the
function specified for that cell.

Definition 3. G = (f1, f2, f4, f5, f6, f7, f8, f9, f10, f11, f13, f14).

Definition 4. Ip denotes (1, 2, 3, 4, ..., p).

Definition 5. The error function E is a mapping
from G x In to G, where E(fi, j) = A denotes that cell j
was theoretically to produce fi in G but instead it
produced A. Clearly, E(fj, j) = fj indicates that cell j
does not have an error occurring in it.

Definition 6. X* means either X or X', but not both.

Definition 7. O(n) means the same order of magnitude as n.
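The numbering of Definition 1 can be stated compactly: bit 2Xi + Yi-1 of the index k is the output of fk. A small sketch (ours, not the paper's) checking the identifications used throughout:

```python
# Sketch (ours): Definition 1 -- bit 2*Xi + Yi-1 of the index k gives the
# output of cell function f_k, so f12 = X, f10 = Y, f3 = X', f5 = Y',
# f0 = 0, f15 = 1, and f_(15-p) is the complement of f_p.
def f(k, x, y):
    return (k >> (2 * x + y)) & 1

pairs = [(x, y) for x in (0, 1) for y in (0, 1)]
assert all(f(12, x, y) == x for x, y in pairs)       # f12 is the input X
assert all(f(10, x, y) == y for x, y in pairs)       # f10 is the input Y
assert all(f(3, x, y) == 1 - x for x, y in pairs)    # f3 is X'
assert all(f(5, x, y) == 1 - y for x, y in pairs)    # f5 is Y'
assert all(f(15 - k, x, y) == 1 - f(k, x, y)         # f15-p complements fp
           for k in range(16) for x, y in pairs)
```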

A necessary and sufficient condition for fault
location in cascades

Location of a single fault in a cascade is considered in
this section. A necessary and sufficient condition for
location of a single fault in a cascade is proven. The


proof of Theorem 1 can be utilized to obtain an algorithm to locate faults in a cellular cascade or array.

Theorem 1. Given a cascade with n cells, then the error
can be located if and only if for every i in In - (1):

(1) E(f14, i) != f15, f12
(2) E(f11, i) != f3, f15
(3) E(f8, i) != f0, f12
(4) E(f2, i) != f0, f3
(5) E(f6, i) != f9, f12, f3
(6) E(f9, i) != f6, f12, f3
(7) E(f13, i) != f12, f15
(8) E(f7, i) != f3, f15
(9) E(f4, i) != f0, f12
(10) E(f1, i) != f0, f3
(11) E(f10, i) != f0, f15, f5
(12) E(f5, i) != f10, f0, f15

Proof:

The proof is an induction proof. Clearly,
the theorem is true for the case n = 1.
Assume that the theorem is true for a
positive integer k and consider a cascade
with k + 1 cells. Given the cell function
for cell k + 1, if it can be shown that the
error can be located in cell k + 1 if and
only if assumptions (1) through (12) are
valid for cell k + 1, then the proof is
complete.

Assume conditions (1) through (12).
This part of the proof is now completed in
Figures 3 through 14. Note that if C0,
C1, ..., Ci are used to set Yi = C at time
t1, then if Yi = C is wanted at time t2 and
C0, C1, ..., Ci are utilized again, Yi is the
same value as it was at t1; however, all that
can be said about Yi is that it is either C
or C', but not both. This fact is used in the
proof of this theorem. In the figures with
the circled function number it may be
necessary to add one more test to determine whether the cell is in error or is
receiving the complemented sequence.

Figure 3-Test decision map for f14
Figure 4-Test decision map for f11
Figure 5-Test decision map for f8
Figure 6-Test decision map for f2
Figure 7-Test decision map for f6
Figure 8-Test decision map for f9
Figure 9-Test decision map for f13
Figure 10-Test decision map for f7
Figure 11-Test decision map for f4
Figure 12-Test decision map for f1
Figure 13-Test decision map for f10
Figure 14-Test decision map for f5

The proof of the other half of the
theorem will be by contradiction. Assume
that the error can be located, but that the
restrictions (1) through (12) are not
needed. Then it can be verified that the
following pairs of conditions give the same
output at the cascade's terminal. Since the
two conditions give the same outputs, the
error cannot be located, which is a contradiction of the assumption; therefore,
the assumption that the restrictions are
not needed is incorrect and the proof is
completed. After (1) an abbreviated notation is used. Note: Using the Test
Decision Maps and the contradiction part
of this proof one can actually determine
the values of Yi-1.

(1) Yk = 1, 1, 1 and E(f14, k + 1) = f14
are equivalent to Yk = 0, 1, 0 and
E(f14, k + 1) = f15 at the cascade's
output terminal.
Yk = 0, 0, 0 and E(f14, k + 1) = f14
are equivalent to Yk = 0, 1, 0 and
E(f14, k + 1) = f12 at the cascade's
output terminal.

(2) Yk = 0, 0, 0 and E(f11, k + 1) = f11;
Yk = 0, 0, 1 and E(f11, k + 1) = f3.
Yk = 1, 1, 1 and E(f11, k + 1) = f11;
Yk = 0, 0, 1 and E(f11, k + 1) = f15.

(3) Yk = 1, 1, 1 and E(f8, k + 1) = f8;
Yk = 1, 0, 1 and E(f8, k + 1) = f12.
Yk = 0, 0, 0 and E(f8, k + 1) = f8;
Yk = 1, 0, 1 and E(f8, k + 1) = f0.

(4) Yk = 1, 1, 1 and E(f2, k + 1) = f2;
Yk = 0, 1, 1 and E(f2, k + 1) = f3.
Yk = 0, 0, 0 and E(f2, k + 1) = f2;
Yk = 0, 1, 1 and E(f2, k + 1) = f0.

(5) Yk = 1, 1, 1 and E(f6, k + 1) = f6;
Yk = 0, 1, 0 and E(f6, k + 1) = f3.
Yk = 0, 0, 0 and E(f6, k + 1) = f6;
Yk = 0, 1, 0 and E(f6, k + 1) = f12.
Yk = 1, 0, 1 and E(f6, k + 1) = f6;
Yk = 0, 1, 0 and E(f6, k + 1) = f9.

(6) Yk = 1, 1, 1 and E(f9, k + 1) = f9;
Yk = 0, 1, 0 and E(f9, k + 1) = f12.
Yk = 0, 0, 0 and E(f9, k + 1) = f9;
Yk = 0, 1, 0 and E(f9, k + 1) = f3.
Yk = 1, 0, 1 and E(f9, k + 1) = f9;
Yk = 0, 1, 0 and E(f9, k + 1) = f6.

(7) Yk = 1, 1, 1 and E(f13, k + 1) = f13;
Yk = 0, 1, 1 and E(f13, k + 1) = f12.
Yk = 0, 0, 0 and E(f13, k + 1) = f13;
Yk = 0, 1, 1 and E(f13, k + 1) = f15.

(8) Yk = 1, 1, 1 and E(f7, k + 1) = f7;
Yk = 1, 0, 1 and E(f7, k + 1) = f3.
Yk = 0, 0, 0 and E(f7, k + 1) = f7;
Yk = 1, 0, 1 and E(f7, k + 1) = f15.

(9) Yk = 1, 1, 1 and E(f4, k + 1) = f4;
Yk = 0, 0, 1 and E(f4, k + 1) = f0.
Yk = 0, 0, 0 and E(f4, k + 1) = f4;
Yk = 0, 0, 1 and E(f4, k + 1) = f12.

(10) Yk = 1, 1, 1 and E(f1, k + 1) = f1;
Yk = 0, 1, 0 and E(f1, k + 1) = f0.
Yk = 0, 0, 0 and E(f1, k + 1) = f1;
Yk = 0, 1, 0 and E(f1, k + 1) = f3.

(11) Yk = 1, 1, 1 and E(f10, k + 1) = f10;
Yk = 0, 1, 0 and E(f10, k + 1) = f15.
Yk = 0, 0, 0 and E(f10, k + 1) = f10;
Yk = 0, 1, 0 and E(f10, k + 1) = f0.
Yk = 1, 0, 1 and E(f10, k + 1) = f10;
Yk = 0, 1, 0 and E(f10, k + 1) = f5.

(12) Yk = 1, 1, 1 and E(f5, k + 1) = f5;
Yk = 0, 1, 0 and E(f5, k + 1) = f0.
Yk = 0, 0, 0 and E(f5, k + 1) = f5;
Yk = 0, 1, 0 and E(f5, k + 1) = f15.
Yk = 1, 0, 1 and E(f5, k + 1) = f5;
Yk = 0, 1, 0 and E(f5, k + 1) = f10.

If the cascade meets the assumptions of Theorem 1,
then Theorem 1 can be used to determine test schedules
for the location of an error in cascades. It should be
noted that when cell k is tested, one obtains information
about the cells k - 1, k - 2, ..., 1, and therefore a test
schedule with O(n) tests will test any cascade with n
cells under the allowable error set.6 Clearly, if the
conditions of Theorem 1 are relaxed, then fault detection
(and maybe isolation) can be accomplished in the same
number of tests; however, if one is only interested in
fault detection, Theorem 2 is the best technique to use.

If a more complex cascade than the cascades considered here is under consideration, then a good
understanding of the method used to derive the
theorems in this paper will allow one to extend the
theories presented. If the cell functions f0, f3, f12, and f15
are allowed, then the fault techniques may be easily
extended, since none of these functions depend on the Y
value; however, one must exercise care in the use of the
theory because it is based on the ability of the tester to
place theoretically both a 0 and a 1 on the Y interconnection, and examples (trivial) in which this cannot
be accomplished do exist.

Fault detection in Maitra cascades

In this section the detection of a single fault in a
cascade is considered. The theory for this section is
based on the observation that every n cell Maitra
cascade (as defined in this paper) produces a function
dependent on X0.
The purpose of this detection scheme is to utilize
exactly two tests to detect whether a cascade has a
faulty cell.

Theorem 2. Let the Maitra cascade have n cells. If
c1, c2, ..., cn are such that f(X0, c1, c2, ..., cn) = X0*, then

(1) f(1, c1, ..., cn) = f(0, c1, ..., cn)
implies that there exists a cell i such
that E(fp, i) = f0, f15, f12, or f3.

(2) f(1, c1, ..., cn) = (1*)' and f(0, c1, ..., cn) =
(0*)' imply that there
exists a cell i such that E(fp, i) =
f15-p or f5.

(3) f(1, c1, ..., cn) = 1* and f(0, c1, ..., cn) =
0* imply that there is
no error in the cascade or that there
exists a cell i such that E(fp, i) = f10
and p != 10.

Proof: In part (1) f does not depend on X0;
therefore, there must be a cell i such that
E(fp, i) = f0, f15, f12, or f3. In part (2) f
depends on (X0*)'; therefore, there is a
cell i such that E(fp, i) = f15-p or f5.
The proof of part (3) is now obvious.
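A small simulation (ours, not the paper's) of the two-test detection, using the Definition 1 function numbering and the cascade of Figure 15 with its constants c = (0, 0, 1, 0):

```python
# Sketch (ours): Theorem 2's two-test detection on a simulated Maitra
# cascade. Cell i computes y = f(k_i, Xi, y_prev), with f_k numbered as in
# Definition 1 (bit 2x + y of the index k is the cell output).
def f(k, x, y):
    return (k >> (2 * x + y)) & 1

def cascade(cells, x0, xs):
    y = x0
    for k, x in zip(cells, xs):
        y = f(k, x, y)
    return y

cells = [14, 14, 8, 6]          # the Figure 15 cascade: f14, f14, f8, f6
c = [0, 0, 1, 0]                # constants with f(X0, c) = X0
assert cascade(cells, 0, c) == 0 and cascade(cells, 1, c) == 1

# Inject a stuck-at-0 (f0) error in cell 4: the two tests now agree, so by
# part (1) of Theorem 2 some cell has an error of type f0, f15, f12, or f3.
faulty = [14, 14, 8, 0]
assert cascade(faulty, 0, c) == cascade(faulty, 1, c)
```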

X0 was chosen as the variable to be used in Theorem 2
because of the symmetry of the resulting theorem.
Since X1 can be made (by a suitable choice of constants)
to pass theoretically through every cell*, the theorem
could be rewritten in terms of X1. In terms of the
complexity of the detection scheme it is seen that
cascades could have a very simple detection test
schedule. It should be noted that Theorem 2 can very
easily be adapted to provide fault detection in cascades
if it is assumed that f10 is not an allowable error for any
of the 12 cell functions.

* Assuming the cell function for cell 1 is not f10 or f5.


Examples

This section consists of examples of the use of
Theorems 1 and 2. fA denotes the measured value of
f whereas fT denotes the theoretical value of f.

Example 1. Assume that there is no error in the
cascade shown in Figure 15.

Test  X0 X1 X2 X3 X4   fT fA   Conclusion
1     0  0  1  0  0    0  0
2     0  0  1  1  0    1  1
3     0  0  0  1  1    1  1    E(f6, 4) = f6
4     0  0  1  0  0    0  0    E(f8, 3) = f8
5     0  1  0  1  0    1  1    E(f14, 2) = f14
6     1  0  0  1  0    1  1    E(f14, 1) = f14

Example 2. Assume that E(f8, 3) = f15 in the cascade
shown in Figure 15.

Test  X0 X1 X2 X3 X4   fT fA   Conclusion
1     0  0  1  0  0    0  1
2     0  0  1  1  0    1  1
3     0  0  0  1  1    1  0    E(f6, 4) = f6
4     0  0  1  0  0    0  1    E(f8, 3) = f15

Example 3. Assume that E(f14, 2) = f3 in the cascade
shown in Figure 15.

Test  X0 X1 X2 X3 X4   fT fA   Conclusion
1     0  0  1  0  0    0  0
2     0  0  1  1  0    1  0    E(f6, 4) = f6
3     0  0  0  1  1    1  0    E(f8, 3) = f5, so an extra test is needed
4     0  0  1  0  0    0  0
5     0  0  0  0  0    0  0    E(f8, 3) != f5 and the complemented sequence Y2 is being received
6     0  1  0  1  0    1  1    E(f14, 2) = f3

Example 4. This example satisfies the hypothesis of
Theorem 2. Assume that E(f6, 4) = f0 for
the cascade shown in Figure 15.

[(X0 + X1 + X2)X3] XOR X4 = fT(X0, X1, X2, X3, X4)

fT(X0, 0, 0, 1, 0) = X0

fA(0, 0, 0, 1, 0) = fA(1, 0, 0, 1, 0) = 0 implies that there
is a cell i such that E(fp, i) = f0, f15, f12, or f3.


Figure 15-A cascade to be tested (cell functions, from cell 1 to cell 4: f14, f14, f8, f6, with X0 entering cell 1)

CONCLUSION

Techniques for fault location and detection in cellular
arrays with an allowable error set of f0, f15, f15-p, f3, f12,
f5, or f10 were described in this paper. It was shown that
the problem of testing an array could be reduced to the
problem of testing a cascade. The solutions presented
are particularly attractive because of their simplicity.
To locate an error, O(n) tests are needed for an n cell
cascade. Detection of an error requires only two tests
if the allowable error set is reduced by one error (f10).
A necessary and sufficient condition for single-error
location was given. If the restrictions of this condition
are relaxed, then an isolation theorem such as given by
Thurber6,7 can be derived; however, this isolation
condition will be more complex than the theorem given
by Thurber.6,7 A criterion that enables detection of a
single error in only two tests was derived.

Although the theories presented were derived for
regular arrays of logic, they have potentially wide areas
of application. A good understanding of the philosophies
presented here will allow the extension of the results to
cascades of m input n output cells. Also, some irregular
arrays may be tested using this theory if they can be
decomposed into sections composed of some form of a
cascaded structure (or sections composed of structures
closely resembling a cascaded structure).

ACKNOWLEDGMENT

The author wishes to thank R. C. Minnick for his help
in the preparation of this paper.

REFERENCES

1 W H KAUTZ
Testing for faults in combinational cellular logic arrays
1967 Switching and Automata Theory Symposium
2 W H KAUTZ
Diagnosis and testing of cellular arrays, properties of
cellular arrays for logic and storage
SRI Project 5876 Scientific Rpt No 3 July 1967 119-145
3 K K MAITRA
Cascaded switching networks of two-input flexible cells
IRE Trans on Electronic Computers Vol EC-11 April
1962 136-143
4 R C MINNICK
Cutpoint cellular logic
IEEE Trans on Electronic Computers Vol EC-13 Dec
1964 685-698
5 R C MINNICK
A survey of microcellular research
Journal Association for Computing Machinery Vol 14 April
1967 203-241
6 K J THURBER
Fault location in cellular arrays
PhD dissertation Montana State Univ June 1969
7 K J THURBER
Fault location in cellular cascades
Submitted to IEEE Trans on Computers
8 L M SPANDORFER  J V MURPHY
Synthesis of logic functions on an array of integrated circuits
Scientific Rpt No 1 for UNIVAC Project 4645 AFCRL-63-528
Contract AF 19(628)2907 Sperry Rand Corp
UNIVAC Engineering Center Oct 1963

Fast multiplication cellular arrays for
LSI implementation
by C. V. RAMAMOORTHY and
S. C. ECONOMIDES
The University of Texas at Austin
Austin, Texas

The methodology and retroactive design procedures
of the Multiplication Array are presented. Interconnection arrangements at the cell level, for the array
formation, as well as at the module level, by bringing all
module inputs and outputs to the terminals of the
"package" for the purpose of assembling larger multiplication units, are also shown.

Since in any LSI circuit testing imposes a complex
problem, some diagnostic schemes are suggested for
reconfiguration and operation under reduced capabilities, or even for automatically switching in a permanently connected spare module.

Other LSI considerations in terms of cell or module
fan-in/fan-out, total number of pins required per
package, chip sizes and densities, and rough cost estimates are also discussed.

INTRODUCTION

The inherent capabilities of Large Scale Integration
technology have recently shifted attention toward two
major concepts in the design of functional computer
subsystems: the concepts of Functional Modules and
Cellular Arrays.

The Functional Module concept emphasizes the
possible standardization of frequently used common
digital subsystem units such as registers, adders,
counters, etc. Because of the unique iterative properties also displayed by these units it is common to view
them as building blocks (functional modules), built
on a single substrate of material, the interconnection
of which can expand significantly their functional
capabilities. In addition to standardization, their
massive production may suggest low cost subsystems.

The Cellular Array concept allows the interconnection of several types of mutually independent logic
blocks, the cells, in various geometric configurations
to perform a desired operation.

This paper is an attempt to combine the above two
approaches in the realization of a Binary Cellular
Array multiplication unit easily adaptable to the
LSI realization techniques, and to speculate on the possibilities of the realization of other similar such functional
units, aiming to lower the cost per unit of computation and possibly increase the overall system reliability.
Multiplication was chosen in the study because it
forms the basis of division and square root operations
by iterative methods, as well as others indicated by the
design trend of present day computing systems.

Single bit multiplier

Figures 1 and 2 show the integral parts and the detailed cellular array structure of the multiplication
unit, in which each row of the array corresponds to
one bit of the multiplier. The array uses K-bit operands,
producing a 2K-bit product.

To achieve fast execution time the multiplication
is done by performing K-1 carry save additions (simple
EXCLUSIVE-OR operations) followed by a full
binary addition. Since the cells in the array operate
asynchronously, the unit as a whole can operate faster
without using a clock pulse.

We shall next explain the single-bit multiplication
unit in some detail.


Figure 1-The integral parts of the asynchronous multiplication array

Figure 2-The "single-bit" asynchronous multiplication cellular array

Let the multiplicand be represented by the binary
vector M = (m1, m2, ..., mk) and the multiplier by the
binary vector N = (n1, n2, ..., nk).
A k x (2k - 1) P matrix is now generated, starting from right
to left, whose elements pij in {0, 1} are computed with the
following conditions:

pij = 0 if ni = 0 and i - 1 < j < k + i, or if 1 <= j <= i - 1
or k + i <= j <= 2k - 1, for i = 1, 2, 3, ..., k

pij = m(j-i+1) if ni = 1 and i - 1 < j < k + i, for i = 1, 2, 3, ..., k

In terms of the array to be implemented, this condition
implies that for the range of "i," "j" where pij = 0 no cell
will be required to perform a logic function. Thus the
[P] matrix has the following form:

p1,2k-1 ... p1,k ... p13 p12 p11
p2,2k-1 ... p2,k ... p23 p22 p21
...
pk,2k-1 ... ... pk1

The following example will illustrate the above
matrix formation.

EXAMPLE

MULTIPLY M = (10101) and N = (11111); then the P matrix is

P =
0 0 0 0 1 0 1 0 1
0 0 0 1 0 1 0 1 0
0 0 1 0 1 0 1 0 0
0 1 0 1 0 1 0 0 0
1 0 1 0 1 0 0 0 0

The above matrix can be realized by selective ANDing of components of M and N. This "Shifting Network" accomplishes the proper positioning of the
numbers to be added before their addition, just as in
the conventional multiplication. Arrays of Carry
Save Adders are used to perform the addition of these
binary numbers utilizing Wallace's algorithm.1

The first stage of the Carry Save Adder adds the
first two rows of the P matrix (the first two generated
partial products), thus generating two vectors, the
first partial sum and the first carry, having the form:

S = (s1,2k-1  s1,2k-2 ... s1,k ... s11)

The double subscript is used to identify the above
vectors with corresponding positions of the P matrix
that contribute to their generation.
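The P-matrix construction can be sketched as follows. This is our illustration; the little-endian bit lists are an implementation choice of the sketch, not the paper's notation.

```python
# Sketch (ours): the k x (2k-1) P matrix -- row i is the multiplicand M
# ANDed with multiplier bit n_i and shifted left i-1 places.
def p_matrix(m_bits, n_bits):
    k = len(m_bits)
    rows = []
    for i, n in enumerate(n_bits):
        row = [0] * (2 * k - 1)
        for j, m in enumerate(m_bits):
            row[i + j] = m & n          # p_ij = m_(j-i+1) * n_i
        rows.append(row)
    return rows

# The paper's example: M = 10101, N = 11111 (MSB-first as printed).
M = [1, 0, 1, 0, 1][::-1]
N = [1, 1, 1, 1, 1]
rows = p_matrix(M, N)
for row in rows:
    print(''.join(map(str, reversed(row))))   # prints the matrix MSB-first

# The rows sum to the product: 21 * 31 = 651.
assert sum(sum(b << j for j, b in enumerate(r)) for r in rows) == 651
```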

The logic functions yielding the elements s2j and c2j
are:

s2j = s1j XOR c1,j-1 XOR p3j
c2j = s1j c1,j-1 + s1j p3j + c1,j-1 p3j

where j = 2, 3, ..., 2k - 1. The composite cells are
shown in Figure 3a.

In the subsequent stages the Carry Save Adder will
add three vectors: the sum vector generated at the
previous stage, the carry vector generated at the
previous stage shifted once to the left, and the next row
vector of the P matrix.
The logic functions producing the new s and c vectors
are of the same form: the "S" cell produces

sij = s(i-1),j XOR c(i-1),j-1 XOR p(i+1),j

the "EXCLUSIVE-OR" function of three variables, and
the "C" cell produces

cij = s(i-1),j c(i-1),j-1 + s(i-1),j p(i+1),j + c(i-1),j-1 p(i+1),j

the MAJORITY function of three variables.
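A sketch (ours) of one carry-save stage built from these cells, checking the defining property that the sum and carry vectors together preserve the total:

```python
# Sketch (ours): one carry-save stage. The "S" cell is the three-input
# EXCLUSIVE-OR and the "C" cell the three-input MAJORITY function; the
# carry vector is shifted once to the left before the next stage uses it.
def csa(a, b, d):
    s = [x ^ y ^ z for x, y, z in zip(a, b, d)]
    c = [(x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, d)]
    return s, c

def to_int(bits):                      # little-endian bit list to integer
    return sum(bit << j for j, bit in enumerate(bits))

a, b, d = [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 1]   # 13, 11, 14
s, c = csa(a, b, d)
# No carries propagate inside the stage; the pair (s, c) carries the total:
assert to_int(s) + 2 * to_int(c) == 13 + 11 + 14
```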

L1 activates the single multiple of the multiplicand (the first "AND" gate row of each group of rows in the ESA). L2 activates the 2's complement of the multiplicand (the second "AND" gate row, directly under each row of inverters). L3 activates the double multiple of the multiplicand. Therefore, the typical cell of the ICC has B1, B2, and Co as inputs and L1, L2 and L3 as outputs. Its logic functions are shown below. B1 and B2 are any two consecutive bits and Co is the carry-out of the preceding ICC cell. The logic (L1 selects M, L2 selects the 2's complement of M, L3 selects 2M):

B1 B2 Co | L1 L2 L3
0  0  0  |  0  0  0
1  0  0  |  1  0  0
0  1  0  |  0  0  1
1  1  0  |  0  1  0
0  0  1  |  1  0  0
1  0  1  |  0  0  1
0  1  1  |  0  1  0
1  1  1  |  0  0  0

Note: The interpretation of B1B2 = 01 is not one times the multiplicand, as it would obviously appear, but instead two times the multiplicand, because of the way the multiplier is placed in the register, vertically with the least significant bit on the top. The B1, B2 = 10 combination is interpreted in a similar manner. The typical cell "K" of the ICC is shown in detail in Figure 5b.

The carry save adder, end around carry accumulator and full binary adder

A layout of the inputs to the CSA stages, the EACA and FBA is displayed below. The groups of binary numbers between the lines represent the actual inputs to a particular row of cells. The first three groups are CSA row inputs. The fourth group represents the EACA inputs and the final group, those of the FBA. All binary numbers representing partial products are of course P matrix row vectors activated by the ICC lines due to a particular multiplier bit pair combination.
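The ICC recoding can be sketched as follows. This is our reconstruction of the garbled table, assuming the pair value B1 + 2·B2 + Co selects among 0, +M, +2M, and -M with a carry of 4M passed to the next pair; the function names are ours.

```python
# Sketch (ours): ICC recoding of a multiplier bit pair (B1, B2) plus the
# incoming carry Co into select lines L1 -> +M, L2 -> -M (2's complement),
# L3 -> +2M; pair values 3 and 4 propagate a carry (worth 4M) onward.
def icc_cell(b1, b2, co):
    table = {0: (0, 0, 0, 0),   # (L1, L2, L3, carry_out)
             1: (1, 0, 0, 0),   # +M
             2: (0, 0, 1, 0),   # +2M
             3: (0, 1, 0, 1),   # -M, carry 4M onward
             4: (0, 0, 0, 1)}   # carry only
    return table[b1 + 2 * b2 + co]

def recoded_value(bits):        # bits little-endian, even length
    """Check: the selected multiples sum back to the multiplier value."""
    total, co = 0, 0
    for i in range(0, len(bits), 2):
        l1, l2, l3, co = icc_cell(bits[i], bits[i + 1], co)
        total += (l1 - l2 + 2 * l3) << i
    return total + (co << len(bits))

assert all(recoded_value([(n >> j) & 1 for j in range(6)]) == n
           for n in range(64))
```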
[Worked layout, groups top to bottom: the 1st and 2nd partial products; the 1st partial sum and 1st carry with the 3rd partial product; the 2nd partial sum and 2nd carry with the 4th partial product; the 3rd partial sum and 3rd carry with the End Around Carries; the 4th partial sum and 4th carry; and the Final Sum (Result).]

Figure 6-The binary multiplying cellular array

Figure 6 shows the array after superimposing the individual circuits. It can be easily noticed that there is a reduction by a factor of two in the total number of cell rows required for the array, and therefore in the total final propagation Tp, at the expense of some additional control logic, a number of inverters and an additional stage for the EACA. No further complexity in the cell structure results; thus the originally developed cells were used, with a minor modification for cell S as shown in Figure 7a. This cell may also be present in the single bit multiplication array. It must also be noticed that the overflow of bits resulting in the left-most significant part of the final product register may be advantageously utilized for sign and decimal point considerations.

Figure 7a-Cell "S'"-A form of Cell "S"

Figure 7b-Cell "R"-Reconfiguration cell

Diagnostics and reconfiguration

In order to incorporate diagnostics in the array and study the interconnection problem, a standard size module had to be assumed. It was felt that the implementation of a 64 X 64 bit multiplier would be a good choice for all practical purposes. An interconnecting scheme of standard dimension 64 X 8 bit modules to realize the 64 bit multiplier was then devised, aiming to minimize the number of pins per module necessary for the interconnection. As seen in Figure 8, the resulting 64 X 64 multiplication unit requires 2 Full Binary addition stages and 4 Carry Save addition stages per module, a total of 32 Carry Save additions and 15 Binary Additions (only one for the first module). However, there is a real time overlap between these various stages, and by utilizing a pipelining technique and a series of flip-flops after each FBA, a 100 percent utilization of the unit during computation is achieved, and the multiplication cycle is considerably faster. This is illustrated shortly in connection with Table III.

Figure 8-Example of an assembled 64 X 64-bit multiplication unit using the pipelining scheme

The basic module as displayed in Figure 6 has to be modified further for the interconnection. An extra FBA and additional gating for diagnostic purposes is introduced in every module between the output of its respective FBA and what is shown as a product register. The typical newly developed cell for the diagnostics and reconfiguration is shown in Figure 7b, while the above mentioned modifications are displayed in detail in Figure 9 for a typical module. As seen, three additional control lines are needed to perform the following functions:

a. To relay a Fault or No-Fault signal, indicating that a fault has or has not occurred in one particular module (NF/F) (e.g., if F = 0, NF = 1).
b. To relay a No Shift signal for the output of this module (NS = 1) if no fault has occurred in the preceding module.
c. To relay a shift, eight bits to the right (S = 1), for the output of this and all subsequent modules if a fault has been detected in the preceding module.

The detection of the fault could be accomplished by a software routine which may check the final product of the unit periodically and appropriately set the flip-flops of the control signals. By shifting the outputs of all modules subsequent to the malfunctioning one eight bit positions to the right, while forcing the output of the faulty module to be equal to zero at the same time, and simultaneously introducing the spare module which is permanently connected to the unit, one can still achieve 100 percent computational efficiency. If another module fails to function properly, by applying again the same reconfiguration scheme the unit will function with a reduced capability, since the eight least significant bits of the multiplier will be lost. No provision has been made at this point if two modules fail to function properly at the same time. At least one of them must be replaced to put the multiplication unit back in service.

Aiming to maximize the number of multiplications per unit time, as already mentioned, one can introduce storage elements at intermediate points. This allows the unit to accept a new set of operands without waiting for the total completion of the present computation. Consider an m X m bit multiplier module. If the intermediate computations are stored after the Carry Save adders, the first Binary adder and the second Binary adder, the rate of multiplications in the module per unit time will be

Rm = 1 / max [tcs, tb]

where

tcs = total time propagation through the CSA
tb = total time propagation through the FBA for the binary addition of two m-bit binary numbers.

Then the number of storage elements required per module is 2m + m + m = 4m.
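The shift-by-one-module reconfiguration can be sketched abstractly as follows. This is our illustration only; the paper implements it with the combinational gating of Figure 9.

```python
# Sketch (ours): shift reconfiguration. Each module yields an 8-bit slice
# of the product; on a detected fault the faulty slice is dropped (its
# output forced to zero) and all subsequent slices shift right eight bit
# positions, the permanently connected spare joining at the far end.
def reconfigure(slices, faulty, spare):
    """slices: per-module output slices, least significant slice first."""
    return slices[:faulty] + slices[faulty + 1:] + [spare]

slices = [0x11, 0x22, 0x33, 0x44]
assert reconfigure(slices, 1, 0x55) == [0x11, 0x33, 0x44, 0x55]
```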
If, however, storage elements are inserted at the outputs of the two Binary Adders only, as shown in Figure 8, the maximum rate of multiplications in each module per unit time will be

R'm = 1 / (tcs + tb)

while the total number of storage elements required will be decreased by half, that is, 2m.

Figure 9-The combinational logic gating for reconfiguration

The table below gives the sequence of events in the first four modules of the 64 X 64 composite multiplier unit of eight modules, based on the pipelining technique.

Table III

Time units:   1    2    3    4    5
Module 1:    B11  B21  B31  B41  B51
Module 2:         B12  B22  B32  B42
Module 3:              B13  B23  B33
Module 4:                   B14  B24

Each time unit in the above table corresponds to the factor tb + tcs, and Bij represents the jth binary addition of the ith multiplication.

Approximate number of GATES/CELL*

For cell "C" approximately seven gates are required
For cell "S" (and "S'") approximately three gates are required
For cell "R" approximately two gates are required
For cell "K" approximately nine gates are required

Figure 10-An alternate interconnecting scheme for the 8 modules of the 64 X 64 multiplication unit

Another interconnecting scheme, which has not been investigated yet in detail but seems to be equally as efficient, considerably faster and adaptable to the proposed reconfiguration technique, is the one shown in Figure 10, where each level of nodes represents FBA's performing in parallel, with a correspondingly shorter anticipated multiplication cycle.

LSI implementation

The implementation shown for the 64 X 8 module reveals a number of characteristics suitable for large scale integration. Among them are the repetitive interconnections of simple identical cells and the modularity suitable for expansion and reconfiguration. Below some of the approximate hardware requirements are pointed out.

Approximate number of PINS/MODULE

1. m + n + 2 needed for the multiplicand register
2. m + n + 2 needed as inputs to the second FBA
3. m + n + 2 needed for the product
4. n + 2 needed for the multiplier register
5. three control pins for reconfiguration

Approximate number of CELLS/MODULE

The cells are the kinds already discussed: C, S, S', R, K. All are present in a module.

1. m X n/2 cells needed for the CSA stages
2. m + n cells needed for the EACA stage
3. m + n reconfiguration cells
4. 2(m + n + 2) cells needed for the two FBA's
5. n/2 + 1 cells needed for the ICC

The above estimates point out the fact that testing at the individual cell or circuit level (an item yet to be examined) becomes a problem, especially when the complexity of the chip is increased, with a paralleled decrease in reliability and yield of non-defective chips. However, using the modular approach it is advisable to perform the testing externally on the module and discard the malfunctioning units. This would considerably decrease the amount of logic on a chip, which would otherwise have to be inserted for the testing of the individual circuits. This approach seems to be economically feasible, since it is estimated that by 1970 an LSI chip of 100 X 100 mils in size may contain 200 components, at five cents per component, while by 1975 an LSI chip of 300 X 300 mils in size may contain as many as 3,600 components at a cost of about one cent per component. Therefore, miniaturization of LSI chips will discourage testing at the individual circuit level, while the loss due to the discarding of modules after testing at the frame level will be negligible.
In view of the above considerations, and since present state-of-the-art high density MOS circuits are being driven at 10 MHz, implementation of multiplier modules such as the one presented by MOS circuits appears very desirable from a manufacturing viewpoint. A reasonable building block might be a 64 X 64 bit multiplication unit requiring approximately 5000 active elements (field effect transistors). One could also visualize the whole unit incorporated in one or two chips. Where speed is the primary requirement, the unit can be designed using fast bipolar transistors, with an expected five ns delay. Assuming then a 64 X 64 bit module implemented by bipolar transistors, the execution time could be in the neighborhood of 0.225 microseconds, and when pipelined, the maximum number of multiplications per second may be approximately 5 X 10^6. An MOS array of the same module will perform an order of magnitude slower than in the bipolar case.

* The above gates are mostly "AND" gates, with the "OR" gates not included in the count. There are also 2(m + n) additional gates needed for the reconfiguration scheme and m X n gates for shifting each array.

98 Fall Joint Computer Conference, 1969

The pin count also indicates that the current design is within the state of the art of MOS technology. The performance figures given above are educated guesses, since the circuit and intermodule delays are dependent on the circuit types, their interconnections, the chip topology, etc. In addition, the design examples described in the previous sections indicate the ease with which the array can be partitioned to fit reasonable unit or chip sizes. The design of self-diagnosable and repairable functional arrays appears quite feasible and worth considering.
The possibility of a composite multiplication, division, and square-rooting unit using the techniques presented in this paper could be very useful, particularly if the division and square-root algorithms are based on the availability of fast multiplication units such as those discussed here.

CONCLUSION

Since fast multiplication has become the basis of iterative division and square rooting in fast computers,6,7 there appears to be a need for cheap, array-type, LSI-realizable multiplication subsystems. This paper reports the design methodology and the detailed implementation of one such structure. Ease of diagnosis and capability of reconfiguration were used as twin requirements in the final design. When the unit is composed of a number of modules and a malfunction is detected in one of them, a method of switching in a spare module automatically was presented. An estimate of the logic circuitry in the hard core (that portion of the unit which must operate without any faults during testing) is found to be less than 14 percent for a 32 X 32 module, 9.7 percent for a 64 X 64 module, and 4 percent for a 128 X 128 module. Therefore, as the size of the multiplication module-unit increases, the relative size of the hard core decreases very rapidly. To conclude, the cellular-array implementation of an asynchronous multiplication unit using mostly non-carry-propagating Carry Save adders was accomplished. The final cell design and the control and reconfiguring circuitry are quite simple. A number of additional studies needs to be done in the future.

ACKNOWLEDGMENTS

The authors would like to thank Mr. Gary Wang of the NASA Electronics Research Center for sharing with them some of his thoughts on the subject, and Mr. W. R. Adrion, graduate student at the University of Texas at Austin, for his constructive suggestions.
REFERENCES

1 C S WALLACE A suggestion for a fast multiplier IEEE Trans on Electronic Computers Vol 13 No 1 Feb 1964
2 Methods for high-speed addition and multiplication NBS Circular No 591 1958
3 O L MACSORLEY High-speed arithmetic in binary computers Proc IRE Vol 49 No 1 Jan 1961
4 M LEHMAN Short-cut multiplication and division in automatic binary digital computers Proc Inst Elec Eng Paper No 2693M Vol 105B Sept 1958
5 I FLORES The logic of computer arithmetic Prentice-Hall Inc 1963
6 D FERRARI A division method using a parallel multiplier IEEE Trans on Electronic Computers Vol 16 No 2 April 1967
7 S F ANDERSON et al The IBM System/360 Model 91: Floating-point execution unit IBM Journal of Research and Development Jan 1967

The Pad Relocation technique for interconnecting LSI arrays of imperfect yield

by D. F. CALHOUN
Hughes Aircraft Company
Culver City, California

INTRODUCTION

The interconnection of circuits required in Large Scale Integration (LSI) using multi-level metalization above monolithic semiconductor arrays is taking basically two approaches. One is predicated on processing, with a reasonable yield, entire arrays without any semiconductor defects (i.e., 100 percent yield chips), which allows once-generated fixed wiring patterns to obtain the required interconnect. The second approach aims at much larger semiconductor arrays (i.e., full-slice LSI) for which defect-free processing cannot be expected. Thus, probe tests are made of the semiconductor circuits processed on each LSI slice (or wafer), and a record is made of the good and bad circuit positions. Unique interconnection masks are then generated to interconnect good circuits in each wafer's particular yield pattern, using certain "discretion" in avoiding the bad circuits.
As a result, the 100 percent yield approach emphasizes the need to use standard interconnect masks but is complexity-limited by the occurrence of defective circuits in larger arrays, whereas approaches capable of routing around the defective circuits have required a full set of unique signal interconnect masks for each wafer's particular yield pattern. The Pad Relocation approach, however, allows the interconnection of full-slice LSI arrays containing defective circuits to be accomplished with a minimal amount of unique interconnect per array. Only a portion of one of the typically three interconnect levels varies from array to array, thus allowing significant improvements in the cost, reliability, and testability of the finished arrays as well as less limitation on cell yields and array complexities.

Description of the Pad Relocation technique

Pad Relocation is a technique which allows a predetermined standard pattern of good circuits to be established on all LSI slices used to perform the same array function, regardless of the varying yield patterns determined by DC wafer probe tests. This is accomplished by relocating the pads of nearby good circuits to the positions where good circuits were specified by a prescribed master pattern but were not found during wafer probe tests. The pad positions above a bad circuit (or any unused circuit) are isolated from that circuit by a layer of dielectric. Where good circuits are found in expected good-circuit locations, those circuits are used without relocation. Thus, the Pad Relocation technique functionally establishes a specified pattern of good circuits as if there had actually been a 100 percent circuit yield in that pattern. A single wiring pattern can then be generated for all the LSI arrays of the same function to accomplish the much more complex signal interconnect between the master pattern circuits.
By determining standard cross-under areas within the Pad Relocation layer where relocation lines need never occur, it has been shown that large arrays can be interconnected with the same number of total interconnect layers as required by discretionary techniques. With each wafer's good circuits located in the predetermined master pattern, an optimal standard interconnect of the circuits can be made for each wafer. Since this signal routing and mask-making expense is incurred only once for each function, much more effort can be spent optimizing the signal routing. As a result, the total number of interconnect levels (including Pad Relocation) may actually be fewer (for very complex arrays) than with other techniques by which the interconnect is generated for each wafer's particular yield pattern. The Pad Relocation technique has been 100 percent successful for all integrated circuit and special LSI wafers considered so far.

The "master pattern" gives the prescribed locations of good circuits to which each LSI array's particular yield will be tailored. Statistically, if M is the percentage of wafer circuits in the master pattern and Y is the wafer circuit yield from probe tests, then only M(100 - Y)/100 percent of all wafer circuits need to be relocated. For example, if Y = 35 percent and M = 30 percent, then the relocation (as a statistical average) of 19.5 percent of the wafer circuits will establish a master pattern that uses 86 percent of all the good wafer circuits. This would allow 120 good circuits to be located in prescribed positions, leaving an average of only 20 good circuits unused.

An example

The methodology of the Pad Relocation technique is best described by example. Figure 1 shows the mapping of circuits on an LSI wafer. Each dot represents the position of a semiconductor cell such as a full adder, a quad two-input NAND gate cell, a flip-flop, etc.
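The relocation statistics quoted earlier (Y = 35 percent, M = 30 percent) follow directly from the definitions, as the minimal sketch below shows. The function names are illustrative, and M ≤ Y is assumed so that the master pattern can actually be filled.

```python
def percent_relocated(M, Y):
    """Expected percentage of all wafer circuits that must be relocated,
    with M = percent of circuits in the master pattern and Y = percent yield."""
    return M * (100 - Y) / 100

def percent_good_used(M, Y):
    """Percentage of the good wafer circuits the master pattern consumes
    (valid only when M <= Y)."""
    return 100 * M / Y

print(percent_relocated(30, 35))         # -> 19.5
print(round(percent_good_used(30, 35)))  # -> 86
```

With roughly 400 circuit positions per wafer, these percentages reproduce the 120-used / 20-unused good-circuit split cited in the text.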
Figure 1—Integrated circuit wafer

Figure 2—Wafer after test; slashes show good circuit positions

Figure 2 identifies with a slash (/) the location of all circuits determined to be good by DC wafer probe tests on a particular slice. The yield of wafer circuits varies from 10 percent to 90 percent depending on the circuit complexity, and the locations of the good circuits cannot be predicted from wafer to wafer. This makes it impossible to use standard interconnect patterns without first transforming the various wafer yield patterns to a single standard pattern. The circuit yield (the percent of total circuits which are good) for the wafer in Figure 2 is nearly 30 percent, and yet there is not a single area of 100 percent yield larger than three circuits by two circuits. Thus, 100 percent yield approaches could obtain units with only about 5 percent of the complexity allowed by full-slice interconnection techniques. The goal is to tailor, by some efficient means, the locations of the good circuits in Figure 2 to a standard pattern that may be used for all wafers with about the same circuit yield. For higher yield wafers, there are other standard patterns which use more good circuits. Figure 3 shows a master pattern (in heavy dots) which can be used for wafers having at least a 25 percent yield.
That pattern is characterized by a more dense usage of good circuits toward the center of the wafer, with good-circuit positions never adjoined on more than one side by another circuit in the master pattern. The latter characteristic facilitates the routing of standard signal interconnect as well as the relocation of circuits in at least three directions. The matching of the master pattern to the expected yield distribution, as a function of distance from the wafer center, optimizes the conflicting goals of a minimum number of relocations and a maximum probability of fulfilling the master pattern.

Figure 3—A master pattern of good circuits; all wafers will be matched to this pattern by the Pad Relocation technique

Figure 4 shows the Figure 3 master pattern superimposed on the particular wafer yield of Figure 2. The objective now is to route a nearby good circuit, shown by a slash, to each heavy dot (i.e., master pattern position) which initially is without a good circuit.
This specification can be completed manually, giving a coding-sheet description of necessary circuit relocations; or a simple computer routing program can output a punched tape or cards that can be used to make a mask automatically. The computer routine for Pad Relocation will use about two orders of magnitude less run time than a customized signal routing, primarily because no circuit placement or logic signal routing is required. Pad Relocation requires only that a good circuit be identified for relocation to each position in the master pattern which did not initially have a good circuit. A later paper will present work that is under way to automate the Pad Relocation selection and specification with the use of interactive graphics.

Figure 4—Master pattern superimposed on the particular yield of the Figure 2 wafer

Figure 5—Specification of a set of relocations necessary to completely implement the master pattern of Figure 3

Figure 5 shows a manually generated specification of possible relocations that completely satisfies the master pattern of Figure 3, using the good circuit positions of the wafer in Figure 2. The longest relocation line length is less than 0.45 inch. Figure 6 shows how the relocation in area A of Figure 5 can be accomplished without crossovers for a quad two-input gate cell. Each gate of the bad circuit at the lower left is functionally replaced with a good gate from the top right circuit.
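The selection step just described — identify a nearby good circuit for each unfilled master-pattern position — can be sketched as a tiny greedy routine. This is an illustrative reconstruction, not the authors' program: the grid coordinates, the Manhattan distance metric, and the greedy visiting order are all assumptions, and the sketch assumes enough good circuits exist to fill the pattern.

```python
def plan_relocations(master, good):
    """master, good: sets of (row, col) cell positions on the wafer grid.
    Returns {target: source} pairs for master positions lacking a good circuit."""
    free = set(good) - set(master)   # good circuits not already in a master position
    plan = {}
    for target in sorted(master):
        if target in good:
            continue                 # circuit in place is good: no relocation needed
        # pick the closest still-unused good circuit (Manhattan distance)
        source = min(free, key=lambda p: abs(p[0] - target[0]) + abs(p[1] - target[1]))
        free.remove(source)
        plan[target] = source
    return plan

master = {(0, 0), (0, 2), (2, 2)}
good = {(0, 2), (1, 0), (2, 1)}
print(plan_relocations(master, good))  # {(0, 0): (1, 0), (2, 2): (2, 1)}
```

Because no logic placement or signal routing is involved, even this naive search runs in time proportional to the number of unfilled positions times the number of free good circuits, which is consistent with the two-orders-of-magnitude speed advantage claimed over customized signal routing.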
It should be noted that the computer needs only subroutines for leaving (or entering) a cell from the top, bottom, left, and right, for moving parallel lines over some number of cells, and for making ninety-degree turns in order to do all the possible Pad Relocation routing patterns. Figure 7 shows the actual Pad Relocation of an SN5480 gated full adder above a silicon wafer using 0.002-inch aluminum lines on 0.0035-inch centers. Figure 8 shows how simple the Pad Relocation mask is if it is considered as a set of the above-mentioned subroutines.

Figure 6—A set of pad relocations necessary to replace functionally the quad two-input gate circuit in area A of Figure 5

Figure 7—Pad Relocation of an SN5480 gated full adder above a silicon wafer (using 0.002-inch aluminum lines on 0.0035-inch centers)

Intermediate step to full-wafer LSI

Figure 9 shows an intermediate step to full-wafer LSI using the Pad Relocation technique. Three 4-bit Modular Multiplier modules are to be fabricated from the three bordered half-inch-square areas (as was suggested in a 1968 FJCC paper by D. F. Calhoun). Within the three bordered areas, slashes again represent good circuits and circles show the master pattern locations. The lines terminating in arrowheads show how three, eight, and five good circuits can be relocated into the circled positions to establish the same pattern of good circuits for each module, thus allowing the use of one standard signal interconnect pattern for all subsequent modules tailored to that pattern. Figure 10 demonstrates the simplicity of a coding-sheet specification of such circuit relocations.
Figure 8—Mask pattern for the pad relocations specified in Figure 5

Figure 9—Pad Relocation routing for three 200-gate modules on a single 1½-inch wafer

Figure 11—Four relocation patterns for SN5480's for the three multipliers of Figure 9

Figure 11 shows the four possible Pad Relocation interconnect patterns which are necessary for the LSI multipliers. For these modules it seems appropriate to incorporate simple signal cross-under lines and power distribution in the Pad Relocation level so as to require only two additional levels of interconnect above the tested LSI chips.

A Pad Relocation LSI hardware program

An LSI hardware development program began in January 1969 (in which Hughes Aircraft Company contracted Texas Instruments to do the multi-level processing) and resulted in fully tested and packaged 207-gate arrays in May 1969. During this program, (1) TI fabricated and tested one type of their LSI wafers having a certain mix of gates and flip-flops, (2) TI supplied the yield information on each wafer to be processed for Hughes, (3) Hughes generated both the one standard signal interconnect mask for all wafers as well as an individual Pad Relocation mask for each wafer, and (4) using the mask specifications from Hughes, TI processed the two additional levels of interconnect and tested and packaged each of the finished units. Similar programs for higher complexity arrays have since been initiated.
The results of this program are described below.

The logic array to be built in LSI

Figure 10—Coding sheet specification

Investigations were made three years ago at Hughes Aircraft Company into the application of LSI arrays to techniques for doing the very high speed sum-of-products computations required in advanced digital filtering systems. A result of this study was the development of the high speed "Modular Carry Advance Multiplier," which was described in a 1968 Fall Joint Computer Conference paper by D. F. Calhoun. Among its characteristics is its modularity, which allows longer-wordlength multiplications to be efficiently accomplished (in terms of speed and parts) simply by paralleling more of the identical modules. A 5-bit sign-and-magnitude Modular Multiplier designed with four types of logic gates and a JK flip-flop was thus chosen as the vehicle for LSI development on this program. Such an array forms, and stores in a register, the 9-bit sign-and-magnitude product of two 5-bit operands. The 5-bit multiplier design uses 153 NAND gates and 9 flip-flops (each equivalent to six NAND gates) for a total of 207 interconnected gates per LSI wafer. The logical interconnection of 207 gates using less than one square inch of an LSI wafer represents well any state-of-the-art bipolar LSI approach. Two levels of interconnect (including the Pad Relocation) were used above the tested wafer, which already had a first level of metalization for component interconnect. In terms of cross-over complexity, signal line-lengths, and circuit fan-outs, the Modular Multiplier design can be considered typical of a 200-gate logic array.

Figure 12—Texas Instruments LSI type "K" slice (HAC Photo 4R07185)

Description of the chosen LSI slice
Basically, the K slice is a hiploar array of transistor-transistor logic (TTL) ga~es and flip-flops occupying an active area of about 11.1 square inches. A picture of this LSI wafer is shown in Figure 12. The array is subdivided into 298 cell!3 of dimension 0.084 inch by 0.044 inch. Of the 298 Basic wafer cells, 170 are split into two 42 by 44 mil halt-cells for gates while the 128 JK flip-flops on the wafkr occupy full 84 by 44 mil cells. The distribution of logic elements on the K slice is shown in Figure 13. Each cell labeled "3" has two independent three-input NAND gates while the adjacent cells labeled "5" have an independent five-input NAND gate and a on~-input NAND gate. In three of the rows of gates ~ single seven-input NAND gate designated by a "7" was processed instead of two three-input NAND gates. The rows of fullsized 84 by 44 mil cells contain the JK flip-flops, which are labeled "FF". In total there! are 642 logic gates (170 ones, 264 threes, 170 fives, 'and 38 sevens) and 128 JK flip-flops processed on the wafer. 
Figure 13—LSI array slice "K" (distribution of the "3", "5", "7", and "FF" cells over the 1188 by 1176 mil array)

Selection of the master pattern and pad relocation patterns

First, a master pattern of circuits was chosen to define the standard circuit positions on the K slice that would be interconnected to form the Modular Multiplier function. This master pattern (shown in Figure 14) was defined with respect to (1) maximizing the probability of successful fulfillment, Pr(M), of the master pattern, (2) facilitating the standard signal interconnect, and (3) using a minimum number of relocation patterns efficiently.
After the master pattern and the repertoire of relocation patterns to be used were determined, restricted areas in the Pad Relocation level were defined to allow signal cross-unders from the standard top-level signal interconnect. Sufficient cross-under capability for this design was found in the flip-flop cells alone, by using certain areas of these cells which are not required by any of the defined relocation patterns. Other cross-under areas can be defined for any more complex designs so as to still use only two metalization layers above the tested circuits. A set of Pad Relocation patterns was prepared to allow the efficient selection of the particular patterns and their positions necessary to fulfill each wafer's master pattern. The chosen set of K slice relocation patterns is shown in Figure 15. This semiautomated specification has facilitated a very fast turnaround and low-cost capability for the generation of Pad Relocation masks and for working with new routing requirements, wafer layouts, and logic designs.

Master pattern cell designation key: distinct symbols mark the 1-input gates, 3-input gates, 4-input gates, and JK flip-flops

LSI program results

The end results of the Hughes effort described in this section were the two metalization mask specifications used by TI to process each wafer. Only one of these is unique, since the use of Pad Relocation allows all signal interconnect to be obtained from a once-generated standard mask. Figure 14 shows the worksheet specification of how the yield of a typical LSI slice can be tailored to the chosen master pattern. The lines with arrowheads at the end specify relocation patterns from the set of patterns shown in Figure 15. The completion of the K slice master pattern was accomplished successfully on each of the 30 wafers attempted. A typical time for a man to complete and verify the specification shown in Figure 14 was two minutes manually.
From specifications like those in Figure 14, the necessary relocation patterns were selected from the standard set shown in Figure 15 and were added to the standard cross-under pattern to complete the Pad Relocation mask, such as the one shown in Figure 16. Only the particular circuit relocation patterns vary within this mask, which allows the least possible variation of interconnect and testing from one array to another. The more complex but standard mask is the one shown in Figure 17, which accomplishes all necessary signal interconnect (except the cross-unders to the Pad Relocation level) and the power distribution for the 5-bit multiplier design. The design for this mask can efficiently be done manually for arrays of this and larger size, since the master pattern is well distributed. In mask plotting time alone, the Pad Relocation mask required only about 20 percent of the time required to plot the signal interconnect metalization patterns. A photograph of the final 207-gate LSI multiplier is shown in Figure 18.

Figure 14—Pad Relocation worksheet with master pattern locations shown

Figure 15—Set of K slice relocation patterns

Statistics of Pad Relocation master patterns

The choice of a master pattern for Pad Relocation is important, since its definition affects the average number of relocated circuits (and thus the routing time and mask complexity) as well as the number and simplicity of the signal interconnect levels. Also, a good statistical match between the master pattern and the expected wafer yield distribution will result in a higher probability of successful relocation. As an example, consider a master pattern that is defined too densely about a wafer's periphery. Since peripheral wafer circuits show a much lower yield than the more central
ones, there will statistically be more relocations, longer relocation lengths, more difficulty in satisfying the master pattern, and a higher concentration of signal interconnect above the master pattern than if the master pattern had been chosen to match the "expected" yield distribution, as was done for the example shown in Figure 3.

Figure 16—Pad Relocation mask with standard cross-unders

Figure 17—5-bit Modular Multiplier standard interconnect mask

Figure 18—207-gate multiplier LSI array using Pad Relocation (HAC Photo 4R09152)

A first question that must be answered is: what is the "expected" yield distribution? Investigations thus far have pointed out only that there is a significant decrease in yield as a function of the distance from the wafer center, which can be attributed to boundary defects, and that when good or bad circuits occur, there is a more than random clustering effect. No ability to predict the locations of these clusters has been obtained. What must be done is to examine the yield of large samples of the wafer types that will be used, to determine the distribution that best describes their expected yield patterns. This distribution will be different for different ranges of yield as well as for different circuit complexities and wafer types. The master pattern for a specific range of yield, wafer type, and wafer size should be matched to the expected distribution so as to take advantage of any knowledge of where good circuits are more probable. By so doing, the probability of successfully fulfilling a master pattern is maximized while minimizing the expected length of the longest relocations.
Statistical techniques have been developed to determine and compare the efficiency of various master patterns in terms of maximizing both the utilization of good circuits and the probability of successfully fulfilling the master pattern. For example, let y be the fraction of the total circuits that were found to be good (i.e., the yield), m the percentage of total circuits that are in the master pattern, and r the number of unused circuits from which a relocation could be made to each master pattern circuit. Then the probability of successfully fulfilling each master pattern circuit independently is

P(1) = y + (1 - y)y + (1 - y)^2 y + ... + (1 - y)^r y = y SUM_{k=0}^{r} (1 - y)^k   (1)

where the first term is the probability that the master pattern circuit itself is good, and each succeeding term is the conditional probability of needing to examine another candidate for relocation times its probability of being good. Equation (1) can be simplified as follows. With u = 1 - y,

y SUM_{k=0}^{r} (1 - y)^k = -SUM_{k=0}^{r} (u - 1)u^k   (2)

and, since the sum telescopes,

-SUM_{k=0}^{r} (u - 1)u^k = -(u^{r+1} - 1) = 1 - (1 - y)^{r+1}   (3)

therefore,

P(1) = 1 - (1 - y)^{r+1}   (4)
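The closed form in equation (4), and its joint extension over M independent master-pattern positions given as equation (5) below, can be checked numerically. The sketch uses illustrative function names; the formulas themselves are those of the text.

```python
def p_one(y, r):
    """Probability of filling one master-pattern position, equation (4):
    the position is good itself, or one of up to r candidates can be relocated."""
    return 1.0 - (1.0 - y) ** (r + 1)

def p_master(y, r, M):
    """Joint probability over M independent master-pattern positions."""
    return p_one(y, r) ** M

# With y = 0.5 and r = 9, several hundred master-pattern circuits can be
# used while keeping the overall success probability near one half:
print(p_master(0.5, 9, 680))
```

Under these assumptions, the r = 9, y = 0.5 curve crosses P(M) = 0.5 near M of roughly 700, in line with the 680-circuit figure read from Figure 19 below.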
Instead, consider successively examining up to r circuit positions which are the closest to each particular master pattern position and, for which, there is still a free path in the Pad. Relocation level to the master pattern position. Then Equation (5) will give the probability of successfully relocating (if necessary) to each of the M required master pattern positions at least one of the r closest and free circuit positions. Equation (5) determines a family of curves. for P reM) versus M for various yields and values o~ r. Figure 19 shows the curves of PrOf) versus M With y = 0.5 for r = 4 and r = 9. It should be noted that each circuit of M may actually be many interconnected gates of logic and M = 100 would represent 1000 gates Fall Joint Computer Conference, 1969 108 1.00 y =0.5" CIRCUIT YIELD 0.90 allow the standard signal interconnect to be designed to require the minimum number of levels and the minimum area per level. Thus, chip areas can be less interconnect limited. 0.80 Improvement of testing and reliability of la:rge scale integrated systems 0.70 0.60 ~ ... 0.50 ~ M =220 FORP=O.S 0.40 0.30 0.20 0.10 0.0 20 50 100 soc 200 1000 M Figure 19--The probabilty Pr{M) of successfully fulfilling a ma.ster pattern of M cifcuits by relocating from one of up to r nearby circuits. Eeqh circuit is a tested unit which may have many gates 6f logic complexity if each circuit of M had 10 gates of equivalent logic complexity. If it is desired to 'successfully fulfill the master patterns of at least half the wafers considered, Figure 19 shows that 220 circuits (and thus probably 750 or more gates) can be used if r = 4, and 680 circuits can be used if r = 9. 
Of ¢ourse, any wafers for which the master pattern was hot easily fulfilled are not lost since they can be inv~ntoried and used for other master patterns, or for integrated circuits, or diced and bonded to substrates~ As a comparison the most complex current bipolar p.iscretionary unit has an equivalent Al of 169 while the 100 percent yield approach has reached an equivalent M of only 24. Advantage of Pad Relocation to iJSI signal interconnect The prime advantage of Pad ~ Relocation LSI which has been described above is th~t it places the pads of all used circuits in standard positions which both allows fixed-pattern signal' routing between these circuits as well as the utilization of more circuits than allowed by other LSI techniques. There are further advantages, however, to the rquting of the standard signal interconnect. For exaIl1ple, the positions to which circuit pads will always be brought can be modified and optimized to facilitate the necessary routing of signals as well as to minimize the lengths of the longest or the most critical signal paths. This will also Semiconductor device reliability, as well as propa~ gation delay, is highly dependent on proper maintenance of junction temperatures within certain bounds. From the maximum specified junction temperature, a maximum power dissipation per wafer area can be computed which is dependent on the heat conductive characteristics of the wafer and the cooling techniques used, as well as on the area and power dissipation of the particular circuits. Thus there will be a maximum number of circuits that should be powered up on the wafer. In addition, no region of t.he wafer should exceed a certain maximum power density in order to insure that the wafer will not have relative "hot spots" where too many powered circuits are located. 
Pad Relocation LSI can help insure that the wafer power dissipation density is not excessive by specifying the relocated circuits to be primarily those from areas of sparse circuit utilization, thus obtaining a more uniform power density across the entire wafer. By so doing, the system cooling requirements can be relaxed and/or more circuits can be used on the same wafer. This more uniform power dissipation could be quite difficult to insure with other routing techniques, since there is less choice in the used circuit positioning. A simple means by which a Pad Relocation

A_h(e) = max(A_h(e-1), A(f_e)), e > 0    (12)

Likewise, define the Category history, C_h, at the e-th event,

C_h(e) = C_h(e-1) ∪ C(f_e), e > 0    (14)

From equations (11) through (14) we see how the Authority and Category histories accumulate as a function of event e. These events are the specific times when files are accessed by a job. To maintain security integrity, these histories can never exceed (i.e., be greater than) the job security profile. This is specified as

A_h(∞) ≤ A_j    (20)

C_h(∞) ⊆ C_j    (21)

For e = 0, we see the properties initialized to their simplest form. However, as e gets large, the histories accumulate, but never exceed the upper limit set by the job. A_h(e) and C_h(e) are important new concepts, discussed in further detail later. We speak of them, affectionately, as the security "high-water mark," with analogy to the bath tub ring that marks the highest water level attained.

TABLE I-Security property determination matrix

Object           Authority A                     Category C                   Franchise F
User, u          Given constant                  Given constant               u
Terminal, t      Given constant                  Given constant               Given constant
Job, j           min(A_u, A_t)                   C_u ∩ C_t                    u_j
File, f
  Existing file  Given constant                  Given constant               Given constant
  New file       max(A_h(e-1), A(f_e)), e > 0    C_h(e-1) ∪ C(f_e), e > 0     u_j

The Franchise of a new file is always obtained from the Franchise of the job given by equation (6):

F_f = u_j    (19)

When μ = 0, the job is controlled by the single user u_j, who becomes the owner and creator of the file with the sole Franchise for the file.

Access control

Our model is now rich enough to express the equations of access control. We wish to control access by a user to the system, to a terminal, and to a file. Access is granted to the system if and only if

u ∈ U    (22)

where U is the set of all sanctioned users known to the system. Access is granted to a terminal if and only if

u ∈ F_t    (23)

If equations (22) and (23) hold, then by definition

u = u_t = u_j    (24)

Access is granted to a file if and only if

A_f ≤ A_j and C_f ⊆ C_j    (25)

for properties A and C according to equations (8) and (9), and

u_j ∈ F_f    (26)

If equations (25) and (26) hold, then access is granted, and A_h(e) and C_h(e) are calculated by equations (12) and (14).

Model interpretation

Three different dimensions for restricting access to sensitive information and information processes are possible with the security profile triplet. The generality of this technique has considerable application to public and military systems. For the system of interest, however, the Authority property corresponds to the Top Secret, Secret, etc., levels of government and military security; Category corresponds to the host of special control compartments used to restrict access by project and area, such as those of the Intelligence and Atomic Energy communities; and the Franchise property corresponds to access sanctioned on the basis of need-to-know. With this interpretation, the popular security terms "classification" and "clearance" can be defined by our model in the same dimensions, as a min/max test on the security profile triplet. Classification is attached to a security object to designate the minimum security profile required for access, whereas clearance grants to a security object the maximum security profile it has permission to exercise.
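The set-theoretic model lends itself to a direct sketch. The Python below (the class structure and names are illustrative assumptions, not ADEPT's actual code) models the job-clearance derivation of equations (15) and (16), the file-access test of equations (25) and (26), and the high-water-mark update of equations (12) and (14):

```python
# Illustrative sketch of the ADEPT-50 set-theoretic security model.
# The four Authority levels follow the paper; the class itself is an
# assumption for exposition.

UNCLASSIFIED, CONFIDENTIAL, SECRET, TOP_SECRET = range(4)

class Job:
    def __init__(self, user_id, user_auth, user_cats, term_auth, term_cats):
        self.user_id = user_id
        self.authority = min(user_auth, term_auth)   # equation (15)
        self.categories = user_cats & term_cats      # equation (16)
        self.auth_high = UNCLASSIFIED                # A_h(0), simplest form
        self.cat_high = frozenset()                  # C_h(0), simplest form

    def can_open(self, file_auth, file_cats, file_franchise):
        # Equation (25): A_f <= A_j and C_f subset of C_j;
        # equation (26): need-to-know membership.
        return (file_auth <= self.authority
                and file_cats <= self.categories
                and self.user_id in file_franchise)

    def record_access(self, file_auth, file_cats):
        # High-water mark update, equations (12) and (14).
        self.auth_high = max(self.auth_high, file_auth)
        self.cat_high = self.cat_high | file_cats

job = Job("smith", TOP_SECRET, {"CRYPTO", "SENSITIVE"}, SECRET, {"CRYPTO"})
assert job.authority == SECRET and job.categories == {"CRYPTO"}
assert job.can_open(SECRET, {"CRYPTO"}, {"smith"})
assert not job.can_open(TOP_SECRET, {"CRYPTO"}, {"smith"})  # A_f > A_j
```

Note how the histories only ever grow toward, and never past, the job clearance: every file that passes `can_open` satisfies equations (25) and (26), so the maxima and unions in `record_access` are bounded by A_j and C_j.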
Thus, legal access obtains if the clearance is greater than or equal to the classification, i.e., if equation (25) holds.

Another observation on the model is the "job umbrella" concept implied by equations (22) through (26); i.e., the derived clearance of the job (not the clearance of the user) is used as the security control triplet for file access. The job umbrella spreads a homogeneous clearance to normalize access to a heterogeneous assortment of program and data files. This simplifies the problem of control in a multi-level security system. Also note how the job umbrella's high-water mark (equations (11) through (14)) is used to automatically classify new files (equations (17) and (18)); this subject is discussed further below.

A final observation on the model is its application of need-to-know to terminal access, equation (23). This feature allows terminals to be restricted to special people and/or special groups for greater control of personnel interfaces, i.e., systems programmers, computer operators, etc.

Security control implementation

The selection of a set-theoretic model of security control was not fortuitous, but a deliberate choice biased toward computational efficiency and ease of implementation. It permits the clean separation and isolation of security control code from the security control data, which enables ADEPT's security mechanisms to be openly discussed and still remain safe, a point advocated by others.14,16 We achieve this safety by "arming" the system with security control data only once, at start-up time, by the SYSLOG procedure discussed later. Also, the model improves the credibility of the security system, enhancing its understanding and thereby promoting its certification.

Security objects: Identity and structure

Each security object has a unique identification (ID) within the system such that it can be managed individually. The form of the ID depends upon the security-object type; the syntax of each is given below.
User identification

For generality of definition, each user is uniquely identified by his user:id, which must be less than 13 characters with no embedded blanks. The user:id can be any meaningful encoding for the local installation. For example, it can be the individual's Social Security number, his military serial number, his last name (if unique and less than 13 characters), or some local installation man-number convention. The set of all user:ids constitutes the universal set, U.

Terminal identification

All peripheral devices in ADEPT are identified uniquely by their IBM 360 device addresses. Besides interactive terminals, this includes disc drives, tape drives, line printer, card reader-punch, drums, and 1052 keyboard. Therefore, terminal:id must be a two-digit hexadecimal number corresponding to the unit address of the device.

Job identification

ADEPT consists of two parts: the Basic Executive (BASEX), which handles the allocation and scheduling of hardware resources, and the Extended Executive (EXEX), which interfaces user programs with BASEX. ADEPT is designed to operate itself and user programs as a set of 4096-byte pages. BASEX is identified as certain pages that are fixed in main core, whereas EXEX and user programs are identified as sets of pages that move dynamically between main and swap memory. A set of user programs are identified as a job, with page sets for each program (the program map) described in the job's environment area, i.e., the job's "state tables." Every job in ADEPT has an environment area that is swapped with the job. It contains dynamic system bookkeeping information pertinent to the job, including the contents of the machine registers (saved when the job is swapped out), internal file and I/O control tables, a map of all the program's pages on drum, user:id, and the job security control parameters.
The environment page(s) are memory-protected against reading and writing by user programs, so they are really swappable extensions of the monitor's tables. The job:id is then a transitory internal parameter which changes with each user entrance and exit from the system. The job:id is a relative core memory address used by the executive as a major index into central system tables. It is mapped into an external two-digit number that is typed to the user in response to a successful LOGIN.

File identification

ADEPT's file system is quite rich in the variety of file types, file organization, and equipment permitted. There are two file types: temporary and permanent. Temporary files are transitory "scratch" disc files, which disappear from the system inventory when their parent job exits from the system. They are always placed on resident system volumes, and are private to the program that created them. Permanent files constitute the majority of files cataloged by the system. Their permanence derives from the fact that they remain inventoried, cataloged, and available even after the job that created or last referenced them is no longer present, and even if they are not being used. Permanent files may be placed by the user on resident system volumes or on demountable private volumes.

There are six file organizations from which a user may select to structure the records of his file: physical-sequential, S1; non-formatted, S2; index-sequential, S3; partitioned, S4; multiple volume fixed record, S5; and single volume fixed record, S9. Regardless of the organization of the records, ADEPT manages them as a collection, called a file. Thus, security control is at the file level only, unlike more definitive schemes of sub-element control.8,10-12

All the control information of a file that describes type, organization, physical storage location, date of creation, and security is distinct from the data records of the file, and is the catalog of the file. All cataloged ADEPT files are uniquely identified by a four-part name; each part has various options and defaults (system assumptions). This name, the file:id, has the following form:

file:id ::= name, form, user:id, volume:id

Name is a user-generated character string of up to eight characters with no embedded blanks. It must be unique on a private volume as well as for Public files (described below). Form is a descriptor of the internal coding of a file. Up to 256 encodings are possible, although only these seven are currently applicable:

1 = binary data
2 = relocatable program
3 = non-relocatable program
4 = card images
5 = catalog
6 = DLO (Delayed Output)
7 = line images

User:id corresponds to the owner of the file, i.e., the creator of the file. Volume:id is the unique file storage device (tape, disc, disc pack, etc.) on which the file resides. For various reasons, including reliability, ADEPT file inventories are distributed across the available storage media, rather than centralized on one particular volume. Thus, all files on a given disc volume are inventoried on that volume.

Security properties: Encoding and structure

Implementation of the security properties in ADEPT is not uniform across the security objects as suggested by our model, particularly the Franchise property. Lack of uniformity, brought about by real-world considerations, is not a liability of the system but a reflection of the simplicity of the model. Extensions to the model are developed here in accordance with that actually implemented in ADEPT.

Authority

Authority is fixed at four levels (ω = 3 for equation (1)) in ADEPT, specifically, UNCLASSIFIED, CONFIDENTIAL, SECRET, and TOP SECRET, in accordance with Department of Defense security regulations.
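The four-part file:id described earlier can be sketched as a small record type. The Python representation and parse helper below are illustrative assumptions; only the field names, length limits, and form codes come from the text:

```python
from dataclasses import dataclass

# Form codes as listed in the text: the internal coding of a file.
FORMS = {1: "binary data", 2: "relocatable program",
         3: "non-relocatable program", 4: "card images",
         5: "catalog", 6: "DLO (Delayed Output)", 7: "line images"}

@dataclass
class FileId:
    name: str       # up to 8 characters, no embedded blanks
    form: int       # one of the encodings in FORMS
    user_id: str    # owner/creator of the file, less than 13 characters
    volume_id: str  # storage device on which the file resides

def parse_file_id(text: str) -> FileId:
    """Split 'name, form, user:id, volume:id' into its four parts."""
    name, form, user_id, volume_id = (p.strip() for p in text.split(","))
    assert len(name) <= 8 and " " not in name
    assert int(form) in FORMS and len(user_id) < 13
    return FileId(name, int(form), user_id, volume_id)

fid = parse_file_id("PAYROLL, 1, smith, D101")
assert fid.name == "PAYROLL" and FORMS[fid.form] == "binary data"
```

Distributing the inventory by volume, as the text describes, then amounts to keying each volume's catalog by the (name, user:id) pairs it holds.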
The Authority set is encoded as a logical 4-bit item, where positional order is important. Magnitude tests are used extensively, such that the high-order bits imply high Authority in the sense of equation (8).

Category

Category is limited to a maximum of 16 compartments (ψ ≤ 15 for equation (2)), encoded as a logical 16-bit item. Boolean tests are used exclusively on this datum. The definition of (and bit position correspondence to) specific compartments is an installation option at ADEPT start-up time (see SYSLOG). Typical examples of compartments are EYES ONLY, CRYPTO, RESTRICTED, SENSITIVE, etc.

Franchise

Franchise corresponds to the military concept of need-to-know. Essentially, this corresponds to a set of user:ids; however, the ADEPT implementation of Franchise is different for each security object:

1. User: All users wishing ADEPT service must be known to the system. This knowledge is imparted by SYSLOG at start-up time and limited to approximately 500 user:ids (max(U) ≤ 500).

2. Terminal: Equation (5) specifies the Franchise of a given terminal, F_t, as a set of user:ids. In ADEPT, F_t does not exist. One may define all the users for a given terminal, i.e., F_t; or alternatively, all the terminals for a given user. Because SYSLOG orders its tables by user:id, the latter definition was found more convenient to implement.

3. Job: The Franchise of a job is the user:id of the creator of the job at the time of LOGIN to the system. Currently, only one user has access to (and control of) a job (μ = 0 for equation (6)).

4. File: Implementation of Franchise for a file, F_f, is more extensive than equation (7). In ADEPT, we wish to control not only who accesses a file, but also the quality of access granted.
We have defined a set of four exclusive qualities of access, such that a given quality, q, is defined if

q ∈ {READ, WRITE, READ-AND-WRITE, READ-AND-WRITE-WITH-LOCKOUT-OVERRIDE}    (27)

ADEPT permits simultaneous access to a file by many jobs if the quality of access is for READ only. However, only one job may access a file with WRITE, or READ-AND-WRITE quality. ADEPT automatically locks out access to a file being written, to avoid simultaneous reading and writing conflicts. A special access quality, however, does permit lockout override. Equation (7) can now be extended as a set of pairs,

F_f = {(u_0, q_0), (u_1, q_1), ..., (u_γ, q_γ)}    (28)

where the q_i are not necessarily distinct and are given by equation (27). The implementation of equation (28) is dependent upon γ, the number of franchised users. When γ = 0, we have the ADEPT Private file, exclusive to the owner, u_0; for γ = max(U), we have the Public file; values of γ between these extremes yield the Semi-Private file. γ is implicitly encoded as the ADEPT "privacy" item in the file's catalog control data, and takes the place of F_f for all cases except a Semi-Private file. For that case exclusively, equation (28) holds and an actual F_f list of user:id, quality pairs exists as a need-to-know list. The owner of a file specifies and controls the file's privacy, including the composition of the need-to-know list.

Security control initialization: SYSLOG

SYSLOG is a component of the ADEPT initialization package responsible for arming the security controls. It operates as one of a number of system start-up options prior to the time when terminals are enabled. SYSLOG sets up the security profile data for user:id and terminal:id, i.e., the "given constants" of Table I. SYSLOG creates or updates a highly sensitive system disc file, where each record corresponds to an authorized user.
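The property encodings and Franchise tests described above (a 4-bit Authority item compared by magnitude, a 16-bit Category item tested with Boolean operations, and the three privacy classes derived from γ) can be sketched as follows. The bit layouts follow the text; the helper names and the privacy strings are illustrative assumptions:

```python
# Sketch of ADEPT's security-property encodings and access tests.
# Bit layouts follow the text; helper names are assumptions.

# 4-bit Authority item: high-order bits imply higher Authority, so a
# plain magnitude comparison implements the sense of equation (8).
UNCLASSIFIED, CONFIDENTIAL, SECRET, TOP_SECRET = 0b0001, 0b0010, 0b0100, 0b1000

def authority_ok(job_auth: int, file_auth: int) -> bool:
    return file_auth <= job_auth          # magnitude test

# 16-bit Category item: one bit per compartment, Boolean tests only.
# Compartment-to-bit assignment is an installation option (see SYSLOG).
CRYPTO, EYES_ONLY, SENSITIVE = 1 << 0, 1 << 1, 1 << 2   # up to 16 bits

def category_ok(job_cats: int, file_cats: int) -> bool:
    # C_f must be a subset of C_j: equality test against (C_j AND C_f).
    return (job_cats & file_cats) == file_cats

# Franchise: gamma = 0 is a Private file, gamma = max(U) a Public file;
# in between, a Semi-Private file carries a (user:id, quality) list.
def franchise_ok(privacy: str, owner: str, need_to_know, user_id: str) -> bool:
    if privacy == "public":
        return True
    if privacy == "private":
        return user_id == owner
    return any(u == user_id for u, _q in need_to_know)   # table search of F_f

assert authority_ok(SECRET, CONFIDENTIAL) and not authority_ok(SECRET, TOP_SECRET)
assert category_ok(CRYPTO | SENSITIVE, CRYPTO) and not category_ok(CRYPTO, EYES_ONLY)
assert franchise_ok("semi-private", "smith", [("jones", "READ")], "jones")
```

Because both tests reduce to one integer comparison or one AND-and-compare, the per-access cost of the model is a handful of machine instructions, which is the computational-efficiency bias mentioned earlier.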
These records are constructed from a deck of cards consisting of separate data sets for compartment definitions, terminal:id classification, and user:id clearance. The dictionary of compartment definitions contains the less-than-9-character mnemonic for each member of the Category set. Data sets are formed from the card types shown in Table II. Use of passwords is described later in the LOGIN procedure. An IDT card must exist for each authorized user; the PWD, DEV, SEC, and CAT card types are optional. Other card types are possible, but not germane to security control, e.g., ACT for accounting purposes. More than one PWD, DEV, and CAT card is acceptable up to the current maximum data limits (i.e., 64 passwords, 48 terminal:ids, and 16 compartments). A variety of legality checks for proper data syntax, quantity, and order are provided. SYSLOG assumes the following default conditions when the corresponding card type is omitted from each data set:

PWD   No password required
DEV   All terminal:ids authorized
SEC   A = UNCLASSIFIED
CAT   C = null (all zero mask)

This gives the lowest user clearance as the default, while permitting convenient user access. Various options exist in SYSLOG to permit maintenance of the internal SYSLOG tables, including the replacement or deletion of existing data sets in total or in part.

The sensitivity of the information in the security control deck is obvious. Procedures have been developed at each installation that give the function of deck creation, control, and loading to specially cleared security personnel. The internal SYSLOG file itself is protected in a special manner described later.

TABLE II-SYSLOG control cards

Card Type                                 Purpose
DICT compartment1 ... compartment16       Identifies start of data set of compartment definitions. Defines up to 16 compartments.
TERMINAL                                  Identifies start of data sets of terminal definitions.
UNIT terminal:id                          Identifies start of a terminal data set.
IDT user:id                               Identifies start of a user data set.
PWD password1 ... password64              Defines legal passwords for user:id, up to 64.
DEV terminal:id1 ... terminal:id48        Defines legal terminals for user:id, up to 48.
SEC Authority                             Defines user:id Authority.
CAT compartment1 ... compartment16        Defines user:id Category set.

Access control

A fundamental security concern in multi-access systems is that many users with different clearances will be simultaneously using the system, thereby raising the possibility of security compromise. Since programs are the "active agents" of the user, the system must maintain the integrity of each and of itself from accidental and/or deliberate intrusion. A multifile system must permit concurrent access by one or more jobs to one or more on-line, independently classified files. ADEPT is all these things: a multiuser, multiprogram, and multifile system. Thus, this section deals with access control over users, programs, and files.

User access control: LOGIN

To gain admittance to the system, a user must first satisfy the ADEPT LOGIN decision procedure. This procedure attempts to authenticate the user in a fashion analogous to challenge-response practices. The syntax of the ADEPT LOGIN command, typed by a user on his terminal, is as follows:

/LOGIN user:id password accounting

Figure 1 pictorially displays the LOGIN decision procedure based upon the user-specified input parameters. User:id is the index into the SYSLOG file used to retrieve the user security profile. If no such record exists (i.e., equation (22) fails), the LOGIN is unsuccessful and system access is denied. If the security profile is found, LOGIN next retrieves the terminal:id for the keyboard in use from internal system tables, and searches for a match in the terminal:id list for which the user:id was franchised by SYSLOG. An unsuccessful search is an unsuccessful LOGIN. If the terminal is franchised, then the current password is retrieved from the SYSLOG file for this user:id and matched against the password entered as a keyboard parameter to LOGIN. An unsuccessful match is again an unsuccessful LOGIN. Furthermore, the terminal is ignored (will not honor input) for approximately 30 seconds to frustrate high-speed, computer-assisted penetration attempts. If, however, the match is successful, the current password in the SYSLOG file for this user:id is discarded, and LOGIN proceeds to create the job clearance.

Figure 1-LOGIN decision procedure

Passwords in ADEPT obey the same syntax conventions as user:id. (See the earlier description of User Identification.) Although easily increased, currently SYSLOG permits up to 64 passwords. Each successful LOGIN throws away the user password; 64 successful LOGINs are possible before a new set of passwords need be established. If other than random, once-only passwords are desired, the 64 passwords may be encoded in some algorithmic manner, or replicated some number of times. Once-only passwords are an easily implemented technique for user authentication, which has been advocated by others.2,7 It is a highly effective and secure technique because of the high permutability of 12-character passwords and their time and order interdependence, known only to the user.

Once the authentication process is completely satisfied, LOGIN creates the job security profile according to equations (15) and (16) of our model. That is, the lower Authority of the user and the terminal becomes A_j, and the intersection (logical AND) of the user and terminal Category sets becomes the Category of the job, C_j. For example, a user with TOP SECRET Authority and a Category set (1001 1001 0000 1101) operating from a SECRET-level terminal with a Category set (0000 0000 0000 0010) controls a job cleared to SECRET with an empty Category set.
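The once-only password discipline above is simple to sketch. The data structure and the stand-in for the 30-second terminal lockout below are illustrative assumptions; the discard-on-success behavior is as described in the text:

```python
import time

# Sketch of a once-only password list, as in ADEPT's LOGIN: each
# successful LOGIN discards the password just used, so a captured
# password is worthless afterward. Structures are assumptions.

class UserRecord:
    def __init__(self, passwords):
        self.passwords = list(passwords)   # up to 64, set by SYSLOG

    def login(self, attempt: str) -> bool:
        if not self.passwords or attempt != self.passwords[0]:
            # ADEPT ignores the terminal for ~30 seconds here to
            # frustrate high-speed, computer-assisted penetration.
            time.sleep(0.03)               # token stand-in for the delay
            return False
        self.passwords.pop(0)              # discard the used password
        return True

rec = UserRecord(["RED-BARON-07", "BLUE-LAGOON-9"])
assert rec.login("RED-BARON-07")
assert not rec.login("RED-BARON-07")       # already consumed
assert rec.login("BLUE-LAGOON-9")
```

The order dependence is the point: knowing any one password, or even the whole set out of order, does not let an intruder predict which entry is currently live.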
Program access control: LOAD

As noted earlier, the ADEPT Executive consists of two parts: BASEX, the resident part, and EXEX, the swapped part. EXEX is a body of reentrant code shared by all users; however, it is treated as a distinct program in each user's job. Up to four programs can exist concurrently in the job. Each operates with the job clearance, the job clearance umbrella. LOAD is the ADEPT component used to load the programs chosen by the user; it is part of EXEX and hence operates as part of the user's job with the job's clearance. Programs are cataloged files and as such may be classified with a given security profile. As is described in "File Access Control" below, LOAD can only load those programs for which the job clearance is sufficient. Once loaded, however, the new program operates with the job clearance. In this manner, we see the power of the job umbrella in providing smooth, flexible user operation concurrent with necessary security control. Program files may be classified with a variety of security profiles and then operate with yet another, i.e., the job clearance. By this technique security is assured, and programs of different classifications may be operated by a user as one job. It permits, for example, an unclassified program file (e.g., a file editor) to be loaded into a highly classified job to process sensitive classified data files.

File access control: OPEN

Before input/output can be performed on a file, a program must first acquire the file by an OPEN call to the Cataloger. Each program must OPEN a file for itself before it can manipulate the file, even if the file is already OPENed for another program.
A successful OPEN requires proper specification of the file's descriptors (some of which are in the OPEN call; others are picked up directly by the Cataloger from the job environment area, e.g., job clearance, user:id) and satisfactory job clearance and user:id need-to-know qualifications according to equations (25) and (26) of our model. Equation (25) is implemented as (8), a straightforward magnitude comparison between A_j and A_f, and as (9), an equality test between C_f and (C_j ∧ C_f). We use (C_j ∧ C_f) to ensure that C_f is a subset of the job categories, i.e., the job umbrella. Lastly, equation (26) is a NOP if the file is Public; a simple equality test between u_j and u_f if the file is Private; and a table search of F_f for u_j if the file is Semi-Private. These tests do increase processing time for file access; however, the tests are performed only once at OPEN time, where the cost is insignificant relative to the I/O processing subsequently performed on the file.

The quality of access granted by a successful OPEN, and subsequently enforced for all I/O transfers, is that requested, even if the user has a greater Franchise. For example, during program debugging, the owner of a file may OPEN it for READ access only, even though READ-AND-WRITE access quality is permitted. He thereby protects his file from possible uncontrolled modification by an erroneous WRITE call.

Considerable controversy surrounds the issue of automatic classification of new files formed by subset or merger of existing files. The heart of the issue is the poor accuracy of many such classification techniques17 and the fear of too many over-classified files (a fear of operations personnel) or of too many under-classified files (a fear of the security control officers). ADEPT finesses the problem with a clever heuristic: most new files are created from
existing files, hence classify the new file as a Private file with the composite Authority and Category high-water mark of the job.

Table of AUDIT events and the data recorded for each: LOGIN, LOGOUT, OPEN FILE, REOPEN FILE (1), CHANGE FILE, CLOSE FILE, DELETE FILE, RECLASS, REPLACE, DEVICE LIST (2), CATEGORY DICTIONARY (3), RESTART (4), and WRAPUP (5).

1 This is the "OPEN existing file" command.
2 A list of all the terminal devices and their assigned security and categories is recorded at each system load.
3 A list of the prose category names is recorded at each system load.
4 Whenever the system is restarted on the same day (and AUDIT had been turned on earlier that day) the time of the restart is recorded.
5 The time that the AUDOFF action was taken, or the time that the WRAPUP function called AUDIT, to terminate the AUDIT function.

fully demonstrated a security control mechanism that more than adequately supports heterogeneous levels and types of classification. Of note in this regard are the LOGIN decision procedure, access control tests, job umbrella, high-water mark, and audit trails recording. The approach can be improved in the direction of more compartments (on the order of 1000 or more), extension of the model to include system files, and the implementation of a single Franchise test for all security objects. The implementation needs redundant encoding and error detection of security profile data to increase confidence in the system, though we have not ourselves experienced difficulty here.
The increase in memory requirements to achieve these improvements may force numerical encoding of security data, particularly Category, as suggested by Peters.7

Second, SYSLOG has been highly successful in demonstrating the concept of "security arming" of the system at start-up time. Our greatest difficulty in this area has been with the human element, the computer operators, in preparing and handling the control deck. In opposition to Peters,7 we believe the operator should not be "designed out of the operation as much as possible," but rather his capabilities should be upgraded to meet the greater levels of sophistication and responsibility required to operate a time-sharing system.20 He should be considered part of line management. ADEPT is oriented in this direction, and work now in progress is aimed at building a real-time security surveillance and operations station (SOS).

Third, we missed the target in our attempt to isolate and limit the amount of critical coding. Though much of the control mechanism is restricted to a few components (LOGIN, SYSLOG, CATALOGER, AUDIT), enough is sprinkled around in other areas to make it impossible to restrict the omnipotent capabilities of the monitor, e.g., to run EXEX in Problem state. Some additional design forethought could have avoided some of this dispersal, particularly the wide distribution in memory of system data and programs that set and use these data. The effect of this shortcoming is the need for considerably greater checkout time, and the lowered confidence in the system's integrity.

Lastly, on the brighter side, we were surprisingly frugal in the cost of implementing this security control mechanism: It took approximately five percent of our effort to design, code, and checkout the ADEPT security control features. The code represents about ten percent of the 50,000 instructions in the system. Though the code is widely distributed, SYSLOG, security commands, LOGIN, AUDIT, and
the CATALOGER account for about 80 percent of it. The overhead cost of operating these controls is difficult to measure, but it is quite low, in the order of one or two percent of total CPU time for normal operation, excluding SYSLOG. (SYSLOG, of course, runs at card reader speed.) The most significant area of overhead is in the checking of I/O channel programs, where some 5 to 10 msec are expended per call (on the average). Since this time is overlapped with other I/O, only CPU-bound programs suffer degradation. AUDIT recording also contributes to service call overhead. In actuality, the net operating cost of our security controls may be zero or possibly negative, since AUDIT recordings showed us numerous trivial ways to measurably lower system overhead.

ACKNOWLEDGMENTS

I would like to acknowledge the considerable encouragement I received in the formative stages of the ADEPT security control design from Mr. Richard Cleaveland of the Defense Communications Agency (DCA). I would like to thank Mrs. Martha Bleier, Mr. Peter Baker, and Mr. Arnold Karush for their patient care in designing and implementing much of the work I've described. Also, I wish to thank Mr. Marvin Schaefer for assisting me in set theory notation. Finally, I would like to applaud the ADEPT system project personnel for designing and building a time-sharing system so amenable to the ideas discussed herein.
REFERENCES

1 A HARRISON
The problem of privacy in the computer age: An annotated bibliography
RAND Corp Dec 1967 RM-5495-PR/RC

2 L J HOFFMAN
Computers and privacy: A survey
Stanford Linear Accelerator Center Stanford Univ Aug 1968 SLAC-PUB-479

3 H E PETERSEN R TURN
System implications of information privacy
Proc SJCC Vol 30 1967 291-300

4 W H WARE
Security and privacy in computer systems
Proc SJCC Vol 30 1967 279-282

5 W H WARE
Security and privacy: Similarities and differences
Proc SJCC Vol 30 1967 287-290

6 R LINDE C WEISSMAN C FOX
The ADEPT-50 time-sharing system
Proc FJCC Vol 35 1969; also issued as SDC Doc SP-3344

7 B PETERS
Security considerations in a multi-programmed computer system
Proc SJCC Vol 30 1967 283-286

8 RYE CAPRI COINS OCTOPUS SADIE Systems
NOC Workshop National Security Agency Oct 1968

9 H W BINGHAM
Security techniques for EDP of multi-level classified information
Rome Air Development Center Dec 1965 RADC-TR-65-415

10 R M GRAHAM
Protection in an information processing utility
ACM Symposium on Operating Systems Principles Oct 1967 Gatlinburg Tenn

11 L J HOFFMAN
Formularies-Program controlled privacy in large data bases
Stanford Univ Working Paper Feb 1969

12 D K HSIAO
A file system for a problem solving facility
Dissertation in Electrical Engineering Univ of Pa 1968

13 J I SCHWARTZ C WEISSMAN
The SDC time-sharing system revisited
Proc ACM Conf 1967 263-271

14 P BARAN
On distributed communications: IX, Security, secrecy, and tamper-free considerations
RAND Corp Aug 1964 RM-3765-PR

15 C WEISSMAN
Programming protection: What do you want to pay?
SDC Mag Vol 10 No 8 Aug 1967
16 J P TITUS Washington commentary-Security and privacy CACM Vol 10 No 6 June 1967 379-380
17 I ENGER et al Automatic security classification study Rome Air Development Center Oct 1967 RADC-TR-67-472
18 A KARUSH The computer system recording utility: Application and theory System Development Corp March 1969 SP-3303
19 A KARUSH Benchmark analysis of time-sharing systems: Methodology and results System Development Corp April 1969 SP-3343
20 R R LINDE P E CHANEY Operational management of time-sharing systems Proc 21st Nat ACM Conf 1966 149-159

Management of confidential information

by EDWARD V. COMBER
System Dynamics, Inc.
Oakland, California

INTRODUCTION

For many years, informed persons have expended considerable time and energy attempting to evolve an acceptable philosophic assessment of the concept of "privacy." Studies made in the fields of anthropology, psychology, and sociology are in general agreement that both the mental and physical well-being of an individual requires freedom to experience some degree of personal anonymity within the environment. While the significance of "privacy" has been recognized, it has eluded the constraint of an acceptable definition. The search for a workable definition continues as man seeks a means for establishing practical bounds for inter-personal relations. Recently, the concern for "privacy" has become a rallying point for those who see the present growth and applications of data automation as a threat to the "rights of privacy" of the individual. These advocates lament that the individual is unaware of the threat to his "loss of privacy" as his attention is diverted by the glowing promises of anticipated benefits that may become available through data automation.
It is the writer's belief that through the proper and reasonable utilization of the tools of modern data technology man will have within his power a mechanism that has the potential of becoming his strongest ally in his search for means to preserve the values of "privacy." In reality, the critical element in this question of "privacy" should not address itself to the electromechanical capability of the computer or system telecommunications functions. The true focal point is the direct challenge to the discipline and conduct of man, who is the designer and user of the data system.6 Man must be willing to abide by the standards he derives from his own "privacy" criteria. He must staunchly forego any temptation to engage in system shortcuts, and he must hold to the position that he will not accept lightly any violations of the "confidentiality controls" established for system operation. Any breach in the integrity of the system must be viewed as a direct personal challenge to the integrity of each person associated with the undertaking.

SUMMARY

The following is a brief résumé of significant elements that have been identified with the question of "privacy." These comments are not offered as final, nor are they to be considered as embracing the entire area of concern. The summary is presented simply as a means of bringing together some key factors that could serve as a foundation for a basic "privacy" control system. The working standards will evolve as man gains more experience with this powerful ally and is able to resolve philosophical and ethical questions that are inherent in the concept of "privacy". As the environment and pace of modern life adjust to current needs, the nature of "privacy" will probably also reflect changes in priorities and the character of the social stresses.
Elements in the invasion of privacy

No definitive statement exists which provides a clear and acceptable statement of what is "private information," or what constitutes an "unwarranted invasion of privacy." Any criteria proposed to date to identify "private information," or describe an act that would constitute "unwarranted invasion of privacy," must take into account whether or not such disclosure of the specific data:

A. Would relate to an individual, a family or other small group in such manner as to facilitate the likelihood of the unwarranted identification of the individuals, or
B. Is not considered public information by provision of legal statute, or
C. Would cause or be the basis for unjust economic loss or social stigma or harassment to the individual, or
D. Would result in the unnecessary loss of a property right.

What is private vs. what is confidential?

When attempting to discuss "privacy," the term "confidentiality" inevitably will join the debate, but does not promote clarification. What sort of personal information do reasonable men interpret as "private?" The answer to this question depends upon many things; for example, any one or more of the following factors may apply:

A. The context within which the specific information is embedded,
B. The amount of information assembled and accessible,
C. The intrinsic nature of the information,
D. The sophistication of the social values held by the individuals concerned,
E. The character and scope of the sub-culture,
F. Significance of personal attributes such as: age, ancestry, social status, race, etc.

Recently, the California Intergovernmental Board on EDP was established by statute.1 It is charged with responsibility to provide for intergovernmental representation in the coordination of the many government-sponsored EDP programs and to take leadership in the establishment of intersystem standards.
The Intergovernmental Board appointed a select Technical Advisory Committee to assist in the preparation of a manual to serve as a guideline for all agencies in the development of local systems and facilitate adequate interface capability as required. The manual was completed and is under review by the Intergovernmental Board prior to general release to official agencies throughout the State of California. A sub-committee of the Technical Advisory Committee was specifically assigned to address the question of "privacy". The members of the Privacy Sub-committee concluded, after some study, that there are a number of personal information items that could be made accessible to an integrated data system without any threat to the individual's "privacy". It was also recognized that there are many other data items that for one reason or another should be restricted from wide access in the absence of an established right to know. Some examples of these data items are shown below:

A. Information that may not be relevant to personal privacy:
Name
Maiden Name
Address
Age or DOB
Race
Sex
Marital Status
Name of Spouse
Next of Kin

B. Information that would probably be relevant to personal privacy:
Occupation
Education
Income
Religious Preference
Political Preference
Family Size
Number of Children
Ages of Children
Taxes Paid
History of Residence
Attitudes Toward Social Issues
Property Ownership
Value of Real Property
Marital History
Drinking Practices
Hospitalization Record
Medical Record
Symptoms of Illness
Record of Arrest
Ancestry
Nationality
Names of Relatives
Response to Psychological or Medical Questions

Proliferation of data items throughout the culture

While some of the information items mentioned above may be found on records that are classified as confidential, many of the information items may also be found on records that are not subject to restriction
by law or policy. The current trend in social intercourse and information exchange reflects an ever-broadening depth of self-revealment by individuals. Private and governmental services are being extended into newer areas and thereby attracting the participation of an ever-growing segment of the citizenry. The integration of interagency information systems with data exchange introduces a new dimension associated with the creation of composite record images of persons known to the total system. These images are the product of independent and frequently unrelated inputs of data to serve other specific needs. Any integrated interagency information system with this potential capability must be administered by professionally qualified persons who remain sensitive to the need to verify both the identification of the subject of inquiry and the inquirer's "right to know". As more data systems are activated and interfaces are established, the individual who is the initial source of the data becomes more remote and isolated from the operational inquiry that relates to his record. It should be the constant aim of the system design, operational programming, and user discipline to assure that system integrity is not subverted.

Significance of developing standards for data verification

Attention should not be directed solely to the identification and classification of personal data items. What is equally important, standards must be developed and adopted to guide data acceptance and utilization with respect to the ability to verify the information. For example, confidence in the operating system will be increased and utilization encouraged if the user is assured that data items are subject to verification as to:

A. Accuracy
B. Bias
C. Completeness
D. Currency
E. Documentation
F. Satisfaction of Legal Requirements

A safety valve that will support a sound verification program is to initiate a practical data purge system.
The best data system in terms of cost/benefit analysis is one that has a high content of active data and one that is adequately updated. The effect of establishing a continuous and critical purge system is to provide an orderly review of file content and to remove inactive or low-value data.

One approach to a data classification plan

A number of studies have been undertaken in an attempt to identify and define data items that should be processed as classified or confidential. There have been perhaps as many solutions offered as there have been studies proposed. The Privacy Sub-committee mentioned above proposed a simple three-category data plan for consideration and approval of the California Intergovernmental Board on EDP.2 The concept is summarized below:

A. Confidential: This classification has the highest level of restriction, and should be limited to data which is prohibited from free and full disclosure by statutory regulation (law).

B. Restricted: This is data which:
1. Is not prohibited from full and free disclosure by statute (confidential), and
2. An unauthorized intrusion could constitute an unwarranted invasion of personal privacy, and
3. Has been administratively assigned a security classification-restricted.

C. Unclassified: All data maintained by a public agency not otherwise classified as confidential or restricted as defined above.

Sources of classification criteria

The criteria for the establishment of classification of data arise from a variety of sources. In many instances, the criteria are a result of the interaction of one or more of the following:

A. Public Policy: The living residue of tradition and social acceptance.
B. Statutory Law: The formalized and legal codification of social needs and standards of conduct.
C. Legal Interpretation: The implementation of judicial and administrative decisions that have been sanctioned through public acceptance.
D.
User Agency Specifications: Operational decisions that have been adopted and enunciated to promote agency goals in an atmosphere of public support.
E. Personal Needs of the Individual: Acceptance of the system integrity by the public who participate and furnish personal information to assist an agency function with respect to the needs of the individual (Federal Census, Social Security, etc.).

Each of the sources of criteria utilized is subject to its own characteristic variations, and will require continuous reevaluation. The scope of data items subject to the confidential classification is under constant adjustment and reassessment due to the dynamic character of the social conditions which give rise to the data.

Identification of areas sensitive to intrusion3

One of the main deterrents to the development of new ideas about privacy has been the lack of specificity as to where the threats to privacy may arise. Many agree that at some future date, a serious threat may develop. That a real danger exists today is not universally accepted. Let us consider the potential challenge to "privacy" that may originate from any of the following sources:

A. The accidental observance of data by an individual.
B. The accidental dumping of a volume of confidential data to general view.
C. The solitary snoop.
D. The snoop-for-pay (hired spy).
E. The file stealer.
F. Misuse of a confidential file by an administrator having access to the system.
G. Organized crime.
H. Totalitarian government.
I. Another possibility might be the intrusion of the private sector into government data files.

Establish policy on data classification

Before any acceptable automation program can be developed to process information that may be considered "private" or "confidential," certain policy decisions must be resolved.

A. The responsible administrators representing users of the system must reach agreement on the data content of the information system.
This agreement must include the identification of any data items or files that would be subject to restricted access or inquiry. If the restriction is pursuant to current policy, said policy should be specified:
1. General Public Policy
2. Agency Administrative Policy
3. Statutory Provision
4. Judicial Ruling

B. Specific criteria should be established based on the accepted policy statements, and serve as a guide to test the classification of all data introduced into the system. The continued validity of a classification should be based upon periodic challenge and justification.

C. A policy manual should be prepared and maintained as a ready reference to facilitate system operation.
1. Personnel participating in the system should be held individually accountable for full compliance with the "policy guidelines."
2. The policy manual should be subject to continuous review and update to remain current with system requirements, technology, and legal specifications.

D. Additional considerations in the development of an interagency information system to maintain privacy control. Decisions regarding the following elements of the system design and operation will prove significant:

1. Facility Security:
(a) Location of Hardware: Single vs. Multiple Facility
(b) Physical Adequacy: Equipment, Personnel
(c) Access to Facility: Normal, Emergency

2. Equipment:
(a) Selection
(b) Configuration
(c) Operating Characteristics: Multi-processing, Multi-programming, Remote Terminals

3. Program Control:
(a) Single Management Responsibility
(b) User Representation and Participation
(c) Operating System: Monitor of System Services and Access
(d) System Applications: Man-Machine Interface (Key Consideration)
(e) Modularization of System Applications: Does Modularization Weaken Privacy Control?
(f) Integration of Compatible Systems: Does Program Control Reside With the Core System?

4.
The Human Factor: This is the critical and perhaps most unpredictable element in the functioning process.
(a) Personnel Recruitment, Selection and Appointment
(b) Personnel Training and Supervision
(c) Maintenance of Operating Discipline
(d) Personnel Retention

Precautions to minimize potential for "privacy" violations

The same versatility and power that makes the computer valuable as a data manipulator can be employed to monitor system services and support human supervision procedures. The operating information system should provide (assuming an adequate system analysis and design):

A. A Sound Data Classification System
1. Specify data subject to restricted access and special protection.
2. Provide for isolated storage of restricted data if necessary.
3. Determine who has the right of access to confidential data and under what operating conditions.
4. User agency personnel should be certified for access by administration.

B. Physical Conditions: What levels of control should be imposed to promote system integrity and at the same time provide a functional environment that will encourage system utilization by the participants for which it was designed?

1. Equipment (system hardware):
(a) Location and physical security of equipment.
(1) Central Computer Installation
(2) Associated Peripheral Equipment
(3) Back-Up Facilities: Duplicate Files
(b) Remote terminal installations (I/O devices).
(c) Circuit Security

2. System Configuration:
(a) Central Data Bank vs. Dispersed Data Bases
(b) Central Data File vs. Central Index Concept
(c) Central System Control vs. Remote Terminal Activation
(1) Restricted Terminal Operation
(2) Multiple Function Remote Terminal

3. Software System Support: Programming must be developed with an awareness of the need for system integrity and data security. Provision must be made to provide control over basic software components, such as:
(a) Program Library Back-Up
(b) Documentation
(c) Diagnostic and Test Routines
(d) Continuous Coding of Update Schedules That Support the Identification Schemes Inherent to the Confidentiality Control Programs
(e) Transaction Monitor Logs Should Be Designed to Provide the Basis for Operational Supervision but Not Reveal the Location or Content of the Confidential Files Which Are Subject to Monitor Control

4. Personnel Requirements: If the system equipment and facilities justify particular planning to minimize the hazards to confidentiality, consideration must certainly be given to the personnel who will function in the system. The scope of attention should extend through both the employees who perform the technical services associated with EDP, and the operating personnel of the agency for which the information system was developed. Despite all that has been said heretofore, the "key" to security of information rests with the individuals who have access to the data system. Our personnel planning should encompass many specific areas. The following relate most directly to physical factors:
(a) Personal Safety
(1) Area Accessibility
(2) Emergency Provisions
(b) Personal Accountability
(1) Identification Control Plan
(a) Access to Installation
(b) Access to Specific Work Areas
(2) Is the Plan Practical? Is It Used?
(c) Conveniences and Necessities
(1) Are They Adequate?
(2) Are They Properly Located?
(d) What Special Precautions Are Warranted When Non-employee Personnel Are Permitted Access to the Installation Area?
(1) Equipment Maintenance
(2) Building Service Maintenance

C. System Design Considerations: Control provided through specific programming techniques.

1.
Limiting Terminal Access to the System: Programming
(a) Classification Schedule (Data Level Control)
(1) Terminal Identification
(2) Terminal Verification
(3) User Identification
(4) User Verification
(5) Call-Back Concept
(b) Restriction of Detail of Information in Response to Inquiry (Data Item Control)
(1) Refer to Index: Pointer to Source Data
(2) Status Indicator
(3) Advise Supervisory Station
(a) Secure Permission to Interrogate the Restricted File
(b) Receive Selected Response Through Monitor Agent
(4) Specific Limitation on Terminal Operation
(a) Data Input
(b) Data Manipulation
(c) Data Output
(d) Data Change or Update
(e) Data Purge

2. Establish a Monitor on All Terminal Action to intercept and identify unauthorized attempts to access the system.
(a) Identify Transmitting Terminal and Location
(b) Identify Terminal Operator (?)
(c) Identify Specific Nature of Restricted Access Attempt
(d) Provide for Supervisory-Level Notification of the Attempt to Support Maintenance of System Discipline
(e) Abort the Unauthorized Attempt to Secure Data

3. Maintain audit review of selected files to facilitate the orderly purge of files and to check levels of file activity.
(a) Establish, as necessary, periodic file review procedures to challenge the continued "confidential" status of individual data items to assure conformity with system policy and user need
(b) Maintain necessary statistical measures of activity in restricted files to document operational policy decisions
(c) Provide special test routines to challenge the confidentiality procedures and verify system functional integrity
(d) The Human Factor: The concern for confidentiality of data and file security eventually will focus on an assessment of problems that arise from the human element in the man-machine system.
Despite the sophistication exercised in system analysis, design and implementation, specific recognition must be given to the fact that people participate in system operations.

What about a future computer utility?4

With the rapid and diverse growth of computer services, and recognizing the intimate relation between hardware facilities, communication channels and the users of the systems, it is no accident that discussion should arise about the future establishment of a computer-communication utility. The need for such a service becomes more apparent as we see the introduction of time-sharing systems and the implementation of large integrated data services that support major regional and even statewide programs. The arguments pro and con the justification for a computer-communication utility are beyond the scope of this paper. However, the utility concept does provide the opportunity to propose several avenues of approach to improving the "privacy" control aspect in personal data systems. One of the recurring suggestions has been to establish a system of certification and licensing for persons directly involved with the design, installation, management, and operation of data systems containing sensitive personal information. A second device that could prove of value would be to effect control through regulation of the computer-communication utility service.

CONCLUSION

The challenge of privacy control

Violations of standards regarding confidentiality or privacy of information occur when particular items of personal data furnished to an information system for approved selective use are released to unauthorized persons or in a manner that jeopardizes expected system integrity.

A. The Predominance of the Human Factor

The integrity of any information system regarding confidentiality or invasion of privacy will eventually be resolved at the level of the human factor.
Machines, data sets, file cabinets, index cards, tape drives, disk files, memory modules, computers, report registers: each of these devices is an inanimate object devised by man to receive, transfer, or hold information items made available to the system through human intervention. Data stored in these devices are significant only insofar as the output is meaningful to man, and subject to change or exposure by the action of an individual. Data stored in an inactive or inaccessible device without human interaction will not reveal information that would provide the basis for a violation of privacy. The relationship between man and his information system can be described as consisting of the following basic elements:

(1) Man conceives the system.
(2) Man builds the elements necessary to provide the system.
(3) Man organizes the elements and establishes a scheme of operation.
(4) Man gathers the data that he introduces into the system.
(5) Man activates the system.
(6) Man commands the resources of the system.
(7) Man utilizes the results of the system in his external contacts in society.

The consistent factor in the above summation is the predominant relationship of man to the system. Man is responsible for creation of the system, the input of information, the manipulation of that information, and the final disposition of the data produced or revealed by the system.

B. Personnel Standards Are Necessary

Due to the prime significance of the human element in the integrity of any automated data system, the programs must address the following problems in a forthright manner:

(1) Personnel standards must be established for all participants.
(2) All accepted personnel must be indoctrinated on a continuing basis regarding the system objectives, functions, operational responsibility, etc.
(3) Specific training must be provided regarding system participation and terminal operation.
(4) Each installation should have competent supervision and a plan of routine inspection of operations.
(5) Each agency participating in a larger shared system must be accountable for the performance and integrity of its representatives. It must also be responsible for the release of any system information that is received from a classified file.
(6) All personnel who have access to the system should be required to sign a voluntary statement acknowledging their individual responsibility to protect the integrity of the system and respect the confidentiality of classified data. This statement could be a factor in the initial as well as continued employment.4

The operating system must prove convenient and satisfactory to the user. It must provide an effective service with assurance as to its accuracy and adequacy. Outputs should be tailored to meet the user need under the circumstances of the inquiry. The efficiency of the system should discourage any user development or maintenance of alternate or substitute systems. The man-machine interface should be maintained through the use of simple, direct devices with a minimum requirement for coding, progressive verification, etc. An automated data system should be so designed and supported that the user is free to direct his full attention to his prime functional responsibility. The information system must be a viable and practical tool. It should function at the convenience of the user, with intelligible outputs consistent in time and content to satisfy the service requirement. Where a system requires specific security restrictions, these must be furnished and function without imposing any awkward limitation on the legitimate user of the system.

C. Weak Policy and Discipline Result in an Inferior System

Recent critics have voiced objection to the development of major data banks and interagency information-sharing systems in government service.
Their objection has been based, in part, on certain practices associated with private credit bureau operations. The lament, properly uttered, pointed to a lack of data control and exercise of discretion by a number of these private agencies. While the economic and social value of credit rating bureaus is readily admitted, the loose policies regarding "privacy of data" cast a shadow regarding the ability to maintain integrity in a major information system. I believe it is an unfortunate and improper inference to conclude that public information systems cannot protect the "privacy" of information due to questionable practices among some business organizations established to collect and merchandise private information for profit.

D. Limitation of Data Access to Specific Authorization

Suggestions have been made that an individual should specify the extent of utilization of personal information and that the system then be required to conform to the intention expressed by the individual. This proposal sounds reasonable, but on further consideration presents subsequent problems in data management, modification of data use authorization, etc., that demand thorough study.

E. Individual Right of Inspection of Record and File Correction

Perhaps one of the most practical approaches toward satisfaction of the individual "right to privacy," while at the same time facilitating the availability of the maximum of information resources to solve social needs, is to make provision so that the individual can inspect the system files that contain his personal data. The individual should also have means to seek correction of any data item that is in error or subject to biased interpretation.

F. Develop Realistic Data Purge Policy

Attention should be given to the development of basic guidelines regarding the longevity of data resident in a file or information system. The current trend is to collect and classify more and more data on more and more people.
While hopefully most of the data will have social value, I am sure that a significant quantity will provide little benefit to the individual or the community. It is not too early to consider the need for sound purge criteria so that the data retained in an operating system will offer the highest potential return for the energy expended.

G. Adequate Training Programs Must Be Developed and Employed for the EDP Staff and Personnel of the User Agency Who Have Occasion to Engage the Data System

The content should include an introduction to system design concepts, the overall functions and data processing applications that are components of the system, and a thorough instruction in terminal man-machine dialog. In addition, some attention should be given to explaining the service philosophy, with particular attention to the rules regarding access to and utilization of any information from confidential or restricted files. The legal and moral issues must be clearly defined, and an understanding accepted by all who engage the system that a violation of the security code regarding restricted data may be sufficient grounds for removal from system participation or dismissal. The training program must be viewed as a continuing support function with periodic refresher classes, problem sessions, review of privacy criteria, etc. It is most important that the agency administrators and key supervisory personnel become involved in this program, and not leave the system discipline task to the technical staff, who are neither equipped nor responsible for this duty.

H. Despite much uncertainty and misgivings as to the effectiveness in terms of "privacy" control that will result from the imposition of a licensing scheme, such a potential mechanism will be the subject of more intense consideration with the passage of time.

REFERENCES

1 Intergovernmental Board on Electronic Data Processing created by statute passed by Legislature of the State of California S B No 1100.
This statute is established under sections 11710-11720 of the Government Code
2 File Security Procedures-Report by Sub-Committee on Privacy and Confidentiality of the Intergovernmental Board on Electronic Data Processing Oct 18 1969
3 Ibid
4 D E SCHWEINFURTH The coming computer utility-Laissez-faire, licensing or regulation? Computer Digest May 1968
5 A F WESTIN Privacy and freedom Atheneum New York 1967
6 Hearings Before a Sub-Committee of the Committee on Government Operations House of Representatives 89th Congress (Second Session) July 26, 27 and 28 1966
7 System Development Corp "SDC Magazine" Vol 10 Nos 7 and 8 July-Aug 1967 (This issue focussed on the question of computer privacy.)

Some syntactic methods for specifying extendible programming languages

by VICTOR SCHNEIDER
Purdue University
Lafayette, Indiana

Model of translator system

Our model of a programming-language translator system is represented schematically in the block diagram of Figure 1. This diagram divides the translator system into two components. The first component T is a translator program that reads in and translates the valid programs of some programming language L. The output of the translator is a subset T(L) of the intermediate language. The second component is a system M for executing the programs translated into the intermediate language. It will be seen that, in this intermediate language, the operators follow their operands in postfix (reverse Polish) form, and they are relatively machine-independent. In this paper, we will be mainly concerned with defining the operation of the translator component by specifying the input-output relationships of the translator for a particular programming language. These relationships will be described in a syntactic notation that is independent of the particular translation algorithm used for implementing the translator T.
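The two-component model can be made concrete with a small sketch. All names here are illustrative, not from the paper: a translator T maps source programs of L into the intermediate language T(L), and an executing system M runs the result, so the complete translator system is simply their composition.

```python
def make_system(translate, execute):
    """Compose a translator T and an executing system M into a complete
    translator system: source program in, execution result out."""
    def run(source_program):
        intermediate = translate(source_program)   # T : L -> T(L)
        return execute(intermediate)               # M : T(L) -> result
    return run

# Toy stand-ins for T and M, just to make the composition concrete:
toy_T = lambda src: src.split()     # "translation" into a list of commands
toy_M = lambda prog: len(prog)      # "execution" merely counts the commands

system = make_system(toy_T, toy_M)
assert system("variable x in") == 3
```

The point of the decomposition is that T can be specified purely by its input-output relationship, independently of how M (or T itself) is implemented.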
The language that was chosen as an example for this paper is Wirth and Weber's EULER.14 EULER is quite similar to ALGOL 60 in appearance and capabilities, and it has additional features found in the LISP list-processing language. The original EULER syntax was written to conform to the requirements of a precedence translation algorithm,14 and contains a number of syntactic rules whose purpose is to facilitate construction of a precedence translator from these rules. Because of the presence of these stylized rules, it was decided to rewrite the EULER grammar into a more compact and transparent form than the one in which it originally appeared. An Irons-style notation2,3 was used to specify the translation of this new EULER grammar.

[Figure 1—Simplified block diagram of a translator system; input programs in language L pass through the translator into the intermediate language.]

Reverse Polish translation of programming languages

To illustrate what we mean by a syntactic specification of a programming-language translator, let us consider as an example the following small portion of the EULER syntax and examine some of the basic devices used by our EULER system:

Fall Joint Computer Conference, 1969

Grammar 1. A Simplified Subset of EULER

Syntactic Rule                                 Rule of Translation
<expr> → <var> = <expr>                        <var> <expr> assign
       | <sum>                                 |
<sum> → <sum> + <term>                         <sum> <term> add
      | <term>                                 |
<term> → <term> * <factor>                     <term> <factor> multiply
       | <factor>                              |
<factor> → ( <sum> )                           <sum>
         | at <var>                            <var>
         | <var>                               <var> in
         | <var> . ( <expr-sequence> ) .       <expr-sequence> <var> in
<var> → <name>                                 variable <name>
<expr-sequence> → <expr>                       |
                | <expr-sequence> , <expr>     <expr-sequence> <expr>

Note that the rules of translation above refer to sequences of symbols on the right parts of syntactic rules. In this example, we see that the rules of translation specify how symbols and sequences of symbols in the source language are rearranged and rewritten in the translated language.
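A minimal sketch (not the paper's actual translator, whose names and token handling are invented here) of how Grammar 1's rules of translation can be realized: a recursive-descent routine that emits the reverse polish intermediate language for names, "at", "+", "*", and parentheses.

```python
# Sketch of Grammar 1's rules of translation as a recursive-descent
# translator emitting the postfix intermediate language.

def translate(tokens):
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def var():
        out.extend(["variable", eat()])   # <var> -> <name>      =>  variable <name>

    def factor():
        if peek() == "(":
            eat(); summ(); eat()          # <factor> -> ( <sum> )  =>  <sum>
        elif peek() == "at":
            eat(); var()                  # <factor> -> at <var>   =>  <var>
        else:
            var(); out.append("in")       # <factor> -> <var>      =>  <var> in

    def term():
        factor()
        while peek() == "*":
            eat(); factor(); out.append("multiply")   # <term> <factor> multiply

    def summ():
        term()
        while peek() == "+":
            eat(); term(); out.append("add")          # <sum> <term> add

    summ()
    return out

print(translate(["x", "+", "y", "*", "z"]))
# -> ['variable', 'x', 'in', 'variable', 'y', 'in', 'variable', 'z', 'in', 'multiply', 'add']
```

Note how the "at" alternative simply suppresses the "in" that would otherwise follow the translated variable name, exactly as the grammar specifies.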
Where no change at all is indicated in the translation of a particular rule, the symbol "|" appears as a translation rule. As an example of how sequences of symbols are rearranged for translation, the infix addition <sum> + <term> is translated into the reverse polish sequence of symbols consisting of a "<sum>" followed by a "<term>" followed by the intermediate-language command for adding together the values resulting from evaluation of the previous two subexpressions. As in good polish notation, parentheses are removed from around expressions, and this process is specified by associating the translation rule "<sum>" with the syntactic rule <factor> → ( <sum> ). The remaining rules with <factor> on the left-hand side are used for translating arithmetic operands into the intermediate language. For example, the syntactic rule <factor> → <var> indicates that operands in arithmetic expressions are variable names, and the translation of a <var> into the sequence <var> in indicates that the "in" command is used for fetching the value associated with <var> and for storing that value on top of the run-time operand stack of system M. The other syntactic rule <factor> → at <var> reflects the fact that EULER permits use of program variables that are pointers to data named by other program variables. Hence, the effect of the "at" command of the source language is to suppress the appearance of "in" in the translated program after the translated variable name. In this case, a pointer to the data stored in <var> is left on top of the operand stack in system M at run time. Finally, the rule <var> → <name> means that the names of program variables are translated into the sequence "variable <name>." Here, the effect of the "variable" command is to find a pointer to the data stored in the following <name> by system M and to place this pointer on top of the run-time operand stack. The sequence ".( <expr-sequence> )." on the right part of the remaining rule is the definition of an EULER function call. Function calls are translated with the parameters preceding the function name in the translated program.
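The run-time behavior of the "variable", "in", and arithmetic commands just described can be sketched as a small stack machine. This is a hypothetical illustration of how system M might execute the postfix stream; the dictionary-based memory and command names as Python strings are assumptions for the sketch, not the paper's implementation.

```python
# Hypothetical sketch of system M: execute the translated postfix stream
# on a run-time operand stack, with a dictionary standing in for memory.

def execute(code, memory):
    stack = []
    stream = iter(code)
    for cmd in stream:
        if cmd == "variable":
            stack.append(next(stream))            # push a pointer (here: the name)
        elif cmd == "in":
            stack.append(memory[stack.pop()])     # replace pointer by stored value
        elif cmd == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif cmd == "multiply":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif cmd == "assign":
            value = stack.pop()
            memory[stack.pop()] = value           # pointer was left without "in"
            stack.append(value)
    return stack

mem = {"x": 2, "y": 3, "z": 4}
print(execute(["variable", "x", "in", "variable", "y", "in",
               "variable", "z", "in", "multiply", "add"], mem))   # -> [14]
```

Note that an assignment target is pushed by "variable" alone, without "in", so that "assign" finds a pointer rather than a value beneath the expression result.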
In this way, the function call can be made to look like a reverse polish operator having n operands, with n the number of parameters. A parameterless function call is translated exactly the same way as a program variable. Thus, the sequence "variable <name> in" in a translated program serves both to fetch data and to initiate a call on a function, depending on the <name> involved. This calling sequence will be referred to in the following discussion of extendible language features. In the full translation grammar for EULER given in Appendix 2, it is possible to see how the methods presented in the preceding example are applied to the specification of a complete programming language. Note that this larger grammar uses, e.g., the symbol "+" in place of the "add" instruction of our small example, and, in general, translates as many source-language symbols as possible directly into commands of the intermediate language. The description of EULER programming given in Appendix 1 of this paper should clarify the meaning of the EULER operators used, and the following section in this paper will discuss the syntactic methods for optimizing and extending EULER as they are developed in the EULER grammar. A full description of the intermediate reverse-polish language specified by the EULER rules of translation can be found in Schneider.10

Syntactic methods of optimizing expressions

In the EULER grammar of Appendix 2, the rules of translation specify that a conditional statement or expression of the form

"IF <expr>1 THEN <expr>2 ELSE <expr>3"

is translated into its intermediate-language version in the form

"<expr>1 $IF <expr>2 $THEN <expr>3 $ELSE"

Note that each of the expressions here can themselves contain conditional expressions of any desired degree of nesting, and each of the subexpressions will be rearranged as shown above.
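The conditional form just shown is executed by interpretive scanning: a false operand at "$IF" causes a scan to the matching "$THEN", and reaching a "$THEN" causes a scan to its matching "$ELSE", the two markers nesting like balanced parentheses. A hypothetical Python sketch (boolean constants stand in for the code of the translated operands; the interpreter and its names are invented for illustration):

```python
# Sketch of the $IF/$THEN/$ELSE scanning discipline: $THEN and $ELSE
# behave like balanced parentheses around translated subexpressions.

def scan(code, i, target):
    # Find the marker matching the construct at hand, skipping nested ones.
    depth = 0
    while True:
        t = code[i]
        if t == "$IF":
            depth += 1
        elif t == target and depth == 0:
            return i
        elif t == "$ELSE":
            depth -= 1
        i += 1

def run(code):
    stack, i = [], 0
    while i < len(code):
        t = code[i]
        if t == "$TRUE":
            stack.append(True)
        elif t == "$FALSE":
            stack.append(False)
        elif t == "$IF":
            if not stack.pop():
                i = scan(code, i + 1, "$THEN")   # condition false: scan to $THEN
        elif t == "$THEN":
            i = scan(code, i + 1, "$ELSE")       # consequence done: skip alternative
        elif t != "$ELSE":                       # $ELSE is a bare placemarker
            stack.append(t)                      # translated operand value
        i += 1
    return stack

# "<disj> OR <conj>" translates to: <disj> $IF $TRUE $THEN <conj> $ELSE
print(run([False, "$IF", "$TRUE", "$THEN", True, "$ELSE"]))   # -> [True]
```

With a true first operand, execution pushes $TRUE and the $THEN scan skips the second operand entirely, which is exactly the partial optimization of logical expressions described below.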
In this intermediate language, the "$IF" command causes an interpretive scan to the matching "$THEN" label if <expr>1 is false. Otherwise execution continues until a "$THEN" is reached, at which point a scan occurs to the "$ELSE" label that matches this "$THEN". In this way, "$THEN" and "$ELSE" behave like balanced parentheses around expressions, and also serve as placemarkers to which control can be transferred in the translated program. This mechanism for executing translated conditional expressions is used also as the basis for translating logical expressions into a partially optimized form. To take an example, the EULER sequence corresponding to a disjunction is represented by "<disj> OR <conj>". Its translated form is "<disj> $IF $TRUE $THEN <conj> $ELSE". Here, if the first operand "<disj>" of the expression is true, the entire expression is true. Therefore, the second operand is evaluated only if the first operand is false. A similar mechanism is used for the sequence "<conj> AND <neg>". Here, if the first operand is false, the second operand need not be evaluated. Hence, the translated conjunction is of the form "<conj> $IF <neg> $THEN $FALSE $ELSE."

Some syntactic methods of extending EULER

After developing the appropriate techniques for translating conditional expressions and for optimizing logical expressions, the next order of business is to use these syntactic tricks to provide extended facilities in the EULER language. The introduction of full string-processing facilities into the EULER system is the first example to be considered. Without altering the EULER interpreter, and with a little reprogramming of the translator, we can effect the following improvement:

Syntactic Rule                           Rule of Translation
<prim> → <stringprim>                    |
<stringprim> → <stringhead> '            <stringhead> ).
<stringhead> → '                         .(
             | <stringhead> <symbol>     <stringhead> .* <symbol> ,
Here, a string of arbitrary length is translated into a list whose cells store the symbols in the string, one symbol to a cell, in sequence. With this arrangement, it is possible to manipulate strings using the list concatenation operator provided by EULER, and using EULER subroutines to perform tests for list equality and containment. The second example involves the addition of facilities for reading in data at run time within the framework of the EULER system. In this case, additional facilities must be provided in the EULER polish string interpreter. These facilities take the form of routines for converting numbers into their internal representation and for packing string data. The added syntax consists of the following set of rules:

Syntactic Rule                                   Rule of Translation
<program> → .ENTRY <block> .EXIT.                <block>
          | .ENTRY <data> ., <block> .EXIT.      <data> <block>
<data> → <datahead> END                          |
<datahead> → DATA <item>                         $DATA <item>
           | <datahead> ., <item>                |
<item> → <number> | <stringprim> | <datalist>    | | |
<datalist> → .( ).                               |
           | <datalisthead> <item> ).            |
<datalisthead> → .(                              |
               | <datalisthead> <item> ,         |

With this program structure, the data portion could be read in by a run-time subroutine that leaves the data in a pre-arranged location of memory. The interpreter routine could then be read in over the data routine, and the translated program would be executed. A statement of the form "READ <prim>" would then store an appropriate link to some segment of the read-in data on top of the run-time operand stack. The third example involves the use of a syntactic notation to expand the EULER language into a self-extendible programming language similar to MAD/I4 and ALGOL 68.11 By an extendible programming language, people currently mean the following two things.

a. A language in which the programmer can specify new data types and data structures composed of novel configurations of data elements.

b.
A language in which the programmer is able to reorder the priorities of expression operators and is able to specify arbitrary new operations at will.

In EULER, there already exists a general mechanism for allowing programmers to manipulate data structures, namely, the list mechanism. EULER lists can be constructed from arbitrary combinations of data elements. However, EULER only has eight data types, with no facilities for extending their ranges. Such range-extension facilities depend on the machine on which the language is implemented, and algorithms for specifying such data types as numbers of arbitrary precision must be written for the machine in question. Hence, our example will concentrate on the machine-independent problem of specifying new operators in programs. Any reasonable programming language must presuppose the existence of a standard set of expression operators before provision is made for allowing programs to expand this set of operators. With each standard operator will be associated a standard precedence level, and the operators to be introduced by the programmer must also have precedence levels. As the term is currently used, operator precedence (or priority) is a measure of how expression operators compare in binding power. For example, exponentiation is said to take precedence over addition, because exponentiation is performed before addition in arithmetic expressions. Thus, precedence imposes an ordering on the operations of a language. This ordering is reflected in the ordering of syntax rules in programming-language grammars. In the EULER grammar above, rules are ordered so that list concatenation is performed first, then exponentiation, and so on, until the operation of value assignment. From concatenation to assignment of value there are nine levels of precedence.
Our approach in providing for the programming of new operations is to assign these operations to one of nine classes of operators, reflecting the nine levels in the original grammar. This means that the translator must now treat operators as though they are procedure calls that can only be written into the translated program where their associated precedence level permits their operations to occur. In order to permit the programmer to tell the translator what precedence is associated with a newly defined operator, we require an additional operator declaration in our language. This declaration, together with the precedence syntax of expressions that follows, is sufficient to provide the expanded operator-definition facility.

Grammar 2. An Expression Grammar for Defining New Operators

Syntactic Rule                                 Rule of Translation
<expr> → <var> <opname> <expr>                 <var> <expr> $VARBL <opname> $IN
       | <disj>                                |
<disj> → <disj> <opname> <conj>                <disj> <conj> $VARBL <opname> $IN
       | <conj>                                |
<conj> → <conj> <opname> <neg>                 <conj> <neg> $VARBL <opname> $IN
       | <neg>                                 |
<catena> → <catena> <opname> <prim>            <catena> <prim> $VARBL <opname> $IN
         | <prim>                              |
<blockhead> → <blockhead> <operatordec> .,     <blockhead> <operatordec>
<operatordec> → OPERATOR <opname>              $NEW <opname>
              | <operatordec> , <opname>       <operatordec> $NEW <opname>
<expr> → <opname> = <opdef>                    <opname> <opdef> =
<opdef> → <defhead> <expr> $.                  |
<defhead> → <rankpart> <operandpart> .,        <rankpart> <operandpart>
<rankpart> → RANK OF <digit> .,                (Not Translated)
<operandpart> → OPERANDS <name>                $FORMA <name>
              | <operandpart> , <name>         <operandpart> $FORMA <name>
<opname> → <symbol>                            |
         | <opname> <symbol>                   |

In the expression syntax above, the <opname> in each rule is translated into a procedure call, with parameters consisting of the one or more operands associated with each <opname>. These procedure calls either refer to the "standard" operator associated with a particular precedence level or refer to the translated operator definition declared by the programmer.
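The rank scheme can be sketched as a parser that tries the nine operator classes in order, parsing lower-rank operands first and emitting each operator as a postfix procedure call on its <opname>. This is a hypothetical illustration; the RANKS table, its entries, and the function names are invented here, not taken from the paper's translator.

```python
# Sketch of rank-driven translation: each declared operator belongs to one
# of nine rank classes, and lower ranks bind more tightly (are parsed first).

RANKS = {"**": 2, "*": 3, "+": 4, "AND": 7, "OR": 8}   # illustrative ranks only

def translate_expr(tokens, max_rank=9):
    def parse(rank):
        if rank == 0:
            return [tokens.pop(0)]                # a primary
        out = parse(rank - 1)                     # operand of tighter rank
        while tokens and RANKS.get(tokens[0]) == rank:
            op = tokens.pop(0)
            # each operator becomes a procedure call on its <opname>
            out += parse(rank - 1) + ["$VARBL", op, "$IN"]
        return out
    return parse(max_rank)

print(translate_expr(["a", "*", "b", "+", "c"]))
# -> ['a', 'b', '$VARBL', '*', '$IN', 'c', '$VARBL', '+', '$IN']
```

Redeclaring an operator's rank is then just an update to the table the translator consults, which is why all assignment of precedence happens at translation time.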
It is assumed that the translator will automatically enclose each translated program with an extra outer block containing procedure definitions for the set of standard operators basic to the language. In this way, the standard operators can be redefined within a particular program, but will regain their usual meaning upon exit from the block in which the redefining statement occurred. A consequence of this method of allowing new operator definitions is that program subroutines may use operators global to their definitions, but may not have operators passed to them as parameters, since all assignment of precedence is performed at translation time. A certain amount of optimization is still possible within the framework of this extendible translator. As an example, suppose that we write the following procedure corresponding to the standard operator for logical conjunction:

AND = RANK OF 7., OPERANDS X, Y., IF Y THEN X ELSE FALSE $.

The actual parameters in the procedure call for logical AND above are expressions surrounded by ".$" and "$.". Thus, the effect of the conditional expression in the operator definition given above is to evaluate the Y parameter only once and not to evaluate the X parameter unless Y is true.

Programmer-defined syntactic augments to existing languages

As a next step in allowing programmers to decide on the nature of their own programming languages, we could conceive of a translator facility for allowing programmer-specified syntactic and semantic augments to existing programming languages. The idea behind this definitional facility is that the translator can be provided with facilities for accepting new syntactic rules and associating their right parts with rules of translation that are essentially calls on global procedures. The operands within the new syntactic augments are then translated as parameters supplied to the procedures for executing the augments.
The feasibility of such augments, provided they do not lead to problems of syntactic ambiguity, can be inferred from the algorithms presented in Schneider.9,10 As an example of what a programmer might be tempted to add to his language, and of the methods he could use, we consider the problem of adding ALGOL W-style iteration to the EULER language. In the following translation grammar, the global procedures used in translated programs are "$FOR" and "$WHILE", corresponding to the incremented-variable and logical iterations, respectively.

Grammar 3. Programmer-Defined Syntax of Iterative Statements

Syntactic Rule
(a) <expr> → WHILE <expr>1 DO <expr>2
(b) <expr> → FOR <var> FROM <expr>1 UNTIL <expr>2 BY <expr>3 DO <expr>4

Rule of Translation
(a) .$ <expr>1 $. .$ <expr>2 $. $VARBL $WHILE $IN
(b) <var> <expr>1 <expr>2 <expr>3 .$ <expr>4 $. $VARBL $FOR $IN

Note that the controlled statement in the syntax above is translated with procedure-definition brackets ".$" and "$.". In this way, whenever the corresponding formal parameter in the "$FOR" or "$WHILE" procedure definition is executed, the entire controlled statement is executed as a procedure. The procedure definitions of "$FOR" and "$WHILE" that follow are the "semantics" of Grammar 3:

$FOR = .$ FORMAL VAR, EXP1, EXP2, EXP3, STAT.,
  BEGIN LABEL TEST, CYCLE.,
    VAR = EXP1., GO TO TEST.,
    CYCLE.. VAR = VAR + EXP2.,
    TEST.. IF (VAR - EXP3) * SIGN(EXP2) GT 0 THEN UNDEFINED
      ELSE BEGIN STAT., GO TO CYCLE END
  END $.

$WHILE = .$ FORMAL LOGEXP, STAT.,
  BEGIN LABEL CYCLE.,
    CYCLE.. IF LOGEXP THEN BEGIN STAT., GO TO CYCLE END
      ELSE UNDEFINED
  END $.

The flowchart of Figure 2, showing the transitions to and from the box corresponding to <expr>, illustrates how the EULER translator was programmed.

[Figure 2—Flowchart of the transitions to and from the <expr> box of the EULER translator.]
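Because the controlled statement and the loop condition arrive as procedures, "$WHILE" can simply re-execute them each cycle. A rough Python analogue (a hypothetical sketch: zero-argument functions stand in for EULER's ".$ ... $." procedure brackets, and a dictionary stands in for the enclosing block's variables):

```python
# Rough analogue of the $WHILE semantics above: both parameters are
# unevaluated procedures, so each can be re-executed on every cycle.

def WHILE(logexp, stat):
    while logexp():   # re-evaluate the condition procedure each cycle
        stat()        # re-execute the controlled statement procedure

env = {"I": 0, "TOTAL": 0}

def body():
    env["I"] += 1
    env["TOTAL"] += env["I"]

WHILE(lambda: env["I"] < 4, body)
print(env["TOTAL"])   # -> 10
```

Had the condition been passed by value instead of as a procedure, the loop could never terminate; the ".$ ... $." brackets in the rule of translation are what make the iteration definable as an ordinary procedure.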
X1 → X2 <consequence>          X2 → <condition>

By letting X1 be THEN and X2 be IF in the translator, the coding is greatly simplified, and no ambiguities are introduced, since the Xi can be treated as "new and distinct" symbols of the normal-form grammar.

REFERENCES

1 R W FLOYD A descriptive language for symbol manipulation JACM Vol 8 1961 579-584
2 E T IRONS A syntax directed compiler for ALGOL 60 CACM Vol 4 1961 51-55
3 P M LEWIS R E STEARNS Syntax-directed transduction JACM Vol 15 1968 465-488
4 D L MILLS The syntactic structure of MAD/I DDC Rpt No AD-671-683 1968
5 P NAUR editor Revised report on the algorithmic language ALGOL 60 CACM Vol 6 1963 1-17
6 V B SCHNEIDER The design of processors for context-free languages NSF Memo Northwestern Univ 1965
7 V B SCHNEIDER Pushdown-store processors of context-free languages Dept of Industrial Engineering Northwestern Univ Evanston Ill 1966
8 V B SCHNEIDER Syntax-checking and parsing of context-free languages by pushdown-store automata Proc SJCC 1967 71-75
9 V B SCHNEIDER A system for designing fast programming language translators Proc SJCC 1969 777-792
10 V B SCHNEIDER A translator system for the EULER programming language Tech Rpt 68-76 Computer Science Center Univ of Md College Park 1969
11 A VAN WIJNGAARDEN editor Report on the algorithmic language ALGOL 68 Mathematisch Centrum 49 2e Boerhaavestraat Amsterdam The Netherlands 1969
12 J WEIZENBAUM A symmetric list processor CACM Vol 6 1963 524
13 N WIRTH A generalization of ALGOL CACM Vol 6 1963 547-554
14 N WIRTH H WEBER A generalization of ALGOL and its formal definition: Parts I and II CACM Vol 9 1966 13-25 89-99

Appendix 1

Features of the EULER language

EULER is a nested block-structure language, similar to ALGOL.
Thus, every block, consisting of a sequence of statements surrounded by BEGIN and END parentheses, can be treated as a single statement in ALGOL fashion. An EULER program consists of an EULER block preceded by .ENTRY and followed by .EXIT. In EULER, there are three declarations. One declaration is for data variables, one for program labels, and one for formal parameters of procedures. In the program

".ENTRY BEGIN NEW X, Y., LABEL Z., ... Z.. X + Y END .EXIT."

X and Y will store data, and Z will be a label preceding some statement. Assigning a data type to a declared variable is accomplished by writing an assignment statement with data of the appropriate type on the right-hand side of the assignment. Thus, typing of variables in EULER is dynamic, since any assignment statement can change the data type stored in a variable. And data typing is implicit, since there are no declarations like real, integer, etc., as appear in ALGOL. The following is a list of the eight EULER data types:

I. Number—In the EULER system, all numbers are assumed to be floating point numbers. The assignment statement "V = E.," with E a numerical expression or number, causes variable V to become a numerical variable.

II. Symbol—In this EULER implementation, an assignment statement such as "V = .*ALPHAN.," causes the six characters "ALPHAN" to be stored in the location named by variable V.

III. Logical—The logical constants are TRUE and FALSE, standing respectively for logical truth and falsehood. The assignment statement "V = L.," with L a logical constant or logical expression, causes variable V to become a logical variable.

IV. Label—EULER programs use two declarations. "NEW" is used to declare a data variable, and "LABEL" is used to declare the presence of a label in some block of a program.
Interestingly, if V is a variable in some EULER block, and V is not in a block global to the block of label L, then the assignment statement "V = L.," causes V henceforth to be of type label, and to be interchangeable with L in GO TO statements.

V. Reference—In EULER, if V1 is a variable not in a block global to the block of variable V2, then the assignment statement "V1 = AT V2.," makes V1 a pointer to the data stored in V2. After V1 is turned into such a pointer, the two statements

"V2 = V2 + 1.," and "V1 IN = V1 IN + 1.,"

will have exactly the same effect of manipulating whatever data is stored in V2.

VI. Procedure—An assignment statement of the form "V1 = .$ <expr> $..," causes V1 to become the name of a parameterless procedure call with body given by <expr>. As a programming example, we might consider the following EULER block:

"BEGIN NEW X, Y., X = 2., Y = .$ FORMAL Z., X + Z $.., OUT Y.(5). END"

When Y.(5). is operated on by the "OUT" operator, the value 7.0000 will be written out.

VII. List—In EULER, lists can be constructed in three distinct ways:

(a) On command: "V1 = LIST 300.," This statement creates a list of 300 undefined cells and makes V1 their name.

(b) By explicit notation: "V2 = .(1, .(2, 3)., 4).," This statement creates a list consisting of two numbers and a sublist and makes V2 the name of that list.

(c) By concatenation: "V1 = V1 CONCAT V2.," Using the CONCATenation operator, small lists can be joined into larger ones.

In addition, lists can be subscripted in the same way as ALGOL arrays, and each element of a list can be any EULER data type, including label, reference, and procedure. The following EULER block is a small example of the generality of the list notation:

"BEGIN NEW X, Y., LABEL Z.,
Y = .(2, .$ BEGIN X = X + 1., Y(X) END $., .$ OUT X $., Z).,
X = Y(1)., Y(X)., GO TO Y(4)., Z..
OUT .*FINISH END"

With this program segment, first 3.0000, then FINISH will be written out by the executed program.

VIII. Undefined—Every variable declared by "NEW" in an EULER program is initially of type "UNDEFINED." In addition, "UNDEFINED" is used as a data constant occasionally and as an empty option in conditional statements such as:

"V = IF L1 THEN .(1, 5). ELSE UNDEFINED.,"

For more details on EULER programming, the reader is referred to the Wirth and Weber EULER paper.14

Appendix 2

A new translation grammar for EULER

Syntactic Rule                                 Rule of Translation
1: <program> → .ENTRY <block> .EXIT.           <block>
2: <block> → <blockhead> <body> END            <blockhead> <body> $END
3: <blockhead> → BEGIN                         $BEGIN
     | <blockhead> <labeldec> .,               <blockhead> <labeldec>
     | <blockhead> <vardec> .,                 <blockhead> <vardec>
4: <vardec> → NEW <name>                       $NEW <name>
     | <vardec> , <name>                       <vardec> $NEW <name>
5: <labeldec> → LABEL <name>                   $LABEL <name>
     | <labeldec> , <name>                     <labeldec> $LABEL <name>
6: <body> → <body> ., <stat>                   |
     | <stat>                                  |
7: <stat> → <labdef> <stat>                    |
     | <expr>                                  |
8: <labdef> → <name> ..                        $LBDF <name>
9: <expr> → GO TO <expr>                       <expr> $GOTO
     | OUT <expr>                              <expr> $OUT
     | <var> = <expr>                          <var> <expr> =
     | <disj>                                  |
     | <condition> <consequence> <alternative> |
10: <condition> → IF <expr>                    <expr> $IF
11: <consequence> → THEN <expr>                <expr> $THEN
12: <alternative> → ELSE <expr>                <expr> $ELSE
13: <disj> → <conj>                            |
     | <disj> OR <conj>                        <disj> $IF $TRUE $THEN <conj> $ELSE
14: <conj> → <neg>                             |
     | <conj> AND <neg>                        <conj> $IF <neg> $THEN $FALSE $ELSE
15: <neg> → <relation>                         |
     | NOT <relation>                          <relation> $NOT
16: <relation> → <sum>                         |
     | <sum>1 <relop> <sum>2                   <sum>1 <sum>2 <relop>
17: <relop> → EQ|NEQ|GEQ|LEQ|GT|LT             $EQ|$NEQ|$GEQ|$LEQ|$GT|$LT
18: <sum> → <term>                             |
     | + <term>                                <term>
     | - <term>                                <term> $NEG
     | <sum> {+|-} <term>                      <sum> <term> {+|-}
19: <term> → <factor>                          |
     | <term> {*|/|./.|MODULO} <factor>        <term> <factor> {*|/|./.|$MODUL}
20: <factor> → <catena>                        |
     | <factor> ** <catena>                    <factor> <catena> **
21: <catena> → <prim>                          |
     | <catena> CONCAT <prim>                  <catena> <prim> $CONCA
22: <prim> → UNDEFINED                         $UNDEF
     | <var>                                   <var> $IN
     | <label>                                 <label> $IN
     | ( <expr> )                              <expr>
     | <block>                                 |
     | <procdef>                               |
     | <referenceprim>                         |
     | <listprim>                              |
     | <numberprim>                            |
     | <logicalprim>                           |
     | TAIL <prim>                             <prim> $TAIL
     | <var> . ( <expr-sequence> ) .           <expr-sequence> <var> $IN
     | <symbolprim>                            |
23: <label> → <name>                           $VARBL <name>
24: <var> → <name>                             $VARBL <name>
     | <var> IN                                <var> $IN
     | <var> ( <sum-sequence> )                <var> ( <sum-sequence> )
25: <expr-sequence> → <expr>                   |
     | <expr-sequence> , <expr>                <expr-sequence> <expr>
26: <sum-sequence> → <sum>                     |
     | <sum-sequence> , <sum>                  <sum-sequence> <sum>
27: <referenceprim> → AT <var>                 <var>
28: <listprim> → <list>                        |
     | LIST <sum>                              <sum> $LIST
29: <list> → .( ).                             |
     | <listhead> <expr> ).                    |
30: <listhead> → .(                            |
     | <listhead> <expr> ,                     |
31: <numberprim> → <number>                    $NUMBR <number>
     | REAL <disj>                             <disj> $REAL
     | LENGTH <catena>                         <catena> $LENGT
     | ABSOLUTE <sum>                          <sum> $ABSOL
     | INTEGER <sum>                           <sum> $INTEG
32: <logicalprim> → TRUE                       $TRUE
     | FALSE                                   $FALSE
     | LOGICAL <sum>                           <sum> $LOGIC
     | <typeinquiry> <var>                     <var> <typeinquiry>
33: <typeinquiry> → ISNU|ISLO|ISLA|ISLI|ISPR|ISRE|ISSY|ISUN
                                               $ISNU|$ISLO|$ISLA|$ISLI|$ISPR|$ISRE|$ISSY|$ISUN
34: <symbolprim> → .* <6-symbol string>        |
35: <procdef> → <prochead> <expr> $.           |
36: <prochead> → .$                            .$
     | <prochead> <formaldec> .,               <prochead> <formaldec>
37: <formaldec> → FORMAL <name>                $FORMA <name>
     | <formaldec> , <name>                    <formaldec> $FORMA <name>
38: <6-symbolstring> → { <letter> | <digit> | <blank> | , | . | $ | * | ? | = | + | -
(prochead) .$ I (prochead) (formaldec )., (formaldec) ~ FORMAL (name) (formaldec ), (name) (6-symbolstring) { (letter)1 (digit) (blank) I,I·I$I*I?I = 1+1- Rule of Translation (var) $IN (label) $IN (expr) I I I I I I (prim) $TAIL (expr-sequence) (val') $IN I $VARBL (name) $VARBL (name) (val') $IN (val' ) (sum-sequence») I (expr-sequence) (expr) I (sum-sequence») (sum) (val') I (sum) $LIST I I I $NUMBR (number) (disj ) $REAL (catena) $LENGT (sum) $ABSOL (sum) $INTEG $TRUE $FALSE (sum) $LOGIC (val' ) (typeinquiry ) $ISNU I$ISLO I$ISLA I$ISLI I$ISPR I$ISRE I$ISSYI$ISUN I I .$-(prochead) (formaldec ) $FORMA (name) $FORl\1A (name) (formal dec ) I i>1<}6 (i.e., a string of 6 characters.) 39: (name) ~ (letter) I 155 156 Fall Joint Computer Conference, 1969 ----------------------~-------------------------------------------------,-- Syntactic Rule I(name> (letter> I (name> (digit> 40: 41: 42: 43. Rule oj Translation I I (For the IBlYI 7094 and the UNIVAC 1108, only the first six characters of a (name> are translated.) (number) ---'? (integer> Converted to octal. I(integer). (integer) Converted to octal floating point. (integer> ---'? (digit> I (integer> (digit> (digit> ---'? 0111 ... 19 I (letter) ---'? AI ... IZ I -- SYMPLE-A general syntax directed ~acro preprocessor by JAMES E. VANDER MEY The Pennsylvania State University University Park, Pennsylvania ROBERT C. VARNEY The Pennsylvania State University McKeesport, Pennsylvania and ROBERT E. PATCHEN IBM Corporation Boston, Massachusetts INTRODUCTION The subject of this paper is a general syntax directed macro preprocessor system. One of the suggested potential uses of this system is that of evaluating new or extended programming languages by the technique of syntax directed macros. This led to the association of the acronym SYl\1PLE (SYntax Macro Preprocessor for Language Evaluations) with this system. A preprocessor is a processor intended to be used prior to another processing stage. 
In our case, it is assumed that the SYMPLE preprocessor system will generally be used in processing higher level language texts (ones which are user oriented), producing output text in the same or a similar higher level language. The term "macro" is used in a very general sense in this paper. As in other macro systems, the macro mechanism consists of the recognition of a macro "reference" in the source text being processed, and a macro "definition" defining a translation procedure invoked by some corresponding macro reference. A SYMPLE macro definition consists of two parts: the "macro semantic portion" or "macro body"; and the "macro templates." The macro semantic portion is the translation procedure and consists of the instructions to be executed when the macro is "invoked". A macro is invoked when a pattern described in one of its macro templates is recognized by the parser in the source input text. This macro reference pattern may have identifiable parts which are then considered as arguments for the semantic portion. A macro template defines a possible macro reference pattern for this macro and consists of two distinct parts: a specification of a general syntactic substructure of the source input text in which a given macro reference may occur (i.e., context); and any necessary further syntactic qualifications within that general syntactic substructure (e.g., a specific pattern). The actual pattern matching technique for macro reference is thus a two level syntax directed matching procedure. This syntax directed macro reference technique is the method by which SYMPLE achieves both simplicity and generality. The SYMPLE system as a macro system is not tied to any particular programming language. The base (source input) language and the object (output) language of the macro facility could in fact be entirely different languages.
The syntax of the languages to be processed and/or extended must be adequately described through the syntax description metalanguage of the SYMPLE system. This syntactic description is used for determining "context" for macro references, and thus the requirements for a minimally "adequate" syntactic description of a language are proportional to the degree of context required to isolate macro references. As a very simple example, assume all macro references must occur in only a single specific syntactic unit (syntactic substructure) of the base language (e.g., only labels of Fortran statements). Then to facilitate the recognition of macro references in the source language, the syntax of the base language need only be described via the metalanguage to the extent that it can isolate this syntactic unit type (i.e., Fortran labels). When recognized, this syntactic unit will then be considered as a candidate for containing a macro reference. After a candidate syntactic unit is isolated in the source input, a check can be made for the existence of specific macro references by testing for further qualifying patterns within that syntactic unit. For instance, a Fortran label of "three blanks followed by two numbers" might be a specific macro reference. A check would thus be made for this reference according to the syntactic pattern defining "three blanks followed by two numbers" whenever a Fortran label is recognized. This process of local syntax investigation is called "template matching" for a macro reference. It is also through the template matching facility that translation parameters in the source language (e.g., arguments, conditions, etc.) are recognized and passed to the actual macro facility. These translation parameters, which we shall call argument strings, can be manipulated by the instructions contained in the body of the macro (semantic portion).
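The two-level matching just described can be sketched in miniature. This is a hypothetical illustration only (the template table, pattern, and semantic action are invented): level one isolates the candidate syntactic unit, a Fortran label field in columns 1-5, and level two tests macro templates inside it, passing the matched parts on as argument strings.

```python
import re

# Level 1 isolates the syntactic unit (a Fortran label field); level 2
# tests macro templates within it and passes argument strings to the
# macro's semantic portion.

TEMPLATES = [
    # "three blanks followed by two numbers" is a specific macro reference
    (re.compile(r"^ {3}(\d{2})$"), lambda digits: "L" + digits + "  "),
]

def preprocess_line(line):
    label = line[:5]                        # level 1: the syntactic unit
    for pattern, semantics in TEMPLATES:    # level 2: template matching
        m = pattern.match(label)
        if m:
            # argument string m.group(1) goes to the semantic portion,
            # whose result is inserted in place of the reference
            return semantics(m.group(1)) + line[5:]
    return line

print(preprocess_line("   42 CONTINUE"))   # -> 'L42   CONTINUE'
```

Only label fields matching a template are rewritten; all other lines pass through untouched, which is the sense in which the translation is "in place."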
Since the primary function of the SYMPLE system is that of a preprocessor, the translation process is mainly that of a manipulation of argument strings and the insertion of modified and/or created strings back into the source input. Hence, the actual semantic portion of the macro is implemented in a language oriented to the manipulation of character strings. Thus translation due to macro references and related translation parameters generally results in the insertion of the translation code in the base language into the body of the code being processed. It will be shown that this "in place" translation in the SYMPLE system does not necessarily imply expansion in exactly the same place (i.e., at the lexicographical location of the macro reference).

[Figure 1—A general flow of the SYMPLE macro preprocessor system.]

An attempt will now be made to summarize and interrelate the functions of the SYMPLE system by outlining the system functional flow via a system flow diagram (Figure 1) and the following brief description. The preprocessor operates as follows:

1. The first items processed contain control information which includes such items as the device(s) from which subsequent information is to be read, the device(s) designated for system output, the names of special edit macros, specific listing options, etc. Control information may occur in the input stream at other logical stages of processing.

2. A description of the base language syntactic structure is read as input and processed to build a data base for the recognition portion. This data base will be used later by a parser.

3. Macros (templates and associated semantic translation routines) are read in, stored, and used to create necessary data bases for later processing.

4. A source deck is read in and parsing of the source input begins. (Probable entry point for most users.)

a.
As a syntactic unit is recognized, a check is made to see if any macros have templates to be matched in this syntactic unit. Templates of edit macros, if any, are tested last. When there are no templates left to be checked and if the end of the total parse has not been encountered, the parse is continued.
b. If a macro template match is successful, the argument strings are passed to its associated macro semantic portion. There may be any number of macro templates associated with a given macro semantic portion, and identical template patterns can be associated with different macro semantic portions.
c. The instructions in the current macro semantic portion are executed (actually interpreted) and the results of their operations are effected (e.g., storage manipulation, insertion of translation into input source, dynamic creation of new macro templates or semantics for this or other macros). Upon completion of execution, control is returned to 4a above.
5. When the source deck has been completely parsed and thus source time translations, including any necessary editing, have been completed, the file is then ready for output in a manner specified by the control information.
6. Processing is now completed, but by appropriate control information another cycle may be initiated on (a) new information or (b) a previous preprocessor output file. Thus, in the latter case, we have the possibility of a multipass preprocessor, if desired.

The remainder of this paper will be devoted in the main to the details of what the SYMPLE system can do and in general how one goes about using the SYMPLE system. The syntax description metalanguage is introduced first, followed by an introduction to the macro translation (semantic) and insertion capabilities of SYMPLE.

Syntax description metalanguage

The syntax description metalanguage is used to describe a parsing "grammar" of the base language in which macro references are to be embedded and thereby outline the manner in
which the source input is to be parsed. For example, suppose a label field is one syntactic structure to be parsed. The parser should then be told that a label field consists of, say, five characters which are either all digits, all blanks, or a string of blanks followed by a string of digits.

The grammatical metalanguage used to direct SYMPLE's parser is similar to the Backus-Naur Form4 (BNF) metalanguage. For example, similar grammatical productions are used to define syntactic structures; the nonterminals and terminals of BNF are also used, being renamed syntactic units and literal strings, respectively. There are, however, several features in SYMPLE's metalanguage which were incorporated to extend the power and simplicity of grammatical description over that of standard BNF. Actual productions in SYMPLE's metalanguage to define the parsing desired in the preceding example are

(LABEL-FIELD): 5&5(0$' ' 0$(DIGIT))
(DIGIT): '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'

The first production above is interpreted as: a label field is defined as not less than five nor more than five characters of a string of zero or more blanks immediately followed by zero or more digits.

Productions

The syntactic units of the base language are defined by productions in the metalanguage. These productions are of the form:

(LHS): right side

where (LHS) represents the syntactic unit being defined on the left side and the right side contains metalinguistic descriptions of other syntactic unit(s) and/or literal string(s) in the left to right order in which they comprise the structure of (LHS). The colon (:) separates the defined syntactic unit on the left side from the defining information on the right side. The first production of the base language grammar must be the definition of the syntactic unit representing the total syntactic structure of the base language (i.e., the initial or distinguished symbol of BNF). Other productions may be in any order.
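For comparison, the (LABEL-FIELD) production above can be approximated with a short Python check. This is an illustrative regex analogy only, not SYMPLE's actual mechanism, and the function name is invented here:

```python
import re

# Illustrative analogy for (LABEL-FIELD): 5&5(0$' ' 0$(DIGIT)) --
# exactly five characters consisting of zero or more blanks immediately
# followed by zero or more digits.
def is_label_field(field: str) -> bool:
    return len(field) == 5 and re.fullmatch(r" *[0-9]*", field) is not None
```

Thus "  100" and "     " are accepted, while "100  " (digits before blanks) and "1234" (only four characters) are rejected.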
(Named) Syntactic units

The metalinguistic representation of a syntactic unit in a production is a string of arbitrary length enclosed in parentheses. The string (called the name of the syntactic unit) may be composed of any characters with the exception of those used as special delimiters in the syntax description metalanguage (i.e., illegal characters are ( ) : ; ' | $ &).

Literal strings

A literal string is represented in the metalanguage by the desired string of characters enclosed in single quotation marks ('). Any character may be used within a literal string, except that a single quotation mark is represented by two adjacent single quotes for each occurrence in the literal string in order to differentiate it from the ending delimiter of the literal string.

Alternatives

If a syntactic unit in the base language may have alternative representations, these alternatives may be represented in the metalanguage as a single production with the alternatives of the syntactic unit each appearing on the right side and separated from each other by the conventional OR symbol (|). Example:

(DIGIT): '1'|'2'|'3'|(OTHER)

Complex substructures (Unnamed syntactic units)

If one does not wish to break down and label a syntax substructure in detail, but simply label an entire complex substructure as a syntactic unit, pairs of parentheses may be used as grouping indicators. Consider the following equivalent examples of a definition of the syntactic unit (NUM4).

Example:

(NUM): '2'|'3'|'4'
(NUM2): '3'|'4'|'5'
(NUM3): '5'|'6'|'7'
(NUM4): '1'(NUM)(NUM2) | '1'(NUM3)

Example:

(NUM4): '1'(('2'|'3'|'4')('3'|'4'|'5') | ('5'|'6'|'7'))

Grouping may occur to any depth desired and each quantity within the grouping parentheses must have the form of any legal right side of a production.

Quantity repetition and bounds

Often in the syntax of a base language a (named or unnamed) syntactic unit or literal string may be required to occur several times.
Or it may be desirable to specify that a syntactic structure be a function of the length of an input string in addition to other qualifications (e.g., a label field of exactly five characters and consisting of ...). To indicate either the repetition of a string (i.e., the input string defined by a syntactic structure) or the length bound on the number of characters in some string, an operator group must precede the respective quantity in the syntax. The operator group is of the form n$m or n&m for the string and character counters respectively, where n is an integer representing the lower bound and m an integer representing the upper bound. Consider the following example.

(A): 3$3(SUB-STRUCTURE)
(B): 3&3(SUB-STRUCTURE)
(C): 'C'
(SUB-STRUCTURE): 0$5(C) 1$3'AB'

The first production defines (A) as exactly three strings of (0$5(C) 1$3'AB'). Thus, acceptable strings for (A) might be ABABAB or ABCABCCCCCABAB or CCABABCABAB, etc. However, (B) is defined as exactly three characters which are otherwise defined as in (A). Thus, (B) can be only CAB; no other combinations will yield exactly three characters. Notice that the string counter differs from the character counter in that it is distributed over all inner strings whereas the character counter represents an absolute bound over a given substructure. When productions include quantities with repetition counts, the parser which utilizes these productions will attempt to find the largest number of those quantities in the input source consistent with the upper bound of repetitions. If the input contains more than the upper bound of these quantities, the input string corresponding to the upper bound count of quantities will be recognized and succeeding repetitions will be analyzed according to the syntax following. A lower bound count of zero is allowable and simply indicates the optional omission of the quantity. The absence of an explicit lower bound implies a lower bound of one.
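The distinction between the string counter ($) and the character counter (&) in the (A)/(B) example can be mimicked in Python. This is a hypothetical regex analogy; the names and patterns are invented here and are not SYMPLE's mechanism:

```python
import re

# (SUB-STRUCTURE): 0$5(C) 1$3'AB' -- zero to five C's then one to three AB's.
SUB = r"C{0,5}(?:AB){1,3}"

def matches_A(s: str) -> bool:
    # (A): 3$3(SUB-STRUCTURE) -- the string counter demands exactly three
    # repetitions of the substructure, whatever their total length.
    return re.fullmatch(f"(?:{SUB}){{3}}", s) is not None

def matches_B(s: str) -> bool:
    # (B): 3&3(SUB-STRUCTURE) -- the character counter demands that the
    # substructure as a whole span exactly three characters.
    return len(s) == 3 and re.fullmatch(SUB, s) is not None
```

As in the text, matches_A accepts strings such as ABABAB or CCABABCABAB, while matches_B accepts only CAB.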
The absence of an explicit upper bound implies an upper bound which is the maximum bound allowable in the system. In the present implementation it is 32767. It should be noted that 1$1(SYUN) and (SYUN) are equivalent, as are $(SYUN) and 1$32767(SYUN).

Complement look-ahead

The symbol ¬ preceding a literal string, syntactic unit or grouping indicates that at that point in the syntax the quantity indicated must not occur. This is called a complement look-ahead for the indicated quantity at SYMPLE parse time. If the quantity is found, the parse being attempted has failed. (Any syntactic units found on the look-ahead will not result in macro template match attempts.) If the quantity is not found, the parse continues as before the complement look-ahead. Example:

(LETTER): 'A'|'B'|'C'|'D'|'E'
(SPLTRSTRG): $(¬'C'(LETTER))

The strings recognized as (SPLTRSTRG) will be any string which consists of one or more of A, B, D or E, but not C.

Scan positioning

The production defining a syntactic unit can be made to include, without investigation as to structure, an arbitrary length of input, or it may require that a particular syntactic unit in the input conform to more than one syntactic structure. This is done by explicitly positioning the location at which the parser is "looking." This location, called the scan position, can be adjusted either relative to its present position or to the beginning reference points in the syntax of the parsed input.

a-X (Space) positioning

The occurrence of the symbol X immediately followed by an unsigned integer number and delimited by bracketing commas at any point in the right side of a production will cause the scan position to be adjusted rightward from its present location the integer number of positions specified.
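The complement look-ahead described above corresponds closely to the negative lookahead of modern regular expressions. A hypothetical Python rendering of the (SPLTRSTRG) example (the names here are invented, and the regex is only an analogy for the metalanguage construct):

```python
import re

# Illustrative analogy for (SPLTRSTRG): $(¬'C'(LETTER)) -- one or more
# letters from A-E, where each position is first checked, without consuming
# input, not to be a C (the (?!C) negative lookahead).
def is_spltrstrg(s: str) -> bool:
    return re.fullmatch(r"(?:(?!C)[A-E])+", s) is not None
```

So "ABDDE" is recognized, while "ABCDE" fails at the C and the empty string fails the one-or-more repetition.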
The symbol X and following number must be bracketed on both sides by commas except in the following cases: X is the first (last) symbol of a grouping level or the first (last) symbol of the right side of a production, in which case the left (right) comma is not required.

Example: Define an (END-CARD) to be an 80 character string. The first six characters must be blanks, the next 66 characters must have the word END somewhere with the rest blanks, and the last eight characters may be anything.

(END-CARD): 6&6' ' 66&66(0$' '('END')0$' '), X8

b-T (Tab) positioning

The format is similar to that of X positioning, except a T is used instead of an X. The T scan positioning results in the scan position being moved the specified number of places to the right of the beginning location at which the parse began at (1) this grouping level, if the T positioning appears within a grouping parenthesis pair, or (2) the right side of the production otherwise.

Example: A syntactic unit (EMPLOYEE-NO.) is defined to be an 80 character string with a syntactic unit (LAST-NAME) beginning in position one, followed by a single blank and then the syntactic unit (FIRST-NAME). Exactly 15 spaces after the beginning of (FIRST-NAME) is to appear the syntactic unit (CODE). Finally (NUMBER) will be 75 spaces from the beginning of (EMPLOYEE-NO.).

(EMPLOYEE-NO.): (LAST-NAME)' '((FIRST-NAME), T15, (CODE)), T75, (NUMBER)

Recursive grammars in the metalanguage

Recursive grammars (i.e., productions with the syntactic unit of the left side occurring as well on the right side, or being in the derivation of a syntactic unit of the right side) are allowed in the metalanguage subject to certain conditions. For instance, left recursive productions are not allowable, but other recursive productions are allowable. Further, the character (&) bound counts are cumulative
from the initial (top) occurrence in a recursive parse while the repetition bounds ($) are effective at each level of recursion.

Non-specific grammars in the metalanguage

Let a non-specific grammar be one in which the particular alternatives of structure for a syntactic unit may have structurally the same headings (i.e., leading components which are structurally the same). The metalanguage allows the specification of such grammars and at recognition time the parser always picks the first specified (or left most) alternative as its initial guess. Subsequent guesses continue with the next specified alternatives. The user must be aware of the possible consequences if the apparent ambiguity in a non-specific grammar causes the recognition of syntactic units to be rejected later as a result of an unsuccessful parse. Though the back-up to the next alternative is handled automatically by the parser, the syntactic units recognized may result in macro invocations, the results of which will not automatically be negated. Relevant user aids in this area are provided by the system.

The following example illustrates a parsing grammar for a language which is context sensitive and not context free and which utilizes recursive productions.

L = {0ⁿ1ⁿ0ⁿ : n ≥ 1}

(LANG): (LSTR) ¬'1', T1, $'0' (RSTR)
(LSTR): '0'(LSTR)'1' | '01'
(RSTR): '1'(RSTR)'0' | '10'

The parser first determines that the input string belongs to the context-free language 0ⁿ1ⁿx; checks to make sure x does not begin with a 1; repositions to the beginning of the parsed substring of 1's and then determines that the remaining substring of the input string belongs to the context-free language 1ⁿ0ⁿ. If the above conditions are true, then the input string belongs to the context-sensitive language 0ⁿ1ⁿ0ⁿ.

The SYMPLE macro facility

The macro facility of SYMPLE provides the actual translation mechanisms.
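Returning to the grammar for L above: the recognition strategy it encodes (match a 0ⁿ1ⁿ prefix, complement look-ahead for '1', reposition past the zeros, match 1ⁿ0ⁿ over the rest) can be sketched in Python. This is purely illustrative; SYMPLE expresses the same strategy with the productions themselves, and the explicit length comparisons here stand in for what the recursive (LSTR) and (RSTR) productions enforce structurally:

```python
import re

# Hypothetical sketch of the recognition strategy for L = {0^n 1^n 0^n : n >= 1}.
def in_lang(s: str) -> bool:
    m = re.match(r"(0+)(1+)", s)              # (LSTR): prefix in 0^n 1^n
    if m is None or len(m.group(1)) != len(m.group(2)):
        return False
    n = len(m.group(1))
    if s[m.end():].startswith("1"):           # complement look-ahead: not '1'
        return False
    tail = s[n:]                              # reposition past the zeros
    m2 = re.fullmatch(r"(1+)(0+)", tail)      # (RSTR): remainder in 1^n 0^n
    return m2 is not None and len(m2.group(1)) == n and len(m2.group(2)) == n
```

For example, "010" and "000111000" are accepted, while "0110" and "00110" are rejected.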
The macros themselves are read into the system following the base language grammar and prior to the user's source deck. The individual macro definitions are described in this section.

MACRO FORMAT

The overall format of an individual macro definition is as follows:

<macro name> (<syntactic unit>) = <template body> / (<syntactic unit>) =