User Manual: 1972-12_#41_Part_1
Page Count: 666
AFIPS CONFERENCE PROCEEDINGS VOLUME 41 PART I 1972 FALL JOINT COMPUTER CONFERENCE December 5 - 7, 1972 Anaheim, California The ideas and opinions expressed herein are solely those of the authors and are not necessarily representative of or endorsed by the 1972 Fall Joint Computer Conference Committee or the American Federation of Information Processing Societies, Inc. Library of Congress Catalog Card Number 55-44701 AFIPS PRESS 210 Summit Avenue Montvale, New Jersey 07645 ©1972 by the American Federation of Information Processing Societies, Inc., Montvale, New Jersey 07645. All rights reserved. This book, or parts thereof, may not be reproduced in any form without permission of the publisher. Printed in the United States of America CONTENTS PART I OPERATING SYSTEMS Properties of disk scheduling policies in multiprogrammed computer systems .. " ........................................ , ....... . The interaction of multiprogramming job scheduling and CPU scheduling .................................................. . 1 T. J. Teorey 13 23 J. C. Browne J. Lan F. Baskett D. Murphy 33 K. Levitt Exact calculation of· computer network reliability .................. . 49 R. Wilkov E. Hansler G. McAuliffe A framework for analyzing hardware-software trade-offs in fault tolerant computing systems ................................... . 55 K. M. Chandy C. V. Ramamoorthy A. Cowan Automation of reliability evaluation procedures through CARE-The computer aided reliability estimation program .................. . An adaptive error correction scheme for computer memory systems .. . 65 83 Dynamic configuration of system integrity ........................ . 89 F. P. Mathur A. M. Patel M. Hsiau B. Borgerson Storage organization and management in TENEX ................. . The application of program-proving techniques to the verification of synchronization processes ..................................... . ARCHITECTURE FOR HIGH SYSTEM AVAILABILITY COMPUTING INSTALLATIONS-PROBLEMS AND PRACTICES The in-house computer department .............................. . A computer center accounting system ............................ . An approach to job billing in a multiprogramming environment ..... . 97 105 115 Facilities managemBnt-A marriage of porcupines ................. . 123 J. Pendray F. T. Grampp C. Kreitzberg J. Webb D. C. Jung COMPUTER GRAPHICS Automated map reading and analysis by computer ................ . 135 Computer generated optical sound tracks ......................... . 147 Simulating the visual environment in real-time via software ......... . Computer animation of a bicycle simulation ...................... . 153 161 An inverse computer graphics problem ........................... . 169 R. H. Cofer J. Tou E. K. Tucker L. H. Baker D. C. Buckner R. S. Burns J. P. Lynch R.·D. Roland W. D. Bernhart SOFTWARE ENGINEERING-THEORY AND PRACTICE (PART I) Module connection analysis-A tool for scheduling software debugging activities ................................................... . Evaluating the effectiveness of software verification-Practical experience with an automated tool ............................... . A design methodology for reliable software systems ................ . A summary of progress toward proving program correctness. . . ..... . 173 F. M. Haney 181 191 201 J. R. Brown B. H. Liskov T. A. Linden 213 221 D. J. Kuck J. Watson 229 J. A. Rudolph SIFT-Software Implemented Fault Tolerance .................... . TRIDENT-A new maintenance weapon ........................ . 
Computer system maintainability at the Lawrence Livermore Laboratory ................................................. . 243 255 J. H. Wensley R. M. Fitzsimons 263 The retryable processor ........................................ . 273 J. M. Burk J. Schoonover G. H. Maestri 279 287 299 G. J. Nutt T. E. Bell A. De Cegama LOGOS and the software engineer ............................... . Some conclusions from an experiment in software engineering techniques .................................................. . Project SUE as a learning experience ............................ . 311 C. W. Rose 325 331 System quality through structured programming .......... , ....... . 339 D. L. Parnas K. C. Sevcik J. W. Atwood M. S. Grushcow R. C. Holt J. J. Horning D. Tsichritzis F. T. Baker SUPERCOMPUTERS-PRESENT AND FUTURE Supercomputers for ordinary users ............................... . The Texas Instruments advanced scientific computer .............. . A production implementation of an associative array processorSTARAN .................................................. . MAINTENANCE AND SYSTEM INTEGRITY COMPUTER SIMULATIONS OF COMPUTER SYSTEMS Evaluation nets for computer system performance analysis ......... . Objectives and problems in simulating computers .................. . A methodology for computer model building ...................... . SOFTWARE ENGINEERING-THEORY AND PRACTICE (PART II) ARCHITECTURE LIMITATIONS IN LARGE-SCALE COMPUTATION AND DATA PROCESSING (P anel Discussion-No Papers in this Volume) ARRAY LOGIC AND OTHER ADVANCED TECHNIQUES An application of cellular logic for high speed decoding of minimum redundancy codes ........................................... . 345 On an extended threshold logic as a unit cell of array logics ......... . Multiple operand addition and multiplication ..................... . 353 367 Techniques for increasing fault coverage for asynchronous sequential networks .................................................. , . 375 L. R. Hoover J. H. Tracey K.Ohmori K. Nezu S. Naito T. Nanya R. Mori R. Waxman S. Singh ADVANCES IN SIMULATION System identification and simulation-A pattern recognition approach .................................................. , . Horizontal domain partitioning of the Navy atmospheric primitive equation prediction model .................................... . 385 W. J. Karplus 393 An analysis of optimal control system algorithms .................. . 407 Computer simulation of the metropolis ........................... . 415 E. Morenoff P. G. Kesel L. C. Clarke C. N. Walter G. H. Cohen B. Harris 423 425 S. Rothman R. F. Boruch 435 R. Turn N. Z. Shapiro 445 J. M. Carroll Hardware-software trade-offs-Reasons and directions ............. . A design for an auxiliary associative parallel processor ............. . 453 461 An eclectic information processing system ........................ . 473 R. L. Mandell M. A. Wesley S. K. Chang J. H. Mommens R. Cutts H. Huskey J. Haynes J. Kaubisch L. Laitinen G. Tollkuhn E. Yarwood PRIVACY AND THE SECURITY OF DATABANK SYSTEMS The protection of privacy and security in criminal offender record information systems ......................................... . Security of information processing-Implications for social research ... . Privacy and security in data bank systems-Measures, costs, and protector intruder interactions ................................ . Snapshot 1971-How one developed nation organizes information about people ............................ , ..... , .................. . 
ARRAY LOGIC-WHERE ART THOU? (Panel Discussion-No Papers in this Volume) HARDWARE-FIRMWARE-SOFTWARE TRADE-OFFS Microtext-The design of a microprogrammed finite state search machine for full text retrieval ................................. . 479 Design of the B1700 ........................................... . 489 R. H. Bullen, Jr. J. K. Millen W. T. Wilner HUMAN ENGINEERING OF PROGRAMMING SYSTEMS-THE USER'S VIEW An on-line two-dimensional computation system ................... . Debugging PLII programs in the multics environment ............. . AEPL-An Extensible Programming Language ................... . 499 507 515 The investment analysis language ............................... . 525 T. G. Williams B. Wolman E. Milgrom J. Katzenelson C. Dmytryshak DATA COMMUNICATION SYSTEMS The design approach to integrated telephone information in the Netherlands ................................................ . 537 R. DiPalma G. F. Hice Field evaluation of real-time capability of a large electronic switching system ..................................................... . 545 Minimum cost, reliable computer-communications networks ........ . 553 W. C. Jones S. H. Tsiang J. De Mercado MEASUREMENT OF COMPUTER SYSTEMS-SYSTEM PERFORMANCE (Panel Discussion-No Papers in this Volume) MEMORY ORGANIZATION AND MANAGEMENT Control Data STAR-lOO file storage station ....................... . 561 Protection systems and protection implementations ................ . B1700 memory utIlization ...................................... . Rotating storage devices as "partially associative memories" ........ . 571 579 587 G. Christensen P. D. Jones R. M. Needham W. T. Wilner N. Minsky DYNAMIC PROGRAM BEHAVIOR Page fault frequency (PFF) replacement algorithms ............... . 597 Experiments wish program locality .............................. . 611 W. W. Chu H. Opderbeck J. R. Spirn P. J. Denning COMPUTER ASSISTED EDUCATIONAL TEST CONSTRUCTION TASSY-One approach to individualized test construction .......... . 623 T. Blaskovics J. Kutsch, Jr. A comprehensive question retrieval application to serve classroom teachers ... , ............................................. ' .. . Computer processes in repeatable testing ..... , ................... . 633 641 G. Lippey F. Prosser J. N akhnikian Properties of disk scheduling policies in multiprogrammed computer systenls by TOBY J. TEOREY University of Wisconsin Madison, Wisconsin SCAN have been suggested by Coffman and Denning, 2 Manocha, 9 and Merten. 10 The implementation of SCAN is often referred to as LOOK,1O,12 but we retain the name SCAN for consistency within this paper. Both C_SCAN9,11,12,13 and the N-step scan6 ,12,13 have been discussed or studied previously and the Eschenbach scheme was developed for an airlinessystem. 14 Because it requires overhead for rotational optimization as well as seek time optimization it is not included in the following discussion. In the simulation study12 it was seen that the C-SCAN policy, with rotational optimization, was more appropriate than the Eschenbach scheme for all loading conditions, so we only consider C-SCAN here. The simulation results indicated the following, given that cylinder positions are addressed randomly:12 under very light loading all policies perform no better than FCFS. Under medium to heavy loading the FCFS policy allowed the system to saturate and the SSTF policy had intolerable variances in response time. 
SCAN and the N -step policies were superior under light to medium loading, and C-SCAN was superior under heavy loading. We first investigate various properties of the N -step scan, C-SCAN, and SCAN, since these are the highest performance policies that optimize on arm positioning time (seek time). The properties include mean, variance, and distribution of response time; and the distribution of the positions of requests serviced as a function of distance from the disk arm before it begins its next sweep. Response time mean and variance are then compared with simulation results. A unified approach is applied to all three policies to obtain mean response time. The expressions are nonlinear and require an iterative technique for solution; however, we can easily show that sufficient conditions always exist for convergence. Finally, we look at the factors that must be considered in deciding whether or not to implement disk INTRODUCTION The subject of scheduling for movable head rotating storage devices, i.e., disk-like devices, has been discussed at length in recent literature. The early scheduling models were developed by Denning, 3 Frank, 6 and Weingarten. 14 Highly theoretical models have been set forth recently by Manocha,9 and a comprehensive simulation study has been reported on by Teorey and Pinkerton. 12 One of the goals of this study is to develop a model that can be compared with the simulation results over a similar broad range of input loading conditions. Such a model will have two advantages over simulation: the computing cost per data point will be much smaller, and the degree of uncertainty of a stable solution will be decreased. Although the previous analytical results on disk scheduling are valid within their range of assumptions, they do not provide the systems designer with enough information to decide whether or not to implement disk scheduling at all; neither do they determine which scheduling policy to use for a given application, be it batch multiprogramming, time sharing, or real-time processing. The other goal of this study is to provide a basis upon which these questions can be answered. The basic scheduling policies are summarized with brief descriptions in Table 1. Many variations of these policies are possible, but in the interest of mathematical analysis and ease of software implementation we do not discuss them here. SCAN was first discussed by Denning. 3 He assumed a mean (fixed) queue length and derived expected service time and mean response time. The number of requests in the queue was assumed to be much less than the number of cylinders, so the probability of more than one request at· a cylinder was negligible. We do not restrict ourselves to such an assumption here. Improvements on the definition and representation of 1 2 Fall Joint Computer Conference, 1972 TABLE I-Basic Disk Scheduling Policies 1. FCFS (First-come-first-served): No reordering of the queue. 2. SSTF (Shortest-seek-time-first): Disk arm positions next at the request tha.t minimizes arm movement. 3. SCAN: Disk arm sweeps back and forth across the disk surface, servicing all requests in its path. It changes direction only when there are no more requests to service in the current direction. 4. C-SCAN (Circular scan): Disk arm moves unidirectionally across the disk surface toward the inner track. When there are no more requests to service ahead of the arm it jumps back to service the request nearest the outer t.rack and proceeds inward again. 5. 
N-step scan: Disk arm sweeps back and forth as in SCAN, but all requests that arrive during a sweep in one direction are batched and reordered for optimum service during the return sweep. 6. Eschenbach scheme: Disk arm movement is circular like C-SCAN, but with several important exceptions. Every cylinder is serviced for exactly one full track of information whether or not there is a request for that cylinder. Requests are reordered for service within a cylinder to take advantage of rotational position, but if two requests overlap sector positions within a cylinder, only one is serviced for the current sweep of the disk arm.

scheduling in a complex system. In practice, considerable attention should be given to these factors before thinking about which policy to use.

N-STEP SCAN

The N-step scan is the simplest scheduling policy to model using the approach discussed here. While the disk arm is sweeping across the surface to service the previous group of requests, new requests are ordered linearly for the return sweep. No limit is placed on the size of the batch, but at equilibrium we know the expected value of that size to be L, the mean queue length. Furthermore, we know that the resulting request position distribution will be the same as the input distribution, which we assume to be uniform across all the disk cylinders. We also assume the following:

1. Request interarrival times are generated from the exponential distribution.
2. File requests are for equal sized records. This simplifies the analysis. We assume that the total service time distribution (seek time plus rotational delay plus transmission) is general and cannot be described by any simple distribution function. We also assume that the access time (seek time plus rotational delay) dominates the total service time, so that fixed record size (constant transmission time) is a fair approximation for our purpose of a comparative analysis.
3. Only a single disk drive with a dedicated controller and channel is considered, and there is only one movable head per surface. All disk arms are attached to a single boom so they must move simultaneously. A single position of all the read/write heads defines a cylinder.
4. Seek time is a linear function of seek distance.
5. No distinction is made between READ and WRITE requests, and the overhead for scheduling is assumed negligible.

If there are L requests in the queue at equilibrium and C cylinders on the disk, we partition the disk surface into C1 equal regions (as defined below) and assume that at least one request lies in the center of each region. This partition is only valid when seek time is a linear function of distance. C1 is computed as follows: since the distribution of L requests serviced is uniform, the probability that cylinder k has no requests is given by

Pk = (1 - 1/C)^L    (1)

The expected number of cylinders with no requests is C0 = C*Pk, so that the expected number of cylinders requiring service is:

C1 = C - C0 = C - C(1 - 1/C)^L    (2)

If the incoming requests are placed at random and the disk arm has equal probability of being at any cylinder, we know that the expected distance between an incoming request and the current position of the disk arm is approximately C/3 for large C. Typically, C ≈ 200 for currently available disks. In Figure 1 we see the possible paths taken from the disk arm to the new request for the expected distance of C/3.
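A short numerical sketch may make Eqs. (1) and (2) concrete. The Python fragment below is an editorial illustration rather than part of the paper; the function names and the parameter values are assumptions, and the Monte Carlo routine is only a cross-check of the closed form.

```python
import random

def empty_prob(C, L):
    """Eq. (1): probability that a given cylinder receives none of L
    requests placed uniformly at random on a C-cylinder disk."""
    return (1.0 - 1.0 / C) ** L

def expected_busy_cylinders(C, L):
    """Eq. (2): expected number of cylinders holding at least one request."""
    return C - C * empty_prob(C, L)

def simulate_busy_cylinders(C, L, trials=20000, seed=1):
    """Monte Carlo check of Eq. (2): place L requests at random, count
    distinct cylinders, and average over many trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(C) for _ in range(L)})
    return total / trials

if __name__ == "__main__":
    C, L = 200, 10          # illustrative values; C is about 200 as noted in the text
    print("Pk        =", empty_prob(C, L))
    print("C1 (eq 2) =", expected_busy_cylinders(C, L))
    print("C1 (sim)  =", simulate_busy_cylinders(C, L))
```

For C = 200 and L = 10 the closed form and the simulation agree to within sampling noise, which is the point of the partition argument above.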
The expected number of requests serviced hefore the new request is serviced is L, and the mean response time is (3) where Ts is the expected service time per request and T 8W is the expected sweep time from one extreme of the disk surface to the other. Properties of Disk Scheduling Policies 3 we have: L= XCl(Tmin+AT/Cl+T/2+T/m-a) 1-Xa NEW REQUEST Figure 1 The expected service time under the assumptions listed above was derived by Teorey and Pinkerton12 as follows: T.=P (T,.+ f + f) +(1-P) (9) Equation (9) computes mean queue length in terms of the input rate X, the known disk hardware characteristics, and C1• C1 is, however, a nonlinear function of L. We solve (9) by estimating an initial value for L in (2) and iteratively substituting (2) into (9) until the process converges. Convergence ! [(mt-2) (m-l) +1] m 2(mt-l) (4) L(l-Xa) where P is the probability that a seek is required to service the next request, Tsk is the expected seek time, T is the rotational time of a disk, m is the number of sectors per track, and t is the number of tracks per cylinder. Under our conditions, P=CI/L, and we simplify expression (4) by making the following definition: T [(mt-2) (m-l) a= +1 ] m 2(mt-1) (5) XAT L= - 1-Xa [1- (C~lrJ (Tmin+ f +; -a) -,..C +- (Tmin +T/2+T/m-a) l-Xa XC (C-l)L - - - (Tmin +T/2+T/m-a) -C (10) Letting Kl=XATj(l-Xa) +[XC/(1-Xa)](Tmin +T/2+ Tim-a) andK2 =[XC/(1-Xa)]. (Tmin +T/2+T/m-a) we obtain after i iterations: llT c; (11) (6) llT T m in+ =MTHC 1-Xa Also, for a linear seek time characteristic Tmin+ Rewriting (9) in terms of (2) we obtain 3 where llT = T max- T min, T min is the seek time for a distance of 1 cylinder, and T max is the seek time for a distance of C -1 cylinders. Restating (4) we now have Assuming that Li>O for all i, and l-Xa>O (no saturation) , we have: Li>O} =}0:5: (C-l)Li <1 - -C1-Xa>0 =}0L i - 1=}Li+l >Li and L i 1 for all i, and l-Aa>O (no saturation). L'i>1 } l-Aa>O ==>O~ (C-l)Li -- C - ==>1/C 0< K1 C , Ka also exists; each has a probability of .5. DISK ARM DIRECTION h AREA 2 Kr c Ka Figure 6 cylinder k obtaining the next incoming request: 1. Kr Ka C for Areas 1, 2, 3; lR = number of requests serviced from Kr to Ka = AREA 3 Area 2 K-1 = 72 (Kr-Ka) ( C":'I· 2L' K-1 2L') C + C-=-1 • C l~k~C (23) The input distribution is uniform; therefore each arrival of a new request represents a repeated Bernoulli trial of the same experiment. The probability that cylinder k remains empty is for Area 2 (Kr-Ka) (Kr+Ka-2)L' C(e-l) (23) = (1- k-l • ~)L' C-l To compute the expected number of cylinders with no requests, we first determine the probability of a given C for Areas 1, 2, 3 (24) and the expected number of occupied cylinders in that region is /,,/ / / A=50 REQUESTS/SEC. 500 C2 =C/3- Kr [1- (k -=-. 1 2L'/ )]lR :E -C lR C 1 k=Ka .,1'" for Area 2 cLIJ U ~ 2)LI 400 c ( k-l C1=C-:E 1- . - LIJ en en k=l t;; 300 LIJ :::> S a::: ~ a::: IJJ C for Areas 1, 2,3 (25) Mean response time 200 A=20 REQUESTS/SEC. CD ~ The mean response time is given by W = Probability {K r> Ka} • Tsw {Area 2} :::> z C-l 100 +Probability {Kr -' f- This expression is the same as (9) for the N -step scan except for the meaning of L' and C1• Solution of (27) is obtained by iteration. ~ ffi ~ .050 ~ .025 Convergence ~---+-----t----+---I--_ _ Sufficient conditions for convergence of the above procedure for SCAN are L'o>O and l-Xa>O. 
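Because Eqs. (9)-(11) define the mean queue length only implicitly, a small fixed-point iteration is the natural way to evaluate them. The Python sketch below is an illustration, not the paper's program: the constants a, K1 and K2 are reconstructed as well as the partly garbled Eqs. (4), (5), (10) and (11) allow, and the sample drive parameters are assumed, not taken from the paper. It iterates L(i+1) = K1 - K2*((C-1)/C)^L(i), which the convergence argument guarantees will settle whenever the starting guess is positive and 1 - lambda*a > 0.

```python
def mean_queue_length_nstep(lam, C, T_min, T_max, T, m, t, tol=1e-9, max_iter=1000):
    """Iterative solution of the N-step-scan queue-length equation
    L = K1 - K2 * ((C - 1) / C) ** L   (cf. Eqs. (9)-(11)).

    lam    : request arrival rate (requests/second)
    C      : number of cylinders
    T_min  : seek time for a 1-cylinder move
    T_max  : seek time for a (C-1)-cylinder move
    T      : rotation time of the disk
    m, t   : sectors per track, tracks per cylinder
    """
    delta_T = T_max - T_min
    # 'a' collects the rotational-delay and transfer terms, Eq. (5),
    # as reconstructed from the garbled text.
    a = (T / m) * (((m * t - 2) * (m - 1)) / (2 * (m * t - 1)) + 1)
    denom = 1.0 - lam * a
    if denom <= 0:
        raise ValueError("saturated: 1 - lam*a must be positive")
    K1 = lam * delta_T / denom + (lam * C / denom) * (T_min + T / 2 + T / m - a)
    K2 = (lam * C / denom) * (T_min + T / 2 + T / m - a)

    L = 1.0                              # any positive starting guess
    for _ in range(max_iter):
        L_next = K1 - K2 * ((C - 1) / C) ** L
        if abs(L_next - L) < tol:
            return L_next
        L = L_next
    return L

if __name__ == "__main__":
    # Illustrative 1972-vintage drive: 200 cylinders, 25 ms rotation,
    # 25-130 ms seek range; these numbers are assumptions, not the paper's.
    print(mean_queue_length_nstep(lam=20.0, C=200, T_min=0.025, T_max=0.130,
                                  T=0.025, m=10, t=19))
```

The iterates are monotone and bounded between K1 - K2 and K1, which is exactly why the sufficient conditions quoted above guarantee convergence.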
The proof proceeds as before: Letting K 1= (X/I-Xa)[AT+ C(Tmin +T/2+T/m-a)] and K 2 = (X/I-Xa) [Tmin + o .5Tsw Figure 7 Variance of response time The response time distribution for SCAN is not intuitively obvious. In order to obtain a close approximation to this distribution we can sample all possible TABLE II-Ratio of Requests Serviced per Sweep to Mean Queue Length for SCAN L' /L Requests/second 10 20 I I I 1.18 1.36 1.46 30 40 I 16 - - - - - N-STEP SCAN ................. SCAN ---C-SCAN 1.47 50 60 1.48 1.49 Limit 1. 50 I 14 I I 12 T/2+T/m-a] we can substitute (25) into (27) and obtain after i iterations: L'i+1=KI-K2 c ( L k=l L'i> 2 k-l )Lif 1- - . - C C-l 0} (2 k -1 )Lif =}O~ 1- - . - - l-Xa>O (28) I fBz 0 f:d 10 sa i= LIJ CFl <1 for all k~C z 0 L 8 ~ f{3 c:: z « k=l I LIJ LIJ =}O~ I :!! C C-l o ( RESPONSE TIME :!! 2 k_l)Lif 1- - . (S:> T)J == [CRAS)':) T)] to form a logical expression involving a single implication where only the transformed consequent assertion appears on the right side of the implication. 35 /\ (B/2= Q*D/4) /\ (D/2=2- k ) '/\ (k = nonnegative integer) /\ (P/Q-D/2 O. TA-71 0582-33 Figure I-Flow chart representation of simple control program Two steps must be followed in proving the program Application of Program-Proving Techniques with respect to the above four assertions. Step 1 is to prove that for all paths the assertions are consistent with the transformation specified by the intervening code; Step 2 is to establish that the validity of the correctness and deadlock parts is correctly embedded in the guessed assertions. First, in Step 1 the following control paths must be verified: 1~2, 1~3,3~4,3~3. For purposes of illustration we will outline the proof of 1~3; this outline should enable the reader to verify the other paths. The path from 1~3 embodies the following steps 37 ---I I CRITICAL SECTION 1 I I ql S~S-l ~ Test: 8=0 D~D+1 V(S) qa. ~I Back substitution on qa leads to the following verification condition: [ (integer S) /\ (8 ~ 1) /\ (D = u ( - 8 + 1) ) /\ (PENS=u(8) -S) J/\ (8-1 =0) :> [ (integer 8 - 1) /\ (8 -1 ~ 0) /\ (D+1=1) /\ (PENS = -S+l)]. The first term of the consequent, integer 8 -1, is true from integer S. The second term is true from 8 -1 = 0, i.e., 8=1. The third term, D=O, is true from [D=u( -8+1)J/\ (S=l). The fourth term is true from [PEN8=u(S)-8J/\(8=1). Thus the path 1~3 is verified (with respect to the "guessed" assertions). Step 2, establishing that the assertions embody the desired behavior of the correctness and deadlock parts, remains to be carried out. The correctness part is apparent from the assertions by noting that D = or 1. The deadlock part is satisfied by noting that whenever PEN8 ~ 1, then also D ~ 1; thus there exists a process currently in the critical section that will eventually schedule some deferred process. As an extension of this simple control program that we will use in the following sections, consider the program displayed in Figure 2. The program is a straightforward extension of the simple single critical section program discussed above. It can be shown by a proof similar to that outlined above that access is granted to only one of the two critical sections at a time. Thus, control cannot be simultaneously at points ® and 0. The interpretation of P(8) and V(S) is modified from that described previously, as shown in Figure 3. The variables PENS1 and PENS2 serve to ° I ~ p ( S ) -...... 
- l® 0 - ;, I I CRITICAL SECTION 2 V TA-71 0582-34 Figure 2-A control program with two critical sections indicate, respectively, the number of processes pending on semaphore 8 at critical sections 1 and 2. The "CHOOSE" box functions as follows. Either of the two output branches is chosen at random. If the test in the selected branch succeeds, then control continues along the branch; otherwise, control is passed to the other branch. Note that the relation S < 1 ensures that control 38 Fall Joint Computer Conference, 1972 APPLYING ASSERTIONS TO SYNCHRONIZATION PROGRAMS AND ABSTRACTING THE PROOF OF CORRECTNESS AND DEADLOCK FOR THE ASSERTIONS P(S) (FOR CRITICAL SECTION 1) ~ S +- S -1 • N01 TEST: S < l~C SPLIT h I -' ~ 4 (:HOOS0 ~ TEST: [ENS I PENS 1+- PENS + To (V II 1 ~ > O~ TEST: PE1V:> 1- 1J IPENS 0 2 +- PENS 2 - 11 The simple program of Figure 1 reveals, although only in a trivial manner, the possibilities for parallel activity that we wish to exhibit. For example, in Figure 1 it is possible for control to reside simultaneously in the critical section (point 0) and at point CD. The assertion we applied at point CD reflects the possibilities for multiple points of control in that the variable relationships correspond to control being only at point CD, simultaneous at points CD and 0, or simultaneous at points CD, ®, 0. (It is assumed that processors are available to execute any code associated with the critical section as well as with the peS) and YeS) blocks.) In proving the program we did not require any new formalisms beyond those associated with the uniprocessing situation since hardware locks are so constituted that the P and V operations are not simultaneously executed. A more general situation is displayed in Figure 4. Here we illustrate portions of two processes, A and B, with interprocess communication achieved via the semaphore S. The particular model of computation that we will assume is as follows: Assume that at periodic intervals calls are made on sections A or B. The availability of a processor + TOQD TA-71 0582-35 Figure 3-Interpretation of P and V for two critical sections can pass along at least one of the branches because if S < 1, then PENS1 + PENS2 > 1. The purpose of the CHOOSE box is to place no arbitrary constraints on the scheduling of deferred processes. The "SPLIT" box simultaneously passes control along each of its output branches. The intention here is both to reschedule another process onto a critical section associated with semaphore S and to have the process that just finished the critical section execute the instructions following YeS). Wherever two or more parallel paths converge there is a JOIN box, embodying some rules for combining the paths. Points 0 and 0 of Figure 3 are really JOIN boxes. The most apparent such rules are OR (AND) indicating that control is to proceed beyond the JOIN box wherever any (all) of the inputs to the JOIN box are active. Our discussion will apply mainly to the OR rule, but is easily· extended to the AND case. SECTION A SECTION B ENTER A ENTER B t ® P(SI---+ Y, <- f(Y21 ! @ VIS) r-- -----------------------, I I I I ITEST':< 1YH.( SPtT ~ I [No [ ! I I ~ II t I S<-S-1 I I ~ ~ TEST: PENS 1 1 :T2! ~: >0 PENS 1 <- PENS 1 - 1 TEST: PENS 2 1 >0 PENS 2 ... PENS 2 - 1 y,·tJ V(SI---+ 1® Y2 <- h(Y2 1 !0 Ya <- g(Y2 1 I II L---------i------------------- J ~V(MI TA-710582-36 Figure 4-Program to illustrate assertion interpretation Application of Program-Proving Techniques to commence process'ng of the calls is always assumed to exist. 
If two or more processors attempt simultaneous reference to a variable or operator, the selection of the processor that achieves access is made arbitrarily. If execution is deferred, say, at point @ , subsequent to the P (lVI) operation, the affected processor is presumably freed to handle other tasks. When the corresponding V (M) operation is carried out, schedul ng a deferred process, a processor is assumed to exist to effect the processing. With reference to this program and the assumed model of parallel computation, we will illustrate approaches to the placement of assertions and to proving the consistency of the assertions relative to intervening program statements. Assertion placement Since we are assuming a parallel/multiprocessing environment, there are potentially many points in the flow chart at which a processor can be in control. For example, in Figure 4 control can be simultaneous at points CD, ®, and 0. However, we will assume that the role of the POVI) and V(l\1:) operations is to exclude simultaneous accesses to the intervening critical section, provided there are no branches into the critical section. Hence, control cannot be simultaneous at points CD and @ . An assertion, for example at point CD, must reflect the state of the variables of the various processes assuming that: (1) Control is at point CD and, simultaneously, (2) Control is at any possible set of allowable points. By "allowable" we mean sets of points not excluded from control by virtue of mutual exclusion. We recall that for the uniprocessor environment assertions are placed so that each path in the program is provable. As an extension of that observation we can show that the proving of paths in a parallel program can be accomplished provided the following rules are satisfied: (1) Each loop in the· program must be broken by at least one assertion. (2) Within a critical section (i.e., one where control is at only one point at a time and where any variables in the critical section common to other portions of the program are themselves in critical sections under control of the same semaphore), only a sufficient number of assertions need be applied to satisfy the loop-cutting 39 rule, (1). We assume that all entries to critical section are controlled by P, V primitives. If not then rule (3) below applies. (3) All points not covered by rule (2) must generally be supplied with assertions. (4) Each HALT point and all WAIT points associated with a P operation must contain assertions. Thus, in Figure 5 a possible placement of assertions is at points @ , CD, ®, 0, 0, and 0. Note that since the purpose of synchronization programs is generally to exclude, by software techniques, more than one process from critical sections, such programs will not require the plethora of assertions associated with a general parallel program. Also note that it is a simple syntactic check to determine if a given assertion placement satisfies the above rules. Once the points where the assertions are to be placed have been selected and the assertions have been developed, it remains to prove the consistency of assertions. As in the uniprocessor case, the first step in this proof process is to develop the verification conditions for each path. For the parallel environment of concern to us here, we are confronted with the following types of paths: simple paths, paths with SPLIT nodes, paths with CHOOSE nodes, and impossible paths. 
These four path types are handled below, wherein the rules are given for developing the verification conditions, and some indication is given that the parallel program is correct if these rules are followed. A complete proof of the validity of the rules is not given because an induction argument similar to that of Floyd's applies here. Verification condition for a simple path By a simple path we mean a path bounded by an antecedent and a consequent assertion, with the intervening program steps being combinations of simple branch and assignment statements. For such a path the verification condition is derived exactly as in the uniprocessor case. That this is the correct rule is seen by noting that the assertion qa placed at point a in the program reflects the status of the variables, assuming that control is at point a and also at any allowable combination of other points containing assertions. Also note that because of our assertion placement rules, the variables involved in the code between a and b are not modified simultaneously by any other process. Thus, if a simple path a~b is bounded by assertions qa and qb and if it is proven that %/\ (intervening code) ::)qb, then the path is proven independently of the existence of control at other allowable points. 40 Fall Joint Computer Conference, 1972 Verification conditions for paths with SPLIT nodes Impossible paths Assume that a SPLIT node occurs in a path, say, bounded on one end by the antecedent assertion qa. Recall that at the SPLIT node, separate processors commence simultaneously on each of the emerging paths. Also assume that along the two separate paths emerging from the split nodes the next assertions encountered are qb and qc, respectively. * In this case the "path" (which is actually two paths) is proved by showing that As mentioned above, not all topological paths in a program are necessarily paths of control. In effect, what this means is that no input data combinations exist so that a particular exit of a Test is taken. Recall that for antecedent and consequent assertions qa, qb and an intervening Test, T, the verification condition is qa/\ T'::)qb', where the prime indicates that back substitution has been carried out. Clearly, if the test always evaluates to FALSE, then qa/\ T' must evaluate to FALSE, in which case the implication evaluates to TRUE independent of qb'. (We recall that TRUE::) TRUE, FALSE::)TRUE, and FALSE::)FALSE are all TRUE.) qa/\ (code between point a and SPLIT node) /\ (code between SPLIT node and point b) /\ (code between SPLIT node and point c)::) (qb/\ qc). Proving that program has no deadlock Note that it is not sufficient merely to consider the path between, say, a and b, since the transformations between the SPLIT node and c may influence the assertion qb. However, note that the variable references along the two paths emerging from the SPLIT node are disjoint, by virtue of the rules for selecting assertion points. Hence the use of back substitution to generate the verification condition can function as follows. Assertion qb is back-transformed by the statements between point b and the SPLIT node, followed by the statements between point c and the SPLIT node, finally followed by the statements between the SPLIT node and point a to generate qb. A similar rule holds for traversing backward from qc to generate qc. Note that the order in which the two paths following the SPLIT node are considered is not crucial since these paths are assumed not to reference the same variables. 
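The back-substitution rule just described is mechanical and easy to prototype. The sketch below is an editorial illustration (Python, assuming SymPy is available); the path encoding and helper name are not from the paper. It back-substitutes a consequent assertion through the assignments of a straight-line path and collects the branch conditions, reproducing the shape of the verification condition for the path 1 to 3 of Figure 1 discussed earlier.

```python
import sympy as sp

# Symbols for the simple control program of Figure 1 (S: semaphore value,
# D: processes in the critical section).
S, D = sp.symbols('S D', integer=True)

def back_substitute(post, path):
    """Back-substitute a consequent assertion through a straight-line path.

    path is a list of steps, earliest first:
      ('assign', var, expr)  -- var := expr
      ('test', cond)         -- branch condition assumed true on this path
    Returns (post', tests') where post' is the transformed consequent and
    tests' are the branch conditions expressed over the path's entry state.
    """
    post_t = post
    tests = []
    for kind, *rest in reversed(path):
        if kind == 'assign':
            var, expr = rest
            post_t = post_t.subs(var, expr)
            tests = [t.subs(var, expr) for t in tests]
        else:
            (cond,) = rest
            tests.append(cond)
    return post_t, tests

if __name__ == "__main__":
    # Path 1 -> 3 of Figure 1:  S := S - 1;  test S = 0;  D := D + 1
    path = [('assign', S, S - 1),
            ('test', sp.Eq(S, 0)),
            ('assign', D, D + 1)]
    q3 = sp.And(sp.Ge(S, 0), sp.Eq(D, 1))        # a fragment of the consequent
    q3_prime, tests = back_substitute(q3, path)
    print("transformed consequent:", q3_prime)   # e.g. S - 1 >= 0 and D + 1 = 1
    print("path conditions:", tests)             # e.g. [S - 1 = 0]
```

The verification condition is then q1 and the path conditions on the left of the implication and the transformed consequent on the right, exactly as in the uniprocessor case.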
Verification condition for a path with a CHOOSE node Recall that when control reaches a CHOOSE node having two exits, the exit that is chosen to follow is chosen arbitrarily. Hence the effect of a CHOOSE node is simply to introduce two separate simple paths to be proven. For antecedent assertions qb, qc, what must be proved is qa /\ (code between a and b) ::)qb qa /\ (code between a and c) ::)qc. Note that one or possibly both of the paths might not be control paths, but this introduces no difficulties, as we show below. * Various special cases are noted, none of which introduce any particular difficulties. It is possible that qa, qb and qc might not be all distinct or that another SPLIT node occurs along a path before encountering a consequent assertion. For the parallel programs that we are dealing with deadlock will be avoided if for every semaphore S such that one or more processes are pending on S, there exists a process that will eventually perform a YeS) operation and thus schedule one of the deferred processes. (Weare not implying that every deferred process will be scheduled, since no assumptions are made on the scheduling mechanism.) In particular, if a process is pending on semaphore a, then it is necessary to show that another process is processing a. If that latter process is also pending on a semaphore b, it is necessary to show that b~a, and that a third process is processing b. If that third process is pending on c, it is necessary to show that c~b, c~a, and that a fourth process is processing c, etc. In the next sections we apply the concepts above to the verification of particular control programs. PROOF OF COURTOIS2 PROBLEM 1 This section presents a proof of a control program that was proposed by Courtois et al. The program is as follows: Integer RC; initial value = 0 Semaphore M, Q; initial values = 1 READER P(M) WRITER RC~RC+1 if RC= 1 then P(Q) V (lVI) READ PERFORMED P(M) RC~RC-1 if RC=O then V(Q) V(M) P(Q) WRITE PERFORMED V(Q) Application of Program-Proving Techniques WRITER READER lCD P(M)--+® -----..0 RC +- , RC + 1 TEST: RC = 1.:!!!, NOj_ . PIQ )-@ +-------, RD +- RD + 1 lCD l ---... P(Q)--+® V(M) 1® 1 ~1° WD (DEVICE) WD RD - 1 RC - 1 ++- 1 V(Q)_..J I 4 I 1 1 +- WD - 1 I 1@ V(Q) _ __ Yes TEST: RC = 0--, No WD + 1 @)(DEVICE) P(M)-+@ RD RC +- - 41 certain rare circumstances a reader's access might be deferred for a writer even though at the time at which the reader activates the READER section no writer is actually on the device.) A writer is to be granted access only if no other writer or reader is using the device; otherwise, the requesting writer's access is deferred. In particular, any number of simultaneous readers are allowed access provided no writers are already on. The role of the semaphore M is to enforce a scheduling discipline among the readers' access to RC and Q. For our model of parallel computation, it can be shown that the semaphore M is not needed, although its inclusion simplifies the assertion specification. Figure 5 is a flow chart representation of the program. A few words of explanation about the figure are in order. The V(Q) operator for the reader and the V (1\1) operator for the upper critical section are assumed to be the generalized V's containing the CHOOSE and SPLIT nodes as discussed in the two previous sections. The other V operators are assumed to contain CHOOSE but no SPLIT nodes. The dashed line emerging from V (Q) indicates a control path that will later be shown to be an impossible path. 
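The listing above is compact enough to run directly. Below is a minimal executable transcription in Python threading; the sleeps, the readers_on/writers_on instrumentation and the assertions are additions for illustration and are not part of the original program. It exercises the intended behavior: any number of readers may share the device, while a writer gets it alone.

```python
import threading
import time
import random

M = threading.Semaphore(1)   # mutual exclusion on RC
Q = threading.Semaphore(1)   # exclusion between the reader group and writers
RC = 0                       # readers currently past the entry section

readers_on = 0               # instrumentation only, not in the original program
writers_on = 0
check = threading.Lock()

def reader(iterations):
    global RC, readers_on
    for _ in range(iterations):
        M.acquire()
        RC += 1
        if RC == 1:
            Q.acquire()
        M.release()

        with check:
            readers_on += 1
            assert writers_on == 0            # no reader while a writer is on
        time.sleep(random.uniform(0, 0.001))  # READ PERFORMED
        with check:
            readers_on -= 1

        M.acquire()
        RC -= 1
        if RC == 0:
            Q.release()
        M.release()

def writer(iterations):
    global writers_on
    for _ in range(iterations):
        Q.acquire()
        with check:
            writers_on += 1
            assert writers_on == 1 and readers_on == 0
        time.sleep(random.uniform(0, 0.001))  # WRITE PERFORMED
        with check:
            writers_on -= 1
        Q.release()

if __name__ == "__main__":
    threads = [threading.Thread(target=reader, args=(200,)) for _ in range(3)]
    threads += [threading.Thread(target=writer, args=(200,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("no mutual-exclusion violation observed")
```

Running it, of course, only samples interleavings; the point of the rest of this section is to establish the same properties by proof.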
Associating appropriate variables with each of the P and V operators, the following integer variables and initial values are seen to apply to the flow-chart. 1\;1 Q RC RD WD RPENQ 1 1 0 0 0 WPENQ RPEN1\11 RPEN1\12 o 0 0 ° where the Rand W prefixes to a variable correspond, respectively, to readers and writers and the 1 and 2 suffixes correspond, respectively, to the "upper" and "lower" critical sections associated with semaphore 1\1. Once again we will divide the proof for this program into a correctness part and a deadlock part. For the correctness part we will establish that ......_ _ V(M) l® TA-71 0582-37 Figure 5-Flow chart representation of Courtois problem 1 The purpose of the program is to control the access of "readers" and "writers" to a device, where the device serves in effect as a critical section being competed for by readers and writers. If no writers are in control of the critical section, then all readers who so desire are to be granted access to the device. (We show below that the program almost satisfies this goal, although under (1) WD = 0 or 1, indicating that at most one writer at a time is granted access to the device. (2) If WD=l, then RD=O, indicating that if one writer is on the device, then no readers are "on." (3) If WD=O, then RPENQ=O, indicating that if no writer is on the device, then a reader is not held up by semaphore Q. An entering reader under these circumstances could be held up by semaphore 1\1, i.e., RPENMl>O. (We will temporarily defer discussion of this situation.) According to the assertion placement rules, each 42 Fall Joint Computer Conference, 1972 input, output and wait point must possess an assertion, each loop must be cut by an assertion, and in addition, an assertion must be placed at each point along a path wherein along another parallel path there exists an instruction referencing variables common to the point in quest:on. For this program the assertion placement problem is simplified since all variables, e.g., RC and Q, common to two or more parallel paths are a part of critical sections wherein access is granted only to one such critical section at a time. Hence, only the inputoutput and loop-cutting constraints must be satisfied, leading to a possible placement of assertions at the numbered points in Figure 5. Note that point ® does not require an assertion, but since it represents a control point where readers are on the device, it is an interesting reference point. The assertions associated with all 11 control points are indicated in Table 1. The assertion labelled G LO BAL is intended to conjoin with the other 11 assertions. The appearance of (i) at the beginning of a disjunctive term in q2, q3, q8, q9 indicates that the first (i) terms are the same as in ql. Thus, for example, in the first disjunctive term of assertion q2, the first six conjunctive terms are the same as in the first disjunctive term of ql, but the seventh and eighth terms are different, as shown. It is worthwhile discussing our technique for specifying the assertions-we will provide sample proofs later on to attest to the validity of the assertions. In specifying the assertion at a point a, we assumed, of course, that control is at a and then attempted to guess at which other points control could reside. Variable relationships based on this case analysis were then derived, and then the expressions were logically simplified to diminish the proliferation of terms that resulted. 
For example, in assertion ql, the first disjunctive term corresponds to the case: no writers on the device, i.e., control is not at @. The second disjunctive term corresponds to the case of control at @. With regard to the second term if control is hypothesized at @, it is also guessed that control could possibly be at 0, ([), and 00r0. It remains to verify all the paths bounded by the 11 assertions. The paths so defined are: 1~2;1~3;3~;3~(5,3);3~(5, 7);5~6;5~7, 7~8 [RC~O]; 7~7 [RC~O]; 7~3 [RC~O]; 7~(5, 3) [RC=O]; 7~(5, 7) [RC=O]; 7~(5, 8) [RC=O]; 7~(10, 3) [RC=O]; 7~(10, 7) [RC=O]; 7~(10, 8) [RC=O]; 1~9; 1~10; 10~11; 10~10; 10~5; 10~(5, 3); 10~(5, 7). A brief discussion of the symbolism is in order. For example, the path 3~(5, 3) commences at 0, and then splits at the SPLIT node of V (M) into two paths leading to ® and 0. The path 7~(10, 3) [RC=O] indicates that the branch defined by RC = 0 is taken, followed by a splitting at V (Q), one path leading to 0, and the other path taking the CHOOSE exit toward @. Clearly, many of the above paths are impossible paths-as revealed by the proof. We will not burden the reader of this paper with proofs of all the paths, but we will provide an indication of the proofs for several of the more interesting paths. TABLE I-Assertions for Courtois Problem 1 Global: (All variables E 1);'\ (M ~ 1);'\ (Q ~ 1);'\ (RC ~O);'\ (RD ~O);'\ (WD ~O);'\ (RPENQ ~O);'\ (WPENQ ~O);'\ (RPENMI ~ 0);'\ (RPENM2~0) ql; q2; qa: q4: [(WD=O);,\(RD=RC);,\(RPENQ=O);,\(WPENQ=u(Q)-Q);,\ u(Q)=u(I-RC»;'\(RPENM2~RD);'\(RPENMl+ RPENM2=u(M)-M)]V [WD= 1);'\ (RD =0);'\ (RPENQ = RC);'\ (WPENQ = -Q-RC);'\(RC =u(RC»;,\ (RPENM2 =0);'\ (RPENMI =u(M) - M);'\ (M ~u( 1- RC»] [(6)(RPENMI >Q);,\ (M O RPENQ~RPENQ -1 RD~RD+1 M~M+1 Test: M<5 Test: RPENi>o ~q, RPENMl~RPENMl-l Backsubstitute qa and q5 to yield qa', q5' qs': (WD=1)I\(RD+1=RC)I\(RPENQ=1)I\(WPENQ= -Q-l)I\(Q+1::S;0)I\(RPENM2::S;RD)I\(RPENMl+RPENM2 =u(M+1)-M)I\(RC~1) q3': r(WD = 1)1\ (RD + 1 =RC)I\ (RPENQ = 1)1\ (WPENQ = u(Q + 1) -Q -1)(u(Q + 1) =u( 1-RC»I\ (RPENM2 ::S;RD + 1) I\(RPENM1+RPENM2=u«M+l)-M)]V[(WD=2)I\(RD= -1) ... ] Tests backsubstituted T': (RPENM1 >0)1\ (M <0)1\ (RPENQ >0)1\ (Q <0) Verification Conditions qlOl\ T':>q5' qlOl\ T'::>q3' Sample Proof: Proof of Q5' term RPENQ = 1 From qlO: (RPENQ=RC)I\(RC=u(RC» Thus RPENQ=O or 1 From T' RPENQ>O Thus RPENQ = 1 Table II outlines the steps in proving the path 10~(5, 3). At the top of Table II we delineate the steps encountered along the path. As is readily noted, the path contains a SPLIT node. To develop the verification condition, back substitution is required from both q3 and q5 to form qa' and q5'; note that in developing q5' the statements between the SPLIT node and point ® must be considered, in addition to the statements directly between points @ and 0. To verify the path, the following two logical formulas must be proved true: qlOl\ T'~q5', qlOI\ T'~q3" At the bottom of Table II we outline the few simple steps required to prove the term (RPENQ = 1) in q5'. The patient reader of this paper can carry out the comparably simple steps to handle the remaining terms. Note that qs' is the disjunction of two terms, one beginning with the term (WD = 1) and the other with the term (WD=2). For ql01\ T'~qs' to be true, it is necessary for only one of the disjunctive terms to be true. The reader can verify that it is indeed the first disjunctive term that is pertinent. As a final note on the verification of paths, consider the path 10~(5, 7). 
A little thought should indicate that this should be an impossible path since the effect of control passing to point (j) is to schedule a process that was deferred at point 0, but at point 0 a reader is considered to be on the device, and hence point 0 could not be reached from point @ where a writer is on the device. This is borne out by considering the formula (qlO 1\ T') for the path in question. In qlO there is the conjunctive term (RPENM2=O) while in T', the back-substituted test expression, there is the conjunctive term (RPENM2 <0). Thus, qlOl\ T' evaluates to FALSE, indicating that the path is impossible. I t remains now to prove the correctness and deadlock conditions by observation of the assertions and the program itself. The key assertion here is ql since it expresses the relationship among variables for all possible variations of control, e.g., for all allowable assignments of processors to control points in the program. On the basis of ql we can conclude the following with regard to correctness: (1) WD = 0 or 1, indicating that no more than one writer is ever granted access to the device. (2) If WD=l, then RD=O, indicating that if a writer is on the device, then no reader is. (3) The issue. of a requesting reader not encountering any (appreciable) delay in getting access to a device not occupied by a writer is more complicated. From the first disjunctive term of ql 44 Fall Joint Computer Conference, 1972 (that deals with the case of no writers on the device), we note that if WD =0, then RPENQ = O. Hence, under the assumed circumstances a requesting reader is not deferred by semaphore Q. However, a requesting reader could be deferred by semaphore lVr. In fact, a requesting reader could be deferred at point ® while the RD readers on the device emerge from point 0, and then be scheduled onto the lower/ critical section wherein the last emerging reader performs V(Q) and schedules a writer. The deferred reader will then be scheduled onto the upper critical section only to be deferred by Q at point 0. Although it is an unusual timing of requests and reader completions that leads to this situation, it still violates the hypothesis that a reader is granted access provided no writer is on the device. * Note that, under the assumption that (WD = 0) and RPENM2 remains zero while a requesting reader is deferred by M at point ®, the requesting reader will be granted access to the device prior to any requesting writers. We now dispose of the question of deadlock. We need to demonstrate that, if a process is pending on a semaphore, then there exists another process that will eventually perform a V operation on that semaphore. With regard to semaphore Q, we note from observation of ql that if RPENQ>O or WPENQ>O, then either WD = 1 or RD ~ 1. Thus, if any process is pending on Q, there exist processes that might eventually do a V(Q) operation. It remains to dispose of the issue of these processes themselves pending on semaphores. It is obvious that a writer on the device must emerge eventually, at which time it will do a V (Q) operation. For one reader (or more) on the device, in which case RPENQ = 0, we have shown that the last reader will perform a V(Q) operation. A reader could be deferred by semaphore M, but in this case there is a reader processing lVI that is not deferred by Q and hence must do a V (1\11) operation. 
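Claims (1) and (2) can also be cross-checked mechanically for a small, fixed number of processes by exhaustively enumerating interleavings. The sketch below is not the assertional proof used in this paper; it is a complementary finite-state search, written in Python as an illustration. Each blocking P operation becomes a guarded atomic step, and the sequence "RC := RC + 1; if RC = 1 then P(Q)" is folded into one step, which is harmless here because it executes while M is held. Each process runs its section once, so the check is bounded rather than general.

```python
from collections import deque

# Atomic steps of Courtois problem 1.  Each step is (guard, effect), where
# guard(state) says whether the step may fire and effect(state) yields the
# successor state.  'reading'/'writing' are instrumentation for the invariant.

READER = [
    (lambda s: s['M'] > 0,
     lambda s: dict(s, M=s['M'] - 1)),                                   # P(M)
    (lambda s: s['RC'] != 0 or s['Q'] > 0,                               # RC := RC + 1;
     lambda s: dict(s, RC=s['RC'] + 1,                                   # if RC = 1 then P(Q)
                    Q=s['Q'] - (1 if s['RC'] == 0 else 0))),
    (lambda s: True, lambda s: dict(s, M=s['M'] + 1)),                   # V(M)
    (lambda s: True, lambda s: dict(s, reading=s['reading'] + 1)),       # enter device
    (lambda s: True, lambda s: dict(s, reading=s['reading'] - 1)),       # leave device
    (lambda s: s['M'] > 0,
     lambda s: dict(s, M=s['M'] - 1)),                                   # P(M)
    (lambda s: True,                                                     # RC := RC - 1;
     lambda s: dict(s, RC=s['RC'] - 1,                                   # if RC = 0 then V(Q)
                    Q=s['Q'] + (1 if s['RC'] == 1 else 0))),
    (lambda s: True, lambda s: dict(s, M=s['M'] + 1)),                   # V(M)
]

WRITER = [
    (lambda s: s['Q'] > 0, lambda s: dict(s, Q=s['Q'] - 1)),             # P(Q)
    (lambda s: True, lambda s: dict(s, writing=s['writing'] + 1)),       # enter device
    (lambda s: True, lambda s: dict(s, writing=s['writing'] - 1)),       # leave device
    (lambda s: True, lambda s: dict(s, Q=s['Q'] + 1)),                   # V(Q)
]

def explore(n_readers=2, n_writers=2):
    progs = [READER] * n_readers + [WRITER] * n_writers
    init_vars = dict(M=1, Q=1, RC=0, reading=0, writing=0)
    init = (tuple(0 for _ in progs), tuple(sorted(init_vars.items())))
    seen, frontier = {init}, deque([init])
    while frontier:
        pcs, frozen = frontier.popleft()
        s = dict(frozen)
        assert s['writing'] <= 1, "two writers on the device"
        assert s['writing'] == 0 or s['reading'] == 0, "reader and writer together"
        for i, prog in enumerate(progs):
            pc = pcs[i]
            if pc < len(prog):
                guard, effect = prog[pc]
                if guard(s):
                    nxt = (pcs[:i] + (pc + 1,) + pcs[i + 1:],
                           tuple(sorted(effect(s).items())))
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
    return len(seen)

if __name__ == "__main__":
    print("states explored:", explore(), "- invariants held in all of them")
```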
* We conjecture that there is no solution to this problem without permitting the conditional testing of semaphores, so that the granting of access to a writer or reader to the device is decided on the basis of the arrivaltime of a reader or writer at the entry point to the control program. In effect, what the program here accomplishes is to grant a reader access to the device provided it passes the test: RC = 0 while WD = O. Note that there are other problems that do not admit to solutions using only P and V operations unless conditional testing of semaphores is permitted, e.g., see Patil.1 5 DISCUSSION In this paper we have developed an approach based on program-proving techniques for verifying parallel control programs involving P and V type operations. The proof method requires user-supplied assertions similar to Floyd's method. We have given a method for developing the verification conditions, and for abstracting the proof of correctness and nondeadlock from the assertions. We applied the technique to two control programs devised by Courtois et al. At first glance it might appear that the method is only useful for toy programs since our proofs for the above two programs seem quite complex. However, in reality the proofs presented here are not complex, but just lengthy when written out in detail. The deductions needed to prove the verification conditions are quite trivial, and well within the capability of proposed program proving systems. * By way of extrapolation it seems reasonable for an interactive program verifier to handle hundreds of verification conditions of comparable complexity. Thus one might expect that operating systems containing up to 1000 lines of high-level code should be handled by the proposed program verifier. We might add that some additional theoretical work is called for relative to parallel programs and operating systems. Perhaps the main deficiency of the proofs presented here is that a suspicious reader might not believe the proofs. In establishing the correctness of the programs it was required to carry out a nontrivial analysis of the assertions. For example, we refer the reader to the previous section where the subject of a reader not encountering any delay in access is discussed. Contrast this with a program that prints prime numbers, wherein the output assertion says that the nth item printed is the nth prime-if the proof process establishes the validity of the output assertion there is no doubt that the program is correct. It is thus clear that the operating system environment could benefit from a specification language that would provide a mathematical description of the behavior of an operating system. Also some additional work is needed in understanding the impact of structured programming on the proof of operating systems. We would expect that structured programming techniques would reduce the number of assertion points and the number of paths that must be verified. * See London16 for a review of current and proposed program proving systems. Application of Program-Proving Techniques ACKNOWLEDGMENTS The author wishes to thank Ralph London for many stimulating discussions on program proving and operating systems and for providing a copy of his proof of the two programs discussed in this paper. Peter Neumann, Bernard Elspas and Jack Goldberg read a preliminary version of the manuscript. Two referees provide some extremely perceptive comments. 
14 15 16 REFERENCES 1 E W DIJKSTRA The structure of THE multiprogramming system Comm ACM 11 5 pp 341-346 May 1968 2 P J COURTOIS F HEYMANS D L PARNASS Concurrent control with "READERS" and WRITERS" Comm ACM 14 10 pp 667-668 October 1971 3 R W FLOYD Assigning meanings to programs In Mathematical Aspects of Computer Science J T Schwartz (ed) Vol 19 Am Math Soc pp 19-32 Providence Rhode Island 1967 4 P NAUR Proof of algorithms by general snapshots BIT 6 4 pp 310-316 1966 5 Z MANNA The correctness of programs J Computer and System Sciences 3 2 pp 119-127 May 1969 6 R L LONDON Computer programs can be proved correct In Proc 4th Systems Symposium-Formal Systems and N onnumerical Problem Solving by Computer R B Banerji and M D Mesarovic (eds) pp 281-302 Springer Verlag New York 1970 7 R L LONDON Proof of algorithms, a new kind of certification (Certification of Algorithm 245, TREESORT 3) Comm ACM 136 pp 371-373 June 1970 8 R L LONDON Correctness of two compilers for a LISP subset Stanford Artificial Intelligence Project AIM-151 Stanford California October 1971 9 B ELSPAS K N LEVITT R J WALDINGER A WAKSMAN An assessment of techniques for proving program correctness ACM Computing Surveys 4 2 pp 97-147 June 1972 10 E ASHCROFT Z MANNA Formalization of properties of parallel programs Stanford Artificial Intelligence Project AIM-110 Stanford California February 1970 11 A N HABERMANN Synchronization of communicating processes Comm ACM 153 pp 177-184 March 1970 12 J C KING A program verifier PhD Thesis Carnegie-Mellon University Pittsburgh Pennsylvania 1969 13 D I GOOD Toward a man-machine system for proving program correctness 17 45 PhD Thesis University of Wisconsin Madison Wisconsin 1970 B ELSPAS M W GREEN K N LEVITT R J WALDINGER Research in interactive program-proving technique Stanford Research Institute Menlo Park California May 1972 S PATIL Limitations and capabilities of Dijkstra's semaphore primitives for coordination among processes MIT Project MAC Cambridge Massachusetts February 1971 ' R L LONDON The current status of proving programs correct Proc 1972 ACM Conference pp 39-46 August 1972 R C HOLT Comments on the prevention of system deadlocks Comm ACM 14 1 pp 36-38 January 1971 APPENDIX Proof of Courtois problem 2 Figure 6 displays the flow chart of the second control problem of Courtois et al. 2 The intent of this program is (1) similar to problem 1 in that the device can be shared by one or more readers, but a writer is to be granted exclusive access; (2) if no writers are on the device or waiting for access, a requesting reader is to be granted immediate access to the device; and (3) if one or more writers are deferred, then a writer is to be granted access before any reader that might subsequently request access. As we show below, a formal statement of the priorities can be expressed in terms of the variables of Figure 6. Also, as in problem 1, the intent of the program is not quite achieved relative to the receiving of requests at the entry points of the reader and writer sections. It is noted that the program contains semaphores L, M, N, Q, S, all of which have initial value 1, and "visible" integer variables RS, RD, RC, WS, WD, WC, all of which have initial value o. In addition, as in problem 1, there are the variables associated with the various P and V operations. As in problem 1, the V operators, with the exception of VeL) and those at points @ and @ , embody both the SPLIT and CHOOSE nodes; VeL) has only the SPLIT node, and the final V's have only CHOOSE nodes. 
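Figure 6 is difficult to read in this copy. For orientation, the sketch below gives a runnable Python rendering of the standard Courtois et al. writer-priority solution, which the semaphore names (L, M, N, Q, S) and counters (RC, WC) of the figure appear to follow; the bookkeeping variables used in the proof (RS, WS, RD, WD and the various PEN counts) are omitted. It should therefore be read as an interpretation of the figure, not a transcription of it.

```python
import threading
import time
import random

L = threading.Semaphore(1)   # admits at most one reader to the P(S) queue at a time
M = threading.Semaphore(1)   # protects RC
N = threading.Semaphore(1)   # protects WC
Q = threading.Semaphore(1)   # exclusion on the device
S = threading.Semaphore(1)   # lets waiting writers get ahead of newly arriving readers
RC = 0                       # readers past the entry section
WC = 0                       # writers requesting or using the device

def reader(iterations):
    global RC
    for _ in range(iterations):
        L.acquire()
        S.acquire()
        M.acquire()
        RC += 1
        if RC == 1:
            Q.acquire()
        M.release()
        S.release()
        L.release()

        time.sleep(random.uniform(0, 0.001))   # READ (device access)

        M.acquire()
        RC -= 1
        if RC == 0:
            Q.release()
        M.release()

def writer(iterations):
    global WC
    for _ in range(iterations):
        N.acquire()
        WC += 1
        if WC == 1:
            S.acquire()                        # block newly arriving readers
        N.release()

        Q.acquire()
        time.sleep(random.uniform(0, 0.001))   # WRITE (device access)
        Q.release()

        N.acquire()
        WC -= 1
        if WC == 0:
            S.release()
        N.release()

if __name__ == "__main__":
    ts = [threading.Thread(target=reader, args=(100,)) for _ in range(3)]
    ts += [threading.Thread(target=writer, args=(100,)) for _ in range(2)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    print("done")
```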
The dotted control lines indicate paths that can be shown to be impossible. The numbered points on the flow chart are suitable for assertion placement in that each loop is cut by at least one assertion and all commonly referenced variables are contained within critical sections. There are several approaches toward deriving the assertions, but once again the most sensible one involves case analysis. 46 Fall Joint Computer Conference, 1972 WRITER READER 0!® P(L)- ---i® P(S~)_.:... _ _ _ _ _ _ _ _ _ _ _ _ _ _- - , l.. ------ .,I RS +- RS + 1 ~ 0 r-I-r------:::rl ® I RC :tc I! I 1~ ® P~N).@.. @! WC +- wc + 1 ~ TEST: WC = Yes 1~ N0l.. II ...--_ _No\,_=_ 4------------, 4-------, I P(O)-=- II I I + 1 TEST: RC = 01 II I I I I P(M)- : I I I I l I RD +- RD + 1 l L----V(M) RS +- t- P(O)- ~l + WD+-WD+1 (DEVICE)@ --V(L) • ~ WD +- WD - 1 (1) ~® P(M)~ (DEVICE) L-- _-===vtO) ~t® ~@ P(N)- @t:=:======!=; RD +- RD - 1 + WC+-WC-1 + 0--, Yes TEST: WC RC +- RC - 1 Nol .. 11- +@ ~ ~ I + V(N) vtS)------.l.--J TEST: RC = P(S)-=-+ WS+-WS+1 I! 1 @ 1_, I CI describes the domain of the individual variables and is common to all assertions for the program. It was convenient to decompose the second disjunctive term into two disjunctive terms, bl, b2, corresponding to the reader processing Q and the reader not processing Q. A similar case analysis for the al term is embedded in the conjunctive terms. Note that, as in problem 1, the prefixes W, R refer to writer and reader and the suffixes 1 and 2 refer to the upper and lower critical sections. We will not burden the reader of the paper with a listing of the assertions at all points or with a proof of the various paths; the proof is quite similar to that illustrated for problem 1. However, it is of interest to abstract from qi sufficient information to prove the program's intent. For a discussion of deadlock the reader is referred to Reference 14. As with problem 1 the decision concerning whether a requesting reader or a requesting writer gains access to the device is based on which one arrives first at the corresponding P (8) operation-not on arrival time of the readers and writers at the corresponding section entry points. This point is discussed in more detail below: Yes 0---, = WS+-WS-1 -I V(O)-:..-- - - - - ' 1 1 V!S)--_J 1_ ...._~I V(N)=========~ L...-_ _ V(M) l@ TA_710582-38 Figure 6-Flow chart representation of Courtois problem 2 From the view of control at point CD, we have derived the assertion qi of the form qi = CI/\ (al V [a2/\ (b i V b2) ]) , wherein al corresponds to a writer not processing S, i.e., WS =0, and [a2/\ (biV b2) ] corresponds to a 'writer processing S, i.e., WS = 1. The assertionql listed in Table III reflects this case analysis: The global assertion (1) The assertions indicate that any number of readers can share the device provided no writers are on, since if WD = 0, then from al we see there are no constraints on RD. The assertions indicate that at most one writer is on the device because from observation of both al and a2 we note that WD = or 1. (2) The assertions indicate, as follows, that a reader's access to the device is not delayed provided no writers are processing S or are on the device, and provided no writers are pending on Q or S. The term al indicates that if WS = WD = 0, i.e., no writers are processing S or on the device, and if WPENQ= WPENS=O, i.e., no writers are pending on Q or S, then RPEN8=RPENQ=0, ° TABLE III-Main Assertion for Courtois Problem 2 Global: (All. 
variables E I)I\(L~l)/\ (M ~l)/\ (N ~l)/\ (Q ~l)/\ (S~l)/\(RC~O)/\ (RD~O)/\ (WC~O)/\ (WD ~O)/\ (RPENS~O) /\(WPENS~O)/\(RPENQ~O)/\(WPENQ~O)/\(RPENL~O)/\(RPENMI~O)/\(RPENM2]~0)/\(WPENNl~0)/\(WPENN2~0) /\(RS~O)I\(WS~O) ql: (Writer not processing S) (RS=u( -S+l»/\(WS=O)!\(WD=O)/\(WC=u(WC»/\(WPENQ=O)/\(RPENQ=O)/\(RPENS=O)/\(u(WPENS)=u(S)-S) /\ (RD = RC)/\ (u(Q) =u(l-RC»/\ (WPENS = WC)/\(WPENNI =u(N)-N/\ (WPENN2 =0)/\ (RPENL=u(L) -L)/\(u(L) =u(S»/\ (RPENMI +RPENM2 =u(M)- M)/\ (RPENMI ~RD) (Writer processing S) {(WS=l)/\(RS=O)/\(WPENS=O)/\(RPENS= -S)/\(S= -u( -S»A (RPENQ =0)/\ (WPENQ =u(Q)-Q)/\(WPENNl + WPENN2 = u(N) - N)/\ (RC =RD)/\ (RPENL = u(L) - L)/\ (RPENMI = 0)/\ (RPENM2 = u(M) - M)/\ (L:::; u(S + 1» }/\ {[(Q :::;0) /\(RC~l)/\(WD=O)/\(WC= -Q)/\(WPENN2=0)]V[(RC={)/\(M=1)/\(WC=WD+WPENNl+WPENQ)/\(WD=u(WD» /\ (WD ~u(WPENQ»]1 Application of Program-Proving Techniques indicating that no reader is deferred by S or Q from access to the device. The issue of writer priority will be handled by applying case analysis to ql. • RPENQ is always 0; thus a V(Q) operation can only grant access to a deferred writer, never to a reader. • RS is 0 or 1; thus, at most, one reader is processing S. If RS = 1, then RPENS = 0 and WPENS = 0 or 1. This indicates that if a reader is processing S, the subsequent YeS) operation can only signal a deferred writer. • If WS = 1 then WPENS = 0, and there are no constraints on WPENQ. This indicates that all deferred writers are pending on Q (or N as discussed below), and since RPENQ=O a writer must get access to the device either immediately if RD= WD=O, or when the next V(Q) is performed by either a writer or a reader. As we mentioned above, the issue of granting access 47 to a writer or a reader is determined by the arrival time at peS). If this is indeed the intent of the program, then the above discussion serves to prove the correctness of the program. However, there are several important exceptions that deserve discussion. For example, while a writer is pending on S, all subsequent requesting \\-Titers will be deferred by N . Now these writers should be granted access to the device before any requesting readers receive it, which will be the situation under "normal" timing conditions. The deferred writer, at point @, will be scheduled by a reader doing V(S), in which case the writer will perform YeN) and in turn will schedule a deferred writer. These previously deferred writers will not get blocked by S but will pass to P (Q). Of the readers requesting access, one will be blocked by S and the remainder by L. The only way this scheduling would not occur as stated would be if the deferred writer at point @ passed through the \\-Titer section and performed a YeS) operation, thus scheduling a deferred reader before a writer processing the upper critical section could get through the first two instructions. Exact calculation of computer network reliability by E. HANSLER IBM Research Laboratory Ruschlikon, Switzerland G. K. McAULIFFE IBM Corporation Dublin, Ireland and R. S. WILKOV IBM Corporation Armonk, N ew York network fail with the same probability q and each of the b links fail with the same probability p, then Pees, t] is approximately given by INTRODUCTION The exact calculation of the reliability of the communication paths between any pair of nodes in a distributed computer network has not been feasible for large networks. Consequently, many reliability criteria have been suggested based on approximate calculations of network reliability. 
For a thorough treatment of these criteria, the reader is referred to the book and survey paper by Frank and Frisch1,2 and the recent survey paper by Wilkov.3 Making use of the analogy between distributed computer networks and linear graphs, it is noted that a network is said to be connected if there is at least one path between every pair of nodes. A (minimal) set of links in a network whose failure disconnects it is called a (prime) link cutset, and a (minimal) set of nodes with the same property is called a (prime) node cutset. If a node has failed, it is assumed that all of the links incident to that node have also failed. A cutset with respect to a specific pair of nodes n_s and n_t in a connected network, sometimes called an s-t cut, is such that its removal destroys all paths between nodes n_s and n_t.

The exact calculation of P_e[s, t], the probability of successful communication between any pair of operative computer centers n_s and n_t, requires the examination of all paths in the network between nodes n_s and n_t. More specifically, if each of the n nodes in the network fails with the same probability q and each of the b links fails with the same probability p, then P_e[s, t] is approximately given by

P_e[s, t] = Σ_{i=0}^{b} A_{s,t}(i) (1-p)^i p^{b-i},   p >> q.   (1)

In Eq. (1), A_{s,t}(i) is the number of combinations of i links such that if only they are operative, there is at least one communication path between nodes n_s and n_t. On the other hand, the calculation of the probability P_f[s, t] of a communication failure between nodes n_s and n_t requires the examination of all s-t cuts. For specified values of p or q, P_f[s, t] is approximately given by

P_f[s, t] = Σ_{i=0}^{b} C^e_{s,t}(i) p^i (1-p)^{b-i},   p >> q.   (2)

For q >> p, a similar expression can be given replacing C^e_{s,t}(i) by C^n_{s,t}(i). The coefficients C^e_{s,t}(i) and C^n_{s,t}(i) denote the total number of link and node s-t cuts of size i. The enumeration of all paths or cutsets between any pair of nodes n_s and n_t is not computationally possible for very large networks.

RELIABILITY APPROXIMATION BASED ON CUTSET ENUMERATION

In any network G of b links and n nodes, it is easily shown that the order of the number of cutsets is 2^{n-2}, whereas the order of the number of paths between any pair of nodes is 2^{b-n+2}. For networks having nodes of average degree (number of incident links) greater than four, b > 2n and 2^{b-n+2} > 2^{n-2}. Consequently, such networks have a larger number of paths than cutsets. Computation time would clearly be reduced in such cases by calculating network reliability from cutsets instead of paths. In this case P_e[s, t] can be obtained from P_e[s, t] = 1 - P_f[s, t], where P_f[s, t] can be calculated from Eq. (2). Alternatively,

P_f[s, t] = P[ ∪_{i=1}^{N} Q^i_{s,t} ]   (3)

where Q^i_{s,t} is the event that all links fail in the ith prime s-t cut and N is the total number of prime cutsets with respect to nodes n_s and n_t. The calculation of P_f[s, t] from Eq. (2) clearly requires the examination of all s-t cuts. The number of prime s-t cuts is usually much smaller. However, P_f[s, t] is not readily calculated from Eq. (3) because the Q^i_{s,t} are not mutually exclusive events.

Following Wilkov,4 we shall use P_f = Max_{s,t} P_f[s, t] as an indication of the overall probability of service disruption for a given computer network. For specified values of p or q, P_f depends only on the topology of the network. A maximally reliable network clearly has a topology which minimizes P_f and hence minimizes Max_{s,t} C^e_{s,t}(m) or Max_{s,t} C^n_{s,t}(m) for small (large) values of m when p or q is small (large).
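Before turning to the measures built on these coefficients, a small brute-force illustration of Eqs. (1)-(3) may be helpful. The sketch below computes the terminal-pair reliability of a toy five-link network by enumerating every subset of operative links; the example graph, the failure probability, and the function names are our own illustrative choices, not material from the paper, and the 2^b cost of this approach is precisely what makes it unusable for large networks.

from itertools import combinations

def connected(nodes, operative_links, s, t):
    """Simple reachability test over the operative links only."""
    adj = {v: set() for v in nodes}
    for u, v in operative_links:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for v in adj[u] - seen:
            seen.add(v)
            stack.append(v)
    return False

def terminal_reliability(nodes, links, s, t, p):
    """P_e[s,t]: probability that at least one path of operative links
    joins s and t, when every link fails independently with probability p
    (node failures neglected, i.e. the p >> q case of Eq. (1))."""
    b = len(links)
    prob = 0.0
    for k in range(b + 1):
        # A(k) of Eq. (1): number of k-link subsets whose operation
        # alone already connects s and t.
        a_k = sum(1 for subset in combinations(links, k)
                  if connected(nodes, subset, s, t))
        prob += a_k * (1 - p) ** k * p ** (b - k)
    return prob

# Example: a 4-node ring with one chord, link failure probability 0.1.
nodes = [1, 2, 3, 4]
links = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(terminal_reliability(nodes, links, s=2, t=4, p=0.1))

Because the operative-link and failed-link views are complementary, the same number would be obtained by summing over cut events as in Eq. (2) and subtracting from one; the procedure developed in the following sections is essentially a way of organizing that cut-based computation without enumerating all 2^b states.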
Letting X:,t(m) and X:.t(m) denote the number of prime node and edge s-t cuts of size m, Xn (m) = Maxs.tX:.t(m) and Xe(m) =Maxs.tX:,t(m) have been proposed4 as computer network reliability measures. These measures Xn(m) and Xe(m) denote the maximum number of prime node and edge cutsets of size m with respect to any pair of nodes. A maximally reliable network is such that Xn(m) and Xe(m) are as small as possible for small (large) values of m when the probability of node or link failure is small (large). In the calculation of Xn(m) and Xe(m) for any given network, all node pairs need not be considered if all nodes or links have the same probability of failure. It has been shown5 that in order to calculate Xn(m) and Xe(m), one need only consider those node pairs whose distance (number of links on a shortest route between them) is as large as possible. For a specified pair of nodes n s, nt, X:. t(m) can be calculated for all values of m using a procedure given by Jensen and Bellmore. 6 Their procedure enumerates all prime link cutsets between any specified pair of nodes in a non-oriented network (one consisting only of full or half duplex links). It requires the storage of a large binary tree with one terminal node for each prime cutset. Although these cutsets are not mutually exclusive events, it has been suggested 6 that Eq. (3) be approximated by N Pr[s, tJ ~ ~ P[Q!. tJ. (4) i=O However, it is shown in the following section that no additonal computation time is required to actually compute Pr[s, tJ exactly. EXACT CALCULATION OF COMPUTER NETWORK RELIABILITY A simple procedure is described below to iteratively calculate a minimal set of mutually exclusive events containing all prime link s-t cuts. This procedure starts with the prime cutset consisting of the link incident to node nt. Subsequently, these links are re-connected in all combinations and we then cut the minimal set of links adjacent to these that lie on a path between node ns and nt, assuming that the network contains no pendant nodes (nodes with only one incident link). The link replacements are iterated until the set of links connected to node ns are reached. The procedure is easily extended to provide for node cutsets as well and requires a very small amount of storage since each event is generated from the previous one. PieS, tJ is obtained by accumulating the probabilities of each of the mutually exclusive events. Procedure I 1. Initialization Let: N be the set of all nodes except nodes n s • C be the set of all links not incident to node ns. M1 = {ns} F1 be the set of links incident to both ns and nt Sl be the set of links incident to ns but not nt b1 be a binary number consisting of only I Sl ones i=l I 2. Let: Ti be a subset of Si consisting of those elements in Si for which the corresponding digit in bi is unity. M i +1 be a subset of N consisting of nodes incident to the links in T i. N = N-Mi+l' Fi+l be a subset of C consisting of links incident to nt and adjacent to any member of T i • Si+l be a subset of C consisting of links incident to nodes in N other than nt and adjacent to any member of T i • C C- (Si+lUFi+1). Exact Calculation of Computer Network Reliability 3. If Si+I¢0, then let: bi+l be a binary number with I Si+l I ones i = i+1 Go to step 2 Otherwise, let: Ti+I=0 HI CS= U [Fk U1\U (Sk-Tk)], k=l where CS is a modified cutset and Tk indicates that the links in set Tk are connected. 4. 
Let: C = C ∪ F_{i+1} ∪ S_{i+1}, N = N ∪ M_{i+1}, b_i = b_i - 1 (modulo 2). If b_i ... B_ij. A flow-chart for determining whether a state ought to be saved after task i is finished and if task j is called next is shown in Figure 2.

Computing the software reliability efficiency index

Let T be the time required to complete the program if there is no error, and without implementing a rollback method. Let H be the overhead incurred by implementing a rollback procedure. H can be easily computed for an arbitrary program as shown in Reference 8. Recollect that the rollback procedure is designed so that the maximum recovery time will not exceed a given value M. If the mission is completed in T+S units rather than T units, a "lateness penalty" is incurred which gets larger as S increases. We shall find the reliability of a system with rollback as a function of S, the amount of "lateness" permitted. We shall assume that failures occur according to the exponential failure law, and the mean time between failures is 1/a. If S = 0 then the program must finish in T time units without error. The probability of no error in T time units is e^{-aT}. Letting R(S) be the reliability, defined as the probability of completing a successful mission, we have:

R(0) = e^{-aT}

If S = H + M, then it is possible to implement rollback and to allow recovery from one error by means of rollback. The reliability in this case is the probability of no error in T+H time units (in which case no rollback is necessary) plus the probability of exactly one error in T+H units followed by a period of M error-free units in which recovery is taking place:

R(H+M) = e^{-a(T+H)} + [a(T+H)] e^{-a(T+H+M)} / 1!

By the same argument, if S = H + 2M then two error recoveries are possible and

R(H+2M) = R(H+M) + [a(T+H+M)]^2 e^{-a(T+H+2M)} / 2!

In general,

R(H+nM) = R(H+(n-1)M) + {a[T+H+(n-1)M]}^n e^{-a(T+H+nM)} / n!   for n = 2, 3, ...

If we are considering delaying the time required to complete the mission by S units, we get the Software Reliability Efficiency index to be:

SRE = [R(S) - R(0)] / S

Note that in this analysis undetected and permanent errors were ignored. They can be included quite simply. Let Q(S) be the probability of the event that there is no undetected or permanent error in S units and let it be independent of other events. Then we have

SRE = Q(S) · [R(S) - R(0)] / S

Instructional retrial

If an error is detected while the processor is executing an instruction, the instruction could be retried, if its operands have not already been modified. This technique is an elementary form of rollback: recovery time never exceeds the execution time of an instruction, and overhead is negligible. However, there is a probability that an error will persist even after instruction retry. Let this probability be Q. The SRE for this technique can be computed in a manner identical to that for rollback and has the same form. The SRE for instruction retrial will in general be higher than that for rollback.

Deadlock prevention

Discussion

Prevention of deadlocks is an important aspect of overall system reliability. Deadlocks may arise when procedures refuse to give up the resources assigned to them, and when some procedures demand more resources from the system than the system has left unassigned. Consider a system with one unit each of two resources A and B, and two procedures I and II. Now suppose procedure I is currently assigned the unit of resource A while II is assigned B.
Then if procedure I demands B and II demands A, the system will be in a deadlock: neither procedure can continue without the resources already assigned to the other. The hardware approach to this problem is to buy sufficient resources so that requests can be satisfied on demand. Habermann and others6,7 have discussed methods for preventing deadlocks without purchasing additional resources. In these methods sufficient resources are always kept in reserve to prevent the occurrence of deadlock. This may entail users being (temporarily) refused resources requested, even though there are unassigned resources available. Keeping resources in reserve also implies that resource utilization is (at least temporarily) decreased. An alternative approach is to allocate resources in such a manner that, even though it is possible that deadlocks might arise, it is very improbable that such a situation could occur. The tradeoff here is between the probability of deadlock on the one hand and resource utilization (or throughput) on the other. The tradeoff is expressed in terms of the software reliability efficiency index.

Determining the software reliability efficiency index

The probability P of a deadlock while the mission is in progress and the time T required to complete the mission (assuming no deadlock) using a scheme where resources are granted on request are determined through simulation. The time (T+H) required to complete the mission using a deadlock prevention scheme is also determined by means of simulation. If Q(L) is the probability that no malfunctions other than deadlock arise in L time units, then, assuming independence, we have:

SRE = [Q(T+H) - Q(T)·(1-P)] / H

At this time we know of no way of computing H and P analytically.

Summary of software methods

Different methods for improving the overall reliability of a system using software have been discussed. The software reliability efficiency index was suggested as an aid in evaluating software methods. Techniques for computing SRE were discussed. Similar techniques can be used for computing SRE for other software methods.

HARDWARE METHODS

Triple modulo redundancy

Discussion

Triple Modulo Redundancy (TMR) was one of the earliest methods1 suggested for obtaining a reliable system from less reliable components. The system output (Figure 3) is the majority of three identical components. If only one of the components is in error, the system output will not be in error, since the majority of components will not be in error. Thus, the system can tolerate errors in any one component; note that these errors may be transient or permanent. In this discussion we discuss only permanent errors.

[Figure 3a-Simplex configuration (INPUT, COMPUTER, OUTPUT). Figure 3b-System configuration with System Input and System Output.]

Computing the hardware reliability efficiency index

Let P be the probability that a permanent malfunction occurs in a given component before the mission is completed. If failures obey an exponential law, and if the average time to a failure is 1/a, then P = 1 - e^{-aT}, where T is the time required to complete the mission. If the system is a discrete transition system (such as a computer system), then the time required to complete the mission can be expressed as N cycles (iterations), where computation proceeds in discrete steps called cycles.
If the probability of failure on any cycle is p independent of other cycles then P=l- (1-p)N Let v be the probability of a malfunction in the votetaker before the mission is complete independent of other events. The reliability R of a TMR system is the probability that at least two components and the votetaker do not fail for the duration of the mission. R= [(I-P)3+3(I-P)2 oPJo (I-v) If C is the cost of each component, and D the cost of the vote-taker, the hardware reliability efficiency index is: HRE = 1_-_ P-,-)3_+_3-,-(1_-_P---,)_2_oP-=Jc--(c--1_-_v)c-------.:-(1_-_P_) =-[(c-0 2C+D Transient errors can also be included quite easily in HRE. Hybrid system Discussion Mathur and Avizienis 2 discuss an ingenious method of obtaining reliability by using TMR and spares, see Figure 4. The spares are not powered-on and will be referred to as "inactive" components. If at any point in time, one of the three active components disagrees with the majority, the component in the minority is switched out and replaced by a spare. The spare must be powered-up and loaded; one method of loading the component is to use rollback and load the component with the last saved error-free state, and begin computation from that point. If at most one component fails r----------------, INPUT I I I I I I I I I I I I I I I----+-'-~-- OUTPUT I : I I I I I I I I I I I I L _____ 1_ _ _ _ _ _ _ ~~e~o~ _ _ J I I 8 Spare Un i ts I I EJ Hybrid System (5,3) Figure 4 during a cycle and if the vote-taker is error-free, this system is fail-safe until all the spares are used up, i.e., the system output will not be in error. Consider a comparison of a system with three active units and two spares with another system which has five active units. If at most one unit can fail at a time then the majority is always right and the system with three active units is at least as good as a system with five active units (since a majority of two active units is as right as a majority of four). Thus if at most one unit fails at a time, the number of active units need never exceed three; additional units should be kept as spares. Of course in digital computer systems where computation proceeds in discrete steps such as cycles, iterations, instruction-executions, task-executions, etc., it is possible, though improbable, that more than one unit may fail in a single step. In this case, an analysis which assumes that at most one active unit can fail at a time is an approximation to the real problem. Computation /oJ the hardware reliability efficiency index Mathur and Avizienis (op cit) assume that malfunctions occur according to an exponential failure law. A consequence of this assumption is that at most one unit 61 Framework for Hardware-Software Tradeoffs curve of reliability as a function of N is shown in Figure 8. Let RH be the reliability of the hybrid system, C the cost of each unit and D the cost of the vote-taker. The hardware reliability efficiency index with two spares is then: Number of Active Units HRE =R __ H_-_C_1_-_P_)_N 4C+D Passive Units Self-purging system Discussion Markov diagram of a hybrid conflguratlon Figure 5 can fail at a given instant which in turn implies that the majority is always right. Now consider what happens if the improbable event does occur and the majority is in error and the minority is correct. The correct minority unit will be switched out to be replaced by a spare which is powered up and initialized. 
A comparison with the other two active units will show that the powered-on spare is in the minority, and it will in turn be switched out to be replaced by yet another spare and so on. Eventually all the spares will be used up and the system will crash. Thus even though the probability of failure of two units in one iteration is indeed small, the consequence of this improbable event is catastrophic. Hence we feel that in calculating SRE it is important to back-Up the ¥athur-Avizienis study of this ingenious method with an analysis that does not assume that simultaneous failures never occur. In this analysis we will assume that computation proceeds in discrete steps called tasks; a task may consist of several instructions or a single instruction. Key variables of the active units are compared at the end of a task completion, and the minority element, if any, is switched out. Let the probability of failure of a unit on any step of the computation be P, independent of other units and earlier events. A discrete-state, discrete~transistion Markov process may be used to model this system. A Markov state diagram is shown in Figure 5. If the system is in state F, then a system failure has already occurred. The reliability of the system is the probability that the system is not in state F at the Nth iteration, where N is the number of computation steps required in the mission. The reliability can be computed analytically from the Jordan normal form. A Consider a self-purging system shown in Figure 6. Initially there are five active units and no spares. If at any instant the vote-taker detects a disagreement among the active units, the units whose outputs are in the minority are switched out, leaving three, active, error-free units. If the failure rates for active and passive units are the same, the self-purging system will tolerate two simultaneous failures, which may be catastrophic for the hybrid system. Computation of the hardware reliability efficiency index In this analysis we shall assume that computation proceeds in discrete steps, as in the analysis for the OUTPUT INPUT Self-purging System with Figure 6 5 Units Fall Joint Computer Conference, 1972 62 reliability of the system is the probability that the system is not in state F one the Nth computation step. A curve showing the reliability of this system as a function of N is shown in Figure 8. Let Rs be the reliability of a self-purging system with five active units initially. Then Number of fault free processors HRE= Rs- (l-P)N 4C+D If the cost of power supplies are included HRE for the hybrid system is larger than that for self-purging. Markov dlagram of a self-purging configuration Figure 7 hybrid system. Let P be the probability of failure of a unit on a computation step, independent of other units and earlier steps. A Markov state diagram for this process is shown in Figure 7. As in the hybrid case the Summary of hardware methods TMR, hybrid, and a system called a self-purging system were discussed. Some of the problems of approximating these systems as continuous transition systems were analyzed. Techniques for obtaining the hardware reliability efficiency indices were presented. Similar techniques can be used for other hardware methods. 0 0 .... CONCLUSION 1 We have attempted to develop a set of simple indices which may be useful in comparing different techniques for achieving reliability. We feel that an important research and pedogogical problem is to develop a more comprehensive, sophisticated framework. 
Models for rollback and discrete transition models for hybrid and self-purging systems were discussed briefly. Z;0~ 0- .0 o~ ~ «: ACKNOWLEDGMENT This research was supported in part by NSF grants GJ-35109 and GJ-492. REFERENCES o '" o 240 120 Figure 8 Time 1 J VON NEUMANN Probabilistic logics and the synthesis of reliable organisms from unreliable components Automata Studies p 43-98 Princeton University Press Princeton N J 1956 2 F P MATHUR A AVIZIENIS Reliability analysis and architecture of a; hybrid-redundant digital system: Generalized triple module redundancy with self-repair Proc Spring Joint Computer Conference 1970 Framework for Hardware-Software Tradeoffs 3 M BALL F H HARDIE Redundancy for better maintenance of computer systems Computer Design pp 50-52 January 1969 4 M BALL F H HARDIE Self-repair in a T M R computer Computer Design pp 54-57 February 1969 5 A COWAN Hardware-software tradeojJs in the design of reliable computers Master's thesis in the Department of Computer Sciences University of Texas December 1971 6 A N HABERMANN Prevention of system deadlocks Comm ACM Vol 12 No 7 July 1969 7 J HOWARD The coordination of multiple processes in computer operating systems 63 Dissertation Computer Sciences Department University of Texas at Austin 1970 8 K M CHANDY C V RAMAMOORTHY Optimal rollback IEEE-C Vol C-21 No 6 pp 546-556 June 1972 9 G OPPENHEIMER K P CLANCY Considerations of software protection and recovery from hardware failures Proc FJCC 1968 AFIPS pp 29-37 10 A N HIGGINS Error recovery through programming Proc FJCC 1968 AFIPS pp 39-43 11 A N HABERMANN On the harmonious cooperation of abstract machines Thesis Mathematics Department Technological U Eindhoven The Netherlands 1967 Automation of reliability evaluation procedures through CARE-The computer-aided reliability estimation program* by FRANCIS P. MATHUR University of Missouri Columbia, Missouri INTRODUCTION Unifying notation The large number of different redundancy schemes available to the designer of fault-tolerant systems, the number of parameters pertaining to .each scheme, and the large range of possible variations in each parameter seek automated procedures that. would enable the designer to rapidly model, simulate and analyze preliminary designs and through man-machine symbiosis arrive at optimal and balanced fault-tolerant systems under the constraints of the prospective application. Such an automated procedural tool which can model self-repair and fault-tolerant organizations, compute reliability theoretic functions, perform sensitivity analysis, compare competitive systems with respect to various measures and facilitate report preparation by generating tables and graphs is implemented in the form of an on-line interactive computer program called CARE (for Computer-Aided Reliability Estimation). Essentially CARE consists of a repository of mathematical equations defining the various basic redundancy schemes. These equations, under program control, are then interrelated to generate the desired mathematical model to fit the architecture of the system under evaluation. The math model is then supplied with ground instances of its variables and then evaluated to generate values for the reliability theoretic functions applied to the model. The math models may be evaluated as a function of absolute mission time, normalized mission time, nonredundant system reliability, or any other system parameter that may be applicable. 
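CARE itself is a large FORTRAN program whose size and environment are described below; the fragment here is only a toy sketch, in Python, of the "repository of equations" idea: a collection of reliability models that can be combined and then evaluated as a function of mission time. The particular formulas shown (simplex, TMR with a perfect voter, and a single cold standby spare with ideal coverage and switching, all under the exponential failure law) are textbook expressions chosen for illustration, not the far more general parameterized equations actually stored in CARE, and all names are our own.

import math

# A toy "repository of equations": each entry maps a configuration name
# to a reliability function R(lam, t) under the exponential failure law.
REPOSITORY = {
    # simplex unit with failure rate lam
    "simplex": lambda lam, t: math.exp(-lam * t),
    # triple modular redundancy with a perfect voter
    "tmr": lambda lam, t: 3 * math.exp(-2 * lam * t) - 2 * math.exp(-3 * lam * t),
    # one powered unit plus one identical spare, assuming ideal coverage,
    # perfect switching, and a spare that cannot fail while dormant --
    # a deliberately simple stand-in for CARE's general (1, S) equation
    "standby_1_spare": lambda lam, t: math.exp(-lam * t) * (1 + lam * t),
}

def series(config_names):
    """Model a system built from independent subsystems in series:
    the system reliability is the product of the subsystem reliabilities."""
    def r(lam, t):
        prod = 1.0
        for name in config_names:
            prod *= REPOSITORY[name](lam, t)
        return prod
    return r

# Evaluate the combined model as a function of normalized mission time lam*t.
model = series(["tmr", "standby_1_spare"])
for lam_t in (0.1, 0.5, 1.0):
    print(f"lam*t = {lam_t:4.1f}   R = {model(1.0, lam_t):.4f}")

The design point being illustrated is the separation CARE exploits: the repository can be extended with new equations without touching the code that combines them or the functions that evaluate them against mission time or other parameters.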
A unifying notation, developed to describe the various system configurations using selective, massive, or hybrid redundancy is illustrated in Figure 1. N refers to the number of replicas that are made massively redundant (NMR) ; S is the number of spare units; W refers to the number of cascaded units, i.e., the degree of partitioning; R( ) refers to the reliability of the system as characterized in the parentheses; TMR stands for triple modular redundant system (N =3); the NMR stand for N-tuple modular redundancy. A hybrid redundant system H(N, S, W) is said to have a reliability R(N, S, W). If the number of spares is S = 0, then the hybrid system reduces to a cascaded NMR system whose reliability expression is denoted by R(N, 0, W) ; in the case where there are no cascades, it reduces to R(N, 0,1), or more simply to R(NMR). Thus the term W may be elided if W = 1. The sparing system R (1, S) consists of one basic unit with S spares. Furthermore, the convention is used that R * indicates that the unreliability (1- Rv) due to the overhead required for restoration, detection, or switching has been taken into account e.g., R*(NMR) =Rv.R(NMR); if the asterisk is elided then it is assumed that the overhead has a negligible probability of failure. This proposed notation is extendable and can incorporate a number of functional parameters in addition to those shown here by enlarging the vector or lists of parameters within the parentheses, e.g., R (N, S, W, ... , X, Y, Z). Existing reliability programs Some reliability evaluation programs, known to the author, are the RCP, the RELAN, and the REL70. The RCpl,2 is a reliability computation package developed by Chelson (1967). This is a program which * The work presented here was carried out while the author was with the Jet Propulsion Laboratory, California Institute of Technology, and under Contract No. NAS7-100, sponsored by the National Aeronautics and Space Administration. 65 66 Fall Joint Computer Conference, 1972 NMR SYSTEMS r--~---------------------l : R(NrMRI\ I ., S=O , W=l \ / R(TiMRI~,: \ S=O' II W=l\~ ~ • , : R(N,O,W) \ R(3,O,W) \ I --------,-----------"t-.J SPARING SYSTEMS ts r-------, '! I I ; I =0 I, 1 : N =1 I S =0 \ --------.&..---_______ l. RH, S, WIl+--l R(N, S, WI W=l I \ I· , I W= 1 I :i R(3 S WI I ' , /---+ ,/ N = 3 lW = 1 repository of equations is extendable. Dummy routines are provided wherein new or more general equations may be placed as they are developed and become available to the fault-tolerant computing community. For example, the equation developed by Bouricius, et al., for standby-replacement systems embodying the parameters C and Q has been bodily incorporated into the equation repository of CARE. " / /' I 1. " I I L~E:..~ ___ J+--L -R(N, SI .. / R(3, SI /' I - - - - - - - - - - - _______ -J HYBRID SYSTEMS Figure l-Unifying notation can model a network of arbitrary series-parallel combinations of building blocks and analyzes the system reliability by means of probabilistic fault-trees. RE LAN3 is an interactive program developed by TIME/WARE and is offered on the Computer Sciences Corporation's INFONET network. RELAN like Rep models arbitrary series-parallel combinations but in addition allows a wide choice (any of 17 types) of failure distributions. RELAN has concise and easy to use input formats and provides elegant outputs such as plots and histograms. REL704 and its forerunner REL5 developed by Bouricius, et al., are interactive programs developed in APL/360. 
Unlike RCP and RELAN, REL70 is more adapted for evaluating systems other than series-parallel such as standby-replacement and triple modular redundancy. It offers a large number of system parameters, in particular C the coverage factor defined as the probability of recovering from a failure given that the failure exists and Q, the quota, which is the number of modules of the same type required to be operating concurrently. REL 70 is primarily oriented toward the exponential distribution though it does provide limited capabilities for evaluating reliability with respect to the Weibull distribution; its outputs are primarily tabular. Since APL is an interpretive language, REL is slow in operation; however, its designers have overcome the speed limitation by not programming the explicit reliability equations but approximate versions6 which are applicable to short missions by utilizing the approximation (l-exp( - AT» = AT for small values of AT. The CARE program is a general program for evaluating fault-tolerant systems, general in that its reliability theoretic functions do not pertain to anyone system or equation but to all equations contained in its repository and also to complex equations which may be formed by interrelating the basic equations. This CARE'S ENVIRONMENT, USERS AND AVAILABILITY CARE consists of some 4150 FORTRAN V statements and was developed on the UNIVAC 1108 under EXEC 8 (version lIe). The particular FORTRAN V compiler used was the Level 7E having the modified 2/3/4 argument ENCODE-DECODE commands. The amount of core required by the unsegmented CARE is 64K words. The software for graphical outputs is designed to operate in conjunction with the Stromberg Carlson 4020 plotter. The software enabling threedimensional projections, namely the Projectograph routines,7 are a proprietary item of Technology Service Corporation. In addition to the Jet Propulsion Laboratory, the originator, currently there are three other users of CARE, namelyNASA Langley Research Center (a FORTRAN II version operational on a CDC 3600), Ultrasystems Corp. (operational on a UNIVAC 1108 under EXEC II), and MIT Draper Laboratory. The CARE program, minus the Projectograph routines, has been submitted to COSMIC** and is available to interested parties from them along with users manuals. Its reference number at COSMIC is NPO-13086. CARE's repository of equations The equations residing in CARE, based on the exponential failure law, model the following basic fault-tolerant organizations: (1) Hybrid-redundant (N, S) systems. 8 •9 (a) NMR (N, 0) systems.lO (b) TMR (3,0) systems. 10 ( c) Cascaded or partitioned versions of the above systems. (d) Series string of the above systems. The equation representing the above family of ** Computer Software Management and Information Center, University of Georgia, Athens, Georgia 30601. The Computer-Aided Reliability Estimation Program systems is the following: l~K< 00 for K= 00 for R*(N, S) = [ R QIW 67 L --'(CQAT/W)iJwz --.--'-,-'-S 1. i=0 (3) TMR systems with probabilistic compensating failures. Io (a) Series string and cascaded versions of the above, The equation characterizing this system is: ~ R*(3, 0) = {RV[3R2IW -2R3IW +6P(1- P)RIIW (1- RIIW)2]} wz L (Kl+S)(_l _ 8-2 j={} RI/W s j+1 r _1)i+I}]. RV (4) Hybrid/simplex redundant (3, S)sim systems. ll (a) TMR/simplex systems,s (b) Series string and cascaded versions of the above. 
The general equation for this class of systems is the following: R(3, S)sim[T] 1 -1 ) =R3Rs 8 { 1+1·5 ( -2-8 R Rs J l~K< for = 1IW {RNIWR8 [1+(NK+I) ~ (_I)i-Z( (Kl+I) 1 i=1 J i=0 l~K< X(R~-l -1) (2K+:~~K+i)} for )] -1 00 i l RV }WZ and = (I.5)8+IR-R3 and S= 1 S>O and ",>0 ±(3AT)~+~-i i=I for 2K+~ and S>I l=O R81IWRlIW 8 i=I 8 (3K+ ') 8-1 (S) - II 'J L (-I)i t (~) f (i) -r=O X 00 +i) II (3K --, (S-1,), X[(I.5) i-I]-R3[(1.5)8+1-1] (2) Standby-sparing redundant (1, S) systems. a,10 for S>O and JL=O and (a) K-out-of-N systems,S (b) Simplex systems. (c) Series string and cascaded versions of the above. The general equation for the above is: R(l, S) = [RQIW {H E[~(l-R,'IW); X fi (QK+i) )}T" R*(3, S)sim=R v ·R(3, S)sim For the description of the above systems and their mathematical derivations, refer to the cited references. These equations are the most general representation of their systems parameterizing mission time, failure rates, dormancy factors, coverage, number of spares, number of multiplexed units, number of cascaded units, and number of identical systems in series. The definitions of these parameters reside in CARE and may be optionally requested by the user. More complex systems may be modeled by taking any of the above listed systems in series reliability with one another. 68 Fall Joint Computer Conference, 1972 TABLE I-Table of Abbreviations and Terms x = p, = U npowered failure rate = AI J1, = Dormancy factor K T XT R R S Powered failure rate Mission time Normalized mission time = Simplex reliability = Dormant reliability, exp( -p,T). = Number of spares n = (N-1)/2 where N is the total number of multiplexed units Q = Quota or number of identical units in simplex systems C = Coverage factor, Pr(recovery /failure) RV = Reliability of restoring organ or switching overhead Z = Number of identical systems in series W = N umber of cascaded or partitioned units P = Probability of unit failing to "zero" TMR = Triple modular redundancy TMRp = TMR system with probabilistic compensating failures (1, S) = Standby spare system (N , S) = Hybrid redundant system (3, S)sim = Hybrid/simplex redundant system MTF = Mean life R(MTF) = Reliability at the mean life = = Reliability theoretic functions The reliability equations in the repository may be evaluated as a function of absolute mission time (T), normalized mission time (AT), nonredundant system reliabili ty (R), or any other system parameter that may be applicable. The set of reliability theoretic functions defined in CARE are applicable to any of the equations in the repository. This independence of the equations from the functions to be applied to the equations impart generality to the program. Thus the equation repository may be upgraded without effecting the repertoire of functions. The various reliability theoretic functions useful in the evaluation of faulttolerant computing systems have been presented in Ref. 11, the measures of reliability have been defined, categorized into the domains of probabilistic measures and time measures and their effectiveness compared. Among the various measures of reliability that the user may request for computation are: the system mean-life, the reliability at the mean-life, gain in reliability over a simplex system or some other competitive system, the reliability improvement factor, and the mission time availability for some minimum tolerable mission reliability. 
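As a small illustration of two of these measures, the sketch below computes the mean life MTF = ∫ R(t) dt by numerical integration and then the reliability at that mean life, R(MTF), for a simplex unit and for TMR. The numerical scheme and the example failure rate are our own, not CARE's, but the resulting values (MTF of about 1.0 and 0.83 in normalized time, R(MTF) of about 0.368 and 0.402) agree with the figures quoted later in the paper for the simplex and unpartitioned TMR cases.

import math

def mtf(reliability, lam, upper=40.0, steps=200_000):
    """Mean life MTF = integral of R(t) dt from 0 to infinity,
    approximated by the trapezoidal rule on [0, upper]."""
    h = upper / steps
    total = 0.5 * (reliability(lam, 0.0) + reliability(lam, upper))
    for k in range(1, steps):
        total += reliability(lam, k * h)
    return total * h

def r_simplex(lam, t):
    return math.exp(-lam * t)

def r_tmr(lam, t):
    return 3 * math.exp(-2 * lam * t) - 2 * math.exp(-3 * lam * t)

lam = 1.0  # with lam = 1 the time axis is the normalized mission time lam*t
for name, r in (("simplex", r_simplex), ("TMR", r_tmr)):
    m = mtf(r, lam)
    print(f"{name:8s}  MTF = {m:.3f}   R(MTF) = {r(lam, m):.3f}")
# Expected output (approximately): simplex MTF 1.000, R(MTF) 0.368;
# TMR MTF 0.833, R(MTF) 0.402.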
Operational features Although CARE is primarily an interactive program, it may be run in batch mode if the user prespecifies the protocol explicitly. In the interactive mode CARE assumes minimum knowledge on the user's part. Default values are provided to many of the items that a user should normally supply. This safeguards the user and also makes usage simpler by providing logical default values to conventionally used parameters. Instructions provided by CARE are optional thus the experienced user can circumvent these and operate in fast mode. Definitions of reliability terms and abbreviations used in the program may be optionally requested. An optional "echo" feature that echoes user's responses back to the terminal is also provided .. A number of diagnostics and recovery features that save users from some common fatal errors are in the program. Model formulation-an example A typical problem submitted for CARE analysis may be the following: Given a simplex system with 8 equal modules which is made fault-tolerant by providing two standby spares for each module, where each module has a constant failure rate of 0.5 failures per year and where the spares have a dormancy factor of 10 and the applicable coverage factor being 0.99, it is required to evaluate the system survival probability in steps of 1/10 of a year for a maximum mission duration of 12 years. It is required that the system reliability be compared against the simplex or nonredundant system and that all these results be tabulated and also plotted. It is further required that the mean-life of the system as well as the reliability at the mean-life be computed. It is of interest to know the maximum mission duration that is possible while sustaining some fixed system reliability objective and to display the sensitivity of this mission duration with respect to variations in the tolerable mission reliability. I t is also required that the above analysis be carried out for the case where three standby spares are provided and these configurations of three and two spares be compared and the various comparative measures of reliability be evaluated and displayed. The above problem formulation is entered into CARE by stating that Equation 2 (which models standby spare systems) is required and the pertinent data (S=2,3; Z=8; K=10; T=12.0; LAMBDA =0.5; C=0.99; STEP=O.l) is inserted into CARE between the VARiable namelist delimiters $VAR ... $END. The above example illustrates the complexity of problems that may be posed to CARE, and the simplicity with which the specifications are entered. The reliability theoretic functions to be performed on the above specified system are acknowledged interactively by responding a YES or NOon the demand terminal to CARE's questions at the time it so requests. The Computer-Aided Reliability Estimation Program A PRIMITIVE SYSTEM: O,S), (N,S), (3,S)SIM OR TMRp ----m---rn-... -1:IDAN m- PARTITIONED PRIMITIVE SYSTEM (W = m). ~ ... --c:z::::::J--- SERIES - STRI NG OF A PRIMITIVE SYSTEM (Z =.i). 1 2 i .. ~~~~.~ L.ii%~.~ L _______ .J L ______ .J L ______ J AN m- PARTITIONED SERIES - STRING OF A PRIMITIVE SYSTEM (W =m, Z =2). ....._---,,,--_. . .1. ... -1'--_ _ --1'--_~_----J~ AN ARBITRARY SERIES-STRING OF m-PARTITIONED SERIES-STRING OF PRIMITIVE SYSTEMS. Figure 2-Formation of complex systems COMPLEX SYSTEMS The basic equations in CARE's repository define the primitive systems: (1, S), (N, S), (3, S)sim and TMRp. 
Equations representing more complex systems may be fabricated by combining the primitive systems in series reliability with one another as shown in Figure 2. The description of a complex system is entered by first enumerating the equation numbers of the primitive systems involved in namelist VARiable1. Thus "$VAR1; PROD = 1, 2; $END;" states that equation 1 and equation 2 are to be configured in series reliability. N ext, the parameter specifications for these equations are then entered using the namelist VARiable. The set of values for any parameter pertaining to a complex system is stored as a matrix, thus in the general case of PARAMETER (m, n) n refers to the equation involved m is the internal index for the set of values that will be attempted successively. For example, C(I, 2) = 1.0, 0.99 states that in equation 2 (the equation for standby-spares system) the value of the coverage factor should be taken to be 1.0 and having evaluated the complex system for this value the system is to be reconsidered with coverage factor being 0.99. 69 here was to be evaluated for the worst case dormancy factors K of 1 and infinity. On completing the evaluation of the above system, the effect of reducing coverage to 0.99 was to be reevaluated. Also the effect of increasing the number of spares to 3, as also the effect of increasing the module failure rates to their upper bound value of .0876 failures/year. All combinations of these modifications on the original system are to be considered. The mission time is 12 years and evaluations are to be made in steps of 1/10th of a year. The above desired computations are specified using the VAR namelist thus: $VAR; T=12.0; STEP=O.I; Z(I, 1) =1, Z(I,2)=8; C(1, 2) =1.0, 0.99; N(I,I)=3; S(I, 1) =2,3,S(I,2) =2,3;LAMBDA(I, 1) = .01752, .0876, LAMBDA(I, 2)=.01752, .0876; K(1, 1) = 1.0, INF, K(I, 2) = 1.0, INF; $END; (N ote the semicolons (;) denote carriage returns.) The ease and compactness with which complex systems can be specified in CARE is demonstrated by the above example. The reader will note the complex system configured in this example corresponds to a STAR-like system having eight functional units in standby-spare mode and a hard-core test-and-repair unit in Hybrid redundant mode (Figure 3). SOlVIE SIGNIFICANT RESULTS USING CARE Some significant results pertaining to the behavior of W partitioned NMR system (Figure 4) will now be presented. These results pertain to the behavior or reliability theoretic functions of an NMR system such as its mean life or mean time to first failure (MTF) and the reliability of the system at the mean life, R(MTF). The reliability theoretic system measure- Complex model formulation-an example It was required to evaluate a system consisting of 8 equally partitioned modules in a standby-spares (1, S) configuration having 2 spares· for each module. The 9th module was the hard-core of the system and was configured in a Hybrid redundant (3, S) system having 2 spares (S = 2). The coverage on the (1, S) system modules was to be initially considered to be 1.0. The lower bound on the failure rate A on all the modules had been evaluated to be .01752 failures/year on the basis of parts count. This complex system as specified 1 0 00000 DOD Figure 3-Configuration for an example of a STAR-like complex model 70 Fall Joint Computer Conference, 1972 1.00 ~ 0.60 ... o ... 
z - 0 :: 0.401---N= TMR, N=3 .125.06250.00 .03125 0.0 0.5 I 0.694 I 2.78 AT R(N,~O, W) vs AT AS A FUNCTION OF NAND W Figure 4-R(N ,0, W) vs AT as a function of Nand W reliability at the mean life, R(MTF)-is the reliability of the system computed for missions or time durations of length equal to the mean time to first failure of the system. The behavior of these functions were evaluated under the limiting conditions of the system parameters in order to establish system performance bounds. The results presented here have been both proven mathematicallylO and been verified by CARE analysis. Since it is well-known that mean-life (MTF) is not a credible measure of reliability performance (e.g., MTF of a simplex system is greater than the MTF of a TMR system!), another measure the reliability at the meanlife R(MTF) has been used to supplant MTF. 'This measure essentially uses a typical reliability estimate of the system. The typical reliability value being taken at a specific and hopefully a representative time of the system. This representative time is taken to be the time to first failure of the system, namely the MTF of the system. The foregoing is the rationale for choosing R(MTF) as a viable measure of system reliability. However, contrary to general belief this measure R(MTF) is not a good measure for partitioned NMR systems due to its asymptotic behavior as a function of the number of partitions W. It is proved in [10J that the reliability at MTF of a (3, 0, W) system in the limit as W becomes very large approaches the value exp ( -11"/4) asymptotically from below and that this bounding value is reached very rapidly, see Figure 5. The Computer-Aided Reliability Estimation Program 71 TABLE II--MTF and R(MTF) as a Function of W r-;··.. 1.00 (3,0, W) System W MTF R(MTF) o (Simplex) 1.0 0.368 0.402 exp( -11"/4) = 0.454 1 (TMR) co (3,0, co) 0.83 co 0.80 Some other results observed graphically in Figure 4 and the detailed mathematical proof of which are in [10J are summarized below. These results follow from the general reliability equation for a W partitioned NMR system, which is: R(N, 0, W) = (N) E [ J----------[A The construction of the submatrices H64 and A is done by an APL program3 given in the appendix with theory stated in Section III. The sub matrix 13) and drop the column if this seven digit vector has already appeared in a previously taken column. This guarantees that these columns along with the first 7 columns for check bits form a single error correcting code. This exercise was carried out using an APL computer program which generated a (104,90) and (172, 154) DEC codes which has separable SEC and can be shortened to handle data bit lengths 64 and 128. The codes are given in the Appendix. The DED capability is obtained by adding a check bit on the SEC code which makes a SEC-DED odd-weightcolumn-code. 5 The number in front of each column of H-matrix in the Appendix represents the cyclic position number in the full length code. These position numbers are used in the algebraic decoding algorithm4 in error correction process. SYSTEM IMPLEMENTATION There are at least three methods of generating the parity check matrix of a double error ·correcting code. The parity check matrix denoted by HI has Xi mod ml(x)ma(X) as its ith column (0 origin) where ml(x) and ma(X) are minimum functions of the field elements a and aa of GF (27). The parity check matrix H2 generated by the second method has the concatenated vector Xi mod ml(X), Xi mod ma(X) as its ith column. 
The parity check matrix Ha generated by the third method has the concatenated vector Xi mod mi (x) , x3 i mod ml(x) as its ith column. The codes generated by thes~ three matrices are not only equivalent but also isomorphic. These three matrices possess different desirable properties. In particular, the matrix HI possess the property (1) for the adaptive correction scheme-presently under consideration. The firs t 14 columns of HI represent an identity matrix which corresponds to 14 independently-acting check digits. However, any 7 check bits as a group do not provide SEC capability which is the required property (2). The matrix H2 on the other hand can be divided into two parts where the first group of seven check bits, corresponding in the part column Xi mod ml (x), does provide SEC capability, however, the two groups of check bits do not act independently and hence are not separable. The matrix Ha behaves in the same manner as H2 except that the syndrome in Ha is easily decodable. 4 Let us use a simple example for illustration. Figure 1 shows a memory system which contains two basic memory units. Each unit has a (72, 64) SEC-DED code. The following is the parity check matrix for this simple system. [H64 IsJ cf> cf> H= cf> cf> [HS4 IsJ cf>] cf> (2) [ [A cf>J [A cf>J Is Where H64 is the first group of 7 columns of the matrix in the Appendix and an additional column is added to make it odd weight. The A-matrix is the second group of 7 columns of the matrix in the Appendix. Another column is added to these 15 columns to make the overall parity matrix odd weight. This means that the overall code has double error correction and triple error detection capability. The encoding follows directly from the H-matrix of Equation (2). The decoding is classified as follows: 1. Any single error in each memory unit can be corrected separately and simultaneously. 2. If a double error is detected in one of the memory units and no error indication in the other memory Adaptive Error Correction Scheme for Computer Memory System 85 words out of a group of m words is very small. Such adaptive error correction scheme more closely matches the requirements of modern computer memory systems and can be used very effectively for masking faults and reducing cost of maintenance. REFERENCES MEM MEM Check DECODER To CPU or Channel Figure 1 unit for the corresponding word, then this double error can be corrected by the additional 8 check bits. 1 J F KEELEY System/370-Reliability from a system viewpoint 1971 IEEE International Computer Society Conference September 22-24 1971 Boston Massachusetts 2 W W PETERSON Error correcting codes MIT Press 1961 3 A D F ALKOFF K ElVERSON APL/360 user's manual IBM Watson Research Center Yorktown Heights New York 1968 4 A M PATEL S J HONG Syndrome trapping technique for fast double-error correction Presented at the 1972 IEEE International Symposium on Information Theory IEEE Catalog No 72 CHO 578-S IT 1972 5 M Y HSIAO A class of optimal minimum odd-weight-column SEC-DED codes IBM J of Res & Dev Vol 14 No 4 pp 395-402 July 1970 APPENDIX A-CODE GENERATION PROGRAl\1 The decoding of the double errors as stated in class 2 needs the data bits portion of both memory units. The data bit portion for the error free memory is required to cancel its effects in the last 8 syndrome bits. Therefore, the double error correction can be done as that given in Reference 4. 
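Appendix A gives the original APL code-generation program. As a rough modern companion, the sketch below shows only the column-generation idea behind the H1 construction: computing x^i mod g(x) over GF(2) for successive i. The generator polynomial used in the demonstration (the degree-7 primitive polynomial x^7 + x^3 + 1, which by itself yields the columns of a single-error-correcting Hamming-type check matrix) and the helper names are our own illustrative choices; the paper's DEC construction would instead use the degree-14 product m1(x)·m3(x) over GF(2^7) and retain only admissible columns, and the actual (104, 90) and (172, 154) matrices are the ones listed in Appendices B and C.

def pcm_columns(g_bits, n_columns):
    """Columns x^i mod g(x) over GF(2), i = 0, 1, ..., n_columns-1.

    g_bits lists the coefficients of g(x) from the constant term up;
    each returned column is a list of deg(g) bits, low order first."""
    deg = len(g_bits) - 1
    col = [0] * deg
    col[0] = 1                      # x^0 mod g(x) = 1
    columns = [col[:]]
    for _ in range(1, n_columns):
        carry = col[deg - 1]        # coefficient of x^(deg-1) before the shift
        col = [0] + col[:-1]        # multiply by x
        if carry:                   # reduce by g(x) when x^deg appears
            col = [(c ^ gb) for c, gb in zip(col, g_bits[:deg])]
        columns.append(col[:])
    return columns

# Illustrative run with g(x) = 1 + x^3 + x^7 (constant term first).
g = [1, 0, 0, 1, 0, 0, 0, 1]
for i, c in enumerate(pcm_columns(g, 10)):
    print(i, ''.join(str(b) for b in c))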
APL 360 VSECDEC[O]V SUl\1MARY An adaptive ECC scheme with SEC-DED feature can be expanded to DEC feature in a memory system containing several memory units environment. The normal memory cycle time remains unaffected, except in the presence of a double error when extra decoding time is required for the double error correction procedure. Other major advantage is cost savings in terms of number of check bits required. If the memory system contains m basic memory units then 8(m-l) check bits can be saved by using this scheme. The number m is chosen such that the probability of double-errors in two [1] [21 [3] [4) [5] [6) V SEC DEC C M+-1+pG }l+2*(!tf+2 ) V+,~1p 0 Z+l'JpO (j+(MpO),l I+O [7] V+MpQ+(-l4>q)~(Gx.q[M-l]) ~9+(I>M)x«X+(2i«M+2)+V»)€Z) [8] [9) '0123456789*'[(10 10 10 TI).10,V] Z[I]+X I+I+1 [10] [11] [12] [13] [14] ~7+(7x(I=N» 'DONE' ~15 V 86 Fall Joint Computer Conference, 1972 APPENDIX B-PARITY CHECK l\;fATRIX FOR (104, 90) SEC-SEPARABLE DEC CODE SEeDEC 1 0 0 0 0 1 1 0 1 1 1 0 1 1 1 000*10000000000000 001*01000000000000 002*00100000000000 003*00010000000000 004*00001000000000 005*00000100000000 006*00000010000000 007*00000001000000 008*00000000100000 009*00000000010000 010*00000000001000 011*00000000000100 012*00000000000010 013*00000000000001 014*10000110111011 015*11000101100110 016*01100010110011 017*10110111100010 018*01011011110001 019*10101011000011 020*11010011011010 021*01101001101101 022*10110010001101 023*11011111111101 024*11101001000101 025*11110010011001 026*11111111110111 027*11111001000000 028*01111100100000 029*00111110010000 030*00011111001000 031*00001111100100 032*00000111110010 037*00110001010110 03$*00011000101011 039*10001010101110 040*01000101010111 041*10100100010000 042*01010010001000 043*00101001000100 044*00010100100010 045*00001010010001 046*10000011110011 047*11000111000010 050*11011101011110 051*01101110101111 052*10110001101100 053*01011000110110 054*00101100011011 055*10010000110110 056*01001000011011 057*10100010110110 058*01010001011011 059*10101110010110 060*01010111001011 061*10101101011110 065*00101011011011 066*10010011010110 074*10001100000010 075*01000110000001 077*11010100000110 078*01101010000011 083*11101111000111 084*11110001011000 085*01111000101100 086*00111100010110 088*10001001111110 094*11001111110100 095*01100111111010 096*00110011111101 097*10011111000101 098*11001001011001 099*11100010010111 100*11110111110000 101*01111011111000 108*10010110110001 109*11001101100011 110*11100000001010 111*01110000000101 112*10111110111001 113*11011001100111 114*11101010001000 115*01110101000100 116*00111010100010 117*00011101010001 119*11000010110010 120*01100001011001 124*00110111011100 125*00011011101110 126*00001101110111 Adaptive Error Correction Scheme for Computer Memory System APPENDIX C-(172, 154) SEC-SEPARABLE DEC CODE SEC DEC SECDEC 1 0 1 1 0 1 1 1 1 0 1 1 0 0 0 1 1 000*1000000000000000 001*0100000000000000 002*0010000000000000 003*0001000000000000 004*0000100000000000 005*U000010000000000 006*0000001000000000 007*0000000100000000 008*0000000010000000 009*0000000001000000 010*0000000000100000 011*0000000000010000 012*0000000000001000 013*0000000000000100 014*000000000CO n 0010 015*0000000000000001 01~*1011011110110001 017*1110110001101001 01q*11000001100001~1 011*1101111101110011 020*1101110000001000 021*°110111000000100 022*OOliolll000000l0 023*0001101110000001 024*1011101001110081 025*1110101010001001 026*1100001011110101 l27*1101011011001011 031*1010110000101011 032*1110000110100100 033*0111000011010010 034*0011100001101001 
035*1010101110000101 030*1110001001110011 o37*1100011~laOOlnoo 038*0110001101000100 03·::J*QOll00nll0l000l0 040*0001100011010001 041*1011101111011001 045*0110101101111111 04r*1000001000001110 047*010Q0001COOOOlll 048*1001011100110010 04J*0100101110011001 050*1001001001111101 051*1111111010001111 052*1100100011110110 053*0110010001111011 o 5 II * 1 000 n 11 110 0011 0 0 05S*0100001011000110 05C*0010000101100011 057*1010011100000000 058*0101001110000000 05j*0010100111J000JO 06J*Q0010ljOlll00000 I) 61 * ,) 0 (; 0 1 0 10 Q 111 :) 0 :I 0 062*0000J11100111000 ~~7·~1~11-11·111"""~ 063*1001101001001001 069*1111101010010101 070*1100101011111011 071*1101001011001100 072*0110100101100110 073*0011010010110011 074*1010110111101000 075*0101011011110100 076*0010101101111010 077*0001011110111101 078*1011110101101111 079*1110100100000110 08~*0111a10010000011 ~81*100Dll0lllll00CO 082*0100011011111000 083*0010001101111100 o [lll * 'i 0 r) 1800110111110 085*1011001111011110 187*0181100111101111 088*10011:lllJ1nOOll0 () 8 ') * 0 1 'J (; 11 r; 11 0 1 r) 0 0 11 090*1001000101100000 ,) 91 * ,:; 1 :) 0 1 0 0 0 1 0 11 ~; 0 () 0 092*0010010001011000 093*0001001000101100 094*0000100100010110 096*1011010111110100 097*0101101011111010 098*0010110101111101 099*1010000100001111 100*1110011100110110 101*0111001110011011 102*1000111001111100 103*0100011100111110 105*1010011001111110 107*1001111800101110 108*010011111001~111 109*1001iJ00000111010 111*1001001110111111 11 3 * 0 1111111 () () 11 0 111 114*1001100000101010 115*0100010000010101 11~*100101011nlll0ll 117*1111110101101100 llR*nllllll0l0ll0ll0 11~*OOlllll101011Qll 120*1010100000011100 121*0101010000001110 1 2 2 * .J 0 1 ,1 1 0 10 0 0 0 0 0 111 123*1010001110111010 124*~1()liJOOlalJll0Jl 125*10011111JOOlll0l 126*11111J0000111111 127*1110101110101110 128*0110010111010111 131*1001011011100111 132*1111110011000010 135*1111001111110001 136*1100111001001001 137*1101000010010101 138*1101111111111011 130*1101100001001100 140*0110110000100110 141*0011011000010011 147*0101111010111101 143*1001100011101111 143*1111101111000110 150*0111110111100011 151*1000100101000000 153*0010001001010000 15l*lnll0llQl01JOOll lGO*Oll1111nOlllQono 161*0011101100111000 162*0001110110011100 163*0000111011001110 164*0000011101100111 165*1011110000000010 171*11DllllollCllo00 172*0110111101101100 176~alnlll01J010llln 177*0010111010010111 178*1010008011111010 179*0101000001111101 182*0111110000111011 189*1010001101100001 190*1110011000000001 191*1100010010110001 192*1101010111101001 193*1101110101000101 19 11 * 11 0110 () 100010011 195*1101101100111000 1 96* 0 11 () 11 () 11 00111 f) 0 201*1001100100110001 209*0000110111101110 210*0000011011110111 21G*OOll111010111100 217*0001111101011110 71~*0000111110101111 210*1011000001100110 220*0101100000110011 223*001001101110101i )24*8001001101110101 225*1011111000001011 226*1110100010110100 228*0011101000101101 229*1010101010100111 23J*nll1a~OlUlllJOOl 232*1000111100001001 233*1111000000110101 234*1100111110101011 23G*Ol10100000110010 240*1100011100000110 242*1000011001110000 243*0100001100111000 87 Dynamic confirmation of system integrity * by BARRY R. BORGERSON University of California Berkeley, California INTRODUCTION continuously integral are identified, and the integrity of the rest of the system can then be confirmed by means less stringent than concurrent fault detection. For example, it might be expedient to allow certain failures to exist for some time before being detected. 
It is always desirable to know the current state of any system. However, with most computing systems, a large class of failures can remain undetected by the system long enough to cause an integrity violation. What is needed is a technique, or set of techniques, for detecting when a system is not functioning correctly. That is, we need some way of observing the integrity of a system.

A slight diversion is necessary here. Most nouns which are used to describe the attributes of computer systems, such as reliability, availability, security, and privacy, have a corresponding adjective which can be used to identify a system that has the associated attribute. Unfortunately, the word "integrity" has no associated adjective. Therefore, in order to enhance the following discourse, the word "integral" will be used as the adjective which describes the integrity of a system. Thus, a computer system will be integral if it is working exactly as specified.

Now, if we could verify all of the system software, then we could monitor the integrity of a system in real time by providing a 100 percent concurrent fault detection capability. Thus, the integrity of the entire system would be confirmed concurrently, where "concurrent confirmation" of the integrity of any unit of logic means that the integrity of this unit is being monitored concurrently with each use. A practical alternative to providing concurrent confirmation of system integrity is to provide what will be called "dynamic confirmation of system integrity." With this concept, the parts of a system that must be continuously integral are identified, and the integrity of the rest of the system can then be confirmed by means less stringent than concurrent fault detection. For example, it might be expedient to allow certain failures to exist for some time before being detected. This might be desirable, for instance, when certain failure modes are hard to detect concurrently, but where their effects are controllable.

QUALITATIVE JUSTIFICATION

In most contemporary systems, a multiplicity of processes are active at any given time. Two distinct types of integrity violations can occur with respect to the independent processes. One type of integrity violation is for one process to interfere with another process. That is, one process gains unauthorized access to another's information or makes an illegitimate change of another process' state. This type of transgression will be called an "interprocess integrity violation." The other basic type of malfunction which can be caused by an integrity violation occurs when the state of a single process is erroneously changed without any interference from another process. Failures which lead to only intraprocess contaminations will be called "intraprocess integrity violations."

For many real-time applications, no malfunctions of any type can be tolerated. Hence, it is not particularly useful to make the distinction between interprocess and intraprocess integrity violations since concurrent integrity-confirmation techniques must be utilized throughout the system. For most user-oriented systems, however, there is a substantial difference in the two types of violations. Intraprocess integrity violations always manifest themselves as contaminations of a process' environment. Interprocess integrity violations, on the other hand, may manifest themselves as security infractions or contaminations of other processes' environments.
We now see that there can be some freedom in defining what is to constitute a continuously-integral, user-oriented system. For example, the time-sharing system described below is defined to be continuously integral if it is providing interprocess-interference protection on a continuous basis. Thus other properties of the system, such as intraprocess contamination protection, need not be confirmed on a continuous basis.

Although the concept of dynamic confirmation of system integrity has a potential for being useful in a wide variety of situations, the area of its most obvious applicability seems to be for fault-tolerant systems. More specifically, it is most useful in those systems which are designed using a solitary-fault assumption, where "solitary fault" means that at most one fault is present in the active system at any time. The notion of "dynamic" becomes more clear in this context. Here, "dynamic" means in such a manner, and at such times, that the probability of encountering simultaneous faults is below a predetermined limit. This limit is dictated not only by the allowable probability of a catastrophic failure, but also by the fact that other factors eventually become more prominent in determining the probability of system failure. Thus, there often comes a point beyond which there is very little to be gained by increasing the ability to confirm integrity. The rest of this paper will concern itself with dynamic confirmation in the context of making this concept viable with respect to the solitary-fault assumption.

DYNAMIC CONFIRMATION TECHNIQUES

In this section, and the following section, a particular class of systems will be assumed. The class of systems considered will be those which tolerate faults by restructuring to run without the faulty units. Both the stand-by sparing and the fail-softly types of systems are in this category. These systems have certain characteristics in common; namely, they both must detect, locate, and isolate a fault, and reconfigure to run without the faulty unit, before a second fault can be reliably handled.

Obviously, if simultaneous faults are to be avoided, the integrity of all parts of the system must be verified. This is reasonably straightforward in many areas. For instance, the integrity of data in memory can be rather easily confirmed by the method of storing and checking parity. Of course, checks must also be provided to make sure that the correct word of memory is referenced, but this can be done fairly easily too.1 It is generally true that parity, check sums, and other straightforward concurrent fault-detection techniques can be used to confirm the integrity of most of the logic external to processors. However, there still remain the problems of verifying the integrity of the checkers themselves, of the processors, and of logic that is infrequently used, such as that associated with isolation and reconfiguration.
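The parity mechanism mentioned above can be made concrete with a small sketch. The following Python fragment is an illustration added here, not something from the paper; the word width, the class name, and the error-reporting convention are assumptions chosen for the example. It simply stores one check bit per word and verifies it on every read, which is the essence of confirming memory integrity by storing and checking parity.

# A minimal sketch, assuming a 16-bit word and a software-visible fault report.
WORD_BITS = 16

def parity(word: int) -> int:
    """Return the even-parity bit of a word."""
    return bin(word & ((1 << WORD_BITS) - 1)).count("1") & 1

class ParityCheckedMemory:
    def __init__(self, size: int):
        self.words = [0] * size          # data bits
        self.check = [0] * size          # one parity bit per word

    def write(self, addr: int, word: int) -> None:
        self.words[addr] = word
        self.check[addr] = parity(word)

    def read(self, addr: int) -> int:
        word = self.words[addr]
        if parity(word) != self.check[addr]:
            # In a real system this would raise a fault report to the
            # reaction logic; here we simply signal the error.
            raise RuntimeError(f"parity error at address {addr}")
        return word

if __name__ == "__main__":
    mem = ParityCheckedMemory(1024)
    mem.write(5, 0x1234)
    mem.words[5] ^= 0x0004               # single-bit fault injected for illustration
    try:
        mem.read(5)
    except RuntimeError as e:
        print("detected:", e)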
All too often, there is no provision made in a system to check the fault-detection logic. Actually, there are two rather straightforward methods of accomplishing this. One method uses checkers that have their own failure space. That is, they have more than two output states; and when they fail, a state is entered which indicates that the checker is malfunctioning. This requires building checkers with specifically defined failure modes. It also requires the ability to recognize and handle this limbo state. An example of this type of checker appears in Reference 2.

Another method for verifying the integrity of the fault-detection logic is to inject faults; that is, cause a fault to be created so that the checker must recognize it. In many cases this method turns out to be both cheaper and simpler than the previously mentioned scheme. With this method, it is not necessary to provide a failure space for the checkers themselves. However, it is necessary to make provisions for injecting faults when that is not already possible in the normal design. With this provision, confirming the integrity of the checking circuits becomes a periodic software task. Failures are injected, and fault detection inputs are expected. The system software simply ignores the fault report or initiates corrective action if no report is generated.
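The fault-injection idea just described can be sketched as a periodic software task. The fragment below is a hedged illustration, not the paper's mechanism: the checker interface, the corrective-action hook, and the use of a deliberately inverted check bit are assumptions made for the example. The point it shows is that the absence of a fault report after an injected fault is itself treated as a failure of the checking logic.

# A sketch of periodic fault injection, assuming a simple parity checker interface.
import random

def parity_checker(word: int, check_bit: int) -> bool:
    """Return True if a fault is detected (parity mismatch)."""
    return (bin(word).count("1") & 1) != check_bit

def inject_fault_and_verify(checker) -> bool:
    """Periodic software task: present the checker with a deliberately bad
    word/check-bit pair and confirm that it reports a fault."""
    word = random.getrandbits(16)
    bad_check_bit = (bin(word).count("1") & 1) ^ 1   # intentionally wrong
    return checker(word, bad_check_bit)

def initiate_corrective_action() -> None:
    print("checker malfunction: isolate and reconfigure")

def periodic_checker_surveillance() -> None:
    if inject_fault_and_verify(parity_checker):
        pass   # expected fault report; the system software simply ignores it
    else:
        initiate_corrective_action()   # checker failed to detect the injected fault

if __name__ == "__main__":
    periodic_checker_surveillance()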
Associated with systems of the type under discussion, there is logic that normally is called into use only when a fault has been detected. This includes the logic dedicated to such tasks as diagnosis, isolation, and reconfiguration. This normally idle class of hardware units will collectively be called "reaction logic." In order to avoid simultaneous faults in a system, this reaction logic must not be allowed to fail without the failure being rapidly detected. Several possibilities exist here. This logic can be made very reliable by using some massive redundancy technique such as triple-modular-redundancy.3 Another possibility is to design these units such that they normally fail into a failure space which is detected and reported. However, this will not be as simple here as it might be for self-checking fault detectors because the failure modes will, in general, be harder to control. A third method would be to simulate the appropriate action and observe the reaction. This also is not as simple here as it was above. For example, it may not be desirable to reconfigure a system on a frequent periodic basis. However, one way out of this is to simulate the action, initiate the reaction, and confirm the integrity of this logic without actually causing the reconfiguration. This will probably require that the output logic either be made "reliable" or be encoded so as to fail into a harmless and detectable failure space.

The final area that requires integrity confirmation is the processors. The technique to be employed here is very dependent on the application of the system. For many real-time applications, nothing short of concurrent fault detection will apparently suffice. However, there are many areas where less drastic methods may be adequate. Fabry4 has presented a method for verifying critical operating-system decisions, in a time-sharing environment, through a series of independent double checks using a combination of a second processor and dedicated hardware. This method can be extended to verifying certain decisions made by a real-time control processor. If most of the tasks that a real-time processor performs concern data reduction, it is possible that software-implemented consistency checks will suffice for monitoring the integrity of the results. When critical control decisions are to be made, a second processor can be brought into the picture for consistency checks or dedicated hardware can be used for validity checking. Alternatively, a separate algorithm, using separate registers, could be run on the same processor to check the validity of a control action, with external time-out hardware being used to guarantee a response.

These procedures could certainly provide a substantial cost savings over concurrent fault-detection methods. For a system to be used in a general-purpose, time-sharing environment, the method of checking processors non-concurrently is very powerful because simple, relatively inexpensive schemes will suffice to guarantee the security of a user's environment. The price that is paid is to not detect some faults that could cause contamination of a user's own information. But conventional time-sharing systems have this handicap in addition to not having a high availability and not maintaining security in the presence of faults, so a clear improvement would be realized here at a fairly low cost. In order to detect failures as rapidly as possible in processors that have no concurrent fault-detection capability, periodic surveillance tests can be run which will determine if the processor is integral.

VALIDATION OF THE SOLITARY-FAULT ASSUMPTION

Fault-tolerant systems which are capable of isolating a faulty unit, and reconfiguring to run without it, typically can operate with several functional units removed at any given time. However, in order to design the system so that all possible types of failures can be handled, it is usually necessary to assume that at most one active unit is malfunctioning at any given time. The problem becomes essentially intractable when arbitrary combinations of multiple faults are considered. That is not to say that all cases of multiple faults will bring a system down, but usually no explicit effort is made to handle most multiple faults. Of course, by multiple faults we mean multiple independent faults. If a failure of one unit can affect another, then the system must be designed to handle both units malfunctioning simultaneously, or isolation must be added to limit the influence of the original fault.

A quantitative analysis will now be given which provides a basis for evaluating the viability of utilizing non-concurrent integrity-confirmation techniques in an adaptive fault-tolerant system. In the analysis below, the letter "s" will be used to designate the probability that two independent, simultaneous faults will cause a system to crash. The next concept we need is that of coverage. Coverage is defined5 as the conditional probability that a system will recover given that a failure has occurred. The letter "c" will be used to denote the coverage of a system. In order to determine a system's ability to remain continuously available over a given period of time, it is necessary to know how frequently the components of the system are likely to fail. The usual measure employed here is the mean-time-between-failures. The letter "m" will be used to designate this parameter. It should be noted here that "m" represents the mean-time-between-internal-failures of a system; the system itself hopefully has a much better characteristic. The final parameter that will be needed here is the maximum-time-to-recovery. This is defined to be the maximum time elapsed between the time an arbitrary fault occurs and the time the system has successfully reconfigured to run without the faulty unit. The letter "r" will be used to designate this parameter.

The commonly used assumption that a system does not deteriorate with age over its useful life will be adopted. Therefore, the exponential distribution will be used to characterize the failure probability of a system.
Thus, at any given time, the probability of encountering a fault within the next u time units is:

p = ∫(0 to u) (1/m)*exp(-t/m) dt = 1 - exp(-u/m)

From this we can see that the upper bound on the conditional probability of encountering a second independent fault is given by:

q = 1 - exp(-r/m)

Since it is obvious that r must be made much smaller than m if a system is to have a high probability of surviving many internal faults, the following approximation is quite valid:

q = 1 - exp(-r/m) = 1 - Σ(k=0 to ∞) (-r/m)^k / k! = 1 - 1 + r/m - (1/2)*(r/m)^2 + (1/6)*(r/m)^3 - ... ≈ r/m

Therefore, the probability of not being able to recover from an arbitrary internal failure is given by:

x = (1-c) + c*q*s = (1-c) + c*s*r/m

where the first term represents the probability of failing to recover due to a solitary failure and the second term represents the probability of not recovering due to simultaneous failures given that recovery from the first fault was possible.

If we now consider each failure as an independent Bernoulli trial and make the assumption that faulty units are repaired at a sufficient rate so that there is never a problem with having too many units logically removed from a system at any given time, then it is a simple matter to determine the probability of surviving a given period, T, without encountering a system crash. The hardware failures will be treated as n independent samples, each with probability of success (1-x), where n is the smallest integer greater than or equal to T/m. Thus, the probability of not crashing on a given fault is

(1-x) = c*(1 - r*s/m)

and the probability, P, of not crashing during the period T is given by:

P = [c*(1 - r*s/m)]^n = c^n * (1 - r*s/m)^n

With this equation, it is now possible to establish the validity of using the various non-concurrent techniques mentioned above to confirm the integrity of a system. What this equation will establish is how often it will be necessary to perform the fault injection, action simulation, and surveillance procedures in order to gain an acceptable probability of no system crashes. Since the time required to detect, locate, and isolate a fault, and reconfigure to run without the faulty unit, will be primarily a function of the time to detection for the non-concurrent schemes, and since this time is essentially equivalent to how frequently the confirmation procedures are invoked, we can assume that r is equal to the time period between the periodic integrity-confirmation checks.

In order to gain a feeling for the order of r, rather pessimistic numbers can be assumed for m, s, and T. Assume m = 1 week, s = 1/2, and T = 10 years; this gives an n of 520. For now, assume c is equal to one. Now, in order to attain a probability of .95 that a system will survive 10 years with no crashes under the above assumptions, r will have to be:

r = (m/s)*[1 - .95^(1/520)] = 119 seconds

Thus, if the periodic checks are made even as infrequently as every two minutes, a system will last 10 years with a probability of not crashing of approximately .95.

The effects of the coverage must now be examined. In order for the coverage to be good enough to provide a probability of .95 of no system crashes in 10 years due to the system's inability to handle single faults, it must be:

c = .95^(1/520) = .9999

Now this would indeed be a very good coverage. Since the actual coverage of any given system will most likely fall quite short of this value, it seems that the coverage, and not multiple simultaneous faults, is the limiting factor in determining a system's ability to recover from faults. The most important conclusion to be drawn from this section is that the solitary-fault assumption is not only convenient but quite justified, and this is true even when only periodic checks are made to verify the integrity of some of the logic.
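The arithmetic in this section is easy to reproduce. The short computation below is an added illustration, not part of the original text; it evaluates r and c from the formulas above using the same pessimistic figures (m = 1 week, s = 1/2, T = 10 years, n = 520) and confirms the quoted values of roughly 119 seconds and .9999.

# A numerical check of the formulas above, under the paper's assumed figures.
m = 7 * 24 * 3600             # m = 1 week, expressed in seconds
s = 0.5
n = 520                       # smallest integer >= T/m for T = 10 years (520 weeks)
P_target = 0.95

# With c = 1, P = (1 - r*s/m)**n, so r = (m/s) * (1 - P_target**(1/n)).
r = (m / s) * (1.0 - P_target ** (1.0 / n))
print(f"required r = {r:.0f} seconds")          # about 119 seconds

# With perfect periodic checking (r -> 0), P = c**n, so the coverage needed is:
c = P_target ** (1.0 / n)
print(f"required coverage c = {c:.4f}")         # about .9999

# Combining both effects multiplies the two .95 factors: P is then about .90.
P = (c * (1.0 - r * s / m)) ** n
print(f"P with both effects = {P:.3f}")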
INTEGRITY CONFIRMATION FEATURES OF THE "PRIME" SYSTEM

In order to better illustrate the potential power of dynamic integrity confirmation techniques, a description will now be given of how this concept is being used to economically provide an integrity confirmation structure for a fault-tolerant system. At the University of California, Berkeley, we are currently building a modular computer system, which has been named PRIME, that is to be used in a multiaccess, interactive environment. The initial version of this system will have five processors, 13 8K-word by 33-bit memory blocks with associated switching units, 15 high-performance disk drives, and a switching network which allows processor, disk, and external-device switching. A block diagram of PRIME appears in Figure 1.

Figure 1-Block diagram of the PRIME system. (Legend: * reconfiguration logic; each indicated line represents 16 terminal connections; each memory block (MB) consists of two 4K modules.)

The processing elements in PRIME are 3-bus, 16-bit wide, 90ns cycle time microprogrammable processors called META 4s.6 Each processor emulates a target machine in addition to performing I/O and executive functions directly in microcode. At any given time, one of the processors is designated the Control Processor (CP), while the others are Problem Processors (PPs). The CP runs the Central Control Monitor (CCM) which is responsible for scheduling, resource allocation, and interprocess message handling. The Problem Processors run user jobs and perform some system functions with the Extended Control Monitor (ECM) which is completely isolated from user processes. Associated with each PP is a private page, which the ECM uses to store data, and some target-machine code which it occasionally causes to be executed. A more complete description of the structure and functioning of PRIME is given elsewhere.7

The most interesting aspects of PRIME are in the areas of availability, efficiency, and security. PRIME will be able to withstand internal faults. The system has been designed to degrade gracefully in the presence of internal failures.8 Also, interprocess integrity is always maintained even in the presence of either hardware or software faults. The PRIME system is considered continuously integral if it is providing interprocess interference protection. Therefore, security must be maintained at all times.
Other properties, such as providing user service and recovering from failures, can be handled in a less stringent manner. Thus, dynamic confirmation of system integrity in PRIME must be handled concurrently for interprocess interference protection and can be handled periodically with respect to the rest of the system. Of course, there are areas which do not affect interprocess interference protection but which will nonetheless utilize concurrent fault detection simply because it is expedient to do so.

Fault injection is being used to check most of the fault-detection logic in PRIME. This decision was made because the analysis of non-concurrent integrity-confirmation techniques has established that periodic fault injection is sufficiently effective to handle the job and because it is simpler and cheaper than the alternatives.

There is a characteristic of the PRIME system that makes schemes which utilize periodic checking very attractive. At the end of each job step, the current process and the next process are overlap swapped. That is, two disk drives are used simultaneously; one of these disks is rolling the current job out, while the other is rolling the next job in. During this time, the associated processor has some potential free time. Therefore, this time can be effectively used to make whatever periodic checks may be necessary. And since the mean time between job steps will be less than a second, this provides very frequent, inexpensive periodic checking capabilities.

The integrity of Problem Processors is checked at the end of each job step. This check is initiated by the Control Processor which passes a one-word seed to the PP and expects the PP to compute a response. This seed will guarantee that different responses are required at different times so that the PP cannot accidentally "memorize" the correct response. The computation requires the use of both target machine instructions and a dedicated firmware routine to compute the expected response. The combination of these two routines is called a surveillance procedure. This surveillance procedure checks all of the internal logic and the control storage of the microprocessors. The target machine code of the surveillance routine is always resident in the processor's private page. The microcode part is resident in control storage. A fixed amount of time is allowed for generating a response when the CP asks a PP to run a surveillance on itself. If the wrong response is given or if no response is given in the allotted time, then the PP is assumed to be malfunctioning and remedial action is initiated. In a similar manner, each PP periodically requests that the CP run a surveillance on itself. If a PP thinks it detects that the CP is malfunctioning, it will tell the CP this, and a reconfiguration will take place followed by diagnosis to locate the actual source of the detected error. More will be said later about the structure of the reconfiguration scheme.
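The seed/response surveillance exchange described above can be sketched as follows. This is an added illustration only: the real surveillance combines target-machine code and a dedicated firmware routine, whereas the response function, the one-second timeout, and the process-based framing below are assumptions chosen to keep the example self-contained and runnable.

# A sketch of the CP-initiated surveillance check, with illustrative stand-ins.
import multiprocessing as mp

def expected_response(seed: int) -> int:
    """Stand-in for the surveillance computation; the real routine exercises
    the processor's internal logic and control storage."""
    x = seed & 0xFFFF
    for _ in range(16):
        x = ((x << 1) | (x >> 15)) & 0xFFFF   # 16-bit rotate
        x ^= 0x1D87                            # arbitrary illustrative constant
    return x

def problem_processor(seed: int, reply) -> None:
    reply.put(expected_response(seed))         # a healthy PP computes the response

def control_processor_check(seed: int, timeout_s: float = 1.0) -> bool:
    """Return True if the PP answered correctly within the allotted time."""
    reply = mp.Queue()
    pp = mp.Process(target=problem_processor, args=(seed, reply))
    pp.start()
    pp.join(timeout_s)
    if pp.is_alive():                          # no response in the allotted time
        pp.terminate()
        return False
    return reply.get() == expected_response(seed)

if __name__ == "__main__":
    ok = control_processor_check(seed=0x2A5F)
    print("PP integral" if ok else "PP malfunctioning: initiate remedial action")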
While the periodic running of surveillance procedures is sufficient for most purposes, it does not suffice for protecting against interprocess interference. As previously mentioned, this protection must be continuous. Therefore, a special structure has been developed which is used to prevent interprocess interference on a continuous basis.4 This structure provides double checks on all actions which could lead to interprocess interference. In particular, the validity of all memory and disk references, and all interprocess message transmissions, are among those actions double checked. A class code is used to associate each sector (1K words) of each disk pack with either a particular process or with the null process, which corresponds to unallocated space. A lock and key scheme is used to protect memory on a page (also 1K words) basis. In both cases, at most one process is bound to a 1K-word piece of physical storage. The Central Control Monitor is responsible for allocating each piece of storage, and it can allocate only those pieces which are currently unallocated. Each process is responsible for deallocating any piece of storage that it no longer needs. Both schemes rely on two processors and a small amount of dedicated hardware to provide the necessary protection against some process gaining access to another process' storage.

In order for the above security scheme to be extremely effective, it was decided to prohibit sharing of any storage. Therefore, the Interconnection Network is used to pass files which are to be shared. Files are sent as regular messages, with the owning process explicitly giving away any information that it wishes to share with any other process. All interprocess messages are sent by way of the CP. Thus, both the CCM and the destination ECM can make consistency checks to make sure that a message is delivered to the correct process.
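The storage-binding rules just described (a class code per 1K disk sector, a lock and key per 1K memory page, at most one process bound to any piece, and allocation only of unallocated pieces) can be modeled with a short sketch. The data structures and error handling below are illustrative assumptions, not the PRIME implementation.

# A sketch of the binding and double-check rules, under assumed data structures.
NULL_PROCESS = 0        # class code for unallocated space

class StorageBindings:
    def __init__(self, n_pieces: int):
        self.owner = [NULL_PROCESS] * n_pieces    # one class code per 1K piece

    def allocate(self, piece: int, process_id: int) -> None:
        # The Central Control Monitor may allocate only unallocated pieces.
        if self.owner[piece] != NULL_PROCESS:
            raise PermissionError("piece already bound to a process")
        self.owner[piece] = process_id

    def deallocate(self, piece: int, process_id: int) -> None:
        # Each process releases only the pieces bound to it.
        if self.owner[piece] != process_id:
            raise PermissionError("piece not bound to this process")
        self.owner[piece] = NULL_PROCESS

    def check_reference(self, piece: int, process_id: int) -> bool:
        # Double check performed on every memory or disk reference.
        return self.owner[piece] == process_id

if __name__ == "__main__":
    disk = StorageBindings(n_pieces=1024)
    disk.allocate(piece=7, process_id=42)
    assert disk.check_reference(piece=7, process_id=42)
    assert not disk.check_reference(piece=7, process_id=99)   # interprocess access refused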
The remaining area of integrity checking which needs to be discussed is the reaction hardware. In the PRIME system, this includes the isolation, power switching, diagnosis, and reconfiguration logic. A variety of schemes have been employed to confirm the integrity of this reaction logic. In order to describe the methods employed to confirm the integrity, it will be necessary to first outline the structure of the spontaneous reconfiguration scheme used in the PRIME system.

There are four steps involved in reorganizing the hardware structure of PRIME so that it can continue to operate with internal faults. The first step consists of detecting a fault. This is done by one of the many techniques outlined in this paper. In the second step, an initial reconfiguration is performed so that a new processor, one not involved in the detection, is given the job of being the CP. This provides a pseudo "hard core" which will be used to initiate gross diagnostics. The third step is used to locate the fault. This is done by having the new CP attach itself to the Programmable Control Panel9 of a Problem Processor via the Interconnection Network, and test it by single-stepping this PP through a set of diagnostics. If a PP is found to be functioning properly, then it is used to diagnose its own I/O channels. After the fault is located, the faulty functional unit is isolated, and a second reconfiguration is performed to allow the system to run without this unit.

Of the four steps involved in responding to a fault, the initial reconfiguration poses the most difficulty. In order to guarantee that this initial reconfiguration could be initiated, a small amount of dedicated hardware was incorporated to facilitate this task. Associated with each processor is a flag which indicates when the processor is the CP. Also associated with each processor is a flag which is used to indicate that this processor thinks the CP is malfunctioning. For every processor, these two flags can be interrogated by any other processor. Each processor can set only its own flag that suggests the CP is sick. The flag which indicates that a processor is the CP can be set only if both the associated processor and the dedicated hardware concur. Thus, the dedicated hardware will not let this flag go up if another processor already has its up. Also, this flag will automatically be lowered whenever two processors claim that the CP is malfunctioning.
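The reconfiguration-flag rules above can be summarized in a small model. This is an added sketch under stated assumptions, not the dedicated hardware itself; in particular, the clearing of the complaint flags after a forced takeover is an assumption made so the example runs cleanly.

# A sketch of the CP-flag arbitration rules, modeled in software for illustration.
class ReconfigurationFlags:
    def __init__(self, n_processors: int):
        self.is_cp = [False] * n_processors    # "this processor is the CP"
        self.cp_sick = [False] * n_processors  # "this processor thinks the CP is sick"

    def request_cp(self, p: int) -> bool:
        """Processor p asks to become CP; the 'hardware' concurs only if no
        other processor already has its CP flag up."""
        if any(self.is_cp):
            return False
        self.is_cp[p] = True
        return True

    def claim_cp_sick(self, p: int) -> None:
        """A processor can set only its own CP-sick flag."""
        self.cp_sick[p] = True
        if sum(self.cp_sick) >= 2:                       # two independent complaints
            self.is_cp = [False] * len(self.is_cp)       # CP flag lowered automatically
            self.cp_sick = [False] * len(self.cp_sick)   # assumed reset for the next round

if __name__ == "__main__":
    flags = ReconfigurationFlags(n_processors=5)
    assert flags.request_cp(0)            # processor 0 becomes the CP
    assert not flags.request_cp(1)        # refused: a CP flag is already up
    flags.claim_cp_sick(2)
    flags.claim_cp_sick(3)                # second complaint forces the CP flag down
    assert flags.request_cp(4)            # a new processor can now take over as CP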
There is somewhat of a dilemma associated with confirming the integrity of this logic. Because of the distributed nature of this reconfiguration structure, it should be unnecessary to make any of it "reliable." That is, the structure is already distributed so that a failure of any part of it can be tolerated. However, if simultaneous faults are to be avoided, the integrity of this logic must be dynamically confirmed. Unfortunately, it is not practical to check this logic by frequently initiating reconfigurations. This dilemma is being solved by a scheme which partially simulates the various actions. The critical logic that cannot be checked during a simulated reconfiguration is duplicated so that infrequent checking by actual reconfiguration is sufficient to confirm the integrity of this logic.

The only logic used in the diagnostic scheme where integrity confirmation has not already been discussed is the Programmable Control Panel. This pseudo panel is used to allow the CP to perform all the functions normally available on a standard control panel. No explicit provision will be made for confirming the integrity of the Programmable Control Panel because its loss will never lead to a system crash. That is, failures in this unit can coexist with a failure anywhere else in the system without bringing the system down.

For powering and isolation purposes, there are only four different types of functional units in the PRIME system. The four functional units are the intelligence module, which consists of a processor, its I/O controller and the subunits that directly connect to the controller, its memory bus, and its reconfiguration logic; the memory block, which consists of two 4K-word by 33-bit MOS memory modules and a 4x2 switching matrix; the switching module, which consists of the switch part of two processor-end and three device-end nodes of the Interconnection Network; and the disk drive.

The disk drives and switching modules can be powered up and down manually only. The intelligence modules must be powered up manually, but they can be powered down under program control. Finally, the memory blocks can be powered both up and down under program control. No provision was made to power down the disks or switching modules under program control because there was no isolation problem with these units. Rather than providing very reliable isolation logic at the interfaces of the intelligence modules and memory blocks, it was decided to provide additional isolation by adding the logic which allows these units to be dynamically powered down. Also, because it may be necessary to power memory blocks down and then back up in order to determine which one has a bus tied up, the provision had to be made for performing the powering up of these units on a dynamic basis. Any processor can power down any memory block to which it is attached, so it was not deemed necessary to provide for any frequent confirmation of the integrity of this power-down logic. Also, every processor can be powered down by itself and one other processor. These two power-down paths are independent so again no provision was made to frequently confirm the integrity of this logic. In order to guarantee that the independent power-down paths do not eventually fail without this fact being known, these paths can be checked on an infrequent basis.

All of the different integrity confirmation techniques used in PRIME have been described. The essence of the concept of dynamic confirmation of system integrity is the systematic exploitation of the specific characteristics of a system to provide an adequate integrity confirmation structure which is in some sense minimal. For instance, the type of use and the distributed intelligence of PRIME were taken advantage of to provide a sufficient integrity-confirmation structure at a much lower cost and complexity than would have been possible if these factors were not carefully exploited.

REFERENCES

1 B BORGERSON C V RAVI On addressing failures in memory systems Proceedings of the 1972 ACM International Computing Symposium Venice Italy pp 40-47 April 1972
2 D A ANDERSON G METZE Design of totally self-checking check circuits for M-out-of-N codes Digest of the 1972 International Symposium on Fault-Tolerant Computing pp 30-34
3 R A SHORT The attainment of reliable digital systems through the use of redundancy-A survey IEEE Computer Group News Vol 2 pp 2-17 March 1968
4 R S FABRY Dynamic verification of operating system decisions Computer Systems Research Project Document No P-14.0 University of California Berkeley February 1972
5 W G BOURICIUS W C CARTER P R SCHNEIDER Reliability modeling techniques for self-repairing computer systems Proceedings of the ACM National Conference pp 295-309 1969
6 META 4 computer system microprogramming reference manual Publication No 7043MO Digital Scientific Corporation San Diego California June 1972
7 H B BASKIN B R BORGERSON R ROBERTS PRIME-A modular architecture for terminal-oriented systems Proceedings of the 1972 Spring Joint Computer Conference pp 431-437
8 B R BORGERSON A fail-softly system for time-sharing use Digest of the 1972 International Symposium on Fault-Tolerant Computing pp 89-93
9 G BAILLIU B R BORGERSON A multipurpose processor-enhancement structure Digest of the 1972 IEEE Computer Society Conference San Francisco September 1972 pp 197-200

The in-house computer department

by JOHN J. PENDRAY
TECSI-SOFTWARE
Paris, France

INTRODUCTION
Over fifteen years ago, in some inner recess of some large corporation, a perplexed company official stood pondering before a large corporate organizational chart on his office wall. In his hand he held a small square of paper on which the words "Computer Department" were inscribed. Behold one of the modern frontiersmen of twentieth century business: the first man to try to stick the in-house computer department on the company organizational chart. He probably failed to find a place with which he felt comfortable, thereby becoming the first of many who have failed to resolve this problem.

Most of the earlier attempts ended by putting the computer department somewhere within the grasp of the corporate financial officer. The earliest computer applications were financial in nature, such as payroll, bookkeeping, and, after all, anything that costs as much as a computer must belong in the financial structure somehow. Many corporations are still trying to get these financial officers to recognize that there are many non-financial computer applications which are at least as important as the monthly corporate trial balances. Additionally, and perhaps even worse, the allocation of the computer department's resources is viewed as a relatively straightforward financial matter subject to budgeting within financial availability. This method of resource dispensing seems not to provide the right balance of performance and cost generally sought in the business world.

As the computer department growth pattern followed the precedent of Topsy, many corporations began to wonder why something that had become an integral part of every activity in the company should belong to one function, like finance. This questioning led to a blossoming forth of powerful in-house computer departments disguised under surcharged names like Information Services Department. Often, this square on the organizational chart had a direct line to the chief executive's office. This organizational form has created two of the most widely adopted erroneous concepts ever to permeate corporate activity. The first, and perhaps least damaging of these, is the concept that the highest corporate officers should be directly in touch with the computer at all times (and at any cost) to take advantage of something called the Management Information System (MIS). (Briefly, a MIS is a system designed to replace a fifteen-minute telephone call to the research department by a three-second response from a computer, usually providing answers exactly fourteen minutes and fifty-seven seconds faster than anyone can phrase his precise question.) The second concept to follow the attachment of the computer department to the chief executive's office has been the missionary work which has been undertaken in the name of, and with the power or influence of, the chief executive. Information service missionary work generally consists of the computer experts telling each department exactly what their information needs are and how they should go about their business.

This article will examine the nature of the in-house computer department in terms of its place in the corporate structure, its product, its function in the maturing of the product, and its methods of optimizing its resource utilization. Additionally, one possible internal structure for an in-house computer department will be presented.

THE IN-HOUSE COMPUTER DEPARTMENT WITHIN THE CORPORATE STRUCTURE

Most of the blocks on the corporate organizational chart have some direct participation in the business of the company. Take an example. The Whiz-Bang Corporation is the world's leader in the production of whiz-bangs. Its sales department sells whiz-bangs. Its production department produces whiz-bangs. Its development department develops new types of whiz-bangs. Its
The marketing department is allocated sufficient resources to market whiz-bangs; the production department gets resources adequate to produce whiz-bangs, etc. It is not possible to allocate r~sources to the computer department on the basis of its direct contribution to whiz-bangs. The computer department provides services, and these services are what should be funded. The value of these services provides the basis for funding. Other departments use the computer services, and it follows that only these departments can place the value on a service and that each department should pay for the services which it gets. Therefore, the funding of the computer department is the sum total of the payments received from the other departments for services rendered. How should the computer department be controlled? First of all, it is necessary to define what is to be controlled. Either one can control product specifications or one can control the resources necessary to produce a product. Product specifications are generally controlled, in one way or another, by the buyer, while resource control is usually an internal problem concerned with the production of the buyer-specified product. At the Whiz-Bang Corporation, the marketing determines the buyer-desired product specifications, but each internal department calculates and controls its resource requirements to yield the specified number and type of whizbangs. If the nature of the computer department is to provide services as its product, the users of these services should control their specifications. Mter all, they are paying for them (or should be). If the computer department has the task of providing services that the other departments will be willing to fund, it should have the responsibility to allocate its resources to optimize its capability to provide the services. Mter all, they are the experts (or should· be). In resume, the departments in the corporation are using an external type of service from an internal source, the in-house computer department. Only they can value the service, but they won't do this job of valuation unless they are charged for the service. This valuation will automatically produce customer-oriented specifications for the services. On the other hand, once the services are specified and accepted at a certain cost, it is the job of the computer department to use its revenues in the best manner to produce its services. That is, the funding flows as revenues from the other departments; but the utilization of this funding is the proper responsibility of the provider of the services, the computer department. These principles indicate that the in-house computer department can be melded into the corporate structure in any position where it can be equally responsive to all of the other departments while controlling, itself, the utilization of its resources. THE PRODUCT OF THE COl\fPUTER DEPARTMENT-THE COMPUTER SERVICE A computer service, which is the product produced and sold by the computer department, has an average life span of between five and ten years. It is to be expected that as the speed of computer technological change diminishes, this life span will lengthen. To date, many computer services have been conceived and developed without a real understanding of the nature of a computer service. The lengthening of the life span of the computer service should produce a more serious interest in understanding this nature in order to produce more economical and responsive services. 
A well-conceived computer service is a highly tuned product which depends on the controlled maturing and merging of many technical facets. Too often this maturing and merging is poorly controlled because the life cycle of the computer service is not considered. The net result may be an inflexible and unresponsive product which lives on the edge of suicide, or murder, for the entirety of its operational life. Computer services management should not allow this inflexibility to exist, for the computer is one of the most flexible tools in the scientific grabbag. This innate flexibility should be exploited by management in the process of maturing a computer service.

MATURING THE COMPUTER SERVICE

There are four major phases in the maturing process: definition, development, operation, and overhaul. Perhaps the most misunderstood aspect of this maturing process is the relation between the phases in the life cycle of a computer service and the corresponding changes required in the application of technical specialties. Each phase requires a different technical outlook and a different level of technical skills.

The definition phase

Defining the service is oriented toward producing the functional specifications which satisfy the needs and constraints of the client. From another point of view, this is the marketing and sales problem for the computer department. It should be treated as a selling problem because the service orientation of the computer department is reinforced by recognition that the buyer has the problem, the money, and the buying decision. The technical outlook should be broad and long term, for the entire life of the service must be considered. Technical details are not desirable at this stage, but it is necessary to have knowledge of recent technical advances which may be used to the benefit of the service. Also, a good understanding of the long-range direction and plans of the computer department is necessary in order to harmonize the proposed service with these goals.

The first step in defining a computer service is to locate the potential clients and estimate their susceptibility to an offer of a computer service. At first glance, this seems an easy task as the potential clients are well-known members of the corporate structure. Not so! Many of the most promising avenues of computer services cut across the normal functional separations and involve various mixtures of the corporate hierarchy. These mixtures are frequently immiscible, and the selling job involves convincing each participant of his benefit and helping him justify his contribution. The corporate higher-ups would also need to be convinced, but the money will seldom come from their operating budgets. In any case, the responsibility to seek out and sell new computer services lies with the computer department; however, the decision to buy is the sole property of the client departments.

After potential clients are identified, a complete understanding of the problem must be gained in order to close the sale. This understanding should give birth to several alternative computer system approaches giving different performance and cost tradeoffs. The potential customer will want to understand the parameters and options available to him in order to select his best buy. This is a phase of the life cycle of the service where the computer department provides information and alternatives to the prospective client.
Closing of the agreement should be in contractual terms with each party obligated for its part of the responsibility. All terms such as financing schedules, product specifications, development schedules, modification procedures, and penalties should be reduced to writing and accepted before work begins. A computer department that cannot (or will not) make firm commitments in advance of a project is poorly managed. (Of course there can always be a periodic corporate reckoning to insure that imbalances are corrected.)

The development phase

The contract is signed; the emphasis for the computer department changes from sales to development and implementation of the service. This phase calls for a concentrated, life-of-the-effort technical outlook with in-depth and competent technical ability required at all levels. The specialists of the computer department must be organized to produce the system which will provide the service as specified. The usual method for accomplishing this organization is the "project". Many learned texts exist on the care and feeding of a technical project, so let's examine here only the roles of the computer department and the client within the general framework of a project.

Computer department participation centers on its role as being the prime responsible party for the project. It is the computer department's responsibility to find the best techniques for satisfying all the goals of the project. The correct utilization of the resources available to the computer department is a key to the project's success. One resource is time, and time runs out for a project. That is to say that no true project succeeds unless it phases out on time. A project team produces a product, turns it over to the production facility, and then the project ceases to exist. The personnel resource of the computer department is also viewed differently in a project. The project team is composed of a hand-tailored mix of specialists who are given a temporary super-incentive and then removed from the project after their work is done. Super-incentives and fluid workforces are not easily arranged in all companies, and this is one of the reasons why the computer department must maintain control of the utilization of its resources. The computer department should acquire new resources for a project within the following guideline: don't. Projects should not collect things around them or they become undisintegratable. The only exception: acquisitions which form part of the product, and not part of the project, and which will go with the product into the production phase.

Assuring the continuing health of the project's product is another critical aspect of the computer department's responsibility in the project. Since the project team will die, it must provide for the product to live independently of the project. This involves producing a turnoverable product which is comprehensible at all levels of detail. Also, the final product must be flexible enough to respond to the normal changes required during its lifetime.

It is interesting to note that in the development phase of the life cycle of a service, the project philosophy dictates that the computer department orient itself toward project goals and not just toward satisfying the specifications of the service. That is, the service specifications are only one of the project goals along with time, cost, etc.
On the other hand, the eventual user of the service, i.e., the client department, views the project as only a part of the total process necessary to have the service. To the client, the project is almost a "necessary evil"; however, the development project philosophy depends on active client involvement. Three distinct client functions are required. In their order of importance they are:

1. Continuing decision-making on product performance and cost alternatives surfaced during the project work.
2. Providing aid to the technical specialists of the computer department to insure that the functional specifications are well understood.
3. Preparing for use of the service, including data preparation, personnel training, reorganization, etc.

These three client functions are certainly important aspects of a project, but it should not be forgotten that the development project is a method used by the computer department to marshal its resources and, therefore, must be under the responsibility of the computer department.

Development of the service may be an anxious phase as the client has been sold on the idea and is probably eager for his first product. This eagerness should not be blunted by the project team, nor should it affect the sound judgment of the team. Consequently, contact between the technical experts and the client should be controlled and directed toward constructive tasks.

The operation phase

The third step in the life cycle of a service begins when the development project begins to phase out. This is the day-to-day provision of the service to the client. In this phase, the computer department has a production philosophy which is single-minded: to assure the continuing viability of the service. This is often a fire-fighting function in which the quick-and-dirty answer is the best answer. There isn't much technical glory in this part of the life cycle of a service, but it's the part that produces the sustaining revenues for the computer department.

The computer department enhances continuing product viability by performing two functions. Of primary importance is to reliably provide the specified service with minimum expenditure of resources. Secondarily, the client must be kept aware of any possible operational changes which might affect the performance or cost of his service. Again, the client has a strong part in the decision to effect a change. The client must contribute to the continuing viability of the product by using it intelligently and periodically evaluating its continuing worth.

The overhaul phase

As a service ages during its operational heyday, the environment around it changes little by little. Also, the quick-and-dirty maintenance performed by the operations personnel will begin to accumulate into a patchwork quilt which doesn't look much like the original edition. These two factors are not often self-correcting, but they can go unnoticed for years. The only answer is a complete technical review and overhaul. Every service should be periodically dragged out of the inventory and given a scrub-down. This is another job where the technical glamor is quite limited; however, overhauling services to take advantage of new facilities or concepts can provide significant gains, not to mention that the service will remain neat, controllable, flexible, and predictable.

Thus definition, development, operation, and overhaul are the four phases in the life cycle of a computer service. All of these phases directly affect the clients and are accomplished with their aid and involvement. However, there is another area of responsibility for the computer department that does not touch the clients as closely. This area is the control over the utilization of the computer department's resources.
OPTIMIZING THE UTILIZATION OF THE COMPUTER DEPARTMENT'S RESOURCES

This important responsibility of the computer department is an internally-oriented function which is not directly related to the life cycles of the services. This is the problem of selecting the best mix of resources which fulfills the combined needs of the clients. In the computer service business there are two main resources, people and computing equipment.

Effective management of computer specialists involves at least training, challenging, and orienting. If these three aspects are performed well, a company has a better chance of keeping its experts, and keeping them contributing. Training should be provided to increase the professional competence of the staff, but in a direction which is useful to the company. It is not clear, for instance, that companies who use standard off-the-shelf programming systems have a serious need to train the staff in the intricate design of programming systems software. It's been done, and every routine application suddenly became very sophisticated, delicate, and incomprehensible. However, training which is beneficial for the company should be made interesting for the personnel.

Challenging technical experts is a problem which is often aggravated by a poor hiring policy which selects over-qualified personnel. Such people could certainly accomplish the everyday tasks of the company if only they weren't so bored. The management problem of providing challenge is initially solved by hiring people who will be challenged by the work that exists at the company. Continuing challenge can be provided by increasing responsibility and rotating tasks.

Orienting the technical personnel is a critical part of managing the computer department. If left alone, most technical specialists tend to view the outside world as it relates to the parameter lists of their logical input/output modules, for example. They need to be oriented to realize that their technical specialty is important because it contributes to the overall whole of the services provided to the clients. This client-oriented attitude is needed at all levels within a service organization.

Besides personnel, the other major resource to be optimized by the computer department is the computing system. This includes equipment and the basic programs delivered with the equipment, sometimes called "hardware" and "software". Optimizing of a computing system is a frequently misunderstood or neglected function of the computer department. In a sense this is not surprising as there are three factors which obscure the recognition of the problem. First of all, computers tend to be configured by technical people who like computers. Secondly, most computer systems have produced adequate means of justifying themselves, even in an unoptimized state. Lastly, computer personnel, both manufacturers and users, have resisted attempts to subject their expenditures to rigorous analysis. It seems paradoxical that the same computer experts who have created effective
The utilization of computer systems is capable of being analyzed and may be seen as three distinct steps in the life cycle of the resource. These three steps can be presented diagrammatically as follows: general requiremBnts I I development of the hardware strategy computing requirements selection of a system system options j tuning of the system system configuration All too often, the strategy is chosen by default, the selection is made on the basis of sales effectiveness, and the tuning is something called "meeting the budget." Development of the hardware strategy Many computer departments don't even realize that different strategies exist for computing. This is not to say that they don't use a strategy; rather that they don't know it and haven't consciously selected a strategy. The hardware strategy depends on having an understanding of the general needs of the computer department. The needs for security, reliability, independence, centralization of employees, type of computing to be done, amount of computing, etc., must be formulated in general terms before a strategy decision can be made. There are many possible ways to arrange computing equipment, and they each have advantages, disadvantages, and, as usual, different costs. The problem is to pick the strategy which responds to the aggregate of the general needs. Perhaps some examples can best demonstrate the essence of a computing strategy. A large oil company having both significant scientific and business processing decides to separate the two applications onto two machines with each machine chosen for its performance/ cost in one of the two specialized domains. A highly decentralized company installs one large economical general purpose computer but with remote terminals each of which is capable of performing significant 102 Fall Joint Computer Conference, 1972 independent processing when not being used as a terminal. A highly centralized company installs two large mirror-image general purpose computers with remote terminals which are efficient in teletransmission. This is one area where the in-house computer department is not exactly like an external supplier of services, for the system strategy must reflect the general needs, and constraints, of the whole corporation. Selection of a system Mter the strategy is known, it becomes possible to better analyze and formulate the computing needs in terms of the chosen strategy. This usually results in a formal specification of computing requirements which includes workload projections for the expected life of the system. This is not a trivial task and will consume time, but the service rendered by the eventual system will directly depend on the quality of this task. Once an anticipated workload is defined, one is free to utilize one, or a combination, of the methods commonly used for evaluating computer performance. Among these are simulation, benchmarks, and technical expert analysis. One key decision will have a great influence on the results of the system selection: is a complete manufacturer demonstration to be required? This question should not be answered hastily; because a demonstration requires completely operational facilities, which may guarantee that the computer department will get yesterday's system, tomorrow. On the other hand, not having a demonstration requirement may bring tomorrow's most advanced system, but perhaps late and not quite as advanced as expected. 
In any case, some methodology of system selection is required, if only to minimize the subjectivity which is so easily disguised behind technical jargon. environment. Take an example. As a result of the characteristics of the selected computer system, it might turn out that the mix of jobs "required" during the peak hours dictates that the expensive main memory be 50 percent larger than at any other time. Informing the clients of this fact, and that the additional memory cost will naturally be spread over their peak period jobs, will usually determine if all the requirements are really this valuable to the client. The client has the right to be informed of problems that will directly affect his service or costs. Only he can evaluate them and decide what is best for him. Tuning of the environment involves selecting the best technical options, fully exploiting the potential of the computing configuration, and otherwise varying the parameters available. The trick is to examine all the parameters in the environment, not just the technical ones. This tuning process should be made, on a periodic basis, to insure that the environment remains as responsive as possible to the current needs. PROPOSAL-AN ORGANIZATIONAL STRUCTURE FOR THE IN-HOUSE COMPUTER DEPARTMENT It may not be possible to organize every computer department in the same manner, but some orientation should be found which would minimize the lateral dependencies in the organization. Perhaps a division of responsibilities based on the time perspective would be useful. Something as simple as a head office with three sections for long-range, medium-range, and short-range tasks could minimize lateral dependencies and still allow for exploitation of innate flexibility. In the language of the computer department, these sections might be called the head office, planning, projects, and operations, as shown in Figure 1. Tuning of the system The head office The winner of the hardware selection should not be allowed to start to take advantage of the computer department once the choice is made. On the contrary, the computer department is now in its strongest position as the parameters are much better defined. One more iteration on the old specifications of requirements can now be made in light of the properties of the selected system. Also, an updating of the workload estimates is probably in order. Armed with this information, the computer department is now ready to do final battle to optimize the utilization of the system. This optimization involves more than just configuring the hardware. It is a fine tuning of the computing There are three functions which must be performed by the head office. These functions are those which HEAD OFFICE PROJECTS SECTION Figure 1 The In-House Computer Department encompass all of the other sections and are integral to the computer department. The first, and most important, of the functions for the head office is certainly marketing and client relations. All aspects of a service's life cycle involve the customer and he must be presented with a common point of contact on the business level. Every client should feel that he has the attention of city hall for resolving problems. In the opposite direction, the three sections should also use the head office for resolving conflicts or making decisions which affect the clients. The second function of the head office is to control the life cycle of a service. As a service matures from definition to development to operations, it will be passed from one section to another. 
This phasing avoids requiring the technical people to change their outlook and skills to match the changes in the maturing process, but may create problems as a service is passed from hand to hand. Only the head office can control the process. Resource control is the last function of the head office. The allocation of the various resources is an unavoidable responsibility and must reflect the changing requirements of the computer department. 103 is limited for each task and each task is executed in a project approach. A permanent nucleus of specialists exists to evaluate and implement major changes in the equipment. Each such major change is limited in its scope and accomplished on a project basis. Development of services is naturally a task for the projects section. Each such project is performed by a team composed of people from the permanent nucleus and from the other two sections. The leadership comes from the projects section to insure that the project philosophy is respected, but utilization of personnel from the other sections assists in the transistions from planning to projects and from projects to operations. This latter transition from development to operations is a part of the third function of the projects section. Direct aid is given to the operations section to insure that project results are properly understood and exploited in the day-to-day operations. The operations section Here is the factory. The time orientation is immediate. There are five major tasks to be performed, each of which is self-evident. The planning section This is the long-range oriented group which must combine technical and market knowledge to plan for the future. The time orientation of this section will vary from company to company, but any task which can be considered as being in the planning phase is included. Among the planning tasks is the development of longrange strategy. This strategy must be founded on a knowledge of expected customer needs (market research), advances in technical capabilities (state-of-theart studies), and constraints on the computer department (corporate policy). Development of an equipment strategy is a good example of this task. Another planning function is the developing of functional specifications for potential new services. In this respect, the planning section directly assists the head office in defining new services for clients. Lastly, -the planning section assists the projects section by providing state-of-the-art techniques which can be used in developing a specified service. The projects section This section has responsibility for the tasks in the computer department which are between planning and operation. Included is both development of services and changes in the technical facilities. The time orientation • Day-to-day production of the services, • Accounting, analysis and control of production costs, • Installation and acceptance of new facilities, • Maintenance of all facilities (this includes systems software and client services), • Recurring contact, training, and aid to the clients in use of the services. TWO EXAMPLES Perhaps the functioning of this organization can be demonstrated by an exampleJrom each of the two major areas of services and resources. The life cycle of a service may begin either in the planning section (as a result of market research) or in the head office (as a result of sales efforts). In any case, the definition of the service is directed by the head office and performed by the planning section. 
Once the contract is signed, the responsibility passes to the projects section and the project team is built for the development effort. On the project team there will be at least one member from the planning section who is familiar with the definition of the service. The operations section also contributes personnel to facilitate the turnover at the end of the project. Other personnel are gathered from the permanent nucleus and the sections as needed. Each project member is transferred to the projects section for the life of the project. The service is implemented, turned over to the operations section, and the project team is disbanded. Daily production and maintenance are performed by the operations section, as is the periodic overhaul of the system. Each change of sections and all client contacts are under the control of the head office.

For resource utilization a close parallel exists. The head office again controls the life cycle. As an example, take the life cycle of a computer system. The planning section would develop a strategy of computing which would be approved by the head office. When the time arrived for changing the computer system, the projects section would define a project and combine several temporary members and permanent nucleus personnel to form the project team. A computer system selection would be made in line with the strategy of computing, and the system would be ordered. The operations section would be trained for the new system and accept it after satisfactory installation. Periodic tuning of the computer system would be done by permanent personnel in the projects section with the cooperation of the operations section. The flow of responsibility for these two examples is represented by Figure 2.

Figure 2 (flow of responsibility: service functions run from service sales through service definition, development, and production; resource functions run from resource needs through resource strategy, selection, and utilization/optimization; both streams serve the clients under the control of the head office, with the projects section in the middle)

SUMMARY

Excepting those cases where the product of a company contains a computer component, the in-house computer department is in the business of providing an external service to the integral functions of a non-computer business. For this reason, the computer department does not appear to mesh well on an organizational chart of the departments which do directly contribute to the product line of the corporation. However, a well-founded in-house computer department which depends on its users for funds and on itself for the optimizing of the resources provided by these funds can peacefully serve within the organization. The computer department can respond to these two principles of funding and resource control by recognizing that its funds depend on the satisfaction of the users and that the optimizing of the use of these funds can be aided by organizing around the life cycles of both the services provided and the resources used.

One possible organization designed to fulfill these two goals is composed of a head office and three sections. The head office maintains continuing control over the client relationship and over the life cycle of both services and resources. Each of the three sections specializes on a certain phase of the life cycle: definition, development, and operation. Such an organizational approach for the computer department should provide:
• Computer services which are responsive to, and justified by, the needs of the users,
• A controlled and uniform evolution of the life cycle of both services and resources,
• A computer department management oriented towards dealing with technical considerations on a business basis,
• Technical personnel who are client-oriented specialists and who are constantly challenged and matured by dealing with different problems from different frames of reference,
• An in-house computer department which is self-supporting, self-evaluating, and justified solely by its indirect contributions to the total productivity of the corporate efforts.

A computer center accounting system

by F. T. GRAMPP

Bell Telephone Laboratories, Incorporated
Holmdel, New Jersey

INTRODUCTION

This paper describes a computer center accounting system presently in use at the Holmdel Laboratory and elsewhere within Bell Telephone Laboratories. It is not (as is IBM's SMF, for example) a tool which measures computer usage and produces "original" data from which cost-per-run and other such information can be derived. It is, rather, a collector of such data: it takes as input original run statistics, storage and service measurements from a variety of sources, converts these to charges, and reports these charges by the organizations (departments) and projects (cases) which incur them.

"DESIGN CRITERIA," below, outlines the overall functions of the system and describes the design criteria that must be imposed in order to assure that these functions can be easily and reliably performed. The remainder of this paper is devoted to a somewhat detailed description of the data base (as seen by a user of the system) and to the actual implementation of the data base. Of particular interest is a rather unusual means of protecting the accounting data in the event of machine malfunction or grossly erroneous updates. Finally, we describe backup procedures to be followed should such protection prove to be inadequate. A description of the system interface is given in the Appendix for reference by those who would implement a similar system.

DESIGN CRITERIA

Many factors were considered in designing the system described here. The following were of major importance:

Cost reporting

Reporting costs is the primary function of any accounting system. Here, we were interested in accurate and timely reporting of charges by case (the term "case" is the accounting term we use for "project" or "account"), so that costs of computer usage to a project would be known, and by department, to ascertain the absolute and relative magnitude of computer expenses in each organization. These orders of reporting are not necessarily identical, or even similar. For example, the cost of developing a particular family of integrated circuits might be charged against a single case, and computer charges for this development might be shared by departments specializing in computer technology, optics, solid state physics, and the like. Similarly, a single department may contribute charges against several or many cases; a good example of this is a drafting department. Original charging information is associated with a job number, an arbitrary number assigned to a programmer or group of programmers, and associated with the department for which he works, and the project he is working on. This job number is charged to one case and one department at any given point in time; however, the case and/or department to which it is charged may occasionally change, as is shown later.

Simplicity of modification

One thing that can be said of any accounting system is that once operational, it will be subjected to constant changes until the day it finally falls into disuse. This system is no exception. It is subjected to changes in input and output data types and formats, and to changes in the relationships among various parts of its data base. Response to such changes must be quick and simple.

Expansion capability

One of the more obvious unknowns in planning a system of this type is the size to which its data base may eventually grow. On a short term basis, this presents no problem: one simply allocates somewhat more storage than is currently needed, and reallocates periodically as excess space begins to dwindle. Two aspects of such a procedure must, however, be borne in mind: First, the reallocation process must not be disruptive to the day-to-day operation of the system. Second, there must be no reasonably foreseeable upper limit beyond which reallocation eventually cannot take place.

Protection

Loss of, say, hundreds of thousands of dollars worth of accounting information would at the very least be most embarrassing. Thus steps must be taken in the design of the system to guarantee insofar as is possible the protection of the data base. Causes of destruction can be expected to range from deliberate malfeasance (with which, happily, we need not be overly concerned), to program errors, hardware crashes, partial updating, or operational errors such as running the same day's data twice. If such dangers cannot be prevented, then facilities which recover from their effects must be available.

Continued maintenance

The most important design criterion, from the designer's point of view, is that the system be put together in such a way that its continued maintenance be simple and straightforward. The penalty for failure to observe this aspect is severe: the designer becomes the system's perpetual caretaker. On the other hand, such foresight is not altogether selfish when one considers the problems of a computer center whose sole accounting specialist has just been incapacitated.

THE DATA BASE: LEVEL 1

There are two ways in which to examine the data base associated with the accounting system. In the first case, there is its external appearance: the way it looks to the person who puts information into it or extracts information from it. Here, we are concerned with a collection of data structures, the way in which associations among the structures are represented, and the routines by means of which they are accessed. In the second, we look at its internal appearance: here, we are interested in implementation details, in particular those which make the system easily expansible and maintainable, and less vulnerable to disaster. These two aspects of the data base are, in fact, quite independent; moreover, to look at both simultaneously would be confusing. For this reason, we shall consider the first here, and defer discussion of the second to a later part of this paper. We first examine the structures themselves.
Tally records Accounting system data is kept on disk in structures called tally records. Since we are concerned with data pertaining to cases, departments and job numbers, we have specified a corresponding set of tally records: Case Tally Records, Department Tally Records and Job Tally Records, respectively. These will be abbreviated as CTRs, DTRs and JTRs. In each tally record is kept the information appropriate to the particular category being represented. Such data fall naturally into three classes: fiscal information-money spent from the beginning of the year until the beginning of the present (fiscal) month; linkage data-pointers to associated records; other data---;anything not falling into the other two categories. For example, a CTR contains fiscal and linkage information: charges (a) up to and (b) for the current fiscal period, and a pointer to a chain of JTRs representing job numbers charged to the CTR's case. A DTR's content is analogous to that of a CTR; the exception is the inclusion of some "other" data. When we. report charges by case, the entire report is simply sent to the comptroller. Department reports, however, are sent to the heads of individual departments. To do so, we require the names of the department heads, and their company mailing addresses; hence the "other" data. A JTR contains considerably more information: in addition to the usual fiscal and linkage information, a JTR contains pointers to associated case and department, data identifying the responsible programmer, and a detailed breakdown of how charges for the current month are being accumulated. There is no way of determining a priori those things which will be charged for in order to recover computer center costs. In the olden days (say, 10 years ago) this was no problem: one simply paid for the amount of time he sat at the computer console. With today's computers, however, things just aren't that simple, since the computer center is called upon to provide all sorts of computing power, peripherals and services, and in turn, must recover the costs of said services from those who use them. Thus one might expect to find charges for CPU time, core usage, I/O, tape and disk storage rental, mounting of private volumes, telephone connect time, and so on. Add to this the fact that the charging A Computer Center Accounting System algorithm changes from time to time, and it quickly becomes apparent that the number and kinds of charging categories simply defy advance specification. Further, it seems clear that a given resource need not always be charged at the same rate-that in fact the rate charged for a resource should be a function of the way in which the resource is being used. For example, consider a program which reads a few thousand records from a tape and prints them. If such a program were to be run in a batch environment, in which printed output is first spooled to a disk and later sent to a high speed printer, one would expect the tape drive to be in use for only a matter of seconds. If the same program were to be run in a time-shared environment, in which each record read was immediately shipped to a teletype console for printing, the drive might be in use for several hours. If the computer center's charging algorithm is designed to amortize the rental costs of equipment among the users of the equipment, the latter use of "tape" ought to be considerably more expensive than the former, even though the same amount of "work" was done in each case. 
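To make the point about context-dependent rates concrete before turning to the mechanism the authors actually adopted, the short Python fragment below sketches a charge computed from a rate keyed by both a service and a resource. It is an illustration only: "ASP" is the batch service named in the paper, while "TSS", the resource names, and all dollar figures are assumptions invented for this sketch, not the system's actual categories or rates.

# Hypothetical rates, in dollars per unit, keyed by (service, resource).
# The same resource carries a different rate under different services, so a
# terminal-paced tape job costs more than a brief, spooled batch use of the drive.
RATES = {
    ("ASP", "TAPE"): 0.02,      # batch service: dollars per drive-second (assumed)
    ("TSS", "TAPE"): 0.15,      # time-shared service: dollars per drive-second (assumed)
    ("ASP", "CPU"): 0.50,
    ("TSS", "CPU"): 0.50,
}

def charge(service, resource, quantity):
    """Dollar charge for using 'quantity' units of a resource under a given service."""
    return RATES[(service, resource)] * quantity

# The same tape-reading program, charged under two different services:
print(charge("ASP", "TAPE", 30))        # drive held for about 30 seconds
print(charge("TSS", "TAPE", 3 * 3600))  # drive held for about 3 hours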
For these reasons, we chose to make the process table-driven. In this way, new charging categories can be added, old ones deleted, and rates changed simply by editing the file on which a rate table resides. Such a scheme has the obvious drawback of requiring a table search for each transaction with the system, but the inefficiencies here are more than compensated by the ability to make sweeping changes in the charging information without having to reprogram the system. Our rate table is encoded in such a way that it may be thought of as a two dimensional matrix. One dimension of the matrix consists of the services offerred by the computer center: batch processing (in our case, an ASP system), time shared services, data handling (a catch-all category which includes such things as tape copying, disk pack initialization and the like) storage rental, and sundry others. The other dimension consists of the usual computer resources: CPU time, core, disk and tape usage, telephone connect time, etc. When a user incurs a charge, it is recorded in his JTR as a triple called a "chit." The chit consists of a service name, such as "ASP," a resource name, such as "CPU," and the dollar amount which he has been charged. In this implementation, each chit occupies twelve bytes: BYTE: 0 COST RES SERV 4 8 107 These chits are placed in an area at. the end of the JTR. Initially, the area is empty. As time progresses and charges are accumulated, the number of chits in the JTR grows each time the job number is charged for a service-resource combination that it hasn't used before. The JTR itself is of variable length, and open-ended "to the right" to accommodate any number of chits that might be placed there. Linkages There are, in general, two ways in which one accesses information in the data base. Either one knows about a job number, and applies a charge against it and its associated case and department, or one knows about a case or department number and desires to look at the associated job numbers. This implies that there must be enough linkage information available for the following: (a) Given a job number, find the case and department to which that number is charged. (b) Given a case or department number, find all of the job numbers associated with that case or department. The first case is trivial: one simply spells out, in a JTR, the case and department to which the job number is charged. The second case is somewhat more interesting in that there may be one, or a few, or even very many job numbers associated with a single case or department. At Holmdel, we have the worst of all possible situations in this regard, in that the large majority of our cases and departments have very few job numbers associated with them, whereas a certain few have on the order of a hundred job numbers. Viewed in this light, schemes such as keeping an array of pointers in a CTR or DTR are, to say the least, unattractive because of storage management considerations. What we have chosen to do, in keeping with our philosophy of open-endedness, is to treat the case-job and department-job structures as chains, and using the CTRs and DTRs as chain heads, operate on the chains using conventional list processing techniques. In our implementation, a case-job chain (more properly, the beginning of it) appears in a CTR as a character field containing a job number charged to that case. 
In the JTR associated with that job number, the chain is continued in a field which either contains another job number charged to the same case, or a string of zeros, which is used to indicate the end of a chain. Fields in the DTR and JTR function analogously to represent department job chains. Traversing such a chain (as one frequently does while producing reports) is quite simple: begin at the beginning and get successive JTRs until you run out of pointers; then stop. Inserting a new job number into a case- or department-job chain is also straightforward: copy the chain head into the chain field in the new JTR; then point the CTR or DTR to the new JTR. Deletion of JTRs from the system is accomplished by means of similar "pointer copying" techniques.

Indices

As was previously mentioned, the job numbers that are assigned to users are arbitrary. They happen, in point of fact, to be sequential for the most part, but this is simply a matter of clerical convenience. The only convention followed by case and department numbers is that (as of this writing) they are strictly numeric. This implies the necessity of a symbol table to associate names (case, department and job numbers) with their corresponding tally records on disk. Three types of symbol table organization were considered for use with this system: sequential, in which a search is performed by examining consecutive entries; binary, in which an ordered table is searched by successively halving the table size; and hash, in which a randomizing transformation is applied to the key. Of these, the sequential search is simply too slow to be tolerated. While the hashing method has a speed advantage over the binary method, the binary method has a very strong advantage for our application, namely, that the table is ordered. One of the functions of the accounting system is that of producing reports, which are invariably ordered by case, department or job number. The ordering of the indices facilitates the work of many people who use the system.

In this implementation, there are three indices: one for cases, one for departments, and one for job numbers. These will be abbreviated CDX, DDX and JDX, respectively. Each index consists of some header information followed by pairs of names and pointers to associated tally records. Header information consists of five items:

RL: Record Length for the tally record. This is needed by the PL/I and OS/360 input-output routines.
TN: The number of entries currently in the index.
TMAX: The maximum number of entries which will fit in the core storage currently allocated the index.
RO, MRT: Not relevant at this time; these will be discussed later.

Entries are of the form (P1,P2,NAME), where P1 and P2 are 31-bit binary numbers pointing to records in a direct access data set, and NAME is a character string of appropriate length containing a case, job or department number.

Accessing techniques

Two types of access to the data base are required. The first is the programmer's access to the various structures and fields at the time he writes his program. The second is the program's access to the same information at the time the program is run. The choice of PL/I as the language in which to write the system was, oddly enough, an easy one, since of all of the commonly available and commonly used languages for System/360, only PL/I and the assembler have a macro facility. Using assembly language would make much of the code less easily maintainable, and thus PL/I won by default. The macro facility is used solely to describe various data base components to the routines that make up the accounting system by selectively including those components in routines which use them. Further, all references to these components are made via the macros. Adoption of this strategy has two somewhat related advantages: First, it forces consistent naming of data items. Without the macros, one programmer would call a variable "X", another would call it "END-OF-MONTH-TOTAL", and so on. This, at least, would happen, and worse can be imagined. Second, should there be a change in a structure, all of the programs that use the structure must be recompiled. If the macros are used, the change can be made in exactly one place (the compile-time library) before recompilation.

Run-time access to the data base is achieved by following simple conventions and by using routines that have been supplied specifically for this purpose. These conventions are simple because they are few and symmetric. The data base consists of six structures: the three indices, and the three types of tally records. None of these structures are internal to a program that interfaces with the data base. All of them are BASED, that is, located by PL/I POINTER VARIABLES which have been declared to be EXTERNAL so that they will be known to all routines in the system. Thus, for example, a program that accesses the JDX would contain the following declarations:

DCL 1 JDX BASED(PJDX),        /* The JDX is defined, and  */
% INCLUDE JDX;                /* its detailed description */
DCL PJDX POINTER EXTERNAL;    /* called from the library. */

The same convention applies to all of the other structures: they are allocated dynamically and based on external pointers whose names are the structure names prefixed by "P". A more detailed description of the user interface is given in the Appendix.

The foregoing implies that there is a certain amount of initialization work to be done by the system: setting pointers, filling indices and the like. This is, in fact, the case. Initialization is accomplished by calling a routine named INIT, usually at the start of the PL/I MAIN program. Among its other functions, INIT:

(a) Opens the accounting files. These include the six files containing the indices and tally records. Also opened are the file which contains the rate table, and a file used for JTR overflow.
(b) Allocates space for the indices and tally records, then sets pointers to the allocated areas.
(c) Reads into core the indices and the rate table, then closes these files. Some unblocking is required here both because the designers of PL/I (and indeed, of OS/360) have decreed that records shall not exceed 32,756 bytes in length, and because short records make the data base accessible to time shared programs running on our CPS system.

Once INIT returns control, the operating environment for the accounting system has been established. Indices are in core, and can be accessed by conventional programming techniques or by using the SEARCH, ENTER and DELETE routines provided. Reading and writing of tally records is also done by system routines, these being:

RDCTR RDDTR RDJTR
WRCTR WRDTR WRJTR

The read-write routines all require two arguments: a character string containing the name of the tally record to be read or written, and a logical variable which is set to signal success or failure to the caller. Actual data transfer takes place between a direct access data set and a based "TR" area in core.
A typical example of the use of these routines is: CALL RDJTR(MYJOB,OK); IF --, OK THEN STOP; Two higher level routines, FORMJTR and LOSEJTR, are available for purposes of expanding or contracting the data base. FORMJTR examines the contents of JTR. If the JTR seems reasonable, that is, if it contains a case and department number, and its chain pointers are explicitly empty (set to zero) it performs the following functions: (a) Checks to see if an appropriate CTR and DTR exist. If not, it creates them. (b) Writes the JTR. (c) Includes the JTR in the linkage chains extending from the CTR and DTR. LOSEJTR performs exactly the inverse function, including deleting CTRs and DTRs whose chains have become empty as a result of the transaction. INTERFACING WITH THE SYSTEM Activities involving the system fall into four general categories: creating the data base, modifying the existing data base, inputting charges, and producing reports. Creating the data base No utility is provided in the system for the express purpose of creating the data base, because the form and format of previously extant accounting information varies widely from one Bell Laboratories installation to the next. A program has to be written for this purpose at each installation; however, the system routines provided are such that the writing of this program is a straightforward job requiring a few hours' work, at most. Briefly, creation of the data base proceeds as follows: (a) Estimates are made of data set space requirements. These estimates are based on the number of cases, departments and job numbers to be handled, and on the direct access storage device capacities as described in IBM's Data Management! publication. Data sets of the proper size are allocated, and perhaps catalogued, using normal OS/360 procedures. (b) An accounting system utility named RESET is 110 Fall Joint Computer Conference, 1972 then run against the files. RESET initializes the indices so that they can communicate their core storage requirements to the system. No entries are made in the indices. (c) The aforementioned installation-written routine is run. This routine consists of a two step loop: read in the information pertinent to a job number and construct a JTR; then call FORMJTR. (d) At this point, the data base is installed. A program called UNLOAD is run so that a copy of the data base in its pristine form is available for backup purposes. Modifying the data base Two types of data base modifications are possible: those which involve linkage information, and those which do not. The latter case is most easily handledan EDITOR is provided which accepts change requests in a format similar to PL/l's data directed input, then reads, modifies and rewrites a designated tally record. The former case is not so simple, however, and is broken down into several specific activities, each of which is handled by an accounting system utility supplied specifically for that purpose. Authorizing new job numbers and closing old ones is done by a program called AUTHOR. This program adds a new entry to the data base by calling FORMJTR, and closes a job number by setting a "closed" bit to "I" in its JTR. Note that closed job numbers are not immediately deleted from the system. Deleting closed job numbers is done once per year with an end-of-year program designed for that purpose. At this time, DTRs and CTRs which have no attached JTRs are also deleted from the system. 
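Because FORMJTR and the installation-written creation program are described only in prose, a compact model may help. The Python sketch below is an illustration under stated assumptions (dictionaries stand in for tally records on disk, and a chain value of "0" marks the end of a chain); it is not the system's PL/I code, but it mimics the described behavior of FORMJTR: verify that the JTR names a case and a department and has empty chain fields, create any missing CTR or DTR, write the JTR, and splice it into both chains.

# In-memory stand-ins for the case, department and job tally record files.
ctrs, dtrs, jtrs = {}, {}, {}

def formjtr(jtr):
    """Install a new job tally record and link it into its case and department chains."""
    case, dept = jtr.get("case"), jtr.get("dept")
    if not case or not dept:
        return False                              # a JTR must name a case and a department
    if jtr.get("case_chain", "0") != "0" or jtr.get("dept_chain", "0") != "0":
        return False                              # chain pointers must be explicitly empty
    ctr = ctrs.setdefault(case, {"old": 0.0, "new": 0.0, "job_chain": "0"})
    dtr = dtrs.setdefault(dept, {"old": 0.0, "new": 0.0, "job_chain": "0"})
    jtrs[jtr["job"]] = jtr                        # write the JTR
    jtr["case_chain"], ctr["job_chain"] = ctr["job_chain"], jtr["job"]   # splice into case chain
    jtr["dept_chain"], dtr["job_chain"] = dtr["job_chain"], jtr["job"]   # splice into department chain
    return True

# The creation program is then the two-step loop the paper describes:
# build a JTR from existing accounting information, then call FORMJTR.
for rec in [{"job": "1234", "case": "100-01", "dept": "4321"}]:   # hypothetical input record
    jtr = {"job": rec["job"], "case": rec["case"], "dept": rec["dept"],
           "old": 0.0, "new": 0.0, "chits": [],
           "case_chain": "0", "dept_chain": "0"}
    if not formjtr(jtr):
        raise SystemExit("could not install job number " + rec["job"])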
Changing the case or department number to which a job number is charged may be done in either of two ways. It is best to illustrate these by example. In the first case, consider a department which has been renamed as a result of an internal reorganization. Its department number has been changed, say from 1234 to 5678, yet its work and personnel remain the same. In this case, it is desirable to delete "1234" from the DDX, install "5678", and change all "1234" references in the department-job chain to "5678". As a second example, consider the case of a job number which was used by department 2345 but is now to be used by department 6789 due to a change in departmental work assignments.· On the surface, this seems to be a matter of taking the job number out of 2345's chain and inserting it into 6789's. Unfortunately, it isn't that simple. The charge fields in a chain, if added, should be equal to the field in the DTR at the chain head. Simply moving a JTR from one chain to another will make the old chain's fields sum low, and the new chain's fields sum high. The obvious solution to this problem is to forbid the changing of charged departments-i.e., to require that in the event that such a change is desired, the old job number be closed, and a new one authorized. Such a solution is not a very popular one, since job numbers have a habit of becoming imbedded in all sorts of hardto-reach places--catalogued procedures, data set names and the like. Furthermore, it has been our experience that programmers develop a certain fondness for particular job numbers over a period of time and are somewhat reluctant to change them. Our solution, then, is as follows: Given a job number, say 1234, whose charged department is to be reassigned, open a new job number, say 1234X, whose name was not previously known to the system, and which is charged to the proper department. Then close the old job number, and proceed to exchange n~mes in the JTRs, and linkage pointers in the respective chains. A utility called SWAP is available which permits renaming or reassignment of either departments or cases (or both). Inputting charges As might be expected from our previous discussion of charging categories, there are many inputs to the accounting system. Moreover, the input formats are quite diverse, and subject to constant change. In order that the people charged with maintaining the accounting system might also be able to maintain their own sanity, it was necessary to design a simple way of incorporating new sources of charging information into the system. Our first thought was to design a "general purpose input processor" i.e., a program that would read a data description and then proceed to process the data following (in this case, charge records). This approach was quickly abandoned for two reasons. First, the data description language required to process our existing forms of charge records would be quite complicated and thus difficult to learn and use, if in fact it could be implemented at all. Second, for each class of input charges, there is a certain amount of validity checking that can be performed at the time the charge records are read. Such checking need not be limited to a single record-for example, if it is known that a certain type of input consists of sequentially numbered cards, then a check can be made to determine whether some cards have been left out. A Computer Center Accounting System Our approach was as follows. 
For each type of charge record used by an installation, an input program must be written. This input program reads a charge record, does whatever checking is possible, constructs a standard structure consisting of a job number, service name, and one or more resource-quantity pairs, and passes this structure to aprogram called CHARGE. CHARGE does the rest. It brings in the appropriate JTR, converts the quantities in the resource-quantity pairs to dollar charges via factors contained in the rate table, charges the JTR, adding chits if necessary, and charges the associated CTR and DTR. The important point here is that the writer of an input program is allowed complete freedom with respect to formats and checking procedures, while he is also allowed almost complete naivete with respect to the rest of the system. Reporting The system includes programs to produce three "standard" reports: one, (by cases) to be sent to the comptroller, one (by departments) to be sent to department heads, and a third (by job number) to be sent to the person responsible for each active job number in the system. The comptroller's report is required of the computer center, and its format was specified in detail by the comptroller. The other two reports were designed to give the users of the computer center a clear and easily readable report of their computer usage in as concise a form as possible. The department report shows old and recent charges to the department, followed by a list of job numbers being charged to that department. Accompanying each job number are its charges, the case to which it is charged, and the name of the person responsible for it. A more detailed breakdown is certainly possible; the average department head, however, usually doesn't want to see a breakdown unless something looks unusual. In that case, the programmer responsible for the unusual charges is probably his best source of information. The user's report shows old and new charges for a job number, together with a detailed breakdown of the new charges by service-resource pairs. Its use to the programmer is threefold: it satisfies his curiousity-it enables him, in some cases, to detect and correct uneconomical practices-and it enables him to supply more detailed information to his department head should the need arise. In order to produce the user's report, all of the chits in all of the JTRs in the system must be scanned. Dur- 111 ing the scanning process, it is a trivial matter to maintain a set of "grand totals" showing the money recovered by the computer center in terms of all serviceresource categories. This valuable "by-product" is published after the user reports have been generated. More specialized reporting is possible, but these programs, by their nature, are· best written by particular installations rather than distributed as a part of the accounting system package. As was mentioned earlier, the ordering of the indices greatly facilitates the writing of such programs. THE DATA BASE: LEVEL II The foregoing discussion of the data base was aimed at the user of the system, and thus said nothing about its structure in terms of physical resources required, and the way in which these resources are used. We now expand on that discussion, concentrating on those factors influencing expansibility and protection. The main features of interest here are the implementation of tally record storage, the indices, and the provision to handle variable length JTRs. Free storage pools CTRs, DTRs and JTRs are stored on direct access data sets. 
When it is desired to access a tally record, a search of the appropriate index is made, and a relative record number on which the tally record is written is obtained from the index and used as the KEY in a PL/I read or write statement. The interesting feature of the system is that there is no permanent association between a particular tally record and a particular relative record number. Direct access records used to contain tally records are stored in linked pools. The RO entry in the appropriate index head points to the first available link, that link points to the second, and so on. One can think of the initial condition of a pool (no space used) as follows: RO contains the number 1, record # 1 contains the number 2, etc. When a link is needed for tally record storage, a routine called GETLINK detaches the record pointed to by RO from the free pool by copying that record's pointer into RO. The record thus detached is no longer available, and its number can be included into an index entry. A second routine called PUTLINK performs exactly the inverse function. These activities are well hidden from the users of the system. The obvious advantage of the casual user not seeing the list processing operations is that he won't 112 Fall Joint Computer Conference, 1972 be confused by them. The disadvantage is that when he runs out of space and allocates a larger data set, he will forget to initialize the records by having the nth unused record point to the n + 1st, as above. On the assumption (probably valid) that the data base will continue to grow over long periods of time, we have simplified the initialization procedure as follows: (a) Let RO point to the first available record (initially 1) and MRT point to the maximum record ever taken (initially 0). (b) When GETLINK calls for a record, compare RO and MRT. If RO>MRT then the data base is expanding into an area it never used before. In this case, set MRT=MRT+1. Otherwise, the record specified by RO has been used in the past and has since been returned by PUTLINK, in which case we proceed as before. With this procedure, initialization of the pools is done at the time that the data base is first created. Subsequent reallocation of data sets for purposes of enlarging the storage area is done as per standard OS / 360 practice. Indices and protection Recall that although an index entry can be thought of as consisting of a pair (P ,NAME) where P is a record number, and NAME is some character string, the entries are in fact represented as triples (Pl,P2,NAME). At the time that an index is read into core by INIT, all of the PIs contain record numbers, while all of the P2s contain O. Reading and writing of the tally records is done as follows. For reading: (a) If the P2 entry is non-zero, read record # P2. (b) Otherwise, read record # Pl. And for writing: (a) If the P2 entry is zero, call GETLINK for a free record number, and copy it into P2. (b) Write record # P2. At the conclusion of a run in which the data base is to be updated, the main program, which had caused the operating environment to be established by calling INIT, now calls a routine named FINI, which in turn: (a) Exchanges the P2 and PI index entries in all cases where P2 is non-zero. (b) Returns surplus links to the pool via PUTLINK. (c) Rewrites the indices in more than one place. (d) Closes all of the files. Such a strategy offers both protection and convenience. Clearly, the danger of partial updating of the files during a charging run is minimized. 
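The free-pool routines and the two-pointer index entries just described amount to a simple shadow-record update discipline, which the following Python sketch illustrates under stated assumptions: a list stands in for the direct access data set, and getlink/putlink stand in for GETLINK and PUTLINK. It is a model of the idea, not the PL/I implementation.

records = [None] * 100            # stands in for the direct access data set
free_pool = list(range(2, 100))   # record 1 is already occupied in the example below

def getlink():
    return free_pool.pop(0)       # detach the first available record number

def putlink(n):
    free_pool.append(n)           # return a record number to the pool

def read_tally(entry):
    p1, p2, _name = entry
    return records[p2 if p2 != 0 else p1]     # prefer the freshly written copy, if any

def write_tally(entry, data):
    p1, p2, name = entry
    if p2 == 0:
        p2 = getlink()                          # first rewrite this run: take a new record
    records[p2] = data
    return (p1, p2, name)

def fini(index):
    """Commit a run: where P2 was written, make it the new P1 and free the old record."""
    committed = []
    for p1, p2, name in index:
        if p2 != 0:
            putlink(p1)                         # surplus link goes back to the pool
            p1, p2 = p2, 0
        committed.append((p1, p2, name))
    return committed

# A run that crashes before fini() leaves every P1 record untouched, so the
# operator can simply run the same day's input again, unchanged.
index = [(1, 0, "1234")]
records[1] = {"job": "1234", "new": 0.0}
index = fini([write_tally(index[0], {"job": "1234", "new": 12.50})])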
Indeed, our standard operating instructions for those who run the system state that a job which crashes prior to completion is to be run a second time, unchanged. Further, a program that doesn't call FINI will not update the accounting files. Included in this category, besides "read only" reporting programs, are debugging runs, and input programs which contain algorithms to test the validity of incoming data, and which may not modify the files.

Variable length records

The JTRs, because of the fact that they can contain an unpredictable number of chits, are variable in length. Overflow records are obtained via GETLINK to extend the JTR as far as required. As read into storage by RDJTR, the overflow links are invisible to the user. Besides the obvious convenience, the overflow handling in the JTR offers a different, if not devious, type of protection. In the case of a system such as this, where the number of charging categories is, for practical purposes, unlimited, there is always the temptation to make the charging breakdown finer, and finer, and finer. Succumbing to this temptation gives rise to nasty consequences. Processing time and storage space increase, but the reports from the system become more voluminous, hence less readable, and in a sense contain less information because of the imprecision inherent in so many of the "computer usage measurement" techniques. (In this latter case, we often tend to behave analogously to the freshman physics student who measures the edges of a cube with a meter stick and then reports its volume to the nearest cubic millimicron.) By happy coincidence, it turns out that in a system with "normal" charging categories, most JTRs have relatively few chits, too few to cause overflow, while occasional JTRs require one or more overflow records. Should the breakdown become fine enough that most of the JTRs cause overflow, the cost of running the accounting system rises, not gradually, but almost as a step. Further, if the breakdown is subsequently made coarser, the excess chits, and hence the overflow records, quietly disappear at the end of the next accounting period. Thus the system is, in a sense, forgiving, and tends to protect the user from himself.

BACKUP

As Mr. Peachum2 aptly remarked, it has never been very noticeable that what ought to happen is what happens. In addition to our efforts to make the system crash-proof, we have also provided several levels of backup procedures.

Backup indices

As noted previously, FINI rewrites the indices, but in more than one place. Since the "extra" copies are written first, these can be copied to the "real" index files in the event that a crash occurs while the latter are being rewritten by FINI.

Unload-reload copies

Two utilities, UNLOAD and RELOAD, are supplied with the system. UNLOAD copies the structured files onto tape in such a way that the structure is preserved if RELOAD copies them back. It is our present practice to take UNLOAD snapshots daily, cycling the tapes once per week, and, with a different set of tapes, monthly, cycling the tapes once per year. Since chits are deleted at the end of each month (for economy of storage), UNLOAD-style dumps are also useful if it becomes necessary to backtrack for any reason to a point in time prior to the beginning of the current month. Further, the tapes are in such a format that they are easily transmitted via data link to another installation for purposes of inspection or off-site processing.

OS/360 dump-restore

It is the practice, in our computer center, to periodically dump all of our permanently mounted direct access storage devices using the OS/360 Dump-Restore utility. Since the accounting files are permanently mounted, this procedure provides an additional level of safety.

Reformatting

The worst possible mishap is one in which the chains in the system, for one cause or another, are destroyed to the extent that one or more of them "points to the wrong place". Although this condition is most unusual, it is also most insidious, since there is a possibility that errors of this type can remain hidden for, perhaps, as long as a few weeks. If enough input data has been added to the data base to make it undesirable to backtrack to the point prior to that at which the initial error is suspected to have occurred, symbolic information sufficient to regenerate the pointers is contained in the data base, and routines have been provided to copy the data base, sans structure, onto a sequential file, and then to rebuild it, using FORMJTR.

ACKNOWLEDGMENT

The design and implementation of the accounting system described here was completed with the help and cooperation of many people, and for this the author is truly grateful. In particular, the efforts, advice, insight and inspiration provided throughout the project by Messrs. R. E. Drummond and R. L. Wexelblat assured its successful completion.

REFERENCES

1 IBM System/360 Operating System: Data Management Services, Order Form GC26-3746
2 B BRECHT, Die Dreigroschenoper

APPENDIX

The user-system interface

The facilities provided to give the user convenient access to the data base and the routines which manipulate it can be divided into two categories: compile-time facilities and run-time facilities.

Compile-time facilities

These consist of PL/I macro definitions describing various structures. Since the storage class of a structure (e.g., BASED, STATIC, etc.) may be different in different routines, or, where there are multiple copies of a structure, even within the same routine, the initial "DCL 1 name class," must be provided by the user. Compile-time structures include the indices (CDX, DDX, JDX), the tally records (CTR, DTR, JTR) and the rate table.

Example 1:

DCL 1 CDX BASED(PCDX),
% INCLUDE CDX;

produces:

DCL 1 CDX BASED(PCDX),
  2 RL FIXED(15) BINARY,          /* Record Length     */
  2 TN FIXED(15) BINARY,          /* # of Entries      */
  2 TMAX FIXED(31) BINARY,        /* Max. Entries      */
  2 RO FIXED(31) BINARY,          /* Pool Head         */
  2 MRT FIXED(31) BINARY,         /* Max. Record Taken */
  2 VAREA(0:N REFER(CDX.TN)),     /* Index Proper      */
    3 P1 FIXED(31) BINARY,        /* Read Ptr.         */
    3 P2 FIXED(31) BINARY,        /* Write Ptr.        */
    3 NAME CHAR(9);               /* Case Number       */

Example 2:

DCL 1 CTR BASED(PCTR),
% INCLUDE CTR;

produces:

DCL 1 CTR BASED(PCTR),
  2 CLNK FIXED(31) BINARY,        /* Used by GETLINK.         */
  2 CCAS CHAR(9),                 /* Case charged.            */
  2 CUNUSED CHAR(3),              /* For future use.          */
  2 COLD FIXED(31) BINARY,        /* $ to last fiscal.        */
  2 CNEW FIXED(31) BINARY,        /* Latest charges.          */
  2 CCUM FIXED(31) BINARY,        /* Cumulative total.        */
  2 CJCH CHAR(8);                 /* Job chain for this case. */

Example 3:

Since it is expected that the user will always use the system-supplied rate table (as opposed to a private copy of same):

% INCLUDE RATES;

produces:

DCL 1 RATES BASED(PRTS),          /* Rate Data          */
  2 #SERVICES FIXED BIN(31),
  2 TOT_RES FIXED BIN(31),        /* Tot. # Resources   */
  2 SERVICE(12),                  /* Classes of Service */
    3 NAME CHAR(4),
    3 CODE CHAR(1),               /* Comptroller's Code */
    3 #RESOURCES FIXED BIN(31),
    3 OFFSET FIXED BIN(31),       /* Into Res. Table    */
  2 RES_TABLE(120),               /* Resources          */
    3 NAME CHAR(20),
    3 ABBR CHAR(4),
    3 UNIT CHAR(8),
    3 RATE FLOAT DEC(14);         /* Per-unit           */

Run-time facilities

Routines are provided to establish and terminate the system's run-time environment, maintain the indices, fetch and replace tally records, expand and contract the data base, and handle allocation of disk storage. These are shown in Table I, below.

TABLE I-User Interface Routines

INIT, FINI
  Function: Initialization and termination.
  Arguments required: None.
  Example: CALL INIT; CALL FINI;

SEARCH, ENTER, DELETE
  Function: Index maintenance.
  Arguments required: Index name, key name, return pointer, success indicator.
  Example: CALL SEARCH(DDX, '1234', RP, OK); IF ¬OK THEN STOP;

RDCTR, RDDTR, RDJTR, WRCTR, WRDTR, WRJTR
  Function: Read and write routines for tally records.
  Arguments required: Name (i.e., case, department or job number), success indicator.
  Example: CALL RDJTR('MYJOB', OK); IF ¬OK THEN DO; PUT LIST(MYJOB||'MISSING'); STOP; END;

FORMJTR, LOSEJTR
  Function: Installation and deletion of job numbers.
  Arguments required: Job number, success indicator.
  Example: CALL FORMJTR(NEWJOB, OK); IF ¬OK THEN STOP;

GETLINK, PUTLINK
  Function: Allocate and return disk space.
  Arguments required: Data set name, pointer to 1st avail. record, return pointer.
  Example: CALL GETLINK(FILE, RP, POOLHD);

An approach to job pricing in a multi-programming environment

by CHARLES B. KREITZBERG and JESSE H. WEBB

Educational Testing Service
Princeton, New Jersey

INTRODUCTION

Computers are amazingly fast, amazingly accurate, and amazingly expensive. This last attribute, expense, is one which must be considered by those who would utilize the speed and accuracy of computers. In order to equitably distribute the expense of computing among the various users, it is essential that the computer installation management be able to accurately assess the costs of processing a specific job. Knowing job costs is also important for efficiency studies, hardware planning, and workload evaluation as well as for billing purposes.

For a second generation computer installation, job billing was a relatively simple task, since in this environment any job that was in execution in the machine had the total machine assigned to it for the entire period of execution. As a result, the billing algorithm could be based simply upon the elapsed time for the job and the cost of the machine being used. In most cases, the cost for a job was given simply as the product of the run time and the rate per unit time. While this algorithm was a very simple one, it nevertheless was an equitable one and in most cases a reproducible one.

Because of the fact that in a second generation computer only one job could be resident and in execution at one time, the very fast CPUs were often under utilized. As the CPUs were designed to be even faster, the degree of under utilization of them increased dramatically. Consequently, a major goal of third generation operating systems was to optimize the utilization of the CPU by allowing multiple jobs to be resident concurrently, so that when any one job was in a wait state, the CPU could then be allocated to some other job that could make use of it. While multiprogramming enabled a higher utilization of the CPU, it also introduced new problems in job billing.
No longer was the old simple algorithm sufficient to equitably charge for the running of jobs. The two major reasons for this are:

• The sharing of resources by the resident jobs, and
• The variation in elapsed time from run to run of a given job.

Unlike the second generation computer, a given job no longer has all of the resources that are available on the computer allocated to it. In a multi-programming computer, a job will be allocated only those resources that it requests in order to run. Additional resources that are available on the computer can be allocated to other jobs. Therefore, it is evident that the rate per unit time cannot be a constant for all jobs, as it was for second generation computer billing, but must in some sense be dependent upon the extent to which resources are allocated to the jobs.

The second item, and perhaps the most well-known, that influences the design of a billing algorithm for a third generation computer is the variation that is often experienced in the elapsed time from run to run of a given job. The elapsed time for any given job is no longer a function only of that job, but is also a function of the job mix. In other words, the elapsed time for a job will vary depending upon the kinds and numbers of different jobs which are resident with it when it is run. In order to demonstrate the magnitude of variation that can be experienced with subsequent runs of a given job, one job was run five different times in various job mixes. The elapsed time varied from 288 seconds to 1,022 seconds. This is not an unusual case, but represents exactly what can happen to the elapsed time when running jobs in a multi-programming environment. The effect, of course, is exaggerated as the degree of multiprogramming increases. Not only can this variation in run time cause a difference in the cost of a job from one run to another, but it also can cause an inequitability in the cost of different jobs; the variation in run time can effectively cause one job to be more expensive than another even though the amount of work being done is less.

Objectives

We have isolated several important criteria to be met by a multi-programming billing algorithm. Briefly, these criteria are as follows.

• Reproducibility-As our previous discussion has indicated, billing on elapsed time does not provide for reproducibility of charges. Any algorithm that is designed to be used in a multiprogramming environment should have as a characteristic the ability to produce reproducible charges for any given job regardless of when or how it is run, or what jobs are sharing the computer with it.
• Equitability-Any billing algorithm designed for use in a multi-programming environment must produce equitable costs. The cost of a given job must be a function only of the work that the job does, and of the amount of resources that it uses. Jobs which use more resources or do more work must pay more money. The billing algorithm must accommodate this fact.
• Cost Recovery-In many computer operations it is necessary to recover the cost of the operation from the users of the hardware. The billing algorithm developed for a multi-programming environment must enable the recovery of costs to be achieved.
• Auditability-A multi-programming billing algorithm must produce auditable costs. This is particularly true when billing outside users for the use of computer hardware. The charges to the client must be auditable.
• Encourage Efficient Use of the Hardware-Since one goal in the design of the third generation hardware was to optimize the use of that hardware, a billing algorithm that is designed for use in a multijobbing environment should be such that it encourages the efficient use of the hardware.
• Allow for Cost Estimating-The implementation of potential computer applications is often decided upon by making cost estimates of the expense of running the proposed application. Consequently, it is important that the billing algorithm used to charge customers for the use of the hardware also enables potential customers to estimate, beforehand, the expense that they will incur when running their application on the computer hardware.

We distinguish between job cost and job price: job cost is the amount which it costs the installation to process a given job; job price is the amount that a user of the computer facility pays for having his job processed. Ideally, the job price will be based on the job cost, but this may not always be the case. In many organizations, notably universities, the computer charges are absorbed by the institution as overhead; in these installations the job price is effectively zero-the job costs are not. In other organizations, such as service bureaus, the job price may be adjusted to attract clients and may not accurately reflect the job cost. In either case, however, it is important that the installation management know how much it costs to process a specific job.1,2

The development of the job billing algorithm (JBA) discussed in this paper will proceed as follows: first, we will discuss the "traditional" costing formula used in second generation computer systems:

cost = (program run time) X (rate per unit time)

and we shall demonstrate its inadequacy in a multijobbing environment. Second, we shall develop a cost formula in which a job is considered to run on a dedicated computer (which is, in fact, a subset of the multi-programming computer) in a time interval developed from the active time of the program.

DEVELOPMENT OF THE JOB PRICING ALGORITHM

In order to recover the cost of a sharable facility over a group of users, the price, P, of performing some operation requiring usage tk is:

P = C (tk / Σti)     (1)

where:
C is the total cost of the facility
Σti is the total usage experienced
tk is the amount of use required for the operation

Consider the billing technique which was used by many computer installations running a single thread (one program at a time) system. Let $m be the cost per unit time of the computer configuration. Then, if a program began execution at time t1 and terminated execution at time t2, the cost of running the program was computed by:

cost = $m (t2 - t1)     (2)

As the utilization of the computer increased, the cost per unit time decreased. The cost figure produced by (2) is in many ways a very satisfying one. It is simple to compute; it is reproducible, since a program normally requires a fixed time for its execution; it is equitable, since a "large" job will cost more than a "small" job (where size is measured by the amount of time that the computer's resources are held by the job). Unfortunately for the user, however, the cost produced by (2) charges for all the resources of the computer system even if they are unused.
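To make formulas (1) and (2) concrete, the short sketch below evaluates both; the facility cost, usage totals, rate, and times are hypothetical figures chosen only for illustration and are not taken from the paper.

    # Illustrative sketch of formulas (1) and (2); all figures are hypothetical.

    def recovery_price(total_cost, total_usage, usage_k):
        """Formula (1): the share of a facility's cost borne by one operation."""
        return total_cost * (usage_k / total_usage)

    def single_thread_cost(rate_per_second, t1, t2):
        """Formula (2): second-generation charge, rate times elapsed time."""
        return rate_per_second * (t2 - t1)

    # A facility costing $100,000 per month that delivers 2,000,000 seconds of use:
    print(recovery_price(100000.0, 2000000.0, 900.0))   # a 900-second job -> 45.0 dollars
    # A single-thread machine at $0.50 per second, running from t1 = 100 s to t2 = 1000 s:
    print(single_thread_cost(0.50, 100.0, 1000.0))      # -> 450.0 dollars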
This "inflated" charge is a result of the fact that, in a single thread environment, all resources of the computer system are allocated to a program being processed even if that program has no need of them. The effect of this is that the most efficient program in a single thread environment is the program which executes in the least amount of time; that is, programmers attempt to minimize the quantity (t2 - t1). This quantity, called the wall clock time (WCT) of the program, determines the program's cost. Since the rate of the computer is constant, the only way to minimize the cost for a given program is to reduce its WCT; in effect, make it run faster. Hence, many of the techniques which were utilized during the second generation were designed to minimize the time that a program remained resident in the computer.

The purpose of running in a multi-thread environment, one in which more than one program is resident concurrently, is to maximize the utilization of the computer's resources, thus reducing the unit cost. In a multi-thread processing system, the cost formula given by (2) is no longer useful because:

1. It is unreasonable to charge the user for the entire computer since the unused resources are available to other programs.
2. The wall clock time of a program is no longer a constant quantity but becomes a function of the operating environment and job mix.

For these reasons we must abandon (2) as a reasonable costing formula. Many pricing algorithms are in use; however, none is as "nice" as (2). If possible, we should like to retain formula (2) for its simplicity and intuitive appeal.3 This may be done if we can find more consistent definitions to replace m (rate) and WCT (elapsed time).

Computed elapsed time

A computer program is a realization of some process on a particular hardware configuration. That is, a program uses some subset of the available resources and "tailors" them to perform a specific task. The program is loaded into the computer's memory at time t1 and terminates at time t2.

Figure 1-States of a program in a single thread environment

During the period of residency, the program may be in either one of two states: active or blocked. A program is active when it is executing or when it is awaiting the completion of some external event. A program is blocked when it is waiting for some resource which is unavailable. These categories are exhaustive; if a program is not active and is not waiting for something to happen then it is not doing anything at all. The two categories are not, however, mutually exclusive, since a program may be processing but also awaiting the completion of an event (for example, an input/output operation); indeed, it is this condition which we attempt to maximize via channel overlap. Therefore, we define voluntary wait as that interval during which a program has ceased computing and is awaiting the completion of an external event. We define involuntary wait as the interval during which a program is blocked, a condition caused by contention. In general, voluntary wait results from initiation of an input/output operation, and in a single thread system we have:

WCT = Σtc + Σtv     (3)

where each tc is a compute interval and each tv is a voluntary wait interval. Graphically, the situation is represented as in Figure 1. The solid line represents periods of compute and the broken line indicates intervals of input/output activity.
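The state model just described can be made concrete with a small sketch that computes wall clock time from compute, voluntary-wait, and involuntary-wait intervals, anticipating relations (6) and (7) developed in the next paragraphs; the interval values are hypothetical.

    # Illustrative sketch of the program-state model; the intervals are hypothetical.

    compute_intervals = [12.0, 30.0, 18.0]    # tc: seconds of CPU activity
    voluntary_waits = [5.0, 9.0, 6.0]         # tv: waits the program asked for (I/O)
    involuntary_waits = [40.0, 25.0]          # ti: waits caused by contention

    wct_single_thread = sum(compute_intervals) + sum(voluntary_waits)   # equation (3)
    wct_multi_thread = wct_single_thread + sum(involuntary_waits)       # equation (6)
    cet = wct_multi_thread - sum(involuntary_waits)                     # equation (7)

    print(wct_single_thread)   # 80.0 seconds
    print(wct_multi_thread)    # 145.0 seconds
    print(cet)                 # 80.0 seconds: the CET equals the single-thread WCT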
Since Σtc is based on the number of instructions executed, which is constant, and Σtv is based on the speed of input/output, which is also constant (except for a few special cases), WCT is itself constant for a given program and data in a single thread environment.

The ideal case in this type of system is one in which the overlap is so good that the program obtains the i+1st record as soon as it has finished processing the ith record.

Figure 2-A program with maximum overlap

Graphically, this situation is shown in Figure 2, and we can derive the lower bound on WCT as:

WCT ≥ max(Σtc, Σtv)     (4)

and, of course:

WCT → Σtc as Σtv → 0     (5)

In a multi-thread environment, we know that:

WCT = Σtc + Σtv + Σti     (6)

where ti is an interval of involuntary wait. But, from the above discussion, we know that Σtc + Σtv is a constant for a given program; hence, the inconsistency in the WCT must come from ti. This is precisely what our intuition tells us: that the residency time of a job will increase with the workload on the computer. Graphically, a program running in a multi-thread environment might appear as in Figure 3.

Figure 3-States of a program in a multi-thread environment

During the interval that a program is in involuntary wait, it is performing no actions (in fact, some programmers refer to a program in this state as "asleep"). As a consequence, we may "remove" the segments of time that the program is asleep from the graph, for time does not exist to a program in involuntary wait. This permits us to construct a series of time sequences for the various programs resident in the computer, counting clock ticks only when the program is active. When we do this, a graph such as Figure 4 becomes continuous with respect to the program (Figure 5).

Figure 4-States of a program based on real time

Figure 5-States of a program based on active time

Of course, the units on the x-axis in Figure 5 no longer represent real time; they represent, instead, the active time of the program. We shall call the computed time interval computed elapsed time (CET), defined as:

CET = Σtc + Σtv = WCT - Σti     (7)

and as Σti → 0, CET → WCT, so that we have the relationship:

WCT ≥ CET     (8)

The quantity WCT - CET represents the interference from other jobs in the system and may be used as a measure of the degree of multi-programming. Unfortunately, the CET suffers from the same deficiency as the WCT-it is not reproducible. The reason for this is that on a movable head direct access storage device contention exists for the head, and the time for an access varies with the job mix. However, the CET may be estimated from its parameters. Recall that CET = Σtc + Σtv. The quantity Σtc is computed from the number of instructions executed by the program and is an extremely stable parameter. The quantity Σtv is based upon the number and type of accesses and is estimated as:

Σtv = Σ a(i) ni     (9)

where a(i) is a function which estimates the access time to the ith file and ni is the number of accesses to that file.
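Equation (9), together with (7), gives a simple recipe for estimating a reproducible CET from counts that do not depend on the job mix. The sketch below applies it; the per-access times echo the disk and tape coefficients reported in the appendix, while the CPU time and access counts are invented for illustration.

    # Illustrative sketch of equations (7) and (9); CPU time and counts are hypothetical.

    def estimate_cet(cpu_seconds, files):
        """CET ~= compute time plus estimated voluntary wait.

        `files` is a list of (accesses, seconds_per_access) pairs, one per file;
        the second element plays the role of the access-time function a(i)."""
        voluntary_wait = sum(n * a for n, a in files)
        return cpu_seconds + voluntary_wait

    # 120 s of CPU time, 4,000 disk accesses at ~0.047 s each, and 10,000 tape
    # accesses at ~0.025 s each (roughly the per-access figures fitted in the appendix):
    print(estimate_cet(120.0, [(4000, 0.047), (10000, 0.025)]))   # -> 558.0 seconds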
The amount of time which a program waits for an input or output operation depends upon a number of factors. The time required to read a record is based upon the transfer rate of the input/output device, the number of bytes transferred, and the latency time associated with the device (such as disk rotation, tape inter-record gap time, and disk arm movement). For example, a tape access on a device with a transfer rate of RT and a start-stop time of ST would require:

ST + RT b     (10)

seconds to transfer a record of b bytes. Hence, for a file of n records, we have a total input/output time of:

Σ (ST + RT bi) = n ST + RT Σbi     (11)

where Σbi is the total number of bytes transferred. In practice Σbi ≈ nB, where n is the number of records and B is the average blocksize.

The term ST is, nominally, the start-stop time of the device. However, this term is also used to apply a correction to the theoretical record time. The reason is that while the CET will never be greater than the I/O time plus the CPU time, overlap may cause it to be less. This problem is mitigated by the fact that at most computer shops (certainly at ETS) almost all programs are written in high-level computer languages and, as a result, the job mix is homogeneous. A measure of overlap may be obtained by fitting various curves to historical data and choosing the one which provides the best fit. In other words, pick the constants which provide the best estimate of the WCT.

It is important to remember that the CET function produces a time as its result. We are using program parameters such as accesses, CPU cycles, and tape mounts only because they enable us to predict the CET with a high degree of accuracy.

The original billing formula (2) which we wished to adapt to a multi-thread environment utilized a time multiplied by a dollar rate per unit time. The CET estimating function has provided us with a pseudo run time; we must now develop an appropriate dollar rate function.

In order to develop a charging rate function we consider the set of resources assigned to a program. In a multi-programming environment, the computer's resources are assigned to various programs at a given time. The resources are partitioned into subset computers, each assigned to a program. The configuration of the subset computers is dynamic; therefore, the cost of a job is:

cost = Σ CETi ri     (12)

where i indexes the allocation intervals; an allocation interval is the interval between changes in the resources held by the job. CETi is the CET accumulated during the ith interval, and ri is the rate charged for the subset computer with the configuration held by the program during interval i. The allocation interval for OS/360 is a job step.

The rate function

Some of the attributes which the charging rate function should have are:

• the rate for a subset computer should reflect the "size" of the subset computer allocated; a "large" computer should cost more than a "small" computer.
• the rate for a subset computer should include a correction factor based upon the probability that a job will be available to utilize the remaining resources.
• the sum of the charges over a given period must equal the monies to be recovered for that period.

With these points in mind, we may create a rate function. The elements of the resource pool may be classified as sharable resources and nonsharable resources. Tape drives, core memory, and unit record equipment are examples of nonsharable resources; disk units are an example of a sharable resource.
While these categories are not always exact, they are useful, since we assume that allocation of a nonsharable resource is more significant than allocation of a sharable resource. At Educational Testing Service, it was determined that the most used nonsharable resources are core storage and tape drives. Therefore, it was logical to partition the computer into subset computers based upon the program's requirement for core and tapes. Tapes are allocated in increments of one; core is allocated in 2K blocks. Hence, there are (# tapes * available core/2,000) possible partitions. For any given design partition, we would like to develop a rate which is proportional to the load which the allocation places upon the resource pool.

A single job may sometimes disable the entire computer. If, for example, a single program is using all of the available core storage, then the unused devices are not available to any other program and should be charged for. On the other hand, if a single job is using all available tapes, other jobs may still be processed and the charge should be proportionately less.

The design proportion is the mechanism by which the total machine is effectively partitioned into submachines based upon the resources allocated to the submachines. A design proportion can then be assigned to any job based upon the resources it requires. The design proportion should have at least the following properties.

• The design proportion should range between the limits 0 and 1.
• The design proportion should reflect the proportion of the total resources that are allocated to the job.
• The design proportion should reflect, in some fashion, the proportion of the total resources that are not allocated to the job, but which the job prevents other jobs from using.

The design proportion proposed for the billing algorithm is based upon the probability that when the job is resident, some other job can still be run. The definition of this parameter is as stated below: the design proportion of a job is equal to the probability that when the job is resident, another job will be encountered such that there are insufficient resources remaining to run it.

Since OS/360 allocates core in 2K blocks, the number of ways that programs can be written to occupy available core is equal to:

N = C/2     (13)

where:
N = Number of ways that programs can be written
C = Core available in kilobytes

In addition, if there are T tapes available on the hardware configuration, then there are T plus 1 different ways that programs can be written to utilize tapes. Therefore, the total number of ways that programs can be written to utilize core and tapes is given by the following equation:

N = (C/2)(T + 1)     (14)

where:
N = Total number of ways that programs can be written
C = Core available in kilobytes
T = Number of tape drives available

The design proportion for a given job can be alternately defined as 1 minus "the probability that another job can be written to fit in the remaining resources." This is shown as follows:

Dp = 1.0 - [(CA - Cu)/2](TA - Tu + 1) / [(CA/2)(TA + 1)]     (15)

where:
Dp = Design proportion for the job
CA = Core available in kilobytes
Cu = Core used by the job
TA = Tape drives available on the computer
Tu = Tape drives used by the job

It is important to note that the sum of the design proportions of all jobs resident at one time can be greater than 1.0. For example, consider the following two jobs resident in a 10K, four tape machine.
Job #1: 6K, 1 tape; Dp = 17/25
Job #2: 4K, 3 tapes; Dp = 19/25

The sum of their design proportions is 36/25. This seems odd at first, since the design proportion of a 10K, four tape job is 1.0. However, this can be shown to be a necessary and desirable property of the design proportion. To show that this is the case, it is necessary to consider the amount of work done and the total cost of the work for two or more jobs that use the total machine compared to the cost of the same amount of work done by a single job that uses the total machine. This analysis will not be covered here.

The design proportion function as defined herein is a theoretical function. It is based solely upon the theoretical possibility of finding jobs to occupy available resources. Clearly, the theoretical probability and the actual probability may be somewhat different. Consequently, a design proportion could be designed based upon the actual probabilities experienced in a particular installation. Such a probability function would change as the nature of the program library changed. The design proportion function described above would change only as the configuration of the hardware changed. Either technique is acceptable, and the design proportion has the desired properties. That is, the design proportion increases as the resources used by the various jobs increase. However, it also reflects the resources that are denied to other jobs because of some one job's residency. Consider the fact that when all of core is used by a job, the tape drives are denied to other jobs. The design proportion in this case is 1.0, reflecting the fact that the job in effect has tied up all available resources even though they are not all used by the job itself.

While the design proportion function is simple, it has many desirable properties:

• It is continuous with respect to OS/360 allocation; all allocation partitions are available.
• It always moves in the right direction; that is, increasing the core requirement or tape requirement of the program results in an increased proportion.
• It results in a proportion which may be multiplied by the rate for the total configuration to produce a dollar cost for the subset computer.
• It is simple to compute.

If it were determined that the required recovery could be obtained if the rate for the computer were set at $35 per CET minute, the price of a step is determined by the equation:

Pstep = ($35) Dp(core, tapes) (CET/60)     (16)

and the price of a job (with n steps) is:

Pjob = Σ Pstep     (17)

We have come full circle and returned to our "second generation" billing formula:

cost = rate · time

The key points in the development were:

• A multi-tasking computer system may be considered to be a collection of parallel processors by altering the time reference.
• The variation in time of a program run in a multi-programmed environment is due to involuntary wait time.
• The computed elapsed time may be multiplied by a rate assigned to the subset computer and an equitable and reproducible cost developed.
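As a numerical check on the development above, the sketch below evaluates equations (15), (16), and (17) for the 10K, four-tape machine of the example; the $35-per-CET-minute rate is the one quoted in the text, while the step allocations and step CETs used at the end are invented for illustration.

    # Illustrative sketch of equations (15), (16), and (17).

    def design_proportion(core_avail_k, tapes_avail, core_used_k, tapes_used):
        """Equation (15): 1 minus the proportion of core/tape combinations into
        which another program could still be written to fit."""
        remaining = ((core_avail_k - core_used_k) / 2) * (tapes_avail - tapes_used + 1)
        total = (core_avail_k / 2) * (tapes_avail + 1)
        return 1.0 - remaining / total

    def step_price(dp, cet_seconds, rate_per_cet_minute=35.0):
        """Equation (16): price of one job step."""
        return rate_per_cet_minute * dp * (cet_seconds / 60.0)

    # The two jobs of the example, resident in a 10K, four-tape machine:
    print(design_proportion(10, 4, 6, 1))   # -> 0.68  (17/25)
    print(design_proportion(10, 4, 4, 3))   # -> 0.76  (19/25)

    # Equation (17): a job's price is the sum of its step prices.  The step
    # allocations and CETs below are hypothetical.
    steps = [(10, 4, 6, 1, 300.0), (10, 4, 4, 3, 558.0)]   # (CA, TA, Cu, Tu, CET seconds)
    p_job = sum(step_price(design_proportion(ca, ta, cu, tu), cet)
                for ca, ta, cu, tu, cet in steps)
    print(round(p_job, 2))                  # -> about 366.38 dollars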
IMPLEMENTATION OF THE JOB PRICING ALGORITHM

The Job Pricing Algorithm (JPA) is implemented under OS/360 Release 19.6. No changes to the operating system were required; a relatively minor modification was made to HASP in order to write accounting records to disk for inclusion in the accounting system. The basis of the JPA is the IBM machine accounting facility known as Systems Management Facility (SMF).4 Billing under the JPA involves four steps:

1. Collect the job activity records at execution time. The records are produced by SMF and HASP and are written to a disk data set, SYS1.MANX.
2. Daily, the SYS1.MANX records are consolidated into job description records and converted to a fixed format.
3. The output from step (2) is used as input to a daily billing program which computes a cost for the jobs and prepares a detailed report of the day's activity by account number.
4. Monthly, the input to the daily program is consolidated and used as input to a monthly billing program which interfaces with the ETS accounting system.

The raw SMF data which is produced as a result of job execution contains much valuable information about system performance and computer workload which is of interest to computer center management.

One useful characteristic of the JPA is that costs are predictable. This enables a programmer or systems analyst to determine, in advance, the costs of running a particular job and, more importantly, to design his program in the most economical manner possible. In order to facilitate this process, a terminal oriented, interactive cost estimating program has been developed. This program is written in BASIC and enables the programmer to input various parameters of his program (such as file size, CPU requirements, blocking factors, memory requirements), and the cost estimating program produces the cost of the program being developed. Parameters may then be selectively altered and the effects determined.

CONCLUSION

The approach to user billing described in this paper has proved useful to management as well as users. Many improvements are possible, especially in the area of more accurate CET estimation. Hopefully, designers of operating systems will, in the future, include sophisticated statistics gathering routines as part of their product, thus providing reliable, accurate data for accounting.

APPENDIX

A method of deriving CET parameters

Let the wall clock time (W) be estimated as:

W' = C + AT XT + AD XD + AM XM     (1)

where:
XT = # of tape accesses
XD = # of disk accesses
XM = # of tape mounts
C = CPU time
AT, AD, AM = coefficients to be determined

We wish to determine the coefficients AT, AD, and AM that will maximize the correlation between W', the computed elapsed time, and W, the actual elapsed time. Define the error e as:

e = (W - W')     (2)

The correlation coefficient, r, can be written as:

r² = 1 - σe²/σw²     (4)

Then, in order to maximize r², it is sufficient to minimize σe², since σw² is a constant over a given sample. Since

(6)

we have:

σe² = Σe² - n⁻¹(Σe)²     (7)

Finally, we have:

σe² = Σ[(Wi - Ci) - AT XTi - AD XDi - AM XMi]² - n⁻¹[Σ((Wi - Ci) - AT XTi - AD XDi - AM XMi)]²     (8)

(9)

(10)

(11)

(12)

Since all the partials must vanish, we have:

(13)

(14)

Solving the simultaneous equations (13), (14), and (15) for AT, AD, and AM should give values for the parameters that will maximize the correlation between the computed elapsed time and the actual elapsed time.

The technique was applied to a sample month of data which was composed of 19,401 job steps. The coefficients determined were:

AT = 0.0251 seconds
AD = 0.0474 seconds
AM = 81.2 seconds

When these coefficients were used in Equation (1) to determine the computed elapsed time, the correlation coefficient between the computed time and actual time over the 19,401 steps was 0.825. When other coefficients were used, i.e., AT = 0.015, AD = 0.10, and AM = 60.0, the correlation was only 0.71.
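The partial-derivative equations of the appendix are not reproduced here, but the same fit can be carried out numerically: minimizing the variance of e = W - W' is equivalent to a least-squares fit on mean-centered variables. The sketch below does this with numpy on synthetic job-step data (the real 19,401-step sample is, of course, not reproduced), so the coefficients and correlation it prints are illustrative only.

    # Illustrative sketch of the appendix's fitting procedure on synthetic data.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    X = np.column_stack([
        rng.integers(0, 20000, n),   # XT: tape accesses
        rng.integers(0, 10000, n),   # XD: disk accesses
        rng.integers(0, 6, n),       # XM: tape mounts
    ]).astype(float)
    cpu = rng.uniform(10, 600, n)                    # C: CPU seconds
    true_A = np.array([0.0251, 0.0474, 81.2])        # coefficients reported in the appendix
    wct = cpu + X @ true_A + rng.normal(0, 30, n)    # W: simulated actual elapsed time

    # Minimize the variance of e = W - W', with W' = C + AT*XT + AD*XD + AM*XM:
    # equivalent to least squares on mean-centered variables.
    y = wct - cpu
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    A, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    print(A)                                  # recovers roughly [0.025, 0.047, 81]

    w_computed = cpu + X @ A
    print(np.corrcoef(w_computed, wct)[0, 1]) # correlation of computed vs. actual time
                                              # (close to 1 here; the synthetic noise is small)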
Note: Card read, card punch, and print time constants were not computed in this fashion simply because there is insufficient data on job steps that use these devices as dedicated devices. However, as data become available in the future, the method could be applied to obtain good access times.

REFERENCES

1 L L SELWYN
Computer resource accounting in a time sharing environment
Proceedings of the Fall Joint Computer Conference 1970
2 C R SYMONS
A cost accounting formula for multi-programming computers
The Computer Journal Vol 14 No 1 1971
3 J T HOOTMAN
The pricing dilemma
Datamation Vol 15 No 8 1969
4 IBM Corp
IBM System/360 operating system: System management facilities
Manual GC28-6712 1971

Facilities management-A marriage of porcupines

by DAVID C. JUNG
Quantum Science Corporation
Palo Alto, California

FM-DEFINED

FM definition often elusive

There are almost as many definitions for Facilities Management (FM) as there are people trying to define it. Because FM can offer different levels of service, some variations in its definition are legitimate. FM was initiated by the Federal Government in the 1950's when the Atomic Energy Commission, the National Aeronautics and Space Administration (NASA), and the Department of Defense offered several EDP companies the opportunity to manage and operate some of their EDP installations. Previously, these companies had developed strong relationships with the various agencies through systems development and software contracts.

FM definition expanding

Nurtured by the Federal Government, FM has emerged as a legitimate computer service in the commercial EDP environment. Since FM has been offered in the commercial market, its definition has expanded to include additional services. In fact, customers are now beginning to expect FM vendors to have expertise that extends far beyond the day-to-day management of the data processing department.

Electronic Data Systems (EDS), formed in the early 1960's, pioneered the FM concept in the commercial market. Shortly after its founding, EDS recognized the massive EDP changes required in the hospital and medical insurance industry as a result of Medicare and other coverages changed by the Social Security Administration. Accordingly, EDS secured several State Blue Cross/Blue Shield organizations as customers.
While operating these installations, EDS developed standard software packages that met the recordkeeping requirements of the Social Security Administration. Moreover, this software succeeded in improving operator control and reducing operating costs. Consequently, EDS marketed these software packages to other Blue Cross/Blue Shield organizations. Outside of the medical insurance field, EDS has successfully pursued FM opportunities in life insurance, banking, and brokerage.

The success of EDS, both in revenue/profit growth and in the stock market, did not go unnoticed by others in the computer services industry. As a result, in the late 1960's and early 1970's many software firms and data service bureaus diversified into FM-many, unfortunately, with no real capabilities. Since FM has proven itself as a viable business in the commercial market, over 50 independent FM firms have been formed. Moreover, at least 50 U.S. corporations with large, widespread computer facilities have spun off profit centers or separate corporations from their EDP operations. In many cases, these spinoffs offer customers FM as one of their computer services.

An ideal concept of FM

The ideal role for the FM vendor is to assist in all the tasks related to business information and the EDP operations in the firm. The Facilities Manager could assume full responsibility for the EDP operations, from acquiring the equipment and staffing the installation to distributing the information to the firm's operating areas. FM also has a vital role in defining business information requirements for top management. More specifically, the FM vendor should be able to define what information is required to operate the business, based on his industry experience. He should also be able to help establish cost parameters, based on an analysis of what other firms in the industry spend for EDP. Moreover, FM vendors will assist top management to cost optimize the array of business processing methods, which may include manual or semi-automated approaches as well as EDP. The FM vendor must be skilled at working with personnel in the customer's operation centers to improve ways in which the information is used and to effectively develop new methods for handling information as a business grows. (See Figure 1.)

Figure 1-Business information and EDP in the ideal firm (top management defines business information requirements: define information required to operate, perform systems analysis, specify outputs/inputs, set timing, set cost parameters, select information processing methods; EDP operations: acquire equipment and personnel, schedule and budget resources, manage daily operations, perform audits/secure operations, distribute information; non-EDP processing methods; company operations: use information, develop new/changed information requirements; with periodic review and feedback)

FM-Today it's EDP takeover

The real world of FM is quite different from the ideal version just described, and there will be a period of long and difficult transition to reach that level. Actual takeover of an existing EDP installation is now the prime determinant of whether an FM relationship exists. When the FM vendor takes over the EDP installation, it also takes over such EDP department tasks as (1) possession, maintenance, and operation of all EDP equipment and the payment of all rental fees or acquisition of equipment ownership, (2) hiring and training all EDP personnel, and (3) development of applications, performance of systems analysis, acquisition of new equipment and implementation of new applications.

Takeover may be partial

Many FM vendors are increasingly offering cafeteria-style services so that the customer can retain control over EDP activities that he can perform proficiently. In some cases where equipment is owned, the customer may retain title to the equipment. Salaries of EDP personnel may continue to be paid by the customer, but responsibility for management is assigned to the vendor. Also included as partial FM are takeovers of less than the client's total EDP activities. Only a single division or a major application may be taken over by an FM vendor. Merely taking one of many applications on a computer and performing this function on a service bureau or time-sharing basis, however, is not included as an FM contract.

HOW EDP USERS BENEFIT

FM benefits: A study in contrast

Some FM users benefit
EDP activity of Southwestern Life, a $5 billion life insurance company in Dallas and a customer of EDS, typifies the satisfied FM user. Southwestern Life's vice president, A. E. Wood, has stated, "We are very pleased with our agreement and the further we get into it, the more sure we are we did the right thing. We won't save an appreciable amount of money on operations, but the efficiency of operation will be improved in great measure. To do the same job internally would have taken us two to three times as long and we still would not have benefited from the continual upgrading we expect to see with EDS."

... and some do not

Disgruntled users exist too, but they are more difficult to find and in many cases are legally restricted from discussing their experiences. One manufacturing company told us, "We cannot talk to you; however, let me say that our experience was unfortunate, very unfortunate. They (the FM vendor) did not understand our business, did not understand the urgency of turnaround time on orders. We lost control of our orders and finished goods inventory for six weeks. As a result we lost many customers whom we are still trying to woo back after more than a year." Two medium-sized banks had similar comments that indirectly revealed much about FM benefits. "We're not in any great difficulty. In fact, the EDP operations now are running well, but every time we want to make a change it costs us. I wish I had my own EDP manager back to give orders to."

FM benefits are far ranging

Large users benefit least from FM

There is no question in our mind that there are many potential benefits for FM users. However, installation size is the primary yardstick for measuring benefits
Because they are obscure and often hidden by EDP departments, it is difficult for managements in small and medium installations to detect and correct these problems. On the other hand, an FM vendor can often quickly identify these problems and offer corrective remedies because his personnel are trained to uncover these inefficiencies and his profits depend on their correction. S:maller investments to upgradeEDP FM can also benefit end users by reducing proposed future increases in EDP costs. Small- and medium-sized users that have a single computer must eventually face the problem of increasing their equipment capacity to meet requirements of revenue growth and expanded 125 360/40 360/50 360/65 HAVE OS NOW 10% 46% 69% HAVE NO OS NOW, BUT PLAN TO INSTALL IN 1971-72 48 38 8 HAVE NO OS NOW AND NO PLANS 42 16 23 100% 100% 100% - TOTAL TABLE A-User OS Plans 1971-72 applications. This often means a significant increment in rental and other support costs. A 360/30 user, for example, who is spending $13,000 a year on equipment may have to jump to a 360/40 or a 370/135, costing $18,000-$22,000 per year to achieve the required increase in computing power. Support costs will also increase, in many cases more quickly. If a useris acquiring a 360/40, for example, he probably will have to use an Operating System (OS) to achieve efficient machine performance. Many users today will upgrade their software as shown in Table A. An OS installation requires a higher level of programming talent than is currently required to run a DOS 360 system. Because the user does not need the full time services of these system programmers, FM offers an economical solution whereby system programmers are shared among multiple users. Elimination of EDP personnel problems One of the most serious problems users encounter in managing EDP operations is personnel management. The computer has acquired an aura of mysticism that has tended to insulate the EDP department from the normal corporate rules and procedures. Many programmers often expect to receive special treatment, maintain different dress and appearance and obtain higher pay. High turnover among EDP personnel, often two to three times the norm for other company operations, further aggravates EDP personnel problems. Through subcontracting, FM vendors can separate EDP personnel from the corporation and thus alleviate this situation for management. Eased conversion to current generation software Over one-third of all users are locked into using third generation computers in the emulation mode, 126 Fall Joint Computer Conference, 1972 LARGE COMPANIES MEDIUM COMPANIES SMAll COMPANIES ANNUAL EDP EXPENOITURES PERCENT OF EDP COSTS SPENT ON PLANNING, ETC. >'1.5 MILLION 1-5% $3OOK-1.5 MILLION 0-2% <$300K 0-1% TABLE B-User Expenditures on EDP Planning where second generation language programs are run on third generation computers. Although software conversion is a difficult and expensive task for users, the FM vendor who has an industry-oriented approach usually has a standard package already available that the customer can use. In several installations, FM vendors have simplified conversion, thus providing their users with the economies of third generation computers. Improved selection of new equipment and services Users of all sizes continually need to evaluate new equipment and new service offerings, including the evaluation of whether to buy outside or do in-house development. 
Again, the large user holds an advantage because his size permits him to invest in a technical staff dedicated to evaluations. Installations spending more than $1.5 million annually for EDP usually have one full time person or more appointed to these functions. In smaller installations there is no dedicated staff and pro-tem evaluation committees are formed when required. Table B shows the relationship between the size of EDP expenditures and the share of those expenditures allocated to planning, auditing and technical evaluations. In this area of EDP planning, FM can benefit users in two ways. First, FM vendors can and do take over this responsibility and, second, the effective cost to any single user is less because it is spread over multiple users. Other operating benefits One potential benefit from FM relates to new application development. Typically, 60 percent or more of a firm's EDP expenditures are tied to administrative applications, such as payroll, general accounting, and accounts payable. Because of the relatively high saturation in the administrative area, firms are now extending the use of EDP into operational areas such as productioncontrol and distribution management. However, many of these firms lack the qualified EDP professionals and line managers necessary to develop and implement applications in non-administrative areas. Thus, they have become receptive to considering alternatives, including FM. Major EDP cost savings Earlier in this chapter, the stabilization of EDP costs was discussed. Now we will focus on the major savings that FM can provide through the actual reduction of EDP costs. This potential FM benefit is too often the major theme of an FM vendor's sales pitch. Consequently, its emotional appeal often clouds a rational evaluation that should precede an FM contract. If the FM contract is well written and does not restrict either party, the FM vendor can apply his economies of scale and capabilities for improving EDP operations and should be able to show a direct cost savings for the customer. However, these "savings" may be needed to offset costs of software conversion or other contingencies and thus, may not really be available to the customer in the early contract years. Long range benefits-Better information Improved operation control and profits through better information-this is the major long-range benefit from FM. While this contribution is not unique to FM vendors, few EDP users today have been able to develop a close relationship between company operations and EDP. Companies such as Weyerhauser and American Airlines-generally recognized as leading edge users-~re few in number, and many try to emulate their achievements in integrating EDP into the company operations. EDP expenditures, however, are seldom judged on their contribution toward solving basic company problems and increasing revenues and profits. Many apparently well-run EDP departments would find it difficult to justify their existence in these terms. The situation is changing, however. An indication of this new attitude is the increased status of the top EDP executive in large firms. The top EDP executive is now a corporate officer in over 300 of the Fortune 500 firms. While titles often mean little, the change to Vice President or Director of Business Information from Director of EDP Operations suggests that top management in many companies has considered and faced the problem. 
Facilities Management-A Marriage of Porcupines 127 In 'addition to new management titles, continuing penetration of EDP functions into operating areas is increasingly evident. FM-A permanent answer for users TOTAL $645 MI LLiON FM REVENUES FM should not he treated as an interim first-aid treatment for EDP. There are several good reasons for continuing the FM relationship indefinitely. • Individual users cannot duplicate the economies of scale that FM vendors can achieve. Standard softwar epackages, for example, require constant updating and support and new equipment evaluations are constantly required if lowest cost EDP is to be maintained. • Top management would have to become involved in EDP management if operations were brought back in-house. This involvement would take time from selling and other revenue producing activities. A rational top ,management trys to minimize the share of its time spent on cost-management activities. • By disengaging from the FM contract, the customer risks losing control over his EDP again while receiving no obvious compensation for this ri~k. Even if the customer believes he is being overcharged by the FM vendor, there ~s no real guarantee the excess profits can he converted to savings to the customer. For these reasons an F.M relationship should normally be considered permanent rather than temporary. MARKET STRUCTURE AND FORECAST Current F M market Total FM Illarket size and recent growth The 1971 market for FM services totals $645 million with 337 contracts. However, 45 percent or $291 million was captive and not available to independents. Captive FM contracts are defined as being solidly in the possession of the vendor because of other than competitive considerations. Typically, captive contracts are negotiated between a large firm and its EDP spinoff subsidiary. The remaining market is available to all competitors and totals $354 or 55 percent. Available does not necessarily mean the contract is available for competition immediately, since most contracts are signed for a term of two to five years. Captive and available 1971 FM revenues and contracts are shown in Figure 2. TOTAL 337 FM CONTRACTS Figure 2-1971 FM market Industry analysis Discrete and Process Manufacturing are the largest industrial sectors using 'FM services and account for over 44 percent of total FM revenues. However, most EDP spinoffs have occurred in manufacturing and much of these FM revenues are therefore captive and not available to independent competitors. After deleting the captive portion, the two manufacturing sectors account for only 12 percent of the available 1971 FM market of $354 million. Manufacturing has failed to develop into a major available FM market primarily because there is a general absence of common business and accounting procedures from company to company, thus, providing no basis for leveraging standard software. This is true even within manufacturing subsectors producing very common products. In the medical insurance sector, however, Federal Medicare regulations enforce a common method for reporting claims and related insurance data, thus providing a good basis for leveraging standard software. The Medical Sector accounts for 25 percent of available FM revenues. 
The Medical Sector includes medical health insurance companies (Blue Cross/Blue Shield) 128 Fall Joint Computer Conference, 1972 Type of perfor:mance SERVICE BUREAU TOTAL FM MARKET $645 MILLION FM vendors who initially take over on-site operation of a customer's computer strive for economies of scale. This has created a trend whereby the FM vendor has eliminated the need for the customer's computer by processing data through NIS (timesharing) or service bureaus. NIS now accounts for 5 percent of total FM revenues. Service bureau processing which requires the physical transport of data from the client's location to the vendor's computer installation accounts for 2 percent of total FM revenues. In Figure 3, which depicts FM market by type of performance, combination refers to the use of two or more of the above services to carry out the FM contract. Types of vendors Types of vendors that perform FM contracts are described below: AVAILABLE FM MARKET $354 MILLION Figure 3-1971 FM market by type of performance and hospitals. This sector was the first major commercial FM market. FM continues to be attractive in this sector because it permits rapid upgrading of EDP to meet the new Medicare reporting procedures and relieves the problem of low EDP salary scales. The largest industry sector in the available FM market, the Federal Government, accounts for over 34 percent of available revenues. All Federal Government contracts are awarded on the basis of competitive bids. Most Federal Government FM contracts still tend to be purely subcontracting of EDP operations rather than total business information management which is becoming more common in commercial markets. The Finance Sector currently accounts for 22 percent of available FM revenues. Banks and insurance companies are the major markets within the Finance Sector which also includes brokerage firms, finance· companies and credit unions. • Independents who accounted for 67 percent of total FM revenues in 1971 were startups in the computer service industry or vendors who have graduated from the ranks of spinoffs. • Spinoffs are potentially strongest in their "home" industries ; however , competitive pressures may limit market penetration here. An oil company spinoff, for example, would have a difficult time selling its seismic services to another oil company because of the high value placed on oil exploration and related information. • Computer manufacturers are increasingly offering FM services. Honeywell has several FM contracts and will be joined by Univac and CDC who have announced intentions to marketFM services. The RCA Services Division should find FM opportunities· among RCA· customers. IBM has several ways in which it can enter FM, and will show an expanding profile. Contract Values The average FM contract in 1971 is valued at slightly less than $2 million. This is the equivalent of a user with two or three computers, one at least a 360/50. However, this is based on the total market analysis which distorts the averages for captive and available FM markets. An analysis of available and captive contracts shows that the average value of an available contract drops to $1.24 million, which would be equivalent to a user with Facilities Management-A Marriage of Porcupines 129 TABLE C-Major Vendors AVAILABLE TOTAL FM Revenues· Rank Company 1 2 Electronic Data f3ystems Corp. McDonnell Douglas Automation Co. Boeing Computer Services Inc. University Computing Company Computer Sciences Corp. Grumman Data Systems Computing and Software, Inc. 
System Development Corp. Martin Marietta Data Systems Westinghouse Tele-Computer Systems Corp. MISCO National Sharedata Corp. A. O. Smith Corp.'s Data Systems Div. Executive Computer Systems, Inc. Unionamerica Computer Corp. Cambridge Computer Corp. Greyhound Computer Corp. Tracor Computing Corp. Mentor Corp. Programming Methods, Inc. (PMI) 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 95.7 89.4 82.2 42.2 26.6 25.6 14.7 14.0 11.5 11.0 10.0 7.5 5.4 5.2 5.0 4.4 4.3 4.0 4.0 3.8 Company Electronic Data Systems Corp. Computer Sciences Corp. Boeing Computer Services Inc. Computing and Software, Inc. System Development Corp. University Computing Company National Sharedata Corp. McDonnell Douglas Automation Co. Executive Computer Systems, Inc. Cambridge Computer Corp. Greyhound Computer Corp. Tracor Computing Corp. Programming Methods, Inc, (PMI) MISCO Allen Babock Computing Corp. RAAM Information Services Corp. Data Facilities Management Inc. Bradford Computer and Systems, Inc. Computer Usage Co., Inc. Martin Marietta Data Systems FM Revenues· 95.7 26.6 22.2 14.7 12.8 10.1 7.5 7.1 5.2 4.4 4.3 4.0 3.8 3.0 2.9 2.5 2.3 2.0 2.0 1.5 • Annual Rate in 1971 in millions of dollars two 360/40's. Analysis of captive contracts, however, shows that the average value is significantly higher at $5.6 million per year. Most of these contracts are spinoffs from ·large industrial firms who have centralized computer installations or multiple installations spread throughout the country. A more revealing analysis of contract values is shown in Table D. Here total and available contracts are distributed according to contract value. From this analysis, it is clear that well over one-third of contracts are valued at $300,000 or less per year. A typical computer installation of this value would include a CONTRACT VALUE PER YEAR 0.1-0.3 0.31-0.5 0.51-0.8 > 0.8 TOTAL ALL CONTRACTS % # 1 AVAILABLE CONTRACTS % # 129 40 39 129 38.3 11.9 11.6 38.2 122 30 34 99 42.8 10.5 11.9 34.8 337 100.0 285 100.0 AVERAGE CONTRACT VALUE: $1.91 MILLION 285 AVAILABLE CONTRACTS - AVERAGE VALUE: $1.24 MILLION 52 CAPTIVE CONTRACTS - AVERAGE VALUE: $5.61 MILLION TABLE D-FM Contract Analysis by Value I - I 360/30, 360/20, 360/25 or equivalent computers in other manufacturers' lines. There are in total 18 contracts, captive or available, valued at more than $5 million per year. These are all spinoff parent or Federal Government contracts. Projected 1977 FM market FM market potential The ultimate U.S. market potential for FM is the sum of EDP expenditures for all users. By 1977 EDP expenditures for equipment, salaries and services will total $29.5 billion spread among 52.4 thousand users. Since FM benefits are not available to all users, five criteria were developed to help identify the industry sectors which· could most benefit from FM and would be most amenable to accepting FM as an alternative approach for EDP. The five criteria are: • Homogeneous Business Methods. Industries with similar information requirements from company to company are ideal situations for FM. These might be the coding of business records, such as the MICR codes used in banks, or price standards, e.g., tariffs used in motor freight. 
130 Fall Joint Computer Conference, 1972 TABLE E-High Growth Potential FM Markets Selection Criteria Industry Sector Homogeneous Business Methods Similar Products or Services Regulation by Government Agencies Prior Evidence of Subcontracting Services Special EDP Operating Problems X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X Medical-Health Insurance Federal Government Banking-Commercial Insurance State and Local Government Motor Freight Brokerage Utilities-all except telephone Medical-Hospitals, Clinics Regional/Interstate Airlines Mutual Fund Accounting Banking-Savings Small & Medium Aerospace Cos. Education-Elementary and Secondary Education-College Construction and Mining X X X X X X X X X X X X X X X X X X X X company or industry practices which indicate that subcontracting of vital services is an accepted business procedure also help pinpoint industries with high FM potential. Correspondent relationships between smaller banks and larger city banks, historically a part of the banking industry, is an example. • Special EDP Operating Problems. Several industries have special EDP operating problems. These may result from historically low pay scales for EDP personnel which cannot easily be changed, such as in state and local government or a pending major conversion in basic accounting approaches imposed by an outside force, resulting in major EDP conversions as was the case in health insurance when Medicare and state health programs were implemented in the 1960's. • Similar Products or Services. The more similar the products and services sold by companies within a given industry sector, the more likely they will have common business procedures and, therefore, EDP systems. In the brokerage industry, for example, there is little differentiation in the serivce provided. • Regulation by Government Agencies. Industries that are regulated directly by State/Federal agencies or indirectly through strong trade associations also become good candidates for FM because of the enforced standards for pricing, operating procedures, account books, or other factors that impact ED P operations. Health insurance firms, utilities of all types, and brokerage firms are typical of these highly regulated industries. • Prior Evidence of Subcontracting Services. 
Prior TABLE F-Total FM Revenue Growth 1971-1977 Revenues Millions Contracts # Compound Annual Growth Rate of FM Revenues 446 590 389 344 350 236 318 160 23 2856 635 590 255 200 350 185 400 320 60 2995 27 37 18 16 92 12 58 52 69 28 1977 1971 Revenues Millions Contracts 104 88 146 144 7 122 20 13 1 645 89 89 42 34 15 36 13 17 2 337 $ Medical and Other Finance Discrete Manufacturing Process Manufacturing Government-State and Local Government-Federal Utilities and Transportation Wholesale and Retail Trade EDP Service Bureaus and NIS Operators Total # $ Facilities Management-A Marriage of Porcupines 131 TABLE G-Available FM Revenue Growth 1971-1977 Revenues Contracts Revenues Contracts $ Millions # $ Millions # Compound Annual Growth Rate of FM Revenues 15 76 81 36 16 6 32 22 1 285 350 503 280 236 133 196 135 129 6 1968 350 480-510 380-400 185 250-270 235-245 90-95 70-75 12-14 2052-2144 92 37 21 12 49 137 41 30 35 33 1971 Government-State and Local Finance Medical and Other Government-Federal Wholesale and Retail Trade Utilities and Transportation Discrete Manufacturing Process Manufacturing EDP Service Bureaus and NIS Operators Total 7 77 88 122 12 3 17 27 1 354 INDEPENDENTS $1,396M 1977 The above criteria were applied against major industry sectors. As a result, 16 sectors were identified and ranked according to their suitability for FM. (See Table E.) On the basis of this analysis the industry sectors most likely to benefit from FM include banking (mainly commercial), insurance, state and local governments, Federal Government, motor freight, brokerage, and medical (hospitals, and health insurance firms). Of these, the Federal Government and medical sectors are already established FM markets and will grow more slowly as a result. AVAILABLE $1,968 MILLION ProjectedFM revenues, 1971-1977 Actual realized FM revenues will be $2.86 billion in 1977. This is a 28 percent annual growth from $645 billion in 1971. Total contracts will increase to 2,995 in 1977 from 337 in 1971, with an average contract value of slightly less than $1 million. The available portion of the 1977 FM market will total $1.97 billion, up from $354 million in 1971, a growth of over 500 percent. Related contracts will be between 2,000 and 2,200 in 1977, up from 285 in 1971. See Tables F and G. Who are the vendors? TOTAL $2.856 MILLION Figure 4-1971 FM markets by type of vendor Independent vendors will retain the same share of the total FM market in 1977, as in 1971. Computer manufacturers will increase their penetration in the FM business primarily to protect installations that are threatened by competitive equipment. See Figure 4. 132 Fall Joint Computer Conference, 1972 HOW TO EVALUATE FM PROPOSALS Know what benefits are desired For the purposes of reading this segment, assume you are an EDP user considering a proposed FM contract. Assume further that by reading the previous material, you have concluded that, indeed, FM can benefit your company, both in terms of improved EDP operations and in improved information flow to the operating departments. But now you must get specific about the vendor, his proposal, and finally the detailed provisions of the contract he wishes you to sign. In this chapter we will provide the guidelines you can use to make these evaluations. Before digging into the evaluation guidelines, you should first articulate just what you, the management, and the current EDP department are expecting in the way of benefits. 
By doing this, you can compare your expectations as a customer with what the FM vendor is willing and able to provide. Have you had a poor experience with EDP? Is your primary objective to get out of the operating problems of an EDP department? If this is the case, then don't expect immediate improvements in the information you are receiving from EDP and the speed in which it flows to your operating departments-even if you have been told by the FM vendor this is to be the case. On the other hand, if your real goal is speeding order entry and decreasing finished goods inventory by a factor of three without a major investment in new applications software, then these are the points an FM vendor should be addressing in his proposal and you will want to evaluate him on this basis. Assuming you and the vendor have agreed on a set of expectations, let us look at the guidelines you can use in evaluating the vendor, his proposal and the FM contract: • • • • Evaluating vendor and his proposal Vendor Three potential problem areas should be explored to accurately appraise an FM vendor. These are: financial stability, past FM record, and level of industry expertise. • Financial Stability Financial stability of the vendor is a critical issue to pin down, for if he is in difficulty, such. as being short of working capital, your information • flow from EDP could be stopped leaving you in an extremely serious and vulnerable position. Previous FM Performance Record N ext to the financial record, the vendor's previous performance in FM as well as in other data services can be a good guide to his future performance on your contract. If the vendor has done well in past contracts, he no doubt will use his past work as a "showcase" and invite your visit to current sites he has under contract. However, the absence of these referrals should not be taken negatively due to the possible proprietary nature of current FM work. Industry Expertise Full knowledge of your industry and its detailed operating problems should be demonstrated completely by the vendor. This should include full appreciation for the operating parameters most sensitive for profitability in your industry and company. The vendor should be staffed with personnel who have had top management experience in the specific industry and people who have had experience in other specific industries. Vendors become more credible if they can show existing customers who are pleased with the vendor's services and who will testify to his ability to solve specific industry-oriented EDP problems. Proposal Responsiveness The proposal should be addressed specifically to the objectives that you and the vendor agreed were the purpose of considering the FM contract. The vendor should .detail exactly how he will improve your EDP operation or provide faster or improved information to serve your operating areas. He should suggest where savings can be made or what specific actions he can take that are not now being taken to effect these savings. Work Schedule for Information Reports While it is not desirable to pin the vendor down to an operating schedule for theEDP department-for it is exactly this flexibility that allows him to achieve economies of scale-he should, however, be very specific about the schedule for delivery of required reports. If you have a dataentry problem, for example, then the proposal should indicate that the computer will be available when you need to enter data. 
The work schedule should fully reflect as closely as possible the current way in which you do business and any change should be fully justified in terms of how it can improve the operation of the whole company, not just the operation of the EDP department. Equipment Transfer Details of equipment ownership and any trans- Facilities Management-A Marriage of Porcupines • • • • fers to the vendor should be specified. Responsibility for rental or lease payments should also be detailed. Responsibility for maintenance not built into equipment rentals or leases should also be delineated. Cost Schedule Contract pricing is the most critical cost item. A fixed-fee contract is advantageous to both parties if the customer's business volume is expected to continue at current levels or grow. If business drops, a fixed fee could hurt the customer. Thus, the fairest pricing formula is composed of two components: a fixed fee to cover basic operating costs and a variable fee based on revenue, number of orders, or some easily identifiable variable sensitive to business volume. Some contracts also include a cost-of-living escalator. The proposed cost schedule should also take into account equipment payments, wages, salary schedules, travel expenses, overhead to be paid to the customer (if the vendor occupies space in the customer's facilities) and all other expenses that might occur during the course of the contract. If special software programming or documentation is to be performed for the customer, the hourly rates to be charged should be identified in the contract. Vendor Liaison A good proposal recognizes the need for continuing contact between top management and the FM vendor. Close liaison is especially required in the early days of the contract, but also throughout its life. The cost for this liaison person should be borne by the customer, but the responsibilities and the functions that will be expected of him should be clearly stated in the proposal. Personnel Transfer Since all or most of the personnel in ED P operations will be transferred to the FM vendor, you must make sure that this will be an orderly transfer. Several questions arise in almost every contract situation and should be covered in the proposal: Does the proposal anticipate the possible personnel management problems that might come about? If all personnel are not being transferred and some may be terminated, how will this be handled? Are FM vendor personnel policies consistent with yours? IIas the vendor taken into account the possibility that large numbers of persons may not wish to join the vendor and may leave? Failure to Perform While it is most desirable to emphasize the positive aspects of an FM relationship, the negative possibilities should be explored to the satis- 133 faction of both parties. Most of these revolve around failure to perform. If the vendor fails to perform his part of the contract, you should be able to terminate the contract. The proposal should detail how this termination can be carried out. Is the vendor, for example, obligated to permit you to recover your original status and reinstall your in-house computer? What are the penalties the vendor will incur? What is the extent of his liabilities to replace lost revenue, lost profits that you may suffer as a result of his failure to perform? IIow will these lost revenues and lost profits be identified and measured? That's the vendor's side, but you also have obligations as a customer. 
If your input data is not made available according to schedule, for example, what is your possible exposure in terms of late reports? • Software I t is important to pin down ownership of existing software when an FM contract is signed and any subsequent software that is developed. Proprietary as well as non-proprietary software packages should be identified and specified in the report so that competitors may not benefit unfairly if the FM vendor uses the packages with other clients in your industry. Software backup and full documentation procedures should also be identified. This is one area in which FM may be a great help. If your installation is typical, your backup and documentation procedures are weak and an FM vendor, using professional approaches, should be able to improve your disaster recovery potential. FM contract: Marriage of porcupines The FM contract should incorporate all the above issues, plus any others which are uniquely critical, in an organized format for signing. The FM co~tract is as legal a document as any other the company might enter into; therefore, the customer's legal staff should carefully review it in advance of any signing. The body of a typical FM contract shows the general issues which have been discussed above and which apply in most FM contract situations. Attachments are used to detail specific information about the customer that is proprietary in an FM contract Attachments discuss the service and time schedule, equipment ownership and responsibility, cost schedule and any special issues. One. of the most striking features is the general absence of detailed legal jargon. This is typical in most 134 Fall Joint Computer Conference, 1972 FM contracts and is a result of two factors. First, the two parties have attempted to communicate with each other in the language that both understand. Second, the wording reflects an aura of trust between the two parties. In a service subcontracting relationship the customer must implicitly trust the vendor. Without this mutual trust, it would be foolish for a vendor or a customer to even consider a proposal. BIBLIOGRAPHY EDP productivity at 50%'1 Administrative Management June 1971 pp 67-67 Working Paper #177 Graduate School of Business Stanford University P J McGOVERN Interest in facilities management-Whatever it is-Blossoms EDP Industry Report April 30 1971 D M PARNELL JR A new concept: EDP facilities management Administrative Management September 1970 pp 20-24 I POLISKI Facilities management-Cure-all for DP headaches? Business Automation March 1 1971 pp 27-34 A RICHMAN Oklahoma bank opts for FM Bank Systems and Equipment February 1970 pp 18-32 EDP-What top management expects Banking April 1972 pp 18-32 L W SMALL Special report on bank automation Banking April 1971 Facilities management users not sure they're using-If they are Datamation January 1 1971 p 54 When EDP goes back to the experts Business Week October 18 1969 pp 114-116 KUTTNER et al Is facilities management solution to EDP problems? The National Underwriter January 23 1971 H CLUCAS JR The user data processing interface Quantum Science Corporation Reports Dedicated information services July 1970 Facilities management-How much of a gamble? November 1971 Federal information services October 1971 Network information services April 1971 Automated map reading and analysis by computer by R. H. COFER and J. T. TOU University of Florida Gainesville, Florida INTRODUCTION Florida's IBM 360/65 computer utilizing less than lOOK words of direct storage. 
Although the set of possible map symbols is quite large, those used in modern topographic maps form the three classes shown in Figure 2. Point symbology is used to represent those map features characterized by specific spatial point locations. This class of symbology is normally utilized to represent cultural artifacts such as buildings, markers, buoys, etc. Lineal symbology is used to mark those features possessing freedom of curvature. This class is normally utilized to represent divisional boundaries, or routes of transportation. Typical examples of lineal symbology include roads, railways, and terrain contours as well as various boundary lines. Area symbology is used to represent those features possessing homogeneity over some spatial region. It is normally composed of repeated instances of point symbology or uniform shading of the region. Examples include swamps, orchards, and rivers. As its extension to the recognition of area symbology is rather straightforward, MAPPS has been designed to recognize the point and lineal forms of symbology only. Further it has been designed to recognize only that subset of point and lineal symbology which possess topographically fixed line structures. This restriction is of a minor nature since essentiall all map symbology is, or may be easily converted to be, of this form. Even given these restrictions, MAPPS has immediate practical utility since many applications of map reading require only partial recognition of the symbology of a given map. As an example, the survey of cultural artifacts can be largely limited to the recognition of quite restricted forms of point and lineal symbology. Color information provides strong perceptual clues in maps. On standard topographic maps for instance blue represents hydrographic features, brown represents terrain features, and black represents cultural features. Even so, utilization of color clues is not incorporated into MAPPS. This has been done to provide a more stringent test of other more fundamental techniques of A great deal of attention is presently being given to the design of computer programs to recognize and describe two-dimensional pictorial symbology. This symbology may arise from natural sources such as scenery or from more conventionalized sources such as text or mathematical notation. The standardized graphics used in specification of topographic maps also form a conventionalized, two-dimensional class of symbology. This paper will discuss the automated perception of the pictorial symbology to be found within topographic maps. Although conventionalized, this symbology is used in description of natural terrain, and therefore has many of the characteristics of more complex scenery such as is found within aerial photography. Thus it is anticipated that the techniques involved may be applied to a broader class of symbology with equal effectiveness. The overall hardware system is illustrated by Figure 1. A map region is scanned optically and a digitized version of the region is fed into the memory of a computer. The computer perceives in this digitized data the pictorial symbology portrayed and produces a structured output description. This description may then be used as direct input to cartographic information retrieval, editing, land-use or analysis programs. THE PROGRAM Many results of an extensive research into the perception of pictorial symbology have been incorporated into a computer program which recognizes a variety of map symbology under normal conditions of overlap and breakage. 
The program is called MAPPS since it performs Machine Automated Perception of Pictorial Symbology. MAPPS is written in the PL/I programming language, heavily utilizing the language's list and recursive facilities. It is operated on the University of Florida's IBM 360/65 computer, utilizing less than 100K words of direct storage.

Figure 1-The overall hardware system

It is obvious, however, that utilization of color descriptors can be easily incorporated, and will result in increased speed of execution and improved accuracy of recognition.

MAPPS is divided into three systems: picture acquisition, line extraction, and perception. In brief, the picture acquisition system inputs regions of the map into the computer, the line extraction system constructs data entities for each elementary line and node present in the input, and the perception system conducts the recognition of selected symbology. A flow-chart of MAPPS is shown in Figure 3.

Figure 2-Classes of map symbology (point, lineal, and area symbology)

Figure 3-MAPPS flow chart (picture acquisition system, line extraction system, and symbology perception system)

PICTURE ACQUISITION

The picture acquisition system PIDAC is a hardware system developed by the authors to perform precision scanning of 35 mm transparencies within a research environment.1 It consists of a flying-spot scanner, minicomputer, disk memory, storage display, and incremental tape unit. In operation, PIDAC scans a transparency, measures the optical density of the transparency at each point, stores the results in digital form, performs limited preprocessing actions, and generates a digital magnetic record of the acquired data for use by the IBM 360/65 computer.

For each transparency, PIDAC scans a raster of 800 rows by 1024 columns; a square inch in the film plane thus corresponds to approximately 10^6 points. At each raster point PIDAC constructs a 3-bit number corresponding to the optical density at that point. As the original map may be considered to be black and white, a preprocessing routine, operating locally, dynamically reduces the 3-bit code to a 1-bit code in a near optimal fashion. This action is accomplished by a routine called COMPACT since it compacts storage requirements as well. The result is an array whose elements correspond to the digitized points of the map region. This compacted array is then input to the line extraction system.
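A modern illustrative sketch of this kind of locally adaptive reduction is given below; it is not the COMPACT routine itself, and the block size and thresholding rule are assumptions introduced only for illustration.

    # Sketch of a locally adaptive reduction of 3-bit gray levels (0-7) to a
    # 1-bit black/white code, in the spirit of the COMPACT preprocessing step.
    # The 16x16 block size and "denser than the local mean" rule are assumptions.
    import numpy as np

    def compact(gray, block=16):
        """Reduce a 3-bit optical-density raster to one bit per point,
        thresholding each block against its own mean so that uneven
        exposure across the transparency is tolerated."""
        rows, cols = gray.shape
        binary = np.zeros_like(gray, dtype=np.uint8)
        for r in range(0, rows, block):
            for c in range(0, cols, block):
                window = gray[r:r + block, c:c + block]
                # Points denser (darker) than the local mean are marked as ink.
                binary[r:r + block, c:c + block] = (window > window.mean()).astype(np.uint8)
        return binary

    # A PIDAC frame is an 800-row by 1024-column raster of 3-bit densities.
    raster = np.random.randint(0, 8, size=(800, 1024))
    bits = compact(raster)
    print(bits.shape, bits.dtype, int(bits.sum()))

The one-bit result can then be packed eight points to a byte if further storage compaction is desired.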
LINE EXTRACTION

As shown by Figure 3, the compacted array is input to the line extraction system. The function of this system is the extraction of each of the elementary line segments represented in the map, so that the program can conduct perception in terms of line segments and nodes rather than having to deal with the more numerous set of digitized points. The system of line extraction, as developed, does not destroy or significantly distort the basic information contained within the map. This is necessary since significant degradation makes later perception more difficult or impossible.

The action of the line extraction system is illustrated in Figure 4. First the map is cleared of all small holes and specks likely to have resulted from noise. Then a connected medial axis network of points is obtained for each black region of the map. This first approximation to a desired line network is converted to a true line network by an algorithm called 4-point loop removal. Operating on a more global basis, later algorithms remove spurious lines and nodes, locate possible corner nodes, and convert to a more suitable list processing form of data base.

Figure 4-Action of line extraction system (initial clean-up, medial axis determination, 4-point loop removal, final clean-up, and list generation; a sample line entry carries a name, the names and positions of its two end nodes, its length, and its grid-intersect coding, while a sample node entry carries a name, a position, and the number and identity of its adjacent lines)

For each line and node, a PL/I based structure is produced. Each structure contains attributes and pointers to adjacent and nearby data entities. The structure for a line entity contains the attributes of width, length, and grid-intersect coding, as well as pointers to adjacent nodes. The structure for each node entity contains the attribute of position and pointers to adjacent lines and nearby nodes. The line extraction system, being somewhat intricate, has been discussed in detail in a prior paper.2

Abstractly, each stage S of the system can be viewed as responding to distortions occurring within the map. These distortions may be characterized by a set of context sensitive productions of the form

    R_l^m(i,j) R_l^n(i,j) -> R_l^m(i,j) R_l^n'(i,j),    l = 1, 2, ..., N_S

where R_l^m(i,j) represents some region about the point (i,j) having a fixed size and gray-level distribution, and R_l^n(i,j) and R_l^n'(i,j) represent regions about the point (i,j) having the same fixed size but differing gray-levels. By inversion of the production sets, each stage can be described as the repetitive application of the rules

    R_l^m(i,j) R_l^n'(i,j) -> R_l^m(i,j) R_l^n(i,j),    l = 1, 2, ..., N_S

in forward raster sequence to the points (i,j) of the map until no further response is obtainable. As an example, one such rule,

    {M(i+1,j) = 0, M(i-1,j) = 0} {M(i,j) = 1} -> {M(i+1,j) = 0, M(i-1,j) = 0} {M(i,j) = 2},

is used in the medial axis determination to mark object regions of width 1 as possible line points for further investigation.

It is important to observe the degree of data reduction and organization which is accomplished through the extraction of line data. As previously mentioned, even a small map region contains a huge number of nearly 10^6 picture points. The extracted list structure typically contains no more than 300 lines and node points, thereby resulting in a very significant data reduction. Equally significant, the data format of the list structure permits efficient access to all information required in the perception of symbology. The digitized map array therefore may be erased to free valuable storage for other purposes.
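The flavor of these list structures can be suggested with a short sketch; it is a modern illustration rather than the original PL/I declarations, and the field names are assumptions based on the sample entries of Figure 4.

    # Sketch of the line and node entities produced by list generation.
    # Following the text, a line carries width, length and grid-intersect coding
    # plus pointers to its end nodes; a node carries its position plus pointers
    # to adjacent lines and nearby nodes.
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Node:
        name: int
        position: Tuple[int, int]
        adjacent_lines: List["Line"] = field(default_factory=list)
        nearby_nodes: List["Node"] = field(default_factory=list)

    @dataclass
    class Line:
        name: int
        width: int
        length: int
        grid_intersect_coding: str
        end_nodes: Tuple[Node, Node] = None

    # Rebuilding the sample entries shown with Figure 4 (width is illustrative).
    n1 = Node(name=1, position=(38, 44))
    n2 = Node(name=2, position=(57, 45))
    l1 = Line(name=1, width=1, length=49,
              grid_intersect_coding="24424...", end_nodes=(n1, n2))
    n1.adjacent_lines.append(l1)
    n2.adjacent_lines.append(l1)
    print(l1.length, [n.position for n in l1.end_nodes])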
PERCEPTION OF SYMBOLOGY

It is interesting to observe that certain familiar pattern recognition procedures cannot be directly used in the recognition of map symbology. This results from the fact that in cartography, symbology cannot be well isolated, as there are often requirements for overlap or overlay of symbology in order to meet spatial positioning constraints. Many of the techniques used for recognition of isolated symbology, such as correlation or template matching of characters, cannot be used to recognize such non-isolated symbology and are thus not very powerful in map reading. In MAPPS, alternative techniques have been employed to accomplish isolation of symbology in addition to its recognition.

THE CONCEPT OF ISOLATION PROCESSING

Conceptually, isolation of symbology from within a map cannot be accomplished in vacuo. Isolation requires some partial recognition, while recognition generally requires some partial isolation. This necessitates the use of a procedure in which isolation is accompanied by partial recognition. In order to guide this procedure, there must exist some a priori knowledge about the structure of the symbology being sought. The underlying structure of pictorial symbology, such as is present in maps and elsewhere, is found to be that of planar graphs upon the removal of all metric constraints. Using this structure, the isolation process functions by sifting through the data of the map, proposing possible instances of pattern symbology on a graph-theoretic equivalency basis, thereby suppressing extraneous background detail.

Figure 5-Graph equivalencies (one graph is isomorphic to another; one graph is homomorphic to another)

Two types of graph equivalency are used in isolation. These are

• isomorphism
• homomorphism

One graph is isomorphic to another if there exists a one-to-one correspondence between their nodes which preserves adjacency of edges. A graph is homomorphic to another if it can be obtained from the other by subdivision of edges. Figure 5 yields an instance of isomorphic and of homomorphic equivalence of graphs.

Using the above definitions of graph equivalency, the process of isolation can be achieved by means dependent upon and able to cope with the types of structural degradation, Figure 6, found within actual maps.

Figure 6-Structural degradations occurring in MAPPS (crossing of lines, breakage of lines, overlay of nodes, overlay of lines, uncertain location of corner nodes)

For instance, should a map contain no structural degradation, then on the basis of graph structures only, it is necessary and sufficient to propose as possible symbology isolations those components of the map which are isomorphic to the symbology being sought. If the map contains no crossing, overlay, or breakage of lines, then on the basis of graph structures only, it is necessary and sufficient to propose as possible symbology isolations those partial subgraphs of the map which are isomorphic to the symbology. If the map contains no breakage or overlay of lines, then it is necessary and sufficient to propose those partial subgraphs which are homomorphic to the symbology sought. Finally, if a map contains as the only forms of structural degradation line crossovers, line breakage, node overlay, and uncertain location of corner nodes, first complete the map by filling in all possible instances of line breakage. Then it is necessary and essentially sufficient to propose as possible pattern isolations those partial subgraphs of the completed map which have no two adjacent broken edges and which are homomorphic to the symbology sought.
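Both equivalencies can be tested mechanically. The sketch below is a modern illustration, not part of MAPPS; it assumes the networkx package, uses toy graphs, and approximates the homomorphic test by smoothing out the degree-two nodes that edge subdivision introduces before testing isomorphism.

    # Sketch: graph equivalency tests in the spirit of the isolation step.
    import networkx as nx

    def smooth(graph):
        # Repeatedly splice out degree-2 nodes, undoing edge subdivision.
        g = graph.copy()
        done = False
        while not done:
            done = True
            for node in list(g.nodes):
                if g.degree(node) == 2:
                    u, v = list(g.neighbors(node))
                    if u != v and not g.has_edge(u, v):
                        g.remove_node(node)
                        g.add_edge(u, v)
                        done = False
        return g

    def isomorphic(a, b):
        return nx.is_isomorphic(a, b)

    def homomorphic(a, b):
        # Graphs related by edge subdivision become isomorphic
        # once their degree-2 nodes are smoothed out.
        return nx.is_isomorphic(smooth(a), smooth(b))

    # A triangle, and the same triangle with one edge subdivided twice.
    triangle = nx.Graph([(1, 2), (2, 3), (3, 1)])
    subdivided = nx.Graph([(1, 'a'), ('a', 'b'), ('b', 2), (2, 3), (3, 1)])
    print(isomorphic(triangle, subdivided))   # False
    print(homomorphic(triangle, subdivided))  # True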
Although the process of isomorphic matching of graphs can be conducted rather efficiently,4 the more realistic process of homomorphic matching requires the investigation of large numbers of partial subgraphs of the map for possible equivalency to pattern symbology. In order to limit the number of partial subgraphs which need be checked for homomorphic match, metric equivalency tests have been integrated into the graph-theoretic isolation process. These tests include bounds checking of the lengths, curvatures, thicknesses, and angles between lines, and may be easily extended as required. If the metric tests are well chosen then they will be conservative, i.e., they will not reject any true instance of pattern symbology. This may be seen by viewing the various screening quantities as features in the feature space of Figure 7. The ensemble of all true instances of pattern symbology will form some region A in the space, Figure 7. Any set of metric tests may also be viewed as partitioning the feature space, passing only those instances of symbology which lie in some region B of the space formed by the partition. If region B contains region A, then the set of tests is conservative. If region B exactly contains region A, then the set of tests also forms a perfect recognizer. It is more important, however, that the tests result in a high processing efficiency. This may be achieved by immediate testing of each feature as it is first calculated. This form of testing generates a partition which boxes in some region of feature space as shown in Figure 7c. While this partition is not necessarily perfect, it is usually possible to adjust the bounds of the tests so as to achieve a near optimal, as well as conservative, performance on the basis of limited sampling of pattern symbology within a map, Figure 7d. Thus the isolation process may also often serve well as the final stage of recognition. When desired, however, it is always possible to concatenate other more conventional recognition processes in order to achieve yet higher accuracies of recognition.

Figure 7-Partitioning of feature space by metric tests: (a) a feature space; (b) region containing instances of pattern symbology; (c) region formed by bounds testing of features; (d) best conservative region formed by bounds testing of features

Figure 8-The structure of a pattern symbol: a pattern symbol S; its spanning tree Ts; the elements Ts(i) of Ts; application to a map

The routine MATCH

Application of the search for graphical and metric equivalencies is conducted via a recursive routine called MATCH. On the graph-theoretic level, MATCH functions through utilization of tree programming. In this approach, a spanning tree Ts is pre-specified for each pattern symbol S. The elements of Ts are named Ts(i), i = 1, 2, ..., Ns, where Ts(i) is constrained to be connected to Ts(1) through the set {Ts(j), j = 1, 2, ..., i}. These structural conventions, illustrated by Figure 9, are developed to permit utilization of a recursive search policy in matching Ts and the partial subgraphs of a map Gm.

Figure 9-Structure of MATCH (recursive invocation, with FINAL SUCCESS, TEMPORARY SUCCESS, and ERROR RETURN exits)

The recursive structure of MATCH is shown in Figure 9. It has one entry from and two exits back to the calling program. Being recursive, it can call itself. At the ith level of recursion, MATCH investigates the possibilities of homomorphic equivalence of elements of Gm to Ts(i). As each possibility is proposed, MATCH checks to insure that all implied graph-theoretic equivalencies between Ts(j), j = 1, 2, ..., i, and Gm are acceptable, and that basic metric equivalences are met. More explicitly, at the recursive level i, MATCH takes the following action. If Ts has been fully matched, then MATCH takes a FINAL SUCCESS exit which carries it back up the recursive string with the isolated symbology from Gm. If all matching possibilities for Ts(i), i > 1, have been exhausted, then MATCH takes an ERROR RETURN exit back to the (i-1)th level of recursion in order to try to find other matching possibilities for Ts(i-1). Alternatively, if all matching possibilities for Ts(i), i = 1, have been exhausted, then MATCH fails to isolate the symbology sought and exits along the ERROR RETURN exit to the calling program. On the other hand, if it finds an acceptable match for Ts(i), then it exits via the TEMPORARY SUCCESS exit to continue the matching search for Ts(i+1). At each recursive level, MATCH performs one of three specific actions: matching to nodes of Ts, matching to lines of Ts, and initial matching to new pattern components of Ts.
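The control structure just described can be summarized in the following sketch. It is a simplified stand-in rather than the MAPPS implementation: the candidate generation and acceptance tests are stubbed out, and the three exits are modeled as return values.

    # Skeleton of a MATCH-like recursive search over the spanning-tree elements
    # Ts(1), ..., Ts(Ns). Returning a completed assignment plays the role of
    # FINAL SUCCESS, returning None the role of ERROR RETURN, and a successful
    # recursive call corresponds to the TEMPORARY SUCCESS path.
    def match(pattern_tree, map_graph, level=0, assignment=None):
        assignment = {} if assignment is None else assignment
        if level == len(pattern_tree):          # Ts fully matched: FINAL SUCCESS
            return dict(assignment)
        element = pattern_tree[level]
        for candidate in candidates(element, map_graph, assignment):
            if not acceptable(element, candidate, assignment):
                continue                        # violates graph or metric constraints
            assignment[element] = candidate     # tentative match for this level
            result = match(pattern_tree, map_graph, level + 1, assignment)
            if result is not None:              # TEMPORARY SUCCESS propagated upward
                return result
            del assignment[element]             # backtrack, try the next possibility
        return None                             # possibilities exhausted: ERROR RETURN

    # Placeholders for the node, line and component matching actions in the text.
    def candidates(element, map_graph, assignment):
        return []    # would enumerate nodes or paths of the map graph

    def acceptable(element, candidate, assignment):
        return True  # would apply degree, angle and prior-assignment checks

    print(match(["Ts(1)", "Ts(2)"], map_graph=None))   # None: nothing to match against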
Matching of nodes

The fundamental operation performed by MATCH is the matching of the immediate neighborhood of a node of Gm to that of Ts. This matching must satisfy several constraints. It must be feasible, must satisfy certain angular conditions, and must not violate any prior matching assumptions.

Figure 10-Matching of node neighborhoods: (a) node neighborhoods before matching; (b) node neighborhoods after matching

Figure 11-Matching of line regions: (a) line regions before matching; (b) line regions after matching

This path may contain one or more elementary lines and may even contain breaks. The path must, however, satisfy minimal constraints. It must not cross over itself, no portion of the path other than endpoints may have been previously matched, no breaks may be adjacent, the implied endpoint matchings must be consistent with prior matchings, and finally certain metric equivalencies must be observed. Typically these metric equivalencies need be no more complex than a rough correspondence of length and curvature between the line of Ts and the path within Gm. As an example of the matching of lines, consider Figure 11. If the conditions of Figure 11a hold upon a call of MATCH, then Figure 11b shows a suitable matching between the line of Ts and a path within Gm.

A matching is feasible if the degree of the node of Gm is greater than or equal to that of the node of Ts.
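A minimal sketch of this per-node acceptance test is given below; the helper names and the angle tolerance are assumptions, and the degree comparison implements the feasibility requirement just stated.

    # Sketch: a map node may match a pattern node only if (1) its degree is at
    # least that of the pattern node, (2) the internal angles between matched
    # adjacent lines are similar within a tolerance, and (3) none of its adjacent
    # map lines is already claimed by other pattern symbology.
    import math

    ANGLE_TOLERANCE = math.radians(20)   # assumption; the actual MAPPS bound is not given

    def feasible(pattern_degree, map_degree):
        return map_degree >= pattern_degree

    def angles_similar(pattern_angles, map_angles, tol=ANGLE_TOLERANCE):
        return all(abs(p - m) <= tol for p, m in zip(pattern_angles, map_angles))

    def node_match_acceptable(pattern_node, map_node, claimed_lines):
        if not feasible(len(pattern_node["lines"]), len(map_node["lines"])):
            return False
        if not angles_similar(pattern_node["angles"], map_node["angles"]):
            return False
        return not any(line in claimed_lines for line in map_node["lines"])

    ts_node = {"lines": ["a", "b", "c"], "angles": [2.1, 2.1]}
    gm_node = {"lines": ["x", "y"], "angles": [2.0, 2.2]}
    print(node_match_acceptable(ts_node, gm_node, claimed_lines=set()))  # False: degree too small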
This requirement, for example results in termination of the matching of the map seg~ ment of Figure 8 at recursive stage 16 because the degreeof node Ts(16) was greater than the degree of the corresponding node of Gm. A matching satisfies necessary angular constraints if all internal angles of the planar graphs of Gm and Ts are sufficiently similar. It satisfies prior matching assumptions if the present matching attempt is not in conflict with previous matching attempts or involves lines of Gm which are matched to other pattern symbology. The neighborhood of a node of Ts is considered to be fully matched when the node and its adjacent lines are matched to a node of Gm and some subset of its adjacent lines. For instance, if the conditions represented in Figure lOa hold upon a call of MATCH, then Figure lOb shows a suitable match of the neighborhoods. Matching of lines In matching a line of T8 to elements of Gm , MATCH finds a path in Gm which corresponds to the line of T s , Initial m.atching of com.ponents . Matching of nodes and lines of connected symbology conducted by the tracking of connectivity via T B • This technique may be extended to the matching of symbology S composed of disjoint components through inclusion of lines within Ts which link nodes of the various components of S. These lines may, for matching purposes, be treated as straight lines of T s , thereby simplifying the matching process. IS FINAL CLASSIFICATION MAPPS has the capability for inclusion of a final classification routine (CLASS). When used this routine serves to provide a final decision as to whether an isolated piece of symbology is a true instance of the symbology being sought. If the isolation is determined to be erroneous, then MATCH continues its search toward an alternative isolation .. CLASS may be implemented by a variety of app,roaches, the best known and most appropriate of which is through use of discriminant functions. The power of its application can be dramatically enhanced through proper use of results from MATCH. For example, quite complex features can be devised for input to CLASS from the very detailed description of the isolated symbology produced by MATCH. As further example MATCH may be used to isolate new symbology and tentative classification from a map to form a training set for CLASS. Then with or without a teacher, the discriminant function underlying CLASS can be perturbed toward a more optimal setting by any of several well-known algorithms. 3 142 Fall Joint Computer Conference, 1972 /} : I ! }i Figure 12-Test results Figure 12a-The input map Figure 12c-Isolated highways OUTPUT The Output Routine (OUTPUT) takes the isolated symbology recognized by earlier routines (MATCH, CLASS, ... ) and produces the final MAPPS output in ··-···l···-···-······ ..-.. --..---~I accordance with a specific user query. This is accomplished by establishing a data structure in which data can be retrieved through use of relational pointers. Retrieval is effected by specification of the desired symbology class and by calling various relations. The relation "contains" may be used, for instance, to find ----- ------_--..,-.. _.____ _ ................................................._._......_-.. _- .'~' Figure 12d-Isolated railways Figure 12b-The input map after line extraction Automated Map Reading and Analysis by Computer 143 tively a call of "position" will return the nominal location of the center of each isolated symbology. RESULTS OF TEST RUNS ....I.:::~:...I.·. ......-- ..:~.... ...-.=~=:::.- . . . .. 
~> l / . Figure 12e-Isolated roads the various isolated symbols belonging to a specified symbology class. Another call of "contains" will then result in presentation of all lines and nodes present in the specified symbology. Yet another call of "contains" will return the specific picture points involved. Alterna- MAPPS has been tested on several map regions. In each case CLASS was set to accept all isolations in order to most stringently test the operation of MATCH. Throughout all testing the results were highly satisfactory. Figure 12 presents the results for a representative run. The map region of Figure 12a was fed to the early stages of MAPPS producing the preprocessed map of Figure 12b. This preprocessed map was then subjected to several searches for specified symbology resulting in Figures 12c through 12k. In all but one case the recognition was conservative. Only in the case of Figure 12f was a false isolation made. An M was there recognized as an E. Had CLASS been implemented using character recognition techniques, this misrecognition could have been avoided. In those cases where recognition was incomplete, as for the highway of Figure 12c, isolation was terminated by MATCH due to mismatch of structure between the map and symbology sought. Some overall statistics on the test run: MAPPS correctly found instances of 18 types of lineal and point symbologies. These instances were formed from 5382 elementary lines. In addition 7 incorrect instances were EF' .-.t. e E Figure 12f-Isolated 'E's Figure 12g-Buildings 144 Fall Joint 'Computer Conference, 1972 I I I . . e\ \/, " , '".r Figure 12h-Benchmark symbols Figure 12j-Swamp symbols isolated although in each case this could have been avoided by use of a proper classification structure within CLASS. Since minimization of run-time was of minor importance, the average test-run for each symbology search of a map region took approximately 10 minutes from input film to output description. It is estimated that this could have been improved very significantly by various means; however this was not a maj or goal at this stage of research. "·"+-r::, .~--!: Figure 12i-Churches Figure 12k-Spring symbols Automated Map Reading and Analysis by Computer 145 CONCLUSION ACKNOWLEDGMENT This work has been an investigation into a broad class of conventionalized, two-dimensional, pictorial patterns: the symbology of maps. Important aspects of the problem involve line extraction, isolation under conditions of qualitatively-defined degradation, use of graph structures and matching techniques in isolation, and interactive recognition of geometrically variable symbology. A sophisticated approach to line extraction yielded a useful data base upon which to conduct symbology isolation and recognition. The use of graph structure and matching in symbology isolation proved very effective. Unexpectedly, it was found to be seldom necessary to resort to formal classification techniques in recognition of the isolated symbology. Such techniques could be incorporated as desired resulting in a continued search for symbology in case of any misisolation. The program as a whole is able to be expanded to the recognition of a wide variety of graphical symbology. In addition, the concepts involved can quite possibly be applied to the automated perception of gray-level sceneries such as blood cells, aerial photographs, chromosomes, and target detection. 
The authors would like to acknowledge the interest displayed by other members of the Center for Informatics Research in this and related research. This research has been sponsored in part by the Office of Naval Research under Contract No. N00014-68-A-0173-0001, NR 049-172. REFERENCES 1 R H COFER Picture acquisition and graphical preprocessing system Proceedings of the Ninth Annual IEEE Region III Convention Charlottesville Virginia 1971 2 R H COFER J T TOU Preprocessing for pictorial pattern recognition Proceedings of the NATO Symposium on Artificial Intelligence Rome Italy 1971 3 J T TOU Engineering principles of pattern recognition Advances in Information Systems Science Vol 1 Plenum Press New York New York 1968 4 G SALTON Information organization and retrieval McGraw-Hill Book Company N ew York 1968 Computer generated optical sound tracks by E. K. TUCKER, L. H. BAKER and D. C. BUCKNER University of California Los Alamos, New Mexico were represented by sounds, interpretation of results would be greatly facilitated. This is feasible only if the sound track is computer produced, not "dubbed in" after the fact. It should be made clear at this point that it was not an objective of this project to have the computer create all of the waveforms represented on the sound track. What was required was that the computer be able to reproduce on an optical sound track any recorded audible sound, including voices or music. The waveforms that the computer would actually have to create could be limited to some of the sounds we wanted to use as data representations. INTRODUCTION For several years various groups at the Los Alamos Scientific Laboratory have been using computer generated motion pictures as an output medium for large simulation and analysis codes. 1 ,2,3 Typically, the numerical output from one simulation run is so large that conventional output media are ineffective. The timevariable medium of motion picture film is required to organize the results into a form that can be readily interpreted. But even this medium cannot always convey all of the information needed. Only a limited number of variables can be distinctly represented before the various representations begin to obscure or obliterate each other. Furthermore, the data presented usually must include a significant amount of explanatory material such as scaling factors, representation keys, and other interpretive aids. If a film is to have long-term usefulness to a number of people, this information must either be included on the film or in a separate writeup that accompanies the film. In an effort to increase the effective information density of these films, a study was undertaken to determine the feasibility of producing optical sound tracks as well as pictorial images with a microfilm plotter. Some exploratory work done at the Sandia Laboratories, Albuquerque, New Mexico, suggested that this might provide a good solution to the problem. 4 It has been demonstrated many times that a sound track facilitates the interpretation of visual presentations. 5 However, from our standpoint, the addition of another channel for data presentation was as important as facilitating interpretation. Not only could a sound track present explanatory and narrative material efficiently and appealingly, it could also be used to represent additional data that might otherwise be lost. For example, it is always difficult to clearly represent the movement of many particles within a bounded three-dimensional space. 
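As a rough illustration of such a computer-created waveform, the sketch below synthesizes a brief tone as a stream of amplitude samples; it is not code from the project, the 25,000 samples/second rate is borrowed from the digitizing setup described later, and the pitch, duration, and amplitude are arbitrary.

    # Sketch: generate a short tone as a list of amplitude samples, the kind of
    # computer-created waveform that could mark an event such as a collision.
    import math

    SAMPLE_RATE = 25_000   # samples per second, matching the digitizing rate used later

    def tone(frequency_hz=440.0, duration_s=0.05, amplitude=1.0):
        n = int(SAMPLE_RATE * duration_s)
        return [amplitude * math.sin(2.0 * math.pi * frequency_hz * t / SAMPLE_RATE)
                for t in range(n)]

    samples = tone()
    print(len(samples), min(samples), max(samples))   # 1250 samples spanning roughly -1 to 1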
If, however, the collisions of particles-either with each other or with the boundaries of the space- OPTICAL SOUND TRACKS Sound is generated by a vibrating body which produces a series of alternating compressions and rarefactions of some medium, i.e., a wave. As this series is propagated through the medium, particles of the medium are temporarily displaced by varying amounts. We shall speak of the magnitude and direction of this displacement as the instantaneous amplitude of the wave. If the variation of this amplitude can be described as a function of time, a complete description or encoding of the wave is obtained. Thus, a sound wave can be "stored" in various representations, as long as the representation fully describes the variation of amplitude with respect to time. An optical sound track is one way of representing a sound. It consists of a photographic image which determines the amount of light that can pass through the track area of the film at a given point. As the film is pulled past the reader head, varying amounts of light pass through the film to strike a photocell, producing a proportionally varying electrical signal. A given change in signal amplitude can be produced at the photocell by varying either the area or the intensity of exposure of the sound track image. Conventional sound tracks are produced by either of two methods. The variable area type of track is pro147 148 Fall Joint Computer Conference, 1972 duced by having a beam of light of constant intensity pass through a slit of variable length to expose the film. In the. variable intensity recording method, either the light's intensity or the slit width can be varied with the slit length held constant. Commercial sound tracks are· produced by both methods. In both cases, the sound track image is produced on a special film that is moved past the stationary light source. Separate films of sound track and pictures are then reprinted onto a single film. Sixteen-millimeter movies with sound have sprocket holes on only one edge. The sound track is located along the other edge of the film (see Figure 1). Such sound tracks are normally limited to reproducing sound with an upper frequency of 5000-6000 Hz. This limitation is imposed by the resolution that can be obtained with relatively inexpensive lens systems, film and processing and by the sound reproduction system of most 16 mm projectors. 6 INPUT SIGNALS In order not to be limited to the use of computer created sounds alone, it was necessary to be able to store TIME SAMPLING THE ORIGINAL SIGNAL y ............... ............. ~------------------~~------------.-x APPROXIMATING THE ORIGINAL SIGNAL FROM THE SAMPLES Figure 1-A computer generated optical sound track Figure 2~Discrete sampling Computer Generat.ed Optical Sound Tracks other complex audio signals, such as voices, in a form that could be manipulated by a digital computer. As discussed above, any audio signal can be completely described by noting the variation of the signal's amplitude as a function of time. Therefore, the data for a digital approximation of an audio signal can be obtained by periodically sampling the signal's amplitude (see Figure 2). The primary restriction associated with this approach requires that the sampling rate be at least twice the highest frequency contained in the signal,7 In effect, samples obtained at a relatively low sampling rate S from a wave containing relatively high frequencies f will create a spurious "foldover" wave of frequency 8-f. 
The input for our experimental film was recorded on standard 7.4'-inch magnetic tape at a speed of 7,72 IPS. Frequencies greater than 8000 Hz were filtered out, and the resulting signal was digitized at a sampling rate of 25,000 samples/second. The digitizing was performed on an Astrodata 3906 analog-to-digital converter by the Data Engineering and Processing Division of Sandia Laboratories, Albuquerque. The digital output of this process was on standard ,72-inch 7-track digital magnetic tape in a format compatible with a CDC 6600 computer. This digital information served as the audio input for the sound track plotting routine. PLOTTING THE SOUND TRACK The sound track plotting routine accepts as input a series of discrete amplitudes which are then appropriately scaled and treated as lengths. These lengths are plotted as successive constant intensity transverse lines in the sound track area of the film. When these lines are plotted close enough together, the result is an evenly exposed image whose width at any point is directly proportional to an instantaneous amplitude of the original audio signal (see Figure 1). Consequently, as this film is pulled past the reader head, the electrical signal produced at the photocell of the projector will approximate the wave form of the original audio signal. The routine is written to produce one frame's sound track at a time. During this plotting, the film is stationary while the sound track image is produced line by line on the cathode ray tube of the microfilm plotter. The sound reproduction system of a motion picture projector is very sensitive to any gaps or irregularities in the sound track image. Plotting a sound track, therefore, requires very accurate film registration. Furthermore, the sound track image must be aligned in a perfectly vertical orientation. If either the registration or the vertical alignment is off, the track images for successive frames will not butt smoothly together and noise will be produced. 149 PLOTTER MODIFICATIONS All of our early experimental films were produced on an SD 4020 microfilm printer/plotter. Three modifications had to be made to the 16 mm camera of this machine in order to make these films. These modifications do not affect any of the camera's normal functions. In the first modification, the Vought 16 mm camera had to be altered to accommodate single sprocketed 16 mm movie film. For this it was necessary to provide a single sprocketed pull-down assembly. This was accomplished by removing the sprocket teeth on one side of the existing double sprocket pull-down assembly. Next, it was necessary to replace the existing lens with a lens of the proper focal length to enable the camera to plot the sound track at the unsprocketed edge of the film. The lens used was a spare 50 mm lens which had previously been used on the 35 mm camera. With the existing physical mountings in the 4020, this 50 mm lens presents, at the film plane, an image size of approximately 17.5 X 17.5 mm. Thus, with proper raster addressing, a suitable 16 mm image and sound track may be plotted on film. (Increasing the image size in this fashion produces a loss of some effective resolution in the pictorial portion of the frame while the 50 mm lens is in use. This loss of resolution in the picture portion is not particularly penalizing in most applications.) Finally, it was necessary to expand the aperture both horizontally and vertically to allow proper positioning and abutment of the sound track on the film. 
By interchanging the new lens with the original lens, normal production can be resumed with no degradation caused by the enlarged aperture and single sprocketed pull-down. No other modifications were required on the SD 4020 in order to implement the sound track option. The primary difficulty we encountered using the SD 4020 was that we could not get consistently accurate butting of consecutive frames. Therefore, the later films were plotted on an III FR-80, which has pin registered film movement. In order to use this machine, the film transport had to be altered to accommodate single sprocketed film, and the aperture had to be enlarged. A software system tape was produced to allow the sound track image to be plotted at the unsprocketed edge of the film, with the pictorial images still plotted in the normal image space. The FR-80 also provides higher resolution capabilities, so that no loss of effective resolution is incurred when pictorial images and the sound track are plotted in one pass through the machine. As was discussed earlier, optical sound tracks are usually limited to reproducing sound with an upper frequency of 5000-6000 Hz. Since motion picture film is projected at a rate of 24 frames/second, a minimum of 150 Fall Joint Computer Conference, 1972 410 lines per frame are needed to represent such frequencies in the sound track. While we have made no quantitative tests to demonstrate the production of such frequencies, we would expect efficient resolution to produce frequencies in or near this range with either of the plotters. Our applications so far have not needed the reproduction of sounds in this frequency range. THE TRACK PLOTTING ROUTINE The present sound track plotting routine was written with three primary objectives in mind. First, it was felt that it would be advantageous to be able to produce both pictorial imagery and the sound track in one pass through the plotter, with the synchronization of pictures and sound completely under software control. Second, the routine was written to allow the user maximum flexibility and control over his sound track "data files". Finally, the routine was designed to produce film that could be projected with any standard 16 mm projector. One-pass synchronization The sound track plotting routine is written to produce one frame's sound track at a time, under the control of any calling program. However, in a projector, the reader head for the sound track is not at the film gate; it is farther along the threading path. The film gate and the reader head are separated by 25 frames of film. Therefore, to synchronize picture and sound, a frame of sound track must lead its corresponding picture frame by this amount so that as a given frame of sound track arrives at the reader head, its corresponding pictorial frame is just reaching the film gate. In order to be able to generate both picture and sound in one pass through the plotter, it was necessary to build a buffer into the sound track plotting routine. This buffer contains the plotting commands for 26 consecutive frames offilm. In this way, a program plotting a pictorial frame still has access to the frame that should contain the sound track for the corresponding picture. The simultaneous treatment of pictorial plot commands puts the synchronization of pictures and sound completely under software control. Furthermore, this can be either the synchronization of sound with picture or the synchronization of picture with sound. 
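The sketch below illustrates this one-pass synchronization; it is a simplified stand-in for the plotting routine, using the 25-frame gate-to-reader separation and 26-frame buffer from the text, with placeholder plotting commands.

    # Sketch of one-pass picture/sound synchronization. The reader head sits
    # 25 frames past the film gate, so the sound for picture frame k must be
    # plotted 25 frames earlier on the film; a 26-frame buffer keeps those
    # frames available while their pictures are still being drawn.
    from collections import deque

    SOUND_LEAD = 25
    BUFFER_LEN = SOUND_LEAD + 1

    def plot_film(n_frames, sound_lines_for):
        buffer = deque()                      # pending (frame number, commands) pairs
        finished = []
        for k in range(n_frames):
            buffer.append([k, {"picture": f"picture {k}", "sound": None}])
            target = k - SOUND_LEAD           # film frame that carries picture k's sound
            for frame_number, commands in buffer:
                if frame_number == target:
                    commands["sound"] = sound_lines_for(k)
            if len(buffer) > BUFFER_LEN:      # oldest frame can no longer change: flush it
                finished.append(buffer.popleft())
        finished.extend(buffer)
        return finished

    frames = plot_film(30, sound_lines_for=lambda k: f"track lines for picture {k}")
    print(frames[0][1]["sound"])              # sound for picture 25 rides on film frame 0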
This is an important distinction in some applications; the current picture being drawn can determine which sound is to be produced, or a given picture can be produced in response to the behavior of a given sound track wave. Flexibility The present routine will read from any number of different digital input files and can handle several files simultaneously. Thus, for example, if one wishes to have a background sound, such as music, from one file behind a narrative taken from another file, the routine will combine the two files into a single sound track. The calling routine can also control the relative amplitudes of the sounds. In this way, one input signal can be made louder or softer than another, or one signal can be faded out as another one fades in. Any input file can be started, stopped, restarted or rewound under the control of the calling program. DEMONSTRATION FILMS Several films with sound have been produced using the sound track plotting routine. Most of the visual portions were created with very simple animation techniques in order to emphasize the information content added by the sound track. The films review the techniques employed for the generation of a sound track. No attempts have been made to rigorously quantify the quality of the sounds produced since no particular criterion of fidelity was set as an objective of the project. Furthermore, the sound systems of portable 16 mm projectors are not designed to produce high fidelity sound reproduction, since the audio portion will always be overlaid by the noise of the projector itself. For our purposes it was enough to make purely subjective judgments on the general quality of the sounds produced. SUMMARY The ability to produce optical sound tracks, as well as pictorial imagery, on a microfilm plotter can add a tremendous potential to computer generated movies. The sound medium can serve to enhance the visual presentation and can give another dimension of information content to the film. This potential cannot be fully exploited unless the sound track and the pictures can be plotted by the computer simultaneously. Under this condition, the input for the sound track can be treated by the computer as simply one more type of data in the plotting process. The input for the sound track plotting routine discussed in this report is obtained by digitizing any audio signal at a suitable sampling rate. This digital information can then be plotted on the film like any other data. Very few hardware modifications were made to the Computer Generated Optical Sound Tracks plotter in order to produce sound tracks. The modifications that were made did not affect the plotter's other functions. The routine is written to give the user as much flexibility and control as possible in handling his sound track data files. Multiple files can be combined, and synchronization is under the control of the user's program. It now appears that the production of computer generated optical sound tracks will prove to be cost effective as well as feasible. If so, this process could conveniently be used to add sound to any computer generated film. ACKNOWLEDGMENTS While many individuals have made significant contributions to this project, the authors would like to give particular thanks to Jerry Melendez of Group C-4 for many hours of help in program structuring and debugging. The work on this project was performed under the auspices of the U. S. Atomic Energy Commission. 
151 REFERENCES 1 L H BAKER J N SAVAGE E K TUCKER Managing unmanageable data Proceedings of the Tenth Meeting of UAIDE Los Angeles California pp 4-122 through 4-127 October 1971 2 L H BAKER B J DONHAM W S GREGORY E K TUCKER Computer movies for simulation of mechanical tests Proceedings of the Third International Symposium on Packaging and Transportation of Radioactive Materials Richland Washington Vol 2 pp 1028-1041 August 1971 3 Computer fluid dynamics 24-minute film prepared by the Los Alamos Scientific Laboratory No Y-204 1969 4 D ROBBINS Visual sound Proceedings of the Seventh Meeting of UAIDE San Francisco California pp 91-96 October 1968 5 W A WITTICH C F SCHULLER A udio visual materials Harper & Row Publishers, Inc. New York 1967 6 The Focal encyclopedia of film and television techniques Focal Press New York 1969 7 J R RAGAZZINI G F FRANKLIN Sampled-data control systems McGraw-Hill Book Company New York 1958 Simulating the visual environment in real-time via software by RAYMOND S. BURNS University of North Carolina Chapel Hill, North Carolina INTRODUCTION Laboratory, Providence, Rhode Island, has constructed several examples of unprogrammed simulators. One of these features a model terrain board with miniature roads and buildings over which a television camera is moved through mechanical linkages to the steering wheel of an automobile mock-up. The television camera is oriented so that the subject is presented with a windshield view. This arrangement earns the "unprogrammed" label within the physical limits of the terrain board. In practice, however, its value as a research tool is limited to studying driver behavior at dusk, as the image presented to the subject is dim. Natural daylight illumination, even under cloudy conditions, is much brighter than the usual indoor illumination. Duplicating the natural daylight illumination over the surface of the whole terrain board was found to be impractical in terms of the heat produced and the current re,.. quired by the necessary flood lamps. Because of the difficulties and disadvantages of filmand terrain board-type simulators, some efforts in recent years have been directed toward constructing visual simulators based on computer-generated images. General Electric has developed a visual simulator for NASA, used for space rendezvous, docking and landing simulation, which embodies few compromises. 2 The G. E. simulator output is generated in real time and displayed in color. However, from a cost standpoint, such a simulator is impractical for use as a highway visual.simulator because the G. E. simulator was implemented to a large extent in hardware. Consequently, the search for a visual-environment simulator which could be implemented in software was initiated. A study, investigating the feasibility of such a simulator was undertaken by the Highway Safety Research Center, Chapel Hill, North Carolina, an agency of the State of North Carolina. This study led to the development of the VES, for Visual Environment Simulator, a black-and-white approximation of the GE-NASA spaceflight simulator, adapted for highway environment simulation and implemented in software. Computer graphics has been seen since its inception! as a means of simulating the visual environment. I van Sutherland's binocular CRTs was the first apparatus designed to place a viewing subject in a world generated by a computer. 
When the subject in Sutherland's apparatus turned his head, the computer generated new images in response, simulating what the subject would see if he really .were in the 3-space which existed only in the computer's memory. This paper describes a system which is a practical extension of Sutherland's concept. The problem of simulating the visual environment of the automobile driver has attracted a variety of partial solutions. Probably the most used technique is simple film projection. This technique requires only that a movie camera be trained on the highway from a moving vehicle as it maneuvers in a traffic situation. The resulting film is shown to subjects seated in detailed mockups of automobile interiors, who are directed to work the mock-up controls to "drive" the projected road. The illusion of reality breaks down, however, when the subject turns the steering wheel in an unexpected direction and the projected image continues on its predefined course. Mechanical linkages from the mock-up to the projector, which eause the projector to swing when the steering wheel is turned, have also been tried. But that technique still breaks down when the subject chooses a path basically different from the path taken by the vehicle with the movie camera. Such film simulators are termed "programmed" . That is, what the subject sees is a function, not of his dynamic actions, but of the actions taken at the time the film was recorded. An "unprogrammed" simulator reverses this situation in that the image that the subject sees is determined only by his behavior in the mock-up. Unprogrammed visual environment simulators have been built for studying driving behavior. The U. S. Public Health Service at the Injury Control Research 153 154' Fall Joint Computer Conference, 1972 lated building, the visual, kinetic and ,auditory feedback should realistically reflect his actions. A visual simulator to provide the feedback described above must meet several requirements. To support the subject's unlimited alternatives, each image generated by the visual simulator must be determined only by the subject's inputs via the steering wheel, accelerator and brake, together with the subject's position in the simulated terrain. Therefore, the entire image representing the visual environment must be calculated in the time span separating subsequent images. REALISM Figure I-Mock-up of an automobile interior VES DESIGN REQUIREMENTS The requirements laid down by the Highway Safety Research Center were for a visual simulator that could be incorporated in a research device to totally simulate the driving experience to the subject. Not only was the visual environment to be simulated, but the auditory and kinesthetic environment as well. The subject was to be seated in a mock-up of an automobile interior; complete with steering wheel, brake and accelerator (see Figure1). The kinesthetic environmentwas to be simulated by mounting the mock-up on a moveable platform equipped with hydraulic rams. Under computer control, the mock-up could be subjected to acceleration and deceleration forces, as well as pitch, yaw and Toll. Similarly, a prerecorded sound track would be synchronized with the visual simulation to provide auditory feedback. To as great a degree as possible, the subject was to be isolated from the real environment and presented only with a carefully controlled simulated environment. 
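The requirement stated above, that every image be computed within the inter-frame interval solely from the subject's current control inputs and simulated position, amounts to a closed real-time loop. A minimal modern sketch of such a loop is given below (Python pseudocode, not the VES implementation); the function names read_controls, advance_vehicle, build_display_file and display, the starting state, and the 30-frames-per-second figure are illustrative assumptions only.

    import time

    FRAME_TIME = 1.0 / 30.0   # assumed inter-frame interval; the paper does not fix a rate here

    def simulator_loop(read_controls, advance_vehicle, build_display_file, display):
        """Minimal per-frame loop: every image is derived only from the
        subject's current control inputs and simulated position."""
        position = (0.0, 0.0, 0.0)   # hypothetical starting state
        heading = 0.0
        while True:
            start = time.time()
            steer, brake, accel = read_controls()              # mock-up inputs
            position, heading = advance_vehicle(position, heading,
                                                steer, brake, accel, FRAME_TIME)
            display(build_display_file(position, heading))     # draw this frame
            # the whole computation must fit within the inter-frame interval
            elapsed = time.time() - start
            if elapsed < FRAME_TIME:
                time.sleep(FRAME_TIME - elapsed)

Whatever the frame rate, the essential constraint is that input sampling, view computation, hidden surface removal and display-file generation together fit within FRAME_TIME.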
From the researcher's point of view, this simulation device should allow him to place a subject in the mockup, present him with a realistic simulated environment and then study the subject's reactions. Further, the choice of reactions available to the subject should not be limited in·any way. So, if the subject were to "drive" off the simulated road and through the side of a simu- The high premium placed on realism in the visual simulator implied that the time span between subsequent images would be short, comparable to the time span between movie or television frames. The realism requirement also made hidden surface removal mandatory. Transparent hills, cars and road signs were unacceptable if the illusion of reality were to be maintained. Further, television-type images were preferable to wire-frame drawings. If the images were to be of the wire-frame type, then objects would be represented by bright lines on the otherwise dark face of the CRT. For objects at close range, this representation presents few problems. But for objects at long range, the concentration of bright lines near the horizon would resemble a sunrise. SYSTEM DESCRIPTION The visual simulator software runs on a stand-alone IDIIOM-2 interactive graphics terminal consisting of a display processor, a VARIAN 620f mini-computer and a program function keyboard3 (see Figure 2). The display processor is itself a computer, reading and exe-' cuting its program (called a display file) from the core of the mini-computer on a cycle-stealing basis. The display processor's instruction set is extensive, but the visual simulator uses only a few instructions. Those used are instructions to draw horizontal vectors at varying intensities at varying vertical positions on the screen. The display processor is very fast, drawing a full screen (12") vector in about 20 microseconds. This speed allows a display file of seven thousand instructions to be executed in about ~~oth of a second, effectively preventing image flicker at low light levels. The VARIAN 620f mini-computer is also fast. Its Core has a 750 nanosecond cycle time and most instructions require two cycles. Word size is 16 bits and core size is 16,384. Simulating Visual Environment in Real-Time Via Software In its present configuration, the simulator receives its steering, braking and acceleration inputs from an arrangement of push buttons on the program function keyboard. The design configuration calls for the installation of an analog-to-digital converter and a driving station mock-up to replace the PFK. At the same time that the analog-to-digital converter is installed, a VARIAN fixed-head disk with a capacity of 128K words will be installed, giving the simulator nearly unlimited source data set storage. The visual simulator (VES)· accepts a pre-defined data set which describes a plan view of the terrain through which travel is being simulated. The terrain data set consists of (x, y, z) triples which describe the vertices of polygons. At present, the YES input data set resides in the computer memory at all times. The main function of the VARIAN fixed-head disk mentioned above will be to store the YES input data set. In operation, the YES system accesses a portion of the input data set corresponding to the terrain which is "visible" to the subject as a function of his position ,............... 
155 J\ F c 1~ • E • 1 A Figure 3-Diagram depicting subject's position (light triangle) moving through terrain data set versus data set moving past subject's position Display File NO Last Plan on Disk Figure 2-YES system block diagram in the simulated landscape. Then, the steering, brake and accelerator inputs from the mock-up are analyzed and used to compute a wire-frame type view of the terrain which would be visible through a car's windshield as a result of such steering, braking or accelerating.Next, the hidden surface removal routine (HSR) processes each polygon to determine which polygons "obscure" others and to remove the parts of each that are obscured. The output of HSR is then converted into a program (display file) to be executed by the display processor. The display processor executed this program to draw the horizontal vectors at up to 8 different intensities which make up the television-like final image. The subject's position (see figure 3) in the terrain plan view is represented by the light triangle. The dark triangle represents a fixed object in the terrain. If the terrain is established as the frame of reference, the subject'sposition moves across the terrain. But from the point of view of the subject, who is stationary, the terrain must move toward him. The current angular position of the mock-up steering wheel in radians, relative to a fixed heading, is found in variable ALPHA. AL- 156 Fall Joint Computer Conference, 1972 DIST x COS (ALPHA) SIN(ALPHA} N th of the rectangle and the triangle are compared. As indicated by the dashed lines, the rectangle is "behind" the triangle. In this case, no ordered triple is output. On encountering a3, the triangle's flag is set "out". As there is now only one polygon with flag set to "in", the ordered triple (X3, Y 3,PN r ) is output. Similarly, as a4 is encountered, the rectangle's flag is set to "out" and the triple (X4' Y 4, PN r) is output. This concludes LINESCAN's processing of a single scan line. To obtain the set of intersections corresponding to . t hALT for scan 1'" scan line "b," each element In e me a " must be modified by an amount determined by the space between raster elements, oY, and the slope of the polygon's face. Because polygons are composed of straight line segments, the change necessary is constant for each given line segment. To obtain the ALT entries for scan line "b," this constant value is added to 'the previous entries in the ALT. . But before processing scan line "b" can begm, the new ALT is re-sorted on increasing X values. This step is required because when the new ALT is constructed from the old by the addition of the slope constants mentioned above the order of some points may be disturbed. Note that this situation occurs when the ALT for scan line "c" is generated. Because of the differing slopes of the triangle and rectangle sides, a3 now precedes a2 in the left-to-right scanning order. Once the ALT is sorted, LINES CAN continues to process the ALT points as described above. DEVELOPMENT AND SPECIALIZATION OF THE HSR ALGORITHM Y1 1 Figure 7-Detail of LINESCAN operation 159 In writing the HSR program, the basic logic of LINESCAN was implemented. Unlike LINESCAN, which was not expressly designed for real-time applications, HSR was written in assembly language. Some features implemented in LINES CAN were judged unnecessary 160 Fall Joint Computer Conference, 1972 for the visual simulator application. Chief among these was the "implicitly defined line" feature of LINESCAN. 
This feature allows polygons to intersect and project through one another. Without this feature, polygons projecting through one another subvert the scanning logic, producing incorrect and distracting images. In a driving simulator, intersecting polygonal objects usually represent car crashes; hence, these are events which should be distracting. Some major changes to the basic logic of LINESCAN were implemented with the object of saving time. Recall that, when LINESCAN processes the ALT, as each point is encountered, a flag associated with that polygon is set to signify that the scan has "entered" that polygon. Then as each successive point is encountered, a search of the flags is used to determine which and how many flags are set. Performing even a short search at each point encountered on each scanline would consume a large fraction of the time allowed between frames in a real-time system. In the HSR algorithm, a table of polygon numbers is kept and updated as each new polygon is "entered" by the scan. The number of elements in the table is kept in a variable. Unless this variable indicates that the scan is "in" more than one polygon at a time, "no "depth sort" is required and no search need be made for polygons flagged as "in." When a "depth sort" is required, the polygons which must be depth sorted are readily accessible by table reference. Another change to the basic LINESCAN logic also involved sorting. LINESCAN sorts the ALT once for each scan. Recall that this step is required because the ALT is disordered when lines of different slopes intersect. Rather than sort the ALT for each scan, a simple test for ALT order was devised and performed at each point of the ALT. When disorder is found, the ALT is sorted. In simple scenes, this disorder occurs for about 8-10 of the possible 512 lines in a frame. Even very complex scenes require fewer than 20 ALT sorts. Hence, the savings in time are substantial. REFERENCES 1 I E SUTHERLAND A head-mounted three dimensional display Proceedings of the Fall Joint Computer Conference Vol 33 Part I pp 757-764 1968 2 BELSON Color TV generated by computer to evaluate spaceborne systems Aviation Week and Space Technology October 1967 3 IDIOM-2-Interactive graphic display terminal The Computer Display Review Vol 5 pp 201-214 1972 G ML Corporation Lexington Massachusetts 4 J BOUKNIGHT An improved procedure for generation of half-tone computer graphics presentations Communications of the ACM Vol 13 Number 9 pp 527-536 September 1970 Computer animation of a bicycle simulation by JAMES P. LYNCH and R. DOUGLAS ROLAND Cornell Aeronautical Laboratory, Inc. Buffalo, New York INTRODUCTION In early 1971, Cornell Aeronautical Laboratory, Inc., (CAL), began a research program, sponsored by Schwinn Bicycle Company, devoted to the development of a comprehensive digital computer simulation of a bicycle and rider. This simulation would be used to study the effects of certain design parameters on bicycle stability and control. Phase II of this research effort included the development of a computer graphics display program which generates animated movies of the bicycle and rider maneuvers being simulated. It is this graphics display capability that is described herein. For years, printed output was the only means of communication between the computer and man. This limitation dictated that only the technically skilled could interpret the reams of computer printout with its lists of numbers and specialized codes. 
For certain types of computer usage, such as accounting, numbers may be the most meaningful form of output which can be presented to the user. Solutions to other problems, however, may represent functional relationships of intangible variables. In this case plots of output data provide a much faster means of communication between the computer and the human. There is a class of problems for which neither numerical nor plotted output provide sufficient reality for rapid user comprehension. One such area is the simulation of the dynamics of tangible physical systems· such as airplanes, automobiles and bicycles. Fortunately, a means of communication is becoming practical which provides immediate visual interpretation of simulation results; not only for the analyst but for the layman as well. This mediumi s the computer animated graphics display. The early development of computer animated graphics displays was spurred by several investigators. Bill Fetter of the Boeing Company created an animated human figure in 1960 and a carrier landing film in 1961.1 Ed Zajac of Bell Telephone Laboratories produced a computer generated movie of a tumbling communications satellite in 1963. 2 Frank Sinden, also of Bell Laboratories, generated an educational computer animated film about gravitational forces acting on two bodies. 3 Two other investigators deserve mention, Ken Knowlton of Bell Labs for his computer animation language (BEFLIX)4 and Ivan Sutherland for his interactive computer animation work. 5 Interested readers will find an excellent bibliography on the subject in Donald Weiner's survey paper on computer animation. 6 Figure I-Computer graphics rendition of a bicycle and rider 161 162 Fall Joint Computer Conference, 1972 Figure 2-Blcycle slalom maneuver Computer Animation of a Bicycle Simulation 1.. 1 SEC Figure 2 (Cont'd) 163 164 Fall Joint Computer Conference, 1972 Computer Graphics activities at the Cornell Aeronautical Laboratory range from everyday use of general purpose plotting facilities by many programmers to highly complex computer-generated radar displays. One of the more fascinating computer graphics applications has been the Single Vehicle Accident Display Program, developed at CAL for the Bureau of Public Roads by C. M. Theiss. 9 This program converts automobile dynamics simulation data into a sequence of computer animated pictures used to generate motion picture film of the event. The demonstrated usefulness of this capability spurred the development of a graphics program for the Schwinn Bicycle Simulation. BICYCLE GRAPHICS PROGRAM FEATURES The Schwinn Bicycle Graphics Program provides a complete and flexible perspective graphics package capable of pictorially documenting the results of the bicycle simulation. The salient features of the graphics program are; 1. The program can plot a perspective picture of a bicycle and rider, positioned and oriented as per the simulation data. 2. The line drawing of the bicycle and rider can be easily changed to fit simulation or esthetic requirements. 3. The program can produce single pictures or animated movies. 4. Background objects, such aB roadways, houses, obstacles, etc., can be plotted in the scene. 5. The "frame rate" for animated films can be adjusted for "slow motion" or normally timed action. 6. The program is written to simulate a 16 mm movie camera, so that "photographing" a sce:lie is accomplished by specifying a set of standard camera parameters. 7. 
The program's "camera" can be set to automatically pan, zoom, remain fixed, or operate as on a moving base. 8. Any of the above characteristics may be changed during a run. Figure 1 shows a typical frame from a bicycle simulation movie. SIMULATION AND GRAPHICS SOFTWARE Digital computer simulation of bicycle and rider The computer simulation consists of a comprehensive analytical formulation of the dynamics of a bicycle-rider system stabilized and guided by a closed-loop rider control model. This computer simulation program will be used for bicycle design and development with particular consideration being given to the effects of various design parameters and rider ability on bicycle stability and maneuverability. The bicycle-rider model is a system of three rigid masses with eight degrees of freedom; six rigid body degrees of freedom, a steer degree of freedom of the front wheel, and a rider lean degree of freedom. I~ cluded in the analysis are tire radial stiffness, tire side forces due to slip angle and inclination angle, the gyroscopic effects of the rotating wheels, as well as all inertial coupling terms between the rider, the front wheel and steering fork, and the rear wheel and frame. Forty-four parameters of input data are required by the simulation program. These data include dimensions, weights, moments of inertia, tire side force coefficient, initial conditions, etc. The development of the simulation program has been supported by the measurement of the above physical characteristics of bicycles, the measurement of the side force characteristics of several types of bicycle tires and full scale experimental tests using an instrumented bicycle. Solutions are obtained by the application of a modified Runge-Kutta step-by-step procedure to integrate equations of motion. Output is obtained from a separate output processor program which can produce time histories of as many as 36 variables (bicycle translational and angular positions, velocities, accelerations, and tire force components, etc.) in both printed and plotted format. The simulation program, consisting of seven subroutines, uses approximately 170K bytes of core storage and requires about 4 seconds of CPU time per second of problem time when run on an IBM 370/165 computer. The output processor program uses approximately 200K bytes of core storage and requires about 5 seconds of CPU time per run. The total cost of both the simulation and output processor programs is approximately seven dollars per problem. The mechanics of making a bicycle graphics movie In addition to the printed and plotted output generated by the Schwinn Bicycle Simulation Program, a pecial "dynamics tape" is created for input to the bicycle graphics program. This dynamics tape contains, for each simulation solution interval, the bicycle's c.g. position (X, Y, Z coordinates), angular orientation (Euler angles), front wheel steer angle, and rider lean angle. All other pertinent information, such as the steering head caster angle, rider "hunch forward" angle, are fed to the graphics program via data cards, along Computer Animation of a Bicycle Simulation 165 with the stored three-dimensional line drawings of the bicycle and rider, and any desired backgrounds. The bicycle graphics program searches the tape and finds the simulation time corresponding to the desired "frame time." Information is then extracted to draw the desired picture. 
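The tape search just described can be illustrated with a small hedged sketch (modern Python, not the original implementation). The record layout and the names DynamicsRecord and record_for_frame are assumptions for illustration; the paper does not give the tape format beyond the quantities listed above.

    from dataclasses import dataclass
    from bisect import bisect_left

    @dataclass
    class DynamicsRecord:
        time: float    # simulation time of this solution interval
        cg: tuple      # (X, Y, Z) of the bicycle c.g.
        euler: tuple   # (phi, theta, psi) of the chassis
        steer: float   # front wheel steer angle
        lean: float    # rider lean angle

    def record_for_frame(records, frame_time):
        """Return the dynamics record whose simulation time is closest to the
        desired frame time (records are assumed sorted by time)."""
        times = [r.time for r in records]
        i = bisect_left(times, frame_time)
        if i == 0:
            return records[0]
        if i == len(records):
            return records[-1]
        before, after = records[i - 1], records[i]
        return before if frame_time - before.time <= after.time - frame_time else after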
The program mathematically combines the chassis, front fork and pedals to draw the bicycle, and mathematically combines the torso, left and right upper arms and forearms, and left and right thighs, calves and feet to draw the rider. Everything is so combined to yield a picture of a rider astride a bicycle assuming normal pedaling, leaning and handlebar grip. The correctly positioned three-dimensional line drawings are transformed into a two-dimensional picture plane, as specified by the program's camera parameters (location, orientation, focal length, etc.).

An interface program converts the final line drawings into a set of commands to the CAL Flying Spot Scanner. The cathode ray tube beam of the Flying Spot Scanner traces out one frame of the movie while a 16 mm movie camera records the image. Upon completion of the picture, the movie camera automatically advances one frame and the graphics program reads the next data (positions, angles, etc., of bicycle and rider) from the dynamics tape. The completed film shows animated motion exactly as simulated by the computer (Figure 2). A block diagram of the movie making procedure is shown in Figure 3.

Figure 3-Steps in making Schwinn bicycle movie

Bicycle motions displayed

For maximum realism and esthetic quality, seven distinct bicycle/rider motions were generated:

1. Bicycle chassis translation and rotation (6 degrees-of-freedom)
2. Front wheel and handlebar steering
3. Bicycle crank and pedal rotation
4. Rider left-right leaning
5. Rider arm steering
6. Rider leg pedaling
7. Rider ankle flexing

Figure 4 shows the various body members and joints included in the rider. The separate parts of the bicycle are shown in Figure 5.

Figure 4-Joints used for rider display

Figure 5-Sections used for bicycle display

Modification of the basic graphics package

The Bureau of Public Roads graphics display program provided an excellent base from which to build the Schwinn Bicycle Graphics Program. A pre-stored line drawing, defined in its own coordinate system, is Euler transformed into fixed space and camera transformed into two-dimensional picture space. Edge tests are performed to delete lines out of the field of view. Plotting any object (a line drawing) involves a call to the OBJECT subroutine

CALL OBJECT (TITLE, X, Y, Z, PHI, THETA, PSI)

Title refers to a particular stored line drawing, while X, Y, Z and PHI, THETA, PSI refer to the desired fixed space position and Euler angles at which the object is to be plotted. Subroutine OBJECT then does all the necessary transformations to plot the object. Plotting the chassis is straightforward; the chassis position and Euler angles are read directly from the dynamics tape.

Displaying the bicycle and rider

All segments of the bicycle and rider are displayed with the same mathematical approach. Parts are referenced by position and orientation to the chassis axis system, and this information is used to calculate the fixed space Euler angles and position. For example, the matrix equation relating points in the front fork axis system to corresponding points in fixed space is:

(XF, YF, ZF) = [A] { [B] (XSTEER, YSTEER, ZSTEER) + (XXF, YYF, ZZF) } + (X, Y, Z)

where:

A is the standard Euler transformation matrix (chassis to fixed space)
B is the front-fork system to chassis axis transformation matrix
(XSTEER, YSTEER, ZSTEER) are points in the front fork space
(XXF, YYF, ZZF) is the front fork system connection point in the chassis system
(X, Y, Z) is the current fixed space position of the bicycle chassis
(XF, YF, ZF) is the front fork points specified in the fixed space set

The B matrix, of course, is a two-rotation transformation, being a function of the caster angle and the steer angle. The Euler angles required by subroutine OBJECT can be determined by equating like terms of the standard Euler transformation with the overall transformation,

[AB] = [A]*[B]

For instance:

PHI = TAN-1 [AB(3, 2)/AB(3, 3)]
PSI = TAN-1 [AB(2, 1)/AB(1, 1)]
THETA = TAN-1 [-AB(3, 1)*SIN(PSI)/AB(2, 1)]

This procedure can be easily automated by a general subroutine which accepts the coefficients of the two transformation matrices and outputs the Euler angles.

Displaying the pedaling action

The pedal rotation angle is easily determined by tabulating the distance traveled by the chassis and relating it to the wheel size and gear ratio. The toe angle can be approximated by a cosine function of the pedal rotation angle.

w = gear ratio * distance / wheel size
Toe angle = -.25 * cos(w)

An important simplifying assumption in the display of the leg pedaling motion is that the legs move up and down in a single plane. This makes trigonometric calculation of the joint locations straightforward and the object-to-chassis transformations simple one-rotation matrices. Once this information is determined, procedures similar to the front-fork manipulations are used. Three objects are required for each leg: the thigh, the calf, and the foot.

Displaying the torso

The torso must hunch forward (so that the arms may reach the handlebars) and lean to the left and right (real-world rider control action). The transformation between the torso axis system and the chassis system is determined by two rotations. This transformation is also used for determination of arm location in the chassis system.

Displaying the arms

Determination of the fixed space Euler angles of the arms is complicated by the fact that the elbow joint lies on a circular locus around the shoulder-to-handlebar line. Since the upper arm and forearm are assumed equal in length, the perpendicular distance from the elbow to the handlebar-to-shoulder line is known. A transformation matrix can be developed to convert points in the elbow circle plane to the torso system. A constant angle from the elbow circle plane's Y-axis defines a unique elbow point which can be transformed back into the chassis system. Once the elbow point is known, determination of the Euler angles of the arm is straightforward.

MOVIE PRODUCTION

Both the bicycle simulation program and the bicycle graphics program are run on CAL's IBM 370/165 computer. The flying spot scanner is interfaced with the central digital computer through an IBM 2909 asynchronous data channel. The flying spot scanner is a high resolution CRT display system used for plotting and scanning. The interface software provides all the controls required by the display to move the beam, advance the film, etc.
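Returning to the transformations described under "Displaying the bicycle and rider" above, the general subroutine mentioned there, which accepts the combined transformation and returns the Euler angles, can be sketched as follows. This is a hedged modern illustration in Python rather than the original implementation; atan2 is used in place of a plain arctangent so that quadrant information is preserved, and the matrix is indexed from zero here, whereas the paper's indices start at one.

    import math

    def euler_from_ab(AB):
        """Recover the Euler angles expected by subroutine OBJECT from the
        combined transformation [AB] = [A][B], following the relations quoted
        above: PHI from AB(3,2)/AB(3,3), PSI from AB(2,1)/AB(1,1), and THETA
        from -AB(3,1)*sin(PSI)/AB(2,1).  AB is a 3x3 nested list or array."""
        phi = math.atan2(AB[2][1], AB[2][2])
        psi = math.atan2(AB[1][0], AB[0][0])
        theta = math.atan2(-AB[2][0] * math.sin(psi), AB[1][0])
        return phi, theta, psi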
The Schwinn Bicycle Graphics program requires 250K bytes of core, and generally runs from 50¢ to 90¢ per frame in computing costs, depending on image complexity. No attempt at hidden line removal was planned for this phase. FUTURE APPLICATIONS The Schwinn Bicycle Graphics Program was designed as a research tool to demonstrate the capability of the bicycle simulation. Several computer animated movies have been produced of simulated bicycle maneuvers which compare well with full scale experimental maneuvers. At current production cost levels, only the most interesting runs are documented with the bicycle graphics program. The authors feel, however, that the advent of high speed intelligent computer terminals will 167 allow the economical production of computer graphics. In the future the investigator will be able to view animated summaries of simulation results first, before referring to more detailed printed and plotted output data. The most gratifying result of this bicycle graphics capability is that the technically unskilled can share in the understanding that computer simulation is an emulation of reality, and has visible meaning in the everyday world. ACKNOWLEDGMENT The authors wish to express their gratitude to the Schwinn Bicycle Company for permission to present this work and also to Ronald B. Colgrove, CAL chief artist, for his excellent rendering. of the bicycle rider used in the movie sequences. REFERENCES 1 W A FETTER Computer graphics in communication McGraw-Hill New York 1965 2 E ZAJAC Film animation by computer New Scientist Vol 29 Feb 10, 1966 pp 346-349 3 F SINDEN Synthetic cinematography Perspective Vol 7 No 4 1965 pp 279-289 4 K KNOWLTON A computer technique for producing animated movies Joint Computer Conference AFIPS Conference Proceedings Vol 25 Baltimore Md Spartan 1964 pp 67-87 5 I SUTHERLAND Perspective views that change in real time Proceedings of 8th UAIDE Annual Meeting 1969 pp 299-310 6 D D ·WEINER Computer animation-an exciting new tool for educators IEEE Transactions on Education Vol E-14 No 4 Nov 1971 7 R D ROLAND JR D E MASSING A digital computer simulation of bicycle dynamics Cornell Aeronautical Laboratory Inc Technical Report No YA-3063-K-1 June 1971 8 R D ROLAND JR J P LYNCH Bicycle dynamics, tire characteristics and rider modeling Cornell Aeronautical Laboratory Inc Technical Report No YA-3063-K-2 March 1972 9 C M THEISS Perspective picture output for automobile dynamics simulation Prepared for Bureau of Public Roads by Cornell Aeronautical Laboratory Inc Technical Report No CPR-1l-3988 January 1969 10 C M THEISS Computer graphics displays of simulated automobile dynamics Proceedings AFIPS Conference Spring 1969 An inverse computer graphics problem by W. D. BERNHART Wichita State University Wichita, Kansas The goal of a conventional computer perspective algorithm is to assist in the establishment of a scaled perspective view of a real or conceptual geometric object. The purpose of this paper is to present the required conditions for the inverse transformation; that is, given the perspective of an object, establish the required parameters used in generating the perspective and to a more restrictive extent, establish the original geometric definition of the object. Because this inverse mapping is from a two to three dimensional space, the method is approximate and is accomplished by the method of least squares based on certain a priori information regarding the geometrical object. 
The method does require a considerable amount of numerical computation, but is particularly well suited to a digital computer solution.

The need for this required transformation arose in the course of a problem associated with the determination of the coordinates of certain desired points which appeared in photographs of an event which occurred several years ago, wherein the desired points had been completely obliterated by recent construction activities. Thus, the first task was to establish the generating parameters for the photographs. The generating parameters are defined as six independent coordinates from which a photograph may be geometrically reproduced by considering a large number of points in the three-dimensional object space, and transforming these to the two-dimensional space of the photograph. These parameters consist of the coordinates of the point where the camera is located, the symmetric equations of the line along the optical axis of the camera, and a linear scale factor associated with the photograph, enlarged to any magnification. The treatment of a photograph as a true perspective is consistent with the paraxial ray tracing approximation of geometrical optics.

For the purpose of this analysis, all points will be defined in a rectangular Cartesian coordinate system as shown in Figure 1. The point where the camera is located is denoted by three independent coordinates, (Xe, Ye, Ze). In the context of traditional perspective terminology, this point is commonly described as the location of the eye or observer, and the point (Xo, Yo, Zo) is referred to as the center of interest of the object space or perspective center. A line through these two points is regarded as the optical axis of the camera and the plane perpendicular to this axis represents the picture plane, projection plane, or two-space photograph. The location of this plane in relation to the eye point requires the identification of a linear scale factor which is associated with each photograph.

Figure 1-Projection plane and control points

The coordinates of the center of interest, (Xo, Yo, Zo), are not a unique set, as any point on the line passing through points 'e' and 'o' will require a particular value of the linear scale factor to perspectively generate the object space into the projection plane space. For this analysis, the scale factor will be regarded as a constant and the six independent parameters, (Xe, Ye, Ze) and (Xo, Yo, Zo), will be determined such that the photograph may be geometrically reproduced in the perspective sense.

Before analyzing this particular problem, it will be necessary to present the required coordinate transformation that maps an arbitrary point 'i' in the object space to the projection-plane space. This perspective transformation has received considerable attention in computer graphics applications in the last decade.1,2,3 A form which is particularly suited to the parameter identification problem is

hi = A(Ro^2/(Po*Di)) { -(Xi - Xo)(Ye - Yo) + (Yi - Yo)(Xe - Xo) }    (1)

vi = A(Ro/(Po*Di)) { -[(Xi - Xo)(Xe - Xo) + (Yi - Yo)(Ye - Yo)](Ze - Zo) + (Zi - Zo)Po^2 }    (2)

ni = Ro(1 - A)    (3)

in which

Po = [(Xe - Xo)^2 + (Ye - Yo)^2]^1/2    (4)

Ro = [(Xe - Xo)^2 + (Ye - Yo)^2 + (Ze - Zo)^2]^1/2    (5)

Di = Ro^2 - [(Xe - Xo)(Xi - Xo) + (Ye - Yo)(Yi - Yo) + (Ze - Zo)(Zi - Zo)]    (6)

and A = the linear scale factor; A > 0. The coordinate normal to the picture plane is a constant and is of no particular interest other than as an aid in the estimation of a suitable photographic scale factor. For the case of a photograph, this normal coordinate is proportional to the focal length of a simple convergent camera lens. This particular form of the perspective mapping transformation is based on two successive rotational transformations such that the plane defined by a line parallel to the Z-axis and the point 'e' also contains the V-axis of the projection plane. These two successive rotations are defined as follows

θ = tan-1 [(Ye - Yo)/(Xe - Xo)]    (7a)

β = sin-1 [(Ze - Zo)/Ro]    (7b)

A third rotation may be easily introduced by rotating the H, V-axes in the projection plane. It is important to note that distances measured in the projection plane would remain invariant with respect to this third rotation.

Returning to the original problem, the six desired parameters are determined by the method of least squares by considering four or more points in the object space whose rectangular coordinates are known or may be estimated with a high degree of accuracy. Next let (Sij)m denote the measured value of the distance between points i and j in the photograph. Thus, for 'n' such points, there are m = n(n - 1)/2 corresponding measured distances. The calculated value of this associated distance in the projection plane is given by

(Sij)c = [(hi - hj)^2 + (vi - vj)^2]^1/2    (8)

and the six desired generating parameters are then obtained by expanding this calculated value in a multiple Taylor series, expressed as

(Sij)c = [(Sij)c]a + [∂(Sij)c/∂Xe]a ΔXe + [∂(Sij)c/∂Ye]a ΔYe + ... + [∂(Sij)c/∂Zo]a ΔZo + higher-order terms    (9)

The subscript 'a' in Equation 9 denotes the evaluation for some assumed value of the six parameters. Thus, by neglecting the higher-order terms and minimizing the sum of the squares of the residuals between the calculated and measured values for the 'n' points

G = Σ (k = 1 to m) [(Sij)c - (Sij)m]^2    (10a)

and

∂G/∂Xe = 0,  ∂G/∂Ye = 0,  ∂G/∂Ze = 0,  ∂G/∂Xo = 0,  ∂G/∂Yo = 0,  ∂G/∂Zo = 0    (10b)

The six equations 10b in general yield the six desired parameters after two to five iterations, depending on the initial assumed values of the parameters and the desired accuracy. Again, the scale factor is held constant throughout this iterative process. A different choice of A will simply slide the coordinates of point 'o' along the line o-e without disturbing the iterated coordinates of point 'e'.

The writer has employed this procedure on several different controlled photographs with encouraging success.4 These laboratory experiments yielded parameter estimates within 4 percent of their exact values. This error is largely attributed to the various unknowns associated with the optics of both the camera and enlarger, as both instruments were of commercial rather than laboratory quality. Recent experiments,5 dealing with photogrammetric resectioning, yielded considerably smaller errors. These experiments utilized a phototheodolite, spectroscopic flat quality glass plates and a monocomparator.

As mentioned earlier, the original need involved the determination of the coordinates of certain desired points which appeared in photographs of an event which occurred several years ago, wherein the desired points had been completely obliterated by construction activities. However, a sufficient number of points in the object space still existed such that the 'n' required object-space coordinates described previously could still be easily obtained by field measurements. The desired points were located such that they appeared in two different photographs of the event.
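The iteration defined by Equations 9 and 10b can be sketched compactly. The following is a hedged modern illustration (Python with NumPy), not the author's program: the perspective mapping of Equations 1 through 6, as reconstructed above, is evaluated directly; the partial derivatives are approximated by finite differences; and a library least-squares solve stands in for the hand-derived normal equations. The names project, recover_parameters, control_pts and measured are assumptions for illustration.

    import math
    import numpy as np

    def project(params, A, pt):
        """Map an object-space point to projection-plane coordinates (h, v)."""
        Xe, Ye, Ze, Xo, Yo, Zo = params
        Xi, Yi, Zi = pt
        Po = math.hypot(Xe - Xo, Ye - Yo)
        Ro = math.sqrt((Xe - Xo)**2 + (Ye - Yo)**2 + (Ze - Zo)**2)
        Di = Ro**2 - ((Xe - Xo)*(Xi - Xo) + (Ye - Yo)*(Yi - Yo) + (Ze - Zo)*(Zi - Zo))
        h = A * (Ro**2 / (Po * Di)) * (-(Xi - Xo)*(Ye - Yo) + (Yi - Yo)*(Xe - Xo))
        v = A * (Ro / (Po * Di)) * (-((Xi - Xo)*(Xe - Xo) + (Yi - Yo)*(Ye - Yo))*(Ze - Zo)
                                    + (Zi - Zo)*Po**2)
        return h, v

    def recover_parameters(params, A, control_pts, measured, iterations=5, h=1e-6):
        """Gauss-Newton-style refinement of (Xe, Ye, Ze, Xo, Yo, Zo).
        control_pts: list of object-space (X, Y, Z) triples;
        measured: dict mapping point-index pairs (i, j) to measured distances."""
        p = np.asarray(params, dtype=float)
        pairs = list(measured.keys())
        Sm = np.array([measured[ij] for ij in pairs])

        def calc_dists(q):
            out = []
            for i, j in pairs:
                hi, vi = project(q, A, control_pts[i])
                hj, vj = project(q, A, control_pts[j])
                out.append(math.hypot(hi - hj, vi - vj))
            return np.array(out)

        for _ in range(iterations):
            Sc = calc_dists(p)
            J = np.empty((len(pairs), 6))
            for k in range(6):          # numerical partial derivatives
                dp = p.copy()
                dp[k] += h
                J[:, k] = (calc_dists(dp) - Sc) / h
            delta, *_ = np.linalg.lstsq(J, Sm - Sc, rcond=None)
            p = p + delta
        return p

As in the paper, the scale factor A is held fixed during the iteration; only the six generating coordinates are adjusted.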
Thus, by iteratively determining the generating parameters for each photo- 171 graph, the coordinates of the desired point were redetermined by solving for the intersection of the two lines associated with the point in each photograph. REFERENCES 1 H R. PUCKETT Computer methods for perspective drawing ARS-IAS Structures and Materials Conference Engineering Paper No 135 Palm Springs California April 1-3 1963 2 T E JOHNSON Sketchpad III-A computer program for drawing in three dimensions Proceedings Spring Joint Computer Conference 1963 3 W D BERNHART W A FETTER Planar illustration method and apparatus United States Patent Office No 3519997 July 7 1970 4 W D BERNHART Determination of perspective generating parameters ASCE Journal of the Surveying and Mapping Division Vol 94 No SU2 September 1968 5 L J FESSER Computer-generated perspective plots for highway design evaluation Federal Highway Administration Report No FHWA-RD-72-3 September 1971 Module connection analysis-A tool for scheduling software debugging activities by FREDERICK M. HANEY X erox Corporation El Segundo, California INTRODUCTION of various kinds of effort such as design, coding, module testing, etc. More recently Belady and Lehman described a mathematical model for the "meta-dynamics of systems in growth. "3 These schemes provide useful insights into the difficulties of designing and implementing large systems. Even with these improved estimation techniques, however, we still face the threat of long periods of unstructured post-integration putting out of fires. We may know better how long this "final" debugging will take, but we are still at a loss to predict what resources will be required or what specific activities will take place. If we predict an 18 month period for "final testing," will management buy it? How can we peer into this hazy contingency portion of a schedule and predict in greater detail where bugs will occur, who will be needed to fix them, elapsed time between internal releases, etc.? Belady and Lehman suggest the need for a "micro-model" for system activities; i.e., a model based on internal, structural aspects of a system. This is essentially the objective of this paper. In the following sections, we will develop a very simple, but useful, technique for modeling the "stabilization" of a large system as a function of its internal structure. The concrete result described in this paper is a simple matrix formula which serves as a useful model for the "rippling" effect of changes in a system. The real emphasis is on the use of the formula as a model; i.e., as an aid to understanding. The formula can certainly be used to obtain numeric estimates for specific systems, but its greater value is that it helps to explain, in terms of system structure and complexity, why the process of changing a system is generally more involved than our intuition leads us to believe. The technique described here, called Module Connection A nalysis, is based on the idea that every module pair (may be replaced by subsystem, component, or any other classification) of a system has a finite (possibly 0) The largest challenge facing software engineers today is to find ways to deliver large systems on schedule. Past experience obviously indicates that this is not a wellunderstood problem. The development costs and schedules for many large systems have exceeded the most conservative, contingency-laden estimates that anyone dared to make. Why has this happened? There must be a plethora of explanations and excuses, but I think H. R. J. 
Grosch identified the common denominator in his article, "Why MAC, MIS and ABM will never fly."1 Grosch's observation is essentially that for some large systems the problem to be solved and the system designed to solve it are in such constant flux that stability is never achieved. Even for some systems that are flying today, it is obvious that they came precariously close to this unstable, "critical mass" state. It is my feeling that our most significant problem has been gross underestimation of the effort required to change (either for purposes of debugging or adding function) a large, complex system. Most existing systems spent several years in a state of gradual, painfully slow transition toward a releasable product. This transition was only partially anticipated and almost entirely unstructured; it was a time for putting out fires with little expectation about where the next one would occur.

The difficulties of stabilizing large systems are universal enough that our experience has resulted in several improved methods for estimating projects. Rules-of-thumb like "10 lines of code per man day" once sounded like an extremely conservative allowance for the complexities of system integration and testing. J. D. Aron2 has described a relatively elaborate technique for estimating total effort for large projects. Aron's technique is based on the estimated amount of code for a project and empirically observed distributions of various kinds of effort.

Every module pair of a system has a finite (possibly zero) probability that a change in one module will necessitate a change in any other module. By interpreting these probabilities and applying elementary matrix algebra, we can derive formulae for estimating the total number of "changes" required to stabilize a system and the staging of internal releases. The total number of changes, by module, is given by

T = A(I - P)^-1

where A is a row vector representing the initial changes per module, P is a matrix such that Pij is the probability that a change in module i necessitates a change in module j, and I is the n x n identity matrix. The number of changes required for each "internal release" is given by

AP^k, k = 0, 1, ...,

or by

A(I - P)^-1 Uk, k = 1, 2, ..., n, where Uk = (0, ..., 1, ..., 0) with the 1 in the kth element,

depending upon the release strategy. The derivations of these formulae are presented in the following section.

Module connection analysis is useful primarily as a tool for augmenting a designer's quantitative understanding of his problem. It produces quantitative estimates of the effects of module interconnections, an area in which intuitive judgment is generally inadequate.

THEORY OF MODULE CONNECTIONS

As a basis for our analysis, we postulate several characteristics of a system:

• A system is hierarchical in structure. It may consist of subsystems, which contain components, which contain modules; or it may be completely general, having n different levels of composition where an object at any level is composed of objects at the next lower level.
• At any level of the hierarchy, there may be some interdependence between any two parts of the system.
• If we view a system as a collection of modules (or whatever object resides at the lowest hierarchical level), then the various interdependencies are manifested in terms of dependencies between all pairs of modules. By dependence here, we mean that a change in one module may necessitate a change in the other.

The fundamental axiom of module connection analysis is that intermodule connections are the essential culprit in elongated schedules.
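Before the theory is developed in the sections that follow, the formulae quoted in the introduction can be made concrete with a short sketch. This is a hedged modern illustration in Python with NumPy, not the author's program (the paper later notes that the computations fit in a program of under fifty lines of APL or BASIC); the three-module vector A and matrix P below are invented numbers chosen only to show the mechanics.

    import numpy as np

    A = np.array([4.0, 10.0, 6.0])          # initial (zero-order) changes per module
    P = np.array([[0.1, 0.2, 0.0],          # P[i][j]: probability that a change in
                  [0.0, 0.1, 0.2],          # module i forces a change in module j
                  [0.1, 0.0, 0.3]])

    n = len(A)
    I = np.eye(n)

    # Total changes, by module: T = A(I - P)^-1.  The inverse (and the underlying
    # power series) exists when the eigenvalues of P are all less than one in
    # absolute value.
    T = A @ np.linalg.inv(I - P)
    print("total changes per module:", T, " grand total:", T.sum())

    # Changes required in internal release k: A P^k, k = 0, 1, 2, ...
    release = A.copy()
    for k in range(5):
        print("release", k, ":", release)
        release = release @ P

    # One-module-at-a-time strategy: changes needed to stabilize module i alone,
    # Ai * Ui (I - P)^-1, where Ui is the ith unit row vector.
    for i in range(n):
        Ui = np.zeros(n)
        Ui[i] = 1.0
        print("stabilizing module", i + 1, ":", A[i] * Ui @ np.linalg.inv(I - P))

Running the sketch prints the total and per-release change counts directly; the derivation of these expressions follows.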
That a change in one module creates the necessity for changes in other modules, and these changes create others, and so on. Later, we will see that perfectly harmless-looking assumptions lead easily to sums like hundreds of changes required as a result of a single initial change. (The notions of hierarchy, interconnection, etc., used here are described at length in Reference 4.) If we assume that a system consists of n "modules," then there are n 2 pairwise relationships of the formPij = Probability that a change in module i necessitates a change in module j. In the following, the letter "P" denotes the n X n matrix with elements pij. Furthermore, with each module i, there is associated a number Ai of changes that must be made in module i upon integration with the system. (Ai is approximately the number of bugs that show up in module i when it is integrated with the system.) If we let A denote a row vector with elements Ai, then we have the following: A = total changes, by module, required at integration time, or at internal release 0. AP = total changes required, by module, as a result of changes made in release 0, or total changes for internal release 1. (Internal release n+ 1 is, roughly, a version of the system containing fixes for all first-order problems in internal release n.) N ow we observe that the i, jth element of p2 is n L: Pik Pkj, k=l which represents the sum of probabilities that a change in module i is propagated to module k and then to module j. Hence, the i, jth element of p2 is the "twostep" probability that a change in module i propagates to module j. Or, AP2 is the number of changes required in internal release 2. The generalization is now obvious. The number of changes required in internal release k is given by APk and the total number· of changes, T, is given by Now we are interested to know whether or not the matrix power series in P converges; clearly, if it does not our system will never stabilize. To establish con- Module Connection Analysis vergence of the power series, we appeal to matrix algebra (see Reference 5, for example) which tells us that the above series converges whenever the eigenvalues of P are less than 1 in absolute value. If this is the case, then the series converges and T=A (I _P)-l, where I is the nXn identity matrix. We now have an extremely simple way to estimate the total number of changes required to stabilize a system as a linear function of a set of initial changes, A. Moreover, the number of changes at each release is given by the elements of AI, AP, AP2, etc. 175 ·STAGING. INTERNAL RELEASES There are various strategies for tracking down bugs in a complex system. The most obvious are: (1) fix all bugs in one selected module and chase down all side effects, or, (2) fix all "first-order" bugs in each module, then fix all "second-order" bugs, and so on. The module cbnnection model can aid in predicting release intervals for either approach. For strategy (1) (one module at a time), the number of changes required to stabilize module i, given Ai initial changes, is given by (p, ... , Ai, ... , 0) (I _P)-l ESTIMATING TOTAL DEBUGGING EFFORT FOR A SYSTEM The above theory suggests a simple procedure for estimating the total number of changes required to stabilize a system. The procedure is as follows: The product is a row vector with elements corresponding to the number of changes that must be made in each module as a result of the original changes. 
The total number of changes required to stabilize this one release is given by n Ai LXik, k=l (1) For each module pair, i,j, estimate the probability that a change in module i will force a change in module j. These estimates constitute the probability matrix P .. (2) From the vector A by estimating for each module i the number of "zero-order" changes, or changes required at integration time. (3) Compute the total number of changes, by module: T=A (I _P)-l. ( 4) Sum the elements of the column vector T to obtain the total number of changes, N. (5) Make a simple extrapolation to "total time" based on past experience and knowledge of the environment. If past experience suggests a "fix" rate of d per week, then the total number of weeks required is N / d. Hence if we have some estimate for the initial correctness (or "bugginess) of a system and for the intermodule connectivity (the probabilities), then we can easily obtain an estimate for the total number of changes that will be required to debug the system. The formula is a simple one in matrix notation, but the fact that we are dealing with matrices probably explains the failure of our intuition in understanding debugging problems. In the following sections, we will show how the above formula can be used to aid our understanding of other aspects of t~e debugging process. where the Xik are elements of (I _P)-l. This strategy, then results in n internal releases where· the time for release i is Ai (max X ik) X (time required per change) k and the total debug time after integration is L (Ai max Xik) X (tIme required per change) i k With the second debugging strategy (make all "firstorder" changes, then all "second-order" changes, etc.), the number of changes in the kth release is given by APk. That is, APk is a row vector with elements corresponding to the number of changes in each module for release k. The time required for release k is approximately max (APk) Xtime required per change. To determine the total number of releases for this strategy, we must examine A, AP, AP2, until the number of changes AP8 in release 8 is small enough that the system is releasable. The total time for this strategy, then, is 8 L max APkXtime required per change. k=O I t is worth noting that both of the debug strategies described above evidence a "critical path" effect. The total time in each case is a sum of maximum times for each release. This effect corresponds to the well-known fact that debugging is generally a highly sequential 176 Fall Joint Computer Conference, 1972 process with only minor possibilities for making many fixes in parallel. This fact, coupled with the "amplification" of changes caused by rippling effects, certainly accounts for a large portion of many schedule slips. REFINING THE INITIAL ESTIMATES Module connection analysis is proposed as a tool for aiding designers and implementors. l\1ore than anything else, it is a rationale for making detailed quantitative estimates for what is generally called "contingency." Now, we must ask, "As a project progresses, how can we take advantage of actual experience to refine the initial estimates?" The module connection model is based on two objects: A, the vector of initial changes; and P, the matrix of connection probabilities between the modules. Both A and P can be revised simply as live data become available. As each module i is integrated into the system, the number Ai of initial changes becomes apparent. 
Using updated values for the vector A, we can recompute the expected total number of changes and the revised release strategy. The elements, Pij, of the matrix P can be revised periodically if sufficient data is kept on changes, their causes, and their after-effects. One simple way to do this is to keep a record for each module as follows: Module i .2 .1 o .2 0 o o .1 .1 0 .1 0 .1 0 .1 .1 0 0 .1 0 .1 0 .2 0 .1 .1 0 0 0 0 0 0 0 0 0 0 0 0 0 o .1 0 o .2 0 o 0 0 o .1 o .1 0 0 0 0 o .1 0 0 0 0 0 0 0 0 0 0 0 0 0 o o .1 .1 .1 0 0 o .1 .4 .1 o .3 .2 .1 .2 o 0 0 o .1 0 0 o .1 o .1 0 0 0 0 0 0 0 0 .1 o o .1 o .1 .1 0 .1 .1 .1 .3 .1 0 .1 0 0 .1 0 0 0 0 .1 0 0 0 0 o .1 o .1 o .1 o .1 0 .1 o .4 0 o .2 0 0 o .1 o .2 0 0 0 0 0 0 0 0 0 0 0 0 0 o .1 0 0 .1 .1 0 .1 .4 0 0 0 0 .1 0 0 0 0 0 0 0 0 .2 .3 .2 .1 0 0 0 0 0 .1 0 o .1 .1 0 0 0 0 o .1 0 0 0 o .1 0 o .1 o o .1 0 0 0 0 .2 .1 .1 0 0 0 0 o .1 .3 0 0 o .2 0 0 o .2 0 0 o 0 0 0 0 0 0 0 o .1 0 0 0 0 0 0 0 .1 .1 .3 o 0 0 0 0 0 .1 0 0 0 .1 0 0 .1 0 .1 0 0 0 .2 .1 0 o o .1 o .1 o .1 o .1 o caused by which other modules module? affected After a relatively large sample of data is available, the above forms can be used to revise P as follows: .. number of changes in j caused by i total changes made to i P1,J = - - - - - - = - - - - - - " - . - - - - - ' - - The revised matrix P can be used to revise earlier estimates for total effort and release strategies. AN EXAMPLE OF MODULE CONNECTION ANALYSIS The following example is hased on the Xerox Universal Timesharing System. Eighteen actual subsystems 0 .1 0 0 0 0 0 0 .2 Figure I-Probability connection matrix, P are used as "modules." Estimates for connection probabilities and initial changes are made in the same way that they would be made for a new system, except that some experience and "feel" for the system were used to obtain realistic numbers. (Thanks to G. E. Bryan, Xerox Corporation, for helping to construct this example.) The 18 X 18 probability connection matrix for this example is given in Figure 1. The matrix is relatively sparse; moreover, most of the nonzero elements have a value of .1. Most the larger elements lie on the diagonal INITIAL AND FINAL CHANGES description of change o 0 0 0 0 0 0 0 .1 0 .1 0 0 0 0 0 0 0 .3 Module Initial Changes Total Required Changes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 2 8 4 6 28 12 8 28 4 8 40 12 16 12 12 28 28 40 241.817 100.716 4.44444 98.1284 248.835 230.976 228.951 257.467 4.44444 318.754 238.609 131.311 128.318 ·157.108 96.1138 150.104 188.295 139.460 TOTALS 296 2963.85 Figure 2 Module Connection Analysis corresponding to the fact that the subsystems are relatively large so that the probability of ripple within a subsystem is relatively large. The total number of changes required in each module are given in Figure 2. It is interesting to note which modules require the most changes and to observe that six modules account for 50 percent of the changes. Figure 3 illustrates the one-release-per-module debug strategy. That is, we repair one module and all side effects, then another module, and so on. This strategy is rather erratic since the time between releases, which is determined by the maximum number of fixes in one module, ranges from 4 to 95 indiscriminately. If we adopt this strategy, we may want to select the 177 MAXIMUM CHANGES PER MODULE PER RElEASE AND TOTAL CHANGES PER RElEASE 300 275 250 225 :> o ~ 30 ~ Y~296' 230 (.92)-X , / (TOTAL PER RElEASE) o - 150 ~~ '~ ! 
-125 20 ~ - 100 ::!; 75 50 - 25 ·---------·-------'-----+I-----"----~I 10 15 20 30 35 25 ONE RELEASE PER MODULE Maximum Changes in One Module Release 4.41764 11. 8619 4.44444 8.84029 67.8994 24.7185 20.3720 85.8099 4.44444 35.2976 95.2147 22.5608 39.7013 15.0000 15.0000 35.0000 35.0000 66.5554 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 "CRITICAL PATH" TOTAL Figure 4-"Internal" release an average rate of about 1 per day, then Figure 4 is fairly representative of experience with the first release of UTS. The total number of changes on the "critical path" is 338, so that approximately 15 months would I I I 4000 3000 - -f _ 2000 592.138 Figure 3 worst module first and continue using the worst module at each step. We will see, however, that this strategy is far from optimal because it does not take maximum advantage of opportunities to make fixes in parallel. A more effective release strategy is illustrated in Figure 4. This strategy assumes all first-order changes in release 1, all second order changes in release 2, etc. Figure 4 shows, for each release, the maximum number of changes in one module and the total number of changes. The reader who has worked on a large system will, no doubt, recognize the painfully slo~ convergence pattern. In this case, the system is assumed to be ready for external release when the "maximum changes per module" becomes less than one. If we assume the "critical path" changes are made at X "'NTERNAl" RELEASE 1000 900 800 / 700 / 6 0 0 / -/ 500 -- . 400 300 - 200 AVERAGE MODULE CONNECTION PROBABILITY 100 .01 I .02 I .03 -21-.04 .05 Figure 5-Total changes as a function of "average connection probability" 178 Fall Joint Computer Conference, 1972 be required to stabilize the system for the first external release. To conclude this example, let us take a brief look at the relationship between "total changes" and the probability of intermodule connection. The probabilities in the connection matrix above have an average value of approximately .04. What is the result if we assume the same relative distribution of probabilities in the matrix, but reduce the average by dividing each element by a constant? Figure 5 shows the total number of changes as a function of "average probability of module connection" under the above assumption. This curve shows that our example is precariously close to "critical mass" and that any small improvement in the connection probabilities results in significant payoff. OTHER APPLICATIONS OF IVIODULE CONNECTION ANALYSIS The value of module connection analysis is its simplicity. The computations can be performed easily by a small (less than 50 lines) program written in APL, BASIC, or whatever language is available. Used on-line, the technique is useful for experimenting with various design approaches, implementation· strategies, etc. Three examples of this use of the model are described below: Estimating new work If the designers, or managers, of a system have kept detailed records of the .module-module changes in the system (as described above), then the matrix P is a reliable estimator of the "ripple factor" for the system. It can be used to predict, and stage, the effort to stabilize the system after any set of changes. If we postulate a major improvement release of the system, then we can assume, for example, that the new program code falls into two categories: (1) independent code particular to a new function and, (2) code that necessitates changes in an existing module. 
By estimating the number of changes, bi, to each module i, we can estimate the total number of changes to restabilize the system; the previously described computations can then be used to estimate release intervals and total time for the improvement release. To be more realistic, it may be useful in the above computation to use bi+ei as the estimated changes in the module, where ei represents the number of changes required in module i by previous activity.

Evaluating design approaches

The best time to guarantee success of a system development effort is in the early design stages when architecture of the system is still variable. There is much to be gained by selecting an appropriate "decomposition" (see Reference 4) of the system into subsystems, components, etc. During this stage of a project, module connection analysis is a useful tool for evaluating various decompositions, interfacing techniques, etc. It is a simple, quantitative way of estimating the modularity of a system, the ever-present objective that no one knows exactly how to achieve. By fixing some of his assumptions about intermodule connections, a designer can experiment with various system organizations to determine which are the least likely to achieve "critical mass."

Evaluating implementation approaches

The reader who performs some simple experiments with the formulas described here is likely to be very surprised at the results. Even an extremely sparse connection matrix with very low probabilities can result [examine (I - P)^-1] in very large "ripple factors." It is also interesting to experiment with small perturbations in the connection matrix and observe the profound effect they can have on the "ripple factor." One becomes convinced more than ever before that it is necessary to minimize connections between modules, localize changes, and simplify the process of making changes.

The most impressive gains come from minimizing the probabilities of intermodule propagation of changes. A reduction of the average probability by as little as 5 or 10 percent can cause a significant reduction in the "ripple factor." Additional improvement can result from improvements in techniques for making changes. The total debug time is essentially linear with respect to the time required to make a change, but the multiplier (total number of changes) can be so large that any reduction in the time-per-change results in enormous savings.

Module connection techniques are extremely useful in estimating the value of various implementation techniques and strategies. How are the module connection probabilities changed if we use a high-level implementation language? How much easier will it be to make changes? How much will we save, if any, by doing elaborate environment simulation and testing of each module before it is integrated with the system? Module connection analysis is a valuable augmentation of intuition in these areas and can be useful for generating cost justifications for approaches that result in significant savings.

CONCLUSION

The objective of this paper has been to describe a simple model for the effect of "rippling changes" in a large system. The model can be used to estimate the number of changes and a release strategy for stabilizing a system given any set of initial changes. The model can be criticized for being simplistic, yet it seems to describe the essence of the problem of stabilizing a system.
It is clear, to the author at least, that experimentation with the module connection model could have prevented a significant portion of the schedule delay that occurred for many large systems.

REFERENCES

1 H R J GROSCH Why MAC, MIS, and ABM won't fly Datamation Vol 17 November 1 1971 pp 71-72
2 J D ARON Estimating resources for large programming systems Software Engineering Techniques J N Buxton and B Randell (eds) April 1970
3 L A BELADY M M LEHMAN Programming system dynamics or the meta-dynamics of systems in maintenance and growth Research Report IBM Thomas J Watson Research Center Yorktown Heights New York July 1971
4 C ALEXANDER Notes on the synthesis of form Harvard University Press Cambridge Massachusetts 1964
5 M MARCUS Basic theorems in matrix theory National Bureau of Standards Applied Mathematics Series #57 US Government Printing Office January 1960

Evaluating the effectiveness of software verification-Practical experience with an automated tool

by J. R. BROWN and R. H. HOFFMAN
TRW Systems Group
Redondo Beach, California

INTRODUCTION

From the point of view of the user, a reliable computer program is one which performs satisfactorily according to the computer program's specifications. The ability to determine if a computer program does indeed satisfy its specifications is most often based upon accumulated experience in using the software. This is due in part to general agreement that the quality of computer software increases as the software is extensively used and failures are discovered and corrected. In keeping with this philosophy, increasing emphasis has been placed on exhaustive testing of computer programs as the principal means of assuring sufficient quality. Nevertheless, a significant problem which pervades all software development is a lack of knowledge as to how much testing of a software system or component constitutes sufficient verification.
The major impact of this problem (if not adequately addressed) is evidenced by high cost of testing (as much as 50 percent of total project cost) and insufficient visibility of test effectiveness. As a result, we often lack sufficient confidence that the software will continue to operate successfully for unanticipated combinations of data in a real-world environment.

In recognition of the high cost and uncertainty of software verification, TRW Systems' Product Assurance Office initiated a company-funded effort to improve upon current testing methodology. Much of the effort has been directed toward development of some general purpose automated software "tools" which would provide significant aid in performance of a software quality assurance activity. The desirable extent to which the "general purpose" and "automated" characteristics should be pursued has received considerable study, as did a precise definition of "significant aid." The result of the study, experimentation, design and development thus far conducted comprises the TRW Product Assurance Confidence Evaluator (PACE) system, an evolving collection of automated tools which provide support in various phases of software testing.

Examination of a typical software testing process results in identification of four fundamental activities: test planning, production, execution and evaluation. Examination of the overall cost and schedule impact resulting from manual performance of these activities reveals the reasons for many testing efforts being less complete and successful than expected. With emphasis upon those tasks which are often neglected due to the menial aspect of their performance, PACE development was planned to complement manual testing efforts with automated utilities.

Early planning and study efforts indicated a need to give emphasis to the ability of the system to meet diverse (and probably changing) user needs. To adequately cope with this requirement a number of events (instances) were identified at which operational releases of interim PACE capability would be most beneficial. Practical applications of the capabilities produced by each PACE instance would then provide meaningful direction for subsequent releases.

The initial PACE instance was the FLOW program to support test evaluation activities. FLOW monitors statement usage during test execution, thus providing a basic evaluation of test effectiveness. The results produced by FLOW, in particular the statement usage frequencies, are similar to the program profiles discussed by Knuth in Reference 1. In addition, FLOW supports the test planning activity by indicating the unexercised code and, consequently, the additional tests required for more comprehensive testing.

FLOW PROGRAM DESCRIPTION

Purpose

During the software development process, a question frequently asked (and seldom if ever answered satisfactorily) is: "How much testing is enough?" There appears to be vital interest in the subject,2,3,4 but too little in the way of practical applications has been accomplished in the past to provide any final answers. We feel strongly that a measure of the variety of ways in which a computer program is tested (or not tested) can combine to form a software "experience index", and quantification of the index supports evaluation of both the computer program and testing thoroughness. Based on this premise, the FLOW program was developed to: (1) support assessment of the extensiveness with which a computer program is tested, (2) provide a variety of quantified indices summarizing program operation, and, (3) support efforts to create a more comprehensive but less costly test process. The objective of FLOW is not to find errors, per se, but to quantitatively assess how thoroughly a program has been tested and to support test planning by indicating the portions of code which are not exercised by existing test cases.

Figure 1-Sample program with pseudo statement numbers. The figure lists the QAMOD-indexed source of the main program SPEAR and subroutine SRANK (both shown in part, omitted lines indicated by ":") and the complete subroutine TIE. Declarations carry PSN 0 and each executable statement carries a sequential PSN within its element; the SRANK statements shown after the omitted lines carry PSNs 15 through 19, and the two logical IF statements in TIE carry paired numbers (15,16 and 18,19). The statements shown are:

ELEMENT SPEAR
      PROGRAM SPEAR(INPUT,OUTPUT,TAPE5=INPUT,TAPE6=OUTPUT)
      DIMENSION A(10),B(10),R(20)
      NAMELIST /TESTIN/ N,NR,A,B
  100 READ(5,TESTIN)
      CALL SRANK(A,B,R,N,RS,T,NDF,NR)
      WRITE(6,6001) A
      :
ELEMENT SRANK
      SUBROUTINE SRANK(A,B,R,N,RS,T,NDF,NR)
      DIMENSION A(1),B(1),R(1)
      FNNN=F(N)
      IF(NR-1) 5,10,5
      :
      KT=1
      CALL TIE(R,N,KT,TSA)
      CALL TIE(R(N+1),N,KT,TSB)
      FKTN=F(KT)+F(N)
      IF(TSA) 60,55,60
      :
ELEMENT TIE
      SUBROUTINE TIE(R,N,KT,T)
      DIMENSION R(1)
      T=0.0
      Y=0.0
    5 X=1.0E38
      IND=0
      DO 30 I=1,N
      IF(R(I)-Y) 30,30,10
   10 IF(R(I)-X) 20,30,30
   20 X=R(I)
      IND=IND+1
   30 CONTINUE
      IF(IND) 90,90,40
   40 Y=X
      CT=0.0
      DO 60 I=1,N
      IF(R(I).EQ.X) CT=CT+1.0
   60 CONTINUE
      IF(CT.NE.0) IF(KT-1) 75,80,75
      GO TO 5
   75 T=T+CT*(CT-1.0)/2.0
      GO TO 5
   80 CONTINUE
      ICT=CT
      T=T+F(ICT)/12.0
      GO TO 5
   90 RETURN
      END

Method

FLOW analyzes the source code of a computer program and instruments the code in a manner which permits subsequent compilation and makes possible monitored execution of the program.
This technique is representative of one of several approaches toward software measurement technology described by Kolence.5 A complete application of FLOW provides for an accumulation of frequencies with which selected program elements (e.g., statements, small segments of code, subprograms, etc.) are exercised as the program is being tested. There are optional levels of detail at which usage monitoring can be performed. The desired level is selected by the user and controlled by input.

Typical use of the complete FLOW capability involves the application of three distinct FLOW elements. The first of these is QAMOD, the code analysis and instrumentation module. The QAMOD module sequentially analyzes each statement of a FORTRAN source program and accomplishes the following:

1. The first executable statement of each element (i.e., subroutine or main program) is assigned a pseudo statement number (PSN) of one. Each subsequent statement (assuming that the most detailed monitoring is opted) is assigned a sequential PSN and the statements are displayed with their assigned number as illustrated in Figure 1.* Statements are referenced by element name and PSN during subsequent FLOW processing.

2. The code is instrumented by the insertion of transfers to the FLOW execution monitor subprogram, QAFLOW. The function of the transfers is the generation of a recording file containing the sequence of statements exercised during test execution.

Upon completion of the analysis and instrumentation of the source program, the instrumented version of the program is output to a file for subsequent compilation and execution. QAFLOW is appended to the program prior to execution with test data.
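The effect of this QAMOD/QAFLOW instrumentation, an ordered recording of every statement executed, can be approximated for illustration with a modern tracing hook. The Python sketch below is an analogy only (FLOW rewrites FORTRAN source rather than installing a run-time tracer), and every name in it is invented rather than taken from the paper.

import sys
from collections import Counter

def record_execution(func, *args, **kwargs):
    """Run func while recording every (element name, line number) executed,
    in order, roughly the analogue of the QAFLOW recording file."""
    recording = []

    def tracer(frame, event, arg):
        if event == "line":
            recording.append((frame.f_code.co_name, frame.f_lineno))
        return tracer          # keep tracing inside every called function

    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(None)
    return recording

# A small subject program standing in for an instrumented routine.
def tie(r, x):
    ct = 0
    for value in r:
        if value == x:
            ct += 1
    return ct

recording = record_execution(tie, [3, 1, 3, 2], 3)
print(recording[:5])          # first few (element, line) events, in order
print(Counter(recording))     # execution frequency of each statement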
The third FLOW module, QAPROC, provides summary statistics on the frequency of use of program elements as well as detailed trace information and an indication of the effectiveness of the test. QAPROC accesses the statement execution recording file generated by execution of the instrumented subject program and produces an evaluation and summary of the test case executed. The recording file is sequentially accessed and the data are assimilated into an internal table. At times designated by the input control options, a display is printed (Figure 2) which includes the following:

1. A map, delineated by subroutine, indicating the number of executions which have been recorded for each statement.
2. Statistics indicating the percentage of the total executable statements which were exercised at least once.
3. Statistics indicating the percentage of the total number of subroutines which were executed at least once.
4. A list of the names of subroutines which were not executed.
5. Total execution time spent in each subroutine.

* The program shown is a modification of the Spearman Rank Correlation Coefficient program from the IBM Scientific Subroutine Package.6 Figure 1 shows a portion of the main program SPEAR and the subroutine SRANK (lines omitted indicated by :). The complete subroutine TIE is included to support later reference in this report.

**QAFLOW MAP PRINT**

ELEMENT SPEAR    CUMULATIVE TIME  .0780 SECONDS
  PSEUDO NOS. 1 TO 7 = 1

ELEMENT RANK     CUMULATIVE TIME  .6430 SECONDS
  PSEUDO NOS. 1 TO 7 = 20   8 TO 9 = 200   10 TO 11 = 90   12 TO 13 = 20
  PSEUDO NOS. 14 TO 14 = 200   15 TO 17 = 20   18 TO 22 = 0   23 TO 23 = 20
  PSEUDO NOS. 24 TO 24 = 2

ELEMENT SRANK    CUMULATIVE TIME  .0860 SECONDS
  PSEUDO NOS. 1 TO 5 = 1   6 TO 10 = 0   11 TO 11 = 1   12 TO 14 = 10
  PSEUDO NOS. 15 TO 22 = 1   23 TO 26 = 0   27 TO 29 = 1   30 TO 30 = 0
  PSEUDO NOS. 31 TO 32 = 1

ELEMENT TIE      CUMULATIVE TIME  1.2350 SECONDS
  PSEUDO NOS. 1 TO 2 = 2   3 TO 4 = 22   5 TO 6 = 220   7 TO 7 = 110
  PSEUDO NOS. 8 TO 9 = 65   10 TO 10 = 220   11 TO 11 = 22   12 TO 13 = 20
  PSEUDO NOS. 14 TO 15 = 200   16 TO 16 = 20   17 TO 17 = 200   18 TO 19 = 20
  PSEUDO NOS. 20 TO 22 = 0   23 TO 26 = 20   27 TO 27 = 2

**QAFLOW USAGE SUMMARY AFTER 1589 NUMBER PAIRS (ENTRY/EXIT SEGMENTS)
THE SUBJECT PROGRAM CONTAINS 90 EXECUTABLE STATEMENTS.
THE TEST DATA EXERCISED 72 OF THESE STATEMENTS.
THE TEST EFFECTIVENESS RATIO AT THE STATEMENT LEVEL IS .80
THE PROGRAM CONTAINS 1 TERMINATION POINTS, ONLY ONE OF WHICH WAS EXECUTED.
THE CORRECTED TEST EFFECTIVENESS RATIO IS .80
THE PROGRAM CONTAINS 4 ENTRY POINTS. THE TEST DATA EXERCISED 4.
THE TEST EFFECTIVENESS RATIO AT THE ENTRY POINT LEVEL IS 1.00

Figure 2-FLOW execution frequency summary

Frequencies derived by FLOW from a number of separate tests of the subject program may be combined to provide a cumulative measure of the comprehensiveness of all testing applied to the program.

At the option of the user, detailed trace information can be displayed. The trace depicts the sequence in which statements (referenced by pseudo statement number) were exercised during program execution. A complete trace display for one test of the SPEAR program is illustrated in Figure 3. In addition, time of entry to each subroutine is recorded and displayed to support timing studies.

[Figure 3-FLOW execution trace display. Only parts of the scanned display are recoverable. It begins: **QAFLOW TRACE PRINT** ELEMENT SPEAR, TIME = 2.4740: 1-2; ELEMENT SRANK, TIME = 2.4790: 1-3; ELEMENT RANK, TIME = 2.4850: 1-2 (10 times), 3-9, 12-14, 8-11, 14-14, ...; the trace continues through repeated entries for RANK, SRANK and TIE and ends with ELEMENT SRANK (..., 31-32) and ELEMENT SPEAR, TIME = 4.4270: 3-7.]

The information in Figure 3 is interpreted as follows:

• Execution is initiated at pseudo statement number (PSN) 1 of the main program SPEAR at time 2.474;
• Subroutine SRANK is called from PSN 2 of SPEAR at time 2.479;
• Subroutine RANK is called following the sequential execution of PSN 1, 2 and 3 of SRANK;
• Upon entry to subroutine RANK, PSN 1 and 2 are executed 10 times before proceeding to PSN 3;
• When execution of RANK reaches PSN 24, control is returned to subroutine SRANK at PSN 4.

The value of the FLOW trace information in understanding an otherwise complex logic structure can be appreciated by following the execution of subroutine TIE (using the program listing in Figure 1).

The interaction of the three FLOW modules is illustrated in Figure 4 with a description of inputs and outputs for a typical application.

[Figure 4-FLOW program overview. Only the block labels are recoverable from the scanned figure: QAMOD, producing the indexed listing of the subject source program (Figure 1), and QAPROC, producing the evaluation display (Figure 2).]
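In the same illustrative spirit, the QAPROC-style reduction of a recording file to usage frequencies and a test effectiveness ratio can be sketched as follows. The ratio is simply statements exercised divided by executable statements, matching the figures reported in Figure 2 (72 of 90, or .80); the data, names and structure below are invented for the example, with element names borrowed from Figure 1.

from collections import Counter

def usage_summary(recording, executable_statements):
    """recording: ordered (element, statement) pairs from a monitored run;
    executable_statements: {element: set of statement ids in that element}."""
    freq = Counter(recording)                 # execution count per statement
    exercised = set(recording)
    total = sum(len(s) for s in executable_statements.values())
    hit = sum(1 for elem, stmts in executable_statements.items()
              for s in stmts if (elem, s) in exercised)
    ter = hit / total if total else 0.0       # test effectiveness ratio
    unexecuted = sorted((elem, s)
                        for elem, stmts in executable_statements.items()
                        for s in stmts if (elem, s) not in exercised)
    return freq, ter, unexecuted

# Example: two elements with 5 and 4 executable statements.
program = {"SPEAR": {1, 2, 3, 4, 5}, "TIE": {1, 2, 3, 4}}
run = [("SPEAR", 1), ("SPEAR", 2), ("TIE", 1), ("TIE", 2), ("TIE", 2), ("SPEAR", 5)]
freq, ter, unexecuted = usage_summary(run, program)
print(freq)          # frequency map, as in the QAFLOW map print
print(ter)           # 5 of 9 statements exercised, about 0.56
print(unexecuted)    # candidates for additional test cases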
CASE STUDIES

In early planning for the capability which FLOW should provide, consideration was given to the requirements of the various phases of the software testing process. Because of the resulting flexibility of the FLOW program, successful use has been reported from a number of diverse applications. Major usage has been in two areas: (1) assessment of testing effectiveness, and (2) analysis and solution of software problems difficult to solve with conventional techniques. Brief descriptions of several such applications are included here and grouped accordingly.

Test effectiveness

• Houston Operations Predictor/Estimator (HOPE)

The HOPE program is used by NASA/MSC for orbit determination and error analyses on the Apollo Missions. It contains approximately 500 subprograms including 80,000 lines of code. Over a four year period of program development, cases had been added to the test data file as required until the file consisted of 33 separate cases which required 4.5 hours of computer time and 35-50 man-hours of test results validation. Although developers were aware that redundant testing was being performed, it was impractical to delete any of the cases from the file. Because of the criticality of the program's accuracy, removal of any test case without precise proof of its impact on verification effectiveness could not be allowed. In addition, the tight schedule of the project did not permit detailed manual appraisal of each test case.

The FLOW program provided the means of determining the areas of HOPE which were tested by each case. The first FLOW analysis disclosed that the 33 cases tested 85 percent of the subprograms and that one-half of this number were exercised by almost every case. Consideration of these statistics prompted the funding of extended analyses to produce a more effective test file. An incremental test planning activity was performed and a file of six cases was generated. These six cases tested 93 percent of the subprograms, but they required less than three hours of computer time and less than 24 man-hours of test results examination. Since the FLOW analyses indicate the areas of the program exercised by each case, these six cases can be selectively used at each update to assure maximum cost effectiveness.

• Navigation Simulation Processor (NAVPRO)

NAVPRO is the program used by NASA/MSC to process data from Apollo Command Module and Lunar Module onboard computer navigation simulation programs.
NAVPRO contains approximately 75 subprograms and 4000 executable statements. FLOW was applied to NAVPRO to assist in the generation of a comprehensive set of test cases. The first step was selection of a basic test from the cases which were then being used for verification. FLOW analysis of the effectiveness of this first test had surprising resuIts; although the case exercised 45 percent of the NAVPRO code, it was apparent that the time span being simulated (and consequently the case execution time) could be reduced by 85 percent without significantly reducing the effectiveness of the test. By eliminating this redundant testing and then manually extending the input data with the goal of improving its effectiveness, the case was modified such that it tested 80 percent of the code in one-fourth the time required by the original case. By continued application of FLOW, a complete test file consisting of four cases was compiled which tested 98 percent of the executable statements. The 2 percent not tested were areas of the program dedicated to error terminations not considered worthy of verification at each program update. These Evaluating Effectiveness of Software Verification were verified initially and will be retested only if modifications are made which specifically affect their operation. • Skylab Activities Scheduling Program The FLOW program was used by NASA/MSC to measure the comprehensiveness of a set of 20 test cases for 52 subroutines comprising a crew model for the Skylab Activities Scheduling Program. The testing which had been performed was thought to be adequate but, since the program is to be used for on-line mission support, documentary evidence of sufficient verification is especially important. Each of the 20 test cases was executed and evaluated separately by FLOW, then the results were accumulated using one of the FLOW options. These cumulative results verified that the critical software for each of the subroutines was indeed exercised; thus, there was no requirement to apply FLOW in the modification or addition of test cases. Although no direct manpower savings can be assigned to this application, the value of the confidence in the software and in the test cases due to the FLOW results is evident. The users also acknowledged the value of the trace capability of FLOW, since they easily diagnosed a program error which had been previously undiscovered in their testing. • Program Anatomy Tables Generator (TABGEN) TABGEN is a utility program developed for NASA/ MSC as one of the components of the Automated Verification System (AVS). The functions of TABGEN are to perform syntax analysis of FORTRAN programs, segment the code into blocks of statements and generate tables describing each of these blocks (e.g. variables referenced, transfer destinations) and the logical relationships between blocks. TABGEN consists of approximately 25 subroutines and 2000 executable statements. Through FLOW application to TABGEN, test cases were devised to test 100 percent of the executable statements. The developers and users of TABGEN are convinced of the value of thorough testing, due to the fact that no errors have occurred since delivery of TABGEN in November 1971. The original version of the program was not altered until April 1972, when new requirements made modifications necessary. • Minuteman Operational Targeting Program (MOTP) MOTP is used by USAF/SAC to generate the targeting constants which must be supplied to the guidance 187 computers aboard the Minuteman II missile system. 
The program contains approximately 160 subprograms which are extensively overlaid. Prior to each delivery of an updated version to SAC, extensive validation must be performed. Because of the criticality of this validation exercise, a means of accurately measuring the testing effectiveness was clearly required. To determine the applicability of FLOW to the MOTP verification effort, a particularly complex portion of the program was instrumented and then monitored during a complete targeting run. FLOW provided new information about portions of the program which were assumed to be exercised but, in fact, were not. The results of this application clearly demonstrated the value of using FLOW to complement verification efforts. The decision was made to incorporate FLOW as a standard testing procedure for future deliveries. Recommendations were also made for selective use of the FLOW logic trace feature to gain a clearer understanding of the more complex portions of the MOTP. Problem solving • Apollo Reference Mission Program (ARM) The ARM program is used by NASA/MSC during Apollo missions for simulation of all activities (powered and free flight) from earth launch to re-entry. Because of its extensive use during Apollo and anticipated future applications, it is imperative that the program execution time be optimal; expecially in the areas of the program which receive the most use. The FLOW program was applied to ARM to determine the most-used portions of the program during a typical mission simulation and to obtain execution time analyses. * Although the application did not produce any surprising results, the predictions of the developers were verified (i.e., timing had been of prime consideration during development). Careful examination of critical statements (those exercised more than 10,000 times during the run) resulted in some minor modifications to improve timing which, if extrapolated over their anticipated period of use, will result in noticeable cost savings. • DRUM SLAB II The DRUM SLAB II aerodynamic analysis program was developed for NASA/MSC to simulate the molecule impact force and direction on spacecraft surfaces * Similar applications have been produced by Knuth using the FORDAP program. 1 188 Fall Joint Computer Conference, 1972 during re-entry. During checkout, the program always aborted after seven minutes of execution with an illegal operation apparently resulting from erroneous storage of data due to the complex computation of various indices. Several unsuccessful attempts were made to manually diagnose and solve the problem. Although the incorrect data storage was thought to be occurring throughout the run, it did not cause an abort until the density of the molecules began to increase rapidly at lower altitudes. It was not obvious which of the indices were being miscalculated or at what point they were computed. Because of the complex modelling of the program and the fact that the original developers were not available, the problem caused the development project to be discontinued after three months of unsuccessful debug efforts using conventional methods. Several months later, after attending a FLOW demonstration, the manager in charge of the DRUM SLAB development requested that FLOW be applied in an attempt to diagnose the problem. By selective instrumentation of the DRUM SLAB program and application of the FLOW data trace option, the problem was found to originate at some point during execution of the first 800 lines of the main program. 
Then, by close examination of the statement execution trace for these 800 lines, the precise point at which the erroneous index computation occurred was determined. Three separate errors were found in the computation of various indices. Correction of these computations eliminated the store error and resulted in an apparently error-free execution until the run was terminated by the operator at 15 minutes (the maximum execution time specified for the run). Although limited funds and lack of personnel familiar with the DRUM SLAB program prohibited a complete verification of the modified program, the utility of FLOW was proven by the fact that the problem had been solved in 50 man-hours by personnel totally unfamiliar with the DRUM SLAB program. • Minuteman Geometric Identification Data Program (GIDATA) The Minuteman Geometric Identification Data (GIDATA) program has been used to generate absolute and relative radar data for tracking sites. Recently, the program was extensively modified to generate special purpose output. The FLOW program was applied to the GIDATA program before modification was started in hope that the analysis would give a better understanding of the program and, hence, aid in modification design. Some of the most useful information obtained from FLOW was: • Subroutine level trace and usage summary • Inefficient subroutine structure and calling sequences • Areas where code was never used • Relative subroutine timing indicating inefficient code Using this information a better understanding of GIDATA was achieved and it became relatively simple to determine necessary modifications for reducing program execution time and core requirements. Upon completion of the GIDATA modifications, additional applications of FLOW will ensure comprehensive testing of the program. • Navigation Simulation Processor (NAVPRO) In generation of a particular test case for NAVPRO (program described in the previous sections of this report), a problem developed when the error flag indicating vehicle impact with the lunar surface was being set during execution. Since the flag was in global COMMON and could have been set in any of several subroutines during the integration, it was difficult to determine precisely where the error was occurring. Since NAVPRO had already been instrumented for statement execution monitoring, the origin of the error was easily detected. By using a special option of FLOW, the value stored in the error flag location was checked at execution of each transfer during the run. The FLOW display disclosed the exact statement at which the vehicle impact flag was set and described the program logic flow immediately preceding the impact. The error, which was in the NAVPRO input data, was found and corrected. • Earth Re-entry Orbit Determination Program (REPOD) REPOD is a large multi-link program developed and used in support of Minuteman trajectory analysis and orbit determination. Since REPOD is an amalgamation of several older programs, the detailed flow through each of its 9 links is particularly difficult to understand. The trajectory link is one of the more complex and was therefore chosen for FLOW analysis to identify possible program improvements. The analysis of the trajectory link was particularly desirable because: (1) A significant portion of the total REPOD execution time is spent in this link. (2) It was felt, by the user, that the FLOW analysis would lead to significant improvement in program efficiency. 
Evaluating Effectiveness of Software Verification One application of FLOW provided some striking results in identification of blocks of statements which were exercised with unexpected high frequency. FLOW also: (1) identified portions of REPOD not used for selected input options, (2) displayed subroutine and statement trace data for given options, and (3) indicated primary areas of concern for subsequent program improvements. Using the FLOW results as a guide, a detailed examination of the trajectory integration algorithm was initiated. The complete task culminated in significant reductions in execution time (for example, processing time for one function was cut from 67 seconds to 11 seconds) and optimum selection of error criterion and integration step size for improved program performance. SUMMARY The initial PACE instance described here responded - to an important need in supporting assurance of comprehensively tested and more reliable software products. Although execution of all statements is by no means a conclusive measure of test effectiveness, it is considered an important first step in the improvement of conventional testing methodology. Subsequent instances of PACE have produced: • A program which displays unexercised statements and performs an analysis of the FORTRAN code to determine the conditions necessary for their execution;3 the computation and input of significant parameters is highlighted to support test redesign activities. • A program to determine all possible logical transfers and extrapolate these to construct and display all logic paths within a FORTRAN module. 7 • A program to monitor the execution of transfers during program execution;8 a test effectiveness ratio is calculated based upon actual versus potential transfers exercised (used either as an alternative or in conjunction with FLOW statement usage analyses). Parallel research and development activities have resulted in: • A FLOW-like program to produce statement usage frequency without the execution trace feature;9 although the results are not as detailed as those produced by FLOW, the program operation is more efficient and therefore more easily applied to large systems. • Well-defined steps for the adaptation of PACE 189 technology to programming languages other than FORTRAN (e.g., assembly language, COBOL, JOVIAL). This approach toward development of PACE technology has proved successful and has resulted in needed exposure and critique of concepts and techniques. PACE applications have already provided some very meaningful answers to a variety of participants (from programmer to procurer) in a number of software· development activities. As was expected, each new application lends additional insight into the evaluation of existing PACE technology and provides vital information for direction of continued design and implementation. 10 Application of PACE capabilities has stimulated interest in the effectiveness of testing among TRW personnel and its customers and has provided a firm foundation upon which a long-neglected technology2 can now advance. ACKNOWLEDGMENTS Without the cooperation of many individuals the collection and presentation of the FLOW usage results documented he:re would have been an extremely difficult task. Those particularly deserving of mention are A. C. Arterbery, K. W. Krause, Dr. E. C. Nelson, R' M. Poole, R. W. Smith and R. F. Webber. 
REFERENCES

1 D E KNUTH An empirical study of FORTRAN programs Software-Practice and Experience Vol 1 pp 105-133 1971
2 F GRUENBERGER Program testing and validating Computing: A First Course 1968
3 J R BROWN et al Automated software quality assurance: A case study of three systems Presented at the ACM SIGPLAN Symposium Chapel Hill North Carolina June 21-23 1972
4 LTC F BUCKLEY Verification of software programs Computers and Automation February 1971
5 K W KOLENCE A software view of measurement tools Datamation January 1971
6 System/360 scientific subroutine package (360A-CM-03X) version III programmer's manual IBM Application Program H20-0205-3
7 J R BROWN Practical applications of automated software tools To be published in the Proceedings of the Western Electronic Show and Convention (WESCON) Los Angeles California September 19-22 1972
8 R W SMITH Measurement of segment relationship execution frequency TRW Systems (#72-4912.30-31) March 29 1972
9 R H HOFFMAN et al Node determination and analysis program (NODAL) user's manual TRW Systems (#18793-6147-RO-00) June 30 1972
10 J R BROWN R H HOFFMAN Automating software development-A survey of techniques and automated tools TRW Inc May 1972

A design methodology for reliable software systems*

by B. H. LISKOV**
The MITRE Corporation
Bedford, Massachusetts

INTRODUCTION

Any user of a computer system is aware that current systems are unreliable because of errors in their software components. While system designers and implementers recognize the need for reliable software, they have been unable to produce it. For example, operating systems such as OS/360 are released to the public with hundreds of errors still in them.1 A project is underway at the MITRE Corporation which is concerned with learning how to build reliable software systems. Because systems of any size can always be expected to be subject to changes in requirements, the project goal is to produce not only reliable software, but readable software which is relatively easy to modify and maintain. This paper describes a design methodology developed as part of that project.

Rationale

Before going on to describe the methodology, a few words are in order about why a design methodology approach to software reliability has been selected.† The unfortunate fact is that the standard approach to building systems, involving extensive debugging, has not proved successful in producing reliable software, and there is no reason to suppose it ever will. Although improvements in debugging techniques may lead to the detection of more errors, this does not imply that all errors will be found. There certainly is no guarantee of this implicit in debugging: as Dijkstra said, "Program testing can be used to show the presence of bugs, but never to show their absence."3

In order for testing to guarantee reliability, it is necessary to insure that all relevant test cases have been checked. This requires solving two problems:

(1) A complete (but minimal) set of relevant test cases must be identified.
(2) It must be possible to test all relevant test cases; this implies that the set of relevant test cases is small and that it is possible to generate every case.

The solutions to these problems do not lie in the domain of debugging, which has no control over the sources of the problems. Instead, since it is the system design which determines how many test cases there are and how easily they can be identified, the problems can be solved most effectively during the design process: The need for exhaustive testing must influence the design.

We believe that such a design methodology can be developed by borrowing from the work being done on proof of correctness of programs. While it is too difficult at present to give formal proofs of the correctness of large programs, it is possible to structure programs so that they are more amenable to proof techniques. The objective of the methodology presented in this paper is to produce such a program structure, which will lend itself to informal proofs of correctness.
The proofs, in addition to building confidence in the correctness of the program, will help to identify the relevant test cases, which can then be exhaustively tested. When exhaustive testing is combined with informal proofs, it is reasonable to expect reliable software after testing is complete. This expectation is borne out by at least one experiment performed in the past.4

* This work was supported by Air Force Contract No. F19(628)71-C-0002.
** Present Address-Department of Electrical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts.
† The material in this section is covered in much greater detail in Liskov and Towster.2

The scope of the paper

A key word in the discussion of software reliability is "complex"; it is only when dealing with complex systems that reliability becomes an acute problem. A twofold definition is offered for "complex." First, there are many system states in such a system, and it is difficult to organize the program logic to handle all states correctly. Second, the efforts of many individuals must be coordinated in order to build the system.

A design methodology is concerned with providing techniques which enable designers to cope with the inherent logical complexity effectively. Coordination of the efforts of individuals is accomplished through management techniques. The fact that this paper only discusses a design methodology should not be interpreted to imply that management techniques are unimportant. Both design methodology and management techniques are essential to the successful construction of reliable systems.

It is customary to divide the construction of a software system into three stages: design, implementation, and testing. Design involves both making decisions about what precisely a system will do and then planning an overall structure for the software which enables it to perform its tasks. A "good" design is an essential first step toward a reliable system, but there is still a long way to go before the system actually exists. Only management techniques can insure that the system implementation fits into the structure established by the design and that exhaustive testing is carried out. The management techniques should not only have the form of requirements placed on personnel; the organization of personnel is also important. It is generally accepted that the organizational structure imposes a structure on the system being built.5 Since we wish to have a system structure based on the design methodology, the organizational structure must be set up accordingly.*

CRITERIA FOR A GOOD DESIGN

The design methodology is presented in two parts. This section defines the criteria which a system design should satisfy.
The next section presents guidelines intended to help a designer develop a design satisfying the criteria.

To reiterate, a complex system is one in which there are so many system states that it is difficult to understand how to organize the program logic so that all states will be handled correctly. The obvious technique to apply when confronting this type of situation is "divide and rule." This is an old idea in programming and is known as modularization. Modularization consists of dividing a program into subprograms (modules) which can be compiled separately, but which have connections with other modules. We will use the definition of Parnas:7 "The connections between modules are the assumptions which the modules make about each other." Modules have connections in control via their entry and exit points; connections in data, explicitly via their arguments and values, and implicitly through data referenced by more than one module; and connections in the services which the modules provide for one another.

* Management techniques intended to support the design methodology proposed in this paper are described by Liskov.6

Traditionally, modularity was chosen as a technique for system production because it makes a large system more manageable. It permits efficient use of personnel, since programmers can implement and test different modules in parallel. Also, it permits a single function to be performed by a single module and implemented and tested just once, thus eliminating some duplication of effort and also standardizing the way such functions are performed.

The basic idea of modularity seems very good, but unfortunately it does not always work well in practice. The trouble is that the division of a system into modules may introduce additional complexity. The complexity comes from two sources: functional complexity and complexity in the connections between the modules. Examples of such complexity are:

(1) A module is made to do too many (related but different) functions, until its logic is completely obscured by the tests to distinguish among the different functions (functional complexity).
(2) A common function is not identified early enough, with the result that it is distributed among many different modules, thus obscuring the logic of each affected module (functional complexity).
(3) Modules interact on common data in unexpected ways (complexity in connections).

The point is that if modularity is viewed only as an aid to management, then any ad hoc modularization of the system is acceptable. However, the success of modularity depends directly on how well modules are chosen. We will accept modularization as the way of organizing the programming of complex software systems. A major part of this paper will be concerned with the question of how good modularity can be achieved, that is, how modules can be chosen so as to minimize the connections between them. First, however, it is necessary to give a definition of "good" modularity. To emphasize the requirement that modules be as disjoint as possible, and because the term "module" has been used so often and so diversely, we will discard it and define modularity as the division of the system into "partitions." The definition of good modularity will be based on a synthesis of two techniques, each of which addresses a different aspect of the problem of constructing reliable software.
The first, levels of abstraction, permits the development of a system design which copes with the inherent complexity of the system effectively. The second, structured programming, insures a clear and understandable representation of the design in the system software.

Levels of abstraction

Levels of abstraction were first defined by Dijkstra.8 They provide a conceptual framework for achieving a clear and logical design for a system. The entire system is conceived as a hierarchy of levels, the lowest levels being those closest to the machine. Each level supports an important abstraction; for example, one level might support segments (named virtual memories), while another (higher) level could support files which consist of several segments connected together. An example of a file system design based entirely on a hierarchy of levels can be found in Madnick and Alsop.9

Each level of abstraction is composed of a group of related functions. One or more of these functions may be referenced (called) by functions belonging to other levels; these are the external functions. There may also be internal functions which are used only within the level to perform certain tasks common to all work being performed by the level and which cannot be referenced from other levels of abstraction.

Levels of abstraction, which will constitute the partitions of the system, are accompanied by rules governing some of the connections between them. There are two important rules governing levels of abstraction. The first concerns resources (I/O devices, data): each level has resources which it owns exclusively and which other levels are not permitted to access. The second involves the hierarchy: lower levels are not aware of the existence of higher levels and therefore may not refer to them in any way. Higher levels may appeal to the (external) functions of lower levels to perform tasks; they may also appeal to them to obtain information contained in the resources of the lower levels.*

* In the Madnick and Alsop paper referenced earlier, the hierarchy of levels is strictly enforced in the sense that if the third level wishes to make use of the services of the first level, it must do so through the second level. This paper does not impose such a strict requirement; a high level may make use of a level several steps below it in the hierarchy without necessarily requiring the assistance of intermediate levels. The 'THE' system8 and the Venus system10 contain examples of levels used in this way.

Structured programming

Structured programming is a programming discipline which was introduced with reliability in mind.11,12 Although of fairly recent origin, the term "structured programming" does not have a standard definition. We will use the following definition in this paper.

Structured programming is defined by two rules. The first rule states that structured programs are developed from the top down, in levels.* The highest level describes the flow of control among major functional components (major subsystems) of the system; component names are introduced to represent the components. The names are subsequently associated with code which describes the flow of control among still lower-level components, which are again represented by their component names. The process stops when no undefined names remain. The second rule defines which control structures may be used in structured programs.

* The levels in a structured program are not (usually) levels of abstraction, because they do not obey the rule about ownership of resources.
Only the following control structures are permitted: concatenation, selection of the next statement based on the testing of a condition, and iteration. Connection of two statements by a goto is not permitted. The statements themselves may make use of the component names of lower-level components.

Structured programming and proofs of correctness

The goal of structured programming is to produce program structures which are amenable to proofs of correctness. The proof of a structured program is broken down into proofs of the correctness of each of the components. Before a component is coded, a specification exists explaining its input and output and the function which it is supposed to perform. (The specification is defined at the time the component name is introduced; it may even be part of the name.) When the component is coded, it is expressed in terms of specifications of lower level components. The theorem to be proved is that the code of the component matches its specifications; this proof will be given based on axioms stating that lower level components match their specifications.

The proof depends on the rule about control structures in two important ways. First, limiting a component to combinations of the three permissible control structures insures that control always returns from a component to the statement following the use of the component name (this would not be true if go to statements were permitted). This means that reasoning about the flow of control in the system may be limited to the flow of control as defined locally in the component being proved. Second, each permissible control structure is associated with a well-known rule of inference: concatenation with linear reasoning, iteration with induction, and conditional selection with case analysis. These rules of inference are the tools used to perform the proof (or understand the component).
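The paper does not spell these rules out, but the correspondence it names is made precise in Hoare's axiomatic notation (which predates this paper and is not cited in it); in that modern form the three rules read:

\[
\frac{\{P\}\,S_1\,\{Q\}\qquad \{Q\}\,S_2\,\{R\}}{\{P\}\,S_1;\,S_2\,\{R\}}
\qquad
\frac{\{P\wedge B\}\,S_1\,\{Q\}\qquad \{P\wedge\neg B\}\,S_2\,\{Q\}}{\{P\}\ \mathbf{if}\ B\ \mathbf{then}\ S_1\ \mathbf{else}\ S_2\ \{Q\}}
\qquad
\frac{\{P\wedge B\}\,S\,\{P\}}{\{P\}\ \mathbf{while}\ B\ \mathbf{do}\ S\ \{P\wedge\neg B\}}
\]

Concatenation corresponds to chaining assertions (linear reasoning), selection to case analysis on the condition B, and iteration to an invariant argument, that is, induction on the number of loop traversals.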
Structured programming and system design

Structured programming is obviously applicable to system implementation. We do not believe that by itself it constitutes a sufficient basis for system design; rather we believe that system design should be based on identification of levels of abstraction.* Levels of abstraction provide the framework around which and within which structured programming can take place. Structured programming is compatible with levels of abstraction because it provides a comfortable environment in which to deal with abstractions. Each structured program component is written in terms of the names of lower-level components; these names, in effect, constitute a vocabulary of abstractions. In addition, structured programs can replace flowcharts as a way of specifying what a program is supposed to do.

* A recent paper by Henderson and Snowden13 describes an experiment in which structured programming was the only technique used to build a program. The program had an error in it which was the direct result of not identifying a level of abstraction.

Figure 1 shows a structured program for the top level of the parser in a bottom-up compiler for an operator precedence grammar, and Figure 2 is a flowchart containing approximately the same amount of detail.

begin
    integer relation; boolean must_scan; string symbol; stack parse_stack;
    must_scan := true;
    push(parse_stack, eol_entry);
    while not finished(parse_stack) do
        begin
        if must_scan then symbol := scan_next_symbol;
        relation := precedence_relation(top(parse_stack), symbol);
        perform_operation_based_on_relation(relation, parse_stack, symbol, must_scan)
        end
end

Figure 1-A structured program for an operator precedence parser

[Figure 2-Flowchart of an operator precedence parser. The boxes read: initialize; finished? (yes/no); scan symbol if necessary; compute precedence relation; perform operation based on precedence relation.]

While it is slightly more difficult to write the structured program, there are compensating advantages. The structured program is part of the final program; no translation is necessary (with the attendant possibility of introduction of errors). In addition, a structured program is more rigorous than a flowchart. For one thing, it is written in a programming language and therefore the semantics are well defined. For another, a flowchart only describes the flow of control among parts of a system, but a structured program at a minimum must also define the data controlling its flow, so the description it provides is more concrete. In addition, it defines the arguments and values of a referenced component, and if a change in level of abstraction occurs at that point, then the data connection between the two components is completely defined by the structured program. This should help to avoid interface errors usually uncovered during system integration.

Basic definition

We now present a definition of good modularity supporting the goal of software reliability. The system is divided into a hierarchy of partitions, where each partition represents one level of abstraction, and consists of one or more functions which share common resources. At the same time, the entire system is expressed by a structured program which defines the way control passes among the partitions. The connections between the partitions are limited as follows:

(1) The connections in control are limited by the rules about the hierarchy of levels of abstraction and also follow the rules for structured programs.
(2) The connections in data between partitions are limited to the explicit arguments passed from the functions of one partition to the (external) functions of another partition. Implicit interaction on common data may only occur among functions within a partition.
(3) The combined activity of the functions in a partition support its abstraction and nothing more. This makes the partitions logically independent of one another. For example, a partition supporting the abstraction of files composed of many virtual memories should not contain any code supporting the existence of virtual memories.

A system design satisfying the above requirements is compatible with the goal of software reliability.
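A toy sketch, in Python and with invented names, of what requirement (2) looks like in code: the lower partition owns its resource outright, and the higher partition reaches it only through external functions and explicit arguments. This is an illustration of the definition, not an example from the paper.

# Lower partition: supports the abstraction of "segments" and exclusively
# owns the underlying storage resource.
class SegmentLevel:
    def __init__(self):
        self._storage = {}                 # resource owned by this partition only

    def write(self, segment, offset, data):    # external function
        self._storage.setdefault(segment, {})[offset] = data

    def read(self, segment, offset):           # external function
        return self._storage[segment][offset]

# Higher partition: supports "files" built out of segments. It never touches
# the storage directly; every connection is an explicit argument or call.
class FileLevel:
    def __init__(self, segments):
        self._segments = segments          # lower level reached only via its externals
        self._files = {}                   # resource owned by this partition

    def create(self, name, segment_list):
        self._files[name] = segment_list

    def write_record(self, name, record_no, data):
        segment = self._files[name][record_no // 100]
        self._segments.write(segment, record_no % 100, data)

    def read_record(self, name, record_no):
        segment = self._files[name][record_no // 100]
        return self._segments.read(segment, record_no % 100)

files = FileLevel(SegmentLevel())
files.create("log", ["seg0", "seg1"])
files.write_record("log", 103, "hello")
print(files.read_record("log", 103))       # prints: hello

Here FileLevel is connected to SegmentLevel only through explicit arguments and external calls, as requirement (2) demands, and it contains no code supporting the existence of segments themselves, as requirement (3) demands.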
Since the system structure is expressed as a structured program, it should be possible to prove that it satisfies the system specifications, assuming that the structured programs which will eventually support the functions of the levels of abstraction satisfy their specifications. In addition, it is reasonable to expect that exhaustive testing of all relevant test cases will be possible. Exhaustive testing of the whole system means that each partition must be exhaustively tested, and all combinations of partitions must be exhaustively tested. Exhaustive testing of a single partition involves both testing based on input parameters to the functions in the partition and testing based on intermediate values of state vari- 195 abIes of the partition. When this testing is complete, it is no longer necessary to worry about the state variables because of requirement 2. Thus, the testing of combinations of partitions is limited to testing the input and output parameters of the external functions in the partitions. In addition, requirement 3 says that partitions are logically independent of one another; this means that it is not necessary when combining partitions to test combinations of the relevant test cases for each partition. Thus, the number of relevant test cases for two partitions equals the sum of the relevant test cases for each partition, not the product. GUIDELINES FOR SYSTEM DESIGN Now that we have a definition of good modularization, the next question is how a system modularization satisfying this definition can be achieved. The traditional technique for modularization is to analyze the execution-time flow of the system and organize the system structure around each major sequential task. This technique leads to a structure which has very simple connections in control, but the connections in data tend to be complex (for examples see Parnas14 and CohenI5). The structure therefore violates requirement 2; it is likely to violate requirement 3 also since there is no reason (in general) to assume any correspondence between the sequential ordering of events and the independence of the events. If the execution flow technique is discarded, however, we are left with almost nothing concrete to help us make decisions about how to organize the system structure. The guidelines presented here are intended to help rectify this situation. First are some guidelines about how to select abstractions; these guidelines tend to overlap, and when designing a system, the choice of a particular abstraction will probably be based on several of the guidelines. Next the question of how to proceed with the design is addressed. Finally, an example of the selection of a particular abstraction within the Venus systemlO is presented to illustrate the application of several of the principles; an understanding of Venus is not necessary for understanding the example. Guidelines for selecting abstractions Partitions are always introduced to support an abstraction or concept which the designer finds helpful in thinking about the system. Abstraction is a very valuable aid to ordering complexity. Abstractions are introduced in order to make what the system is doing clearer and more understandable; an abstraction is a conceptual simplification because it expresses what is being done 196 Fall Joint Computer Conference, 1972 without specifying how it is done. The purpose of this section is to discuss the types of abstractions which may be expected to be useful in designing a system. 
Abstractions of resources Every hardware resource available on the system will be represented by an abstraction having useful characteristics for the user or the system itself. The abstraction will be supported by a partition whose functions map the characteristics of the abstract resource into the characteristics of the real underlying resource or resources. This mapping may itself make use of several lower partitions, each supporting an abstraction useful in defining the functions of the original partition. It is likely that a strict hierarchy will be imposed on the group of partitions; that is, other parts of the system may only reference the functions in the original partition. In this case, we will refer to the lower partitions as "sub-partitions." Two examples of abstract resources are given. In an interactive system, "abstract teletypes" with end-ofmessage and erasing conventions are to be expected. In a multiprogramming system, the abstraction of processes frees the rest of the system from concern about the true number of processors. Abstract characteristics of data In most systems the users are interested in the structure of data rather than (or in addition to) storage of data. The system can satisfy this interest by the inclusion of an abstraction supporting the chosen data structure; functions of the partition for that abstraction will map the structure into the way data is actually represented by the machine (again this may be accomplished by several sub-partitions). For example, in a file management system such an abstraction might be an indexed sequential access method. The system itself also benefits from abstract representation of data; for example, the scanner in a compiler permits the rest of the compiler to deal with symbols rather than with characters. Simplification via limiting information According to the third requirement for good modularization, the functions comprising a partition support only one abstraction and nothing more. Sometimes it is difficult to see that this restriction is being violated, or to recognize that the possibility for identification of another abstraction exists. One technique for simplification is to limit the amount of information which the functions in the partition need to know (or even have access to). An example of such information is the complicated format in which data is stored for use by the functions in the partition (the data would be a resource of the partition). The functions require the information embedded in the data but need not know how it is derived from the data. This knowledge can be successfully hidden within a lower partition (possibly a sub-partition) whose functions will provide requested information when called; note that the data in question become a resource of the lower partition. Simplification via generalization Another technique for simplification is to recognize that a slight generalization of a function (or group of functions) will cause the functions to become generally useful. Then a separate partition can be created to contain the generalized function or functions. Separating such groups is a common technique in system implementation and is also useful for error avoidance, minimization of work, and standardization. The existence of such a group simplifies other partitions, which need only appeal to the functions of the lower partition rather than perform the tasks themselves. 
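A hedged Python sketch of one such generally useful lower-partition function follows; the function name and the use of Python lists as the two "locations" are assumptions of the sketch:

    def move_chars(source, destination, source_start, destination_start, count):
        # Generalized move: both locations, both starting positions, and the length
        # are parameters, so many specialized copies elsewhere become unnecessary.
        destination[destination_start:destination_start + count] = \
            source[source_start:source_start + count]

    def copy_line_to_buffer(line, buffer):
        # A former special case, now a one-line appeal to the general function.
        move_chars(line, buffer, 0, 0, len(line))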
An example of a generalization is a function which will move a specified number of characters from one location to another, where both locations are also specified; this function is a generalization of a function in which one or more of the input parameters is assumed. Sometimes an already existing partition contains functions supporting tasks very similar to some work which must be performed. When this is true, a new partition containing new versions of those functions may be created, provided that the new functions are not much more complex than the old ones. System maintenance and modification Producing a system which is easily modified and maintained is one of our primary goals. This goal can be aided by separating into independent partitions functions which are performing a task whose definition is likely to change in the future. For example, if a partition supports paging of data between core and some backup storage, it may be wise to isolate as an independent partition those functions which actually know what the backup storage device is (and the device becomes a resource of the new partition). Then if a new device is added to the system (or a current device is removed), only the functions in the lower partition 'will be affected; the higher partition will have been isolated Design Methodology for Reliable Software Systems from such changes by the requirement about data connections between partitions. How to proceed with the design Two phases of design are distinguished. The very first phase of the design (phase 1) will be concerned with defining precise system specifications and analyzing them with respect to the environment (hardware or software) in which the system will eventually exist. The result of this phase will be a number of abstractions which represent the eventual system behavior in a very general way. These abstractions imply the existence of partitions, but very little is known about the connections between the partitions, the flow of control among the partitions (although a general idea of the hierarchy of partitions will exist), or how the functions of the partitions will be coded. Every important external characteristic of the system should be present as an abstraction at this stage. Many of the abstractions have to do with the management of system resources; others have to do with services provided to the user. The second phase of system design (phase 2) investigates the practicality of the abstractions proposed by phase 1 and establishes the data connections between the partitions and the flow of control among the partitions. This latter exercise establishes the placement of the various partitions in the hierarchy. The second phase occurs concurrently with the first; as abstractions are proposed, their utility and practicality are immediately investigated. For example; in an information retrieval system the question of whether a given search technique is efficient enough to satisfy system constraints must be investigated. A partition has been adequately investigated when its connections with the rest of the system are known and when the designers are confident that they understand exactly what its effect on the system will be. Varying depths of analysis will be necessary to achieve this confidence. It may be necessary to analyze how the functions of the partition could be implemented, involving phase 1 analysis as new abstractions are postulated requiring lower partitions or sub-partitions. 
Possible results of a phase 2 investigation are that an abstraction may be accepted with or without changes, or it may be rejected. If an abstraction is rejected, then another abstraction must be proposed (phase 1) and investigated (phase 2). The iteration between phase 1 and phase 2 continues until the design is complete. Structured program.m.ing It is not clear exactly how early structured- programming of the system should begin. Obviously, whenever 197 the urge is felt to draw a flowchart, a structured program should be written instead. Structured programs connecting all the partitions together will be expected by the end of the design phase. The best rule is probably to keep trying to write structured programs; failure will indicate that system abstractions are not yet sufficiently understood and perhaps this exercise will shed some light on where more effort is needed or where other abstractions are required. When is the design finished? The design will be considered finished when the following criteria are satisfied: (1) All major abstractions have been identified and partitions defined for them; the system resources have been distributed among the partitions and their positions in the hierarchy established. (2) The system exists as a structured program, showing how the flow of control passes among the partitions. The structured program consists of several components, but no component is likely to be completely defined; rather each component is likely to use the names of lower-level components which are not yet defined. The interfaces between the partitions have been defined, and the relevant test cases for each partition have been identified. (3) Sufficient information is available so that a skeleton of a user's guide to the system could be written. Many details of the guide would be filled in later, but new sections should not be needed.* A n example from Venus The following example from the Venus systemlO is presented because it illustrates many of the points made about selection, implementation, and use of abstractions and partitions. The concept to be discussed is that of external segment name, referred to as ESN from now on. The concept of ESN was introduced as an abstraction primarily for the benefit of users of the system. The important point is that a segment (named virtual memory) exists both conceptually (as a place where a * This requirement helps to insure that the design fulfills the system specifications. In fact, if there is a customer for whom the system is being developed, a preliminary user's guide derived from the system design could be a means for reviewing and accepting the design. 198 Fall Joint Computer Conference, 1972 programmer thinks of information as being stored) and in reality (the encoding of that information in the computer). The reality of a segment is supported by an internal segment name (ISN) which is not very convenient for a programmer to use or remember. Therefore, the symbolic ESN was introduced. As soon as the concept of ESN was imagined, the existence of a partition supporting this concept was implied. This partition owned a nebulous data resource, a dictionary, which contained information about the mappings between ESNsand ISNs. The formatting of this data was hidden information as far as the rest of the system was concerned. In fact, decisions about the dictionary format and about the algorithms used to search a dictionary could safely be delayed until much later in the design process. 
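A minimal sketch of such a dictionary partition (Python; it ignores the two-part ESN format and the six-character encoding discussed below, and Venus itself was of course not written this way) shows how both the entry format and the search algorithm stay hidden behind the partition's external functions:

    class Dictionary:
        """Maps external segment names (ESNs) onto internal segment names (ISNs)."""

        def __init__(self):
            self._entries = []          # hidden format: a list of (esn, isn) pairs

        def bind(self, esn, isn):
            self._entries.append((esn, isn))

        def lookup(self, esn):
            # Straight serial search, as in the Venus example; replacing it (or the
            # entry format) would not affect any caller of these functions.
            for name, isn in self._entries:
                if name == esn:
                    return isn
            raise KeyError(esn)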
A collective name, the dictionary functions, was given to the functions in this partition. Now phase 2 analysis commenced. It was necessary to define the interface presented by the partition to the rest of the system. Obvious items of interest are ESNs and ISNs; the format of ISNs was already determined by the computer architecture, but it was necessary to decide about the format of ESNs. The most general format would be a count of the number of characters in the ESN followed by the ESN itself; for efficiency, however, a fixed format of six characters was selected. At this point a generalization of the concept of ESN occurred, because it was recognized that a two-part ESN would be more useful than a single symbolic ESN. The first part of the ESN is the symbolic name of the dictionary which should be used to make the mapping; the second part is the symbolic name to be looked up in the dictionary. This concept was supported by the existence of a dictionary containing the names of all dictionaries. A format had to be chosen for telling dictionary functions which dictionary to use; for reasons of efficiency, the ISN of the dictionary was chosen (thus avoiding repeated conversions of dictionary ESN into diction~ry IS N) . When phase 2 analysis was over, we had the identification of a partition; we knew what type of function belonged in this partition, what sort of interface it presented to the rest of the system, and what information was kept in dictionaries. As the system design proceeded, new dictionary functions were specified as needed. Two generalizations were realized later. The first was to add extra information to the dictionary; this was information which the system wanted on a segment basis, and the dictionaries were a handy place to store it. The second was to make use of dictionary functions as a general mapping device; for example, dictionaries are used to hold information about the map- ping of record names into tape locations, permitting simplification of a higher partition. In reality, as soon as dictionaries and dictionary functions were conceived, a core of dictionary functions was implemented and tested. This is a common situation in building systems and did not cause any difficulty in this case. For one thing, extra space was purposely left in dictionary entries because we suspected we might want extra information there later although we did not then know what it was. The search algorithm selected was straight serial search; the search was embedded in two internal dictionary functions (a sub-partition) so that the format of the dictionaries might be changed and the search algorithm redefined with very little effect on the system or most of the dictionary functions. This follows the guideline of modifiability. CONCLUSIONS This paper has described a design methodology for the development of reliable software systems. The first part of the methodology is a definition of a "good" system modularization, in which the system is organized into a hierarchy of "partitions", each supporting an "abstraction" and having minimal connections with one another. The total system design, showing how control flows among the partitions, is expressed as a structured program, and thus the system structure is amenable to proof techniques. The second part of the methodology addresses the question of how to achieve a system design having good modularity. 
The key to design is seen as the identification of "useful" abstractions which are introduced to help a designer think about the system; some methods of finding abstractions are suggested. Also included is a definition of the "end of design", at which time, in addition to having a system design with the desired structure, a preliminary user's! guide to the system could be written as a way of checking that the system meets its specifications. Although the methodology proposed in this paper is based on techniques which have contributed to the production of reliable software in the past, it is nevertheless largely intuitive, and may prove difficult to apply to real system design. The next step to be undertaken at MITRE is to test the methodology by conscientiously applying it, in conjunction with certain management techniques,6 to the construction of a small, but complex, multi-user file management system. We hope that this exercise will lead to the refinement, extension and clarification of the methodology. Design Methodology for Reliable Software Systems ACKNOWLEDGMENTS The author wishes to thank J. A. Clapp and D. L. Parnas for many helpful criticisms. REFERENCES 1 J N BUXTON B RANDELL (eds) Software engineering techniques Report on a Conference Sponsored by the NATO Science Committee Rome Italy p 20 1969 2 B H LISKOV E TOWSTER The proof of correctness approach to reliable systems The MITRE Corporation MTR 2073 Bedford Massachusetts 1971 3 E W DIJKSTRA Structured programming Software Engineering Techniques Report on a Conference sponsored by the NATO Science Committee Rome Italy J N Buxton and B Randell (eds) pp 84.;.88 1969 4 F T BAKER Chief programmer team management of production programming IBM Syst J 111 pp 56-73 1972 5 M CONWAY How do committees invent? Datamation 14 4 pp 28-31 1968 6 B H LISKOV Guidelines for the design and implementation of reliable software systems 7 8 9 10 11 12 13 14 15 199 The MITRE Corporation MTR 2345 Bedford Massachusetts 1972 D L PARNAS Information distribution aspects of design methodology Technical Report Department of Computer Science Carnegie-Mellon University 1971 E W DIJKSTRA The structure of the "THE"-multiprogramming system Comm ACM 11 5 pp 341-346 1968 S MAD NICK J W ALSOP II A modular approach to file system design AFIPS Conference Proceedings 34 AFIPS Press Montvale New Jersey pp 1-13 1969 B H LISKOV The design of the Venus operating system Comm ACM 15 3 pp 144-149 1972 E W DIJKSTRA Notes on structured programming Technische Hogeschool Eindhoven The Netherlands 1969 H D MILLS Structured programming in large systems Debugging Techniques in Large Systems R Rustin (ed) Prentice Hall Inc Englewood Cliffs New Jersey pp 41-55 P HENDERSON R SNOWDEN An experiment in structured programming BIT 12 pp 38-53 1972 D L PARNAS On the criteria to be used in decomposing systems into modules Technical Report CMU-CS-71-101 Carnegie-Mellon University 1971 A COHEN Modular programs: Defining the module Datamation 18 1 pp 34-37 1972 A summary of progress toward proving program correctness by T. A. LINDEN National Security Agency Ft. George G. Meade, Maryland whether the program text is correct with respect to those specifications. The mathematics necessary for this was originally worked out primarily by Floydl and Manna. 2 I t must be made clear that a proof of correctness is radically different from the usual process of testing a program. 
Testing can and often does prove a program is incorrect, but no reasonable amount of testing can ever prove that a nontrivial program will be correct over all allowable inputs. INTRODUCTION Interest in proving the correctness of programs has grown explosively within the last two or three years. There are now over a hundred people pursuing research on this general topic; most of them are relative newcomers to the field. At least three reasons can be cited for this rapid growth: (1) The inability to design and implement software systems which can be guaranteed correct is severely restricting computer applications in many important areas. (2) Debugging and maintaining large computer programs is now well recognized as one of the most serious and costly problems facing the computer industry. (3) A large number of mathematicians, especially logicians, are interested in applications where their talents can be used. Example The approach to proving programs correct which was developed and popularized by Floyd is still the basis for most current proofs of correctness. I t is generally known as the method of inductive assertions. Let us begin with a simple example of the basic idea. Consider the flowchart in Figure 1 for exponentiation to a positive integral power by repeated multiplication. For simplicity, assume all values are integers. I have put assertions or specifications for correctness on the input and output of the program. We want to prove that if X and Yare inputs with Y>O, then the output Z will satisfy Z = KY. This assertion at the output is the specification for correctness of the program. The assertion at the input defines the input conditions (if any) for which the program is to produce output satisfying the output assertion. Note that the proof will use symbolic techniques to establish that the· program is correct for all allowable inputs. The proof technique works as follows: Somewhere within each loop we must add an assertion that adequately characterizes an invariant property of the loop. This has been done for the single loop flowchart of Figure 1. It is now possible to break this flowchart into tree-like sections such that each section begins and ends with assertions and no section contains a loop. This is This paper summarizes recent progress in developing rigorous techniques for proving that programs satisfy formally defined specifications. Until recently proofs of correctness were limited to toy programs. They are still limited to small programs, but it is now conceivable to attempt to prove the correctness of small critical modules of a large program. This paper is designed to give a sufficient introduction to current research so that a software engineer can evaluate whether a proof of correctness might be applicable to some of his problems sometime in the future. THE NATURE OF CORRECTNESS PROOFS Given formal specifications for a program and given the text of a program in some formally defined language, it is then a well-defined mathematical question to ask 201 202 Fall Joint Computer Conference, 1972 shown in Figure 2 if one disregards the dashed-line boxes. We want to show that if execution of a section begins in a state with the assertion at its head true, then when the execution leaves that section, the assertion at the exit must also be true. 
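A hedged Python transliteration of the exponentiation program, with the input assertion, the loop invariant, and the output assertion written as executable checks (the paper's proof is symbolic and does not depend on running such checks; statement order in the original flowchart may differ slightly):

    def power(x, y):
        assert y > 0                    # input assertion
        z, i = x, 1                     # establishes the invariant z = x**i
        while i != y:
            assert z == x ** i          # inductive assertion inside the loop
            z, i = z * x, i + 1
        assert z == x ** y              # output assertion: Z = X**Y
        return z

After repairing the reproduction, the verification condition attached to the loop in Figure 2 appears to read Z = X^I implies [(Y = I implies Z = X^Y) and (Y not equal to I implies Z*X = X^(I+1))], which corresponds to the loop test and assignment above.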
By taking an assertion at the end of each of these sections and using the semantics of the program statement above it, one can generate an assertion which should have held before that statement if the assertions after it are to be guaranteed true. Working up the trees one then generates all the assertions in dashed-line boxes in Figure 2. Each section will then preserve truth from its first to its last assertions if the first assertion implies the assertion that was generated in the dashed-line box at the top. One thus gets the logical theorems or verification conditions given below each section. With a little thought it can now be seen that if these theorems can all be proven and if the program halts, then it will halt with the correct output values. In this case the theorems are obviously true. Halting can be proven by other techniques. - - ---z = Xl __ r - -:--- -- - ----- --, :.lY~~ ~=XYl ~ W~~ ~xX == X~ll! >--+J~___ ----i~: ~YJ ___ _I z =Xl ~ [( Y = I ~ Z = X Y) & (Y f I ~ Z x X == xI+ 1) ] Figure 2-Sectioned flowchart The careful reader will note that the input assumption Y> 0 is not really needed for the proof of either of these theorems. This is because that assumption is really only needed to prove that the program terminates. Inherent difficulties Figure 1-Exponentiation program This process for proving the correctness of programs is subject to many variations both to handle programming constructs which do not occur in this example and to try to make the proof of correctness more efficient. Full treatments with many examples are available in a recent survey by Elspas, et al.,3 and in Manna's forthcoming textbook. 4 Some further general comments about the nature of the problem will be made here. Analogous comments could be made about most of the other approaches to proving correctness. Programs can only be said to be correct with respect to formal specifications or input-output assertions. Summary of Progress Toward Proving Program Correctness There is no formal way to guarantee that these specifications adequately express what someone really wants the program to do. Given a program with specifications on the input and output, there is probably no automatic way to generate all the additional assertions which must be added to make the proof work. For a human to add these assertions requires a thorough understanding of the program. The programmer should be able to supply these assertions if he is able to formalize his intuitive understanding of the program. Given a program with assertions in each loop and given an adequate definition of the semantics of the programming language, it is fairly routine to generate the theorems or verification conditions. Several existing computer programs that do this are described below. The real problem in proving correctness lies in the fact that even for simple programs, the theorems that are generated become quite long. This length makes proving the theorems very difficult for a human or for current automatic theorem provers. Formalizing the programmer's intuition of correctness It may not be apparent, but the process of proving correctness is just a formalization into rigorous logical terms of the informal and sometimes sloppy reasoning that a programmer uses in constructing and checking his program. 
The programmer has some idea of what he expects to be true at each stage of his program (the assertions), he knows how the programming language semantics will transform a stage (generating the assertions in dashed-line boxes of Figure 2), and he convinces himself that the transformations will give the desired result (the proof of the theorem). In this sense proving program correctness is just a way to put into formal language everything one should understand in reading and informally checking a program for correctness. In fact, there is no clear division between the idea of reading code to check it for correctness and the idea of proving it correct by more rigorous means; the difference is one of degree of formality. One question that should be addressed in this context regards the fact that both the correctness and the halting problems for arbitrary programs are known to be undecidable in the mathematical sense. However, this question of mathematical undecidability should not arise for any program for which there are valid intuitive reasons for the program to be correct. Confidence in correctness I hope I have made the point that logical proof of correctness techniques are radically different from 203 testing techniques which are based on executing the program on selected input data in a specific environment. However, I do not want to imply that in a practical situation a proof or anything else can lead to absolute certitude of correctness. In fact a proof by itself does not necessarily lead to a higher level of confidence than might be achieved by extensive testing of a program. From a practical viewpoint there are a number of things that could still be wrong after a proof if one is not careful: what is proven may not be what one thought was proven, the proof may be incorrect, or assumptions about either the execution environment or the problem domain may not be valid. However, a proof does give a quite different and independent view of program correctness, and if it is done well, it should be able to provide a very high level of confidence in correctness. In particular, to the extent that a proof is valid, there should no longer be any doubt about what might happen after allowable but unexpected input values. MANUAL PROOFS The basic ideas in the last section have been known for some time. This section describes the practical progress which has been made with manual proofs in the last few years. The size of programs which can be proven by hand depends on the level of formality that is used. In 1967 McCarthy and Painter5 manually proved the correctness of a compiler for very elementary arithmetic expressions. I t was a formal proof based on formal definitions of the syntax and semantics of the simple languages involved. Rigorous but informal proofs A more informal approach to proofs is now popular. This approach is rigorous, but uses a level of formality like that in a typical mathematics text. Arguments are based on an intuitive definition of the semantics of the programming language without a complete axiomatization. Using these techniques a variety of realistic, efficient programs to do sorting, merging, and searching have been proven correct. The proof of a twenty line sort program might require about three pages. It would now be a reasonable exercise for advanced graduate students. Proofs of significantly more complex programs have also been published. London6 ,7 has done proofs of a pair of LISP compilers. The larger compiler is about 160 lines of highly recursive code. 
It complies almost the 204 Fall Joint Computer Conference, 1972 full LISP language-enough so it can compile itself. It is a generally unused compiler. It was written for teaching purposes, but it is not just a toy program. Another complex program has been proven correct by Jones. s The program is a PL-1 coding of a slightly simplified version of Earley's recognizer algorithm. It is about 200 lines of code. Probably the largest program that has been proven correct is in the work on computer interval arithmetic by Good and London. 9 There they proved the correctness of over 400 lines of Algol code. The largest individual procedure was in the 150-200 line category. A listing of many other significant programs which have been proven correct can be found in London's recent paper. lO If a complex 200 line program can now be proven correct by one man in a couple of months, one can begin to think about breaking larger programs into modules and getting a proof of correctness within a few man years of effort. Clearly there are programs for which a guarantee of the correctness of the running program would be worth not man years but many man decades of effort. We had better take a closer look at the feasibility of such an undertaking and what the proof of correctness would really accomplish. language program would certainly go a long way toward improving the probability that the program will run according to specifications. Errors in the proof An informal proof of correctness typically is much longer than the program text itself-often five to ten times as long. Thus the proof itself is subject to error just like any other extremely detailed and complex task done by humans. There is the possibility that an informal proof is just as wrong as the program. However, a proof does not have any loops and the meaning of a statement is fixed and not dependent on the current internal state of the computer. To read and check a proof is a straightforward and potentially automatable operation. The same can hardly be said for programs. Despite its potential fallibility, an informal proof would dramatically improve the probability that a program is correct. There is evidence from London's work7 that a proof of correctness will find program bugs that have been overlooked in the code. Less rigorous proofs Environment problems In most existing proofs of program correctness, what has been proven correct is either the algorithm or a high level language representation of the algorithm. With today's computers what happens when the program actually runs on a physical computer would still be anybody's guess. It would be a significant additional chore to verify that the environment for the running program satisfies all the assumptions that were made about it in the proof. Problems with round off errors, overflow, and so forth can be handled in proofs. Good and London, 9 Hoare,11 and others have described techniques for proving properties of programs in the context of computer arithmetic, but this can make the proof much more complex. Furthermore, to assure correctness of the running program one would have to be sure that all assumptions about the semantics of the programming language were actually valid in the implementation. The compiler and other system software would have to be certified. Finally, this could all be for naught considering the possibility of hardware failure as it exists in today's machines. 
Thus, proving the correctness of a source language program is only one aspect of the whole problem of guaranteeing the correctness of a running program. Nevertheless, eliminating all errors from the source A person proving a program correct by manual techniques must first achieve a very thorough understanding of all details of the program. This clearly limits manual proof techniques to programs simple enough to be totally comprehended by the program provers. It also means that clarity and simplicity is very important in the program design if the program is to be proven correct. There is another school of thought which places primary emphasis on techniques for obtaining clarity and structure in the program design. Dijkstral2 •l3 as long been the primary advocate of this approach. By appropriately structuring the program and by using what is apparently a much less formal approach to proofs, Dijkstra claims to have proven the correctness of his THE operating system. l4 Millsl6 advocates a similar approach with the program being sufficiently structured so an informal proof can be as short as the program text itself. I t is probably true that more practical results can be obtained with less rigorous approaches to proofs, especially in the near future. I t is even debatable whether the more rigorous proofs give more assurance of correctness, but the formality does make it more feasible to automate the proof process. Whether or not one feels that the rigorous hand proofs of correctness will have much practical value, they are providing experience with different proof techniques that should Summary of Progress Toward Proving Program Correctness be very valuable in attempting to automate the proof process. AUTOMATING PROOFS OF CORRECTNESS In proving program correctness the logical statement that has to be proven usually is very long; however, the proof is seldom mathematically deep and much of it is likely to be quite simple. In the example given previously the theorems to be proven were almost trivial. I t would seem that some sort of automatic theorem proving should be able to be applied in proving program correctness. This has been tried. So far the results have not been very exciting from a practical viewpoint. Computer-generated proofs Fully automatic theorem provers based on the resolution principle generally can prove correctness for very small programs-not much larger than the exponentiation program above. However, Slagle and N orton16 report that they have obtained fully automatic proofs of the verification conditions for Hoare's sophisticated little program FIND17 which finds the nth largest element of an array. In 1969 King18 completed a program verifier that automatically generated the verification conditions and then used a special theorem prover based on a natural deduction principle to automatically prove them. This system successfully proved programs to do a simple exchange sort, to test whether a number is prime, and similar integer manipulation programs. The data types were limited to integer variables and one dimensional arrays. Others have experimented with other data types and proof procedures. At the time of this writing I believe that there is no automatic theorem prover which has proven correctness for a program significantly larger than those mentioned. Automatic theorem provers still cannot handle the length and complexity of the theorems that result from larger programs. 
Another problem lies in the fact that some semantics of the programming language and additional facts about the application area of the program have to be supplied to the theorem prover as axioms. Automatic theorem provers have difficulty in selecting the right axioms when they are given a large number of them. Even in the minor successes that have been achieved, a somewhat tailor-made set of axioms or rules of inference have been used. 205 Computer-aided proofs There are now several efforts directed toward providing computer assistance for proving correctness. This takes the form of systems to generate verification conditions and to do proof checking, formula simplification and editing, and semiautomatic or interactive theorem proving. Unfortunately at this time almost any automation of the proof process forces one into more detailed formalisms and reduces the size of the program that can be proven. This is because the logical size of the proof steps that can be taken in a partially automated proof system is still quite small. Presumably this is a temporary phenomenon. It seems reasonable to expect that we will soon see computer-aided verification systems which make use of some automatic theorem proving and can be used to prove correctness of programs somewhat larger than those that have been proven by hand. Igarashi, London, and Luckham19 are developing a system for proving programs written in PASCAL. The verification condition generator handles almost all the constructs of that language except for many of the data structures. Their approach is based on the work of Hoare. ll •2o Elspas, Green, Levitt, and Waldinger21 are developing a proof of correctness system based on the problemsolving language QA4. 22 It will use the goal-oriented, heuristic approach to theorem proving which is characteristic of that language. Good and Ragland23 have designed a simple language NUCLEUS with the idea that a verification system and a compiler for the language could be proven correct. Both the verification system and the compiler would be written in NUCLEUS and the proofs of correctness would be based on a formal definition of the language. Theintent is that the language would then be able to be used to obtain other certified system software. These three systems give a general idea of the current work going on. A proof-checking system will be described in the next section. Several other interesting systems have been implemented and basic information about them is readily available in London's recent paper. 10 Long-range outlook Proofs of correctness are currently far behind testing techniques in terms of the size and complexity of the programs that can be handled adequately. It is very much an open question whether automated proof techniques will ever be feasible as a commonly used alter- 206 Fall Joint Computer Conference, 1972 native to testing a program. Many arguments pro and con are too subjective for adequate consideration here; however, a few comments are in order before one uses the rate of progress in the past as a basis for extrapolating into the future. Proofs are based on sophisticated symbolic manipulations, and we are still at an early stage of gathering information about ways to automate them. Existing proof systems have been aimed mostly at testing the feasibility of techniques. Few if any have involved more than a couple man years of effort-many have been conceived on a scale appropriate for a Ph.D. dissertation. 
If and when a cost-effective system for proving correctness becomes feasible, it will certainly require a much larger implementation effort. Proofs may be practical only in cases where a very high level of confidence is desired in specified aspects of program behavior. With computer-aided proofs one could hope to eliminate most of the sources of error that might remain after a manual proof. As exemplified by the work of Good and Ragland,23 the verification system itself as well as compilers and other system software should be able to be certified. If the basic hardware/software is implemented with a system such as LOGOS24 for computer-aided design of computer systems, then there should be a reasonable guarantee that the implemented computer system meets design specifications. With sufficient error-checking and redundancy, it should thus be possible to virtually eliminate the danger of either design or hardware malfunction errors. By the end of this decade these techniques may make it possible to obtain virtual certitude about a program's behavior in a running environment. There are many applications in areas such as real-time control, financial transactions, and computer privacy for which one would like to be able to achieve such a level of confidence. SOME THEORETICAL FRONTIERS Proofs of program correctness involve one in a seemingly exorbitant amount of formalism and detail. Some of this is inherent in the nature of the problem and will have to be handled by automation; however, the formalisms themselves often seem awkward. The long formulas and excessive detail may result partially because we have not yet found the best techniques and notation. Active theoretical research is developing many new techniques that could be used in proving correctness. Research in this area, usually called the mathematical theory of computation, has been active since McCarthy's25.26 early papers on the subject. I feel that practical applications for proofs of correctness will develop slowly unless new techniques for proving correctness can significantly reduce the awkwardness of the formalisms required. This section will describe some of the current ideas being investigated. The topics chosen are those which seemed more directly related to techniques for facilitating proofs of correctness. Induction techniques for loops and recursion Proving correctness of programs would be comparatively simple if programs had no loops or recursion. However, some form of iteration or recursion is central to programming, and techniques for dealing with it effectively in proofs have been a subject of intensive study. All the techniques use some form of induction either explicitly or implicitly. The method of inductive assertions described previously handles loops in flowcharts by the addition of enough extra assertions to break every loop and then appeals to induction on the number of commands executed. For theoretical purposes it is often easier and more general to work with recursively defined functions rather than flowcharts. Almost ten years ago McCarthy proposed what he called Recursion Induction26 for this situation. Manna et al. have extended the inductive assertion method to cover recursive, 27 parallel,28 and non-deterministic29 programs. Several other induction principles have been proposed by Burstall,30 Park,3l Morris, 32 and Scott. 33 A development and comparison of the various induction principles has been done recently by Manna, Ness, and Vuilleman. 
34 Formalizing the semantics of programming languages The process of constructing the verification conditions or logical formulation of correctness is dependent on the meaning or semantics of the programming language. One can also take the opposite approach-proving correctness is a formal way of knowing whether a higher level meaning is true of the program. Thus the meaning or semantics of any program in a language is implicitly defined by a formal standard for deciding \vhether the program satisfies a specification. There is a very close interrelation between techniques for formalizing the semantics of a programming language and proofs of program correctness. Floyd's early work on assigning meanings to programsl has been developed especially by Manna2 and Ashcroft. 35 Bursta1l36 gives an alternative way to formulate program semantics in first-order logic. Ashcroft37 has recently summarized this work and described its relevance. Summary of Progress Toward Proving Program Correctness Hoare,1l·2o Igarashi,38 de Bakker,39 and others have worked to develop axiomatic characterizations of the semantics of particular programming languages and constructs. The Vienna Definition Language40 uses an abstract machine approach to defining semantics, and Allen41 describes a way of obtaining an axiomatic definition from an abstract machine definition. The axiomatic definition is generally more useful in proofs. Scott and Strachey have developed another approach to defining semantics42 which is described below. Work on defining the semantics of programming languages is very active with many different approaches being tried. Those described above are only the ones more closely related to proofs. If any of these ideas can greatly simplify the expression or manipulation of properties of programs, they should have a similar simplifying impact on proofs of correctness. Formal notation for specifications Formal correctness only has meaning with respect to an independent, formal specification of what the program is supposed to do. For some programs such specifications can be given fairly easily. For example, consider a routine SORT which takes a vector X of arbitrary length n as an argument and produces a vector Y as its result. With appropriate conventions, the desired ordering on Y is specified by: (Vi,j)[l~i
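The formula was garbled in reproduction; assuming the intended condition is the usual sortedness requirement on a vector of length n, a reconstruction in LaTeX notation is:

$$(\forall i, j)\;[\,1 \le i < j \le n \;\supset\; Y[i] \le Y[j]\,]$$

(A complete specification of SORT would presumably also require Y to be a rearrangement of the elements of X; only the ordering clause is quoted here.)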
a:: a:: COST <{ L.' I 72 SALES I I \ I I JONES \ a:: 0 3: I T 41 i L \ a I 71 SALES T I 1 PROJECT ·1 ENGR I \ I I * IACTIVITY \ I 0 " 10 0 EXPENSES \ I \ I 1 \\0 1 1 0 0 1 1 0 1 0 0 i 1 i 0 0 1 1 1 0 1 \ 1 CUSTOMER 1DIVISION <{ 0 1 0 -l 0 NAME <{ u - OHIO CI) ~ 252 DAVIS 253 NAME 254 255 0 0 0 1 SMITH 0 0 SMITH 0 a.. DIV DEPT TAG TAG / ,\1 REP DESCRIPTOR HIRE DATE SALES RECORD 0 EMPTY ARRAY 1 PROJECT 0 DELETED 0 0 0 1 0 0 WORD RECORD RECORD PERSON NEL RECORD, SEC 1 PERSONNEL RECORD, SEC 2 0 0 255 0 4 - - BIT *PARENT RECORD IDENTIFIER IS EMPLOYEE NAME IN ADDRESS--+ TWO-SECTION PERSONNEL RECORD Figure 5-Associative array map example the PE having the lowest physical address in the array (or arrays). The new record, with its activity field set to a "1," is written into this first empty location. The hardware pointer then moves to the next available empty memory location for writing another record if a batch of new entries must be loaded. If no empty locations are found the program will exit to whatever routine the programmer has chosen for handling this type of error-for example, if appropriate to a specific application, the program may select an age test of all records in a particular file, purging the oldest to make room for the newest. A record once located may be deleted from a file by merely setting the activity bit to an "0." When a specific file is to be processed in some manner, the scattered locations containing the file's records are activated by performing EQC's on both the activity field and an n-bit "file descriptor" tag field. If, as in the example of Figure 5, the file descriptor field is two bits long, the entire selected file will be ready for processing in less than 2 microseconds « 1 p..s for the activity bit search, < 1 p..s for the file descriptor field search). Where record lengths are greater than the 256-bit length of the associative array word, several noncontiguous associative array words may be used to store the single· record in sections, one section per array word. The format for each record section must contain the same activity and file descriptor fields as are used in all record formats, and in addition it must contain a parent record identifier and an n-bit "section identifier" tag field. The scattered locations containing the desired section of all records in the specific file may be activated by performing EQC's on the activity, file descriptor, and section identifier fields. All three searches can be completed in approximately 2 or 3 microseconds. These two or three tag search operations in the AP STARAN permit random placement of records in the physical file and eliminate the bookeeping associated with file structuring and control required in conventional systems. The same approach is used for files which exceed the capacity of the associative arrays-the records of such files are stored in a similar manner on external mass storage devices and are paged into the arrays as required. The strategy used to allocate array storage space can have a significant effect on program execution time. An example is shown in Figure 6 where the products of three operand pairs are required. In A, the operands are stored in a single array word. For 20-bit fixed point operands the three MPF instructions would execute in a total of 1175 microseconds. All similar data sets stored in other array words would be processed during the same instruction execution. 
However, an alternative storage scheme (B) which utilizes three PE's per data set requires only one MPF execution to produce the three products in 392 microseconds. If one thousand data sets were involved in 235 each case the average multiply times per product would be 392 and 131 nanoseconds, respectively, but at the expense, in B, of using 3000 processing elements. Unused bits in B may be assigned to other functions· A last example of how array storage allocation can affect program execution time is shown in Figure 7 where the columns represent fields. Here the sum el, of 16 numbers is required. If the 16 numbers are directly or as a result of a previous computation stored in the same field of 16 physically contiguous array words, the near-neighbor relationships between the processing elements can be used to reduce the number of ADF executions to four. All similar 16 number sets would be processed at the same time. STARAN APPLICATIONS While many papers have appeared (see Minker4 for a comprehensive bibliography) which discuss the application of AM's and AP's in information retrieval, PROBLEM: 0i , bi , ci ,di ,ei ,fj ARE 20 BIT OPERANDS. FORM PRODUCTS ojbj, cjdj , ejfj FOR n DATA SETS FILE METHOD A - ALLOCATE ONE ARRAY WORD (PROCESSING ELEMENT) PER DATA SET SET IDENT PROGRAM A-i. MPF A, B, G 2. MPF C, D, H "3. MPF E, F, J I n sets processed in 1175.A(s (fixed point) METHOD B - ALLOCATE THREE ARRAY WORDS (PROCESSING ELEMENTS) PER DATA SET FIELD NAME - A B C °i Cj bi OJ bj j 01 1 dj ci di j 01 t ej fj ej fj i 01 t I / 1 III ? ! an bn On bn cn dn en fn n 011 cn dn n 011 en fn n 011 PROGRAM B - MPF A, B, C 1 n sets processed in 392 .A(s (fixed point) Figure 6-Effect of array memory allocation on execution time 236 Fall Joint Computer Conference, 1972 A ir traffic control -- ] ~1---~- ~ JJT~ ~~ JT~;:r 0e '< be °9 010 °11 °12 013 16 °14 Lai 1 °15 °16 NUMBER OF OPERATIONS IS -fn2N = ..In216 = 4 Figure 7-Tree-sum example text editing, matrix computations, management information systems and sensor data processing systems, there are none yet published which describe actual results with operating AP equipment in any application. (But see Stillman: for a recent AM application result.) Recent actual applications of the AP have been in real time sensor related surveillance and control systems. These initial applications share several common characteristics: 1. a highly active data base; 2. operations upon the data base involve multiple key searches in complex combinations of equal, greater, between-limits, etc., operations; 3. identical processing algorithms may be performed on sets of records which satisfy a complex search criterion; 4. one or more streams of input data must be processed in real time; and 5. there is a requirement for real time data output in accordance with individual selection criteria for multiple output devices. An example of an actual AP application in an air traffic control environment is shown in Figure 8. In this application a two array (512 processing elements) STARAN 8-500 model was interfaced via leased telephone lines with the output of the FAA ARSR long range radar at Suitland, Maryland. Digitized radar and beacon reports for all air traffic within a 55 mile radius of Philadelphia were transmitted to STARAN in real time. An FAA air traffic controller's display of the type used in the new ARTS-III terminal ATC system and a Metrolab Digitalk-400 digital voice generator were interfaced with STARAN to provide real-time data output. 
The controller's keyboard was used to enter commands, call up various control programs and select display options. Although a conventional computer is not shown explicitly in Figure 8 the sequentially oriented portions of the overall data processing load were programmed for and executed in the STARAN sequential controller as shown in Figure 9. Sequential and associative programs and instruction counts for STARAN are shown in Table II. In a larger system involving multiple sensors and displays, and more ATC functions such as metering and spacing, flight plan processing, and digital communications, the sequential and parallel workloads would increase to the point where a separate conventional computer system interfaced with the AP would be required. The STARAN system was sized to process 400 tracks. Since the instantaneous airborne count in the 55 mile radius of Philadelphia was not expected to exceed 144 aircraft, a simulation program was developed to simultaneously generate 256 simulated ARSR RADAR TELEPHONE SUITLAND, MARYLAND LINES .. r STARAN S- 500 l FAA A portion of the processing inherent in these applications is parallel-oriented and well suited to the array processing capability of the AP. On the other hand these same applications also involve a significant amount of sequentially-oriented computation which would be inefficient to perform upon any array processor, a simple example being coordinate conversion of serially occurring sensor reports. DISPLAY BEACON • RADAR • CONFLICT • DETECT ION RESOLUTION AVOI DANCE TERRAIN AUTOMAT IC vorCE ADVISORY DIGITAL 1 TRACK I NG • CONFLICT • • MONITOR TRACKI NG • DISPLAY PROCESSING VFR VOICE GENERATOR Figure 8-Air traffic control application STARAN 237 DATA PATH CONTROL PATH EXECUTIVE ---...,I I I M~E1_""'...I.:_--.j_.... SC TELETYPE SC''''-f---t ---+---1. . CONTROLLER TAPE READER/PUNCH ON -LINE SC DEBUG AND UTILITY PACKAGE DATA RECEIVER ASSEMBLY KEYBOARD INTERRUPT HANDLER , __________ .J r---I • EXTERNAL DEVICE ARTS m KEYBOARD TARGET SIMULATION ROUTINE .. .....-----....- AVA I SC I I AUTOMATIC VOICE ADVISORY DRIVER I I I I I ASSOCIATIVE PROCESSOR L _______ .., AP I I I AP SC LIVE DATA INTERRUPT HANDLER I I I I SEQUENTIAL CONTROL PROCESSOR CLOCK INTERRUPT SC DATA LINE SC I L ________ ., I I I : 1- - - - - I I AP CONFLICT PREDICTION AP CONFLICT RESOLUTION AVA MESSAGE SELECTOR AP I I L_________'t. ______ ____ :!t __ __________ ~ _________ ...J Figure 9-ATC program organization aircraft tracks. Display options permitted display of mixed live and simulated aircraft. The 400 aircraft capacity is representative of the density expected as North-South traffic loads increase through the late '70s. Conflict prediction and resolution programs based upon computed track data were demonstrated and used to display conflict warning options. Automatic voice services were provided for operator-designated aircraft, thus simulating warning advisories for VFR pilots requesting the service. The voice messages, which in an operational system would be automatically radioed to the pilot, were generated by the Metrolab unit from digital formats produced by the associative processor and broadcast in the demonstration area via a public address system. A· typical message would be read out in voice as, "ABLE BAKER CHARLIE, FAST TRAFFIC SEVEN O'CLOCK, 4 MILES, ALTITUDE 123 HUNDRED, NORTHEAST BOUND". Top level flow charts for four of the associative programs used in the demonstration are shown in Figures 10, 11, 12, and 13. 
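A hedged, sequential Python paraphrase of the promotion rules just quoted (the actual STARAN program applies them associatively to every track at once; the function name, signature, and status strings are invented for this sketch):

    def track_status(successive_correlations, speed_knots):
        # One reading of the rules above: two successive correlating scans plus a
        # speed above 21 knots establish a track; a third successive correlation
        # raises its firmness enough for display as a live target.
        established = successive_correlations >= 2 and speed_knots > 21
        if established and successive_correlations >= 3:
            return "established, firm enough to display"
        if established:
            return "established"
        return "tentative"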
A detailed report is in preparation describing all of the ATC programs used in this demonstration, but some comments on the four flow charts shown may be of interest. Live target tracking (Figure 10) is performed in two dimensions (mode C altitude data was not available) using both radar· and beacon target reports to track all aircraft. Incoming reports are correlated against the entire track file using five correlation box sizes, three of which vary in size with range. Any incoming report which does not correlate with an existing track is used to automatically initiate a new tentative track. An aircraft track must correlate on two successive scans and have a velocity exceeding 21 knots to qualify as an established track and must correlate on three successive scans to achieve a track firmness level high enough to be displayed to a controller as a live target. 238 Fall Joint Computer Conference, 1972 TABLE II-STARAN Air Traffic Control Programs SEQUENTIAL PROGRAMS NAME Executive , ASSOCIATIVE PROORAMS INSTR COUNT Keyboard Inte rrupt Real Time Interrupt Live Data Input Automatic Voice Output > 1600 . INSTR COUNT Tracking System 881 Track Simulation System 415 Turn Detection 88 Conf 1 ict Pred ic t ion 488 Conflict Resolution 296 Automat ic Voice Advisory 709 Display Process ing Total Field Definition Statements Included ret Operating Instructions 1"6"00 There are provisions for 15 levels of track firmness including 7 "coast" levels. If a report correlates with more than one track, special processing (second pass resolve) resolves the ambiguity. Correlated new reports in all tracks are used for position and velocity smoothing once per scan via an alpha-beta tracking filter where for each track one of nine sets of alpha-beta values is selected as a function of track history and the correlation box size required for the latest report correlation. If both beacon and radar reports correlate with a track, the radar report is used for position updating. Smoothed velocity and position values are used to predict the position of the aircraft for the next scan of the radar and for the look-ahead period involved in conflict prediction. Track simulation processing (Figure 11) produces 256 tracks in three dimensions with up to four programmable legs for each track. Each leg can be. of 0 to 5 minute duration and have a turn rate, acceleration, or altitude rate change. A leg change can be forced by the conflict resolution program to simulate pilot response to a ground controller's collision avoidance maneuver command. Targets may have velocities between 0-600 knots, altitudes between 100-52,000 feet, and altitude rates between 0-3000 feet per minute. The conflict prediction program sequentially selects Net Ope rat ing Instructions 1140 4017 514 3493 up to 100 operator-designated "controlled" or "AVA" aircraft, called reference tracks in Figure 12, and compares the future position of each during the lookahead period with the future positions of all live and simulated aircraft and also to the static position of all terrain obstacles. Any detected conflicts cause conflict tags in the track word format to be set, making the tracks available for conflict display processing. A turn detection program not shown opens up the heading uncertainty for turning tracks. 
Display processing (Figure 13) is a complex associative program which provides a variety of manage-by-exception display options and automatically moves operator-assigned alphanumeric identification display data blocks associated with displayed aircraft so as to prevent overlap of data blocks for aircraft in close proximity to one another on the display screen. Sector control, hand off, and quick-look processing is provided. All programs listed in Table II were successfully demonstrated at three different locations in three successive weeks, using live radar data from the Suitland radar at each location. The associative programs were operated directly out of the bulk core and page 0 portions of control memory since there was no requirement, in view of the low 400 aircraft density involved, for the higher speed instruction accesses available from the page memories. At intervals during the demonstration all programs were demonstrated at a speed-up of 20 times real time with the exception of the live data and AVA programs which, being real-time, cannot be speeded up. Timing data for the individual program segments will be available in the final report. The entire program executed in less than 200 milliseconds per 2 second radar sector scan, or in less than 10 percent of real time. All programming effort was completed in 4½ months with approximately 3 man-years of effort. This was the first and, as of this writing, the only actual demonstration of a production associative processor in a live signal environment known to the author. It was completed in June, 1972. Other actual applications currently in the programming process at Goodyear involve sonar, electronic warfare and large scale data management systems. These will be reported as results are achieved.

Figure 10-Live target tracking (correlate one report against all tracks; second pass resolve, once per ambiguous track; smooth track position and velocity, once per 10 seconds; predict tracks' next reporting positions and update track firmness, once per 5 seconds)

Figure 11-Tracking simulation (calculate new position, modified by turn rate or acceleration; tn = time left in leg, β = turning rate, Vt = acceleration rate)

COST EFFECTIVENESS

Associative processor cost effectiveness can be expressed in elementary terms as shown in Figure 14, where performance is shown in terms of millions of instructions per second for the ADF and EQC instructions using two different operand lengths, and cost effectiveness is measured in terms of instructions per second per hardware dollar. This form of presentation was taken from Bell. Another cost effectiveness measure is to compare projected hardware and software costs of an associative configuration and an all-conventional design for the same new system requirements, where the associative configuration may include a conventional computer. Only a few attempts at this approach have been made to date and none have been confirmed through experience. One classified example, using a customer defined cost effectiveness formula, yielded a total system cost effectiveness ratio of 1.6 in favor of the associative configuration. Of the two methods, the first is least useful because there is no way of estimating from these data how much of the associative computing capability can be used in an actual application. The second method is
more meaningful but is exceedingly expensive to use since it implies a significant engineering effort to derive processing algorithms, system flow charts, instruction counts, and timing estimates for both the conventional and the associative approach. The weakest element in this approach lies in the conventional approach software estimate, which historically has been subject to overruns of major dimensions.

Figure 14 (instruction rates and cost effectiveness for the EQC and ADF instructions, 16-bit and 32-bit operands, scale to 2000)

Figure 5-Percent of total machine availability (monthly, 1971 through 1972)

TABLE I-Current Scheduled Maintenance (Monday through Friday)

  Machine          Interval
  Network hub      4:00-8:00
  CDC 6600 (a)     4:00-8:00
  CDC 7600 (b)     4:30-8:00
  CDC 7600 (b)     4:30-8:00
  CDC 7600 (b)     4:30-8:00

  (a) CDC 6600 taken on alternate Mondays.
  (b) Any two CDC 7600's may be taken, but not all three at the same time.

is maximized by conducting scheduled maintenance at a time least visible to the user (0400-0800 hours) and by selecting subsets of components to be down concurrently. For comparison, the scheduled or preventive maintenance (PM) and unscheduled or emergency maintenance (EM) history for the CDC 7600 R (serial No. 1) and CDC 7600 S (serial No. 2) host computers is illustrated in Figure 4. These maintenance actions required the host computers to be off-line and therefore completely unavailable to the user. Figure 5 shows the average total percentage availability for the CDC 7600 R and S host computers, the CDC 6600 L (serial No. 13) and CDC 7600 M (serial No. 31) host computers and the total percentage availability for the PDP-10 hub or control computer. The percentages are arrived at as follows:

  Percent availability = (Hours in Month - (PM + EM)) / Hours in Month

EVALUATION OF DIAGNOSTIC MAINTENANCE TOOLS AND PROCEDURES

The diagnostic maintenance tools do provide for rapid, positive identification and isolation when the component or device failure is solid. However, experience has indicated that most failures tend to be intermittent in nature and difficult to identify and isolate. Even though great amounts of time and money can be spent attempting to isolate intermittent failures, it has not been demonstrated at LLL that intermittent failures become significantly less intermittent when extensive off-line diagnostic periods are used. For this reason, it is concluded that it is better to catalog an intermittent error for administrative analysis, recover as softly as possible, and promptly return the device or component to full productivity rather than insist on the immediate off-line isolation of the problem. Samplings (Figure 6) of system availability taken hourly Monday through Friday from 0800-1630 hours from November 1970 through April 1972 demonstrate the following:

  Total System Availability (all devices on line)                               75 percent*
  Partial System Availability (useful work being accomplished by at least
  one host)                                                                    100 percent*

  * Power failures affecting the entire network are not included in these figures.
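The availability percentage just defined can be computed directly; the sketch below is illustrative only, with made-up maintenance hours rather than LLL's recorded figures.

```python
# Illustrative only: the availability percentage defined above, for made-up
# monthly figures (the PM and EM hours here are not LLL's actual data).

def percent_availability(hours_in_month: float, pm_hours: float, em_hours: float) -> float:
    return 100.0 * (hours_in_month - (pm_hours + em_hours)) / hours_in_month

# A 30-day month with 16 hours of scheduled and 6 hours of emergency maintenance.
print(f"{percent_availability(30 * 24, 16.0, 6.0):.1f} percent")   # -> 96.9 percent
```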
Figure 6-Host computers on-line during sampling period (7600 R, 7600 S, 6600 L, 6600 M, PDP-10, file transport network, IBM Data Cell, IBM Photostore, General Precision disk, 400 series PDP-8 TTY network, 200 series PDP-8 TTY network, 600 series PDP-8 TTY network)

ACKNOWLEDGMENTS

The authors express their appreciation to LLL's Donald L. von Buskirk, Richard E. Potter, Robert G. Werner, and Pete Nickolas for their contributions to this paper.

REFERENCES

1 D L PEHRSON  An engineering view of the LRL Octopus computer network  Lawrence Livermore Laboratory Rept UCID-51754 1970
2 Livermore time-sharing system manual M-026  Lawrence Livermore Laboratory 1972
3 K H PRYOR et al  Status of major hardware additions to Octopus  Lawrence Livermore Laboratory Rept UCID-30036 1972

APPENDIX

Commands available include:

Printer/Punch

  P ALL P1        Send all printer output to on-line printer No. 1. Printer 2 can now go down for maintenance.
  P ALL P2        Send all printer output to on-line printer No. 2. Printer 1 can now go down for maintenance.
  P ALL HSP       Send all printer output to tape for off-line processing. Both on-line printers can now go down for maintenance.
  P KILL P1, P2 or PUNCH   Aborts processing of current printer/punch files on indicated device.
  P2 HSP          Send all printer No. 2 output to tape for off-line processing. Printer 2 can now go down for maintenance.
  P NORMAL        Restores operating status of printer output devices.

Disk

  D N MMM         N is a disk unit number 1 through 4. MMM is either "IN" or "OUT." This will allow or prevent, respectively, the creation of new files on disk N. If "OUT," existing files remain accessible and disk N can be scheduled for maintenance.
  DP N MMM        As described for the D option, but will also purge disk N of all existing files. All files on disk N are destroyed and are no longer accessible.

Drum

  P DRUM DOWN     Transfers system tables from the drum to memory and rewrites these tables to a disk file. All subsequent attempts to access the drum will be redirected to the disk. This provides backup capability for the drum and allows the drum to be taken down for maintenance.
  P DRUM UP       Restores normal operating status of the drum. System tables that have been stored on disk during the drum down period are transferred from disk to memory and rewritten to the drum.

Tape

  C N             Tape unit N is physically not available to the system.
  F N             Tape unit N is physically available to the system.
  E P             A tape error status is returned to program P.
  X N             Severs logical connection between tape unit N and problem program.

Execution

  SP NNNNNN IN TEXT   Allows only privileged user number NNNNNN access to the system. All users' programs are saved on disk, and all users' remote TTY stations are logged out. The TEXT is sent to all users attempting to log in.
  SP NNNNNN OUT       Removes privileged user number NNNNNN, automatically restarts previously running programs, and restores normal time-sharing.
  S TEXT              Prohibits any additional log in. TEXT is sent to remote TTY stations.
  R                   Restores normal time-sharing.

Broadcasts

  I STORE TEXT    Sends the TEXT to all logged in remote TTY stations, sends the TEXT once to all remote TTY's at log-in time, and sends TEXT to TMDS.
  I ERASE         Erases the I STORE TEXT.
  I BROAD TEXT    Broadcasts TEXT once to all remote TTY stations and sends TEXT to TMDS.

The retryable processor
by GEORGE H. MAESTRI, Honeywell Information Systems Inc., Phoenix, Arizona

INTRODUCTION

In the interest of improving readability, instruction retry is presented generically. Technical terms unique to the 6000 are avoided.
A study performed by the U.S.A.F.1 showed that 80 percent of the electronic failures in computers are intermittent. A second study performed by IBM2 stated that "intermittents comprised over 90 percent of the field failures."

The intermittent failure problem

A hard failure is the result of a logic signal either remaining permanently in a one or a zero state or of a signal consistently switching to an improper state (such as an AND gate behaving as an OR). In the case of an intermittent failure, identical instructions executed in different sequences or at different times will not fail consistently. Test and Diagnostic (T&D) programs are designed to diagnose solid failures and are successful at accomplishing their design objectives. They begin by certifying a basic core of processor functions and then read the T&D executive into memory to commence comprehensive testing. No function is used until it is tested. A problem with this approach is that intermittent failures can occur on functions that have been previously certified, completely destroying the rationale of the program. The second, and most likely, problem is that T&D programs seldom detect intermittent failures. To trigger an intermittent failure, the T&D must execute a particular sequence of instructions in an exact order, using the proper data patterns. Also, intermittents are often triggered by stimulus that is beyond the control of programs; thermal variations, mechanical vibration and power fluctuations are examples. Sequence sensitive intermittents can be caused by the following: a low noise threshold in an IC, crosstalk, slow rise or fall times of logic signals, or extra slow or fast gates that activate a normally quiescent race condition.

Alternatives for solution

The ideal method of diagnosing an intermittent failure is to bypass test programs and to diagnose directly from the symptoms of the original failure. The only reason that this method is not in common use is that the set of failure symptoms that are available to programs is inadequate for that purpose. First, it is necessary to know which bits are in error and whether they failed to switch from a one to a zero state or vice versa. Second, all control points should be visible to the diagnostic for all cycles of the failed instruction. A scratchpad memory or snapshot register could save the state of control points and data in case an error is detected. In the case of intermittent failures, the problems associated with using a failing processor to diagnose its own ills will be minimal. Also, a minicomputer or a second processor could do the data processing necessary for diagnosis. If it is not possible to diagnose from the symptoms of the original failure, it will be necessary to run a T&D program to gather additional information about the failure. However, T&D programs are ineffective against intermittent failures because they are usually unsuccessful in detecting them. What is required then is a method of making an intermittent failure solid. Stress testing is often effective in changing an intermittent failure to a solid failure. Stress testing involves setting marginal voltage and timing conditions to amplify the effects of slow rise times, low switching thresholds and race conditions; thermal stress is also applied for the same reasons.
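The snapshot-register idea mentioned above can be pictured in software terms. The following is a minimal, hypothetical sketch; the field names and the dictionary-style machine state are assumptions for illustration, not the 6000's actual facilities.

```python
# Hypothetical sketch of the "snapshot register" idea described above: on error
# detection, freeze the visible machine state so diagnosis can start from the
# symptoms of the original failure. All field names are illustrative assumptions.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Snapshot:
    cycle: int                       # which cycle of the instruction failed
    instruction_counter: int
    address_register: int
    control_flags: Dict[str, bool]   # state of the sequence-control flip-flops
    data_in_error: int               # the word as read, before any correction

@dataclass
class SnapshotMemory:
    entries: List[Snapshot] = field(default_factory=list)

    def capture(self, cpu_state: Dict) -> None:
        """Called by the error-detection path; one entry per failing cycle."""
        self.entries.append(Snapshot(
            cycle=cpu_state["cycle"],
            instruction_counter=cpu_state["ic"],
            address_register=cpu_state["ar"],
            control_flags=dict(cpu_state["flags"]),
            data_in_error=cpu_state["data"],
        ))

# A diagnostic routine could later scan SnapshotMemory.entries to bound the
# suspect logic (the "limits" discussed later in this paper).
```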
Mechanical vibration is applied while the T&D is in execution to locate loose wirewraps, defective connectors, microphonic chips or substrates, conductive debris that is caught between wirewrap pins, and defective printed circuit runs. Error Detection And Correction (EDAC) codes are particularly effective for correcting memory parity errors, which are inherently not recoverable by instruction retry techniques. Algorithms have been developed to allow single or multiple bit failures to be corrected. Some EDAC codes can traverse adders to correct addition errors.

Advantages of instruction retry over other alternatives

There is no reason that an immediate branch to a diagnostic program must exclude an instruction retry. The detection of an error can cause an immediate branch to a diagnostic program that will log and analyze all available symptoms. Failure analysis could result in the generation of a list of all logic elements whose failure could result in the symptoms that were recorded. The boundary between suspect and nonsuspect logic will be called "limits." Once limits are established, they can be analyzed to determine if the failure has been sufficiently isolated to enable a repair. If they have not, the processor can be restored to the state that existed prior to the failure and the instruction can be retried. If the retry attempt is successful, the processor will remain available to the customer until the next failure. Subsequent failures will serve to further narrow the limits by contributing new symptoms. Stress testing requires that the processor be dedicated to T&D, which means that the processor will not be available to the user. Instruction retry will keep the processor available to the user as long as it is successful; maintenance can be performed during slack time. Also, thermal and mechanical stress can inflict new damage. While EDAC is an effective means of correcting memory parity errors, it is incapable of correcting control point errors in the processor. If a word of data and its correction code fail to traverse a switch, because of a control point error, both the correction code and the data will be lost. Since instruction retry is conversely ineffective against memory parity errors and most effective against control point errors, EDAC and instruction retry will complement each other.

Obstacles to retry

A prerequisite to a successful instruction retry is that memory locations and registers associated with the faulted instruction must contain the same data that they did before the instruction was started. If a register or memory location was changed during the execution of the instruction, it must be restored before retry can be attempted. Restoration will not be possible if the contents of a memory location is added to and replaces the contents of a register and an image of the original register contents is not available. Memory parity errors are detected after an error has invalidated the contents of a memory location. Unless memories are duplicated or EDAC is present, memory parity errors cannot be retried. The instruction repertoire of some processors includes instructions that cause the memory controller to perform a destructive read of a memory location.
If an error occurs on an instruction that caused a destructive read, it will be necessary to restore the cleared memory location before retry can be attempted. A MOVE is an instruction that moves a block of data from one area of memory to another. If a MOVE overlays part of its source data, instruction retry will not be possible. For example: if 100 words are moved from location 70 to location 0, words 70 to 100 of the source data can be overlaid. Indirect addressing offers the programmer the capability to address operands via a string of indirect words that are often automatically updated every . time they are accessed. If a .faulted instruction has obtained its operand via such a string, every indirect word in the string must be restored prior to retrying the instruction. If an error occurred during an indirect word cycle, then only the indirect words preceding the failure must be restored. In processors with hard-wired control logic, the multicycle instructions repeatedly change the contents of registers as fast as data can be cycled through the adder. Delaying every cycle for error checking is often an unacceptable degradation of throughput. Consequently, a register could be overlaid with erroneous data before the error can be detected. Instruction overlap is a feature of large scale processors that complicates identifying the failing instruc- , tion. Instruction overlap takes advantage of the fact that no single instruction uses all of the processor logic at any given instant. While one instruction is The Retryable Processor terminating, a second instruction may be using the adder, while a third is undergoing an address preparation cycle and a fourth is being fetched from memory. If instruction overlap is active, the instruction counter may be pointing to the instruction being fetched from memory at the time an error is detected on the instruction that is in the address preparation sequence. Thus, merely safestoring the instruction counter at the time of failure will not serve to identify the failing instruction. Design methodology to avoid obstacles The destruction of data can be avoided for single cycle instructions by not overlaying the register in the first place. The adder sum can be buffered or merely retained on the data lines until error checking has finished. If an error is detected, the instruction can be aborted before the defective data is moved into a register. EDAC can enable the recovery of memory parity errors. At the time of a fault, the state of the cycle control flags and address register could be saved in a snapshot register. The contents of the snapshot register could identify the failing cycle of a MOVE so that software could continue moving the block of data in place of the interrupted MOVE. This will be effective in recovering an error on a MOVE that has overlaid part of its source data. The snapshot register can also serve as a diagnostic aid by saving the state of cycle control flags at the time of an error. One method of restoring a string of indirect words is to obtain a pointer to the first indirect word from the address field of the instruction. Since indirect word updates are performed by a fixed and known algorithm, it will be a simple matter to restore the first word of the string to obtain a pointer to the second and then follow the string; restoring each word to obtain a pointer to the next. However, several pitfalls· exist in the above method. 
One is that if the error occurred on an indirect word cycle, the recovery program must know when to terminate its indirect word restoration activity. Otherwise, it may restore indirect words that have not been updated, thus inducing an error. Also, the recovery program must be able to determine if the indirect word being restored has been damaged by a parity error. Another problem is the possibility that an indirect string may wrap back on itself, causing a word to be updated twice. If the recovery program merely follows the string, without knowledge of the double update, it will fail to restore part of the string and will induce an error when the instruction is retried. An alternative that would also allow software to restore indirect words without inducing errors is to provide a scratchpad memory to save the state of sequence control flags and memory addresses for every cycle of an instruction. If an indirect word string wraps back on itself, causing an indirect word to be updated twice, it would present no problem to the recovery program; the snapshot register will contain two entries for that word, and it will be rolled back twice. Providing intermediate registers will serve both to increase processor speed and to protect the contents of primary registers in case of error. The secondary registers can be placed at the inputs to the adder to hold the operand from memory and the operand from a primary register. The secondary registers will also serve to decrease the execution time of multicycle instructions, by providing a shorter path to the adder. Intermediate registers will allow data manipulation to be performed for multiplies, divides, etc., without changing the contents of the primary registers. The sum, product, quotient, etc., can be moved into a primary register after error checking is complete. Another alternative would be to save an image of the registers every time that an instruction comes to a normal (error-free) termination. Instruction retry could be accomplished by refreshing the primary registers with data from the backup registers. If the processor has instruction overlap capability, it will be necessary to correct the instruction counter when a fault is detected. Otherwise, it may not be possible to determine which of several instructions, that are simultaneously in execution, failed. Another possibility is to provide an instruction counter for each of the instructions that can be simultaneously executed. The instruction counter assigned to the failing cycle can be selectively safestored. A third possibility is to include a failure flag in the scratchpad memory to identify the failing cycle. If the failing cycle is identified in the scratchpad, the instruction containing that cycle can be identified by a program.

Tradeoffs

Figure 1 shows that the simple processor operations, i.e., loads, stores, transfers and instruction fetches, account for 95 percent of all processor operations (excluding address modification). Figure 2 shows that 30 percent of all processor operations are preceded by some type of address modification; of the 30 percent, 25.4 percent is simple register type modification and 4.6 percent involves indirect words. Since register type address modification does not in itself alter register contents, it is not a factor in determining the retryability of a simple instruction. Consequently, if instruction retry is implemented at all, it will be better than 90 percent effective.
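The tradeoff argument above amounts to a weighted count over the instruction mix. The sketch below shows how such a retryability estimate might be computed; the counts and the set of non-retryable operation types are made up for illustration, not taken from Figure 1.

```python
# Illustrative only: estimating retry effectiveness from an instruction mix,
# in the spirit of the Figure 1 analysis. The counts below are made up.

mix = {                      # operation type -> count observed in a trace
    "load": 700_000,
    "store": 350_000,
    "transfer": 450_000,
    "instruction fetch": 900_000,
    "floating add": 3_000,
    "repeated instruction": 50_000,
}
not_retryable = {"floating add", "repeated instruction"}  # assumed restrictions

total = sum(mix.values())
retryable = sum(count for op, count in mix.items() if op not in not_retryable)
print(f"retry can be attempted on {100.0 * retryable / total:.1f} percent of operations")
```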
The mandatory design requirements for instruction retry are: (1) The failure must have been detected by hardware error detection. (2) The failing instruction must be identifiable. (3) Instruction operands must either remain intact or must be restorable. The simple mechanism of holding the adder sum output on the data lines until it has been determined that an error has not occurred will prove effective against processor/memory interface errors, for simple instructions. If the processor does not have instruction overlap capability, merely safestoring the instruction address in a predetermined memory location will serve to identify the failing instruction. If the processor has overlap capability, then a more sophisticated method of identifying the failing instruction is necessary. (See "Design Methodology To Avoid Obstacles.") Once the failing instruction is identified, the opcode and address modification specifier can be examined to determine if the instruction is a candidate for an instruction retry attempt. The addition of a snapshot register bank and other features mentioned in the "Design Methodology" section of this paper will allow multi-cycle instructions and instructions utilizing indirect address modification to be retried. This will raise the effectiveness of instruction retry to almost 100 percent.*

Cost of implementation

The 6000 processor features hard wired control logic and instruction overlap. Its four instruction counters, scratchpad register bank and intermediate registers allow instruction retry to be better than 97 percent successful (see Figure 1).

* 100 percent effectiveness means that instruction retry can be unconditionally attempted.

Figure 1-Instruction frequency analysis

  Number of Operations                      2,653,856
  Number of Instructions                    1,661,723

  Instruction Fetches**                       938,873
  Stores                                      367,811
  Multiplies                                    1,196
  Divides                                         933*
  Transfers                                   479,421
  Shifts                                       41,894
  Floating Adds                                 3,231*
  Floating Multiplies                           2,621*
  Floating Divides                                372*
  Loads                                       743,170
  Load Register, Store Register                 1,078
  Negate                                          450
  Master Mode Entry                               661
  Execute Double, Execute                       4,152
  Repeats                                       5,326
  Repeated Instructions***                     53,260*
  Returns                                       4,088
  Binary Coded Decimal                          2,919
  NOP                                           2,400

  Retryable Operations                      2,589,287  (97.5 percent)
  * Not Retryable                              64,569  (2.5 percent)
  ** Instruction Fetches = 56.5 percent of Number of Instructions
  *** Repeated Instructions = Repeats times 10

Figure 2-Probability of address modification on processor operations3

  Any address modification        761,580
  Address modification = R        644,062
  Address modification = IT        58,161
  Address modification = IR        53,778
  Address modification = RI         5,579

However, none of the above features were incorporated as instruction retry aids and cannot be considered a cost of implementation. The instruction retry effort was not started until after the processor design was frozen. The four instruction counters are an improvement on the 600 line's "ICT Correction" logic. It has always been considered good design practice to accurately identify the location of a fault. The scratchpad register bank was implemented in approximately 400 SSI chips as a dump analysis and T&D aid. The intermediate registers were implemented to speed processor instruction execution. The instruction retry feasibility study, programming and debug efforts required one man year.

3 K ROSENSTEEL  An analysis of dynamic program behavior  Honeywell No ASEE #54 1972

ACKNOWLEDGMENTS

Peter J.
Scola For originating many of the hardware changes that enabled instruction retry to be implemented on the Honeywell 6000 line and for obtaining the necessary funding. Harlow E. Frick For assistance in implementing the Honeywell Instruction Retry Program. CONCLUSION APPENDIX If a processor failure is detected, there are two possible actions that can be taken. One is to abort the program that was in the execution, to prevent propagation of the error. The second is to retry the failing instruction in an attempt to bypass a possible intermittent failure. Since 80 to 90 percent of processor failures are intermittent, there is an excellent chance that instruction retry will succeed. The advantage of retrying the instruction over aborting the program is that a successful instruction retry will keep the system available to the customer while an abort takes it away from him. As long as errors continue to be successfully recovered and performance is not seriously degraded, maintenance can be deferred until a convenient time period. The question of what comprises a serious degradation is probably best answered by the individual user. In a real time application, 3 or 4 percent may be serious; while in an I/O bound batch application, 30 or 40 percent degradation may be tolerable. REFERENCES 1 J P ROTH Phase I I of an architectural study for a self-repairing computer USAF Space and Missile Sys Org Los Angeles CA AD 825 460 18 1967 2 M BALL F HARDIE Effects and detection of intermittent failures in digital systems IBM No 67-825-2137 1967 With the exception of the footnotes in Figure 1, Figures 1 and 2 were extracted from a report by Kenneth Rosensteel3 entitled: "An Analysis of Dynamic Program Behavior". Figures 1 and 2 represent the total number of instructions executed by a mix of FORTRAN, ALGOL and COBOL compilations and executions. In considering address modification, Figure 2 shows the four possible modification types: Register (R)-Indexing according to the named register and termination of the address modification procedure. Register Then Indirect (RI)-Indexing with the named register, then substitution and continuation of the modification procedure as directed by the tag field of this indirect word. (Indirection with pre-indexing.) Indirect Then Register (IR)-Saving the register designator, then substitution and continuation of the modification procedure as directed by the tag field of this indirect word. (Indirection with post-indexing.) Indirect Then Tally (IT)-Indirection, then use of the indirect word as tally and address with automatic incrementing and decrementing. Any-Probability of any type of address modification. Evaluation nets for computer system performance analysis by G. J. NUTT* University of Washington Seattle, Washington INTRODUCTION second transition, a2, has a single input location, b3, and two output locations, bl and b4• For this example, let a dot in a location represent a token residing on that location. Suppose that the definitions of the two transitions, al and a2, specify that they fire when all input locations contain a token and all output locations do not contain a token. Then in Figure 1 (a), transition al is ready to fire. Figure 1 (b) shows the same transitions and locations after transition al has fired. Figure 1 (c) shows the result after firing a2. In this example, the prespecified subsets are the complete sets of input and output locations. Figure 1 may be interpreted as the following situation in a computer system. 
If bl contains a token, the central processor is available. If b2 contains a token, there is a job requesting the central processor. Thus, concurrent occupancy of tokens on bl and b2 indicates that there is a request for the central processor and it is available, causing transition al to take action, (representing central processor allocation). The time required to fire al indicates allocation time, and is negligible. A token on b3 represents central processor activity. The transition time for a2 reflects the length of central processor time for the job, and on completion of firing, the central processor again becomes available, (i.e., a token is placed on bl ) and the job has completed central processor utilization, (i.e., a token is placed on b4 ). Evaluation nets are derived from the work of Petri2 and Noe3 ; they have also been influenced by the work of Holt,4,5 who is primarily responsible for the development of Petri nets. The nets given in this paper· differ from Petri nets in their' 'practicality." The path of a token through a net is well-defined by providing a mechanism to choose from a set of alternate paths that the token might take. Each token in the net is distinct and retains its identity. The token may have a vector of attributes, (capable of taking on values), that may be modified by the various transitions that operate on the token. The time required for each execution of a transition is part of the specification of the net, thus introducing time as a measure of net performance. The growing complexity of modern computer systems has made performance evaluation results more and more difficult to obtain. The difficulty of representation and analysis of combination hardware/software systems has increased with the level of sophistication used in their design. One popular approach that has been used for evaluating proposed computer systems is simulation.l In this paper, a method of representation is presented that is useful in constructing a modular model, where the level of detail may vary throughout. This method has been designed to aid implementation of these models, with a net effect of providing the ability to construct flexible simulation models in a relatively small amount of time. A graphic approach is used so that the two dimensional structure of the machine is pictorially available to th~ simulation designer. The graphs are also useful for planning measurements of either a simulation model or the machine they represent. An evaluation net is made up of transitions interconnected by directed edges with locations. Each location may contain a token. For a particular transition, the members of the set of locations directed into the transition are called input locations, and the members of the set of locations directed away from the transition are called output locations. A transition fires if the set of input and output locations satisfies the definition of that particular transition, causing one token to be removed from each location of a prespecified sllbset of the input locations and one token to be placed on each .location of a prespecified subset of the output locations. Example Figure 1 (a) shows an example with two transitions. The first transition, (the vertical line labeled al), has two input locations, (the circles labeled bl and b2 ). 
The * Present address: Department of Computer Science, University of Colorado, Boulder, Colorado 80302 279 280 Fall Joint Computer Conference 1972 THE CLASS OF EVALUATION NETS (a) In this section we shall describe the class of evaluation nets in detail. We begin by defining location types and statuses. All locations are connected to at least one . transition. If a location is an input (output) location for some transition and is not an output (input) for any other transition, the location is said to be peripheral, e.g., b2 and b4 in Figure 1. If a location is not peripheral, it is an inner location. A location is empty if it does not contain a token, and full if it contains a token, e.g., locations b1 and b2 are full and ba is empty in Figure 1 (a). If it is not known whether the location is empty or full, the status of the location is undefined. An inner location may change from empty to full or full to empty only by the firing of one of the transitions to which it is connected. The conditions for the status of a location to be undefined will be discussed later, as will the utility of this convention. (b) Transition schemata (c) A transition definition is given by a triple, a = ( s, t ( a ) , q), where s is a transition schema (or type), t (a) is a transition time, and q is a transition procedure. The movement of tokens from input locations to output locations is described by the schema. The number of output locations is limited to two, the number of input locations is limited to three, and the number of connected locations is limited to four for any given transition. If a location is empty, its status is denoted as "0". If the location is full, the .status is denoted as "1". The undefined status is given by the symbol, " ,M(r): =l-i] where i is either 0 or 1, r is the label of the peripheral resolution location, and PI, P2 are "Algolic" Boolean expressions (predicates) which! can be evaluated to either true or false, (Nutt7 contains a more complete handling of these predicates). The resolution procedure is evaluated by first evaluating Pl. If it is true, M(r) becomes i and further evaluation of the procedure is discontinued. Otherwise, P2 is evaluated; if it is true, M (r) is set to 1- i. In either case the resolution procedure evaluation is discontinued after predicate P2 is evaluated. Note that when both predicates are evaluated as false, the marking of r remains undefined. The procedure need not be evaluated again until one of the arguments of the predicates changes its status. Examples of resolution procedures are given in the next section. Token attributes and their modification The transition schema definitions imply that no location may contain more than a single token at a time, provided that an initial marking does not place more than one token on any location. For example, the type T transition fires only when the input location is full and the output is empty, hence only an empty location can receive a token. This property of evaluation nets, (known as safety), allows each token to be distinct. Since tokens retain their identity, we shall give them names and associate a list of n attributes with each token, such that each attribute may take on a value. A token, K, with n attributes is denoted as K[n], and if location b contains K[n], we shall write M(b) =K[n] rather than M(b) =1. The }th attribute of the token K is denoted as K ( j). At times we will find use for tokens with no attributes, whose identity is unimportant. 
We shall continue to indicate these tokens by the symbol "I". For example, a resolution location setting will only need to indicate empty or full status, hence can be denoted as M(r) =0 or M(r) =1. The attributes of a token impose a data structure on the locations of a net. Any particular location will always receive (and yield) tokens with a fixed number, n, of attributes. A location, b, which holds tokens with n attributes is denoted as ben]. Hence, more properly, the expression of a marking should be M (b[n]) = K[n]. As long as the context makes the dimension of b clear, we will not insist on the more complete notation. Conceptually, a location b[n] is composed of n "slots" which contain the n attributes of a token residing on the location. The values of the slots are the values of the corresponding attributes. If the location is empty, the values of the slots are undefined. We shall refer to the ith slot as M(b(i», hence if M(b[n]) =K[n], the ith attribute of K may be denoted M(b(i». Let b[m] be an output location of transition ai and an input location of transition aj (see Figure 4). First, suppose that ai produces a token, K[n], to be placed on b[m], where n~m; the resulting M(b[m]) is defined as follows. Let g be the minimum of the integers nand m. Then M(b(l»: =K(l) M(b(2»: =K(2) M(b(g»: =K(g) If n is greater than m, then the remaining attributes of K[ n] are lost. If n is less than m, then the values of M (b (i) ), for n+ 1 :::; i:::; m, are undefined. Next suppose that aj removes the token on b[m]. The number of a.~ a·J F-€~ Figure 4-Number of attributes Evaluation Nets for Computer System Performance Analysis attributes for that token is defined to be m; where jlf(b(n+l)), ... ,M(b(m)) are undefined through the placement of the token on M(b[n]). Although the transition schema of a particular transition defines the locations that are to receive and yield tokens, the identities and attribute modifications are not reflected without specifically providing for them. For example, suppose a transition a= (s, tea), q), has a schema, s, of J(b1[n], b2[n], b3 [n]) and M(b 1) = KI[n], M(b2) =K2[n], where KI[n]~K2[n]. A transition procedure has the form [Pl-7(ell; e12; ... ; eln ) : ••• : Pk-7(ekl; ek2; ... ; ekm)] where the Pi are predicates (l:::;;i:::;;k, k finite), as described previously, and the eij are "Algolic" arithmetic assignment statements, e.g., M(b 3 (4)): =M(b1(4)) +100. A transition procedure is evaluated by the following algorithm: 1. Set i to 1. Go to step 2. 2. If Pi is true, execute (eil; ei2; ... ; eij) and then terminate transition procedure evaluation. Otherwise go to step 3. 3. Set i to i+l. If i is greater than k, terminate the transition procedure evaluation. Otherwise g<> to step 2. Transition firing A transition firing may now be more formally defined as consisting of the following phases: pseudo enabled phase: A transition is pseudo enabled if all locations satisfy the left hand side of a schema except for the undefined status of a peripheral resolution location. Since this status is undefined, the resolution procedure must be evaluated. (The resolution procedure cannot be evaluated unless the transition is pseudo enabled.) enabled phase: A transition is enabled if all location statuses satisfy the left hand side of a schema. The transition then begins operation. active phase: Transition action is in progress. The status of all associated locations does not change. 
terminate phase: The transition completes processing, changing the status of output locations to agree with the right hand side of the schema, then executing the transition procedure, and finally changing the status of the input locations to agree with the right hand side of the schema. 283 The existence of an active phase in a transition firing implies an associated time that the transition requires to carry out its operation. An expression reflecting this time is provided by the second coordinate of the transition description, (s, tea), q). This specification, t (a), may be a constant value, or it may be a function that is evaluated, ( on entering the active phase), for the particular token(s) that enable the transition for a specific firing. It is convenient to express t (a) for a transition of type X or Yas an ordered pair, where the first coordinate is t (a) if the token moves from the location labeled "0" in a Y transition graph. and the second coordinate is t ( a) if the token moves from the location labeled "1". In the X transitions, the first coordinate indicates t (a) if the token moves to the location labeled by a "0" in the graph and the second coordinate indicates tea) if the token moves to the location labeled with a "1". Since tokens that enable a transition reside on the input location(s) during the active phase, transition time imposes a dwell time on each location. The dwell time of a location, b, denoted d(b), is the total amount of time any token resided on location b. The dwell time contributed by a particular token may be greater than the corresponding transition time for the token, since the token may have begun residence on the location without enabling the associated transition. The accumulation of dwell time for a location reflects the "occupancy time" or "busy time" for that location. Dwell times for a particular token may be summed up to provide a measure of the time required for that particular token to traverse the network, hence turnaround time. In N utt, 7 measures of dwell time and their relationship to transition times are explored further. Definition of an evaluation net With the above preliminaries in mind, we can now define an evaluation net. An evaluation net is a connected set of locations over the set of transition schema and is denoted as the 4-tuple E = (L, P, R, A) and an initial marking of the locations, M, where L = A finite, non-empty set of locations. P = A set of peripheral locations, P~L. R = A set of resolution locations, R ~L. A = A finite, non-empty set of transition declarations, {ai}, ai "= (s, t(ai), q) where s is a transition schema, t(ai) is a transition time, and q is a transition procedure. 284 Fall Joint Computer Conference, 1972 EXAMPLE OF AN EVALUATION NET Let us construct a model of a very simple computer system which uses most of the concepts presented in the previous section. In our computer system, a job entering the mix may either wait for a single tape drive if it requests one, or if no tape drive is needed, proceed directly to processing by requesting the central processor. When processing is complete, the job relinquishes the central processor and releases the tape drive if it has been allocated. In the description given below, we shall use the symbol "T" to denote a predicate that is always true. 
For the transition procedures that are implied by the transition schema, (i.e., there is no attribute modification and the transition merely copies the token from an input location to the output location(s) indicated by the schema), the procedure is indicated by a hyphen, bl : b2 : b : 3 b4 : b : 5 b6 : b : 7 be: b : 9 Job ready to enter mix Tape drive is available Job requires tape drive Job does not require tape drive Tape job has drive allocated Job requesting CP CP is idle CP is busy Job is through with CP b lO : b ll : b 12 : b : 13 rl : r2: r3: r4: Tape job ready to release drive Non-tape job ready to vacate Tape job ready to vacate Job 1s complete Routes tape job to b ; Non-tape to b4 3 Chooses job from b or b 4 5 Routes tape job to blO; Non-tape to b ll Chooses job from b l2 or b n Figure 5-Graph of evaluation net "-" Tokens that represent jobs in the computer system will be of the form K[3J, where K(l) =The number of tape drives required, (0 or 1). K(2) =Time required to fetch and mount a tape. K (3) = Central processor time. Let E = (L, P, R, A) be the net, (see Figure 5) R = {rl' r2, ra, r4} P = {bl [3J, bu [3]} UR L = {b2, b3[3J, b4[3J, bs[3J, b6 [3J, b7, bs[3J, b9[3J, blO[3J, bl1 [3J, bI2 [3J} UP A = {al'~, ... , as} al= (X(rl, bl [3J, b4[3J, b3[3J), (0,0), -) i.e., al is a type X transition with input locations rl and bl [3J, which copies tokens to either b3[3J or b4[3J with no time delay. a2= (J(~, b3[3J, bs[3J), M(b3(2)), [T~(M(bs[3J): =M(b3[3J)J) i.e., a2 is a type J transition whose time is determined by the second attribute of the token on the input location, b3[3J. aa= (Y(r2, b4[3], bs[3J, b6[3J), (0,0), -) a4 = (J (b6 [3J, b7 , bs[3J), 0, [T~(M (bs[3J) : = M (b6[3J)) J) as= (F(bs[3J, b9[3J, b7), M(b s(3)), [T~M(b7) :=1]) a6= (X(r3rb9[3J, bu [3J, bIO[3J), (0,10 sec.), -) ~= (F(b IO [3J, b2, bI2 [3J), 0, [T~(M(b2): =1)J) as= (Y(r4, bu [3J, bI2 [3J, bI3 [3J), (0,0), -) rl: [(M(b l (l)) = l)~M(rl): = 1; (M(b l (l)) =O)~M(rl): =OJ i.e., rl takes on the same values as the first attribute of the token on bl [3 J. r2:[T~M(r2): = IJ i.e., r2 is always marked with a one. r3:[(M(b9(1)) =O)~M(ra): =0; T~M(r3): =1J r4:[T~M(r4): =IJ Initially, letM(b2 ) =M(b7) =1. A job enters the net at location bl [3J, (the arrival rate of subsequent jobs is not specified in this example). The existence of a token on bl [3 J pseudo enables transition al since b3[3J and b4[3J are both empty. Resolution procedure rl is evaluated, its marking being determined by the first attribute of the token on b1[3J. Suppose that M (~ (1) ) = 1. Then the token is- moved to location b3 [3J, the transition time being negligible, i.e., teal) is zero. Since M(~) = 1 initially, transition ~ is enabled and becomes active. The transition time for a2 is provided by the second attribute of the token on location b3 [3J (which, let us say, contains "trace data" giving the time required to mount a tape). When this transition time has elapsed, bs[3J receives the token from b3 [3J, (see the transition procedure for ~). The resolution location, r2, is a "tie breaker" and in this case always favors jobs· that have just had the tape drive allocated to them, should two jobs be ready to start requesting the central processor simultaneously. Evaluation Nets for Computer System Performance Analysis The remainder of the net may be interpreted in the manner described above. Let us suppose that the net was put into operation at time to and was halted at time tn. 
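The walkthrough above can be mimicked, very crudely, in software. The following is only a rough sketch of a reduced net of this general kind: locations hold at most one token, every transition fires when all of its inputs are full and all of its outputs are empty after a fixed transition time, and the resolution locations, token attributes and X/Y/F/T schema types of the paper are deliberately omitted. The names and structure are the sketch's own assumptions, not Nutt's notation or implementation.

```python
# A much-reduced sketch of driving an evaluation-net-like model in software.
# Resolution locations, token attributes and the specific schema types of the
# paper are omitted; this is purely illustrative.

import heapq
import itertools

class Net:
    def __init__(self):
        self.marking = {}          # location name -> token or None
        self.transitions = []      # (name, input locations, output locations, time)

    def add_location(self, name, token=None):
        self.marking[name] = token

    def add_transition(self, name, inputs, outputs, time):
        self.transitions.append((name, inputs, outputs, time))

    def enabled(self, transition):
        _, inputs, outputs, _ = transition
        return (all(self.marking[b] is not None for b in inputs) and
                all(self.marking[b] is None for b in outputs))

    def run(self, up_time):
        clock, seq, pending = 0.0, itertools.count(), []
        while clock < up_time:
            for t in self.transitions:          # schedule newly enabled transitions
                if self.enabled(t) and not any(p[2] is t for p in pending):
                    heapq.heappush(pending, (clock + t[3], next(seq), t))
            if not pending:
                break
            clock, _, t = heapq.heappop(pending)
            if not self.enabled(t):
                continue                        # lost its tokens while waiting
            name, inputs, outputs, _ = t
            token = self.marking[inputs[0]]     # move one token through the transition
            for b in inputs:
                self.marking[b] = None
            for b in outputs:
                self.marking[b] = token
            print(f"t={clock:5.1f}  fired {name}")

# The no-tape path of the example above: a job requests the central processor,
# holds it for its processing time, then releases it.
net = Net()
for loc in ("job_requesting_cp", "cp_idle", "cp_busy", "job_done"):
    net.add_location(loc)
net.marking["job_requesting_cp"] = "K1"
net.marking["cp_idle"] = "I"
net.add_transition("allocate_cp", ["job_requesting_cp", "cp_idle"], ["cp_busy"], 0.0)
net.add_transition("execute", ["cp_busy"], ["cp_idle", "job_done"], 12.0)
net.run(up_time=100.0)
```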
The elapsed time, tn - to, is called the system up time and is denoted T Notice that the dwell time of location b7, d (b 7 ), gives the central processor idle time and corresponds to Tu-d(b s[3]). Similarly, the resource utilization of the tape drive is available from Tu-d(b2). If token Km[3] enters location b1[3] at time tim and enters location b13 [3] at time tjm, the expression tjm - tim reflects the turnaround time of the job represented as Km[3]. Let K 1 [3], K 2 [3], ... ,KN [3] be N tokens that traversed the net. Then the mean turnaround time for this mix is given by 'U' N L (tjm-tim)/N m=l or, alternatively, may be computed by summing up the appropriate dwell times and dividing by N. The throughput rate may be expressed as N jTu jobs/system up time Suppose we exercise our model and find that it is insufficient for our purposes, e.g., disk access is completely ignored, but has an affect on the parameters we are measuring. We may choose to change the level of detail of the central processor activity in the net. Figure 6 suggests a slightly more complex net that reflects simultaneous disk I/O with central processor activity. We can replace transitions a4, as, and location bs[3] of Figure 5 by the net shown in Figure 6 (the definition of this modification can be expressed in the b : Job through with OF and disk 9 b 14 : Job ready to use disk and OF b : Disk is idle 15 b : JOb is requesting disk 16 b : Disk is busy 17 b : Job is through with disk 18 b : OF is busy 19 b ! Job is through with OF eO b 2l : Job ready to relinquish OP Figure 6-Parallel central processor and disk activity 285 same manner as illustrated previously, but will not be given here). This implies that another attribute for disk time is necessary, which determines the transition time for an. The transition time for as would become zero, and t(a13) is determined by trace data carried in M (b 19 (3». SUMMARY The class of evaluation nets has been informally described. An evaluation net may be treated as an interpreted marked directed graph, where transitions correspond to vertices and the locations correspond to directed arcs. The arcs are capable of holding a single item of structured data at a time. The graph of the net represents the structure of the system and indicates the control of token flow. The transition procedures interpret the action of the vertices. By operating the net, (in a simulation manner), measures of resource utilization, turnaround, throughput, etc., are available for further analysis of the system. An implementation of the nets might include some "automatic" analysis, such as resource utilization figures. The nets are modular and allow varying level of detail of representation. An interactive implementation of evaluation nets might consist of a net editor with graphic and symbolic output. The graphic output would be used by the designer in structural debugging and the symbolic output could be used by an interpreter to simulate the net. Current studies at the University of Washington include the implementation of evaluation nets. A more formal treatment of the nets may be found in N utt, 7 from which this paper is abstracted. Examples are given which model the Boolean functions of two variables and a Turing machine. A comprehensive evaluation net of the CDC 6400 is presented which shows the structure of the machine and which allows an extensive performance evaluation of the machine at the task/resource level. 
This net includes models of priority queues of arbitrary length and illustrates how queueing algorithms may be handled. Evaluation nets are also compared with Petri nets. Future work, besides the implementation, includes the study of the nets as models for computational processes. ACKNOWLEDGMENT The author is grateful to Jerre D. Noe for his support of the research and to Alan C. Shaw for his editorial suggestions. 286 Fall Joint Computer Conference, 1972 REFERENCES 1 H CLUCAS JR Performance evaluation and monitoring Computing Surveys 3 No 4 pp 79-911971 2 C A PETRI Kommunikation mit automaten PhD dissertation University of Bonn 1962 Translated by C F Greene Jr Applied Data Research Inc Technical Report No RADC-TR-65-377 1 supl1 1966 3 J D NOE A Petri net description of the CDC 6400 Proceedings of ACM Workshop on System Performance Evaluation Harvard University pp 362-378 1971 4 A W HOLT et al Information system theory report Applied Data Research Inc Technical Report No RADC-TR-68-305 1968 5 A W HOLT F COMMONER Events and conditions Record of the Project MAC Conference on Concurrent Systems and Parallel Computation pp 3-52 1970 6 J D NOE G J NUTT Validation of a trace-driven CDC 6400 simulation SJCC Proceedings Volume 40 pp 749-757 1972 7 G J NUTT The formulation and application of evaluation nets PhD dissertation University of Washington Computer Science 1972 Objectives and problems in simulating computers by THOMAS E. BELL The Rand Corporation Santa Monica, California problems, a list of objectives, and, finally, a matrix showing which objectives lead to which problems. With this information he can plan his effort more effectively* and improve the design of his simulation model. INTRODUCTION Because the effort required to simulate a computer system is often very great, analysts should consider carefully the probable value of the results prior to embarking on it. Speciallanguages1-5 have been created to aid the programmer in reducing the time required to code a simulation, and analysis techniques 6- 11 are available to reduce time requirements in the later phases of a study. Still, unexpected problems usually arise: An effort concludes with a study only partly completed because budgeted resources have been exhausted, * or the results may be of less value than anticipated. If the analyst can foresee problems prior to commencing the detailed coding phase of a study, he can avoid many of the problems, mitigate many of the remainder, and allow for the rest in anticipating the payoffs of the effort. While some of the problems encountered have unique characteristics, a common set of them seems to keep appearing in simulation studies of computers. Simply knowing the total list of all common problems is no solution to the analyst who typically goes over budget; his difficulty is sorting out the problems that are most relevant to his situation and ignoring the rest. Trying to plan for the unlikely and unimportant can deflect effort from more appropriate areas and lead to less effective analysis than would occur if the problems were ignored until they appeared. The objectives of the simulation influence how the situation will be approached and which problems will most likely lead to critical difficulties. The challenge facing the analyst is to associate the potential problems with his objectives so that he can anticipate his most probable pitfalls and allocate his resources to solving these problems. 
He needs -a list of SIMULATION PROBLEMS Problems in simulating computer systems could be organized into (1) choosing the language for the simulation, (2) representing the real system appropriately, (3) debugging the simulation, (4) performing experiments, and (5) interpreting the results. ** This classification scheme jumps to the analyst's mind immediately because, chronologically, these are the steps he takes in performing a simulation analysis. Although procedural frameworks are important and may lead to improved simulations, they usually do not attempt to identify which particular issues will be most important for a specific simulation effort throughout the procedure. For example, the analyst, in designing his simulation, must consider the resources available to him and how flexible his work must be. He can choose his simulation language by considering these and several other issues. The underlying problems he encounters in language choice and the other steps in a study amount to resolv- * One of the most important advantages in the planning stage is an ability to predict the costs and specific payoffs of an effort. Overselling the potential payoffs of a simulation not only puts the actual results in question, but decreases the credibility of future simulations. ** A more useful scheme is suggested by Morris (Chairman of the Association for Computing Machinery's Special Interest Group on Simulation) and Mayhan in an unpublished paper:12 (1) Define the problem; (2) select a solution method; (3) develop models; (4) validate models; (5) simulate alternative solutions; (6) select and implement the best alternative; and (7) validate simulation solutions. * See Reference 2, page 2. 287 288 Fall Joint Computer Conference, 1972 ing them correctly. Some of the most troublesome are the following:* 1. Resources. The amount of manpower and machine resources to perform a simulation study may be greater than the expected value of the study, or they may simply exceed the total resources available for the effort. The desire should always be to minimize invested resources, but the characteristics of some simulations make this issue more critical than in other studies. (The total available resources may be very limitedparticularly in terms of elapsed time-and the challenge very great.) Typically, adequate resources are invested in the early phases of an effort with the later phases receiving whatever is available. The issue of resources is mentioned on page 150 of Reference 14. 2. Changes. Changes to improve model validity, to produce additional output, and to reflect modified objectives can prove a major difficulty in some simulations, while they are relatively trivial in others. Although some simulation efforts are not complicated by unexpected changes, quick examination of simulation code often reveals that changes were far more extensive than anticipated. Inadequate appreciation of the need for change can lead to choosing a language that is too inflexible as well as designing code that is too complex. The need for changes in a model is noted on page 87 of Reference 15. 3. Boundaries. In addition to changes as described above, a simulation analyst may find that the boundaries defining the modeled portion of the system change as the study progresses. He may discover that he has attempted to simulate too much of the system and be forced to replace parts of the simulation with simple functions. 
Alternatively, he may find that his boundaries are too narrow, and important interactions are not being reflected. Identifying the degree to which boundaries will need change can alter a simulation's design to reduce the difficulty of boundary redefinition. Dumas16 refers to the problem of boundaries on page 77 of his paper.
4. Costs. Cost models are often of significant utility, particularly when the objective includes analysis for procurement or performance improvement decisions. Their inclusion, however, often implies a heavier investment of resources in order to determine the costs of purchasing hardware or software. Costs of using alternative systems (including costs of delays) often prove particularly difficult to quantify. The importance of cost models is noted in References 17 and 18.
5. Experimental design. Toward the end of many simulation efforts analysts realize that exercising the simulation will not be a straightforward process. At this late date, they begin to consider how to design experiments: Are 500 hours of CPU time adequate to determine the response surface? Many documents deal with the problem, including References 6 and 8-11.
6. Detail. Simulations vary in detail of implementation from those that are relatively gross (References 19 and 20 give examples) to those that represent operations at the micro-instruction level (References 21 and 22 present examples). The level of detail can often be expressed as the smallest increment of time explicitly recognized in the model. If the simulation is performed in a language like GPSS23 or CSS,24 this level is explicitly recognized in the language. However, this indicator of minimum time increment, although quantitative, conceals the essence of the problem, which is to decide on the extent to which system interactions are to be replicated.
7. Accuracy. Analysts should always desire to have the ultimately achievable degree of accuracy in a simulation as high as possible. However, the utility of improved accuracy may be very low and hardly worth the cost. This issue is addressed in References 19 and 25.
8. Validation. An analyst's belief in the accuracy of his simulation is inadequate for evaluating its actual closeness to reality. Only a formal validation effort can reduce the doubt that it is unrepresentative of the real system. The degree of representativeness is usually assumed to be definable by the ability of the simulation to produce a few numbers that are close to the numbers obtained from reality. Other types of validity are often important, however, including correct sequences of operations and correct responses to alterations of input. The analyst must determine the most appropriate degree of effort to be expended in validating his model. Although many simulations of computers are never validated, examples of validation exercises can be found in References 19 and 25.
* Few authors even mention the problems they have encountered in simulating a computer system; this may explain the impression held by some that such efforts are easy. References to sources dealing with specific issues are given in the descriptions of the issues. McCredie and Schlesinger13 mention nearly every one of the issues in their paper.
OBJECTIVES
The objectives of a simulation should be explicitly stated and should be closely related to the decision environment in both terminology and emphasis. Simulation for its own sake is a sterile process and economically unjustifiable.
Some published papers on specific simulations state that the author's objective was to simulate a particular jobstream on a particular hardware/software system. These papers probably reflect the author's orientation toward the problems involved in the simulation activity per se; the decision-environment objectives can usually be deduced from sections titled "Findings" or "Conclusions." Five categories of simulation objectives seem to characterize the bulk of simulations of computer systems. These five categories are as follows: 1. Feasibility analysis-investigating the possibility of performing a conceptualized workload on a general class of computer systems. An example of a feasibility analysis is presented in Reference 26. 2. Procurement decision-making-comparing one or more computer systems with a specific workload to decide which of several (or whether any) computer systems should be procured. For example, Bell Telephone Laboratories reports this type of simulation application in Reference 27, and page 4 of Reference 28 provides a report of Mobil Oil Corporation's application. 3. Design support-projecting the effects of various design decisions and/or tracking .the development of a system. Many simulations have design as the objective. Examples are to be found in References 15, 16, and 29. 4. Determining capacity-for projected systems, determining the processing capacity of various configurations; for existing systems, determining the processing capacity of a load different from the current work. Examples of what were apparently simulations to determine capacity are presented in References 30 and 31. 5. Improving system performance-increasing processing capacity by identifying and changing the most sensitive parts of the hardware/software system. This process is also known as tuning, and examples can be found in References 20 25, and 32. Decision-oriented objectives may be as hard to state at the beginning of an effort as they are to discover in many post-analysis papers~ Nevertheless, analysts somehow manage to choose an approach and then develop ISSUE OBJECTIVE Feasibility Procurement Design Determining Capacity Improving System Resources Changes Boundaries Costs Experimental Design Detail Accuracy Validation Figure I-Desired matrix some solution to each of the issues suggested earlier. Many of these are developed within the context of other choices (e.g., the language to be used) involving additional, mechanistic criteria (e.g., user-directed output). One danger in using this procedure is that the process of making other choices may seriously compromise the simulation's value by directing the simulation into unfruitful areas. Just as importantly, the analyst may attempt to generate a simulation that will do all possible things. McCredie and Schlesinger* point out that attempting simulations "capable of answering almost any reasonable question about the system ... must be paid for by large investments in personnel and computer time." This is true, of course, because the analyst must solve all the problems indicated earlier, and some of these may have solutions for one objective that are inconsistent with solutions for other objectives. Such inconsistent solutions should be detectable by drawing a matrix of the issues and objectives with the general solutions as entries. Figure 1 illustrates such a matrix, but it is not completed because the objectives are not well enough defined to permit identification of even a general solution for each issue. 
For example, the most appropriate level of detail in a study to determine the feasibility of computer logic might be at the microinstruction level as it is in Rummer's study.33 At the same time, a simulation to investigate the feasibility of an entire system might be at so high a level that nothing shorter than a complete job task or data transmission is considered. (This is the case in the studies by Downs, et a1. 26 and Katz.34) Yet both simulations would have * Pages 201-202 of Reference 13. 290 Fall Joint Computer Conference, 1972 ~easibility as the objective. A different categorization scheme is needed for objectives-one that will make it easier to associate problems with objectives by aggregating the decision environment's objectives into classes for the simulation environment. Alternative categorization scheme for objectives The alternative scheme suggested in this paper does not divide the objectives into more categories; instead, it reduces the number and redirects them so that they are more useful in defining answers to the issues suggested above. The alternative defines three categories: absolute projection, sensitivity analysis, and diagnostic investigation. It may appear that all the simulations in each of the decision-oriented five categories map easily, as blocks, into categories in the alternative scheme of three, but exceptions appear often enough that generalizations about mappings are dangerous. Absolute projection This category includes those simulation studies whose objectives can be reduced to the desire to make basically dissimilar comparisons. An example of this type of objective is a situation in which the processing capacities of two systems under a certain load are to be compared. The analyst wishes to determine which system should be procured. Another example is the comparison of a system's processing capacity with the load that it is expected to encounter. (This is usually tested operationally by determining the expected time for the simulated system to process a load and comparing this time with the maximum allowable time.) The decision under consideration in this instance may be whether to procure a certain system or it may be whether to perform a new job on an existing system. A third example of a simulation in this category is one in which response times are being compared with stated requirements. If the proposed system is unable to meet the requirements, then it must be augmented. The important characteristic in each of these examples is the necessity for evaluating an objective function in absolute terms with a high degree of absolute accuracy. If two systems actually differ in processing capacity by 20 percent, the simulation technique must produce answers with absolute errors of less than 10 percent if the analyst is to be sure of choosing the better system. Apparent examples of absolute projection are described in References 14, 26, and 30. Sensitivity analysis Simulations falling in this category emphasize similar comparisons. While simulation studies making absolute projections must have absolute accuracy, sensitivity analyses require good accuracy only in (1) the areas in which two cases are not identical and (2) the areas that significantly interact with the nonidentic~l areas. Although the simulation code may represent far more than the portion of the system under consideration, the primary validation effort should be devoted to the central portion, with reasonableness being the criterion for the rest. 
The remainder of the simulation code is seldom excess (and therefore an embarrassment) for several reasons. First, it usually interacts with the central portion in some manner in which the details are not important, but the general sorts of interaction are important. Second, other sensitivity analyses may use the same simulation code, and building one simulation for several analyses may be the most effiCIent procedure. Third, the boundaries of the central portion are often not identifiable early in the simulation effort because the analyst is not yet familiar with all the interactions. A decision-maker doing sensitivity analysis may require that answers have high reliability, but if he has an alternative that improves on the default by 20 percent, he does not need to have the absolute values of each. His decision is based on the changed value of the objective function rather than its absolute value. A basic characteristic of sensitivity analysis, of course, is the comparison of slightly different alternatives. For the simulation analyst this implies that his simulation must be constructed to facilitate changes. As an example of sensitivity analysis, an analyst might be interested in the effects of changing hardware, changing software, or changing scheduling schemes. Specifically, he might want to know whether increasing the size of buffers results in increased message throughput. With the exception of the changes under consideration, the initial and changed simulations are identical. The analyst must ensure that the changed portion (and the parts it interacts with) are represented accurately, but the remainder of the simulation (probably including disk queuing, front-end processors, file layouts, etc.) can be less accurate. Of course, the possibility exists that the analyst will incorrectly assume that parts of the system are not critical when they really are, but this is the boundary problem that an analyst always faces. He might find comfort in having the simulation agree with reality in correctly reporting message throughput over a wide range of conditions, but his decision can be made on the basis of the ratio of throughputs before and after the increase in message size. Objectives and Problems in Simulating Computers II References 16, 18,21,22, 29, and 35 would appear to give examples of sensitivity analysis. Diagnostic investigation Diagnostic investigations tend to place less emphasis on the value of an objective function. The interest of the analyst is to gain understanding of the detailed manner in which the simulated system behaves. He may be interested in examining interactions, in analyzing aberrations in the real system (or, too often, those peculiar to his simulation), or in tracing the progress of a transaction to determine whether it goes through the system as expected. The emphasis tends to be on performance of very small parts of the system~ Graphical analysis techniques often find application in this type of simulation since detailed sequences of activities may require examination. Diagnostic investigation would appear to be the objective in References 32 and 33. Substudy objectives The global objectives of a simulation study may not match the immediate objectives of an analyst at certain points in a study. For example, an analyst performing a sensitivity analysis study may find that he needs to project absolute performance to determine whether his model's gross interactions yield results that are even remotely correct. 
Then he may wish to verify the details of an alternative scheduling strategy and trace its actions through the scheduling algorithm. Only then does he bother to perform simulation runs for the several alternatives that he has programmed. Although his global objective would fall in the category of sensitivity analysis, the analyst would have performed two substudies with local objectives in the other two categories ef absolute projection and diagnostic investigation. Dumas16 and Ceci and Dangel36 have performed these types of substudies. The substudy objectives in a simulation study constitute a means of attaining the study's objectives and merely reflect short term techniques. While these may be important in performing tasks such as verification and ensuring reasonableness, they are not the objectives that determine the simulation's overall design and should not confuse the analyst about the type of global objectives he is pursuing. If the effort devoted to a substudy becomes large, the analyst should carefully consider whether his substudy effort is relevant, his formal global objectives should be revised, or the global objectives.are simply unattainable. 291 ASSOCIATING PROBLEMS WITH OBJECTIVES The definitions of problems and objectives suggested in the preceding pages have assumed that the analyst is interested in designing his simulation before launching into the details of language choice and coding. The assertion has been implicit that these definitions could be used in associating the problems with the objectives to lead to better simulation designs. Figure 2 represents an attempt to provide this type of aid. The importance of the first five issues (resources, changes, boundaries, costs, and experimental design) are indicated there; the applicability of most of the entries is apparent. For example, limitations on available resources will be a critical problem in absolute projection studies because the entire system must be simulated to a high degree of accuracy, and usually the work must be done in a short time. On the other hand, a diagnostic investigation need only reflect particular parts of the system of interest. Sensitivity analyses lie somewhere between these two extremes. Suggestions regarding the last three issues are of a different character. Rather than indicating importance (largely the degree of resource commitment needed), they suggest approaches that are not necessarily indicative of a particular level of effort; however, taking the right action is critical for a simulation of any objective. Level of detail The most appropriate level of detail for an absolute projection simulation is usually at quite a macro level because the entire system must be simulated, and resources are usually at a premium. At the other extreme, a diagnostic investigation usually must be at a relatively micro level in order to reflect detailed interactions. A sensitivity analysis simulation, however, often represents a combination of levels since it may represent the bulk of the system grossly and the altered part in detail. Accuracy The accuracy of response time or throughput figures in a diagnostic investigation study is usually of superficial importance. The analyst is investigating the manners in which one (or a few) parts of the system interact; investigating the details of one part of a system's behavior does not require overall accuracy of performance parameters. 
Of course, the performance should be reasonable or the behavior will not be reasonable, but high accuracy in performance parameters is not necessary for representative interactions.

ISSUE                  ABSOLUTE PROJECTION    SENSITIVITY ANALYSIS     DIAGNOSTIC INVESTIGATION
Resources              Critical               Important                Desirable
Changes                Desirable              Critical                 Important
Boundaries             Desirable              Important                Critical
Costs                  Desirable              Important                Irrelevant
Experimental Design    Important              Critical                 Desirable
Detail                 Macro                  Moderate                 Micro
Accuracy               Critical Overall       Critical in Places       Reasonableness Only
Validation             Value Comparison       Derivative Comparison    Sequence Checking

Figure 2-Issues vs objectives

For sensitivity analysis, a simulation must closely reflect the differences that will be encountered between the various alternatives under consideration. While absolute values of performance parameters may be comforting, the decision problem at hand requires only the relative difference between similar situations. Absolute projection, of course, requires accuracy in desired performance parameters; undue faith in absolute projections is perhaps the most dangerous error in simulating computer performance.

Validation

Validation is performed to improve the confidence that the required type and degree of accuracy is obtained in a simulation. This means that, in absolute projection, the simulation's projection of performance parameters must be compared with the parameters from the real system. This value comparison is necessary if faith is to be vested in the predicted parameters. In instances where a system does not exist (so no validation can be performed), the analyst should include a caveat with any reported results to indicate that the simulation is of undetermined accuracy.

Sensitivity analyses are often validated by comparing the values of real performance parameters with the predicted values over some set of conditions that are realizable on the actual system being simulated. A projection can then be made of an unvalidated case based on the knowledge that the predicted performance was correct in a number of similar cases; therefore, the changes in performance were accurately reflected and probably will be in the new case. This approach may be excessively expensive because it requires accuracy in parts of the simulation that are not to be altered. An alternative is to compare only changes in performance due to specific changes in the system. In this case, only the fractional changes need be compared, so significant savings may be possible. This less exhaustive process is analogous to comparing derivatives rather than absolute values of functions. In many cases, the analyst only needs to determine whether it is positive, negative, or zero.

Validation of diagnostic investigations requires even less rigor than for sensitivity analyses. Since the emphasis is on examining detailed interactions, the analyst usually only needs to ensure that the sequence of operations is correct. Even this validation can be quite time-consuming and frustrating if the analyst is restricted to viewing flowcharts. Powerful graphical techniques for showing interactions are very useful here.

APPLICATION

The categorization schemes and matrix presented in this paper are without value unless they can aid analysts in planning analyses and designing simulations. Two examples will be given to indicate how they can be applied.
One example uses a simulation that was performed without reference to such schemes and illustrates how the effort could have been aided by their use. (The problems encountered led to developing the schemes and matrix.) The second example presents a situation in which the schemes were applied in order to avoid problems that otherwise might have arisen.

Example 1: Simulating without reference to the matrix

This first example involves a simulation of the Video Graphics System (VGS) performed during the implementation of software on newly designed hardware. The system uses a central communications switching and controlling machine-an IBM 1800-that communicates with a series of terminals and several service machines. The service machines execute user code and send digital representations of pictures to the 1800 for conversion to analog representation in a special picture generator controlled by the 1800. One picture generator and three scan converters service all users (presently 28) who employ terminals with raster-scanned screens that can be slightly modified broadcast television sets. Various input devices, including keyboards, are added to the sets to enable two-way communications. The objective of the system is to supply high-powered interactive graphics capability to many users at a moderately low cost through time-shared use of the expensive digital-to-picture hardware. The system as a whole is described in Reference 37 and a description of the modeled portion of the system is presented in Reference 38.

Objectives

Prior to doing any simulation coding, we spent time learning about the system's characteristics and developing a set of simulation objectives. We then distributed a preliminary description of our understanding of the system along with our proposed objectives. (The objectives were expressed as questions that needed answers.) The characters to the left of each objective did not appear in the original (taken from Page 80 of Reference 38). They indicate the type of objective, and the characters stand for the following:

A  Absolute projection.
S  Sensitivity analysis.
D  Diagnostic investigation.

Although several additional objectives were added during the study, the objectives listed below were retained for its duration. Many of these objectives, however, were not addressed due to lack of time and the belief that the questions could not be adequately answered with the simulation.

A  1. Under what load conditions will the system give poor response? (It may be feasible to alter the load by user education as well as by changing characteristics of such software support as the Integrated Graphics System.)
A  2. Will messages be unduly delayed in the VMH* system in the 360s?
A  3. Will channel cycle-stealing slow the 1800 CPU enough that input data are lost due to delays in processing?
S  4. Will a ping-pong system decrease response time of the VGS?
D  5. What will be the effect of the 1800 waiting at interrupt level four while buffers are unavailable for service machine input?
D  6. What will be the effect on the 1800 of one service machine being unresponsive for a short period?
A  7. What portion of system capacity does a Tablet take? (It might be profitable to disable a Tablet that is temporarily not in use, or to use a keyboard instead of the Tablet.)
S  8. How useful would more core be in the 1800?
S  9. How useful would another 1800 be?

* Video Message Handler, essentially an access method.
These objectives included four in the category of absolute projection, three in sensitivity analysis, and two in diagnostic investigation. Since we had objectives in each of the three objective categories, we can see from Figure 2 that we needed a simulation that was at a macro level but also (conflicting) in micro detail. In addition, the simulation had to have a high degree of overall accuracy, be easy to change, have easily altered boundaries, use few resources, and be extensively validated. While no one noted the extreme difficulty of achieving all the objectives at the time we stated them, our proposed categorization schemes and matrix of solutions quickly reveal how difficult it would be to achieve them all.

Diagnostic investigations

We decided to create a simulation at a low level of detail; the basic time increment in this GPSS simulation was 50 microseconds. It traced all normal interactions in the 1800 and used approximate timing information generated by multiplying the number of instructions in a module by the average time per instruction as measured during an early run of the system. Most of the actual work with the simulation involved diagnostic investigations, including objectives 5 and 6. In addition to the objectives stated before coding began, we investigated cases of potential deadlock and the platooning that were characteristic of the system. Interactive computer graphics was used extensively to aid in investigating these situations; hardcopy graphics was used to document the results and communicate with system designers. Figure 3 shows a typical output, complete with the analyst's marginal notes. This display shows, over simulated time, the priority level of the executing software at the top; the entire bottom of the display presents a Gantt chart. This Gantt chart shows which routines are in control at each moment of simulated time. With these displays the simulation served admirably in answering questions during diagnostic investigations.

Figure 3-Graphical output (display panels: priority level over simulated time, statistics, Gantt chart, variable graph)

Sensitivity analysis

The initial sensitivity analysis objectives were addressed, but delays in validation caused us to be very reluctant to put much faith in the results. Early use of the real system indicated that some functions performed by software should be implemented in hardware, and we decided to add an objective about the utility of this change. We found, to our surprise, that total system utilization would be only marginally affected by the change. Without validation, we discounted this result initially. The importance of the issue, however, led to a substudy with strong characteristics of diagnostic investigation to explain the result.
We discovered that low priority attempts by the system to clean tip various queues caused processing in the altered case (with hardware implementation) but not in the initial case (with a software implementation). Eventually, we did perform a validation of the simulation and found that, within the context of sensitivity analysis, the simulation had quite adequate accuracy. (See Reference 39 for details of this validation effort.) Absolute projection The largest number of objectives for this simulation study fell into the category of absolute projection. Ob- jective 7 (regarding the portion of system capacity used by a single RAND Tablet) is typical of these, and illustrates the problems of using a simulation for absolute projection when it is designed to fulfill other objectives too. The first problem' is that the portion of system required by a Tablet varies with system loading. As the load increases, the overhead to handle a Tablet (contrary to usual system performance) decreases. Therefore, a single number is inadequate to represent performance in general. This characteristic of systems (performance not being easily represented in simple ways) appears in most systems, but, in absolute projection, stating the fact is often considered unacceptable by people desiring simple answers. Secondly, the absolute projections, in comparison with measured results, tended to be optimistic by a factor of about two. That is, reality took twice as long as predicted by the simulation. Since projections were based on average instruction times, we put the 1800 processor into a very restricted processing loop (83 instructions) to separate timing assumptions from interaction representations, computed the predicted time to execute the instructions (using published, manufacturer supplied timings), and measured the actual time to execute them. In a variety of instances the actual and predicted did not agree; in one of the clearest cases the prediction was 155 microseconds and the measured time was 220 microseconds. This last difference led us to Objectives and Problems in Simulating Computers doubt that our bottom up approach to generating timings would ever lead to fulfilling most of the absolute projection objectives since we did not understand some of the interactions between hardware and software. (One of the few objectives that were usefully addressed, even if not rigorously answered, was objective 7. The predicted system loading was so high that even gross errors in the simulation would not lead to acceptable performance. Predicting this performance problem helped strengthen the case for hardware implementation of some of the functions necessary for Tablet operation.) SUInm.ary Many of the anticipated payoffs of the simulation were not realized because its objectives implied inconsistent solutions to problems in simulation. While the results were useful in fulfilling some objectives, a review of the problems to be encountered in achieving the other objectives could have allowed us to rank them and to consider, before coding the simulation, whether its design was most appropriate in aiding the VGS designers. Example 2: Referencing the matrix before simulating This second example involves a ~imulation of a very large information management system. The simulation was undertaken during the design stage; no hardware was yet available for running any validation tests. A "packaged simulator" was to be used to determine the size of hardware to be ordered. 
The objective clearly fell into the absolute projection category, and yet, validation of program descriptions could not be performed. While management wished to know the precise configuration that should be acquired, facilities were not available for performing the necessary validations of overall accuracy. Diagnostic investigation We suggested that diagnostic investigations be undertaken to determine whether some critical portions of software would perform as expected. The micro-level simulations could be checked for correct sequences and, as soon as hardware was available, the reasonableness of the predictions could be validated. Absolute projection The need for information about appropriate hardware configurations was very real, so we suggested that 295 a multi-phase strategy be pursued. During the period when no validation was possible, important programs could be simulated at a macro level to see whether obvious design problems existed. (If the simulation predicted 100 hours to run each of ten daily programs, even the most skeptical analyst would question the design. A number of such instances were discovered and corrected.) The important element of this phase was to devote heavy effort only to cases where problems clearly exceeded the potential errors in the simulation. Concurrently, techniques for describing programs were checked by employing them on software being run on an existing system. This effort led to changes in the descriptions of software for use in simulations. Later, preliminary validation could be performed using data made available from configurations used in testing. Since analysts had already completed initial simulations of the programs, validation and revision could be accomplished in the short time between availability of initial data and the required hardware ordering date. SUInInary Suggestions about a more appropriate procedure for this example could clearly be made without our schemes and matrix. In practice, however, they often are forgotten in the rush to implement something and show results. Further, opinion about the difficulty of a specific task is a weak tool to use in convincing people who are unfamiliar with simulation's limitations or under heavy pressure to "get on with the job." The categorization schemes and matrix of solutions are convenient techniques for indicating the requirements to achieve a certain objective in comparison with other potential objectives. RECOMMENDATIONS We have found the application of this approach useful in planning and designing our own simulations and in helping other analysts to improve theirs. It proves particularly useful in predicting how much effort is appropriate for validation exercises and what form such validation should take. While an experienced simulation analyst may feel that it expresses little that he does not already know, too many analysts fail to apply their knowledge rigorously in the early stages of a simulation effort. We suggest that analysts force themselves to state objectives clearly-and in writing-at the beginning of a simulation effort. They should then consider whether all their objectives are realizable when using the suggested solutions to the problems listed in the matrix of 296 Fall Joint Computer Conference, 1972 Figure 2. Only after assuring themselves that the effort can result in fulfilling the objectives should they design the simulation. Finally, they should consider whether the achievement of the objectives will justify the cost required to implement and validate the simulation. 
REFERENCES 1 L J COHEN S3 The system and software simulator Digest of the Second Conference on Applications of Simulation ACM et al New York December 1968 pp 282-285 2 N R NIELSEN ECSS: An extendable computer system simulator The Rand Corporation RM-6132-NASA February 1970 3 J N BAIRSTOW A review of system evaluation packages Computer Decisions Vol 2 No 6 June 1970 p 20 4 W C THOMPSON The application of simulation in computer system design and optimization Digest of the Second Conference on Applications of Simulation ACM et al New York December 1968 pp 286-290 5 G K HUTCHINSON J N MAGUIRE Computer systems design and qnalysis through simulation Proceedings AFIPS 1965 Fall Joint Computer Conference Part 1 pp 161-167 6 G S FISHMAN Estimating reliability in simulation experiments Digest of the Second Conference on Applications of Simulation ACM et al New York December 1968 pp 6-10 7 T E BELL Computer graphics jor simulation problem-solving Third Conference on Applications of Simulation ACM et al New York December 1969 pp 47-56 (Also available as RM-6112 The Rand Corporation December 1969) 8 D P GAVER JR Statistical methods for improving simulation efficiency Third Conference on Applications of Simulation ACM et al N ew York December 1969 pp 38-46 9 T H NAYLOR K WERTZ T H WONNACOTT Methods for analyzing data from computer simulation experiments Communications of the ACM Vol 10 No 11 November 1967 pp 703-710 10 G S FISHMAN Problems in the statistical analysis of computer simulation experiments: the comparison of means and the length of sample records The Rand Corporation RM-4880-PR February 1967 11 G A MIHRAM An efficient procedure for locating the optimalsimular response Fourth Conference on Applications of Simulation ACM et al New York December 1970 pp 154-161 12 M F MORRIS A J MAYHAN Simulation as a process Simuletter Vol 4 No 1 October 1972 pp 10-15 13 J W McCREDIE S J SCHLESINGER A modular simulation of TSS/360 Fourth Conference on Applications of Simulation ACM et al New York December 1970 pp 201-206 14 H A ANDERSON Simulation of the time-varying load on future remote-access immediate-response computer systems Third Conference on Applications of Simulation ACM et al New York December 1969 pp 142-164 15 A L FRANK The use of simulation in the design of information systems Digest of the Second Conference on Applications of Simulation ACM et al New York December 1968 pp 87-88 16 K DUMAS The effects of program segmentation on job completion times in a multiprocessor computing system Digest of the Second Conference on Applications of Simulation ACM et al New York December 1968 pp 77-78 17 S R CLARK T A ROURKE A simulation study of cost of delays in computer systems Fourth Conference on Applications of Simulation ACM et al New York December 1970 pp 195-200 18 N R NIELSEN An analysis of some time-sharing techniques Communications of the ACM Vol 14 No 2 February 1971 pp 79-90 19 J D NOE G J NUTT Validation of a trace-driven CDC 6400 simulation Proceedings AFIPS 1972 Spring Joint Computer Conference Vol 40 1972 pp 749-757 20 J H KATZ Simulation of a multiprocessor computer system Proceedings AFIPS 1966 Spring Joint Computer Conference Vol 28 pp 127-157 21 S C CATANIA The effects of input/output activity on the average instruction time of a real-time computer system Third Conference on Applications of Simulation ACM et al New York December 1969 pp 105-113 22 S E McAULAY J obstream simulation using a channel multiprogramming feature Fourth Conference on Applications of Simulation ACM et al New York 
December 1970 pp 190-194
23 General purpose simulation system/360 user's manual H20-0326 International Business Machines Corporation White Plains New York 1967
24 Computer system simulator II (CSS II) general information manual GH20-0874 International Business Machines Corporation White Plains New York 1970
25 P E BARKER H K WATSON Calibrating the simulation model of the IBM system/360 time sharing system Third Conference on Applications of Simulation ACM et al New York December 1969 pp 130-137
26 H R DOWNS N R NIELSEN E T WATANABE Simulation of the ILLIAC IV-B6500 real-time computing system Fourth Conference on Applications of Simulation ACM et al New York 1970 pp 207-212
27 J M JENKINS R G MAHER Uses of simulation in the design of large scale information systems Digest of the Second Conference on Applications of Simulation ACM et al New York December 1968 pp 85-86
28 R A CANNING Data processing planning via simulation EDP Analyzer Vol 6 No 4 April 1968 13 pp
29 M H MacDOUGALL Simulation of an ECS-based operating system Proceedings AFIPS 1967 Spring Joint Computer Conference Vol 30 pp 735-741
30 L C SANDERS A Monte Carlo process for determining response times for tactical systems Digest of the Second Conference on Applications of Simulation ACM et al New York December 1968 pp 79-84
31 W I STANLEY H F HERTEL Statistics gathering and simulation for the Apollo real-time operating system IBM Systems Journal Vol 7 No 2 1967 pp 85-102
32 M M LEHMAN J L ROSENFELD Performance of a simulated multiprogramming system Proceedings AFIPS 1968 Fall Joint Computer Conference Vol 33 Part 2 pp 1431-1442
33 D I RUMMER FORTRAN simulation of digital logic Digest of the Second Conference on Applications of Simulation ACM et al New York December 1968 pp 297-305
34 J H KATZ An experimental model of system/360 Communications of the ACM Vol 10 No 11 November 1967 pp 694-702
35 S L REHMANN S G GANGWERE JR A simulation study of resource management in a time-sharing system Proceedings AFIPS 1968 Fall Joint Computer Conference Vol 33 Part 2 pp 1411-1430
36 R J CECI G W DANGEL On-line system simulation Digest of the Second Conference on Applications of Simulation ACM et al New York December 1968 pp 89-93
37 K W UNCAPHER The Rand video graphics system-an approach to a general user-computer graphics communication system The Rand Corporation R-753-ARPA April 1971
38 T E BELL Modeling the video graphics system: procedure and model description The Rand Corporation R-519-PR December 1970
39 T E BELL Computer performance analysis: minicomputer-based hardware monitoring The Rand Corporation R-696-PR June 1972

A methodology for computer model building

by A. DECEGAMA
The National Cash Register Company
San Diego, California

INTRODUCTION

System performance evaluation techniques are of vital importance in the system design process. As depicted schematically in Figure 1, the selection of the design variables is generally accomplished by an iterative process in which the evaluation of the system cost and performance plays a crucial part. Simulation and mathematical modelling constitute the two basic approaches to computer system performance evaluation. Simulation models can be built to study almost any system with a very fine degree of detail, but they may require an inordinate amount of time for the determination of the system performance. On the other hand, mathematical models, while being more limited in scope, are much faster than simulation models and consequently much more economical to apply. This fact gives mathematical models a decisive advantage. When they can be built, mathematical models of complex systems with large numbers of design variables can be used to optimize the designs; whereas, the number of simulation runs required to accomplish the same task would be so high that the simulation approach to computer system optimization is totally impractical.

But good and realistic mathematical models of computer systems are difficult to develop. The real world is far too complex to be described faithfully with a series of equations. Therefore, approximations requiring extensive testing, guided by a deep understanding of the stochastic processes involved, must be made in order to formulate a model that, while capturing the essence of the problem at hand, is mathematically and computationally feasible. Furthermore, mathematical models of computer systems are costly to develop because their verification requires in turn the development and application of very detailed simulation models that are as close to the real systems as possible. The cost involved in a project of this type may well run into the hundreds of thousands of dollars, due to the number and lengths of the simulation runs involved, which require large and fast machines. These two stumbling blocks, the difficulty of building a good model and the cost of verifying it, are the main reasons why good mathematical models of computer systems are practically nonexistent. This paper presents a methodology to build mathematical models of multiprogramming systems that can greatly reduce the time and cost involved in developing models for specific systems.

THE CPU INTERRUPT PROCESS IN MULTIPROGRAMMING SYSTEMS

The occurrence of CPU interrupts in a multiprogramming system is the result of the superposition of many random and independent events (paging, I/O file accesses, ends of programs, time-slicing, spooling, etc.) that are caused by the concurrent processing of different programs. This situation is illustrated in Figure 2, which represents the succession in time of the interrupts produced by five programs residing simultaneously in main memory. Programs 1 and 2 are assumed to be of priority 1 (the highest), program 3 of priority 2, and programs 4 and 5 of priority 3. The Pooled Output Theorem1 states that the distribution of the CPU inter-interrupt times (Ti) would be asymptotically exponential, as the number of programs in main memory increases, if the individual inter-interrupt times within any priority i, Tii, were independent. The individual Tii are the result of a sum of random variables:

    Tii = Tio + Twi + Tui + To

where

    Tio = I/O response time/interrupt (includes both waiting and service time)
    Twi = CPU waiting time for programs of priority i
    Tui = CPU service time between consecutive interrupts for programs of priority i
    To  = CPU interrupt overhead

It can be seen that there is some interdependence between the different Tii due to the I/O and CPU waiting times involved. But, since in a properly designed system the CPU and I/O service times should be much longer than the corresponding waiting times, such interaction may not be strong enough to constitute a significant deviation from the condition of independent Tii.
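The pooling effect the theorem describes is easy to probe numerically. The following is a minimal Monte Carlo sketch of my own (it is not part of the paper): each program is idealized as an independent renewal process with a gamma-distributed cycle time of mean 1 and a chosen coefficient of variation, ignoring the CPU- and I/O-wait coupling noted above; the per-program interrupt instants are merged and the pooled inter-interrupt times are compared against an exponential fit with a Kolmogorov-Smirnov distance.

    import numpy as np

    # Monte Carlo sketch (illustrative only): pool the interrupt instants of
    # N independent programs whose cycle times are gamma-distributed with
    # mean 1 and coefficient of variation cv, then measure how far the
    # pooled inter-interrupt times are from an exponential distribution.
    rng = np.random.default_rng(1972)

    def pooled_gaps(n_programs, cv, cycles=20000):
        shape, scale = 1.0 / cv**2, cv**2        # gamma with mean 1, given cv
        streams = [np.cumsum(rng.gamma(shape, scale, cycles))
                   for _ in range(n_programs)]
        pooled = np.sort(np.concatenate(streams))
        return np.diff(pooled)

    def ks_to_exponential(gaps):
        # two-sided KS distance to an exponential with the same mean
        x = np.sort(gaps)
        n = len(x)
        cdf = 1.0 - np.exp(-x / x.mean())
        d_plus = (np.arange(1, n + 1) / n - cdf).max()
        d_minus = (cdf - np.arange(0, n) / n).max()
        return max(d_plus, d_minus)

    for n in (1, 4, 7, 10):
        print(n, round(ks_to_exponential(pooled_gaps(n, cv=2.0)), 3))

Under these assumptions the KS distance shrinks steadily as the number of pooled programs grows, which is the behavior the theorem leads one to expect.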
Also, since the trend toward the exponential density is very rapid with an increasing number of programs in memory, it would appear that a finite number of programs being multiprogrammed provides conditions that may sufficiently approach those for which the theorem holds. Statistical tests of the deviation of the simulated inter-interrupt times from the exponential distribution are intended to determine the validity of this rationale.

Figure 1-Iterative selection of the design variables (constraints: technology, costs)
Figure 2-Succession in time of the interrupts produced by five programs residing simultaneously in main memory (legend: Tui = CPU time between consecutive interrupts, Tio = I/O response time, Twi = CPU waiting time)

The main results obtained by simulation followed by the statistical analysis are shown in Figures 3 through 6. The statistics in those figures are presented as a function of N, the number of programs being multiprogrammed, and Cv, the coefficient of variation of the corresponding program service cycles, defined as the sum of the CPU service time plus the I/O service time between consecutive interrupts for an individual program. It was observed during the course of this investigation that the closeness of the CPU interrupt process to a Poisson process in a multiprogramming system is just a function of these two parameters alone. In other words, for a given value of N, all combinations of program and system characteristics resulting in the same values for Cv give similar values for each statistic considered. Thus, Cv is a convenient parameter to identify different multiprogramming environments from the standpoint of the interrupt characteristics of individual programs.

As was to be expected from the Pooled Output Theorem and the properties of the Poisson process, the higher the number of programs being multiprogrammed and the closer Cv is to 1, the stronger the indications are that the CPU inter-interrupt times are exponentially distributed. The points of Figures 3 through 6 represent average values of each statistic considered. The values of a given statistic characterized by the same N and Cv are closely clustered around the indicated average point. Thus, it can be said, based on experimental evidence, that only two parameters, N and Cv, are needed to determine whether the exponential hypothesis can be applied in any given multiprogramming situation.

Figure 3-Trend of inter-interrupt times vs coefficient of variation of basic CPU + I/O cycle
Figure 4-Serial correlation vs coefficient of variation of basic CPU + I/O cycle
Figure 5-Two-sided Kolmogorov-Smirnov statistic vs coefficient of variation of basic CPU + I/O cycle
Figure 6-Maximum deviation (%) from exponential density vs coefficient of variation of basic CPU + I/O cycle (frequency interval 10%)

The conclusions that can be drawn from the simulations and statistical analyses are:
1. The interrupt process in a multiprogramming system constitutes most probably a renewal process (Figures 3 and 4 show clearly the stationarity and very low serial correlation of the process).
2. The deviation of the interrupt process in a multiprogramming system from the Poisson process is a function of the number of programs being multiprogrammed and the coefficient of variation of the CPU-I/O service cycle exclusively (Figures 5 and 6 show trends that are typical of all the computed statistics).
3. There was no outright rejection of the Poisson hypothesis at the 5 percent significance level. This was true even for the Moran Statistic, which is a most powerful test. The implication is then that the assumption of an underlying exponential distribution is not inconsistent with the observed inter-interrupt times.
4. The maximum deviation from the exponential distribution that has been observed (Figure 6) in any sample appears to be tolerable. This is borne out by the results obtained with computer models based on the exponential hypothesis. As later indicated, the application to the models of queueing theory expressions requiring exact exponential distributions yielded only small errors in all the cases considered.
5. The range of applicability of the exponential hypothesis is at least

    N ≥ 4,    0.5 ≤ Cv ≤ 2.5

This range comprises all the studied cases, which included most practical multiprogramming situations.

MODELING IMPLICATIONS

The basic fact that the occurrence of interrupts in a multiprogramming system constitutes a Poisson-like process for a wide range of programming and system characteristics is the keystone for a powerful technique of computer model building.

In any system where the inter-interrupt times are exponentially distributed, the interarrival times of service requests to the different I/O devices are also exponentially distributed. This is due to the fact that selecting events at random with a given probability p from a Poisson process results in another Poisson process. If the density function of the times between events in the original process is λe^(-λt), the corresponding density function in the derived process has the same form but with parameter λp instead of λ. In a given system the probabilities of requiring access to the different I/O devices can be calculated as a function of measured program and system characteristics.4

If the interarrival times to the I/O devices are exponentially distributed, then the average and variance of the I/O response time from each device can be determined by standard Queueing Theory formulas as a function of the average interarrival time and the first three moments of the corresponding service time.4 No knowledge is required of the actual forms of the service time distributions for a majority of service disciplines. The first three moments of the service times are easily calculated4 from the measured first three moments of the amount of data to be transferred and the access and transfer characteristics of the different I/O devices. The possibility of calculating accurately the average and variance of the I/O response time per interrupt leads directly to a methodology to develop realistic mathematical models of multiprogramming systems.

MODEL BUILDING METHODOLOGY

The main objective of any mathematical model of a computer system is the determination of the system performance for any configuration and programming environment. The key to the determination of a system's performance is the calculation of the first two moments of the service time/program for the given hardware, software and programming characteristics. This makes possible the computation of the two fundamental measures of system performance: throughput, or the quantity of service provided per unit time, and the average response time per program, which indicates the quality of the service. The system throughput is equal to the program arrival rate, if it can be sustained by the system.
This is determined by comparing the average number of programs that can be concurrently serviced with the actual average number of programs that must be multiprogrammed. The number of programs that the system can serve simultaneously is either a fixed constant or it is a function of the memory and program sizes and the storage allocation algorithm. On the other hand, the average number of programs that must be processed concurrently is equal to the quotient of the average service time per program and the average program interarrival time. If this value is not less than the number of programs that can be multiprogrammed, then the assumed program arrival rate cannot be sustained and it must be reduced.

With respect to the average response time per program, it can be calculated as a function of the average program interarrival time and the first two moments of the service time per program only if the program interarrival times are exponentially distributed. The reason for this is that the response time per program is the sum of the service time and the time waiting in the queue of programs trying to enter main memory to begin service. Mathematical expressions for the average value of the waiting time only exist for the case of exponentially distributed program interarrival times when the service time distribution is of general form. This limitation is not as serious as it may seem. In addition to system performance prediction, the most important application of a mathematical model of a computer system is as a basic component in a system optimization program. If the target function for optimization is the maximization of throughput, the form of the program interarrival time distribution is not important. If the target function is the minimization of the average response time, its calculation is not needed. This can be seen by considering that the phenomenon of waiting is a direct consequence of the variances of the arrival and service processes. The variance of the program interarrival times is uncontrollable, but the variance of the service times can be minimized to achieve the least possible value of the average waiting time. Thus, the determination of the first two moments of the service time/program as a function of the system and programming environment characteristics constitutes the basic calculations of the mathematical model of a computing system.

Service time/program

The service time/program can be expressed as

    Ts = Tcp + TIo

where

    Ts  = Service time/program
    Tcp = CPU time/program (service time plus waiting time)
    TIo = I/O time/program (service time plus waiting time)

Consequently, the average and variance of the service time/program are calculated by

    E[Ts] = E[Tcp] + E[TIo]

and

    Var[Ts] = Var[Tcp] + Var[TIo]

respectively. Tcp is in turn equal to the sum of a number of components:

    Tcp = Tpw + Tpe + Tpo

where

    Tpw = Time waiting for CPU service/program
    Tpe = CPU execution time/program
    Tpo = CPU overhead time/program

The average and variance of Tcp are then given by

    E[Tcp] = E[Tpw] + E[Tpe] + E[Tpo]

and

    Var[Tcp] = Var[Tpw] + Var[Tpe] + Var[Tpo]

I/O time/program

TIo is equal to the sum of a random number of random variables:

    TIo = Σ(j=1..k) Σ(i=1..Nj) Tj

where

    k  = number of I/O devices in the system
    Nj = number of interrupts/program resulting in access to devices of type j
    Tj = Response time for devices of type j

The average value of TIo is calculated by

    E[TIo] = Σ(j=1..k) E[Nj] E[Tj]

and the variance by

    Var[TIo] = Σ(j=1..k) (E[Nj] Var[Tj] + Var[Nj] E²[Tj])

according to standard Probability Theory expressions for the sum of a random number of random variables.5
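The bookkeeping above becomes mechanical once the device response-time moments E[Tj] and Var[Tj] are available. The short sketch below is illustrative only: it is not the author's code, the device names, access counts and timing moments are invented, and the M/G/1 first-come-first-served waiting-time moments (the Pollaczek-Khinchine and Takacs relations) stand in for the unspecified "standard Queueing Theory formulas." It computes per-device response moments from a Poisson request stream and the first three service-time moments, applies the random-sum expressions for E[TIo] and Var[TIo], and then combines them into the moments of Ts = Tcp + TIo.

    # Sketch only: M/G/1 FCFS response-time moments per device, then the
    # random-sum formulas for the I/O time per program and the service
    # time per program.  All numbers are made up for illustration.

    def mg1_response_moments(lam, s1, s2, s3):
        """Mean and variance of response time (wait + service) for Poisson
        arrivals of rate lam and service-time moments s1, s2, s3 (FCFS)."""
        rho = lam * s1
        assert rho < 1.0, "device saturated"
        ew = lam * s2 / (2.0 * (1.0 - rho))               # Pollaczek-Khinchine mean wait
        var_w = ew * ew + lam * s3 / (3.0 * (1.0 - rho))  # wait variance via Takacs
        return ew + s1, var_w + (s2 - s1 * s1)

    def random_sum(e_n, var_n, e_x, var_x):
        """Moments of a sum of N i.i.d. terms X, with N independent of the X."""
        return e_n * e_x, e_n * var_x + var_n * e_x * e_x

    # Per device: (E[Nj], Var[Nj], request rate, service moments s1, s2, s3).
    devices = [(40.0, 60.0, 25.0, 0.010, 2.0e-4, 6.0e-6),   # e.g. a drum
               (12.0, 20.0,  8.0, 0.045, 3.0e-3, 3.0e-4)]   # e.g. a movable-head disk

    e_io = var_io = 0.0
    for e_n, var_n, lam, s1, s2, s3 in devices:
        e_t, var_t = mg1_response_moments(lam, s1, s2, s3)
        m, v = random_sum(e_n, var_n, e_t, var_t)
        e_io, var_io = e_io + m, var_io + v

    e_cp, var_cp = 2.5, 1.4        # moments of Tcp, assumed known for this sketch
    print("E[Ts] =", e_cp + e_io, " Var[Ts] =", var_cp + var_io)

The per-device request rate in such a sketch would come from thinning the overall interrupt rate by the access probabilities discussed in the previous section.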
As has been explained, the average and variance of Tj can be determined only because the service request interarrival times are quasi-exponentially distributed. With respect to Nj, its average and variance can be calculated by applying the indicated expressions for the sum of a random number of random variables to the program execution time, i.e., by regarding Tpe as the sum of the Nj actual CPU times between consecutive interrupts resulting in access to a device of type j:

    E[Tpe] = E[Nj] E[Tcj]
    Var[Tpe] = E[Nj] Var[Tcj] + Var[Nj] E²[Tcj]

where Tcj denotes the actual CPU time between consecutive interrupts of type j, and where

    Tij = CPU time between events (execution of jump instructions, data references, file references, etc.) that may result in an I/O interrupt to access a device of type j
    Pij = Probability of actually accessing a device of type j

The actual CPU time between interrupts, Tcj, is again the sum of a random number of random variables. The random variables are the Tij, and their random number between interrupts has a negative binomial distribution with average (1 - Pij)/Pij and variance (1 - Pij)/Pij². Solving for E[Nj] and Var[Nj]:

    E[Nj] = E[Tpe] Pij / (E[Tij](1 - Pij))

    Var[Nj] = [ Var[Tpe] - (E[Tpe] Pij / (E[Tij](1 - Pij))) ((1 - Pij) Var[Tij]/Pij + (1 - Pij) E²[Tij]/Pij²) ] Pij² / ((1 - Pij)² E²[Tij])

E[Tpe] and Var[Tpe], as well as E[Tij] and Var[Tij], must be measured for each environmental program mix. Pij should be calculated as a function of system architectural features such as cache memory design parameters and system resource management features such as paging algorithms and buffering schemes, etc. The computations can become very involved4 and they are beyond the scope and length of this paper. Suffice it to say that a number of good models exist6,7,8,9,10 that can be applied to compute the different Pij.
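Purely as an illustration of the preceding relations: the helper below is mine, not the author's, and it simply inverts the random-sum identities under the count model described above (mean (1 - Pij)/Pij, variance (1 - Pij)/Pij² events between accesses), so its variance expression is a consequence of that model rather than a formula quoted from the paper. Given measured moments of Tpe and Tij for a program mix and a computed access probability Pij, the per-program access counts follow.

    # Sketch: recover E[Nj] and Var[Nj] from measured E[Tpe], Var[Tpe],
    # E[Tij], Var[Tij] and a computed access probability Pij, by inverting
    # the random-sum identities with a geometric number of Tij's between
    # accesses, as described above.

    def access_count_moments(e_tpe, var_tpe, e_tij, var_tij, p):
        e_cycle = e_tij * (1.0 - p) / p                                   # E[Tcj]
        var_cycle = ((1.0 - p) / p) * var_tij + ((1.0 - p) / p**2) * e_tij**2
        e_n = e_tpe / e_cycle                                             # E[Nj]
        var_n = (var_tpe - e_n * var_cycle) / e_cycle**2                  # Var[Nj]
        return e_n, max(var_n, 0.0)   # clamp small negatives from noisy measurements

    # Made-up measurements: 2.5 s of CPU execution per program, 1 ms mean
    # CPU time between file-reference events, 2 percent chance per event
    # of actually accessing the disk.
    print(access_count_moments(2.5, 1.4, 1.0e-3, 4.0e-7, 0.02))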
Also, the CPU waiting times are exponentially distributed if the interarrival and holding times are both exponentially distributed, and consequently

\mathrm{Var}[T_{ws}] = E^2[T_{ws}]

E[T_{ws}] is calculated by standard queueing theory expressions, depending on the service discipline, as a function of the average inter-interrupt time and the average CPU holding time.

System interrupt rate

The calculation of the average inter-interrupt time completes the backwards presentation of the basic steps that must be taken to build a model (when building an actual model of a system, development proceeds step by step from the interrupt rate toward the computation of performance). The average inter-interrupt time is obtained as a function of U_{cp}, the CPU utilization factor, which is in turn calculated from Q_s, the system throughput in programs per unit time.

Equipment utilization constraints

In addition to the indicated calculations to build a system's model, a set of constraints must be simultaneously satisfied in order for the system to be able to maintain the desired rate of throughput. These are the constraints for equipment utilization: the utilizations of the CPU, of main memory and of the I/O devices must all be less than one. How to compute the CPU utilization factor has already been indicated. The main memory utilization is computed from N_s, the average number of programs that can reside in main memory and receive service simultaneously (multiprogrammed). The I/O device utilization is simply the quotient of the average service time and the average service request interarrival time.

The preceding outline of the steps to be taken to build a model of a multiprogramming system can be applied to interrelate system variables (hardware and software), programming variables and resource management variables. (A list of typical environmental, system and controllable variables has been published elsewhere.4,16) Thus, a model built by applying the described methodology will relate intimately all these variables in a set of equations that constitute the mathematical expressions of the basic interrelationships of the system. Therefore, the resulting model can be used for optimization purposes, since the effect of a change in any one variable on system performance can be readily calculated by the model.

MODULAR DESIGN

The described model building approach is susceptible to modular implementation. As depicted in Figure 7, the determination of the system interrupt rate is the focal point of this methodology. The steps to be taken to determine it as a function of the system configuration and characteristics, and how to use it to obtain the system performance, have already been outlined. The point to be made here is that many specialized computations that can be modelled separately are also required to build a complete model of a computer system.

Figure 7-Basic model building methodology and modular design

For instance, a Paging Model and a Storage Allocation Model are needed to determine the probability that a paging interrupt will occur when a memory address is generated by the CPU. A System Configuration Model is required to specify the hierarchy of memories in the system, and Buffering Models are required to determine the number and size of the I/O buffer areas and the probabilities of actually having to perform an I/O operation when a READ or WRITE instruction is executed. If the CPU service is quantized, a Quantum Interrupt Model is necessary to obtain the actual average CPU time between interrupts.
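Collecting the waiting-time and utilization calculations described above: because the paper's exact expressions for U_{cp} and the memory utilization did not survive reproduction here, the following Python sketch uses one plausible set of standard relations in their place (CPU utilization as throughput times CPU demand per program, memory utilization as the offered degree of multiprogramming over N_s, device utilization as mean service time over mean interarrival time, and the M/M/1 mean wait for E[T_{ws}], consistent with the exponential interarrival and holding-time assumption and Var[T_{ws}] = E^2[T_{ws}]). These formulas and all numbers are stand-ins, not the author's equations.

    # Sketch of the feasibility checks and the E[Tws] calculation.
    # The utilization formulas are assumptions; inputs are hypothetical.

    def mm1_wait(mean_interarrival, mean_holding):
        """Mean wait for the CPU between consecutive services, using the
        M/M/1 FCFS result E[W] = rho * E[S] / (1 - rho)."""
        rho = mean_holding / mean_interarrival
        if rho >= 1.0:
            raise ValueError("CPU saturated: holding time >= inter-request time")
        e_w = rho * mean_holding / (1.0 - rho)
        return e_w, e_w ** 2          # Var[Tws] = E[Tws]**2 (exponential wait)

    def utilizations(q_s, cpu_demand, e_ts, n_s, devices):
        """q_s: throughput (programs/unit time); cpu_demand: E[Tpe]+E[Tpo];
        devices: list of (mean service time, mean interarrival time)."""
        u = {
            "cpu": q_s * cpu_demand,      # assumed: utilization = throughput * demand
            "memory": q_s * e_ts / n_s,   # assumed: offered multiprogramming / Ns
        }
        for i, (srv, inter) in enumerate(devices):
            u[f"device_{i}"] = srv / inter
        return u

    u = utilizations(q_s=0.002, cpu_demand=300.0, e_ts=1500.0, n_s=4,
                     devices=[(35.0, 60.0), (9.0, 20.0)])
    print(u, all(value < 1.0 for value in u.values()))
    print(mm1_wait(mean_interarrival=3.0, mean_holding=1.8))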
Also, it must be kept in mind that the CPU times between consecutive events of the same type that may result in an interrupt and that are assumed to be known, have either been measured in the system under study or in some other system. In the latter case, several more models may be needed to determine the corresponding CPU times based on the characteristics of the new system. In the first place, a CPU Model that can compare different CPU and memory designs and instruction sets and determine the relative CPU powers to process a given type of task is required to convert the CPU time measurements to the new system. Since the interference with other CPU's and I/O l\10dules may slow down the CPU and lengthen the CPU times required for a given task, a CPU-I/O module Interference l\:lodel must also be applied. In addition, if the system under study has a cache memory, a Cache Model is needed to determine the impact of the cache on the program processing speed of the CPU. And, if possible hardware/software trade~offs are being studied, another model is required to establish their effect on the interrupt rate and the distribution of I/O accesses. In addition to the indicated models, a variety of I/O service models for different types of devices and service disciplines are also needed to be able to apply the basic lVlodel Building Methodology depicted schematically in Figure 7 to a variety of situations. Thus, a substantial number of functional models are necessary to implement a complete model of a multiprogramming system. The degree of detail of the functional models determines the complexity of the overall system model and its accuracy. Priorities of programs can be considered, and distinctions of processing requirements between compilations and executions or code and data types of information can be made. 4 The linking of the functional models that results in a unified system model is provided by those parameters and variables that are common to one or more functions. For instance, some of the variables Df the Storage Allocation Model (page size, number of pages allowed in memory, program sizes) that influence 307 system performance are also required in combination with the System Configuration l\10del to determine the structure of the hierarchy of memories in the system. But this interdependence alone is not sufficient to make the overall system model feasible. Without the knowledge that the system interrupt rate constitutes a Poisson process, the entire model structure would collapse. A basic link between the functional modules would be missing. Then, what makes this Model Building Methodology very powerful is the fact that the individual functional models can be developed and verified independently from one another. Thus, a given functional model can be built by applying new mathematical developments, by experimenting with simulations or by regression analysis. The important point is that the problem of developing a complex model of the overall system has been considerably simplified and reduced to manageable proportions. VERIFICATION l\1athematical models of multiprogramming systems developed by applying the described methodology have been verified by comparing their predictions with the results of a very detailed simulation model of multiprogramming systems. The simulation model has been written in GPSS and it has great flexibility to simulate different system configurations, service disciplines and control policies. 
In order to give an indication of how closely the results of the two models follow one another, Table II presents the calculated by a mathematical model and the measured by simulation values for optimum performance of some of the important parameters that correspond to some of the systems to which the models have been applied. The systems investigated had three levels of storage (main memory, bulk memory (ECS), disk). Their environments are indicated in Table II. The industrial environment consists of three priorities of programs: conversational, real-time and batch. In the university environment there are three priorities of programs also, but instead of the real-time system there is a computer aided instructional system requiring on the average, more CPU time but less memory space than the real-time system. An optimization procedure based on the mathematical model and using Direct Search Techniquesll was applied in all cases. Table II shows clearly the value of mathematical models to predict system performance. The eloseness of the results of the mathematical and simulation models seems to indicate that good models have been developed. 308 Fall Joint Computer Conference, 1972 TABLE II-Comparison of Results of the Mathematical and Simulation Models CASE 1 University environment; programs and data accessed in main memory. Parameter 1 Parameter 2 Parameter 3 Parameter 4 Parameter 5 Parameter 6 First moment of CPU time between interrupts for priority 3 (msec.) Average interarrival time to disk controller (msec) CPU utilization Average service time for priority 1 (msec) Average service time for priority 2 (sec) Average service time for priority 3 (sec) Calc. Calc. 5.79 4 Business environment; programs and data accessed in main memory. 5 Business environment; programs and data accessed in ECS (user programs). 5.25 Meas~ Calc. Meas. Calc. Meas. Calc. Meas. Calc. Meas. 147 165 .44 .444 507.6 464.1 42.9 41.57 188.3 184.1 127 151 .722 .73 575.1 559.8 47.66 50.16 272.33 277.38 19.46 18.6 108 121.5 .54 .544 207.1 189.87 27.7 25.4 306.39 311.54 10.1 11.5 129 151 .362 .366 221.15 210.2 107 .548 .551 393 2 University environ- 181.7 ment; user programs and data accessed directly in ECS (Extended Core Storage). 3 University environment; priorities 1 and 2 accessed in main memory; code and data of priority 3 accessed directly in ECS. Meas. 186.14 187.6 96.16 The general problem of verification of simulation models has been extensively discussed in the literature.12.13.14.15 It has been said15 that "in view of the difficulty which arises in attempting to agree upon a set of criteria for establishing when a model is verified, efforts should concentrate on the· degree of confirmation of a model rather than whether or not the model has been verified." It is felt that all the numerous models that have been built according to the methodology 374 190 2.21 2.19 154 153.67 2.71 2.45 166.33 174.24 presented in this paper have been confirmed to a high degree. CAPABILITIES ~1athematical models of computer systems relate all the important parameters and variables of the hardware, software and environment to the indexes of A Methodology for Computer Model Building performance and cost by means of computations that can be carried out very quickly. Consequently, they provide a flexible and efficient way for the study, design and control of computer systems. 
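The optimization runs behind Table II used a direct search over the controllable variables, with the mathematical model supplying the objective value at each trial point. A minimal coordinate-search sketch of that arrangement is given below in Python; the objective, the variable names and the step sizes are placeholders, since the real objective would be the full set of model equations developed above.

    # Sketch: the analytic model as the objective of a direct (coordinate)
    # search.  predicted_throughput is a stand-in; all settings hypothetical.

    def predicted_throughput(x):
        """Placeholder model: x = (quantum in msec, pages per program)."""
        quantum, pages = x
        return -(quantum - 12.0) ** 2 - 0.5 * (pages - 30.0) ** 2  # toy surface

    def coordinate_search(f, x0, step=4.0, shrink=0.5, tol=1e-3):
        x, best = list(x0), f(x0)
        while step > tol:
            improved = False
            for i in range(len(x)):
                for delta in (step, -step):
                    trial = list(x)
                    trial[i] += delta
                    value = f(trial)
                    if value > best:
                        x, best, improved = trial, value, True
            if not improved:
                step *= shrink      # no improving move: refine the step size
        return x, best

    print(coordinate_search(predicted_throughput, x0=(4.0, 10.0)))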
Some typical applications of models developed along the lines discussed in this paper are:

Optimum hardware configurations for given applications

The cost/performance gains that can be obtained by system optimization can be significant. Improvements of up to 30 percent in the cost/performance index were observed4 in the optimization of systems whose controllable variables had initially been chosen by an intuitive process of educated guess.

Cost-performance evaluations for hardware-software trade-offs

A mathematical model of a multiprogrammed business computer has already been applied to study the improvement in system performance that can be obtained by a Firmware Sort-Processor.16

Evaluation of new system architectures based on advanced technologies

A mathematical model of a cache has been used in conjunction with an overall system model17 to reveal some interesting facts about the effectiveness of cache memories in improving system performance.

Adaptive operating system design

If the programming environment of a system and its hardware availability can be measured dynamically, then it is possible in principle to have some kind of control over the allocation of resources and the scheduling of tasks so that a high level of performance is maintained under dynamically changing conditions. The ability of a mathematical model of a system to predict its performance for any combination of environmental parameters and controllable variables makes it the focal point of an algorithm for system performance optimization. It is suggested that such an algorithm can provide an operating system of conventional design with an effective mechanism for decision making.

MEASUREMENTS

The environmental measurements that must be made for the application of mathematical models of multiprogramming systems are: measurements of the times between certain events that may result in an interrupt, measurements of the amounts of space required, measurements of the distances between data or instruction addresses referenced by the programs, and frequencies of usage of the different elements of system and user programs. The cost of gathering and analyzing the required data should be offset by the improvements in performance and the cost savings that can be achieved through the application of mathematical models.

SUMMARY

A new methodology to design mathematical models of computer systems has been introduced. The hypothesis of exponential inter-interrupt times is the cornerstone of this new approach to computer model building, and the range of validity of this hypothesis has been specified. It has been demonstrated that a diversity of models can be built based on this concept and on a modular functional approach to model development. The main computational steps to be followed in order to link together partial functional modules and build an overall system model have been specified. Simulation has shown the validity of the exponential hypothesis in many multiprogramming situations and the basic soundness of models developed by applying the defined methodology. Some typical applications for which mathematical models are particularly suited have been indicated.

ACKNOWLEDGMENTS

The encouragement and constructive criticism provided by Mr. Harut Barsamian of the NCR-DPD Research Department is sincerely appreciated. The author is also grateful to Mrs. Lori Lehman and Mrs. MaryAnn Garrison for their contributions in typing the paper.
REFERENCES 1 D R COX P A LEWIS The statistical analysis of series of events John Wiley & Sons Inc New York 1966 2 P W W LEWIS T C KELLY A computer program for the statistical analysis of series of events IBM Research Report RJ 362 November 121965 3 E M SCHEUER Testing grouped data for exponentiality RAND Memorandum RM 5692 PR August 1968 310 Fall Joint Computer Conference, 1972 4 A DECEGAMA Performance optimization of multiprogramming systems Doctoral Dissertation Computer Science Department Carnegie Mellon University Pittsburgh Pa April 1970 5 W FELLER An introduction to probability theory and its applications Vol I John Wiley & Sons Inc New York 1968 6 P J DENNING The working set model for program behavior Communications of the ACM May 1968 7 E GELENBE J C A BOEKHORST J L W KESSELS Minimizing wasted space in partitioned segmentation Proceedings of the Symposium on Computers and Automata Polytechnic Institute of Brooklyn New York April 1971 8 E G COFFMAN JR T A RYAN A study of storage partitioning using a mathematical model of locality Communications of the ACM March 1972 9 P J DENNING Properties of the working set model Communications of the ACM March 1972 10 A WOOLF Analysis and optimization of multiprogrammed computer systems using storage hierarchies 11 12 13 14 15 16 17 University of Michigan Technical Report RADC TR 71165 August 1971 C F WOOD Recent developments in direct search techniques Research Report 62 159 522 Rl Westinghouse Research Laboratories Pittsburgh Pa. D K TOCHER The art of simulation The English Universities Press Ltd 1963 General purpose simulation system 360 user s manual IBM Publication H20 0326 0 January 1968 T H NAYLOR K WERTZ T WONNACOTT Some methods for analyzing data generated by computer simulation experiments National Meeting of the Institute of Management Sciences Boston April 1967 T H NAYLOR J M FINGER Verification of computer simulation models Management Science October 1967 H BARSAMIAN A DECEGAMA Evaluation of hardware firmware software trade offs with mathematical modeling AFIPS 1971 SJCC Proceedings H BARSAMIAN A DECEGAMA Some design considerations of cache memories IEEE Compcon 1972 Proceedings San Francisco LOGOS and the software engineer* by C. W. ROSE Case Western Reserve University Cleveland, Ohio INTRODUCTION I with the problem of storage allocation; however, they do little in reducing the major problem of complexityinter-module communication, software/hardware interface conflicts, and mishandling of real and apparent concurrency within the hardware/software system. This is more obvious. when one remembers that most large design efforts are multiperson, and that software modules and hardware designed by many people must communicate properly at the many interfaces. The hardware designer is somewhat better off since he can call upon switching algebras, flow table analysis3 and register transfer languages4 ,5 to aid him in the design. Unfortunately, these tools are not amenable to the design and analysis of very large systems, and the designer soon learns to modularize his system and to apply his techniques to several modules of manageable size. It is at the interfaces of these modules though, that problems equivalent to those in software arise. The net result of this inability to systematically deal large scale complexities has been the late delivery of expensive and buggy computer systems. 
This is not to suggest that the several successful structural approaches to systematic operating system design6,7 are insignificant, but rather that the difficulty of enforcing their requisite structural and communication disciplines becomes very great as the size of the target system increases. Hardware engineers encountered this problem of complexity very early in terms of implementation, and responded by developing computer-aided design (CAD) systems for logic diagram production, package placement, wire routing and mask generation, and simulation and test generation. s Many other engineering disciplines have also turned to the computer to help deal with the complexities of large systems. 9 It is ironic, however, that the computer, which has great analytic capability, doesn't often forget details, and can enforce structural and communications disciplines by syntactical analysis, has not, to date, been applied to the conceptual and detailed design of computer systems. Most of us consider a well-engineered product to be one which is structurally sound; which communicates with its environment in a predictable, well-disciplined manner; which has been thoroughly tested; and which is reliable and easily maintained. In any engineering field, the structural philosophy, design disciplines, and checkout methods which yield such a product are called "good engineering practices." Software engineering is the application of good engineering practice to the design, implementation and final checkout of large programs. The result of effective software engineering should be: ,I (1) The production of a correct program (certifiable) (2) The availability of means of efficiently determining the correctness of a program (certification) (3) The ability to modify a program so that recertification is possible. l The goal is to organize complexities, master multitude, and avoid its bastard chaos as effectively as possible. 2 However, unlike many types of engineers, the software engineer has had few tools, either for implementation or analysis, with which to accomplish his task. Many of the problems in operating systems which occurred during the mid-sixties can be traced to an inability to enforce the design disciplines indicated by good engineering practices, or to determine after the fact that they had been applied. In some cases the faults appeared years after the system was in the field. Higher level implementation languages for software remove many trivial coding errors and deal effectively * This work was supported in part by the Advanced Research Projects Agency of the Department of Defense and was monitored by the Department of the Army under Contract No. AMSELPP-Cn122(CAE). 311 312 Fall Joint Computer Conference, 1972 Project LOGOS was begun in 1968 at Case Western Reserve University to exploit these capabilities by creating a computer-aided design environment for both the hardware and software of large-scale computer systems. An integrated hardware/software design system was chosen because mismatches at that particular interface in a computer system are the most costly and timeconsuming to correct. The goals of LOGOS can be simply stated: the creation of a multi-designer environment in which computer system designers can define a system in which a high degree of parallelism or concurrency exists, verify its logical and functional consistency, evaluate its expected performance before implementation, and finally implement the hardware and generate the code for the software. Inherent in any CAD system is. 
a philosophy of target systems structure and an associated representation system which both embodies that philosophy and has a well-defined syntax and semantics. It would be helpful here to briefly describe both to set the stage for a discussion of LOGOS' contributions to the software engineer. A LAYERED VIEW OF SYSTEM STRUCTURE From a user's viewpoint, a computer system presents an environment to each user which is characterized by a collection of service facilities. 10 Each facility may be activated and directed according to a welldefined communications discipline. Since users do not, in general, act in coordination, the system facilities must cope with multiple and asynchronous requests for services. Response to a request activates the facility, an instance of which we shall call a task, and the method of handling multiple requests depends upon the nature of the facility. A single-user facility would queue all requests in excess of one, while a limited resource facility such as a magnetic tape controller with six tape drives would allow six concurrent activations before queueing requests. On the other hand, a fully reentrant software procedure would have no limit to its activations although exhaustion of some other resource such as core memory would impose a limit externally. Users of facilities very often do not care about how a facility is implemented internally, but rather how it interacts with its environment. This concern with the input/ output function of a facility is the "external" or "primitive" view. Conversely, a user and, in particular, a designer may need to know the details of implementation as well as the I/O function of a facility. This is the "internal" view. A facilities approach to viewing systems immediately gives rise to a hierarchical structure. Many facilities in a system provide essentially identical services, or equivalently, have identical subtasks. These subtasks could be viewed as instances of activation of separate facilities shared among those requiring the particular services. The most primitive shared subtasks in a software system are the machine instructions. By the same token, however, the reading of a text file appears primitive to a compiler using the file system facility, although an internal view of the file system shows that the read file operation is quite complex and uses other shared facilities such as the disk channel. It is natural, therefore, to structure a system as a partially ordered hierarchy of layers, the highest layer being the interface with the system users, and the lowest, the system primitives. A system primitive for a software system might be a machine language instruction or a library subroutine, while a hardware primitive might be a NAND gate or a four-bit MSI adder. A facility on a lower layer may be activated by a task of a facility existing on a higher layer. Its tasks may, in turn, activate facilities on still lower layers. Between any two layers, there is an interface partitioning the system into facilities below the interface and users of those facilities above it. Users above present an environment of service requests and arrival rates, while facilities below present an environment of service available and service times. The ability to "collapse" or look at a facility as a primitive suggests that consistency analysis of a facility could be done by exposing the internal structures, analyzing it, collapsing it, and then analyzing its interactions with its environment as a primitive. 
This is the only practical way of analyzing large systems, and the representation which accompanies this philosophy allows just that. Implied in all of this is the existence of an interfacility communications discipline for both data and control. Several might be defined such as Dijkstra's P-V disciplinell or lVlultics' mailbox scheme. 1Z What is important is that whichever one is chosen, it must be enforced, or the layered model will break down, and the analytic capability afforded by this scheme will be lost. A facility in general consists of four elements: (i) Resources of one or more types which may be required by the facility subtasks. (ii) An enclosing control which determines, based upon resource availability, if a subtask should be activated or if the request should be queued or dismissed. (iii) A set of algorithms defining the subtasks. Algorithms .are called activities; instances of their activation are called processes. I I I I LOGOS and the Software Engineer (iv) An interpreter which accepts user directives and determines appropriate action. 313 CONTROL GRAPH DATA GRAPH A facility need not have all of these elements. A wholly software facility would not have local resources, while a storage allocator would contr-ol a resource but have no set of algorithms to be selected by a user. This philosophy of system structure can be applied to both hardware and software. It is consistent with the structural approach to proving program correctnessI3 ,2,14,15 which is to .force the structure of the program text (or representation) to correlate strongly ",ith the structure of the actual computation, thus allowing analysis of the computation by analyzing the structure of the representation by stepwise decomposition. THE LOGOS REPRESENTATION SYSTE1V[ The central part of any CAD system is its representation system which consists of the design data base in which the description of the target system is accrued, the external representation of this design information, and the translators between the external and internal form. The representation system must satisfy severBl global constraints. . First, the representation must be sufficiently general to describe all interesting and desirable objects in the set of design objects, while at the same time, it must be sufficiently specific to allow algorithmic consistency and performance analysis. Second, the internalrepresentation must be decomposable into elements which may be implemented directly. Finally the 'designer must be comfortable with the external representation and the constraints it places upon his freedom of expression. In the case ofJ;.,OGOS, the target o~jects are facilities, which can be described by a number of algorithms implemented in either hardware, software, or a combination of both. Therefore, the representation must be suitable for describing both and· must yield the target system implementation directly. The representation must be consistent· with . the hierarchical, layered view of system structure. It must, therefore, be deClarative in that it must express bOth the structure and function. of ~he t,arget facility to allow algorithmic. consistency and;performance analysis .. It must allow~he design tobe described in multiple levels of abstraction to accommodate the primitive and internal views of facilities required: forstepwiB'e . analysis. This feature is especially important to designers since they terid to work "around" in a design rather than in a strictly top-down or bottom-up manner. 
Finally, since many of today's computer systems and Figure I-Example of an activity those proposed for the next generation contain parallel processing capabilities, the representation must allow the specification of parallelism or concurrency in a natural way and be capable of analyzing its effect on consistency and performance. LOGOS chose a graph-theoretic system of representation which satisfies the above constraints. The system is a synthesis and extension of valuable work done by Petri,16 Karp and l\1:iller,17 Holt,18 Luconi,19 and others. The extensions were required because (1) LOGOS deals with very large systems and must localize analyses, (2) little work had been done in the representation and analysis of data structures in graph models, and (3) because LOGOS must actually implement rather than merely analyze the target algorithms or systems. A complete treatment of the representation may be found in References 20 and 21. Briefly, the representation of an algorithm consists of a pairof directed graphs. The data graph (DG) defines the algorithm datastructures and the transformations upon them, while the associated control graph (CG) sequences the transformations and defines the control flow. The schema formed bya CG-DG pair is called an activity and will be seen 314 Fall Joint Computer Conference, 1972 AND: BLOCKHEAD OR: BLOCKEND PREDICATE: BLOCKHEAD a b c d p e c f 9 h i k 1 X X 0 0 0 0 0 0 1 0 0 X 1 X 1 1 0 0 0 0 0 BLOCKEND d f h j k 1 m n Figure 2-LOGOS atomic control operators X , 0 0 0 0 0 0 1 0 0 0 0 X 0 1 1 0 0 0 0 , 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 ) 0 LOGOS and the Software Engineer to be the static template of a task. Figure 1 is an example of an activity. The CG consists of two node types: the squares are control variables or c-cells, and the remaining nodes are control operators. Cells must be connected only to operators by directed arcs and vice versa. There are several types of control operators as denoted by the different shapes in Figure 1. Each type of control operator has an associated enabling or transfer function defined on its input and output c-cells. The DG consists of cells (squares) which represent the information structures of the activity, and data operators which perform the transformation upon them (e.g., Add, IV[ove, Integrate, etc.). Each data operator is associated with a unique control operator which determines when the data transformation may take place. The initiation of a control operator initiates the associated data operator which then reads its input cells (data structures), performs the data function, and writes the results into its output cells. Upon writing, the data operator communicates to the control operator that it has terminated, and the control operator terminates by alterings its c-cells appropriately. The flow of control in the C G is determined by the values in the c-cells and the nature of the c-operators to which they are connected. The c-operators are defined so that asynchronous or synchronous control and data flow can be represented. The atomic or first level control operators are shown in Figure 2 together with their transfer functions in vector form. The AND operator of Figure 2 is used to resynchronize parallel control paths and functions analogously to an AND gate in hardware. The OR operator is asymmetric in that if both of its input c-cells contain l's, the initiation of the operator preserves the 1 in the second c-cell. It will then reinitiate as soon as possible. 
This "conservation" of l's is required to insure determinacy, a property of consistent systems with concurrency which will be discussed later. The OR operator is analogous to an OR gate in all other ways. The PRED operator is the interface between data values and the control flow in the CG. It is a data dependent control branch whose associated data operator performs a test on its input d-cells. The result of this test conditions the branch in the control. The blockhead (BH) and Blockend (BE) operators are paired to delineate an activity and form the enclosing control for the facility task being represented. The control algorithms must perform the following functions: (i) arbitrate access to the facility (ii) provide a communication discipline between the facility and its users 315 (iii) define the number of concurrent users which may be served by the facility. The BH/BE pair described in Figure 2 act as a P-V pair. The arbitration algorithm shown is a fixed left-toright priority, but round-robin and other disciplines have been implemented also. The BH and BE communicate primarily via the feedback c-cell, which initially contains, if it is present, the number of concurrent activations possible. All control flow is restricted to enter and leave the activity via the BH/BE pair with the exception of nested subroutines or procedure calls (i.e., calls upon activities on the same layer of the system) which are controlled by Call/Return operator pairs constructed from a common control primitive. These control operators may all be constructed from a common primitive control operator whose definition is logically complete. This primitive operator may be realized directly in hardware, but for the purposes of the software engineer, it is sufficient to state that other, higher-level control operators may be constructed from the primitives and placed in a macrolibrary. The activity of Figure 1 is shown in Figure 3 with interpretations placed upon the data structures and data operators of the DG (these are informal interpretations; a formal syntax will be introduced later). The activity is the representation of an ALGOL 60 FOR statement with a parallel DO (statement) part. When the task is activated, the stepping variable is initialized to 5, and the loop head is passed. Note that there is no data operator associated with· the loophead OR. If the predicate is false, the parallel DO (statement) is executed which allows the sequence of data functions fj to be time independent of h. The threads are resynchronized at i whose data operator uses common results. n is decremented and the loop is re-entered. Thus we have represented: FOR N =5 Step-l until 0 DO (statement) The 1 in the feedback c-cell indicates that the activity may be initiated only once before terminating. The nesting of activities on a layer allows the imposition of an ALGOL-like block structure upon the representation. If the activity in Figure 3 were in a block structured environment, data cells A, B, and C might be global to the block while n, m, p, 5, and r are local. Thus, a collapsed or primitive view of the activity is that of a single control-data operator pair as shown in Figure 4. For most types of analysis performed by LOGOS, the local structure of an activity is analyzed in its internal or expanded form, and the activity is then collapsed. All further interactions with its global environment are analyzed in the collapsed form. 
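To make the transfer-function semantics described above more concrete, the following Python sketch gives marking-based firing rules for the AND, OR and blockhead/blockend operators, written from the textual description (an AND needs 1's on all inputs and 0's on its outputs; the OR consumes one input 1 and preserves the other; the BH/BE pair behaves like a P-V pair on the feedback c-cell). The data graph and the PRED branch are omitted, so this is an informal interpretation of the operators, not LOGOS itself; the cell names are invented.

    # Sketch of c-cell/operator firing for a LOGOS-style control graph.
    # Cells hold 0 or 1; rules paraphrased from the paper's Figure 2.

    cells = {"req": 1, "fb": 1, "a": 0, "b": 0, "done": 0}

    def fire_and(ins, outs):
        """Resynchronize parallel paths: all inputs 1, all outputs 0."""
        if all(cells[c] == 1 for c in ins) and all(cells[c] == 0 for c in outs):
            for c in ins:
                cells[c] = 0
            for c in outs:
                cells[c] = 1
            return True
        return False

    def fire_or(ins, out):
        """Asymmetric OR: consume one input 1, preserve any other."""
        if cells[out] == 0:
            for c in ins:
                if cells[c] == 1:
                    cells[c] = 0
                    cells[out] = 1
                    return True
        return False

    def fire_blockhead(request, feedback, body_start):
        """P-like entry: admit an activation only while the feedback cell
        (activations still allowed) is non-zero."""
        if cells[request] and cells[feedback] and cells[body_start] == 0:
            cells[request] = 0
            cells[feedback] -= 1
            cells[body_start] = 1
            return True
        return False

    def fire_blockend(body_end, feedback, exit_cell):
        """V-like exit: release the activation and return the feedback token."""
        if cells[body_end] and cells[exit_cell] == 0:
            cells[body_end] = 0
            cells[feedback] += 1
            cells[exit_cell] = 1
            return True
        return False

    fire_blockhead("req", "fb", "a")     # enter the activity
    fire_or(["a"], "b")                  # a trivial body step
    fire_blockend("b", "fb", "done")     # leave, restoring the feedback cell
    print(cells)   # {'req': 0, 'fb': 1, 'a': 0, 'b': 0, 'done': 1}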
In this 316 Fall Joint Computer Conference, 1972 DATA GRAPH A CONTROL GRAPH B 5 n m p r Figure 3-Example of Algol 60 for statement representation LOGOS and the Software Engineer way the analysis of a software (or hardware) system can be carried out in a stepwise, computationally efficient manner. The imposition of an ALGOL environment is optional, of course, and does not affect the representation itself. To do so does imply the existence of an ALGOLlike run time environment layer which implements the necessary storage allocation and other semantics. A cactus stack is required to keep track of the concurrently active and executing tasks. The representation may be generalized to allow controlvariable contents to be non-negative integers with the control operator definitions changed to allow decrementing of input c-cells and incrementing of output ccells by greater than one. The constraint that output cells be 0 before initiation of the control operator is removed, and the initiation and termination of control and data operators are made distinct to allow multiple initiations of data operators before termination of preceding activations as resources allow. This generalization is useful in describing higher level software processes and hardware such as pipeline systems. Thus far, only an ALGOL-type level structure has been suggested. Where does layering enter the picture? The concept of layers enters the representation at the data operator. The function performed by a data operator may be truly a system primitive or it may be a "call" on a lower layer facility. That is, its data cells may be parameters to a task existing on a lower layer which is activated by the initiation of the data operator. This may in turn activate other tasks on still lower layers, but the entire data function appears primitive at the layer on which it is initiated. This is an explicit call upon a lower layer. An implicit call would be the activation of the storage allocator upon activation of an activity in a block structured environment. A formal syntax and semantics for data structure~ Figure 4-Collapsed activity c.Df 317 , WORD = (3q). (d) CDF DOUBL_WD = YiQ.BD (2) (6) STRlJer INTEGER = ; (C) STRUCT LlSTEL =- (INTEGER; DATA') REF LISTEL; PTR)j Cd) INTEGER ~CONSrRAINr. SIMPLE (e) EX TO CF) LlSTEL Figure 5-Example of data structure declaration and a syntax for data operator declarations is being developed. The declarative language is similar to ALGOL 68 and the resulting graphic representation of the data structure descriptors resembles those of Early's VERS.22 The language consists of six basic building block structures-SIMPLE, MULTIPLE BIT STRUCTURE (MBS) , ARRAY, REFERENCE, UNION, and COMPLEX. Examples of SIMPLE structures are integers, reals, etc. MBS's are used to define fields in words or tables. REFERENCE structures denote address variables. UNION is meant in the Set sense, and thus UNION is a place holder for one of a set of structure types. A COMPLEX structure is heterogeneous, consisting of more than one type of basic building block. Another fundamental concept is that of a constraint. The data structure declarations define logical relationships only. Constraints are used to relate these to physical realities such as words, right half words, etc. These two primitive constraints are: WORD and CONTIGUOUS. As an example, consider Figure 5. The length of a WORD is defined in Figure 5a. The constraint 318 Fall Joint Computer Conference, 1972 DOUBL_WD is defined in Figure 5b, and an integer is Figure 5c. 
A complex structure LISTEL (list element) is defined in Figure 5d. It is constrained to occupy a double word, one being an integer, and the other a reference to a LISTEL. The terms DATA and PTR are accessing function names. Figures 5e, f, and g show the graphic representation of the resulting templates. Instances of these data structures may be declared which create descriptors based upon the template but with nodes for allocation information added. Data operators are defined in terms of the types of their input and output data structures. LOGOS has no formal semantics for data operators, so functional definition is not possible at present. However, an informal semantics is being developed to enhance interdesigner communication and to allow simulation of activities if desired. The intent of this brief and incomplete description of the structural philosophy and representation system of LOGOS has been to set the stage for a discussion of the use of LOGOS and the analysis tools it provides the software engineer and systems designer. THE DESIGNER'S ENVIRON1VIENT Before discussing the types of analysis tools LOGOS provides the software engineer and system designer it would be helpful to examine the LOGOS environment by describing a typical design scenario. The systems architects, two or three highly skilled analysts, will either be given or will create a specification for the target system in terms of capabilities, number of users, service times, arrival rates, etc. They will pick the design parameters, block the system into facilities, and identify any obviously common facilities such as memory. In the case of a software system built on an existing computer, the system primitives-the machine instructions-will be specified in advance. The facilities will probably be specified in terms of their external characteristics and will have required performance parameters associated with them. The information will be given to a group of designers (perhaps the architects themselves) who will define these facilities in the LOGOS design data base from their graphics consoles. The individual tasks performed by facilities will be roughed in, and performance parameters defined for them from those on the facility itself. The designers may define canonical schemata and store them in a macrolibrary to be inserted and expanded during the design process. These may include structures such as IFTHENELSE and DOWHILE, the primary control elements of structured programming. 23 In terms of hardware, these macros will include the set of lVISI functions· available to the designers. As the description of a task becomes complete on a layer, the resulting activities can be analyzed,and the activity collapsed. Common tasks may be grouped into facilities on lower layers and defined accordingly, each having a performance specification derived from above. The designers will make their work available to each other by placing it in a common global data base. Here, lower layer facilities common to several designers may be identified. Duplicate and similar facilities and tasks will be replaced by commonly shared facilities. Modifications may be evaluated along the way by substituting modified tasks into the data base and reanalyzing the affected portion of the design. 
This process is continued across descending layers until the data operator functions are in terms of the software primitives of the target system, i.e., machine instructions, implementation language statements, or a trial set of instructions if the entire hardware/software system is being created. Code optimization can then be attempted using one of the newer graph-oriented optimization techniques. Code generation will be discussed briefly in the next section. If the hardware and/ or implementation language exists, actual times will be available for the software primitive operations. These can be reflected up and across layers to determine if the performance requirements were met. If not, redefinition of tasks and/or layer boundaries will be required to attempt to meet the specifications. If the hardware has not yet been designed, it can be begun at this point with the trial instructions and their required times. as the target. The process is identical to the one above, 'but the lowest layer hardware primitives will be hardware elements such as NAND gates, MSI chips, etc. Once again, actual performance information becomes available and is backed out to the software layers. If the resulting performance is inadequate, the interpreter (hardware) can be speeded up by increasing the degree of parallelism or upgrading the technology. On the other hand, the hardware/software interface can be adjusted by redefining as primitives certain key data operators which were originally implemented as calls upon lower layer facilities. These procedures may be applied interactively in combination to reach the desired performance, if indeed, it is attainable at all. Once the instruction set is frozen, code may be generated, and the necessary steps taken for implementation of the hardware. Note that this series of events is a departure from the usual practice of defining the target instruction set as almost the first step in system design and then sending the hardware designers away to build a computer and the programmers to build an operating system. The LOGOS and the Software Engineer -integrated design approach advocated here should (1) reduce the hardware/software interface mismatches, and (2) allow cost/performance tradeoft's to be intelligently evaluated at the proper time-before commitment to hardware and code. The dat~ structures, data operators and resulting code of the operating system are simply data in one of the data structures-memory-of the interpreter (hardware processor). This is true of all program/ interpreter systems. If the interpreter were not to be implemented in hardware but on, say, an 1108, then the data operator primitives would be 1108 machine instructions, and code rather than hardware would be generated. In addition to a framework for representing layered systems, LOGOS will provide the designer with several types of consistency and performance analyses. Further, code generation of target system software, and ultimate implementation of target system hardware are goals which appear attainable. The analyses can be separated into two classesuninterpreted and interpreted. Uninterpreted analysis implies that no interpretation is placed upon the function performed by the data operators for purposes of the analysis. Thus, uninterpreted analyses deal primarily with the control graphs. and are topological in nature. The addition of parallelism or concurrency to an activity raises several analysis questions. 
Of primary interest is whether multiple activations of a parallel activity (schema) with a given initial control state (contents of its c-cells) and data values will result in the same final values in a set of "result" locations. This condition is called determinacy and was formulated originally by Karp and Miller.17 This condition, even after formalization, is mathematically difficult to prove. Another condition, more stringent but easier to verify, has been formulated by Karp and Miller. 1. A schema is determinate if, given an initial state, qo and an initial set of values, each data location has a fixed sequence of values. With this condition satisfied, then a schema will surely produce consistent values in the "resultn locations provided that the algorithm terminates. Karp and Miller further showed that the above condition is equivalent to the following two conditions. 2. (i) No two data operations can be concurrently enabled to "write" into a common data location. (ii) No data operation can be enabled to "write" into a data location while another data operation is simultaneously enabled to "read" from the same location. 319 From conditions· (i) and (ii), a schema is determinate provided that it is free of "race" conditions of two types. This situation should not startle hardware designers who have always faced this problem. Karp and lVliller gave conditions on a parallel schema which allow determinacy analysis to be conducted on the control graph. The analysis tool is a mathematical construct called a "vector addition system" (VAS); for a given schema, the vectors used have one component corresponding to each c-cell in the control graph. A vector qo gives the initial control state, and, for each control operator, e, a vector oe gives control state changes when control operator e occurs. These change vectors may be derived from those shown in Figure 2, but may be generalized to integers greater than ± 1 for higher level representation. A "tree of nodes" is generated from the root node qo which corresponds to the tree of attainable control states of the schema. The algorithm identifies loops in. the control and may be used for finite and infinite attainable state schemata. A complete treatment may be found in Reference 20. The resulting tree can be used to determine those control operators which can·· be simultaneously enabled, and, hence, those data operators which are concurrent. By examining the input and output data cells of those data operators, conditions (i) and (ii) above can be verified. The blockhead/blockend of the activity in LOGOS limit the scope of the analysis, and thus can limit the size of the tree to manageable size. The activity can be analyzed for determinacy and collapsed. It will then appear as a single operator pair in more global analyses. The vector addition system can be used to check for proper termination of an activity, i.e., can a control! data operator pair remain enabled after the blockend of an activity is enabled? Further, is the topology of the control graph such that the activity will not terminate? Remember that this is uninterpreted analysis, and, consequently, the results of predicate operations are not known. Therefore, in some cases, all that can be said is that there exists a path which if taken, will result in no termination. Similarly, by viewing all activities as primitives, a potential recursion analysis can be carried out using the vector addition system. 
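A bare-bones version of this vector addition system analysis can be sketched in a few dozen lines: enumerate the attainable control states from q0 by adding the change vectors of enabled operators, and, for every state in which two operators are simultaneously enabled, test conditions (i) and (ii) on their data cells. The Python sketch below does exactly that, but with a plain bounded search rather than the full Karp and Miller tree construction (no abbreviation of unbounded loops); the two-branch schema at the bottom is invented for illustration.

    # Sketch: bounded reachability over a vector addition system plus the
    # read/write conflict test for determinacy (conditions (i) and (ii)).

    from itertools import combinations

    def enabled(state, delta):
        """An operator may occur if no c-cell would go negative."""
        return all(s + d >= 0 for s, d in zip(state, delta))

    def reachable_states(q0, deltas, limit=10_000):
        seen, frontier = {q0}, [q0]
        while frontier and len(seen) < limit:   # bounded; no omega-construction
            state = frontier.pop()
            for delta in deltas.values():
                if enabled(state, delta):
                    nxt = tuple(s + d for s, d in zip(state, delta))
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
        return seen

    def determinacy_violations(q0, deltas, reads, writes):
        """Pairs of concurrently enabled operators that share a written
        data cell (write/write or read/write races)."""
        bad = []
        for state in reachable_states(q0, deltas):
            live = [op for op, d in deltas.items() if enabled(state, d)]
            for a, b in combinations(live, 2):
                ww = writes[a] & writes[b]
                rw = (reads[a] & writes[b]) | (writes[a] & reads[b])
                if ww or rw:
                    bad.append((state, a, b, ww | rw))
        return bad

    # Invented schema: c-cells (c0, c1, c2); f and g run in parallel.
    q0 = (1, 0, 0)
    deltas = {"fork": (-1, 1, 1), "f": (0, -1, 0), "g": (0, 0, -1)}
    reads  = {"fork": set(), "f": {"x"}, "g": {"x"}}
    writes = {"fork": set(), "f": {"y"}, "g": {"y"}}   # both write y: a race
    print(determinacy_violations(q0, deltas, reads, writes))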
These types of analyses fall into the category of general control path analysis, and additional algorithms in this family can be identified and easily implemented using the VAS. A major weakness in the integrity of computer systems has been the management of system resources and the prevention of system deadlocks. This is particularly true in systems with a high degree of real or apparent concurrency. This problem has been extensively studied, and much insight has been gained. 6 ,l1,24,25 Holt25 has 320 Fall Joint Computer Conference, 1972 developed graph models for deadlock and resource allocation which' are directly applicable to· the' LOGOS environment; Resources are represented as control cells, and a topological analysis using adaptations of Holt's results can be performed. Once again, a layered structure tends to limit the scope of analysis. System performance analysis depends upon knowledge of arrival rate and service reqm~st distributions, and, thus is only as good as the model load. However, actual path transit times can be computed in the LOGOS environment, and if model service request distributions and arrival rates are available,performance statistics .can be gathered before implementation using a combination of path analysis and simulation, if necessary. Interpreted analysis deals with the correctness of the algorithms used in implementing the activities. At present, LOGOS has no automated solution to the program .correctness problem; The layered structure .of target· systems, together with the communications disciplines enforced by the syntax of the representation and the various other analysis algorithms tend to assure logical and structural consistency. However, a logically consistent, but incorrect algorithm is undetectable. Current work by Scott and Strachey,26 leading toward a formal mathematical theory of hierarchical systems and semantics, may well be the answer. Results of this work could be adapted to r~place LOGOS current data graph syntax and semantics .and provide' a certifiable representation. In· the interim, interpreted data' analysis algorithms based upon the functi,onal attributes' of .the data operators are being considered. For example, a data operator must access . data structures of the appropriate type and compute results which correspond to ·the types of output. data structures to which it is connected. This is useful in analyzing data functions which are implemented by interlayer facility activations. In critical areas" actual simulation of the, algorithms in question may be performed directly. Finally, if a global semantic such as ALGOL 60 or FORTRAN is imposed, environmental ,consistency algorithms such as scope of reference can be included modularly. CURRENT STATE OF LOGOS The LOGOS system is being implemented on a Digital Equipment PDP-I0 with a Bolt, Beranek and Newman paging box and TENEX executive system. The primary graphics terminals are two IlVILAC PDS-l display systems whichcomrnunicate with thePDP-IO at 9600 haud. The implementation language is SAIL (Stanford Artificial Intelligence Language). A multi- designer data base management system is being implemented using the LEAP associative data structures of SAIL and the TENEX virtual memory facilities. The system provides for local (single user) and global (shared) data bases with linking between local and global information in a controlled manner. The data base management system is based upon earlier work by M. Pliner. 
27 The graphical representation system is implemented together with the following analysis algorithms: graphical syntax checking, determinacy, halting and termination, and repetition freeness. Implementation of generalized control path analysis is also under way. The remainder of the control analyses, deadlock and resource allocation, are scheduled to be implemented and integrated by September 1973. It should be noted here that all of· the analysis packages are modular and act upon the standard internal representation, thus allowing new packages' to be added· when necessary. The implementation specifications for the data structures and data operators are scheduled for completion in December 1972, and implementations should be complete by September 1973 along with the associated analysis routines. These analysis routines assume a FORTRAN environment with a static block structure but may be replac~dif another semantic is chosen. Performance analysis algorithms should be implernented and integrated by September 1973. Thus, with the very major exception of 'a, formal semantics and corresponding attack on program correctness, LOGOS is scheduled to have a funning representation and analysis system by September 1973. The implementation of target systems requires the production of a code generator for the software and a "hardware cornpiler" for the hardware portions of the representation. Here again, Scott's work may provide a general solution to the semantics problem for the code generators, but even without' such results, if the software primitives in the data graphs are machine language instructions of the target machine, code generation becomes rather straightforward. In addition, the graphic form of the program tasks will allow application of the newer optimization. techniques to the target software. A first cut code, generation scheme for sequential (rather than parallel) systems should be implemented in early 1974. Rather than re-create a Hhardwarecompiler" which would require 30-50 man years, LOGOS has chosen to interface with existing hardware CAD systems at the logic equation/logic diagram level. .Although much of the information which could help in optimization of the hardware will be lost in' going to' the equations,' the time scale and scope of the project preclude attacking the hardware CAD problem directly. It is felt, however, LOGOS and the Software Engineer that the graphic representation may provide helpful insight in the partitioning and placement operations of hardware CAD, and those problems will continue to be studied. The hardware equation/diagram outputs are scheduled for September 1974. In parallel with these efforts, an attempt is being made to define one or more programming languages to serve as alternate external representations of the target system rather than the current graphical representation. This is being done because some programmers may feel uncomfortable with the graph form, and because the human engineering and scope management problems become significant as the complexity of the target graphs increases. The LOGOS representation has been used off-line to describe various types of small systems and subsystems including a PDP-8. The resultirig descriptions are concise, and being able to see both the structure and function of the systems in one "picture" aids in. understanding the target system. With regard to implementation, the resident executive in the IMLAC display'processors was designed according to the LOGOS structural philosophy. 
The IMLAC system proVides the designer. with facilities of (1) creating a picture and designating it a subroutine for transmission to the PDP-IO, (2) editing a subroutine, and (3) deleting a subroutine. The system exists on six layers as shown in Figure. 6. The lowest is the PDS-I hardware used by all higher layers. The next LAYER 1 h/ SUBROUTINE CREATE LIGHT PEN TRACKING LAYER 2 LAYER 3 GRAPHICS MESSAGE HANDLER KEYBOARD CHARACTER HANDLER ~ " 1·· CHARACTER ~ RECEIVER CHARACTER ~ TRANSMITTER IMLAC HARDWARE Figure 6-Layer structure of Imlac executive LAYER 4 LAYER 5 LAYER 6 321 layer facility .is the character transmitter (all messages, text and graphics are sent to the PDP. . IO as multiple character strings). Layer 4 contains the keyboard character handler and the character receiver both of which are users of the character transmitting facility. The; character receiver. uses the character transmitter facility to control the transmission of characters from the PDP-IO to the Il\1LAC. The next layer has three independent facilities-'-the light pen tracking facilities, the graphics message handler, and the core management facility. All of these facilities are used by the facilities on layers 1 and 2, the subroutine edit and subroutine create and delete facilities. The communications discipline between the facilities. are well-disciplined according to LOGOS design principles. The design and implementation of the executive required about six man months of effort. It occupies approximately 3000 words in Il\1LAC. core and was coded in assembly language. As with .the 'THE'li and 'VENUS'7 systems, coding errors were discovered,; but few logical errors were committed in the design. These proved easy to identify and correct. CONCLUSION The aim of Project LOGOS is to provide the computer system designer with a .computer-assisted design environment in which good engineering practice can be applied to large-scale target systems and verified after the fact. The basis of this good engineering practice is a structural view of computer systems which is a generalization of Dijkstra's2 and l\1il1s'23 structured programming for sequential software. Dijkstra's 'THE' system6 is a result of this philosophy as is the VENUS system.7 Both these and the IMLAC executive have demonstrated the payoffs of a well-disciplined approach to structure. They were implemented in a fairly short time by small design groups (VENUS required about 6 man years for the design and implementation of the operating system and the support software). They were easily checked out and modified, and have proven to be stable, reliable systems. The primary contribution of LOGOS in this area is that it provides a uniform, analyzable representation in which to. express. these otherwise abstract notions of system structure, one which leads directly to the implementation of· the target software or hardware: It also allows the designer to, express the maximum degree of real and apparent concurrency in his target system and provides the analyses required io evaluate its- effect. Both 'THE' and VENUS are small operating systems implemented on small to medium scale machines, yet even they were found to contain a few errors resulting 322 Fall Joint Computer Conference, 1972 from breaches of discipline. True, these errors were easily corrected, but as the size and complexity of the operating system and hardware increases, the difficulties of enforcing the disciplines, detecting errors, and correcting them without introducing more will increase nonlinearly. 
It is because of this complexity explosion that a CAD environment such as LOGOS is required for large scale systems.

A LOGOS-type system can provide several other advantages to the software engineer and system designer. First, because performance measurements can be made before rather than after implementation, modifications to the system can be proposed and their effects evaluated economically. In particular, the final positioning of the hardware-software interface can be postponed until quite late in the design cycle and can be made a true function of performance vs. cost. Second, the design team will tend to be smaller. The computer will act as the "bookkeeper" and will perform many of the analyses which have traditionally been attempted manually or not at all. Third, the increased degree to which a target system can be certified before implementation (even without formal semantics) should reduce the integration and checkout cycle significantly. It may also be possible to produce more complete diagnostics in a LOGOS environment, since the entire system description as well as its implementation is stored within the design data base. This is an area for continued research. Finally, although this hints of "big brother," valuable management and scheduling information can be extracted from such a system. The effectiveness of designers, the times required to complete various portions of the system, etc., could be used in estimating, staffing, and scheduling future systems.

LOGOS is an open-ended system. Although a first production system will be complete in 1974, it is expected that the users themselves will enhance, modify and tailor the design environment to their needs as new technology becomes available.

ACKNOWLEDGMENTS

The LOGOS design environment described in this paper is the result of work done over the past three years by Professor E. L. Glaser, principal investigator; Dr. F. T. Bradshaw; S. Katzke; the author; and several others. In particular, much of the philosophy of system structure which underlies the LOGOS system was articulated by Dr. Bradshaw, and the syntax and semantics of target system data structures and data operators were developed by S. Katzke during his doctoral research.

REFERENCES

1 F G HEATH, C W ROSE
The case for integrated hardware/software design with CAD implications
IEEE Computer Conference Digest September 1972

2 E W DIJKSTRA
EWD249-Notes on structured programming
T.H.
Report 70-WSK-03 Technological University Eindhoven Netherlands April 1970

3 T BREDT
A model for parallel computer systems
Technical Report No 5 STAN-CS-70-160 Stanford University April 1970

4 C G BELL, A NEWELL
Computer structures: readings and examples
McGraw-Hill Book Company New York New York 1971

5 M BARAY, Y H SU
A digital system modelling philosophy and design language
Proceedings Eighth Annual Design Automation Workshop 1971

6 E W DIJKSTRA
The structure of the T.H.E. multiprogramming system
Comm ACM Vol 11 No 5 May 1968 pp 341-346

7 B LISKOV
The design of the VENUS operating system
Comm ACM Vol 15 No 3 March 1972 pp 144-149

8 C D MARSH
Automation of the design and manufacturing of a large digital computer
IEE Electronics & Power October 1970 pp 375-379

9 M R CORLEY
The graphically accessed interactive design of thermally stressed pipe systems
Proceedings Ninth Annual Design Automation Workshop 1972

10 F T BRADSHAW
Some structural ideas for computer systems
IEEE Computer Conference Digest September 1972

11 E W DIJKSTRA
Co-operating sequential processes
Programming Languages ed F Genuys Academic Press 1968

12 M J SPIER, E I ORGANICK
The MULTICS interprocess communication facility
Second ACM Symposium on Operating Systems Principles Princeton University October 1969

13 E W DIJKSTRA
A constructive approach to the problem of program correctness
BIT Vol 8 1968 pp 174-186

14 C A R HOARE
Proof of a program: FIND
Comm ACM Vol 14 No 1 January 1971 pp 39-45

15 N WIRTH
Program development by stepwise refinement
Comm ACM Vol 14 No 4 April 1971 pp 221-227

16 C A PETRI
Kommunikation mit Automaten
Schriften des Rheinisch-Westfälischen Instituts für Instrumentelle Mathematik an der Universität Bonn Nr 2 Bonn 1962

17 R M KARP, R E MILLER
Parallel program schemata
Journal of Computer and System Sciences 3 1969 pp 147-195

18 A W HOLT, F COMMONER
Events and conditions: an approach to the description and analysis of dynamic systems
Third Semi-annual Technical Report Part II Project: Research in Machine-Independent Software Programming Applied Data Research Inc April 1970

19 F L LUCONI
Asynchronous computational structures
Doctoral Thesis MIT Cambridge Mass January 1968

20 F T BRADSHAW
Structure and representation of digital computer systems
Jennings Computing Center Report No 1114 Case Western Reserve University Cleveland Ohio January 1971

21 C W ROSE
A system of representation for general purpose digital computer systems
Jennings Computing Center Report No 1113 Case Western Reserve University Cleveland Ohio August 1970

22 J EARLEY
Toward an understanding of data structures
Comm ACM Vol 14 No 10 pp 617-627

23 H D MILLS
Mathematical foundations for structured programming
FSC 72-6012 Federal Systems Division International Business Machines Corporation Gaithersburg Maryland February 1972

24 A N HABERMANN
Prevention of system deadlocks
Comm ACM Vol 12 No 7 July 1969 pp 373-385

25 R C HOLT
On deadlock in computer systems
Doctoral Dissertation Cornell University Ithaca New York January 1971

26 D SCOTT, C STRACHEY
Toward a mathematical semantics for computer languages
Tech Monograph PRG-6 Oxford University Computing Laboratory August 1971

27 M S PLINER
PDMS-a primitive data base management system for representing structured data in an information sharing environment
Doctoral Dissertation Case Western Reserve University Cleveland Ohio September 1971

Some conclusions from an experiment in software engineering techniques

by DAVID L.
PARNAS
Carnegie-Mellon University
Pittsburgh, Pennsylvania

In two earlier reports1,2 we have suggested some techniques to be used in producing software with many programmers. The techniques were especially suitable for software which would exist in many versions due to modifications in methods or applications. These techniques have been taught in an undergraduate course3 and used in an experimental project in that course. The purpose of this report is to describe the results that have been obtained and to discuss some conclusions which we have reached.

The experiment was completely uncontrolled, the programmers generally inexperienced and poor, and the programming system used was not designed for the task. The numerical data presented below have no real value. We include them primarily as an illustration of the type of result that can be obtained by use of the techniques described in the earlier reports. We consider these results a drastic improvement over the state of the art. Major changes in a system can be confined to well-defined, small subsystems. No intellectual effort is required in the final assembly or "integration" phase.

THE PROJECT

The class was asked to produce the KWIC index system described in Reference 2. The project was divided into six modules, but two were combined because they were clearly simpler than the remaining four.* For each of the five assignments we specified four distinct types of implementation. Each student was given one of those to program. Had the experiment been a complete success, any combination of one version of each assignment would have run correctly; we would have had 4⁵ = 1024 working versions (five independent selections from sets of four elements). In addition, each student was assigned to write a program which would "checkout" some module other than his own.

* See Appendix 1 for a brief description.

Because of the billing policies of our University Computing Center, the programs were to be written and run in WATFIV, a version of FORTRAN. All the defined functions were to be made available as either subprograms or FORTRAN functions. Only minor additional information was supplied beyond the specifications given in Reference 2.

(1) Where necessary, the error routines were given an additional parameter to be used in identifying the module whose error procedure should be executed. This arose only where the same function could be called from more than one module.
(2) Module identification numbers were assigned for use in selecting the error routine; a sketch of this convention follows the list.
(3) Conventions for the naming of labelled common were established. No programmer ever knew the name of the common used by other programmers. The conventions merely avoided duplication.
(4) Maximum values for the various parameters were specified.
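Conventions (1) and (2) amount to routing every error report through a dispatcher keyed by the caller's module identification number. The following Python sketch illustrates only that idea; the module names and numbers are invented for illustration, and the actual project expressed the same convention with WATFIV subprograms.

```python
# Hypothetical module identification numbers; the real project assigned such
# numbers so that a shared routine could select the right error procedure.
LINE_STORAGE, CIRCULAR_SHIFTER, ALPHABETIZER = 1, 2, 3

def line_storage_error(code):    print(f"module 1 (line storage) error {code}")
def circular_shift_error(code):  print(f"module 2 (circular shifter) error {code}")
def alphabetizer_error(code):    print(f"module 3 (alphabetizer) error {code}")

ERROR_ROUTINES = {
    LINE_STORAGE: line_storage_error,
    CIRCULAR_SHIFTER: circular_shift_error,
    ALPHABETIZER: alphabetizer_error,
}

def report_error(module_id, code):
    """Shared routine callable from several modules: the extra module_id
    parameter (convention 1) selects whose error procedure runs (convention 2)."""
    ERROR_ROUTINES[module_id](code)

report_error(CIRCULAR_SHIFTER, 7)   # -> module 2 (circular shifter) error 7
```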
The students did not know which combinations of systems would be tested, nor did they know which version of the module they would check. For that reason they could use no information other than the published specifications. On completion of the programming and checkout of individual modules, complete systems were assembled by a graduate student who had absolutely no knowledge of the internal structure of any module. The results indicated below were obtained with only one major difficulty. All students had dimensioned their arrays for the maximum possible values of the parameters. The combined storage exceeded what was available in the programming system. The sizes of the arrays were easily reduced to a value appropriate for the actual test. (In a language such as ALGOL, where the dimensions of arrays could be variables, this difficulty would have been easily avoided.)

Table I gives the versions of each module which we judged correct.

TABLE I-Final State of Assignments for Individual Participants
(For each of the five assignments, versions A through E are marked OK, INCORRECT, NOT ASSIGNED, NOT COMPLETED, or STUDENT DROPPED.)

Notes:
1. In our calculation of the potential number of working combinations we excluded versions which were not assigned or were assigned to students who did not complete the course.
2. No work was supplied by this student.
3. The students assigned to check these programs did not do so. The modules were thought (by the instructors) to be incorrect, but the simplest test was to include them in combinations with programs which were working properly. The suspect programs made errors which were detected by the other modules. The errors were verified by the instructors to be violations of the specification of the modules in question. In fact, in both cases the error had been detected by the student's own tests, but they failed to examine the output closely enough to notice. (These were, by any measure, two of the poorest students in the class.)
4. This program was clearly incorrect, but still did not violate the restrictions specified for the modules which it called. Thus combinations involving this program would run but would produce incorrect output. It produced the same incorrect output in every combination tested. The program was "completed" by the student well past the due date and the "checker" was not able to do his job.
5. This program simply failed to terminate in any case. The error was found by the checker.

From this we may calculate that there are 192 working combinations. We could not test all of these. An experiment was planned so that (1) each version was used in at least two combinations and (2) each version was in at least one combination where it was the only difference from another tested combination. Table II shows the results.

It should be noted that the fact that only 192 of the possible 1024 combinations worked does not represent a failure of the method. It represents the failure of five students out of 20 to complete the work assigned to them. One can argue that these failures provide additional evidence of the value of the method. In each case it was possible to show, without doubt, that the individual student had failed to do his assignment. In most projects to construct programs in teams, some ambiguity in the individual work assignments results in some difficulties which cannot be assigned to an individual programmer. Because of the use of formal specifications in this project we had no cases in which a program was found to meet its specifications yet would not work in combination with other programs which met their specifications.
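The count of working combinations follows directly from Table I: a complete system is one correct version chosen for each of the five assignments, so the total is the product of the per-assignment counts of correct versions. A minimal Python sketch of that calculation, with hypothetical per-assignment counts chosen only so that the product matches the 192 reported above:

```python
from math import prod

# Hypothetical per-assignment counts of versions judged correct (the actual
# cell-by-cell contents of Table I are not reproduced here); any counts whose
# product is 192 are consistent with the total quoted in the text.
correct_versions_per_assignment = [4, 4, 4, 3, 1]

# A complete system uses one correct version per assignment, so the number of
# working combinations is the product of the per-assignment counts.
working_combinations = prod(correct_versions_per_assignment)
print(working_combinations)  # -> 192

# With all four versions of all five assignments correct, the family would
# have had 4**5 = 1024 members, the figure the experiment aimed for.
print(4 ** 5)  # -> 1024
```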
Further experimentation

1. When an earlier version of this note was circulated privately early this year, Mr. Thibault of IRIA, Rocquencourt, France studied the data and suggested trying the combination 1B, 2B, 3D, 4E and 5D, which he believed would be significantly faster than any of those tested.4 It ran in 4.4 seconds.

TABLE II-Execution Times for Some of the Combinations Tested
(Execution time in seconds; excludes compilation of 6-8 sec.)

Combination Tested    Execution Time (sec.)
1A 2B 3B 4B 5A          37.26
1A 2D 3D 4B 5A          11.42
1A 2D 3A 4C 5A          10.87
1B 2E 3A 4C 5A          10.31
1A 2E 3A 4B 5D           8.53
1B 2A 3E 4C 5B          21.79
1A 2A 3B 4B 5B         302.99
1A 2A 3B 4B 5A          50.16
1A 2A 3B 4C 5A          36.69
1A 2A 3D 4C 5A          11.07
1A 2B 3D 4C 5A          10.99
1A 2A 3B 4E 5A          43.30
1A 2D 3B 4E 5A          43.61
1A 2D 3B 4E 5D          19.17
1A 2E 3B 4E 5D          19.16
1A 2E 3B 4C 5D          28.48
1A 2B 3B 4C 5D          27.23
1A 2B 3D 4C 5D           8.43
1A 2B 3D 4C 5B          76.34
1A 2B 3D 4B 5B         113.32
1A 2B 3B 4C 5B         238.88
1A 2B 3E 4C 5D          10.06

2. We have just repeated the whole experiment with a somewhat larger class. The results were essentially the same. We estimate that the family of programs has 1100 members; more than 400 of these were tested. Performance improves somewhat, ranging between 3 and 13 seconds. The only interesting distinction between the two experiments was that the instructor (project leader) changed from intensely interested to bored and unconcerned, with no noticeable effect. We also eliminated the problem with storage limitations mentioned above.

Conclusions

1. We cannot avoid stating our conclusion that the experiment has revealed some validity in the comments of our earlier papers.2,3 Clearly one purpose of this paper is to draw your attention to those earlier ones.

2. Our most significant new conclusion comes in the area sometimes called "project management". Recent papers have suggested that the project manager must devote a significant part of his best manpower to the "integration phase". In our experiment the "integration phase," while not mechanized, was so simple that it could have been mechanized. Even in the few cases where errors did occur, the system had been structured in such a way that diagnostic messages automatically indicated the module making the error. We had no need for anyone who had a thorough knowledge of the whole system. Our experience indeed suggests that the integration phase is a very poor place to invest one's manpower. The limited capacity of our minds makes us more efficient when our job depends on a relatively small amount of knowledge. Moreover, if we plan our project management around a large "integration phase," we will have to invest that manpower again whenever we change some part of the system.

Our experiment suggests that manpower can be much more profitably invested in the "preprogramming" or "design" phase. The success of our project depended largely upon the precisely written module specifications described in Reference 1. The "cost" or intellectual effort required to produce one of these module specifications was comparable to the cost of producing an implementation of the module. Such predesign work therefore appears to many as unjustifiable overhead. When we amortize this cost over the number of versions of the system which are finally built, and consider the savings realized in the final "integration" phase, it appears to us that the overhead is well justified. Efforts in the industry to invest heavily in a "pre-design" or "concept" phase have often proven fruitless because the outcome was a set of natural language documents which were so general that they provided almost no decisions to guide the development groups. When this predesign phase produces precise module specifications the payoff is much more significant. Additional amortization of the "pre-design" effort can occur when the modules or their specifications are used (either unchanged or slightly modified) in a later project.
3. Another important conclusion lies in the area of documentation. Several firms have invested heavily in formalized documentation standards intending to make all information easily available to everyone on the project. Our experiment suggests that the effort in these projects can be focussed. Precise documentation of the external characteristics of each module is essential and should be in a standard notation. Our project had minimal documentation about the internals of the one-man assignments. Industrial practice would require more effort in that area than we put into it, but much less effort than is now common. More significant, the specifications produced in the pre-design phase were the only external documentation required throughout the project. These documents were updated several times as errors were discovered, but no additional descriptive material was needed. This is yet another way that the effort invested in the pre-design phase can be amortized.

4. Our experience demonstrated the importance of careful attention, during the "pre-programming" phase, to the possibility of errors in the running program. Because of our careful attention to errors in the design phase, errors which did occur when the systems were assembled were quickly traced to their source, and meaningful diagnostic information was produced with almost no effort on the programmer's part. A paper reporting what we have learned in this area is in preparation.

5. Our experience has indicated the great value of independent module tests (by persons other than the module author) before integration. In an earlier effort of this sort we required each programmer to test his own module before integration. In the two experiments which we discuss here, we required an additional person to test the module against the formal specifications (another use of our pre-design efforts). Our success rate increased drastically, and there were apparently two reasons: (1) sloppy programmers do sloppy tests; (2) the specifications, although precise, can be misinterpreted by human programmers. A misinterpretation by the programmer which resulted in an error in his module often results in a corresponding error in his tests. An independently written test was unlikely to share the same misconceptions.

We are well aware that, as E. W. Dijkstra has put it,6 "Program testing can be used to show the presence of bugs, but never to show their absence." Showing the presence of bugs, however, is a very valuable service. We eagerly await the day that professional programmers habitually produce programs which are written so that they can be carefully proven to be error free. In the meantime we suggest that effort invested in independent pre-integration testing is well worthwhile. Our experience also suggests that both the hierarchical structure which can be found in the system2 and the abstract nature of the modules themselves greatly ease the building of the "scaffolding" required for independent module tests. To test a given module one need simulate only those modules immediately below it in the system hierarchy. Further, the nature of the modules means that many of them can be directly simulated by arrays for testing purposes.

NON-CONCLUSIONS

The reader of this paper and the references might be led to some conclusions which those closer to the project would not draw.
We mention them here to avoid misleading the reader.

different from usual 3-valued or multi-valued logic. The main difference is that AETL is based on the most simple method, that is, the comparison of algebraic magnitudes in the operation of multi-valued variables, instead of seeking a basic set of logical operations which realize all logical functions in multi-valued logic. Hardware realizations of the basic operation become straightforward by this AETL method. Therefore AETL has made it possible to design 4-valued, 9-valued, and 17-valued logical hardware on the same basis.

SYNTHESIS PROBLEM BY AETL6

Definition of AETL

Let us define a notation as follows (Figure 1):

(u > v) = 1 if u > v, and (u > v) = 0 if u ≤ v.

The output of an AETL is

y = Σ_{j=1}^{n} w_j (u_j > v_j) = Σ_{j=1}^{n} w_j x_j,   x_j = 0, 1;   y = 0, 1, ..., Σ_{j=1}^{n} w_j,

where y is the output, u_j, v_j are the inputs, and w_j are the weights.

Figure 1-Symbolic expression

AETL has one output. However, logically, AETL generates many outputs by comparing y with values t_j. Let u_j be variable and v_j, t_j be constants. Then Figure 2 is reduced to Figure 3. This restricted usage of AETL is called Multiplex Median Logic, or MML for short.7,8 An MML circuit can generate a complete set of median functions for a given weight vector (Figure 3). Here the median functions M_t^W are defined as follows: for a weight vector (w_1, w_2, w_3, ..., w_n),

M_t^W(x_1, ..., x_n) = (Σ_{j=1}^{n} w_j x_j > t).

Figure 3-Graphic symbol of MML

Synthesis of logical functions9,10

V_n is the whole of n-dimensional binary vectors. When the n variables of a function f are partially symmetric and classified into h blocks as n_1 + n_2 + ... + n_h = n, we say that f has a symmetric pattern (n_1, n_2, ..., n_h). First of all, we give a fundamental theorem concerning synthesis problems by MML.

Theorem 1. The necessary and sufficient condition that a given n-variable logical function f can be realized by two cascaded MMLs as shown in Figure 4 is as follows:

if f(X) ≥ f(Y) then W·X ≥ W·Y, for all X, Y ∈ V_n,   (1)

where w_i is the weight of x_i and W = (w_1, w_2, ..., w_n). The weight sum of the second stage MML does not exceed that of the first stage.

Figure 4-Simple cascade connection

Theorem 2. When a given function f has a symmetric pattern (n_1, n_2, ..., n_h), f can be realized by assigning the weights of the first stage MML in Figure 4 as follows. Let all variables belonging to the kth block have the same weight u_k, where

u_1 = 1,   u_k = Σ_{j=1}^{k-1} u_j n_j + 1   (2 ≤ k ≤ h).   (2)

Then the total weight sum W is equal to Σ_{k=1}^{h} n_k u_k; W does not depend on the order of assigning the weight for each block.

Lemma 2-1. An arbitrary n-variable symmetric function can be realized by at most two MMLs whose weight sum W is at most n.
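To make the MML definition above concrete, the following Python sketch evaluates the weighted sum y = Σ w_j x_j and the set of median functions M_t = (y > t). The particular weight vector, the range chosen for t, and the majority-function check are illustrative choices for this sketch, not taken from the paper.

```python
from itertools import product

def mml_outputs(x, w):
    """Return (y, medians): the weighted sum y = sum_j w_j*x_j and the
    median functions M_t = (y > t) for t = 0 .. W-1 (the convention used
    in this sketch)."""
    y = sum(wj * xj for wj, xj in zip(w, x))
    W = sum(w)
    medians = [int(y > t) for t in range(W)]   # M_0, M_1, ..., M_{W-1}
    return y, medians

# With unit weights, M_1 is the 2-out-of-3 majority x1*x2 + x1*x3 + x2*x3.
w = (1, 1, 1)                      # illustrative weight vector
for x in product((0, 1), repeat=3):
    y, medians = mml_outputs(x, w)
    majority = int(sum(x) >= 2)
    assert medians[1] == majority  # M_1(x) = (y > 1) = majority(x)
```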
Minimization of weight sum W

For practical purposes, there is no reason to confine ourselves to the case of Figure 4. Figure 5 is the most generalized connection of two MMLs. Concerning the minimization of W in this case, we show the next theorem.

Figure 5-Bypassed cascade connection

Theorem 3. In Figure 5, given a function f and a given weight W = (w_1, w_2, ..., w_n) of the first stage, the necessary and sufficient condition for f to be realized is as follows: f can be realized for the given weight W if and only if there exists, without depending on the parameter h, a vector A = (a_1, a_2, ..., a_n) such that A·X > A·Y for all X, Y ∈ C_h with f(X) > f(Y), where C_h = {X | W·X = h}.

We presented a kind of table look-up method. Once we have prepared the table of SWV, which means Standard Weight Vector, we can select an optimum weight vector from the table of SWV for a given function, where "optimum" means the minimum weight sum. The numbers of SWV for 1- to 4-variable functions are given in Table I. As seen from the table, the ratio of the number of SWV to that of the representative boolean functions is very low, so this table look-up method is effective.

TABLE I-N1, N2 of n-Variable Boolean Functions
(N1: number of standard weight vectors; N2: number of representative functions. N2 = 14 for n = 3 and N2 = 222 for n = 4.)

The minimum value of W needed to realize all 4-variable boolean functions by Figure 4 is 12. The minimum value of W may be decreased by using Figure 5.

Figure 6 presents synthesis examples for all 14 representative functions of 3 variables. All of them are realized by at most two MMLs, each of which has at most 4 as the weight sum and at most 3 input pairs.

Figure 6-Synthesis examples of 3-variable functions

Table II presents those for all 222 representative functions of 4 variables. All of them are realized by at most two MMLs, each of which has at most 10 as the weight sum and at most 6 input pairs; 212 functions out of 222 can be realized with at most 8 as the weight sum and at most 4 input pairs.

Example 1: the No. 132 function (0, 1, 3, 4, 6, 9, 10) can be realized as shown in Figure 7a. Example 2: the No. 62 function (0, 1, 2, 3, 4, 8) is a 1-realizable function, and the fourth column is marked by the symbol "T", which stands for threshold function (Figure 7b).
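Function No. 62, given above by its true minterms (0, 1, 2, 3, 4, 8) and marked "T", can be confirmed to be a threshold function by a small brute-force search over integer weights. In the sketch below, the bit ordering of the minterm index and the search bounds are assumptions made for illustration, not taken from the paper.

```python
from itertools import product

# Function No. 62 of Table II, given by its true minterms (0, 1, 2, 3, 4, 8).
# Taking x1 as the most significant bit of the index is an assumption.
TRUE_MINTERMS = {0, 1, 2, 3, 4, 8}

def f(x):
    index = (x[0] << 3) | (x[1] << 2) | (x[2] << 1) | x[3]
    return int(index in TRUE_MINTERMS)

def find_threshold_realization(bound=2):
    """Search small integer weights w and a threshold T with f(x) = [w.x >= T]."""
    for w in product(range(-bound, bound + 1), repeat=4):
        for T in range(-4 * bound, 4 * bound + 1):
            if all(int(sum(wi * xi for wi, xi in zip(w, x)) >= T) == f(x)
                   for x in product((0, 1), repeat=4)):
                return w, T
    return None

# Prints a valid (w, T) pair, confirming the "T" (threshold function) marking.
print(find_threshold_realization())
```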
How to look up the table is as follows: compute the characteristic vector of the given function

TABLE II-Synthesis of 4-Variable Functions
(For each of the 222 representative 4-variable functions the table lists its number, its characteristic vector, the connection, the standard weight sum, and the first-stage weights W; threshold functions are marked "T".)
When (x_Uj > x_Lj) = 1 (= 0), Q_JU is turned on (off). As to (2), the current source generates a current proportional to the weight w_j for each pair of inputs. The current from the source flows through Q_JU or Q_JL according to the switching operation and is summed at the collector-circuit common resistance R_J, generating a voltage proportional to the sum of the individual currents. As the collector internal resistance of transistor Q_J is sufficiently high compared with R_J, the analogue summation of the currents is achieved almost completely. The unsaturated current switch pair enables high speed logic operation.

Figure 8 shows the 4-input-pair AETL circuit (w_1 = w_2 = 1; w_3 = 1, 2; w_4 = 1, 2, 3, 4). Q_E1, Q_E2, ..., Q_E4 form the constant current generators. Diodes D1, D2 and D3 compensate the variation of the unit current due to the ambient temperature. By connecting or not connecting the terminals Z31, Z41 and Z42 to E to select the values of the emitter resistances R_E3 and R_E4, the weights w_3 and w_4 can be chosen for the logic function to be realized. The voltage level shift circuit formed by Q_H and R_H1 prevents collector saturation of Q_JU and Q_JL.

Figure 8-4-2-1-1 AETL circuit

The transition of the signal level in the circuit is shown in Figure 9. The circuit is perfectly symmetric in its upper and lower sides except for the level shift circuit. There is a complementary relation between the outputs

y_U = Σ_{j=1}^{4} w_j (x_Uj > x_Lj)  and  y_L = Σ_{j=1}^{4} w_j (x_Lj > x_Uj):  y_U + y_L = W.

Figure 9-Signal voltage level in 4-2-1-1 AETL

The minimum value (when S = 8) of the potential V_CJ of the collector of Q_J should not be much less than the maximum value (when S = 0) of the output voltage V_C, where S = Σ_{j=1}^{4} w_j (x_Uj > x_Lj) for the upper side. For the complete operation of the current switch pair, the situation x_Uj = x_Lj may not occur. So let two classes of outputs, y and y', be generated, where y = 0, 1, ..., W and y' = ½, 1½, ..., W + ½ (Figure 10). To realize this, the potential of Q_HU is made a little different from that of Q_HL.

Figure 10-Output voltage vs. logical output

The base-emitter voltage V_F0 of transistor Q_0 influences the output voltage directly. Q_H compensates for this change of V_F0 caused by temperature variation. In other words, the changes of V_F0 and V_FH are required to be the same for the temperature variation, and the resistors are chosen to satisfy the relationship R_J + R_H1 ≈ R_H.

A 4-input-pair (W_max = 8) AETL circuit was designed and hybrid-integrated (Figure 11). CAD enabled us to design the circuit with an appropriate noise margin.

Figure 11-Elements. Top: 3-Compatible Switch; left: 4-3-2-1 AETL; bottom: input MML
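A minimal behavioral sketch of the current-switch summation just described, taking the maximum 4-2-1-1 weight setting; this models only the logical values y_U and y_L, not the circuit voltages, and the restriction to complementary input pairs mirrors the requirement that x_Uj = x_Lj may not occur.

```python
from itertools import product

def aetl_outputs(xu, xl, w=(4, 2, 1, 1)):
    """Behavioral model of the 4-input-pair AETL: each current switch steers a
    current weighted by w_j to the upper or lower collector resistor depending
    on the comparison of its input pair."""
    y_u = sum(wj * (u > l) for wj, u, l in zip(w, xu, xl))
    y_l = sum(wj * (l > u) for wj, u, l in zip(w, xu, xl))
    return y_u, y_l

W = sum((4, 2, 1, 1))  # total weight, 8 for the maximum 4-2-1-1 configuration
# With each input pair held at complementary binary values, the two outputs
# are complementary: y_U + y_L = W.
for bits in product((0, 1), repeat=4):
    xu = bits
    xl = tuple(1 - b for b in bits)
    y_u, y_l = aetl_outputs(xu, xl)
    assert y_u + y_l == W
```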
The results of experiments on the integrated AETL agreed very well with those of simulation and with the values aimed at in the design. The effect of the ambient temperature was compensated almost completely by the temperature compensation circuit (0.09 mV/deg). It was found that the output error was mainly caused by the deviation of the resistors. To make W_max large, it is required (1) to increase the base-emitter breakdown voltage of transistor Q_J, and (2) to increase the accuracy of the resistors.

HIGH SPEED FULL ADDER14,15,16,17

To examine the dynamic behavior of the hybrid integrated MML we have constructed an 8-bit full adder, shown in Figure 12, based on a revised Sklansky conditional-sum method. The experimental results are as follows. The carry propagation time was measured as 11 ns (Figure 13). Stable operation was assured by a dynamic tester18,19 under the conditions that the ambient temperature is between -15°C and +75°C and that the source voltage drops down to 55 percent of the design center value.

Figure 12-8-bit high speed full adder

Figure 13-Carry propagation time. C0: the first bit carry input; C8: the final bit carry output. X axis: 10 ns/div; Y axis: 0.2 V/div; source voltage: 10 V
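The conditional-sum idea behind the adder can be sketched in a few lines: each block computes its result for both possible carry-ins, and blocks are merged pairwise, the lower block's carry selecting which precomputed result of the upper block is kept. The Python sketch below is a generic illustration of Sklansky's method for an 8-bit word, not the authors' revised MML design.

```python
def cond_sum_add(a, b, width=8):
    """Conditional-sum addition: each block keeps (sum, carry-out) for both
    possible carry-ins; blocks are merged pairwise in log2(width) stages."""
    # Level 0: one block per bit, each holding results for carry-in 0 and 1.
    blocks = []
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        blocks.append({cin: (ai ^ bi ^ cin, (ai & bi) | (cin & (ai | bi)))
                       for cin in (0, 1)})
    size = 1
    while len(blocks) > 1:
        merged = []
        for lo, hi in zip(blocks[0::2], blocks[1::2]):
            combo = {}
            for cin in (0, 1):
                s_lo, c_lo = lo[cin]
                s_hi, c_hi = hi[c_lo]            # upper half selected by lower carry
                combo[cin] = (s_lo | (s_hi << size), c_hi)
            merged.append(combo)
        blocks = merged
        size *= 2
    return blocks[0][0]                          # external carry-in = 0

# Quick check against ordinary integer addition.
for a in (0, 1, 37, 200, 255):
    for b in (0, 3, 99, 255):
        s, c = cond_sum_add(a, b)
        assert (c << 8) | s == a + b
```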
PLANE REGISTER

We show, as an example of a sequential circuit using the hybrid IC AETL mentioned above, a plane register, in which one cell corresponds to one AETL element. Two types of data F/F's are shown in Figure 14. They accept two data inputs D1, D2 and are duals of each other. The data to be received is decided by the values of the two enable signals E1, E2. The weight of each input pair is 1. In the figure, integer values denote the logical values of each variable. The plane register operates essentially in multi-valued logic, but if we dare to explain it like 2-valued logic, the right side of the virgule corresponds to "true" and the left side to "false."

Figure 14-Two data F/F's

Let us explain the operation of the master F/F in Figure 14. When E1M = E2M = 5, the F/F keeps its previous state. When E1M = 3 and E2M ≥ 9, the data D1M is enabled by E1M and the output QUM follows the data D1M. In the next step, E1M and E2M should be 5, and the output is held. The data D2M is enabled when E1M ≥ 9 and E2M = 3. When E1M, E2M ≥ 9, the F/F is cleared. The slave F/F operates in the same way, i.e., when E1S = E2S = 6, the F/F keeps its previous state. The data D1S (D2S) is enabled when E1S = 4 (≥ 10) and E2S ≥ 10 (= 4). The enable signals can be generated easily by the usual pulse generator and AETL.

The plane register can be made by arranging the master and slave F/F's mentioned above, as shown in Figure 15. The state of each slave F/F on a lattice point can be shifted in any direction by the combination of enable signals. A 4-bit plane register connected in the torus form (torus register) was simulated by the AETL simulator (Figure 16). The waveforms of the enable signals and outputs are shown in Figure 17 and their values in Table IV. A torus register of 8×8 bits was fabricated using the hybrid IC AETL (Figure 18). The stable operation and the ability of AETL for cellular automata were ascertained.

Figure 15-Construction of plane register using the two data F/F's shown in Figure 14

Figure 16-Simulated 4-bit torus register. The terminals which have the same name are connected

Figure 17-Waveforms of the simulated 4-bit torus register. The arrow direction denotes the truth value; the dotted line the threshold value between true and false

TABLE IV-Simulated Four Bits Torus Register
(The table lists, step by step, the values of the enable signals EM1, EM2, ES1, ES2 and of the outputs SU1-SU4 and ML1-ML4. Notes: (1) '**' denotes a stable state. (2) '+' and '*' denote 'true'; '.' denotes 'false' in this case.)

Figure 18-Torus register of 8×8 bits by the hybrid IC AETL circuit
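A behavioral sketch of what the torus register does with its stored bit plane: one master-slave transfer moves the whole pattern one step in a chosen direction, with wrap-around at the edges. The 4×4 size, the API and the direction names are illustrative; the sketch does not model the multi-valued enable waveforms.

```python
def shift_torus(plane, direction):
    """Shift a rectangular bit plane one step with wrap-around (torus
    connection); 'plane' is a list of rows, direction in
    {'up', 'down', 'left', 'right'}.  Models only the state movement of the
    register, one master-slave transfer per call."""
    rows, cols = len(plane), len(plane[0])
    if direction == 'down':
        return [plane[(r - 1) % rows] for r in range(rows)]
    if direction == 'up':
        return [plane[(r + 1) % rows] for r in range(rows)]
    if direction == 'right':
        return [[row[(c - 1) % cols] for c in range(cols)] for row in plane]
    if direction == 'left':
        return [[row[(c + 1) % cols] for c in range(cols)] for row in plane]
    raise ValueError(direction)

state = [[1, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 0]]
# Four shifts around the torus in the same direction return the 4x4 pattern
# to its starting position.
s = state
for _ in range(4):
    s = shift_torus(s, 'right')
assert s == state
```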
Their future will depend much on IC technology and requirements for the variable weight and variable threshold ability of AETL. One of our next interest aims at a realization of certain AETL processor, which might present a unified (pre-) processor for various kinds of pattern information. REFERENCES 1 W.S MEISEL Variable-threshold threshold elements Doctoral Dissertation E E Dept U of So Calif. May 1967 and IEEE Transactions pp 656-667 Vol C-17 No 7 July 1968 W S MEISEL Nets of variable-threshold threshold elements IEEE Transactions pp 667-676 Vol C-17 No 7 July 1968 2 DR HARING Multi-threshold threshold elements IEEE Transactions pp 45-65 Vol EC-15 No 1 February 1966 3 J J AMODEI D HAMPEL T R MAYHEW R 0 WINDER An integrated threshold gate 1967 International Solid-State Circuits Conference Digest of Technical Papers pp 114-115 Lewis Winner NY February 1967 4 R 0 WINDER The status of threshold logic 1st Annual Princeton Conf on Information Sciences and Systems pp 59-67 Princ Univ NJ March 1967 5 R 0 WINDER The status of threshold logic RCA Review pp 62-84 Vol 30 No 1 March 1969 6 N SANECHIKA Synthesis of logical functions using Multiplex Median Logic Bulletin of the Electrotechnical Laboratory pp 17-36 Vol 35 Nos 9 & 10 1971 7 R MORI Unitron (multiplex median logic) Technical Group on Electronic Computers of Institute of Electronics and Communication Engineers of Japan December 1968 8 R MORI Multiplex median logic system 1971 Mexico IEEE International Conference on Systems Networks and Computers pp 683-687 January 9 N SANECHIKA M TAJIMA R MORI Synthesis of logical functions by Unitron National Convention of the Institute of Electronics and Communication Engineers of Japan p 957 No 1898 August 1970 10 R MORI Y TSUJI N SANECHIKA Synthesis of logical functions by Unitron Joint Convention of the Four Electrical and Electronics Institutes of Japan pp 3566-3567 No 3093 March 1969 11 M A HARRISON Introduction to switching and automata theory McGraw-Hill Inc pp 162-167 pp 395-407 1965 12 R MORI Y OKADA M TAJIMA S KAO T TOMARU T ABE J,.-input-pair variable weight and variable threshold AETL circuit Bulletin of the Electrotechnical Laboratory pp 99-125 Vol 35 Nos 9 & 10 1971 13 S COHEN R 0 WINDER Threshold gate building blocks IEEE Transactions pp 816-823 Vol C-18 No 9 September 1969 14 Y TSUJI H TAJIMA R MORI 8-bit high speed full adder by Unitron National Convention of the Institute of Electronics and Communication Engineers of Japan p 960 No 901 August 1970 15 Y TSUJI N SANECHIKA H TAJIMA R MORI A high speed full adder using Unitron and Switch Bulletin of the Electrotechnical Laboratory pp 69-82 Vol 35 Nos 9 & 10 1971 16 J SKLANSKY Conditional-sum addition logic IRE Transactions pp 226-231 Vol EC-9 No 2 June 1960 17 R MORI Y TSUJI H TAJIMA Design and trial fabrication of 3-input push-pull Unitron and compatible Switch Bulletin of the Electrotechnical Laboratory pp 48-68 Vol 35 Nos 9 & 10 1971 18 H TAJIMA Y TSUJI R MORI The dynamic tester of small scale logic system National Convention of the Institute of Electronics and Communication Engineers of Japan p 962 No 903 August 1970 19 H TAJIMA Y TSUJI R MORI T ABE A dynamic tester for small scale logic system Bulletin ofthe Electrotechnical Laboratory pp 140-146 Vol 35 Nos 9 & 10 1971 20 R MORI S KAO A nalog memory element using current switches Bulletin of the Electrotechnical Laboratory pp 132-139 Vol 35 Nos 9 & 10 1971 21 R MORI S KAO Stabilization of 16-input-pair MML circuit Bulletin of the Electrotechnical Laboratory pp 126-131 Vol 35 Nos 9 & 10 
1971

Multiple operand addition and multiplication

by SHANKER SINGH and RONALD WAXMAN
International Business Machines Corporation
Poughkeepsie, New York

INTRODUCTION

Traditionally, adders used in small- and medium-sized computers are designed to add two n-bit numbers. There are arithmetic operations which require the addition of a large number of numbers; multiplication (division) and special function generation are such operations. In large computers, "carry save addition", which adds a group of 3 numbers and reduces their sum to a partial sum of two numbers, has been frequently used to speed up multiplication. One of these two partial sums evaluates the sum modulo 2 of the bits in the same binary order; the second partial sum is composed from the carries generated but not transferred. These partial sums are regrouped in triplets and enter a "carry look ahead" adder to provide the final sum. The circuit implementation is a cascade connection of full adders, and is referred to in the literature as an "adder tree".1,2 The operation time is considerably reduced because carries are not transferred, although they are formed.

This paper considers the problem of adding k n-bit numbers (operands), where k > 3. A novel scheme for adding k numbers will be described. It will be shown that by partitioning these k numbers columnwise, such that each column partition contains m bits of each of the k numbers, where m is an integer ≥ log2(k-1), the final sum can be obtained in m+1 addition cycles. These cycles are not algorithmically related to the cycles used in the adder tree method; cycle time is dependent upon structure and technology. We shall also be describing the use of the adder in the multiplication of two numbers. It will be shown that the use of such an adder can lead to a good compromise between hardware requirements and speed for multiplication. Finally, it will be shown that, from the point of view of large scale integration, such an implementation may be quite suitable for arithmetic units in digital computers.

In order to illustrate the basic ideas involved in the method, it will be worthwhile to start with an example. Consider a matrix of nine 3-bit numbers as shown in Figure 1. We can use the circuit of Figure 2 to obtain the final sum. The circuit operation can be described step by step as follows:

1. Initialization
Reset all the register cells R1 to the zero state.
2. 1st Add Cycle
Gate column 2 (leftmost) of the matrix into the adder. The sum and the carries appear simultaneously at the S0, C01, C02, C03 terminals of the circuit, which in turn provides the 1st cycle partial sum of the numbers at terminals Sc2, Sc1, Sc0, S2, S1, S0 equal to 001000. The values at Sc1 down through S0 are set into the register cells R1 at this time.
3. 2nd Add Cycle
Gate column 1 of the matrix into the adder. Once again, the sum and the carries are generated simultaneously and in turn are automatically added to the previous cycle's shifted partial sum contained in the register cells R1. (The previous cycle's partial sum is effectively shifted left one position because the contents of each register cell are fed into the sub-adder position to its left.) This provides the second cycle partial sum at Sc2, Sc1, Sc0, S2, S1, S0 as 010110.
4. 3rd Add Cycle
Gate column 0 of the matrix. The operation repeats as in Step 3, and we obtain the final sum 110011 of the 9 numbers in the matrix of Figure 1.

From the example, it may be noted that sub-adder unit 1 of the multiple adder is the most complex and requires the maximum number of logic gates. This subunit also increases in size and complexity as k increases.
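The cycle-by-cycle behavior of this scheme is easy to model: each cycle the running partial sum is shifted left one position and the count of 1-bits in the gated column is added in. The Python sketch below is only a behavioral model of Figure 2; the nine sample values are illustrative (the exact Figure 1 matrix is not reproduced here), chosen so that the column counts are 8, 6 and 7, which reproduces the partial sums 001000, 010110 and 110011 quoted in the text.

```python
def add_by_columns(numbers, width):
    """Column-wise multiple-operand addition: one column per cycle, most
    significant column first.  Each cycle the previous partial sum is shifted
    left one position (the role of the register cells R1) and the count of
    1-bits in the gated column is added in."""
    partial = 0
    history = []
    for col in range(width - 1, -1, -1):             # columns 2, 1, 0
        ones = sum((x >> col) & 1 for x in numbers)  # what sub-adder unit 1 produces
        partial = (partial << 1) + ones
        history.append(partial)
    return partial, history

nums = [7, 5, 7, 6, 3, 7, 7, 4, 5]   # nine illustrative 3-bit numbers
total, steps = add_by_columns(nums, 3)
assert total == sum(nums) == 0b110011
print([format(s, '06b') for s in steps])   # ['001000', '010110', '110011']
```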
8 From the example, it may be noted that sub-adder unit 1 of the multiple adder is the most complex and requires the maximum number of logic gates. This subunit also increases in size and complexity as k increases. 367 368 Fall Joint Computer Conference, 1972 Register No.2 Numbers To Be Added I" Register No. 1 - Register No. 0 1 1 1 1 2 1 0 1 3 1 1 1 4- 1 Figure 3. The decoders 1 and 2 produce aIl5-tuples and 4-tuples of inputs ao, aI, a2, aa, a4 and a5, a6, a7, as respectively. po represents 5-tuple aOala2aaa4 (00000). PI represents the set union of all the 5-tuples with weight 1 (i.e. {OOOOI +00010+00100+01000+ 10oo0}) realized by 'dot ORing' all the tuples of weight 1, (weight is r--- I 0 1 0-- 5 0 1 1 0---- 6 1 1 1 0-- 0--- 7 1 1 1 8 1 0 0 9 1 r '. 0 ~., 0 . I~ .. ~ . :~ .,..- 1st .- Otb Figure I-Matrix of nine 3-bit numbers stored in three registers of length 9 0 0 , 0 , 0 , 0 , 0 0 , 0 , 0 , , 0 , 0 , 0 0 0 , , 0 0 , 0 0 , , 0 , , 0 0 , , 0 , , 0 0 , 0 0 , , 0 0 0 0 0 0 , , , 0 0 0 .. - , , , , 0 0 0 , , , 0 0 0 , , , 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 , 0 0 0 0 0 , , 0 0 0 0 .. .. So ~-.- } e~ i e~ V .. v .. ..----.NV- .,.. ~ v .. , .,- .. v .. , ..---H The size of other sub-adder units remains constant. However, with recent advances in large scale circuit integration and with the availability of monolithic readonly memories, the circuit realization of sub-adder unit 1 should not be difficult. One of the many such possible circuit realizations of sub-adder unit 1 is shown in I '0 .. <>------I V. . ~ Columns ~ ft,W , '3- 2nd r 0 Cl ..- - ; r r DECODER ., Ull '4Lllill ~lli 0 '-v..... "" 1 0 .. - r - '3- ~ ..-~ Vee '-----1 .........0 e~ V .. V.. ~ V •• STORAGE BUFFER "0 ., "2 ~ "4 "5 ' , , , , , , °, t-+-+--+-t-+-I t-+-+--+-t-+-I r,+-,~o+-t-+-I t-0+--;'t-'+-t--I-~ t-'+--;'I-l+-t--I-~ , ,, ,, °° °, NOTATIONS ur -t~ -W-- ~c ~cc Figure 3-Sub-adder unit I Figure 2-9-number adder defined as the number of 111 in the n-tuple). Similarly P2, Pa, P4 and P5 are realized by 'dot ORing' all the 5tuples of weight 2, 3, 4, and 5 respectively. qo, ql, q2, qa, and q4 are also realized by 'dot ORing' all the 4-tuples of weight 0, 1, 2, 3, and 4 respectively. Thus, logical functions for So, COl, C02 and Coa can be expressed as fol- Multiple Operand Addition and Multiplication 369 lows: So= (qO+q2+q4) (P1+P3+PS) + (q1+q3) (PO+P2+p4) C01=qO(P2+P3) +q1(P1+P2+PS) +q2(PO+P1+P4+PS) +q3(PO+P3+P4) +q4(P2+P3) C02= qO(P4+PS) +q1(P3+P4+PS) +q2(P2+P3+P4+PS) +q3 (PI +p2+P3+P4) +q4 (PO+P1 +P2+P3) C03= Q3PS+q4 (P4+PS) REGISTER A REGISTERB The other function shown in Figure 2 for sub-adder units 2, 3, 4, 5 and 6 are expressed as: Sl = [So (previous cycle) JEBC01 FINAL SUM C11 = [So (previous cycle) J. COl S2= [Sl (previous cycle) JEB C11EB C02 C21= [Sl (previous cycle) J. [C11+C02J+C1l·002 Sco= [S2 (previous cycle) JEBC21 EBC03 C31= [Sl (previous cycle) J·[C21+C03J+C21·C03 ScI = [SeO (previous cycle) JEBC31 Ci= [SeO (previous cycle) J·C31 Se2= [ScI (previous cycle) JEBC41 An adder of the type shown in Figure 2 is able to add any 9 numbers of n bits long, with the final sum available after n cycles. Such an adding scheme has an addition time proportional to n. Therefore, if we use many parallel adder units such as shown in Figure 2, we increase the speed of addition considerably. The example shown added the most significant column first. However, equivalent results would be obtained if the least significant column were to be added first. This may be verified easily by the reader. One less S.A. 
Let us proceed with the example, but increase to 32 bits the length of the nine numbers to be added. Suppose we partition these nine numbers column-wise, such that each partition set has three adjacent columns. Now the eleven partition sets, each with nine numbers, are added in parallel, using 11 adder units of the kind shown in Figure 2. (See Figure 4.) The final sum digits from the 11 units, denoted by S0, S1, ..., S32, are stored in register A, and the sum digits denoted by Sc0, Sc1, ..., Sc32 are stored in register B. Three cycles of the AMO are required to obtain the register A and B sums. In the fourth cycle, the contents of registers A and B are fed to a carry look-ahead adder to obtain the final sum of all nine 32-bit numbers.

Figure 4-32-bit 9-number adder

Thus in 4 addition cycles, one may add nine numbers. Note that the register positions are lined up into the carry look-ahead adder so that S32 adds to Sc29, S31 to Sc28, S30 to Sc27, etc. Thus a 36-position CLA is required. The first three cycles include the time to ripple the carries through sub-adder units 2 to 6. But this is the case for a simple design. In a more sophisticated design using techniques of "carry look ahead", one could reduce each individual cycle time for the first three cycles to a minimum by generating the carries C11, C21, C31, and C41 simultaneously. Such expressions for carry generation are given by:

C11 = C01·[S0 (previous cycle)]
C21 = C01·[S0 (previous cycle)]·[S1 (previous cycle) + C02] + [S1 (previous cycle)]·C02
C31 = C01·[S0 (previous cycle)]·[S1 (previous cycle) + C02]·[S2 (previous cycle) + C03] + [S2 (previous cycle)]·C03
C41 = [Sc0 (previous cycle)]·[C01·[S0 (previous cycle)]·[S1 (previous cycle) + C02]·[S2 (previous cycle) + C03] + [S2 (previous cycle)]·C03]

The circuit implementation of these expressions to obtain the sum terms will lead to a minimum overall time for the addition of nine numbers.

It can easily be verified that if each column partition of the nine numbers had two adjacent columns instead of three, then it would be impossible to obtain two partial sums 000 S31 S30 ... S0 and Sc31 Sc30 ... Sc0 000 only by appending the Si and Sci, for i = 0, 1, ..., 31, from the individual sub-adder units in Figure 4. However, one could obtain at least three partial sums which can be formed by appending only. It may be noted that if we intended to add five numbers, a partition set of two adjacent columns would be quite suitable. The reader may also note that the specific manner in which the partial sum digits are separated into registers A and B is of no consequence provided the individual digits preserve proper positional significance.

ADDITION OF k OPERANDS

... notation will yield two partial sums. (This corresponds to 000 S32 S31 ... S0 and Sc32 Sc31 ... Sc0 000.) However, by choosing m-1 adjacent columns instead of m, for the same range of k, it will be impossible to obtain only two partial sums. In the case of m-1, it is easy to show that one cannot obtain fewer than three partial sums formed by simply appending the sums available from the individual 2-bit, 9-number AMO units in Figure 4. Addition of these three partial sums implies one additional stage of carry save addition (i.e., another cycle of addition) before using the carry look-ahead adder to obtain the final sum of k numbers. Thus, minimally m+1 cycles are required to add k numbers.
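The bookkeeping behind these cycle counts is easy to restate numerically. The helper functions below (names are illustrative, not from the paper) compute the minimum partition width m as the smallest integer ≥ log2 (k-1), the m+1 addition cycles, and the number of parallel AMO units needed for n-bit operands, and check them against the 32-bit, nine-operand configuration of Figure 4.

```python
# Numeric restatement of the partition rule discussed above: m is the smallest
# integer >= log2(k - 1), k operands are added in m + 1 cycles, and n-bit
# operands need ceil(n / m) parallel AMO units ahead of the final CLA.
import math

def min_partition_width(k):
    return math.ceil(math.log2(k - 1))

def addition_cycles(k):
    return min_partition_width(k) + 1

def amo_units(n_bits, k):
    return math.ceil(n_bits / min_partition_width(k))

for k in (5, 9, 17, 33):
    print(f"k={k:2d}: m={min_partition_width(k)}, cycles={addition_cycles(k)}")

# The 32-bit, nine-operand example: 11 three-column units, 4 addition cycles.
assert min_partition_width(9) == 3
assert amo_units(32, 9) == 11
assert addition_cycles(9) == 4
```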
The reader may note that according to the general partition concept explained with regards to k number addition, carry save adder design is a special case. For carry save addition, k = 3 and hence from the minimum addition cycle point of view, each column partition set can have one column only. While for k = 5, 9, 17 and 33, the column partitions must have minimally 2, 3, 4, and 5 adjacent columns respectively. For example, the addition of 33 numbers with 5 bit column partitions will require only 6 addition cycles. Consider a design for an adder for multiple operands (AMO) capable of adding k, n-bit long numbers in minimum addition cycles by a scheme such as described by the example of nine numbers. First, we shall show that it is possible to add k numbers in m+ 1 cycles, where m is the smallest integer 2log2 (k -1) . Let each column partition of k numbers have m adjacent columns for any k such that (2 m- 1+ 1) (0). A parallel analysis is performed for the signal lines associated with the faults of the set FU1. The result here is an additional output pin <1>(1). The result from this method is two Ilew output pins, one associated with saO faults, the other with sal faults, which yield maximum fault coverage under the input sequence ~. The theoretical development of the al~ gorithm which generates the functions is presented in Appendix B. I I Procedures for Increasing Fault Coverage for Digital Networks In addition to the two primary outputs required by this method, two additional inputs, 10 and II, are needed to facilitate the detection of faults within the ~(O) and ~ (1) networks. The major advantages to method 2 are: (1) only fault free simulation is required, (2) regardless of the size of the network, a maximum of four additional external contacts is required. Several techniques have been discussed for decreasing the maximum below 4 additional contacts discussed above. lO ',1,: 1 I SE::::::li:::I:::~::~:ditionru extenwl con- tacts from methods 1 or 2, when considered as new primary outputs, partition M U into two disjoint subsets, MUd (detected) and M Uu ( undetected) . A similar partition exists on Fu; that is, for each m i contained in MUd, thenfi contained in FUd and for each mi contained in M U u , then Ii contained in Fuu • If the external contacts, which have been added to facilitate this partition are considered to be the r components of an output vector P, then for the application of X on ~ the results are: (1) for each mi contained in MUd, pi~pO (where pk is the output sequence of P vectors from m k under application of X) . (2) for each mi contained in M Uu, pi = po. I Application of the FDT sequence X to ~ has been successful in detecting all single faults except those which result in the set M uu. Since these faults could not be detected by direct monitoring of the signal line, it is apparent that under the application of X to ~, the signal line associated with fault ji, for each ji contained in Fuu , did not assume the, proper value to allow for detection of ji. As an example, to facilitate detection of Z(k-L+l) X (k-l) X(k) Z(k) X (k-L+l) cO (k-L+l) Y(k-L+l) CO(k) cO (k-l) y(k-L+l)Y(k-l) y(k-l) Y(k) Figure l-General space domain model (k 377 singular cover bed c d b==D~ ___- e e III 1 o 0 x x x 0 x 0 x x 0 0 Figure 2-8ingular cover for an AND gate the fault, line a (sal), the FDT sequence must force line a in mO to assume the value 0 at least once. 
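The partition of Mu into detected and undetected subsets described above depends only on comparing each faulty machine's output-vector sequence P with the fault-free sequence P0. A minimal sketch of that comparison follows; the machine names and output values are placeholders, not the paper's example network.

```python
# Sketch of the partition of faults into "detected" and "undetected" subsets:
# a faulty machine m_i is detected as soon as its output-vector sequence P^i
# differs from the fault-free sequence P^0 at any point of the input sequence.
def partition_faults(fault_free_outputs, fault_outputs):
    detected, undetected = set(), set()
    for machine, outputs in fault_outputs.items():
        if any(p != p0 for p, p0 in zip(outputs, fault_free_outputs)):
            detected.add(machine)
        else:
            undetected.add(machine)
    return detected, undetected

p0 = [(0, 1), (1, 1), (1, 0), (0, 0)]          # fault-free P vectors over the sequence
faulty = {
    "m1": [(0, 1), (1, 1), (1, 0), (0, 0)],    # never differs: remains undetected
    "m2": [(0, 1), (0, 1), (1, 0), (0, 0)],    # differs at the second vector
}
print(partition_faults(p0, faulty))            # ({'m2'}, {'m1'})
```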
The problem is to develop a heuristic which will allow modification of X so as to enable detection of the faults ji contained in Fuu • The heuristic technique presented here borrows on the theory which has developed around the use of the classic d-algorithm. 1 The similarity will be seen between this method and the consistency test or backward drive segment of the d-algorithm. Following Breue:r6 it is suggested that the time domain analysis of the system ~ be mapped into its corresponding special equivalent. This mapping can be accomplished if, for each new input vector, a new copy of ~ is allowed. Since it is the goal to force a given value on a particular line in mO, the multiple copies of mO will be labeled CO(k), CO(k-1),----CO(k-L+l). The length L of the new sequence X rri i generated in this manner can be dynamically determined within reasonable restraints. The space domain analysis can be understood by observing Figure 1. The copies of the machine are interconnected in such a way that in addition to the original input vector, COCk-d) has as inputs on its Y(k-d) lines the state variable vectory (k-d-l) from copy Co (k - d -1) . Assume that it is necessary to generate an input sequence X m i of length L to aid in detecting ji contained in Fuu , a sal fault on line a. First, assign line a in COCk) the value 0 and attempt to drive this signal from COCk) back through all copies to CO(k-L+1}. The method for accomplishing the backward drive will now be discussed., For all gates along the signal paths which control line a of Co (k) , the singular coverss must be formed. An example of the singular cover for a 3 input AND gate is given in Figure 2. The singular cover for Co (k) is formed between inputs and signal line a. The required value on line a is then driven backward to the inputs of COCk) by performing intersections on the singular covers of the gates along the path. All parallel paths must be intersected simultaneously. However, intersections need ,not be made with singular cover vectors for gates whose outputs are 378 Fall Joint Computer Conference, 1972 unrestricted. The rules for intersection are: lAO=0=OAl xAO=O=OAx xAl=I=IAx 0 results, then an inconsistency exists and a retrace is required beginning with a new vector from the appropriate singular cover. If ~ is asynchronous, care must be taken when picking vectors from the singular cover for intersection. It must be assured that D[X(k-r) -X(k-r+l) ]~1 (where D is the Hamming inter-vector distance). As an example, if X(k-2) = [Oxxl] and X(k-l) = [Olxl], D = 1. This, however, may force the reevaluation of D[X(k-l) -X(k)]. When the backward drive to the inputs of COCk) is completed, the values required on the input vectors X(k) and Y(k), which is being input from the CO(k-l) copy, is Y(k) = [xxx . ... x] (unrestricted), then the result is a sequence ffmi of length L = 1. However, if Y(k) ¢[xxxx ... x], the backdrive must continue through CO(k+l). This procedure continues until at some level (k-L+l), Y(k-L+l) = [xxx ... xl This strategy is required so that the sequence which is generated is not state dependent. Therefore, the sequence ~m i is forced to produce the desired result on line a regardless of the state of ~ when ~m i is applied. If, due to network configuration, information concerning machine state is known, this requirement can be appropriately relaxed. If at the (k-r) level the condition Y (k - r) = [xx ..... x] is not satisfied, the procedure must continue to the (k-r-l) level. 
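The singular-cover intersection used in the backward drive above is mechanical enough to state as code. The sketch below is only an illustration of those rules (the representation of cubes as strings over 0, 1, x and the function names are choices made here, not taken from the paper): intersecting a 0 with a 1 yields the empty cube, which is the inconsistency that forces a retrace.

```python
# Sketch of the singular-cover intersection used in the backward drive.
# Cube positions take values '0', '1' or 'x'; intersecting '0' with '1'
# yields the empty cube, signalling an inconsistency and a retrace.
EMPTY = None

def intersect_position(a, b):
    if a == b:
        return a
    if a == 'x':
        return b
    if b == 'x':
        return a
    return EMPTY                      # '0' meets '1': inconsistent

def intersect_cubes(c1, c2):
    out = []
    for a, b in zip(c1, c2):
        v = intersect_position(a, b)
        if v is EMPTY:
            return EMPTY
        out.append(v)
    return ''.join(out)

print(intersect_cubes('1x0', '10x'))   # -> '100'
print(intersect_cubes('1x0', 'x01'))   # -> None: retrace with another cover row
```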
However, this process must not be allowed to continue indefinitely. One criterion for stopping the process short of success would be to determine some cost effective constant R and require that L~R+l. If this technique yields a sequence ~m i and if ~ is synchronous or combinational, ~m i is certain to assign the proper value to line a; that isif~mi=X(k-L+l), X(k-L+2), . .. X(k-l), X(k) is applied to m Obeginning at time t=tQ, line a will assume the desired value at t=to+L (with L assigned time units). If ~ is asynchronous, the space domain model fails; thus, the technique is heuristic, and ~mi must be simulated to check on its validity. In either case, if ~mi is valid, the new FDT, which covers the set of faults, (fi+F-Fuu) , is ~~mi. That is, fCm i concatenated to the end of ~. If there are other faults, p contained in Fu u , which are not covered by fC~m i then this procedure would be repeated for p. There is no guarantee that the ~m i found in this manner is optimal. The length of ~m i If at any time during the backward drive a is dependent upon the choice· of. vectors. from the singular covers. After all sequence modifications of the form ~mj have been produced, the total modifications are then simulated with ~, to determine their success. If the ~mPs are successful these results must be combined with either method 1 or method 2. CONCLUSION Summary Two techniques have been presented which yield modification to the general digital network to facilitate maximum fault coverage under a given input sequence. Method 2 accomplishes the network modifications with a minimum impact upon the surrounding environment with which the network must interface. The technique for providing input sequence modifications will have little value if the original input sequence was designed by an accepted fault detection. test generation algorithm. However, if the original. ~ sequence was developed by a less effective technique and if ~ lends itself to space domain analysis, this technique is very useful. It seems evident that if the designer purposely exercises the trade offs made available by' these techniques, an acceptable level of fault coverage can be realized on any general digital network. Although these techniques are useful on all types of networks, it seems apparent that they are of extreme importance in the asynchronous sequential area since it is in this area that previous techniques fail. Results The TEGAS9 digital logic simulator was utilized in collecting data to evaluate the secondary techniques. This system is implemented on an IBM 360/50 system in Fortran IV and can simulate 32 different network fault configurations with each pass through the network. The simulator presents the network data in a form which is readily usable by the secondary techniques. The signal line values can be readily interrogated at any time to determine fault coverage. Although some of the actual data analysis for the secondary techniques was done manually, this process is being program implemented and interfaced with the TEGAS simulator. The computer run time required by the simulator is dependent, not only upon the element count for the network, but also upon network structure. Typically, asynchronous networks with 15-30 elements will require 1-5 minutes of computer time for simulation with an I I I I I Procedures for Increasing Fault Coverage for Digital Networks 379 ~, .~. input sequence of length 10. 
It is expected that when the secondary techniques have been program implemented and interfaced r this time will increase by something less than 35 percent. ~,. I. I ;,' Ii APPENDIX A il, I ; I· , This appendix will present the theoretical foundation underlying method 1. Consider the set of all signal lines contained in ;m: to be S = (S1, S2, ..... Sm). S contains all primary inputs, primary outputs, feedback lines, and all internal connection lines. For each siE S, two logical faults can be associated; that is, si(SaO) and Si(Sal). The total number of faults can be collapsed across each network element; but since this in no way influences the theory of solution, it will be ignored until it can be utilized to expedite data analysis. For each Si E S there exists Ji EF and Ii EF and m i EM and mjEM. Observation of the output sequence Z = ZlZ2Z3 ••••••• Zw, for the application of X = X 1X 2X 3 • • • • • • • Xw to ;m: performs a partitioning of M and F. This partitioning can be applied to the set S. Consider the set Su (undetected) to represent the set of signal lines such that VsiE Su there exists at least one Ii EFu corresponding to a logical fault on Si. Sd will be the subset such that VSjE Sd there exists exactly two faults, fk and il, EFd which are associated with faults on signal line Sj. The value on signal line Si after the application of X k , in the X sequence, to machine m j , will be represented by v(i, j, k). For the application of each input vector X k , . in the X sequence, first a comparison of v (i, 0, k) with v(i, j, k) is made for all j to determine which elements of M can be detected by Si under application of X k • This must be done VsiE S. This entire process must then be performed for k= 1 to w. The result from this operation will be a set of fault coverage lists of the form Si, X k , m P , ml, ... m where this list represents the fact that by observing line Si, while X k , in the X sequence, is applied to ;m:, faulty machines m p , ml, ..... m can be detected. It is upon these fault coverage lists that the cover analysis must be performed to determine which signal lines must be monitored. The rules for performing the cover analysis will now be considered. All signal lines which are primary outputs are, by definition, going to be monitored. Consider the set of all primary output lines to be Sz. For each siE Sz, Si is a primary output of ;m:. Thus, the removal of all faults which are associated with the fault coverage lists of the elements of S z before the analysis starts is necessary. VsiE Sz, there is associated a set of fault T 1\ III I 'i ,I II il 1.1 I '".,'! T , coverage lists of the form Si, X k , m p , m l ... mr. By combining all machines which are listed in the fault coverage lists for signal lines Si the set M Zi is formed, where Vm j EMZi, mi can be detected by monitoring Si. Similar sets M z~ are formed Vk such that Sk E S z. It can be seen that the set Md= U(Mz i ) for all i such that siE Sz (where U is the set union operation). In a similar fashion, sets MS i for all i, such that, siE (S-Sz) are formed. From each such set M Si, the elements which are common to M Si and M d are then removed. That is, MSi*=Msi- (MsiAMd) is formed (where A is a set intersection operation). There now exists a set of the sets of form MSi*, where V miEMs/, mjEMu and m j can be detected by monitoring Si. To decide which signal lines of the set (S- Sz) must be monitored, first a search for critical signal lines is performed. 
That is, Vm i EMu, for which m i is contained in one and only one Ms/, monitoring of Sj is required. All machines which are covered by any such line Sj must now be removed from the M Sk * for all remaining lines in (S- Sz). The cover analysis then proceeds using the following two rules: (1) The signal line with the highest value is the next line entered into the set S8' The value for any line is equal to the number of previously undetected faults which are covered by monitoring this line. (2) If several lines have equal value, the choice will be arbitrary with the only priority being assigned to state variable lines. The results of this analysis will be two sets of signal lines Sz and S8, where VsiE Sz, Si is a primary output and VskE S8, Sk is not a primary output. The members of S8 are the signal lines which will require additional primary outputs from the package to facilitate monitoring. If ;m: represents a general network, then VS k E S8, it is necessary to add an additional primary output. APPENDIX B This appendix presents the theoretical foundation underlying Method 2. The set M is partitioned into M d and M u by the application of X to ;m:. The elements of each M u and Fu are then further partitioned into two disjoint subsetsFuo, M Uo and FU1 and M ul-where Vm i EMUo, the associated JiE Fuo is a saO type logical fault, and Vm j EMUl, the associated Ii E FUI is a sal type logical fault. For each fault Ji EFuo, there is an associated 380 Fall Joint Computer Conference, 1972 signal line Sk. So will be the set of signal lines associated with the faults of Fuo and similarly S1 and FU1. Since, in general, we may have both logical faults Ji and ji associated with a given line as elements of Fu, generally, S1ASo~ 0. The signal lines Si, such that Si E (S1 U So) , are the lines which must be monitored. If under the input vector X k from ~, the signal line Si (where Si E So) = 1 in mO, then Si can be monitored to detect Ji (where JiE Fuo is one of the faults associated with Si) during Sk. Since there may be many such s/s for a given X k , there will be associated with each input vector two sets of signal lines, SXk(O) and SXk(1) where VsiESXk(O) the faultJi (where JiEFuo is a fault associated with Si) can be detected by monitoring 8i during X k. Likewise, V8jE SXk(1), the fault ji (where jjEFu1 is one of the faults associated with line 8j) can be detected by monitoring line 8j during X k • After the entire sequence has been applied to ~ and all of the sets of the type SXk(a) have been formed, a set S(O) = (SXk(O), SXk+r(O) ..... ) is formed. S(O) is formed by including sufficient elements SXk(O) so that V8iE So, for which there exists at least one SXk(O) E S(O). Thus JiE Fuo can be detected by monitoring 8i during X k. Similarly S(1) = (SX 8 (1), SX8+r (1} .... ). The following notation is now defined. If we have a set R = (r1, r2, ra, ...... rk), then IT (R) = n(rI, r2, ... rk) = (rI"r2"ra" .... rk), where (.) represents the logical AND operation. Similarly, };(R) = (rI, r2 - - - - rk) = (rl +r2+ra+ - - - rk) where (+) is the logical OR operation. Utilizing the above notation, the functions <1>(0) = L: [n(SXi(O) , 1 J 0 S(O) <1>(1) = II [~(SXi(1), IIJ are formed. S(I) The 1 signals are conditioning signals which will be defined later. The 's express the logic function which must be realized on the additional network outputs so as to cover the faults of Fu which are detectable by this method. 
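To make the two-level structure of these added outputs concrete, here is a small evaluation sketch. It assumes only what is stated above: the saO monitor is an OR of AND terms, one per set SXk(0) in S(0) together with the conditioning input I0, and the sa1 monitor is an AND of OR terms, one per set SXk(1) in S(1) together with I1. The particular line names and sets are hypothetical placeholders, not the example of Appendix C.

```python
# Sketch of the two-level realization of the added outputs described above:
# phi0 is an OR of AND gates (one AND per set in S(0), conditioned by I0);
# phi1 is an AND of OR gates (one OR per set in S(1), conditioned by I1).
# The line names and sets below are placeholders.

def phi0(line_values, s0_sets, i0):
    """line_values: dict mapping signal-line name -> 0/1 at the current vector."""
    return int(any(i0 and all(line_values[s] for s in group) for group in s0_sets))

def phi1(line_values, s1_sets, i1):
    return int(all(i1 or any(line_values[s] for s in group) for group in s1_sets))

S_of_0 = [{'x1', 'x2', 'a'}, {'x2'}]        # a hypothetical S(0): one AND gate per set
S_of_1 = [{'x1', 'x2', 'x3'}]               # a hypothetical S(1): one OR gate per set

values = {'x1': 1, 'x2': 1, 'x3': 0, 'a': 1}
print(phi0(values, S_of_0, i0=1), phi1(values, S_of_1, i1=0))
```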
In realizing <1>(0), it can be seen that each element of S(0) will define the input list to an AND gate. That is, VSXk(O) E S(O) there will be defined an AND gate AXk(O). Each such AXk(O) will have as inputs all elements of the set SXk(O) plus an additional conditioning signal 10. The outputs of all such AXk(O) gates will completely define the input set for an OR gate <1>(0). The output of <1>(0) will represent one of the additional required primary outputs. Note: This discussion has been based, for simplicity, upon two level AND~OR logic. Certainly, the type logic elements actually utilized and the method of x1----t X 2 ----; ,------1 c + Figure 3-Example network interconnection is unrestricted so long as the function realized"is unaltered. A similar two level OR-AND structure can be described for the (1) function. Due to the parallelism between these two functions, the verbal description of <1>(1) is omitted. The 10 and 11 signal lines are used to facilitate fault detection of the added hardware. 10= 1 during the application of every X k to ~, for which SXk(O) E S(O). 11 = 0 during the application of every X k to ffir, for which SXk (1) ES(1). It must be mentioned that if the network is such that every Xi of ~ has associated with it an SXk(a) E Sea) (for a=O or a= 1), then an additional input vector must be added to ~ to facilitate the detection of the gates in the (a) network. That is, if line I a must be used to condition the gates of network (a) during the entire ~ sequence, then an additional input vector must be added to ~ so that 1a can be used to detect faults in the (a) network. From the above discussion it can be seen that if the network is fault free, then <1>(0) = 1VXi for which there exists an SXi(O) E S(O). However, if we have the fault fiE Fuo on the signal line 8iE So, then <1>(0) =0 for all X k , such that 8iE SXk(O). TABLE I-Faulty Machine List mi Specific Fault ml xl{sal) Xl (saO) x2(sal) X2(saO) x3(sal) x3(saO) a (saO) a(sal} b(saO} b(sal} c(saO) c(sal) m2 m3 m4 m5 m6 m7 m8 m9 mlO mll m 12 I I I Procedures for Increasing Fault Coverage for Digital Networks 381 example 3 illustrates the sequence modification technique. A sa fault on the output of gate AXk (0) of the <1>(0) network will result in <1>(0) =0 during X k • Also, (O)saO will be detected by <1>(0) =0 during an X k for which SXk(O) E S(O). If there exists an Xr such that SXr(O) ~ S(O), then setting 10=0 during Xr yields <1>(0) =0 for mO; but <1>(0) will equal 1 if any gate in the <1>(0) network is sal. A similar argument can be given for the output values and the faults within the (1) network. Example 1 Method 2 will be illustrated using the network of Figure 3. Table I associates with each possible single logic fault a machine number mi. For the input sequence OC=X1X~3X4= (111) (101) (001) (011), Table II shows the values of all signal lines of the network shown in Figure 1. The table includes data for the fault free and all single fault machines. Note: APPENDIX C This section contains three example problems: Example 1 illustrates method 1, example 2-method 2, and TABLE II-Simulator Output Table Signal Unes o 1 2 3 1 1 1 1 1 1 1 1 1 o 1 1 1 1 1 Xl 1 (111) 1 1 1 1 X2 (101) o o 1 1 1 1 1 1 1 1 1 1 o o 1 1 1 1 1 1 1 1 1 1 1 o (001) 1 1 o c o o o X3 1 1 1 1 1 1 1 1 1 1 o 1 1 o o 1 o 1 1 1 1 1 1 1 1 1 o 1 1 1 1 1 o o 1 1 1 1 1 1 1 1 1 1 assume line c = 1 at start. 
From Table II, it can be seen that since Md=[m6, m9, mIl] then i = machine number (mi) 4 5 6 7 8 1 o 1 1 1 1 1 1 1 1 1 1 1 1 o o 1 1 1 1 1 1 1 1 o o 1 1 o o o o 1 1 1 1 o o 1 o 1 1 o 1 1 1 1 1 1 1 o 1 1 1 1 0 1 1 1 1 1 o o 1 1 1 1 1 1 1 0 o o 1 1 0 o 1 1 1 1 1 1 1 o o o 0 0 o o 1 1 1 1 1 o 0 o 1 1 o o 0 o 1 1 1 0 1 1 1 1 1 1 1 o 1 o 1 9 10 11 ll'ault Coverage Lists 12 1 1 1 1 0 1 1 111 Xl, Xl, 111 111 111 101 101 X2, Xl, 0 000 1 1 0 1 1 1 1 1 1 1 1 1 1 101 101 000 000 0 0 1 0 0 0 1 1 1 000 101 101 000 0 111 1 1 1 1 1 1 1 1 1 0 1 101 101 m2 m' Xa, Xl, m G a, XI,mT b, Xl, m G, m 9, m ll c, mll X 2, m 2 X2, X 2, m 3 Xa, X 2, m G a, X 2, m2, mT b, X 2, m G, m 9, m ll c, X 2, mll Xl, X a, ml X2, X 3, m S Xa, X a, m G a, X a, ml, m 3, m S b, X a, m G, m 9, mll c, X a, m G, m 9, m ll Xl, X., m l X2, m. Xl, Xl, x., X 4, m G a, X 4, m4, m T b, X 4, m G, m 9, m ll c, Jt4, mll Xa, The cover analysis is shown in Table III. From Table I it can be seen that by monitoring signal line a, all faults coverable by this method are detected. By monitoring line a along with the primary output c, MX1*=[ml, m2] TABLE III-Cover Analysis Elements of Mu MX2*=[m3, m4] MX3*=fJ Signal Lines Mb*=fJ Xl X X X X X2 a x X X X X X 382 Fall Joint Computer Conference, 1972 all machines, except m 5, m lO , and m12, can be detected. Faults p, po, and p2 are undetectable under this input I sequence. Example 2 AX1 (0) Referring to the network of Figure 3, the following sets are enumerated to further clarify the theoretical discussion included in Appendix B concerning method 2. cp (0) Md= [m 6, m 9, mll] Fuo= [Xl (saO) , X2(saO) , a(saO)] FUl=[Xl(sa1), x2(sa1) , x3(sa1), a(sa1), b(sa1), c(sa1)] cp (1) Table IV contains the fault free simulation data. From Figure 4-Networks leading to additional outputs TABLE IV-Fault Coverage Table Signal Lines Xl = (111) Xl X2 Xa a b c X2 = (101) Xl X2 Xa a b c Xa = (001) Xl X2 Xa a b c X4 = (011) Xl X2 Xa a b c mO 1 Xl So X2 a Xl X2 Sl Xa a b c X X 1 1 1 1 1 0 1 1 1 1 0 0 1 0 1 1 0 1 1 1 1 1 SXl (0) = [xl,x2,a] SXl (1) = 0 X X X XS 2(0) = [xl,a] XS 2(1) = [X2] X X X X SXa(O) SXa(l) =0 = (Xl, X2, X X SX4 (0) = (X2) SX 4 (1) = (Xl) a) I Procedures for Increasing Fault Coverage for Digital Networks I, 383 TABLE V-Singular Cover for Gate b(k) .~,i 1 1 o x A 1 o o o x label b(k) c(k-1) B C this table it can be seen that Figure 5-Space domain model of Figure 3 S(O) = (SXl(O)) or Since the feedback line c(k-l) ~x when b(k) =1, the process must proceed to the (k-l) level. Therefore, CO(k-l) is added to Figure 5 and the singular covers listed in Table VI are formed. The singular cover vector A from b(k ) , labeled Ab(kh can be intersected with either A or B of the singular cover of c(k-l). Since b is the gate which is influenced directly by the feedback line, the intersection between Ab(k) and Bc(k-l) is performed. This intersection will place less restrictions on the feedback line which is input to gate b(k). The results of the intersections are shown in Table VII. A * need not be intersected with any of the singular covers of b(k -1) since b (k -1) = [x]. A * is now intersected with either Aa(k-l) or Ba(k-l). The result is shown for Aa(k-l). This final vector has Y(k-l) =c(k-2) = [x]. Therefore, the procedure stops with L=2. The Xm i sequence is X 1X 2 = (xIx) (xxI). It can be verified by hand simulation that this sequence does indeed force line b to have a value 1. 
S(O) = (SX2 (O)), (SX4 (O)) ij SCI) = (SX3(1)) The networks which lead to outputs , u, as, t) NEW"'M PARAMETER MODIFI· CATION STRATEGY (2) where c/> is the response vector, u is the excitation, and as is the system parameter vector. 2. "Starting" Model: On the basis of all available evidence and insight, an initial hypothesis as to the model is made. This includes an initial specification of the governing equations, the structure or' geometry of the system, and the system parameters. The model equations have the general vector form ~=j('I!, u, aM, t) (3) where 'I! is the response of the model to the excitation u, and aM is the model parameter vector. Figure I-Conventional parameter identification considerable sampling errors, and never sufficiently complete to satisfy the analyst. Usually, these data are passively-obtained, constituting responses to incompletely-known excitations over which the analyst has no control. The ultimate objective of the modelling effort is to generate a computer model which can be utilized, during the simulation phase, to investigate a variety of hypothetical control situations and which can be used to predict the response of the system to these control strategies. There are, of course, many system identification problems which do not have these characteristics. I t is conceded, therefore, that the type of modelling discussed in this paper is directly applicable to only one class of a broad spectrum of modelling problems. I THE CONVENTIONAL MODELLING METHOD I The approach most often used in the construction of models of the type discussed in the preceding section involves the iterative refinement of an assumed model, 1 by comparing the response of the model with the 'I, response of the prototype system and by modifying the model so as to minimize the difference between the two. This is illustrated in Figure 1 and discussed in considerable detail by Balakrishnan 3 and Bekey.4 The , ' following are the major steps in the conventional method: i!,1 II 1 ' 1, I' 1. Formulation: The basic governing equations and all specific physical information applying to the 3. Implementation: The equations characterizing the "starting" model are programmed on a computer. The computer model is then subj ected to excitations similar to those recorded for the prototype system under study, and the response of the model to these excitations is obtained. A criterion function is specified to serve as a measure of the extent to which the response 'I! of the model conforms to the response cf> of the prototype system being modelled. Usually this criterion function is defined by an expression of the type 4. Criterion: J(T, aM) = iT (cf>-'I!)' W (cf>-'I!) dt o (4) where W = W(c/>, 1/;, t) is a suitable weighting function, and T is the time interval over which the identification takes place. This criterion function J(T, aM) is calculated from the system and model responses. That is, the response of the model and the response of the system are compared. 5. Decision: The objective of the identification procedure is to seek an optimum set of parameters aM which minimize the criterion function such that min J(T, aM) = J(T, aM) (5) The criterion function calculated in step 4 is. therefore, examined to see if it exceeds a specified minimum E. If it does not, the identifica- 388 Fall Joint Computer Conference, 1972 tion is complete, the model is considered valid and employed for simulation. If the criterion function is not sufficiently small, the model must be modified. 6. 
Modification: A computational procedure, usually in the form of algorithms, is specified. This routine defines the manner in which the model parameters aM are to be modified after each iteration, and it may involve gradient methods, random search, relaxation, etc. In any event, it acts to change the parameters of the model hopefully in a manner which results in a smaller J(T, aM). The conventional method of modelling is effective for the identification of systems which are "well-behaved." In particular it works well in situations in which the initial guess as to the model is very close to the prototype system, and where the excitation/response data are of very high quality. The method breaks down in many practical applications, however, for two principal reasons: 1. The first hypothetical model is not a sufficiently-close representation of a prototype system, and 2. The excitation/response data available from observations of. the prototype system are of such low quality that the attaining of a minimum in the criterion function cannot be taken with confidence as an indication of the validity of the model. To illustrate the weakness of conventional modelling, consider again the underground water resource modelling problem discussed above. Whereas Equation (1) can be assumed to apply reasonably well throughout the aquifer, the geometry of the field (buundaries of the porous medium), the initial conditions, the excitations, as well as the presence of maj or inhomogenieties are only incompletely known. The first hypothetical model is, therefore, likely to be substantially different from the actual system. The system response data which are to be used to improve this initial guess are represented by well-logs at haphazardly-spaced points in the field and constitute measurements of the dependent variable sampled at insufficiently-frequent intervals and subjected to serious measurement errors. Nonetheless, the conventional modelling approach requires the iterative refinement of the initial model until a criterion function of the type of Equation (4) is minimized; that is, until the transient response curves of the model are fitted closely enough to the field data. As a result, even if after much laborious and elegant computation, one arrives at a model which provides a tolerable match of the field data, there remains considerable uncertainty as to the meaningfulness of the model and its usefulness in subsequent simulations. This unfortunate consideration applies to models and simulations in a wide variety of important areas of application. THE PATTERN RECOGNITION APPROACH The approach to modelling suggested in the present paper is based upon the following premise: the excitation/response data available from experiments or observations of a prototype system contain a large amount of potentially-valuable and useful information which is not adequately utilized in the conventional approach to modelling. In the attempt to employ curve- or data-fitting methods to match the responses of a dubious model to highly error-prone experimental observations, many key features inherent in the experimental data are averaged out, overshadowed, or simply not utilized. A reason for this lies in the application of the criterion function, such as Equation (4), during each iterative cycle, which involves an attempt to compare the "artificial" responses of the model with the "realworld" responses of the system at each stage of the modelling process. 
The pattern recognition approach to modelling is similar in some respects to that employed by Duda5 and others in recognizing and classifying handwritten chara'cters. In that method, the pattern recognition problem is viewed as a sequence of four mappings as shown in Figure 2. The handwritten characters themselves constitute a so-called "object space" (z). By means of a video scan of the characters, followed by sampling and digitizing, the object space is mapped into a "representation space" (y) consisting of a sequence of binary numbers. Algorithms are then developed to map from the "representation space" into a "feature space" (x). This mapping, termed feature extraction, involves ignoring most of the available samples and the focusing of attention on a few key sampled values which are sufficient to distinguish the characters from each other. Finally, there follows a mapping from the "feature space" to the "decision space" (d), a classification operation in which the features that have been extracted are used to decide the identity of a character under examination. OBJECT SPACE z REPRESENTATION SPACE SCANNING SAMPLING DIGITIZING I-- V FEATURE EXTRACTION FEATURE SPACE DECiSION SPACE d CLASSI FICATION x ... Figure 2-Successive mappings in pattern recognition ! System Identification and Simulation ,I,' /' lIt II: In character recognition, no attempt is made to devise a criterion function of the type of Equation (4) in order to identify characters. Rather the one-dimensional sequence of video signals is subjected to feature extraction, such that a small number of video samples are examined to determine whether they are black or white. The decision as to whether a given character is or is not the letter A, for example, is made on the basis Df whether these key characters, sometimes termed the "mask," are of the correct combination of black and white. This mask is developed by postulating a "starting" mask and by working with a "learning set." The learning set is a collection of handwritten characters obtained from representative collections of manuscripts. The "starting mask" and a decision algorithm are then used to examine the learning set, and the success or failure of the character identification is recorded. The mask and the decision algorithm are then modified and applied to the same sequence of char~cters. This is repeated for many different masks and decision algorithms. The mask and algorithm which manifest the best record of success are adopted as the pattern recognition algorithm, arid are then applied to unknown characters as required. The pattern recognition method of modelling has the same starting point as the conventional approach. Experimental system data (input and output measurements) are assembled, and a first hypothetical model is formulated and implemented on the computer. At this point, the two approaches part company. In the pattern recognition approach, the model implemented on the computer is not regarded primarily as something tD be iteratively matched to reality (the system outputs). Rather it is considered as a "learning· machine" to develop feature extraction and classification algorithms which will eventually serve to extract pertinent information from the "real world" system data. The primary objective of this first stage of modelling is not to progressively refine the model, but rather to develop a set of computing routines which can subsequently be utilized to analyze the data available from the system to be modelled. 
The results of this analysis then are used to formulate the "starting" model for conventional parameter identification. As shown in Figure 3, the modelling problem is therefore -subdivided into two stages: pattern recognition and parameter identification. The term "pattern" is used in the present context to connote general or global characteristics of the system being modelled. For any specific modelling problem, these patterns must be known in order to talk meaningfully of parameters and their identification. Accordingly, a list of patterns is prepared, and the nature of these patterns is to be extracted from available system observations (excitations/response data). Where possi- MODEL RESPONSE 389 PATTERN PERTURBATION PATTERN RECOGNITION SYSTEM RESPONSE CONVENTI·ONAl PARAMETER IDENTIFICATION SYSTEM EXCITATION AND RESPONSE Figure 3-Steps in modelling ble, these patterns are formulated in such a manner that their recognition involves the answering of a yes/no question. For example, in the case of a distributed system such as that described by Equation (1), these questions might include: 1. Is a given parameter (for example S) present in non-negligible quantities? That is, is it necessary to include that parameter in the model? 2. Is this parameter constant, in the range of dependent and independent variables for which system observations are available? 3. Is this parameter a function of the independent space variables, x and y? 4. Is this parameter a function of time? 5. Is this parameter a function of the dependent variable (nonlinear)? 6. Does the magnitude of this parameter everywhere fall within a specified range? 7. Considering the quality of available system observation data (number of measuring stations, sampling interval in time, and measurement 390 Fall Joint Computer Conference, 1972 errors) is it possible to derive a model of a given dimensionality? That is, do available response data permit the meaningful construction of a finite difference grid of a specified truncation interval? Similar questions can be asked regarding the geometry of the system, that is the location of field boundaries, and even the general structure of the basic equations. Usually in conventional modelling, all patterns of the type listed above are assumed initially, and a basic error in these assumptions invalidates all subsequent modelling efforts. In the pattern recognition method, a set of algorithms is developed with the express purpose of extracting the answers to these questions from available prototype system observations. The computer model is used to develop these computing routines. Each algorithm is designed to accept, as its input, the response data of the model and eventually response data of the system being modelled. The model is designed to provide data having the same sampling interval, spatial distribution, and measurement noise as the original system. The output of each algorithm is the answer to one question of the type posed above. Each algorithm is, therefore, a separate pattern recognition routine. This routine can conceivably involve transformations or spectral analysis, or it may involve cross-correlations of response data taken at different points in space, but will more often take the form of a "mask." Instead of processing all the samples obtained from all response functions, attention is focused on a few key sampled values. The yes/no decision is based upon the information contained in these samples. 
The optimum mask, that is the combination of samples which are processed to determine whether the answer to a question is "yes" or "no," is determined experimentally using the computer model. The pattern recognition algorithms used in modelling are developed in a manner basically similar to that used in character recognition. A "starting" algorithm is adopted either from experience or from heuristic considerations. This algorithm is tried out on the model response transients, where these response functions are generated by exciting the computer model with excitations similar to those which excited the prototype system. The algorithm also acts to "perturb" the model, so that the effectiveness of the algorithm over a number of similar yet different model configurations or parameter distributions (patterns) is determined. The algorithm is then modified automatically or by an on-line operator and the process repeated. After a number of such experiments, that algorithm which proved most effective in identifying the desired pattern is selected. The same procedure is followed to obtain successful algorithms for identifying all of the other patterns of interest, so that eventually a library of algorithms is formed-algorithms which are tailor-made for the system being modelled and for the specific excitations and responses which are available from the physical system. Once this library is complete, attention is turned, for the first time, to the response data of the physical system. These data are now processed by the algorithms that were just developed. That is, the pattern recognition algorithms are employed to determine the patterns of the physical system. This process may demonstrate that the model used for "learning" differs radically from the system being modelled. Accordingly, this model is modified so as to give it the patterns which were found to be contained in the system being modelled. This whole process is repeated until the pattern extracted from the physical system corresponds reasonably closely to those assumed for the system being modelled. At that point, one can conclude that the model is "within the correct ball park," and the conventional parameter identification method can be employed to determine the fine structure of the model. The general approach is illustrated in Figure 4 and takes the following steps: 1. Formulation: The basic governing equations and all specific information applying to the system under study are assembled, together with all available excitation/response data. The basic equations take the vector form ;p = f( cp, U, as, (38, t) (6) where (3s is a vector of patterns. 2. "Starting" Model: On the basis of all available evidence and insight, an initial hypothesis as to the model is made. The model equations have the general form (7) where aM is the model parameter vector and is the model pattern vector. (3M 3. Implementation: The equations characterizing the "starting" model are programmed on a computer. Provision is made in this implementation for perturbing or modifying the patterns of the model under control of the pattern recognition (P .R.) algorithms. The model is to accept as input data the observations of the excitation, u, of the system being modelled. The model response, 1/1, is given as nearly as possible the same characteristics as the system response, cp. 
That is, response data are read out from the same System Identification and Simulation locations as those at which system response data is available, a similar sampling interval is employed, and if appropriate, noise is artificiallv added to the model output. 4. "Starting" Pattern Recognition Algorithm: On the basis of previous experience and insight, a separate algorithm is provided for each of the patterns, (3M to be recognized. These algorithms may include masks for selecting key samples for further processing. 5. P.R. Algorithm Implementation: The "starting" algorithms are programmed on the computer. These algorithms may contain loops which act to perturb or modify the patterns of the computer model so as to test the algorithms under a number of different situations. For example, if the purpose of the algorithm is to determine whether a given parameter is constant or not, that parameter is given a number of different constant values as well as caused to vary in a prescribed fashion. The modified patterns imposed by the P.R. algorithm are denoted by (3M.* 391 modifications are required, that algorithm having the best percentage of success is selected and stored. 12. P.R. of System Response: The selected algorithms are now employed to process the system response, ¢, obtained from prototype system observations. That is, the algorithms are employed to recognize the patterns, (38, in ¢. 13. Comparison: The patterns (38 recognized using the syste~ observations are compared with the patterns (3M initially assumed for the model. That is, it is verified whether the model used for algorithm development was "in the correct ball park." 14. Decision: The results of the comparison of all the members of the pattern vectors (3M and (38 are analyzed to determine whether the "starting" model was close enough to the system being observed. If agreement between the two is adequate, that is, if the model has most or all of 6. P.R. of Model Response: The algorithm is employed to process the model response, y;, and to recognize the model patterns for each of the model perturbations. The patterns recognized by the algorithm are denoted by SM.* 7. Comparison: For each pattern recognition run, the success or failure of the algorithm is determined by comparing the pattern of the model, (3M* with that determined by the algorithm, SM. * 8. Criterion: A figure of merit for each algorithm is determined by totaling the successes and failures of the algorithm over all the experiments conducted with that algorithm. 9. Decision: A decision is made as to whether or not additional modifications of the P.R. algorithms should be attempted. , SYSTEMR_ 10. Algorithm Modification: Either automatically or with the aid of an on-line operator, the P.R. algorithm is mo~ified. This modification may involve the re-specification of the mask, a change in the manner in which the samples are processed, or it may involve a more fundamental change in strategy. Evidently, the specific nature of this modification depends upon the patterns to be recognized by the algorithms. In any event, steps 5 through 9 are repeated until no additional modifications are required. 11. Selection: Provided no additional algorithm Figure 4-The Pattern Recognition (P.R.) modelling method 392 Fall Joint Computer Conference, 1972 the patterns of the physical system, the pattern recognization process is considered complete, and the computer model can be employed as the starting point for conventional parameter identification and eventually for simulation. 15. 
Model Modification: If agreement between the model and the physical system is inadequate, the computer model is modified by giving it the patterns determined in step 12. Steps 5 to 14 are then repeated until adequate agreement is obtained. The most difficult steps in this method are the selection of the "starting" algorithm for pattern recognition and classification and the specification of the modification strategy of this algorithm. These depend strongly upon the type of patterns to be recognized, upon the computer model, and upon the nature of the response data. It is necessary, therefore, to build up a considerable amount of experience with this method for any specific application area. Occasionally it may turn out that a proven algorithm modification strategy does not lead to adequate convergence for a specific problem. This may then be taken as an indication that the quality of available response data is insufficient to permit meaningful pattern recognition. For example, the time sampling-interval may be too large, or response data may not be available for enough points in the space domain, or the signal-to-noise ratio may be too low. Under these circumstances, the computer model and the pattern recognition method can be employed to determine the approximate extent to which system observation data must be improved to make modelling possible. This can be accomplished by gradually improving the quality of the computer model response (by sampling it more frequently, for example) until the algorithm modification strategy leads to successful convergence. The results of this computer experiment are then used as the basis for better and more complete field· measurements. CONCLUSIONS The pattern recognition method described in this paper is evidently not a panacea. The procedure is useful only for the identification of systems of "a certain shade of gray," and it leans heavily upon the ingenuity and insight of the analyst. It does, however, constitute a novel utilization of computer models-the development of a "learning set" and the determination as to whether the system response data are of sufficient quality to permit parameter identification. The approach has been used with some success in the modelling of underground water reservoirs of the type characterized by Equation (1) as well as in the study of aquifer pollution problems. The results of these studies will be reported in separate papers. REFERENCES 1 W J KARPLUS V VEMURl Heuristic optimization and identification in hybrid field simulations Proc Fifth Int Congress of AlCA Lausanne Switzerland pp 345-350 1967 2 V VEMURI W J KARPLUS Identification of nonlinear parameters of ground water basins by hybrid computation Water Resources Research Vol 5 pp 172-185 1969 3 A V BALAKRISHNAN V PETERKA Identification in automatic control systems Automatica Vol 5 pp 817-829 Pergamon Press 1969 4 G A BEKEY Systmn 1:den tification-an introduction and a survey Simulation Vol 15 pp 151-166 1970 5 R 0 DUDA Elements of pattern recognition Adaptive Learning and Pattern Recognition Systems (J M MENDEll and K S FU, editors) Academic Press pp 3-33 1970 'I ! Horizontal domain partitioning of the Navy atmospheric primitive equation prediction model by E. MORENOFF Ocean Data Systems, Inc. Rockville, Maryland and P. G. KESEL and L. C. CLARKE Fleet Numerical Weather Central Monterey, California rather than the equation set. 
That effort has now been completed and a new version of the PEM, partitioned according to horizontal grid space considerations, has been operational at FNWC since October 1971, with significant improvements in terms of both elapsed time and central memory size requirements to generate the 72-hour forecast. This paper summarizes the principal factors involved in the repartitioning of the PEM. First, the PEM, and the mechanisms by which the partitions of the PEM in each of the four process + RT a'IT) + D(u) + F(u) -m 2 {ax ax ax ay m acr m a ('lTv) at a (wv) - 'lTuf - m('IT 2..<£.+ RT 2.2!.) + D(v) + F(v) a + a (VV'IT)} + 'IT aa -m 2 {ax (~) ay ay ay m m a ('ITT) at -m 2 a {ax ('IT~T) + a a ('lTg) at -m 2 a {ax ('lTu q ) + a ('lTvq)}+ 'IT a ('''q) + D(q) ---acray m m 1 -rt -j[m 2 ael> - 0RT a'IT a aa 'IT D( RT = P 0 ) F( ) ay m ('lTvT)} + 'IT a (wT) + ~To w + D(T) + H(T) aa m p a + aay (~)} + {ax (~) m m where 0 - PI 'IT 'IT and w - + Q (q) ~] do dO -0 lateral diffusion operator surface friction operator Q() H() moisture source and sink terms diabatic heating terms Figure I-The equation set that the flux terms conserve the first and second moments of any advected parameter, assuming continuous time derivatives. Total energy is conserved because of constraints placed upon the vertical differencing. Total mass is conserved when integrated over the entire domain. Linear computational instability is avoided by meeting the Courant-Friedrichs-Lewy criterion. The Phillips7 sigma vertical coordinate is employed in which pressure is normalized with the underlying terrain-level pressure. At levels where sigma equals 0.9, 0.7, 0.5, 0.3, and 0.1, the horizontal wind components, u and v, the temperature, T, and the height, z, are carried. See Figure 2. The moisture variable, q, is carried at the three lowest levels. Vertical velocities, w (defined as minus sigma dot), are carried at the layer interfaces (sigma equals 0.8, 0.6, 0.4, 0.2) and are calculated diagnostically from the vertically-integrated continuity equation. At the top and bottom of the column the vertical velocities vanish identically. SIGMl\ VARIABLES 0.0 U,V ---- --------- T,z --- --- w u,v,T,z w u, v: T, z, q - - w U v,T,z,q - - w u,V,T,z,q - - --- --- ---------- ----- -- -- - p Figure 2-Diagram of levels and variables 0.2 0.4 0.6 0.8 1.0 Partitioning of Navy Atmospheric Primitive Equation Prediction Model 395 'II' 1:',1 ;il, {, ~, ! 1,1 I, I ', , The Clarke-Berkofsky mountains are used in conjunction with both a Kurihara8 form of the pressureforce terms in the momentum equations, and with slight amounts of lateral diffusion to eliminate the customary "noise" patterns over high, irregular terrain. The Richtmyer centered time-differencing method is used with a ten-minute time step, but integrations are recycled every six hours with a Matsuno (Euler backward) step to reduce solution separation. The earth is mapped onto a polar stereographic projection of the Northern Hemisphere. The grid lattice has 63 rows and columns, and the geographical equator is an inscribed circle. The mesh length is 381 kilometers at 60 degrees North (and about one half this distance in the extreme corners of the array) . A considerable part of the diabatic heating and moisture terms in the model was based on the work of Mintz and Arakawa as described by Langlois and Kwok. Climatological values of the earth's albedo are used. A Smagorinsky parameterization of cloudiness based on layered relative humidities is used in the radiative flux calculations. 
Dry convective adjustment precludes hydrostatic instability. Moisture and heat are redistributed in the lowest three layers by use of an Arakawa parameterization of three types of cumulus cloud ensembles. Convective precipitation is permitted in two of these three cloud types. Evaporation and largescale (cyclones) condensation are important source-sink terms in the moisture conservation equation. Evaporation over land, however, is based on a Bowen ratio, using data from Budyko. In the calculation of sensible heat fluxes over water, the FNWC-produced sea surface temperature distribution is invariant with time but updated for each forecast. Over land, the surface temperature is obtained from a heat balance equation. Surface stress is computed for the lowest layer. The type of lateral boundary conditions which led to best over-all results was the constant flux, restoration boundary conditions devised by Kesel and Winninghoff, and implemented in 1970. The procedure is as follows: A field (63 X 63 array) of restoration coefficients which vary smoothly from unity at and south of 7.5 degrees North to zero at and north of 15 degrees North is computed once and saved. At the conclusion of each ten-minute integration step the (new) values of the state variables are restored back toward their values at the previous time step according to the amount specified by the restoration coefficient at each grid point. The net effect of this technique is to produce a fully dynamic forecast north of 15 North, a persistence forecast south of 7.5 North, and a blend in between. The blend region acts as an energy sponge for outwardly propagating inertia-gravity waves. The basic inputs for the model are the virtual temperature and height distributions for the Northern Hemisphere at twelve constant pressure levels between the surface and 50 MBS, moisture distributions at four levels between the surface and 400 MBS, the sea surface temperature distribution, the sea level pressure distribution. These analyses are generated twice daily on an operational basis, and are derived from about 550 upper air reports (temperature, pressure, moisture, winds) and 4,500 sea level observations. These reports are augmented by aircraft observations (mainly between 30,000 and 40,000 feet) in large numbers, and satellite soundings (SIRS data) sporadically. INTER-PROCESSOR COMMUNICATIONS AND SYNCHRONIZATION The inter-processor communications and synchronization mechanisms are identical to those employed in the version of the PEM partitioned on the basis of the equation set as reported in the previously referenced paper . These mechanisms are briefly reviewed in this section for purposes of clarity. The two FNWC dual-processor CDC 6500 computer systems can be linked with each other through the one million words of Extended Core Storage (ECS) operated in a mode such that the entire ECS is accessible by either CDC 6500 computer system. When the PEM is to be executed, the four programmed partitions of which it is comprised are assigned to and loaded in each of the four processors. One of the program partitions is designated as the master partition and the remaining three as the slave partitions by the use of appropriate ECS access codes and pass keys. If the ECS access code field indicates the partition to be the master, the associated pass key is interpreted as the name- of the ECS block storage assigned to the PEM. 
The slave partitions have no ECS assigned to them but are able to refer to the same ECS block as the master partition by use of the same pass key. Communications between the program partitions being executed in the different CDC 6500 computer systems are established through the aid of a FNWC developed Peripheral Processor (PP) routine, lSI, which links the two operating systems in each of the two computers. Hence, lSI provides a software, full duplex block multi-plexing channel between the two computers via ECS. Messages and/or blocks of data may be sent over this channel so that lSI may be used to call PP programs in the other computer or to pass data such as tables or files between the computers. 396 Fall Joint Computer Conference; 1972 Immediately following the initiation of the execution of the four program partitions in their respective processors, the operation of the three slave partitions is delayed until the master partition requests and has been assigned the necessary ECS block storage required by the PEM, and the synchronization mechanism is initialized. ECS block storage is requested by the master partition in the same manner as any conventional job to be executed in the computer. Once obtained, the master partition labels the ECS block storage by passing an argUment comprised of an access code specifying its status as master and the desired pass key to the peripheral processor routine, ECS. The routine ECS searches the resident control point exchange area (CPEA) and, through lSI, that of the other computer, for a master with the same pass key. If one is found, the requesting program partition is aborted. If the other computer is inoperative or if no matching key is found, the label is established. When the operation of the three slave partitions in their respective processors is manually reinitiated following the successful' assignment of ECS block storage to the master partition and the initialization of the synchronization mechanism, each slave partition passes the argument comprised of its access code indicating it to be a slave and pass key to the peripheral processor routine, ECS. This time, ECS searches its own computer's CPEA for a master with a matching key. If none is found, the search is repeated in the other computer's CPEA via lSI. If still none is found, this fact is indicated to the requesting partition. If a match should exist in either computer, the original ECS will contain the address (ECRA) and field links (ECFL) of the requesting partition stored in its CPEA and will be given the ECRA and ECFL of the matching master partition. The mechanism by which the parallel execution of the multiple partitions in each of the processors are exactly synchronized is based on a general program linkage mechanism known as the Buffer File Mode of Operation. 9 •1o •1l The application of the Buffer File Mode of Operation to the FNWC multiple computer environment requires the Buffer Files to reside in a random access storage device jointly accessible by each of the computers. The ECS satisfies this requirement when operated in the manner described above. The nature of the information passed between any pair of partitions is whether or not one partition has reached a point in its execution where sufficient data has been developed to allow the other partition to initiate or continue its own execution. This is represented as a single "GO-NO GO" flag to be sensed by the second partition. 
Hence, the recirculating ring structure normally associated with the· Buffer File Mode of Operation reduces to a simple single one word block maintained in ECS. Finally, access to a Buffer File by a partition must be unidirectional. Any single Buffer File may only be written to by one partition and read from by another partition. Consequently, a pair of Buffer Files is assigned between any two partitions whose operation is to be synchronized. PEM STRUCTURE Each of the four partitions of which the PEM is comprised are identically structured. Each partition is considered in three distinct phases: the initialization phase; the integration phase; and the output phase. Whereas the computations associated with the integration phase are identical in each of the four partitions, those associated with the initialization and output phases vary from partition to partition. The phases are structured as separate overlays within each partition. The relationships which exist among the overlays of a particular partition and with the overlays in the other three partitions are illustrated in Figure 3, which is a representation of the master overlays associated with each of the four partitions. The principal functions of the master overlays are to synchronize the calls for the execution of the overlays within the respective partitions with those in the other partitions, to dynamically adjust the field lengths of core storage required by the overlay to be called, and finally, to invoke the execution of the appropriate overlay. Overlay call synchronization among the different partitions is realized via requests by the master overlays in each partition to "set" a Buffer File (SCOMM) or to "read" a Buffer File (RCOMM). A five character Buffer File naming convention was established to facilitate identification of which partitions were involved. The first two characters of the name serve to identify the Buffer File as being interstep ("IS"). The third character specifies whether a split ("S") or a join ("J") is being signaled. The fourth and fifth characters specify the partitions writing and reading the Buffer File. Hence, Buffer File ISS14 is used by partition 1 to split its operation by initiating execution of partition 4 in going from one time step to another. The initialization phase is invoked only once per 72 hour forecast period. The integration phase is repeated in each forecast time step. Each thirty-sixth time step, the results of the preceding forecast hours are output and the integrations reiterated. The program loops extending from the DO statement through the Partitioning of Navy Atmospheric Primitive Equation Prediction Model Partition 1 Partition 2 Partition 3 Partition 4 SCOMM(ISS12,3) START ....---. RCOHl1(ISS12, I} ~~~~~r-----------------------~RCOMM(ISS13,l) OVERLAY 1 L...-""7""""------___~____,.--------L... RCOMM (ISS14, 1) RCO[.M(ISJ21,3....>II--;_ SCOMM(ISJ21,l) :~~J~~4-.~---___________~_ SCO~.M(ISJ31,1) ACKNOV~EDG~~~~:_______________~__________________.- SCO~U"l (ISJ41, 1) RFL,167000. RFL, 14 5000. RFL,145000. RFL,145000. OVERLAY 1 (Initialization OVERLAY 1 (Initialization OVERLAY 1 {Initialization OVERLAY 1 (Initialization RCOMJ.'-1 (ISJ21, 3) . SCOMM(ISJ21,1) EXIT OVERLAY 1 ~~_______________________S_C_0_MM ___ {I_S_J_3_1_,_1_>_____ DO 1035 I=1,LU1 DO 1035 I=l,LIM SCOMM(ISS12,3) . START RCOMM(ISS12,1) DO 1035 I=l,LIM SCOMM(ISJ41,1) DO 1035 I=l,LIM ~==~~2~------------~~ RCOMM(ISS13,1) I L-.-r-------------~---------~ RCOMM (ISS14 f 1) i RCOMM(ISJ21,3) 1 1-'-:- SCOMM (ISJ21 ,1) . 
~~~ , - - - - - - - - - - SCOl-!M(ISJ31,1) ACKNOWLEDG;~4~__________________~_______________~~ SCO~~{ISJ41,1) OVERLAY .... I RFL,125000. RFL,125000. RFL,125000~ RFL.,125000. OVERLAY 2 (Integration) OVERLAY 2 (Integration) 'OVERLAY 2 (Integration) OVERLAY 2 (Integration) RCOMM(ISJ21,3) SCOMl'·1 (I SJ21 ,1) EXIT "=0=V=E=RL-=-=-A=Y--=24.::~~~~~~::::~~~~~~~~~~~~~~::-_S_C_0_MM __ (I_S_J_3_1_,_1_)_~ SCOMM(ISJ41,1) SCOMM(ISS12,3) START ~~~ RCOMM(ISS12,1) ~~~-~~----------------~~ RCOMM(ISS13,l) OVERLAY 3 L..-_ _ _ _ _ _ _ _ _ _ _ _ _- : -_ _ _ _ _ _ _. , - - _ ._ _ _ _ _~: til RCOMM (ISS 14 ,1) RCOMl1 (ISJ21, 3) : E· . ._SCOMM(ISJ21,1) (I_SJ_311) _ _ _ _ _ _ _-:-_ _ S_C_0_MM ___ __, _ _ I ACKNO~lLEDG ...l...- RFL,155000. RFL,160000. OVERLAY 3 (Output) Rearm~21 f VERLAY 1035 CONTINUE ~. --.- (OvEiLAY31 ~~ SCONH (ISJ21 ,1) SC01A.J.1 (ISJ41, 1) RFL,160000. RFL,16COOO. OVERLAY 3 OVERLAY 3 (Output) (Output) J -SCOHM (ISJ31,1) --~----~----------~SCOMM(ISJ41,1) 1035 CONTINUB 1035 CONTINUE Figure 3-Partition overlay structure 1035 CONTINUE 397 398 Fall Joint Computer Conference, 1972 Partition I Partition 2 SET-UP Partition 3 SET-UP SET-UP IF (RESTART) GO TO 9050 IF (RESTART) GO TO 9040 COMPUTE/STORE ~, In ?( , MAP, Z{P) COMPUTE/STORE Partition 4 SET-UP IF (RESTART) GO TO 9040 Z, BETA (3) SCOMM(ISS12,2) I I :~ START . WINDS RCOMM(ISSI2,1} ;~ ( INITIALIZE REMAINDER WINDS RCOMM(ISS13,1) ) ( o WINDS ) I RCOMM(ISJ32,1~D~ne SCOMM(ISJ32,1) I INTERPOLATE WINDS I I RCOMM(ISJ21,1)~DOne SCOMM(ISJ21,1) I READ Z, BETA (3) 9040 CONTINUE 9050 CONTINUE 9040 CONTINUE SCOMM (ISS12 , 3.) ~--~ RCO~1(ISS12,1) PREPARE --~E~X~I~T--~--~-----------------~~~ OVERLAY 1 ~------------------------~--------------~----~~RCO~M(ISS14,1) IF (RESTART) READ· ( Z, BETA ( 3) ) END RCOMM(ISS13,1) END READ ( Z, BE'r A ( 3) ) END Figure 4-0verlay 1 partition synchronization READ (Z, BETA(3» END Partitioning of Navy Atmospheric Primitive Equation Prediction Model CONTINUE statement in each of the partitions of Figure 3 control the execution of the integration and output overlays. Note the variable upper limit of the number of executions of these program loops. This is manually set. The operation of the PEM may be suspended and reinitiated from the point following the completion of the most recent execution of the output ove1'lav. Finally, the RFL (REQUEST FIELD LENGTH) statements appearing in Figure 3 clearly indicate the variability of the main storage requirements of the overlays within each partition of the PEM. This variability is principally a result of the manner in which the overlays operate on the data fields present. The initialization overlays treat the data fields on a "fullfield" basis. The integration overlays, however, treat the data fields on a "quarter-field" basis. Lastly, the output overlays treat the data fields on both a full-field and third-field basis. The utilization of the data fields by the overlays will be elaborated on in the following section. The point to be made here is that the main storage requirements of the PEM are dynamically adjusted in the course of its execution to maximize the storage available to other programs which may be concurrently sharing the FNWC computer resources with the PEM. PEM PARTITION OVERLAYS 'I" :' I\' 1,1 In addition to the synchronization of the operation of the partitions of the PEM at the master overlay level, further synchronization is required among the subordinate initialization, integration and output overlays. 
This additional level of synchronization is realized by the same Buffer File mechanism employed at the master overlay level. Consider first the initialization overlays. As noted in the preceding section, the initialization overlays treat the data fields on a full-field basis and hence the partitioning of these overlays is based on computational functions rather than spatial considerations. Intraoverlay synchronization is consequently needed to insure the completion of each section of the initialization process in the appropriate partitIons before the next section is allowed to be initiated. Reference to Figure 4, for example, shows that the interpolation of the winds in the initialization process in partition 2 must wait for a confirmation that the initial wind computations in partitions 2 and 3 have each been completed. Figure 4 also illustrates the manner in which the PEM restart capability functions. In the event the 399 Row 63 Row 62 Row m Row m-l Row k Row k-l Row n Row n-l Row 2 Row 1 Col.l Col. 63 Figure 5-Horizontal doman partition operation of the PEM is being restarted, as manually noted by the operator, the computations within the initialization overlay are essentially completely bypassed. In such a case the initialization overlay is used to pass those parameters needed for the continued operation of the PEM from partition 1 to the other partitions. During initialization, each partition operates on "full" fields (that is, on the complete 63 X 63 arrays). In the integration phase each partition operates on quarter fields. During the output phase both full fields and third fields are used, depending on the operations being performed. In the output phase, for example, transformation of ·coordinates (from sigma surfaces to pressure surfaces) are carried out in three processors (on third fields) while the fourth processor writes sixty checkpoint restart data fields on the restart tape. Once done, all four processors can then perform full-field filtering and/or smoothing oper,ations on one-fourth of the number of forecast fields which are written on the disk (for transmission to users) . In the integration phase, it is important to note that several alternate configurations were considered with respect to how the 63 X 63 data arrays could be most effectively partitioned. For example, the four-way 400 Fall Joint Computer Conference, 1972 Partitiqn 1 Partition 2 I Partition 3 Partition 4 'j SCOMM(1SS12,~3~)~~ gNiT. ITAT~ .. RCOMM (1SS12 ,1) ~-~~~~-~----------------~--~~. RCOMM(1SS13,l) TI~ STEPL-~------------------~--------~------~~~RCOMM(1SS14,1) DO 10K=1,5 DO 10 K=1,5 I (WINDS) : (WINDS) DO 10K=1,5 DO 10 K=1,5 (WINDS ) (WINDS) I RCOMM(1J21 (k) ,3) SCOMM. (IJ21 (k) ,1) I~: _W~1=N:..:..::D::.::.S~· ---_I----------+.... DONE -E· L..oI SCOMM (IJ 31 (k) ,1) I ~~I------------------+--------------------- SCOMM(1J41(k),1) SCOMM(1S12(k);3) ~ RCOMM(IS12(k),1), I WINDS STORE 110 RCOMM(IS13(k),1) L _ _ _+-____________--____~_________________!~~ I ( STORE ) ( STORE ) RCO~~(IS14(k),1) ( STORE ) 10 CONTINUE 10 CONTINUE 10 CONTINUE (COMPUTE) (COMPUTE ) (COMPUTE ) SCOl-IM (ISJ21, 1) SCOMM(ISJ31,1) SCOMM(1SJ41,1) ... 
RCOMM(ISS12,1) • RCOMM (ISS13, 1) I ( STORE ) .1 I ( STORE ) "" RCOMM(ISS14,1) ( STORE ) SCOMM(ISJ21,1) SCOMM(1SJ31,1) SCOMM(ISJ41,1) RCOMM(ISS12,1) R~OMM(ISS13,1) ~-, OR EXIT L-~__________________~________________ RCOMM(ISS14,l) OVERLAY 2 END END END Figure 6-0verlay 2 partition synchronization END Partitioning of Navy Atmospheric Primitive Equation Prediction Model partition based on quadrants was rejected because an array transformation would have been required to reassemble the quarter fields into contiguous ECS locations. A detailed analysis indicated that partitioning the grid into four adjacent rectangular strips was the best possible scheme. See Figure 5. Using the partition method shown, nonoverlapping writes to ECS, extra-row reads from ECS for space differentiation, and time synchronization of partitions contribute both to economy and solution (no internal boundary problems). Assume that each partition is to calculate answers for n rows in the 4n X m total field. Now, because the need to compute horizontal gradients, it was necessary to read into central memory of each partition and to calculate the quantities to be differentiated on (n+2) rows (using second-order space differencing). The nonoverlapping n-rows from each partition are reassembled into total fields on ECS at the conclusion of each computational segment. This meant that the read/write first/last word addresses for data transfers to/from ECS were unique to each partition. One final consideration needs to be covered. If all grid points in the total array were of the same class (computationally speaking), each partition should calculate on exactly one-fourth of the number of rows in the total horizontal space. But, in this particular model, three classes of grid points exist. South of 7.5 degrees North, only restoration of former parameter values takes place. North of 15 degrees North, however, the model simulates more of the physical processes than in the region between 7.5 and 15 North (where it is only diabatic). By expressing all of the addresses and field lengths in terms of easily changeable variables, it was possible to let the computer determine the optimum number of rows to be calculated in each processor. In the computations of a typical integration time step the II, results of the preceding time step are transferred from permanent storage to temporary working storage associated only with the particular partition. Following the , computations within the time step of each partition, the results are transferred back to permanent storage. It is important to note the number of strips into which the grid was divided was solely based on the number of Central Processing Units available in which to process the partitions. In the event, for example, ten or one hundred Central Processing Units were available, then , the number of strips could have been selected as either ten or one hundred, respectively, and the number of partitions extended appropriately. Synchronization points within the integration overlay ',·~I ·' among the four partitions of the PEM are illustrated in Figure 6. The following observations are relevant. First, the wind computations are repeated within a 0'. 1 I I' 1 I 'I,", ;,1 1 1 !i'l I ,1,'1 ,~, II 401 DO Loop present in each partition five times, once for each level of the atmosphere. 
In order to insure the PEM remains in synchronization in the event the execution of one of the partitions is temporarily interrupted or suspended, a different pair of Buffer File communication cells is required for each execution of the DO Loop. Second, the oval COMPUTE box incorporates all the remaining computations associated with the time step. Each sixth time step these computations are modified to take into consideration the effects of diabatic heating. This includes incoming solar radiation, out-going terrestrial radiation, sensible heat exchange at the airearth interface, and evaporation. Each step the computations are further modified to take into account condensation processes. And third, synchronization controls are provided to insure completion of all computations at a time step prior to allowing the results of that time step to be transferred to permanent storage by any of the partitions. Similarly, controls are provided to insure the completion of the transfer of the results of the time step computation to permanent storage prior to allowing the next time step to be initiated in the integration overlays of any of the partitions. The output overlay is entered every thirty-sixth time step or sixth hour during the forecast period. The output overlay in partition 1 is devoted to duplicating onto magnetic tape from their permanent storage all data fields required to restart the PEM. At the same ~ime, the output overlays in partitions 2, 3 and 4 are postprocessing the output fields, that is, transforming coordinates, filling in values under terrain, filtering and outputting the resultant fields. The time required to prepare the restart tape is a small fraction of the time required to output the results of the previous thirty-six time steps and hence this part of the restart procedure does not appfeciablyextend the model's execution time. The synchronization of the execution of the output overlays in each of the four partitions is shown in Figure 7. The routines contained in the square boxes labelled OUTPEI are amplified in Figure 8. Note that all preprocessing of data must be completed before OUTPEI can be invoked in any of the partitions. Further note that in OUTPEI partition 3 assumes the role of master partition with respect to the operation of partitions 2, 3 and 4. The computations within OUTPEI are performed on a third-field basis, one-third of the data fields being processed in each of partitions 2, 3 and 4. OUTPEI prepares data fields for output five separate times during its execution and then calls on the routine OUTPE2(k) where k= 1, 2, 3, 4, 5 to actually output the data fields. Synchronization controls are provided 402 FallJomt Computer Conference, 1972 o Partition 2 RCOMM(1SS12,1} Partition 3 I I 1 I I I RCOMH (1SS13, I) I I Partition 4 RCOMM(ISS14,1} I PREPROCESSING PRE- PROCESSING I I PREPROCESSING SCOMM(ISJ21,1} SCOMM(1SJ31,1) SCOMM(ISJ41,1) - -...~ RCOMM(1SS12 ,1) ~ ~~~---r-=: COPY RESTART FIELDS TO TAPE 1 OU~El RCOMM(ISS13,1} RCOtJIM (18814,1) EJ OUTPE1 1 ~O~(ISnl'~E3~)~~=~S~C~O:~=(:I:8:J2:1::'1:}=~_~s~m I. __ MM_(_IS_J_3_1_,_1_)_T: • RCOMM(ISS12,l) IL~EX~IT~f===~~====~ OVERLAY :3 SCOMM(ISS12, 3) RCOMM(ISS13,l) I!> 0 UTPEI ~~n~U,l) I' . 
RCOMM(ISS14,l) ("----_CR_UN_) END END END Figure 7-0verIay 3 partl°tIOn synchromzation o o 1 END Partitioning of Navy Atmospheric Primitive Equation Prediction Model 403 Partition 2 Partition 3 Partition 4 COMPUTE 1/3 FIELD COMPUTE 1/3 FIELD COMPUTE 1/3 FIELD 1 SCOMM (ILS2 3,1) 1/3 FIELD COMPLETE JIIIr RCOMM(ILS23,1) I 1/3 FIELD RCOMM (ILS43 ,1) .. COMPLETE I CALL RCOMM(ILJ32,1)~OUTPE~(I) SCOMM(ILS43,1) I SCOMM(ILJ32,1) i CALL OUTPE2 (1) SCOMM(ILJ34,1) ~ RCOMM (ILJ34 ,1) I I (OUTPE2(ll) I I SCOMl-1 (ILS2 3,1) OUTPE2(1) COMPLET~RCOMM(ILS23,1) i I RCOMM(ILS43 1)~OUTPE2(1) , , SCOMM(ILS43,1) COMPLETE I CO¥.l.PUTE 1/3 FIELD COMPUTE 1/3 FIELD COMPUTE 1/3 FIELD I 1/3 FIELD .. RCOMM(ILS23,1) ! I COMPLETE 1/3 FIELD I RCOMM(ILS43,1) ~ COMPLETE I I CALL SCOMM(ILJ32,1) I RCOMM (ILJ 32 , 1) ~ OUTPE2 (5) CALL SCOMM(ILJ34,1) OUTPE2(5) SCOMM(ILS23,1) , SCOMM(ILS43,1) II- RCOMM(ILJ34,1) I (OUTPE2(Sl) (OUTPE2 (S (OUTPE2(Sl) I ! SCOMM(ILS23,1) OUTPE2(5) ~ RCOMM(ILS23,1) COMPLETE RCOMM (ILS4 3 1) , , CSI .OUTPE2 (5) COMPLETE SCOMM(ILS43,1.) I END l) END Figure 8-0UTPEl partition synchronization END 404 Fall Joint Computer Conference, 1972 Resolution Number of Points Effective* S,eace Increment Composite Factor** A. 5°/5 Layers 2450 300 0.41 B. 2.5°/5 Layers 10082 150 3.38 C. 2.5°/10 Layers 1(}082 150 6.76 D. 1. 250/10 Layers 40898 75 55.03 E. 1. 250/20 Layers 40898 75 110.06 * Assumes some technique to artificially eliminate overspecification at high latitudes. ** Compared to the FNWC PEM. Figure 9-Global grid model hierarchy to isolate the output data preparation computations and the OUTPE2(k) calls within each partition. CONCLUSIONS The FNWC (Kesel-Winninghoff) Primitive Equation Atmospheric Prediction Model was repartitioned on the basis of horizontal grid space rather than equation partition considerations. Although the current version of the PEM has been partitioned to take advantage of the four processors of the FNWC two dual-processor CDC 6500 computer systems, the partitioning may be directly extended in the event additional processors are made available. Hence the current version of the PEM is ideally suited for operation on parallel processor computers such as the ILLIAC IV or the CDC 8600. As a consequence of employing the four-processor version of the PEM partitioned on the basis of horizontal domain rather than computational burden considerations, the same 72 hour meteorological forecast products were generated in 80 minutes rather than 135 minut~. In addition, the main core storage requirements of the current model are significantly less than that of the earlier version. This is due, in part, to the introduction of an overlay structure in the current model and, in part, to the performance of computations during the integration overlay on a quarter-field basis. The current PEM has demonstrated a remarkable increase in forecast skill over the previous operational model. It models more of the physical processes better than ever before. But error analyses reveal that the forecasts still deteriorate rapidly in the smaller scales of motion being simulated because of spatial truncation. Spatial truncation can cause undermovement of some small-scale pressure systems by as much as twenty-five percent of the observed displacement. Another significant source of error is the data base itself. In spite of the receipt of over 500 upper-air soundings every twelve hours and 4,000 surface observations every six hours, the data are too sparse over oceans and aloft to correctly specify the initial conditions. 
With the expected proliferation of satellite probes of the atmosphere, this may not only minimize the initialization problem but also justify high-resolution global models for operational forecasting. A hierarchy of models of varying resolution and the associated computational burden that must be overcome have been consldered. 12 The composite computation factor is normalized in terms of the size of the PEM problem being solved today on two CDC 6500 computers. See Figure 9. Recall that the current PEM has the following attributes: five layers, 4,000 grid points per layer, hemispheric, 200 nautical mile mesh (at 60 degrees North), and ten-minute time steps. Figure 9 shows that latitude-longitude grids of increasing resolution (both horizontal and vertical) could lead to problems requiring two orders of magnitude more computations than are currently being done operationally without any serious risk of over-specification (assuming large quantities of satellite soundings) . If one assumes a fifty percent efficiency for a computer of the ILLIAC IV class, it might be possible to obtain about 500 Million Instructions Per Second (MIPS). FNWC's two CDC 6500 Computing Systems generate about 3.2 MIPS in the PEM. Thus, the number-crunching ratio suggests one might be able to tackle weather forecasting problems from 100-200 times the current problem and still get the answers to the users in the same amount of time. On the other hand, timeliness is a consideration. One might, in the interim, while waiting for satellite soundings, choose to calculate . using moderate resolution and get the products dis.seminated in a more timely fashion. The results of these new efforts involving the implementation of the PEM on computers of the ILLIAC IV class will be reported on in a later paper. I I REFERENCES 1 E MORENOFF W BECKETT P G KESEL F J WINNINGHOFF P M WOLFF 4-Way parallel processor partition of an atmospheric primitive-equation prediction model Proceedings of the AFIPS SJCC 1971 2 P G KESEL F J WINNINGHOFF Fleet numerical weather central's four-processor primitive equation model Proceedings of the 6th AWS Technical Exchange Conference US Naval Academy Technical Report 242 197017-42 1 I I I I. 
Partitioning of Navy Atmospheric Primitive Equation Prediction Model 3 J SMAGORINSKY S MANAGE L L HOLLOWAY JR Numerical results from a 9-level general circulation model of the atmosphere Monthly Weather Review Vol 93 No 121965727-768 4 A ARAKAWA Computational design for long term numerical integration of the equations of fluid motion: Two dimensional incompressible flow Journal of Computer Physics Vol11966 119-143 5 A ARAKAWA A KATAYAMA Y MINTZ Numerical simulation of the general circulation of the atmosphere Proceedings of the WMO/IUGG Symposium of NWP Tokyo 1968 6 W E LANGLOIS H C KWOK Description of the Mintz-Arakawa numerical general circulation model UCLA Dept of Meteorology Technical Report No 31969 7 N A PHILLIPS A coordinate system having some special advantages for numerical forecasting Journal of Meteorology Vol 14 1957 405 8 Y KURIHARA Note on finite difference expression for the hydrostatic relau'f)'11 and pressure gradient force Monthly Weather Review Vol 96 No 91968 9 E MORENOFF J B McLEAN Job linkages and program strings Rome Air Development Center Technical Report TR-66-711966 10 E MORENOFF J B McLEAN Inter-program communications program string structures and buffer files Proceedings of the AFIPS SJCC 1967 175-183 11 E MORENOFF The table driven augmented programming environment: A general purpose user-oriented program for extending the capabilities of operating ststems Rome Air Development Center Technical Report TR69-108 1969 12 P G KESEL E MORENOFF The Navy's operational four processor atmospheric prediction model Proceedings of the NASA/ARPA ILLIACIV Symposium Naval Postgraduate School Monterey California 1972 An analysis of optimal control system algorithms* by CAROL N. WALTER X erox Corporation Rochester, New York and GERALD H. COHEN The University of Rochester Rochester, New York scribed by Figure 1 (xo indicates a variable left boundary of increasing thickness). The final computation in all three algorithms is the solution of the .following transformed formulation of the state equation in terms of the two given boundary conditions: INTRODUCTION Currently, there are methods available which were derived in the field of computer science to analyze and evaluate algorithms implemented in computer programs. The subject of this paper will involve a combination of three of these methods l - 3 with a rather rigorous simulation of three invariant imbedding algorithms in a manner strictly slanted toward their usefulness and importance in control system applications. The algorithms used to solve the problems and special solution formulations of the problems are presented I first. Then, the numerical routines which provided the most efficient implementation of the problems in their algorithmic form are explained. And last, the adaptation of the analysis techniques to the problems is shown to aid in understanding the final conclusions drawn. Some of the reasoning used in the selection of prob" lems and the method of comparing the algorithms may or may not be totally applicable to algorithm analysis in other fields. U(x) =1/I11(L, x, xo) U(xo) +Jl(L, x, xo) (1) Vex) =1/I2l(L, x, xo)U(xo)+J2(L, x, xo) (2) Algorithm I (One-Sweep Transformation) integrates the following transmission, reflection and internal source differential matrix equations (3, 6; 4, 5; 7, 8), 'respectively, to provide values for the transformed matrix equations (1) and (2). 
P11(X, xo) =A 11 (x)P n (x, xo) -P12 (X, xo)A 21 (x)P n (x, xo) P 12 (x, xo) =A 11 (x)P12 (X, xo) +A 12 (X) -P12 (X, xo) ·A 2l (X)P 12 (x, xo) -P12 (x, XO)A22(X) P 2l (X, xo) = -P22 (x, xo)A 21 (x)Pn (x, xo) P 22 (X, xo) = -P22 (x, XO)A 2l (X)P12 (X, xo) -P22(X, XO)A22(X) THE ALGORITHMS (4) (5) (6) ill (x, Xo) = Fl(X) + [All (x) -P12 (x, Xo) A21 (x) ] The invariant imbedding algorithms used for this evaluation were derived4- 6 from the fundamental matrix specifically to provide numerical solutions for linear I two-point boundary value problems. The principle of invariant imbedding was applied in the form of certain invariant matrices for solving subproblems imbedded in , xE [xo, L]. The axis nomenclature used for expressing the operations in space in the imbedded area is de- .ill(x, xo) -P12 (X, xo)F2(x) il2(x, xo) (7) = -P22 (x, xo)[F2(x)+A 21 (xhil(x, xo)] (8) Initial Conditions: P(~,~)=LH(~,~)=O 1/I11(L, x, xo) =Pn-l(L, x)Pn(L, xo) (9) (10) 1/121 (L, x, xo) = P 22-1 (x, xo) • [P2l (L, xo) -P2l (X, xo)] * This research was supported in part by the Office of Naval Research under contract number NOOO14-68-0091. Such support does not imply endorsement of the content by the Department of Navy. 407 , I (3) (11) Jl(L, x, xo) =Pn-l(L, x) [ill(L, xo) -ill(L, x) ] (12) J 2(L, x, xo) =P22-1 (X, xo) [il2(L, xo) - i l2(x, xo)] (13) 408 Fall Joint Computer Conference, 1972 The additional equations (16), (17), (18), (19) are necessary to compute algorithm III (One-Sweep Riccati). (L.x.x o ) x L Figure l-"Medium" nomenclature The following four steps required for computing algorithm I are pictorially represented in Figure 2. (1) Integrate equations (3-8) from Xo to x with the initial conditions described by equation (9) applied at Xo, and store P 2l (x, xo), P 22 (X, xo) and H2 (x, Xtl) at each x. (2) Adjoin equations (3), (4), (7) with initial conditions (equation 9) applied at x, and integrate from x to L to obtain Pn(L, x), P I2 (L, x) and H1 (L, x) ¥x. (3) Integrate the complete set of equations from Xo to L to obtain the necessary values for equations (10-13) . (4) Solve equations (1) and (2) ¥x. Thus, the solution for each point is available after all of the sweeps (one sweep for each data point) have been completed. Algorithms II (Two-Sweep Riccati) and III (OneSweep Riccati) integrate the following Riccati differential equations in the process of their solution steps. S21(X, L) =A21(X) +A 22 (X) S2l(x, L) (14) - S2l(x, L)Al1(x) +A12 (xo)1/I2l(L, xo, xo) ] I~=io (16) Initial Conditions: 1/Ill(L, xo, xo) =1. Initial Conditions: 1/I21(L, xo, xo) = S2l(xo, L) Initial Conditions: J 1(L, xo, xo) =0. iJJ2(L,_ x, xo) -_ iJxo - .1, '1'21 (L ,x, Xo-) [F1 (-) Xo +A 12 (Xo)J2(L, Xo, Xo) ] 13;=:&0 (19) Initial Conditions: J 2(L, xo, xo) =H2(xo, L) Equations (16)-(19) hold ¥xo:::;x. Therefore, the steps required (consult Figure 4 for -S21(X, L)A I2 (X)S21(X, L) Initial Conditions: S2l(L, L) =0, where: 1/I2l(L, Xo, Xo) 1;;0=3;0= S2l(XO, L). H2(X, L) =F2 (x) -S21(X, L)F1(x) +A 22 (X)H2 (x, L) (15) -S21(X, L)A I2 (X) H 2(x, L) Initial Conditions: H 2 (L, L) =0, where: J 2(L, xo,xo) lio=3;0=H2(xo, L). Algorithm II implements the following steps: (Consult Figure 3 for the flow diagram.) ( 1) The first sweep, a backward sweep from L-Y.Co, requires that equations (14) and (15) be integrated backward in space to enable equation (2) to be solved for V (xo) . (2) The problem now becomes an initial value problem (see Eq. 2). 
Therefore, the second sweep is a forward integration of the solution differential equations: O(x) and Vex) from xo~L. o .1 .2 .3 .4 .5 .6 .7 .8 .9 t I.C. t I.C. t I.C. t t t t t t t B A A I.C. A I.C. A I.C. I.C. I.C. A A A I.C. A PI! (L,x) = 1 A Pj2(L,x)= 0 { (L,x)= 0 Hi Figure 2-Algorithm I-one-sweep transformation I.C. A 1.0=L t I.C. A :·1:, " Analysis of Optimal Control System Algorithms d 409 ·1 1 1I', ,. 1 PROBLEM IMPLEMENTATION CONSIDERATIONS flJ(x) " Iv(x) 1<'., " Two problems were chosen to provide a worst case and a best case digital simulation of each algorithm. Problem I is the reduced system of ordinary differential equations for a lumped parameter control problem. , \ U(X)] = [ vex) 521 (Xo, Ll U(Xo)=O I.c. =V(Xo) -2 Initial Conditions: u(xo) 1.0=L I.e. [-1 -1] the flow diagram) to compute solutions to algorithm III are: (1) Integrate the Riccati equations (14) and (15) from L backwards to x. At x adjoin equations (16) and (17) (where 1/I21(L, x, X{)=S21(X, L» and equations (18) and (19) (whereJ2 (L,x,xo) = H 2 (x, L». Integrate all six equations backward from xto Xo. (2) Equations (1) and (2) produce an immediate solution for each x sweep for U (x) and V (x) . (3) Continue until the solution ¥xE [xo, L] has been obtained. 1 vex) 1 veL) =0 521(L,L):O Figure 3-Algorithm II-two-sweep Riccati algorithm = [U(X)] This problem does not require the matrix formulation property or the forcing function of the algorithms. Therefore, the computational form of each algorithm for Problem I will be of minimal complexity. Problem II, the worst case application, is a distributed optimal control system in the form of a hyperbolic system of partial differential equations. au(x, t) at I.C. + au(x, t) = -u-v U(X, 0) =h(x) = 1, {u(O,t) =g(t) = 1, (20) ax xE [0, L] } tE [0, t] aV av -+-=-u+iJ at ax (21) VeX, T) =0, xE [0, L] } I.C. .1 .5 t B B ,7 I.C. B B B B ,8 t t I.c. I.C. t I.C. B B B ,9 I.O=L t t I.C. I.C. B A { veL, t) =0, tE [0, T] The method of lines is used to transform the partial differential equations into a set of ordinary differential difference equations. Equations (20) and (21) are discretized in the time variable by the unique substitution of the forward difference approximation for au (x, t) / at and a backward difference for av(x, t)/at. Therefore, the resulting solution equations for algorithmic computation are inhomogeneous and also require the matrix formulation of the algorithms (maximal algorithm implementation) . ' Forward Difference dUi+1(X) -1 dx = ~ [Ui+1(X) +Ui(X)] -Ui+1 (x) - Vi+1 (x) Backward Difference dVi(X) = -1 [Vi+1(X)+Vi(X)] dx At -Ui(X) +Vi(X) Figure 4-Algorithm III-one-sweep Riccati algorithm (22) (23) where i = 0, ... , N -1, N = number of intervals between Xo and L. 410 Fall Joint Computer Conference, 1972 Kutta integration which could be adapted to the matrix formulations necessary in Problem II. The classical fourth-order Runge-Kutta technique7 implemented follows: Yn+l=Yn+%(k 1 +2k2+2k3+k4) ; k 1 =hf(xn, Yn) k2=hf(xn+~h, Yn+~kl) k3=hf(Xn+~h, Yn+~k2) k 4 =hf(xn+h, Yn+k 3). A special in-line Runge-Kutta routine (outline) was developed (refer to Figure 5 for a flow chart of this routine) to avoid the overhead of calling a subroutine and to make optimum use of the following two facts inherent in the integratable equations: (1) The independent variable Xn never appears to the right of the equals sign. 
(2) The coupled property of the invariant imbedding algorithm equations produces functions which are constant within an integration interval of the dependent variable. These functions change at a fixed value of the independent variable .as a function of the interval on which the boundary value problem is specified. A matrix inversion routine was required to implement algorithm I in Problem II. The IBM Scientific Subroutine MINV, which performs a matrix inversion by the Gauss Jordon Method with a Full Maximal Pivoting technique, was chosen for this requirement due to the following facts: Figure 5-Runge-Kutta routine flow chart Problems I and II can be solved for the same data points along the x axis by realizing that when the method of characteristics is used for Problem II, it becomes identical to Problem I. All integration in Problem I is along the x axis; in Problem II these data points exist on the characteristic diagonal line (length = 1) . NUMERICAL ROUTINES The development of the actual computational form of the algorithms for both problems required a rather efficient manner of performing a fourth-order Runge- (1) An accurate matrix inversion technique is more complex than an ordinary in-line routine. (2) The inversion was not required extensively throughout the program for algorithm I. ANALYSIS TECHNIQUES** The three methods of analysis implemented in the problem's computing characteristics were: (1) solvability analysis; (2) local time and storage analysis; (3) efficiency and optimality analysis. ** All computations referred to in this section were made on an IBM 360/65 computer in Fortran G. Analysis of Optimal Control System Algorithms 411 TABLE I-Digital Analysis-Problem I Algorithms Problem I Number of Executable Statements Execution time (sec) Execution Program Space (Bytes) Storage Program Space + Work Space (Bytes) Maximum Absolute Error ME = IElmax I 106 0.09 22,288 26K 0.0000092 0.1 II 68 0.04 21,440 26K 0.0000476 0.1 III 73 1.21 21,600 26K 0.0004870 0.00625 Solvability Analysis The first technique provides the basis for all of the analysis performed. The accuracy of the result of each algorithm is compared to the analytic solution of the respective problem, to determine if the algorithm yields the correct solution. The numerical method of error computation is used to obtain the error for each point in the solution. ERROR = E = Analytic solution value-algorithm value. l The second indication of the computational workability of a problem is its stability. Algorithms I and III transform an unstable set of solution (system) equations to a stable set for their necessary integrations. However, algorithm II uses the original problem equations to obtain the final solution once the initial conditions have been computed. These equations may be unstable. Local time and storage analysis A local analysis2 to compare algorithms is one which investigates the important characteristics of some algorithm under "worst case" and "best case" input conditions. Therefore, using a local analysis, the three algorithms presented are evaluated in terms of execution time and storage for both problems. The criteria for determining the point of comparison for the three algorithms for each problem was chosen as the value of Ax (Runge-Kutta integration increment) where they exhibit the same chosen maximum absolute error (ME) (decimal accuracy) for a certain number of data points. *** ME= 1E lmax A three-place-decimal-accuracy, local analysis for Problem I is shown in Table I. 
*** kth Llx decimal accuracy (significant places) with the analytic solutionl is defined as: lEI:::; 1/2 X 10-k • In Problem II the same approximate (I E Imax. =0.074) one-decimal place of accuracy comparison with the analytic solution was the criteria for comparison because for At < 0.1 (discrete time increment) , the matrices became ill conditioned in Algorithm I. Refer to Figure 6 for the matrix form of the solution equations for Problem II. Conditioning the matrices would only cause further computational complexity and longer execution time ( cost) and add nothing to the comparison besides increased decimal accuracy (refer to Table II for the local analysis results) . Efficiency and optimality The efficiency-optimality analysis is presented in Pager3 in the theoretical terms of a Turing Machine. The theory begins with the basic premise that a function j 0 only for (Xl, ..• , xn) in the domain of f. The space-time measure 'Yp (z) of a Turing Machine 'Y.(z) = ~ L",~", p(x" ' .. , x.)p.(z, X" ••• ,x.) ] (25) where c = number of computations (problems) (arguments) that satisfy equation (24). The function Jl. (z, Xl, . . . , xn ), the space-time measure of a computation of a Turing Machine Z for argument (problem) (Xl, . . . , Xn), is an increasing recursive function of both E (z, Xl, . . . , Xn) and M (z, Xl, ..• , Xn). A pplication of efficiency-optimality For an algorithm (computer program), a practical application of this theory is used. Since there are Analysis of Optimal Control System Algorithms certain input arguments for which the computation will not go to completion, only two problems (arguments) are used for each of the three algorithms (computations) to obtain the space-time measure I'Pl'e(Zi). In this practical usage, the following application of the theory is implemented: (1) Zi represents each algorithm; i= 1,2,3. (2) ri represents the problems (arguments) that are included in the partially recursive alphabet; j=l, c=2. (3) Pti(XI, ... , Xn) for each are assumed to be ~ 1 and equal to each other, (Ptl (Xl, ... , Xn) ~ Pt2(XI, ... , Xn). represents Problem I, the simplified form of an extensively used control problem, a "best case" application of the algorithms. Problem II (r2) represents the complex version of an optimal control problem used quite extensively in the field of chemical engineering, a "worst case" application of the algorithms. This approximation has been made since it would be very time consuming to obtain a statistical calculation of the probability of usage for these problems. Therefore, the effect of the probability in the calculation of I'Pte(Zi) is represented by a constant k j • (4) The rj for each algorithm were written in a very efficient manner (separate programs of the same algorithm for each problem). Therefore, the program space changes for each !:j to more efficiently implement the algorithms. This procedure agrees with Pager's3 definition of two Turing machines having the same behavior if they perform the same sequence of tasks for each argument. ~II ri rl Equation (27) is now expressed In the following generalized form: Optimality of Zi = I'Pr e(Zi) (26) (27) i I' Then, a local comparison of the efficiency-optimality of the three algorithms is performed using the maximum error criteria (ME) (approximately one-decimal-place accuracy). Refer to Tables II and III for the efficiencyoptimality results for both problems. 
413 TABLE III-Efficiency-Optimality Analysis OPTIMALITY (SPACE TIME MEASURE) "(Pte (Zi) EFFICIENCY 8,498,170 0.01176 X 10-5 TWO-SWEEP RICCATI 234,320 0.4267 X 10-5 ONE-SWEEP RICCATI 8,822,630 0.01132 X 10-5 ALGORITHMS ONE-SWEEP TRANSFORMATION RESULTS It is apparent that the problems chosen are solvable with these algorithms since the solutions are stable within the boundary conditions chosen, even though both problems have unstable characteristic roots. The local storage and time analysis proved that algorithm II required the minimum execution storage and time for both problems, due to its minimal amount of manipulation of original data; algorithm III required the maximum storage and algorithm I, the maximum time. Algorithm I was the most accurate when a comparison was made for a certain decimal accuracy in Problem 1. The efficiency-optimality analysis indicates that algorithm II is the most optimum and efficient (smallest optimality measure and largest efficiency measure). Algorithms I and III, respectively, are second and third in optimality and efficiency. A slight anomaly results here because algorithm II should be less accurate than algorithms I or III for a given dX (Runge-Kutta integration increment) since the integration performed is with unstable solution equations. This is true for algorithm I in Problem I and for algorithm III in Problem II. Evidently, the three types of error encountered in these digital solutions, original data error, roundoff error, and truncation error, mask the theoretically proven instability of algorithm II in comparison with algorithms III and 1. This is logically deduced since the manipulations of algorithms III and I are subject to the largest amount of original data-error buildup of the three algorithms. The error buildup for algorithm II would have been substantially larger if the imbedding interval was greater than X= 1. 414 Fall Joint Computer Conference, 1972 Finally, it can be stated that the two types of storagetime analysis yielded consistent results and, therefore, either method would have sufficed. REFERENCES 1 A RALSTON A first course in numerical analysis (New York) McGraw-Hill Book Company 1965 2 D E KNUTH Mathematical analysis of algorithms Stanford University Computer Science Department STAN-CS-71-206 March 1971 3 D PAGER On the efficiency of algorithms Journal of the ACM Vol 17 No 4 October 1970 pp 708-714 4 E D DENMAN G H COHEN One and two sweep methods of solving linear two-point boundary value problems USC Department of Electrical Engineering Technical Report No 70-39 August 1970 5 G H COHEN C N WALTER Hybrid computer solutions of partial differential equations using invariant imbedding techniques Sixth Annual Princeton Conference on Information Sciences and Systems March 1972 6 C N WALTER An analysis of two-point boundary value problem algorithms University of Rochester Department of Electrical Engineering Master's Essay December 1971 7 C F GERALD Applied numerical analysis (Philippines) Addison-Wesley Publishing Company 1970 8 M DAVIS Computability & unsolvability (New York) McGraw-Hill Book Company 1958 Computer simulations of the metropolis by BRITTON HARRIS University of Pennsylvania Philadelphia, Pennsylvania data and diverse data sets are frequently available for years which do not match. The major transportation studies have solved most of these problems (except time series data) on a one-shot basis. 
Second although it is frequently not recognized, transportation studies have taken an essentially behavioral view of transportation demand, although a very naive one. Over the last ten to fifteen years, there has been a growing recognition that a behavioral understanding of the reasons why certain decisions (to travel, to move, to build, etc.) are made and how they are influenced by the environment and by public policy is the key to a useful understanding of the urban organism. Such understandings are being expanded from the simple descriptive level to more subtle and complex views of more diverse and extensive types of behavior. Third, on the basis of these data and a limited behavioral understanding, transportation analysts were able to construct very large-scale simulation models of transportation behavior. These models can predict the use of transportation systems in substantial detail under varying assumptions. In considering the merits of this achievement, one must note the very large size of the systems involved and the fact that these systems have been treated in a fairly holistic fashion. Two special aspects of this whole development deserve slightly more exten'ded discussion. We should have expected, since we are discussing transportation planning, that the developments of the 50's and 60's could have produced a very extensive improvement in plan-making methods themselves. The simulation models which I have referred to under the third point above are essentially models for predicting behavior and the impacts of change and policy on these predictions. Almost nothing in the transportation literature bears on the question of producing a plan. One might have expected that the engineering approach to transportation planning would have generated optimizing techniques based either on the analytical solution of the conditions for an extremum or on search methods defined in some form of mathematical pro- The history of modern computer simulation of urban affairs represents the confluence of a number of trends which came to maturity in the middle of this century. Probably the oldest of these tendencies is the emphasis on planned urban development which has existed for millennia and which in the last century has demonstrated considerable vitality as a reaction to the excesses of the industrial revolution and the poverty and squalor of nineteenth-century cities. A second strand is the development of economic and sociological theory which goes a considerable distance in explaining some aspects of the organization and form of metropolitan settlement and its growth. These theories have a long history, but have matured principally during the 1920's and 1930's. Finally, as a methodological catalyst, the development of the automobile, of a Federal Bureau of Public Roads dedicated to providing facilities for it, and of the largescale metropolitan study based on the origin-and-destination survey have together made possible the crystallization and further growth of simulation methods. These methods are thus proximately based on the engineering attitude and computer technology of the large-scale transportation study, but they are in a position to draw on a number of other important streams of intellectual development. The transportation planning effort as carried out in large metropolitan area studies produced or laid the basis for three major advances in planning methods, all related to simulation. 
First, through the use of originand-destination studies and through the consideration of small-area detail, these studies emphasized the creation of large data banks. The bringing to perfection of such data banks has become a matter of nagging concern in the fields of urban management and urban planning, but for a variety of reasons adequate reservoirs of data have not yet been accumulated. Data is incommensurate as to area definition and activity definition. It is uneven in coverage across political jurisdictions. It lacks important elements such as detailed employment location and accurate descriptions of man-made structures. There are no time serIes 415 416 Fall Joint Computer Conference, 1972 gramming. Actually examples of this are extremely rare, and transportation planning takes the form of the evaluation of a limited number of alternatives which are generated in very conventional ways. The other observation is almost superfluous, having to do with the utilization of computers. The very large masses of data which are available for any city, and particularly as the outcome of a transportation study, moved research rapidly from punched card storage to tape storage and computer manipulation. At the same time, with increasing computer power, the analyses which were conducted became more sophisticated. Finally, the very large simulation models themselves require, for a transportation system alone, computer time on the order of hours rather than tenths of hours. Inevitably the appetite of simulation model designers requires more and more core and frequently more and more computer time. Against this background, let me discuss briefly several different dimensions of variation which apply to the computer simulation of urban growth and change. I assume that properly designed computer simulations can be used in a two-edged way-either as an aid to scientific investigation or as a means of making predictions which will vary under different assumptions about the state of the real world, the growth of technology, and the policies which are pursued by government, corporations, and households. I take the general view that policy manipulations are becoming increasingly disaggregated both as to the means which they employ and the objectives which they pursue. This means essentially that the most useful sets of simulation methods may have to do with a fairly detailed portrayal of the phenomena. I believe that there is room in general for considerable skepticism as to the accuracy of the simulation models which are used for policy explorations. In the extreme case, one may fall back on the alternative view that, even with inaccurate predictions, the use of models helps to define the nature of the problem and the construction of models helps to develop deeper insights into the theoretical and practical issues which are involved. The essential advantages of large-scale computer simulation models lie, first of all, in their extensive bookkeeping and computational capabilities. These aspects may escape direct theoretical· comprehension and hand manipulations. By extension, computer-based models can in principle take into account extensive interactions between different parts of the urban system and can trace the repercussions of events widely over space and time. This capability clearly depends on the ability of the analyst to identify the interactions in the first instance. We may now turn to two principal aspects of the subject matter which are dealt with in these computer simulations. 
The first distinction has to do with the difference be':' tween inter-urban and intra-urban simulations. Interurban and inter-regional simulations are necessary to provide a basis for action in any particular sub-area of a large country. Projections of the probable growth of the Philadelphia metropolitan region or of the State of California, hopefully under various policies, is a necessary background to planning for the metropolis or the state. Such projections are best made in the multiregional context so as to take into account the competition and interactions which occur at the national scale. Single projections, including those proposed by J. H. Forrester in Urban Dynamics, are extremely unreliable because they isolate the entity from its environment. Projections in this class fall into the realm of regional geography, regional science, and classical locational economics. I personally am much more concerned with intra-metropolitan locational. patterns, given the prior determination of levels of growth, composition of industry, and income. It is of course true that certain internal decisions affect these levels of growth, but in my view there is not yet an adequate basis for modeling this feedback. I am concerned therefore in the balance of this paper principally with the interaction between intra-metropolitan policies of all types. and the growth and development of the metropolis within its own generously defined boundaries. A second maj or distinction can be made along a spectrum of phenomena-physical, economic, social and political. As we move along this continuum, phenomena become more and more difficult to simulate because the theoretical models which describe them become less and less quantitative and to an extent more purely descriptive. In the physical realm, for example, and including the physical development of the biosphere, we can simulate fairly well such matters as hydrology, meteorology, and pollution. We can also, as I have suggested, simulate economic behavior such as the use of transportation facilities and choices of residentiallocation. There are difficulties in these predictions which arise not from our lack of understanding of the phenomena, but from the existence of externalities which make certain aspects of projections more dependent on large-scale random events. l\1a,ny of the behaviors of businesses and households in the metropolitan area are at least quasi-economic; their use of public facilities, for example, can be interpreted in the paradigm of economic behavior. Nevertheless, as activities become increasingly social, as in the achievement and employ;. ment of education, skills, and upward mobility, predic- I I I Computer Simulations of the Metropolis I I tion becomes more difficult and less accurate. Similar and stronger remarks can be made about political behavior. Finally, social, political, and racial considerations interact with many otherwise straightforward phenomena. The rise and fall of neighborhoods, the preservation and deterioration of the housing stock, and many other economic or quasi-economic behaviors are in the city immersed in these higher-level social systems. It should be apparent that a thread constantly running through all of these subject matters is the location of activities in space, their competition for sites, and their interaction with other activities both near and far by transportation and communication. Our theories and models of communication are much weaker than our theories and models of transportation. 
It is difficult, however, to see how we can possibly separate the overwhelming majority of the urban phenomena that we wish to simulate from their spatial distribution or from their interactions. Communication and transportation models therefore occupy a central place in the simulation process.

Simulation in the sense in which I discuss it here does not consist at all of Monte Carlo or single-event simulations and, indeed, has very few stochastic properties. Some, and indeed a majority, of existing models assume some sort of probabilistic laws governing human behavior, but such large numbers of individuals are being dealt with that the division of people amongst various modes of behavior is in itself deterministic. In consequence of this type of assumption, the outcome of two successive runs of most of these models with the same inputs would be identical. I personally believe that this is desirable, because large-scale events which might drastically alter the evolution of a metropolitan area should properly be explicitly under the control of the investigator.

It follows from the foregoing discussion that many simulation models could be expressed in analytical form. Owing, however, to their very large size and non-linearity, the solution of the analytical form of the models is usually outrageously difficult. A great deal of the computation involved in simulation is therefore one or another form of iterative solution to large and complex systems.

The probabilistic interpretation of behavior is intimately related to questions of disaggregation and reaggregation, and to the distinction between descriptive and behavioral models. A simple example would be the analysis of the distribution of shopping trips amongst shopping centers. A linear programming solution would assign most individuals to the nearest center, but this is obviously not what takes place. (The sketch following this discussion contrasts these two allocation rules.) The original models dealing with phenomena of this type were descriptive in the sense that they attempted to replicate behavior without paying detailed attention to people's motives and decision-making processes. These matters are now coming under increased scrutiny, with the result that the analyst is faced by a bewildering array of behaviors and attitudes. Quite apparently, while this understanding of behavior may provide in principle a sounder basis for the construction of simulation models, it must be accompanied by the evolution of rules of aggregation which govern the deduction of mass phenomena from individual behavior. In principle, any model, no matter how highly aggregated, should have been derived in any one of a number of possible ways from an understanding of the behavior of decision-making units. This is a profound and complex problem which has only begun to receive adequate attention.

Probably the most interesting, difficult, and subtle of all of the issues involved in modeling revolves around the question of static versus dynamic models. This problem affects the basic structure of models, the mode of simulation, and the types of policy conclusions which can be drawn. Directly and indirectly the issues involved appear in many of the disputes which arise from modeling. The more sophisticated computational, econometric, and mathematical discussions which arise over this issue have a somewhat more naive but very useful counterpart in the planning profession.
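Before taking up that counterpart in the planning profession, the shopping-trip example mentioned above can be made concrete in a few lines. The sketch below is purely illustrative: the centers, distances, trip total, and the distance-decay parameter beta are all hypothetical, and the gravity-type rule stands in for the broader family of descriptive spatial-interaction models referred to in the text.

# Illustrative contrast: nearest-center assignment vs. a gravity-type
# (distance-decay) allocation of shopping trips from one residential zone.
# All numbers are hypothetical.

import math

centers = {            # center -> (attractiveness, distance from the zone in miles)
    "A": (100.0, 1.0),
    "B": (300.0, 2.5),
    "C": (150.0, 4.0),
}
trips_from_zone = 1000
beta = 0.8              # hypothetical distance-decay parameter

# 1. "Linear programming" style: everyone is assigned to the nearest center.
nearest = min(centers, key=lambda c: centers[c][1])
all_or_nothing = {c: (trips_from_zone if c == nearest else 0) for c in centers}

# 2. Gravity-type allocation: trips split in proportion to
#    attractiveness * exp(-beta * distance).
weights = {c: a * math.exp(-beta * d) for c, (a, d) in centers.items()}
total = sum(weights.values())
gravity = {c: trips_from_zone * w / total for c, w in weights.items()}

print("all-or-nothing:", all_or_nothing)
print("gravity split :", {c: round(t) for c, t in gravity.items()})

The deterministic character discussed above is visible even here: although the gravity rule can be read as a statement about individual choice probabilities, the zone-level split it produces is a fixed fraction of the trip total, so two runs with the same inputs give identical results. With that aside, we can return to the planners' version of the static-versus-dynamic debate.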
Quite simply, twenty years ago the planning profession was strongly oriented toward the production of a "comprehensive plan" which envisaged some future state of affairs toward which the efforts of planned development should be directed. This conception has been roundly criticized on many grounds. The definition of a future state apparently left no room for further development beyond that date. The preoccupation with future conditions left present difficulties untended. The future state might indeed be incapable of achievement, either because it was too costly, or because institutional obstacles existed, or because the path to it might be blocked by the behavior of individual decision-makers. In the light of all of these and many other criticisms, the idea of the comprehensive plan has fallen into some disrepute, and much more attention has been given to planning methods which emphasize the path of development, the most immediate measures which will relieve present difficulties, and the modes by which various segments of the population are impacted by and involved in the planning process and the implementation of plans. This second procedure is much more process-oriented and more apparently socially aware. It is also more oriented to the practical problems of consensus and implementation, and hence apparently more realistic.

In my own view, the planning orientations of these two approaches are complementary rather than competitive. The difficulty with the more recent and more dynamic view is that it is not guaranteed to lead to an acceptable or viable future state. It does not provide any means of defining an image of the future toward which the metropolis and its population can aspire. As an experimental vehicle, a dynamic model could be very clumsy, since it is "forward-seeking" rather than "backward-seeking." On the other hand, the comprehensive plan defines some future optimal state, subject to the possible difficulty that "you can't get there from here."

The relationship of these two approaches to formal and mathematical aspects of modeling should be apparent. The comprehensive planning model is entirely compatible with the idea of optimizing through a mathematical program. It turns out that the locational and organizational problems of cities are incredibly difficult to solve in this mode because they pose very large-scale, non-linear, integer programming problems with many local optima. Nevertheless, viewing the problems in this light provides important insights into design methodology. In certain cases, portions of the problem may quite properly be cast in a programming format. This is especially true of the predictive portion of models where market behavior is involved. Generally, however, even here the models are best solved by iterative procedures. The dynamic approach to planning corresponds in an intuitively simple way with dynamic or growth models cast in the form of differential or difference equations. Once again, the typical system of equations would be extremely large, non-linear, and complicated. Models of this type are supposedly represented most clearly by the DYNAMO system of J. W. Forrester, but in fact versions of such models have been used in many types of forecasting for metropolitan areas long prior to Forrester's Urban Dynamics.
Such models have a verisimilitude which makes them very popular with professional planners and decision-makers, and an element of mathematical sophistication which makes them attractive to operations researchers and analysts. As I have sketched above, their operation is somewhat difficult for purposes of policy testing. At the same time, their calibration is particularly difficult from the point of view of data requirements, because at least two points in time are required in fine-area detail. From an econometric point of view, many more data points would be desirable, but this is in general utterly impractical.

There are various points of contact between static optimizing models and dynamic models. One of these is entirely utopian in the present state of the art, but should be borne in mind as a possible future line of development. This would be to use the criterion function of an optimizing model to optimize not over metropolitan arrangements (as in the static case), but over the choice of policies, using the dynamic model as an embedded predictor in the dynamic programming context. Because of the very large number of possible policy combinations, this approach is presently infeasible, but it may have some future value.

More practically, the relation between dynamic and static models may be explored along a different line. The performance of a dynamic model may be regarded as an effort by the system to achieve equilibrium, although with changing outside circumstances and driving functions this target equilibrium will be a moving one. In general, it is almost certainly reasonable to expect for the metropolitan region that some "sensible" equilibrium exists. The alternatives are exponential growth, collapse to an extreme configuration, or continuous oscillations. While none of these is impossible, they are not intuitively attractive. It therefore seems likely that one or more stable equilibria can be defined for most dynamic models. This is intuitively obvious for the Forrester model, given its output, and this equilibrium has been identified and analyzed. If for a dynamic model the equilibrium can be expressed analytically, it can also be explored for sensitivity to changes in parameters and to changes in policies. In principle there is no reason why such a model should not be "run backwards" so that policy variables would be set at a level required to maximize some welfare function. Such a backward-seeking model would be useful but of less general value than the dynamic programming model just described. It is very difficult to achieve because of the size, non-linearity, and possible existence of multiple optima in many large dynamic models.

Viewed in the light of the preceding paragraph, the distinction between static and dynamic models is not as great as might at first appear. An example of the blurring of this distinction is the Lowry model of residential location, which is widely used and which has been developed in different directions by a number of workers. In the first place, while this model makes no direct claim to optimize, its equilibrium properties tend to suggest that some such process is at work at the behavioral level. More complex models containing market behavior and explicit optimization (such as the Herbert-Stevens model of residential location) probably produce results similar to Lowry's.
It follows from these quasi-equilibrium properties that efforts to make the Lowry model dynamic can produce a succession of static equilibria which depend on changes in the over-all conditions which the model must meet. These successive equilibria may or may not preserve part of the previous decisions made in earlier runs of the model. Finally, Lowry himself saw in the successive iterations of the model which were needed for solution purposes a rough analog of the time-phased physical development of the Pittsburgh region. On the basis of such resemblances there is indeed a substantial confusion in lay circles between iterations which are designed to solve the model at a single moment, and iterations over time, which should more properly perhaps be called recursions.

There may be important statistical consequences of the similarity between static and dynamic models. In equilibrium the sub-areas of a metropolitan region would have constant composition, and either constant population or constant rates of growth. This would imply that in each area the internal forces leading to change in different directions would be exactly in balance. For a linear model, therefore, some combination of independent variables would be precisely collinear across all areas. Such a combination of equilibrating forces might be identifiable from cross-sectional rather than time series data. In this case, the principal role of time series data would be quite different from its usual one: it would establish mainly the rate of adjustment to the equilibrium. It is important to note that if a growing organism like the metropolitan area has a set of internal forces of this kind which tend to lead to an equilibrium, then the multiplication of error in a projection tends to be minimized. The model is in a sense self-correcting, to the limits of the accuracy with which the relative importance of the various factors has been estimated. In rapidly growing areas or for slowly moving locators, the gap between equilibrium and the observed situation may be quite large, and the errors may be not only substantial but biased. All of these remarks affect principally operational considerations and do not deny the basic relationship between static and dynamic models.

Mathematical neatness suggests that all variables in a model be treated symmetrically, so that all equations for every locator look very much alike. If this kind of treatment is possible, a monolithic model may be developed which makes all types of projections at once. This is the case with the Forrester model of Urban Dynamics, the EMPIRIC model of the development of the Boston metropolitan area, and, to an extent, the Lowry model. The difficulty with such monolithic concepts comes from a number of sources. First, for very large numbers of classes of locators or very fine small-area detail, the size of a unified model becomes outrageous, especially since the number of interactions tends to rise with the square of the number of variables. Second and more important, various activities may have different modes of development which suggest the desirability of substantially different models linked together in some reasonable fashion. A fair amount of experimentation has already been done with such linkages, and their character is clear on both practical and theoretical grounds.
Essentially what may be expected to happen is that in studying any particular sub-system or large coordinated group of locators in the metropolitan region, the results of the activities of other locators are taken to be a part of the environment for the sub-system under study. At a later point in the process, the results of the activities of this locator become parts of the environment for other locators and influence their behavior. These interactions may be worked out by iteration at each single point in time, or they may be lagged and carried through successive steps in the recursion. A major advantage of this type of subdivision of the problem is that highly diverse locator behaviors can be dealt with properly by distinctive models. It appears, for example, that a large part of retail trade location responds very rapidly to market conditions and is well represented by an equilibrium model. Residential locational choice is better represented by a model in which only a certain number of movers seek equilibrium, and where this relocational behavior of a small proportion of the population represents a lagged dynamic element. Industrial location and the location of certain centralized services like banking are much slower to relocate than are households and require a still different model. One might expand this list very considerably, showing how public services of various types, household formation and dissolution, and various social phenomena each require their own type of model, and how these models may be operated in sequence and linked through a computerized data base which simulates the environment for all of them (a schematic sketch of such a linked recursion is given below). Such a conception of modeling is flexible and easily amended. It is simple to define in principle and somewhat tedious to develop in practice or describe in detail. A coordinated model system in which a variety of models interact with each other does not prejudge the issue of whether the total model will be dynamic or static. Such a group of models can be iterated to equilibrium and thus solved as a total system. Alternatively, the inclusion of a single dynamic model dynamizes the entire system.

I have not developed in any detail the concept and methodology of planning model design, since this is substantially less mature than simulation. I define a planning model as a model which produces a plan, as in the case of a mathematical programming method, or which greatly assists a planner in producing a plan. Planning models are difficult to manage because of the large combinatorial searches which are involved, and it seems likely that this activity will best be left to a designer or decision-maker intervening in an interactive computerized system. This computerized system will have to have embedded in it simulation and evaluation models which can predict the results of the designer's efforts, but owing to the nature of the interactive process and the extent of the searches which are required, it seems likely that such simulations will have to be greatly condensed and simplified. In my view, we are perhaps in danger of proceeding too rapidly with interactive processes, without exploring the implications of the simplified simulations which they use.

The foregoing review has attempted to highlight some of the principal issues which surround the design of simulation models of urban metropolitan areas. These models currently exist in rather sophisticated forms and make heavy demands upon model design capabilities and upon computer power.
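The following is a minimal sketch of the linked recursion described above. It is illustrative only: the three locator models, their adjustment rates, the exogenous constants, and the contents of the shared data base are hypothetical stand-ins for the much richer sub-models discussed in the text.

# Schematic linked-model recursion: fast-adjusting retail, a lagged
# residential mover model, and slowly relocating industry, all reading
# and writing a shared "data base" that represents the environment.
# Every rule and constant below is hypothetical.

database = {"households": 100_000.0, "retail_floorspace": 500_000.0,
            "industrial_jobs": 40_000.0}

def retail_model(db):
    # Retail responds very rapidly: it jumps most of the way to the
    # floorspace warranted by the current household total.
    target = 5.0 * db["households"]
    db["retail_floorspace"] += 0.9 * (target - db["retail_floorspace"])

def residential_model(db):
    # Only a fraction of households relocates (or forms) each period,
    # moving toward a level supported by jobs plus exogenous in-migration.
    target = 2.5 * db["industrial_jobs"] + 20_000
    db["households"] += 0.15 * (target - db["households"])

def industrial_model(db):
    # Industry and centralized services adjust most slowly of all.
    target = 0.3 * db["households"] + 10_000
    db["industrial_jobs"] += 0.05 * (target - db["industrial_jobs"])

for year in range(1, 21):                      # a 20-period recursion
    for submodel in (industrial_model, residential_model, retail_model):
        submodel(database)                     # each reads the others' output
    print(year, {k: round(v) for k, v in database.items()})

With these particular constants the recursion settles gradually toward a stable equilibrium, echoing the earlier discussion of dynamic models as equilibrium-seeking systems. Even at this toy scale the bookkeeping burden is visible: each sub-model reads the state left by the others, and with fine small-area detail the shared data base and the per-period work grow rapidly, which is why such systems make heavy demands on storage and computer time.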
Indeed it is altogether conceivable that the development of methods in this field will result in a substantial reduction of these demands at comparable levels of performance. The principal issues which I have discussed and which are subject to further research and investigation may be listed as follows:

1. The extension of substantive investigations into social and political spheres.
2. The investigation of elementary behavioral patterns, coupled with an appropriate understanding both of disaggregation and of rules for aggregation or reaggregation.
3. An expanded understanding of the different structures, performance characteristics, and uses of static and dynamic models.
4. The development of systems of linked models.
5. The development of planning models and interactive planning methods, together with the appropriately subordinate use of simulation as a part of these methods.

APPENDIX

Selected readings in urban simulation

Alonso, William. Location and Land Use: Toward a General Theory of Land Rent. Cambridge: Harvard University Press, 1964.
Berry, Brian J. L. Commercial Structure and Commercial Blight. Department of Geography Research Paper No. 85. Chicago: University of Chicago, 1963.
Chapin, F. Stuart, Thomas G. Donnelly, and Shirley F. Weiss. A Probabilistic Model for Residential Growth. Chapel Hill: University of North Carolina, Institute for Research in Social Science, in co-operation with U.S. Department of Commerce, Bureau of Public Roads, May 1964.
Chapin, F. Stuart, and Shirley F. Weiss. Factors Influencing Land Development. Chapel Hill: University of North Carolina, Institute for Research in Social Science, in co-operation with U.S. Department of Commerce, Bureau of Public Roads, August 1962.
---. Some Input Refinements for a Residential Model. Chapel Hill: University of North Carolina, Institute for Research in Social Science, in co-operation with U.S. Department of Commerce, Bureau of Public Roads, July 1965.
Forrester, Jay W. Urban Dynamics. Cambridge: M.I.T. Press, 1969.
Harris, Britton. "The Uses of Theory in the Simulation of Urban Phenomena," Journal of the American Institute of Planners, Vol. 32, September 1966.
---. Land Use Forecasting Concepts. Highway Research Record No. 26. Washington: National Academy of Sciences-National Research Council, Highway Research Board, 1966.
---. "Some Problems in the Theory of Intra-Urban Location," Operations Research, Vol. 9, September-October 1961.
---. "A Model of Locational Equilibrium for Retail Trade." Paper presented at a Seminar on Models of Land Use Development, Institute for Urban Studies, University of Pennsylvania, October 1964. Mimeo.
---. "Inventing the Future Metropolis." Paper prepared for the Catherine Bauer Wurster Memorial Public Lecture Series, sponsored by the Harvard Graduate School of Design and the Massachusetts Institute of Technology, May 1966. Mimeo.
---. "The City of the Future: The Problem of Optimal Design." Paper presented at the 13th Annual Meeting, Regional Science Association, St. Louis, Mo., November 1966. Mimeo.
Herbert, John, and Benjamin H. Stevens. "A Model for the Distribution of Residential Activities in Urban Areas," Journal of Regional Science, Vol. 2, No. 2, 1960.
Journal of the American Institute of Planners. Special issues: Urban Development Models: New Tools for Planning, Vol. 31, May 1965; Land Use and Traffic Models, Vol. 25, May 1959.
Lowry, Ira S. A Model of Metropolis. Memorandum RM-4035-RC. Santa Monica: The RAND Corporation, August 1964.
---. Seven Models of Urban Development: A Structural Comparison. P-3673. Santa Monica: The RAND Corporation, September 1967.
Muth, Richard F. "The Spatial Structure of the Housing Market," Papers and Proceedings of the Regional Science Association, Vol. 7, 1961.
Orcutt, Guy, John Korbel, Alice M. Rivlin, and Martin Greenberger. A Microanalysis of Socio-Economic Systems: A Simulation Study. New York: Harper, 1961.
Seidman, David R. A Linear Interaction Model for Manufacturing Location. Penn-Jersey Transportation Study. Philadelphia: Delaware Valley Regional Planning Commission, 1964.
Wingo, Lowdon, Jr. Transportation and Urban Land. Washington: Resources for the Future, Inc., 1961.

The protection of privacy and security in criminal offender record information systems

by STANLEY ROTHMAN
Consultant
Manhattan Beach, California

INTRODUCTION

In this paper we will single out those aspects of the problem of protecting privacy and security in information systems that are special to law enforcement.

FEDERAL-STATE RELATIONSHIP

The National Crime Information Center, which extends from the FBI to the state, county, and city level, has been expanded to contain and exchange criminal histories. The rules under which state and local governments participate in this system are under debate, a debate that may extend to a law suit by the State of Colorado against the FBI. The substance of the conflict is the ruling that any computer participating in this on-line exchange of criminal histories must either be dedicated to law enforcement or be under the management control of law enforcement. The significance of this is as follows:

a. Neither the FBI nor the Federal Government controls local law enforcement.
b. There are at least a dozen states that can only afford a shared service bureau installation.
c. Management control of non-enforcement records, such as welfare or health, by law enforcement will cause another debate, a very loaded one.

Other technical requirements are dedicated communications lines and non-dial-up terminals.

LAW ENFORCEMENT

Law enforcement is the principal participant in this system to date. Thus, the system operates twenty-four hours per day, seven days per week. Eventually the courts, prosecutors' offices, probation, parole, prisons, and the entire criminal justice system will participate. The information system competence of law enforcement is highly variable; agencies are in general too dependent on the equipment manufacturers. Their information systems have to serve many purposes other than enforcement, such as credit, military clearance, and licensing. They have some experience in handling "need-to-know" restrictions for vice records, but the whole idea of restricting access to arrest records that do not carry convictions will take some getting used to. Law enforcement must manage personnel within Civil Service regulations. Thus, screening out people with a criminal record or with criminals in the family, or firing an employee for violating administrative security regulations, is difficult. Similarly, the use of the polygraph as a control is subject to fifty different sets of state laws. A large number of law enforcement installations still operate manual files, and these must be protected, perhaps even more stringently than automated ones, because an installation may have a terminal that receives criminal history information even if it does not have a computer that is linked to the network.

PROJECT SEARCH

The Law Enforcement Assistance Act has for some time, thru Project SEARCH, sponsored development of technology, a model state act, and administrative regulations for the protection of privacy and security in this exchange of criminal histories. However, this work is strictly advisory. All fifty states now participate in the work, but there is no guarantee of either unanimity or state legislative approval of the results.

One further requirement that is not unique but is important is the facility for research in criminal records. Longitudinal studies have to be done of criminals in their progress thru the criminal justice system. These studies require added protections because of their potential for violating the privacy of the criminal.

THE THREAT

Within this context of ambiguity and good intentions there are threats to law enforcement information systems that are very real and very specific. They are:

a. The anarchist who wants to disrupt, destroy or embarrass the system;
b. The criminal who would like to remove a file or query the file of another criminal;
c. The private detective, bank officer, newspaper reporter, or employer who would like to check for a criminal history; and
d. Civil disorder.

Access can be gained either with some difficulty from outside the system or thru misuse of people with legitimate access. All of these threats have taken place at one time or another. The most common threat is the bribery of system employees and police officers. The technical threat has been documented elsewhere and is little different from the technical threat to any computer-communications system. One of the differences is the extent to which it is worthwhile to protect against wire-tapping and electromagnetic radiation. Until an organized crime intelligence exchange is automated, this added expense is not justified. This is not so much a judgment on the cost of the protections as it is an estimate of the small value of most of this traffic. With the exception of juvenile records, most of the information in criminal records is a matter of public record.

It is uniformly agreed that errors in these records should be corrected. However, since these records are widely disseminated, dissemination records must be maintained to direct the distribution of the error corrections. This by itself is a considerable task. An unusual requirement is that under some circumstances all evidence that a criminal record existed must be removed.

CONCLUSION

There are several things that can be said about the solution to these problems. First of all, the achievement of a commercially available secure operating system is vital to resolving the debate about the relative security of shared versus dedicated installations. I suggest that the computer manufacturers pay attention to the requirement for a secure operating system. There is every evidence that these federally sponsored non-military agencies will unite at the Federal level to produce binding procurement specifications that could be influential.

While many manufacturers have been working on absolute identification of terminal users thru voice, fingerprint or handwriting recognition, I would like to underscore the importance of success here. It is the key to the control of one-man remote terminals.

A problem area that has so many requirements, purposes, and decision makers, and an incomplete technology (the technology of computer protection), ends up with procedural protections.
These are inherently weak because they depend upon human diligence. It is for this reason that the management control of shared installations that contain criminal records requires the added protection of the discipline that is traditional with the police.

Lastly, while law enforcement has been particularly farsighted in working on this problem well in advance of an uproar like that created by the proposal for a National Data Center, the achievement of a secure, nation-wide criminal history exchange that protects privacy could well take a good deal more time and trouble. In part this is because of the absence of concrete cost trade-off studies that tell us how much reduction in risk our security measures buy. However, even more important is the fact that such an exchange requires a uniformity of state laws governing the use of criminal histories. Such uniformity will be difficult to achieve.

Security of information processing: Implications from social research*

by ROBERT F. BORUCH
Northwestern University
Evanston, Illinois

INTRODUCTION

Many social research programs are characterized by a stringent requirement that identifiable data collected on the subjects of research be kept confidential. This requirement, coupled with the increasing number of sensitive, sometimes controversial research efforts, has stimulated social scientists' interest in legal, administrative, and technical methods for assuring that confidentiality is maintained. We concern ourselves primarily with the technical methods in this paper, treating "security" as a partial operationalization of the notion of confidentiality.** Specifically, we should like to sketch those problems met in social research which are relevant to security-oriented activities in information processing. In the following remarks, some of the distinctive features and needs of large-scale social research are outlined. Then the research design, data collection, maintenance, and dissemination stages of the research system are examined to discover how the interests of social research and those of security-oriented information processing might intersect vis-a-vis the problem of assuring confidentiality.

THE CHARACTER OF SOME SOCIAL RESEARCH PROGRAMS

Maintaining confidentiality and security of data are likely to be important objectives in a variety of social research efforts. In this section, examples of these are furnished and the factors which appear to be important in distinguishing research archives from other kinds of information systems are described briefly.

Focuses of the research

In order to establish a manageable topic area, suppose we restrict attention to large-scale social research which results in a computerized information system containing data on identified research subjects. Some form of identifiers (e.g., names and addresses) is essential when individual subjects must be tracked over time to investigate biological and social development, to appraise the cumulative impact of drug or alcohol abuse, etc. These so-called "longitudinal studies" are frequently conducted, and although many are quite small, some involve repeated in-depth measures on over 100,000 individuals over a 10 or 20 year period. The research topics which can be expected to generate some concern about privacy, confidentiality, and security cut across all the social sciences. In political science, for example, whether an individual voted or not is frequently a provocative topic for inquiry, and a negative response usually constitutes "sensitive" information.
Human factors psychologists, often involved in accident research, focus on seat-belt wearing behavior; in some highway surveys, spot checks are made of drivers' alcohol use. Each type of information may have a stigmatizing character. Epidemiologists, of course, frequently need to acquire longitudinal data on the incidence and spread of venereal disease, on illegal abortion, and on other socio-biological deviations from the norm. Social psychology, traditionally concerned with relatively innocuous laboratory experiments, has become associated with research on white-collar crime, on mob violence, and on helping behavior in critical situations (e.g., bystander apathy to a street-corner mugging). Large-scale research in economics and law has, in recent years, accumulated much longitudinal data on individuals' spending behavior, deviations between actual and reported taxable income, and other sensitive topics. (For references to work in each area, see Reference 1.)

* Work on this topic has been supported by NSF Grants GS32073X and GI29843. Naturally, the views expressed here do not necessarily reflect those of the sponsoring agency. Some of the observations made in this paper are an extension of earlier research reports, notably (1).
** Confidentiality here refers to the status of information, a condition under which access is formally restricted to certain agencies or individuals. Security refers to the administrative, technical, and legal devices used to assure that the formal restrictions are met; i.e., security is an operational definition of the concept of confidentiality.

In the past, confidentiality has not been so crucial and generalized a concern, because the size of the research efforts had been small and the visibility of the studies low. Perhaps more importantly, the academic orientations seemed to have been associated with relatively innocuous data on anonymous individuals or on subjects tracked over very short time intervals. During the past five years, the size and visibility of social research projects such as those described above have increased dramatically, particularly in the policy research and evaluation areas.2,3 As in commercial data collection activities, accidental disclosure and deliberate penetration of research files can have serious consequences: research subjects may be embarrassed or harassed, and the research programs would undoubtedly suffer. Although the empirical risks here are sometimes no better documented or appraised than risks in commercial data collection enterprises seem to be, the issue is serious enough for both Federal and private grant agencies to develop guidelines on collection and maintenance of identifiable data on individuals (see references in Reference 2). The social researcher's interest in establishing the security of information stems from increased visibility of research, from these formal legal requirements, and from the ethics and the realities of research. Our ability to collect data will suffer considerably unless we conscientiously and conscionably recognize the need for security.

Archival data: Functional distinctions relevant to security

How might we describe the functional character of social research data archives and those features which appear to be important for the sake of security?
As a first approximation, we might consider a rough continuum of computerized data banks which contain personal records, defining the continuum such that one end represents an auditing function and the other represents a research function. Personnel records and intelligence systems typify the first extreme, where each identifiable record serves as a basis for making evaluative judgments about the individual on whom the record is kept, and for taking direct and personal action which affects the individual. The research-oriented systems generally serve not as a vehicle for decision and action about an individual, but for appraising the group's condition with respect to some social theory or with respect to the effectiveness of a program with which the group is involved. The American Council on Education's Higher Education Data Bank4 and Project Talent2 exemplify this activity. Each collects identifiable data on thousands of students annually. Most of the data are innocuous by any standards, but some pertain to campus protest activity, alcohol use, or other sensitive behavior. Identifiers serve as an accounting device, and the data are not meant for use as a basis for evaluative decisions about individuals.

The functional distinction between audit and research has some rather important implications for minimizing the likelihood of disclosure, or the utility of data should the data be deliberately appropriated for nonresearch purposes. Identifiers, even if collected, do not need to be as accessible as statistical data for research purposes. Special strategies for separate handling of identifiers and statistical records can be developed and have been used to minimize risk of disclosure (see Intrasystem Linkages, below). Statistical records in audit systems usually must be quite accurate, but in the social research systems, imperfections generated by the method of data collection are recognized and estimated, not for the individual, but for the group as a whole. In fact, to undermine the utility of individual records without seriously jeopardizing the integrity of the total data, random error whose parameters are known can be inoculated into the data.* This strategy, evidently inappropriate for commercial record systems, seems to hold some promise in research concerning topics such as use of contraceptives and illegal abortion.5

* Some research designs can be set up such that each respondent injects his response with random error in a manner prescribed by the researcher. For example, in a question requiring a yes-no response, the researcher might instruct the subject to roll a die and to lie if a "1" shows and to tell the truth otherwise. The known likelihood of false positive and negative responses in the paradigm can be used to obtain unbiased estimates of parameters in data analysis. The presence of randomized error in the record system would presumably reduce embarrassment, and threats of unauthorized or legal disclosure, since individual records cannot be used for unambiguous judgments about individuals on whom records are kept.

Linkage problems also differ a bit depending on function. In the research systems, one often wishes to merge identifiable data collected by different agencies. Unlike merges in many audit systems, the separate agencies each may have their own rules and practices regarding disclosure of individual records, but they may be willing to share data if rules about confidentiality are not compromised.
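Returning briefly to the error-inoculation footnote above: the die-rolling rule is a randomized-response design, and the way unbiased group estimates are recovered from deliberately randomized answers can be sketched in a few lines. The truth-telling probability follows the footnote's one-die example, while the sample size and true prevalence below are hypothetical.

# Randomized response, as in the footnote: each respondent lies if the die
# shows 1 (probability 1/6) and tells the truth otherwise (probability 5/6).
# The analyst sees only the randomized answers, yet can estimate the group
# proportion. All numbers here are hypothetical.

import random

P_TRUTH = 5.0 / 6.0          # probability of answering truthfully
TRUE_PREVALENCE = 0.20       # hypothetical true proportion of "yes"
N = 10_000                   # hypothetical sample size

random.seed(1)
reported_yes = 0
for _ in range(N):
    truth = random.random() < TRUE_PREVALENCE
    tells_truth = random.random() < P_TRUTH
    answer = truth if tells_truth else not truth
    reported_yes += answer

lam = reported_yes / N                     # observed proportion of "yes"
# E[lam] = pi * P_TRUTH + (1 - pi) * (1 - P_TRUTH), so invert:
pi_hat = (lam - (1 - P_TRUTH)) / (2 * P_TRUTH - 1)
print(f"observed yes-rate {lam:.3f}, estimated prevalence {pi_hat:.3f}")

Because no individual answer can be taken at face value, a record obtained under this scheme is of little use for judging the person who supplied it, which is precisely the protective property the footnote points to.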
Faced with such inter-agency disclosure rules, the researcher must devise special strategies to link data without breaching them and without compromising the promises of confidentiality made to individuals on whom records are kept. Specialized methods have been developed (see the remarks below on intersystem linkage), but more work needs to be done.

The legal status of information in social research also differs from that of data in the audit system. In some states, socio-medical research records and some educational and psychological records are protected from even legal interrogation by a testimonial privilege. More often, however, they are not so protected, and some mechanisms have been devised to undermine the data's legal utility or to minimize its legal accessibility. The inoculation of random errors probably meets the first objective; specialized forms of data linkage and maintenance (Intrasystem Linkages, below) help to meet the second. These legal differences are related to security needs in general, and since processing is typically conducted with computing machinery, some particular features of information processing technology may also be relevant here.

Each of these differences implies some of the specialized needs of the research data archive in contrast to the audit information system. In the next sections of this paper, the collection, processing, and maintenance stages of the research system are described in a bit more detail and linked to methods for assuring security of data.

DATA COLLECTION

In the simplest case, data are elicited by the researcher and an individual's response is transmitted back to the researcher through various intermediary groups. The intermediaries often include local administrators, staff members of scanning and mark-sense processing units, and keypunch operators, as well as the researcher's personal representatives. For the sake of security, many social researchers are attempting to reduce the possibility of disclosure to intermediaries, particularly by reducing the number of intermediate stages between the eliciting of information and the provision of a response.

Questionnaire surveys

In order to eliminate the possibility of disclosure during survey administration, some plans require the respondents to put the completed questionnaires into locked and addressed boxes which are sent directly to the data processor. In some cases, representatives of various interested and disinterested groups can and do monitor the collecting, packaging, and mailing of completed questionnaires. Even more simply, questionnaires or interview documents have been designed so that one section, containing identification and a code number, can be detached from the other, containing responses and an identical code number. Either the respondent at the site of the survey or the researcher at later stages of the survey process can actually separate the two components of information. The identifying information can then be held by the respondent or by a monitoring agency (e.g., a group of respondents or representatives of the host agency) and submitted to the researcher after the statistical information is compiled. The code numbers permit later linkage of statistical data with information collected later in the research process. Rather than require individuals to respond directly on a questionnaire, some researchers are making more use of perforated, but otherwise standard, EAM cards as a vehicle for recording data. By requiring that the respondent merely punch his responses out on the card and return it by mail, any intermediate handling of identified records is reduced. And we can couple this strategy with the use of nominal or numeric aliases to further enhance security. The principal problems with this approach seem to be subjects' reactions to the cards and the limitations of the card format on permissible response options. Human engineering studies would probably help to ameliorate some of these problems.

Remote terminals

One idea which seems to have some merit involves the use of remote terminals as a kind of voting booth for repeated surveys of certain groups of individuals. That is, rather than have respondents furnish data via questionnaire or telephone, we might require that they do so through "social reporting units" in which opinions and self-descriptions can be input directly to storage by an individual. Remote input devices might be particularly useful in organizational settings where continuous monitoring of individuals' attitudes, activities, expectations, and status is essential for research on the effects of policy changes or of organizational innovations. The voting booth or other remote input methods might, for example, be applied usefully to public housing appraisal, where good data on residents' status are essential to economic studies.6,7 Other applications may include welfare recipients' reporting, transportation depot surveys, or surveys of any well-defined group (e.g., hospital, military, prison or student groups) whose members can provide useful input data to the social research reservoir. In many such reporting systems, a guarantee of anonymity is necessary for honest and continued reporting; however, tracking the development of individuals is also a frequent requirement. These two needs suggest the creation of systems in which the technically unsophisticated respondent can make inputs easily and without being jeopardized by the opinion or factual information he offers. The numeric alias or password systems already developed appear to be relevant here. Some are persuasively secure, e.g., permitting the respondent to form his own transform of a random number or identifier supplied by the computer. The human factors problems in getting people to use and to adhere to their personal, private transforms will probably outweigh the technical problems in implementing such a system, but these do not seem to be intractable.
At present, insuring that such a prescription is adequate can be difficult because legal precedents and specification of negligence and liability in a document or data-processing environment have not been fully established. The current explorations of these legal problems may clarify the situation (see references in References 2 and 11). MAINTENANCE AND DATA LINKAGE DOCUMENT PROCESSING Anonymous reporting, responding under alias identifiers, and using specially constructed questionnaires (or having respondents inoculate their response with random error), usually minimize if not eliminate the likelihood of unauthorized disclosure at later stages in the research system, including document processing. But these strategies may be inappropriate or too expensive for particular kinds of research. Very large and very expensive field experiments, for example, are an important means of evaluating economic and other governmental programs; intensive and long term longitudinal surveys of small samples contribute to our knowledge of human development. 3 ,8,9,lo Both kinds of studies typically require exhaustive cross-checking capabilities, very complex merge operations, and other activities which appear to justify the joint processing of statistical and identifying information. The use of aliases in these cases may be completely inappropriate and the use of specially constructed documents may make cross-checking the validity or completeness of response very expensive. In these circumstances, the social researcher usually meets several problems. For one thing, document processing agencies often have neither written policies nor When identifying information must be collected with data, the device most frequently used by social researchers for minimizing accidental disclosure or deliberate interrogation of identifiable records is physical separation of identifiers and statistical data. Each separated file usually contains code numbers which permit later merging operations and the identifier file is often kept in vault storage. A few social research agencies have applied some of the Department of Defense administrative and mechanical requirements for security, and the agencies often require computer service groups with which they deal to use the same regulations where feasible. 4 ,l1 More elaborate schemes for minimizing the likelihood of disclosure have been developed and are being used. Many of these strategies can be divided into three groups depending on the purpose of maintaining identifiers: schemes for intrasystem linkages, for intersystem linkages, and for combined audit-research systems. I ntrasystem linkages Intrasystem linkages refer to a single agency's collecting and merging data on the same sample of individuals over an extended time period. In longitudinal Security of Information Processing I ' I, studies of students' political activism, for example, data are frequently collected in identified form. It is reasonable to expect that nonresearchers may be interested in examining identifiable data. The researcher with no legal testimonial privilege (i.e., without the ability to resist subpeona), would normally like to minimize or eliminate the possibility of disclosing sensitive data to even legal authorities when he has promised confidentiality to his respondents. 
An interesting operational resolution of this problem is the American Council on Education's LINK FILE SYSTEM.4,l1 The strategy was developed to assure the confidentiality of longitudinal data on college students, data which includes limited but identifiable information on disruptive campus protest activities. It works in the following way. Mter identified questionnaires are returned by students, the researchers split the information into two segments. The first contains statistical data with one of arbitrary numerical codes attached to each record; the second contains students' names and addresses linked to a second set of code numbers. A third file matches the first and second set of numerical identifiers (aliases). This code linkage is kept in a foreign country with an agency contracted to maintain the linkage fo~ later data merges; the, agency is also required by the contract not to return the linkage to the researchers under any circumstances. In followup studies, the researcher's name and address file is used to distribute questionnaires. The associated numerical aliases are substituted for names during document processing and this file is then shipped to the contract agency. The agency replaces the numerical identifiers in this file with the first set of identifiers, using its code dictionary. Then, this follow-up file is returned to the research agency for merging follow-up data with the original data, using the numerical identifiers common to both files (i.e., the first set of arbitrary numerical identifiers). The system is certainly flawed in that it can be undermined in some cases by the research staff, by the agency holding the code linkage file, and by legal agencies with international ties (see the Hoffman and Turn critiques described in Reference 11). But it is a useful prototype which may help us learn a bit more about how to design and implement a system which will assist in protecting social research data. It does provide a concrete target for the check list strategy given by Peterson and Turn,12 to determine susceptibility of the data to legal interrogation and corruptibility of the system by its creators as well as by outside agencies. The difficulty of using encoded identifiers (arbitrary identification numbers) in physical protection for files and of protecting against indirect disclosure overlap considerably with problems next. * III 429 intersystem linkages which we consider Intersystem linkages Intersystem linkage refers to the researcher's merging his own identifiable research records with records maintained under other auspices. As an example, consider the (true) example of an economist who obtains data on spending behavior and wishes to correlate these with items from income tax returns. The linkage of both sets of records raises difficulties of two kinds. On one hand, the researcher's provision of identifiable information to the IRS for merge purposes may violate his promise to his respondents assuring the confidentiality of the data. On the other hand" the researcher cannot obtain identifiable data from the IRS for merging because IRS regulations generally prohibit such disclosure. The so-called insulated linkage process for merging data is an illustrative resolution of these two problems. To link the files, the researcher first cryptographically encodes all statistical data in his own records. He then supplies the joint records (encoded statistical data coupled with identifiers), to the other archival agency. 
The latter then merges its own files with the researcher's file, basing the merge on the identifiers appearing in both files. When the merge is complete, identifiers are deleted and the resultant file, consisting of unidentified statistical records from both agencies, is then returned to the researcher. This system has been used in actual merge operations' with some success and is one of a general class of strategies for linking data under security restrictions.ll Again, a linkage strategy of this sort can sometimes be rather vulnerable, and additional mechanisms must be invented to minimize deliberate efforts to interrogate identifiable data in either file. To corrupt the system, the researcher could, for example, encode a duplicate set of identifiers in his file, allowing identifiers to masquerade as statistical data. Presumably this strategy can be' rendered useless by having the archival agency not only merge the data but also summarize it. The provision of summary data then may permit only indirect disclosure efforts by the researcher. But if the researcher uses a very simple encryption scheme, such as systematically substituting one character for another in the records, the archival agency may be able to penetrate * Indirect disclosure involves using a twenty questions strategy to deduce new information about an identified individual when the interrogator has statistical as well as identifiable records on the individual. 430 Fall Joint Computer Conference, 1972 the substitution scheme and in fact examine the researcher's identifiable records. In both the intrasystem and intersystem linkages, social researchers need more guidance on appraising the vulnerability of the strategies. Aside from making more thorough appraisals as outlined, for example by Peterson and Turn,12 we should obtain better insights into more systematic ways of detecting and inhibiting the likelihood of indirect disclosure, and the utility of cryptographic encoding in these applications. Combined audit-research systems Some organizations, governmental ones especially, have both audit and research missions and the information maintained in their computers reflects this dual objective. Trust may be a reasonable basis for assuring that researchers will not improperly explore identifiable administrative records or that nonresearchers will not interrogate identifiable research records. More formal restrictions on access and disclosure may be warranted, however, particularly where the data vary considerably in sensitivity, and administrative or personnel monitoring 'procedures are difficult to implement. Some of the researcher's needs here can be characterized as having two dimensions. On one hand, he has some need for a flexible, hierarchical system of protection for his own data which can be tailored to the pyramidal nature of its sensitivity. Innocuous and public data might then be kept secure with the cheapest form of protection possible, e.g., existing administrative checks on personnel and the physical plant. More sensi- . tive information such as sources of income, psychiatric and hospitalization records, personal habits and beliefs would justify more secure (and presumably more expensive) mechanisms including those reviewed by Hoffman,14 say, in his state-of-the-art survey. Some flexibility is essential if the researcher is to keep pace with both changing public opinions regarding the sensitivity of stored information and the changing substance of research. 
These requirements may be met with the development of hardware modules or micro-coded instructional sets which the researcher himself can use as building blocks for made-to-order protection of data with different levels of sensitivity. On the. other hand, the audit portion of a combined audit-research system may warrant authority hierarchies for access to data which are geared to administrative and researchers' needs. Normally the social researcher wishes to meet his research objectives without incurring the responsibility or liabilities associated with access to joint information and without forcing a com- promise of the original conditions (e.g., a promise of confidentiality) under which information was originally supplied to an audit agency. The Shared File System (APL) developed by David Booth15 appears to have some relevance to this problem; it involves the use of access authorization codes associated with particular primitive (and unmodifiable) commands and particular roles. Presumably, research needs can be accommodated well by tailoring the system so that the researcher can operate with restricted functions in restricted work spaces and arrays, while locked out of his administrative or research colleagues' work spaces, and unable to examine or modify other functions and files stored in the same equipment. DISCUSSION: POTENTIAL USE OF A DATA BANK REGISTRY AND DEVELOPMENT AGENCY A paper as brief as this one must be cannot hope to give a detailed appraisal of the social scientists' needs in their efforts to maintain the confidentiality and security of the data they maintain. As a framework for summarizing those needs, suppose we consider the current proposals for a national registry of computerized data banks. The proposals are in the interest of developing mechanisms for solving problems in the security area and they may be helpful at the design as well as implementation stages of social research. It has been suggested that such a registry, coupled with a development agency, be created for the purpose of documenting the nature of computerized information systems, the kinds of personal data maintained in such systems, and the rules and practices which pertain to storage of data. Alan Westin's proposed "data bank on data banks,"16 John Kemeny's plan for a National Computer Development Agency,17 and other suggestions for monitoring large-scale data collection 8,18,19 seem to imply documentation functions of this sort. We can anticipate that such plans, if implemented, will be of considerable interest and use to social researchers, especially if they include the kinds of information listed below. POLICY AND PRACTICES IN DATA COLLECTION Given the diversity of social research programs, no single policy or managerial practice is likely to satisfy all public and private requirements for assuring confidentiality of data. Statistical methods for minimizing likelihood of identification, legal constraints against Security of Information Processing access as disclosure and administrative methods for assuring confidentiality have been developed, but they have been organized and appraised in only a few instances. 2,5,20 Regrettably, these strategies have not been tied well to more computer-bound technical devices such as those described by Hoffman,14 Peterson and Turn,12 and Goodfellow. 21 An agency with an information clearinghouse function, coupled with a development mission, would be quite helpful in documenting, consolidating, and organizing information in the following categories. 
Legal solutions

Local, state, and Federal statutes relevant to privacy and confidentiality of data; court precedents, administrative regulatory powers; empirical data on problems in enforcement of codes, and adherence to guidelines furnished by government agencies to social researchers regarding rights of privacy and conditions of disclosure.

Administrative approaches

Link file systems,4 insulated data banks,11 and other similar strategies for eliciting and merging sensitive data; vulnerability, utility, and frequency of the strategy's use; cost data.

Statistical/mathematical solutions

Documentation on applications of error inoculation5 and other approaches to depreciating the probability of indirect disclosure;13,20 costs and benefits of applications.

Technical mechanisms

Types of cryptographic encoding appropriate for computer applications; their cost and vulnerability; catalogs or listings of hardware and software security devices; possible relevance of new devices to specialized research needs (such as the remote terminal application mentioned earlier; see also Reference 22).

Empirical studies

There is some real value in consolidation of data on people's resistance to data collection and to social research. Complaints about the collection of information and against organizational disclosure practices, concerns about the magnitude of data maintained, etc., need to be well-documented. Although some empirical data exist, there is currently no single source on which the researcher may draw to establish the likelihood of privacy problems in the conduct of his research and to anticipate the costs of resolving them. In many cases, questions can be phrased to minimize embarrassment and/or threats of sociolegal action against a respondent. Some of the relevant strategies (elimination or generalization of the inquiry, approximations to direct questions) are fairly well documented.2 Small "item pools" or computerized retrieval systems containing questions which pertain to the same general behavior, but with varying levels of sensitivity and intrusiveness, do exist. But data on both strategies and item pools are widely dispersed. There is still a great need for large, accessible item pools which have been tested for objectionability, intrusiveness, and susceptibility to error.

Validity appraisals and secondary analysis

Frequently, social researchers elicit anonymous information from previously identified samples or require research subjects to use an alphanumeric alias (in short term longitudinal studies), so as to minimize if not eliminate any risks that data will be used for nonresearch purposes. An information registry would be of considerable help in appraising validity of sampling and credibility of reporting in such efforts. Suppose, for example, a medical sociologist, who usually has no testimonial privilege for the data he collects, relies on mailed or telephoned responses to his questionnaire on illegal methadone use. He might encourage the use of aliases to assure that his data are not appropriated (legally or otherwise) for harassment of his subjects, but he still needs to anticipate the redundancy of his data, and to appraise its validity since he does depend on voluntary responses. The researcher could do so if a data bank register furnished information about the existence of medical records, census data, police intelligence systems, etc., which contained relevant statistical data on the population from which subjects were sampled.
And if identifiers were actually obtained, he could merge his own data with existing files without violating access restrictions, using some special administrative strategies which might also be documented in the same registry.

SUMMARY

The objective of maintaining security of social research data is an operationalization of the concept of "confidentiality" in social research. The problems in meeting the objective depend on where the research falls on a hypothetical audit-research continuum for the data, on the kinds of process being used to elicit the data, and on the level of identifiability of records necessary in the research. Major differences between audit and social research approaches to security problems stem from the social researcher's infrequent need to maintain joint identifying and statistical records, and the opportunity to use modified (alias) identifiers and modified response data (i.e., inoculated with random error in a controlled process).

Aside from benefiting from systematic appraisal methods such as those described by Petersen and Turn,12 social researchers might do well to capitalize on other research efforts connected with security in information processing. Linkage systems and similar devices mentioned earlier depend very much on encryption schemes for assuring integrity of the system. The encryption transforms used in the examples cited have been limited to simple substitution of one character for another or simple linear transforms of original numerical characters. Perhaps certain kinds of transposition or additive transforms, as yet unfamiliar to the social scientist, can be adapted to this kind of problem to assure greater security. Certainly, the development of algorithms which help in checking whether indirect disclosure is possible or likely would be well received by managers of the research data banks. Translating the structure of data sets into simple algebraic equations is a skill which is usually beyond the social scientists' own expertise. Judging from Fellegi's13 work and current activities by Turn,23 such algorithms are likely to require a great deal of technical attention to efficiency, to heuristic alternatives to searching large sets of equations (data sets), and to determining the likelihood of indirect disclosure, tasks in which the social scientist must be educated by the computer technologist.

Certainly, if proposals for national data registries and development centers are implemented, social scientists will have the opportunity to reduce redundancy in collection and maintenance of identifiable data. A centralized information source may help to stimulate more interest and expertise in technical solutions to problems in this area. Since most social research involves data which are heterogeneous with respect to their sensitivity and publicity, the researcher will benefit most from technological developments which associate more protection with increasing levels of sensitivity, and authority access designs which recognize these levels.
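The summary's reference to response data "inoculated with random error in a controlled process" can be made concrete with a small sketch. The design below is a generic randomized-error scheme of our own choosing, not the specific procedure of Reference 5: each yes/no response is flipped with a known probability p before storage, so that no single stored record is trustworthy about its respondent, yet the aggregate rate remains estimable.

```python
import random

rng = random.Random(7)

def inoculate(responses, p):
    """Flip each 0/1 response with known probability p before it is stored; an
    individual stored record is deniable, but the aggregate remains estimable."""
    return [r ^ (rng.random() < p) for r in responses]

def estimate_true_rate(stored, p):
    """observed = true*(1-p) + (1-true)*p  =>  true = (observed - p) / (1 - 2p)."""
    observed = sum(stored) / len(stored)
    return (observed - p) / (1 - 2 * p)

# Synthetic illustration: true 'yes' rate of 20 percent, flip probability 0.25.
truth = [1 if rng.random() < 0.20 else 0 for _ in range(10_000)]
stored = inoculate(truth, p=0.25)
print(round(estimate_true_rate(stored, p=0.25), 3))   # should land near 0.20
```

The same idea extends to categorical or numeric items by adding noise with a known distribution; the price is a larger sample for a given precision.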
REFERENCES 1 R F BORUCH A n annotated bibliography of randomiz.ed field experiments in policy research Background paper for Social Science Research Council's Committee on Experimentation Northwestern University 1972 2 R F BORUCH Maintaining confidentiality in educational research: A systemic analysis American Psychologist 1971 26 pp 413-430 3 T K GLENNAN Using experiments for social research and planning Monthly Labor Review February 1972 4 A W ASTIN R F BORUCH A "link" file system for assuring confidentiality of research data in longitudinal studies American Educational Research Journal 1970 7 pp 615-624 5 R F BORUCH Administrative, statistical, and legal solutions to the problem of assuring confidentiality in social research Paper presented at Statistics Department Colloquium University of Chicago 1972 6 J ROTHENBERG Urban economics In Nancy D Ruggles (Ed) Economics: Report of the behavioral and social science survey (NAS and SSRC) N J Prentice-Hall 1970 7 H BLACK E SHAW Detroit's social data bank In A F Westin Information technology in a democracy Cambridge Harvard University Press 1971 8 E B SHELDON Social reporting for the 1970's Chapter 7 Report of the President's Commission on Federal Statistics Washington DC US Government Printing Office 1971 9 W D WALL H L WILLIAMS Longitudinal studies in the social sciences London Heinemann 1970 10 D T CAMPBELL AdministratiJe experimentation, institutional records, and nonreactive measures In W M Evan (Ed) Organizational experiments Laboratory and field research New York Harper and Row, 1971 11 R F BORUCH Strategies for eliciting and merging confidential social research data Policy Sciences September 1972 (in press) 12 H E PETERSEN R TURN System implications of information privacy Proceedings of the 1967 Spring Joint Computer Conference American Federation of Information Processing Societies 1967 13 I P FELLEGI Question of statistical confidentiality Journal of the American Statistical Association 1972 67 pp 7-18 14 L J HOFFMAN Computers and privacy: A survey Computing Surveys 1969 1 pp 84-103 15 D F BOOTH File security for a shared file, remote terminal system Paper presented at the Conference on Computers, Privacy, and Freedom of Information (Mimeo) Queen's University 1970 16 A F WESTIN Civil liberties and computerized data systems Security of Information Processing 17 18 19 20 In Martin Greenberger (Ed) Computers, communications, and the public interest Baltimore The Johns Hopkins University Press 1971 M GREENBERGER (Ed) Computers, communications, and the public interest Baltimore The Johns Hopkins University Press 1971 President's Commission on Federal Statistics Report of the President's Commission Washington DC US Government Printing Office 1971 G B F NIBLETT Digital information and the privacy problem Paris Organization for Economic Cooperation and Development 1971 M H HANSEN Insuring confidentiality of individual records in data storage and retrieval for statistical purposes Proceedings of the 1971 Fall Joint Computer Conference 433 American Federation of Information Processing Societies 1971 21 B B GOODFELLOW Projections of the impact of technology on the development of large data base information systems Position paper presented at the Conference on Computers: Privacy and freedom of information Queens University Kingston (Canada) May 21-24 1971 22 N M BRADBURN Survey research in public opinion polling with the information utility-promises and problems In H Sackman and N Nie The information utility and social choice Montvale (New Jersey) 
AFIPS Press 1970
23 R TURN N Z SHAPIRO Privacy and security in data banks: Measures of effectiveness, costs, and protector-intruder interaction Proceedings of the 1972 Fall Joint Computer Conference American Federation of Information Processing Societies 1972

Privacy and security in databank systems-Measures of effectiveness, costs, and protector-intruder interactions*

by REIN TURN and NORMAN Z. SHAPIRO
The Rand Corporation
Santa Monica, California

INTRODUCTION

The nearly seven years of concern with data privacy and security in computerized information systems have produced a variety of hardware and software techniques for protecting sensitive information against unauthorized access or modification.1-7 However, systematic procedures for cost-effective implementation of these safeguards are still lacking. The data security design and implementation process will remain more art than science until adequate theoretical foundations are laid and analytical tools developed for a "data security engineering" discipline. Needed in particular are measures for evaluating the effectiveness of data security techniques in various threat and implementation environments; methods for estimating the costs of implementing the safeguards in various classes of information systems; and tradeoff relationships between these and other relevant variables. Equally important is the ability to estimate potential losses.

This paper strives to contribute to the formulation of data security engineering in the areas of personal information databank systems: a model of the personal information databank system is presented; the nature of the interactions of the databank security protector with potential intruders is explored; and the amount of security and implementation costs associated with several classes of data security techniques are discussed.

THE DATABANK SYSTEM

The term databank implies a centralized collection of data to which a number of users have access. A computerized databank system consists of the data files, the associated computer facility (processors, storage devices, terminals, communication links, programs and operating personnel), a management structure, and assorted "interested parties."

Structure

If the function of a databank system is to collect, store, retrieve, process, and disseminate personal data on individuals (or organizations), the databank system includes the following elements:

• Subject, a person or an organization about whom data are stored in the databank system. He may have provided the data voluntarily, in a quasi-mandatory fashion to obtain benefits or privileges, or as required by law. Data on him may also have been collected without his knowledge or consent.
• Controller, an agency or institution (public or private) with authority over the databank system and its operations. The controller authorizes the establishment of the databank system, specifies the population of subjects and type of data collected, and establishes policies for the use, dissemination, disclosure, and protection.
• Custodian, the agency and its personnel in physical possession of the data files, charged with the operation of the databank system, and responsible for enforcing the policies established by the controller.
• Collector, the agency and personnel who collect the data and transmit it to the custodian.

* The research reported in this paper was supported by the National Science Foundation Grant No. GI-29943.
Any views or conclusions contained in this paper should not be interpreted as representing the official position or policy of the National Science Foundation or The Rand Corporation.

• User, a person or agency authorized by the controller or the custodian to utilize specified subsets of data for specified purposes, subject to the disclosure and dissemination policies of the databank system.

Other parties interested in the data and its uses include:

• Intruder, a person or agency either deliberately attempting to gain unauthorized access to the databank system or making unauthorized use of the data normally available to him as an authorized user, or accidentally doing so.
• Society, the population within which the subjects have rights and obligations, and whose welfare also affects the welfare of the subjects. Large classes of databank systems are needed to support studies of the society, and administer and assess social benefit programs.

Figure 1 illustrates the structure of a generalized databank system and displays the more prominent lines of communication between its elements. Note, however, that the elements of a databank system need not be unique. Multiple roles and overlap in functions are common in existing databank systems. For example, the controller, custodian, and user may be the same agency or group of persons.

[Figure 1-The Databank System]

The role of a subject in the databank system is to provide the "raw material" (i.e., personal information about his characteristics, background, and activities) for the databank operation. The roles of the other databank system elements are to store and process these data, and to make the data available to users for making decisions affecting a specific subject, groups of subjects, or the entire society. It is also their responsibility to protect the data against misuse, intrusion and, when appropriate, the society's claim of the "right to know."

Privacy and security

Privacy, confidentiality, and security are terms that refer to the philosophical, legal, and technical aspects of the subject's interactions with other elements of the databank system.

• Privacy is the right of an individual to determine for himself what personal information to share with others, as well as what information to receive from others. Relevant questions for examining possible invasions of privacy by the data collection activities of a databank system include:8 What personal information should be collected and stored to support the users of a specific databank system? To what extent should personal information from different sources be integrated to give a unified view of the individual? Who should be allowed to use the data and for what purposes?
• Due process, in the context of personal information databank systems, deals with the right of the subject to know the information stored about him in a databank system and to challenge the veracity of such information. The relevant questions here include: Should an individual be entitled to know that information about him is being collected and stored? Should he be allowed to challenge the presence, accuracy, and completeness of this information?

Westin8 points out that answers to questions dealing with privacy and due process are political, not technical, to be worked out by balancing the value of civil liberties against the needs of the society.
• Confidentiality refers to the special status given to sensitive personal information in the databank system to minimize potential invasions of privacy. Disclosure of confidential data is restricted to users and only for purposes authorized by the controller or the subjects themselves. Confidentiality is achieved by legal and procedural means,9,10,11 and by implementing techniques of data security.
• Data security refers to the protection provided to the databank system against deliberate or accidental destruction, and unauthorized access or modification, of the data. In the context of this paper, data security refers to technical and procedural means for protecting the data from intruders.

Within the databank system, the controller determines the nature of personal data to be gathered and a method of collection that satisfies the right of individuals for due process and establishes policies and procedures for data confidentiality. The collector and custodian have the responsibility to enforce the confidentiality policies and to provide procedures and technical safeguards for data security (see Figure 1).

Classification

The nature of the databank ownership, the principal use of the data, and the characteristics of the computer facilities strongly affect the complexity of the data security problem. It is useful, therefore, to establish a classification system that reflects data security requirements.

• Public-Private-Public databank systems are operated by government agencies. The controller, custodian, and users are legislative, judicial, or executive entities. Private databanks are operated by corporations or institutes within applicable laws. For example, the operation of credit information bureaus is regulated by the Fair Credit Reporting Act of 1970.
• Statistical-Dossier-Statistical databanks are operated to produce statistical summaries. Individuals are not identified in the output, but identification may be needed in the databank to permit either periodic updating of longitudinal studies or linking with other databanks. In dossier databanks, personal data are used to take action on specific individuals. Precise subject identification is important. Dossier databanks can be used for statistical purposes. The converse, however, is not necessarily true.
• Centralized-Decentralized-A centralized databank consists of one databank. In a decentralized databank, there are several physically separated databanks, each containing a part of the overall data collection. The several databanks may or may not be connected by a communication network. For example, the U.S. Internal Revenue Service maintains a decentralized databank system of income tax information.
• Dedicated-Shared-In a dedicated databank implementation, the computer facility is used exclusively to serve the databank. In a shared system, other databanks or computer applications use the same computer facilities.
• Off-line-On-line-An on-line databank permits direct real-time interaction of a user with the data through a terminal. Access may be direct or indirect. In the latter case, a databank employee acts as an intermediary. In an off-line databank, the user is neither in control of data processing nor knows when his data request is processed.

These classifications permit ranking databank systems in order of increasing complexity of potential data security problems, ranging from the public, statistical, centralized, dedicated, off-line databank systems (e.g., the U.S.
Census Bureau), which can be expected to have relatively simple data security problems, to the private, dossier, decentralized, shared, on-line databank systems (exemplified by commercial credit bureaus and the future computer utilities), where every conceivable data security problem is likely.

Threats and countermeasures

Threats to data privacy, confidentiality, and security in a personal information databank system may arise from all elements of the databank system. For example, without the consent of the subjects, the controller may change disclosure rules; the custodian, collector, or users may disregard confidentiality procedures or use data for unauthorized purposes; the databank personnel, users, or even the subjects themselves may become intruders; and the databank equipment or programs may fail and cause accidental disclosures or data modification. Technical means by which the intrusion may be perpetrated include deception, nullification, circumvention of existing protective features, and wiretapping of communication links. Whether or not the intrusion threats actually materialize depends on the nature of the data stored, the potential value of the data to the intruder, the risks he is willing to accept, and the resources he is willing to invest.

Countermeasures against the various threats include legal sanctions to deter confidentiality violations by the personnel and authorized users of the databank system, application of irreversible transformations on data in statistical databanks, and implementation of access control, threat monitoring, and cryptographic techniques.1,2,7,12 The design criteria for data security systems include effectiveness, economy, simplicity, and reliability. Although social policy may prefer protection of confidentiality at any cost, the rational approach to security system implementation is to protect only the data worth protecting. The following section outlines a model of the economic interactions of a rational protector of the databank system and a rational, profit-motivated intruder. This model can be used to discuss the design of cost-effective data security systems for various classes of databank systems.

A MODEL OF PROTECTOR-INTRUDER STRATEGIES

Consider the case where economic profit motivates an intruder to attempt penetration of a personal information databank system. In particular, assume that the intruder wants to compile a "mailing list," L, of N information items, each of which has the market value k. The total market value, V, of the list L is then

V = kN    (1)

To perpetrate the databank penetration, the intruder makes an investment, X. If the intruder requires a minimum profit, rX, r > 0, then his maximum investment to obtain the list L is

X = kN/(1+r)    (2)

where it is also assumed that this intrusion is an isolated event that does not significantly benefit from previous, nor contribute to future, intrusions. The possibility of selling multiple copies of the list could be easily accommodated. The intruder's investment, X, is an expected value and should take into account the probability of failure and the risk that the databank's deterrence and retaliatory mechanisms may lead to additional costs.

To counter this intrusion threat and others, the protector of the databank system expends Y resources for data security measures. This investment should reflect the value of the protected information to the subjects, to the protector himself, and to potential intruders. Thus, prudent investment decisions of the protector would be:

• Not to commit large resources to protect information of little value to the potential intruders, even if the subjects are very strongly against the possible acquisition of this information by the intruders.
• Not to expend large resources to protect information whose release would not greatly disturb the subjects, even if the information would be valuable to the intruders.
• To commit most resources to protect information that is valuable to the intruders, and whose acquisition by the intruders would be very detrimental to the subjects.

Consider the protector-intruder interaction further. Let I(X, Y) be the expected amount of information obtained by the intruder when he expends X amount of resources to overcome the Y amount invested by the protector. I(X, Y) is an expected value since the probability of success for the intruder is not necessarily unity. For example, the intrusion may be thwarted because of the intruder's incomplete information about the databank's security system or even by a computer error. As is apparent from the previous discussion of the nature of X and Y, I(X, Y) is not a simple function of X and Y. However, some of its elementary properties are

• I(0, Y) = I(X, ∞) = 0, for X, Y > 0;
• I(X, Y) is monotone non-decreasing in X and monotone non-increasing in Y.

Let f(N) be the value to the intruder of N units of information and g(N) be the cost to the protector and subjects of the same N units of information, occurring as a result of the intruder acquiring this information. Then, for given X and Y, the expected net profit of the intruder, v(X, Y), is

v(X, Y) = f(I(X, Y)) - X    (3)

while the net loss to the protector and subjects, u(X, Y), is

u(X, Y) = g(I(X, Y)) + Y    (4)

Given sufficient information regarding the expenditures of the protector, Y, and the nature of the security system implemented, an intruder may vary his investment, X, to maximize the expression (3). A rational protector would utilize his estimates of the value of protected information, the technical feasibility of threats, and the likely resources of the intruders to vary his expenditures, Y, to minimize the expression (4). It follows that if f, g, and I are suitably differentiable in a region containing X and Y, the selected values of X and Y will satisfy

f'(I(X, Y)) ∂I(X, Y)/∂X = 1    (5)

g'(I(X, Y)) ∂I(X, Y)/∂Y = -1    (6)

where the prime denotes differentiation. If one or more of the functions I, f, or g are not differentiable in the region containing (X, Y), then the expressions (5) and (6) must be replaced by more complex conditions.

To use the above interaction model, analytical or empirical expressions are required for

• The value of personal information to the intruder (i.e., the function f(N)).
• The value of personal information in the databank to the protector (i.e., the loss function g(N)).
• The amount of security provided by various data security techniques (i.e., the expected expenditures, X, of intruder's resources).
• The costs of implementing the security barriers.
• The tradeoff relations between the amount of security (intruder's cost) and the protector's cost.

These items are difficult to determine and are often sensitive to the particulars of a databank security system and the information protected.
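To make the interplay of expressions (3) and (4) concrete, the following sketch assigns illustrative functional forms and constants of our own choosing (the paper deliberately leaves I, f, and g unspecified) and locates the protector's best expenditure by anticipating a rational intruder's best response.

```python
# Illustrative functional forms and constants of our own choosing; the paper
# leaves I(X, Y), f(N) and g(N) unspecified, so treat this only as a sketch.
N_MAX = 10_000       # items in the target "mailing list"
K     = 0.10         # f(N) = K*N : market value per item to the intruder ($)
C     = 0.50         # g(N) = C*N : loss per item to protector and subjects ($)
A, B  = 2.0, 100.0   # shape constants for the barrier function below

def I(X, Y):
    """Expected items obtained: I(0, Y) = 0, and I falls toward 0 as Y grows."""
    return N_MAX * X / (X + A * Y + B)

def intruder_best(Y, grid):
    """Intruder picks X maximizing expected net profit v(X, Y) = f(I) - X (expression 3)."""
    return max(grid, key=lambda X: K * I(X, Y) - X)

grid = [float(x) for x in range(0, 2001, 5)]          # candidate expenditures, $0..$2000

def protector_loss(Y):
    X_star = intruder_best(Y, grid)                   # anticipate the rational intruder
    return C * I(X_star, Y) + Y                       # u(X, Y) = g(I) + Y (expression 4)

Y_star = min(grid, key=protector_loss)
print(f"protector spends ~${Y_star:.0f}; intruder best response ~${intruder_best(Y_star, grid):.0f}")
```

Under these assumed forms the protector's optimum is simply the smallest Y that drives the intruder's best expected profit to zero; different choices for I, f, and g would move or remove that threshold, which is why the empirical relationships listed above matter.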
There are, however, certain general features that can be discussed in qualitative terms.

VALUE OF PERSONAL INFORMATION

Securing personal information in a computerized databank system requires estimating the value of protected information to the potential intruders, the subjects of the data themselves, and the protector-custodian of the databank system. In general, this is a difficult task involving emotional as well as economic considerations. The following discussion represents only a preliminary exploration of this problem.

Value to potential intruders

A flourishing market for information has always existed. The value of trade secrets, marketing information, new product plans, and customer lists that are acquired by intruders in industrial espionage operations amounts to millions of dollars annually.13,14 The value of personal information to potential intruders is more difficult to estimate. A personal information market exists for mailing lists of names and addresses of persons satisfying selected criteria. These are used mainly for mailing advertising literature or making sales calls, but they are also sought for political and even criminal purposes. The mailing list rates for advertising purposes are approximately $10 per 1000 names;15 this price increases with sophistication of selection criteria. Currently, the sale of name and address lists compiled for public information is not illegal and is practiced at all levels of government agencies. However, Federal legislation is pending26 to make illegal such sales without the consent of the subjects involved.

The value of information on specific individuals can be expected to vary from next to nothing to thousands of dollars, depending on the prominence of the individual, the nature of the information, and his susceptibility to blackmail, political smear, or litigation. Given the relatively high cost of penetrating the security barriers or subverting the employees of a databank system, intrusions involving personal information are likely to be bulk operations-large numbers of information items would be obtained per intrusion, or many intrusions would be attempted to amortize the initial expenditures. Prime-target personal information includes information held confidential by Federal or state statutes (criminal justice, public health, psychiatric, financial status, family background, etc.). Such information could be utilized for perpetrating frauds, high pressure sales, and blackmail. Illicit "purging" of records for a fee, or planting of fabricated information, may be attempted. Court records, statistics on fraud and blackmail, and mailing list prices may provide the initial empirical data on the value of personal information to the intruders.

Value to the subject

The value to the subject of protecting his personal information can range from very little (for much of the population who, at most, would be annoyed by sales literature or salesmen's telephone calls), to thousands of dollars for those vulnerable to blackmail or character assassination. Indeed, the value to intruders of the latter type of information stems directly from the value that the subjects place on the same information, as evidenced by their willingness to pay. The value of information of certain categories (e.g., family background) may be a time-varying function of contemporary mores.
Empirical data on value of information can be gathered from statistics on the use of unlisted telephone numbers and the effects of fees for this service; the insurance premiums paid by municipalities, banks, credit bureaus, and other personal information handlers against "invasions of privacy" lawsuits; the willingness of individuals to accept money, and how much, in exchange for releasing personal information; and surveys of attitudes concerning privacy.17 Considerable collections of such statistics, and correlation with various population groups, are required to establish even first-order guidelines on estimating the value of personal information to individuals themselves.

Value to the protector

The value of personal information in databank systems manifests itself to the protector as:

• The legal liability of the custodian to damages incurred by subjects whose data has been divulged to intruders through inadequate security measures or through personnel negligence. This reflects itself in the insurance premiums and payments for damages that the databank may have to make in addition to insurance coverage.
• The pressure on the custodian by the controller, which may result in firing of personnel, cuts in budget, restrictions of operations, etc. The dollar values of such losses could be estimated from analogous actions taken against agencies other than databanks.
• The cost of re-creating the files in cases of data destruction.

It is apparent that the functions f(N) and g(N), representing the value of N items of information to the intruder, and to the protector and the subjects, respectively, cannot quantitatively provide for all possible situations. In a more complete protector-intruder interaction model, N would be a multidimensional vector whose components represent types of information, rather than a scalar.

AMOUNT OF SECURITY AND COSTS

The amount of security provided by a data security technique refers to effectiveness against intrusion. As suggested previously, an intruder's expected expenditure of resources in overcoming a security barrier may be a suitable measure. Before attempting to penetrate a databank security system, an intruder must:

• Obtain sufficient information about the databank system to determine whether it contains the desired information; what data security techniques are applied; what is the probability of success; and what are the penalties for failure.
• Formulate an acceptable intrusion plan to satisfy the cost constraints, and provide acceptable probabilities for success and risk.
• Gain physical access to the databank system either directly through a terminal, communication links, computer, etc., or indirectly through an employee of the databank system.
• Penetrate into the databank; nullify or circumvent the data security techniques to gain access to the information; acquire the information for subsequent analysis; and escape detection and reactive measures sufficiently long to complete the action.

The objectives of a security system are to deter a profit-seeking intruder by raising the intrusion cost to a level that reduces his expected profits to an unacceptable level, and to prevent access by intruders not economically motivated through effective access control and threat monitoring techniques. Effective integrity management programs must be implemented to maintain personnel loyalty and reliability of equipment and software.
These three classes of data security techniques must be applied against intruders to:

• Deny information about the security system. It may not be possible, or even not desirable,18 to maintain secrecy about the security techniques used, but the specific access codes and keys must be kept from all but a few authorized personnel.
• Prevent unauthorized access to the computer system (terminals, communication links, processor, data storage devices), the protected data files within the computer, and to specific data processing operations.
• Detect intrusion attempts; discriminate among threats; sound alarm; and take responsive action.
• Maintain integrity of the databank system by reducing opportunities for personnel subversion, increasing hardware and software reliability, and controlling any changes in software or hardware.

The amount of security

The burden of preventing intrusion is borne by the access control techniques. Threat monitoring is used mainly to reduce the time available for perpetrating the intrusion and for post facto investigation. The basic elements of access control are:

• Authorization of persons to access the computer facility, terminals, data files, and processing operation.
• Identification of a person seeking access.
• Authentication of his identity and access authorization.

Not all databanks have implemented all of the above steps as part of the access procedure-in some, the mere possession of a valid password is considered sufficient. The enforcement of access control techniques may be assigned to computer facility personnel, performed by hardware devices, or implemented in software. To defeat an access control technique, an intruder must be able to accomplish one of the following:

• Acquire or forge the proper identification and authentication passwords or keys.
• Circumvent or disable the access control technique.

The choice depends on the technical feasibility of these approaches and, for those deemed feasible, the relative costs, risks, and required time.

Acquisition of access control information

The protective capability of passwords and privacy transformation keys lies in the intruder's uncertainty regarding which of the very large number of possible passwords or keys is being used. For example, there are 26^5 ≈ 1.2 x 10^7 possible 5-character and 26^6 ≈ 3.1 x 10^8 possible 6-character passwords. Nevertheless, a trial and error search for the correct password is not entirely infeasible: a minicomputer can be programmed to imitate the databank terminal's sign-on and password sending sequences. This computer can then be used to try different passwords at the rate permitted by the communication channel and the databank computer. The intruder's effort is greatly reduced if the passwords used by the databank are selected for their mnemonic capability (i.e., are similar to English words). For example, studies of 5-character alphabetic code words that were required to differ in at least two characters and contain at least two vowels show19 that only 150,480 5-character words can be selected out of the total space of 1.2 x 10^7. To test all of these at the rate of 10 per second would require slightly more than 4 hours.
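For reference, a minimal sketch of the arithmetic behind the password-space and search-time figures just quoted; the 10-trials-per-second rate is the illustrative figure used in the text, not a property of any particular system.

```python
ALPHABET = 26

full_5 = ALPHABET ** 5      # 11,881,376  -- roughly 1.2 x 10^7 five-character passwords
full_6 = ALPHABET ** 6      # 308,915,776 -- roughly 3.1 x 10^8 six-character passwords

mnemonic_5 = 150_480        # restricted mnemonic code words cited from Reference 19
rate = 10                   # trial passwords per second over the terminal link

hours = mnemonic_5 / rate / 3600
print(full_5, full_6, round(hours, 1))   # about 4.2 hours to sweep the mnemonic subset
```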
However, passwords could be obtained with less effort by wiretapping the communication links and recording the sign-on sequences.20 Acquisition by wiretapping of passwords that are used once-only requires more sophisticated techniques, e.g., "piggy backing":1 insertion of a minicomputer in the line to intercept user-computer communications, to return an error code to the user, and to enter the file with the password obtained. If passwords are generated by a pseudorandom process for once-only use, and several passwords are intercepted, certain number-theoretic techniques may be applied to discover the password generation process and its parameters. The intruder's cost of acquiring passwords through wiretapping ranges from the cost of recording equipment-a few hundred dollars, to the cost of a minicomputer and associated programming-a few thousand dollars. The risks include the possible legal prosecution.

Cryptanalysis of privacy transformations

The intruder's work factor in attempting to solve for the key of a privacy transformation from intercepted, enciphered data is normally much larger than required for passwords. The key spaces are much greater, and exhaustive trial-and-error solution is infeasible. However, analysis of intercepted transformed data from the point of view of language statistics can be applied. Relevant are

• Single character frequency distribution;
• Digram (pairs of characters) and polygram frequency distributions;
• Word usage patterns;
• Syntactical rules of the language.

The two main classes of privacy transformations are substitutions of characters in the data with other characters (or groups of characters) and transposition of the order of the characters.21,22,23 The easiest to apply in a computer system are the substitution transformations:

• Monoalphabetic substitution, or the "Caesar cipher," where each character, xi, of the data (the "plaintext") is transformed into a character, yi, of the "ciphertext" by modulo N addition of a constant c, yi = xi + c (mod N), where N is the size of the alphabet. The constant c has only N-1 possible values and, thus, can be easily discovered.
• Polyalphabetic substitution of period u (the Vigenere cipher) consists of cyclic application of u monoalphabetic substitutions by adding modulo N the constants c0, c1, ..., cu-1, so that y0 = x0 + c0, y1 = x1 + c1, ... (mod N). The key space here contains N^u possible selections of the constants c0, ..., cu-1.
• A k-loop polyalphabetic substitution uses k sets of alphabets, applied cyclically with periods u1, ..., uk: yi = xi + c1,i(mod u1) + ... + ck,i(mod uk) (mod N), where the ui are relatively prime.
• A Vernam cipher is a polyalphabetic substitution (Vigenere) where the key period is at least as long as the amount of data to be transformed.

Computer-aided solution of substitution transformations has been studied by Tuckerman.24 Such solutions can always be found, provided that sufficient contiguous lengths of transformed data (ciphertext) can be acquired. If the ciphertext contains fragments of known data, even if their precise location is not known, the cryptanalysis task is greatly simplified. In the case of highly formatted artificial languages (programs), where the fixed vocabulary is very small and used with rigid observance of syntax and punctuation rules, fragments of known plaintext are very likely. If the polyalphabetic cipher keys are relatively short and coherent (phrases of a natural language), the task is even further simplified.
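A minimal sketch of the substitution transformations just described, assuming a 26-character alphabet (N = 26): a one-character key gives the Caesar cipher, a key of period u gives the Vigenere cipher, and a random key as long as the data behaves like a Vernam cipher. This illustrates the arithmetic only, not any particular databank implementation.

```python
def vigenere(text, key, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", decode=False):
    """Polyalphabetic substitution of period len(key): y_i = x_i + c_(i mod u) (mod N).
    A single-character key degenerates to the Caesar cipher; a random key as long
    as the text behaves like a Vernam cipher."""
    N = len(alphabet)
    sign = -1 if decode else 1
    out = []
    for i, ch in enumerate(text):
        x = alphabet.index(ch)                  # plaintext (or ciphertext) character value
        c = alphabet.index(key[i % len(key)])   # key constant applied cyclically
        out.append(alphabet[(x + sign * c) % N])
    return "".join(out)

ciphertext = vigenere("PERSONALDATA", "KEY")     # period-3 Vigenere
print(ciphertext, vigenere(ciphertext, "KEY", decode=True))
```

The weakness exploited by the solution techniques described next is visible here: for a short key, every u-th ciphertext character is just a Caesar-shifted sample of the plaintext, so language statistics survive the transformation.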
The techniques for solving substitution-type transformations proceed as follows:24

• A Caesar cipher, where the key consists of a single constant, is easily solved by language statistics or trial and error. Shannon21 has shown that for natural language plaintext in English, the sufficient length of a fragment of intercepted ciphertext (the unicity distance) is about 30 characters.
• A 1-loop polyalphabetic (Vigenere) cipher of period u is reduced to u Caesar cases by statistical analyses and trial-and-error determination of the key period, u. At least 20u characters of intercepted text are required.
• A 2-loop polyalphabetic cipher of periods u and v is reduced to the one-loop case by certain "differencing" methods.24 Then, the 1-loop analysis can reduce the problem to Caesar cipher level. At least 100(u+v) characters of ciphertext are required. The effort is considerably greater than for the 1-loop case.
• The Vernam cipher (where the key is as long as the data, used only once, and generated by a natural random process) cannot be solved. However, if the key is generated by a pseudo-random process, such as a shift-register sequence generator, and plaintext fragments are known, then computer-aided trial-and-error methods may lead to a solution.

The intruder's work factor in the above cryptanalytic activities requires a sufficiently powerful computer and appropriate cryptanalytic programs. Given these, solutions are sometimes found in minutes.24 To successfully attack privacy transformed data requires an investment measured in thousands of dollars for the more complex systems. The work factor is in terms of hundreds of dollars if simple substitutions are used.

Circumventing or disabling of access controls

Circumvention of access controls enforced by databank personnel can be attempted by using the well-developed techniques of diversion, confusion, or intimidation. Costs are low and risks involve being "kicked out," which in turn might be good diversion for permitting an accomplice to enter. Personnel other than professional security guards are well-known for their reluctance to challenge others not known to them. Hardware access control devices (e.g., locks operated by keys or controlled by programs) are usually effective, especially if connected to alarm systems.25 However, some types could be easily disabled, thus reducing the enforcement to facility personnel. Assistance of unsuspecting facility employees could be recruited with the "forgot my key" gambit. Costs and risks are low.

Circumvention of software enforced access controls (i.e., the protective features of operating systems) requires that the intruder gain not only access to the computer through regular or illicit terminals, but also the ability to enter programs into the system. Diversion and "flooding" techniques may be able to overwhelm the threat monitoring system long enough to perpetrate the intrusion.6 The resources required by the intruder include a computer to develop and test the intrusion plan and programs. The risk is low. However, the operating systems designed for high security3,4 may escalate the intrusion costs into the thousands or even tens of thousands of dollars.

Protection costs

The costs involved in implementing a data security system include the initial planning and design, initial investment in hardware devices and software, the recurring operating costs, and the decreases in functional capability. The available cost data is very limited and does not suffice for formulating analytic expressions for the protector-intruder interaction model described above. Hardware access devices, such as card-key locks for doors or computer terminals, are priced in the $150-300 range per unit. Complete systems start from $5000. Hardware implemented data privacy systems for communication links cost in the $2000 range per unit. Data on software implementation of access controls in operating systems is equally scarce. The following represents almost the entire cost data base:3,5

Main memory requirements: 10-20%
Programming time: 5%
Operating system code: 10%
Recurrent CPU time: 5-10%

Some cost data points are also available for the implementation of privacy transformations in software.
The available cost data is very limited and does not suffice for formulating analytic expressions for the protector-intruder interaction mo-el described above. Hardware access devices, such as card-key locks for doors or computer terminals, are priced in the $150-300 : range per unit. Complete systems start from $5000. Hardware implemented data privacy systems for communication links cost in the $2000 range per unit. Data on software implementation of access controls in operating systems is equally scarce. The following represents almost the entire cost data base:3 ,5 I I Main memory requirements: Programming time: Operating system code: Recurrent CPU time: 10-20% 5% 10% 5-10% Some cost data points are also available for the implementation of privacy transfor~ations in software. Privacy and Security in Databank Systems In substitution type privacy transformations, each character of plaintext is transformed into a character of the ciphertext by addition of one or more constants, Cj. Also required are similar decoding and the necessary key-retrieval operations. In terms of the percent of the databank operating system overhead, the following computing time requirements have been established for applying privacy transformations to IO-bit characters in a CDC 6600 computer:7 One-time Vernam ciphering: Vigenere ciphering (table look-up) Vigenere (modulo arithmetic) 0.66% 3.5% 6.3% The above cost figures are quite sensitive to the type of information retrieval system used and represent only isolated cost data points. Estimates of decreased functional capability of the databank system caused by security requirements are even less available. A systematic effort to compile a comprehensive data base of security system costs and decreases in functional capability is clearly needed. CONCLUDING REMARKS I III The design of cost-effective data security safeguards for personal information databank systems requires a careful balancing of the value of protected information against the protection costs. In particular, it is important to consider not only the value of personal information to the subjects, but also to the potential intruders, i.e., the protection investments should be made on a rational basis. The simple protector-intruder interaction model discussed in this paper illuminates the nature of the protector's investment problems when faced with an equally rational intruder. However, before this or any other interaction model can be fully utilized, it is necessary to formulate appropriate analytical or empirical relationships among the value of information to the parties involved, the costs of protection and intrusion, and the effectiveness of data security and intrusion techniques. Deriving such relationships and gathering empirical data will be a major objective of the authors' further work in this area. ACKNOWLEDGMENTS The authors would like to acknowledge valuable suggestions and comments by their colleagues at The Rand Corporation, Mario L. Juncosa, Irving S. Reed, and Selmer M. Johnson, and by Robert H. Courtney of the IBM Corporation. 
REFERENCES

1 H E PETERSEN R TURN System implications of information privacy AFIPS Conference Proceedings 1967 SJCC Vol 30 pp 291-300
2 W F BROWN AMR's guide to computer and software security AMR International Inc New York 1971
3 C WEISSMAN Security controls in the ADEPT-50 time-sharing system AFIPS Conference Proceedings 1969 FJCC Vol 35 pp 119-133
4 G S GRAHAM P J DENNING Protection-principles and practice AFIPS Conference Proceedings 1972 SJCC Vol 40 pp 417-429
5 C WEISSMAN Trade-off considerations in security system design Data Management April 1972 pp 14-19
6 D VAN TASSEL Computer security management Prentice-Hall Inc Englewood Cliffs New Jersey 1972
7 W A GARRISON C V RAMAMOORTHY Privacy and security in databanks Technical Memorandum No 24 Electronics Research Center University of Texas Austin Texas November 2 1970
8 A F WESTIN Civil liberties and computerized data systems in M Greenberger (Ed) Computers, Communications and Public Interest Johns Hopkins Press 1971
9 A F WESTIN Privacy and freedom Atheneum New York 1967
10 A R MILLER Assault on privacy: computer databanks and dossiers University of Michigan Press Ann Arbor Michigan 1971
11 P NEJELSKY L M LERMAN A research-subject testimonial privilege: what to do before the subpoena arrives Wisconsin Law Review Vol 1971 No 4 pp 1085-1148
12 R TURN H E PETERSEN Security of computerized information systems Proceedings Carnahan Conference on Electronic Crime Countermeasures University of Kentucky Lexington Kentucky 1970 pp 82-88
13 R DONOVAN Trade secrets Security World April 1967 pp 12-18
14 P HICKSOM Industrial espionage Spectators Publications Ltd London 1968
15 Firms sue in mailing list theft Computerworld 8 July 1970
16 Security breach leads to police data theft Computerworld 10 February 1971
17 A national survey of the public's attitudes toward computers AFIPS-Time Inc New York & Montvale New Jersey November 1971
18 P BARAN On distributed communications: IX, Security, secrecy and tamper-free considerations The Rand Corporation RM-3765-PR August 1964
19 W F FRIEDMAN C J MENDELSOHN Notes on code words American Mathematical Monthly August 1932 pp 394-409
20 J M CARROLL The third listener Dutton 1969
21 C E SHANNON Communication theory of secrecy systems Bell System Technical Journal 1949 pp 656-715
22 D KAHN The codebreakers The Macmillan Co New York 1967
23 M B GRIDANSKY Cryptology, the computer and data privacy Computers and Automation April 1972 pp 12-19
24 B TUCKERMAN A study of the Vigenere-Vernam single and multiple loop enciphering systems IBM Corporation Report RC 2879 14 May 1970
25 R J HEALY Design for security John Wiley & Sons Inc New York 1968
26 U.S. Senate Bill S.969 US Senate 25 February 1971

Snapshot 1971-How Canada organizes information about people

by JOHN M. CARROLL
University of Western Ontario
London, Canada

INTRODUCTION

In 1971 the Government of Canada initiated a study to determine whether the computerization of personally identifiable records concerning or describing Canadian residents would diminish their quality of life or adversely affect their life chances, and to propose remedial action in the event this premise proved to be true. The study was carried out by a joint Task Force appointed by the Departments of Communications and Justice. The empirical studies group of the Task Force was charged with determining the magnitude and composition of personal data banks in the public and private sectors and the means by which such data are gathered, processed, stored, and disseminated. This paper summarizes the results obtained by this group.

The investigative procedures used consisted of soliciting briefs from organizations thought to be interested in the subject, making formal site visits to selected firms and agencies, conducting field studies to gather background information on the organizations to be visited, sending letters of inquiry to multinational organizations, and mailing a detailed questionnaire to all Canadian organizations believed to possess significantly large files of personal data that were or might become computerized. Although the site-visit technique provided the principal input regarding large government data banks such as those of the Royal Canadian Mounted Police, Statistics Canada, the Department of National Revenue/Taxation, and the Department of National Health and Welfare, it was the questionnaire which provided the most comprehensive information of a quantitative nature. Over 2,500 questionnaires were mailed and the response exceeded 50 percent. What emerges from this portion of the study is a finely detailed snapshot of how one developed nation makes use of information handling technology in the management of personalized information.

In the largest sense, the most significant thing about this study is that in Canada concern regarding potential invasions of individual privacy of information abetted by computers arose initially within the federal government. This fact is borne out by a tabulation of responding organizations who indicated that they had received complaints from the public regarding their data handling practices. Only 16 percent reported receiving any.

Nature of Complaint                                 Number of Respondents
Inadequate provisions to review one's own record    200
Methods of collecting personal information          180
Practices of disseminating personal information     160

CHARACTERISTICS OF THE RESPONSE BASE

Organizations which replied to the questionnaire employed about one-sixth of the labor force. Thus the questionnaire returns, with due allowance for potential distortions, represent a comprehensive overview of Canadian data banks-or, more specifically, of data banks containing identifiable personal data about individuals. The largest number of respondents employed less than 100 persons each: 23 percent of the respondents had 80 percent of all employees.

                  Average    Total
Employees         980        1,200,000
Customers         61,000     65,000,000
Subjects          70,000     24,000,000
Data Recipients   4,900      2,000,000

"Customers" were defined as including present clients, customers, patients, students, policy-holders, and members (of associations). Many Canadians are customers of several organizations. The most numerous group had between 2,000 and 25,000 customers each; 14 percent of the respondents had 83 percent of all customers. Only 40 percent of the respondents said they had files on individuals regarded by them as "subjects", defined to include prospective customers, persons upon whom credit or criminal records are held, auto registrants, and subjects of research studies. Federal agencies dealing with veterans affairs, family allowances, and manpower and immigration responded under this denomination as did some provincial public health agencies. The most numerous group had less than 1,000 subjects each; 7 percent of the respondents had 51 percent of all subjects.
With regard to information recipients, only 37 percent of respondents admitted to having any; 16 percent of the respondents served 95 percent of information recipients. Thus, there comes into focus the picture of an information elite that uses vast files of personalized information as its base of power.

CHARACTERISTICS OF FILES

The files reported upon contained over 83 million records. Respondents in the most numerous classification had fewer than 5,000 records each; 19 percent of the organizations held 90 percent of the records. It was our practice to request information on what we perceived to be the largest file held by a particular questionnaire recipient.

Average Characteristics of Files
Size of file                              72,000 records
Size of record                            520 characters
Number of requests for information        1,300 per year
Period of retention (inactive records)    67 months

With regard to size of record, the largest response category had record sizes under 300 characters. This was offset by 90 organizations whose record sizes exceeded 2,000 characters. Over a million requests for information were reported on a yearly basis; 791 respondents said they had fewer than 100 requests a year, while 46 organizations said they answered more than 10,000 requests a year. Organizations which reported that they responded to more than 10,000 requests for information annually regarding persons in the "subject" category included credit bureaus, police forces, motor vehicle bureaus, and mailing-list suppliers. With regard to the time a record is held after an individual has severed his connection with the organization, 534 respondents said they keep such records seven years or more.

COLLECTION OF DATA

The subject himself is the prime source of information. Health services are in second place. One would expect references to be checked, but it is interesting that they turn out to be more important sources than former employers, present employer, or educational institutions. We find it significant that published records are rarely consulted and that law enforcement agencies are sources at all. Figure 1 shows the relative utilization of the more common sources of information; Figure 2 shows the relative use of less common sources.

[Figure 1-Commonly used sources of information concerning or describing individuals: number of respondents reporting each ordinary source (subject, medical, references, ex-employers, employer, schools, publications)]

We found that the data gatherers most likely to tap medical sources included health services, insurance companies, social welfare agencies, charitable institutions, and regulatory agencies. Data gatherers most likely to approach present or former employers included merchandising houses, employment agencies, insurance companies, police forces, and prospective employers. Agencies most likely to interview a subject's family included health services, social welfare agencies, charitable institutions, and police forces. Organizations most likely to interview a subject's neighbors included health services, educational institutions, insurance companies (through credit bureau representatives), police, and social welfare agencies. Police forces reported that they principally consulted other police, regulatory agencies, private investigators, insurance companies, and employers.
Private investigators reported that they obtained information from police, insurance companies, other private investigators, social welfare agencies, and regulatory agencies. Among the techniques employed by data gatherers, protection of informants outranked confirmation of facts from independent sources in importance to the data gatherer.

In response to the questions as to whether the individuals upon whom records were kept or groups representing their interests ever complain against the method of collecting any item of information, five organizations said they get frequent complaints; 910 said they get none at all. Most likely to receive complaints regarding methods of collecting personal data are law-enforcement agencies, motor vehicle bureaus, credit bureaus, travel-and-entertainment card companies, and insurance companies.

Figure 2-Less commonly used sources of information regarding individuals (number of respondents, to 1,000, citing each of: family, information suppliers, private investigators, police, other recipients, neighbours)

CUSTODY OF INFORMATION

As to management policies regarding disclosure of personal data, 55 percent of respondents said they have an unwritten policy, 33 percent have a written policy, and the rest have none at all. Non-profit institutions were twice as likely to have a written policy as were profit-making organizations. We inquired whether an explicit statement of the organization's policy was communicated. Responses revealed that it is highly likely that, where such a policy exists, it will be communicated to employees charged with records management but unlikely that it will be communicated to either the subjects of the records or to the general public.

As to policing the actions of staff with regard to misuse of personal information: 23 percent of respondents do not police the actions of their own staff; 67 percent do police the actions of staff but claim they don't catch any offenders; 10 percent police the actions of staff, catch some offenders, and prosecute or discipline the ones they catch. With respect to the likelihood that an organization will take effective action against its own employees for misuse of personalized information in its files, non-profit institutions were nearly twice as likely to take effective action as were profit-making organizations. The organizations most likely to take effective action were motor-vehicle bureaus, police, public utilities, credit bureaus, and health services.

Response to the question as to whether individuals on whom records are kept or groups representing their interests ever complain about disclosure of personal information revealed that four organizations get frequent complaints; 873 get none at all. Most likely to receive complaints regarding disclosure of personal data were motor vehicle bureaus, credit bureaus, educational institutions, law-enforcement agencies, social welfare agencies, and employment agencies.

DISSEMINATION OF INFORMATION

Regarding exchange of information with other organizations, 38 percent of respondents said they did exchange information; 62 percent said they did not. Most likely to disclose personal data outside their own organizations are motor vehicle bureaus, regulatory agencies, educational institutions, credit bureaus, health services, insurance companies, oil companies, and law-enforcement agencies. Information is most commonly furnished in response to specific requests. Publication of periodic reports for widespread distribution is a rarity.

EXCHANGE OF INFORMATION

We utilized information developed by analysis of responses to the Task Force questionnaire to construct a matrix illustrating the degree of exchange of personalized information among organizations. We found it convenient to classify these organizations as nurturing, that is, concerned principally with the well-being of the individual; business, that is, dealing with the individual on a quid-pro-quo basis; and authoritarian, or interested primarily in ensuring that their subjects conform to the norms of society.

Nurturing organizations tend to supply information to groups in the other two categories. Business-type organizations tend to exchange information freely, principally with organizations of the same general type. Authoritarian organizations appeared to gather personal information in a volume disproportionate to their relative number in the response base and to communicate little information to other organizations. These patterns of information interchange are depicted in Figures 3, 4, and 5.

Figure 3-Information interchange patterns of nurturing or subject-serving organizations show them as sources (bars compare size of category with amount of information obtained and supplied)
Figure 4-Information interchange patterns of business-type or self-serving organizations show them as dynamic storage elements
Figure 5-Information interchange patterns of authoritarian or society-serving organizations show them as sinks

With regard to international traffic in personal information, 61 organizations said they frequently supply information to U.S. organizations; 107 organizations said they frequently obtain information from U.S. organizations. We found only five organizations had their files entirely in the U.S.A. Organizations most likely to have some files containing personal data located in the U.S.A. were oil companies, associations (especially labor unions), insurance companies, health services, manufacturers, and lending institutions. About 10 percent of all organizations employing 500 or more persons had some of their files in the U.S.A. Ten organizations had their customers entirely in the U.S.A. and 10 organizations had their information recipients entirely in the U.S.A.

With regard to future intentions to locate files in the U.S.A., more than three quarters of responding organizations said they would not do so; 57 organizations said they already had files in the U.S.A. The remainder said they would do so to save money or if they would be placed at a severe disadvantage by not doing so. Figure 6 summarizes information developed regarding exchange of data between Canadian and U.S. organizations.

Figure 6-International information interchange: Canada-U.S. traffic (number of respondents, to 800, reporting sources in U.S.A., recipients in U.S.A., customers in U.S.A., files in U.S.A., would locate files in U.S.A., would process data in U.S.A.)
UTILIZATION OF COMPUTERS

Roughly half of our respondents (about 500) utilize electronic data processing equipment. Of these, about 300 have their own computers and 200 employ the facilities of computer service bureaus. Of the respondents having computers, about one-third have facilities for remote access from terminals.

The average computer user among our respondents first began computer processing of records in 1964-65. He procured his present machine in 1967; so we are looking at a group of computer users who were initiated on second generation computers and later upgraded to third generation machines, in other words, a population of sophisticated users. Figure 7 illustrates the trend in acquisition of computing equipment.

Figure 7-Trends in acquisition of central processors in Canada (number of respondents by year of first and last acquisition: 1960, 1964, 1969, after 1969)

EXTENT OF COMPUTERIZATION

Despite the widespread use of computers by organizations, the penetration of computers within organizations is not all that great. Relatively few computer users report that they hold computerized records on all or most persons in any given category; still fewer users report that they have computerized all or most information held on such persons. The following table summarizes Task Force findings with respect to the classification of files reported upon, the percentage of respondents who hold computerized records on all or most persons in each category, and the percentage of respondents who say they have computerized all or most of the information they hold on each of these persons.

Extent of Computerization (Percent of respondents)
                                                 Employees   Customers   Subjects
Subject matter of file                               31          55         14
Hold computer records on all or most persons         58          72         30
Have computerized all or most information            40          30         22

CHARACTERISTICS OF MACHINES

The average computer reported upon may be regarded as a large machine; 123 organizations have computers whose memory size exceeds 256,000 words of core storage. Average on-line disk storage capability appears to be adequate for remotely accessed time-sharing should the user so desire.

Average Characteristics of Computers
Core memory             133,000 words
On-line disk memory     130,000,000 bytes

One hundred twenty-four respondents said they had high-speed remote terminals. Of these, 102 had less than six terminals; 22 had six or more. Use of keyboarded remote terminals was reported by 134 respondents; 101 had less than 12 such terminals; 26 had from 12 to 200 terminals; seven had more than 200.

Seventy-three percent of respondents have implemented physical access controls over electronic data processing equipment; 39 percent have implemented hardware or software security measures such as passwords, terminal identification codes, or cryptographic coding; 42 percent routinely seek to establish the personal integrity of processing personnel; 58 percent report utilizing audit logs or other access-monitoring methods; 69 percent employ secure disposal methods for unwanted tapes or printouts; and 31 respondents report implementing security measures beyond access control, integrity checks on processing personnel, audit logs, and secure disposal methods.

Most computer users said they supplement their machine-sensible files with manual files.
Characteristics of Manual Files (Percent of respondents)
Supplement computer files with manual files                       90
Manual files contain more subjective information                  83
Manual files contain more sensitive or confidential data          75
Manual files contain more narrative or graphical data             70

ASSESSMENT OF COMPUTERIZATION

The following table summarizes the assessment of computerization by organizations providing responsive answers to questions in this category (i.e., organizations using computers):

Comments Regarding Computers (Percent of respondents)
Detected errors in records during computerization                          74
Computer improves routine data handling                                    51
Computer provides more complete and timely reports                         45
Computer is essential to operations                                        41
Computer permits collation of data regarding individuals                   32
Improved management planning is principal benefit of computerization        4

In addition, the importance of accuracy problems experienced with the computer was reported to be insignificant. Only 16 percent of respondents say that, as a result of increased retrieval capability after computerization, they are called upon to furnish more individually identifiable information to government agencies; and only 34 percent say that, as a result of computerization, they are called upon to furnish more statistical (aggregated) information regarding individuals.

The amount of data collected per given individual after computerization was reported to have increased. However, only 39 percent of respondents attributed this increase to the fact of computerization; on the other hand, 60 percent attributed the increased data collection to changes in organizational objectives or programs, or to increasing government requirements for collecting or reporting information.

RIGHTS OF SUBJECTS

The right of an individual to examine his own record or a copy of his record from the file is the cornerstone of many suggested reforms in the area of privacy of individual information. Following is a complete tabulation of answers to the question of whether or not this right exists:

                                                          Number    Percent
No response                                                  64       5.27
The individual does not know the record exists               62       5.19
He has no understanding of the contents of his record       135      11.03
He can examine all data in his record                       502      41.87
He can examine some data in his record                      291      23.21
He can examine no data in his record                        172      14.24

Right to Examine One's Personal Record by Type of File
                                     Employees   Customers   Subjects
Does not know record exists               9           43         10
Has no understanding of contents         31           85         19
All                                     169          257         76
Some                                    133          121         37
None                                     22          131         19

In cases where an individual is permitted to examine data in his record, we asked whether translation or interpretation was provided in an official language that the individual understands; 68 percent said it was. Organizations least likely to permit an individual to examine all data in his record include travel-and-entertainment card companies, market research firms, insurance companies, social welfare agencies, police forces, health services, employment agencies, and oil companies.

Response to the question of whether individuals on whom records are kept or groups representing their interests have ever sought to examine their own records or complained about the adequacy of an organization's practices regarding an individual's right to examine his own record revealed that eight organizations get frequent complaints; 867 get none at all. Most likely to receive complaints about the inability of a subject to examine his record are law-enforcement agencies, credit bureaus, and health services.

CONCLUSIONS

The acquisition of computing equipment has declined in recent years, which could indicate that the majority of those organizations who feel they could benefit from a computer have already acquired at least the main frame. However, utilization of computers for handling personal records is relatively low both in number of persons whose records are computerized (breadth) and the amount of information regarding each person who is computerized (depth). The fact that customer records tend to be most completely computerized both in breadth and depth demonstrates that the controlling factor behind decisions to establish or augment computer-based personal data banks may be based upon the expected economic return from this exercise. Thus economics, rather than either technical infeasibility or unavailability of data, has thus far inhibited the wholesale creation of personal data banks.

Much greater capability for remote-access computing exists than is currently being utilized. However, there is a growth trend in this area. This may have unfortunate consequences with regard to data security. With a few notable exceptions, computer users have not yet proved fully capable of safeguarding the confidentiality of computer-based files that are processed in the batch mode at a central location; and remote-access computing presents a whole new dimension of hazard in respect of unauthorized interception and intrusion.

A great deal more exchange of personal information takes place than is generally appreciated. There appears to be a flow of information that proceeds through stages from nurturing organizations such as schools and health services to authoritarian organizations. Therefore, it is quite likely that personal information volunteered by an individual seeking some social benefit in one context may be used in another context to impose sanctions upon him for failure to conform to some societal norm.

International traffic in personal data by large multinational organizations is already significant in volume and may easily double in the near future. Such traffic may adversely affect the quality of life and life chances of citizens in ways which are beyond the power of national governments to ameliorate.

The official report of the Task Force is available from Information Canada under the title: "Privacy and Computers Task Force Report." Details of empirical studies (Studies 2, 3, and 4) are available from the Department of Communications under the title: "Personal Records: Procedures, Practices, and Problems". This document contains a copy of the Task Force questionnaire and a tabulation of responses.
The reader is urged to consult also the report of Professor Alan Westin's study of the records problem in the U.S. This study was sponsored by the National Academy of Sciences. The report of a British study group was published in July 1972.

Hardware/software trade-offs-Reasons and directions

by RICHARD L. MANDELL
Compata, Incorporated
Tarzana, California

A hardware/software trade-off is the establishment of the division of responsibility for performing system functions between the software, firmware and hardware. This is part and parcel of the fundamental process of defining computer architecture. It begins the day a computer is conceived and may be carried on by an ever widening group of individuals until the last computer of a given model is retired. There are areas of the trade-off which are the sole preserve of the manufacturer and his hardware/software team. Other areas of the trade-off are the responsibility of the user, or independent equipment manufacturers.

TYPES OF TRADE-OFFS

Since hardware/software trade-offs occur in all areas of computer design and application, it is difficult to write about them without discussing most of the factors that enter into both hardware and software design. In this paper, an attempt will be made to define several classes of trade-offs and discuss the reason for each.

Some computers are microprogrammed. In these systems the microprogram resides in a fast control store and controls the flow of data through storage, transformation units and data paths. For the purposes of this paper, the microprogram will be referred to as firmware, and the control store and other functional units will be called hardware. In the discussion that follows, a conventionally organized wired logic control will be viewed as a part of the hardware.

Trade from software to hardware

The first class of trade-off is the trade from hardware to software or vice versa. Such a trade-off may involve transferring whole functions, such as memory protection, from one system to the other. On the other hand, a trade may mean merely shifting the boundary between system hardware and system software by providing different instructions or architectural features.

Trade from software to firmware

Many modern computers are microprogrammed. This introduces another trade-off possibility. Rather than introduce new hardware in place of software, the trade is often made between software and firmware. This trade may sometimes have no effect other than to speed up a system by eliminating main memory fetch cycles. On the other hand, since the microprogrammer has available to him data paths and parallelisms that are not available at the traditional software level, it is possible to perform functions that would not be feasible or efficient in software.

Trade from firmware to hardware

In designing a microprogrammed machine, a designer must decide which functions are to be performed strictly under hardware control and which functions are to be performed by sequences of microinstructions. He must also decide what fundamental data paths and functional units will exist within the machine. Both of these types of trade-offs constitute trade-offs between hardware and firmware. The hardware/firmware trade may be made without influencing the external architecture of the system.

Direction of the trade

Frequently, computer designers and users think only in terms of making machines bigger and faster.
However, there is always a market for smaller and simpler machines, as the manufacturers of minicomputers, smart terminals and desk calculators have discovered. Thus, frequently the design objective is to simplify the computer. Accordingly, we will arbitrarily consider the three elements hardware, firmware and software to form a hierarchy, with hardware at the bottom, with firmware next and with software at the top. An upward trade will then be defined as a trade in which responsibility for a function is moved through the hierarchy from hardware toward software. A downward trade moves in the other direction.

An inward/outward trade

In the list of trades considered so far, the hardware has been considered to be a single system. In many systems the hardware is really viewed as an interconnection of subsystems which may themselves be hierarchically organized (i.e., memory systems and I/O systems). The organization and function resident within or attachable to these subsystems is frequently a part of the hardware/software trade. For example, as more autonomous control is given to the I/O system, the requirement for software control of the I/O system may be simplified. Trade-offs which move function from the CPU to autonomous control units will be termed outward trades and trades in the other direction will be termed inward trades.

It should be noted that the outward trade may go so far as to remove a function from the computing system completely and place it in another communicating system. This is the case when printing is removed from the main I/O system and transferred to an autonomous off-line printer. Another example of this is an architecture which allows peripherals to communicate with one another without requiring service from the software.1

The outward trade is an impressive tool for system enhancement after the system architecture has been frozen. This is possible because the I/O system usually presents a clear stable interface to the outside world. Thus, autonomous processors such as sorters,2 communications handlers,3 array processors,4 and support processors have been attached to CPUs in order to perform functions that would otherwise be done by central processor software. Some architectures have made the I/O systems sufficiently powerful to take on the role of much of the supervisor.5 Though the outward trade-off can be a powerful tool, it often introduces expensive special purpose elements into the system. These elements can only be justified if the function that they perform is required frequently enough to make them economical.

REASONS FOR PERFORMING TRADE-OFFS

There are several reasons for performing hardware/software trade-offs:
• to achieve an otherwise unattainable performance goal
• to minimize overall system costs
• to reduce software complexity
• to achieve overall system reliability
• to extend system life
• to improve debugging aids
• to achieve compatibility
• to achieve market position

Achieving otherwise unattainable performance

One of the most common reasons for trading software for hardware is to achieve a performance that could not otherwise be obtained. This process ranges from the inclusion of internal features such as floating point arithmetic and index registers through the addition of specialized processors such as sorters and fast Fourier transform processors.28 These processors may be added to either the I/O system, the memory interface or the CPU.
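To make the direction of such a trade concrete, the following is a minimal C sketch, not drawn from the paper or from any particular machine, of a function sitting on the software side of the boundary: integer multiplication performed by shift-and-add, as it would be on a small processor whose hardware offers only add and shift. Trading the function downward simply replaces the whole routine with one multiply instruction; the function is unchanged, only its speed and hardware cost move.

/*
 * Hypothetical illustration: multiplication done in software by shift-and-add,
 * as on a machine without a hardware multiplier.  On a machine that provides
 * the instruction, the same function collapses into the expression a * b.
 */
#include <stdio.h>
#include <stdint.h>

static uint32_t soft_multiply(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1)          /* low multiplier bit set: add the shifted multiplicand */
            product += a;
        a <<= 1;            /* next multiplier bit weighs twice as much */
        b >>= 1;
    }
    return product;
}

int main(void)
{
    /* Both forms give the same answer; the trade-off is cost and speed, not function. */
    printf("%u %u\n", soft_multiply(123u, 456u), 123u * 456u);
    return 0;
}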
Many of the advanced features of present and proposed computers represent hardware/software or hardware/firmware trades that were made by the manufacturer. Prager 6 gives a good example of a set of tradeoffs for improving the performance of the inner loop of scientific computers. Minimize overall system cost A frequent goal of designers today is to minimize overall system cost. Thus, it is usually the case that the boundary between software and hardware is drawn in such a way as to minimize hardware costs or even the costs of the entire system, including software. Thus, upward trades are frequently made. They may even be left as an option to the purchaser. Many systems offer optional features such as floating point arithmetic which may be performed either by hardware or by software. 7 Two trends are visible in the marketplace today. One trend is to provide systems in which a large amount of function is being assigned to the firmware in preference to software. The other trend is to develop small fast computers with minimal instruction sets. Reduce software complexity A goal which is becoming apparent is to reduce the complexity of both system software and user software Hardware/Software Trade-Offs by the addition of hardware features which reduce the amount of overall code, provide enhanced run time support, or free the programmer from concerns about limitations of memory space. 8 To achieve overall system reliability Software often is subject to failures due to inadvertent over-writing and frequent changes. Thus, there is a tendency of some experimenters to move critical functions to more secure locations. The most secure location is in the firmware or har~ware. Another trend associated with reliability is to move I/O error recovery functions from the software to peripheral controllers or channels. 9 To extend system life In the field of computing, the life time of a system is sometimes measured by its ability to change. This adaptability to change is achieved by assigning hardware functions to software or firmware. This phenomenon is particularly observable in communications controllers. However, it is probably an important property of microprogrammed computers with writable control stores. Improved debugging aids Monitoring for software errors (such as exceeding the bounds of an array) is very expensive to achieve by means of software alone. However, if the monitoring is built into the hardware it becomes a practical debugging aid. 8 Other hardware aids include firmware monitors, which perform flow tracing, and interrupt schemes, which monitor for violations of system conventions. 10 The protection hardware, which is a part of many modern computers, is an example of a hardwarell •12 aid to debugging as well as an aid to system reliability. Compatibility The design of emulators represents an interesting exercise in hardware, software, firmware trade-offs. An emulator combines hardware, software and firmware for the purpose of executing instructions for a machine other than the machine· on which the emulator is run. The selection of the boundary between the three components can significantly affect the performance of the emulator. Another reason for examining the possibility of hardware/ software trade-offs is to achieve intra-line com- 455 patibility. When a whole series of computers must be compatible, there are serious constraints that must be placed on the performance of some members of the family. 
These constraints sometimes limit the performance of downward trades at the large end of the line. The compatibility may be achieved by means of upward trades in the lower performance end of a computer family. To achieve market position It is frequently very difficult to demonstrate the cost effectiveness of unique hardware or software features. However, one is led to speculate that a motivation for performing hardware/software trade-offs is to achieve product differentiation and create captive customers who depend on the existence of a unique feature. These customers cannot easily transfer to a different computer. Marketing considerations have driven many of the minicomputer manufacturers to provide system software that was not required when the first minicomputers were introduced. It is reasonable to speculate that this requirement will stimulate new and creative hardware/ software trade-offs for these small machines. Recurring nature of hardware/software trades One characteristic of hardware/software trade-offs is that they must be repeated each time a new computer is developed. In fact, hardware/software trade-offs appear at the heart of the design process. They must always be reevaluated in terms of design goals and constraints, as well as within the limits of contemporary technology. It has been characteristic that hardware/software trades have been performed to achieve high performance in the largest, fastest computers of an epoch or generation. At the same time or slightly later, smaller, more spartan machines without the high performance features are introduced. These machines are optimized for low hard ware cost. During the next epoch the technology evolves so that "advanced" features can be included in new machines at the same price as the smallest machines of the previous epoch. Concurrently, new, even cheaper machines appear without many of the "exotic features." This cycle repeats itself as time goes on. Inhibiting forces involved with hardware/software trades While designers would like to make whatever hardware/software trade-offs their imagination and tech- 456 Fall Joint Computer Conference, 1972 nological constraints allow, they are not always free to do so. Marketing considerations and the cost of developing system software often inhibit this kind of freedom. While there are no formal standards for computer architecture in the United States, manufacturers often impose a standard architecture derived from earlier machines. This permits salvaging of system software and allows users to move from older to newer machines. The effect of this overriding requirement is that many of the trade-offs that exist in machines today occur as trades between firmware and hardware and not between software and hardware or firmware. While compatibility with previous systems is an important inhibiting force, it is often relaxed to the extent that the features of an older system form a subset of the features of the newer system. Thus, the older software can usually be used on the new system. However, this implies an inability to use the new features. Thus, even though a machine may include new instructions, there may be considerable expense and delay in making these features available to the user through system software. This expense and delay severely inhibits the ability of designers to freely trade software for hardware and vice versa. EXAMPLES OF HARDWARE/SOFTWARE TRADES I/O system The I/O system in computers has traditionally been an area in which hardware/software trades have been made. 
Examples of both inward/outward trades and upward/downward trades can easily be found. Some of the reasons for the fertility in this area are:
• A high degree of parallelism is possible.
• The I/O system must deal with a large spectrum of data rates requiring different processing techniques.
• The I/O system is frequently controlled by system software, instead of user software, so that compatibility constraints can be maintained by software rather than hardware interfaces.
• I/O devices seem to undergo a more rapid change than CPU techniques.
The trade-offs that are usually considered lie in the following areas:
• method of transferring data to main memory
• method of monitoring for the completion of an I/O event
• the complexity of an I/O event that can occur between CPU system interventions
• the handling of error conditions

Method of transferring data to main memory

The method of transferring data to main memory depends upon the data rate that must be handled. In the simplest systems, bytes or words of data are deposited in a CPU register. Software is responsible for collecting the data together into main memory size words, transferring the collected words into memory, recognizing the termination of the transmission and analyzing the status of the I/O device. In systems requiring higher data rates, the data is block transferred into main memory by a hardware controller and the software is only responsible for initiating each block transfer and determining the status of the device. The saving in CPU time required can be at least one order of magnitude. In still more complex systems, the software is only responsible for starting a chain of I/O events. These run independently until they are completed.

Method of monitoring for completion of an I/O event

Just as there is a spectrum of techniques for transferring data between the I/O system and memory, there is a spectrum of techniques for monitoring for the completion of an I/O event. At one end of the spectrum the software is required to repeatedly test for completion of an I/O operation. At the other end of the spectrum an interrupt system is used for seizing control of the CPU when an event is complete. The interrupt system may itself offer a range of services which include saving of the machine status, identifying the I/O device which caused the interrupt and providing summary information about the nature of the event that caused the interrupt. All of these interrupt services are subject to hardware/software trades.

Complexity of an I/O event

One of the most important features in determining the amount of software overhead and the amount of software required is the complexity of an I/O event. In the simplest case, the transfer of a single character constitutes an event. In more complex systems, an event may consist of a large chain of block transfers. Devices have been constructed in which extremely complex events can occur as the result of a single command. Examples of such devices are graphics terminals and file processors.13 Thus, the event may be a lengthy search of a structured file or the sorting of a file. In these cases, the software overhead consists of building an adequately complex command to control the event rather than of monitoring for the completion of the event.
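The two ends of this spectrum can be sketched in software terms. The C fragment below is only an illustration under assumed conventions: the "device" is a simulated in-memory byte source rather than real hardware, so the program runs anywhere, but the two routines mirror the trade just described: programmed transfer, where software handles every byte, and a block transfer that software merely starts, with a completion routine standing in for the interrupt.

/*
 * Simulated contrast between programmed (per-byte) I/O and a block transfer.
 * The device, its byte source and the completion callback are stand-ins
 * invented for this sketch; on real hardware they would be device registers,
 * a channel or DMA controller, and an interrupt.
 */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

struct sim_device {
    const uint8_t *source;   /* data the device will deliver */
    size_t         next;     /* position of the next byte */
    size_t         length;
};

/* Programmed I/O: software fetches one byte per "ready" test. */
static size_t read_polled(struct sim_device *dev, uint8_t *buf, size_t n)
{
    size_t i = 0;
    while (i < n && dev->next < dev->length) {
        /* real code would spin here on a status register */
        buf[i++] = dev->source[dev->next++];
    }
    return i;
}

/* Block transfer: one "start" call; the controller (simulated by memcpy)
   moves the whole block and a completion routine is invoked afterwards. */
static void (*completion_handler)(size_t transferred);

static void start_block_read(struct sim_device *dev, uint8_t *buf, size_t n)
{
    size_t avail = dev->length - dev->next;
    size_t count = n < avail ? n : avail;
    memcpy(buf, dev->source + dev->next, count);
    dev->next += count;
    if (completion_handler)
        completion_handler(count);   /* "interrupt" on completion */
}

static void on_complete(size_t transferred)
{
    printf("block transfer complete: %zu bytes\n", transferred);
}

int main(void)
{
    static const uint8_t stream[] = "sixteen byte msg";
    struct sim_device dev = { stream, 0, sizeof stream };
    uint8_t buf[32];

    size_t got = read_polled(&dev, buf, 4);     /* software touches each byte */
    printf("polled read: %zu bytes\n", got);

    completion_handler = on_complete;
    start_block_read(&dev, buf, sizeof stream); /* software only starts it */
    return 0;
}

On a real channel or controller the same division of labor is fixed by hardware and firmware rather than by two C functions, which is exactly the trade-off at issue.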
Firmware/hardware trades associated with I/O systems

The trades discussed in the previous paragraphs have all been hardware/software trades. In implementing these trades the designer is also faced with a firmware/hardware trade at all control levels within the I/O hierarchy. In general, the trades are between the same services as discussed above. For example, if an I/O channel is to be implemented using shared CPU facilities,14 the designer has the choice of requiring the firmware to repetitively check for the completion of an I/O event or to provide for trapping the microcode when an I/O event occurs.

The handling of error conditions

Since errors occur frequently, the handling of I/O errors has usually been the responsibility of software. However, error handling can be the subject of an inward/outward trade-off. Errors are usually detected by some type of a coding scheme which involves examining both the meaningful data and a string of code bits that are transmitted along with the data. This examination can be done by either controller hardware, controller firmware, controller software or CPU software. The usual strategy in correcting errors that can be detected but not corrected by coding techniques is to repeat the transmission. The initiation and control of this retransmission is also a subject for hardware/software trades.

Trades in the CPU

Within the CPU itself there are many design trades that can be made in the hardware/software spectrum. The first group to consider are the downward trades, which move function from software to either firmware or hardware. Some examples of these functions are:
• context switching15,16
• task dispatching15
• register optimization17
• memory hierarchy management18
• storage protection12
• emulation
A second class of trades within the CPU is augmentation of the instruction set for the purpose of simplifying the work of the problem programmer. These trades involve augmenting the architecture to remove constraints or adding hardware macro functions such as sine or cosine. The class of trades which remove constraints is frequently associated with address space. These constraints include:
• the size of randomly addressable memory
• the size of the address field in the instruction
• the requirement for instruction and data alignment on word boundaries

Specialized systems

A class of hardware/software trades which is of particular interest is the specialized system. Two types of specialization can be seen in the industry. One class of specialization isolates a function such as sorting, matrix multiplication or fast Fourier transform. This may be implemented as a special purpose computer which either operates stand-alone or as a part of a host machine. Some systems have been built or proposed in which a restructurable portion of the system is temporarily configured to obtain high performance for a special function.19,20 The configuration of these reconfigurable machines is continually changing.

Another class of specialized machine is the machine which is optimized to execute programs written in a higher level language. These machines offer hardware or firmware compilers or translators plus an architecture which is tailored to provide specialized run time support for the functions provided by the language. One characteristic of this architecture is that the command structure is very similar to the constructs of the higher level language for which the machine was designed. This structure may be markedly different than that of the traditional computer and may be a variant on Polish notation or it may be a list or tree structure.
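As a rough sketch of why a Polish-notation command structure eases translation, the following C program evaluates expressions written directly in postfix order, roughly the way a stack-oriented language machine would execute them; the token format and fixed stack depth are invented for the illustration and no error checking is done.

/*
 * Minimal postfix (reverse Polish) evaluator.  An expression such as
 * "3 4 + 5 *" is executed directly: operands are pushed, operators pop
 * their operands, so no register allocation or address assignment is
 * needed by a compiler.  Illustration only; not any cited machine.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STACK_DEPTH 64

static double eval_postfix(const char *expr)
{
    double stack[STACK_DEPTH];
    int top = 0;
    char buf[256];
    strncpy(buf, expr, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (strchr("+-*/", tok[0]) != NULL && tok[1] == '\0') {
            double b = stack[--top];      /* operators pop their operands */
            double a = stack[--top];
            switch (tok[0]) {
            case '+': stack[top++] = a + b; break;
            case '-': stack[top++] = a - b; break;
            case '*': stack[top++] = a * b; break;
            case '/': stack[top++] = a / b; break;
            }
        } else {
            stack[top++] = atof(tok);     /* operands are simply pushed */
        }
    }
    return stack[top - 1];
}

int main(void)
{
    /* (3 + 4) * 5, written in the order a stack machine executes it */
    printf("%g\n", eval_postfix("3 4 + 5 *"));
    return 0;
}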
Since the internal machine architecture is closely related to the requirements of the source language, the compiler or translator is required to perform much simpler transformations than would be necessary for a more traditional machine architecture. Thus, the compiler or translator is inherently fast. In addition, if the compiler is implemented in firmware, it has the advantage of not requiring main store fetches for instructions and can possibly rely on some parallelism within the CPU. The execution time. support associated with these language specific machines includes specialized instruc- 458 Fall Joint Computer Conference, 1972 tions, data structures and storage management techniques that are tailored specifically to the language. In the process of providing this support, many functions normally performed by the operating system are moved to the hardware. Language specific machines have been developed for ALGOL,S FORTRAN, 21 EULER,22 SYMBOL,23 and APL.24,25 Emulators Emulators are an excellent example of trade-off between hardware, firmware and software. An interesting example illustrating the range of possibilities is the series of 1401 emulators available on several models of System 360 and System 370. The 1401 emulators on the smaller 360 models are implemented almost entirely by firmware and hardware. Almost all of the 1401 instructions are fetched and executed directly by the emulator microcode. The 1401 emulators on System 370 have a different organization. The emulator firmware implements several instructions which are not 1401 instructions, but which can be used in conjunction with the System 370 instruction set to construct short emulation routines (software) which interpret the 1401 programs. The next 1401 instruction is fetched and decoded by a special emulator instruction at the end of each emulator routine. This instruction forces a branch to the emulator routine that will simulate the next 1401 instruction. Thus, the 1401 emulator on the System 370 has a large software component. The software portion of the emulator also interfaces with OS/360 in such a way that emulator jobs use the normal data management and supervisor services provided by the operating system. Emulator jobs and non-emulator jobs can be mixed indiscriminately. Figure 1 shows the number of bits of control storage used by 1401 emulators in several System 360 and 370 models. The System 370 emulators use less control store, but more main storage. The main store is only used, however, when the emulator is in use. The 1401 emulator on the 360/40 is illustrative of firmware to software trade. The emulator includes a hardware translator which is used to convert 1401 addresses to physical System 360 addresses. The translation function could have been performed by firmware, but would have required considerably more time. CONCLUDING REMARKS This paper has examined some of the reasons for making hardware/software trade-offs and has shown some of the types of trade-offs that have been made in existing machines. Techniques for evaluating trade-offs are discussed in References 2,26 and 27. Though hardware/software trade-offs have been carried on throughout the history of computing, the recent introduction of machines that can be microprogrammed by the user should bring about new interest in the topic. Advances in system performance, measurement and modeling are providing better tools for evaluating hardware/software trade-offs and should lead to a more complete understanding of trades. 
Language specific machines, intelligent terminals, emulation, machines with firmware operating systems, minicomputers with enhanced capability and implementation of virtual memory will be intensely studied with reference to hardware/software trades during the next few years.

System      Control store used for 1401 emulation firmware (bits)
360/30      240K
360/40      224K
370/135     109.8K
370/145     38.4K
370/155     38K

Figure 1-Number of bits of control store in 1401 emulators on IBM computers

BIBLIOGRAPHY

1  Processor handbook, PDP11, Digital Equipment Corporation, Maynard, Massachusetts
2  H BARSAMIAN, A DECEGAMA, Evaluation of hardware-firmware-software trade-offs with mathematical modeling, Proceedings of the 1971 SJCC, pp 151-159
3  Introduction to the IBM 3705 communications controller, IBM Corporation, White Plains, New York, Form No GA 27-2051, 1972
4  J F RUGGIERO, D A CORYELL, An auxiliary processing system for array calculations, IBM Systems Journal, Vol 8, No 2, 1969
5  Control Data 6400/6600 computer systems reference manual, Control Data Corporation, Minneapolis, Minnesota
6  D PRAGER, Some notes on speeding up certain loops by software, firmware and hardware means, IEEE Transactions on Computers, Jan 1972, pp 97-100
7  System/360 model 40 functional characteristics, IBM Corporation, White Plains, New York, Form No 22-6881
8  E A HAUCK, B A DENT, Burroughs' B6500/B7500 stack mechanism, AFIPS Conference Proceedings, Vol 32, 1968, pp 245-251
9  J F KEELEY, System/370-reliability from a system viewpoint, Proceedings of the 1971 IEEE International Computer Society Conference, Boston, Massachusetts, pp 33-34
10 L ROBERTS, Can microcode be used to measure system performance, Proceedings of the 4th Annual Microprogramming Workshop, Santa Cruz, California, September 13-14, 1971
11 IBM system/360 principles of operation, IBM Corporation, White Plains, New York, Form No GA 22-6821
12 M D SCHROEDER, J H SALTZER, A hardware architecture for implementing protection rings, Communications of the ACM, March 1972, pp 177-184
13 2314/2844 multiplex storage control feature-airlines buffer, IBM Corporation, White Plains, New York, Form No GA 26-5714
14 S S HUSSON, Microprogramming, principles and practice, Chapters 7 and 8, Prentice-Hall Inc, Englewood Cliffs, N J, 1970
15 MAC computer reference manual, Lockheed Electronics, Los Angeles, California, Chapter 4
16 Sigma 7 reference manual, Xerox Data Systems, El Segundo, California
17 R M TOMASULO, Efficient algorithms for exploiting multiple arithmetic units, IBM Journal of Research and Development, Jan 1967, pp 25-33
18 A guide to the IBM system/370 model 165, IBM Corporation, White Plains, New York, Form No GA-20-1730, pp 14-24
19 G ESTRIN, Organization of computer systems-The fixed plus variable structure computer, Proc WJCC, 1960, pp 33-40
20 W CLARK, Macromodular computer systems, Proc SJCC, 1967, pp 335-336
21 T R BASHKOW, A SASSON, A KRONFIELD, A system design for a FORTRAN machine, IEEE Transactions on Electronic Computers, August 1967, pp 485-499
22 H WEBER, Implementation of Euler on the system/360 model 30, Communications of the ACM, September 1967, pp 547-558
23 W R SMITH et al, SYMBOL-A large experimental system exploring major hardware replacement of software, Proc of the 1971 SJCC, pp 601-617
24 R ZACKS, D STEINGART, J MOORE, A firmware APL time-sharing system, Proc of the 1971 SJCC, pp 179-191
25 A HASSITT, J W LAGESCHULTE, L E LYON, Implementation of a high level language machine, Proc of the 4th Annual Microprogramming Workshop, Santa Cruz, California, September 1971
26 J D FOLEY, An approach to the optimum design of computer graphics systems, Communications of the ACM, June 1971, pp 380-390
27 N R NIELSON, The simulation of time sharing systems, Communications of the ACM, July 1967, pp 397-412
28 M J CORINTHIOS, A fast Fourier transform for high speed signal processing, IEEE Transactions, August 1971, pp 843-846

A design for an auxiliary associative parallel processor

by M. A. WESLEY, S.-K. CHANG and J. H. MOMMENS
IBM Thomas J. Watson Research Center
Yorktown Heights, New York

INTRODUCTION

The use of highly parallel processing units for computing problems that are highly parallel in structure has been widely studied. The range of systems varies from the duplication of complete processing elements,1 through the provision of a set of specially tailored small processors attached to a main processor,2 to the use of cellular arrays;3 other writers have exploited the inherent parallelism of associative memories as components of parallel processing systems.4-7 Associative memories have been proposed either as true content addressable memories,5 or as processing units.4-6 In general, for use as a processing unit, each word in the memory, or possibly pairs or groups of words, is regarded as a serial by bit processing unit, all operating in parallel and controlled by a single program. These proposals have included rather complicated control systems to perform bit indexing and other functions necessary to sequence the memory through a program. An important extension to the concept of associative memories as processing elements was proposed by McKeever,8 who described the use of three state storage elements with increased logic function at each storage cell; a memory with this feature is referred to here as an associative functional memory. The use of three state cells as a general system technology for conventional sequential processors has been described;9,10 it is the purpose of this paper to demonstrate that:

1. An associative functional memory with suitable peripheral features could be used to implement many of its own control functions as well as performing processing operations, and could readily be assembled into a complete auxiliary parallel processor,
2. Such a processor would be an attractive means of enhancing the performance of small conventional processors in a wide range of problems.

SYSTEM DESCRIPTION

The associative processor to be described here is intended for use as a programmable auxiliary processor to assist a conventional main processor in special problems. Programs are loaded from the main processor and are used to load data, to process it, and to return results to the main processor. The main processor has at all times the ability to force the auxiliary processor to accept a new program or to branch to a specified location in its program. For applications involving the processing and reduction of very large amounts of raw data, for example, radar signal processing, it would be wasteful to transfer data to the associative processor by way of the memory and channels of the main processor. In these circumstances, the associative processor could be modified to accept data directly from its source, that is, to act as a pre-processor, but would not be expected to exercise control over the data source.

The overall design goals have been simplicity of implementation and generality of application. Simplicity of implementation has been achieved by construction from units which could be standard modules9 with a minimum of additional special logic, and has led to a potentially fast cycle time. Generality of application has been achieved by implementing many control functions in memory and by the inclusion of some extra associative memory features which are not necessarily required in all applications.

The proposed processor consists of two main components (Figure 1): a 1024 word X 64 cell associative functional memory and a 512 word X 50 bit read/write control store. The associative memory is used to store both data being processed and control information. An alternative would have been to have used separate memories; however, the use of a single unit permits the ratio of data to control information to be tailored to any given problem and enables a very simple control system to be used. On the other hand, the single array approach reduces the speed of data processing since many associative
In these circumstances, the assQciative prQcessor could be modified to. accept data directly from its SQurce, that is, to. act·as a pre-processor, but would not be expected to exercise control over the data source. The overatl design gQals have been simplicity of implementation and generality of applicatiQn. Simplicity of implementation has been achieved by construction frQm units which CQuid be standard modules9 with a minimum of additional special logic, and has led to. a potentially fast cycle time. Generality of application has been achieved by implementing many contrQI functions in memory and by the inclusion Qf some extra associative memQry features which are not necessarily required· in all applicatiQns. The prQPosed proceSSQr cQnsists of two main components (Figure 1): a 1024 word X 64 cell assQciative functional memory and a 512 word X 50 bit read/write contrQI stQre. The as~ sociative memQry is used tQstQre both data being processedand control informatiQn. An alternative would have been to. have used separate memQries; however, the use of a single unit permits the ratio Qf data to control information to be tailQred to any given prQblem and enables a very simple cQntrol system to be used. On the other hand, the single array approach reduces the speed of data processing since many associative 1. An assQciative functional memory with suitable peripheral features could be used to implement many of its own control functions as well as perfQrming processing operations, and could readily be assembled into a complete auxiliary parallel processor, 2. Such a processor would be an attractive means Qf enhancing the performance of small. conventiQnal prQcessors in a wide range of problems. 461 462 Fall Joint Computer Conference, 1972 ~ ~ ~ I/O Control "' , -Bit control - I/O Data .J' 'Program load j~ , "''' , ~ Conditions Word Control Associative Memory Array Control store ~ Controls Figure 1-Block diagram of the proposed associative processor memory cycles have to be used for control operations. It tends to be wasteful in the use of associative cells for control tables, and requires the introduction of extra features to reduce the interference between data and control. Control sequences for the execution of a program are contained in ,a read/write control store normally operating in a read-only mode. Conditional branches in the program may be made by testing the condition of various signals in the processor and its I/O interfaces. Program loading, ie., writing into the control store, is performed under the control of a short, permanent, initial load program. Input and output data transfers are made by way of the associative array bitconttol unit. Basic interface control is carried out by the control store which can generate outgoing and test incoming control signals; more complex I/O control, such as an IBM Standard Interface, requires the addition of an interface control unit. Attachment closer to the main processor (e.g., interfacing the main, memory) would give higher performance but would imply modifications to the main processor. Associative processing array The associative processing array is a two-dimensional array of three state (0, 1, X= "don't care") associative storage cells with arbitrarily chosen dimensions of 1024 words X 64 cells. The array is connected in the word direction to the word control unit and in the bit direction to the bit control unit. 
In an LSI implementation, the basic module could be a self-contained associative functional memory unit of, say, 128 words, complete with bit and word controls. lVlodules could readily be extended in the word direction by suitable interconnection of data and control lines; extension in the bit direction may be simulated by software. Three basic operations may be performed on the array: search, read, and write. Search A ternary search argument is generated in the bit control unit between the specified data register (Rl, R2) and the specified mask (M, alII's, all O's) on a bit Design for Auxiliary Associative Parallel Processor 463 Write by bit basis: Data ~ 1 Mask o X X 1 0 1 X = don't care Generation of search arguments. All cells, in parallel, compare their contents with the search argument for that bit column and generate a mis-match signal in accordance with the truth table: Cell Content 0 Two write commands are provided: Write Normal, and Write Special. In either case a ternary argument is generated in the same manner as a Search argument and acts on the contents of cells in selected words, as defined by the specified selector register in true or complement form (P, S, all 0). The effects on a cell are shown in the two truth tables below: Write Argument o Cell Content 0 1 X 1 No' change Search Argument 0 X 1 X X No change 1 X Write Normal 1 0 1 1 0 0 X 0 0 0 0 0 Write Argument Generation of mismatch signals Mismatch signals for a cell are ORed to give a mismatch signal for the word; word mismatch signals, in true or complement form, are sent to the word control unit where they may be ANDed or ORed with, or replace the contents of one of two sets of selector latches (P and S). Read The contents of a specified set of selector latches (P, S, all O's) in true or complement form are used to select words to be read. The contents of cells from selected words are ORed in the bit direction onto a read bus (an X state reads as zero) and sent to the bit control unit where they are used to load a specified register (Rl, R2, M) based on the value of mask specified (M, all 0, all 1): Mask 0 1 0 No change 0 1 No change Read Bus 1 Effect of Read operation on specified register. Write Special The word control unit may also perform a one bit shift of a selector register up or down with end around carry, or fill with 0 or 1; a shift takes the same time as an array operation or may be overlapped with an adjacent preceding array operation using the same selector register. This provides the only parallel means of communicating vertically between words. Other writers (e.g., McKeever, Reference 8) have usually specified other operations in the word control unit, such as isolate first match. Although provided by our simulator we have found little use for such operations, which tend to be serial in nature, and for the most part found that they can be economically simulated by software, e.g., by use of a code field. The exception was sorting with an arbitrary number of identical items, when a means of separately identifying multiple matches is necessary. The bit control unit contains three registers: two data registers (Rl, R2) defining a data source or sink for an array operation; and one mask register(M) defining a field for an array operation. Any array operation may use either data register and the mask register, or may replace the mask by a source of all O's or alII's. In addition, the control store may specify directly the leftmost four bits each for the mask and data registers. 
These bits (the immediate field) are ORed into the register outputs without affecting the contents of the register. A non-array operation, a single bit shift operation on any register may be specified; this feature is assumed to take the same time as an array operation unless it is overlapped with an adjacent array operation in which the register being shifted is a data source or sink; again, fill with 0 or 1 may be specified. 464 Fall Joint Computer Conference, 1972 Input-output operations Program loading Input-output operations for the associative processor take place through the bit control data register RL The register is divided into fields each of the same width as the I/O interface data busses. Data may be gated to or from the register under program control and is interlocked with the main processor by interface synchronizing signals. Outgoing interface control signals are generated by the control store and by the run control logic. Incoming interface control signals are either tested as machine conditions by the program, or act directly on the run control logic. Operation as a pre-processor, taking data from but not controlling another source, would require the ability to transfer into the processo:r from another interface and generate and test another set of I/O synchronization signals. This modification requires at least two extra bits in the control word and some extra logic, but is not p.xpected to be very difficult to implement. Program loading is performed under the control of a small fixed routine held in the first few words of the control store. The program load routine assembles data from the I/O interface into the bit control register RL This data is interpreted as a control word and the address of the location in the control store into which it is to be stored. The program load routine then gives a special signal "write next cycle" which causes the run control logic to break its normal cycle of read-only operation, and to spend one cycle writing into the control store from RL Note that the control store data register is not altered and is available for normal operation on the cycle after the write operation is performed. The "write next cycle" control also permits the transfer of programs from the associative array to the control store. Programming techniques Control store The control store (Figure 2) is a conventional (as opposed to associative) read/write store used to hold a program defining the sequence of operations to be performed by the associative memory.u During the execution of a program, the control store normally operates in a read-only manner. Each word read out specifies the operations to be performed in the array and also the address of the next program word. The next address may be modified by a condition in the machine, specified by the program word, enabling conditional branches to be made in the program. The control store contains 512 words of 50 bits, though these numbers may vary, depending on the features included. When formed into groups of mutually exclusive options, the operation options to be specified for the array processing unit fall into rather small groups, so that coding within a group is not very advantageous, and bit significant operation has been chosen. This has other advantages as it increases flexibility and eliminates timing delays through decoders. It is expected that a semiconductor memory will be necessary to be able to operate at the same speed as the array. Such. a memory will. 
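A plain-C functional model may help make the Search operation concrete. The sketch below assumes the conventional interpretation of the three-state cells, namely that a masked-out position is "don't care" in the search argument, that an X cell matches any argument bit, and that a word mismatches when any specified bit disagrees; the word width, array size, data layout and function names are invented for the illustration and are not part of the proposed design.

/*
 * Software model of a ternary associative Search: each stored word keeps a
 * value and a "care" mask (care bit 0 means the cell holds X), and the search
 * argument is formed from a data register and a mask register.  Assumed
 * semantics, for illustration only.
 */
#include <stdio.h>
#include <stdint.h>

#define WORDS 4
#define BITS  16

struct afm_word {
    uint16_t value;
    uint16_t care;    /* care bit 0 => that cell is X ("don't care") */
};

/* Returns a selector bitmap: bit w set => word w matched the search. */
static uint32_t afm_search(const struct afm_word mem[WORDS],
                           uint16_t data_reg, uint16_t mask_reg)
{
    uint32_t selected = 0;
    for (int w = 0; w < WORDS; w++) {
        /* A bit position contributes a mismatch only if the argument
           specifies it (mask_reg bit 1), the cell is not X (care bit 1),
           and the stored bit differs from the data register bit.        */
        uint16_t differs  = (uint16_t)(mem[w].value ^ data_reg);
        uint16_t mismatch = (uint16_t)(differs & mask_reg & mem[w].care);
        if (mismatch == 0)
            selected |= (1u << w);
    }
    return selected;
}

int main(void)
{
    struct afm_word mem[WORDS] = {
        { 0x00A5, 0xFFFF },   /* fully specified word                */
        { 0x00A0, 0xFFF0 },   /* low four cells are X ("don't care") */
        { 0x1234, 0xFFFF },
        { 0x0000, 0xFFFF },
    };
    /* Search for 0x00A5 in the low byte only (mask selects 8 bit columns). */
    uint32_t hits = afm_search(mem, 0x00A5, 0x00FF);
    printf("selector bitmap: 0x%02X\n", (unsigned)hits);  /* words 0 and 1 match */
    return 0;
}

In the proposed processor the resulting word match/mismatch signals would be combined into the P or S selector latches; here they are simply returned as a bitmap.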
have nondestructive read out, so that writing into the control store will require special control features. Subroutining capability is provided by a data path to the bit control register R1, enabling subroutine return addresses to be stored in the associative array.

Programming techniques

The guiding principle behind the design of the control system has been to make the hardware simple whilst keeping the system flexible. This principle led to the use of a single associative memory, controlled by a single conventional control store, with both data and control information stored and processed in the associative memory. Three classes of control information are held in the associative memory:

(1) mask and data register contents for operating on data. In the case of relatively simple operations, such as addition, these register contents are stored in consecutive locations in the sequence in which they will be needed, and are accessed by shifting a selector register reserved for the purpose. In more complicated operations, such as multiplication, where the total number of masks is proportional to p^2 (where p is the field width) and may be large, it may be advantageous to process the masks as data in the manner described in References 9 and 10, and to generate the required sequence of masks; the number of control words now becomes essentially proportional to p.

(2) program flow logic, including counts and logical decisions. These may be programmed directly or, in simple cases, may be implemented by inserting blank words in mask sequences and testing for an all zero read out. Note that the only internal condition tests available to the programmer are zero tests on the bit and word registers; an alternative would have been a test on a single bit.

(3) partitioning. The immediate field provides a fast software technique for partitioning the single array into groups of words. The four bits of the field permit 16 interleaved partitions of arbitrary size. This feature is particularly valuable for distinguishing and separating data and control information; for example, a 0 in the leftmost bit position may signify data while a 1 signifies control.

A further consequence of the use of a single array is the need to load and store the mask register from and to the array. The three array operations have been generalized for this purpose.

Figure 2-Control store connections

Figure 3a-Memory organization of addition: A' = A + B

Figure 3b-Program for addition: A' = A + B

Programming example: Serial-by-bit addition

This example is given to show: (1) the use of the immediate field; (2) the use of the associative array for both data and control information; (3) the ability to define fields independently of the program by means of control tables.

Suppose we wish to perform the addition of two fields, A and B, the result to overwrite field A, i.e., A' = A + B. The minimum possible number of array operations per bit is 6 (4 Search and 2 Write); however, this assumes no performance loss handling control operations. The addition algorithm given below takes 11 operations (9 if the inner loop is expanded to handle two bits consecutively). The algorithm uses 2p+3 memory words to store masks and data register contents (p is the field width); we have found that, in general, it is possible to trade less speed for less control storage.

The algorithm is illustrated in Figure 3. The first six instructions locate the start of an addition control table, load the two data registers R1 and R2 with constants which remain unchanged throughout the algorithm, load an initial pattern into the mask register, and initialize the immediate field.
Three bits of the immediate field are used: bit 1 indicates data or control words, bit 2 is an activity marker used to indicate whether a word has been completely processed in the current bit position, and bit 3 is a carry and is initially zero. Instructions 20-31 make up the main loop of the algorithm which proceeds in a·serial by bit manner starting with the least significant bit. At each bit position the no change condition in the A and carry bits is detected, and these words are marked as inactive. The remaining words are tested for changes in the A field and are updated. Indexing across the fields is achieved by the mask register contents, which are read sequentially from the control table. Execution of the loop ceases when an all zero mask is read out. APPLICATIONS The principal mode of parallel processing employed in this associative processor is serial by bit, parallel by word, over some selected subset of words in the memory. Thus a memory of 1024 words has a potential processing parallelism of up to 1024. Operating in a serial-by-bit manner across fields inherently requires more cycles than a conventional machine with bit parallel processing. This is particularly significant in arithmetic operations; for example, 16 bit addition requires about four times as many control cycles as a System/360 Model 30, 32 bit addition requires about eight times as many, and this must be more than cancelled by the parallelism used. At present we are limited to fixed or block floating point operation; normalization in general floating point is prohibitively time consuming. In bit manipulation operations, the programmable field feature (i.e., the ability to define fields by mask control tables stored in the associative memory) may enable the associative processor to take fewer operations than a sequential machine. The overall performance of the associative processor is affected by a number of overheads. It is assumed that the processor would be used for repetitive execution of a program, so that program and control table loading times need not be included in the problem-solving time. Input and output of data is sequential by word and can be very significant. In general, the processor as described with a single I/O data path is only suited to problems with a high processing to I/O ratio; however, multiple I/O data paths could be provided to each of a number of partitions. After each stage of parallel computation (e.g., after a vector addition) it is generally necessary to reorganize the data for a subsequent stage of processing; this too can use significant amounts of time and must be minimized by careful algorithm selection and memory organization. The performance of the processor has been studied with the assistance of a very flexible simulator program which allowed function truth tables to be defined at object time. Execution times, including processing, input/ output, and data reorganization, have been computed assuming a cycle time of 100 nsec, which is believed to be within the capability of an LSI technology. A wide range of examples have been studied for the associative processor and are discussed here without details of programming techniques. The aim in choosing examples has been to investigate the versatility of the associative processor and to demonstrate its performance on problems for which special purpose processors are being built. 
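To make the serial-by-bit, parallel-by-word mode of operation described above concrete, the following short Python sketch simulates it in software. It is not the authors' microprogram-level simulator: the ternary cell representation, the field layout, and the function names are illustrative assumptions, and the per-word loop merely stands in for what the array would do in a single parallel step.

```python
# Minimal software sketch of serial-by-bit, parallel-by-word processing.
# Cells are ternary: '0', '1', or 'X' (don't care).

def search(memory, argument):
    """Return one match flag per word for a ternary search argument.
    argument[i] == 'X' means bit column i does not take part in the match."""
    flags = []
    for word in memory:
        mismatch = any(a != 'X' and c != 'X' and a != c
                       for a, c in zip(argument, word))
        flags.append(not mismatch)
    return flags

def add_fields(memory, a_bits, b_bits):
    """Bit-serial addition A' = A + B over all words at once.
    a_bits and b_bits are column indices, least significant bit first;
    the fields are assumed to hold only 0 and 1 cells."""
    carries = [0] * len(memory)
    for col_a, col_b in zip(a_bits, b_bits):
        for w, word in enumerate(memory):      # conceptually one parallel array step
            a, b = int(word[col_a]), int(word[col_b])
            s = a + b + carries[w]
            word[col_a] = str(s & 1)           # result overwrites field A
            carries[w] = s >> 1
    return memory

# Two 4-bit fields per word: A in columns 0-3, B in columns 4-7 (LSB first).
memory = [list("10100110"),   # A=5, B=6 -> A'=11
          list("00011000")]   # A=8, B=1 -> A'=9
add_fields(memory, [0, 1, 2, 3], [4, 5, 6, 7])
print(["".join(w) for w in memory])          # ['11010110', '10011000']
print(search(memory, list("1101XXXX")))      # which words now hold A = 11?
```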
The examples are summarized in Table I; performance figures for the associative processor are based on a cycle time of 100 nsec and an I/O data rate of 1.5 μsec per byte.

TABLE I-Summary of the Performance of the Associative Processor

                        Distribution of total processing time
                        Processing   Data reorg.   I/O     Total processing time   Speed up over 360/30
Picture Processing          97%           -          3%        122 millisec               610X
Sorting                     70%           -         30%         20 millisec               110X
Matrix Mult.                31%           7%        62%          1 millisec                78X
Fourier Transform           17%          44%        39%         31 millisec                75X
Hadamard Transform           4%          46%        50%         12 millisec                79X
1-D Filter                  40%           -         60%         10 millisec               280X
2-D Filter                  50%          50%         -          20 sec                    510X

Picture processing

The functional memory may be regarded as a two-dimensional array of storage cells. Given a memory with suitable dimensions, two-dimensional pictures may be stored in two-dimensional form and, since neighboring point relationships are preserved, local processing operations may be performed directly and with a high degree of parallelism. Analog picture element values may be coded into a number of adjacent bits in either the bit or word direction; pictures too large for the memory may be partitioned and processed in separate pages, but this requires care in piecing the edge results together.

As an example, consider the application of a two-dimensional binary mask operator (nx by ny) to a binary picture (Nx by Ny) stored in the functional memory. The algorithm proceeds by searching sequentially for each line of the operator, centered on one column of the picture. The result of the first search operation is loaded into a selector register and shifted one position; the results of subsequent searches are ANDed into the previous selector register contents before shifting. After ny search operations, the selector register contains the full result of applying the operator to the column and may be either stored back into the memory or output; further columns may be processed sequentially. With Nx = Ny = 144, application of 25 operators with nx = ny = 7 takes 120 milliseconds and is estimated to be 610 times faster than a 360/30. Note that this problem gains performance through both the parallelism of the associative processor and its ability to tailor data fields to the needs of the current algorithm.

An alternative approach, suitable for on-line character recognition, would be to exploit the symmetry of the picture-operator system and hold the mask operators in the memory and search them with the picture as received from a scanner. This operation is the "feature extraction" process of character recognition; the resultant feature vector may subsequently be matched against a stored library of standard reference feature vectors; in both operations the three cell states may be used to represent ternary data. Distance measures between the feature vector and all the reference vectors may then be computed in a serial-by-bit manner; the recognition process may be completed by testing for the minimum distance using parallel search techniques.

Lewin sorting algorithm

The Lewin sorting algorithm12 was originally proposed for an associative memory with a special hardware feature to indicate whether a column contained all 0's or all 1's. This feature may readily be simulated by software on this processor; for example, searching for 1 on a data column and a subsequent read of a marker column containing all 1's will indicate whether or not the data column contains all 0's. An all 1's condition may be similarly detected. The algorithm finds, for example, the largest of a set of numbers by searching for columns containing a mixture of 0's and 1's. If no such columns exist, all the numbers are identical and are equal to the largest one. Otherwise, the leftmost mixed column is searched for numbers with 1 in this position, and the operation is repeated on this new subset. The number of operations taken by the associative processor to execute the algorithm is very data dependent; worst-case figures are given in Table I for an internal sort of 1000 items using 16 bit keys and show a speed up of a factor of 110 over a 360/30. As mentioned previously, a sort of identical items requires a means of isolating the components of multiple matches; in this example, where an arbitrary number of identical items may be present, software techniques require a wide code field and are therefore expensive. We have assumed the existence of an isolate first hit feature.

Tree searching

One of the major problems in artificial intelligence is to perform efficient tree searching. Since the number of nodes of a tree grows exponentially with respect to the depth of the tree, the tree searching time also increases exponentially, rendering deeper search impractical. It is clear that in tree searching the same sequence of computation and condition testing is performed on every node. Thus the basic requirement of "Single Program Multiple Data" processing is satisfied and we can perform computations upon all nodes in parallel. The tree may still have to be grown step by step, but this is probably unavoidable. It is difficult to define a typical tree searching problem and, since the performance of both the associative processor and a conventional processor is highly problem dependent, no performance comparisons are given. However, we note that performance improvements in the region of 2-3 orders of magnitude have been found in simple game-playing problems.

Matrix operations

Many matrix operations are inherently parallel in nature and may readily be programmed for the associative processor. Vector addition, subtraction, and multiplication operations, and summation of elements of a vector, may be executed very efficiently; division may be performed only with difficulty. Thus, matrix multiplication is very attractive, but operations involving a high proportion of divisions are not likely to show any great advantage on the associative processor. When only a small number of divisions are required, they may be performed by the main processor (e.g., pivotal element normalization in matrix inversion2). Fixed point multiplication of 10 by 10 matrices at 16 bit precision gives a performance improvement of 78 times. Larger matrix sizes may be partitioned to fit the processor and show approximately the same processing performance improvement because I/O time dominates.

Fast Fourier and Hadamard transforms

The fast Fourier13 and Hadamard14 transforms are closely related operations used particularly in signal and image processing. The radix-2 fast Fourier transform computes the Fourier transform of a set of points A1(0), ..., An(0) by means of a sequence of transformations A(0) -> A(1) -> ... -> A(m-1), where n = 2^m. Each of these transformations is made up of n/2 pairs of elementary operations of the form

    Ap(i+1) = Ap(i) + Wp(i) Aq(i)
    Aq(i+1) = Ap(i) - Wp(i) Aq(i)

where Wp(i) is a complex 2^(i+1)th root of unity in the fast Fourier transform, and 1 in the Hadamard transform. Each of the pairs of elementary operations in a transform may be performed in parallel and consists of a complex multiplication followed by a complex addition and subtraction; the result of a transformation may overwrite the input to the transformation. Many algorithms have been proposed for the selection of the pairs of indices p and q. The procedure chosen for use here selects the indices in a regular manner and allows efficient use to be made of the select latches as a means of parallel communication between words. For the first transformation A(0) -> A(1), Ap and Aq are n/2 words apart; for A(1) -> A(2), n/4 words apart, etc. However, this procedure has the disadvantage that if the input data are in order, the results will be permuted with their addresses in bit reversed form, though this may be corrected when the results are transferred back to the main processor. The basic steps of the algorithm have already been described15 for an associative processor with external storage and separated data and control functions. The implementation of a useful size of Fourier transform within this associative processor requires the use of a larger memory array. The principal reasons are the need to store the complex roots of unity and the inclusion of an address field to enable blocks of operands to be identified rapidly. A 1024 point complex transform with 14 bit precision may be fitted into an associative memory of 1273 words of 89 bits with a performance approximately 75 times faster than a 360/30.

The Hadamard transform may be regarded as a square wave analog of the sine and cosine wave Fourier transform and has many advantages from a computational point of view. In particular, the use of square waves of amplitude ±1 makes multiplication unnecessary, and an ability to generate square wave transition lengths for a transform of length 2N from a transform of length N removes the need for a stored table of coefficients. The Hadamard transform also has a fast Hadamard transform algorithm. Performance on the associative processor for a 1024 point real transform is shown in Table I. Note that in both these transforms data reorganization becomes very significant.

Digital filtering

Digital convolutional filters of the form

    y(t) = sum over r = 1, ..., n of x(t-r) g(r)

where

    g(t) is a filter of length n
    x(t) is the filter input
    y(t) is the filter output

may be implemented on the associative processor in a number of ways, the choice depending principally on the dimensions of the problems, e.g., filter length, data record length, and number of filters. The most efficient method, in the sense that I/O operations are minimized, is to store the filter vector g permanently in the memory and to regard the data points as scalar inputs operating on all elements of the filter vector. This method is applicable when N does not exceed the number of words in the memory, where N represents either a single long filter or a number of shorter filters of equal length. Note that, in the Single Program Multiple Data form of parallel processing, a scalar operation on a vector differs from element operations between a pair of vectors in that it is now possible to perform look-ahead operations when processing the scalar quantity, thereby approximately halving the execution time. The algorithm assumes that the memory is partitioned into two fields of equal size, one for the filter vector g and the other for partial results.
Processing proceeds in a pipeline manner-a new data point is received and used as a scalar multiplier on the filter vector, the products being added into the adjacent partial result field. The partial results are shifted one word position and the process repeated with the next data point. After the first n data points have been processed, one output result will be available for each filter held in the memory; thereafter, output results are available after each new data point has been processed. i i I i Design for Auxiliary Associative Parallel Processor An alternative method to be used when the filter is short is to load the memory to capacity with data points and to apply the filter coefficients as external scalars. When the whole filter has been applied, all the results may be read out. The processing time for this method is the same as that for the stored filter, but the I/O time is significantly greater. Two examples have been considered, both using the stored data method. The first is a typical seismic signal processing problem and has a 1000-point data record, a 25-point filter, and operates at 16 bit precision. The second is a picture processing problem similar to that posed by Mariner pictures· with a picture of 600 X 684 elements, a two-dimensional filter of 15 X 15 points, and operates at 8 bit precision. The results are shown in Table I. In spite of the I/O overheads, the performance improvements are large; in particular, the space picture processing performance reflects the ability of the associative processor to tailor its field lengths to the problem. Convolutional decoding In this example, the associative processor is used to perform error-correcting decoding operations. The Viterbi decoding algorithm16 is given as an example; however, in order to understand the decoding algorithm it is necessary to first describe the coding process. The encoder has the canonical form of a shift register of length S. Each time an information digit is encoded, the contents of the shift register are shifted right, the rightmost bit being discarded, and the information digit is stored in the leftmost bit of the register. The encoded message bits are the modulo 2 sums of some bits in the shift register; the ratio of information bits to encoded message bits is known as the code rate. The present contents of the shift register may beregarded as the state of the encoder, and a state transition diagram may be constructed for every input of an information bit. The Viterbi decoding algorithm is based on storing, for each possible state of the encoder, a history of the most probable, in some sense, sequence of information digits to reach that state. Each state and its history has a distance measure associated with it. When a new set of encoded message bits are received, the histories are updated by computing the error distam~e between the received bits and the true bits corresponding to each state transition, and adding this to the distance measure for the corresponding history. The histories are arbitrarily restricted to a length 3(S -1) and, after each updating, the bit 3(S -1) bits away in the history with the lowest distance measure is output as a decoded bit. 471 A number of examples have been studied with various values of S and code rate. Comparisons have not been made with a conventional machine because special purpose processors are being built for these decoding problems. For S = 6, rate = 72, the associative processor takes 100 JLseconds per bit, i.e., 10K bits/sec. 
For S = 9, rate = %, the processor takes 370 JLsecs. per bit, i.e., 2.7K bits/sec. This variation in performance with shift register length S is almost entirely caused by an increase in data reorganization overheads caused by a larger number of encoder states. CONCLUSIONS The auxiliary associative processor described in this paper has been shown to have a high performance on a wide range of problems which are inherently parallel in structure. The major drawbacks have been found to be in the processor's ability to handle only fixed point or block floating point arithmetic, and the difficulty of performing division. The principal system problems have been in the operating overheads of I/O and data reorganization. The I/O overhead could be reduced by integrating the associative processor into the main processor, which would also permit more complex interaction betwee~ the processors or by providing multiple I/O paths. The data reorganization overhead is caused mainly by long shift operations in the selector latches; these could be reduced by hardware ·and/or software partitioning of the memory, enabling inactive blocks of words to be by-passed. In Reference 17 the data reorganization problem is studied in detail. The consequences of using a single array to hold both data and control information are hard to isolate. In operation time the overhead for control operations is always less than 50 percent (arithmetic operations) and is generally much less. In memory space, the price is at least one bit of each word (the immediate field) and up to 25 percent increase in size (Fourier transform). In contrast, the flexible partition between data and contr61, and the ability to tailor fields to the problem have proved very powerful. The decision to use the three-state cells of McKeever8 was based on the aim for generality of application. In parallel binary arithmetic operations only two of the three states have been used; however, the third state has been used for data representation in picture processing and tree searching, and for implementation of control functions in all examples. In practice the threestate cell may be implemented by 2 two-state cells; the possibility then exists of having two-state cells individually for 2 state operations, and in conjunction for larger numbers of states. In this paper we have not pursued such an approach. 472 Fall Joint Computer Conference, 1972 Economic realization of the processor requires the availability of high performance, low-cost integrated circuit technologies. However, the system design has aimed at the use of only a small number of different components, most of which are memory rather than random logic. The components used could be a standard technology suitable for both conventional and parallel systems. The performance figures quoted in this paper have been obtained with the aid of a software simulator at the microprogram level. Little work has been done on the development of a higher level language or assembler for the processor. ACKNOWLEDGMENT We are indebted to Dr. R. Lyons for drawing the Lewin sorting algorithm to our attention and pointing out its suitability for execution on an associative functional processor. 
REFERENCES

1 D L SLOTNICK W C BORCK R C MCREYNOLDS  The Solomon computer  Proc FJCC pp 97-107 1962
2 B A CRANE J A GITHENS  Bulk processing in a distributed logic memory  IEEE Trans on Elect Computers Vol EC-14 pp 186-196 April 1965
3 J H HOLLAND  A universal computer capable of executing an arbitrary number of sub-programs simultaneously  Proc FJCC pp 108-113 1959
4 G ESTRIN R FULLER  Algorithms for content-addressable memories  Proc IEEE Pacific Computer Conf pp 118-130 1963
5 R G EWING P M DAVIES  An associative processor  Proc FJCC pp 147-158 1964
6 R H FULLER R M BIRD  An associative parallel processor with applications to picture processing  Proc FJCC pp 105-115 1965
7 J A GITHENS  An associative, highly parallel computer for radar data processing  Parallel Processor Systems Technologies and Applications editor L C Hobbs pp 71-86 Spartan Books 1970
8 B T MCKEEVER  The associative memory structure  Proc FJCC pp 371-388 1965
9 M FLINDERS P L GARDNER J G MINSHULL R J LLEWELYN  Functional memory as a general purpose systems technology  1970 IEEE Computer Group Conference June 1970
10 P L GARDNER  Functional memory and its microprogramming implications  IEEE Trans on Computers Vol C-20 No 7 pp 764-775 July 1971
11 D A SAVITT H H LOVE  Association storing processor study  Hughes Aircraft Technical Report No TR-66-174 (AD 488538) June 1966
12 M H LEWIN  Retrieval of ordered lists from a content addressed memory  RCA Review June 1962 pp 215-229
13 G-AE SUBCOMMITTEE ON MEASUREMENT CONCEPTS  What is the fast Fourier transform  IEEE Trans Audio and Electroacoustics Vol AU-15 pp 44-55 June 1967
14 W K PRATT J KANE H C ANDREWS  Hadamard transform image coding  Proc IEEE Vol 57 No 1 Jan 1969 pp 58-68
15 M A WESLEY  Associative parallel processing for the fast Fourier transform  IEEE Trans on Audio and Electroacoustics Vol AU-17 No 2 pp 162-165 June 1969
16 A J VITERBI  Error bounds for convolutional codes and an asymptotically optimum decoding algorithm  IEEE Trans on Inf Theory April 1967 Vol IT-13 No 2 pp 260-269
17 S K CHANG  Parallel computation of local operations  Proc Third ACM Symposium on Theory of Computing May 1971 pp 101-115

An eclectic information processing system*

by R. CUTTS, J. HAYNES, H. HUSKEY, J. KAUBISCH, L. LAITINEN, G. TOLLKUHN, and E. YARWOOD

University of California
Santa Cruz, California

INTRODUCTION

A Turing machine, although universal in the required sense, is far too cumbersome. Thus, it becomes necessary to choose for implementation a set of primitive processes that is a compromise among generality, complexity, and speed. Any particular operation might be built-in or composed from more primitive operations. Except for speed the user does not know the difference; and since the primitives are universal he can always use them to build other high-level operations to suit his needs. In other words, an extensible system is wanted. Of course there are other desiderata for a computer system besides extensibility and universality, such as conciseness, naturalness, ease of learning, self-documentation of programs, ease of generating efficient runtime representations, etc. Instead of concentrating on just the programming language, this project aims to build an integrated language and logical structure for computing. Recursive string processing is the basis of this system. This is a general technique; most computer applications can be characterized as operating upon an input string to produce an output string as specified in a procedure string.
Properly outside this scope are computer graphics applications, since these apply to two-dimensional objects rather than to strings. However, as usually instrumented, even graphical processing begins by transforming a picture into a string of point values, and ends by stringing point values together for the output device. .. . It is fair to question the wisdom of desIgnmg a timesharing system in the light of the one-user free-standing computer-on-a-chip promised by MOS technology. There are really two questions: (1) Are there resources which can usefully be shared among users? (2) Can resource-sharing be implemented economically? The authors offer an affirmative answer to both of these questions. The first resource worth sharing is high-speed storage. Users have varying storage requirements, so that·with individual machines of any storage size there This paper is a progress report on a computer system Which is now being designed and constructed. As the title indicates, ideas that seem good have been taken from many different sources. Many features of contemporary large systems that were earlier incorporated into a plan for a large machine! are now being applied to this smaller system. The new design is grounded in hardware string pro~IIII'I cessing, affording a greater generality of application than is typical in existing small systems. The structure employs multiple, specialized sub-processors operating concurrently. Interrupts are not needed; rather the natural breakpoints in evaluating expressions cause I p~ocessors to move from one task to anothe~ under the control of hardware queues. Main storage IS allocated in variable-length segments. Parsing hardware facilitates the use of input languages of familiar style. Disk storage is provided only for currently-active users; long-term storage of user data is on personal magnetic tape cassettes. The system structure readily admits the addition of evaluating processors designed to improve performance in specific areas. Vector arithmetic and graphical di~ play of functions are two examples in the prototype system. Aside from its computational capabilities, the sysI' tem can serve as a communications "front end" for a , large computer system. . The goal is a relatively inexpensive system that wIll serve the needs of a diverse community of users, such as the faculty and students of a small college. These potential users cannot foresee their information processing needs any better than they can foresee the results of their yet-undone research. This means that the only completely satisfactory system is one that is universal, I, i.e., one that can compute anything that is computable. I I I I I I, I I I * This work was supported by the National Science Foundation, Grant GJ-30436X. 473 474 Fall Joint Computer Conference, 1972 would always be storage going to waste in some machines while other users could not run for lack of storage. The economy-of-scale argument to justify resourcesharing, while no longer convincing as far as hardware cost is concerned, remains valid in considering the incremental cost of adding users to the system. For individual machines the cost of serving N users is N times the cost of one machine. For N users of a shared system, the cost per user may exceed the cost of individual machines for small N, but for large N the cost per user tends to decrease. This comes about for two reasons: (1) Adding one user without increasing system resources does not seriously degrade service to the other users. 
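As an illustration of the shortest-seek-time-first policy just mentioned, here is a minimal Python sketch that orders pending requests by proximity to the current head position. It is not part of the authors' design; the function name and the track numbers are made up for the example.

```python
# Shortest-seek-time-first: always serve the pending request whose track
# is closest to the current head position.

def sstf_order(head, pending_tracks):
    """Return the order in which pending track requests would be served."""
    pending = list(pending_tracks)
    order = []
    while pending:
        nearest = min(pending, key=lambda t: abs(t - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order

print(sstf_order(50, [82, 170, 43, 140, 24, 16, 190]))
# [43, 24, 16, 82, 140, 170, 190]
```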
(2) The peaks in moment-by-moment demand for resources tend to average out, so that only the total average demand must be satisfied rather than the sum of the peak demands. String processing is characterized by an unusually dynamic pattern of storage requirements, since the objects of computation frequently vary in length. It is less clear that processor time is worth sharing since the price of processor logic has dropped by several orders of magnitude; while the cost of software to keep the processor busy has increased considerably. In this system the problem is largely avoided by employing a number of dissimilar, cheap sub-processors operating simultaneously; and by taking advantage of natural breakpoints in execution to switch a processor from one user to the next. The several processors are driven by hardware queues. Interrupt control of them is not needed, because any process requiring service simply makes an entry in the appropriate queue. The economics of hardware still make it worthwhile to employ more than one level of storage in a system. Currently, rotating memory remains the choice for secondary storage. The difference between the speed of fast storage and the average access time of rotating storage is a severe problem. If requests for disk transfers are handled on a first-come, first-served basis, one average access time is required for each transfer. Aside from slowing down user processing this is a gross waste of memory residence time. A ·block of data in memory sits around for several milliseconds awaiting disk access and is then read or written very quickly. Utilization of fast storage can be improved considerably by employing a "smarter" disk controller. For a disk-to-main transfer the controller can delay seizing memory until just before transfer starts; and for a main-to-disk transfer the controller might write the data into the first available location rather than waiting for some location specified by the processor. It is now well-known that a disk controller should schedule its oWn tasks according to a shortest-seek-time-first policy. USER LANGUAGE To construct a computer system like the one contemplated, one typically selects an existing general purpose computer, then selects, or designs, a user language, providing appropriate extensions to the language to give the user access to all functions of the system. Next an interpreter is implemented for the user language in the machine language of the selected computer. In this project, however, a general and extensible user language is being designed first, after which the set of small processors to interpret the language will be designed and constructed. In recursive string processing the primitive element dealt with is the variable-length character string. Numbers, identifiers and subroutines, as wel~ as arbitrary strings of text, are all examples of this type. The result of evaluating a function may immediately be used as input for further evaluation. Examples of earlier recursive string processing systems are Calvin Mooers' TRAC®2 and Christopher Strachey's General Purpose Macrogenerator. 3 The simplicity of the scanning algorithm for the TRAC language follows from the fact that the source code is already in a form of prefix-Polish notation. The Polish string is the proper starting point for the design of ~ machine language (see Barton4 ). This notation makes TRAC somewhat awkward as a user language. 
The system described here provides a more conventional user language called ZIP, using infix operators with operator precedence, function notation, and other such attributes, but giving the user access to a set of string manipulation primitives like those of TRAC. Such a notation is clearly desirable, especially for arithmetic computation, allowing one to write I A=B+C*2- (5+C); or S=CONCAT(S, [THE HOUSE.]); instead of (the equivalent TRAC expressions) : # (DS, A, # (SUB, # (ADD, # (CL, B), # (MUL, # (CL, C), 2)), # (ADD, 5, # (CL, C)))) or .# (DS, S, # # (CL, S) THE HOUSE.) Note that in ZIP one may refer to the value of an identifier without marking that identifier with the CL (call) function, as is necessary in TRAC. On the other hand, quoted strings must be marked by square brackets in ZIP, where no such quoting is necessary in TRAC. To implement such a language, there is a hardware processor, called the parser, to take text from the source string and place operands on an evaluation stack I I I An Eclectic Information Processing System in the correct sequence for evaluation by the evaluatorprocessor. The interpretation of source text is accomplished by alternating actions of the parser and evaluator. The parser places operands on the stack until an operation is called for, then signals the evaluator to perform that operation, after which control is returned to the parser. In addition to the source string and the evaluation stack, each user will have a parse stack, to be used exclusively by the parser in rearranging operators from the source string. The basic algorithm for infix-to-postfix-Polish translation using a stack is discussed thoroughly in References 6, 7, and elsewhere. In our interpretive scheme, operations are performed at appropriate steps in the parsing of source code, rather than operators being placed in a postfixPolish code string for later execution. In either case the algorithm for translation from infix notation is the same. By keeping these two parts of the system, the parser and the evaluator, conceptually and physically separate, either the language or the evaluator primitives may be redesigned without extensive global design changes. . Implementation of this parser in hardware may seem a formidable task, but the translation process will be simpler than in conventional compilers because of the direct correspondence between the source-language operators and the primitive operations of the evaluation processor. An example of this is the if-then-else construct, which could be of the form: I~ I IF (logical expression) THEN [(statements)] ELSE [(statements)];. Instead of emitting test and branch instructions into object code, as would a typical compiler, our interpretation scheme need only present the evaluator with the following top-of-stack configuration: /logical value / string of text / string of text / 475 The processors are independent hardware devices all referring to the same memory. They are: 1. Input processor 2. 3. 4. 5. Output processor Parser Evaluator Allocator Each processor has a request table with an entry corresponding to each user. Each processor scans its table in round-robin fashion and performs tasks for any user for which a request exists. The input processor The processor scans for input characters from each user (actually from each input terminal on the system). It has two tables; one indicates for each user whether input is expected, and the second specifies whether input is to be echoed on the user's output device. 
If an arriving character is expected then it is placed on the user's evaluation stack. If it is a terminator symbol (found by checking the character against the list of terminator symbols for that user), then input expected is set to zero for that user, and his entry in the parser request table is set. If echo is "on" the character is displayed. If input is not expected, the character is compared with a list of special symbols such as "log-on", "suspend" (pause) , " continue" (start up after pause), "kill" (stop doing everything), etc. If it is none of these it will be ignored. Log-on causes initialization of pointers and the assignment of initial segments for stacks. Suspend disables a user and no processor will take 'any action for him. Continue re-enables the user and processing continues. Kill clears requests in all tables and initializes that user. top of evaluation stack---.J and call for the if-then-else operation. This causes the The parser I evaluator to select the first or second string of text, de- I . pending on the value of the logical expression. The selected string is then copied into the source string and interpreted. . 'THE PROCESSOR SYSTEM In order to simplify this presentation, a five-processor Isystem will be described. Whereas economics forces one 'Ito use several levels of memory (e.g., high speed, disk, :and magnetic tape), this presentation will be in terms ',of a single level. lll' The parser scans its table for requests. If a request exists for a particular user the parser starts obtaining characters from that user's source string. An item from the source string is compared with the top of the parse stack, and depending upon a precedence table, the item may be (1) discarded, (2) placed on the parse stack, or (3) placed on the evaluation stack. Marks are automatically placed in the evaluation stack to delimit multicharacter items. If an operation is to be performed then the evaluator request table is marked and the parser request table entry for that user is cleared. 476 Fall Joint Computer Conference, 1972 The evaluator The evaluator scans its table and for a given user's request performs the operation specified on the top of the evaluation stack. When the operation is complete the parser request table is again marked for that user. Thus, for each user the parser and the evaluator are alternately processing text and performing operations. The evaluator does the conventional arithmetic operations, string operations, etc. Named operands are kept in a linked list of segments called the form store. A form segment contains both the name of an operand and its value. To fetch an operand the evaluator searches the user's form store, and upon finding the form name copies the corresponding value. To store an operand, form store is searched to find any previous instance of the name. The allocator is called to delete this segment (the memory space is returned to the free list). Then the allocator is requested to provide space for the new form. The name and value are copied from the evaluator stack into the new space. The allocator The allocator answers requests to release memory and to obtain memory. The release process involves connecting the released segment into a list of free segments. If either of its neighbors is free merging takes place to produce the largest possible contiguous free segment. 
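The release-and-merge bookkeeping just described, together with the first-fit obtain policy detailed next and under STORAGE ALLOCATION below, can be sketched in a few lines of Python. This is an illustrative model only: the real allocator is a hardware processor that keeps its linkage inside the segments themselves, and the class and method names here are invented for the example.

```python
# Simplified free-list model: first-fit obtain, merge-on-release.
# Segments are (address, extent) pairs kept sorted by address.

class Allocator:
    def __init__(self, total):
        self.free = [(0, total)]

    def obtain(self, size):
        """First-fit: take the first free segment that is large enough."""
        for i, (addr, extent) in enumerate(self.free):
            if extent >= size:
                if extent > size:
                    self.free[i] = (addr + size, extent - size)
                else:
                    del self.free[i]
                return addr
        return None                       # request unsatisfied

    def release(self, addr, size):
        """Insert the segment and merge with free neighbours when possible."""
        self.free.append((addr, size))
        self.free.sort()
        merged = []
        for seg in self.free:
            if merged and merged[-1][0] + merged[-1][1] == seg[0]:
                merged[-1] = (merged[-1][0], merged[-1][1] + seg[1])
            else:
                merged.append(list(seg))
        self.free = [tuple(s) for s in merged]

a = Allocator(1024)
p = a.obtain(100); q = a.obtain(50)
a.release(p, 100)
a.release(q, 50)                          # merging rebuilds one free segment
print(a.free)                             # [(0, 1024)]
```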
Obtaining memory for a user involves (1) scanning the free list to find a segment long enough, (2) disconnecting it from the free list, and (3) returning the address of the segment to the user. STORAGE ALLOCATION Main storage is addressable by byte, and is allocated to processors in variable-length segments. While segmentation is more difficult to implement than some other storage allocation schemes, and while it tends. to tie up some storage for bookkeeping, it is desirable in a system in which all data are variable-length strings. The smallest possible segment is nine bytes, as a not-inuse segment contains this much linkage information. An in-use segment typically has four or five bytes devoted to linkage. The largest segment can, in principle, be as large as all of storage; but, in practice, the segments that can be assigned to processors are restricted In SIze. When the system is initialized, storage is partitioned into two segments. The segment beginning at address zero is called the base segment, and contains system global information. The remainder of storage is formed into a single free segment. As the $ystem runs, processors request and release space, causing the free storage to become fragmented. The free segments are doublylinked together into a chain. In-use segments are chained to various lists belonging to individual processors. These lists include the stacks and form store. The formats of segments vary, but all contain the extent of the segment in the first two bytes. The allocator is a processor which has the task of managing the free storage chain while servicing processor's requests to obtain or release storage. All such requests are made to the allocator. A reserved location in the base segment contains a pointer to the beginning of the free chain. Another reserved location contains the total amount of free space currently available, ignoring fragmentation. When a processor requests additional space the allocator checks whether this much space is available at all. If so, it begins a search of the free chain for a free segment at least as large as requested. The free list is not ordered in any particular way. In searching for space the allocator chooses the first segment that is big enough rather than looking for a particularly close fit. This policy is one of Knuth's5 recommendations; a best-fit takes more time than a first-fit policy and tends to proliferate small free segments that are rarely useful. When the allocator finds a sufficiently large segment there are two possibilities: the segment may be large enough to satisfy the request with a usable amount of space left over, or the segment may be an exact or close fit. In the former case, space is excised from the tail of the free segment and formed into a new segment. In case of a close fit there is not enough space left over to form a free segment, so the entire segment is taken from the free list; the extra bytes, if any, are marked null. If the request can be satisfied the address of the segment is returned to the calling processor; otherwise it moves on to another task with the request unsatisfied. When a processor.is ready to release a segment that has been in use it calls the allocator with a pointer to the segment. The allocator adds the extent of the segment to the count of free storage and proceeds to connect it with the free chain. First, its neighbors are checked; they might also be free. If so, the segment being released can be merged into its free neighbor(s). 
Otherwise, the segment is simply added to the end of the free chain. lVlerging whenever possible reduces fragmentation. Knuth's experiments suggest that fragmentation will not become a serious problem so long as segments are restricted in size to less than perhaps 1/10 of total storage capacity. The allocator may satisfy requests for storage only from the free chain. The system will keep a tally of the total storage assigned to each user so that the performance can be monitored. The amount of storage allo- An Eclectic Information Processing System cated to an inactive user is only a couple of bytes in the base segment. When a user becomes active he is assigned I segments for stack space, but no space for form storage. A pointer to the base of the evaluation stack is placed in the base segment. When a stack is created a segment of minimum size is allocated to hold it. This segment contains the stack pointer. If the stack outgrows this segment another segment is requested and chained to the first. The pointer is understood to point relative to the origin of the segment which currently contains the top of the stack. tl When the pointer comes back down out of a stack segment, that segment is released, and the pointer is set to the top of the previous segment. The segments of a stack form a doubly-linked chain. Forms are stored one per segment; large pieces of text must be segmented to stay within the maximum segment size restriction placed on processor space requests. The form store is a singly-linked chain of form segments. In addition to the segment extent and chain pointer, a form segment contains the length of the name string, the name string itself, and the value string. New forms are added at the beginning of the form store i, chain. When space must be released the form at the ~ end of the chain is selected for writing out to disk. Each time a value is assigned to a name a new form is created; the previous instance of that name will usually be de, leted. This is done even if the new value would fit into the old form segment. With forms varying in length so dynamically, it does not appear worthwhile to try to ,I re-use an old form segment. Further, the policy of creating a new form at each assignment means that the most recently assigned form is at the head of the list. To locate a particular form by name a linear search is performed on the list. This should perform better than the average for randomly-distributed forms, since the pattern ,. of accesses to operands during program execution is not : at all random. The name search process is related to the memory management process. It is just those items which should be at the head of the list that should have priority for remaining in fast storage. If a frequently'referenced item does happen to get written out to disk lit will be brought back to the head of the list at the next reference, where it will enjoy quick access for a time. ':11 I I II I, 'SECONDARY STORAGE The secondary storage subsystem consists of a control processor and a rotating memory. Commands to the (processor specify the main storage address of a segment , to be written out, or the disk address of a segment to be ead in. Writing takes precedence over reading, since !i· t tends to free space in main storage. A write operation I 477 returns the disk address at which the segment has been stored to the user-evaluator stack; a read command returns the main storage address at which the requested segment has been loaded. 
The minimum addressable amount of disk space is called a sector. Short segments will be written on disk one per sector, while longer ones will occupy several successive sectors. In a write operation the control processor must locate the first available space on the disk having the requisite number of free sectors. For this purpose a free-sector map containing one bit per sector is maintained. The bits of the map are organized into shift registers, one per disk track, which are shifted synchronously with disk rotation. The bits in the register corresponding to sectors which are approaching the disk write heads are shifted through discrete flip-flops. Simple gating of their outputs indicates the number of contiguous free sectors which can be written next. Knowing how many contiguous sectors are needed, the control processor scans over the shift registers until it has found sufficient space. It then sets the flip-flops to mark the chosen sectors in-use, performs the write, returns the disk address to the evaluator stack, and requests the allocator to release the main storage segment that has been written. Should the control processor fail to find enough space on any track it must wait until another sector time has gone by, at which time space might be sufficient. Meanwhile it can attempt to process other write requests. Read requests are stored by the disk control processor in a list that is kept sorted by disk address. The disk sector counter is compared with the list to determine whether a read can be executed at the next sector time. If so, the processor first requests enough main storage to receive the data to be read. If this succeeds it performs the read, marks the disk sectors free in the map, and returns the main storage address to the user-evaluator stack. If the allocator fails to make the requested main storage available soon enough the read request IS returned to the sorted list for a later attempt. ACKNOWLEDGMENTS The ideas used in this design have come from many sources. Some of the most helpful sources were Alan Kay, 8 and Rex Rice. 9 REFERENCES' 1 J HAYNES Designing a computer: The eclectic information processing system National Technical Information Service Springfield Virginia No N71-19923 478 Fall Joint Computer Conference, 1972 2 C N MOOERS TRAC-A procedure describing language for the reactive typewriter Communications ACM Vol 9 No 3 pp 215-219 March 1966 3 C STRACHEY A general purpose macrogenerator Computer Journal Vol 8 No 3 pp 225-241 October 1965 4 R S BARTON Ideas for computer systems organization: A personal survey Software Engineering CO INS III Proceedings of the Third Symposium on Computer and Information Sciences Miami Beach Florida pp 7-13 December 1969 5 D E KNUTH The art of computer programming Vol 1 Addison-Wesley Palo Alto 1968 pp 435-451 6 W M MCKEEMAN J T HORNING DB WORTMAN A compiler generator Prentice-Hall New Jersey 1970 7 P WEGNER An introduction to stack compilation techniques Introduction to System Programming Academic Press 1964 8 A KAY The reactive engine PhD Dissertation University of Utah 1969 9 R RICE et al Papers on the SYMBOL system AFIPS Conference Proceedings 1971 Spring Joint Computer Conference Vol 38 AFIPS Press Montvale New Jersey 1971 pp 563-616 I I Microtext-The design of a microprogrammed finite state search machine for full-text retrieval by R. H. BULLEN, JR. and J. K. 
MILLEN The MITRE Corporation Bedford, Massachusetts INTRODUCTION selves well to dynamically changing file collections, because no indexing need be done; but file size is usually restricted because search time is proportional to the amount of text searched. However, because searching can be done on a character-by-character basis, direct searching can permit considerably more detailed query patterns than are possible with most indexing schemes. In addition, editing and augmentation of the full text can be performed on-line, although sometimes with side-effects which can adversely affect later search performance (e.g., file fragmentation). The Microtext system represents a new approach to the design and implementation of a full-text retrieval system. The approach is unusual in that it integrates hardware, firmware, and software components in an attempt to provide a solution to the problems involved in pro\ cessing large files of unformatted textual data. The sysii tem is based on a minicomputer specialized for high, speed full-text retrieval, through the use of a finite state ~ search algorithm implemented in firmware. " I i Full-text retrieval Regardless of which of the above categories a full-text system may fall into, it is at an immediate disadvantage with respect to retrieval performance when compared, for example, with structured data retrieval systems (i.e., data management or management information systems). In the latter case, requests for qualifying data base entries can be satisfied by inspection of a selected subset of fields in each data base entry, whereas a fulJtext system must concern itself with all of the text in each entry. This problem is particularly acute for direct search full-text systems, since all of the text must be scanned each time a search is performed rather than only once, at index generation time, in the case of indexed systems. Full-text systems, and specifically direct search systems, are plagued with a second problem, which is at the heart of the motivation for the Microtext system. In addition to having to process a very large amount of data in response to retrieval requests, direct search systems have a performance disadvantage because of the inability to. express full-text handling functions in the primitives and data structures available on most general-purpose computers. Software must be used to map these application-level functions, often with great Full-text retrieval, as distinguished from other types of text and data processing, involves the location of patterns of characters, words, and phrases in text. In , addition, bibliographic structures, such as title or author, as well as linguistic structures, such as sentence and paragraph, can be identified in text when the data W base has been suitably constructed. :, A variety of systems have been built to perform fulltext retrieval,1·2 If generalizations are possible, these systems can be divided into two categories : I I ,I (1) Those systems which use an index, or concordance, of text words during retrieval, but which have access to the full text for display. In some cases, such systems can also perform a sequential search of the full text for query items not also in the index. Generally speaking, search performance with indexed systems is adequate, but index generation and update is time-consuming and, as a result, editing or augmenting the full text must usually be done off-line, if at all. (2) Those systems which always make a direct search of the full text. 
difficulty, into the facilities of fundamentally word- and arithmetic-oriented central processors; it is this mismatch of problem and tool, and the additional level of mapping required, which adversely affects full-text handling systems, and it is to this mismatch that the Microtext work is addressed.

An application architecture

One approach to the solution to these problems, and the approach which was taken in the development of Microtext, is to work toward the design of a computer system specialized for full-text processing and retrieval functions. The system envisioned would be built up from hardware, firmware, and software components in the following way:

(1) hardware: state-of-the-art, commercially available hardware would be used to provide a low-cost, easily reproduced base for the system. The hardware would be chosen with a view toward its eventual use by judging its inherent suitability for character string handling, its raw performance, and its ease of microprogramming.
(2) firmware: microcode would define the data structures, primitives, and basic architecture (execution environment) for text handling problems at a level which facilitates their expression. In addition, because Microtext is viewed as an application-oriented machine, many functions typically thought of as the province of an operating system would be implemented directly in microcode.*
(3) software: software would be used for most data- and user-oriented functions so that they could be easily changed to suit specific application requirements.

* A recent project at MITRE has demonstrated ways in which operating system functions can be distributed between firmware and software.3

The question is: how to get there from here?

The Microtext development plan

Aside from the fact that a task of this magnitude would take considerable time and money, with few intermediate products along the way, there is also a fundamental technical problem involved here. If a designer were to take the theoretical approach and begin his task by specifying the system architecture, he might risk bounding the problem before it was identified in a practical sense. If, as in the case of Microtext, this involved a higher-level application machine specification, later changes to the system due to practical requirements could affect the basic architecture of the system, and changes at that late date might not be tolerable.

A more conservative, practical approach was chosen for Microtext. The development plan for Microtext makes use of a phased, or boot-strap, technique, wherein the output of each phase is an operational prototype, the application of which can proceed in parallel with the design of the following phase. The approach has the advantage that each phase can bind to the basic architecture of the system only that subset of the application environment which has been proven through practical experience and user feedback, leaving still experimental components untouched in software where they can be changed easily if the need arises.

Phase I

To test the design philosophy described above, it was decided that the first phase in the development activity should be a prototype software implementation of a full-text retrieval system, with an important, but manageable, subcomponent of the system implemented in firmware. The idea was to take a cautious, initial step, to prove the feasibility of the approach as well as to encourage, through the production of software support, the development and use of data structures and primitive operations fundamental to full-text handling problems, so that these facilities might be well enough understood to be applied in subsequent phases of the Microtext activity.
In line with this goal, a fundamental primitive of full-text retrieval-character string searching-was selected for implementation as the function of a microprogrammed, black-box peripheral device attached to a larger host computer system, specifically an IBM System 370/155. A search algorithm was designed, using techniques of finite state automata theory, and was implemented in firmware on a Digital Scientific Corporation Meta-4 computer. Higher-level language software was used to implement the control logic for the device, as well as the application logic necessary to demonstrate the operation of the system.

The sections which follow describe the overall structure of the system, the special finite state search algorithm designed for the application, and the implementation of the algorithm in firmware. A final section discusses some of the refinements planned for the Phase I system and suggests possible directions for Phase II activity.

SYSTEM STRUCTURE

Application environment

The goal in the specification of an application environment for the initial version of the Microtext system was to model the operational characteristics of a full-text retrieval system, without actually implementing all the bells and whistles which a demanding user might desire. For this first pass, we were most interested in basic structure, not so much in form.

From the user's point of view, Microtext provides an on-line full-text retrieval capability, available through a time-sharing system* on IBM 2260 displays, 2741 terminals, and teletypes. The heart of the terminal environment is provided by three commands, described below.

* The system under which the Microtext software runs is OS/MVT, with the Time Sharing Option (TSO).

DQUERY-Display query questionnaire

This command causes display at the user's terminal of a questionnaire which is used to specify basic parameter data for the search, as well as the query itself. Also specified in the questionnaire is information about the structure and format of the file to be searched. The query language used to specify retrieval requests allows searches for words, phrases, or expressions involving words and phrases, optionally restricted in scope to the level of sentence or paragraph. This language is described in more detail in the following section.

SEARCH-Search file

This command causes the query to be processed and the search to be initiated. The query is redisplayed at the terminal for verification, and as the search progresses the search monitor displays continuous hit data to reassure the user that the system is actually working. The user controls the frequency of this output by the parameters specified in the query questionnaire. At the end of the search, the system displays the total number of documents searched as well as the number of hits.

DANSWER-Display answers

This command creates a file of retrieved text and allows the user to browse through this file.
System operation

In order to understand the role of the microprogrammed processor in the Microtext system, consider first the operation of a hypothetical full-text retrieval system, as it might be driven by the set of commands described above. For this purpose, the system can be thought of as three primary modules: (1) a query translator, (2) a search monitor, and (3) a display processor. From the user's point of view, the first two of these modules are not really thought of as separate components, and in the online terminal environment described above, they are lumped together under the single "SEARCH" command. Figure 1 gives a flow diagram of such a system.

[Figure 1-Flow diagram of a hypothetical full-text retrieval system]

User input is first validated by the query translator and is then translated into an internal form, which facilitates easy evaluation of subparts of the query. This internal representation and the text file are then input to the search monitor which produces, not documents, but pointers to text items matched during the search. This list of pointers, and the text file, are then input to the display processor which gives the user access to the retrieved items.

It is in support of the operation of the search monitor that it was decided to apply firmware components first. To see how this was done, let us break the search monitor down further into the following processing functions:

(1) a data base interface, which is concerned with data access requirements and with the specifics of file structure, and which has the function of preparing text blocks for searching;
(2) an evaluator, which drives the matching process, evaluates the query, and records hit information;
(3) and a character string searching algorithm, which performs the scanning and recognition involved in the retrieval process.

[Figure 2-Retrieval operation with Microtext search machine: the host machine (370/155), with its data base, data base interface, core, and CPU, is connected to the Microtext search machine (Meta-4); text goes out over the interface and matches come back]

Figure 2 shows this same system, no longer hypothetical, redrawn to indicate how the Microtext search machine replaces the character string searching function of the search monitor. Here the three major components perform exactly the same functions as before, but the data flow is slightly altered. The internal form of the query, described in more detail in a later section, is in a tabular form, highly compacted to fit within the available core memory on the Microtext search machine. After the table is generated, it is sent over a high-speed interface to the Microtext machine. Control is then passed to the search monitor which accesses the text as before, but appeals to special functions which communicate directly with the search machine, through operating system I/O facilities.
The results of individual matches are returned to the host machine and hit information is recorded for later use by the display processor.

The reader will note from Figure 2 that there are several other ways in which the operation of a full-text retrieval system might be shared between a host machine and a specialized, microprogrammed processor. One such way might be the inclusion in the Microtext machine of the data base interface function and the incorporation in the design of a direct connection between this machine and the data base. This would have had the obvious advantage of avoiding the extra I/O transfer of text first to the host machine and then to the search machine, but would have been inconsistent with the development goals described in the Introduction to this paper. In this initial version of the Microtext system, we wanted to separate as much as possible the well-defined problem of character string searching from such functions as the data base interface, which are more likely to be sensitive to particular application requirements. The final section of this paper presents this mode of operation, as well as other alternatives, as possible directions for future work.

THE SEARCH MACHINE

Brief description

In this section we will examine the Microtext search machine more closely. It is, of course, implemented in firmware, but before we can fully appreciate this aspect of the machine, we have to understand the driving algorithm, and the manner in which the input to that algorithm is generated.

A finite state approach to character string searching satisfies the two requirements of (1) improving the performance of the sequential search, and (2) not sacrificing in any way the user's ability to state search requests that reap the benefit of having the full text available. This is accomplished by first transforming the search request into a table using a software routine. The actual search is then performed on each section of text by a very simple microprogrammed algorithm which operates on the text, the table, and a register holding the "state" of the search.

A section of text is by definition that portion of text submitted to an individual execution of the search algorithm. Its length is controlled by application software, and it could range from a sentence to a complete document. The search passes through the text section from beginning to end, using each successive character to transform the state by consulting the table. At the end of the section, the output of the resulting state indicates whether or not the section satisfies the search request. (In cases where the search request succeeds or fails before reaching the end of the section, the search stops immediately and restarts with the next section.)

By choosing a query language abstractly equivalent to the regular expression language of Kleene, we can employ existing algorithms to construct a finite state recognizer for strings of characters satisfying the query.* At the same time, the regular expression language is powerful enough to support a query language at least as flexible as those designed for existing full-text systems.1,2

* We use "query" interchangeably with "search request."

Various ways are then available for designing a table to direct the emulation, as it were, of the finite state machine. We have chosen the straightforward tactic of constructing a deterministic transition table. A nondeterministic version of a finite state machine is generally smaller and more easily found from the regular expression; a scheme for using it for string searches was suggested by Thompson.4 A nondeterministic search method, however, was thought less suitable for microprogrammed implementation because of the greater number of core references required per character.

It should be kept in mind that, while the table format (or choice of formats) is fixed by firmware, a new finite state machine to fill in the table must be constructed for each search request, preferably quickly enough not to discourage a user waiting at an interactive terminal.
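To make the section-at-a-time discipline concrete, the following is a minimal sketch in C. It models the transition table as a dense state-by-character array, which is a simplification (the actual table format, described under "Table generation" below, stores only the significant characters for each state), and the type, bound, and function names are ours, not the Microtext implementation's.

   #include <stddef.h>

   #define NSTATES 64              /* illustrative bound on the number of states */

   struct fsm {
       int next[NSTATES][256];     /* next[state][character] -> next state       */
       int output[NSTATES];        /* 1 if this state already decides the query  */
       int initial;
   };

   /* Scan one section of text.  The state is reset for each section; the scan
    * stops early as soon as it enters a state whose output is one, and the
    * output of the resulting state says whether the section satisfies the
    * search request.                                                          */
   int section_matches(const struct fsm *m, const unsigned char *text, size_t len)
   {
       int state = m->initial;
       for (size_t i = 0; i < len && !m->output[state]; i++)
           state = m->next[state][text[i]];
       return m->output[state];
   }

A driver would simply call this routine once per section (sentence, paragraph, or whole document, as the application software chooses), since the state restarts at every section boundary.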
The query language

Our present query language is regular expression notation modified for the convenience of the user. The precedence of operators has been changed to reduce the number of parentheses required in natural formulations of common search requests, and a number of standard abbreviations have been set up.

The following samples illustrate both the flavor of the present query language and the power of search requests based on regular expressions.

Query 1: / MICROPROGR/ & ¬/ EMUL/

Query 2: / # # # ⟨ U S ⟩ TROOP/ & /( WITHDR | PULL-OUT)/ & 'SENTENCE'

Query 1 specifies a section of text about microprogramming but not emulation. The slashes indicate the embedding of the adjacent expressions in arbitrary text, and the blanks in contact with letters denote required punctuation or blanks. Thus, more literally, a section satisfies Query 1 if it contains a word beginning with "MICROPROG" but no word beginning with "EMUL."

Query 2 specifies a section mentioning the withdrawal of at least 100 U.S. troops. The number sign # stands for an arbitrary digit; the vertical bar is the "or" operator; the hyphen permits an arbitrary string of letters; and the angle brackets ⟨ ⟩ enclose an optional expression. In order to ensure that the "TROOP" mention is logically related to the "WITHDR" mention, they are required to be in the same sentence.

The queries are recognizable as regular expressions after the abbreviations have been expanded. For example, the slash / is translated to ¬φ.* The 'SENTENCE' in quotes is an abbreviation for a moderately complicated regular expression characterizing the set of strings which can be sentences in the given data base. The option brackets are expanded so that ⟨expression⟩ becomes an "or" between the null string and "expression" (the right bracket just becomes a right parenthesis). Incidentally, the digit sign # is not expanded into (0 | ... | 9), but is retained by the software as a single character-range symbol until the final construction of the table.

* Regular expressions are built up from the character set using operators and two special symbols: phi (φ) and lambda or nil (λ). The symbol φ represents the empty set and ¬φ, therefore, represents the set of all strings. The symbol λ represents the null string.

Query translation

Construction of the table from the query can be summarized in four steps:

(1) Expansion of abbreviations
(2) Infix-to-prefix translation
(3) Production of the state graph
(4) Table generation.

The Microtext implementation of this process is unusual in two ways: string manipulation techniques were used throughout (to simplify working space management and to anticipate the development of Phase II primitives), and several well-known algorithms were used in straightforward ways.

Expansion of abbreviations

While selecting the abbreviations requires some ingenuity, their expansion in the query is a simple table lookup. This is fortunate, because new abbreviations generally have to be designed for different data bases. For example, the fact that a data base may or may not have lower case letters affects the abbreviation for "arbitrary letter". Eventually Microtext software will have an associated data base descriptor file which will be used for, among other things, selecting the correct expansions. This will allow the user to express his queries in the same language, regardless of the structure of the data base he is searching.
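Since the expansion step is only a table lookup, it can be pictured in a few lines of C. The table contents here are placeholders: only the expansion of the slash (to "not phi," i.e., any string) is taken from the text above; the other entries, and the entry names, stand in for whatever data-base-dependent strings a descriptor file would eventually supply.

   #include <string.h>

   /* One abbreviation and its regular-expression expansion.
    * The expansion strings below are illustrative placeholders.            */
   struct abbrev { const char *name; const char *expansion; };

   static const struct abbrev abbrevs[] = {
       { "/",        "(not phi)" },   /* embedded in arbitrary text          */
       { "LETTER",   "(A|B|...|Z)" }, /* placeholder, data-base dependent    */
       { "SENTENCE", "(...)" },       /* moderately complicated; omitted     */
   };

   /* Expansion is a simple lookup; the query scanner substitutes the
    * returned string.  '#' is deliberately not expanded here, since it is
    * kept as a single character-range symbol until the table is built.     */
   const char *expand(const char *name)
   {
       for (size_t i = 0; i < sizeof abbrevs / sizeof abbrevs[0]; i++)
           if (strcmp(name, abbrevs[i].name) == 0)
               return abbrevs[i].expansion;
       return name;                   /* not an abbreviation: leave it alone */
   }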
Infix-to-prefix translation

The regular expression resulting from expanding the abbreviations is translated from its infix-operator form to a prefix form which is both more compact and easier to manipulate symbolically. Unions, intersections, and concatenations have any number of arguments and are thus parenthesized; complements and Kleene closures (stars) have one argument and have no bounding parentheses. Zero and one are used for φ and λ, respectively. A sample prefix regular expression is

(&*(1 AB)(., OB))

The infix-to-prefix translation is an instance of the classical use of a pushdown stack for this purpose. A transduction grammar of sixteen productions (exclusive of the replacement of the character set by a single nonterminal) was found and used in a simple syntax-directed translation using Lewis and Stearns' three stack algorithm.5

Production of the state graph

Brzozowski's derivative method was implemented to produce the state graph.6 There is essentially only one other kind of method, based on Kleene's original proof that regular expressions can be recognized by finite state machines. It has two steps: generating a nondeterministic machine, and then converting it to a deterministic one. While this two-step method is fine in a batch system, such as the RWORD system for producing lexical processors, where it is followed by a state reduction phase, our early experiments in this direction were discouraging in speed of operation.7

A number of far-reaching design choices were made here for reasons of efficiency.* For example, the derivative algorithm generates a regular expression for each state, and these must be compared with previously generated ones and stored if they are new. Since even reasonable queries can give rise to large numbers of states, most state expressions are stored on disk, while a few are kept in a buffer according to a usage-age rule. Details of this and other strategies of the state graph production procedure constitute a paper in preparation.

* Queries like those discussed in this section have been routinely processed and generate machines of about 50 states and 300 transitions. State graph production occurs at a rate of about 20 transitions per second of CPU time.

The state graph is produced in the form of a list of transitions from each state. A transition comprises (1) an input character, (2) the next state after reading that input character, and (3) the next state output. The next state output is an indication of whether the text starting from the beginning of the section and ending with the current input character is recognized as satisfying the query. Before production of the state graph, the query is augmented slightly so that only complete sections satisfying the query are accepted by the finite state machine.

To cut down on the length of the list of transitions, certain characters are distinguished as significant for each state; the others share a default transition. In most states, only a few characters will be significant.

Table generation

The idea of distinguishing significant from default characters carries over into the design of the table used by the microprogrammed search algorithm. To explain the design of the table, let us shift our time frame from the preprocessing of the query to the execution of the search algorithm. During the search, the transition for the current character must be located among the set of transitions from the current state.
This is done, in the table format described below, with a binary search among the significant characters with respect to the eight-bit unsigned value of the current character. Failure of this search causes the default transition to be taken.

For example, suppose state 1 has a transition to itself when the input is any letter, to state 2 on a blank, and to state 3 on any other input character. This portion of a hypothetical state graph is shown in Figure 3(a). The list of transitions from state 1 is shown in Figure 3(b). The table generator identifies the intervals of characters (considered as eight-bit unsigned binary numbers) causing each transition. It produces a list like the one in Figure 3(c). The isolated characters in the list in Figure 3(c) are the ones against which the input character is compared in the binary search for the proper transition. The search tree is shown in Figure 3(d).

The order in which the comparisons are made is chosen by applying an easy modification of Huffman's algorithm to minimize the average search time (under the simplifying assumption that the successive characters in the text were chosen randomly and independently with given probabilities).8 The probabilities can be assigned proportionally to the relative frequencies of the individual characters in a representative sample of the data base, or the usual single-letter English probabilities can be used. This procedure, while not guaranteed optimal, should result in generally better performance than, say, choosing an arbitrary balanced tree on the same characters.

[Figure 3(a)-The locality of state 1 in a hypothesized state graph: state 1 returns to itself on any letter, goes to state 2 on a blank, and goes to state 3 on any other character]

Figure 3(b)-The transitions from state 1:

   FROM STATE   ON INPUT   TO STATE   WITH OUTPUT
   1            letter     1          0
   1            blank      2          1
   1            other      3          0

Figure 3(c)-The transitions by EBCDIC character value (intervals exclude the endpoints except at X'00' and X'FF'):

   FROM STATE   ON INPUT           TO STATE
   1            [X'00', blank)     3
   1            blank              2
   1            (blank, A)         3
   1            A                  1
   1            (A, Z)             1
   1            Z                  1
   1            (Z, X'FF']         3

[Figure 3(d)-The search tree for state 1]

Figure 3(e)-The layout of the table section for state 1 (the links are displacements):

   CHARACTER   SMALLER LINK   GREATER LINK   ADDRESS OF TABLE SECTION   NEXT STATE
                                             FOR NEXT STATE             OUTPUT
   A           1              2              addr(1)                    0
   blank       2              2              addr(2)                    1
   Z           0              1              addr(1)                    0
   (default)   0              0              addr(3)                    0

The table section for a given state can be written directly from the binary search tree. Figure 3(e) shows the layout of the section of the table for the sample state discussed above. A binary search using computed addresses, although simple in concept, is a complicated algorithm by microprogramming standards; instead, to simplify next-address calculation, the displacements from each table entry to entries for greater- and smaller-valued characters are found in the table for each transition. The table is laid out so that all of the displacements are positive. A zero displacement forces the present transition. Thus, the default transition is signalled by zeros in both link fields.
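The lookup rule embodied in Figure 3(e) can be rendered directly in software. The sketch below is not the Meta-4 microcode; the struct and field names are ours, the field widths are simplified, and the displacements are treated as entry counts. It implements only what is described above: compare the input character against an entry's character, follow the smaller or greater link, and treat a zero link as forcing the present entry's transition (the entry with both links zero being the default).

   #include <stdint.h>

   /* One transition entry in a state's table section (field names ours). */
   struct entry {
       uint8_t  character;     /* significant character, 8-bit unsigned value   */
       uint8_t  smaller_link;  /* displacement toward smaller characters        */
       uint8_t  greater_link;  /* displacement toward greater characters        */
       uint16_t next_section;  /* address of the table section for next state   */
       uint8_t  output;        /* output of the next state                      */
   };

   /* Locate the transition for input character c, starting at the first entry
    * of the current state's table section.  A zero link forces the present
    * transition, so the walk always terminates at some entry.                */
   const struct entry *find_transition(const struct entry *section, uint8_t c)
   {
       const struct entry *e = section;
       for (;;) {
           if (c == e->character)
               return e;
           uint8_t link = (c < e->character) ? e->smaller_link : e->greater_link;
           if (link == 0)
               return e;      /* includes the default entry, whose links are 0 */
           e += link;         /* all displacements are positive                */
       }
   }

In the machine's own table each entry is packed into two 16-bit words, as the text goes on to describe, so the links are word displacements rather than array indices; the control flow, however, is the same.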
Also shown in Figure 3(e) is a special feature of this choice of table format: the zero in the smaller link field for the character "Z" indicates that not only is state 1 to be the next state for the letter "Z", but also for all letters smaller than "Z".

After generating all the table sections for the states in the state graph, and therefore knowing the relative addresses of the table sections for each state, the table generator makes a final pass over the entire table replacing state numbers with addresses of the corresponding table sections. Finally, the table contains, for each transition, the output associated with the next state; the use of this field is described with the search algorithm below. The internal format of a table entry, with bit addresses for each field, is shown below.

[Table entry format: the word holding the next state is divided at bit positions 14 and 15, with the address of the next state's table section in the high-order bits and the next state output in the final bit]

Note that a table entry requires two 16-bit words, and that therefore the address of a table entry is always even. Thus, the last bit of the next state address is always zero, permitting the last bit position to store the output for the next state.

The search algorithm

Figure 4 shows a flowchart of the firmware search algorithm which accesses the above table format. The algorithm has three inputs, as previously mentioned: the text section, the table, and the 16-bit current state value, which is maintained in a microregister during the search. The operation of the algorithm is quite straightforward. Note that the algorithm can terminate in either of two ways: (1) if the input is exhausted before a state is encountered with an output of one, or (2) if a state with an output of one is encountered first. Although, in the state graph, the output is one only at the end of a complete section, the list of transitions is inspected before table generation for states which, once entered, cannot be left until the end of the section. Their outputs are set to one so that the search will stop there and the remainder of the section can be skipped.

[Figure 4-The Microtext search algorithm: get the current state (equal to the address of the first entry in its table section), test table entries, using the displacements to address the next table entry to be tested, and repeat for each character until the search is done or no more characters remain]

Machine architecture

The architecture of the Microtext search machine is shown in Figure 5. Some basic statistics about the size of the machine are indicated in that figure; the microprogram occupies 243 microinstructions, of which approximately 20 percent are for the search algorithm itself, the rest being required for system and interface transfer control.

The device is initialized by writing the table and the initial state into the machine's core memory. The search command is then sent to the Meta-4. The search logic loads the initial state into a local store register where it remains for the duration of the search, and the search begins. As each character is accepted from the 370/155, the count is incremented and a transition is taken by lookup in the table in core memory. As each new state replaces the current state in local store, it is inspected to see if its output is one. If so, the search terminates immediately by presentation of ending status to the 370/155 channel. Otherwise, the search proceeds until the channel stops sending data. When termination occurs, the search logic stores the updated state and count in core memory and the control logic takes over.
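The per-character behavior just described can be summarized in a brief sketch. As in the real machine, the "state" here is the address of the first entry of a table section, with that state's output carried in its low-order bit (entries are two 16-bit words long, so genuine table addresses are even). The function and parameter names are ours, and the transition lookup itself is passed in so that the fragment stands alone; it corresponds to the displacement-link search sketched after Figure 3(e).

   #include <stdint.h>
   #include <stddef.h>

   /* Lookup step: given an (even) table-section address and a character,
    * return the next-state word, output bit included.                      */
   typedef uint16_t (*transition_fn)(uint16_t section_addr, uint8_t ch);

   /* Device-side search over one block of characters from the channel.
    * Returns the updated state; *count is incremented once per character.
    * The scan presents ending status early if a state with output one is
    * entered; otherwise it runs until the channel stops sending data.      */
   uint16_t search_block(const uint8_t *text, size_t len, uint16_t state,
                         uint32_t *count, transition_fn take_transition)
   {
       for (size_t i = 0; i < len; i++) {
           (*count)++;
           state = take_transition((uint16_t)(state & 0xFFFEu), text[i]);
           if (state & 0x0001u)     /* output of the new state is one */
               break;               /* stop: the query is already decided */
       }
       return state;                /* the host reads back state and count */
   }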
On the 370/155, the channel then reads back the state and count from the machine's core memory, software records hit information if a match occurred, and the search is continued with the next block of text.

[Figure 5-Architecture of the Microtext search machine: the 370/155-Meta-4 interface feeds transfer logic, dedicated locations, and the table and search algorithm space in core memory (16K words, 900-nanosecond cycle); the processor has 1K words of ROM, a 90-nanosecond cycle, and 12 general-purpose registers; separate paths carry initialization/termination transfers and search transfers]

Performance data

It is instructive to compare the performance of a highly specialized, microcoded algorithm of this kind to a similar approach implemented in software. In this case the most appropriate comparison for the Microtext search algorithm would be to a machine language implementation of the algorithm for representative System/360 and System/370 machines, where the search would be performed on text in a buffer in the machine's main memory. The table below compares, for each method, the minimum time in microseconds to process a single character (that is, the case where the character matches the first table entry inspected by the algorithm), and the additional time in microseconds required for each subsequent probe in the table, if previous probes do not result in a match or default condition.

                             minimum time   probe time
   Microcode:   Meta-4            4.5           .9
   Software:    370/155           8.9          8.9
                370/145          19.6         19.3
                360/50           46.8         47.5

From the figures, it can be seen that the microcoded implementation averages several times faster than the software implementation on the 370/155. It should be noted that the large difference between minimum and probe times for the Meta-4 is due to the overhead for I/O interface transfer; the software implementations need only perform an Insert Character instruction.

At its best, the Meta-4 microprogrammed implementation can scan text at roughly 220,000 characters per second, with the rate degrading to roughly 140,000 characters per second when three additional table probes are required to locate the character being processed. (These rates follow from the timings above: 1/4.5 microseconds is about 220,000 characters per second, and 1/(4.5 + 3(0.9)) = 1/7.2 microseconds is about 140,000.) The lower rate is still five times faster than software on the 370/155 and 25 times faster than the 360/50 for the same case.

FUTURE PLANS

Future plans for the Microtext activity include completion of and refinements to the Phase I system, as well as analysis of the operation and application of the Phase I system as part of the design work for the next Phase.

Completion of Phase I

The goal of this activity is to bring the system to full operational standing, where its development can be frozen, and emphasis can be placed on its use and application. Extensions to the current software support are planned to make the system easier to use, and to improve the performance and capacity of the system in the translation of very complex queries. The microcoded search algorithm has remained stable since its implementation and no further changes to it are planned.

Planning for Phase II

We feel very strongly that the development of application-oriented systems should proceed in parallel with the application of initial, or prototype, versions of such systems. It is only through experience in the solution of real problems, and through user feedback, that truly useful automated systems can be developed.
Of necessity, then, we can at best suggest possible future directions for Microtext, with specific plans waiting 488 Fall Joint Computer Conference, 1972 until we have had the benefit of this application experience. One possible future direction is toward a version of the system which would take over query translation as well as searching responsibilities from application software. The user query would be sent directly to the Microtext processor, where it would be translated into the tabular finite state machine description. The host machine would then be notified that the processor is ready and the search would begin. This version of the Microtext processor would be able to take full advantage of the experiences of developing the query translation software for the Phase I machine. We would expect that many of the basic modules in this software would become microcoded instructions in the query translation machine, with the top level of this software becoming the Phase II machine language. A second possibility under consideration is to implement the Microtext searching capability as an adjunct to the basic control mechanism of a disk file subsystem on a general-purpose computer. This approach, which has promise for heavily I/O-bound installations, would augment "the primitive sequential and keyed lookup capabilities of such devices with a facility for, say, reading only those records which match specific patterns. Rather than transfer the retrieved records directly, this extended control mechanism could perform the entire search automatically, accumulating hit data in a separate file on disk. When the search completes, the application software could then access this file as an index into the original text file searched. CONCLUSION This paper has described the design and implementation of a specialized microprogrammed processor which performs character string searching for full-text retrieval applications. The activity has been successful in proving the feasibility of the approach, in identifying the basic requirements of such a system, and has pointed out areas for future work. Conclusions about the utility of the system we have developed must wait until we have had the opportunity to apply the system in solution of real problems and until we have had the benefit of user feedback. Although the Microtext project did not set out with this particular goal in mind, we feel that the success of our work to date demonstrates the utility of firmware as a tool for application system design. In recent years, microprogramming has largely been undertaken only by computer manufacturers, universities, and some research organizations, such as MITRE. Part of the reason for this is that inexpensive microprogrammable computers have not been available for experimentation, and few guidelines have been developed for the methodology of applying microprogramming in systems design. We believe that this situation is changing, and we hope that reports of practical experience such as ours with the development of Microtext will contribute to this body of knowledge. ACKNOWLEDGMENTS The authors wish to acknowledge the financial support of the MITRE Independent Research Program and the contributions to the design and implementation of the Microtext machine made by H. A. Bayard, E. L. Burke, G. E. DeAgazio, R. J. Fleischer, and W. L. Schiller. Additional acknowledgment for moral support and encouragement is due J. A. Clapp, E. L. Lafferty, and C. M. Sheehan. 
REFERENCES

1 R S GLANTZ
  SHOEBOX-A personal file handling system for textual data
  AFIPS Conference Proceedings Fall Joint Computer Conference Vol 37 1970
2 W B KEHL  J F HORTY  C R T BACON  D S MITCHELL
  An information retrieval language for legal studies
  Communications of the Association for Computing Machinery Vol 4 1961
3 B H LISKOV
  The design of the VENUS operating system
  Communications of the Association for Computing Machinery Vol 15 1972
4 K THOMPSON
  Regular expression search algorithm
  Communications of the Association for Computing Machinery Vol 11 1968
5 P M LEWIS II  R E STEARNS
  Syntax directed transduction
  Journal of the Association for Computing Machinery Vol 15 1968
6 J A BRZOZOWSKI
  Derivatives of regular expressions
  Journal of the Association for Computing Machinery Vol 11 1964
7 W L JOHNSON  J H PORTER  S I ACKLEY  D T ROSS
  Automatic generation of efficient lexical processors using finite-state techniques
  Communications of the Association for Computing Machinery Vol 11 1968
8 D A HUFFMAN
  A method for the construction of minimum redundancy codes
  Proceedings IRE Vol 40 1952

Design of the Burroughs B1700

by W. T. WILNER
Burroughs Corporation
Goleta, California

INTRODUCTION

Procrustes was the ancient Attican malefactor who forced wayfarers to lie on an iron bed. He either stretched or cut short each person's legs to fit the bed's length. Finally, Procrustes was forced onto his own bed by Theseus.

Today the story is being reenacted. Von Neumann-derived machines are automatous malefactors who force programmers to lie on many procrustean beds. Memory cells and processor registers are rigid containers which contort data and instructions into unnatural fields. As we have painfully learned, contemporary representations of numbers introduce serious difficulties for numerical processing. Manipulation of variable-length information is excruciating. Another procrustean bed is machine instructions, which provide only a small number of elementary operations, compared to the gamut of algorithmic procedures. Although each set is universal, in that it can compute any function, the scope of applications for which each is efficient is far smaller than the scope of applications for which each is used. Configuration limits, too, restrict information processing tasks to sizes which are often inadequate. Worst of all, even when a program and its data agreeably fit a particular machine, they are confined to that machine; few, if any, other computers can process them.

In von Neumann's design for primordial EDVAC,1 rigidity of structure was more beneficial than detrimental.
It simplified expensive hardware and bought precious speed. Since then, declining hardware costs and advanced software techniques have shifted the optimum blend of rigid versus variable structures toward variability. As long ago as 1961, hardware of the Burroughs B5000,2 implemented limitless main memory using variable-length segments. Operands have proceeded from single words, to bytes, to strings of four-bit digits, as on the B3500. The demand for instruction variability has increased as well. The semantics of the growing number of programming languages are not converging to a small set of primitive operations. Each new language adds to our supply of fundamental data structures and basic operations.

This shifting milieu has altered the premises from which new system designs are derived. To increase throughput on an expanding range of applications, general-purpose computers need to be adaptable more specifically to the tasks they try to perform. For example, if COBOL programs make up the daily workload, one's computer had better acquire a "Move" instruction whose function is similar to the semantics of the COBOL verb MOVE. To accommodate future applications, the variability of computer structures must increase, in yet unknown directions. Such flexibility reminds one of Proteus, the mythological god who could change his shape to that of any creature.

DESIGN OBJECTIVE

Burroughs B1700 is a protean attempt to completely vanquish procrustean structures, to give 100 percent variability, or the appearance of no inherent structure. Without inherent structure, any definable language can be efficiently used for computing. There are no word sizes or data formats-operands may be any shape or size, without loss of efficiency; there are no a priori instructions-machine operations may be any function, in any form, without loss of efficiency; configuration limits, while not totally removable, can be made to exist only as points of "graceful degradation" of performance; modularity may be increased, to allow miniconfigurations and supercomputers using the same components.

Design rationale

The B1700's premise is that the effort needed to accommodate definability from instruction to instruction is less than the effort wasted from instruction to instruction when one system design is used for all applications. With definable structure, information is able to be represented according to its own inherent structure. Manipulations are able to be defined according to algorithms' own inherent processes. Given such freedom, it is easy to construct novel machine designs which are 10 to 50 times more powerful than contemporary designs, and which can be interpreted by the B1700's variable-micrologic processor using less than 10 to 50 times the effort, resulting in faster running times, smaller resource demands, and lower computation costs.

To accomplish definable structure, one may observe that during the next decade, something less than infinite variability is required. As long as control information and data are communicated to machines through programming languages, the variability with which machines must cope is limited to that which the languages exhibit. Therefore, it is sufficient to anticipate a unique environment for each programming language. In this context, absolute binary decks, console switches, assembly languages, etc., are included as programming language forms of communication. Let us call all such languages "S-languages" ("S" for "soft," or also for "system" or "source" or "specialized" or "simulated"). Machines which execute S-language directly are called "S-machines." The B1700's objective, consequently, is to emulate existing and future S-machines, whether these are 360's, FORTRAN machines, or whatever. Rather than pretend to be good at all applications, the B1700 strives only to interpret arbitrary S-language superbly. The burden of performing well in particular applications is shifted to specific S-machines.

GENERAL DESIGN

Throughput measurements, reported below, show that the tandem system of:

   APPLICATION PROGRAM,
   interpreted by an S-MACHINE (which is optimized for the application area),
   interpreted by the B1700 HARDWARE (which is optimized for interpretation)

is more efficient than a single system when more than one application area is considered. It is even more efficient than conventional design for many individual application areas, such as sorting.

To visualize the architectural advantage of implementing the S-machine concept, imagine a two-dimensional continuum of machine designs, as in Figures 1 and 2. Designs which are optimally suited to specific applications are represented by bullets (•) beside the application's name. The goodness-of-fit of a particular machine design, which is represented as a point (o) in the continuum, to various applications is given by its distance from the optimum for each application; the shorter the distance, the better the fit, and the more efficient the machine is. Figure 1 dramatizes the disadvantage of using one design for COBOL, FORTRAN, Emulation, and Operating System applications. Figure 2 pictures the advantage of emulating/interpreting many S-machines, each designed for a specific application. Note that emulation inefficiencies must be counted once for each S-machine, since they are all interpreted.

[Figure 1-Typical machine design (o) positioned by goodness-of-fit to application areas (•) such as COBOL, FORTRAN, RPG, ALGOL, compiling, operating system, numerical processing, simulation, data base, and emulation]

[Figure 2-Typical B1700 S-machines (o) positioned by goodness-of-fit to application areas (•)]

HARDWARE CAPABILITIES

To allow the user's problem statement to dictate the structure of the machine and the semantics of machine operations, new degrees of flexibility and
SIMULATION APPLICATION PROGRAM, interpreted by an S-MACHINE (which is optimized for the application area), interpreted by the B1700 HARDWARE (which is optimized for interpretation) is more efficient than a single system when more than one application area is considered. I t is even more efficient than conventional design for many individual application areas, such as sorting. To visualize the architectural advantage of implementing the S-machine concept, imagine a two-dimensional continuum of machine designs, as in Figures 1 and 2. Designs which are optimally suited to specific applications are represented by bullets ( • ) beside the application's name. The goodness-of-fit of a particular machine design, which is represented as a point (0) in the continuum, to various applications is given by its distance from the optimum for each application; the shorter the distance, the better the fit, and the more efficient the machine is. Figure 1 dramatizes· the disadvantage of using one design for COBOL, FORTRAN, Emulation, and Operating System applications. Figure 2 pictures the advantage of emulating/interpreting many S-machines, each designed for a specific application. Note that emulation inefficiencies must be counted once for each S-machine, SInce they are all interpreted. DATA.BASE ~- ......... ~------------------------. • NUMERICAL PROCESSING EMULATION Figure 1-Typical machine design (0) positioned by gqodness-of-fit to application areas ( • ) HARDWARE CAPABILITIES To allow the user's problem statement to dictate the structure of the machine and the semantics of machine operations, new degrees of flexibility and Design of the Burroughs B1700 speed are required from hardware, firmware, and software. Defined-field capability All information in a B1700 system is represented by fields, which are recursively defined to be either bit strings or strings of fields. Specifically, bytes and words do not exist. • All memory is addressable to the bit. • All field lengths are expressable to the bit. e Memory access hardware must fetch and store one or more bits from any location with equal facility. That is, there must be no penalty and no premium attached to location or length. • All structured logic elements in the processor can be used iteratively and fractionally under microprogram control, thus effectively concealing their structure from the user. Iterative use is required for operands which contain more bits than a functional unit can hold; fractional use is required for smaller operands. Defined-field design gives flexibility because information is represented by recursively defined structures of bits. It also gives speed because all bits in a field (and only those bits in a field) are processed in parallel. Additional speed is obtained from the advanced technology of the B1700 components. Main memory is constructed out of LSI MOS circuits with 1024-bit chips having lS0-nsec access time. The B1700 is the first small-scale, general-purpose, commercial computer to use MOS/LSI circuitry in its main memory. Generalized language interpretation Iii I ,I" I; III' II No machine language is built into the hardware. There is no processor structure or set of machine instructions for which compilers may generate code. Each language to be executed must first configure the B1700 processor into whatever structure is efficient for algorithms in that language. Defined operations on the defined structure are then executed by changeable microprogram. 
B1700 processors are specifically designed to avoid causing significant differences in efficiency due to differences in such "soft" machine structures and operations. • Microinstructions are executed at 2, 4, and 6MHz rates using MSI CTL II logic with typical delay of 3 nsec per gate. 491 • Microcode executes out of main memory. It may be buffered through 60-nsec access bipolar circuits. Such buffering is invisible to the microprogrammer. • Microprocedures are reentrant and recursively usable; each processor includes a 32-deep stack for fast entry and exit; stack operations are automatic, not microprogrammed. • Microprograms are not limited in size, nor would large microprograms be inefficient because of size. • Microcode on the B1700 is compact, economizing storage. COBOL, FORTRAN, BASIC, and RPG language processors as well as second-generation and third-generation emulators have been microprogrammed each in less than 4000 16-bit microinstructions. • Hardware assists with the concurrent execution of many microprogrammed interpreters. It takes from 14 p.sec to 53 usec (at 6MHz) from the completion of an S-instruction for one interpreter until the beginning of an S-instruction for another interpreter, depending on how much of the processor must be reconfigured. ]\t{emory protection, fast interrupt response, and uniform status of microprograms allow each microprogrammer to be unconcerned that other interpreters may be running simultaneously. Control over binding While the hardware for defined-field and generalized language interpretation allows a varying processor image for microinstruction to microinstruction, it does not preclude taking advantage of a static processor image. For example, the number of bits to be read, written, or swapped between processor and memory can be different in consecutive microinstructions, but if an interpreted S-machine's memory accesses are of uniform length, this length can be factored out of the interpreter, simplifying its code. In other words, S-memory may be addressed by any convenient scheme; bit addresses are available, but not obligatory for the S-machine. With these hardware advances, language-dependent features such as operand length are unbound inside the processor and memory buss, except during portions of selected microinstructions. Some of these features have, until now, been bound before manufacture, by machine designers. Language designers and users have been able to influence their binding only indirectly, and only on the next system to be built. On the B1700, the delayed binding of these features, delayed down to the 492 Fall Joint Computer Conference, 1972 S-MEMORY address, field length, and direction] into whatever form actually drives the memory and to converting bit strings into whatever form is actually read and written by the memory.) Each processor also connects to one to eight I/O channels or toone to four microprogram memory (M-memory) modules. (See Figure 3.) Later systems may have several field-isolation units. With only one processor, the port interchange may be eliminated, as in Figure 4. EMULATION VEHICLE Any computer which can handle the B1700's portto-port message discipline may employ a B1700 for on-line emulation. (See Figure 5.) Programs and data M-MEMORY PROCESSOR Figure 3-B 1700 Organization-Peripherals include standard large-scale devices, data communications networks, and mass storage units as well as minicomputer devices such as paper tape and 96-column card equipment. 
Special purpose devices include graphics, document sorters, teller machines, etc. clock pulse level of the machine, gives language designers and users a new degree of flexibility to exploit. Hopefully, this flexibility will lead to the design of languages which are levels closer to user problems. Because of the B1700's interpretation speed, there should be little execution penalty incurred by such advanced forms of man-machine communication. SYSTEM ORGANIZATION Extreme modularity improves the B1700's ability to adapt to an installation's requirements. There may be one to eight processors' connected to one another and to two to 256 65,536-bitsystems memory (S-memory) modules, interfaced by a field-isolation unit. ("Field-isolation" refers to converting defined-field memory requests [i.e., least- or most-significant bit PROCESSOR FI US-MEMORY 300 CPM 96-COL . MFCU--::::f"-'"7 300 LPM 132-COL. PRINTER t - - - - - f }----t:=::::I DUAL SPINDLE 20 MS. DISK Figure 4-0ne of the smallest B 1700's PORT INTERCHANGE FlU S-MEMORY -~ITS COMMUNICATION LINE TO HOST COMPUTER Figure 5-B1700 as an emulation vehicle are sent to the B1700 for execution; I/O requests are sent back to the host which uses its own peripherals for them. Interpreters are loaded via the B1700's console cassette drive. Each Burroughs emulator can run standing-alone, or in an emulation vehicle, or in a multiprogrammed mix. STATE OF THE ART DESIGN The B1700's innovative features have been realized without diminishing the system's ability to provide many proven throughput enhancements. All Burroughs interpreters rely on the B1700's ]\1aster Control Program (M CP) for: • Virtual memory-user programs are not limited in size by the amount of physical storage nor does the programmer ever need to know how much storage is available; compilers automatically segment programs, and the MCP automatically manages these segments without introducing any code into the user program. • l\1ultiprogramming-because common system Design of the Burroughs B1700 functions such as input/output, storage management, and peripheral assignment are removed from user programs and handled by the M CP, every pause in a running program becomes an evident opportunity to run other programs. • Multiprocessing-with S-machine state kept in main memory and with every interpreter in main memory, any processor in the system can resume execution of an interrupted program. The B1700 is the first small-scale computer to offer so comprehensive an operating system. In addition to the MCP capabilities, there are notable system flexibilities, viz: • Dynamic system configuration-processors, memory addresses, I/O channels, and peripherals are not uniquely coded into programs, so such entities can be brought on-line and used immediately without any reprogramming. • Descriptor-organized I/O-in effect, I/O has its own S-language, interpretation of which causes data transfer; it is possible to build this interpretation in hardware, for maximum speed, or it may be soft for maximum flexibility, for example, to allow easy interfacing with new devices. • System performance monitoring-interpreters automatically gather dynamic execution frequencies of program components to establish which parts of a program take the most time;3,4 also, specific microinstructions can interface directly with external monitors, allowing soft event flagging. 
OVERLAYABLE DATA SEGMENTS S-MACHINE STATE ( RUN STRUCTURE) DATA DEFINITIONS FILE DEFINITIONS FILE BUFFERS 493 OVERLAYABLE PROGRAM SEGMENTS DODD Figure 6-B1700 program S-memory components interpreters active in one mix-one designed for speed, another for code compaction, etc.-all employing the same S-language expressly designed. for COBOL, that is, a COBOL-machine definition. The interpreter name is looked up in the interpreter dictionary to yield a pointer to the interpreter code in S-memory. To switch back to the MCP interpreter, a user interpreter performs the identical procedure. It calls the interpreter interface routine, which maintains a pointer to the MCP's interpreter, and switches run structures. Interpreter switching is independent of any execution considerations. It may be performed between any two S-instructions, even without· switching S-instruc.tion streams. That is, an S-program may direct its interpreter to summon another interpreter for itself. This facility is useful for changing between tracing and non-tracing interpreters during debugging. Interpreter switching is also independent of M-memory. Microcode always actually addresses S-memory. In case M is present, special hardware diverts fetches to it. Without M, no fetches are diverted. Interpreter switching Interpreter management I I I I I Note that without a native machine language, the MCP itself must be written in higher-level language and interpreted just like any other program. It, and all other active jobs, are represented in memory according to Figure 6. There are read-only code segments which may be anywhere in memory and a write-protected area which contains the program's S-machine state, data segments, file buffers, and other work areas. One of the l\1CP's data segments contains an interpreter dictionary that points to each interpreter which is active (i.e., interpreting one of the jobs in the mix). To reinstate a user's interpreter, the MCP extracts from the user's S.,..machine state the name of the interpreter being used, brings it into S-memory, and calls the interpreter interface routine which switches run structures. Associating S-machines and interpreters symbolically allows such things as several COBOL Entries in the interpreter dictionary are added whenever a job is initiated which requests a new interpreter. Interpreters usually reside on disk, but may be read in from tape, cards, cassettes, data comm, or other media. They have the same status in the system that object code files, source language files, data. files, compiler files, and MCP files all share: symbolicallynamed, media-independent bit strings. While active, a copy is brought from disk, to be available in main memory for direct execution. The location may change during interpretation due to virtual S-memory management, so microinstructions. are location-independent. At each job initiation and termination, the MCP rearranges the interpreters in M-memory to try to avoid swapping. Interpreter profile statistics show that over 99 percent of all microinstructions are executed 494 Fall Joint Computer Conference, 1972 out of M-memory, even when the demand for M-memory space is double the supply. At higher demand rates, swapping occurs. Ease of microprogramming Writing microprograms for the B1700 is as simple, and in some ways simpler, than writing FORTRAN subrou tines: • Microprograms consist of short, imperative English-like sentences and narrative comments. 
For example, one microinstruction in the FORTRAN interpreter is coded as follows: Read 8 bits to T counting FA up and FL down. • Knowledge of microinstruction forms is not beneficial. Although micro programmers on other machines need to know which bits do what, on the B1700, there is no way to use that information. Once the function is given in English, its representation is immaterial. The B1700 microprogrammer has only one set of formats to worry about: those belonging to the S-language which he is interpreting. • Multiprogramming of microprograms is purely an MCP function, carried out without the microprogrammer's knowledge or assistance. Actually, there is nothing one would do differently, depending on whether or not other interpreters are ,running simultaneously. • Use of M -memory is purely an M CP function; users cannot move information in and out of M. Other than rearranging one's interpreter according to usage, there is nothing one should microprogram differently depending on whether microinstructions are executing out of M-memory or S-memory. Maximizing use of system resources is beyond the scope of any individual program; responsibility lies solely with the MCP and the machine designers. • Since all references are coded symbolically, protection is easy to assure. Microprograms can reference only what they can name, and they can (a)? COMPILE XCOBOL/INTERP WITH MIL; DATA CARD (b)? COMPILE XCOBOL/INTERP WITH MIL; MIL FILE CARD=XCOBOL/ SOURCE Figure 7-Typical MCP control information for creating interpreters (a) ? EXECUTE FILE/UPDATE (b)? EXECUTE FILE/UPDATE; INTERP = XCOBOL/INTERPRETER Figure 8-Typical MCP control information for .executing programs only name quantities belonging to thems'elves and their S-machines. Moreover, artificially generated names (e.g., negatively subscripted FORTRAN arrays) are checked for validity by concurrent hardware. • Calling out interpreters is simplified by the continuation of Burroughs' "one-card-of-free-formEnglish" philosophy of job control language. Figure 7 shows the control information which creates a new interpreter (a) from cards, and (b) from a disk file named XCOBOL/SOURCE. • Association of interpreters and S-language files occurs at run-time. Figure 8 shows the control information which executes a COBOL program named FILE/UPDATE with (a) the usual COBOL interpreter, and (b) another interpreter named XCOBOL/INTERPRETER. • There is no limit to the number of interpreters that may be in the system (except that no more than 244 bits are capable of being managed by the B1700's present virtual memory property, so a 28,OOO-bit average interpreter length means there is a practical limit of 628,292,362 interpreters ... many more than the number of S-languages in the world). Additional information about B1700 microprogramming may be found in Reference 5. EVALUATION Evaluation of novel architecture is not merely an unsolved problem; most rational attempts produce worse results than subjective guesses. Consider benchmarks, which measure more system parameters than any other technique. Any benchmark program which runs on the B1700 develops not only an observed run.. ning time, but also a program profile which indicates how to reduce that time (possibly by 50 percent or more). What, then, is the true performance of the system? The observed time, even though known inefficiencies are pin-pointed? Half the observed time? Not until the benchmark has been changed. 
The point of benchmarks is to have a standard reference which allows the customer to characterize his work and obtain a cost/performance measure. What customer would be satisfied with an inefficient characterization? If the B1700 can show that a program is not using the system well, what good is it as a benchmark? If we change the program to remove the inefficiencies, it is no longer standard. This is a pernicious dilemma. Even the simplest measure, add time (still published as if it had not been a misleading and unreliable indicator for the past 15 years), is void. What is the relative performance of two machines, one of which can do an almost infinite variety of additions and the other of which can do only one or two? The B1700 can add two 0-24 bit binary or decimal numbers in 187 nsec; how fast must a 16-bit binary machine be in order to have an equivalent add time?

Assuming reasonable benchmark figures are obtainable, they would say nothing about the intrinsic value of a machine which can execute another machine's operators, for both existing and imaginary computers; which can interpret any current and presently conceivable programming language; which can always accept one more job into the mix; which can add on one more peripheral and one more memory module, to grow with the user; which can interpret one more application-tailored S-machine; which can tell a programmer where his program is least efficient; which can continue operation in spite of failures in processing, memory, and I/O modules. These characteristics of the B1700 (shared by few other machines; no machine shares them all) save time and money, but are not yet part of any performance measurement.

Despite the nullification of measures with which we are familiar and the gargantuan challenge of measuring the B1700's advancements of the state of the art, there are, nevertheless, some quantifiable signs that the system gives better performance than comparably-priced and higher-priced equipment.

Utilization of memory

The major benefit of defined-field design is that information can be represented in natural containers and formats. Applied to language interpretation, defined-field architecture allows S-language definitions which use memory more efficiently than word- or byte-oriented machine architectures. For example, short addresses may be encoded in short fields, and long addresses in long fields (assuming the interpreter for the language is programmed to decode the different sizes). Alternatively, address field size may be a run-time parameter determined during compilation. That is, programs with fewer than 256 variables may be encoded into an S-language that uses eight-bit data address fields. Even the fastest microcode that can be written to interpret address fields is able to use a dynamic variable to determine the size of the field to be interpreted.

Language of Sample   Aggregate Size on B1700   Aggregate Size on Other System   Other System   Percent Improved B1700 Utilization
FORTRAN              280KB                     560KB                            System/360     50
FORTRAN              280KB                     450KB                            B3500          40
COBOL                450KB                     1200KB                           B3500          60
COBOL                450KB                     1490KB                           System/360     70
RPG II               150KB                     310KB                            System/3       50

Figure 9-Amount of program compaction on B1700

Just how efficient this makes S-languages is difficult to say because no standard exists. What criterion will tell us how well a given computer represents programs? What "standard" size does any particular program have?
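Whatever the standard, the mechanics of the field-width choice described above are simple enough to sketch. The fragment below is a rough illustration in Python, with invented names; it does not reproduce the B1700's actual S-language encodings, only the idea of picking the smallest address field that covers the program's variable count at compilation.

    def encode_addresses(addresses, variable_count):
        """Pack data addresses into a bit string, choosing the field width
        once, at 'compile time', from the number of variables in the program."""
        width = max(1, (variable_count - 1).bit_length())   # e.g. up to 256 variables -> 8 bits
        bits = "".join(format(a, f"0{width}b") for a in addresses)
        return width, bits

    # A program with 200 variables needs only 8-bit address fields; a
    # word- or byte-oriented encoding might spend 16 or more bits per address.
    width, bits = encode_addresses([3, 17, 199], 200)
    print(width, len(bits))        # prints: 8 24

Because the width is fixed per program when it is compiled, the interpreter pays no per-reference penalty for the compaction; it simply fetches that many bits for each address field.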
We would like a measure that takes a program's semantics into account, not just a statistical measure such as entropy. If we simply ask how much memory is devoted to representing the object code for a set of programs, we find the statistics of Figure 9. In short, the B1700 appears to require less than half the memory needed by byte-oriented systems to represent programs. Comparisons with word-oriented systems are even more favorable.

As to memory utilization, the advantage of the B1700 is even more apparent. Consider two systems with 32KB (bytes) of main memory, one a System/3, the other a B1700. Suppose a 4KB RPG II program is running on each. If we ask how much main memory is in use, we find the comparison of Figure 10.

System     Bytes in Use   Percent   Comment
System/3   32K            100       28K is idle without multiprogramming and virtual memory.
B1700      1K             3         Assumes a 500B run structure and 500B of program and data segments.

Figure 10-Hypothetical RPG memory requirements

The utilization at any given moment may be 30 times better on the B1700 than on the System/3. At least, with all program segments in core, it is seven times better (4.5KB vs. 32KB). Even if we assume the RPG interpreter is in main memory and is not shared by other RPG jobs in the mix, the comparison varies from 6:1 to 4:1, 5KB to 8KB (vs. 32KB), 84 to 75 percent better utilization. As more and more RPG jobs become active in the mix, the effect of the interpreter diminishes, but then the comparison becomes meaningless, because other low-cost systems cannot handle so large a mix. (Note that these figures change when a different main memory size is considered, so the comparison is more an illustration of the advantage of the B1700's variable-length segments and virtual memory than of its memory utilization.) More detailed information on memory utilization may be found in Reference 6.

Running time

Although program running time is said to account for less annual cost at installations than the unquantifiable parameter we may call "ease of use", let us mention some current observations. When the B1700 interprets an RPG II program, the average S-instruction time is about 35 microseconds, compared to the System/3's 6-microsecond average instruction time. On a processor-limited application (specifically, calculating prime numbers), the identical RPG program runs in 25 seconds on a B1700 and 208 seconds on a System/3 model 10. Both systems had enough main memory to contain the complete program; only the memory and processor were used. The B1700 lease rate was 75 percent greater than the System/3's. In terms of cost, the B1700 run consumed 30¢ while the System/3 run took $1.60. In terms of instruction executions, the B1700 was 50 times faster. That is, each interpreted RPG instruction, on the average, contributed as much to the final solution as 50 System/3 machine instructions. The fact that the B1700's S-machine for RPG is 50 times more efficient than the System/3 seems to support the B1700 philosophy: interpretation of S-machines which are optimized for each application yields better performance than a general-purpose architecture. Using another set of benchmark programs (for banking applications), and another B1700 which leases for the same as the System/3 with which it was compared, throughput comparisons are again noteworthy.
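Before turning to those banking figures, the arithmetic behind the 50-to-1 claim is worth making explicit. The short check below assumes the quoted 35-microsecond and 6-microsecond averages are representative of the whole prime-number run; it is a sanity check, not additional measurement data.

    # Instructions executed in the prime-number benchmark, from the quoted
    # run times and average instruction times (assumed representative).
    b1700_s_instructions = 25 / 35e-6           # about 0.7 million S-instructions
    system3_instructions = 208 / 6e-6           # about 35 million machine instructions
    print(round(system3_instructions / b1700_s_instructions))   # prints 49, roughly the 50 quoted
    print(round(208 / 25, 1))                                   # prints 8.3, the elapsed-time ratio alone

The wall-clock ratio is only about 8:1; the factor of 50 appears when the counts of executed instructions, not the elapsed times, are compared.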
Despite defined-field design, soft-interpretation, soft I/O, multiprogramming, multiprocessing, and virtual memory, all of which supposedly trade speed for flexibility, the B1700 executes RPG programs in 50 to 75 percent of the System/3 time, and compiles them in 110 percent of the System/3 time, for the same monthly rental. In applications of this type, compilation is expected annually (monthly at worst) while execution is expected daily. (Systems used for this comparison included a multi-function card unit to read, print, and punch 96-column cards, a 132-position 300 lpm printer, a dual spindle 4400 bpi disk cartridge drive, and operator keyboard. The System/3 could read cards at 500 cpm, while the B1700 could read at 300cpm.) CONCLUSION lVIicroprogramming, firmware, user-defined operators, and special-purpose minicomputers are being touted as effective ways to increase throughput on specific applications while decreasing hardware costs. One standard system tailors itself to an installation's needs. Effective as these approaches are, they are all held back by procrustean machine architecture. Burroughs B1700 appears to eliminate inherent· structure by its defined-field and soft interpretation implementation, advancements of the state-of-the-art. Without a native machine language, the B1700 can execute every machine language well, eliminating nearly all conversion costs. Designed for language interpretation rather than general-purpose execution, the B1700 can run every programming language well, reducing problem-solving time and expense. It does not waste time or memory overcoming its own physical characteristics; it works directly on the problems. Furthermore, these innovations are available in low-cost systems that yield better price/performance ratios than conventional machinery. ACKNOWLEDGMENT Many of the design objectives were first articulated by R. S. Barton. 7 The author wishes to thank Brian Randell, R. R. Johnson, Rod Bunker, Dean Earnest and Harvey Bingham for their conscientious criticism of various drafts of this article. BIBLIOGRAPHY 1 A W BURKS H H GOLDSTINE J VON NEUMANN Preliminary discussion oj the logical design oj an electronic computing instrument A H TAUB (ed) Collected Works oj John von Neumann Vol 5 The Macmillan Co New York 1963 pp 34-79 Also in C G BELL A NEWELL Computer structures: Readings and examples McGraw-Hill Book Co 1971 pp 92-119 Design of the Burroughs B1700 2 W LONERGAN P KING Design of the B5000 system Datamation 7 5 May 1961 pp 28-32 3 S C DARDEN S B HELLER Streamline your software development Computer Decisions 2 10 October 1970 pp 29-33 4 D E KNUTH An empirical study of FORTRAN programs Software-Practice and Experience 1 2 April 1971 pp 105-134 5 W T WILNER f 1,1 1I~ (~ 497 Microprogramming environment on the Burroughs B 1700 IEEE CompCon '72 For reprints write to the author at Burroughs Corporation 6300 Hollister Avenue Goleta California 93017 6 W T WILNER Burroughs B1700 memory utilization Proc F JCC '72 this volume 7 R S BARTON Ideas for computer systems organization: A personal survey Software Engineering 1 Academic Press New York 1970 pp 7-16 An on-line two-dimensional computation system* by THOMAS G. WILLIAMS ~ System Development Corporation Santa Monica, California I i main body of the paper, describes the TAM software modules and some of the details of their operation. The fourth section draws some tentative conclusions about the usefulness and feasibility of on-line computation systems and indicates some areas for future refinement. 
INTRODUCTION ft, ',' The role of graphics in interactive man-computer systems is to extend the capability of the computer for communication in a visual mode so that men can communicate with the computer directly in the figurative notations and graphics conventions that they have developed for communication among themselves. A large number of computer-graphics systems have been developed, most of them directed toward drawing lines, curves, or shapes, as in schematic drawing,! solid and half-tone drawing, 2 and computer animation. 3 Considerable work has also been done in computer output of drawings and graphs. This paper presents the results of an exploration into a different domain of computer graphics-one in which symbols and alphanumeric characters are the primary notations, rather than lines or pictures. Such notations are used in mathematics, organic chemistry, flowcharting, and other applications. By design and evolution, they tend to exhibit the structure, organization, and nature of the problems they are designed for in a more compact and economical manner than do notation systems (e.g., programming languages) designed to express the operational steps of logic-oriented computer programs. Moreover, these notations are familiar, through education and experience, to a greater number of potential computer users than are programming languages. The experimental system described here, called The Assistant Mathematician (TAM), uses computergraphics techniques to allow the on-line use of ordinary hand-printed mathematical notation for computer programming and mathematical problem solving. The second section gives a general description of the TAM facility, with examples of its use. The third section, the GENERAL DESCRIPTION TAM is an interactive programming system for numeric computation. The TAM user language is ordinary two-dimensional mathematical notation. TAM incorporates an extensive set of arithmetic operators on constants, variables, and one- and two-dimensional arrays. It provides many common functions such as trigonometric and logarithmic functions. It also provides function definition and looping facilities for repetitive calculation. TAM operates under ADEPT, a time-sharing system developed at SDC.4 The graphics console used with TAM is a single interactive input-output surface. For input, a data tablet supplies a continuous stream of X - Y coordinate pairs representing the position of the tablet stylus. Information generated by the computer program is rear-projected on the tablet surface by a cathode-ray-tube projection system. The tablet surface is the only working area on the console; no mechanical pushbuttons or keyboards are used. Printing on the tablet surface is remarkably similar to writing with a pen on a piece of paper. The engineering and interface aspects of this device have been documented by Gallenson. 5 A user begins by printing the expression he wants the computer to evaluate. As he prints, the track of the stylus is displayed on the surface so that he can see what he has printed. (Figure 1 shows the console surface with hand-printed input.) When he has completed his input (signalled by a time-out), each input character is processed by a character-recognition program that operates under ADEPT. Two kinds of verification are then made. First, the character recognizer * This research was supported by the Advanced Research Projects Agency of the Department of Defense under Contract DARC 15-67-C-0149. 
499 500 Fall Joint Computer Conference, 1972 Figure I-Handprinted input Figure 3-Result of computation displays a computer-generated set of characters corresponding to the position and size of the user's handprinted input; from this display, the user can verify that his input has been correctly recognized, character by character. At the same time, the character recognizer's output is passed through two mathematicsstructure modules: one, the analyzer, transforms the user's two-dimensional input into a machine-interpretable linear infix statement; the other, the builder, transforms the linear statement back into two-dimensional form for display 'to the user, thus giving him the opportunity to verify that the analyzer is correctly analyzing his input. Figure 2 illustrates these operations. At this point, the user can instruct the computer to execute the expression by placing the tablet stylus over the area labeled 'TAM' (with the result shown in Figure 3), add to the expression, or make any necessary corrections. To make corrections, the user can call upon a number of editing operations. He may change a character by simply overwriting it with a new character. He may erase one or more characters by "scrubbing" over them as though he were scratching them out. He may also move groups of characters to open up space Figure 2-Results of character recognition and structure analysis Figure 4-Matrix input On-Line Two-Dimensional Computation System 501 The system handles a wide range of mathematical notation. In all cases, the notation used for displaying the results of computation corresponds to that used for input. For instance, Figure 4 shows the input of a 3 X3 matrix; Figure 5 shows the result of inverting this matrix. SYSTEM DESCRIPTION Figure 5-Matrix inversion to insert new characters, close up an expression to delete spaces, correct errors, etc. two-dimensional notation Graphics Modules 2-D-+l-D Graphics Modules , f The basic graphics modules are: a. A recognizer for hand-printed characters; b. A mathematics-structure analyzer, which converts a two-dimensional mathematical expression into an equivalent expression in linear infix form; and c. A mathematics-structure builder, which converts the linear expression produced by the analyzer into the equivalent two-dimensional form. Character Recognizer The character recognizer serves as an input device for the rest of the system. It provides the system with the processed results of a hand-printed input in a standard form-namely, the character code assigned to the in- This section discusses the basic software modules of the TAM system. TAM handles information in two equivalent forms-two-dimensionaland linear-and the system modules are distinguished into two classes, depending on the form with which they operate. Graphics modules operate with both two-dimensional and linear information and are used in the user-interface parts of the system. Language-processing modules, which handle only linear information, interpret and numerically process user requests. The overall information flow in the system is shown below: one-dimensional notation Language Processing Modules two-dimensional notation Graphics Modules l-D-+2-D one-dimensional notation put, its size, and its position. The TAM recognizer has been designed to provide a large alphabet (in excess of 120 characters) for a given user; unlike most such efforts,6,7,8 it is general in the sense that the user prints in his own style and only rarely must change his normal habits. 
This is accomplished by building a unique character dictionary for each user from samples of his own printing. Dictionary building is an interactive process, and the user may add characters or resolve ambiguities at any time. In our usage, typical character sets consist of numerals, the uppercase and lowercase Roman letters, those Greek characters that are distinct from the Roman characters, and special mathematical symbols. The character recognizer is described in Reference 11. Mathem.atics-structure analyzer The TAM mathematics-structure analyzer accepts as input a two-dimensional mathematical expression and Fall Joint Computer Conference, 1972 502 Two-Dimensional Form Linear Infix Form (X[~i./'2]+ Y[~i./'2])/(Zl~i./'2]) Figure 6-Expression representation produces a linear infix equivalent of the expression. The form of the input is that supplied by the character recognizer: a list of character codes with associated size and position information. Only the dimensional information is converted; no conversion to Polish or tree-structure form is performed. Figure 6 shows an example of a two-dimensional form and the equivalent linear form produced by the analyzer. Our reasons for not converting fully to a tree structure notation, as Anderson9 does, lie in the nature of operations on an interactive time-sharing system. In general, the user constructs his expression in stages, correcting or adding to it at successive stages. We give him the results of a "trial analysis" (in two-dimensional form) at each stage, so that he can correct analyzer errors as early as possible. By not converting to a tree structure at each stage of analysis, we improve the speed and efficiency of the analyzer. However, we do sacrifice some flexibility and analysis power in comparison to Anderson's analyzer. The analyzer accepts a wide range of mathematical notation, including subscripts and superscripts, displayed fractions, overs cores and underscores on single characters and groups of characters, the ~ and II notations for summation and multiplication, the integral sign with limits, matrix and vector notation, and combinatorial notation. Many other notations, such as those used for ordinary and partial differential equations, are combinations of these basic notational devices and can also be analyzed. However, owing to the leftto-right scan of the analyzer, some notations cannot be accepted. Examples are: 4 2 3 1 X lim n->oo Because some subscripts and superscripts precede the main character. Because the lim is handled as three distinct t3ymbols, and the n,~, and 00 will be treated as subscripts of the characters to the right of which they happen to fall. In general, however, most forms of mathematical notation in common use are acceptable and are quickly and correctly analyzed. Briefly, the analyzer operates as follows. The analyzer begins with the list of characters supplied by the recognizer, sorted into left-to-right order according to the X coordinate of the left edge of the character. The analyzer looks for a spatial relationship between a character on the input list and a reference character, which is one of the previously analyzed characters in the input list. This relationship may be either (1) one that causes the input character to be bound to the reference character as a subscript, numerator, etc., or (2) the "mainline" relation (on a roughly horizontal line) that causes the input character to become a new reference character. 
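The outer loop of such a scan is easy to picture. The sketch below is a loose rendering in Python, with invented names and a deliberately crude geometric test; it handles only the mainline, superscript, and subscript relations, and is not the analyzer's actual algorithm. The recursive treatment of subreference characters and the relation tests keyed to the kind of reference character are described next.

    def analyze(chars):
        """chars: list of (symbol, x, y, height), already sorted by the X
        coordinate of the left edge.  Returns a linear string in which raised
        and lowered characters are bound to markers instead of 2-D position."""
        out = []
        ref = None                              # current reference character
        for sym, x, y, h in chars:
            if ref is None or abs(y - ref[2]) < 0.5 * ref[3]:
                out.append(sym)                 # mainline: becomes the new reference
                ref = (sym, x, y, h)
            elif y > ref[2]:                    # raised relative to the reference
                out.append(f"^[{sym}]")         # bind as a superscript
            else:                               # lowered relative to the reference
                out.append(f"_[{sym}]")         # bind as a subscript
        return "".join(out)

    # X squared plus Y, entered as an X with a small raised 2:
    print(analyze([("X", 0, 0, 10), ("2", 8, 6, 5), ("+", 14, 0, 10), ("Y", 20, 0, 10)]))
    # prints: X^[2]+Y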
Characters bound to a reference character become subreference characters that can be used recursively in the analysis of an input character. This allows nested fractions and multiple levels of subscripts and superscripts. The kinds of relationships possible between a reference character and an input character depend upon the nature of the reference character. For instance, if the reference character is a letter, the overscore, underscore, superscript, or subscript relationships are tested. If the reference character is a horizontal bar, the numerator and denominator relationships are tested. This provides a reasonably efficient search and allows a degree of error control (for instance, digits cannot have subscripts). Reference 10 contains a full description of the analyzer. MatheIIlatics-structure builder To ensure that a user need deal only with two-dimensional notation, a means is provided in the TAM system to supply output to the user in two-dimensional notation. The mathematics-structure builder supplies this function. It accepts the linear expression generated by the analyzer and produces a clear, typeset-quality* two-dimensional expression. The builder is used for two purposes. First, it generates two-dimensional displays (e.g., 5.8 -10 9) of the results of computation, including matrix computation. Second, it feeds back to the user the results of the analyzer's operation. Since the structure modules distinguish between brackets and parentheses supplied by the user and those generated internally, displaying only those supplied by the user, the reconstruction of a correct analysis will have the same form as the original input; the form of an incorrect analysis is usually very different from that of the input. Thus, the user need only compare the two displayed forms to verify the analysis. The builder has two components: a scanner for finding, in succession, the main elements of the linear ex- * That is, the information generated by the analyzer could drive an automatic typesetting device. I On-Line Two-Dimensional Computation System , I pression and a set of element processors, one for each of the forms used in mathematical notation. The element processors accept as input an element and its related characters (e.g., a variable and its·subscripts and superscripts or a fraction bar and its numerator and denominat or) and produce the two-dimensional equivalent in terms of characters and their associated size and posi-' tion. Element processors also produce the location and size of the smallest possible rectangle surrounding the expression for each element. The scanner finds the main elements, calls the element processors, and strings the output of the element processors together; it also returns the size and location of the rectangle around the expression. The scanner and element processors are called recursively to analyze subparts of the expression. Details of the builder's operation are contained in Reference 10. Language processing and notation I The TAM graphics modules accept most of the syntactical forms of conventional mathematical notation. In order for the system as a whole to behave according to the user's expectation that notation that is accepted will be taken to mean what he means by it, the system's language-processing modules must implement the semantics indicated by syntactical conventions. This, with a few exceptions, they do. (The exceptions, such as the operations of integral and differential calculus, are signaled as errors.) 
Despite the exceptions, the set of operations that is implemented is adequate for a large number of arithmetic-computation tasks. Some of the important semantic features of the language processors are the following:

Implicit multiplication-multiplication indicated by juxtaposition of two or more variables or constants. This is a standard feature of mathematical notation; because all identifiers in TAM are single letters, implicit multiplication can be provided.

Implied data types and dimension-frees the user from having to declare, in a separate statement, the data type and/or dimension of a variable. In TAM, this information is acquired from the first use of the variable, when possible. (A dimension statement for arrays does exist and is used primarily in an optional form that allows an array to be preset or cleared.)

Universal operators-refers to the applicability of every operator to all data types for which it is meaningful. That is, any operation (such as multiplication) can apply to integers, real numbers, vectors, matrices, and any combination thereof, with no notational change or distinction.

Automatic prompting-the ability of TAM to ask for the value of any variable that is undefined at the point of first use.

Built-in functions and constants-some functions, like sine and cosine, are called explicitly by name. Others are called by the use of ordinary mathematical notation; e.g., A^-1, where A is a matrix, calls the matrix-inversion function. π and e are among the built-in constants.

With a few exceptions, the notation used in TAM is standard. One exception is absolute value, for which special bracket symbols were invented because, in United States usage, there is no real distinction between the printing of the digit 'one' and the printing of a vertical bar. Another exception is a loop control statement which, except for specific notations such as summation and product, does not exist in ordinary mathematics.

Entities

Quantities. Quantities in TAM are integers or mixed numbers. Quantities may be contained in variables or arrays or expressed as constants. Most storage declaration is implied by usage. Arrays are dimensioned either implicitly or explicitly.

Identifiers. Variables and array identifiers are single letters. (Note that this permits implicit multiplication.) The legal alphabet of TAM consists of Greek and Roman uppercase and lowercase letters. An identifier may be qualified (made unique) through the use of overscoring or underscoring; the legal overscore and underscore characters are a small set of accent marks such as the bar, tilde, and circumflex. The large character set accepted by the system gives a suitably diverse set of possible identifiers. It is usually possible, for example, to copy an equation directly out of a paper or textbook and enter it into TAM.

Operators

Quantities may be manipulated through the use of the following operators:

a+b                                    addition
a-b                                    subtraction
ab, a·b, a*b                           multiplication
a/b, or a written over b               division
superscript exponent                   exponentiation
radical sign                           nth root
!                                      factorial
bracket symbols                        absolute value, ceiling, floor
product sign with limits i=m to n      product
summation sign with limits i=m to n    summation
+b, -b                                 unary sign
superscript T                          transpose (two-dimensional arrays only)

Each operator is usable when meaningful. With few exceptions (for example, transpose applies only to two-dimensional matrices; (-3)! is signaled as an error), all operators are usable to manipulate quantities, variables, and arrays. The operators are legal when applied to arrays whenever an acceptable matrix or vector operation is defined.
One-dimensional arrays are stored and treated as row vectors, with one exception: in multiplication, if one or·both operands are vectors, the operand on the left is treated as a row vector and the operand on the right is treated as a column vector. The multiplication performed is the dot product. StateDlents There are three distinct TAM arithmetic-computation statements: assignment, function definition, and loop. There are also some built-in functions. Assignment. The assignment statement is used to set identifiable variables or arrays, presumably for use in subsequent statements. An assignment statement is of the form: identifier ~ expression The expression may consist of any legal manipulation of quantities. Function Definition. The TAM user may define frequently used arithmetic expressions as functions; he may then call upon these functions when necessary. Functions, of course, return values. The function definition and call may contain parameters. Both the function expression and the actual parameters of the call may contain calls to other functions. The function-definition statement has the form: In(PI,P2, . .• Pm) = expression where I is a legal identifier, n is an optional alpha- betic or numeric qualifier, and the Pi are optional parameters. The expression may involve any legal manipulation of quantities. The identifier I, once it has been used as a function name, defines a class of functions In and cannot be used later as a variable or array identifier. The functions in class I are distinguished from one another through the use of the qualifier n. (For example, al(X) = X2 and a 2 (X) = VX are two functions in class a; a~3 is an illegal statement; G and G are not in class a.) As many function classes as desired may be defined. The optional parameters, pi, must be legal identifiers and may be used as parameters in many function definitions and also as variables or array names or as function classes. Loop Control. A statement may be iterated by following it with loop-control information. Loops may be nested to any level, but each loop variable in the nest must be unique. Loop control may be specified in three forms: Loop Control, Form 1 Statement: i = m, ... , n where i is the loop variable (an identifier of a simple variable whose value will be incremented by one for each iteration of the statement), m is the initial value for i, and n is the terminal value for i. m and n may be any legal expressions that yield single numeric values. The iteration is complete when i exceeds n. The statement iterated may, but need not, contain references to i. Loop Control, Form 2 Statement: i=ml, m2, ... , n where i is the loop variable, ml and m2 are the first two values for i as the statement is iterated, m2-ml defines the loop increment (or decrement), and n is the terminal value. mI, m2 and n may be any legal expressions that yield single numeric values. The iteration is complete when i exceeds (or becomes less than) n. The statement iterated may, but need not, contain references to i. Loop Control, Form 3 Statement: i = ml, m2, m3, m4, ... mn where i is the loop variable and the mj are successive settings for i each time the statement is iterated. (The elipsis ( ... ) shown is not a part of the loop-control form, as it is in the two previous forms, but is included to indicate that the list I On-Line Two-Dimensional Computation System is of user-determined length.) The loop terminates after the statement has been executed for i=mn • The statement may, but need not, contain references to i. 
mj Built-In Functions. TAM includes a set of built-in functions that the user can activate by including one of the names given below (along with an appropriateparameter) within any context in which a function call is permissible. The available functions are: ~I Name and Parameter sin (x) cos (x) tan (x) cot (x) arctan (x) tan- 1 (x) In (x) Definition sine (x) cosine (x) tangent (x) cotangent (x) arctangen t (x) arctangen t (x) natural logarithm (x) In expressing the name, any combination of uppercase and lowercase Roman letters is permissible; e.g., Ln=ln=LN. The built-in functions also include those called by notational devices: square root, exponential, matrix inverse, matrix transpose, and factorial. CONCLUSION TAM is an experimental system designed and built to test the usefulness of, and problems associated with providing, natural man-machine communication in the context of problem solving by the non-programmer physical scientist or engineer. We feel that TAM has demonstrated the usefulness of man-machine communication. It provides considerable computation power in a simple, flexible way; learning how to use it requires very little time, even if one has not programmed a computer; and remembering how to use it is easy. More importantly, however, designing TAM has helped us to identify the real problems of working with natural notation. Fundamentally,' any natural notation, including mathematics, is ambiguous and context dependent. As one result of this, TAM does not contain built-in complex arithmetic capabilities, principally because it is difficult to resolve the use of the letter i as an integer (index of summation), mixed number (arbitrary variable), or ~, without reference to the global context in which it is being used. Therefore, the use of natural notation requires some solution of the ambiguity problem. Two solutions are possible. One, adopted in most programming languages, is to require explicit specification 505 by the user of the things he wants to use so that there can be no ambiguity; this specification must be provided for each new program. The other, which we are beginning to explore, is to assume that the user works for a period of time within some specific computational context, with its own defined notations, functions, and data, and that it should be possible to establish in the computer, more or less implicitly, a contextual framework within which the user works as long as he continues with a specific problem or set of problems. The user might start with a general context appropriate to his general problem area. To this he could add his notational devices and functions, much as he might do in defining notation and functions when writing a paper. A contextual framework could also provide a datahandling capability. The notions of information storage and retrieval are foreign to mathematics, and the volume of numeric data required for many useful and interesting problems is beyond the reasonable capacity for entry from a data tablet~ The context system could provide a computational capability over a defined data base, thus providing a simple way to process and reprocess data. We have, basically, just begun to explore the possibilities opened up by the freedom of two-dimensional input. We foresee that the TAM system, and others like it, are forerunners of a new capability and flexibility in natural man-computer interaction. We look forward to the day when a user, in any field, can sit down and communicate with a computer in the language of his choice or invention. 
ACKNOWLEDGMENT The continuing progress made on this project owes a great deal to many individuals for support, constructive criticism and direction. In particular, Mort Bernstein for his overall guidance; Jean 19awa, for her work on the character recognition; John McGahey, for his work on the graphics modules; and Joan Bebb and Jean Saylor, for the design and implementation of the language processor. REFERENCES 1 W R DE HAAN . A utomatic graphic schematic drawing program Proceedings Third SHARE Design Automation Workshop May 1966 2 W J BOUKNIGHT K KELLEY An algorithm for producing half-tone computer graphics presentations with shadows and movable light sources Proceedings AFIPS 1970 Spring Joint Computer Conference Vol 36 pp 1-10 506 Fall Joint Computer Conference, 1972 3 R M BAECKER Picture driven animation Proceedings AFIPS 1969 Spring Joint Computer Conference Vol 34 pp 273-288 4 R R LINDE et al The ADEPT-50 time-sharing system Proceedings AFIPS 1969. Fall Joint Conference Vol 35 pp 39-50 5 L A GALLENSON A graphic tablet display console for use under time-sharing Proceedings AFIPS 1967 Fall Joint Computer Conference Vol 31 pp 689-695 6 T L DIAMOND Devices for reading hand-written characters Proceedings Eastern Joint Computer Conference December 1957 pp 232-237 7 G F GRONER Real-time recognition of hand-printed text Proceedings AFIPS 1966 Fall Joint Computer Conference Vol 29 pp 591-602 8· W TEITLEMAN Real-time recognition of hand-drawn characters Proceedings AFIPS 1964 Fall Joint Computer Conference Vol 26 pp 559-576 9 R H ANDEHSON Syntax-directed recognition of hand-printed two-dimensional mathematics Presented at ACM 1967 Symposium on Interactive Systems for Experimental Applied Mathematics 10 M I BERSTEIN On-line, interactive parsing and programming; final report for Phase III System Development Corporation Document TM-4582 August 1970 11 M I BERNSTEIN T G WILLIAMS A two-dimensional programming system System Development Corporation Santa Monica California Information Processing 68-North-Holland Publishing Company-Amsterdam (1969) Debugging PL/I programs in the multics environment by B. L. WOLMAN Honeywell Information Systems Cambridge, Massachusetts INTRODUCTION variables which contain a segment number, a word offset, and a bit offset. There is a direct correspondence between PL/I pointers and virtual addresses in Multics; PL/I pointer values may be loaded into the addressing registers of the 645 by a single machine instruction. An attempt to use a pointer whose value is the PL/I null pointer causes a condition to be signalled. The PL/I stack is maintained for each user as a series of contiguous frames (block activation records) within a single segment. A register is dedicated by the system to point at the stack frame of the procedure being executed. Multics defines a system-wide standard call/return sequence which is relatively efficient. Stack frames can be obtained and released by executing a few instructions. Procedure segments in Multics are normally pure and sharable. Access to procedure and data segments is set by Multics access control commands and checked by the hardware at each instruction and data reference. If a user does not have appropriate access to a segment, or if any other error such as an attempt to divide by zero happens, a machine fault occurs. This fault is turned into a PL/I condition (e.g., "accessviolation" or "zerodivide") and is signalled by the PL/I condition mechanism. All but a few catastrophic errors are handled in this manner. 
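The flavor of that condition arrangement can be suggested with a small analogy. The sketch below is Python with invented names; real Multics on-units are established per block and can resume the interrupted program, which this sketch does not attempt.

    handlers = {}                      # condition name -> user-established on-unit

    def default_on_unit(condition, detail):
        # Print a diagnostic and fall back to reading commands, roughly what
        # the Multics default error on-unit does.
        print(f"Error: {condition} {detail}")
        print("ready")                 # stand-in for re-entering the command processor

    def signal(condition, detail=""):
        handlers.get(condition, default_on_unit)(condition, detail)

    signal("zerodivide", "by payroll|230")           # no on-unit: default message (names invented)
    handlers["zerodivide"] = lambda c, d: print("using 0 instead")
    signal("zerodivide", "by payroll|230")           # the user's on-unit now handles it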
Multics provides a default error on-unit which is invoked if the user has not established an on-unit for a specific condition. In most cases, the default on-unit prints an appropriate error message (which may include information as to probable causes for the error) and calls the command processor to read a command from the user's input stream. The stack chain of calls leading up to the fault is preserved; in many cases the user's program can be restarted. In Multics there is no real difference between a command and a program written by the user: both are PL/I procedures. Any program written in PL/I following command argument· conventions may be invoked as a "command". One of the popular misconceptions concerning PL/I is that programs written in PL/I are necessarily inefficient and hard to debug. Several years experience with the Multics PL/I compiler running on the Honeywell 645 has shown that in spite of the apparent complexity of the PL/I language, PL/I programs are easily debugged in the Multics environment, even by novice users who are newcomers to PL/I and are unfamiliar with the Honeywell 645. In most cases the user can debug his program symbolically without having to refer to a listing of the generated instructions or add debugging output statements to the program. This is due to a number of factors: • the run-time environment provided by the system. • the implementation of PL/I. • the availability of a variety of powerful debugging facilities. THE ENVIRONMENT The use of PL/I as the principal tool for programming by users of M ultics was envisioned at the very start of the project. Features which are required by PL/I such as a stack, pointer variables, conditions, and a recursive call/return mechanism are all provided and are directly supported by the system hardware and/or software. Consequently, the basic Multics environment is ideally suited to the needs of PL/I programs. In fact, nearly all of Multics itself is coded in PL/I and executes in this self-maintained environment. I-a The Multics system currently provides the user with a virtual address space of over 1000 segments of 65536 words each (some changes now in progress will increase the maximum size of a seg.ment to 262144 words). Access to these segments is by means of PL/I pointer 507 508 Fall Joint Computer Conference, 1972 When the user types a command line of the form edit alpha beta the Multics command processor searches a specified set of directories for a procedure named "edit" and issues the equivalent of the PL/I statement call edit("alpha", "beta"); 'rhe procedures found in the system directories are the "commands" and utility procedures normally available to Multics users. Since the user can change the search rules used by the system, he can tailor his own command set if he chooses. THE IMPLEMENTATION The implementation of PL/I in Multics is particularly complete and has few restrictions. 6.7 The only omission of any consequence is tasking. The Multics implementation allows: • arbitrary pointer qualification including chains of locators and use of functions as qualifiers. • adjustable data with no restrictions. Arrays may have any number of adjustable bounds. Structures may have any number of adjustable members. • operations on aggregates. • functions which return values whose length or bounds are not known at the time the call is made, e.g., returns(char(*» or returns«*) fixed bin). • entry variables. • recursive procedures at no extra cost. • full stream and record I/O. 
• all data types including complex and decimal. Since the implementation is so complete, the programmer does not have to worry about what features are or are not available to him. The ability to use the full language reduces the amount of code the user has to debug by increasing the amount of work handled by the run-time support system provided by the compiler. The Multics PL/I compiler produces efficient object code, even when measured against the best efforts of experienced hand coders· using assembly language. The availability of a compiler which generates efficient programs greatly reduces the user's desire to want to switch to assembly language for reasons of efficiency. This is particularly important in Multics because of the richness of the machine instruction set (512 instructions and 64 types of address modification) and the complexity of the system environment from the view point of an assembly language coder. Multics PL/I makes use of a separate "operator seg- ment" which contains assembly language coding for about 50 commonly used functions such as string moving, complex multiplication, and the index operator, as well as tables of constants for masking, shifting, storing characters, etc. This segment is shared by all PL/I programs. Communication with the operator segment is by means of a work area in a standard position in each stack frame. The operator segment is entered by a short sequence of instructions which loads certain machine registers with parameters and then jumps directly into the operator segment at a known location. The use of the operator segment reduces the cost of PL/I programs by reducing their size and by reducing paging activity. If a begin block or internal procedure block does not declare any automatic variables with adjustable bounds or sizes and can only be entered by first entering its parent block, then the block is said to be "quick". The Multics PL/I compiler does not use a separate stack frame for such blocks. Instead, they share the stack frame of their parent block. The overhead of calling a quick block, exclusive of the cost of preparing the argument list, is only three instructions: one each at call, entry, and return. The cost of a quick procedure is also reduced because automatic storage in the parent block can be addressed directly. The availability of a really inexpensive mechanism for internal procedures means that users can write them without having to concern themselves with efficiency. The artifice of using label variables and goto statements so that a block of code can be executed efficiently from a number of places is not necessary. The compiler makes no restrictions on the format of structures. This is important, since programmers can choose a structure description that is appropriate for the problem they are trying to solve without having to consider its acceptability to the compiler. However, it is possible for a user to specify a structure which causes the compiler to generate very expensive accessing code. There are a few "common sense" rules users can follow if they are concerned about the efficiency of their programs. Extensive error checking is done during compilation; there are nearly 500 possible error messages. Except for a few cases of multiple, related errors within a single statement the Multics PL/I compiler normally finds most errors in a single run. It is infrequent that a user will correct a set of source errors and recompile his program only to receive another batch of error messages. 
Errors are reported on the user's console as they are discovered; the printed message normally includes the source for the offending statement. The listing generated by the compiler is designed to be printed by a high-speed line printer but is formatted Debugging PL/I Programs in the Multics Environment so that items of interest to the user can be easily located in the listing segment by inspecting it with an on-line editor. The user can control the amount and level of detail of information placed in the listing. print blowup.pl1 blowup: procedure; dcl (j,a(10}) fixed binary, loop_index fixed binary external static, recoverY_label label variable external static, sysprlnt fi Ie; DEBUGGING FACILITIES Multics provides a number of special commands which aid user debugging. There is a powerful breakpoint debug command, a facility for tracing procedure calls, and tools which help the user determine the operating characteristics of his programs. There are several options that the user can specify when he uses the PL/I compiler to cause it to generate additional information for use by debugging commands. Of these, only the "profile" option causes any change in the code generated by the compiler. The run-time symbol table The PL/I compiler and the system debug command cooperate to allow the user to debug his program symbolically. The compiler normally generates a run-time symbol table only if "get data" or "put data" statements are used in the source program. The compiler can be instructed, however, to generate a "full" symbol table which includes all identifiers in the source program. Each entry in the run-time symbol table describes an identifier in the user's program giving its name, storage class, location, size, bounds and other information needed to access the identifier. Information is available about the block in which the identifier is defined as well as its relationship to other members of the strucutre to which it belongs. The run-time symbol table facility is much more powerful than it needs to be just to support data directed I/O. • Parameters, defined, and based variables can all be represented in the table. When a variable is declared based on a specific pointer, e.g., "dcl a based(p)", information is kept which allows the address of that pointer to be obtained at run-time. • The size, offset, bounds, multipliers, or virtual origin of any identifier can be any arbitrary expression. This is necessary for the representation of based variables. • .References to identifiers in the user's program from data directed input or from requests to the system dehugger need not be fully qualified. The same algorithm used by the compiler to resolve partially qualified names is also used by the support program which searches the run-time symbol table. 509 recoverY_label = thru; do loop_index = -1 to -100000 by -Ii j = a{loop_index}; end; put skip list("l oop index put skip; end; thru: r 2127 = ", loop_index}; 2.205 pll blowup table PL/I UAfH-l I NG 307 The variable "a" has been referenced but has never l1een set. r 2128 5.516 blowup Error: out_bounds_err by blowupllOO referencing stackl777777 r 2129 2.474 debug /b I owup/100& t, s j loop_i ndex 450 = a(loop_index}; -1209 .Q r 2131 3.544 Figure I-The PLjI condition mechanism is used for most errors, including those defined by Multics. In this example, the program generates a fault by looping until it runs off the front of the stack. 
The default error on-unit prints the location at which the fault occurred (100 in blowup) and the location being referenced (-1 in the stack). The program was compiled with a run-time symbol table, so the Multics debug command may be used to print the source for the line in which the fault happened. The request syntax accepted by debug is designed to minimize typing: the request specifies segment blowup, location 100 in the text section, and source line output. The value of a variable may be obtained merely by typing its name; the response gives the address of the variable (450 in the static data segment) as well as its value ( -1209). The run-time symbol table is generated at the end of the object segment and is shared by all users of the segment. If it is not used during execution, there is nQ overhead required to support it: the pages it occupies will not be brought into core memory; no code is required to initialize it. After the program has been debugged, the run-time symbol table can be eliminated from the object segment without having to re-compile it. The compiler will also generate a "map" of the object program when a full symbol table is requested by the user. This map is a table, placed at the end of the object segment, giving information about the location in the object segment of each source statement. Tbtl availability of this table means that the user can refer to his object program by source line number, e.g., to set a breakpoint at a specific line number. Similarly, 510 Fall Joint Computer Conference, 1972 the system debugger can tell him the line number corresponding to a given location in the object program. In fact, as is demonstrated in Figure 1, the debug command can print the source line that corresponds to the object location. Error: ou t_bounds_err by b Im/upll00 referencing stackl777777 r 2134 1.057 hold r 2134 .211 edm f i x.pll !)egment not found. Input. fix: procedure; The debug command The command "debug" can be invoked at any time; for example, after an error condition has been signalled for which no on-unit exists. It may also be called directly from the user's program. It accepts requests from the user for actions such as examining some location in the virtual memory or printing a trace of the chain of calls in the user's stack. It is aware of the different PL/I data types, so variables in the object program may be displayed in the format appropriate to their type. When a program has been compiled with a run-time symbol table, the user can refer to it symbolically, either with identifiers defined in the program or by the line number on which a statement begins. For example, if the user's program was dealing with a two-dimensional based array of integers, he could change one of the elements in the array by entering the request p~x(i+5, blowup j -2) =3 which takes the form of a PL/I style assignment. The addresses of "p", "x", "i", and "j" would be obtained from the symbol table. Any of the identifiers in this example could be part of a structure. The debug command can also be used with PL/I programs when a run-time symbol table is not available. In this case, the user must refer to the compilation listing of his program in order to determine the location at which a variable is stored or at which a given statement starts. The debug command has other features which let the more experienced user examine or alter the values in a machine register or display the status of the machine at the time a fault occurred. 
These facilities are not normally needed if a symbol table is available. The debug command also lets the user set conditional or unconditional breakpoints in object segments. When the breakpoint instruction is executed, the debug program gains control. If the condition associated with the breakpoint is satisfied, a message is printed; at this point the user can enter requests to debug. One of the actions available is to continue execution from the point of the break. The user may associate with each break a set of debug requests which are to be automatically executed whenever the break is encountered; thus, for example, the user might use the break mechanism to "insert" a (very simple) PL/I assignment state- dcl loop_index fixed binary external static, recovery_label label variable external static; loop_index = 12345; goto recovery_label; end; Ed It. q r 2135 1.789 pll fix Pl/I r 2135 3.732 fix loop index = r 2135 1. 724 12345 Figure 2-When a fault occurs, the complete status of the executing program may be preserved. The "hold" command causes Multics to retain the chain of stack frames (block activation records) up to the current frame until the user issues an explicit "release" command. In this example, the user inputs and compiles a small procedure to fix up the loop index that caused the bounds violation in the example of Figure 1. The program blowup is reactivated by a non-local transfer of control to the external label variable and completes normally. The same change of the loop index and re-start of blowup could also be done using only the debug command. . ment into his program. There is a mode of execution available with debug which lets the user run his program one PL/I statement at a time. An object program may have more than one break set in it; similarly, more than one program may have active breakpoints. Facilities are available in debug for listing and altering breaks. Setting a break involves changing the object program, so breakpoints remain active until explicitly removed by the user. Breakpoints should not be used when other users are sharing the segment. There is an "escape" facility which causes debug to pass the line typed by the user to the Multics command processor instead of treating it as a request. This is a very powerful feature since it allows the user to invoke any series of Multics commands (or any of his own programs) without having to leave the debug command. He could, for example, run a special program to display· the values of the static variables used by the program he is trying to debug. If he did not have such a program, he could input it, compile it, and test it while Debugging PL/I Programs in the Multics Environment preserving the complete status of the program he was originally debugging. The ability to "escape" back to the full Multics system to execute any series of commands is generally available in any command such as the editor that interacts with the user. As is shown in Figure 2, the "hold" command may be used to preserve the execution environment after a fault. The trace command The command "trace" lets the user monitor all calls to a specified set of external procedures. Trace modifies the standard Multics procedure linkage mechanism so that whenever control enters or leaves one of the procedures specified by the user, a debugging procedure is invoked. The arguments given to the debugging procedure by trace enable it to obtain the values of the arguments and return point of the procedure being called. 
The user can also provide his own debugging procedures instead of the one supplied as a default by the tracing package. 511 The action taken by the default trace debugging procedure is to print a message on the user's console whenever control enters or leaves one of the procedures being traced. There are a number of options which the user can specify to request such actions as printing the arguments (at entry, exit, or both) or stopping (at entry, exit, or both). The user can control the frequency with which the tracing message is printed, e.g., every 100 calls after the 1000th call. He can also specify the maximum recursion depth he wishes to see. The user can also request that the tracing message be printed only if the contents of some specified location in the virtual memory has changed. The default trace debugging procedure "stops" the execution of the user's program by calling the debug command; this makes all of the facilities of debug available to the user. An example of the use of trace is presented in Figure 3. The user may start tracing a procedure at any time, even if it has already been executed. Tracing may be removed at any time; subsequent calls of the procedure will execute normally. Any procedure which uses the standard Multics calling sequence may be traced without interfering with other users who may be sharing the segment. print (trev rev).pll trey: proc(string); dcl string char(*) unal, rev entry(char(*» returns(char02) varying); put skip lIst(rev(strlng»; put skip; end; rev: proc(strlng) returns(char02) varying); dcl string char(*); i = index(strlng," II); if i = 0 then return(string); else return(rev(substr(strlng, I» II " " II substr(strlng, 1, I»; end; r 2131 4.164 trey "now is the time" Fatal error. Process has terminated. Uut of bounds fault on user's stack. flew process created. r 2131 3.712 trace rev r 2131 .578 trev"now is the time" Call 1 of rev from trevl1l7 ARli 1 = "no.l is the time" Call 2 of rev from revll06 ARG 1 = " is the time" Call 3 of rev from revll0li ARG 1 = "·is the time" QUIT r 2132 2.428 Figure 3-The flow of control in to and out of any external procedure may be monitored with the Multics debugging procedure trace. In this example, trev is a driver program which calls procedure rev to reverse the words in a string specified by the user when trev is called. rev is coded as a recursive procedure; it contains a bug which causes infinite recursion. The "fatal error" occurs when there is no room left in the stack segment for a new frame. The reason for the infinite recursion becomes obvious when trace is used. Determining program efficiency The two debugging packages debug and trace which we have just discussed help the user find errors which prevent his program from running properly. There is another class of errors which are much harder to find. These are usually flaws in the program design (or perhaps in its implementation) which cause the program to run correctly but to take much longer to execute than it should. Simply locating the largest statement in the program or the biggest procedure is not sufficient to locate the causes of program inefficiency because that statement or procedure may be executed only once; the real offender may be some small statement which gets executed very frequently. Without detailed knowledge of program flow during execution, instruction counts alone are not much good. 
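The bug in Figure 3 is worth restating in a form that can be run directly. The sketch below is Python, not the paper's PL/I, and it makes the same slip: the tail handed to the recursive call still begins with the blank, so, exactly as the trace output shows, the argument never shrinks.

    def rev(s, depth=0):
        """Reverse the words of s.  The print mimics the messages of Figure 3;
        'depth' exists only to keep this demonstration finite."""
        print(f"Call {depth + 1} of rev  ARG 1 = {s!r}")
        if depth >= 3:
            raise RuntimeError("the argument never shrinks, so the recursion cannot terminate")
        i = s.find(" ")
        if i == -1:
            return s
        # Same slip as the paper's rev: the tail passed to the recursive call
        # still begins with the blank.  The fix is to pass s[i + 1:] instead.
        return rev(s[i:], depth + 1) + " " + s[:i]

    try:
        rev("now is the time")
    except RuntimeError as error:
        print(error)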
The cost of executing a specified procedure, either for a single call or a total of many calls, can be determined by using the "meter" option of the trace command. This causes trace to read the system clock when control enters or leaves the specified set of procedures. The clock counts in microsecond steps, so high resolution is possible.

Once a procedure has been found to be inefficient, its operating characteristics can be examined by recompiling it with the PL/I "profile" option.8 This option causes the compiler to generate in the internal static data area a table which contains an entry for each statement in the source program; the table entry contains information about the source line as well as a counter which starts out as zero. Each statement in the program is modified to start with an instruction to add one to the counter associated with the statement. After running a program compiled with the "profile" option, the user can determine the number of times each statement in the program was executed. The table entry contains the raw cost of the statement measured in instructions, so the user can determine both the absolute total cost for the statement as well as its relative cost compared to other statements. A number of different tools have been developed for presenting the information available in the profile table. Figure 4 shows the source for a small procedure printed by a program which computes the percentage of the total time spent in each statement. Figure 5 shows the same profile information presented in another format.

Figure 4-The execution profile of a Shell sort routine after having sorted the descending sequence 999, 998, ..., 0 into ascending order. Each statement is labelled with the percentage of the total execution time spent in that statement. The profile tells us that the algorithm is quite good since unnecessary interchanges were not often done.

Figure 5-Another presentation of the execution profile of the procedure shown in Figure 4. The cost is measured in number of instructions executed.
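A per-statement counter scheme of the kind the "profile" option compiles into the object program can be sketched briefly. The sketch below is not the Multics implementation; it is an illustrative Python fragment that counts executed source lines with the interpreter's standard settrace hook while a small Shell sort (chosen to echo Figure 4) runs, and then prints the counts. The raw per-statement instruction costs recorded in the real profile table are omitted here.

import sys

counts = {}

def count_lines(frame, event, arg):
    # The interpreter reports each executed line; incrementing a counter here
    # plays the role of the extra instruction planted in front of every statement.
    if event == "line":
        counts[frame.f_lineno] = counts.get(frame.f_lineno, 0) + 1
    return count_lines

def shell_sort(x):
    d = len(x)
    while d > 1:
        d //= 2
        for i in range(d, len(x)):
            j = i
            while j >= d and x[j - d] > x[j]:
                x[j - d], x[j] = x[j], x[j - d]
                j -= d
    return x

sys.settrace(count_lines)
shell_sort(list(range(999, -1, -1)))    # sort the descending sequence 999, 998, ..., 0
sys.settrace(None)

for line_number, times in sorted(counts.items()):
    print(line_number, times)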
The paging characteristics of a program can be measured by using the "page trace" facility. The Multics paging mechanism maintains a buffer for each user in which the system records the segment number, page number, and time of occurrence for each of the last few hundred page faults taken by the user's process. A command is available which formats the information kept by the system.

DIFFICULTIES

As might be expected, there are problems associated with debugging PL/I programs in Multics. Most of these problems are minor and have the effect of requiring the user to know more about the internal workings of Multics than he might otherwise have to know. The most difficult problem occurs when a program in the user's process commits an error so severe that the system cannot continue running the process. An example of such an error is using up the entire stack segment (perhaps because of unlimited recursion). When the system detects an error of this magnitude, it prints a message such as:

Fatal Process Error. Out of bounds fault on user's stack.

and creates a new process, thereby erasing all information about the old process. This type of error can be very difficult to find, because no information is available to the user about where it occurred. Future versions of Multics will alleviate this problem by allowing the user to retain information about the old process. The system will also be changed to detect when the user is near the end of his stack; when this occurs, a special "stack" condition will be signalled.

COMPARISON WITH OTHER WORK

PL/C9 and the IBM Checkout Compiler10 are approaches to the problem of debugging PL/I programs in which a special compiler is used during the debugging phase. Extra checking is done at run-time to catch programming errors such as the use of undefined variables. No particular effort is made to generate good object code since it is assumed that the program will be re-compiled with a production compiler after having been debugged with the special compiler. An advantage of this approach is that a great deal of information about the original source program may be preserved at run-time, thus allowing good diagnostics. A debugging compiler can often check for errors whose detection would be intolerably expensive for a production compiler, e.g., a mismatch between a based variable and the object identified by the pointer value. The Checkout Compiler allows the user to make incremental symbolic additions to his program, a very desirable feature. A disadvantage of using a special compiler is that two compilers are involved in the debugging process and therefore two sets of compiler bugs. Another disadvantage is that meaningful figures on program performance are hard to obtain.

Multics provides a single PL/I compiler which is used by all programmers, whether novice or expert. Extra checking (other than that defined as part of the PL/I language) is not done at run-time. The run-time symbol table and the map of the object program let the user refer to his program symbolically. Since a production compiler is being used, accurate figures on program performance are available. A "program" in Multics often consists of a number of separately compiled procedures; the Multics PL/I compiler, for example, consists of 181 procedures comprising over 137,000 instructions. Because of the poor run-time performance normally available with a special debugging compiler, it is doubtful whether such a large collection of procedures could be successfully implemented using a debugging compiler. Since a special compilation is not required for their use, the Multics debugging tools debug and trace may be successfully used in finding bugs in production software. Even if a module could be re-compiled with a debugging compiler, the resulting object program would not be the same as the one which failed.

EXDAMS11 is a powerful debugging tool which uses a pre-processor to modify the original source program before compilation. Calls to special monitoring procedures are inserted at points of interest in the program.
During execution a record is kept of the complete execution history of the program. This allows the programmer to easily determine the point at which a given variable changes, for example. This sort of debugger would be useful, even in Multics, when a program is first being debugged; its usefulness is limited by the fact that a special compilation is required. Evans and Darley12 discuss source language debugging of higher-level languages. They present a number of principles which they believe are important. The Multics debugging commands satisfy most of their criteria: 1. The user has flexible control over the execution of his program. The program may be run in steps which range from a single procedure call, through a single statement, down to a single instruction. 2. The data being operated on may be examined and altered at any time and this may be done in the PLjI notation. 3. The conventions of the debugging language are to a large extent designed to minimize typing. (It is only fair to point out that the Multics debug command has been accused of being overly terse.) The area in which Multics falls short of the features desired by Evans and Darley is the lack of the facility for incremental compilation. ACKNOWLEDGMENTS The Multics PLjI compiler was designed and implemented by R. A. Freiburghouse, the author, G. D. Chang, and J. D. Mills; significant contributions were also made by P. A. Belmont, P. A. Green, and A. C. Franklin. The Multics debug command was written by S. H. Webber. The trace command was written by the author. Many other members of the Honeywell and M. I. T. staffs, notably M. B. Weaver, D. Bricklin, and D. P. Reed, have made important contributions to easing the process of debugging PLjI programs in Multics. REFERENCES 1 E I ORGANICK The multics system.~ An examination of its structure MIT Press Cambridge Massachusetts 1972 2 A BENSOUSSAN C T CLINGEN R C DALEY The multics virtual memory: Concepts and design Comm ACM 15 5 May 1972 pp 308-318 514 Fall Joint Computer Conference, 1972 3 R C DALEY J B DENNIS Virtual memory, processes and sharing in multics Comm ACM 11 5 May 1968 pp 306-312 4 F J CORBATO J H SALTZER C T CLINGEN M ultics-The first seven years AFIPS Conf Proc 40 1972 SJCC AFIPS Press 1972 pp 571-583 5 M ultics programmers' manual Honeywell Document AG90-93 1972 6 R A FREIBURGHOUSE The multics PLjI compiler AFIPS Conf Proc 35 1969 FJCC AFIPS Press 1969 pp 187-199 7 R A FREIBURGHOUSE The multics PLjI language Honeywell Document AG94 1972 8 DE KNUTH 9 10 11 12 An empirical study of Fortran programs Stanford University Computer Science Department Report CS-186 H L MORGAN R A WAGNER PLjC:-The design of a high-performance compiler for PLjI AFIPS Conf Proc 38 1971 SJCC AFIPS Press 1971 pp 503-510 IBM Systemj360 operating system: PLjI checkout compiler IBM form number GC33-0003 1971 R M BALZER EXDAMS-EXtendable Debugging and Monitoring System AFIPS Conf Proc 34 1969 SJCC AFIPS Press 1969 pp 567-580 T G EVANS D L DARLEY On-line debugging techniques: A survey AFIPS Conf Proc 291966 FJCC AFIPS Press 1966 pp 37-50 Data structures in the extensible programming language AEPL * by E. MILGROM** New York University New York, New York and J. KATZENELSON*** Technion-Israel Institute of Technology Haifa, Israel of the area. Many extensible language schemes have been described in detail. 
2- 5,7 ,9,10,16,19 At the present time, we do not believe that the existing extensible languages can reasonably claim to replace all existing general-purpose and special purpose languages, mainly for reasons of efficiency. We concede therefore that the usefulness of AEPL will be greatest for application areas which do not warrant the cost of a specially written compiler and where the matter of efficiency is relatively unimportant. Another possible use of AEPL is during the design phase of a new application language: AEPL provides a rapid and cheap way to experi~ent with different versions of a proposed language. We believe that the major innovations present in AEPL are the treatment of sets, used to create data structures and to define new data types, and the use of a powerful syntax description mechanism derived from the Markov Algorithm. We think also that most of the power of the system stems from its particular architecture and the concept of a special machine or processor which embodies the semantics of the language. In what follows, we give a description of the data structure concepts of AEPL and we show how these concepts are used to create complex data structures and new types of data elements. A complete description of the language can be found elsewhere. 15 The next section of this article presents some of the design objectives of AEPL; the following one describes in general terms the overall model of the AEPL system. Finally, the last section discusses the data structure concepts and the semantics of the data definition facility. INTRODUCTION The extensible programming language AEPL has been designed as a tool for the implementation of a large class of problem-oriented languages or languages for specific applications. The reason for such a goal is that we believe that there exist numerous areas of human interest generating problems which can be solved with the aid of a computer. We think also that to be able to approach these problems using languages which are close to the terminology and the methodology of the respective areas 'is a significant advantage: it enables a user to think in familiar terms and it liberates him from the burden of extraneous detail. This has been the reason for the uneconomic proliferation of a large number of programming languages, each more or less well adapted to the solution of a particular class of problems (see for instance Sammet's book18 for a survey of a number of problem-oriented languages). Extensible languages propose to cover wide areas· of application at lesser cost and greater convenience. A detailed description. of a large number of current extensible languages and systems can be found in·a report by Solntseff,21 together with an extensive bibliography * Based in part on a thesis submitted by the first author to the Faculty of Electrical Engineering of the Technion in partial fulfillment of the requirements for the degree of Doctor of Science in Technology. ** Presently with the Department of Applied Mathematics, University of Louvain, 3030 Heverlee, Belgium. *** On sabbatical leave at the Department of Electrical Engineering and. Computer Sciences, University of California, Berkeley, California 94720. 515 516 Fall Joint Computer Conference, 1972 GENERAL DESIGN OBJECTIVES During the design phase of AEPL, we tried to remain consistent with a number of general concepts and ideas which we discuss in this section. 
Extensibility The three main aspects of extensibility which we set out to provide were the ability to define new types of data items and new operations on old or new data items and the possibility to modify extensively the syntactic frame of the language. The AEPL system was designed so as to present itself to the users as a language, sometimes called core or kernel language, which includes a number of basic data types, a number of operators for these data types and a syntactic frame within which one can describe sequences of operations on data, i.e., programs. The core language includes also the tools which enable one to modify these basic constituents and create 'extended languages'. Note, however, that the adjective 'extended' does not necessarily imply addition of features to the core language: one can use the extension mechanisms to produce a language which is less rich than the kernel by deletion of undesired features. Minimality In the design of an extensible language, one is tempted to limit the number of primitive language features to the bare minimum and to rely on extensibility for the creation of useful languages from the original core language. While the precise definition of a minimum set of features is a problem in itself, it is clear that the emphasis on minimality leads to kernel languages which are so primitive and involuted that their use is difficult: they have to be drastically extended in order to be of any practical use. The design of AEPL is a compromise between a desire to keep the number of features of the kernel as low as possible and the requirement that the language be a fairly convenient programming tool. Generality and completeness Rather than to emphasize minimality, our approach has been to try to limit the number of primitive concepts, not the number of built-in language features. For that purpose, we tried to isolate a few very general ideas regarding data structures and syntax and to implement them in a language which would respect the concept of completeness as expressed by Reynolds :17 any value or class of values which is permitted in some context of the language should be permissible in any other meaningful context. This makes the langauge very regular: the number of special cases and particular conventions is greatly reduced. We believe that this is an important feature for an extensible language, since it reduces the number of possible inadvertent violations of the language rules. THE MODEL The AEPL system is composed of three parts: • a core language, • a processor, • a translator. 1. The AEPL core language is a relatively small language which resembles Algol 60 in the sense that it includes a number of basic expression and statement forms (including declarations) and that the name-scoping of its variables is governed by an Algol-like block-structure. It differs from Algol 60 in the following aspects: • the primitive data items manipulated in AEPL are not the integer numbers, real numbers, arrays, etc., of Algol 60, but socalled t-values and objects as described below; • the AEPL core language contains a data definition facility which enables the user to define and manipulate new data structures; • the AEPL core language includes a number of facilities for modifying its own translator, thereby allowing an extensive syntactic variability. 2. The AEPL processor is a machine which operates on data structures .of a particular kind, namely executable data structures called programs. 
Programs may be created and operated upon by the user in the same way as any other data structures. Programs are distinguished only by the fact that if the AEPL processor is applied upon them or, more precisely, if control is transferred to a program data structure, a number of actions will be performed by the processor. The AEPL processor recognizes 63 different kinds of programs, i.e., the processor is a machine with a repertoire of 63 different instruc- I I Data Structures in Extensible Programming Language AEPL tions. It is possible to combine a number of such programs into a compound data structure; control can then be transferred to this structure and the processor will then execute the different actions specified by the individual programs in a well-dB fined order. 3. The AEPL translator is a program for the AEPL processor whose purpose is to transform an input string of characters into another data structure according to the rules of a special kind of grammar. At certain points of the translation , control may be transferred from the translator program to certain parts of the generated structure, thereby yielding "execution" of the transformed text by the processor. The AEPL translator is composed of a lexical scan and a parsing phase. The parser consists of a parsing algorithm derived from the Markov Algorithm 8 ,11-13 and a modifiable grammar which 'drives' the algorithm.. The source text submitted by a user may contain statements whose execution affects the grammar by addition or deletion of rules. This feature is used to modify the syntax of the language: one may add new operators, new kinds of expressions, new types of statements dynamically' it is also possible to redefine (overload) or delete existing language structures. 4. In conclusion, one may view the AEPL system as consisting of a program (the translator) executed on a special machine (the processor). The translator transforms the input into several data structures. A certain number of those data structures can be interpreted as instructions for the processor and control can be transferred to them. If the input contains the appropriate command, the execution of the corresponding data structures by the processor will modify the translator: the language will have been extended. The translator program is present in the memory of the processor together with the generated data structures unless those have been deleted by specific commands. Thus, at any instant of time, the "run time environment" of a user's program consists of the whole AEPL system augmented by the programs which were executed in the past and the data structures resulting from the execution of those programs. This approach is similar to that of languages such as LISP and BALM.9 Since the processor is implemented conceptually as a program executed on an existing computer, the AEPL system can be considered to be interpretative. 517 DATA STRUCTURES Principles One of our aims has been to create in AEPL a simple general data definition and manipulation facility WhiCh would allow us to handle a wide class of data structures. This facility should be powerful in order t? ena?le the user to define complex data organizatIOns; It should however be simple enough to understand and to use. This last point required that the data structure facility be baSed on a small number of wellchosen primitives. 
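Before going further into the data structure primitives, the central idea of the model just described, that a program is itself a data structure and that "running" it means transferring control to it so that the processor walks the structure, can be illustrated with a sketch. This is not AEPL; it is a Python sketch with an invented two-instruction repertoire, an invented layout for program structures, a compound form that executes its parts in order, and arbitrary values for A and B.

# Each "program" is a plain data structure: an operator name followed by its operands.
env = {"A": 7, "B": 8, "C": None}

def execute(program):
    op = program[0]
    if op == "add":                       # ("add", x, y, dest): dest := x + y
        _, x, y, dest = program
        env[dest] = env[x] + env[y]
    elif op == "print":                   # ("print", x): display the value of x
        print(env[program[1]])
    elif op == "seq":                     # ("seq", p1, p2, ...): run the parts in order
        for part in program[1:]:
            execute(part)
    else:
        raise ValueError("unknown instruction: " + str(op))

# Transferring control to the structure below stores A + B in C and prints it;
# with the arbitrary values above the result is the integer 15.
execute(("seq", ("add", "A", "B", "C"), ("print", "C")))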
Another design decision which has been made regarding AEPL is the total separation between data structures as conceptual organizations of data and storage structures or representations of data structures in memory. At present, the user is provided with a flexible data structure manipulation system, but he has ~o control over the way the structures are represented m memory. I t is clear that an algorithm can be specified and checked out for logical flaws without reference to memory representations. Indeed, when a complex algorithm is designed, it is common practice to clear the main issues and to avoid excessive detail by specifying the ?ata structures first and postponing decisions regardIng memory structures to a later stage. On the other hand, it is certain that the efficiency of any algorithm depends on the memory representations of the data structures. Therefore, in its current form, AEPL is a tool which is useful in the first stage of the design of algorithms. Using AEPL, one can verify and debug an algorithm in terms of its logic rather than in terms of its storage structures. Mter the debugging phase, however, it may be necessary to modify the default storage structures in order to increase the efficiency of the algorithm. At this stage, it is certainly easier to experiment with new storage structures, since one is at least almost certain that the logic of the algorithm is correct. A complete programming system such as the one we aimed. at should also provide means for controlling and .checkmg the memory representations. This requires an l,mplementation specification language which would allow the specification of storage structures by addition of statements to a program rather than by the modification of the program. This idea is not new: it has been proposed by Balzer,! Schwartz,2 and Earley 6 among others. bu~ Basic data elements There are two kinds of data elements in AEPL: tvalues (terminal values) and objects. Both kinds are strongly interrelated. 518 Fall Joint Computer Conference, 1972 -t-values are entities which can be used as values (in the sense described below) of attributes of objects. Examples of t-values are integer numbers, character-strings, sets of integer numbers. -Objects are entities to which six t-values are associated in the following way: we say that an object possesses six attributes, named respectively: name-, value-, mode-, type-, scope-, and rule-attribute. Each attribute may possess a value, which is necessarily a t-value. If an attribute of an object possesses no value at some point in time, it is said that its value is undefined. It is possible to inquire about the value of any attribute of any object, and to modify that value. Another way of looking at this would be to say that one object describes particular relationships between the six t-values which are the values of its attributes. The nature of these relationships will be explained below. References A reference is a t-value which designates an object in a unique way. One of the ways to gain access to the attributes of an object is by using a reference to that object. We do not concern ourselves with the implementation of such references: the important fact is that for every reference t-value there exists one and only one object which is referred to by that t-value. The reference concept is a generalization of the pointer concept which does not imply any particular implementation. 
Sets T-values The AEPL system provides the following kinds of t-values: -atomic t-values: integers, reals, characterstrings, labels and references; -compound t-values or, in our terminology, sets: -explicit sets or E-sets, -conceptual sets: C-sets, R-sets, P-sets, U-sets, I-sets, F-sets and primitive sets. The primitive sets are: -the set -the set -the set -the set -the set primitive sets are primitive classes. The term "class" is used for a set which specifies the "kind" or "type" (in the Algol 60 sense) of a t-value. The primitive classes are available in the kernel language; other classes can be formed by means of the extension facili~ ties. In fact, any set can be used in AEPL to define a class of t-values (see below). Among the five primitive classes, only the class of reference t-values needs further explanation. of of of of of all all all all all integer t-values, real t-values, character-string t-values, label t-values, reference t-values. Although the term "set" is used, the concept is not in every case the same as the one used in mathematics. Some of the AEPL sets are ordered and may contain the same element many times; other sets (e.g., the primitive sets) correspond precisely to the mathematical notion of set: an unordered collection of distinct elements. Classes of t-values The set of all integer t-values is sometimes called the class of all integer t-values; similarly, the other As mentioned above, a set, in AEPL, is a collection of t-values which is itself a t-value. Sets are used: -to create aggregates of t-values, -to define new classes of t-values. AEPL distinguishes between two kinds of sets: explicit sets and conceptual sets. An explicit set is a finite ordered collection of t-values which are effectively present in the system. Such sets correspond to the usual programming concepts of vector, list or sequence. An example is the explicit set composed of the integer t-values one, two and three, in that order. A conceptual set is a collection of t-values which is defined implicitly. It may be finite or infinite, ordered or unordered. Such a set is defined by a predicate: it consists of all the t-values for which the predicate is true. In mathematical notation: {x I P(x)} An example is the set of all integer t-values, or the set of all character strings beginning with the letter A, or the set of all prime numbers smaller than 100. Sets are described in greater detail below. Other classes of t-values There are a number of classes of t-values which are not primitive classes, but which are used within the Data Structures in Extensible Programming Language AEPL translator program. Because of the model described above, the data structures of the translator are accessible to the user. Among other structures, the translator for the core AEPL uses a number of classes, called built-in classes, which define domains of t-values which may be of interest to the user: these classes are built in terms of the primitive classes in the same way that user-defined classes are constructed. Among these builtin classes is the class of all identifier t-values (character-strings beginning with a letter and containing only letters or digits) and the class of program t-values (explicit sets which can be interpreted as commands to the AEPL processor). Objects and their attributes Objects are entities to which six t-values are associated: one object describes specific relationships between these t-values, which are said to be the values of the attributes of that object. 
We describe here the roles of the attributes of an object. ' The attributes of an object The name-attribute The value of the name-attribute of an object or, for short, the name of an object, is a t-value belonging to the class of identifiers: it may be used to refer to an object in the same way as a reference t-value. An identifier is thus associated with an object through the nameattribute of that object. Many objects may have the same identifiers as value of their name-attribute, but at every point in time a given identifier may be used to refer to only one of these objects. The choice of the object which is referred to by a given identifier is governed by the name scoping rules which depend on the block structure of the text submitted to the translator. The value-attribute I The value of the value-attribute of an object or, again for short, the value of an object is a t-value whose class is defined by the mode-attribute of the object (see below). This attribute is closely related to the usual concept of value of a constant or of a variable in other programming languages. The mode-attribute I: 'I: 1,,\1 " The value of the mode-attribute of an object a is a reference to an object {3 whose value is a set of t- 519 values to which the value of a belongs. The value of {3 thus defines the domain of the values of a or their class. For short, we say that {3 is the mode of a or that object a possesses mode {3. The reason for the existence of the mode-attribute is simply to allow the association of a meaning with the internal representation of the value of an object. The mode of an object a will indeed indicate whether the value of a is an integer t-value, a reference, a set, and so on. The corresponding Algol 60 concept is that of type of a variable; the name "mode" has been chosen because of the similarity with the Algol 68 22 idea. Another purpose for the mode-attribute is its use, similar to that of syntactic type, during the parsing process. The modes of the objects involved in the parsing play indeed an important role in the selection of the grammar rules which must be applied to transform the input string into the parse tree. The type-attribute The type-attribute of an object can possess two values which indicate whether the object' is a variable or a constant. An object is variable if the set of possible values for that object contains more than one element; otherwise it is constant. Clearly, one could. indicate that an object a is constant by having its mode be a reference to an object {3 whose value is a set with one elem~nt, namely the value of a. However, it is usually preferable not to use this device; it is more appropriate to distinguish between a variable object whose value is the integer t-value seven and a constant object whose value is the integer seven by means of the type-attribute. The mode-attributes of these two objects could then both be a reference to an object whose value is the set of all integer t-values. The scope-attribute The scope-attribute of an object can possess three values denoted GLOBAL, LOCAL and DUMMY which define the scope of the relationship between the object and the identifier which is the value of its name-attribute.15 ,16 The rule-attribute The purpose of this attribute is related to the generation of program t-values by the parsing process. 15 ,16 The value of the rule-attribute belongs to the built-in class of program t-values. 520 Fall Joint Computer Conference, 1972 Primitive objects To all primitive classes correspond built-in primitive objects. 
We thus have an object whose name is INT and whose value is the class of all integer t-values. The mode of this object has to indicate that its value is a primitive class: this is achieved by having the mode of object INT be a reference to a special object known to the system as the object whose name is PRIMITIVE and whose value is the set of all primitive classes. The value of the mode of PRIMITIVE is undefined. (The program which operates on the data structures recognizes the name PRIMITIVE.)

An example

Figure 1 illustrates, through a schematic representation, the relationships among three objects. The object A, i.e., the object whose name-attribute has the identifier A as value, has as other attributes:

-the value is the integer t-value twenty-seven (the dotted line is used to indicate this),
-the mode is a reference to the object INT,
-the type is the t-value indicating that A is a variable,
-the scope is the t-value indicating that the association of the identifier A with this object is global,
-the rule is irrelevant (its value is either undefined or not important in this context).

Figure 1-The objects A, INT and PRIMITIVE
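The three objects of Figure 1 can also be transcribed into a short sketch. This is only an illustration in Python, not AEPL: Python object references stand in for reference t-values, the class of all integer t-values is represented by a descriptive string since it cannot be enumerated, and the attribute spellings follow the description above.

class Obj:
    # A data element carrying the six attributes described above.
    def __init__(self, name=None, value=None, mode=None,
                 type_=None, scope=None, rule=None):
        self.name = name      # identifier associated with the object
        self.value = value    # a t-value, or None for "undefined"
        self.mode = mode      # reference to the object defining the class of the value
        self.type = type_     # "VARIABLE" or "CONSTANT"
        self.scope = scope    # "GLOBAL", "LOCAL" or "DUMMY"
        self.rule = rule      # used during parsing; irrelevant here

PRIMITIVE = Obj(name="PRIMITIVE",
                value={"INT", "REAL", "CHAR", "LABEL", "REFERENCE"},
                type_="CONSTANT", scope="GLOBAL")
INT = Obj(name="INT", value="the class of all integer t-values",
          mode=PRIMITIVE, type_="CONSTANT", scope="GLOBAL")
A = Obj(name="A", value=27, mode=INT, type_="VARIABLE", scope="GLOBAL")

assert A.mode is INT and INT.mode is PRIMITIVE and PRIMITIVE.mode is None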
Sets-detailed description

Explicit sets

An explicit set or E-set is a finite ordered collection of t-values. It corresponds to the usual notions of vector, list or sequence. Every member of such a collection is called a component. E-sets may be used to create aggregates of data or to define new classes of t-values. The basic operations on E-sets are:

-selection of a component by ordinal position or by name (retrieval or storage of a value),
-addition or removal of a component,
-selection of a subset,
-test for membership,
-finding the number of components,
-concatenation of two E-sets.

The language possesses a notation for constant E-sets, e.g., E{1,2,'ABC',E{3,4}}, which denotes an E-set of four components, the fourth of which is itself an E-set with two components.

One can define other operations on E-sets in terms of these basic operations by means of the syntactic extensibility mechanism of the language. If one wishes, for instance, to introduce unordered finite sets of distinct elements, one can do so easily by representing these sets as E-sets and by ignoring the ordering relation among the components. At least two basic operations must however be redefined:

-the test for equality between two sets must ignore the ordering,
-the addition of an element to a set must verify that the element is not yet a member of that set.

Other operations on unordered sets (union, intersection, power set, and so on) can then be written in terms of the basic operations. Unordered sets of ordered pairs may be used, as in SETL,20 to represent mappings; functional application can then be easily defined for such mappings.

E-sets can be used to define new classes of t-values by enumeration. For instance, the set E{'1','3','5','7','9'} could be used to define the class of odd digits. Similarly, the set E{1,2,3,5,7,11,13} could be used to define the class of prime integers smaller than 15.
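A rough model of these ideas can be given in a few lines. The sketch below is Python, not AEPL: it maps the E{...} notation onto Python lists (ordered, possibly nested, possibly containing duplicates), shows some of the basic operations, and shows how equality and element addition might be redefined to obtain unordered sets of distinct elements. Python counts positions from zero rather than by ordinal position.

# E{1,2,'ABC',E{3,4}} modelled as a nested list:
e = [1, 2, "ABC", [3, 4]]

print(e[3])                 # selection of a component by position -> [3, 4]
print(len(e))               # number of components -> 4
print("ABC" in e)           # test for membership -> True
print(e + [5])              # concatenation of two E-sets

# A class defined by enumeration: membership is simply a search.
odd_digits = ["1", "3", "5", "7", "9"]
print("7" in odd_digits)    # True

# Redefined operations for unordered sets of distinct elements.
def set_equal(a, b):
    return sorted(map(str, a)) == sorted(map(str, b))   # ordering ignored

def add_element(s, x):
    return s if x in s else s + [x]                      # no duplicates admitted

print(set_equal([1, 2, 3], [3, 1, 2]))   # True
print(add_element([1, 2, 3], 2))         # [1, 2, 3]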
Conceptual sets

A conceptual set is a set defined by a predicate. Such a set is not present in the system under the form of a collection of t-values: it is present purely by convention as the set (in the mathematical sense) of t-values for which the predicate is true. A conceptual set is thus a collection of t-values defined by a certain common property. These sets are represented in AEPL by descriptions of the properties of their elements rather than by a list of their elements. Since such a description is usually composed of several elements, AEPL represents a description by an E-set. The description of a conceptual set may be stored in the value-attribute of an object; the mode of that object will indicate that its value may be interpreted as the description of a conceptual set.

The primitive sets are conceptual sets corresponding to the primitive classes of AEPL: they exist in the system as the values of the primitive objects INT, REAL, CHAR, LABEL and REFERENCE. Other conceptual sets belong to one of the following categories: C-sets, R-sets, P-sets, U-sets, I-sets and F-sets. The reason why there is more than one kind of conceptual set besides the primitive sets is simply one of ease of programming: it is not always convenient to represent a set by a general predicate; certain particular cases deserve special treatment. The basic operation involving conceptual sets is the test for membership.

C-sets

According to the functions of the attributes described above, if an object α has an E-set as value (i.e., as value of its value-attribute), then the mode of α should be a reference to an object β whose value is a set of E-sets, namely the class to which the value of α belongs. This class may be defined by a C-set: a C-set is indeed a set of E-sets. Its description is composed of the following five t-values:

Number-type is a t-value which indicates whether the number of components of the E-sets which belong to this C-set is variable or constant.

Number is a t-value which is the number of components of the E-sets which belong to this C-set if this number is constant (examine number-type to find this out); otherwise, this t-value is a reference to a Boolean function of two arguments: an integer t-value n and a reference to an object α whose value belongs to the class of E-sets described by this C-set. The function returns the value true if and only if the integer n is a permitted value for the number of components of the value of α.

Component-type is a t-value which indicates whether the E-sets belonging to this C-set are homogeneous or not. A homogeneous E-set is one whose components belong to the same class.

Component-class is a t-value which defines the class of every component of any E-set belonging to this C-set in the following way. If the E-sets are homogeneous (examine component-type to find this out), then component-class is a reference to an object whose value is the class to which all the components belong. If the E-sets are nonhomogeneous, then this t-value is a reference to a function of two arguments: an integer t-value n and a reference to an object α whose value belongs to the class of E-sets described by this C-set. The result of this function is a reference to an object whose value is the class to which the nth component of the value of α belongs.

Names is either undefined or a reference to a function of two arguments, an identifier id and a reference to an object α whose value belongs to the class of E-sets described by this C-set. The function returns an integer t-value n which is the ordinal position of the component of the value of α whose name is id. If no such component is found, then the function returns zero.

Figure 2 schematizes an example in which the object PAIR has, as its value, the set of all E-sets with two integer components which are unnamed. The values of number-type (constant), number (2), component-type (constant), component-class (a reference to INT) and of names (undefined) define this by convention. The value of object X (a pair of integers) belongs to the class defined by PAIR, so the mode of X is a reference to PAIR. We wish to point out here that the schematic representation of an E-set as shown in Figure 2 does not imply any particular implementation.

Figure 2-A possible data structure for an object X whose value is a pair of integers

Another example of a class of E-sets can be found in Figure 3, which illustrates the structure of program t-values. A program t-value is a particular kind of E-set which can be interpreted by the AEPL processor as a command to perform some actions. The class of program t-values is predefined in the AEPL system;16 the example of Figure 3 shows a program t-value composed of two components: an operator and an E-set of three references to objects A, B and C. The "execution" of this structure by the AEPL processor will cause the sum of the values of objects A and B (namely the integer t-value 15) to be stored as the value of the value-attribute of object C.

Figure 3-A program t-value and its structure

Other conceptual sets

In order to shorten this presentation, we shall not give the complete descriptions of the components of the other kinds of conceptual sets here. We shall limit ourselves to an informal description of the properties of the conceptual sets.

R-sets (restriction sets)

R-sets are used to impose a restriction on the elements of another set. This restriction takes the form of a Boolean function which specifies which t-values are members of the restricted set. In mathematical notation: {x ∈ S | P(x)}. Examples of sets defined by means of R-sets:

-the set of all positive integers,
-the set of all prime integers,
-the set of all prime integers smaller than 15,
-the set of all character-strings beginning with the letter A.

P-sets (property sets)

This kind of conceptual set is similar to the class of R-sets; it defines a subset S' of a given set S by distinguishing a specific property. This property need not, however, be expressed as a predicate; the user must nevertheless specify how the property is to be used to distinguish the elements of the subset. This involves modifying the membership operation to perform the appropriate actions when testing for membership in S'. P-sets are thus an escape mechanism enabling the user to design different kinds of conceptual sets.

U-sets and I-sets (union and intersection sets)

These conceptual sets make it possible to define classes of t-values as unions or intersections of other sets. Examples of sets defined in such a way:

-the set of all integer and real t-values,
-the set of all prime integers smaller than 15.

F-sets (file sets)

These sets are used to define input-output sequential files.
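The predicate view of a conceptual set can be sketched directly. The sketch below is Python, not AEPL; make_r_set and the particular predicates are invented names used to show that membership in an R-set is a test of the form {x ∈ S | P(x)} rather than a search of stored elements, and that the same class could instead be given by enumeration as an E-set.

def make_r_set(base_contains, predicate):
    # Membership in the restricted set is a test, not a search:
    # x belongs to the set when it is in the base set S and P(x) holds.
    def contains(x):
        return base_contains(x) and predicate(x)
    return contains

def is_integer(x):          # stands in for the primitive set INT
    return isinstance(x, int)

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

primes_under_15 = make_r_set(is_integer, lambda n: is_prime(n) and n < 15)
print(primes_under_15(13), primes_under_15(15))     # True False

# The same class given by enumeration as an E-set:
e_set_primes_under_15 = [2, 3, 5, 7, 11, 13]
print(13 in e_set_primes_under_15)                  # True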
CONCLUSION

In this article, we have presented an overview of the main features of the AEPL system. We have discussed in detail the data structure concepts which form the basis for the data definition facility of AEPL. Using these concepts, a user can create, in a straightforward manner, new kinds of data items and aggregates of data. Other features of the system enable the user to define new operators for any kind of data items and to create new language structures such as statement forms. Our experience with the language has been limited to pen-and-paper coding since the system is not yet implemented. A language for the creation and manipulation of linear graphs obtained by extension of the kernel AEPL is described in detail elsewhere.15 In our opinion, this and other examples show the feasibility and the usefulness of the approach described in this paper.

ACKNOWLEDGMENTS

We would like to express our gratitude to the following people for the numerous and fruitful discussions we had with them: Prof. Y. Wallach and Dr. E. Kantorowitz of the Technion, Prof. J. Feldman of Stanford University, Profs. J. C. Boussard and M. Griffiths of the University of Grenoble and Messrs. S. Schuman and P. Jorrand of the IBM Scientific Center in Grenoble.

REFERENCES

1 R M BALZER Dataless programming Proc AFIPS 1967 FJCC pp 535-544
2 J R BELL The design of a minimal expandable computer language Doctoral dissertation Stanford University 1968
3 T E CHEATHAM Jr The introduction of definitional facilities into higher level languages Proc AFIPS 1966 FJCC pp 623-637
4 T E CHEATHAM Jr A FISHER P JORRAND On the basis for ELF-An Extensible Language Facility
Proc AFIPS 1968 FJCC pp 937-948 5 C CHRISTENSEN C J SHAW (eds) Proceedings of the Extensible Language Symposium SIGPLAN Notices 4 (Aug 1969) pp 1-62 6 J EARLEY Toward an understanding of data structures Comm ACM 14 10 (Oct 1971) pp 617-627 7 B A GALLER A J PERLIS A proposal for definitions in Algol Comm ACM 104 (April 1967) pp 204-219 8 B A GALLER A J PERLIS A view of programming languages Addison-Wesley Publ Co Reading Mass 1970 523 9 M C HARRISON BALM-An extendable list-processing language Proc AFIPS 1970 SJCC pp 507-511 10 E T IRONS Experience with an extensible language Comm ACM 13 1 (Jan 1970) pp 31-40 11 J KATZENELSON The Modified Markov Algorithm as a language parser-linear bounds J of Systems and Computer Sciences to be published 12 J KATZENELSON E MILGROM The Markov Algorithm as a language parser In preparation 13 A A MARKOV Theory of algorithms Academy of Sciences of the USSR 1954 English translation by Israel Program for Scientific translations 14 M D McILROY Macroinstruction extension of compiler languages Comm ACM 3 4 (Apri11960) pp 214-220 15 E MILGROM Design of an extensible progammming language Doctoral dissertation Technion-Israel Institute of Technology 1971 16 E MILGROM J KATZENELSON AEPL-An Extensible Programming Language In preparation 17 J C REYNOLDS GEDANKEN-A simple typeless language based on the principle of completeness and the reference concept Comm ACM 13 5 (May 1970) pp 308-318 18 J E SAMMET Programming languages: history and fundamentals Prentice-Hall Co Englewood Cliffs N J 1969 19 S A SCHUMAN (ed) Proceedings of the International Symposium on Extensible Programming Languages Grenoble 1971 SIGPLAN Notices 6 12 (Dec 1971) 20 J T SCHWARTZ Abstract algorithms and a set-theoretic language for their expression Prelimin~ry draft Courant Institute of Mathematical Sciences New York University New York N Y 1970-72 21 N SOLNTSEFF A YEZERSKY A survey of extensible programming languages Computer Science Tech Report No 71/7 MacMaster University Hamilton Ontario 1971 22 A VAN WIJNGAARDEN (ed) Report on the Algorithmic Language ALGOL 68 Numerische Mathematik 14 (1969) pp 79-218 The universal consulting language alias-The investment analysis language by CAROLE A. DMYTRYSHAK Bankers Trust Company New York, New York ment Science, Operations Research, Financial Analysis and Planning or Corporate Planning. Regardless of the formal title, the composition of the group and the problems it is asked to solve are of the same structure. The group is composed of highly paid people who have advanced degrees in business administration, operations research, management science, economics or some other related field. The problems they are presented with generally have the same set of characteristics: INTRODUCTION IAL (Investment Analysis Language) is a computer language which can be used to generate economic forecasts, develop data bases with complicated list structures, analyze results from psychological tests or compare alternative investments. Unfortunately, the label "Investment Analysis Language" has really limited the number of people who have considered using the language. IAL has been developed as a tool for a group of internal consultants in order that they can solve these problems quickly taking into consideration the types of problems they are asked to solve, the tools available and their own talents. Unfortunately, since IAL was developed for a bank, the label "investment analysis" was used in its name. A more appropriate title would be the UOL, Universal Consulting Language. 
Regardless of the industry, regressions, statistics and adaptive forecasting are performed in the same manner. 1 am convinced that the use of the Investment Analysis Language is a good way of getting the maximum output from a group of internal consultants (i.e., Management Science, Financial Analysis or Operations Research Group) in the shortest period of time. To illustrate my point, I will present the history of IAL, discuss the need for developing such a language, explain the characteristics of IAL and why they are essential to an efficient operation. A specific example of the use of IAL at Bankers Trust, the support the language receives from the American Bankers Association and future uses of the language, will clarify the major arguments. 1. They must be solved quickly. For example, an analysis to look at the effect of a change in the prime lending rate must be completed and all the financial implications reported to the chief lending officer of a bank before he gets impatient and makes a decision without the benefit of the knowledge gained from the study. Because of this need for faster solutions, time-shared computer systems are one of the tools used by the group. 2. A request to solve the problem is often a onetime assignment. For instance, a study on leasing 747's to an airline is needed only once then the leasing group is out looking for new clients. 3. When performing the analysis, many factors must be taken into consideration and these factors tend to make the problem highly technical and complicated. When, looking at a leasing problem, we must take into consideration tax rates, depreciation schedules, reinvestment rates and the resale value of the equipment. It is im:possible for even the most skilled analyst to take all these factors into account without the use of a computer. 4. The programs developed for the analysis must be flexible. Often, after working on a problem, the results will lead to other questions. To answer these questions another degree of so- THE NEED FOR IAL The problems Most corporations concentrate a collection of bright young men and women in departments such as Manage- 525 526 Fall Joint Computer Conference, 1972 phistication must be added to the program and quickly. 5. Similar types of analysis require the same basic operations to be performed on different data. To analyze financial deals one always performs future value or present value calculations. As can be seen by these characteristics, it is essential that problems be solved quickly. Yet these very characteristics make it twice as hard to solve the problem. The organization of the group and the tools used by the group are key factors in achieving fast throughput. Previou8 8olution8 There are three basic structures which have been followed in organizing a group of internal consultants. First, and most expensive, is an all-analyst staff attempting to do their own programming as well as analysis. This is expensive both in terms of salaries and in utilization of talent: analysts are generally paid more than programmers but are not efficient programmers. N or are analysts interested in investing much of their time in extensive programming tasks. The other end of the spectrum is an all programmer staff to perform analysis as well as programming. The only point in favor of this method is that programmers are not as highly paid as analysts. 
Programmers are often pressured into giving the user just what he requested in terms of printouts but in most cases are not permitted to make the in-depth analysis required to solve the problem. The middle of the road, and most common approach, is to have programmers and analysts work together on the same problem. This method does work and appears very efficient at first glance. However, it breaks down because the analysts and programmers communicate and operate on two distinct levels. Communication is always a problem whenever professionals from two diverse fields are assigned to the same projects. When the analyst and programmer finally do find a basis for communication, they may be creating an unnecessarily complex and unique solution. The analyst's main concern is structuring the analysis to cover any contingency. He knows that he is working with an expert who has mastered the computer, and, therefore, he has no hesitation in requesting revisions to a program and adding another degree of complication to the study with very little consideration for the marginal cost of the revision. If he were told, "We can make those changes but it will take three days of reprogramming" he might not find the revisions quite as necessary. On the other hand, the programmer is anxious to please and may not object to adding another degree of complexity to an already complex problem. In most cases, the end result is a monster program that has taken several weeks or months to develop and is not general enough to handle anything but the immediate problem. A new 8olution In 1968, David M. Ahlers became the head of the Management Science Group at Bankers Trust (BTCO) and he brought with him the basic modules for the IAL system. The idea and the modules for the language had been developed over a period of several years while Mr. Ahlers had been studying and teaching at CarnegieMellon University. Mr. Ahlers wanted to be able to teach financial concepts as applied to real problems without falling into the trap of teaching programming. He needed a computer tool that was easy to understand and use. When consulting, he could not afford to start from scratch and program every phase of an assignment. In order to save time in future projects he began to save the routines he had written for performing those calculations which . were common to most projects. Since he had to work on several different types of computers his routines had to be easily transferred and basically, hardware independent. These needs led to the development of IAL. Since the problems faced when teaching and consulting are very similar to· those of Bankers Trust's Management Science Division it was decided to use IAL ·for most projects. Revisions were made to the language and it was installed on three different time-sharing systems used by Bankers Trust. The user's manual was completed and later. the language leased to The American Bankers Association for distribution to commercial time-sharing vendors. DESCRIPTION OF IAL IAL is a computer-based language consisting of over 60 functions which can be called on to perform analysis in areas ranging from time value of money to adaptive forecasting. The language is structured so that users of different degrees of proficiency in any area are able to use it and in some cases to become quite sophisticated with increased use of IAL. In the following discussion the hypothetical user is never termed "programmer." 
The principal user should be an analyst with specialization in business administration, economics, finance or some other related field. IAL has been designed in such a way that the user does not have to know how to program, nor spend a great deal of time learning how to use an interactive terminal. The analyst will find it faster to specify his needs directly, using IAL, rather than calling on the services of a programmer. Because of this tool the structure of the internal consulting group will naturally evolve into a group of analysts and perhaps one or two programmers. These programmers will be assigned projects requiring expertise in the computer field.

Based on FORTRAN

IAL is written in a subset of FORTRAN. This subset is composed of the intersection of the FORTRAN languages available on commercial time-sharing systems when IAL was developed. In its present form IAL is very similar to FORTRAN II and contains the standard operations of that language. The primary aim was to be able to install the language quickly, without any major modifications, on any time-sharing system which offered FORTRAN. IAL is now provided as a service by six different time-sharing vendors. It is running on computers made by five different manufacturers. Installation has not presented many problems to date.

Blocks of commands

IAL is composed of a set of blocks and functions within those blocks. The blocks can be combined in any fashion. The functions within each block are related by the kinds of jobs they perform. For example, the time value of money (TIMEE) block contains all of the future value and present value commands, the teletype input/output (ENTEL) block contains all of the print and read commands for the teletype, and the graphing routines are in the block ENGRF. The language was divided into blocks for two reasons: the time-sharing systems at that time did not have large quantities of core available for the user, and it was felt that a user would be able to gain a better understanding of the commands if they were organized by functional area. Table I contains a list of the blocks and their functional areas.

TABLE I-Blocks of Commands

Block Name   Functional Area                                                  Number of Commands
ENTEL        Input/output of data via a terminal                                      8
QTRNK        Utility functions for statistical routines                               2
ENFIL        Save and retrieve data from permanent storage devices                    5
ENGRF        Graphing data                                                            2
TDATA        Transformation of data                                                   1
TIMEE        Time value of money calculations                                        15
INTRR        Internal rate of return computations                                     3
DEPRE        Depreciation and tax credit analysis                                     4
QALTS        Evaluation of qualitative time series (1 or 2 series)                    5
QALCP        Evaluation of qualitative time series (continuation of QALTS)            4
CAPBD        Capital budgeting and risk analysis                                      3
NUMTS        Forecasting of numerical time series                                     4
REGRS        Correlation and regression analysis                                      4
INVRS        Inversion of matrices                                                    1

Each command-a function

Each command in IAL is a FORTRAN function and the characteristics of FORTRAN functions have been incorporated into the language. The communications link between the main program and the various functions is the function name and the argument list. The function name is used for program control while the argument list is for data flow. As an example, the command RELAT was designed to determine if two data series are statistically related and, if so, what the relationship is. If the user were looking at series A and B, each with N points, he could execute the following call in his program

CALL RELAT (A, B, N)

If he wanted to take advantage of the control parameter passed through the function name he could nest RELAT in QUEST.

CALL QUEST (1, RELAT (A, B, N))

QUEST will translate these control parameters and display one of the following messages depending upon the relationship between the two series.

THE ANSWER TO QUESTION 1 IS NO

THE ANSWER TO QUESTION 1 IS YES
THE 1ST SERIES IS GREATER THAN THE 2ND

THE ANSWER TO QUESTION 1 IS YES
THE 2ND SERIES IS GREATER THAN THE 1ST

The command structure gives two advantages:

(a) The user has complete control of the information that is displayed. In many computational packages, if the user calls the regression command, the system will print out the coefficients, the intercept, the standard deviation, the residuals, the Durbin-Watson coefficient and any other statistic that captured the programmer's interest at the time the package was developed. This is wasteful since not all users understand how to correctly interpret the results. When using the regression command in IAL, all of the relevant statistics are calculated and passed to the main program through the argument list. The user then has the option to print any of the values he feels are needed for the analysis.

(b) The concentration of the I/O routines in a limited set of commands has made installation of IAL on various time-sharing systems relatively easy. It has been found that the execution of the I/O commands is the most system-dependent operation on any time-sharing system. When installing IAL on a new system, the nine I/O commands have to be modified, but the computational routines are transferred without further modifications. Flexibility was a key ingredient to early users of time-sharing systems since they were required to make frequent switches in services, for one reason or another.

TABLE II-I/O Commands

Name     Function                                                                      Call Statement*
TTY1     Enter 1 value via terminal                                                    TTY1 (ID, X)
OUT1     Print 1 value on the terminal                                                 OUT1 (ID, X)
INTTY    Enter table Y with N rows and M columns                                       INTTY (ID, Y, N, M)
NOUTT    Print table Y with N rows and M columns                                       NOUTT (ID, Y, N, M)
SAVE     Store on a permanent storage device table Y with N rows and M columns         SAVE (ID, IF, Y, N, M)
NFETCH   Retrieve from a permanent storage device table Y with N rows and M
         columns located on line R                                                     NFETCH (R, IF, Y, M*N)
QUEST    Print out the results of statistical tests in terms the user can understand   QUEST (IF, FN)

* In the command calls:
ID - the output identification number
X  - variable being displayed or entered
Y  - table being displayed or entered
N  - number of rows
M  - number of columns
IF - file number
FN - name of statistical function
R  - location where retrieving is to begin

User works at own level

IAL always allows the user to work at his own level of sophistication-to the extent that he feels at home with the language. For example, a user can call one command (SFORM) to look at a time series to determine what the forecast for the next time period is. This command will provide him with the results, and these results will be dependent upon whether he indicates that the series has trend and seasonal components. If the user does not know or cannot make an assumption as to whether the trend or seasonal components exist, he can include in the argument list calls to the TREND and SONRA commands.

CALL SFORM (A,12,TREND(A,12),SONRA(A,12),B)

The commands will then set the necessary parameters for the SFORM command using statistical tests based on a 90 percent confidence interval.
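To make this calling convention concrete, the short sketch below (written in modern Fortran purely for illustration) mimics the pattern of CALL SFORM (A,12,TREND(A,12),SONRA(A,12),B): the value returned by a test function is passed directly into a forecasting routine, whose argument list carries the data. The routine names TRENDX and SFORMX and the tests they apply are hypothetical stand-ins, not the actual IAL commands.

! A minimal sketch of the IAL calling convention: the function value
! carries a control flag, the argument list carries the data.  TRENDX
! and SFORMX are hypothetical stand-ins, not the IAL TREND and SFORM.
program nesting_sketch
  implicit none
  real :: a(12), b(12)
  integer :: i

  ! a simple series with an obvious upward trend
  do i = 1, 12
     a(i) = 100.0 + 2.0*real(i)
  end do

  ! the result of the test function is passed straight into the
  ! forecasting routine, as in CALL SFORM(A,12,TREND(A,12),...,B)
  call sformx(a, 12, trendx(a, 12), b)
  print *, 'forecast for period 13:', b(1)

contains

  ! returns 1.0 if the series appears to drift up or down, 0.0 otherwise
  ! (a crude stand-in for a statistical test of trend)
  real function trendx(x, n)
    real, intent(in)    :: x(:)
    integer, intent(in) :: n
    real :: firsthalf, secondhalf
    firsthalf  = sum(x(1:n/2))   / real(n/2)
    secondhalf = sum(x(n/2+1:n)) / real(n - n/2)
    if (abs(secondhalf - firsthalf) > 0.05*abs(firsthalf)) then
       trendx = 1.0
    else
       trendx = 0.0
    end if
  end function trendx

  ! produces a one-period-ahead forecast; the flag chooses the model
  subroutine sformx(x, n, trendflag, out)
    real, intent(in)    :: x(:)
    integer, intent(in) :: n
    real, intent(in)    :: trendflag
    real, intent(out)   :: out(:)
    real :: slope
    if (trendflag > 0.5) then
       slope  = (x(n) - x(1)) / real(n - 1)   ! crude trend estimate
       out(1) = x(n) + slope
    else
       out(1) = sum(x(1:n)) / real(n)         ! no trend: use the mean
    end if
  end subroutine sformx

end program nesting_sketch

The same nesting idea applies to QUEST, where the returned control value is translated into a printed message rather than into a model parameter.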
If the user is more knowledgeable in statistical analysis and would like to set up a different set of confidence limits, or to know the underlying statistics used in determining the results, he is able to run the SFORM command and set up the necessary parameters himself:

CALL INTTY (1,A,12,1)
CALL SFORM (A,12,1.,1.,B)

He can also descend to another level and use the regression command which is basic to SFORM.

The use of the I/O commands varies heavily depending upon the level of sophistication of the user. The user has the option of using the commands to store individual lists of data in files or to create data structures of linked lists. For instance, in the reporting system for BTCo's Corporate Planning model, a directory system consisting of three levels is used. This directory is used for retrieving data as well as for creating the headings and subheadings used by a new report generator. Because of the careful design of the general purpose file manipulation commands, the user is given a powerful set of tools to handle his storage problems. He can use this system for basic storage of data or build a directory system for the storage and retrieval of the information.

At the core of the data storage and retrieval system are the two commands SAVE and NFETCH. Their tasks are explained by their names: SAVE stores data and NFETCH retrieves information stored by the SAVE command. If the user wishes to store a table A containing 6 rows and 3 columns in the file designated as File 2, his program would contain the command

CALL SAVE (1,2,A,6,3)

When the system executes the SAVE command in the user's program, the system will issue the message SAVE 1 FILE 2 LINE 1001. The line identification (1001.) tells the user the line number at which the header items for the save and the first data element are stored. The line number is used by the NFETCH command for retrieving the data. To retrieve the information the user's program will need the following command

CALL NFETCH (1001.,2,A,3*6)

From these two simple commands a complicated directory system can be developed. To build a simple directory of a series of 6 tables, each containing 2 rows and 5 columns, the user could write the following program:

DO 1 J=1,6
CALL INTTY (J,A,2,5)
B(J) = SAVE (J,1,A,2,5)
1 CONTINUE
CALL SAVE(7,2,B,6,1)

The user will have stored list B, containing the line numbers of his 6 saves, in File 2 and the 6 tables in File 1. The portion of a program which automatically retrieves the third data series is

CALL NFETCH (1001.,2,B,6)
CALL NFETCH (B(3),1,A,2*5)

The user who does not understand the directory concept, or does not have a problem complicated enough to warrant such sophistication, is not hampered by any involved system of control codes or options on what to use. To the more sophisticated user the system is open ended, and he may decide upon the level of complexity. The system is also open ended for the novice user, for as his skills grow he too can build more complicated and involved routines. Providing a language which gives analysts the capability of working at their maximum technical capacity, rather than forcing them to work at a level established by a programming package, is one of the best ways of getting the greatest value per dollar per analyst.
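The mechanism behind SAVE and NFETCH, returning a record location at save time and using it later as the retrieval key, can be imitated with ordinary direct-access files. The sketch below builds the same six-table directory described above; it is written in modern Fortran, and MYSAVE and MYFETCH are hypothetical toy routines, far simpler than the actual IAL commands.

! A toy analogue of the storage scheme described above: a save routine
! returns the record ("line") number at which a table was written, and
! a fetch routine uses that number to read the table back.
program directory_sketch
  implicit none
  integer, parameter :: nrow = 2, ncol = 5, ntab = 6
  real    :: a(nrow*ncol), b(ntab)
  integer :: j, k, lrec

  inquire (iolength=lrec) a            ! record length for one table
  open (unit=1, file='file1.dat', access='direct', recl=lrec, &
        form='unformatted', status='replace')

  ! store six small tables and remember where each one went
  do j = 1, ntab
     do k = 1, nrow*ncol
        a(k) = real(100*j + k)                     ! dummy data for table j
     end do
     b(j) = real(mysave(1, a, nrow*ncol, j))       ! "line number" of save j
  end do

  ! retrieve the third table using the directory list B
  call myfetch(1, int(b(3)), a, nrow*ncol)
  print *, 'first element of table 3:', a(1)       ! prints 301.0

  close (1)

contains

  ! write M values of X as record REC of unit IU and return REC,
  ! mimicking the way IAL's SAVE hands back a line number
  integer function mysave(iu, x, m, rec)
    integer, intent(in) :: iu, m, rec
    real,    intent(in) :: x(m)
    write (iu, rec=rec) x
    mysave = rec
  end function mysave

  ! read M values into X from record REC of unit IU
  subroutine myfetch(iu, rec, x, m)
    integer, intent(in)  :: iu, rec, m
    real,    intent(out) :: x(m)
    read (iu, rec=rec) x
  end subroutine myfetch

end program directory_sketch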
Open-endedness

The report generating command just added to the system points up another characteristic of the system. The language has no apparent limit to the number of commands which can be incorporated into it. If a user finds that he is performing certain computations frequently, there is no reason why he cannot write a function to perform these computations and use it in conjunction with IAL. At Bankers Trust we have added sets of commands for calculating bond prices, yields and coupons. The IAL user's manual has been designed so that a user can incorporate documentation for the commands he creates. This provides a central library for all routines used by the group.

Function names

The name of each IAL command is related to the task the command performs. Table III shows some of the commands and their related functions. This method of naming functions helps the user analyze his problem and understand the operations he performs, instead of issuing a series of numerical codes as in the case of the early statistical routines. It also provides a means of instant documentation.

TABLE III-Function Names Related to Job

Function Name   Operation
GRAPH           Produce a graph containing up to 8 data series
PV              Calculate the present value of a number
TREND           Test a data series to determine if a trend exists
RELAT           Determine if two data series are statistically related
GROW            Compute growth rates
UTEST           Perform a Mann-Whitney U Test on a set of data

EXAMPLES OF THE USE OF IAL

A forecasting problem

One of the most common uses for IAL is in the building of forecasting models and tracking the output of these models. An economist wishes to take the monthly values of an index of business activity for the last 3 years, devise a model to describe past performance and forecast quarterly activity for the next four quarters, and analyze these forecasts. To do this he has decided to write two programs. The first will

1 accept the data
2 graph the data
3 devise a model and
4 store the model data on a permanent file

The following program when executed will answer his needs.

CALL NFILE(1,'MODEL ')
CALL INTTY(1,B,0,36)
CALL NGL (B,1,36,2,0)
CALL GRAPH(1,1,6)
CALL OUT1 (2,SFORM (B,36,TREND (B,36),SONRA(B,36),A))
CALL NOUTT(3,A,10,1)
CALL SAVE(4,1,A,10,1)
STOP
END

When the program is run, the following actions appear on the terminal:

USING BTC IAL
EXECUTION BEGINS...
TTY INPUT 1
36 FREE FORM VALUES
?271.,278.,282.,288.,289.,283.,288.,288.
?291.,281.,277.,279.,293.,295.,300.,302.
?297.,307.,304.,312.,306.,297.,291.,298.
?307.,306.,310.,309.,304.,305.,310.,305.
?308.,300.,298.,307.

[The terminal then prints a graph of the 36 monthly values (ranging from about 271 to 312) produced by the GRAPH command, RESULT 2 (the one-period forecast, 311.96), RESULT 3 (the ten-element model table, which includes the level 311.11 and the trend 0.85), and the message SAVE 4 LINE 1001 FILE 1.]

From the output we see that 311.96 is the forecast for the next month. The equation which describes the behavior of this series is

A(t) = 311.11 + .85*t

The second program the analyst writes is to revise his original model each month as new data is collected.
He will

1 read in the new observation
2 forecast 1 period ahead
3 then forecast 3, 6, 9 and 12 periods ahead
4 print out the forecasts as well as the upper and lower confidence limits for each forecast

CALL NFILE1 (1,'MODEL ')
CALL NFETCH(1001.,1,A,10)
CALL OUT1 (2,FORM1 (A,1.,TTY1 (1),EI))
CALL OUT1 (3,FORMT (A,3.,FH1,FL1))
CALL OUT1 (4,FH1)
CALL OUT1 (5,FL1)
CALL OUT1 (6,FORMT (A,6.,FH2,FL2))
CALL OUT1 (7,FH2)
CALL OUT1 (8,FL2)
CALL OUT1 (9,FORMT (A,9.,FH3,FL3))
CALL OUT1 (10,FH3)
CALL OUT1 (11,FL3)
CALL OUT1 (12,FORMT (A,12.,FH4,FL4))
CALL OUT1 (13,FH4)
CALL OUT1 (14,FL4)
CALL SAVE(1,1,A,10)
STOP
END

When the program is run with a new observation of 318 the following results appeared on the terminal:

USING BTC IAL
EXECUTION BEGINS...
TTY INPUT 1 IS 318.
RESULT 2 IS 314.43   <- next period forecast
RESULT 3 IS 316.34   <- forecast 3 periods ahead
RESULT 4 IS 323.07   )
RESULT 5 IS 309.61   ) confidence limits
RESULT 6 IS 319.21   <- forecast 6 periods ahead
RESULT 7 IS 326.99
RESULT 8 IS 311.42
RESULT 9 IS 322.07
RESULT 10 IS 330.93
RESULT 11 IS 313.22
RESULT 12 IS 324.94
RESULT 13 IS 334.89
RESULT 14 IS 314.99
SAVE 1 LINE 1041 FILE 1

The corporate planning model

The most ambitious undertaking using IAL was the development of the Corporate Planning System at Bankers Trust. The Management Science Division was transferred from the Computer Research and Development Department to the Corporate Planning Task Force in the Office of the Chairman. Our mission was to gather operating data from various sources in the bank, make forecasts and develop a model to be used in the Corporate Planning process. This system had to take into consideration various economic environments, operating characteristics of the bank, possible investment strategies and internal policies. Through this process, a set of expense and income guidelines was developed on a departmental level, covering the next sixteen quarters.

To accomplish this task many people with varied talents were needed. In order to spread the workload, the work was divided into several small projects with a project coordinator to ensure compatibility of the separate projects. Figure 6 shows how each project fit into the Corporate Planning system. Each project will be discussed separately.

Forecast interest rates and spreads

One of the roles of the Economics Department is to serve as the official forecaster of certain key rates for the bank.

[Figure 1-Forecast interest rates and spreads: five key rates are expanded into the full set of rates and spreads, checked for consistency, and stored in a rate and spread file.]

An analyst was assigned to generate the rates needed to drive the planning model and to act as liaison between the Economics Department and the Task Force. Five key rates were generated by the Economics Department and then the analyst expanded them to provide the rates needed for the model. The analyst had to check that these rates were economically consistent with the rest of the bank's forecasts, and generate the spreads between rates. IAL's forecasting routines were used to generate the rates, the regression routines were used to check the relationships between rates, and the I/O routines were used for storing these rates and spreads so that they could be accessed by the model.
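The kind of consistency check mentioned above, relating one rate series to another, amounts to an ordinary least-squares fit. The sketch below shows the textbook calculation on two made-up rate series; it is a generic illustration only, not the IAL REGRS routine, and every figure in it is hypothetical.

! Ordinary least-squares fit of one interest rate series on another,
! as a generic stand-in for the consistency check described above.
program rate_check
  implicit none
  integer, parameter :: n = 8
  ! hypothetical quarterly values for two related interest rates (%)
  real :: prime(n)  = (/ 5.00, 5.25, 5.25, 5.50, 5.75, 6.00, 6.00, 6.25 /)
  real :: cdrate(n) = (/ 4.60, 4.85, 4.90, 5.10, 5.30, 5.60, 5.55, 5.80 /)
  real :: sx, sy, sxx, sxy, syy, slope, intercept, r

  sx  = sum(prime);         sy  = sum(cdrate)
  sxx = sum(prime*prime);   syy = sum(cdrate*cdrate)
  sxy = sum(prime*cdrate)

  slope     = (n*sxy - sx*sy) / (n*sxx - sx*sx)
  intercept = (sy - slope*sx) / n
  r         = (n*sxy - sx*sy) / sqrt((n*sxx - sx*sx)*(n*syy - sy*sy))

  print '(a,f6.3)', ' slope       = ', slope
  print '(a,f6.3)', ' intercept   = ', intercept
  print '(a,f6.3)', ' correlation = ', r
end program rate_check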
Link into the bank' 8 commercial loan system Although our model was to be run on an outside timesharing system, we needed large quantities of data from the bank's in-house commercial loan system. This system had been written in COBOL five years earlier and would have required many hours of modification before we could get our output directly from the operational system. Rather than spending an extensive amount of time on this problem, our solution was to have a program written in PL/l to read the commercial loan tapes and write the necessary records in a format which could be easily read by FORTRAN. Some of the IAL modules were installed on our in..;house batch system and then a series of IAL programs were written which simulated the loan portfolio for the next sixteen quarters, generated the proper schedules and rates for the outstanding loans and divided the loans into categories needed for the model. The data was then stored on the time-sharing system using the IAL I/O functions. 532 Fall Joint Computer Conference, 1972 ~ v-· -C> AGED LOANS & EXTRACTION PROGRAM RATE 1-1> FILE RATE AND SPREAD FILE Figure 2-Link into Bank's commercial loan system Figure 4-Build the corporate planning model Gather direct expenses and income data In order to generate salary expense tables, coefficients were needed to apply to hypothetical salary policies. The line of attack was to use a forecasting model on another time-sharing system to produce future levels for various loan and deposit types in the New York market. This was then broken down to bank and then departmental levels. Prior to this, historical loan and deposit information was gathered plus 150 time series of various activities performed within the banking operations groups. These series were the various activities that had to be performed to service the loan and deposit accounts. Using regression commands it was possible to determine the relationship between activity and account levels. The relationship between people and activity was also revealed. By starting at desired loan and deposit levels, then calculating the numbers of people needed and then finally using policy data, we could derive salary expenses. the actual programs, an analyst who understood organizational structures and the accounting conventions of the bank was an absolute must. This was not a job for a programming or forecasting type. The task involved using all of the information provided by the other members of the Task Force and generating an earnings estimate and a tax strategy for the bank. In fact, this project was worked on in parallel with all of the other projects for the system. Because all of the input data had been stored using IAL, there was no problem of compatibility. Since record layouts are standardized as a result of using IAL, given the file name , the user is able to access any data on . that .file. Since all of the forecasting has been done In prevIOUS programs IAL was needed only for data manipulation, retrieval of tables stored on files, and printing of data. Build the corporate planning model The final output from the system was a set of operating goals for the various divisions within the bank. The focal point of this effort is the Corporate Planning model. In order to design the model and to write MODEL FORECAST OF N.Y. 
LOAN & DEPOSITS BREAKDOWN INTO BANK & MARKET SHARES Generate the operating goals on a department level GOAL PEOPLE FILE ACTIVITIES TO PEOPLE Figure 3-Gather direct expense BREAKDOWNT DEPARTMENT GOALS' Figure 5-Generate departmental goals The Investment Analysis Language 533 using IAL for the last three years. This is primarily because the type of projects assigned to the department has changed radically. Prior to 1968 the group worked on short, one-time studies in various areas in the bank. After 1968 it was decided that the Management Science group should decrease the programming staff and have analysts work on a few key projects. The analysts were to do their own programming using IAL. Less programming time Figure 6-Corporate planning project This program was assigned to a professional programmer because the expertise required for this project was a knowledge of data and file structures. Although the projects assigned to the Management Science group were much more comprehensive after this change in policy, the programs used for analysis tended to be much shorter. A program composed of a series of calls to the IAL functions was usually less than a page (66 statements) in length. These short programs required less time in the debugging stage. The IAL functions had already been tested and certified so the user only had to make certain that the arguments used in the calls to the functions agreed with those of the language. Most errors occurred because the user did not read his manual carefully or because he made a mistake entering his parameters to the functions. Review of the project The· programming and forecasting portions of this project were completed in two months time. A project of this magnitude could never have been completed in this period without the use of a language such as IAL. IAL allowed us to use each analyst where his expertise would be an asset to the project and programming per se was never an issue~ A project of this size usually is slowed down because team members have to wait while others complete their portions of the project. Because of the modularity , of the major project design and the ease with which the user could store and retrieve data, data files and programs could be tested in parallel. This gave each analyst a chance to fine-tune his project and not have to rely on somebody else in order to meet his deadline. I, The most difficult portion of the project was not communications within the team but the collection of data I and learning the exact meaning of the collected data. ' Relieved of programming details, the group was able to II devote more time to the real analysis side of the project. I I I, BENEFITS GAINED FROM USING IAL It is impossible to put a dollar value on the savings ', that the Management Science group has realized by Cut time-sharing costs The department's time-sharing expenses had been steadily increasing until the installation of IAL. Our expenses were cut by % after the advent of IAL. Thi~ included the decrease in terminal rental as well as a decrease in the cost for using commercial time-sharing systems. Decreased need for documentation Every analyst was forced to use IAL and as a result, to become familiar with the language. He was able to read any program written in the language~ This benefited the group in two ways. There was little need for extensive documentation of a program and the only docu~entation required was a list of data to be provided by the user and a list of the results. 
With this, any analyst could pick up a project or work on the programs without extensive orientation. As a result, the analyst had more time to spend on the structure of a problem. He did not have to worry about communicating his vague suspicions on how the program should be structured. He was on his own and in most cases loved it.

Aids in communication

One of the biggest advantages to be gained from using IAL is that everybody is speaking the same language. IAL is actually enforcing a set of definitional and computational standards on the user group. As an example, all users will be calculating depreciation schedules, internal rates of return and present values using the same methods. The benefits gained from a common language can be spread throughout the organization as well as within an individual group. At Bankers Trust the Management Science Division, Economics Division, BT Consultants (financial consultants) and the Credit Analysis Group all use IAL. The groups are able to share programs without extensive documentation, discuss the solutions to problems and, in some cases, implement the solutions without overwhelming communications problems. In this way any organization can leverage the unique talents of various groups within it and develop company-wide projects. Today there are many good packages and languages available for various types of analysis. Packages have been designed to run regressions, to perform statistical analysis and to calculate present values. These packages are good, but IAL has the capability of operating in all of these areas. This means that every professional in a research group, regardless of his expertise, is speaking the same language. This is one of the strongest arguments for using IAL.

Extended lists

Because of the need to pass tables of various lengths and dimensions it was necessary to design the system around extended lists instead of tables. An extended list handles a table of M by N by dimensioning and storing it in a list of M*N length; in general, element (I, J) of the table is stored as element (J-1)*M+I of the list, so element (1, 2) of the table is stored in element M+1 of the list. This presents no problems to those using all IAL functions in a program. If a user wishes to mix his own code with IAL he must be very careful when moving elements in the extended lists to make sure that they correspond to the correct elements in his imaginary table.

PROBLEMS WITH THE LANGUAGE

Terminal I/O

The obvious drawbacks are those that meet the user's eye first. Until recently, the terminal I/O commands were very brief and there was no generalized report-generating capability. The language was designed as a tool for research and as such did not need options for elaborate reports. Another drawback was that the user was initially providing input on-line as his program was running. In many cases, depending on the time-sharing system, when the user accidentally typed an alphabetic character for a number he would be forced to start over and rerun the program. These two problems have been solved, first by adding a generalized report generator to the list of commands and secondly by designing a function which allows the user to enter data through the editor of the time-sharing system instead of on-line.

New users

If a potential user of IAL has learned to use BASIC or FORTRAN he usually feels that he is a pretty good programmer and is generally more of a problem than somebody who has never programmed. A person with previous experience feels he doesn't need help and the idea of using a package is pretty silly. The only way that this can be solved is to have him working with somebody using the language. He will observe how easy it becomes to use the functions and gain confidence in the language. After a while he will get a great deal of satisfaction out of being able to do the programming end of the project with such ease.

Distribution of IAL

IAL has been leased to The American Bankers Association by Mr. Ahlers. The ABA is distributing the system to commercial time-sharing vendors throughout the United States and hopefully IAL will be marketed world-wide. IAL has been installed on seven time-sharing systems and is available to any vendor who wishes to install the language, provided he goes through the prescribed certification process. This process was instituted to guarantee the integrity of the computational routines after installation on a new system. In most cases IAL is available at no extra charge to any subscriber on any one of the seven time-sharing systems. Plans are now being processed to make IAL available to the universities and it will soon be used in the graduate schools at Carnegie-Mellon University and Harvard University. After IAL becomes available as a teaching tool in the business schools, more and more young analysts will find it an efficient means of performing most of their financial analysis. The American Bankers Association has also offered week-long courses in the use of IAL and the financial principles behind the language. These courses have been attended by many members of the Management Science and Operations Research groups from all of the major banks in the United States. A users manual for the language has also been written and is distributed through the ABA.
Rijswijk (ZH), the Netherlands t· INTRODUCTION • • • • • • The Integrated Telephone Customer Information System (ITCIS) is a computer network system which was initiated by the Dutch Post,' Telepho~e and Telegraph (PTT) and PANDATA N.V., a Dutch software company partly owned by PTT, in June 1970. The initial definition study concerned the feasibility of integrating several data files each containing t~lephone customer data (Billing, Directory PreparatIOn, and Work Order Administration). In addition, there were efforts under way by a PTT research group concerning the automation of the Directory Assistance Service and, by another group, the Telephone Cable and Pair Administration. The conclusion reached in the definition study indicated that integration was not only feasible, but, that a completely integrated online system, including Cable and Pair and Directory A SSlstance . would be economically desirable. Subsequently, the Preliminary Design of ITCIS was undertaken in October 1970. Within this design stage, hardware and software elements were designed for a system based on the projected workload for 1980 when, it was estimated, there would be more than 4 ,'. million telephone customers. The applications included within the integrated system are: Directory Preparation Billing and Collections Work Order Entry and Administration Customer Services Inquiry Cable and Pair Administration and Inventory Management Information. These applications involve large batch runs coupled with high load real-time inquiry. . During the Preliminary Design effort, a number of trial configuration approaches were developed and examined to determine the most favorable approach for ITCIS. A decision matrix analysis, combined with a method for converging opinions, was used. The result of this analysis is that a .configuration approach based on a centralized data base, on-line to all administrative districts is the most advantageous. The hardware facility required to support this centralized data base designated the Central Processing Facility (CPF) , i~ a multi-processor mainframe using large capacity removable disk storage to contain the data base. Access to this shared centralized facility will be provided via dedicated communications circuits be~ tween each district and the CPF. These circuits will be terminated in each district by a small general purpose computer, designated a Computer Based Concentrator (CBC) which will act as data concentrators and remote batch terminals. There are over 150 points at which a remote device must be located. These include 13 telephone district II I I • Directory Assistance Inquiry * Formerly of PANDATA N.V. 537 538 Fall Joint Computer Conference, 1972 headquarters, 20 directory assistance operator rooms and 120 technical service areas. Each district headquarters requires remote batch facility and a minimum of one teletypewriter (TTY) and visual display unit (VDU). A cluster of up to 40 VDU's is needed in each of 20 operator rooms. Each of the 120 service areas requires a minimum of one TTY. Some generalized system software, other than that supplied by the manufacturer, will be required to support ITCIS application programs, in the areas of: HARDWARE/SOFTWARE ENVIRONMENT The CPF proposed for the final system is a UNIVAC 1110 multi-processor with three subsystems of 8440 disk storage (2.5 billion characters). Two Control units are used with each disk subsystem. It is a 2X2 system with 2 Command/Arithmetic Units (CAU) and 2 I/O Access Units (IOAU). 
Memory consists of 128K words of Main Memory (plated wire) and 262K words of Extended Core Memory. The communications interface consists of 2 Communications Terminal lVIodule Controllers (CTMC) and 8 Communications Terminal Module (CTM). A tape subsystem of 6 drives is included. Unit record I/O is handled by 2 UNIVAC 9300's with printers and card reader/punches. See Figure 1 for the CPF configuration. The remote processors (CBC) will most likely be PDP II/20's. Each PDP 11 (15 in all) will act as remote batch terminal, accepting card and paper tape input and producing printed output. The main function, however, consists of being an on-line communications multiplexor and concentrator. The configuration consists of 8K (16 bit) words of memory augmented COMMAND AR ITHMET IC UN IT COt1MAND ARITHMETIC UN IT • District computer operating system and communications • Real-time communications and processing at the CPF Detailed design and programming of the first phase of implementation of ITCIS has now begun. This phase will automate Directory Assistance (Inquiry 008) and Cable and Pair. There are several points of interest in this design effort which will be discussed in more detail. These include the hardware/software environment mentioned briefly above, some of the design techniques used, and the use of a manufacturer-implemented version of the CODASYL Data Base Task Group Report. 1 MAIN MEMORY (65K) MAIN MEMORY (65K) I/O ACCESS UN IT I/O ACCESS UN IT 16 I/O CHANNELS 16 I/O CHANNELS 8440 SUBSYSTEM 8440 SUBSYSTEM 1 2 (8 DISK UNITS) 3 (8 DISK UN ITS) UNIVAC VIII C MAG TAPE DUAL CONTROL UNIT 8440 SUBSYSTEM (6 DISK UNITS) 9300 AND 9330 SUBSYSTEMS COMM LINES TO CBC's Figure l-CPF configuration by 64K words of disk storage. Card, Print and paper tape I/O is provided, as well as interfaces for the required terminals and high speed link to the CPF. See Figure 2 for the CBC configuration. The high speed link will be leased lines (15) operating at 2400 and 4800 bps, full duplex. The communications technique used is segmented messages with acknowledgement of first, last and retransmitted segments. Cyclic error protection codes are used. There are over 600 terminals required (in 1980). Of these, 425 are Uniscope 100's for Directory Assistance Inquiry and Customer Service Inquiry. Both multistation and stand-alone ·connections are allowed with remote connections via 2400 bps lines and modems and local connections via 4800 bps cables. There are almost 200 Automatic Send/Receive (ASR) teletypewriters situated in customer service areas and connection departments for work order entry, maintenance and I Design Approach to Integrated Telephone Information ~ '" "~ >- a dedicated system but can be shared by other applications, perhaps even other real-time applications. • DMS-ll00 Is also being used in its entirety. The run time component of DMS-l100, called DMR, is presently available as a single thread program, with a reentrant version, capable of concurrent run-unit execution, to be available in June 1972. Figure 4 shows how the components of DMS-1100 are used to build a SCHEMA and an application program to manipulate the Data Base. • RTS UNIVAC Real Time Scope Handler, has been modified by PANDATA to activate the user whenever a message comes in on any communication line, to dynamically handle message buffers and to provide a general pool capability. ... . • CBC ( COMPUTER-BASED CONCENTRATOR) HODEM Figure 5 shows how both jobs and inquiries are initiated in this system. 
Figure 6 shows the interface to the data base. T ~ FULL DUPLEX H102 TRANSMISSION LINE OPERATING INITIALLY AT 2400 BAUO. WITH POSSIBLE EXPANS10N TO 4800 BAUD Figure 2-CBC configuration I 539 inquiry. The TTY's are connected to the CBC's over local leased 110 bps lines and may be connected in multi-drop or stand-alone fashion. Figure 3 shows the approximate location of terminals throughout the Netherlands. The total configuration is to be built up in a modular way over the period of 1972 to 1978 according to application implementation and system growth. The stress has been on flexibility, modularity and reliability. The emphasis in design of the software for ITCIS has also been on flexibility, modularity and reliability. The software to implement ITCIS has either been selected from available general purpose UNIVAC software, or is being developed as a joint effort of PANDATA and PTT personnel. In order to meet the requirements cited above, as much general purpose UNIVAC software as possible is being used. To qualify, the software must meet performance requirements and be available now or in the· immediate future. The software being used from UNIVAC is: o @ o • EXEC-8 The operating system in its entirety and without modification, is being used to control the CPF resources. This means that ITCIS is not TELEPHONE DISTRICT HEAD0UARTER/ DIRECTORY ASSISTANCE OPERATOR ROOM/ 6i~~~~6R~R~~SISTANCE OPERATOR ROOt1/ SERVICE AREA SERVICE AREA Figure 3-Telephone districts of the Netherlands, showing network 540 Fall Joint Computer Conference, 1972 The application software being developed for ITCIS is also indicated and explained in Figures 5 and 6. It should be noted that the scheme is open-ended enough so that new applications .can be added to the IT CIS network by simply connecting new terminals to the CBC's and adding real-time transaction processors or batch runs needed to the CPF software. DESIGN TOOLS The entire design effort has been conducted. within the framework of the PANDATA System Development Methodology (SDM) which is a standard controlling the design and development of large-scale systems. This method provides detailed sets of activities within each stage of seven stages of system development. For each activity, there are explicit steps that must be CREATE A SCHEMA TO DESCRIBE THE DATA BASE. THE WAY ANY PARTICULAR APPLICATI ON PROGRAM WOULD LIKE TO SEE IT. (DELAYED' RESPONSE UP TO ONE DAY) (REAL TIME RESPONSE) SA TeOl COLLECTS TRANSACTIONS ON A BATCH BUFFER (BY • TYPE OF TRANSACTION) OFF-LINE BATCH RUN. DOES NOT USE ITCIS DATA BASE AND RUNS SEPARAT FROM ITC IS BATCH MONITOR INITIATES BATCH JOB WHEN A BUFFER IS FULL OR AT SPECIFIED INTERVALS (TO EXEC-8 COARSE SCHEDULER Figure 5-Software environment-CPF inquiry and batch job initiation CREATE AN APPLICATION PROGRAM TO DO A PARTI CULAR JOB ON THE DATA BASE. performed which result in specific products and require specific inputs. When followed closely, the SDM has provided high quality and complete documentation, and has resulted in well controlled timely development efforts. In addition to the general design philosophy su pported by SDM for ITCIS, two Operations Research tools were used during Preliminary Design in a rather unorthodox, but effective, and therefore interesting manner. The tools themselves were not unusual at all to problems in System Design or business-Decision Theory and Simulation. 
USE OF DECISION THEORY Figure 4-Use of DMS-llOO to create a data base SCHEMA and an application program Initially, there were a wide variety of basic system approaches (both centralized and decentralized) which could have been used for ITCIS. In order to resolve such controversial and subject problems, a "payoff" Design Approach to Integrated Telephone Information OUTPUT ANSWERS TO INQUIRIES OFF-LINE BATCH (INCLUDING ALL RUNS STARTED LOCALL Y OR REMOTELY. INDEPENDENT OF ITCIS ITC I S DATA BASE IN P UT / OUTPUT I I I I I UN I VAC DMS11 00 RUN-TM PORTION INPUT/ OUTPUT TO ITC I S DATA BASE , LOCAL OUT PUT I I r~~-L-, I INOTE: I IBATCH RUNS ARE NOT STARTE 0 IIMMEDIATELY BUTI EXEC-8 PRINTER PUNCH SH1BIONT COtWLEX I~~E T~~H~~~~~~ I I~~A~~~T i~~:~~: I Gi~R+~6\:~E - 541 tification was more subjective but could sometime be related to cost. For example, the reliability of two approaches can be made equal by spending more money on one of the approaches. 5. The coordinator of the analysis (one of the authors) carried arguments to various participants. As a result the deviation of many responses was lessened. 6. The standard deviation high, low and mean figures for efficiency and value were put into the efficiency matrix and the value vector, and 3 matrix multiplications were performed, using means, high and low figures. 7. The results were clearly in favor of the two centralized approaches, although the unit proc essor and the multi-processor options were hard to distinguish. A choice was later made for the multi-processor because in the configuration required it is cheaper, and· the modular architecture of the UNIVAC 1110 (UNIVAC was the most serious contender because PTT had I I I L ______ ..J ITIME AND CORE (AN NOT BE GIVEN *' (DMR) Figure 6-Software environment-CPF inquiry and batch job processing DECISION MATRIX -l ~ CRITERIA CRITERION CRITERION CRITERION 1 2 ; CON FIGURATION APPROACH APPROACH A matrix was used in the following manner: 1. Four alternative configurations were designed, I I I varying from completely decentralized (thirteen stand-alone processing facilities, one for each telephone district in the Netherlands) to completely centralized (all computing done at a central facility). These four configurations define rows of the decision matrix (see Figure 7). 2. A number of detailed· Selection Criteria were p.stablished in the categories shown on Figure 7. 3. Value Analysis followed. Members of the ITCIS Preliminary Design Proj ect Staff and other interested parties were asked to rate the criteria according to value. 4. Efficiency analysis was conducted with a small ·group more familiar with the trial configurations; efficiencies of each approach toward meeting each criteria were established and quantified. In some cases (e.g., cost) a quantification was simple and direct-in others quan- 13 COMPUTERS. CONNECTED TO SERVICE INTERDISTRICT TELEPHONE NUMBER INnUIRIES APPROACH B EFFICIENCY OF APPROACH B TOWARD MEETING StLECTION CRITERION i SEVERAL COMPUTERS (BUT NOT 13 ) INTERCONNECTED AND SHARING A CENTRAL DATA BASE. 
BUT EACH CAPABLE OF STAND-ALONE OPERATION APPROACH C MULTIPLE UNIT PROC ES SORS AT A CENTRAL FAC I LITY WITH A REDUNDANT SHARED DATA BASE APPROACH 0 A SINGLE MULTIPROCESSOR SYSTEM (AS DESCRIBED IN PART I I) SELECTION CRITERIA CATEGORIES: COST RELIABILITY SECURITY DES IGN FACTORS ERGONOMIC FACTORS Figure 7-Decision matrix 542 Fall Joint Computer Conference, 1972 a UNIVAC 1106 already, which was to be used for development) is almost as internally redundant as completely separate computers. The Decision Theory Analysis, although still subjective in some cases, reduced subjective decision to such a low level that most participants were unable to be swayed by prejudice toward the much more encompassing (but still underlying) question of centralization vs. decentralization. MODEL SET UP AND INITIALIZATION (DONE BY OUTER ALGOL BLOCKS AND PROCESS OF SIMULA) r - - - iI TERMINATION ~~mATION, USE OF SIMULATION -, : I AND OUT PUT L _____ Design validation for a system as large and complex as ITCIS is very difficult and simulation of ITCIS was one of the primary methods used to answer design questions. Very early in Preliminary Design a GPSS-II model (GPSS-1100 was not yet available) was written of the ITCIS system, to investigate CPF throughput, line utilization, etc. It was soon found that the ITCIS model was pushing the standard version of G PSS-II to its limit. GPSS-II extended version would not run on the PTT 1106 because the core was too small and, in addition, GPSS was not suitable for such a highly interactive, complex system. Certain useful results were still obtained but more simulations were planned for detailed design to examine the behavior of the CPF under overloads, etc., as the hardware and software design become firmer. Splitting the model into submodels was considered but the entire ITCIS is very interactive due to the communications technique and the nature of Inquiry 008, i.e., multiple operator/computer interactions due to a single telephone number inquiry. Of the available UNIVAC simulation packages, systems and languages, SIMULA I (which was developed at the Norwegian Computing Center) was selected because: • It was supported as standard UNIVAC software (unlike SIMSCRIPT and GASP) • It allowed similar model structure to GPSS (i.e., flow or process oriented) In fact the SIl\IULA language is so powerful that a model can be organized in a manner very much like the required organization of other modelling systems. (SIMULA 67 was not yet available but would have been even more suitable.) As a result a special purpose simulation system was developed. (See Figure 8.) It is based in SIMULA, to be used to create models of ITCIS by changing the I ~ r - - - - - - ,I , I S IMULAT I ON OF I ~~~~~A~~ROWARE I ["" ,,,",,,,,..1J COMPUTE AND OUTPUT REPORT ,.-• I - -- --: H!l9~~ TRANSACTION :·~OCESEACH PROCESS SES. IS A DATA SET, WITH ITS oWlr TYPE, 14ESSAGE LEW,TH, RANDOM INTEGER AND INDEX INTO ITS SCHEDULE. THE SCHEDULE FOR EACH TYP:: OF TPRO TELLS WHIC!: [PRO'S ARE TO BE USED BY A TPRO AND IN WHAT SCHEDULE ORDER. TPRO' S ARE ACTIVE BETWEEN USAGE INDEX OF EPRO' SAND PASSIVE WHILE WAITING IN AN EPROQUE OR BEING PROCESSED. I I, I mg_.~!!!l~'!. ACTI~ IF NOT BUSY ';"'- . - --......:-. I EQUIPMENT PROCESSES. SIMULATE HARDWARE cor4PONENTS (CPU, DISK, ETC.) AND SOFTWARE COMPONENTS OF THE THE EPRO' S SYSTEM. ARE GENERATED AT IN ITIALIZATION IN THE MAIN ~~gHss (UPPER LEFT TPRO i SCHEDULE .... AND m!2.'5. 
TPRO SCHEDULE ( INTEGER ARRA y) SIMULATION OF TRANSACTIONS CO~IPU - EACH ENTRY IS A POINTER TO A DATA SET DESCRIBING THE PARTICULAR PIECE OF EQUIPMENT. I ~ I ~P!l_09~UR!lA'!. I f--- ~ '-----<0 PLACE IN QUEUE SAME NUMBER OF ENTRIES AS IN fPRO ARRAY. EACH EN~RY I S A SET WH ICH I S USED AS THE I~AITING 0UEUE FOR TPRO' S. I I Figure 8-Major components of ITCIS simula modeling system parameters of an input data set. (This can be done easily by using a system utility.) This simulation system was developed so that changes to ITCIS could easily be incorporated in the model without reprogramming, and a great deal of detail could be inserted in certain areas of the model without affecting other areas. This much generality was felt necessary for the ITCIS project, but the resulting system is in fact useful for modelling any teleprocessing system, and there are plans to use it in other systems. The only model completed to date was a model very similar to the GPSS-II model. The results are compatible although more extreme situations show up in the SIMULA model. The SIMULA program takes about one-third of the core required by the GPSS ' model and executes in about 85 percent of the time required by the GPSS program. I Design Approach to Integrated Telephone Information SUMMARY The ITCIS System, besides the usual expected advantages of providing better customer service and maintaining lower operating costs, has resulted in the testing of a System Development Methodology, the development of a Simulation tool which will make simulating computer systems several orders of magnitude easier, and the synthesization of a decision theory technique which should prove ~seful in many subjective situations. In addition the value of using Economies of Scale has been demonstrated, in that a large system, designed to handle a peak inquiry load which will only last a few hours a day, will be used 543 during non-peak hours to handle all other customer services. By the end of June 1972, a SCHEMA had been written using the DDL of DMS-1100, and trial runs indicated that use of this large, general purpose system is feasible. The possible availability of a COBOL compiler which can generate re-entrant, asynchronous program code also indicates that it may be possible to use COBOL to a much greater extent than normally thought possible in systems programming. REFERENCE 1 October 1969 Report of the CODASYL Data Base Task Group of the CODASYL Programming Language Committee Field evaluation of real-time capability of a large electronic switching system by W. C. JONES and S. H. TSIANG Bell Telephone Laboratories Naperville, Illinois stantial increase in the traffic-handling capacity of No.1 ESS, the streamlining of the program is perhaps the most significant. Figure 1 shows a history of the No. 1 ESS call capacity improvements. In 1965 and 1966, we had only two No.1 ESS offices in service, and the maximum call carrying capacity at the time was about 25,000 peak busy hour calls. This is only an estimate since few meaningful measurements were taken. Peak busy hour calls are the number of calls estimated for an office on its highest normally recurring busy hour during the busy season. This number must be known in order to engineer the office properly. It is generally assumed that of the number of peak busy hour calls, about 85 percent complete to talking and about 15 percent to busy or no answer. The capacity of the system is usually expressed in terms of a range--maximum, average, and minimum. 
This is because the type of traffic handled varies from office to office. For example, an office may have more interoffice calls than intraoffice calls. The machine time consumed by processing different types of calls are different. The program released at the beginning of 1967, has a capacity of about 27,000 peak busy hour calls. Soon after, program improvements were made to increase the call capacity to 32,000. In 1968, a significant increase in capacity was achieved through the addition of a signal processing unit (SP) to the basic central control processing unit (CC), to take over many of the repetitive functions such as scanning lines and collecting dialed digits. SP is essentially an input/output processor. With SP, the system reached a capacity of about 64,000. In 1969, further program improvements were made which increased the capacity to 71,000. As more experience was gained, it became clear that the central control processor was spending most of its INTRODUCTION The Bell System's No.1 Electronic Switching System! (N o. 1 ESS) was designed for medium-to-Iarge telephone offices. Its performance has been improved radically since first put into service on May 30, 1965, in Succasunna, New Jersey. By June, 1972, some 250 No. 1 ESS offices were in service equipped with over 4 million customer lines. This paper describes a load test which was conducted recently in a field office to evaluate the real-time capability of the latest program, named SPCTX-5. In order to aid the reader in the comprehension of this paper, some background information is provided. BACKGROUND General I I Capacity of a telephone system is multidimensioned. It can be measured in terms of quantity of calls processed, number of customers served, traffic load handled by the switching network, etc. The No.1 ESS capacity, to date, has been limited only by the capability of the processor and its associated program. The number of calls that the system can handle is directly proportional to the amount of time that it takes to process individual calls. History on call capacity I .1 I In the past 6 years, a great deal of effort has been expended to increase the call-carrying capacity of the No. 1 ESS as well as to add new features. Improvements were made through both hardware and software means. Of the several developments that have produced a sub545 546 .,.... Fall Joint Computer Conference, 1972 120K .... >C !:U Ua: C::t Q. 90K O ~:z: I~ "'::t ~ID 60K >:.:: U)C '" Q. 30K flj o 1966 1967 1968 1969 1970 MAXIMUM AVERAGE MINIMUM 1971 1972 CALENDAR YEAR Figure I-No.1 ESS capacity real time in network connections. To simplify some of these actions, the service link network was developed. This equipment is a new adjunct to the standard switching network, and is designed to simplify the operations that set up ringing and digit-receiver connections. The service link net~ork hardware relieves the No.1 ESS program of much of the chore of establishing these routine connections and, thus, increases the call capacity by greatly reducing the average time the program is spent with each call. Introduction of the service link network in 1970, raised the No.1 ESS capacity to 83,000. Major program improvements were made in 1971, which resulted in a maximum capacity of over 100,000. This was the original No. 1 ESS design objective. It is significant that these capacity improvements have been made at the same time that many new features, which tend to reduce capacity, were being added. 
The program introduced at the beginning of 1967, has 137,000 instructions (44 bits/instruction). The latest program, which has incorporated many new features as well as fault detection and diagnosis for the new hardware and call capacity improvements, has over 230,000 instructions of which over half are required for maintenance. shows the real-time consumption; the bottom half represents 100 percent of CC real time, and the top half 100 percent of SP real time. The SP performs all the input/output (I/O) work, and the CC executes all other work associated with each call. The abscissa shows peak busy hour calls. As shown in this Figure, different job segments comprise the total real time consumed. The SP overhead and the CC overhead are constant, relatively independent of the amount of traffic being processed. The equipment dependent I/O real-time consumption is directly rela~ed to the amount of equipment, such as lines and trunks, in the office. A trunk is a circuit which provides a communication channel between telephone offices. The line representing the equipment dependent I/O indicates that the real-time usage increases as calls per hour increase. The lines for the per call I/O and per call other work show the percent of real-time consumption also rising as the level of calls goes up. The slopes of these lines depend upon the average amount of real time consumed per call. As program improvements are made, the slopes decrease. As a result, the call capacity goes up. A number of techniques have been developed to perform real-time studies. Simulation2 is one of these techniques. A somewhat simpler method is to determine the number of machine cycles required for overhead and for processing various types of calls su.ch as intraoffice O------------------------------~----------------------------------------_, SP OVERHEAD Il. en 80 ---IOO4-----------------------------------~~ UJ 80 ~ ~ ...J <[ 60 UJ a: (,) (,) CC OVERHEAD O~------------~------------~------------r_--------_,------------~ Capacity studies o 20 40 60 PEAK BUSY HOUR CALLS (XI03) Figure 2 shows a greatly simplified model of the realtime usage in No. 1 ESS. The number on the ordinate Figure 2-No. 1 ESS real-time usage 80 I Field Evaluation of Real-time Capability of Large Electronic Switching System with TOUCH-TONE®, outgoing with multifrequency pulsing, etc. These cycle counts can then be used to estimate the call carrying capacity. In early days, both call-type cycle counts and capacity calculationsrequired a large amount of manual effort. Since there are many different types of calls, both jobs were time consuming and tedious. Now, all have been automated. In the automated procedure, a programmable electronic call simulator controlled by a computer system, is used to generate a set of test calls in the system laboratory. While a call is being processed, machine cycles are counted by a program-controlled counter. ESS utility programs collect and record the counts on a magnetic tape. This tape is then processed on a commercial computer which prints out the cycle counts. A total of 50 call-type cycle counts have been collected. Since each type of call is processed by the system in stages separated by time breaks, the call-type cycle count is made up of cycle counts of many program segments. The segment cycle counts are summed to determine the real time required for each type of call. 
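The arithmetic by which such cycle counts become a capacity figure can be sketched as follows: a weighted-average cycle count per call is formed from the traffic mix, the cycles consumed by fixed overhead are removed from the cycles available in an hour, and the remainder is divided by the per-call average. Every number in the sketch below (cycle counts, cycle time, overhead fraction and mix) is an assumed placeholder chosen only to land in a plausible range; none of it is measured No. 1 ESS data.

! The arithmetic behind a busy-hour capacity estimate of the kind
! described above.  All figures are assumed placeholders.
program capacity_sketch
  implicit none
  integer, parameter :: ntype = 3
  real :: cycles(ntype) = (/ 4000.0, 5500.0, 4800.0 /)   ! cycles per call, by call type (assumed)
  real :: mix(ntype)    = (/ 0.40,   0.35,   0.25   /)   ! fraction of calls of each type (assumed)
  real :: cycletime     = 5.5e-6    ! seconds per machine cycle (assumed)
  real :: overheadfrac  = 0.30      ! fraction of real time lost to fixed overhead (assumed)
  real :: avgcycles, cyclesperhour, capacity

  avgcycles     = sum(mix*cycles)            ! weighted average cycles per call
  cyclesperhour = 3600.0 / cycletime         ! machine cycles available in one hour
  capacity      = (1.0 - overheadfrac)*cyclesperhour / avgcycles

  print '(a,f10.0)', ' average cycles per call      = ', avgcycles
  print '(a,f10.0)', ' estimated busy hour capacity = ', capacity
end program capacity_sketch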
The overhead and call-type cycle counts form the data base for a capacity estimating program ESS1CAP, and are used for computing the call processing capacity " of No.1 ESS offices. ESS1CAP is a conversational, timeshared computer program. Capacity is estimated in the manner similar to manual calculations employed previously, except that the manual effort is greatly reduced. In the manual method, the number of input items that the telephone company traffic engineer had to ·provide was great-well over a hundred. Now only about 20 ~' input items are needed for the ESS1CAP program. These are the traffic mix of the office for which the peak '(" hour call capacity is to be estimated. Traffic mix includes such items as percent originations of total calls, and a further breakdown of originations into partial dials, intraoffice, and outgoing calls. ESS1CAP is also I an important tool for evaluating capacity improvements. Our experience with ESS1CAP capacity predictions has been good. However, its accuracy had not been fully verified under controlled load conditions. ;1 LOAD TEST Purposes Although we were reasonably sure about our call capacity estimates, we felt that there is no substitute for a load test with real calls. Some of the reasons for the load test are to verify the real-time improvements and , to check the accuracy of the cycle count data collected . with the newly automated procedures. Such a test is also II. 547 important to secure the confidence of the telephone operating companies in the use of the ESS1CAP program, which is made available to them for call capacity estimates of their No. 1 ESS offices. Another reason for performing the load test is to determine the adequacy of the present overload control with the increased call capacity. In overload control, the executive control program must ensure that peaks of traffic do not overwhelm the system and cause it to process less than an optimal load. An overload control program can modify the operation of the executive control. Under overload conditions, scanning for new service requests is slowed down and the hopper which is used to store these service requests, is emptied much less frequently. It is both convenient and desirable to handle overload by limiting the traffic. Each additional service request is a commitment by the system to perform a certain amount of data processing. By an orderly deferral of further commitment during overload, we guarantee that the data processing overload is rapidly eliminated. Otherwise, the delays in processing calls becomes so great that many customers hang up before their calls have been completed. This wastes that portion of call processing which had already been completed, and leads to further overloads when these customers try again to complete their calls. The overload control has been simulated on a general-purpose computer; however, it has not been fully verified in the field under overload conditions. Environment and equipment The load test was conducted in an office at Portland, Oregon, prior to its cutover into service. The system was running very well at that time with no obvious calleffecting hardware or software problems. There was a sufficient number of trunks and service circuits available in that office to make the test possible. Service circuits include items such as signal transmitters, digit receivers, ringing and other similar circuits. A total of 1100 test lines and 1000 trunks were employed in the test. The calls are generated by 11 load boxes. 
A load box is a type of test set (Figure 3). Each set can originate calls on 50 lines which are divided into ten groups of five lines each. A maximum of 13 digits may be pulsed over each group. All five lines in the same group will dial the same digits, but each group can have a different set of dialed digits. Using the technique to be described later, it is possible to terminate five lines dialing the same number to five different lines. Audible signals can be monitored on a single line at any time through the use of the monitor amplifier and speaker furnished. Lamps are also provided for indicating the states of the lines, such as origination, dialing, and disconnect. Various timing adjustments can be made which determine when to start dialing, disconnect, etc. It should be pointed out that most test set actions are governed by time delays rather than system responses. For example, a set will start dialing after a preset delay following origination, whether dial tone is received or not.

Figure 3-Load box

The test set also provides termination for 50 lines. Each terminating line is equipped with a lamp, counter, and a circuit to detect ringing. Upon reception of a call, this circuit will trip ringing (simulating an answer back to the system), light the lamp, and increment the counter. The circuit also applies a special tone to the terminating line for manual monitoring purposes. When this tone is received at the originating end, it verifies that a talking path is indeed established. The counter is used to determine the number of calls that are completed to talking.

The load box traffic tends to be more bunched than real customer traffic. Although staggered originations and disconnects are provided among the ten groups in a load box, the five lines in each group, nevertheless, will originate and disconnect simultaneously. Dialing for all 50 lines will be done at the same time. Therefore, the load presented to the system by the load boxes is more severe than one would encounter with real life traffic.

Techniques

A number of techniques are used to get around some of the constraints and problems associated with the test environment and the existing test equipment. To simulate an outgoing call to another office and an incoming call from a distant office, a loop-around technique is employed. In this case, a call is originated from a test line to an outgoing trunk. The output of this trunk is then fed back into No. 1 ESS by looping around the tip and ring conductors of the outgoing trunk to an incoming trunk. The system completes the loop by placing a terminating call to another test line in the office. Each loop-around call, therefore, consists of two calls, one outgoing and one incoming call. If a load box repeats its cycle every 36 seconds, then each test line can generate calls, either intraoffice or loop-around, at the rate of 100 per hour.

To terminate five lines in each group (dialing the same number) to five different lines, the speed calling technique is used. In No. 1 ESS, this is one of the new customer services provided. Speed calling permits a customer to place calls to one of a group of frequently called numbers by dialing an abbreviated code instead of the seven or more digits that would normally be required. The abbreviated code consists of an access code of 11 plus one or two digits depending upon the size of the abbreviated dial list.
With this method, five lines in the group are all assigned the speed calling feature, and each is given a different abbreviated dial list. Thus, all five lines dialing the same abbreviated dial code will place calls to five different terminating directory numbers. The calls can even be a mixture of intraoffice and interoffice calls. Also, different types of signaling over different trunk groups can be selected for the interoffice calls. All this is accomplished by selecting the proper directory numbers for the abbreviated dial lists. The real-time consumption in processing a speed dialed call is about the same as for a conventionally dialed call. The time saved in collecting the extra digits is consumed by the additional time required in translating the abbreviated code.

Eleven load boxes, if simultaneously originating, dialing, outpulsing, and disconnecting, would place a very unrealistic load on the system. This is true even if enough service circuits existed in the office to permit such a test. Equipment limitations are major considerations in designing the load test. For example, the number of transmitters of a given type limits the number of interoffice calls that can be placed simultaneously at any given time. To stagger the operation of load boxes, a sequencing unit was designed (Figure 4). This unit generates start signals to load boxes in a fixed-time relationship. Therefore, it permits staggering of originations, dialing, and disconnects between load boxes. This results in a more evenly distributed load to the system, and simulates more closely the real-life traffic through the office. With 11 load boxes operating at a 30-second cycle time, it was possible to generate 120,000 test calls per hour. The load box traffic mix was chosen to duplicate, as nearly as possible, the expected mix at the Portland office.

Figure 4-Sequencing unit

Figure 5 shows the load test arrangement. Figure 6 displays a simplified 30-second cycle timing chart for 11 load boxes. It gives the relationship between load boxes and the time allowed for the various phases of call processing: 5 seconds for dial tone, 3 seconds for abbreviated dialing, 5 seconds for outpulsing (transmitting signals over outgoing trunks), 6 seconds for the ring and ring trip, 2 to 4 seconds for talking, and 7 to 9 seconds for disconnect and awaiting origination. This type of timing chart is utilized to determine the maximum service circuit and trunk demand. Analysis of this nature is based on the concept that the load pattern is repetitive. It can be appreciated that accurate timing adjustments for load boxes and the sequencing unit are extremely important.

Figure 5-Load test arrangement (load boxes and loop-around test equipment connected to the No. 1 ESS)

Figure 6-Load box timing chart (dial tone, dialing, outpulsing, ring and trip, talking, and disconnect/awaiting origination phases for load boxes 1 through 11 over a 30-second cycle)
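The quoted figure of 120,000 test calls per hour can be checked with simple arithmetic. In the sketch below, the accounting assumption (each loop-around origination counted by the office as one outgoing plus one incoming call, while an intraoffice origination counts once) is mine, not stated in the text; it happens to reproduce the quoted number for a Portland-like split of originations.

    # Back-of-the-envelope check of the quoted load-box call rate.
    boxes, lines_per_box = 11, 50
    cycle_time_s = 30.0
    originations_per_hour = boxes * lines_per_box * 3600.0 / cycle_time_s   # 66,000

    # Assumed split of originations between intraoffice and loop-around calls.
    intra_share, loop_share = 10.0 / 55.0, 45.0 / 55.0
    calls_per_hour = originations_per_hour * (intra_share * 1 + loop_share * 2)

    print(f"{originations_per_hour:,.0f} originations/hour")
    print(f"{calls_per_hour:,.0f} calls/hour as counted by the office")     # ~120,000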
Test monitoring

There are many hardware and software performance indicators built within the No. 1 ESS. The system routinely prints out messages on various aspects of its well being. One particular message, the quarter hour message, is of particular interest to us during the load test. It shows the total number of originating and incoming calls processed by the system in the preceding 15-minute period. Dial tone speed test data are also included as a part of the message.

To conduct a dial tone speed test, the system performs a routine test every 4 seconds (or 225 tests in 15 minutes). The test involves an origination from a randomly selected line. If dial tone is not detected within 3 seconds, a counter is incremented. Dial tone delay is an important indicator of the quality of service provided to the customers. The dial tone speed test data, therefore, are closely watched in the normal service of an ESS office. A high percentage of more than 3-second delays usually indicates an overload or some other trouble condition.

Another important item included in the quarter hour message is the number of executive control or main program cycles. As the traffic load builds up, the main program cycles get stretched out longer and longer. Consequently, the total number of main program cycles becomes less in a fixed length of time. The main program cycle rate, therefore, is an inverse function of machine load. Early studies of the traffic data obtained from the then existing program led to the use of 3500 as the minimum number of main program cycles in each 15-minute interval that can be tolerated while meeting all customer service requirements. The peak call capacity of the system, therefore, is the call rate which results in 3500 main program cycles per quarter hour.

The call completion rate, that is, the number of calls completed to talking, can be derived from the load box counters. Since the ratio of originating calls and incoming calls is known from the load box setups, the call completion rate can be calculated with reasonable accuracy from the quarter hour traffic data. This is a much easier method than the one which requires resetting and reading some 560 counters for each test.
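The two service indicators just described reduce to simple ratios. The sketch below is illustrative only; the counter values are hypothetical, not test data.

    # Rough illustration of the two quarter-hour service indicators.
    quarter_hour_calls = 27_800        # originating + incoming, from the quarter hour message
    completed_to_talking = 27_740      # hypothetical sum of terminating-line counters

    completion_rate = 100.0 * completed_to_talking / quarter_hour_calls

    dial_tone_tests = 15 * 60 // 4     # one routine test every 4 seconds -> 225 per quarter hour
    delayed_over_3s = 4                # hypothetical count of tests with > 3 s dial tone delay
    percent_delayed = 100.0 * delayed_over_3s / dial_tone_tests

    print(f"call completion rate: {completion_rate:.1f}%")
    print(f"dial tone delays over 3 s: {percent_delayed:.1f}% of {dial_tone_tests} tests")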
TEST RESULTS

Call capacity

The observed call capacity follows closely, and is generally about 5 percent higher than, the capacity predicted by the capacity-estimating ESS1CAP program. For the Portland load box traffic mix (10 percent intraoffice, 45 percent incoming, 45 percent outgoing calls) the peak busy hour call capacity computed by ESS1CAP is 108,000 calls at a main program cycle rate of 3500 per 15 minutes. ESS1CAP assumes that 85 percent of calls complete to talking and 15 percent to busy or no answer. The load test result shows 111,300 calls per hour at a main program cycle rate of 4767 per 15 minutes. There were no dial tone delays over 3 seconds during this test. The call completion rate was 99.8 percent. In real-time consumption, a completed call takes more machine cycles than a call to busy or no answer. Therefore, the overall results, in terms of the number of calls, main program cycles, and call completion rate, are considerably better than expected.

The highest load box traffic placed on the system was 118,080 calls per hour. For this test, the overload control was disabled with a program overwrite to avoid having the inputs rejected at a lower traffic level. At this load, 98.5 percent of calls completed to talking, and 1.5 percent to partial dial or reorder. A call will be routed to a reorder tone if a needed service circuit is not available. Partial dial in this case was caused by load box dialing before receiving dial tone. About 1.8 percent of calls encountered dial tone delays over 3 seconds. The main program cycles were 1092 in the 15-minute interval. The longest duration of a main program cycle was 5.5 seconds. This data was obtained through a program overwrite which prints out the number of various main program cycle durations in a special message. The system performed remarkably well even at such a low number of main program cycles. It is believed that the dial tone speed test failures and long main program cycle durations were caused primarily by load bunching of load boxes. Figure 7 shows a plot of the number of main program cycles per second versus elapsed time on one of the test runs. The load bunching is clearly evident. A Varian recorder (40 cm per second tape speed) was used in gathering this data. Figure 8 shows some of the data on main program cycles versus traffic and the ESS1CAP predictions. This figure applies to the Portland office only.

Figure 7-Distribution of main program cycles in load test, a random sample (main program cycles per second versus time in seconds; traffic 106K calls per hour, load box cycle time 30 seconds)

Figure 8-Number of main program cycles versus traffic (main program cycles per quarter hour versus calls per quarter hour, Portland office load test, SPCTX-5 program, with the ESS1CAP prediction of 108K calls per hour marked)

It should be pointed out that even though the number of calls in the load test matches closely the ESS1CAP predictions with a similar number of main program cycles at very high traffic, the corresponding test load to the system is greater. This is because the call completion rate in the load test was higher.

Main program cycles

As mentioned earlier, to meet the service requirements, 3500 was used as the minimum number of main program cycles per quarter hour. The test results show that at the Portland office, the system is capable of providing good service well below the 3500 minimum limit. When the system was processing calls at the rate of 116,300 calls per hour (8300 over the ESS1CAP predicted peak capacity), the number of main program cycles was 2104 in a 15-minute interval. At this level of traffic, only 15.5 percent of originating calls encountered more than 3 seconds of dial tone delay. The service requirements allow 20 percent. It appears that for the Portland office, the system can operate satisfactorily with 2500 main program cycles per quarter hour. The system not only would be able to meet, but also would exceed, the service requirements. This corresponds to about 4 percent additional capacity over and above what has been achieved for the present program. Whether the lower main program cycle limit can be applied universally to all offices with the same program merits further investigation.

Overload control

The present overload control appears to be satisfactory. A temporary program overwrite was installed so that the various overload parameters could be modified via teletype input messages. A series of load tests were made by varying this set of parameters. It was possible to clamp the quarter hour main program cycles in the general vicinity of any desired number during overload. In other words, we can limit the load to the system to any amount regardless of the service demands. Based upon our experience at Portland, the existing values of overload parameters could be left as they are.

Program problems

The load test also provides maximum interactions for various segments of different call programs.
Usually more program bugs will show up under heavy load, not because they are traffic dependent, but rather because the conditions which lead to the bugs happen more frequently. During the entire 2-week load test period, only one call-affecting program problem was found, and this problem is truly traffic sensitive. Many outgoing calls were lost during the early part of our test under heavy load. In ESS call processing, a certain timing is required to be done on outgoing trunks placed on a waiting list after each use. The program performs this timing on ten trunks every 200 milliseconds. In very high traffic, there were more trunks put on the list than could be taken off. Thus, not enough trunks were available to handle calls. A program change has since been made to correct this situation.

SUMMARY

A series of load tests has been made on the latest program for the Bell System's No. 1 ESS in a field office at Portland, Oregon. The results of these tests have validated the real-time improvements predicted. The call capacity estimate made by the ESS1CAP computer program is credible and conservative by about 5 percent. The system is capable of providing good service at the Portland office well below the main program cycle rate of 3500 per quarter hour. A 2500 figure is more realistic. This corresponds to a gain of an additional 4 percent capacity. Overload control appears to be satisfactory. New improvements, primarily in programming rather than in hardware, are being made which will further increase the No. 1 ESS call capacity in the future.

REFERENCES

1 W KEISTER et al  No. 1 electronic switching system  Bell System Technical Journal Vol 43 Parts 1 and 2 September 1964
2 P NADOR  S H TSIANG  Operational testing of software by means of simulation techniques  IEE Conference Publication No 52 International Conference on Switching Techniques for Telecommunication Networks London England April 21-25 1969

Minimum cost-reliable computer communication networks

by JOHN DEMERCADO
Ministry of Communications
Ottawa, Canada

INTRODUCTION

A designer of a computer-communications network must consider the reliability of a given network design as a function of its realization costs. Although there is an abundance of graph theoretic and queuing tools that have generated algorithms for the topological synthesis and analysis of large networks,1,2,3 it is unfortunate that the reliability and cost dimensions of the problem have not been satisfactorily related. In this paper a fast recursive algorithm6 and elements of the theory of discrete Markov processes5 are combined to develop a new theory of reliability prediction for general networks whose nodes and links have constant failure and repair rates. The methods presented are applicable to a large class of networks including computer-communication networks. The reliability theory as presented permits the time behavior of these networks to be rigorously treated. In particular, methods are given for computing reliability functions5 for the network. These functions give the probability that the network is in an acceptable state at time t; methods are also given for computing the moments of the first time that the network passes from given "acceptable" states to any arbitrary or specified "failure" states.
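As background to the model that follows (not something taken from the paper), constant failure and repair rates translate into per-unit-time transition probabilities in the standard way. A minimal sketch, with invented rates:

    # Illustrative only: turning constant failure/repair rates into one-step
    # transition probabilities of the kind used for the 2x2 node and link
    # matrices below.  The rates chosen here are hypothetical.
    import math

    def one_step_matrix(failure_rate, repair_rate, dt=1.0):
        """Return [[p_aa, p_af], [p_fa, p_ff]] for one time step of length dt."""
        p_af = 1.0 - math.exp(-failure_rate * dt)   # operating -> failed
        p_fa = 1.0 - math.exp(-repair_rate * dt)    # failed -> repaired
        return [[1.0 - p_af, p_af],
                [p_fa, 1.0 - p_fa]]

    P_node = one_step_matrix(failure_rate=0.02, repair_rate=0.30)
    P_link = one_step_matrix(failure_rate=0.05, repair_rate=0.50)
    print(P_node)
    print(P_link)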
In the section on Preliminaries, a method is outlined for obtaining the transition probability matrix of a Markov chain that contains the per unit time probabilities of communication between each pair of nodes in the network. In the section on Reliability Modelling these methods are applied to develop a reliability prediction model for any given network. An algorithm for minimum cost reliability modelling which delineates the computational procedure for using these results is then given. The recursive algorithm for computing the transition probability matrix for a general network is presented in the Appendix.

PRELIMINARIES

Let |P_v| be the 2 x 2 transition probability matrix associated with node η_v of an n node network η. That is, with rows and columns ordered by the states A_v, F_v,

|P_v| = \begin{pmatrix} p_{a,a,v} & p_{a,f,v} \\ p_{f,a,v} & p_{f,f,v} \end{pmatrix}, \quad v = 1, \ldots, n \qquad (1)

where entry p_{a,a,v} is the probability that node η_v which now operates successfully will operate again successfully one unit of time later. Node η_v is said to be in acceptable state A_v if it is operating successfully and failure state F_v if it is not. Similarly, let |P_{uv}| be the 2 x 2 transition matrix associated with the link (uv) of the network η, that is, with rows and columns ordered by the states A_{uv}, F_{uv},

|P_{uv}| = \begin{pmatrix} p_{a,a,u,v} & p_{a,f,u,v} \\ p_{f,a,u,v} & p_{f,f,u,v} \end{pmatrix}, \quad (uv) \in \{L\} \qquad (2)

where p_{a,f,u,v} is the probability that link (uv) which is now operating successfully will fail in the next period of time. The probability p_{f,a,u,v} is the repair probability, that is, the probability that if link (uv) is now failed it will be repaired in one unit of time. The link (uv) is said to be operating successfully when it is in state A_{uv} and unsuccessfully when it is in state F_{uv}.

The network η is thus specified by a set of n nodes η_v, v = 1, ..., n, denoted by {N}, a set of links {L}, and a set of matrices associated with these nodes and links. Then for every node η_i and any other node η_j not directly connected to η_i by a single link, it is possible using the algorithm given in the Appendix to compute the set of 2 x 2 matrices

|X_{ij}| = \begin{pmatrix} x_{a,a,i,j} & x_{a,f,i,j} \\ x_{f,a,i,j} & x_{f,f,i,j} \end{pmatrix}, \quad j = 1, \ldots, n \qquad (3)

with rows and columns ordered by the states A_{ij}, F_{ij}. In equation (3), |X_{ij}| is the one step transition matrix for the node pair η_i, η_j. In particular, x_{a,a,i,j} is the probability that there was communication between nodes η_i and η_j at time t and that there will be communication at time t+1. The network is said to be in acceptable state A_{ij} if the nodes η_i and η_j can communicate at time t, and in state F_{ij} otherwise. For an n node network there are n such 2 x 2 matrices for each node η_i, and it is possible to combine these into a transition probability matrix |M_i| of dimension 2n x 2n for each node η_i, as shown in equation (4).

For the purpose of this paper the matrices |C_i|, i = 1, ..., n will be considered as n x n unit matrices, corresponding to the case of distinct independent failures of individual nodes. The techniques presented could be modified for general |C_i| matrices to include progressive degrees of failure, and dependence of failures of given nodes on other nodes. For the purposes of reliability modelling the matrices |D_i| can be considered as n x n null matrices (all entries zero). Furthermore, from the definitions given in Equations (3) and (4) it is readily apparent that the matrices |A_i| and |B_i| are diagonal matrices. This fact greatly simplifies computational procedures. All the methods presented in this paper depend on the computation of the matrices shown in Equation (3). The recursive algorithm described in the appendix has been developed to calculate these transition probabilities for a general network.
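To make this structure concrete, here is a small numerical sketch. It is illustrative only: the three-node pairwise probabilities are invented (in practice the |X_ij| come from the recursive algorithm in the Appendix), and the block arrangement used for |M_i| is one choice consistent with the description above; equation (4) in the following section gives the paper's own layout.

    # Sketch of assembling a 2n x 2n matrix |M_i| from n pairwise 2x2 matrices.
    # Numbers, variable names, and the numpy representation are mine.
    import numpy as np

    n = 3
    # x[j] stands in for |X_ij| of equation (3), j = 0..n-1.
    x = [np.array([[1.00, 0.00], [0.00, 1.00]]),   # j = i, kept trivial in this sketch
         np.array([[0.95, 0.05], [0.60, 0.40]]),   # invented
         np.array([[0.90, 0.10], [0.70, 0.30]])]   # invented

    A = np.diag([x[j][0, 0] for j in range(n)])    # acceptable -> acceptable (diagonal)
    B = np.diag([x[j][0, 1] for j in range(n)])    # acceptable -> failed (diagonal)
    C = np.eye(n)                                   # failed -> failed, taken as the unit matrix
    D = np.zeros((n, n))                            # failed -> acceptable, taken as null

    M = np.block([[A, B], [D, C]])                  # the 2n x 2n matrix |M_i|
    s0 = np.concatenate([np.ones(n), np.zeros(n)])  # n acceptable-state entries, then n failed
    print(s0 @ M)                                   # state probabilities one unit of time later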
RELIABILITY MODELLING For each node of the network, a reliability function Ri(t) can be defined as network 11 is in every acceptable] Ri(t) =Prob I Ai I I Bi I } A.~F, (4) I Di I ICd } F.~F, [ state in Ai at time t that is, node l1i is reliable provided it can communicate with all other nodes in the network at time t. Defining Si(O) as the (1 X2n) initial state vector for node l1i: Typically Si(O)' = 11, ... 1, 0 ... 0 nones Where A i and F i are the set of acceptable* and failed states associated with node 1Ji. They specify its operation with respect to the other nodes of the network. In general then the probabilistic behavior of the network can be characterized by the set {I Mil}, i = 1, ... n of 2nX2n matrices. The matrices I Ai I, I Bi I, I Ci I, I Di I are n X n square matrices which contain the one step transition probabilities. In particular I Ai I, governs the transition from Ai---7Ai; I Bi I, governs the transition from Ai---7Fi; I Ci I, governs the transition from Fi---7Fi; IDi I, governs the transition from Fi---7Ai; n * Ai = U k-i Aik. Vi= 1, ... n Vi= 1, ... n Vi= 1, ... n Vi= 1, ... n I n zeroes Let Si(t) be the state vector at time t corresponding to node l1i, and the kth element of the vector Si(t) be Si,k(t); then n Ri(t) = II Si,k(t); i=l, ... n (5) k=1 The product in (5) is over the set of acceptable states and the state probabilities satisfyb Transition failure probabi'ities Let Pik (t) be the probability that node l1i initially connected to node 11k is no longer connected to node 11k after t units of time. Let I Pi(t) I the nXn matrix of these probabilities, then the following is true Minimum Cost-Reliable Computer Communication Networks Theorem 1 Consider a network 7], with nodes 7]iE {N}, i = 1, ... , n and links (ij) E {L}. Let the transition matrices for these individual nodes and links be I P v I for 7]vE {N} and, I P uv I for (uv) E {L} respectively. Then 555 obtained for each of the nodes 'Y/i in terms of the matrices I Ai I and I Bi /. To obtain these expressions, define for node 'Y/i the random variables r ik as rik="first time that node 'Y/i is no longer connected to 'Y/k." (6) Then Pik (t) is the probability distribution function of the random variable r ik that is where (10) Proof This is a straightforward extension of Theorem 2 in Reference 5. A special result of Theorem 1 is the following: Corollary Generating functions Since rik is a discrete random variable its moments can be obtained from its generating function gik(Z) which is defined co gik(Z) = ~ Ztpik(t) The n X n matrix of the steady state probabilities defined as (7) t->oo (11) t=1 For each node 7]iE {N} of the network 'Y/ we obtain a matrix I Gi(z) I of generating functions which can be written in matrix form as satisfy co i=l, ... , n (8) where I I / is a nXn identity matrix. I Pi I also will be a n X n identity matrix since all physical systems will ultimately fail with probability 1. I Gi(z) 1= 2:: zt I Pi(t) I, i=l, ... , n. (12) t=1 Let I riCk) I be the (nXn) matrix of the kth moments, k= 1,2, ... of the random variables {rij} for node 'Y/i. Then: {k=1,2, ... Proof {i=l, 2, ... , n Equation (6) can be expanded as therefore the limit in Equation (7) is the sum of the infinite Geometric series Using the Equation (6) it is possible to obtain a closed form expression for I Gi (z) I and hence I r i (k ) I in terms of the matrices I Ai I and I Bi I without the need to evaluate infinite series of the form given in Equation (12). This result is given in Theorem 2. 
co I Pi I = ~ I Ai It I Bi I t=O which is (8) I Q.E.D. Moments of the first time to failure The Reliability Modelling of the Network 7] is completed if in addition to the Equations (5), (6) and (8) it is possible to compute the moments of the first time that various types of disconnections occur in the network. Closed form expression for these moments can be Theorem 2 Let 'Y/ be a n node network, with nodes 'Y/i E {N}, and links (uv) E {L} with corresponding one step node and link transition failure probabilities I P v I and I P uv I. Then the generating functions I Gi(z) I for the moments of the random variable r ij "first time no connections exist between node 'Y/i and node 'Y//' are given as i=l, ... , n (14) 556 Fall Joint Computer Conference, 1972 Proof option r, the matrices r=l, 2, ... Substituting (6) into (12) and expanding gives 00 I Gi(z) I= 00 ~ zt I Ai It-II Bi 1+ t=I ~ zt I P i (t-l) I (15) t=I the first term in (15) is z II I I-z I Ai 11-11 Bi I and the second is simply z I Gi(z) I. Q.E.D. It can also be shown5 that the moments tiCk), k=1 the random variable t i, where· . Find those options that will realize 1] with the best reliability and have cost less than or equal to some constant C. ~ C(1]v) nVfl N} + ~ (UV) C(uv) ~C f{ L} Step 1 Find the set of options whose realizations of 1] satisfy (17). If none then C is too low, and should be incremented by an amount t,.C and Step 1 repeated. When options are found go to Step 2. t (== "first time that node 1]i is disconnected from all other nodes" are ·given by Step 2 n tiCk) = ~ k=l, 2, ... tij(k) (16) j=I In particular for k= 1, Equations (13) and (16) are the important mean times to first failure. MINIMUM COST RELIABILITY MODELLING ALGORITH1VI In general the problem facing the network designer is which equipment to use to realize a given network within a given cost range and with what reliability. There are many, variations of this algorithm depending on which aspect of the network design problem is receiving the most emphasis. In the version indicated below emphasis will be placed on the problem of implementing the most reliable network that is below a certain cost. . Use the Recursive Algorithm in the Appendix for each network node 1]i and each of the options r, that satisfy the condition (17), to compute the general one step transition probabilities in Equation (3) and arrange these as matrix I Mil for the rth option I M{ I; i=I, ... n; r=I, ... , h Go to Step 3. Step 3 Let us assume that there are h such options, for each acceptable option r, r=I, ... , h, compute using the methods given in the paper [Rl(t);1 Pl(t) 1;/ tl(k) I;t{(k)], i=I,2, ... ,no The most reliable network is the one which has the "best" reliability function, longest mean time to failure, etc. Usually the designer can do the selection trade offs, by comparison of the above functions for the different options. Network costs Example . Let {C(N)} and {C(L)} be the cost matrices for the nodes and links of network 1]. That is the cost C of a realization of 1], for a given topology and type of equipmentis C= ~ C(1]i)+ 'lie { N} ~ (uv) C[(uv)] Consider the network 1] with full duplex links (17) E {L} Algorithm Given the network 1], with nodes 1]iE {N} and links (ij) E {L}, and link and node cost transition matrices for r implementation options, that is given for each {N} = {1]I, 1]2, 1]3}, {L} = {(1]l'Y/2), (1]11]3), (1]21]3)}. The following options, can be used to construct this network. 
Furthermore the cost of the network realization should if possible not exceed C = 250 units. Minimum Cost-Reliable Computer Communication Networks 557 Use of the Algorithm Step 1 Using Equation (17) we find for these two options Option 1: Cost 236 units < C = 250 units Option 2: Cost 213 units < C = 250 units :. Both Option 1 and Option 2 must be considered as possibilities in realizing the network 'Y]. Step 2 The recursive algorithm in the Appendix is now used to find general one step transition matrices for each node 'Y]i, i = 1, 2, 3, for each of the two options r= 1, 2. Option 1 : node 'Y]1 0 0 .10 0 0 0 .86 0 0 .14 0 0 0 .88 0 0 .12 .90 0 0 .10 0 0 .88 ' 0 .80 0 0 .20 0 .15] , 0 0 .85 0 0 .15 Au .90 Option 1 : node 'Y]2 .1 ] .8 I P32 = 1 .9 .1] [ .9 .1] [ .04 .96 == f I As1 1II1 Bi II Option 1 : node 'Y]3 I P 122 I = I P 212 = 1 .2 .8 .88 0 0 .12 0 0 0 .85 0 0 .15 0 0 0 .75 0 0 .25 .15] .92 .12] .86 558 Fall Joint Computer Conference, 1972 REFERENCES Option 2: node 111 0 0 .1 0 0 0 .88 0 0 .12 0 0 0 .82 0 0 .18 .91 0 0 .09 0 0 0 .83 0 0 .17 0 0 0 .80 0 0 .2 An .9 1 H FRANK .I FRISCH Communications, transmission and transportation networks Addison Wesley 1970 2 J DEMERCADO N SPYRATOS The synthesis of non flow redundant computer communications networks Proceedings Brooklyn Polytechnic Symposium on Computer-Communications and Teletraffic NYC April 1972 3 J DEMERCADO K TOTH The synthesis of computer communication networks Department of Communications Report June 1972 available from Library Department of Communications 100 Metcalfe Street Ottawa Ontario 4 E HANSLER A fast recursive algorithm to calculate the reliability of a communication network IEEE Transactions on Communications Vol COM-20 No 3 June 1972 pp 637-640 5 J DEMERCADO Reliability prediction studies of complex systems having many failed states IEEE Transactions on Reliability pp 223-230 Vol R-20 No 4 November 1971 6 J DEMERCADO N SPYRATOS Recursive algorithms for stochastic networks (To appear) Option 2: node 112 Option 2: node 113 APPENDIX A33 .87 0 0 .13 0 0 0 .79 0 0 .21 0 0 0 .81 0 0 .19 In this appendix an algorithm6 is presented for determining the probability of disconnection between any two nodes of a general communication network with failing links and nodes. This algorithm offers considerable computational savings compared to a recent algorithm by Hansler.5 Notation Step 3 The methods given in the Paper can now be used to compute the reliability function, mean times to failure, etc., for each of the network realizations using option 1 and option 2. Computation indicates that option 2 will yield better performance than option 1 even though it costs less. The following symbols are used: pa.!,i pa,l,i,i Xa,l,i,i ACKNOWLEDGMENTS di The author would like to thank Mr. Nicholas Spyratos for his assistance in the preparation of this paper, and his secretary lVIiss Gail Widdicombe for expertly producing the manuscript. is the probability that node l1i operates, at time t, but fails at time t+ 1. is the probability that the link (ij) operates at time t but fails at time t+ 1. is the probability* that node l1i can communicate with node l1i at time t but there is no communication at time t+ 1. denotes the degree of node l1i. That is the number of linkcs onnected ot this node. * Xa, I, i, i should be identified as the transition probabilities given in equation (3). Minimum Cost-Reliable Computer Communication Networks denotes the set of nodes at the ends of the d i links having 1Ji as a common terminal node. 
559 Therefore, L u[(x, A, B)]+ L u[(y, A, B)] Xa,f,l,n= Fl L To simplify the notation we suppose that the network has n nodes and the probability Xa,f, l,n is to be calculated. Pa,f,io'Y[(A, B)] A=N-B L =pa,f,l +(l-Pa,f,l) p.(A)p.(B} A=N-B N-B II (l-xa,f,i,n) II Xa,f,l,i II B B A = {1Ji EN / 1Ji cannot communicate with 1Jn at time t+l} B= {1JiEN/the link (i, j) fails at time t+l} L A=N-B Define the following sets: Y l = {x, y} where x means failure and y operation of node 1Jl. 'Y[(A, B)] A,BEP(Nl) The recursive algorithm (l-Pa,f,l,i) N-B A=N-B X B II Xa,f,i,n(l-Pa,f,l,i) N-B Now the space YlXP(Nl ) XP(Nl ), where P(Nl ) denotes the power set of N l , is clearly the sample space on which the failure events for the network must be identified. Suppose that u is the probability measure on the sample space, 'Y the probability measure on P(Nl ) X P(Nl ) and p. the probability measure on P(Nl ). The events of failure for the network belong to one of the following two classes of events: F l = {(x, A, B)/A, BEP(Nl )} F 2 = {(y, A, B)/AUB=Nd The last formula is a recursive one since Xa,f,i,n is the probability of disconnection between 1Ji and 1Jn but in a simpler network. Comments Since Nl contains dl elements, the various ways we can set A = N - B are, in all 2d1 and, therefore the number of terms in this formula is 1+2d1 • On the other hand, Hansler's recursive formula uses 1 +2 2d1 terms. Therefore the computational savings of the present method are (1+2 2d1 ) - (1+2 d1 ) = 22dl_2dl. The Control Data ® Star.IOO file storage station by G. S. CHRISTENSEN and P. D. JONES Control Data Corporation St. Paul, Minnesota INTRODUCTION I I ',i .[ , 'I:" " hardware and software hut vary at the device controller and system software interface level. The service station is a key station in that it manages the system resources and provides fan-out to the second level stations. Operating system functions are thus distributed in a manner which closely follows the distribution of the hardware. The connecting links between the distributed operating system functions are controlled by a set of system messages and message handling is a key factor in efficient operation of the system. The choice of where each operating function should be located is often self-evident, although a few functions are assumed to be movable from one element to another~ Any final decision regarding function locations may depend on experience with particular work loads. In general each operating function is located closest to the resource being used and may be local or remote to the STAR processor. This provides modularity of both hardware and software and such advantages as: Successful experience with the Control Data® 60001 and 70002 computer series has led to implementing improved concepts3, 4, 5 of distributed computing in the STAR-IOO computer system. In the STAR system different computing functions have been physically separated from one another. Each 'computing function is performed by an independent system unit which possesses its own processing logic and memory. Thus each is performed in its own right in an optimal manner. STAR-100 computer6 is a high speed processor capable of producing 100 million results (from a multiply operation, for instance) per second in its 4 or 8 million byte core memory. STAR itself cannot perform data input/output, this is performed by input/output units called stations which have channel interfaces to STAR. A station consists primarily of a mini-computer specially . designed for data handling. 
The STAR design is thus simplified by not having to contain device interfaces; this modularity is important in the design of large computer systems. 7 Also the processor overhead of driving peripheral devices is relegated to the stations thus freeing STAR for additional user computation. Experience in several hundred Control Data® 6000 computer sites has shown it impossible to operate very high speed computers efficiently without distributing peripheral functio~s. As well as distributing the peripheral device drivers in STAR it has been found possible to perform system functions, such as file management, in the stations. So far 9 different STAR station types have been identified and built, these include: maintenance and monitoring, paging, storage, media (tape and disk), unit record, communication, display/ edit, graphic and service. These contain the same basic • independence from other units, particularly in the areas of non-propagation of errors throughout the system and more immediate action on fault conditions. • capability to be independently maintained. • easier replacement of future new hardware or software parts. • easier addition of new types of stations. Figure 1 illustrates the layout of a large STAR system showing the connections between the various stations. A STAR central processor with its immediate storage is simply another station within the system-a data processing station-and in no way does it have any extra authority. It does, however, have two stations 561 562 Fall Joint Computer Conference, 1972 Figure I-STAR system showing station connections fairly intimately connected, the paging station and the maintenance/monitoring station. The paging station, under control of the hardware virtual page mechanism and the operating system, provides temporary storage for programs exceeding the available core space. The maintenance station, besides its functions of off-line and on-line fault diagnosis/repair and preventive checking, has the capacity to collect detailed information about STAR's performance. The data management function is performed by programs' executed within the central processor. These functions include merge, sort, select, scan, append, extract and insert. The data manager in turn exploits the storage station via message commands. This paper describes the storage station which manages the storage and retrieval of working and archival files. The seu consists of a mini-computer, display/ keyboard, small drum and channel interfaces which exist with power supplies, cooling fan and operator panel in one cabinet. The mini-computer has an instruction set which caters to bit and byte manipulation. It contains 8K (K= 1024) 8-bit bytes, expandable to 16K of 1.1 microsecond core memory. There is a 200 nanosecond version of the same meory but the 1 MIP (million instructions per second) rate of the computer is adequate for most present applications. The drum has an average access time of 17 milliseconds and a capacity of approximatly 80,000 bytes. It is used as a store for program overlays and also as a refresh memory for the display console. The mini-computer (or buffer controller) provides a single, parallel-block transfer channel with hardware control for high speed data transfer. Its maximum rate is one 16-bit word plus two parity bits per memory cycle, 1.1 microseconds. The buffer controller also provides up to 512 normal channel bits for lower speed data transfer and device and station control. 
These bits are organized into 16 input channels and 16 output channels with 16 bits in each channel. Their use is determined by the individual peripheral devices on the station. The normal channel bits of the buffer controller provide the primary mechanism for control of the other station elements and the attached devices. A direct STATION BUFFER UN IT STORAGE INTERFACE STORAGE DEVICE STATION HARDWARE The hardware used to implement the distributed computing concept in STAR is designated as various classes of input/output stations. Each Star channel terminates at a station with a common interface. The station (Figure 2) consists of an seu (Station Control Unit) and an SBU (Station Buffer Unit). BUFFER CONTROLLER Figure 2-STAR station Control Data ® Star-100 File Storage Station interface of normal channel bits is provided between the SCU and the SBU (Figure 2). The SBU consists of up to 64K bytes of memory organized in eight interleaved banks of 8K bytes each. Each bank has a memory cycle of 1.1 microseconds with a maximum bandwidth of 14 million bytes per second. Storage control logic provides for 12 independent channel accesses. The SBU is always associated with a controlling SCU. The general function of the SBU is to provide intermediate buffering of data, fan in/out from one STAR channel to many other station channels and working storage for the station. The interfaces to attached devices are contained in the SBU. The following features of the SBU and its interfaces are important to its application and performance as a storage control mechanism. I I I I • The high bandwidth allows simultaneous transfer of a number of storage devices into the SBU. The CDC 844 disk pack, for instance, has a transfer rate of approximately 1 million bytes per second compared with the SBU bandwidth of 14 million bytes per second. • Device control operations such as connecting, addressing, and status are accomplished directly from the SCU over the buffer controller normal channel to the SBU device interfaces. This provides direct, detailed control of the devices. • Actual data transfer between a storage device and SBU takes place automatically under control of the SBU device interface hardware. This frees the SCU during SBU data transfers. • The SCU can directly access STAR storage via normal channel .bits and the SBU interface. This mode is advantageous for message transfer and queue control. • The SBU device interfaces are capable of stacking (queueing) functions and data transfer specifications. This allows maximum performance of the devices while relieving the SCU of having to intervene during brief, critical events such as crossing of intersector gaps. • The SBU device interfaces have the capability of chaining SBU memory areas creating a contiguous data stream to a storage device from several SBU memory areas. This is used to automatically assemble and disassemble sync pattern and header information with the data block. • All data is stored in fixed length blocks of 4096 bytes. Storage 8tation 80ftware Tasks are communicated to the storage station via system messages. Each message selects a specific task 563 and is handled by an SCU routine referred to as a task overlay. The task overlay contains the control code necessary to accomplish the task by calling various station subroutines and device drivers. Associated with each device attached to a station is a device software driver in the SCU. This is a specialized routine which actually drives the devices through the SBU hardware interfaces. 
The other station routines communicate with the drivers through a driver parameter table and a driver-maintained status table. One status table exists for each device. In addition to the device drivers other station subroutines are associated with station resource management and utility functions. Examples of these are: • • •. • • Rent buffer space in SCU Rent block in SBU Transfer SBU/SCU data Transfer CPU data Hash file name Each station contains a standard program referred to as the nucleus or monitor. It contains a set of simple diagnostic routines known as quick-look diagnostics, a system autoload program, driver programs for the microdrum and for the keyboard associated with the character display, programs to manage the microdrum overlay mechanism, and the main control and organizational program. The SCU microdrum holds a copy of all station software. The SCU operates under one of four different systems. These systems are allocated as follows: 1. 2. 3. 4. Microdrum loader system Run system (normal case) Diagnostic system Off-line system The system is selected at start-up of SCU programs. The selection of a system causes linking of all routines associated with the system via scanner and overlay tables. When running, a given system contains the operating portion of the nucleus (the system selection and set-up routines are discarded to be called again from the microdrum for a new autoload) and specified routines fixed in core. The remaining routines are called when required from the microdrum. Calling a routine is accomplished through an overlay table which contains the core address of the called routine or the address of a routine which reads it into a core· area available for temporary overlay and buffers. All routines associated with a system are thus directly accessible yet only the 564 Fall Joint Computer Conference, 1972 most active routines reside dynamically in the SCU core memory. The scanner is the idle loop of the nucleus. The primary purpose of the scanner is to map normal channel data signals to overlay programs based on priority and logical selection, thereby providing a low overhead mechanism for handling asynchronous external events. The external events (such as channel flags, microdrum busy, or input read signals) are presented to the sc~nner program via one or more normal channels. ASSOCIated with each channel are two logical selection words, the EN ABLE mask, and the STATE mask .. The channel data is exclusive or'ed with the state mask in order to select the appropriate signal polarities, and then matched against the enable mask. Any bits that are now set represent selected channel events in the desired state. These bits are scanned from left to right and the first bit found set is used to enter the overlay program associated with that bit. If all hits are zero, the scanner moves on to the next channel and repeats the procedure. One or more memory words are used to initiate internal events via the scanner. In this case, the memory words rather 'than the channels represent the raw input to the scanner. In a typical station, the scanner cycles through two normal channels. and two memory words. A detailed error handling and maintenance system is provided in the stations. Abnormal conditions in the operation of a device cause the device driver to exit to an associated error handling routine. This routine handles retries and error logging. 
It operates in conjunction with a device monitor routine which is used to set the parameters for a device, such as the number of retries, turning the device off to the system, and breakpointing in the driver. A maintenance information system provides an English translation of the driver parameter tables and the device status tables on the SCU display and provides operator access to control the device operation via the device monitor. Included in the maintenance system is the capability to run diagnostics and utilities associated with a device. These tests are controlled using the device driver, parameter table, and status tables and may be run in conjunction with the system operation on other station devices.

The station file system is implemented as a set of task overlays. Each overlay is associated with a specific system message and provides the coordination necessary to accomplish the system task using the station device drivers and subroutines. Each message has a separate overlay to process it. If the message occurs frequently, the overlay remains in SCU core; otherwise, it is called in from the microdrum when it is needed.

FILE SYSTEM

The file system described here exists totally within the storage station and is independent of any particular processor station, network configuration or storage device type. Creation, maintenance, recovery, access, security, storage layout, accountancy data, and performance statistics are all managed within the station.

Active file index

All the file messages are listed in the Storage Station Messages section. A file is simply a collection of stored bits which has a descriptor and can be operated on by a set of file functions. No file function is processed until the file is first opened, and the last file function must always be a close function. In the open message, identification of a file is by file name (File Name section). For other messages, identification of a particular file is by its active file index, the index of the file entry in the active file table (Figure 3). The file index is assigned by the storage station and returned to STAR in response to the open message. The advantage of this arrangement is that the majority of file messages use a 16-bit identifier rather than a variable length string of characters which could be quite long. By maintaining active-file information in core storage, access validation and transformation between logical (file page) and physical block locations is normally accomplished with negligible overhead and without introduction of superfluous input-output operations.

The size of each active file table entry is 8 characters (Figure 3). Initially, one SBU block of 4096 characters is devoted to the active file table, allowing 512 open files at any one time. This can be easily expanded if required. If the file is noncontiguous, read/write of file pages which are not in the first contiguous section requires an access to the storage map in the file descriptor. One could trade the number of open files allowed for fewer open files, with each entry containing the map of more than one file section.

Figure 3-Active file table entry format (field widths 1, 15, 8, 8, 16, and 16 bits): 0/1 = free/used flag; F = descriptor pointer; M = access mode; U = unit number; S = starting address of file on device; N = number of blocks contiguous to S
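As an illustration of how compact this arrangement is, the sketch below models an active file table entry and the open/close exchange in which a 16-bit index stands in for the full file name. The Python representation, names, and example values are invented; only the field list, the 8-character entry size, and the 512-entry table are taken from the text.

    # Illustrative model of an active file table entry and the open/close flow.
    from dataclasses import dataclass

    @dataclass
    class ActiveFileEntry:
        used: bool           # 0/1 free/used flag
        descriptor_ptr: int  # F, pointer to the file descriptor
        access_mode: int     # M, access mode bits
        unit: int            # U, storage unit number
        start_block: int     # S, starting address of file on device
        contiguous: int      # N, number of blocks contiguous to S

    table = [None] * 512     # one 4096-character SBU block of 8-character entries

    def open_file(descriptor_ptr, access_mode, unit, start_block, contiguous):
        """Assign a free slot and return its index as the 16-bit file identifier."""
        for index, entry in enumerate(table):
            if entry is None:
                table[index] = ActiveFileEntry(True, descriptor_ptr, access_mode,
                                               unit, start_block, contiguous)
                return index
        raise RuntimeError("active file table full")

    def close_file(index):
        table[index] = None  # later messages identify the file only by this index

    f = open_file(descriptor_ptr=37, access_mode=0, unit=2, start_block=4096, contiguous=12)
    print("active file index:", f)
    close_file(f)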
File descriptor (catalog entry)

Each file has a descriptor which describes the file as seen by the system. The descriptor (Figure 4) consists of 8 sections: header, characteristics, name, storage map, access map, activity map, and two free sections reserved for later use. The set of descriptors for those files occupying a particular storage unit is itself part of a file and may be processed like any other file; it is called the descriptor file or catalog. This catalog may or may not be on the same storage media as the files it describes. Normally, removable media contain their own catalog files, but these may be copied elsewhere on mounting. The size of an individual descriptor is variable in modules of 256 bytes up to a maximum of 4096 bytes. Initially, just one module (256 bytes) is used for each descriptor. As an example, the Control Data® 844 disk pack at present has the following layout:

Blocks 0, 1 - Pack label
Blocks 2, 3 - Free storage map
Blocks 4 through 67 - Descriptor modules (1024)
Blocks 68 through 23,027 - Data files

Figure 4-File descriptor format: eight sections (header, characteristics, name, storage map, access map, activity map, and two free sections). Legend (partial): T = type; RN = number of records; RA = relative address in words; PTR = pointer to structure definition within the file; N = number of file sections on this unit; S1 = starting block address; N1 = number of blocks contiguous to this address; PTR = pointer to extended storage map; L1 = length of local name in bytes; L2 = length of owner ID in bytes; Mo, Mp = access modes of owner and public; U1 = user 1 identifier; C = creation date and time; E = expiration date; LU = last update date and time; N = number of entries; PTR = pointer to extended access map; the remaining fields give the file length, record length, and descriptor length

To facilitate processing in the SCU, the descriptor proper is kept reasonably small, but the sections can have pointers to overflow areas and these may be of any length. The space allocated for the catalog is also variable. Initially 64 blocks of 4096 characters are used, providing 1024 files per storage unit. The allocation of a descriptor module to a newly created file is done either by the use of a free space map for the modules or by a hashing algorithm. To locate a file descriptor, the file name is hashed to locate a bucket in a hash table which contains entries of file names and pointers to their descriptors. This hash table is re-created (say at autoload) so that the system is not tied to any one hashing algorithm. The hash table may itself become quite long and is kept on the storage unit with the files or some associated storage device. An alternate implementation simply hashes directly to the descriptor module. If the file name does not match the name in that module, a search is made of the surrounding modules in that block. It is to be emphasized that normally the descriptor is only referenced on the open and close functions. All read/write file page operations reference the active file table which is in SBU core.

Storage map section

The storage map (Figure 4) allows for a storage system to be divided into 256 units, each with a capacity of 65,536 blocks (2^28 bytes: approximately 268 million). A variation on this scheme is being implemented which has 32-bit field lengths for block addresses and number of blocks contiguous to an address.
This will cater for larger storage systems with capacity up to 232 (approximately 4 billion) blocks or 244 (16 trillion) bytes. Characteristics section The characteristic section of the descriptor is shown in Figure 4; the different file types are undefined (0), ASCII coded delimited (1), AS CIl coded fixed (2), binary STAR (3), binary fixed (4), foreign delimited (5), foreign fixed (6), virtual memory (7), drop (8), labeled (9), multiple volume (10), incomplete (11), temporary/permanent (12), input (13), and output (14). Types 1 through 6 categorize file types according to their internal coding. The exact definition is not important but it should be noted that types 3 through 6 566 Fall Joint Computer Conference, 1972 File name section have an associated record map which describes the record structure of the file. A virtual memory file has a virtual address associated with each file page. The drop file is similar to the virtual memory file, it is a frozen image of an executing job which has been suspended for some reason together with the virtual address list and current program status information. The labeled file is one that has a label (somewhat similar to the file descriptor) within the file. These last three types use a pointer address to locate the relevant structural information within the file. The multiple volume/unit file is one that is spread over a number of storage units; yet, it is logically one file. An incomplete file is one upon which, although incomplete, processing begins; such is the case when processing begins after only a portion of tape is spooled onto a disk. No doubt other file types will be added, but these provide sufficient categorization for the present. Perhaps the most important thing about a file is its name. It is that which identifies it uniquely and which must be used to open the file before it can be processed. The name consists of two parts, a local name followed by an owner identifier. Each part consists of a variable length string of characters (the ASCII alphanumeric set plus -$#). The parts are separated by the ASCII space character. Certain characters are reserved for special use within file names: *, /, ., &, I, and ? The period character . for instance, is used to indicate some hierarchical structure within the name. The file system is not normally concerned with the internal structure of either the local name or owner identifier, who gave this name or identifier, or where it came from. Essentially the name is used to locate the descriptor. Example of File Name MATRIX / J~49 4D4154524958204A323439 local name {ASCII hexadecimal} ~ notation} owner identifier I separator Storage layout section The storage layout of a file varies with the particular storage device but the goal in each case is the same, that is, to organize file storage in a manner which does not deter high-performance of expected access requests. A large block of data, stored as 128 consecutive physical blocks on a Control Data ® 817 disk requires a little over a tenth of a second for transferring its half million bytes; stored differently, its transfer could take up to 10 seconds. The allocation and layout of a file are governed by a RENT/STORE routine which can be replaced or modified in order to implement more elaborate policies. This routine normally tries to allocate the desired number of blocks in a contiguous fashion; if this is not possible it will allocate the total space on as few large sections as possible. The map of the disk file is a vector. 
Each element of the vector is a storage location and a number indicating how many blocks are contiguous to the location. As many contiguous sections as possible are represented in the descriptor proper and the rest are kept in an overflow area. Access security section Every time an OPEN operation is requested through a storage station, the. access rights of the user are checked against the access map in the file descriptor. The open function has an owner identifier and a user identifier catenated to the local file name and terminates with the ASCII record separator code. If the user and the owner are the same person, the user identifier may be omitted. If the access is not permitted, an invalid access response is returned to the message sender. For the other file messages, validity of the operation is checked against the mode stored with the file entry in the active file table. Initial file access mode is one of four: • • • • Cannot delete Cannot alter access modes Cannot write Cannot read The access modes for the different users are set or modified by access mode messages to the station.. The default option on creation of a file is that the o-wner has open access and the public has no access. The file system again is not concerned with the internal structure of the user identifier; it is simply a variable length string of characters, .and in fact, could be an agreed upon group name rather than an individual user identifier. Control Data ® Star-IOO File Storage Station 567 Example of File Identifier MATRIX ® Space local name J249 -I @ owner identifier L543 -I @ Record Separator user identifier If the user is the owner, then this can reduce to MATRIX @ J249 Multiple stations A typical STAR installation might include two STAR processors supported by a number of storage stations each having different storage devices attached. Such a system exists and is in experimental operation. It is possible for a user to specify on opening a file its location; if this is not done STAR sends messages to all storage stations listed in its directory. The station where the file exists opens the file and makes the appropriate response which STAR keeps till the file is closed. The other stations return a "not found" response. At present on "create and open" the user must specify the storage station .where the file is to be created but need not specify the device on that station unless he wished to do so. If a file of the same name already exists on the station it will be deleted if it is a "temporary" file and the new one will be created; otherwise, if it is a "permanent" file an "already exists" response will be returned to STAR. Files may be shared between different users and two STAR processors providing they are open for read only access. The station has no difficulty returning responses and data to the correct STAR processor since it is identified by its zip-code· in the message header. File system extensions The basic file system can be extended to provide specific features. The basic file system and these extensions are expected to provide a very complete, standalone storage system. • Automatic mounting-(packs cartridges, tapes, etc.)-Standard ASCII labels, automatic allocation of drives, and the mounting and dismounting with label validation. • Multivolume files-Allowing a file to spread itself over a number of units. @ @ • Archival file directory-One archival file directory for all files, on-line and off-line. 
• Structured file name and ownerluser identifiersStructured names and identifiers linking files of a given class into a more complex access mechanism. • Shared access security-Extended access mode conditions. • File editions-Allow the user to specify file edition numbers or default to the latest edition. • Accounting and performance statistics-Recording of station accounting and usage statistics. • Experiment with distributing certain data management functions, which are now performed in STAR, to the stations. STORAGE STATION MESSAGES The following list gives messages which can be processed by storage stations. The underlined parameters are returned with the response. Function Create and open file Open File Close File Close and delete file (temporary and permanent) Close and delete temporary file Keep file Set file characteristics Set file length Is file open Read file pages Write file pages Read file descriptor Parameters File Messages F, M, Mo, Mp characteristics, name' and user ID F, M, characteristics, name and user ID F, characteristics F F F F, characteristics RB F, characteristics, name F,N,S,B F,N,S,B F,B 568 Fall Joint Computer Conference, 1972 Parameters Function Write access list entry Modify owner and public access Mount (tape, pack, cartridge) label L F, M, user access key F, M, user access key Read N blocks from storage unit Write N blocks from storage unit Read N blocks from storage unit with header Write N blocks from storage unit with header Storage unit status B,N,S L RESPONSE MESSAGE PRIVATE PRIVATE CODE LENGTH USE OF USE OF SENDER SENDER PRIVATE USE OF SENDER TO FROM MESSAGE ZIPCODE ZIPCODE FUNCTION CODE Test Messages B,N,S B, N, S, Header B, N, S, Header 16 16 16 16 BITS Details of the message formats are not significant here, except to mention that it is valuable to limit the number of different formats used and to ensure field lengths are large enough to cater for future storage devices. The format is important, however, in respect that once it is established and used by a number of routines even small modifications to it can have widespread, effects and are often time consuming and difficult to checkout. Legend for Parameters CONCLUSIONS F M = active file index (given by storage station) = access mode bit 0 set means cannot (used on open) delete bit 1 set means cannot alter access modes bit 2 set means cannot write bit 3 set means cannot read Mo, Mp = access modes of owner and public, respectively (used on creation) N = number of blocks/file pages to be transferred S = starting file page number (starts with zero) B=core block number, if bit 0 set B=SBU address User ID = user access identifier, variable length string of characters which ends with the record separator character. N 1 = total number of· blocks N 2 = number of disabled blocks N 3 = number of active blocks N 4 = number of free blocks L = label on pack, cartridge, tape RB = Length of file in blocks Message header format Preceding each set of message parameters is a header which has the following format. The storage and file functions of a general-purpose computing system have been identified and separated to operate outside and in parallel with the central processor in a stand-alone, local or remote, storage station. This station forms part of an overall plan to distribute specific functions associated with general-purpose computing into separate computing elements or stations. The same basic hardware and software is used in all these stations to lower manufacturing costs by high volume production. 
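For concreteness, the access-mode bits defined in the parameter legend above (bit 0, cannot delete; bit 1, cannot alter access modes; bit 2, cannot write; bit 3, cannot read) might be tested as in the following sketch. The constant and function names are invented for illustration; they are not part of the station's actual software.

```python
# Access-mode bits as described in the parameter legend (the M field).
# Names and helper functions are illustrative, not from the paper.
CANNOT_DELETE      = 1 << 0   # bit 0 set means cannot delete
CANNOT_ALTER_MODES = 1 << 1   # bit 1 set means cannot alter access modes
CANNOT_WRITE       = 1 << 2   # bit 2 set means cannot write
CANNOT_READ        = 1 << 3   # bit 3 set means cannot read

def may(mode: int, forbidden_bit: int) -> bool:
    """An operation is permitted only if its 'cannot' bit is clear."""
    return (mode & forbidden_bit) == 0

# Public access to a newly created file: "the public has no access".
public_mode = CANNOT_DELETE | CANNOT_ALTER_MODES | CANNOT_WRITE | CANNOT_READ
assert not may(public_mode, CANNOT_READ)

# The owner of a newly created file has open access: no bits set.
owner_mode = 0
assert may(owner_mode, CANNOT_WRITE) and may(owner_mode, CANNOT_DELETE)
```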
The features and performance of this station have worked out well on delivered and in house systems using drums, large disks and disk packs for archival and working store on both large and small computers. The main reason for success has been the clear identification of the basic file and message functions required and a careful implementation of these functions, utilizing both hardware and software techniques on a standard STAR peripheral station. Although designed to meet the needs of the STAR-l 00 processing unit the storage station is well suited to be used with any processor which matches its cha.nnf~l and message protocol; it is also relatively independent of storage device type and system configuration. ACKNOWLEDGMENTS This work was performed in the Advanced Design Laboratory of Control Data Corporation in St. Paul, Minnesota. The head of this laboratory and chief designer of the CDC STAR-I00 and STAR-IB Computer Control Data ® Star-lOO File Storage Station Systems is J. E. Thornton. The success of the project is mainly due to his leadership and support, together with the hard work over a number of years of the following people in the Advanced Design Laboratory's peripheral group-No G. Horning, W. C. Hohn, D. J. Humphrey, L. H. Schiebe, E. V. Urness, D. A. Van Hatten, C. L. Berkey, D. C. McCullough and R. A. Sandness. REFERENCES 1 J E THORNTON Design of a computer-The Control Data 6600 Scott Foresman 1970 2 T H ELROD The CDC 7600 and Scope 76 Datamation April 1970 Vol 16 No 4 pp 80-85 569 3 J E THORNTON System design and implementation Proceedings of Third Australian Computer Conference 1966 pp 90-102 4 P D JONES C J PURCELL Economics and resource parallelism in large scale computing systems Proceedings of Fourth Australian Computer Conference 1969 pp 241-244 5 P D JONES N R LINCOLN J E THORNTON Whither computer architecture Proceedings of IFIP Congress 1971 pp TA4/162-TA4/167 6 W R GRAHAM The parallel and pipeline computers Datamation April 1970 Vol 16 No 4 pp 68-71 7 D J WHEELER Assessing the complexity of computer systems Proceedings of IFIP Congress 1971 pp 1/164-1/168 Protection systems and protection implementations by R. M. NEEDHAM University of Cambridge Cambridge, England INTRODUCTION a set of words whose addresses are contiguous in a virtual address space, and whose protection status is at all times the same. Protection is thus intimately bound up with addressing, since our very definition of the unit of protection is in terms of an addressing mechanism. This approach allows us to specify a protection regime by giving a list of those segments accessible to a process at a particular time, together with notes as to the kind of access which is permitted. A somewhat minimal protection regime could then be described by saying that segment A contains data to which read-write access is permitted, while the words of segment B may only be executed as instructions. A major object of research in protection implementations is to propose mechanisms whereby any desired protection regime can be implemented, with as few limitations as possible imposed by the engineering approach adopted. Protection regimes are not constant during the life of a process. They may change as the work proceeds, and in a fully general discussion they should be allowed to change arbitrarily. Statements would be allowed, for example, to the effect that certain segments were only accessible if the value standing in a system microsecond clock were prime. 
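To make the minimal regime just described concrete, here is a small sketch in which segment A is read-write data and the words of segment B may only be executed. The representation is an assumption made for illustration, not anything specified in the paper.

```python
# A protection regime: the list of segments accessible to a process at a
# particular time, together with the kinds of access permitted.
# (Illustrative representation only.)
regime = {
    "A": {"read", "write"},   # data segment: read-write access permitted
    "B": {"execute"},         # words of B may only be executed as instructions
}

def access_permitted(segment: str, kind: str) -> bool:
    return kind in regime.get(segment, set())

assert access_permitted("A", "write")
assert access_permitted("B", "execute")
assert not access_permitted("B", "read")    # B is execute-only
assert not access_permitted("C", "read")    # unlisted segments are not accessible
```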
In practice, one departs from full generality, and limits those circumstances which may give rise to a change of protection regime. A reasonable approximation is to say that changes of protection regime are associated with changes of the segment from which instructions are currently being extracted; this is not to say that such segment changes must necessarily give rise to changes of protection regime, but only that no change of protection regime may occur without a change in the program segment. The first proposals for the physical design of a processor which took these ideas seriously were by Y ngve and Fabry.2 A summary of their ideas will be found in Wilkes. 1 The essential aspect of these proposals was that there was no restriction on them imposed by any of the implementation techniques. It was thus possible to arrange, in principle, that a process's capability list The paper discusses the nature of systems for protection of information in the central memory of a computer, describing the potentialities and limitations of a variety of approaches. It is based upon work done in the course of a current project on protection systems at the Computer Laboratory, Cambridge, and outlines a system which is being developed to the point of hardware implementation in the Laboratory. PROTECTION SYSTEMS AND PROTECTION IMPLEMENTATIONS For the purpose of this paper Protection is understood to refer to logical and physical mechanisms for controlling access to data in the central memory of the computer. The purpose of protection systems is to insure that at any point in the execution of a job by means of the computer, only those data objects which require to be accessible are accessible, and that this access is only of the mode, for example reading only permitted, which is required for performance of the task in hand. The object of work on protection systems is to devise mechanisms which will afford protection to the greatest extent possible, and do so without excessive expense in hardware, runtime, or program size. The hope is that if such mechanisms can be devised, then it will be very much easier to contain and to localize the consequences of hardware or software failure, and to know much more precisely than is the case at present which of the activities in which a computer is engaged must be suspected of having been spoiled by the failure, and must therefore be re-initiated. In order to get any rationale for a protection imple,mentation, we must set up some defined concepts in terms of which protection systems can be discussed. The first of these is that of the segment, the unit of information to which protection applies. A segment is 571 572 Fall Joint Computer Conference, 1972 always contained exactly and only what it should. Y ngve and Fabry adopted the same approach to change of protection regimes as we have, namely that it only occurred when there was, additionally, a change of the program segment. A special instruction, called ENTER, caused a complete replacement of the process's capability list, and could thus change the protection regime of the process in an arbitrary way. In a capability system of the type just described there are two problems calling for further discussion. First, if a capability indicates the absolute store address of the segment to which it refers, there is the problem of updating all copies of a capability when the segment is shifted in memory, and in deleting all copies when, and only when, the segment is destroyed. 
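The first of these problems can be seen in a few lines: if absolute capabilities (say base/limit pairs) are copied into several capability lists, moving the segment leaves stale copies behind. The representation below is an invented illustration of the difficulty, not a description of any particular machine.

```python
# An absolute capability carries the physical position and size of a segment.
def make_capability(base: int, limit: int, access: str):
    return {"base": base, "limit": limit, "access": access}

# Two processes each hold their own copy of a capability for the same segment.
segment_base = 1000
clist_1 = [make_capability(segment_base, 2000, "read-write")]
clist_2 = [make_capability(segment_base, 2000, "read")]

# The segment is shifted in memory; every copy must now be found and updated,
# and when the segment is destroyed every copy must be deleted.
segment_base = 6000
stale = [c for c in (clist_1 + clist_2) if c["base"] != segment_base]
assert len(stale) == 2   # both copies still point at the old location
```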
An obvious solution is to centralize the lists of absolute capabilities, and replace the capability lists associated with running process by lists of pointers to the central list. This is more than a simple technological device because it conceptually replaces the current capability lists of a program by a mechanism which selects from a larger list. This selection function has come to seem more and more important to us. Secondly, the original proposals dealt rather clumsily with pieces' of data which were the property of a process, in the sense that if the process were deleted the data would go too, but which were only accessible when the right pieces of code were being executed. On the other hand, the proposals dealt very elegantly with bundles of capabilities which invariably became accessible when a certain piece of code was used, regardless of the process using it. The idea that will be developed is that the capability list of a process is to be regarded as that which defines a selection from all the absolute capabilities that exist; at any time in the history of the process some other mechanisms make further selections from the capabilities of the process, the selected capabilities being physically accessible in virtue of the current protection environment. Thus we have the idea of multiple levels of selection. We may now focus on the implementation of protection as the implementation of selection functions among capabilities, where by a capability we mean that which defines the physical position and size of a segment and the access mode allowed. Immediately there are two ways to proceed, which depend on the extent to which addressing is brought further into the protection implementation. One way is to proceed by means of lock and key systems. A lock and key system is one in which any segment, including here a segment containing capabilities, has associated with it a lock. At any stage in the history of a process there is associated with the process a key. Access to a segment is permitted if, and only if, the current key fits the lock of the segment. A lock and key system tends to separate the notions of addressing and of protection. In such an approach, a process may address any segment whatsoever; only those in which the key fits the lock will do other than give rise to violations. There is no relationship between the mechanisms for addressing a given word and the mechanisms for addressing a given word and the mechanisms for validating access other than that which is implicit in the segment being the unit of protection. Accordingly, it becomes feasible to arrange that a segment has the same name, that is to say it is addressed in the same manner, throughout the lifetime of the process or even to go further and to say that all segments are uniquely identified in the computer. This approach has much merit in that it avoids any renaming problems when communication is involved. Unfortunately, it proves extremely difficult to set up lock and key systems which are of sufficient generality to achieve the desired results. Because of the great potential advantages of lock and key systems, the reasons why this is so merit some examination. LOCKS AND KEYS Consider a situation in which all distinct protection regimes which can ever occur are identified by name or number. 
One could then imagine a lock and key system in which the key consists simply of the name of the current protection regime and the lock associated with the segment consisted of a list of the names or numbers of all protection regimes in which the segment was accessible, together with the nature of the access permitted. It is clear that this places no restrictions at all on the variety of accessibility patterns which can be implemented. It is, however, a very expensive thing to consider doing; there is no convenient limitation which can be set on the length of the lock, and the process of consulting it to see whether a particular key matched would be extremely slow. All practical lock and key systems which have been proposed work by means of some sort of encoding scheme, the purpose of which is to reduce the locks and keys to a fixed and convenient size. Any such encoding scheme regards the lock and key as being bit patterns between which a certain relation is sought. For example the lock and key may be two parts of a single valid message unit in an error correcting code. If we take this as an example, we see that every lock has to have the right relationship to each key which is supposed to fit it. If we now take a particular lock, it is possible in virtue of the structure of the relationship we have constructed to list in principle, all of the keys which will open it; equally every key can be accompanied by a list of those locks which it will open. We can think of listing out the possible locks and Protection Systems and Protection Implementations b I I I drawing lines from each pointing to the appropriate keys, and also putting in lines in the inverse sense from the keys to their locks. We shall be able to express the total variety of protection regimes we are interested in if, and only if, we can make an assignment of locks and keys to the segments in the regimes in such a manner that invalid access is never allowed. This poses an extremely difficult combinatorial problem in all nontrivial cases. It is at the least an extensive task to find allocations which satisfy all the constraints and, even if one can succeed in doing so, a small change in the protection regimes to be implemented may result in a total upset to the lock and key allocation. It appears that one either has to put up with the necessity for computing allocations of locks and keys, or alternatively to accept a lock and key system which will not implement all the protection regimes which might be required. One can sum up by saying that sufficiently powerful lock and key systems are too difficult in practice because of the allocation problem, and that lock and key systems in which one can face the allocation problem are not powerful enough. A good example is the plain hierarchical protection system afforded by representing the locks and keys by small integers, and saying that access is- permitted if, for example, the key is less than or equal to, the lock. This is easy to think about and easy to implement; unfortunately, it places extreme restrictions on the protection regimes. which can be described. If in protection regime A, something has to be accessible which was not accessible in protection regime B, then necessarily everything which is accessible in B must be accessible in A too. It is just not possible to deal with some situations which occur commonly in practice, such as the following. 
Suppose there is an input program Pi which has to have access to an input buffer B i ; suppose further that there is an output program Po which has to have access to an output buffer Bo. It is not possible to arrange that each of these programs has access to its buffer but not to either of the others. These are simple consequences of the linear arrangement of privilege. An additional difficulty about simple lock and key systems is that they do not deal satisfactorily with the the non-static and unpredictable nature of protection regimes. Arguments which have been passed to a , program which runs in a particular protection regime may carry with them the requirement that during i running certain segments are accessible because they I contain the data passed and not because they are permanently associated with the called program. A simple hierarchic system is in no difficulty if the new , , protection regime is further up the hierarchy than the I previous one, but it is in very serious difficulty if the new protection regime is lower down than the previous I I I 573 one. The more one elaborates lock and key systems, the more this problem becomes a troublesome addition to the allocation problem mentioned before. For these reasons, after a great deal of investigation, we did for the time being abandon the use of lock and key systems as a means of implementing the selection we desire. SELECTION BY INDIRECTION As foreshadowed above, the obvious alternative means of selection for accessible segments is by the use of indirection tables. If all segments are accessed via an indirection table or via one of a set of indirection tables, then it is possible to constrain the selection of available segments in quite arbitrary ways by suitable construction of the indirection tables. A consequence of the use of indirection tables is that addressing has become much more bound up with the protection implementation. This can be seen by looking at the complete specification for getting at a word of core. In order to specify a work, we must give three pieces of information: I 1. which indirection table must be used, 2. which entry in that indirection table indicates the required segment, 3. which word in that segment is wanted. The first two of these will be called the segment specifier, and the three collectively an address. The segment specifier of a segment depends on the protection regime, and so in turn does the address of a word. If the protection regime changes, a new set of indirection tables will be brought into use, and the addresses of words will in general change too. What changes is not the segment itself, nor is it the capability for the segment; the change is to the means of finding the capability. This point is of the greatest importance, and it is worth recapitulating it in a sharp form. On one side of the divide we have systems such as those which rely totally on locks and keys, where if a program attempts to load the accumulator with the contents of a certain word, then the actions it undertakes are in all circumstances the same, regardless of protection regime, although in some protection regimes they may cause a violation. On the other side of the divide, we have systems in which protection is so bound up with addressing that the bit pattern to be presented in order to load a certain word into the accumulator differs according to the current protection regime. The latter approach gives the flexibility which we have been unable to achieve in the former. 
However, if the mode of 574 Fall Joint Computer Conference, 1972 addressing words or segments is influenced by the protection environment in force, then there are complications in the compilation process that do not arise in a system with permanent segment addressing. Secondly, one gets into some difficulties with pointers from one segment to another. If we have a data structure which exists in more than one segment, some of the pointers in one segment will point to places in another segment. If the specifier of the segment changes, we are in difficulty. Although this does not happen very often, a solution must be found. The non-uniformity of treatment of pointers is something which compiler writers dislike since the existence of the non-uniformity may not be evident at a convenient time in the compilation process. Bearing in mind the above points, we now look at methods of implementation of systems which rely upon indirection to perform selection. The principal choice we have considered is between a system with explicitly named capability registers, and one without. A system with explicitly named capability registers works in the following way. A number of registers are provided, usually about eight, each of which is able to contain a capability for a segment in absolute form. Typically this consists of a base, a limit, and an access code. A process is at any time equipped with one or more capability segments, which contain either absolute capabilities, or information from which absolute capabilities may be found or constructed. The system has an instruction called 'load capability register' which has two arguments. The first argument is the number of a capability register to be loaded, and the second is an indication of which capability is to be loaded there. It must indicate which capability segment to use if there is more than one, and which entry in the selected capability segment should be used. A store reference instruction will then be interpreted via a capability register. A subsidiary point is whether or not the selection of which capability register to use is part of the address field of the instruction or part of the function field. The significance of this point is whether or not the capability register selection can be changed by index modification. Take first the case where the capability register selection cannot be changed by index modification. In this case a particular instruction in the program has it fixed for all time which register is going to be used. This approach imposes a rather considerable lack of flexibility. Some of this lack of flexibility is associated with any explicit capability register scheme, and will be mentioned in a moment. One aspect, however, is unique to this approach, namely that it is impossible to have a pointer from one segment into another. There is no uniform way of writing a program which will follow a chain searching for something, if that chain is likely to pass through words of more than one segment. It was remarked above that there are difficulties in this area anyway, and possibly the solution to the problem is to decide that intersegment pointers should be disallowed. Turning now to the alternative case where the capability register selector can be altered by index modification, we see that the particular difficulty just referred to does not arise. 
Provided that the capabilities for the segments in which the data structure resides are loaded, and known to be loaded, into the correct registers (where 'correct' means the ones which were assumed when the points were set up), then inter-segment pointers are perfectly possible. This proviso, however, indicates the lack of flexibility which remains. A great deal of pre-allocation of capability registers has to be done in any system which refers to them explicitly. Furthermore, an instruction will only be correctly executed if the right capability register has been loaded. Unless there are sufficient capability registers, which may be rather a lot, there is a good deal of keeping track to be done to insure that at all times the correct capabilities are where they should be as the flow of control proceeds round the program. For example, it may be desirable to pass the address of a word around in a program at a time when a capability for the segment containing it is not necessarily loaded. There is of course no need for the capability to be loaded until the address is actually used. We find that we need, in effect, two sorts of address which can be described as a particular address and a general address. A particular address consists of a capability register number and an offset. It is valid in all circumstances in which the capability register has been properly loaded. A general address consists of a complete segment specifier and offset; the segment specifier is just the second argument of a 'load capability register' instruction. If a piece of program, say a sub-routine, receives a general address, it is in a position to load the indicated capability into whichever capability register it thinks fit. However, in this case also we have difficulties of compilation, because the compiler cannot know when to use general addresses and when to use particular addresses. Furthermore, considerations of economy would suggest that we do not need two forms of address; of the two it is clear that the general one should be retained. THE SYSTEM: PROPOSED This final remark leads to the outline of the system we have eventually proposed. There are no explicitly named and explicitly loaded capability registers; instead the general address as defined in the last paragraph is interpreted directly by the hardware. The hardware I Protection Systems and Protection Implementations 575 indirection tables: Figure 1 must internally have registers in which absolute capabilities are to be found, and what it does, when presented with a general address, is to test whether the absolute capability corresponding to the segment specifier part of the general address has already been loaded into one of the internal registers. There are a variety of ways of doing this at hardware level. Weare now in a position where programs only use addresses in the form 'segment specifier, offset', and the runtime interpretation of the segment specifier is buried beneath the hardware-software interface. We must remember, however, that the interpretation of a segment specifier will still depend on the protection regime, because it makes use of indirection tables as a means of selection. I t is now time to return to a question implied about, namely how many indirection tables there should be and what they should be used for. The structure we are talking about is sketched in outline in Figure 1. 
In this structure, a change of protection regime will be implemented either by changing the contents of the indirection tables, or by bringing into use new indirection tables and putting out of use old ones. Some things are most naturally done by amending the contents of indirection tables. For example, a system call to give the process a brand new segment results in a change to the protection environment which is most easily made by extending a presently existing indirection table. The call has said something like 'get me a new segment of size n and call it Jack' where Jack is a segment specifier. The consequence will be that the appropriate indirection table entry will be set. On the other hand, when protection regimes change not by giving the .process new resources but by changing the accessibility of the resources already given, it is expedient to bring new (but pre-existent) tables into use and similarly to dispose (temporarily) of old ones. We have chosen to classify the segments available for a process at any time into four classes, implying that there are four current 1. Segments which are available to the process regardless of which program is currently being executed; these are known as G for global. 2. Segments which contain the code, or alternatively read only data, for a current program; these are called P for program. 3. Segments which, although the property of the process, are only accessible within the current program; these are called type CPo 4. Segments which are accessible because they have been passed to the current program from the program which called it; these are called type A for argument. For example, consider a program package whose duty it is to perform an input/output operation, such as taking a string of characters away from the calling program, despositing it in a buffer, and subsequently disposing of it. The code of the package may read from or write to the calling program's data area, it will require to be passed capabilities which will be A type. If in the course of executing this package it is necessary to make calls to the generally available operating system facilities, the ENTER capabilities for these facilities will probably be capabilities of type G. If, however, the system calls may only be made from within the input/output package we are describing, those ENTER capabilities could be either of P type or of CP type. The action of an ENTER instruction will thus be to change three of the four indirection tables. The table G will not be changed, because it is always available. The P indirection table will be replaced by one which is the defining characteristic of the called package; everything referred to in theP table will be shareable between all users of this procedure. The existing CP table will be replaced on ENTER by one set up to have the required properties at the time when the procedure was made available to the process. Making a procedure available to the process thus consists of equipping the process with the required ENTER capability and with the required indirection tables. The A indirection table will be replaced by one which is characteristic of this particular call. It is convenient to place A indirection tables on a special stack of standard format and distinguished from any stack that the running program may create for its own purposes. The special stack can also be used to store the links associated with ENTER instructions. 
Specifically, if an A type indirection table is constructed before a call, it will be the top few words of the stack. One or two special instructions are provided for moving pointers to capabilities from one indirection table to another, and 576 Fall Joint Computer Conference, 1972 one of these is specifically used for establishing entries in what will be a new argument type indirection table. It is worth noting that in the system proposed material other than that in global segments will only be available to called programs if appropriate capabilities are explicitly passed. There is inevitably a slight overhead on calls, but this is unavoidable in any system which does not have hierarchical protection. In hierarchical systems, it is usually assumed that when a call is made to a more privileged regime (and most calls are like this) everything which was previously available is still available. We are now in a position to give some account of the protection system as it appears when a process is running without any reference to problems of interprocess communication or of coordination. At any time the protection regime is represented by the current settings of the four indirection tables. Some of the capabilities referred to in these indirection tables will be ENTER capabilities; these delineate those changes of protection regime which are immediately possible. When one of. the ENTER capabilities is exercised by means of the ENTER instruction, the protection regime changes and the P, CP, A indirection tables are all replaced. We thus see that an ENTER capability must specify, directly or indirectly, the capabilities for the two new indirection tables of the P and CP types, the A type being part of a stack as previously described. What an ENTER capability actually looks like in a process capability segment is an implementation decision. We can now look at the same questions from another angle, and consider how to construct a protected procedure-that is, a procedure which will be entered with an ENTER instruction and which will run in its own protection regime. A protected procedure is characterized by its P- and CP- indirection tables. Accordingly, to construct one we must construct these tables, and insure that there are in the process's capability segment the correct capabilities for the indirection tables to select. A specimen prescription for such a procedure could look something like this: "There are 4 entries in the P-table. The first must select a segment of program whose text-name is Peter and the only access needed is 'execute.' The second selects a translation table called Bill, and 'read' access is required. The third and fourth must select ENTER capabilities for two standard system functions. "There are 2 entries in the CP-table. One is for local workspace of the procedure, and should bea copy of named segment Alfred, which contains initial data values. It must be readable and writeable. The second must be a workspace segment to use as a buffer, readable and writable, and 1000 words long." A routine that interprets this prescription and sets up an ENTER capability for the procedure in question then takes the following actions. First it procures suitable segments in which to build the indirection tables, and then it sets about filling them in. In the case of a workspace segment, whose initial contents do not matter, all that is necessary is to ask the core management routine for a segment of a suitable size and set the appropriate pointer in the indirection table. 
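A prescription of the kind just quoted can be regarded as plain data. The sketch below writes it out under invented field names; the paper gives the prescription only in prose.

```python
# Illustrative encoding of the quoted prescription; field names are invented.
prescription = {
    "P-table": [
        {"select": "program-segment", "text-name": "Peter", "access": ["execute"]},
        {"select": "translation-table", "text-name": "Bill", "access": ["read"]},
        {"select": "enter-capability", "function": "standard-system-function-1"},
        {"select": "enter-capability", "function": "standard-system-function-2"},
    ],
    "CP-table": [
        {"select": "copy-of-named-segment", "text-name": "Alfred",
         "access": ["read", "write"]},      # local workspace with initial data values
        {"select": "workspace", "size-words": 1000,
         "access": ["read", "write"]},      # buffer segment
    ],
}
```

A routine of the kind described next would walk these two lists, procure or locate the named segments, and fill in the new P- and CP-indirection tables accordingly.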
In the case of a segment whose initial contents must be set from a file, then the file system must be consulted in order to discover the segment size and disc address. There is a third possibility, namely that the prescription is for a segment already known to the process, and in this case the insertion of a new pointer is all that is needed. The two entries for standard system functions mentioned above would very likely fall under this case. Since the purpose of the routine is to equip a process with a new ENTER capability, it may be convenient to write it so that it can act recursively when the prescription itself calls for ENTER capabilities. The final action of the routine is to construct the ENTER capability which was originally requested, and leave a pointer to it in a suitable place. In this approach the protection procedure is regarded as a totally encapsulated entity which can be incorporated into the environment of a process without any presuppositions as to what was there already. If parts of the (read- or execute-only) environment were present already, then they will be re-used. It is open to take a slightly different approach and to construct protected procedures on the assumption that, for all processes, certain standard functions are available through the G-indirection table, this being always accessible. Doing this makes P-indirection tables shorter, but requires more conventions as to the way processes are set up. THE PROBLEM OF INVALID ARGUMENTS It is common for one or more of the arguments of a call to a protected procedure (or indeed any procedure) to be the address of a piece of store to which the procedure will write. There is no protection problem if the store so addressed is accessible to the calling program; potential difficulties arise if it is not so accessible but would be accessible to the called program. As a concrete example, suppose that in a traditional computer where the supervisor runs in a privileged mode, all memory b.eing accessible, there is a system call to read n words from an input document to store starting at address a. Protection Systems and Protection Implementations If a user program executes this call, giving as argument an address in store available to itself, there is no problem; what, however, if the address is that of store inaccessible to the user, but accessible to the supervisor? Unless precautions are taken, the supervisor may, when presented with an invalid argument, over-write its own program or important data. This problem is not new; there are explicit counter-measures to it in the hardware of the Atlas. However, the more generalized one's approach, the more difficult it is likely to be to deal with this class of difficulty. In a system with explicitly named capability registers, and in which the capability register number is in the function part of an instruction (i.e., it cannot be altered by address modification) the problem cannot arise. This is because any address passed as an argument will only be interpreted by the called program as referring to an authorized segment, and no possible action can mislead it. As soon as we move to a system in which the capability register number or the segment specifier are parts of the address passed, then there is the possibility of trouble. Difficulties of this sort arise in any system in which indirect references to segment names or numbers are possible. 
In order to guard against the danger just referred to a check must be made which depends on a number of different pieces of information being available at the same time. We must know: 1. The protection regime in which the address was constructed; 2. The protection regime in which the word referred to by th e address is accessible; 3. Whether it is allowable to construct an address in the former regime which refers to a word in the latter. In the structure outlined above, where there are four indirection tables, a simple rule results as follows: an address residing in a segment of type G or A may not specify a word in a segment of type P or I. The difficulty comes in knowing when the rule is being broken. As an example of a common sequence in which the relevant information is not all at hand at once, consider: I Load index register from store Access store via index register Item 1 above is available in the first instruction, but . it is not then known that the word will be used as an address. Item 2 is known on the second instruction, but I not where the contents of the index register came from. Any approach, for example the use of indirect instrucIJI. 'I 577 tions, which has both pieces of information available at the same time will enable the problem to be solved, e.g., 'Load accumulator indirectly from store' because both addresses and hence both segment types are know in the course of the same instruction, or 'Validate stored address' which is like 'Load accumulator indirectly' except that it does not load the accumulator, but only checks that the address in store obeys the rules. A really satisfactory solution to the problem of invalid argument addresses would not place on programming style the constraints which are imposed by the compulsory use of indirect or validation instructions. Such a solution is not yet obviously available. The body of this paper has been concerned with protection systems within a process. Nothing has been said about how the process obtains its resources and from where. There follows a brief view on how this aspect of a· system may be organized. The time available to a process is administered by a superior process called its coordinator. The coordinator is responsible for allocating time to its junior processes, and for synchronizing their execution where necessary by managing their halting and freeing. In addition to being the source of time allocation, the coordinator has responsibility for space allocation. Finally, any process may act as coordinator for processes junior to itself. This view has consequences for protection. The within-process protection architecture discussed above aids the orderly use of the process's resources, and all privileges conferred on particular procedures are relative privileges within the general facilities available to a process. Since all facilities available to a process are mediated by the coordinator, the last statement implies that privileges are valid within the universe set up by a coordinator for its junior processess, this universe being a subset of that available to the coordinator itself. It is a consequence of these remarks that privileges enjoyed by a coordinator in virtue of its relationship to its superior may not be passed on to the coordinator's juniors. They exist in the wrong world. The result then is that a coordinator may pass to its junior processes, when setting them up or later, access to core segments or subsegments available to it, with or without further access restrictions. 
It may not pass an 'ENTER' capability at all, though it may be able to pass the use of pieces of code from which an ENTER capability can be constructed for the junior. Since the coordinator has complete control over the actions of its 578 Fall Joint Computer Conference, i972 should not be forgotten, however, that if the requirements of a protection system are modest, then a lockand-key method may well be feasible. An outline was given of a practicable indirection technique for use in more general cases; again it should not be forgotten that others can be devised which may be more suitable in particular cases. junior processes, including interfering with register settings during halts or after interrupts, passing an ENTER capability could allow the coordinator to perform, via a subordinate, action which would ordinarily be forbidden. The ENTER refers not merely to a piece of code but to package whose existence implements privileges granted by the coordinator's superior. In the above approach, there is nothing unique about the status of a coordinator. Any program may create subprocesses for which it carries out coordinator functions according to any queueing logic or discipline it may choose. Two instructions are to be provided in our experimental system to assist in this operation; 'ENTER SUBPROCESS' which effects the complete change of protection context required by making current a new process capability segment and new indirection tables-the new capability segment being defined by reference to the old-and 'ENTER COORDINATOR' which reverses this action. I am much indebted to Professor M. V. Wilkes who has devoted a great deal of effort to improving the text of this paper, as well as many discussions of its content. The ideas have benefited from interaction with numerous colleagues, in particular three research students, J. R. Horton, C. C. Kirkham, and R. D. H. Walker. The work is part of a project supported by the Science Research Council. CONCLUSION REFERENCES The foregoing discussion has attempted to describe the requirements upon a protection system for information in central memory, and to bring out the problems which arise from various approaches. The upshot is an abandonment for the most general protection systems of lock-and-key methods, and the use instead of methods which. rely on selection by indirection. It ACKNOWLEDGMENTS 1 M V WILKES Time sharing computer systems Second Edition American Elsevier 1972 2 R S FABRY Preliminary description of a supervisor organised around capabilities Quart Prog Report No 18 Section IIA Inst Comp Res U niv Chicago 1968 Burroughs B1700 memory utilization by W. T. WILNER Burroughs Corporation Goleta, California INTRODUCTION UNIQUE DESIGN REQUIREMENTS Squeezing more information into memory is a familiar problem to everyone who has written a program which was too large to fit into memory. Program compaction is also important to those who work on machines with virtual memory (such as the B55001) ; despite the almost unlimited amount of storage, one wants to keep program working-sets2 (collections of segments needed in core at the same time) as small as possible to reduce both the number and duration of segment swaps. In general, one seeks to raise the information content (or reduce the redundancy) of the blocks of information which one is using. In this discussion, "information content" will suffice as an intuitive notion. One of the devices which hardware and software designers have provided to help with compaction is choice of container sizes. 
Machines can manipulate more than words: bytes, double words, and so on. Languages allow variables to be declared with different sizes, e.g., four-byte or eight-byte integers. Another category of compaction devices is encoding techniques. For example, memory addresses may be encoded literally, or as a "base-register-name/displacement" pair, or as an "indirect-reference-flag/reference-table-index" pair, or so on. A third technique for raising information content is to group information according to time, that is, by keeping information which is likely to be needed at the same time in one place. For example, variable-length segments are more efficient than fixed-length pages, 3 partly because segments are made to contain coherent subprograms, which is a way of grouping according to time. Ideally, then, a computer system very likely to utilize memory most efficiently would be one which could (a) manipulate any size bit string, (b) interpret any sort of encoding, and (c) administrate any segmentation scheme. Burroughs B1700 (described elsewhere in these Proceedings 4) is the only information-processing system (known to the author) which almost attains these ideals. The B1700 is specifically designed (a) to manipulate fields from zero to 65,535 bits long equally adeptly (which is a requirement of its defined-field design), (b) to interpret arbitrary "soft" machine language, or S-language, faster than a hard-wired system in the same price class could execute identical functions (which is a requirement of its generalized language interpretation design), and (c) to automatically move information in and out of memory according to any scheme (which is a requirement of its throughput objectives). As a result, the information content of fields in B1700 memory is exceptionally high, and memory is often utilized twice as efficiently as on other systems. COMPACTION TECHNIQUES A rbitrary field size With defined-field design, fields may be defined to be just the size that is necessary, however many bits that may be, and other, arbitrarily-defined fields may begin in the very next bit. One bit will do for boolean variables, and it may truly be any bit in memory. Character strings may begin on any bit address. There is no such thing as byte alignment, or data specification. A major addressing boundary, if it can be called that, occurs between each of 244 (over 17 trillion) bits. Every bit can be fully utilized. There are no locations and no field lengths which offer any processing advantage over other locations and lengths. Therefore, S-language designers are free to choose container sizes, such as for data addresses, which 579 580 Fall Joint Computer Conference, 1972 are precisely as many bits long as desired. This simple freedom appears to account for half of all the program compaction which has been realized on the B1700. S-language designers are further able to leave such things as branch address field lengths unbound until after compilation, when specific program details are known, such as the maximum number of instructions to be skipped by a branch instruction. It is just as easy to bind field lengths at run time as earlier; hence, S-language format can profitably change from program to program. 
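As a rough model of the defined-field idea (any field named by a bit address and a bit length, with no byte alignment), the sketch below extracts an arbitrary field from a bit string. It is an illustration of the addressing notion only, not B1700 microcode.

```python
def read_field(memory_bits: str, bit_address: int, bit_length: int) -> int:
    """memory_bits is a string of '0'/'1'; a field is any (address, length)
    pair, with no alignment requirement -- one bit suffices for a Boolean."""
    field = memory_bits[bit_address:bit_address + bit_length]
    return int(field, 2) if field else 0

memory = "0110100111010001"
assert read_field(memory, 0, 1) == 0           # a one-bit Boolean
assert read_field(memory, 1, 3) == 0b110       # a 3-bit field starting at bit 1
assert read_field(memory, 5, 7) == 0b0011101   # fields may start on any bit
```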
Frequency-based encoding

Given that fields may have arbitrary sizes, S-language designers (and users) may employ the varying-size containers generated by Huffman's algorithm for minimum redundancy codes.5 Briefly, the technique encodes elements by means of strings whose length varies inversely with the occurrence frequency of the elements; i.e., the most frequent element is represented by one of the shortest strings, and the least frequent element is represented by a longest string. Huffman encoding constitutes one extreme form of representation, which may possibly stipulate a different length string for each element to be represented. The opposite extreme is uniform container size, e.g., words. Between these two extremes lies a range of encodings, which particular circumstances may merit, as will be illustrated later.

As a simple illustration of frequency-based encoding, suppose a defined-field computer with a six-instruction repertoire exhibited the following frequency counts of instructions in a program whose size was to be compacted:

    Instruction   Frequency
    #1            1000
    #2-#6         5 @ 200
    Total         2000

Using ordinary encoding techniques, a three-bit field would be used to represent six quantities. The program's 2000 instructions would then be represented by 6000 bits. If, on the other hand, we allow variable-length, frequency-based encoding, the most frequent instruction could be encoded with only one bit. The bit would signify either the instruction or that three more bits follow, carrying the encodings of the remaining five instructions, viz.:

    Opcode   Instruction   Occurrence   Total Bits
    1        #1            1000         1000
    0111     #2             200          800
    0110     #3             200          800
    0101     #4             200          800
    0100     #5             200          800
    0011     #6             200          800
                                         ----
                                         5000

One thousand bits are eliminated, increasing memory utilization by:

    (6000 - 5000) / 6000 × 100% = 16.7%        (1)

A better encoding would use two bits for one of the five less frequent instructions, since the remaining four could still be encoded in four-bit opcodes, viz.:

    Opcode      Instruction   Occurrence   Total Bits
    1           #1            1000         1000
    01          #2             200          400
    0000-0011   #3-#6         4 @ 200      3200
                                           ----
                                           4600

Fourteen hundred bits are eliminated, increasing memory utilization by 23.3 percent. Note that this encoding has no unused bit combinations; it can be used for exactly six instructions. More redundant codes have room for other opcodes.

Time-based representation

In addition to representing information in fields according to occurrence frequency, one may improve memory utilization by rearranging fields according to dynamic frequency. That is, fields which are needed most often in memory may be collected into a common segment, in a time-analogy to minimum spatial redundancy. The B1700's interpreters are equipped to record program profile statistics6 which determine what pieces of code spend the most time being executed. By designing S-languages which allow arbitrary grouping of data or program pieces into segments, one may permit program representations in which most-often-used constructs appear in short, coherent segments while relatively unused portions reside in large, discontinuous (from the standpoint of flow of control) segments. Bits in each segment have similar time-utilization, just as the varying length of fields in Huffman encoding grants similar space-utilization to the bits in a particular field.
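The arithmetic of the frequency-based encoding example above can be checked mechanically; the short sketch below recomputes the three totals (6000, 5000, and 4600 bits) from the frequency counts given in the tables.

```python
# Frequency counts from the worked example: instruction #1 occurs 1000 times,
# instructions #2-#6 occur 200 times each.
frequencies = {"#1": 1000, "#2": 200, "#3": 200, "#4": 200, "#5": 200, "#6": 200}

def total_bits(code_lengths):
    return sum(frequencies[i] * code_lengths[i] for i in frequencies)

fixed_3bit   = {i: 3 for i in frequencies}                      # ordinary encoding
one_or_four  = {"#1": 1, "#2": 4, "#3": 4, "#4": 4, "#5": 4, "#6": 4}
one_two_four = {"#1": 1, "#2": 2, "#3": 4, "#4": 4, "#5": 4, "#6": 4}

assert total_bits(fixed_3bit) == 6000
assert total_bits(one_or_four) == 5000      # (6000-5000)/6000 = 16.7% improvement
assert total_bits(one_two_four) == 4600     # (6000-4600)/6000 = 23.3% improvement
```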
Dynamic field size

A defined-field computer must transmit a field length as well as a bit location to memory for each access, since arbitrary field lengths are permitted. Consequently, it is just as convenient to have operand lengths dynamically changeable as fixed. Length constants must be stored somewhere between requests to memory, and it is no less efficient to keep them in addressable fields. This opens up the possibility of Dial-A-Precision FORTRAN, where the operand fields in the FORTRAN S-language can be adjusted on the fly to be long enough to hold a required precision, for example, during inner product calculations. This capability is planned for the B1700 software, but is not in the initial releases.

APPLICATION TO SPECIFIC S-LANGUAGES

Since all high-level languages on the B1700 are compiled into novel S-languages of Burroughs' own invention, opportunities exist in these contexts for improved memory utilization. S-languages for existing machines, such as System/360 machine language, prohibit compaction because the fields are locked on to non-defined-field hardware formats.

SDL S-language

Burroughs supplies B1700 customers with a language and interpreter which have been designed to be most efficient in a compile-time environment. Named Systems Development Language, SDL, it has been used to program all B1700 compilers. SDL is constructed from an extendable base language which has been used, in augmented form, to write the B1700's Master Control Program (which performs supervisory functions such as I/O, multiprogramming, multiprocessing, virtual memory management, etc.) and, in a different form, to write sorting applications.

SDL opcodes

Opcodes are encoded into three lengths: four, six, and ten bits. Of the sixteen four-bit combinations, ten name the most frequent instructions, five indicate that two more bits specify the remainder of a six-bit instruction field, and one signifies that six more bits are needed to define the operation.

The design trade-off between space and time in opcode representations does not vary linearly between the extremes of Huffman encoding and fixed container size. One fixed field length allows parallel decoding of all bits in the field, minimizing time, but requiring much storage (except when all elements have identical occurrence frequencies, but that is contrary to computer behavior). Huffman codes may require much more decoding time, since bits may need to be examined serially until the length of the field manifests itself, but the codes can minimize storage. In the middle, SDL's three lengths come very close to minimizing storage, and also incur very little extra decoding time, as Table I indicates.

    TABLE I-Comparison of SDL Opcode Encoding Against Extreme Methods

    Encoding      Total Bits for   Utilization   Decoding   Redundancy
    Method        MCP's Opcodes    Improvement   Penalty
    Huffman       172,346          43%           17.2%      .0059
    SDL 4-6-10    184,966          39%           2.6%       .0196
    8-bit field   301,248          0%            0%         .4313

    [Figure 1-Performance of SDL encoding compared to extreme techniques
    (memory-utilization improvement plotted against decoding time).]

Figure 1 presents the same figures graphically. The reason for Figure 1's exponential curve is that there are several orders of magnitude between the frequencies of the most and least frequent elements in the set to be encoded. There is a great deal to be gained in such circumstances even by encoding the single most frequent element in a shorter field than the others (as was illustrated also in our example). If the opposite were true, if all elements were uniformly frequent, then the trade-off curve would be linear (or nearly so, depending on what multiple of the encoding radix the number of elements is).

Redundancy

We can compare these techniques on a less intuitive basis. "Information content" may be precisely defined in terms of the probability of a message's occurrence (as opposed to its meaning). Shannon's entropy function7

    H = - \sum_{i=1}^{I} p_i \log p_i        (2)

gives a measure of the average information content of I independent events with individual probabilities {p_i}. If we consider an SDL opcode in the MCP program as an event, and calculate p_i log_2 p_i for all 73 opcodes, then we find H = 4.55, which may be interpreted as the average number of bits needed for an opcode. To compare the encoding techniques of Table I using this criterion, we have:

    Technique        Average content (bits/opcode)
    binary Huffman   4.58
    SDL 4-6-10       4.88
    8-bit field      6.51

which shows that our chosen technique is very close to the minimum value of 4.55.

The redundancy factor of an encoding technique may be calculated as

    Redundancy = 1 - (optimum message length) / (encoded message length)        (3)

which ranges from zero (no redundancy) to one (infinite redundancy). To derive the redundancy column in Table I, we compared the total bits in the MCP via each technique against 4.55 X 37,656 (the total number of MCP opcodes) = 171,349, which is the smallest number of bits that may be used under the assumption that opcodes are decoded independently.

Redundancy, despite its quantifiability, is not a good independent design criterion. If pursued too extremely, there are disadvantages (such as intricate and slow decoders). If ignored, of course, there are extreme disadvantages (total system inefficiency). One must consider it in balance with all other design criteria, and attempt to reduce it without sacrificing performance in other areas. Most importantly, one must not sacrifice the unquantifiable criteria, such as ease of use, which appear to be most significant. It is interesting, however, that the B1700's S-language concept (which was pursued primarily to improve ease of use) has the desirable side effect of taking actual opcode representations out of the programmer's attention (because the possibility of working with machine language is removed), and this allows further efforts to remove redundancy, because opcodes no longer have to be human-engineered. Ease of use and high memory utilization are not orthogonal design criteria and increasing one need not decrease the other.

Significance of opcode compaction

Opcodes, in SDL's case, occupy nearly one-third of the entire program space because the choice of S-language significantly reduced all other kinds of fields. Compaction of opcodes contributed the most toward reducing overall SDL program size.

SDL data addresses

Locations of variables, or data addresses, are the second most populous fields, after instructions. SDL is a block-structured language and the SDL machine (for which the SDL S-language is the machine language) is a stack-structured processor, so data is accessed by a pair of integers, one giving the lexicographical level on which the variable was declared in the SDL program, and the other giving an occurrence number, or ordinal position, of the declared identifier in its block.
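As a rough sketch of how such a (level, occurrence-number) pair might be resolved, consider the following; the representation is invented for illustration, and the next paragraph gives the actual stack interpretation.

```python
# Each lexicographical level maps to the list of variables declared in that
# block; a data address is the pair (level, occurrence number). Illustrative only.
stack_regions = {
    0: ["global_a", "global_b"],          # level 0: outermost block
    1: ["x", "y", "buffer"],              # level 1: current block
}

def resolve(level: int, occurrence: int) -> str:
    """The level selects a region of the stack; the occurrence number is an
    ordinal displacement within that region."""
    return stack_regions[level][occurrence]

assert resolve(0, 1) == "global_b"
assert resolve(1, 2) == "buffer"
```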
The level identifies a (dynamically varying) region of the stack and the occurrence number indicates a displacement into the region where the variable may be found. In order to accommodate extremely large programs, the language designers decided to allow up to 1024 variables on any lexicographical level, and up to sixteen nested levels. The largest data address, thus, would require fourteen bits, ten for displacement and four for level. Once the compilers and the MCP were written and debugged in SDL, the actual usage of bits in data address containers was studied, in order to apply frequency-based encoding techniques. Table II gives the usage statistics for the B1700 MCP. Using the arbitrary fourteen-bit container, 9174 addresses require 128,436 bits. The usage study found that 66.1 percent of the occurrence numbers could be contained by a five-bit field and 78.4 percent of the level numbers were either the current level or level zero, which could be encoded in one bit. All together, if these shorter fields were made available, only 94,900 bits would be required, which is a 26.1 percent improvement in memory utilization. By mutual consent of the two SDL compiler writers and the SDL interpreter writer, it was agreed that the S-language would be changed to include a new data address format: level fields of one or four bits, occurrence Burroughs B1700 Memory Utilization 583 TABLE II-Occurrence of Actual Field Lengths Required by B1700 MCP Data Addresses Level Field Size 0 1 '2 3 4 Relative Level Contents 0 -1 -2 -3 -4 -5 -6 to -15 Displacemen t field size 4 5 6 7 8 9 Total 67 762 223 64 0 130 1182 408 44 0 189 1116 107 7 0 345 701 13 0 0 635 0 0 0 0 0 0 0 0 0 1574 5619 1611 370 0 1026 1116 1764 1419 1059 635 0 9174 2 3 4 5 6 7 8 9 Total 411 178 159 1 0 0 501 354 136 35 0 0 409 567 102 29 9 0 342 871 454 93 3 1 231 689 329 146 24 0 108 439 167 305 40 0 46 205 300 41 41 2 0 0 0 0 0 0 2985 3672 1714 663 133 7 0 0 1 2 3 136 478 162 56 0 31 355 143 45 0 12 397 272 68 0 29 628 283 86 0 832 574 749 0 1 521 234 44 13 16 4 416 135 23 0 0 0 number fields of five or ten bits, and two prefix bits to indicate which of the four possibilities followed. Locations in SDL are thus eight, 11, 13, or 16 bits long. Because this scheme is so different from conventional techniques, it is difficult to establish the exact advantage in memory utilization. If we consider a conventional scheme which can address as many variables, it is reasonable to require that two bytes of address field be used, since it is certainly possible for a program with 216 variables to be executed by the SDL interpreter. Another way of reaching the same conclusion is to consider the fourteen-bit maximum container; without defined-field design, fields must be byte-multiples (at least), so two bytes are needed. For 9,174 addresses of 16 bits each, 146,784 bits are needed. Hence, the four-way SDL encoding offers 35.4 percent memory utilization. SD L code addresses Program points are addressed by a pair of integers' one giving a segment name and the other specifying the starting bit of an instruction in the segment, relative to the start of the segment. Program segments are stored separately from data segments. 
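As an illustration of the four-way data-address format just described (two prefix bits selecting a one- or four-bit level field and a five- or ten-bit occurrence field, giving 8-, 11-, 13-, or 16-bit addresses), here is a minimal sketch. The specific prefix values are assumptions, and the one-bit level field is simplified to a small numeric level rather than SDL's "current level or level zero" convention.

```python
# Sketch of the four-way SDL data-address encoding.
# The prefix-to-format assignment below is hypothetical.

FORMATS = {                     # prefix -> (level bits, occurrence bits)
    "00": (1, 5),               # 2 + 1 + 5  =  8 bits
    "01": (1, 10),              # 2 + 1 + 10 = 13 bits
    "10": (4, 5),               # 2 + 4 + 5  = 11 bits
    "11": (4, 10),              # 2 + 4 + 10 = 16 bits
}

def encode_data_address(level, occurrence):
    """Pick the shortest format that can hold the (level, occurrence) pair."""
    for prefix, (lbits, obits) in sorted(FORMATS.items(),
                                         key=lambda kv: 2 + sum(kv[1])):
        if level < 2 ** lbits and occurrence < 2 ** obits:
            return prefix + format(level, f"0{lbits}b") + format(occurrence, f"0{obits}b")
    raise ValueError("out of range (at most 16 levels, 1024 variables per level)")

print(encode_data_address(0, 17))    # 8-bit form:  '00' + '0' + '10001'
print(encode_data_address(3, 900))   # 16-bit form: '11' + '0011' + '1110000100'
```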
* As a consequence, code * This is so that protection can be efficiently implemented and so that reentrancy is free, i.e., more than one program can execute a segment concurrently without requiring a different representation from that which a one-program version would use and without executing any instructions specifically to administrate reentrancy. addresses may be structured differently from data addresses. This freedom is advantageous for compaction, too, because usage information may be applied independently to each kind of field. Code address requirements are typically very different from data address requirements. Programs usually have many more variables than segments, so fewer bits are needed for segment names than for variable addresses. Segments usually contain more bits (thousands) than blocks contain variables (less than a hundred), so more bits are needed for displacement fields than for occurrence number fields. SDL designers wanted to allow over a billion bits for programs, in up to 1024 segments of up to one million bits each. At the same time, they surmised that many references to the first 32 segments might better be encoded in five-bit segment names, and references to the first 4096 bits and the first 65,536 bits might be more efficiently encoded in 12- and 16-bit fields, respectively. These shorter options were included in the preliminary S-language design which was used during MCP and compiler construction and check-out. Prior to release, actual usage was studied to evaluate the appropriateness of the design choices. * A sample of * On the B1700, S-language design may be changed at any time. Programmers see only higher-level language which is independent of S-language format. Hardware sees only microcode, which is indifferent to S-language format. Many S-language revisions can, in fact, be implemented simply by changing some literal fields in the interpreter and compiler. 584 Fall Joint Computer Conference, 1972 TABLE III-Occurrence of Actual Field Lengths Required for B1700 MCP Code Addresses Segment Field Size 0 1 2 3 4 5 6 7 8 Displacement field size 11 9 10 12 13 14 15 Total 0 0 0 5 42 32 110 78 12 153 0 2 9 12 36 37 132 5 1 1 1 6 42 66 80 147 40 39 0 0 9 28 25 166 168 26 33 5 0 0 0 5 36 9 0 102 1 0 0 0 0 0 0 0 332 7 28 42 147 309 502 603 120 268 376 389 461 88 103 2090 0-2 3 4 5 6 7 8 4 0 14 0 7 4 4 5 6 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 6 0 0 1 2 0 0 0 0 0 0 3 4 18 4 0 0 2 0 0 140 13 5 4 0 0 4 7 16 19 47 39 23 44 2 0 2 9 29 164 155 2,090 code addresses from the B1700 MCP corroborated the design team's choices, including that of a null segment field, for references within the same segment. Table III gives the occurrence of the actual field requirements for the code address sample. Actually, a fraction of 1 percent improvement in memory utilization could be achieved by changing from a five-bit segment field to a six-bit field, but if future SDL programs turn out to be smaller than present Burroughs compilers, the usage statistics will eventually prefer the present fields, so no change was made. SDL code addresses, then, have the following format: field description 3 bits displacement 0,5 or 10 bits 0,12, 16 or 20 bits Of the eight permitted variations, the most frequent is 15 bits long; three description bits, no segment field, and 12 displacement bits. 
Except for the null address (only three bits long), this is the shortest option, so it is to be expected that a proper frequency-based encoding of code addresses would make this the most commonly used format. Comparing this schem~ against conventional hardware is no less puzzling than it was for data addresses. Perhaps we may consider that the SDL machine can directly address 2 30 (over one billion) bits of instruction storage. An equally capable byte-oriented machine needs to address 227 bytes directly, so four bytes are needed. The SDL representation uses 74,303 bits for 3,767 addresses in the B1700 MCP, whereas the byte-oriented machine would need 120,544 bits (ignoring the extra capacity that four bytes allows). In terms of memory utilization, the ability to define eight different code address formats yields a 38.4 percent advantage for the B1700. SDL profile statistics While an SDL program is running, the SDL interpreter records each segment transition in an array which is automatically allocated to each program. This monitoring, performed by microcode, adds less than 7.4: percent to the running time. At the end of the run, SDL code is interpreted which prints the number of times each segment was entered, a compiler-estimated traversal time for each segment, and . the product of the two, giving an indication of which segments used the most execution time. SDL programmers can also indicate at compile-time which sections of their program should be monitored closely, down to each conditional expression, so that exact frequency counts can be obtained for each bit of code in the program. Subsequent monitoring involves additional code, but the amount is always less than a 1 percent increase. Assuming segment and procedure frequency counts have been utilized to focus one's attention on c;mly a few procedures, execution overhead for conditional expression monitoring is under 72 percent in most cases. Thus, the B1700 can indicate to programmers what the time-utilization of their program sections is, at very low cost. By grouping similarly utilized sections into segments so that segment transitions are lessened, temporal memory utilization can be increased. Other S-languages Only SDL designers were able to apply accurate usage statistics to their design because the entire world's Burroughs B1700 Memory Utilization supply of SDL programs was available to them. COBOL, FORTRAN, BASIC, and RPG (the initial set of languages for the B1700) S-language designers collectively extracted usage statistics from over nine million bits of sample programs, but can only guess at their verity. COBOL S-language I ': I 1.1 Opcodes are represented by three- or nine-bit fields; the seven most frequent instructions are encoded into seven of the eight three-bit codes, and the eighth is an escape code which indicates that six more instruction bits follow. Operands are prefixed by a single bit which indicates literal operands or variables. Literals consist of a type field (two bits), a length field (three or eleven bits), and the literal string itself. Operands are indicated by a string of fields which give segment and displacement location, length, type, character code semantics, and subscript information. Frequent operand descriptions do not appear in-line; they are placed into a table by the compiler and their table index is used in-line. COBOL S-language even includes a sub-S-Ianguage which defines editing operations. 
This implies that the COBOL interpreter contains a sub-interpreter which handles the editing program strings. These program strings, like operand descriptions, may appear either in-line or tabulated. COBOL, more than any other language on the B1700, has taken advantage of the ability to leave object code format undefined until after the compiler has seen a program. Segment descriptions, displacements within segments, variable-length data's size fields, operand descriptors, data addresses, and branch addresses are all stored in fields whose size is made just large enough to hold whatever maximum value is needed for the particular program. This capability appears to reduce COBOL object program sizes by 46 percent; that is, if the length of these containers were fixed for all programs, the average program size would be 85 percent larger. Of course, the amount of compaction is so large because the chosen S-language provides many opportunities to eliminate wasted space. The overall appropriateness for COBOL's S-language is difficult to assess. One faces the same evaluation problem as trying to say how much better one machine design is than another at implementing COBOL. So far, the only secure comparisons which we have obtained pertain to overall throughput and resource requirements. From a set of twenty ANSI COBOL programs of diverse application and varying size, we have concluded that COBOL programs tend to occupy 70 585 percent less memory on the B1700 than they do on a System/360 model 30. Such a drastic reduction in memory also improves running speed, which averages around 60 percent faster than the 360/30. The B1700, when interpreting its COBOL S-machine, even seems to out-do the B3500 system, whose hardware was designed to execute and compile COBOL programs. Program storage requirements are 60 percent less, and execution times are comparable. RPG S-language In order to reduce the number of interpreters active at anyone time on single-processorB1700s, the initial release of RPG uses the same S-language as COBOL. From a set of 31 RPG programs used for benchmarks, we observed that program storage is typically 50 percent of System/3 size (although one program with a preponderance of character strings in its representation was only 25 percent smaller, due to the fact that eight bits are used on both systems to represent characters). Execution speed is between 25 percent and 50 percent faster than System/3, due to the conciseness of the instruction stream and S-language advantage. COBOL S-instructions are interpreted at an average rate which is six times slower than System/3's average instruction rate (36 usec. vs. 6 usec.). To achieve 50 percent faster running time, each S-instruction must, on the average, accomplish twelve times more work than each System/3 machine instruction. Obviously, size alone cannot adequately measure program compaction. FORTRAN S-language FORTRAN also uses an opcode format of three or nine bits in each field, with seven short ops and one escape code. Data and code addresses have a common format, usually 24 bits long: field description bits (six) which control interpretation of the rest of the address and of the operand as well; a segment field (ten bits) which either names a segment or a place where a segment name can be found; a displacement field (eight bits) which locates operands within a segment; and possibly more fields, depending on context. 
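A minimal sketch of unpacking the usual 24-bit FORTRAN S-language address just described (six description bits, a ten-bit segment field, an eight-bit displacement) is given below; the bit ordering is an assumption, and the context-dependent trailing fields are ignored.

```python
# Sketch: split a 24-bit FORTRAN S-language address into its three fields.
# Field positions (high-to-low) are assumed, not taken from the paper.

def unpack_fortran_address(word24):
    description  = (word24 >> 18) & 0x3F     # high 6 bits
    segment      = (word24 >> 8)  & 0x3FF    # next 10 bits
    displacement =  word24        & 0xFF     # low 8 bits
    return description, segment, displacement

addr = (0b000101 << 18) | (37 << 8) | 200
print(unpack_fortran_address(addr))          # (5, 37, 200)
```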
Summarizing seven moderately-sized jobs, FORTRAN programs tend to occupy 50 percent of the space needed on a System/360, and 40 percent of the space needed on a B3500. (Note that these figures and those for COBOL imply that the B3500 is better at representing COBOL programs than a 360 and not as good at representing FORTRAN programs, which is well-known.) 586 Fall Joint Computer Conference, 1972 DISCUSSION there is no compile-time or execution-time overhead associated with this degree of user optimization. Inherent limitations Although choice of S-language format has been completely free on the B1700, there are implied restrictions due to the semantics of the higher-level languages for which S-languages have been invented. Contemporary languages, and FORTRAN especially, reflect the kind of hardware which their designers knew existed: sequential, word-oriented processors. Defined-field design offers significantly different machinery, and languages have yet to be defined which unconsciously assume defined-field capabilities, namely, that data and programs may be represented in any format whatever. Good as the B1700's memory utilization is, it tends to be better for programming languages which have a large number and variety of data and program formats. Diminishing returns Modifying interpreters to accommodate S-language refinements has a definite cost, including reprogramming affected portions of compilers and recompiling source programs. Our experience indicates that after identifying and improving seven or eight redundant aspects of a language, information content is relatively uniform among various S-language fields. Further refinements may not be worthwhile. This also implies that only first-order usage statistics need be collected, which keeps analysis costs down. User optimization When several alternative encodings seem equally attractive and their design trade-offs are well drawn, their invocation may be placed under user control. Each programmer knows individually whether his local system is time- or space-rich at any given hour, so he can give simple indications to a compiler about what options should be exercised. In COBOL, for instance, all data addresses can be forced either into the operand table to minimize program storage, or in-line to speed up execution by eliminating the table indirection. Since the interpreter is already capable of decoding both forms, CONCLUSION Defined-field design permits the definition of S-languages which are more efficient at memory utilization than contemporary machine structure. Because accessing and manipulation of arbitrarily-sized bit strings is handled al;ltomatically by B1700 hardware, various encodings may be selected solely on their inherent merit, with respect to program storage and decoding time; their suitability to the B1700 is irrelevant because the hardware is uniformly adept at manipulating all sizes of fields. One is free to choose problem representations which equalize the information content of fields in memory. Experience with compaction techniques, such as variable-length, frequency-based encodings, indicate that memory requirements can be reduced from 25 percent to 75 percent, compared to byte-oriented systems. 
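The central point of the conclusion, choosing container sizes from measured usage statistics rather than from word or byte boundaries, can be illustrated with a small back-of-the-envelope calculation; the usage counts below are invented for the example.

```python
# Sketch: compare a fixed-width container against a two-length,
# frequency-based encoding with a one-bit prefix.

from math import ceil, log2

def fixed_cost(counts, max_value):
    width = ceil(log2(max_value + 1))
    return width * sum(counts.values())

def two_length_cost(counts, short_width, long_width, short_limit):
    """1 prefix bit + short_width bits for small values, long_width otherwise."""
    return sum((1 + (short_width if v <= short_limit else long_width)) * n
               for v, n in counts.items())

usage = {0: 5000, 1: 3000, 2: 900, 40: 80, 700: 20}   # value -> occurrences
print(fixed_cost(usage, 1023))                 # 10-bit container for everything
print(two_length_cost(usage, 2, 10, 3))        # short form for values 0..3
```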
BIBLIOGRAPHY 1 Burroughs B5500 information processing systems reference 2 3 4 5 6 7 manual Burroughs Corporation Business Machines Group Sales Technical Services Systems Documentation Detroit Michigan 1964 P J DENNING The working-set model for program behavior Comm ACM 11 5 May 1968 p 323ff E G COFFMAN JR T A RYAN A study of storage partitioning using a mathematical model of locality Comm ACM 15 3 March 1972 W T WILNER Design of the Burroughs B1700 Proc FJCC72 Vol 41 D A HUFFMAN A method for the construction of minimum redundancy codes Proc IRE 40 September 1952 pp 1098-1101 D EKNUTH An empirical study of FORTRAN programs Software-Practice and Experience 11971 pp 105-133 C E SHANNON W WEAVER The mathematical theory of communication The University of Illinois Press Urbana Illinois 1949 Rotating storage devices as partially associative memories by N. MINSKY University of Minnesota Minneapolis, Minnesota INTRODUCTION (a) The primitive in!ormation items stored on a PAM are pairs of the form: "Associativity" is a highly desirable property of memory d~vices. Unfortunately, it does not seem to fit very well into the structure of contemporary randomaccess memories. A realization of associativity on such memories is always involved with high density of logic, and in today's technology is bound to be very expensive. Virtually all existing implementations of associative memories are accordingly on a very small scale and are typically used for special purposes such as the support of "virtual memory" schemes. From this situation one can get the impression that large scale associative memories are impractical. Fortunately, however, it turns out that rotating memories, unlike random access memories, are very natural hosts for at least a limited degree of associative addressing. This paper points to several latent potentialities of rotating memories, and describes a method for utilizing them for a realization of "partial associativity" (a term which is defined below) . The method described here is by no means the only possible way for realizing associativity on cylinder memories. Several related proposals were published quite recently: by Slotnick and Parker,l·2 by Coulouris Evans and Mitchell3 and by Gertz.4 Other possibilities were considered by the author. 5 •6 At present it is not clear to the author which of all the possible methods is preferable, and under which circumstances. This paper should be treated, therefore, as an illustration of what can be done rather than as a definite proposal. item = (n, d) The components nand d will be called the name-part of the item and its data-part respectively. As we will see below, the items stored on the PAM can be addressed only through their name-part, which is the reason for including the word "partial" in the name given to the memory. In the proposed realization of P Al\1, the name-part will be considerably smaller than the data-part. We will almost always have length (n) ;5 length (d) /10. While this is quite a severe restriction it still leaves place for a wide range of applications. (b) The operational characteristics of the PAM can be defined as a pair (P, J) where: (b. 1) P is a finite set of primitive predicates (Pl, P2, ... ,Pk) defined over the name-part of the items. The following are examples of predicates which are likely to be included in P. (1) For a given pattern of bits b, and a given maskm: n.AND.m=b. Here ".AND." is a masking operator and n stands again for the name-part of an item. 
(2) For a given mask m, and given integers num1,num2 num1 ~n.AND.m~num2. (3). A composite expression, like: (n.AND.ml=b) /\ (num1~n.AND.m2~num2). (b.2) I is the set of primitive instructions that the PAM can carry out. The following instructions are considered as being essential. to every PAM: (1) Store an item (n, d). (N ote that no address is specified; the item PARTIALLY ASSOCIATIVE MEMORY-A DEFINITION In this section we will define the term "partially associative memory" (PAM), by specifying the primitive structure of the information to be stored on it, and its operational characteristics. 587 588 Fall Joint Computer Conference, 1972 set of fixed length* addressable units called pages. The pages are grouped into N 1 tracks numbered from 0 to N, and S sectors numbered from 0 to S -1. The pages are addressed by their track and sector indices (i, j). We will use the phrase "to access a page" for the act of reading it or writing on it. A set of read/write heads, one head per track, is moving** together above this surface from left to right. But only one channel is serving all the heads, so that just one page can be accessed at a time. We introduce the following notation: + T is the "revolution" time of the heads. t g is the amount of time required to pass over an data space I ~ .... _.... "',. .... , / / ," - - - - - - ;,- - - - - - - ~ .... \ ; "\ \ \ \ intersector gap. \ \ \ , ts = T / S is the amount of time required to pass ........... "\ \ , \ \ \ \ \ , ", over a page plus one intersector gap. " \ \ \ \ \ Figure 1-The partition of the cylinder storage space, and the wayan item is stored on it has to be stored in an arbitrary empty memory slot.) (2) For a given predic'ate pEP, retrieve an item which satisfies p (n) . (3) For a given pEP, retrieve the set of items satisfying p (n ) . (4) For a given pEP delete the set of items which satisfies p (n) . A specific PAM may have more primitive instructions on top of these four, but we consider the efficient execution of the primitives defined above, as a necessary and sufficient condition for a memory to be a useful PAM. THE SUITABILITY OF "CYLINDER MEMORIES" FOR PARTIALLY ASSOCIATIVE ADDRESSING We will be concerned in this paper with cylindermemories, either in the form of drums, or as the "cylinder part" of disks. Although the conventional structure of such memories is well-known, we will describe it here schematically, mainly in order to introduce notations which will be used in the rest of the paper, (cf. Figure 1). The storage space of a cylinder-memory contains a The activity of the memory is supervised by a special processor called controller. In order to access a randomly given page (i, j), the controller first selects the head associated with track number i, (the head selection, being an electronic operation, is in general much shorter than tg ). Then, the controller waits until the head gets to the addressed page. During this "rotational delay" the read/write heads are passing above the cylinder while the reading mechanism is essentially looking for the addressed page, which is identified by clock pulses or by some pattern of bits. Our proposal for a realization of associative addressing is essentially based upon an exploitation of this unavoidable delay. Instead of spending this time in looking for a given address, we are proposing to use it for a search for a given content. However, with the single reading mechanism, currently available to a cylinder, it would take a long time to search its whole storage space. 
In principle it is possible to build a pattern matching mechanism for each head, as was proposed by Slotnick and Parker. 1 •2 Such an arrangement will indeed enable full associative addressing, but it is bound to be very expensive. Having in mind standard low-cost devices we will try to achieve a less ambitious goal, namely the realization of "partial associative memory" as it was defined above. Essentially, the method for achieving partial associativity is very simple: Suppose that the items to be stored on our memory satisfy the relation length (name-part) ~ length (data-part) / N, * The page length is not always fixed in practice but we will assume this for the sake of simplicity. ** In practice the heads are fixed in space and the cylindrical surface rotates. It is, however, easier for us to talk about the head as the moving part. Rotating Storage Devices as Partially Associative Memories then we can group all the name parts into a single track which can be scanned with a single reading mechanism in T seconds. We will accordingly partition the storage of the cylinder as follows (see also Figure 1): The cylinder is (logically) divided into two parts, to be called data space and control space. The data space consists of N tracks (track 1 through track N), while the control space is just one track (track number 0) to be called also control track. The pages of the control track will be called control pages (CP's); a specific control page (0, i) will be denoted by CP i . We further partition the data space into equal sized cells to be called data cells (DC's). A data cell may be of any size depending upon the application; for the sake of simplicity, however, we assume in this paper that each CD is a page, to be also called data page (DP). The control space is similarly partitioned into fixed size control cells (CC's) so that the total number of control cells is equal to the number of data cells, which means that length(CC) =length(DC)/N. I I We now associate a unique CC with each DC as illustrated in Figure 1: The control cells in CPi are associated with the data cells in column i+o*, while 0 is a small integer which depends upon the application, as will be described later. (In Figure 1, 0 is assumed to be 1). We will use the term memory slot for a pair (CC, DC) associated with each other. Suppose now that the items (n, d) are stored by recording n on a CC and d on its associated DC. Given then a predicate p we can retrieve an item satisfying p (n) simply by reading continuously the control track, computing p on each CC. If a CC containing an n which satisfies p (n ) is found, then we switch to its associated DC to read the d part of the item. The predicate p must be simple enough so that its computation can be carried out in time for the associated DC to be accessed at. the same revolution; (this point will be further discussed later) . Although the above organization of information on the cylinder permits an efficient associative retrieval of an item, it still does not turn the device into a PAM, since most of the primitives of a PAM cannot be performed efficiently, as will now be demonstrated by considering two such primitives. We will see that the difficulties are due to limitations of the conventional architecture of cylinder devices, limitations which are not inherent to such memories and can be removed quite easily. Consider first the act of storing a new item (n, d). * Additions performed on the sector index are assumed to be modulo S throughout this paper. 
589 writing head Figure 2-An illustration of the fast correction capability provided by separating between the reading and writing heads. Suppose that the heads are moving from left to right relative to the track, keeping the distance d between them constant. Correction of a page is performed as follows: The content of the page is read into the buffer by the reading head, corrected by the processorll". Then, when the writing head gets to the page, the corrected information is written on it. This operation has three steps: (1) An empty slot must be· found. (We will assume that an empty slot is identified by a zero CC.) (2) The control page containing the zero CC must be modified by inserting n into it. (3) d must be written on the DC associated with the CC. When we are ready to perform step 2, after identifying an empty CC, the read/write head is not above the control page any more. We have to wait until it gets there again in order to record the corrected information. Hence, it takes more than T seconds to store an item, which does not seem acceptable. Essentially, our problem is the inability to update a sector on conventional rotating devices in less than T seconds. This is clearly due to the fact that the same physical head is used for both reading and writing. The problem can in principle be solved by separating between the two heads, as illustrated in Figure 2. Note that fast updating capability is very useful even for conventionally addressed memories. It is, however, crucial for the control track in the above proposed access method, since, as we will yet show, virtually all the primitive operations of a PAM require frequent modification of control cells. There IS yet another inadequacy of conventional cylinder-memories as hosts for a PAM. This is revealed by the following description of set retrieval. Suppose that there are several items satisfying a predicate p (n ). A useful PAM must be able to retrieve 590 Fall Joint Computer Conference, 1972 all of them efficiently. But suppose that two of the d parts of these items happened to be written in sectors j and j+1. To be specific, suppose that the association between CC's and DC's is as in Figure 1. Now, the CC of the first item is sensed while the heads are above sector j -1; if the controller is instructed to read the associated DC, then the CC pointing to the second relevant item cannot be sensed, and we will have to wait for another revolution in order to retrieve the second item. The problem here is that we can use only one head at a time. Apparently, we need the ability to access the data space and the control space in parallel, in order to be able to retrieve sets "by content" efficiently. In the next section we describe in detail a realization of a cylinder PAM which incorporates the above suggested modifications. AN ORGANIZATION OF CYLINDER MEl\10RIES AS PAM'S The main structural characteristics of the memory A schematic description of the proposed memory organization is given in Figure 3. The main novel ~_-+-_co_nt_ro_l_In_pu_t_Cl_lan_n>,-el_-,wr~:~ head (COR) read head (ClR) control track Control Buffer Data Tracks \ \ I I ~ I Controller Storage Space features in it are the following: (a) We are using the partition of the storage space into control-cells and data-cells, which was introduced above. (The specific method for associating a CC to a DC is yet to be determined.) 
(b) The read/write head normally associated with the control track is separated into two heads: the control input head (CIH) which reads from the control track, and the control output head (CO H) which writes on it. (See also Figure 2.) (c) The two heads associated with the control track have to function in parallel to each other, and to one of the data heads. There are accordingly three parallel channels connecting the controller to the device: two unidirectional channels and a bidirectional one. (d) Apart from whatever buffer space is required for the normal data transfer, we need fairly large buffer space for manipulating the control track. The size of this control buffer will be determined later. (e) To supervise the activity of the PAM we need a fairly sophisticated controller which must support five parallel activities. We will accordingly describe it as being a complex of five independent "virtual processors"* working in parallel but interacting with each other. They are listed below: (1) The control input channel which reads the control track, via the, CIH, into the control buffer. (2) The control output channel which writes information from the control buffer into the control track (using the COH). (3) The data channel which controls the data heads and transmits information from one of them (at a time) into a data buffer in the controller, and back. (4) The memory channel which transfers information into the main (target) memory. (5) The'monitor which supervises. the activity of the memory. (By the phrase "memory" we mean the whole complex: storage space, heads, channels, buffers, etc.) The dynamic behavior of the memory Main Memory Figure 3-An illustration of the proposed memory organization (note that the cylinder storage space is illustrated in a planar form, for simplicity) The dynamic behavior of the memory is illustrated by the flow-chart in Figure 4. The five columns in the * We use the term "virtual" because it should be possible to realize several of them by a single actual processor. Rotating Storage Devices as Partially Associative Memories Control Input Ch. (CHl) Monitor - - - - - - - - - Control Output Ch. (CH2) - Data Ch. (CH3) Memory Ch. (CII4) -1 data of CPj' back into it. Figure 4-The dynamic behavior of the controller The five columns represent the five parallel activities of the controller. There is no significance, from the point of view of timing, to the relative location of the boxes in the various columns, see for that Figure 5 diagram represent the five processors mentioned in the previous section. The activity of the various boxes in Figure 4 depends upon the specific 1/0 instruction being served by the memory; the flow of control, however, is general. We will now describe the activity of the memory using a specific 1/0 instruction as an example. (Throughout this subsection we will ignore most of the issues involved with the need to synchronize the various parallel activities of the memory; they will be treated later in the paper.) Suppose that the memory is instructed to store a sequence of items (nl' dl ), (~, rh) ... (nk, dk). The following steps must be performed for each item (ni, d i ) : (a) An empty memory slot must be found. (We will assume that an empty slot is identified by a zero control cell.) 591 (b) d i has to be written on the part of the empty slot. (c) ni has to be written on the control part of the slot. The execution of this task begins by the monitor instructing the CIH to read a page from the control track (box a.l in Figure 4). 
The control page so accessed is the one which happens to be nearest to the CIH at this moment; we assume it to be CP j. The actual input process is represented by box b.I. Box a.l is executed just once per instruction; the monitor then loops between boxes a.2 and a.IO, reading the control track, analyzing its information, updating it and activating the data heads and the memory channel until the 1/0 instruction is satisfied. Box a.2 begins to analyze the data extracted from CP j (we will see later that at this moment, some, but not all of CP/s contents is transferred into the control buffer). In general, the purpose of the analysis is to verify a given predicate p on every CC of CP j. p depends upon the instruction being served; in this case it is simply a search for a zero CC, which identifies an empty memory slot. The monitor may not be able to complete the analysis of CP/s data by the time that the CIH gets close to the next sector in the track. In that case the analysis has to be interrupted in order to reactivate the CIH to continue by reading CP j +1 (cf. boxes a.3 and b.2). When the analysis of CP j is completed (in box aA), the monitor is in position to decide whether an item has to be transferred into, or out of the main memory. In our case, the current item (ni' d i ) has to be recorded on the slot identified by a zero CC, if one was found. This is done in several steps: First, ni is written into the empty CC space of the control buffer containing the copy of CP j (boxes a.5 and c.l). This should not take more than a few microseconds as the name-part of an item is relatively small and the operation does not involve the cylinder itself. Secondly, the monitor activates the appropriate data head (boxes a.6, d.a) to access the data page associated with the empty CC found in CP j ; we will denote this data page by DP j • The association between data cells and control cells has to guarantee that at this moment the data head will be fairly close to DPj ; we will return later to this point. When the data head gets to the page DPj, both the memory channel and the data channel connecting the controller with the cylinder are activated (boxes d.2 and e.2). If any additional modification to the content of CP j is required, it will be done by box a. 7. In our case no such modification is necessary but it may be required in various "marking operations" as will be shown. If the 592 Fall Joint Computer Conference, 1972 buffer containing the content of CP j was modified, either by box a.5 or by a.7, the modified buffer must be written back into CP j modifying the page itself (boxes a.S and c.l). Obviously, the COH must be in position to do that, which is a requirement on the physical separation between the two heads. If the 1/0 instruction is fulfilled (box a.9), the controller idles; otherwise it goes back to box a.2 to begin analyzing CPi+l'S information which by now is partly in the buffer. (Box a.lO represents the fact that now the next control page will be analyzed.) We should keep in mind that the above description is only an example. Actually the memory must be able to carry out a variety of I/Y> instructions based upon a set of predicates. The actual activity of most boxes in Figure 4 is therefore a function of the instruction being served, and there should be a way to select the desirable algorithm each time. One can infer from the discussion above that the controller which supervises all this activity should be fairly powerful, preferably programmable, processor. 
This processor must be able to perform simple computations very quickly, to support several parallel activities and to choose at run time between several alternative procedures. The price of such a minicomputer is nevertheless only a small fraction of the price of a large disk or drum. In fact there is a growing trend in the industry* to change the conventional special purpose controller to a processor which has essentially the characteristics required here. Synchronization of the parallel activities of the PAM In the discussion above we neglected the need to synchronize the various parallel activities of the controller with each other and with the rotation of the cylinder. Such a synchronization imposes constraints which will now be discussed. (a) The controller cycle (boxes a.2 through a.lO) must be executed in less than t8 = T / S seconds, since it has to be applied to each of the S pages of the control track. This obviously restricts the complexity of the primitives of our P A1Vf, for a given controller and cylinder. With the present speed of logic, however, we should have ample time at least for primitives of the type of simple pattern matching. (b) The data pages associated with CPj can be accessed not sooner than 01 seconds after the reading of CP j is terminated, where 01 is defined * Control Data Corporation, for example, now has such a controller.7 as the time required to execute boxes a.3 through a.6 plus the "head switching" time (box d.l). This delay clearly depends upon the specific 1/0 instruction; However, for a given memory system, with its finite set of primitives we can define: The delay ~1 may be realized by an appropriate association between the control cells and data cells. There are two cases to be considered: (1) If ~1 < t g (t g was defined as the intersector gap time), then it is enough to associate the CC's in CP j with the data column j+l (namely the pages (i, j + 1) for 1 ~ i ~ N) . (2) If ~l>tg, we may use several techniques: We can associate the data sector j+2 with CP j or we can place the data sectors and the control pages in different angular positions. (Alternatively, the same effect may be achieved by allowing a non-zero angle between the data heads and the head reading from the control track.) ( c) The actual updating of CP j can begin only 02 seconds after the reading from it was terminated. Here, 02 is the execution time of boxes a.3 through a.S in Figure 4. We can again define for a specific memory: The delay built into the memory must be greater than or equal to ~2. Such a delay may be realized by an appropriate physical spacing between the read head and write head associated with the control track, so that the write head gets to every page, 82 seconds (or more) after the read head leaves it. In the case of disks there is a problem involved with this delay: a fixed linear distance between the heads amounts to different time delays when the heads are accessing different cylinders. This, however, does not necessarily mean that the distance between the heads must be physically readjusted for each cylinder. It would be enough to guarantee that all the delays are greater than or equal to ~2. The different delays can be buffered by the control buffer whose length, as we will see below, is a function of the delay. (d ) Note that during the part of the controller-cycle serving CPj, we have the whole copy of CP j residing in the control buffer while in addition the content of CPi+l already flows into it. 
This situation exists for about δ̄2 seconds (during the execution of boxes a.4 through a.8). At the end of this period the occupied buffer space is roughly

(δ̄2/ts + 1) · (page size)

(neglecting the intersector gap time). We do not need more buffer space than that, since after concluding the execution of box a.8, information begins to flow out of the buffer back into CPj (box c.1) at the same rate in which it flows in from CPj+1. We thus have a rough estimate of the required buffer space. In the case of disks, if δ̂2 is the maximum of δ̄2 over all the cylinders, then the required buffer space is

(δ̂2/ts + 1) · (page size).

An illustration of the time relationship of the various activities of the memory is provided by the time-diagram in Figure 5. It is easy to infer from this diagram, as well as from Figure 4, that the sizes of the delays δ̄1 and δ̄2 would generally be smaller, and in any case not much bigger, than ts.

Figure 5-Illustration of the time correlation of the various parallel activities of the memory. The numbers above the "monitor line" represent the boxes of Figure 4 which are active at a specific moment; the dotted lines represent the activation of the various channels by the monitor.

It may be instructive to present at this point an example of an actual situation. We will consider the popular CDC 841 disk file (which is virtually identical to the IBM 2314). Every cylinder of this disk is constructed from 20 tracks, each with 14 pages. The size of every page is 3840 bits. The revolution time T is 25 ms. In this disk we will have 19 control cells per control page; the size of each CC is about 190 bits. The controller cycle ts is about 1.8 ms, which should allow a fair amount of computation to be performed on each CC. As to the size of the control buffer, suppose that δ̄2 = ts/2, which is equivalent to saying that the distance between the read head and the write head must be at least 1.5 times the length of a page. Now, the radius of the outermost cylinder of the 841 disk file is about 6.5 inches, while the radius of the innermost cylinder is about 4.5 inches. If the two separated heads are mounted on a fixed-angle fork, which has to move from cylinder to cylinder, then the biggest delay would be δ̂2 ≈ (6.5/4.5)·δ̄2 ≈ 0.75·ts, which means that the size of the control buffer should be (δ̂2/ts + 1)·(page size) ≈ 6750 bits.

THE PERFORMANCE OF THE PROPOSED "PARTIALLY ASSOCIATIVE MEMORY"

We will try now to justify the name "partially associative memory" for the proposed device, by describing its performance in carrying out the operations which were defined as essential primitives of a PAM. In order to compare the performance of our PAM with that of a conventional rotating memory we will assume an item in the conventional case to be of page size.

Retrieval of a single item

Instead of the average T/2 + ts seconds on conventional cylinder memories, it takes on the average T/2 + 2·ts + δ̄1* seconds to retrieve an item from our PAM. Needless to say, this extra inefficiency is more than compensated for by the fact that the retrieval is "by content."

Storing a single item

This operation turns out to be almost always more efficient than its equivalent for conventional cylinder memories.
The store operation for both types of memories has two parts:

(a) Getting to the point where the item has to be written;
(b) The actual writing process.

* Here, and in the rest of this section, we neglect the inter-sector gap time tg.
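Part (a) is where the two memories differ most: a conventional store must rotate to one specific sector, while the PAM may use the first empty slot it meets. A rough simulation of that difference, under the simplifying assumption of uniformly distributed empty slots, is sketched below; it is only an illustration and does not reproduce the paper's exact expected-delay figures.

```python
# Rotational delay to a *given* sector versus delay to the *first empty* slot,
# for a cylinder with S sectors and e empty slots placed at random.
# Times are in units of the sector time ts.

import random

def delay_to_fixed_sector(S):
    return random.randrange(S)              # uniform 0 .. S-1 sectors away

def delay_to_first_empty(S, e):
    empties = set(random.sample(range(S), e))
    start = random.randrange(S)
    return next(d for d in range(S) if (start + d) % S in empties)

S, e, trials = 10, 5, 100_000
avg_fixed = sum(delay_to_fixed_sector(S) for _ in range(trials)) / trials
avg_empty = sum(delay_to_first_empty(S, e) for _ in range(trials)) / trials
print(f"avg delay to a given sector : {avg_fixed:.2f} ts")
print(f"avg delay to an empty slot  : {avg_empty:.2f} ts")
```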
For the sake of retrieval of such sets (and also for other purposes) we allocate a single bit in each CC for system use; it will be called mark-bit. We will use the terms: to mark and unmark an item, for making its mark bit 1 and 0 respectively. In addition we will use the word marked as a primitive predicate which is "true" for marked items. The following is an algorithm which retrieves the set A (p) in M revolutions of the cylinder. The retrieval Note, however, that if we can guarantee that items which do not belong to A (p) are unmarked, then in phase two we can simply look for marked items. (c) The efficiency of set retrieval obviously depends upon the distribution of the set on the cylinder surface. This means that although we do not have to know the address of an item stored on a P AIVI in order to retrieve it, we should keep some control on the distribution of sets of items. The efficiency of this operation for randomly distributed sets is calculated elsewhere. 6 Deletion To delete an item, it is enough to erase its control cell. Given a simple predicate p we can therefore delete the set A (p) in T+ts seconds regardless of its distribution on the cylinder. This efficiency is again due to the head separation. Other associative l/P operations, such as the retrieval of sets defined by composite predicates are discussed elsewhere. 5 ,6 Rotating Storage Devices as Partially Associative Memories Some odd applications of the proposed memory Although a comprehensive study of the applications of the proposed memory is not within the scope of this paper, we will mention here two of its less obvious usages. The usages to be discussed here depend more on the effectively instantaneous modification capability created by the head separation, than on the associative addressing. (a) Synchronization of parallel processes which access the same information, is a common problem in data-base management. In particular, if two processes PI and P2 are capable of modifying the same item i, they must be prevented from doing that at the same time. Otherwise, the following undesirable sequence of events may occur: PI reads item i. P2 reads item i. PI replaces i with modified information. P2 replaces i with modified information. In the context of the proposed PAlVI, such processes can be synchronized by means of "semaphores" (cf. Dijkstra8 ). Every bit in the control cell can serve as a semaphore since it can be set and reset instantaneously. (b) The second useful application of the "instantaneous modification capability" is that it provides us with a very simple and cheap way for maintaining various "usage statistics." Suppose that we are interested in the number of retrievals of each item in the memory. Suppose in addition that we can afford to allocate a suitable count field in each CC. All we have to do then is to increment by one the count field of every item when retrieved. This capability may be quite useful for data-base administration. CONCLUSION As pointed out in the introduction, the detailed description of an organization of rotating associative memory was not intended to imply that it is the only, or even the best way for realizing associativity on rotating devices. The basic ide~s presented in this paper can be put together in several different ways as it is shown elsewhere. 5 ,6 One of the possible variations is worth mentioning here: The control information can be stored on a relatively slow random-access memory, external to the rotating device. 
Such an organization does not require any change in the device itself.

Space limitations do not allow us to include any discussion of the applications of the proposed memory. Such a discussion is particularly necessary for two reasons: (a) Because of the current unavailability of large scale associative memories, there is almost no published study of their use. (There are, however, several papers which discuss the use of simulated associative memories in a data-base environment.9) (b) The proposed memory is quite restricted; it is only "partially associative" and it is not really large, from the point of view of "data base" type applications. Therefore, the ways for utilizing such memories are not obvious. Elsewhere6 we consider some applications of the proposed PAM, but a comprehensive study of the subject is yet to be done.

ACKNOWLEDGMENT

It is my pleasure to thank Dr. Peter Patton and Dr. Larry Kinney for reading drafts of this paper and for their useful comments and suggestions.

REFERENCES

1 D L SLOTNICK, Logic per track devices, Advances in Computers, Academic Press, 1970
2 J L PARKER, A logic per track retrieval system, IFIP Congress, 1971
3 G F COULOURIS, J M EVANS, R W MITCHELL, Towards content-addressing in data bases, The Computer Journal, Vol 15, No 2, 1972
4 J L GERTZ, Storage reallocation in hierarchical memories, Third Symposium on Operating Systems Principles, October 1971
5 N MINSKY, Rotating storage devices as "partially associative memories", University of Minnesota Computer Sciences Department Technical Report 72-4, April 1972
6 N MINSKY, On associative addressing in cylinder memories, to be published
7 A nonformal description of CDC 844 disk system
8 E W DIJKSTRA, Cooperating sequential processes, in Programming Languages, F Genuys (ed), Academic Press, 1968
9 J A FELDMAN, An algol-based associative language, Communications of the ACM, August 1969

The page fault frequency replacement algorithm*

by WESLEY W. CHU and HOLGER OPDERBECK
University of California, Los Angeeles, California

* This research was supported by the U.S. Office of Naval Research, Mathematical and Information Sciences Division, Contract Number N00014-69-A-0200-4027, NR 048-129.

INTRODUCTION

Dynamic memory management is an important advance in memory allocation, especially in virtual memory and multiprogramming systems. In this paper we consider the case of paged memory systems: that is, the physical and logical address space of these systems is partitioned into equal size blocks of contiguous addresses. The paged memory system has been used by many computer systems. However, the basic memory management problem of deciding which pages should be kept in the main memory to allow efficient operation without wasting space is still not sufficiently understood and has been of considerable interest. Obviously, pages should only be removed from the main memory if there is a very low probability that they will be used in the near future. The difficulty lies in trying to determine which pages to remove, without incurring difficult implementation problems at the same time. Many replacement algorithms have been proposed and studied in the past, such as Random, First-in First-out, and Stack Replacement Algorithms1 [for example, Least Recently Used (LRU)]. These replacement algorithms are usually operated with a fixed size memory allocation. For such a fixed size memory replacement algorithm we need to have prior knowledge about program behavior. For example, in the LRU case, we need to have an estimate for the number of page frames which have to be allocated for each individual program. Further, program behavior is usually data dependent and changes during execution. An efficient replacement algorithm should therefore automatically adapt to the dynamically changing memory requirements. A recent study by Coffman and Ryan2 shows that such dynamic storage partitioning provides substantial increases in storage utilization over fixed partitioning. The working set model of program behavior3 takes into account the varying memory requirements during execution. With respect to this model, we call the replacement algorithm, which keeps exactly those pages in main memory that have been accessed during the last T references, the Working Set Replacement Algorithm. The performance of the Working Set Algorithm still depends on the choice of the working set parameter T and program characteristics (e.g., locality).4 Further, the Working Set Algorithm appears expensive to implement. Therefore we were motivated to develop an adaptive replacement algorithm which is largely independent of program behavior and input data and is simple to implement. We shall use the page fault frequency (the frequency of those instances at which an executing program requires a page that is not in main memory) as an adaptive parameter to control the decision process of the replacement algorithm. Since the page fault frequency reflects the actual memory requirements of a program at execution time, the Page Fault Frequency (PFF) Algorithm can be applied to arbitrary programs without prior knowledge about program behavior.

The performance of replacement algorithms is usually compared in terms of efficiency and space-time product. Because of the complex nature of program behavior, we used simulation techniques to measure the efficiency and the space-time product for various programs. From these simulations we were able to compare the performance of the LRU and Working Set Replacement Algorithms. Next we describe the PFF Algorithm and compare its performance with the LRU and Working Set Replacement Algorithms. Finally, we discuss the advantages of this new replacement algorithm when employed in a multiprogramming environment and the implementation of the PFF Replacement Algorithm.
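Since the PFF Algorithm itself is specified later in the paper, the following is only a hedged sketch of the general idea stated above: the observed page fault frequency drives the decision to grow or shrink a program's resident set. The threshold parameter tau and the bookkeeping details are assumptions of this sketch, not the authors' definition.

```python
# Sketch of a page-fault-frequency controlled resident set.
# Rule assumed here: on a fault, if the time since the previous fault is at
# most tau, the resident set grows; otherwise pages not referenced since the
# previous fault are released before the faulting page is added.

def simulate_pff(reference_string, tau):
    resident, referenced_since_fault = set(), set()
    last_fault_time, faults = 0, 0
    for t, page in enumerate(reference_string, start=1):
        if page in resident:
            referenced_since_fault.add(page)
            continue
        faults += 1
        if t - last_fault_time > tau:          # low fault frequency: shrink
            resident = set(referenced_since_fault)
        resident.add(page)
        referenced_since_fault = {page}
        last_fault_time = t
    return faults, len(resident)

refs = [1, 2, 1, 3, 2, 1, 4, 5, 4, 5, 4, 5, 1]
print(simulate_pff(refs, tau=3))
```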
This technique has been used previously to measure dynamic program behavior5 and also to measure the performance of the Belady Optimal Replacement Algorithm,6 the LRU Replacement Algorithm,7,8 and the Working Set Replacement Algorithm.4 For this purpose an interpreter for the UCLA SIGMA-7 time-sharing system has been developed. This interpreter is capable of executing SIGMA-7 object programs by handling the latter as data and reproducing a program's reference string. This sequence, in turn, can then be used as input to programs which simulate various types of replacement algorithms. For convenience in presentation, we let the time required for a thousand page references correspond to one millisecond (msec).

Four different programs of various characteristics were interpretively executed. A FORTRAN program (FORTRAN) and a FORTRAN compiler (FORTCOMP) were chosen as representatives for programs with small localities. A META7 compiler and a DCDL compiler represent programs with large localities. META7 translates programs written in META7 to the assembly language of the SIGMA-7. The DCDL (Digital Control and Design Language) is written in META7. It translates specifications of digital hardware and microprogram control sequences into machine code.

TABLE I-Characteristics of Measured Programs

              SIZE (pages)
              Static s0    Dynamic r0    Number of page references
  FORTRAN        24            38             4,870,000
  FORTCOMP       24            39             3,810,000
  DCDL           44            71             3,010,000
  META7          84           165             2,590,000

To illustrate the behavior of these programs, Figures 1a and 1b display the stack distance frequencies as defined in Reference 1. The frequent occurrence of large stack distances (20 and more) for META7 and DCDL indicates that the localities for these programs are larger than the localities of FORTRAN and FORTCOMP. Table I shows some characteristic properties of these programs. The column 'size' is divided into two parts. 'Static' refers to the number of pages, s0, necessary to store the program as an executable file on a disk, where one page consists of 512 32-bit words. 'Dynamic' indicates the number of different pages, r0, actually referenced while processing the given input data. There are two reasons why r0 is not equal to s0: first, not all the pages which make up the program may be referenced while processing a particular set of input data; second, a number of data pages is created and accessed during execution to provide for working storage space, buffer areas, etc. The number r0 is of special interest because it is equal to the minimal number of page faults which will be incurred by every replacement algorithm based on demand paging. Actually, r0 page faults will occur even if not a single page is replaced. In this case, all page faults are caused by the very first reference to a page.

For a given page reference string w and a given replacement algorithm with its parameter, the page fault frequency f(w) is defined as the ratio of the number of page faults during processing w to the total number of references in w; that is,

f(w) = r / t

where r is the total number of page faults and t is the total number of page references. For a finite t, since r ≥ r0 > 0, f(w) is always greater than 0. The average inter-page-fault-time is t/r page references. If no pages are replaced, the smallest page fault frequency is f0(w) = r0/t, which depends on r0 and t. In general, f0(w) is different for different programs and could be different even for the same program when processed with different sets of input data. For this reason, it is awkward to use f(w) as a measure to compare the performance of a replacement algorithm when applied to different reference strings. We therefore define a normalized page fault frequency, fn(w) = (r - r0)/t, and use this as a measure to compare the performance of various replacement algorithms. The normalized page fault frequency considers only those page faults which are caused by references to pages which have been accessed before but which were replaced later. Clearly, if no pages are replaced, fn(w) is 0.
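The stack distance frequencies plotted in Figure 1 can be tabulated directly from a reference string of the kind produced by the interpreter described above. The short sketch below is our own illustration (Python, not from the original paper): it computes the LRU stack distance of every reference and counts how often each distance occurs.

```python
from collections import Counter

def lru_stack_distances(refs):
    """Return the LRU stack distance of every reference in `refs`.

    The distance of a reference is the position (1 = top) of the page in the
    LRU stack just before the reference; a first-time reference has no finite
    distance and is reported as None.
    """
    stack = []                 # stack[0] is the most recently used page
    distances = []
    for page in refs:
        if page in stack:
            d = stack.index(page) + 1
            stack.remove(page)
        else:
            d = None           # first reference to this page
        distances.append(d)
        stack.insert(0, page)  # the referenced page moves to the top
    return distances

# Frequency of each finite stack distance, as plotted in Figure 1
refs = [1, 2, 1, 3, 2, 1, 4, 1, 2, 3]
freq = Counter(d for d in lru_stack_distances(refs) if d is not None)
print(freq)
```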
The efficiency E for a program execution is defined as the ratio of total virtual processing time (processing time without page-wait times) to the total real processing time (total virtual processing time plus total page-wait times); that is,

E = total virtual processing time / total real processing time = 1 / (1 + f(w)·R)    (1)

where Tm is the access time of main memory, Ts is the access time of secondary memory, and R = Ts/Tm is called the speed ratio of a particular combination of secondary and main memory. We assume Tm to be the time of one page reference (10^-3 msec).

The maximum efficiency which can be achieved if no pages are replaced is E0 = 1/(1 + f0(w)·R). E0 again depends on r0 and t and is therefore in general different for each reference string. For this reason, we define the normalized efficiency En, which corresponds to the normalized page fault frequency fn(w), as En = 1/(1 + fn(w)·R). Note that En always reaches its maximum of 100 percent if no page is replaced that will be referenced again, and that En is independent of r0.

Figures 2 and 3 display the normalized efficiency and the average inter-page-fault-time as a function of memory space allocation for the LRU Algorithm with R = 10,000.* For a given memory space we notice that different programs have different normalized efficiencies and different average inter-page-fault-times.

* These curves can be derived directly from the stack distance frequencies in Figure 1 (see Reference 1).

[Figure 2-Normalized efficiency of the LRU algorithm (normalized efficiency in percent vs. number of allocated page frames)]
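As the footnote notes, the LRU curves of Figures 2 and 3 follow directly from the stack distance frequencies: with m allocated page frames, LRU refaults on exactly those references whose stack distance exceeds m. The sketch below is our illustration of that derivation (not the authors' code); it takes a list of distances such as the one produced in the previous sketch.

```python
def lru_curves_from_distances(distances, R=10_000):
    """Derive LRU fault counts and normalized efficiency from stack distances.

    For a memory of m page frames, LRU faults on the references whose stack
    distance exceeds m, plus the first reference to each page, so the curves
    of Figures 2 and 3 follow from the distance frequencies.  `distances` is
    a list in which None marks a first reference.
    """
    t = len(distances)
    r0 = sum(1 for d in distances if d is None)       # unavoidable first faults
    n = max(d for d in distances if d is not None)    # largest finite distance
    curves = {}
    for m in range(1, n + 1):
        refaults = sum(1 for d in distances if d is not None and d > m)
        fn = refaults / t                             # normalized fault frequency
        En = 1.0 / (1.0 + fn * R)                     # normalized efficiency
        avg_ift = t / (refaults + r0)                 # inter-fault time, in references
        curves[m] = (En, avg_ift)
    return curves
```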
Further, programs with small localities tend to yield better performance than programs with large localities. The average inter-page-fault-time increases as the assigned memory space increases and reaches its maximum, t/r0, as the memory space reaches a certain size. At this memory size the normalized efficiency reaches 100 percent. Further increase in memory space does not increase the average inter-page-fault-time and efficiency. The fact that all four curves in Figures 2 and 3 have their steepest slope occurring at different memory sizes reflects the different memory needs of each program.

Thus, for a given process that uses the LRU replacement algorithm, one of the most difficult tasks is to determine the size of the memory which is to be allocated to each process. Assigning too large a number of page frames to a process results in inefficient utilization of memory space, while assigning too small a number of page frames yields too many page faults, resulting in inefficient operation. A procedure which gives the same amount of main memory to every process will almost surely result in either inefficiencies or waste of storage. In addition, the estimate of the memory needs of a process should be fairly accurate, because only a few pages less than actually necessary means, in many cases, a large decrease in the average inter-page-fault-time. The determination of the amount of required memory is further complicated by the fact that it is usually data dependent and may vary during execution.

[Figure 3-Average inter-page-fault-time of the LRU algorithm]

In contrast with the LRU Algorithm, the Working Set Replacement Algorithm requires a variable sized storage space for each process. This variable storage space provides the capability to adapt to dynamic changes in program behavior. The working set W(t, T) at a given time t is the set of distinct pages referenced in the process (or virtual) time interval (t-T+1, t), that is, the set of pages accessed during the last T references, where T is called the working set parameter. The working set size w(t, T) is the number of pages in W(t, T). The basic replacement policy is to keep in main memory those pages which have been referenced during the last T msec. T is an important parameter which affects the performance of the Working Set Algorithm.

In general, the Working Set Algorithm can be considered as an LRU Algorithm with variable size memory allocation. There is, however, a crucial difference. Using an LRU Algorithm, pages are always replaced when a page fault occurs. This does not apply to the Working Set Algorithm. Here, page frames are freed whenever they have not been accessed during the last T msec. A strict implementation would require the setting of a flag at this point, that is, an indication that the corresponding page frame can be used for a different page of any process. Another problem is to detect the exact time when a page has not been referenced during the last T msec. Hence, it appears to be rather expensive to implement the Working Set Algorithm. In the simulation of the Working Set Algorithm we assume that exactly the working set is kept in main memory.

[Figure 5-Average inter-page-fault-time of the working set algorithm]
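A direct simulation of the working set definition given above is straightforward. The sketch below is illustrative only; it collects the working set size w(t, T) and counts the references that would fault under the policy of keeping exactly the working set in memory.

```python
def working_set_stats(refs, T):
    """Average working set size and missing-page count for window size T.

    W(t, T) is the set of distinct pages among references r_{t-T+1} .. r_t.
    A reference 'misses' if the page is not in W(t-1, T), i.e. it would fault
    when exactly the working set is kept in main memory.
    """
    sizes, misses = [], 0
    for t in range(1, len(refs) + 1):
        window_prev = set(refs[max(0, t - 1 - T):t - 1])   # W(t-1, T)
        if refs[t - 1] not in window_prev:
            misses += 1
        sizes.append(len(set(refs[max(0, t - T):t])))      # w(t, T)
    return sum(sizes) / len(sizes), misses
```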
The space-time product C of a program execution is the integral of its main memory occupancy over the total real processing time,

C = ∫ S(z) dz    (2)

where S(z) is the amount of storage occupied by the process at time z. The real time occupancy of information in main memory can be much longer than the actual processing time. This occurs because of multiple processes running in parallel and because of page-wait times. Since our study is concerned with the application of replacement algorithms to individual programs, only page-wait times have been considered. If we consider the execution of a program as a discrete process, the integral in (2) can be replaced by a sum which consists of two parts. The first part is the space-time product due to the actual processing, while the second part is due to the total page-wait time. Thus, the space-time product C can be re-written as

C = Σ(i=1 to t) Si·Tm + Σ(i=1 to r) Sti+1·R·Tm    (3)

where t is the total number of references; r is the total number of page faults; Si is the number of allocated page frames prior to the ith reference (i is called the number of the reference); ti is the number of the reference which causes the ith page fault (since we do not preload any pages, t1 = 1); and Sti+1 is therefore the number of page frames which are allocated during the ith page-wait time.

[Figure 6-Space-time product of the LRU algorithm (solid: R = 10,000; dashed: R = 0)]

Since Σ(i=1 to r) Sti+1·Tm and Σ(i=1 to t) Si·Tm are independent of R, C is a linear function of R. If we know C for R = 0 and C for another R (0 < R < ∞), then we can compute Σ(i=1 to r) Sti+1·Tm from (3), and thus C for any R, 0 < R < ∞. For example, the space-time products for R = 0 (the dashed lines) and R = 10,000 for META7 with the LRU Algorithm are shown in Figure 6. The space-time product for R = 5,000 can be obtained easily from a linear interpolation of the curves for R = 0 and R = 10,000. Hence the space-time products presented in this paper can also be extrapolated to other values of R (0 < R < ∞).
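Equation (3) is easy to evaluate once a simulation records the allocation history. The sketch below is a hedged illustration, with our own indexing conventions rather than the paper's, of the two sums; it also makes explicit the linearity in R that justifies the interpolation just described.

```python
def space_time_product(S, fault_refs, Tm=1e-3, R=10_000):
    """Space-time product C of equation (3).

    S          -- S[i-1] is S_i, the number of allocated page frames prior to
                  the i-th reference (len(S) == t, the total number of references)
    fault_refs -- the reference numbers t_1 .. t_r that caused page faults
    Tm         -- main memory access time (one page reference, in msec)
    R          -- speed ratio, so each page wait lasts R * Tm
    """
    processing = Tm * sum(S)                                  # sum of S_i * Tm
    # S_{t_i + 1}: frames held during the i-th page wait (clamped at the end)
    page_wait = R * Tm * sum(S[min(ti, len(S) - 1)] for ti in fault_refs)
    return processing + page_wait

# Because the page-wait term is linear in R, C(R) for any R can be recovered
# from C(0) and C(10,000) by linear interpolation, as noted in the text.
```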
... the page fault frequency, and the lower limit of T is T = 1/P. The PFF Algorithm may also be viewed as an LRU Replacement Algorithm with variable size memory allocation where the size is determined by T and the inter-page-fault-times.

Figures 8 and 9 show the normalized efficiency and the average inter-page-fault-time for the PFF Replacement Algorithm for which P ranges from 1/10 to 1/200 page faults/msec. Figure 8 reveals two interesting properties of the PFF Algorithm: (1) For P < 1/100, the normalized efficiency is larger than 90 percent for the four measured programs. This implies that high efficiency can be achieved by the PFF Replacement Algorithm by using the same page fault frequency parameter for all four programs regardless of their size and characteristics. (2) For FORTCOMP and FORTRAN the normalized efficiency is virtually independent of P. The same is true for DCDL if P is less than 1/40 and for META7 if P is less than 1/120 page faults/msec.

[Figure 9-Average inter-page-fault-time of the PFF algorithm (vs. PFF-parameter P, page faults/msec; 1 msec = 1000 page references)]

Figure 10 displays the space-time product for the PFF Replacement Algorithm. Again we can observe that the performance of the PFF Algorithm is almost independent of the choice of P for P < 1/50. For the four measured programs, the space-time products are almost constant over a wide range of values of P. The performance of the PFF Replacement Algorithm is therefore relatively insensitive to P. This is a very appealing feature of the PFF Algorithm since it alleviates the task of selecting a parameter P for implementation.

[Figure 10-Space-time product of the PFF algorithm (curves for R = 10,000 and R = 0, vs. PFF-parameter P)]

In Figure 11, the number of allocated page frames is displayed as a function of virtual processing time. As can be seen, the memory requirements for the four programs are quite different, and the number of allocated pages varies during execution, particularly for META7 and FORTRAN. This clearly demonstrates the adaptive capability of the PFF Algorithm. The area below the four curves corresponds to the space-time product due to actual processing time. For simplicity in representation, only major changes in the number of allocated page frames are indicated in Figure 11. Nevertheless, the figure shows clearly that the majority of page faults which resulted from changes in program locality occurred during relatively short time intervals.

[Figure 11-Dynamic changes in memory allocation of the PFF algorithm (P = 1/50 page faults/msec; virtual processing time in seconds, 1 sec = 1,000,000 page references)]
COMPARISON WITH THE LRU AND WORKING SET REPLACEMENT ALGORITHMS

In order to compare different replacement algorithms, it is not enough to compare their measured efficiency, since every replacement algorithm will yield a high efficiency if the number of replacements is very small. This can be achieved by providing a large amount of memory. A better criterion for evaluating the performance of a replacement algorithm is the amount of main memory required to achieve a high efficiency level. For this reason the space-time product is a more valid measure for comparison.

Let us use the performance of a specific PFF Algorithm with P = 1/100 page faults/msec to compare it with the LRU Replacement Algorithm for various numbers of allocated page frames. Figure 12 shows that the space-time products of the PFF Algorithm for all four programs are lower than the minimum space-time products achievable by the LRU Replacement Algorithm. This implies that the performance of the PFF Algorithm is better than that of the best LRU Algorithm.

[Figure 12-Performance comparison between the LRU and PFF algorithms (P = 1/100 page faults/msec, R = 10,000; vs. number of allocated page frames)]

The above results reveal the inefficiency involved in the fixed-size memory allocation. In such systems good performance can only be achieved if we allocate a number of page frames which is close to the optimal number that minimizes the space-time product. But even if we knew this optimal number, the performance of the PFF Algorithm would still be superior. This is especially true in those cases where the memory requirements vary drastically during execution (e.g., META7). If the memory requirements are relatively constant (e.g., FORTCOMP), then the performance of the LRU Algorithm is similar to that of the PFF Algorithm, provided that the optimal number of page frames is known a priori.

Figure 13 shows a comparison of the performance of the Working Set and PFF Algorithms. The space-time product is plotted for the working-set parameter T and the PFF-parameter P. For large values of T, the space-time product of the Working Set Algorithm is usually lower than the space-time product of the PFF Algorithm for corresponding values of P. Within the range of P and T of interest, the space-time product of the PFF Algorithm is less sensitive to P than the space-time product of the Working Set Algorithm is to T. A similar statement can be made with respect to the normalized efficiency and the average inter-page-fault-time (see Figures 4, 8 and 5, 9). Since the space-time product of the PFF Algorithm is relatively insensitive to the PFF-parameter P, we do not have to know an "optimal PFF-parameter" to come close to optimal performance. Further, the minimum space-time product of the Working Set Algorithm is comparable to that of the PFF Algorithm. The normalized efficiency and the average inter-page-fault-time of the PFF Algorithm are greater than the normalized efficiency and average inter-page-fault-time of the Working Set Algorithm for all corresponding values of P and T. This shows that the performance of the PFF Algorithm is comparable to the performance of the Working Set Algorithm.

[Figure 13-Performance comparison between the working set and PFF algorithms (working set parameter T in msec vs. PFF-parameter P in page faults/msec; R = 10,000)]

APPLICATION OF THE PFF ALGORITHM IN A MULTIPROGRAMMING ENVIRONMENT

Variable-sized memory allocation algorithms such as the Working Set Algorithm and the PFF Algorithm are useful in a multiprogramming environment where the main memory is shared by several programs. In this case the total amount of main memory not occupied by the resident supervisor can be considered as a pool of available page frames. These page frames are allocated to processes and returned to the pool according to the dynamically changing memory requirements of each individual process.

In the previous sections, we have used simulation techniques to study the performance of different replacement algorithms. It has always been assumed that there is enough main memory available so that a process can extend its memory space whenever needed. Any implementation of these variable-sized replacement algorithms, of course, must consider the possibility that the pool of available page frames is empty. The probability of this event is determined by the degree of multiprogramming and the type and size of the programs currently in the main memory. However, these variables can, to a large extent, be controlled by the process scheduling mechanism. Therefore, memory management is closely related to process scheduling in multiprogramming and time-sharing systems.

In order to reduce CPU idle time due to excessive page swapping, each process must be provided with enough main memory to keep the page fault frequency low. In a multiprogramming environment, it is crucial that this is accomplished without waste of memory space. The PFF Replacement Algorithm provides the supervisor with a means of achieving this effect which assures the same high efficiency level for programs of completely different types and sizes. In addition, the decrease policy of the PFF Algorithm continually tries to free those page frames which are no longer used by the process, which enables processes to be run efficiently without wasting memory space.

The PFF Algorithm can also be very helpful for process scheduling since it gives the supervisor information about the required number of page frames for each process during execution. Once a process is removed from the main memory this information can be used to schedule this process for the next time quantum.
In general, a process will be put on the processor queue only if there are enough available page frames in the pool. The information about the memory space of each process can also be used to decide which process has to be removed from the main memory if the page frame pool becomes empty. There are many ways for the supervisor to make use of the information about program behavior provided by the PFF Algorithm. Further investigations might yield other interesting applications.

IMPLEMENTATION OF THE PFF REPLACEMENT ALGORITHM

The implementation of the PFF Algorithm is very simple. We need only a clock in the CPU to measure the time between page faults of every process. This clock measures the process (or virtual) time of each process. The current process time is recorded in the process' stateword. The page table entry can be used to determine which pages are residing in the main memory. For those paging systems that have a USE-BIT feature, this feature can be used to determine those pages which have been referenced during the time interval since the last page fault occurred. Whenever a page fault occurs, the USE-BITs are reset and the supervisor determines whether the process is operating below the critical page fault frequency level P. For this purpose the time of the last page fault has to be stored. If the last page fault occurred more than T = 1/P msec ago, the USE-BITs are used to determine which pages have to be removed from the main memory.

Let us now consider the overhead of the above mentioned operations. We know that:

1. The overhead is proportional to the number of page faults. Since the PFF Algorithm assures a low page fault frequency, the overhead is very low.
2. Due to sudden changes of program localities, the virtual processing time between page faults is very short in many cases (see Figure 11). Whenever the time between page faults is less than T = 1/P, no page frames are freed and therefore there is no overhead involved in the "decrease decision" in these cases.

From the above implementation discussion we know that the PFF Algorithm is much easier to implement, and requires less overhead to operate, than both the LRU and Working Set Algorithms.
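The decision taken at each page fault can be summarized as in the sketch below. This is only an illustrative reconstruction of the mechanism just described; the field names and the exact point at which the use bits are reset are our own assumptions, not details given in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Process:
    vtime: float = 0.0                    # current process (virtual) time, msec
    last_fault: float = 0.0               # virtual time of the previous fault
    resident: dict = field(default_factory=dict)   # page -> use bit

def on_page_fault(proc, faulted_page, P):
    """PFF decision at a page fault (sketch of the scheme described above)."""
    T = 1.0 / P                           # critical inter-fault time, msec
    if proc.vtime - proc.last_fault >= T:
        # Fault frequency is below the critical level P: free every resident
        # page whose use bit shows no reference since the last page fault.
        for page, used in list(proc.resident.items()):
            if not used:
                del proc.resident[page]   # frame returns to the pool
    # In either case the faulted page is brought in, so the allocation can
    # only grow while faults arrive faster than once per T msec.
    proc.resident[faulted_page] = True
    for page in proc.resident:            # reset use bits for the next interval
        proc.resident[page] = (page == faulted_page)
    proc.last_fault = proc.vtime
```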
SUMMARY

A new type of replacement algorithm based on page fault frequency (PFF) is developed in this paper. This PFF Replacement Algorithm allocates memory according to the dynamically changing memory requirements of each process. It does not require prior knowledge of program behavior and can be applied to programs of different types and sizes.

The PFF Algorithm uses the measured page fault frequency as the basic parameter for the memory allocation decision process. A high page fault frequency is considered to be an indication that a process needs more memory space to run efficiently. Thus, whenever a page fault occurs, the amount of allocated memory is increased if the page fault frequency lies above a given critical level P. P is called the PFF-parameter. The number of allocated page frames may be decreased if the page fault frequency falls below this level P. In this case only those page frames are freed which have not been accessed between successive page faults. The PFF Replacement Algorithm adapts to dynamic changes in program behavior during execution. As a result, this algorithm is largely independent of individual program behavior and input data.

Measurement results from simulation of the PFF Algorithm for four different programs reveal that the performance (in terms of space-time product) of this algorithm is better than the performance of the best LRU Replacement Algorithm (for which the optimal memory space is known a priori), and is comparable to the Working Set Replacement Algorithm. Further, the performance is relatively insensitive to changes in the PFF-parameter P. The implementation of the PFF Replacement Algorithm is simple and less complicated than that of LRU and far less complicated than that of the Working Set Replacement Algorithm. It does not require any additional hardware. Using the PFF Algorithm in a multiprogramming environment, the supervisor has control over the efficiency and memory requirements of all processes. Based on this information, the supervisor can perform efficient process scheduling and memory allocation. From this study, we conclude that the PFF Replacement Algorithm should have high potential for use in future virtual memory and multiprogramming systems.

REFERENCES

1 R L MATTSON J GECSEI D R SLUTZ I L TRAIGER Evaluation techniques for storage hierarchies IBM Systems Journal 9 2 pp 78-117 1970
2 E G COFFMAN T A RYAN A study of storage partitioning using a mathematical model of locality Communications of the ACM 15 3 pp 185-190 March 1972
3 P J DENNING The working-set model for program behavior Communications of the ACM 11 5 pp 323-333 May 1968
4 W W CHU N OLIVER H OPDERBECK Measurement data on the working set replacement algorithm and their applications Proceedings of the MRI International Symposium XXII Polytechnic Institute of Brooklyn April 1972
5 G H FINE C W JACKSON P V McISAAC Dynamic program behavior under paging Proceedings of the 21st National Conference of the ACM pp 223-228 1966
6 L A BELADY A study of replacement algorithms for virtual storage computers IBM Systems Journal 5 2 pp 78-101 1966
7 E G COFFMAN L C VARIAN Further experimental data on the behavior of programs in a paging environment Communications of the ACM 11 7 pp 471-474 July 1968
8 M JOSEPH An analysis of paging and program behavior Computer Journal 13 1 pp 48-54 February 1970
9 L A BELADY C J KUEHNER Dynamic space sharing in computer systems Communications of the ACM 12 5 pp 282-288 May 1969

Experiments with program locality*

by JEFFREY R. SPIRN** and PETER J. DENNING***

Princeton University, Princeton, New Jersey

model.4,5,6 For other locality processes, this policy appears to be nearly optimal.7,11 The means of measuring the locality, and the accuracy of the measurement, depend on one's definition of "locality." The definitions that have appeared so far in the literature can be classified into two categories: the intrinsic locality models, and the extrinsic ones.

Intrinsic models for locality assume that memory references emit from a program according to some (abstract) structure internal to the program itself. The locality in effect at a given time is a function of the internal state of the program at that time. Since the state of the program may not be known, it is usually not possible uniquely to determine the locality by examining the memory reference sequence of the program. Some examples of this type of locality model are page reference distribution functions,7,8 the independent reference model,1 the locality model,2,3 and the LRU stack model.
4 ,5 Another example can be found in Reference 6, where, for p > 0, it is assumed that there exists a sequence of sets of pages W p (l), W p (2), ... , Wp(t), ... , such that Wp(t) is the smallest set of pages containing the reference at time t with probability at least p. Extrinsic models do not rely on any assumptions of internal program state. They define locality in terms of observable properties of the memory reference sequence of the program. Three examples of extrinsic locality are: (1) Given a sequence of time intervals, the "locality sequence" L 1L 2 • •• Li . . . is defined so that Li is the set of pages referenced in the ith interval; (2) Given an integer k~l, define a sequence of time intervals so that each locality Li in the locality sequence L1L2 ... Li ... contains exactly k pages-i.e., exactly k distinct pages are referenced in the ith interval; and (3) A "working set" W (t, T) is defined to be the set of distinct pages referenced among the last T references, and is a measure of the locality at time t. 9 ,1O,1l Intrinsic models are useful primarily for analysis and simulation. They are limited by the accuracy to which INTRODUCTION For many years, there has been interest in "program locality" as a phenomenon to be considered in storage allocation. This notion arises from the empirical observation that it is possible to run a program efficiently with only some fraction of its total instruction and data code in main storage at any given time. That virtual memory systems can be made to run at all demonstrates that program locality can be used to advantage; and though it is certainly possible to write a program which violates the principles of locality, it seems one must go out of one's way to do so. lf a program is favoring a subset of its information at some particular time, we should very much like to know the identity of that subset. The set of favored pagest of information at a given time will be called the locality at that time. Using this information, we may answer such questions as "What behavior can be expected of the program in the near future?" or "How much storage should be allocated to the program at this time?" For some classes of programs the best we can do is estimate this locality, whereas for others we may be able to measure it exactly. The utility of this measurement is demonstrated by the fact that, for several models of program behavior, the policy "keep the current locality in memory" can be proved to be an optimal memory management policy. These models include the independent reference model, 1 the locality model 12,3 and the least-recently-used (LRU) stack * Work reported herein was supported in part by NSF Grant GJ-30126 and NASA Grant NGR-31-001-170. ** Present address: Division of Engineering, Brown University, Providence, Rhode Island 02912. *** Present address: Department Computer Sciences, Purdue University, Lafayette, Indiana 47906 t We assume pages are all of the same size, containing at least one word each. Most of our results extend in a straightforward manner to systems in which the block size is variable, so that the assumption of paging is mostly a matter of convenience. 611 612 Fall Joint Computer Conference, 1972 they simulate real programs. Due to the practical difficulty of measuring or estimating the locality, they may have little use in storage allocation. 
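Extrinsic definition (1) above is simple to compute once the interval boundaries are chosen. The sketch below is our own illustration (the helper names are hypothetical): it builds the observed locality sequence L1 L2 ... and a simple overlap measure between successive localities, which is the quantity behind property 2 of the locality concept discussed next.

```python
def observed_localities(refs, boundaries):
    """Observed locality sequence L1 L2 ... for a given division into intervals.

    `boundaries` lists the reference index at which each interval ends, e.g.
    boundaries=[100, 200, 300] splits refs into refs[0:100], refs[100:200], ...
    Li is simply the set of distinct pages referenced in the i-th interval.
    """
    localities, start = [], 0
    for end in boundaries:
        localities.append(set(refs[start:end]))
        start = end
    return localities

def overlap_fractions(localities):
    """Fraction of pages shared by successive observed localities.

    This particular measure (shared pages over the union) is our own choice
    of illustration for the 'many pages in common' property.
    """
    return [len(a & b) / max(1, len(a | b))
            for a, b in zip(localities, localities[1:])]
```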
Extrinsic models are evidently more practical, since they define a measurement procedure; yet they are obviously limited by the extent to which the measurement taken reflects what the program is really doing. Such models are less suited for use in modeling, but they can be used conveniently to allocate memory. Although there are many models for defining the concept of locality, little experimental verification of their accuracy has been undertaken. The working set model is perhaps the only exception.12.13.14 Unless a given model can be shown to approximate closely the behavior of real programs, any analytic results obtained using the model are only of theoretical or academic interest. Accordingly, we have chosen in this paper to emphasize experiments which· test the ability of extrinsic measurements to estimate current intrinsic localities and predict future (intrinsic) localities, and the ability of intrinsic models to simulate real world behavior. Let us summarize the terminology that we shall be using for the various meanings of locality. If, as discussed above, a program's memory reference string is divided into (not necessarily equal) time intervals, the (extrinsic) observed locality Li is defined to be the set of pages referenced in the ith interval. Since it may be difficult to determine the internal state of a program according to an intrinsic model, we usually in practice use the observed locality in the immediate past (such as the working set W (t, T)) as an estimate for the current intrinsic locality; this use is termed an estimated locality. If we assume something about the program's internal structure, we may be able to predict, on the basis of the current (estimated) locality, the most likely references in a future interval; this is termed the (intrinsic) predicted locality. For some intrinsic models, the estimated locality can quite accurately (or even perfectly) determine the current intrinsic locality. Such models are clearly of special interest, and we shall discuss two of them. A third, the independent reference· model, is in general not as well measured by the working set, but is presented for comparison. Throughout this paper, it will be assumed that demand paging is being used and that a paging algorithm is optimal if it minimizes the expected probability of a page-fault in a given size memory. is the sequence of members of N generated by the program for given input data, where reference rt is the number of the page containing the address referenced at time t (time being measured in terms of the number of memory references made by the program). Suppose a reference string has been divided up into intervals, and Li is the observed locality in the ith interval. With respect to the given sequence of intervals, the reference string is considered to satisfy the properties of locality if:10 1. For almost all i, Li is a proper subset of N; 2. For almost all i, Li and Li+l tend to have many pages in common; and 3. The observed localities Li and L i+i tend to ~ become uncorrelated as j becomes large. A program reference string is considered to have a high degree of locality if Li is a small subset of N (statement 1), Li and L i+l differ by at most one page (statement 2), and the value of j for which Li and L i+i become uncorrelated is small compared to the length of the reference string. A very general model for locality, displaying properties 1~3 intrinsically has been defined in Reference 3. It defines a sequence (Ll, tl ) (L2, ~) ... (L i, t i ) . . . 
(1)

in which Li is the ith intrinsic locality and ti the holding time in Li; the Li are members of a specified set £ of localities associated with the program, and are subsets of N. During its stay in Li, the program generates some sequence of references ri1 ri2 ... riti over the pages of Li only. The mechanism for generating the references from Li is unspecified and may be arbitrary. The current locality Lt at time t is that Li for which t1 + ... + ti-1 < t ≤ t1 + ... + ti. A probability structure can be imposed by specifying a transition matrix [P(L, L')] among localities L and L' of £, and a set of holding time distributions hL(t) for each L of £. In the following sections we shall discuss some special cases of this general model. These cases are of practical interest to the extent that our experiments indicate agreement between localities predicted by these cases and the localities actually observed by using the working set model.

MODELS FOR INTRINSIC LOCALITY

Consider an n-page program whose pages constitute the set N = {1, 2, ..., n}. A reference string r1 r2 ... rt ...

The very simple locality model (VSLM)

This model assumes a fixed size locality, i.e., the localities Li in (1) are all of the same size l, where 1 ≤ l < n. At any given time t, the probability of referencing an interior page (a member of Lt) is 1-λ, and the probability of referencing an exterior page (one not in Lt) and making a transition is λ. All l interior pages are referenced independently and with equal probability (1/l). All n-l exterior pages are referenced independently and with equal probability (1/(n-l)). When an interior page is referenced at time t+1, no change in locality occurs, i.e., Lt+1 = Lt. When an exterior page is referenced, a change in locality occurs, but to a demand-paging neighbor only, i.e., Lt+1 = Lt + rt+1 - y, where y is chosen at random from Lt. The unconditional probability of referencing an interior page is at least as large as that of referencing an exterior page, i.e.,

(1 - λ)/l ≥ λ/(n - l)    (2)

which is equivalent to the condition λ ≤ (n - l)/n. This model has two important parameters, the locality size l and the transition probability λ, and will sometimes be called the two-parameter model. Note that the mean holding time in a locality is 1/λ. For this model, the storage allocation rule "keep the current locality in memory" has been proved optimal.3

It can easily be shown that for programs which fit this model, it is impossible to determine absolutely the current intrinsic locality from observations on the generated reference string. We shall consider next the accuracy with which we can estimate the locality by an extrinsic model, namely, the working set. As mentioned, the working set W(t, T) is the set of distinct pages referenced among the references rt-T+1 ... rt. If we desire to use the working set to estimate the locality, we must specify T, the window size. The choice of T must satisfy two criteria: (1) it must be large enough so that all pages within the locality are referenced with high probability, and (2) it must be small enough so that the likelihood of more than one locality transition within the window is low (for several transitions would introduce error). Although it is not obvious that a suitable T can be found, it is the case that for reasonable parameters of the VSLM, not only does a T exist, but its value is not especially critical.
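Because the VSLM is fully specified, reference strings can be generated from it directly for simulation experiments of the kind reported below. The sketch here is our own illustration of the two-parameter process; the initial locality is chosen arbitrarily.

```python
import random

def vslm_reference_string(n, l, lam, length, seed=0):
    """Generate a reference string from the very simple locality model (VSLM).

    n    -- number of program pages (pages are 1 .. n)
    l    -- fixed locality size
    lam  -- probability of referencing outside the current locality
    With probability 1-lam an interior page is chosen uniformly from the
    locality; otherwise an exterior page is chosen uniformly and it replaces
    a randomly chosen member of the locality (a demand-paging neighbor move).
    """
    rng = random.Random(seed)
    pages = list(range(1, n + 1))
    locality = set(rng.sample(pages, l))          # arbitrary initial locality
    refs = []
    for _ in range(length):
        if rng.random() < 1 - lam:                # interior reference
            r = rng.choice(sorted(locality))
        else:                                     # exterior reference -> transition
            r = rng.choice([p for p in pages if p not in locality])
            locality.remove(rng.choice(sorted(locality)))
            locality.add(r)
        refs.append(r)
    return refs
```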
For the VSLM, condition 2 will hold whenever T ≤ 1/λ, and our experiments verify that such values of T typically exist. We shall consider the working set to be a good estimate of a VSLM locality when two criteria are satisfied: (a) the average working set size is approximately equal to l, the locality size, and (b) the average missing-page probability when the working set is kept in memory is approximately λ, the probability of referencing outside the locality. Plots of working set sizes and missing-page probabilities for various values of n, l, and λ show that,3 for small λ (.01 or less), a value of T on the order of 5 or 10 times the locality size will do an excellent job of achieving criteria (a) and (b) above, irrespective of n and l. Furthermore, for small values of λ, the values of the working set size and missing-page probability level off and are nearly constant in a large neighborhood of T, indicating that the choice of T is not too critical for these values of λ. For large values of λ (in excess of 0.05), the working set apparently does not provide as good an estimate of the locality. In this case, the working set size and missing-page probabilities do not tend to level off at the values of l and λ, respectively. Furthermore, the value of window size needed to get the missing-page probability equal to λ gives a working set size as much as 20 percent too large.

The simple LRU stack model (SLRUM)

This model is based on the memory contention stack generated by the LRU (least-recently-used) page replacement algorithm.5 This stack is simply a priority ordering on all pages of a program according to the time of their most recent usage. Thus, the first position (top) of the stack is the current reference, the second position is the next most recently used page, and so on. When the page in stack position i is referenced, it is moved to the top, and all the pages which were in positions 1 ... i-1 are pushed down one position. Specifically, if s(t) = (x1, ..., xn) is the stack at time t and the page at position i is referenced, the stack at time t+1 is s(t+1) = (xi, x1, ..., xi-1, xi+1, ..., xn).

To create the simple LRU Stack Model, we assign to each position of the stack a fixed, independent probability. We will denote these probabilities a1, ..., an, where n is the number of pages in the program (and thus the number of stack positions) and a1 + ... + an = 1. The ai are termed stack distance probabilities, with i being the distance (from the top of the stack). At any given time stack position i will be chosen with probability ai; if it is chosen, the page in that position becomes the current reference and is brought to the top of the stack, as above.

If we make suitable restrictions on the ai, we can cause this model to exhibit locality. In Reference 3, the requirement is made that the ai be monotonically non-increasing as one goes down the stack (a1 ≥ a2 ≥ ... ≥ an).* If, under this restriction, the stack is divided at any point, the pages in the stack positions above the division are all more probable than those below the division. Specifically, if the stack at time t is s(t) = (x1, ..., xn), we can define a locality of size l (for any l, 1 ≤ l < n) ...

* This requirement can be weakened slightly to min {a1, ..., am} ≥ max {am+1, ..., an} for LRU paging in a memory of size m [3].

... where B is a parameter. This will be termed the exponential model (EXP).
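The simple LRU stack model described above can likewise be simulated directly from the stack distance probabilities a1, ..., an. The following sketch is illustrative only: it draws a stack position at each step and performs the move-to-top update.

```python
import random

def slrum_reference_string(a, n, length, seed=0):
    """Generate a reference string from the simple LRU stack model (SLRUM).

    a -- stack distance probabilities a[0] .. a[n-1] (i.e. a1 .. an, summing to 1)
    n -- number of pages (and stack positions)
    At each step a stack position i is drawn with probability a[i-1]; the page
    in that position becomes the reference and is moved to the top of the stack.
    """
    rng = random.Random(seed)
    stack = list(range(1, n + 1))                 # initial priority ordering
    refs = []
    for _ in range(length):
        i = rng.choices(range(n), weights=a)[0]   # stack position (0-based)
        page = stack.pop(i)
        stack.insert(0, page)                     # referenced page goes to the top
        refs.append(page)
    return refs
```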
Using the measured values for the independent reference and LRU stack probabilities, the working set curves for these two models were computed. (Algorithms for computing the working set curves of the various models are given in Reference 3.) For the locality model, the parameters l and A were chosen to give the lowest mean-squared relative error for the set of window sizes 10, 20, 30, ... , 1000, against the observed weT) curve. The same procedure was repeated for the exponential model to determine a value of the parameter B. DESCRIPTION OF RESULTS Programs on two machines were tested for fits with the various models. The PAL assembler on the Digital CHART I-Description of Programs Measured Machine Page Size (words) 5 6 7 PDP-8 PDP-8 360 360 360 360 360 360 128 128 256 256 256 256 256 256 8 360 256 9 360 256 Ref. Str. No. 0 1 2 3 4 Description Assembler, Pass 1 Assembler, Pass 1 FORTRAN (G) COMPILER FORTRAN (G) COMPILER Small FORTRAN job. One main loop. Small FORTRAN job. One main loop. Small FORTRAN job. One main loop. Medium FORTRAN job. Several Subroutines. Medium FORTRAN job. Several Subroutines. Medium FORTRAN job. Several Subroutines. Refs. Skipped Refs. Measured 0 lOOK lK 200K lK lOOK lK lK 20K 20K 20K 20K 20K 20K 300K 20K lOOK 20K lK 300K 616 Fall Joint Computer Conference, 1972 CHART 2-Values of Parameters ReLStr. No. Total pages refd. 0 1 2 3 4 5 11 12 35 38 20 20 20 22 20 31 6 7 8 9 EXP VSLM l 4 4 5 7 3 4 4 4 3 4 h B .0025 .0027 .014 .022 .030 .020 .021 .024 .029 .014 .0013 .0015 .00080 .0012 .0025 .0020 .0021 .0020 .0024 .00085 IRII ~ f f o:i I I I I ci I f [i! I ci : ~ cj I t I Equipment PDP-8 was run using a. page size of 128 words, the standard page size for the machine. Several IBM 360 programs were run, including two FORTRAN jobs and the FORTRAN (G level) compiler itself. The 360 page size was chosen arbitrarily to be 256 words. Chart 1 gives a description of each program. The "reference string number" refers to a reference string segment from each program. In particular, we expressed the reference string in the form rlr2 ••• rkrk+l ••• rk+z ••. , where k is the number of "references skipped" and x the number of "references measured." In other words, rk+l ••• rk+z is the reference string segment over which we attempted to fit the models. Curve fit results Chart 2 gives the values of various measured or bestfit parameters. It is important to note that the best-fit o - ; _ - - + - - _ - + - - _ - t - -_ _-+--___ + _______ + _____- j - - _ _ _ +___ _ :U::.:::' '\'",:,:;;: !o;:.~,::: -;':J:--.:';:' 1:':.'.:';:- Figure 2-Working set size (Ref. St. 4) VSLM locality size l was typically under 20 percent of the program size n, that the locality transition probability X was typically in the range 0.01 to 0.03, and that the locality transition time l/X was typically in the range 30 to 100 references. Reference strings 0 and 1 were exceptions, having much lower transition probability X than the others, this being due undoubtedly to the severely limited amount of memory on the PDP-8 (4K words). Chart 3 gives the results of the working set curve --, I IRM IRII .. ~_ _~_. 18 d ~ f ""-fi! I ci I f I :ilo f I L-J-I-o_=--oo-2:;!:-.tI.;;;;-CO-;;r-30.;;:;-CO-;J;-QOncoc-ssto_oooc-~60_~OC-7t71J_Ococ-;-:!lI!_Ococ-::;:oo·~.oo wiNDell SIZE (XIOJ ) Figure 1-Working set size (Ref. St. 2) :Ji:::.:;.-'· :rH":~::: Figure 3-Working set size (Ref. St. 6) Experiments with Program Locality 617 CHART 3-Fits to mean working set size curve . Ref. Str. No. 0 1 2 3 4 5 6 7 8 9 VSLM avg. 
error worst error 7.5% 6.2 6.0 11 5.3 10 9.9 8.0 6.5 10 56% 49 49 58 30 53 54 51 29 56 EXP avg. error worst error 37% 38 32 32 20 26 25 24 22 31 99% 99 97 96 95 96 96 96 95 97 fittings for the various models, and Chart 4 gives the corresponding results for the missing page probability curve. Two error measures are listed for each fit: "average relative error" over the curve, and the "worst case relative error." Except for the IRM, the worst errors occurred for very small values of T (less than 10); for the IRM, the worst errors occurred for the largest values of T (above 500). All errors are shown as per cent of the observed value. The "average relative error" is only an approximate value: it is found by taking the square root of the previously mentioned mean squared relative error (it can be shown that this represents an upper bound to the true average of the absolute values of the errors). The worst case error is the largest relative error considered over all integer window sizes in the range 1 to 1000. It seems apparent from the data that the SLRUM performs the best over all in approximating the two curves, with the VSLM a close second. The fits of these two models are usually very good on the working set curve. The errors in fitting the missing page probability curve are larger, even unacceptably large in some cases. IRM avg. error worst error 84% 97 161 109 95 77 86 98 85 162 97% 106 208 146 246 200 210 225 207 291 SLRUM avg. error worst error 28 % 19 20 6.5 2.6 2.3 2.6 2.8 2.9 8.3 33 % 24 29 8.2 7.7 7.9 8.1 7.9 8.9 9.4 However, it can at least be said that even for this curve, these two models perform much better than either of the others, again with the SLRUM slightly superior. We can conclude from this that the models are better for predicting a program's memory demands than for predicting its page-fault probability; further refinements to the models are required to achieve the latter goal; Because of its static treatment of locality, the independent reference model is the worst model of the four. It consistently overestimates the working set size, usually by a factor of 2 or 3. Figures 1-3 show typical working set curves, and Figures 4-6 show typical missing page probability curves. All six figures show the observed (measured) curve OBS, and the results of attempting to fit each model; (EXP was omitted to aid in readability). Figures 7-9 show typical stack distance probabilities; all such curves show that the monotonically nonincreasing assumption of the ai tends to be valid for the majority of values of i. (Note the logarithmic vertical axis on these figures) . CHART 4-Fits to Missing Page Probability Curve Ref. Str. No. 0 1 2 3 4 5 6 7 8 9 VSLM avg. error worst error 30% 36 29 82 32 85 72 37 30 57 228% 190 132 131 94157 153 158 92 190 EXP avg. error worst error 266% 301 93 157 40 119 100 62 48 117 407% 413 127 207 90 224 192 110 91 195 IRM avg. error worst error 77% 167 133 70 103 74 81 93 89 102 376% 419 426 210 479 417 435 450 420 609 SLRUM avg. error worst error 84 % 85 27 5.1 16 19 18 9.5 10 14 197% 181 118 26 48 43 38 37 37 40 Fall Joint Computer Conference, 1972 618 tel' ., t;;- . > \ \ \ \ \ \ , , \ \ \ Hr' + - - - - + - - - - t - - - - - t - - - - + - - - - f - - - - - - l - - - - - , , + - - - 4 - - _ + _ _ _ : _ _ o.cn 10.00 1D.OO li.U.O~ 5D.;::'::: st'.en "INO~'" Sllt:!XIO' I Figure 4-Missing page probability (Ref. St. 2) Several other statistics of interest appear in Chart 5. 
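The two error measures quoted in Charts 3 and 4 can be reproduced as follows. This is a sketch under the definitions stated above: the "average" error is the square root of the mean squared relative error, and the worst case is the largest relative error over window sizes 1 to 1000.

```python
def relative_errors(model_curve, observed_curve, windows=range(1, 1001)):
    """Average and worst-case relative error of a model curve against the
    observed curve, both indexed by window size T (e.g. lists or dicts)."""
    errs = [abs(model_curve[T] - observed_curve[T]) / observed_curve[T]
            for T in windows]
    avg = (sum(e * e for e in errs) / len(errs)) ** 0.5   # root mean squared
    worst = max(errs)                                     # worst case
    return avg, worst
```

For parameter fitting, the same mean squared relative error can be minimized over a grid of candidate parameters (the text uses window sizes 10, 20, ..., 1000 for this purpose); the function above is sufficient for that search as well.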
qr is the sum of the n-llowest measured independent reference probabilities; it gives an indication of the performance which could be expected if the program were in fact an IRM program allocated l pages of memory. qw is the missing page probability for the working set with window size T w, where T w is chosen to make the average working set size equal to l. Thus, qr and qw apply to the same average memory size. Notice , ,- iO-l.In~.o=-c----:<::-:::---::2!:::-oo.=-6~----:3~OO-=.oo-±~OO--:-.DJ--+SO-o.c-~i:J-501-0_-8:::-7-+-0D-.C:~--+----+-----"!OQ;::.x "IND~W SllE: Figure 6-Missing page probability (Ref. St. 6) that qr is typically an order of magnitude greater than qw, showing much more dramatically how pronounced are the dynamic effects of locality: The assumption of static locality would have led us to predict missing page probabilities in the order of qr whereas in fact they were in the order of qw. This re-emphasizes the poor performance introduced by a model assuming a static locality. It is also notable that in every case, Tw < I/'A, where \ \ , \ \ 1(:1:1(1+.a-c--+---+----+---.+-uo-.ec--+sO;Jc-:c .. O,-O--::+~--::'c:-=----::!::-::::-±900:-::_00::------:-: ,HNOOioi SllE. Figure 5-Missing page probability (Ref. St. 4) Figure 7-Distance distribution (Ref. St. 2) Experiments with Program Locality 619 II}" is the expected interval between locality transitions in the VSLM. Thus, it is unlikely that more than one such transition will occur in this size window, so that the working set will be a good estimator of the VSLM locality for all tested programs. EXTENSIONS TO THE SLRUM Attempts have been made to improve the SLRUM by increasing the complexity of the process by which stack distances are generated. Shedler and Tung,S for example, analyze a stack with a Markov process substituted for the ai. To our knowledge, no attempts have been made to validate any extensions to the SLRUM, other than that which we shall describe below. CHART 5-Additional Statistics Ref. Str. No. q! qw Tw .14 .09 .33 .36 .69 .48 .49 .54 .63 .57 . 0088 .024 .035 .071 .071 .042 .041 .046 .070 .030 75 45 34 40 15 36 36 31 15 43 1rS hs Figure 8-Distance distribution (Ref. St. 4) 0 1 2 3 4 5 6 7 8 9 O. O. .038 .11 .015 .015 .015 .0084 .0040 .0056 10.6 12.7 5.5 5.7 5.7 4.2 4.0 4.0 A very simple attempt was made to improve the performance of the LRU stack model. It was imagined that stack distances would be selected, as before, according to the ai, and the ai would be biased toward short stack distances. Occasionally, however, a new set of probabilities, the bi , would take effect for a short time; these would be biased toward long stack distances. The distribution {ai} corresponds to the intuitive concept of "drifting slowly among neighboring localities," whereas the distribution {bi } to the notion "jumping suddenly to very different localities," or "scrambling up the entire stack." The choice between {ai} and {b i } would be determined by a 2-state Markov chain. As has been suggested earlier, there is a distance string d 1d2 ••• d t ••• associated with the program's reference string rlr2 . .. r t • •• being measured. Given the distance string, our problem was to determine which distances should be considered data points for the {ad distribution and which for the {b i } distribution. 
Somewhat arbitrarily, we decided to count the distances toward the {b i } distribution whenever the majority of the last four successive distances exceeded four (four was chosen since it represented a typical VSLM locality size); distances would continue to be counted toward the {b i } -distribution until four successive distances were all at most four, in which case distances would be counted toward the {ail-distribution. The measured value for the steady state probability 7('8 of the {b i } (or "stack scrambling") state is shown in Chart 5; 7('8 is an indication of the fraction of time the program spent making large jumps between localities. Except '0- 5 - 1 - - - 1 - - - - - - < 1 - - - - - - - + - - - + - - - - + - - - - + - - - - + - - - + - - - + - G.02 :':IOC 1O.0Q S rqCK DIS r Figure 9-Distance distribution (Ref. St. 6) 20.00 620 Fall Joint Computer Conference, 1972 for reference string 3, all the programs seemed to spend under 4 percent of their time jumping localities-Le., they seemed to spend in excess of 96 percent of their time obeying the properties of locality. Chart 5 also shows the mean holding time hs in the {b i } -state. In all cases, hs was at least as large as the VSLM locality size l, suggesting that, when scrambling is over, the resulting locality is likely to be disjoint from the original locality. As might be anticipated, however, the working set size and missing page probability curves generated by this extended model were in all cases indistinguishable from those produced by the SLRUM. This is because the transitions between the {ai} and the {b i } -states occur independently of the process which generates stack distances. Apparently, it is necessary to make the stack-scrambling process correlated directly with the stack distance generating process, perhaps by generating distances directly from a Markov chain. Shedler and Tung's approach represents one possible solution, 5 though as yet unvalidated. CONCLUSIONS We have attempted here to validate experimentally several intrinsic models for the concept of program locality. We have done this with particular regard to the use of the working set as an estimator of the (intrinsic) locality. We have tried to take examples of both system software (a compiler and an assembler) and user programs, and have attempted to fit each of the models to the observed behavior of each given program. Fitting was attempted to the measured working set size and missing page probability curves. In this way, reasonable approximations to the paging behavior of the actual programs could be obtained, without having to consider other details of the programs of less importance in paging. Two models appear to produce good approximations to real world behavior: the two-parameter simple locality model and (especially) the LRU stack model. The independent reference model, because of its static concept of locality, does very poorly. The working set is a good estimator of the simple two-parameter model's locality, provided the locality does not change too rapidly; we observed no case in which the locality was changing too rapidly for the working-set to be a good estimator. The working set exactly measures the locality in the case of the LRU stack model and is thus nearly optimal for programs whose behavior can be closely approximated by this model. The principal conclusions to be drawn from this work are: 1. 
There exist non-trivial cases in which the working-set memory management policy is optimal, and evidence suggesting it will perform quite well when reference strings are generated by locality processes other than the ones studied here. 2. The concept of a "locality size" is not sharply defined, as in the case of the simple two-parameter model; it is instead a graduated concept, as in the LRU-stack model. 3. The locality at any given time receives the vast majority of references, is small compared to the program size, and is constantly changing in membership. 4. There is a tendency for transitions to occur between neighboring locali ties for the vast majority of the time, transitions among disjoint localities being relatively infrequent. Stack models appear to hold great promise of being good models for program behavior, especially as we gain a better understanding of the processes by which stack distances are generated. ACKNOWLEDGMENTS We are grateful to J. J. Horning and K. Sevcik of the University of Toronto for many useful ideas and insights relating to intrinsic and extrinsic concepts of ocality. REFERENCES 1 A V AHO P J DENNING J D ULLMAN Principles of optimal page replacement JACM 18 1 January 1971 pp 80-93 2 P J DENNING J E SAVAGE J R SPIRN Some thoughts about locality in program behavior Proc Brooklyn Polytechnic Institute Symposium April 1972 3~--- Models for locality in program behavior Princeton University Department of Electrical Engineering Computer Science Technical Report TR-I07 April 1972 4 P H ODEN G S SHEDLER A model of memory contention in a paging machine IBM Research Report RC-3053 September 1970 5 G S SHEDLER C TUNG Locality in page reference strings IBM Research Report RJ-932 October 1971 6 E G COFFMAN JR T A RYAN JR A study of storage partitioning using a mathematical model of locality Comm ACM 15 3 March 1972 pp 185-190 Experiments with Program Locality 7 J E SHEMER G SHIPPEY Statistical analysis of paged and segmented computer systems IEEE Trans Comp EC-15 6 December 1966 pp 855-863 8 J E SHEMER S C CUPTA On the design of Bayesian storage allocation algorithms for paging and segmentation IEEE Trans Comp C-18 7 July 1969 pp 644-651 9 P J DENNING The working set model for program behavior Comm ACM 11 5 May 1968 pp 323-333 10 P J DENNING S C SCHWARTZ Properties of the working set model Comm ACM 153 March 1972 pp 191-198 11 P J DENNING On modeling program behavior 621 Proc AFIPS Conf Vol 40 Spring Joint Computer Conference 1972 12 J RODRIGUEZ-ROSELL Experimental data on how program behavior affects the choice of scheduler parameters Proc 3rd ACM Symposium on Operating Systems Principles October 1971 13 W DOHERTY Scheduling TSSj360 for responsiveness Proc AFIPS Conf Vol 37 Fall Joint Computer Conference 1970 pp 97-112 14 W W CHU N OLIVER H OPDERBECK Measurement data on the working set replacement algorithm and their applications Proceeding of the Polytechnic Inst of Brooklyn Symposium on Computer-Communications and Teletraffic April 1972 TASSY-One approach to individualized test construction by THOMAS L. BLASKOVICS and JAMES A. KUTSCH, JR. West V irginia University Morgantown, West Virginia too monolithic. The decision to change a course or set of courses to a C.A.I. approach requires a "go-forbroke" commitment. We found that users did not like the non-incremental requirement of C.A.I. This was not too surprising to us when we considered that most of the end users who we were concerned about had little or no experience with computers. 
Because of the many know factors in C.A.I. systems and the type of commitment required, many potential users we surveyed were reluctant to commit themselves and their resources to a C.A.I. effort. As we were considering the major C.A.I. systems, we were also engaged in trying to determine what our students felt to be their own needs with regard to a college education. Along with study of student needs, we tried to discover the needs of the faculty with respect to their problems in teaching. The results of our study suggested that two problems existed. Students indicated that a major source of frustration (and possibly aggression) toward the University stemmed from a lack of feedback from the "establishment" regarding individual progress. A second, and only slightly less important, frustration was stated to be the lack of relevance of course material. Interestingly enough, the students felt that given the feedback, they would be able to deal with the problem of relevance themselves. Student evaluations of professors indicated that where amount of feedback was high, and good, relevance was not a problem. The faculty, by and large, agreed with the students feedback was a problem; howeyer, they saw relevance as being more critical to the learning process. The faculty also indicated that they did not have an easy means to provide feedback. In analyzing our findings,. it appeared to us that one reason for the limited success of C.A.I. in other universities was possibly that it might. have been the wrong solution for the problem facing the university at the present time. One of the claims of C.A.I. is that it allows the instructor to individualize: to tailor the instructional During the past ten years universities and the computing industry have seen the development of a new mode of teaching called Computer Aided-or Assisted-Instruction (C.A.I.). This new field, emerged as an attempt to meet and deal with the growing criticism and frustration of students, employers, legislators, and faculties, which stemmed from our inability to prepare students adequately. Several very creative C.A.I. projects were directed toward providing a whole new system of instruction. However, to date, the success of C.A.I. has been limited, at best. PLANIT, PLATO, LYRIC, COPI, COURSEWRITER, and others have not been able to meet the needs of the teaching community. The problems reported by the major projects are only in part bounded by the technology of the computer. At West Virginia University, we watched the development of these systems with great interest and concern, because we, like other universities, were faced with the same problems. We carefully examined several of the better-publicized systems with an eye toward implementing one of them to meet our instructional needs. As we analyzed the systems we discovered that C.A.I. systems: 1. Were too machine-dependent to allow a feasible implementation without scrapping our existing hardware (an IBM 360/75). 2. Were too expensive in terms of Core requirements. 3. Made (what appeared to be) unreasonable demands upon the instructor in terms of intimate knowledge of programming and/ or computer technology. 4. Were "not yet available but would be soon" even though the projected date had slipped by several times. We also found that the present C.A.I. systems Were 623 624 Fall Joint Computer Conference, 1972 experience to the student through a series of incremental feedback statements. This seemed to imply that the instructor would be spending more time with thestudent. 
Our observation of ongoing C.A.I. systems indicated that, once the horrendous task of programming ,had been accomplished, the instructor retreated to his office or lab either to write new programs, to collapse, or to become more deeply involved in his own research again. The net effect of C.A.I. was to make the student more dependent for help upon the machine, or if he were available, upon a graduate assistant. In some cases, the instructor developed a bad case of "Blinking Light Syndrome" and spent his time diddling with the machine. Since C.A.I. seemed fraught with problems, we decided to look more closely at the problems of feedback. We made an assumption (tentatively) that the instructor was able to teach the material. Analysis of the instructor's time indicated that he spent a large portion of time developing, administering, and scoring exam questions, keeping track of what his teaching assistants were doing, worrying about the security of his files-and precious little time actually teaching. In most classes, the most feedback that students received were scores on one or two maj or exams and the final. In almost all cases we observed, the feedback was delayed until as much as two to three weeks following the respective tests. The students had virtually no opportunity to analyze their performance, or to learn where their deficiencies might be. It appeared that the feedback system we were currently using could not be seen', by any stretch of the imagination, as' a learning experience. (In some cases it was viewed by the students simply as a means of satisfying the sado-masochistic tendencies of the faculty.) In an analysis of one undergraduate course of 325 students, we found that the teaching assistants spent better than 10 percent of their time in purely mechanical activities, such as distributing, proctoring, and scoring tests. In addition, a course manager spent approximately 50 percent of her time doing clerical work necessary to keep the student's grades up-to-date, etc. Initially, we felt that the time commitments were rather high; until we realized that the assistants and course manager had almost total responsibility for 2,275 hours of testing in one semester. 1 Further consideration of the function of efficient feedback suggested utility for both students and instructor. For the former, it provides (1) a test score, (2) a diagnostic evaluation of material learned (and/or not learned), and (3) (hopefully) some prescription to remediate his problem. In 1971, Baker presented the state of the art in Com.puter Based Instructional Management Systems (CBIMS). He suggested that the instructor is not only a teacher, but also the manager of a rather complex system of activities designed to help the student learn something. 1 He indicated that "a major facet of this managerial task is composed of the mechanical tasks of scoring homework, test papers, and keeping track of what instructional materials a student has used." Our discussions with faculty and teaching assistants tended to support the notions put forth by Baker. Each of the systems Baker reviewed was designed to provide for the four major functions of any CBIM system, namely: test scoring, diagnosing, prescribing, and reporting. However, the actual operation of the systems seemed to be very awkward, and required that the student participate on some fixed schedule. 
Another difficulty that we observed was that the present CBIM systems seemed to double the work of the faculty member in that he had to develop essentially two sets of testing material-one set for the diagnostic function and another for the examination function. 2 It appeared to us that if a system with interactive capabilities could be developed, it might resolve much of the awkwardness and restrictiveness we had observed. With regard to tpe second problem, having to maintain double sets of items, we asked the faculty why not let the students use the real thing for both diagnosis and evaluation. The rationale for this approach was that most professors have, over time, established large item pools from which they draw, in some more or less random manner, to make up any quiz. In addition to their own item pools, many instructors use items suggested from the instructor's manual or handbook that accompanies the text being used that particular semester. Additional verification of this approach to test-design was accomplished by looking at the exam files in the fraternity and sorority houses at our campus. Because of all of the above considerations, we decided to develop an automatic Testing And Scoring System (TASSY). We felt that TASSY should have the following specifications: (1) it should be easy to use by both the student and the instructor; (2) it should allow for immediate feedback to the student; (3) it should allow the instructor, on demand, to review the progress of a student; (4) it should allow the student to individualize his request for proficiency; (5) it should have a high degree of security; TASSY (6) it should meet at least the minimum needs of the registrar for recording grades; (7) it should allow the student the option of taking an exam for diagnostic purposes or for grade purposes; (8) it should maintain a record of each student's individual performance for instructor analysis of items. TASSY'S PROGRAMMING STRUCTURE TASSY takes the form of a main driving program with several small sub-routines. This structure was necessary because of the constraints of the Conversational Programming System (CPS) with West Virginia University's IBM 360-75. There is a limitation of four pages (each of four thousand bytes) placed on any CPS program. However, through the use of external procedures, a much greater effective program size can be attained (provided that not more than six thousand bytes are in the work space at any time). The driving program consists of the "welcome" and "exit" lines as well as the calls for all the sub-routines. When a student enters TASSY, he is first asked if he would like to see some operating instructions for the system. If he replies yes, an instruction sub-routine "HELP" is called. Next, a password routine, "PASS", is called. Here, it is determined whether he is permitted entry to the system. If the password is recognized as that of an instructor, the user has an option of seeing special operating instructions from "MORE HELP" (restricted to only instructor mode). The instructor then has the option of "UPDATE" or "DUMP" (described later). If the password is recognized as that of a student, a call to sub-routine "GENER" is issued (also described later). A third alternative is that the password is recognized as a master password, allowing access to control mode. From this mode, a system manager has the option of "UPDATE", "DUMP", "GENER", or "WHO". The system manager has access to any course. 
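The dispatch just described is simple enough to sketch. The fragment below is a minimal modern rendering of that flow and is not the original CPS code; the helper functions (help_text, check_password, gener, update, dump, who) and the sample passwords are invented stand-ins for the sub-routines HELP, PASS, GENER, UPDATE, DUMP, and WHO.

```python
# A minimal sketch of the TASSY driving-program flow described above.
# The sub-routine bodies here are placeholders; only the dispatch is shown.

def help_text():      print("(operating instructions would print here)")
def gener(course):    print(f"(generate, give, and score a quiz for {course})")
def update(course):   print(f"(maintain the question file for {course})")
def dump(course):     print(f"(print the student records for {course})")
def who():            print("(print the activity file)")

def check_password(pw):
    """Placeholder for PASS: returns (mode, course) or ('denied', None)."""
    table = {"s3cret": ("student", "demo"),     # hypothetical passwords
             "prof01": ("instructor", "demo"),
             "master": ("master", None)}
    return table.get(pw, ("denied", None))

def driver():
    print("Welcome to TASSY.")
    if input("Do you want instructions? ").lower().startswith("y"):
        help_text()                              # sub-routine "HELP"
    mode, course = check_password(input("Password: "))
    if mode == "student":
        gener(course)
    elif mode == "instructor":
        (update if input("Mode? (update/dump) ") == "update" else dump)(course)
    elif mode == "master":                       # control mode may reach any course
        choice = input("Mode? (update/dump/gener/who) ").lower()
        if choice == "who":
            who()
        else:
            {"update": update, "dump": dump, "gener": gener}.get(
                choice, lambda c: None)(input("Course? "))
    # invalid passwords fall through; PASS has already logged the attempt
    print("Thank you for your interest in our computerized testing service.")

if __name__ == "__main__":
    driver()
```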
When a user (student, instructor, or system manager) is finished, control is returned to the driving program, where a "good-bye" line is printed. Then the system is ready for the next user.

ROUTINE "PASS"

The password routine has the main purpose of determining whether a given user is authorized to be in the system, and, if so, in what mode. A student password is given by the proctor to a student when he enters the testing center. This password varies systematically each hour of the day, and can be reset by the system manager each day or week as necessary. An instructor's password consists of any combination of up to six letters, numbers, or special characters and is chosen by the instructor. If an instructor's password is recognized, a further check is made on the name entered by the user. After passing both checks, control is returned to the driving program, passing back a code to indicate that this user is authorized for instructor's mode. Also, the number of the course to which this instructor belongs is returned. If the master password is found, a code is returned to the driving program to indicate that this user is the system manager and is authorized for anything. All passwords, including the instructor's and the master password, can be altered at any time by the system manager.

An added feature of the password check is the activity file (WHO). A record is written to this file when an instructor's password is found or when a user is not permitted entry in any mode, i.e., when he has entered an invalid password. This record contains the password used, the first and last name and ID number as entered by the user, and the date and time of his entry into the password procedure. It was decided not to record valid student entries for two reasons. First, the number of entries would be great, and second, they are recorded as a part of "GENER".

ROUTINE "GENER"

This routine is called to generate, print, score, and record a student's examination. It is probably one of the most important components of TASSY. Upon entering the routine the student is asked the course, section, and quiz number he desires (his name and ID number have been passed from the driving program). From the course number, the appropriate question file is opened. Then, from a control record in the file, and from the entered quiz number, the type and number of the items to be given is determined. These items are generated from the file at random (except that there has to be the prescribed number of each type designated by the control record).

One rather interesting problem developed in choosing the algorithm for random generation of questions. The records are stored in the file in the order of entry. Along with each record is the attribute of the given question. From the control record, the desired attributes for a given quiz, and the number of each which should be generated, are obtained. In an early version of the system, a random number between one and the number of questions in the file was generated by the built-in function in CPS. The corresponding record was then read, and a check was made to determine if the record was of the correct attribute. If not, the number of this record was saved in a vector of 'used' items. If the item had the correct attribute, it was printed as described below, and its number was also saved. As each new item number was generated, it was compared against the growing list of used items. This procedure prevents duplication of items on a given test. As the test proceeded from the attribute being used to the next as defined in the control record, the vector of used items was cleared for use by the next attribute.

Figure 1-Sample student terminal session: course, section, and quiz selection, the questions with immediate verification of each answer, and the final score breakdown by attribute
Figure 2-Sample listing of students who have recently taken exams, with scores by question type and in total

As is apparent, if, during generation of items for a given attribute, an item of some other attribute were selected, it would be ignored and its record number would be stored so that this unusable record would not be selected again. It was thought that the machine time used in searching the "used" vector would be less than the I/O time required to keep selecting an unusable record (the records, in this case, must be read before usability can be determined). What was not realized was how fast the item vector would grow. As an experiment, the algorithm was changed so that the records are read, the test of attribute is made, and then the vector is scanned to see if this item has already been used in this test. In this way, many more records are read from the file, but much fewer comparisons are made in the "used" vector. The latter of the two methods proved much more satisfactory. In the earlier method, when more than fifteen items of the same attribute were generated, real time between items ran 45 seconds or more, while the time separation between items in the first part of the quiz (items one through eight) was minimal (on the order of two to six seconds). In the latter method, the separation time was much more uniform from the beginning of a quiz through twenty items or more and was on the order of two to eight seconds. Needless to say, the latter method has been used since the day that the time differential was noted.
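The two selection strategies compared above can be sketched side by side. The fragment below is only an illustration of the logic described, written in modern Python rather than the original CPS; read_record is a hypothetical stand-in for the keyed file read, and item numbers are assumed to run from 1 to n_items.

```python
# Sketch of the two item-selection strategies described in the text.
import random

def pick_early(n_items, read_record, attribute, count):
    """Early method: every rejected record number is remembered so it is
    never re-read.  The 'used' vector grows quickly, and scanning it
    dominated the running time in practice."""
    used, chosen = [], []
    while len(chosen) < count:
        k = random.randint(1, n_items)
        if k in used:                      # linear scan of a growing vector
            continue
        rec = read_record(k)               # one read per new number only
        used.append(k)
        if rec["attribute"] == attribute:
            chosen.append(rec)
    return chosen

def pick_later(n_items, read_record, attribute, count):
    """Later method: always read and test the attribute first; only items
    actually selected enter the vector, so the scans stay short."""
    used, chosen = [], []
    while len(chosen) < count:
        k = random.randint(1, n_items)
        rec = read_record(k)               # more reads, far fewer comparisons
        if rec["attribute"] != attribute or k in used:
            continue
        used.append(k)
        chosen.append(rec)
    return chosen
```

The trade is exactly the one reported above: pick_early avoids re-reading rejected records but scans a vector that accumulates every wrong-attribute reject, while pick_later re-reads freely and keeps only selected items in the vector, so its comparisons stay few and the time between printed items stays uniform.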
(It is thought that locating various attributes in different physical locations in the file may be a useful way of decreasing item generation time even more than what has been attained by the above change.)

As the items are generated, they are printed on the terminal one by one and a reply is requested. Upon entry of this reply, the student is told immediately whether he is correct. If the response is incorrect, the correct answer is given. As the test is being given, a record is kept of each question and, by question type, of the number correct and the number attempted, i.e., the number of questions of that type which were on the quiz. At the end of the quiz, the student is given the totals of questions correct and attempted and the breakdown of this information by question or attribute type.

Figure 3-Sample LIST output: two stored items with their course, section, attribute, correct response, and distractors
Figure 4-Prompted INSERT session in which an instructor enters a new item, followed by a LIST of the entered item
Figure 5-'Terse' INSERT session and a subsequent LIST of the entered item
Figure 6-DEFINE session setting the attributes and quantities for a quiz
(See Figure 1) Before control is returned to the driving program, a student record is written onto a file, indicating the name, ID number, section number, quiz number, date, time, subscores, and total score on the examination. (See Figure 2)

ROUTINE "DUMP"

The "DUMP" routine is used by the instructor to print the records in the student file. In one sense "DUMP" keeps the instructor's grade book. The formatted file gives the instructor the student's name, his student number, the date the quiz was taken, the number of the quiz, the section number, a percentage-correct breakdown for each attribute, and the percentage correct for the total quiz. At present, the records are printed in chronological order. However, in later versions of TASSY the instructor will have the ability to have the records sorted for his convenience. (See Figure 2)

ROUTINE "UPDATE"

"UPDATE" is a routine for file maintenance of the question file. The user has the following options: LIST, INSERT, and DEFINE. The LIST function will list a requested item from the file. In the LIST are the correct response, the question type (attribute), the question, and the distractors. (See Figure 3) INSERT is the converse of LIST. It allows the instructor to replace or insert an item in the file. There are two versions of INSERT. In the more commonly used version, the user is prompted before each entry. After each prompt the user enters the information requested. (See Figure 4) It was found that this method is time consuming and somewhat boring for the experienced user, especially when large numbers of questions are being entered. Accordingly, a 'terse' mode of INSERT was developed. When this mode is requested, no prompts are given. It assumes that the user knows the record structure, and the entire record is entered at one time. The basic difference between the two methods is that the information (item number, attribute, correct response, course number, and section number) must be provided to the system in the correct order, without prompts, when the 'terse' mode is used. (See Figure 5) In both modes the user must learn only one special character. This is the 'not' sign (¬), which is used as a separator between the question and its answers, as well as between the answers themselves.
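A sketch of what parsing a 'terse' record might look like appears below. The use of the '¬' separator follows the description above, but the fixed-width header layout (a four-digit item number, one-character attribute and correct response, four-character course, and two-digit section) is an assumption made for the example; the actual CPS record format is not spelled out in the text, and the sample record shown is hypothetical.

```python
# Sketch of parsing a 'terse' INSERT record (assumed field layout).

def parse_terse(record: str) -> dict:
    header, *texts = record.split("¬")          # '¬' separates question/answers
    question_start = 4 + 1 + 1 + 4 + 2          # assumed fixed-width header
    return {
        "item":      header[0:4],               # four-digit item number
        "attribute": header[4],                 # one-character attribute
        "correct":   header[5],                 # one-character correct response
        "course":    header[6:10],
        "section":   header[10:12],
        "question":  header[question_start:],
        "answers":   [t for t in texts if t.strip()],   # drop empty trailing fields
    }

# Example with a hypothetical record:
rec = parse_terse("0012Q3demo01What is the correct date of the FJCC?"
                  "¬1. July 4¬2. Dec. 25¬3. Dec. 5¬4. Feb. 31¬5. None of the above")
assert rec["correct"] == "3" and len(rec["answers"]) == 5
```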
Figure 7-Sample activity file listing: for each recent entry, the password used, the user's name and ID number, the date and time, and whether the password was accepted

It should be noted that this is the only place in the entire system where a user has to learn a new symbol. All other commands to the system are in natural language and are very straightforward.

The DEFINE function is used to set up the control record with the quiz definitions. This includes the number of items to be on a given quiz and the breakdown for each type. This record is used by "GENER" when generating the quiz. (See Figure 6)

ROUTINE "WHO"

This routine, available only from control mode, is used to print the system activity file generated by "PASS." It shows who has entered the system and whether or not their password was accepted. The date and time are also available. (See Figure 7) The value of this routine is to check on activity, especially if an instructor thinks his password is no longer secret.

MESSAGE ROUTINES

Message routines have been implemented. These allow communications from the instructors to the system manager through a file. It is thought that this feature may be valuable for reporting any difficulty to the system manager or for leaving suggestions for improved function.

CONCLUSIONS

Our interest in developing TASSY was to explore the problems and potentials of using the computer in the educational process. TASSY served that purpose in many ways. Our first concern was the problem of software development and record design. We originally designed the question record to be 500 characters long. We found that this is too short. Our next version of TASSY will have the ability to hold a question and associated distractors totalling 1,000 characters on each record. At the time of this writing we are still not sure what the optimal student record should look like.
We estimate that the student record should have the ability to record a minimum of 75 items, the sub-scores from 10 attributes, and the total score, in addition to the necessary identification data mentioned in the system description.

A second problem we wanted to evaluate was the feasibility of operating under the auspices of a large central computer using telephone communication. Under the best circumstances, our experience has indicated that we should not try it again. Our experience was not unlike that of anyone else who has had to rely on the telephone system and someone else, i.e., the central system, to do the work for them.

A third problem we encountered was that our system became fair game to students who would try to "break in" and look at the answers and the system. The computer center staff developed a special software "lock" for us that was, in effect, a self-destruct button. If any tampering was attempted, a system error was generated, duly logged, and the program disappeared. (See Figure 7) At times this security feature was inconvenient, but we felt the trade-off for security well worth it.

We have decided to develop TASSY to operate outside of the University's central computer because of the cost of maintaining enough core and disc space on line. We estimate that the cost of a 16-terminal system would be almost double that of having our own minisystem.

We are also concerned with the reactions of faculty, teaching assistants, and students. The students and teaching assistants liked TASSY very much. The students did not feel that the computer de-personalized them. In fact, most of the students felt that TASSY represented a meaningful step on the part of the faculty to meet their needs. The teaching assistants were overjoyed because the most boring 10 percent of their work assignment was removed. The faculty agreed with the idea, and liked the potential of the TASSY system. When the system became a reality, they wanted to use it as little more than a slow test printer. However, with some handholding and encouragement, and favorable results from the prototype experiment, the faculty have begun to use all of the system capabilities.

Several additional problems arose as a result of our efforts. The faculty had difficulty in developing good test items. Traditional item analysis methodology is only partially useful. Because of TASSY's ability to generate so many different tests, item validity and internal consistency become difficult values to compute. We found that with an item pool of 500 items equally split over 5 attributes, the probability of getting the same test with the same order of items is 1 in (10!/100!)^5. Needless to say, the sample size for each test is rather small. Our experience with TASSY, plus communication with other researchers, indicates that this problem will be with us for a long time to come.

A second, and even more serious, problem became apparent as we began to develop the diagnostic and prescriptive capabilities of TASSY: namely, once the instructors had detailed data on the student's deficiencies, they didn't know what course of action should be taken except in the most general terms. This has caused some embarrassment.

Our future plans for TASSY include enhancing its response repertoire by adding the ability to recognize single answers and formulas in a manner similar to PLANIT.
We also hope to give the instructor his choice of scoring modes besides the traditional, rights, rightswrongs, etc. Finally, we hope, in the not too distant future, to be able to add some graphics capabilities to the system. Because of the modularity of TASSY and its relatively simple-minded approach to testing and feedback we feel that it would be implemented readily by individual instructors at almost any level of college instruction. We hope that, as our experience with TASSY grows, we will be able to develop· a product which will not only be easy to use, but will also be cost-effective enough so as to warrant serious consideration. ACKNOWLEDGMENTS The author would like to give thanks to Mrs. Rita Saltz for her invaluable editorial assistance and to the West Virginia University Computer for their patience. REFERENCES 1 F B BAKER Computer-based instructional management systems-A first look Review of Education Research Feb 1971 Vol 14 No 1 pp 51-70 A C KELLEY An experiment with TIPS-A computer aided instructional system for undergraduate education The American Economic Review 1968 No 58 pp 446-457 A comprehensive question retrieval application to serve classroom teachers by GERALD LIPPEY IBM Corporation San Jose, California forms, reports, retrieval options, item revision procedures,test modification p'rovisions. 2. Identify problems associated with development of item pools, e.g., item classification, cost of preparation, adequacy. 3. Discover how classroom teachers would use questions when they were conveniently available, e.g., testing, drill, discussion. 4. Gain quantitative information on usage, e.g., frequency of use, length of tests, requirement for data bank size. INTRODUCTION CTSS (Classroom Teacher Support System) was -developed to aid teachers. The concept consists of retrieving questions according to specified attributes from a centralized data bank, assembling them into tests or exercises, and scoring student answers. Since scoring mark-sense answer sheets is a well-understood and widespread application, the emphasis was placed on solving systems problems related to producing lists of questions which meet the teacher's needs as he perceives them. To achieve this, the system permits items to be classified along several dimensions so that they can be selected by the computer according to criteria set by the teacher requesting a test. (The word "test" is used here to design~te a list of questions, regardless of how it is to be used by the teacher who receives it.) CTSS enables many teachers to share a collection of questions; thus, they all benefit from the advantages of specialization. Such an application has the potential of providing a teacher with access to high-quality questions; freedom from the clerical chores of test construction and scoring; a new test, tailored to his needs, for each occasion; and comparative data based on previous student responses. Exploration of this concept began in IBM's Advanced Systems Development Division in 1968. In 1969, a joint study agreement was reached between IBM's Systems Development Division and the Los Angeles City Unified School District to develop a prototype application. System functions were specifi~d jointly: IBM developed the computer programs, and the school district prepared an initial collection of 8000 questions in U.S. history and took responsibility for all operational aspects. Objectives of the joint study were to During the first half of 1969, functional specifications were established. 
Programs were coded during the second half, while the first item collection was being, prepared. Systems testing began in January 1970, with six teachers in one school. CTSS slowly phased into operational use as teachers at various schools have been gradually added during the last couple of years. There are now over 200 history teachers using the system in Los Angeles schools, and several other educational institutions have installed it. TEACHER SERVICE CTSS is intended to be entirely under the teacher's control. It may be used or not as the teacher sees fit. The system is free of any particular philosophy of testing or other use; it is the teacher's prerogative to use it in any fashion that satisfies his needs. Questions have been used for quizzes, homework assignments, final exams, drill, review~ classroom discussion, and material for special student proj~cts. Although it can be used with essay and short-answer questions as well, CTSS was intended for objectively scorable (multiple-choice, true-false, and matching) questions. This decision was made to encourage machine-scoring in order to collect data to help identify unsatisfactory items. Within this framework, some 1. Confirm decisions related to functional system operation, e.g., communications procedures, 633 634 Fall Joint Computer Conference, 1972 features were included to accommodate item format variations: Items consisting of several questions preceded by a paragraph or table are acceptable. Also, special print control provisions permit item authors to specify overprinting of text lines (e.g., for underlining words) and to control the splitting of long items between test pages. Item collections are maintained on disk storage. Teachers submit requests for questions on optically scanned test-request forms, which are sent directly to the computer center through the district's internal mail. This input is batched, run each night, and the resulting tests placed in the school mail the next day. Scoring is handled in the same fashion. Item specification During the design of CTSS, primary attention was given to item selection. So teachers can conveniently construct tests· which meet their needs, the system permits questions to be classified in several ways. Although specific questions can be requested, teachers usually request questions by specifying attributes associated with them. Questions in a broad subject matter area (an "item collection") are classified at the least into major subject matter "categories" and at most along four additional dimensions. The category classification may be based on behavioral objectives or not, depending upon the item collection designer's wishes. It may also be structured in hierarchical levels. During the retrieval process, items are selected from both those in the category specified and those in all categories subordinate to the one specified. There may be up to five hierarchical levels in the category classification defined. Other classification dimensions can include an assigned difficulty level, behavior level (knowledge vs. application of knowledge), keywords, and several special flags. Some dimensions (e.g.,. keywords) permit the item classifier to assign more than one value to each item. Dimensions which can. have a large number of values (e.g., category) are coded numerically, so that, with the aid of an index, they can .easily be specified on an optically scatined test-request form. Specifications for questions are entered in "request blocks" on the test-request form. 
Each block consists of several fields in which the teacher specifies the attributes and the quantity of a group of items desired. While items are normally selected by attribute, a request block may be used to specify the unique identification number of an individual item desired. Thus, a test may be constructed which contains a specified number of questions in each of several categories with the desired mix of other characteristics, as well as some specific items which the teacher knows from experience and wants to include. Tests The result of processing a test request is a list of questions identified by the teacher's employee number and a two-digit test number assigned for that teacher by the system. The test thus produced is stored by CTSS and labeled "generation 1." The teacher may then modify the test by requesting the system to add or delete items. To accomplish this, a test-request form is filled out which references the test number and specifies the items to be added in the same way that an initial test is requested. Items to be deleted are indicated on another field in the form. A new list of questions with the same test number will be created and labeled "generation 2.". This process can be repeated until the list of questions satisfies the teacher. Only the most recent generation of a test is remembered by the system. A teacher may have up to twenty such tests retained simultaneously. The teacher may specify on the test-request form that the test be printed on a reproduction master. He may also request up to nine different versions of the test with the items appearing in a different sequence on each. Each time a test is printed, two associated reports are produced. An Item Characteristics report provides the answer key and informs the teacher how each item has been classified for retrievel. It may also provide references to two resources which contain information related to the content of each item. The second report repeats the teacher's request and indicates the items retrieved in response to each request block. CTSS will score student answer sheets when the teacher so desires. Since the test has been remembered by the system, it is not necessary for a teacher to submit a scoring key. Several scoring options are available for identifying students, suppressing or adding questions, and partitioning reports. The usual scoring reports are sent to teachers. Scoring procedures and reports will not be discussed. The system was designed so that on-going, everyday service could be provided without the need for judgment by anyone other than the teacher concerned. Probably the most important consequence of this objective was the attempt to anticipate input errors of various kinds and, whenever possible, respond automatically by addressing an explanatory message directly to the teacher. Comprehensive Question Retrieval Application SUPPORT ACTIVITIES The teacher service described above is supported in several ways. Direct daily support is provided by the internal mail service and the data processing center. At the center, request forms are batched and read by an optical mark reader; processing is accomplished at night; and output to each teacher is manually matched to its request form prior to its return to the teacher. CTSS is, however, not adminstered by the data processing group. Rather, the data processing center performs a service function, while operations are monitored and managed by education-oriented personnel referred to as CTSS "coordinators." 
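As an illustration of how a request block can drive selection against the classified item collection, the sketch below uses modern Python rather than the CTSS PL/I; the field names, and the encoding of the five-level category hierarchy as a dotted numeric string, are assumptions made for the example (the stratified sampling and specification relaxation discussed later, and specific-item requests, are omitted).

```python
# Illustrative sketch of attribute-driven item selection for one request block.
from dataclasses import dataclass, field
from typing import Optional, List
import random

@dataclass
class ItemEntry:                      # one record of the classification data
    item_id: int
    category: str                     # assumed dotted encoding, e.g. "3.2.1"
    difficulty: Optional[int] = None
    behavior: Optional[str] = None    # e.g. knowledge vs. application
    keywords: List[str] = field(default_factory=list)

@dataclass
class RequestBlock:                   # one request block from the form
    category: str
    quantity: int
    difficulty: Optional[int] = None
    behavior: Optional[str] = None

def subordinate(cat: str, requested: str) -> bool:
    """True if cat is the requested category or lies anywhere below it."""
    return cat == requested or cat.startswith(requested + ".")

def eligible(entry: ItemEntry, block: RequestBlock) -> bool:
    return (subordinate(entry.category, block.category)
            and (block.difficulty is None or entry.difficulty == block.difficulty)
            and (block.behavior is None or entry.behavior == block.behavior))

def select(classification: List[ItemEntry], block: RequestBlock) -> List[ItemEntry]:
    pool = [e for e in classification if eligible(e, block)]
    return random.sample(pool, min(block.quantity, len(pool)))
```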
This arrangement dictates that the points of contact between the data processing center and others be well defined, so that the computer center can regard all CTSS jobs as routine production. Consequently, input from, and output to, both teachers and coordinators is handled according to well-established procedures. Service support Coordinators have two areas of responsibility-one related to operational teacher services and the other related to item pools. In the teacher service area, new users of the system may obtain coordinator assistance getting started and, subsequently, in understanding errors that they make. Coordinators also supply teachers with the optically scanned request forms. Since a user identifies himself to CTSS on the test and scoring request forms by employee number only, a file of teacher names and locations, the "teacher file," is required to automatically address the title page of tests and scoring reports. A teacher must be registered on this file prior to using CTSS. One chief responsibility of the coordinators is maintenance of the teacher file. The file in which tests are stored, called the "active list," can also be influenced by coordinators. When a new test is generated, it is automatically added to the active list; when a test is scored, it is deleted. Since many tests are never scored by CTSS, the active list would continue to grow indefinitely unless old tests were removed. Old tests are identified by assigning an "activity date" to each test when it is created. This is reset to the current date whenever a new generation is produced or a scoring request is not successfully processed due to input error. The activity date is used to purge old unscored tests from the file. A "time-out cancellation" program removes from the active list tests whose activity dates precede a cancellation date set by a coordinator. When a test is timed-out, a notice is sent to the teacher concerned, informing him that it is no longer 635 available for modification or for scoring against the answer key retained in the system. Time-out cancellation is initiated periodically by coordinators. As tests are produced and scored, statistics on system activity are accumulated. A "system statistics file" contains counters which cumulate data for two dozen kinds of user activity. For example, the number of test requests rejected due to input errors, the average number of generations produced, and the average number of questions requested on tests are accumulated. The system statistics file contains this information for two durations (a long and a short time period); the data is maintained for each item collection; and it is classified according to which of 13 teacher groups the corresponding users belong. Coordinators may reset the system statistics file counters when they wish the accumulation process to begin again. In addition to direct contact with teachers, coordinators monitor system usage by looking over summaries produced by each test production and scoring run. There is also a report generated by the time-out cancellation program, which summarizes the number of tests in the active list, tests having scrambled versions, and tests removed from the active list by. the time-out routine. Longer term activity is observed by drawing activity reports from the system statistics file described above. Item pool support The second area of coordinator responsibilities involves the item collections themselves. 
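The time-out cancellation pass lends itself to a short sketch. The data shapes below (an active list modeled as a dictionary keyed by employee number and test number, and a notify callback) are assumptions made for illustration; they are not the CTSS file formats.

```python
# Sketch of the time-out cancellation pass: tests whose activity date precedes
# the coordinator's cancellation date are dropped from the active list and a
# notice is addressed to the teacher concerned.
from datetime import date

def time_out_cancellation(active_list, cancel_before: date, notify):
    """active_list: dict mapping (employee_no, test_no) -> record containing
    an 'activity_date'; notify(employee_no, test_no) sends the notice."""
    kept, removed = {}, []
    for key, rec in active_list.items():
        if rec["activity_date"] < cancel_before:
            removed.append(key)
            notify(key[0], key[1])    # test is no longer modifiable or scorable
        else:
            kept[key] = rec
    return kept, removed

# Example use with a hypothetical notice function:
if __name__ == "__main__":
    active = {("123456", 7): {"activity_date": date(1972, 3, 1)},
              ("123456", 8): {"activity_date": date(1972, 7, 1)}}
    kept, gone = time_out_cancellation(
        active, date(1972, 6, 1),
        notify=lambda emp, test: print(f"notice to {emp}: test {test} timed out"))
    assert list(kept) == [("123456", 8)] and gone == [("123456", 7)]
```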
To use an item pool, teachers must understand how it has been classified, and they must have access to at least the index which defines subject matter category numbers and perhaps to other indexes which have been constructed. Coordinators are responsible for communicating this type of information to teachers. Item collection maintenance is another important coordinator job. Typically, large collections of questions are made available before they have been thoroughly edited and field tested (otherwise, the development investment would be too large). Thus, one begins with relatively poor-quality items and depends upon a long range revision process during usage to improve the questions. As teacher comments and scoring data become available, items are repaired. Teacher reactions to poor-quality items, while negative, have not turned out to be a serious problem. Indeed, teachers sometimes appear to experience satisfaction when they discover and report items needing correction. The best source of item revision information appears to be the teacher. A second source has been provided in CTSS by cumulating item usage data in an "item 636 Fall Joint Computer Conference, 1972 statistics file," associated with each item collection. The item statistics file retains information on the number of times each question appeared in a test, or was deleted from a test or suppressed from scoring by a teacher. The file also contains student-supplied data obtained when questions are machine-scored, such as the number of responses to each option and a central tendency for discrimination index. A coordinator can obtain an "item statistics report" based on information in the item statistics file. The philosophy behind this report is that by appropriate selection of item statistics thresholds he can obtain a list of those items most likely to need revision. A coordinator may set thresholds on high teacher rejection rate, low average discrimination, unusually heavy use of a distractor, and very high or low measured difficulty level. He may also require that some minimum number of tests has been drawn, or answer sheets scored, to cause an item to be eligible for these tests. Finally, it is the coordinator's responsibility to oversee the creation and supervise the installation of new item collections. When a new item pool is to be constructed, many decisions need to be made: a character set, i.e., those characters which are to be allowed in item text, must be chosen; the kinds of items which are permissible must be determined; the dimensions of item classification must be designed; and the method of transmitting all the necessary classification information to teachers must be planned. There appears to be significant value in having large item pools. Teachers do not usually wish to encounter the same few items over and over. A large collection is especially useful when multiple tests are requested to cover the same material. It also enables the collection designer to include several approaches to subject matter ; this is essential if the collection is to be shared by users having a variety of pedagogical styles. Finally, it appears to be helpful if teachers regard an item collection as essentially infinite in size and constantly changing, and do not, therefore, have a desire to deal manually with the entire collection at once. 
Experience from CTSS indicates that about 30 questions per class hour should be regarded as a minimum item pool size; 50, as more desirable; and, perhaps, about 70 as the number beyond which the cost begins to exceed the value. SYSTEM DESIGN CTSS was designed as a prototype because it addresses a new application area. Many of the design decisions were influenced by this. Perhaps the most obvious effect on system design was the effort to include features whose utility was questionable in order to establish their value through experience. On the other hand, an effort was made to design low cost into the system framework. Thus, on-line terminals were rejected in favor of internal mail service, and keypunching of teacher requests is avoided by using optically scanned input forms. Costly human intervention is further reduced by sending error messages directly to teachers. Also, teachers are discouraged from requesting reproduction masters or scrambled versions unnecessarily by preventing further modification of a test once either of these more expensive printouts has been produced. Program design priorities (from high to low) were as follows: (1) ease of coding and testing, (2) ease of modification and maintenance, (3) low storage requirements, (4) low execution time. Prototype design specifications included handling several item collections with about 10,000 items in each. The system was programmed in PL/I to run under the IBM/360 Operating System in a 74K byte partition. A highly modular programming approach was chosen. Communication between programs is accomplished through files which are either permanently established or used as temporary interfaces. Dividing functions into a series of separately executable programs simplified programming and testing. More importantly, this made it easier to modify the system, particularly when additional features were later inserted. The permanently established files will be outlined next, followed by a brief summary of the major runs available. Files CTSS includes two types of permanent files: those which are item collection-independent and those which are item collection-dependent. An "item collectionindependent" file is required by the system only once, irrespective of how many item collections are supported. An "item collection-dependent" file is required to be present for. each item collection. The main permanent files and their contents are itemized in Tables I and II. Access to items Item manipulation is, of course, the key element of a test construction system. Ease and efficiency of item retrieval and item revision depend upon the item file organization. In CTSS, all of the selection decisions concerning which items are to appear on a test are made by consulting the classification file. This file contains item attributes, but no item text, and is therefore very small and easily referenced compared to the item file. 
TABLE I-Principal Item Collection-Independent Files

File Name - Contents
Course File - Record for each item collection: identification, location, and parameters
Teacher File - Record for each teacher: identification and address; identification of each active test, its activity date, and its location in the active list
Active List - Record for each active test: identification of items, answer key, and status information; identification of the last 50 items deleted during previous modifications; location of version file record, if any
Version File - Record for each active test having scrambled versions: scrambling keys
System Statistics File - Record for each item collection: two sets of system usage information for each teacher group

TABLE II-Principal Item Collection-Dependent Files

File Name - Contents
Classification File - Record for each item: item attributes and location in the item file
Item File - Item text
Item Statistics File - Record for each question: item usage data

During the run that produces tests, the item file is referenced only when it is necessary to format the test for printing. (Similarly, the active list contains records of specific tests by recording item identifications instead of the item text itself.)

It was decided early in the design phase that, while several dimensions of item specification would be available to teachers, one important dimension would be emphasized and retrieval optimized around it. As a result, items are ordered by subject matter category number in the item collection-dependent files, making hierarchical selection in this dimension easy to implement. Instead of building and maintaining inverted files for other dimensions, the classification file is simply scanned within the category designated for items which meet other specified criteria. To discourage teachers from too often requesting a scan of all items in a collection for those items which have some other attribute, they are prevented from initiating such a search with a single request block. By requiring that teachers enter a category number in each request block used and limiting the range of a search initiated by one request block to the highest level of the hierarchical category classification system, CTSS can require that several request blocks be used to cause a search over the entire item collection. Consequently, though it is possible to initiate a scan of the whole classification file, a teacher must go to some trouble. This compromise between meeting user needs and discouraging use of unnecessary computer time has so far proven satisfactory.

The item file itself consists of 80-character card images, where each image contains one text line of a printed item and a unique identification number. No attempt was made to compress text by coding blanks. This file serves as the master file of items; there is no duplicate file of punched cards. When cards are desired to aid in changing item text, they are punched. The item file is normally accessed differently for file maintenance than for test construction. Items are retrieved from the item file for tests by direct access, but the basic item file maintenance run is a sort-merge procedure which rebuilds all of the item collection-dependent files. Item additions, deletions, and substantial modifications may be accumulated and this run executed infrequently. Between such runs, it is possible to prevent specific items from appearing on tests and to change specific item cards in the file by direct access.

TABLE III-Principal Runs for Coordinators

Run Name - Function
Time-Out Cancellation - Remove old tests from the active list
Print Teacher File - List the teacher file and active list
Teacher File Maintenance - Add, delete, or modify teacher identification data
Print Activity Report - Produce report from system statistics file and reset counters if desired
Print Item Statistics File - List item statistics
Print Item Statistics Report - Identify items having statistics which exceed specified thresholds
Print Classification File - List item attributes
Print Item File - List item text for specified section of the item collection
Item File Maintenance - Add, delete, and replace items, and reset an item's statistics if desired
Item Stop and Change - Tag item so that it will not be available to teachers, or replace item characteristics or text line

Computer runs

Like those for item file maintenance, most computer runs were designed for use by coordinators. The principal runs available to coordinators are listed in Table III. A few other runs are for the programmer during system maintenance or when adding new item collec-
Consequently, though it is possible to initiTABLE II-Principal Item Collection-Dependent Files File Name Classification File Item File Item Statistics File Contents Record for each item: item attributes and location in the item file Item text Record for each question: item· usage data 637 Time-Out Cancellation Print Teacher File Teacher File Maintenance Print Activity Report Print Item Statistics File Print Item Statistics Report Print Classification File Print Item File Item File Maintenance Item Stop and Change Function Remove old tests from the active list List the teacher file and active list Add, delete, or modify teacher identification data Produce report from system statistics file and reset counters if desired List item statistics Identify items having statistics which exceed specified thresholds List item attributes List item text for specified section of the item collection Add, delete, and replace items, and reset an item's statistics if desired Tag item so that it will not be available to teachers, or replace item characteristics or text line procedure which rebuilds all of the item collectiondependent files. Item additions, deletions, and substantial modifications may be accumulated and this run executed infrequently. Between such runs, it is possible to prevent specific items from appearing on tests and to change specific item cards in the file by direct access. Computer runs Like those for item file maintenance, most computer runs were designed for use by coordinators. The principal runs available to coordinators are listed in Table III. A few other runs are for the programmer during system maintenance or when adding new item collec- 638 Fall Joint Computer Conference, 1972 tions. The two primary runs, executed daily, are those which service teachers: One produces tests and the other handles scoring. Test production will be discussed here; scoring will not. Test production There are two strategies which might be employed for batch retrieval of items: One is to publish a catalog of all items in the collection, from which the teacher selects those he wants; the other is to have the computer select items from the collection according to attributes specified by the teacher. When a large question data bank is involved, it is impractical for the teacher to deal directly with the questions. Furthermore, if, in addition, new items are continually being added and old ones revised, providing teachers with a relevant catalog of items becomes a problem. It would be hundreds of pages long, and the publishing cost would be compounded by the need for frequent revisions to account for changes. For these reasons, CTSS relies instead upon the computer's ability to retrieve by attribute, while still reserving to the teacher his right of final choice. However, the approach chosen results in another problem. When selection by attribute is used to locate entries in a large data bank, the difference between the quantity desired and the quantity available must be resolved. This problem is easily handled in a conversational retrieval system, because the user can specify attributes and immediately learn how many items are available. If there were more than he wanted, he could tighten up the specifications and inquire again; if there were less, he could loosen· them. In a batch retrieval system, some alternate means need to be employed to insure that the user is neither flooded with eligible items nor receives too few. 
In CTSS, random selection achieves the former and automatic specification relaxation the latter. If, for example, a request block specifies five questions having certain characteristics and there are 100 that satisfy the criteria, five are picked at random from the 100. CTSS does this by partitioning the 100 questions, ordered by category number, into five groups of 20 items each. One question is selected at random from each group. The stratified sampling prevents too many items from being occasionally picked from a single category when selection ranges over several subordinate categories. There are, of course, many alternative ways to reduce the number of eligible items, but random selection has proven adequate.

If fewer items are found that match a request block's specifications than are requested, some specifications will be relaxed in an attempt to meet the quantity objective. This is consistent with the observation that teachers prefer to receive items, even though they do not meet all criteria specified. In the prototype, behavior level, if specified, is ignored first; then, any assigned difficulty level specification is disregarded. No other type of teacher specification is relaxed.
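The two mechanisms just described, stratified random choice when too many items qualify and stepwise relaxation when too few do, fit in a few lines. The following Python sketch follows the prototype's stated relaxation order (behavior level first, then difficulty level); the data layout is invented, and the handling of group remainders is simplified.

    import random

    def pick_stratified(eligible, wanted, rng):
        """Partition the eligible items (already ordered by category number)
        into `wanted` equal groups and draw one item at random from each,
        so that no single subordinate category dominates the selection.
        Remainder items beyond the last full group are ignored here."""
        if len(eligible) <= wanted:
            return list(eligible)
        size = len(eligible) // wanted        # e.g. 100 eligible, 5 wanted -> groups of 20
        return [rng.choice(eligible[g * size:(g + 1) * size]) for g in range(wanted)]

    def select_for_block(block, classification, rng):
        """Relax specifications until the quantity objective can be met,
        or until nothing further may be relaxed."""
        spec = dict(block["criteria"])
        for relax in (None, "behavior", "difficulty"):
            spec.pop(relax, None)
            eligible = [item for item in classification
                        if item["category"] == block["category"]
                        and all(item.get(k) == v for k, v in spec.items())]
            if len(eligible) >= block["quantity"]:
                break
        return pick_stratified(eligible, block["quantity"], rng)

If even the relaxed specifications cannot satisfy the quantity, the sketch simply returns every eligible item, which matches the observation that teachers prefer to receive items that miss some criteria rather than none at all.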
Every time item file maintenance is run, a copy of the item classification file, the item file, and the item statistics file is automatically made. CONCLUSION Probably the most significant systems learning that occurred during prototype operation has been in the support area. The fact that a good deal of attention was directed toward reducing teachers' chores turned out to increase the coordinators' work substantially. For instance, although teachers are encouraged to offer new items and suggestions for improvements, they are not expected to revise questions. Likewise, teachers are not required to submit their names and locations with requests. Such system-provided services resulted in the presence of additional files and more support work for coordinators. The coordinators' activities have required more computer assistance than had been anticipated during system design. In fact, most of the functions added after CTSS was installed were to aid coordinators-either to diagnose teacher difficulties or to maintain files. 639 The CTSS prototype is available to other educational institutions; programs, documentation, and some of the existing item collections may be obtained from the Los Angeles City Unified School District. Several other institutions have installed CTSS and more data banks of questions are becoming available. Also, systems of a similar nature have independently emerged at various other locations, chiefly in institutions of higher learning. One can infer from the success of CTSS and from the developing interest elsewhere that the use of computers for banking questions and generating tests and exercises is an embryonic application area which will continue to grow. Furthermore, test generation is a natural component of more sophisHcated computer-assisted instructional approaches. A few of the existing automated test construction activities are, in fact, parts of larger computermanaged instruction systems. These more extensive systems usually include pedagogical decision-making elements, such as diagnosis of learner difficulties and prescription of assignments. Some of them enable students to proceed through large units of instructional material independently of each other. Those who wish to begin with a small, simple system and grow toward a comprehensive system may find test construction a convenient starting point, since it can stand alone under teacher control as well as fit into an integrated computer-managed instruction system at a later time. • Computer processes in repeatable testing by FRANKLIN PROSSER and JEAN NAKHNIKIAN Computer Science Department Bloomington, Indiana INTRODUCTION correct answers and other study aids. And third, the examinations are made repeatable. Large numbers of unique but equivalent individualized tests are prepared using a digital computer. The instructor may readily permit his students to be examined repeatedly over the same unit of materiaL Students may learn from their errors,. and return to be tested again over similar material. In this way, the examination is made a vital part of the learning process. The CGRT method consists of four basic steps: developing pools of test items, producing the individualized tests, administering the tests, and scoring of student responses to the tests. The entire process is described in general terms in Reference 1. Our purpose here is to discuss in some detail the computer processes involved in repeatable testing. There is emerging an increased interest in computer augmented testing procedures. 
Computer processes in repeatable testing

by FRANKLIN PROSSER and JEAN NAKHNIKIAN
Computer Science Department
Bloomington, Indiana

INTRODUCTION

There is emerging an increased interest in computer augmented testing procedures. Among those feasible techniques that have proven of particular value is the method of Computer Generated Repeatable Tests (CGRT). This approach to testing, which allows the repeatable administration of tests over a body of material, has been previously described (Reference 1). Interest in the method has been high, and frequent inquiries into the nature of the computing processes involved in CGRT have led us to elaborate here on the computer software aspects of the method. In this paper we describe the structure of the test generation and student response scoring programs, we describe important performance improvements, and we discuss some aspects of the problem of developing "portable" or machine-independent computer programs.

The CGRT process was conceived by Donald D. Jensen, now at the University of Nebraska, as a means of avoiding many of the adverse features of conventional testing in large university classes. The typical exam in such classes consists of true-false or multiple choice questions, is administered at one fixed time only, and is given infrequently and over a large amount of material. Often several days elapse before the student receives any useful information on his performance, and often the total score is the only information given, an item that is of little value in guiding further study. Such procedures are disliked by students, who frequently adopt the "loaf-and-cram" pattern of study, and who are subject to considerable anxiety over their performance on the infrequent and all-important big exam.

The CGRT scheme provides reasonable alternatives to these objections to conventional exams. First, the student may be examined more frequently, which encourages the student to keep up with his course work. Second, the student may receive immediate feedback: when he turns in his answer sheet, he may receive the correct answers and other study aids. And third, the examinations are made repeatable. Large numbers of unique but equivalent individualized tests are prepared using a digital computer. The instructor may readily permit his students to be examined repeatedly over the same unit of material. Students may learn from their errors, and return to be tested again over similar material. In this way, the examination is made a vital part of the learning process.

The CGRT method consists of four basic steps: developing pools of test items, producing the individualized tests, administering the tests, and scoring of student responses to the tests. The entire process is described in general terms in Reference 1. Our purpose here is to discuss in some detail the computer processes involved in repeatable testing.

STRUCTURE OF PROGRAMS

Of the four basic steps in the CGRT procedure, the preparation of tests and the scoring of student responses are facilitated by computer. Test preparation is accomplished with a program GENERATOR, and scoring is done by GRADER. All programs are written in FORTRAN. In the following sections, the essential features of the computer programs are described.

Generator

Figure 1 shows the overall flow of program GENERATOR.

[Figure 1-GENERATOR program flow: read, check, and store item pool; read test specifications; prepare item pool answer summary; initialize test number; select order of items for this test; obtain items, format the test, and output text for printing; increment test number; loop until all tests are formed]

The input to GENERATOR consists of the pool of items (questions and answers) to be used in formulating individualized tests, and directives describing the structure of the tests. The program reads the pool of test items, verifies that each item is in an acceptable format, and stores the information as a convenient data structure in the program.

A typical item consists of the body of the question, contained on as many records (cards, usually) as necessary, followed by one or more cards of answer information. To facilitate computer grading, the correct answer is placed in a standard position on a card; other supporting answer information (textbook references, etc.) may follow on this or subsequent cards. The cards are numbered in such a way that the correct card order may be verified, and the answer part distinguished from the question body. Each item is further identified with a numeric code.
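The card conventions just described might be realized as in the sketch below. The column assignments (item code in columns 1-6, a question/answer flag in column 7, a sequence number in columns 8-9, text from column 10, and the correct answer in the first text column of an answer card) are assumptions made for illustration; the paper says only that such fields exist and are checked. The original program keeps this information in FORTRAN arrays rather than the Python structures used here.

    def parse_card(image):
        """Split one 80-column card image into its fields (hypothetical layout)."""
        image = image.ljust(80)
        return {"item": image[0:6].strip(),   # numeric item code
                "kind": image[6],             # 'Q' = question body, 'A' = answer information
                "seq":  int(image[7:9]),      # card sequence number within the item
                "text": image[9:].rstrip()}

    def read_item(cards):
        """Verify card order and separate question body from answer part,
        roughly as GENERATOR does when it reads and checks the pool."""
        parsed = [parse_card(c) for c in cards]
        assert len({c["item"] for c in parsed}) == 1, "cards from different items"
        assert [c["seq"] for c in parsed] == list(range(1, len(parsed) + 1)), "cards out of order"
        kinds = "".join(c["kind"] for c in parsed)
        assert "A" in kinds, "item has no answer card"
        assert "Q" not in kinds.partition("A")[2], "question card follows an answer card"
        body = [c["text"] for c in parsed if c["kind"] == "Q"]
        answer = next(c["text"][:1] for c in parsed if c["kind"] == "A")   # standard position
        return body, answer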
Typically, to permit the instructor to group his items into sets of items covering similar material, the item identification will consist of a set number and an item number within the set. The collection of items to be used in forming a series of individualized tests thus is divided into a variable number of sets, each containing a variable number of items. The instructor may assign a specific weight to each set. The weight applies to each item in the set, and represents the number of points to be awarded for correctly answering such an item. Further, the instructor may assign a frequency value to each set of items, the frequency being a relative or absolute measure of the number of times on a test that the set is to be used for selection of an item. Weights allow control of the point value of items, and frequencies permit control of the number of items used from each set. When weight factors are used, either with or without frequency specifications, some simple rules are imposed to assure a constant number of points on each test.

The pool of items that forms the principal input to GENERATOR may reach a substantial size, perhaps several thousand cards. Some users of repeatable testing find it convenient to maintain their item pools on a master file and manipulate the pools with an updating or editing system. The item pool is organized by GENERATOR into a list, which is controlled by several tables. An important aspect of the overall efficiency of the CGRT process is that the entire pool is kept in directly addressable memory, since it will be repeatedly accessed in a random fashion. In a subsequent section we discuss ways to operate with the data kept primarily on secondary storage.

All items in a set are stored in logically adjacent positions in the list. The list of items is accessed by a vector of pointers to the origin in the list of each item. To distinguish sets of items, another vector contains set pointers to positions in the first vector, each set pointer specifying the index in the vector of the first item pointer for that set. The names of the items and of sets, as supplied by the instructor in his item pool, are not used in the item selection processes; only the position of a set in the pool and the position of an item in its set is relevant. Thus the items may be accessed by position indices rather than by name; this allows a very rapid retrieval of any item.

GENERATOR next reads the specifications for the tests to be produced from the pool of items. Required parameters are an examination unit number, the number of individualized tests, the number of copies of each such test, the number of questions to appear on each test, and a starting value for numbering of the individual tests. Among the optional specifications, in addition to the previously described weight and frequency factors, is the ability to specify randomized or ordered selection of sets during test generation.

Once the test specifications are known, GENERATOR produces a small output file (usually on cards) describing the structure of the item pool, the item identification, the item answers, and certain critical test parameters. This file, typically containing about twenty cards, will permit the later regeneration of the sequence of items (and answers!) on any given test, and thus will permit grading of student responses without the necessity of explicitly preserving the answers for each form. The amount of information which must be maintained between test preparation and scoring is therefore very small.
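Concretely, the answer summary needs to carry only the pool's shape and answers plus the critical test parameters, since the item text itself is never needed for scoring. A hedged sketch of its contents in modern form (the original was a deck of about twenty punched cards whose formats are not given in the paper):

    import json

    def write_answer_summary(path, exam_unit, pool, params):
        """Save what GRADER needs to regenerate any test's answer sequence:
        set structure (weights, frequencies), item identifications and
        correct answers, and the critical test parameters."""
        summary = {
            "exam_unit": exam_unit,
            "params": params,     # questions per test, set selection mode, starting test number, ...
            "sets": [{"weight": s["weight"],
                      "frequency": s["frequency"],
                      "items": [{"id": it["id"], "answer": it["answer"]}
                                for it in s["items"]]}
                     for s in pool],
        }
        with open(path, "w") as out:
            json.dump(summary, out, indent=1)

Because item selection is reproducible, as described below, nothing about any individual printed test has to be stored.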
The amount of information which must be maintained between test preparation and scoring is therefore very small. With these preliminaries attended to, GENERATOR then formulates and prints each test. The selection of items for a test involves two stages: selection of a sequence of sets, and then choosing of an item from each selected set. The instructor will have stipulated either random or ordered set selection, and may have provided frequencies for his sets. Randomized set orderings are effected using a pseudo-random number generator; ordered set selection requires that the sets be used in the order that the instructor presented them in the item pool. Item selection from within a chosen set is always randomized; of course, the item selection is performed without replacement, to assure that no item appears more than once on a given test. The designation of both set and item is achieved conveniently using indices to the set and item vectors previously described. For later scoring to be performed by regeneration of the answer sequence for each test, it is vital that the pseudo-random numbers used be reproducible. This is achieved by using a simple function of the test number as an initializing value for a random number generator. Using the indices to the selected items, G ENERATOR formats and writes the test onto a file for subsequent printing. See Reference 1 for an example of the format of a printed test. The test consists of a heading section followed by each item, with question body on the left and answer part on the right. The answer material will be removed by the instructor prior to administration of the exam, and will be given to the student upon his completion of the test. As one might imagine, the speed of test generation is very heavily dependent on the speed of the test formatting and output processes. As is shown in a later section, very dramatic improvements in program performance may be obtained by some rather simple (but unfortunately non-standard) manipulations of the FORTRAN output processes. Grader Scoring of student responses is performed by GRADER, whose program flow is given in Figure 2. Its first important act is to read and record the information contained on the small answer regeneration decks prepared by GENERATOR. Since the instructor may have prepared tests with several different exe- 643 READ ANSWER SUMMARY DECKS READ STUDENT RESPONSE REGENERATE ANSWERS FOR THIS TEST ___N_O___ rALL RESPONSES SCORED? YES SORT SCORES BY STUDENT PREPARE ROSTER OF SCORES Figure 2-GRADER program flow cutions of GENERATOR (and indeed may have changed items between runs), he may supply several answer regeneration decks to GRADER. The principal requirement for the success of this procedure is that no two different tests over an examination unit· have the same test numbers. This is easily arranged by the instructor at the time he prepares the tests. For meaningful scoring, the instructor will also see that each test has the same total point score. The answer information is structured in memory in a fashion very similar to that used by GENERATOR for the item pool, except· the text of the items is not present during grading. The student typically marks his responses on a mark sense form, which is subsequently reduced to a punch card by an optical form reader. 
An annoying phenomenon is inherent in many such operations because of inadequacies in form design and reader capability: the mark sense answer form may contain insufficient space for each item to uniquely record the range of possible answer characters, and some doubling up of spaces is required. This imposes a many-to-one 644 Fall Joint Computer Conference, 1972 mapping on the information that reaches the computer, and requires that the original item pool answers be similarly transformed. This transformation is described to GRADER in a series of FORTRAN data statements, and is imposed on the correct answer information prior to entering the scoring section of the program. A student's response to a test thus typically reaches the computer as a card containing his name or identification number, the individual test number, and his (possibly transformed) answers to the questions. Using the test number and the information supplied with the appropriate answer deck, GRADER performs an item selection identical to that done by GENERATOR when the test was prepared. With the sequence of correct answers at hand, GRADER scores the student's response, weighting each item appropriately. When all student responses have been scored, GRADER sorts the results according to student identification, and prints a roster of the scores. A student may have taken several tests over the same examination unit; his scores will be .listed together on the roster, ranked either according to score or test number, as dictated by the instructor. Several special features associated with the grading process are available. We mention them only briefly. A number of computer programs are in use for cumulative recording of examination scores and subsequent assignment of a course grade. We have a rather primitive but useful item analysis routine which assists the instructor in the detection and improvement of faulty items in his pool. An elaborate system of preediting of both correct answers and student responses is available. This allows very flexible alterations of "correct" answers and student responses, and permits the assignment of penalty points (e.g., for lateness), or the selective elimination or alteration of specific items, sets, tests, students, etc. We are working on methods of allowing optional and multiple answers to items, weighted appropriately. ENHANCEMENT OF PERFORMANCE The CGRT process was conceived and implemented as a production system-to be used repeatedly and routinely by many people. The efficiency and cost of the computing processes are thus important factors in the acceptance of the method both by instructors and by computing centers. Observations of early versions of the test generation program indicated that the development of the test output to be printed was requiring an uncomfortable amount of computer time. (Note that we are discussing central processor time for producing the test output and recording it on some appropriate device such as disk, drum, or tape; the printing of the output is inevitably a lengthy process, but does not place a significant burden on the central processor in modern buffered output systems.) Subsequent timing studies showed that the vast bulk of central processor time was spent in the formatting and outputting processes themselves, with only a small portion of the time spent in the logic of item selection and other computational and input-output activities. Most of the activity in test production is character manipulation of a very elementary kind. 
The elaborate FORTRAN formatting routines, which were being invoked for each line of output, were inappropriate for such simple but voluminous work. Furthermore, it was felt that too much time was being spent in the library routines which supported the FORTRAN write operation. As a result of these observations, a version of GENERATOR was prepared which (1) reduced the calls to the FORTRAN formatting routines virtually to zero, and (2) blocked the output internally in the program so as to reduce the calls to the FORTRAN output routines. This was made possible on our Control Data 3600 (later a CDC 6600) equipment by two non-standard but now fairly prevalent FORTRAN features: in-core formatting (specifically, the ENCODE feature), and direct input-output (specifically the BUFFER OUT feature). The former allows data in memory to be manipulated with the customary FORTRAN formatting routines and the result placed in memory, rather than inescapably on an output device. The latter feature permits the writing of an arbitrary amount of data in an arbitrary format onto a file without any editing by the FORTRAN library.

These features were used in the following way: Since the highly repetitive processes in GENERATOR were in the test printing section, all editing of information which would eventually be printed was moved forward in the program and performed (with the aid of the in-core formatting feature) at the earliest practical time. For instance, the text of each item in the pool could be immediately edited and stored to appear as printable lines by inserting the printer control character in the first character position of each line. Other formatting operations, such as formation of question numbers, were moved forward until the number of invocations of a format statement was reduced to one per test! This one involved the first line of the test, which contained the test number. The lines of generated output were collected in a buffer (a FORTRAN array) with appropriate line terminators appended, and under the management of a simple blocking routine were written periodically to the output file.

The results were astonishing, even to seasoned programmers. Table I gives timings for the standard FORTRAN version of GENERATOR and for our CDC 6600 modified FORTRAN version. The figures are central processor times for preparation of 1000 typical individualized tests, and do not include printing time or time spent waiting for the completion of physical output operations (the latter time is used in a multiprogramming system to run other programs). One observes an improvement factor of 27 in the crucial output section, with no appreciable cost in other sections of the program! The exact figure will vary with different computer systems, but the obvious conclusion transcends this particular project and this particular machine. Programming language designers and implementers please note.

TABLE I-Execution Times to Produce 1000 Tests on CDC 6600

                                      Standard FORTRAN   Modified CDC 6600
                                      version            FORTRAN version
  Total time in GENERATOR             326 sec.           18 sec.
  Time in output section              320                12
  Time in item selection section        4                 4
  Time in other parts of program        2                 2

The improvement in performance gained by the above steps changed the CGRT test production operation from one which placed uncomfortable demands on the central processor (which if nothing else usually results in poor turnaround) to one whose cost and performance were quite acceptable.
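ENCODE and BUFFER OUT have no exact counterparts in most later languages, but the shape of the optimization carries over: do the formatting once, when the pool is read, and hand the output system large blocks of finished lines rather than many small formatted writes. The sketch below restates the pattern in Python; it is an illustration, not the authors' program, and the carriage-control conventions shown are the usual line-printer ones rather than anything specified in the paper.

    def preformat(pool):
        """Edit each item into finished print lines once, at pool-reading time,
        carriage-control character included (the in-core formatting idea)."""
        return {item_id: [" " + line for line in lines]     # ' ' = single spacing on a line printer
                for item_id, lines in pool.items()}

    def write_tests(path, tests, printable, block=500):
        """Collect finished lines and write them in large blocks (the blocked,
        unedited output idea); only the heading line is formatted per test."""
        buffered = []
        with open(path, "w") as out:
            for test_number, item_ids in tests:
                buffered.append("1TEST NUMBER %6d" % test_number)   # '1' = start a new page
                for item_id in item_ids:
                    buffered.extend(printable[item_id])
                if len(buffered) >= block:
                    out.write("\n".join(buffered) + "\n")
                    buffered.clear()
            if buffered:
                out.write("\n".join(buffered) + "\n")

On a modern runtime the write call is already buffered by the library, so most of the remaining gain comes from formatting each line exactly once.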
Readers are referred to Reference 1 for a discussion of the economics of the CGRT process.

"MACHINE-INDEPENDENT" FORTRAN PROGRAMS?

The resounding effect of the improvements cited in the previous section has posed a dilemma. One of us (FP) has had considerable experience in developing "machine-independent" FORTRAN programs, and has been an enthusiastic advocate of such coding practices, wherever realistic. Yet the value to the CGRT project of our non-standard practices could not be denied. To compound the problem, there developed considerable interest in the CGRT scheme on the part of teachers and computer people at numerous other locations. We have developed standard FORTRAN IV versions of GENERATOR and GRADER that are specifically designed to be readily adaptable to most medium and large computer systems. By and large, the syntactic problems in such an effort are minor; virtually everyone has a well-maintained standard FORTRAN compiler at his disposal. However, there are a number of serious semantic difficulties. Although this is not the place to embark on a catalog of possible FORTRAN machine dependencies, several of the problems and our solutions (or lack of them) should be mentioned.

The first awkward problem is the packing of characters into words during input and output operations. Different computers have different word sizes, which accommodate various numbers of characters, and the FORTRAN format statements should reflect this fact. Packing is essential in this project, since the item pool frequently contains a large number of characters. Our solution was to provide for each input and output operation a set of read or write and format statements for each common number of characters per word, and to select the proper read/format or write/format pair by a branch governed by a variable indicating the number of characters per word. The user thus is required to set only a single variable at the beginning of the program.

A second problem is the shifting of characters within words. The usual multiplications or divisions by powers of two inevitably fail on some machine (e.g., the CDC 6600). We reduced the problem in these particular programs to the need to move a character from the leftmost end of a word to the rightmost end. We request the user to supply his own shift routine to accomplish this act, and to replace our routine, which is designed for the CDC 6600.

The generation of acceptable pseudo-random numbers on machines with small word size is difficult to generalize. However, almost every installation will have a random number generator in its library, and we ask that the user replace our generator with a call to his own routine.

These are easy and acceptable (if inelegant) solutions to several tedious problems in producing portable FORTRAN programs. There remains the more difficult matter of converting the output in GENERATOR to reduce the dependence on FORTRAN output procedures. With the present inadequate standard FORTRAN, there is no good solution. Our choice was to provide the straightforward FORTRAN program, with copious comments illustrating what one should wish to accomplish with the particular in-core formatting and blocking mechanisms available at his installation.

CGRT WITH SMALL COMPUTERS

CGRT has aroused the interest of teachers in colleges and high schools that have access to small computing systems. We have received many inquiries about the availability of programs for such machines.
The IBM 1130 seems to be most widely used in this environment. Therefore we have developed a version of CGRT for a minimal IBM 1130 system equipped with card reader and punch, disk, line printer, and 8K of main memory. At the time of writing, the programs for generating and scoring tests have just completed field test and are ready for distribution.

The tactics employed to implement CGRT on small computers are significantly different from those previously described, although the result is similar. The FORTRAN language available is typically ANSI Basic FORTRAN, a minimal language with few frills. The minimum 8K words of main memory will accommodate on the order of 16,000 characters. Since a modest pool of test items will need over 100,000 characters, the pool must be kept almost entirely on secondary storage. Further, a typical test of three pages will fill main memory, and thus test output must be disposed of promptly. The computer is too small to provide a spooling mechanism (one in which printable material is written to a secondary storage file for later printing), so printing occurs on-line at the time of execution of the program. Since the volume of print is large, and since the typical line printer available on such a system operates at a rather slow speed (80 lines per minute for an IBM 1132), one anticipates that the limiting factor in test generation on the small machine will be the speed of the printer.

On the basis of these indications, our IBM 1130 version of the test generation program was designed to keep the item pool entirely on the disk, using main memory for programs and for organizational data such as the vector of (disk) pointers for each item. Items are selected for inclusion on a test using algorithms similar to those described in an earlier section. The items are then obtained from the disk in the proper order for printing, and are immediately formatted and printed. The expectation that the test generation process would be limited by the speed of the line printer was confirmed for the IBM 1132 printer. However, on a machine with a 600 line per minute printer, the operation was then limited by the disk access time. We are now developing algorithms to minimize disk accesses so as to again make such faster printers the limiting resource in test generation.

PRESENT STATUS AND PROJECTED DEVELOPMENTS

Computer Generated Repeatable Testing has been a useful adjunct to the instructional process at Indiana University for several years, and more recently at other institutions. A number of uses of repeatable testing have suggested themselves. The principal use has been for testing in college-level classes, frequently but not exclusively in large sections. Repeatable testing in the classroom, at scheduled times outside class, and in a more flexible student-scheduled environment have each proven effective for various instructors. Repeatable testing is well suited for make-up examinations and for administering special tests to allow advanced placement. There are potential applications to correspondence course work, and in regional and national testing centers. Using the repeatable test as a tutorial device, in which the student may take tests simply as a study aid, has become a very important aspect of our service. For those instructors with available item pools or who are willing to develop items in the necessary quantity, repeatable testing has frequently been a great aid to effective teaching.
There are now the beginnings of a coordinated effort to publicize the availability of test items in machine readable form. At the instigation of Gerald Lippey of IBM Corporation (San Jose, California 95114), a meeting was held in San Jose in January 1972 of a small group of people who had worked in the area of mechanized test item banking or computer facilitated testing. As a result of this meeting, Lippey has made a good start toward ascertaining the type and extent of available pools of test items. Through such efforts, and through the generosity of item writers, banks of items may in the future be more readily available than in the past, and each instructor will not be faced with the task of developing his own complete item pool. At Indiana University, instructors have developed and used repeatable testing item pools in English, geography, home economics, chemistry, economics, statistics, psychology, speech therapy, accounting, education, and others. Usage of CGRT at Indiana University encompasses about ten courses a semester, with over 40,000 individualized tests printed for over 2000 students. Our plans are for expansion of the CGRT facility in several areas. We are working on schemes for student recording (and mechanized recovery) of multicharacter answers, to accommodate those instructors who find the single-character answer restriction to be unacceptable. We would like to allow multiple or optional answers to items, with appropriate weights for allocating part credit. There is interest in a more comprehensive item analysis package. And several people are working on the automatic generation of test items. Repeatable testing programs are available for distribution. The standard FORTRAN version (for medium and large computer systems), the specialized CDC Computer Processes in Repeatable Testing 6600 FORTRAN version, and the IBM 1130 FORTRAN version may be obtained from the authors. If you wish to investigate the CGRT method, please write for documentation and instructions for requesting the programs. 647 REFERENCE 1 F PROSSER D D JENSEN Computer generated repeatable tests Proceedings of 1971 Spring Joint Computer Conference pp295-301 AMERICAN FEDERATION OF INFORMATION PROCESSING SOCIETIES, INC. (AFIPS) AFIPS OFFICERS and BOARD OF DIRECTORS President Vice President Mr. Walter L. Anderson General Kinetics, Inc. 12300 Parklawn Drive Rockville, Maryland 20852 D. Robert A. Kudlich Raytheon Co., Equipment Division Wayland Laboratory Boston Post Road Wayland, Massachusetts 01778 Secretary Treasurer Mr. Richard B. lUue, Sr. TRW Systems Group Scientific Data Processing Lab. One Space Park-R3/1098 Redondo Beach, California 90278 Mr. George Glaser McKinsey and Company, Inc. 3000 Sand Hill Road Menlo Park, California 94025 Executive Director Dr. Bruce Gilchrist AFIPS 210 Summit Avenue Montvale, New Jersey 07645 ACM Directors Dr. Anthony Ralston SUNY at Buffalo Computer Science Department 4226 Ridge Lea Road Amherst, N ew York 14226 Mr. Donn B. Parker Stanford Research Institute 333 Ravenswood Avenue Menlo Park, California 94025 Mr. Herbert S. Bright Computation Planning, Inc. 5401 Westbard Avenue, Suite 520 Washington, D.C. IEEE Directors Dr. A. S. Hoagland IBM Corporation Dept. 29A-Building 021 P.O. Box 1900 Boulder, Colorado 80302 Professor Edward J. McCluskey Stanford University Department of Electrical Engineering Palo Alto, California 94305 Dr. S. S. 
Yau Department of Electrical Engineering Stanford University Palo Alto, California 94305 Simulations Council Director Association for Computation Linguistics Director Mr. Frank C. Rieman Electronic Associates, Inc. P.O. Box 7242 Hampton, Virginia 23366 Dr. A. Hood Roberts Center for Applied Linguistics 1717 Massachusetts Avenue, N.W. Washington, D.C. 20036 American Institute of Aeronautics and Astronautics Director American Institute of Certified Public Accounts Director Mr. Frank Riley, Jr. Auerbach Corporation 1501 Wilson Boulevard Arlington, Virginia 22209 Mr . Noel Zakin AICPA 666 Fifth Avenue New York, New York 10019 American Statistical Association Director American Society for Information Science Director Dr. Mervin E. Muller 5303 Mohican Road Mohican Hills Washington, D.C. 20016 Mr. Herbert Koller ASIS 1140 Connecticut Avenue, N.W" Suite 804 Washington, D.C. 20036 Instrument Society of America Director Society for Industrial and Applied Mathematics Director Mr. Theodore J. Williams Purdue Laboratory for Applied Industrial Control Purdue University Lafayette, Indiana 47907 Dr. D. L. Thomsen IBM Corporation Armonk, New York 10504 Society for Information Display Director Special Librar'ies Association Director Mr. William Bethke Rome Air Development Center RADC (IS, W. Bethke) Griffiss AFB, New York 13440 Mr. Herbert S. White Institute for Scientific Information 325 Chestnut Street Philadelphia, Pennsylvania 19105 Associat'ion for Educational Data Systems Director Dr. Sylvia Charp Director of Instructional Systems The School District of Philadelphia Board of Education 5th and Luzerne Streets Philadelphia, Pennsylvania JOINT COMPUTER CONFERENCE BOARD President A aM Representative Mr . Walter L. Anderson General Kinetics, Incorporated 12300 Parklawn Drive Rockville, Maryland 20852 Dr. Herbert R. J. Grosch National Bureau of Standards Center for Computer Science Washington, D.C. 20234 V ice President IEEE Representative Dr. Robert A. Kudlich Raytheon Company, Equipment Division Wayland Laboratory Boston Post Road Wayland, Massachusetts 01778 Dr. S. S. Yau Department of Electrical Engineering The Technological Institute Northwestern University Evanston, Illinois 60201 Treasurer saI Representative Mr. George Giaser McKinsey and Company, Iuc. 3000 Sand Hill Road Menlo Park, California 94025 Mr. Paul W. Berthiaume Electronic Associates, Inc. 185 Monmouth Park Highway West Long Branch, New Jersey 07764 JOINT COMPUTER CONFERENCE COMMITTEE JOINT COMPUTER CONFERENCE TECHNICAL PROGRAM COMMITTEE Mr. Jerry L. Koory, Chairman H-W Systems 525 South Virgil Los Angeles, California 90005 Mr. Henry S. MacDonald, Chairman Bell Laboratories Murray Hill, New Jersey 07971 1973 NATIONAL COMPUTER CONFERENCE CHAIRMAN Dr. Harvey Garner Director Moore School of Electrical Engineering University of Pennsylvania Philadelphia, Pennsylvania 17104 1972 FJCC STEERING COMMITTEE Chairman Local Arrangements Robert Spinrad Xerox Corporation Antonia Schuman Litton Industries Technical Program Printing and Mailing Donald A. Meier National Cash Register Katherine Jamerson Computer Sciences Corp. Secretary Exhibits Harold Sarkissian Major Data Corp. A. Luke Ward San/Bar Electronics Corp. Public Relations Controller Allen T. LeAnce LeAnce and Associates H~ward Verne Hughes Aircraft Co. Special Activities Registration Patricia Riley TRW Systems Fred Gruenberger San Fernando Valley State College TECHNICAL PROGRAM COMMITTEE Chairman Speaker A rrangements Director Mr. Donal A. Meier National Cash Register Mr. 
Lynn Maxson IBM Corporation Liaison & Review Coordinator Vice-Chairman Dr. Harold Petersen RAND Corporation Mr. Wolfgang G. Pfeiffer National Cash Register Publication Director Mr. Russell Bennett Burroughs Corporation SESSION DIRECTORS A nalysis and Simulation Director Dr. Ray Nilsen University of California, Los Angeles Interdisciplinary Director Users and Applications Director Mr. Ross F. Penne University of Southern California Users and Applications Assoc. Dir. Mr. Lowell Amdahl Compata, Inc. Dr. Arnold F. Goodman McDonnell-Douglas Astronautics Softwm'e Director Hardware Director Dr. Richard R. Muntz University of California, Los Angeles Systems and A rchitecture Director Mr. Harut Barsamian National Cash Register Mr. Jack Pariser Hughes Aircraft Co. SESSION CHAIRMEN, REVIEWERS AND PANELISTS SESSION CHAIRMEN Baker, Frank Balzer, Robert M. Barsamian, Harut Bekey, George Boehm, Barry Chen, T. C. Chu, Wesley W. Denning, Peter J. Farber, David J. Fetter, William A. Flynn, Michael J. Gaines, Eugene C., Jr. Gentile, Richard B. Golub, Eugene Goodman, Arnold Hamming, Richard W. Hollander, G. Hunter, Kenneth Husson, Samir Juncosa, M. L. Kimbleton, Stephen Kiviat, Philip J. Lyon, JohnK. McCluskey, E. J. McManus, Jack MeN amee, Laurence Mason, Maughn Mills, Harlan Mitchell, Gordon Montgomery, Christine Morgan, Howard Newport, Christopher Patrick, Robert Penne, Ross F. Phister, Montgomery, Jr. Pinkerton, Tad Reinstedt, Robert Stefferud, Einar Taplin, Janet M. Turn, Rein Waxman, Ronald Weissman, Clark Wilson, Jon C. REVIEWERS Alberts, A. Alrich, J. C. Anderson, H. M. Anderson, R. Arndt, F. Arnovick, G. Astrahan, M. Augustin, D. C. Avizienis, A. Ball, N.· Barlow, A. E. Becker, P. Bell, T. E. Bernstein, W. A. Biener, J. W. Bloomfield, J. Boehm, B. W. Borgsahl, R. Bork, A. Branch, R. Branin, F. Brereton, T. B. Brown, A. B. Calhoon, D. Canova, G. Cardwell, D. Carlson, G. Carroll, J. Carter, W. C. Chen, T. C. Chernak, J. Cheydler, B. F. Choma, J. Jr. Chu, W. W. Climenson, W. D. Copp, D. H. Courtney, R. Cowell, W. Critchlow, A. Csuri, C. Dale, A. Dalrymple, S. H. Darms, D. Dittberner, D. Dorr, F. W. Duggan, M. Durney, A. I. Eccles, W. Edwin, L. Eisenstark, R. Farmer, N. A. Feurzeig, W. Feustel, E. A. .Fiefant, R. Firschein, O. Fletcher, J. Frank, H. Freilich, A. Frost, C. R. Fuches, E. Fulton, L. M. Gardner, R. Gentile, R. Gillette, G. Gilliland, B. Gold, M. Goodman, A. F. Gosden, J. Gotterer, M. Grandmaison, J. Grau, A. Grobstein, D. Gulick, L. R. Jr. Hagenstad, M. T. Hamilton, D. C. Hammer, C. Hamming, R. W. Hammond, F. Hanson, R. J. Harper, S. Harrison, R. L. Hartwick, R. Hendrie, G. Herr, W. B. Heterick, R. Jr. Hixon, J. Hoffman, L. Hootman, J. T . Humphrey, R. Hunt, E. Hunter, 'K. W. Hutt, A. E. Hyman, M. Isaksen, L. Ito, R. A. Jackson, H. L. Jeffrey, S. J ellinek, 1. Jenkins, J. M. Joseph, E. KaltmanJ A. Karplus, W. J. Kay,A. Keenan, T. Kernighan, B. W. Kerr, D. V. Kimbleton, S. R. Klein, E. Kleinrock, L. Klinger, A. Klotz, D. A. Knight, K. Koory, J. Kosinski, W. Kurasch, C. Kuhns, J. L. Lange, L. Larkin, R. Larson, K. Lasser, D. J. Ledley, R. Leffler, N. Leichner, G. H. Levine, L. Lewis, W. Lindloon, E. Linville Liskov, B. Loewe, R. T. Logan, R. S. Losleben, P. Luderer, G. W. R. Lum, V. Madden, J. Markel, R. Marks, H. Martin, W. Mathison, S. Mathur, F. Mayper, V. McCracken, D. McGovern, W. Mclssac, D. McMurran, M. N. Meier, D. Mekota, J. Mergenweck, Meuller, M. MichIe, M. Miller, S. Miller, W. G. Mills, H. Minker, J. Mitchell Mittman, B. Moler, C. B. Morterana Myers, R. Nance, R. E. Nicols, A. J. 
Niedrauer, R. V. Nielsen, R. O'Brien, J. Ofek, H. Oliver, P. Onovec Onyshkevych Opderbeck, H. Ostapko, D. Owens, J. Pariser, J. Parker, D. Patel, A. Patrick, R. L. Penne, R. Petersen, H. Phillips, T. D. Pohm, A. Pomerene, S. Postel, J. Price Prokop, J. Rajaraman, A. S. Ramamoorthy, C. Ray,L. Reynolds, C. Rhodes Rick, J. W. Rigney, J. Ripley, G. Robinson, J. Robinson, L. Rodriguez, R. Rosenbaum, S. Rosenberg, A. M. Rosenthal, M. Rutman, R. Saal, H. St. John, D. Schafer, E. Schell, R. Schichman, H. Schieldge, J. Schischa, E. Schneidewind, N. Schultz, M. H. Sedelow, W. Schechter, J. Short, G. E. Silvern, L. Singh, S. Skelly, P. G. Small, D. L. Smith, C. Smith, R. A. Southworth, R. W. Steenbergen, H. Stefferud, E. Stephenson, J. W. Stewart, R. M. Sturm, W. Su, S. Y. H. Summit, R. Sutherland, W. Svoboda, A. Sykes, D. Taylor, R. Thomas, R. T. Tseng, C. Tucker, S. Uhlig, R. H. Uttal, W. Van Tassel, D. Walker, D. E. Watson, R. A. Watt, W. C. Weeg, G. P. Wegbreit, B. Weiss, E. Weissman, C. Werner, J. J. Jr. Wersan, S. Whitney, D. Wiederhold, G. Wigington, R. Wiggins Wilkov, R. S. Williams, J. G. Williams, L. Williams, T. J. Wilner, W. Wilson, J. Wolf, E. W. Wright, K. Wyllys, R. Yakowitz, S. Yelvington, S. Young, J. Zelkovitz, M. PANELISTS AND SPEAKERS Donald Aufenkamp, N .S.F. A. Avizienis, University of Southern California John Bacon, United California Bank Max Beere, Tymshare Barry Boehm, RAND Corporation Robert Brass, Xerox Barry Brotman, Allied Chemical Corporation Gary Carlson, Brigham Young University Leo Cohen, Consultant David Copp, Bell Telephone Laboratories Stephen Crocker, Department of Defense John Davis, TESDATA Systems Corp. Lt. Col. Phillip Enslow Jr., Office of Telecommunications Policy, Executive Office of the President David Evans, Evans and Sutherland John Farquhar, RAND Corporation Nick Finamore, Western Electric H. Fleisher, IBM Corporation L. Garrett, Motorola Robert Gordon, Consultant P. F. Gudenschwager, Honeywell Richard Hamming, Bell Telephone Laboratories Cdr. Grace Murray Hopper USNR Richard Johnson, Stanford University Computation Center Robert Johnson, Burroughs Corporation V. Kahan, University of California at Berkeley Robert Kahn, Bolt, Beranek and Newman, Inc. E. Mahoney, United States General Accounting Office C. H. Mays, Fairchild John McCarthy, Stanford University M. Douglas McIlroy, Bell Telephone Laboratories Harry Mergler, Case Western Reserve University Capt. M. Morris, Federal ADP Simulation Center Mervin Muller, International Bank for Reconstruction and Development Peter Newcombe, Brigham Young University Nils Nilsson, Stanford Research Institute A. Patel, IBM Corporation Alan Perlis, Yale University Charles Perry, McDonnel-Douglas Astronautics Tom Poole, United Computer Systems C. Ramamoorthy, University of Texas Louis Robinson, IBM Corporation Arthur Rosenberg, Informatics Capt. Paul Roth, Fleet Combat Direction Systems Support Activity Stephen Y. Su, University of Southern California Lee Talbert, Packet Communications, Inc. L. C. Thomas, Bell Telephone Laboratories D. E. Walker, S.R.I. P. Weber, Lane County Mark Wells, Los Alamos Scientific Laboratory James Williams, United States General Accounting Office Joe Wineke, Compress, Inc. M. Worthy, Operating Systems Gordon Zeller, Los A ngeZes Times PRELIMINARY LIST OF EXHIBITORS Addison-Wesley Publishing Company, Inc. 
Addmaster Corporation Addressograph Multigraph Corporation AFIPS Press American Elsevier Publishing Company American Telephone & Telegraph Ampex Corporation Anaheim Publishing Company Ansul Company Basic Timesharing, Inc. Beehive Terminal Bridge Data Products, Inc. Burroughs Corporation Caelus Memories, Inc. Centronics Century Electronics and Instruments Cipher Data Products Codex Corporation Collins Radio Company ComData Corporation Computer Access Systems, Inc. Computer Automation, Inc;. Computer Copies Corporation Computer Design Publishing Corporation Computer Machinery Corporation Controls Research Corporation Courier Terminal Systems, Inc. Data Disc, Inc. Data General Corporation Datamation Data Printer Corporation Datapro Research Corporation Data Products Corporation Dataram Corporation Datawest Corporation Datum, Inc. Diablo Systems, Inc. Digital Computer Controls, Inc. , Digital Development Corporation Documation, Inc. DuPont Company Eastman Kodak Company Electronic Engineering Company of California Electronic News, Fairchild Publications Facit-Odhner, Inc. Federal Screw 'Vorks Floating Point Systems, Inc. General Automation, Inc. GTE Lenkurt G-V Controls Hayden Publishing Company, Inc. Hewlett-Packard Company Honeywell Computer Journal Houston Instrument IMSL Inforex, Inc. Information Data Systems, Inc. Infosystems Infoton, Inc. Intel Corporation International Communications Corporation International Computer Products, Inc. Kennedy Company Kybe Corporation Lipps, Inc. Litton ABS OEM Products Lorain Products Corporation Marubeni America Corporation Microdata Corporation Micro Switch Milgo Electronic Corporation Miratel Divisiqn-Ball Brothers Research Corp. Modern Data Mohawk Data Sciences Corporation Northern Electric Company, Ltd. Nortronics Company, Inc. Olympia USA, Inc. Ovonic Memories, Inc. Panasonic Paradyne Corporation Pertec Corporation Pioneer Electronics Corporation Pioneer Magnetics, Inc. Potter Instrument Company, Inc. Prentice Hall, Inc. Printer Technology, Inc. Producers Service Corporation Radley Associates Limited Randomex, Inc. Raymond Engineering, Inc. Raytheon Service Company Redactron Corporation Remex, A unit of Ex-Cell-O Corporation Sangama Electric Company Signal Galaxies, Inc. The Singer Company Sycor, Inc. Sykes Systems Furniture Company Tally Corporation Techtran Industries, In~. Tekronix, Inc. Tele-Dynamics Teleprocessing Industries, Inc. Teletype Corporation Texas Instruments, Inc. Toko, Inc. Tri-Data Corporation Van San Corporation Vector General, Inc. VelD-Bind Wangco, Inc. John Wiley and Sons, Inc. Xerox Corporation AUTHOR INDEX Albus, J. S., 1095 Alexandridis, N., 1057 Altshuler, G. P., 1133 Anacker, W., 1269 Anderson, J. A., 703 Atwood, J. W., 331 Augusta, B., 1261 Avizienis, A., 1057 Bailey, P. T., 1279 Baird, G., 819 Baker, F. B., 661 Baker, L. H., 147 Baker, F. T., 339 Barr, W. J., 755 Baskett, F., 13 Bauer, W. F., 993 Bell, C. G., 765,779 Bell, T. E., 287 Beltz, G. E., 1009 Bernhart, W. D., 169 Berra, P. B., 867 Blaskovics, T. L., 611 Boehn, B. W., 1141 Booth, G. M., 1025 Borgerson, B. R., 89 Boruch, R. F., 425 Bowdon, E. K., Sr., 755 Brown, J. R., 181 Brown, K. M., 1309 Browne, J. C., 13 Buckner, D. C., 153 Bullen, R. H., Jr., 479 Burk, J. M., 263 Burns, R. S., 153 Calahan, D. A., 885 Carroll, J. M., 445 Casasent, D., 709 Chandy, K. M., 55 Chang, S. K., 461 Chen, T. C., 1045 Christensen, G., 561 Chu, W. W., 597 Clarke, L. C., 393 Cofer, R. H., 135 Cohen, G. H., 407 Concus, P., 1303 Conn, R. B., 1057 Cosell, B. P., 741 Cowan, A., 55 Cronin, H. 
F., 1037 Crowther, W. R., 741 Cureton, H., 965 Curtice, R. M., 1105 Cutts,R., 473· Dana, C., 1111 De Cegama, A., 299 De Mercado, J., 553 Denning,P. J., 611 Derksen, J., 1181 Di Palma, R.,537 Dmytryshak, C. A., 525 Doty, K. L., 691 Down, N. J., 1243 Ellis, M. E., 1117 Feldman, J. A., 1193 Fichten, J. A., 1017 Fitzsimons, R. M., 255 Foster, D. F., 1235 Freedy, A., 1089 Freeman, P., 779 George, A., 1317 Glaser, E. L., 1045 Goodman, A., 669, 1163 Grace, H. A., 1257 Grampp, F. T., 105 Grobstein, D. L., 889 Grushcow, M. S., 331 Haney, F. M., 173 Hansler, E., 49 Harris, B., 415 Harroun, T. V., 1261 Haynes, H., 473 Healey, L. D., 691 Heart, F. E., 741 Hench, R. R., 1235 Hice, G. F., 537 Hoagland, A. S., 985 Holt, R. C., 331 Holt, R. M., 1069 Hoover, L. R., 375 Horning, J. J., 331 Hsiau, M. Y., 83 Hull, F., 1089 Huskey, H., 473 Jensen, E., 719 Jones, W. C., 545 Jones, P. D., 561 Jung, D. C., 123 Karplus, W. J., 385 Katke, W., 1117 Katzenelson, J., 515 Kaubisch, J., 473 Kesel, P. G., 393 Kimbleton, S., 1163 Kossiakoff, A., 923 Kreitzberg, C. B., 115 Kuck, D. J., 213 Kutsch, J. A., Jr., 611 Laitinen, L., 473 Lan, J., 13 Levitt, K. N., 33 Linden, T. A., 201 Lipovski, G. J., 691 Lippey, G., 633 Liskov, B. H., 191 Lou, J. R., 1089 Low, J. R., 1193 Lyman, J., 1089 Lynch, J. P., 161 McAuliffe, G., 49 McDermott, D. V., 1171 McQuillan, J. M., 741 Maestri, G. H., 273 Mandell, R. L., 453 Martins, G. R., 801 Mathur, F. P., 65 Merten, A., 849 Milgrom, E., 515 Millen, J. K., 479 Minnick, R. C., 1279 Minsky, N., 587 Mommens, J. H., 461 Moe, M. L., 1081 Morenoff, E., 393 Morgan, M. G., 1243 Mori, R., 353 Murphy, D. L., 23 Naito, S., 345 N akhnikian, J., 641 Nanya, T., 345 Needham, R. M., 571 N ezu, K., 345 Nutt, G. J., 279 Ohmori, K., 345 Olson, J., 1117 Opderbeck, H., 597 Orlandea, N., 885 Orlando, V. A., 859 Parhami, B., 681 Parnas, D. L., 325 Parrett, G. H., 1251 Patel, A. M., 83 Pendray, J. J., 97 Plagman, B. K., 1133 Pomerene, J. H., 977 Presser, L., 1111 Prosser, F., 641 Raamot, J., 867 Ramamoorthy, C. V., 55 Roland, R. D., 161 Rose, C. W., 311 Rosenberg, A. M., 993 Rothman, S., 423 Rudolph, J. A., 229 Rulifson, J. F., 1181 Ruud, R., 949 Sadler, R. W., 1243 Sager, N., 791 Sandfort, R. M., 1279 Schneidewind, N. F., 837 Schoonover, J. E., 263 Schultz, G. W., 1069 Schwartz, J. T., 1081 Semon, W. L., 1279 Sevick, K. C., 331 Shapiro, N. Z., 435 Singh, S., 367 Sleight, T. P., 923 Spirn, J. R., 611 . Sterling, W., 709 Stone, P. J., 811 Strauss, J. C., 1225 Stucki, L. G., 829 Sussman, G. J., 1171 Swinehart, C., 1193 Szygenda, S. A., 875 Taylor, R. H., 1193 Teichroew, D., 1203 Teitelman, W., 917 Teorey, T. J., 1 Thompson, E. W., 875 Thurber, K., 719 Tobias, M. J., 1025 Tollkuhn, G., 473 Tou, J. T., 135 Tracey, J. H., 375 Tsiang, S. H., 545 Tsichritzis, D., 331 Tucker, E. K., 147 Turn, R., 435 Uhlig, R. P., 889 Varah, J. M., 1299 Vickers, F. D., 649 Walden, D. C., 741 Waldinger, R. J., 1181 Walter, C. N., 407 Warner, C. D., 959 Watson, J., 229 Watson, R. A., 1141 Waxman, R., 367 Way, F., III, 1045 Webb, J. H., 115 Wegbreit, B., 905 Weltman, G., 1089 Wensley, J. H., 243 Wesley, M. A., 461 Wilkes, M. V., 971 Wilkov, R., 49 Williams, L. H., 899 Williams, T. G., 499 Wilner, W. T., 489, 579 Wolman, B. L., 507 Wulf, W. A., 943 Yang, S. C., 1117 Yarwood, E., 473