AFIPS CONFERENCE PROCEEDINGS
VOLUME 41 PART II

1972 FALL JOINT COMPUTER CONFERENCE
December 5-7, 1972, Anaheim, California

The ideas and opinions expressed herein are solely those of the authors and are not necessarily representative of or endorsed by the 1972 Fall Joint Computer Conference Committee or the American Federation of Information Processing Societies, Inc.

Library of Congress Catalog Card Number 55-44701

AFIPS PRESS
210 Summit Avenue
Montvale, New Jersey 07645

©1972 by the American Federation of Information Processing Societies, Inc., Montvale, New Jersey 07645. All rights reserved. This book, or parts thereof, may not be reproduced in any form without permission of the publisher.

Printed in the United States of America

CONTENTS PART II

Cognitive and creative test generators (649) - F. D. Vickers
A conversational item banking and test construction system (661) - F. B. Baker

MEASUREMENT OF COMPUTER SYSTEMS-EXECUTIVE VIEWPOINT
Measurement of computer systems-An introduction (669) - A. Goodman

ARCHITECTURE-TOPICS OF GENERAL INTEREST
A highly parallel computing system for information retrieval (681) - B. Parhami
The architecture of a context addressed segment-sequential storage (691) - L. D. Healy, K. L. Doty, G. Lipovski
A cellular processor for task assignments in polymorphic multiprocessor computers (703) - J. A. Anderson
A register transfer module FFT processor for speech recognition (709) - D. Casasent, W. Sterling
A systematic approach to the design of digital bussing structures (719) - K. Thurber, E. Jensen

DISTRIBUTED COMPUTING AND NETWORKS
Improvement in the design and performance of the ARPA network (741) - J. McQuillan, W. Crowther, B. Cosell, D. Walden, F. E. Heart
Cost effective priority assignment in network computers (755) - E. K. Bowdon, Sr., W. J. Barr
C.mmp-A multi-mini processor (765) - W. A. Wulf, C. G. Bell
C.ai-A computer architecture for multiprocessing in AI research (779) - C. G. Bell, P. Freeman

NATURAL LANGUAGE PROCESSING
Syntactic formatting of science information (791) - N. Sager
Dimensions of text processing (801) - G. R. Martins
Social indicators from the analysis of communication content (811) - P. J. Stone

MEASUREMENT OF COMPUTER SYSTEMS-SOFTWARE VALIDATION AND RELIABILITY
The DOD COBOL compiler validation system (819) - G. Baird
A prototype automatic program testing tool (829) - L. G. Stucki
An approach to software reliability prediction and quality control (837) - N. Schneidewind
The impact of problem statement languages in software evaluation (849) - A. Merten, D. Teichroew

COMPUTER AIDED DESIGN
The solution of the minimum cost flow network problem using associative processing (859) - V. A. Orlando, P. B. Berra
Minicomputer models for non-linear dynamic systems (867) - J. Raamot
Fault insertion techniques and models for digital logic simulation (875) - S. Szygenda, E. W. Thompson
A program for the analysis and design of general dynamic mechanical systems (885) - N. Orlandea, D. A. Calahan

COMPUTER NETWORK MANAGEMENT
A wholesale retail concept for computer network management (889) - D. L. Grobstein, R. P. Uhlig
A functioning computer network for higher education in North Carolina (899) - L. H. Williams
SYSTEMS FOR PROGRAMMING
Multiple evaluators in an extensible programming system (905) - B. Wegbreit
Automated programmering-The programmer's assistant (917) - W. Teitelman
A programming language for real-time systems (923) - A. Kossiakoff, T. P. Sleight
Systems for system implementors-Some experiences from BLISS (943) - W. A. Wulf

MEASUREMENT OF COMPUTER SYSTEMS-MONITORS AND THEIR APPLICATIONS
The CPM-X-A systems approach to performance measurement (949) - R. Ruud
System performance evaluation-Past, present, and future (959) - C. D. Warner
A philosophy of system measurement (965) - H. Cureton

HISTORICAL PERSPECTIVES
Historical perspectives-Computer architecture (971) - M. V. Wilkes
Historical perspectives on computers-Components (977) - J. H. Pomerene
Mass storage-Past, present, future (985) - A. S. Hoagland
Software-Historical perspectives and current trends (993) - W. F. Bauer, A. M. Rosenberg

INTERACTIVE PROCESSING-EXPERIENCES AND POSSIBILITIES
NASDAQ-A real time user driven quotation system (1009) - G. E. Beltz
The Weyerhaeuser information systems-A progress report (1017) - J. P. Fichten, M. J. Tobias
The future of remote information processing systems (1025) - G. M. Booth
Interactive processing-A user's experience (1037) - H. F. Cronin

IMPACT OF NEW TECHNOLOGY ON ARCHITECTURE
The myth is dead-Long live the myth (1045) - E. Glaser, F. Way III
Distributed intelligence for user-oriented computing (1049) - T. C. Chen
A design of a dynamic, fault-tolerant modular computer with dynamic redundancy (1057) - R. B. Conn, N. Alexandridis, A. Avizienis
MOS LSI minicomputer comes of age (1069) - G. W. Schultz, R. M. Holt

ROBOTICS AND TELEOPERATORS
Control of the Rancho electric arm (1081) - M. L. Moe, J. T. Schwartz
Computer aiding and motion trajectory control in remote manipulators (1089) - A. Freedy, F. Hull, G. Weltman, J. Lyman
A robot conditioned reflex system modeled after the cerebellum (1095) - J. S. Albus

DATA MANAGEMENT SYSTEMS
Data base design using IMS/360 (1105) - R. M. Curtice
An information structure for data base and device independent report generation (1111) - C. Dana, L. Presser
SIMS-An integrated user-oriented information system (1117) - M. E. Ellis, W. Katke, J. R. Olson, S. Yang
A data dictionary/directory system within the context of an integrated corporate data base (1133) - B. K. Plagman, G. P. Altshuler

MEASUREMENT OF COMPUTER SYSTEMS-ANALYTICAL CONSIDERATIONS
Framework and initial phases for computer performance improvement (1141) - T. Bell, B. Boehm, R. Watson
Core complement policies for memory migration and analysis (1155) - S. Kimbleton
Data modeling and analysis for users-A guide to the perplexed (1163) - A. Goodman

TECHNOLOGY AND ARCHITECTURE (Panel Discussion-No Papers in this Volume)

LANGUAGES FOR ARTIFICIAL INTELLIGENCE
Why conniving is better than planning (1171) - G. J. Sussman, D. V. McDermott
The QA4 language applied to robot planning (1181) - J. A. Derksen, J. F. Rulifson, R. J. Waldinger
Recent developments in SAIL-An ALGOL-based language for artificial intelligence (1193) - J. A. Feldman, J. R. Low, D. C. Swinehart, R. H. Taylor

USER REQUIREMENTS OF AN INFORMATION SYSTEM
A survey of languages for stating requirements for computer-based information systems (1203) - D. Teichroew

MEASUREMENT OF COMPUTER SYSTEMS-CASE STUDIES
A benchmark study (1225) - J. C. Strauss

SERVICE ASPECTS OF COMMUNICATIONS FOR REMOTE COMPUTING
Toward an inclusive information network (1235) - R. R. Hench, D. F. Foster

TRAINING APPLICATIONS FOR VARIOUS GROUPS OF COMPUTER PERSONNEL
Computer jobs through training-A final project report (1243) - M. G. Morgan, N. J. Down, R. W. Sadler
Implementation of the systems approach to central EDP training in the Canadian government (1251) - G. H. Parrett
Evaluations of simulation effects in management training (1257) - H. A. Grace

ADVANCED TECHNICAL DEVICES
Conceptual design of an eight megabyte high performance charge-coupled storage device (1261) - B. Augusta, T. V. Harroun
Josephson tunneling devices for high performance computers (1269) - W. Anacker
Magnetic bubble general purpose computer (1279) - P. Bailey, B. Sandfort, R. Minnick, W. Semon

ADVANCES IN NUMERICAL COMPUTATION
On the numerical solution of ill-posed problems using interactive graphics (1299) - J. Varah
Iterative solutions of elliptic difference equations using direct methods (1303) - P. Concus
Tabular data fitting by computer (1309) - K. M. Brown
On the implementation of symmetric factorization for sparse positive-definite systems (1317) - J. A. George


Cognitive and creative test generators

by F. D. VICKERS
University of Florida
Gainesville, Florida

INTRODUCTION

No one in education would deny the desirability of being able to produce quizzes and tests by machine. If one is careful and mechanically inclined, a teacher can build up, over a period of time, a bank of questions which can be used in a computer aided test production system. Questions can be drawn from the question (or item) bank on various bases such as random, subject area, level of difficulty, type of question, behavioral objective, or other pertinent characteristic. However, such an item bank requires constant maintenance, and new questions should periodically be added.

It is the intention of this paper to demonstrate a more general approach, one that may require more initial effort but in the long run should almost eliminate the need to compose additional questions unless the subject material covered changes or the course objectives change. This approach involves the design and implementation of a computer program that generates a set of questions, or question elements, on a guided but random basis using a set of predetermined question models.
Here the word generate is used in a different sense from that used in item banking systems. The approach described here involves a system that creates questions from an item bank which is, for all practical purposes, of infinite size yet does not require a great deal of storage space. Storage is primarily devoted to the program. It appears at this stage of our research that this approach would only be applicable to subject material which obeys a set of laws involving quantifiable parameters. However, these quantities need not be purely numerical, as the following discussion will demonstrate. The subject area currently being partially tested with this approach is the Fortran language and its usage.

The following section of this paper presents a brief summary of a relatively simple concept which has yielded a useful generator for a particular type of test question. This presentation provides background material for the discussion of concepts which are not so simple and which are now under investigation. Finally, the last section provides some ideas for future development.

SYNTAX QUESTION GENERATION

A computer program has been in use at the University of Florida for over six years that generates a set of quizzes composed of questions concerning the syntax of Fortran language elements. See Figures 1 through 5. The student must discriminate between each syntactic type of element as well as invalid constructions. The program is capable of producing quizzes on four different sets of subject area as well as any number of variations within each area. Thus a different variation of a quiz can be produced for each section of the course. Figure 2 contains such a variation of the quiz shown in Figure 1. The only change required in the computer program to obtain the variation is to provide a single different value on input, which becomes the seed of a pseudo random number generator. With a different seed a different sequence of random numbers is produced, thereby generating different variations of question elements.

Figure 1-Quiz 1 example (25 elements to be classified as a Fortran IV special character, constant, symbol, a valid job control language command, or none of the above, followed by 25 elements to be classified as an integer constant, integer variable, real constant, real variable, or none of the above; each quiz is scored as RIGHT*2 with a minimum score of 10)

Figure 2-Quiz 1 variation (the same quiz regenerated from a different random number seed)

Figure 3-Quiz 2 example (expressions, arithmetic statements, and transfer-of-control statements)

Figure 4-Quiz 4 example (input/output statements, FORMAT field specifications, subscripted variables, and DIMENSION statements)

Figure 5-Quiz 5 example (DO statements and classification of arithmetic, control, input/output, and specification statements)

For each question, the program first generates a random integer between 1 and 5 to determine the answer category in which to generate the element. As an example, consider Question 27 in Figure 1. The random integer in this case was 2, thus a Fortran integer variable name had to be created for this question. A call was made to a subroutine which proceeds to generate the required name. This subroutine first obtains a random integer between 1 and 6 which represents the length of the name. For Question 27, the result was a 2. The routine then enters a loop to generate each character in the name. Since for integer names the first character must be I, J, K, L, M or N, the first random number in this loop would be limited to a value between 1 and 6. Subsequent random numbers produced in this loop would be between 1 and 37, corresponding to the 26 letters, 10 digits and the $ sign. Thus, for Question 27, the characters KS resulted. In similar fashion, the names for Questions 33, 34, and 43 were produced.
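The name-generation step just described is small enough to sketch in a few lines of Fortran. The sketch below is a free-form illustration written for this summary, not the original 1972 subroutine; the character sets and length limit are taken from the description above, and the intrinsic RANDOM_NUMBER routine (which can be seeded, as the quiz variations require) stands in for the pseudo random number generator mentioned earlier.

    ! Sketch of the integer-variable-name generator described above; not the
    ! original subroutine.  The first character is drawn from I-N and the
    ! remaining characters from the 37-character set of letters, digits and $.
    program name_generator
      implicit none
      character(len=*), parameter :: first = 'IJKLMN'
      character(len=*), parameter :: rest = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789$'
      character(len=6) :: name
      integer :: length, i, k
      real :: r

      call random_number(r)
      length = 1 + int(r*6.0)            ! name length between 1 and 6

      name = ' '
      do i = 1, length
         call random_number(r)
         if (i == 1) then
            k = 1 + int(r*6.0)           ! random integer between 1 and 6
            name(i:i) = first(k:k)
         else
            k = 1 + int(r*37.0)          ! random integer between 1 and 37
            name(i:i) = rest(k:k)
         end if
      end do

      print *, 'generated integer variable name: ', trim(name)
    end program name_generator

Rerunning such a routine from a different seed yields a different name each time, which mirrors the way the quiz variation in Figure 2 differs from the quiz in Figure 1.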
As each category for each question is determined by the main program, the values between 1 and 5 are kept in a table to be used as the answer key. This table is listed for each quiz and section, as shown in Figure 6, for use in class after quiz administration is complete. A card is also punched containing the key for input to a computerized grading system which is used to grade tests and homework and maintain records for the course.

Figure 6-Key example (the answer key listing produced for each quiz and section)

To illustrate the scope of this quiz generator in terms of programming effort, the following list gives the name and purpose of each subroutine in the total package. Each routine is written in Fortran IV:

Name      Purpose
MAIN      General test formatting and key production
SETUP     Prints a leader to help operator set up printer
QUIZi     Calls routines for categories in each quiz
ALPNUM    Generates single alphanumeric characters
SYMBOL    Generates a Fortran symbol
CONSTA    Generates a Fortran constant, real or integer
SPECHA    Generates a Fortran special character
JCLCOM    Generates a job command
NONEi     Generates none-of-the-above entries for each quiz
INTCON    Generates a Fortran integer constant
INTEXP    Generates a Fortran integer expression
REAEXP    Generates a Fortran real expression
MIXEXP    Generates a Fortran mixed expression
MIXILE    Generates a Fortran illegal expression
UNIARY    Generates a unary operator expression
PAREN     Generates an expression within parentheses
BINARY    Generates a binary operator expression
FUNCT     Generates a function call
ARITH     Generates a Fortran arithmetic statement
GOTON     Generates a Fortran GO TO statement
IFSTAT    Generates a Fortran IF statement
COGOTO    Generates a Fortran computed GO TO statement
INOUT     Generates a Fortran I/O statement
FIESPE    Generates a format field specification
FORMAT    Generates a format statement
DOSTAT    Generates a Fortran DO statement
SIZCON    Generates a constant of a given size
CONTRL    Generates a control statement
SPESTA    Generates a specification statement
INTVAR    Generates an integer variable
REACON    Generates a real constant
REAVAR    Generates a real variable
STANUM    Generates a statement number
SUBSCR    Generates a subscripted variable
INTSUB    Generates an integer subscripted variable
REASUB    Generates a real subscripted variable
DIMENS    Generates a dimension statement

The only major criticism that can be made of these quizzes is that they fail to test the student on his understanding of the behavior of the computer under the control of these various statements, either singly or in combination. This understanding of the semantics of Fortran, of course, is imperative if a programmer is to be successful. Thus a method is needed for generating questions which will test the student in this understanding. It is this problem whose solution is now being sought. The following sections describe some of the major concepts discovered so far and possible methods of solution.

SEMANTIC QUESTION GENERATION

Work is now under way on designing a system to produce questions which require semantic understanding as well as syntactic recognition of various Fortran program segments. The major difficulties in such a process are the determination of the correct answer for the generation of a key and the computation of the most probable incorrect answers for the distractors of a question. Both of these determinations sometimes involve semantic meanings (i.e., evaluation of expressions or the execution of statements) which would be difficult to determine in the same program that generates the question element in the first place.
As a good illustration, consider the following question model:

    Given the following statement:
        IF (X + 2.0 - SQRT(A)) 5,27,13
    where X = 6.5 and A = 22.7
    Transfer is made to the statement whose number is
    (1) 5  (2) 27  (3) 13  (4) indeterminate
    (5) none of the above, as the statement is invalid

Here the generator would have created the expression X + 2.0 - SQRT(A), the three statement numbers 5, 27 and 13, and finally the two values of X and A. The order of the first four answer choices could also be determined randomly. In this particular question, determination of the distractors is no problem, but the determination of the correct answer involves an algorithm similar to the following:

        X = 6.5
        A = 22.7
        IF (X + 2.0 - SQRT(A)) 5,27,13
    5   KEY = 1
        GO TO 10
    27  KEY = 2
        GO TO 10
    13  KEY = 3
    10  CONTINUE

This problem can be solved by letting the main generator program generate a second program to compute the key as well as generate the question for the test. This second program would then be passed to further job steps which would compile and execute the program and determine the key for the question. Figure 7 illustrates this concept.

Figure 7-2nd stage involvement of key (the main generator produces the test directly, together with a key-generator program whose execution produces the key)

As an illustration of a question involving more difficult determination of answer and distractors, the following question model is presented:

    Given the statement:
        I = J/2 + X
    where J = 11 and X = 6.5
    the resulting value of I is
    (1) 11.5  (2) 11  (3) 12  (4) 6.5  (5) 6

The determination of the five answer choices would have to be made by an algorithm such as the following:

        J = 11
        X = 6.5
        ANS1  = J/2 + X
        IANS2 = J/2 + X
        IANS3 = J/2. + X
        ANS4  = X
        IANS5 = X

In this problem not only does the determination of the key depend on further computation, but so do the distractors and the correct answer. Thus the second program generated by the first program must be involved in the production of the test as well as the key. Figure 8 illustrates this concept.

Figure 8-2nd stage involvement of key and distractors (the generated second program acts as an answer and distractor generator as well as a key generator)
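The two-stage scheme of Figures 7 and 8 can be made concrete with a short sketch of a first-stage generator that writes the key-computing program to a file for a later job step to compile and execute. This is only an illustration of the idea: the file name, the fixed expression and the generated program text are assumptions made for the example, and free-form Fortran with an IF construct is used in place of the arithmetic IF and the Format-within-Format coding that the pilot program required.

    ! Sketch of a first-stage generator writing a second, key-computing
    ! program to a file.  The expression, data values and file name are
    ! examples only; in the real system they would be chosen at random.
    program write_key_program
      implicit none
      real, parameter :: x = 6.5, a = 22.7
      integer :: u

      open (newunit=u, file='keyprog.f90', status='replace', action='write')
      write (u,'(a)') 'program key'
      write (u,'(a)') '  implicit none'
      write (u,'(a,f4.1,a,f4.1)') '  real :: x = ', x, ', a = ', a
      write (u,'(a)') '  integer :: key'
      write (u,'(a)') '  if (x + 2.0 - sqrt(a) < 0.0) then'
      write (u,'(a)') '     key = 1        ! transfer would go to statement 5'
      write (u,'(a)') '  else if (x + 2.0 - sqrt(a) == 0.0) then'
      write (u,'(a)') '     key = 2        ! transfer would go to statement 27'
      write (u,'(a)') '  else'
      write (u,'(a)') '     key = 3        ! transfer would go to statement 13'
      write (u,'(a)') '  end if'
      write (u,'(a)') '  print *, key'
      write (u,'(a)') 'end program key'
      close (u)
    end program write_key_program

A subsequent job step compiles and runs the generated file; its single line of output is the key recorded for the question.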
Some questions are very simple to produce, as neither key nor answer choices depend on a generated algorithm. An example is:

    Given the following statement:
        DO 35 J5 = 3, 28, 2
    The DO loop would normally be iterated N times where N is
    (1) 13  (2) 12  (3) 14  (4) 28  (5) 35

Here the answer choices are determined from known algorithms independent of the random question elements. No additional program is therefore required for producing this test question and its key. Figure 9 illustrates this condition.

Figure 9-No 2nd stage involvement (the main generator produces both the test and the key directly)

It would then appear that a general semantic test generator would have to satisfy at least the conditions exhibited in Figures 7, 8 and 9. Figure 10 illustrates results obtained from a working pilot program utilizing the method illustrated in Figure 8. This program is a very complicated one and was very difficult to write. To produce a Fortran program as output from a Fortran program involved a good deal of tedious work, such as writing Format statements within Format statements. It has become obvious that a more reasonable method of writing the source program is needed.

Figure 10-Semantic question examples (six machine-generated questions produced by the pilot program, covering DO statements, a computed GO TO statement, and an arithmetic IF statement)

FUTURE INVESTIGATION

An attempt will be made to design a source language oriented toward test design which will then be translated by a new processor into a Fortran program. See Figure 11. This new language is visualized as being composed of a mixture of languages, including the possibility of passing simple English statements (for the textual part of a question) through the entire process to the test. Fortran statements could be written into the source language where such algorithms are required. Finally, statements to allow the specification of random question elements and the linkage of these random elements to the algorithms mentioned above will be necessary.

Figure 11-TOSL language environment (a Test Oriented Source Language is translated by a TOSL processor written in SNOBOL into a Fortran program, which in turn produces the test and key)

Several special source language operators can be introduced to facilitate the writing of question models. Certain special characters can be chosen to represent particular requirements such as question number control, random variable control, answer choice control, answer choice randomization, and key production. It is anticipated that SNOBOL would make an excellent choice for the processor language, as it will allow for rapid recognition of the source language elements and operations and, in a natural way, generate and maintain strings which will find their way into the Fortran output program and finally into the test and key. The possibilities of such a system look very promising and, hopefully, such a system can be made applicable to other subject fields as well as the current one.


A conversational item banking and test construction system

by FRANK B. BAKER
University of Wisconsin
Madison, Wisconsin

INTRODUCTION

Most conscientious college instructors maintain a pool of items to facilitate the construction of course examinations. Typically, each item is typed on a 5" x 8" card and coded by course, book chapter, concept and other such keys. The back of the card usually contains data about the item collected from one or more administrations of the item. To construct a test, the instructor peruses this item bank looking for items that meet his current needs. Items are selected on the basis of their content and further filtered by examining the item data on the card; overlapping items are eliminated, and the emphasis of the test is balanced. After having maintained such a system for a number of years, it became obvious that there should be a better way. Consequently, the total process of maintaining an item bank and creating a test was examined in detail.
The result of this study was the design and implementation of the Test Construction and Analysis Program (TCAP). The design goal was to provide an instructor with a computer based item banking and test construction system. Because the typical instructor maintains a rather modest item bank, the design emphasis was upon flexibility and capabilities rather than upon capacity. In order to achieve the necessary flexibility, TCAP was implemented as a conversational system using an interactive terminal. Considerable care was taken to build a system that had a very simple computer-user interface. The purpose of the present paper is to describe the TCAP system. The order of discussion proceeds from the file structure to the software to the use of the system. This particular order enables the reader to see the underlying system logic without becoming enmeshed in excessive interaction between components.

SYSTEM DESIGN

File structure

The three basic files of the TCAP system are the Item, Statistics and Test files. A record in the Item file contains the actual item and is a direct analogy to the 5" x 8" card of the manual scheme. A record in the Statistics file contains item analysis results for up to ten administrations of a given item. Test file records contain summary statistics for each test that has been administered. The general structure of all files is essentially the same, although they vary in internal detail. Each file is preceded by a header (see Figure 1) that describes the layout of the records in the file. Because changing computers has been a way of life for the past ten years, the header specifies the number of bits per character and number of characters per word of the target computer. These parameters are used to make the files word length independent. In addition, it contains the number of sections per record, the number of characters per record section, characters per record and the number of records in the file.

File Header

Element   Contents
1         Name of file
2         Number of bits per character in target computer
3         Characters per word in the target computer
4         Characters per record in the file
5         Number of sections in the record
6-15      Number of characters in section i, where i = 1, 2, ..., 10

Figure 1-Typical file header

The contents of the headers allow all entries to data items within a record to be located via a relative addressing scheme based upon character counts. This character oriented header scheme enables one to arbitrarily specify the record size and layout at run time rather than compile time, thus enabling several different users of the system to employ their own record layouts without affecting the TCAP software.
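Because every entry is located by character counts, the starting position of any section can be computed directly from the header values. The arithmetic is sketched below with a hypothetical header; the section sizes are invented for the illustration and are not taken from an actual TCAP file.

    ! Sketch of the relative-addressing arithmetic implied by the header:
    ! the starting character of a section is one plus the sum of the sizes
    ! of the sections that precede it.  The sizes below are invented.
    program section_offsets
      implicit none
      integer, parameter :: nsect = 7
      integer :: nchar(nsect) = (/ 20, 60, 400, 120, 8, 4, 20 /)
      integer :: start(nsect), k

      start(1) = 1
      do k = 2, nsect
         start(k) = start(k-1) + nchar(k-1)
      end do

      do k = 1, nsect
         print '(a,i2,a,i4,a,i4,a)', 'section ', k, ' starts at character ', &
               start(k), ' and holds ', nchar(k), ' characters'
      end do
    end program section_offsets

Since the header is read at run time, the same arithmetic serves any record layout, which is what allows different users to define their own section sizes without changing the program.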
A record is divided into sections of arbitrary length, each preceded by a unique two character flag and terminated by a double period. Subsections within a section are separated by double commas. These flags serve a number of different functions during the file creation phase and facilitate the relative addressing scheme used to search within a record. Figure 2 contains an item file record that represents a typical record layout.

The basic record layout scheme is the same in all files, but they differ in the contents of the sections. A record in the item file consists of seven sections: Identification, Keyword, Item, Current item statistics, Date last used, Frequency of use, and Previous version identification. The ID section contains a unique identification code for the item that must begin with *$. The keyword section contains free field keyword descriptors of the item separated by commas. The item section contains the actual item and was intended primarily for multiple choice items. Since the item section is free field, other item types could be stored, but it has not been tried to date. The current item statistics section stores the item analysis information from the most recent administration of the item. The first element of this section is the identification code of the test from which the item statistics were obtained. The internal layout of this section is fixed so that the FORTAP item analysis program outputs can be used to update the information. The item statistics section contains information such as the number of persons selecting each item response, item difficulty, and estimates of the item parameters. The next section contains the date of the most recent administration of the item. The following section contains a count of the total number of times the item has been administered. These two pieces of information are used in the test construction section to prevent overuse of an item. The final section of the item record contains the unique identification code of a previous version of the same item. This link enables one to follow the development of a given item over a number of modifications.

A record in the Statistics file contains 11 sections: an item identification section and 10 item statistics sections identical in format to the current item statistics section of the item record. These 10 sections are maintained as a first in, last out push down stack, with an eleventh data set causing the first set to be pushed end off. Records in the Test file are similar to those of the Item file and have five sections: Identification, Keywords, Comments, Summary statistics of the test, and a link to other administrations of the same test. The comments section allows the instructor to store any anecdotal information he desires in a free field format. The link permits keeping track of multiple uses of the same test, such as occurs when a course has many sections.

The record layouts were designed so that there was a one to one correspondence between each 72 characters in a section and the punched cards used to create the file. Such a correspondence greatly facilitates the ease with which an instructor can learn to use the system. Once he has key punched his item pool, the record layouts within each file are quite familiar to him and the operations upon these records are easily understood. This approach also permitted integration of the FORTAP item analysis program into the TCAP system with a minimum conversion effort. It should be noted that the file design allows many different instructors to keep their items in the same basic files.

Item File Record

*$ STAT 01 520170..
ZZ EDPSY,STATISTICS,ESTIMATORS,MLE..
QQ ONE OF THE CHARACTERISTICS OF MAXIMUM LIKELIHOOD ESTIMATORS IS THAT IF
   SUFFICIENT ESTIMATES EXIST, THEY WILL BE MAXIMUM LIKELIHOOD ESTIMATORS.
   ESTIMATES ARE CONSIDERED SUFFICIENT IF THEY,,
   (A) USE ALL OF THE DATA IN THE SAMPLE,,
   (B) DO NOT REQUIRE KNOWLEDGE OF THE POPULATION VALUE,,
   (C) APPROACH THE POPULATION VALUE AS SAMPLE SIZE INCREASES,,
   (D) ARE NORMALLY DISTRIBUTED..
WW TEST 01 220170..
   1 1 0 0014 .18 - .21 -01.36 -0.22,,
   1 2 1 0054 .69 + .53 -00.93 .63,,
   1 3 0 0010 .12 + .64 -01.77 -0.83..
VV 161271..
YY 006..
$$ STAT 02 230270..

Figure 2-A record in the item file
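A record such as the one in Figure 2 can be cut apart by scanning for the two-character flags and the double-period terminators. The sketch below does this for a single record held in one character string; the abbreviated sample record and the simple scan are illustrations only, not the TCAP utility routines themselves.

    ! Sketch of splitting a TCAP-style record into its sections.  Each
    ! section opens with a two-character flag and closes with '..'; the
    ! sample record is an abbreviated version of Figure 2.
    program split_record
      implicit none
      character(len=*), parameter :: record = &
         '*$ STAT 01 520170..ZZ EDPSY,STATISTICS,ESTIMATORS,MLE..YY 006..'
      integer :: pos, term

      pos = 1
      do while (pos < len(record))
         term = index(record(pos:), '..')   ! locate the double-period terminator
         if (term == 0) exit
         print *, 'flag ', record(pos:pos+1), '  contents ', record(pos+3:pos+term-2)
         pos = pos + term + 1               ! step past the terminator
      end do
    end program split_record

A real routine would also honor the double-comma subsection separators and would take the expected flags from the file type, but the character-count bookkeeping is the same.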
Alternatively, each instructor can maintain his own unique set of basic files, yet use a common copy of the TCAP program. The latter scheme is preferred as it minimizes file search times.

Software design

The basic programming philosophy adopted was one of cascaded drivers with several levels of utility routines. Such an approach enables the decision making at each functional level to be controlled by the user interactively from a terminal. It also enables each level of software to share lower level utility routines appropriate to its tasks. Figure 3 presents a block diagram of the major software components of the TCAP system.

Figure 3-TCAP software structure

The main TCAP driver is a small program that merely presents a list of operational modes to the user: Explore, Construct, and File Maintenance. Selection of a particular mode releases control to the corresponding next lower level driver. These second level drivers have access to four search routines that form a set of high level utility routines. The Identification search routine enables one to locate a record in a file by its unique identification code. The Keyword search routine implements a search of either the item or test file for records containing the combination of keywords specified by the user. At present a simple conjunctive match is used, but more complex logic can be added easily. The Parameter search utility searches the item or statistics files for items whose item parameter values fall within bounds specified by the user. The Linked search routine allows one to link from a record in one file to a corresponding record in another file, for example, from the item file to the statistics file or from the item file to the test file.

Due to the extremely flexible manner in which the user can interact with the three files, it was necessary to access these four search routines through the Basic File Handling routine. The BFH routine initializes the file handlers from the parameters in the headers, coordinates the file pointers, and handles certain error conditions. Such centralization relieves both the mode implementation routines and the search routines of considerable internal bookkeeping related to file usage. The four search routines in turn have access to a lower level of utility routines, not depicted in Figure 3. These lowest level utilities are routines that read and write records, pack and unpack character strings, convert numbers from alphanumeric to integer or floating point, and handle communication with the interactive terminal.

The purpose of the EXPLORE routine is to permit the user to peruse the three basic files in a manner analogous to thumbing through a card index. The EXPLORE routine presents the user with a display listing seven functions related to accessing records within a file. These functions are labeled: Identification, Keyword, Parameter, Linked, Restore, Mode and Continue. The first four of these correspond to the four utility search routines. The Restore option merely reverses the linkage process and causes the predecessor record to become the active record. The Mode option causes an exit from the EXPLORE routine and a return to the Mode display of the TCAP driver. The Continue option allows one to continue a given search using the present set of search specifications.
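The conjunctive keyword match used by the Keyword search routine is easy to picture: a record qualifies only when every requested keyword appears in its keyword section. The sketch below tests one keyword section against a request list; the data are taken from the example request used later in Figure 4, but the routine itself is an illustration written for this summary, not the TCAP code.

    ! Sketch of the conjunctive keyword match: an item qualifies only if
    ! every requested keyword occurs in its keyword section.  A real
    ! routine would compare whole comma-delimited fields rather than
    ! raw substrings.
    program keyword_match
      implicit none
      character(len=8) :: wanted(3) = (/ 'SKEWNESS', 'MEAN    ', 'MEDIAN  ' /)
      character(len=*), parameter :: section = 'EDPSY,STATISTICS,SKEWNESS,MEAN,MEDIAN'
      logical :: hit
      integer :: i

      hit = .true.
      do i = 1, size(wanted)
         if (index(section, trim(wanted(i))) == 0) hit = .false.
      end do

      if (hit) then
         print *, 'record satisfies the keyword request'
      else
         print *, 'record rejected'
      end if
    end program keyword_match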
The Test Construction Routine is used to assemble an educational test from the items in the item file. Test construction is achieved by specifying a set of general characteristics all items should have and then defining subsections of the test called areas. The areas within the test are defined by user supplied keywords and the number of items desired in an area. The Test Construction routine then employs the Keyword search routine, via BFH, to locate items possessing the proper keywords. This process is continued until the specified number of items for an area is retrieved or the end of the item file is reached. Once the requirements of an area are satisfied, the user is free to define another area or terminate this phase. Upon termination certain summary data, predicted test statistics, and the items are printed.
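The area-by-area assembly can be sketched as a simple scan: for each area the item file is searched until the requested number of items has been found or the file is exhausted, and an item is accepted only when its item parameters X50 and beta fall inside the bounds given in the general specifications. The bounds below are those of the example run in Figure 4, but the item table, the area assignments and the requested counts are invented for the illustration.

    ! Sketch of area-by-area test assembly with the X50 and beta filter.
    ! The item table, area assignments and requested counts are invented.
    program assemble_test
      implicit none
      integer, parameter :: nitems = 6
      real    :: x50(nitems)  = (/ -0.9, 3.1, 0.5, -2.7, 1.2, 0.2 /)
      real    :: beta(nitems) = (/ 0.45, 0.30, 1.80, 0.60, 0.75, 0.25 /)
      integer :: area(nitems) = (/ 1, 1, 1, 2, 2, 2 /)   ! stands in for the keyword match
      integer, parameter :: wanted = 2                   ! items requested per area
      real, parameter :: x50min = -2.5, x50max = 2.5, bmin = 0.20, bmax = 1.5
      integer :: a, k, found

      do a = 1, 2
         found = 0
         do k = 1, nitems
            if (area(k) /= a) cycle
            if (x50(k) < x50min .or. x50(k) > x50max) cycle
            if (beta(k) < bmin  .or. beta(k) > bmax)  cycle
            found = found + 1
            print '(a,i1,a,i1)', 'area ', a, ': selected item ', k
            if (found == wanted) exit
         end do
         print '(a,i1,a,i1,a,i1)', 'area ', a, ': requested ', wanted, ', found ', found
      end do
    end program assemble_test

The requested-versus-found tally printed at the end of each area is exactly the summary table that TCAP prints at the end of construction.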
Hence, it is necessary to maintain a keyword dictionary external to the system. This should cause little trouble as the person who created the files is also the person using the system. Upon receipt of the keywords, the EXPLORE routine calls the Keyword Search routine to find an item containing the particular set of keywords. The contents of the item record located are then typed at the terminal. At this point the system asks the user for further instructions. It presents the message FUNCTION DISPLAY NEEDED. A negative reply causes a return to the Mode selection display of the TCAP driver. A YES response causes the EXPLORE function list to reappear. If one wishes to find the next item in the file possessing the same keyword pattern, CONTINUE, is typed and the search proceeds from the last item found. In Figure 4 this option was not selected. Returning to the Mode selection or reaching the end of the file being searched causes the Basic File Handler to restore the file pointers to the file origin. The next sequence of interactions in Figure 4 links from a record in the Item file to the corresponding record in the Statistics file. It is assumed that one of the other search functions has been used to locate a record prior to selection of the LINKED option, the last item found via the Keyword search in the present example. The computer then prompts the user by asking for the name of the file from which the linking takes place, item in the present example. It then asks for the name of the file the user wishes to link to statistics in the example. There are several illegal linkages and the Linked search routine checks for a legal link. The Linked search routine extracts the identification section of the item record and establishes the inputs to the Identification Search routine. This routine then searches the Statistics file for a record having the same identification section. It should be noted that a utility routine used a utility routine at this point, but the cascaded control was supervised by the EXPLORE routine. When the proper Statistics record is found its contents are printed at the terminal. Again, the system asks for directions and the user is asked if he desires the function display. In the example, the user obtained the function display ,and selected the Restore option. This results in the prior record, the item record, being returned to active record status. and the name of the active file being printed. The system allows one to link and restore to a depth of three records. Although not shown in the example sequences, the other options under the EXPLORE mode operate in an analogous fashion. The third sequence of interactions in Figure 4 shows the construction of an examination via the TCAP system. Upon selection of the Construct mode, the computer instructs the user to supply the general item specifications, namely the correct response weight and the bounds for the item parameters X 50 and {3. These minimum, maximum values are used to filter out items having poor statistical properties. The remainder of the test construction process consists of using keywords to define areas within the test. The computer prints AREA DEFINITION FOLLOWS: YES, NO. After receiving a YES response the computer asks for the number of items to be included in the area. The user can specify any reasonable number, usually between 5 and 20. 
The program then enters the normal keyword search Conversational Item Banking and Test Construction System TYPE IN TCAP M~DE =EXPL~RE, C~NSTRUCTIPN, FILE MAINTENCE FUNCTIPN DISPLAY TYPE KIND ~F SEARCH DESIRED EXPL~RE IDENT,KEYW~RD,PARAMETER,LINKED,REST~RE,CPNTINUE,M~DE KEYWORD TYPE IN FILE NAME ITEM TYPE IN KEYW0RDS SEPARATED BY C0MMAS TERMINATE WITH .. SKEWNESS,MEAN,MEDIAN.. *$AAAC 02 230270 .. THE ITEM REC0RD WILL BE PRINTED HERE { FUNCTI0N DISPLAY NEEDED YES,N0 YES FUNCTI0N DISPLAY TYPE KIND 0F SEARCH DESIRED IDENT,KEYW0RD,PARAMETER,LINKED,REST0RE,C0NTINUE,M0DE LINKED LINKED SEARCH REQUESTED TYPE NAME 0F FILE FR0M ITEM TYPE NAME 0F FILE LINKED T0 STAT *$AAAC 02 230270.. THE STATISTICS REC0RED WILL BE PRINTED HERE { FUNCTI0N DISPLAY NEEDED YES,N0 YES FUNCTI0N DISPLAY TYPE KIND 0F SEARCH DESIRED IEDNT,KEYW0RD,PARAMETER,LINKED.REST0RE,C0NTINUE,M0DE REST0RE ITEM REC0RD FILE REST0RED FUNCTI0N DISPLAY NEEDED YES,N0 YES FUNCTI0N DISPLAY IDENT,I<:EYW0RD,PARAMETER,LINKED,REST0RE,C0NTINUE,M0DE M0DE TYPE IN TCAP M0DE = EXPL0RE,C0NSTRUCTI0N,FILE MAINTENANCE C0NSTRUCT TYPE IN WEIGHT ASSIGNED T0 ITEM RESP0NSE 1 TYPE IN MINIMUM VALUE 0F X50 -2.5 TYPE IN MAXIMUM VALUE 0F X50 +2.5 TYPE IN MINIMUM VALUE 0F BETA .20 TYPE IN MAXIMUM VALUE 0F BETA 1.5 AREA DEFINITI0N F0LL0WS YES,N0 YES TYPE IN NUMBER 0F ITEMS NEEDED F0R AREA 10 TYPE IN KEYW0RDS SEPARATED BY C0MMAS TERMINATE WITH .. CHAPTER1,STATISTICS,THE0RY,FISHER. . AREA DEFINITI0N F0LL0vVS YES,N0 YES TYPE IN NUMBERS 0F ITEMS NEEDED F0R AREA 10 Figure 4-0perational sequences 665 666 Fall Joint Computer Conference, 1972 TYPE IN KEYW0RDS SEPARATED BY C0MMAS TERMINATE WITH .. CHAPTER2,DISTRIBUTI0N,FREQUENCY,INTERVAL.. AREA DEFINITI0N F0LL0WS YES,N0 YES TYPE IN NUMBER 0F ITEMS NEEDED F0R AREA 10 TYPE IN KEYW0RDS SEPARATED BY C0MMAS TERMINATE WITH .. CHAPTER3,BIN0MIAL,PARAMETER,C0MBINATI0N,PERMUTATI0N .. AREA DEFINITI0N F0LI.J0W8 YES,N0 YES TYPE IN NUMBER 0F ITEMS NEEDED F0R AREA 10 TYPE IN KEYW0RDS SEPARATED BY C0MMAS TERMINATE WITH .. CHAPTER4,HYP0THESES,LARGE SAMPLE,Z TEST .. AREA DEFINITI0N F0LL0WS YES,N0 N0 ITEMS REQUESTED PER AREA 10 10 10 10 ITEMS F0UND PER AREA 6 9 8 10 PREDICTED TEST STATISTICS MEAN = 16.0758 STANDARD DEVIATI0N = 4.561111 RELIABILITY = .893706 D0 Y0U WANT ITEMS PRINTED YES,N0 N0 ITEM IDENTIFICATI0N X50 BETA h$AAAA 03 230270. . .470000 .450000 (THIS INF0RMATI0NWILL BE PRINTED F0R ALL ITEMS) TYPE IN TCAP M0DE =EXPL0RE,C0NSTRUCTI~N,FILE MAINTENANCE EXIT THAT IS END 0F RUN,G00DBY Figure 4-tContinued) procedures and the user enters the keywords that define this area of the test. Upon receipt of the keywords the item file is searched for items possessing the proper descriptors and whose item parameters are within bounds. Completion of the keyword search results in a return to the area definition message. The area definition and search process can be repeated up to ten times. A NO response to the area definition message results in . the printing of the table showing the number of items requested per area and the number actually found per area. The table is followed by the predicted values of the test mean, standard deviation, and internal consistency reliability index. These values are computed from the current values of the item parameters X 50 and {3 of the retrieved items. These predicted values assist the test constructor in determining if an appropriate set of items has been selected by the system. The program then asks the user if he wants the selected items printed. 
If not, only the identification section and the values of the item parameters are printed. This information allows one to use the Identification search option of the EXPLORE routine to retrieve the items at a later date. A minor deficiency of the present test con- struction procedures is that a reproducible copy of the test is not produced. A secretary uses the hard copy to prepare a stencil or similar master. With some minor programming this final step could be accomplished. Some enhancements At the present time the full TCAP design has not been implemented and a number of additional features should be mentioned. Two sections of the item record, date of use, and frequency of use can be employed to prevent over use of the same items. A step in the test construction mode will enable the user to specify that an item used since a certain date or more than a specified number of times should not be retrieved. The software for this additional filtering has been written but not debugged. A significant enhancement is one that enables the test constructor to manipulate the items constituting a test. For example, an instructor may not be satisfied with the items the computer has retrieved in certain areas. He may wish to delete items from one area and Conversational Item Banking and Test Construction System add items to another. This can be done interactively and the predicted test statistics should be re-calculated as each transaction occurs. At the present time, such manipulations require a re-run of the total test construction process. An extension allowing considerable freedom in manipulating items of the constructed examination via the utility search routines has been designed but not implemented. The TCAP system was originally designed to be operated from an alphanumeric display, hence the mode display, function display terminology, but the present implementation was accomplished using teletypes. Alphanumeric displays have been acquired and many user actions will be changed from typed in responses to menu selections via a cursor. These displays will relieve the user of the major portion of the typing load and make the system a great deal easier to use. Some observations The TCAP design goals of flexibility, capability and ease of use produced a conflicting set of software requirements. These requirements combined with the fact that the operating system of the computer forced one to treat all drum files as if they were magnetic tapes resulted in a challenging design problem. The requirement for providing the user with computer based equivalents of present capabilities was solved through the use of cascaded drivers and multiple levels of utility routines. Such a scheme enables the drivers to be concerned with operational logic and the utility routines with performing the functions. The use of multiple levels of utility routines provided functional isolation that simplified the structure of the programs. The final TCAP program was highly modular, hierarchical in structure and quite compact. The use of relative addressing in conjunction with the character oriented file records and a header scheme proved to be advantageous. The approach makes transferring TCAP to other computers an easy task. Hopefully, the only conversion problem will be adjusting the FORTRAN A formats to the target computer. A significant feature of the approach is that record layouts within files are defined at run time rather than at compile time. 
The practical effect is that each instructor can tailor the number of sections within a record and their size to suit his own needs. Thus, the item, statistics, and test files can be unique to a given user. TCAP modifies its internal file manipulations to process the record specifications it receives. Such flexibility is important in the university setting where each instructor feels his instructional procedures are unique. One consequence of the high degree of operational flexibility and the range of capabilities provided is that housekeeping within TCAP is extensive. A good example of this housekeeping occurs when the File Maintenance routine updates the item files from the item analysis results file generated by the FORTAP program. Because not all items in the test will have records in the item file, the File Maintenance routine must keep track of them, create records for them, add them to the item file, and inform the user that the records have been added. There are numerous other situations of comparable complexity throughout the TCAP system. Handling them smoothly and efficiently is a difficult task. Because TCAP was implemented on a large computer, such situations were generally handled by creating supplementary drum files and providing working arrays in core. The use of random access files would have greatly simplified many of the internal housekeeping problems.

On the basis of the author's experience with the design and implementation of the TCAP system one salient conclusion emerges. Such programs must be designed as complete software systems. To attempt to design them in a sequential fashion and implement them piecemeal is folly. The total system needs to be thought through very carefully and the possible interactions explored. If provision is to be made for future, but undefined, extensions, the structure of the program and the files must be kept simple to reduce the interaction effects of such enhancements. It appears to be a characteristic of this area of computer programming that complexity and chaos await your every decision. This caveat is a reflection of the many design iterations that were necessary to achieve the TCAP system. The end product of this process is a system that provides the instructor with an easy to use tool that can be of considerable assistance. Being able to maintain an item bank and assemble tests to meet arbitrary specifications aids one in performing an unavoidable task. To do so quickly and efficiently is worth the investment it takes to convert one's item bank into machine readable form. The TCAP system illustrates again that tasks performed by manual means can often be quite difficult to implement by computer. In the present case a reasonable implementation was achieved by making the system interactive and taking advantage of the capabilities of both man and machine.

Measurement of computer systems-An introduction

by ARNOLD F. GOODMAN
McDonnell Douglas Astronautics Company
Huntington Beach, California

NEED FOR MEASUREMENT

Computer systems have become indispensable to the advancement of management, science and technology. They are widely employed by academic, business and governmental organizations. Their contribution to today's world is significant in terms of both quantity and quality. This significant growth of computer utilization has been accompanied by a similar growth in computer technology. Faster computers with larger memories and more flexible input and output have been introduced, one after another. Interactive, multiprocessing, multiprogramming, realtime and timesharing have been transformed from catchy slogans into costly reality-or at least, partial reality. In addition, computer science has come into being, and has made great progress from an art toward a science. Departments of computer science have appeared within many colleges and universities. A new profession has been created and is attempting to mature. These three areas of phenomenal growth-computer utilization, computer technology and computer science-have produced the requirement for a new field, measurement of computer systems. In an atmosphere of escalating computer cost and increasing budget scrutiny, measurement provides a bridge between design promises and operational performance. This function of measurement is complemented by the traditional need for measurement of any art in search of a science.

ACTIVITY INVOLVING MEASUREMENT

A limited survey was conducted of the 1960-1970 literature on measurement of computer systems. This survey included all Proceedings of Spring Joint Computer Conferences, Proceedings of Fall Joint Computer Conferences, Journals of the Association for Computing Machinery and Communications of the ACM, as well as selected Proceedings of ACM National Conferences and Proceedings of Conferences on Application of Simulation. The resulting personal bibliography and the unpublished bibliographies of Bell,1 Miller2 and Robinson3-each with its own bias and deficiency-were utilized to obtain an initial indication of pioneer activity involving measurement.

Measurement of computer systems was presaged by Herbst, Metropolis and Wells4 in 1945, Shannon5 in 1948, Hamming6 in 1950 and Grosch7 in 1953. Bagley,8 Black,9 Codd,10 Fein,11 Flores,12 Maron13 and Nagler14 published articles concerning it during 1960. These were followed in 1961 with the related contributions of Barton,15 Flores,16 Gordon,17 Gurk and Minker,18 Hosier,19 and Jaffe and Berkowitz.20 During 1962, there were pertinent papers by Adams,21 Baldwin, Gibson and Poland,22 Dopping,23 Gosden and Sisson,24 Hibbard,25 Patrick,26 Sauder,27 Simonsen28 and Smith.29

Many of the concepts and techniques which were developed for defense and space systems-whose focal point was hardware rather than software-are also applicable to computer systems. The system design, development and testing sequence was perfected by the late 1950's. Since the early 1960's, system verification, validation, and cost and effectiveness evaluation have been prevalent. The adaptation of these concepts and techniques to measurement of computer systems-especially software-is not as simple as system specialists tend to believe, yet not as difficult as software specialists tend to believe. In the middle 1960's, such concepts and techniques began to be applied to the selection and evaluation of computer systems, and to software as well as hardware. Ratynski,30 Searle and Neil,31 Liebowitz32 and Piligian and Pokorney33 describe the Air Force and National Aeronautics and Space Administration (NASA) adaptation of their system acquisition procedures to software acquisition. Attention then shifted to measurement of computer system performance, with a corresponding increase of activity. Sackman34 discusses computer system development and testing, based upon the Air Force and NASA experience.
An important development of the period was the formation of a Hardware Evaluation Committee within SHARE35 during early 1964, and its evolution into the SHARE Computer Measurement and Evaluation Project36 during August 1970, which served as a focal point for significant progress. 37 A preliminary but informative indication of activity involving computer system effectiveness evaluation prior to 1970 appears below. When a comprehensive bibliography on measurement of computer systems is compiled and annotated, the gross characterization of activity given in this paper may be re'finedand expanded-especially in the area of practical contributions and contributors to measurement. Raw material for that bibliography and characterization may be found in the unpublished bibliographies of Bell, l Miller,2 Robinson3 and the author mentioned aboveas well as a bibliography by Crooke and Minker,38 one in preparation by Menck,39 and the selected papers in Hall. 37 During a keynote address at Computer Science and Statistics: Fourth Annual Symposium on the Interface in September 1970, Hamming coined the name of "compumetrics"-in the spirit of biometrics,econometrics and psychometrics-for measurement of computer systems. 40 It is fitting _that the naming of compumetrics occurred at this symposium, since measurement of computet systems is truly a part of the interface-or area of interaction-of computer science and statistics,4l Hamming phrased it well when he stated: 40 "The director of a computer center is responsible -for managing the utilization of large amounts of money, people and resources. Although he has a complex and important statistical problem, his decisions are normally based upon the simplest collection and analysis of data-since he usually knows little statistics beyond such elementary concepts as the mean and variance. His need for statistics involves both the operational performance of his hardware and software, and the environment provided by his organization and users." "A new discipline that seeks to answer these questions-and that might be called 'compu- metrics'-is in the process of evolving. Karl Pearson and R. A. Fisher established themselves by developing novel statistical solutions to significant problems of their time. Compumetrics may well provide contemporary statisticians with many such opportunities." Workshop sessions on compumetrics followed Hamming's remarks at the Fourth Symposium on the Interface. During these sessions,40 "there developed a feeling that this symposium marked a beginning which must not be _allowed to be an end" -that sessions on compumetrics be scheduled at the Fifth Symposium on the Interface, and that a local steering committee be formed to promote interest in compumetrics. It is not surprising, therefore, that a Special Interest Committee on Measurement of Computer SystemsSICMETRICS-was initiated within the Los Angeles Chapter of the Association for Computing Machinery during April 1971. SICMETRICS is compiling a bibliography on compumetrics. 39 There were sessions on computer system models and analysis at the Fifth Annual Princeton Conference on Information Sciences and Systems42 in March 1971. In April 1971, the ACM Special Interest Group on Operating Systems-8IGOPS-sponsored a Workshop on System Performance Evaluation43-with sessions on instrumentation, mathematical models, queuing -models, simulation models and performance evaluation. 
There were sessions on system evaluation and diagnostics at the 1971 Spring Joint Computer Conference 44 during May 1971. This was followed in November 1971 by workshop sessions on compumetrics at the Fifth Symposium on the Interface,45 by a session on operating system models and measures at the 1971 Fall Joint Computer Conference,46 and by a Conference on Statistical Methods for Evaluation of Computer Systems Performance47-with sessions on general approaches, evaluation of current systems, input analysis, software reliability, system management, design of experiments and regression analysis. During November 1971, the ACl\I Special Interest Committee on Measurement and Evaluation-SICME-was also formed. The ACM Special Interest Groups on Programming Languages-SIGPLAN-and on Automata and Computability Theory-SIGACT-sponsored a Conference on Proving Assertions about Programs48 in January 1972. A Symposium on Effective Versus Efficient Computing49-with sessions on responsibility, getting results, implementation, evaluation, education and looking ahead-was held during March 1972, and so was a session on computer system models at the Sixth Annual Princeton Conference on Information Sciences and Systems. 50 In May 1972, there was a session on Measurement of Computer Systems compumetrics at the 1972 Technical Symposium of the Southern California Region of ACM, and there were sessions on system performance measurement and evaluation at the 1972 Spring Joint Computer Conference. 51 An ACM Special Interest Group on Programming Languages-SIGPLAN-Symposium on Computer Program Test Methods followed during June 1972. 52 The National Bureau of Standards and AC1VI are jointly sponsoring a series of workshops and conferences on performance measurement. An informative discussion of many practical aspects of compumetrics is contained in Canning. 53 Finally, the 1972 Fall Joint Computer Conference 54 in December 1972, has coordinated sessions on meas~rement of computer systems-executive viewpoints, system performance, software validation and reliability, analysis considerations, monitors and their application, and case studies. Across the Atlantic, a Performance Measurement Specialist Group was organized within the British Computer Society in early 1971. A number of its working groups are functioning on specific projects, and it sponsored a conference in September 1972. This summary of activity involving measurement of computer systems clearly outlines the growth and increasing importance of compumetrics. Proposal of a structure for compumetrics is, therefore, quite appropriate. The presentation below is general and suggestive, rather than detailed and complete-as is appropriate for an introduction. STRUCTURE FOR MEASUREMENT A structure-or framework-is proposed for measurement of computer systems, to serve as a background for both understanding and developing the subject. I t provides not only a common set of terms-which may be familiar to some and new to others, but also a guide to the current-as well as potential-extent and content of compumetrics. Such a structure is critical for subjects that have matured and crucial otherwise, whether or not there is universal agreement on detailed portions of it. The conceptual framework for Air Force and NASA acquisition of computer systems30- 34 provides a context in which not only the structure for measurement, but also the structure for effectiveness evaluation, should be considered. 
Compumetrics .concerns measurement in-internal to-or of-external to-computer systems. As for biometrics, econometrics and psychometrics, this means measurement of a general nature applied to computer systems in a broad sense. A computer system is taken to be a collection of properly related elements, including 671 a computer, which possesses a computing or data handling objective. The structure for compumetrics is described in terms of computer system evolution and computer system operation. Computer system evolution is divided into design, development and testing, and computer system operation is divided into objective, composition and management. A sequence of questions-including the if, why, what, where, when, how much and how of measurement-should be developed and then answered for each element of the structure. The structure is presented from the viewpoint of a statistician who is knowledgeable about computers, in order to augment Hamming's viewpoint as a computer scientist who is knowledgeable about statistics. In addition, this structure is considerably more comprehensive and definitive than that which is implied by Hamming's original discussion. 40 An outline version of it appeared in Locks. 45 At present, measurement of computer systems might be characterized as a growing collection of measurements on their way toward a science, and in need of planning and analysis to help them get there. Bell, Boehm and Watson 55 provide an adaptation of the scientific method to performance measurement and improvement of a computer system: from understanding the system and analyzing its operation, through formulating performance improvement hypotheses and analyzing the probable cost-effectiveness ) of the corresponding modifications, to testing specific hypotheses and implementing the appropriate combinations of modifications-as well as testing the costeffectiveness of these combinations. As a complement to this approach, the author 56 presents a user's guide to data modeling and analysis-including a perspective for viewing and utilizing such a framework for the collection and analysis of measurements. That paper56 discusses the sequence of steps which leads from a problem through a solution to its assessment, some aspects of solving problems which should be considered, and an approach to the design and analysis of a complex system through utilization of both experimental and computer simulation data .. Measurement and system evolution Within this and the following sections, appropriate terms appear in capital letters for emphasis. Such a procedure· produces not only clarity of exposition, but also a lack of smoothness, in the resulting text. The advantage of the former is sought, even at the disadvantage of the latter. In addition, words are employed in their usual nontechnical sense. Computer systems evolve from DESIGN through 672 Fall Joint Computer Conference, 1972 DEVELOPMENT to TESTING. For illustrative purposes, we present one partition-from among the many which are possible-of this evolution into more basic components. It is meaningful from both a manager's and a user's point of view. For a given computer system, the accomplishment of more than one component may be occurring simultaneously, and the accomplishment of all components may not be feasible. The DESIGN of a computer system involves the system what and how. A REQUIREMENTS ANALYSIS ascertains user needs and generates system objectives, and a FUNCTIONAL ANALYSIS translates system objectives into a desired system framework. 
Then SPECIFICATION SYNTHESIS transforms the objectives and desired framework into desired performance and its description. Finally, STRUCTURE develops system framework from the desired framework, and SIZING infers system size from its framework. System· DEVELOPMENT is concerned with implementing the system what and how. It proceeds from HARDWARE AND SOFTWARE SELECTIONwhich includes the decision to make or buy, through HARDWARE AND SOFTWARE ACQUISITIONwhich involves either making or buying-and HARDWARE AND SOFTWARE COMBINATIONwhich implements the framework in terms of acquired hardware and software, to SOFTWARE PROGRAMMING-which includes the programming of additional software. How well the framework was implemented is then determined by HARDWARE AND SOFTWARE VERIFICATION. Development is completed by SYSTEM DOCUMENTATION to describe the system what and how, and by PROCEDURE DOCUMENTATION to describe the how of system operation and use. TESTING of a computer system has the objective of assessing how well the system performs. First, system INTEGRATION-which could have been included under development-assembles the hardware, software and other elements into. a system. This is followed by system VALIDATION, for ascertaining how well the specifications were implemented and for contributing to quality assurance. COST EVALUATION determines how much the system costs in terms of evolution and operation, and EFFECTIVENESS EVALUATION determines how well the system performs in terms of operational time, quality and impact upon the user. The final step in testing is, of course, OPERATION-performance for the user. McLean57 proposes a characterization for the "all-tootrue life cycle of a typical EDP system: unwarranted enthusiasm, uncritical acceptance, growing concern, unmitigated disaster, search for the guilty, punishment of the innocent, and promotion of the uninvolved." An excellent discussion of computer system development and testing-whose application should alter this cycleis provided by Sackman. 34 In addition, measurement was apparently employed in many places within the design, development and testing sequence for the information system of Winbrow. 58 Where is measurement currently utilized in the system evolution sequence? Measurement is inherently involved in hardware specification synthesis, sizing and cost evaluation. It is employed to a limited extent during hardware requirements analysis and selection, and it emerged in importance as a significant contributor to hardware validation and performance monitoring-which is a portion of effectiveness evaluation. Weare only beginning to consider serious and systematic measurement as it concerns software verification, validation, and cost and effectiveness evaluation. In fact, we are beginning to use the same terminology for hardware and software that was used in the early 1960's for defense and space systems-which were predominately noncomputer hardware. "Requirements for AVAILABILITY of Computing System Facilities"59 provides an excellent example, with its use of reliability, maintainability, repairability and recoverability. Where should measurement be utilized in the evolution sequence? It probably· has an appropriate use in most, if not almost all, components of the sequence. In particular, system verification, validation, and cost and effectiveness evaluation-as well as reliability and its fellow ilities 59-have no real meaning without measurement. 
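As a small concrete illustration of why the "ilities" presuppose measurement, the sketch below computes a steady-state availability figure from logged operating and repair intervals. The MTBF/MTTR definition used here is the conventional one and the sample numbers are invented; neither is taken from the GUIDE report cited above.

# Minimal sketch: availability estimated from measured intervals.
# MTBF = mean time between failures, MTTR = mean time to repair;
# steady-state availability = MTBF / (MTBF + MTTR). Data below is invented.

uptime_hours = [160.0, 72.0, 200.0, 48.0]   # operating intervals between failures
repair_hours = [4.0, 1.5, 6.0, 2.5]         # repair time after each failure

mtbf = sum(uptime_hours) / len(uptime_hours)
mttr = sum(repair_hours) / len(repair_hours)
availability = mtbf / (mtbf + mttr)

print(f"MTBF = {mtbf:.1f} h, MTTR = {mttr:.1f} h, availability = {availability:.3f}")
# Without the two lists of measurements, an availability requirement of this
# kind could not be checked at all, which is the point made in the text above.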
Measurement and system operation A computer system operation has COMPOSITION and an OBJECTIVE, as well as being subject to MANAGEMENT. As a guide to discussion and thought, a useful-but not unique-division of system operation into more ba&ic elements is now described. A given computer system, however, may not involve all of these elements. COMPOSITION of a computer system concerns what constitutes the system. The main component, by tradition, has been computer HARDWARE-which may involve input, memory, processing, output, communication or special purpose equipment. Since the means for communicating with that equipment currently costs from one to ten times as much as the hardware, the main component really is SOFTWARE-which may involve input, storage and retrieval, operating, application, simulation, output or communication program packages. The system may also contain Measurement of Computer Systems FIRMWARE, which is either soft hardware or hard software-such as a microprogram, and PERSONNEL. How to operate and use the system is covered by the operating PROCEDURE. The system aspects include all two way INTERFACES such as hardware-software, all three way INTERFACES such as firmwarepersonnel-procedure, all four way INTERFACES such as hardware-software-personnel-procedure, and the five way INTERFACE of hardware-software-firmwarepersonnel-procedure. What the computer system does primarily-although it may do many things concurrently or sequentially-is the system OBJECTIVE. DATA MANAG EMENT emphasizes storage and retrieval of data by the system. Operating upon data by the system is the focus of DATA PROCESSING. COl\1l\1AND AND CONTROL stresses input and output of data by the system, and decisions aided by the system. As observed by Boehm, an alternative view is that all three types of systems aid the making of decisions: data management systems provide the least aid, data processing systems provide more aid, and command and control systems provide the most aid. The distinction among these also depends upon what the environment is and who the user is-data management or command and control systems are frequently called information systems. In addition, the same system- or a portion of it-might frequently be utilized for more than one objective. Computer system MANAGEMENT involves system administration and supervision. PLANNING is projecting the system's future. Getting operations together and focused constitutes COORDINATION, and keeping operations together and directed constitutes CONTROL. REVIEW provides an assessment. of the past and present, while TRAINING provides system operators. Finally, USER INTERACTION concerns system calibration and acceptance by the user. Measurement has traditionally been employed on computer hardware and personnel, has begun to be employed on software and firmware, and may someday be employed on procedure and interfaces. It has been applied in data management and data processing, but should also be applied in command and control. As for management in general, measurement is only beginning to be utilized in computer system planning, coordination, control, review, training and user interaction. STRUCTURE FOR EFFECTIVENESS EVALUATION Consideration of the need for, activity involving, and structure for measurement implies that an impor- 673 tant unsolved problem for the 1970's is the evaluation of computer system effectiveness. 
That this is true for library information systems is explicitly stated in a recent report by the National Academy of Sciences Computer Science and Engineering Board,60 and that it is true for computer systems in general is implicitly stated in a recent report by GUIDE International.59 As McLean observed,57 we are like Oscar Wilde's cynic: "A man who knows the price of everything, and the value of nothing."

Effectiveness evaluation determines how well the system performs in terms of operational time, quality and impact upon the user. It has both an internal or inwardly oriented aspect-which determines how well the system responds to any need, and is more efficiency than effectiveness-and an external or outwardly oriented aspect-which determines how well the system responds to the actual need, and is truly effectiveness. The point of view that is taken as to what effectiveness is and how it should be evaluated is also extremely important. Viewpoints of the user and his management should be considered, as well as viewpoints of the system and its management. In terms of both aspects and viewpoints, effectiveness evaluation is much broader than mere performance measurement. Evaluating the impact of the system upon a user is essentially the reverse of system design or selection, which evaluates the impact of the user upon a potential or real system. In order to accomplish this, it is necessary to evaluate how well the promises of system design or selection are fulfilled by system operation. An informative, as well as interesting, exercise would be the real impact evaluation of applications such as those surveyed in 1965 by Rhodes,61 Ramo,62 Gerard,63 Maloney,64 McBrier,65 Merkin and Long,66 Gates and Pickering,67 Ward,68 Baran69 and Schlager.70

Based upon Air Force and NASA experience, Sackman34 provides a thorough treatment of computer system development and testing. This treatment includes:

• A survey of system engineering, human factors, software and operations research points of view on testing and evaluation-all of which are implicitly oriented inwardly toward the system, rather than outwardly toward the user.
• A description of test levels, objectives, phasing within development and operation, approach and chronology.
• A discussion of the analogy between scientific method and system development-during which, a sequence of increasingly specific hypotheses is posed and tested, as the implicit promises of design become explicit promises during development and explicit performance during operation.
• A summary of the philosophical roots of this analogy and approach.
• A short bibliography.

It constitutes an excellent contribution to effectiveness evaluation, as well as a firm foundation for the framework of Bell, Boehm and Watson,55 but more is needed.

[Figure 1-Structure for evaluation of data management or command and control system effectiveness]
In addition, almost all library system effectiveness evaluation has been centered around-if not actually restricted to-variations of two simple ratios, called relevance and recall. And Findings 1 and 2 in the National Academy of Sciences report60 state that much more is needed. The complexity and importance of effectiveness evaluation combine to require a significantly broader and deeper, as well as more meaningful, structure. Most of the significance and ultimate payoff associated with computer systems involve the external environment and aspects of the system, from various points of view. Despite that fact, the preponderance of effectiveness evaluation has not focused upon such aspects from the appropriate points of view.34,71-79 A structure for computer system effectiveness evaluation is proposed, as both a step toward fulfilling that need and an elaboration of the structure for compumetrics. Figure 1 contains a general version of the structure for data management or command and control systems, and Figure 2 contains a general version of the structure for data processing systems. The graphic presentations of the figures are complemented by the corresponding verbal descriptions-which employ words in their usual nontechnical sense. Effectiveness evaluation of a computer system might require a combination of the structures in Figures 1 and 2, since the system might frequently be utilized for more than one objective. In addition, the entire structure might not be of interest for a given system. An initial indication of activity involving computer system effectiveness evaluation is then summarized. Finally, selected papers that illustrate such activity are briefly discussed. This summary and discussion serve as a background against which to view the proposed structures.

Evaluation of data management or command and control systems

In Figure 1, there are three main categories of characteristics-FLOW, EFFECTIVENESS and VIEWPOINTS-all of which reside within an ECONOMIC AND POLITICAL ENVIRONMENT. FLOW characteristics (I-XI) involve the flow of data and need for data, from a user and his task through the system unit and center back to the user and his task. Those characteristics (XII-XXIII) which describe how well the flow of data satisfies the need for data-both internal and external to the system-comprise EFFECTIVENESS. VIEWPOINTS contain the various points of view (XXIV-XXXV) regarding the flow and its effectiveness. All of these characteristics are embedded within an ECONOMIC AND POLITICAL ENVIRONMENT, whose influence is sometimes explicit and sometimes implicit yet always present.

A USER (I) of the system and a TASK (II) which he is performing jointly generate a need for data, called USER DATA NEED (III). To satisfy this need, the user contacts either the appropriate outlet of the system-SYSTEM UNIT (IV)-or other sources for data-OTHER USER SOURCES (V). The unit essentially becomes a user now and contacts either the SYSTEM CENTER (VII) or OTHER UNIT SOURCES (VIII), in order to satisfy its UNIT DATA NEED (VI). DATA (IX) is then output by the system or other sources to the user for performance of his task. Finally, there may also be USER DATA INPUT (X)-such as data generated by the user in his task or by user management regarding an impending change in its basic need-by the user to the unit, and UNIT DATA INPUT (XI)-such as data generated by the unit or by unit management regarding an impending change in its basic need-by the unit to the system.
Operational characteristics of the unit and center in terms of time-how quickly or how often-are grouped under UNIT OUTPUT TIME (XII) and CENTER OUTPUT TIME (XIII), those in terms of quality-how well or how completely-are grouped under UNIT OUTPUT QUALITY (XIV) and CENTER OUTPUT QUALITY (XV), and those in terms of impact-how responsively or how significantly-are grouped under UNIT OUTPUT IMPACT (XVI) and CENTER OUTPUT IMPACT (XVII). Time characteristics emphasize the internal aspects of the system and impact characteristics emphasize the external aspects of the system, while quality characteristics emphasize both the internal and external aspects of the system. In addition, time is the easiest to measure objectively as well as the least meaningful ... quality is more difficult to measure objectively than time and less difficult to measure objectively than impact, as well as more meaningful than time and less meaningful than impact ... impact is the most difficult to measure objectively as well as the most meaningful. Effectiveness may be viewed as the average, over all users and tasks, of the effectiveness for specific user and task combinations. There may also be USER INPUT TIME (XVIII) and UNIT INPUT TIME (XIX)-to indicate how quickly or how often the user inputs data to the unit and the unit inputs data to the center, USER INPUT QUALITY (XX) and UNIT INPUT QUALITY (XXI)-to indicate how well or how completely these were accomplished, and USER INPUT IMPACT (XXII) and UNIT INPUT IMPACT (XXIII)-to indicate how responsively or how significantly these were accomplished. In this case, the user is serving the system and the above roles are reversed. Internal aspects of the user are focused upon by time and external aspects of the user are focused upon by impact, while both internal and external aspects of the user are focused upon by quality.

What we mean by effectiveness, as well as how we evaluate it, will vary according to our point of view. The task specific viewpoint of the user toward the unit is USER AND TASK (XXIV), that of the unit toward the user is UNIT, USER AND TASK (XXV), that of the unit toward the center is UNIT, CENTER AND TASK (XXVI), and that of the center toward the unit is CENTER, UNIT AND TASK (XXVII). USER GENERAL (XXVIII), UNIT AND USER GENERAL (XXIX), UNIT AND CENTER GENERAL (XXX), and CENTER AND UNIT GENERAL (XXXI) represent general viewpoints of the user for the unit, the unit for the user, the unit for the center, and the center for the unit.
Finally, the viewpoint of user management toward the unit constitutes USER MANAGEMENT (XXXII), that of unit management toward the user constitutes UNIT MANAGEMENT AND USER (XXXIII), that of unit management toward the center constitutes UNIT MANAGEMENT AND CENTER (XXXIV), and that of center management toward the unit constitutes CENTER MANAGEMENT AND UNIT (XXXV). Internal aspects of the system are stressed in center viewpoints and external aspects of the system are stressed in user viewpoints, while both internal and external aspects of the system are stressed in unit viewpoints. Task specific viewpoints are the easiest to measure objectively, general viewpoints are more difficult to measure objectively than task specific viewpoints and less difficult to measure objectively than management viewpoints, and management viewpoints are the most difficult to measure objectively-the meaningfulness of these depends, of course, upon point of view.

Evaluation of data processing systems

[Figure 2-Structure for evaluation of data processing system effectiveness]

Figure 2 contains the characteristics of FLOW (I-IX), EFFECTIVENESS (X-XXI) and VIEWPOINTS (XXII-XXXIII)-all being surrounded by an ECONOMIC AND POLITICAL ENVIRONMENT. Since it differs from Figure 1 only in terms of the basic flow for data and need, a brief description is now presented. A USER (I) and his TASK (II) jointly generate USER PROGRAMMING NEED (III) or USER PROCESSING NEED (V). To satisfy this need, the user contacts the SYSTEM PROGRAMMING UNIT (IV) or SYSTEM PROCESSING CENTER (VI)-which is also contacted to satisfy USER AND UNIT PROCESSING NEED (V). PROCESSED DATA (VII) is then output to the user for performance of his task. There may also be USER PROGRAMMING INPUT (VIII) by the user to the unit, or USER AND UNIT PROCESSING INPUT (IX) by the user and unit to the center. Operational characteristics of the unit and center are grouped under UNIT OUTPUT TIME (X) and CENTER OUTPUT TIME (XI), UNIT OUTPUT QUALITY (XII) and CENTER OUTPUT QUALITY (XIII), and UNIT OUTPUT IMPACT (XIV) and CENTER OUTPUT IMPACT (XV). There may also be USER INPUT TIME (XVI) and UNIT INPUT TIME (XVII), USER INPUT QUALITY (XVIII) and UNIT INPUT QUALITY (XIX), and USER INPUT IMPACT (XX) and UNIT INPUT IMPACT (XXI). Task specific viewpoints are those of USER AND TASK (XXII), UNIT, USER AND TASK (XXIII), UNIT, CENTER AND TASK (XXIV), and CENTER, UNIT AND TASK (XXV). USER GENERAL (XXVI), UNIT AND USER GENERAL (XXVII), UNIT AND CENTER GENERAL (XXVIII), and CENTER AND UNIT GENERAL (XXIX) represent general viewpoints. Finally, management viewpoints are given by USER MANAGEMENT (XXX), UNIT MANAGEMENT AND USER (XXXI), UNIT MANAGEMENT AND CENTER (XXXII), and CENTER MANAGEMENT AND UNIT (XXXIII).

Some modification and considerable refinement may be required to employ one of these structures on an actual computer system. The structures do, however, indicate important considerations for evaluating the effectiveness of a computer system. In addition, they are considerably more comprehensive than current structures, and provide a guide toward their own modification and refinement.

Activity involving evaluation

This introduction to compumetrics concludes with an initial indication of activity involving computer system effectiveness evaluation prior to 1970, and a brief description of selected papers which illustrate the activity. That indication and description provide a context in which to consider the structures given above.
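Before turning to that activity, one small illustration of how the proposed structure might be used in practice: the sketch below records effectiveness observations keyed by flow element, dimension (time, quality or impact) and viewpoint, and averages them over user and task combinations as suggested earlier. The 0-to-1 scoring scale and all names in the data are assumptions made for the example, not part of the proposed structure.

# Illustrative tabulation of observations against the proposed structure.
# Each observation scores one characteristic (e.g., UNIT OUTPUT TIME) from one
# viewpoint for one user-task combination, on an assumed 0-1 scale.
from collections import defaultdict

observations = [
    # (user, task, element,       dimension, viewpoint,       score)
    ("u1", "t1", "UNIT OUTPUT",   "TIME",    "USER AND TASK", 0.9),
    ("u1", "t1", "UNIT OUTPUT",   "QUALITY", "USER AND TASK", 0.7),
    ("u2", "t3", "UNIT OUTPUT",   "TIME",    "USER AND TASK", 0.6),
    ("u2", "t3", "CENTER OUTPUT", "IMPACT",  "USER GENERAL",  0.8),
]

def average_effectiveness(obs, element, dimension, viewpoint):
    """Average one characteristic over all user-task combinations."""
    by_pair = defaultdict(list)
    for user, task, el, dim, vp, score in obs:
        if (el, dim, vp) == (element, dimension, viewpoint):
            by_pair[(user, task)].append(score)
    if not by_pair:
        return None
    pair_means = [sum(v) / len(v) for v in by_pair.values()]
    return sum(pair_means) / len(pair_means)

print(average_effectiveness(observations, "UNIT OUTPUT", "TIME", "USER AND TASK"))
# -> 0.75, the mean of the u1/t1 and u2/t3 scores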
Utilizing the unpublished bibliographies of Bell, 1 lVleasurement of Computer Systems Miller,2 and Robinson3 and the author, each processing its own bias and deficiency, a preliminary characterization of effectiveness evaluation activity before 1970 was obtained. Those pioneering papers that appeared prior to 1963 and treated the general topic were included, but those papers that emphasized mathematical modeling or computer simulation-the majority of which were more concerned with mathematics than with measurement-were not included. There were 234 separate references remaining after duplicate listings within these bibliographies were eliminated. The number (and approximate percentage) of documents by year were: • • • • • • • • • • • • • • • 1945-1 (0%) 1948-1 (0%) 1950-1 (0%) 1953-1 (0%) 1960-7 (3%) 1961-6 (2%) 1962-9 (4%) 1963-7 (3%) 1964-14 (6%) 196.1:>-8 (3%) 1966-13 (6%) 1967-23 (10%) 1968-31 (14%) 1969-62 (27%) 1970-50 (22%) 677 are summarized by Sackman :78 • All five employ computer time and some measure of man time. • All five employ some measure of program quality. • Gold employs three additional measures of quality, and Smith employs one additional measure of quality. • Gold and Schatzoff, Tsao and Wiig employ a measure of cost. • All five employ-in an implicit, rather than explicit, manner-both system and user viewpoints. Finally, Shemer and Heying79 include both internal and external aspects of effectiveness in the design model for a system, which is to perform timesharing as well as batchprocessing-and then compare operational system data with the design model. ACKNOWLEDGMENTS The critical review of this paper and constructive suggestions for its improvement by Thomas Bell, Barry Boehm, Richard Hamming, Robert Patrick, Harold Petersen and Louis Robinson are gratefullv acknowledged. REFERENCES These numbers and percentages are, of course, affected by all pioneering papers having been counted at the lower end and by some recent papers having possibly been missed at the upper end. Nevertheless, they do exhibit a general trend in the variation of activity over the period. A serious characterization of such activity awaits the compilation and annotation of a comprehensive bibliography on measurement of computer systems-by categories in the structures for measurement and effectiveness evaluation, as well as by year. An elementary structure for evaluation of command and control system effectiveness-in its external form as well as its internal form-is provided by Edwards.71 Both Rosin72 and Bryan73 consider time and quality characteristics of data processing system performance for a large variety of users, the former on a batchprocessing system and the latter on a timesharing system. 
Five experiments for comparing the performance of a timesharing system with that of a batchprocessing system-Gold,74 Sackman, Erikson and Grant,75 Schatzoff, Tsao and Wiig,76 and Smith77- 1 T E BELL Computer system performance bibliography Unpublished 2 E F MILLER JR Bibliography on techniques of computer performance analysis Unpublished 3 L ROBINSON Bibliography on data processing performance evaluation Unpublished 4 E H HERBST N METROPOLIS N B WELLS Analysis of problem codes on the MANIAC Mathematical Tables and Other Aids to Computation Vol 9 No 49 1945 pp 14-20 5 C E SHANNON A mathematical theory of communication Bell System Technical Journal Vol 27 1948 p 379 6 R WHAMMING Error detecting and error correcting codes Bell System Technical Journal Vol 29 1950 p 147 7 H R J GROSCH High speed arithmetic: The digital computer as a research tool Journal of the Optical Society of America Vol 43 No 4 1953 pp 306-310 8 P R BAGLEY Item 2 of two think pieces: Establishing a measure of 678 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Fall Joint Computer Conference, 1972 capability of a data processing system Communications of the ACM Vol 3 No 11960 pI A J BLACK SAVDAT: A routine to save input data in simulator tape format Report FN-GS-151 System Development Corporation 1960 E F CODD Multiprogram scheduling: Parts I-IV Communications of the ACM Vol 3 Nos 6 and 7 1960 pp 347-350 and 413-418 L FEIN A figure of merit for evaluating a control computer system Automatic Control 1960 I FLORES Computer time for address calculation sorting Journal of the Association for Computing Machinery Vol 7 No 4 1960pp 389-409 M E MARON J L KUHNS On relevance probabilistic indexing and information retrieval Journal of the Association for Computing Machinery Vol 7 No 3 1960 pp 389-409 H NAGLER An estimation of the relative efficiency of two internal sorting methods Communications of the ACM Vol 3 No 111960 pp 618-620 R S BARTON A new approach to the functional design of a digital computer Proceedings of 1961 Fall Joint Computer Conference AFIPS Press 1961 pp 393-396 I FLORES Analysis of internal computer sorting Journal of the Association for Computing Machinery Vol 8 No 11961 pp 41-80 G GORDON A general purpose systems simulation program Proceedings of 1961 Spring Joint Computer Conference AFIPS Press 1961 pp 87-98 H M GURK J MINKER The design and simulation of an information processing system Journal of the Association for Computing Machinery Vol 8 No 2 1961 pp 260-271 W A HOSIER Pitfalls and safeguards in real-time digital systems with emphasis on programming IRE Transactions on Engineering Management 1961 J JAFFE M I BERKOWITZ The development and uses of a functional model in the simulation of an information-processing system Report SP-584 System Development Corporation 1961 C W ADAMS Grosch's law repealed Datamation Vol 8 No 7 1962 pp 38-39 F R BALDWIN W B GIBSON C B POLAND A multiprocessing approach to a large computer system IBM Systems Journal Vol 1 No 11962 pp 64-70 0 DOPPING Test problems used for evaluation of computers BIT Vol 2 No 4 1962pp 197-202 J A GOSDEN R C SISSON Standardized comparisons of computer performance Proceedings of 1962 IFIPS Congress 1962 pp 57-61 25 T N HIBBARD Some combin.atorial properties of certain trees with applications to searching and sorting Journal of the Association for Computing Machinery Vol 9 No 1 1962 pp 13-28 26 R L PATRICK Let's measure our own performance Datamation Vol 8 No 6 1962 27 R L SAUDER A general test data generator for COBOL Proceedings of 1962 Spring Joint 
Computer Conference AFIPS Press 1962 pp 371-324 28 R H SIMONSEN Simulation of a computer timing device Communications of the ACM Vol 5 No 7 1962 p 383 29 E C SMITH A directly coupled multiprocessing system IBM Systems Journal Vol 2 No 3 1962 pp 218-229 30 M V RATYNSKI The Air Force computer program acquisition concept Proceedings of 1967 Spring Joint Computer Conference AFIPS Press 1967 pp 33-44 31 L V SEARLE G NEIL Configuration management of computer programs by the Air Force: Principles and documentation Proceedings of 1967 Spring Joint Computer Conference AFIPS Press 1967 pp 45-49 32 B H LIEBOWITZ The technical specification-Key to management control of computer programming Proceedings of 1967 Spring Joint Computer Conference AFIPS Press 1967 pp 51-59 33 M S PILIGIAN J C POKORNEY Air Force concepts for the technical control and design verification of computer programs Proceedings of 1967 Spring Joint Computer Conference AFIPS Press 1967 pp 61-66 34 H SACKMAN Computers system science and evolving society John Wiley & Sons Inc 1967 35 Proceedings of SHARE XXIII Share Inc 1969 36 Proceedings of SHARE XXXV Share Inc 1970 37 G HALL Editor Computer measurement and evaluation: Selected papers from the SHARE project SHARE Inc 1972 38 S CROOKE J MINKER KWIC index and bibliography on computer systems simulation and evaluation Computer Science Center University of Maryland 1969 39 H R MENCK Editor Bibliography on measurement of computer systems ACM Los Angeles Chapter Special Interest Committee on Measurement of Computer Systems Unpublished 40 A F GOODMAN Editor Computer science and statistics: Fourth annual symposium on the interface-An interpretative summary Western Periodicals Company 1971 41 A F GOODMAN The interface of computer science and statistics Naval Research Logistics Quarterly Vol 18 No 2 1971 pp 215-229 l\1easurement of Computer Systems 42 M E VAN VALKENBURG et al Editors Proceedings of fifth annual Princeton conference on information sciences and systems Princeton University 1971 43 U 0 GAGLIARDI Editor Workshop on system performance evaluation ACM Special Interest Group on Operating Systems 1971 44 Proceedings of 1971 Spring Joint Computer Conference AFIPS Press 1971 45 M 0 LOCKS Editor Proceedings of computer science and statistics: Fifth annual symposium on the interface Western Periodicals Company 1972 46 Proceedings of 1971 Fall Joint Computer Conference AFIPS Press 1971 47 W F FREIBERGER Editor Statistical computer performance evaluation Academic Press 1972 48 J MADAMS J B JOHNSON R H STARKS Editors Proceedings of an ACM conference on proving assertions about programs ACM Special Interest Groups on Programming Languages and on Automata and Computability Theory 1972 49 F GRUEN BERGER Editor Effective versus efficient computing Publisher to be selected 50 M E VAN VALKENBURG et al Editors Proceedings of sixth annual Princeton conference on information sciences and systems Princeton University 1972 51 Proceedings of 1972 Spring Joint Computer Conference AFIPS Press 1972 52 W C HETZEL Editor Program testing methods Prentice-Hall Inc 1972 53 R G CANNING Editor Savings from performance monitoring EDP Analyzer Vol 10 No 9 1972 54 Proceedings of 1972 Fall Joint Computer Conference AFIPS Press 1972 55 T E BELL B W BOEHM R A WATSON Framework and initial phases for computer performance improvement Proceedings of 1972 Fall Joint Computer Conference AFIPS Press 1972 56 A F GOODMAN Data modeling and analysis for users-A guide to the perplexed Proceedings of 1972 Fall Joint Computer 
Conference AFIPS Press 1972 57 E R MACLEAN Assessing returns from the data processing investment Effective versus Efficient Computing Publisher to be selected (see 49) 58 J H WINBROW A large-scale interactive administrative system IBM Systems Journal Vol 10 No 4 1971 pp 260-282 59 Requirements for AVAILABILITY of computing facilities User Strategy Evaluation Committee GUIDE International Corporation 1970 60 Libraries and information technology 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 679 Information Systems Panel Computer Science and Engineering BoardNational Academy of Sciences 1972 I RHODES The mighty man-computer team Proceedings of 1965 Fall Joint Computer Conference Part 2 AFIPS Press 1965 pp 1-4 S RAMO The computer and our changing society Proceedings of 1965 Fall Joint Computer Conference Part 2 AFIPS Press 1965 pp 5-10 R W GERARD Computers and education Proceedings of 1965 Fall Joint Computer Conference Part 2 AFIPS Press 1965 pp 11-16 J V MALONEY JR Computers: The physical sciences and medicine Proceedings of 1965 Fall Joint Computer Conference Part 2 AFIPS Press 1965 pp 17-19 C R McBRIER Impact of computers on retailing Proceedings of 1965 Fall Joint Computer Conference Part 2 AFIPS Press 1965 pp 21-25 W I MERKIN R J LONG The application of computers to domestic and international trade Proceedings of 1965 Fall Joint Computer Conference Part 2 AFIPS Press 1965 pp 27-31 C R GATES W H PICKERING The role of computers in space exploration Proceedings of 1965 Fall Joint Computer Conference Part 2 AFIPS Press 1965 pp 33-35 J A WARD The impact of computers on the government Proceedings of 1965 Fall Joint Computer Conference Part 2 AFIPS Press 1965 pp 37-44 P BARAN Communication computers and people Proceedings of 1965 Fall Joint Computer Conference Part 2 AFIPS Press 1965 pp 45-50 K J SCHLAGER The impact of computers on urban transportation Proceedings of 1965 Fall Joint Computer Conference Part 2 AFIPS Press 1965 pp 51-55 N P EDWARDS On the evaluation of the cost-effectiveness of command and control systems Proceedings of 1964 Spring Joint Computer Conference AFIPS Press 1964 pp 211-218 R F ROSIN Determining a computing center environment Communications of the ACM Vol 8 No 7 1965 pp 463-468 G E BRYAN JOSS: 20,000 hours at a console-A statistical evaluation Proceedings of 1967 Fall Joint Computer Conference AFIPS Press 1967 pp 769-777 M GOLD Time-sharing and batch-processing: An experimental comparison of their values in a problem-solving situation Communications of the ACM Vol 12 No 5 1969 pp 249-259 H SACKMAN W J ERIKSON E E GRANT Exploratory experimental studies comparing online and offline programming performance Communications of the ACM Vol 11 No 11968 pp 3-11 680 Fall Joint Computer Conference, 1972 76 M SCHATZOFF R TSAO R WnG An experimental comparison of time sharing and batch processing Communications of the ACM VollO No 5 1967 pp 261-265 71 L B SMITH A comparison of batch processing and instant turnaround Communications of the ACM VollO No 8 1967 pp 495-500 78 H SACKMAN Time-sharing versus batch-processing: The experimental evidence Proceedings of 1968 Spring Joint Computer Conference AFIPS Press 1968 pp l-lO 79 J E SHE MER D W HEYING Performance modeling and empirical measurements in a system designed for batch and time-sharing users Proceedings of 1969 Fall Joint Computer Conference AFIPS Press 1969 pp 17-26 A highly parallel computing system for information retrieval* by l3EHROOZ P ARHAMI University of California Los Angeles, California PARALLELISM AND INFORMATION 
RETRIEVAL INTRODUCTION The tremendous expansion in the volume of recorded knowledge and the desirability of more sophisticated retrieval techniques have resulted in a need for automated information retrieval systems. However, the high cost, in programming and running time, implied by such systems has prevented their widespread use. This high cost stems from a mismatch between the problem to be solved and the conventional architecture of digital computers, optimized for performing serial operations on fixed-size arrays of data. It is evident that programming and processing costs can be reduced substantially through the use of special-purpose computers, with parallel-processing capabilities, optimized for non-arithmetic computations. This is true because the most common and time-consuming operations encountered in information retrieval applications (e.g., searching and sorting) can make efficient use of parallelism. In this paper, a special-purpose highly parallel system is proposed for information retrieval applications. The proposed system is called RAPID, Rotating Associative Processor for Information Dissemination, since it is similar in function to a conventional byteserial associative processor and uses a rotating memory device. RAPID consists of an array processor used in conjunction with a head-per-track disk or drum memory (or any other circulating memory). The array processor consists of a large number of identical cells controlled by a central unit and essentially acts as a filter between the large circulating memory and a central computer. In other words, the capabilities of the array processor are used to search and mark the file. The relevant parts of the file are then selectively processed by the central computer. Information retrieval may be defined as selective recall of stored knowledge. Here, we do not consider informa tion retrieval systems in their full generality but restrict ourselves to reference and document retrieval systems. Reference (document) retrieval is defined as the selection of a set of references (documents) from a larger collection according to known criteria. The processing functions required for information retrieval are performed in three phases: 1. Translating the user query into a set of search specifications described in machine language. 2. Searching a large data base and selecting records that satisfy the search criteria. 3. Preparing the output; e.g., formatting the records, extracting the required information, and so on. Of these three phases, the second one is by far the most difficult and time-consuming; the first one is straightforward and the third one is done only for a small set of records. The search phase is time-consuming mainly because of the large volumes of information involved since the processing functions performed are very simple. This suggests that the search time may be reduced by using array processors. Array processing is particularly attractive since the search operations can be performed as sequences of very simple primitive operations. Hence, the structure of each processing cell can be made very simple which in turn makes large arrays of cells economically feasible. Associative memories and processors constitute a special class of array processors, with a large number of small processing elements, which can perform simple pattern matching operations. Because of these desirable characteristics, several proposals have been made for * This research was supported by the U.S. 
Office of Naval Research, Mathematical and Information Sciences Division, Contract No. NOO014-69-A-0200-4027, NR 048-129. 681 682 Fall Joint Computer Conference, 1972 using associative devices in information retrieval applications. Before proceeding to review several attempts in this direction, it is appropriate to summarize some properties of an ideal information retrieval 'system to provide a basis for evaluating different proposals. PI. Storage medium: Large-capacity storage is used which has modular growth and low cost per bit. Variable-length records are P2. Record format: allowed for flexibility and storage efficiency. P3. Search speed: Fast access to a record is possible. The whole data base can be searched in a short time. P4. Search types: Equal-to, greater-than, less-than, and other common search modes are permitted. P5. Logical search: Combination of search results is possible; e.g., Boolean and threshold functions of simple search results. Some proposalsl-3 consider using conventional associative memories with fixed word-lengths and, hence, do not satisfy P2. While these proposals may be adequate for small special-purpose systems, they provide no acceptable solution for large information retrieval systems. With the present technology, it is obviously not practical to have a large enough associative memory which can store all of the desired information1, 2 without violating PI. Using small associative memories in conjunction with secondary storage3 results in considerable amounts of time spent for loading and unloading the associative memory, violating P3. Somewhat more flexible systems can be obtained by using better data organizations. In the distributed-logic memory,4,5 data is organized as a single string of symbols divided into substrings of arbitrary lengths by delimiters. Each symbol and its associated control bits are stored in, and processed by, a cell which can communicate with its two neighbors and with a central control unit. In the association-storing processor,6 the basic unit of data is a triple consisting of an ordered pair of items (each of which may be an elementary item or a triple) and a link which specifies the association between the items. Very complex data structures can be represented conveniently with this method. Even though these two systems provide flexible record formats, they do not satisfy PI. It is evident that with the present technology, an information retrieval system which satisfies both PI and P3 is impractical. Hence, trading speed for cost through the use of circulating memory devices seems to provide the only acceptable solution. Delay-line associative devices that have been proposed 7,8 are not suitable for large information retrieval systems because of their fixed word-lengths and small capacities. The use of head-pertrack disk or drum memories as the storage medium appears to be very promising because such devices provide a balanced compromise between PI and P3. An early proposal of this type is the associative file processor 9 which is a highly specialized system. Siotnick10 points out, in more general terms, the usefulness of logic-per-track devices. Parkerll specializes Slotnick's ideas and proposes a logic-per-track system for information retrieval applications. DESIGN PHILOSOPHY OF RAPID The design of RAPID was motivated by the distributed-logic memory of Lee4,5 and the logic-per-track device of Slotnick. 1o RAPID provides certain basic pattern matching capabilities which can be combined to obtain more complicated ones. 
Strings, which are stored on a rotating memory, are read into the cell storage one symbol at a time, processed, and stored back (Figure 1). Processing strings one symbol at a time allows efficient handling of variable-length records and reduces the required hardware for the cells. Figure 2 shows the organization of data on the rotating memory. Each record is a string of symbols from an alphabet X, which will not be specified here. It is assumed that members of X are represented by binary vectors of length N. Obviously, each symbol must have some control storage associated with it to store the search results temporarily. One control bit has proven to be sufficient for most applications even though some operations may be performed faster with a larger control field.

[Figure 1 - Overall organization of RAPID: the cells sit between a head-per-track disk and the control unit, which communicates with other systems.]

[Figure 2 - Storage of characters and records: one character consists of a state bit and an N-bit symbol; records are of variable length; the head-per-track disk carries a clock track and an empty zone that allows sufficient time (of the order of 1 microsecond) for preparing the next instruction.]

Control information for a symbol will be called its state, q ∈ {0, 1}. A symbol x and its state q constitute a character, (q, x). One of the members of X is a don't-care symbol, which satisfies any search criterion. As an example of the utility of the don't-care symbol, consider an author whose middle name is not known or who does not have one. Then, one can use it as his middle initial in order to make the author field uniform for all records. We will use the encoding 11...1 for the don't-care symbol in our implementation. In practice, it will become necessary to have other special symbols to delimit records, fields, and so on. The choice of such symbols does not affect the design and is left to the user. It should be emphasized, at this point, that RAPID by itself is only capable of simple pattern matching operations. Appropriate record formats are needed in order to make it useful for a particular information retrieval application. One such format will be given in this paper for general-purpose information retrieval applications.

The idea of associating a state with each symbol is taken from Lee's distributed-logic memory.4,5 In fact, RAPID is very similar to the distributed-logic memory in principle but differs from it in the following:

1. Only one-way communication exists between neighboring characters in RAPID. This is necessitated because of the use of a cyclic memory but results in little loss in power or flexibility.
2. The use of a cheaper and slower memory makes RAPID more economical but increases the search cycle from microseconds to milliseconds.
3. Besides match for equality, other types of comparisons such as less-than and greater-than are mechanized in RAPID.
4. Basic arithmetic capability is provided in RAPID. It allows for threshold combinations of search functions as well as conventional Boolean combinations.

With the above data organization, the problem of searching for particular sets of records reduces to that of locating substrings which satisfy certain criteria. Search for successive symbols of a string is performed one symbol per disk or drum revolution. There are at least two reasons for this design choice:

1. At any time, all the cells will be performing identical functions (looking for the same symbol).
This reduces the hardware complexity of each cell since the amount of local control is minimized and fewer input and output leads are required. 2. The alternative approach of processing a few symbols at a time fails in the case of overlapping strings. Suppose one tries to process k symbols at a time (k > 1) by providing local control for each cell in the form of a counter. Then, if the i-th symbol in the input string is matched, the cell proceeds to match the (i + 1)-st symbol. Hence, if one is looking for the pattern ABCA in the string ... DCABCABCADA ... , only one of the two patterns will be found. Also, the pattern BCAD will not be found in the above example. THE CONTROL UNIT Figure 3 shows a block diagram of RAPID which is a synchronous system operating on the disk clock tracks. The phase signal generator sequences the operations by generating eight phase signals. PHA, PHB, PHC, and PHZ are generated once every disk revolution while PHI, PH2, PH3, and PH4 are generated once every bit time (Figure 4). During PHA, the cell control register (CCR) , input symbol register (ISR) , and address 684 Fall Joint Computer Conference, 1972 ONE LINE PER CELL N+2 LINES PER CELL HEAD-PER -TRACK DISK OR DRUM • N+1 LINES PER CELL holds the instruction to be executed for one disk revolution. The function of various fields in this register will now be described. MULTIPLE RESPONSE RESOLVER (MRR) This field consists of two bits, RST and RSY. RST commands the cells to read the state bit into the current state flip-flop, CSF. RSY commands the cells to read the symbol bits into the current symbol register, CSR. LAS PHC CELLS ONE LINE PER CELL 12 LINES -I I- _ :z: a: ... II) a: Ow wOW :J I!::J ~~ ~ g oww «rna: 0 a: ~ a: !a ~ ~ ~!!O it it !!O « Write field ~§~ 0-lt:l I- ~~ uua: N+1 LINES rn~a: -I .Je:~ -IZt:l a: PHASE SIGNAL GENERATOR (PSG) N LINES Readfield .. SAZ SELECTED ADDRESS .IS ZERO This is similar to the read field and consists of WST and WSY. WST commands that the condition bit, CON (see description of condition field), replace the current state. WSY is a command to replace the current symbol by the contents of current symbol register, CSR, if CON = 1. Address selection field This field contains two bits, LAS and RAS. If the LAS bit of this field is set, the address selection register CONTROL UNIT Figure 3-Block diagram of RAPID selection register (ASR) are cleared. During PHB and PHC, these registers are loaded. Then the execution of the instruction in CCR starts. During PH3, the output character register is reset. It is loaded during PH4 and is unloaded, through G4, after a certain delay. Most parts of the control unit, namely the instruction sequencing section and the auxiliary registers which are used to load CCR, ISR, and ASR or unload OCR, are not shown in Figure 3. It should be noted, however, that these parts ·process instructions at the same time that the cells are performing their functions such that the next instruction and its associated data are ready before the next PHB signal. The system can also be controlled by a general-purpose computer which is interrupted during PHB to load the auxiliary registers with the next instruction and associated data. The arrangement of records on disk is shown in Figure 2. The N +1 bits of a character are stored on parallel tracks while the characters of a record are stored serially. One or more clock tracks supply the timing pulses for the system. 
The empty zone is provided to allow sufficient time for loading the control registers for the next search cycle. Figure 5 shows the cell control register (CCR) which ONE DISK OR DRUM REVOLUTION ONE BIT TIME PHA PHB PHC _-----'n n ... Jl"-----_ _-----In L ... J _____ _------'n L ... ~"'"'PH1 PH2 PH3 _-----'n PH4 fL ... ~ fL HZ _ _P----i Figure 4-Timing signals Parallel Computing System for Information Retrieval TABLE I-The Match Condition for the State Part of a Character (ASR) is loaded from the multiple response resolver (MRR). MRR outputs the address of the first cell with its ASF on. If the RAS bit is set, the accumulated state flip-flop, ASF, in the cells will be reset. The function of ASF will be described with the cell design. The address selection field allows the sequential readout of the tracks which contain information pertinent to a search request. .,,:tJ -m m» 60 :tJ (I) -I B.EAD§!ATE :tJ (I) READ SYMBOL :E (I) ~RITE STATE m:tJ r- o~ ~RITE SYMBOL ~~ o o o never if q = 0 if q = 1 always 1 o 1 1 1 Match field MATCH ~TATE TO 1 3: (I) MATCH~TATETO~ERO Condition field ... :E (I) r ~ .!:.OAD~R :tJ .RESET ASF ~ » Z (I) »» -1-1 Match 3: -I 0(1) 3:~ MSZ This field consists of two subfields; the state match subfield, and the symbol match subfield. These subfields specify the conditions that the state and symbol of a character must meet. If both conditions are satisfied for a particular character, the current match flip-flop (CMF) of the corresponding cell is set. The state match subfield consists of MSI and MSZ. The conditions for all combinations of these two bits are given in Table I. The symbol match subfield consists of three bits; GRT, LET, and EQT. All the symbols in the cells are simultaneously compared to the l's complement of the contents of ISR. Table II gives the conditions for all combinations of the three signals. S is the symbol in a cell and r is the l's complement of the contents of ISR. -< :!!~e; mro 6 m :tJ MSl ~ -< "':E 685 ~ ... Qm N 3: » -I (") G) X :tJ -I ." ru:!EATER !HAN m r 0 3:(1) »-< -13: (")0:1 .. xO r r m LESSIHAN 0 m EQUALI.0 r 0 LOGICAL .fUNCTION -I -I ." (I) (") (") 0 z 0 ~ (5 Z :!! m r 0 .§.ELECT~F (I) (") 0 This field specifi~s how the condition bit, CON, is to be computed from the contents of the following four flip-flops in a cell: current state flip-flop, CSF; accumulated state flip-flop, ASF; current match flip-flop, CMF; and previous match flip-flop, PMF. LOF specifies the logical function to be performed (AND if LOF= 1, OR if LOF=O). The other four bits in this field specify a subset W of the set of four control flip-flops on which the logical function is to be performed. For example, if SCS=I, then CSF E W. TABLE II-The Match Condition for the Symbol Part of a Character z -I :tJ 0 r (I) » .§.ELECT ASF (I) Match GRT LET EQT ." ." 
~ fI) m r m (I) (") 3: §.ELECT CMF (") -I (5 Z (I) "0 3: §.ElECTfMF Figure 5-The cell control register (CCR) 0 0 0 0 0 0 1 0 1 1 1 1 1 1 0 0 0 1 1 0 1 1 0 1 if S if S if S if S if S if S never Y or S < Y or S < Y or S ;: Y or S > Y or S ~ Y or S always = =a =a =a =a =a =a 686 Fall Joint Computer Conference, 1972 TO MULTIPLE RESPONSE RESOLVER CURRENT STATE FLIP-FLOP "TO PROCESSING SECTION S FROM DISK S CSF PH4 0 R ADS PHZ RAS PHZ MS1 ASF 0 R ACCUMULATED STATE FLIP-FLOP Z 0 (,) STATE MATCH STM PH3 MSZ z 0 i= CURRENT MATCH FLIP-FLOP PREVIOUS MATCH FLIP-FLOP S S 0 (,) ~ II) C 0 I- PMF CMF R C z R 0 0 FROM PROCESSING SECTION SYM SIGNAL TO SYMBOL TRACKS Figure 6-Control section of a cell As will be seen later, the cell design is such that by appropriate combinations of bits in CCR, other functions besides simple comparison can be performed. THE CELL DESIGN Each cell consists of two sections; the control section, and the processing section. Roughly speaking, the control section processes the state part of a character while the processing section operates on the symbol part. The control section (Figure 6) contains four flip-flops: current state flip-flop, CSF; accumulated state flip-flop, ASF; current match flip-flop, CMF; and previous match flip-flop, PMF. CSF contains the state of the character read most recently from the disk. ASF contains the logical OR of the states of characters read since it was reset. This flip-flop serves two purposes: finding out which tracks contain at least one character with a set state (reset by ADS during PHZ) and propagating the state information until a specified character is encountered (reset by RAS during PHZ and by CMF during PH4). CMF contains (after PH3) the result of current match. It is set if both the state and symbol of the current character meet the match specifications. Finally, PMF contains the match result for the previous character. The condition signal, CON, is a logical function of the contents of control flip-flops. The four signals SCS, SAS, SCM, and SPM select a subset of these flip-flops and the logical function signal, LOF, indicates whether the contents of selected flip-flops should be ANDed (LOF= 1) or ORed (LOF=O) together to form CON. The value of CON will replace the state of current character if the write state signal, WST, is activated. The address selection signal, ADS, is activated by the address selection decoder. This signal allows conventional read and write operations to be performed on selected tracks of the disk. It is also possible, through the multiple response resolver, to read out sequentially the contents of tracks whose corresponding ASF's are set. The processing section, shown in Figure 7, contains an N -bit adder with inputs from ISR and the current symbol register, CSR. During PHI, a symbol is read into CSR. During PH2, contents of CSR are added to contents of ISR with the result stored back in CSR. Overflow indication is stored in the overflow flip-flop, OFF. Before the addition takes place, the don't-care Parallel Computing System for Information Retrieval flip-flop, DCF, is set if CSR contains the special don'tcare symbol o. From the results of addition, it is decided whether the symbol satisfies the search specification (SYM = 1 if it does, SYM = 0 if it does not). The adder in each cell allows us to add the contents of ISR to the current symbol or to compare the symbol to the l's complement of the contents of ISR. 
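This comparison-by-addition mechanism can be made concrete with a short sketch (ours, for illustration; the symbol width N = 8 and the Python framing are assumptions, and the register names follow the paper). The cell holds the 1's complement of the comparand Y in ISR, adds it to the stored symbol S with a carry-in of 1, and reads the outcome from the carry (OFF) and the N-bit sum left in CSR.

    # Illustrative sketch (not from the paper): symbol comparison in a cell
    # by adding the 1's complement of the comparand.
    N = 8                                  # assumed symbol width in bits
    MASK = (1 << N) - 1

    def compare_by_addition(S, Y):
        isr = (~Y) & MASK                  # 1's complement of Y, held in ISR
        total = S + isr + 1                # carry into the adder forced to 1
        Z, OFF = total & MASK, total >> N  # N-bit sum (CSR) and carry (OFF)
        equal   = (OFF == 1 and Z == 0)    # S = Y
        greater = (OFF == 1 and Z != 0)    # S > Y
        less    = (OFF == 0)               # S < Y
        return equal, greater, less

    # compare_by_addition(5, 5) -> (True, False, False)
    # compare_by_addition(6, 5) -> (False, True, False)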
If we denote the current symbol by S, the contents of ISR by Y, and its l's complement by Y, then: S= Yiff S+Y +1=2N S> Y iff S + Y +1> 2N SY iff Z~O and OFF = 1 S < Y iff OFF = 0 687 Note that the carry signal into the adder is activated if anyone of the signals GRT, LET, or EQT is active. The above equations are used in the design of the circuit which computes the symbol match result, SYM (upper right corner of Figure 7). The result of symbol match is ANDed with the result of state match (STM) during PH3 to set the current match flip-flop. Finally, during PH4, the contents of CSR can be written onto the disk or put on the output bus. Since the address selection line, ADS, is active for at most one cell, no conflict on the output bus will arise. EXAMPLES OF APPLICATIONS We first give a set of 12 instructions for RAPID. These instructions perform tasks that have been found to be useful in information retrieval applications. Each instruction, when executed by RAPID, will load CCR with a sequence of patterns. These sequences of patterns are also given. We restrict our attention to search SYMBOL MATCH OVERFLOW FLIP-FLOP ~ oa: I-z zo 0- ul- S o~ 1-(1) OFF R FROM { INPUT BUS (lSR) 0 t---+------~ ADDER .•. PH2 PH2 PH2 S R 0 S ~ (I) FROM DISK • • • i5 PH2 R 0 0 l- • • • • • PH2 PH2 Or----------------------------------~ CURRENT \ SYMBOL CSR REGISTER PH4 FROM CONTROL SECTION Figure 7-Processing section of a cell 688 Fall Joint Computer Conference, 1972 instructions only. Input and output instructions must also be provided to complete the set. 7. 1. search and set 8: Find all occurrences of the symbol 8 and set their states. 2. search for 8182 ••• Sn: Find all the occurrences of the string 8182 ••• 8 n and set the state of the symbols which immediately follow Sn. 3. search for m.arked 8182 •• . 8n : Same as the previous instruction except that for a string to qualify, the state of its first symbol must be set. 4. search for m.arked 1/1 8: Search for symbols whose states are set and have the relation 1/1 with s. Then, set the state of the following symbol. Possible relations are <, ::S;, >, ~, and ¢. 5. propagate to 8: If the state of a symbol is set, reset it and set the state of the first S following it. 6. propagatei: If the state of a symbol is set, 8. 9. 10. 11. 12. reset it and set the state of the i-th symbol to its right. expand to 8: If the state of a symbol is set, set the state of all symbols following it up to and including the first occurrence of 8. expand i: If the state of a symbol is set, set the state of the first i symbols following it. contract i: If the state of a symbol is reset, reset the state of the first i symbols following it. expand i or to 8: If the state of a symbol is set, perform 7 if an 8 appears within the next i symbols; otherwise, perform 8. add 8: Add the numerical value of 8 to the numerical value of any symbol whose state is set. replace by 8: If the state of a symbol is set, replace the symbol by 8. The microprograms for these instructions are given TABLE III-Microprograms for RAPID Instructions Contents of CCR c 1 $ 1 Match Field Condition Field Write Address Field Selection State Symbol ,-ogic FF Selection L E W W l If rtf iR· S G l S S S E Q A A S S C A C P S S R 0 T T T Y S S T M M 1 Z F S S 0 0 1 0 1 0 1 1 0 1 0 0 1 $1 1 1 0 0 1 1 0 0 1 0 0 0 1 0 :; So. CIJ ~ CIJ Instruction .t:l § ~ ~ z: litO::: ~V) c CIJ ~ It- co 0 u Read Field If S T 1 search and sel. 
s 2 ~earch for sls2, •• sn R S y j=2 to n $j 1 1 1 0 0 1 0 0 0 1 0 0 0 1 3 search for markeg 5 152••• s n j=l to n $.i 1 1 1 0 0 1 0 0 0 1 0 0 0 1 4 search for marked 1s 5 pr~e.a2~te t,2 5 < 1 $ 1 1 1 0 0 1 0 0 1 0 0 0 0 1 :S 1 5 1 1 1 0 0 1 0 0 1 1 0 0 0 1 > 1 $ 1 1 1 0 0 1 0 1 0 0 0 0 0 1 ~ 1 $ 1 1 1 0 0 1 0 1 0 1 0 0 0 1 • 1 S 1 1 1 0 0 1 0 1 1 0 0 0 0 1 1 $ 1 1 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1 0 1 1 1 0 0 0 1 1 0 0 1 1 0 0 1 0 1 0 0 1 1 6 eroea2ate i i 7 exeand to s 1 xl?~nd i i 1 1 0 0 1 0 1 1 1 0 1 0 0 1 9 contract i i 1 1 0 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 1 1 1 0 1 1 0 1 0 1 0 0 1 0 1 1 1 0 1 0 0 1 8 1 S $ 1 1 1 1 1 10 exeand i gr..j.Q s i 11 ~ s 1 s 1 1 1 0 1 0 0 0 1 s 1 0 1 0 1 0 0 0 12 reg1l~e ~l. s 1 Parallel Computing System for Information Retrieval 689 If the record length is specified by two characters, we note that t1t2 ~ 8182 iff t1 > 81 or t1 = 81 and t2 ~ 82. Hence, we write the following program: search for A search for marked > propagate 1 replace by T search for A search for marked 81 search for marked ~ replace by T search and set T replace by cP propagate to p propagate 3 search for marked E RECORD LENGTH FIELD ONE INFORMATlO_N FIELD FIELD INFORMATION SEPARATOR SYMBOL FIELD END SYMBOL Figure 8-Data storage format in Table III. A blank entry in this table constitutes a don't-care condition. The entries in the repetition column specify the number of times the given patterns should be repeated. As can be seen from Table III, this set of instructions does not exploit all the capabilities of RAPID since some of the bits in the CCR assume only one value (0 or 1) for all the instructions. To illustrate the applications of RAPID, we first choose a format for the records (Figure 8). The record length field must have a fixed length in order to allow symbol by symbol comparison of the record length to a given number. The information fields can be of arbitrary lengths. The flag field contains three characters; two for holding the results of searches, and one which contains a record type flag. The Greek letters used on Figure 8 are reserved symbols and should not be used except for the purposes given in Table IV. As mentioned earlier, a special symbol, ~, is used as a don't-care symbol. It is also helpful to have a reserved symbol, T, which can be used as temporary substitute for other symbols during a search operation. Let us now consider two simple examples to show the utility of the given instruction set. Example 1. Assuming that the record length is specified by one symbol, the following program marks all the empty records whose lengths are not less than 8. This is useful when entering a new record of length 8 to find which tracks contain empty records that are large enough. search for cPTIO' expand to cJ> search for marked magnet expand 10 or to {3 contract 3 propagate to p propagate 3 search for marked v It is important to note that the record format given here serves only as illustration. Because of its generality and flexibility, this format is not very efficient in terms of storage overhead and processing speed. For any given application, one can probably design a format which is more efficient for the types of queries involved. CONCLUSION In this paper, we have described a special-purpose highly parallel system for information retrieval applicaTABLE IV-List of Reserved Symbols x p cP ~ 8 ." ~ T E 82 Example 2. 
The following program marks all nonempty records which contain in their title field, designated by TI, a word having "magnet" as its first six characters and having 3 to 10 non-blank characters after that. {3 designates the "blank" character. 0" search for A search for marked propagate to p propagate 3 search for marked 81 Indicates start of length field. Indicates end of a record. Separates name and information subfields in a field. Indicates end of a field. Designates the end of an empty record. Designates the end of a non-empty record. Is the don't-care symbol. Is used as temporary substitute for other symbols. 690 Fall Joint Computer Conference, 1972 tions. This system must be evaluated with respect to the properties of an ideal information retrieval system summarized earlier. It is apparent that RAPID satisfies P2, P4 and P5. The extent to which PI and P3 are satisfied by RAPID is difficult to estimate at the present. With respect to PI, the storage medium used has a low cost per bit. However, the cost for cells must also be considered. Because of the large number of identical cells required, economical implementation with LSI is possible. Figures 6 and 7 show that each cell has one N-bit adder, N +6 flip-flops, 6N +39 gates, and 4N +23 input and output pins. For a symbol length of N = 8 bits, each cell will require nomore than 250 gates and 60 input and output pins. The number of input and output pins can be reduced considerably at the expense of more sophisticated gating circuits (i.e., sharing input and output connections). With respect to P3, the search speed depends on the number of symbols matched. If we assume that on the average 50 symbols are matched, the matching phase will take about 70 disk revolutions (to allow for overhead such as propagation of state information and performance of logical operations on the search results). Hence, the search time for marking the tracks which contain relevant information is of the order of a few seconds. Some important considerations such as input and output of data and fault-tolerance in RAPID have not been explored in detail and constitute possible areas for future research. The interested reader may consult Reference 12 for some thoughts on these topics. ACKNOWLEDGMENTS The author gratefully acknowledges the guidance and encouragement given by Dr. W. W. Chu in the course of this study. Thanks are also due to Messrs. P. Chang, D. Patterson, and R. Weeks for stimulating discusSIOns. 
REFERENCES 1 J GOLDBERG M W GREEN Large files for information retrieval based on simultaneous interrogation of all items Large-capacity Memory Techniques for Computing Systems New York Macmillan pp 63-67 1962 2 S S YAU C C YANG A cryogenic associative memory system for information retrieval Proceedings of the National Electronics Conference pp 764-769 October 1966 3 J A DUGAN R S GREEN J MINKER WE SHINDLE A study of the utility of associative memory processors Proceedings of the ACM National Conference pp 347-360 August 1966 4 C Y LEE Intercommunicating cells, basis for a distributed-logic computer Proceedings of the FJCC pp 130-136 1962 5 C Y LEE M C PAULL A content-addressable distributed-logic memory with applications to information retrieval Proceedings of the IEEE Vol 51 pp 924-932 June 1963 6 D A SAVITT H H LOVE R E TROOP ASP; a new concept in language and machine organization Proceedings of the SJCC pp 87-102 1967 7 W A CROFUT M R SOTTILE Design techniques of a delay line content-addressed memory IEEE Transactions on Electronic Computers Vol EC-15 pp 529-534 August 1966 8 P T RUX A glass delay line content-addressable memory system IEEE Transactions on Computers Vol C-18 pp 512-520 June 1969 9 R H FULLER R M BIRD R M WORTHY Study of associative processing techniques Defense Documentation Center AD-621516 August 1965 10 D L SLOTNICK Logic per track devices Advances in Computers Vol 10 pp 291-296 New York Academic Press 1970 11 J L PARKER A logic-per-track retrieval system 'Proceedings of the IFIPS Conference pp TA-4-146 to TA-4-150 1971 12 B PARHAMI RAPID; a rotating associative processor for information dissemination Technical Report UCLA-ENG-7213 University of California at Los Angeles February 1972 The architecture of a context addressed segment-sequential storage by LEONARD D. HEALY U.S. Naval Training Equipment Center Orlando, Florida and GERALD J. LIPOVSKI and KEITH L. DOTY University of Florida Gainesville, Florida INTRQDUCTION tecture is its associative (or context) addressing capability. Search instructions are used to mark words in storage that match the specified criteria. Context addressing is achieved by making the search criteria depend upon both the content of the word being searched and the result of previous searches. For example, consider the search of a telephone directory in which each entry consists of three separate, contiguously placed words: subscriber name, subscriber address, and telephone number. The search for all subscribers named John J. Smith is a content search-a search based upon the content of a single word. The search for all subscribers named Smith who live on Elm Street is a context search-the result of the search for one word affects the search for another. Associative addressing, or more correctly, content addressing, has been attempted on discsl in which each word in the memory is a completely separate entity in such an addressing scheme. This paper shows how context addressing can be done. Words nearby a word in the storage can be searched in context, such that a successful search for one word can be made dependent on a history of successful searches on the nearby words. Strings, sets, and trees can be stored and searched in context using such a context-addressed storage. 2 More complex structures such as relational graphs can also be efficiently searched. The context-addressed disc has the following advantage over a random-accessed disc in most nonnumeric data processes. 
Large data bases can be searched, for instance, for a given string of characters. Once a string is found, data stored nearby the string on This paper presents a new approach to the problem of searching large data bases. It describes an architecture in which a cellular structure is adapted to the use of sequential-access bulk storage. This organization combines most of ,the advantages of a distributed processor with that of inexpensive bulk storage. Large data bases are required in information retrieval, artificial intelligence, management information systems, military and corporate logistics, medical diagnosis, government offices and software systems for monitoring and analyzing weather, ecological and social problems. In fact, most nonnumerical processing requires the manipulation of sizable data bases. An examination of memory costs indicates that at present the best way of storing such data bases, and the one most widely used in new computer systems, is disc storage. However, the disc is not used anywhere near its full potential. Discs are presently used as random access storages. Each word has an address which is used to select the word. However, the association of each word with a fixed location, required in a random access storage, is a disadvantage. In a fixed-head disc, each word is read by means of a read head and can be over-written by a write head. N ow,if we discard the capability to randomly address, associative addressing can be used as words are read, and automatic garbage collection can be performed as words are rewritten. Perhaps the most important feature of this archi- 691 692 Fall Joint Computer Conference, 1972 the disc track can be returned to the central processor. Only relevant data need be returned, because the irrelevant data can be screened out by context-addressed searching on the disc itself to select the relevant data. In contrast, a conventional disc will return considerable irrelevant data to the central processor to be searched. Thus, the I/O channel requirements and primary storage requirements of the computer are reduced because less data is transferred. In fact, there is a maximum number of random-accessed discs that can be serviced by a central processor because it has to search through all the irrelevant data returned by all the discs, whereas an unlimited number of context-addressed discs can be searched in parallel. Moreover, the instructions used to search the disc storage can be stored in the disc storage itself. Thus, the central processor can transfer a search program to the disc system, then run independently until the disc has found the data. The computer would be interrupted when the data was found. This will reduce the interrupt load on the computer. In this paper we therefore study the implementation of a context-addressed storage using a large number of discs. The segment-sequential storage to be studied will have the following characteristics (see Figure 1). The entire storage will store a I-dimensional array of words, called the file. From the software viewpoint, collections of words related in a data structure format are stored in a contiguous section of the file, called a record. Records can be of mixed size. From the hardware viewpoint, the file will be broken into equal-length segments and stored on fixed-head discs, one segment to a disc. 
In the time taken to rotate one disc completely, all discs can search simultaneously for a given word in the context of a data structure as directed by the user's query, marking all words satisfying the search. Words selected by such context searches can be over-written with new data in such a data structure, erased, read out to the I/O channel, or selected as instructions to be executed during the next disc rotation. Data in groups of words can be copied and moved from one part of the file to another part as the data structure is edited. In the meantime, a hardware garbage collection algorithm will collect erased words to the bottom of the file so that large aggregates of words are available to receive large records. MOTIVATION / RECOROS / { §} • • • • • • • SEGMENTS • • .. SOFTWARE MAKEUP HARDWARE PLACEMENT Figure I-Storage of records as segments The problem that leads to the system architecture proposed here is the efficient use of storage devices equivalent to large disc storages. Access to files stored on such devices is currently based upon a sequential search of the file area by reading blocks of data into the main storage of the central processor and searching it there or by use of a file index which somehow relates the file content to its physical location. Many hierarchies of searches have been devised-all efforts to solve the basic problem that the storage device is addressed by location but the data is addressed by its content. The advantage of information retrieval based upon content is well documented. 3,4,5 However, the trend has been toward application of associative-search hardware within the central computer. Content-search storages have been implemented as subsystems within a computer system ;6,7 but even in these cases, the use of the search subsystem has been closely associated with operations in the central processor. The devices fit into the storage hierarchy between the central processor and the main core storage. A typical application of a contentaddressed storage is as a cross-reference to information in main storage-the cache storage. An associative storage subsystem specifically designed for the processing useful in software applications has been proposed, 8 but even that is limited in size by the cost of the special storage hardware. Systems of the type mentioned are small, high-speed Architecture of Context Addressed Segment-Sequential Storage units. They are limited to content search and are restricted in size relative to bulk storage devices. Their application to searching of large data bases is limited to general improvement of central processor efficiency or to searching the index for.a large data base. What is needed for true context search of a large data base is an economic subsystem which can be connected to a computer and can perform context search and retrieval operations on a large data base stored within that subsystem. The approach described in this paper provides just such a subsystem. It is a semi-autonomous external device which has its own storage and control logic. The design concept is specifically oriented toward use of a large bulk storage medium instead of high-speed core storage. In addition, the processing capability of the subsystem has been expanded to include not only list processing, but also special searches such as matching data strings against templates and operations on bit strings to simulate networks of linear threshold elements useful in pattern recognition. 
The basic building block of the proposed architecture is a segmented sequential storage. The sequential storage was chosen because it provides an economically feasible way to store a large data base. In order to perform search operations on this data base, the storage must be divided into segments which can be searched in parallel. Each segment of the sequential storage must have its own processing capability for conducting such a search. This leads to a cellular organization in which each cell consists of a sequential storage segment. The segment-sequential storage has the following property. Suppose n items are compared with each other exhaustively. This requires n storage words. Thus, the total size of the storage obviously grows linearly with n. However, as the size grows, more discs are added on, but the time for a search depends only on the size of the largest disc and not on the number of discs. Thus, the time to search for each item in a query is still the same. The total time for the search grows linearly with the number of words to be compared. As a first approximation to the cost of programming, the product of storage size and search time grows as n 2 • This compares with n 3 for a conventional computer. Thus, this storage is very useful for those operations in which all the elements in one set are exhaustively compared with each other or with members of another set, especially when the set is very large. Similarly, the cost of a comparison of one element with a set of n elements grows as n 2 in a conventional processor, and as n in this architecture. The rate of growth of the cost of programming for this architecture is the same as for cellular associative memories,9 primarily because it too is a parallel cellular system. 693 Some algorithms demand exhaustive comparisons. Some of these are not used because of their extreme cost. Other algorithms abandon exhaustive comparison to be usable in the Von Neumann computer at some increase in programming complexity, loss of relevance or accuracy, or at the expense of structuring the data base so that other types of searches cannot be efficiently conducted. In view of the lower cost of an exhaustive search, this storage might· be useful for a number of algorithms which are now used for information management in the Von Neumann computer and many others which are not practical 011 that type of machine. Discs appear to be slow, but their effective rate of operation can be made very fast when they are used in parallel. A typical disc rotates at sixty revolutions per second. The segment-sequential storage will be able to execute sixty instructions per second. (Faster rates may eventually be possible with special discs, or on processors built from magnetic bubble memories, semiconductor shift registers, or similar sequential memories.) However, if one hundred fixed-head discs storing 32k words per disc are simultaneously searched, nearly two hundred million comparisons per second are performed. This is approximately the rate of the fastest processor built. This large system of 100 discs would cost about $5000 per disc for a total cost of $500,000. This cost is small compared to that of a new large computer. Thus, this architecture appears to be costeffective. This architecture is based on storage and retrieval from a segmented sequential table data structure utilizing associative addressing. This results in the following characteristics. (1) The search time is independent of the file size. 
The data content of each cell is searched in parallel; the search time depends only upon the cycle time of the individual storage segment and the number of instructions in the query. (2) The search technique is based largely upon context. Notables or cross references are required to locate data. However, there are cases where cross references can be used to advantage. (3) New data may be inserted at any place in the file. The moving of the data that follows the place of insertion to make room for the new information is performed automatically by the cells. (4) Whenever information is deleted from the file, later file entries will be moved to close the gap. Thus, the locations in the bulk storage will always be "packed" to put available storage at the end of the file area. (5) The system is a programmable processor. Since 694 Fall Joint Computer Conference, 1972 each instruction takes 1/60 second to be executed, as much processing should be done as possible for each instruction. Further, because the cell is large, the cost of the processing hardware will be amortized over many words in that cell. Thus, a large variety of rather sophisticated instructions will be used to search and edit the data. Programming with these instructions will be simpler than programming a conventional computer in assembler language. \ l CONTROLLER J ~ r Lastly, since this architecture is basically cellular, where one disc and asso,ciated control hardware is a (large) cell, the following advantages can be obtained. (1) The system is iterative. The design of one cell is repeated. The cost of design is therefore amortized over many cells. (2) The system is upward expandable. An initial system can be built with a small number of cells. As storage demands increase, more cells can be added. The old system does not have to be dis.:. carded. (3) The system is fail soft. If a cell is found to be faulty, it can be taken out of the system, which can still operate with reduced capability. (4) The system is restructurable. If several small data bases are used, the larger system can be electrically partitioned so that each block of cells stores and searches one data base independently of the other blocks. Further, several systems attached to different computers, say in a computer network, can be tied together to make one larger system. Since the basic instruction rate· is only sixty instructions per second, the time delays of data transmission through the network are generally insignificant. Thus, the larger system can operate as fast as any of its cells for most operations. j CENTRAL PROCESSOR INPUT/OUTPUT CHANNEL BROADCAST/COLLECTOR BUS ~ r r CELL CELL 14-- ••••• ---. CELL 2 N-I I CELL N Figure 2-System block diagram. CENTRAL PROCESSOR INPUT/OUTPUT CHANNEL CONTROL T- REGISTER K- REGISTER Based on these general observations, the segmentsequential storage has very promising capabilities. In the next sections, we will describe the machine organization and show some types of problems that are easily handled by this system. SYSTEM ORGANIZATION The system block diagram for the segment-sequential storage is shown in Figure 2. The system consists of a controller plus a number of identical cells. 
The controller provides the interface with an I/O channel of the central computer necessary to perform: (1) input and output i~ OPERANDS t t t _______ MICROPROGRAMS i ~ WORD LENGTH • 8R_O_A_D_C_A_S_T_/_C_O_L_L_EC_T_O_R__B_U_S__________- - Jf Figure 3-Controller block diagram Architecture of Context Addressed Segment-Sequential Storage operations between the central computer's core storage and the storage of the individual cells, and (2) search operations commanded by the central computer. Each individual cell communicates with the controller via the broadcast/collector bus and with its left and right adj acent neighbor by a direct connection. All cells are identical in structure. A more detailed diagram of the controller is shown in Figure 3. The controller appears similar to a conventional disc controller to the central computer. It performs the functions necessary to execute orders transmitted from the central computer via its I/O channel. The segment-sequential storage is thus able to perform its specialized search operations under the command of a program in the I/O channel. Intervention of the central computer is required only for initiation of a search and, perhaps, for servicing an interrupt when the search is complete. In its role in providing the interface between the I/O channel and the cells, the controller is quite different from a conventional disc controller. Instead of registers for track and head selection, this controller provides the registers required to hold the information needed by the cells in performing their specialized search operations. These registers are: (1) Instruction Register-I: This register holds the instruction which tells what function the cells should perform during the next cycle. The instruction is decoded by a read-only memory that breaks it down into microinstructions. (2) Comparand Register-C: This register holds the bit configuration representing the character being searched for. It has an extension field Q which is used when writing data into the cell storage. (3) Mask Register-K: This register holds a mask which specifies which bits of the C Register are to be considered in the search. (4) Threshold Register-T: This register holds a threshold value which allows use of search criteria other than exact match or arithmetic inequality. (5) Bit-length Register-B: This register is used to hold the number of bits in the data word. This allows the word size of the storage segments to be selected under control of the computer. A block diagram of the cell is shown in Figure 4. Each cell executes the commands broadcast by the controller and indicates the results by transmission of information to the broadcast/collector bus and also through separate signal lines to its adjacent neighbors. The C, K, T, and B Registers of the controller are duplicated in each cell. These registers are used by the BROADCAST I COL LECTOR 695 BUS C- REGISTER K- REGISTER T- REGISTER STATUS LOGIC READ HEAD WRITE HEAD SEQUENTIAL MEMORY SEGMENT Figure 4-Block diagram of cell arithmetic unit in each cell in performing the commanded operation upon its segment of the storage. The status register is used to hold composite information about the flag bits associated with individual words in the storage segment. Control logic in the cell determines what signals are passed from the cell to the broadcast/ collector bus and to adjacent cells. Each cell can transfer its entire storage contents to its neighbor. 
DATA FORMAT The storage structure of the segment-sequential storage system consists of a number of cells, each of which contains a fixed-length segment of the total sequential storage. Figure 5 depicts the arrangement of data words within one such segment. The storage segment within the cell is a sequential storage device such as a track on a drum or disc, a semiconductor shift register, or a magnetic bubble storage device. Words stored in the segment are stored sequentially, beginning at some predefined origin point. Data read at the read head is appropriately processed by the arithmetic unit and written by the write head. The information structure of the segment-sequential storage system consists of fixed-length words arranged 696 Fall Joint Computer Conference, 1972 ORIGIN START OF RECORD .,...--INDICATED BY START BIT SEQUENTIAL FILE RECORD I CIRCULATION RECORD 2 __ START OF RECORD --"'INDICATED BY START BIT Figure 5-Word arrangement in a storage segment Figure 6-Division of a file into fixed-length segments in variable-length records. The words in a record are stored in consecutive storage locations (where the location following the last storage location in a segment is the first storage location in the following segment). Thus, a record may occupy only a part of one storage segment or occupy several adjacent segments. The start of a record is indicated by a flag bit attached to the first word in the record, and an end of a record is implied by the start of the next record. Figure 6 shows how a record may be spread over several adjacent segments. Figure 7 shows an expanded view of one word in storage. The b data bits in the word are arranged serially, least significant bit first, with four flag bits terminating the word. The functions of the flag bits are: (1) S: The START bit is used to indicate the beginning of a data set (record). The search of a record begins with a word containing a START bit. (2) P: The PERMANENT bit is used for special designations. Interpretation of this bit depends upon the instruction being executed by the cell. (3) M: The MATCH bit is used to mark words which satisfy the search criteria at each step in the context search operations. (4) X: The X bit is used to mark deleted words. Words so marked are ignored and are eventually overlaid in an automatic storage compression scheme. OPERATIONAL CONCEPTS The basic operation in context searching is a search for records which satisfy a criterion dependent upon both content and the result of previous searches. As an ~ • DATA BITS ~WORD Figure 7-c-Word configuration .. I r FLAGS [ Architecture of Context Addressed Segment-Sequential Storage 697 example to illustrate how the segment-sequential storage is able to search all cells simultaneously, consider the ordered search for the characters A, B, C. That is, determine which records contain A, B, and C in that order but not necessarily contiguous. The three searches required to mark all records that satisfy such a query are: (1) Mark all words In storage which contain the character A. (2) Mark all words in storage which contain the character B and follow (not necessarily immediately) a previously marked character in the same record. At the same time, reset the match indication from the previous search. (3) Repeat the operation of step 2 for the character C. The result of these steps is to leave marked only those records. which match the ordered search specified. 
S M LS TR RS 10 10 10 I LS TR RS 10 II 10 I LS TR RS LS TR RS 10 II 10 I 10 10 10 I Figure 8a-Flag and status bits before start of search Figure S shows four segments of a system which will be used to illustrate the processing of such a search. The storage segments each contain four words (characters). Only the START and MATCH flags are indicated. The origin (beginning) of each segment is at the top and the direction of search is clockwise (data bits rotate counter-clockwise under the head). A record containing the string Q,C,B,P,A,B,N,L,K,R,C,T,C begins at the origin of the left-most segment and continues over all four segments. The right-most segment also contains the start of the next record which consists of the string beginning B,A,C. The first command causes all words containing the character A to be marked in the MATCH bit. Thus, after one circulation of the storage, the words are marked as shown in Figure Sb. In order to perform context-search operations in one storage cycle, status bits must be provided in each cell. These are used to propagate information about records which are apread over more than one cell. The status LS TR RS 10 10 10 I LS TR RS II1IIII LS TR RS 10 1 I 10 LS TR RS I 10 10 II I Figure 8b-Flag and status bits after search for A bits and their uses are: (1) TR: The TRansparent status bit is set if no word in the cell is marked with a START bit. It is used to indicate that the status indication to be transmitted to adjacent cells depends upon the status of this cell and the status input from adjacent cells. (2) LS: The Left Status bit is set if any word in a cell between the origin and the first word marked with a START bit is marked with a MATCH bit. This bit indicates a match in a record which begins to the left of the cell indicating the status. (3) RS: The Right Status bit is set if any word in the cell following the last word marked with a START bit is marked with a MATCH bit. This bit indicates a match condition which applies to words stored in the cells to the right of this cell, up to the next word marked by a START bit. These status bits are updated at the end of each cycle of the storage. The condition of the status bits after each operation is performed is shown in Figure S. The second search command causes all previous MATCH bits to be erased, after using them in marking those words which contain a B and follow a previously marked word in the same record. If the previously ORIGIN LS TR RS 10 10 10 I ORIGIN LS TR RS 10 II 10 I ORIGIN LS TR RS II I " I I ORIGIN LS TR RS II 1010' Figure 8c-Flag and status bits after search for B 698 Fall Joint Computer Conference, 1972 SEARCH a COMPARAND MARK COMPARISON TYPE Figure 9b-Search and mark instruction format LS TR RS 10 10 10 I LS TR RS II II II I LS TR RS 101 I I 0 I LS TR RS 10 10 10 I Figure 8d. Flag and status bits after search for C marked bit and the word containing the B are in the same cell, the marking condition is completely determined by the logic in the cell. However, in most cases it is necessary to sense the status bits of previous cells in order to determine whether the ordered search condition is satisfied. Notice that the status bit conditions can be propagated down a chain of cells in the same manner as a high-speed carry is propgated in a conventional parallel arithmetic unit. Figure 8c shows the flag-bit configurations for each word in storage and the status bits for each cell after the completion of the search for B. Figure 8d shows the configurations after the C search. 
After three cycles of the storage, all records in storage have been searched and those containing the ordered set of characters A, B, C have been marked. In general, a search requires one storage cycle per character in the search specification and is independent of the total storage size. BASIC OPERATIONS In this section, the operations for performing context searches are described in a more formal manner than in the example above. The instructions are a subset of the complete set which is described in a report.l0 The use of these instructions will be illustrated in the section following this one. Each instruction includes a basic operation type and, in most cases, a function code which further specifies what the instruction is to do. Figure 9 shows the instruction format and its variations. Instructions which perform search and mark operations use the function code to specify the type of comparison to be used. Instruc- B: The contents of the Bit-length Register is denoted 11. The word length b = ..L 11. G: The contents of the Comparand Register is denoted Q. Individual bits are Cl (least significant bit) through Cb (most significant bit). K: The contents of the Mask Register is denoted K. Individual bits are represented by the same scheme as that used for C. W: The word of cell storage currently being considered is denoted W. Individual bits are represented by the same scheme as that used for C. R: R denotes the contents of a flip-flop in each cell which is used to indicate the result of the comparison. R~1 for a "match" and R~O for "no match". The match performed is the comparison between Q and W in those positions where k i = 1. In the examples considered in this paper, the comparisons are arithmetic (=, ~, ~). M: The MATCH bit associated with each word (see Figure 7) is denoted M. M without superscript designates the MATCH bit in the word being compared, W. M with a numeric superscript indicates the MATCH bit before or after the one being compared; e.g., M -2 represents the MATCH bit two words before the word on which the comparison is being made. Inequality signs used as superscripts indicate logic signals representing the union of INPUT- INST TYPE tions which initiate input or output operations allow two specifications in the function field. The first designates the channel to be used in the data transfer. The second tells whether the start of each record should be marked, in preparation for another search operation. The symbols used in describing the instructions are given below. The notation is that due to Iverson, modified for convenience in describing some of the special operations performed by the search logic. COMPARAND Figure 9a-Basic instruction format FUNCTION OUTPUT '* NOT USED CHANNEL NUMBER INDICATES THE START FUNCTION BIT Figure 9c-Input-output instruction format • Architecture of Context Addressed Segment-Sequential Storage TABLE I-Description of Instructions SS C 699 TABLE II-Data Format for String XYABLMNCDPEFWZ String Search M~(R/\M-l)V(M/\P) Set the MATCR bit in any word where the masked comparison of the word and the comparand satisfies the comparison type specified in the function field of the instruction and the word is immediately preceded by a word in the same record which was left with its MATCR bit set by the previous instruction. Also, set the MATCR bit in any word which was left with its MATCR bit set by the previous instruction and has its PERMANENT bit set. Reset all other MATCR bits. 
Ordered Search OS C M~(R/\M<)V(M/\P) Set the MATCR bit in any word where the masked comparison of the word and the comparand satisfies the comparison types specified in the function field of the instruction and the word is preceded (not necessarily immediately) by a word in the same record which was left with its MATCR bit set by the previous instruction. Also, set the MATCR bit in any word which was left with its MATCR bit set by the previous instruction and has its PERMANENT bit set. Reset all other MATCR bits. Mark Start wi~S/\(M>VM) where i=..L(Channel No.) M~S/\(Start Function) If the channel number i specified in the instruction is between 1 and b, set Wi, the ith bit of the first word in any record which contains a word with its MATCR bit set. If the start function bit in the instruction is a one, set the MATCR bit in any word which has its START bit set. Reset all other MATCR bits. MS - P: S: all MATCH bits in the record before (M <) and after eM» the word being compared. The PERMANENT bit associated with each word (see Figure 7) is denoted P. The same superscript conventions apply to P as to M. The START bit associated with each word (see Figure 7) is denoted S. The same superscript conventions apply to S as to M. WORD CONTENTS 1 2 3 I/O Flags (S) X 4 5 6 7 A 8 N 9 10 11 12 C Y B L M D P E 13 14 F W 15 Z (S) indicates the START bit for this word is set. TABLE III-Program to Find Match for $AB$CD$EF$ NO. INSTRUCTION TYPE FUNCTION COMPARAND 1 OS A 2 SS B 3 OS C 4 SS D 5 OS E 6 SS F 7 MS The instructions which are considered in the examples in the next section are described in Table I. SEARCH EXAMPLES The following examples show the application of the segment-sequential storage to matching strings with templates. ll A template consists of characters separated by parameter markers which are to be matched by parameter strings. For example, $AB$CD$EF$ is a template which matches any string formed by the concatenation of any arbitrary string, the string AB, another arbitrary string, the string CD, another arbitrary string, the string EF, and another arbitrary string. 2,S REMARKS mark all strings which begin A or $. mark all strings begin which AB, $, or $B. mark all strings which follow the strings above and begin C or $. mark all strings satisfy which the AB search and contain a subsequent string which satisfies the CD search. mark all strings which follow the strings above and begin E or $. mark all strings satisfy which the template. flag channel #2 for input and mark the start of each record. 700 Fall Joint Computer Conference, 1972 The arbitrary strings need not be the same, and any or all may be the null string. The string XYABLMNCDPEFWZ is one example of a string which matches this template. In the following examples, it is assumed that the first word in each record has had its MATCH bit set by the last instruction of the previous search. The programs shown perform the specified search, initiate the in put of the selected records to the computer, and mark the first word of each record in preparation for the next search. The case where a set of fixed strings is stored in the segment-sequential storage is illustrated first. The data format for a typical string is shown in Table II. The first word is used to hold I/O flags. The characters in the string are stored in sequential words following the I/O word. The program to search all strings in storage and mark the ones that match the template $AB$CD$EF$ is shown in Table III. 
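The effect of the Table III program can be traced with a short simulation (our sketch, not part of the paper). In hardware each pass is performed over all cells in parallel, one storage cycle per instruction; the Python below serializes the same marking rules. The SS and OS rules shown omit the PERMANENT-bit term of Table I, which plays no role when fixed strings are being searched, and word 0 stands for the I/O-flag word, assumed to have been left marked by the previous search.

    # Illustrative sketch of the SS/OS/MS marking rules for fixed strings.
    def run_search(record, program):
        # record[0] is the I/O-flag word, assumed pre-marked; the remaining
        # words hold one character each.
        M = [i == 0 for i in range(len(record))]
        for op, c in program:
            newM = [False] * len(record)
            for i, sym in enumerate(record):
                if op == "OS":                        # ordered search
                    newM[i] = (sym == c) and any(M[:i])
                elif op == "SS":                      # string search
                    newM[i] = (sym == c) and i > 0 and M[i - 1]
            M = newM
        return any(M)                                 # MS: is the record selected?

    program = [("OS", "A"), ("SS", "B"), ("OS", "C"),
               ("SS", "D"), ("OS", "E"), ("SS", "F")]
    record = ["I/O"] + list("XYABLMNCDPEFWZ")
    print(run_search(record, program))                # True: matches $AB$CD$EF$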
Find strings to fit a template

A template search takes one instruction for each character in the template plus an instruction to set the I/O flag in those records which contain the strings matching the template.

Find templates to fit a string

The case where a set of templates is stored in the segment-sequential storage is considered next. The data format for stored templates is shown in Table IV. The parameter marker, $, is replaced in storage by use of the PERMANENT bit in those words which contain a character which is followed by a parameter marker. A program to find templates to match the string XYABLMNCDPEFWZ is shown in Table V.

TABLE IV-Data Format for Template $AB$CD$EF$

WORD NO.   CONTENTS
1          I/O Flags   (S),(P)
2          A
3          B           (P)
4          C
5          D           (P)
6          E
7          F           (P)

(S) indicates the START bit for this word is set. (P) indicates the PERMANENT bit for this word is set.

TABLE V-Program to Find Templates for XYABLMNCDPEFWZ

NO.  INSTRUCTION TYPE  FUNCTION  COMPARAND  REMARKS
1    SS                          X          mark all strings which begin X or $.
2    SS                          Y          mark all strings which begin XY or $.
3    SS                          A
4    SS                          B
5    SS                          L
6    SS                          M
7    SS                          N
8    SS                          C
9    SS                          D
10   SS                          P
11   SS                          E
12   SS                          F
13   SS                          W
14   SS                          Z
15   MS                1,S                  flag channel #1 for input and mark the start of each record.

The execution of this program illustrates how the PERMANENT bit is used. The X and Y searches do not find a match with the template shown in Table IV. However, since the PERMANENT bit in the first word in the record is set, the first word will remain marked by a MATCH bit and therefore continue as a candidate for a successful search. The A and B searches cause the MATCH bit in the word containing B to be set. Since this word also has its PERMANENT bit set, the MATCH bit will remain set during the searches for the remaining characters in the input string (except for the last character). The search continues in this fashion, with MATCH bits associated with characters immediately followed by a parameter marker being retained. This results in multiple string searches within each record, corresponding to different ways a given string may fit a template. The search process continues in this fashion up to the last character in the input string. There are two ways in which a template can satisfy this search: (1) the last character in the template may match the last character in the input string and the next-to-last character in the template have its MATCH bit set, or (2) the last character in the template may have both its MATCH bit and its PERMANENT bit already set. The last search instruction in the program tests for both these conditions and at the same time resets the MATCH bits in all characters which do not meet the conditions. The last instruction in the program causes the records which satisfy the search to be marked for input to the computer's core storage.

The examples above show that the segment-sequential storage reduces the finding of matching templates to a simple search. The time required to execute such a search depends only upon the number of characters in the query. Examples of other possible applications of the segment-sequential storage are given in a report.10 One use is retrieval of information necessary to display a portion of a map. This is a typical problem encountered in graphic displays, where a subset of the data base is to be selected on the basis of x-y location. Another example is the use of the segment-sequential storage to simulate networks of linear threshold devices.
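The retention rule that the PERMANENT bit implements in the stored-template case can be paraphrased the same way. In the sketch below (again ours, not the machine's program), P is set on the I/O-flag word and on each template character that is followed by a parameter marker, as in Table IV, and the record is taken to match when its last word is left marked, corresponding to conditions (1) and (2) above; the MS step that actually flags the record is not modeled.

```python
# PERMANENT-bit retention when templates are stored and the string is the query
# (our paraphrase of the Table V program; the machine's MS step is not modeled).

def template_matches(template, perm, text):
    n = len(template)
    match = [i == 0 for i in range(n)]               # START word marked beforehand
    for c in text:                                    # one SS instruction per input character
        match = [((i > 0 and template[i] == c and match[i - 1])
                  or (match[i] and perm[i])) for i in range(n)]
    return match[-1]          # conditions (1)/(2): the last template word is left marked

# Template $AB$CD$EF$ stored as in Table IV: P on the I/O word and on B, D, F
tmpl = ["*", "A", "B", "C", "D", "E", "F"]
perm = [True, False, True, False, True, False, True]
print(template_matches(tmpl, perm, "XYABLMNCDPEFWZ"))   # True: the string fits the template
print(template_matches(tmpl, perm, "XYBALMNCDPEFWZ"))   # False: AB never occurs in order
```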
CONCLUSIONS

This paper has presented a new architecture designed to solve some of the problems in searching large data bases. The examples given indicate its usefulness in several practical applications. Since the system is built around a relatively inexpensive storage medium, it is feasible now. In the future, LSI techniques should make its cellular organization even more attractive.

REFERENCES

1 P ARMSTRONG Several patents
2 G J LIPOVSKI On data structures in associative memories Sigplan Notices Vol 6 No 2 pp 347-365 February 1971
3 G ESTRIN R H FULLER Some applications for content-addressable memories Proc FJCC 1963 pp 495-508
4 R G EWING P M DAVIES An associative processor Proc FJCC 1964 pp 147-158
5 G J LIPOVSKI The architecture of a large associative processor Proc SJCC 1970 pp 385-396
6 L HELLERMAN G E HOERNES Control storage use in implementing an associative memory for a time-shared processor IEEE Trans on Computers Vol C-17 pp 1144-1151 December 1968
7 P T RUX A glass delay line content-addressed memory system IEEE Trans on Computers Vol C-18 pp 512-520 June 1969
8 I FLORES A record lookup memory subsystem for software facilitation Computer Design April 1969 pp 94-99
9 G J LIPOVSKI The architecture of a large distributed logic associative memory Coordinated Science Laboratory R-424 July 1969
10 L D HEALY G J LIPOVSKI K L DOTY A context addressed segment-sequential storage Center for Informatics Research University of Florida TR 72-101 March 1972
11 P WEGNER Programming languages, information structures, and machine organization McGraw-Hill 1968

A cellular processor for task assignments in polymorphic, multiprocessor computers

by JUDITH A. ANDERSON
National Aeronautics & Space Administration, Kennedy Space Center, Florida
and G. J. LIPOVSKI
University of Florida, Gainesville, Florida

switch required to restructure the computer to perform the selected tasks. This paper will be restricted to those functions performed by the cellular processor; in particular, the task qualification phase and the portions of the task assignment phase related to the cellular processor.

INTRODUCTION

Polymorphic computer systems are comprised of a large number of hardware devices such as memory modules, processors, various input/output devices, etc., which can be combined or connected in a number of ways by a controller to form one or several computers to handle a variety of jobs or tasks.1 Task assignment and resource allocation in computer networks and polymorphic computer systems are currently being handled by software. It is the intent of this paper to present a cellular processor which can be used for scheduling and controlling a polymorphic computer network, freeing some of the processor time for more important functions. (See Figure 1.) Work has been done in the area of using associative memories and associative processors in scheduling and allocation in multiprocessor systems.2,3 Since the scheduling process often involves a choice of hardware resources which might do the job, a system able to detect "m out of n" conditions being met would be more suited to the type of decision-making required. The system to be discussed involves a threshold-associative search; that is, all the associative searching performed detects if at least m corresponding bits in both the associative cell and the comparand are one. Scheduling and controlling can be divided into three distinct phases. The first is task qualification, determining which tasks are possible with the available hardware.
The second phase is task assignment, deciding which of the candidate tasks found to be qualified in the first phase will be chosen to be performed next. The third phase is the actual controlling or connection of the SCHEDULING The method for ordering requests consists of storing the queue of requests in a one-dimensional array of cells. One request requires several contiguous cells for storage. The topmost cells store the oldest request. New requests are added to the bottom and are packed upward as in a first-in, first-out stack. An associative search is performed over all the words stored to determine which requests qualify for assignment. The topmost request which qualifies will be chosen for assignment. Using a slightly more complex cell structure, a priority level may be associated with each request, resulting in a priority based, rather than chronological, method for task assignment, providing for greater flexibility. The priority-based system will not be discussed here, but further detail relative to it may be found in a previous report.4 METHOD OF OPERATION The basic system consists of a minicomputer and a cellular processor for task ordering. (See Figure 1.) Requests generally take the form of which processors are required, how much memory is required, and which 703 704 Fall Joint Computer Conference, 1972 SWITCH CONTROL (MINICOMPUTER) REQ. CELLULAR PROCESSOR spond to one threshold function, including the threshold value. The devices chosen from to meet that threshold value will be indicated with a one in its bit position. Let S be the status register and (Q) (T) be the request word where Q is the binary vector representing a request and T the binary number giving the threshold value T. The output C of the threshold function may be expressed as C~T~ n L Q[IJ!\S[IJ. 1 Figure I-Polymorphic computer network controlled by cellular processor and minicomputer peripheral devices and how many of each type are required to perform a particular task. These requests are made to the minicomputer via a simple, low-volume communication link, such as a shift register, data bus, or belt. The minicomputer then formats the requests into a request set which is explained below. The request set is given an identification word and is input to the bottom of the task queue stored in the cellular processor. This unit stores all the request sets and determines which requests can be qualified for assignment based on current hardware availability. The topmost request set in the cellular processor which qualifies is chosen for assignment. It is necessary for the processor to know which devices in the polymorphic computer system are not currently in use, and therefore are available for assignment. To provide this information, each physical device in the system has a bit associated with it in an Availability Status Register. If a unit, such as a tape drive, is free, its corresponding bit in the status register will be a one. When the unit is in use, its corresponding bit will be reset to a zero. The requests are of the form indicating which type of hardware devices are required, how many are required and which, if any, particular physical units are required. These requests can all be expressed as a Boolean AND of threshold functions. 
Each request word will correTABLE I -Status Register Assignment BIT DEVICE 1,2 3-6 7-12 13 14 15 16 17,18 Processors 1 and 2 Memory Units 1-4 Tapes Drives 1-6 Line Printer Disc Card Reader Card Punch CRT 1 and 2 A request set then consists of an identification word and a word for each threshold function necessary to express the entire request. Consider, for exa.mple, a system composed of the components or peripheral devices and the status register bit I D WORD IITS: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 II It 20 21 22 0000000011001011011001 ID THRESHOLD REQUEST ' I WORDS 1 I I I , 1 0'1 0 0 0 011 1 0 0 0 0 01, 110 10 10 I 1 0 0 1 0 0 I I 1 I 1 1 0'0 00 01 0 00'10 0 0 010 1 11 1 1I 0,10'0 I I I 1 1010 1000010 001111110000001010 I I ( I PROC! MEMORY 1 TAPE DRIVES I LPI D !CR I CP I CRT THRESHOLD Figure 2-Request set example assignments shown in Table I. The status register in this example would be 18 bits long. A request would be of the sort that the required devices for Task Number 429 are Processor 1, CRT 1, Tape Drive 1, and any two other tape drives, the Line Printer, and any two memory units. This request set would consist of four words, the ID word and three request words, shown in Figure 2. The threshold value of the ID word is set exactly equal to the number of "l's" in the ID field. This is for hardware considerations in order to do an associative search on the ID words. All the units which are absolutely necessary (mandatory devices) can be compactly represented by a single threshold request that implements the AND function. The first request word represents all such mandatory devices, whereas the second and third request words represent "any two other tape drives" and "any two memory units," respectively. A Cellular Processor for Task Assignments This request set, along with any other requests which were made would be input to the queue. When all three of the request words above could be satisfied with some or all of the available hardware, an interrupt to the minicomputer is generated. The minicomputer can then read out the ID word of the topmost request set that can be satisfied and is therefore qualified. If this request set is the highest in the queue, it will be assigned. Whichever request set is read out will be removed from the cellular processor and the requested resources allocated for that task by the minicomputer. HARDWARE DESCRIPTION The hardware realization for this cellular processor consists of a bilateral sequential iterative network. 5 That is, it is a one-dimensional array of cells, all cells 705 having the same structure and containing sequential logic. Each cell receives external inputs as well as inputs which are outputs from its adjacent cells as shown in Figure 3. Each cell stores either a request word or an ID word, or it is empty. All cells receive hardware status information which is broadcast into them continuously for comparison with their stored requests. When one or more request set has qualified for assignment, an interrupt is generated to the minicomputer. A hardware priority circuit chooses the topmost qualified request to be assigned. The cellular processor outputs the ID word for this request via a data channel which is set up through all the cells above the cell containing the qualified request in the queue. When a request is chosen for assignment, its ID word is broadcast to the cellular processor for removal from the queue. 
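The qualification test performed by the cells can be stated compactly in software. The sketch below is an illustrative model only (the bit layout follows Table I, but the queue scan, the names, and the example status pattern are ours): each request word (Q, T) is satisfied when at least T of the devices it selects are marked available, and the topmost request set in the queue whose words are all satisfied is the one whose ID word would be read out.

```python
# Illustrative model of threshold qualification (not the cell hardware).
# A request word is (Q, T): Q selects status bits, T is the threshold.

def word_satisfied(q_bits, t, status_bits):
    # C = 1 iff at least T of the selected devices are currently available
    return sum(q & s for q, s in zip(q_bits, status_bits)) >= t

def topmost_qualified(queue, status_bits):
    # queue is oldest-first; each entry is (task_id, [(Q, T), ...])
    for task_id, words in queue:
        if all(word_satisfied(q, t, status_bits) for q, t in words):
            return task_id            # would raise the interrupt / output this ID word
    return None

# 18-bit Availability Status Register laid out as in Table I (1 = device available)
status = [1,0, 1,1,0,1, 1,0,1,1,0,0, 1, 0, 1, 1, 1,0]
# Task 429: Processor 1, CRT 1, Tape Drive 1, Line Printer (mandatory, threshold 4),
# any two other tape drives, and any two memory units
task_429 = (429, [
    ([1,0, 0,0,0,0, 1,0,0,0,0,0, 1, 0, 0, 0, 1,0], 4),   # mandatory devices
    ([0,0, 0,0,0,0, 0,1,1,1,1,1, 0, 0, 0, 0, 0,0], 2),   # any 2 of tape drives 2-6
    ([0,0, 1,1,1,1, 0,0,0,0,0,0, 0, 0, 0, 0, 0,0], 2),   # any 2 memory units
])
print(topmost_qualified([task_429], status))   # 429 if every threshold is met, else None
```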
A timer is associated with the uppermost cell in the array and is used to indicate if requests are stagnating in the queue so that action may be taken by the minicomputer. Requests are always loaded into the bottom of the queue. Removal is either from the top, when the timer mentioned above exceeds some maximum value, or by deletion after the request has been assigned. If a task request is cancelled, it may be removed from the queue by treating it as if it were assign~d. When requests are removed from the middle of the queue by assignment, the other requests move upward to pack into the emptied cells. Each cell is basically made up of an n-bit register, a threshold comparator, two cell status flip flops, a data channel, and combinational logic as shown in Figure 4. The n-bit register is divided into two fields. The first k bits, Q, store the binary vector representation of the request or ID word. The last n-k bits represent the threshold value, T, for the threshold comparator. The threshold comparator, which will be discussed in more detail later, outputs a one if and only if at least T positions in both Q and the status input, S, are one's. That is, C~T~ LQ[I]AS[I] or, c~o~cE Q[I]AS[I])-CL T[I]X2I). Figure 3-Cellular processor The two cell status flip flops, TOP f f and D f f indicate whether a cell contains an ID word or not and whether a cell contains data or is empty, respectively. The data channel through each of the cells is used to output information and for packing data to economize on the number of pins per cell. The data flow in the data channel is always upward, toward the top of the queue. The data channel within a cell may be operated in two ways. It may allow data coming into the cell on 706 Fall Joint Computer Conference, 1972 0' +- - - - -- - - 01 I CT - -'" I I DATA CHANNEL S (STATUS INPUT) Figure 4-Basic iterative cell the channel to pass through into the data channel of the next cell and will be referred to as the bypass mode of operation. Also, by means of an electronic switch, it may place the contents of its register into the data channel. This will be referred to as the transfer mode. Through the use of the load enable of the register (the clock input of the register flip flop), it is also possible to load the register with the information which is in the data channel. Operation of the data channel is controlled by the cell status, the control signals from the minicomputer, and a compare rail, CT. When a request is loaded into the cellular processor, it enters via the data channel and is loaded adjacent to the lowest cell containing data. This is determined by the D f f output from the cells. Once a cell has data loaded, its threshold comparator continuously compares the register contents, Q, against the status, S. When a threshold compare has been achieved, that is, T~ request set from the queue by placing the ID of that set on the status lines and commanding a set removal via the control lines. While a removal is being commanded, the set whose ID matches with the ID on the status lines resets its data flip flop, D if, and passes a one along the R (reset) rail. This rail propagates in a downward direction and causes all cells to reset their D f f until a TOP cell is encountered. This removes the request set from the queue. There now is a group of empty cells in the middle of the stack of cells. When a cell containing data detects an empty cell above it, it places its data into the data channel and generates a pulse on the DR (data ready) rail. 
This pulse travels upward and enables the loading of data into the uppermost cell in the group of empty cells, that is, the first empty cell below a non-empty cell which it encounters. This is determined by D', the value of the D If of the next higher cell. Each cell moves its data upward until all the empty cells are at the bottom of the queue. The comparison operation is not stopped by the data being in the process of packing. The compare rail, CT, is passed through empty cells unless the DR rail is high, indicating data is actually in transit. An example of the switching of the data channel during the loading and shifting, or packing, process is shown in Figure 5. L: S[IJI\Q[IJ a one is ANDed into the CT rail, which is propagating upward, toward the top of the queue. When all the reque~t words in a set compare, the CT rail entering the TOP cell of the request set is a logic one. This causes an interrupt to be generated, indicating to the minicomputer that there is a qualified set. The interrupt, INT, is placed into an OR tree external to the cell network to speed the interrupt signal to the minicomputer to increase response time of the system. Upon receipt of the interrupt, the minicomputer can interrogate the processor to determine which request set caused the INT to be generated. The ID word of the topmost qualified set is broadcast via the data channel, and stored in the output register. The minicomputer can then remove the L...--y-.L....:.I0--,-10--,1 Figure 5-'-Example of shifting and loading 10 10 I A Cellular Processor for Task Assignments Further details of the cell operation are given in an earlier report.4 A method for implementing priority handling was also discussed. 707 5 THRESHOLD COMPARATOR Current literature on threshold logic discusses integrated circuit realizations of threshold gates with up to 25 inputs and with variable threshold values. 6, 7 The threshold comparator mentioned earlier consists of a threshold gate with variable threshold which is selected by the contents of the threshold register. The inputs to the threshold gate are the contents of the status register, 8, ANDed bit by bit with the contents of the cell request register, Q, as shown in Figure 6. All inputs are weighted one. C+Y $ I S[I] A a[ll Figure 6-Threshold comparator If the number of inputs to the threshold gate is restricted to the 25 inputs indicated above, the hardware realization discussed here must be modified to overcome this restriction. In particula~, the various types of resources can be divided into disjoint sets of similar or identical devices such as memory units, processors, I/O devices, etc. A request would not be made, for instance, which would require either a tape drive or a processor. Each set would then have a threshold value associated with it and the compare outputs from all the threshold gates would be ANDed to yield the cell compare output, as illustrated in Figure 7. For simplicity, we will assume an ideal threshold element exists with an unlimited number of gate inputs in our further discussion, which can be replaced as indicated above. For large computer networks, the number of devices will be large. Since the processor discussed here requires C Figure 7-Modular threshold comparator more than 3n interconnections (pins) for each cell, where n is the number of devices, a method of dividing the cell into smaller modules which are feasible with current technologies in LSI must be considered. First, the cell must be split into modules of lower bit sizes. 
This may be done as discussed previously by dividing the hardware devices into disjoint sets of similar or identical devices. Each module or sub-cell will then have a threshold associated with it and a threshold comparator. One control sub-cell is also necessary which will contain all the logic required for storing the cell status, generating and propagating the rail signals, and control the data channels in the other sub-cells in its cell group. This is illustrated in Figure 8. This modularity of cell design also allows the cellular processor to be expandable. If the system requirements demand a larger (more bit positions) cell, rather than having to replace the entire cellular processor, an additional storage module may be added for each cell. This also reduces the fabrication cost since only two cellular modules would have to be designed regardless of the number of devices ina system. ------------~~~~----------~ ASSOCIATIVE STORAGE MODULES Figure 8-Modular cell structure DR R CT CONTROL MODULE 708 Fall Joint Computer Conference, 1972 CONCLUSION The threshold associative cellular processor incorporates a very simple comparison rule, masked threshold comparison. This rule was shown to be ideally suited to task qualification in a polymorphic computer, or an integrated computer network like a polymorphic computer, and was shown to be easily implemented in current LSI technology. The processor developed using this type of cell would considerably enhance the cost effectiveness of polymorphic computers and integrated computer networks by performing task requests and would reduce the software support otherwise required to poll the status of devices in the polymorphic computer or an integrated computer network. The scheme shown here will have application to other task qualification problems as well, such as a program sequencing scheme to order programs or tasks based on a requirement for previous tasks to have been performed. 4 This modular cellular processor provides a system which can handle a wide range of scheduling problems while retaining a flexibility for expansion and at the same time increasing speed by performing the parallel search rather than polling. REFERENCES 1 H W GSCHWIND Design of digital computers Chapter 9 Springer Verlag 1967 2 D C GUNDERSON W L HEIMERDINGER J P FRANCIS Associative techniques for control functions in a multiprocessor, final report Contract AF 30(602)-3971 Honeywell Systems and Research Division 1966 3 D C GUNDERSON W LHEIMERDINGER J P FRANCIS A multiprocessor with associative control Prospects for Simulation and Simulator of Dynamic Systems Spartan Books New York 1967 4 J A ANDERSON A cellular processor for task assignments in a polymorphic computer network MS Thesis University of Florida 1971 5 F C HENNIE Finite state models for logical machines John Wiley & Sons New York 1968 6 J H BEINART et al Threshold logic for LSI NAECON Proceedings May 1969 pp 453-459 7 R 0 WINDER Threshold logic will cut costs especially with boost from LSI Electronics May 27 1968 pp 94-103 A register transfer module FFT processor for speech analysis by DAVID CASASENT and WARREN STERLING Carnegie-MellOn University Pittsburgh, Pennsylvania FOURIER TRANSFORM APPLICATIONS TO SPEECH PROCESSING2 INTRODUCTION On-line speech analysis systems are the subject of much intensive research. Spectral analysis of the speech pattern is an integral part of all such systems. 
To facilitate this spectral analysis and the associated preprocessing required, a special purpose fast Fourier transform (FFT) processor to be described is being designed and constructed. One unique feature of this processor which facilitates both its design and implementation while providing an easily alterable machine is its construction from standard logic modules which will be referred to throughout as register transfer modules or RTM's.1 This design approach results in a machine whose operation is easily understood due to this modular construction. Two of the prime advantages of such a processor are:

(1) The very low design, implementation, and debugging lead times which result from the RTM design at the higher register transfer logic level rather than at the conventional gate level.
(2) The RTM processor can be easily altered due to the pin-for-pin compatibility of all logic cards. Different hardwired versions of a given algorithm can be easily implemented by appropriate back plane rewiring.

Because of the stringent time constraints imposed by such a design effort, this processor can also serve as a feasibility model for the use of RTM's in other complex real-time systems. This is one area in which little work has been done. When in operation, the processor will accept input data in the form of an analog speech signal and output the resultant spectral data to a PDP-11 computer for analysis.

Let us briefly review Fourier transform techniques as used in speech processing. In the discrete time domain, a segment of speech s(ℓT+nT) can be represented by

s(ℓT+nT) = p(ℓT+nT) * h(nT)    (1)

where * denotes discrete convolution and ℓT is the starting sample of a given segment of the speech waveform. p(ℓT+nT) is a quasiperiodic impulse train representing the pitch period and h(nT) represents the triple discrete convolution of the vocal-tract impulse response v(nT), with the glottal pulse g(nT) and radiation load impulse response r(nT),

h(nT) = v(nT) * r(nT) * g(nT)    (2)

The vocal tract impulse response is characterized by parameters called formant frequencies. These parameters vary with corresponding changes in the vocal tract as different sounds are produced; however, for short time spectrum analysis of speech waveforms, the formant frequencies can be considered constant. Given the above speech model, speech analysis involves estimation of the pitch period and estimation of formant frequencies. These parameters are estimated using the cepstrum of a segment of a sampled speech waveform. For our purposes, the cepstrum is defined as the inverse discrete Fourier transform (IDFT) of the log magnitude spectrum of the speech waveform segment. The details of cepstral analysis are shown in Figure 1. The input speech segment to Figure 1, s(ℓT+nT), typically about 20 msec in duration, is weighted by a symmetric window function w(nT)

x(nT) = s(ℓT+nT) w(nT) = [p(ℓT+nT) * h(nT)] w(nT),    0 ≤ n ≤ N-1

Figure 4-The real-valued input FFT algorithm for N = 16 (* denotes complex conjugate)

Figure 5-The complex calculation (* denotes complex conjugate; N = number of samples). The calculation takes the operand pair a + ib, c + id into the pair a' + ib', c' + id', using W^m = cos 2πm/N + i sin 2πm/N, according to

a' = a + (c cos 2πm/N + d sin 2πm/N)
b' = b + (d cos 2πm/N - c sin 2πm/N)
c' = a - (c cos 2πm/N + d sin 2πm/N)
d' = -[b - (d cos 2πm/N - c sin 2πm/N)]

Computer calculations using this algorithm yielded a maximum error computed at critical values and extrema which ranges between -0.00782 and 0.0094.
The coefficients of f(x) were chosen for ease of binary implementation.

FFT ALGORITHM FOR REAL-VALUED INPUT

Various FFT algorithms exist. One particularly adaptable to RTM implementation will be briefly reviewed. The complex discrete Fourier transform of a sampled time series x(k) (k = 0, ..., N-1) can be written as

X(j) = (1/N) Σ_{k=0..N-1} x(k) e^(-i2πjk/N)    (13)

It has been shown4 that when the x(k) series is real, Re[X(j)] is symmetric about the folding frequency and Im[X(j)] is antisymmetric about it. Figure 3 shows this pictorially. An algorithm5 which eliminates calculations that will lead to redundant results in the real-valued input case has previously been discussed. Figure 4 graphically illustrates this algorithm for N = 16. The algorithm can be represented by the expression

X(j) = Σ_{k=0..N-1} B0(k) W^(-jk)    (14)

where W = e^(2πi/N); B0(k) is real; j = 0, 1, ..., N/2; and N = 2^m where m is an integer. The "complex calculation" shown in Figure 4 is a slight modification of the butterfly multiply6 normally used in FFT algorithms. Details of the calculation are shown in Figure 5, from which the signal flow is apparent. Each complex calculation box, as shown, moves to the right to operate on all operands within its group. On the first level, this box performs eight computations, on the second level each box performs 4 calculations, etc. Since the multiplications are ordered as above, addressing for this multiplier is fairly straightforward. For ease in accessing the complex multiplier W^m, its complex values should be stored in the order in which they occur. An algorithm for determining the sequence of the exponent m has been documented, and a set of recursive equations which specify the addresses of the four operands for every complex calculation can be formulated.5 The address sequencing is easily implemented in a hardware unit for automatic generation of the required addresses in the proper sequence. It is apparent from Figure 4 that all complex calculations involving one complex multiplier W^m can be completed before the next complex multiplier is used.7 For example, all calculations involving W^0 can be completed on all 3 levels, then all calculations involving W^2, etc. In the conventional method all calculations on one level are completed before dropping to the next level. If the complex multipliers are stored in their accessed order, there is no need to explicitly store the sequence of exponents. Furthermore, each complex exponent in this addressing scheme need be accessed only once. As in the conventional FFT implementation, the resultant Fourier coefficients must be re-ordered. With the accessing order of the complex multipliers specified by a linear array A, the exponent m for the ith W is given by m = A(i). An inverse table look-up enables the scrambled Fourier coefficients to be accessed from memory in the order of ascending frequency. To implement this inverse table look-up, the location N of the ith harmonic is found from the value m in the array A and by using its position in the array as the value of N.

TABLE II-Formulas for Calculating the Number of Operations in FFT Algorithms

                        real inputs          complex inputs
Real Multiplications    (m - 3.5)N + 6       (2m - 7)N + 12
Real Additions          (1.5m - 2.5)N + 4    (3m - 3)N + 4

TABLE III-Description of RTM Modules

[Figure 6 residue; recoverable block labels: ANALOG SPEECH SIGNAL; MULTIPLY 256 POINTS BY HAMMING WINDOW; 256 POINT FFT; TO PDP-11.]
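The redundancy that the real-input algorithm removes can be checked numerically. The sketch below is a software stand-in only (a plain recursive radix-2 FFT, not the Figure 4 signal flow or its addressing): for a real series it verifies that X(N-j) is the complex conjugate of X(j), so only the N/2 + 1 coefficients up to the folding frequency ever need to be produced.

```python
import cmath

def fft(x):
    # Plain recursive radix-2 FFT (software stand-in, not the Figure 4 algorithm)
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for m in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * m / n) * odd[m]
        out[m], out[m + n // 2] = even[m] + t, even[m] - t
    return out

N = 16
x = [float((3 * k) % 7) for k in range(N)]       # an arbitrary real-valued series
X = [v / N for v in fft(x)]                      # apply the 1/N factor of eq. (13)

# For real input, X(N-j) = conj(X(j)): Re is symmetric and Im is antisymmetric
# about the folding frequency, so harmonics above N/2 carry no new information.
assert all(abs(X[N - j] - X[j].conjugate()) < 1e-12 for j in range(1, N // 2))
print([round(abs(v), 4) for v in X[:N // 2 + 1]])   # the N/2 + 1 coefficients kept
```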
256 POINT INVERSE FFT (FORMS CEPSTRUM) I I I : ~------------------- _____ J Figure 6-FFT processor data flow. Boxed area denotes future extension of the processor In implementation, the sequence of locations is, for convenience, stored separately. Table II below compares the number of operations, and consequently, the speed, of the conventional ?ooley-Tukey radix-2 FFT algorithm for complex mputs, and the FFT algorithm for real inputs. 5 In the formulas N = 2m , where N is the number of samples. These formulas assume special cases such as exp (iO) are calculated as simply as possible. About 72 the number of operations are required for real inputs as for complex inputs, owing to the elimination of redundant calculations. As explained previously, the algorithm can be streamlined further by sequencing through the complex multipliers rather than across each level. A software version of these techniques has been implemented7 and has achieved a real-time processing speed of 10,300 samples/sec. This is the equivalent of one 256 po~nt FFT every 25 msec. The minimum speech processmg speed required for this system is one 256 point FFT every 10 msec. It is evident that speeding up the. algorithm requires hardwiring the complex calculatIOn and address generation. T.a/d DM.bool DM.const DM.gpa DM.ii DM.index DM.mult DM.oi DM.pdp-l1 DM.tr M.array M.sp controls asynchronous timing of sequential operations analog to digital converter boolean flags 4 word read only memory general purpose arithmetic unit general purpose input interface FFT address genera tor multiply unit general purpose output interface PDP-l1 interface temporary storage register read/write memory; ",,2 JLsec access time read/write scratch pad memory; ",,500 nsec access time the FFT algorithm for real-valued inputs generates harmonics only through the folding frequency. The binary logarithm of the magnitude of each of these 129 complex values is then calculated and the result transferred to a PDP-H. During processing, the buffer must continually store the input samples. After the third group of 128 samples has been stored, samples 128 thru 383 are weighted by the window and processed. Although a 256 point FFT is performed, the window is shifted by only 128 words each time thus including each sample in 2 FFT cal:culations, each time with a different weighting factor. SPEECH SIGNAL FFT I- INDEXING UNIT ARITHMETIC UNIT 1 ADDER 1 MULTIPLIER PROCESSOR DATA FLOW Figure 6 shows the logical flow of data through the processor. The "Future Extension" section will not be implemented initially. Instead the log magnitude of the spectrum will be transferred to a PDP-II. At this point the s~ectral ~nvelope can be extracted by digital recurSIve filtermg techniques rather than by cepstral smoothing. This approach adequately demonstrates the feasibility of a real-time RTM processor. The analog speech signal is sampled at 10 kHz and stored in a buffer. When 256 8-bit words have been accumulated, they are weighted by a Hamming window. A 256 point FFT is then performed on these weighted samples. This results in only 129 complex values since Function Module ~ .~ MEMORY BOOLEAN FLAGS BUS TO BUS INTERFACE H PDP-ll INTERFACE ARITHMETIC I- UNIT I- 1 ADDER 1 MULTIPLIER ARITHMETIC UNIT 2 ADDERS HMEMORY H 1 H MEMORY BOOLEAN FLAGS BOOLEAl'J FLAGS H BUS TO BUS INTERFACE BUS 2 ~ BUS 3 Figure 7-Block diagram of FFT processor Bus 1 samples and buffers speech signal. Bus 2 performs FFT. 
Bus 3 calculates binary logarithm and interfaces to a PDP-l1 714 Fall Joint Computer Conference, 1972 K.bus r- --, I ANALOG I K.bus : SPEECH L I SIGNAL I DM.pdp-l1 L f - bus =;r= OM. index l TO --I T.a/d bus-Al bus--A2 bus--A3 PDP-II + DM.ii bus-A4 initialize increment DM. gpa DM. gpa DM.mult DM.bool DM.bool M. array DM.oi LDM.ii LDM.ii M.array M.array (512 words; (512 words) M.sp M.sp b end Al (7:0) A2 <7:0> A3 (7:0> A4 (7:0'> (b) DM.index - FFT address generator. The DM.index control lines are described in Table IV lated. In the 12.8 msec used to sample 128 words the following three operations must be performed: (1) The Hamming window must be applied, (2) The 256 point FFT performed, and (3) The log magnitude of each harmonic calculated. RTM LEVEL DESIGN BUS 3 BUS 2 Figure 8-RTM structure of FFT processor. The modules are described in Table III The first FFT thus operates on samples 0-255, the second FFT on samples 128-383, the third on 256-511, etc. In the actual machine a 384 word ring buffer memory is used to achieve the sequencing of the blocks of 128 samples. The time constraints on the system are easily tabu- A block diagram of the processor structure is shown in Figure 7. It is a three bus system with each of the above operations performed on a separate bus. Figure 8 shows the specific RTM modules used; Table III describes the modules. With the exception of DM.mult and DM.index, the data modules shown in Figure 8 are all standard RTM's. The functions of the two nonstandard modules are outlined below and illustrated in Figure 9. TABLE IV-Description of Control Lines for Indexing Unit DM.lndex ·bus control line A-bus done (256 words) DM.oi M.sp BUS 1 b function DM.mult initialize increment B- bus A <15:0> B <15:0> bus ~ bus ~ bus ~ bus ~ done -end Figure 9-(a) DM.mult-multiply unit A1 A2 A3 A4 initialize indexing unit calculate next 4 operand addresses for complex calculation load 1st address on bus load 2nd address on bus load 3rd address on bus load 4th address on bus signals end of calculations involving one complex multiplier signals end of FFT Register Transfer l\10dule FFT Processor Uata Buffering Windowing 1.5 Data Transfer: bus 1 to bus 2 1.5 BUS 1 6.5 HT BUS 2 ~tagnitude Caieul at ion Data Transfer: bus 2 to bus 3 Logarithm Ca1culation 5.3 Reorder and Data Transfer: to PDP-ll 5.3 1 Processor Cycle BUS 3 It-i_ _ _ _ _ _---=.:12:..:..!:.8~_ _ _ _ _ _ __< 2 3 4 5 6 7 MSEC 8 9 10 11 12 13 715 samples and transferring them to bus 2. Bus 2 spends 1.5 msec simultaneously accepting data from bus 1, calculating the magnitude of the harmonic components and transferring the results to bus 3. 6.5 msec are spent calculating the FFT. This leaves 4.8 msec (12.8-1.5-6.5) of dead-time during each processor cycle; time when no processing occurs on bus 2. Bus 3 spends 1.5 msec accepting data from bus 2, and 5.3 msec simultaneously calculating the logarithm of 129 samples and transferring them to the PDP-II. This leaves 6 msec of deadtime on bus 3. It is clear that bus 2 carries the heaviest processing load; therefore, bus 2 dead-time determines that a speed margin of 4.8 msec exists; that is, the processor completes processing each set of 256 samples 4.8 msec faster than needed to maintain real-time operation. Figure 10-Processor timing diagram Accuracy DM.mult This module multiplies the two 16 bit positive numbers in registers A and B. Any 16 bits of the 32 bit result can be placed on the bus. The multiplier was implemented using Fairchild 9344 2 X 4 bit multipliers. 
DM.index High speed hardware indexing units for FFT operand address generation have been presented in the literature. 8 This module generates the addresses of the four operands of every complex calculation during the FFT. It is a hardware implementation of the recursive equations for the FFT algorithm for real value inputs discussed previously. It was designed to sequence through all calculations involving one complex multiplier. Table IV defines the control lines shown in Figure 9 (b) . The four 8-bit registers, AI, A2, A3 and A4 hold the addresses of the four operands. These registers do not physically exist since the addresses are generated combinatorily upon command; they are defined for logical purposes only. Figure 10 shows the timing diagram of the processor. All arithmetic operations, register transfers, and memory accesses involve use of the bus, which has a settling time of 500 nsec. Therefore, the average speed of any operation is 500 nsec. This value was used in calculating the processing times shown in Figure 10. For example, approximately 13,000 operations are required to perform each 256 point FFT on bus 2. The processing time, therefore, is 6.5 msec. Bus 1 is continually buffering data, however, only 1.5 msec of 1 processor cycle (12.8 msec) are spent windowing 256 The question of accuracy always arises for a processo. operated in fixed point mode. As noted previously,5 distribution of the 1/N normalization factor over the entire transform constrains the magnitudes of the operands at each level to prevent overflows. The only overflow possibility occurs during the calculation of the magnitude of the Fourier coefficients. When overflow occurs (positive or negative), the largest (positive or negative) number will be chosen. Simulation runs to determine the effect of multiplier size on accuracy were conducted. A 16 X 16 bit multiplier was used in conjunction with the fixed point FFT described to process actual speech signal samples. For audible speech, accuracy of 1 percent relative mean square error was achieved when compared to floating point results. The same simulation using a 12 X 12 bit multiplier resulted in an error of 6 percent. For signals of small magnitude (such as the signal generated by silence) the error for the 16 X 16 bit multiplier rose to 25 percent; however, this is acceptable for processing the silence signal. For comparison, previous published accuracy results for a 16 X 16 bit multiplier and similar FFT algorithm7 showed a maximum error of ±O.OI2 percent fullscale with a standard deviation of ±0.004 percent fullscale. On the basis of these results, the 12 X 12 bit multiplier was considered too inaccurate; therefore, the 16 X 16 bit multiplier was chosen. RTM control RTM control logic is designed with 2 basic modules: 1. Ke: a module which initiates arithmetic operations, data transfers between registers, and memory read/write cycles. 716 Fall Joint Computer Conference, 1972 2. Kb: a module which chooses a control branch based on the value of a boolean flag. With these modules the control for executing an algorithm can be specified in a manner quite similar to programming the algorithm in a high level programming language. This greatly simplifies the design of the control, thus resulting in a significant reduction in design time. This concept can easily be illustrated by investigating a section of bus 2 control. This particular section controls the complex calculation for the degenerate case of wo, that is, when the complex multiplier is 1 +iO. 
For this case the equations shown in Figure 5 reduce to a' =a+c b'=b+d c'=a-c d'=d-b A and B are general purpose arithmetic unit registers; INDEX is a storage register used for sequencing the counter through the 64 complex multipliers; ONE is a constant generator containing a "1"; and MAl and MB1 are memory address and buffer registers, respectively. The control for this series of complex calculations is then: Ke Ke (L~l; initialize) INDEX<-Ol Kb (done) [1__ 1 Ke Ke Ke Ke Ke Ke Ke Ke Ke Ke Ke Ke Ke Ke -----, (MA1~A1; read) ~. . ) (next control sectIOn (A~MB1) (MA1~A2; read) (B~MB1) (MB1~(A-B)/2; write) (MA1~A1) (MB1~(A+B)/2; (MA1~A3; write) read) (B~MB1) (MA1~A4; read) (A~MB1) (MB1~(A-B)/2; write) increment) (MB1~(A+B)/2; write) (MA1~A3; By dividing the result~ of each complex calculation by 2, the 1/N normalization factor can be distributed over the entire calculation. The control section for the remaining complex calculations is, of course, more complex requiring 46 Ke and 7 Kb, but its design and implementation remain straightforward. To accomplish control of all operations on bus 2, including accepting data from bus 1, executing the FFT, calculating the magnitudes of the Fourier coefficients, and transferring data to bus 3, about 120Ke and 20 Kb were used. FUTURE EXTENSIONS The speech processing application for this processor involves an initial Fourier transform, a second Fourier transform to obtain the cepstrum and an inverse Fourier transform. Figure 6 shows data flow for the proposed final form of the pipeline processor. The present system is memory limited because 14 bus transfers in and out of memory are required for every complex calculation. Approximately 500 nsec are required for a bus transfer; 250 nsec to load data on the bus and 250 nsec to read data from the bus. Faster memory and bus systems can decrease this portion of the processing time. The processor fulfills both the overall goal of a modular FFT computer to meet the minimum processing rate of 10K data samples/sec, and attain accuracy of 1 percent relative mean square error necessary for speech analysis. This was done using existing RTM's with only 2 new modules required. It should be emphasized that while the processor performs a specialized function (calculating the FFT) , the RTlVI modules themselves, with the exception of DM.index, are general and can be used to implement any processor. In fact, since only the back plane wiring determines the characteristics of the processor, one set of RTM modules can be shared among many processors, if the processors will not be used simultaneously. This can result in substantial savings over the purchase or construction of several complete processors. Along these lines, it would be advantageous to develop more complex but still general RTM modules. Specifically, a generalized micro-programmed LSI RTM module could be coded to implement the entire complex calculation, the FFT address generator, or any other. algorithm on a single card. The complex calculation is an area where the system's speed can be significantly improved. At present, 46 bus transfers are required for each complex calculation. This number could be reduced by a factor of 3 by constructing one card to perform the entire complex calculation. The present Register Transfer Module FFT Processor system's specifications did not require such improvements and the RTM design concepts were used to investigate various system designs using existing modules rather than constructing an entire system from the start. 
SUMMARY This paper has reviewed the basic FFT algorithms and presented a method by which a relatively sophisticated piece of hardware such as an FFT processor could be designed at the register transfer level in a much shorter time than required in a conventional gate level design. The simplicity of this modular construction has permitted a fairly in-depth view of the processor. The resultant product and its method of implementation are rather unique in that they combine the convenience of a control logic that is similar in structure to software algorithms with the processing speed of a completely hard-wired algorithm. ACKNOWLEDGMENTS The authors wish to acknowledge the assistance of Lee Bitterman, and Professors Gordon Bell and Raj Reddy 717 of CMU in the design and implementation of this FFT processor. REFERENCES 1 C G BELL et al The description and use of register transfer modules (RTM's) IEEE Transactions on Computers Vol C-21 1972 2 R W SCHAFER L R RABINER System for automatic formant analysis of voiced speech The Journal of the Acoustical Society of America Vol 47 No 21970 3 E L HALL et al Generation of products and quotients using approximate binary logarithms for digital filtering application IEEE Transactions on Computers Vol C-19 1970 4 G D BERGLAND A guided tour of the fast fourier transform IEEE Spectrum Vol 6 1969 5 G D BERGLAND A fast fourier transform algorithm for real valued series Communications of the ACM Voill 1968 6 B GOLD et al The FDP, a fast programmable signal processor IEEE Transactions on Computers Vol C-20 No 1 1971 7 J W HARTWELL A procedure for implementing the fast fourier transform on small computers IBM Journal of Research and Development Vol 15 1971 8 W W MOYER A high-speed indexing unit for FFT algorithm implementation Computer Design Vol 10 No 12 1971 A systematic approach to the design of digital bussing structures * by KENNETH J. THURBER, E. DOUGLAS JENSEN, and LARRY A. JACK Honeywell, Inc. St. Paul, Minnesota and LARRY L. KINNEY, PETER C. PATTON, and LYNN C. ANDERSON University of Minnesota Minneapolis, Minnesota INTRODUCTION Type and number of busses Busses are vital elements of a digital system-they interconnect registers, functional modules, subsystems, and systems. As technological advances raise system complexity and connectivity, busses are being recognized as primary architectural resources which can frequently be the limiting factor in performance, modularity, and reliability. The traditional view of bussing as just an ad hoc way of hooking things together can no longer be relied upon to produce even viable much less cost-effective solutions to these increasingly sophisticated interconnect problems. This paper formulates a more systematic approach by abstracting those bus parameters which are common to all levels of the system hierarchy. Every bus, whether it connects registers or processors, can be characterized by such factors as type and number, control method, communication mechanism, data transfer conventions, width, etc. Evaluating these parameters in terms of the preliminary functional requirements and specifications of the system constitutes an efficient procedure for the design of a cost-effective bus structure. Busses can be separated into two generic types: dedicated, and nondedicated. Dedicated busses A dedicated bus is permanently assigned to either one function or one physical pair of devices. 
For example, the Harvard class computer characterized by Figure 1 has two busses, each of which is dedicated according to both halves of the definition. One bus supplies procedure to the processor, the other provides data. If there were multiple procedure memory modules on the procedure bus, that bus would be functionally but not physically dedicated. The concept of "function" is hierarchical rather than atomic; in the sense that the procedure bus of Figure 1 carries both addresses and operands, it could be viewed as physically but not functionally dedicated. This dichotomy is reversed in Figure 2, which illustrates another form of Harvard class machine. In this case, one bus is functionally dedicated to addresses and the other to operands. They are undedicated from the standpoint of data/procedure separation, and physically undedicated as well. The principal advantage of a dedicated bus is high throughput, because there is little, if any, bus contention (depending on the type and level of dedication). As a result, the bus controller can be quite simple compared to that of a non-dedicated bus. Also, portions of the communication mechanism which must be explicit in undedicated busses may be integral parts of the BUS STRUCTURE PARAMETERS Each of these bus structure parameters involves a variety of interrelated tradeoffs, the most important of which are considered below. * This work was supported in part by the Naval Air Development Center, Warminster, Pa., under Navy contract number N6226972-C-0051. 719 720 Fall Joint Computer Conference, 1972 - PROCEDURE BUS - 1 T PROCESSOR 1 - PROCEDURE MEMORY DATA BUS DATA MEMORY 1 - Figure 1-Harvard class computer with dedicated procedure and data busses devices on a dedicated bus: addresses may be unnecessary, and the devices may automatically be in sync. A system may include as many undedicated busses as its logical structure and data rates require, to the extreme of one or more busses between every pair of devices (Figure 3). A major disadvantage of dedicated busses is the cost of the cables, connectors, drivers, etc., and of the multiple bus interfaces (although the interfaces are generally less complex than those for nondedicated busses). If reliability is a concern, the busses must be replicated to avoid potential single-point failures. Dedicated busses do not often support system modularity, because to add a device frequently involves adding new interfaces and cables. Non-dedicated busses Non-dedicated busses are shared by multiple functions and/or devices. As pointed out earlier, busses may be functionally dedicated and physically non-dedicated, or vice versa. The Princeton class computer of Figure 4 illustrates a commonly encountered type of single bus - T PROCESSOR 1 - ADDRESS BUS - T PROCEDURE MEMORY 1 - OPERAND BUS Figure 3-Adding a device to a non-dedicated bus structure structure which is not dedicated on either a functional or a physical basis. The interesting case of multiple, system-wide, functionally and physically non-dedicated busses is seen in Figure 5. Here every device can communicate with every other device using any bus, so the failure of a bus interface to some device simply reduces the number of busses (but not devices) remaining available to that device. The crossbar matrix is a form of non-dedicated bus structure for connecting any element of one device class (such as memories) to any element of another (such as processors). It can be less efficiently used to achieve complete connectivity between all system devices. 
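The cost side of that tradeoff is easy to put in toy form. The sketch below (our illustration, not from the paper) counts the busses needed if every physical pair of devices is given a dedicated bus, and contrasts that with the contention a single shared bus imposes when several devices request transfers in the same cycle.

```python
# Toy cost/contention comparison (our illustration): dedicated busses for every
# physical pair of devices versus a single shared, non-dedicated bus.

def dedicated_bus_count(n_devices):
    return n_devices * (n_devices - 1) // 2      # one bus per pair of devices

def shared_bus_waiting(requests_this_cycle):
    return max(0, requests_this_cycle - 1)       # one transfer per cycle wins the bus

for n in (4, 8, 16):
    print(n, "devices need", dedicated_bus_count(n), "fully dedicated busses")
print("with 5 simultaneous requests, a single shared bus leaves",
      shared_bus_waiting(5), "devices waiting")
```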
The crossbar can be very complex to control, and the number of switches increases as the square of the number of devices, as' shown in Figure 6. It also suffers from the disability that failure of a crosspoint leaves no alternative path between the corresponding devices. By adding even more hardware, the crossbar switch can be generalized .to a code-activated network (analogous to the telephone system) in which devices seek their own paths to each other. - T DATA MEMORY PROCESSOR MEMORY I/O 1 -- Figure 2-Harvard class computer with dedicated address and operand busses Figure 4-Princeton class computer with a single non-dedicated bus Systematic Approach to Design of Digital Bussing Structures PROCESSOR PROCESSOR MEMORY MEMORY I/O 721 the devices which desire but do not obtain the bus must wait for another opportunity to contend for it. The communication technique is usually more complex for non-dedicated busses, because devices must be explicitly addressed and synchronized. Bus control techniques Figure 5-Multiple, system;,.wide, non-dedicated busses Another relatively unconventional non-dedicated bus structure is the permutation or sorting network which can connect N devices to N other devices. The sorting n~twork may be implemented with memory or gating, but in either case if all N! permutations are allowed, the hardware is extensive for anything but very small N's. Non-dedicated busses offer modularity as their main advantage, in that devices generally may be added to them more easily than to dedicated busses. Multiple busses such as those in Figure 5 not only increase bandwidth but also enhance reliability, rendering the system fail-soft. While non-dedicated busses avoid the proliferation of cables, connectors, drivers, etc., they do exact a toll in usage conflict. Bus allocation requires logic and time, and if this time cannot be masked by data transfers, the bus bandwidth and/or assignment algorithm may have to be compromised. Furthermore, MEM MEM I I I Centralized bus control MEM PRocr-.-------~~H~.---------~~.~.---~!----~:~r~ I I When a bus is shared by multiple devices, there must be some method whereby a particular unit requests and obtains control of the bus and is allowed· to transmit data over it. The major problem in this area is resolution of bus request conflicts so that only one unit obtains the bus at a given time. The different control schemes can be roughly classified as being either centralized or decentralized. If the hardware used for passing bus control from one device to another is largely concentrated in one location, it is referred to as centralized control. The location of the hardware could be within one of the devices which is connected to the bus, or it could be a separate hardware unit. On the other hand, if the bus control logic is largely distributed throughout the different devices connected to the bus, it is called decentralized control. The various bus control techniques will be described here in terms of distinct control lines, but in most cases the equivalent functions can be performed with coded transfers on the bus data lines. The basic tradeoff is allocation speed versus total number of bus lines. With centralized control, a single hardware unit is used to recognize and grant requests for the use of the bus. At least three different schemes can be used, plus various modifications or combinations of these: • Daisy Chaining • Polling • Independent Requests. Centralized Daisy Chaining is illustrated in Figure 7. PROC.....----~-~------. ~IJ~t--.. ..... 
·--"'!iIl~t__ I _____ J BUS AVAILABLE .... BUS CONTROLLER BUS REQUEST - ....,BUS BUSY - Figure 6-Adding devices to a crossbar bus DEVICE 0 ! , I- - Figure 7-Centralized bus control: daisy chain DEVICE N 722 Fall Joint Computer Conference, 1972 I DEVICE 0' "'I J BUS REQUEST .BUS CONTROLLER BUS BUSY -... POLL COUNT ,~ ..... Figure Sa-Centralized bus control: polling with a global counters Each device can generate a request via the common Bus Request line. Whenever the Bus Controller receives a request on the Bus Request line, it returns a signal on the Bus Available line. The Bus Available line is daisy chained through each device. If a device receives the Bus Available signal and does not want control of the bus, it passes the Bus Available signal on to the next device. If a device receives the Bus Available signal and is requesting control of the bus, then the Bus Available signal is not passed on to the next device. The requesting device places a signal on the Bus Busy line, drops its bus request, and begins its data transmission. The Bus Busy line keeps the Bus Available line up while the transmission takes place. When the device drops the Bus Busy signal, the Bus Available line is lowered. If the Bus Request line is again up, the allocation procedure repeates. The Bus Busy line can be eliminated, but this essentially converts the bus control to a decentralized Daisy Chain (as discussed later). The obvious advantage of such a scheme is its simplicity: very few control lines are required, and the number of them is independent of the number of devices; hence, additional devices can be added by simply connecting them to the hus. A disadvantage of the Daisy Chaining scheme is its susceptibility to failure. If a failure occurs in the Bus Available circuitry of a device, it could prevent succeeding devices from ever getting control of the bus or it could allow more than one device to transmit over the bus at the same time. However, the logic involved is quite simple and could easily be made redundant to increase its reliability. A power failure in a single device or the necessity to take a device off-line can also be problems with the Daisy Chain method of control. A~other disadvantage is the fixed priority structure which results. The devices which are "closer" to the Bus Controller always receive control of the bus in preference to those which are "further away". If the closer devices had a high demand for the bus, the further devices could be locked out. Since the Bus Available signal must sequentially ripple through the devices, this bus assignment mechanism can also be quite slow. Finally, it should be noted that with Daisy Chaining, cable lengths are a function of system layout, so adding, deleting, or moving devices is physically awkward . Figure 8a illustrates a centralized Polling system. As in the centralized Daisy Chaining method, each device on the bus can place a signal on the Bus Request line. When the Bus Controller receives a request, it begins polling the devices to determine who is making the request. The polling is done by counting on the polling lines. When the count corresponds to a requesting device, that device raises the Bus Busy line. The controller then stops the polling until the device has completed its transmission and removed the busy signal. If there is another bus request, the count may restart from zero or may be continued from where it stopped. 
Restarting from zero each time establishes the same sort of device priority as proximity does in Daisy Chaining, while continuing from the stopping point is a roundrobin approach which gives equal opportunity to all devices. The priorities need not be fixed because the polling sequence is easily altered. The Bus Request line can be eliminated by allowing the polling counter to continuously cycle except while it is stopped by a device using the bus. This alternative impacts the restart (i.e., priority) philosophy,and the average bus assignment time. Polling does not suffer from the reliability or physical placement problems of Daisy Chaining, but the number of devices in Figure 8a limited by the number of polling lines. Attempting to poll bit-serially involves synchronous communication techniques (as described later) and the attendant complications. Figure 8b shows that centralized Polling may be made independent of the number of devices by placing a counter in each device. The Bus Controller then is reduced to distributing clock pulses which are counted DEVICE DEVICE 0 N ~ ---- CLOCK OSCILLATOR ... BUSY (INHIBIT) ,Ir Figure Sb-Centralized bus control: polling with local counters. Systematic Approach to Design of Digital Bussing Structures by all devices. When the count reaches the code of a device wanting the bus, the device raises the Busy line which inhibits the clock. When the device completes its transmission, it removes the Busy signal and the counting continues. The devices can be serviced either in a round-robin manner or on a priority basis. If the counting always continues cyclically when the Busy signal is removed, the allocation is round-robin, and if the counters are all reset when the Busy signal is removed, the devices are prioritized by their codes. It is also possible to make the priorities adaptive by altering the codes assigned to the devices. The clock skew problems tend to limit this technique to small slow systems; it is also exceptionally susceptibh~ to noise and clock failure. Polling and Daisy Chaining can be combined into schemes where addresses or priorities are propagated between devices instead of a Bus Available signal. This adds some priority flexibility to Daisy Chaining at the expense of more lines and logic. The third method of centralized bus control, Independent Requests, is shown in Figure 9. In this case each device has a separate pair of Bus Request and Bus Granted lines, which it uses for communicating with the Bus Controller. When a device requires use of the bus, it sends its Bus Request to the controller. The controller selects the next device to receive service and sends a Bus Granted to it. The selected device lowers its request and raises Bus Assigned, indicating to all other devices that the bus is busy. After the transmission is complete, the device lowers the Bus Assigned line and the Bus Controller removes Bus Granted and selects the next requesting device. The overhead time required for allocating the bus can he shorter than for Daisy Chaining or Polling since all Bus Requests are presented simultaneously to the Bus Controller. In addition, there is complete flexibility available for selecting the next device for service. The controller can use prespecified or adaptive priorities, a round-robin scheme, or both. It is also possible to dis- I DEVICE 0 .... 
BUS REQUEST 0 I j 1---1 ~ ~ 1 DEVICE N ~ j~ ~ BUS CONTROLLER BUS GRANTED 0 I I I BUS REQUEST N BUS GRANTED N , , BUS ASSIGNED Figure 9-Centralized bus control: independent requests 723 BUS _ - - - - - -.... DEVICE 0 AVAILABLE BUS REQUEST o BUS REQUEST N Figure lOa-Decentralized bus control: daisy chain 1 able requests from a particular device which, for instance, is known or suspected to have failed. The major disadvantage of Independent Requests is the number of'lines and connectors required for control. Of course, the complexity of the allocation algorithm will be reflected in the amount of Bus Controller hardware. Decentralized bus control In a decentrally controlled system, the control logic is (primarily) distributed throughout the devices on the bus. As in the centralized case, there are at least three distinct schemes, plus combinations and modifications of these: • Daisy Chaining • Polling. • Independent Requests A decentralized. Daisy Chain can be constructed from a centralized one by omitting the Bus Busy line and connecting the common Bus Request to the "first" Bus Available, as shown in Figure lOa. A device requests service by raising its Bus Request line if the incoming Bus Available line is low. When a Bus Available signal is received, a device which is not requesting the bus passes the signal on. The first device which is requesting service does not propagate the Bus Available, and keeps its Bus Request up until finished with the bus. Lowering the Bus Request lowers Bus Available if no successive devices also have Bus Request signals up, in which case the "first" device wanting the bus gets it. On the other hand, if some device "beyond" this one has a Bus Request, control propagates down to it. Thus, allocation is always on a round-robin basis. A potential problem exists in that if a device in the interior of the chain releases the bus and no other device is requesting it, the fall of Bus Request is propagating back toward the "first" device while the Bus Available signal propagates "forward." If devices on both 724 r+ Fall Joint Computer Conference, 1972 ... DEVICE 4 0 DEVICE 1 DEVICE N - ---+ f-- BUS AVAILABLE Figure 1Ob-Decentralized bus control: daisy chain 2 sides of the last user now raise Bus Request, the one to the "right" will obtain the bus momentarily until its Bus Available drops when the "left" device gets control. This dilemma can be avoided by postponing the bus assignment until such races have settled out, either asynchronously with one-shots in each device or with a synchronizing signal from elsewhere in the system. A topologically simpler decentralized Daisy Chain is illustrated in Figure lOb. Here, it is not possible to unambiguously specify the status of the bus by using a static level on the Bus Available line. However, it is possible to determine the bus status from transitions on the Bus Available line. Whenever the Bus Available coming· into a device changes state and that device needs to use the bus, it does not pass a signal transition on to the next device; if the device does not need the bus, it then changes the Bus Available signal to the next device. When the bus is idle, the Bus Available signal oscillates around the Daisy Chain. The first device to request the bus and receive a Bus Available signal change! terminates the oscillation and takes control of the bus. When the device is finished with the bus, it causes a transition in Bus Available to the next device. 
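A minimal model of this transition-passing scheme, before its noise behavior is considered, may help fix the mechanism. The sketch below is not from the paper: the device indexing, function name, and one-device-per-step scan are assumptions, and settling races are ignored.

    def pass_bus_available(wants_bus, holder=None):
        """Circulate a Bus Available transition around the ring until some
        device absorbs it.  wants_bus is a list of booleans; holder is the
        index of the device that launches the transition (the last user),
        or None if the oscillation starts at device 0.  Returns the device
        that terminates the oscillation and takes the bus, or None if the
        transition keeps circulating (idle bus)."""
        n = len(wants_bus)
        start = 0 if holder is None else (holder + 1) % n
        for step in range(n):
            device = (start + step) % n
            if wants_bus[device]:
                return device            # transition absorbed, bus taken
        return None                      # bus idle: Bus Available keeps toggling

    if __name__ == "__main__":
        print(pass_bus_available([False, False, False]))           # idle: None
        print(pass_bus_available([False, True, True], holder=1))   # device 2 is next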
Dependence on signal edges rather than levels renders this approach somewhat more susceptible to noise than DEVICE 0 ~. ..~ . ,..... :::... ""II""" ~ " " H ~ POLLING CODE .:: I .... ~ ""II ............ ~ BUS AVAILABLE BUS ACCEPT . DEVICE N ---------- " " Figure ll-Decentralized bus control: polling the previous one. This problem can be minimized by passing control with a request/acknowledge type of mechanism such as described later for communication, although this slows bus allocation. Both of these decentralized Daisy Chains have the same single-point failure mode and physical layout liabilities as the centralized version. Specific systems may prefer either the (centralized) priority or the (decentralized) roundrobin algorithm, but they are equally inflexible (albeit simple). Decentralized Polling can be performed as shown in Figure 11. When a device is willing to relinquish control of the bus, it puts a code (address or priority) on the polling lines and raises Bus Available. If the code matches that of another device which desires the bus, that device responds with Bus Accept. The former device drops the polling and Bus Available lines, and the latter device lowers Bus Accept and begins using the bus. If the polling device does not receive a Bus Accept (a Bus Refused line could be added to dis- DEVICE 0 ---------- j~ .~~ ,. "11 .... ...> DEVICE N 4~ BUS REQUESTS BUS ASSIGNED ~" ~ "'II .......... ~ ,r Figure 12-Decentralized bus control: independent requests tinguish between devices which do not desire the bus and those which are failed), it changes the code according to some allocation algorithm (round-robin or priority) and tries again. This approach requires that exactly one device be granted bus control when the system is initialized. Since every device must have the same allocation hardware as a centralized polling Bus Controller, the decentralized version utilizes substantially more hardware. This buys enhanced reliability in that failure of a single device does not necessarily affect operation of the bus. Figure 12 illustrates the decentralized version of Independent Requests. Any device desiring the bus raises its Bus Request line, which corresponds to its priority. When the current user· releases the bus by dropping Bus Assigned, all requesting devices examine all active Bus Requests. The device which recognizes itself as the highest priority requestor obtains control of the bus by raising Bus Assigned. This causes all other requesting devices to lower their Bus Requests Systematic Approach to Design of Digital Bussing Structures (and to store the priority of the successful device if a round-robin algorithm is to be accomodated). The priority logic in each device is simpler than that in the centralized counterpart, but the number of lines and connectors is higher. If the priorities are fixed rather than dynamic, not all request lines go to all devices, so the decentralized case uses fewer lines in systems with up to about 10 devices. Again, the decentralized method offers some reliability advantages over the centralized one. The clock skew problems limit this process to small dense systems, and it is exceptionally susceptible to noise and clock failure. Bus communication techniques Once a device has obtained control of a bus, it must establish contact with the desired destination. The information required to do this includes • • • • Source Address Destination Address Communication Class Action Class. 
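For concreteness, this coordination information can be pictured as a short header passed over the data lines ahead of the transfer proper. The field widths, codes, and function names below are hypothetical; as the discussion that follows notes, several of these fields are often implicit and need not be transmitted at all.

    # Hypothetical layout: 4-bit source, 4-bit destination, 2-bit communication
    # class, 2-bit action class, packed into a single 12-bit header word.
    COMM_CLASSES = {"data": 0, "command": 1, "status": 2, "interrupt": 3}
    ACTION_CLASSES = {"input": 0, "output": 1}

    def pack_header(source, destination, comm_class, action_class):
        """Pack the four coordination fields into one header word."""
        return ((source << 8) | (destination << 4)
                | (COMM_CLASSES[comm_class] << 2) | ACTION_CLASSES[action_class])

    def unpack_header(word):
        """Recover (source, destination, communication class, action class)."""
        return (word >> 8) & 0xF, (word >> 4) & 0xF, (word >> 2) & 0x3, word & 0x3

    if __name__ == "__main__":
        header = pack_header(source=3, destination=9,
                             comm_class="command", action_class="output")
        print(hex(header), unpack_header(header))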
The source address is often implicit, and the destination address may be also, in the case of a dedicated bus. Communication class refers to the type of information to be transferred: e.g., data, command, status, interrupt etc. This too might be partially or wholly implicit, or might be merged with the action class, which determines the function to be performed, such as input, output, etc. After this initial coordination has been accomplished, the actual communication can proceed. Information may be transferred between devices synchronously, asynchronously, or semisynchronously. Synchronous bus cOInInunication Synchronous transmission techniques are well understood and widely used in communication systems, primarily because they can efficiently operate over long lengths of cable. A synchronous bus is characterized by the existence of fixed, equal-width time slots which are either generated or synchronized by a central timing mechanism. The bus timing can be generated globally or both globally and locally. A globally timed bus contains a central oscillator which broadcasts clock signals to all units on the bus. Depending on the logical structure and physical layout of the bus, clock skew may be a serious problem. This can be somewhat alleviated by distributing a globally generated frame signal which synchro- 725 nizes a local clock in each device. The local clocks drive counters which are decoded to identify the time slot assigned to each device. A sync pulse occurs every time the count cycle (Le., frame) restarts. The device clocks must be within the initial frequency and temperature coefficient tolerances determined by the bus timing characteristics. Skew can still exist if a separate frame sync line is used, but can be avoided by putting frame sync in the data. The sync signal then must be separable from the data, generally through amplitude, phase, or coding characteristics. If the identifying characteristic is amplitude, the line drivers and receivers are much more complex analog circuits than those for simple binary data. If phase is used, the sync signal must be longer than a time slot, which costs bus bandwidth and again adds an analog dimension to the drivers and receivers. If the sync signal is coded as a special binary sequence, it could be confused with normal data, and can require complex decoders. All of the global and global/local synchronization techniques are quite subject to noise errors. There are two basic approaches to synchronous busses: the time slots may be assigned to devices on either a dedicated or non-dedicated basis. A mix of both dedicated and undedicated slots can also be used. If time slots are dedicated, they are permanently allocated to a device regardless of how frequently or infrequently that device uses them. Each device on the bus is allowed to communicate on a rotational (time division multiplex) basis. The only way that any priority can be established is by assigning more than one slot to a device (sometimes call super-commutation). More than one device may be assigned to a single time slot by submultiplexing (subcommutating) slower or mutually exclusive devices. Generally, not all devices will wish to transmit at once; system requirements may not even require or permit it. If any expansion facilities for additional devices are provided, many of the devices may not even be implemented on any given system. 
These two factors tend to waste bus bandwidth, and lowering the bandwidth to an expected "average" load may risk unacceptable conflicts and delays in peak traffic periods. Another difficulty that reduces throughput on a dedicaded time slot bus is that devices frequently are not all the same speed. This means that if a device operates slower than the time slot rate, it cannot run at its full speed. The time slot rate could be selected to match the rate of the slowest device on the bus, but this slows down all faster devices. Alternatively, the time slot rate can be made as fast as the fastest device on the bus, and buffers incorporated into the slower devices. Depending on the device rate mismatches and the length of data blocks, t?-ese buffers could grow quite large. In 726 Fall Joint Computer Conference, 1972 addition, the buffers must be capable of simultaneous input and output (or one read and one write in a time slot period), or else the whole transfer is delayed until the buffer is filled. Another approach is to run the bus slower than the fastest device and assign multiple time slots to that device, which complicates the control and wastes bus bandwidth if that device is not always transferring data. Special logic must also be included if burst or block transfers are to be permitted, since a device normally does not get adjacent time slots. For reliability, it is generally desirable that the receiving device verify and acknowledge correct arrival of the data. This is most effectively done on a word basis unless the physical nature of the transmitting device precludes retry on anything other than a complete block or message. If a synchronous time slot is wide enough to allow a reply for every word, then data transmission will be slower than with an asynchronous bus because the time slots would have to be defined by the slowest device on the bus. One solution is to establish a system convention that verification is by default, and if an error does occur, a signal will be returned to the source device N (say two) time slots later. The destination has time to do the validity test without slowing the transfer rate·, however , the source must retain all words which have been transmitted but not verified. Non-dedicated time slots are provided to devices only as needed, which improves bus utilization efficiency at the cost of slot allocation hardware. Block transfers and priority assignment schemes are possible if the bus assignment mechanism is fast enough. The device speed and error checking limitations of the dedicated case are also shared by· non-dedicated systems. Asynchronous bus com.m.unication Asynchronous bus communication techniques fall into two general categories: One-Way Command, and Request/ Acknowledge. A third case is where clocking information is derived from the data itself at the desti- DATA I DATA READY I II t I I I t1 I I I t2 I I I t3 It4 I I I I I I Figure l3-Asynchronous, source-controlled, one-way command communication DATA I DATA REQUEST I I I I 14 I I I t1 I t t2 : t3 I I ·t Figure l4-Asynchronous, destination-controlled, one-way command communication nation (using phase modulation, etc.) ; this is not treated here because it is primarily suited to long-distance bitserial communications applications and is well documented elsewhere. One-Way Command refers to the fact that the data transfer mechanism is completely controlled by only one of the two devices communicating-once the transfer is initiated, there is no interaction (except, perhaps, for an error signal). 
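The deferred-verification convention outlined above (an error returned N time slots after the word it refers to) implies a small retention buffer in the source. The fragment below is a hypothetical illustration only; the class name, the one-word-per-slot discipline, and the window handling are assumptions rather than the paper's design.

    from collections import deque

    class DeferredVerifySource:
        """Source-side bookkeeping when verification is by default and a
        Data Error may arrive N time slots after the offending word."""

        def __init__(self, n_slots=2):
            self.n_slots = n_slots
            self.pending = deque()        # (slot_sent, word) not yet verified

        def send(self, slot, word):
            """Place a word on the bus in the given slot and remember it."""
            self.pending.append((slot, word))

        def slot_tick(self, slot, error_signalled=False):
            """Advance one slot.  An error in this slot refers to the word sent
            n_slots earlier; return it for retransmission.  Older words are
            implicitly verified and released."""
            retry = None
            while self.pending and slot - self.pending[0][0] >= self.n_slots:
                sent_slot, word = self.pending.popleft()
                if error_signalled and slot - sent_slot == self.n_slots:
                    retry = word
            return retry

    if __name__ == "__main__":
        src = DeferredVerifySource(n_slots=2)
        for slot, word in enumerate(["w0", "w1", "w2", "w3"]):
            src.send(slot, word)
            # suppose the destination reports an error in slot 3 (about w1)
            retry = src.slot_tick(slot, error_signalled=(slot == 3))
            if retry:
                print("retransmit", retry)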
A One-Way Command (OWC) interface may be controlled by either the source or the destination device. With a source-controlled OWC interface, the transmitting device places data on the bus and signals Data Ready to the receiving device, as seen in Figure 13. Timing of Data Ready is highly dependent on implementation details, such as exactly how it is used by the destination device. If Data Ready itself directly strobes in the data, then it must be delayed long enough (t1) for the data to have propagated down the bus and settled at the receiving end before Data Ready arrives. Instead of "pipelining" data and Data Ready, it is safer to allow the data to reach the destination before generating Data Ready, but this makes the transfer rate a function of the physical distance between devices. A better approach is to make Data Ready as wide as the data (i.e., t1 = t3 = 0), and let the receiving device internally delay before loading. t4 is the time required either for the source device to reload its output data register, or for control of the bus to be reassigned.

The principal advantages of the source-controlled OWC interface are simplicity and speed. The major disadvantages are that there is no validity verification from the destination, it is difficult and inefficient to communicate between devices of different speeds, and noise pulses on the Data Ready line might be mistaken for valid signals. The noise problem can be minimized by proper timing, but usually at the expense of transfer rate.

The validity check problem can be avoided with a destination-controlled OWC interface, such as shown in Figure 14. The receiving device raises Data Request, which causes the source to place data on the bus. The destination now has the problem of deciding when to look at the data lines, which is related to the physical distance involved. If an error is detected in the word, the receiving device sends a Data Error signal instead of another Data Request, so the validity check time may limit the transfer rate. The speed is also adversely affected by higher initial overhead, and by twice the number of bus propagation delays as used by the source-controlled interface.

The Request/Acknowledge method of asynchronous communication can be separated into three cases: Non-Interlocked, Half-Interlocked, and Fully-Interlocked.

Figure 15-Asynchronous, non-interlocked, request/acknowledge communication

Figure 15 illustrates the Non-Interlocked method. The source puts data on the bus and raises Data Ready; the destination stores the data and responds with Data Accept, which causes Data Ready to fall and new data to be placed on the lines. If an error is found in the data, the receiving device raises Data Error instead of Data Accept. This signal interchange not only provides error control, but also permits operation between devices of any speed. The price is primarily speed, although some added logic is also required. As with the One-Way Command interface, the exact timing is a function of the implementation. There are now two lines susceptible to noise, and twice as many bus delays to consider. Improper ratios of bus propagation time and communication signal pulse widths could allow another Data Ready to come and go while Data Accept is still high in response to a previous one, which would hang up the entire bus.
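A rough comparison of the two One-Way Command variants described above can be made with invented numbers: the destination-controlled form pays two bus propagation delays per word where the source-controlled form pays one. The timing breakdown and figures below are assumptions for illustration, not measurements.

    def owc_word_time_source_controlled(t_prop, t_settle, t_reload):
        """One-Way Command, source controlled: data and Data Ready cross the
        bus once per word, then the source reloads its output register."""
        return t_prop + t_settle + t_reload

    def owc_word_time_destination_controlled(t_prop, t_settle, t_reload):
        """Destination controlled: Data Request crosses the bus, then the data
        crosses back, so two propagation delays are paid per word."""
        return 2 * t_prop + t_settle + t_reload

    if __name__ == "__main__":
        # Invented numbers, in nanoseconds: 50 ns cable propagation,
        # 100 ns settling and strobe margin, 50 ns register reload.
        for fn in (owc_word_time_source_controlled,
                   owc_word_time_destination_controlled):
            t = fn(50, 100, 50)
            print(fn.__name__, t, "ns/word,", round(1e3 / t, 2), "Mwords/s")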
This can be avoided by making Data Ready remain up until Data Accept (or Data Error) is received by the 727 DATA--...I[I DATA READY'-----~ DATA ACCEPT I I tl t2 't3 t I '. I I t5 t4 Figure I6-Asynchronous, half-interlocked, request/acknowledge communication source, as seen in Figure 16. In this Half-Interlocked interface, if Data Ready comes up while Data Accept is still high, the transfer will only be delayed. Furthermore, the variable width of Data Ready tends to protect it from noise. There is no speed penalty and very little hardware cost associated with these improvements over the Non-Interlocked case. One more potential timing error is possible if Data Accept extends over the source buffer reload period.and masks the leading edge of the next Data Ready. Figure 17 shows how this is avoided with a Fully-Interlocked interface where a new Data Ready does not occur until the trailing edge of the old Data Accept (or Data Error) . Also, both communication signals are now comparatively noise-immune. The device logic is again slightly more complex, but the major disadvantage is that the bus delays have doubled over the Half-Interlocked case, nearly halving the transfer rate upper limit. Semisynchronous bus cOInmunication Semisynchronous busses may be thought of as having time slots which are not necessarily fixed equal width. On the other hand, they might also be viewed as essentially asynchronous busses which behave synchronously when not in use. DATA DATA READY DATA ACCEPT Figure I7-Asynchronous, fully-interlocked, request/acknowledge communication 728 Fall Joint Computer Conference) 1972 t DATAl:j ~-----I I BUSAVAILABLE ..... ---- I I ~tll4-t 1 I -+t I I r+-- t2 DATAftEADY/ BUS AVAILABLE DATA ACCEPT ~ r;~ rL ~ ld---.~ i ..- - - - -..... 1 I I I f I I I tll 1 1 t2 I t+- I 2 n'--___ ~JI I I It41 I t+1 t5 DATA ACCEPTI BUS AVAILABLE ~11 14- I Semisynchronous busses were devised to retain the basic asynchronous advantage of communication' between different speed devices, while overcoming the asynchronous disadvantage of real-time error response and the synchronous disadvantage of clock skew. Error control in a synchronous system does not impede the transfer rate because the error signal can be deferred as many time slots as the validity test requires. This is not possible on a conventional asynchronous bus since there is no global timing signal available to all devices. Actually, this is true only when the bus is idle, because while it is in use there are one or more communication signals which may be observed by all devices. So an asynchronous bus could defer the Data Error signal for some N word-times as defined by whatever transfer technique is employed. But when no device is using the bus, these signals normally stop, so the one or more pairs of devices which transferred the last N words have no time reference for a deferred error response. The semisynchronous bus handles this problem by generating extra communication signals which serve as pseudoclock pulses for this purpose when the bus is idle. Only N pulses are actually needed, but a continuous oscillation may facilitate the restart of normal bus operation. The location of this pseudoclock depends on the bus control· method. If the bus is centrally controlled, the Bus Controller can detect the idle bus condition and generate the pseudo clock signals. 
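As an illustration of the centrally generated pseudoclock just described, the following sketch models slot numbering that continues across an idle bus so that a deferred Data Error still has a time reference. The model, names, and one-word-per-slot discipline are assumptions made for illustration, not the paper's implementation.

    def slot_stream(transfers, idle_padding):
        """Yield a slot-by-slot view of bus activity: each real transfer
        occupies one slot, and when no transfer is pending the Bus
        Controller inserts pseudoclock slots so slot numbering never stops."""
        slot = 0
        for word in transfers:
            yield slot, word
            slot += 1
        for _ in range(idle_padding):
            yield slot, "pseudo"
            slot += 1

    if __name__ == "__main__":
        N = 2                      # deferred error window, in slots
        sent = {}
        for slot, item in slot_stream(["w0", "w1", "w2"], idle_padding=N):
            if item != "pseudo":
                sent[slot] = item
            # an error signalled in this slot refers to the word N slots back,
            # even if the current slot is only a pseudoclock pulse
            error_target = sent.get(slot - N, "nothing yet")
            print(f"slot {slot}: {item:7s} a Data Error now would refer to {error_target}")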
A decentrally controlled bus requires that this function be performed by DATA ")t ; )_II I-L I ~1..___..,.._~IW 1-1 I I I I DATAREADY---......~ t3 ~ Figure 18-Semisynchronous, source-controlled, one-way command communication --.J t3 ~ Figure 19-5emisynchronous, non-interlocked request / acknowledge communication (Data Ready/Bus Available) r: DATA-.... It~: 151 I I t2 .,..t~ Figure 20-Semisynchronous, non-interlocked, request/acknowledge communication (Data Accept/Bus Available) the last device to use the bus. The replication of logic adds cost, and if this last device should fail while generating the pseudoclocks, the entire bus will be down. Like asynchronous busses, semisynchronous busses may be either One-Way Command or Request/ Acknowledge. Figure 18 illustrates how the timing of a semisynchronous source-controlled bus resembles that of its asynchronous counterpart (there is no corresponding destination-controlled case) . Instead of the source device sending a Data Ready to signal the presence of new data, it sends a Bus Available to define the end of its time slot and the beginning of the next. During a time slot, the bus assignment for the following slot is made; Bus Available then causes the next device to place its destination address and data on the bus. The selected destination then waits for the data to settle, loads it, and generates another Bus Available. Combining the function of Data Ready with that of Bus Available (a line generally required by an asynchronous bus) is a benefit which accrues to all semisynchronous busses. The semisynchronous One-Way Command interface does avoid the real-time error re~ sponse, but it is still highly susceptible to noise, and incompatible with devices of differing speeds. DATA~ DATA READYI BUS AVAILABLE DATA ACCEPT --1 \ ~ I I ~ ~: ~A:'i! I I I I I I I. I ~I+I I I ~ t1~ t4+1 t5 I I I I I I t3 r+- Figure 21-Semisynchronous, half-interlocked, request/acknowledge communication (Data Ready /Bus Available) Systematic Approach to Design of Digital Bussing Structures 729 DATA DATAR::::~ ~~i BUS AVAILABLE DATA ACCEPT I ~ ~~ ....i -.-..,- - - - i .--.._ _-;1-.--;--_.... , _____ ~ t1 --+II+- t3~ t2-., t4- t4 DATA READYI BUS AVAILABLE DATA ACCEPT I 1 t1 I t I I !4Figure 22-Semisynchronous, half-interlocked, request / acknowledge communication (toggling Data Ready/Bus Available) For semisynchronous as well as asynchronous busses, there are Non-Interlocked, Half-Interlocked, and FullyInterlocked Request/Acknowledge interfaces. The Non-Interlocked interface shown in Figure 19 is a direct extension of the One-Way Command case. It h.andles de,vices of different speeds, but also is susceptIble to nOIse and potential hangup. However, a semisynchronous bus using Data Ready as Bus Available for a Non-Interlocked interface picks up one of the liabilities of synchronous busses. The transmitting device will not generate Bus Available until Data Accept has been received and its word-time is finished, which wastes bus bandwidth if a slower source is followed by a faster one in the next time slot. This can only be alleviated with the same sort of bus bandwidth and buffer size trade-offs that a synchronous bus would use to match different device speeds. Figure 20 illustrates a scheme which solves this difficulty by using Data Accept for Bus Available. This optimizes bus bandwidth in the asynchronous sense that the transfer rate is slaved to the speed of the receiving device. Of course, the noise and hangup problems are still present. 
Since using Data Ready as Bus Available is unsuccessful for Non-Interlocked interfaces, it is not surpris- DATA~I DATA READY DATA ACCEPTI BUS AVAILABLE I ; ~ '\d1-----I ----!--.....:.--....;;;..," I I -.j t1 t4-t2 II I I t2 ~ Figure 24-Semisynchronous, fully-interlocked, request/acknowledge communication (Data Ready /Bus Available) ing that it doesn't work in the Half-Interlocked case either. As seen in Figure 21, ts is wasted because only the leading edge of Data Ready is used asBus Available. Als?, one device would try to hold Bus Available up whIle another is pulling it down. The second device could wait for the first to release the line, but skew on the Data Accept line from the first destination to the first and second sources would cause the wait to be quite lengthy. Furthermore, if Data Accept must be used by both source devices, it may as well transfer control instead of Bus Available. To keep from wasting t s, it might be proposed that Bus Available simply be toggled and both edges be utilized as in Figure 22, but the same state conflict exists here. Toggling a Bus Available flip-flop with Data Accept makes no more sense than both source devices employing Data Accept, and would add time. Thus, Data Accept must be converted to Bus Available, as shown in Figure 23. Except for a deferred error signal, the disabilities of a conventional Half-Interlocked asynchronous bus continue to apply. The same reasoning causes the Fullv-Interlocked interface of Figure 24 to be rejected for that of Figure 25, where the trailing edge of Data· Accept serves as Bus Available. DATA 2 DATA READY I' I I I I 1 I I I I -+1,...1 1t3 I I+- t4-.j Figure 23-Semisynchronous, half-interlocked, request/acknowledge communication (Data Accept/Bus Available) DATA ACCEPTI BUS AVAILABLE I I I It11 t21 I I I Figure 25-Seinisynchronous, fully-interlocked, request / acknowledge communication (Data Accept/Bus Available) 730 Fall Joint Computer Conference, 1972 Data transfer philosophies There are five basic data transfer philosophies that can be considered for a bus: • • • • • Single word transfers only Fixed length block transfers only Variable length block transfers only Single word or fixed length block transfers Single word or variable length block transfers. (It should be noted that here the term "word" is used functionally to denote the basic information unit on the bus·, bus width factors are covered later.) . The data transfer philosophy is directly involved WIth three other major aspects of the system: the access characteristics of the devices using the bus; the control mechanism by which the bus is allocated (if it is nondedicated); and the bus communication techniques. Of course, if the bus connects functional units of a computer such as processors and memories, the data transfer philosophy may severely impact programming, memory allocation and utilization, etc. Single words only The choice of allowing only single words to be transferred has a number of important system ramifications. First, it precludes any effective use of purely blockoriented devices, such as disks, drums, or BORAMs. These devices have a high latency and their principal value lies in amortizing this time across many words in a block. To a lesser extent, this concern also applies to other types of devices. There can be substantial initial overhead in obtaining access to a device: bus acquisition, bus propagation, busy device delay, priority resolution, address mapping, intrinsic device access time, etc. 
Prorating these against a block of words would reduce the effective access time. The second factor in a single-word-only system is the bus control method. Since a non-dedicated bus must be reassigned to another device for each word, the allocation algorithm may have to be very fast to meet the bus throughput specs. Even if bus assignment occurs in parallel with data transfer, this could restrict the sophistication of the algorithm, the bus bandwidth, or both. Judiciously selected parameters (speed, priorities, etc.) conceivably could enable a bus controller to handle blocks from a slow device on a word-by-word basis. A single-word-only bus requires that the communication scheme operate at the word rate, whereas with block transfers it might be possible for devices to effect higher throughput 'by interchanging communication signals only at the beginning and end of each block. Fixed length blocks only Bus bandwidth may be increased at the expense of flexibility by transferring only fixed length blocks of data. Problems arise when the bus block size does not match that of a block-oriented device on the bus. If the bus blocks are smaller, some improvement is achieved over the single-word-only bus, but not as much as would be possible. If the bus blocks are too large, extraneous data is transferred, which ,:ast~s bus bandwidth and buffer space, and unnecessarlly tIes up both devices. However, there are applications such as lookaside memories where locality of procedure and data references make effective use of a purely fixed length block transfer philosophy. Since the bus is assigned for entire blocks, the control can be slower and thus simpler. Likewise, the communication validity check can be restricted to blocks because this is the smallest unit that could be retried in case of an error. The Data Ready aspect of communication would have to remain on a word basis unless a selfclocked modulation scheme is used. Variable length blocks only The use of dynamically variable length blocks is significantly more flexible than the two previous approaches, because the block size can be matched to the physical or logical requirements of the devices involved in the transfer. This capability makes more efficient use of bus bandwidth and device time when transferring blocks. On the other hand, the overhead involved in initiating a block transfer would also be expended for single word transfers (blocks of length one). Thus, a compromise between bandwidth and flexibility may have to be arranged, based on the throughput requirements and expected average block size. An example of such a compromise would be a system in which the sizes of the data blocks depended on the source devices. This avoids explicit block length specification, reducing the overhead and improving throughput. The facility for one-word blocks requires that the control scheme be able to reallocate the bus rapidly enough to minimize wasted bandwidth. Data error response may' also be required at the word rate. Single words or fixed length blocks In a system where there are high priority devices with low data requirements, this might be a reasonable alternative. The single word option reduces the number of cases where the over-size block would waste bandwidth, buffer space, and device availability, but it still Systematic Approach to Design of Digital Bussing Structures suffers from poor device and bus utilization efficiency when more than one word but less than a block is needed. 
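The utilization arguments in this and the preceding subsections reduce to a simple amortization: per-access overhead is prorated over the words a requester actually wants. The sketch below is illustrative only; the cost breakdown and the numbers are invented.

    def effective_word_time(overhead, word_time, block_length, words_needed=None):
        """Per-useful-word cost of a transfer.

        overhead      : one-time cost (bus acquisition, device access, etc.)
        word_time     : transfer time per word once the block is flowing
        block_length  : number of words actually moved over the bus
        words_needed  : number of those words the requester wanted
                        (defaults to the whole block)
        """
        if words_needed is None:
            words_needed = block_length
        return (overhead + block_length * word_time) / words_needed

    if __name__ == "__main__":
        # Invented numbers: 10 us of access overhead, 1 us per word.
        print(effective_word_time(10, 1, 1))       # single-word transfers: 11 us/word
        print(effective_word_time(10, 1, 64))      # 64-word block: about 1.16 us/word
        # a fixed 64-word block when only 8 words are wanted: wasted bandwidth
        print(effective_word_time(10, 1, 64, 8))   # 9.25 us per useful word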
The expected mix of block and single word transfers would be a primary influence on the selection of control and communication mechanisms to achieve a proper balance of cost and performance. Single words or variable length blocks As might be expected, the capability for both single words and variable length blocks is the most flexible, efficient, and expensive data transfer philosophy. Single words can be handled without the overhead involved in initializing a block transfer. Data blocks can be sized to suit the devices and applications, which optimizes bus usage. The necessity for reassigning the bus as often as every word time imposes a speed constraint on the control method which must be evaluated in light of the expected bus traffic statistics. If data validity response is desired below a message level, the choice of a communication scheme will be affected. Bus width The width of a bus impacts many aspects of the system, including cost, reliability, and throughput. Basically, the objective is to achieve the smallest number of lines consistent with the necessary types and rates of communication. Bus lines require drivers, receivers, cable, connectors, and power, all of which tend to be costly compared to logic. Connectors occupy a significant amount of physical space, and are also among the least reliable components in the system. Reliability is often diminished even further as the number of lines increases due to the additional signal switching noise. Line combination, serial/parallel conversions, and multilevel encoding are some of the fundamental techniques for reducing bus width. Combination is a method of reducing the number of lines based on function and direction of transmission. Complementary pairs of simplex lines might be replaced with single halfduplex lines. Instead of dedicating individual lines to separate functions, a smaller number of multiplexed lines might be more cost effective, even if extra logic is involved. This includes the performance of bus control functions with coded words on the data lines. Serial/parallel tradeoffs are frequently employed to balance bus width against system cost and performance. Transmitting fewer bits at a time saves lines, connectors, drivers, and receivers, but adds conversion logic at each end. It may also be necessary to use higher speed (and 731 thus more expensive) circuits to maintain effective throughput. The serial/parallel converters at each end of the bus can be augmented with buffers which absorb traffic fluctuations and allow a lower bandwidth bus. (Independent of bus width considerations, this concept can minimize communication delays due to busy destination devices.) Bit-serial transmission generally is the slowest, requires the most buffering and the least line hardware, produces the smallest amount of noise, and is the most applicable approach in cases with long lines. Parallel transmission is faster, uses more line hardware, generates greater noise, and is more cost-effective over shorter distances. Multilevel encoding is an approach which converts digital data into analog signals on the bus. It is occasionally used to increase bandwidth by sending parallel data over a single line, but there are numerous disadvantages such as complexity, line voltage drops, lack of noise immunity, etc. THE SYSTEMATIC APPROACH A systematic approach to the design of digital bussing structures is outlined in Figure 26. 
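Before turning to the design procedure itself, the serial/parallel tradeoff discussed under bus width can be made concrete with a small calculation. The transfer model, the per-word overhead term, and the numbers below are assumptions for illustration only.

    import math

    def width_tradeoff(word_bits, width, transfer_time, overhead_per_word=0.0):
        """Return (data lines, word transfer time) for a given bus width.

        A word is sent as ceil(word_bits / width) successive transfers of
        `width` bits, each taking transfer_time; overhead_per_word stands in
        for serial/parallel conversion and handshake cost at the two ends."""
        transfers = math.ceil(word_bits / width)
        return width, transfers * transfer_time + overhead_per_word

    if __name__ == "__main__":
        # 32-bit words, 100 ns per transfer, 50 ns conversion overhead per word.
        for width in (1, 8, 16, 32):
            lines, t = width_tradeoff(32, width, 100, 50)
            print(f"{lines:2d} data lines -> {t:6.0f} ns per word")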
It assumes that pre- ....-----1~ SYSTEM REQUIREMENTS AND SPECIFICATIONS + STEP 1: TYPE AND NUMBER OF BUSSES t STEP 2: CONTROL METHOD + ~TEP 3: COMMUNICATION TECHNIQUES ! TECHNOLOGY CONSTRAINTS I STEP 4: DATA TRANSFER PHILOSOPHIES ! STEP 5: BUS WIDTHS t DETAILED DESIGN Figure 26-0utline of the systematic approach 732 Fall Joint Computer Conference, 1972 liminary functional requirements and specifications have been established for the system. The tradeoffs for each b~parameter are interactive, so several iterations are generally necessary. Even the system requirements and specifications may be altered by this feedback in order to achieve an acceptable bus configuration within the technology constraints. Step 1: Type and number of busses This is the first and most fundamental step, and involves the specification of dedicated and/or nondedicated busses. The factors to be considered are: throughput; cost of cables, connectors, etc.; control complexity; communication complexity; reliability; modularity; and bus contention (i.e., availability). Step 2: Bus control methods The central choice is among three centralized and three decentralized methods. The Step 1 decision regarding dedicated and non-dedicated busses has a major influence here. The other considerations are: allocation speed; cost of cables, connectors, etc.; control complexity (cost); reliability; modularity; bus contention; allocation flexibility; and device physical placement restrictions. Step 3: Communication techniques Either synchronous, asynchronous, or semisynchronous communication techniques may be used, depending on: throughput; cost; reliability; mixed device speeds; bus utilization efficiency; data transfer philosophy; and bus length. Step 4: Data transfer philosophies This step is strongly influenced by the need for any block-oriented devices on the bus. In addition, the data transfer philosophy is a function of: control speed; allocation flexibility; control cost; throughput; communication speed; communication technique; device utilization efficiency; and (perhaps) programming and memory allocation. Step 5: Bus width Bus width is almost always primarily dictated by either bus length or throughput. Other aspects of this problem are: cost, reliability; communication technique; and communication speed. CONCLUSION Historically, many digital bus structures have simply "occurred" ad hoc without adequate consideration of the design tradeoffs and their architectural impacts. This is no longer a viable approach, because systems are becoming more complex and consequently less tolerant of busses which are designed by habit or added as an afterthought. The progress in this area has been hindered by a lack of published literature detailing all the bus parameters and design alternatives. Some aspects of bussing have been touched on briefly as a subsidiary topic in computer architecture papers, and a few concepts have been treated at great length in the substantially different context of communications. In contrast with these foregoing efforts, the intent of this paper is to serve as a step towards a more systematic approach to the entire digital bus structure problem per se. ANNOTATED BIBLIOGRAPHY Although many digital designers recognize the importance of bus structures, there have been no previous papers devoted solely to this subject. When bus structures have been discussed in the literat1,ll'e, it has been as a topic subsidiary to other aspects of computer architecture. 
This section attempts to collect a comprehensive but not exhaustive selection of important papers which deal with various considerations of bus structure design. A guide to the bibliography is given below so that particular facets of this I!laterial can be explored. Additionally, each entry has been briefly annotated to provide information on its bus-related contents. The bibliography is grouped into nine categories: Computer Architecture/System Organization, I/O, Sorting Networks, Multiprocessors, Type and Number of Busses, Control Methods, Communication Techniques, Data Transfer Philosophies, and Bus Width. Computer architecture/system organization (A2, B2, B4, D2, D3, D4, D7, D8, D9, H3, LI, L4, L7, M4, M5, R5, S7, T4, WI, W5) Papers in this category basically deal with the architecture of computers and systems, and with how subsystems relate to each other. Alternative architectures (D2, L4, WI) and specific architectures (B2, B4, D3, D4, D7, D8, D9, W5) are discussed. Item A2 is tutorial. The impacts of bus structures (D2, H3, LI) and LSI (L7, M5, R5) on systems organization are described. S7 pursues the effects of new technology on bus struc- Systematic Approach to Design of Digital Bussing Structures tures per se. Report T4 (on which this paper is based) examines the entire bussing problem, and contains a detailed bus design for a specific system. 733 different numbers of busses. B2 points out the hierarchical nature of bus structures. F2 is an example of a store and forward bus structure with dedicated busses and extensive routing control. I/O Control methods (A2, Cl, K4) (AI, A2, B7, PI, P2, P3, Ql, S2, S6, S8, W2, W4, Yl) Several papers deal with bus structures as a subcase of I/O system design. K4 is a tutorial on I/O architecture with many implications on bus structure communication and control. A2 discusses the relationships among the executive, the data bus, and the remainder of the system. Cl considers the overall architecture of an I/O system and its control. Sorting networks (Bl, L6, T2, T3) These papers deal with sorting or permuting bus structures. Bl and L6 utilize very simple cells and basically construct their systems from bitonic sorters. T2 utilizes a different, approach which is oriented toward ease of implementation with shift registers. T3 employs group theory and a cellular array approach to derive a unique network configuration. Multiprocessors (AI, C2, C8, C9, Dl, G5) These papers deal with the design of multiprocessor computer systems. C9 covers the bus architecture of multiprocessors through 1963. Al describes a multiprocessor with dual non-dedicated busses controlled by a decentralized daisy chain. C2 discusses the relationship between channel rates and memory requirements. C8 and Dl are about multiprocessors using data exchanges. G5 describes a multiprocessor bus that uses associative addressing techniques in its communication portion. Type and number of busses (AI, A3, B2, B6, D6, DlO, F2, Gl, 11, Kl, K3, L2, L3, L9, M3, S5, W8, Z2) The majority of the control techniques are some form of either centralized independent requests (A2, or decentralized daisy chaining (AI). PI uses polling, and P2 deals with priority control of a system. Communication techniques (C6, C7, D5, Fl, G3, G4, H2, H4, Ml, R2, R3, R4, Sl, S3, S4, 89, TI, W6, W7) These papers tend to be concerned with communication techniques directly rather than as a subsidiary topic. R2 discusses the information lines necessary to communicate in a system. C6, C7, and Ml cover synchronous systems. 
H4 and S3 are good presentations of the synchronous clock skew problem. 84 deals with the design of a frame and slotted system. Fl describes the use of phase-locked loops for synchronism, while W7 uses bit stuffing for synchronization. The synchronous system in H2 uses a combination of global and local timing. R3 deals with a synchronous system with nondedicated time slots. D5 contains a good summary of asynchronous communication, and G3 furnishes further examples. G4 points out the importance of communication in digital systems. Data transfer philosophies (A4, Cl, C3, C4, C5, G2, HI, L5, L8, M2, W3) Papers in this category are concerned with the philosophies of data transfers. A4 is about transmission error checking and serial-by-byte transmission. C3, C4, and C5 cover buffering and block size from a statistical point of view in simple bus structures such as "loops." G2 studies the choice of block sizes. L5 considers the buffering problem. Bus width The papers in this group describe a computer architecture and include some comments relating to the type and number of busses. Z2 is an example of a dedicated bus, while Al presents a non-dedicated bus. AI, DlO, L2, L3, and Z2 are cases of bus structures with (B3, B5, G3, C4, C5, K2, Rl, T5, Zl) These papers address the problem of reducing the number of lines in the bus. B3 deals with line drivers 734 Fall Joint Computer Conference, 1972 and receivers, and contains an extensive bibliography on transmission line papers. B5 discusses balancing the overall system configuration. C3, C4, and C5 are interested in the relationships of burst lengths, number of lines, etc. K2 describes a transmission system utilizing multilevel encoding. T5 is a comprehensive study of line reduction, and includes all the tradeoffs on buffering, multilevel codes, etc., in the design of an actual bus. A machine with a single 200 line bus structure is the topic of Rl. REFERENCES Al R L ALONSO et al A multiprocessing structure Proceedings IEEE Computer Conference September 1967 pp 56-59 This paper describes a multiprocessor system with non-dedicated instruction and data busses. The control method is a simple decentralized daisy chain. A2 S J ANDELMAN Real-time I/O techniques to reduce system costs Computer Design May 1966 pp 48-54 This article describes two real-time I/O applications and how a computer is used in each. It also indicates the relationships among the system executive, the CPU computations, and the I/O data bus. It includes centralized bus control. A3 J P ANDERSON et al D825-a multiple-computer system for command and control Proceedings FJCC 1962 AFIPS Press pp 86-96 This paper functionally describes the switch interlock system of the Burroughs D825 system. The switch is essentially a crossbar which can handle up to 64 devices. A priority-oriented bus allocation mechanism handles conflicting allocation requests. Priorities are preemptive. A4 A AVIZIENIS Design of fault-tolerant computers Proceedings FJCC 1967 AFIPS Press pp 733-743 This paper describes the internal structure of the JPL-STAR computer. The bus structure consists of two busses and two bus checkers. The busses transmit information in four-bit bytes and the bus checkers check for transmission errors. B1 K E BATCHER Sorting networks and their application Proceedings SJCC 1968 AFIPS Press pp 307-314 This paper describes various configurations of bitonic sorting networks which can be utilized as routing networks or permutation switches in multiprocessor systems. 
B2 H R BEELITZ System architecture for large-scale integration Proceedings FJCC 1967 AFIPS Press pp 185-200 This paper describes the architecture of LIMAC. It notes the hierarchical nature of bus structures, stating, "A local bus structure interconnects the sub-partitions of a functional module in the same sense that the machine bus interconnects all functional modules." B3 ROBERG et al PEPE implementation study Honeywell Report 12251-FR Prepared for System Development Corporation under Subcontract SDC 71-61 This report contains an extensive bibliography of signal transmission papers and a survey of line drivers and receivers. It also describes the bus designs for the PEPE multiprocessor system. B4 N A BOEHMER et al Advanced avionic digital computer-arithmetic and control unit design Hughes Aircraft Report P70-517 prepared under Navy contract N62269-70-C-0534 December 1970 This report describes a main data bus design for the Advanced Avionic Digital Computer, including the bus communication and allocation mechanisms. B5 F P BROOKS K ElVERSON A utomatic data processing Wiley New York 1969 Section 5.4 Parameters of computer organization pp 250-262 This section descusses speed/cost/balance tradeoffs in computer architecture. Of specific interest is how bus width, speed, and degree of parallelism affect computer performance. Examples of tradeoff results are given in terms of the System/360. B6 W BUCHHOLZ Planning a computer system M~Graw-Hill New York 1962 Chapter 16 of this book describes the data exchange of the STRETCH computer. The data exchange is a switched bus which handles data flow among I/O and external storage units and the primary store. It is independent of CPU processes and able to function concurrently with the central processor. B7 H B BURNER et al A programmable data concentrator for a large computing system IEEE Transactions on Computers November 1969 pp 1030-1038 This paper describes the internal structure of a data concentrator to be used with an IBM 360/67. The concentrator utilizes an Interdata Model 4 computer. The details of the bus structure, including timing and control signals, are given. The system was built and utilized at Washington State University, Pullman, Washington. C1 G N CEDARQUIST An input/output system for a multiprogrammed computer Report No 223 April 1967 Department of Computer Science University of Illinois This report describes the architecture of I/O systems, and deals with some parameters of bus structures through discussion of data transfers. It is primarily concerned with the implementation of centralized control and communication logic. C2 Y C E CHEN D LEPLEY Bounds on memory requirements of multiprocessing systems Proceedings 6th Annual Allerton Conference on Circuit and System Theory October 1968 pp 523-531 This paper presents a model of a multiprocessor with a multilevel memory. Given a computation graph with specified execution times and main memory requirements, bounds on the required main memory and the inter-memory channel rates are calculated. The trade-off between main memory size and backing memory channel capacity is discussed at some length. C3WWCHU A study of asynchronous time division multiplexing for time sharing computer systems Proceedings FJCC 1969 AFIPS Press pp 669-678 Systematic Approach to Design of Digital Bussing Structures This paper describes the use of an asynchronous time division multiplexing system. A model is given which relates buffer size and queuing delays to traffic, number of lines, and burst lengths. 
C4WWCHU Demultiplexing considerations for statistical multiplexers IEEE Transactions on Computers June 1972 pp 603-609 This paper discusses tradeoffs and simulation results useful in the design of buffers used in a computer communication system. The tradeoffs between message lengths, buffer size, traffic intensity, etc., are considered. C5 W W CHU A G KONHEIM On the analysis and modeling of a class of computer communication systems IEEE Transactions on Communications June 1972 pp 645-660 This paper derives models for a computer communication enyironment, applied to star and loop bus structure systems. The model provides a means of relating statistical parameters for traffic intensities, message lengths, etc. C6 N CLARK A C GANNET Computer-to-computer communication at 2.5 megabit/sec Proceedings of IFIP Congress 62 North Holland Publishing Company September 1962 pp 347-353 This paper describes an experimental synchronous high speed (2.5 megabit/second) communication system. It indicates the relationships of all system parts necessary to communicate in a party-line fashion among three computers. C7 COLLINS RADIO CORPORATION C-system overview 523-0561644-001736 Dallas Texas October 1 1969 This brochure describes the architecture of the Collins C-System, especially the design and features of the Time Division Exchange (TDX) loop. The TDX loop is a 32 million bit-per-second serial communication link. Communication between devices is at a 2 million word-per-second rate. The system as initially implemented contained 16 channels, with expansion to a 512 million bit-per-second capability envisioned. C8 ME CONWAY A multiprocessor system design Proceedings FJCC 1963 AFIPS Press pp 139-146 This paper describes the design of a multiprocessor system which useds a matrix switch (called a memory exchange) to connect processors to memories. The unique feature of the configuration is that an associative memory is placed between each processor and the memory exchange for addressing purposes. C9 A J CRITCHLOW Generalized multiprocessing and multiprogramming systems Proceedings FJCC 1963 AFIPS Press pp 107-126 This paper describes the state of development of multiprocessor systems in 1963. There were essentially three bus schemes in use: the crossbar switch (Burroughs D825), the multiple bus (CDC-3600) and the time-shared bus (IBM STRETCH). Functional descriptions of the bus concepts are presented. D1 R L DAVIS et al A building block approach to multiprocessing Proceedings FJCC 1972 AFIPS Press pp 685-703 This paper describes a bus structure (called a Switch Interlock) for use in a multiprocessor. It discusses the tradeoffs in choosing the structure, and looks at single bus, multiple bus, multiport, and crossbar systems. The Switch Interlock is a dedicated bus matrix switch which supports D2 D3 D4 D5 D6 D7 D8 D9 DlO 735 both single word and block transfers. The switch is designed to be implemented for bus widths from bit-serial to fully word-parallel. A J DEERFIELD Architectural study of a distributed fetch computer NAECON 1971 Record pp 214-217 This paper describes the distributed fetch computer in which the fetch (procedure and data) portion of the machine is distributed to the memory modules. A J DEERFIELD et al Distributed fetch computer concept study Air Force Contract No F-71-C-1417 February 1972 This report describes the design of a bus structure for use in the distributed fetch computer. This machine repartitions the fetch and execute portions of the processor in a multiprocessor system. 
The fetch units are associated with the memories instead of being with the execute units, thus decreasing bus traffic. A J DEERFIELD et al Interim report for arithmetic and control logic design study Navy Contract N62269-72-C-0023 May 1972 This report describes a proposed bus structure for the Advanced Avionic Digital Computer and some of the tradeoffs considered during the design. J B DENNIS S S PATIL Computation structures Chapter 4-Asynchronous Modular Systems MIT Department of Electrical Engineering Cambridge Massachusetts This chapter describes the reasons for asynchronous systems, and gives examples of asynchronous techniques and their timing mechanisms. It is useful in understanding asynchronous communications. E W DEVORE D H LANDER Switching in a computer complex for I/O flexibility 1964 NEC pp 445-447 This paper describes the IBM 2816 Switching Unit, the bus system utilized to interconnect CPU's and tape drives. It discusses the modularity tradeoffs made in the 2816. DIGITAL EQUIPMENT CORPORATION PDP-II handbook Chapter 8---Description of the UNIBUS pp 59-68 Maynard Massachusetts 1969 This chapter of the PDP-ll user's manual describes the UNIBUS functionally as a subsystem of the PDP-ll. Data transfer operations performed by the bus are described and illustrated with examples, along with general concepts of bus operation and control. DIGITAL EQUIPMENT CORPORATION PDP-11 interface Application Note Maynard Massachusetts This document gives a brief description of the PDP-ll UNIBUS, a single undedicated bus with centralized daisy-chain control and fully-interlocked request/acknowledge communication. DIGITAL EQUIPMENT CORPORATION PDP-11 unibus interface manual DEC-ll-HIAB-D Maynard Massachusetts 1970 This manual gives a detailed description of the PDP-ll UNIBUS, its operation in the computer, and methods for interfacing peripheral equipment to the bus. S B DINMAN Direct function processor concept for system control Computer Design March 1970 pp 55-60 This article describes the (patented) GRI-909 bus structure. 736 Fall Joint Computer Conference, 1972 The machine consists of a series of functional modules strung between two undedicated busses with a bus modifier unit (which serves a function similar to the alpha code on the Harvard MARK IV). The GRI-909 is quite similar to the DEC PDP-11. F1 K FERTIG B C DUNCAN A new high-speed general purpose input/output mechanism with real-time computing capability Proceedings FJCC 1967 AFIPS Press pp 281-289 This paper describes techniques for. I/O .processing of self-clocked data utilizing phase locked loops. F2 H FRANK et al Computer communication network design-experience with theory and practice SJCC 1972 AFIPS Press pp 255-270 This paper describes the ARPANET design from the vantage point of two years experience with the message switching system. ARPANET is a store and forward message switching network in which a device interfaces into the system by means of an interface message processor (IMP). The IMP then routes the message through the network topology. This paper provides insight into the design and specification of dedicated "store.,.and-forward" message switching systems. G1 E C GANGL Modular avionic computer NAECON 1972 Record pp 248-251 This paper describes the architecture of a modular computer including its internal bus structure. The bus consists of four parallel segments: a data bus, a status bus, a microprogrammed command bus, and a power distribution bus. 
G2 D H GIBSON Considerations in block oriented systems design Proceedings SJCC 1967 AFIPS Press pp 75-80 This paper describes the rationale and techniques for block transfers between CPU and memory. The study is to determine the affect of block size on CPU throughput. G3 A I GROUDAN The SKC-2000 advanced aerospace computer NAECON 1972 Record pp 229-235 This paper describes the SKC-2000 computer and its internal bus structure. The bus operates in a request/ acknowledge mode of communication and can handle devices of different speeds from 1 microsecond to larger than a millisecond with no design changes. G4 H W GSCHWIND Design of digital computers Communications in Digital Computer Systems Chapter 8 Section 5 Springer-Verlag New York 1967 pp 347-367 This section describ~s computer I/O and access paths (busses) in terms of their communication ramifications. It points out that "even experts failed to look at computers seriously from a communication point of view for a surprisingly long time." It also details the communication that occurs in some general computer configurations. G5 D C GUNDERSON Multi-processor computing apparatus U S Patent 3521238 July 13 1967 This patent describes a method of bussing in a multiprocessor system based upon the use of an associative switch. This bus scheme allows processors to access a centralized system memory by either location or some property of the data (content addressabiIity). Each processor has its own individual access to the system memory so the bus is very reliable. HI M L HANSON Input/output techniques for computer communication Computer Design June 1969 pp 42-47 This article describes the I/O systems in several UNIVAC machines, and considers the types of data transfers, staus words, number of lines, method of operation, etc., of these bus structures. H2 R H HARDIN Self sequencing data bus technique for space shuttle Proceedings Space Shuttle Integrated Electronic Conference Vol 2 1971 pp 111-139 This presentation describes the design of SLAT (Slot Assigned TDM), a data bus for space shuttle. SLAT is a synchronous bus with global plus local synchronization. The requirements, length, control method, clock skew, and synchronization tradeoffs are discussed. H3 H HELLERMAN Digital computer system principles Data Flow Circuits and Magnetic-Core Storage McGraw-Hill New York 1967 Chapter 5 pp 207-235 This chapter contains a discussion of data flow or bus circuits, with special emphasis on the trade-offs possible between economy and speed. The author stresses the fact that the bus organization of a computer is a major factor determining its performance. H4 G P HYATT Digital data transmission Computer Design Vol 6 Noll November 1967 pp 26-30 This article deals primarily with the transmission of data in a synchronous bus structure. It considers in detail the clock skew problem, and describes propagation delay and mechanization problems. It concludes that the clock pulse should not be daisy-chained, but radially distributed, and that the sum (worst case) of data propagation delays must be less than the clock pulse period. 11 F INOSE et al A data highway system Instrumentation Technology January 1971 pp 63-67 This article describes a data bus designed to interface many digital devices together. The system is essentiaJly a nondedicated single bus with one wire for data and another for addresses. The system is connected together in a "loop configuration." It uses a "5-value pulse" for synchronization, etc. 
The system has an access time of 200 microseconds and can handle 100 devices on a bus up to 1 kilometer in length. Kl J C KAISER J GIBBON A simplified method of transmitting and controlling digital data Computer Design May 1970 pp 87-91 This article treats the tradeoffs between the number of parallel lines in a bus and the complexity of gating at the bus destinations. The authors develop a matrix switch concept as a data exchange under program control. The programmed instruction thus is able to dynamically interconnect system elements by coded pulse coincidence control of the switching matrix. K2 H KANEKO A SAWAI Multilevel PCM transmission over a cable using feedback balanced codes NEC 1967 pp 508-513 This paper describes a multilevel PCM code (Feedback Balanced Code) suitable for transmission of data on a coaxial transmission cable. Systematic Approach to Design of Digital Bussing Structures K3 L J KOCZELA Distributed processor organization Advances in Computers Vol 19 Chapter 7 Communication Busses Academic Press New York 1968 pp 346-349 This author presents a functional description of a bussing scheme for a distributed cellular computer. Each processor can address its own private memory plus bulk storage. Communication between cells takes place over the bus in two modes: Local (between two cells) and Global (controller call plus one or more controlled cells). The intercell bus is used for both instructions and data; all transfers are set up and directed by the controller cell by means of eight bus control commands. K4 G A KORN Digital computer interface systems Simulation December 1968 pp 285-298 This paper is a tutorial on digital computer interfaces. It begins with the party line I/O bus, and covers how devices are controlled, how interrupts are handled, and how data channels operate. It discusses the overall subject of interfaces (I/O and bussing system) from the systems point of view, describing how the subsystems all relate to each other. Ll J R LAND Data bus concepts for the space shuttle Proceedings Space Shuttle Integrated Electronic Conference Vol 3 1971 pp 710-785 This presents the space shuttle data management computer architecture from a bus-oriented viewpoint. It discusses the properties and design characteristics of the bus structures, and summarizes the design and mechanization trade-offs. L2 F J LANGLEY A universal function unit for avionic and missile systems NAECON Record 1971 pp 178-185 This paper discusses some trade-offs in computer architectures, and categorizes some architectures by their bus structures, providing an example for each category. It considers single time-shared bus systems, multiple bus systems, crossbar systems, dual bus external ensemble systems, multiple-bus integrated ensemble systems, etc. L3 R LARKIN A mini-computer multiprocessing system Second Annual Computer Designers Conference Los Angeles California February 1971 pp 231-235 The topology of communication between computer subsystems is discussed. Six basic topologies for communication internal to a computer are described: (1) radial, (2) tree, (3) bus, (4) matrix, (5) iterative, and (6) symmetric. Some topological implications of bus structures are discussed including the need to insure positive (one device) control of the bus during its transmission phase. All six topologies can be expressed in terms of dedicated and non-dedicated bus structures. 
L4 S E LASS A fourth" generation computer organization Proceedings SJCC 1968 AFIPS Press pp 435-441 This paper functionally describes the internal organization of a "fourth-generation" computer including its data channels and I/O bus structure. L5 A L LEINER Buffering between input/output and the computer Proceedings FJCC 1962 pp 22-31 This paper describes the tradeoffs in synchronizing devices, and considers solutions to the problem of buffering between devices of different speeds. 737 L6 K N LEVITT A study oj data communication problems in a self-repairable multiprocessor Proceedings SJCC 1968 AFIPS Press pp 515-527 This paper presents a method of aerospace multiprocessor reliability enhancement by dynamic reconfiguration using busses which are data commutators. Two realizations of such a bus technique are permutation switching networks and crossbar switches. L7 S Y LEVY Systems utilization of large-scale integration IEEE Transactions on Computers Vol EC-16 No 5 1967 pp 562-566 This paper describes a new approach to computer organization based on LSI technology, employing functional partitioning of both the data path and control. Of particular interest is the data bus structure of an RCA Laboratories experimental machine using LSI technology. L8 W A LEVY E W VEITCH Design for computer communication systems Computer Design January 1966 pp 36-41 This article relates memory size considerations to a user's wait time for a line to the memory. It is applicable to bus bandwidth design in the analysis of buffer sizes needed to load up a bus structure. L9 R C LUTZ PC M using high speed memory system for switching applications Data and Communication Design May-June 1972 pp 26-28 This article details a method of replacing a crossbar switch with a memory having an input and output commutation system and some counting logic. Advantages of this approach are low cost and linear growth. Ml J S MAYO An approach to digital system network IEEE Transactions on Communication Technology April 1967 pp 307-310 This paper deals with synchronizing communication between devices with unlocked clocks. A system with frame sync is postulated and the number of bits necessary for efficient pulse stuffing is derived. M2 J D MENG A serial input/output scheme for small computers Computer Design March 1970 pp 71-75 This article describes the trade-offs and results of designing an I/O data bus structure for a minicomputer. M3 J S MILLER et al Multiprocessor computer system study NASA Contract No 9-9763 March 1970 This report reviews the number and type of busses used in several computing systems such as: CDC 6000, IBM DCS, IBM 360 ASP series, IBM 4-Pi, Burroughs D825 and 5500, etc. It goes on to suggest the design of a multiprocessor for a space station. In particular the system has two busses, one for I/O and one for internal transfers. Specifically described are: message structure, access control, error checking and required bandwidth. A 220 MHz bandwidth requirement is deduced. . M4 W F MILLER R ASCHENBRENNER The GUS multicomputer system IEEE Transactions on Computers December 1963 pp 671-676 This paper describes an Argonne Lab experimental computer with several memory and processing subsystems. All internal memory communication is handled by the Dis- 738 M5 PI P2 P3 Ql Rl R2 R3 Fall Joint Computer Conference, 1972 tributor, which functions as a data exchange and is expandable. No detailed description of the Distributor operation is furnished. 
R C MINNICK et al Cellular bulk transfer systems Air Force Contract No FI9628-67-C-0293 3 AD683744 October 1968 Part C of this report describes a bulk transfer system composed of an input array, an output array, and a mapping device. The mapping device moves data from the input to the output array and may contain logic. Simple bulk transfer systems are described which perform permutation on the data during its mapping. P E PAYNE A method of data transmission requiring maximum turnaround time Computer Design November 1968 p 82 This article describes a method of controlling data transmission between devices by polling. M PIRTLE Intercommunication of processors and memory Proceedings FJCC 1967 AFIPS Press pp 621-633 This paper discusses the throughput of several different bus structures in a system configuration with the intent of providing the appropriate amount of memory bandwidth. It describes the allocation sequence of a typical bus, and concludes that it can be very effective to assign " ..• priorities to requests, rather than to processors and busses ... with memory systems which provide ample memory bus bandwidth to the processors." W W PLUMMER Asynchronous arbiters Computation Structures Group Memo No 56 MIT Project MAC February 1971 This memo describes logic for determining which of several requesting CPU's get access and in what order to a memory. It is potentially a portion of the control logic for a bus structure, and describes several different algorithms for granting access. J T QUATSE et al The external access network of a modular computer system Proceedings SJCC 1972 AFIPS Press pp 783-790 This paper describes the External Access Network (EAN), a switching network designed to interface processors to processors, processors to facilities, and memory to facilities in a modular time sharing system (PRIME) being built at Berkeley. The EAN acts like a crossbar switch or data exchange, and consists of processor, device, and switch nodes. To communicate, a processor selects an available switch node and connects the appropriate device node to it. R RICE WR SMITH SYMBOL-a major departure from classic software dominated Von Neumann computing systems Proceedings SJCC 1971 AFIPS Press pp 575-587 This paper describes a functionally designed bus-oriented system. The system bus consists of 200 interconnection lines which run the length of the mainframe. R RINDER The input/output architecture of minicomputers Datamation May 1970 pp 119-124 This article surveys the architecture of minicomputer I/O units. It describes a typical I/O bus and the lines of information it would carry. M P RISTENBATT D R ROTHSCHILD Asynchronous time multiplexing IEEE Transactions on Communication Technology June 1968 pp 349-357 This paper describes the use of "asynchronous time multiplexing" techniques on analog data. Basically, the paper describes a synchronous system with non-dedicated time slots. R4 K ROEDL R STONER Unique synchronizing technique increases digital transmission rate Electronics March 15 1963 pp 75-76 This note provides a method of synchronizing two devices having local clocks of supposedly equal frequencies. R5 K K ROY Cellular bulk transfer system PhD Thesis Montana State University Bozeman Montana March 1970 Bulk transfer systems composed of input logic, output logic, and a mapping device are studied. The influences of mapping device, parallelism, etc., are considered. 
SI T SAITO H INOSE Computer simulation of generalized mutually synchronized systems Symposium on Computer Processing in Communications Polytechnic Institute of Brooklyn April 1969 pp 559-577 This paper describes ten ways to mutually synchronize devices having separate clocks so that data can be accurately . delivered in the correct time slot of a synchronous system. The results of the simulation relate to the stability of the synchronizing methods. S2 J SANTOS M I OTERO On transferences and priorities in computer networks Symposium on Computers and Automata Vol 21 1971 pp 265-275 The structure of bus (channel) controllers is considered using the language of automata theory. The controller is decomposed into two units: one receives requests and availability signals, and generates corresponding requests to the other unit which allocates the bus on a priority basis. Both units are further decomposed into subunits. S3 J W SCHWARTZ Synchronization in communication satellite systems NEC 1967 pp 526-527 This paper describes tradeoffs and potential solutions to the clock skew problem in a widely dispersed system. S4 C D SMITH Optimization of design parameters for serial TDM Computer Design January 1972 pp 51-54 This article derives analytical tools for the analysis and optimization of a synchronous system with global plus local timing. S5 D J SPENCER Data bus design techniques NASA TM-X-52876 Vol VI pp 95-113 This paper discusses design alternatives for a multiplexed data bus to reduce point-to-point wiring cost and complexity. The author investigates coupling, coding, and control factors for both low and high signal-to-noise ratio lines for handling a data rate less than five million bits per second. S6 D C STANGA Univac 1108 multiprocessor system Proceedings SJCC 1971 AFIPS Press pp 67-74 This paper describes how memory accesses are made from the multiple processors to the multiple memory banks in the 1108 multiprocessor system. It gives a block diagram of the Systematic Approach to Design of Digital Bussing Structures S7 S8 S9 T1 T2 T3 T4 system interconnectivity and describes how the multiple module access units operate to provide multiple access paths to a memory module. D J STIGLIANI et al Wavelength division multiplexing in light interface technology AD-721085 March 1971 This report describes the fabrication of a five-channel optical multiplexed communication line, and suggests some alternatives for matching wavelength multiplexed light transmission times to digital electrical circuits. J N STURMAN An iteratively structured general purpose digital computer IEEE Transactions on Computers January 1968 pp 2-9 This paper describes a bus and its use in an iterative computer. The system is a dual dedicated bus structure with centralized control. J N STURMAN Asynchronous operation of an iteratively structured general purpose digital computer IEEE Transactions on Computers January 1968 pp 10-17 This paper describes the synchronization of an iterative structure computer. The processing elements are connected on a common complex symbol bus. To allow asynchronous operation, a set of timing busses are added to the system common complex symbol bus. The timing busses take advantage of their transmission line properties to provide synchronism of the processors. 
F W THOBURN A transmission control unit for high speed computer-tocomputer communication IBM Journal of Research and Development November 1970 pp 614-619 This paper describes a multiplex bus system for connecting a large number of computers together in a star organization. Special emphasis is given to the transmission control unit, a microprogrammed polling and interface unit which uses synchronous two-frequency modulation and a serializer/ de-serializer unit. K J THURBER Programmable indexing networks Proceedings SJCC 1970 AFIPS Press pp 51-58 This paper describes data routing networks designed to perform a generalized index on the data during the routing process. The indexing networks map an input vector onto an output vector. The mapping is arbitrary and programmable. Several different solutions are presented with varying hardware, speed, and timing requirements. The networks are described in terms of shift register implementations. K J THURBER Permutation switching networks Proceedings of the 1971 Computer Designer's Conference Industrial and Scientific Conference Management Chicago Illinois January 1971 pp 7-24 This paper describes several permutation networks designed to provide a programmable system capable of interconnecting system elements. The networks are partitioned for LSI implementation and can be utilized in a pipeline fashion. Algorithms are given to determine a program to produce any of the N! possible permutations of N input lines. K J THURBER et al Master executive control for AADC Navy Contract N62269-72-C-0051 June 18 1972 This report describes a systematic approach to the design of digital bus structures and applies this tool to the design of a bus structure for the Advanced Avionic Digital Computer. T5 WI W2 W3 W4 W5 W6 739 The structure is designed with three major requirements: flexibility, modularity, and reliability. A TURCZYN High speed data transmission scheme Proceedings 3rd Univac DPD Research and Engineering Symposium May 1968 The increasing complexity of multiprocessor computer systems with a high degree of parallelism within the computer system has created major internal communication problems. If each processing unit should be able to communicate with many other subsystems, the author recommends either a data exchange, or switching center, or parallel point-to-point wiring. The latter has the advantage of fast transfer and minimal data registers, but in a multiprocessor it results in a large number of cables. This paper discusses the state-of-the-art of internal multiplexing and multi-level coding schemes for reducing the number of lines in the system. E G WAGNER On connecting modules together uniformly to forma modular computer IEEE Transactions on Computers December 1966 pp 864-872 This paper provides mathematical group theoretic precision to the idea of uniform bus structure in cellular computers. P W WARD A scheme for dynamic priority control in demand actuated multiplexing IEEE Computer Society Conference Boston September 1971 pp 51-52 This paper describes a priority conflict resolution method which is used in an I/O multiplexer system. R WATSON Timesharing system design concepts Chapter 3-Communications McGraw-Hill 1970 pp 78-110 This chapter provides a summary of "communication" among memories, processors, lOP's, etc. The discussion is oriented toward example configurations. 
Subjects discussed are: (1) use of multiple memory modules, interleaving, and buffering to increase memory bandwidth; (2) connection of subsystems using direct connections, crossbar switches, multiplexed busses, etc.; and (3) the transmission medium. Items discussed under transmission medium are synchronous and asynchronous transmission, line types (simplex, halfduplex, and full-duplex), modulation, etc. DR WELLER A loop communication system for I/O to a small multi-user computer IEEE Computer Society Conference Boston September 1971 pp 49-50 This paper describes a single-line non-dedicated bus with daisy-chained control for the DDP-516 computer. Message format and speed of operation are detailed. G P WEST R J KOERNER Communications within a polymorphic intellectronic system Proceedings of Western Joint Computer Conference San Francisco May 3-5 1960 pp 225-230 This paper describes a crosspoint data exchange used in the RW-400 computer. The switch was mechanized using transfluxor cores. L P WEST Loop-transmission control structures IEEE Transactions on Communications June 1972 pp 531-539 This paper considers the problem of transmitting data on a 740 Fall Joint Computer Conference, 1972 communication loop. It discusses time slots, frame pulses, addressing techniques, and efficiency of utilization. It also discusses a number of ways for assigning time slots for utilization on the impact of slot size on loop utilization efficiency. W7 M W WILLARD L J HORKAN Maintaining bit integrity in time division transmission NAECON 1971 Record pp 240-247 This paper describes the tradeoffs involved in synchronizing high speed digital subsystems which are communicating over large distances. It considers clocking and buffering tradeoffs. W8 D R WULFINGHOFF Code activated switching-a solution to multiprocessing problems Computer Design April 1971 pp 67-71 The author points out that multiprocessor computer configurations have a large number of interconnections between elements causing considerable hardware and software complexity. He describes a technique whereby each program to be run is assigned a code, identifier, or signature; then when this program is activated the system resources it requires can be "lined-up" for use. He compares this scheme to that employed for telephone switching. Code activated switching is illustrated by two system block diagrams: a special purpose control computer and a general purpose time-shared computer. Y1 B S YOLKEN Data bus-method for data acquisition and distribution within vehicles . NAECON 1971 Record pp 248-253 This paper discusses a time division multiplexed bus, and considers bus control, bit synchronization, and technology tradeoffs. Z1 R E ZIMMERMAN The structure and organization of communication processors PhD Dissertation Electrical Engineering Department University of Michigan September 1971 This dissertation describes a multi-bus computer used as a terminal processor. It has a pair of instruction busses which start and then signal completion of processes performed in functional units or subsystems. The machine has three data busses: a memory bus which serves as the primary system communication bus, a flag address bus, and a flag data bus. All busses are eight bits wide and the three data busses are bidirectional. 
Z2 R J ZINGG Structure and organization of a pattern processor for hand-printed character recognition PhD Dissertation Iowa State University Ames Iowa 1968 This dissertation describes a bus-oriented special purpose computer designed for research in character recognition. The machine contains a control bus, a scratchpad memory bus, and three data busses. Each register that can be reached by a data bus has two control flip-flops associated with it and these determine to which data bus it is to be connected. These connections are controlled by a hardware command. The contents of several registers can be placed on one data bus to yield a bit-by-bit logical inclusive OR. Also, the contents of one data bus can be transferred to several registers and the contents of all three busses transferred in parallel under program command. This processor is a rather interesting example of a five bus processor.

Improvements in the design and performance of the ARPA network

by J. M. McQUILLAN, W. R. CROWTHER, B. P. COSELL, D. C. WALDEN, and F. E. HEART
Bolt Beranek and Newman Inc.
Cambridge, Massachusetts

INTRODUCTION

In late 1968 the Advanced Research Projects Agency of the Department of Defense (ARPA) embarked on the implementation of a new type of computer network which would interconnect, via common-carrier circuits, a number of dissimilar computers at widely separated, ARPA-sponsored research centers. The primary purpose of this interconnection was resource sharing, whereby persons and programs at one research center might access data and interactively use programs that exist and run in other computers of the network.

The interconnection was to be realized using wideband leased lines and the technique of message switching, wherein a dedicated path is not set up between computers desiring to communicate, but instead the communication takes place through a sequence of messages each of which carries an address. A message generally traverses several network nodes in going from source to destination, and at each node a copy of the message is stored until it is safely received at the following node.

The ARPA Network has been in operation for over three years and has become a national facility. The network has grown to over thirty sites spread across the United States, and is steadily growing; over forty independent computer systems of varying manufacture are interconnected; provision has been made for terminal access to the network from sites which do not enjoy the ownership of an independent computer system; and there is world-wide excitement and interest in this type of network, with a number of derivative networks in their formative stages. A schematic map of the ARPA Network as of the fall of 1972 is shown in Figure 1.

As can be seen from the map, each site in the ARPA Network consists of up to four independent computer systems (called Hosts) and one communications processor called an Interface Message Processor, or IMP. All of the Hosts at a site are directly connected to the IMP. Some IMPs also provide the ability to connect terminals directly to the network; these are called Terminal Interface Message Processors, or TIPs. The IMPs are connected together by wideband telephone lines and provide a subnet through which the Hosts communicate. Each IMP may be connected to as many as five other IMPs using telephone lines with bandwidths from 9.6 to 230.4 kilobits per second. The typical bandwidth is 50 kilobits.

During these three years of network growth, the actual user traffic has been light and network performance under such light loads has been excellent. However, experimental traffic, as well as simulation studies, uncovered logical flaws in the IMP software which degraded performance at heavy loads. The software was therefore substantially modified in the spring of 1972. This paper is largely addressed to describing the new approaches which were taken. The first section of the paper considers some criteria of good network design and then presents our new algorithms in the areas of source-to-destination sequence and flow control, as well as our new IMP-to-IMP acknowledgment strategy. The second section addresses changes in program structure; the third section reevaluates the IMP's performance in light of these changes. The final section mentions some broader issues.

The initial design of the ARPA Network and the IMP was described at the 1970 Spring Joint Computer Conference,1 and the TIP development was described at the 1972 Spring Joint Computer Conference.2 These papers are important background to a reading of the present paper.

Figure 1-ARPA network, logical map, August 1972

NEW ALGORITHMS

A balanced design for a communication system should provide quick delivery of short interactive messages and high bandwidth for long files of data. The IMP program was designed to perform well under these bimodal traffic conditions. The experience of the first two and one half years of the ARPA Network's operation indicated that the performance goal of low delay had been achieved. The lightly-loaded network delivered short messages over several hops in about one-tenth of a second. Moreover, even under heavy load, the delay was almost always less than one-half second. The network also provided good throughput rates for long messages at light and moderate traffic levels. However, the throughput of the network degraded significantly under heavy loads, so that the goal of high bandwidth had not been completely realized.

We isolated a problem in the initial network design which led to degradation under heavy loads.3,4 This problem involves messages arriving at a destination IMP at a rate faster than they can be delivered to the destination Host. We call this reassembly congestion. Reassembly congestion leads to a condition we call reassembly lockup in which the destination IMP is incapable of passing any traffic to its Hosts. Our algorithm to prevent reassembly congestion and the related sequence control algorithm are described in the following subsections. We also found that the IMP and line bandwidth requirements for handling IMP-to-IMP traffic could be substantially reduced. Improvements in this area translate directly into increases in the maximum throughput rate that an IMP can maintain. Our new algorithm in this area is also given below.

Source-to-destination flow control

For efficiency, it is necessary to provide, somewhere in the network, a certain amount of buffering between the source and destination Hosts, preferably an amount equal to the bandwidth of the channel between the Hosts multiplied by the round trip time over the channel. The problem of flow control is to prevent messages from entering the network for which network buffering is not available and which could congest the network and lead to reassembly lockup, as illustrated in Figure 2.
In Figure 2, IMP 1 is sending multi-packet messages to IMP 3; a lockup can occur when all the reassembly buffers in IMP 3 are devoted to partially reassembled messages A and B. Since IMP 3 has reserved all its remaining space for awaited packets of these partially reassembled messages, it can only take in those particular packets from IMP 2. These outstanding packets, however, are two hops away in IMP 1. They cannot get through because IMP 2 is filled with store-and-forward packets of messages C, D, and E (destined for IMP 3) which IMP 3 cannot yet accept. Thus, IMP 3 will never be able to complete the reassembly of messages A and B.

Figure 2-Reassembly lockup

The original network design based source-to-destination sequence and flow control on the link mechanism previously reported in References 1 and 5. Only a single message on a given link was permitted in the subnetwork at one time, and sequence numbers were used to detect duplicate messages on a given link. We were always aware that Hosts could defeat our flow control mechanism by "spraying" messages over an inordinately large number of links, but we counted on the nonmalicious behavior of the Hosts to keep the number of links in use below the level at which problems occur. However, simulations and experiments artificially loading the network demonstrated that communication between a pair of Hosts on even a modest number of links could defeat our flow control mechanism; further, it could be defeated by a number of Hosts communicating with a common site even though each Host used only one link. Simulations3,4 showed that reassembly lockup may eventually occur when over five links to a particular Host are simultaneously in use. With ten or more links in use with multipacket messages, reassembly lockup occurs almost instantly.

If the buffering is provided in the source IMP, one can optimize for low delay transmissions. If the buffering is provided at the destination IMP, one can optimize for high bandwidth transmissions. To be consistent with our view of a balanced communications system, we have developed an approach to reassembly congestion which utilizes some buffer storage at both the source and destination; our solution also utilizes a request mechanism from source IMP to destination IMP.* Specifically, no multipacket message is allowed to enter the network until storage for the message has been allocated at the destination IMP. As soon as the source IMP takes in the first packet of a multipacket message, it sends a small control message to the destination IMP requesting that reassembly storage be reserved at the destination for this message. It does not take in further packets from the Host until it receives an allocation message in reply. The destination IMP queues the request and sends the allocation message to the source IMP when enough reassembly storage is free; at this point the source IMP sends the message to the destination.

We maximize the effective bandwidth for sequences of long messages by permitting all but the first message to bypass the request mechanism. When the message itself arrives at the destination, and the destination IMP is about to return the Ready-For-Next-Message (RFNM), the destination IMP waits until it has room for an additional multipacket message. It then piggybacks a storage allocation on the RFNM. If the source Host is prompt in answering the RFNM with its next message, an allocation is ready and the message can be transmitted at once. If the source Host delays too long, or if the data transfer is complete, the source IMP returns the unused allocation to the destination. With this mechanism we have minimized the inter-message delay and the Hosts can obtain the full bandwidth of the network.

* This mechanism is similar to that implemented at the level of Host-to-Host protocol,6,7,8 indicative of the fact that the same sort of problems occur at every level in a communications system.
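The reservation exchange described above can be sketched as follows. This is a simplified present-day model: the ten-buffer figure matches the reassembly allocation given later in the paper, but the class names, message formats, and bookkeeping are assumptions of the sketch, not the IMP code.

    # Simplified sketch of the multipacket reservation handshake (illustrative only).
    from collections import deque

    class DestinationIMP:
        def __init__(self, reassembly_buffers=10):
            self.free = reassembly_buffers
            self.pending = deque()                 # queued allocation requests (FIFO)

        def request(self, packets):
            """A small control message from the source asking for reassembly storage."""
            self.pending.append(packets)

        def try_grant(self):
            """Reply with an allocation as soon as enough reassembly storage is free."""
            if self.pending and self.pending[0] <= self.free:
                n = self.pending.popleft()
                self.free -= n                     # storage is reserved before the message travels
                return n                           # the "allocation" message
            return None

        def message_taken_by_host(self, packets):
            self.free += packets                   # storage is released after delivery

    # One exchange: the source IMP holds the message until an allocation arrives.
    dest = DestinationIMP()
    dest.request(packets=8)                        # source takes in packet 1, sends the request
    allocation = dest.try_grant()                  # destination replies when storage is free
    if allocation:
        print("send all 8 packets; reassembly space is guaranteed at the destination")
    dest.message_taken_by_host(8)                  # at RFNM time the buffers become free again

Subsequent messages in a long transfer would skip the explicit request, since, as the text explains, the destination piggybacks the next allocation on the RFNM.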
It then piggybacks a storage allocation on the RFNM. If the source Host is prompt in answering the RFNlVI with its next message, an allocation is ready and the message can be transmitted at once. If the source Host delays too long, or if the data transfer is complete, the source IMP returns the unused allocation to the destination. With this mechanism we have minimized the inter-message delay * This mechanism is similar to that implemented at the level of Host-to-Host protocol,6,7,8 indicative of the fact that the same sort of problems occur at every level in a communications system. 743 and the Hosts can obtain the full bandwidth of the network. We minimize the delay for a short message by transmitting it to the destination immediately while keeping a copy in the source IMP. If there is space at the destination, it is accepted and passed on to a Host and a RFNl\iI is returned; the source IMP discards the message when it receives the RFNM. If not, the message is discarded, a request for allocation is queued and, when space becomes available, the source IMP is notified that the message may now be retransmitted. Thus, no setup delay is incurred when storage is available at the destination. The above mechanisms make the IMP network much less sensitive to unresponsive Hosts, since the source Host is effectively held to a transmission rate equal to the reception rate of the destination Host. Further, reassembly lockup is prevented because the destination IMP will never have to turn away a multipacket message destined for one of its Hosts, since reassembly storage has been allocated for each such message in the network. Source-to-destination sequence control In addition to its primary function as a flow control mechanism, the link mechanism also originally provided the basis for source-to-destination sequence control. Since only one message was permitted at a time on a link, messages on each link were kept in order; duplicates were detected by the sequence number maintained for each link. In addition, the IMPs marked any message less than 80 bits long as a priority message and gave it special handling to speed it across the network, placing it ahead of long messages on output queues. The tables associated with the link mechanism in each IMP were large and costly to access. Since the link mechanism was no longer needed for flow control, we felt that a less costly mechanism should be employed for sequence control. We thus decided to eliminate the link mechanism from the IMP subnetwork. RFNl\1s are still returned to the source Host on a link basis, but link numbers are used only to allow Hosts to identify messages. To replace the per-link sequence control mechanism, we decided upon a sequence control mechanism based on a single logical "pipe" between each source and destination IMP. Each IMP maintains an independent message number sequence for each pipe. A message number is assigned to each message at the source IMP and this message number is checked at the destination Il\1P. All Hosts at the source and destination Il\1Ps share this message space. Out of an 744 Fall Joint Computer Conference, 1972 eight-bit message number space (large enough to accommodate the settling time of the network), both the source and destination keep a small window of currently valid message numbers, which allows several messages to be in the pipe simultaneously. Messages arriving at a destination IMP with out-of-range message numbers are duplicates to be discarded. 
The window is presently four numbers wide, which seems about right considering the response time required of the network. The message number serves two purposes: it orders the four messages that can be in the pipe, and it allows detection of duplicates. The message number is internal to the IMP subnetwork and is invisible to the Hosts. A sequence control system based on a single source/ destination pipe, however, does not permit priority traffic to go ahead of other traffic. We solved this problem by permitting two pipes between each source and destination, a priority (or low delay) pipe and a nonpriority (or high bandwidth) pipe. To avoid having each IMP maintain two eight-bit message number sequences for every other IMP in the network, we coupled the low delay and high bandwidth pipe so that duplicate detection can be done in common, thus requring only one eleven-bit message number sequence for each IMP. The eleven-bit number consists of a one-bit priority/ non-priority flag, two bits to order priority messages, and eight bits to order all messages. For example, if we use the letters A, B, C, and D to denote the two-bit order numbers for priority messages and the absence of a letter to indicate a nonpriority message, we can describe a typical situation as follows: The source IMP sends out nonpriority message 100, then priority messages lOlA and 102B, and then nonpriority message 103. Suppose the destination IMP receives these messages in the order 102B, lOlA, 103, 100. It passes these messages to the Host in the order lOlA, 102B, 100, 103. Message number 100 could have been sent to the destination Host first if it had arrived at the destination first, but the priority messages are allowed to "leapfrog" ahead of message number 100 since it was delayed in the network. The IMP holds 102B until lOlA arrives, as the Host must receive priority message A before it receives priority message B. Likewise, message 100 must be passed to the Host before message 103. Hosts may, if they choose, have several messages outstanding simultaneously to a given destination but, since priority messages can "leapfrog" ahead, and the last message in a sequence of long messages may be short, priority can no longer be assigned strictly on the basis of message length. Therefore, Hosts must explicitly indicate whether a message has priority or not. With message numbers and reserved storage to be accurately accounted for, cleaning up in the event of a lost message must be done carefully. The source Il\1P keeps track of all messages for which a RFNl\1 has not yet been received. When the RFNM is not received for too long (presently about 30 seconds), the source IMP sends a control message to the destination inquiring about the possibility of an incomplete transmission. The destination responds to this message by indicating whether the message in question was previously received or not. The source IMP continues inquiring until it receives a response. This technique guarantees that the source and destination IMPs keep their message number sequences synchronized and that any allocated space will be released in the rare case ~hat a message is lost in the subnetwork because of a machine failure. IMP-to-IMP transmission control We have adopted a new technique for IMP-to-IMP transmission control which improves efficiency by 10-20 percent over the original separate acknowledge/timeout/retransmission approach described in Reference 1. 
In the new scheme, which is also used for the Very Distant Host, 9 and which is similar to· Reference 10, each physical network circuit is broken into a number of logical "channels," currently eight in each direction. Acknowledgments are returned "piggybacked" on normal network traffic in a set of acknowledgment bits, one bit per channel, contained in every packet, thus requiring less bandwidth than our original method of sending each acknowledge in its own packet. The size of this saving is discussed later in the paper. In addition, the period between retransmissions has been made dependent upon the volume of new traffic. Under light loads the network has minimal retransmission delays, and the network automatically adjusts to minimize the interference of retransmissions with new traffic. Each packet is assigned to an outgoing channel and carries the "odd/even" bit for its channel (which is used to detect duplicate packet transmissions), its channel number, and eight acknowledge bits-one for each channel in the reverse direction. The transmitting IMP continually cycles through its used channels (those with packets associated with them), transmitting the packets along with the channel number and the associated odd/even bit. At the receiving IMP, if the odd/even bit of the received packet does not match the odd/even bit associated with the appropriate receive channel, the packet is accepted and the receive odd/even bit is complemented, otherwise the packet is a duplicate and is discarded. Improvements in Design and Performance of ARPA Network Every packet arriving over a line contains acknowledges for all eight channels. This is done by copying the receive odd/even bits into the positions reserved for the eight acknowledge bits in the control portion of every packet transmitted. In the absence of other traffic, the acknowledges are returned in "null packets" in which only the acknowledge bits contain relevant information (i.e., the channel number and odd/even bit are meaningless; null packets are not acknowledged). When an IMP receives a packet, it compares (bit by bit) the acknowledge bits against the transmit odd/even bits. For each match found, the corresponding channel is marked unused, the corresponding packet is discarded, and the transmit odd/even bit is complemented. In view of the large number of channels, and the delay that is encountered on long lines, some packets may have to wait an inordinately long time for transmission. We do not want a one-character packet to wait for several thousand-bit packets to be transmitted, multiplying by 10 or more the effective delay seen by the source. We have, therefore, instituted the following transmission ordering scheme: priority packets which have never been transmitted are sent first; next sent are any regular packets which have never been transmitted; finally, if there are no new packets to send, previously transmitted packets which are unacknowledged are sent. Of course, unacknowledged packets are periodically retransmitted even when there is a continuous stream of new traffic. In implementing the new IlVIP-to-IMP acknowledgment system, we encountered a race problem. 
The strategy of continuously retransmitting a packet in the absence of other traffic introduced difficulties which were not encountered in the original system, which retransmitted only after a long timeout. If an acknowledgment arrives for a packet which is currently being retransmitted, the output routine must prevent the input routine from freeing the packet. Without these precautions, the header and data in the packet could be changed while the packet was being retransmitted, and all kinds of "impossible" conditions result when this "composite" packet is received at the other end of the line. It took us a long time to find this bug!*

Figure 3-Map of core storage (1 page = 512 words)

PROGRAM STRUCTURE

Implementation of the IMPs required the development of a sophisticated computer program. This program was previously described in Reference 1. As stated then, the principal function of the IMP program is the processing of packets, including the following: segmentation of Host messages into packets; receiving, routing, and transmitting of store-and-forward packets; retransmitting unacknowledged packets; reassembling packets into messages for transmission into a Host; and generating RFNMs and other control messages. The program also monitors network status, gathers statistics, and performs on-line testing.

The program was originally designed, constructed, and debugged over a period of about one year by three programmers. Recently, after about two and one-half years of operation in up to twenty-five IMPs throughout the network, the operational program was significantly modified. The modification implemented the algorithms described in the previous sections, thereby eliminating causes of network lockup and improving the performance of the IMP. The modification also extended the capabilities of the IMP so it can now interface to Hosts over common carrier circuits (a Very Distant Host9), efficiently manage buffers for lines with a wide range of speeds, and perform better network diagnostics. After prolonged study and preliminary design,3,4 this program revision was implemented and debugged in about nine man months.

* Interestingly, a similar problem exists on another level, that of source-destination flow control. If an IMP sends a request for allocation, either single- or multi-packet, to a neighboring IMP, it will periodically retransmit it until it receives an acknowledgment. If it receives an allocation in return, it will immediately begin to transmit the first packet of the message. The implementation in the IMP program sends the request from the same buffer as the first packet, merely marking it with a request bit. If an allocation arrives while the request is in the process of being retransmitted, the program must wait until it has been completely transmitted before it sends the same buffer again as the first packet, since the request bit, the odd/even bit, the acknowledge bits, and the message number (for a multipacket request) will be changed. This was another difficult bug.
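The channel scheme of the preceding section can be sketched as follows. The eight logical channels, the per-channel odd/even bits, and the piggybacked acknowledge bits follow the text; the data structures, packet layout, and the convention of starting the transmit and receive bits complemented (so the first packet on each channel is accepted) are assumptions of this sketch.

    # Sketch of the eight-channel odd/even acknowledgment scheme on one line.
    CHANNELS = 8

    class LineEnd:
        """One IMP's end of an inter-IMP line."""
        def __init__(self):
            self.tx_bit = [1] * CHANNELS      # transmit odd/even bit per channel
            self.tx_pkt = [None] * CHANNELS   # unacknowledged packet assigned to each channel
            self.rx_bit = [0] * CHANNELS      # receive odd/even bit per channel

        def transmit(self, ch, data=None):
            """Build a packet (or a null packet if there is nothing new); acks ride along."""
            if data is not None:
                self.tx_pkt[ch] = data        # kept and retransmitted until acknowledged
            return {"ch": ch, "bit": self.tx_bit[ch],
                    "acks": list(self.rx_bit), "data": self.tx_pkt[ch]}

        def receive(self, pkt):
            accepted = None
            if pkt["data"] is not None and pkt["bit"] != self.rx_bit[pkt["ch"]]:
                self.rx_bit[pkt["ch"]] ^= 1            # differing bit: a new packet, accept it
                accepted = pkt["data"]                 # (equal bits would mean a duplicate)
            for ch, ack in enumerate(pkt["acks"]):     # piggybacked acknowledgments
                if self.tx_pkt[ch] is not None and ack == self.tx_bit[ch]:
                    self.tx_pkt[ch] = None             # match: mark channel unused, free buffer
                    self.tx_bit[ch] ^= 1
            return accepted

    a, b = LineEnd(), LineEnd()
    print(b.receive(a.transmit(0, "packet #1")))   # delivered once
    print(b.receive(a.transmit(0)))                # retransmission is detected as a duplicate
    a.receive(b.transmit(0))                       # b's null packet carries the ack back to a
    print(a.tx_pkt[0])                             # None: channel 0 is free again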
This was another difficult bug. 746 Fall Joint Computer Conference, 1972 We shall emphasize in this section the structural changes the program has recently undergone. Data structures Figure 3 shows the layout of core storage. As before, the program is broken into functionally distinct pieces, each of which occupies one or two pages of core. Notice that code is generally centered within a page, and there is code on every page of core. This is in contrast to our previous practice of packing code toward the beginning of pages and pages of code toward the beginning of memory. Although the former method results in a large contiguous buffer area near the end of memory, it has breakage at every page boundary. On the other hand, "centering" code in pages such that there are an integral number of buffers between the last word of code on one page and the first word of code on the next page eliminates almost all breakage. There are currently about forty buffers in the IMP, and the IMP program uses the following set of rules to allocate the available buffers to the various tasks requiring buffers: • Each line must be able to get its share of buffers for input and output. In particular, one buffer is always allocated for output on each line, guaranteeing that output is always possible for each line; and double buffering is provided for input on each line, which permits all input traffic to be examined by the program, so that acknowledgments can always be processed, which frees buffers. • An attempt is made to provide enough store-andforward buffers so that all lines may operate at full capacipy. The number of buffers needed depends directly on line distance and line speed. We currently limit each line to eight or less buffers, and a pool is provided for all lines. Some numerical results on line utilization are presented in a later section. Currently, a maximum of twenty buffers is available in the store-and-forward pool. • Ten buffers are always allocated to reassembly storage, allowing allocations for one multipacket message and two single-packet messages. Additional buffers may be claimed for reassembly, up to a maximum of twenty-six. real and four fake) that can be connected; additionally, twelve words of code are replicated for each real Host that can be connected. The program has fifty-five words of tables for each of the five lines that can be connected; additionally, thirty-seven words of code are replicated for each line that can be connected. The program also has tables for initialization, statistics, trace, and so forth. The size of the initialization code and the associated tables· deserves mention. This was originally quite small. However, as the network has grown and the IMP's capabilities have been expanded, the amount of memory dedicated to initialization has steadily grown. This is mainly due to the fact that the IMPs are no longer identical. An IMP may be required to handle a Very Distant Host, or TIP hardware, or five lines and two Hosts, or four Hosts and three lines, or a very high speed line, or, in the near future, a satellite link. As the physical permutations of the IlVIP have continued to increase, we have clung to the idea that the program should be identical in all IMPs, allowing an IMP to reload its program from a neighboring IMP and providing other considerable advantages. However, maintaining only one version of the program means that the program must rebuild itself during initialization to be the proper program to handle the particular physical configuration of the IlVIP. 
Furthermore, it must be able to turn itself back into its nominal form when it is reloaded into a neighbor. All of this takes tables and code. Unfortunately, we did not foresee the proliferation WORDS o 500 HOSTS (8) IMPS (64) LINES (5) IN ITI ALiZATION STATISTICS TRACE REASSEMBLY ALLOCATE HEADER Figure 4 summarizes the IMP table storage. All IMPs have identical tables. The IMP program has twelve words of tables for each of the sixty-four IMPs now possible in the network. The program has ninetyone words of tables for each of the eight Hosts (four BACKGROUND TIME OUT 28 Figure 4-Allocation of IMP table storage 1000 Improvements in Design and Performance of ARPA Network ,;<--, "-, reas"!'lbI Y /M--{ , I -,... , FROM __----IO-Q-I-C----f-- .. ,.,-,.., \ , HOSTS \ \ \ J ' , "..I' ,, II ~ ..... " ~-RFNMs '" \',,:' receive allocate logic ;=t- TTY ""I DebuC) , Trace \ Parameters Statistics " Discard ,L. '( IMODEM , J./ \ acknowledged "- I \ packets~ I r\duplicotereceiv8 I {aCkets . Qcks...-J I Teletype 747 I I ~--........ I I packets I C free r---,; transmit \ raCkS S/F I I Routing/ I ""- ""- , .... _-" ""- .... 8ACKGROtJNQ / single packet " messages ........ - L..-_ _ _ _ , FROM HOSTS ,/ / request 1 I I I I I I I "TO L_~ /--M / MODEM ..... _'" request 8 aII oca tes o QUEUE oDERIVED PACKET • CHOICE ,,- '--.. ) ROUTINE Figure 5-Packet flow and processing of IMP configurations which has taken place; therefore, we cannot conveniently compute the program differences from a simple configuration key. Instead, we must explicitly table the configuration irregularities. The packet processing routines Figure 5 is a schematic drawing of packet flow and packet processing. * We here briefly review the functions of the various packet-processing routines and note important new features. Host-to-IMP (H~ I) This routine handles messages being transmitted from Hosts at the local site. These Hosts may either be real Hosts or fake Hosts (TTY, Debug, etc.). The routine acquires a message number for each message and passes the message through the transmi allocation logic which requests a reassembly allocation from the destination IMP. Once this allocation is received, the message is broken into packets which are passed to the Task routine via the Host Task queue. Task * Cf. Figure 9 of Reference 1. This routine diI:ects packets to their proper destination. Packets for a local Host are passed through the 748 Fall Joint Computer Conference, 1972 reassembly logic. When reassembly is complete, the reassembled message is passed to the IMP-to-Host routine via the Host Out queue. Certain control messages for the local IMP are passed to the transmit or receive allocate logic. Packets to other destinations are placed on a modem output queue as specified by the routing table. IMP-to-Modem (I....M) This routine transmits successive packets from the modem output queues and sends piggybacked acknowledgments for packets correctly received by the Modemto-IMP routine and accepted by the Task routine. Modem-to-IMP (M.... I) This routine handles inputs from modems and passes correctly received packets to the Task routine via the Modem Task queue. This routine also processes incoming piggybacked acknowledges and causes the buffers for correctly acknowledged packets to be freed. IMP-to-Host (I....H) This routine passes messages to local Hosts and informs the background routine when a RFNM should be returned to the source Host. 
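The dispatching role of the Task routine just described can be sketched as follows, using the queue names from the text (Host Task queue, Modem Task queue, Host Out queue, and the per-line modem output queues). The routing table contents, packet fields, and line names are illustrative assumptions.

    # Sketch of the Task routine's dispatching, with the queues named in the text.
    from collections import deque

    LOCAL_IMP = 3
    routing_table = {5: "line 1", 7: "line 2"}    # destination IMP -> outgoing line (assumed)

    host_task_queue = deque()        # packets from local Hosts (Host-to-IMP routine)
    modem_task_queue = deque()       # packets arriving over the lines (Modem-to-IMP routine)
    host_out_queue = deque()         # reassembled messages for local Hosts
    modem_out_queues = {"line 1": deque(), "line 2": deque()}

    def task():
        """Direct each packet to a local Host (via reassembly) or to a modem output queue."""
        while host_task_queue or modem_task_queue:
            queue = host_task_queue if host_task_queue else modem_task_queue
            packet = queue.popleft()
            if packet["dest"] == LOCAL_IMP:
                host_out_queue.append(packet)     # reassembly logic omitted in this sketch
            else:
                modem_out_queues[routing_table[packet["dest"]]].append(packet)

    modem_task_queue.append({"dest": LOCAL_IMP, "data": "for a local Host"})
    host_task_queue.append({"dest": 5, "data": "store-and-forward traffic"})
    task()
    print(len(host_out_queue), len(modem_out_queues["line 1"]))   # -> 1 1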
Background The function of this routine includes handling the IMP's console Teletype, a debugging program, the statistics programs, the trace program, and several routines which generate control messages. The programs which perform the first four functions run as fake Hosts (as described in Reference 1). These routines simulate the operation of the Host/IlVIP data channel hardware so the Host-to-Il\1:P and Il\1:P-to-Host routines are unaware they are communicating with anything other than a real Host. This trick saved a large amount of code and we have come to use it more and more. The programs which send incomplete transmission messages, send and return allocations, and send RFNl\1:s also reside in the background program. However, these programs run in a slightly different manner than the fake Hosts in that they do not simulate the Host/IMP channel hardware. In fact, they do not go through the Host/IMP code at all, but rather put their messages directly on the task queue. Nonetheless, the principle is the same. Timeout This routine, which is not shown in Figure 5, performs a number of periodic functions. One of these functions is garbage collection. Every table, most queues, and many states of the program are timed out. Thus, if an entry remains in a table abnormally long or if a routine remains in a particular state for abnormally long, this entry or state is garbage-collected and the table or routine is returned to its initial or nominal state. In this way, abnormal conditions are not allowed to hang up the system indefinitely. The method frequently used by the Timeout routine to scan a table is interesting. Suppose, for example, every entry in a sixty-four entry table must be looked at every now and then. Timeout could wait· the proper interval and then look at every entry in the table on one pass. However, this would cause a severe transient in the timing of the IMP program as a whole. Instead, one entry is looked at each time through the Tim~out routine. This takes a little more total time but is much less disturbing to the program as a whole. In particular, worst case timing problems (for instance, the processing time between the end of one modem input and the beginning of the next) are significantly reduced by this technique. A particular example of the use of this technique is with the transmission of routing information to the IMP's neighbors. In general, an Il\1:P can have five neighbors. Therefore, it sends routing information to one of its neighbors every 125 msec rather than to all of its neighbors every 625 msec. In addition to timing out various states of the program, the Timeout routine is used.to awaken routines which have put themselves to sleep for a specified period. Typically these routines are waiting or some resource to become available, and are written as coroutines with the Timeout routine. When they are restarted by Timeout the test is made for the availability of the resource, followed by another delay if the resource is not yet available. PERFORMANCE EVALUATION In view of the extensive modifications described in the preceding sections, it was appropriate to recalculate the IlVIP's performance capabilities. The following section presents the results of the reevaluation of the IMP's performance and comparisons with the performance reports of Reference 1. Throughput VS. 
Throughput vs. message length

In this section we recalculate two measures of IMP performance previously calculated in Reference 1: the maximum throughput and line traffic. Throughput is the number of Host data bits that traverse an IMP each second. Line traffic is the number of bits that an IMP transmits on its communication circuits per second and includes the overhead of RFNMs, packet headers, acknowledgments, framing characters, and checksum characters.

To calculate the IMP's maximum line traffic and throughput, we first calculate the computational load placed on the IMP by the processing of one message. The computational load is the sum of the machine instruction cycles plus the input/output cycles required to process all the packets of a message and their acknowledgments, and the message's RFNM and its acknowledgment. For simplicity in computing the computational load, we ignore the processing required to send and receive the message from a Host, since this is only seen by the source and destination IMPs.

A packet has D bits of data, S bits of software overhead, and H bits of hardware overhead. For the original and modified IMP systems, the values of D, S, and H are:

       Original                                 Modified
D      0-1008 bits                              0-1008 bits
S      64 (packet) + 80 (ack) = 144 bits        80 bits (packet + ack)
H      72 (packet) + 72 (ack) = 144 bits        72 bits (packet + ack)

The input/output processing time for a packet is the time taken to transfer D+S bits from memory to the modem interface at one IMP, plus the time to transfer D+S bits into memory at the other IMP. If R is the input/output transfer rate in bits per second,* then the input/output transfer time for a packet is 2(D+S)/R. Therefore, the total input/output time, Im, for the P packets of a B bit message is 2(B + P×S)/R. The input/output transfer time, Ir, for a RFNM is 2S/R. To each of these numbers we must add the program processing time, C; this is about the same for a packet of a message and for a RFNM.

* In this calculation we make a distinction between the 516 IMP (used originally and reported on in Reference 1) and the 316 IMP (used for all new IMPs). The 516 has a memory cycle time of 0.96 μsec, and the 316 has a cycle time of 1.6 μsec. The 316 provides a two-cycle data break, in comparison with the four-cycle data break on the 516. Thus, the input/output transfer rates are 16 bits per 3.84 μsec for the 516 and 16 bits per 3.2 μsec for the 316.

For the original IMP program, the program processing time per packet consisted of the following:

Modem Output   100 cycles   Send out packet
Modem Input    100 cycles   Receive packet at other IMP
Task           150 cycles   Process it (route onto an output line)
Modem Output   100 cycles   Send back an acknowledgment
Modem Input    100 cycles   Receive acknowledgment at first IMP
Task           150 cycles   Process acknowledgment
               700 cycles   Program processing time per packet

For the modified IMP program, the program processing time consists of:

Modem Output   150 cycles   Send out packet and piggybacked acks
Modem Input    150 cycles   Receive packet and process acks
Task           250 cycles   Process packet
               550 cycles   Program processing time per packet

Finally, we add a percentage, V, of overhead for the various periodic processes in the IMP (primarily the routing computation) which take processor bandwidth. V is presently about 5 percent.
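As a concrete illustration of the quantities just defined (an illustrative sketch, not a calculation from the original report), the following Python fragment evaluates the input/output transfer times Im = 2(B + P×S)/R and Ir = 2S/R for the modified system on both machine types, using the transfer rates quoted in the footnote and S = 80 bits. The eight-packet, full-length message is simply an example.

```python
# Illustrative evaluation of the I/O transfer times defined above.
# S and the transfer rates R come from the text; the message size is an example.

S = 80                                   # bits of software overhead, modified system
RATES = {                                # input/output transfer rate, bits per second
    "516 IMP": 16 / 3.84e-6,             # 16 bits per 3.84 microseconds
    "316 IMP": 16 / 3.2e-6,              # 16 bits per 3.2 microseconds
}

P = 8                                    # packets in the message (example)
B = P * 1008                             # Host data bits in the message (example)

for name, R in RATES.items():
    Im = 2 * (B + P * S) / R             # I/O time for the P packets of the message
    Ir = 2 * S / R                       # I/O time for the RFNM
    print(f"{name}: Im = {Im * 1e3:.3f} ms, Ir = {Ir * 1e6:.1f} us")
```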
We are now in a position to calculate the computational load (in seconds), L, of one P packet message:

    L = (1 + V) × [ (Im + P×C)  +  (Ir + C) ]
                      packets       RFNM

The maximum throughput, T, is the number of data bits in a single message divided by the computational load of the message; that is, T = B/L. The maximum line traffic (in bits per second), R, is the throughput plus the overhead bits for the packets of the message and the RFNM, divided by the computational load of the message. That is,

    R = T + (P + 1)×(S + H)/L = [B + (P + 1)×(S + H)]/L

The maximum throughput and line traffic are plotted for various message lengths in Figure 6 for the original and modified programs and for the 516 IMP and the 316 IMP.

Figure 6-Line traffic and throughput vs. message length (panels for 50 Kb and 230.4 Kb lines at 100 and 1000 miles). The upper curves plot maximum line traffic, the lower curves plot maximum throughput.

The changes to the IMP system can be summarized as follows:

• The program processing time for a store-and-forward packet has been decreased by 20 percent.
• The line throughput has been increased by 4 percent for a 516 IMP and by 7 percent for a 316 IMP. As a result, the net throughput rate has been increased by 17 percent for a 516 IMP and by 21 percent for a 316 IMP. Thus, a 316 IMP can now process almost as much traffic as a 516 IMP could with the old program. A 516 IMP can now process approximately 850 Kbs.
• The line overhead on a full-length packet has been decreased from 29 percent to 16 percent. As a result, the effective capacity of the telephone circuits has been increased from thirty-eight full packet messages per second on a 50 Kbs line to forty-three full packet messages per second.

Round trip delay vs. message length

In this section we compute the minimum round trip delay encountered by a message. We define round trip delay as in Reference 1; that is, the delay until the message's RFNM arrives back at the source IMP. A message has P packets and travels over H hops. The first packet encounters delay due to the packet processing time, C; the transmission delay, Tp; and the propagation delay, L. Each successive packet of the message follows C + Tp behind the previous packet. Since the message's RFNM is a single packet message with a transmission delay, TR, we can write the total delay as

    delay = H×(C + Tp + L) + (P − 1)×(C + Tp) + H×(C + TR + L)
             first packet     successive packets  RFNM

For single packet messages, this reduces to

    delay = H×(2C + Tp + TR + 2L)

The curves of Figure 7 show minimum round-trip delay through the network for a range of message lengths and hop numbers, and for two sets of line speeds and line lengths. These curves agree with experimental data.11,12

Figure 7-Minimum round trip delay vs. message length. Curves show delay for 1-6 hops.

Line utilization

The number of buffers required to keep a communications circuit fully loaded is a function not only of line bandwidth and distance but also of packet length, IMP delay, and acknowledgment strategy. In order to compute the buffering needed to keep a line busy, we need to know the length of time the sending IMP must wait between sending out a packet and receiving an acknowledgment for it.
If we assume no line errors, this time is the sum of: the propagation delays for the packet and its acknowledgment, Pp and PA; the transmission delays for the packet and its acknowledgment, Tp and TA; and the IMP processing delay, L, before the acknowledgment is sent. Thus, the number of buffers needed to fully utilize a line is (Pp + Tp + L + PA + TA)/Tp. Since Pp = PA, the expression for the number of buffers can be rewritten:

    2P/Tp + 1 + (L + TA)/Tp

That is, the number of buffers needed to keep a line full is proportional to the length of the line and its speed, and inversely proportional to the packet size, with the addition of a constant term.

To compute Tp, we must take into account the mix of short and long packets. Thus, we write

    Tp = (x×TS + y×TL)/(x + y)

where x to y is the ratio of the number of short packets to the number of long packets, and TS and TL are the transmission delays incurred by short and long packets, respectively. The shortest packet permitted is 152 bits long (entirely overhead); the longest packet is 1160 bits long. Computing TS and TL for any given line bandwidth is a simple matter; they typically range from 106 μsec for TS on a 1.4 Mbs line to 120.5 msec for TL on a 9.6 Kbs line. Assuming worst case IMP processing delay (that is, the acknowledgment becomes ready for transmission just as the first bit of a maximum length packet is sent), L = TL. The acknowledgment returns in the next outgoing packet at the other IMP, which we assume is of "average" size.*

* Variations of this assumption have only second order effects on the computation of the number of buffers required.

Propagation delay, P, is essentially just "speed of light" delay, and ranges from 50 μsec for short lines, through 20 msec for a cross country line, to 275 msec for a satellite link.

We can now compute the number of buffers required to fully utilize a line for any line speed, line length, and traffic mix. Figure 8 gives the result for typical speeds, lengths, and mixes. Note that the knee of the curves occurs at progressively shorter distances with increasing line speeds. The constant term dominates the 9.6 Kbs case, and it is almost insignificant for the 1.4 Mbs case. Note also that the separation between members of each family of curves remains constant on the log scale, indicating greatly increased variations with distance.

Figure 8-Number of buffers for full line utilization. Traffic mixes are shown as the ratio of the number of short packets (S) to the number of long packets (L).

GENERAL COMMENTS

The ARPA Network has represented a fundamental development in the intersection of computers and communications. Many derivative activities are proceeding with considerable energy, and we list here some of the important directions:

• The present network is expanding, adding IMP and TIP nodes at rates approaching two per month. Other government agencies are initiating efforts to use the network, and increasing rates of growth are likely.
As befits the growing operational character of the ARPA Network, ARPA is making efforts to transfer the network from under ARPA's research and development auspices to an operational agency or a specialized carrier of some sort. • Technical improvements in the existing network are continuing. Arrangements have now been made to permit Host-IMP connections at distances over 2000 feet by use of common-carrier circuits. Arrangements are being made to allow the connection of remote-job-entry terminals to a TIP. In the software area, the routing algorithms are still inadequate at heavy load levels, and further changes in these algorithms are in progress. A major effort is under way to develop an IMP which can cope with megabit/second circuits and higher terminal throughput. This new "high speed modular IMP" will be based on a minicomputer, multiprocessor design; a prototype will be completed in 1973. • The network is being expanded to include satellite links to oversea nodes, and an entirely new approach is being investigated for the "multi-access" use of satellite channels by message switched digital communication systems. 13 This work could Improvements in Design and Performance of ARPA Network lead to major changes in world-wide digital communications. • Many similar networks are being designed by other groups, both in the United States and in other countries. These groups are reviewing the myriad detailed design choices that must be made in the design of message switched systems, and a wide understanding of such networks is growing. • The existence of the ARPA Network is encouraging a serious review of approaches to obtaining new computer resources. It is now possible to consider investing in major resources, because a national, or even international, network clientele is available over which to amortize the cost of such major resources. • Perhaps most important, the network has catalyzed important computer research into how programs and operating systems should communicate, with each other, and this research will hopefully lead to improved use of all computers. The ARPA Network has been an exciting development, and there is much yet left to learn. ACKNOWLEDGl\1ENTS Dr. Lawrence G. Roberts and others in the ARPA office have been a continuing source of encouragement and support. The entire "IMP group" at Bolt Beranek and Newman Inc. has participated in the development, installation, test, and maintenance of the Il\/[P subnetwork. In addition, Dr. Robert E. Kahn of Bolt Beranek and Newman Inc. was deeply involved in the isolation of certain. network weaknesses and in the formative stages of the corrective algorithms. Alex McKenzie made many useful suggestions during the writing of this paper. Linda Ebersole helped with the production of the manuscript. 
753 4 R E KAHN W R CROWTHER Flow control in a resource sharing computer network Proceedings of the Second ACM IEEE Symposium on Problems in the Optimization of Data Communications Systems Palo Alto California October 1971 pp 108-116 5 F HEART S M ORNSTEIN Software and logic design interaction in computer networks Infotech Computer State of the Art Report No 6 Computer Networks 1971 6 S CARR S CROCKER V CERF Host/host protocol in the ARPA network Proceedings of AFIPS 1970 Spring Joint Computer Conference Vol 36 pp 589-597 7 S CROCKER J HEAFNER R METCALFE J POSTEL Function-oriented protocols for the ARPA network Proceedings of AFIPS 1972 Spring Joint Computer Conference Vol 40 pp 271-280 8 A McKENZIE Host/host protocol for the ARPA network Available from the Network Information Center as NIC 8246 at Stanford Research Institute Menlo Park California 94025 9 Specifications for the interconnection of a host and an IMP Bolt Beranek and Newman Inc Report No 1822 revised April 1972 10 K BARTLETT R SCANTLEBURY P WILKINSON A note on reliable full-duplex transmission over half duplex links Communications of the ACM 125 May 1969 pp 260-261 11 G D COLE Computer networks measurements techniques and experiments UCLA-ENG-7165 Computer Science Department School of Engineering and Applied Science University of California at Los Angeles October 1971 12 G D COLE Performance measurements on the ARPA computer network Proceedings of the Second ACM IEEE Symposium on Problems in the Optimization of Data Communications Systems Palo Alto California October 1971 pp 39-45 13 N ABRAMSON The ALOHA system-Another alternative for computer communications Proceedings of AFIPS 1970 Fall Joint Computer Conference Vol 37 pp 281-285 REFERENCES SUPPLEMENTARY BIBLIOGRAPHY 1 F E HEART R E KAHN S M ORNSTEIN W R CROWTHER D C WALDEN The interface message processor for the ARPA computer network Proceedings of AFIPS 1970 Spring Joint Computer Conference Vol 36 pp 551-567 2 S M ORNSTEIN FE HEART W R CROWTHER H K RISING S B RUSSELL A MICHEL The terminal IMP for the ARPA computer network Proceedings of AFIPS 1972 Spring Joint Computer Conference Vol 40 pp 243-254 3 R E KAHN W R CROWTHER A study of the ARPA network design and performance Report No 2161 Bolt Beranek and Newman Inc August 1971 (The following describe issues related to, but not directly concerned with, those discussed in the text.) 
H FRANK I T FRISCH W CHOU Topological considerations in the design of the ARPA computer network Proceedings of AFIPS 1970 Spring Joint Computer Conference Vol 36 pp 581-587 H FRANK R E KAHN L KLEINROCK Computer communication n6twork design-Experience with theory and practice Proceedings of AFIPS 1972 Spring Joint Computer Conference Vol 40 pp 255-270 754 Fall Joint Computer Conference, 1972 R E KAHN Terminal access to the ARPA computer network Courant Computer Symposium 3-Computer Networks Courant Institute New York November 1970 L KLEIN ROCK Analytic and simulation methods in computer network design Proceedings of AFIPS 1970 Spring Joint Computer Conference Vol 36 pp 569-579 A A McKENZIE B P COSELL J M McQUILLAN M J THROPE The network control center for the ARPA network To be presented at the International Conference on Computer Communications Washington D C October 1972 L G ROBERTS Extension of packet communication technology to a hand-held personal terminal Proceedings of AFIPS 1972 Spring Joint Computer Conference Vol 40 pp 295-298 L G ROBERTS B D WESSLER Computer network development to achieve resource sharing Proceedings of AFIPS 1970 Spring Joint Computer Conference Vol 36 pp 543-549 R THOMAS D A HENDERSON McROSS-A multi-computer programming system Proceedings of AFIPS 1972 Spring Joint Computer Conference Vol 40 pp 281-294 Cost effective priority assignment in network computers by E. K. BOWDON, SR. University of Illinois Urbana, Illinois and W. J. BARR Bell Telephone Laboratories Piscataway, New Jersey the task is large. Demand services are those services which, though defined in advance, may be required at any time and at a possibly unknown price. Frequently the only previous agreements made refer to the type of service to be delivered and limits on how much will be demanded. Examples of tasks which are run on a demand basis include research, information requests, and program debugging runs. University computing centers generally find that most of the services which they offer are of this type. Every installation manager who offers either contract or demand services should have a solid and acceptable answer to the critical question "What do I do when my computer breaks down?" If he wishes to ensure that he can meet all commitments, the only answer is to transfer tasks to another processor. This is where network computers enter the picture. If the center is part of a network computer, tasks can easily and quickly be transferred to another center for processing. The concept of transferring tasks between centers through a broker has been widely discussed in the literature. 1 •2 Our basic assumption is that economic viability for network computers is predicated on efficient resource sharing. This was, in fact, a major reason for the construction of several networks-to create the capability of using someone else's special purpose machine or unique process without having to physically transport the work. This type of resource sharing is easily implemented and considerable work has been done toward this goal. There is, however, another aspect of resource sharing which has not been studied thoroughly: loadleveling. By load-leveling we mean the transfer of tasks between computing centers for the purpose of improving INTRODUCTION Previously, the study of network computers has been focused on the analysis of communication costs, optimal message routing, and the construction of a communications network connecting geographically distributed computing centers. 
While these problems are far from being completely solved, enough progress has been made to allow the construction of reasonably efficient network computers. One problem which has not been solved, however, is making such networks economically viable. The solution of this problem is the object of our analysis.

With the technological problems virtually solved, it is rapidly becoming apparent that, no matter whose point of view one takes, the only economically justifiable motivation for building a network computer is resource sharing. However, the businessmen, the users, the people with money to spend, could not care less whose resources they are using for their computer runs. They care only that they receive the best possible service at the lowest possible price. The computing center manager who cannot fill this order will soon find himself out of customers.

"The best possible service ..." is, in itself, a tall order to fill. The computing center manager finds himself in a position to offer basically two kinds of computing services: contract services and demand services. Contract services are those services which the manager agrees to furnish, at a predetermined price, within specified time periods. Examples of this type of service include payroll runs, billings, and inventory updates. Each of these is run periodically, and the value placed by the businessman on the timely completion of the task is large.

By load-leveling we mean the transfer of tasks between computing centers for the purpose of improving the throughput of the network or other criteria. We contend that the analysis and implementation of user-oriented load-leveling is the key to developing economically self-supporting network computers.

A SCENARIO OF COST EFFECTIVENESS

Until recently, efforts to measure computer efficiency have centered on the measurement of resource (including processor) idle time. A major problem with this philosophy is that it assumes that all tasks are of roughly equal value to the user and hence to the operation of the system.

As an alternative to the methods used in the past, we propose a priority assignment technique designed to represent the worth of tasks in the system. We present the hypothesis that tasks requiring equivalent use of resources are not necessarily of equivalent worth to the user with respect to time. We would allow the option for the user to specify a "deadline" after which the value of his task would decrease, at a rate which he can specify, to a system determined minimum. With this in mind, we have proposed a measure of cost effectiveness with which we can evaluate the performance of a network with an arbitrary number of interconnected systems, as well as each system individually.3

We define our measure of cost effectiveness, γ, in terms of the following quantities: Lq, the number of tasks in the queue; M, the maximum queue length; R, the number of priority classes; α, a measure (system-determined) of the "dedicatedness" of the CPU to the processing of tasks in the queue; and

    β(i) = (R − i) Σ (from j=1 to n) [g(j)/f(j)]

where g(j) is the reward for completing task j (a user specified function of time) and f(j) is the cost (system determined) to complete task j. β(i) indicates a ratio of reward to cost for a given priority class and is sensitive both to the needs of the user and to the requirements imposed on the installation: it is user sensitive because the user specifies the reward, and installation sensitive because the cost for processing a task is determined by the system. The measure of CPU dedicatedness, α, on the other hand, is an entirely installation sensitive parameter. The term (Lq/M)^α is a measure of the relevance of the queue to processing activities. Similarly, we can look at β(i) as a measure of resource utilization.
The first problem which becomes apparent arises if

    Σ (from i=0 to R−1) β(i) = 0.

This occurs only in the situation where there is exactly one priority class (i.e., the non-priority case). We will finesse away this problem by defining β appropriately for this case. Intuitively, this is reasonable, since the smaller this term gets, the more efficiently (in terms of reward) a system is using its resources. Furthermore, in the absence of priorities, the order in which tasks are executed is fixed, so this term becomes irrelevant to our measure of cost effectiveness. Thus, for the non-priority case, we have

    γ = (Lq/M)^α

which is simply the measure of the relevance of the queue to processing activities. This is precisely what we want if we are going to consider only load-leveling in non-priority systems. However, we are interested in the more general case in which we can assign priorities.

An estimate of the cost to complete task j, f(j), is readily determined from the user-supplied parameters requesting resources. Frequently these estimated parameters are used as upper limits in resource allocation, and the operating system will not allow the program to exceed them. As a result, the estimates tend to be high. On the other hand, lower priorities are usually assigned to tasks requiring a large amount of resources. So the net effect is that the user's parameters reflect his best estimate, and we may be reasonably confident that they truly reflect his needs. At the University of Illinois computing center, for example, as of July 26, 1971, program charges are estimated by the following formula:

    cents = a(X + Y)(bZ + c) + d    (2)

where X = CPU time in centiseconds, Y = number of I/O requests, Z = core size in kilobytes, a, b, c are weighting factors currently having the values 0.04, 0.0045, and 0.5, respectively, and d is an extra charge factor including a $1.00 cover charge plus any special resources used (tape/disk storage, cards read, cards punched, plotter, etc.).

The main significance of the reward function g(j) specified by the user is that it allows us to determine a deadline or deadlines for the task. Typically we might expect g(j) to be a polynomial in t, where t is the time in the system. For example, the following thoughts might run through the user's head: "Let's see, it's 10:00 a.m. now and I don't really need immediate results since I have other things to do. However, I do need the output before the 4:00 p.m. meeting. Therefore, I will make 3:30 p.m. a primary deadline. If it isn't done before the meeting, I can't use the results before tomorrow morning, so I will make 8:00 a.m. a secondary deadline. If it isn't done by then I can't use the results, so after 8:00 a.m. I don't care." The function g(j) this user is thinking about would probably look something like Figure 1a.

Figure 1-Example of user's reward function: (a) ideal function; (b) approximate function.

Now, this type of function poses a problem in that it is difficult for the user to specify accurately and would require an appreciable amount of overhead to remember and compute. Notice, however, that even if turnaround time is immediate, the profit oriented installation manager would put the completed task on a shelf (presumably an inexpensive storage device) and not give it to the user until just before the deadline, thus collecting the maximum reward. As a result, there is little reason for specifying anything more than the deadlines, the rewards associated with meeting the deadlines, and the rate of decrease of the reward between deadlines, if any. Applying this reasoning to Figure 1a we obtain Figure 1b. Note that this function is completely specified with only six parameters (deadlines t1, t2; rewards g1, g2; and rates of decrease m1, m2).

In general, we may assume that g(j) is a monotonically non-increasing, piecewise linear reward function consisting of n distinct sets of deadlines, rewards, and rates of decrease. Thus we can simply specify g(j) with 3n parameters, where n is the number of deadlines specified. Note that, in effect, the user specifies an abort time: the point at which the g(j) he specifies becomes less than f(j). If the installation happens to provide a "lower cost" service, l(j), and if g(j) > l(j), this task would be processed, but only when all the tasks with higher g(j) had been processed.

Now, what we are really interested in is not so much an absolute reward, but a ratio of reward to cost. Since f(j) is, at best, only an estimate of cost, we cannot reasonably require a user to specify an absolute reward. A more equitable arrangement would be to specify the rewards in terms of a ratio g(j)/f(j) associated with each deadline. This ratio is more indicative of the relative worth of a task, both to the system and to the user, since it indicates the return on an investment.
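The piecewise linear reward function described above is easy to state in code. The sketch below is only illustrative: the deadlines echo the 3:30 p.m. and 8:00 a.m. example (expressed as hours), while the reward values and rates of decrease are invented for the demonstration.

```python
# Illustrative sketch of the piecewise linear reward function g(j), specified
# by 3n parameters (deadline t_i, reward g_i, rate of decrease m_i).  Between
# deadlines the reward decays at rate m_i but not below the next deadline's
# reward; after the last deadline it decays to a system determined minimum.
# All numbers below are invented for the example.

def make_reward_function(segments, system_minimum=0.0):
    """segments: list of (t_i, g_i, m_i), sorted by deadline t_i."""
    def g(t):
        if t <= segments[0][0]:
            return segments[0][1]                     # full reward before the first deadline
        for i, (t_i, g_i, m_i) in enumerate(segments):
            last = (i + 1 == len(segments))
            floor = system_minimum if last else segments[i + 1][1]
            next_deadline = float("inf") if last else segments[i + 1][0]
            if t <= next_deadline:
                return max(g_i - m_i * (t - t_i), floor)
        return system_minimum
    return g

# Two deadlines, as in Figure 1b: 3:30 p.m. (t = 15.5 h) and 8:00 a.m. (t = 32 h).
g = make_reward_function([(15.5, 10.0, 2.0), (32.0, 4.0, 100.0)])
for t in (12.0, 15.5, 20.0, 32.0, 33.0):
    print(t, g(t))    # full reward before 3:30 p.m., reduced reward until 8:00 a.m., then zero
```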
PRIORITY ASSIGNMENT

Let us now turn our attention to the development of a priority assignment scheme which utilizes the reward/cost ratios described in the previous section. We begin by quantizing the continuum of reward/cost ratios into R distinct intervals. Each of these intervals is then assigned to one of R priority classes 0, 1, 2, ..., R−1, with priority 0 being reserved for tasks with the highest reward/cost ratios and priority R−1 for tasks with reward/cost ratios of unity or less. A task entering the system will be assigned a priority according to its associated reward/cost ratio.

We want to guarantee, if possible, that all priority 0 tasks will meet their deadlines. Furthermore, if all priority 0 tasks can meet their deadlines, we want to guarantee, if possible, that all priority 1 tasks will meet their deadlines and, in general, if all priority k tasks can meet their deadlines, we want to guarantee that as many priority class k+1 tasks as possible will meet their deadlines. (Note that we are concerned with guaranteeing deadlines rather than rationing critical resources.)

To facilitate the priority assignment, we introduce the following notation. For priority class k, let Ti denote the ith task. Then we assume for each Ti that we receive the following information vector:

    (Ti, g/f, di, τi, si)

where Ti is an identifier, g/f is the reward/cost ratio associated with meeting the task's deadline, di is the task's deadline associated with g/f, τi is the maximum processing time for the task, and si is the task's latest start time.

The assignment then proceeds as follows:

1. Using its deadline, the entering task is inserted into the schedule for its priority class in deadline order, and the tasks are renumbered accordingly; say it becomes task Tj.
2. If sj−1 + τj−1 ≤ sj and sj + τj ≤ sj+1, no conflict is created and the assignment is complete. If, however, sj−1 + τj−1 > sj or sj + τj > sj+1, a deadline might be missed; so we proceed with Step 3.
3. Compacting scheme. Let fj denote the float time between any two tasks Tj−1 and Tj, where fj is defined:

       fj = sj − (sj−1 + τj−1)

   Then Fj, the total float time preceding Tj, is given by:

       Fj = Σ (from k=1 to j) fk = sj − t − Σ (from k=1 to j−1) τk

   where t is the current time. Now, starting with task Tj, if sj + τj > sj+1 and Fj ≥ sj + τj − sj+1, we assign a new starting time to Tj given by sj = sj+1 − τj, and we continue with Tj−1, Tj−2, etc., until we encounter a task Tk, k ≤ j, such that sk ≤ sk+1 − τk. (Note that Tj+1 and all its predecessors are guaranteed to meet their deadlines.)
4. However, if sj + τj > sj+1 but Fj < sj + τj − sj+1, the compacting scheme will not help; the start times are left at their latest critical values in the hope that sufficient float time is created later to enable the tasks to meet their deadlines.

Suppose that at time t = 0 the state of priority class k is that shown in Figure 2. (Note that since all priority class k tasks have similar g/f, we need not show these ratios.) Notice that forming the float time column is analogous to forming a forward difference table. In each of the following examples we assume that Figure 2 is the initial state of priority class k.

Figure 2-State of priority class k at time t = 0. (Schedule of tasks T1-T5, float times f1, ..., f5 = 1, 0, 3, 2, 0, and information vectors: d1 = 2, τ1 = 1, s1 = 1; d2 = 4, τ2 = 2, s2 = 2; d3 = 10, τ3 = 3, s3 = 7; d4 = 14, τ4 = 2, s4 = 12; d5 = 15, τ5 = 1, s5 = 14.)

(i) Suppose the information vector (with g/f omitted) for Ti is (Ti, 6, 1, 5). Comparing deadlines, Ti is inserted between T2 and T3; no deadline is endangered, and the resulting state of priority class k is shown in Figure 3.

Figure 3-Results of priority assignments for Example (i).

(ii) Next suppose the information vector for Ti is (Ti, 9, 2, 7). Ti is inserted between T2 and T3 and the tasks are renumbered. Now s3 + τ3 > s4 since 7 + 2 > 7, and a deadline could be missed. But F3 ≥ s3 + τ3 − s4 since 4 ≥ 7 + 2 − 7 = 2, so we assign a new start time to T3: s3 = s4 − τ3 = 7 − 2 = 5. Now s2 ≤ s3 − τ2 since 2 ≤ 5 − 2, so the priority assignment is complete and all tasks are guaranteed to meet their deadlines. The resulting state of priority class k is shown in Figure 4. (Note the effect of the last in first out rule for breaking ties on start times.)

(iii) Next suppose the information vector for Ti is (Ti, 9, 4, 5). Again Ti is inserted between T2 and T3 and the tasks are renumbered. We find that s3 + τ3 > s4 since 5 + 4 > 7, and a deadline could be missed. But F3 ≥ s3 + τ3 − s4, so we assign a new start time to T3: s3 = s4 − τ3 = 7 − 4 = 3. Now s2 + τ2 > s3 since 2 + 2 > 3, so we assign a new start time to T2: s2 = s3 − τ2 = 3 − 2 = 1. Now s1 + τ1 > s2 since 1 + 1 > 1, so we assign a new start time to T1: s1 = s2 − τ1 = 0. The priority assignment is complete and all tasks are guaranteed to meet their deadlines. The resulting state of priority class k is shown in Figure 5.

Figure 5-Results of priority assignments for Example (iii).

(iv) As a final example, suppose that the information vector for Ti is (Ti, 9, 5, 4). As before, we find that d2 < di ≤ d3 < d4 < d5, so we insert Ti between T2 and T3 and renumber the tasks accordingly. However, s3 + τ3 > s4 since 4 + 5 > 7, and a deadline could be missed. Furthermore, F3 < s3 + τ3 − s4 since 1 < 4 + 5 − 7 = 2, and the compacting scheme will not help us. Instead we leave the start times at their latest critical values and hope that sufficient float time is created later to enable the tasks to meet their deadlines. The results of this assignment are shown in Figure 6. Note that T4 is the task which is in danger of missing its deadline.

Figure 6-Results of priority assignments for Example (iv).

The last example brings up the problem of what to do with a task whose deadline is missed. We simply treat it as though it had just entered the system, using the next specified deadline as the current deadline. If no further deadlines are specified, the task is assigned priority R−1 and will be processed accordingly.
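The float-time bookkeeping and the compacting scheme can be illustrated with a short sketch. The code below is a reconstruction from the definitions and worked examples above, not the authors' implementation; the task representation and function names are invented for the example.

```python
# Sketch of the float-time bookkeeping and compacting scheme reconstructed from
# the text above (illustrative only).  Tasks are kept in deadline order with
# latest start times s = d - tau.
from dataclasses import dataclass

@dataclass
class Task:
    d: int     # deadline
    tau: int   # maximum processing time
    s: int     # latest start time

def floats(tasks, t=0):
    """f_j = s_j - (s_{j-1} + tau_{j-1}), with f_1 measured from the current time t."""
    prev_end, f = t, []
    for task in tasks:
        f.append(task.s - prev_end)
        prev_end = task.s + task.tau
    return f

def insert(tasks, new, t=0):
    """Insert a task by deadline; compact predecessors if the float allows it."""
    j = next((i for i, task in enumerate(tasks) if new.d <= task.d), len(tasks))
    tasks.insert(j, new)
    if j + 1 < len(tasks):
        overlap = tasks[j].s + tasks[j].tau - tasks[j + 1].s
        F_j = sum(floats(tasks, t)[: j + 1])            # total float preceding T_j
        if overlap > 0 and F_j >= overlap:              # Step 3: compacting scheme
            k = j
            while k >= 0 and tasks[k].s + tasks[k].tau > tasks[k + 1].s:
                tasks[k].s = tasks[k + 1].s - tasks[k].tau
                k -= 1
        # Otherwise (Step 4) the start times are left at their latest critical values.

# Figure 2 task set, then Example (ii): a new task with d = 9, tau = 2, s = 7.
tasks = [Task(2, 1, 1), Task(4, 2, 2), Task(10, 3, 7), Task(14, 2, 12), Task(15, 1, 14)]
insert(tasks, Task(9, 2, 7))
print([task.s for task in tasks])    # -> [1, 2, 5, 7, 12, 14], as in Figure 4
```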
That is, to guarantee that every task in each center will be processed, if possible, before its deadline and only those tasks that offer the least reward to the network will miss their deadlines. Implicit in this discussion is the simplifying assumption that any task can be performed in any center. This assumption is not as restrictive as it may sound since we can, for the purposes of load leveling, partition a nonhomogeneous network into sets of homogeneous subnetworks which can be considered independently. Thus, in the discussion which follows, we will assume that the network computer is homogeneous. We define the measure of cost effectiveness for a where the Wi are weighting factors that reflect the relative contribution of the ith center to the overall computational capability of the network, and 'Yi is the measure of cost effectiveness for the ith center. Note that if a center is a subnet of computers, we could employ this definition to determine the measure of cost effectiveness for the subnet. We also let Cii denote the cost of communication between centers i and j; and tii the transmission time between centers i and j. Ideally, we want the network computer to operate so that all tasks within the network are processed before their deadlines. If a task is in danger of missing its deadline, we want to consider it as a candidate for transmission to another center for processing. The determination of which tasks should be transferred follows the priority assignment (i.e., priority 0 tasks in danger of missing deadlines should be the first to be considered, priority 1 tasks next, etc.) . We note that this scheme may not discover all tasks that are in danger of missing their deadlines. In order to discover all tasks that might be in danger of missing their deadlines, we would require a look ahead scheme to determine the available float time and to fit lower priority tasks into this float time. The value of such a scheme is questionable, however, since we assume some float time is created during processing and additional float time may be created by sending high priority. tasks to other centers. Also, the overhead associated with executing the look ahead scheme would further reduce the probable gain of such a scheme. The determination of which center should be the recipient of a transmitted task can be determined from the measure of cost effectiveness of each center. Recall that the measure indicates the worth of the work to be processed within a center. Thus a center with a task in danger of missing its deadline will generally have a larger measure than a center with available float time. Thus, by examining the measures for each center, we can determine the likely recipient of tasks to be transmitted. These centers can in turn, examine their own queues and bid for additional work on the basis of their available float times. This approach has a decided economic advantage over broadcasting the availability of work throughout the network and transmitting the tasks to the first center to respond. The latter approach 762 Fall Joint Computer Conference, 1972 has been investigated by Farber9 and discarded in favor of bidding. Once a recipient center has been determined, we would transmit a given task only if the loss in reward associated with not meeting its deadline is greater than Cij, the cost of transmitting the task between centers and transmitting back the results. 
When a task is transmitted to a new center its deadline is diminished by tij, the time to transmit back the results, thus ensuring the task will reach its destination before its true deadline. Similarly, the reward associated with meeting the task's deadline is diminished by Cij, since this represents a reduction in profit. Then the task's g/f ratio is used to determine a new priority and the task is treated like one originating in that center. This heuristic algorithm provides the desired results that within each center all deadlines are met, if possible, and if any task is in danger of missing its deadline, it is considered for possible transmission to another center which can meet the deadline. SUMMARY We have introduced a priority assignment technique which, together with the scheduling algorithm, provides a new approach to resource allocation. The most important innovation in this approach is that it allows a computing installation to maximize reward for the use of resources while allowing the user to specify deadlines for his results. The demand by users upon the resources of a computing installation is translated into rewards for the center. This approach offers advantages to the user and to the computing installation. The user can exercise control over the processing of his task by specifying its reward/cost ratio which, in turn, determines the importance the installation attaches to his requests. The increased flexibility to the user in specifying rewards for meeting deadlines yields increased reward to the center. Thus the computing installation becomes cost effective, since for a given interval of time, the installation can process those tasks which return the maximum reward. A notable point here, is that this system readily lends itself to measurement. The measure of cost effectiveness is designed to reflect the status of a center using the priority assignment technique. From its definition, the value of the measure depends not only on the presence of tasks in the system but upon the priority of these tasks. Thus the measure reflects the worth of the tasks awaiting execution rather than just the number of tasks. Therefore, the measure can be used both, statistically, to record the operation of a center, and dynamically, to determine the probability of .available float time. This attribute enables us to predict the worth of the work to be performed in any center in the network and facilitates load-leveling between centers. We have spent a good deal of time discussing what this system does and the problems it attempts to solve. In the interest of fair play, we now consider the things it does not do and the problems it does not solve. One of the proposed benefits of a network computer is that it is possible to provide, well in advance, a guarantee that, at a predetermined price, a given deadline will be met. This guarantee is especially important for scheduled production runs, such as payroll runs, which must be processed within specified time periods. The system as presented does not directly allow for such a long range guarantee. However, to implement such an option, we simply modify the reward to include the loss of goodwill which would be incurred should such a deadline be missed. Perhaps the easiest way to implement this would be to reserve priority class zero for tasks whose deadlines were previously guaranteed. Under this system we could assure the user, with confidence, that the deadlines could be met at a predetermined (and presumably more expensive) price. 
A second problem with the system is that the algorithms do not optimize the mix of tasks which would be processed concurrently in a multiprogramming environment. A common strategy in obtaining a good mix is to choose tasks in such a way that most of the tasks being processed at one time are input/output bound (this is especially common in large systems which can support a large number-five or more-tasks concurrently). Generally smaller tasks are the ones selected to fill this bill. Under our system, the higher priority classes will tend to contain the smaller and less expensive tasks since priority is assigned on the basis of a cost multiplier which is user supplied. We assume a user would be much more reluctant to double (give a reward/cost ratio of 2) the cost of a $100 task than to double the cost of a $5 task. This reluctance will tend to keep a good mix present in a multiprogramming environment. The final problem we would like to consider is what to do with a task if (horror of horrors) all of its deadlines are missed. There are basically two options, both feasible for certain situations, which will be discussed. The ultimate decision as to \yhich is best rests with the computer center managers. Therefore, we will present the alternatives objectively without any intent to influence that decision. The first alternative made obvious by the presentation is that when. a task misses all of its deadlines the results it would produce are of no further use. Con- Cost Effective Priority Assignment in Network Computers tinued attempts to process a task in this instance would be analogous to slamming on the brakes after your car hits a brick wall; simply a waste of resources. Thus, if the deadlines are firm, a center manager could say that a task which misses all of its deadlines should be considered lost to the system. On the other hand, the results produced by a task could be of value even after the last deadline is missed. In this case the center manager could offer a "low cost" service under which tasks are processed at a reduced rate but at the system's leisure. The danger in this approach is that if run without outside supervision, the system could become saturated with low cost tasks to the detriment of more immediately valuable work. This actually happened at the University of Illinois during early attempts to institute a low cost option. The confusion and headaches which resulted from the saturation were more than enough to justify instituting protective measures. From the results of this experience, it is safe to say that no installation manager will let it happen more than once. Even in the presence of a few limitations, our system represents a definite positive step in the analysis of network computers. Our approach treats a network computer as the economic entity that it should be: a market place in which vendors compete for customers and in which users contend for scarce resources. The development of this approach is a first step in the long road to achieving economic viability in network computers. ACKNOWLEDGMENTS We are particularly grateful to Mr. Einar Stefferud of Einar Stefferud and Associates, Santa Monica, California for his constructive criticism and constant 763 encouragement in this effort. Weare also indebted to Professor David J. Farber of the University of California at Irvine, California for many interesting conversations about bidding in distributed networks. 
Finally, we would like to thank the referees for their careful reviews and suggestions to improve this paper. This research was supported in part by the National Science Foundation under Grant No. NSF GJ 28289. REFERENCES 1 E STEFFERUD Management's role in networking Datamation Vol 18 No 41972 2 J T HOOTMAN The computer network as a marketplace Datamation Vol 18 No 4 1972 3 E K BOWDON SR W J BARR Throughput optimi.zation in network computers Proceedings of the Fifth International Conference on System Sciences Honolulu 1972 4 N ABRAMSON The ALOHA system University of Hawaii Technical Report January 1972 5 H FRANK I T FRISCH Communication transmission and transportation networks Addison-Wesley Reading Massachusetts 1971 6 L KLEINROCK Communication nets stochastic flow and delay McGraw-Hill New York New York 1964 7 R SYSKI Introduction to congestion theory in telephone systems Oliver and Boyd Edinburgh 1960 8 E BOWDON SR Dispatching in network computers Proceedings of the Symposium on Computer Communications Networks and Teletraffic April 1972 9 D J. FARBER K C LARSON The structure of a distributed computing system-software Proceedings of the Symposium on Computer Communications Networks and Teletraffic April 1972 C.mmp-A mnItI-mInI-processor ··· * by WILLIAM A. WULF and C. G. BELL Carnegie-Mellon University Pittsburgh, Pennsylvania INTRODUCTION AND MOTIVATION REQUIREMENTS The CMU multiprocessor project is designed to satisfy two requirements: In the Summer of 1971 a project was initiated at eMU to design the hardware and software for a multiprocessor computer system using minicomputer processors (i.e., PDP-II's). This paper briefly describes an overview (only) of the goals, design, and status of this hardware/software complex, and indicates some of the research problems raised and analytic problems solved in the course of its construction. Earlier in 1971 a study was performed to examine the feasibility of a very large multiproceSsor computer for artificial intelligetnce research. This work, reported in the proceedings paper by Bell and Freeman, had an influence on the hardware structure. In some sense, this work can be thought of as a feasibility study for larger multiprocessor systems. Thus, the reader might look at the Bell and Freeman paper for general overview and potential, while this paper has more specific details regarding implementation since it occurs later and is concerned with an active project. It is recommended that the two papers be read in sequence. The following section contains requirements and background information. The next section describes the hardware structure. This section includes the analysis of important problem in the hardware design: interference due to multiple processors accessing a common memory. The operating system philosophy, and its structure is given together with a detailed analysis of one of the problems incurred in the design. One problem is determining the optimum number of "locks" which are in the scheduling primitives. The final section discusses a few programming problems which may arise because of the possibilities of parallel processing. 1. particular computation requirements of existing research projects; and 2. research interest in computer structures. 
The design may be viewed as attempting to satsify the computational needs with a system that is conservative enough to ensure successful construction within a two year period while first satisfying this constraint, the system is to be a research vehicle for multiprocessor systems with the ability to support a wide range of investigations in computer design and systems programming. The range of computer science research at eMU (i.e~, artificial intelligence, system programming, and computer structures) constrains processing power, data rates, and memory requirements, etc. (1) The artificial intelligence research at eMU concerned ~ith speech and vision imposes two kinds of requirements. The first, common to speech and vision, is that special high data rate, real time interfaces are required to acquire data from the external environment. The second more stringent requirement, is real time processing for the speech-understanding system. The forms of parallel computation and intercommunication in multiprocessor is a matter for intensive investigation, but seems to be a fruitful approach to achieve the necessary processing capability. (2) There is also a significant effort in research on operating systems and on understanding how software s Y'l3tems are to be constructed. Research in these a~eas has a strong empirical and experimental component, requiring the design and construction of many sy~tems. The primary * This work was supported by the Advanced Research Projects Agency of the Office of the Secretary of Defense (F44620-70-0107) and is monitored by the Air Force Office of Scientific Research. 765 766 Fall Joint Computer Conference, 1972 requirement of these systems is isolation, so they can be used in a completely idiosyncratic way and be restructured in terms of software from the basic machine. These systems also require access by multiple users and varying amounts of secondary memory. (3) There is also research interest in using Register Transfer Moduks (RTM's) developed here and at Digital Equipment Corporation (Bell, Grason, et al., 1972) and in production as the PDP-16 are designed to assist in the fabrication of hardware/software systems. A dedicated facility is needed for the design and testing of experimental system constructed of these modules. TIMELINESS OF MULTIPROCESSOR We believe that to assemble a multiprocessor system today requires research on multiprocessors. Multiprocessor systems (othpr than dual processor structures) have not become current art. Possibly reasons for this state of affairs are: 1. The absolutely high cost of processors and primary memories. A complex multiprocessor system was simply beyond the computational realm of all but a few extraordinary users, independent of the advantage. 2. The relatively high cost of processors in the total system. An additional processor did not improve the performance/ cost ratio. 3. The unreliability and performance degradation of operating system software,-providing a still more complex system structure-would be futile. 4. The inability of technology to permit construction of the central switches required for such structures due to low component density and high cost. 5. The loss of performance in multiprocessors due to memory access conflicts and switching delays. 6. The unknown problems of dividing tasks into sub tasks to be executed in parallel. 7. The problems of constructing programs for execution in a parallel environment. 
The possibility of parallel execution demands mechanisms for controlling that parallelism and for handling increased programming complexity. In summary, the expense was prohibitive, even for discovering what advantages of organization might overcome any inherent decrements of performance. However, we appear to have now entered a techno- logical domain when many of the difficulties listed above no longer hold so strongly: 1'. Providing we limit ourselves to multiprocessors of minicomputers, the total system cost of processors and primary memories are now within the price range of a research and user facility. 2'. The processor is a smaller part of the total system cost. 3'. Software reliability is now somewhat improved, primarily because a large number of operating systems have been constructed. 4'. Current medium and large scale integrated circuit technology enables the construction of switches that do not have the large losses of the older distributed decentralized switches (Le., busses). 5'. Memory conflict is not high for the right balance of processors, memories and switching system. 6'. ThNe has been work on the problem of task parallelism, centered around the ILLIAC IV and the CDC STAR. Other work on modular programming [Krutar, 1971; Wulf, 1971] suggests how subtasks can be executed in a pipeline. 7'. l\tIechanisms for controlling parallel execution, fork-join (Conway, 1963), P and V (Dijkstra, 1968), have been extensively discussed in the literature. Methodologies for constructing large complex programs are emerging (Dijkstra, 1969, Parnas, 1971). In short, the price of experimentation appears reasonable, given that then' are requirements that appear to be satisfied in a sufficiently direct and obvious way by a proposed multiprocessor structure. Moreover, there is a reasonable research base for the use of such structures. RESEARCH AREAS The above state does not settle many issues about multiprocessors, nor make its development routine. The main areas of research are: 1. The multiprocessor hardware design which we call the PMS structure (see Bell and Newell, 1971). Few multiprocessors have been built, thus each one represents an important point in design space. 2. The processor-memory interconnection (Le., the switch design) especially with respect to reliability. C.mmp-A Multi-Mini-Processor 3. The configuration of computations on the multiprocessor. There are many processing structures and little is known about when they are appropriate and how to exploit them, especially when not treated in the abstract but in the context of an actual processing system: Parallel processing: a task is broken into a number of subtasks and assigned to separate processors. Pipeline processing: various independent stages of the task are executed in parallel (e.g., as in a co-routine structure). Network processing: the computers operate quasi-independently with intercommunication (with various data rates and delay times). Functional specialization: the processors have either special capabilities or access to special devices; the tasks must be shunted to processors as in a job shop. Multiprogramming: a task is only executed by a single processor at a given time. Independent processing: a configurational separation is achieved for varying amounts of time, such that interaction is not possible and thus doesn't have to be processed. 4. The decomposition of tasks for appropriate computation. Detailed analysis and restructuring of the algorithm appear to be required. 
The speech-understanding system is one major example which will be studied. It is interesting both from the multiprocessor and the speech recognition viewpoints.
5. The operating system design and performance. The basic operating system design must be conservative, since it will run as a computation facility; however, it has substantial research interest.
6. The measurement and analysis of performance of the total system.
7. The achievement of reliable computation by organizational schemes at higher levels, such as redundant computation.

THE HARDWARE STRUCTURE

This section will briefly describe the hardware design without explicitly relating each part to the design constraints. The configuration is a conventional multiprocessor system. The structure is given in Figure 1. There are two switches, Smp and Skp, each of which provides intercommunication between two sets of components. Smp allows each processor to communicate with all primary memories (in this case core).

[Figure 1-Proposed CMU multiminiprocessor computer/C.mmp, showing Smp (m-to-p crosspoint) connecting the Mp modules to the Pc configurations, and Skp (p-to-k; null/dual duplex/crosspoint) connecting the Pc Unibusses to the device controllers. Key: Pc/central processor; Mp/primary memory; T/terminals; Ks/slow device control (e.g., for Teletype); Kf/fast device control (e.g., for disk); Kc/control for clock, timer, interprocessor communication. Both switches have static configuration control by manual and program control.]

Skp allows each processor (Pc) to communicate with the various controllers (K), which in turn manage the secondary memories (Ms) and I/O devices and transducers (T). These switches are under both processor and manual control. Each processor system is actually a complete computer with its own local primary memory and controllers for secondary memories and devices. Each processor has a Data operations component, Dmap, for translating addresses at the processor into physical memory addresses. The local memory serves both to reduce the bandwidth requirements to the central memory and to allow completely independent operation and off-line maintenance. Some of the specific components shown in Figure 1 are:

K.clock: A central clock, K.clock, allows precise time to be measured. A central time base is broadcast to all processors for local interval timing.

K.interrupt: Any processor is allowed to generate an interrupt to any subset of the Pc configuration at any of several priority levels. Any processor may also cause any subset of the configuration to be stopped and/or restarted. The ability of a processor to interrupt, stop, or restart another is under both program and manual control. Thus, the console loading function is carried out via this mechanism.

Smp: This switch handles information transfers between primary memories, processors, and I/O devices. The switch has ports (i.e., connections) for m busses for primary memories and p busses for processors. Up to min(m,p) simultaneous conversations are possible via the cross-point arrangement. Smp can be set under programmed control, or via manual switches on an override basis, to provide different configurations. The control of Smp can be by any of the processors, but one processor is assigned the control.

Mp: The shared primary memory, Mp, consists of (up to) 16 modules, each of (up to) 65k 16-bit words.
The initial memories being used have the following relevant parameters: core technology; each module is 8-way interleaved; access time is 250 nanoseconds; and cycle time is 650 nanoseconds. An analysis of the performance of these memories within the C.mmp configuration is given in more detail below.

Skp: Skp allows one or more of the k Unibusses (the Unibus is the common bus for memory and i/o on an isolated PDP-11 system), each having several slow controllers, Ks (e.g., teletypes, card readers), or fast controllers, Kf (e.g., disk, magnetic tape), to be connected to one of the p central processors. The k Unibusses for the controllers are connected to the p processor Unibusses on a relatively long term basis (e.g., a fraction of a second to hours). The main reason for allowing only a long term, but switchable, connection between the k Unibusses and the processors is to avoid the problem of having to decide dynamically which of the p processors manages a particular control. Like Smp, Skp may be controlled either by program or manually.

Pc: The processing elements, Pc, are slightly modified versions of the DEC PDP-11. (Any of the PDP-11 models may be intermixed.)

Dmap: The Dmap is a Data operations component which takes the addresses generated in the processor and converts them to addresses to use on the memory busses and Unibusses emanating from the Dmap. There are four sets of eight registers in Dmap, enabling each of eight 4,096 word blocks to be relocated in the large physical memory. The size of the physical Mp is 2^20 words (2^21 bytes). Two bits in the processor, together with the address type, are used to specify which of the four sets of mapping registers is to be used.

The structure of the address map, Dmap, is described below and in Figure 2, together with its implications for two kinds of programs: the user and the monitor programs. For the user program, the conventional PDP-11 addressing structure is retained, except that a program does not have access to the "i/o page," and hence the full 16-bit address space refers to the shared primary memory. A PDP-11 program generates a 16-bit address, even though the Unibus has 18-bit addressing capability. In this scheme the additional two address bits are obtained from two unused program status (PS) register bits. (Note, this register is inaccessible to user programs.)

[Figure 2-Format of data in the relocation registers. The user's 16-bit address supplies the register selection within a bank; the two PS bits supply the bank selection (00, 01, 10, 11, with part of bank 11 unrelocated onto the local Unibus); each register holds an 8-bit physical page number, bits reserved for expansion, and the NXM, write-protect, and 'written-into' bits; the result is a 21-bit address.]

These two additional bits provide four addressing modes: 00-mode, 01-mode, 10-mode, and 11-mode. These addresses are always mapped, and always refer to the shared, large, primary memory. All but 8 kw (kilowords) of this address space is mapped as above. The 8 kw of this space which is not mapped refers to the private Unibus of each processor; 4 kw of this space is for private (local) memory and 4 kw is used to access i/o devices attached to the processor. For mapped references, the mapping consists of using the most significant five bits of the 18-bit address to select one of 30 relocation registers, and replacing these bits by the contents of the 8 low order bits of that register, yielding an overall 21-bit address.
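To make the mapping concrete, the fragment below is a minimal sketch, in present-day C, of the translation just described. It is an illustration written for this discussion, not a description of the DEC hardware or of any C.mmp software; for simplicity it treats all four banks as holding eight usable registers, whereas the actual Dmap provides 30, since part of bank 11 refers to the unmapped local Unibus space.

    /* Minimal sketch (illustration only): Dmap-style relocation of a 16-bit
       processor-generated byte address, plus two program status bits, into a
       21-bit physical byte address. */
    #include <stdint.h>

    #define PAGE_BITS  13                       /* 4,096 words = 8,192 bytes per page */
    #define PAGE_SIZE  (1u << PAGE_BITS)

    struct reloc_reg {
        uint8_t  phys_page;                     /* 8-bit physical page number            */
        unsigned nxm     : 1;                   /* trap on any access if set             */
        unsigned wprot   : 1;                   /* trap on (before) a write if set       */
        unsigned written : 1;                   /* set by hardware whenever page written */
    };

    /* Four banks of eight registers (simplified; the real Dmap has 30 usable). */
    static struct reloc_reg dmap[4][8];

    /* Returns the 21-bit physical address, or -1 to indicate a trap. */
    long dmap_translate(uint16_t vaddr, unsigned ps_bank, int is_write)
    {
        unsigned reg = vaddr >> PAGE_BITS;         /* top 3 bits select a register   */
        unsigned off = vaddr & (PAGE_SIZE - 1);    /* 13-bit offset within the page  */
        struct reloc_reg *r = &dmap[ps_bank & 3][reg];

        if (r->nxm || (is_write && r->wprot))
            return -1;
        if (is_write)
            r->written = 1;
        return ((long)r->phys_page << PAGE_BITS) | off;
    }

A reference through a register marked non-existent, or a write through a write-protected one, is reported here as a trap indication rather than an address, mirroring the register format described next.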
Alternatively, consider that two bits of the PS select one of four banks of relocation registers and the leftmost three bits of the user's (16-bit) address select one of the eight registers in this bank (six in bank three). A program may (by appropriate monitor calls) alter the contents of the relocation registers within that bank and thus alter its "instantaneous virtual memory," that is, the set of directly addressable pages. The format of each of the 30 relocation registers is also shown in Figure 2, where:

1. The 'written-into' bit is set (to 1) by the hardware whenever a write operation is performed on the specified page.
2. The 'write protect' bit, when set, will cause a trap on (before) an attempted write operation into the specified page.
3. The NXM ('non-existent memory') bit, when set, will cause a trap on any attempted access to the specified page. Note: this is not adequate for, nor intended for, 'page fault' interruption.
4. The 8-bit 'physical page number' is the actual relocation value.

THE MEMORY INTERFERENCE PROBLEM

One of the most crucial problems in the design of this multiprocessor is that of the conflict of processor requests for access to the shared memories. Strecker (1970) gives closed form solutions for the interference in terms of a defined quantity, the UER (unit execution rate). The UER is, effectively, the rate of memory references and, for the PDP-11, is approximately twice the actual instruction execution rate. (Although a single instruction may make from one to five memory references, about two is the average.) Neglecting i/o transfers*, assuming access requests to memories at random, and using the following mean parameters:

tp  the time between the completion of one memory request and the next request
ta, tc  the access time and cycle time for the memories to be used
tw = tc - ta  the rewrite time of the memory

Strecker gives the following relations. When each processor always has a request pending (tp = 0),

UER = (m/tc)(1 - (1 - 1/m)^p)

and, when the processor rest time tp is taken into account,

UER = (m/tc)(1 - (1 - Pm/m)^p)

where Pm is obtained from

Pm + (m/p)((tp + tw)/tc)(1 - (1 - Pm/m)^p) - 1 = 0.

Various speed processors, various types of memories, and various switch delays, td, can be studied by means of these formulas. Switch delay effects are calculated by adding td to ta and tc, i.e., ta' = td + ta and tc' = td + tc. For example, the following cases are given in the attached graphs. The graphs show UER x 10^6 as a function of p for various parameters of the memories. The two values of td shown correspond to the estimated switch delay in two cable-length cases: 10' and 20'. The tc, ta values correspond to six memory systems which were considered. The value of tp is that for the PDP-11 model 20. Given data of the form in Figures 3 and 4 it is possible to obtain the cost effectiveness of various processor-memory configurations. An example of this information for a particular memory configuration (16 memories, tc = 400 ns) and three different processors (roughly corresponding to three models of the PDP-11 family) is plotted in Figure 5. Note that a small configuration of five Pc.1's has a performance of 4.5 x 10^6 accesses/second (UER). The cost of such a system is approximately $375K, yielding a cost-effectiveness of 12. Replacing these five processors with the same number of Pc.3's yields a UER of 15 x 10^6 for about $625K, or a cost-effectiveness of about 24. Following this strategy provides a very cost-effective system once a reasonably large number of processors are used.
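As a rough numerical illustration (added for this discussion, and not part of Strecker's analysis or of the C.mmp measurements), the first relation can be evaluated directly. The sketch below ignores the processor rest time tp and the switch delay td, so it over-estimates the values plotted in Figures 3 through 5.

    /* Illustrative only: evaluate the saturated-case UER relation above.
       m = number of memory modules, tc = memory cycle time in seconds. */
    #include <math.h>
    #include <stdio.h>

    double uer_saturated(int p, int m, double tc)
    {
        return ((double)m / tc) * (1.0 - pow(1.0 - 1.0 / m, p));
    }

    int main(void)
    {
        /* 16 modules with a 650 ns cycle, for 5 and 15 processors. */
        printf("p =  5: UER = %.3g accesses/second\n", uer_saturated(5, 16, 650e-9));
        printf("p = 15: UER = %.3g accesses/second\n", uer_saturated(15, 16, 650e-9));
        return 0;
    }

As p grows the value approaches the m/tc ceiling of the memory system; the shortfall from that ceiling is the interference effect the graphs display.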
* A simple argument indicates that i/o traffic is relatively insignificant, and so has not been considered in these figures. For example, transferring with four drums or 15 fixed head disks at full rate is comparable to one Pc.

... accurate method under consideration is to associate a small memory with each crosspoint intersection. This can be constructed efficiently by having a memory array for each of the m rows, since control is on a row (per memory) basis. When each request for a particular row is acknowledged, a 1 is added to the register corresponding to the processor which gets the request. These data could then serve as input to algorithms of the type described under (1). Such a scheme has the drawback of adding hardware (cost) to the switch, and possibly lowering reliability. Since the performance measures given earlier are quite good, even for large numbers of processors, this approach does not seem justified at this time.

[Graph legend: Processor: tp = 700 ns (PDP-11 model 20); p = 1, 5, 10, ..., 35; Memory: number of memory modules = 8; (tc, ta) = (300, -), (400, 250), (650, 350), (900, 350), (1200, 500); td = 190, 270.]

A report by McCredie (McCredie, 1972) discusses two analytic models which have been used to study this problem; here we shall merely indicate the results. Figure 7 illustrates the relationship predicted by one of McCredie's models between the mean response time to a scheduling request, the number of critical sections, and the number of processors. Mean response time increases with the number of processors. For S constant, the increase in mean response time is approximately linear with respect to N until the system becomes congested. As N increases beyond this point, the slope grows with increasing N. The addition of one more critical section significantly improves mean response, for higher values of N, in both models. The additional locking overhead, L, associated with each critical section degrades performance slightly for small values of N. At these low values of N, the rate of requests is so low that the extra locking overhead is not compensated for by the potential parallel utilization of critical sections. The most interesting characteristic of these models is the large performance improvement achieved by the creation of a small number of additional critical sections. The slight response time degradation for low arrival rates indicates that an efficient design would be the implementation of a few (S = 2, 3 or 4) critical sections. This choice would create an effective safety valve. Whenever the load increased, parallel access to the data would occur and the shared scheduling information would not become a bottleneck. The overhead at low arrival rates is about 5 percent and the improvement at higher request rates is approximately 50 percent. Given the dramatic performance ratios predicted by these models, the HYDRA scheduler was designed so that S lies in the range 2-7 (the exact value of S depends upon the path through the scheduler).

PROGRAMMING ISSUES

Thus far both highly general and highly specific aspects of the hardware and operating system design of C.mmp have been described. These alone, however, do not provide a complete computing environment in which productive research can be performed. An environment of files, editors, compilers, loaders, debugging aids, etc., must be available. To some extent existing PDP-11 software can and will be used to supply these facilities. However, the special problems and potentials of a multiprocessor preclude this from being a totally appropriate set of facilities. The potential of true parallel processing obviously requires the introduction of language and system facilities for creating and synchronizing sub-tasks. Various proposals for these mechanisms have existed for some time, such as fork-join and "P" and "V", and they are not especially difficult to add to most existing languages, given the right basic hardware.
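For concreteness, the following is a minimal sketch of the P and V operations just mentioned, written here with present-day POSIX threads purely as an illustration; it is not the HYDRA implementation nor a PDP-11 mechanism.

    /* Minimal sketch of Dijkstra's P and V over a counting semaphore. */
    #include <pthread.h>

    typedef struct {
        int             count;
        pthread_mutex_t lock;
        pthread_cond_t  nonzero;
    } psem;

    void psem_init(psem *s, int initial)
    {
        s->count = initial;
        pthread_mutex_init(&s->lock, NULL);
        pthread_cond_init(&s->nonzero, NULL);
    }

    void P(psem *s)                  /* wait until the count is positive, then take one */
    {
        pthread_mutex_lock(&s->lock);
        while (s->count == 0)
            pthread_cond_wait(&s->nonzero, &s->lock);
        s->count--;
        pthread_mutex_unlock(&s->lock);
    }

    void V(psem *s)                  /* return one and wake a single waiter */
    {
        pthread_mutex_lock(&s->lock);
        s->count++;
        pthread_cond_signal(&s->nonzero);
        pthread_mutex_unlock(&s->lock);
    }

A semaphore initialized to one gives mutual exclusion over a critical section, which is the sense in which the scheduler discussion above uses the term.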
Parallelism has a more profound effect on the programming environment, however, than the perturbations due to a few language constructs. The primary impact of parallelism is in the increase in complexity of a system due to the possible interactions between its components. The need is not merely for constructs to invoke and control parallel programs, but for conceptual tools dealing with the complexity of programs that can be fabricated with these constructs. In its role as a substrate for a number of research projects, C.mmp has spawned a project to investigate the conceptual tools necessary to deal with complex programs. The premise of this research is that the approach to building large complex programs, and especially those involving parallelism, is essentially methodological in nature: the primitives, i.e., language features, from which a program is built are not nearly as important as the way in which it is built. Two particular methodologies, "top-down design" or "structured programming" (Dijkstra, 1969) and "modular decomposition" (Parnas, 1971), have been studied by others and form starting points for this research. While the solution to building large systems may be methodological, not linguistic, in nature, one can conceive of a programming environment, including a language, whose structure facilitates and encourages the use of such a methodology. Thus the context of the research has been to define such a system as a vehicle for making the methodology explicit. Although they are clearly not independent, the language and system issues can be divided for discussion.

Language issues

Most language development has concerned itself with "convenience," providing mechanisms through which a programmer may more conveniently express computation. Language design has largely abdicated responsibility for the programs which are synthesized from the mechanisms it provides. Recently, however, language designers have realized that a particular construct, the general goto, can be (mis)used to easily synthesize "faulty" programs, and a body of literature has developed around the theoretical and practical implications of its removal from programming languages (Wulf, 1971a). At the present stage of this research it is easier to identify constructs which, in their full generality, can be (mis)used to create faulty programs than to identify forms for the essential features of these constructs which cannot be easily misused. Other such constructs are:

Algol-like scope rules

The intent of scope rules in a language is to provide protection. Algol-like scope rules fail to do this in two ways. First, and most obviously, these rules do not distinguish kinds of access; for example, "read-only" access is not distinguished from "read-write" access. Second, there is no natural way to prevent access to a variable at block levels "inside" the one at which it is declared.
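The first of these failures is easy to exhibit. The fragment below is an illustration added for this discussion (C is used only for concreteness; the point applies to Algol-like scope rules generally): a variable intended to be read-only inside a nested block can nevertheless be overwritten, and nothing in the scope rules records or enforces the intent.

    #include <stdio.h>

    int main(void)
    {
        int limit = 100;                 /* intended as read-only below */
        {
            int i;
            for (i = 0; i < 10; i++)
                limit = i;               /* legal: "read-write" access is
                                            indistinguishable from "read-only" */
        }
        printf("%d\n", limit);           /* prints 9, not 100 */
        return 0;
    }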
Encoding

A common programming practice is to encode information, such as age, address, and place of birth, in the available data types of a language, e.g., integers. This is necessary, but it leads to programs which are difficult to modify and debug if the manipulation of these encodings is distributed throughout a large program.

Fixed representations

Most programming languages fix both syntactic and run-time representations; they enforce distinctions between macros and procedures, data and program, etc., and they provide irrevocable representations of data structures, calling sequences, and storage allocation. Fixed representations force programmers to make decisions which might better be deferred and, occasionally, to circumvent the fixed representation (e.g., with in-line code).

SYSTEMS ISSUES

Programming should be viewed as a process, not a timeless act. A language alone is inadequate to support this process. Instead, a total system that supports all aspects of the process is sought. Specifically, some attributes of this system must be:

(a) To retain the constructive path in final and intermediate versions of a program and to make this path serve as a guide to the design, construction, and understanding of the program. For example, the source (possibly in its several representations) corresponding to object code should be recoverable for debugging purposes; this must be true independent of the binding time for that code.

(b) To support execution of incomplete programs. A consequence of some of the linguistic issues discussed above is that decisions (i.e., code to implement them) will be deferred as long as possible. This must not pre-

[Table I, formatted sentences 2-3 (rows not recoverable). Sentence 3: INTEREST WAS INITIALLY FOCUSED ON CHANGES IN POTASSIUM; MORE RECENTLY, CHANGES IN CALCIUM HAVE BEEN RECOGNIZED TO BE OF GREAT IMPORTANCE.]

TABLE II-Formatted Sentences 4-6
Symbols as in Table I
[Rows not recoverable. Source sentences: 4. TOXIC DOSES OF DIGITALIS CONSISTENTLY REDUCE THE INTRACELLULAR CONCENTRATION OF POTASSIUM IN A WIDE VARIETY OF CELLS, INCLUDING CARDIAC MUSCLE CELLS. 5. THIS RESULTS FROM THE SLOWING OF THE INFLUX OF POTASSIUM INTO THE CELL. 6. CONCURRENTLY, INTRACELLULAR SODIUM AND WATER ARE INCREASED.]

TABLE III-Formatted Sentence 7
Symbols as in Table I
[Rows not recoverable. Source sentence: 7. IT IS NOT CERTAIN WHETHER THESE LINKED CHANGES IN SODIUM AND POTASSIUM ARE PRODUCED BY A SINGLE EFFECT OR ARE SEPARATELY MEDIATED.]
In (1.): While changes in cells produced by digitalis is ambiguous in English as a whole, it is not ambiguous in the sublanguage, since nouns in the class C (cells) do not occur as the subject of Vss verbs (produce). In the sublanguage the word changes only operates on quantity words Q (e.g., amount, rate) or on verbs which have an implicit Q on them. In the formats, therefore, change occupies the Vq position. In the format for sentence 1 this places internal milieu of cells in the Se position, suggesting that it contains an implicit Q and V. This is supported by the fact that the paraphrase changes in the amounts of X1, X2, ... in cells is an acceptable substitute for internal milieu of cells in all its textual occurrences in this sublanguage.

In (2.): The first part of sentence 2 contains lost repeated material (zeroing) which can be reconstructed because of the strong grammatical requirements on the superlative form: Most prominent have been changes in ... is filled out to Most prominent of these changes have been changes in .... These changes is a classifier sequence replacing the full repetition of sentence 1, which is then shown in the format as the first (zeroed) unary sentence of 2.

In (2.3-2.5): The word which indicates that changes (along with digitalis produces) has been zeroed is the repeated in after and. In 2.2, the V in Se is (have) concentration in (or: concentrate to some amount in), which in the sublanguage requires an object noun from the gross tissue-cell class T. Similarly in 2.3, the V fluxes (with unspecified P) requires an object noun from T. In the analyzed texts both of these Vs occurred almost exclusively with the noun cell as their object. The definitional connective that is between 2.2,3 and 2.4,5 supports substituting the word cell for T.

In (3.): The sublanguage requirements on the noun class I (potassium, sodium) as the first noun in Se1, when Se is operated on by Vq (changes), are that the verb be of the type V IT or V II and the second noun be of class T or I. The continuity of this sentence with its surrounding sentences suggests that the verb is V IT and the noun T (more specifically C: cell).

In (5.1): The pronoun this replaces the entire preceding sentence.

In (7.): These linked changes in sodium and potassium transforms into These changes in sodium and potassium which are linked. The portion up to which are linked is a classifier of the two preceding unary sentences, 6.1 and 5.2, pinpointed by the repetition of the words sodium and potassium in the classifier sequence. It is these two conjoined unary sentences which are operated on by a single effect produces in lines 7.1 and 7.2, and again by mediates separately, with unknown N subject, in lines 7.3 and 7.4. The portion which are linked applies to both occurrences of 6.1 and 5.2 in 7.1-4. The wh in which is the connective and the ich part is a pronoun for the two sentences, as indicated by { }. The fact that the sentences were reconstructed by use of a classifier is indicated by the < > inside the { } in 7.5-6. Although this sentence seems empty, it is common in scientific writing for a sentence to consist of references to previous sentences with new operators and conjunctions operating on the pronouned sentences. The linearity of language makes it difficult to express complex interconnections between the events (sentences) except with the aid of such pronouned repetitions of the sentences.

The appearance of a word like effect in the column usually filled by a pharmacological agent noun G may herald the future occurrence of a new elementary sentence or a new set of conjoined elementary sentences (classified by the word effect) which will intervene between G and the present Se. This appears to be one of the ways that new knowledge entering the subfield literature is reflected in the formats and the sublanguage grammar. In fact, in the work described here, the first investigation, which covered digitalis articles up to about 1965, showed certain sets of words (including mechanism, pump and, differently, ATPase) appearing in the No or Ds column as operators on Se. In articles investigated later, these nouns appeared increasingly as subjects of new Se subtypes listed above in the grammar, connected by conjunctions to the previously known Se. The shift of these words from occurring as operators to occurring in (or as classifiers of) new Se subtypes is the sublanguage representation of the advance of knowledge in the subfield.

ACKNOWLEDGMENTS

This work was supported by Research Grants R01 LM 00720-01, -02, from the National Library of Medicine, National Institutes of Health, DHEW. Important parts of the sublanguage grammar are the work of James Munz, to whom many of the results and methods are due.

REFERENCES

1 F W LANCASTER Evaluation of the Medlars demand search National Library of Medicine 1968
2 Proceedings of 1971 Annual Conference of the ACM pp 564-577
3 N SAGER Syntactic analysis of natural language Advances in Computers 8 F Alt and M Rubinoff eds Academic Press New York 1967
4 N SAGER The string parser for scientific literature Courant Computer Symposium 8-Natural Language Processing R Rustin ed Prentice-Hall Inc Englewood Cliffs N J In press
5 String Program Reports Nos 1-5 Linguistic String Project New York University 1966-1969
6 D HIZ A K JOSHI Transformational decomposition-A simple description of an algorithm for transformational analysis of English sentences 2eme Conference sur le Traitement Automatique des Langues Grenoble 1967
7 String Program Reports No 6 Linguistic String Project New York University 1970
8 A F LYON A C DEGRAFF Reappraisal of digitalis, Part I, Digitalis action at the cellular level Am Heart J 72 4 pp 414-418 1961

Dimensions of text processing*

by GARY R. MARTINS
University of California
Los Angeles, California

INTRODUCTION

Numerical data processing has dominated the computing industry from its earliest days, when computing might better have been called a craft than an industry. In those early days it was not uncommon for a mixed group of scientists and technicians to spend an entire day persuading a roomful of vacuum tubes and mechanical relays to yield up a few thousand elementary operations on numbers. The emphasis on numerical applications was a wholly natural consequence of the dominant interests of the men and women who designed, built, and operated those early computing machines.

Within a single generation, things have changed dramatically. Computing machines are vastly more powerful and reliable, and easier to use thanks to the efforts of the software industry. Perhaps of equal importance, access to computers can now be taken for granted in the more prestigious centers of education, commerce, and government. And we may be approaching the day when computing services will be as widely available as the telephone. But it is still true that numerical data processing, "number crunching" in one form or another, is the principal application for computers.

That, too, is changing, however. Due principally, I think, to the highly diversified needs and interests of the greatly expanded community of computer users, the processing of textual or lexicographic materials already consumes a significant percentage of this country's computing resources, and that share is rising steadily. By the end of this decade, if not before, text processing of various kinds may well become the main application of computers in the United States. This prediction, no doubt, carries the ring of authentic good news for those of us with strong interests in one or another of the many kinds of textual data processing. But we must face the fact now that there remains a serious and large-scale educational task that must be undertaken if the future growth of textual data processing is to fulfill the high hopes for it that we now entertain.

Text processing tasks and systems are too often considered in isolation from one another, with the results that (1) much design and implementation work needlessly duplicates prior accomplishments, and (2) potentially useful generalizations and extensions of existing systems for new applications are overlooked. This is a tutorial paper, then. My purpose is to take a broad view of the text processing field in such a way as to emphasize the relations among different systems and applications. The structure of these relationships will be embedded in an informal descriptive space of two dimensions. In the interests of focussing attention on the unifying character of this framework, however imperfect and incomplete it surely is, I shall avoid the discussion of the internal details of specific systems.

TEXTUAL DATA PROCESSING

By "textual data processing" I mean a computing process whose input consists entirely or substantially of character strings. For the most part, it will be convenient to assume that this textual input represents natural language expressions, such as, for example, sentences in English. All kinds of systems running today fit this deliberately broad and usefully loose definition: programs to automatically make concordances, compile KWIC indexes, translate between languages, evaluate personnel reports, drive linotype machines, abstract documents, answer questions, perform content analysis, route documents, search libraries, and edit manuscripts. I am sure everyone here could add to this list. It will be instructive to include programming language compilers in our discussion, as well.

* Research reported herein was conducted under Contract # F30602-70-C-0016 with the Advanced Research Projects Agency, Department of Defense.
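As a toy illustration of how such co-occurrence constraints can be applied mechanically (this sketch is constructed for the present discussion and is not part of the Linguistic String Project programs), a small table of permitted verb-subject class pairs suffices to reject the reading in which cells is the subject of produce:

    /* Toy rendering of a sublanguage co-occurrence check.  Class names follow
       the text above: G (agent nouns such as digitalis), C (cells), I (ions),
       Q (quantity words), VSS (verbs such as produce). */
    #include <stdio.h>

    enum wclass { G, C, I, Q, VSS, NCLASSES };

    /* allowed_subject[v][s] is nonzero if a noun of class s may occur as the
       subject of a verb of class v; only the VSS row is filled in here. */
    static const int allowed_subject[NCLASSES][NCLASSES] = {
        [VSS] = { [G] = 1 },        /* VSS verbs take G subjects, not C, I, or Q */
    };

    int reading_permitted(enum wclass verb, enum wclass subject)
    {
        return allowed_subject[verb][subject];
    }

    int main(void)
    {
        /* "changes in cells produced by digitalis": which noun is the subject
           of "produced"?  Only the G reading survives the check. */
        printf("cells as subject of produce:     %d\n", reading_permitted(VSS, C));
        printf("digitalis as subject of produce: %d\n", reading_permitted(VSS, G));
        return 0;
    }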
801 802 Fall Joint Computer Conference, 1972 TWO DIMENSIONS OF TEXT PROCESSING An important set of relationships among these highly diverse activities can be clarified by locating them in an informal space of two dimensions: scope and depth of structure. The dimension of scope has to do with the magnitude of the task to be performed: the size of the problem domain, and the completeness of coverage of that domain. To illustrate, an operating productionoriented Russian-to-English machine translation system has a potentially vast input domain, namely, all the sentences of Russian. But an experimental model for such a system, containing only a tiny dictionary and a few illustrative grammar rules-something concocted for a demonstration, perhaps-has a highly restricted domain. The scope of the two systems differs greatly, with important consequences which we shall consider in a moment. The second of our dimensions measures the richness of structure or "vertical integration" developed for the text. This is essentially a linguistic dimension, reflecting the fact that the text itself is made up of natural language expressions of one kind or another. This dimension does not take into account simple linear physical divisions of text, such as the "lines" and "pages" of text editing systems like TECO.l Rather, it measures an essentially non-linear hierarchy of abstract levels of structure which define the basic units of interest in a given application. Scope Generally, the dimension of scope as applied to the description of text processing systems does not differ in any systematic way from the notion of scope as applied to other systems. It enables us to express the relative magnitude of the problem domain in which the system can be of effective use. There are two key factors to be considered in estimating the scope of a particular system. The first of these has to do with the generality of input data acceptable to the system. If the acceptable input is heavily restricted, the scope of the system is relatively small. Voice-actuated input terminals have been rather vigorously promoted during the past few years; their scope is small indeed, being limited to the effective recognition of a very small set of spoken words (names of digits and perhaps a few control words), and often demanding a mutual "tuning" of the terminal and its operators. Programming language compilers provide another rather different example of systems with limited scope in terms of acceptable input; both the vocabulary and syntax of acceptable statements are rigidly and narrowly defined. In contrast, text editing systems in general have wide scope in terms of acceptable input. The other factor which plays an important role in determining the scope of text processing (or other) systems has to do with the convenience and flexibility of the interface between the system and its users. Obviously, this factor will be of lesser importance in the evaluation of systems operated as a service in a closedshop batch-processing environment. It will be of major importance in relation to systems with which users become directly involved, interactively or otherwise. Compilers are a good example of such systems, as are such widely-used statistical packages as SPSS2 and BMD,3 and interactive processors like BBN-LISP,4 BASIC5 and JOSS. 
6 More to our present point are text-editing systems such as TECO, QED,7 WYLBUR,8 HYPERTEXT9 and numerous others; in terms of acceptable data input, these latter systems impose few restrictions, but they may be said to differ significantly in overall scope on the basis of differences in their suitability for use by the data processing community at large. It takes a sophisticated programmer with extensive training to make use of TECO's powerful editing and filing capabilities, for example; this restricts the system's scope. A most ambitious assault on this aspect of the problem of scope in text editing systems is that of Douglas Englebart and his colleagues at Stanford University;lO a review of their intensive and prolonged efforts should convince anyone of the serious nature of the difficulties involved in widening the general accessibility of text editing systems. I am sure we have all had experiences with text processing systems of very restricted scope. A decade ago, it was a practice of some research organizations to arrange demonstrations of machine translation systems; in some memorably embarrassing instances, the scope of these systems was unequal to even the always quite carefully hedged, and sometimes entirely pre-arranged, test materials allowed as input. More commonly, we may have written or used text processing programs of one kind or another which were created in a deliberately "quick and dirty" fashion to answer some special need, highly localized in space and time. It is important to note that the highly restricted scope of such "one-shot" programs in no way diminishes their usefulness; given the circumstances, it may indeed have involved an extravagant waste of resources to needlessly expand their scope in either of the two ways I have mentioned. While it may be quite difficult to measure the relative scope of different text processing systems, at least the basic notions involved are simple: breadth of acceptable input data and, where appropriate, the breadth of the audience of users to which the system is effectively addressed. Let us now review the depth of structure dimension of text processing, where the basic notions involved may be somewhat less familiar. Dimensions of Text Processing Depth of Structure We may assume that the text to be processed first appears in the system as a continuous stream of characters. It will rarely be the case that our interest in the data will be satisfied at this primitive level of structure. We will most often be interested in words, or phrases, or sentences, or meanings, or some other set of constructs. Since these are not given explicitly in the character stream, it will be necessary to operate on the character stream to derive from it, or assign to it, the kinds of structures that will answer our purposes. The lowest level of structure, in this sense, consists of the sequence of characters in the stream. The highest level of structure might perhaps involve the derivation of the full range of meanings contained in and implied by the text. Between these extremes we may define a variety of useful and attainable structures. The dimension along which we measure these differences is that to which I have given the somewhat clumsy name of depth of structure. The number of useful applications of text processing at the lowest level of structural depth-the character stream-is quite large. Most text editing systems operate at this level. 
Other more specialized applications include the development of character and string occurrence frequencies, of use principally to crypt analysts and communications engineers. But, for applications which cannot be satisfied by simple mechanical patternseeking and matching operations, we must advance at least to the next level of structure, that of the pseudo-word. Pseudo-words The ordinary conventions of orthography and punctuation enable us to segment the character stream into word-like objects, and also into sentences, paragraphs, etc. The word-like objects, or pseudo-words, may be physically defined as character strings flanked by blank characters and themselves containing no blank characters. This is still a fairly primitive level of structure, and yet it suffices for many entirely respectable applications. Concordances have often been made by programs operating with this level of textual structure, for example. But there are serious limitations on the results that can be achieved. It is not possible, for instance, for the computer to determine that "talk" and "talked" and "talks" are simply variants of the same basic word, and that they should therefore be treated similarly for some purposes. The same difficulty appears in a more refractory form with the items "go" and "went." Thus, if our intended application will require 803 the recognition of such lexicographic variants as members of the same word family, it will be necessary to approach the next more or less clearly defined level of textual structure, that of true words. Word recognition There are two principal tools used for the recognition of words in text processing: morphological analysis and dictionaries. They come in many varieties, large and small, expensive and cheap. Either may be used without the other. The Stanford Inquirer content-analysis systemll employs a very crud£ kind of morphological analysis which consists in simply cutting certain character strings from the ends of pseudo-words, and treating the remainder as a true word. This procedure is probably better than nothing at all, but it can produce some bizarre confusions; for example, the letter "d" is cut from words ending in "ed," on the assumption that the original is the past tense or past participle form of a verb. "Bed" is thus truncated to "be," and so on. The widely used and highly successful KWIC12 indexing system operates with a crude morphological analysis of essentially this kind. ]\l10re sophisticated morphological analysis is attainable through the use of more flexible criteria by which word-variants are to be recognized, at the cost of a correspondingly more complex programming task. But it is hard to imagine what sort of rules would be needed to cope with the so-called strong verbs of English. The best answer to this problem is the judicious use of a dictionary together with the morphological analysis procedures. 13 In systems employing both devices, a pseudo-word is typically looked up in the dictionary before any analysis procedures are applied to it. If the word is found, then no further analysis is required, and the basic form of the word can be read from the dictionary, perhaps together with other kinds of in.. formation. Thus, "bed" would be presumably found in the dictionary, preventing its procrustean transformation to "be." Likewise, "went" would be found as a separate dictionary entry, along with an indication that it is a variant of "go." 
On the other hand, "talked" would presumably not be found in the dictionary, and morphological analysis rules would be applied to it, yielding "talk"; this latter would be found in the dictionary, terminating the analysis. Here I have tacitly assumed that our dictionary contains all the words of English, or enough of them to cover a very high percentage of the items encountered in our input text. In fact, there are very few such dictionaries in existence in machine-usable form. The reason is twofold: on the one hand, they are very 804 Fall Joint Computer Conference, 1972 expensive to create, and generally difficult to adapt, requiring a level of skills and time available only to the more heavily endowed projects; on the other hand, while they provide an elegant and versatile solution to the problems of word identification, most current text processing applications simply do not require the degree of versatility and power that large-scale dictionaries can provide. A number of attempts have been made in the past to build automatic document indexing and dissemination systems based upon the observed frequencies of words in the texts. In these and similar systems, it was found to be necessary to exclude a number of very frequent "non-content-bearing" words of English from the frequency tabulations-such words as "the," "is," "he," "in," which we might collectively describe as members of closed syntactic classes: pronouns, demonstratives, articles, prepositions, along with a few other high frequency words of little interest for the given applications. The exclusion of these words is accomplished through the use of a "stop list," a mini-dictionary of irrelevant forms. Such small, highly specialized dictionaries are easily and cheaply constructed, and have proven useful in a wide variety of applications. Automatic typesetting systems provide another good example of a useful text processing application operating at the level of the word on the dimension of depth of structure. The key problem for these systems is that of correctly hyphenating words to make possible the right and left justification of news columns. Morphological analysis of a quite special kind is employed to determine the points at which English words may be broken, and this analysis is often supplemented with small dictionaries of exceptional forms. Another more sophisticated but closely related application of rather narrow but compelling interest is that of automatically translating English text into Braille. Once again, specialized word analysis, supplemented by relatively small dictionaries of high frequency words and phrases, have been the tools brought to bear on the problem. Before moving on to consider text processing applications of higher rank on the scale of depth of structure, I should like to pause for a moment to comment on what I believe to be a wide-spread fallacy concerning text processing in general. Somehow, a great many people, in and out of the field of text processing, have come to associate strong notions of prestigiousness exclusively with systems ranking at the higher end of the dimension of depth of structure. It is hard to know how or why this attitude has developed, unless it is simply a reflection of a more general fascination with the obscure and exotic. But it would be most unfortunate if capable and energetic people were for this reason diverted from attending to the many still unrealized possibilities in text processing on the levels we have been discussing. 
We have only to compare the widespread usefulness of text processing systems operating at the word level and below with the generally meagre practical contributions of systems located further along this dimension to dispel the idea that there is greater intrinsic merit in the latter systems. In now moving further along the dimension of depth of structure, we leave behind a broad spectrum of highly practical and useful systems that sort and edit text, make indexes, classify and disseminate documents, prepare concordances, set type, translate English into Braille, perform content analysis, make elementary psychiatric diagnoses from writing samples, assist in the evaluation of personnel and medical records, and routinely carry out many other valuable tasks. It may be only moderately unjust to repeat here a colleague's observation that, in contrast, the principal product of the systems we are about to consider has been doctoral dissertations. Syntax Syntax is, roughly speaking, the set of relations that obtain among the words of sentences. For some applications in the text processing field, syntactic information is simply indispensable. The difference between "man bites dog" and "dog bites man" is a syntactic difference; it is a difference of no account in applications based upon word frequencies, for example, but it becomes crucial when the functions or roles of the words, in addition to the words themselves, must be considered. Syntactic analysis is most neatly accomplished when the objects of analysis have a structure which is rigidly determined in advance. The syntactic structure of valid ALGOL programs conforms without exception to a set of man-made rules of structure. The same is true of other modern programming languages, and of artificial languages generally. This fact, together with the tightly circumscribed vocabularies of such languages, makes possible the development of very efficient syntactic analyzers for them. Natural languages are very different, even though some artificial languages, especially query and control languages, go to great lengths to disguise the difference. In processing ordinary natural language text we are confronted with expressions of immense syntactic complexity. And, while most artificial languages are deliberately constructed to avoid ambiguity, ordinary text is often highly ambiguous; indeed, ambiguity is a vital and productive device in human communication. The syntactic analysis of arbitrary natural language text is therefore difficult, expensive, and uncertain. It will Dimensions of Text Processing come as no great surprise, then, that text processing systems that require some measure of syntactic analysis seldom carry the analysis further than is needed. Further, the designers of such systems have defined their requirements for syntactic analysis in a variety of ways. The result is that existing natural language text processing systems embody a great variety of analysis techniques. To some extent, this situation has been further complicated by debates among linguists as to what constitutes the correct analysis of a sentence, though the influence of these polemics has been minor. Over the past decade, and especially over the past five years or so, techniques for the automatic syntactic analysis of natural language text have improved rather dramatically, and are flexible enough today to accommodate a variety of linguistic hypotheses. 
Earlier, in discussing the place of the dictionary in the identification of words, I mentioned that such dictionaries might carry other information in addition to the word's basic form. Often, this other information is syntactic, an indication of the kinds of roles the word is able to play in the formation of phrases and sentences. These "grammar codes," as they are often called, are analogous to the familiar "part of speech" categories we were taught in elementary school, though in modern computational grammars these distinct categories may be numerous. A given word may be assigned one or more grammar codes, depending upon whether or not it is intrinsically ambiguous. A word like "lucrative" is unambiguously an adjective. But "table" may be a noun or a verb. A word like "saw" exhibits even greater lexical ambiguity: it may be a noun or either of two verbs in different tenses. The process of syntactic analysis, or parsing, generally begins by replacing the string of words-extracted from the original character stream as described earlier-by a corresponding string of sets of grammar codes. It then processes these materials one sentence at a time. Parsing in general does not cross sentence boundaries for the simple, though dismaying, reason that we know very little about the kinds of rule-determined connections between sentences, if indeed there are any of substance. On the other hand, the sentence is the smallest really satisfactory unit of syntactic analysis since we can be confident of our results for one part of a sentence only to the degree that we have successfully accounted for the rest of it, much as one is only sure of the solution to a really difficult crossword puzzle when the whole of it has been worked out. If a sentence consists entirely of lexically unambiguous words-a rarity in English-then there is only a single string of grammar codes for the parser to consider. More commonly, the number of initially possible grammar code sequences is much higher; it is, in fact, 805 equal to the product of the number of distinct grammar codes assigned to each word. Whatever the number, the parser must consider each of the possible sequences in turn, first assembling short sequences of codes into phrases-such as noun phrases or prepositional phrases -and then assembling the phrases into a unified sentential structure. At each step of the way, the par.ser is engaged in matching the constructs before it (i.e., word or phrase codes) against a set of hypotheses regarding valid assemblies of such constructs. The set of hypotheses is, in fact, the grammar which drives the parsing process. When a string of sub-assemblies corresponds to such a hypothesis (or grammar rule), it is assembled into a unit of the specified form, and itself becomes available for integration into a broader structure. To illustrate, consider just the three words "on the table." The parser, first of all, sees not these words, but rather the corresponding string of grammar code sets: PREPOSITION ARTICLE NOUN/VERB.* Typically, it may first check to see whether it can combine the first two items. A table of rules tells the parser, as common sense tells us, that it cannot, since "on the" is not a valid English phrase. So, it considers the next pair of items; the ambiguity of the word "table" here requires two separate tests, one for ARTICLE NOUN and the other for ARTICLE VERB. The former is a valid combination, yielding a kind of NOUN-PHRASE ("the table"). The ARTICLE + VERB combination is discarded as invalid. 
Now the parser has before it the following: PREPOSITION NOUN-PHRASE. Checking its table of rules, it discovers that just this set of elements can be combined to form a PREPOSITIONAL-PHRASE, and the process ends-successfully. This skeletal description of the parsing process is considerably oversimplified, and it omits altogether some important distinctive characteristics of parsing techniques which operate by forming broader structural hypotheses and thus playa more "aggressive" role in the analysis. The end result, if all goes well, is the same: an analysis of the input sentence, usually represented in the form of a labelled tree structure, which assigns to each word and to each phrase of the sentence a functional role. Having this sort of information, we are able to accurately describe the differences between "man bites dog" and "dog bites man" in terms of the different roles the words play in them. In this simple case, of course, the SUBJECT and OBJECT roles are differentially taken by the words "dog" and "man." I remarked earlier that few production-oriented systems incorporate large-scale dictionaries. The same + + * The notation "X/Y" is used here to indicate an item that may belong either to category X or to category Y. 806 Fall Joint Computer Conference, 1972 is true of syntactic analysis programs; large-scale sentence analyzers are still mainly experimental. The parsing procedures of text processing systems that are widely used outside the laboratory, with a very few interesting exceptions, are designed to produce useful partial results of a kind just adequate to the overall system's requirements. Economies of design and implementation are usually advanced as the reasons for these limited approaches to syntactic analysis. A meritorious example of limited syntactic analysis is provided by the latest version of the General Inquirer, probably the best known and most widely used of content-analysis systems. The General Inquirer-3,14 as it is called, embodies routines which are capable of accurately disambiguating a high percentage of multiple-meaning words encountered in input text. This process is guided by disambiguation rules incorporated in the system's large-scale dictionary, 15,16 the Harvard Fourth Psychosociological Dictionary. These rules direct a limited analysis of the context in which a word appears, using lexical, syntactic, and semantically-derived cues to arrive at a decision on the intended sense of the word. In this manner, nine senses are distinguished for the word "charge," eight for "close," seven for "air," and so on. There are language translating machine in daily use by various government agencies in this country. These machine translation systems are basically extensions of techniques developed at Georgetown University in the early 1960's. Their relatively primitive syntactic capabilities are principally aimed at the disambiguation of individual words and phrases, a task which they approach-in contrast with the General Inquirer-3with an anachronistic lack of elegance, economy, or . speed. For most purposes, the output of these systems passes through the hands of teams of highly-skilled bilingual editors who have substantial competence in the subject matter of the texts they repair. A most valuable characteristic of the government's machine translation systems is the set of very-large-scale machine-readable dictionaries developed for them over the course of the years. 
It is to be expected that major portions of these will prove to be adaptable to the more modern translation systems that will surely emerge in the years ahead. A working system employing full-blown syntactic analysis is the Lunar Sciences Natural Language I nformation System. 17 This system accepts information requests from NASA scientists and engineers about such matters as the chemical composition and physical structure of the materials retrieved from the moon. The queries are expressed in ordinary English, of which the system handles a rich subset. Unrecognizable structures result in the user's being asked to rephrase his query. The requests are translated into an internal format which controls a search of the system's extensive data base of lunar information. SeInantics The next milestone along the dimension of depth of structure is the level of semantics, which has to do with the meanings of expressions. Although semantic information of certain kinds has sometimes been used in support of lexical and syntactic processes (as, for example, in the disambiguation procedures of the General Inquirer-3), the number of working systems, experimental or otherwise, which process text systematically on the semantic level is close to zero. Those which do so impose strong restrictions on the scope of the materials which they will accept. The aforementioned lunar information system has perhaps the widest scope of any running system that operates consistently on the semantic level (with natural language input), and its semantics are closely constrained in terms of the concepts and relations it can process. More flexible semantic processors have been constructed on paper, and a few of these have been implemented in limited experimental versions/ 8 ,19,20,21 The more promising of these systems, such as RAND's MIND system,22,23 are based upon the manipulation of semantic networks in which generality is sought through the representation of individuals, concepts, relations, rules of inference, and constructs of these as nodes in a great labelled, directed graph whose organization is at bottom linguistic. It is an unsolved problem whether such an approach can produce the needed flexibility and power while avoiding classical logical problems of representation. The applications to which full-scale semantic processors might one day respond include language translation and question-answering or fact retrieval. At present, the often-encountered popular belief in superintelligent machines, capable of engaging in intelligent discourse with men, is borne out only in the world of science fiction. Beyond seInantics Pushing further along the dimension of depth of structure, beyond the rarified air of semantics, we approach an even more exotic and sparsely populated realm which I shall call pragmatics. There will be little agreement about the manner in which this zone should be delimited; I would suggest that systems operating at this level are endowed with a measure of operational Dimensions of Text Processing self-awareness, taking an intelligent view of the tasks confronting them and of their own operations. A remarkable example of such a system is Winograd's program24 which simulates an intelligent robot that manipulates objects on a tabletop in response to ordinary English commands. The tabletop, the objects (some blocks and boxes), and the system's "hand" are all drawn on the screen of a video console. The objects are distinguished from one another by position, size, shape, and color. 
In response to a command such as "Put the large blue sphere in the red box at the right," the "hand" is seen going through the motions necessary to accomplish this task, possibly including the preliminary removal of other objects from the indicated box. What is more, the system has a limited but impressive ability to discuss the reasons behind its behavior:

Q: Why did you take the green cube out of the red box?
A: So I could put the blue sphere in it.
Q: Why did you do that?
A: Because you told me to.

We should note that the scope of this system is in many ways the most restricted of all the systems we have mentioned, an observation which subtracts nothing from its great ingenuity.

SOME OBSERVATIONS

Now let me share with you a number of more or less unrelated observations concerning text processing systems which have been suggested by the two-dimensional view of the field which I have just outlined.

Information and depth of structure

We process text in order to extract in useful form (some of) the information contained in the text. In general, the higher a text processing system ranks on the dimension of depth of structure, the greater is the amount of information extracted from a given input text. This seems an intuitively acceptable notion, but I believe it can be given a more or less rigorous restatement in the terms of information theory. In sketching one approach to this end I shall attack the problem at its most vulnerable points, leaving its more difficult aspects as a challenge for others. Consider a system which processes text strictly on a word-by-word basis, like older versions of the General Inquirer, for example. Such a system will produce identical results for all word-level permutations of the input text. But arbitrary permutations of text in general result in a serious degradation of information.† We conclude that text processing systems which operate exclusively at this level of depth of structure are intrinsically insensitive to a significant fraction of the total information content of the original text. Similarly, syntactic processors whose operations are confined to single sentences (as is generally the case) are obviously insensitive to information which depends upon the relative order of sentences in the original text. And so on. These considerations are of interest primarily because of their potential use in the development of a uniform metric for the dimension of depth of structure.

The dimensions are continuous*

I want to suggest the proposition that neither of our dimensions is discrete, in the sense that it presents only a fixed number of disjoint positions across its range. That the scope of systems, in our terms, varies over a dense range is surely not a surprising idea. But the notion is fairly widespread, I believe, that the depth of structure of text processors can be described only in terms of a fixed number of discrete categories or "levels." Unfortunately, the use of the terms "syntactic level," "semantic level," etc. is difficult to avoid in discussing this subject matter, and it is perhaps the promiscuous use of this terminology which has contributed most to misunderstanding on this point.**

† It may not be easy to usefully quantify this information loss. The number of distinct word-level permutations of a text of n words is given by n!/(f1! f2! ... fm!), where fj is the number of occurrences of the j-th most frequent word in a text with m distinct words.
For word-level permutations, this denominator expression might be generalized on the basis of the laws of lexical distribution (assuming a natural language text), replacing the factorials with gamma function expressions. After that, one faces the thorny empirical questions: how many such permutations can be interpreted, wholly or in part, at higher levels of structure, and how "far" are these from the original?

* The term "continuous" is not meant to support its strict mathematical interpretation here.

** In the possibly temporary absence of a more satisfactory solution, a linear ordering for the dimension of depth of structure is derived informally in the following way. First, the various "levels" are mutually ordered in the traditional way (sublexical, lexical, syntactic, semantic, pragmatic, ...) on the empirical grounds that substantial and systematic procedures on a given "level" are always accompanied by (preparatory) processing on the preceding "levels," but not vice-versa. Various theoretical arguments why this should be so can also be offered. Then, within a given "level" we may find various ways to rank systems with respect to the thoroughness of the job they do there.

As I have tried to indicate, there is in fact considerable variation among existing text processing systems in the degree to which they make use of information and procedures appropriate to these various levels. There are programs, for example, which tread very lightly into the syntactic area, making only occasional use of very narrowly circumscribed kinds of syntactic processes. Such programs are surely not to be lumped together with those which, like the MIND system's syntactic component,25 carry out an exhaustive analysis of the sentential structure, merely because they both do some form of what we call syntax. The same is true of the other levels along the dimension of depth of structure; the great variety of actual implementations defies meaningful description in terms of a small number of utterly distinct categories. We use the terminology of "levels" because, in passing along this dimension of depth of structure we pass in turn a small number of milestones with familiar names; but it is a mistake to imagine that nothing exists between them. Now I want to conjecture that it is precisely the quasi-continuous nature of this dimension which has helped to sabotage the efforts of researchers and system designers over the years to bring into being a small number of nicely finished text processing modules out of which everyone might construct the working system of his choice. The ambition to create such a set of universal text processing building blocks is a noble one and, like Esperanto, much can be said in its favor. But those who have worked on the realization of this scheme have not enjoyed success commensurate with the loftiness of their aims. Why this unfortunate state of affairs? I believe it can be traced to the generally unfounded notion in the minds of text processing system designers, and their sponsors, that valuable economies of talent and time and money can be achieved by creating systems which, in effect, advance as little as possible along this dimension in order to get by. In fact, as is evident, for example, in some recent machine translation products, this attitude may be productive of needlessly complex and inflexible systems that are obsolescent upon delivery.
Of course, in many cases differences in hardware and software have prevented system designers from making use of existing programs. And in some cases the point might be made that considerations of operating efficiency dictated a "tight code" approach, ruling out the incorporation of what appears to be the unnecessary power and complexity of overdesigned components. But many of the systems in this field are of an experimental nature, where operating efficiency is relatively unimportant. And it often happens in the course of systems development that our initial estimate of depth of structure requirements turns out to be wrong; that is how "kluges" get built. In such instances, the aim at local economies results in a global extravagance. A partial remedy is for us to become familiar with the spectrum of requirements that systems designers face along this dimension of depth of structure, and to learn (1) how to build adaptable processing modules, and (2) how to tune these to the needs of individual systems. I invite you to join me in the belief that this can and should be done.

Lines of comparable power

Let us consider for a moment the four "corners" of our hypothetical two-dimensional classification space. Since we have no interpretation of negative scope or negative depth of structure, we will locate all systems in the positive quadrant of the two-dimensional plane. At the minimum of both axes, near the origin, we might locate, say, a character-counting program written as a week-end project by a library science student in a mandatory PL/1 course. High in scope, and low in depth of structure are text editing programs in general. Let us somewhat arbitrarily locate Englebart's editing system in this corner, on the basis of its strong user orientation. Low in scope, but high in depth of structure: this practically defines Winograd's tabletop robot simulator. The domain of discourse of this system is deliberately severely restricted, but it surpasses any other system I know of in its structural capabilities. High in both scope and depth of structure: in the real world, no plausible candidate exists. We might imagine this corner of our space filled by a system such as HAL, from the movie "2001"; but nothing even remotely resembling such a system has ever been seriously proposed. The manner in which real-world systems fit into our descriptive space suggests that some kind of trade-off exists between the two dimensions; perhaps it is no accident that the system having the greatest depth-of-structure capabilities is so severely restricted in scope, while the systems having the greatest scope operate at a low level of structural depth. It is my contention that this is indeed not an accident, but that it reflects some important facts about our ability to process textual information automatically. It would seem that, given the current state of the art, we can, as system designers, trade off breadth of scope against advances in structural depth, and vice versa, but that to advance on both fronts at once would require some kind of genuine breakthrough. This trading relationship between the dimensions can be expressed in terms of lines of comparable power or sophistication. Having the shape of hyperbolas asymptotic to the axes of our descriptive space, such lines would connect systems whose intrinsic power or sophistication differs only by virtue of a different balance between scope and depth of structure.
The state of the art in the field of text processing might then be characterized by the area under the line connecting the most advanced existing systems. Since genuine breakthroughs are probably not more common in this field than in others, our analysis supports the conclusion that run-of-the-mill system design proposals which promise to significantly extend our automatic text processing capabilities in both scope and depth of structure are probably ill-conceived, or perhaps worse. Yet proposals of this kind are not uncommon, and a number of them attract funds from various sources every year. I feel sure that a better understanding of the dimensions of text processing on the part of sponsoring agencies as well as system designers might result in a healthier and more productive climate of research and development in this field.

Men and machines

Finally, I want to simply mention a set of techniques which can be of inestimable value in breaking through the state-of-the-art barrier in text processing, and to indicate their relation to our two-dimensional descriptive space. I have in mind the set of techniques by which effective man-machine cooperation may be brought to bear in a particular application. It has for some time been known that the human cognitive apparatus possesses a number of powerful pattern-recognition capabilities which have not even been approached by existing computing machinery. A number of projects have investigated the problems of marrying these powers efficiently with the speed and precision of computers to solve problems which neither could manage alone. In the field of textual data processing, the potential payoff from such hybrid systems, if you will permit me the phrase, increases greatly as we consider higher levels along the dimension of depth of structure. We humans take for granted in ourselves capabilities which astound us in machinery; most 3-year-old children could easily out-perform Winograd's robot simulator, for example. Whereas at the lower levels of this dimension, in tasks like sorting, counting, string replacement, and what not, no man can begin to keep up with even simple machines.
I conclude from these elementary observations that well-designed man-machine systems can greatly extend the scope of systems at the higher end of the dimension of depth of structure, or (to put it in another way) can upgrade the structure-handling capacity of systems having considerable scope. While the design of effective hybrid systems for text processing involves many considerable problems, this approach seems to offer a means of bringing the unique power of computers to bear on applications which now lie on the farther side of the state-of-the-art barrier with respect to fully automatic systems.

ACKNOWLEDGMENTS

The writing of this paper was generously supported by the Center for Computer-Based Behavioral Studies at the University of California at Los Angeles, and was encouraged by the Center's Director, Gerald Shure. I am indebted to Martin Kay and Ronald Kaplan of the RAND Corporation and to J. L. Kuhns of Operating Systems, Inc. for illuminating discussion of the subject matter. The errors are all my own.

REFERENCES

1 Text editor and corrector reference manual (TECO) Interactive Sciences Corporation Braintree Mass 1969
2 N H NIE D H BENT C H HULL SPSS: Statistical package for the social sciences McGraw-Hill New York 1970
3 W J DIXON editor BMD: Biomedical computer programs University of California Publications in Automatic Computation Number 2 University of California Press Los Angeles 1967
4 D G BOBROW D P MURPHY W TEITELMAN The BBN-LISP system Bolt Beranek & Newman BBN Report 1677 Cambridge Massachusetts April 1968
5 PDP-10 BASIC conversational language manual Digital Equipment Corporation DEC-10-KJZE-D Maynard Massachusetts 1971
6 PDP-10 algebraic interpretive dialogue conversational language manual Digital Equipment Corporation DEC-10-AJCO-D Maynard Massachusetts 1970. The AID language in this reference is an adaptation of the RAND Corporation's JOSS language.
7 QED reference manual Com-Share Reference 9004-4 Ann Arbor Michigan 1967
8 WYLBUR reference manual Stanford Computation Center Stanford University Stanford California revised 3rd edition 1970
9 W D ELLIOT W A POTAS A VAN DAM Computer assisted tracing of text evolution Proceedings of 1971 Fall Joint Computer Conference Vol 37
10 D C ENGLEBART W K ENGLISH A research center for augmenting human intellect Proceedings of 1968 Fall Joint Computer Conference Vol 33
11 O HOLSTI R A BRODY R C NORTH Theory and measurement of interstate behavior: A research application of automated content analysis Stanford University May 1964
12 P L WHITE KWIC/360 IBM Program Number 360D-06.7.(014/022) IBM Corporation St Ann's House Parsonage Green Wilmslow Cheshire England United Kingdom
13 M KAY G R MARTINS The MIND system: The morphological-analysis program The RAND Corporation RM-6265/2-PR April 1970
14 P J STONE D C DUNPHY M S SMITH D M OGILVIE et al The general inquirer: A computer approach to content analysis MIT Press Cambridge 1966
15 E F KELLY A dictionary-based approach to lexical disambiguation Unpublished doctoral dissertation Department of Social Sciences Harvard University 1970
16 P STONE M SMITH D DUNPHY E KELLY K CHANG T SPEER Improved quality of content analysis categories: Computerized disambiguation rules for high frequency English words In G Gerbner O Holsti K Krippendorff W Paisley P Stone The Analysis of Communication Content: Developments in Scientific Theories and Computer Techniques Wiley New York 1969
17 W A WOODS R M KAPLAN The lunar sciences natural language information system Bolt Beranek and Newman Inc Report No 2265 Cambridge Massachusetts September 1971
18 M R QUILLIAN Semantic memory In M Minsky editor Semantic Information Processing MIT Press Cambridge Massachusetts 1968
19 B RAPHAEL SIR: A computer program for semantic information retrieval In E A Feigenbaum and J Feldman Computers and Thought McGraw-Hill New York 1968
20 C H KELLOGG A natural language compiler for on-line data management AFIPS Conference Proceedings of the 1968 Fall Joint Computer Conference Vol 33 Part 1 Thompson Book Company Washington DC 1968
21 S C SHAPIRO G H WOODMANSEE A net-structure based question-answerer: Description and examples In Proceedings of the International Joint Conference on Artificial Intelligence The MITRE Corporation Bedford Massachusetts 1969
22 S C SHAPIRO The MIND system: A data structure for semantic information processing The RAND Corporation R-837-PR August 1971
23 M KAY S SU The MIND system: The structure of the semantic file The RAND Corporation RM-6265/3-PR June 1970
24 T WINOGRAD Procedures as a representation for data in a computer program for understanding natural language MIT Artificial Intelligence Laboratory MAC TR-84 Massachusetts Institute of Technology Cambridge Massachusetts February 1971
25 R M KAPLAN The MIND system: A grammar-rule language The RAND Corporation RM-6265/1-PR March 1970

Social indicators based on communication content
by PHILIP J. STONE
Harvard University
Cambridge, Massachusetts

INTRODUCTION

Early mechanical translation projects served to inject some realism about the complexity of ordinary language processing. While text processing aspirations have become more tempered, today's technology makes possible cost effective applications well beyond the index or concordance. This paper outlines one new challenge that is mostly within current technology and may be a major future consumer of computer resources. As we are all aware, industry and government cooperate in maintaining an extensive profile of our economy and its changes. As proposed by Bauer1 and others, there is a need for indicators regarding the social, in addition to the economic, fabric of our lives. Several volumes, such as a recent one edited by Campbell and Converse,2 review indexes that can be made on the quality of life. Kenneth Land3 has documented the growing interest in social indicators, reflected in volumes of reports and congressional testimony, as drawing on a wide basis of support. As the conflicts of the late 1960's and early 1970's within our society made evident the complexity of our own heterogeneous culture, interest in social indicators increased.

TYPES OF COMMUNICATION CONTENT INDICATORS

Most social indicator discussions focus on statistics similar to economic indicators. A classic case is Durkheim's4 study on the analysis of suicide rates. Another kind of social indicator, which we consider in this paper, is based on changes in the content of mass media and other public distributions of information, such as speeches, sermons, pamphlets, and textbooks. Indeed, in the same decade as Durkheim's study, Speed5 compared New York Sunday newspapers between 1891 and 1893, showing how new publication policies (price was reduced from three cents to two cents) were associated with increased attention to gossip and scandal at the expense of attention to literature, religion and politics. Since then, hundreds of such studies, called "content analyses," have been reported. Many different content indicators can be proposed: Which sectors of society have voiced most caution about increasing Federalism? How has the authoritarianism of church sermons changed in different religions? How oriented are the community newspapers to the elites of the community? Such studies, however, can be divided into two major groups. One group of studies is concerned with comparing the content of different channels through which different sectors of society communicate with each other. Such studies often monitor the spread of concepts and attitudes from one node to another in the communication net. Writers such as Deutsch6 have discussed feedback patterns within such nets. Another group of studies is based on the realization that a large segment of public media represents different sectors of society communicating with themselves.
Social scientists have repeatedly found that people tend to be exposed just to information congruent with their own point of view. Thus, rather than focus on the circulation of information between sectors of society, these studies identify different subcultures and look at the content of messages circulated within them. In fact, any one individual belongs to a set of subcultures. On the job, he or she may be exposed to the views of colleagues, while off the job the exposure may be to those with similar leisure time interests, religious preferences, or political leanings. Given the cost effectiveness of television for reaching the mass public, the printed media has become used more for directed messages to different subcultures. Thus, while there has been the demise of general circulation magazines such as the Saturday Evening Post and Look, the number of magazines concerned with particular trades, hobbies, consumer orientations and levels of literary sophistication has greatly increased. While the printed media recognizes many different subcultures (and one only has to watch the sprouting of new underground newspapers or trade journals to realize how readily a market can be identified), there has been a more general resistance to recognizing how many subcultures there are and how diverse their views tend to be. Given the enormous complexity of our culture, each sector tends to recognize its own diversity, but assumes homogeneous stereotypes for other sectors. After repeated blunders, both the press and the public are coming to realize that there are many different subcultures within the black community, the student community, the agricultural community, just as we all know there are many different subcultures in the computer community. As sociologist Karl Mannheim7 identified some years ago, the need to monitor our culture greatly increases with such heterogeneity. Gradually, awareness of a need is turning into action. Since the Behavioral and Social Sciences (BASS) report8 released by the National Academy of Sciences in 1969 gave top priority to developing social indicators, government administration has been set up to coordinate social indicator developments and several large grants have been issued. Within coming years, we may expect significant sums appropriated for social indicators.

COMPARISON WITH CONTENT ANALYSIS RESEARCH

What language processing expertise do we have today to help produce such social indicators? The Annual Review prepared by the American Documentation Institute or reports on such large text processing systems as Salton's "SMART"9 offer a wide variety of possibly relevant procedures. The discussion here focuses on techniques developed explicitly for content analysis. Content analysis procedures map a corpus of text into an analytic framework supplied by the investigator. It is information reducing, in contrast to an expansion procedure like a concordance, in that it discards those aspects of the text not relevant to the analytic framework. As a social science research technique, content analysis is used to count occurrences of specific symbols or themes. The main difference between content analysis as a social science research technique and mass media social indicators concerns sampling. A researcher samples text relevant to hypotheses being tested. Only as much text need be processed as necessary to substantiate or disconfirm the hypothesis. Usually, this involves thousands or tens of thousands of words of text.
A social indicators project, on the other hand, involves monitoring many different text sources over what may be long periods of time. The total text may run into millions of words. A hypothetical example illustrates how a social indicators project can come to be such a large size. A social indicators project may compare a number of different subcultural sectors in each of several different geographic locations. Within each sector, the sample should include several media, so it does not reflect the biases of one originator. The monitoring might cover several decades, with a new indicator made bimonthly. Imagine then a 4 dimensional matrix representing 14 (subcultural sectors) X 5 (geographic regions) X 4 (originating sources) X 150 (bimonthly periods). Each cell of this matrix might contain a sample of 15,000 words of text. The result is a text file of over a billion characters. Social science content analysis, which has been computer aided for over a decade (see for example, Stone et al.,10 Stone et al.,11 Gerbner et al.12), has used manual keypunching to provide the modest volumes of machine readable text needed. If the content analysis task were simple (such as in our first example below), human coders were often less expensive than the cost of getting the text to machine readable form. Computer aided content analysis has tended to focus on those texts, such as anthropologists' files of folktales, where the same material may be intensively studied to test a variety of hypotheses. Social indicators of public communications, on the other hand, will require high speed optical readers capable of handling text in a wide variety of printing fonts. Optical readers for selected fonts have been around for some time, but readers capable of adapting to many new fonts are just coming into existence. The large general purpose reader developed by Information International, which incorporates a PDP-10 as part of its hardware, represents this kind of machine. It is able to "learn" new fonts and then offer high speed reading from microfilm with a low error rate, even of third generation Xerox copy. Both social science content analysis research and social indicators allow for some noise, just as economic indicators tolerate an error factor. Some of the noise stems from sampling procedures. Other noise comes from measurement procedures. The "quality control" of social science research or monitoring procedures involves keeping the noise to a tolerable minimum. Assurances are also needed that the noise is indeed random, rather than leading to specifiable biases. With large sampling procedures, the tolerable level of modest random noise can be considerably higher than would be allowed in many kinds of text processing applications. For example, a single omission in an automated document classification scheme might cause a very important document to go unnoticed by the users of the information retrieval system, thus causing great loss.
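As a rough order-of-magnitude check on the hypothetical sampling matrix described above, the arithmetic can be laid out explicitly; the six-characters-per-word figure used here is an assumption for illustration, and Python is used simply as present-day shorthand.

```python
# Order-of-magnitude check on the hypothetical sampling matrix described above.
# The 15,000-word cell size and the four dimensions come from the text; the
# six-characters-per-word figure is an assumption made here for illustration.
sectors, regions, sources, periods = 14, 5, 4, 150
words_per_cell = 15_000
chars_per_word = 6  # assumed average, counting the space after each word

cells = sectors * regions * sources * periods   # 42,000 cells
total_words = cells * words_per_cell            # 630,000,000 words
total_chars = total_words * chars_per_word      # roughly 3.8 billion characters

print(f"{cells:,} cells, {total_words:,} words, about {total_chars:,} characters")
```

The result, several billion characters, is consistent with the "over a billion characters" quoted in the text.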
If one accepts a growing consensus among artificial intelligence experts that a successful language translation machine must "understand," in a significant sense of that word, the subject matter it is translating, then the most complicated social indicator tasks begin to approach this domain of challenge. Let us start at the simpler levels, showing how these needs have been met in content analysis, and work up to these more difficult challenges. The simplest measure is to identify mentions of an individual, place, group, or race. For example, Johnson, Sears and McConahay13 performed a manual content analysis of what they call "black invisibility" since the turn of the century in the Los Angeles press. They show that the percent of newspaper space devoted to blacks in the major papers is much less than their percent of the Los Angeles population warrants, and, furthermore, the ratio has been getting worse over time. A black person can read the Los Angeles press and obtain the impression that blacks do not exist there. Thus, point out the authors, some blacks took a "We won!" attitude toward the large amount of destruction in the Watts riots. Why? As reported by Martin Luther King,14 they said "We won because we made them pay attention to us." There is indeed a hunger to have one's existence recognized. "Black invisibility" can be assessed by counting the number of references to blacks compared to whites, or the newspaper column inches given to each. A computer content analysis would need a dictionary of names referring to black persons or groups. The computer should have an ability to automatically update this dictionary as processing continued, for race may be only identified with early newspaper stories about the person or group. Thus, few stories today about Angela Davis any more identify her race. The computer challenge, should optical readers have had the text ready for processing, would have been minimal. Johnson, Sears, and McConahay carried their research another step, classifying the stories according to whether they dealt with anti-social activities, black entertainers, civil rights, racial violence and several other categories. These additional measures make "black invisibility" more evident, for what little coverage blacks receive is often unfavorable and unrepresentative of the community. These additional topic identifications would again hold little difficulty for a computer analysis. It is not difficult to recognize a baseball story, a violent crime story, or a society event. The Johnson, Sears and McConahay "black invisibility" indexes were only made on two newspapers with very limited samples in the earlier part of the time period studied. Their techniques, however, could be applied to obtain "black invisibility" indexes for both elite and tabloid press in every major metropolitan area of the country. It is an example of how a content analysis measure can have considerable potential as a future social indicator. The next level of complexity is represented by what we call thematic analysis. For example, we might be interested in social indicators measuring attitudes toward increasing Federalism in our society. Separate indicators might be developed to tap occurrences of themes such as the following:

(1) The Federal government as an appropriate recipient of complaints about ...
(2) The Federal government as initiator of programs for ...
(3) The Federal government as restricting or controlling ...
(4) The Federal government as responsible for the well being of ...
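Before turning to how such themes are measured, the simpler mention-counting measure described above, in which a dictionary of group names is matched against running text, can be sketched minimally. The term lists, sample sentence, and function names below are illustrative assumptions only, and Python serves merely as present-day shorthand for a procedure that would then have been coded quite differently.

```python
# Toy mention counter in the spirit of the newspaper-space measures described
# above: given an analyst-supplied dictionary of names for each group of
# interest, count references to each group in a body of text.  The term lists
# shown here are placeholders for illustration only.
import re
from collections import Counter

def count_mentions(text, group_terms):
    counts = Counter()
    for token in re.findall(r"[a-z'-]+", text.lower()):
        for group, terms in group_terms.items():
            if token in terms:
                counts[group] += 1
    return counts

# Hypothetical term lists; a real study would also update these as new names
# become associated with a group, as the text notes for the Angela Davis case.
groups = {"group_a": {"farmers", "growers"}, "group_b": {"retailers", "grocers"}}
story = "Growers protested while retailers and grocers watched the farmers march."
print(count_mentions(story, groups))  # e.g. Counter({'group_a': 2, 'group_b': 2})
```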
Such themes are measured by first identifying synonyms. Rather than refer to the "Federal government," the text may refer to a particular agency. The verb section may have alternative forms of expression. Finally, separate counts might be kept for each theme relevant to different target clusters such as agriculture, industry, minority groups, consumer goods, etc. Past work in content analysis has offered considerable success in studies on a thematic level. Thus Ogilvie (1966) found considerable accuracy in computer scoring of "need achievement" themes in stories made up by subjects. The scoring involved thematic identifications similar in complexity to the Federalism measures cited above. Ogilvie found that the correlation between the computer and a human coder was about .85, or as high as the correlation between two human coders. A still higher complexity is represented by the packaging of thematic statements into an argument, plot, or rationale. This has recently become prominent in psychology, with Abelson16 writing about "opinion molecules" while Kelly17 writes of "causal schemata." The concern is with, if I may substitute computer terminology, various "subroutines" that we draw on to explain the world about us. Many such subroutines are shared by the community at large, so that a passing reference to any part of the subroutine can be expected to cause the listener to invoke the whole subroutine. To take a very simple example, such phrases as a "Communist inspired plot," "subversive action," and "Marxist goals" can all be taken as invoking a highly shared molecule including something as follows: Communists create (inspire) plots involving subversive actions against established ways in order to force changes in society toward their Marxist goals. Matters are rather simple when dealing with such weatherbeaten old molecules, but take the end-run kinds of debates between politicians about school bussing and try to identify the variety of molecules surrounding that topic. The inference of underlying molecules involves theoretical issues that can go well beyond text processing problems. Again, there is a relevant history in content analysis, although computer aided procedures have only recently had any successes. The classic manual study was by Propp18 who showed that Russian folktales fell into variants of a basic plot. Recently, anthropologists such as Colby19 and Miranda20 have pushed further the use of the computer to study folktale plots. Investigators such as Shneidman21 have worked on detailed manual techniques to identify the forms of "psycho-logic" we use in everyday explanations. Social indicators at this level should pose considerable difficulty for some time to come.

NEW TEXT PROCESSING RESOURCES

Content analysis research may share with social indicators projects the priorities for new text processing resources. These priorities may be quite different from those in information retrieval or other aspects of text processing. We here review these priorities as we see them. While automated linguistic analysis has been preoccupied with questions of syntactic analysis, content analysis work has given priority to semantic accuracy. Semantic errors affect even the simplest levels of measurement and were known to cause considerable noise in many measurement procedures. Even the "black invisibility" study, for example, is going to have to be able to distinguish between "black" the color and "black" the race, as well as other usages of "black."
A Federalism study may expect verbs like "restrict" and "control," but in fact the text may use such frightfully ambiguous words as "run," "handle" or "order." A first order of business has been to reduce such noise to more manageable levels. One might argue that procedures for such semantic identifications should come after the text has received a syntactic analysis. Certainly this would simplify the task and increase accuracy. However, many simpler social indicators and content analysis tasks do not otherwise need syntactic analyses. For social indicator projects, the large volumes of text discourage invoking syntactic routines unless they are really needed. In content analysis research, text is often transcripts of conversational material having a highly degenerate syntactical form. For these applications, a syntactically dependent analysis of word senses might be less than satisfactory. Thus, for both social indicators and content analysis research, it makes sense to attempt identification of word senses apart from syntactic analysis. A project was undertaken some five years ago to develop computer routines that would be able to identify major word senses for all words having a frequency of over 40 per million. This criterion resulted in a list of 1815 entries covering about 90 percent of running text. Identification of real, separate word senses is a thorny problem we have discussed elsewhere; let it simply be pointed out here that the number of word senses in a dictionary tends to be directly proportional to the size of the dictionary. Our goal was to cover the basic distinctions (such as "black" the race vs "black" the color) rather than many fine-graded distinctions (such as those of a word like "fine"). Of the 1815 candidates, some 1200 were identified as having multiple meanings. Two thirds of these, or about 800 words, offered considerable challenge for word sense identifications. Rules for identifying word senses were developed for each of these multiple meaning words. Each rule could test the word environment (specifying its own boundary parameters) for the presence or absence of particular words or any of sixty different markers. Each rule, written in a form suitable for compilation by a weak precedence grammar, could either assign senses or transfer to other rules, depending on the outcome. The series of rules used for testing any one word thus formed a routine. The implementation of these rules on a computer emphasized efficiency. Since marker assignments often depended on word senses being identified, deadlocks could occur with some rules testing for markers on neighboring words which could not yet be assigned until the word in question was resolved. Strategies include the computer looking into dictionary entries to see if the marker category is among the possible outcomes. Despite such complicated options, occasionally resulting in multiple passes, the word sense procedures are remarkably fast, to the point of being feasible for social indicators work. The accuracy of the word sense identification procedures was tested on a 185,000 word sample drawn both from Kucera and Francis22 and our own text files. A variety of tests were performed. For example, for a sample of 671 particularly difficult homographs, covering 64,253 tokens in the text, correct assignment was made 59,716 times or slightly over 92 percent of the time. The procedures thus greatly reduce the noise in word sense assignments.
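A minimal sketch of the kind of context-testing disambiguation rule described above may help make the mechanism concrete. The rule format, trigger words, window sizes, and fallback sense shown here are illustrative assumptions rather than the actual rule language or the 1815-entry dictionary, and Python again serves only as shorthand.

```python
# Sketch of a context-testing word-sense rule of the general kind described
# above: each rule inspects a window of neighboring words for particular
# trigger words or markers and either assigns a sense or falls through to the
# next rule.  The entries, windows, and fallback below are illustrative only.
RULES = {
    "black": [
        # (window size, trigger words, sense to assign)
        (3, {"community", "voters", "leaders", "invisibility"}, "RACE"),
        (2, {"ink", "dress", "cat", "paint"}, "COLOR"),
    ],
}
DEFAULT_SENSE = "COLOR"  # assumed fallback when no rule fires

def assign_sense(tokens, i):
    word = tokens[i].lower()
    for window, triggers, sense in RULES.get(word, []):
        context = {t.lower() for t in tokens[max(0, i - window): i + window + 1]}
        if context & triggers:
            return sense
    return DEFAULT_SENSE

tokens = "coverage of black community leaders fell sharply".split()
print(assign_sense(tokens, 2))  # RACE
```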
The second priority for even some of the simplest social indicator projects should be pronoun identification. The importance of the problem depends on the kinds of tabulations that are to be made. If the question is whether any mention is made in the article, then pronouns are not such a crucial issue. If the question involves counting how many references were made, then references should be identified in both noun and pronoun form. We believe that more work should be encouraged on pronoun identification so as to be better prepared for future social indicators research. Because many pronouns involve references outside the sentence, the problem is beyond most current syntax studies. Winograd23 provides a heartening example of how well pronoun identification can be made for local discourse on a specific topic area. The third priority is syntactic analysis for thematic identification purposes. This is not just a general syntactic analysis, but an analysis to determine if the text matches one of the thematic templates relative to a social indicator. Large amounts of computer time can be saved by only calling on the syntactic routine after it is established that all the semantic components relevant to the theme are indeed present. Syntactic analysis can stop as soon as it is established that the particular word order cannot be an example of that theme. In general, we find that a case grammar is most akin to thematic analysis needs. The transition network approach of Woods24 holds considerable promise for such syntactic capabilities. Gary Martins, who is with us on the panel, is already exploring the application of such transition networks to content analysis problems. This work should be of considerable utility in the development of social indicators based on themes. Finally, we come to the need for inference systems to handle opinion molecules and the like. Such devices as Hewitt's PLANNER25 may have considerable utility for such social indicator projects. A PLANNER operation includes a data base and a set of theorems. Given a text statement, PLANNER can attempt to tie it back through the theorems until a match is made in the data base. For any editorial, for example, the successful theorem paths engendered could identify which molecules were being invoked and their domain of application. At present, this is but conjecture; much work needs to be done. These priorities, then, are explicitly guided by what Weizenbaum26 calls a "performance mode," in this case toward creating useful social indicators. They may well conflict with text processing priorities in computational linguistics or artificial intelligence. Some social indicators may only be produced in the distant future, but meanwhile important results can be accomplished using optical readers and current text processing sophistication.

DEVELOPING SOCIAL INDICATOR TEXT ARCHIVES

Having considered text processing research priorities, let us examine what concurrent steps are needed if relevant text files are to be created and put to effective use. At present, our archiving of text material is haphazard and, for social indicator purposes, subject to major omissions. The American public (as publics in most advanced societies) spends more than four times as many hours watching television compared to all forms of reading put together (Szalai et al.27). Yet, even with the incredible salience of network evening newscasts or documentary specials, the networks are not required to place television scripts in a public archive.
A content analysis like Efron's The News Twisters28 had to be made from homemade tape recordings of news broadcasts. Similarly if one is to study the content of communication channels between sectors of society, one needs both original and intermediate sources such as press releases and wire service transmissions. Past critics of our news media such as Cirino29 have had to make extensive efforts to obtain the necessary primary information. Better central archiving is very much needed. As discussed by Firestone,30 considerable attention will have to be given to coordinating the sampling of text with sampling used for other social indicators. For example, it makes obvious sense to target the sampling of union newsletters to correspond to union memberships selected for repeated survey interviews. In one of our own studies, Stone and Brody31 compared a content analysis of news stories on the Vietnam war with the results of Gallup survey questions about the effectiveness of the president. This study would have been greatly aided by (a) better text files of representative news stories from across the nation and (b) survey information as to media exposure. With less adequate data, the quality of the analysis suffers.

SAFEGUARDING THE PUBLIC

On the one hand, since the files are based on public communications, investigators outside the government should have access to the archives for testing different models. In this sense, such files would be similar to the computer economic data bases for testing econometric models now made available by commercial organizations. On the other hand, the same technology used to produce social indicators based on content can be used to invade the content of private communications. This author, for one, is worried about current military sponsored research that aims to make possible a computer monitoring of voice grade telephone communication. After all that has been written about privacy, a much closer safeguard is needed. Further work on content analysis techniques must be coordinated with such safeguards.

SUMMARY

This paper has outlined how computer text processing resources may be used to produce social indicators of communication content. A new challenge of considerable scale is forecast. The relations of such indicators to existing content analysis research techniques are identified. Priorities based on social indicator requirements are offered for future text processing research. Because of the scale of the operation and its distinct requirements, we suggest that social indicators based on communication content be considered separate from other computer text processing applications. Immediate attention is needed for text archiving and safeguarding the privacy of communications.

REFERENCES

1 R BAUER Social indicators MIT Press 1966
2 A CAMPBELL P CONVERSE (ed) The human meaning of social change Russell Sage Foundation 1972
3 K LAND Social indicators In R Smith Social Science Methods-A New Introduction Vol 2 In Press
4 E DURKHEIM Suicide-A study in sociology 1897 (Trans from the French 1951 Free Press)
5 J G SPEED Do newspapers now give the news?
The Forum 1893 Vol 15 pp 705-711
6 K DEUTSCH Nerves of government Free Press 1963
7 K MANNHEIM Ideology and utopia-An introduction to the sociology of knowledge 1931 (English translation: Harcourt)
8 NATIONAL ACADEMY OF SCIENCES Behavioral and social sciences-Outlook and needs Prentice Hall 1969
9 G SALTON The SMART retrieval system Prentice Hall 1971
10 P STONE R BALES J Z NAMENWIRTH D OGILVIE The general inquirer Behavioral Science 1962 Vol 7 pp 484-498
11 D C DUNPHY M S SMITH D M OGILVIE The general inquirer-A computer approach to content analysis MIT Press 1966
12 G GERBNER O HOLSTI K KRIPPENDORFF W PAISLEY P J STONE The analysis of communications content Wiley Press 1969
13 P B JOHNSON D SEARS J McCONAHAY Black invisibility, the press and the Los Angeles riot Amer J Sociology 1971 Vol 76 pp 698-721
14 M L KING Where do we go from here? Chaos or community? Beacon Press 1967
15 D M OGILVIE In P Stone et al op cit pp 191-206
16 R P ABELSON Psychological implication In R P Abelson E Aronson W McGuire T Newcomb M Rosenberg and P Tannenbaum Theories of Cognitive Consistency Rand McNally 1968
17 H KELLY Causal schemata and the attribution process General Learning Press 1972
18 V PROPP Morphology of the folktale 1927 American Folklore Society (Trans 1958)
19 B N COLBY Folk narrative Current Trends in Linguistics Vol 12 1972
20 P MIRANDA Structural strength, semantic depth, and validation procedures in the analysis of myth Proceedings Quatrieme Symposium sur les Structures Narratives Konstanz Germany 1971 In Press
21 E S SHNEIDMAN Logical content analysis: An explication of styles of "concludifying" In Gerbner et al op cit
22 H KUCERA W FRANCIS Computational analysis of present-day American English Brown University Press 1967
23 T WINOGRAD Procedures as a representation for data in a computer program for understanding natural language Report MAC TR-84 MIT February 1971 (Selections reprinted in Cognitive Psychology 1972 #1)
24 W WOODS Transition network grammars for natural language analysis Comm ACM 1970 Vol 13 pp 591-602
25 C HEWITT PLANNER-A language for proving theorems in robots Proc of IJCAI 1969 pp 295-301
26 J WEIZENBAUM On the impact of the computer on society Science Vol 176 pp 609-614 1972
27 A SZALAI E SCHEUCH P CONVERSE P STONE The use of time Mouton 1972
28 E EFRON The news twisters Nash 1971
29 R CIRINO Don't blame the people Diversity Press 1971
30 J M FIRESTONE The development of social indicators from content analysis of social documents Policy Sciences In Press
31 P STONE R BRODY Modeling opinion responsiveness to daily news-The public and Lyndon Johnson 1965-1968 Social Science Information Vol 9 #1 pp 95-122

The DOD COBOL compiler validation system
by GEORGE N. BAIRD
Department of the Navy
Washington, D. C.

INTRODUCTION

The ability to benchmark or validate software to ensure that design specifications are satisfied is an extremely difficult task. Test data, generally designed by the creators of the software, are biased toward a specific goal and tend not to cover many of the possible combinations and interactions. The philosophy of suggesting that "a programmer will never do . . ." or "this particular situation will never happen" is altogether absurd. First, "never" is an extremely long time and secondly, the Hagel theorem of programming states that "if it can be done, whether absurd or not, one or more programmers will more than likely try it." Therefore, if a particular piece of software has been thoroughly checked against all known extremes and a majority of all syntactical forms, then the Hagel theorem of programming will not affect the software in question. The DOD CCVS attempts to do just that by checking for the fringes of the specifications of X3.23-19681 and known limits. It is assumed that if a COBOL compiler performs satisfactorily for the audit routines, then it is likely that the compiler supports the entire language. However, if the compiler has trouble handling the routines in the CCVS, it can be assumed that there will indeed be other errors of a more serious nature. The following is a brief account of the history of the DOD CCVS, the automation of the system and the adaptability of the system to given compilers.

* The Conference on Data Systems Languages (CODASYL) is an informal and voluntary organization of interested individuals supported by their institutions who contribute their efforts and expenses toward the ends of designing and developing techniques and languages to assist in data systems analysis, design, and implementation. CODASYL is responsible for the development and maintenance of COBOL.
BACKGROUND

The first revision to the initial specification for COBOL (designated as COBOL-19612) was approved by the Executive Committee of the Conference on Data Systems Languages* and published in May of 1961. Recognizing that the language would be subject to additional development and change, an attempt was made to create uniformity and predictability in the various implementations of COBOL compilers. The language elements were placed in one of two categories: required and elective. Required COBOL-1961 consisted of language elements (features and options) which must be implemented by any implementor claiming a COBOL-1961 compiler. This established a common minimum subset of language elements for COBOL compilers and, hopefully, a high degree of transferability of source programs between compilers if this subset was adhered to. Elective COBOL-1961 consisted of language elements whose implementation had been designated as optional. It was suggested that if an implementor chose to include any of these features (either totally or partially) he would be expected to implement these in accordance with the specifications available in COBOL-1961. This was to provide a logical growth for the language and attempt to prevent a language element from having contradictory meaning between the language development specifications and implementor's definition. As implementors began providing COBOL compilers based on the 1961 specifications, unexpected problems became somewhat obvious. The first problem was that the specifications themselves suggested mandatory as well as optional language elements for implementing COBOL compilers. In addition, the development document produced by CODASYL was likely to change periodically, thus providing multiple specifications to implement from. Compilers could consist of what the implementor chose to implement, which would severely handicap any chance of transferability of programs among the different compilers, particularly since no two implementors necessarily think alike. Philosophies vary both in the selection of elements for a COBOL compiler and in the techniques of implementing the compiler itself.
(As ridiculous as it may sound, some compilers actually scan, syntax check and issue diagnostics for COBOL words that might appear in comments both in the REMARKS paragraph of the Identification Division and in NOTE sentences in the Procedure Division.) The need for a common base from which to implement became obvious. If the language was to provide a high degree of compatibility, then all implementations had to be based on the same specification. The second problem was the reliability of the compiler itself. If the manual for the compiler indicated that it supported the DIVIDE statement, the user assumed this was true. If the compiler then accepted the syntax of the DIVIDE statement, the user assumed that the object code necessary to perform the operation was generated. When the program executed, he expected the results to reflect the action represented in his source code. It appears that in some cases perhaps no code was generated for the DIVIDE statement and the object program executed perfectly except for the fact that no division took place. In another case, when the object program encountered the DIVIDE operation, it simply went into a loop or aborted. At this point, the programmer could become decidedly frustrated. The source code in his program indicated that: (1) he requested that a divide take place, (2) there was no error loop in his program, (3) the program should not abort. This is the problem we are addressing: A programmer should concern himself with producing a source program that is correct logically and the necessary operating system control statements to invoke the COBOL compiler. In doing so, he should be able to depend on the compiler being capable of contributing its talent in producing a correct object program. If the user was assured that either: (1) each instruction in the COBOL language had been implemented correctly, or (2) that each statement which was implemented did not give extraneous results, then the above situation could not exist. Thus, the need for a validation tool becomes apparent. Although all vendors exercise some form of quality control on their software before it is released, it is clear that some problems may not be detected. (The initial release of the Navy COBOL audit routines revealed over 50 bugs in one particular compiler which had been released five years earlier.) By providing the common base from which to implement and a mechanism for determining the accuracy and correctness of a compiler relative to the specification, the problem of smorgasbord compilers (that may or may not produce expected results) should become extinct. The standardization of COBOL began on 15 January 1963. This was the first meeting of the American Standards Association Committee, X3.4.4,* the Task Group for Processor Documentation and COBOL. The program of work for X3.4.4 included ... "Write test problems to test specific features and combinations of features of COBOL. Checkout and run the test problems on various COBOL compilers." A working group (X3.4.4.2) was established for creating the "test problems" to be used for determining feature availability. The concept of a mechanism for measuring the compliance of a COBOL compiler to the proposed standard seemed reasonable in view of the fact that other national standards did indeed lend themselves to some form of verification, i.e., 2X4's, typewriter keyboards, screw threads.
IMPLEMENTING A VALIDATION SYSTEM FOR COBOL

In order to implement a COBOL program on a given system, regardless of whether the program is a validation routine or an application program, the following must be accomplished:

1. The special characters used in COBOL (i.e., '(', ')', '*', '+', '<', etc.) must be converted for the system being utilized.†
2. All references to implementor-names within each of the source programs must be resolved.
3. Operating System Control Cards must be produced which will cause each of the source programs to be compiled and executed. Additionally, the user must have the ability to make changes to the source programs, i.e., delete statements, replace statements, and add statements.
4. As the programs are compiled, any statements that are not syntactically acceptable to the compiler must be modified or "deleted" so that a clean compilation takes place and an executable object program is produced.
5. The programs are then executed. All execution time aborts must be resolved by determining what caused the abort and, after deleting or modifying that particular test or COBOL element, repeating steps 3 and 4 until a normal end of job situation exists.

* The American Standards Association (ASA), a voluntary national standards body, evolved to the United States of America Standards Institute (USASI) and finally the American National Standards Institute (ANSI). The committee X3.4.4 eventually became X3J4 under a reorganization of the X3 structure. X3J4 is currently in the process of producing a revision to X3.23-1968.
† For most computers the representations for the characters A-Z, 0-9, and the space (blank character) are the same. However, there is sometimes a difference in representation of the other characters and therefore conversion of these characters from one computer to another may be necessary.

Development of audit routines

In March 1963, X3.4.4.2 (the Compiler Feature Availability Working Group) began its effort to create the COBOL programs which would be used to determine the degree of conformance of a compiler to the proposed standard. The intent of the committee was not to furnish a means for debugging compilers, but rather to determine "feature availability." Feature availability was understood to mean that the compiler accepted the syntax and produced object code to produce the desired result. All combinations of features were not to be tested; only a carefully selected sample of features (singly and in combination) were to be tested to insure that they were operational. The test programs themselves were to produce a printed report that would reflect the test number and when possible whether the test "Passed" or "Failed." See Figure 1.

Source Statements
TEST-0001. MOVE 001 TO TEST-NO. MOVE ZERO TO ALPHA. ADD 1 TO ALPHA. IF ALPHA = 1 PERFORM PASS ELSE PERFORM FAIL.
TEST-0002.
Results
TEST   NO   P - F
ADD    1    P
ADD    2    F
Figure 1-Example of X3.4.4.2 test and printed results

When a failure was detected on the report, the user could trace the failure to the source code and attempt to identify the problem. The supporting code (printing routine, pass routine, fail routine, etc.) was to be written using the most elementary statements in the low-level of COBOL. The reason for this was twofold:

1. The programs would be able to perform on a minimum COBOL compiler (Nucleus level 1, Table Handling level 1, and Sequential Access level 1).
The chances of the supporting code not being acceptable to the compiler being tested were lessened. The programs, when ready, would be provided in card deck form along with the necessary documentation for running them. (The basic philosophies of design set forth by X3.4.4.2 were carried through all subsequent attempts to create compiler validation systems for COBOL.) Assignments were made to the members of the committee and the work began. This type of effort at the committee level, however, was not as productive as the work of standardizing the language itself. In April 1967, the Air Force issued a contract for a system to be designed and implemented which could be used in measuring a compiler against the standard. The Air Force COBOL Compiler Validation System was to create test programs and adapt them to a given system automatically by means of fifty-two parameter cards. The Navy COBOL audit routines In August of 1967, The Special Assistant to the Secretary of the Navy created a task group to influence the use of COBOL throughout the Navy. Being aware of both the X3.4.4.2 and Air Force efforts, (as well as the time involved for completion), a short term project was established to determine the feasibility of validating COBOL compilers. After examining the information and test programs available at that time, the first set of routines was produced. In addition to the original X3.4.4.2 philosophy, the Navy added the capability of providing the result created by the computer as well as the expected result when a test failed. Also, instead of a test number, the actual procedure name in the source program was reflected in the output. See Figure 2. The preliminary version of the Navy COBOL audit routines was made up of 12 programs consisting of about 5000 lines of source code. The tailoring of the programs to a particular compiler was done by hand 822 Fall Joint Computer Conference, 1972 (by physically changing cards in the deck or by using the vendor's software for updating COBOL programs). As tests were deleted or modified, it was difficult to bring the programs back to their virgin state for subsequent runs against different compilers or for determining what changes had to be made in order that the programs would execute. This was a crude effort, but it established the necessary evidence that the project was feasible to continue and defined techniques for developing auditing systems. Because of the favorable comments received on this initial work done by the Navy, it appeared in the best interest of all to continue the effort. After steady development and testing for a year, Version 4 of the Navy COBOL Audit Routines was released in December 1969. The routines consisted of 55 Programs, consisting of 18,000 card images capable of testing the full standard. The routines had also become one of the benchmarks for all systems procured by the Department of the Navy in order to ensure that the compiler delivered with the system supported the required level of American National Standard COBOL. * Also, Version 4 introduced the VP-Routine, a program that automated the audit routines. Based on fifty parameter cards, all implementor-names could be resolved and the test programs generated in a onepass operation. See Figure 3. In addition, by coding specific control cards in the Working-Storage Section of the VP-Routine as constants, the output of the VP-Routine became a file that very much resembled the input from a card reader, i.e., control cards, programs, etc. 
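The one-pass resolution performed by the VP-Routine can be pictured with a short sketch. The sketch below is purely illustrative and is rendered in Python for brevity; the card format, the placeholder spelling, and the function names are assumptions (the actual support routine was itself a COBOL program driven by fifty parameter cards).

    # Illustrative sketch of parameter-card substitution in the spirit of the
    # VP-Routine; the card format and placeholder spelling are assumptions.
    import re

    def load_parameter_cards(cards):
        # A card of the assumed form "X-8 PRINTER" maps placeholder number 8
        # to the implementor-name used on the target system.
        names = {}
        for card in cards:
            number, name = card.split(None, 1)
            names[number.replace("X-", "")] = name.strip()
        return names

    def resolve(audit_line, names):
        # Replace placeholders such as XXXXX8 with the corresponding name.
        return re.sub(r"X{4,}(\d+)",
                      lambda m: names.get(m.group(1), m.group(0)),
                      audit_line)

    cards = ["X-8 PRINTER", "X-9 CARD-READER"]
    print(resolve("    SELECT PRINT-FILE ASSIGN TO XXXXX8.", load_parameter_cards(cards)))
    # prints:     SELECT PRINT-FILE ASSIGN TO PRINTER.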
By specifying the required Department of Defense COBOL subset of the audit routines to be used in a validation, only the programs necessary for validating Source Statements ADD-TEST-l. MOVE 1 TO ALPHA. ADD 1 TO ALPHA. IF ALPHA =2 PERFORM PASS ELSE PERFORM FAIL. Results FEATURE PARAGRAPH P/F COMPUTED EXPECTED ADD ADD-TEST-l FAIL ADD ADD-TEST-2 PASS 1 2 Figure 2-Example of Navy test and printed results * In 1968, the Department of Defense, realizing that several thousand combinations of modules/levels were possible, established four subsets of American National Standard COBOL for procurement purposes. V-P Routine Input: X-O SOURCE-COMPUTER-NAME X-I OBJECT-COMPUTER-NAME X-3 X-8 PRINTER X-9 CARD-READER X-lO X-50 Audit Routine File: SOURCE-COMPUTER. XXXXXO SELECT PRINT-FILE ASSIGN TO XXXXX8 The audit routine after processing would be: SOURCE-COMPUTER. SOURCE-CO MPUTER-NAME. SELECT PRINT-FILE ASSIGN TO PRINTER. Figure 3-Example of input to the support routine, Population file where audit routines are stored and resolved audit routine after processing that subset of elements or modules would be selected, i.e., SUBSET-A, B, C, or D. The capability also existed to update the programs as the "card reader" file was being created. The use of the VP-Routine was not mandatory at this time, but merely to assist the person validating the compiler in setting up the programs for compilation. Once the VP-Routine was set up for a given system, there was little trouble running the audit routines. The user then had only to concern himself with the validation itself and with achieving successful results from execution of the audit routines. When an updated set of routines was distributed, there was no effort involved in replacing the old input tape to the VP-Routine with the new tape. The Air Force COBOL audit routines The Air Force COBOL Compiler Validation System (AFCCVS) was not a series of COBOL programs but rather a test program generator. The user could select The DOD COBOL Compiler Validation System 823 Source statement in test library 4U T 1N078A101NUC, 2NUC PICTURE S9(18). 400151 77 WRK-DS-18VOO PICTURE S9(18). 400461 77 A180NES-DS-18VOO VALUE 11111111111111111. 400471 PICTURE S9(18) COMPUTATIONAL 400881 77 A180NES-CS-18VOO VALUE 111111111111111111. 400891 802925 TEST-1NUC-078. TO WRK-DS-18VOO. MOVE A180NES-DS-18VOO 802930 TO WRK-DS-18VOO 802935 ADD A180NES-CS-18VOO TO SUP-WK-A. 802940 MOVE WRK-DS-18VOO TO SUP-WK-C. MOVE '222222222222222222' 802945 TO SUP-ID-WK-A 802950 MOVE 'lN078' 802955 PERFORM SUPPORT-RTN THRU SUP-TRN-C. Test results .lN078 .lN079. .222222222222222222.09900. Figure 4-Example of Air Force test and printed results the specific tests or modules he was interested in and the AFCCVS would create one or more programs from a file of specific tests which were then compiled as audit routines. Implementor-names were resolved as the programs were generated based on parameter cards stored on the test file or provided by the user. The process required several passes, including the sorting of all of the selected tests to force the Data Division entries into the Data Division and place the tests themselves in the Procedure Division where they logically belonged. An additional pass was required to eliminate duplicate Data Division entries (more than one test might use the same data-item and therefore there would be more than one copy in the Data Division). See Figure 4. Still another program was used to make changes to the source programs as the compiler was validated. 
As in the Navy system, certain elements had to be eliminated because: (1) they were not syntactically acceptable to the compiler or, (2) they caused run time aborts. Department of Defense COBOL validation system In December 1970, The Deputy Comptroller of ADP in the Office of the Secretary of Defense asked the Navy to create what is now the DOD Compiler Validation System for COBOL taking advantage of: (1) the better features of both the Navy COBOL Audit Routines (Version 4) and the Air Force CCVS and (2) the four years of in-house experience in designing and implementing audit routines on various systems as well as the actual validation of compilers for procurement purposes. The Compiler Validation System (of which the support program was written in COBOL) had to be readily adaptable to any computer system which supported a COBOL compiler and which was likely to be bid on any RFP issued by the Department of Defense or any of its agencies. It also had to be able to communicate with the operating system of the computer in order to provide an automated approach to validating the COBOL compiler. The problem of interfacing with an operating system mayor may not be readily apparent depending on whether an individual is more familiar with IBM's Full Operation System (OS), which is probably the most complex operating system insofar as establishing communication between itself and the user is concerned, or with the Burroughs Master Control Program (lVICP), where the control language can be learned in a fifteen or twenty minute discussion. Since validating a compiler may not be necessary very often, the amount of expertise necessary for communicating with the CVS should be kept to a minimum. The output of the routines should be as clear as possible in order not to confuse the reviewer of the results or to suggest ambiguities. The decision was made to adopt the Navy support system and presentation format for several reasons. (1) It would be easier to introduce the Air Force tests into the Navy routines as additional tests because the Navy routines were already in COBOL program format. It would have been difficult to recode each of the Navy tests into the format of specific tests on the Air Force Population File because of the greater volume of tests. (2) The Navy support program had become rather versatile in handling control cards, even for IBM's as, whereas the Air Force system had only limited control card generation capability. 824 Fall Joint Computer Conference, 1972 The merging of the A ir Force and Navy routines The actual merging of the routines started in February 1971 and continued until September 1971. During the merging operation, it was noted that there was very little overlap or redundancy in the functions tested by the Air Force and Navy systems. In actuality, the two sets of tests complemented each other. This could only be attributed to the different philosophies of the two organizations which originally created the routines. For example in the tests for the ADD statement: Air Force signed fields most fields 18 digits long more computational items Navy unsighed fields most fields 1-10 digits long more display items After examining the Add tests for the combined DOD routines, it was noticed that a few areas had been totally overlooked. 1. An ADD statement that forced the "temp" used by the compiler to hold a number greater than 18 digits in length: i.e., ADD -t999999999999999999 -t999999999999999999 -t999999999999999999 - 999999999999999999 - 999999999999999999 -99 TO ALPHA . . . 
where the intermediate result would be greater than 18 digits, but the final result would be able to fit in the receiving field. 2. There were not more than eight operands in anyone ADD test. 3. A size error test using a COl\1PUTATIONAL field when the actual value could be greater than the described size of the field, i.e., ALPHA PICTURE 9(4) COl\1P ... specifies a data item that could contain a maximum value of 9999 without an overflow condition; however, because the field may be set up internally in binary, the decimal value may be less than the maximum binary value it could hold: Maximum COBOL value = 9999 lVlaximum hardware value~16383 Therefore, from this point of view, the merging of Source statements ADD-TEST-l. MOVE 1 TO ALPHA. ADD 1 TO ALPHA. IF ALPHA = 2 PERFORM PASS ELSE GO TO ADD-FAlL-I. GO TO ADD-WRITE-I. ADD-DELETE-l. PERFORM DELETE. GO TO ADD-WRITE-l. Initialization if necessary. The Test. Check the results of the test and handle the accounting of that test. Normal exit path to the write paragraph. Abnormal path to the write statement if the test is deleted via the NOTE statement. Correct and computed results are formatted for printing. ADD-FAIL-I. MOVE ALPHA TO COMPUTED. MOVE '2' TO CORRECT. PERFORM FAIL. ADD-WRITE-I. Results are printed. MOVE 'ADD-TEST-l' TO PARAGRAPH-NAME. PERFORM PINT-RESULTS. ADD-TEST-2. Figure 5-Example of DOD test and supporting code the routines disclosed the holes in the validation systems being used prior to the current DOD routines. The general format of each test is made up of several paragraphs: (1) the actual "test" paragraph; (2) a "delete" paragraph which takes advantage of the COBOL NOTE for deleting tests which the compiler being validated cannot handle; (3) the "fail" paragraph for putting out the computed and correct results when a test fails; and (4) a "\\-'rite" paragraph which places the test name in the output line and causes it to be written. See Figure 5. The magnitude of the size of the DOD Audit Routines was approaching 100,000 lines of source coding, making up 130 programs. The number of environmental changes (resolution of implementor-names) was in the neighborhood of 1,000 and the number of operating system control cards required to execute the program would be from 1,300 to 5,000 depending on the complexity of the operating system involved. This was where the support program could save a large amount of both work and mistakes. The Versatile Program l\'1anagement System (VPl\1S1) was designed to handle all of these problems with a minimum of effort. Versatile program management system (VPMS1) A good portion of the merging included additional enhancements to the VPl\1S1 (support program) The DOD COBOL Compiler Validation System which, by this time, through an evolutionary process had learned to manage two new languages; FORTRAN and JOVIAL. The program had been modified based on the additional requirements of various operating systems for handling particular COBOL problems; the need for making the system easy for the user to interface with, and the need to provide all interfaces between the user, the audit routines, and the operating system. The introduction of implementor names through" X-cards" The first problem was the resolution of implementornames within the source COBOL programs making up the audit routines. In the COBOL language, particularly in the Environment Division, there are constructs which must contain an implementor-defined word in order for the statement to be syntactically complete. 
Figure 6 shows where the implementor-names must be provided. THE NOTE placed as the first word in the paragraph causes the entire paragraph to be treated as comments. Instead of the "GO TO ADD-WRITE-l" statement being executed, the logic of the program falls into the delete paragraph which causes the output results to reflect the fact that the test was deleted. If the syntax error is in the Data Division, then the coding itself must be modified. VPl\1S1 shows, in its own printed output, the old card image as well as the new card image so that what has been altered is readily apparent, i.e., 012900 02APICZZ9Value'I'. NCI085.2 OLD 012900 02 A PIC ZZ9 Value 1. NCI08*RE NEW ENVIRONMENT DIVISION. SOURCE-COMPUTER. implementor-name-l. OBJECT-COMPUTER. implementor-name-2. SPECIAL-NAMES. implementor-name-3 is MNEMONIC-NAME FILE-CONTROL SELECT FILE-NAME ASSIGN TO implementor-name-4. 825 If, while executing the object program of an audit routine, an abnormal termination occurs, then a change is required. The cause might be, for example, a data exception or a program loop due to the incorrect implementation of a COBOL statement. In any case, the test in question would have to be deleted. The NOTE would be used as specified above. In addition, VPMSI provides a universal method of updating source programs so that the individual who validates more than one compiler is not constantly required to learn new implementor techniques for updating source programs. Example of update cards through VPMSl: 012900 02 A PIC ZZ9 VALUE 1. 013210 l\10VE 1 TO A. 014310 NOTE 014900* 029300*099000 (If the sequence number is equal the card is replaced; if there is no match the card is inserted in the appropriate place in the program.) (Deletes card 014900) (Deletes the series from 029300 through 099000). To carry the problem a step further. Some of the names used by different implementors for the high speed printer in the SELECT statement have been PRINTER, SYSTEl\1-PRINTER, FORMPRINTER, SYSOUT, SYSOUl, PI FOR LISTING, ETC. It is obvious to a programmer what the implementor has in mind, but the compiler that expects SYSTEl\1-PRINTER, will certainly reject any of the other names. Therefore, each occurrence of an implementor-name must be converted to the correct name. The approach taken is that each implementorname is defined to VPlVlS1. For example, the printer is known as XXXX36 and the audit routines using the printer would be set up in the following way: SELECT PRINT-FILE ASSIGN TO XXXXX36 And the user would provide the name to be used by the computer being tested through an "X-CARD." X-36 SYSTEl\;f-PRINTER VPl\1S1 would then replace all references of XXXXX36 with SYSTEl\1-PRINTER. SELECT PRINT-FILE ASSIGN TO SYSTEM-PRINTER. data division. FD FILE-NAME VALUE OF implementor-name-5 IS implementor-defined. Figure 6-Implementor defined names that would appear in a COBOL program A bility to update programs The next problem was to provide the user with a method for making changes to the audit routines in 826 Fall Joint Computer Conference, 1972 ADD-TEST-l. NOTE (Inserted by the user as an update to the program.) MOVE 1 TO ALPHA. TO TO ADD-WRITE-I. ADD-DELETE-l. PERFORM DELETE. Figure 7-Example of deleting a test in the DOD CCVS an orderly fashion and at the same time provide a maximum amount of documentation for each change made. There are two reasons for the user to need to make modifications to the actual audit routines: a. 
If the compiler will not accept a form of syntax it must be eliminated in order to create a syntactically correct program. There are two ways to accomplish this. In the Procedure Division the NOTE statement is used to force the "invalid" statements to become comments. The results of this action would cause the test to be deleted and this would be reflected in the output. See Figure 7. OPERATING SYSTEM CONTROL CARD GENERATION The third problem was the generation of operating system control cards in the appropriate position relative to the source programs in order for the programs to be compiled, loaded and executed. This was the biggest challenge for VPMS1; a COBOL program which had to be structurally compatible with all COBOL compilers and which also had to be able to interface with all operating systems with a negligible amount of modification for each system. The philosophy of the output of VPMS1 is a file acceptable to a particular operating system as input. For the most part this file closely resembles what would normally be introduced to the operating system through the system's input device or card reader, i.e., control cards, source program, data, etc. The generation of operating system control cards is based on the specific placement of the statement and the requirement or need for specific statements to accomplish additional functions. These control cards are presented to VPMS1 in a form that will not be intercepted by the operating system and are annotated as to their appropriate characteristics. The body of the actual control card starts in position 8 of the input record. Position one is reserved for a code that specifies the type of control card. The following is allowed in specifying control cards: Initial control cards are generated once at the beginning of the file. Beginning control cards are generated before each source program with a provision for specifying control cards which are generated at specific times, i.e., JOB type cards, subroutine type cards, library control cards, etc. Ending control cards are generated after each source program with the same provision as beginning control cards. Terminal control cards are generated prior to the file being closed. Additional control cards are generated for ~ssigning hardware devices to the object program, bracketing data and for assigning work areas to be used by the COBOL Sort. There are approx:mately 25 files used by the entire set of validation routines for which control cards may need to be prepared. In addition to the control cards and information for the Environment Division, the total number of control statements printed for VPMS1 could be in the neighborhood of 200 card images and the possible number of generated control cards on the output file could be as large as 5000. The saving in time and JCL errors that could be prevented should be obvious at this point. This Environmental information need not be provided by the user because once a set of VPMS1 control cards has been satisfactorily debugged on the system in question, they can be placed in the library file that contains the same program so that a single request could extract the VPMS1 control cards for a given system. CONCLUSION It has been demonstrated that the validation of COBOL compilers is possible and that the end result is beneficial to both compiler writers and the users of these compilers. 
The ease with which the DOD CCVS can be automatically adapted to a given computer system has eliminated approximately 85 to 90 percent of the work involved in validating a COBOL compiler. Although most compilers are written from the same basic specifications (i.e., the American National Standard COBOL, X3.23-1968, or the CODASYL COBOL Journal of Development) the results are not always the same. The DOD CCVS has exposed numerous compiler bugs as well as misinterpretations of the language. Due to this and similar efforts in the area of The DOD COBOL Compiler Validation System compiler validation, the compatibility of today's compilers has grown to a high degree. Weare now awaiting the next version of the American National Standard COBOL. The new specifications will provide an increased level of compatibility between compilers because the specifications are more definitive and contain fewer "implementor defined" areas. In addition, numerous enhancements and several clarifications have been included in the new specification- 827 all contributing to better software, both at the compiler and the application level. REF'ERENCES 1 American National Standard COBOL X3.23-1968 American National Standards Institute Inc. New York 1968 2 COBOL-61 Conference on Data System Languages U. S. Government Printing Office Washington D. C. 1961 A prototype automatic program testing tool by LEON G. STUCKI McDonnell Douglas Astronautics Company Huntington Beach, California Dijkstra, in relation to both hardware and software "mechanisms," continues by stating: " ... as a stow-witted human being I have a very small head and had better learn to live with it and to respect my limitations and give them full credit, rather than try to ignore them, for the latter vain effort will be punished by failure." -Edsger W. Dijkstra "The straightforward conclusion is the following: a convincing demonstration of correctness being impossible as long as the mechanism is regarded as a black box, our only hope lies in not regarding the mechanism as a black box. I shall call this 'taking the structure of the mechanism into account.' "1 SOFTWARE SYSTE1VIS l\1EASUREl\1ENT TECHNIQUES As suggested by R. W. Bemer and A. L. Ellison in their 1968 IFIP report, the examination of hardware and software structures might incorporate similar test procedures: The measurement process plays a vital role in the quality assurance and testing of new hardware systems. To insure the reliability of the final hardware system, each stage of development incorporates performance standards and testing procedures. The establishment of software performance criteria has been very nebulous. At first the desire to "just get it working" prevailed in most software development efforts. With the increasing complexity of new and evolving software systems, improved measurement techniques are needed to facilitate disciplined program testing beyond merely debugging. The Program Testing Translator is an automatic tool designed to aid in the measurement and testing of software systems. A great need exists for new methods of gaining insight into the structure and behavior of programs being tested. Dijkstra alludes to this in a hardware analogy. He points out that the number of different multiplications possible with a 27-bit fixed-point multiplier is approximately 254. With a speed in the order of tens of microseconds, what at first might seem reasonable to verify, would require 10,000 years of computation. 
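Dijkstra's count refers to the roughly 2^54 distinct operand pairs that a 27-bit fixed-point multiplier can be presented with (two 27-bit operands). A rough check of the arithmetic, assuming on the order of twenty microseconds per multiplication, bears out the estimate: 2^54 ≈ 1.8 × 10^16 multiplications; at about 2 × 10^-5 seconds apiece this is roughly 3.6 × 10^11 seconds, or on the order of 10^4 years.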
1 With these possibilities for such a simple operation as the multiplication of two data items, can it be expected that a programmer anticipate completely the actions of a large program? "Instrumentation should be applied to software with the same frequency and unconscious habit with which oscilloscopes are used by the hardware engineer.' '2 Early attempts at the application of measurement techniques to software dealt mainly with efforts to measure the hardware utilization characteristics. In an attempt to further improve hardware utilization, several aids have been developed ranging from optimized compilers to automated execution monitoring systems. 3 •4 The Program Testing Translator, designed to aid in the testing of programs, goes further. In addition to providing execution time statistics on the frequency of execution for various program statements, the Program Testing Translator performs a "standards" check to insure programmers' compliance to an established coding standard, gathers data on the extent to which various branches of a program are executed, and provides data range values on assignment statements and DO-loop control variables. 829 830 Fall Joint Computer Conference, 1972 As was pointed out by Heisenberg, in reference to the measurement of physical systems, a degree of uncertainty is introduced into any system under observation. With Heisenberg's Uncertainty Principle in mind, the Program Testing Translator is presented as a "tool" to be used in the software measurement process. Just as using a microscope to determine the position of a free particle introduces a degree of uncertainty into observations, 5 so must it be concluded that no program measurement tool can guarantee the complete absence of all possible side effects. In particular, potential problems involving changes in time and space must be considered. For example, the behavior of some real-time applications may be affected by increased execution times. To avoid the use and development of more powerful program testing tools because of possible uncertainties, however, would be as great a folly as to rej ect the use of the microscope. DATA ANALYZED BY THE PROGRAM TESTING TRANSLATOR (2) The number and percentage of those branches and CALLs actually taken or executed. (3) The following specific data associated with each executable source statement. (a) detailed execution counts (b) detailed branch counts on all IF and GOTO statements (c) min/max data range values on assignment statements and DO-loop control variables. Several previous programs7 ,8,9 have provided interesting source statement execution data. The additional data range information provided by the Program Testing Translator, however, proves useful in further analyzing program behavior. Extended research investigating possible techniques for automatic test data generation will make use of these data range values. The long term goal of this research is directed toward designing a procedure for obtaining a minimal yet adequate set of test cases for "testing" a program. STANDARDS' CHECKING In a paper by Knuth a large sample of FORTRAN programs was quantitatively analyzed in an attempt to come up with a program profile. This profile was expressed in the form of a table of frequency counts showing how often each type of source statement occurs in a "typical" program. Knuth builds a strong case for designing profile-keeping mechanisms into major computer systems. 
6 Internal organization of the Program Testing Translator was designed with Knuth's table of frequency counts in mind. The Program Testing Translator gathers and analyzes data in two general areas: (1) the syntactic profile of the source program showing the number of executable, nonexecutable, * and comment statements, the number of CALL statements and total program branches, ** and the number of coding standard's violations, and (2) actual program performance statistics corresponding to various test data sets. With all options enabled, the actual program performance statistics produced by the Program Testing Translator include: (1) The number and percentage of those executable source statements actually executed. * Executable statements include assignment, control, and input! output statements. Nonexecutable statements include specification and subprogram statements. ** A branch will denote each possible path of program flow in all conditional and transfer statements (i.e., all IF and GOTO statements in FORTRAN). Although general in design, the initial implementation of the Program Testing Translator was restricted to the CDC 6500. Scanning the input source code, the Program Testing Translator flags as warnings all dialect peculiar statements which pose possible machine conversion problems. The standard is basically the ASA FORTRAN IV Standard10 with some additional restrictions local to McDonnell Douglas. The standard can easily be altered to reflect the needs of a particular installation, in contrast to previous compilers which have incorporated a fixed standard's check (e.g., WATFOR). DEVELOPMENT OF THE PROGRAM TESTING TRANSLATOR The Program Testing Translator serves as a prototype automatic testing aid suggesting future development of much more powerful software testing systems. The basic components of the system are the FORTRAN-toFOR TRAN preprocessor and postprocessor module. (See the section on Use of the Program Testing Translator.) A machine independent Meta Translatorll was used to generate the Program Testing Translator. Conventionally, moving major software processors between machines posed serious problems requiring either completely new coded versions, or the use of new metacompiler systems. 12 This lVleta Translator produces an Prototype Automatic Program Testing Tool 831 ASA Standard FORTRAN translator which represents an easily movable translation package. In general, for implementation on another machine, FORTRAN-to-FORTRAN processors such as the Program Testing Translator require only that the syntactic definition input to the Meta Translator be changed to reflect the syntax of the new machine's FO RTRAN dialect. INSTRUMENTATION TECHNIQUES The instrumentation technique used by the Program Testing Translator is to insert appropriate high-level language statements within the original source code making as few changes as possible. 13 Three memory areas are added to each main program and subroutine. One is used for various execution counts while the other two are used for the storage of minimum and maximum data range values for specified assignment statements and DO-loop control variables. The size of these respective memory areas depends upon the size of the program being tested and the options chosen. 
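The bookkeeping these areas support can be pictured with a small sketch. The sketch below is a loose, modern illustration in Python, not the translator's actual mechanism; the names and data structures are invented for clarity.

    # Illustrative bookkeeping corresponding to the three memory areas described
    # above: one area of execution counts and two areas of minimum and maximum
    # data-range values.  All names here are invented for the illustration.
    counts = {}        # statement identifier -> number of times executed
    lo, hi = {}, {}    # variable name -> smallest / largest value observed

    def count(stmt):
        counts[stmt] = counts.get(stmt, 0) + 1

    def record_range(var, value):
        lo[var] = min(lo.get(var, value), value)
        hi[var] = max(hi.get(var, value), value)

    # Conceptually, the instrumented form of an assignment such as ALPHA = ALPHA + 1
    # at statement 12 would, in addition to the assignment itself, perform:
    count(12)
    record_range("ALPHA", 1)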
Simple counters of the form QINT(i) = QINT(i) + 1 are inserted at all points prefixed by statement numbers, at entry points, after CALL statements, after logical IF statements, after DO statements, and after the last statement in a DO-loop.8,14 Additional counters are used to maintain branch counts on all IF and GOTO statements. Minimum and maximum data range values are calculated following each non-trivial assignment statement. These values of differing types are packed into the two memory areas allocated for this purpose. Minimum and maximum values may also be kept on all variables used as DO-loop control parameters. These values are calculated before entry into the DO-loop. USE OF THE PROGRAM TESTING TRANSLATOR Overall program flow of the Program Testing Translator is diagrammed in Figure 1. Basically, the user's FORTRAN source cards are input to the preprocessor. This preprocessor module outputs: (1) an instrumented source file to be compiled by the standard FORTRAN compiler and (2) an intermediate data file for postprocessing. This intermediate file contains a copy of the original source code with a linkage field for extracting the profile and execution-time data for the program. Figure 1-Program testing translation job flow The object code produced by the FORTRAN compiler is linked with the postprocessor from an object library. The resulting object module can now be saved along with the intermediate Program Testing Translator data file. Together they can then be executed with any number of user test cases. Using the intermediate file built by the preprocessor and data gathered while duplicating the original program results, the postprocessor generates reports showing program behavior for each specific test case. Analysis of these reports will help eliminate redundant test cases and point to sections of the user's program which have not yet been "tested." Examination of these particular areas may lead to either their elimination or the inclusion of modified test cases to check out these program sections. Preliminary measurements indicate that the execution time of the instrumented object module, with all options enabled, is approximately one and one half to two times the normal CPU time. Increases in I/O time are negligible in most cases. ACTUAL TEST EXPERIENCE Although the Program Testing Translator has only been available for a short time, several interesting results have come to light. One of the first major subroutines, processed at McDonnell Douglas, was an eigenvalue-eigenvector subroutine believed to be the most efficient algorithm currently available for symmetric matrices.15,16 Of the
A redesigned machine language subroutine replacing the original loop has now cut total subroutine execution time by one third. Further analysis of the same program, in an attempt to determine why several sections .of code were not executed, revealed a logic error making it impossible to ever reach one particular section of code. This error, which was subsequently corrected, can be seen on the first page of the original run contained in the Appendix. This was a subroutine experiencing a great deal of use and thought to be thoroughly checked out. Running the Program Testing Translator through itself has resulted in savings of over 37 percent in CPU execution times. The standard's checking performed by the Program Testing Translator has verified the machine independence of the Meta Translator. Table I contains a summary of the actual program statistics observed on the first eight major programs run through the Program· Testing Translator. It is interesting to note that only 45.9 percent of the possible executable statements were actually executed. Of more importance, however, is the fact that only 32.5 percent of all possible branches were actually taken. Table II compares the class profile data gathered at lVlcDonnell Douglas by the Program Testing Translator with the Lockheed and Stanford findings cited by Knuth. 6 The syntactic profile of the l\1cDonnell Douglas and Lockheed samples were remarkably similar. Stanford's "student" job profile shows much less TABLE I-Actual Program Statistics with the Program Testing Translator Program Total Number of Statements AB33 1,578 AD77 11,111 F999 2,833 JOYCE 3,033 META 1,125 MI01 PTT 775 772 UT03 1,445 TOTALS 22,672 No. of Comment Statements Percentage of Total 355 22.5 3,847 34.6 644 22.7 176 5.8 86 7.6 189 24.4 44 5.7 54 3.7 5,395 23.8 No. Other Nonexecutable Statements Percentage of Total 177 11.2 905 8.1 257 9.1 372 12.3 534 47.5 40 5.2 249 32.3 254 17.6 2,788 12.3 No. Standard's Violations Percentage of Total 9 0.6 33 0.3 28 1.0 65 2.1 1 0.1 1 0.1 23 3.0 44 3.0 204 1.0 1,046 66.3 6,359 57.2 1,932 68.2 2,485 81.9 505 44.9 546 70.5 479 62.0 1,137 78.7 14,489 63.9 No. Actually Executed Percentage Executed 678 64.8 2,213 34.8 1,155 59.8 846 34.0 419 83.0 392 71.8 364 76.0 584 51.4 6,651 45.9 No. of Branches Avg./Exec. Statements No. Actually Executed Percentage Executed 357 0.34 195 54.6 2,635 0.41 571 21.7 355 0.70 203 57.2 189 0.35 112 59.3 333 0.70 175 52.6 510 0.45 175 34.3 6,956 0.48 2,261 32.5 No. of CALL Statements Avg.jExec. Statements No. Actually Executed Percentage Executed 20 0.02 18 90.0 369 0.06 119 32.2 32 0.06 21 65.6 9 0.02 3 33.3 19 0.04 5 26.3 99 0.09 76 76.8 912 0.06 335 36.7 No. Executable Statements Percentage of Total Total Statement Exec. Counts (in thousands) 26,772 2,929 859 1,718 0.44 0.69 376 454 43.8 26.4 86 0.04 26 30.2 112 278 0.11 67 24.1 1,129 5,284 1,133 1,087 71 38,517 Prototype Automatic Program Testing Tool TABLE II-A Comparison of Syntatic Class Profiles McDonne~1 Douglas Total No. State22,672 ments 23.8 Percentage Comments 12.3 Percentage Other Nonexecutable 63.9 Percentage Executable Avg. No. 0.48 Branches/ Executable Statement Avg. No. CALLs/ 0.06 Executable Statement Lockheed* 245,000 Stanford* 10,700 833 l\1odeled after development of the FORTRAN-toFORTRAN system, instrumentation systems for other languages of heavy use such as COBOL or PL/l might well be developed. 
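A composite report of the kind suggested above amounts to little more than summing the counter areas from the individual runs. A minimal sketch follows, written in Python with hypothetical run data; it illustrates the idea only and is not the PTT postprocessor.

    # Merge the statement counters from several test runs and report which
    # executable statements remain untested.  The run data are hypothetical.
    def composite(runs):
        merged = [sum(counts) for counts in zip(*runs)]   # total count per statement
        executed = sum(1 for c in merged if c > 0)
        coverage = 100.0 * executed / len(merged)
        untested = [i + 1 for i, c in enumerate(merged) if c == 0]
        return merged, coverage, untested

    runs = [[4, 0, 9, 0],      # counters from test case 1
            [2, 3, 0, 0]]      # counters from test case 2
    merged, coverage, untested = composite(runs)
    print(coverage, untested)  # 75.0 percent of statements executed; statement 4 untested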
The most important area now being investigated, however, is the possible design of extensible automatic testing aids to provide for the automatic generation of test data. Evolvement of future testing tools along these lines would greatly aid the quality assurance aspects of large software systems. 2l.6 10.2 10.6** 12.3** 67.8 77.5 0.54 0.32 ACKNOWLEDGMENT 0.04 The research described in this report was carried out by the author under the direction of T. W. l\1iller, Jr. and R. G. Koppang in the Advance Computer Systems Department at the l\1cDonnell Douglas Astronautics Company in Huntington Beach, California. 0.09 * Note: These figures represent this author's best attempt at extrapolating comparable measurements from Knuth's paper.6 Knuth's percentage figures had to be corrected by adding the comment statements into the total number of statements. Calculations of the average number of branches per executable statement require two assumptions: (1) 30 percent of the IF statements had 3 possible branches· while 70 percent had 2 branches, (2) 96 percent of the GOTO .statements were unconditional (i.e., 1 branch), while 4 percent were switched (i.e., 2 branches were assumed). ** Includes the following: FORMAT, DATA, DIMENSION, COMMON, END, SUBROUTINE, EQUIVALENCE, INTEGER, ENTRY, LOGICAL, REAL, DOUBLE, OVERLAY, EXTERNAL, IMPLICIT, COMPLEX, NAMELIST, BLOCKDATA. emphasis on internal documentation (i.e., fewer comment statements) and also exemplifies a more straightforward approach to flow of control (as seen in Stanford's 0.32 branches per executable statement compared to 0,48 and 0.54 branches per executable statement for the two aerospace companies) . EXTENSIONS OF THE PROGRAl\1 TESTING TRANSLATOR As alluded to in earlier sections, much more powerful testing systems can and should be built in the future. Relatively simple changes to the postprocessor module could enable the execution time data from multiple test runs to be combined automatically into composite test reports. Changes to the translator module might provide the options of first and last values on assignment statements as well as range values. REFERENCES 1 E W DIJKSTRA Notes on structured programming Technological University Eindhaven The Netherlands Department of Mathematics April 1970 TH Report 70-WSK-03 2 R W BEMER A L ELLISON Software instrumentation systems for optimum performance IFIP Congress 1968 Proceedings Software Session 2 Booklet C p 39-42 3 System measurement software SMS/360 problem program efficiency Boole and Babbage Inc. Product Description Palo Alto California May 1969 Document No. S-32 Rev-l 4 D N KELLY Spy a computer program execution monitoring package McDonnell Douglas Automation Company Huntington Beach California MDC G2659 December 1971 5 W HEISENBERG The uncertainty principle Zeitschrift fuer Physic Vol 43 1927 6 D E KNUTH An empirical study of FORTRAN programs Stanford Artificial Intelligence Project Memo AIM-137 Computer Science Dept Report No. CS-186 7 1ST LT G W JOSEPH The fortran frequency analyzer as a data gathering- aid for computer system simulation. 
Electronics Systems Division United States Air Force L G Hanscom Field Bedford Massachusetts March 1972 8 D H H IGNALLS FETE A FORTRAN execution time estimator Computer Science Department Stanford University STAN-CS-71-204 February 1971 9 CDC 6500 FWW user's manual TRW Systems Group Redondo Beach California 10 Proposed American standard X3-1,..3-FORTRAN Inquiries addressed to X3 Secretary BEMA 235 E 42nd Street N ew York NY March 1965 834 Fall Joint Computer Conference, 1972 11 Meta translator Advanced Computer Sciences Department McDonnell Douglas Astronautics Company Huntington Beach California currently in preparation 12 A R TYRILL The meta 7 translator writing system Master's Thesis School of Engineering and Applied Science University of California Los Angeles California Report 71-22 September 1971 13 E C RUSSELL A utomatic program analysis PhD Dissertation in Engineering School of Engineering and Applied Science University of California Los Angeles California 1969 14 V G CERF Measurement of recursive programs Master's Thesis School of Engineering and Applied Science University of California Los Angeles California 70-43 May 1970 15 S J CLARK Computation of eigenvalues and eigenvectors of a real symmetric matrix using SYMQRl Advanced Mathematics Department McDonnell Douglas Astronautics Company Huntington Beach California Internal Memorandum A3-830-BEGO-71-07 November 1971 16 S J CLARK Further improvement of subroutine SYMQRl Advanced Mathematics Department McDonnell Douglas Astronautics Company Hungtington Beach California Internal Memorandum A3-830-BEGO-SJC-094 March 1972 17 F E ALLEN Program optimization Ann Rev in Automatic Programming 5(1969) pp 237-307 APPENDIX COUNT 'ItOGIt"K loUTING CloUDING 't INDICATES CONVE"1l0N WUNINOS) . , " .n. NUl'll' GO TO .sao I'CABSCS) .Ii'. USce» GO TO 350 It-. IIC • -SeECI) ,DCI*1) , on.u •K 37'0 3613 2601 2601 2601 2607 2607 2607 2607 2607 2607 lOU 1066 1066 * ceDCI.l' o • ceeCS.u ~~SnN,~~O TO RI!TURN It • hNO"M ;J~fC En-u • GO. TO no Q 3'0 , , cuec n • seDCZ.U Q , SeeCl*U DUtU • C-,/5 • I( i-(.-.1) 11 C-fU.U "SlUN 360 TO RETUR'" GO TO tOO ItO It It ti NO"" 180 CO~TlNU& 11)66 1066 1066 11)66 3790 117 117 117 TI"' • ceiONMIl * S-DCNU' IHIW) ..httHUIt1' - CaD CNU) &, HUMU C ~ Q C !! TEMP INfERNAL P!lOCIDII"! n TRue 111 TRUE 1066 MIN .. 9,""'14'E-01 MI N~.l. 2023366!-01 MI "" .. 2. 41J7164'E.01 1'111'4 ... 1.244'6611 01 BRANCH 1 2607 MIN"'1.01000"1-01 HIN •• l. 2443661E-01 BRANCH 1 2601 1'1 I'h.l. 2O~8'61!.0l HI N•• 6 • 12e4361E .. 02 MI N"'1.156'33'E-Ol fltl rt-d.l O~.4flE"02 S~ANCH 1 1066 MIN. '.U'10"!.U HI "" •• 1. '6D24UE.n Ht"~·1.~n~1 .., IN"-l. 5602425e.03 3613 FAI.SE 2607 FALSE MU·9."U60n-u MAX. 1.2SI6663e.01 MU. 2.4810,'4hOl flU. 1.115I685E 01 MU· lilOO1"GE-tat MAX. 1. 17'26"Eoo01 MUw MAX. HAX. MU. 1.U 4 Z4UEa01 4.0277033Ea01 1.238302,8E·n o1io12""~ fUX- 4. ZlS0652hDl MAXu 3.1436U2E.Q3 itl~-1-.-nh6~»- MAX. 3.14;56612E,,03 CAI;Ctll;ATi 'iole ASTATIeN e S R A I ! ! S p O N D I N I t - f l - - - - - - - - - - - - - - - - - - - - - - - - - - - - THI VECTORCIfI,Q" 11' GO f& ' " '00 '1' .A81(1I) 0; "'8tU) NORM !I' PII!SQRT U • 0 • CQO/pp, •• 2) GO TO no 0-.01 ~ TO , u - HENCE, NORM. OO-SORT(1.0 - Cp'/QU'*Z) '20 C • '/NORM I , blllOR11 GO TO !I!TURN. C310,34D.~U) ,It C • IMPOSSIBLE TO REACH AS IS IIGE~VECTO"S ( I ' ANY! BY INVERSE ITERATION. C ~ C I' AI.I. illlNVALUES WERE OIlUIN&D (F411. • 0), THEN AI.I. 
UGENV!CTORI C WERE E!TIoIIR COH'UTED AI.READY BY ROTA TI ON OM THE REM AI NI NG E lG!NYECTORS IN'fER~[ "[RATIO''" C1"'''11' • 11, IF AI;;l;; EI8EN'IAI;;I;IES V ~ERII NOT OITAINED C'All. .NE. 0), STORE ERROR FLAG IN Ii Rotell, .tt,IIOTC"II.I, jlHICH INDICATES !HEA!: WILl, 9E NO EIGEt>iYECTOIIS e F-oft .Tio/t ttHNV-kL-vES", 'OUftD. ALSO, IIRO OUT REMAINING 1.0enioNt OF ROT A"RAY. lEE COMMENT .0I.LOWING STATEMENT NUMBER 621. G H.... II IBUINiB IV 3190 5790 Z11' 2119 NEVER TRUE i.~ S I 1).0 NORM. 0.0 C 'IND REMAHIING ~790 00 MUST >ZERO fU "eM ,fl. i "EcrrlC ElIEeU,.rON DATA eAAffOH- 1 11' 1'1 I N. 1.21571631-0' 1'41 ",. 1. UD2833E-U T~Ij! U1t MI "". ". ".53"!-O4 SRANelol 1 2'1' MAX. '.0125388;Jbos. MAX, 01. 2UO"2~.01 ~AI;Se 2Yl' MAX. '. 4Z'38Ub01 tOff0- FiLff 10-11 TItIti 10 7 s. HI N, 1.6846126E .. 03 "AX. ". 2130,,2E"01 .- 31'0 5"'0 3190 MIN, .. i.OODOOOOe-oo MIN .. '. "'''tOedl BRANCH 1 11' IRANCf04 3 1066 MAXu i.OOOOOOOE-OO flAX- " '''UUE 01 BUNCH 2 2601 Prototype Automatic Program Testing Tool P~OO'UH !.ISTING C!.UDINQ v I~DIClTES COUNT CONV6RStON WARNINGS) .,Q. 835 SPtCIF'tC ElCECUTIOIII DAU GO TO 612 BIIANC'" " 62' 1'(1'1 NI 00 TO 627 TRue o ,J " M • 1 o DO 626 J • J,N -C-----626 VC I, tVEC) ~ 0.0 C C IF' !TEll .G'. 6, COM"UTE RESIDUAl. USING THE TRIDJAIiONAI. HATRIX AND c THE EIGENVECTOR VCJdVEC) OF' THE TRIDIAGO~AL HATRIX. IF' THE RESIDUAL C VECTOJI HAS ALL ELEME~TS LESS THAN l,OE"8 I ~ ABSOLUTE VAlour;, VCl,r VEC) -C-t~-u----'N'I!"~ !I QENV!CTOR --.-mHtM-ttvff1I----11Ij+lt:1LLr-tC~O~"''f_T....IHIllr'lZ~Ett''O~.;------------------------------ C IF NOT, ROTnVEC) cONTAINS TH-E I.ARGEST EL.EMENT OF' THI: RESIDUAl. C GREATER T"'.N ~.. OE-8, C 627 H'C nEP .LE. 6) GO TO 630 59 FALSi " TJlUE DO 628 ( - l,N o - en 8CII • DtCI""---~---------------------SUI'I1 • AiS 1 may be the appropriate distribution for representing time between failures. The log-normal and gamma (with appropriate choice of parameters) are other functions with an increasing hazard rate which may also be appropriate for this phase. A pplication oj Reliability Theory to SoJtware There are also major differences between hardware and software reliability. These are listed below: • Stresses and wear do not accumulate in software from one operating period to another as in the case of certain equipment; however, program quality may be different at the start of each run, for the reason given below. • In the case of hardware, it is usually assumed that between the burn-in and wearout stages an exponential distribution (which means a constant hazard rate) applies and that the probability of failure in a time interval t is independent of equipment age. However, for software, there may be a difference in the initial "state of quality" between operating periods due to the correction of errors in a previous run or the introduction of new errors as the result of correcting other errors. Thus it is appropriate to employ a reliability growth model which would provide a reliability prediction at several points in the cumulative operating time domain of a program. • For equipment, age is used as the variable for reliability prediction when the equipment has reached the wearout stage. Since with software, the concern is with running a program repeatedly for short operating times, the time variable which is used for reliability purposes is the time between troubles. 
However, cumulative operating time is a variable of importance for predicting the timing and magnitude of shifts in the reliability function as a result of the continuing elimination of bugs or program modification. Over long periods of calendar or test time, there will 1.0 When applied to software reliability, many of the basic concepts and definitions of reliability theory remain intact. Among these are the following: • Definition of reliability R(t) as the probability of successful program operation for at least t hours • Probability density function J(t) of time between software troubles, or, equivalently, the time rate of change of the probability of trouble • Hazard rate z(t) as the instantaneous trouble rate, or, equivalently, the time rate of change of the conditional probability of trouble (time rate of change of probability of trouble, given that no trouble has occurred prior to time t) 841 OPERATING TIME -t CUMULATIVE TEST TIME Figure 3-Reliability growth 842 Fall Joint Computer Conference, 1972 Step 1. Assemble Data TEST DATAl ASSEMBLE DATA IDENTIFY STATISTICAL DISTRIBUTION POINT AND { CONFIDENCE LIMIT ESTIMATES ESTIMATE RELIABILITY PARAMETERS ESTIMATE RELIABILITY FUNCTION RELIABILITY AND CONFIDENCE LIMIT It MAKE RELIABILITY PREDICTION Data must first be assembled in the form of a time between troubles distribution as was indicated in Figure 2. At this point, troubles are also classified by type and severity. Step 2. Identify Statistical Distribution In order to identify the type of reliability function which may be appropriate, both the empirical relative frequency function of time between troubles (see example in Figure 5) and the empirical hazard function are examined. The shapes of these functions provide qualitative clues as to the type of reliability function which may be appropriate. For ~xample: • A monotonically decreasing jet) and a constant z(t) suggest an exponential function. • An f(t) which has a maximum at other than t = 0 and a z(t) which monotonically increases suggests: -Normal function or -Gamma function or - Weibull function with (3 > 1. • A monotonically decreasing j( t) and a monotonically decreasing z(t) suggest a Weibull function with (3 < 1. After some idea is obtained concerning the possible distributions which may apply, point estimates of the parameters of these distributions are obtained from the sample data. This step is necessary in order to perform goodness of fit· tests and to provide parameter Figure 4-Steps in reliability prediction be shifts in the error occurrence process such that different hazard rate and probability density functions are applicable to different periods of time; or, the same hazard and probability density functions may apply but the parameter values of these functions have changed. This shift is depicted in Figure 3, where the reliability function, which is a decreasing function of operating time is .shown shifted upward at various points in cumulative test time, reflecting long-term reductions in the trouble rate and an increase in the time between troubles. Time Between Troubles Distributions Ship I Program I .4 .3 .2 .1 O~--L-~--------------~--~--~ Approach 0-.9 1.0-1.9 2.0-2.9 Time Between Troubles (Hours) The steps which are involved . in one approach to software reliability prediction are shown in Figure 4. 
_______ 4.0-4.9 6.0-6.9 3.0-3.9 S.9-!5.9 7.0-7.9 Figure 5-Probability density functions Approach to Software Reliability Prediction and Quality Control Kolmogorov - Smirnov Exponential Test Program I Many Ships* 1.0- Program Run Time Distribution Function 1.0 " Q .9 4l " " .8 '" ~ ~, .7 ~,,~ .6 .5 , , " "'" \, .4 \ .3 ~ \ \ ~ I J I 1 I ---I * d = ± .139 Confidence Band a =.05 Level of Significance N = 93 Data Points ~~ ~" "~ I 843 '~ • ~~ .2~--~--~~--~~-.~~--~~~_~~--~-~--~--~ x ~ Upper Confidence Limit ,~ .I .09~----~----~----+\-----+----~,-----+-----+----~----~ .08~--~----~----~\~--~----~~~~--~--*--~----~--~ .01 .06 / /"\ \ I / , "r\. / From Table of lid II Distribution Values - / " V " .05 t-------+'---/---+--"7"'~+___r_~--+--V~x--+----_x~~-t----f--------I .04 .- Lower Confidence Limit 03 ~ . Theorectical Exponential Distribution Function .02 ,... e-· I\.. / I / 487T x """ Empirical Data - - - - " *Ships I, 2, 3, 4, 5, 6, 7. .01 I I 2 1 4 5 Program Run Time (Hours) Figure 6-Goodness of fit test 6 7 8 844 Fall Joint Computer Conference, 1972 estimates for the reliability function. In order to make a goodness of fit test, it is necessary to provide an estimate of the theoretical function to be used in the test. This is obtained by making point estimates of the applicable parameters. In the case of the one parameter exponential distribution, the point estimate would simply involve computing the mean time between troubles = total cumulative test time/number of troubles, which is the maximum likelihood estimator of the parameter T in the exponential probability density functionf(t) = liTe-tIl. In the case of a multiple parameter distribution, the process is more involved. For the Weibull distribution, the following steps are required to obtain parameter point estimates: - - - A logarithmic transformation of the hazard function is performed in order to obtain a linear function from which initial parameter values can be obtained. The initial values are used in the maximum likelihood estimating equations in order to obtain parameter point estimates. The probability density, reliability and hazard functions are computed using the estimated parameter values. At this point, a goodness of fitness test can be performed between the theoretical probability density and the empirical relative frequency function or between the theoretical and empirical reliability functions. The Kolmogorov-Smirnov (K-S) or Chi Square tests can be employed for this purpose. An example of using the K -S test is shown graphically in Figure 6. This curve shows a test with respect to an exponential reliability function. Shown are the upper and lower confidence limits, the theoretical function and the empirical data points. Since the empirical points fall within the confidence band, it is concluded that the exponential is not an unreasonable function to employ in this case. Step 3. Estimate Reliability Parameters Confidence Limits The point estimate of a reliability parameter provides the best single estimate of the true population parameter value. However, since this estimate will, in general, differ from the population parameter value due to sampling and observational errors, it is appropriate to provide an interval estimate within which the population parameter will be presumed to lie. Since only the lower confidence limit of the reliability func- tion is of interest, one-sided confidence limits of the parameters are computed. 
In Figure 7 is shown an example of the results of the foregoing procedure, wherein, for an exponential distribution, the point estimate of mean time between troubles (MTBT) is 2.94 hours (hazard rate of .34 troubles per hour) and the lower confidence limit estimate of MTBT is 2.27 hours (hazard rate of .44 troubles per hour). The lower confidence limit of MTBT for an exponential distribution is computed from the expression Ti = 2nl/x~n.l-a where· T: is the lower confidence limit of MTBT, n is number of troubles, l is the MTBT (estimated from sample data), x2 is a Chi-Square value and a is the level of significance. Step 4. Extimate Reliability Function With point and confidence limit estimates of parameters available, the corresponding reliability functions can be estimated. The point and lower limit parameter estimates provide the estimated reliability functions R = e-·34t and R = e-· 44t , respectively, in Figure 7. In this example, the predicted reliabilities pertain to the occurrence of all categories of software trouble, i.e., the probability of no software troubles of any type occurring within the operating time of t hours. Step 5. Make Reliability Prediction With estimates of the reliability function available, the reliability for various operating time intervals can be predicted. The predicted reliability is then compared with the specified reliability. In Figure 7, the predicted reliability is less than the specified reliability (reliability objective) throughout the operating time range of the program. In this situation, testing must be continued until a point estimate of MTBT of 5.88 hours (.017 troubles per hour hazard rate) and a lower confidence limit estimate of MTBT of 4.55 hours (.022 troubles per hour hazard rate) are obtained. This result would shift the lower confidence limit of the predicted reliability function above the reliability objective. Estimating test requirements For the purpose of estimating test requirements in terms of test time and allowable number of troubles, curves such as those shown in Figure 8 are useful. This set of curves, applicable only to the exponential reliability function, would be used to obtain pairs of (test Approach to Software Reliability Prediction and Quality Control Reliability Function and Its Confidence Limit for Program It Ship I Using Exponential Reliabi I ity Function. a =.05 Level of Significance 1.0 Reliability Required to Satisfy Reliability Objectives* R=e -.017 t' Lower Confidence Limit R =e-· 022 T ......... s .9 .8 Reliability Objective (Assumed) .7 r. .6 .5 ~ +- :cc Q) Exponential Reliability (Existing) R = e -.341:' .4 Lower Confidence Limit (Existing) R =e -.44 t' .3 a:: .2 .I 2 3 4 5 Operating Time (Hours) *Assumino Zero R = e-· 020 T 6 7 t' Troubles During Remaining Tests. For 10 Troubles During Remaining Tests. Figure 7-Reliability prediction 845 846 Fall Joint Computer Conference, 1972 Amount of Test Time Required to Achieve Indicated Lower Limit of Reliability 800 Exponential Reliability Function R, •.ge.... 1 hr. 700 - T, =19.5 hr•. 600 1 500 1 400 : .1 R, =.90.... , hr. 300 Tl =9.48 hro . ,. R ·.8!S ... ·Ihr. ~ 200 T,.a.l!ShrL 100 2 4 6 8 10 12 14 16 18 .20 22 24 26 28 30 Number of Trouble. During Te.t Figure 8-Test requirements time, number of troubles) values. The satisfaction during testing of one pair of values is equivalent to satisfying the specified lower limit of reliability Rz for t hours of operation. 
For example, if a program reliability specification calls for a lower reliability confidence limit of .95 after 1 hour of operating time, this requirement would be satisfied by a cumulative test time of 100 hours and no more than 2 troubles; a cumulative test time of 200 hours and no more than 6 troubles; a cumulative test time of 300 hours and no more than 10 troubles, etc. The required test time is estimated from the relationship T = tχ²(2n, 1-α)/[2 ln(1/R_L)], where T is the required test time, t is the required operating time, χ² is a Chi-Square value with 2n degrees of freedom, n is the number of troubles, R_L is the required lower limit of reliability and α is the level of significance.
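The relationship just given can be evaluated directly to generate such (test time, number of troubles) pairs. The following sketch is an illustration only; the Python form and the function name are assumptions, not the paper's programs.

# Illustrative evaluation of the test requirement relationship
# T = t * chi-square(2n, 1 - alpha) / (2 ln(1/R_L)).
import math
from scipy.stats import chi2

def required_test_time(t, n_troubles, r_lower, alpha=0.05):
    """Cumulative test time needed if no more than n_troubles occur during test."""
    return t * chi2.ppf(1.0 - alpha, 2 * n_troubles) / (2.0 * math.log(1.0 / r_lower))

# Pairs for a required lower reliability limit of .95 after 1 hour of operation.
for n in (2, 6, 10):
    print(n, "troubles ->", round(required_test_time(1.0, n, 0.95)), "hours")
# Prints roughly 92, 205 and 306 hours, in line with the rounded
# 100/200/300 hour figures cited in the example above.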
B., "Software Reliability Research," McDonnell Douglas Astronautics Company Paper WD1808, Navember 1971. Approach to Software Reliability Prediction and Quality Control 4 BERNARD ELSPOS et al Software reliability Computer January-February 1971 pp 21-27 5 ARNOLD F GOODMAN The interface of computer science and statistics Naval Research Logistics Quarterly Vol 18 No 2 1971 pp 215-229 6 K U HANFORD A utomatic generation of test cases IBM Systems Journal Vol 9 No 4 1970 pp 242-256 7 Z JELINSKI P B MORANDA Software reliability research McDonnell Douglas Astronautics Company Paper WD 1808 November 1971 8 JAMES C KING Proving programs to be correct IEEE Transactions on Computers Vol C-20 Noll November 1971 pp 1331-1336 847 9 HENRY CLUCAS Performance evaluation and monitoring Computing Surveys Vol 3 No 3 September 1971 pp 79-91 10 R B MULOCK Software reliability engineering Proceedings-Annual Reliability and Maintainability Symposium San Francisco California 25-27 January 1972 pp 586-593 11 R J RUBEY R F HARTWICK Quantitative measurement of program quality Proceedings-1968 ACM National Conference pp 672-677 12 N F SCHNEIDEWIND A methodology for software reliability prediction and quality control Naval Postgraduate School Report No NPS55SS72032B March 1972 The impact of prohlem statement languages on evaluating and improving software performance by ALAN MERTEN and DANIEL TEICHROEW The University oj Michigan Ann Arbor, Michigan INTRODUCTION overall problem of methodology. Its solution calls for a review of the process itself, so that maximum benefit can be had from the use of computers." This paper is concerned with one technique-problem statement languages-which are in accordance with the above philosophy as regards the system building system. They are an answer to the question: "How shall we conduct systems building now that computers are available'?" Problem statement languages are a class of languages designed to permit statement of requirements for information systems without stating the processing procedures that will eventually be used to achieve the desired results. Problem statement languages are used to formalize the definition of requirements, and problem statement analyzers can be used to aid in the verification of the requirement definition. These languages and analyzers have a potential for improving the total process of system building; this paper, however, will be limited to their role in software evaluation. Software systems can be evaluated in three distinct ways. User organizations are primarily interested in whether the software produced performs the tasks for which it was intended. This first evaluation measure is referred to as the validity of the software. "Invalid" software is usually the result of inadequate communication between the user and the information system designers. Even in the presence of perfect communication, the software system often does not initially meet the specifications of the user. The second evaluation measure of software systems is their correctness. Software is correct if for a given input it produces the output implied by the user specifications. "Incorrect" software is usually the result of programming or coding errors. A software system might be both valid and correct but still might be evaluated as being inefficient either by the user or the organization responsible for the com- The need to improve the methods by which large software systems are constructed is becoming widely recognized. 
For example, in a recent study (Office of Management and Budget!) to improve the effectiveness of systems analysts and programmers, a project team stated that: "The most important way to improve the effectiveness of the government systems analysts and programmers is by reducing the TIME now spent on systems analysis, design, implementations, and maintenance while maintaining or improving the present level of ADP system quality." As another example, a study group (U.S. Air Force2) concluded that to achieve the full potential of command and control systems in the 1980's would require research and development in the following aspects of system building: requirements analysis and design, software system certification, software timeliness and flexibility. Software evaluation is one part, but only a part of the total process of building information systems. The best way to make improvements is to examine the total process. This has been pointed out elsewhere (Teichroew and Sayani3) and by many others; Lehman, 4 for example, states: "When first introduced, computers did not make a significant impact on the commercial world. The real breakthrough came only in the 1950's when institutions stopped asking, 'Where can we use computers?' and started asking, 'How shall we conduct our business now that computers are available?' In seeking to automate the programming process, the same error has been committed. The approach has been to seek possible applications of computers within the process as now practiced.. . . Thus the problem of increasing programming effectiveness, through mechanization and tooling, is closely associated with the 849 850 Fall Joint Computer Conference, 1972 puting facility. This characteristic is termed' performance and is measured in terms of the amount of resources required by a software package to produce a result. The processing procedures and programs and files might be poorly designed such that the system costs too much to run and/or makes inefficient use of the computing resources. Poor performance of software is usually the result of inadequate attention to design or incorrect information on parameters that affect the amount of resources used. Before discussing the value of problem statement languages in software evaluation, the concept is described briefly. The impact on software validity is also discussed. Once requirements in a problem statement language are available, it becomes possible to provide computer aids to the analysis and programming process which reduce the possibility of introducing errors. This impact of problem statement languages on software correctness is discussed later in the paper. In general, a given set of requirements can be implemented on a given complement of hardware in more than one way and each way may use a different amount of resources. The potential of problem statement languages to improve software performance is examined. Some preliminary conclusions from work to date on problem statement languages and analyzers related to software evaluation, and the impact of software evaluation on the design and use of problem statement languages and analyzers are discussed. PROBLEM STATEMENT LANGUAGES AND PROBLEM STATEMENT ANALYZERS Problem statement languages were developed to permit the problem definer (i.e., the analyst or the user) to express the requirements in a formal syntax. Examples of such languages are: a language developed by Young and Kent;5 Information Algebra;6 TAG;7 ADS;8 and PSL. 
9 All of these languages are designed to allow the problem definer to document his needs at a level above that appropriate to the programmer; i.e., the problem definer can concentrate on what he wants without saying how these needs should be met. A problem statement language is not a generalpurpose programming language, nor, for that matter, is it a programming language. A programming language is one that can be used by a programmer to communicate with a machine through an assembler or a compiler. A problem statement language, on the other hand, is used to communicate the need of the user to the analyst. The problem statement language consequently must be designed to express what is of interest to the user; what outputs he wishes from the system, what data elements they contain, and what formulas define their values and what inputs are available. The user may describe the computational procedures and/or decision rules that must be used in determining the values of certain intermediate or output values. In addition, the user must be able to specify the parameters which determine the volume of inputs and outputs and the conditions (particularly those related to time) which govern the production of output and the acceptance of input. These languages are designed to prevent the user from specifying processing procedures; for example, the user cannot use statements such as SORT (though he is allowed to indicate the order in which outputs appear), and he cannot refer to physical files. In some cases the languages are forms oriented. In these cases, the analyst using the problem statement language communicates the requirements by filling out specific columns of the forms used for problem definition. Other problem statement languages are free-form. The difficulty of stating functional specifications in many organizational systems has been well recognized. (Vaughan10): "When a scientific problem is presented to an analyst, the mathematical statement of the relationships that exist between the data elements of the system is part and parcel of the problem statement. This statement of element relationships is frequently absent in the statement of a business problem. The seeming lack of mathematical rigor has led us to assume that the job of the business system designer is less complex than that of the scientific designer. Quite the contrary-the job of the business system designer is often rendered impossible because the heart of the problem statement is missing! The fact that the relationships between some of the data elements of a business problem cannot be stated in conventional mathematical notation does not imply that the relationships are any less important or rigorous than the more familiar mathematical ones. These relationships form a cornerstone of any system analysis, and the development of a problem statement notation for business problems, similar to mathematical notation, could be of tremendous value." Sometimes the lack of adequate functional specifications is taken fatalistically as a fact of life: An example is the following taken from a recent IBM report:11 "In practice, however, the functional specifications for a large proj ect are seldom completely and consistently defined. Except for nicely posed mathematical requirements, the work of completely defining a functional specification usually approximates the work of coding the functions themselves. Therefore, such specifications are often defined within a specific problem context, or left for later detailed description. 
The detailed definition of functional specifications is usually unknown at Impact of Problem Statement Languages the outset of the project. Much of the final detail is to be filled in by the coding process itself, based on ageneral idea of intentions. Hopefully, but not certainly, the programmers will satisfy these intentions with the code. Even so, a certain amount of rework can be expected through misunderstandings. As a result of these logical deficiencies, a :large programming project represents a significant management problem, with many of the typical symptoms of having to operate under conditions of incomplete and imperfect information. The main content of a programming project is logical, to be sure. But disaster awaits an approach that does not recognize illogical requirements that are bound to arise." , Problem statement languages can reduce the existence of illogical requirements due to poor specification. Despite the well-recognized need for formal methods of stating requirements and the fact that a problem statement language was published by Young and Kent in 1958, such languages are not in wide use today. One reason is that, until recently, there did not exist any efficient means to analyze a problem definition given in a problem statement language. Therefore, these languages were only used for the documentation of each user's requirements. Under these conditions, it was difficult to justify the expense of stating the requirements in this formal manner. A problem statement language, therefore, is insufficient by itself. There must also be a formal procedure, preferably a software package, that will manipulate a problem statement for human use. Mathematics is a language for humans--: humans can, after some study, learn to comprehend it and to use it. It is not at all clear that the equivalent requirements language for business will be manipulatable by humans, though obviously it must be understandable to them. Computer manipulation is necessary. because the problem is so large, and a person can only look at one part at a time. The number of parameters is too large and their interrelationship is too complex. A problem statement language must have sufficient structure to permit a problem statement to be analyzed by a computer program, i.e., a problem statement analyzer. The inputs and outputs of this program and the data base that it maintains are intended to serve as a central resource for all the various groups and individuals involved in the system building process. Since usually more than one problem definer is required to develop requirements in an acceptable time frame, there must be provision for someone who oversees the problem definition process to be able to identify individual problem definitions and coordinate them; this is done by the problem definition management. One desirable feature of a system building process is to 851 identify system-wide requirements so as to eliminate duplication of effort; this task is the responsibility of the system definer. Also, si:~lCe the problem defines should use common data names there has to be some standardization of their names and characteristics and their definition (referred to here as "functions"). One duty of the data administrator is to control this standardization. If statements made by the problem definer are not in agreement as seen by· the system definer or data administrator, he must receive feedback on his . '~errors" and be asked to correct these. 
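Before the analyzer itself is described, a small illustration may help fix ideas about the kind of information such a problem statement carries. The structure below is purely hypothetical, expressed as a Python dictionary for convenience; it is not the syntax of PSL, ADS or any other actual problem statement language, and every name in it is an assumption.

# Purely illustrative: the kind of information a problem statement carries.
# This is NOT the syntax of PSL, ADS or any other problem statement language.
payroll_requirement = {
    "output": "PAYCHECK",
    "data_elements": ["EMPLOYEE-NAME", "GROSS-PAY", "NET-PAY"],
    "formulas": {
        "GROSS-PAY": "HOURS-WORKED * HOURLY-RATE",
        "NET-PAY": "GROSS-PAY - DEDUCTIONS",
    },
    "inputs": ["TIME-CARD", "EMPLOYEE-MASTER"],
    "timing": "produced weekly, every Friday",
    "volume": "one per employee; EMPLOYEE-COUNT is a system parameter",
}
# Note what is absent: no files, no SORT steps, no processing sequence.
# Only what is required is stated, not how it is to be produced.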
All of these capabilities must be incorporated in the problem statement analyzer which accepts inputs in the problem statement language and analyzes them for correct syntax and produces, among other reports, a comprehensive data dictionary and a function dictionary which are helpful to the problem definer and the 'data administrator. It performs static network analysis to ensure the completeness of the derived relationships, dynamic analysis to indicate the time-dependent relationships of the data, and an analysis of volume specifications. It provides the system definer with a structure of the problem statement as a whole. All these analyses are performed without regard to any computer implementation of the target information processing system. When these analyses indicate a complete and error-free statement of the problem, it is then available in two forms for use in the succeeding phases. One, the problem statement itself becomes a permanent, machinereadable documentation of the requirements of the target system as seen by the problem definer (not as seen by the programmer). The second form is a coded statement for use in the physical systems design process to produce the description of the target system as seen by the programmer. . . A survey of problem statement languages is given in Teichroew. 12 A description of the Problem Statement Languages (PSL) being developed at The University of Michigan is given in Teichroew and Sibley.9 In this paper the terms problem statement language and problem statement analyzer will refer to the general class while PSL and PSA will be used to mean the specific items being developed in the ISDOS Project at The University of Michigan. ROLE OF PROBLEM STATEMENT LANGUAGES IN SOFTWARE VALIDITY Definition of software validity Often when software systems are completed, they do not satisfy the requirement for which they were intended. There may be several reasons for this. The user 852 Fall Joint Computer Conference, 1972 may claim that his requirements have not been satisfied. The systems analysts and programmers may claim that the requirements were never stated precisely or were changing throughout the development of the system. If the requirements were not precisely and correctly stated, the analysts and programmer may produce software which does not function correctly. Software will be said to be valid if a correct, complete and precise statement of requirements was communicated to the analysts and programmers. Software which does not produce the correct result is said to be invalid if the reason is an error or incompleteness in the specification. Problem statement languages can increase software validity by facilitating the elimination and detection of logical errors by the user, by permitting the use of the computer to detect logical errors of the clerical type and by using the computer to carry out more complex analysis than would be possible by manual methods. Elimination or detection of logical errors by the user Problem statement languages and analyzers appear to be one way to increase the communication between the user organizations and the analysts and programmers. Usually organizations find it very difficult to distinguish between their requirements and various procedures for accomplishing those requirements. 
This difficulty is often the result of the fact that the user has had some exposure to existing information processing procedures and attempts to use this knowledge of techniques to ((aid" the analyst and in the development of the new or modified information system. The major purpose of problem statement languages is to force the user to state only his requirements in a manner which does not force a particular processing procedure. Experience has shown that this requirement of problem statement languages is initially often difficult to impose upon the user. Often, he is accustomed to thinking of his data as stored in files and his processing requirements as defined in various programs written in standard programming languages. He has begun to think that his requirements are for programs and files and not for outputs of the system. There is an interesting parallel between the use of problem statement languages and the use of data base management systems. Many organizations have found it difficult for users to no longer think in terms of physical stored data, but to concentrate on the logical structure of data and leave the physical storage to the data base management system. In. the use of problem statement languages the user must think only of the logical data and the processing activities. Initial attempts to encourage the use of problem statement languages have indicated some reluctance on the part of users to state only requirements, particularly if the "user" is accustomed to a programming language. However, once the functional analysts (problem definers) become familiar with the problem statement technique and learn to use the output from the problem statement analyzers, they find that they are able· to concentrate on the specification of input and output requirements without having to be concerned about the design and implementation aspects of the physical systems. Similarly, output of problem statement analyzers can be used to aid the physical systems designers in the selection of better processing procedures and file organizations. The physical systems designer has the opportunity to look at the processing requirements and data requirements of all the users of the system and can select something that approaches global optimality as opposed to a design which is good for only one user. One of the Ihajor problems in the design of software systems is the inability of users to segment the problem into small units to be attacked by different groups of individuals. Even when the problem can be segmented, the direct use of programming languages and/or data management facilities requires a great amount of interaction between the user groups and the designers throughout problem definition and design. Problem statement languages allow the individual users to state their requirements on a certain portion of the information system without having to be concerned with the requirement's definition of any other portion of the information systems. . This requirement is slightly modified in organizations where there exists a data directory (i.e., a listing of standard data names and their definition.) In this case each of the users must define his requirements in terms of standard names given in the data directory. In this case it is the purpose of the problem statement analyzer to check each problem statement to determine if it is using the previously approved data names. 
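Such a check against the data directory is mechanically simple. The sketch below shows one hypothetical form of it; it is illustrative only and is not the ADS or PSL analyzer.

# Hypothetical sketch of a data-name check against an approved data directory
# (illustrative only; not the actual ADS or PSL analyzer).
def undefined_names(problem_statement_names, data_directory):
    """Return names used in a problem statement that are not in the directory."""
    return sorted(set(problem_statement_names) - set(data_directory))

data_directory = {"EMPLOYEE-NAME", "HOURS-WORKED", "HOURLY-RATE", "GROSS-PAY"}
statement_names = ["EMPLOYEE-NAME", "HRS-WORKED", "GROSS-PAY"]   # one non-standard name
print(undefined_names(statement_names, data_directory))           # ['HRS-WORKED']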
Besides the requirement to use standard data names, the individual problem definers can proceed with their problem definition in terms of their inputs, outputs, and processing procedures without knowledge of related data and processing activities. It is the purpose of the problem statement analyzer to determine the logical consistency of the various processing activities. It has been found that the individual problem definer might modify his statement of requirements upon receipt of the output of the problem statement analyzer. At this Impact of Problem Statement Languages time he has an opportunity to see the relationship between his requirements and those of others. Use of that problem statement analyzer to detect logical errors of the clerical type Given that requirements are stated precisely in a problem statement language, it is possible to detect many logical errors during the definition stage of system development. Traditionally, these errors are not detected until much later, i.e., until actual programs and files have been built and the first stages of testing have been completed. Problem statement analyzers such as the analyzers built for ADS and for PSL at The University of Michigan can detect errors such as computation and storage of a data item that is not required by anyone, and the inputting of the same piece of data from multiple sources. Extensions of these analyzers will be able to detect more sophisticated errors. For example, they might detect a requirement for an output requiring a particular item prior to the time at which the item is input or can be computed from available input. The problem analyzer can be used to check for problem definition errors by presenting information to a problem definer in a different order than it was initially collected. A data directory enumerates all the places in which a user-defined name appears. Another report brings together all the places where a system parameter has been used to specify a volume. An analyst can, by glancing through such lists, more readily detect a suspicious usage. Complex analysis of requirements The use of the computer as described above is for relatively routine clerical operations which could, in theory, be done manually if sufficient time and patience were available. The computer can also be used to carry out analysis of more complicated types or at least provide an implementation of heuristic rules. Examples are the ability to detect duplicate data names or to identify synonyms. These capabilities require an analysis of use and production of the various basic data items. ROLE OF PROBLEM STATEMENT LANGUAGES IN SOFTWARE CORRECTNESS Definition of software correctness Software is said to be incorrect if it produced the wrong results even though the specification for which 853 it was produced is valid and when the hardware is working correctly. The process of producing correct programs can be divided into five major parts: -designing an algorithm that will accomplish the requirements -translating the algorithm into source language -testing the resulting program to determine whether it produces correct results -debugging, i.e., locating the source of the errors -making the necessary changes. Current attempts to improve software correctness Software incorrectness is a major cause of low programmer productivity. For example, Millsll states: "There is a myth today that programming consists of a little strategic thinking at the top (program design), and a lot of coding at the bottom. 
But one small statistic is sufficient to explode that myth-the number of debugged instructions coded per man-day in a programming project ranges from 5 to at most 25. The coding time for these instructions cannot exceed more than a few minutes of an eight-hour day. Then what do programmers do with their remaining time? They mostly debug. Programmers usually spend more time debugging code than they do writing it. They· are also apt to spend even more time reworking code (and then debugging that code) due to faulty logic or faulty communication with other programmers. In short, it is the thinking errors, even more than the coding errors which hold the productivity of programming to such low levels." It is therefore not surprising that considerable effort has been expended to date to improve software correctness. Among the techniques being used or receiving attention are the following: (1) Debugging aids. These include relatively simple packages such as cross-reference listers and snapshot and trace routines described in EDP AnalyzerI3 and more extensive systems such as HELPER.17 (2) Testing aids. This category of aids includes module testers (e.g., TESTMASTER) and test data generators. For examples see EDP AnalyzerI3 and the survey by Modern Data. IS (3) Structured and modular programming. This category consists of methodology· standards and software packages that will hopefully result in 854 Fall Joint Computer Conference, 1972 fewer programming errors in the first two phases-designing the algorithm and translating it to the .source language code:-and that they will be easier to find if they are made. (Armstrong,t9 Baker,20 and Cheatham and Wegbreit.2l) (4) Automated programming. This category includes methods for reducing the amount of programming that must be done manually by producing software packages that automatically produce source or object code. Examples are decision:::; table processors and file maintenance systems. 13 ,lS Role of problem statement languages and analyzers software correctness ~n While the need for the aids mentioned above has been recognized, there has been considerable resistance by programmers to their use. 13 Problem statement languages and problem statement analyzers can be of considerable help in getting programmer acceptance of such aids and in improving software correctness directly. With the use of problem statement languages and analyzers, the programmer gets specifications in a more readable and understandable form and is, therefore, less likely to misinterpret them. In addition, extensions to existing analyzers could automatically produce source language statements. These extensions would take automatic programming methods to a natural limit since a problem statement is at least theoretically all the specification necessary to produce code. When the specifications are expressed in a problem statement language, the logical design of the system has effectively been decoupled from the physical design. Consequently, there is a much better opportunity to identify the physical processing modules. Once identified, they have to be programmed only once. ROLE OF PROBLEM STATEMENT LANGUAGES IN SOFTWARE PERFORMANCE Definition of software performance Software which produces correct results in accordance with valid specification may still be rejected by the user(s) because it is too expensive or because it is not competitive with other ~oftware or with non-computerized methods. The "performance" of software, however, is a difficult concept to define. 
One can give examples of performance measures in particular cases. For example, compilers are often evaluated in terms of the lines of source statements that can be compiled per minute on a particular machine. File organizations and access software (e.g., indexed sequential and direct access) are evaluated with respect to the rate at which data items can be retrieved. Sometimes software is compared on the basis of how computing time varies as a function of certain parameters. For example, matrix inversion routines are evaluated with respect to the relationship between process time and size of the array. Similarly, SORT packages are evaluated with respect to the time required to sort a given number of records. Current methods to improve software performance Software performance is important because any given piece of software is always in competition with the use of some other method, whether computerized or not. Considerable effort has been expended to develop methods to improve performance. Software packages have been developed to improve the performance of programs stated in a general purpose language, either by separate programs, e.g., STAGED (OST) and CAPEX optimizer,I3 or incorporated directly into the compiler. Similar techniques are used in decision table processors. A number of software packages developed can be used to aid in the improvement of performance of existing or proposed software systems. The software packages include software simulators and software monitors. Computer systems such as SCERT, SAM and CASE14, 15, 16 can be used to measure the performance of a software/hardware system on a set of user programs and files. Another approach to improving performance of software systems is to measure the performance of the different components of an existing system either through a software monitor or by inserting statements in the program. The components which account for the largest amount of time can then be reworked to improve performance. Each of these software aids attempts to improve the efficiency of a software system by modifying certain local code or specific file organizations. What would be more desirable is the ability to select the program and file organizations that best support the processing requirements of the entire information system. Role of problem statement languages in software performance Problem statement languages and analyzers can be used to improve the performance of software systems Impact of Problem Statement Languages even beyond that feasible with the aids outlined above. Decisions involving the organization of files and the design of the processing procedures can be made based on the precise statement, and analysis, of the factors which influence the performance of the computing system because the problem statement requires explicit statement of time and volume information. The time information specifies the time at which input is received by the system or the time at which the specified output is required from the system. For a scheduled or periodic input, the time information specifies the time of the day, month, or year, or relative to some other calendar. For unscheduled or random input, the expected rate for a fixed time interval is specified. The volume information consists of specification of the "size" of the input or the output. For example, the number of time cards or the number of paychecks. 
Volume information is specified so that it is possible to determine the number of characters of data that will be stored or processed and moved between storage devices. In order to design an efficient information system, the analyst must consider the processing needs of the individual users in arriving at a structure for the data base. Each of the data elements of the data base must be considered, and an indication made of the processing activities to be supported by the specific data. From this, the system designers must determine the file organization, i.e., the physical mapping of the data onto secondary storage. Methods for maintaining that data must be determined through the use of the time and volume information specified in the information system requirements. Problem statement analyzers summarize, format and display the time and volume information which is relevnnt to the file designers. Our experience with problem statement languages seems to indicate that efficient systems are designed in which a file organization is initially- determined and then the program design is undertaken. Problem statement languages are used to state the processing procedures required to produce the different output products or to process the various input data. Problem statement analyzers must have the ability to aid the systems analyst to group various data and also to group procedures in such a way as to minimize the amount of transfer of data between primary memory and secondary memory. One of the outputs from the problem statement analyzer such as the one for PSL is an analysis of the processing procedures which access the same or overlapping sets of data. These processes can, in many cas~, be grouped together to form programs. Currently software systems are often defined in which a file organization is initially selected based on 855 some subset of the processing requirements. Following this selection, additional processing requirements are designed to "fit into" this initial file organization. As problem statement analyzers become more powerful, it will be possible to delay this decision concerning selection of a file organization until all the major processing requirements have been specified. REMARKS To our knowledge there does not exist any definitive study with a controlled experiment and collection of data to answer questions such as "why does software not produce the desired result; why does it not produce the correct result; .and why does it not use resources efficiently?" However, it is generally agreed that the major causes include the following: 1. Errors in communication or misunderstandings from those who determine whether final results are valid, correct and produced efficiently to those who design and build the system. 2. Difficulties in defining interfaces in the programming process. The serious errors are not in individual programs, but in ensuring that the output of one program is consistent with the input to another. 3. Inability to test for all conceivable errors. Considerable effort has gone into developing methods and packages to improve software--~ome of these have been mentioned in this paper. The ISDOS Project at The University of Michigan has been engaged in developing and testing several problem statement languages and problem statement analyzers. This paper has been concerned with the ways in which this use of problem statement languages and problem statement analyzers will lead to better software. 
Problem statement analyzers have been developed for ADS and for PSL at The University of Michigan. The analyzers have been used in conjunction with the development of certain operational software systems as well as a teaching and research tool in the University. The ADS analyzer has been tested on both large and small problems to determine its use in software evaluation and development. The analyzer is currently being modified and extended to be used extensively by a government organization. Initial research concerning the installation of this analyzer within the organization indicates that analyzers must be slightly modified to interface with the procedures of the functional user. 856 Fall Joint Computer Conference, 1972 Other portions of the analyzer appear to be able to be used as they currently exist. Generally, it appears as if analyzers will have a certain set of organizationally independent features and will have to be extended to include a specific set of organizationally dependent features for each organization in which they are used. 3 CONCLUSION 4 Many of the techniques being used or proposed to improve software performance are based on the current methods of developing software. The problem statement language represents an attempt to change the method of software development significantly by specifying software requirements. This paper has attempted to demonstrate that the use of problem statement languages and analyzers could improve software in terms of validity, correctness and performance. The design of problem statement languages and the design and construction of problem statement analyzers are formidable research and development tasks. In some sense the design task is similar to the design of standard programming languages and the design and construction of compilers and other language processors. However, the task appears more formidable when one considers that these languages will be used by non-computer personnel and are producing output which must be analyzed by these people. The procedure by which these techniques are tested and refined will probably be similar to the development and acceptance of the "experimental" compilers and operating systems of the 1950's and 1960's. These techniques are directed at the specification of requirements for application software. This phase of the life cycle of information systems has received the least amount of attention to date. The development and use of problem statement languages and analyzers can aid this phase. As the languages are improved and extended, their value as an aid to the entire process of software development should be realized. 2 5 6 7 8 9 10 11 12 13 14 ACKNOWLEDGMENTS This research was supported by U.S. Army contract #010589 and U.S. Navy Contract # N00123-70-C2055. REFERENCES 1 OFFICE OF MANAGEMENT AND BUDGET Project to improve the ADP systems analysis and computer 15 16 17 programming capability of the federal government December 17 1971 69 pp and Appendices U.S. AIR FORCE Information processing data automation implications of air force command and control requirements in the 1980's Executive Summary February 1972 (SAMSO/XRS-71-1) D TEICHROEW H SAYANI A utomation of system building DATAMATION August 15 1972 M M LEHMAN L A BELADY Programming systems dynamics or the metadynamics of systems in maintenance and growth IBM Report RC 3546 September 17 1971 J W YOUNG H KENT Abstract formulation of data processing problems Journal of Industrial Engineering November-December 1958. 
See Also Ideas for Management International Systems-Procedures Assoc CODASYL DEVELOPMENT COMMITTEE An information algebra phase I report Communications ACM 5 4 April 1962 IBM The time automated grid system (TAG): sales and systems guide Reprinted in J F Kelly Computerized Management Information System Macmillan New York 1970 H J LYNCH ADS: A technique in system documentation ACM Special Interest Group in Business Data Processing Vol 1 No 1 D TEICHROEW E SIBLEY PSL, a problem statement language for information processing systems design The University of Michigan ISDOS Working Paper June 1972 P H VAUGHAN Can COBOL cope DATAMATION September 11970 H D MILLS Chief programmer teams: principles and procedures IBM Federal Systems Division FSC 71-5108 D TEICHROEW Problem statement languages in MIS E Grochla (ed) Management-Information-Systeme Vol 14 Schriftenreiche Betriebswirteschaftliche Beitrage zur Organisation und Automation Betriebswirtschaftlicher Verlag Weisbaden 1971 EDP ANALYZER COBOL aid packages Canning Publications Vol 10 No 8 May 1972 J W SUTHERLAND The configuration: today and tomorrow Computer Decisions N J Hayden Publishing Company Rochelle Park February 1971 J W SUTHERLAND Tackle system selection systematically Computer Decisions NJ Hayden Publishing Company Rochelle Park February 1971 J N BAIRSTOW A review of systems evaluation packages Computer Decisions N H Hayden Publishing Company Rochelle Park June 1970 R R RUSTIN (ed) Impact of Problem Statement Languages Debugging techniques in large systems Prentice-Hall 1971 18 SOFTWARE FORUM Survey of program package-programming aids Modern Data March 1970 19 R ARMSTRONG Modular programming for business applications To be published 20 F T BAKER Chief programmer team management of production programming IBM Systems Journal No 1 1972 21 T E CHEATHAM JR B WEGBREIT A laboratory for the study of automated programming SJCC 1972 857 The solution of the minimum cost flow and maximum· flow network problems using associative processing by VINCENT A. ORLANDO and P. BRUCE BERRA* Syracuse University Syracuse, N ew York INTRODUCTION ASSOCIATIVE MEMORIES/PROCESSORSI-5 The minimum cost flow problem exists in many areas of industry. The problem is defined as: given a network composed of nodes and directed arcs with the arcs having an upper capacity, lower capacity, and a cost per unit of commodity transferred, find the maximum flow at minimum cost between two specified nodes while satisfying all relevant capacity constraints. The classical maximum flow problem is a special case of the general minimum cost flow problem in which all arc costs are identical and the lower capacities of all arcs are zero. The objective in this problem is also to find the maximum flow between two specific nodes. Algorithms exist for the solution of these problems and are coded for running on sequential computers. However, many parts of both of these problems exhibit characteristics that indicate it would be worthwhile to consider their solution by associative processors. As used in this paper, an associative processor has the minimum level capabilities of content addressability and parallel arithmetic. The cont~nt addressability property implies that all memory words are searched in parallel and that retrieval is performed by content. The parallel arithmetic property implies that arithmetic operations are performed on all memory words simultaneously. In this paper, some background in associative memories/processors and network flows, is first provided. 
We then present our methodology for comparison of sequential and associative algorithms through the performance measures of storage requirements and memory accesses. Finally we compare minimum cost flow and maximum flow problems as they would be solved on sequential computers and associative processors; and present our results. The power of the associative memory lies in the highly parallel manner in which it operates. Data are stored in fixed length words as in conventional sequential processors, but are retrieved by content rather than by hardware storage address. Content addressing can take place by field within the storage word so, in effect, each word represents an n-tuple or cluster of data and the fields within each word are the elements, as illustrated in Figure 1. One of the ways in which accessing can take place is in a word parallel, bit serial manner in which all words in memory are read and simultaneously compared to the search criteria. This allows the possibility of retrieving all words in which a specified field satisfies a specified search criterion. These search criteria include equality, inequality, maximum, minimum, greater than, greater than or equal to, less than, less than or equal to, between limits, next higher and next lower. Further, complex queries can be formed by logically combining the above criteria. Boolean connectives include AND, inclusive OR, exclusive OR and complement. Finally, any number of fields within the word can be defined with no conceptual or practical increase in complexity. That is, within the limitation of the word length, any number of elements may be defined within a cluster. In addition to the capabilities already mentioned, associative memories can be constructed to have the ability of performing arithmetic operations simultaneously on a multiplicity of stored data words. Devices of this type are generally called associative processors. Basic operations that can be performed on all words that satisfy some specified search criteria as previously described include: add, subtract, multiply or divide a constant relative to a given field; and add, subtract, multiply or divide two fields and place the result in a third field. This additional capability extends the use of the associative processor to a large * This research partially supported by RADC contract AF 30602-70-C-0190 Large Scale Information Systems. 859 860 Fall Joint Computer Conference, 1972 class of scientific problems in which similar operations are repeated for a multiplicity of operands. While various architectures exist for these devices and they are often referred to by other names (parallel processors, associative array processors), in this paper we have adopted the term associative processor and further assume that the minimum level capabilities of the device include content addressing, parallel searching and parallel arithmetic as described above. NETWORK NETWORK DATA STRUCTURE An example network is given in Figure 2, in which the typical arc shown has associated with it a number of attributes. The type and number of these attributes depends upon the specific network problem being solved but typically include the start node, end node, length, capacity and cost per unit of commodity transferred. In solving problems, considerably more than just the network definition attributes are required due to additional arc related elements needed in the execution of the network algorithms. Included are items such as node tags, dual variables, flow, bookkeeping bits, etc. 
Figure 2-Data structure for network problems (a typical arc (A, B, C, D, E) drawn between two nodes, with data cluster fields A = start node, B = end node, C = length, D = capacity, E = cost)

Thus, each arc represents an associative cluster of
However, it should be noted that the ratio of sequential to associative processor accesses is approximately equal to the ratio of execution times that would be expected if both algorithms were implemented on current examples of their respective hardware processors. This is true since the longer cycle time of the associative processor is more than offset by the large number of machine instructions represented by each of the sequential accesses. Collection of data for this me~sure was accomplished in the sequential programs by the addition of statements to count the number of times that the array problem data were accessed. Only accesses to original copies of the array data were included in this count. That is, accessing of non-array constants that temporarily contained problem data was not counted. Data collection for the associative programs was 861 accomplished by a counter that incremented each time the emulator was called. SEQUENTIAL ALGORITHM ANALYSIS It was recognized that it would be highly desirable to obtain the sequential algorithms from an impartial, authoritative source, since this would tend to eliminate the danger of inadvertently using poor algorithms and thus obtaining results biased in favor of the associative processor. A search of the literature indicated that these requirements were perhaps best met by algorithms published in the Collected Algorithms from the Associationfor Computing Machinery (CACM)8. While these algorithms may not be the "best" in certain senses, they have the desirable property of being readily available to members of the computer field. Algorithm 3369 is the only algorithm published in the CACM that solves the general minimum cost flow problem stated above. This algorithm is based on the Fulkerson out-of-kilter methodlO which is the only approach available for a single phase solution to this problem. That is, this method permits conversion of an initial solution to a feasible solution (or indicates if none exists) at the same time that optimization is taking place. Other algorithms accomplish these tasks in separate and distinct phases. The single algorithm published by the CACM for the maximum flow problem is number 32411 which is based on the Ford and Fulkerson method. 12 This method appears to be recognized as a reasonable approach since it is consistently chosen for this problem in textbooks on operations research. 13 ,14 These sequential algorithms were implemented in FORTRAN IV and exeeuted on the Syracuse University Computing Center IBM 360/50 to verify correctness of the' implementations and to collect performance data. A detailed analysis of the logic for Algorithm 336 indicates that the access expressions for this program are as follows NB NACsEQ =11 NARCS+ L: NAIsEQ(BR)i i=l NN + L: NAIsEQ(NON)j (1) j=l where NAIsEQ(BR)i=N +21 NLAB i +4 NONL1 i +13 NONL2 i +9 NAUG i -30 (2) NAI sEQ (NONL-=4 N +19 NLAB j +4 NONL1 j +13 NONL2j+4 NARCS+9 NPLAB j -12 (3) 862 Fall Joint Computer Conference, 1972 and was developed for Algorithm 324 and is given as follows NAC = number of storage accesses required for N AI (BR) = number of accesses in a flow augmenting problem solution iteration, called a breakthrough condition NAI(NON) = number of accesses in an improved dual solution iteration, called a non-breakthrough condition NB = number of breakthrough iterations in a problem NN =. 
number N = number of network nodes NARCS = number of network arcs NLAB = number of nodes labeled during an iteration NONL1 = number of arcs examined during an iteration that have both nodes unlabeled NONL2 = number of other arcs examined in an iteration that do not result in a labeled node NPLAB = number of arcs with exactly one node labeled NAUG = number of nodes on a flow augmenting path. of non-breakthrough iterations in a problem Note that the above expressions represent a nontypical best case for the sequential labeling process since it is assumed that only one pass through the list of nodes is required for the entire labeling process. To simplify the above expressions, assume that all arcs processed which do not result in a labeled node are of type NONLl. This then makes NONL1 =NARCSNLAB. Further assume that NPLAB takes on its average lower bound of NARCS/N. Both of these assumptions introduce further bias in favor of the sequential program. After making these substitutions, equations (2) and (3) become NAIsEQ(BR)i=N +17 NLAB i +4 NARCS +9 NAUG i -30 (4) +9 NARCS/N -12 (5) In a similar manner, a best case access expression NAI sEQ :=3 N +8 (NARCS/N) (NAUG i -1) +10 NAUG i +4 NLAB i -16 (6) where N AI = the number of accesses in an iteration. The above access expressions were verified by comparing the predicted values with those obtained experimentally through actual execution of the· programs. ASSOCIATE ALGORITHM ANALYSIS The out-of-kilter method described above was also used as the basis for the associative processor algorithm since it represents the only minimum cost flow method available that is developed from a network rather than a matrix orientation. The node tags which are used to define the unsaturated path from source to sink are patterned after the labeling method of Edmonds and Karp as described in HU. I3 This selection was made to exploit the associative processor minimum search capability by finding the minimum excess capacity after the sink was reached, rather than performing a running comparison at each labeled node as in the original labeling method. For a discussion of the details of this development see Orlando. 6 Asjndicated earlier, hardware implementation of the developed algorithm was not possible since very few associative processors are in existence and in particular none was available for this research. To circumvent this problem, as previously stated, a software interpretive associative processor emulator was developed after extensive investigation of the programming instruction format and search capabilities available on the Rome Air Development Center (RADC) Associative Memory.Is Additional arithmetic capabilities expected to be available on any associative processor were included in the emulator. Thus, it had the basic properties of content addressability, parallel search and parallel arithmetic. In operation, the associative network programs, composed primarily of calls to the emulator, are decoded and executed one line at a time. Each execution, although composed of many sequential operations, performs the function of one associative processor statement written at the assembly language level. The program for the emulator was implemented in FORTRAN IV and executed on the IBM 360/50. Complete details ·of the capabilities and operation of the emulator and listings of the associative emulator programs are contained in Orlando. 
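The emulator itself is not reproduced here, but the flavor of the operations it supplies, a content search over all stored arc words at once followed by arithmetic on every responding word, can be suggested by a short sketch. The record layout, field names and use of NumPy below are illustrative assumptions, not the RADC hardware or the authors' FORTRAN emulator.

# Illustrative sketch of associative-processor style operations on arc words
# (an assumption for exposition; not the RADC associative memory emulator).
import numpy as np

# Each "memory word" holds one arc cluster: start node, end node, capacity, cost, flow.
arcs = np.array(
    [(1, 2, 10, 4, 0), (1, 3, 8, 2, 0), (2, 3, 5, 1, 0), (3, 4, 7, 3, 0)],
    dtype=[("start", int), ("end", int), ("cap", int), ("cost", int), ("flow", int)],
)

# Content search: all words are interrogated at once; responders form a boolean mask.
responders = (arcs["start"] == 1) & (arcs["flow"] < arcs["cap"])

# Parallel arithmetic: add a constant to a field of every responder simultaneously.
arcs["flow"][responders] += 1

# Minimum search over a field, another basic associative capability.
print(arcs[responders], arcs["cost"].min())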
PERFORMANCE COMPARISON

The list orientation of the sequential program for the minimum cost flow problem imposes a requirement of 7 NARCS + N words for the storage of problem data. This is approximately seven times the NARCS + 1 storage words required by the associative processor program. However, since both programs store network data in the form of arc information, the above comparison is the same for all networks.

Access comparisons between the sequential and associative processor programs are made on an average per iteration basis. This eliminates the need to assume values for the number of breakthrough and non-breakthrough iterations needed for problem solution. This approach is valid in terms of total problem access requirements since both algorithms are based on the same method and would therefore require the same number of each type of iteration in the solution of the same problem. The main effect of this approach is to eliminate from the comparison the number of accesses required for problem initialization. From equation (1) it is seen that 11 NARCS accesses are required by the sequential program for this purpose, while equation (7) shows that the associative processor program requires 3 accesses for problem initialization regardless of network size. Thus, the comparison on an iteration basis introduces an additional bias in favor of the sequential program.

In order to avoid handling the breakthrough and non-breakthrough cases separately, the comparison will be made on the basis of an average of breakthrough and non-breakthrough access requirements. That is, change to mean values and define

NAI = [NAI(BR) + NAI(NON)] / 2    (11)

Experience with the algorithm indicates that in general the majority of the problem iterations result in non-breakthrough, and therefore the average as defined in equation (11) gives this case a smaller than realistic weighting. A comparison of the iteration access expressions, equations (4), (5), (8) and (9), indicates a greater relative performance gain for the associative processor in the breakthrough case. Therefore, the equal weighting of the iteration types introduces additional bias in favor of the sequential program. Substitution of the access expressions in equation (11) yields

NAIseq = 1/2 (5 N + 32 NLAB + 12 NARCS + 9 NAUG + 9 NARCS/N - 42)    (12)

NAIap = 1/2 (26 NLAB + 3 NAUG + 49)    (13)
Now let NLAB = aN and NAUG = bN, which by definition imposes the constraint a, b <= 1. Making this substitution and forming the ratio of sequential to associative accesses yields

R = [NARCS (12 + 9/N) + N (5 + 32a + 9b) - 42] / [N (26a + 3b) + 49]    (14)

Since a, b <= 1, selecting a = b = 1 gives the most conservative assessment of the impact of the associative processor as applied to this problem. Recall that this implies that NLAB = NAUG = N. Substituting these values into equations (12), (13) and (14) yields

NAIseq = NARCS (6 + 4.5/N) + 23 N - 21    (15)

NAIap = 1/2 (29 N + 49)    (16)

R = [NARCS (12 + 9/N) + 46 N - 42] / (29 N + 49)    (17)

The solution of these equations over a representative range of node and arc values results in the data of Table I, which are presented graphically in Figures 3 and 4.

TABLE I-Minimum Cost Flow Access Performance Data

    N      NARCS     Associative NAI   Sequential NAI       R       D
   100        100         1,475             2,864          2.0     .01
   100      1,000         1,475             8,304          5.6     .10
   100      6,000         1,475            38,529         26.1     .61
   100     10,000         1,475            62,706         42.5    1.00
   500        500         7,275            14,464          2.0    .002
   500      1,000         7,275            17,468          2.4    .004
   500     10,000         7,275            71,549          9.8     .04
   500    100,000         7,275           612,359         84.2     .40
   500    150,000         7,275           912,809        125.5     .60
   500    250,000         7,275         1,513,709        208.1    1.00
 1,000      1,000        14,525            28,964          2.0    .001
 1,000     10,000        14,525            83,004          5.7     .01
 1,000    100,000        14,525           623,409         42.9     .10
 1,000    600,000        14,525         3,625,659        249.7     .60
 1,000  1,000,000        14,525         6,027,459        415.0    1.00

The associative processor access requirements are seen to remain constant with changes in the number of network arcs, reflecting the parallel manner in which the arc data are processed.

Figure 3-Minimum cost flow access requirements (average accesses per iteration versus NARCS, the number of network arcs, for the sequential and associative processor programs)

As shown in Figure 4, the access ratio data of Table I are plotted against network density, which is defined as

D = NARCS / [N (N - 1)]    (18)

Figure 4-Minimum cost flow access ratio (R plotted against D, the network density)

Analysis of the preceding data indicates that the access ratio R lies in the range 2.0 <= R <= 0.4 N for N >= 100, depending upon the density of the network. Because of the approach used, this is an indication of a lower bound on the performance improvement afforded by the associative processor, and values of R considerably greater than this bound would typically be expected.
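As a check on the tabulated ratios, the short sketch below (modern Fortran; the particular (N, NARCS) pairs were chosen only for illustration) evaluates equations (17) and (18) directly. It reproduces values close to the corresponding Table I entries, for example R of about 2.0 and 5.6 at N = 100 and about 415 at N = 1,000 with one million arcs, with small differences attributable to rounding in the table.

program access_ratio
  implicit none
  integer :: i
  integer, parameter :: n(4)     = (/ 100,  100,  1000,    1000 /)
  integer, parameter :: narcs(4) = (/ 100, 1000, 10000, 1000000 /)
  real :: r, d
  do i = 1, 4
     ! equation (17): worst-case ratio of sequential to associative accesses
     r = (narcs(i)*(12.0 + 9.0/n(i)) + 46.0*n(i) - 42.0) / (29.0*n(i) + 49.0)
     ! equation (18): network density
     d = real(narcs(i)) / (real(n(i)) * (n(i) - 1))
     print '(2i9,2f10.2)', n(i), narcs(i), r, d
  end do
end program access_ratio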
An equivalent analysis^6 for the maximum flow problem yields a sequential program storage requirement of 5(NARCS + 1) words against an associative requirement of NARCS + 1 storage words. Access expressions for this problem were determined to be

NAIseq = NARCS (2 - 8/N) + 7.5 N - 16    (19)

NAIap = 6.25 N    (20)

Performance data resulting from these expressions, presented in Table II and Figures 5 and 6, indicate that 1.5 <= R <= 0.3 N for N >= 100.

TABLE II-Maximum Flow Access Performance Data

    N      NARCS     Associative NAI   Sequential NAI       R       D
   100        100           625               926          1.5     .01
   100      1,000           625             2,654          4.2     .10
   100      6,000           625            12,254         19.6     .61
   100     10,000           625            19,934         31.9    1.00
   500        500         3,125             4,726          1.5    .002
   500      1,000         3,125             5,718          1.8    .004
   500     10,000         3,125            23,574          7.5     .04
   500    100,000         3,125           202,134         64.7     .40
   500    150,000         3,125           301,334         96.4     .60
   500    250,000         3,125           499,734        159.9    1.00
 1,000      1,000         6,250             9,476          1.5    .001
 1,000     10,000         6,250            27,404          4.4     .01
 1,000    100,000         6,250           206,684         33.1     .10
 1,000    600,000         6,250         1,202,684        192.4     .60
 1,000  1,000,000         6,250         1,999,484        319.9    1.00

Figure 5-Maximum flow access requirements (average accesses per iteration versus NARCS, the number of network arcs)

Figure 6-Maximum flow access ratio (R plotted against D, the network density)

SUMMARY

A comparison was made of the relative performance of the associative processor to present sequential computers on the basis of storage requirements for problem data and the number of times that these data were accessed in the course of solving the minimum cost flow and maximum flow problems. It was indicated that the ratio of sequential to associative storage accesses gives an approximate indication of the ratio of execution times to be expected, assuming typical hardware speeds for each processor.

Sequential comparison data were obtained through FORTRAN implementation of algorithms published by the ACM as representing typical examples of sequential solutions to these problems. Storage word requirements were obtained directly from the program declarations, while access data were obtained by inserting counters to accumulate the number of times that the problem data were accessed in the execution of the sequential programs.

Flow diagrams for the associative processor solution of these problems were developed based upon the capabilities inherent in an associative processor. By analyzing these diagrams it was possible to calculate the number of memory words required for problem data as well as the number of storage accesses required in the execution of the algorithms. To test the correctness of the derived algorithms and verify the accuracy of the access calculations, the algorithms were programmed in associative statements at the assembly language level and executed on an interpretive emulator program written in FORTRAN and run on the Syracuse University Computing Center IBM 360/50. Emulation was required since large scale examples of the associative hardware are not yet available.

It was shown that the storage requirements for the minimum cost flow and maximum flow problems were 7 NARCS + N and 5(NARCS + 1) words respectively, where NARCS is the number of arcs and N is the number of nodes in the network. The number of associative processor words was determined to be NARCS + 1 in both cases. Considering the differences in word lengths, both systems require approximately the same amount of storage.

The access expressions for each of the competing programs were simplified assuming a best case for the sequential and a worst case for the associative processor. Under the stated assumptions, the resulting ratio ranges of

2.0 <= R <= 0.4 N and 1.5 <= R <= 0.3 N for N >= 100

represent a lower bound on the performance improvement to be expected through the application of the associative processor to the solution of the minimum cost flow and maximum flow problems respectively.
REFERENCES 1 A G HANLON Content addressable and associative memory systems; a survey IEEE Transactions on Electronic Computers August 1966 p509 2 J A RUDOLPH L C FULMER W C MEILANDER The coming of age of the associative processor Electronics February 1971 3 A WOLINSKY Principals and applications of associative memories Presented to the Third Annual Symposium on the Interface of Computer Science and Statistics Los Angeles California January 30 1969 4 J MINKER Bibliography 25: An overview of associative or contentaddressable memory systems and a KWIC index to the literature: 1956-1970 ACM Computing Reviews October 1971 p 453 5 W L MIRANKER A survey of parallelism in numerical analysis SIAM Review Vol 13 No 4 October 1971 p 524 6 V A ORLANDO Associative processing in the solution of network problems Unpublished doctoral dissertation Syracuse University January 1972 7 V A ORLANDO P B BERRA AssQciative processors in the solution of network problems 39th National ORSA Meeting May 1971 8 CACM Collected algorithms from the communications of the association for computing machinery ACM Looseleaf Service 9 T C BRAY C WITZGALL Algorithm 336 netflow Communications of the ACM September 1968 p 631 10 D R FULKERSON An out-of-kilter method for the minimal cost flow problem Journal of the SIAM March 1961 p 18 11 G BAYER Algorithm 324 maxflow Communications of the ACM February 1968 p 117 12 L R FORD D R FULKERSON Flows in networks Princeton University Press 1962 13 T C HU Integer programming and network flows Addison-Wesley 1969 14 H M WAGNER Principals of operations research Prentice-Hall Inc 1969 15 Manual GER 13738 Goodyear Aerospace Corporation Minicomputer models for non-linear dynamic systems by J. RAAMOT Western Electric Company, Inc. Princeton, New Jersey INTRODUCTION I t is necessary to have some understanding of basic integer arithmetic operations before the solution scheme can be discussed. Therefore, the following sections introduce the concepts of F-space and difference terms which are used in integer arithmetic. The computational methods of integer arithmetic have been extended to a variety of applications since the first publication. 1 ,2 The most noteworthy application of integer arithmetic is the calculation of numerical solutions to initial value problems. This method is introduced here with the example of the differential equation: dx -+x=O dt F-SPACE SURFACES A common problem in mathematics is to find the roots of an expression: Given some f(x) the task is to find the values of x which satisfy the equationf(x) =0. These roots can be obtained by a method of trial and error where successive values of x are chosen until the equation is satisfied. A simpler method is to introduce the additional variable y, and to find the points on y=f(x) where the contour crosses the x-axis. This technique of introducing one additional variable is central to operations of integer arithmetic. In the two-dimensional case, a contour f(x, y) =0 is the intersection of the surface F=f(x, y) with the xy;..plane. Here F is the additional variable and is denoted by a capital letter in order to develop a simple notation for subsequent operations. This three-space is called F -space. It can be created for any dimensionality as is indicated in the table in Figure l. Integer arithmetic is not concerned with an analytic characterization of F -space surfaces, but with a set of solution points (F, x, y) at integer points (x, y). 
The integer points are established by scaling the variables so that unity represents their smallest significant incre-' ment over a finite range. In mathematical calculations the use of integer cal~ culations is avoided because each multiplication may double the number of digits which have to be retained, and the resultant numbers tend to become impractically large. This does not happen in integer arithmetic because the values of F are evaluated at adj acent integer points (x, y) and are expressed as differences. Thereby multiplication is avoided and addition of the differences is the only mathematical operation that is used. (1) By substituting the variable y in place of the derivative, the equation becomes y+x=O (2) which represents a trajectory in the phase-plane. Given an initial solution point (xo, Yo, to), other solution points (x, y, t) are readily found by first solving the phase-plane equation and then computing the values of the variable t from rewriting the above equation as t=- fd; (3) This example of finding solution points to an initialvalue problem demonstrates the procedure which is used in integer arithmetic solutions. Other solution schemes avoid this procedure because in the general case the phase-plane trajectory cannot be expressed in the form f(x, y) =0, and an incremental calculation of solution points (x, y) builds up errors. The major contribution here is that with integer arithmetic techniques, the points (x, y) along the phase-plane trajectory can be calculated with no accumulation of error in incremental calculations, even though the trajectory cannot be expressed in a closed form. As a result, this method handles with equal ease all initial-value problems without making distinctions as to non-linearity, order of differential equation, and homogeneity. 867 868 Fall Joint Computer Conference, 1972 order to distinguish it from finite differences in difference equations, from slack variables in linear programming, and from partial derivative notation, because F x has a relationship to all of these but differs in its interpretation and use. The notation for indicating the direction of incrementation is to have F x or F -x, In addition, the difference between successive difference terms is the second difference term in x and is defined by the identity: INTEGER ARITHMETIC DIMENSION SOLUTION TO EQUATION F-SPACE EQUATION f(x)=O y=f(x) CONTOUR ROOTS 2 3 f(x,y) =0 F=f(x,y) CONTOUR SURFACE flx,y)=O y=O Fl =f(x,y) F2=y ROOTS CONTOUR f(x,y,z)=O F=f(x,y,z) SURFACE SURFACE (7) Figure I-Table of solutions to equations which are obtained by operations in F -space In an initial-value problem, the problem is to follow the phase-plane trajectory from an initial starting point. There are various integer arithmetic contour-following algorithms which are based on the sign, magnitude, or a bound on the F -value. To use anyone of these algorithms it is necessary to specify the start point (F, x, y), the direction, and the difference terms. To apply these contour-following algorithms to a phase-plane trajectory, the difference terms have to be established. This forms the major portion of the computation. There are two methods for finding the difference terms and both are illustrated here for the equation of the circle x2+y2=r2 (8) I t forms the F-space surface The F-space solutions are exact for polynomials f(x, y) =0 at all integer points (x, y). Also the F-space surface F = f(x, y) is single-valued in F. 
Therefore, there F =r2- (X 2+y2) (9) which is a paraboloid of revolution and is illustrated in is no error in calculating successive solution points based on differences, and the solution at anyone point does not depend on the path chosen to get there. These properties do not hold for non-polynomial functions (e.g., exponentials) , but there the F -space surface points can be guaranteed to be accurate to within one unit in x and y over a finite range.! F DIFFERENCE TERMS Let the value of the variable F at a point (x, y, ... ) be F and be F Ix-tl at (x+l, y, ... ). Then the first difference term in x is defined by the identity 15 (4) 10 I t is the change in the F-variable on x-incrementation. This identity can be rewritten for the F -space surface 5 F=f(x, y) (5) o -~---x as F =f(x±l x , y ) -f( x, Y ) = L..J ~ (±l)n 8n]'(x, y) , 8 n n=! n. x (6) The notation F x is chosen for difference terms in Figure 2-The F-space surface of the circle, F a paraboloid of revolution = 25 - (X2+y2) is Minicomputer Models for Non-Linear Dynamic Systems Figure 2. The intersection of this surface with the xy-plane forms the circle of radius r. Case 1: Given F=f(x,y) by the ratio of difference terms at adjacent integer points. Thus, at a point (x, y) the derivative is a good approximation of the ratio of difference terms, and vice versa: dy -Fz _/'"Oov _ _ In this case the difference terms are established from the identity: F±z=F IZ±l-F =f(x±l, y) -f(x, y) = =t= (2x±l) (10) Likewise, it can be demonstrated that the first y difference term is dx- F'JI Fz= -F_(z+!) (13) F1I = (14) and - F -('11+1) which state that the difference in F-values between two adjacent integer points does not depend on the direction. In the example of the circle, the derivative is Case 2: Given dy/dx Based on the definition of the derivative, it can be shown that the derivative at a point (x, y) is bounded (12) The exact difference terms are found from this approximation by requiring the F -space surface to be single-valued In F. This requirement results in the identities (11) Both difference terms for the circle are illustrated in Figure 3. 869 dy =_ ~ dx y (15) For this given derivative equations (13) and (14) are satisfied only by the introduction of additional terms, here constants c, such that x+c= (x+l)-c (16) and 2c=-1 (17) The resultant ratio of the difference terms is Fz 2x+l 2y+l ---- F'JI (18) and can be written as F ±z _ =t= (2x±l) F ±'J1 - =t= (2y±l) Figure 3-Integer arithmetic difference terms are the changes in the F variable between adjacent integer points in the xy-plane. Successive points are selected to be along the circle but not necessarily on the circle (19) This result is identical to the one in case 1. To summarize, in case 2 the function f(x, y) =0 was not given but its derivative was. This is sufficient to calculate the correct difference terms. It is easy enough to verify that incremental calculation of solutions (F, x, y) based on the difference terms F ±z and F ±'J1 is exact, accumulates no errors, and represents integer solution points on a single-valued surface in F-space. The reader can also easily verify that the intersection of the F -space surface with the xy-plane is the given circle. 870 Fall Joint Computer Conference, 1972 SECOND ORDER DIFFERENTIAL EQUATIONS A general second order non-linear differential equation is represented by the equations dx - =y dt (20) dy dt +g(x, y) =0 (21) The first step in finding numerical solutions to the equations is to calculate the difference terms F x and F y. 
Their ratio is approximately Fx -dy -~-- Fy dx -dy dt = --. - dt dx g(x, y) = --- Y One example of a contour-following algorithm based on the sign is the following: Given an initial direction vector, then the choice of the next increment in the phase-plane is the one which has the difference term sign opposite that from the F-value. If both difference terms and F have the same sign, then the direction is changed to an adjacent quadrant and an increment is taken along the direction axis traversed. Subsequently, the choice of either a positive or negative t-increment determines whether the new direction is acceptable or another change of direction has to be made. Values of the variable t are obtained by anyone of the two following integration steps. Either, t=fdX~~~ (22) which can be written immediately from the above equations. The exact values of the difference terms must satisfy the identities of equations (13) and (14). These identities are formed by adding appropriate additional terms, g' (x, y) to both sides. The resultant difference term then is Fx=g(x, y) +g'(x, y) (23) In a similar fashion, the difference terms Fy=y+c (24) -F-(H1) = (y+1)-c (25) and are identical if c = ~. It is not practical to reduce further the ratio of exact difference terms in the general case. Later, specific examples will illustrate this technique. Given the exact difference terms and the initial values, then anyone of the integer arithmetic contourfollowing algorithms can be applied to find adjacent integer points (x, y) along the phase-plane trajectory without accumulating errors, and the points (x, y) are guaranteed to be accurate to within unity (which is scaled as the least significant increment in the calcUlations) . Successive solution points are calculated by incrementing one variable in the phase-plane and adding the corresponding difference term to F. In general, the solution points (F, x, y) are on the F-space surface but are not contained in the phase-plane. The important result is that there is no accumulation of errors in the incremental calculation of solution points on the F-space surface. Errors are introduced in relating the F-space surface points to the phase-plane trajectory, but the contourfollowing algorithms can always guarantee that these errors are less than unity. y x (26) Fy or t=- f dy g(x, y) 1 ~ ~ Fx (27) will result in the same value of the variable t. Here the integration is approximated by an incremental summation over either x or y increments. The error in t-increments becomes large whenever the difference term in the denominator of the summation becomes small. This problem is avoided by choosing the summation which contains the largest difference term. In order for t to increase in the positive sense, the direction of incrementation in the phase-plane is chosen to make the product of the x-direction vector and y-value be positive. Otherwise, t increases in the negative sense. This result is derived from equations (20) through (22). Thereby, the solution method is complete. EXAMPLE 1 The integer arithmetic method of finding numerical solutions to differential equations is illustrated here by the example of a second order linear differential equation. It has easily derivable parametric solutions in t but its phase-plane trajectory cannot be expressed as f(x, y) =0. 
This initial value problem is stated as dx -=y dt (28) (29) with initial values of (xo, Yo, Yo) = (0, 20, 0) (30) Minicomputer Models for Non-Linear Dynamic Systems 871 I ts parametric solution depends on the values chosen for the constants k and w 2• The values chosen here are k= .08 and w 2 = .04 which result in' the parametric solution X= 104.ge-·o4t sin 0.196t (31) and y= -4.1ge-·o4t sin 0.196t+20.0e-· o4t cos 0.196t (32) These calculations apply only for the analytic solution and not in the integer arithmetic solution scheme. There, the first step is to establish the difference terms from the given equations (28) and (29). According to equation (12) the approximate ratio of the difference terms is Fx F 1I - ~ ky+w 2x-yo -"-------''- y (33) The requirement that the F -space surface is a singlevalued surface, as stated in equations (13) and (14), is applied to obtain the exact difference terms (34) and (35) Then the ratio of the terms is multiplied by 2n/2n where n is an appropriate integer to eliminate the fractions. The choice of direction in the phase-plane for positive t establishes that the x, y-direction vector is ,., ..,............................................ .,........ •••...... .................• . Figure 4-The phase-plane trajectory of the linear differential equation discussed in example 1, for the initial values (x, y, t) = (0, 20, 0). The integer arithmetic solutions are displayed on a CRT Figure 5-Top: The integer arithmetic solutions (x, t) as displayed on a CRT for the trajectory of Figure 4 Bottom: A CRT display of the integer arithmetic solutions x as calculated in real time. The time axis is 5msec/cm (1, -1), and the'difference terms are: F ±x= ±n[2ky+w2(2x±1)] (36) F ±11= ±n(2y±1) (37) The resultant phase-plane trajectory is illustrated in Figure 4 and the numerical results (x, t) are compared with the calculation of values of x in real time in Figure 5. The peak to peak values of x are equal to 114 increments which corresponds to a 1 percent accuracy. As can be seen, the first cycle is calculated in 25 milliseconds. A comparison of the numerical with the analytic solution confirms that all points (x, y) are unit distant from the true trajectory. 872 Fall Joint Computer Conference, 1972 The above example illustrates how the incremental calculations are set up for the second order differential equation. The numerical solutions (x, y, t) are obtained by application of: the integer arithmetic calculation, even though there is no closed expressionj(x, y) =0 for the trajectory. problem to obtain an improved resolution of integer solution points. This is done by rewriting the van der Pol equation with the new variables x' and y' SCALING and by taking increments of lin units in x', 11m units in y', and replacing e by elk. Then the difference terms are computed, and the variable nx' is replaced byx and my' is replaced by y. The resultant difference terms are: F ±x=m[±e(6x2±6x+2-6n2 )y+nmk(±6x+3)] In many problems it is necessary to scale the variables to obtain either an improved or a coarser resolution. Such scaling is best illustrated by the example of the circle given in equation (9). First, an improved resolution in x only is obtained by taking increments of lin units where n is an integer and integer calculations are retained. Then, Fnx= -[(x+1/n)2- x2] = - [n- 2(nx+1)2- x 2] = - [n- 2 (2nx+l)] (38) On multiplying F by n2, the difference terms are Fnx= - (2nx+1) (39) and (40) The other scaling example takes n increments inx at a time. 
Then, Fx/n= -[(x+n)2- x 2] = -[n2(xln+1)2- x2] = -n2 (2xln+1) x=x' (44) dx I -=y dt (45) (46) and (47) For m=n= 10 and e=k= 1, the point (x, y, F) = (0,21, -83840) is located in F-space on the limit cycle F-space surface. Based on the difference terms, an incremental calculation of solution points along the limit cycle returns to the same start point. This confirms that there is no accumulation of errors in the incremental calculations. Likewise, for start points chosen both inside and outside the limit cycle, results agree with the expected trajectories in that all resultant trajectories terminate with the limit cycle. These results are in complete agreement with published data3 for values of the constant e=O.l, 1.0, and 10. The variable t is calculated from either summation: (48) (41) and the y difference remains or (42) The last step in scaling is to substitute a new variable for nx or xln respectively in the above examples, and to proceed with integer calculations. EXAMPLE 2 The earlier example of the integer arithmetic solution scheme represented a linear differential equation with parametric solutions. For this example is chosen the van der Pol equation d2x dx dt 2 +e(x2 -1) dt +x=O (43) The phase-plane trajectory of this equation has a stable limit cycle with a radius of 2 for the constant e>O. Near the limit cycle, it is necessary to scale the t= K L: Fx (49) 'J/ It should be remembered that x, y, and F have been rescaled and correspondingly also the numerator in these summations is scaled to K. It is given by the equation (50) for both the x and y incrementation sum. If the forcing function, 5 sin 2.5t is applied to the van der Pol equation, then the difference term in x becomes F ±x=m[±e(6x2±6x+2-6n2)y+mnk(±6x+3) =F6mn2k(5 sin 2.5t)] (51) and the y difference term remains unchanged. In this instance, again the results are in complete agreement with published data. 4 Minicomputer Models for Non-Linear Dynamic Systems 873 HIGH-ORDER SYSTEMS An initial value problem can be written as: dxl dt = fl (xl, x2, ... , xn, t) dx2 dt =f2(xl, x2, ... , xn, t) dxn dt =fn(xI, x2, ... ,xn, t) (52) The numerical solutions (xl, x2, ... , xn, t) are found by taking the set of equations in the ratios: dx2 dx3 dxl 'dx2' ... (53) Each ratio represents a phase-plane trajectory for which the difference terms can be established. An increment in x2 in the first phase-plane trajectory also corresponds to an increment in x2 in the second phase-plane trajectory, which in turn may result in x3 being incremented. Whether or not x3 is incremented depends on the particular integer arithmetic contourfollowing algorithm which is used. For example, the algorithm based on the sign forces an x3 increment xl SUBROUTINE Ai calculates Fxi and F x(i+l)' SUBROUTINE Bi chooses next increment or operation. SUBROUTINE C i updates all variables for an Xi increment. Figure 7-Flow chart of an integer arithmetic algorithm for tracking coupled trajectories, showing the calculations for the i-th variable when the value of the difference term for that variable has the opposite sign of the current F-value. These coupled trajectories are illustrated in Figure 6 for the simple equation xl 128 128 d4x 1 -4 - -x=O dt 8 -x2 o x3 -x2 x3 ~ I -64 0 2~2 - x4 1 -16 0 Figure 6-Coupled phase-plane trajectories for the equation, 1 d4x - 4 - -x dt 8 =0 The variable t is obtained from the first trajectory and is scaled to T = 128 at x1 = 1 unit from termination (54) given the initial values (xl, x2, x3, x4, t) = (128, -64, 32, -16,0). 
A general algorithm for coupled trajectories is shown in the block diagram of Figure 7. As can be seen, an increment in the first variable may immediately ripple through to an incrementation in the n-th variable, after which, starting from the end of the chain, each traj ectory achieves a stable solution as determined by the contour following algorithm. The variable t can be calculated from anyone phaseplane trajectory each resulting in the same value but being consistent with the resolution of computation. CONCLUSIONS The integer arithmetic solution method has been applied to a variety of initial-value problems, of which representative examples are illustrated above. Associated 874 Fall Joint Computer Conference, 1972 with this method are a number of theorems. These prove that the F -space surface is single-valued in F, that the direction field is bounded by the ratio of difference terms, that some trajectories have integer F-space solutions at all integer points in the phaseplane, and that for other trajectories the F-space surface is approximated, but the accuracy of results is guaranteed over a finite domain. However, additional theorems remain to be developed to insure that the method is applicable to all initial-value problems, and to determine the necessary conditions for stability. The solution method is summarized as follows: Successive solution points along a phase-plane trajectory are calculated by adding a difference term to the F-value and incrementing the associated phase-plane variable. These simple operations are offset by the more complex contour-following algorithms which track the trajectory by examining the state of calculations and then selecting the next increment. Here the underlying concept is that the trajectory is the contour formed by the intersection of the F-space surface with the phaseplane. There exists a duality between the integer arithmetic technique and the standard Runge-Kutta or predictorcorrector solution methods. In integer arithmetic, the phase-plane variables are the independent variables and t is a dependent variable obtained as a result of integration. Just the reverse is true in the standard methods; t is an independent variable and the phaseplane variables are obtained as a result of integration. The integer arithmetic technique finds solution points on the phase-plane trajectory even though there may not exist an analytical expression of that trajectory. Likewise, the standard method finds solution points of integrals which cannot be expressed in analytic form. An additional duality is that after initial scaling, the integer arithmetic solutions have a guaranteed accuracy whereas the standard methods require a subsequent accuracy calculation. The computations involved in the integer ~rithmetic method are simpler than the ones in other methods: The examples illustrated in this paper were programmed in assembly language for the Digital Equipment Corporation PDP-15 computer. It has only 4096 words of store and does not have a hardware multiplier. The entire program is contained in 300 words of store and is executed in 50 microseconds per increment in x or y, including the time calculation. It is difficult to execute any other solution scheme within such limited facilities or comparable speed. Other examples have been programmed in FORTRAN on a large PDP-I0 computer. There the execution time is 10 times slower, and is comparable to the standard numerical integration methods. 
In these examples, floating point calculations were used for the integer arithmetic calculations. ACKNOWLEDGMENTS The development of this method resulted from the application of integer arithmetic techniques at the Western Electric Engineering Research Center in Princeton. Also, there are substantial contributions by J. E. Gorman in formulating the integer arithmetic techniques. REFERENCES 1 J E GORMAN J RAAMOT Integer arithmetic technique for digita,l control computers Computer Design Vol 9 No7 pp 51-57 July 1970 2 A G GROSS et al Computer systems for pattern generator control The Bell System Technical Journal Vol 49 No 9 pp 2011-2029 November 1970 3 L BRAND Differential and difference equations Wiley 1966 New York 4 L LAPIDUS R LUUS Optimal control of engineering processes Blaisdell Waltham 1967 Fault insertion techniques and models for digital logic simulation by STEPHEN A. SZYGENDA and EDWARD W. THOMPSON Southern Methodist University Dallas, Texas to validate fault detection or diagnostic tests, to create a fault dictionary, to aid in the automatic generation of diagnostic tests, or to help in the design of diagnosable logic. The activities of a digital fault simulation system can be divided into two major areas. The first is the basic simulator, which simulates the fault free logic net. The activities of the second part are grouped under the heading of program fault insertion. (For digital fault simulation as opposed to physical fault insertion.) The merit of the fault insertion activities can be judged on five points. These are: INTRODUCTION During the past few years it has become increasingly apparent that in order to design and develop highly reliable and maintainable digital logic systems it is necessary to be able to accurately simulate those systems. Not only is it necessary to be able to simulate a logic net as it was intended to behave, but it is also necessary to be able to model or simulate the behavior of the logic net when it contains a physical defect. (The representation of a physical defect is known as a fault.) The behavioral simulation of a digital logic net which contains· a physical defect, or fault, is known as digital fault simulation. 1- 6 In the past, two methods have been used to determine the behavior of a faulty logic net. The first approach was manual fault simulation. 7 (For logic nets of even moderate size, this method is slow and often inaccurate.) The second method used is physical fault insertion. 7 In this method faults are physically placed in the fabricated logic, input stimuli are applied and the behavior of the logic net observed. Although physical fault insertion is more accurate than manual fault simulation, it is still a lengthy process and requires hardware fabrication. The most serious limitation, however, is that physical fault insertion is dependent on a fabrication technology which permits access to the input and output pins of logic elements such as AND gates and OR gates. With discrete logic this is possible, however, the use of MSI and LSI precludes the process of physical fault insertion. Since MSI and LSI will be used in the future for the majority of large digital systems, the importance of digital fault simula. tion can be readily observed. The major objective of digital fault simulation is to provide a user tool by which the behavior of a given digital logic design can be observed when a set of stimuli is applied to the fabricated design and a physical defect exists in the circuit. 
This tool can then be used (1) Accuracy with which faults can be simulated. (2) Different fault models that can be accommo- dated. (3) Methods for enumerating faults to be inserted. (4) Extraction of information to be used for fault detection or isolation. (5) Efficiency and capability of handling large numbers of faults. ACCURACY OF FAULT SIMULATION In order to accurately predict the behavior of a logic net which contains a fault, the basic simulation used must be capable of race and hazard analysis. A simple example of this is shown in Figure 1. In this example an AND gate has three inputs, a minimum delay of 3, and a maximum delay of 4. At time T2 signal A starts changing from 1 to 0 and at the same time signal B starts changing from 0 to 1. The period of ambiguity for the signals is 3. For the fault free case, signal C remains constant at o. Therefore, the output of the gate remains constant regardless of the activity on signals A and B. If signal C has a stuck-at-Iogical 1 fault, there is potential for a hazard on the output of the gate between time T 5 and T 9. This hazard will not 875 876 Fall Joint Computer Conference, 1972 A~"'O B 0+1 C o D DELAY MIN 3 MAX 4 A o T2 T5 Tg I ! I I 1 I I tw#J1hl I I B o D 0 D ! I wj'/#dI iI I I -------tl-----+-I--- NO FAULT o ______ - I~:lal tA l.lA.6. 6. . L.I. .Ll5L1 ___ FAULT PRESENT Figure I-Fault induced potential error be seen unless the fault insertion is done in conjunction with simulation that is capable of detecting such a hazard. The fault insertion to be discussed here is done in conjunction with the TEGAS28,9 system. TEGAS2 is a table driven assignable delay simulator which has eight basic modes of operation of which three are concerned with faults. For the first mode, each element type is assigned an average or nominal propagation delay time. This is the fastest mode of operation, but it performs no race or hazard analysis. Mode 2 is the same as mode 1, except it carries a third value which indicates if a signal is indeterminate. The third mode has a minimum and maximum propagation delay time associated with each element type and performs race and hazard analysis. All three modes can use differing signal rise and fall times. Fault insertion and parallel fault simulation are performed in all three of these modes. When fault insertion is done in mode 3, races or hazards that are induced because of a fault, will be detected. If fault insertion is done in mode 2, no fault will be declared detected unless the signal values which are involved in the detection are in a determinate state. Also it can be determined if a fault prevents a gate from being driven to a known state. By using TEGAS2 as the basic simulator, faults can be simulated to whatever degree of accuracy desired by the user. model or insert various kinds of faults. Most fault simulation systems are capable of modeling only the class of single occurring stuck-at-Iogical 1 and stuckat-logical 0 pin faults. Although it has been found that this .class of faults covers a high percentage of all physICal defects which occur, (considering present technology), it is certainly not inclusive. In an effort to remain as flexible and as efficient as possible and to be able to model different classes of faults, TEGAS2 has developed three different fault insertion techniques. The first technique is used to insert. signal faults or output pin faults. This type of fault IS where an entire signal is stuck-at a logical 1 or a logical O. 
The distinguishing factor is that the fault affects an entire signal and not just an input pin o~ an element. Figure 2 illustrates a signal or output pIn fault as opposed to an input pin fault. At the beginning of a fault simulation pass, the OUTFLT table (Figure 3), which contains all of the signal faults to be inserted, is constructed. There is one row in the table for every signal that is to be faulted. The information in each row is the signal number, ~ASK1, and MASK2. The signal number is a pointer Into an array CV where the signal values are stored. MASK 1 has a 1 in each bit position that is to be faulted. The right most bit of a word containing a signal value is never faulted since that bit represents the good machine value. MASK2 contains a 1 in any bit position that is to have a SAl inserted and a 0 where there is either a SAO or no fault to be inserted. Parallel fault simulation is accomplished by having each bit position, in a computer a word containing a signal value, represent a fault. At the end of each time period during simulation, for which any activity takes place, the signal faults are inserted. This method is very simple and requires little extra code. For example, let CV be the array containing the signal values and OUTFLT be the two dimensional table discussed above. Then, to insert one A~I-------------~ I------F B ,...:.0_ _----1 FAULT 2 "INPUT PIN" E C ....:.0_ _---1 D~I-------------~ 1------ G OUTPUT VALUES FAULT MODELS ~ o 0 I I I In order for a fault simulation system to be as flexible and as useful as possible, it should. be able to 0 NO FAULT FAULT I FAULT 2 Figure 2-Example of a signal fault and a pin fault Fault Insertion TechniquBs and Models FAULT RECORD (FAULTS TO BE INSERTED) COT INDEX MFNT (FAULT MACHINE 1~I I FAULT NO. N ~g~~~~PONDENCE TABLE) \ y FAULT TABLES MASK 1- POSITION OF FAULT MASK 2-TYPE OF FAULT Figure 3-TEGAS2-Table structure for fault simulation or more faults on a signal, the following statement would be executed. CV(OUTFLT(i,l» (CV(OUTFLT(i,l» .AND. (. NOT. OUTFLT(i,2». OR. OUTFLT(i,3). [1] i represents the row index in the signal fault table OUTFLT. It is not necessary to insert signal faults after a time period that has no activity, since none of the signals will have changed value. By inserting signal faults in this manner it is not necessary to check a flag every time an element is evaluated to see if its output must be faulted. The second method of fault insertion is used for input pin faults. An input pin fault only affects an input connection on an element. This is demonstrated in Figure 2. In the table structure for the simulator, each element has pointers to each of its fan-ins. The pointer to any fan-in that is to be faulted is set negative prior to a simulation pass. During simulation the input signals for an element are accessed through a special function. That is, the evaluation routines for thedifferent element types are the same as when fault insertion is not performed except that the input values for the element are acquired through the special function. This function determines if a particular fan-in pointer is negative. If a pointer is negative, the element being evaluated and the signal being accessed are looked up in a table containing all input pin faults to be inserted for a simulation pass. The appropriate fault can then be inserted on the input pin before the evaluation routine uses it. The input pin faulting procedure can be more clearly illustrated by first examining the major tables used in simulation. 
These are given in Figure 4. Each row in 877 the circuit description table (CDT) characterizes a signal or a single element. The first Bntry in the CDT table is a row index into the function description table (FDT). The second entry CDT (i, 2), points to the first of the contiguous set of fan-in pointers (in the FI array) for element i. CDT (i ,3) points to the first of a contiguous set of fan-out pointers (in the FO array) for element i, and CDT (i,4) specifies how many fan-outs exist. The signal value (CV) table contains signal values. The ith entry in the CV array contains the value of the ith signal or element in the CDT table. Each row in the FDT table contains information which is common to. all elements of a given logical type. FDT (i, 1) contains the number of the evaluation routine to be used for this element type, FDT (i, 2), FDT (i,5), and FDT (i,6) contain the nominal, maximum, and minimum delay times respectively. FDT (i ,3) specifies the number of fan-ins for this element type. FDT (i, 7) contains the number of outputs for the given element type. (This is used in the case of multiple output devices.) FDT (i ,4) is used for busses. The simplest evaluation-routine for a variable input AND gate V\rill now be given. This is the routine used when no race and hazard analysis is performed, nor is an indeterminate value used. N = IFIPT = ITEMP = DO 10NN = K = ITEMP = 10 CONTINUE (FDT(CDT(I, 1), 3) CDT(I,2) ALLONE 1,N FI(IFIPT + NN - 1) ITEMP.AND. CV(K) The integer variable ALLONE has all bits set to one. COT(POINTERS) FI* (INTERCONNECTIONS) ELEMENT NUMBER (INDEX) MODE 3 (HAZARDS) *TABLES THAT ARE DYNAMICALLY ALLOCATED CV * CV2* CV3 - ~ TABLES USED FOR SIMULATION COT FDT FI FO } INTERCONNECTION DATA AND ELEMENT CHARACTERISTICS (MODEL DEFINITION) FDT (ELEMENT TYPES) FINES CHARACTERISTICS ~ OF ELEMENTS - NO. OF INPUTS, TIME DELAY, AMBIGUITY REGION, ETC. rn Figure 4-TEGAS2-Simulation table structure 878 Fall Joint Computer Conference, 1972 All that is required to change this routine so that input pin faults can be inserted is to replace CV(K) with FLTCV (K). FLTCV is a function call. It determines if the fan-in pointer K is negative and if so it uses the INFT tal>le to insert the appropriate fault. The INFLT is the same as the OUTFLT table except that the relative input pin position, to be faulted, is given. The combination of the element number and the input pin position on that element identify a particular pin to be faulted. The third method of fault insertion is used for complex faults. A complex or functional fault is a fault used to model a physical defect which does not correspond to a single or multiple occurring stuck-atlogical 1 or stuck-at-Iogical 0 fault. An example of this is a NAND gate that becomes an AND in the presence of some physical defect. For this approach an element is first evaluated by its normal evaluation routine. Then, if a complex fault is to be inserted on that element, it is evaluated again using a routine which models the complex fault. An example would be an inverter gate which no longer inverts. In this case, the normal inverter routine would be used first, then an evaluation routine, which merely passes the input signal along, would be used. As with the other insertion techniques, a table (FUNFLT, Figure 3) is constructed at the beginning of a simulation pass and it contains all elements that are to have complex faults for that pass. 
This table also contains' the routine number that will evaluate a prospective complex fault and again which bit position will represent the fault. Each entry in the Fl[NFLT table has one extra space, used for input pin position when modeling input shorted diodes. It is the responsibility of the complex fault evaluation routines to merge their results with the results of the normal evaluation routine so that the proper bit represents the fault. This is accomplished by using MASKI in the FUNFLT table. Assume that variable SP temporarily contains the non-fault. element evaluation results and the variable SPFT contains the results from the element representing the complex fault. As was stated before, MASKI contains a 1 in the bit position that is to represent the fault, then the statement SP = (SP.AND. (.NOT.MASKI)) (MASK1.AND.SPFT) .OR. will insert the fault in SP. Other faults that can be modeled with the complex fault insertion technique are shorted signals, shorted diodes, NAND gates that operate as AND gates, edge triggered D type flip-flops which are no longer edge triggered, etc. Two signals which are shorted together would be modeled as in Figure 5. A dummy gate is A ...... A* ...... 0 U M a ... P' a* ... -". Figure 5-Shorted signals placed over signals A and B. In the faulted case, the dummy gate takes on the function of an AND gate or an OR gate, depending on technology. In this case, the input signals are ANDed or ORed together and the result passes on to both A* and B*. Another class of faults that .can be modeled, to some extent, with. the complex method, is transient or intermittent faults. This is possible only because TEGAS2 is a time based simulator. As an example, let us model the condition of a particular signal periodically going to 0 independent of what its value is supposed to be. Again we pass the signal through a dummy gate as in Figure 6. The dummy gate also produces the fictitious signal (F) which is ANDed with the normal signal. The fictitious signal is normally 1, however the dummy gate can use a random number generator to periodically schedule the fictitious signal to go to O. It can also use the random number generator or a given parameter to determine how long the signal remains at O. From this discussion, the flexibility and power of the complex fault insertion method can be seen. In addition to modeling the faults described· above, any combination of faults can be modeled as if they existed simultaneously. A group of faults that exist at the same time is considered to be a multiple fault. With this capability multiple occurring logical stuck at 1 and logical stuck at 0 faults can be modeled. Also multiple complex faults can be inserted, or any combination of the above. Modeling a group of multiple faults is accomplished simply by letting a single bit position in the signal values represent each of the faults that are to exist together. That is, MASKI in the fault tables would have the same bit position set to one for each of the faults in a group of multiple occurring faults. This approach to the handling of multiple faults has permitted us to develop a new technique for Fault Insertion Techniques and Models simulating any number of faults, from one fault to all faults, in one simulation pass. The added running time for this approach is slightly more than that needed to do a parallel simulation pass which consideres a number of faults equal to the host machine word length minus one. 
Hence, the approach has the potential of being less time consuming than the one pass simulatorslO and more flexible and efficient than the traditional parallel simulators. The technique is called 111ultiple Number of Faults/Pass (MNFP) and partitions the faults into classes that will be simulated as multiple faults. Therefore, each bit position represents a group of faults. If the groups are structured such that blocking faults (such as a stuck at 1 and a stuck at 0 simultaneously on the same signal) are not included in the same group, fault detection can be achieved. If fault isolation is required, the fault groups which contain detected faults will be repartitioned and the process continued. For example, if we are simulating 35 groups of faults and five groups indicate faults being detected, the five groups will be repartitioned and simulated for isolation. The efficiency of this approach is derived from the fact that the other 30 groups need not be simulated any further for these inputs. Assume that these 30 groups each contained 70 faults. For parallel simulation, this would require 59 additional simulation passes, over the (MNFP) approach. Another feature of this approach is that all faults need not, and indeed, sometimes should not, be simulated in one pass. For example, assume the following partition of 2,168 faults. N umber of groups 10 5 20 15 10 9 69 (Total Groups) Number of faults/group 19 27 35 71 .6 2 2,168 (Total Number of Faults) For this case, two passes of the simulator would be required, assuming a 36 bit host machine word. MNFP is also being used in conjunction with diagnostic test generation. For example, assume the existence of 3500 faults, and that our diagnostic test generation heuristics have generated three potential tests (T 1, T 2 and T 3). If the faults are partitioned into groups of 100 each, all faults could be simulated in one pass. Hence, if each test is applied to the fault groups using MNFP, it would require three passes to determine ... A A* 879 ...... 0 U F ..,.. M Figure 6-Intermittent fault (to some degree) the relative effectiveness of the tests. If Tl and T2 detected faults in only one group, and T3 detected faults in 5 groups (with fault detection being the objective), the most likely candidate for further analysis would be Ta. Even if all of the faults in these 5 groups were then considered individually, (the worst case) the entire process would require 18 simulation passes, as opposed to 100 passes using conventional parallel simulation. Further studies to determine the most efficient utilization of the MNFP technique are presently under way. ENUMERATION OF FAULTS If every fault that is to be inserted must be specified manually, it could be a very laborious process. It is certainly necessary to be able to specify faults manually if desired, but it is also necessary to be able to generate certain classes of faults automatically. TEGAS2 is set up in a modular fashion such that control cards can be used to invoke independent subroutines which will generate different classes of faults. Additional faults can be specified manually. New subroutines can be easily added to generate new classes of faults as the user desires. One class of faults that the system presently generates automatically, is a collapsed set of single occurring stuck-at-Iogical 1 and stuck-at-Iogical o pin faults. This is the class of faults most often used. A collapsed set of faults refers to the fact that many faults are indistinguishable. 
For instance, a stuck at o on any of the input pins of an AND gate and a stuck at 0 on the output of the AND gate cannot be distinguished. If this group of faults is collapsed into one representative fault, it is considered to be a simple gate collapse. This is easy to perform and many existing systems utilize this feature. 880 A Fall Joint Computer Conference, 1972 12 E B.-..;:;3.£..4~.....L..-_ _ 13,14 15,16 G 17,'8 C ~5!..:,6~-f'"_ _""'" F ODD NUMBERS = S-A-I FAULTS EVEN NUMBERS=S-A-O FAULTS 8 COLLAPSED FAULT SETS: (1),(3), (2,4,IO,14) t (9,13,17,15,11), (5),(7) ,(6,8, 12 ,16), (18) Figure 7-Fault collapse A simple gate collapse is not, however, a complete collapse. Figure 7 gives an example of a completely collapsed set of faults. There are a total of 18 possible single occurring S-A-l, S-A-O .faults. A simple gate collapse will result in 12 sets of faults. However, an extended collapse results in only 8 sets of distinguishable faults. This amounts to a reduction of 33 percent over the simple collapse. In large examples, the extended co-l lapse has consistently shown a reduction of approximately 35 percent over the simple collapse. The information gained in collapsing a set of faults has additional value. This information can be used in determining optimal packaging of elements in MSI or LSI arrays so as to gain fault resolution to a replaceable module. As in Figure 7, it can be readily seen that these three elements might best be placed on the same replaceable module. This is true because all three elements are involved in an indistinguishable set of faults. FAULT DETECTION The activity associated with determining when a fault has caused an observable malfunction can be termed fault detection. As with many other functions, TEGAS2 uses a dummy gate designated as the detection gate, for this purpose. A detection gate is specified for signals that are declared observable for the purpose of fault detection. These signals are many times referred to as primary outputs or test points. An ordered set of such signals is called the output vector. If any fault causes the value of one or more points on the output vector to be different from when the fault is not present, the fault is declared detected. Whenever one of the signals, which is part of the output vector, changes value, its corresponding detection gate determines if a fault is observable at that point. This is easily accomplished since the fault free value for any signal is always stored in the low order bit of the host machine word. The values for that signal, corresponding to each of the faults being simulated at that time, are represented by the other bits in the word. Hence, all that is necessary is to compare each succeeding bit in the word to the low order bit. If the comparison is unequal, a fault has been detected. For each simulation pass the machine fault number table (MFNT) (Figure 3) which cross-references each bit position in the host machine word with the fault to which it corresponds, is maintained. Once a comparison is unequal, the table can be entered directly by bit position and the represented fault can be determined. When a fault is detected, the detection gate records the identification number of the fault detected, the good output vector, the output vector for the fault just detected and the time at which the detection occurred. The input vector, at the time of detection, may be optionally recorded. 
The important thing to note is that since TEGAS2 simulates on the basis of very accurate propagation delay times, the time of detection has significance. By using this additional information, it is possible to gain increased fault resolution. An example of this is when two faults A and B result in identical output vectors for all input vectors applied. Without any additional information, these two faults cannot be distinguished. However, if the malfunction caused by fault A appears before the malfunction caused by B, then they can be distinguished based on time of detection. The detection gate performs several other duties. For example, in mode 2 a signal may be flagged indeterminate. In this case, the detection gate checks to see that both the fault induced value and the fault free value are determinate before a fault is declared detected. In all modes of operation, a detection gate may have a clock line as one of its inputs. This clock line, or strobe line, may be used to synchronize the process of determining if a fault is detected. In this manner, systems of testers which can examine for faults only at certain time intervals can be accurately simulated.

FAULT CONTROL

When dealing with logic nets of any size at all, such as 500 elements and up, there are thousands of faults to be considered. If such a magnitude of faults is to be simulated efficiently, a good deal of attention must be paid to the overall fault control. The overall control should be such that it will handle an almost unlimited number of faults and be as efficient as possible. The first step in the process of fault simulation is the specification of the faults to be inserted. As was stated earlier, certain classes of faults may be generated automatically and others specified manually. In either case, the faults are placed sequentially on an external storage device. After all faults have been enumerated, an end-of-file mark is placed on the file and it is rewound. This is accomplished with a control card. The number of faults that can be specified is limited only by the external storage space available. The maximum number of faults that can be simulated during a single simulation pass is dependent on the number of bits in the host machine word, unless MNFP is used. As mentioned earlier, one bit is always used for the good machine and the others are used to represent fault values. Through the use of a control card, the user may specify the number of bits to be used, up to the maximum allowable. Let N be the number of faults to be simulated in parallel. The basic steps in fault simulation would then be as follows:

(1) Enumerate all faults to be simulated.
(2) Store on an external device all data necessary to initialize a simulation pass.
(3) Read sequentially N faults from the external fault file and set up the appropriate fault tables.
(4) Negate the appropriate pointers based on the fault tables.
(5) Pass control to the appropriate mode of simulation as determined by the user.
(6) If all faults have been simulated, stop.
(7) If there are more faults to be simulated, restore the data necessary to initialize simulation and go to (3).

What has not been explicitly stated, up to this point, is that all input vectors or input stimuli are applied to a group of faults before going to the next group of faults. This will be called the column method.
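As a rough illustration of steps (1)-(7) and the column method, consider the Python sketch below. It is not TEGAS2 code; the routine names, the in-memory fault list standing in for the external fault file, and the way state is saved and restored are simplifications assumed for the example.

```python
# Sketch of the pass control loop above (column method): every input stimulus
# is applied to one group of N faults before the next group is read.

def run_fault_simulation(fault_file, stimuli, n_parallel, simulate_pass):
    """fault_file    -- sequential list standing in for the external fault file
       stimuli       -- the full time-ordered input stimulus queue
       n_parallel    -- faults per pass (host word bits minus the good-machine bit)
       simulate_pass -- callable(faults, stimuli) -> set of detected fault numbers"""
    initial_state = snapshot_state()                  # step (2): save initialization data
    detected = set()
    for start in range(0, len(fault_file), n_parallel):
        group = fault_file[start:start + n_parallel]  # step (3): read the next N faults
        restore_state(initial_state)                  # steps (2)/(7): re-initialize the network
        detected |= simulate_pass(group, stimuli)     # steps (4)-(5): insert faults and simulate
    return detected                                   # step (6): all faults processed

# Minimal stand-ins so the sketch runs on its own.
_state = {"values": {}}
def snapshot_state():        return dict(_state)
def restore_state(saved):    _state.update(saved)

faults = list(range(1, 71))                            # 70 hypothetical fault numbers
print(run_fault_simulation(faults, stimuli=[], n_parallel=35,
                           simulate_pass=lambda g, s: set()))   # two passes, nothing detected
```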
With zero delay or sometimes unit-delay simulation, fault control is not usually done in this manner. In these cases, a single input vector is applied to all groups of faults before going to the next input vector. This will be referred to as the row method. Between applying input vectors in the row method, all faults are examined to determine which ones have been detected and these are discarded. The faults remaining can then be regrouped so that fewer faults need be simulated with the next input vector. On the surface, the row method control seems more efficient than the column method. However, there are several things to be considered. First of all, when the row method is used with sequential logic, the state information for every fault must be saved at the end of applying each input vector. This requires a great deal of bit manipulation and storage space. The amount of state information that must be stored is dependent on the type of simulation used. If, as with most zero delay simulators, the circuit is leveled and feedback loops are isolated and broken, only the values of feedback loops and flip-flop states need to be stored. With a simulator such as TEGAS2, the circuit is dynamically leveled and feedback loops are never detected and broken; therefore, every signal must be stored. This is one of the reasons that the row method is not considered to be as practical with a time based simulator such as TEGAS2. A second consideration is the fact that with TEGAS2, all input stimuli can be placed at appropriate places in a time queue before simulation begins. Once simulation begins, it is one continuous process until all stimuli have been applied. Because of this, a large number of input stimuli can be processed very rapidly and efficiently. The ability to place all input stimuli in a time queue would not be possible if a time based simulator were not used. With a zero delay, or even a unit delay simulator, input stimuli cannot be specified to occur at a particular time in reference to the activity of the rest of the circuit. Therefore, one input vector is applied and the entire circuit must be simulated until it has been determined to be stable. Then the next input vector can be applied, etc. In this manner, there is a certain amount of activity partitioning between input vectors, which lends itself to the row method. The third factor to consider is that if a fault is no longer simulated after it is once detected, a certain amount of fault isolation information is lost. If the column method is used, the cost of retaining a fault until the desired fault isolation is obtained is considerably less than with the row method.

EXAMPLES

To demonstrate the fault simulation capabilities of TEGAS2, as presented in this paper, consider the network in Figure 8. This network is a particular gate level representation of a J-K master slave flip-flop.

Figure 8-JK master slave flip-flop example (inputs J, K, CLOCK, CLEAR, and PRESET; outputs Q and Q̄)

The nominal propagation delay time of each of the NAND gates is four (4) time units and the delay of the NOT gate is two (2) time units. The minimum delay of the NAND gates is three (3) time units and the maximum is five (5) time units. For the NOT gate, the minimum and maximum are one (1) and three (3) units, respectively. The two valued assignable nominal delay mode of simulation is the fastest mode, but it performs no race and hazard analysis. In this mode of simulation, all signals are initially set to zero. Suppose that for the network in Figure 8 the signals J, K, CLEAR, and PRESET are set to 1. Now let the signal CLOCK continuously go up and down with an up time of five and a down time of fifteen. The outputs Q and Q̄ will oscillate because they are never driven to a known state. If the same input conditions are used in the three valued mode of simulation, the outputs will remain constant at X (unknown). To demonstrate the power of the race and hazard mode of simulation, assume that the inputs J and K change values while the CLOCK is high and that the clock goes to zero two time units after the inputs change. Under these conditions, internal races will be created and a potential error flag will be set for both of the outputs Q and Q̄. Performing the extended fault collapse on the network in Figure 8 resulted in a total of forty faults (Table I) that must be inserted. (A simple gate collapse would result in fifty-two faults to be inserted.)

TABLE I-Collapsed Set of Faults for Network in Figure 8 (lists, for each of the forty faults, the fault number, the gate, the signal, and the fault type, S-A-1 or S-A-0)

Table II gives a set of inputs that were applied to the network in all three modes of fault simulation. The table gives all primary input signal changes.

TABLE II-Input Signal Changes

Signal    Value changed to    Time of change
J         1                   0
K         0                   0
CLOCK     0                   0
CLEAR     0                   0
PRESET    1                   0
CLEAR     1                   30
CLOCK     1                   70
CLOCK     0                   110
J         0                   131
K         1                   131
CLOCK     1                   131
CLOCK     0                   134
PRESET    0                   160

In the following analysis of fault detection, the signals Q and Q̄ are the only test points for the purpose of observing faults. In the first mode of simulation, two valued assignable nominal delay, thirty-three of the faults were detected. The seven faults not detected were 15, 16, 25, 27, 29, 31, and 33. In the three valued mode of simulation, there were thirty faults declared to be detected. Seven of the faults not detected were the same as in the first mode of simulation. The three other faults not detected were 1, 20, and 23. The reason these faults were not detected in the three valued mode of simulation is that they prevented the network from being driven to a known state. In the race and hazard analysis mode of simulation, twenty-six faults were declared to be detected. Out of the fourteen faults not detected, ten are the same as those not detected in the three valued mode. The other four faults are 5, 7, 8 and 11. Faults 7 and 8 were never detected because they never reached a stable state different from a stable state of the good machine's value. Many times faults 7 and 8 caused the output signals Q and Q̄ to be in a state of transition while the good machine value was stable. However, this is not sufficient for detection. Faults 5 and 11 caused potential errors and were therefore never declared to be absolutely detected. Table III gives time vs. faults detected for each of the three modes of simulation.

TABLE III-Time vs. Fault Detection (for each simulation time at which a detection occurred, the fault numbers detected in Mode 1, Mode 2, and Mode 3)

Note that some of the faults are detected at different times between the two valued and three valued modes of simulation. This is due to the fact that some signals were not driven to known states in the three valued mode of simulation until a later time. Faults were also detected at different times in the race and hazard analysis mode since minimum and maximum delay times were used.
Now the insertion of complex faults will be demonstrated. The flip-flop network is marked in Figure 9 with five complex faults. Three of the complex faults require dummy elements. The first fault is an intermittent S-A-0 on the PRESET signal. The dummy element DUM1 is used to insert this fault. The second fault is an input diode shorted on the connection of signal F4 to gate F6. To insert this fault, the signals F4 and F7 are passed through the dummy element DUM2. The element DUM3 is used to model a signal short between signals F5 and F6. A fourth complex fault is the case where element F3 operates as an AND gate instead of a NAND gate. The fifth complex fault is a multiple fault. This multiple fault consists of a S-A-0 on the input connection of signal CLEAR to gate Q, a S-A-1 on signal F1, and gate F2 operating as an AND gate instead of a NAND gate.

Figure 9-Complex faults

The same input signal changes as given in Table II up through time 110 were applied to the network with these faults present. The times of detection for these faults in mode 1 are:

Time        10   12   16   120   120
Fault No.    1    4    3     2     5

Hence, these faults were simulated simultaneously and detected by the given input sequence for this mode of simulation.

SUMMARY

The TEGAS2 system is capable of simulating faults at three levels. The most accurate level performs race and hazard analysis with minimum and maximum delay times. The fault insertion methods developed for TEGAS2 are capable of modeling not only the traditional set of single occurring stuck-at logical one and stuck-at logical zero faults, but also a wide range of complex faults such as intermittents, shorted signals, and shorted diodes. In addition, any multiple occurrence of the above faults can be modeled. The specifications of these faults can be done by the user, or an extended collapsed set of single occurring stuck-at faults can be generated automatically. Due to accurate time based simulation for faults, it is possible to extract accurate time based fault diagnosis information. Finally, with the introduction of the MNFP technique, a new dimension has been added to digital fault simulation.
REFERENCES

1 E G ULRICH
Time-sequenced logical simulation based on circuit delay and selective tracing of active network paths
Proceedings ACM 20th National Conference 1965
2 S A SZYGENDA D ROUSE E THOMPSON
A model and implementation of a universal time delay simulator for large digital nets
AFIPS Proceedings SJCC May 1970
3 M A BREUER
Functional partitioning and simulation of digital circuits
IEEE Transactions on Computers Vol C-19 pp 1038-1046 Nov 1970
4 S G CHAPPELL S S YAU
Simulation of large asynchronous logic circuits using an ambiguous gate model
AFIPS Proceedings FJCC November 1971
5 R B WALFORD
The LAMP system
Proceedings of the Lehigh Workshop on Fault Detection and Diagnostics in Digital Circuits and Systems December 1971
6 R M McCLURE
Fault simulation of digital logic utilizing a small host machine
Proceedings of the 9th ACM-IEEE Design Automation Workshop June 1972
7 E G MANNING H Y CHANG
A comparison of fault simulation methods for digital systems
Digest of the First Annual IEEE Computer Conference 1967
8 S A SZYGENDA
A simulator for digital design verification and diagnosis
Proceedings of the 1971 Lehigh Workshop on Reliability and Maintainability December 1971
9 S A SZYGENDA
TEGAS2-Anatomy of a general purpose test generation and simulation system for digital logic
Proceedings of the 9th ACM-IEEE Design Automation Workshop June 1972
10 D B ARMSTRONG
A deductive method for simulating faults in logic circuits
IEEE Transactions on Computers May 1972

A program for the analysis and design of general dynamic mechanical systems

by D. A. CALAHAN and N. ORLANDEA
The University of Michigan
Ann Arbor, Michigan

INTRODUCTION

The physical laws that govern motion of individual components of mechanical assemblages are well-known. Thus, on the face of it, the concept of a general computer-aided-design program for mechanical system design appears straightforward. However, both the equation formulation and the numerical solution of these equations pose challenging problems for dynamic systems: the former when three-dimensional effects are important, and the latter when the equations become "stiff"1 or when different types of analyses are to be performed. In this paper, a three-dimensional mechanical dynamic analysis and design program is described. This program will perform dynamic analysis of nonlinear systems; it will also perform linearized vibrational and modal analysis and automatic iterative design around any solution point in the nonlinear dynamic analysis.

FORMULATION

The equations of motion of a three-dimensional mechanical system can be written in the following form.

Free body equations:
(1)-(3)    j = 1, 2, ..., 6

Constraint (connection) equations:
φi(q) = 0,    i = 1, 2, ..., m    (4)

(5)

where
E is the kinetic energy of the system,
qj are generalized coordinates (three rotational and three translational),
uj are the coordinate velocities,
λi are Lagrange multipliers, representing reaction forces in joints,
pj are generalized angular momentums,
Qj are generalized forces,
φi are constraint functions representing different types of connections at joints (see Figure 2).

Representing all subscripted variables in vector form (e.g., u = [u1 u2 ... u6]'), these equations become

F(u, q̇, q, ṗ, λ; t) = 0    (6)
Φ(q) = 0    (7)

By referencing the free body equations of (1-3) to the joints, we can view the above as a "nodal" type of formulation.
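To make the "free body plus constraint" structure of (6)-(7) concrete, the following small Python sketch writes the residuals for a planar point mass on a rigid link (a pendulum). It is only an illustration under simplifying assumptions: the generalized momenta are replaced by m·u̇, gravity is the only applied force, and all names and numbers are invented for the example; it is not the program's own formulation.

```python
# Illustrative nodal-style residuals for a planar pendulum (point mass m on a
# rigid link of length L), in the spirit of F(u, qdot, q, pdot, lam; t) = 0 and
# Phi(q) = 0. A sketch for intuition only.
import numpy as np

m, L, g = 1.0, 2.0, 9.81   # made-up data

def free_body_residual(u, qdot, q, udot, lam):
    """Force balance and kinematic relation for coordinates q = (x, y)."""
    x, y = q
    # Newton's law with the joint reaction lam * dphi/dq added to gravity.
    f_dyn = m * udot - (np.array([0.0, -m * g]) + lam * np.array([2 * x, 2 * y]))
    f_kin = qdot - u                       # coordinate velocities are the u variables
    return np.concatenate([f_dyn, f_kin])  # F(...) = 0 when the motion is consistent

def constraint_residual(q):
    x, y = q
    return np.array([x * x + y * y - L * L])   # Phi(q) = 0: the link keeps its length

# Consistent static check: pendulum hanging straight down, at rest.
q = np.array([0.0, -L]); u = np.zeros(2); qdot = np.zeros(2)
lam = -m * g / (2 * L)                     # reaction that balances gravity along the link
print(free_body_residual(u, qdot, q, np.zeros(2), lam))   # ~ [0, 0, 0, 0]
print(constraint_residual(q))                              # [0.]
```

The point of the sketch is only that the reaction forces enter as extra unknowns (the Lagrange multipliers) alongside the coordinates and velocities, which is what makes the nodal formulation larger but easy to assemble from connection data.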
NUMERICAL SOLUTION

Static and transient analysis

To avoid the numerical instability associated with widely separated time constants, most general-purpose dynamic analysis programs employ implicit integration techniques. The corrector equation corresponding to (6) has the form

[(K0/T)(∂F/∂u̇) + ∂F/∂u] Δu + (∂F/∂q) Δq + [(K0/T)(∂F/∂ṗ) + ∂F/∂p] Δp + (∂F/∂λ) Δλ = -F    (8)

(∂Φ/∂q) Δq = -Φ    (9)

where T is the integration step size and K0 is a constant of integration. The matrix of partial derivatives in (8-9) is solved repetitively using explicit machine code. The Gear formula is used for integration.2

Figure 1-Outline of program capabilities (generation of three sparse matrix codes for static, transient, and vibrational (modal) analysis)

Vibrational and modal analysis

Substitution of s for K0/T in (8) can be viewed as resulting in the linearized system equations

[s(∂F/∂u̇)n + (∂F/∂u)n] δu + (∂F/∂q)n δq + [s(∂F/∂ṗ)n + (∂F/∂p)n] δp + (∂F/∂λ)n δλ = I(s)    (10)

(∂Φ/∂q)n δq = 0    (11)

where ( )n represents evaluation at the nth time step (this includes the static equilibrium case, n = 0), δ represents a small variation around the nth time step, and I(s) is a force or torque source vector. The evaluation of the vibrational response now proceeds by setting s = jω = j2πf, and sweeping f over the frequency range of interest. This repeated evaluation is similar in spirit to the repeated solution of (8-9) at every corrector iteration. However, now an interpreter4 is used for solution of the complex-valued simultaneous equations. Modal analysis (i.e., determination of the natural frequencies) is relatively expensive if all modes must be found. However, the dominant mode can usually be found (from a "good" initial guess) in 5-7 evaluations of the system determinant using Muller's method.5 This determinant is readily found from the interpreter, which performs an LU factorization to find the vibrational response.

Solution efficiency

For each corrector iteration involved in (8-9), the minimum set of variables that must be determined are those required to update the Jacobian and right hand side vector F. The constraint equations represented by Φ(q) = 0 can in general relate any qj variables in a nonlinear manner; also, from (1) and (2), the λ's appear in the ∂F/∂q term of the Jacobian. Therefore, it seems convenient to solve for all the arguments of F of (6). We do not, then, attempt to reduce the number of equations to a "minimum set" (such as the number of degrees of freedom); since most variables must be updated anyway, we find no purpose in identifying such a minimum set for the purpose of transient analysis. In contrast, for vibrational and modal analysis, a significant savings could be achieved by reducing the equations to the number of degrees of freedom of the system. We do not exploit this at present, since a single transient analysis easily dominates other types of analysis in cost.
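The frequency sweep described under vibrational and modal analysis reduces, at each frequency, to one complex LU factorization and solve. The sketch below illustrates that loop with a made-up two-equation system standing in for the linearized matrix of (10)-(11); the function names, the mass-spring-damper stand-in, and the use of numpy/scipy are assumptions for the example, not part of the paper's program.

```python
# Minimal sketch of the s = j*omega frequency sweep, not the paper's own code.
# linearized_matrix(s) stands for the system matrix of (10)-(11) at one
# operating point; b stands for the source vector I(s).
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def linearized_matrix(s):
    # Hypothetical stand-in: one mass-spring-damper written in first-order form.
    m, c, k = 1.0, 0.4, 25.0
    return np.array([[s, -1.0],
                     [k / m, s + c / m]], dtype=complex)

b = np.array([0.0, 1.0], dtype=complex)        # unit force source

for f in np.logspace(-1, 1, 5):                # sweep f over the range of interest
    s = 1j * 2 * np.pi * f                     # s = j*omega = j*2*pi*f
    lu, piv = lu_factor(linearized_matrix(s))  # LU factorization, as the interpreter does
    response = lu_solve((lu, piv), b)          # vibrational response at this frequency
    det = np.prod(np.diag(lu))                 # determinant from the LU factors (up to the
                                               # sign of the row permutation)
    print(f"f={f:6.2f} Hz  |response|={abs(response[0]):.4f}  |det|={abs(det):.3f}")
```

The determinant needed for Muller's method is a by-product of the same factorization, which is consistent with the remark above that modal analysis reuses the interpreter's LU factors.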
AUTOMATIC DESIGN

Unlike electrical circuit design, it is not common to design mechanical systems to precisely match a frequency specification. It is far more common to adjust only the dominant mode to achieve an acceptable dynamic response. One of the most direct approaches to automatic iterative adjustment of the natural modes is to apply Newton iteration to the problem of solving Δ(s) = 0, where Δ is the system determinant associated with (4). In particular, if si is a desired natural mode, and ξj is a parameter, then we solve iteratively. Here, the term in brackets can be identified as a transfer function of the linearized system.

PROGRAM DESCRIPTION

The general features of the program are outlined in Figure 1. In general, the program is intended to permit analysis of assemblages of links described by their masses and three inertial moments, compatibly connected by any of the joints of Figure 2. Other types of mechanical elements (gears, cams, springs, dashpots) will be added shortly. Most of these fit neatly into the nodal formulation, affecting only the constraint equations given in (2).

Figure 2-Constraint library

Type of joint    Number of equations of constraint    Example of application
Spherical        3                                    Suspension of cars
Universal        4                                    Transmission for cars
Cylindrical      4                                    Machine tools
Translational    5                                    Machine tools
Revolute         5                                    Bearings
Screw            5                                    Screws

EXAMPLE

The system shown in Figure 3 was simulated over a duration of 42.5 seconds of physical time. It was assumed that a motor with a linear torque vs. speed characteristic, τ2 vs. ω2, drove the system against a constant load torque, τ4. Figure 4 shows the transient response; a vibrational response is shown in Figure 5 around the static equilibrium point. Figure 6 shows the motion of the natural frequencies as the transient response develops; it may be noted that as the natural modes develop a larger imaginary component, an oscillation of increasing frequency appears in the transient response.

Figure 3-Example mechanism
Figure 4-Transient response
Figure 5-Vibrational response
Figure 6-Locus of natural modes

SUMMARY

The nodal formulation of (1-5) offers a number of programming and numerical solution features.
(1) No topological preprocessing is necessary to establish a set of independent variables; equations can be developed directly from the connection data, component by component.
(2) Having a large number of solution variables assists in modeling common physical phenomena; for example, frictional effects in joints are routinely modeled and impact is easily handled.
(3) Sensitivities necessary for man-machine and iterative design are easily determined due to the explicit appearance of common parameters (e.g., masses, inertial terms, link dimensions) in the nodal formulation.
(4) The use of force and torque equations permits easy compatibility with current methods of continuum mechanics for internal stress analysis.
It must be mentioned that the transient solution of three-dimensional mechanical systems poses some interesting numerical problems not present in the related fields of circuit and structural analysis.
First, the equations are highly nonlinear, requiring evaluations of tensor products at each corrector iteration; also, the associated matrices are of irregular block structure. Second, the natural modes are not infrequently in the right half plane, representing a falling or a locking motion (see Figure 3). The integration of such (locally) unstable equations is not a well-understood process and can be expected to yield some numerical difficulties. Among these appear to be a high degree of oscillation in the reaction forces (the λ's), preventing any effective error control from being exerted on these variables.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the interest and support of the U. S. Air Force Office of Scientific Research (Grant No. AFOSR-71-2027) and the National Science Foundation (Grant No. GK-31800).

REFERENCES

1 C W GEAR
DIFSUB for solution of ordinary differential equations
CACM Vol 14 No 3 pp 185-190 March 1971
2 N ORLANDEA M A CHACE D A CALAHAN
Sparsity-oriented methods for simulation of mechanical dynamic systems
Proc 1972 Princeton Conf on Information Sciences and Systems March 1972
3 F G GUSTAVSON W M LINIGER R A WILLOUGHBY
Symbolic generation of an optimal count algorithm for sparse systems of linear equations
Sparse Matrix Proceedings (1969) IBM Thomas J Watson Research Center Yorktown Heights New York Sept 1971
4 H LEE
An implementation of gaussian elimination for sparse systems of linear equations
Sparse Matrix Proceedings (1969) IBM Thomas J Watson Research Center Yorktown Heights New York Sept 1971
5 D E MULLER
A method for solving algebraic equations using an automatic computer
Math Tables Aids Computer Vol 10 pp 208-215 1956
6 M A CHACE D A CALAHAN N ORLANDEA D SMITH
Formulation and numerical methods in the computer evaluation of mechanical dynamic systems
Proc Third World Congress for the Theory of Machines and Mechanisms Kupari Yugoslavia pp 61-99 Sept 13-20 1971
7 R C DIX T J LEHMAN
Simulation of dynamic machinery
ASME Paper 71-Vibr-111 Proc Toronto ASME Meeting Sept 1971

A wholesale retail concept for computer network management

by DAVID L. GROBSTEIN
Picatinny Arsenal
Dover, New Jersey
and
RONALD P. UHLIG
US Army Materiel Command
Washington, D.C.

THE MANAGEMENT PROBLEM

The commitment to share computer resources in a network implies substantial changes in the resource control topology of that organization. This is particularly true for organizations that have existing computing facilities which will be pooled to form the base of the network's resources. The crux of the matter is that sharing implies not only that you will let someone else utilize the unused capacity of your computer; it also implies that you may be told to forgo installing your own machine because there is unused capacity elsewhere in the resource pool. If your mission depends on the availability and suitability of computer services from someone else's machine, you suddenly become very interested in the management structure which governs the relationship between your organization and the one that has the computer. The purpose of this paper is to examine some of the objectives and problems of an organization having existing independent computing centers, when it contemplates moving into a network environment. In the past few years the technical feasibility of computer networks has been demonstrated. An examination of the existing networks, however, indicates that they are generally composed of homogeneous machines or are located essentially in one geographical area. The most notable exception to this is the ARPA Network, which is widely distributed geographically and which has a variety of computers. The state-of-the-art now appears to be sufficiently far along to allow serious consideration of computer networks which are not experimental in origin and are not university based.
When a large governmental or industrial organization contemplates the establishment of a computer network, initial excitement· focuses on technical sophistication and capabilities which may be achieved. As the problem is examined more deeply it becomes progressively clearer that the management aspects represent the greater challenge. There a.re a number of sound reasons for an organization to establish a computer network, but fundamental to these is the intent to reduce over-all computer resources required, by sharing them. The implications of this commitment to share are more far reaching than is immediately obvious when the idea is first put forth. In both government and industry it is common to find computing facilities established to service the needs of a particular profit center or activity. That is, the computer resources necessary to support a mission organization are placed under its own control, as in Divisions 1, 2, and 4 in Figure 1. COMPUTER NETWORK ADVANTAGES Computer network advantages can be divided into two categories, operational and management. The following advantages are classified· as operational in that they affect the day to day use of facilities in the network: 1. Provide access to large scale computers by users who do not have on-site machines. 889 890 Fall Joint Computer Conference, 1972 CORPORATE HEADQUARTERS Figure 1-:Qec~ntralized computing facilities 2. Provide access to different kinds of computers that are available in the network. 3. Provide access to specialized programs or technology available at particular computing centers. 4. Reduce costs by sharing of proprietary programs without paying for them at multiple sites. 5. Load level among computing centers. 6. Provide back-up capability in case of over-load, temporary, or extended outage. A very fundamental advantage is the provision of a full range of computing power to users, without having to install a high capacity machine at each site. To achieve this, it is necessary to provide access to the network through time sharing, batch processing and interactive graphics terminals. For each of these to be applied ~o that portion of a project for which it is best suited, all must have access to a common data base. Computer users frequently find programs that will be valuable to them but which have been developed for some other machine. Conversion to their own machine can be time consuming· and costly even if the programs are written in FORTRAN. Computer networks can offer access to different kinds of machines so that bor- . rowed programs may be run without conversion. If the program will serve without modification it need not be borrowed at all but can be used through the network at whichever installation has developed it. Thus a network environment can be used to encourage specialized technology at each computing center so that implementation and maintenance costs need not be repeated at every user's site. Computer networks can provide better service to users by allowing load leveling among the centers in the network so that no single machine becomes so overloaded that response and turn-around time degrade to unacceptable levels. Furthermore, the availability of like machines provides back up facilities to insure relatively uninterrupted service, at least for high priority work. From the standpoint of managing computer resources, networks offer several advantages in helping to achieve the goal of the best possible service for the least cost. Among these advantages are: 1. 
Greater ease and precision in identifying aggregated computing workload requirements, by providing a larger and more stable base from which to make workload projections. 2. Ability to add capacity to the network as a whole, rather than at each individual installation by developing specifications for new main frames based on total network requirements, with less regard for specific geographic location. 3. Computing power can be added in· increments which more closely match requirements. Experience at a number of installations indicates that it is extremely difficult to project computer use on a project by project basis with sufficient accuracy to use the aggregated data as a basis for installation or augmentation of computer facilities. Project estimations vary widely, particularly in scientific and engineering areas. The need for computer support is strongly driven by the week to week exigencies of the project. Because of this variability, larger computing centers can often project their future requirements better from past history and current use trends, than by adding up the requirements of each individual project. In a computing network these trends can be more easily identified, and, since the network as a whole serves a larger customer base than any single installation, the projections can be made more accurately. Simply stated, the law of large numbers applies to the aggregate. The second and third management advantages listed above are interrelated but not really the same. In a network, adding capacity to any node makes that capacity available to everyone else in the network. It is important to recognize that this applies to specialized kinds of capacity as well as to general purpose computer cycles. Thus when specifications for new hardware are developed, they can include requirements derived from the total network. Finally computer capacity tends to come in fixed size pieces. In the case of computers which can service relatively large and relatively long running computer programs, the pieces are not only large, they are very expensive. When these have to be provided at each installation requiring computer services, there is frequently expensive unused capacity when the equipment is first installed. In a network, added computing power can be more easily matched to overall requirements because the network capacity increments are distributed over a larger base. Wholesale Retail Concept for Computer Network l\1anagement WHOLESALE VS RETAIL FUNCTIONS Now let's examine the services obtained from a computing network. At most large computing centers, personnel, financial, and facilities resources are devoted to a combination of functions which include acquisition and operation of computing hardware, installation and maintenance of operating systems, language processors, and other general purpose "systems" software, and design and development of applications programs. These functions are integrated by the computing center manager to try to provide the best overall service to his customers. The Director of Computing Services at a location with its own computer center, provides an organizational interface with his local customers which may include the Director of Laboratories, the Director of Research and Engineering, Director of Product Assurance, and other similar functions which require scientific and engineering computer support. But what structure is required if there is no computing center at a particular location? 
How does the use of computer network services, instead of organizationally local hardware, affect the computer supported activities? Conversely, in a computer network environment, what is the effect of having no customers at the actual local site of the computing center? What functional structure is required at such a lonely center and what services should it offer? In considering the answer to these and other questions involved in the establishment of a computer network it is useful to distinguish wholesale from retail computing services. At its most fundamental level the wholesale computing function might be defined as the production of usable computer cycles. In order to achieve this it is necessary to have not only computer hardware, but also the operating systems software, language processors, etc., which are needed to make the hardware cycles accessible and usable. The wholesaler produces his services in bulk. The production of wholesale computer cycles may be likened to the production of coal, oil, or natural gas. Each of these products can be used in support of a wide variety of applications from the production of electricity to heating homes to broiling steaks on the back yard grill. The specific application is not the primary concern of the wholesaler. His concern is to produce bulk quantities of his product at the lowest possible cost. The Wholesale Computing Facility (WCF) like the oil producer, has to offer a welldefined, stable product, in a sufficient number of grades (classes of service) to satisfy his end users. To achieve this he also must have a marketing function which interacts with his retailers in order to maximize the 891 TABLE I-Resources and Services Offered by A Typical Wholesale Computing Facility (WCF) ; RESOURCES Computers System Software General Purpose Application Software Systems Programmers Operators Communications Equipment SERVICES Batch processing access Interactive terminal access Real time access Data File storage Data Base Management Contract programming Consulting Services Systems Software Hardware Interfaces Communications Documentation & Manuals Marketing/Marketing Support value of the products he offers. The marketing function includes technical representatives in the form of software and hardware consultants which can explain to the retailer how to derive the maximum value from the services offered and how to solve technical problems which arise. Table I is a non-exhaustive list of the resources needed and services offered by a typical WCF. Unlike the Wholesale Computing Facility which strives for efficient and effective production of general purpose computing power, the Retail Computing Facility (RCF) has the function of efficiently and effectively delivering service directly to the user. The user's concern is with mission accomplishment. He has a project to complete, and the computer provides an analytical tool. He is not directly concerned with efficiency of computer operation; he is concerned with maximizing the value of computer services to his project. In this respect fast turn-around time and specialized applications programs which ease his burden of communicating with the computer may be more important than obtaining the largest number of computer cycles per dollar. The retailer's function is to. provide an interface between the WCF and the user. His primary concern is to cater to the special needs, the taste and style of his customers. 
He must provide a wide variety of services which tailor the available computing power to each specialized need. To do this it is vital that the retailer understand and relate to his user's needs and capabilities. For the sophisticated user he may have to provide interactive terminal access and a variety of high level languages with which the user can develop his own specialized applications programs. For others he must offer analyst and programmer services to develop computer applications to the customer's specifications. His primary orientation must be toward supporting his user's missions.

TABLE II-Resources Needed By and Services Offered By a Typical Retail Computing Facility (RCF)

RESOURCES
Wholesale/Retail Agreements
Access to computers (terminals)
Personnel
General Purpose Applications Programs
Marketing Support

SERVICES
Usable Computer Time
Special Purpose Applications Programming
General Purpose Applications Programming
Software Consultant Services
Applications and Debugging Consultation
User Training
Administrative Services
  Arrangement for Terminals
  Users Guides, Manuals, Key Punching, Password Assignments
Marketing

The Retail Computing Facility also represents its users to the Wholesale Computing Facilities. In doing so, it helps the wholesaler to determine the kind of products which must be offered. The retailer may need to buy batch processing, interactive time sharing, and computer graphics services. He may need access to several different brands of computers, in order to process applications programs which his users have developed or acquired from others. He acquires commitments for these services from wholesalers through wholesale/retail agreements. Table II indicates resources needed and services provided by RCFs.

UTILITY OF THE WHOLESALE RETAIL DISTINCTION

The notion of separate wholesale and retail computing facilities is useful for several reasons, particularly when a large company or government agency is attempting to integrate independent decentralized computing centers into a network. In the pre-network environment both the wholesale and retail facilities tend to be contained in the same organization and have responsibility for servicing only that organization. In a network environment it is important to identify the Wholesale Computing Facility in order to understand that it will be serving other organizations as well, and therefore must take a non-parochial point of view. The importance of this viewpoint is indicated by the fact that wholesale/retail agreements are regarded by the retailer as a resource. For the retailer to depend on them, the agreements must be binding, and the retailer must be assured that he will receive the same treatment when he is accessing a computer remotely through the network as he would if he were geographically and organizationally a part of the wholesaler's installation. After all, to achieve the benefits of sharing computer resources which a network offers, it is necessary to tell some organizations that they cannot have their own computers. Thus it is clear that binding agreements, as surrogates for local computer centers, are fundamental to successful network implementation. Another reason to distinguish between wholesale and retail facilities is to make it clear that you cannot serve users merely by placing bare terminals where they can be reached. Examination of the retail functions indicates that they include a large number of the user oriented services offered by existing computing centers. It is important to recognize that the decision to use only terminal access to the network at some locations does not result in saving all the resources that would be required to set up an independent computing center at those locations. Quite the contrary, if computing services are needed, it is a management obligation to provide the required resources for a successful Retail Computing Facility. A third reason for identifying the two functions is that in discussing organization and funding, lines of responsibility and control are clearer to portray. This third reason implies that the wholesale/retail distinction is useful in understanding and planning for network organization, whether or not the distinction becomes visible in the implemented organization as separate segments. The wholesale and retail portions of a "typical" computing center are indicated in Figure 2.

Figure 2-Wholesale and retail computing facilities identified within a "typical" computing center (under a Director of Computing Services, administrative services, computer operations, and systems software make up the WCF; scientific applications development makes up the RCF)

APPLICATION OF THE WHOLESALE/RETAIL MANAGEMENT CONCEPT

The concepts discussed to this point were developed in a search for answers to some very real problems currently facing the authors' organization. We want to make it clear that these theories and ideas are not official policy of our organization; rather they are possible solutions to some of these problems. In discussing the approach described above with colleagues throughout our organization, we discovered that it is useful to consider possible applications of these ideas in concrete rather than abstract terms. Our colleagues needed to know where they fit into the plan in order to understand it. Furthermore, mapping a general plan onto the structure of a specific organization is a prerequisite to acceptance.

THE AUTHORS' ORGANIZATION

The authors are in scientific and engineering data processing management positions with the US Army Materiel Command (AMC), a major command of the US Army employing approximately 130,000 civilians and 13,000 military at the time this paper was written. AMC has the mission of carrying out research, development, testing, procurement, supply and maintenance of the hardware in the Army's inventory. The scope of this mission is staggering. Some of the major organizational elements comprising the Army Materiel Command include "Commodity Commands" with responsibility for research, development, procurement and supply for specific groups of commodities (hardware), depots for maintenance and supply, and independent laboratories for exploratory research. Because of the nature of its mission, AMC might be likened to a large corporation with many divisions. For example, one of the "Commodity Commands"-Tank Automotive Command in Detroit, Michigan-carries out work similar to that carried out by a major automobile manufacturer in the United States.
Another "Commodity Command"-Electronics Comm,andcarries out work similar to that carried out by a major electronics corporation. In a sense each of these "Commodity Commands" operates as a small corporation within the larger parent corporation. Each Commodity Command has laboratory facilities for carrying out research in its areas of commodity responsibility. In addition, independent laboratories carry out basic and 893 exploratory research. It may be helpful in the discussion which follows to draw a comparison between industrial situations and the Army Materiel Command. The Commanding General of AM C occupies a position similar to that of the President of a large diversified corporation. The Commanding Generals of each of the Commodity Commands and independent laboratories might be compared to Senior Group Vice Presidents in this large corporation, while the Commanding Officers of various research activities within Commodity Commands carry out functions similar to those carried out by Vice Presidents responsible for particular mission areas within a corporation. As in many large corporations, AM C has a number of different types of computers in geographically dispersed locations to provide computer support under many different Commanding Officers. Locations having major computing resources which are candidates for sharing, and locations requiring scientific and engineering computer support are shown in Figure 3. The resources which are candidates for sharing include 8 IBM 360 series computers (1 model 30, 1 model 40, 2 model 44s, 1 model 50, 3 model 65s), three Control Data Corporation 6000 series computers (1 CDC 6500, and 2 CDC 6600s), seven Univac 1100 series computers (6 Univac 1108s, 1 Univac 1106), one Burroughs 5500 computer, two EMR 6135 computers, and two additional major computers not yet selected. These 21 computers are located and operated at 17 different locations among those shown in Figure 3. It is not clear that everyone of the locations requiring ALASKA-\ Figure 3-AMC locations which have scientific computers or require scientific computing services 894 Fall Joint Computer Conference, 1972 services should ultimately receive them through a computing network. The main purpose of this illustra, tion is to show the magnitude of the problem. In exploring the existing situation it came as somewhat of a surprise to discover that we already have most of the management problems of computer networks, despite the fact that not all of the seventeen computer sites and thirty-one users sites are interconnected. Computer support agreements now exist between many different activities within Army Materiel Command, although not all of these provide for service via terminals. DECENTRALIZED MANAGEMENT OF THE NETWORK NODES The wholesale/retail organizing rationale discussed previously was developed as a vehicle for better understanding our present management structure, and as an aid in identifying a viable structure for pooling computer resources across major organizational boundaries. A number of proposals to centralize operational management of all of these computers were considered and discarded. The computing centers which would form the network exist today, and most have been operational for a number of years. They are well managed and running smoothly and we would like to keep it that way. Furthermore, the association of these centers with the activities which they serve has been mutually beneficial. 
("Activity" is used here to refer to an organizational entity having a defined mission and distinct geographic location.) The centers receive resource support from the, activities and in turn provide for the specialized needs of the research and development functions which they serve. Sharing of these specialized technologies and services is a desirable objective of forming the network. For these reasons, the authors believe decentralized computer management would be necessary for a successful network. To make our commitment to decentralized computer management viable, we needed to face squarely the issue that each existing computer is used and controlled by a local Commanding Officer to accomplish the assigned research and development mission of his activity. But network pooling of computer resources implies that some activities use the network in lieu of installing their own computer. For this approach to succeed, availability of time in the computer pool has to be guaranteed to approximately the same degree as would derive from local hardware. The offered guarantee in a network environment would be an agreement between the activity with the computer and the activity requiring computer support. To make the network suc- ceed, corporate (or AMC) headquarters would have to set policies insuring that agreements have sufficient force to guarantee the using organization the resources specified. In the following paragraphs we will discuss how these agreements might be used in the Army Materiel Command type of environment. If we replace the words "Commanding Officer" with the words "Vice President" it seems clear that the same concepts apply to industry as well as to the military situation. If agreements are to become sufficiently binding so that they can be considered a resource it would be necessary to expand the basic mission of the Commanding Officer who "owns" the' computer. The only way to make the computer into a command-wide (or corporate) resource would be to assign the Commanding Officer and his Director of Computing Services the additional mission of providing computer support to all organizations authorized to make agreements with him, and to identify the resources under his control which would be given the task of providing computer services to "outside" users. These resources would now become a Wholesale Computing Facility serving both local and outside organizations. FUNCTIONS OF THE RETAILER In a large corporation with many divisions each division would require a "retailer" of computer services to perform the applications oriented data processing. Those divisions which operate computers would operate them as wholesale functions to provide computer service to all divisions within the corporation. Substituting the words "Commodity Command, Major Subordinate Command, or Laboratory" for the word "Division" the same principle could apply to the Army Materiel Command. Although a local commander would give up some cOIitrol over "his" computer, in that he would guarantee s6me capacity to ou'tside users, he would gain access to capacity on every other computer within the command, to support him in accomplishing his primary mission. Retail Computing Facility (RCF) describes that part of the organization responsible for assuring that computer services are available to the customers and users to accomplish the primary mission of the local activity. Every scientist and engineer within an activity, e.g., laboratory, would look to his local RCF to provide the type of service required. 
The RCF would turn to wholesalers throughout the entire corporation. This would give the retailer the flexibility to fit to the job an available computer rather than having to force fit the job on Wholesale Retail Concept for Computer Network IVranagement WHOLESALE COMPUTING FACILITY 1 895 for submitting a job to a local computer at a user's home installation. WHOLESALE COMPUTING FACILITY 2 FUNCTIONS OF THE WHOLESALER f~\ 1 iiitl\\\ 2 3 n-1 CUSTOMERS/USERS n Figure 4-RCF uses wholesale service agreements with several WCFs to provide retail services to customers to the local computer. These relationships are shown in Figure 4. In order to obtain the resources and provide the services listed in Table II a considerable amount of homework would have to be done by the retailer. The retailer would estimate the types and amounts of services required by his various users and arrange agreements with wholesalers to obtain these services. It must be recognized that this is a difficult job and in many instances cannot be done with great accuracy. The retailer would act as a middleman between customer/users and Wholesale Computer Facilities within the network. The retailer would be responsible for negotiating two different types of agreements. He would have to negotiate long term commitments with various wholesalers by guaranteeing to these wholesalers a certain minimum dollar amount; in return the wholesalers would guarantee to the retailers a certain minimum amount of computer time. The other type of agreement which retailers could negotiate with wholesalers would be for time as required. This would take the form of a commitment to spend dollars at a particular wholesale facility when the demand occurred and if time were available from that wholesaler. The retailer could then run jobs at that WCF on a "first come, first served" basis, or according to whatever queue discipline was agreed upon in advance. The range of agreements would be from "hard scheduled computer runs" to "time as available." I t is imperative that a user not have to go through lengthy negotiation each time he requests computer service from a retailer. SUbmitting a job through the local retailer to any computer in the corporate network should be at least as simple as the current procedures Figure 5 graphically depictes the Wholesale Computer Facility relationship to retailers. The WCF at installation m would provide resources and services listed in Table 1 through the network to retailers at various activities throughout the corporation. Normally, the WCF at installation m would still provide most of the service to the retailer at installation m; however, that retailer would not have any formally privileged position over other retailers located elsewhere in the network. The primary functions of the wholesaler would be to operate the computers and to provide the associated services which have been negotiated by various retailers. The wholesaler might well have services which were duplicated elsewhere in the network; however, he might also have some which were unique to his facility. It would be essential that every retailer in the network be made aware of the services offered by each WCF, and it would be the responsibility of the wholesaler to ensure that all of his capabilities were made known. In addition to operating the computer or computers at his home installation, the WCS might also be responsible for providing services to retailers through contracts placed with facilities external to the corporation. 
For example, a propriety software package not available from any computer in the corporate pool, but required by one or more retailers, might be available from some other computer which could be accessed by the corporate net. A contract to access those services could then be placed through one of the wholesalers. HEADQUARTERS FUNCTIONS IN MANAGING THE CORPORATE NETWORK Corporate Headquarters interaction with decentralized wholesale and retail computing facilities can be •. WHOLiLE/RETAi\AGR"E~~NTS RETAIL COMPUTING FACILITY 1 RETAIL COMPUTING FACILITY 2 RETAIL COMPUTING FACILITY n Figure 5-WCFs provide service to multiple RCFs 896 Fall Joint Computer Conference, 1972 CORPORATE HEADQUARTERS CORPORATE DIRECTOR OF COMPUTING SERVICES COMPUTER NETWORK STEERING COMMITTEE COMPUTER NETWORK MANAGEMENT Figure 6-Corporate headquarters organization with decentralized WCFs and RCFs provided through the establishment of two groups, Computer Network Management (CNM), and the Computer Network Steering Committee (CNSC). Both CNM and CNSC should report to the Corporate Director of Computing Services as shown in Figure 6. Overall, the headquarters is responsible for insuring that computer support requirements of scientists and engineers throughout the corporation are effectively met, and that they are provided in an efficient manner. The first responsibility is to insure that proper computer support is available. The second responsibility is to insure that the minimum amount of dollars are expended in providing that support. Computer Network Management (CNM) has basic headquarters staff responsibility to insure that the network is well coordinated and well run. It should: 1. Recommend policy and procedures for regulation and operation of the network. 2. Resolve network problems not covered by corporate procedures. 3. Negotiate facilities management agreements with the appropriate corporation divisions to operate Wholesale Computing Facilities (see Figure 6). 4. Work with WCFs, RCFs and the Computer Network Steering Committee to develop long range plans concerning network facilities. 5. Serve as a network-wide information center on facilities, services, rates, and procedures. CNM need not be directly involved in the day to day operations of the network. Wholesale/retail agreements should be negotiated betweenWCFs and RCFs without requiring headquarters approval, so long as these agreements are consistent with overall corporate policy. Obviously, agreements not meeting this requirement would· require CNM involvement. However, headquarters should function, insofar as possible, on a management by exception basis. A Computer Network Steering Committee (CNSC) should be established to suggest policy for consideration by the corporation. Members of the CNSC should be drawn from the corporation's operating divisions which have responsibility for decentralized management of the wholesale and retail computing facilities. The Computer Network Steering Committee can promote input to the Corporate Headquarters {)f useful comments and ideas on network policy and operation. Under the general structure some specific functions in whi~h Computer Network Management would be involved can be discussed further. Policies set by Computer Network Management should govern the content of agreements between wholesalers and retailers. The following is a list of some of the items which would have to be covered in such agreements: 1. The length of time for which an agreement should run would have to be spelled out in each case. 2. 
2. The wholesaler would have to guarantee a specific amount of service to the retailer in return for a guarantee of a minimum number of dollars from the retailer.
3. The kinds and levels of service to be provided would have to be spelled out in detail.

Another major area in which Computer Network Management could be involved is in setting rates and in rationing services during periods of congestion. Policies should be established which would promote as effective and efficient support as possible during congested periods, without starving any single customer. Also, the total amount of computer time which each wholesaler can commit should be regulated to prevent overcommitment of the network. Computer Network Management should set up some form of "currency" to be used when resources become congested. The amount of "currency" in the network would be regulated by Computer Network Management with advice from the Computer Network Steering Committee. This "currency" based rationing scheme should be put into effect ahead of time, rather than waiting until resources become so congested that it has to be created under emergency conditions. It is probable that separate rations should be established for different classes of service, such as interactive terminal service, fast turn around batch processing, overnight turn around, etc.

Under this organizational concept the corporate headquarters would have to assume a greater responsibility for projecting requirements and procuring new hardware and software to meet those requirements throughout the corporation. Some requirements would arise which would have to be met immediately. There would not always be sufficient time to purchase new hardware or software. In such cases Computer Network Management could arrange for external service contracts to be let through one or more wholesalers. CNM would have the responsibility for identifying peak workloads anticipated for the entire network on the basis of feedback information received from wholesalers and retailers. When overall network services become congested, an open ended external service contract might be placed to handle the excess. This provides time for a corporate decision to be made as to whether or not additional computing capacity should be added to the network.

The last major responsibility of Computer Network Management would be to aggregate requirements being received from wholesalers and retailers and to use these to project when new hardware and software should be procured for the S&E network community. The primary responsibility for justifying this new hardware would rest with CNM, drawing on all corporate resources for support and coordination. Computer Network Management, with guidance from the Computer Network Steering Committee, would also be responsible for determining where new hardware should be placed in order to run the network in the most effective and efficient manner.

Computer Network Management would fulfill its mission of insuring computer support to corporate scientists and engineers by negotiating facility management agreements with specific divisions of the corporation to establish and operate Wholesale Computer Facilities. These WCFs would offer the specified kinds and levels of service to Retail Computer Facilities via the network. RCFs would tailor and add to the services to meet requirements of local customers.
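One way to picture the "currency" scheme is as a set of prepaid rations, one per class of service, which are debited as jobs are accepted during a congested period. The sketch below is purely illustrative (present-day Python, with invented names); it is not part of the proposed management system, only a restatement of the rule that each class has a fixed ration granted in advance.

# Hypothetical sketch of "currency"-based rationing by class of service.
class RationLedger:
    def __init__(self, rations):
        # rations: dict mapping service class -> currency units granted in advance
        self.remaining = dict(rations)

    def request(self, service_class, units):
        """Debit the ration for one class; refuse the job if the ration is exhausted."""
        left = self.remaining.get(service_class, 0)
        if units > left:
            return False          # job must wait or be routed to another wholesaler
        self.remaining[service_class] = left - units
        return True

# Example: separate rations for the classes of service mentioned above.
ledger = RationLedger({"interactive": 100, "fast_batch": 250, "overnight": 600})
assert ledger.request("fast_batch", 40)        # accepted against the fast-batch ration
assert not ledger.request("interactive", 500)  # exceeds the interactive ration

The essential point is only that the rations are fixed ahead of time by Computer Network Management rather than improvised under emergency conditions.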
SUMMARY

The notions of wholesale and retail computer facilities are particularly useful in examining the problems which must be faced when entering a computer network environment. The concept helps to clarify the functions which must be performed within a network of shared computer resources, and the management commitment which must be made if the objectives and advantages of such sharing are to be realized. Mapping of the wholesale/retail functions onto the corporate organization which is forming the network can be valuable in identifying to members of that organization what their roles would be in the network environment. Such clarification is a prerequisite to securing the commitment necessary to make a network successful.

Decisions as to whether or not operational management of the computer centers should be decentralized will vary with circumstances, but if efficient, well managed decentralized computing facilities exist, they should be retained. In any case, a central computer network management function is needed to set policy and to take an overall corporate viewpoint. It should be remembered, however, that the primary purpose of a scientific and engineering computing network is to provide services to research and development projects at field activities. As such, the goal should be to contribute to the optimization of the costs and time involved in the research and development cycle, rather than to optimize the production of computer cycles. The establishment of a network steering committee which includes representatives from field activities can help to insure the achievement of this goal and to increase confidence in the network among the field personnel which it is to serve.

Finally, it is important to realize that a corporation begins to enter the network environment, from the management standpoint, as soon as some of its major activities begin to share computer resources, whether or not it involves any computer to computer communications facilities. Recognition of this point and a careful examination of corporate objectives and goals in computer sharing should lead to the establishment of a computer network management function, so that the corporation can manage itself into an orderly network environment rather than drifting into a chaotic one.

ACKNOWLEDGMENT

The authors would like to gratefully acknowledge extensive discussions and interaction with a group of people whose ideas and hard work contributed substantially to the content of this paper: Mr. Einar Stefferud, Einar Stefferud & Associates; and the following members of organizations within the US Army Materiel Command: Richard Butler, Harry Diamond Labs; John Cianfione, US Army Materiel Command Headquarters; James Collins, Missile Command; Tom Dames, Electronics Command; Edward Goldstein, Test and Evaluation Command; Dr. James Hurt, Weapons Command; Paul Lascala, Aviation Systems Command; Sam P. McCutchen, Mobility Equipment R&D Center; James Pascale, Watervliet Arsenal; Michael Romanelli, Aberdeen Research & Development Center; George Sumrall, Electronics Command.
REFERENCES
1 E STEFFERUD A wholesale/retail structure for the AMC computer network Unpublished Discussion Paper Number ES&A/AMC/CNC DP-1 February 3 1972
2 J J PETERSON S A VEIT Survey of computer networks Mitre Corporation MTP-357 September 1971
3 F P BROOKS J K FERRELL T M GALLIE Organizational, financial, and political aspects of a three university computing center Proceedings of the IFIP Congress 1968 E49-52
4 M S DAVIS Economics-point of view of designer and operator Proceedings of Interdisciplinary Conference on Multiple Access Computer Networks University of Texas and Mitre Corporation 1970
5 J J HOOTMAN The computer network as a marketplace Datamation Vol 18 No 4 April 1972
6 C MOSMANN E STEFFERUD Campus computing management Datamation Vol 17 No 5 March 1971
7 E STEFFERUD Computer management College and University Business September 1970
8 L G ROBERTS B D WESSLER Computer network development to achieve resource sharing AFIPS Conference Proceedings May 1970
9 F E HEART et al The interface message processor for the ARPA computer network AFIPS Conference Proceedings May 1970
10 C S CARR S D CROCKER V G CERF HOST-HOST communication protocol in the ARPA network AFIPS Conference Proceedings May 1970
11 E STEFFERUD Management's role in networking Datamation Vol 18 No 4 April 1972
12 E STEFFERUD The environment of computer operating system scheduling: Toward an understanding Journal of the Association for Education Data Systems March 1968
13 BLUE RIBBON DEFENSE PANEL Report to the President and the Secretary of Defense on the Department of Defense Appendix I: Staff report on automatic data processing July 1970
14 S D CROCKER et al Function-oriented protocols for the ARPA computer network AFIPS Conference Proceedings May 1970

A functioning computer network for higher education in North Carolina

by LELAND H. WILLIAMS
Triangle Universities Computation Center
Research Triangle Park, North Carolina

INTRODUCTION

Currently there is a great deal of talk concerning computer networks. There is so much such talk that the solid achievements in the area sometimes tend to be overlooked. It should be clearly understood then, that this paper deals primarily with achievements. Only the last section, which is clearly labeled, deals with plans for the future.

Adopting terminology from Peterson and Veit,1 TUCC is essentially a centralized, homogeneous network comprising a central service node (IBM 370/165), three primary job source nodes (IBM 360/75, IBM 360/40, IBM 360/40) and 23 secondary job source nodes (leased line Data 100s, UCC 1200s, IBM 1130s, IBM 2780s, and leased and dial line IBM 2770s) and about 125 tertiary job source nodes (64 dial or leased lines for Teletype 33 ASRs, IBM 1050s, IBM 2741s, UCC 1035s, etc.). See Figures 1 and 2. All source node computers in the network are homogeneous with the central service node and, although they provide local computational service in addition to teleprocessing service, none currently provides (non-local) network computational service. However, the technology for providing network computational service at the primary source nodes is immediately available and some cautious plans for using this technology are indicated in the last section of this paper.

Figure 1-The TUCC network (primary terminals: Duke/Durham 360/40, NCSU/Raleigh 360/40, UNC/Chapel Hill 360/75; 50 educational institutions-universities, colleges, community colleges, technical institutions, and secondary schools-with various medium and low speed terminals. In addition to the primary terminal installations at Duke, UNC, and NCSU, each campus has an array of medium and low-speed terminals directly connected to TUCC.)

Figure 2-Network of institutions served by TUCC/NCECS (53 network institutions in all, at locations including Asheville, Charlotte, Durham, Elizabeth City, Greensboro, Raleigh, Wilmington, and Winston-Salem)

BACKGROUND

The Triangle Universities Computation Center was established in 1965 as a non-profit corporation by three major universities in North Carolina-Duke University at Durham, The University of North Carolina at Chapel Hill, and North Carolina State University at Raleigh. Duke is a privately endowed institution and the other two are state supported. Among them are two medical schools, two engineering schools, 30,000 undergraduate students, 10,000 graduate students, and 3,300 teaching faculty members. The primary motivation was economic-to give each of the institutions access to more computing power at lower cost than they could provide individually. Initial grants were received from NSF and from the North Carolina Board of Science and Technology, in whose Research Triangle Park building TUCC was located. This location represents an important decision, both because of its geographic and political neutrality with respect to all three campuses and because of the value of the Research Triangle Park environment.

The Research Triangle Park is one of the nation's most successful research parks. In a wooded tract of 5,200 acres located in the small geographic triangle formed by the three universities, the Park in 1972 has 8,500 employees, a payroll of $100 million and an investment in buildings of $140 million. The Park contains 40 buildings that house the research and development facilities of 19 separate national and international corporations and government agencies and other institutions.

TUCC pioneered massively shared computing; hence there were many technological, political, and protocol problems to overcome. Successive stages toward solution of these problems have been reported by Brooks, Ferrell, and Gallie;2 by Freeman and Pearson;3 and by Davis.4 This paper will focus on present success.

PRESENT STATUS

TUCC supports educational, research, and (to a lesser, but growing extent) administrative computing requirements at the three universities, and also at 50 smaller institutions in the state and several research laboratories by means of multi-speed communications and computer terminal facilities. TUCC operates a 2-megabyte, telecommunications-oriented IBM 370/165 using OS/360-MVT/HASP and supporting a wide variety of terminals (see Figure 3). For high speed communications, there is a 360/75 at Chapel Hill and there are 360/40s at North Carolina State and Duke. The three campus computer centers are truly and completely autonomous. They view TUCC simply as a pipeline through which they get massive additional computing power to service their users. The present budget of the center is about $1.5 million.

Figure 3-TUCC hardware configuration (3165 CPU with 2 million bytes of storage; 2314 and 3330 disk facilities; 2803 tape control; 2540 card read punch; 2701 data adapters to the campus machines at 40.8K baud; 64 ports for low-speed typewriter terminals at 110 baud; 20 medium-speed lines at 2400 baud and 8 at 4800 baud)

The Model 165 became operational on September 1, 1971, replacing a saturated 360/75 which was running a peak load of 4200 jobs/day. The life of the Model 75 could have been extended somewhat by the replacement of 2 megabytes of IBM slow core with an equal amount of Ampex slow core. This would have increased the throughput by about 25 percent for a net cost increase of about 8 percent. TUCC's minimum version of the Model 165 costs only about 8 percent more than the Model 75 and it is expected to do twice as much computing. So far it has processed 6100 jobs/day without saturation. This included about 3100 autobatch jobs, 2550 other batch jobs, and 450 interactive sessions. Of the autobatch jobs, 94 percent were processed with less than 30 minutes delay (probably 90 percent with less than 15 minutes delay), and 100 percent with less than 3 hours delay. Of all jobs, 77 percent were processed with less than 30 minutes delay, and 99 percent with less than 5 hours delay. At the present time about 8000 different individual users are being served directly. The growth of TUCC capability and user needs to this point is illustrated in Figure 4.

Figure 4-TUCC jobs per month, 1967-1972 (total jobs per month run at TUCC, in thousands)

Services to the TUCC user community include both remote job entry (RJE) and interactive processing. Included in the interactive services are programming systems employing the BASIC, PL/1, and APL languages. Also TSO is running in experimental mode. Available through RJE is a large array of compilers including FORTRAN IV, PL/1, COBOL, ALGOL, PL/C, WATFIV and WATBOL. These language facilities coupled with an extensive library of application programs provide the TUCC user community with a dynamic information processing system supporting a wide variety of academic computing activities.

ADVANTAGES

The financial advantage deserves further comment. As a part of the planning process leading to installation of the Model 165, one of the universities concluded that it would cost them about $19,000 per month more in hardware and personnel costs to provide all their computing services on campus than it would cost to continue participation in TUCC. This would represent a 40 percent increase over their present expense for terminal machine, communications, and their share of TUCC expense.

There are other significant advantages. First, there is the sharing of a wide variety of application programs. Once a program is developed at one institution, it can be used anywhere in the network with no difficulty. For proprietary programs, usually only one fee need be paid. A sophisticated TUCC documentation system sustains this activity. Second, there has been a significant impact on the ability of the universities to attract faculty members who need large scale computing for their research and teaching, and several TUCC staff members including the author have adjunct appointments with the university computer science departments. A third advantage has been the ability to provide very highly competent systems programmers (and management) for the center.
In general, these personnel could not have been attracted to work in the environment of the individual institutions because of salary requirements and because of system sophistication considerations. NORTH CAROLINA EDUCATIONAL COMPUTING SERVICE The North Carolina Board of Higher Education has established an organization known as the North Carolina Educational Computing Service (NCECS). This is the successor of the North Carolina Computer Orientation Project 5 which began in 1966. NCECS participates in TUCC and provides computer services to public and private educational institutions in North Carolina other than the three founding universities. Presently 40 public and private universities, junior colleges, and technical institutes plus one high school system are served in this way. NCECS is located with TUCC in the North Carolina Board of Science and Technology building in the Research Triangle Park. This, of course, facilitates communication between TUCC and NCECS whose statewide users depend upon the TUCC telecommunication system. NCECS serves as a statewide campus computation center for their users, providing technical assistance, information services, etc. In addition, grant support from NSF has made possible a number of curriculum development activities. NCECS publishes a catalog of available instructional materials; they provide curricu- lum development services; they offer workshops to promote effective computer use; they visit campuses, stimulating faculty to introduce computing into courses in a variety of disciplines. Many of these programs have stimulated interest in computing from institutions and departments where there was no interest at all. One major university chemistry department, for example, ordered its first terminal in order to use an NCECS infrared spectral information program in its courses. The software for NCECS systems is derived from a number of sources in addition to sharing in the community wide program development described above. Some of it is developed by NCECS staff to meet a specific and known need; some is developed by individual institutions and contributed to the common cause; some of it is found elsewhere, and adapted to the system. NCECS is interested in sharing curriculum oriented software in as broad a way as possible. Serving small schools in this way is both a proper service for TUCC to perform and is also to its own political advantage. The state-supported founding universities, UNC and NCSU, can show the legislature how they are serving much broader educational goals with their computing dollars. ORGANIZATION TUCC is successful not only because of its technical capabilities, but also because of the careful attention given to administrative protection of the interests of the three founding universities and of the N CECS schools; The mechanism for this protection can, perhaps, best be seen in terms of the wholesaler-retailer concept. 6 TUCC is a wholesaler of computing service; this service consists essentially of computing cycles,an effective operating system, programming languages, some application packages, a documentation service, and management. The TUCC wholesale service specifically does not include typical user services-debugging, contract programming, etc. Nor does it include user level billing nor curriculum development. Rather these services are provided for their constituents by the Campus Computation Centers and NCECS, which are the retailers for the TUCC network. See Figure 5. 
The wholesaler-retailer concept can also be seen in the financial and service relationships. Each biennium, the founding universities negotiate with each other and with TUCC to establish a minimum financial commitment from each to the net budgeted TUCC costs. Then, on an annual basis the founding universities and TUCC negotiate to establish the TUCC machine configuration, each university's computing resource share, and the cost to each university. This negotiation, of course, includes adoption of an operating budget.

Figure 5-TUCC wholesaler-retailer structure

Computing resource shares are stated as percentages of the total resource each day. These have always been equal for the three founding universities, but this is not necessary. Presently each founding university is allocated 25 percent, the remaining 25 percent being available for NCECS, TUCC systems development, and other users. This resource allocation is administered by a scheduling algorithm which insures that each group of users has access to its daily share of TUCC computing resources. The algorithm provides an effective trade-off for each category between computing time and turn-around time; that is, at any given time the group with the least use that day will have job selection preference. The scheduling algorithm also allows each founding university and NCECS to define and administer quite flexible, independent priority schemes. Thus the algorithm effectively defines independent sub-machines for the retailers, providing them with the same kind of assurance that they can take care of their users' needs as would be the case with totally independent facilities. In addition, the founding university retailers have a bonus because the algorithm defaults unused resources from other categories, including themselves, to one or more of them according to demand. This is particularly advantageous when their peak demands do not coincide. This flexibility of resource use is a major advantage which accrues to the retailers in a network like TUCC.

The recent installation of the old TUCC Model 75 at UNC deserves some comment at this point because it represents a good example of the TUCC organization in action. UNC has renewed a biennial agreement with its partners calling essentially for continued equal sharing in the use of and payment for TUCC computing resources. Such equality is possible in our network precisely because each campus is free to supplement as required at home. Furthermore, the UNC Model 75 is a very modest version of the prior TUCC Model 75. It has 256K of fast core and one megabyte of slow core where TUCC had one and two megabytes respectively. Rental accruals and state government purchase plans combined to make the stripped Model 75 cost UNC less than their previous Model 50. It provides only a 20 percent throughput improvement over the displaced Model 50. The UNC Model 75 has become the biggest computer terminal in the world!

There are several structural devices which serve to protect the interests of both the wholesaler and the retailers. At the policy making level this protection is afforded by a Board of Directors appointed by the Chancellors of the three founding universities. Typically each university allocates its representatives to include (1) its business interests, (2) its computer science instructional interests, and (3) its other computer user interests.
The University Computation Center Directors sit with the Board whether or not they are members, as do the Director of NCECS and the President of TUCC. A good example of the policy level function of this Board is their determination, based on TUCC management recommendations, of computing service rates for NCECS and other TUCC users. At the operational level there are two important groups, both normally meeting each month. The Campus Computation Center Directors' meeting includes the indicated people plus the Director of NCECS and the President, the Systems Manager, and the Assistant to the Director of TUCC. The Systems Programmers' meeting includes representatives of the three universities, NCECS and TUCC. In addition, of course, each of the universities has the usual campus computing committees.

PROSPECTS

TUCC continues to provide cost-effective general computing service for its users. Some improvements which can be foreseen include:
1. A wider variety of interactive services to be made available through TSO.
2. An increased service both for instructional and administrative computing for the other institutions of higher education in North Carolina.
3. Additional economies for some of the three founding universities through increasing TUCC support of their administrative data processing requirements.
4. Development of the network into a multiple service node network by means of the symmetric HASP-to-HASP software developed at TUCC.
5. Provision (using HASP) for medium speed terminals to function as message concentrators for low speed terminals, thus minimizing communication costs.
6. Use of line multiplexors to reduce communication costs.
7. Extension of terminal service to a wider variety of data rates.

Administrative data processing

Some further comment can be made on item 3. TUCC has for some time been handling the full range of administrative data processing for two NCECS universities and is beginning to do so for other NCECS schools. The primary reason that this application lags behind instructional applications in the NCECS schools is simply that grant support, which stimulated development of the instructional applications, has been absent for administrative applications. However, the success of the two pioneers has already begun to spread among the others. With the three larger universities there is a greater reluctance to shift their administrative data processing to TUCC, although Duke has already accomplished this for their student record processing. One problem which must be overcome to complete this evolution and allow these universities to spend administrative computing dollars on the more economic TUCC machine is the administrator's reluctance to give up a machine on which he can exercise direct priority pressure. The present thinking is that this will be accomplished by extending the sub-machine concept (job scheduling algorithm) described in the previous section so that each founding university may have both a research-instructional sub-machine and an administrative sub-machine, with unused resources defaulting from either one to the other before defaulting to another category. Of course, the TUCC computing resource will probably have to be increased to accommodate this; the annual negotiation among the founders and TUCC provides a natural way to define any such necessary increase.
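The job-selection rule underlying these sub-machines (preference to the group with the least relative use of its daily share, with unused allocation defaulting to groups that still have work) can be pictured with a minimal sketch. The sketch is in present-day Python rather than the HASP/OS environment actually used at TUCC, and every name in it is invented for illustration.

# Illustrative sketch: pick the next job from the group that has used the
# smallest fraction of its daily share; a group with an empty queue simply
# defaults its unused share to the others by never being selected.
def select_next_job(groups):
    # groups: list of dicts {"share": fraction, "used": fraction, "queue": [jobs]}
    eligible = [g for g in groups if g["queue"]]
    if not eligible:
        return None
    g = min(eligible, key=lambda grp: grp["used"] / grp["share"])
    job = g["queue"].pop(0)
    g["used"] += job["cost"]        # charge the group for the resources consumed
    return job

groups = [
    {"name": "Duke",  "share": 0.25, "used": 0.10, "queue": [{"cost": 0.01}]},
    {"name": "UNC",   "share": 0.25, "used": 0.02, "queue": [{"cost": 0.02}]},
    {"name": "NCSU",  "share": 0.25, "used": 0.20, "queue": []},
    {"name": "NCECS", "share": 0.25, "used": 0.05, "queue": [{"cost": 0.01}]},
]
next_job = select_next_job(groups)   # UNC has the least relative use, so its job runs first

Independent priority schemes within each group, and the research-instructional versus administrative split proposed above, would simply partition each group's queue further before this selection is applied.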
SUMMARY

Successful massively shared computing has been demonstrated by the Triangle Universities Computation Center and its participating educational institutions in North Carolina. Some insight has been given into the economic, technological, and political factors involved in the success as well as some measures of the size of the operation. The TUCC organizational structure has been interpreted in terms of a wholesale-retail analogy. The importance of this structure and the software division of the central machine into essentially separate sub-machines for each retailer cannot be over-emphasized.

REFERENCES
1 J J PETERSON S A VEIT Survey of computer networks MITRE Corporation Report MTP-359 1971
2 F P BROOKS J K FERRELL T M GALLIE Organizational, financial, and political aspects of a three university computing center Proceedings of the IFIP Congress 1968 E49-52
3 D N FREEMAN R R PEARSON Efficiency vs responsiveness in a multiple-service computer facility Proceedings of the 1968 ACM Annual Conference
4 M S DAVIS Economics-point of view of designer and operator Proceedings of Interdisciplinary Conference on Multiple Access Computer Networks University of Texas and MITRE Corporation 1970
5 L T PARKER T M GALLIE F P BROOKS J K FERRELL Introducing computing to smaller colleges and universities-a progress report Comm ACM Vol 12 1969 319-323
6 D L GROBSTEIN R P UHLIG A wholesale retail concept for computer network management AFIPS Conference Proceedings Vol 41 1972 FJCC

Multiple evaluators in an extensible programming system*

by BEN WEGBREIT
Harvard University
Cambridge, Massachusetts

INTRODUCTION

As advanced computer applications become more complex, the need for good programming tools becomes more acute. The most difficult programming projects require the best tools. It is our contention that an effective tool for programming should have the following characteristics:
(1) Be a complete programming system-a language, plus a comfortable environment for the programmer (including an editor, documentation aids, and the like).
(2) Be extensible, in its data, operations, control, and interfaces with the programmer.
(3) Include an interpreter for debugging and several compilers for various levels of compilation-all fully compatible and freely mixable during execution.
(4) Include a program verifier that validates stated input/output relations or finds counter-examples.
(5) Include facilities for program optimization and tuning-aids for program measurement and a subsystem for automatic high-level optimization by means of source program transformation.

We will assume, not defend, the validity of these contentions here. Defenses of these positions by us and others have appeared in the literature.1,2,3,4,5 The purpose of this paper is to discuss how these characteristics are to be simultaneously realized and, in particular, how the evaluators, verifier, and optimizer are to fit together. Compiling an extensible language where compiled code is to be freely mixed with interpreted code presents several novel problems and therefore a few unique opportunities for optimization. Similarly, extensibility and multiple evaluators make program automation by means of source level transformation more complex, yet provide additional handles on the automation process.

This paper is divided into five sections. The second section deals with communication between compiled and interpreted code, i.e., the runtime information structures and interfaces. The third section discusses one critical optimization issue in extensible languages-the compilation of unit operations. The fourth section examines the relation between debugging problems, proving the correctness of programs, and use of program properties in compilation. Finally, the fifth section discusses the use of transformation sets as an adjunct to extension sets for application-oriented optimization.

Before treating the substantive issues, a remark on the implementation of the proposed solutions may be in order. Our acquaintance with these problems has arisen from our experience in the design, implementation, and use of the ECL programming system. ECL is an extensible programming system utilizing multiple evaluators; it has been operational on an experimental basis, running on a DEC PDP-10, since August 1971. Some of the techniques discussed in this paper are functional, others are being implemented, still others are being designed. As the status of various points is continually changing, explicit discussion of their implementation state in ECL will be omitted. For concreteness, however, we will use the ECL system and ECL's base language, EL1, as the foundation for discussion. An appendix treats those aspects of EL1 syntax needed for reading the examples in this paper.

* This work was supported in part by the U.S. Air Force, Electronics Systems Division, under Contract F19628-71-C-0173 and by the Advanced Research Projects Agency under Contract F19628-71-C-0174.

MIXING INTERPRETED AND COMPILED CODE

The immediate problem in a multiple evaluator system is mixing code. A program is a set of procedures which call each other; some are interpreted, others compiled by various compilers which optimize to various levels. Calls and non-local gotos are allowed where either side may be either compiled or interpreted. Additionally, it is useful to allow control flow by means of RETFROM-that is, the forced return from a specified procedure call (designated by name), with a specified value, as if that procedure call had returned normally with the given value (cf. Reference 6). Within each procedure, normal techniques apply. Interpreted code carries the data type of each entity-for autonomous temporary results as well as parameters and locals. Since the set of data types is open-ended and augmentable during execution, data types are implemented as pointers to (or indices in) the data type table. Compiled code can usually dispense with data types so that temporary results need not, in general, carry type information. In either interpreted or compiled procedures, where data types are carried, the type is associated not with the object but rather with a descriptor consisting of a type code and a pointer to the object. This results in significant economies whenever objects are generated in the free storage region.

Significant issues arise in communication between procedures. The interfaces must:
(1) Allow identification of free variables in one procedure with those of a lower access environment and supply data type information where required.
(2) Handle a special, but important, subcase of #1-non-local gotos out of one procedure into a lower access environment.
(3) Check that the arguments passed to a compiled procedure are compatible with the formal parameter types.
(4) Check that the result passed back to a compiled procedure (from a normal return of a called function or via a RETFROM) is compatible with the expected data type.
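The flavor of these interface checks can be shown with a small sketch. The following is illustrative only (present-day Python rather than EL1, with invented names): a binding carries a name, a mode, and a value, and the actual argument modes are checked against the expected parameter modes when a procedure is entered.

from collections import namedtuple

# A binding carries the variable's name and mode (data type) as well as its value,
# so that interpreted and compiled procedures can check each other's assumptions.
Binding = namedtuple("Binding", ["name", "mode", "value"])

class ModeError(Exception):
    pass

def bind_arguments(formals, actuals):
    """formals: list of (name, expected_mode); actuals: list of Binding.
    Checks each actual argument's mode against the expected parameter mode
    (a single comparison per argument) and returns the new parameter block."""
    block = []
    for (name, expected_mode), arg in zip(formals, actuals):
        if arg.mode != expected_mode:
            raise ModeError(f"{name}: expected {expected_mode}, got {arg.mode}")
        block.append(Binding(name, arg.mode, arg.value))
    return block

# Example: passing an INT and a STRING to a procedure expecting (I:INT, S:STRING).
args = [Binding(None, "INT", 7), Binding(None, "STRING", "abc")]
params = bind_arguments([("I", "INT"), ("S", "STRING")], args)

The same comparison applies whichever side of the call is interpreted and whichever is compiled; the economies described in the following paragraphs come from omitting it when it can be settled at compile time.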
These communication issues are somewhat complicated by the need to keep the overhead of procedure interface as low as possible for common cases of two compiled procedures linking in desirable (i.e., well-programmed) ways. The basic technique is to include in the binding (i.e., parameter block) for any new variable its name and its mode (i.e., its data type) in addition to its value. Names are implemented as pointers to (or indices in) the symbol table. (With reasonable restrictions on the number of names and modes, both name and mode can be packed into a 32-bit word.) Within a compiled procedure, all variables are referenced as a pair (block level number, variable number within that block). Translation from name to such a reference pair is carried out for each bound appearance of a variable during compilation; at run time, access is made using a display (cf. Reference 7). However, a free appearance of a variable is represented and identified by symbolic name. Connection between the free variable and some bound variable in an enclosing access environment is made during execution, implemented using either shallow or deep bindings (cf. Reference 8 for an explanation of the issues and a discussion of the trade-offs for LISP). Once identification is made, the mode associated with the bound variable is checked against the expected mode of the free variable, if the expected mode is known. To illustrate the last point, we suppose that in some procedure, P, it is useful to use the free variable BETA with the knowledge that in all correctly functioning programs the relevant bound BETA will always be a character string. To permit partial type checking during compilation, a declaration may be made at the head of the first BEGIN-END block of P. DECL BETA:STRING SHARED BETA; This creates a local variable BETA of mode STRING which shares storage (i.e., binding by reference in FORTRAN or PL/I9) with the free variable BETA. All subsequent appearances of BETA in P are bound, i.e., identified with the local variable named BETA. Since the data type of the local BETA is known, normal compilation can be done for all internal appearances of BETA. The real identity of BETA is fixed during execution by identification with the free BETA of the access environment at the point P is entered. When the identification of bound and free BETA is made, mode checking (e.g., half-word comparison of two type codes) ensures that mode assumptions have not been violated. In the worst case, parameter bindings entail the same sort of type checking. The arguments passed to a procedure come with associated modes. When a procedure is entered, the actual argument modes can be checked against the expected parameter modes and, where appropriate, conversion performed. Then the names of the formal parameters are added to the argument block, forming a complete variable binding. Notice that this works in all four cases of caller/ callee pairs: compiled/ Multiple Evaluators in an Extensible Programming System compiled, compiled/interpreted, interpreted/compiled and interpreted/interpreted. Since type checking is implemented by a simple (usually half-word) comparison, the overhead is small. However, for the most common cases of compiled/ compiled pairs, mode checking is handled by a less flexible but more efficient technique. The mode of the called procedure may be declared in the caller. 
For example: DECL G:PROC(INT,STRING;COMPLEX); specifies that G is a procedure-valued variable which takes two arguments, an integer and a character string, and returns a complex number. For each call on G in the range of this declaration, mode checking and inserti?n of conversion code can be done during compilation, wIth the knowledge that G is constrained to take on only certain procedure values. To guarantee this constraint, all assignments to (or bindings of) G are type checked. Type checking is made relatively inexpensive by giving G the mode PROC(INT,STRING;COMPLEX)-i.e., there is an entry in the data type table for it-and comparing this with the mode of the procedure value being assigned. The single comparison simultaneously verifies the validity of the result mode and both argument modes. Result types are treated similarly. For each procedure call, a uniform call block is constructed* which includes the name of the procedure being called and the expected mode?f the result (e.g., for the above example, the name field IS G and the expected-result-mode field is COMPLEX). This is ignored when compile-time checking of result type is possible and normal return occurs. However, if interpreted code returns to compiled code or if RETFROM causes a return to a procedure by ~ non-direct callee, then the expected-result-mode field is checked against the mode of the value returned. Transfer of control to non-local labels falls out naturally if labels are treated as named entities having constant value. On entry to a BEGIN-END block (in either interpreted or compiled code), a binding is made f~r e.ach label in that block. The label value is a triple (IndIcator of whether the block is interpreted or compiled, program address, stack position). A non-local goto label L is executed by identifying the label value referenced by the free use of L, restoring the stack position from the third component of the triple and either . . ' Jumpmg to the program address in compiled code or to the statement executor of the interpreter. * This can be included in the LINK information. 7 907 UNIT COMPILATION In most. programs the bulk of the execution time is spent performing the unit operations of the problem domain. In some cases (e.g., scalar calculations on reals), the hardware realizes the unit operations directly. Suppose, however, that this is not the case. Optimizing such programs requires recognizing instances of the unit operations and special treatment-unit compilation-to optimize these units properly. An extensible language makes recognition a tractable problem, since the most natural style of programming is to define data types for the unit entities, and procedures for the unit operations in each problem area. (Operator extension and syntax extension allow the invocation of these procedures by prefix and infix expressions and special statement types.) Hence, the unit operations are reasonably well-modularized. Detecting which procedures in the program are the critical unit operations entails static analysis of the call and loop structure, coupled with counts of call frequency during execution of the program over benchmark data sets. The critical unit operations generally have one or more of the following characteristics: (1) They have relatively short execution time; their importance is due to the frequency of call, not the time spent on each call. (2) Their size is relatively small. (3) They are terminal nodes of the call structure, or nearly terminal nodes. 
(4) They entail a repetition, performing the same action over the lower-level elements which collectively comprise the unit object of the problem level. Unit compilation is a set of special heuristics for exploiting these characteristics. Since execution time is relatively small, call/return overhead is a significant fraction. Where the unit operations are terminal, the overhead can be substantially reduced. The arguments are passed from compiled code to a terminal unit operation with no associated modes. (Caller and callee know what is being transmitted.) The arguments can usually be passed directly in the registers. No bindings are made for the formal parameters. (A terminal node of the call structure calls no other; hence, there can be no free uses of these variabIes.) The result can usually be returned in a register again, with no associated mode information. Since the unit operations are important far out of proportion to their size, they are subject to optimizing techniques too expensive for normal application. Opti- 908 Fall Joint Computer Conference, 1972 mal ordering of a computation sequence (e.g., to minimize memory references or the number of temporary locations) can, in general,* be assured only by a search over a large number of possible orderings. Further, the use of identities (e.g., a*b+a*c~a*(b+c)) to minimize the computational cost causes significant increase in the space of possibilities to be considered. The use of arbitrary identities, of course, makes the problem of program equivalence (and, hence, of cost minimization) undecidable. However, an effective procedure for obtaining equivalent computations can be had either by restricting the sort of transformations admittedll or by putting a bound on the degree of program expansion acceptable. Either approach results in an effective procedure delivering a very large set of equivalent computations. While computationally intractable if employed over the whole program, a semi-exhaustive search of this set for the one with minimal cost is entirely reasonable to carry out on a small unit operator. Similarly, to take full advantage of multiple hardware function units, it is sometimes necessary to unwind a loop and rewind it with a modified structure-e.g., to perform, on the ith iteration of the new loop, certain computation which was formerly performed on the (i-1)st, ith, and (i+ l)st iteration. Again, a search is required to find the optimal rewinding. In general, code generation which tries various combinations of code sequences and chooses among them (by analysis or simulation) can be used in a reasonable time scale if consideration is restricted to the few unit operations where the pay-off is significant. Consider, for example, a procedure which searches through an array of packed k-bit elements counting the number of times a certain (parameter-specified) k-bit configuration occurs. The table can either be searched in array orderall elements in the first word, then all elements in the next, etc.-or in position order-all elements in the first position of a word, all elements in the next position, etc. Which search strategy is optimal depends on k, the hardware for accessing k-bit bytes from memory, the speed of shifting vs. memory access, and the sort of mask and comparison instructions for k-bit ~ytes. In many situations, the easiest way of choosing the· better strategy is to generate code for each and· compute. the relative -execution times as a function of array length. 
A separate issue arises from non-obvious unit operations. Suppose analysis shows that procedures F and G are each key operations (i.e., are executed very frequently). It may wellbe that the appropriate candidates for unit compilation are F, G, and some particular * The only significant exception is for arithmetic expressions with no common subexpressions. 10 combination of them, e.g., "F;G" or "G( ... F( ... ) ... )". That is, if a substantial number of calls on G are preceded by calls of F (in sequence or in an argument position), the new function defined by that composition should be unit compiled. For example, in dealing with complex arithmetic, +, -, *, /, and CONJ are surely unit operations. However, it may be that for some program, "u/v+v*CONJ(v)" is critical. Subjecting this combination to unit compilation saves four of the ten multiplications as well as a number of memory frequencies. ASSUMPTIONS AND ASSERTIONS If an optimizing compiler is to generate really good code, it must be supplied the same sort of additional information that would be given to or deduced by a careful human coder. Pragmatic remarks (e.g., suggestions that certain global optimizations are possible) as well as explicit consent (e.g., the REORDER attribute of PL/I) are required. Similarly) if programs are to be validated by a· program verifier, ~sistance from the programmer in f{)rming inductive assertions is needed. Communication between the programmer and the optimizer/verifier is by means of ASSUME and ASSERT forms. An assumption is stated by the programmer and is (by and large) believed true by the evaluator. A local assumption ASSUME(X~O); is taken as true at the point it appears. A global assumption may be extended over some range by means of the infix operator IN, e.g., ASSUME(X~O) IN BEGIN ... END; where the assumption is to hold over the BEGIN-END block and over all ranges called by that block. The function of an assumption is to convey information which the programmer knows is true but which cannot be deduced from the program. Specifications of the wellformedness of input data are assumptions as are statements about the behavior of external procedures analyzed separately. Assertions, on the other hand, are verifiable. From the program text and the validity of the program's assumptions, it is possible-at least in principle-to validate each assertion. For example, ASSERT(FOR I FROM 1 TO N DO TRUEP(A[I]~ B[ID) IN BEGIN ... END Multiple Evaluators in an Extensible Programming System should be provably true over the entire BEGIN-END block, given that all program assumptions are correct. T~e interpreter, optimizer, and verifier each treat assumptions and assertions in different ways. Since the interpreter is used primarily for debugging, it takes the position that the programmer is not to be trusted. Hence, it checks everything, treating assumptions and assertions identically-as extended Boolean expressions to be evaluated and checked for true (false causing an ERROR and, in general, suspension of the program). Local assertions and assumptions are evaluated in analogy with the conditional expression NOT (expression)==}ERROR ( ... ) (This is similar to the use of ASSERT in ALGOL W.12) Assumptions and assertions over some range are checked over the entire range. This can be done by checking the validity at the start of the domain and setting up a condition monitor (e.g., cf. Reference 13) which will cause a software interrupt if the condition is ever violated during the range. 
Hence, in interpreted execution, assumptions and assertions act as comments whose correctness is checked by the evaluator, providing a rather nice debugging tool. Not only are errors explicitly detected by a false assertion, but when errors of other sorts occur (e.g., overflow, data type mismatch, etc.), the programmer scanning through the program is guaranteed that certain assertions were valid for that execution. Since debugging is often a matter of searching the execution path for the least source of an error, certainty that portions of the program are correct is as valuable as knowledge of the contrary. The compiler simply believes assertions and assumptions and uses their validity in code optimization. Consider, for example, the assignment X~B[I-J]-60 Normally, the code for this would include subscript bounds checking. However, in X~(ASSERT(1~I-J I\I-J ~LENGTH(B))) IN B[I-J]-60 the assertion guarantees that the subscript is in range and no run-time check is necessary. While assertkms and assumptions are handled by the compiler in rather the same way, there are a fewdifferences. Assumptions are the more powerful in that they can be used to express knowledge of program behavior which could not be deduced by the compiler, either because necessary information is not available (e.g., facts about a procedure which will be input during 909 program execution) or because the effort of deduction is prohibitive (e.g., the use of deep results of number theory in a program acting on integers). Separate compilation makes the statement of such assumptions essential, e.g~, ASSUME(SAFE(P)) IN BEGIN ... END insures that the procedure P is free of side effects and hence can be subj ect to common subexpression elimination. Unlike assumptions, assertions can be generated by the compiler as logical consequences of assumptions, other assertions, and the program text. Consider, for example, the following conditional block (cf. Appendix for syntax), where L is a pointerto a list structure. BEGIN L=NIL==} ... ; ... CDR(L) ... END Normally, the CDR operation would require a check for the empty list as an argument. However, provided that there are no intervening assignments to L, the compiler may rewrite this as BEGIN L=NIL==} ... ; ASSERT(L~NIL) IN BEGIN ... CDR(L) ... END END in which case no checks are necessary. Assertions added by the compiler and included in an augmented source listing provide a means for the compiler to record its deductions and explicitly transmit these to the programmer. The program verifier treats assumptions and assertions entirely differently. Assumptions are believed. * Assertions are to be proved or disprovedl4 ,15 on the basis of the stated assumptions, the program text, the semantics of the programming language, and specialized knowledge about the subject data types. In the case of integers, there has been demonstrable success-the· assertion verifier of King has been applied successfully to some definitely non-trivial algorithms. Specialized theorem provers for other domains may be constructed. Fortunately, the number of domains is small. In ALGOL 60, for example) knowledge of the reals, the integers, and Boolean expressions together with an understanding of arrays and. array subscripting will handle most program assertions. In an extensible language, the situation is more complex, but not drastically so. 
The base language data types are typically those of ALGOL 60 plus a few others, e.g., characters; the set of formation rules for data aggregates consists of arrays, plus structures and pointers. * One might, conceivably, check the internal consistency of a set Df assumptions, Le., test for possible contradictions. 910 Fall Joint Computer Conference, 1972 Only the treatment of pointers presents any new issues-these because pointers allow data sharing and hence· access to a single entity under a multiplicity of names (Le., access paths). This is analogous to the problem of subscript identification, but is compounded since the access paths may be of arbitrary length. However, recent work16 shows promise of providing proof techniques for pointers and structures built of linked nodes. Since all extension sets ultimately derive their semantics from the base language, it suffices to give a formal treatment to the primitive modes and the built-in set of formation rules-assertions on all other modes can be mapped into· and verified on the base. ** One variation on the program verifier is the notifier. Whereas the verifier uses formal proof techniques to certify correctness, the notifier uses relatively unsophisticated means to provide counterexamples. One can safely assume that most programs will not be initially correct; hence, substantial debugging assistance can be provided by simply pointing out errors.· This can be done somewhat by trial and error-generating values which satisfy the assumptions and running the program to check the assertions. Since. programming errors typically occur at the extremes in the space of data values, a few simple heuristics may serve to produce critical counterexamples. If, as appears likely, the computation time for program verification is considerable, the use of a simple, quicker means to find the majority of bugs will be of assistance on online program production. While the notifier can never validate programs, it may be helpful in creating them. OPTIMIZATION, EXTENSION SETS, AND TRANSFORMATION SETS One of the advantages of an extensible language over a special purpose language developed to handle a new application arises from the economics of optimization. In an extensible language system, each extended language Li is defined by an extension set Ei in terms of the base language. Since there is only a single base, one can afford to spend considerable effort in developing optimization techniques for it. Algorithms for register allocation, common sub expression detection, eliinination of ** This gives only a formal technique for verification, i.e., specifies what. must be axiomatized and gives a valid reduction technique. It may well turn out that such reduction is not a practical solution if the resulting computation costs are excessive. In such cases, one can use the underlying axiomatization as a basis for deriving rules of inference on an extension set. These may be introduced in a fashion similar to the specialized transformation sets discussed in the next section. variables, removal of computation from loops, loop fusion, and the like need be developed and programmed only once. All extensions will take advantage of these. In contrast, the compiler for each special purpose language must have these optimizations explicitly included. This is already a reasonably large programming project, so large that many special purpose languages go essentially unoptimized. 
As the set of known optimization techniques grows, the economic advantage of extensible language optimization will increase. There is one flaw in the above argument, which we now repair. There is the tacit assumption that all optimization properties of an extended language Li can 'be obtained from the semantics and pragmatics of the base. While the logical dependency is strictly valid, taking this as a complete technique is rather impractical. While certain optimization properties-those concerned solely with control and data flow-can be well optimized in terms of the base language, other properties depending on long chains of reasoning would tax any optimizer that sought to derive them every time they were required. The point, and our solution, may best be exhibited with an example. Consider FOO(SUBSTRING(I, J, X CONCAT Y» which calls procedure FOO with the substring consisting of the Ith to (I +J -1)th characters of the string obtained by concatenating the contents of string variable X with string variable Y. In an extensible language, SUBSTRING and CONCAT are defined procedures which operate on STRINGs (defined to be ARRAYs of CHARacters) . SUBSTRING~ EXPR(I,J:INT, S:STRING; STRING) BEGIN DECL SS:STRING SIZE J; FOR K TO J DO SS[K]~S[I+K-1]; SS END CONCAT~ EXPR(A,B:STRING; STRING) BEGIN DECL R:STRING SIZE LENGTH(A)+ LENGTH(B); FOR M TO LENGTH(A) DO R[M]~A[M]; FOR M TO LENGTH(B) DO R[M+LENGTH(A)] ~B[M]; R END One could compile code for the above c~ll on Faa by compiling three successive calls-on CONCAT, Multiple Evaluators in an Extensible Programming System SUBSTRIN G,and FOO. However, by taking advantage of the properties of CONCAT and SUBSTRING, one can do far better. Substituting the definition of CONCAT in SUBSTRING procedures SUBSTRING(I, J, A CON CAT B)= BEGIN DECL SS :STRING SIZE J; DECL S:STRING BYVAL BEGIN DECL R:STRING SIZE LENGTH(A) +LENGTH(B); FOR M TO LENGTH(A) DO R[M]~A[M]; FOR M TO LENGTH(B) DO R[M+LENGTH(A)]~B[M]; R END; FOR K TO J DO SS END SS{K]~S[I+K-1]; The block which computes R may be opened up so that its declarations and computation occur in the surrounding block. Then, since S is identical to R, S may be systematically replaced by R and the declaration for S deleted. ' BEGIN DECL SS:STRING SiZE J; DECL R:STRING SIZE LENGTH(A)+ LENGTH(B); FOR M TO LENGTH(A) DO R[M]~A[l\tI]; FOR lVI TO LENGTH(B) DO R[l\1 + LENGTH(A)]~B[lVI]; FOR K TO J DO SS[K]~R[I+K-1]; SS END This implies that R[M] is defined by the conditional block BEGIN M ~ LEN GTH(A)=}A[M]; B[M - LENGTH(A») END Replacing M by I +K -1 and substituting, the assignment loop becomes FOR K TO J DO SS[K]~BEGIN has the form FOR x TO VO DO BEGIN x~vl=}fl(X); f 2 (x) END where Vi are loop-independent values and fi are functions in x. A basic optimization on the base language transforms this into the equivalent form which avoids the test FOR x TO MIN(vo,vl) DO flex); FOR x FROM MIN(vo, vl)+1 TO VO DO f2(x); Hence, SUBSTRING (I, J, A CONCAT B) may be computed by a call on the procedure* EXPR(I,J:INT, A,B:STRING; STRING) BEGIN DECL SS:STRING SIZE J; FOR K TO MIN(J, LENGTH(A)-I+l) DO SS[K]~A[I + K -1]; FOR K FROM MIN(J, LENGTH(A)-I+l)+1 TO J DO SS[K]~B[I+K-LENGTH(A)-I]; SS END This could, in principle, be deduced by a compiler from the definitions of SUBSTRING and CONCAT. However, there is no way for the compiler to know a priori that the substitution has substantial payoff. 
If the expression SUBSTRING(I,J,A CONCAT B) were a critical unit operation, the heuristic "try all possible compilation techniques on key expressions" would discover it. However, the compiler cannot afford to try all function pairs appearing in the program in the hope that some will simplify-the computational cost is too great. Instead,'the programmer specifies to the compiler the set of transformations (cf. Reference 17 for related techniques) he knows will have payoff. TRANSFORM(I,J:INT, X,Y:STRING; SUBSTITUTE) SUBSTRING(I, J, X CONCAT Y) TO SUBSTITUTE(Z:X CONCAT Y, SUBSTRING(I,J,Z» (I, J, X, Y) In general, a transformation rule has the format TRANSFORM«pattern variables); (action variables» K~LENGTH(A)-I +1=>A[I+K-l]; B[I+K-LENGTH (A)-I] END Distributing the assignment to inside the block, this 911 (pattern) TO (replacemeJ;lt) * Normal common subexpression elimination will recognize that LENGTH (A), I-I, and MIN(J, LENGTH(A)-I+l) need be calculated only once. 912 Fall Joint Computer Conference, 1972 All lexemes in the pattern and replacement are taken literally except for the (pattern variables) and (action variables). The former are dummy arguments, statement-matching variables, etc.; the latter denote values used to derive the actual transformation from the input transformation schemata. In the above case, the procedure SUBSTITUTE is called to expand CONCAT within SUBSTRING as the third argument. The simplified result, CP, is- applied to the dummy arguments. Hence, calls such as SUBSTRING (3,2*N +C, AA CON CAT B7) are transformed into calls on CP(3,2*N + C, AA, B7). When defining an extension set, the programmer de.. fines the unit data types, unit operations, and additionally the significant transformations on the problem domain. These domain-dependent transformations are adjoined to the set of base transformations to produce the total transformation set. The program, as written, specifies the function to be computed; the· transformation set provides an orthogonal statement of how the computation is to be optimized. For example, in adding a string. manipUlation extension, one would first define the data type STRING (fixed length array of characters). Next, one defines the' unit operations: LENGTH, CONCATenate, SUBSTRING, SEARCH (fora string x as part of a string y starting at position i and return the initial index or zero if not present). Finally, one defines the transforma~ tions on program units involving these operations. string variable would be implemented as a pointer to a simple STRING (i.e., PTR(STRING» with the understanding that assignment of a string value to such a string variable causes a copy of the string to be made and the pointer set to address the copy. * With these three possible representations available, one would define the data type string variable to be TRANSFORM(X,Y:STRING) LENGTH(X CONCAT Y) TO LENGTH(X)+LENGTH(Y) The predicate WHEN appearing in a pattern is handled in somewhat the same fashion as are ASSERTions during program verification. It is proved as part of the pattern matching; the transformation is applicable only if the predicate is provably TRUE and the literal part of the pattern matches. Here, it must be proved that LENGTH(X) is a constant over the block B and all ranges called by B. If so, the variable can be of type STRING. 
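How such a rule is applied can be pictured with a toy rewriter (a Python sketch over a tuple encoding of expressions; the rule shown is the LENGTH-of-CONCAT transformation above, while the encoding and the matcher are illustrative assumptions, not the ECL mechanism).

    # Expressions are tuples such as ('LENGTH', s), ('CONCAT', x, y), ('+', a, b); leaves are names.
    def rewrite(expr):
        # Apply LENGTH(X CONCAT Y) -> LENGTH(X) + LENGTH(Y) bottom-up, everywhere it matches.
        if not isinstance(expr, tuple):
            return expr
        expr = tuple(rewrite(sub) for sub in expr)               # rewrite subexpressions first
        if expr[0] == 'LENGTH' and isinstance(expr[1], tuple) and expr[1][0] == 'CONCAT':
            _, (_, x, y) = expr
            return rewrite(('+', ('LENGTH', x), ('LENGTH', y)))  # instantiate the replacement
        return expr

    print(rewrite(('LENGTH', ('CONCAT', 'X', ('CONCAT', 'Y', 'Z')))))
    # -> ('+', ('LENGTH', 'X'), ('+', ('LENGTH', 'Y'), ('LENGTH', 'Z')))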
Similarly, if there is a computable TRANSFORM(A,X,Y,Z:STRING; SUBSTITUTE) X CONCAT Y CONCAT Z TO SUBSTITUTE(A: Y CONCAT Z; X CONCAT A) (X,Y,Z) So long as the transformations are entirely local, they act only as macro replacements. The significant transformations in an extension set are those which make global, far reaching changes to program or data. Clearly, these changes will require knowledge, assumed or asserted, about that portion of the program affected by these changes .. Consider, for example, the issue of string variables in the proposed extension set. If a string variable is to have a fixed capacity, the type STRING is satisfactory. If varial;>le capacity is desired but an' upper bound can be established for each string variable, the type VARSTRING could be defined like string VARYING in PLjI. If completely variable capacity is required, a ONEOF(STRING, VARSTRING, PTR(STRING» Each string variable is one of these three data types. To provide for the worst case, the programmer could specify each formal parameter string variable to be ONEOF(STRING, VARSTRING, PTR(STRING» and specify each local string variable to be a PTR(STRING). A program so written would be correct, but its performance would, in general, suffer from unused generality. Each string variable whose length is fixed can be redeclared TRANSFORM(Dl,D2:DECLIST, S:STATLIST, F:FORM, X; WHEN) WHEN (CONSTANT(LENGTH(X») IN BEGIN D1; DECL X:PTR(STRING) BYVAL F; D2; SEND TO BEGIN Dl; DECLX:STRING BYVAL F; D2; SEND * This does not exhaust the list of possible representations for strings. To avoid copying in concatenation, insertion, and deletion, one could represent strings by linked lists of characters nodes: each node consisting of a character and a pointer to the next node. A string variable could then be a pointer to such node lists. To. minimize storage, one could employ hashing to insure that each distinct sequence of characters is represented by a unique string-table-entry; a string variable could then be a pointer to such string-table-entries. Hashing and implementing strings by linked lists could be combined to yield still another representation of strings. In the interest of brevity, we consider only three rather simple representations; however, the point we make is all the stronger when additional representations are considered. Multiple Evaluators in an Extensible Programming. System maximum length less than a reasonable upper limit LIM, then the data type VARSTRING can be used. TRANSFORM(D1,D2:DECLIST, B:BLOCK, F:FORM, K:INT, X; WHEN) BEGIN D1; DECL X:PTR(STRING) BYVAL F; D2; WHEN(LENGTH(X)~ KI\K~LIM) IN B END TO BEGIN D1; DECL X:VARSTRING SIZE K BYVAL F; D2; BEND To prove an assertion for a variable X over some range, it suffices to prove the assertion true of all expressions that are assignable to X in that range. An assertion about LENGTH(X) is reasonable to validate since it entails only theorem proving over the integers18-once the string manipulation routines are reinterpreted as operations on string lengths. Fortunately, most of the interesting predicates are of this order of difficulty. Typical WHEN conditions are: (1) a variable (or certain fields of a data structure) is not changed; (2) an object in the heap is referenced only from a given pointer; (3) whenever control reaches a given program point, a variable always has (or never has) a given value (or set of values); (4) certain operations are never performed on certain elements of a data structure. 
Such conditions are usually easier to check than those concerned with correct program behavior, since only part of the action carried out by the algorithm is relevant. That is, the technique suggested above for simplifying proofs about string manipulation operators by considering only string lengths generalizes too many related cases. To verify a predicate concerned with certain properties, one takes a valuation of the program on a model chosen to abstract those properties. 19 The program is run by a special interpreter which performs the computation on the simpler data space tailored to the property. To correct for the loss of information (e.g., the values of most program tests are not available), the computation is conservative (e.g., the valuation of a conditional takes the union of the valuations of the possible arms). If the valuation in the model demonstrates the proposition, it is valid for the actual data space. While this is a sufficient condition, not a necessary one, an appropriate model should seldom fail to prove a true proposition. CONCLUSION An interpreter, a compiler, a source-level optimizer employing domain-specific transformations, and a program 913 verifier each compute a valuation over some model. Fitting these valuators together so as to exploit the complementarity of their models is a central task in constructing a powerful programming tool. ACKNOWLEDGMENT The author would like to thank Glenn Holloway and Richard Stallman for discussions concerning various aspects of this paper. REFERENCES 1 B WEGBREIT The ECL programming system Proc AFIPS 1971 FJCC Vol 39 AFIPS Press Montvale New Jersey pp 253-262 2 A J PERLIS The synthesis of algorithmic systems JACM Vol 17 No 1 January 1967 pp 1-9 3 T E CHEATHAM et al On the basis for ELF-an extensible language facility Proc AFIPS FJCC 1968 Vol 33 pp 937-948 4 D G BOBROW Requirements for advanced programming systems for list processing CACM Vol 15 No 7 July 1972 5 T E CHEATHAM B WEGBREIT A laboratory for the study of automating programming Proc AFIPS 1972 SJCC Vol 40 6 W TEITELMAN et al BBN-LISP Bolt Beranek and Newman Inc Cambridge Massachusetts July 1971 7 E W DIJKSTRA Recursive programming Numerische Mathematik 2 (1960) pp 312-318. 
Also in Programming Systems and Languages S Rosen (Ed) McGraw-Hill New York 1967 8 J MOSES The function of FUNCTION in LISP , SIGSAM Bulletin July 1970 pp 13-27 9 IBM SYSTEM/360 PL/I language reference manual Form C28-8201-2 IBM 1969 10 R SETHI J D ULLMAN The generation of optimal code for arithmetic expressions JACM Vol 17 No 4 October 1970 pp 715-728 11 A V AHO J D ULLMAN TransformatilJns on straight line programs Conf Rec Second Annual ACM Symposium on Theory of Computing SIGACT May 1970 pp 136-148 12 R L SITES Algol W reference manual Technical Report CS-71-230 Computer Science Department Stanford University August 1971 13 D G BOBROW B WEGBREIT A model and stack implementation of multiple environments Report No 2334 Bolt Beranek and Newman Cambridge Massachusetts March 1972 submitted for publication 914 Fall Joint Computer Conference, 1972 14 R FFLOYD Assigning meanings to programs Proc Symp Appl Math Vol 19 1967 pp 19-32 15 R F FLOYD Toward interactive design of correct programs Proc IFIP Congress 1971 Ljubljana pp 1-5 16 J POUPON B WEGBREIT Verification techniques for data structures including pointers Center for Research in Computing Technology Harvard University in preparation 17 B A GALLER A J PERLIS A proposal for definitions in Algol CACM Vol 10 No 4 April 1967 pp 204-219 18 J C KING A program verifier PhD Thesis Department of Computer Science Carnegie-Mellon University September 1969 19 M SINTZOFF Calculating properties of programs by valuations on specific models SIGPLAN Notices Vol 7 No 1 and SIGACT News No 14 January 1972 pp 203-207 20 B WEGBREIT et al ECL programmer's manual Center for Research in Computing Technology Harvard University Cambridge Massachusetts January 1972 a statement of the form If ill is TRUE then the block is exited with the value of CO; otherwise, the next statement of the. block executed. For example, the ALGOL 60 conditional if ill! then COl else if ill2 then CO2 else C0 3 is written in ELl as (Unconditional statements of an ELl block are simply executed sequentially-unless a goto transfers control to a different labeled statement.) A.2 Declarations The initial statements of a block may be declarations having the format DECL £: APPENDIX: A BRIEF DESCRIPTION OF ELl SYNTAX To a first approximation, the syntax of ELl is like that of ALGOL 60 or PL/I. Variables, subscripted variables, labels, arithmetic and Boolean expressions, assignments, gotos and procedure calls can all be written as in ALGOL 60 or PL/I. Further, ELI is-like ALGOL 60 or PL/I-a block structured language. Executable statements in ELl can be grouped together and delimited by BEGIN END brackets to form blocks. New variables can he created within a block by declaration; the scope of such variable names is the block in which they are declared. The syntax of ELl differs from that of ALGOL 60 or PL/I most notably in the form of conditionals, declarations, and data type specifiers. For the purposes of this paper, it will suffice to explain only these points of difference. (A more complete description can be found in Reference 20.) A.1 Conditionals Conditionals in ELl are a special case of BEGIN END blocks. In general, each ELl block has a valuethe value of the last statement executed. Normally, this is the last statement in the block. Instead, a block can be conditionally exited with some other value CO by IS ~S; where £ is a list of identifiers, ~ is the data type, and S specifies the initialization. 
For example, DECL X, Y: REAL BYVAL A[I]; This creates two REAL variables named X and Yand initializes them to separate copies of the current value of A[I]. The specification S may assume one of three forms: (1) empty-in which case a default initialization determined by the data type is used. (2) BYVAL CO-in which case the variables are initialized to copies of the value of CO. (3) SHARED CO-in which case the variables are declared to be synonymous with the value of co. A.3 Data types Built-in data types of the language include: BOOL, CHAR, INT, and REAL. These may be used as data type specifiers to create scalar variables. Array variables may be declared by using the built-in procedure ARRAY. For example, DECL C: ARRAY(CHAR) BYVAL CO; creates a variable named C which is an ARRAY of Multiple Evaluators in an Extensible Programming System 915 AA Procedures CHARacters. The LENGTH (i.e., number of components) and initial value of C is determined by the value of U. Procedure-valued variables may be defined by the builtin procedure PROC. For example, A procedure may be defined by assigning a procedure value to a procedure-valued variable. For example, DECL G:PROC(BOOL,ARRAY(INT); REAL); EXPR(X:REAL,N:INT; REAL) BEGIN DECL R:REAL BYVAL 1; FOR I TO N DO R~R*X; REND declares G to be variable which can be assigned only those procedures which take a BOOL argument and an ARRAY(INT) argument and deliver a REAL result. assigns to IPOWER a procedure which takes two arguments, a REAL and an INT (assumed positive), and computes the exponential. IPOWER~ Automated programmering-The programmer's assistant by WARREN TEITELMAN* Bolt, Beranek, & Newman Cambridge, Massachusetts features as complete compatibility of compiled and interpreted code, "visible" variable bindings and control information, programmable error recovery procedures, etc. Indeed, at this point the two systems, BBN-LISP and the programmer's assistant, have become so intertwined (and interdependent), that it is difficult, and somewhat artificial, to distinguish between them. We shall not attempt to do so in this paper, preferring instead to present them as one integrated system. BBN-LISP contains many facilities for assisting the programmer in his non-programming activities. These include a sophisticated structure editor which can either be used interactively or as a subroutine; a debugging package for inserting conditional programmed interrupts around or inside of specified procedures; a "prettyprint" facility for producing structured symbolic output; a program analysis package which produces a tree structured representation of the flow of control between procedures, as well as a concordance listing indicating for each procedure the procedures that call -it, the procedures that it calls, and the variables it references, sets, and binds; etc. Most on-line programming systems contain similar features. However, the essential difference between the BBN-LISP system and other systems is embodied in the philosophy that the user addresses the system through an (active) intermediary agent, whose task it is to collect and save information about what the user and his programs are doing, and to utilize this information to assist the user and his programs. This intermediary is called the programmer's assistant (or p.a.). INTRODUCTION This paper describes a research effort and programming system designed to facilitate the production of programs. 
Unlike automated programming, which focuses on developing systems that write programs, automated programmering involves developing systems which automate (or at least greatly facilitate) those tasks that a programmer performs other than writing programs: e.g., repairing syntactical errors to get programs to run in the first place, generating test cases, making tentative changes, retesting, undoing changes, reconfiguring, massive edits, et al., plus repairing and recovering from mistakes made during the above. When the system in which the programmer is operating is cooperative and helpful with respect to these activities~ the programmer can devote more time and energy to the task of programming itself, i.e., to conceptualizing, designing and implementing. Consequently, he can be more ambitious, and more productive. BBN-LISP The system we will describe here is embedded in BBN-LISP. BBN-LISP, as a programming language, is an implementation of LISP, a language designed for list processing and symbolic manipulation.! BBN-LISP as a programming system, is the product of, and vehicle for, a research effort supported by ARPA for improving the programmer's environment. ** The term "environment" is used to suggest such elusive and subjective considerations as ease and level of interaction, forgivingness of errors, human engineering, etc. Much of BBN-LISP was designed specifically to enable construction of the type of system described in this paper. For example, BBN-LISP includes such THE PROGRAMMER'S ASSISTANT For most interactions with the BBN LISP system, the programmer's assistant is an invisible interface between the user and LISP: the user types a request, for example, specifying a function to be applied to a set of arguments. The indicated operation is then per- * The author is currently at Xerox Palo Alto Research Center 3180 Porter Drive, Palo Alto, California 94304. ** Earlier work in this area is reported in Reference 2. 917 918 Fall Joint Computer Conference, 1972 formed, and a resulting value is printed. The system is then ready for the next request. However, in addition, in BBN-LISP, each input typed by the user, and the value of the corresponding operation, are automatically stored by the p.a. on a global data structure called the history list. The history list contains information associated with each of the individual "events" that have occurred in the system, where an event corresponds to an individual type-in operation. Associated with each event is the input that initiated it, the value it yielded, plus other information such as side effects, messages printed by the system or by user programs, information about any errors that may have occurred during the execution of the event, etc. As each new event occurs, the existing events on the history list are aged, with the oldest event "forgotten". * The user can reference an event on the history list by a pattern which is used for searching the history list, e.g., FLAG:~$ refers to the last event in which the variable FLAG was changed by the user; by its relative event number, e.g. -1 refers to the most recent event, -2 the event before that, etc., or by an absolute event number. For example, the user can retrieve an event in order to REDO a test case after making some program changes. Or, having typed a request that contains a slight error, the user may elect to FIX it, rather than retyping the request in its entirety. 
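A toy model of the history list may make the bookkeeping concrete (Python, purely illustrative; the event fields, the size limit, and the command names are simplifications of what the paper describes, not the BBN-LISP implementation).

    class HistoryList:
        # Keeps the last `limit` events; each event records the input text and its value.
        def __init__(self, limit=30):
            self.events, self.limit, self.counter = [], limit, 0

        def execute(self, text, evaluator):
            self.counter += 1
            value = evaluator(text)
            self.events.append({'number': self.counter, 'input': text, 'value': value})
            if len(self.events) > self.limit:
                self.events.pop(0)                 # the oldest event is "forgotten"
            return value

        def find(self, key):
            # key: a negative relative number, an absolute event number, or a search pattern.
            if isinstance(key, int):
                if key < 0:
                    return self.events[key]
                return next(e for e in self.events if e['number'] == key)
            return next(e for e in reversed(self.events) if key in e['input'])

        def redo(self, key, evaluator):
            # REDO: re-submit the retrieved input exactly as though it had been typed again.
            return self.execute(self.find(key)['input'], evaluator)

    h = HistoryList()
    h.execute("(SETQ FLAG T)", evaluator=len)      # `len` stands in for the LISP evaluator
    h.execute("(FACT 3)", evaluator=len)
    h.redo("FLAG", evaluator=len)                  # like REDO of the event mentioning FLAG
    print([e['input'] for e in h.events])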
The USE command provides a convenient way of specifying simultaneous substitutions for lexical units and/or character strings, e.g., USE X FOR Y AND + FOR *. This permits after-the-fact parameterization of previous events. The p.a. recognizes such requests as REDO, FIX, and USE as being directed to it, not the LISP interpreter, and executes them directly. For example, when given a REDO command, the p.a. retrieves the indicated event, obtains the input from that event, and treats it exactly as though the user had typed it in directly. Similarly, the USE command directs the p.a. to perform the indicated substitutions and process the result exactly as though it had been typed in. The p.a. currently recognizes about 15 different commands (and includes a facility enabling the user to define additional ones). The p.a. also enables the user to treat several events as a single unit, (e.g. REDO 47 THRU 51), and to name an event or group of events, e.g. , NAME TEST -1 AND -2. All of these capabilities allow, and in fact encourage, the user to construct complex console operations out of simpler ones in much the same fashion as programs are constructed, i.e., simpler operations are checked out first, and then combined and rearranged into large ones. The important * The storage used in its representation is then reusable. point to note is that the user does not have to prepare in advance for possible future (re-) usage of an event. He can operate straightforwardly as in other systems, yet the information saved by the p.a. enables him to implement his "after-thoughts." UNDOING Perhaps the most important after-thought operation made possible by the p.a. is that of undoing the sideeffects of a particular event or events. In most systems, if the user suspects that a disaster might result from a particular operation, e.g., an untested program running wild and chewing up a complex data structure, he would prepare for this contingency by saving the state part of or all of his environment before attempting the operation. If anything went wrong, he would then back up and start over. However, saving/dumping operations are usually expensive and time-consuming, especially compared to a short computation, and are therefore not performed that frequently. In addition, there is always the case where disaster strikes as a result of a supposedly debugged or innocuous operation. For example, suppose the user types FOR X IN ELTS REMOVE PROPERTY 'MORPH FROM X which removes the property MORPH from every member of the list ELTS, and then realizes that he meant to remove this property from the members of the list ELEMENTS instead, and has thus destroyed some valuable information. Such "accidents" happen all too often in typical console sessions, and result in the user's either having to spend a great deal of effort in reconstructing the inadvertently destroyed information, or alternatively in returning to the point of his last back-up, and then repeating all useful work performed in the interim. (Instead, using the p.a., the user can recover by simply typing UNDO, and then perform the correct operation by typing USE ELEMENTS FOR ELTS.) The existence of UNDO frees the user from worrying about such oversights. He can be relaxed and confident in his console operations, yet still work rapidly. He can even experiment with various program and data con-, figurations, without necessarily thinking through all the implications in advance. One might argue that this would promote sloppy working habits. 
However, the same argument can be, and has been, leveled against interactive systems in general. In fact, freeing the user from such details as having to anticipate all of the consequences of an (experimental) change usually re- Automated Programmering sults in his being able to pay more attention to the conceptual difficulties of the problem he is trying to solve. Another advantage of undoing as it is implemented in the programmer's assistant is that it enables events to be undone selectively. Thus, in the above example, if the user had performed a number of useful modifications to his programs and data structures before noticing his mistake, he would not have to return to the environment extant when he originally typed FOR X IN ELTS REMOVE PROPERTY 'MORPH FROM X, in order to UNDO that event, i.e., he could UNDO this event without UNDOing the intervening events. * This means that even if we eliminated efficiency considerations and assumed the existence of a system where saving the entire state of the user's environment required insignificant resources and was automatically performed before every event, there would still be an advantage to having an undo capability such as the one described here. Finally, since the operation of undoing an event itself produces side effects, it too is undoable. The user can often take advantage of this fact, and employ strategies that use UNDO for desired operation reversals, not simply as a means of recovery in case of trouble. For example, suppose the user wishes to interrogate a complex data structure in each of two states while successively modifying his programs. He can interrogate the data structure, change it, interrogate it again, then undo the changes, modify his programs, and then repeat the process using successive UNDOs to flip back and forth between the two states of the data structure. IMPLEMENTATION OF UNDO** The UNDO capability of the programmer's assistant is implemented by making each function that is to be undo able save on the history list enough information to enable reversal of its side effects. For example, when a list node is about to be changed, it and its original contents are saved; when a variable is reset, its binding (i.e., position on the stack) and its current value are saved. For ~ch primitive operation that involves side effects, there are two separate functions, one which always saves this information, i.e., is always undoable, and one which does not. Although the overhead for saving undo information is small, the user may elect to make a particular operation not be undo able if the cumulative effect of saving * Of course, he could UNDO all of the intervening events as well, e.g., by typing UNDO THRU ELTS. ** See Reference 1, pp. 22.39--43, for a more complete description of undoing. 919 the undo information seriously degrades the overall performance of a program because the operation in question is repeated so often. The user, by his choice of function, specifies which operations are undoable. In some sense, the user's choice of function acts as a declaration about frequency of use versus need for undoing. For those cases where the user does not want certain functions undo able once his program becomes operational, but does wish to be able to undo while debugging, the p.a. provides a facility called TESTMODE. When in TESTMODE, the undoable version of each function is executed, regardless of whether the user's program specifically called that version or not. 
Finally, all operations involving side effects that are typed-in by the user are automatically made undo able by the p.a. by substituting the corresponding undo able function name(s) in the expression before execution. This procedure is feasible because operations that are typed-in rarely involve iterations or lengthy computations directly, nor is efficiency usually important. However, as a precaution, if an event occurs during which more than a user-specified number of pieces of undo information are saved, the p.a. interrupts the operation to ask the user if he wants to continue having undo information saved. AUTOMA'rIC ERROR CORRECTION-THE DWIM FACILITY The previous discussion has described ways in which the programmer's assistant is explicitly invoked by the user. The programmer's assistant is also automatically invoked by the system when certain error conditions are encountered. A surprisingly large percentage of these errors, especially those occurring in type-in, are of the type that can be corrected without any knowledge about the purpose of the program or operation in question, e.g., misspellings, certain kinds of syntax errors, etc. The p.a. attempts to correct these errors, using as a guide both the context at the time of the error, and information gathered from monitoring the user's requests. This form of implicit :;i.ssistance provided by the programmer's assistant is called the DWIM (Do-What-I-Mean) capability. For example, suppose the user defines a function for computing N factoral by typing DEFIN[ ( (FACT (N) IF N = 0 THEN 1 ELSE NN*(FACT N -1)*]. When this input is executed, an error occurs because DEFIN is not the name of a function. However, DWIM * In BBN-LISP ] automatically supplies enough right parentheses to match back to the last [. 920 Fall Joint Computer Conference, 1972 notes that DEFIN is very close to DEFINE, which is a likely candidate in this context. Since the error occurred in type-in, DWIM proceeds on this assumption, types = DEFINE to inform the user of its action, makes the correction and carries out the request. Similarly if the user then types FATC (3) to test out his function, DWIM would correct FATC to FACT. When the function FACT is called, the evaluation of NN inNN*(FACT N -1) eauses an error. Here, DWIM is able to guess that NN probably means N by using the contextual information that N is the name of the argument to the function FACT in which the error occurred. Since this correction involves a user program, DWIM proceeds more cautiously than for corrections to user type-in: it informs the user of the correction it is about to make by typing NN(IN FACT)~N? and then waits for approval. If the user types Y (for YES), or simply does not respond within a (user) specified time interval (for example, if the user has started the computation and left the room), DWIM makes the correction and continues the computation, exactly as though the function had originally been correct, i.e., no information is lost as a result of the error. If the user types N (for NO), the situation is the same as when DWIM is not able to make a correction (that it is reasonably confident of). In this case, an error occurs, following which the system goes into a suspended state called a "break" from which the user can repair the problem himself and continue the computation. Note that in neither case is any information or partial results lost. DWIM also fixes other mistakes besides misspellings, e.g., typing eight for "C" or nine for ")" (because of failure to hit the shift key). 
For example, if the user had defined FACT as (IF N=O THEN 1 ELSE NN*8FACT N-l), DWIM would have been able to infer the correct definition. DWIM is also used to correct other types of conditions not considered errors, but nevertheless obviously not what the user meant. For example, if the user calls the editor on a function that is not defined, rather than generating an error, the editor invokes the spelling corrector to try to find what function the user meant, giving DWIM as possible candidates a list of user defined functions. Similarly, the spelling corrector is called to correct misspelled edit commands, p.a. commands, names of files, etc. The spelling corrector can also be called by user programs. As mentioned above, DWIM also uses information gathered by monitoring user requests. This is accom- TABLE I-Statistics on Usage Sessions exec inputs edit commands undo saves 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 1422 454 360 1233 302 109 1371 400 294 102 378 1089 791 650 3149 24 55 2178 311 604 44 52 3418 782 680 2430 558 677 2138 1441 653 1044 1818 p.a. commands spelling corrections 87 44 33 17 28 28 184 64 8 6 95 19 7 1 2 0 1 32 57 30 4 2 plished by having the p.a., for each user request, "notice" the functions and variables being used, and add them to appropriate spelling lists, which are then used for comparison with (potentially) misspelled units. This is how DWIl\t{ "knew" that FACT was the name of a function, and was therefore able to correct F ATC to FACT. As a result of knowing the names of user functions and variables (as well as the names of the most frequently used system functions and variables), DWIM seldom fails to correct a spelling error the user feels it should have. And, since DWIM knows about common typing errors, e.g., transpositions, doubled characters, shift mistakes, etc.,* DWIM almost never mistakenly corrects an error. However, if DWIM did make a mistake, the user could simply interrupt or abort the computation, UNDO the correction (all DWIM corrections are undo able) , and repair the problem himself. Since an error had occurred, the user would have had to intervene anyway, so that DWIM's unsuccessful attempt at correction did not result in extra work for him. STATISTICS OF USE While monitoring user requests, the programmer's assistant keeps statistics about utilization of its various capabilities. Table I contains 5 statistics from 11 different sessions, where each corresponds to several * The spelling corrector also can be instructed as to specific user misspelling habits. For example, a fast typist is more apt to make transposition errors than a hunt-and-peck typist, so that DWIM is more conservative about transposition errors with the latter. See Reference 1, pp. 17.20-22 for complete description of spelling corrections. Automated Programmering CONCLUSION TABLE II-Further Statistics exec inputs undo saves changes undone calls to editor edit commands edit undo saves edit changes undone p.a. commands spelling corrections calls to spelling corrector # of words compared time in spelling corrector (in seconds) CPU time (hr: min: sec) console time time in editor 921 3445 10394 468 387 3027 1669 178 360 74 1108* 5636** 80.2 1:49:59 21:36:48 5:23:53 * An "error" may result in several calls to the spelling corrector, e.g., the word might be a misspelling of a break command, of a p.a. command, or of a function name, each of which entails a separate call. ** This number is the actual number of words considered as possible respellings. 
Note that for each call to the spelling corrector, on the average only five words were considered, although the spelling lists are typically 20 to 50 words long. This number is so low because frequently misspelled words are moved to the front of the spelling list, and because words are not considered that are "obviously" too long or too short, e.g., neither AND nor PRETTYPRINT would be considered as possible respellings of DE FIN. individual sessions at the console, following each of which the user saved the state of his environment, and then resumed at the next console session. These sessions are from eight different users at several ARPA sites. It is important to note that with one exception (the author) the users did not know that statistics on their session would be seen by anyone, or, in most cases, that the p.a. -gathered such statistics at all. The five statistics reported here are the number of: 1. requests to executive, i.e., in LISP terms, inputs to evalquote or to a break; 2. requests to editor, i.e., number of editing commands typed in by user; 3. units of undo information saved by the p.a., e.g., changing a list node (in LISP terms, a single rplaca or rplacd) corresponds to one unit of undo information; 4. p.a. commands, e.g., REDO, USE, UNDO, etc.; 5. spelling corrections. Mter these statistics were gathered, more extensive measurements were added to the p.a. These are shown for an extended session with one user (the author) in Table II below. We see the current form of the programmer's assistant as a first step in a sequence of progressively more intelligent, and therefore more helpful, intermediary agents. By attacking the problem of' representing the intent behind a user request, and incorporating such information in the p.a., we hope to enable the user to be less specific, and the p.a. to draw inferences and take more initiative. However, even in its present relatively simplistic form, in addition to making life a lot more pleasant for users, the p.a. has had a sup rising synergistic effect on user productivity that seems to be related to the overhead that is involved when people have to switch tasks or levels. For example, when a user types a request which contains a misspelling, having to retype it is a minor annoyance (depending, of course, on the amount of typing required and the user's typing skill). However, if the user has mentally already performed that task, and is thinking ahead several steps to what he wants to do next, then having to go back and retype the operation represents a disruption of his thought processes, in addition to being a clerical annoyance. The disruption is even more severe when the user must also repair the damage caused by a faulty operation (instead of being able to simply UNDO it). The p.a. acts to minimize these distractions and diversions, and thereby, as Bobrow puts it, Cl • •• greatly facilitates construction of complex programs because it allows the user to remain thinking about his program operation at a relatively high level without having to descend into manipulation of details. "3 We feel that similar capabilities should be built into low level debugging packages such as DDT, the executive language of time sharing systems, etc., as well as other "high-level" programming languages, for they provide the user with a significant mental mechanical advantage in attacking problems. 
REFERENCES 1 W TEITELMAN D G BOBROW A K HARTLEY D L MURPHY BRN-LISP TENEX reference manual BBN Report July 1971 2 W TEITELMAN Toward a programming laboratory Proceedings of First International Joint Conference on Artificial Intelligence Washington May 1969 3 D G BOBROW Requirements for advanced programming systems for list processing (to be published July 1972 CACM) A programming language for real-time systems by A. KOSSIAKOFF and T. P. SLEIGHT The Johns Hopkins University Silver-8pring, Maryland the process, thus avoiding the construction -of a program which exceeds the capacity of the target computer, or which uses undue core capacity and time for low-priority operations. 7. Provide a representation of a computer program which is self-documenting, in a manner clearly understandable by either an engineer or programmer, making clearly visible the interfaces among subunits, the branch points and the successive steps of handling each information input. SUMMARY This paper describes a different approach to facilitating the design of efficient and reliable large scale computer programs. The direction taken is toward less rather than more abstraction, and toward using the computer most efficiently as a data processing machine. This is done by expressing the program in the form of a twodimensional network with maximum visibility to the designer, and then converting the network automatically into efficient code. The interactive graphics terminal is a most powerful aid in accomplishing this process. The principal objectives are as follows: INTRODUCTION 1. Provide a computer-independent representation 2. 3. 4. 5. 6. The development, "debugging," and maintenance of computer programs for complex data-processing systems is a difficult and increasingly expensive part of modern systems design, especially for those systems which involve high speed real-time processing. The problem is aggravated by the absence of a lucid representation of the operations performed by the program or of its internal and external interfaces. Thus, the successful use of modern digital computers in automating such systems has been severely impeded by the large expenditure of time and money in the design of complex computer programs. The development of software is increasingly regarded as the limiting factor in system development. The individual operations of the central processing unit of a general purpose digital computer are veryelementary, with the result that a relatively long sequence of instructions is required to accomplish most dataprocessing tasks. For this reason, programming languages have been developed which enable the programmer to write concise higher level instructions. A compiler then translates these high-level instructions into the machine code for a given computer. The programmer's task is greatly facilitated, since much of the detailed housekeeping is done by the compiler. High level languages are very helpful in designing of a process to be accomplished by a specified (target) computer, and automatically transforming this representation into a complete program, in the assembly language of the specified computer. Design the representation so as to make highly visible the processing and flow of individual data, as well as that of control logic, in the form of a two-dimensional network, and make it understandable to engineers, scientists and computer programmers. 
Design the representation so that it can be configured readily on an interactive computerdriven graphics terminaL Design a simple but powerful set of computerindependent building blocks, called Data Circuit Elements, for representing the process to be accomplished by a computer using distinct forms to represent each class of function. Enable the user to simulate the execution of the Data Flow Circuits by inputting realistic data and observing the resultant logic and data flow. Facilitate the design of an efficient complex data processing system by making visible the core usage and running time of each section of 923 924 Fall Joint Computer Conference, 1972 programs for mathematical analysis and business applications. In contrast, they do not lend themselves to the design of real-time programs for complex automated systems. The high-level languages obscure the relation between instructions and the time required for their execution, and thus can produce a program which later proves to require unacceptably long processing· times .. Further, automated systems must often accommodate large variations in data volume and "noise" content. The use of existing high-level programming languages inherently obscures the core requirements for storing the code and data. This results in inefficient use of memory and time, by a factor as high as three, and is therefore a limiting factor in data handling capacity. In such systems assembly language is often used to insure that the program meets all system requirements, despite the increased labor involved in the detailed coding. For these reasons the design of computer programs for real-time systems is much more difficult than the preparation of programs for batch-type computational tasks. An even more basic difficulty is a serious communication gap between the engineers and the programmers. Engineers prepare the design specifications for the program to fit the characteristics of the data inputs and the rate and accuracy requirements of the processed outputs. In so doing they cannot estimate reliably the complexity of the program that will result. The programmers have little discretion in altering the specifications to accommodate the limitations on computer capacity and processing times. Consequently, the development of a computer for an automated system consequently often results in an oversized and unbalanced product after an inordinate expenditure of effort and time. PRINCIPAL FEATURES The principal features of the technique developed to solve these problems and the objectives listed in the Summary, are as follows: Data flow circuit language* The basis of the technique is the representation of a computer program in" a "language" resembling circuit networks, referred to as Data Flow Circuits. These * The term "D.ata Flow" has been employed earlier but with quite different objectives than those described. in. this work. (W. O. Sutherland, "On-Line. Graphical Specification of Computer Procedures," PhD thesis, Massachusetts Institute of Technology, January 10, 1966). represent the processing to be done in a form directly analogous to diagrams used by engineers to layout electronic circuits. Data Flow circuits correspond to a "universal language" having a form familiar to engineers and at the same time translatable directly into computer code. This representation. focuses attention on the flow of identifiable data inputs through alternative paths or "branches" making up a data processing network. 
The switching of data flow at the branch points of the network is done by signals generated in accordance with required logic. These control signals usually generate "jump" instructions in the computer program. Data Flow circuits are constructed of building blocks, which will be called Data Circuit Elements, each of which represents an operation equivalent to the execution of a set of instructions in a general-purpose computer. These Data Circuit elements are configured by the designer into a two-dimensional network, or Data Flow circuit, which represents the desired data processing, as if he were laying out an electronic circuit using equivalent hardware functional elements. Special circuit elements can also be assembled and defined by the designer for his own use. The direct correspondence. between individual Data Circuit elements and actual computer instructions makes it possible to assess the approximate time for executing each circuit path and the required core. This permits the designer to balance during the initial design of the circuit, the requirements for accuracy and capacity against the program "costs" in terms of core and running time. This capability can be of utmost importance in high-data-rate real-time systems, using limited memory. The Data Flow circuit representation also serves as a particularly lucid form of documenting the final derived computer program. It can be configured into a form especially suited for showing the order in which the program executes each function. A pplication of computer graphics The form of the Data Flow circuits and circuit elements is designed to be conveniently represented in a computer-driven graphics terminal, so as to take advantage of its powerful interactive design capability. In this instance, the Data Flow Circuit is designed on the display by selecting, arranging and connecting elements using a light pen, joystick, keyboard or other graphic aid, in a manner similar to that used in computer design of electronic circuits. As the circuit isbeing designed, the computer display program stores the circuit description in an "element interconnection matrix" and a data "dictionary". Programming Language for Real-Time Systems This description is checked by the program and any inconsistencies in structure are immediately drawn to the designer's attention. Transformation into logical form After the elements and interconnections have been entered into the interactive computer by means of either a graphic or alphanumeric terminal, the computer converts the Data Flow circuit automatically into an Operational Sequence by means of a Transformation program. This orders the operations performed by the circuit elements in the same sequence as they would be serially processed by the computer. Code generation and simulation In this step the computer converts the operational sequence into instructions for the interactive computer. The program logic is then checked out by using sample inputs and examining the outputs. Errors or omissions are immediately called to the attention of the designer so that he can modify the faulty connections or input conditions in: the circuit on-line. The assembly language instructions for the target computer are then generated. Integration and testing The derived program is assembled by the interactive computer with other blocks of the total program and the result is again checked for proper operation. 
Subsequent modifications to the program are made by calling up the circuit to be altered,and making the changes at the display terminal. The above steps provide the Graphical Automatic Programming method for designing, documenting and managing an entire complex computer program through the use of Data Circuit language. The result is highly efficient system software which is expected to be produced at a fraction of the time and cost achievable by present methods. Data circuit elelllents In selecting the "building blocks" to be used as the functional elements of Data Flow circuits, each Data Circuit Element was designed to meet the following criteria: 1. It must be sufficiently basic to have wide application in data processing systems. 2. It must be sufficiently powerful to save the de- 925 signer from excessive detailing of secondary processes. 3. It must have a symbolic form which is simple to represent and meaningful in terms of its characteristic function, but which will not be confused with existing component notation. The choice and definition of the basic GAP (Graphical Automatic Programming) Data Circuit Elements has evolved as a result of applications to practical problems. Seven classes of circuit elements have been defined, as follows: SENSE elements test a particular characteristic of a data input and produce one of two outputs according to whether the result of the test was true or false. OPERATOR elements perform arithmetic or logical operations on a pair of data inputs and produce a data output. COMPARISON elements test the relative magnitude of two or three data inputs and produce two or three outputs according to the result of the test. TRANSFER elements bring data in and out of the circuit from files in memory and from external devices. INTEGRATIN G elements, which are in effect complex operator elements, collect the sum or product of repeated operations on two variables. SWITCHING elements set and read flags, index a series of data words, branch a succession of data signals to a series of alternate branches, and perform other branching functions. ROUTIN G elements combine, split, and gate the flow of data and control signals, and provide the linkage between the program block represented by a given Data Flow Circuit and other program blocks (circuits) constituting the overall program. Some routing elements do not themselves produce program instructions, but rather modify those produced by the functional elements to which they are connected. Table I lists the elements presently defined for initial use in the Graphical Automatic Programming language (GAP). These include four· SENSE elements, eleven . OPERATOR elements, six COMPARISON elements, six TRANSFER elements, fourteen ROUTIN G elements, three SWITCHING elements, and six INTEGRATIN G elements. Others found to meet the basic criteria and be widely applicable will be added to the basic vocabulary. Each designer also may define for his own use special-purpose functions as auxiliary elements, so long as they maintain the basic characteristics, i.e., they accurately show data flow and are directly convertible to machine instructions to permit precise time and core equivalency. Most of these can be built up from combinations of the basic elements. 926 Fall Joint Computer Conference, 1972 Figure 1 illustrates the symbolic representation of a typical circuit element of each of the seven classes. Solid lines are used for data signals and dashed lines for control sign-als. 
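For concreteness, the element classes and terminal conventions can be modelled schematically (a Python sketch; the class names and letter codes follow the text, but the data layout is an illustrative guess rather than the GAP internal representation).

    from dataclasses import dataclass, field
    from enum import Enum

    class ElementClass(Enum):
        SENSE = 1; OPERATOR = 2; COMPARISON = 3; TRANSFER = 4
        INTEGRATING = 5; SWITCHING = 6; ROUTING = 7

    @dataclass
    class Element:
        # One Data Circuit Element: reference number, label (e.g. 'RF'), class, and terminals.
        # Terminal roles use the letter codes of the text: X/Y data in/out, C/J control in/out.
        ref: int
        label: str
        kind: ElementClass
        terminals: dict = field(default_factory=dict)    # terminal number -> (role, connected to)

    read_file = Element(3, 'RF', ElementClass.TRANSFER, {
        1: ('X', 'file base address'), 2: ('X', 'index'),
        3: ('Y', 'word read'), 4: ('J', 'file-exhausted exit'), 5: ('C', 'loop input')})
    print(read_file.kind.name, sorted(read_file.terminals))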
Data inputs are denoted by an X, data outputs by a Y, control inputs by a C and control out- Element Type Name puts by a J. When the input or output may be either control or data the letters I or 0 are used. A U simply means unconnected. In Figure 1 the sample elements are seen to have the following types and numbers of connections: Data Inputs Control Inputs Data Output.s Control Outputs 1 1-2 2 2-3 2 0 1-0 1-0 1-0 2 2 0-2 1 0-3 2-0 0-1 3-0 DATA SPLIT BRANCH ON ZERO ADD BRANCH ON COMPARE READ FILE SET BRANCH SUM MULTIPLY ROUTING SENSE OPERATOR COMPARISON TRANSFER SWITCHING INTEGRATING 0 2 1 1 1 3 0 0 1 0 0-1 Figure I-Data flow circuit elements graphical representation OPERATOR and COMPARISON elements are provided with an optional control input which serves to delay the functioning of the element until the receipt of the control signal from elsewhere in the circuit. The READ FILE and other loop elements have a control input which serves a different purpose, namely to initiate the next cycle of the loop. At present, the maximum number of connections for any element is eight and for SENSE and OPERATOR elements it is four. Connections, or terminals, are numbered clockwise with 1 at 12 o'clock. Data preparation All of the elements described above have either more than one input or more than one output. There are a number of elementary operations which simply alter a data word, thus having a single input and a single output. These operations include masking, shifting, complementing, incrementing and other simple unit processes ordinarily involved in housekeeping manipulations, as for example packing several variables into a single data word or the reverse. TABLE I-Data Flow Circuit Elements COMPARISON SENSE Branch Branch Branch Branch on on on on Zero Plus Minus Constant OPERATOR Add Average Multiply Subtract Divide Exponentiate And Inclus i ve or Exclusive or Minimum Maximum Branch on Compare Branch on Greater Branch on Unequal Correlate Threshold Range Gate TRANSFER Read Word Write Word Read File Write File Function Table Input Data Output Data INTEGRA TING Sum Add Sum Multiply Sum Divide Sum Exponentiate Product Add Product Exponentiate SWITCHING Set Branch Read Branch Index Data ROUTING Linkage Data Passive Split Data Split Control Split Linkage Exit Passive Junction Da ta Junction Control Junction Linkage Store Data Gate Data Pack Linkage Entry Data Loop Control Loop Programming Language for Real-Time Systems ROUTING SENSE ,~ DATA SPLIT TRANSFER .:.?X RF Z -:;- OPERATOR + z· V/J VfJ lJ/X I COMPARISON -~ - + v I J/V BRANCH ON ZERO ADD BRANCH ON COMPARE SWITCHING INTEGRATING SPECIAL z-l U/Jt x ;- - SM v v v READ FILE SET BRANCH SUM MULTIPLY NONE DEFINED Figure 1-Data flow circuit elements graphical representation In the Data Flow Circuit notation, such manipulation is specified by a "prepare" operation preliminary to the operation performed by each element. The manipulations involved in data preparation, which represents a major portion of the "housekeeping" labor in programming, are thereafter accomplished f}.utomatically along with the translation of the functional operations of the elements in the Data Circuit. This type of operation is designed graphically by closed arrowheads at input terminals. 927 3. Write File (WF) 1 to enter the selected hits into the TRK file for retention. 4. A Data Split (DS), a Data Gate (DG), and two Control Junctions (CJ) , distribute the data and cpntrol to the correct element terminals. 
Data inputs to the circuit are provided by a Linkage Data element, (LD), the control input by a Linkage Entry element (LE) , and two control exits by a Linkage Exit element (LX). In the Data Flow circuit in Figure 2, the numbers in parentheses are unique reference numbers for each element and are prefixed with an "R" in the following text. The reference numbers, R, and element labels in parentheses do not actually appear at the graphic terminals but are used in the explanations of the circuit that follows: The circuit is activated by a control signal at Read File, element R3. This element reads out a hit word containing range and amplitude (A). The input at terminal 1 is the base address of the file (HIT) and at terminal 2 is the index (N) for the negative number of hits. The Data Split (R6) distributes the hit word to the Branch on Compare (R4) and to the Data Gate ,(R7). At the Branch on Compare element the amplitude (A) is extracted and used to compare with a threshold (T). If the amplitude is greater than or equal to SAMPLE DATA FLOW CIRCUIT The particular system from which the following example has been drawn concerns real-time processing of radar signals or "hits." This function is normally associated with track-while-scan radar systems. The logic of the example "Hit Sorting Program" illustrated in Figure 2 operates by indexing through a number of hits in ~he HIT file. Each hit whose amplitude is greater than or equal to a specific threshold (T) is placed in the track (TRK) file. When the HIT file is empty or the TRK file is full the program is exited. Three functional and seven nonfunctional elements accomplish this task: HIT (LEI Description of a sample flow data circuit (1) ENT . (01 (LOI N 1>---- ----1> ,----------------I I I I I i ___-Y ~t-~-_-,-----~ (~il r ~, (LOI I I I II I I I II m L---L7------ IDGI (CJI (91 I IL ________________ _ (51 WF 1. Read File (RF), to extract each hit from the HIT file. 2. Branch on Compare (BC), to select hits whose amplitude equals or exceeds the threshold. EXH (LXI (21 TRK (01 (Lot Figure 2-Hit sorting program (01 (Lot J ----£> EXT (LXI (21 928 Fall Joint Computer Conference, 1972 the threshold, control is passed to the Data Gate (R7). The original hit word fi'omthe Read File enters the Write File element (R5). The index at terminal 2 (J) is incremented. If the index indicates that the file is full the output to the circuit exit is selected. But normally the hit is placed in the file by using the base at terminal 4 (TRK) and the index (J). Control is then passed through the Control Junction (R8) to the looping input (terminal 5) of the Read File. If the amplitude is less than the· threshold, control is immediately passed to the looping input to the Read File. The looping input to Read File (R3) causes the index (N) to be incremented. If the index indicates that no more entries or hits are present the output to the circuit exit (Linkage Exit) is selected. Otherwise, the next hit is read out and processed through another cycle of the loop. The quantities HIT, N, T, TRK, and J are outputs from the Linkage Data element (RO). This element does not appear explicitly in the graphical representation, as in the case of the other linkage elements. The Hit Sorting Program is a simple example for the purpose of explaining the techniques employed in GAP. A circuit more representative in size is the Target Coordinate Computation Circuit, shown in Figure Al and is developed in an analogous way in the Appendix. 
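Stated procedurally, the logic of the Hit Sorting circuit reduces to a few lines; the following Python rendering is illustrative only (the hit-word layout and the names are assumptions for the example, not GAP output).

    def sort_hits(hit_file, threshold, trk_capacity):
        # Index through HIT; copy every hit whose amplitude meets the threshold into TRK;
        # exit either when HIT is exhausted or when TRK is full (the two circuit exits).
        trk_file = []
        for rng, amplitude in hit_file:            # Read File loop over the HIT file
            if amplitude >= threshold:             # Branch on Compare against T
                if len(trk_file) >= trk_capacity:  # Write File finds TRK full
                    return trk_file, 'TRK FULL'
                trk_file.append((rng, amplitude))  # Write File stores the hit
        return trk_file, 'HIT EMPTY'               # Read File index exhausted

    hits = [(120, 7), (340, 2), (95, 9), (210, 5)]
    print(sort_hits(hits, threshold=5, trk_capacity=8))
    # -> ([(120, 7), (95, 9), (210, 5)], 'HIT EMPTY')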
[Figure 3-Processing of a GAP circuit: element directory and index, prepare list, data pack list, glossary; generate Operational Sequence (Transformation); generate Execution Sequence; Translation]
Sample processing of a data flow circuit
Once the particular function has been defined in the form of a GAP circuit on a scratch pad, several steps are taken to generate code for the target computer (Figure 3). The circuit is input and checked interactively through a graphics or alphanumeric terminal. The transformation process then converts the two-dimensional circuit representation into a sequential representation of the order in which code for the elements is to be written. The first step in the transformation is a detailed trace through the circuit. This trace produces a tabulation, called the Operational Sequence. By removing all nonfunctional steps and other information not necessary for final coding, this is reduced to an ordered list of elements and connections called the Execution Sequence. Each entry in the Execution Sequence corresponds to a dynamic macro statement. The writing of instructions, Translation, now takes place. The user can select actual target computer code or a simulation of the target computer code. Normally the simulation step is first selected, and later, when the circuit has been found to function properly, assembly code of the target computer is generated. The computer configuration used in this example is an IBM 360/91 operating under MVT. The software is written in the Conversational Programming System (CPS) and is operational under any terminal in the system. The simulation step generates CPS code, and the target computer is a Honeywell DDP-516 whose assembly language is called DAP.
Input
The first step in the process of Graphical Automatic Programming is the input of a Data Flow Circuit into an interactive computer terminal. Unless the circuit is very simple, it is usually first laid out roughly on a scratch pad in order to save terminal time during the initial conceptual stages of circuit design. When an alphanumeric terminal is employed, as in the example described in the succeeding sections, the circuit input consists of entering the element labels (e.g., RF for Read File), the number of terminals, and the Interconnection Matrix. The latter requires the operator to specify only output connections for each element; an interactive program completes the matrix. The output connections are given in terms of the element reference and terminal numbers and type of connection. Table II gives the Interconnection Matrix of the Hit Sorting Program illustrated in Figure 2. Each row of the matrix lists connections to each of the terminals of the given element. The order of the rows is in accordance with an arbitrary but unique reference number assigned
TABLE II-Interconnection Matrix
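One way to picture the Interconnection Matrix and the trace that produces the Operational Sequence is the following Python sketch; the connection entries are an illustrative fragment in the spirit of the Hit Sorting circuit, not a transcription of Table II, and the depth-first trace is only a crude stand-in for the Transformation step.

    # Outgoing connections per element: reference number -> list of (destination ref, destination terminal).
    connections = {
        1: [(3, 5)],              # Linkage Entry starts the Read File
        3: [(6, 1), (2, 1)],      # Read File -> Data Split, or circuit exit when HIT is empty
        6: [(4, 1), (7, 1)],      # Data Split -> Branch on Compare and Data Gate
        4: [(7, 2), (8, 1)],      # Branch on Compare -> gate the hit, or loop back
        7: [(5, 1)],              # Data Gate -> Write File
        5: [(8, 2), (2, 2)],      # Write File -> loop junction, or circuit exit when TRK is full
        8: [(3, 5)],              # Control Junction -> Read File looping input
    }

    def operational_sequence(start, connections):
        # Depth-first trace from the entry element, visiting each element once.
        order, seen, stack = [], set(), [start]
        while stack:
            ref = stack.pop()
            if ref in seen:
                continue
            seen.add(ref)
            order.append(ref)
            stack.extend(dest for dest, _term in reversed(connections.get(ref, [])))
        return order

    print(operational_sequence(1, connections))    # -> [1, 3, 6, 4, 7, 5, 8, 2]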