1964 Fall Joint Computer Conference, Proceedings Volume 26, Part 1
Page Count: 749
CONFERENCE PROCEEDINGS, VOLUME 26
1964 FALL JOINT COMPUTER CONFERENCE

The ideas and opinions expressed herein are solely those of the authors and are not necessarily representative of or endorsed by the 1964 Fall Joint Computer Conference Committee or the American Federation of Information Processing Societies.

Library of Congress Catalog Card Number: 55-44701

Copyright © 1964 by American Federation of Information Processing Societies, P. O. Box 1196, Santa Monica, California. Printed in the United States of America. All rights reserved. This book, or parts thereof, may not be reproduced in any form without permission of the publishers.

Sole Distributors in Great Britain, the British Commonwealth and the Continent of Europe: CLEAVER-HUME PRESS, 10-15 St. Martins Street, London W. C. 2

CONTENTS

Preface

PROGRAMMING TECHNIQUES AND SYSTEMS
CPSS-A Common Programming Support System (D. Boreta)
Error Correction in CORC (D. N. Freeman)
The Compilation of Natural Language Text into Teaching Machine Programs (L. E. Uhr)
Method of Control for Re-Entrant Routines (G. P. Bergin)
XPOP: A Meta-Language Without Metaphysics (M. I. Halpern)

EXPANSION OF FUNCTION MEMORIES
A 10Mc NDRO BIAX Memory of 1024 Word, 48 Bit per Word Capacity (W. I. Pyle, R. M. MacIntyre, T. E. Chavannes)
Associative Memory System Implementation and Characteristics (J. E. McAteer, J. A. Capobianco, R. L. Koppel)
2Mc Magnetic Thin Film Memory (E. E. Bittmann)
A Semi-Permanent Memory Utilizing Correlation Addressing (G. G. Pick)
A 10^6 Bit High Speed Ferrite Memory System-Design and Operation (H. Amemiya, T. R. Mayhew, R. L. Pryor)

NEW COMPUTER ORGANIZATIONS
An Associative Processor (R. G. Ewing, P. M. Davies)
A Hardware Integrated General Purpose Computer/Search Memory (R. G. Gall)
A Bit-Access Computer in a Communication System (E. U. Cohler, H. Rubinstein)
Very High Speed Serial and Serial-Parallel Computers HITAC 5020 and 5020E (K. Murata, K. Nakazawa)
IBM System/360 Engineering (P. Fagg, J. L. Brown, D. T. Doody, J. W. Fairclough, J. Greene, J. A. Hipp)

MANAGEMENT APPLICATIONS OF SIMULATION
UNISIM-A Simulation Program for Communications Networks (L. A. Gimpelson, J. H. Weber)
The Data Processing System Simulator (DPSS) (D. D. Rudie, M. I. Youchah, E. J. Johnson)
The Use of a Job Shop Simulator in the Generation of Production Schedules (D. R. Trilling)

DIGITAL SOFTWARE FOR ANALOG COMPUTATION
HYTRAN-A Software System to Aid the Analog Programmer (W. Ocker, S. Teger)
PACTOLUS-A Digital Analog Simulator Program for the IBM 1620 (R. D. Brennan, H. Sano)
MIDAS-How It Works and How It's Worked (H. E. Petersen, F. J. Sansom, R. T. Harnett, L. M. Warshawsky)

INPUT AND OUTPUT OF GRAPHICS
The RAND Tablet: A Man-Machine Communication Device (M. R. Davis, T. O. Ellis)
A System for Automatic Recognition of Handwritten Words (P. Mermelstein, M. Eden)
A Laboratory for the Study of Graphical Man-Machine Communication (E. L. Jacks)
Operational Software in a Disc-Oriented System (M. P. Cole, P. H. Dorn, C. R. Lewis)
Image Processing Hardware for a Man-Machine Graphical Communication System (B. Hargreaves, J. D. Joyce, G. L. Cole, E. D. Foss, R. G. Gray, R. A. Thorpe, E. M. Sharp, R. J. Sippel, T. M. Spellman)
Input/Output Software Capability for a Man-Machine Communication and Image Processing System (T. R. Allen, J. E. Foote)
A Line Scanning System Controlled from an On-Line Console (F. N. Krull, J. E. Foote)

MASS MEMORY
A Random Access Disk File with Interchangeable Disk Kits (E. C. Simmons)
The Integrated Data Store-A General Purpose Programming System for Random Access Memories (C. W. Bachman, S. B. Williams)
The IBM Hypertape System (B. E. Cunningham)
Design Considerations of a Random Access Information Storage Device Using Magnetic Tape Loops (A. Gabor, J. T. Barany, L. G. Metzger, E. Poumakis)

TIME-SHARING SYSTEMS
The Time-Sharing Monitor System (Hollis A. Kinslow)
JOSS: A Designer's View of an Experimental On-Line Computing System (J. C. Shaw)
Consequent Procedures in Conventional Computers (D. R. Fitzwater, E. J. Schweppe)

COMPUTATIONS IN SPACE PROGRAMS
The Jet Propulsion Laboratory Ephemeris Tape System (E. G. Orozco)
JPTRAJ (The New JPL Trajectory Monitor) (N. S. Newhall)
ACE-S/C Acceptance Checkout Equipment (R. W. Lanzkron)
Saturn V Launch Vehicle Digital Computer and Data Adapter (M. M. Dickinson, J. B. Jackson, G. C. Randa)
The 4102-S Space Track Program (E. G. Garner, J. Oseas)

HYBRID/ANALOG COMPUTATION-METHODS AND TECHNIQUES
A Hybrid Computer for Adaptive Nonlinear Process Identification (B. W. Nutting, R. J. Roy)
The Negative Gradient Method Extended to the Computer Programming of Simultaneous Systems of Differential and Finite Equations (A. I. Talkin)
Quantizing and Sampling Errors in Hybrid Computation (C. R. Walli)

NON-NUMERICAL INFORMATION PROCESSING
Real-Time Recognition of Hand-Drawn Characters (W. Teitelman)
A Computer Program Which "Understands" (B. Raphael)
A Question-Answering System for High School Algebra Word Problems (D. G. Bobrow)
The Unit Preference Strategy in Theorem Proving (L. Wos, D. Carson, G. Robinson)
Comments on Learning and Adaptive Machines for Pattern Classification (C. H. Mays)

HARDWARE DESIGNS AND DESIGN TECHNIQUES
FLODAC-A Pure Fluid Digital Computer (R. S. Gluskin, M. Jacoby, T. D. Reader)
Design Automation Utilizing a Modified Polish Notation (W. K. Orr, J. M. Spitze)
Systematic Design of Cryotron Logic Circuits (C. C. Yang, J. T. Tou)
Binary-Compatible Signed-Digit Arithmetic (A. Avizienis)

HYBRID/ANALOG COMPUTATION-APPLICATIONS AND HARDWARE
A Transfluxor Analog Memory Using Frequency Modulation (W. J. Karplus, J. A. Howard)
The Use of a Portable Analog Computer for Process Identification, Calculation and Control (L. H. Fricke, R. A. Walsh)
Progress of Hybrid Computation at United Aircraft Research Laboratories (G. A. Paquette)
A Strobed Analog Data Digitizer with Paper Tape Output (R. L. Carbrey)
Hybrid Simulation of Lifting Re-Entry Vehicle (A. A. Frederickson, Jr., R. B. Bailey, A. Saint-Paul)

CPSS-A COMMON PROGRAMMING SUPPORT SYSTEM

Dushan Boreta
System Development Corporation, Falls Church, Virginia

INTRODUCTION

Over the years many computer software systems have been developed to serve the program production process. These systems, variously known as "production" systems, "utility" systems, or "support" systems, are designed and produced for the same purpose: to provide programmers the tools required to produce computer programs. Beyond this common purpose these systems have little in common and, in fact, are unique systems individually tailored to a particular application. In each system much of the tailoring occurs because of the particular computer configuration, operational system support requirements, computer manufacturer's software characteristics, experience of the designers, schedule pressures, and style preferences of the programmers producing the system. The tailoring is reflected in the design of each program production system and is evident in many features, for example, the programming languages used, the computer operating procedures, the programmer's inputs, the outputs provided to the programmer, and the program organization in the system.

This paper describes a program production system, CPSS, that should assist programmers and managers in the performance of their tasks. The principal characteristics of CPSS provide programmers an efficient and effective means for producing their programs. For managers, CPSS provides for the minimization of the costs of producing programs, and a relatively inexpensive means for achieving an effective and efficient program production capability.
The CPSS characteristics that make these claims a reality are: first, it provides to programmers the attributes of higher order languages in each program production task; second, both the functions of CPSS and its computer programs largely are transferable; and third, the totality of functions of a comprehensive program production system is provided in CPSS. Further, the design features embodied in CPSS should afford the minimization of its maintenance costs, reduction in the possibility of programmer errors, and simplification of the programming task itself. Additionally, the design of CPSS provides for its "common" applicability. It may be used in "open-" or "closed-shop" operations in supporting the development and production of system, non-system, and "one-shot" programs.

In examining program production systems, most are found to have functional capabilities for generating code, code-checking the object programs, and maintaining magnetic tapes containing programs. In some instances these capabilities are of the most rudimentary sort. In other instances, very sophisticated and complete capabilities exist. Effectively, its design characteristics, language power, scope of applicability, and transferability make CPSS an off-the-shelf program production system.

CPSS is programmed in a subset of the JOVIAL language, and in design is compatible with the full JOVIAL language. Currently, CPSS is implemented on an IBM 7090 and is being used to support the development and production of a computer program system. This installation and continued testing will be the source for refinements to CPSS's design as the system continues under development.

CPSS DESIGN CRITERIA AND REQUIREMENTS

Providing "off-the-shelf" capability is a different type of programming problem than normally is encountered.
The problems in providing CPSS with the "off-the-shelf" capability stem from the class of computers on which it may be installed; the nature of the transferability task; some aspects of the programmer training tasks; the CPSS maintenance task; the programming language it provides; and the scope of its applicability to the operational system development process.

Class of Computers

CPSS is directly applicable to medium- and large-scale computers. The computer configuration should have, but need not be restricted to, a word size of 30 bits; a 32K one-instruction-per-word or a 16K two-instruction-per-word core memory; peripheral storage units consisting of four tape drives (or three tape drives plus drum or disc units); an on-line printing device; an on-line input device; and some external switches or keys. The computer configuration need not be defined explicitly, in that there are many possible trade-offs between the computer's characteristics, the programming conventions and techniques used in CPSS, and the capacity of the system. For example, by altering the labeling convention used in the coding of CPSS, the class of computers could be expanded to include machines with a word size of 24 bits.

The Transferability Task

The transferability of a program production system is important for many reasons. The cost of installing a program production system is minimized. For applications employing a variety of computers, there is a standard system and methodology that contributes to programmer transferability. The difficulties and costs inherent in the transition from one computer to another are reduced. And a benchmark is identifiable from which further technology development may progress.

The goal, transferability of programs, usually is interpreted as requiring a program coded and operating on one computer to be operable on a different computer while still retaining the capability to perform its functions.
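The configuration floor described under Class of Computers lends itself to the kind of parameterization CPSS applies to machine characteristics. The sketch below is a modern illustration of that idea, not the CPSS code; the type names, field names, and the minimum-configuration check are all assumptions made for the example.

```python
# Illustrative sketch: machine characteristics collected in one place,
# in the spirit of CPSS's parameterization of word length, print-line
# width, page length, etc. Names and the check are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class MachineParameters:
    word_size_bits: int
    instructions_per_word: int
    core_words: int
    tape_drives: int
    chars_per_print_line: int
    print_lines_per_page: int

def meets_minimum_configuration(m: MachineParameters) -> bool:
    """Approximates the configuration described in the paper: 30-bit
    words, a 32K one-instruction-per-word or 16K two-instruction-per-word
    core memory, and four tape drives."""
    enough_memory = (
        (m.instructions_per_word == 1 and m.core_words >= 32 * 1024)
        or (m.instructions_per_word == 2 and m.core_words >= 16 * 1024)
    )
    return m.word_size_bits >= 30 and enough_memory and m.tape_drives >= 4

# The IBM 7090 (36-bit words, 32K words of core) comfortably qualifies.
ibm7090 = MachineParameters(
    word_size_bits=36, instructions_per_word=1, core_words=32 * 1024,
    tape_drives=8, chars_per_print_line=120, print_lines_per_page=55,
)
print(meets_minimum_configuration(ibm7090))
```

Concentrating such constants in one table is what lets a convention change (for example, the altered labeling convention that admits 24-bit machines) stay a local edit rather than a system-wide one.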
The transfer should be completed at least semi-automatically, utilizing clerical or junior personnel and fixed procedures. The current state of the art does not afford 100 per cent transferability. Therefore, we have interpreted this goal to mean that CPSS is to be transferable with only a minimum of known code change. In order that CPSS be transferable, the functions and services provided by CPSS also must be transferable. Additionally, the CPSS documentation, program and system tests, operating procedures, transferability techniques, and transfer procedures are designed to be transferable.

It must be emphasized that the transferability task being discussed is that of getting CPSS to run on another computer, different from its current application (the IBM 7090). CPSS is designed to be transferred as a system. Although it is modular, the transferability of any module is a distinctly different task from that of transferring the whole of CPSS.

One natural design feature of a transferable system is its independence from machine characteristics. It must be noted that machine independence is a two-way street: not only is the code of CPSS to be machine-independent, but the functions performed by the code also must be machine-independent. For example, programs making transfers to and from storage should not assume some given unit availability, transfer rate, segmentation of the transferred data, unit positioning, or even that they are the only users of a unit. In CPSS, this example of machine independence (transferability) is provided by a central I/O program in the CPSS Computer Operations Subsystem (the design of which is discussed later in this paper).

It will be difficult to measure how transferable CPSS is until it has been transferred across several computers.
The system's transferability could be measured in several dimensions, for example: in time elapsed from start to installation; in dollar costs for each economic factor involved in the transfer; in amounts and types of computer time required; in the amount of code to be altered per program and per function; and in the number of errors discovered in each phase, including installation and post-installation. Detailed records should be maintained that identify the transfer costs and the factors that influenced the cost. Some of these factors are: the differences in machine instruction word format and addressing, the types and quality of programs available on the "new" machine, the frequency of occurrence and amount of down-time per occurrence of machine failure, and the location and availability of the staging and target computers.

Some of the principal technical problems that will arise in transferring CPSS to other computers lie in the sophistication of the design embodied in the system, the power of the language provided by the system, the systematization of the CPSS design, and the broad class of computers to which CPSS may be applied. For example, consider the problem of designing CPSS so that it will operate on a four-tape-drive computer configuration. The task of compiling a JOVIAL program can use (1) an input device for the source program, (2) an input device for the library tape, (3) an input device for the system Compool, (4) a temporary storage device for the intermediate language, (5) a permanent storage device for the object program, (6) an output device for the listings, and (7) an output device to communicate with the computer operator (as will be noted later, a compilation may require other additional "storage devices"). Further complicate the task by allowing the programmer to generate a test case and operate his program on the test case, and by allowing all this to occur in an uninterrupted single job.
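One way to picture how the seven logical device roles above can share four physical drives is a small mapping layer of the kind a central I/O program provides. Everything in this sketch, the class name, the method names, the release discipline, is an illustrative assumption, not the CPSS design.

```python
# Illustrative sketch (not the CPSS code): a central I/O layer maps
# logical device roles onto a small pool of physical units, so no
# program assumes a particular drive or that it owns a unit.
ROLES = [
    "source_input", "library_tape", "system_compool",   # compilation inputs
    "intermediate_storage", "object_program_storage",   # working storage
    "listing_output", "operator_output",                # outputs
]

class CentralIO:
    def __init__(self, physical_units):
        self.free = list(physical_units)
        self.assigned = {}

    def open_role(self, role):
        # Reuse an existing assignment; otherwise claim a free unit.
        if role not in self.assigned:
            if not self.free:
                raise RuntimeError("no physical unit available for " + role)
            self.assigned[role] = self.free.pop(0)
        return self.assigned[role]

    def release_role(self, role):
        self.free.append(self.assigned.pop(role))

io = CentralIO(["tape1", "tape2", "tape3", "tape4"])
# Input roles are read and released before storage roles are needed,
# which is how seven roles can cycle through four drives.
for role in ["source_input", "library_tape", "system_compool"]:
    io.open_role(role)
io.release_role("library_tape")
io.release_role("system_compool")
print(io.open_role("intermediate_storage"))
```

The point of the indirection is the one the paper makes: callers name a function ("the library tape"), never a physical unit, so the binding can change per installation.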
This problem is resolved in the design of the CPSS executive and I/O functions (discussed later in the paper). In essence, the I/O problem was resolved by constructing a central I/O program that provides machine-independent I/O functions for the remainder of CPSS. The control problem was resolved by allowing programmers the freedom of directing CPSS via control card inputs (functionally oriented to the program production tasks).

Not all the tasks related to transferability are involved in subsequent transfers of CPSS. Consider the input card: once we have coded routines to accept floating-field cards and have levied no special format requirements on the card, the card input processing functions are totally independent of the machine. The information processed from card inputs in one application need not appear on cards in another application, but could be processed from other input media in a different input form, e.g., punched paper tape or teletypewriter.

The methods employed in achieving transferability, or machine independence, vary depending on the function being performed in the program. Some of the more commonly applied techniques were: the parameterization of certain machine characteristics (word length, number of characters per print line, number of print lines per page, etc.); the establishment of programming conventions regarding the use of constants and tags; the use of floating-field card formats; and the use of "all-core" indexing to relocate data and to compute addresses. In many instances, special methods were required to achieve transferability. Some of these are discussed later in the paper in the discussions of the various CPSS subsystems.

Programmer Training

One of the principal benefits achieved by employing a higher order language and requiring transferability in CPSS is the potential reduction of programmer training costs. When a programmer is transferred from one application to another, a training or learning period is required to familiarize him with the particular computer and the program production system he will use. This retraining period varies from a week to a month and a half or more. During this period a programmer's effectiveness is almost nil; and thereafter, it is less than it should be until the programmer becomes expert in the use of the "new" computer and system. CPSS should afford a reduction of training and retraining costs by permitting programmers to code and test their programs in a higher
When a programmer is transferred from one application to another, a training or learning period is required to familiarize him with the particular computer and the program production system he will use. This retraining period varies from a week to a month-and-a-half or more. During this period a programmer's effectiveness is almost nil; and thereafter, it is less than it should be until the programmer becomes expert in the use of the "new" computer and system. CPSS should afford a reduction of training and retraining costs by permitting programmers to code and test their programs in a higher 4 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964 order language. If CPSS achieves a broad patronage, these costs are further reducible since CPSS is designed to reflect a stable form regarding its interfaces with the programmer. In effect, CPSS could become a means for achieving some level of programmer transferability. CPSS Maintenance In producing CPSS, a primary concern has been maintenance costs. These costs are related to error correction, program improvement, augmentation of the system's capability, program and system documentation, and product release. The design of CPSS provides for the minimi~a tion of such costs by isolating, identifying and documenting the program- and system-type functions that comprise CPSS. The system and program documentation are designed to facilitate the maintenance task. A CPSS program's documentation consists of a heavily commented program manuscript, a detailed flow chart, a functionally organized flow chart, a program description document containing descriptions of data referenced, each routine coded, each procedure used, the input data formats and structures, the output tables, items and messages, and the function served by the program. Another document describes each machine dependency contained in the program. Also, a system design document describes each program function and the interfaces between programs. 
CPSS is designed and documented to facilitate the maintenance task. Also, the system is capable of maintaining itself, or of producing itself.

The Programming Language

Perhaps the most significant decision made in the CPSS project was the selection of a language for the programming of CPSS. The design of each function contained in a program production system is influenced by the language provided by the system. Therefore, certain design features are required to assure that the program production system is capable of responding to the operational system's programming needs. In a sense, a transferable program production system must be "overdesigned": the design must reflect the current capability of the language being provided, and also must provide for logical extensions of the language.

Consider the situation that exists with CPSS. There are three levels of JOVIAL represented in CPSS, which form a hierarchy of languages that is upward compatible in language power and in the language processing algorithms. The formal JOVIAL, J-3, is subset into two levels: J-S, a subset of J-3; and J-X, a subset of J-S. The program generation subsystem is coded in J-X and processes programs that are written in J-S. All other subsystems are coded in J-S and perform their functions compatibly with J-3.

The range of decision as to which level of language capability is to be provided in the program production system is bounded on the upper end by the formal definition of the programming language, and on the lower end by the language capability provided by the program generation subsystem. In the program generation subsystem, the language capability to be provided is influenced by such factors as the subsequent use of the language, the design of the compiler, the level of transferability desired, and the expected characteristics of existing languages and compilers having the same generic name.
Other factors influencing the decision in CPSS were the transferring procedures and techniques, the testing techniques established to test the system, and the availability of computers with JOVIAL compilers.

CPSS-Scope of Applicability

CPSS serves programmers and managers in their performance of several tasks related to the system development process. Figure 1 shows a simplified representation of the system development process that has been employed for several systems, both large and small. The scope of CPSS is indicated by its applicability to the program production process, which encompasses parts of the program design, program generation, program test, and assembly test stages.

Figure 1. Program Production Process in the System Development Process

Figure 2 is a simplified representation of the program production process as served by program production systems. The programs of the production system are designed to assist programmers and managers in their performance of these four tasks: program generation, program test, system generation, and assembly test.
The principal product derived from the program production process is the operational program system master tape. Other products are a system data dictionary (referred to as a Compool) with its documentation and listings; the program system documents, listings, test plans, and test results; and the programs that comprise the program system, with their documents, listings, test plans, and test results.

Figure 2. Program Production Process

Figure 3 depicts the information and data flow provided for in CPSS. The flow of data between the various functions is automatic. The execution of the functions is controlled by the programmer. The four program production tasks, program generation, program test, system generation, and assembly test, are served by this data and information flow. The preceding figures, Figure 1, Figure 2, and Figure 3, depict the scope of applicability for CPSS in the system development process.

Figure 3. Program Production Process, Information and Data Flow

A major task, related to the system generation task, is the acquisition and management of a data base. This paper will not delve into the data base tasks except where such tasks directly interface with the program production system.

Program Generation. The programmer, employing the JOVIAL language, encodes a program to satisfy the program design specifications. The code, the symbolic programming language statements (the source program), is input to the compiler, which translates the code into machine instructions. During the compilation, the source program is appropriately augmented by routines from the procedure library tape and by system data descriptions from the Compool.

The principal output from the compiler is a binary program (object program). The remainder of the outputs provide information to the programmer (and to other parts of CPSS) that facilitates the testing and correction of the program. The process of compilation in the early phases of program coding sometimes is referred to as "grammar-checking", where the result of the grammar-checking is a "good" program; that is, one that is syntactically correct.

Program Test. In order to test a program, that is, validate that the program performs its functions correctly, the programmer must define a test case. A test case is comprised of a simulated data environment for the program, recording controls to retrieve data from the program's environment, and any program modifications required to correct the program as shown by previous tests. The test case is input to CPSS, which translates the programmer's inputs into a test environment. When requested, CPSS loads the test environment and the object program into the computer for operation. During the operation of the test, data is recorded as requested by the programmer in the test case. After the operation of the test, the recorded data is processed to provide, as outputs, the hard-copy test results. CPSS appropriately interprets data descriptions from the program Compool or the system Compool to translate the test case inputs and process the recorded data. Essentially, the system Compool and the program Compool are the significant means through which CPSS affords the programmer the ability to test a program at a language level comparable to a higher order programming language. This loop, the program test phase, is repeated for as many test cases as are required to establish that the program performs its functions correctly.

The two tasks discussed so far, program generation and program test, are common to all program production processes, whether the programs are system programs or independent programs.
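The program test loop just described, a test case bundling a simulated data environment with recording controls, applied repeatedly until the results satisfy the programmer, can be sketched abstractly. The structure follows the paper; every name below is invented for the illustration.

```python
# Illustrative sketch of the program-test loop: a test case carries a
# simulated data environment and recording controls; the harness loads
# the environment, operates the program, and returns only the data the
# programmer asked to have recorded ("hard copy" results).
def run_test_case(program, environment, record_keys):
    state = dict(environment)                   # load the simulated environment
    program(state)                              # operate the object program
    return {k: state[k] for k in record_keys}   # apply recording controls

def tracker(state):
    # Toy "object program" under test: smooths a range reading.
    state["smoothed_range"] = (state["range"] + state["last_range"]) / 2

test_case = {
    "environment": {"range": 110.0, "last_range": 90.0},
    "record": ["smoothed_range"],
}
results = run_test_case(tracker, test_case["environment"], test_case["record"])
print(results)
```

The program never knows its inputs were simulated, which is the property that lets the same object program later run against live data unchanged.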
In this light, the applicability of CPSS is extended to include both system and non-system programming tasks.

System Generation. One of the principal tasks in building a system is to define the system data dictionary, more commonly known as a Compool. Essentially, the Compool is the means for defining the data that comprises a system's data base. A Compool can be thought of as a central repository of data descriptions used both by programmers and by programs. Usually, a Compool exists in two forms. The first is a document containing descriptions of the system's data environment, data structures, and data organizations, some commentary related to the reasons why the data exists in the system, and a description of the usage of the data. The second form is a binary tape containing information describing data structures and data organizations. The binary Compool, in some cases (including CPSS), contains information that is usable in constructing the Compool document.

The program production system itself is a principal user of the binary Compool, in that it retrieves data descriptions from the Compool during the various phases of the program production process. The Compool, being a central collecting point for system data descriptions, serves as an integrating device in the program production process. In this manner the Compool provides to program system managers a means for controlling a system's data environment.

CPSS provides both for the building of the binary Compool and for the Compool documentation. A programmer, employing the appropriate data descriptors, encodes data description statements that describe the data comprising the data base. These statements are interpreted by the Compool generator, which produces a binary Compool. Other outputs provided by CPSS are quality analysis aids and data description listings.

The tape file maintenance function provides the means for building tapes containing programs.
Further, the function provides for modifying, correcting, cataloguing, and in general maintaining computer tapes.

Assembly Test. The assembly test task, functionally, is similar to the program test task. The purpose of the assembly test task is to provide a means for testing a complex of programs that form a system or a logical subset of a system. In other words, the purpose served in assembly testing is to validate that a complex of programs acting in concert performs a system function correctly. Assembly testing can be thought of as a hierarchy of testing, ranging from simple program interface tests to complex full system tests.

Figure 3 depicts an assembly test as being performed in a controlled environment. The system control parameters, initializing data, simulated inputs, and recording controls are prepared as a test case via an assembly test system. The test case is run against the appropriate complex of programs, during which recording is performed. After operation on the test case, the recorded data is processed via a data reduction and test analysis subsystem, which provides the hard-copy test results. This loop, assembly testing, is performed for as many test cases and test levels as are required to validate that the program system performs its functions correctly.

Although CPSS is not designed explicitly to serve the assembly test task, it does contain programs that are usable in an assembly test system; for example, the data recording, data reduction, and data generation programs. With very minor modifications to the test environment load and data reduction programs, CPSS further could be used to provide a very sophisticated string test (program interface test) capability.
The reason for not explicitly providing an assembly test capability in CPSS is that the higher levels of assembly testing usually require programs that reflect the design of the operational program system (such as height reply message simulators and radar correlation analysis programs).

CPSS PROGRAM DESIGN

One of the principal design characteristics of CPSS is the functional modularity embodied both in CPSS and in its programs. CPSS has been separated logically into subsystems corresponding, in general, to the common program production functions: program generation, data environment simulation, data recording, data reduction, test environment load, computer operation, Compool generation, and tape file maintenance. These subsystems are comprised of programs which further are partitioned into functional subroutines. An attempt was made to isolate each system function and each program function into an identifiable subpart of CPSS. Some of the common program-type functions have been programmed as JOVIAL procedures and loaded onto the CPSS procedure library tape. Additionally, the CPSS programs, tables, and items, and the Compool itself, are defined in the Compool. Thus, CPSS is an integrated system constructed of modules, each of which is a program-type or system-type function, organized as 22 major programs, 35 library procedures, 10 common executive entries, 25 system tables with 330 items, and 53 parameter items. The size of CPSS is approximately 20,000 JOVIAL statements, which result in approximately 65,000 IBM 7090 machine instructions.

Program Generation Subsystem

The program generation function is provided in CPSS by the JOVIAL language and a JOVIAL compiler. With the development of CPSS, a powerful and comprehensive subset of the JOVIAL language was developed that should be sufficient to produce most computer software systems. This subset, the JOVIAL core-subset language, J-S, is the language employed in the programming of CPSS. The power of J-S is demonstrated by the fact that the programming of CPSS did not require the totality of J-S.
This subset, the JOVIAL core-subset language, J-S, is the language employed in the programming of CPSS. The power of J-S is demonstrated by the fact that the programming of CPSS did not require the totality of J-S. The principal reasons for developing J-S, and the goals achieved by this development, were: (1) The definition of a "comprehensive minimum" JOVIAL language that is sufficient for producing most computer program systems. (2) The definition of a JOVIAL subset language that affords the production of transferable programs. (3) The design, development, and production of a JOVIAL compiler that can be produced on shorter schedules than more comprehensive J-3 compilers. (4) The improvement of the language processing speed of a JOVIAL compiler. (5) The retention of the significant language and compiler features normally expected of JOVIAL, for example: Compool sensitivity, procedure library capability, partitioning of programs into procedures and closed routines, memory allocation, packing of items into tables, processing of packed data, grammar checking, subscripting and indexing, bit and byte addressing, machine assembly language coding, logical and arithmetic operations, and program "debug" listings and aids. In general, the differences between the J-S and J-3 languages should be more than offset by the improvements in the compiler design and its compatibility with CPSS. It should be noted that J-S is a proper subset of JOVIAL, i.e., the programs coded in J-S are legal and valid inputs to J-3. Some of the significant design features of the J-S compiler are: (a) The J-S compiler is a "two-pass" compiler. That is, a program is processed twice to produce a binary output: first, in the JOVIAL language form; and second, in an intermediate language form. The principal result of having only two passes is that compiling speed has been significantly increased. (b) The J-S compiler provides an "alter mode" of recompilation.
That is, the programmer can add modifications to the source program during compilation without altering the original source program. The compiler will produce an updated version of the source program as one of its selectable options. (c) The J-S compiler produces a program Compool. That is, the J-S compiler produces a Compool containing complete data descriptions of all data and labels referenced or declared by the program. The program Compool is usable interchangeably with the system Compool throughout CPSS and is compatible in form and structure with the system Compool. (d) The J-S compiler is capable of being expanded to incorporate additional language capability. The practical limitation on this expandability is the size of core memory. Additionally, the programmer can select the outputs he wants, override the Compool, specify the Compool he wants used, and in general, exercise those options that specifically control the inputs to and the outputs from the compiler. In general, the CPSS program generation subsystem provides the language power, compiler speed, and flexibility of use that affords a programmer the ability to generate almost any program conceivable. Data Environment Simulation Subsystem The data environment simulation function is provided in CPSS by a computer program that processes data assignment statements. The program produces data records containing the programmer specified data, and control information that is used by the test environment load subsystem. The programmer specifies his data environment requirements in a POL-type language. The program employs either a program Compool or a system Compool as selected by the programmer. The programmer also may identify the data being produced, thereby affording future selective use of the data. The program is compatible with the JOVIAL J-3 data forms and data structures.
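The data environment simulator's job (accept symbolic data assignment statements, check them against the selected Compool, and emit data records for the loader) can be illustrated with a small sketch. Python stands in for JOVIAL here, and the statement syntax, item names, and type rules are invented for illustration; the paper does not give the actual POL-type notation:

```python
# Hypothetical sketch of a data environment simulator: process simple
# data assignment statements against a Compool and emit (name, value)
# data records, rejecting names the Compool does not define.
COMPOOL = {"ALTITUDE": "floating-point", "TRACKNO": "integer"}

def simulate(statements):
    """Return (data records, diagnostics) for a list of statements
    of the invented form "NAME = value $"."""
    records, errors = [], []
    for stmt in statements:
        name, _, value = stmt.rstrip(" $").partition("=")
        name = name.strip()
        if name not in COMPOOL:                    # legality check
            errors.append(f"{name}: not defined in the Compool")
        elif COMPOOL[name] == "integer":
            records.append((name, int(value)))
        else:
            records.append((name, float(value)))
    return records, errors
```

The returned records model the data records handed to the test environment load subsystem; the diagnostics model the legality-check printouts.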
The data itself can be coded as floating-point, fixed-point, integer, Hollerith, Standard Transmission Code, and status-variable, where such data is organized and structured as tables, items, strings, or arrays. Subscripting and indexing also are allowed. The program allows the programmer to set values into any variable defined or referenced by his program if the program Compool is used. If a system Compool is used, the programmer may set values into any variable defined in the system Compool. The program imposes no limits on the volume of data it processes. It does perform legality checks on the programmer's inputs to determine the compatibility of the inputs with the defined data environment. Data Recording Subsystem The data recording function in CPSS is separated into two parts, where the actual data recording is provided under the CPSS computer operation subsystem. The CPSS data recording preparation function is performed by a computer program that processes recording request statements. The program produces a data record containing control information for use by the computer operation subsystem, the test environment load subsystem, and the data reduction subsystem. The programmer describes his recording requests in a fully symbolic language. The CPSS program employs either a program Compool or a system Compool as selected by the programmer. The programmer also may identify the recording that would be performed per his requests, thereby affording future selective use of the recording controls produced by the CPSS program. The programmer may select the data to be recorded under any name defined in the program Compool or system Compool and/or any block of memory. The location in his program at which recording is to take place can be specified symbolically (if a program Compool is selected). The programmer may request a memory register change survey, or dumps before, during, and/or after his program's operation.
The dumps may be formatted as octal, machine language instructions, floating-point, and/or alphameric. Data Reduction Subsystem The data reduction function is provided in CPSS by a subsystem of computer programs that process either CPSS recorded data or miscellaneous data formats. There are four general classes of printouts produced by CPSS: Compool-defined data, memory dumps, survey dumps, and tape dumps. Compool-defined data is processed and appropriately formatted entirely dependent upon the Compool definition. CPSS interrogates either the system Compool or program Compool to determine the appropriate formatting. The information in a printout reflects the page number, table name, recording identity, recording location, table size, entry number, data name, data type, the converted data, and a security classification. Memory dump processing is performed in any of four formats: octal, machine language instructions, floating-point, and/or alphameric. A printout contains the page number, security classification, recording location, recording identity, the beginning and ending locations of the dump, the contents of the addressable machine registers, and the contents of the computer words dumped. The page formatting is determined by the program and is printed as four or eight words per print line. The survey dump processing is similar to the memory dump processing. The significant difference is that only those memory locations which contain changed values are printed. The dumps to be compared are made by the CPSS recording program before and after the operation of a program in a test. A printout contains the page number, security classification, the recording identity, the beginning and ending locations of the survey area, the contents of the addressable machine registers in the "before" and "after" states, and a print-line for each changed computer word containing the address and both values.
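The survey-dump comparison amounts to taking a "before" and an "after" snapshot of the surveyed core area and printing one line per changed word. A minimal modern sketch in Python, with assumed field widths rather than the actual CPSS print format:

```python
def survey_dump(before, after):
    """Return one print-line per changed word: address, "before" value,
    "after" value. before/after are equal-length lists of non-negative
    integers modeling two snapshots of the same surveyed core area."""
    lines = []
    for addr, (b, a) in enumerate(zip(before, after)):
        if b != a:  # only changed locations are printed
            lines.append(f"{addr:05o}  {b:012o}  {a:012o}")
    return lines
```

Twelve octal digits are used here because a 36-bit IBM 7090 word prints as twelve octal digits; the five-digit address field is likewise an assumption.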
In the tape dump function CPSS provides the means to print out any tape. The format of the printout is similar to the memory dump format, except that the addressable machine register print lines are not output. The programmer can request that his data reduction be performed in any of three modes: recording, general, or binary. The recording mode is used to process data recorded by CPSS. The programmer can select a subset of the data to be processed, or he can allow CPSS to process the data automatically per his recording requests. Processing selection is performed in exactly the same manner as specifying recording requests. The binary mode is used to process tape dumps. The programmer can specify the "limits of processing" and the print formats (octal, floating-point, etc.). The limits of processing bound the range of tape that is to be processed. The general mode is used as a mixture of the recording and binary modes. All the controls available to the programmer under these two modes are available under the general mode. Further, if a tape containing records similar to CPSS-type records is to be processed, the general mode may be used to reduce the data per Compool definitions even though the tape was not built by the CPSS recording program. Test Environment Load Subsystem The test environment load function is provided in CPSS by a computer program that loads a test case into the computer for operation. The CPSS test environment load subsystem provides for loading recording patches to a program; loading a data environment; loading octal correctors to a program; and loading the program that is to be tested (currently, on the IBM 7090, CPSS is capable of loading and operating a 25,000-register test case). Computer Operation Subsystem The computer operation function is performed in CPSS by a subsystem of programs that provide for the uninterrupted operation of the computer.
The functions performed by the computer operation subsystem can be grouped into four classes: system control, operator communication, test control, and I/O monitor. System Control. The system control function provides for the continuous operation of CPSS. It interrogates programmer-supplied control "cards" to determine which function (or subsystem) is required. The system operates on stacks of jobs (usually prestored on tape in the sequence desired) where each job may be comprised of many dissimilar requests. For example, a job may compile a program, specify several sets of recording controls, specify several data environments, load and execute the compiled program in various test environments, and process the data recorded in the several program runs. The sequence of CPSS's operation is specified by the ordering of the programmer-supplied control cards. In addition to the sequence control function, system control provides the normal control-type functions such as positioning tapes, clearing core, job error recovery, loading of octals to cycling system programs, etc. Essentially, the system provides uninterrupted operation as long as there are jobs to be processed, and a test program does not loop or write into "permanent" core. Controls are provided through which the computer operator (or the programmer in one special case) may interrupt the computer's operation. Operator Communication. The operator communication function provides three methods for interrupting the system's operation: (1) at each I/O operation, (2) between control cards, and (3) on recovery from test program loops, halts, or other errors.
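The system control loop described above (interrogate each control card in order, dispatch to the requested subsystem, bracket work between 'JOB and 'ENDJOB, and recover by skipping to the next job on error) can be sketched as follows. Python is used for illustration; the card parsing and handler interface are assumptions, not the CPSS internals:

```python
# Hypothetical sketch of the system control function: interrogate
# programmer-supplied control "cards" in sequence and dispatch each
# to the subsystem named on the card.
def run_job_stack(cards, dispatch):
    """cards: strings such as "'COMPILE $"; dispatch: card name -> handler."""
    log = []
    in_job = False
    for card in cards:
        name = card.lstrip("'").split(",")[0].split("$")[0].strip()
        if name == "JOB":            # begin bracket for a job deck
            in_job = True
            log.append("begin job")
        elif name == "ENDJOB":       # end bracket for a job deck
            in_job = False
            log.append("end job")
        elif in_job and name in dispatch:
            log.append(dispatch[name](card))
        else:                        # crude model of job error recovery
            log.append(f"error: unrecognized card {name!r}, skipping to next job")
            in_job = False
    return log
```

The error branch models the "job error recovery" behavior only loosely: an unrecognizable card abandons the current job, and processing resumes at the next 'JOB bracket.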
When the operator has completed his tasks, he may recover the system's operation, as appropriate, in any of five ways: (1) skip forward in the job to the next control "card", (2) skip forward to the next job, (3) skip forward to a specified job, (4) reinitialize the system, or (5) continue from the point of interruption. Recovery may be performed automatically by CPSS as a result of the operator's request, or the operator may manually enter the system as he desires. While the system is interrupted, the operator may reassign I/O units, list the I/O unit/file allocation, take dumps, position to a particular job, or perform other similar tasks. The operator may perform his tasks in response to programmer-supplied instructions (as printed by CPSS or otherwise), messages printed by CPSS relating to the system's needs, or recognizable error conditions requiring his actions. CPSS provides a method for programmers to "simulate" certain computer operator actions. They may specify I/O allocations, list the I/O unit/file allocations, or perform other operator-type tasks. By the judicious use of control cards the programmer may "directly" communicate with the computer operator to effect the job desired. Program Test Control. The program test control provides for the operation of a test case loaded by the test environment load subsystem. Also, the program test control function executes the recording program for dumps and surveys, and loads system recovery type program modifications to the object program. The interfacing between the test environment load subsystem and other computer operation functions provides programmers almost complete flexibility in the running of tests. CPSS allows any one of its functions to be run independently or in any sequence.
Some of the types of computer runs a programmer might make are:
(1) Compile only
(2) Load and execute his program
(3) Compile, load, and execute his program
(4) Generate test data
(5) Specify recording requirements
(6) Load his program, recording controls, and test data, and execute his program
(7) Reload data and re-execute his program
(8) Load a different program, its recording controls, and execute the "new" program on the "old" data environment
(9) Re-execute his program
Effectively, CPSS imposes no operating restrictions on the programmer in the generation or testing of a program. In this manner the programmer is able to selectively test subparts of his program, his whole program, or strings of programs, all as one job or independent jobs. I/O Monitor. The principal function performed by the CPSS central I/O program is to provide machine-independent I/O operations for other programs. The I/O program performs all the I/O operations required by the programs comprising CPSS. The program provides a comprehensive set of I/O operators: Read, Read-search, Write, Position, Position-search, Close, Wait (for a specific file), Wait (on all files), Repeat (the preceding request), Rewind (initialize the file), and File-status (feed back the current status of the file). Additionally, the program provides some elementary data conversion and manipulation functions in conjunction with requested I/O transfers, i.e., transfer to or from packed or unpacked BCD data images, convert data to or from "standard transmission code", convert data to or from BCD, and/or combinations thereof. Also, the program will transfer data to or from specific locations or standard locations. The program will either wait for a transfer to be completed or return immediately as requested. A program requests I/O operations by setting items in a CPSS communication table and transferring control to the I/O program.
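This communication-table convention (the requesting program fills in items, transfers control to the I/O program, and reads status information back from the same table) can be sketched as below. The table items, toy file model, and returned status values are simplifications; only the operation names come from the text:

```python
# Hypothetical sketch of the central I/O program's interface: a shared
# communication table through which requests go in and status comes back.
COMM = {}                                # the communication table
FILES = {"INPUT": ["rec1", "rec2"]}      # toy device-independent files

def io_request(file_name, operation):
    """Set items in the communication table, "transfer control" to the
    I/O program, and return after it posts status information back."""
    COMM.update(file=file_name, op=operation)
    recs = FILES.get(file_name)
    if recs is None:
        COMM.update(status="error", data=None)
    elif operation == "Read":
        COMM.update(status="ok", data=recs.pop(0) if recs else "EOF")
    elif operation == "File-status":
        COMM.update(status="ok", data=len(recs))   # records remaining
    else:                                # other operators omitted here
        COMM.update(status="error", data=None)
    return COMM["status"], COMM["data"]
```

Only Read and File-status of the eleven listed operators are modeled; the point is the round trip through one shared table rather than a call-and-return argument list.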
These items specify the name of the file on which the operation is requested, the operation to be performed, a wait or no-wait condition, and other information related to the operation such as data conversion and manipulation, location of the data to be transferred, amount of data to be transferred, etc. Upon completing the operation, the I/O program automatically enters information into the communication table relating to the requested operation and returns control to the requesting program. This information is usable to determine the status of the file, file addressing, status of the requested operation, amount of data transferred, etc. In essence the I/O program determines the appropriate device, record fragmentation (or accumulation), labeling, unit positioning, and other functions to effect a transfer of data to or from memory via an I/O device. The program monitors each transfer to determine the validity of the transfer and takes whatever corrective action is appropriate. The manner in which these functions are performed provides CPSS programs their independence from a machine's I/O and yet allows the referencing programs to perform "efficient" I/O. Compool Generation Subsystem The Compool generation function is provided in CPSS by a subsystem of computer programs that build and interpret a Compool. The CPSS Compool subsystem provides for a comprehensive definition of data. The inputs to the Compool assembler contain the normal type of data definitions, and a variety of supplementary data descriptive information (see Appendix B). Further, the Compool assembler provides for assigning data addresses symbolically, and allocates core memory for data or program storage. Effectively, the Compool assembler program provides the ability to define data for system applications, normal utility type needs, and for the programmer's information needs.
It facilitates the data description task by accepting a fully symbolic input. In that the contents of a Compool usually are operational-system dependent, the Compool assembler program provides for the definition of the Compool's content. The program interrogates a series of legality matrices to determine the acceptability of data, the validity of the input, and the completeness of the data definition. In this manner the Compool constructed by CPSS can be tailored to the operational system's needs. Also, the Compool subsystem contains a program whose function is to retrieve the information contained in the Compool. This retrieval program provides two levels of information: first, that information which is required for the normal utility type functions, and second, all the information contained in the Compool. In the first case the data is printed in alphabetical order, and in the second case it is alphabetized by data class. The third function provided by the CPSS Compool subsystem is a quality analysis of the information contained in the Compool. The program performs a tag analysis function that checks for duplicated tags and ambiguous cross-reference tags. The program also determines the validity of the memory allocation by checking for violations of reserved areas, and overlapped allocations of data. In addition the program performs a capacity analysis by checking for unallocated addresses, and by tallying each data occurrence by data type and amount of memory required. Essentially, CPSS provides a Compool that is tailored to a system by its content, and the tools needed to build, interrogate, and provide quality control on a Compool. Tape File Maintenance Subsystem The tape file maintenance function is provided in CPSS by a computer program that performs those functions necessary to maintain tapes produced by or for CPSS. Further, the CPSS program is capable of performing the same set of functions on almost any tape regardless of format or structure.
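The tag analysis and memory-allocation validity checks of the Compool quality analysis lend themselves to a compact sketch. Assuming, for illustration only, that each Compool entry reduces to a (tag, base address, length) triple:

```python
def audit_compool(entries):
    """entries: list of (tag, base_address, length) triples.
    Returns diagnostics for duplicated tags and for overlapped
    memory allocations (reserved-area checks are omitted here)."""
    problems = []
    seen = set()
    for tag, _, _ in entries:            # tag analysis: duplicates
        if tag in seen:
            problems.append(f"duplicated tag: {tag}")
        seen.add(tag)
    # allocation analysis: sort by base address, compare neighbors
    spans = sorted((base, base + length, tag) for tag, base, length in entries)
    for (lo1, hi1, t1), (lo2, hi2, t2) in zip(spans, spans[1:]):
        if lo2 < hi1:
            problems.append(f"overlapped allocation: {t1} and {t2}")
    return problems
```

Sorting the allocations lets each overlap check compare only adjacent spans, so the pass is a single sweep rather than an all-pairs comparison.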
Some of the more significant characteristics of the CPSS program are: it can duplicate, reformat, position, read, write, close, skip, backspace, rewind, compare, list the contents of, and load octals to tapes containing programs, Compools, files, records, or any combinations thereof. The CPSS program interrogates control cards containing information that describes the operation to be performed, the units on which the CPSS program will operate, and the structure of the data stored on the unit. The CPSS program provides for labeling of each transfer, and thereby can handle overlaid and interspersed files of varying structures. The CPSS program is designed in modules such that each operator, and each modifier to the operator, are procedures or closed routines. With this design the CPSS program easily can be modified to delete, add, or modify the tape file maintenance functions as the particular application requires. The tape file maintenance program establishes information in "dummy entries" for use by the CPSS central I/O program. In this manner the only machine-specifics in this program lie in its processing of binary cards. APPENDIX A CPSS Control The sequence of the tasks performed by CPSS is dictated by the ordering in the programmer's job deck. A job deck is comprised of system control cards, and data and/or function control cards.
Data and function control cards immediately follow their related system control card in the job deck. The first card in a job deck is the 'JOB card and the last card is an 'ENDJOB card. The system control cards acceptable to CPSS are listed below and summarize some of the system's capabilities.
'ASSIGN, INPUT, unit $ - Pairs the CPSS INPUT file to the given unit.
'CLEAR, area to be cleared $ - Clears the given area of core to plus zero.
'COMMENT $ commentary - The comment is printed.
'COMPILE $ - Initialize the program generation subsystem.
'COMPOOL, ANALYSIS, control information $ - Analyze a Compool as requested.
'COMPOOL, ASSEMBLE $ - Assemble a Compool from the data cards.
'COMPOOL, AUDIT $ - Legality check the Compool data cards.
'COMPOOL, DISASSEMBLE, control information $ - Format and print the binary Compool specified.
'COMPOOL, LIST, control information $ - Format, order, and print the Compool specified with commentary added.
'ENDJOB $ - The end bracket for a job deck.
'GO, address $ - Transfer control to the given address.
'JOB $ - The begin bracket for a job deck.
'LOAD, control information $ - Load the environment specified.
'OCTAL $ - Load and save octals for CPSS programs.
'OPSIM $ - Initialize the operator "simulation" function.
'POSN, file name, control information $ - Position the given file as directed.
'PROCESS, control information $ - Format and print the data as directed.
'RECORD, control information $ - Prepare recording parameters as directed.
'RETURN, address $ - Load a transfer to the CPSS executive at the given address.
'TABSIM, control information $ - Prepare a data environment as directed.
'UTILITY $ - Initialize the tape file maintenance subsystem.
'WAIT $ - Stop the system's operation.
APPENDIX B CPSS Compool The CPSS Compool assembler builds a Compool from information coded on data declaration "cards". The program processes nine data declarator types which are used to define a system's data base, i.e., Program, Table, Item, String, Array, Free Item, Constant, File, and Task declarations. Also, the program processes four declarator types that are used in the building of a Compool, i.e., Ident, Locate, Reserve, and End declarations. 1. Ident. The Ident declaration is used to identify the Compool itself. 2. Locate. The Locate declaration is used to pair address labels to core memory addresses. These labels are usable in lieu of actual memory addresses. In this manner, the programmer is able to allocate memory and define data addresses in a completely symbolic method. 3. Reserve. The Reserve declaration is used to prevent the allocation of data to certain core memory areas. 4. End. The End declaration terminates the program's processing of declarations. The type of information the programmer may use to describe data is quite comprehensive. For example, a program description may contain the program name, mod, length, memory location status (absolute, relocatable, or dynamically relocatable), memory location, program type (closed or open, system program, parameterless subroutine, or parameterized subroutine), storage location (unit, label, unit addressing), subsystem name, title, related commentary, and input and output parameters. BIBLIOGRAPHY The following represents a collection of general material on the subject of "program production" type systems and supplementary references for CPSS and JOVIAL. 1. BARNETT, N. L., FITZGERALD, A. K., "Operating System for the 1410/7010-360 Philosophy", Datamation, Vol. 10, No. 5, pp. 39-42, May 1964. 2. BLATT, J. M., "Ye Indiscreet Monitor", Communications of the ACM, Vol. 6, No. 8, pp. 506-510, September 1963. 3. BORETA, D., "Introduction to CPSS" (soon to be published as a System Development Corporation tech memo, TM-WD-800/002/00). 4. BOUVARD, J., "Operating System for the 800/1800-Admiral", Datamation, Vol. 10, No. 5, pp. 29-34, May 1964. 5. HOWELL, H.
L., The Q-32 JOVIAL Operating System, System Development Corporation, TM-1588/000/00, November 1963. 6. OLIPHINT, C., "Operating System for the B5000-Master Control Program", Datamation, Vol. 10, No. 5, pp. 42-45, May 1964. 7. PERSTEIN, M. H., The JOVIAL Manual, Part 2, The JOVIAL Grammar and Lexicon, System Development Corporation, TM-555/002/02, March 1964. 8. SCHWARTZ, J. I., COFFMAN, E. G., WEISSMAN, C., A General-Purpose Time-Sharing System, Proceedings Spring Joint Computer Conference, Washington, D. C., pp. 397-411, April 21-23, 1964. 9. SHAW, C. J., "A Specification of JOVIAL", Communications of the ACM, Vol. 6, No. 12, pp. 721-735, December 1963. 10. SHAW, C. J., The JOVIAL Manual, Part 1, Computers, Programming Languages and JOVIAL, System Development Corporation, TM-555, Part 1, December 1960. 11. SHAW, C. J., The JOVIAL Manual, Part 3, The JOVIAL Primer, System Development Corporation, TM-555/003/00, December 1961. 12. STEEL, T. B., Jr., "Operating Systems: boon or boondoggle", Datamation, Vol. 10, No. 5, pp. 26-28, May 1964. 13. SUTCLIFFE, W. G., Program Production System User's Manual (1604-A JOVIAL Compiler-OASIS Utility), System Development Corporation, TM-WD-402/000/00, January 1964. 14. SWANSON, R. W., SPASUR Automatic System Mark 1, Utility System Users Manual, System Development Corporation, TM-WD-28, July 1964. 15. VER STEEG, R. L., "TALK-A High-Level Source Language Debugging Technique with Real-Time Data Extraction", Communications of the ACM, Vol. 7, No. 7, pp. 418-419, July 1964. 16. CO-OP Manual, Control Data 1604 User's Group, Control Data Corporation, No. 067a, December 1960. 17. Cosmos IV Manual, System Development Corporation, TM-LX-81/001/00, October 1963. 18. H 800 Survey Guide, Honeywell, Electronic Data Processing Division. 19. IBM System/360 Special Support Utility Programs, IBM Corporation, File No. S360-32, Form C28-6505-0, 1964. 20. IBM System/360 Programming Systems Summary, IBM Corporation, File No.
S360-30, Form C28-6510-0, 1964. 21. IBM 7090/7094 IBSYS Operating System, System Monitor (IBSYS), IBM Corporation, File No. 7090-36, Form C28-6248-1, 1963. 22. Philco 2000 Operating System SYS Version E, Philco Corporation, January 1963. 23. RCA 501 Electronic Data Processing System, EDP Methods, Radio Corporation of America, Technical Bulletin No. 16. 24. SCOPE/Reference Manual, CDC 3600, Control Data Corporation, August 1963. ERROR CORRECTION IN CORC, THE CORNELL COMPUTING LANGUAGE David N. Freeman IBM General Products Division Development Laboratory Endicott, New York I. INTRODUCTION CORC, the Cornell Computing Language, is an experimental compiler language developed at Cornell University. Although derived from FORTRAN and ALGOL, CORC has a radically simpler syntax than either of these, since it was designed to serve university students and faculty. Indeed, most of the users of CORC are "laymen programmers," who intermittently write small programs to solve scientific problems. Their programs contain many errors, as often chargeable to fundamental misunderstandings of the syntax as to "mechanical errors." A major objective of CORC is to reduce the volume of these errors. This objective has been achieved to the following extent: the average rate of re-runs for 4500 programs submitted during the fall semester of 1962 was less than 1.1 re-runs/program. The current paper describes the error-correction procedures in greater detail. II. THE CORC LANGUAGE CORC was designed by a group of faculty and students in the Department of Industrial Engineering and Operations Research at Cornell. This group has coded and tested two similar compiler/monitor systems, one for a medium scale decimal computer and the other for a large binary computer. During the definition of the language, the design group surrendered potency to simplicity whenever the choice arose.
Certain redundancies have been included in CORC, serving two functions: to facilitate error-correction during source-deck scanning, and to aid novice programmers' grasp of compiler-language syntax. Excepting these redundancies, CORC is quite frugal with conventions. For example, all variables and arithmetic expressions are carried in floating-point form, avoiding the confusing notion of "mode." At the same time, programmers are spared all knowledge of floating-point arithmetic. Three features of CORC have enabled it to achieve this low re-run rate: (1) Inherent simplicity of the syntax; (2) Closed-shop operation of the Cornell Computing Center on CORC programs, including keypunching, machine operation, and submission/return of card decks; (3) A novel and extensive set of error-correction procedures in the CORC compiler/monitors. The CORC language is briefly described below; it is more fully documented elsewhere.1 Each CORC card deck is divided into three required sub-decks plus an optional sub-deck of data cards: (a) The preliminary-description cards supply heading data for each page of the output listing. (b) The dictionary cards declare all variables used in the program, simple as well as subscripted. (c) Each statement card may have an indefinite number of continuation cards. Statements may bear labels having the same formation rules as variables. Continuation cards may not be labelled. Variables, labels, numbers, reserved words, and special characters comprise the symbols of CORC. Each symbol is a certain string of at most eight non-blank characters. Numbers may have up to twelve digits; decimal points may be leading, trailing, or imbedded in the numbers. There are forty-three reserved words in CORC, e.g., LET, and ten special characters: + - * / $ = ( ) . , The character string defining each label, variable, or reserved word is terminated by the first blank space or special character.
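This termination rule, together with the adjacent rule that a number ends at the first character that is neither a digit nor a decimal point, is concrete enough to sketch as a scanner. Python is used purely for illustration, and the treatment of a decimal point that begins a number is one reading of the stated rules:

```python
SPECIALS = set("+-*/$=(),.")  # the ten CORC special characters

def scan(card):
    """Split a statement card into symbols: names end at the first blank
    or special character; numbers end at the first character that is
    neither a digit nor a decimal point; each special character is a
    distinct symbol. (The eight-character limit is not enforced here.)"""
    symbols, i, n = [], 0, len(card)
    while i < n:
        ch = card[i]
        if ch == " ":
            i += 1
        elif ch.isdigit() or (ch == "." and i + 1 < n and card[i + 1].isdigit()):
            j = i                      # a number, possibly with a leading point
            while j < n and (card[j].isdigit() or card[j] == "."):
                j += 1
            symbols.append(card[i:j])
            i = j
        elif ch in SPECIALS:
            symbols.append(ch)         # each special character stands alone
            i += 1
        else:
            j = i                      # a label, variable, or reserved word
            while j < n and card[j] != " " and card[j] not in SPECIALS:
                j += 1
            symbols.append(card[i:j])
            i = j
    return symbols
```

Note that "0.9" scans as one number while the "." in "GO TO L5." stands alone, matching the rule that a point is part of a number only in numeric context.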
The character string defining each number is terminated by the first character that is neither a digit nor a decimal point. Each special character is a distinct symbol. There are forty-six legal characters in CORC: letters, digits, and special characters. A subset of the reserved words is the set of fifteen first-words: LET, INCREASE, INC, DECREASE, DEC, GO, STOP, IF, REPEAT, READ, WRITE, TITLE, NOTE, BEGIN, and END. The first symbol in each statement should, if correct, be one of these first-words. There are eight executable-statement types, plus a NOTE statement for editorial comments on the source-program listing. (NOTE statements may be labelled; in this case, they are compiled like FORTRAN "CONTINUE" statements.) To simplify the description of the statement types, single letters denote entities of the CORC language:
V ...... a variable, simple or subscripted
E ...... an arithmetic expression, as defined in FORTRAN
L ...... a statement label
B ...... a repeatable-block label (see below)
R ...... one of the six relational operators: EQL, NEQ, LSS, LEQ, GTR, and GEQ.
A relational expression is a predicate comprising two arithmetic expressions separated by a relational operator, e.g., 2*X NEQ 0.9. The statement types are as follows:
(1) LET V = E, and two variants INCREASE V BY E and DECREASE V BY E. (INCREASE may be abbreviated to INC, DECREASE to DEC.)
(2) IF E1 R E2 THEN GO TO L1 ELSE GO TO L2, and two variants IF E11 R1 E12 AND E21 R2 E22 AND ... AND En1 Rn En2 THEN GO TO L1 ELSE GO TO L2 and IF E11 R1 E12 OR E21 R2 E22 OR ... OR En1 Rn En2 THEN GO TO L1 ELSE GO TO L2.
(3) GO TO L.
(4) STOP, terminating execution of a program.
(5) READ V1, V2, ..., bringing in data cards during the execution phase. Each data card bears a single new value for the corresponding variable.
(6) WRITE V1, V2, ..., printing out the variable names, the numerical values of their subscripts for each execution of the WRITE statement, and the numerical values of these variables.
(7) TITLE (message), printing out the remainder of the card and the entire statement fields of any continuation cards.

(8) REPEAT B ..., comprising four variants: (8a) REPEAT B E TIMES; (8b) REPEAT B UNTIL E11 R1 E12 AND E21 R2 E22 AND ... AND EN1 RN EN2; (8c) REPEAT B UNTIL E11 R1 E12 OR E21 R2 E22 OR ... OR EN1 RN EN2; and (8d) REPEAT B FOR V = E1, E2, ..., (Ei, Ej, Ek), ..., where (Ei, Ej, Ek) is an iteration triple as in ALGOL.

ERROR CORRECTION IN CORC

Closed subroutines, called repeatable blocks in CORC, are defined by two pseudo-statements as follows:

B BEGIN
...
B END,

where the "B" labels appear in the normal label field. A repeatable block can be inserted anywhere in the sub-deck of statement cards; its physical location has no influence on its usage. It can only be entered under control of a REPEAT statement (with a few erroneous-usage exceptions). Repeatable blocks may be nested to any reasonable depth. Any number of REPEAT statements can call the same block, although the blocks have no dummy-variable calling sequences. All CORC variables are "free variables" in the logical sense, which avoids confusing the novice programmer no less than it hampers the expert programmer.

III. ERROR ANALYSIS IN CORC

In the CORC compiler/monitor, the author and his colleagues have attempted to raise the number of intelligible error messages and error-repair procedures to a level far above the current state-of-art for similar systems. The success of these messages and procedures is measured by three economies: (a) reduced re-run loads, (b) reduced costs of card preparation, and (c) less faculty/student time devoted to tedious analyses of errors.

A principal tenet of the CORC philosophy is to detect errors as early as possible in: (1) characters within symbols, (2) symbols within expressions, (3) expressions within statements, e.g., the left and right sides of an assignment statement, and (4) statements within the sequencing of each program.

An explicit message for each error is printed on the output listing. This listing is the only output document from a CORC program; all programs are compiled and executed, and machine code is never saved on tape or punched cards.

After detecting a statement-card error, CORC always "repairs" the error by one of the two following actions: (a) CORC refuses to compile a "badly garbled" statement. Instead, CORC replaces it with a source-program "message statement" reminding the programmer of the omitted statement. (b) CORC edits the contents of a "less badly garbled" statement into intelligible source language. The edited statement is subsequently compiled into machine code. Errors in cards other than statement cards are repaired by similar techniques.

Thus, the machine code produced by CORC is always executable, and compilation-phase and execution-phase error messages are provided for every program. By continuing compilation in the presence of errors, CORC provides diagnostic data simultaneously on structural levels (1)-(4) cited above. By also executing these programs, CORC detects additional errors in program flow, subscript usage, improper function arguments, etc. The detection of each error invokes a message describing the relevant variables, labels, numbers, etc.; why they are erroneous; and what remedial actions are taken by CORC. Exhibiting errors in detail has improved student comprehension of the CORC syntax. Of course, certain errors defy detection, e.g., incorrect numerical constants.

The correction of a programming error is defined to be the alteration of relevant source-language symbols to what the programmer truly intended. Under this operational definition, many errors are incapable of "correction," e.g., the programmer may have intended a statement or expression not even offered in CORC. Other errors are capable of "correction" by the programmer himself but by no critic unfamiliar with the complete problem definition; an incorrect numerical constant is again an example. A third class of errors can be corrected by an intelligent critic after scanning the source-deck listing, without recourse to the problem definition. Some errors in this class require a profound use of context to elicit the programmer's true intention. Other errors in this class can be detected and corrected with little use of context, e.g., the omission of a terminal right parenthesis.

The author defines a corrigible error to be one whose correction is automatically attempted by the CORC compiler/monitor. Thus, this definition is by cases, for a specific version of CORC. CORC may correct one error and fail to correct a second, nearly-identical error. Error correction is a fundamentally probabilistic phenomenon; the CORC error-correction procedures attempt to maximize the "expected useful yield" of each program by strategies based on a priori probabilities associated with the different errors.*

The majority of corrigible errors are detected during the scanning of source decks by the CORC compiler. A few corrigible errors are detected during the execution of object programs. For each error, one or more correction procedures have been added to CORC, representing certain investments in core memory and operating speed. The following paragraph discusses the selection of corrigible errors, and section IV catalogues these errors. The catalogue will be somewhat peculiar to the structure of CORC, a population of novice programmers, and the operation of a university computing center. However, the discussion of control-statement errors, arithmetic-expression errors, and misspellings is relevant to most compiler languages.
The author has roughly ranked various error conditions by two criteria: a priori probabilities† of their occurrence, and a priori probabilities of their correction (if correction is attempted). Correction procedures were designed for some errors, while other chronic errors had such low a priori probabilities of correction that only explicit error-detection messages were printed out. For example, omission of a subscript is a common error which is difficult to correct, although easy to detect and "repair." CORC "repairs" a subscript-omission error by supplying a value of 1. On the other hand, misspellings are common errors whose a priori probabilities of correction are high if sophisticated procedures are used. The author hopes to achieve at least 75 percent correction of misspellings with the current procedures; many have not yet been tested in high-volume operation.‡

* References 2 and 5 also propose probabilistic correction of misspellings.
† Probabilities in the sequel are estimates based on human scrutiny of several hundred student programs.

IV. ERROR CORRECTION DURING SCANNING

First, the general procedures for card scanning will be described. The second, third, and fourth subsections deal with dictionary cards, data cards, and statement cards, respectively. The last subsection describes the error-correction phase which follows scanning, i.e., after the last statement card has been read but before machine code is generated by the compiler.

A. CARD SCANNING

Each CORC source deck should have all cards of one type in a single sub-deck: (1) Type 1, preliminary description cards; (2) Type 5, dictionary cards; (3) Type 0, statement cards; (4) Type 4, data cards (if used). The type of each card is defined by the punch in column 1 (although CORC may attempt to correct the type of a stray source card).
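The column-1 dispatch just described can be sketched as follows. This is a hypothetical illustration in a modern language, not the original implementation; the punch-to-type mapping simply follows the list in the text.

```python
# Hypothetical sketch of CORC's column-1 card dispatch (not the original
# implementation); the punch-to-type mapping follows the list above.
CARD_TYPES = {
    "1": "preliminary description",
    "5": "dictionary",
    "0": "statement",
    "4": "data",
}

def card_type(card_image):
    """Classify an 80-column card image by its column-1 punch."""
    col1 = card_image[0] if card_image else " "
    # None signals a stray or mistyped card, which CORC may try to correct.
    return CARD_TYPES.get(col1)
```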
At the beginning of each new source program, CORC scans the card images (usually on magnetic tape) for the next type 1 card, normally a tab card bearing any non-standard time limit and page limit for this program. (The tab cards are used to divide the decks, facilitating batch processing and other handling.) This scanning procedure skips any extraneous data cards from the previous program deck. If the preceding deck was badly shuffled, misplaced dictionary cards and statement cards will also be skipped.

‡ Damerau has achieved over 95% correction of misspellings in an information-retrieval application.

An indefinite number of type 1 cards may be supplied; CORC inserts data from the first two cards into the page headings of the output listing. This serves to label all output with the processing date and programmer name, avoiding losses in subsequent handling. The problem identification should be duplicated into each deck; any deviations from this identification generate warning messages. The serialization of cards is checked, although no corrective action is taken if the cards are out of sequence. If the serialization is entirely omitted, CORC inserts serial numbers into the print-line image of each card, so that subsequent error messages can reference these print lines without exception.

The general procedure on extraneous or illegal punches is as follows: illegal punches are uniformly converted to the non-standard character "≠"; extraneous punches are ignored except in non-compact variable/label fields and in the statement field of type 0 cards, where all single punches are potentially meaningful. Rather than discard illegal punches, CORC reserves the possibility of treating them as misspellings. Likewise, any non-alphabetic first character of a variable/label field must be erroneous and is changed to "≠," furnishing a later opportunity to treat this as a misspelling.
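The punch-normalization rules above can be sketched as follows. The 46 legal characters are the letters, digits, and ten special characters listed earlier; "≠" is assumed here as the non-standard replacement character (the source rendering is ambiguous), and the sketch is illustrative rather than a transcription of the CORC routine.

```python
# Sketch of the punch-normalization rules described above; "≠" is assumed
# as the non-standard replacement character.
LEGAL = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+-*/$=()., ")

def normalize_field(field, label_field=False):
    """Convert illegal punches to "≠" rather than discarding them, so they
    can later be treated as possible misspellings. A non-alphabetic first
    character of a variable/label field is flagged the same way."""
    chars = ["≠" if c not in LEGAL else c for c in field]
    if label_field and chars and not chars[0].isalpha():
        chars[0] = "≠"
    return "".join(chars)
```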
All hyphen punches are converted to minus signs during card reading; the keyboard confusion of these two characters is so chronic, and so harmless, that CORC even refrains from a warning message.

B. DICTIONARY CARDS

Although the dictionary and data cards are processed in entirely different phases of a CORC program, their formats are identical (with the exception of column 1), and common procedures are used to scan them. As mentioned in the preceding subsection, non-alphabetic first characters are changed to "≠." Embedded special characters are similarly changed, with the following exception: character strings of the form "(I)" or "(I,J)" are omitted. Fixed-column subscript fields have already been provided, and students consistently and correctly use them. However, a common student error is to supply redundant parenthesized subscripts in the label field; these are ignored by CORC, although a warning message is supplied. Non-numeric characters in the subscript fields and the exponent field are changed to "1"s. Vector subscripts can appear in either the first-subscript field or the second-subscript field. These subscripts need not be right-justified in their respective fields. After an array has been defined, subsequent subscripts of excessive magnitude are not used; the corresponding data entries are put into the highest legal cell of the array.

C. DATA CARDS

All of the foregoing procedures apply, with these exceptions: if a data card has its variable field blank or, in the case of subscripted variables, its subscript fields blank, the data can still be entered with a high probability of correcting the omission. Information in the READ statement overrides incorrect or missing entries on the corresponding data cards. CORC insists on exact agreement of the variables and subscripts if warning messages are to be avoided. Symbolic subscripts may be used in READ statements, but their execution-phase values must agree with the numeric subscripts on the type 4 cards.
D. STATEMENT CARDS

Correction of erroneous statement cards is a complex technique, and the most fruitful of those currently implemented in CORC. Statement cards comprise over 80% of student source decks, on the average. Students commit the overwhelming majority of their errors in communicating imperative statements to a compiler, rather than header statements, declarative statements, or data cards. Statement-card errors fall into two major categories: those detectable at compilation time and those detectable only at execution time. The second category is discussed in section V. Some of the most useful correction techniques for the first category (tested and modified during the past two years of CORC usage) are described in the following eight sub-sections.

(1) Misspellings [4, 5]

At the end of Section III, misspellings were cited as a class of errors that both occur frequently and have attractively high a priori probabilities for correction. Accordingly, CORC now contains a subroutine that compares any test word to any list of words (each entry being denoted a list word), determining a "figure of merit" for the match of each list word to the test word. Each figure of merit can be considered as the a posteriori probability that the test word is a misspelling of this particular list word. The list word with the highest figure of merit is selected as the spelling of the test word "most likely" to be correct. Various categories of misspelling are defined in CORC; to each category is assigned an a priori probability of occurrence. When the test word and a list word match within the scope of a category, i.e., the test word is some particular misspelling of the list word, the a priori probability for this category is added to the figure of merit for this list word.
Actually, the figures of merit are integers rather than probabilities; they can be converted to probabilities by the usual normalization, but this is unnecessary: they are used merely to rank the possible misspellings. All increments used in misspelling analyses reflect the number N of non-blank characters in the test word, as follows: a certain base-value increment is specified for each misspelling; if a match is found, this base value is multiplied by the ratio N/8, then added to the corresponding figure of merit.

(a) A concatenation misspelling occurs when a delimiting blank is omitted between two symbols, e.g., "LETX ..." is a concatenation misspelling of "LET X ..." When such a misspelling is detected, any relevant list of words is compared against the concatenated symbol. The increment to the figure of merit for each list word is computed as follows: (i) If the list word and the test word do not have at least their initial two characters in common, the increment is 0. (ii) For every consecutive character in common with the list word (after the first character), an increment of 2 is added to the figure of merit. Example: assume that the test word is ENTRYA and that two of the list words are ENT and ENTRY. The corresponding figures of merit are 6 and 10, respectively. The higher figure reflects the more exact agreement of ENTRY to ENTRYA.

(b) Single-character misspellings provide four different increments to the figure of merit, corresponding to mutually exclusive possibilities: (i) A keypunch-shift misspelling occurs when the IBM 026 keypunch is improperly shifted for the proper keystroke, e.g., a "1"-"U" error. There are fourteen possible misspellings of this type, corresponding to the seven letter-number pairs on the keyboard. The special-character row, including "0," does not seem susceptible to misspelling analysis, since special characters are always segregated, never imbedded in symbols.
For each list word which agrees within a single keypunch-shift misspelling with the test word, an increment of (20N/8) is added to the corresponding figure of merit, where N is the number of non-blank characters in the test word.

(ii) An illegal-character misspelling occurs either (a) when a variable/label has previously required a "single-letter perturbation" using the character "≠" or (b) when an illegal punch in the card is changed to "≠." Single-letter perturbations are used when the same symbol occurs as both a variable and a label, or when a reserved word is used as a variable or label. In either case, conflicting usage cannot be tolerated, and CORC appends "≠" to the symbol for the current usage. In subsequent searches of the symbol dictionary, one may wish to recognize the original spelling. Thus, for each list word which agrees within a single illegal-character misspelling with the test word, an increment of (20N/8) is added to the corresponding figure of merit, where N is as above. This increment is higher than that for a random misspelling, reflecting the peculiar origins of the character "≠."

(iii) A resemblance misspelling occurs whenever any of the following character pairs is confused: "1"-"I," "O" (the letter)-"0" (the number), and "Z"-"2." For each list word which agrees within a single resemblance misspelling with the test word, an increment of (40N/8) is added to the corresponding figure of merit, where N is as above.

(iv) A random misspelling occurs when any other single character is mispunched in a symbol. For each list word which agrees within a single random misspelling with the test word, an increment of (10N/8) is added to the corresponding figure of merit, where N is as above.
(c) A permutation misspelling provides a single increment to a figure of merit whenever the test word matches the corresponding list word within a pair of adjacent characters, this pair being the same but permuted in the two words; e.g., LTE is a permutation misspelling of LET. For each list word which agrees within a single permutation misspelling with the test word, an increment of (20N/8) is added to the corresponding figure of merit, where N is as above. Other permutations may deserve consideration at some future date, but adjacent-pair permutations seem to have the highest a priori occurrence probabilities.

(d) Simple misspellings of the foregoing types have high probabilities of successful correction insofar as the following conditions are met: (i) The list of words does not contain many nearly-identical entries. Otherwise, there will be many reasonable misspelling possibilities from which the program may select only one. (ii) Neither test words nor list words are single-character symbols. The program excludes such list words from consideration during a misspelling analysis; experience has shown that only a small proportion, perhaps 10 percent, of single-character symbols are successfully corrected. (iii) Context can be extraordinarily helpful. Associated with each list word is a set of attributes such as the count of its usage in the current program, its function (variable, label, constant, reserved word, etc.), and any peculiar usages already detected (such as being an undeclared variable). Certain misspelling possibilities can be immediately discarded if the context associated with the corresponding list words does not match the context of the test word. For example, if an arithmetic statement is being analyzed, any test for misspelled variables can immediately discard all misspelled label possibilities.
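The scoring scheme of sub-sections (a)-(c) can be sketched as follows. This is an illustrative reconstruction in a modern language, not the original CORC code: the keypunch-shift set includes only the pair the paper names, and the concatenation increment is left unscaled by N/8 so that the ENT/ENTRY worked example above (figures of merit 6 and 10) comes out as printed.

```python
# Sketch (not the original implementation) of the figure-of-merit scorer
# described above; increments are the paper's base values, scaled by N/8
# where N is the non-blank length of the test word.
RESEMBLANCE = {("1", "I"), ("I", "1"), ("O", "0"), ("0", "O"), ("Z", "2"), ("2", "Z")}
# Only the "1"-"U" pair is named in the text; six more pairs exist on the 026.
KEYPUNCH = {("1", "U"), ("U", "1")}

def single_char_base(test, lw):
    """Base increment for a single mispunched character, else 0."""
    if len(test) != len(lw):
        return 0
    diffs = [(a, b) for a, b in zip(test, lw) if a != b]
    if len(diffs) != 1:
        return 0
    pair = diffs[0]
    if pair in RESEMBLANCE:
        return 40                      # resemblance misspelling
    if pair in KEYPUNCH:
        return 20                      # keypunch-shift misspelling
    if "≠" in pair:
        return 20                      # illegal-character misspelling
    return 10                          # random misspelling

def permutation_base(test, lw):
    """Base increment 20 if one adjacent pair is transposed, else 0."""
    for i in range(len(test) - 1):
        if test[:i] + test[i + 1] + test[i] + test[i + 2:] == lw != test:
            return 20
    return 0

def concatenation_increment(test, lw):
    """2 per leading character in common, provided at least two agree."""
    k = 0
    for a, b in zip(test, lw):
        if a != b:
            break
        k += 1
    return 2 * k if k >= 2 and test != lw else 0

def figure_of_merit(test, lw):
    n = len(test)                      # N, non-blank characters of the test word
    base = single_char_base(test, lw) or permutation_base(test, lw)
    if base:
        return base * n // 8           # integer scaling by N/8, as in the paper
    return concatenation_increment(test, lw)

def best_spelling(test, list_words):
    """The list word with the highest figure of merit, as CORC selects it."""
    fom, lw = max((figure_of_merit(test, w), w) for w in list_words)
    return lw if fom > 0 else None
```

With these values, a test word LTE scores 7 against LET (permutation, 20 × 3/8), matching the integer figures of merit the paper reports for three-character symbols.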
The first two of these three conditions are controlled by the vocabulary of the source-deck programmer; CORC gives far better assistance to programs using only a few variables and labels of highly distinctive spelling, with at least three characters apiece.

(e) The increments corresponding to different misspellings were arbitrarily selected; they can be readily raised or lowered as experience indicates. The current values reflect the following observations: (i) The weakest communication link is between the handwritten coding sheets and their interpretation by the keypunch operator. Hence, the largest increment is assigned to resemblance misspellings. (ii) In lieu of exact information, permutation misspellings and keypunch-shift misspellings have been judged equally probable. (iii) Illegal punches in a card image arise from three sources: illegal hole patterns, improper use of a character (e.g., a non-alphabetic character beginning a first word, or the duplicate use of a symbol as two entities), and card-reading failures. Lacking other evidence, the author considered the increment to be approximately the same as in (ii). (iv) Other single-character misspellings seem only half as likely to occur. Examples of the current CORC misspelling analyses may be found at the end of subsection E on Post-Scanning Spelling Corrections.

(2) Subscripts

Correction attempts for subscript errors have low success probabilities, on the whole. Isolated omission of one or both subscripts seems almost hopeless. CORC edits such an omission by appending "(1)" to a vector variable and "(1, 1)" to a matrix variable. Likewise, if a matrix variable has other than two subscripts, CORC uses primitive editing techniques to produce executable machine code. Excessive commas are changed to "+" signs, and "(E)" is changed to "(E, 1)," where "E" is the arithmetic expression for the first subscript of a matrix variable.
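One plausible reading of these matrix-subscript edits can be sketched as follows; the treatment of excess subscripts (joining the extras with "+" after the comma-to-plus rule) is an interpretation, not a transcription of the CORC routine.

```python
# Illustrative sketch (one reading of the rules above, not the CORC code)
# of repairing the parenthesized subscript list of a matrix variable.
def repair_matrix_subscripts(args):
    """args: subscript-expression strings found between the parentheses."""
    if not args:
        return ["1", "1"]                        # omission: supply (1, 1)
    if len(args) == 1:
        return [args[0], "1"]                    # "(E)" becomes "(E, 1)"
    if len(args) > 2:
        return [args[0], " + ".join(args[1:])]   # excess commas become "+"
    return args                                  # two subscripts: already legal
```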
Definition of new array variables after the dictionary is complete (i.e., after all type 5 cards have been processed) is an attractive, if difficult, error-correction procedure. Most algebraic compilers scan source decks several times; they have a leisurely opportunity to accumulate evidence for undeclared array variables. If such evidence is overwhelming, i.e., if every usage of a certain variable is immediately followed by a parenthesized expression, these compilers could change the status of this variable before the final code-generation scan. To reduce compilation time, the current version of CORC scans each source statement once and must make an immediate decision when it finds a left parenthesis juxtaposed to a supposedly simple variable: should "V ( ... )" be changed to "V* ( ... )," i.e., implied multiplication, or should it be treated as a subscript (and re-designate "V" as an array variable)?

The present error-correction procedure is to encode "V ( ... )" into the intermediate language without change; special counters for usage as a vector/matrix variable are incremented, depending on one/two parenthesized arguments. At the conclusion of scanning, these usage counters are tested for all "simple" variables. Any variable used preponderantly as a vector variable causes CORC to test for the misspelling of some declared vector variable. Failing this, CORC changes the status of the variable to a vector of 100 cells. Any variable used preponderantly as a matrix variable causes CORC to test for the misspelling of some declared matrix variable. Failing this, CORC changes the status of the variable to a matrix of 2500 cells, comprising a 50 x 50 array. Missing right parentheses are supplied and extra right parentheses are deleted as necessary, although not always correctly.

If a variable is infrequently juxtaposed to parenthesized expressions, CORC treats these juxtapositions as implied multiplications. Deferral of this decision necessitates a procedure for inserting the multiplication operator during the conversion of intermediate language to machine code, together with the appropriate message. This error-correction procedure is one of the few in the code-generation phase. The message appears at the end of the source-deck listing rather than adjacent to the offending card image; the gain in error-correcting power seems to justify deferring the message.

The a priori probabilities of omitted array-variable declarations and implied multiplications are both high. Since the two possibilities are mutually exclusive, CORC bases its choice on the percentage occurrence of the ambiguous usage. If the usage is chronic, i.e., comprising more than 50 percent of the total usage of some variable, an undeclared array variable seems more probable. If the ambiguous usage is a small percentage of the total usage, implied multiplication seems more probable.

(3) Arithmetic and relational expressions

The rules for analyzing and correcting arithmetic expressions are as follows: (a) Extraneous preceding plus signs are deleted, and preceding minus signs are prefixed by zero, i.e., "-E" becomes "0 - E." (b) Thereafter, "+," "-," "*," and "/" are all binary operators. If an operand is missing before or after a binary operator, the value "1" is inserted. This merely preserves the coherence of the syntax; to correct this error seems hopeless. (c) If an expression using two binary operators might be ambiguous (irrespective of the formal syntax), CORC prints out its resolution of the ambiguity, e.g., "A/B*C IS INTERPRETED AS (A/B)*C."

(4) LET, INCREASE-BY, and DECREASE-BY

Four components are essential to each correct statement in this category: the first-word, the assigned variable, the middle symbol, and the right-hand-side (RHS) arithmetic expression. (a) The first-word of the statement has been identified by a generalized pre-scan of the statement.
If "LET" has been omitted but "=" has been found, CORC furnishes the former symbol. (b) The assigned variable may be subscripted; if so, CORC supplies any missing arguments, commas, and right parentheses when "=" or "BY" terminates the left-hand side (LHS) of the statement. If other symbols follow the assigned variable but precede "=" or "BY," they are ignored. (c) "EQU," "EQL," and "EQ" are erroneous but recognizable substitutes for "=." (d) Any arithmetic expression is legal for the RHS.

(5) GO TO, STOP, and IF

(a) With one exception, (b) just below, all unconditional branches begin with "GO," followed by an optional "TO." (b) STOP is a complete one-word statement. Also, it may be used in the conditional-branch statement, e.g., "IF ... THEN STOP ELSE GO TO ...." (c) A conditional branch always follows one or more relational expressions in an IF or REPEAT statement. For IF statements, the first incidence of "THEN," "ELSE," "GO," "TO," or "STOP" terminates the last relational expression; missing operands, commas, and right parentheses are then inserted as needed. Thereafter, the two labels are retrieved from any "reasonable" arrangement with two or more of the above five words. Missing labels are replaced by dummy "next statement" labels, which later inhibit the compilation of machine-code branches. Thus, if an IF statement lacks its second label, the falsity of its predicate during execution will cause no branch. At the end of scanning, certain labels may remain undefined; here also, CORC inhibits the compilation of machine-code branches.

(6) REPEAT

(a) If the repeated label is omitted, e.g., in the statement REPEAT FOR ARG = 2, CORC scans the label field of the following source card. Programmers often place repeatable blocks directly after REPEAT statements using these blocks; hence, any label on this following card is likely to be the missing repeated label, and it is inserted into the REPEAT statement.
If no such label is found, CORC creates a dummy label for the repeatable block. During the execution of the program, usage of this erroneous REPEAT statement can be monitored by this dummy label.

(b) If the REPEAT-FOR variant is used, CORC tests for three components in addition to the repeated label: (i) the bound variable, i.e., ARG in the example in 6(a); (ii) the character "=" or its erroneous variants "EQU," "EQL," and "EQ"; (iii) any collection of iteration triples and single arithmetic expressions, separated by commas. In any iteration triple, CORC will supply a single missing argument with value "1."

(c) As in IF statements, an indefinite number of relational expressions can be used in REPEAT-UNTIL statements.

(7) BEGIN and END

REPEAT statements and repeatable blocks require consistent spelling of labels and matching BEGIN/END pseudo-statements. Through misunderstanding or carelessness, novice programmers commit grievous errors in using REPEAT statements and their blocks. CORC attempts to correct a certain subset of errors whose correction probabilities are attractively high:

(a) If the label of a BEGIN pseudo-statement is missing, the preceding and following cards are tested for clues: (i) If the preceding card was a REPEAT statement using a yet-undefined label, this label is supplied to the BEGIN pseudo-statement. (ii) If (i) fails to hold and if the following card is labelled, this label is shifted to the BEGIN pseudo-statement. (iii) Otherwise, a dummy label is supplied, awaiting further clues to the identity of the repeatable block. If such clues never appear, the block is closed by a CORC-supplied END pseudo-statement after the last statement card of the deck. Should an unpaired END pseudo-statement be subsequently found, the dummy label (on the BEGIN pseudo-statement) is changed to match this unpaired END label.

(b) If the label for an END pseudo-statement is missing, CORC tests for the existence of a "nest" of unclosed blocks. If so, the label of the innermost unclosed block is used in the current END pseudo-statement. Otherwise, the card is ignored.

(c) If the label in an END pseudo-statement does not match the label of the innermost unclosed block, the current label is tested against the labels of the entire nest of blocks. If a "crisscross" has occurred, i.e.,

A BEGIN
B BEGIN
A END,

CORC inserts the END pseudo-statement for block B before the current END pseudo-statement for block A.

(d) If the preceding test fails, CORC again tests the current label against the nest, looking for a misspelling. If the current label is misspelled, procedure (c) is used. If the misspelling tests fail, CORC ignores the END pseudo-statement.

(e) If the student has programmed an apparent recursion, CORC prints a warning message but takes no further action. Although unlikely, there may be a legitimate use for the construction:

A BEGIN
REPEAT A ...
A END.

In this situation, CORC makes no attempt to preserve the address linkages as a truly recursive routine would require. Thus, the program is likely to terminate in an endless loop.

(8) READ and WRITE

Only simple or subscripted variables can appear in READ statements. The subscripts can be any arithmetic expressions. If a label appears in the argument list of a WRITE statement, the current count of the label usage will be printed. Constants, reserved words, and special characters are deleted from the argument lists of READ/WRITE statements.

E. POST-SCANNING SPELLING CORRECTIONS

The misspelling of labels and variables is corrected, insofar as CORC is capable, after scanning an entire deck, with the exceptions mentioned in section D. After scanning, much usage and context data have been accumulated. CORC attempts to resolve suspicious usages by equating two or more symbols to the same entity.
When the implementation of CORC was originally under study, heavy weight was given to the potential benefits from correcting misspellings. Efficient correction of misspellings seemed to require one of the following similar strategies:

(a) Two or more complete scans of the source deck, the first serving primarily for the collection of data on suspicious usages such as possible misspellings.

(b) Encoding of the source deck into an intermediate language which is tightly packed and substantially irredundant but which also permits re-designation of labels and variables after misspelling analyses.

A third alternative to these strategies was to compile the source deck directly into machine code, then attempt to repair this code after determining the set of corrigible misspellings. However, this procedure seemed less flexible to use and more difficult to program than the first two strategies; it was rejected from consideration.

The second alternative was selected and appears in both current implementations of CORC. Details of the strategy are as follows:

(a) Each new simple variable entered into the dictionary is paralleled by a pointer-cell containing the address of a second cell. This address is ordinarily used during machine code-generation to represent the variable in question. Since any misspelled variable is equated to a properly-spelled variable after scanning but before code generation, CORC corrects the misspelling merely by giving the variables identical pointer-cell contents.

(b) Each new array variable is paralleled by a pointer-cell containing the base address of the array. As for simple variables, only one pointer cell is changed if this variable is equated to another array variable.

(c) To each label corresponds a pointer-cell containing a branch instruction to the appropriate machine location (when the latter becomes defined during the generation of machine code). For an undefined label equated to some other label, its cell is filled with a branch instruction to the pointer-cell for the other label. Thus, execution of GO TO LABELA, where LABELA is a defined label, requires two machine-language branch instructions; if LABELA is an undefined label equated to LABELB, three machine-language branch instructions are required.

The penalty in compilation speed for using the intermediate language is modest: the average time to complete compilation for CORC programs (after the last statement card has been read) is less than one second; few decks require more than two seconds.

(1) Correction of misspelled labels

If a label has been referenced but never defined in a label field, it is tested for being a possible misspelling of some defined label. The defined label with the highest figure of merit is selected and the following message is printed:

LABELA IS CHANGED TO LABELB,

where LABELA and LABELB are the undefined and defined labels, respectively. If no defined label has a non-zero figure of merit with respect to the undefined label, the following message is printed:

LABELA IS UNDEFINED

Subsequently, all references to this label during the generation of machine language are
By LHS-RHS usage is meant the following two frequencies:

(aa) Usage on the LHS of an assignment statement, in a READ statement, or in the initial dictionary. This usage corresponds to assigning the variable a new value.

(bb) Usage on the RHS of an assignment statement, in a relational expression, or in a WRITE statement. This usage corresponds to using the current value of the variable.

The motivation for LHS-RHS analysis is the following: if two variables are spelled almost identically, and if one has a null RHS usage and the other a null LHS usage, then the a priori probability that the programmer intended a single entity is higher than the probabilities for most alternative misspellings. CORC does not use LHS-RHS analysis alone to determine the best misspelling possibility. Instead, an increment of 5 is added to the figure of merit of each declared variable whose null usage complements any null usage of the current test word, i.e., the undeclared variable. Undeclared variables can be equated only to declared variables, not to other undeclared variables.

(b) If a declared variable has a null RHS usage, it may be an erroneous dictionary spelling of some variable which is thereafter consistently spelled. However, CORC will announce that the dictionary spelling is "correct" in this case, after it detects the misspelling; all "misspelled" incidences of the variable are equated to the declared variable.

(3) Examples

Four groups of nearly-matching symbols are illustrated in Table 1. In the first group, the label ABC requires testing for misspelling. The label ABCDE is a concatenation misspelling, figure of merit (FOM) = 6. The label ABD is a random misspelling, FOM = 3. The label BAC is a permutation misspelling, FOM = 7. The label AB+ is an illegal-character misspelling, FOM = 7. Thus, CORC would choose at random between BAC and AB+ for the defined label to which ABC should be equated.
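The null-usage scoring described above can be sketched as follows. This is an illustrative reconstruction, not CORC's actual code: the base misspelling figure of merit is computed by CORC's own classification rules (not reproduced here) and is passed in as a given, and the function names and usage counts are hypothetical.

```python
def null_usage_complementary(declared, undeclared):
    """True when one symbol has a null LHS usage and the other a null RHS
    usage (or vice versa), i.e. the two usage patterns complement each other."""
    return ((declared["lhs"] == 0 and undeclared["rhs"] == 0) or
            (declared["rhs"] == 0 and undeclared["lhs"] == 0))

def total_fom(base_fom, declared, undeclared):
    """Add the increment of 5 to the misspelling figure of merit when the
    declared variable's null usage complements that of the test word."""
    return base_fom + (5 if null_usage_complementary(declared, undeclared) else 0)

# Third-group case from Table 1: XYZ (declared, null RHS usage) versus the
# undeclared test word XYV (null LHS usage); the exact counts are illustrative.
xyz = {"lhs": 1, "rhs": 0}
xyv = {"lhs": 0, "rhs": 2}
print(total_fom(3, xyz, xyv))   # misspelling FOM 3 + increment 5 = 8
```

As in the paper's example, a base FOM of 3 for XYV versus XYZ becomes a total FOM of 8 once the null-RHS increment applies.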
In the second group, the defined label DEI has FOM = 15 with respect to the undefined label DE1.

In the third group, three simple variables have not been declared in the dictionary and require testing for misspelling. One should remember that only declared simple variables, i.e., XYZ and XYU, are eligible for identification with the undeclared variables. With respect to XYV, XYZ has misspelling FOM = 3; to this must be added the null-RHS increment of 5, making a total FOM = 8. Since XYU has only the misspelling FOM of 3 with respect to XYV, XYV is equated to XYZ. With respect to YXZ, XYZ has a misspelling FOM of 7, plus the null-RHS increment of 5, making a total FOM of 12; since XYU has a zero FOM for YXZ, CORC equates YXZ to XYZ. With respect to YXW, neither XYZ nor XYU has a positive FOM; thus, YXW is not equated to a declared variable.

In the fourth group, GH1 was invariably used as a vector variable. Since it is a resemblance misspelling of the declared vector variable GHI, it is equated to this variable and its status changed to a vector. GHJ was used 67 percent of the time as a vector variable; since it is a random misspelling of GHI, it is equated to the latter. GHK has a positive figure of merit with respect to each of the three preceding entries. However, GHK was never used as a vector variable. Since GHJ and GH1 have been set to vector status, GHK can no longer be equated to either of them; it thus remains a distinct, undeclared variable.

TABLE 1. SAMPLE PROBLEMS IN POST-SCANNING SPELLING CORRECTIONS

Symbol   Type              Declared/Defined?
ABC      label             no
ABCDE    label             yes
ABD      label             yes
BAC      label             yes
AB+      label             yes
DE1      label             no
DEI      label             yes
XYZ      simple variable   yes
XYU      simple variable   yes
XYV      simple variable   no
YXZ      simple variable   no
YXW      simple variable   no
GHI      vector variable   yes
GH1      simple variable   yes
GHJ      simple variable   yes
GHK      simple variable   no

V.
ERROR MONITORING DURING EXECUTION

CORC prefaces each compiled statement by a sequence of machine-language instructions to monitor object-program flow. Additional "overhead" instructions for monitoring appear in four types of statements: labelled statements, statements containing subscripted variables, REPEAT statements, and READ statements. The monitoring effort has three objectives:

(a) Prevent the object program from overwriting the CORC compiler/monitor or itself;

(b) Continue the execution phase through untested code when the flow of the object program becomes confused (through misuse of REPEAT statements or incomplete GO TO, IF, and REPEAT statements);

(c) Provide explicit diagnostic messages for each error detected at execution time, followed by an unconditional post-mortem dump of simple-variable values and other helpful data.§

A. THE GENERAL MONITOR

(1) CORC accumulates a count of all statements executed, the statement count. This count is printed in the post-mortem dump, together with the number of errors committed during the entire program and the total elapsed time for the program. The statement count has two minor functions: to aid debugging of short programs in conjunction with the "label tallies" (see (3) below) and, looking towards future CORC research, to exhibit the different speeds of execution for various programs, e.g., with/without heavy subscript usage. The per-statement overhead of the statement count is 13.2 microseconds, comprising a single "tally" instruction.

§ Many debugging languages such as BUGTRAN (cf. 6) furnish trace and snapshot information if requested by the programmer. CORC furnishes such diagnostic information unconditionally; the overhead instructions cannot be suppressed after programs are debugged.
(2) Before executing each statement, its source-card serial number (converted to a binary integer) is loaded into an index register. Execution-phase messages resulting from this statement retrieve the serial number and print it as an introductory phrase to each message, e.g., IN STATEMENT 1234, THE PROGRAM IS STOPPED. Each load-index instruction requires 3.3 microseconds. The percentage of execution time devoted to items (1) and (2) is usually less than 3 percent; see (5) below.

(3) The execution of each labelled statement is tallied, by label. These tallies are printed in the post-mortem dump; they show the progress of the program, which branches were never taken, endless loops, etc. Each tally instruction requires 13.2 microseconds.

(4) At each labelled statement, a two-position console switch is interrogated. In the normal position, the switch has no effect on program flow. If set, the switch causes the program to terminate at once, printing the message IN STATEMENT ___, THE PROGRAM IS MANUALLY INTERRUPTED, followed by the usual post-mortem dump. Thus, any endless loop can be manually interrupted without stopping the computer, although this is rarely necessary. (Cf. the subsequent section on Terminations.) The switch interrogation is required only at labelled statements, since endless loops must include at least one label. Each switch interrogation requires 7.2 microseconds. The percentage of execution time devoted to items (3) and (4) is usually less than 1 percent, as exhibited by the following analysis.

(5) Assuming that 100,000 statements are executed per minute, an average statement requires some 600 microseconds. Since items (1) and (2) aggregate 16.5 microseconds per statement, the overhead for these items is 2.75 percent. Assuming that every fourth statement is labelled, items (3) and (4) are incurred once every 2400 microseconds on the average; since these times aggregate 20.4 microseconds, their overhead is approximately 0.8 percent.
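The overhead arithmetic in item (5) can be checked with a short calculation using the instruction timings quoted above:

```python
# Average statement time: 100,000 statements per minute -> 600 us each.
STATEMENT_US = 60e6 / 100_000

count_tally_us = 13.2          # item (1): statement-count tally
load_index_us = 3.3            # item (2): load source-card serial number
every_statement_us = count_tally_us + load_index_us      # 16.5 us

label_tally_us = 13.2          # item (3): per-label tally
switch_test_us = 7.2           # item (4): console-switch interrogation
labelled_statement_us = label_tally_us + switch_test_us  # 20.4 us

overhead_1_2 = every_statement_us / STATEMENT_US          # items (1)+(2)
# Every fourth statement is labelled, i.e. once per 2400 us on average:
overhead_3_4 = labelled_statement_us / (4 * STATEMENT_US) # items (3)+(4)

print(f"{overhead_1_2:.2%}")   # 2.75%
print(f"{overhead_3_4:.2%}")   # 0.85%
```

The first figure matches the paper's 2.75 percent exactly; the second, 0.85 percent, is the "approximately 0.8 percent" quoted in the text.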
(6) No tracing features are offered in CORC. If a student requires more diagnostic data than is already furnished, he is encouraged to use WRITE and TITLE statements generously. However, he is also warned to print such data compactly:

(a) If two consecutive pages print less than 30 percent of the 14,400 character spaces available (2 pages x 60 lines/page x 120 characters/line), CORC prints out the following message: TRY TO USE MORE EFFICIENT WRITE AND TITLE STATEMENTS AND AVOID WASTING SO MUCH PAPER.

(b) A page-count limit is set for all normal programs; when this limit is reached, the program is terminated at once.

(7) Each untranslatable source card has been replaced by a TITLE card during scanning, bearing the following message: CARD NO. ___ NOT EXECUTED, SINCE UNTRANSLATABLE. These messages remind the programmer of omitted actions during the execution phase.

B. MONITORING ARITHMETIC ERRORS

CORC uses conventional procedures for arithmetic overflow/underflow errors, but somewhat novel procedures for special-function argument errors. The machine traps of the computer detect overflow/underflow conditions, which are then interpreted into CORC messages:

(1) IN STATEMENT ___, EXPONENT UNDERFLOW. (CORC zeros the accumulator and proceeds.)

(2) IN STATEMENT ___, EXPONENT OVERFLOW. (CORC sets the accumulator to 1 rather than to some arbitrary, large number. This tends to avoid an immediate sequence of identical messages, allowing the execution phase to survive longer before termination from excessive errors.)

(3) IN STATEMENT ___, DIVISION BY ZERO. ASSUME QUOTIENT OF 1.0.

For each special-function error, CORC creates an acceptable argument and proceeds, instead of taking drastic action, e.g., immediate program termination, as many monitor systems do.

(4) IN STATEMENT ___, {EXP / SIN} ARGUMENT TOO LARGE. THE RESULT IS SET TO 1.

(5) IN STATEMENT ___, LN 0 YIELDS (or ... LOG 0 YIELDS) 1.

(6) IN STATEMENT ___, {LN / LOG / SQRT} OF NEGATIVE ARGUMENT.
THE ABSOLUTE VALUE IS USED.

(7) IN STATEMENT ___, ZERO TO NEGATIVE POWER-ASSUME 1.

(8) IN STATEMENT ___, $-NEGATIVE ARGUMENT. THE RESULT IS SET TO 1.

C. TERMINATIONS

Two abnormal terminations were discussed in the General Monitor section. Altogether, there are five terminations, caused by the following events:

(1) Console switch set.

(2) Page count limit exceeded.

(3) Time limit exceeded. Overflow of the real-time clock produces a machine trap which is intercepted by CORC. For each program, a time limit (ordinarily of sixty seconds) is set. (The tab cards separating the source decks can bear any non-standard page-count and time limits.‖) When this time is exhausted, the program is terminated with the following message preceding the post-mortem dump: IN STATEMENT ___, THE TIME IS EXHAUSTED. Endless loops are terminated by this procedure, avoiding the necessity of operator intervention with the console switch.

(4) Error count too high. After each program has been compiled, the total error count is interrogated. When it exceeds 100, then or thereafter, the program is terminated with the appropriate message.

(5) Normal execution of STOP. The message IN STATEMENT ___, THE PROGRAM IS STOPPED identifies which STOP statement, possibly of several such statements, has been met.

For all terminations, the post-mortem dump includes the following: (a) The final values of all simple variables. Since arrays may comprise thousands of cells, CORC cannot afford paper or machine time to dump them too. (b) The usage tallies for all labels. (c) The first fifteen (or fewer) data-card images. (d) The error-count, statement-count, and elapsed-time figures.

‖ Ordinarily the tab cards are blank. A special rerun drawer is used for programs which require unusual output volume or running time; the computing center inserts special tab cards with non-standard page-count and time limits before these decks.

D.
MONITORING SUBSCRIPTED VARIABLES

One of CORC's most radical innovations is the universal monitoring of subscripts. CORC is attempting to trade execution efficiency for two other desiderata:

(a) Protection of the in-core compiler/monitor against accidental overwriting by student programs.

(b) Provision of complete diagnostics on all illegal subscripts: in which statements, for which variables, and the actual erroneous values of the subscripts.

CORC's excellent throughput speed has depended on infrequent destruction of the in-core compiler/monitor; in the author's opinion, subscript monitoring is CORC's most important protective feature. Criterion (b), full diagnostic information on subscript errors, is also of significance, since erroneous subscript usage comprises at least 30 percent of all execution-phase errors. Students quickly learn that these errors are among the easiest to commit, although they are spared the hardship of their detection and isolation. Subscript usage is monitored as follows:

(1) Each reference to a subscripted variable incurs a load-index instruction corresponding to the dictionary entry for this variable. If subsequent troubles arise in the subscripts, CORC can retrieve the name and other particulars of the variable by using this index register.

(2) The subscript is an arithmetic expression, whose floating-point value is transmitted in the machine accumulator to a closed subroutine for unfloating numbers.

(3) The latter subroutine checks for a positive, integral subscript.

(a) 0 is changed to 1 with the following message: IN STATEMENT ___, SUBSCRIPT FOR VARIABLE ___ IS 0. IT IS SET TO 1.

(b) Negative numbers are also changed to 1: IN STATEMENT ___, SUBSCRIPT FOR VARIABLE ___ IS NEGATIVE. IT IS SET TO 1.

(c) If non-integral, the subscript is rounded to an integer. If the round-off error is less than 10^-9, no error message is incurred; earlier calculations may have introduced small round-off errors into a theoretically exact subscript. If the round-off error exceeds 10^-9, the following message appears: IN STATEMENT ___, SUBSCRIPT FOR VARIABLE ___ IS NON-INTEGRAL. IT IS ROUNDED TO AN INTEGER.

(d) After verifying (or changing to) a positive, integral subscript, the closed subroutine for unfloating subscripts returns control to the size test peculiar to this variable.

(4) The subscript is tested for exceeding the appropriate dimension of the array variable. Thus, the first subscript of a matrix variable is tested against the declared maximum number of rows, and the second subscript is tested against the declared maximum number of columns; a vector subscript is tested against its declared maximum number of elements. An excessive value incurs one of the three following messages: IN STATEMENT ___, ___ IS THE {FIRST / SECOND / VECTOR} SUBSCRIPT FOR THE VARIABLE ___. SINCE IT IS EXCESSIVE, IT IS REPLACED BY THE VALUE ___. The second blank in the message is filled with the current execution-phase value of the subscript. The third and fourth blanks are filled with the variable name and its maximum allowable subscript. This action serves to repair the erroneous subscript but hardly to correct it.

The overhead for each error-free usage of a subscript is 85 microseconds. With obvious waste of effort, this overhead is incurred six times for the statement: LET A(I,J) = B(I,J) + C(I,J). Future versions of CORC may treat such repeated usage of identical subscripts with more sophistication. However, one must remember that "A," "B," and "C" could have different maximum dimensions in this example. A row subscript legal for "A" might be excessive for "B," etc. Also, in statements such as LET A(I) = A(I + 1), one must corroborate the legal size of "(I + 1)" as well as that of "I." The per-program overhead of subscript monitoring varies between 0 percent and 90 percent of the execution time, as one might guess.
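The repair rules of steps (3) and (4) above amount to the following sketch (illustrative Python with hypothetical names, not CORC's actual closed machine-language subroutine; messages are collected rather than printed):

```python
def repair_subscript(value, max_dim, messages, statement_no, var_name):
    """Return a legal 1-based subscript, repairing the value as CORC does:
    round near-integers silently, force 0 or negative values to 1, and
    clamp an excessive subscript to the declared maximum dimension."""
    rounded = round(value)
    if abs(value - rounded) > 1e-9:  # step (3c): tolerate tiny round-off errors
        messages.append(f"IN STATEMENT {statement_no}, SUBSCRIPT FOR VARIABLE "
                        f"{var_name} IS NON-INTEGRAL. IT IS ROUNDED TO AN INTEGER.")
    sub = int(rounded)
    if sub == 0:                     # step (3a)
        messages.append(f"IN STATEMENT {statement_no}, SUBSCRIPT FOR VARIABLE "
                        f"{var_name} IS 0. IT IS SET TO 1.")
        sub = 1
    elif sub < 0:                    # step (3b)
        messages.append(f"IN STATEMENT {statement_no}, SUBSCRIPT FOR VARIABLE "
                        f"{var_name} IS NEGATIVE. IT IS SET TO 1.")
        sub = 1
    if sub > max_dim:                # step (4): size test against the dimension
        messages.append(f"IN STATEMENT {statement_no}, {sub} IS THE SUBSCRIPT FOR "
                        f"THE VARIABLE {var_name}. SINCE IT IS EXCESSIVE, IT IS "
                        f"REPLACED BY THE VALUE {max_dim}.")
        sub = max_dim
    return sub
```

As the text notes, the repaired value keeps the program running; it does not claim to be the subscript the programmer intended.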
An average overhead of 15 percent has been measured for a representative batch of programs.

E. MONITORING REPEATED BLOCKS

(1) Each repeatable block is legally used only as a closed subroutine. Hence, the exit instruction from the block (machine code generated by its END pseudo-statement) can be used to trap any illegal prior branch to an interior statement of the block. (One cannot enter a block by advancing sequentially through its BEGIN pseudo-statement. However, one can illegally branch to an interior statement of a repeatable block from a statement physically outside the block.) When the block is properly entered by a REPEAT statement, the address of the exit instruction is properly set; after the repetitions have been completed, a trap address is set into this exit instruction before the program advances beyond the REPEAT statement. Thus, program flow can physically leave and re-enter a repeatable block in any complex pattern, as long as the block has been properly "opened" by a REPEAT statement and has not yet been "closed" by completion of the repetitions. In this respect, CORC allows more complex branching than most compilers. When the exit instruction traps an illegal prior entry, CORC prints the following message: IN STATEMENT ___, AN ILLEGAL EXIT FROM BLOCK ___ HAS JUST BEEN DETECTED. IN SOME PREVIOUS GO-TO STATEMENT, THE BLOCK WAS ILLEGALLY ENTERED. THE PROGRAM CONTINUES AFTER THE END STATEMENT OF THIS BLOCK.

(2) To protect against various illegal usages of the bound variable in REPEAT-FOR statements, CORC pre-calculates the number of repetitions and conceals this count from the repeatable block; the count is fetched, decremented, and tested only by the REPEAT statement. This discussion is amplified in (d) below. Consider the statement REPEAT B FOR V = (E1, E2, E3):

(a) If E1 = E3, the block is executed once.

(b) Otherwise, if E2 is zero, CORC prints the following message: IN STATEMENT ___, IN REPEAT-FOR TRIPLE, SECOND ARGUMENT IS 0.
THE REPEAT IS EXECUTED ONCE.

(c) Otherwise, if (E3 - E1)/E2 is negative, CORC prints the following message: IN STATEMENT ___, IN REPEAT-FOR TRIPLE, SECOND ARGUMENT HAS WRONG SIGN. THE REPEAT IS EXECUTED ONCE.

(d) Otherwise, CORC uses the count [(E3 - E1)/E2], the integer part of the quotient, to determine the number of repetitions. This count is reduced by 1 for each iteration, irrespective of the subsequent values of "V," "E2," or "E3." Novice programmers often manipulate "V" inside repeatable blocks; CORC prevents many potentially endless loops by ignoring this manipulation.

F. MONITORING DATA-CARD INPUT

The reading and checking of data cards was introduced in Section IV. In brief, a READ statement causes the following steps to occur.

(1) A new card is read in; if it is of type 1, CORC assumes it to be the first card of the next source deck. Thereupon, the following messages appear: THE INPUT DATA HAS BEEN EXHAUSTED. IN STATEMENT ___, CORC SUPPLIES A DATA CARD FOR THE VARIABLE ___ WITH VALUE 1.0. Thus, CORC enters a value of 1 for the READ variable and proceeds with the program; subsequent READ statements incur only the second message above.

(2) If the new card is neither type 1 nor type 4 (i.e., the correct type), CORC prints this message: IN STATEMENT ___, THE CARD IS ASSUMED TO BE A DATA CARD.

(3) If the new card is type 4 (possibly as the result of (2) above), CORC checks the variable field against the variable name in the READ statement. If they disagree, CORC considers the name in the READ statement to be correct; the following message is printed: IN STATEMENT ___, THE VARIABLE ___ WAS READ FROM THE CARD. THE VARIABLE IN THE READ STATEMENT WAS ___.

(4) When the variable names have been reconciled, CORC checks for none, one, or two subscripts on the card, as appropriate to the READ variable. Missing or erroneous subscripts incur the following message: IN STATEMENT ___, THE SUBSCRIPT ( , ) WAS READ FROM THE CARD.
THE SUBSCRIPT IN THE READ STATEMENT WAS ( , ), or IN STATEMENT ___, THE SUBSCRIPT ( ) WAS READ FROM THE CARD. THE SUBSCRIPT IN THE READ STATEMENT WAS ( ). In every case, CORC uses the value in the READ statement.

VI. CONCLUSIONS

A. EXPERIENCE IN PRACTICE

Throughout the 1962-63 academic year, CORC was in "pilot project" status; in 1963-64 CORC was established as the fundamental computing tool for undergraduate engineering courses at Cornell. In the spring semester of 1964, over 15,000 CORC programs were run, peaking at 2500 programs in one week. The performance of CORC programmers far surpassed the preceding years' performance by ALGOL programmers at Cornell in such respects as speed of language acquisition, average number of re-runs per program, and average completion time for classroom assignments. Actual processing time can be evaluated from the following figures, which are rough estimates based on last year's experience with CORC programs:

(a) Average processing time (tape/tape configuration): 500 programs per hour.

(b) Average machine-code execution rate: 100,000 source-language statements per minute, for a random sample of twenty student programs.

(c) Average compilation time for CORC programs: less than two seconds.

(d) Turnaround time for programs: one day or less, with rare exceptions.

The author has automated the operation of the compiler/monitor to the following degree: only a random machine malfunction can cause the computer to halt. Since programming errors cannot produce object code that erroneously diverts control outside the CORC system, the role of the machine operator is merely to mount input tape reels and remove output tapes: the computer console needs almost no attention. A few error-detection procedures were altered during 1962-64, primarily to make diagnostic messages increasingly explicit.
A new CORC manual was prepared for instructional use in 1963-64; this manual omitted any catalogue of errors, since the author expected that the compiler/monitor systems could describe the errors, and the corresponding remedial actions, in satisfactory detail.

CORC has imposed a modest load on the two computers at Cornell. The computing center is satisfied that neither FORTRAN nor ALGOL can lighten this load, which is rarely as much as two hours of CORC runs daily. (FORTRAN and ALGOL systems have greater capability but require more facility in programming. The class of problems for which CORC has been developed would not warrant the expenditure of time required to program in the advanced languages.) In the author's opinion, this small commitment of resources is well-justified by the educational value of the CORC project.

B. POTENTIAL UTILITY OF CORC

The author feels that many universities and technical colleges can profitably utilize CORC for introductory instruction. The designers of CORC are convinced that a simple language is well suited for initial study; many Cornell students have already easily advanced to FORTRAN or ALGOL after mastering CORC.

With respect to the error-detection and error-correction features, CORC demonstrates the modest effort required to furnish intelligible messages and how little core memory and machine time are consumed. Many CORC error-monitoring procedures deserve consideration in future implementations of compiler languages: unconditional counts of statement labels (or statement numbers), source-program citations in diagnostic messages, and brief dumps following all program terminations. The monitoring of subscripts would not be burdensome if the latter were carried as integers; index registers are used in most current compiled codes. Ninety percent of the CORC subscript-usage execution time is devoted to unfloating numbers, and only ten percent is devoted to testing these numbers for size.

C.
POTENTIAL IMPROVEMENTS IN CORC

Four areas for significant improvements in CORC are as follows:

(1) Identification of integer-mode variables by their context. Index registers can then be used for arrays and loop counting, as in FORTRAN.

(2) A problem-grading mechanism. Each instructor can assign a scale of penalties for various errors. CORC will process his batch of student programs and assign the appropriate grades.

(3) A permanent file for tabulating errors. Each time that CORC programs are run, an auxiliary output device (paper tape or punched cards) will record the serial number of each error committed. Periodically, these tapes or cards will be summarized. This data will furnish statistical estimates for the a priori occurrence probabilities of the errors.

(4) Remote consoles. These are much discussed in current computer literature, and they hold unusual promise for high-volume university operation. Students would type in their programs from keyboards distributed around a campus covering hundreds of acres. Either these programs would interrupt a large computer programmed for real-time entry, or they would be stacked on tape/disk by a satellite computer. Perhaps results could be printed/typed at these remote stations by the satellite computer.

The author and his colleagues are well aware of shortcomings in the language. However, they intend to resist changes which increase the power of the syntax at the expense of linguistic simplicity. Changes on behalf of additional simplicity or clarity are willingly accepted. Continuing efforts will be made to improve the clarity and explicitness of the diagnostic messages, so that classroom instruction can be further integrated with output from the computer.

VII. ACKNOWLEDGMENTS

The author is a former student of Professors Conway and Maxwell; he gratefully acknowledges their assistance to the error-correction project. Other contributors were R. Bowen, J. Evans, C. Nugent, J.
Rudan, and R. Sanderson. 2. DAMEREAU, F. J., "A Technique for Computer Detection and Correction of Spelling Errors," Gomm. AGM, 7, 171 (1964). 3. DAMEREAU, F. J., op. cit. 4. Ibid. VIII. REFERENCES 5. BLAIR, C. R., "A Program for Correcting Spelling Errors," Inform. and Gtrl., March 1960, pp. 60-67. 1. CONWAY, R. W., and MAXWELL, W. L., "CORC: The Cornell Computing Language," Comm. ACM, 6, 317 (1963). 6. FERGUSON, H. E., and BERNER, E., "Debugging Systems at the Source Language Level," Gomm. AGM, 6, 430 (1963). THE COMPILATION OF N,ATURAL LANGUAGE TEXT INTO TEACHING MACHINE PROGRAMS* Leonard Uhr University of Michigan Ann Arbor, Michigan Consultant, System Development Corporation Santa Monica, California appears to be an acceptable minimum of conventions, and then compile it (TMCOMPILE), and (2) interpret the compiled program, thus giving a running program that interacts with students (TEACH). Programmed instruction, via digital computers, must be made as painless as possible, both in the writing and the changing of programs, for the author of the programmed text. Otherwise we will only slowly accumulate a body of expensive programs that we will never succeed in testing adequately. It is crucial, given that we are investigating programmed instruction at all, that it become easy to write and re\vrite the programs. In effect, then, this is a compiler-interpreter for programs that are written in relatively unconstrained natural language (no matter which), so long as they are oriented toward the specific problem of programmed instruction, in that they conform to the format constraints described below. It is thus similar in spirit to problem-oriented compilers. Similar compilers have been coded at IBM (referred to in Maher3 ) and SDC (Estavan 1 ) . 
Despite what appear to be a significantly simpler logic and fewer conventions that must be learned, the present compiler, by means of its branching features, appears to handle a larger set of programs than IBM's, uses a somewhat simpler set of formatting rules, and offers the ability to make loose, partially ordered andlor unordered matches, to use synonyms, and to delete and insert questions conveniently. Estavan has written a program that assembles instructions telling a student where to look in a pre-assigned textbook. This program is restricted to multiple-choice questions. A great deal of research is needed as to the effectiveness of different types and sequences of items; therefore, programs must be flexible and easily changed. A large number of different programs will be needed, from many different content areas. These programs should be written by people whose competence is in these content areas. Such people cannot be expected to learn about computers, or about programming. Ideally, the problems of writing a program for computer teaching of a course in, for example, logic, French, botany, or computer programming should be no greater than the problems in writing a good book. This paper describes a set of two programs that have been written to (1) allow someone to write a program in his content area without having to learn anything new other than what * The author would like to thank Ralph Gerard for bringing the magnitude of the practical need for such a compiler to his attention, William Dttal for discussions of some of the features that such a compiler should have, and Peter Reich for suggestions as to format. 35 36 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964 Description of the Program and the Inputs It will Accept If he wishes, the author of a programmed text might sit down at the keypunch or flexowriter and compose in ihteraction with the computer. 
Or he might retire to his study and write down the set of information, questions, alternate possibilities for answers, and branches that he wants, and ask a keypunch operator to put these onto cards for compiling. In either of these two modes he must follow a few conventions, as described below. Or, if he insists upon his freedom, he might simply be asked to write his text in any way he desired, subject only to the restriction that it follow the very general format of containing only: (1) statements giving information to the student, and (2) questions about this information, either (a) multiple-choice, (b) true-false, or (c) correctly answerable in a concise way, with the various acceptable crucial parts of answers listed by the author right after the question and acceptable synonyms listed in a synonym dictionary. In addition, for each alternative answer (or set of alternatives), the author should say what question or statement of fact the program should ask of the student next. (Or, alternatively, if the author does not bother to specify this, the program will simply go to the next item-the next statement or question-according to the order that the author has given them.) A text written in such a way could easily be formatted by the keypunch operator who punched it onto cards. In general, then, the type of text that the author must write must be a set of strings which are either statements (of information) or questions. The questions must be followed by the alternate possible answers, and each set of alternate possible answers must be followed by an explicit or implied branch to another string in the text. Figures lA and 2A give examples of such texts. If the author is willing to go to a little bit of trouble, he will produce the texts of Figures lA and 2A in a form that will be compiled directly. Figures IB and 2B show what these texts would look like then. 
If the author makes use of the computer as he writes, he can delete strings that he would like to change, by means of an instruction to "erase string i," and then, if he wishes, write in the new version of string i. He can also ask the program to begin teaching him (or others), to collect data on successes and failures, and to give him a feeling of the program from the student's point of view.

Rules for Format

A. The peculiarities of this language that the user must learn are as follows:

1. A new item must be identified by *NAME.
2. Items are composed of elements, and all elements are bounded by slashes (/).
3. The following things are elements: (a) the entire statement giving information or advice, (b) the entire question, (c) each alternative possible answer to a question, (d) the branch to the next string to be presented to the subject.
4. The branch element must start with an asterisk (*).

B. If he so desires, the author can gain a good bit of additional flexibility by using the following additional features of the language:

5. The NAME is optional: if none is given, the program names this string with the integer one greater than the last integer name given. The name can either be an integer (in which case care must be taken that it is never automatically assigned by the program) or a string of alphanumeric characters.
6. An "otherwise" branch (**) for the entire question is optional, and goes at the end of the answer portion of a question.
7. Partial matches between a student's answer and an acceptable answer will be accepted if they fulfill the following criteria: (a) if a word in the answer is listed in a synonym dictionary that has been read into the program as equivalent to a word that the student uses, (b) if the correct answer is a connected substring of the student's answer, (c) if a correct answer is specified as a list of substrings separated by commas and pe-

THE COMPILATION OF NATURAL LANGUAGE TEXT

A. In Need of Pre-editing.
TO TELL WHETHER AN OBTAINED DIFFERENCE IS SIGNIFICANT, YOU MUST KNOW WHETHER IT IS LARGER THAN MIGHT ARISE FROM SAMPLING VARIABILITY. SAMPLING VARIABILITY IS DUE TO ACCIDENTAL OR CHANCE FACTORS THAT AFFECT THE SELECTION OF OBSERVATIONS INCLUDED IN THE SAMPLE. THESE CHANCE FACTORS OBEY THE LAWS OF PROBABILITY; FROM THESE LAWS YOU CAN CALCULATE HOW BIG A DIFFERENCE MIGHT BE EXPECTED BETWEEN TWO SAMPLES DRAWN FROM THE SAME POPULATION. THE LAWS OF PROBABILITY APPLY ONLY TO SAMPLES THAT CAN BE SHOWN TO BE RANDOM SAMPLES.

A RANDOM SAMPLE MUST BE SELECTED IN A WAY THAT GIVES EVERY OBSERVATION IN THE ------ BEING SAMPLED AN EQUAL CHANCE OF BEING INCLUDED. ANSWER: POPULATION.

WHEN THE NAMES OF THE STUDENTS IN A COLLEGE ARE WRITTEN ON IDENTICAL SLIPS AND ARE DRAWN OUT OF A HAT BY A BLINDFOLDED PERSON, THE SAMPLE SO DRAWN IS A ------ SAMPLE BECAUSE EACH MEMBER OF THE POPULATION WOULD HAVE AN ------ OF BEING INCLUDED. ANSWER: RANDOM ... EQUAL CHANCE.

A SAMPLE THAT IS NOT RANDOM IS BIASED. IF SOME OF THE STUDENTS' NAMES WERE NOT IN THE HAT, THE SAMPLE DRAWN WOULD BE ------. ANSWER: BIASED.

B. Prepared for Automatic Compilation.

*/TO TELL WHETHER AN OBTAINED DIFFERENCE IS SIGNIFICANT, YOU MUST KNOW WHETHER IT IS LARGER THAN MIGHT ARISE FROM SAMPLING VARIABILITY. SAMPLING VARIABILITY IS DUE TO ACCIDENTAL OR CHANCE FACTORS THAT AFFECT THE SELECTION OF OBSERVATIONS INCLUDED IN THE SAMPLE. THESE CHANCE FACTORS OBEY THE LAWS OF PROBABILITY; FROM THESE LAWS YOU CAN CALCULATE HOW BIG A DIFFERENCE MIGHT BE EXPECTED BETWEEN TWO SAMPLES DRAWN FROM THE SAME POPULATION. THE LAWS OF PROBABILITY APPLY ONLY TO SAMPLES THAT CAN BE SHOWN TO BE RANDOM SAMPLES/
*/A RANDOM SAMPLE MUST BE SELECTED IN A WAY THAT GIVES EVERY OBSERVATION IN THE ------ BEING SAMPLED AN EQUAL CHANCE OF BEING INCLUDED/POPULATION/
*/WHEN THE NAMES OF THE STUDENTS IN A COLLEGE ARE WRITTEN ON IDENTICAL SLIPS AND ARE DRAWN OUT OF A HAT BY A BLINDFOLDED PERSON, THE SAMPLE SO DRAWN IS A ------ SAMPLE BECAUSE EACH MEMBER OF THE POPULATION WOULD HAVE AN ------ OF BEING INCLUDED/RANDOM.EQUAL CHANCE.=1/
*/A SAMPLE THAT IS NOT RANDOM IS BIASED. IF SOME OF THE STUDENTS' NAMES WERE NOT IN THE HAT, THE SAMPLE DRAWN WOULD BE ------/BIASED/

Figure 1. A Sequence Typical of Those Found in Programmed Instruction Texts. Figure 1a. In Need of Pre-editing. Figure 1b. Prepared for Automatic Compilation.

A. In Need of Pre-editing.

JOHN LIKES MARY BROWN. WHO DOES JOHN LIKE? MARY BROWN. B. MARY. A.
A. MARY WHO? BROWN. B; OTHERWISE TO 1ST.
B. BUT MARY LIKES PHIL AND PHIL LIKES BETTY. DOES MARY LIKE BETTY? YES OR NO. C. DON'T KNOW. D.
C. YOU REALLY CAN'T KNOW FROM WHAT YOU'VE BEEN TOLD. IF ONE PERSON LIKES A SECOND PERSON WHO LIKES A THIRD, IT'S NOT CERTAIN THAT THE FIRST PERSON LIKES THE THIRD.
D. JOHN LIKES BETTY TOO, ALONG WITH JANE, AND CAROL. WHO DOES JOHN LIKE? BETTY, MARY, JANE, OR CAROL. E. GIRLS, OR WOMEN. F. WHAT DO BETTY, MARY, JANE AND CAROL HAVE IN COMMON?
E. GENERALIZE. JOHN LIKES ------. GIRLS, OR WOMEN. F. WHO JOHN LIKES. G.
F. RIGHT. BUT NOT NECESSARILY ALL. DO YOU THINK JOHN LIKES MOST GIRLS? YES, OR MAYBE. H. NO, OR DON'T KNOW, OR NOT ENOUGH INFORMATION. I.
G. IT DOESN'T ADD MUCH TO SAY "JOHN LIKES THAT WHICH JOHN LIKES." SUCH A STATEMENT IS CALLED A TAUTOLOGY -- THERE'S NO POINT IN SAYING THE SECOND HALF ONCE YOU'VE SAID THE FIRST HALF. I.
H. NO. THIS IS A VERY FALLIBLE AND UNLIKELY SORT OF INFERENCE TO DRAW. FOR INSTANCE, JOHN CERTAINLY DOESN'T EVEN KNOW MOST GIRLS. GENERALIZATIONS OF THIS SORT ARE RISKY AT BEST, BUT AT THE LEAST YOU MUST KNOW MUCH MORE ABOUT THE TOTAL GROUP -- GIRLS -- AND ITS RELATION TO JOHN AND HOW THE PARTICULAR EXAMPLES GIVEN WERE CHOSEN. I.
I. IT SO HAPPENS THAT JOHN DOES LIKE MOST OF THE GIRLS THAT HE KNOWS. MOST MEN AND BOYS DO. BUT THERE ARE ALWAYS EXCEPTIONS. FOR EXAMPLE, JOHN DOESN'T LIKE ALICE. ON THE OTHER HAND, HE USUALLY LIKES THE GIRLS THAT PHIL LIKES. IS IT LIKELY THAT JOHN LIKES BETTY? YES. J. OTHERWISE. K.
J. RIGHT. SINCE PHIL LIKES BETTY AND JOHN TENDS TO LIKE GIRLS AND TENDS TO LIKE GIRLS THAT PHIL LIKES. L.
K. THERE IS SOME REASON TO THINK YES, SINCE PHIL LIKES BETTY. L.
L. READ PAGES 7-13 OF THE TEXT.

Figure 2. A Contrived Example Exhibiting Some Features of the Program.

*/JOHN LIKES MARY BROWN/
*/WHO DOES JOHN LIKE/MARY BROWN/*B/MARY/*A/**1/
*A/MARY WHO/BROWN/*B/**1/
*B/BUT MARY LIKES PHIL AND PHIL LIKES BETTY/
*/DOES MARY LIKE BETTY/YES/NO/*C/N.T.KNOW.=2/*D/
*C/YOU REALLY CAN'T KNOW FROM WHAT YOU'VE BEEN TOLD. IF ONE PERSON LIKES A SECOND PERSON WHO LIKES A THIRD, IT'S NOT CERTAIN THAT THE FIRST PERSON LIKES THE THIRD/
*D/JOHN LIKES BETTY TOO, ALONG WITH JANE, AND CAROL/
*/WHO DOES JOHN LIKE/BETTY,MARY,JANE,CAROL,=0/*E/GIRLS,WOMEN,=0/*F/WHAT DO BETTY, MARY, JANE, AND CAROL HAVE IN COMMON/
*E/GENERALIZE. JOHN LIKES ------/GIRLS,WOMEN,=0/*F/WHO.JOHN.LIKES.=2/*G/**1/
*F/RIGHT. BUT NOT NECESSARILY ALL. DO YOU THINK JOHN LIKES MOST GIRLS/YES/MAYBE/*H/NO/DON'T KNOW/NOT ENOUGH INFORMATION/*I/
*G/IT DOESN'T ADD MUCH TO SAY "JOHN LIKES THAT WHICH JOHN LIKES." SUCH A STATEMENT IS CALLED A TAUTOLOGY -- THERE'S NO POINT IN SAYING THE SECOND HALF ONCE YOU'VE SAID THE FIRST HALF/*I/
*H/NO. THIS IS A VERY FALLIBLE AND UNLIKELY SORT OF INFERENCE TO DRAW. FOR INSTANCE, JOHN CERTAINLY DOESN'T EVEN KNOW MOST GIRLS. GENERALIZATIONS OF THIS SORT ARE RISKY AT BEST, BUT AT THE LEAST YOU MUST KNOW MUCH MORE ABOUT THE TOTAL GROUP -- GIRLS -- AND ITS RELATION TO JOHN AND HOW THE PARTICULAR EXAMPLES GIVEN WERE CHOSEN/
*I/IT SO HAPPENS THAT JOHN DOES LIKE MOST OF THE GIRLS THAT HE KNOWS. MOST MEN AND BOYS DO. BUT THERE ARE ALWAYS EXCEPTIONS. FOR EXAMPLE, JOHN DOESN'T LIKE ALICE. ON THE OTHER HAND, HE USUALLY LIKES THE GIRLS THAT PHIL LIKES/
*/IS IT LIKELY THAT JOHN LIKES BETTY/YES/*J/**K/
*J/RIGHT. SINCE PHIL LIKES BETTY AND JOHN TENDS TO LIKE GIRLS AND TENDS TO LIKE GIRLS THAT PHIL LIKES/*L/
*K/THERE IS SOME REASON TO THINK YES, SINCE PHIL LIKES BETTY/*L/
*L/READ PAGES 7-13 OF THE TEXT/

Figure 2b. Prepared for Automatic Compilation.

riods, and ending with a number, e.g., /XX,XX,XX.XX.=N/, the program will look for an unordered match of the substrings terminating in commas, and an ordered match (starting from the first ordered substring) of the substrings terminating in periods. It will count the number of such matches it gets, and, if this is greater than N, it will accept the student's answer.

To summarize briefly, a new item must start with an *. Its elements (statement of fact, question, alternate answer, branch) must be bounded by /. An item with more than one element is treated as a question. An item can have an optional numerical or symbolic name. A branch for any set of alternate answers can be specified by *, and an "otherwise" branch by **.
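One reading of the comma/period rule: comma-terminated substrings may match anywhere in the student's answer, period-terminated substrings must match in left-to-right order, and the total count of matches must exceed N. The sketch below is my rendering of that rule, not the published SNOBOL implementation; the function name is invented.

```python
def partial_match(spec, student_answer):
    """spec looks like 'XX,XX,XX.XX.=N': comma-terminated parts may match
    in any order, period-terminated parts must match in order; accept the
    answer if more than N of the parts are found in it."""
    body, threshold = spec.rsplit("=", 1)
    threshold = int(threshold)
    answer = student_answer.upper()
    # Collect the parts with their terminators, scanning left to right.
    parts, current = [], ""
    for ch in body:
        if ch in ",.":
            parts.append((current, ch))
            current = ""
        else:
            current += ch
    hits = 0
    cursor = 0  # ordered (period-terminated) parts must appear left to right
    for text, term in parts:
        if term == ",":
            if text in answer:
                hits += 1
        else:
            pos = answer.find(text, cursor)
            if pos >= 0:
                hits += 1
                cursor = pos + len(text)
    return hits > threshold

# The paper's /JA,SA,JO,=1/ item accepts any answer containing
# more than one of the fragments JA, SA, JO:
print(partial_match("JA,SA,JO,=1", "JANE AND SALLY"))
```

Under this reading, "JANE AND SALLY" matches the fragments JA and SA, giving two hits against a threshold of 1, so the answer is accepted.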
The following is a short example:

*1/JOHN LIKES JANE,SALLY,JO,AND BETTY./
*A/WHO DOES JOHN LIKE/JANE,SALLY,BETTY,JO,=0/*B/GIRLS/*C/MA,SANTA,MO,=0/* /**1/
*/DON'T BE IRRELEVANT/*A/
*C/BE MORE SPECIFIC/*A/
*B/BILL LIKES MARY,ANN,JANE,RUTH,SALLY,AND JO./
*/NAME TWO GIRLS BOTH JOHN AND BILL LIKE./JA,SA,JO,=1/* /**B/

DISCUSSION

Optional Modes of Operation

The program will automatically refrain from asking a question that has previously been answered correctly with a frequency above a tolerance parameter, t, or if the student, at the time he answered the question correctly, also said "*EASY*." Several other features are optional, dependent upon whether special flags have been raised for the particular run. Thus, when desired, the program will print out any or all of the following in response to a student's answer when a set of answers is required: "YOU ARE RIGHT TO SAY-" followed by the correct elements of the student's answer, "YOU ARE WRONG TO SAY-" followed by the incorrect elements of the student's answer, and "YOU SHOULD HAVE SAID-" followed by those elements that the student left unsaid.

The compiler and interpreter programs were coded in SNOBOL (Farber 2) for the IBM 7090. As presently coded, the interpreter program handles only one student, accumulating the frequency of his success and failure on each question. If many consoles were used, each console would have a name and the different students would time-share the program. It seemed futile to add this to the present program (although it would be trivial to do so), since SNOBOL has no provision for reading in from on-line sources. Figure 3 gives examples of a compiled program and its interactions with a student.

Some Examples of Types of Material That Can Be Handled

The person writing the text to be compiled has a great deal of latitude in formatting his material.
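The three replies described under Optional Modes of Operation above amount to partitioning the answer elements into three sets: said and required, said but not required, and required but unsaid. A minimal sketch of that partition, assuming answers are compared as sets of elements (the function name is mine):

```python
def feedback(required, student_parts):
    """Split a student's answer into correct, incorrect, and missing
    elements, mirroring the RIGHT/WRONG/SHOULD HAVE SAID replies."""
    required = set(required)
    said = set(student_parts)
    lines = []
    if said & required:
        lines.append("YOU ARE RIGHT TO SAY- " + ", ".join(sorted(said & required)))
    if said - required:
        lines.append("YOU ARE WRONG TO SAY- " + ", ".join(sorted(said - required)))
    if required - said:
        lines.append("YOU SHOULD HAVE SAID- " + ", ".join(sorted(required - said)))
    return lines

for line in feedback({"BETTY", "MARY", "JANE"}, {"MARY", "PHIL"}):
    print(line)
```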
The present set of programs will handle a wide variety of question and statement formats, including multiple-choice, true-false, fill-ins (either connected or disconnected, ordered or unordered), short-answer questions, and essays. The limitations of the short-answer type of question lie in the ability of the people who specify the alternate acceptable answers and the synonym dictionary. The key parts of the answer might be very loosely stated. A statement might impart information, or make a comment about the student's performance, or it might command the student to read a certain section of a certain book or perform a certain series of exercises. A branch might be to a question that underlies, forms a part of, or supplements the question missed (or got). Separate branches can be established for different answers with different implications and for different partial answers. With such programs the distinction between teaching, testing, and controlling the student becomes an arbitrary one. Thus a compiled program might be used to train the student in some content area, to simultaneously train and test, to give a final examination, or to run an experiment that explored the student's abilities under some specified conditions and treatments.

Possible Extensions to the Present Program

The program that has been coded is a simple first attempt toward what might be done, such as the following.

A. Rather than branch to a single string, the strings could belong to one or more classes, and the branch could then be to a class that contained several strings. For each particular execution of the branch, a random choice could be made; or, better, this choice could be a function of the difficulty of the different members of the class.

B. Frequencies of successes and failures could be collected for (1) each student, (2) all students, (3) given types of students (e.g., high IQ, impulsive).
Then the choice of the particular branch could be a function of the appropriate individual and/or group information as to what is likely to benefit this student.

C. The decision as to what group to put a student into could be made by the program, if it compared the patterns of successes and failures across students, and put students with similar patterns into the same group (e.g., by Kendall's tau).

D. The decision as to what string to branch to after each string could be made by the program, by some rule such as the following: (1) branch to a string whose success-failure frequencies are similar to this string's, (2) branch to a string whose answer is a substring of the answer to this failed string, (3) branch to a string whose answer contains this correctly answered string.

E. Weights of specific strings can be not merely functions of success-failure of themselves, but also functions of success-failure of other strings that are related to them by, for example, (1) equivalence-relatedness as specified by the author in a simple equivalence dictionary, (2) connectedness in the sense of the graph formed by the branches cycling through the strings.

F. At least simple methods could be programmed for taking an ordinary book, breaking it up into a set of statements, interspersing questions composed by the program, and then, by pretesting with human experimental students, winnowing the questions down to a good set (e.g., (1) non-redundant, (2) suitably difficult, (3) reliable, (4) valid).

G. Answers could be recognized by additional partial and loose matches that would allow for a wider variety of alternate forms, for example, misspelled words, than can be recognized at present.

H. The program could systematically collect alternate answers (e.g., from students that it judges to be pretty good) and occasionally ask its teacher whether these would in fact be acceptable alternates. It would then add these to its memory. It could similarly augment its synonym dictionary.
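Extension C, grouping students whose success-failure patterns are similar, can be approximated with any pairwise similarity measure over the pattern vectors. The paper suggests Kendall's tau; for brevity the sketch below uses a plain fraction-of-agreement score and a greedy grouping, both my own simplifications.

```python
def agreement(a, b):
    """Fraction of questions on which two success/failure records agree."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def group_students(records, threshold=0.75):
    """Greedily put each student into the first group whose seed
    pattern agrees with theirs at or above the threshold."""
    groups = []  # each group: (seed_pattern, [student names])
    for name, pattern in records.items():
        for seed, members in groups:
            if agreement(seed, pattern) >= threshold:
                members.append(name)
                break
        else:
            groups.append((pattern, [name]))
    return [members for _, members in groups]

records = {"s1": [1, 1, 0, 1], "s2": [1, 1, 0, 0], "s3": [0, 0, 1, 0]}
print(group_students(records))  # [['s1', 's2'], ['s3']]
```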
*/SUPPOSE WE HAVE TWO SENTENCES, 'A' AND 'B'. THEN THE SENTENCE '(A)V(B)' IS CALLED THE DISJUNCTION (OR ALTERNATION, OR LOGICAL SUM) OF THE SENTENCES 'A' AND 'B'./
*/A SENTENCE SUCH AS '(C)V(D)' IS CALLED THE LOGICAL SUM, OR ALTERNATION, OR ------./DISJUNCTION/*/**1/
*A/WE AGREE THAT THE DISJUNCTION '(A)V(B)' IS TRUE IF AND ONLY IF AT LEAST ONE OF THE TWO SENTENCES 'A' AND 'B' IS TRUE, I.E., IF EITHER 'A' IS TRUE, OR 'B' IS TRUE, OR BOTH OF THEM ARE TRUE./
*B/IF IT IS NOT KNOWN WHETHER 'A' IS TRUE, CAN '(A)V(B)' BE TRUE/Y.E.S.=1/*/**A/
*C/IF 'A' IS FALSE, CAN '(A)V(B)' BE TRUE/YES/*/**A/
*D/IF 'A' AND 'B' ARE FALSE, CAN '(A)V(B)V(C)' BE TRUE/(CA)N.T.SA(Y).=2/*F/**G/
*E/YES, IN FACT IT CAN. BUT THIS DOES NOT YET FOLLOW FROM WHAT YOU HAVE BEEN TOLD./
*F/YOU ARE RIGHT IN SAYING THAT YOU DONT KNOW IF YOU MEAN THAT THIS IS NOT YET DECIDED./
*G/IN FACT IT CAN. BUT THIS HAS NOT YET BEEN STATED EXPLICITLY IN THE SYSTEM BEING DEVELOPED FOR YOU./
*H/THE SIGN 'V' OF DISJUNCTION CORRESPONDS WITH FAIR EXACTNESS TO THE ENGLISH WORD 'OR' IN THOSE CASES WHERE 'OR' STANDS BETWEEN TWO SENTENCES AND IS USED (AS IT MOST FREQUENTLY IS) IN THE NON-EXCLUSIVE SENSE./
*I/WITH WHAT COMMON ENGLISH WORD DOES THE SIGN 'V' CORRESPOND MOST CLOSELY/OR/*J/**H/
*J/GOOD. 'OR' IS CORRECT. CONGRATULATIONS ON FINISHING THIS LESSON./

Figure 3. A Short Example of a Computer Run That Demonstrates Some Simple Uses of the Partial Match Features. Figure 3a. Listing of the Program to be Compiled.

INTERACTIONS WITH STUDENTS FOLLOW.

INFORMATION- SUPPOSE WE HAVE TWO SENTENCES, 'A' AND 'B'. THEN THE SENTENCE '(A)V(B)' IS CALLED THE DISJUNCTION (OR ALTERNATION, OR LOGICAL SUM) OF THE SENTENCES 'A' AND 'B'.
QUESTION- A SENTENCE SUCH AS '(C)V(D)' IS CALLED THE LOGICAL SUM, OR ALTERNATION, OR ------
STUDENT ANSWERED- 'SUM'
NO, WRONG.
INFORMATION- SUPPOSE WE HAVE TWO SENTENCES, 'A' AND 'B'. THEN THE SENTENCE '(A)V(B)' IS CALLED THE DISJUNCTION (OR ALTERNATION, OR LOGICAL SUM) OF THE SENTENCES 'A' AND 'B'.
QUESTION- A SENTENCE SUCH AS '(C)V(D)' IS CALLED THE LOGICAL SUM, OR ALTERNATION, OR ------
STUDENT ANSWERED- 'DISJUNCTION'
RIGHT. A GOOD ANSWER IS- DISJUNCTION
INFORMATION- WE AGREE THAT THE DISJUNCTION '(A)V(B)' IS TRUE IF AND ONLY IF AT LEAST ONE OF THE TWO SENTENCES 'A' AND 'B' IS TRUE, I.E., IF EITHER 'A' IS TRUE, OR 'B' IS TRUE, OR BOTH OF THEM ARE TRUE.
QUESTION- IF IT IS NOT KNOWN WHETHER 'A' IS TRUE, CAN '(A)V(B)' BE TRUE
STUDENT ANSWERED- 'NO'
NO, WRONG.
INFORMATION- WE AGREE THAT THE DISJUNCTION '(A)V(B)' IS TRUE IF AND ONLY IF AT LEAST ONE OF THE TWO SENTENCES 'A' AND 'B' IS TRUE, I.E., IF EITHER 'A' IS TRUE, OR 'B' IS TRUE, OR BOTH OF THEM ARE TRUE.
QUESTION- IF IT IS NOT KNOWN WHETHER 'A' IS TRUE, CAN '(A)V(B)' BE TRUE
STUDENT ANSWERED- 'YAS'
RIGHT. A GOOD ANSWER IS- YES
QUESTION- IF 'A' IS FALSE, CAN '(A)V(B)' BE TRUE
STUDENT ANSWERED- 'WHY SHOULDN'T I SAY IT IN FRENCH- MAIS OUI, CERTAINEMENT'
RIGHT. A GOOD ANSWER IS- YES OUI
QUESTION- IF 'A' AND 'B' ARE FALSE, CAN '(A)V(B)V(C)' BE TRUE
STUDENT ANSWERED- 'THAT CAN'T REALLY BE SAID'
RIGHT. A GOOD ANSWER IS- CAN T SAY
INFORMATION- YOU ARE RIGHT IN SAYING THAT YOU DONT KNOW IF YOU MEAN THAT THIS IS NOT YET DECIDED.
INFORMATION- IN FACT IT CAN. BUT THIS HAS NOT YET BEEN STATED EXPLICITLY IN THE SYSTEM BEING DEVELOPED FOR YOU.
INFORMATION- THE SIGN 'V' OF DISJUNCTION CORRESPONDS WITH FAIR EXACTNESS TO THE ENGLISH WORD 'OR' IN THOSE CASES WHERE 'OR' STANDS BETWEEN TWO SENTENCES AND IS USED (AS IT MOST FREQUENTLY IS) IN THE NON-EXCLUSIVE SENSE.
QUESTION- WITH WHAT COMMON ENGLISH WORD DOES THE SIGN 'V' CORRESPOND MOST CLOSELY
STUDENT ANSWERED- 'AND'
NO, WRONG.
INFORMATION- THE SIGN 'V' OF DISJUNCTION CORRESPONDS WITH FAIR EXACTNESS TO THE ENGLISH WORD 'OR' IN THOSE CASES WHERE 'OR' STANDS BETWEEN TWO SENTENCES AND IS USED (AS IT MOST FREQUENTLY IS) IN THE NON-EXCLUSIVE SENSE.
QUESTION- WITH WHAT COMMON ENGLISH WORD DOES THE SIGN 'V' CORRESPOND MOST CLOSELY
STUDENT ANSWERED- 'NON-EXCLUSIVE 'OR''
RIGHT. A GOOD ANSWER IS- OR
INFORMATION- GOOD. 'OR' IS CORRECT. CONGRATULATIONS ON FINISHING THIS LESSON.

Figure 3b. Printout of Interactions with a Simulated Student.

I. It could further try to boil down sets of equivalent alternate answers, by finding things in common among them, composing a summarizing statement, and asking its teacher whether this new statement is equivalent to all the specific alternates it is presently storing. It could then substitute this new statement for the alternates that in fact were equivalent, and now look only for this common element in students' future answers.

J. It could have various methods for computing branches when appropriate to the problem domain; for example, (1) using a transform dictionary to analyze mistakes in logic or arithmetic, (2) using similarity between substrings to analyze types of mistakes in spelling.

K. The program could itself compute the correct answer, rather than having this answer stored in memory. It might then also do such things as check the sequence of a student's answer (which it would get simply by commanding the student "GIVE YOUR ANSWER STEP BY STEP") and try to analyze at what point the student went astray. It could then generate a new question either on the basis of such an analysis or as a function of the student's present level of competence.

L. Some simplifications in the basic formatting rules could be implemented with relatively little trouble. For example, the program might accept several alternative identifications for questions; e.g., "*" could be replaced by "*Q" or "*QUESTION" or "QUESTION"; "."
could be replaced by "THEN"; "," could be replaced by "AND"; "/" in the answer section could be replaced by "OR"; the "*" that marks the branch by "GO TO"; the "-" that designates erase by "ERASE". Experiments might be run to see which form is preferable. If, as seems likely, the presently implemented form is somewhat harder to learn at first, but slightly faster to use once learned (if only because fewer symbols need be typed), novices could be trained on the form that looks more English-like and then given the option of using the shorter, more cryptic symbols.

REFERENCES

1. ESTAVAN, D. P. "Coding for the class lesson assembler." FN-5633. Santa Monica, Calif.: System Development Corp., 1961.
2. FARBER, D. J., GRISWOLD, R. E., and POLONSKY, I. P. "SNOBOL, a string manipulation language." J. Assoc. Comp. Machinery, Vol. 11, No. 1, Jan. 1964, 21-30.
3. MAHER, A. "Computer-based instruction: Introduction to the IBM research project." RC-1114. Yorktown Heights, N.Y.: IBM, 1964.

METHOD OF CONTROL FOR RE-ENTRANT PROGRAMS

Gerald P. Bergin
Programming Systems
International Business Machines Corporation
New York, New York

INTRODUCTION

The use of multiprogramming and multiprocessing raises a question as to the number of copies of a routine needed in memory for multiple concurrent use. In the case where two or more scientific programs are in core at the same time, each needing the use of a SINE routine, a private copy can be provided for each program's own use, or one copy can be loaded for all to use. A message processing program that services multiple terminals can run into a situation where message A interrupts the processing of message B and, because of priority considerations, message A must be processed immediately by the program. Again, the question of how many copies of the program are required in core occurs. Finally, a multiprocessing configuration with two or more computers sharing a common core memory may each be using the FORTRAN compiler. Each computer could have its own copy of the compiler, or a single copy of the compiler could be executed by all computers concurrently. Intuitively, the provision of one copy of the routine or program appears more elegant.

Assuming the use of only one copy of each routine, the possibility that a commonly used routine may not run to completion before being entered again must now be considered. A routine which permits unlimited multiple entrances and executions before prior executions are complete is called a re-entrant routine. This paper describes a method of controlling these routines and sets forth conventions that must be followed to produce a routine that satisfies the requirements of re-entrability.

The terms used in this paper are defined to eliminate possible misinterpretation.

routine - an ordered set of computer instructions which is entered by an explicit call
program - a set of routines and associated data areas
context - the information which a routine needs to perform its functions
instance - the execution of a routine with a particular context
read-only routine - a sequence of machine instructions which is not self-modifiable or modifiable by others
re-entrant routine - a read-only routine which accepts and uses the context associated with an instance of a routine, such that multiple entrances and executions can occur before prior executions are completed
subexecution - an instance of a re-entrant routine
task - a set of one or more routines which define a unit of work, and which can compete independently for computer time
job - a collection of tasks organized and submitted by a user under a single accounting number
LIFO - abbreviation for Last In, First Out. This pertains to the retrieving of data in the reverse order in which it was stored. Also called a push-down, pop-up list
SCL - abbreviation for Single Cell available space List
ATQ - abbreviation for Active Task Queue
TCL - abbreviation for Task Control List
SCB - abbreviation for Subexecution Control Block
BAL - abbreviation for Block Available space List

RE-ENTRANT PROBLEMS

The biggest problem a re-entrant routine poses is that of referencing proper context. The routine can be made to conform to a well-defined set of conventions for its references to input, output, and working storage; this solves only part of the problem. The remainder must be resolved through the use of a monitor capable of associating context with each instance of the re-entrant routine, and of accepting responsibility for providing context reference during re-entrant executions.

Functions

The monitor functions discussed are not intended to be all-inclusive even for re-entrant routines. Functions such as I/O and interrupt control are virtually ignored since they are of little concern in this paper. The amount of control information which a monitor must create and maintain is a function of:

1. The number of unfinished instances of each re-entrant program
2. The number of unfinished subexecutions (or current levels down) for each program instance
3. The number of context pointers to data for each unfinished subexecution.

In addition, each routine requires working storage and data areas associated with a given instance. To pre-allocate all the space required for some maximum activity seems unreasonable in a dynamic environment. When activity is minimal, a large number of cells would be unusable for other purposes, and any change in the size of data blocks would require re-assembly of the system. Dynamic space allocation will circumvent some of these problems; space can be allocated as needed. When the space for a subexecution is no longer required, it is returned to available space and can be used by other subexecutions. To provide dynamic space allocation, both a single cell and a block allocation scheme were considered essential. A small block of space is pre-linked and constitutes the Single Cell available space List (SCL). This space is available for use by the monitor only. The Block Available space List (BAL) permits blocks of space of variable size to be allocated for both program and monitor storage needs. A description of the space allocation scheme is contained in Appendix A.

RE-ENTRANT CONTROL

The monitor functions which are of importance for re-entrant control include:

1. Obtaining and returning single cells and blocks of cells needed for control information
2. Determining priority of tasks and task queuing
3. Creating and terminating tasks
4. Maintaining context for each unfinished instance
5. Maintaining the data structures which reflect the activity of the re-entrant routines
6. Handling all inter-routine and intra-routine communication

Structure

To perform these functions, the monitor must have control information organized in some manner. The following data structures are, therefore, the basis of achieving the required monitor control.

Job Description Block (JDB)

Pertinent information about each job is contained in a set of contiguous locations called a Job Description Block. One JDB is created for each job. These blocks are the source of all activity to be done in regard to job processing, especially the sequence of tasks to be accomplished within each job. The set of all JDBs need not reside permanently in core, although information pertaining to some tasks may be used frequently enough to dictate its presence. The major concern this paper has with JDBs is that they exist.

Active Task Queue (ATQ)

The Active Task Queue is a list of the tasks which are in some phase of execution in the computer.
There is a scheduling procedure applied to this list to determine the next task to be activated or reactivated when interrupts occur or when a subexecution relinquishes control. The ATQ is a simple list structure composed of cells obtained from the SCL. The monitor adds or inserts an entry to the list when a new task is to become a candidate for processing in the multiprogram environment. An entry is deleted from the queue when a task is terminated, and the cell is returned to the SCL. Each entry in the ATQ contains status information, task identification, priority number, a pointer to the associated task control list, and a link to the next entry in the ATQ.

Task Control List (TCL)

Associated with each ATQ entry is a Task Control List. This list is used to establish and associate context for each level of subexecution within a task. When a task is added to the ATQ, a TCL is created for it. The first entry contains the name of the associated JDB, and the second entry points to a Subexecution Control Block (SCB). This SCB contains the pointers to the context needed for the execution of the main control routine of the task. When the execution of the control routine is initiated, each level of descent into nested subexecutions causes an entry to be added to the TCL which points to the associated SCB. When a subexecution terminates, returning to the prior level of execution, its entry is removed from the TCL. It should be noted that all transfer of control to and from subexecutions is through the monitor.

The TCL is a push-down, pop-up (LIFO) list with entry to the list through a header cell, the cell containing the name of the JDB. The header cell and push-down cells are obtained from (and later returned to) the SCL. New entries to the LIFO list are added to the top of the list with prior entries pushed down. Termination of a subexecution results in the LIFO list being popped up and the cell returned to the SCL (also returning the SCB space).
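The TCL discipline just described (push an SCB reference on each call, pop it on return and release its space) can be sketched as follows. This is only an illustration of the discipline, not IBM's implementation; the class and method names are invented.

```python
class TaskControlList:
    """Header (JDB name) plus a LIFO list of (routine, SCB) entries,
    one per unfinished level of subexecution within the task."""

    def __init__(self, jdb_name):
        self.jdb_name = jdb_name
        self.lifo = []          # top of list = current subexecution

    def call(self, routine, scb):
        # Entering a subexecution: push its context block reference.
        self.lifo.append((routine, scb))

    def ret(self):
        # Terminating: pop the entry; clearing the dict stands in for
        # returning the cell and the SCB space to the available lists.
        routine, scb = self.lifo.pop()
        scb.clear()
        return routine

tcl = TaskControlList("Terminal N")
tcl.call("T control", {"context": "X1"})
tcl.call("Interpret", {"context": "X2"})
tcl.call("SCAN", {"context": "X3"})
tcl.call("SCAN", {"context": "X4"})   # SCAN re-enters itself
print(tcl.ret())                      # innermost SCAN finishes first
```

Since all transfer of control goes through the monitor, `call` and `ret` here stand for the monitor's actions at entry and exit, not actions taken by the routines themselves.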
When the control section terminates, its entry in the TCL, the TCL header, and the entry in the ATQ are deleted, thereby terminating the task. The job description may get updated at this point. The first entry of the TCL contains a pointer to and the name of the JDB, and a link to the top of the LIFO list. Each entry on the LIFO list points to an SCB, links to the previous entry on the list, and contains the name of the subexecution.

Subexecution Control Block (SCB)

Each Subexecution Control Block contains the context or references to context associated with its related unfinished subexecution. The monitor creates an SCB when a subexecution is called. The pointer to the SCB is pushed down on the TCL as explained earlier. As a subexecution requests more space or asks for data pointers, the monitor uses the proper SCB to store or fetch the necessary information. When space is returned, the monitor updates the proper SCB appropriately. Termination of a subexecution results in its SCB being returned to available space along with the space no longer needed by the calling subexecution.

An SCB is a block of contiguous cells obtained from the block-space pool. The number of cells per block may vary depending on the anticipated requirements of the subexecution. A minimum number of cells will always be allocated to contain immediate data and pointers to normal data requirements. There are two types of entries in an SCB: immediate data entries and data pointers. Immediate data consists of "save console" information when a subexecution is entered, and "save console" and other program status information when an interrupt occurs that does not return to the interrupted code after its servicing. Data pointers are used to define the location of input, output, working storage blocks, extension of the SCB, etc.
Each data pointer contains the name of the data block, the location of a cell which points to the cell containing the upper and lower boundaries of the block, the register to be loaded with a boundary, and information concerning the return of the space which is being pointed to.

RE-ENTRANT CONTROL EXAMPLE

Table I shows five jobs which are known to the system. Each job has a Job Description Block (JDB) which was created when the job entered the system. Within each JDB is the list of tasks to be done. The monitor has scheduled three tasks, which in this case are from different jobs. The ATQ shown in Table II indicates that task "MSG Processor T" is waiting for I/O to complete, task "MSG Processor T" (a different instance of the previous task) is in execution, and task "GO" is pending and will be executed when both instances of "MSG Processor T" terminate or cannot proceed. Each entry in the ATQ points to a TCL, shown in Table III.

Task "MSG Processor T" points to the header cell K, which contains the name of its JDB (Terminal A) and points to the top of its LIFO list, k2. The "T control" routine has "called" the routine "Update," which will resume when its wait condition terminates. Task "MSG Processor T" (2nd instance) points to its TCL entry S. The header cell names the JDB (Terminal N) and points to the top of the LIFO list, S4. The main routine "T control" is three levels down in routines "Interpret," "SCAN," and "SCAN" (SCAN has re-entered itself once). The tasks at locations E1 and E2 of the ATQ are both using program "MSG Processor T," with the first using the context at W1 and W2, the second using the context at X1 through X4. Task "GO" is associated with TCL U. Its header cell names the JDB (Job-B) and points to the top of its LIFO list, U1. A program "Integrate" has been loaded and is now ready for execution with context at Y1.
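The state described in this example (and tabulated in Tables II and III) can be rendered as linked records. The names K, S, U and the W/X/Y context labels follow the tables; the dictionary layout is my own shorthand for the cell contents.

```python
# Each TCL: a header naming the JDB plus a LIFO list whose entries
# name a routine and its SCB context (top of list first).
tcl_K = {"jdb": "Terminal A",
         "lifo": [("Update", "W2"), ("T control", "W1")]}
tcl_S = {"jdb": "Terminal N",
         "lifo": [("Scan", "X4"), ("Scan", "X3"),
                  ("Interpret", "X2"), ("T control", "X1")]}
tcl_U = {"jdb": "Job-B", "lifo": [("Integrate", "Y1")]}

# Each ATQ entry: status, priority, and a pointer to its TCL.
atq = [
    {"name": "MSG Processor T", "status": "W", "priority": 3, "tcl": tcl_K},
    {"name": "MSG Processor T", "status": "E", "priority": 6, "tcl": tcl_S},
    {"name": "Go-Integrate",    "status": "P", "priority": 7, "tcl": tcl_U},
]

# The second instance of MSG Processor T is three levels down:
print([routine for routine, _ in tcl_S["lifo"]])
```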
Each entry on a LIFO list points to an SCB which associates context with the instance of a routine. A minimum of m cells has been allocated for each SCB. Table IV shows the content of the SCB for one instance of a routine.

If another task can be accommodated by the computer, a task will be chosen from the Job Description area. The following example shows what occurs to this new contender for computer time. The scheduler selects Task "Load" of Job-C, whose priority has changed to 4. The monitor gets a cell from the SCL (cell E4). This cell is inserted in the ATQ by changing the link of cell E1 to E4 and putting E2 in the link of cell E4. The console information associated with task "MSG Processor T" (of the Terminal N instance) is saved in its SCB (X4), and its status in the ATQ is set to P. The name of the Task (Load), its priority (4), and the status (E) are inserted in cell E4. Two more cells, V and V1, are obtained from the SCL to establish the TCL at V. A block of space is obtained for the SCB of the main routine of "Load." The initial context for the task is then entered in the SCB beginning at Z1, and Load is then executed. Table V is a graphic representation of the structure created in the preceding example.

RE-ENTRANT ROUTINE CONVENTIONS

By definition, a re-entrant routine is read-only in nature. Address calculations, internal indicators, subroutine parameters, and similar information must be stored and used external to the routine. The association of context to an instance of the routine is a function of the monitor and has already been treated. The following conventions are those considered important at present to get, use, and store the external data (context) required in any routine.

TABLE I. JDB'S (AUXILIARY OR MAIN STORAGE)

  Terminal A:  MSG Processor T
  Terminal N:  MSG Processor T, MSG Processor R
  Terminal Q:  MSG Processor H, MSG Processor R
  Job-B:       Compile, Load, Go, Print
  Job-C:       Load, Go, Print

TABLE II. ACTIVE TASK QUEUE (ATQ)

  Location  Link  Status  Priority  Pointer to TCL  Name
  E1        E2    W       3         K               MSG Processor T
  E2        E3    E       6         S               MSG Processor T
  E3        0     P       7         U               Go-Integrate

  W = waiting status due to I/O, etc.; P = pending; E = in execution

1. All routines must be called through the monitor.

2. Parameters required for inter-routine communication are contained in the calling routine's context. Return of control to the higher-level routine is through the monitor also, so that return can be considered an implied call. To call a routine, the monitor is entered indicating:
   a. The name of the routine being called (or returned to)
   b. The name of the register which contains the pointer to the required context (if necessary).

TABLE III. TASK CONTROL LIST (TCL)

  Location  Link  Routine Name  Pointer to SCB
  K         k2    (header)      Terminal A, loc. of JDB
  k1        0     T control     W1
  k2        k1    Update 3      W2
  S         S4    (header)      Terminal N, loc. of JDB
  S1        0     T control     X1
  S2        S1    Interpret     X2
  S3        S2    Scan          X3
  S4        S3    Scan          X4
  U         U1    (header)      Job-B, loc. of JDB
  U1        0     Integrate     Y1

3. To get a new block of cells for use, the monitor must be entered indicating:
   a. Name to be assigned to the block allocated
   b. Number of contiguous cells needed
   c. Name of the register to be used (if any)
   d. Value to be put in the register (either upper or lower boundary)
   e. Return location or action if space is not available.

4. To re-establish a pointer in a register whose contents have been changed, the monitor must be entered indicating:
   a. Name of the block
   b. Value to be used (either upper or lower boundary)
   c. Name of the register to be used if other than the previously associated register (if named, the previous association is lost).

5. To drop a register from use as a context pointer so that it can be used for other purposes, the monitor must be entered indicating the name of the pointer.

6.
To return block space to the available space pool, the monitor must be entered indicating the name of the block to be returned, or the name of a list containing the blocks to be returned.

7. The responsibility of returning space rests with the routine which obtained the space. The termination of a subexecution will, however, result in all space requested for private use being returned to the block allocation pool.

TABLE IV. SUB-EXECUTION CONTROL BLOCK (SCB) FOR THE TASK "INTEGRATE"

  Location  Entry
  Y1-0      IN 1
  Y1-1      IN 2
  Y1-2      OU 1
  Y1-3      OU 2
  Y1-4      WS 1
  Y1-5      WS 2
  Y1-6      SCB
  Y1-7      CONS  (immediate data)
  Y1-7+i    CONI  (immediate data)
  (through Y1-m)

  Each data-pointer entry also carries a control code, a register name, and a
  pointer to the location pointing to the data.

  Control codes: 1 = can be returned; 2 = returned; 3 = common data base
  IN = input; OU = output; WS = working storage; SCB = additional SCB for
  additional space if required; CONS = console save data; CONI = console save
  data plus program status information at an interrupt

SUMMARY

Association of context for an instance of a routine has been achieved through the use of control information created by the monitor or furnished to the monitor via program conventions. The organization of the control information is in the form of a list structure for ease of inserting and deleting new data. The Active Task Queue is a list, ordered on priority, used primarily for task sequencing. The Task Control Lists are LIFO lists which relate context in the Subexecution Control Blocks to routines which are associated with Tasks in the Active Task Queue. This method of control, in conjunction with the conventions that a routine must follow, allows multiple entrances and executions of a routine before prior executions are completed.

TABLE V.
GRAPHIC REPRESENTATION OF THE MONITOR DATA STRUCTURE

[Figure: the ATQ entries E1 through E4 link to the Task Control Lists K, S, U, and V; each TCL entry points to its SCB data block (W1, W2, X1 through X4, Y1, Z1). The diagram itself does not survive transcription.]

APPENDIX A

SPACE ALLOCATION

The structure of the control data needed for re-entrant and recursive routines is based on list-structure concepts. The use of a list-structure approach requires being able to obtain and return space dynamically. Part of the space needed must be prestructured or linked in the IPL-V (Newell, Simon and Shaw)* manner. Availability of space in blocks of contiguous cells is also required as a compromise for efficient use of core storage. The following is a description of a single-cell and block allocation scheme that was developed and implemented on the IBM 7094 by Mr. M. R. Needleman of WDPC-UCLA.†

SINGLE CELL ALLOCATION

A relatively small number of contiguous cells are linked together to form the Single Cell available space List (SCL). A fixed cell is maintained which always points to the next available cell on the list. Table A-I shows this construction. The allocation routine allocates a cell by giving the requestor the name of a cell (the address α) and updates the link of A to point to the next available cell on the list, which is β. Table A-II shows the result of allocating 1 cell.
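The single-cell discipline of Tables A-I through A-III can be sketched as follows. This is our illustration, not the original 7094 list code; cell addresses are simulated by strings:

```python
# Sketch of the single-cell available-space list (SCL): the fixed cell A
# always holds the address of the next free cell; allocation hands out
# that cell, and a returned cell is pushed back at the top of the list.
class SCL:
    def __init__(self, addresses):
        self.link = {}                 # address -> link field of that cell
        self.head = None               # contents of the fixed cell A
        for addr in reversed(addresses):
            self.link[addr] = self.head
            self.head = addr           # first address ends up on top

    def allocate(self):
        addr = self.head               # give the requestor this cell...
        self.head = self.link[addr]    # ...and point A past it
        return addr

    def free(self, addr):
        self.link[addr] = self.head    # former head goes into the cell's link
        self.head = addr               # A now points to the returned cell

# Mirroring Tables A-I through A-III (cell names are illustrative):
scl = SCL(["alpha", "beta", "gamma"])
got = scl.allocate()                   # Table A-II: alpha handed out
scl.free("T")                          # Table A-III: T pushed on top
```

Both operations touch only the head of the list, so allocation and return each cost a constant number of cell references.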
When a cell T is returned to the available space list, it is inserted at the top of the SCL as follows:

1. Cell A, which contains the pointer to the next available cell on the SCL, is modified to point to T.
2. The former pointer β is put into the link portion of cell T.

Table A-III shows the results of this process.

* The Rand Corp., Santa Monica, Calif.; Newell, A., ed., "Information Processing Language-V Manual," Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1961.
† Western Data Processing Center, University of California, Los Angeles 24, California. The scheme was developed by WDPC under contract with the Advanced Research Projects Agency (Contract No. SD 184), Office of the Director of Defense Research and Engineering, Pentagon, Washington, D. C.

TABLE A-I. SINGLE CELL AVAILABLE SPACE LIST (SCL)

  Location        Link
  A (fixed loc.)  α
  α               β
  β               γ
  γ               0

TABLE A-II. SINGLE CELL AVAILABLE SPACE LIST (SCL) AFTER ALLOCATING ONE CELL

  Location        Link
  A (fixed loc.)  β
  β               γ
  γ               0

TABLE A-III. SINGLE CELL AVAILABLE SPACE LIST (SCL) AFTER RETURN OF ONE CELL

  Location        Link
  A (fixed loc.)  T
  T               β
  β               γ
  γ               0

BLOCK ALLOCATION

The second type of space allocation is called block allocation. A block of contiguous cells is reserved for this type of allocation. Two lists are used to identify the space available and the space allocated. Cells for both lists are obtained from the Single Cell available space List. Each entry in the block allocation list contains: a flag indicating whether or not the block is currently available; a pointer to the cell containing the addresses of the first and last locations of the block; and a link to the next cell on the Block Allocation List. The first cell on the list (header cell) always links to the last cell put on the list. Table A-IV shows the block allocation list after three requests for space. Each cell on the block-limits list contains the address of the first and of the last cell allocated as a block by the block allocation routine, and is in one-to-one correspondence with the Block Allocation List. An example of this list used in conjunction with the Block Allocation List (BAL) and a core map of the block allocation space pool is shown in Table A-IV.

TABLE A-IV. BLOCK ALLOCATION LIST (BAL)

  Location  Link  Flag  Pointer
  α         ζ     0     β
  ζ         δ     1     η
  δ         γ     1     ε
  γ         0     1     Δ

  Flag = 0: block is available; Flag = 1: block is being used

BLOCK LIMITS LIST (BLL)

  Location  First Location  Last Location  (Allocated to)
  β         10K             20K            (available)
  Δ         40K+1           45K            (Z1)
  ε         30K+1           40K            (Z2)
  η         20K+1           30K            (Z3)

SPACE POOL MAP

  10K-20K available | 20K+1-30K Z3 | 30K+1-40K Z2 | 40K+1-45K Z1

In general, the user requests a block of N cells. The allocator assigns space and returns to the user via the monitor. The monitor sets the address of the cell containing the address of the first and of the last cell assigned in the SCB, and places the base value in the register specified. To return space, the user indicates, via the monitor, the name of the block to be returned to the block space pool by the space allocation routine.

The method of block space allocation is best illustrated by some examples. The first is a request for a block which is immediately available, the second is the return of a block, and the third a request for space which exceeds the length of any one available block. These examples will assume the previous state of the lists and blocks allocated and indicate only the allocation routine action. The monitor is the implied user. The first example starts with the Table A-IV state.

Example 1

A user (Z4) requests 3000 cells of block storage. The block allocation routine goes through the following sequence.

1. From cell α (in the BAL), get the limits cell β and the subsequent limits. In the rare case where the block is in use (Flag = 1), the coalescing of blocks, as outlined in Example 3, is done first.
2. Determine the number of cells available and determine if the request can be filled. (In this case assume the affirmative.)
3. Decrease the upper limit in cell β by the number of cells needed.
4. Get two cells (θ, ι) from the Single Cell available space List (SCL).
5. Insert cell θ in the BAL with a link of ζ (obtained from cell α) and a pointer to ι. The address of θ is put in the link field of α, and the flag of θ is set to 1.
6. Set the block limits in cell ι equal to 17,001 and 20,000 respectively.
7. Return to the user with the address of cell ι.

Table A-V shows the result of requesting a block of cells.

TABLE A-V. BLOCK ALLOCATION LIST (BAL)

  Location  Link  Flag  Pointer
  α         θ     0     β
  θ         ζ     1     ι
  ζ         δ     1     η
  δ         γ     1     ε
  γ         0     1     Δ

BLOCK LIMITS LIST (BLL)

  Location  First Location  Last Location  (Allocated to)
  β         10K             17K            (available)
  Δ         40K+1           45K            (Z1)
  ε         30K+1           40K            (Z2)
  η         20K+1           30K            (Z3)
  ι         17K+1           20K            (Z4)

SPACE POOL MAP

  10K-17K available | 17K+1-20K Z4 | 20K+1-30K Z3 | 30K+1-40K Z2 | 40K+1-45K Z1

Example 2

User (Z2) returns the block of space whose limits are found in cell ε. The address of this cell is used to search the BAL for the pointer to this cell. As can be seen in Table A-VI, only the flag of the cell (δ) which contains the pointer to the block limits returned is changed (from 1 to 0). The block limits list is not altered.

TABLE A-VI. BLOCK ALLOCATION LIST (BAL)

  Location  Link  Flag  Pointer
  α         θ     0     β
  θ         ζ     1     ι
  ζ         δ     1     η
  δ         γ     0     ε
  γ         0     1     Δ

BLOCK LIMITS LIST (BLL)

  Location  First Location  Last Location  (Allocated to)
  β         10K             17K            (available)
  Δ         40K+1           45K            (Z1)
  ε         30K+1           40K            (available)
  η         20K+1           30K            (Z3)
  ι         17K+1           20K            (Z4)

SPACE POOL MAP

  10K-17K available | 17K+1-20K Z4 | 20K+1-30K Z3 | 30K+1-40K available | 40K+1-45K Z1

Example 3

User (Z5) requests 15,000 cells of block storage. The block allocation routine does the following.

1. Using cell α (in the BAL), the contents of cell β are obtained. The number of cells available is determined to be less than the number requested.

2.
Link through the BAL, putting the limits pointer of each "in use" entry (Flag = 1) on a push-down list. Each entry with Flag = 0 (space returned) is returned to the single-cell available space list along with its associated limits cell. The BAL cell returned is also deleted from the BAL list. The push-down list cells are obtained from the Single Cell available space List. The entries in the list are now ordered such that the name of the cell containing the highest block limits is last on the list (therefore first off) and the name of the cell containing the lowest block limits is first on the list (therefore last off).

3. The method to coalesce available blocks is as follows. Move each used block up in core so as to pack them against the upper boundary of the space pool. This will push any scattered available space further and further down in core until it is engulfed by the limits of β; i.e., all unused space is in one block at the lower boundary of the space pool. To implement the coalescing, the pointers to the used space limits are popped up and the limits are changed to reflect data movement, which is done when each new block of unused space is encountered. Table A-VII shows the results of coalescing space. Since the user points to a pointer to the block, the block can be moved and the pointer to it changed without concern by the user.

4. If the request for space can now be filled from the space-available limits at β, the method of allocating the block is the same as outlined in Example 1.

TABLE A-VIII.
BLOCK ALLOCATION LIST (BAL)

  Location  Link  Flag  Pointer
  α         θ     0     β
  θ         ζ     1     ι
  ζ         γ     1     η
  γ         0     1     Δ

BLOCK LIMITS LIST (BLL)

  Location  First Location  Last Location  (Allocated to)
  β         10K             27K            (available)
  Δ         40K+1           45K            (Z1)
  η         30K+1           40K            (Z3)
  ι         27K+1           30K            (Z4)

SPACE POOL MAP

  10K-27K available | 27K+1-30K Z4 | 30K+1-40K Z3 | 40K+1-45K Z1

XPOP: A META-LANGUAGE WITHOUT METAPHYSICS

Mark I. Halpern
Research Laboratories
Lockheed Missiles & Space Co.
Palo Alto, California

INTRODUCTION

The XPOP programming system is a straightforward and practical means of implementing on a computer a great variety of languages; in other words, of writing a variety of compilers. The class of languages it can handle is not easy to characterize by syntactic form, since the system permits syntax specification to be varied freely from statement to statement in a program being scanned; the permitted class includes the best-known programming languages, as well as something closely approaching natural language. We believe that this distinguishes the XPOP processor from the syntax-directed compilers,1,2,3 although it shares with them the fundamental idea that the process of programming-language translation can be usefully generalized by a compiler to which source-language syntax is specified as a parameter.

This paper describes only the more novel features of XPOP; a fuller treatment is available elsewhere.4

DISCUSSION

XPOP consists of two major parts: (1) a generalized skeleton-compiler that performs those functions common to all compilers, and (2) a battery of pseudo-operations for specifying the notation, operation repertoire, and compiling peculiarities of a desired programming language. The programmer creates the compiler for such a language not by programming it from scratch but by using the XPOP pseudo-operations to modify and extend XPOP itself, which then becomes the desired compiler.

The use of these facilities involves the creation by the programmer of functional units that superficially resemble the programmer-defined macro-instructions of, for example, IBMAP (and in fact include such macros as a subset), but whose effects may be radically different from those obtained by use of conventional macros.* An XPOP macro does not necessarily generate coding; its possible effects are so varied that it can best be defined simply as an element of the source program that, when identified, causes the processor to take some specified action. That action may be any of the following:

(1) The parameterization of XPOP's scanning routine to make it recognize, either for the remainder of the source program or within some more limited domain, a new notation

(2) The compilation of coding for immediate or remote insertion into the object program

(3) The immediate assembly and execution of any of the instructions compiled from a source-language statement

(4) The preservation on cards and/or tape of the language description currently in use, in a condensed format that can be redigested by XPOP at tape speed when read back in; also the reading-in of such a language from a tape file created earlier in the same machine run or during an earlier run

(5) The production by XPOP of a bug-finding tool called an XRAY, a highly specialized core-and-tape dump giving the programmer the tables and strings produced by the system in structured, interpreted, and captioned form

In the illustrations of these features, some conventions that require explanation will be used.

* By "conventional macros" we mean the user-defined operators that some programming systems allow. The definition of a macro consists essentially of the assignment of a name to a block of coding, after which every appearance of that name as an operator causes the system to insert a copy of that coding into the object program.
All programming examples offered are exact transcripts of the symbolic parts of actual XPOP listings. Lines prefixed by a dollar sign are records output by the processor as comments; these originate either as source-program statements printed out as comments for documentary purposes or as processor-generated messages notifying the programmer of errors or other conditions he should be aware of. No attempt is made to illustrate XPOP facilities by coding examples of any intrinsic value. The examples used are merely vehicles for the exhibition of those facilities and are therefore generally trivial in size and effect. The discussion that follows takes up the chief features of the system in the order of the five-point outline given earlier.

Notation-Defining Pseudo-Operations

Consider a macro, LOGSUM, created to store the logical sum of two boolean variables, A and B, in location C.

$LOGSUM  MACRO   A,B,C
$        CAL     A
$        ORA     B
$        SLW     C
$        END

Having been defined, this macro may at once be called upon in XPOP's standard form, which requires that the macro's name be immediately followed by the required parameters, with commas separating these elements and the first blank terminating the statement. A standard-form call on LOGSUM would have this appearance and effect:

$        LOGSUM,ALPHA,BETA,GAMMA
         CAL     ALPHA
         ORA     BETA
         SLW     GAMMA

Suppose we find standard-form notation unsatisfactory and want to call upon the function LOGSUM in the following form:

STORE INTO CELL 'C' THE LOGICAL SUM FORMED BY 'OR'ING THE BOOLEAN VARIABLES 'A' AND 'B'.

There are, from the XPOP programmer's viewpoint, four differences between the standard and the desired form:

(1) The name of the function is no longer LOGSUM, but STORE.

(2) The order in which parameters are expected by STORE differs from that of LOGSUM.

(3) The punctuation required by the two forms differs; in standard form, the comma is the sole separator, blank the sole terminator. In the desired form, three kinds of separator are used:
    (a) The one-character string 'blank'
    (b) The two-character string 'blank-apostrophe'
    (c) The two-character string 'apostrophe-blank'
and one terminator:
    (a) The two-character string 'apostrophe-period'

(4) The desired form contains several "noise words"; that is, character strings present for human convenience but which XPOP is to ignore.

In the following illustration, we use its pseudo-ops to teach XPOP the new statement form, then demonstrate that the lesson has been learned by offering it the new form as input and verifying that it produces the correct coding. An explanation of each pseudo-op used follows the illustration.
In the desired form, three kinds of separator are used: (a) The one-character string 'blank' (b) The two-character string 'blankapostrophe' (c) The two-character string 'apostrophe-blank' and one terminator (a) The two-character string 'apostrophe-period' ( 4) The desired form contains several "noise words"-that is, character strings present for human convenience but which XPOP is to ignore. In the following illustration, we use its pseudo-ops to teach XPOP the new statement form, then demonstrate that the lesson has been learned by offering it the new form as input and verifying that it produces the correct coding. An explanation of each pseudo-op used follows the illustration. XPOP: A META-LANGUAGE WITHOUT METAPHYSICS $STORE $ $ $ $ $NEW $ $ $ $NEW $ $ $NEW $ $ $ $ $ $ $ 59 A,B,C MACRO LOGSUM B,C,A END CHPUNC PARAMETER-STRING PUNCTUATION ADOPTED AT THIS POINT CHPUNC 3S1 2 '2' 1T2'. PARAMETER-STRING PUNCTUATION ADOPTED AT THIS POINT CHPUNC 1S2.. 1Tl. PARAMETER-STRING PUNCTUATION ADOPTED AT THIS POINT NOISE 4INTO 4CELL 3THE 6LOGICA 6FORMED 2BY 60R'ING 6BOOLEA NOISE 6V ARIAB 3AND 3SUM STORE INTO CELL 'GAMMA' THE LOGICAL SUM FORMED BY 'OR'ING THE ... BOOLEAN VARIABLES 'ALPHA' AND 'BETA'. ALPHA CAL ORA BETA SLW GAMMA The definition of STORE with which the above illustration begins deals with the first two of the four differences noted betw.een the desired and the standard statements. It causes XPOP to recognize STORE as an operator identical in effect to LOGSUM, and specifies that the parameter expected as the third by LOGSUM will be expected as the first by STORE. The pseudo-op CHPUNC (CHange PUNCtuation) deals with the third difference. Its first use, with blank variable field, erases all punctuation conventions from the system; the comma is no longer a separa tor nor is the blank a terminator. Having thus wiped the slate clean, CHPUNC is used again to specify the required punctuation. 
The variable field that follows this second CHPUNC may be read: "Three separators-the one-character string blank, the two-character string blankapostrophe, and the two-character string apostrovhe-blank; also one terminator-the two- character string apostrophe-per'iod." (The additional punctuation specified by the third CHPUNC was introduced because the signal to XPOP that a statement is continued on the next card is the occurrence, at the end of each card's worth, of a separator immediately followed by a terminator; here the programmer wanted to use the string , ... ' for this purpose. A separate CHPUNC was necessary simply because the additional punctuation came as an afterthought.) The fourth and last difference is dealt with by means of the pseudo-op NOISE, which permits the programmer to specify character strings to be ignored by the processor. Since strings longer than six characters are taken as noise words if their first six characters are identical to any noise word, such strings as VARIABLE, VARIABLES, and VARIABILITY are effectively made noise words by the definition of 6VARIAB as an explicit noise word. 60 PROCEEDINGS-F ALL JOINT COMPUTER CONFERENCE, 1964 With these pseudo-ops given, XPOP has been taught the desired statement form, as proof of which it generates correctly parameterized coding when used as input. That statement was created, of course, only for illustrative purposes; few programmers would care to use so many words to generate three lines of machine-language coding. For an application in which documentation was an unusually important requirement, however, so elaborate a statement might serve a useful purpose-and real macros would average closer to 100 instructions than to 3. The most important property of this technique for describing a notation to a processor, though, is the flexibility with which a notation so specified may be used. 
All that the XPOP programmer has explicitly defined is a number of individual words and punctuation marks, with no constraints on their combination; they may be used to form any statement that makes sense and conveys the necessary information to the processor. The programmer will often have a particular model statement in mind when specifying the vocabulary he wishes to use in calling for some function, but he will find that in implementing the model he has incidentally implemented an enormous number and variety of alternative forms. If we add to our list of noise words the two strings OF and AS, we can use any of the following to generate the required coding: (a) STORE INTO GAMMA THE SUM OF ALPHA AND BETA. (b) STORE AS GAMMA THE LOGICAL SUM OF A.LPHA AND BETA. (c) STORE AS LOGICAL GAMMA THE SUM OF THE VARIABLES ALPHA AND BETA. (d) STORE LOGICALLY INTO GAMMA 'ALPHA' AND 'BETA.' (e) STORE GAMMA ALPHA BETA. (f) LOGICALLY STORE INTO 'GAMMA' THE VARIABLES 'ALPHA' AND 'BETA.' (g) INTO GAMMA STORE THE SUM OF ALPHA AND BETA. As (f) and (g) indicate, both noise words and operands may precede the operator, provided only that they are not themselves mistakable for operators. If, for example, INTO were an operator as well as a noise word (such multiple roles are possible arid sometimes useful) , statement (g) would be misunderstood as a call on INTO. Excepting such uncommon cases, the operator and operands in a statement may float freely with respect to noise words, and the operator may float freely with respect to its operands; the sole constraint is that the operands must be given in the order specified when the operator was defined. Even this last constraint will be relaxed when the QWORD feature is fully implemented. A QWORD is a noise word that, like an English preposition, identifies the syntactic role of the word it precedes; its use enables the programmer to offer operands in an order independent of that specified when the operator is defined. 
Applied to the statement type dealt with so far, the QWORD feature might be used thus:

STORE    MACRO   $INTO$C,A,B
         CAL     A
         ORA     B
         SLW     C
         END

The string $INTO$C informs the system that if the QWORD "INTO" appears in a call on STORE, the first operand following it is to be taken as corresponding to the dummy variable C. The use of the QWORD would override the normal C,A,B order and enable the user of STORE to write, as another alternative:

(h) LOGICALLY STORE THE SUM OF ALPHA AND BETA INTO GAMMA.

Practically all notation-defining pseudo-ops may be used within macros as well as outside them, and the difference in location determines whether the conventions thereby established are 'local' or 'global.' If such pseudo-ops are given at the beginning of a macro definition that includes some non-pseudo-op lines as well, they are taken as local in effect. They will temporarily augment or supersede any notational conventions already established, and be nullified when the macro within which they were found has been fully expanded. 'Local' notation-defining pseudo-ops will be put into effect in time to govern the scan of the very statement that calls on their containing macro. Such internally defined statements need respect the earlier conventions only to the extent necessary to permit their operators to be isolated.

When pseudo-ops constitute the sole contents of a macro, they are taken as applying to the rest of the program in which they appear; the effect of calling on such a macro-ful of pseudo-ops is as if each pseudo-op were given as a separate input statement. Insofar as the notation a programmer requires is regular and self-consistent, then, it may be described in a single macro whose name might well be that of the language itself, and which would be called on at the beginning of any program written in that language.
Statement forms that have special notational requirements in conflict with any global conventions would include the necessary local conventions within the bodies of their macro definitions. The local-notation feature will be illustrated in the next section.

As should be evident at this point, it is possible to teach XPOP to recognize an enormous number of logically identical but notationally different statements by means of a few uses of just those pseudo-ops introduced so far. It should be possible, in fact, to define a programming language empirically; that is, to treat a language as a cumulative, open-ended corpus of those statement forms that experience shows to be desirable. The full set of notation-defining pseudo-ops, of which about one-third is exhibited here, permits the description of the notations of FORTRAN, COBOL, and most other existing compiler languages.

Compilation-Control Pseudo-Operations

The compiler designer also needs, of course, various kinds of control over the compilation process. One requirement is for the ability to call for remote compilation. To meet this need XPOP provides the pseudo-ops WAIT and WAITIN. Both signify that the part of any macro lying within their range is to be expanded as usual (parameters substituted for dummy variables, system-generated symbols inserted where called for, and so on), but that the resulting coding is not to be inserted into the object program yet. Instead, these instructions are put aside, to be inserted into the object program only when a source-program statement is found bearing the statement label specified by the WAIT or WAITIN. (The label to wait for is specified in the pseudo-op's variable field, where it may be given as a literal constant or, more likely, represented by a dummy to be replaced by a parameter.) In any case, all instructions waiting for such a label will appear just after those resulting from the translation of the statement so labeled.
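The WAIT/WAITIN bookkeeping can be sketched like this. The dictionary-based mechanism is our illustration, not XPOP's implementation; the instruction strings are modeled on the DO example that follows:

```python
# Sketch of remote compilation: expanded instructions are parked under a
# statement label; WAIT appends to the label's waiting list, WAITIN
# prefixes, so WAITIN batches come out in the inverse of source order.
waiting = {}   # label -> list of instruction strings

def wait(label, instructions):
    # WAIT: append to the threaded list already waiting for this label
    waiting.setdefault(label, []).extend(instructions)

def waitin(label, instructions):
    # WAITIN: prefix, so later batches emerge first
    waiting[label] = list(instructions) + waiting.get(label, [])

def translate_labeled(label, compiled):
    # Instructions waiting for a label appear just after the coding
    # compiled from the statement bearing that label.
    return compiled + waiting.pop(label, [])

# Outer DO (earlier in source), then inner DO, both waiting on label 15:
waitin("15", ["TXI *+1,J,01", "TXL )0001+1,J,3"])
waitin("15", ["TXI *+1,K,2", "TXL )0002+1,K,20"])
out = translate_labeled("15", ["15  CLA =4", "FAD PHI,J", "STO TAU,K"])
# The inner loop's stepping and test instructions now precede the
# outer's, closing the nest in the right order.
```

The inversion is exactly what a DO nest needs: the loop opened last must be closed first, so prefixing each later batch yields correctly nested loop-closing code.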
The instructions waiting for a label may have come originally from several different macros, or several uses of the same macro; if so, the one difference between WAIT and WAITIN will make itself felt. If, for example, a group of instructions lay within range of WAIT ALPHA, they would be appended to the threaded list of those already waiting for ALPHA; if the pseudo-op were WAITIN, they would be prefixed to it. Those groups of instructions made to wait by WAITIN's will, therefore, appear in the object program in the inverse of the order in which they occurred in the source program; hence "WAITIN" (WAIT INverse). If the label for which a batch of instructions is waiting never appears, the instructions do not appear in the object program. If no label is specified, they appear at the very end of the object program.

The following example shows the use of WAITIN in a simplified version of FORTRAN's "DO", one that permits only the special case of subscripting that is formally identical to indexing. First, the source program that defines "DO" to XPOP, and then uses it in a two-level-deep DO nest:*

* Note that XPOP can process algebraic expressions. These may be used as source-language statements or within macros; when used within macros, they may contain dummy variables to be replaced by parameters when the macros are used, and those parameters may be arbitrarily long subexpressions. Subscripts, not now allowed, are being provided for.
The definition of "DO":

J        EQU     2
K        EQU     4
         CHPUNC  4S1=11,2, 1T2
DO       MACRO   A,B,C,D,01
)A       AXT     C,B
         WAITIN  A
         TXI     *+1,B,01
         TXL     )A+1,B,D
         END

The "DO" nest:

         DO 15 J=1,3
         DO 15 K=2,20,2
         PHI,J=RHO,J+BETA,J
15       TAU,K=PHI,J+4
         END

And below, the object program produced by the above:

J        EQU     2
K        EQU     4
$        CHPUNC  4S1=11,2, 1T2
$NEW PARAMETER-STRING PUNCTUATION ADOPTED AT THIS POINT
$DO      MACRO   A,B,C,D,01
$)A      AXT     C,B
$        WAITIN  A
$        TXI     *+1,B,01
$        TXL     )A+1,B,D
$        END
$
$        DO 15 J=1,3
)0001    AXT     1,J
$        DO 15 K=2,20,2
)0002    AXT     2,K
$        PHI,J=RHO,J+BETA,J
         CLA     BETA,J
         FAD     RHO,J
         STO     PHI,J
$15      TAU,K=PHI,J+4
15       CLA     =4
         FAD     PHI,J
         STO     TAU,K
         TXI     *+1,K,2
         TXL     )0002+1,K,20
         TXI     *+1,J,01
         TXL     )0001+1,J,3
         END

Another obvious use for WAIT or WAITIN is the handling of closed subroutines. The programmer will frequently want a macro to generate only a calling sequence to a closed subroutine, with the subroutine itself appearing only once in the object program, at the end. To secure this effect, the programmer would define the macro in question as starting with the calling sequence; then he would incorporate a WAIT with blank variable field, a ONCE pseudo-op, and then the subroutine. If the macro were not used in a given source program, the subroutine would not be made part of the object program. If used, the first such use would output the calling sequence normally, and the subroutine as waiting instructions to be put into the object program at its end. Subsequent uses of the macro in that program would cause the compilation of the calling sequence only, the ONCE pseudo-op reminding XPOP that it had already compiled the subroutine.

The following examples will illustrate uses of WAITIN, ONCE, and local notation-defining pseudo-ops.
The first is the pseudo-DO with its punctuation defined within its own body:

$DO      MACRO   A,B,C,D,O1
$        CHPUNC  4S1 1=1,2, 1T2
$)A      AXT     C,B
$        WAITIN  A
$        TXI     *+1,B,O1
$        TXL     )A+1,B,D
$        END
$
$        DO 15 J=1,3
$        CHPUNC  4S1 1=1,2, 1T2
$NEW PARAMETER-STRING PUNCTUATION ADOPTED AT THIS POINT
)0001    AXT     1,J
$        DO 15 K=4,48,TWO
)0002    AXT     4,K
$        PHI,J=RHO,J+BETA,J
         CLA     BETA,J
         FAD     RHO,J
         STO     PHI,J
$15      TAU,K=PHI,J+4.
15       CLA     =4.
         FAD     PHI,J
         STO     TAU,K
         TXI     *+1,K,TWO
         TXL     )0002+1,K,48
         TXI     *+1,J,O1
         TXL     )0001+1,J,3
         END

A use of ONCE is shown next. ONCE may be used in either of two ways, depending upon whether its variable field is blank or not. When the macro in which it occurs is being expanded and a ONCE with blank variable field is encountered, the name of the macro is searched for in a table. If it is found, the rest of that macro is skipped; if not, it is entered in the table to be found on later searches, and expansion continues. The procedure followed if a symbol is found in the variable field differs only in that the symbol found is used rather than the name of the macro being expanded. This type of use permits copies of a subroutine, a set of constants, or a storage reservation to be incorporated into the definitions of many different macros, with assurance that they will appear in the object program if and only if one of the macros is used, and not more than once no matter how many of them are used. It is this second type of use that is now shown:

$FIRST   MACRO   A,B,C
$        CLA     A
$        ADD     B
$        ONCE    M
$        STO     C
$        END
$SECOND  MACRO   X,Y,C
$        LDQ     X
$        MPY     Y
$        ONCE    M
$        STO     C
$        END
$        FIRST,ALPHA,BETA,GAMMA
         CLA     ALPHA
         ADD     BETA
         STO     GAMMA
$        SECOND,PHI,RHO,GAMMA
         LDQ     PHI
         MPY     RHO
$        END

Last among the compilation-control pseudo-ops that will be discussed here is XPIFF, which permits the programmer to specify conditions whose satisfaction is a prerequisite to the compilation of the next line of coding.
(It is, of course, a direct development of the IFF familiar to users of the FAP-IBMAP family of assemblers.) The IFF is almost entirely restricted to testing conditions involving source-program symbols; the direction in which XPIFF is being developed is that of greater range of reference. The conditions upon which XPOP compilation may be made contingent will include many referring not to source-program symbols but to the system itself. When fully developed, this facility should bring within the compiler-writer's reach the means of specifying as much object-program optimization as he wishes, short of that which, like FORTRAN's, depends on a flow-analysis of the entire compiled program. The kind of optimization available through XPIFF in its present state is indicated by the following illustration, where it is used to avoid compiling loop-initializing and -testing instructions where they are unnecessary.

$MOVE    MACRO   A,B,O
$        XPIFF   O,X,X
$        MOVMOR  A,B,O
$        XPIFF   O,X,Y
$        MOVEL   A,B
$        END
$MOVMOR  MACRO   Q,E,D
$        AXT     D,4
$)A      CLA     Q+1,4
$        STO     E+1,4
$        TIX     )A,4,1
$        END
$MOVEL   MACRO   L,M
$        CLA     L
$        STO     M
$        END
$        MOVE,ALPHA,BETA
         XPIFF   O,X,X
         XPIFF   O,X,Y
         CLA     ALPHA
         STO     BETA
$        MOVE,ALPHA,BETA,5
         XPIFF   5,X,X
         AXT     5,4
)0002    CLA     ALPHA+1,4
         STO     BETA+1,4
         TIX     )0002,4,1
         XPIFF   5,X,Y
$        END

XECUTE Mode-A Compile-Time Execution Facility

The XPOP processor may at any point in a source program be switched into XECUTE mode, in which succeeding source-language statements are not only compiled but assembled and executed. The programmer switches into this mode by using the pseudo-op XECUTE, and reverts to normal processing by using the pseudo-op COMPYL; the coding between each such pair is assembled as a batch, then executed. XECUTE mode may be used with great freedom.
The programmer may enter and depart it within a macro; while in the mode he may use macros (with full notational flexibility), algebraic expressions, and everything else that XPOP normally processes except certain pseudo-ops that would be meaningless at compile time. XECUTE mode was originally implemented by those working on the XPOP processor for their own use in maintaining and developing that program, and has proved itself better for such tasks than any other method we know. It enables us to patch XPOP in a symbolic language practically identical to the FAP language in which the processor itself is written, and to cause these patches to become effective at such points during a compilation as we choose-not necessarily at load time. The effectiveness of any such patch can be made contingent on results of program execution thus far, so that tests otherwise requiring several machine runs can be accomplished in one. A FAP-like assembly listing is produced by XPOP while in XECUTE mode, and the symbolic language employed is so nearly identical to FAP that the very cards used for XECUTE-mode patches can later be used for FAP assembly-updating.* But this facility is by no means usable only by those working on the processor itself. It has the further role of giving the compiler-designer working with XPOP the ability to specify pseudo-ops for his compiler, and make it perform any compile-time functions it requires that are not built into XPOP-building special tables, setting flags, and so on. It enables the designer to make his system, to any extent he wishes, an interpreter rather than a compiler, or a monitor/operating system rather than a language processor. Compile-time execution makes a great variety of special effects readily available to the programmer.

* We have produced a subroutine, entirely independent of XPOP, equivalent to XECUTE mode, and hope soon to announce its general availability.
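One such special effect, controlled recursion in macro expansion, can be modeled in modern terms. This is an invented illustrative sketch, not XPOP's mechanism: a compile-time test decides whether further recursion is indicated, and an explicit depth cutoff prevents infinite regress.

```python
# Illustrative sketch: a macro that calls on itself, with a depth
# cutoff standing in for a compile-time "recurse further?" test.
# The expander and all names here are invented for illustration.
def expand(macro, depth=0, max_depth=3):
    if depth >= max_depth:
        # cutoff reached: take the name as a plain op code, not a call
        return [f"{macro} (taken as op code)"]
    body = [f"{macro} level {depth}"]
    # compile-time decision point: recur while further expansion is wanted
    body += expand(macro, depth + 1, max_depth)
    return body

lines = expand("TRA")
print(len(lines))  # 4: three expansion levels plus the terminal op code
```

The same shape of test, run under XECUTE mode just before the internal call, is what lets an XPOP macro decide at compile time whether to continue recurring.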
For example, it allows any macro to be used recursively: just before calling on itself, such a macro switches into XECUTE mode, makes whatever test is required to determine whether further recursion is indicated, then switches back to compile normally either at or just after the internal call, depending on the outcome of that test. Another useful facility it affords is that of trapping any source-language statement type for such purposes as counting the number of uses made of it, taking snapshots of its variables before their values are changed, or debugging by testing the values of a procedure's variables just before exiting from it. Such trapping could be done even at the machine-language level. If the programmer wanted to trap all TRA instructions, for example, he would define TRA to be a macro, enter XECUTE mode within that macro to take the desired compile-time action, then return to normal processing. (The pseudo-op ULTLEV-ULTimate LEVel of expansion-would be used within such an op-code/macro to prevent the taking of a TRA instruction within the TRA macro as a recursive call, with resulting infinite regress.) One purpose of replacing op codes by macros of the same name might be to cause each such extended operator to step a programmed clock at execution time (as well as executing the original op code, of course), so that the programmer can learn exactly how long his routines take to run-a critically important matter in real-time applications, which require that programmed procedures fit into time slots of fixed size. This capability, together with its notational-flexibility and immediate-execution features, makes XPOP particularly suitable for command and control programming.6

Language-Preserving Pseudo-Operations

XPOP provides the programmer with a group of three pseudo-ops that enable him to order, at any points in his program, that all macros so far defined be punched onto binary cards, written onto tape, or both.
The use of any of these pseudo-ops preserves all macros then in the system in a highly compact form (binary-card representation takes about one-sixth the number of cards that symbolic takes) and, more important, a form that can be read into the system at tape speed on any later XPOP run, without the time-consuming process of scanning and compressing the symbolic-language definitions. Notation-defining macros may, of course, be preserved on cards and/or tape along with code-generating macros. The tape and/or card deck produced may thus contain a complete programming language of the programmer's own design in both vocabulary and notation. This language may then be changed in any respect during the course of any ordinary production or debugging run. Functions may be added or deleted, notation elaborated or simplified. Because any of these pseudo-ops can be used as often as desired in a single program, it is possible to preserve successively larger sets of macros, each set containing its predecessors as subsets, as well as any macros defined since. Each time macros are punched or written out by means of any of these pseudo-ops, a report is generated, giving an alphabetized list of the macros preserved and the percentage of the system's macro capacity they occupy. Another two pseudo-ops are available for ordering, either during a later XPOP run, or later in the same run, that predefined macros be read in either from the input tape (if they had been preserved on cards) or a reserved tape (if they had been preserved on tape). Sets of preserved macros may be read into the system at any point in any program, making it possible to switch languages in mid-program. This greatly facilitates the consolidation into one program of sections written by several programmers using different XPOP-based languages-each section simply begins by reading into the system the language in which it was written.
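The save-and-restore workflow these pseudo-ops provide has a familiar modern analogue: dump the macro table to a named file in a compact binary form, and read it back later in place of the macros then in the system. The sketch below is illustrative only; the file layout and names are invented, not XPOP's tape format.

```python
# Illustrative modern analogue (not XPOP's format) of the
# language-preserving pseudo-ops: a WMDT-like dump of the macro
# table to a named binary file, and an RMDT-like reload.
import os, pickle, tempfile

macros = {"TEST2": ["..."], "TESTER": ["..."], "TESTXC": ["..."]}

path = os.path.join(tempfile.mkdtemp(), "TEST.lang")
with open(path, "wb") as f:      # analogue of WMDT TEST,A6
    pickle.dump(macros, f)

with open(path, "rb") as f:      # analogue of RMDT TEST,11
    restored = pickle.load(f)

print(sorted(restored))          # alphabetized report of preserved names
```

As with RMDT, the reload simply replaces whatever definitions are current, which is what makes mid-program language switching possible.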
The five pseudo-ops, and their exact effects, are given in Table 1. As is shown in the following example, the programmer may override XPOP's built-in assumptions about the tapes that WMDT, WAPMD, and RMDT refer to. He does so simply by specifying, either by logical or by FORTRAN tape designation, the unit he wishes to address. He may also assign a name to each file when he creates it, and later retrieve it by name; this permits many languages to be stacked on a tape while sparing the programmer any concern over the position of the one of interest to him. In the example below, the programmer has used WMDT to write his language onto logical tape A6 under the name 'TEST'. His language consists of three macros, whose names are then listed by XPOP. (Since the amount of available core storage used by these three was less than one-half percent, it is given as zero percent.) He then read this language back in again, this time addressing the tape by its FORTRAN designation, 11.

         WMDT TEST,A6
$        THE FOLLOWING MACROS HAVE BEEN OUTPUT ON TAPE
$        TEST2
$        TESTER
$        TESTXC
$        00 PER CENT OF AVAILABLE SPACE HAS BEEN USED
         RMDT TEST,11
$        ALL PREVIOUS MACROS HAVE BEEN DESTROYED BY THE USE OF RMDT

Debugging Tools-The XRAY

XPOP provides one unconventional tool for finding bugs that our experience has shown to be highly useful, and which might readily be incorporated into other systems. This is the XRAY-a structured, interpreted, and captioned dump of core memory and the output tape. It prints out the chief buffers, tables, and character-strings in the system in meaningful format and (where one exists) external representation, as well as all the program compiled so far (whether still in core or already on tape), and a standard octal dump of as much of core memory as the programmer may require.
In case of system trouble or source-program trouble not covered by one of XPOP's 50-odd error messages, the first thing the XPOP programmer will want to check is that the macro definitions were properly accepted and packed away, and these definitions are accordingly converted back to original input form and exhibited first.

TABLE 1. THE LANGUAGE-PRESERVING PSEUDO-OPERATIONS

Pseudo-op  Meaning                            Action Caused
WMDT       Write Macro-Definition Tape        Writes definitions and associated information in binary on logical tape A5
PMDC       Punch Macro-Definition Cards       Writes definitions and associated information in card-image format on system punch tape
WAPMD      Write and Punch Macro Definitions  Causes both tape files described above to be written
RMDT       Read Macro-Definition Tape         Reads in from logical tape B5 a binary file created by a 'WMDT' or 'WAPMD'
RMDC       Read Macro-Definition Cards        Reads in from the input tape binary records representing a deck produced by a 'PMDC' or a 'WAPMD'

Because these definitions, as seen in an XRAY, have undergone both compression into internal form and expansion back into input form, the programmer who can recognize his macros there can feel some assurance that they were properly digested by XPOP. He will next want to see how the system has scanned the last statement it saw; for this purpose he is given a print-out of the table that shows what symbols XPOP extracted from that statement as the parameters, and how it paired them off with dummy variables. Following this he is shown that part of the compiled program still in the system's output buffer, then that part already written out onto tape. Finally, the XRAY will present as much of core memory in standard octal dump format as the programmer may have specified in the variable field of the XRAY pseudo-op that triggers this output. XRAYs can be obtained in two ways. One is to use the pseudo-op explicitly at whatever points trouble has shown up in a previous run, or is to be feared; the other is to order compilation in XPER (eXPERimental) mode, which may be started at any point in the program by use of the pseudo-op XPER. In this mode, the detection by XPOP of any error in the source program or the system itself causes the generation of an XRAY-and one will be generated at the end of the program in any case. The two methods may be combined, the programmer calling explicitly for XRAYs at some points as well as compiling all or parts of his program in XPER mode. The information presented by an XRAY as presently constituted is not fully adequate (hence the selective octal dump as a backup), and additions to it are being made, but experience indicates that the gain in intelligibility of information presented in XRAY form over that given in octal dumps is great enough to mark a step forward in bug-finding methods, as we think that XECUTE mode does in bug-correction. The joint power of the two source-language facilities suggests the possibility of some experiments in on-line debugging; we hope to report on these later.

ACKNOWLEDGMENTS

The general macro-instruction concept, as well as many details of format, are derived from the BE-SYS systems created at Bell Telephone Laboratories and generally associated with the names D. E. Eastwood and M. D. McIlroy. Three projects similar at least in spirit to XPOP but which came to the author's attention too late to play any part in the XPOP design are the Generalized Assembly System (GAS) of G. H. Mealy,7 the Self-Extending Translator (SET) of R. K. Bennett and A. H. Kvilekval,8 and the Meta-Assembly Language of (presumably) D. E. Ferguson.9 At Lockheed Missiles & Space Company the author's principal debt is to B. D. Rudin, W. F. Main, and C.
E. Duncan for steady faith and support over long bleak stretches. Much of the coding of the processor and most of the daily problems fell to W. H. Mead, Marion Miller, M. Roger Stark, and A. D. Stiegler. The author is grateful to C. J. Shaw of System Development Corporation for an acute critique of XPOP that has helped to improve the presentation and to pinpoint the areas in which further development is most needed.10 Thanks are also due to P. Z. Ingerman of Westinghouse Electric Corporation for useful discussions on the relationship between XPOP and syntax-driven compilers, and for the opportunity to read part of his forthcoming book on such compilers.3

REFERENCES

1. IRONS, E. T., "A Syntax Directed Compiler for ALGOL 60," Communications of the ACM, January 1961, pp. 51-55.
2. FLOYD, R. W., "The Syntax of Programming Languages - A Survey," IEEE Transactions on Electronic Computers, EC-13, August 1964, pp. 346-353.
3. INGERMAN, P. Z., A Syntax Oriented Translator (New York: Academic Press, to be published).
4. HALPERN, M. I., An Introduction to the XPOP Programming System, Lockheed Missiles & Space Co., Electronic Sciences Laboratory, January 1964.
5. HALPERN, M. I., "Computers and False Economics," Datamation, April 1964, pp. 26-28.
6. HALPERN, M. I., A Programming System for Command and Control Applications, Technical Report 5-10-63-26, Lockheed Missiles & Space Co., July 25, 1963.
7. MEALY, G. H., A Generalized Assembly System, Memorandum RM-3646-PR, The RAND Corporation, August 1963 (2nd printing).
8. BENNETT, R. K., and A. H. KVILEKVAL, SET: Self-Extending Translator, Memo TM-2, Data Processing, Inc., March 3, 1964.
9. FERGUSON, D. E., "The Meta-Assembly Language," address presented before the Special Interest Group on Programming Languages, Los Angeles Chapter of ACM, July 21, 1964 [information taken from the announcement].
10. SHAW, C. J., "On Halpern's XPOP," System Development Corporation, unpublished, undated [early 1964].
A 10 Mc NDRO BIAX MEMORY OF 1024 WORD, 48 BIT PER WORD CAPACITY

William I. Pyle, Theodore E. Chavannes, Robert M. MacIntyre
Philco Corporation, Ford Road, Newport Beach, California

INTRODUCTION

Most of the approaches to fast read access memories in the past have been centered about the achievement of either faster conventional destructive switching, or the use of various non-destructive readout techniques and storage devices. Many of these techniques have inherent drawbacks for very fast read operation, such as the necessity for rewriting, in the case of conventional switching approaches, or the lack of truly non-destructive properties. The memory system described in this paper solves these problems by utilizing the BIAX memory element, with its inherently non-destructive readout properties, in a system organized to minimize circuit delays and utilize transmission line properties for the various signal paths. In this manner it is possible to achieve random read access times of 85 nanoseconds maximum, since most inductive components are incorporated into the various transmission lines, with the lines being terminated in their characteristic impedance. Not only is the memory designed for very high readout rates in the non-destructive mode, but it is electrically alterable with conventional linear select methods in five microseconds or less. The sections which follow will describe the system design concepts, operation of the BIAX memory system, and the circuit and packaging designs which were used to achieve the system performance.

SYSTEM DESCRIPTION

System Design Goals

The basic goals of the memory program were to design and construct an operating model of a 1024 word, 48 bit per word memory capable of 10 Mc. random access non-destructive readout (NDRO) while being electrically alterable with a write cycle time of five microseconds.
Although the performance requirements were of prime concern, it was nevertheless necessary to utilize state-of-the-art components to insure that a practical system would ultimately result. Table I outlines the system characteristics which resulted.

• CAPACITY: 1024 WORDS, 48 BITS PER WORD
• REPETITIVE READ CYCLE TIME: 100 NANOSECONDS
• READ ACCESS TIME (RANDOM ACCESS): 85 NANOSECONDS (MAXIMUM)
• REPETITIVE WRITE CYCLE TIME: 5 MICROSECONDS
• REPETITIVE WRITE/READ CYCLE TIME: 10 MICROSECONDS

Table I. Memory System Characteristics.

System Organization

The organization of any memory system is, in general, related to the desired speed of operation. If the primary design goal is the achievement of very short read access time, it is usually mandatory that parallel operation of many parts of the memory be employed. The block diagram of Figure 1 shows how this type of parallel organization is employed to achieve 10 Mc. NDRO operation. In this diagram it is seen that the flow of information for a typical readout operation is through the input buffer, read decoder, interrogate drivers, BIAX array, sense amplifiers, and output register. To achieve the goal of 85 nanoseconds read access time, the propagation delays through the functional parts of the system as shown in Figure 2 were necessary. The achievement of these propagation delays necessitated the use of certain specific organizations of the circuitry within the memory read system. These organizational factors, and how they were applied to the 10 Mc. NDRO memory, are listed below.

1) Every signal path involved in the read operation, including interconnections, must be considered in terms of its transmission line characteristic impedance and propagation delay. This is especially true in the BIAX array, where the inductance of wires passing through many elements is substantial.
2) When the array signal transmission paths are portions of transmission lines, the total array delay is approximately the sum of the interrogate line delay plus the sense line delay. Therefore the minimum total array delay usually results when the number of array words is approximately equal to the number of bits per word. In this memory, the 256 word by 192 bit per word array organization permits achievement of near minimum delay within the array.

Figure 2. 10 Mc. BIAX NDRO Cycle.

3) Read address decoding must be accomplished at as low a signal level as practical, with high level gating kept to a minimum. In order to effectively accomplish this end it is necessary to use one interrogate driver per array word and a 1 of 256 decoder. Although 256 interrogate driver circuits are used, each circuit is simple, since it drives a transmission line terminated in its characteristic impedance.

4) The BIAX output signals produced by the interrogation of a word must be strobed at the earliest possible time following the interrogation. In this memory it is accomplished by strobing the sense amplifier with timing pulses derived from the array itself. By using these array-derived strobing pulses, each sense amplifier output is strobed at an optimum time, and variation in signal delays due to physical location of the word within the array or degradation of the interrogate pulse rise time is automatically compensated for.

5) All circuits associated with the NDRO portion of the memory must be located as close as possible to the array to minimize interconnection delays. In the memory this is accomplished by arranging the read circuits on two sides of the array, and making interconnections via twisted pair lines.

Memory System Design and Operation

The memory described here has two basic modes of operation, non-destructive readout and a writing mode, both of which utilize linear or word select techniques for address selections.

Figure 1. 10 Mc. NDRO BIAX Memory Organization.

NDRO Mode

The basic concept employed to achieve non-destructive readout in the BIAX element is one involving crossed or quadrature magnetic fields in a common volume of square loop magnetic material.2-5 The BIAX element used in the 10 Mc. memory is a pressed block of ferrite material having two non-intersecting orthogonal holes. The physical dimensions are approximately 50 x 50 x 85 milli-inches (mils), with two circular holes, one 30 mils in diameter, the other 20 mils in diameter (Fig. 3).

Figure 3. Nominal BIAX Physical Characteristics. (Dimensions in milli-inches.)

Information is stored by saturating the magnetic material around the 30 mil hole (the storage hole). The storage hole contains the windings necessary to write into the memory element and to sense signal output. The interrogate hole contains a single conductor for interrogation of the memory element. Interrogation of the element is accomplished by applying a current producing flux in the same direction as flux already established around the interrogate hole. The current causes the domains in the common volume to be re-oriented toward the direction of the flux linking the interrogate hole. This reorientation decreases the flux linking the storage hole and thereby gives rise to a dφ/dt voltage on the sense winding passing through the storage hole. The polarity of this voltage is dependent on the orientation of the flux linking the storage hole; consequently, a selected polarity of element output voltage will be observed for a ONE and the opposite polarity for a ZERO (see Figure 4C). Upon termination of the interrogate pulse, the domains in the common volume revert back to their original permanent flux condition, and a true non-destructive readout is achieved. Several advantages result from the use of this principle as employed in the BIAX memory element.
First, the interrogate process introduces no measurable delay in the read operation and is therefore quite applicable to very high speed reading. Secondly, since the interrogation process involves only shuttling of flux around the interrogate hole, the inductance of the wires passing through a number of elements is sufficiently linear to permit low loss, wide bandwidth transmission lines to be constructed using the BIAX element inductance and the associated array capacitance. By using the array construction techniques described later in this paper, it was possible to achieve transmission line impedance as low as 200 ohms while propagating pulse rise times less than 5 nsec.

Figure 4. The BIAX Principle.

NDRO operation of the memory is initiated upon receipt of a clocked read command after the address levels have stabilized. The ten address bits and their complements are converted by the input buffer to levels required by the read address decoder. The decoder selects one unique path of the possible 256 and activates the interrogate driver connected to the decoder output. The actual decoding process starts with the occurrence of the clocked read command and proceeds through the various levels of the decoder at a rate limited only by the response of the circuits in the path corresponding to that address. Figure 5 shows the functional breakdown of the input portion of the memory. To accomplish the required 1 of 256 decoding, it is seen that two decoders, a 64 place and a 4 place, are used. The primary advantage of this method is that it minimizes the number of gating levels, since the decoders operate in parallel. The 4 place decoder is a clocked unit, while the 64 place is unclocked, and the outputs of the two decoders are combined at the input of the interrogate driver circuits with another level of gating.

Figure 5. 10 Mc. BIAX NDRO Memory Input Block Diagram.

Since the memory array is organized as 256 words of 192 bits, only eight of the ten address bits are required for decoding at the input to the memory, with the remaining two address bits being employed to select the desired 48 bits (of the 192 available) to be transferred to the 48 bit memory output register. When a particular interrogate driver has been activated, it is necessary to extract the stored information from the array within 10 nanoseconds if the total memory read access time of 85 nanoseconds is to be achieved. To understand the difficulty of achieving the 10 nanosecond array delay with conventional constant current techniques, consider the following calculations: Assume that each interrogate line consists of approximately 200 elements, each exhibiting an inductance of 30 nanohenries. By lumping all the inductances, a total inductance of 6 microhenries would result. Using conventional constant current drive techniques, to achieve 80 ma. within 10 nanoseconds would require the following voltage:

E = L (ΔI/ΔT)                                (1)
E = (6 x 10^-6) (80 x 10^-3) / (10^-8)       (2)
E = 48 V                                     (3)

It was felt that not only would a 48 volt current source be impractical, since 256 were required, but it also would introduce reactive transients which would seriously limit the maximum interrogation rate. In order to bring the required interrogate drive voltage within practical limits, and to minimize transients, the terminated transmission line concept of operation was employed in the memory array. Figure 6 shows the schematic representation of a simple BIAX transmission line. In this figure, each lumped inductance is represented by one BIAX element through which an interrogate wire passes, and the capacitance is that between the wire and the ground plane.

Figure 6. Simple BIAX Interrogate Line Equivalent Circuit.

If one calculates the properties of the transmission line,6 assuming an element inductance of 30 nh per element, with elements spaced at approximately 0.125 inch intervals and located above the ground plane, the line will exhibit a characteristic impedance of approximately 500 ohms. Were a line with such a high characteristic impedance to be used for the memory interrogate line, certain problems would be encountered. First, the drive voltage required to achieve 80 ma. interrogate current would be 40 V, and even with a constant voltage driver, it is excessive from a practical circuit standpoint. Secondly, such a large excursion in voltage on the interrogate line introduces noise onto the sense line by capacitive coupling through the element, even though this capacitance is only about 0.01 pf per element. Third, if this transmission line consisted of 200 sections, corresponding to the required word length in the array, the delay would be approximately 12 nsec (Fig. 7).

Figure 7. Interrogate Pulse Propagation Through 200 Element Transmission Line. (Time base: 20 nsec/div; vertical scale: 2 V/div; line impedance: 500 ohms.)

In order to alleviate these problems, several steps were taken to alter the electrical length, impedance and driving characteristics of the lines. These steps are described briefly below. To reduce the driving voltage requirements, the line impedance was reduced by two means. First, the elements were offset as shown in Fig. 8 and treated essentially as two transmission lines in parallel, and further split into two additional parallel lines.
Since each wire passes through only half as many elements (and inductance) per unit of capacitance as for the single line, the impedance is reduced to 0.7 of the single line value. It should be noted that the delay per section of line is actually greater for the offset placement by a factor of 1.4, but since only half as many sections are employed (by driving in parallel), the net propagation delay is reduced to 0.7 of the single line value. The second means employed to reduce the impedance of the interrogate line is also shown in Fig. 8. This method consists of introducing a perforated metallic shielding mask around each element between the two holes. This increases the capacitance per section by approximately another factor of three, and brings the characteristic impedance down to approximately 200 ohms.

Figure 8. Dual BIAX Interrogate Line - Physical Configuration.

The methods described above, employed to reduce the transmission line characteristic impedance, did reduce the drive voltage requirements to about 12 V, and as a result the capacitive coupling to the sense line through the BIAX element was reduced accordingly. Even so, an objectionable amount of noise was still observed due to the coupling. Two measures were taken to eliminate this problem. The offsetting of the elements as shown in Fig. 8 necessitated driving the two lines in parallel. Because of the inherent properties of the BIAX element, interrogation can be accomplished with either polarity pulse, if it is in the
Since a given sense line crosses both of these offset interrogate lines, the total capacitive coupling is reduced to a value proportional to the algebraic sum of the opposite polarity interrogate voltages during the rise time. Since this method did not provide perfect cancellation of the capacitively coupled noise, an additional method was employed to provide partial cancellation of remaining noise on the sense line. In Fig. 9 it is seen that the sense line is divided into eight segments of 32 bits each. Within each segment, the electrical length of the line is short compared to the interrogate pulse rise time, and one end of each segment is returned to ground. When capacitive noise is introduced into the segment, it propagates to the grounded end and is reflected back to the source inverted, providing effective cancellation of the noise pulse.

Figure 9. Sense Line Summing Equivalent Circuit. (BIAX output when interrogated; 20-ohm strip transmission lines; sense line; localized ground planes; sense amplifier.)

When an output is produced from an element by interrogation, the isolation resistors shown in Fig. 9 create, in effect, a constant current source. The sense amplifier is then designed with a very low input impedance to provide compatibility with the sense line summing method. In the present memory, the sense amplifier has an input impedance of approximately 15 ohms, and receives its input from the array not more than 10 nsec after the 50% point of the interrogate pulse. When the signals are observed at the output of the sense amplifier, the time delay (relative to the interrogate pulse) depends both upon the physical word location relative to the sense amplifier and the location of the bit in the word relative to the interrogate driver. In the present memory, this delay ranges from a minimum of essentially zero to a maximum of ten nanoseconds, not including the delay through the sense amplifier.
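The balanced-drive cancellation described above can be seen in a toy model: each offset line couples into the sense line through an equal capacitance, injecting a current proportional to dV/dt, and the sense line sees the algebraic sum. The coupling value is from the text; the rise time is an assumed figure for illustration only:

```python
# Toy model of capacitive noise cancellation on the sense line.
# Two offset interrogate lines carry equal-amplitude, opposite-polarity
# ramps; each couples through ~0.01 pF per element (from the text).
C_couple = 0.01e-12   # coupling capacitance per element, farads
rise_time = 5e-9      # assumed interrogate rise time, seconds
V_swing = 12.0        # interrogate voltage swing, from the text

dVdt = V_swing / rise_time
i_pos = C_couple * dVdt       # noise current injected by the +V line
i_neg = C_couple * (-dVdt)    # noise current injected by the -V line
print(i_pos + i_neg)          # algebraic sum -> 0 for matched rise times
```

Any mismatch in amplitude or rise time leaves a residual, which is what the grounded-segment reflection scheme then attacks.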
This variation, added to the variation in decoding delay, renders it difficult, if not impossible, to strobe all 192 sense amplifiers reliably with a pulse fixed in time while still maintaining the required access time. To avoid this problem, the pulse used to strobe the sense amplifier is derived from the same region of the array as is the information. This can best be understood by considering Figure 10. Each group of 48 sense amplifiers is accompanied by a 49th bit, identical to the other 48, which provides the input to a pulse generator (the "T" pulse generator), the output of which is used to strobe the 48 sense amplifiers in that group. In so doing, the inherent time variations in signal output due to word and bit location within the array, degradation of the interrogate pulse rise time, and variations in decoding time are automatically compensated for. Figure 10 also shows a function block called the "S" clock generator. The pulse from this generator, which is also derived from the array, is used to set those output register flip-flops which do not receive a reset input from the "T" gate. This technique permits the use of simple one-input or "D" flip-flops7 in the output register. The final operation which occurs in NDRO is selection, by the two most significant address bits, of the proper group of 48 sense amplifiers whose strobed outputs are to establish the state of the memory output register. This selection is accomplished by permitting only one of the four "T" generators to be activated at any time, thus producing an output on only one of the four "OR" inputs to each of the memory output flip-flops.

Figure 10. 10 Mc. NDRO BIAX Memory Output Block Diagram. (BIAX array, 256 words; output register, 48 flip-flops.)

Write Mode

It will be recalled that the organization of the array for reading is as 256 words of 192

Figure 11.
BIAX Array Element Orientation and Write Current Program.

bits per word. For writing, however, the array is organized as 1024 words of 48 bits per word, and conventional linear select techniques are used. The elements are oriented and wired in the array such that current pulses pass in both the word and bit directions, and selective writing is accomplished by the coincidence of a word-oriented word write pulse and a bit-oriented bit write pulse. The orientations of the BIAX elements within the array and the write current program are shown in Figure 11. The word write currents consist of a fixed sequence of two opposite-polarity word write pulses, and a time-overlapping bit current whose polarity depends upon the binary state of the information to be stored. In order to select the appropriate word of the 1024 for writing, a matrix of 16 word drivers and 64 word switches, organized as shown in Figure 12, is used.

Figure 12. Memory Write System Block Diagram. (Address buffer, 10 bits; temperature-controlled voltage regulators; bipolar bit drivers; word "1" and word "0" drivers; array organized for writing as 1024 words of 48 bits per word; write command; data input, 48 bits.)

Receipt of the write command activates the write cycle by starting the fixed sequence of word current pulses and permitting the 48 bipolar bit drivers to generate currents in a direction dictated by the data inputs (Figure 11). The portion of the write cycle during which element switching occurs, if it occurs at all, is determined by the information carried by the bit current and the previous history of the element. Both the bit and word currents are temperature compensated to permit optimum switching of the elements with minimum disturb over a temperature range of 0°C to 50°C. Nominal write system operating parameters for the memory are given in Table II.
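The 16-driver by 64-switch selection matrix implies that the 10-bit write address splits into a 4-bit driver field and a 6-bit switch field (16 x 64 = 1024). The following decode is a hypothetical illustration; the paper does not state which address bits feed which side of the matrix:

```python
def select_word_line(addr: int) -> tuple[int, int]:
    """Map a 10-bit write address to (word driver, word switch).

    Assumed split: high 4 bits select one of 16 word drivers, low
    6 bits select one of 64 word switches. The actual bit assignment
    in the memory is not given in the paper.
    """
    assert 0 <= addr < 1024
    return addr >> 6, addr & 0x3F

print(select_word_line(0))      # (0, 0)
print(select_word_line(1023))   # (15, 63)
```

The economy of the matrix is the point: 16 + 64 = 80 circuits select any of 1024 word lines, versus 1024 individual word drivers.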
Table II. Write System Parameters.
Total write cycle time: 5 µsec max.
Write/read cycle time: 10 µsec max.
Word current amplitude (25°C): first +200 ma, second -200 ma.
Bit current amplitude (25°C): ±95 ma.

MEMORY SYSTEM FABRICATION AND PERFORMANCE

Fabrication and Packaging

The complete 1024 x 48 memory described in the preceding portion of this paper was designed, fabricated and tested. The completed memory is shown in Figure 13. In this photograph are identified the following essential parts of the memory.

Figure 13. 10 Mc. NDRO BIAX Memory.

1. Array and Read Circuits

One of the four memory array planes is visible in Figure 13. Note that the decoder and interrogate driver circuits, located above the array, and the sense circuits and memory output circuits to the right of the array, are both mounted as physical extensions of the main array ground plane. This was done primarily to minimize propagation times and to avoid ground noise problems. Each of the four array planes is divided into eight sections as shown in the photograph. This results from the required segmenting of the sense lines (refer to Fig. 9), and because word write lines must be terminated at 48 bit intervals. The allocation of spare word and bit lines is made so that each of the eight sections contains two word spares, two bit spares, and a spare for the "T" line. Therefore each section contains 34 x 52 or 1768 elements. Since each section is identical, each array plane contains 14,144 BIAX elements and the entire memory array consists of 56,576 elements.

Figure 14. Detailed View of 10 Mc. NDRO Array.

Figure 14 shows a detailed view of one section of the array. From this photograph can be seen the dual interrogate lines, offset to permit straight wire looming. The element-to-element spacing in both the horizontal and vertical directions is 0.125 inches.
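The element totals quoted above follow directly from the section dimensions, eight identical sections per plane, and four planes:

```python
# 34 x 52 elements per section: 32 words + 2 word spares by
# 48 bits + 2 bit spares + "T" line + "T" spare.
per_section = 34 * 52          # elements per section, including spares
per_plane = 8 * per_section    # eight identical sections per plane
total = 4 * per_plane          # four array planes in the memory

print(per_section, per_plane, total)   # 1768 14144 56576
```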
A shielding mask can be seen, positioned between the holes of the BIAX element, forming the ground plane for the interrogate transmission line. In the lower center of Figure 14, in the gap between the shielding masks, are the sense lines with their isolation resistors, and the bit lines. Seen near the top of the picture are the twisted pairs which connect to the interrogate drivers, the word write lines and word select diodes.

2. Write Circuits

The write circuits used in the memory can be seen in Fig. 13 mounted in two card racks below the array. These circuits are of conventional design and are the same type of circuits used in other BIAX memory systems.

3. Power Supplies and Cooling Fans

Power supplies and blowers occupy the lower regions of the memory cabinet, and are of standard off-the-shelf variety. All power supplies have voltage regulation of 0.1% or better and are current limited to provide protection to the memory circuits.

Memory System Performance

The 10 Mc. NDRO memory has been extensively tested to determine its performance characteristics. Figures 15 and 16 show waveforms at various locations in the memory for NDRO operation. Figures 15A through F show the read operation from the beginning of decoding, through the decoder and interrogate driver selection, through the sense amplifier and time strobing, and to the memory output register.

Figure 15. 10 Mc. NDRO BIAX Memory Read Cycle Timing Waveforms. (A: memory system clock; B: slowest address bit; C: interrogate pulse; D: sense amplifier output; E: time strobe "T" gate output; F: memory output register flip-flop.)

Figure 16 shows detailed photographs of read circuit waveforms.
In Figures 16C and D are shown the access time measurement from the 50% point of the system clock (negative going) to the response of the memory output flip-flop. This access time represents the longest access time for any word or bit in the memory.

Figure 16. 10 Mc. NDRO BIAX Memory Read Circuit Waveforms. (A: interrogate pulse; B: interrogate pulses, positive and negative.)

The memory also underwent considerable testing to determine its operating reliability under various conditions of patterns and cycle rates for both the NDRO and write/read modes. To facilitate the testing, a memory exerciser was employed which was capable of generating an almost unlimited number of bit and word patterns and error checking each pattern in both NDRO and write/read. By utilizing this exerciser, errors from any origin caused the equipment to stop and indicate the word and bit location of the error. During the equipment checkout phase, tests representing voltage and write current variations, as well as worst-case patterns and cycle times, were run as a matter of course. To demonstrate that reliable system operation was being obtained, each pattern was run for a ten minute period, resulting in a total of 3 × 10¹¹ bits having been error checked. In this time, each bit in the memory was error checked 6 × 10⁶ times and, because of the memory organization, had actually been interrogated 24 × 10⁶ times. As an acceptance test for the memory, fourteen patterns were each run for the required ten minute period, representing a total of approximately 42 × 10¹¹ error checks of stored information. Each pattern was also run in the write-read-error check mode for a thirty minute period with a read-after-write error check at a write-read cycle time of 9 microseconds. The entire acceptance test procedure involved approximately 40 hours of error-free system operating time.
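The test totals quoted above are self-consistent: the memory holds 1024 x 48 = 49,152 bits, and because each NDRO read retrieves a 192-bit physical word (four 48-bit logical words), every bit is interrogated four times as often as it is error checked. A quick verification:

```python
bits = 1024 * 48          # 49,152 stored bits
checks_total = 3e11       # bits error-checked in one 10-minute pattern run

checks_per_bit = checks_total / bits
print(f"{checks_per_bit:.1e}")        # ~6.1e+06, i.e. about 6 x 10^6

# Each read interrogates a full 192-bit physical word = 4 logical words,
# so each bit is interrogated 4x as often as it is error checked:
print(f"{4 * checks_per_bit:.1e}")    # ~2.4e+07, i.e. about 24 x 10^6

# Fourteen acceptance-test patterns at 3e11 checks each:
print(f"{14 * checks_total:.1e}")     # 4.2e+12 = 42 x 10^11
```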
FUTURE AREAS OF INVESTIGATION

Although work has been completed on the memory system described in this paper, many extensions of the techniques are possible. Below are described a few of the more promising approaches and areas in which further work is being done.

Variations in Word Organization

Although the memory described in this paper is organized as 1024 words of 48 bits per word, since the array is organized for reading as 256 words of 192 bits per word, many variations in effective memory organization are possible. For example, a read organization of 256 words of 192 bits per word could be readily achieved with minimum modifications. Similarly, word lengths between 48 and 192 bits can be achieved. In summary, many combinations of words and bits per word can be realized for NDRO operation with the existing array design, as long as the total storage capacity is not exceeded, although with the present array, writing must still be performed on a 48 bit per word basis.

Access Time Reduction

In Figure 2 at the beginning of this paper, it was noted that the BIAX array contributed only about 10 nanoseconds to the 80 nanosecond typical access time. In view of this, it appears quite feasible to reduce the access time substantially by appropriate circuit design effort, as the NDRO operation of the BIAX element is not a limiting factor.

Faster Read Cycle Times

In the same way that the access time can be reduced, it is quite possible to increase the reading rate to 20 Mc. or more while still using the same array and system organization principles.

Increased Storage Capacity

The present memory capacity of 1024 words in no way represents the practical limit for this fast NDRO technique. It seems quite likely that word capacities of two or four times the present memory could be achieved with perhaps a 120 nanosecond access time.
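The word-organization flexibility noted above (a fixed 49,152-bit capacity read out in 192-bit physical words) can be enumerated. The restriction to word lengths that divide the 192-bit physical word evenly is an assumption for this sketch; the paper says only that lengths between 48 and 192 bits are achievable:

```python
CAPACITY = 1024 * 48   # total storage, bits (fixed by the array)
PHYS_WORD = 192        # bits retrieved per NDRO read

# Candidate read organizations whose word length divides the 192-bit
# physical word (an assumed constraint, for illustration):
for bits_per_word in (48, 64, 96, 192):
    assert PHYS_WORD % bits_per_word == 0
    print(f"{CAPACITY // bits_per_word} words x {bits_per_word} bits")
```

Writing, as the text notes, would still proceed on a 48-bit-per-word basis regardless of the read organization chosen.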
Reduced System Volume

No effort was made to minimize the physical size of the present memory; rather, it was designed specifically for physical access to the array. By appropriately folding the array and repackaging the circuits, the physical size of a system should be consistent with other core memories of similar capacities.

Airborne Applications

The BIAX element and its low power nondestructive readout properties are particularly well suited for airborne applications. For this reason, BIAX elements for use at temperatures from -55°C to +100°C have been developed by Aeronutronic and are being employed in various systems. The techniques used in the 10 Mc. NDRO BIAX memory can be readily applied to this type of element to produce very fast NDRO operation over a wide range of temperature.

MicroBIAX Applications

The 10 Mc. NDRO memory employed BIAX elements developed before the start of the memory project. A major in-house program is now underway to develop a MicroBIAX element having outside dimensions of 30 x 30 x 50 mils. These elements offer greatly improved characteristics, particularly for write cycles of 1-2 µsec. In addition, faster NDRO operation, better performance over wide temperature ranges, and simpler array wiring configurations, as well as the obvious size advantages, are offered by the MicroBIAX element. The potential applications for this class of new elements are almost unlimited, and it is expected that MicroBIAX elements will be employed in most of the new memory systems which are developed in the future.

ACKNOWLEDGMENT

This work was sponsored in part by the Department of Defense. Many people in addition to the authors have contributed to the success of the program, but in particular the efforts of C. M. Sciandra in preparation of elements and C. L. Cantor and M. J. VanZanten in the design and testing of the system are greatly appreciated.

REFERENCES

1. J. A.
RAJCHMAN, "Magnetic Memories: Capabilities and Limitations," Computer Design, September 1963.
2. C. L. WANLASS and S. D. WANLASS, "BIAX High Speed Computer Element," WESCON, 1959.
3. DUDLEY A. BUCK and WERNER I. FRANK, "Nondestructive Sensing of Magnetic Cores," AIEE Technical Paper 53-409, October 1953.
4. ATHANASIOS PAPOULIS, "The Nondestructive Read-Out of Magnetic Cores," Proceedings of the IRE, Vol. 42, pp. 1283-1288, August 1954.
5. U. F. GIANOLA and D. B. JAMES, "Ferromagnetic Coupling between Crossed Coils," Journal of Applied Physics, Vol. 27, No. 5, pp. 608-609, June 1956.
6. JACOB MILLMAN and HERBERT TAUB, Pulse and Digital Circuits (McGraw-Hill Book Co., Inc., New York, 1956), Chap. 10, pp. 291-295.
7. MONTGOMERY PHISTER, JR., Logical Design of Digital Computers (John Wiley & Sons, Inc., New York, 1958), Chap. 5, p. 126.

ASSOCIATIVE MEMORY SYSTEM IMPLEMENTATION AND CHARACTERISTICS

J. E. McAteer and J. A. Capobianco, Hughes Aircraft Company, Ground Systems Group, Fullerton, California, and R. L. Koppel, Autonetics Division of North American Aviation, Anaheim, California

1. INTRODUCTION

The implementation of a new system utilizing state-of-the-art technologies requires a careful engineering evaluation of all parameters affecting such a design. In particular, when new system concepts are needed and the available devices for mechanization have been designed for a different class of system, the problems become much more severe. Such is the case with Associative Memory (AM) systems, where an entirely new system organizational concept places exacting requirements on the existing technology of information storage, as is evidenced by the many techniques which have been proposed for implementation.2-8

It has been determined through study and evaluation of storage media that the BIAX* element,1 a multiaperture ferrite core, possesses the most desirable characteristics for implementing an associative memory today. The utilization of the BIAX element in the mechanization of an AM is not limited to one configuration. The repertoire of possible methods consists of one-BIAX-per-bit and two-BIAX-per-bit schemes, and, within each of these areas, there exist different ways of utilization. The choice of the mechanization methods is dependent on the application and would be a result of detailed system analysis and tradeoff studies. The first part of this paper details the mechanization techniques of an associative memory with the BIAX element. In particular, a new mode of use of the BIAX element is presented which enables extremely fast search times to be realized.

The number of functions which an AM can perform are many and varied. These functions may be broadly classified in the following way:
1. Search Functions
2. Write Functions
3. Readout Functions

The functions which are provided in a given system are, as mentioned, dependent on the application. In addition, the methods of performing some of these functions, in particular the searching types, are dependent on the speed requirements. These in turn will, to some extent, determine the mechanization method chosen. The second part of this paper details the various functional characteristics an associative memory might have. A chart is presented which delineates the pertinent characteristics as a function of the mechanization technique. The last part of the paper shows the results obtained from a demonstration model of an AM which utilizes the BIAX in the new mode of operation mentioned above.

* Registered Trademark, Philco Corp.

2. BIAX IMPLEMENTATION OF ASSOCIATIVE MEMORIES

2.1.0 Normal BIAX Operation, Using Two-BIAX-Per-Bit

The BIAX is a rectangular block of ferrite having two orthogonal holes: the storage (information) hole and interrogate hole are as shown in Figure 1.
Also shown in the figure are the read and write wave shapes and the readout signal produced by the BIAX element when operating in the normal mode. Note that the sense signal is bipolar and occurs during both the rise and fall time of the interrogate current, and that the phase of the signal is information dependent. The sense signal is caused by a domain rotation phenomenon which results from the interaction of storage and interrogate hole flux, in the material between the holes, during interrogation.

Figure 1. Conventional BIAX Operation. (Write currents; read interrogate current; sense "1" and sense "0" outputs.)

An AM implemented with the normal mode of BIAX operation requires a serial-by-bit interrogation to prevent possible cancellation of pulses on the word-oriented sense lines. Figure 2 depicts the technique used in the two-BIAX-per-bit method. The normal and complement of each word are stored. In order to decrease the required search time, the interrogate currents are staggered by an amount equal to or greater than the rise time of the current and left on until the last bit has been searched. This prevents the output signals produced by the trailing edge of the interrogate current from interfering with sense signals produced by subsequent interrogate pulses. If the sense signal polarities are as shown in Figure 1, and if the normal bit is searched when looking for 0 at a bit position and the complement bit is searched when looking for a 1, then the input to the sense amplifier will consist of a series of negative pulses for a matching word. This is due to the fact that all elements interrogated would be in the 0 state. Should a mismatch occur at a bit position, a positive pulse will occur on the sense line. For example, in Figure 2, drivers C1, N2, N3, and C4 would be turned on. In Word No. 1, several elements in the 1 state are interrogated, resulting in a mismatch (positive) signal, while in Word No. 2 all elements interrogated contain 0 and only negative pulses occur on the sense line.

Figure 2. Typical Search Operation. (Words No. 1 through No. 3; interrogate drivers.)

2.2.0 Operation of the BIAX in the Hughes Unipolar Mode

2.2.1 Description of Operation

In the course of the mechanization studies, a technique for using the BIAX which greatly enhances the search speed has been invented. This new mode of operation results in a signal for a
2.2.0 Operation of the BIAX in the Hughes Unipolar Mode 2.2.1 Description of Operation In the course of the mechanization studies, a technique for using the BIAX which greatly enh~nces the search speed has been invented. This new mode of operation results in a signal for a WRITE r- ___ +1/3 I ~'1': ___ """ W, ~~~--~( , ,)~------ , -1/3 IW ~ - +2/3 - - ~'O" - - - WORD NO.3 --' WORD NO.2 'w ww WORD NO.1 -2/3 'w INTERROGATE DRIVERS READ INTERROGATE ,~--------------~~~------ SENSE "1" "~---------~v---SENSE "0" ~ f""\ ~--- Figure 1. Conventional BIAX Operation. Figure 2. Typical Search Operation. ASSOCIATIVE MEMORY SYSTEM IMPLEM.ENTATION 83 stored 1 and no signal for a stored 0, with a very high element signal-to-noise ratio. Thus, unipolar rather than the conventional bipolar operation is obtained. This technique allows parallel-by-bit interrogation. Thus, the search time is not directly proportional to the number of bits per word, as in the serial-by-bit approach, but is proportional to the number of bits, per word divided by the number of elements interrogated simultaneously. tory basis twenty-bit words have been interrogated with resultant word signal-to-noise ratios (twenty 0 signals versus one 1 signal) greater than 3: 1 (see Section 7). In a practical system, the number interrogated simultaneously would be smaller due to environmental conditions and circuit tolerances. However, since the decrease in search time is directly related to the number of elements interrogated simultaneously, dramatic improvements result. Using the same criteria for selecting the normal or complement driver as before, it can be seen that for the matching word, no signal will occur on the sense line since all O's are' being interrogated. Therefore, if the lements are interrogated simultaneously, only 0 noise buildup will be seen. For a word which mismatches (interrogation of an element in the 1 state), a large output signal will result. 
The method of obtaining this mode of operation is shown in Figure 3. The element is first written to the 1 state by a current pulse large enough to saturate the storage hole (Figure 3B) . This pulse is then followed by a smaller pulse of the opposite polarity which is the Word Write 0 current. If a 0 is to be written then, in coincidence with the Word Write 0 current, a current pulse (Bit Write 0) is produced in the interrogate hole and the flux around the storage hole is reduced to a very small value. If a 1 is to be written, the Bit Write 0 pulse does not The number of elements which may be interrogated simultaneously is a function of the signal-to-noise ratio of the elements. On a labora- WORD WRTE----' "1" INTERROGATE HOLE "1" BIT W R I T E - - - - - C \ j J . . . - - - INTERROGATE---lJ1j "1" FLUX DUE ELEMENT OUT-- TO INTERROGATE HOLE RESULTANT FLUX VECTOR FLUX DUE TO STORAGE HO LE MATERIAL INTERROGAT~TWEEN FIELD RESULTANT HOLES Ay-""';"-"0" ~RESULTANT .Lp---~~ REDUCTION IN STORAGE FLUX .",," I I ,," I -------~-~ ~INTERROGATE ."""" INTERROGATION OF STORED "ONE." Figure 3. BIAX Element Operation in the Unipolar Mode. 84 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964 occur and the storage hole flux remains in a saturated condition. Part C of Figure 3 illustrates the technique by showing what occurs in the common volume of material between the two holes. This technique has one main disadvantage: Where writing into fields of words is desired, the disturb characteristics of the element in the o state result in a lowered signal-to-noise ratio. This is due to the fact that the flux around the storage hole of the unselected bits will "creep" to a higher value due to the word oriented disturb currents and thus produce a larger output for the 0 state than is desired. However, the method has many advantages. One which has been mentioned is that of decreasing the search time. 
This reduction in the basic search time can be traded for hardware cost by permitting time-sharing of sense amplifiers and thus reducing the number of circuits required. In addition, it can be seen from Figure 3A that all windings are orthogonal; thus the array noise problem is reduced and, since no woven windings are needed, the array fabrication is extremely simple.

2.2.2 Ternary-State Reading in the Hughes Unipolar Mode

A significant aspect of the unipolar mode of operation is that by reversing the sense winding between the normal and complement words and using serial-by-bit interrogation, a ternary output results. That is, if a bit matches the search, then no output results; if a bit mismatches, then the output can be positive or negative, dependent upon whether the normal or complement bit was interrogated. In this manner it is possible to classify all words at one time as less than, greater than, or equal to the search word. This can be explained by Figure 2. Assume that an element in the 1 state (signal output during interrogation) in the normal word produces a positive output due to the reversed sense winding. If the same criteria are used as before for selecting interrogate drivers, Word No. 2 will again produce no signal since it exactly matches the search word, and thus all elements interrogated are in the 0 state. However, Word No. 1 mismatches in the first bit position and, since the complement bit is interrogated, will produce a negative output indicating that Word No. 1 is less than the search word. Word No. 3 agrees in the first bit position with the search word and thus will produce no output for that interrogation. However, in the second bit position a mismatch occurs and, since the normal bit is interrogated, a positive pulse occurs indicating that Word No. 3 is greater than the search word.
Thus, (1) if a positive pulse appears on the sense line first, that word is greater than the input word; (2) if a negative pulse appears on the sense line first, that word is less than the input word; and (3) no pulse on the sense line indicates an exact match. The technique described is quite significant in that the search time now is independent of the search type for these three searches and, with a tristable sense circuit, all words are classified simultaneously. With the conventional mode of operation described previously, only two sense signals may be derived: positive and negative. Therefore, in order to accomplish limit type searches, a stepping algorithm (see the section on Associative Memory Search Functions) which alters the search word between steps must be used, or logic in the sense amplifier is necessary to determine if the first mismatch occurs when a 1 or a 0 is searched for (if the first mismatch occurs when a 1 is being searched for, the search word is obviously larger than the stored word, and vice versa).

2.3.0 Operation Using One BIAX Per Bit

Mechanizing an AM with one BIAX per bit requires a serial-by-bit interrogation. However, the method of accomplishing this interrogation can take several forms. One possible way to interrogate is to ripple through the bits serially and gate the sense signal logically at each amplifier against the information in the corresponding bit position of the input word. In this manner, mismatches can be detected and, if a tristable sense circuit is available, the limit searches (LESS THAN and GREATER THAN) can be directly implemented without use of an algorithm. Another way to interrogate involves an interrogate-priming cycle and does not require logic in the sense amplifier. Referring to Figure 1, it can be seen that the sense signals for a 1 and
If the sense output were examined during the rise time of the interrogate when searching for a 1 and during the fall time when searching for a 0, it can be seen that, if a match occurs, the pulses on the sense line would all be positive. If a mismatch occurs (for example, examining the output of an element in the 1 state during the fall time of the interrogate pulse); then a negative sense signal occurs. The above process can be implemented in a straightforward manner by merely turning on (priming) all interrogate drivers which are to search for O's before rippling through the interrogate cycle and turning them off at the proper time during the interrogate cycle. Figure 4 depicts the procedure for accomplishing the interrogation. 11 and Is are turned on during the priming period since they are to search for 0 as indicated by the contents of the Data Register. In Word No.1, the corresponding bit positions contain a 0 and 1 respectively, and hence no output will appear on Sense Line 1 during the priming period because of cancellation. However, Word No. 2 contains 0 in both positions and therefore a double amplitude negative pulse will appear on Sense Line 2. During the interrogate period, a negative pulse appears on Sense Line 1 indicating that the word mismatches, while Sense Line 2 has all positive pulses which indicates a matching condition. SENSE LINE NO.2 SENSE liNE NO.1 INTERROGATE DRIVERS SENSE LINE - " ' - - - - - ' NO.1 SENSE LINE NO.2 Figure 4. Interrogation Using One-BIAX-Per-Bit. 85 The method of implementation described here is quite straightforward and is such that conventional random access BIAX array windings with some modifications can be used. In addition, conyentional memory circuitry, with the exception of the word-oriented sense circuits, can be used, and data readout is provided with relative ease. This technique would, how-ever, be somewhat slower than the parallel-by-bit two-BIAX-per-bit scheme. 
Since the BIAX is operated in its normal mode, the disturb characteristics and writing mode are such that alteration of arbitrary fields within a word can be provided.

3. ASSOCIATIVE MEMORY SEARCH FUNCTIONS

3.1.0 EXACT-MATCH Search

There are a variety of search types which can be implemented in an associative memory.9,12 These searches can be performed on an entire data word or on specified fields. The selection of fields is accomplished by having the ability to mask the data word. That is, any bit of the comparison word can be masked to a "don't care" state, and only those bits not masked will participate in the search. Thus, there is inherently a ternary search characteristic (1, 0, don't care) which may be taken advantage of in some cases to decrease the search time. A brief description of search types follows:

The most commonly used search operation is the EXACT-MATCH search. This search, as the name implies, locates all words in memory which have a one-to-one correspondence with the bits of the search word. That is, any word in memory which mismatches the search word in one or more bit positions does not satisfy the search criterion. The search time is proportional to the number of bits in the word, with the exception of the parallel-by-bit techniques.

3.2.0 Limit-Type Searches

Under this category are included GREATER THAN, GREATER THAN OR EQUAL TO, LESS THAN, and LESS THAN OR EQUAL TO searches. The functions of these search types are fairly obvious. The time involved in performing these searches is dependent upon the method of mechanization. In the two-
Another technique for accomplishing the limit-type searches is to use an algorithm which alters the search word and then looks for exact matches at each step. In its simplest form this consists of incrementing or decrementing a counter at each step and performing an EXACT-MATCH search. By taking advantage of the ternary characteristics of the interrogation (1, 0, don't care), however, the number of steps required can be reduced. With this method the maximum number of steps required is equal to the length of the field participating in the search and, on the average, is one-half the length of the field. For example, if a 12-bit field were used in a LESS THAN OR EQUAL TO search, the maximum number of steps required to perform the search is 12 (contrasted with a maximum of 4096 in the simpler counter approach). Any word which matches at one of the steps satisfies the search criterion. This method, while not requiring logical gating of the sense circuits, results in a significant increase in component count if the fields are not restricted.

The process of finding all words in memory which lie between specified bounds can be accomplished by the successive application of the GREATER THAN OR EQUAL TO and LESS THAN OR EQUAL TO searches to the same field with a change of search word. A LESS THAN OR EQUAL TO search is performed on the upper-bound search word, eliminating all words greater than the upper bound. A GREATER THAN OR EQUAL TO search on the lower-bound search word then leaves those words lying within the bounds indicating a matching condition. A somewhat more efficient algorithm can be implemented if the upper and lower bounds are available simultaneously.

3.3.0 Pattern Recognition

Another useful function which can be provided in an associative memory is a form of pattern recognition.
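Returning to the limit-type searches above, one common way to realize the stepping idea for a LESS THAN OR EQUAL TO search (a sketch, not necessarily the authors' exact algorithm; names are hypothetical) is to cover the values at or below the bound with one ternary pattern per 1-bit of the bound, plus the bound itself, so the step count is of the order of the field length rather than 2 to its power:

```python
def le_patterns(bound):
    """Ternary patterns (MSB first) whose union matches exactly the
    values less than or equal to the bound bit string."""
    pats = [bound[:i] + '0' + 'X' * (len(bound) - i - 1)
            for i, b in enumerate(bound) if b == '1']
    pats.append(bound)          # the bound value itself
    return pats

def ternary_match(word, pattern):
    """True if word agrees with pattern in every non-X position."""
    return all(p == 'X' or w == p for w, p in zip(word, pattern))
```

For a 12-bit field this needs at most 13 EXACT-MATCH steps (often far fewer), versus 4096 for the simple counter method, the same order of saving quoted above.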
As an example, consider the case where it is desired to compare incoming patterns with stored patterns in the associative memory. An incoming pattern is normalized, sampled, and quantized at set intervals. These quantized samples then become the keys with which the search is conducted. Since exact pattern matches are impractical, two words are stored in memory (the two-BIAX-per-bit mechanization is used) for a single pattern. The word outputs, which indicate match or mismatch, are OR'd together, and "don't care" bits are written into the words in memory. That is, if a bit position is in the "don't care" state, no response is obtained from that bit during interrogation for either a 1 or a 0. This has the effect of performing a BETWEEN LIMITS search in memory and thus effectively establishes an envelope about the desired pattern. For example, if there are 32 quantization levels and one sample point has the value 23, the words stored in memory might be 101XX and 110XX (where X indicates a "don't care" bit), thus allowing a match indication for that point if the incoming waveform has a value between 20 and 27. The allowable tolerance is thus accounted for in the memory and is subject to control. This is illustrated in Figure 5.

Figure 5. Tolerance Envelope Used in Pattern Recognition.

3.4.0 Supplementary Search Operations

Ordered Retrieval - In some problems it is desired to retrieve information in an ordered manner. In a conventional system this can be a very time-consuming process. Using the ternary characteristics of the associative memory, much more efficient ordering is possible.10,11,13
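The two stored don't-care words of the pattern example above (value 23 yielding 101XX and 110XX) can be generated by masking the low-order bits of the two aligned blocks around the sample. A sketch (hypothetical helper; it assumes the sample does not lie in the topmost block):

```python
def envelope_words(sample, masked_bits, nbits=5):
    """Return the two ternary words whose OR'd responses form the tolerance
    envelope: the aligned block containing `sample` and the next block up."""
    block = 1 << masked_bits
    lower = (sample // block) * block            # e.g. 23 -> 20
    prefix_len = nbits - masked_bits
    def word(v):
        return format(v >> masked_bits, '0%db' % prefix_len) + 'X' * masked_bits
    return word(lower), word(lower + block)      # e.g. '101XX', '110XX'
```

Here envelope_words(23, 2) gives ('101XX', '110XX'), matching incoming values 20 through 27, the envelope quoted in the text.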
Minimum and Maximum Searches - In some applications it is desired to find the word in memory whose value, within the field, is the minimum with respect to all other words in memory. The algorithm for accomplishing this is the one used for ordered retrieval, terminated when the first single response occurs. The maximum search uses the same algorithm as inverse ordered retrieval.

Sequential Load - In some applications, blocks of data may be transferred to the associative memory for use in subsequent searches. Sequential load starting from the first word location is then useful, as it allows the words to be loaded very rapidly by minimizing the controls necessary, while retaining a spatial relationship with respect to the source store. In a partial load, word locations not written into can be prohibited from responding to subsequent searches.

Nearest Neighbor - The capability of determining the nearest numerical neighbor above or below the value of the selected key can be implemented with an ordered-retrieval algorithm, using either normal or inverse ordered retrieval starting from the initial value of the key. A more complex algorithm can be implemented to obtain the nearest neighbor on either side, if desired.

Composite Searches - In some instances it is desirable to perform searches on different keys and to specify a logical relationship between the separate key searches. Only those words which satisfy both the logical relationship and the key searches are retained. For example, if there are five keys, A, B, C, D, and E, it might be desired to perform an EXACT MATCH on key A, GREATER THAN OR EQUAL TO on key B, BETWEEN LIMITS on key C, and LESS THAN OR EQUAL TO on keys D and E. In addition, logical relationships such as ABCDE or ABC+DE may be required. This type of search can be very useful in a variety of applications.
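The minimum search via the ordered-retrieval idea can be sketched as a most-significant-bit-first elimination (an illustration with hypothetical names; as in the text, the search stops at the first single response):

```python
def minimum_search(memory):
    """Addresses holding the smallest value: scan MSB -> LSB, and at each
    bit keep the current responders storing a 0 there, if any exist."""
    candidates = list(range(len(memory)))
    for bit in range(len(memory[0])):
        zeros = [a for a in candidates if memory[a][bit] == '0']
        if zeros:                 # words with a 0 here are strictly smaller
            candidates = zeros
        if len(candidates) == 1:  # first single response: search can stop
            break
    return candidates
```

Exchanging the roles of 0 and 1 gives the maximum search, in line with the inverse ordered retrieval mentioned above.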
The possibility of providing a match indication if only a portion of the search keys match could also prove useful.

4. AM WRITING FUNCTIONS

As with the search functions, there are a number of different writing functions which may be provided with an AM. As might be expected, some of these functions are identical with the normal writing modes encountered in conventional memories. However, there are also modes which are peculiar to the AM organization and which add to the power of the system and extend its range of usefulness. The writing functions provided in a system would be strongly application dependent.

Random Load - Another loading feature which is often useful is the random load. This is the same as for a conventional random-access memory and requires that the physical location (address) to be written into be specified.

Load First Empty Location - An associative memory can be implemented to keep track of its own empty locations, so that when a word is to be entered, it is automatically written into the first empty location. In a memory where the retrieval time is location dependent, this is very effective, since all data is held at the "front" of the memory, thus minimizing access time. This data-packing feature can be very useful.

Write "Don't Care" Bit - Masking within the data word in the associative memory itself can be accomplished by writing a bit to the "don't care" state. With this technique, bounds can be stored in the memory as described previously. This is one of the more interesting features and should find great utility. The two-BIAX-per-bit schemes are, at present, the only techniques which can accomplish this.

Field Alteration - The ability to alter a single bit or field of all words, or of selected words, as a consequence of the result of a search is another writing characteristic which might be provided. This feature is particularly useful when using the memory as an aid to parallel computation. The element must be operated in the conventional mode to implement this feature. This could also be termed "writing through a mask."
Memory Partitioning - It is possible to partition an associative memory so that there effectively exist "micro-associative memories" within the main associative memory. This feature is useful when several types of data are stored and the access time must be kept to a minimum: only that portion of the memory containing the data to be interrogated is accessed, and the data not pertinent to that search is bypassed. This of course assumes that the memory contains multiple planes and that the normal search process consists of a sequential search of the planes (all words in a plane being searched in parallel).

5. AM READOUT FUNCTIONS

As with the other characteristics of an associative memory, a number of different types of readout are possible. The type of readout necessary is, of course, dependent upon the application.

Address Readout - In an application where the key or keys are well defined, the use of an associative memory together with a conventional random-access memory may be advantageous. In this mode, a block of keys is transferred from the conventional memory to the associative memory in a specified sequence, so that the physical location of the keys in the associative memory is spatially linked with data stored in the conventional memory. Upon searching the associative memory, the output indicates the addresses of the words in the random-access memory which satisfy the applied search criteria. This mode may be particularly useful where the ratio of the search key to the remaining data word is small, since at present associative memory is expensive relative to conventional random-access memory.

Data Readout - The ability to read out the contents of an associative memory is another useful feature.
The flexibility that this allows in a system can be significant, since any portion of the data word may be searched on, and the data word itself, or perhaps the portion of the data word not searched, can be read out.

Multiple Match Resolution - In any search, the possibility that more than one word matches the applied criteria must be contended with. The ability to retrieve all matching words is, in most cases, a necessity. This is usually accomplished by retrieving them sequentially through a commutating network. An efficient design of the commutating network is necessary, since it can be an important factor in the retrieval time.

Yes-No - In some applications a decision regarding the next course of action is made after interrogation of the memory, based only on whether or not the search has been matched. The Yes-No operation is a relatively easy feature to provide.

Count Matches - When the memory is searched there is a possibility, dependent on the application, that a significant portion of the memory will respond to the search. In such cases, an indication of the number of matches may be wanted before output occurs. If the dump is excessive, the search may be refined to reduce the number of responses.

6. CHARACTERISTICS OF AM SYSTEMS

The foregoing has been a brief description of some of the more salient features of AM mechanization techniques and functions. Table I lists the techniques mentioned and shows the relative performance of the mechanization schemes. It can readily be seen from the discussion above and from Table I that an absolute comparison of techniques is not practical; it would require detailed knowledge of the system application, so that the various factors and tradeoffs could be intelligently evaluated. None of the schemes shown in Table I requires logical gating of the sense circuits for performing limit-type searches.
Thus, for Schemes 1, 3, and 4, a stepping algorithm (similar to that in Ref. 11) is used for these search types, and the limit-search time is therefore a function of the length of the field used in the search. However, by providing logic in the sense circuits, the limit-search time for these three schemes becomes proportional to M, the number of bits per word. The merits of providing this mode would be ascertained from the total system analysis. In the equations for the relative limit-search time, the first term represents the time required for storage of intermediate results (considering one unit of time as the time between successive interrogations).

Table I. Associative Memory System Characteristics.

Scheme 1: Two-BIAX-per-bit, signal-no signal, binary sense output. Relative exact-match search time: M/k. Relative limit-search time: 3F/2 + F^2/2k (ave); 3F + F^2/k (max). Data writing: whole word only. Restriction on fields for limit search:* fixed fields only. Data reading (additional array requirements): additional array windings required. Write "don't care" bit available: yes.

Scheme 2: Two-BIAX-per-bit, signal-signal, ternary sense output. Relative exact-match search time: M/k. Relative limit-search time: M/k (same as exact match). Data writing: whole word only. Restriction on fields for limit search:* no restrictions (any combination of bits selected by mask permissible). Data reading: additional array windings required. Write "don't care" bit available: yes.

Scheme 3: Two-BIAX-per-bit, signal-no signal, binary sense output (serial by bit). Relative exact-match search time: M. Relative limit-search time: 3F/2 + F^2/2 (ave); 3F + F^2 (max). Data writing: unrestricted as to location and number of bits. Restriction on fields for limit search:* fixed fields only. Data reading: no new windings required. Write "don't care" bit available: yes.

Scheme 4: One-BIAX-per-bit, signal-signal, binary sense output. Relative exact-match search time: M/k. Relative limit-search time: 3F + F^2/2k (ave); 6F + F^2/k (max). Data writing: unrestricted as to location and number of bits. Restriction on fields for limit search:* fixed fields only. Data reading: no new windings required. Write "don't care" bit available: no.

LEGEND: M = number of bits per word; k = number of bits per word interrogated simultaneously; F = field length used in limit search. *See text.
Thus, since the average number of steps (field interrogations) in the incrementing algorithm is one-half the field length, the number of storage cycles required is F/2 and, in the type of system being considered, the storage cycle is about three times the "ripple" time; hence the term 3F/2 in Scheme 1. The second term represents the total "ripple" interrogate time: since, again, F/2 steps are required on the average, and each step takes F/k ripple times, the total is F^2/2k. If k = 1 (serial-by-bit interrogation), there result the equations shown in Scheme 3. In Scheme 4 the first term is increased because of the priming cycle and the need for sense-amplifier recovery after it.

The table attempts to compare systems of approximately equal logical complexity, hence the restriction on the fields in the limit searches. It is, of course, logically possible to have completely variable fields for the limit searches in all schemes, at the expense of additional components (which can be quite significant in number).

7. THE ASSOCIATIVE MEMORY MODEL

The block diagram of Figure 6 is not complete in every detail but shows the more pertinent features. The Data and Mask Registers consist of a bank of 10 manual switches each, with provision for patching the address counter into the Data Register to permit dynamic search and write operations. The model is also capable of performing "write" and "read" in a single-step process by means of push-button control. The search timing can be controlled to allow serial-by-bit operation, or parallel-by-bit operation with k from two to ten.
Of the techniques for mechanization described earlier, the only one which departs significantly from the conventional use of the BIAX is the one that produces signal-no signal operation. For this reason it was decided to verify the approach experimentally by constructing a model which utilizes this new mode of operation. The block diagram of the model is shown in Figure 6. The array portion of the system consists of 16 words of 10 bits each, and one word of 20 bits for purposes of signal-to-noise experiments. Since both the normal and complement of information are stored, there are therefore 340 bits in the array (the complement of the 20-bit word is not needed for the experiments for which this word is intended).

Figure 6. Simplified Block Diagram of Model of Associative Memory.

Figure 7 shows three photographs of the demonstration model. The top photograph is interesting in that it shows that no woven windings are necessary in the array. This suggests an array structure with all elements in contact, which provides highly compact and noise-free systems. The model proved that the technique is valid and could be applied to larger systems. The model has been operated at search clock rates of 2 Mc (limited by external equipment) with simultaneous interrogation of all bits of the word (k = 10).

Figure 7. Photographs of Associative Memory Model.

8. CONCLUSIONS

This paper has presented several techniques for the utilization of the BIAX in an associative memory system. The techniques presented have, in some cases, significantly different operating parameters. In addition, the influence of the various techniques on the search speeds has been pointed out. From this discussion it can be seen that the number of trade-off areas which exist, and the resulting influence on system complexity and performance, make it necessary to have an intimate knowledge of the ultimate system utilization in order to effect a proper associative memory design.
Figure 8 shows waveform photographs obtained from the 20-bit evaluation word. The write program is shown in part (a); part (b) shows the interrogate current waveform and the disturbed 1 output of a single element. Part (c) shows the 0 output of a single element, which together with part (b) indicates a signal-to-noise ratio of about 40. Part (d) shows the result of interrogating a string of 20 elements, 19 of which store 0's while the 20th stores a 1. Part (e) shows the result of interrogating the same string of 20 elements while all are in the 0 state. Comparison of parts (d) and (e) clearly indicates a sense-winding output signal-to-noise ratio of better than 3:1, which permits relatively straightforward amplitude discrimination.

ACKNOWLEDGMENTS

The work presented in this paper is the result of the contributions of many people. The authors would particularly like to acknowledge the contributions of L. H. Adamson and D. A. Savitt.

REFERENCES

1. WANLASS, C. L., and S. D. WANLASS, "BIAX High Speed Magnetic Computer Element," WESCON Convention Record, Part 4, pp. 40-54, San Francisco, California, August 18-21, 1959.

2. KISEDA, J. R., H. E. PETERSEN, W. C. SEELBACH, and M. TEIG, "A Magnetic Associative Memory," IBM Journal of Research and Development, Vol. 5, pp. 106-121, April 1961.

3. BROWN, J. R., Jr., "A Semi-Permanent Magnetic Associative Memory and Code Converter," Special Technical Conference on Nonlinear Magnetics, Los Angeles, California, November 1961.

4. LEE, E. S., "Solid State Associative Cells," Proceedings of the Pacific Computer Conference, California Institute of Technology, March 15-16, 1963.

5. SLADE, A. E., and C. R. SMALLMAN, "Thin Film Cryotron Catalogue Memory," Sym-
posium on Superconductive Techniques for Computing Systems, Washington, D. C., May 1960.

Figure 8. Waveforms Showing Unipolar Operation in the Associative Memory Model.

6. NEWHOUSE, V. L., and R. E. FRUIN, "A Cryogenic Data Addressed Memory," Proceedings of the Spring Joint Computer Conference, May 1-3, 1962.

7. DAVIES, P. M., "A Superconductive Associative Memory," Proceedings of the Spring Joint Computer Conference, May 1-3, 1962.

8. ROWLAND, C., and W. BERGE, "A 300 Nanosecond Search Memory," Proceedings of the Fall Joint Computer Conference, November 1963.

9. ESTRIN, G., and R. FULLER, "Algorithms for Content Addressable Memories," Proceedings of the Pacific Computer Conference, November 1963.

10. SEEBER, R. R., "Associative Self Sorting Memory," Proceedings of the Eastern Joint Computer Conference, pp. 179-188, December 13-15, 1960.

11. SEEBER, R. R., and A. B. LINDQUIST, "Associative Memory with Ordered Retrieval," IBM Journal of Research and Development, Vol. 6, pp. 126-136, January 1962.

12. FALKOFF, A. D., "Algorithms for Parallel-Search Memories," Journal of the ACM, Vol. 9, pp. 488-511, October 1962.

13. LEWIN, M. H., "Retrieval of Ordered Lists from a Content-Addressed Memory," RCA Review, Vol. XXIII, No. 2, pp. 215-229.

A 16k-WORD, 2-Mc, MAGNETIC THIN-FILM MEMORY

Eric E. Bittmann
Burroughs Corporation
Defense and Space Group
Great Valley Laboratory
Paoli, Pennsylvania

INTRODUCTION

Small magnetic thin-film temporary-data memories1,2 have been in use in operational computers since mid-1962, when the prototype Burroughs D825 Modular Data-Processing System3,4 was installed at the U. S. Naval Research Laboratory.
To the present, some 43 additional D825 systems have been placed in use or ordered. The experience gained in the successful operation of these small thin-film stores has encouraged the more ambitious construction of a large, random-access memory for a modular processing system.

The memory module is physically divided into two cabinets, each storing 8,192 words of 52 bits each, for a total capacity of 16,384 words. The 52-bit word contains 48 data bits, three control bits, and one parity bit. The control bits act as tags which tell the program whether or not the instruction has been executed. Control of the memory module is effected by descriptor words containing 52 bits. The descriptors originate at either a computer or an I/O control module. A memory module can receive four descriptors during one request. The read/write cycle of each memory is 0.5 μsec, and the access time is 0.3 μsec. During the remaining 0.2 μsec, the word is rewritten or replaced at the selected address.

The two cabinets of a module can be tested independently of each other. Several test features are built into each cabinet. A test word can be written into all addresses, into alternate addresses, or into a selected address. A continuous stop-on-error mode compares every readout with the test word. Operation halts on an error, and the faulty word and its address are displayed on the control panel. Single-cycle and single-pulse operation are also possible.

Each memory module can perform a number of logic manipulations independently of other modules.
A memory module can: execute the conventional read or write instructions on a single word, or on two, three, or four consecutive words simultaneously; read n words, where n is a quantity contained in the descriptor; perform a block transfer from one area in memory to another, or to another memory module; or perform a search for a requested word or a requested digit, either in itself or in any other memory module, matching against a word or digit supplied.

MEMORY MODULE ORGANIZATION

Figure 1 is a block diagram of one memory module; the interwiring in the memory stack is shown in Fig. 2. "Party lines" interconnect the memories with either computer or I/O. Each party line is assigned a number; if two or more requests appear simultaneously on different party lines, the signal on the lowest-numbered line receives priority. A separate party line interconnects all memory modules, allowing communication from memory to memory.

Figure 1. Memory Module, Block Diagram.

To keep the total sense delay and sense-signal attenuation reasonably low, we organized the stack into a configuration of 4096 words of 104 bits each, rather than 8192 words of 52 bits each. This kept the total sense delay below 100 nsec for the worst-case address location. Film elements are deposited 768 per substrate, in a 32 x 24 array. Five substrates in a row provide storage for 32 words of 120 bits each; a single five-substrate film word, therefore, can easily store two 52-bit computer-language words. Four such rows (or 128 words on 20 substrates) comprise a plane. A plane with certain associated circuit cards, connectors, and structures is assembled as an integral plug-in unit called a frame; 32 frames comprise a 4096-word stack. A pair of computer words requires 105 bits, including two parity, six control, and one reference bit. The unused bits, or spares, are distributed through the stack for possible replacement use. A row of spare bits can easily be wired into position to replace another row, if necessary. This is normally performed during testing of memory planes, prior to module assembly.
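The stack arithmetic quoted above can be checked directly (a consistency check on the stated figures, not new data):

```python
# geometry of one 4096-word stack, per the figures in the text
cells_per_substrate = 32 * 24              # film elements deposited per substrate
bits_per_film_word = 5 * 24                # five substrates in a row -> one 120-bit film word
words_per_plane = 4 * 32                   # four rows of 32 words on 20 substrates
words_per_stack = 32 * words_per_plane     # 32 frames per stack
used_bits = 2 * 52 + 1                     # a pair of 52-bit computer words plus the reference bit
spare_bits = bits_per_film_word - used_bits

assert cells_per_substrate == 768
assert bits_per_film_word == 120
assert words_per_plane == 128
assert words_per_stack == 4096
assert used_bits == 105 and spare_bits == 15   # the 15 spares per film word noted later
```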
A row of spare bits can be easily wired into position to replace another row, if necessary. This is normally performed during testing of memory planes, prior to module assembly. A descriptor word arriving at the control unit receivers initiates a memory cycle. The address data is transferred to the address registers, and a memory cycle is initiated. The address (6 bits) is decoded at the input of every word driver and at the input of every word switch. Selection of a film word occurs in a diode-transformer matrix. The matrix contains 4096 transformers and the selection diodes. The memory is addressed by the wordorganized (linear-select) scheme; each film word line is driven from a single transformer. The current from a selected word driver flows through the matrix to the selected word switch. The transformers have linear (not square-loop) characteristics, and the selected film word line receives a word current pulse. This current interrogates all the film bits, inducing signals into the sense amplifiers. Planar films remagnetize under the influence of two orthogonally opposed fields. (See inset in Fig. 1.) A word field applied parallel to the film's hard direction rotates the magnetization vectors from their rest position (easy axis) into the hard direction. (Vectors of a bit storing a ONE and a bit storing a ZERO rotate from opposite directions, each passing essentially through 90°, to an almost common harddirection alignment.) This rotational switching induces a readout signal into the associated sense line. A second field, the bipolar bit or information field, applied parallel to the easy direction (by a bit conductor lying in the hard direction), while the film is still magnetized in the hard direction, determines the future 2-MC MAGNETIC THIN FILM MEMORY 95 Figure 2. Memory Stack Interwiring. state of the cell after the word field has been removed. (Vectors fall back through 90° toward either the ONE or ZERO orientation, along the easy axis.) 
Interrogation of a word occurs during the leading edge of the word current, and data is written into the films during the trailing edge of this current. Bit currents, present in all lines during word-current turn-off, ensure correct storage of the data to be written. The polarity of each bit current determines the storage of a ONE or a ZERO.

A reference bit in each film word (104 bits) was included for the following reason. The sense readout signal has a width of only 50 to 60 nsec, and the delay in the stack can vary by as much as 70 nsec for different address locations. To generate a variable-time strobe pulse, a strobe reference bit, always storing a ONE, is included in the stack as the 105th bit. The strobe-reference-bit sense amplifier drives a clock buffer amplifier (strobe buffer) which supplies a 25-nsec-wide strobe pulse to the information-register flip-flops. The strobe pulse sets each bit in the flip-flops to the data state represented by the sense signal passing at that moment through the corresponding sense amplifier.

The bit current flows parallel to the sense conductor and induces large inductive noise into the sense signal. Transposition of each sense line with the corresponding bit line, by a crossover connection in the middle of the memory plane, reduces this noise. This connection in every sense line is made after the glass has been sandwiched between the printed-circuit boards. Owing to mechanical imbalance between each sense-line/bit-line pair, some noise (as much as 5 mV) remains. Further reduction of this noise is possible by manually adjusting the small sense end-around loop on the plane. Bit-noise cancellation prevents sense-amplifier overloading and ensures reliable operation at high speed.
As an additional means of keeping noise in the sense lines at a minimum, we included another feature in the design: during a write cycle, the flow of bit current is restricted to a single memory plane, rather than being permitted to flow through the entire stack. Each information-switch circuit is associated with one plane (32 per stack). One of the switches is enabled by the decoding of five address bits. The information (bit) drivers connect to the appropriate bit lines on each frame through a diode-transformer assembly (Fig. 2). Employing four (rather than two) diodes per transformer has the advantage that the bit-switch circuit can be designed for single-polarity current pulses even though bipolar bit currents flow in the plane. Also, the same amount of current flows through the switch regardless of the information being written into the films, and only two conductors per bit are needed to interconnect the corresponding bit lines between frames. Sneak ground currents are also eliminated with the four-diode scheme. The information drivers see high impedances in every plane but the selected one. This arrangement eliminates the time delay in the bit current, because the bit lines are effectively connected in parallel.

Words are stored 128 to a memory plane, on 32 planes, rather than in the more conventional fashion of a plane storing one bit position for all words. Because of this geometry, and the restriction of bit currents to a single memory plane, each plane is effectively a 128-word memory stack in itself, functionally isolated from other planes during write. The sense lines, on the other hand, are series-connected through all bits in the stack, one line per bit position.

MEMORY TIMING

The memory timing waveforms are shown in Fig. 3; waveforms of actual word current, bit current, sense readout, and strobe pulse are shown in Fig. 4.
Memory read operation begins with an initiate pulse received from the memory control unit. Storing the address information in the address register requires 20 nsec. Address decoding occurs in a single gate level and requires an additional 20 nsec. The decoding enables the selected word-switch circuit and also one of the 32 information switches. A word-gate pulse turns on the chosen word driver at 100 nsec. With a circuit delay in the driver of 50 nsec, current flows in the selected word line at 150 nsec. The sense signals are induced on the sense lines during the word-current rise but, depending upon the location of the word in the stack, may be retarded at the sense-amplifier input by as much as 70 nsec. The earliest time at which a signal can appear at this input is 160 nsec, the latest 230 nsec. An amplifier delay of 40 nsec allows signals to arrive at the information register between 200 and 270 nsec. The strobe pulse clocks the information register.

Figure 3. Memory Module Timing Diagram.
Figure 4. Waveforms: Word and Bit Currents, Sense Readout, and Strobe Pulse. (Vertical scales: word current 200 mA/cm; bit current 100 mA/cm; amplified sense signals 1 V/cm; strobe pulse 2 V/cm. Sweep: 40 nsec/cm, with two panels at 0.1 μsec/cm.)

The information register has a delay of 20 nsec. At the latest possible time of 290 nsec, the information register contains the read data.

The write operation begins at 300 nsec. The write cycle either replaces the data read out during the previous read (and contained in the information register), or enters new data into the selected word location via the information drivers. The new data is taken from the buffer register and substituted for the signals from the information register. With a circuit delay of 40 to 50 nsec in the information driver, bit current flows at 350 nsec for a duration of 100 nsec. While the bit current is at its crest, the word current (which has continued to flow since the initiation of read) is terminated. Termination of the word current allows the magnetization vectors of the films to rotate in the directions established by the bit currents, and the word is written.
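The read-path figures add up as follows (a consistency check: the 10-nsec rise allowance making 150 + 10 = 160 nsec is inferred, and the latest sense-input time is taken as 230 nsec so that the 40-nsec amplifier delay yields the 200 to 270 nsec register window):

```python
# read-path milestones (nsec), per the figures in the text
address_stored  = 20                     # address register loaded
decoded         = address_stored + 20    # one gate level of decoding
word_gate       = 100                    # word-gate pulse turns on the driver
word_current    = word_gate + 50         # driver circuit delay
earliest_sense  = word_current + 10      # assumed rise allowance -> 160
latest_sense    = earliest_sense + 70    # up to 70-nsec stack delay -> 230
register_window = (earliest_sense + 40, latest_sense + 40)  # sense-amplifier delay
read_complete   = register_window[1] + 20                   # information-register delay

assert (earliest_sense, latest_sense) == (160, 230)
assert register_window == (200, 270)
assert read_complete == 290              # read data valid; write begins at 300 nsec
```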
To eliminate magnetizing energy which would otherwise remain stored in the pulse transformers employed in the bit circuits, a recover pulse is selectively applied to the bit lines. The recover pulse, opposite in polarity to the bit current, and of about the same duration, terminates at 550 nsec, to complete the write portion of the memory cycle.

PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964

MECHANICAL CONSTRUCTION

A single cabinet (Fig. 5) houses one stack and all associated circuitry; two such cabinets, one containing certain common circuitry (that shown in the middle of Fig. 1), make up a memory module. The front-opening door on each cabinet carries the control panel for that stack and permits full access to the interior. The interior of the cabinet (Figs. 6 and 7) contains two circuit-card racks which may be locked together, and can be swung either separately, or in unison, around a vertical hinge. The stack, mounted in the lower portion of one of these racks, as shown in Fig. 6, has the following dimensions: height 30 in., width 26 in., and depth 12 in. The stack housing is an integral part of the rear card rack, which can be swung completely out of the cabinet.

Figure 5. Memory Cabinet, Front Door and Control Panel.

Figure 6. Memory Cabinet, Door Open and Racks Extended.

The memory frames slide into the stack enclosure from the front, and engage with bit connectors located in the rear panel. The 32 frames lie in the stack horizontally, with a frame-to-frame spacing of 0.7 in. The word driver and switch lines engage the frames through "side-entry" connectors located at the left side of the stack. Placing the word-driver cards and the word-switch cards to the left of the stack keeps the interconnecting wires quite short. (The address decoding matrix is contained on the 32 memory frames.) The five rows of cards above the stack (Fig. 7) contain, in bottom-to-top order, sense amplifiers, information register, bit drivers, address register, and timing and control circuits.

The separately hinged front rack (Fig. 6) includes space for five rows of logic cards for the party-line transmitters and receivers, input and output decoding, receiving and transmitting registers, and parity-generating and check circuits. A magnetic shield surrounding the memory stack reduces the disturbing influence of the earth's magnetic field. A separate power supply is located in the rear of each cabinet, behind the card racks. The unit operates in a temperature range from 0° to 50°C. Fans, located in the top and bottom of the cabinet, provide air to cool the equipment.

Figure 7. Memory Cabinet, Showing Rear Rack and Bit Switches.

THE MEMORY PLANE

The magnetic thin films employed in this system are produced by vacuum deposition of nickel-iron alloy onto glass substrates, while under the influence of a magnetic field. The films are 1000 Å thick; the glass measures 70 by 43 mm, and is 0.2 mm (8 mils) thick. An etching process, applied after deposition, removes the unwanted material from the glass. The 768 rectangular cells contained on one substrate measure 30 by 80 mils each, spaced on 50-mil and 100-mil centers, respectively. The easy direction of film magnetization is along the length of the cell, the hard direction along the width, to accommodate shape anisotropy: the demagnetizing effect of the air return path is less significant in this orientation. Two small registration holes, drilled into the glass prior to deposition, help in the alignment of the glass with the conductors during test and assembly. Each substrate stores 32 words of 24 bits each. Twenty such glass substrates are assembled into one memory plane, as shown in Fig. 8. Arrangement of the substrates into five rows of four each provides storage for 128 words of 120 bits each. (Each word includes 15 spare bits, which are distributed evenly, for possible later replacement of weak or faulty bits.)

The glass substrates of each memory plane are sandwiched between two printed-circuit-board assemblies which measure 20 in. in length and 9 in. in width. Three conductors address every memory cell: a word conductor, a sense conductor, and a bit conductor. The word conductor, 20 mils wide, is parallel to the film easy direction, and lies orthogonally to the sense and bit conductors. (The fields associated with the conductors are, of course, orthogonal to the conductors.) A split bit conductor, each half 20 mils wide, and separated from the other by 50 mils, embraces the 10-mil-wide sense conductor. Five printed-circuit boards, each with 24 bit and 24 sense conductors, bond to a single flat backing board 0.1 in. thick (Fig. 8). The 128 word lines, printed onto 1-mil-thick Mylar, bond to the rigid sense-bit assembly. All conductors terminate into tab connections on 50-mil centers, located at the edges of the printed-circuit boards. A 9-mil-thick glass epoxy spacer separates the two printed-circuit assemblies, and prevents excessive forces from pressing onto the glass substrates. A small amount of epoxy glue holds each substrate in its proper location.

Figure 8. Elements of Memory Plane.

MEMORY FRAME CIRCUIT BOARDS

A frame (Figs. 9 and 10) surrounds each completed plane. Three types of circuit boards mounted on the edges of the frame (the word selection matrix, five sense boards, and five bit boards) connect to the plane. (There are five rows of substrates in the plane.) The attachment of connectors to the frames helps greatly during debugging and testing, and during replacement of faulty semiconductor components on the plane. (The circuits employed on these boards, for word selection, sensing, and bit selection, are described in greater detail in the next section.)

Word Selection Matrix

The word selection matrix, which is part of the frame, contains 128 selection elements. Each element consists of a pulse transformer and three diodes. The transformer is wound with three windings (two primary windings and one secondary winding) with a turns ratio of 1:1:1. To maintain balanced drive conditions between the word drivers and the word switches, we included two selection diodes, one in each primary winding of the transformer. The third diode in the secondary circuit speeds transformer recovery (Fig. 11). The transformer reduces the capacitive noise induced into the sense signal from the word current, as well as the noise generated during transition of the address selection. Each word line is electrically isolated from all other word lines. Reverse biasing of all diodes in a selection matrix prevents undesired sneak currents. During a memory cycle, this bias is removed from the row of diodes connecting to the enabled switch. In a matrix without transformers, a large voltage swing would be coupled into the sense line, because of the capacity which exists between word line and sense line. The capacitive currents would induce a normal-mode signal which cannot be removed in a differential input circuit. The memory operates at 2 Mc; this selection scheme, however, operates6 at speeds up to 6 Mc.

The word-drive and word-switch connections are made through "side-entry" edge-board connectors. The 128 paired output terminals of the matrix, spaced on 50-mil centers, align with the film word lines, and connect to the word lines through welded-on jumper wires. Welded "end-around" connections jumper the far ends of the word lines on the plane, to complete the return path. The nominal word-current amplitude is 400 mA, with a tolerance of ±10 percent.

Figure 9. Complete Memory Frame, Front.

Sense Boards

The five sense boards are located at one edge of the plane, and the five bit boards at the opposite edge. Therefore, all sense connections are made from the same edge. A small wire loop shorts the far ends of each sense line. The near end connects to the secondary winding of a transformer, as shown in Fig. 12. Each board contains 24 transformers. Each sense transformer contains three windings; one connects to the film sense line, and the other two connect to an edge-board connector. Four output connections per sense line are required. The connector terminals are spaced on 50-mil centers.

Figure 10. Complete Memory Frame, Back.

Bit Boards

The bit-line selection scheme employed in this memory utilizes a transformer in every line (Fig. 13). The secondary winding connects to the corresponding bit line. A bit current of 100 mA is required to write a single bit. The printed-circuit end tabs on the bit boards mate with the edge-board connectors located in the backplane of the stack (Fig. 7).

Figure 11. Word Selection Matrix.

Figure 12. Sense Line Interwiring.

CIRCUIT DESCRIPTION

Word-Current Drivers and Switches

The word driver and word switch circuits, which resemble those described by Bates and D'Ambra,6 can generate currents with 20-nsec rise and fall times when loaded by a small (128-word) memory, such as that employed in the computer module of the D825. The loading of the 4096-word selection matrix (Fig. 11) increases the current rise time to 35 nsec, the fall time to 50 nsec. The driver supplies both a positive pulse and a negative pulse to the interconnecting twisted pair of conductors. The balanced drive arrangement eliminates ground currents and radiating fields which can greatly add to the noise problem. The closeness of the driver and switches to the stack keeps the interconnecting wires short. The total delay (about 10 nsec) from a driver to the word-line end-around short is less than the current rise time, and does not deteriorate the current shape. The drivers and switches supply a current of 200 mA to the selection matrix. The word transformers in the matrix have a 1:1:1 turns ratio, and the output winding receives a current of 400 mA. The drivers and switches connect to the stack through twisted-pair conductors, and both circuits have output impedances of 100 to 150 ohms. Drivers contain a 7-input AND gate at the input, and the switches contain a 6-input AND gate. Delay in the circuits is 35 nsec.

Sense Amplifiers

The sense lines are effectively series-connected through all of the bits in the stack. The sense transformer output windings of corresponding bits connect together from plane to plane in series fashion. Every sense line contains an end-around loop which shorts the far end of the line. This short reflects to the beginning of the line connecting to the transformer. The sense signal which travels through a transformer on an unaddressed plane receives only a small attenuation after the short is reflected to the output winding. This reflection appears at the input after two line delays, and accounts for the signal delay in the stack. The pickoff is taken from the middle of each line, to halve the delay, as shown in Fig. 12. Signal attenuation for worst-case locations is 6 dB. Nominal sense output at the plane is 1.5 mV. The amplifier has a gain of close to 3000, and a delay of 40 nsec. The amplifier is transformer coupled into a differential input stage, which is succeeded by two amplifying stages. The amplifier digitalizes the signal, and sends both true signal and complement signal to the associated information register flip-flop.

Information or Bit Drivers

The bit drivers supply bipolar current pulses, 100 nsec wide, to the stack. The bit-driver input circuit contains the decision elements which either copy the word stored in the information register or allow new data, obtained from the buffer register, to be written into the stack. In addition, the logic function which determines whether a ONE should be stored as a ONE or a ZERO is included. This is necessary because of the sense-line transposition in the center of every plane. The sense amplifier amplifies only single-polarity signals, and all ONEs stored in the stack must appear as negative signals at the input to the sense amplifier. Therefore, all ONEs are stored as ONEs in half of the stack, and as ZEROs in the other half. The reverse is true for the storage of ZEROs. The driver input contains two OR gates driven from four two-input AND gates. The bit driver delay is 35 nsec, and the output stage is transformer-coupled and has an output impedance of 50 ohms. All lines connecting the stack to the backplane are impedance-matched. The bit drivers, located in the second and third rows, connect to the middle of the stack through five groups of coaxial lines, as shown in Fig. 7. On the frame bit connectors (hidden by wiring and circuits), corresponding bits interconnect through twisted pairs, with an impedance of 150 ohms. A matching transformer connects the coaxial line to the twisted pair. The bit switches for the eight frames shown installed in Fig. 7 cover the coaxial bit lines. Each bit switch circuit is contained in a strip which aligns with the associated memory frame. One switch circuit handles the currents from one row of substrates. Five such circuits on a strip are driven from a common drive circuit (visible at the far right in the photograph). The bit line impedance on the frame is 10 ohms. The nominal bit current is 100 mA. The bit transformer has a turns ratio of 2:2:1, which requires 25 mA of current in the primary. The matching transformer which connects the interconnecting twisted pair to the coaxial drive line has a 1:2 turns ratio.

Figure 13. Bit Selection Scheme.
This transforms the impedance from 160 ohms to 40 ohms, which is close enough to match the 50-ohm coaxial cables. The current necessary from a bit driver, to produce 100 mA of bit current in a plane, is 50 mA. The total of 105 times 25 mA of current is received by the selected information switch. The switch is divided into five individual circuits, operating in parallel, each handling a total of 500 mA; each circuit handles the bit currents in one substrate row.

SUBSTRATE TESTER

A substrate tester (Fig. 14) submits every bit on a substrate to a pulse test which subjects the bit to disturbing fields resembling worst-case examples of those encountered during actual operation. The films exhibit pronounced magnetic anisotropy; the B-H hysteresis loop along the film easy axis is rectangular, while that along the hard axis is linear. The films also exhibit various disturbing thresholds for fields applied in different directions. Because of the film's linear loop in the hard direction, a low disturb threshold exists for fields parallel to the transverse (hard) direction. (See Fig. 2.)

The test fixture (Fig. 15) is assembled from circuit boards similar to those surrounding the memory plane in the actual memory stack. Substrates are inserted and removed through a narrow slit located at the word end-around connections. Two pins through holes in the glass, used to register substrates and circuit boards in the actual memory-plane sandwiches, furnish similar registration in the substrate tester. A relay rack contains the circuitry necessary to test the substrates. Indicators for all flip-flop circuits, located on a control panel, allow observation of the test, and help during operational maintenance.

The worst disturb condition exists when a stored ONE bit is surrounded by all ZEROs, or when a ZERO is surrounded by all ONEs. The test word 10001000100 . . . , disturbed by all ZEROs in adjacent locations, is tested for ONEs; the test word 01110111011 . . . , disturbed by all ONEs in adjacent locations, is tested for ZEROs. At the beginning of an operation, the test word (ONEs) is written into all 24 bits of the first address. Next, all ZEROs are written into the adjacent word location; the latter is repeated as many as 32,000 times. The rewrite process subjects the test word to transverse and longitudinal disturbing fields applied simultaneously. The test word is read after the completed disturb cycle, and its content compared with a program register. A match continues the test, by shifting the test word to the next bit (0100010 . . . ), and the disturbing continues. After three shifts, every bit in the first address has been tested for ONEs. The test word for ZEROs follows. This test continues until all bits are checked for ONEs and ZEROs.

The disturb word can be written to the right or to the left of the test word, alternately to the right and then to the left, or in parallel to the right and left. The parallel writing of two words which embrace the test word constitutes a more-than-worst-case condition (a condition that never occurs during memory operation), but allows grading of the substrates. Substrates which pass the 32k disturb test are assembled into memory frames. These substrates tolerate about 4k to 8k disturb pulses when tested in the more-than-worst-case parallel mode. The test is fully automatic, and the output signal of the sense lines is not monitored. The tester operates at a frequency of 1 Mc; one disturb test requires 8 seconds if no error occurs. Evaluation of a good substrate requires about 30 to 60 seconds, because the substrate is also retested in the parallel mode. Operation stops on a bad bit, and panel indicator lights display the location of the bad bit, and whether the failure represents a bad ONE or a bad ZERO. Film disturbance shows dependence upon current rise times.
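The shifting test-word sequence just described can be written out mechanically. The generator below is our sketch of that sequence (the 24-bit word width and period-4 pattern follow the text), together with one way of accounting for the quoted 8-second test time at the 1-Mc tester rate; the accounting (one full 32,000-pulse disturb cycle per shift position and polarity, with per-cycle read and compare overhead ignored) is our reconstruction, not the author's.

```python
def disturb_patterns(width=24, period=4):
    """Yield the test words described above: ONE-test patterns
    (100010001...) in all four shift positions, then the complementary
    ZERO-test patterns (011101110...)."""
    for phase in range(period):            # test for ONEs
        yield [1 if (i - phase) % period == 0 else 0 for i in range(width)]
    for phase in range(period):            # test for ZEROs (complement)
        yield [0 if (i - phase) % period == 0 else 1 for i in range(width)]

patterns = list(disturb_patterns())
assert patterns[0][:8] == [1, 0, 0, 0, 1, 0, 0, 0]
assert len(patterns) == 8                  # 4 shift positions x {ONEs, ZEROs}

# Reconstruction of the 8-second figure: 32 addresses per substrate,
# 8 test positions each, 32,000 disturb rewrites per position, at 1 Mc.
seconds = 32 * len(patterns) * 32_000 / 1_000_000
assert round(seconds) == 8                 # 8.192 s, matching "8 seconds"
```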
Slower current pulses tend to disturb less. Current rise and fall times of 20 nsec are available in the tester, as compared with 35 nsec in the stack. The size of the stack and the larger number of selection elements lengthen the current rise and fall times there. Substrates which pass the 32k disturb-pulse test also consistently pass a test of 4 million word and 250 million bit disturb pulses in the frame tester.

Figure 15. Substrate Tester Test Fixture.

FRAME TESTER

Evaluation of fully assembled memory planes (frames) takes place in a frame tester (Fig. 16). A relay rack houses the circuitry and power supplies. A specially designed fixture allows the frame to slide into the word connector in an upright position. This provides the connection necessary to address all 128 word lines. A movable rack, containing sense amplifiers and bit drivers for the examination of 24 bits, can slide vertically to the desired group of lines. Printed-circuit edge-board connectors mate with the appropriate conductors. With this mechanical arrangement, an equal conductor length is consistently maintained during the examination of the five groups of substrates, to correspond to the actual stack construction. The evaluation consists of two phases: first, the bit-write noise is reduced by manual adjustment of the "end-around loops"; secondly, a disturb test, similar to that performed on the substrates, is run. The frame tester operates as a memory exerciser, with the capability of inserting worst-case patterns into the plane. The automatic program rewrites the disturb word up to 32,000 times; manual operation allows any desired number of disturb operations. The tester operates at one of three different frequencies: 2 Mc, 4 Mc, or 250 kc. Single-pulse operation is also available. Although substrates cannot easily be removed from an assembled plane, up to three bad bits can be tolerated in each of the five sense-bit groups.
The spare lines can replace lines containing faulty or marginal bits, but a small wiring change is necessary. Figure 16. Frame Tester. CONCLUSIONS The operation of this half-microsecond-cycle memory module represents a significant achievement in a program of magnetic thinfilm development for computer storage which was begun at these laboratories in 1955. Large numbers of substrates were processed and tested, and memory plane assembly ~nd test are now routine operations. Memory frames which contain 20 substrates (15,360 bits) can be assembled without great difficulty. The limitations were imposed by the printed-circuit boards, and were due to dimensional tolerances. Cost-per-bit reduction can be achieved by increasing the number of bits contained in a single pluggable unit, because the interconnections in the stack contribute significantly to the total memory cost. A shorter memory cycle can be made possible by reducing the total sense delay, and by the elimination of the bit recover pulse. The pulse transformers 'will be replaced by active solidstate devices. A reduction of 150 nsec-50 nsec from a shorter sense delay and 100 nsec from elimination of the bit recover pulse-make a cycle time of 350 nsec, or 3-Mc operation, possible. 106 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964 The capacity and speed attained with this memory are clear indication that magnetic thin films have become the optimum storage elements for reliable, nonvolatile, fast-access memory. ACKNOWLEDGMENT The author wishes to express thanks ~o the many people at these laboratories who we:te responsible for the successful completion of this memory, especially to F. C. Doughty, A. M. Bates, P. A. Hoffman, J. W. Hart, J. S. Jamison, and L. N. Fiore, for their technical assistance; J. H. Engelman and R. P. Himm~l for film fabrication; G. Sabatino, K. McCardell, and S. V. Terrell for film and memory testing; B. C. Thompson and F. Rehhausser for logic design; and R. E. Braun, G. J. Sprenkle, and G. 
J. Kappe for mechanical design. REFERENCES 1. BITTMANN, E. E., "Thin-film memories: some problems, limitations, and profits," an invited paper presented at the International Nonlinear Magnetics Conference (INTERMAG), April 1963, and published in the Proceedings. 2. RAFFEL, J. 1., et al., "The FX-1 magnetic film memory," Report: MIT Lincoln Laboratories TR278, November 1962. 3. ANDERSON, J. P., et al., "The D825: a multicomputer system for command and control," AFIPS Proceedings, 1962 Fall Joint Computer Conference, December 1962. 4. ANDERSON, J. P., "The Burroughs D825," Datamation, April 1964. 5. THOMPSON, R. N., and WILKINSON, J. T., "The D825 automatic operating and scheduling program," AFIPS Proceedings, 1963 Spring Joint Computer Conference, May 1963. 6. BATES, A. M., and D' AMBRA, F. P., "Thinfilm drive and sense techniques for realizing a 167-nsec read/write cycle," Digest of Technical Papers, 1964 International SolidState Circuits Conference, February 1964. A SEMIPERMANENT MEMORY UTILIZING CORRELATION ADDRESSING George G. Pick Applied Research Laboratory, Sylvania Electronic Systems A Division of Sylvania Electric Products., Inc. W alth.am, Massachusetts scribed in an earlier paper.l In previously reported work~' :>.1 inductive coupling was also used, but electrical connection had to be made to the stored data, or else the capacity was very limited. Some work:;' allowed use of connection less data containing media, but in each case, precise data alignment and interleaving structures were needed. The described memory uses thin copper clad "Mylar" printed circuit sheets which are placed adjacent to each other with no interleaving of any sort. All coupling, into and out of the data planes, is by inductive coupling from and to two respective solenoid arrays which pass loosely through holes in the data planes. SummcLry: A mechanically changeable, semipermanent, random access memory with a 16,384 twenty bit word capacity is described. 
This solenoid array memory is useful for stored programs and tables in computers, character generation and as a combined input and storage device for special purpose computers. It utilizes an associative technique to allow addressing of any of its 1024 sixteen word datacontaining sheets, which completely avoids any need for electrical connections to the data-containing sheets or for any ordering of the sheets wi thin the memory. Each sheet is a very thin printed circuit onto which data is entered by etching or punch-card controlled cutting. The data is inductively interrogated by means of solenoids which pass loosely through the sheets. The sheets are contained in loose-leaf notebooklike magazines which fit into a file drawer. (i The addressing solenoid array is driven by an input address which has been transformed into an error-correction type code. The address is simultaneously correlated or matched to the stored addresses on each data plane with the result that the autocorrelation on one plane is a voltage positive enough to exceed its diode conduction voltage, and on all unselected planes the cross-correlations result in voltages which are well below, or negative, to that voltage. In consequence, a current is allowed to flow in only the selected plane's data path, allowing only that plane's data to be sensed by the pick-up solenoids. The present memory's access time is 0.7 microsecond and its cycle is below two microseconds. INTRODUCTION In recent years there has been an increasing interest in read-only random access memories. This class of memory has developed along two paths, those electrically alterable and those mechanically alterable. This device falls into the latter class. The solenoid array memory described here is a development which followed the solenoid array correlator and memories de- This association between a coded address and its plane's data allows the mechanical flexibility mentioned earlier. 
The data plane may 107 108 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964 be positioned anywhere along five inches of the solenoids' lengths, and as long as there is only one data plane in the stack with that address, it is uniquely accessible. Each sheet contains sixteen words or 320 bits. To avoid the need for 320 amplifiers, the four lower order address bits are used to select the appropriate group of solenoids and connect them to the sense amplifiers. This technique allows the packing of many words on a single data plane, thereby efficiently multiplying the capacity of the memory and radically increasing its effective bit packing density on normal length words. Single data sheets of practical size can contain upwards of a thousand bits. The ease of data change and the low cost of the storing medium allow this memory to be considered for tasks where it acts not only as a memory, but as an input device as well. Magazines, containing tens or even hundreds of thousands of bits, can be stored on a shelf and inserted into the memory when required with an ease comparable to changing a modern magnetic tape cartridge. Thus for some computing systems, mechanical input reading devices lnay be replaced by a rugged, mechanically static, semi-permanent memory of this type. Overall Description The solenoid array device described utilizes long, thin solenoids to provide a simple, noncritical magnetic coupling between the data containing planes and the array memory structure. The memory is organized so that the data-containing planes need have no connections other than magnetic, and this realization required hyo basic functions-unique data plane drive and appropriate sensing of the driven plane for the selected stored data. These two functions are almost independent and are realized by a driver solenoid array and a sense solenoid array. 
The driver solenoid array is in essence a substitute for a 1024 line linear addressing matrix and the resultant connection pair that would be needed to each of the 1024 data planes. The sense or pick-up array detects if the various bit positions on each driven plane contain a one or zero. The sense solenoid outputs are connected so that appropriate gating can con- nect the output of only one group of solenoids, a word group, to the sense amplifiers. The arrays are shown in Figure 1. The principle of operation of the solenoid array is based on the transformer. On any transformer, if a wire passes around its flux path, there is coupling, and if it bypasses its flux path, there is only stray or minimal coupling. With solenoids, the same rules apply with little modification. In the memory, the drive array and the sense array are separate components which are only connected through the stored data planes. In series with this connection on each plane is a diode which acts as a switch that allows one and only one plane to be connected at one time during the interrogation. See Figure 2. The address is stored on each plane on that portion which slips over the addressing array. This matrix of mutual inductances which couple a digital input address word simultaneously to all the data planes perform a correlation or dot product operation. The operation thus performed is given by 15 Tj = ~WjkUk . k= 1 j = 1, 2, ... 1024; where Uk and \V jk are the klh components or cells of the input address word U and the stored address word Wj, respectively, and T j are the simultaneous individual output voltages generated on each plane. The correlation is formed by simultaneously energizing 15 solenoid pairs, in either the "zero" Figure 1. Solenoid Array Without Planes. 109 A SEMI-PERMANENT MEMORY UTILIZING CORRELATION ADDRESSING two rows or words can be placed on one loop. 
(The coupling loops are the paths starting on the right of the data portion, going to the left, up a short distance and returning to the right. All these loops are in series with all the others, the diode and the addressing array loops on the right.) Figure 2. Data Plane. or "one" positions-depending on the input address. The individual multiplications that result are shown in Figure 2A and these positive and negative voltages on each plane are summed together because all of the paths are in a series circuit. Figure 2B shows the operation of the sense solenoids and the manner in which they pick up the stored data. In a later section the organization for solenoid selection switching will be described more fully, however, it should be clear that the interrogation of a data plane results in parallel output of all data bits on the plane. The selection switching circuit is used only to reduce the number of sense amplifiers and subsequent gating circuits. If each plane contained only one word, selection would be eliminated. The right portion of the photograph in Figure 2 shows the addressing paths, plus bias positions to be described later. It may be noted that the loops on vertically adjacent aperture pairs always encircle one and bypass the other. At the bottom of the photograph is a small component, a diode, which "detects," and allows current to flow only in that plane where the addressing correlation resulted in a posi~ tive voltage sum with respect to the diode conduction polarity. . o£L !::-D"i!Z INPUT o LFTPUT a . o.JL The described addressing operation is, as was explained earlier, a substitute for a connection pair to the left portion of the data plane, which stores the actual data. The data portion of the plane on the left side of Figure 2 is organized into 16 rows, each representing a 20 bit word. Two rows are paired into one major loop and there are eight loops on the plane, one above the other. 
The sense solenoids are not paired like the drive solenoids, hence the data for each bit is stored by cutting the path so that each solenoid, or bit, is either inside· or outside the enclosed area or loop. The distance between the two sides of each major loop is of little consequence, hence /::__ ~DZERO-+i!-+-Z_~~._ -_---O~oOnTPUT 'NM, :g- "Z~ Figure 2A. Data Plane Driving Solenoid. 110 PROCEEDINGS-F ALL JOINT COMPUTER CONFERENCE, 1964 I i_ Figure 3. Block Diagram. - ---t---L---+-----"'~-DATA PLANE CURRENT .. ONE" OUTPUT ·2 --O.f---+--+-...l...-_-" ZERO" OUTPUT 2B. Addressing Solenoids Driving Stored Address. The error-correcting code's first function is to allow unique selection of Qne plane, thereby addressing many planes with a modest number of drivers and solenoids. The long "distance"* of the code brings with it the advantage of redundancy which results in high reliability and driver load sharing. (In practice, it has been found that degraded or missing drive pulses on a few drivers have little effect on memory operation.) The operation of the memory can be described by Figure 3. The address is stored in the address register. The addresses' lower order bits are decoded into sixteen sub-addresses for selecting the appropriate sensing solenoid group. The higher order address bits are operated on by the coder to form a Hamming code which operates the addressing solenoids. When the addressing array is driven, current rises on the selected plane and the pick-up or sense solenoid array emits the selected word to a bank of sense amplifiers. Addressing by Correlation A solenoid array memory has previously been built in which the data planes were conductively connected and were addressed by a relatively straightforward coincident voltage technique in rectangular matrix with a diode connected in series with each plane's path at each crosspoint. Another memory was built in which a single solenoid was used to drive each single * In !1 coding sense. 
data plane with a unique and orthogonal address. The first memory required connections to the data planes, which was acceptable only if data was to be changed infrequently, and the latter memory was limited to a modest number of planes equal to a practical number of drive solenoids, namely, about fifty planes. Addressing by correlation provided an exit from these limitations. Well developed correlation techniques were available1 from which a correlator could be designed that would accurately correlate a binary address and thereby achieve a form of connectionless associative addressing. However, the original solenoid array correlator used air cored solenoids whose output voltages were too low to drive a data plane. Ferrite cored solenoids were designed which improved the coupling, allowing much higher drive voltages to be delivered to the data planes. However, in spite of compensation, the cored solenoid's coupling to the data planes was much less uniform than that of the air cored units (e.g., 15 per cent versus 1 per cent), and, even with the high outputs available, single output voltages were too low for reliable operation. Hence, the need for load sharing and the requirement for less critical drive voltage amplitudes combined to recommend an error-correction type code. Use of a non-orthogonal code imposes the requirement for a nonlinear component which would detect the positive selection voltage and allow the drive current to flow in the plane: a diode. Since the diode could be a permanently prefabricated part of each plane, and a mechanical arrangement was found that did not increase the total thickness of a stack of planes, the diode was not considered objectionable.

Applicable Error-Correcting Codes

Mathematics recognizes many types of codes that could be applied to the present device.
A SEMI-PERMANENT MEMORY UTILIZING CORRELATION ADDRESSING

The memory field has seen the use of the usual binary codes and of classes of orthogonal load-sharing codes for addressing or driving core memory selector matrices. In the case of the described memory, binary codes would be unsatisfactory because the difference between a matched or selected correlation and the closest unselected one is too small, namely, one bit. The aforementioned orthogonal codes excel in that the selected address correlation receives the sum of all or most of the driving bits, while all the unselected addresses receive no drive at all, by mathematical definition of orthogonality; unfortunately, orthogonal codes always require at least as many bits as there are addresses. They would fulfill the described memory's load sharing requirement, but would sharply limit the possible number of data planes. The error-correcting alphabets were designed to encode relatively long blocks of bits, hence the number of possible addresses is relatively large. The use of a diode detector removes the need for code orthogonality, and error-correcting codes are inherently efficient in their use of redundant bits to achieve long coding distances or weights.

Two codes were found that were easily applied to the addressing problem. The first is the well known Hamming code, in particular a code with 10 information bits, 5 redundant bits, and a "distance" of 4 bits. A second code, even more attractive, is the Golay code, which for this application would represent 10 information bits, with 12 redundant bits and a distance of 8 bits. Both codes can be conveniently generated by either shift register encoders or parallel modulo-2-sum networks. Many variants of these codes are available for either larger or smaller addressing capacity requirements.
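The "distance" figures quoted for these codes are minimum Hamming distances, and for a linear code the minimum distance equals the minimum weight of a nonzero codeword. A minimal sketch of how such a figure is checked, using the familiar (7,4) Hamming code as an illustrative stand-in (the generator matrices of the paper's (15,10) Hamming and shortened Golay codes are not reproduced here):

```python
from itertools import product

# Generator matrix of the familiar (7,4) Hamming code -- an
# illustrative stand-in, since the paper's (15,10) and (22,10)
# generator matrices are not given in this text.
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(bits, G):
    # A codeword is the modulo-2 sum of the generator rows selected
    # by the information bits.
    word = [0] * len(G[0])
    for b, row in zip(bits, G):
        if b:
            word = [w ^ r for w, r in zip(word, row)]
    return word

def minimum_distance(G):
    # For a linear code, minimum distance = minimum weight over all
    # nonzero codewords, found here by brute-force enumeration.
    k = len(G)
    return min(sum(encode(bits, G))
               for bits in product([0, 1], repeat=k) if any(bits))

print(minimum_distance(G))  # 3
```

The same enumeration applied to a (15,10) generator would confirm the distance-4 figure quoted above; at 2^10 codewords the search is still trivial.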
For magnetic reasons to be discussed subsequently, correlating by means of single solenoids, where zero or one is represented by absence or presence of an input drive, is undesirable. A more practical arrangement is to drive pairs of solenoids in parallel, with one polarity for a "zero" and the other for a "one." The effect of this arrangement is that the range of correlation outputs is extended from 0 to +N (N = number of bits) to a range of -N to +N, thereby doubling the distance or weight of the code. As a result, using the Hamming code, an autocorrelation is +15 units of voltage, and the nearest cross-correlation is +7. In the Golay code, the respective figures are +22 and +6. This distribution of Hamming code outputs, with no bias, is shown in Figure 4.

The correlation outputs, as they stand, possess the necessary "distance" properties, but their absolute levels are not optimum for practical operation. The distribution of the unselected outputs must be shifted so that a diode (or diode-zener diode) detector on the plane can efficiently prevent current flow. (The reasons for choosing either type of detector are discussed subsequently.) The output distribution shift is readily achieved by adding fixed bias in the form of additional solenoid drivers which always operate in the same polarity.

Application of the Data Plane Addressing Techniques

This section describes the technique of generating the codes, the circuitry of the solenoid drivers, the structure and design criteria of the drive solenoids themselves, and the data plane detector considerations.

Input Register and Coder

The binary address of the desired data plane is entered into a buffer register. This address contains the data plane address along with the additional address bits for the subselection of data within the plane.
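The paired-drive correlation figures quoted earlier (+15/+7 for the Hamming code, +22/+6 for the Golay code) follow from simple arithmetic: each mismatched bit pair subtracts two units rather than one. A few lines check this; the 15-bit words below are illustrative, not actual codewords from the memory's generator:

```python
def correlate(drive, stored):
    # Bipolar correlation: a matching bit contributes +1 unit of
    # voltage, a mismatched bit -1, because each solenoid pair is
    # driven with one polarity for "zero" and the other for "one."
    return sum(+1 if a == b else -1 for a, b in zip(drive, stored))

N = 15                          # bits in the paired Hamming code
address = [1, 0] * 7 + [1]      # illustrative 15-bit drive pattern

matched = list(address)         # the selected plane's stored address
nearest = list(address)
for i in range(4):              # a codeword at the code distance of 4
    nearest[i] ^= 1

print(correlate(address, matched))   # +15, the autocorrelation
print(correlate(address, nearest))   # +7 = 15 - 2*4
```

Distance d thus maps to a cross-correlation of N - 2d; for the Golay case, 22 - 2*8 gives the quoted +6.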
The data plane address bits, in ordinary binary code, are themselves each used to control a solenoid drive polarity, and the redundant bits must be generated from the original bits by one of several common techniques.

Figure 4. Output Distribution of "Paired" Hamming Code (frequency of drive outputs in 0.5-volt intervals, from -15 to +15; the nearest unselected output lies at +7).

Figure 5. Shift Register Coder (modulo-2 sum fed back; 4 shift pulses after data entry).

The best technique for generating the codes is the shift register encoder, shown in Figure 5. In this device, the original address is entered as shown in shift register positions 2-11, and the modulo-2 adder operates to set position 1 accordingly. The register is shifted, a new bit generated, and the operation repeated until the data has been shifted to the left-most position. For a Hamming code with 5 redundant bits, 4 shifts are necessary. This technique is simple to instrument and uses a small number of circuits, and is most attractive with the following exception: the coding must be done before interrogation, hence the speed of this operation directly affects the access time. Thus, for a given logic speed, the minimum access time is clearly limited by the time consumed by the required shifting operations.

The alternative is to generate all the redundant bits in parallel. It can be shown that all redundant bits can be determined as modulo-2 sums of the original information or input bits. Hence, by instrumenting parallel modulo-2 adder logic (exclusive-or), all redundant bits can be generated at once. This is shown in Figure 6. Unfortunately, efficient codes have little logical overlap between their redundant bits, hence the amount of circuitry is not inconsiderable. (Typically, a Hamming coder requires about 50 NAND circuits and a Golay coder about 110.) Alternative simplified circuits and magnetic configurations are available, but some degree of complexity remains.
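Both generation methods compute the same linear function, so the parallel modulo-2 sums can be derived mechanically from a serial encoder by encoding the unit vectors. A sketch under an assumed degree-5 feedback polynomial (the paper does not give the actual taps of its shift register coder):

```python
# Assumed degree-5 feedback polynomial x^5 + x^2 + 1; the paper's
# actual taps are not given, so this polynomial is purely
# illustrative.
POLY = 0b100101
DEG = 5

def serial_redundant(info_bits):
    """Shift-register encoder (Figure 5 style): modulo-2 polynomial
    division of the message, leaving the redundant bits in the
    register."""
    reg = 0
    for b in info_bits + [0] * DEG:   # message followed by flush bits
        reg = (reg << 1) | b
        if reg & (1 << DEG):
            reg ^= POLY
    return [(reg >> i) & 1 for i in reversed(range(DEG))]

def parallel_masks(k):
    """Derive, for each redundant bit, which info bits feed its
    modulo-2 sum (Figure 6 style) by encoding the k unit vectors;
    this works because the encoder is linear over GF(2)."""
    masks = [0] * DEG
    for i in range(k):
        unit = [1 if j == i else 0 for j in range(k)]
        for p, bit in enumerate(serial_redundant(unit)):
            masks[p] |= bit << i
    return masks

def parallel_redundant(info_bits, masks):
    # Each redundant bit is the parity (exclusive-or) of its masked
    # subset of info bits -- all available after one network delay.
    val = sum(b << i for i, b in enumerate(info_bits))
    return [bin(val & m).count("1") & 1 for m in masks]

info = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
masks = parallel_masks(len(info))
print(serial_redundant(info) == parallel_redundant(info, masks))  # True
```

This correspondence is also why the parallel coder is gate-hungry: each of the 5 redundant bits is an independent exclusive-or tree over several of the 10 address bits.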
For the Hamming codes used in the described memory, the code generation was parallel. A second unit, now under construction, uses a Golay code, and its redundant bits are generated serially, with higher-speed logic to partially compensate for the multiple logic cycles. In that unit, with modest access time requirements, it was found more economical to supply a separate, faster clock and control counter, all instrumented in higher-speed logic, than to generate the Golay code in parallel.

Figure 6. Parallel Coder (data plane address bits feeding modulo-2 sum networks, one per redundant bit).

Drive Solenoids

The drive solenoids are operated in pairs, with their respective windings connected in parallel so that for one given drive polarity the solenoid flux polarities are opposite, as shown in Figure 7. This balanced configuration achieves an approximation of a closed magnetic circuit without the need for an actual closure. Although, because of the air gaps, the mutual inductance between the two solenoids of a pair is not large, the superposition of the individual solenoid fields radically reduces the stray flux. Further, although the correlation used for addressing is very non-critical, the drive pattern sensitivity of individually driven solenoids would be unacceptable. Thus, to minimize the need for magnetic shielding between the drive and pick-up arrays, to minimize drive pattern sensitivity, and to somewhat improve mutual coupling to the data planes, paired solenoids can be fully justified. As a bonus, as was mentioned earlier, the availability of two bit positions on the plane allows the storage of positive and negative correlation weights, thereby automatically doubling the correlation distance, when the solenoids are reversibly driven for one and zero inputs.

Magnetic Structure

A solenoid using no ferromagnetic core has a very uniform coupling to a surrounding loop over almost its whole length, as is shown in Figure 8.
However, for a given number of turns, an air core solenoid has a relatively low self-inductance, therefore presenting a low impedance to its driver. Hence many turns must be used for practical air core solenoids, with the consequence that the transformer stepdown turns ratio is large. To allow a smaller number of turns, larger diameter air cores or ferrite cores must be utilized to maintain a practical level of self-inductance. The resulting coupling to a loop unfortunately becomes very nonuniform, as shown by the dashed line in Figure 9: an undesirable situation, since the drive voltages induced in the planes would vary depending on the position of the plane along the solenoid.

To compensate for the nonuniform coupling, two techniques were evolved. The simplest was to vary the turns density along the winding so that regions near the ends were more densely wound than those near the middle. For good results, this technique requires careful control of winding density, which would be easy to achieve on production machinery but is difficult to do in the laboratory. The alternative technique was to use a linear winding and to vary the ferrite permeability by using short ferrite rods butted against each other. Since the reluctance of the solenoid return path is relatively large, small air gaps between the rods were found quite unobjectionable.

114 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964

Figure 8. Air Cored Solenoid and its Relative Coupling to a Loop (drive output versus position).

Figure 9 shows a solenoid constructed in this way, with a middle rod of low permeability and the two end rods of higher permeability. It may be interesting to note that this technique bears a similarity to a triple-tuned bandpass filter with a broadly tuned middle section and more sharply tuned end sections. This technique was applied in the memory described.
The question may reasonably be asked why it is necessary to use all these techniques and to pay the price of higher drive currents instead of using a closed toroidal-like structure. The reasons are as follows:

1. A very elongated thin-legged closed structure is likely to have high leakage flux between its long members when the leakage reluctance becomes comparable to the path reluctance.

2. The need for an easy data exchange would require split cores whose alignment would need to be carefully maintained. Since this memory is intended for applications where data is frequently changed, precisely mated surfaces would require critical protection and very precise alignment mechanisms.

Solenoid Drivers

The solenoid drivers are designed to supply a 16 volt, half-microsecond-long pulse to an inductive load of about twenty microhenries, with ample margin. They are also designed to withstand an inductive overshoot equal to the drive pulse, or about 32 volts peak. As was mentioned earlier, each solenoid pair is connected in parallel, as shown in Figure 7, so that the solenoids in each pair always have opposite polarities. A second consideration is that a "one" input should drive the pair one way, and a "zero" the other. In earlier designs, a transformer with two input drive windings of opposite winding polarity, one from each of two separate drivers, was coupled to an output winding that was connected to the solenoid load. When a "one" was asserted, one switch closed and drove the transformer, and when a "zero" was asserted, the other switch and winding drove the transformer, thereby generating opposite drives on the output winding for the two states. Unfortunately, the transformers were relatively bulky and somewhat inefficient. Instead, each driver solenoid was cut with two grooves instead of one to allow a bifilar winding, and one winding on each solenoid was driven for a "one," and the other for a "zero."
Both of the respective pairs of windings on the solenoids were connected in parallel, in order to maintain opposite magnetic polarity on the solenoids for either drive.

Further, since the operating pulse widths are short because of access and cycle time requirements, the maximum drive currents remain at acceptable levels. To give the structure mechanical strength, the ferrites are inserted into a phenolic-paper tube and appropriately glued in, and the windings are laid into a shallow threaded groove cut on the outside of the tube. The entire assembly is then appropriately varnished or epoxy coated.

Figure 9. Ferrite Cored Solenoid and its Relative Coupling to a Loop.

The drivers themselves are arranged to be controlled by the input address code. The state of the address turns on either of two currents, I1 or I0, in the driver, which flow as soon as the input address code bits are set up. The currents are shunted to -18 volts by transistors T1 or T2, which are saturated continually except during the drive pulse. T1 and T2 turn on as soon as power is applied, to form a fail-safe timing circuit for the solenoid drivers. T1 and T2 can only be shut off by a negative drive pulse to their bases, and the resistance-capacitance time constant in their base input circuit is made long enough to allow them to open only slightly over the maximum desired drive pulse width. When either I1 or I0 flows, the opening of T1 and T2 shunts the current into T3 or T4. This turns on one drive transistor, causing the required drive voltage pulse. At the end of the drive pulse, T1 and T2 again saturate, rapidly shutting off the conducting transistor with a low impedance drive. The fail-safe circuit of T1 and T2 is needed since the D.C.
resistance of the solenoid circuit is very low, and the drive transistor would soon be destroyed if allowed to stay on. It should be noted that when T3 is switched on, the transformer coupling between the two solenoid windings causes the collector of T4 to go to double the supply voltage. Similarly, when T4 is switched on, T3's collector rises. For this reason alone, the inductive overshoot clamp diodes on T3 and T4 must be tied to almost twice the supply voltage. Thus, the inductive transient causes an overshoot approximately equal to the drive pulse both in amplitude and pulse width. (Volt-time areas are equal.) The large overshoot transient is desirable because it shortens the transient duration, but tends to produce other unwanted transients. These transient currents in the data planes occur after the data has been strobed, hence they do not affect data read-out, but they do require a few microseconds to settle, thereby lengthening the cycle time. These transients, and means to suppress them, are discussed later.

Correlation Selection Techniques

The addressing solenoid drive causes a parallel correlation operation on all the stored addresses on each respective plane. Figure 10 shows the outputs of one selected plane and three typical outputs of unselected planes, the former being the positive drive pulse.

Figure 10. Correlation-Coder Outputs (driver correlation output, 2 V/cm, 0.5 µs/div).

A means must be provided to uniquely separate the selected plane by allowing a current to flow in its path which is much larger than the linear sum of ALL the unselected data plane currents.† An ordinary silicon epitaxial diode has a forward-to-reverse resistance ratio of over a million to one.
Its capacitance of a few picofarads, in series with the data plane impedance of perhaps ten microhenries and three ohms, allows only an extremely short transient current in the unselected planes, and the sum of all these currents is still far exceeded by the select current over the drive pulse interval.

† This statement is actually a simplification intended to clarify. Actually, the differential of the desired current flow over the interrogation period must greatly exceed the sum of all the unselected differentiated currents. Since the small, unselected current transients are very short, their positive and negative differentials essentially cancel out during the first fraction of the drive pulse period.

The outputs of the stored address code correlations may be represented by a distribution as shown in Figure 4 if no bias is applied. If a four or five microsecond memory is desired, a simple diode may be placed in series with the path on each data plane, and an additional solenoid drive bias of minus nine units may be added to shift the distribution to that shown in Figure 11A. The diode characteristic in Figure 11B will allow current to flow in that addressed plane whose output is to the right of the origin. Unfortunately, after the data has been strobed out, and after the drive pulse is terminated, the overshoot reverses the distribution so that the unselected planes then go into conduction, as shown in Figure 11C. As a result, current flows in many planes for a few microseconds, with a period determined by plane inductance, resistance, and diode conduction voltage drop. Due to less than perfect coupling from solenoids to planes, the sum of the currents is far less than would be expected in a good transformer, hence the solenoid flux collapses rapidly.

116 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964

Figure 11. Distribution of Outputs for Given Drive and Diode Characteristic (A: distribution of outputs during drive; B: diode characteristic; C: distribution of outputs during overshoot).

The technique used to prevent the flow of current during the overshoot period is as follows. First, the drive bias is changed so that the distribution of outputs is that shown in Figure 12A. Second, a zener diode with a breakdown voltage V1 is placed in series with the diode. Its voltage is chosen such that the sum of the diode forward conduction voltage drop and the zener breakdown voltage drop equals a voltage which is two units higher than the right limit of the distribution at +11, as is shown in Figure 12B. Now, during drive, only the selected current flows, just as before. However, during the overshoot period, the distribution is such that no current flows at all, as is shown in Figure 12C. Hence, a new memory cycle can begin in less than a microsecond. The oscillograph in Figure 13 shows the voltage sensed by a solenoid in an array loaded with fifty planes using simple diodes, and the oscillograph in Figure 14 shows a similar output due to fifty planes utilizing the diode combination.

The diodes are ordinary planar epitaxial types, but gold-bonded germanium point contact types work almost as well. The zener diodes most commonly available have relatively high capacitance, since they have large junctions designed for high dissipation. The emitter-base junctions of a small silicon transistor such as a 2N706 have very sharp and uniform zener breakdown voltages, and low junction capacitances on the order of a few picofarads. These transistors, used as zener diodes, may be seen in the photograph of the memory in Figure 15, hanging from the sides of the data planes. However, the diodes are mounted integrally on the data plane in small holes, staggered around the planes' peripheries, as shown in Figures 2 and 16.
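The bias-and-detector scheme above reduces to a simple threshold test. In the toy model below, the minus-nine-unit bias is the figure from the text, while the zener threshold is an assumed value chosen only to illustrate the overshoot blocking:

```python
# Toy model of plane selection, in the paper's arbitrary voltage
# "units." The -9 unit bias is quoted in the text; the zener
# threshold below is an assumption for illustration, not the
# actual V1.
BIAS = -9

def plane_conducts(correlation, threshold=0.0):
    # A detector passes current only when the biased correlation
    # output exceeds its conduction threshold.
    return correlation + BIAS > threshold

# Paired Hamming code: the selected plane correlates at +15, the
# nearest unselected plane at +7.
print(plane_conducts(15))   # True  -> selected plane draws current
print(plane_conducts(7))    # False -> unselected planes stay cut off

# During the inductive overshoot the outputs reverse sign; a series
# zener raises the conduction threshold so nothing conducts then.
ZENER = 8                   # assumed breakdown, in the same units
print(plane_conducts(-15, threshold=ZENER))   # False
```

The real circuit shifts the whole distribution (Figure 12A) rather than thresholding per plane, but the selection margin it buys is the same two-sided gap modeled here.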
In later units, where the zener diodes are needed, the diode and zener diode would be combined in one component, mounted as the diodes are now mounted, as shown in Figure 16.

Figure 12. Distribution of Outputs for Given Drive and Diode-Zener Diode Characteristic (A: distribution of outputs during drive; B: diode-zener diode characteristic; C: distribution of outputs during overshoot).

Figure 13. Worst Case "Ones" and "Zeros" with Diode Detector (solenoid output, 100 mV/cm, 0.5 µs/div).

Figure 14. "One" and "Zero" Outputs for 3 µs and 1.5 µs Cycle Times with Zener Diode-Diode Detector (solenoid output, 50 mV/cm, 0.5 µs/div).

The dual component is a commonly built one, and is in essence a transistor with no base lead connection, in which the usual collector junction is
Figure 17 is an oscillograph that shows the drive current in the data plane, and Figures 13 117 and 14 the outputs of a number of "ones" and "zeros" from the respectively driven solenoids. Obviously, all bits on a plane are emitted in parallel, hence some gating is usually desirable to connect only the desired subset, or word, to the output amplifiers. The method of achieving this is quite simple. One terminal of each solenoid in a word group is tied together with the similar terminals of the solenoids in that group. This common terminal becomes the word-select control terminal. All the other word-groups of solenoids are similarly tied together. This is shown in Figure 18. In each word group, the other terminal of each solenoid representing each bit of the word is tied to all the other respective solenoids in the other words by means of diodes. All the common "word line" terminals are biased so that their respective diode switches are backbiased except for those of the addressed word. The diodes on the addressed word's solenoids are forward-biased before the data plane is pulsed, hence effectively connecting the addressed solenoids to the preamplifiers before the interrogating pulse. Since a fraction of a microsecond is needed for currents to change and for diodes to switch, this word preselection proced ure is timed ahead of the main pulse. Figure .15. Photograph of System. 118 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964 Figure 16. Diode Mounting Close-Up. The solenoids themselves for most applications are air-cored. These are simply phenolic paper tubing which is wound with a helix of thin copper wire, typically, number 30 wire gauge. After winding, it is suitably coated to protect it with either varnish or epoxy coating. For applications where higher outputs .are . desired, ferrite cored solenoids, smaller in diameter than those in the drive array (quarter inch diameter instead of three eighths) may be used. 
To minimize pattern sensitivity, they should either be paired and connected in series or spaced far apart. Output Circuitry The diodes used at the output of the solenoids could be matched to those on each bit line to allow the use of a direct coupled, single ended system. However, for signals below 100 millivolts, typical of the simpler air-cored pick-up solenoids, capacitor coupling is indicated. Hence the gated signal is amplified in a linear amplifier, along with any pedestal shifts due to word-line switching, and voltage restoration is applied. If the access time is to be short compared to the pulse width, keying or gating of the restoring voltage is required. If the access time is long compared to the pulse, an ordinary resistance-capacitance high-pass filter is adequate. The waveforms of the connectionless memory shown in Figures 13 and 14 require strobing for reliable operation, and it is timed to occur just before the end of the drive pulse. +v -v -v CURRENT IN PLANE 100 mal em 0.5 .,./ div -v f SELECTION VOLTAGES Figure 17. Drive Current in Plane Due to Hamming Correlation. BIT 2 BIT 3 Figure 18. Word Pre-Selection Matrix. A SEMI-PERMANENT MEMORY UTILIZING CORRELATION ADDRESSING In the limit, with a pick-up array of paired ferrite-cored solenoids, which may deliver outputs of as much as a volt, output flip-flops could be strobed directly from the solenoids. In a more typical case, one or two transistors can be used for amplification, and another for slicing and strobing. For increased speed, a two-stage amplifier followed by the voltage restoring switch, followed by a slicer and output stage is desirable, and the memory described here used this system and is shown in Figure 19. It should be emphasized, that in all cases, the amplifiers were single-ended and differential amplifiers were not required. Data Planes The data planes in the earliest work were simply thin wire, wound on plastic sheets with small bobbins attached. 
However, this technique did not allow for quick and easy exchange of individual circuits paths or planes. Copperclad Mylar was soon found to the applicable to the requirement. The tooling technique involves accurately drawing the printed circuit layout, drilling a master template which fairly accurately matches the layout, and having a stainless steel mesh "silk screen" fabricated from the printed circuit layout. To avoid critical alignment and fabrication problems, copper path widths are made about 0.040" wide, distances between closest conductors are also 0.040" and the closest a copper path passes to a hole is nominally 0.060". The solenoid array base plate and the data planes themselves are drilled through the same template, hence with only modest care, alignment is no problem. The holes in the planes are about 0.1" larger than the solenoids; hence they fit quite loosely. 119 All data planes contain both alternate paths around the solenoids. In the laboratory, the data is entered by simply scraping off the etch resist before the planes are etched. The result of this operation can be seen in Figure 2. A machine has been constructed that mills away the copper path, under control of an ordinary punch card, at the rate of six bits per second, and this is shown in Figure 20, and can be seen operating on a large 1428 bit plane. The fabrication of the planes themselves is straightforward. Silk-screening, mass drilling under the template, data insertion and etching comprise the laboratory process. Screening, drilling and etching are the large scale process with subsequent data insertion by punching, scraping or milling on the aforementioned machine. In the field, the copper paths can be severed with a knife. The material used mostly so far has been one ounce copper (0.00135") on 2 mil Mylar (0.002") which has a total thickness of about 0.004", allowing well over two hundred planes per lineal inch along the solenoid. 
Mechanical Considerations In a memory in which changes are not frequent, it is simple enough to slide the planes on or off individually. For greater convenience, many planes can be prealigned to thin· base OUT STROlE Jr"7 -\I' WOlD SOL£NOID UNE ]f.' VOlTAG£· -0.7 ~ESTORATION DftMS Figure 19. Pre-Amplifier. Figure 20. Punch-Card Controlled Cutter for 1428 Bit Planes. 120 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964 plates, covered with another thin sheet, and handled as magazines. In either case, the only disadvantage is that in removing data behind or below other data, which is inaccessible until the covering data is removed. A "file cabinet'" like mechanism has been designed to avoid this problem. In this technique, shown in Figure 21, the solenoid array is fastened to the stationary back panel behind a drawer section, with the solenoids extending through the drawer when the file is closed. The solenoids are withdrawn from the drawer when the file is opened. Thus when the file is open, magazines can be removed or replaced individually, and closing the file mates the data containing drawer and the solenoid array. A unit such as this is now under construction. The magazines themselves contain one to two hundred planes each, which are aligned to the magazine by several pins. Changing one data plane requires opening of the drawer, opening of the magazine much as a loose-leaf notebook, finding the "page," and exchanging it. In some operations, entire programs or tables would be stored or shipped in magazines, and when the data was to be used, that magazine simply dropped into the drawer, and the drawer closed. Conclusions At present, a 360,000 bit memory has been built and about a hundred planes placed in it. The signal degradation in increasing the number of planes from a few to a hundred was very minor, hence extrapolation to full capacity appears justified. The unit built used a Hamming code and has operated well. 
However, for the small increase in complexity, the Golay code more than doubles the selected current, hence a unit now under construction will use that code. A practical limit to this technique is about 4,000 data planes since the practical but powerful Golay code is extendable to a 23-12 code. Physical dimensions also suggest that a stack of 4,000 planes, or about twenty inches, is a reasonable limit. Lengthening the solenoid causes linearly increased driver voltage requirements, and they show practical limits which are equivalent to between 2,000-4,000 planes, with presently available transistors. The limits on the number of bits per plane is also between 2,000 and 4,000, imposed by the limits of the correlation voltage output drives versus the data planes' path resistance and inductance as well as the propagation time in the data planes' paths. In summary, the size limit per module is about 10 i bits, and 2 X 106 bits appears easy to reach. As to access and cycle times, the limits vary with module capacity, and for a 1 megabit memory 0.5 microsecond access and 1 microsecond cycle probably are close to the limit, and twice this is relatively straightforward. Decreases in memory capacity, particularly data plane bit capacity, should be followed linearly by access and cycle times down to a limit of about 0.25 and 0.5 microseconds respectively. Below this, directly connected data plane memories should be considered. Figure 21. "File Cabinet" Mechanism. The cost of these memories is low but highly variable since the associated electronics, inputoutput buffers, coder, sense amplifier and word line selectors set up an "overhead" that is almost invariant over a range from under a hundred to a thousand planes, and goes up very slowly beyond that. The cost of the array, even A SEMI-PERMANENT MEMORY UTILIZING CORRELATION ADDRESSING in hand-made versions, amounts to only a small fraction of a cent per bit capacity. 
The data plane cost is between a quarter and one cent per bit, including data entry and all fabrication; a full electronics complement can add anywhere from a quarter to two cents a bit, depending on memory size and speed. In summary, this memory has a bit cost ranging from half a cent per bit for large memories to a few cents per bit for relatively small ones, with some downward revision when produced in quantity.

It is believed that this type of memory will find application in digital computers where large, infrequently changed blocks of data are used, and in other applications where the memory's rapid data change capabilities allow it to be used as an input device as well.

ACKNOWLEDGMENTS

The author would like to acknowledge the valuable suggestions of and discussions with many of the members of Sylvania's Applied Research Laboratory, particularly those of Messrs. Stephen Gray, Benjamin Eisenstadt, Allan Snyder, Gerald Ratcliffe; Doctors Donald Brick, Richard Turyn and Paul Johannessen; and our Director, Dr. James Storer.

REFERENCES

1. PICK, G. G., GRAY, S. B., and BRICK, D. B., "The Solenoid Array - A New Computer Element," IEEE Transactions on Electronic Computers, Vol. EC-13, No. 1, February 1964.

2. YOUNKER, E. L., et al., "Design of an Experimental Multiple Instantaneous Response File," AFIPS Conference Proceedings, Vol. 25, Washington, D. C., pp. 515-527, April 1964.

3. KUTTNER, P., "The Rope Memory, a Semi-Permanent Storage Device," AFIPS Conference Proceedings, Vol. 24, Las Vegas, Nevada, pp. 45-58, October 1963.

4. BUTCHER, I. R., "A Prewired Storage Unit," IEEE Transactions on Electronic Computers, Vol. EC-13, No. 2, April 1964.

5. ISHIDATE, T., YOSHIZAWA, S., and NAGAMORI, K., "Eddycard Memory - A Semi-Permanent Storage," Proc. of the Eastern Joint Computer Conference, Washington, D. C., December 1961, pp. 194-208.

6. FOGLIA, M. R., McDERMID, W. L., and PETERSON, M. E., "Card Capacitor - A Semi-Permanent Read-Only Memory," IBM J. Res. and Dev., Vol.
68, p. 67, January 1961.

7. MINNICK, R. C., and HAYNES, J. L., "Magnetic Core Access Switches," IEEE Transactions on Electronic Computers, Vol. EC-11, No. 3, June 1962, pp. 352-368.

8. TURYN, R., "Some Group Codes," Internal Applied Research Laboratory Note Number 404.

9. DONNELLY, J. M., "Card Changeable Memories," Computer Design, Vol. 3, No. 6, June 1964.

A 10^5 BIT HIGH SPEED FERRITE MEMORY SYSTEM - DESIGN AND OPERATION

H. Amemiya, T. R. Mayhew, and R. L. Pryor
Radio Corporation of America
Camden, New Jersey

INTRODUCTION

With the advancement of computer technology in recent years, the demand for a very-high-speed memory has greatly increased. Scratch-pad memories of fewer than 100 words with cycle times faster than 500 nanoseconds are commonly found in computers on the market. However, larger memories of the same speed range are not yet commercially available, because the problems in building a large memory are much more complicated than those in building a small one. These problems center around the transients generated in the digit sense system.

The word drive system uses a square selection matrix with transformer coupling to individual word lines. This arrangement reduces the noise voltages that are coupled into the memory stack from the word drive system.

The sense amplifier is a differential amplifier in which a delay line is used to minimize dc imbalances and level shift. A tunnel-diode strobe circuit is used to provide low-level thresholding and high-speed operation. Some portions of the electronics of the memory system are located very close to the memory stack. Interconnections are made either by cables or by microstrips. The use of these techniques has resulted in a memory cycle time of 450 nanoseconds.

In order to understand these problems, a 1024-word, 100-bit memory was built. The storage cells consist of ferrite cores (30 mils O.D., 10 mils I.D., 10 mils thick) used in a two-core-per-bit arrangement in a linearly organized array.
In order to simplify the core-threading work, only two conductors per core are used; one conductor is plated, leaving only one wire to be threaded.

MEMORY CELL OPERATION

Linear selection (word-organized memory) and partial switching1,2,3,4,5,6,7 are the two techniques commonly employed to achieve a cycle time of one microsecond for a high-speed ferrite memory. Linear selection offers the advantage that read currents of large amplitude (limited only by the drivers) can be used to increase speed. This method contrasts with coincident-current selection, where read currents are dictated by the threshold characteristics of the ferrite cores used.

As a new approach, digit lines are treated as a set of mutually coupled parallel transmission lines and are terminated accordingly. Recognition that different modes of wave propagation exist on digit lines was probably the most important step in obtaining the high-speed operation of the present memory.

As the memory speed is increased by narrowing the width of the write and digit pulses, and subsequently the width of the read pulse, a point is reached where two-core-per-bit operation becomes necessary. There are two reasons for this: the sense signal generated on reading a ZERO becomes large as the rise time of the read pulse is decreased; and the sense signal difference between reading a ONE and reading a ZERO becomes small because the digit pulse in the presence of the write pulse switches only a small fraction of the core irreversibly. Figure 1 illustrates these reasons qualitatively. The ZERO signal is due to reversible flux change, and the ONE signal is due to both irreversible and reversible flux changes, with the former contributing to the net signal difference between a ONE and a ZERO.
Two-core-per-bit operation provides a means of cancelling the reversible flux contribution to the total sense signal. There are many schemes employing two cores per bit,8,9,10 but the one used in this memory is shown in Figure 2. Here, each core is threaded by two conductors, one in the word direction and the other in the digit direction. When writing, both core A and core B of the same bit pair receive a write pulse. In addition, either core A or core B receives a digit pulse, depending on the information being written in. When reading, a read pulse is applied to both core A and core B in the direction opposite to that of the write pulse. Digit pulses are always applied through the cores in the same direction as the write pulse, because the digit disturb threshold of a core becomes much lower if opposite-polarity digit pulses are used.7,9,10

Figure 2. Two-core-per-bit scheme: (a) basic read-write scheme, (b) magnetization applied to cores when reading, (c) net sense signals.

The sense signals generated at core A and core B are added differentially in a differential sense amplifier, where the signal due to the reversible flux change is cancelled. Therefore, only the net signals shown in Figure 2(c) reach the threshold circuit of the sense amplifier. This two-core-per-bit scheme has the following features:

1. Bipolar sense signals provide more reliable sensing than a unipolar sense signal.
2. Word line impedance is constant regardless of the information pattern, because each bit (a pair of cores) presents a constant impedance to word pulses whether a ONE or a ZERO is stored.
3. Read and write pulses may have loose tolerances.
4. Balanced digit lines that are paired for one bit location offer a possibility of controlling wave propagation inside a memory stack. This point will be described in more detail later.
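A toy model of the cancellation in Figure 2(c): both cores contribute the same reversible component, which the differential amplifier subtracts away, leaving only the bipolar net signal. The amplitude numbers below are arbitrary, chosen only to illustrate the scheme.

```python
def net_sense_signal(stored_one, reversible=1.0, irreversible=0.5):
    """Differential output V(core A) - V(core B) on reading.
    The core that received the digit pulse when writing contributes
    an extra irreversible flux component on read-out."""
    v_a = reversible + (irreversible if stored_one else 0.0)
    v_b = reversible + (0.0 if stored_one else irreversible)
    return v_a - v_b

assert net_sense_signal(True) == 0.5     # ONE: positive net signal
assert net_sense_signal(False) == -0.5   # ZERO: negative net signal
# the reversible term cancels identically, whatever its size:
assert net_sense_signal(True, reversible=100.0) == 0.5
```

This is why the scheme yields the bipolar, more reliably detectable signal listed as feature 1 above.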
Figure 1. Sense signals of one-core-per-bit memory at increased speed.

The ferrite cores used in this memory have an outer diameter of 30 mils, an inner diameter of 10 mils, and a thickness of 10 mils. The operating conditions are shown in Table I. A test has shown that the worst-case disturb pattern changes the sense signal by less than 10 per cent.

Table I. Operating Conditions of the Ferrite Cores
(core dimensions: 30 mils O.D., 10 mils I.D., 10 mils thick)

Drive pulses   Amplitude     Rise time   Fall time   Width (50% point)
READ           630 ma ±5%    80 nsec     30 nsec     80 nsec
WRITE          220 ma ±5%    40 nsec     40 nsec     80 nsec
DIGIT          70 ma ±3%     30 nsec     30 nsec     75 nsec

Sense signal from cores: ±50 mv
Kickback voltage when reading: 0.25 v/bit

WAVE PROPAGATION IN THE MEMORY STACK AND TERMINATIONS

It is a very basic requirement that a memory system must be able to store any information pattern desired at any word location. Since some words are located close to the digit drivers and sense amplifiers whereas others are located far away from them, the digit lines must be able to carry digit pulses and sense signals without distortion. These requirements make it essential that the wave propagation inside a memory stack be well understood.11,12 The problem is complicated because many digit lines run in parallel for a considerable distance and because many word lines cross the digit lines at right angles, with ferrite cores at the intersections.

A relatively simple mathematical analysis of this structure can be made if one assumes that the delay on the word lines is zero. Then the presence of word lines may be considered as contributing only to the coupling between digit lines. With this assumption, the problem of two-dimensional wave propagation changes into that of one-dimensional wave propagation on multiple parallel transmission lines with mutual coupling. The mutual coupling now consists of two parts, namely, the inherent coupling due to digit lines running in parallel and the coupling due to word lines.
With this assumption, the problem of two-dimensional wave propagation changes into that of one-dimensional wave propagation on multiple parallel transmission lines with mutual coupling. The mutual cou:' pIing now consists of two parts, namely, the To fulfill the requirement that digit lines carry digit pulses and sense signals without distortion, it is necessary that digit lines be lossless and that there be no interference among waves propagating on separate digit lines. The first condition is met approximately by a memory stack. The second condition is normally not sat.:. isfied because of mutual coupling. However, there is a wave mode that propagates on a pair of lines without interference, provided that i a certain manipulation of coupling is made. LINE n LINE n' Sl- PAIR n { 1...[ - EQUAL COUPLING k Equalization of coupling and differential mode. UNE Figure 3. In Figure 3, line n and line n' belong to pair and line k is a line outside pair n. Assume that the coupling between line n and line k is made equal to that between line n' and line k. Then, the differential-mode propagation on pair n (Le., simultaneous propagations of same amplitude but of opposite polarities on lines nand n') does not induce propagation on line k, because of cancellation effect. In other words, if digIt lines are paired, each pair can have independent differential-mode propagation without interference, provided that equalization of coupling is made. * The transposition method used in the stack to obtain equalization of coupling will be explained later. 11, Therefore, it is desirable to have all the propagations in differential mode. However, this is not the case with the memory being discussed here. In Figure 2 (a) it is shown that digit lines are paired, a result of the consideration given above. But the digit pulses are not applied differentially because negative digit * Proof is given in the appendix. 126 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964 pulses are not permitted. 
Since either of the two lines of a pair is always driven by a digit pulse, whether a ONE or a ZERO is being written in, digit pulse propagation can be regarded as a superposition of the differential-mode component and the common-mode component as shown in Figure 4, with current amplitude onehalf that of the digit pulse on each line. The differential mode component consists of a number of differential mode propagations, one for every digit line pair, that are made independent of each other by equalization of coupling. The common mode component obviously has no interference problem, since all the digit lines carry the same current pulses simultaneously. The information is carried by the differential mode component and not by the common mode component, as the latter merely serves as a fixed bias, independent of the information being written in. Digit lines are terminated to eliminate reflections, since undesired reflections reduce system reliability and prolong cycle time. For instance, a proper termination is the only means to minimize the waiting time between writing and reading, as the digit pulses must be completely dissipated before sense signals can be detected. {" 1.....1'" 1.....1'" .rt. {Ii () n Z. • Zc-Zd 2 {" {" I ..It. .IL.. - .IL.. () (j II AAIR .IL.. { I, PA.. {" n .l"""'\. () 1.....1'" Ii (a) (b) {"" .IL.. Ie) Figure 4. Propagation of digit pulses: (a) digit pulses on digit lines, (b) differential mode component, (c) common mode component. The differential mode component and the common mode component require different impedance for termination. As shown in Figure 5, let Zd and Zc be the proper termination for the differential mode and the common mode, respectively. Zd is smaller than Zc,and the difference between the two is rather appreciable due to the effect of word lines. To terminate both modes, either a T network or a 7r network may be used, as shown in Figure 5 (c) and (d). (b) >11 (e) .IL.. ..11.. 
Figure 5. Digit line terminations: (a) differential mode termination, (b) common mode termination, (c) T termination (both modes), (d) π termination (both modes).

When reading, signals are sensed by differential sense amplifiers. The net signal propagates in the differential mode, so there is no interference problem. Since the common-mode voltage is not sensed by the amplifiers, the common-mode termination is less critical than the differential-mode termination. The fact that the digit lines are terminated means that only one-half of the raw sense signal reaches the sense amplifier. This seeming disadvantage is far outweighed by the advantage of being able to control the wave propagation generated in the memory stack.

Since the actual memory stack consists of eight memory planes, the digit lines are folded. Figure 6 shows the digit lines unfolded in order to show the transposition details. This transposition method equalizes coupling between any two adjacent line pairs if transpositions are done at short intervals. Figure 6 also shows that the digit lines are terminated on both ends by T terminations and that the digit drivers and sense amplifiers are connected to the mid-points of the digit lines. This connection minimizes the digit line delay measured from the driving and sensing point. Yet the digit line delay of 40 nanoseconds across 1024 words requires two different timings for the read and write pulses.

The digit pulses used have negative polarity and are applied through diodes. These diodes disconnect the digit driver cables and digit drivers from the digit lines to avoid loading the sense signals. Emitter followers form the first stage of a sense amplifier and work as impedance transformers. These diodes and emitter followers are mounted on the stack assembly.
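The T and π element values implied by Figure 5 can be sketched numerically. This is our reconstruction, not circuitry from the paper: we assume Zd and Zc denote the per-line differential- and common-mode termination resistances, which gives the usual networks (T: series arm Zd per line, shunt (Zc - Zd)/2 to ground; π: Zc from each line to ground, 2·Zc·Zd/(Zc - Zd) line to line). The numeric impedance values are invented for illustration.

```python
def t_termination(zd, zc):
    """Series arm per line and shunt-to-ground resistor of a T network
    terminating both modes (assumed per-line mode impedances)."""
    return zd, (zc - zd) / 2.0

def pi_termination(zd, zc):
    """Line-to-line and line-to-ground resistors of the equivalent pi network."""
    return 2.0 * zc * zd / (zc - zd), zc

zd, zc = 50.0, 120.0                       # illustrative values only
r_series, r_shunt = t_termination(zd, zc)
assert r_series == zd                      # differential: shunt node is a virtual ground
assert r_series + 2 * r_shunt == zc        # common: both line currents share the shunt
r_ll, r_lg = pi_termination(zd, zc)
z_diff = r_lg * (r_ll / 2) / (r_lg + r_ll / 2)   # line-to-line arm splits at virtual ground
assert abs(z_diff - zd) < 1e-9             # differential mode again sees zd
```

Either network presents Zd to the differential mode and Zc to the common mode simultaneously, which is why a single resistor per line cannot do the job when Zd and Zc differ appreciably.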
The effect of the new termination method on the digit pulse waveform and on the digit transient will be shown later.

DESIGN OF MEMORY STACK

In the memory, one of the two conductors that go through the cores is a conventional wire and the other is a plated conductor. Figure 7 shows the plated conductor as well as how memory cores are assembled into a strip. Individual cores are first metallized by vacuum deposition and then inserted into a groove cut in the middle of an insulator strip, with connecting conductors already etched. The strip is then electroplated to improve contact and also to lower the over-all resistance of the conductive path.9 Since the connecting conductors on an insulator strip connect two neighboring cores on the same side, the resulting conductive path has a zig-zag pattern.

Figure 6. View of unfolded digit lines showing transpositions to obtain equalization of coupling and connections to digit drivers and sense amplifiers.

Figure 7. Ferrite core strip.

Figure 8. Memory plane.

Each ferrite core strip contains 128 cores, and each memory plane holds 200 strips. Since plated conductors are used as digit lines, each memory plane contains 128 words of 100 bits each, with 8 planes comprising a full memory stack. Plated conductors were used as digit lines because they permit pairing two neighboring conductors to form a bit pair. Such pairing is helpful in maintaining a good balance between the two lines of a pair and also simplifies transposition. If, on the other hand, the plated conductors were used as word conductors, it would be necessary to pair two nonadjacent digit lines because of the zig-zag pattern of the plated conductors.
As shown in Figure 8, a memory plane consists of a substrate and 200 ferrite core strips, of which 100 are mounted on the top surface and the remaining 100 on the bottom surface. This packaging technique causes the word lines to be folded into a hair-pin shape to facilitate connection to the word drive system. Two opposing sides are used for word line connections; i.e., on each memory plane 64 word lines have their ends brought out to one side and the other 64 word lines to the other side. Ground planes are provided on the top and bottom surfaces of a substrate, over which the ferrite core strips are placed. The ground planes are connected to the supporting structure at the four corners of the memory stack assembly.

When the eight memory planes have been assembled, the ferrite core strips are connected to form digit lines. Since each strip contains 128 cores, eight strips connected in series make up a full digit line. One group of 50 bit pairs (100 digit lines) is made up of core strips mounted on the top surfaces of the eight memory planes; another group of 50 bit pairs consists of core strips on the bottom surfaces of the memory planes. This packaging technique is shown in Figure 9. It is noted that these two groups have symmetry; i.e., (b) is obtained by rotating (a) 180 degrees. The digit system is divided into two groups to make the best use of the stack surface areas for external connections, which include 200 transistors, 400 diodes, 600 termination resistors and 300 cable connectors for the digit system. (See Figure 6.) Figure 10 shows the utilization of the memory stack surfaces for the digit and the word connections; all usable surfaces are being used. The top and bottom surfaces are actually the top surface of the top memory plane and the bottom surface of the bottom memory plane, and are not usable for external connections.
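The stack dimensions quoted above multiply out to the capacity in the paper's title; a quick consistency check (all figures are from the text, the arithmetic is ours):

```python
cores_per_strip = 128
strips_per_plane = 200
planes = 8
words_per_plane = 128
bits_per_word = 100

total_cores = cores_per_strip * strips_per_plane * planes   # 204,800 cores
total_bits = words_per_plane * planes * bits_per_word       # 1024 words x 100 bits

assert total_bits == 102_400            # the "10^5 bit" of the title
assert total_cores == 2 * total_bits    # two cores per bit, as in Figure 2
# a full digit line is 8 strips in series: 1024 cores, one per word
assert cores_per_strip * planes == 1024
```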
In the construction of the present memory, bit sense signal testing was done after each memory plane had been completed with core strips. Bad cores were then replaced. The resistance of the digit lines (plated conductors) across 1024 words was found to fall between 1.6 and 2.0 ohms.

Figure 9. Methods of connecting ferrite core strips: (a) one group of 50 bit-pairs is obtained by connecting ferrite core strips on the top surfaces of all eight planes, and (b) the other group of 50 bit-pairs is obtained by connecting ferrite core strips on the bottom surfaces.

Figure 10. Utilization of the memory stack surfaces for external connections.

ELECTRONICS FOR THE 1024-WORD MEMORY

Figure 11 is a block diagram of the memory system, which has four major portions:

1. Memory stack assembly,
2. Control system,
3. Word system,
4. Digit system.

The memory stack has been described. The control system generates and supplies all timing pulses for the drive system and for transferring data. The word system, at the command of the control system, supplies the proper read and write current pulses to a selected word for reading and writing information out of or into the memory.
The digit system is used in a dual fashion: to provide sensing of the information stored, and to write back into the memory, simultaneously with the write pulse, formerly stored or new information. Parts of both the word system and the digit system are packaged on the memory stack.

Figure 11. Block diagram of the memory system.

CONTROL SYSTEM

The control system is built of a logic building block which has a typical two-level AND-OR logic delay of seven nanoseconds with a fan-out of six. The system controls all the necessary timing pulses for each of three cycle types: 1) read cycle, 2) write cycle, 3) split cycle. The first two are standard for destructive random-access memories: the first is the standard read-out operation, which must be followed by regeneration while the information is still in the memory register; the second is the standard means of getting new information into the memory, the read half of the cycle being used only to clear the memory while the strobe pulse is inhibited. The memory register is loaded with the new information, which is then written into the memory.

The only unusual feature is the split cycle. The first command for this cycle generates only a read operation accompanied by a strobe of the sense amplifier. The retrieved information is available for processing but is not regenerated, since the entire memory cycle has been temporarily suspended.
When the continue command is given, the memory register is reset a second time to receive the newly processed information, which is then stored in memory. Thus, the first half starts a conventional "read" cycle which stops itself in the middle, to continue upon later command as a "write" cycle after the memory register has been cleared. The time-saving features of this type of cycle are compatible with many of the common computer operations. The timing pulses generated by the control system for a read cycle are shown in their time relationships in Figure 12.

WORD SYSTEM

The word system is required to distribute, with the generation of minimum noise, a large read pulse, followed by a smaller write pulse of opposite polarity, to any of the 1024 words addressed by the 10-bit address register. The bulk of this decoding is done in a bipolar diode matrix driven by 32 pairs of read and write drivers along one side and 32 switches along the other side. The 1024 intersections of this main matrix are transformer-coupled to the 1024 word lines of the memory stack. The dc level of the word line is restored by a diode-resistor network in the secondary of the transformer. Without this network a dc level shift would appear, as the read current pulse is greater in amplitude and duration than the write current pulse.

The circuitry of the main matrix is shown in Figure 13. Each of the drivers and each of the switches has its own preamplifier channel, complete with an AND gate having one negative and one positive input. The complete read and write driver channels are shown in Figure 14, and the switch channel in Figure 15. These driver and switch channels are arranged in three 4 X 8 matrices, which permit the selection of one of 32 switch channels and one each of 32 read drivers and write drivers to select any word and drive it.
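The two-stage decoding just described can be paraphrased in a few lines. The text gives only the sizes (10 address bits, a 32 x 32 main matrix, three 4 x 8 pre-selection matrices), so the particular bit-field assignments below are our assumption, for illustration only:

```python
def select_word(addr):
    """Map a 10-bit address onto the 32 x 32 word selection matrix.
    Which five bits feed the drivers and which the switches is assumed."""
    assert 0 <= addr < 1024
    driver = addr >> 5          # one of 32 read/write driver pairs
    switch = addr & 0x1F        # one of 32 switch channels
    # each 5-bit field is itself resolved by a 4 x 8 (2-bit x 3-bit) matrix
    return (driver >> 3, driver & 0b111), (switch >> 3, switch & 0b111)

picks = {select_word(a) for a in range(1024)}
assert len(picks) == 1024                    # every word gets a unique intersection
assert select_word(0) == ((0, 0), (0, 0))
assert select_word(1023) == ((3, 7), (3, 7))
```

The economy is in the counts: 32 driver pairs plus 32 switches (and three small 4 x 8 matrices behind them) select 1024 intersections, rather than one full driver channel per word — the roughly ten-to-one cost reduction cited below.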
The main problem encountered in designing a word drive system for a high-speed, high-bit-capacity memory is that of minimizing the noise introduced into the stack. As the electronics are a significant cost factor, it is desirable to use a bipolar diode matrix performing selection with drivers and switches; such a matrix reduces the cost of the word-system electronics for a 1024-word memory by about ten to one.

Figure 12. Timing diagram for a read cycle. Read and write timing pulses shift in time depending on the word address. (Solid lines show "Timing A" and broken lines show "Timing B.")

Figure 13. Portion of the 32 x 32 bipolar matrix.

Figure 14. Word drivers: (a) read driver channel, (b) write driver channel.

Figure 15. Switch channel.

It has been our experience, however, that with all the types of bipolar matrices that we can devise, severe switching transients are introduced on n lines of an n^2 matrix when the switch selection is made. Moreover, it is found
that the characteristics of the memory stack, both with conventional core memories and with our partially automated fabrication techniques, show much tighter capacitive coupling between the network of word lines and the network of digit lines than can be made to exist between either of these networks and a ground plane. The result is a tendency for the conventional word selection matrix to introduce a very sizable common-mode noise onto the network of digit lines.

An analysis of the switch-noise injection in the memory, where the word lines connected to a single switch uniformly intersect the terminated digit line, shows that a voltage step on the selected switch couples, via the 32 word lines controlled by the switch, all along the digit line simultaneously. Thus, a step at the switch generates a common-mode noise, the amplitude of which can be predicted from the inter-line and line-to-ground-plane capacitances. The portion of this common-mode signal which the stack converts into the differential mode depends on the balance between the two digit lines of a pair and varies from one line-pair to another.

A pulse transformer for each word line was used for capacitive decoupling between the word selection matrix and the memory stack. The interwinding capacitance of the transformer is a maximum of 7 picofarads, whereas the capacitance between a word line and all the digit lines connected together is about 60 picofarads, resulting in a switching noise attenuation of about 10 to 1. Starting with a 35-volt, 30-nsec rise time step for switch selection, the half-selected word lines experience only a 3-volt step because of the isolation afforded by the transformer. This condition in turn causes a 0.5-volt common-mode spike to exist on the digit line. In the worst case this spike generates a differential noise signal almost as large in amplitude as a sense signal. This noise must be displaced in time from the
sense signal by causing the timing pulse for the switch to start earlier than the timing pulse for the read driver.

As shown in Figure 12, the switch timing pulse is used to select a switch. This technique differs from normal practice, which does without a timing pulse, with the result that at least one switch is turned on all the time. In the present memory, a switch is turned on for a specific length of time to let the read and the write currents go through; otherwise, no switch stays turned on. The switch noise is appreciably reduced by holding the switches off until the memory address register has completely settled from the address transfer transient, as otherwise a spurious selection of switches during the transient would inject additional noise into the stack. By turning off the switch as soon as the write pulse is terminated, the problem of slow switch turn-off is easily eliminated.

Another closely associated problem is the injection of noise via the half-selected word lines
Another advantage in using coupling transformers is the speed increase of the switch operation due to the capacitive isolation afforded by the transformers. Actually, this speed increase and the switch turn-on noise reduction brought about by the coupling transformers are closely related. The capacitive charging and discharging currents that must come from a switch when it is turned on and off are made small by the use of the transformers. Thus, the switch speed is increased. Since the switch noise is caused by the same charging current entering the memory stack, the noise is reduced by the transformers. As shown in Figures 6 and 12, the read and the write pulses have two different timings, depending on the word address. This feature is necessary because the digit line delay of 20 nanoseconds from the driving and sensing point to the termination is not negligible compared with the drive current widths. The problems here are basically that of aligning the read and the strobe pulses and that of aligning the write and the digit pulses. In the present memory, the strobe and the digit pulses are fixed and the read and the write pulses are shifted according to the word address. The 1024 words of the memory are divided into two groups of 512 words. One group is closer to the digit-driving and sensing points while the other group is closer to the terminations as showri in Figure 6. The former group uses Word Pulse Timing A, and the latter group Word Pulse Timing B. Figure 12 shows that the read pulse of Timing A is delayed compared to that of Timing B, and the write pulse of Timing A is advanced compared to that of Timing B. The difference is 10 nanoseconds, which is one-half of the effective digit line delay. DIGIT SYSTEM The digit system (Figure 16) is composed of the circuits that are used to detect and to write or regenerate information in each of the one hundred bits of a selected word. 
The circuits include 100 sense amplifiers, 100 digit drivers, and the 100 flip-flops that form the memory information register.

Digit Driver

During the write time, the digit driver provides a 70-milliampere current pulse into one of the two digit lines in a direction to add to the write pulse in one of the two cores of a memory bit. The digit driver consists of two identical current drivers which are under the dual control of the timing generator and the flip-flop in the memory information register. The width of the digit pulse, 75 nanoseconds, is controlled by the digit timing pulse. The first stage of the digit driver produces a gated 10-volt pulse. The pulse is produced in one of the current drivers by the coincidence of the positive digit timing pulse and a low voltage level from one side of the flip-flop in the memory information register. The second current driver is inhibited by the positive level from the second side of the flip-flop. The gated pulse is applied to the second stage through a capacitor that is used to give the pulse a negative level shift, so that at the input to the second stage the pulse goes positive to -25 volts from a reference level of -35 volts. The second stage is a double emitter follower which is used to provide a voltage drive for the output stage. The output stage is a nonsaturating current driver whose output current is determined by the resistance in the emitter circuit and the voltage swing at the base of the transistor.
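The dual control of the two current drivers can be sketched behaviorally. The 70-milliampere amplitude and the one-of-two steering are from the text; the line names and which line corresponds to a ONE are assumptions for illustration.

```python
# Behavioral sketch of digit-driver steering: only one of the two identical
# current drivers conducts, selected by the memory-information-register
# flip-flop, and only while the digit timing pulse is present.
def digit_driver(timing_pulse, flip_flop_one):
    """Return the digit current (mA) steered onto each line of the pair.

    flip_flop_one models the MIR flip-flop state; mapping ONE to line A
    is an assumption.
    """
    if not timing_pulse:                      # no digit timing pulse: off
        return {"line_A_mA": 0, "line_B_mA": 0}
    if flip_flop_one:                         # low level gates this driver
        return {"line_A_mA": 70, "line_B_mA": 0}
    return {"line_A_mA": 0, "line_B_mA": 70}  # other driver inhibited
```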
The output stage drives the center of the digit line through a 100-ohm cable and a series diode, providing a 70-milliampere pulse into each half of the digit line. The diode is used at the memory stack to isolate the digit driver cable from the digit line when the driver is not in use, so that low-level signals in the memory are not loaded by the cable.

Figure 16. Digit system.

Sense Amplifier 13, 14, 15, 16

The digit lines are terminated at both ends in order to reduce the recovery time of the memory stack. As a result, only half of the difference signal from the two cores of a bit is available at the sense amplifier input. The difference signal is bipolar; one polarity represents a ONE and the other polarity represents a ZERO. The sense amplifier amplifies the difference signal and is strobed during a portion of the read time. The polarity of the sense signal at strobe time is sensed, and if a ONE is detected, the sense amplifier produces a 3-volt negative-going output pulse which sets a flip-flop in the memory information register. If a ZERO is sensed, no change occurs at the sense amplifier output. During the write time, a negative digit pulse of approximately 20 volts is applied to one of the two digit lines, depending on whether a ONE or a ZERO is being written into the memory.
The sense amplifier is inhibited during the write time by the strobe circuit and recovers in less than 50 nanoseconds after the last difference-mode reflections from the digit pulse have ceased to exist on the digit lines. The first stage of the sense amplifier consists of two emitter followers which are connected to the center of the digit lines as shown in Figure 16 and are used to provide a high input impedance, so that the sense amplifier does not load the digit lines and does not interfere with the termination of the lines. The emitter followers and series diodes are physically mounted near the center of the memory stack and are connected to the plug-in board that contains the regeneration loop circuits by means of a 125-ohm shielded twisted-pair cable. The twisted-pair cable is terminated at the input to the second stage with resistors connected to a decoupled power supply. When the negative digit pulse is applied to one of the digit lines, the corresponding emitter follower is turned off and the transient at the input to the second stage is limited to 300 millivolts, since the currents in the cable termination are reduced to zero. The diodes are used in series with the emitter-follower outputs in order to prevent the flow of current if the base-to-emitter breakdown voltage is exceeded by the digit pulse. The second stage of the sense amplifier is a differential amplifier, with the transistor collectors connected together through a delay line. This stage amplifies the difference between the input signals and sums the inverted amplified difference signal and the delayed amplified difference signal. This produces output voltage waveforms at the collectors that do not have a dc level shift with repetition rate variations. The output waveforms are on a well-determined reference voltage level which is determined by the current in the large resistance in the emitter circuit.
Practically all of the emitter current reaches the transistor collectors and produces a constant operating voltage across the parallel combination of the collector resistors. The operating voltage is constant even if the current is not shared equally by the two transistors, since the delay line provides a dc short circuit between the collectors. The collector resistors are used to terminate the delay line so that there will be no reflections and the outputs of the second stage will recover to the reference level in a minimum of time after the end of the digit transient. The delay of the delay line is long enough that a usable amount of the inverted amplified sense signal is passed before the output is reduced by the delayed amplified sense signal. The delay in this system is 25 nanoseconds, which is approximately one-half the base width of a sense signal. The third stage of the sense amplifier is an ac-coupled differential amplifier. One output is used as a test point for observing amplified sense signals, and the other output drives the next stage. The third stage has a maximum output current swing that is limited by the current in the emitter current sources. This prevents the digit transient from overpowering the inhibit current in the strobe circuit. The last stage is the strobe and pulse-stretching circuit. This stage contains a bistably biased five-milliampere germanium tunnel diode which drives an output transistor. The tunnel diode has two inputs. One input is from the third stage, which provides the amplified sense signal and also provides the normal bias current for the tunnel diode. The second input is from the strobe circuit, which during the inhibit time provides sufficient reverse current through the tunnel diode to keep it in the low-voltage state during the digit transient. The operation of this stage is illustrated in Figure 17, which shows the tunnel diode volt-ampere characteristic and its load line.
The tunnel diode is normally biased in the low-voltage state at point A and is unable to switch to the high-voltage state during the digit transient because of the current-limiting action of the third stage. During a portion of the read time the sense amplifier is strobed by removing the inhibit current, thereby biasing the tunnel diode in the low-voltage state near the knee at point B. A difference signal of five millivolts at the input of the sense amplifier, of the polarity of a ONE signal, is sufficient to trigger the tunnel diode to point C in the high-voltage state. The tunnel diode turns on the output transistor, which produces a three-volt negative-going pulse used to set a flip-flop in the memory information register. The tunnel diode remains in the high-voltage state until the inhibit current is applied by the strobe circuit. The inhibit current resets the tunnel diode to point A and terminates the output pulse.

Figure 17. Tunnel diode characteristic and load line.

Figure 18 (d) shows the strobe pulse, which is positive only during the first peak of the amplified sense signal shown in Figure 18 (c). The operation of the sense amplifier is illustrated by the waveforms in Figure 18. Figure 18 (a) shows, superimposed, the read-out signals and digit transients on the two lines of a digit line-pair. The stored information is represented in the difference between the two signals that appear on the lines at read time. Figure 18 (e) shows the sense amplifier output. The negative-going pulse indicates the detection of a ONE. Figure 18 (b) shows the signals that appear at the sense-amplifier test point when the delay line is removed from the circuit. The solid line shows reading and regenerating a ONE and the dotted line shows reading and regenerating a ZERO. The amplifier has amplified the difference in the read-out signals and has limited the digit transient.
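The strobe-and-latch behavior described above can be sketched as a simple state model. The 5-millivolt trigger sensitivity is from the text; the discrete-step model and the method names are illustrative simplifications.

```python
# Behavioral sketch of the tunnel-diode strobe stage: with inhibit current
# applied the diode stays in the low-voltage state (point A); during the
# strobe window (point B) a ONE-polarity signal of >= 5 mV switches it to
# the high-voltage state (point C), where it latches until inhibit returns.
THRESHOLD_MV = 5   # trigger sensitivity quoted in the text

class StrobeStage:
    def __init__(self):
        self.high_state = False              # tunnel diode at point A

    def step(self, signal_mv, strobed):
        if not strobed:                      # inhibit current: reset to A
            self.high_state = False
        elif signal_mv >= THRESHOLD_MV:      # ONE polarity at/above threshold
            self.high_state = True           # latches at point C
        # the output transistor produces the negative-going pulse while latched
        return self.high_state

s = StrobeStage()
s.step(20, strobed=False)   # digit transient during inhibit: no output
s.step(6, strobed=True)     # ONE signal during strobe window: latches
print(s.high_state)         # True, until inhibit is reapplied
```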
It can be observed that these waveforms have a dc component, which would result in a dc shift in an ac amplifier. This shift would be particularly objectionable at high repetition rates. In addition, the waveform base line is dependent on the dc balances of the previous stages. Figure 18 (c) shows the test point signals with the delay line in the circuit. The waveforms represent the sum of the inverted amplified difference signal and the delayed amplified difference signal. These waveforms have no dc component other than the base-line voltage, which is well determined.

Figure 18. Sense amplifier operation.

PACKAGING

The circuitry of the memory, except for those parts that had to be near the memory stack for special reasons, is packaged in four nests surrounding the memory stack, as seen in the photograph in Figure 19. Each nest has 10 removable motherboards, each of which can contain up to 56 small plug-in modules. Individual modules contain such parts as logic blocks, portions of drivers, portions of the sense amplifiers, etc. When a nest is completely assembled, all of the circuitry within it is interconnected by 70-ohm-impedance printed strip lines on both sides of the motherboards and perpendicular grandmother boards. Interconnections between nests are made by coaxial cables. Some of the memory circuitry which did not lend itself to modular packaging because of power dissipation or size considerations, such as driver output stages, was packaged on specially built motherboards by removing some or all of the provisions for pluggable modules. All logic level interconnections are made via 70-ohm cables. Read and write driver outputs are transmitted to the bipolar diode matrix at the stack via 70-ohm cables. To obtain a lower impedance, the output stages of the switch channels are located at the memory stack. These stages are connected to the rest of the switch channels via 50-ohm cables. The digit driver outputs are transmitted to the stack via 100-ohm cables, and the sense amplifier first stages are connected to the rest of the sense amplifier by twisted-pair balanced cables having a common ground sheath and a differential impedance of 125 ohms.

Figure 19. Memory system.

TEST RESULTS

The 1024-word two-core-per-bit memory was built with a complete word system and a full digit system of 100 bits. A special memory exerciser was also built to test the memory system thoroughly. Figure 20 shows a switch voltage, read and write currents, and a digit current. The switch waveform was observed at the center tap of a word line transformer primary winding (see Figure 13). The undulations on the plateau were caused by the flow of read and write currents through the switch circuit. These undulations would have been much larger if the switches had not been mounted on the memory stack. Read and write pulses of both Timing A and Timing B are shown in the figure. Note that the read and the write pulses of Timing A are close together, while those of Timing B are slightly separated. The digit pulse was observed at the end of a digit line. As shown in Figure 21 (a), T termination networks are used to terminate the digit lines. The termination impedances are Zd = 137 ohms and Zs = 133 ohms. For practicality, the same termination networks are used for all the 100 digit line-pairs, although there is some indication that the optimum value changes from bit to bit, not so much for Zd but to some extent for Zs. The calculated value of the common-mode termination is Zc = 403 ohms.
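The quoted common-mode value is consistent with the T-network geometry, assuming the usual arrangement in which each line of the pair connects through the series arm Zd to a common junction that is returned to ground through the shared shunt arm Zs:

```python
# Under common-mode excitation the equal currents of the two lines add in
# the shared shunt arm, so each line sees Zd plus twice Zs; under
# differential excitation the junction is a virtual ground and each line
# sees Zd alone.  (The arrangement of the T network is an assumption.)
Zd = 137.0   # series arm per line, ohms
Zs = 133.0   # shared shunt arm, ohms

Zc = Zd + 2 * Zs   # common-mode termination seen by each line
print(Zc)          # 403.0, matching the calculated value in the text
```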
The word lines crossing the digit lines are responsible for the large difference between the differential-mode and the common-mode termination impedances. Figure 21 (b) shows the voltage waveforms at the three points on a termination network indicated in Figure 21 (a). It is seen that the voltage waveform at the end of the undriven line B and that at the junction C in the T termination are the same. This is important because it means that no current flows out of the undriven digit line, which is a basic requirement for memory operation as shown in Figure 2. To make the net current propagation on the undriven line zero, there must be a voltage propagation on it. (Here, the same velocity is assumed for all the propagation modes that exist in the memory operation.)

Figure 20. Switch voltage, read and write pulses ("Timing A" and "Timing B"), and digit pulse.

Figure 21. Voltage waveforms at a termination network: (a) T termination network (Zd = 137 ohms and Zs = 133 ohms), (b) voltage waveforms.

It is to be noted that the condition shown in Figure 21 is realized only when all the 100 digit drivers are operating. Figure 22 shows waveforms at a sense amplifier test point, together with related waveforms. Figure 22 (a) shows two bits, one at the edge of the memory stack and the other at the center, regenerating ONES and ZEROS alternately over the entire memory of 1024 words. Note that the sense signals are delayed to avoid the switch noise. The switch noise, although comparable in amplitude to the sense signal, could have been made as small as it is only by the use of coupling transformers.
The negative-going sense signal represents a ONE and the positive-going sense signal a ZERO. The digit transient takes about 350 nanoseconds to die down, measured from the start of the digit pulse. This time includes approximately 300 nanoseconds attributed to the base width of the digit pulse and the stack recovery time, plus 50 nanoseconds attributed to the sense amplifier. This relatively slow recovery of the stack, even with the elaborate T termination, seems due to the imperfection of the digit lines as transmission lines. Figure 22 (b) shows regeneration of ONES on all the 1024 words. It is seen that the information is available at the memory register in about 230 nanoseconds from the beginning of the read command pulse. Figure 22 (c) shows a higher repetition rate operation of about 450-nanosecond cycle time. It shows regeneration of ONES and ZEROS on alternate words over the entire memory. Here, the switch noise and the digit transient recovery are made concurrent without affecting the sense signals.

Figure 22. Waveforms at sense amplifier test points: (a) regeneration of ones and zeros, (b) regeneration of ones, (c) regeneration of ones and zeros at 450-nsec cycle time.

The waveforms shown above represent only a small portion of the tests performed on the memory with the aid of the memory exerciser. These tests confirmed the soundness of the design philosophy, the effectiveness of the problem-solving approach, and the practicality and reliability of the memory system actually built.

CONCLUSION

Development of the present memory system evolved the method of control of wave propagation. Unless wave propagation is controlled, it is almost impossible to operate a high-speed memory. The basic requirements for control are:

1. Use of two neighboring digit lines as a pair for one-bit location
2. Equalization of coupling between the digit lines
3. Use of differential sense amplifiers
4. Termination of digit lines on both ends for all the existing wave propagations, with particular emphasis on the differential-mode termination.

The last requirement is met by the present memory due to the particular digit drive scheme used. It requires careful study to choose a digit drive scheme, as otherwise a simultaneous termination for all the possible propagations becomes a very complex problem with no practical answer. Although not applicable to the present memory, it is preferred that only the differential-mode propagations exist. This may be accomplished by the proper selection of memory cell types and digit drive schemes, and will simplify the propagation problem greatly. It should be emphasized that the approach used in this paper in treating wave propagations in a memory stack is applicable to any word-organized memory. The use of transformers to couple read and write pulses to individual word lines proved very successful in alleviating the noise problem associated with the word selection matrix. There is still a possibility of reducing the noise further by reducing the transformer interwinding capacity, which will increase the system reliability and at the same time enable a faster access time. The use of a delay line in a differential sense amplifier minimized the problems of dc imbalances and level shift when sensing small signals in an environment of large digit pulses.
In addition, the use of a tunnel diode strobe circuit provided low-level thresholding and high-speed operation. There seems to be no basic difficulty in building a memory with twice as many words, using the basic design described here. It is expected, however, that the cycle time will be slightly longer, since the digit lines are twice as long.

APPENDIX

WAVE PROPAGATION ON MULTIPLE PARALLEL LINES WITH EQUALIZED COUPLING

The analysis given below shows the modes of wave propagation that can exist on a set of multiple parallel lines with equalized coupling. The following assumptions are made:

1. Propagation is in the direction of the digit lines only. This implies that the word lines are considered as contributing only to the coupling among digit lines.
2. Digit lines are distributed-constant lines. This is justified because, for the frequencies of interest, it is not necessary to consider the line irregularities caused by memory cells.
3. Digit lines are uniform and have no discontinuities. This assumption may not hold precisely in practice, but is made to permit a mathematical analysis.
4. A ground plane is present. This assumption is also made to permit a mathematical analysis.

A similar problem of multiple-line wave propagation was studied a long time ago.17 The solution given here is more general than the one given in the reference and more readily applicable to memories.

Let the number of lines be 2n, where n is an arbitrary integer. From these, we form n pairs. Pair i consists of line i and line -i, where i = 1, 2, ..., n. This is shown in Figure A-1. Pair-to-pair coupling is equalized, which implies that, when we consider pair i and pair j (i ≠ j, and i, j = 1, 2, ..., n), the coupling is the same between line i and line j, line i and line -j, line -i and line j, and line -i and line -j. To be general, the coupling is made a function of i and j. The case in which the coupling is constant regardless of i and j has been treated elsewhere.

Figure A-1. Cross section of a system of 2n parallel lines.

Using matrix notation, the pertinent differential equations are

-∂[V]/∂x = [Z][I]   (A-1)
-∂[I]/∂x = [Y][V]   (A-2)

where [V] is the column vector (V_1, V_{-1}, V_2, V_{-2}, ..., V_n, V_{-n}) and [I] is the column vector (I_1, I_{-1}, I_2, I_{-2}, ..., I_n, I_{-n}), with

V_i = voltage on line i, V_{-i} = voltage on line -i,
I_i = current on line i, I_{-i} = current on line -i, for i = 1, 2, ..., n.

The series impedance matrix [Z] is built from 2 × 2 blocks: the diagonal block for pair i is [[Z_si, Z_mi], [Z_mi, Z_si]], and, because the coupling is equalized, every element of the off-diagonal block coupling pair i to pair j is Z_ij. Here

Z_si = self series impedance per unit length of line i or line -i,
Z_mi = mutual series impedance per unit length between line i and line -i,
Z_ij = Z_ji = mutual series impedance per unit length between either line of pair i and either line of pair j (i ≠ j; equalized),

for i, j = 1, 2, ..., n. The parallel admittance matrix [Y] has the same structure, with elements Y_si, Y_mi, and Y_ij = Y_ji defined analogously.

From Equations (A-1) and (A-2),

∂²[V]/∂x² = [Z][Y][V] = [μ][V]   (A-3)

where [μ] = [Z][Y] has the same block structure, with elements

μ_si = Z_si Y_si + Z_mi Y_mi + 2 Σ_{k≠i} Z_ik Y_ki
μ_mi = Z_si Y_mi + Z_mi Y_si + 2 Σ_{k≠i} Z_ik Y_ki
μ_ij = Z_ij (Y_sj + Y_mj) + Y_ij (Z_si + Z_mi) + 2 Σ_{k≠i,j} Z_ik Y_kj

for i, j = 1, 2, ..., n. In general, μ_ij ≠ μ_ji for i ≠ j.

Assume a solution for Equation (A-3) of the form V_i = Σ_k V_ik e^{-γ_k x}. In order that the solution not be trivial (i.e., all V_ik = 0), the determinant of [μ] - γ²[1] must vanish. Because of the block structure, the determinant factors as

(μ_s1 - μ_m1 - γ²)(μ_s2 - μ_m2 - γ²) ··· (μ_sn - μ_mn - γ²) × D(γ²) = 0

where D(γ²) is the n-th order determinant whose diagonal elements are μ_sk + μ_mk - γ² and whose off-diagonal elements are 2μ_jk. This is an equation of degree 2n in γ². Let the roots be

γ_k² = μ_sk - μ_mk, k = 1, 2, ..., n
γ_k² = a root of D(γ²) = 0, k = n+1, n+2, ..., 2n.

Consider only the forward propagation, because the backward propagation is the same except for direction; then γ_k is the positive square root in each case. The solution of Equation (A-3) is

V_i = Σ_{k=1}^{2n} V_ik e^{-γ_k x}, V_{-i} = Σ_{k=1}^{2n} V_{-ik} e^{-γ_k x}, i = 1, 2, ..., n.   (A-4)

Substituting Equation (A-4) into Equation (A-3) to find the relationships among V_ik and V_{-ik},

Σ_{j≠i} μ_ij (V_jk + V_{-jk}) + (μ_si - γ_k²) V_ik + μ_mi V_{-ik} = 0   (A-5)
Σ_{j≠i} μ_ij (V_jk + V_{-jk}) + μ_mi V_ik + (μ_si - γ_k²) V_{-ik} = 0   (A-6)

where i = 1, 2, ..., n and k = 1, 2, ..., 2n.* By subtracting Equation (A-6) from (A-5),

(μ_si - μ_mi - γ_k²)(V_ik - V_{-ik}) = 0.

If k ≠ i, then μ_si - μ_mi - γ_k² ≠ 0; therefore

V_ik = V_{-ik}, k ≠ i, k = 1, 2, ..., 2n.   (A-7)

Then Equation (A-5) can be rewritten as

Σ_{j≠i,k} 2μ_ij V_jk + (μ_si + μ_mi - γ_k²) V_ik + μ_ik (V_kk + V_{-kk}) = 0   (A-8)

where k ≠ i and i, k = 1, 2, ..., n. If k = i, then μ_sk - γ_k² = μ_mk (k = 1, 2, ..., n), and from Equation (A-5)

Σ_{j≠k} 2μ_kj V_jk + μ_mk (V_kk + V_{-kk}) = 0, k = 1, 2, ..., n.   (A-9)

For a given k, Equations (A-8) and (A-9) together form n simultaneous equations in V_jk (j ≠ k; j = 1, 2, ..., n) and (V_kk + V_{-kk}): Equation (A-9) gives one equation and Equation (A-8) gives (n - 1) equations, because i can take (n - 1) different values. Therefore

V_jk = 0, j ≠ k, and V_kk + V_{-kk} = 0, j, k = 1, 2, ..., n.

These are combined with Equation (A-7) to obtain

V_ik = -V_{-ik}, i = k = 1, 2, ..., n
V_ik = V_{-ik} = 0, i ≠ k and i, k = 1, 2, ..., n
V_ik = V_{-ik}, i = 1, 2, ..., n and k = n+1, n+2, ..., 2n.   (A-10)

Now Equation (A-4) becomes

V_i = V_ii e^{-γ_i x} + Σ_{k=n+1}^{2n} V_ik e^{-γ_k x}   (A-11)
V_{-i} = -V_ii e^{-γ_i x} + Σ_{k=n+1}^{2n} V_ik e^{-γ_k x}   (A-12)

for i = 1, 2, ..., n. Using Equation (A-1), the corresponding currents follow, the differential-mode components being of the form (V_ii / Z_0i) e^{-γ_i x}, where

Z_0i = √[(Z_si - Z_mi) / (Y_si - Y_mi)], i = 1, 2, ..., n.

* Here it is assumed that the γ_k² are all single roots. Inclusion of multiple roots, however, does not change the form of Equations (A-11) and (A-12), because terms of the form x^p e^{-γx}, where p is a non-zero integer, cannot appear in the solution.

Equations (A-11) and (A-12) show that the possible modes of propagation are:

1) An independent differential mode for each line-pair
2) Common modes in which the two lines of a pair have identical wave propagation.

ACKNOWLEDGMENT

The authors wish to acknowledge the assistance given on this project by the members of the Computer Advanced Product Research Group. Credit is due to them for designing the two-level logic modules, developing the new packaging techniques, and designing and constructing the memory exerciser. Special credit is due to H. C. Nichols for his invaluable contribution in the fabrication of the memory system.
Additional acknowledgment is due to the Memory Products Department, RCA Electronic Components and Devices, for the construction of the memory stack.

REFERENCES

1. V. L. NEWHOUSE, "The Utilization of Domain Wall Viscosity in Data Handling Devices," Proc. IRE, vol. 45, no. 11, pp. 1484-1492, November 1957.
2. W. S. KOSONOCKY, "Memory System," U.S. Patent 3,042,905, filed December 11, 1956, issued July 3, 1962.
3. J. A. RAJCHMAN, "Ferrite Aperture Plate for Random Access Memory," Proc. IRE, vol. 45, no. 3, pp. 325-334, March 1957.
4. M. M. KAUFMAN and V. L. NEWHOUSE, "Operating Range of a Memory Using Two Ferrite Plate Apertures per Bit," Journal of Applied Physics, vol. 29, no. 3, pp. 487-488, March 1958.
5. R. E. McMAHON, "Impulse Switching of Ferrites," Solid State Circuit Conference Digest, pp. 16-17, February 1959.
6. R. H. TANCRELL and R. E. McMAHON, "Studies in Partial Switching of Ferrite Cores," Journal of Applied Physics, vol. 31, no. 5, pp. 762-771, May 1960.
7. R. H. JAMES, W. M. OVERN, and C. W. LUNDBERG, "Flux Distribution in Ferrite Cores under Various Modes of Partial Switching," Journal of Applied Physics, supplement to vol. 32, no. 3, pp. 385-395, March 1961.
8. C. J. QUARTLY, "A High Speed Ferrite Storage System," Electronic Engineering, vol. 31, no. 12, pp. 756-758, December 1959.
9. H. AMEMIYA, H. P. LEMAIRE, R. L. PRYOR, and T. R. MAYHEW, "High-Speed Ferrite Memories," AFIPS Conference Proceedings, vol. 22, pp. 184-196, Fall 1962.
10. W. H. RHODES, L. A. RUSSEL, F. E. SAKALAY, and R. M. WHALEN, "A 0.7-Microsecond Ferrite Core Memory," IBM Journal, vol. 5, no. 3, pp. 174-182, July 1961.
11. G. F. BLAND, "Directional Coupling and Its Use for Memory Noise Reduction," IBM Journal of Research and Development, vol. 7, no. 3, pp. 252-256, July 1963.
12. W. T. WEEKS, "Computer Simulation of the Electrical Properties of Memory Arrays," IEEE Transactions on Electronic Computers, vol. EC-12, no. 5, pp. 874-887, December 1963.
13. G. H. GOLDSTICK and E. F. KLEIN, "Design of Memory Sense Amplifiers," IRE Transactions on Electronic Computers, vol. EC-11, pp. 236-253, April 1962.
14. T. R. MAYHEW, "The Design of a Sense Amplifier for a Thin Film Memory," Master's Thesis, University of Pennsylvania, June 1962.
15. R. T. LURVEY and D. F. JOSEPH, "RCA N7100 Microferrite Array," Application Note SMA-9, RCA Semiconductor and Materials Division, Somerville, N.J., August 1962.
16. B. A. KAUFMAN and J. S. HAMMOND, III, "A High Speed Direct-Coupled Magnetic Memory Sense Amplifier Employing Tunnel-Diode Discriminators," IEEE Transactions on Electronic Computers, vol. EC-12, pp. 282-295, June 1963.
17. J. R. CARSON and R. S. HOYT, "Propagation of Periodic Currents over a System of Parallel Wires," Bell System Tech. Journal, vol. 6, no. 3, pp. 495-545, July 1927.

AN ASSOCIATIVE PROCESSOR

Richard G. Ewing and Paul M. Davies
Abacus Incorporated, Santa Monica, California

1. INTRODUCTION

This paper describes the computer system designed under an Air Force sponsored study program to develop a non-cryogenic Associative Processor organization and to study its possible use in a variety of aerospace applications. Two approaches to this problem were considered: one in which an associative memory would be added to a more or less conventional computer, and another in which a new organization would be developed around the principle of memory-distributed logic. The latter approach was chosen because it appears to result in a more efficient form of parallel processor.

When the fundamental limits of electrical and optical signal propagation speeds are reached, there are just two ways to further reduce the time to perform a given computation. One of these is by making things smaller, and the other is by performing parallel processing. But efforts to achieve efficient parallel processors have encountered several difficulties. First is the problem of providing sufficient memory and computing capability within a simple module. Some parallel processors, such as the Holland 1 machine, have employed relatively simple modules, but the memory capacity and computing capability of each module were limited. Others, such as the Solomon Computer 2, provide greater memory capacity and computing capability in the module, but each module approaches the complexity of a small computer.

Another serious problem is that of communication. For a periodic computing structure to be useful, it is essential that there be efficient paths for the communication of control signals and operands among the modules. In some parallel processors, the communication networks are more complex than the processing modules themselves. Because of the nature of the intended use of the processor, emphasis was placed on network simplicity, on reduction of size and power, and especially on reliability. While the processor organization was designed in terms of a particular mechanization (wire memory and integrated circuitry), the organization and algorithms are described here in general terms, and questions of mechanization are postponed to a final section.

The associative memory suggests itself as a basis for another approach to the problem of parallel processing. Logical operations are performed within the individual memory cells of this memory, and communication within the structure is particularly efficient. Extension of these principles to permit full logical and arithmetic capability within each memory cell would provide a high degree of processing parallelism. We shall call an associative memory structure and its control logic, which is capable of performing such distributed computation, an Associative Processor.

In addition to the parallel computing capability, there are several other advantages which one may expect to achieve in the Associative Processor. These are: 1.
The data storage and retrieval capabilities of the Associative Memory, which greatly simplify or eliminate such common data manipulations as sorting, col- 147 148 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964 lating, searching, matching, cross referencing, updating and list processing; 2. Programming simplifications based upon the possibility of ignoring the placement of data in memory and the extensive use of content addressing and ordered retrieval; 3. The periodic structure of a large portion of the processor. Periodicity of structure lends itself to integrated circuit techInterniques and batch fabrication. connections between components become shorter and less tangled, reducing propogation delays and simplifying layout and checkout. Since the structure is periodic, it can easily be expanded in size; 4. Fault Tolerance. The periodic structure may permit an organization which is tolerant of memory or circuit element failures. If a cell fails, it may be possible to avoid its further use with little loss to the system capability. A program for an associative structure makes little or no reference to a unique cell so that loss of a cell "vould not confuse the program. Two approaches have been taken in the past to solve the problems of parallel processing by using associative processing techniques. Rosin 3 and Fuller 4, 5 have considered an associative memory under control of a general purpose computer. In Fuller's work, algorithms for a variety of arithmetic operations are built up as sequences of elementary operations performed by the rather limited word logic of the associative memory. In Davies 6, more extensive word logic is provided, and the control is integrated into the associative processor. The present paper represents an attempt to achieve the higher speed of the second approach with a considerably simpler logical structure, which could be mechanized from non-cryogenic components. 
The design which was adopted provides a random access memory for program storage and a bit-serial associative memory for data storage and parallel processing. The ability to write tags (i.e., to simultaneously write data in a selected bit position of a number of selected words), coupled with simplified word logic networks, permits relatively efficient bit-serial algorithms for many kinds of parallel searches, parallel arithmetic and ordered retrieval. Methods were developed for treating certain classes of memory and circuit failures. For these cases, the processor can continue to operate in spite of a failure with only slight impairment of the overall system capability. In the area of communication, methods were developed for treating operand pairs in a variety of relative locations.

2. MEMORY DISTRIBUTED LOGIC

One of the fundamental features of the associative memory is that logical operations are performed within the memory cells. However, even in the random access memory a limited amount of logic is performed in the memory cell. The boolean function Xi·Yj·Sij is performed, where Xi is the selected X address coordinate, Yj the selected Y address coordinate, and Sij the bit stored at location ij. The value of the function is read out on the sense line. In an associative memory, the memory logic is extended to permit selection of a memory cell on the basis of stored data. In some associative memories, this is accomplished by the function

(S1≡R1)·(S2≡R2)· ... ·(Sn≡Rn)

which is mechanized in each memory word cell. "Si≡Ri", the equivalence function, is the same as Si·Ri + S̄i·R̄i. Si is the bit stored in the i-th bit position of a typical word, while Ri is the corresponding bit of a reference word stored in an external register. The function selects all words whose stored contents match the reference word.
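The word-parallel match can be sketched in conventional code (a hypothetical software model; the function name and bit-list memory layout are ours, not part of the original design):

```python
# Illustrative model: every word cell evaluates (S1=R1)(S2=R2)...(Sn=Rn)
# against the reference word simultaneously; the result is one match
# flag per word rather than a single addressed readout.
def match_words(memory, reference):
    return [all(s == r for s, r in zip(word, reference)) for word in memory]

memory = [[1, 0, 1, 1],
          [1, 0, 1, 0],
          [1, 0, 1, 1]]
print(match_words(memory, [1, 0, 1, 1]))  # [True, False, True]
```

The loop here only stands in for the parallelism; in the hardware, every word cell evaluates its match in the same interrogation cycle.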
This can be improved to permit masking of selected bits as follows:

[(S1≡R1) + M1]·[(S2≡R2) + M2]· ... ·[(Sn≡Rn) + Mn]

where Mi indicates whether the i-th bit is to be ignored in the comparison. In addition to providing logic in each memory bit position, it is also profitable to have logic associated with each word cell. This is the case in certain word-organized random access memories and in associative memories. In the first case, there is the word driver, which may be a magnetic or semiconductor amplifier which responds to X and Y coordinate selection lines just as the typical bit cell does in a coincident current memory. In associative memories, there is usually a match detector with each word which responds to the match logic described above. Ordinarily, the match detector has memory.

These operations at both the bit level and the word level suggest the possibility of providing sufficient distributed logic to permit parallel computation throughout the memory structure. In arriving at an Associative Processor capable of such parallel computation, a number of important decisions must be made. One basic choice is whether to use a separate random access memory or the associative memory itself for program storage. The first choice is probably more practical since the random access memory is less expensive; furthermore, it will be easier to protect the associative portion from fault if the program is kept separate. A second choice to be made is between bit-parallel and bit-serial operation. Certain associative operations, such as the matching of fields for equality, can be performed in bit parallel. On the other hand, to perform the more complex functions of arithmetic, it appears more convenient to use the bit-serial approach, simplifying the bit cell by time-sharing one logic module among all bits of a word. A third problem is that of communication. To perform parallel computation, one must have access to the operands and operand pairs.
In some cases, the operand pairs are stored together in the same word. In other cases, they are in adjacent words, while in still others, they are in non-adjacent words, but always some fixed number of words apart. Another common requirement involves operand pairs in which the first operand of each pair is common while the second operands are distinct. In this case, the common operand, in an external register, must be communicated to the others, stored in various memory cells. Still another communication problem is based upon the fact that while large portions of a problem may be susceptible to parallel processing, other parts may be essentially sequential. These also must be performed efficiently by the Associative Processor if they are not to offset the advantages gained in the parallel processing.

Techniques for solving these communication problems include the following:

1. Transmission of a common operand to all memory word cells.

2. Flexible control of field selection to permit operation on pairs of operands in the same words.

3. Use of shift registers for communication between words. These can be uni-directional or bi-directional and can be extended to two or more dimensions to give greater flexibility.

4. Forms of entry-exit ladder networks which permit rapid communication between non-adjacent word cells.

The following sections will describe the Associative Processor, which is based upon specific choices of these options.

3. ORGANIZATION

A block diagram of the Associative Processor is shown in Figure 1. It contains both a conventional random access memory (RAM) and an associative memory. The RAM provides storage for instructions and constants; it is accessed parallel by bit and serial by word. In processing operations, the Associative Memory is accessed parallel by word and serial by bit. In the organization under consideration, the RAM contains 4000 twenty-four bit words, and the Associative Memory contains 500 ninety-six bit words.
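The stated organization can be summarized in a small software model (hypothetical; the class and attribute names are ours, and only the sizes come from the text):

```python
# Hypothetical model of the organization: a 4000-word by 24-bit RAM for
# instructions and constants, a 500-word by 96-bit Associative Memory,
# the D-Register, and the bit-column counters with their limit registers.
class AssociativeProcessor:
    def __init__(self):
        self.ram = [[0] * 24 for _ in range(4000)]  # parallel by bit, serial by word
        self.am = [[0] * 96 for _ in range(500)]    # parallel by word, serial by bit
        self.d_register = [0] * 24                  # operand / search argument
        self.a_counter = self.b_counter = self.c_counter = 0
        self.a_limit = self.b_limit = 0             # define field boundaries

ap = AssociativeProcessor()
print(len(ap.ram), len(ap.ram[0]), len(ap.am), len(ap.am[0]))  # 4000 24 500 96
```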
Instructions accessed from RAM are transferred to the Instruction Register, where they are held during execution.

Figure 1. Block Diagram of Associative Processor.

The D-Register, which has the same length as a RAM word, serves as temporary storage for operands which participate in associative operations. For instance, the D-Register may hold the argument of a search, may receive data being retrieved from the Associative Memory, or may communicate with the external world. Data originating from outside of the Associative Processor can be transferred directly to either the Associative Memory or RAM. Direct input to the memories is under an automatic interrupt control.

In the Associative Memory, only one bit column at a time may be operated upon. The particular bit column is selected by either the A Counter, the B Counter, or the C Counter. Associated with the A and B Counters are the A and B Limit Registers. Each may contain a value which serves to define a maximum or minimum value of its companion counter. Together, each counter and limit register define a field which can be any length up to the number of bits in the Associative Memory word, and may overlap the field defined by the other counter and limit register.

The design of the Associative Processor is sufficiently general to permit implementation by a variety of memory elements and logic techniques. Therefore, the following description of the Associative Memory, shown in Figure 2, will present those characteristics which are essential to the design of the Associative Processor. Storage for one bit is provided at each intersection of a word and a bit line. A pulse on a bit line causes a signal to be emitted by each bit on that line. The signals are transmitted through the word lines to the sense amplifiers.

Figure 2. Associative Memory.
The equivalence function is obtained in one of two ways, depending upon the particular memory element. In some memories it is sufficient to exercise control over the polarity of the interrogating pulse, thereby achieving a signal output for a match and no output for a mismatch. In these cases, the bit element itself performs the equivalence function, S≡R. In other memories, the stored bit is merely read out; the reference bit is transmitted to all sense amplifiers, and logic associated with each sense amplifier generates the equivalence function.

Writing at a particular bit location is accomplished by passing a current through the intersecting bit and word lines. The polarity of the current in the word line, or in some cases the word and bit lines, determines the state of the written bit. By energizing all the word drivers and one bit driver, one bit of each word can be written into. The latter operation, which is sometimes referred to as "tagging," plays a significant role in the design of the Associative Processor.

The logic associated with each word gives great power to the Associative Processor. This logic is identical for all words and consists of a sense amplifier, storage flip-flop, write amplifier, and control logic. Refer to Figure 3. The sense amplifier is bistable and remembers the match state from one interrogation to the next. The output of the sense amplifier determines the state of the storage flip-flop in various ways as determined by the control signals Es, Er, and Ec. In addition, the contents of each storage flip-flop can be shifted to the storage flip-flop in the word above under control of the signal Esh. This provides communication between words. One of the functions of the storage flip-flops is to control writing. In this operation, the storage flip-flops in the "1" state select the words that are to be written into, while the signals W1 and W0 determine whether "1's" or "0's" are written by the selected write amplifiers.
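Tagging, as just described, can be sketched as follows (an illustrative model; the function name and list representation are assumptions):

```python
# Sketch of "tagging": all word drivers and a single bit driver are
# energized, so one bit column is written in one cycle, and only words
# whose storage flip-flop holds "1" are actually changed.
def write_tag(memory, storage_ff, column, bit):
    for word, selected in zip(memory, storage_ff):
        if selected:
            word[column] = bit

memory = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
write_tag(memory, [True, False, True], column=2, bit=1)
print(memory)  # [[0, 0, 1], [0, 0, 0], [0, 0, 1]]
```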
In addition, the output of each storage flip-flop is ANDed with the output of the corresponding sense amplifier. The outputs of these AND gates are ORed together to provide an output channel from the Associative Memory to the remainder of the Processor.

Control of the word logic networks is exercised through a Control Unit. This unit interprets the contents of the Instruction Register and D flip-flop to determine the control signals Es, Er, Ec, Esh, W1, and W0, which are transmitted to all word logic networks. The D flip-flop and D Register are the data link between the Associative Memory and the Random Access Memory. Data, such as the argument for a search, are transferred from RAM in parallel to the D-Register. Each bit is then shifted into the D flip-flop, where it participates in the search operation. Data retrieved from the Associative Memory are transferred through an adder to the D flip-flop and then to the D Register.

The Associative Processor offers a variety of processing options in terms of operand location and processing speed. The following list illustrates some of the possibilities:

1. D + M → D
2. D + Mj → Mj
3. Mj + Mj → Mj
4. Mj + Mk → Mk

(1) represents an operation occurring between the D Register and one selected word in the Associative Memory. The result goes to the D Register. (2) illustrates a process between the D Register and many words in the Associative Memory. The third operation occurs between pairs of operands, each pair stored in a separate word. (4) represents an operation occurring between operands in different words. The same operation may simultaneously occur in many such pairs of words. In addition to these operations, many variations are possible; e.g., the operands may be located in different words with the results going to a third word. The capability of the storage flip-flops to act as a shift register provides the communication link between adjacent words.
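One use of this shift path, counting the words that satisfy a search (described next), can be sketched in code (a hypothetical model; the zero fill at the bottom is our assumption):

```python
# Sketch: the storage flip-flops act as a shift register between adjacent
# words; repeatedly shifting toward the top and tallying the "1"s that
# emerge counts the words that satisfied a search.
def count_matches(flags):
    count = 0
    for _ in range(len(flags)):
        count += flags[0]          # bit arriving at the highest word
        flags = flags[1:] + [0]    # shift up one word (assumed zero fill)
    return count

print(count_matches([1, 0, 1, 1]))  # 3
```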
Another use of this shift register occurs in counting the number of words which satisfy a search algorithm. This is accomplished by operating the storage flip-flops as a shift register and counting the number of "1's" shifted out. Each "1" corresponds to a word that satisfies the search.

4. COMMAND STRUCTURE

There are two types of instructions in the Associative Processor. Instructions which exercise control over the Associative Memory shall be referred to as associative instructions. Instructions which provide access to the Random Access Memory or that perform control transfers shall be referred to as non-associative instructions. A list of non-associative instructions follows:

LA  Load the contents of memory location M into the A-Counter and Limit Register.
LB  Load the contents of memory location M into the B-Counter and Limit Register.
LC  Load the contents of memory location M into the C-Counter.
LD  Load the contents of memory location M into the D-Register.
LM  Load the contents of the D Register into memory location M.
TD  Transfer to location M if the D flip-flop equals "zero"; otherwise proceed sequentially.
TO  Transfer to location M if the output of the OR gate equals "zero"; otherwise proceed sequentially.
TI  Transfer to the location specified by memory location M.
TU  Unconditionally transfer to location M.
SH  Up-shift the storage flip-flops a number of times equal to M.
SC  Up-shift the storage flip-flops a number of times equal to M. The C Counter counts the number of ones shifted into the highest level storage flip-flop.
CD  Transfer the contents of the C Counter to the D Register.
ID  Input data word from external device to the D Register.*
OD  Output data word from the D Register to external device.*

*The input and output commands generally work in conjunction with the automatic interrupt facility. An external device requests an interruption by turning on an interrupt flip-flop.
This causes the Processor to complete the present instruction, store the contents of the Instruction Address Counter in memory, and jump to an Input or Output routine. These routines can transfer I-O data between the D Register and either the RAM or the Associative Memory.

Each associative instruction controls the processing during a single bit time, except when it is executed in a Repeat Mode. The instructions are divided into a number of fields, each of which specifies the control of a separate part of the Processor. Figure 3 summarizes these fields, which are described below:

Column Select (CS): The contents of this field determine what bit column of the Associative Memory is to be interrogated or written into, either by specifying the A, B, or C counter, which in turn selects the bit column, or by directly specifying one of the four bit columns. The four directly specified columns are ordinarily used for the storage of tag bits.

A   A counter
B   B counter
C   C counter
T1  Column 1
T2  Column 2
T3  Column 3
T4  Column 4

Counter Control (CC): The contents of this field determine whether the counter selected in CS will be modified. A counter may be decremented or incremented by one.

IN  Increment
DE  Decrement
NC  No change

Transfer Control (TC): In general, instructions are accessed from sequential memory locations in RAM. To facilitate exiting from a subroutine, it is desirable to be able to transfer to another location when the contents of a counter become equal to the associated Limit Register.

Ta  Transfer to memory location 0 when the A-Counter becomes equal to the A-Limit Register.
Tb  Transfer to memory location 1 when the B-Counter becomes equal to the B-Limit Register.
NC  Proceed sequentially.

Adder Control (AC): The contents of this field control the manner in which the output of the OR gate is transferred into the D flip-flop.

L   OR gate output copied into the D flip-flop.
C   Complemented OR gate output copied into the D flip-flop.
A   OR gate output added to the D flip-flop. The carry is stored in a flip-flop associated with the Adder.
S   OR gate output subtracted from the D flip-flop.
NC  No transfer.

D-Register Shift (RS): The D Register can be made to shift one bit in either direction. The shift is end-around when the contents of the AC field indicate that no transfer is to take place.

R   Right shift
L   Left shift
NC  No change

Figure 3. Word Logic.

Interrogate Control (IC): Upon interrogation, the sense amplifier responds with a "1" output to a match condition between the interrogated bit and what previously has been labeled a reference bit. The IC field defines the reference bit:

1   Interrogate for "1". If the stored bit is equal to "1", the sense amplifier will be set.
Z   Interrogate for "0". If the stored bit is equal to "0", the sense amplifier will be set.
D   If the D flip-flop is equal to "1", interrogate for "1"; if "0", interrogate for "0".
D̄   If the D flip-flop is equal to "0", interrogate for "1"; if "1", interrogate for "0".

Write Control (WC): This field specifies writing to occur in the words for which the storage flip-flop is equal to "1". During writing, the IC field is available to determine whether "1's" or "0's" are to be written.

W   Write
NC  Do not write

Storage Flip-flop Control (SC): This field specifies the state of the control signals which are common to the input logic of each of the storage flip-flops. This logic influences the transfer of data from the sense amplifiers to the storage flip-flops.
NC   0→Es, 0→Er. No transfer takes place.
Es   1→Es, 0→Er. The storage flip-flop is set if the sense amplifier is equal to "1".
Er   0→Es, 1→Er. The storage flip-flop is reset if the sense amplifier is equal to "0".
Esr  1→Es, 1→Er. The state of the sense amplifier is copied by the storage flip-flop.
D    1→Es, 0→Er if the D flip-flop is equal to "1"; 0→Es, 1→Er if the D flip-flop is equal to "0".
D̄    0→Es, 1→Er if the D flip-flop is equal to "1"; 1→Es, 0→Er if the D flip-flop is equal to "0".
OR   1→Er if the output of the OR gate is equal to "1".
Ec   1→Ec. The storage flip-flop is complemented if the sense amplifier is equal to "1".

Instruction Type (IT): This field appears in both associative and non-associative instructions. The contents designate the instruction as being associative or not.

A   Associative
Ā   Non-associative

Repeat Mode (RM): It is sometimes desirable to repeat an instruction during the execution of a simple search.

R   Repeat until the counter specified by CS becomes equal to its limit register.
R̄   Do not repeat instruction.

The time required to execute one associative instruction is measured from the time the instruction is transferred into the Instruction Register to the time the sense amplifier outputs are transferred into the storage flip-flops. During this time, the next instruction is accessed, and the previous output of the storage flip-flops can be transferred to the D flip-flop. This time will be referred to as a "bit time". Associative instructions are accessed at a rate of one per bit time. It should be noted that the AC field of the associative instruction will control the disposition of storage flip-flop data that resulted from an interrogation specified by the previous associative instruction. Non-associative data-transferring instructions require two bit times for execution (both an instruction and an operand must be accessed from the Random Access Memory). Non-associative instructions which transfer control require one bit time for execution.
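The Es/Er/Ec semantics can be summarized in a small sketch (our reading of the text, not a published truth table):

```python
# Assumed storage flip-flop input logic: Es sets the flip-flop on a "1"
# sense output, Er resets it on a "0", Ec complements it on a "1";
# raising Es and Er together (Esr) therefore copies the sense amplifier.
def update_ff(ff, sense, Es=0, Er=0, Ec=0):
    if Es and sense:
        ff = 1
    if Er and not sense:
        ff = 0
    if Ec and sense:
        ff = 1 - ff
    return ff

# Esr copies the sense amplifier into the flip-flop in both directions:
print(update_ff(0, 1, Es=1, Er=1), update_ff(1, 0, Es=1, Er=1))  # 1 0
```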
5. MICROPROGRAMMED ALGORITHMS

The method by which associative instructions are controlled constitutes one of the major factors contributing to the flexibility of the Associative Processor. Each field of the instruction directly specifies some control function, so that numerous associative instructions can be microprogrammed by the appropriate selection of fields. Despite the large number of options available, the central control unit is very simple, since the control functions are obtained directly from the instruction fields. Following is a description of several important categories of associative instructions with typical microprograms; algorithms are also given for more complex data retrieval and arithmetic processes built up from microprogrammed associative instructions.

Possibly the most often used algorithm is ordered retrieval. A number of algorithms for retrieval have appeared in recent publications. Among these, the algorithm presented by Lewin 11 appears to be fastest, making its implementation in the Associative Processor an attractive consideration. However, in view of its unique hardware requirement (i.e., the equivalent of a three-state sense amplifier on each bit column), an algorithm was developed which utilizes logic of a more general nature. In fact, the development of this algorithm greatly influenced the design of the word logic. The algorithm, presented in detail later in this section, retrieves one bit of data for each bit time of execution. The ordered retrieval algorithm is used whenever data is to be retrieved from the Associative Memory or whenever it is necessary to select a word in which to write data. Since the time required to identify a word is related to the number of bits that must be searched, it is often desirable to have stored in each word a compact address field.
Having such a field also provides a convenient way to distinguish between two words which might otherwise contain the same data.

In most instances, before the execution of an associative search, it is necessary to precondition the storage flip-flops either by setting or resetting them all, or by setting those corresponding to the set of words which is to be searched. Accomplishing this last operation requires transferring the contents of the tag bit (or whichever bit position holds the information) into the sense amplifiers and then copying the states of the sense amplifiers into the storage flip-flops. These operations can be executed with a single associative instruction.

   CS  CC  TC  AC  RS  IC  WC  SC   IT  RM
   T1  NC  NC  NC  NC  1   NC  Esr  A   R

Setting (or resetting) all the storage flip-flops requires two associative instructions. The procedure is to interrogate the same bit column twice: once for "1's", and once for "0's". The following instructions reset the storage flip-flops.

      CS  CC  TC  AC  RS  IC  WC  SC  IT  RM
   1. T1  NC  NC  NC  NC  Z   NC  Er  A   R
   2. T1  NC  NC  NC  NC  1   NC  Er  A   R

The following associative searches have been microprogrammed:

   Equality
   Less than
   Less than or equal
   Greater than
   Greater than or equal
   Maximum value
   Minimum value
   Similarity
   Ordered retrieval

Except for Similarity, the execution time of each search is one bit time for each bit of the argument. The object of most searches is to leave the storage flip-flops in a state which defines the locations of those words which meet the conditions of the search. However, it is possible to obtain the complementary set of words, as well as the set obtained by ORing or ANDing the results of several different searches. Following are a few examples of search microprograms:

Equality Search

   CS  CC  TC  AC  RS  IC  WC  SC  IT  RM
   A   IN  TA  NC  R   D   NC  Er  A   R

This instruction searches the words in memory whose storage flip-flops are initially true.
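In software terms, the repeated equality-search instruction behaves as sketched below (a hypothetical model; the names are ours, and the loops merely stand in for the word-parallel, bit-serial hardware):

```python
# Hypothetical model of the equality search: each bit time compares one
# bit column against the D flip-flop and resets the storage flip-flop of
# every word that mismatches; surviving flags mark the equal words.
def equality_search(memory, storage_ff, argument, field):
    for j in field:                           # one bit column per bit time
        for w in range(len(memory)):
            if storage_ff[w] and memory[w][j] != argument[j]:
                storage_ff[w] = 0             # mismatch: drop the word

memory = [[1, 0, 1], [1, 1, 1], [1, 0, 0]]
flags = [1, 1, 1]
equality_search(memory, flags, [1, 0, 1], range(3))
print(flags)  # [1, 0, 0]
```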
At the end of the search, the storage flip-flops identify those words containing a field exactly matching the field in the D Register. The field in memory is defined by the A Counter and Limit Register. Each bit of the field is interrogated starting with the least significant bit. The D flip-flop specifies the match conditions. A mismatch will cause the appropriate storage flip-flop to be reset.

Less Than

   CS  CC  TC  AC  RS  IC  WC  SC  IT  RM
   A   IN  NC  NC  R   Z   NC  D   A   R

This instruction causes a storage flip-flop to be set if the interrogated memory bit is "0" and the D flip-flop "1", and reset if the memory bit is "1" and the D flip-flop "0". Otherwise the storage flip-flops are unchanged. Essentially, this logic is the same as borrow logic, with the contents of the D Register being subtracted from the contents of each memory word. When the storage flip-flop is equal to one, the memory word is less than the data register word.

Ordered Retrieval

   CS  CC  TC  AC  RS  IC  WC  SC  IT  RM
   A   DE  NC  L   L   1   NC  OR  A   R

This instruction transfers to the D Register the maximum value field of the set of fields which are identified by the true storage flip-flops. Starting with the most significant bit position of the search field, each bit position is sequentially interrogated for a one. When any sense amplifier indicates a match for a field still in the search set, the storage flip-flops corresponding to those sense amplifiers indicating a mismatch are reset. Each time a bit position is interrogated and is found to contain a "1" in any of the words remaining in the search set, a "1" is transferred into the D flip-flop.

Many arithmetic and logical microprograms have been developed for the Associative Processor. Below is a partial list. The operations are identified by S when they apply to an operation between a single pair of operands.
An SP refers to an operation between one operand in the D Register and many operands in the memory, and P refers to simultaneous operations between many pairs of operands in memory. The number of bit times required for the execution of each operation appears in parentheses.

   Add       M + D → D       S   (1 + no. of operand bits)
   Add       D + M → M       SP  (12 × no. of operand bits)
   Add       M1 + M2 → M2    P   (12 × no. of operand bits)
   Multiply  M1 × M2 → D,M3  S   (20 × no. of multiplier bits) for a 24-bit multiplicand
   Multiply  M1 × M2 → M3    P   (no. of multiplier bits × no. of multiplicand bits)
   Divide    D / M1 → M2     S   (30 × no. of operand bits)

Add (M1 + M2 → M2)

Field M1 is added to field M2 in all words which contain a "1" in bit column T1. Field M1 is defined by the A counter and Limit Register. Field M2 is defined by the B counter. Bit column T2 is temporary storage for the carry. Addition is executed in the following steps:

1. The jth bit of M1 is transferred to the storage flip-flops.

2. The contents of the storage flip-flops are added to the jth bit of M2. The carry is developed in the storage flip-flops during the addition.

3. The partial carry resulting from the addition of the carry (preceding step 1) to the jth bit of M2 is ORed with the partial carry in the storage flip-flop. The final carry results in the storage flip-flop.

4. The carry is added to the j+1 bit of M2; the resulting partial carry is stored in T2.

The following program executes this addition algorithm:

       CS  CC  TC  AC  RS  IC  WC  SC   IT  RM
    1. T1  NC  NC  NC  NC  1   NC  Esr  A   R
    2. A   IN  NC  NC  NC  1   NC  Er   A   R
    3. B   NC  NC  NC  NC  1   NC  NC   A   R
    4. B   NC  NC  NC  NC  1   W   Er   A   R
    5. B   IN  TB  NC  NC  0   W   NC   A   R
    6. T2  NC  NC  NC  NC  1   NC  Es   A   R
    7. T2  NC  NC  NC  NC  0   W   NC   A   R
    8. B   NC  NC  NC  NC  1   NC  NC   A   R
    9. B   NC  NC  NC  NC  1   W   Er   A   R
   10. B   NC  NC  NC  NC  0   W   NC   A   R
   11. T2  NC  NC  NC  NC  1   W   NC   A   R
   12. Transfer to 1 (TU)

Instruction one transfers T1 to the storage flip-flops.
Instruction two interrogates the jth bit of M1 and resets the storage flip-flops where mismatches occur. The resulting contents of the storage flip-flops constitute the AND function of T1 and the jth bit of M1. Instructions three, four, and five have the effect of complementing the jth bit of M2 in words for which the storage flip-flop is equal to "1". In addition, if the jth bit of M2 were equal to "1", the storage flip-flop would remain equal to "1", thereby representing the partial carry. Instruction six ORs the carry resulting from addition of the last carry to the jth bit of M2 with the partial carry in the storage flip-flop. Instruction seven clears the T2 column in preparation for the next carry storage. Instructions eight, nine, and ten add the carry to the j+1 bit of M2, using the same technique as instructions three, four, and five. Instruction eleven stores the partial carry in T2. The routine is exited after instruction five has been executed and the addition of the most significant bits completed.

6. FAULT TOLERANCE

An interesting characteristic of this particular associative memory is its structural periodicity. Each bit driver is identical to any other, and the logic of each word is identical to that of any other. There is no addressing matrix and no ladder network. The existence of these characteristics suggests the possibility of making the system operation insensitive to local malfunctions in the memory stack and memory circuits. There are numerous possible causes of malfunction in the Associative Memory. However, most malfunctions can be placed in one of four categories. The first is characterized by the inability of the sense amplifier to change state. The second is characterized by the inability of the storage flip-flop to change state. The third category consists of those malfunctions which cause a write amplifier to fail, and the fourth consists of those malfunctions which cause a bit driver to fail.
The procedures to be described below consist of exercising control over the functions common to the logic of each word in such a way as to gain a system tolerance to these malfunctions. Other malfunctions may occur for which the only safeguard would be the utilization of component redundancy techniques. In the following discussion it will be assumed that no more than one type of malfunction exists in any one word.

To cope with these malfunctions, it is of primary importance to guard against spurious results in the retrieval operations. It is of little importance if, in a particular word, some other operation goes awry as a result of a malfunction. This merely produces meaningless data in that word, which is unimportant if provisions are made never to write into and never to retrieve information from such a word. Since a word is selected for writing by means of the retrieval algorithm, the fundamental problem is to guard against incorrect retrieval as a result of a malfunction.

Malfunctions of the first category cause the sense amplifier to permanently store either a "1" or a "0". If a "0" is stored, the storage flip-flop can never be set. The retrieval of data from a particular word and the ability to write into a particular word are dependent upon the storage flip-flop of that word being in the "1" state. If the storage flip-flop can never be set, the word appears to be nonexistent. If information were stored in the word prior to the malfunction, it would become irretrievable; however, as long as the malfunction existed, it would be impossible for data to be inadvertently stored in that location.

A "1" locked in the sense amplifier presents a different problem. If the storage flip-flop of such a malfunctioning word is true, execution of the retrieval algorithm will result in the retrieval of a word of "1's". To prevent this, it is necessary to perform an operation prior to the retrieval algorithm which will reset the storage flip-flops in words whose sense amplifiers are locked in the "1" state without altering the states of the other storage flip-flops. Furthermore, this operation must not turn on a storage flip-flop in a word whose sense amplifier is locked in the "0" state. This can be accomplished as follows: reset the sense amplifiers by interrogating a column of "0's". Then, by executing the complement control Ec, complement all storage flip-flops corresponding to sense amplifiers still in the "1" state.
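The masking step just described can be sketched as follows (hypothetical model; only the two-step procedure comes from the text):

```python
# Interrogating a column of "0's" leaves every healthy sense amplifier at
# "0"; only amplifiers locked at "1" still read "1". Raising Ec then
# complements exactly those words' storage flip-flops, clearing any set
# flag in a faulty word before retrieval is attempted.
def mask_stuck_sense_amps(storage_ff, stuck_at_one):
    for w in range(len(storage_ff)):
        sense = 1 if stuck_at_one[w] else 0
        if sense:                            # Ec: complement on sense = "1"
            storage_ff[w] = 1 - storage_ff[w]

ffs = [1, 1, 0]
mask_stuck_sense_amps(ffs, [False, True, False])
print(ffs)  # [1, 0, 0] -- the faulty word leaves the search set
```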
To prevent this, it is necessary to perform an operation prior to the retrieval algorithm which will reset the storage flip-flops in words whose sense amplifiers are locked in the "1" state without altering the states of the other storage flip-flops. Furthermore, this operation must not turn on a storage flip-flop in a word whose sense amplifier is locked in the "0" state. This can be accomplished as follows: Reset the sense amplifiers by interrogating a column of "0's". Then, by executing the complement control Ec, complement all storage flip-flops corresponding to sense amplifiers still in the "1" state.

Malfunctions of the second category are those in which the storage flip-flop does not change state. If the storage flip-flop is locked in the "0" state, the associated word cannot participate in any reading or writing operations. Such a condition will cause that word to appear to be nonexistent. A permanently stored "1", however, may cause an erroneous readout during execution of the retrieval algorithm. To avoid this, in the case of maximum value retrieval, the affected word should be loaded with "0's" prior to retrieval. A method of determining whether any storage flip-flops are malfunctioning is first to interrogate a column of "1's" so as to set the sense amplifiers, then to execute the Es and then Ec functions. The output of the OR gate will be "1" if any storage flip-flops remain true.

The third category consists of those malfunctions which disable a write amplifier. If a disabled write amplifier can be detected and the word uniquely marked, further operations upon that word can be avoided. Detection is accomplished by writing a pattern of "1's" and "0's" in each word. An equality search is then made, using the same pattern, to identify any words in which the pattern was not successfully recorded. On each successive execution of this procedure, a pattern different from the last pattern must be used.
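The write-amplifier check described above can be sketched as follows, assuming (as an illustration, not from the paper) that a failed amplifier simply leaves a word's previous contents unchanged:

```python
# Illustrative sketch of the write-amplifier check: write a fresh test pattern
# into a check field of every word, then an equality comparison flags words
# where the pattern did not record.

def detect_bad_write_amps(stored, pattern, write_ok):
    """Write `pattern` into each word's check field; return faulty word indices.

    stored   -- current check-field contents, one entry per word (mutated)
    write_ok -- per-word flag; False models a failed write amplifier
    """
    for w, ok in enumerate(write_ok):
        if ok:
            stored[w] = pattern   # healthy amplifier records the new pattern
        # a failed amplifier leaves the previous contents untouched
    return [w for w, value in enumerate(stored) if value != pattern]

field = [0b1010, 0b1010, 0b1010, 0b1010]   # previous checking pattern
print(detect_bad_write_amps(field, 0b0101, [True, False, True, True]))  # -> [1]
```

This also shows why each pass must use a fresh pattern: a dead amplifier retains the previous pattern, so rewriting the same pattern would hide the fault.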
If the pattern which is read out of a given word is not identical to the current pattern, then the write amplifier driving that word is malfunctioning. The erroneous pattern cannot be used again. If the length of the pattern is N bits, then N bits in each word of the memory must be relegated to storage of the checking patterns at the time of execution of the checking procedure. The number of different patterns must exceed by two the number of malfunctions which can be tolerated.

The malfunctions of the last category are associated with the bit drivers. Once a bit driver has failed, there is no way of writing into or interrogating that particular bit column. Therefore, it is necessary to isolate malfunctioning bit drivers. Detection of a malfunctioning bit driver is accomplished by designating two words of memory as test words, one of which would contain stored "1's" and the other, stored "0's". Special logic on each of these two word lines would compare the real output to the theoretical output upon each interrogation. A discrepancy would interrupt the program, thereby allowing the execution of a programmed corrective action.

7. MECHANIZATION

The critical problem in mechanizing the Associative Processor is, of course, the implementation of the associative memory. The critical requirements of this memory are the following:
1. Non-destructive readout. This is essential for the Processor described above; however, with slight modification, destructive readout could be tolerated provided the write time was comparable to the read time.
2. Small ratio of word write current to read signal. This significantly influences the complexity of the sense amplifier and write amplifier and limits the number of words in memory.
3. Short write cycle. This makes tagging operations practical.
4. Short interrogation cycle. This is especially important for a bit-serial processor.
5. Limited power consumption.
A number of memories were analyzed for compliance with these requirements, including:
1. Plated Wire7,8
2. Laminated Ferrite9
3. Bi-core
4. Biax

At this time, the most promising of these for both the Associative Memory and the RAM appears to be the Plated Wire Memory. It can be operated in a nondestructive readout mode, requires a word current of approximately 25 ma, and can be interrogated or written into at a 10 mc rate. The closed flux path, rotational switching mode, and loose coupling of the switched flux to the sense line contribute to the high ratio of read signal to word write current and to the low power consumption of each bit.

Our laboratory evaluation of the Plated Wire has indicated the feasibility of using integrated circuitry for both the word logic and the bit drivers. The periodic structure of large portions of the Associative Processor and the requirement for a relatively small number of circuit types facilitate mechanization with integrated circuits.

The optimum sizes of the Associative Memory and the Random Access Memory depend greatly upon the application. For the class of aerospace applications for which the Processor was conceived, the following dimensions and parameters were chosen:

RAM: 4096 words of 24 bits each.
Associative Memory: 512 words of 96 bits each.
Bit time = Memory cycle time = 0.1 μsec.

8. CONCLUSIONS

The Associative Processor possesses several virtues as a parallel processor. The basic processing module, i.e., one word of Associative Memory with its word logic, possesses considerable computing and memory capability for its size and complexity. This implies a large amount of parallel processing per dollar. Communication within the Processor is relatively efficient, especially where associative techniques can be employed. Most of the Processor is periodic in structure and, therefore, compatible with batch fabrication techniques and integrated circuitry. Fault tolerance techniques can be employed, at least to a degree. The nonperiodic control structure of the Processor is relatively simple. And, finally, the microprogramming characteristics of the instructions permit and encourage programming experiments.

ACKNOWLEDGEMENT

The authors gratefully acknowledge the support of the Space Systems Division of the Air Force, under whose study contract the work was done, and Space Technology Laboratories, Inc., who participated with Abacus, Inc. in the study. We wish particularly to thank Mr. M. Waxman of S.T.L. for his work in evaluating the Associative Processor in several aerospace applications and Mr. T. Stupar, who assisted in evaluating the technical feasibility of the system.

REFERENCES

1. HOLLAND, J. H., "A Universal Computer Capable of Executing an Arbitrary Number of Sub-Programs Simultaneously," Proc. Eastern Joint Computer Conference (1959).
2. SLOTNICK, D. L., BORCK, W. C., and McREYNOLDS, "The Solomon Computer," Proc. Fall Joint Computer Conference (1962).
3. ROSIN, R. J., "An Organization of an Associative Cryogenic Computer," Proc. Spring Joint Computer Conference, San Francisco (May 1962).
4. ESTRIN, G., and FULLER, R., "Algorithms for Content Addressable Memory Organizations," Proc. Pacific Computer Conference, Pasadena (March 1963).
5. ESTRIN, G., and FULLER, R., "Some Applications for Content-Addressable Memories," Proc. Fall Joint Computer Conference, Las Vegas (Nov. 1963).
6. DAVIES, P. M., "Design for an Associative Computer," Proc. Pacific Computer Conference, Pasadena (March 1963).
7. DANYLCHUCK, I., PERNESKI, A. J., and SAGAL, M. W., "Plated Wire Magnetic Film Memories," Intermag Proceedings, Washington, D.C. (April 1964).
8. FUTAMI, K., et al., "The Plated-Woven Wire Memory Matrix," Intermag Proceedings, Washington, D.C. (April 1964).
9. SHAHBENDER, R., et al., "Laminated Ferrite Memory," Proc. Fall Joint Computer Conference, Las Vegas (Nov. 1963).
10. ROWLAND, C. A., and BUGE, W. O., "A 300 Nanosecond Search Memory," Proc. Fall Joint Computer Conference, Las Vegas (Nov. 1963).
11. LEWIN, M. H., "Retrieval of Ordered Lists from a Content-Addressed Memory," RCA Review (June 1962).

A HARDWARE-INTEGRATED GPC/SEARCH MEMORY

Russell G. Gall
Goodyear Aerospace Corporation, Akron, Ohio

SECTION I-INTRODUCTION

This paper presents a method of integrating the search memory with the Univac 1206 (AN/USQ-20) computer more intimately than is possible through a standard input-output channel. The saving of search time that results approaches that of a completely integrated system. The hardware modifications required are relatively minor, and therefore increases in cost are held to a minimum. A typical four-variable search problem is formulated, and its solution by the following three separate systems is analyzed: (1) the AN/USQ-20 computer, (2) the AN/USQ-20 computer in conjunction with a peripheral search memory tied to a standard I/O channel of the computer, and (3) the proposed hardware-integrated USQ-20/search memory. Search solution times are derived for each of the three systems. They are displayed as curves for ease of comparison of the performance of the three systems.

A search memory that can be operated in conjunction with a USQ-20 computer using only the standard input-output channels is being developed by Goodyear Aerospace Corporation. This approach is referred to as the peripheral search memory. At the other extreme, there is the possibility of completely integrating the search memory with the USQ-20 computer. That is, the instruction repertoire can be modified to include associative instructions, the control logic can be modified extensively, and additional registers can be added as necessary. Neither integration can be considered the optimum, since the latter involves costs that are out of proportion with the advantages, and the former involves undue transfer-time penalties and more complex programming.

While the integration methods described could conceivably be applied, with some study, to any general-purpose computer, this document is particularly oriented toward integration of the Goodyear Aerospace Corporation (GAC) search memory with the Univac 1206 general-purpose computer. The actual military designation of this computer is CP-642A/USQ-20(V). This indicates that it is one of a number of components included under the designation AN/USQ-20(V). USQ-20 is a general term used informally to identify this computer and is used throughout this paper.

SECTION II-SEARCH MEMORY DESCRIPTION

1. GENERAL

The particular search memory model upon which this study is based is shown in block diagram form in Figure 1. A brief functional description is considered sufficient background for understanding the ideas presented herein. More detailed information is available in the literature.1

The memory proper has a capacity of 256 words. Each word contains 30 bits. (The word size of the USQ-20 computer is also 30 bits.) The memory is limited to three types of search:
1. Exact match (=)
2. Equal to or greater than (≥)
3. Equal to or less than (≤)

Figure 1. Search Memory Block Diagram (a 256-word by 30-bit memory with address selection matrix, response resolver, and response store)

2. SEARCH MEMORY INPUT (FROM USQ-20)

a. Information Types

Three general types of information are transferred from the USQ-20 computer to the search memory: instructions, data, and criteria.

b. Instructions

The instruction information consists of a single instruction word, the format of which is shown in Table I. The instruction word is the only type of information that is transferred from the USQ-20 computer via the external function mode of data transfer. The configuration of the instruction word tells the search memory when to start writing data into the memory, when to erase remaining words in memory, when to accept criteria, what type of search to perform, and what type of response is required. The search types may be combined without limit with the logical AND connective. For example, the equal-to-or-less-than search logically ANDed with the equal-to-or-greater-than search is equivalent to a between-limits search.

TABLE I-INPUT INSTRUCTION WORD FORMAT

Bit          Function
d22 ... d15  Address (random write or start)
d7           Response count required
d6           Mask word follows
d5           Response required
d4           L/T search
d3           G/T search
d2           Exact match search
d1           Stop write (erase)
d0           Start write

c. Data

A total of n words (0 < n < 256) of data may be transferred from the USQ-20 computer to the search memory at a maximum rate of 8 μsec per word, using the internal interrupt mode of data transfer. The USQ-20 program then will be interrupted internally after transferring the nth data word to the search memory and will jump to an interrupt routine. This routine will generate an "erase" instruction if required, load the first set of criteria in an output buffer for the forthcoming transfer to the search memory, and initiate the first search instruction. The search memory will accept a search instruction even though it is not finished erasing, although it will not act upon this instruction until erasing has been completed.

d. Criteria

The criteria consist of either (1) a mask word followed by a key word or (2) a key word transferred from the USQ-20 to the search memory. The search instruction cues the search memory as to whether the criteria will be transferred according to (1) or (2) (see bit d6 in Table I). It is considered sufficient that the criteria be transferred from the USQ-20 to the search memory in the normal buffer mode of data transfer.
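Under the bit assignments of Table I, assembling an instruction word can be sketched as below; the helper name and keyword-flag style are illustrative, not from the paper:

```python
# Sketch of assembling a search-memory instruction word from the Table I
# fields. Bit positions follow Table I; the helper itself is illustrative.

FIELDS = {
    "start_write": 0, "stop_write": 1, "exact_match": 2, "gt_search": 3,
    "lt_search": 4, "response_required": 5, "mask_word_follows": 6,
    "response_count_required": 7,
}

def make_instruction(address=0, **flags):
    """Pack a 30-bit instruction word; the address occupies bits d15..d22."""
    word = (address & 0xFF) << 15
    for name, on in flags.items():
        if on:
            word |= 1 << FIELDS[name]
    return word

# An equal-to-or-less-than search with a mask word and address responses:
print(bin(make_instruction(mask_word_follows=True, lt_search=True,
                           response_required=True)))  # -> 0b1110000
```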
The internal interrupt mode of data transfer is not considered necessary in this case, since the USQ-20 need not take further action until an end-of-search signal response is received from the search memory via the input channel.

3. SEARCH MEMORY OUTPUT (TO USQ-20)

The output of the search memory can be either response address, response count, or no response. Either may be selected by the instruction. A no-response instruction indicates the response to the present search should be saved for logically ANDing with the next search. Two address responses per 30-bit word are packed for transfer to the USQ-20. There can never be more than a single count response for any given search. It is placed in the least significant portion of the 30-bit word transferred to the USQ-20. The normal buffer mode is used to transfer address or count responses to the USQ-20. An end-of-search signal must be sent to the USQ-20 whether or not an address or count response is desired or available. This signal is transferred via the USQ-20 external interrupt. The end-of-search signal always follows the transfer of address or count responses if any have occurred.

4. SEARCH TIME

a. Exact-Match Search

Search time is variable, depending on the type of search. The exact-match search is performed in 5 μsec and is independent of the memory size (number of words), the number of words actually loaded into the memory, and the number of unmasked bits searched.

b. Equal-to-or-Less-Than Search

The equal-to-or-less-than search is actually the result of one or more exact-match searches. The number of exact-match searches performed depends on the number of unmasked zeros in the key word. Therefore, the total search time is 5m0 μsec, where m0 is the number of unmasked zeros in the key word.

c. Equal-to-or-Greater-Than Search

The equal-to-or-greater-than search is similar to the equal-to-or-less-than search, except it depends on the number of unmasked ones in the key word.
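The search-time rules in (a) and (b) can be sketched as follows, assuming the convention that a "1" mask bit means the position is compared (the paper does not specify the mask polarity):

```python
# Timing sketch for the exact-match and equal-to-or-less-than searches.
# Assumed convention: a 1 in the mask means the bit position is compared.

EXACT_MATCH_USEC = 5   # fixed, regardless of memory size or loading

def less_equal_search_usec(key, mask, width=30):
    """5 * m0 usec, where m0 is the number of unmasked zeros in the key."""
    m0 = sum(1 for j in range(width)
             if (mask >> j) & 1 and not (key >> j) & 1)
    return EXACT_MATCH_USEC * m0

# A 4-bit key 1010 with all four bits unmasked has two zeros:
print(less_equal_search_usec(0b1010, 0b1111, width=4))  # -> 10 usec
```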
The total search time then is 5m1 μsec, where m1 is the number of unmasked ones in the key word.

5. ADDITIONAL CONSIDERATIONS

Masked shift load capability is a method of specifying the n high-order bits of the 30-bit USQ-20 word and specifying a particular field of the search memory into which this n-bit byte should be entered. Although this actual search memory model does not have this capability, for the purposes of this paper the memory is assumed to have it. Although there are several methods and variations of methods to update the search memory, the masked shift load affords the fastest way to update both the peripheral search memory and the proposed hardware-integrated GPC/search memory. It also allows less complex programming procedures in the USQ-20 computer, which will ease the analysis and evaluation to follow.

The USQ-20 is a one's complement machine. The search memory must have data in straight binary form, from the smallest (most negative) value to the greatest (most positive) value, to perform the equal-to-or-greater-than and equal-to-or-less-than searches properly. Therefore, the most significant bit of each data word transferred to the search memory must be inverted. This may be handled either by USQ-20 software or by search memory hardware. The actual search memory does not incorporate this hardware at present, but is assumed to have this capability to simplify the ideas to be presented.

SECTION III-PROPOSED METHOD OF INTEGRATION

1. SYSTEM CONSIDERATIONS

The hardware-integrated search memory will require relatively minor modifications to both the presently contracted peripheral search memory and the USQ-20 computer. The over-all system function of the integrated search memory will be identical to that of the peripheral search memory. A block diagram for the integrated search memory is given in Figure 2.
The block diagram shows that the integrated search memory system still uses an output channel of the USQ-20 computer. This output channel will provide the following functions:
1. Initial loading of the search memory by means of block transfer (masked shift load)
2. A means of providing instructions to the search memory utilizing the external function mode of data transfer
3. An additional method of updating dynamic data in the search memory by block transfer, if considered appropriate in a given system application.
The external function mode of data transfer is a very efficient method of providing instructions to the search memory, since a single programmed instruction in the USQ-20 computer can effect the transfer of a single instruction to the search memory.

The main modifications involve supplying the search memory direct access to several of the internal registers of the USQ-20. Specifically, output access to the 30-bit A register and output access to the 30-bit Q register are required. In addition, two-way access to the B1 index register is provided. Although the B1 register contains 15 bits, only an 8-bit access is necessary for a 256-word search memory. Output access to the B1 register is not used in the time analysis that follows, but can be useful in many applications.

Figure 2. Hardware-Integrated USQ-20/Search Memory Block Diagram (showing the address lines for the 256-word memory; the response-address path; the comparand or load data path; the instruction path via the output channel, in the external function mode, or load data in the normal buffer mode; and the clear and sense lines to the B7 register)

As shown in Figure 2, there are only two lines remaining in the interface between the search memory and the USQ-20. One line allows the search memory to clear the B7 index register of the USQ-20. The other line allows the search memory to sense the most significant bit of the B7 index register.
These are the only two lines necessary to effect synchronization between the search memory and the USQ-20. Absolutely no modification to the complex control circuitry of the USQ-20 computer is required.

Without further explanation, one might reasonably question the fact that only two leads in the interface between the search memory and the B7 index register could effect complete control and synchronization between the two devices. The answer is, of course, that the two leads cannot alone provide the required control and synchronization of the two devices. However, in conjunction with judicious use of certain USQ-20 instructions, they provide for the control and synchronization of the two devices mainly by economical software utilization rather than expensive hardware modification.

The key USQ-20 instruction for synchronization is REPEAT, described as follows: Clear B7 and transmit the lower 15 bits of Y (the operand) to B7. If Y is nonzero, transmit (j) to r (designator register), thereby initiating the repeat mode. This mode executes the instruction immediately following the REPEAT instruction Y times; B7 contains the number of executions remaining throughout the repeat mode.

The instruction selected to follow the REPEAT instruction is the ENTER B0 instruction. It was the shortest instruction that could be found in the repertoire, and is normally used as a NO-OP instruction since there is no B0 register. In the repeat mode, the first execution of the ENTER B0 consumes 8 μsec; each succeeding execution requires only 4.8 μsec. The repeat mode described above and the added single leads (Figure 2) that control and sense the status of the repeat mode synchronize the USQ-20 and the search memory during transfer of search data. Details will be clarified during the analysis to follow later.

2. HARDWARE CONSIDERATIONS

a.
General

The hardware considerations discussed below are based on data contributed by R. Horvath.2 Since synchronization and control of these integrated devices will be handled by judicious use of the USQ-20 software, the only USQ-20 hardware modifications will be those necessary to provide the search memory two-way access with the B1 and B7 registers (18 bits) and output access from the A and Q registers (60 bits). The hardware changes that will be necessary to the present peripheral search memory, to modify it for use in the integrated system, are the circuitry to terminate and gate 69 additional input lines and the circuitry to drive 9 additional output lines.

b. USQ-20 Modifications

Each additional input to a register stage in the USQ-20 must be ORed with the existing inputs to that stage. This is accomplished by removing the collector resistor and clamp diode from the output transistor of a standard USQ-20 gated input amplifier. The open collector is then directly connected to the proper output side of the given register stage. Each additional output from the USQ-20 is handled in a straightforward manner by providing a standard USQ-20 data line driver with the feedback circuitry removed to decrease the rise and fall times. Removal of the feedback circuitry may be unnecessary.

All B registers are located in Chassis No. 7 of the USQ-20 computer. The required nine modified gated input amplifiers and the nine modified data line drivers may be located in Chassis No. 8, which has 30 spare card locations available. The interchassis connectors have a sufficient number of spare pins to allow the necessary interchassis wiring. The A and Q registers are located in Chassis 4 and 5. The 60 modified data line drivers may be located in spare card locations in Chassis 3, 4, 5, and 6.

SECTION IV-SEARCH TIME ANALYSIS

1.
GENERAL

In this section, a typical search problem is defined and three separate solution methods are analyzed: (1) USQ-20 computer only, (2) peripheral search memory, and (3) integrated search memory. The resulting equations and curves represent the USQ-20 computer arithmetic time used. The over-all task can be divided into three basic time-consuming steps, no matter what the search system configuration might be: (1) loading data and updating dynamic data, (2) providing search criteria and instructions, and (3) handling the resulting responses.

2. TYPICAL SEARCH PROBLEM DEFINITION

The typical search problem parameters are shown graphically in Figure 3. The problem is stated verbally as follows: Find the addresses of all items that are hostile AND between the limits of X1 and X2 AND between the limits of Y1 and Y2 AND equal to or greater than Z2. The solution approaches are governed by the following assumptions:
1. Six bits are used to describe each of the four variables for all items.
2. Initially, each of the 4 variables occupies a complete 30-bit word in the USQ-20 memory.
3. Mask word will always be part of the criteria supplied by the USQ-20 computer (that is, criteria will always consist of two words, mask and match).
4. The search memory capacity is 256 words and 30 bits; however, the number of words is treated as a variable in the equations to be developed.
5. X1 and Y1 are positive.
6. X2 and Y2 are negative.
7. Z1 and Z2 are positive.
8. 128 items (or half the total items) will be found between the limits of X1 and X2.
9. 128 items (or half the total items) will be found between the limits of Y1 and Y2.
10. 64 items (or one-fourth the total items) greater than Z2 will be found.
11. 16 items (or one-sixteenth of the total items) will prove to be hostile.
12. Only a single item will meet the logical product of all the criteria.
13. Distribution of items is shown in Figure 3.

3.
USQ-20 COMPUTER SOLUTION

The USQ-20 computer method of solution uses only the USQ-20 computer, without benefit of the search memory, and serves as a basis for comparison of solution times with those of the forthcoming solution methods. A USQ-20 instruction-oriented flow diagram was drawn (Figure 4) to solve the typical search problem stated in Item 2, above. Note that a step of the over-all search task, that of updating dynamic data, is not included in the flow. When only the USQ-20 computer is used, updating is not considered part of the search task. Only when an external search memory is used, which must be updated in addition to the conventional core memory, must updating time be considered.

Figure 4. USQ-20 Computer Search Flow

A typical distribution of the items in the four-dimensional space was assumed, as shown in Figure 3. From this assumed distribution, the number of cycles through all the loops in the flow diagram can be determined and, therefore, a very close estimate of solution time is possible.

Figure 3. Test Problem Geometry (oblique view and X-Y and X-Z plane views; the fourth dimension is classification. Problem: find all hostile targets that are between the limits of X1 and X2, between the limits of Y1 and Y2, and greater than Z2. Assumed target distribution: 1. ⊕ shows the distribution of hostile targets. 2. There are four targets (total) in each of the 64 X-Y-Z space cubes.)

The expression for the solution time as a function of the number of items (NI) and the number of typical mixed searches (Ns) is handled by four possible equations:

Tco1 = [0.048 + 0.0866(NI - m) + 0.240m] Ns + 0.016ImNs,   (1)
Tco2 = [0.048 + 0.0192(NI - m) + 0.240m] Ns + 0.016ImNs,   (2)
Tco3 = [0.048 + 0.224(NI - m) + 0.240m] Ns + 0.016ImNs,   (3)
Tco4 = Tco2 + 1/2 (Tco3 - Tco2)
     = [0.048 + 0.1216(NI - m) + 0.240m] Ns + 0.016ImNs.   (4)

With reference to the above equations,
1. T is search solution time in milliseconds.
2. The subscript co refers to computer (USQ-20) only.
3. NI is the number of items.
4. m is the average number of responses per typical mixed search.
5. Equation 1 assumes the typical (see Figure 3) distribution.
6. Equation 2 assumes the best item distribution for the indicated solution sequence.
7. Equation 3 assumes the worst item distribution for the indicated solution sequence.
8. Equation 4 is the mean item distribution (between worst and best).

Equation 4 is plotted as Curve 1 in Figures 11 through 13. The term 0.016ImNs appears in all four equations and expresses the time required to perform some system function based on the responses:
1. 0.016 (msec) is the time per average instruction.
2. I is the number of average instructions per response.
3. m is the average number of responses per typical mixed search.
4. Ns is the number of typical mixed searches.
The term 0.016ImNs, for reasons that are explained in Section V of this paper, is disregarded when the equation is plotted.

4. PERIPHERAL SEARCH MEMORY SOLUTION

a. Dynamic Case

As explained earlier, the peripheral search memory consists of the search memory model that interfaces with the USQ-20 computer strictly by means of the standard I/O channels of the USQ-20 computer. The typical problem will be programmed using this system configuration. The solution for the dynamic case requires periodic search memory updating, since the parameters describing each item are assumed to be variable with time. The single parameter table block-transfer method of updating will be used, since this appears to be the fastest method of updating the search memory.
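The USQ-20-only solution-time equations (1) through (4) can be sketched directly; the coefficients below are taken from the text, and the response-handling term 0.016ImNs is omitted, as the paper does when plotting the curves:

```python
# Sketch of the USQ-20-only solution-time equations (1)-(4), in milliseconds.
# Equation 1 is the typical distribution, 2 the best, 3 the worst, 4 the mean.

COEFF = {1: 0.0866, 2: 0.0192, 3: 0.224, 4: 0.1216}   # per-distribution slopes

def t_co(eq, n_items, n_searches, m):
    """Solution time for equation `eq`, without the 0.016*I*m*Ns term."""
    return (0.048 + COEFF[eq] * (n_items - m) + 0.240 * m) * n_searches

# Equation (4) is the mean of the best and worst cases:
best, worst, mean = (t_co(e, 256, 1, 1) for e in (2, 3, 4))
print(abs(mean - (best + worst) / 2) < 1e-9)  # -> True
```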
An equation describing search problem solution time will be developed by actually writing an abbreviated symbolic program for each of the three basic steps of the over-all search problem: (1) updating, (2) supplying criteria, and (3) handling responses. Since instruction execution times are known quantities, one or more terms can be developed for each of these three basic steps of the search problem. A collection of terms results in the required equation.

The generalized functional flow diagram for this configuration is given in Figure 5. The broken-line blocks are not considered part of the search problem, while the solid blocks contain the search functions that are executed from the main program. The remainder of the search functions are handled by interrupt routines that begin with the updating routine shown in Figure 6. Internal interrupt routines handle the updating, while the search routines are handled by external interrupt routines.

Updating time is determined by adding the time to perform the updating functions of Blocks F, G, H, and J in Figure 5 to the total instruction execution time for performing all the instructions found in Figure 6. In addition, every word that is transferred into or out of the USQ-20 computer via a standard I/O channel requires 16-μsec memory-access time that otherwise could be used for arithmetic access time. The total updating time is shown in Table II. Figure 7A displays the instruction list for supplying criteria to the search memory. The total instruction execution time required to supply criteria to the search memory is 1176 μsec.

TABLE II-UPDATING TIME, PERIPHERAL SEARCH MEMORY

Function                              Time (msec)
Blocks F, G, H, and J (Figure 5)      0.080
Updating routine (Figure 6)           0.684
I/O output access (0.016 x 4 x NI)    0.064NI
Total                                 0.764 + 0.064NI
Ir- INITIALIZE L 24 pSEC SEND WRITE INSTRUCTION TO SEARCH MEMORY VIA EXTERNAL FUNCTION [STORE CONTENTS OF OUTPUT BUFFER (X) INTO SUC- . CESSIVE LOCATIONS OF SEARCH MEMORY INTO FIELD DESIGNATED BY INSTRUCTION] K r------, .---.l ADDITIONAL SYSTEM I FUNCTIONS AS REQUIREDI L _ _ _ _ _ -.J 16 pSEC STORE· A • L (DOG) : ____ J 256 TIMES THROUGH LOOP (n = 0 THROUGH 255) 24 pSEC ACTIVATE 256-WORO OUTPUT BUFFER (X) TO SEARCH MEMORY WITH MONITOR Figure 5. Peripheral Search Memory Search Flow This is, of course, for a single search. The total execution time for N s searches per updating therefore becomes 1.176Ns (in milliseconds). Here again, every word that is transferred in or out of the USQ-20 by a standard I/O channel requires 16-f-tsec memory-access time that otherwise could be used for arithmetic access TITLE UPDATING PAGE LABEL ~PDATING I R()l1TlNE 16 f-tsec/word X 12 words/search X Ns searches or O.192Ns (in milliseconds). PERIPHERAL SEARCH MEMOR Y d CODING FORM OPERATOR 1 OPERANDS AND _TYPf • PAT ·JUMP .ENTER • SUBTRACT • ENTER RETURN JUMP' PA T ~~E-_EXT _ _ M S - NOTES INSTR EXEC (I.l~l"r.I EXEC PER "l",,,,r.' TOTAL fJLSl"r.I 24 5 120 "P + 1 "A" U (120 + j) 8 16 4 4 32 64 "A' X + 255i l · SKIP· A ZERO "A • U (120 + j) • SKIP 20 4 80 24 3 72 8 1 8 .JUMP "RAT .SUBTRACT "A • Y + 25s.to " SKIP-A ZERO 20 3 60 .ENTER "A' U(120+j)' SKIP 24 2 48 .JUMP "TAR .SUBTRACT "A " Z + 255 10 • SKIp·A ZERO 8 1 8 20 2 40 .JUMP "CAN 8 1 8 .JUMP "TAP 8 1 8 • ENTER "A· L (PAT) 16 1 16 • STORE • A " L (ABLE) [FIGURE 7AJ 16 1 16 . "SEARCH [fIGURE 7AJ "Cn • MONITOR !?56-WORD Y-BUFFER] 8 1 8 24 1 24 8 1 8 24 I 24 .JUMP RAT .OUTPUT TAR .OUTPUT • PAT • C n " MONITOR [256-WORD Z-BUFFER] .JUMP "PAT .OUTPUT • Cn • MONITOR [256-WORD C-BUFFER] .JUMP • PAT .JUMP TAP PROGRAMMER TIMl" INTINT CAN time. Since 12 criteria words are supplied for each search, the total memory access time to supply criteria to the search memory is: 8 I 8 24 1 24 8 I 8 TOTAL Figure 6. 
The total computer real-time consumption necessary to supply criteria to the search memory therefore is 1.176N_s + 0.192N_s (in milliseconds).

Figure 7B shows the instruction list for handling responses from the search memory. The total instruction execution time required to handle responses from the search memory is 216 + 16I μsec. This is for a single search and a single response. The total execution time for N_s searches per updating and m responses per search becomes:
0.216[(m + 1)/2]N_s + 0.016ImN_s (in milliseconds),

where

[(m + 1)/2] = the greatest integer in (m + 1)/2,
m = number of responses per search, and
I = number of instructions per response to perform some system function based on the response.

Figure 7. Peripheral Search Memory Search Routine (coding form: supplying criteria, Figure 7A, total execution time 1176 μsec; handling responses, Figure 7B, total execution time 216 + 16I μsec).

Again, every word that is transferred in or out of the USQ-20 by a standard I/O channel requires 16-μsec memory-access time that otherwise could be used for arithmetic access time. Therefore, the total memory access time to transfer responses from the search memory is

16 μsec per response word X [(m + 1)/2] response words per search X N_s searches,

or 0.016[(m + 1)/2]N_s (in milliseconds).

The total real-time consumption necessary to handle responses from the search memory therefore is 0.216[(m + 1)/2]N_s + 0.016ImN_s + 0.016[(m + 1)/2]N_s.

The final equation that expresses the complete search solution time for this system configuration can be found by combining the double-underlined expressions that describe each of the three basic search steps:

T_PSD = 0.764 + 0.064N_I + 1.368N_s + 0.232[(m + 1)/2]N_s + 0.016ImN_s,   (5)

where

T = search solution time expressed in milliseconds,
PSD (subscript) =
system configuration (peripheral search memory, dynamic),
N_I = number of items,
N_s = number of typical mixed searches per search memory updating, and
I = undetermined (depends upon application) number of average (16-μsec) instructions required to perform some system function that is based on each response address.

Equation 5 is plotted as Curve 2D in Figures 11 through 13. However, the last term (0.016ImN_s) of the equation is disregarded when plotted for reasons that are explained in Section V of this paper.

b. Static Case

In some applications of the search memory, it is recognized that the search memory data could be static. Under this assumption, the equation for search solution time is similar to Equation 5 with the updating terms removed. The equation then becomes

T_PSS = 1.368N_s + 0.232[(m + 1)/2]N_s + 0.016ImN_s,   (6)

where

PSS (subscript) = system configuration (peripheral search memory, static).

The first two terms of this equation are plotted as Curve 2S in Figures 11 through 13.

5. INTEGRATED SEARCH MEMORY

a. Dynamic Case

An integrated search memory has been proposed and is described in Section III above. The typical problem will be programmed using this system configuration. The solution for the dynamic case requires periodic search-memory updating, since the parameters describing each item are assumed to be variable with time.
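Equations 5 and 6 can be evaluated directly from their terms. The sketch below is ours (function and argument names are not from the paper); the application-dependent I-term defaults to zero, as in the plotted curves:

```python
def t_psd(n_items, n_searches, m=1, i_instr=0):
    """Equation 5: search solution time (msec), peripheral search
    memory, dynamic case."""
    half = (m + 1) // 2  # greatest integer in (m + 1)/2
    return (0.764 + 0.064 * n_items              # updating (Table II)
            + 1.368 * n_searches                 # supplying criteria
            + 0.232 * half * n_searches          # handling responses
            + 0.016 * i_instr * m * n_searches)  # per-response system function

def t_pss(n_searches, m=1, i_instr=0):
    """Equation 6: the static case, i.e. Equation 5 with the updating
    terms (0.764 + 0.064*N_I) removed."""
    half = (m + 1) // 2
    return (1.368 + 0.232 * half + 0.016 * i_instr * m) * n_searches
```

For the conditions used in the Section V comparison (N_I = 512, m = 1), t_psd(512, 1000) gives about 1633.5 msec.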
The single parameter table block-transfer method of updating will be used, since this appears to be the fastest method of updating the search memory and also since this is the same method used to update the peripheral search memory above.

An equation describing search problem solution time will be developed by actually writing an abbreviated symbolic program for each of the three basic steps of the over-all search problem: (1) updating, (2) supplying criteria, and (3) handling responses.

The generalized functional flow diagram for this configuration is given in Figure 8. The broken-line blocks are not considered part of the search problem; the solid blocks contain the search functions that are executed from the main program. The remainder of the search functions are handled by internal interrupt routines that begin with the updating routine shown in Figure 9.

Figure 8. Integrated Search Memory Search Flow (functional flow diagram; 256 times through the loop, n = 0 through 255).

Updating time is determined by adding the time to perform the updating functions of Blocks F and G in Figure 8 to the total instruction execution time required to perform all the instructions found in Figure 9. In addition, every word that is transferred into or out of the USQ-20 computer via a standard I/O channel requires 16-μsec memory-access time that otherwise could be used for arithmetic access time. The total updating time is shown in Table III.

TABLE III-UPDATING TIME, INTEGRATED SEARCH MEMORY

Function                               Time (msec)
Blocks F and G (Figure 8)              0.048
Updating routine (Figure 9)            0.708
I/O output access (0.016 X 4 X N_I)    0.064N_I
Total                                  0.756 + 0.064N_I

The instructions necessary to perform the remaining portions of the over-all search problem, those of supplying criteria to the search memory and handling responses from the search memory, are found in Figure 10. The total time is (481 + 24m + 16Im) microseconds. This, of course, is for a single search.
The total execution time for N_s searches per updating, therefore, becomes:

0.481N_s + 0.024mN_s + 0.016ImN_s (in milliseconds),

where

m = number of responses per search, and
I = number of instructions to perform some system function based on each response.

The final equation that expresses the complete search solution time for this system configuration can be found by combining the foregoing double-underlined expressions. The equation becomes

T_ISD = 0.756 + 0.064N_I + 0.481N_s + 0.024mN_s + 0.016ImN_s,   (7)

where

T = search solution time in milliseconds,
ISD (subscript) = system configuration (integrated search memory, dynamic),
N_I = number of items,
N_s = number of typical mixed searches per search memory updating,
m = number of responses per search, and
I = undetermined (depends upon application) number of average (16-μsec) instructions required to perform some system function that is based on each response address.

Figure 9. Integrated Search Memory Updating Routine (coding form; total instruction execution time 708 μsec).
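Equation 7, and its static companion obtained by dropping the two updating terms, can likewise be checked mechanically; a Python sketch with our own names:

```python
def t_isd(n_items, n_searches, m=1, i_instr=0):
    """Equation 7: search solution time (msec), integrated search
    memory, dynamic case."""
    return (0.756 + 0.064 * n_items              # updating (Table III)
            + (0.481                             # search routine (Figure 10)
               + 0.024 * m                       # response handling
               + 0.016 * i_instr * m) * n_searches)

def t_iss(n_searches, m=1, i_instr=0):
    """Static case: Equation 7 with the updating terms removed."""
    return (0.481 + 0.024 * m + 0.016 * i_instr * m) * n_searches
```

The per-search coefficients are just the (481 + 24m + 16Im)-μsec single-search total expressed in milliseconds; t_isd(512, 1000) gives about 538.5 msec.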
Equation 7 is plotted as Curve 3D in Figures 11 through 13. However, the last term (0.016ImN_s) is disregarded when plotting for reasons that are explained in Section V.

b. Static Case

In some applications of the search memory, it is recognized that the search memory data can be static. Under this assumption, the equation for search solution time is similar to Equation 7 with the updating terms removed. The equation then becomes

T_ISS = 0.481N_s + 0.024mN_s + 0.016ImN_s,   (8)

where

ISS (subscript) = system configuration (integrated search memory, static).

The first two terms of Equation 8 are plotted as Curve 3S in Figures 11 through 13.

SECTION V-SEARCH TIME COMPARISONS AND CONCLUSIONS

A search time analysis was performed in Section IV for each of three search system configurations. For each system configuration several equations were derived. The dependent variable in all equations is search time, T. All equations contain the independent variable N_s, which is the number of searches per search memory updating, and m, which is the average number of responses per search. Most equations contain a third independent variable, N_I, which is the number of items carried in the system.

T versus N_s is plotted for each system configuration on each of the three graphs (Figures 11, 12, and 13). Figure 11 treats N_I as a constant 256 items; Figure 12 treats N_I as a constant 512 items; Figure 13 treats N_I as a constant 1024 items. In all cases, m is assumed equal to one.

The search time equations that describe all the curves are derived in Section IV. Note that one term, 0.016ImN_s, is common to all the equations for all system configurations. This term describes the time required to perform some system function, based on the search response address, that must be performed in the USQ-20 computer in all cases. This function is not considered part of the search but only related to the search in that it is based on the results of the search.
This term is, therefore, disregarded in the plots of the equations.

Equations 1, 2, 3, and 4 describe the search time when only the USQ-20 computer is used.
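Before turning to the curves, the two search-memory configurations can be compared numerically from the equations alone. The sketch below is ours; it evaluates Equations 5 and 7 at the representative conditions used later in this section (N_I = 512 items, m = 1, N_s = 1000 searches per updating, with the common I-term disregarded as in the plots):

```python
NI, NS = 512, 1000
t_2d = 0.764 + 0.064 * NI + 1.368 * NS + 0.232 * NS   # Equation 5, m = 1
t_3d = 0.756 + 0.064 * NI + 0.481 * NS + 0.024 * NS   # Equation 7, m = 1
# Curve 2D versus Curve 3D: the integrated memory is about 3 times
# faster, consistent with the curve-read factors of 38 and 116 quoted
# in the comparisons below (116/38 is about 3.05).
print(round(t_2d, 3), round(t_3d, 3), round(t_2d / t_3d, 2))
```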
Figure 10. Integrated Search Memory Search Routine (coding form; total execution time for performing the search: 481 + 24m + 16Im μsec; the starred entries are not USQ-20 instructions but system function statements).

Since the USQ-20 computer must perform the search sequentially, the resulting search time is highly sensitive to item distribution (within the space constraints) for any given solution sequence. Equation 1 assumes the typical distribution found in Figure 3. Equation 2 assumes best-case distribution for the solution sequence used; Equation 3 assumes worst-case distribution for the solution sequence used; and Equation 4 is the calculated mean between Equations 2 and 3. Since the item distribution changes in a dynamic situation, the mean distribution is the most meaningful. Therefore, Equation 4 was chosen to be plotted (as Curve 1 in Figures 11 through 13) to display the problem solution resulting from use of only the USQ-20 computer. Updating is not included in the search time when the USQ-20 computer only is used, since it is assumed that updating must occur to meet other system requirements.

Curves 2 and 3 display the problem solution time of the peripheral search memory and the proposed integrated search memory, respectively. Curves 2D and 3D handle the dynamic problem situation where the search memory must be updated in addition to the conventional memory. Curves 2S and 3S represent a static problem situation where updating is not required.

Figure 11. Search Time (T) versus Number of Searches per Updating (N_s), 256-Item Case, m = 1.
Figure 12. Search Time (T) versus Number of Searches per Updating (N_s), 512-Item Case, m = 1.
Figure 13. Search Time (T) versus Number of Searches per Updating (N_s), 1024-Item Case, m = 1.
(Each figure plots Curve 1, the USQ-20 computer only solution with "mean" item distribution; Curves 2D and 2S, the peripheral search memory, dynamic and static cases; and Curves 3D and 3S, the proposed integrated search memory, dynamic and static cases.)

The curves displayed in Figures 11 through 13 allow unlimited performance comparisons to be made between the three system configurations. The conditions, however, must be specifically stated. Two representative comparisons will be made here (see Figure 12). The conditions are as follows: N_I = 512, m = 1, N_s = 1000. Results are:

1. Peripheral search memory (Curve 2D) provides a performance increase by a factor of 38 over the conventional computer only (Curve 1) solution.

2. Integrated search memory (Curve 3D) provides a performance increase by a factor of 116 over the conventional computer only (Curve 1) solution.

Note that the curves show only system search time of the three system configurations. The proposed integrated search memory has an additional advantage over the peripheral search memory: less complex programs. Comparison of the lists of instructions for the two system configurations (given in Figures 7 and 10) will indicate that the proposed integrated search memory approach requires fewer instructions and allows more familiar and direct programming techniques than the peripheral search memory.

An important point to remember is the fact that the updating terms of the search time equations, for the system configurations that involve the search memory, need be considered only if the search memory must be updated in addition to the conventional core memory of the USQ-20 computer. If only one of the two memories requires updating, then the updating terms may be ignored and, therefore, only the straight-line plots of Figures 11 through 13 need be considered.

Another consideration is the fact that the feasibility of transferring large amounts of data over the standard USQ-20 I/O channels for one purpose (search memory in this case) is unpredictable unless the word-rate requirements of all the I/O channels are known and considered. In a complex system, there is the possibility that the I/O is already heavily loaded and that the peripheral search memory I/O requirements could result in overload of the I/O. The proposed integrated search memory, on the other hand, does not have to depend on standard I/O channel data transfer, and so this unpredictable I/O situation can be avoided.

Related to this consideration is the definition of solution time. The solution times analyzed and displayed in this paper are based on computer memory access time. In an approach where standard I/O channel transfers are not involved, computer memory access time is identical with real or actual solution time. Where standard I/O transfer time is involved, memory access times are not adjacent, and the actual solution time is greater than the solution time based on computer memory access time alone. This then points to another advantage of the proposed integrated search memory approach.

A BIT-ACCESS COMPUTER IN A COMMUNICATION SYSTEM

Dr. Edmund U. Cohler and Dr. Harvey Rubinstein
Sylvania Electronic Systems
A Division of Sylvania Electric Products, Inc.
APPLIED RESEARCH LABORATORY
40 Sylvan Road
Waltham, Massachusetts, 02154

1.0 INTRODUCTION

Systems having computers and communications subsystems are increasing in number.
Another consideration is the fact that the feasibility of transferring large amounts of data over the standard USQ-20 I/O channels for one purpose (search memory in this case) is unpredictable unless the word rate requirements of all the I/O channels are known and considered. In a complex system, there is the possibility that the I/O is already heavily loaded and that the peripheral search memory I/O requirements could result in overload of the I/O. The proposed integrated search memory, on the other hand, does not have to depend on standard I/O channel data transfer, and so this unpredictable I/O situation can be avoided. Related to this consideration is the definition of solution time. The solution times analyzed and displayed in this paper are based on computer memory access time. In an approach where standard I/O channel transfers are not involved, computer memory access time is identical with real or actual solution time. Where standard I/O transfer time is involved, memory access times are not adjacent and the actual solution time is greater than the solution time based on computer memory access time alone. This then points to another advantage of the proposed integrated search memory approach. LIST OF REFERENCES 1. GER-10857: Collection of Technical Notes on Associative M emory. Akron, Ohio, Goodyear Aerospace Corporation, 9 October 1962. 2. HORVATH, R.: Integrating the Search Memory with the USQ-20 Computer. GER11621. Akron, Ohio, Goodyear Aerospace Corporation, 4 June 1963. A BIT-ACCESS COMPUTER IN A COMMUNICATION SYSTEM Dr. Edmund U. Cohler and Dr. Harvey Rubinstein Sylvania Electronic Systems A Division of Sylvan:ia Electric Products, Inc. APPLIED RESEARCH LABORATORY 40 Sylvan Road Waltham, Massachusetts, 02154 1.0 INTRODUCTION Systems having computers and communications subsystems are increasing in number. 
The application of such systems span such diverse fields as process control, message switching, command and control, and multi-user online computer installations. In these systems, a significant portion of the information processed is brought to and sent from the computer on a large number of communication lines, carrying peak bit rates generally from 75 bps to 4800 bps. Often, failure of a portion of the system to provide services can entail serious consequences to the system users. Thus severe relability standards are placed on the system hardware. Many of these systems must be capable of providing service to a range in the number of users and must be able to grow as the system finds more users. Thus, one finds the need for modularity to meet these demands. Finally, as these systems are used, they must be capable of change so that they can be adapted to the ever changing and wide variety of requirements, problems, formats, codes and other characteristics of their users. As a result, general-purpose stored program computers should be used wherever possible. two computers (full redundancy) to obtain the required reliability and availability. One computer stood by while the other processed data on line. When it failed, the computers were interchanged. To handle the incoming data, many of these past systems were designed with costly complex fixed programmed bit and character buffers, and message assemblers. The buffers operated in such a way that a failure in one of them could prevent usage of a number of transmission lines. As a rule, the fixed programs wired into these units did not permit rapid changes of the characteristics of the line it handled. In this report, a design for a low-cost multiprocessor system is described which alleviates these past deficiencies. This system performs the store-and-forward operations of a message switch. A unique design of the input and output interface is central to meeting these objectives, and is the primary topic of this paper. 
1.1 Operational Objectives of the Design The basic objectives of the work described were to design a message switch which provides: A. Improved operational reliability, B. Greater economy, both in the initial installation and in operation, and Past approaches toward meeting these operating requirements have been made by utilizing 175 176 c. PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964 Greater adaptability to a wide variety of communication environments and procedures, initially and during operation. These goals were achieved through the design features which are summarized below: 1. The design is made modular through the use of equipment pools. These pools of processing and storage equipment lead to a very high order of reliability with low initial and maintenance costs. In particular, a method has been evolved for using a number of small digital computers to provide message switching functions with a large amount of flexibility and modularity in operation and in installation. 2. A unique method of interfacing lines with processors has been invented which decreases buffering costs, failure interaction, and bit processing equipment requirements. The method is a combination of hardware, logic and software which ideally suits the problem at hand. It makes possible to use general-purpose processors in transferring and processing messages and thus provides great adaptability to changing environments and requirements. The direct access to memory of information lines provided by this technique allows much greater equipment efficiency in handling incoming and outgoing information in a class of multiuser computer systems extending well beyond message processing. The message processing example, however, shows sufficient details to support the claims of better efficiency. 3. An efficient method for the orderly storage and retrieval of messages in a modular drum (or other medium-access-time) storage system has been evolved. 
This method reduces the cost of core storage by allowing frequent drum accesses and reduces the cost of the drum system by making efficient use of the storage required by the drum. 1.2 Achieving Operational Reliability To obtain the reliability offered by redundancy without the impoverishing costs of du- plexing, we have turned to the use of modular equipment pools. As an example, the processor pool serves incoming and outgoing lines. Each of these lines has a usable service connection to three separate processors. Thus, if a single processor fails completeley, its lines can be serviced by other processors that are not completely occupied. Because there are four processors in the pool, we need only 25 percent overcapacity (redundancy) in each processor to assure no loss of system capacity on a single processor failure. Many computer-centered systems require 100 percent redundancy (complete duplexing to achieve the same result. Other such systems do not use the pool concept so that despite similar degrees of modularity, a loss of a single module causes complete loss of service to a group of lines until it is either repaired or replaced. A similar pooled approach has been taken in the message storage area where any of three selection units can give a processor access to the message storage drum modules. The modularity of this design also improves maintenance times by reducing the time required to isolate faults and by simplifying the training of maintenance personnel. Shorter times to find a difficulty and correct it result in greater system reliability. An important requirement in pool operation is that failures on one side of an interface do not cause failures on the other. The magnetically coupled interfaces used in the direct-access-to-memory avoid this difficulty. The magnetic coupling is sufficiently loose that the failure of an active component cannot affect other equipments across the interface. 
1.3 Achieving Greater Economy We have achieved economy in this system primarily by the invention of a new type of line access to a processor which makes the processor more than five times more efficient in the acceptance and assembly of bits from a serial line. In all past such switching systems either external equipment was assigned to the task of bit assembly or it was done in the computer at great cost in number of memory cycles per bit. By making possible a single instruction time for the handling of a single bit, we have 177 A BIT-ACCESS COMPUTER IN A COMMUNICATION SYSTEM REFERENCE FILE been able both to eliminate external equipments and to make efficient usage of our computer in handling the bit transfers and assemblies as well as the more complex but less frequent jobs of switching. Furthermore, the modularity of the processing pool allows us to choose the switch capacity to suit traffic conditions and numbers of lines in various installations thus minimizing the required equipment. In addition, the pooled approach results in much less redundancy so that we can expect an almost two-to-one cost reduction over other duplexed systems even at their optimum capacity. B ;APES ~ 3 ~ JOURNAL, OVERflOW, INTERCEPT, TAPES CROSSPOINT MESSAGE IlWM POOL Iio---~ I I I I;; I IOI~ I ~ I I I I:: I ~ IOI~ I I I I SERVICE SECTION 1.4 Achieving Greater Operation Adaptability The advantage of using a programmed processor for flexibility and adaptability to meet new requirements and situations over the older techniques of wired-in operations is now well recognized in the industry. Error correction and detection schemes may be implemented. Very sophisticated priority disciplines can be adopted on a moment's notice to suit the situation at hand. Changes in codes, formats and routing indications can be handled. 
With pooled design, we can re-assign lines not only under failure conditions but under conditions of changing traffic patterns, because line assignments are made electronically and each line can be assigned to any of three separate processors.

In talking about the adaptability to change, we should also speak of protection against unwarranted change. We observe that this system, being primarily under programmed control, permits protection from tampering by making initial program entry possible only from protected devices, while subsequent modification through the console or other external devices would be solely under the control of some internal program.

2.0 Description of the System

2.1 Description of Switch Operation

The block diagram of Figure 1 shows the equipment pools and their interconnections.

Figure 1. 200-Line Message Switch Major Subsystems Diagram (Maximum Lines).

The most important pools in the normal on-line operation of the switch are the processor pool, the message drum pool and the processing drum pool. The tape and console pools play a subsidiary job as they are only partially utilized in the routine switch operation.

Briefly describing the input processing, the incoming bits for each line are sent to three processors in the processing pool. A supervisory program has previously assigned each line to one of these three processors. As the bits arrive, the processor assembles them into characters and checks the characters for special system coordination and control information. Included in these control information groups are the routing indicators which identify the message destination and precedence characters which indicate the priority of the message. When these arrive, an access is made to the processing drum pool by the processor to translate these groups into outgoing line numbers.
When the outgoing line numbers and the precedence of the message are known to the processor and the message has fully arrived, an entry is made into a table (queue list) to alert the outgoing line that a message is a waiting transmission. In addition, the processor enters somewhat different information onto a ledger, which maintains an account of the message status; i.e., those lines on which the message is to be transmitted, those on which it has been transmitted and those which have acknowledged the transmission. Simultaneously, the processor trans- 178 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964 fers the incoming message onto the 'in-transit" message-drum pool and onto the reference-tape pool. This it does in fixed-size bins. The initial entries into a journal are also made in the course of the input message processing. The journal is a chronological listing of the actions taken on a message while it is in the switch. In output processing when a processor finds that one of its outgoing lines is no longer busy (or at fixed intervals after a nonbusy condition) , it refers to the queue lists, which are identified by line and precedence, to obtain the message-drum address of the next message to be transmitted on that line. It then makes arrangements to retrieve that message from the drum and to send out the characters one bit at a time. In the meanwhile recordings of these actions are made upon the journal tape. When the message is completely transmitted, additional entries are made in the ledger to indicate transmission. When all transmissions of a message have been made the in-transit message and the ledger are erased. 2.2 The Processor Pool Functions The processor pool accepts the bits from a line, assembles them into characters, disassembles them and sends them to a line. Secondly, it examines the incoming characters and performs a variety of routing, queuing and surveillance functions based on these. 
Thirdly, it stores groups of characters in its core memory for buffering other storage pools (primarily the message drum pool). Finally, during slack time and routinely, the processor evaluates switch operation and traffic for maintenance and adaptation purposes. It is evident that a general-purpose processor can handle all of these functions, and if it is time-shared, efficient hardware usage can be achieved along with flexible operation. In Section 3.1, it is demonstrated that bit and character processing dominate the other processes in computer usage, thus making the interface techniques important in improving processor efficiency. Because this paper is primarily on the interface technique which we have used between the communication lines and the processing pool, our discussion will center on this pool.

2.3 Message Drum Pool Functions

The major message drum pool function is the storage of messages to accommodate line availability/demand variations. This variation shows up as messages stored in "in-transit" storage awaiting lines to become free for transmission. It is clear that this function requires orderly and efficient storage of messages. Orderly storage of information on the message drum and efficient transfer to and from the processing pool have been achieved by the use of list processing techniques. Because of properly chosen accessing procedures, all bins of information in the processor memory are of the same size, so that the information may be stored on the drum in fixed-size bins and successive bins chained to previous ones. A complete empties list keeps track of all available storage space remaining on the drum, thus permitting very efficient storage filling. Because all messages are stored in fixed-size bins, the problem of cross-office speed conversion is automatically solved. A full discussion of the message drum pool techniques is beyond the scope of this paper.
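The fixed-size-bin storage with a chained empties (free) list can be sketched as follows. This is our own illustrative reconstruction, assuming a bin size of 16 words, not the authors' implementation.

```python
# Drum modeled as fixed-size bins; an "empties list" chains all free bins,
# and each stored message is a chain of bins linked in arrival order.

BIN_WORDS = 16                      # assumed bin size, for illustration

class Drum:
    def __init__(self, n_bins):
        self.data = [None] * n_bins
        self.next = [i + 1 for i in range(n_bins)]  # chain pointers
        self.next[-1] = -1
        self.empties = 0            # head of the free-bin chain

    def store(self, words):
        """Chop `words` into bins, chain them, return the head bin index."""
        head = prev = -1
        for i in range(0, len(words), BIN_WORDS):
            b, self.empties = self.empties, self.next[self.empties]
            self.data[b], self.next[b] = words[i:i + BIN_WORDS], -1
            if prev < 0:
                head = b
            else:
                self.next[prev] = b
            prev = b
        return head

    def retrieve(self, head):
        words = []
        while head != -1:
            words += self.data[head]
            head = self.next[head]
        return words

d = Drum(8)
h = d.store(list(range(40)))        # 40 words -> 3 chained bins
assert d.retrieve(h) == list(range(40))
```

Because every bin is the same size, a fast line can fill bins at one rate while a slow line drains them at another, which is the cross-office speed conversion the text mentions.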
2.4 Processing Drum Pool Function

The processing drum pool stores the lists and registers used in the processing job by the processor pool. Its lists, as a matter of fact, are used in common by all of the processors of the processing pool to provide a record of: where the message will be kept on the message drum pool, on what lines the message is to be transmitted, in what sequence the message is to be transmitted, to what lines the message has been sent, which message to transmit next, and where the message is located; and to update the ledger entry of lines to which the message has been sent. Even though normal operation of this pool is independent of the message drum pool, it makes use of the same drums for storage. This is possible and efficient because both storage capacities are determined by the maximum probable queue build-up.

2.5 Other Equipment Groups

The other equipment groups within the switch are not as central to the normal operation of the switch, and will not be covered in this paper.

3.0 SYSTEM DESIGN

3.1 The Processor Pool

3.1.1 Processing Jobs

Messages on various lines and trunks may differ in code, bit rate, and message format, but in each case the message consists of a header, text and ending. The header includes routing and message priority. Some messages are divided into 80-character blocks for transmission and reception purposes. In a message store and forward system, processing of two types is encountered. The first type centers about the acceptance, storage and transmission of messages, and the second type about the control of the switching system. Tables I, II and III indicate typical message processing functions. Routine functions are classified in these tables as either of a bit, character, block or message type. System control functions keep the switching system performing effectively by supplying the switching center supervisor with data useful in the management of the store and forward center and network, and as an aid in maintaining and testing the switch, its programs, and data base. They are generally not performed regularly or very frequently, and are given in Table IV.

3.1.2 Discussion of Store and Forward Switch Functions

An examination of the list of functions given in Tables I-IV shows that the store and forward switch functions fall into four classes: data formatting, system operation, signal acceptance and transmission, and recording (storage) operations. In order to obtain an ordering of the importance of these functions to message switching, it is desirable to classify them in the order of their frequency of occurrence. The acceptance and forwarding of bits are the most frequently occurring functions. They occur at the incoming and outgoing bit rates. The next most frequently occurring functions are those associated with each and every character which enters or leaves the system, for example, the control character check. Most functions are not performed on each character. Header validations and entries in queue lists, for example, are performed on entire messages independent of the number of characters they contain.
The character functions occur 1/b times as frequently as the bit functions, where b is the number of bits per character and averages almost 7 bits per character. The next set of operations, in order of descending frequency of occurrence, are those which occur for each block. They occur 1/cb as often as the bit functions. Here, c is the number of characters per block and is about 80 characters. The remaining functions occur once per message or so. If there are m characters per message, the message functions occur 1/mb as often as the bit functions, where m is approximately 2000. Then each character, block or message function occurs respectively 8, 640, and 16,000 times as infrequently as a bit function.

TABLE I-INPUT PROCESSING-ROUTINE
(each function is classified as a bit, character, block or message function)
Accept bits and assemble characters
Check parity of characters
Detect system control characters
Assemble characters in words and bins
Write messages in "In-Transit Store"
Initiate preemption for flash messages
Verify header
Perform routing
Enter incoming message data in ledger
Check block parity
Write message on reference tape
Enter data in journal
Acknowledge accepted messages
Count blocks
Enter data in queue lists
Assign serial number for processing

TABLE II-OUTPUT PROCESSING-ROUTINE
(each function is classified as a bit, character, block or message function)
Use queue lists to initiate message transmission
Make journal entries
Update ledger
Retrieve messages from "In-Transit Store"
Convert formats
Convert codes
Construct block parity
Check security for each block
Remove messages from storage which have been transmitted
Convert messages to a bit stream

The most frequent functions, then, are the bit functions. When a bit arrives, it must be stored in the proper place in a character, used to update the character parity count, and counted to establish the arrival of a full character. When a full character is received, it is transferred to another location in memory for further processing. A similar per-line process occurs in reverse order when information is disassembled for forwarding. These bit functions are performed on each line at a rate determined by the information rate on the line. In designing an equipment to perform this function, various line rates (from 75 bits per second to 4800 bps) and various bits per character (from 5 to 8, depending on the code) must be considered.

The character operations are somewhat more complex. Typically, characters must be examined to determine if they are system control or coordination characters, and if their parities are correct. When transmitting, the character codes may require conversion and the block parity must be determined. The characters also must be counted to determine block length.

Two characteristics of these bit and character functions should be noted. The first is that there exists a variety of functions, and the second that these same functions are performed on all lines. The latter implies time sharing of equipment, while the former implies a reasonably complex assortment of equipment.

TABLE III-MESSAGE PROCESSING-NON-ROUTINE
Control errors.
Print or display messages.
Initiate service messages.
Manually retrieve message.

TABLE IV-SYSTEM CONTROL PROCESSING TASKS
Maintain in-transit storage status.
Maintain status of traffic.
Activate overdue message alarms.
Activate queue threshold alarms.
Execute maintenance routines.
Control program maintenance routines.
Activate and control start-up.
Activate and control recovery.
Check confidence levels on equipments and links.
Provide line synchronization.
Allocate hardware.
Accept supervisor initiated commands.
Provide statistical analyses of traffic.
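The relative frequencies quoted above can be checked with a few lines of arithmetic. Taking b = 8 bits per character (the paper's average is "almost 7," but 8 reproduces its stated ratios exactly), c = 80 and m = 2000:

```python
# Relative frequency of character, block and message functions versus
# bit functions, using the paper's parameters (b taken as 8 so that the
# quoted ratios 8, 640, 16,000 come out exactly).
b = 8        # bits per character
c = 80       # characters per block
m = 2000     # characters per message

assert b == 8              # one character function per b bit functions
assert c * b == 640        # one block function per c*b bit functions
assert m * b == 16_000     # one message function per m*b bit functions
```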
3.1.3 The Processor Interface With Input and Output Lines

Because the most frequent operations in the message switch are the acceptance and the delivery of bits of information, earlier switch designs used special equipment to accept bits from a line and assemble them into characters for use by the processor. That approach involved considerable equipment which was peculiar to a particular line type. Recent designs have made this equipment sufficiently flexible (generally by pluggable programs) to be suitable for a wide variety of such lines. However, in so doing, any efficiency derived from special-purpose equipment was lost. Furthermore, an efficient method of providing alternate capability (redundancy) in case of equipment failure was not provided. Seeking economy, redundancy, flexibility and simplicity in handling bits, we use a general-purpose machine, taking advantage of its high speed to perform service for a number of lines. A direct and inexpensive interface is made to each line. Each line interfaces with a number of processors so that, in case of failure, assignments can be made electronically for other processors to take over the lines formerly served by the inoperative processor. The communication lines interface directly with computer memory cores in our design. A single instruction (one memory cycle in length): (1) accepts a bit from the communication lines; (2) puts it in the proper bit location in a memory word which is employed as a character buffer for that particular line; (3) checks to see if a complete character has yet arrived; and (4) computes the parity bit for the character. The operation is accomplished almost entirely with existing equipment in the main computer memory. Additionally, it provides alternate servers for each line with sufficient decoupling to assure that no failure on one side of the interface can cause a failure on the other side.
Both the method of entry and the handling of the bits within the machine will be described in what follows.

3.1.3.1 Bit Handling By A New Instruction

The lines coming into the computer are actually wired into the memory of the computer, as described in the next section. Each incoming communication line will be accompanied by a synchronizing line which specifies timing. Each of these wires is wired into a memory core at a location which is permanently reserved for that communication input. A program will cause the line termination memory locations to be scanned at times specified by interrupts from a real-time clock and will then put the received data bits into the proper position in that word. When the word is full, it will be transferred to another location and character processing will begin. A new type of instruction in the processor puts the bit into the proper place in the memory location for the line, checks to see if the location is full and computes the character parity. The entire instruction takes just one memory cycle of the computer. One additional memory cycle must be used to determine the line to be scanned next. This latter instruction is just an unconditional branch instruction whose address portion is determined when scan lists are made up. The new type of instruction is externally determined, which means that its effect is not determined at the time the program is written but rather is determined by subsequent input. This is not quite the same as a branch or skip instruction, which merely constitutes a choice of where the next instruction is taken based on post-programming inputs. With the direct interface it allows inexpensive, appropriate control of a processor by a number of external users. For the purpose of accepting bits, the instruction's nature is determined by an incoming synch signal and by a marker bit which marks the end of a character.
However, the instruction is programmed in the normal manner as part of a subroutine which performs line scanning. The instruction format is as shown in Figure 2. The instruction code part of the instruction word contains a partial code and two externally set bits. The address field part of the instruction word contains the operand for the instruction.

Figure 2. Instruction Formats for Bit Processing. (The figure tabulates instruction codes against their effects: check synch and marker; shift bits 1-16 left, clear bit 17 and restore, positioning the data bit, and update parity; no operation; branch to X and mark place, which starts character processing.)

When the scan program causes this instruction to be read out from the memory, the operation which is executed will then depend upon the two indeterminate bits. For the time being, let us assume that the second indeterminate bit is a zero. The first indeterminate bit is the synch bit, which will be a one if a new data pulse has come in since the line was last scanned. The full instruction code is then 010110, which (see Figure 2) entails a shift operation upon part of the instruction word. The data bit, which was in the least significant bit of the word, is shifted left one position and thereby entered into the partly assembled character. Simultaneously, the character parity is updated. In addition, the synchronization bit is cleared and the entire word restored in the same memory location. If, when we had read the line word out, no synch bit had come in, then the instruction (010100) would be interpreted as a no-op and the word restored without modification. The use of the marker bit is fairly simple.
When we arrive at the end of a character, we would like to know about it so that the entire character can be moved to another location in core memory, thence to enter on character processing. To do this, the program sets a "1" into a particular bit of the input line instruction in the normal address field. Since the address field is initially all zeros, the marker bit will be the first one to show up in the second indeterminate bit of the instruction code (due to a succession of shifts). Thus, whenever a "1" appears in this position, it indicates that a complete character has been received, and the instruction becomes a branch instruction which branches to a sub-routine to take care of transferring the character. Two different branches are indicated here because the asynchronous nature of the system may allow a data bit to come in after the character was filled. On the other hand, there may be no new data bit that has come in. In one case, a single bit must be preserved in the memory location and in the other case, it need not be. While this is one type of externally determined instruction, there are others possible; in fact, the line equipment used in making the asynchronous to synchronous conversion can be embodied in an additional indeterminate synchronous bit in the instruction code. The full power of such an instruction, particularly in message switching and command and control, has not yet been realized.

3.1.3.2 Description of Interface Electronics

The basic system of entry into the memory is illustrated in Figure 3. Each incoming line is wired into a core which is effectively part of the main memory. These cores are all in the same bit position in the memory. The information coming into the core is written into each core on the basis of a coincidence of input current and a write current from the computer, the latter being supplied on every memory write cycle (by the y-drivers in Figure 3).
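The shift-register behaviour of this externally determined instruction, including the marker-bit trick, can be imitated in software. The sketch below is our reconstruction in Python, not the machine's actual logic: the line word starts as the marker bit alone; each synch-qualified data bit shifts in from the least significant end while parity is updated; and a full character is signalled when the marker surfaces at a fixed test position.

```python
CHAR_BITS = 7  # assumed character length for this illustration

def make_line_word():
    """Initialize a line word: the marker '1' alone, parity zero."""
    return {"word": 1, "parity": 0}

def accept_bit(lw, synch, data_bit):
    """One scan of the line word. Returns the completed character, or None.

    With no synch, the word is restored unchanged (the no-op case). With
    synch, the data bit is shifted into the partly assembled character and
    the parity is updated; when the marker bit reaches position CHAR_BITS,
    a full character has arrived and character processing would begin."""
    if not synch:
        return None
    lw["word"] = (lw["word"] << 1) | data_bit
    lw["parity"] ^= data_bit
    if lw["word"] >> CHAR_BITS:          # marker has surfaced: branch
        char = lw["word"] & ((1 << CHAR_BITS) - 1)
        lw["word"], lw["parity"] = 1, 0  # reset for the next character
        return char
    return None

lw = make_line_word()
result = None
for bit in (1, 0, 1, 1, 0, 0, 1):        # seven incoming bits
    result = accept_bit(lw, synch=True, data_bit=bit)
assert result == 0b1011001
```

The sentinel marker removes any need to count bits per line: the count is implicit in the marker's position, exactly as in the hardware scheme.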
The information is read out of the core by the standard read cycle of the memory. Thus, once having been written in, a data bit is available for read-out with the rest of the word using the standard memory equipment already in the computer (with some exceptions to be covered later).

Figure 3. Processor-Input Line Interface. (The figure shows the incoming data and synch lines threading auxiliary memory cores, the half-write current supplied by the computer, and the sense outputs feeding the memory register; corresponding memory locations are reserved for a line instruction word.)

However, we recognize that a single bit could be written in and read out many times, since input pulses are much wider than memory cycles. Furthermore, a read-out of a zero data bit is ambiguous, since it may indicate no input rather than zero input. Thus, we must provide a synchronizing channel for each line, which will indicate when a data bit is available for reading. A similar input to another bit of the same word will be used for such synchronization purposes. However, the synchronization pulse must be timed to a single write pulse of the computer. This means there must be an asynchronous to synchronous converter, timed from the master timing source (which also times the computers), for each incoming line. While we have devised a method of employing a third channel to obviate the need for the conversion equipment, we will not discuss it here because the economic tradeoffs are not clear. Actually, the input lines are not wound through the cores which are normally located in the main memory stack. Instead, two additional very small memory planes (Figure 3, Auxiliary Memory Cores) are provided which are the storage locations for those particular bits of memory. These planes are wired in with the x, y and z lines of the main memory. A coincident-current memory provides two half-writes, x and y.
Either the x or the y write current will be provided to the external cores as a 1/2 write common to all auxiliary cores. Thus, coincidence with the external 1/2 write signal will write in a "1". A separate sense winding will be provided to prevent interference with the normal words of memory, and thus a separate sense amplifier will be provided for each of these two planes. The output will be logically added to the output of the normal sense amplifiers. Thus, all the input lines can be implemented with the addition of the auxiliary planes, two sense amplifiers, a few diodes and two gates. While each line has been described as threading a single memory, in actuality it would thread cores in three separate processor memories. Thus, any one of the processors coupled to this line could service the line if its programmed scan included the line. On-line program modification could take care of reassignment if it became necessary. Because the input line is magnetically coupled to the memory, no processor failure can disable it; i.e., it will still deliver its write current to the other processors with which it is associated. Furthermore, if a portion of the line equipment fails, it disables the particular line but in no way prevents the processors from servicing other lines. Thus, this magnetic coupling has the sort of ideal loose coupling described earlier.

3.1.4 Programming System

A master control program schedules operating programs and provides for hardware and line assignments. Consequently, it organizes routine and non-routine activities. The portion of the control program which refers to the routine functions resides in the core memory of each processor and has the facility to call the remaining program, or portions of it, from the processor drum memory to core storage when required. The operation of the control program is tied to interrupt signals from a real-time clock.
These signals occur as often as required for the processor to sample its incoming lines for signals and to supply information to its outgoing lines. The processor need not keep track of elapsed time. The operation of the control program and the operational programs for the functions proceeds as follows: the control program, on the basis of information describing the lines assigned to it, schedules groups of lines to be scanned at a time. When a real-time interrupt occurs, the program transfers control to an appropriate program for handling the line scanning functions. If during the line scan a full character is found to have entered the machine, the character will be entered into core memory with others in its message. If the character should be a system control character, appropriate action will be noted in a list kept for scheduling by the control program. When all the lines have been scanned, control will be returned to the control program. At this time, the control program decides what its next course of action will be through an examination of its scheduling list. It might examine the control characters to determine their significance. If one of these was a start of message, it would initiate a header verification and then have control returned to it for further action. As characters are accumulated in memory or transmitted from memory by the operating routines, they signal the control program to initiate a program which would bring more information from the drum for transmission or would have information taken from core memory for storage. Programs for the handling of changes in line assignments due to hardware failures will be either manually or automatically initiated. The manual initiation will occur from the supervisor's console. From this position, a computer will be selected and a message sent to this computer to initiate a program that would remove a computer from service and reassign its lines.
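The interrupt-driven control flow just described can be sketched as a simple loop. The structure below is a hypothetical modern rendering, with names of our own invention.

```python
# Sketch of the control program's routine cycle: a real-time interrupt
# triggers a scan of the lines in the current scan group; completed
# characters are appended to their messages, and system control
# characters are noted on a scheduling list for later attention.

CONTROL_CHARS = {"SOM", "EOM"}          # assumed control-character set

def on_clock_interrupt(scan_group, line_state, messages, schedule):
    for line in scan_group:
        # A completed character for this line, if the bit scan produced one.
        char = line_state[line].pop(0) if line_state[line] else None
        if char is None:
            continue
        messages.setdefault(line, []).append(char)
        if char in CONTROL_CHARS:
            schedule.append((line, char))  # control program examines later
    # Control returns to the control program, which consults `schedule`
    # to pick its next course of action (e.g., header verification on SOM).
    return schedule

line_state = {1: ["A"], 2: ["SOM"], 3: []}
sched = on_clock_interrupt([1, 2, 3], line_state, {}, [])
assert sched == [(2, "SOM")]
```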
This program will be retrieved from the processor drum, together with a list describing line assignments and characteristics. It will then determine another group of assignments based on an algorithm previously decided on, which is judged to minimize the overloading. Queue lengths would be available to this program if required. The change in line assignments requires no hardware change.

3.1.5 Processor Rate and Storage Requirements

The total number of memory cycles required for the processing job can be divided by the number of input bits to give a figure of merit which is independent of the capacity of the system. At times one sees such a figure expressed as instructions per throughput bit. However, the number of output bits exceeds the number of input bits in a system where multiple addresses are allowed. According to one estimate for a large system, the average output rate will be 75% higher than the average input rate. Thus, the proposed figure seems more natural and allows one to evaluate the required processor memory speed. Other system factors do influence the proposed figure of merit. For example, the average number of bits per character, characters per block and blocks per message will determine the number of instructions per bit used in character and message processing. These parameters have been chosen as discussed in Section 3.1.2. Trial programs of bit and character functions have been worked out to obtain the data on which to assess the processor requirements. In estimating program complexity, a prototype instruction code containing twenty instructions was used. The input bit processing takes two memory cycles per bit of direct input processing and approximately 0.5 memory cycle per bit attributable to the control program. The former load may be reduced by limiting the flexibility of the scan cycle, and in fact may be reduced to 1 memory cycle per bit for a fixed or nearly fixed scan. The output bit processing is similar in memory cycle usage.
Input character processing can be done in 36 memory cycles per character and the output processing in 28 memory cycles per character. To determine the full processor rate required for each incoming bit, the bit and character process rates must be augmented by the memory cycles required for the block and message functions. Our analysis of the drum transfer and other routine block and message functions indicates that 2500 instructions per message received and 50 instructions per block received is a generous allowance for these functions (i.e., 5000 and 100 memory cycles respectively). The total number of memory cycles per input bit is then conservatively fixed at

2.5 + 1.75 x 2.5 + 36/7 + (28/7) x 1.75 + 5000/(2000 x 7) + 100/(80 x 7) = 19.5 memory cycles.

This is equivalent to ten instructions, which is what most systems require for a single interrupt to process one input bit. The core memory associated with each processor is used to hold the currently executed programs (control programs, operating programs, and maintenance programs when required), the data base for the execution of these programs (code conversion tables, empties registers, queue entries by precedence for each line, address of next ledger entry, etc.) and the messages prior to storage on the drum. It is expected, on the basis of preliminary estimates, that the programming and its data would consume less than 2000 core memory words. The message buffer is required to hold about 70 words per line. If a double buffer scheme is used to prevent buffer overlay before transfer to in-transit storage, then for a computer handling 67 lines, about 10,000 words for message buffers are required per computer. The most important facets of the processor, the interface with the communication lines and the memory size and rate, have been discussed. The remaining features of the processors are of a conventional nature.

BIBLIOGRAPHY

1. HELMAN, D.
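The cycle budget (whose typeset grouping we have reconstructed from the surrounding figures) and the buffer sizing can be checked numerically:

```python
# Memory cycles per input bit, assembled from the figures in the text:
# 2.5 cycles per input bit, the same scaled by the 1.75x output rate,
# 36 and 28 cycles per 7-bit character in and out, and per-message /
# per-block allowances of 5000 and 100 cycles.
cycles = (2.5                  # input bit processing (2 + 0.5 control)
          + 1.75 * 2.5         # output bit processing, 75% higher rate
          + 36 / 7             # input character processing
          + (28 / 7) * 1.75    # output character processing
          + 5000 / (2000 * 7)  # message functions
          + 100 / (80 * 7))    # block functions
assert abs(cycles - 19.5) < 0.1

# Double-buffered message storage: ~70 words/line, 67 lines, two buffers.
buffer_words = 70 * 67 * 2
assert 9_000 < buffer_words < 10_000   # "about 10,000 words" per computer
```

At 2 memory cycles per instruction, 19.5 cycles is indeed close to the ten instructions per input bit quoted in the text.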
A., et aI, "VADE: A Versatile Automatic Data Exchange," IEEE Trans. Communications and Electronics, No. 68, p. 478, September 1963. 2. HARRISON, G., "Message Buffering In A Computer Switching Center," IEEE Trans. Communications and Electronics, No. 68, p. 532, September 1963. 3. POLLACK, M., "Message Route Control In A Large Teletype Network," J. ACM, Vol. 11, No.1, p. 104, January 1964. 185 4. GENETTA, T. L., GUERBER, H. P., RETTIG, A. S., "Automatic Store and Forward Mes-:sage Switching," WJCC, San Francisco, AFIPS, Vol. 17, pp. 365-369, May 1960. 5. HELMAN, D. A., BARRETT, E. E., HAYUM, R., WILLIAMS, F. 0., "Design of ITT 525 'VADE' Real-Time Processor," FJCC, AFIPS, Vol. 22, pp. 154-160, 1962. 6. SEGAL, R. J., GUERBER, H. P., "Four Advanced Computers-Key to Air Force Digital Data Communication System," EJCC, Washington D.C., pp. 264-278, December 1961. 7. WOLF, F. G., "Application of A Modular Data Processor to Store-and-Forward Message Switching Systems," Proceedings Ninth National Communications Symposium, pp. 198-207, 1963. 8. "Message Switching and Retrieval In A Real Time Data Processing System," Comma Catalyst of Progress, National Communications Symposium, Ninth, pp. 190-197, 1963. VERY HIGH SPEED SERIAL AND SERIAL-PARALLEL COMPUTERS HITAC 5020 AND S020E Kenro Murata and Kisaburo Nakazawa Hitachi Ltd. Tokyo, Japan 1. Introduction (2) Program Flexibility. The refined and flexible instruction system, in conjunction with a number -of multi-purpose registers to be used as accumulators, index registers, and many input-output control registers, gives powerful possibilities to the programming activities. (3) Simultaneity and Multiprogram Activity. The memory time sharing, the concurrent operation of various control unit, the automatic program interruption, the memory- protection, and the introduction of priority mode are prepared. 
HITAC 5020 family consists of general purpose computing systems designed to solve a wide variety of problems for both scientific and business data processing. HITAC 5020 system is a medium-scale junior version of this family, and would have the same performance characteristics as IBM 360/40 50. 1 A purely serial logic construction is the remarkable feature of this system. HITAC 5020E (5020 ENHANCED) system is a large scale senior version of this family, and would have as much performance as IBM 360/62 - 70. But this system is constructed in serio-parallel logic form for economical reasons. This paper reviews the engineering design of HITAC 5020 and 5020E systems with primary concentration on central processing unit. The central processing unit of this family is designed to rapidly and economically perform fixed or floating point arithmetic operations in either single or double precision, and even more to be able to process bit-wise variable length data.2, 3 It contains 18 MC, 2-phase serial transistor-diode logic circuits and helical transmission lines 4 for accumulators, index registers and other various registers. 2. Outline of HIT AC 5020 and 5020E System The 5020 system is organized along four basic lines, Main Memory, Arithmetic and Control Unit, I/O Channels and I/O Devices. Figures 1 and 2 show those 5020 and 5020E system configurations, respectively, and Figures 3 and 4 are pictures of the 5020 system. Our design goals of 5020 family are as follows: 2.1 The Main Core Memory The core storage of the 5020 has a capacity ranging from 8,192 words to 65,536 words (32 bits each), and is directly accessible from the (1) High Performance per Cost. We have realized this feature by introduction of helical transmission lines combined with the 18 MC serial logic system. 187 188 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964 Main Memory 16 Kw (2.Ops) 16 Kw 16 Kw 16 Kw 1 2+1 Ari thmetic end Control Uni t to other Computer to other Computer Figure 1. 
The System Configuration of HITAC 5020. central processor and input-output control channels, with read-write cycle of 2.0 sec per 32 bits. The memory addresses 0 - 31 are transmission line registers and so absent in the main core memory. The read-write control of the 5020 Arithmetic and Control Unit or Channels is rather simple one, not overlapped, and the access to the memory by the central processor is delayed only if any of the input-output channels or the central processor are attempting to access the storage at the same time. in bank 3; 8, 9 in bank 0; and so on). This increases independent operation efficiency. In other words, instructions, operands and I/O informations, etc. are referred through the respective exclusive memory refer controls (four in total as seen in Figure 2). With these features the effective storage speed is doubled, while the controls are referencing the separate banks, since more than two banks are simultaneously accessed. In the case of the 5020E, the core storage can contain anyone of 16K, 32K, 49K, or 65K words as a unit. Although the core storage cycle time is 1.5 microseconds, the effective increase of the access speed is realized through the completely independent access to each of the separate banks into which the whole core storage is effectively divided. (i.e., two banks in case of 16K words). Instructions, operands and I/O data, etc., are, through their respective controls, transmitted via the Core Memory Multiplexer which timeshares the data flow between the core storage and either the processor or the I/O channels. In addition, during one storage cycle, data of two word length (64 bits), or three or four word length (96 bits or 128 bits), in case of the operand, can be transmitted in parallel, thereby increasing the storage access efficiency. Two words (64 bits) in each bank of 16 KW capacity are referred at a time, so the effective memory speed is doubled. 
Moreover, each block, which is made up of four separate banks, is interleaved by address as shown in Figure 2 (i.e., addresses 0, 1 in bank 0; 2, 3 in bank 1; 4, 5 in bank 2; 6, 7 in bank 3; 8, 9 in bank 0; and so on). This increases the efficiency of independent operation: instructions, operands, I/O information, etc. are referred to through their respective exclusive memory-referral controls (four in total, as seen in Figure 2). With these features the effective storage speed is doubled while the controls are referencing separate banks, since two or more banks are accessed simultaneously.

The 5020 instructions can designate up to 65K words of core-storage addresses (16-bit address field). The 5020E, however, is designed so that enlargement of the core storage is possible; a field conversion up to 260K words of storage capacity is feasible. For capacities beyond 65 KW on the 5020E, enlargement is possible in 32 KW memory modules. The effective address field of 16 bits is expanded to 18 bits (theoretically 21 bits) by means of a preset instruction and an LMM (large memory mode) indicator bit. The preset instruction prepares 21 bits of address information, which is added to the index-modified address field of the next instruction to generate an effective address field of 21 bits. When the LMM is off, index modification is performed on the 16-bit field with the information seen as 0 to 2^16 - 1. When the LMM is on, the modification arithmetic is executed on 16-bit information seen as -2^15 to +2^15 - 1. This convention was chosen to assure complete compatibility between systems of less than 65 KW capacity and larger ones.

Figure 2. The System Configuration of HITAC 5020E.
Figure 3. Front view of HITAC 5020.
Figure 4. Console and I/O Devices.

2.2 Arithmetic and Control Unit

The Arithmetic and Control Unit executes stored programs and processes data. In the 5020, the arithmetic and logical operations are all performed in serial logic.
The basic cycle time is 2 μs for 32 bits of information plus 4 unavailable bits (used for multiplication acceleration) at 18 MC. In contrast to the 5020, the 5020E adopts a four-bit parallel logic configuration based on the high-speed circuit which has proved completely feasible in the 5020 systems. By this alone, an approximately fourfold increase in operation speed over the 5020 is readily possible. In the 5020 the multiplication of a 32-bit word by a 32-bit word is executed in 8 basic cycles using four adders, whereas the 5020E can perform the same multiplication in 2 basic cycles (1.0 microseconds) using a multiplication unit.

The 5020E processor fetches instructions and operands in advance; that is, three controls, the control for instructions, the control for operands, and the control for executions, operate constantly in parallel. Therefore the instruction and operand access times scarcely appear in the total execution time of an instruction. This is a kind of "advanced control" facility and presents no trouble, since instructions are always executed sequentially and the interruption facility is the same as in the 5020.

Specific comparisons of the internal speeds of the 5020 and 5020E are illustrated in Table I. Comparative capacities indicate that the 5020E is approximately 8 to 12 times faster than the 5020 in scientific applications and 8 times faster in other applications. Table I also shows the performance characteristics of other typical computer systems for comparison.1, 5 Furthermore, the 5020E can accept object programs written for the 5020.

Table I. HITAC 5020, 5020E Execution Times (in μsec, including instruction read and index-modification time). [The body of this table is garbled in this copy. It lists single- and double-length times for add and subtract, multiply, divide, shift, jump, and store, and for the inner loops of polynomial calculation, matrix multiplication, and matrix inversion by a direct method, in fixed and floating point, for the 5020, 5020E, and IBM 360/40, /50, /62, and /70.]

Figure 7. Assignment of Special Register Bits. [Shown: the Indicator register and the Mask register with mask bits corresponding to the same bit addresses, and the Channel Control Register with Command Address, Status Indicator, and Count fields.]

I: indirect address bit
A: channel number specification
B2: M2-part modifier
I2: indirect address bit
M2: "command" location specification
D: device number specification
W: mask bit setting
L: "PS" instruction variation
4. Input-Output Channels and Their Control

The I/O control of the 5020 family is executed by means of I/O channels; the Arithmetic and Control Unit and the channels time-share main memory as occasion arises. An I/O channel is dedicated to a certain type of I/O device, and simultaneous operation of up to 12 channels is allowed. A channel operation is initiated in the Arithmetic and Control Unit by an I/O instruction (Format IV), which is valid when and only when the Priority Mode Indicator (the 4th bit of address 17, Figure 7) is on. The L part of an I/O instruction assigns subfunctions such as Read, Read Backward, Write, Advance, Advance Backward, Rewind, etc.

The channel, having received control, starts the designated unit and transfers information to or from the I/O unit. The transmission area is defined by "commands" (shown in Figure 8), which are initially addressed by an I/O instruction and are fetched from the main core memory, also by use of the channel's memory-read facility. Command operations can be chained by designating a bit in a command. The parts of a command shown in Figure 8 represent the following:

M: initial main-memory address or drum address
N: number of words or blocks to be processed
R: transfer or skip
S: variation of operation
T: disconnect or proceed

The adoption of commands in the 5020 and 5020E systems makes I/O operation more efficient, since so-called scatter read, gather write, and symbol control are possible. Having left I/O control to the channel, the central processor processes the ensuing main program simultaneously with the I/O operations of the channel; its memory references are delayed only if any of the I/O channels attempt to access main storage at the same time. The status of a channel, during or after operation, is indicated in the Channel Control Register shown in Figure 7 and can be referred to by the main control programs.
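The chaining idea can be illustrated with a small sketch. The field names M, N, and the chain bit follow the text; the `Command` class and `scatter_read` function are our own illustration, not Hitachi's implementation.

```python
# Illustrative sketch of command chaining ("scatter read"): each command
# names a transmission area (M, N) and may chain to the next command.
from dataclasses import dataclass

@dataclass
class Command:
    m: int        # initial main-memory address
    n: int        # number of words to transfer
    chain: bool   # fetch the next command when this one completes

def scatter_read(commands, data):
    """Distribute one incoming data stream over the areas that the
    chained commands describe, as a channel would."""
    memory, pos = {}, 0
    for cmd in commands:
        for i in range(cmd.n):
            memory[cmd.m + i] = data[pos]
            pos += 1
        if not cmd.chain:   # T part: disconnect instead of proceeding
            break
    return memory
```

One I/O instruction thus suffices to spread a single record over several non-contiguous storage areas, while the processor runs on in parallel.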
5. Program Interruption and Protection

The program-interrupt conditions are:

Memory error
Overflow in fixed-point, floating-point, or variable-length arithmetic operations
Memory protection and no-operation
Ready conditions of I/O control channels
Real-time clock interrupt
Manual console switch, etc.

These conditions are indicated in the Indicator register at address 17 (Figure 7). The built-in circuits continuously and automatically check the above conditions against the corresponding contents of the Mask register (address 18, Figure 7). These conditions cause an interruption only when there is a "one" bit in both the Indicator register and the Mask register, and the Priority Mode Indicator (4th bit of address 17) is off. If a program interruption occurs, the main program routine is suspended, the Priority Mode Indicator is forced on, the current content of the SCC is stored at address 32 together with the instruction word count, and a forced jump is made to address 33, where the Master Control Program, the routine handling the specific conditions, begins.

Figure 8. Instruction Format.
Figure 9. Repeat Mode Modifier and Index Word. [(a), (b): modifier layouts; (c): index word for word-address modification, with modifier and displacement values; (d): index word for bit-address modification.]
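The interrupt test just described can be sketched in a few lines. This is a minimal model of the rule stated in the text, not the 5020's exact circuitry; register widths and the function names are our own.

```python
# Sketch of the 5020 interrupt rule: a condition interrupts only when its
# bit is one in both the Indicator register (address 17) and the Mask
# register (address 18), and the Priority Mode Indicator is off.

def should_interrupt(indicator: int, mask: int, priority_mode_on: bool) -> bool:
    """True when some enabled condition is pending and the machine is not
    already running the Master Control Program in priority mode."""
    return (indicator & mask) != 0 and not priority_mode_on

def take_interrupt(scc: int, memory: dict) -> int:
    """Sketch of the forced-jump sequence: the SCC is saved at address 32
    and execution resumes at address 33, where the Master Control
    Program begins."""
    memory[32] = scc
    return 33
```

Masked-off conditions simply remain pending in the Indicator register until the Mask register enables them and priority mode is released.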
The right half-word of address 16 is the memory Protection Boundary Register (shown in Figure 7). U and L assign the upper-bound and lower-bound group numbers, respectively, where a group represents every 256 words of memory locations, each group starting at an address that is a multiple of 256 (every 1,024 words when the main memory capacity is over 65 KW in a 5020E). A memory address is protected according to its group number X, assigned by U and L, as follows:

if U > L, the region L < X < U is protected;
if U = L, no region is assigned;
if U < L, the region X < U or L < X is protected.

When and only when the Priority Mode Indicator is on, all instructions that would store into the protected area are suppressed from storing; this is indicated by the Transfer Protection Indicator (0th bit of address 18) and the Protection Indicator (16th bit of address 17), so a program interruption will occur. The purpose of this protection is to eliminate interaction between programs in multiprogramming operation. Especially when the system is operating under supervision of a monitoring system, it is necessary to safeguard the monitor from harm by a supervised program, for such situations stop not merely a single program but the whole system. Moreover, a stop-protection facility is provided in the 5020 family. Thus the automatic program-interruption system and the protection facility have made effective multiprogramming possible.

6. Logical Structure

6.1 Logic Circuit

The logic circuit of the 5020 family is a fully synchronous, 2-phase static circuit. Figure 10(a) shows a basic regenerative amplifier, or flip-flop, in which connection A is specially added to eliminate the hazard of misoperation and to leave room for clock-phase adjustment. Figure 10(b) is the symbolic representation of circuit (a). Figure 10(c) shows the rules of worst-case logical connection: two levels of pair logic are permitted, the maximum fan-in of a logic circuit is 7, and the maximum fan-out is 6. Figure 10(d) shows the helically wired transmission-line (delay-line) register holding 36 bits of information in the 5020 system.

Figure 10. Logic Circuit.

6.2 Arithmetic Unit of the 5020

In the 5020 arithmetic unit, the purely serial logic and the delay-line registers show their simplicity especially in multiplication, division, and shift operations. Multiplication and division in a serial computer are performed by a series of additions, subtractions, and shifts; if the simplest procedure were adopted, it would take 32 word-cycles to execute the multiplication of one word by one word. Particularly for scientific use it is important to reduce the execution time of multiplication. The procedures for multiplication and division are summarized as follows.

(1) Multiplication

The HITAC 5020 has five binary adder-subtracters. In multiplication, four of them are connected in series. The multiplier is divided into eight parts, each containing 4 bits of information. When the multiplicand has passed through the four adders in series under control of those 4 bits, the partial product of one word by one-eighth of a word is obtained. The multiplication of one word by one word is therefore performed in only 8 word-cycles (16 μsec). In the case of Integer Multiply (IM), all five adders are used, so it takes only 1 word-cycle time to carry out the multiplication of one word by an integer less than 32.
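A hedged software model of this radix-16 scheme may make it concrete. The inner loop mirrors the four series adders, each adding the shifted multiplicand under control of one bit of the current 4-bit multiplier piece; the function name and structure are our reconstruction, not Hitachi's logic.

```python
# Model of the 5020 multiplication scheme: one 4-bit piece of the
# multiplier is consumed per word-cycle, and four conditional additions
# (the four series adders) accumulate the partial product.

def multiply_radix16(x: int, y: int, word_bits: int = 32) -> int:
    """x: multiplier, y: multiplicand; 8 word-cycles for a 32-bit word."""
    product = 0
    for cycle in range(word_bits // 4):        # eight word-cycles
        piece = (x >> (4 * cycle)) & 0xF       # one 4-bit multiplier part
        partial = 0
        for b in range(4):                     # four adders in series
            if piece & (1 << b):
                partial += y << b
        product += partial << (4 * cycle)      # align and accumulate
    return product
```

Each word-cycle thus retires four multiplier bits instead of one, which is where the 8-cycle (rather than 32-cycle) multiplication comes from.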
This scheme is illustrated in Figure 11. quotient .e--- r--- - ----- - -------.., I I "-------®- (2) D-iv-ision I I ~ Add Partial _R!'!'!.":!.!!d!'!. _ _ -0-+I } .witchl I Adder 1 or Subtracto ---.... a f - - . d - - - - - { ' a Generally the non-restoring method is suitable for the procedure of the division in computers. According to the test result of signs of the divisor and the partial remainder, the divisor is added to or subtracted from the partial remainder shifted one place to the right. a Sub Partial - ~~i.!!9~ __ a -~~~-- In the HIT AC 5020, this method is more improved to reduce the execution time of the division by using three binary Adder & Subtracters. (See Figure 12.) Figure 12. Division Unit of the 5020. The result of the addition or the subtraction is transferred to the Adder 2 and the Adder 3, in which the divisor is added to or subtracted from the result in the Adder 1, respectively. These two results appear in the Auxiliary Registers 1 and 2. At the end of this operation, the true result is selected automatically by checking a sign bit of the result in the Adder 1. Thus, the division of one word by one word is performed in 16 words cycle time (32 p.sec). (3) Shift Operation In the HITAC 5020, the shift operation is performed by means of the shifter. Owing to very high clock frequency, the delay lines can be used as various registers instead of the transistorized register, and the use of delay lines is very eminent from the viewpoint of economy and reliability. The Shifter, of course, is made of delay lines. For instance, n bits left shift is carried out by making the contents of a register pass through the delay lines of n bits a}---_ I I I I I I I L ___ - - - __ --1 L - - - - ________ ..J contrel line Figure 11. Multiplication Unit of the 5020. a I } I _..J I -0--* ,~------- length. The Shifter of the HITAC 5020 consists of delay lines of 0, 1, 2, 4, S, 16, and 36 bits length. 
When the number to be shifted n places is set in the Shift Control Register (SCR), these delay lines are connected automatically to make the total length n bits. Every kind of shift operation, thus, can be performed by very simple logical circuits. Although the one-word information of the HITAC 5020 is 32 bits, the one-word delay line registers have 36 bits length for certain reason (ex. multiplication). Therefore, the recirculation time of delay line register is 36 bits cycle. Some consideration must be taken to shift the two or more-word information because of 4 bits spare time. For this purpose, the coincidence circuit between the shift control register and the counter is provided, which controls two output of the shifter (see Figure 13). 6.3 Arithmetic Unit of the 5020E (1) Multiplication Scheme of the 5020E To perform the mUltiplication of one word by one word only in two words cycle time (1.0 ft s in the case of the 5020E), we use eight 4-bit-parallel-adders, AI, A2 . . . . . AS (see Figure 14). The multiplicand is passed through the fifteen black boxes which are n time circuits, named Nl, N2, . . . . . N15 and we get 15 outputs (Le. y, 2y, 3y, . . . . . 15y) simultaneously. Now, they are fed into eight gates numbered Gl through GS. On the other hand, the most significant four bits of the multiplier control the gate G1. That is, they select one of the 15 outputs, mentioned above, or inhibit all of them when the 4 bits are all zeros. The next more significant 4 bits of the multiplier VERY HIGH SPEED SERIAL AND SERIAL-PARALLEL COMPUTERS 201 Delay line 4 information ;/0-- input of tl:e shifter -t.. I .! I I 16 I I I 36 IL ____________ I I ji---------- ..J I L--_.---,------, co:"ncidence circuit + I I I I II I Shift Control Register I b----------~-_J I L--_ _ _I _----'I counter Figure 13. The Schematic Diagram of the Shifter of the 5020. control the gate G2 and so on. 
The outputs of the gates are fed into the corresponding adders and the adders are connected in series. So we can get the two-word product from the output of the adder A8 in 2 words cycle time. (2) Division Scheme of the 5020E To speed up the division process, we modify the non-restoring method slightly and obtain. 4 bits quotients in one word cycle (0.5 fLS). In this method, comparing the sign of the divisor (y) with the sign of the partial remaind~r (ri), we can determine the next quotient bit (qi) and whether to add or· subtract next. But still there are eight possibilities left. That lmltiplier Register o· C!I 0 0 0 0 0 (xl 0 0 0 0 0 0 0 0 0 0 0 0 lmltiplicand Reg (yl Figure 14. Schematic Diagram of the Multiplication unit of the 5020E. is, how many times of the divisor should be added to (or subtracted from) the 16 "'I i. riH = 16 "'Ii -+- (2k + 1) y (k = 0, 1, ..... 7) Comparison of the most significant 5 bits of the divisor (y) with the most significant 6 bits of the partial remainder' '(r i) allows us ~o restrict the above eight possibilities to onl~ two. (The divisor must have been normalized.) Suppose that we could find k to be n or n + 1 (n = 0, 1, 2, ... 6). Then, perform three addi. tions (or subtractions). riH: = 16Yi ± (2n + 1~ y riH" = 16 "'Ii ± (2n) y ri+4'" = 16 "'Ii -+- (2n + 3) y Test the, sign of ri+4" and we c~n determine which of the two possible remainders -(r 1H " ri+4{~). is right. And, of course, we can obtain the right quotient bits. This method is schematically illustrated in Figure 15 where N1, N2, .... N15 are then times circuits which are used also in multiplication operation; G1, G2, G3 are gates which choose the n multiple of the divisor by the informatio~ of the most significant 5 bits of divisor (Dl, D2, .•.. D5) and of the 6 bits of the partial remainder (R1, R2, ... R6). AS1, AS2, AS3 are the Adder-Subtracter. and P is a circuit to select the right partial remainder out of two possible ones. 7. 
7. Circuitry

Recent advances in transistor technology are remarkable and enable us to realize easily circuits that operate at over 10 MC clock frequency. Moreover, we have utilized the current-switch-mode circuit to make the most of its high-speed feature. For the AND-OR logic circuit, however, diode logic with emitter-follower fan-out is adopted, based on economy and logic-density-versus-speed studies.

Figure 15. Schematic Diagram of the Division Unit of the 5020E.

Figure 16 illustrates the circuit configuration of the amplifier (the same circuit as shown in Figure 10(a), (b)), the AND-OR logic, and the OR-AND logic. The "zero" signal level is ground and the "one" level is +3 V, as in other current-switch-mode circuitry. The collector of the amplifier directly drives the signal transmission lines, which are twisted pairs or coaxial cables, and the collector load is the matching resistance (75 or 100 Ω) of these lines.

Figure 16. Circuitry of the 5020 Family.

The AND-OR logic circuit is driven by EFs (emitter followers), which are distributed almost uniformly along the signal lines and transform the source impedance to a low value. The reasons why we do not drive the line by an EF, but from the collector directly, are as follows:

i) The maximum fan-out of an EF is confined by its Pc (power dissipation of the transistor).
ii) Direct connection of a logic input to a transmission line disturbs the line characteristics.
iii) The problem of damping oscillation of the EF is serious.
iv) To attain very high-speed switching of the transistor, it is necessary to decrease the collector load resistance toward the characteristic impedance of the line, so our configuration has no losses.

The signal transmission lines are foamed-polyethylene twisted-pair lines and coaxial cables, whose length, in the worst case of half-phase two-level logic, must be less than 1.8 m.
The output lines of the EFs are single lines less than 0.5 m in length. Careful attention is paid to the computer's ground construction and power distribution, for these may otherwise cause very serious problems in a very high-speed circuit. The circuit data of Figures 16 and 10 are as follows:

Ta: delay time of the amplifier including logic, with clock, 6.5 ns
Ta: delay time of the amplifier including logic, with line, 7.5 ns
T: delay time of two levels of pair logic, 8.5 ns

These, all added up, leave sufficient room within the half phase of the 18 MC clock period, that is, 27.8 ns. The actual computer system has been operating with sufficient margin and has proved to be completely feasible.

8. Core Memory

The 5020 magnetic core memory is a word-arranged, linear-select system utilizing one ferrite core per bit in a partially switched mode. The basic memory module contains 8,192 words (33 bits per word, including one parity-check bit) arranged in an array of 4,096 access lines of 66 bits (2 words) each. This enables the transfer of 2 words, or up to 64 consecutive bits, per module in a single memory cycle, which is 2.0 microseconds for the 5020 and 1.5 microseconds for the 5020E. Expansion of memory capacity can be made in modules or banks, as described in 2.1.

Each 30-18 mil high-speed core in the memory array is threaded by three windings: one winding each for the word read and write currents, and one winding for the common sense-digit line, whose winding scheme is such that optimum noise cancellation is achieved and the rather long line is properly terminated in view of the high-speed operation involved. Word selection is accomplished by means of a large steering-diode matrix in conjunction with a transistorized cross-bar switch; drive currents are derived from the constant-current switch and flow into the selected steering diode, the word winding, and finally the voltage switch. The access control and information-flow control for the memory modules are provided for every 16 KW as a unit. The built-in checkout circuit serves for quick maintenance and troubleshooting without the aid of the central processor.

9. Summary

This paper has described the outline of the hardware aspect of the HITAC 5020 and 5020E. The design goal of this family, high performance per cost, is achieved by 18 MC fully synchronous 2-phase logic circuits, helical transmission lines for various registers such as accumulators, and a serial or serial-parallel logic structure. Ten systems are now in production, the largest of which will be installed at Tokyo University and consists of one 5020E and two 5020s. This will be one of the most advanced integrated systems in Japan, and in the world as well, and is expected to play an important role in the advancement of Japanese science and engineering with its full-fledged power.

ACKNOWLEDGMENT

The authors wish to acknowledge Prof. T. Simauti, St. Paul's University, Tokyo, for his considerable contribution to the development of the system philosophy and software considerations. Thanks are also due to S. Anraku, Hitachi Central Laboratory, for his work on the logical design, especially the design of the 5020E control unit; to Y. Onishi and M. Tsutsumi, Hitachi Central Laboratory, for their memory design work; and to many other people associated with the 5020 and 5020E project.

REFERENCES

1. Standard EDP Report, "IBM System/360" (Auerbach Corporation, Philadelphia, Penna., June 1964).
2. E. BLOCH, "The Engineering Design of the STRETCH Computer," Proc. EJCC, No. 16, p. 48, 1959.
3. W. BUCHHOLZ, Planning a Computer System (McGraw-Hill Book Co., Inc., N.Y., 1962), chap. 7, p. 75.
4. I. A. D. LEWIS and F. H. WELLS, Millimicrosecond Pulse Techniques (Pergamon Press, London, 1959), chaps. 2-3, p. 47.
5. Standard EDP Report, "CDC 3600," ibid.
6. Hitachi Ltd., The Instruction Manual of HITAC 5020 (Hitachi Ltd., Tokyo, 1963).

IBM SYSTEM/360 ENGINEERING

P. Fagg, J. L. Brown, J. A. Hipp, D. T.
Doody, International Business Machines Corporation, Poughkeepsie, New York; J. W. Fairclough, International Business Machines Corporation, Winchester, Hampshire, England; and J. Greene, International Business Machines Corporation, Endicott, New York.

INTRODUCTION

The cornerstone of the IBM System/360 philosophy is that the architecture of a computer is basically independent of its physical implementation.1 Therefore, in System/360, different physical implementations have been made of the single architectural definition which is illustrated in Figure 1.

The most significant features of this Solid Logic Technology (SLT) are the module, which replaces discrete transistors, diodes, resistors, etc., and the two-layer printed wiring, which replaces most of the discrete wires. The modules are assembled on small cards in groups of 6, 12, 24, or 36. The small card is the basic replaceable unit. These small cards, in turn, plug into large multilayer printed-circuit cards (approximately 8.5 x 13 inches). Interconnections between large cards are made by flat multiconductor tape cables that plug into large cards in the same way that the small cards do, and which run in channels between the large cards. See Figure 2.

One of the initial decisions was the number of processors to implement: specifically, should it be four or five (considering the Model 60/62 as one). The original decision of five was based upon a planned increase of about 2.5 to 3 in internal performance between one model and the next. Another fundamental decision was to provide full compatibility, both upward and downward, over the entire range of the IBM System/360.
This decision was motivated primarily by the advantages, both to IBM's customers and to IBM, of the interchangeability of software.

It was clear to Engineering that the cost targets for each model in System/360 would be feasible only if a significant breakthrough were made in the costs of building transistorized computers utilizing the IBM SMS technology. Therefore, we decided in 1961 to utilize the micrologic components that were then being developed by the IBM Components Division.

Experience with read-only storages was derived from an experimental computer built in 1960-1961 at the IBM Hursley Laboratory in England. There were two major reasons for the general adoption of read-only storages in System/360:

1. Assist downward compatibility, due to the cost advantages. Read-only storage (ROS) showed an advantage in cost over the circuitry which it replaced. ROS is used primarily in the control section of the system, and its advantage becomes more pronounced when more functions are to be performed. Therefore, ROS units showed a method of maintaining full compatibility by allowing complex function even in the smallest systems.

Figure 1. IBM System/360 Architecture. [The accompanying tables, garbled in this copy, give per-model (30 through 70) channel counts and maximum data rates, main-storage capacity, width, and cycle time, data-flow width, control type (read-only store or SLT circuits), circuit delay per level, and relative internal performance.]
2. Flexibility, such as the implementation of compatibility features with other IBM systems. A unique flexibility is achieved by being able to add to the control section of the computer, when it is implemented by a ROS, without significantly affecting the remainder of the system. One result is to allow certain System/360 models to operate as another computer, such as the 1401, by adding to the system but without redesigning it. This ROS concept can be combined with software to offer almost any performance, from pure simulation (no ROS) to maximum performance (no software). When this can be done with performance equivalent to the original system, it greatly simplifies the programming conversion problem by offering essentially two computers in one.

Note that we have various technologies for the read-only memory control, including the card-capacitor, balanced-capacitor, and transformer approaches. Certain choices were made for initial implementation, but it should be made clear that the choices, especially in the slow-speed unit, are not critical, and that more than one technique could be used to give the desired performance.

The concept of architectural compatibility was carried a step further in the input/output area by the decision to attach the various units through the same electrical interface. The major reasons for this decision were the added flexibility to the IBM customer, and the greatly reduced number of different engineering and manufacturing efforts which would be involved in producing both the System/360 I/O channels and the many I/O units.

Other decisions concerned reliability and maintainability.
The primary improvement in reliability involved the advantages of SLT over SMS and obtaining more performance from a given number of components by using high-speed circuits and storages. An objective in maintainability was to have hardware and software not only detect failures but localize them to small areas, such as five specific small cards. A programming system was created which takes the machine logic, analyzes it, and automatically produces a set of programs with the proper patterns and expected results for that specific logic. These fault-locating programs (FLTs) are then capable of being entered into the appropriate computer (Model 50, 60/62, or 70) and executed with the assistance of special hardware. This hardware allows setting up the proper patterns in the various registers, advancing the clock a controlled number of cycles, logging these registers into main storage, and comparing the actual versus expected results. The programming system allows these FLTs to be updated with engineering changes, and they offer a powerful diagnostic tool in localizing failures.

Figure 2. Solid Logic Technology (SLT).

SYSTEM/360 MODEL 30

Model 30, the smallest member of the System/360 line, was designed for the market area currently served by the IBM 1401, 1440, 1460, and 1620. The design objective was, of course, complete function and compatibility with the other System/360 models. However, System/360 architecture includes 142 instructions; decimal, binary, and floating-point arithmetic; complete interruption facilities; overlapped channels; storage protection; and other features normally found in more expensive computers. This made maintenance of full compatibility, while achieving the required cost and performance objectives, a very difficult task. Therefore, a number of different hardware configurations were examined.
Data paths and storages from four to sixteen bits wide; table-lookup addition and logical operations; and registers built in SLT hardware, in main storage, and in a separate high-speed storage were studied in detail to arrive at a configuration that would meet the cost-performance objectives. The availability of the 2-µsec storage at an acceptable cost was also established at this time and became an overwhelming factor in our selections. The key engineering decisions which established Model 30 characteristics were:

1. Selection of an 8-bit-wide, plus parity, 2-µsec storage unit, 64K bytes maximum,
2. Use of an 8-bit-wide, plus parity, data path,
3. Use of 30-nsec SLT circuits,
4. Implementation of "local storage" in main storage,
5. Use of a 1-µsec ROS for control,
6. Provide integrated, time-shared, multiplex and selector channels,
7. Provide for complete processor growth in a single frame,
8. Utilize "byte" packaging to facilitate checking and fault location.

"Byte" packaging means placing all the circuits required for eight data bits plus parity on one pluggable small card. This concept, along with micro-program diagnostics, permits fault location to a resolution that averages five small cards. A simplified data-flow diagram of the Model 30 processor is shown in Figure 3. The fourteen 8-bit registers, with the 8-bit ALU and 8-bit storage, provide all the functions for the CPU and multiplex channel. Additional registers or buffers are required for selector channels, direct control, interval timer, and storage-protection features. Figure 4 is a map of local storage, which is actually an extension of the main storage unit. The first 256 bytes are utilized by the processor for general registers, floating-point registers, interim storage during multiply time sharing, and other scratch-pad functions. The second 256 bytes are used by the multiplex channel for channel-control words. Additional control-word storage is available in larger storage sizes.
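As a present-day sketch of the local-storage split just described (the 256-byte region boundaries come from the text; the function and region names are invented for illustration):

```python
# A toy model of the Model 30 local-storage split: the first 256 bytes serve
# the processor (general registers, floating-point registers, scratch), and
# the second 256 bytes hold multiplex-channel control words.

LOCAL_STORE_SIZE = 512

def region(address):
    """Classify a local-storage byte address by its use (illustrative only)."""
    if not 0 <= address < LOCAL_STORE_SIZE:
        raise ValueError("outside local storage")
    return "cpu scratch" if address < 256 else "mpx control words"
```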
The multiplex channel time-shares the processor registers by interrupting the microprogram, holding the return address, storing the registers in local storage, and then, at completion of the I/O operation, restoring the registers to their original state. This is analogous to a macroprogram interrupt.

TIMING AND ROS CONTROL

The basic timing is established by the 1-µsec read-only storage. The main storage provides the read-write cycle in 2 µsec and a read-compute-write cycle in 3 µsec. Within the 1-µsec ROS cycle are four 250-nsec timing pulses. The read-only storage in the Model 30 is a card capacitor ROS containing a maximum of 8,000 words with 64 bits per word. The capacitor ROS consists of a matrix of drive lines and sense lines with capacitors at the intersections where a one is required, and no capacitors at those intersections requiring a zero. The voltage change on a drive wire will cause capacitive current to flow in those sense lines which are coupled to that particular drive wire by a capacitor. In the card capacitor store, the 64 sense lines and one plate of each capacitor are printed on an epoxy glass board bonded to a sheet of dielectric material. The drive wires and the other plate of the capacitors are printed on mylar cards (program cards) the size of a standard IBM card (see Figure 5). Each program card is punched with the information pattern specified by the micro-code and contains 12 ROS words. A capacitor plate is punched out at an intersection where a zero is to be stored; thus, an unpunched card will give all ones and punching a hole gives a zero at that bit in the word. Microprogram changes can be made by inserting new program cards.
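The punch-a-hole-for-zero scheme above can be modeled in a few lines; this is an illustrative abstraction (the 12-words-per-card and 64-bits-per-word figures come from the text, the function names do not):

```python
# A toy model of the card capacitor ROS: each program card holds 12 words of
# 64 bits; a capacitor left intact reads as a 1, and a punched-out capacitor
# plate reads as a 0.

WORDS_PER_CARD = 12
BITS_PER_WORD = 64

def make_card():
    """A fresh, unpunched card: every bit position reads as a 1."""
    return [[1] * BITS_PER_WORD for _ in range(WORDS_PER_CARD)]

def punch(card, word, bit):
    """Punching a hole removes the capacitor plate, storing a 0."""
    card[word][bit] = 0

def read_word(card, word):
    """Driving a word line couples current only through surviving capacitors."""
    return card[word][:]

card = make_card()
punch(card, 0, 5)       # a microprogram change: store a 0 at word 0, bit 5
w = read_word(card, 0)
```

Replacing a card thus replaces 12 words of microprogram at once, which is the change mechanism the paper describes.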
MICROPROGRAM

A single microprogram instruction can initiate a storage operation, gate operands to the ALU input registers, select the ALU function, store the result in the destination register, and determine the next micro-word to read from read-only storage. Each ROS word is decoded to operate the gates and control points in that system. A brief description of the branch, function, and storage-control fields in each ROS word follows.

Figure 3. System/360 Model 30 Data Flow.

Figure 4. Model 30 Local Storage.

Branch Control

The branch control fields provide the address of the next ROS word to be executed. A ROS address is a 13-bit binary number.
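A rough model of partial next-address formation, assuming the update scheme the paper describes for this 13-bit address (6 directly specified "next address" bits plus 2 condition-derived "branch" bits, high-order 5 bits unchanged); all names here are hypothetical:

```python
# Sketch of Model 30 next-ROS-address formation: keep the high-order 5 bits
# of the current 13-bit address, take 6 "next address" bits from the ROS
# word, and derive the low-order 2 "branch" bits from tested conditions.

def next_ros_address(current, next_addr6, branch_hi, branch_lo):
    """current: 13-bit address; next_addr6: 6-bit field; branch_*: 0 or 1."""
    assert 0 <= current < (1 << 13) and 0 <= next_addr6 < (1 << 6)
    high5 = current & (0b11111 << 8)          # high-order 5 bits unchanged
    return high5 | (next_addr6 << 2) | (branch_hi << 1) | branch_lo
```

Because the two branch bits each depend on a tested condition, every word gets a 4-way, 2-way, or 1-way branch for free, as the summary below notes.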
Normally the branch control group provides only 8 bits (leaving 5 bits unchanged) of next-address information. Of these 8 bits, the low-order 2 are called "branch" bits and the remaining 6 are called "next address" bits. The 6 "next address" bits are specified directly in a 6-bit field. The two "branch" bits are specified by two 4-bit fields. These two fields are decoded and used in masking and extracting machine conditions and status conditions contained in data-flow registers G and S. Another 4-bit branch control group provides the function of setting several variables to desired values for later use in microprogram branching.

Figure 5. Model 30 Card Capacitor Read-Only Storage.

In summary, every ROS word provides a branching ability. The branch can be 4-way, 2-way, or 1-way (simple next address). A partial next address is normally used, but provision is made for obtaining a full 13-bit next address when required. The total length of the branch control group is 18 bits, plus 1 parity bit.

Function Control

The function control group is subdivided into four fields: source A, source B, operation, and destination.

1. Source A (CA). This 4-bit field selects one of the 10 hardware registers to be gated to the A input of the ALU.

2. A-Input Control (CF). The 8 data bits from a register can be presented to the A input "straight" or they can be presented "crossed." The term "crossed" means that the high-order four bits of the source register enter the low-order four bits of the ALU, and the low-order four bits of the source register enter the high-order four bits of the ALU. The A input can further be controlled by presenting all eight bits, the low-order four only, the high-order four only, or none, to the ALU.

3. Source B (CB). This 3-bit field selects one of three registers to be presented to the B input of the ALU.
The B input is the "true/complement" input and has HI/LO controls but no straight/crossed controls.

4. B-Input Control (CG). This field controls the gating of the B input to the ALU. That is, the low-order four bits only, the high-order four bits only, all eight bits, or none of the eight bits of B may be presented to the ALU.

5. Constant Generator (CK). This field is gated to the B buss, main core STAR, and ROSAR, thus providing a source for constants, mask configurations, and address constants.

6. Carry (CC). This 3-bit field controls carry in, AND, OR, EXCLUSIVE OR functions and permits the setting of carry out into the carry latch, if desired.

7. True/Complement and Binary/Decimal Control (CV). This 2-bit field controls the true/complement entry of the B input to the ALU, and also whether the operation is decimal or binary.

8. Destination (CD). This 4-bit field selects one of the 10 hardware registers to receive the output of the ALU. A given register may be used both as a source and as the destination during a single ROS cycle.

In summary, the Operation group specifies one of ADD binary, ADD decimal, AND, OR, or EXCLUSIVE OR. It also specifies true or complement; 0 or 1 carry input; save or ignore resulting carry; use true/complement latch; and use carry latch.

Storage Controls (CM) (CU)

These two fields control core storage operation. Either main storage, local storage, or MPX (for I/O) storage can be addressed for storage read-write calls; five values of CM are used to specify the address register to be gated to STAR.

An example of the sequence of ROS control, Figure 6, shows an ADD cycle using a simplified data flow.

Step 1. As the routine is entered, the contents of UV are gated to the STAR, a read call is issued to main storage, and register V is decremented by 1.

Step 2. The A-field data is regenerated in storage and the A-field data byte is transferred from register R to D.
Step 3. The contents of IJ are gated to STAR, a read call is issued, and J (lower 4 bits) is put in Z.

Step 4. Z is tested for 0 to set up the branch condition at the next step; the B-field data byte is read out (R) to the adder, as are the D-register contents (A-field data) and the carry from a previous cycle. The output (Z) is gated into R.

Step 5. If the zero test of Z in Step 4 is true, a write into the B field is performed (the address is still in STAR), J is decremented by 1, and the routine is repeated. If the zero test of Z in Step 4 is false, then a branch is made to the write call and the routine is exited.

As an example of Model 30 microcoding efficiency, the execute portion of a fixed-point binary add uses approximately 20 words. However, the add can be combined with 13 additional operation-code executions, such as subtract, AND, OR, EXCLUSIVE OR, etc., using a total of 45 words. A one-half word multiply giving a full-word product requires about 95 words. The total floating-point feature, which utilizes the fixed-point microprograms, requires approximately 500 words. It should be recognized that the microprogrammer has the choice of optimizing for minimum words or maximum performance.

IBM 1401/40/60 FEATURES COMPATIBILITY

It was a market requirement that Model 30 execute the IBM 1401/40/60 instructions directly. Further, it was desired to provide these features without disturbing the Model 30 design, which was optimized for System/360 requirements. As a result, these features are provided by an addition of only four circuit cards plus extensive microprograms. The general approach utilizes the following:

1. System/360 input-output devices,
2. Conversion tables in local storage,
3. Microprogram decoding and execution of the instructions directly.

The internal performance is several times the 1401, based on a typical mix of instructions found in 1401 programs. For individual instructions, however, the speed ratio varies widely.
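The digit-serial flavor of the ADD loop above, one digit pair per "storage cycle" with the carry latched between cycles, can be sketched as follows. This is an illustrative model, not the actual microprogram; the field representation is invented:

```python
# A byte-serial decimal add in the spirit of the Model 30 ADD routine:
# fields are lists of decimal digits, low-order digit first, and the carry
# is held between cycles the way the carry latch holds it.

def decimal_add(a_field, b_field):
    """Add two equal-length decimal fields, returning (digits, final carry)."""
    assert len(a_field) == len(b_field)
    result, carry = [], 0
    for a, b in zip(a_field, b_field):
        carry, digit = divmod(a + b + carry, 10)
        result.append(digit)
    return result, carry

# 274 + 158 = 432, with digits stored low-order first
digits, carry = decimal_add([4, 7, 2], [8, 5, 1])
```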
Figure 6. Model 30 Read-Only Storage ADD Cycle.

Method

The internal code used in Model 30 for the compatibility feature is EBCDIC and, further, Model 30 has a binary-addressed storage. Thus, a certain amount of translation of character codes, and conversion of numbers from decimal to binary radix and back again, takes place during processing. These conversions and translations are accomplished by table look-up, using the tables in local storage. These tables are read into storage as part of the special load procedure required prior to execution of a 1401 program. The table used to translate the EBCDIC to BCD requires 128 characters. It is used during the execution of 1401 instructions such as bit test, move zone, and move numeric, which depend on the actual bit coding. A 72-byte table is required to hold the constants used to convert the 1401 decimal addresses to binary addresses and the converse. This conversion takes place at execute time; hence the programs can operate correctly no matter what methods are used in the 1401 program to generate or modify addresses. Each conversion requires from 6 to 12 microseconds depending on the value of the address. Other functions which utilize tables are op decode and I/O device address assignment. Additional areas in local storage are used for hardware register back-up, sense switches, system specification, and working area. While most of the available 512 bytes of local storage are used, no main storage is used. Hence, an 8K 1401 program will run on an 8K Model 30. Each input or output device type has an individual microprogram routine.
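The table look-up idea above (a 128-entry table replacing per-character bit manipulation, as the EBCDIC-to-BCD table in local storage does) can be sketched as follows; the table contents here are entirely made up for illustration and are not the real code assignments:

```python
# Table look-up translation: one indexed access maps a character code to its
# translated form. A 128-entry table suffices because only 7 bits select the
# entry. The two entries filled in below are fictitious.

TRANSLATE = [0] * 128
TRANSLATE[0x41] = 0x31   # pretend code 0x41 translates to 0x31
TRANSLATE[0x42] = 0x32   # pretend code 0x42 translates to 0x32

def translate(byte):
    """One storage access replaces per-character bit manipulation."""
    return TRANSLATE[byte & 0x7F]
```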
Most devices are attached to the Model 30 multiplex channel. Magnetic tapes are also available on selector channels. The throughput for the compatibility feature requires an analysis of the particular I/O devices used and the particular program being run. However, in every practical case it will exceed the system being simulated.

SUMMARY

The Model 30 effort has helped to prove two points:

1. Small systems, with the extensive function and facility of the largest systems, are practical.
2. The microprogram control is sufficiently general that a good system design can be used to simulate a wide variety of architectures.

SYSTEM/360 MODEL 40

The performance of Model 40 is approximately three times that of Model 30. To attain this performance at minimum cost, four major design decisions were important.

1. The adoption of a 16-bit-wide storage of 2.5 µsec, together with a basic 16-bit-wide data flow implemented in 30-nsec SLT circuits, provided an optimum configuration.
2. A 0.625-µsec read-only storage as a means of control for all CPU functions was used because this technique offered significant advantages in cost, flexibility, and design freedom, compared with more orthodox control systems.
3. The inclusion of a 1.25-µsec local storage of one hundred forty-four 21-bit words to provide general register storage. This resulted in a considerable reduction in accesses to main storage.
4. By using the local storage to preserve the contents of the CPU during channel operations, much of the CPU data flow can be used for channel functions, thus considerably reducing the cost of channels.

MICROPROGRAMMING

Microcoded programs, physically residing in permanent form in a transformer read-only storage (TROS), form the heart of the control section of the Model 40.
To reduce the physical changes associated with changes in the microprograms, the microprogram design is automated and debugged before actual physical implementation by means of an IBM 7090 programming system, the Control Automation System (CAS). CAS is utilized not only by Model 40, but by all the System/360 models using ROS control. The basic input to the system is a logic sketch page produced by the microprogrammer. The separate micro-instructions are written on this page in a formal language, TACT. When this initial writing phase is complete, the page is transcribed into punched cards. The control program for the 7090 is directly derived from the Model 40 control signal specification and acts as a set of inviolable rules. Within this framework, and using associated established microprograms for reference where necessary, the 7090 simulates the Model 40 and attempts to run the microprogram using submitted data. Errors or violations are detected, stop the program, and cause a diagnostic analysis routine to be entered. Facilities are available for various printouts to provide for analysis and subsequent correction. The debugged microprogram, in the form of magnetic tape, is submitted to assignment checking. This operational phase checks the manual assignment of the absolute address given to each ROS word, and produces listings giving the absolute address and binary bit pattern of each assigned ROS word. Two decks of cards are also produced and used in the production and testing of the read-only storage. An output of the CAS program is a fully-checked and redrawn version of the original logic sketch page. If microprograms are subsequently updated, a revised CAS page is automatically printed on receipt of change.

TROS

The finally-debugged microprogram is translated into a series of micro-instructions, held in read-only storage. In Model 40, this takes the form of a transformer read-only storage (TROS).
TROS is made up of 16 modules, each containing 256 ROS words (micro-instructions), to make a total capacity of 4096 words. Each module is made up of 128 tapes, each tape containing two words. The word tape carries two ladder networks, each of which, after modification, holds the bit pattern derived from a specific micro-instruction. The tapes are stacked in a module, as shown in Figure 7, with transformers inserted through the prepared holes in the tapes. These 54 transformers are in the form of U and I cores. The I cores carry the sense windings. Each stage of the ladder network corresponds to one bit position of the ROS word. Whether the bit is a 1 or a 0 is determined by breaking the current path on one side of the ladder with a punched hole through the printed wiring, so that the current then either passes through the core for a 1 or bypasses it for a 0. Signals are taken from the tapes to the sense amplifiers, the outputs of which are used to set 54 latches. The total cycle time of the TROS and the basic cycle time of the CPU are both 625 nsec. Normally, there are four 625-nsec TROS cycles for each 2.5-µsec main storage cycle. It is possible, by the inclusion of a feature which adds additional ROS modules, to simulate other equipment, such as the 1401 or the 1410. This enables programs written for these machines to be run on the Model 40. Model 40 implements these features by microprogramming and conventional programming; Model 30 utilizes microprogramming exclusively. For example, the input/output and edit commands in the 1401 simulated on the Model 40 are executed with System/360 programming, while most of the remainder of the 1401 instructions are microcoded. The primary reason for adopting this approach in the Model 40 is to reduce the added TROS requirements and the ensuing packaging problems and costs.

DATA FLOW

Figure 8 is a schematic representation of the Model 40 data flow.
The data flow may be divided into sections characterized by their data-handling capabilities:

1. one-byte arithmetic handling (8 bits plus parity),
2. one-byte local storage addressing,
3. two-byte storage addressing and data input/output referencing,
4. one-byte service for the channel data and one-byte service for flags related to the channel interface,
5. two-byte data flow to and from local storage.

One-Byte Data Flow

One-byte arithmetic handling is performed by the arithmetic and logical unit (ALU). The ALU is a one-byte-wide adder/subtracter which operates in either decimal or hexadecimal mode. It is capable of producing both arithmetical and logical combinations of the input data streams and is checked by means of two-wire logic, where one true and one complement signal is expected on each pair of wires.

Figure 7. System/360 Model 40 Transformer Read-Only Storage.

Data bytes are fed to the P and Q busses from the associated registers or from the emit field of the current micro-instruction. The data are then manipulated by the ALU in accordance with the content of the ROS control word. Several instructions may have common microprogram subroutines in which the difference lies only in the ALU function. One of 16 different ALU functions is preset by a 4-bit field in a ROS control word executed before branching to the common subroutines.

LOCAL STORAGE

This is a small, high-speed storage which provides registers for fixed and floating-point operations, channel operations, dumping of CPU working register contents, interrupts, and for general working areas. Only the fixed and floating-point locations are addressable by the main program.

Two-Byte Data Flow

Data transfers between local storage, channel registers, CPU registers, and main storage are carried out in two-byte steps. The ferrite local store contains 144 locations allocated as shown in Figure 9. Each location is 21 bits long.
Addressing is completely random and the unit may be split-cycled with read-or-write operations in any sequence. A read-or-write cycle requires 625 nsec. A complete read-and-write cycle therefore takes 1.25 µsec.

Figure 8. Model 40 Data Flow.

Figure 9. Model 40 Local Storage.

One example of use of the local store is the double-dump routine executed under certain types of channel operation. If the machine is currently in CPU mode and a multiplex channel interrupt occurs, all data relevant to the current CPU operation are dumped in the local storage first-level dump area. If, subsequently, a selector channel interrupt occurs, all data relative to the multiplex service are dumped, and the selector channel is serviced.
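The nested dumping just described follows a last-in, first-out discipline, which can be sketched as follows; the two-function interface and the register contents are invented for illustration:

```python
# Sketch of the double-dump idea: each interrupt level dumps the live
# register state into its own local-storage dump area, and levels are
# restored in reverse order as each service completes.

def take_interrupt(state, stack, handler):
    """Dump the current register state, then let the new level run."""
    stack.append(dict(state))   # dump into the next-level dump area
    handler(state)              # the channel now owns the data flow

def complete_interrupt(state, stack):
    """Restore the preempted level's registers from its dump area."""
    state.clear()
    state.update(stack.pop())

state = {"acc": 7}   # CPU running
stack = []
take_interrupt(state, stack, lambda s: s.update(acc=1))   # multiplex interrupt
take_interrupt(state, stack, lambda s: s.update(acc=2))   # selector preempts it
complete_interrupt(state, stack)   # selector done: multiplex state restored
complete_interrupt(state, stack)   # multiplex done: CPU state restored
```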
When all selector channel operations are complete, the multiplex data are restored and multiplex servicing is continued. Similarly, when multiplex servicing is complete, CPU data are restored and CPU operation is resumed.

MAIN STORAGE

The main storage array (2.5-µsec cycle, 1.25-µsec access) of the machine is divided into two logical sections. These are the true main storage, and a special area of 256-2048 bytes, called the bump storage. The bump storage is used to hold channel control words used in multiplex channel operations and is accessible by the microprogram only.

CHANNELS

Multiplex Channel

The multiplex channel is an extension of the CPU in the sense that the regular CPU data flow and microprogram are used for all data transfers. This channel, which allows a number (maximum of 128) of relatively low-speed units to be operating simultaneously, normally scans all attached control units continuously. When a device reaches the point where it needs to send or receive a byte of data, its control unit intercepts the first available scanning signal and transmits the unit address to the CPU. The CPU data flow is then cleared and retained in the local store using the dump routine. After the byte transfer has been completed, the control unit and device disconnect from the channel, permitting scanning of other devices to be resumed and CPU processing to continue.

Selector Channels

Two types of selector channels are available on Model 40, the A channel and the B channel. They differ in that the CPU interference caused by I/O operations on the B channel is approximately one third of that caused by the A channel. The A channel time-shares the CPU data flow and microprogram to a high degree. Data bytes are never transferred directly between main storage and the interface busses, but move via a 16-byte buffer in the Model 40 local storage. The transfer between interface and buffer is conducted serially, by byte, at the rate dictated by the I/O device.
Each transfer causes the microprogram to hesitate one 625-nsec cycle. When the 16-byte buffer is half full, the channel requests the use of the microprogram and CPU data flow. Up to 12 microprogram cycles are required to preserve the current contents of the data-flow registers and to load the control word. The 8-16 bytes in the buffer are transferred 2 bytes per storage cycle to main storage as a block, at a rate of 2 bytes every 2.5 microseconds.

The B channel is more conventional; it uses SLT hardware and does not use the local storage as a buffer. Data is transferred to main storage as soon as two bytes are accumulated. Interference is basically constant at 1.25 µsec per byte.

SYSTEM MAINTAINABILITY

One of the more unique features of Model 40 maintenance hardware is the use of the read-only storage as a source of diagnostic routines. One module of the TROS contains a complete set of tests to validate the CPU, local, and main storages. These tests are automatically applied each time the system reset is operated to ensure an operational machine.

SYSTEM/360 MODEL 50

The performance range of System/360 Model 50 is approximately ten times the Model 30. A review of the following key engineering decisions will highlight the distinguishing characteristics and engineering achievements of Model 50.

1. The 30-nsec family of SLT circuits is used.
2. A small, 0.5-µsec local core storage contains the general-purpose and floating-point registers.
3. A relatively low-cost 2.0-µsec main storage is used.
4. The data paths, local storage, and main storage are each 32 bits wide (plus 4 parity bits).
5. A 0.5-µsec read-only storage provides sequence control throughout.
6. Both selector and multiplexor type channels are provided to cover a wide performance range.
7. The CPU and channels utilize common hardware, yet maintain substantially overlapped operation.
8. The CPU, channels, and storage are packaged in a unified structure.
CENTRAL PROCESSING UNIT

The first four decisions are highly interdependent and were reached somewhat simultaneously to produce a system in the right cost and performance range, utilizing components that would be available at the right time and that would fit together in a good physical and speed relationship. These components were selected on the basis of good performance per unit cost and a balanced design, rather than for performance alone.

The choice of the 30-nsec family of circuits allowed an internal clock cycle from register through adder and back into register of 500 nsec. This family of circuits provides the combination of compact packaging, good fan-in/fan-out ratios, and low power dissipation, with a nominal delay per stage of 30 nsec. A typical 30-nanosecond SLT module is illustrated in Figure 10, together with its circuit diagram. Such a module has a power dissipation of approximately 30 milliwatts and allows a fan-out factor of 5 and a fan-in to the OR of 5. The fan-in to the AND is limited only by packaging considerations.

Figure 10. Solid Logic Technology 30-nsec AOI Circuit.

Figure 11 illustrates the basic Model 50 data flow. The main adder path, the working registers, the main storage, and the local storage are all 32 bits wide plus 4 parity bits. An auxiliary 8-bit data path through a logical processing unit, called the mover, is provided for processing variable field length information. This path allows a byte to be selected from each of two working registers under control of two byte counters. The two bytes can then be combined in a variety of logical functions and returned to one of the working registers. With the mover, decimal operands are first aligned in the working registers, then processed arithmetically 32 bits at a time using the main adder.

Figure 11. System/360 Model 50 Data Flow.

The main 2-µsec storage unit for the Model 50 is approximately 32" x 14" x 26", a reduction of one-third in the physical size of the 7090 storage (Figure 12). It is available in capacities of 64K, 128K, and 256K bytes (8 bits plus parity). Bump storage, which is part of main storage, is used to hold channel control words for multiplex channel operations. The bump storage area of 1024 to 4096 bytes is accessible only by microprogram control and not by the problem program. The use of a combined sense-inhibit line allowed a three-wire ferrite core plane which can be machine wired.

Figure 12. Physical Comparison Model 50 and IBM 7090.

The local storage contains 64 thirty-six-bit words (32 plus parity) and has a read-write cycle time of 500 nsec. This ferrite core storage unit provides working locations for the CPU and channels, as well as the general and floating-point registers. Regeneration from either the L or R register allows a swap of information between the CPU and local storage on a single cycle.

READ-ONLY STORAGE

The decision to use a read-only storage for sequence control produced a great unifying and organizing influence on the design procedure. It not only forced a centralization of all controls, but also forced an early definition of all gate signals, thereby allowing the design to proceed independently in several areas of the machine. Additional benefits resulted from the use of the Control Automation System (CAS). This system not only provides the necessary record-keeping and generation of manufacturing information for the read-only storage, but also provides documentation of the instruction sequences and allows complete simulation of these sequences before producing hardware.

The balanced capacitor technology is used in the read-only storage unit of the Model 50. A bit plate contains the information content of the storage in the form of tiny tabs attached to a long electrical drive line (see Figure 13). These are etched on a glass epoxy plate by a process similar to that from which printed circuit cards are manufactured. This bit plate is covered by a sheet of 1-mil mylar which forms the dielectric of the capacitor. The bit-plate tab forms one plate of this capacitor, and a sense line running orthogonally to the drive line forms the other plate. The sense lines are also etched lines. A pair of these lines form inputs to the base and emitter of a differential amplifier at one end and are terminated to ground at the other end. One sense plate contains 400 parallel sense lines. The bit plates lie over the sense plates and are separated by the 1-mil mylar, with the drive lines running vertically and the sense lines horizontally. Approximately a 5-inch-pound torque is applied to a system of pressure pads to maintain constant pressure between the two plates. Thus, switching on the array driver provides a change in voltage that is capacitively coupled from the drive line to the sense line. For impedance-matching purposes in the sensing circuits, a balancing line is used in conjunction with each drive line. The appropriate location of the bit tabs determines whether the signal received by the sense amplifier is a 1 or a 0.

The read-only storage of the Model 50 contains 2816 words of 90 bits each. It has a cycle time of 500 nsec and an access time of 200 nsec.
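The mover path described earlier (select one byte from each of two 32-bit working registers under byte-counter control, combine them logically, return the result) can be sketched as follows; the set of combining functions and all names are illustrative:

```python
# A sketch of the Model 50 "mover": byte selection from 32-bit registers is
# controlled by byte counters (0 = high-order byte), and the combined byte
# is written back into the selected position of the left register.

def get_byte(word, counter):
    """Select byte 0..3 (high-order first) from a 32-bit register."""
    return (word >> (8 * (3 - counter))) & 0xFF

def set_byte(word, counter, value):
    """Replace byte 0..3 of a 32-bit register with value."""
    shift = 8 * (3 - counter)
    return (word & ~(0xFF << shift)) | (value << shift)

def mover(reg_l, cnt_l, reg_r, cnt_r, func):
    """Combine one byte from each register and store back into reg_l."""
    combined = func(get_byte(reg_l, cnt_l), get_byte(reg_r, cnt_r)) & 0xFF
    return set_byte(reg_l, cnt_l, combined)
```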
To allow the result of one cycle to immediately influence the choice of the next, two words are read from storage simultaneously, and a choice between them is made on the basis of the result of the first cycle. The chosen word then controls the second cycle. This technique allows the same speed to be maintained as if sequential logic circuit controls were used.

CHANNELS

A wide range of channel performance is provided in the Model 50 by the inclusion of two quite different channel designs. Both types of channels allow a substantial overlap with CPU operations.

The multiplexor channel provides concurrent operation of multiple low- to medium-speed devices. It makes extensive use of CPU hardware and contains relatively little hardware of its own. (Example: local storage and some CPU registers are used as working locations.) The control words for each operating device are held in an extension to main storage called bump storage. As each byte is handled, the channel takes control of the CPU and obtains the required control word from bump storage. The CPU registers required for the operation are dumped into local store. The multiplexor channel can also operate with a single higher-speed device in a "burst" mode. In this mode the control word is held in local storage to speed the operation, but bytes are still handled one at a time.

Figure 13. Model 40 Capacitor Read-Only Storage.

High-speed I/O devices use the selector channels (up to three are attachable), which can handle 8-bit data up to an 800-kc byte rate.
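The two-word read-out described at the start of this section can be sketched as follows; the microword encoding and the toy control store are invented for illustration.

```python
# A sketch of branch anticipation by two-word read-out: both candidate
# control words for the next cycle are fetched together, and the branch bit
# produced during the current cycle makes a late selection between them,
# so no extra cycle is spent waiting on the condition.

def run(ros, start, cycles, condition):
    """ros maps an address to a pair (word-if-0, word-if-1); condition(word)
    is the branch bit that settles while that word's cycle executes."""
    trace = []
    word = ros[start][0]
    for _ in range(cycles):
        trace.append(word["op"])
        both = ros[word["next"]]       # read the two successors at once
        word = both[condition(word)]   # late select by this cycle's result
    return trace

# Toy control store: op "A" sets the branch bit, everything else clears it.
ros = {
    0: ({"op": "A", "next": 1}, {"op": "A", "next": 1}),
    1: ({"op": "B", "next": 0}, {"op": "C", "next": 0}),
}
trace = run(ros, 0, 4, lambda w: 1 if w["op"] == "A" else 0)
```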
These channels contain sufficient hardware to assemble a full word of data before requiring the main storage or CPU facilities. Control-word information is retained in hardware instead of local storage. CPU facilities are used only when transferring a word to storage or chaining between I/O commands, resulting in a much higher maximum data rate than the multiplexor channel. An additional selector channel is available which operates in a lockout fashion and is capable of operating at a data rate up to 1.3 mc.

MAINTENANCE

Four decisions stand out in the maintenance area. They are:

1. The decision to include a hardware Log In-Log Out system for the execution of fault location tests (FLT's). An integrated approach (using the read-only storage to control FLT sequencing and log paths, and using existing hardware where feasible) reduced the cost from being excessive to being defendable, and in addition provided better fault resolution than conventional diagnostics. The FLT's can provide fault localization to within a few small cards, on the average, and have the additional advantage of being automatically produced and updated. The Log Out system is also used with the error-checking circuits to provide a complete snapshot of internal status at the time of error.

2. The use of a Progressive Scan technique, which provides a reasonable means of running fault-location tests on channel hardware, with resultant improved fault resolution; thus the entire processor can be examined with a high-resolution, non-functional test.

3. The DIAGNOSE instruction, which allows the initiation of any microinstructions, in any order. This permits integration of the control of various maintenance techniques into the read-only storage. This instruction gives the diagnostic programmer a powerful tool.

4. The decision to allow for single-man servicing.
This is made possible by bringing all controls and indicators together on a single system control panel which is usable in two positions: the normal operating position, and swung left 180° so that it faces the principal servicing area (CPU logic, main storage, and channel hardware). See Figure 14. The small logic cards, which are the principal replaceable item, are easily accessible, with the majority located on the outer sides of the gates.

Figure 14. Model 50 System Control Panel (Open Position).

SYSTEM/360 MODELS 60/62

Models 60/62 are large-scale processors with performances that are approximately 20 and 30 times that of Model 30. Basic design considerations are:

1. Main storage speeds of 2 µsec and 1 µsec.
2. High-speed circuit family with nominal delay of 10 nsec per logical block.
3. Local storage in transistors, with 125-nsec access and non-destructive read.
4. Read-only storage control of CPU functions at a 250-nsec cycle.
5. 64-bit-wide storage, plus 8 parity bits; 64-bit data flow, plus 8 parity bits.

Models 60/62 attach stand-alone storage and channel units, compared to the integrated designs of the smaller models in System/360. The entire instruction set is standard.

Model 60 is equipped with a 2-µsec main storage packaged in two separate frames. Address interleaving between the frames increases the effective speed of storage. The combined capacity of the storage frames varies and is available in 128K-, 256K-, or 512K-byte sizes. Input-output control is provided through a channel capable of handling 8-bit data at a 1.3-mc byte rate. Up to 6 channels are attachable and capable of operating simultaneously. Information is passed between storage and channel on a 72-bit, double-word basis, minimizing interference with operating programs. High-speed I/O devices, tapes, disks, and drums are primarily intended for direct attachment to the channel.
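The effect of address interleaving between the two storage frames can be sketched with a small timing model. The 2-µsec frame cycle is the figure from the text; the 0.5-µsec issue interval is an assumed bus rate, purely for illustration.

```python
# A timing sketch of two-way address interleaving: consecutive addresses
# fall in alternate frames, so a second access can start while the first
# frame is still completing its cycle.

CYCLE = 2.0    # usec per storage-frame cycle (figure from the text)
ISSUE = 0.5    # usec between successive requests (assumed bus rate)

def start_times(addresses, frames=2):
    free_at = [0.0] * frames
    starts, t = [], 0.0
    for a in addresses:
        f = a % frames                 # low address bit selects the frame
        s = max(t, free_at[f])         # wait only if *this* frame is busy
        starts.append(s)
        free_at[f] = s + CYCLE
        t = s + ISSUE
    return starts

interleaved = start_times([0, 1, 2, 3])   # alternating frames overlap
one_frame   = start_times([0, 2, 4, 6])   # one frame serializes at 2 usec
```

With interleaving the four accesses begin at 0, 0.5, 2.0, and 2.5 µsec; confined to one frame they serialize at the full 2-µsec cycle.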
Slower-speed units, terminals, peripheral readers, punches, and printers attach to smaller System/360 models which, in turn, may be connected to one of the Model 60 channels in a multiprocessing arrangement.

Model 62 differs from Model 60 only in the speed and configuration of the main storage. The 1-µsec storage associated with this model is available in 256K-byte, self-contained units. A maximum of two of these units is directly attachable, providing 512K bytes of storage. The addressing is conventional and is accomplished without interleaving.

TECHNOLOGIES UTILIZED

The foundation of the central processing unit design is a basic building block or module containing four logical diodes with load resistors, one emitter-follower and inverter transistor, plus three control diodes fabricated on a single substrate. This device houses the primary circuit used almost exclusively in Models 60/62. Variations of diode and transistor arrangements exist within the same circuit family, and these different module types equip the logician with a full array of AND-OR design elements.

The basic block has a nominal delay of 10 nsec. Signal swings vary from +1 to +3 volts. The circuits display good noise-rejection characteristics in that worst-case simultaneous switching is tolerable under maximum loading situations of five-way AND's driving five-way OR's into a ten-load output. Circuit speed has purposely been compromised to improve driving and loading capability. All the circuitry is contained on one module, with the exception of the two collector resistors, which are located external to the module in a resistor pack. With the exception of the components outlined, the circuit configuration is essentially the same as the 30-nsec circuit block.

The local store is not required to accommodate integrated I/O channels; consequently, a favorable trade-off is achieved by structuring it in transistor registers as opposed to ferrite cores.
In addition to reduced cost, faster speeds are obtained: since the store is not destructively read, regeneration time is eliminated. The unit contains twenty-five 36-bit registers. Sixteen are general-purpose registers, eight are floating-point registers, and one is a working register used for variable-field-length control. Since one or more registers participate in the execution of each System/360 instruction, the accessibility and maneuverability of local storage contents is mandatory for good internal performance. Any of the registers can be easily read or modified in 125 nsec, a nice fit for the 250-nsec machine cycle. All CPU manipulations occur within the 250-nsec machine cycle.

A balanced-capacitor read-only storage device, similar to that used by the Model 50, provides logical control for the processor. The cycle time of the read-only storage is 250 nsec, with the output being available 100 nsec after select. Sixteen bit planes of 176 words each provide a total of 2,816 words. Each word is 100 bits wide.

To achieve desired CPU speeds, the read-only storage is supplemented with conventional control hardware. Approximately 500 control lines are generated in all, of which 400 emanate from the read-only storage. Conventional control logic is utilized primarily where sequencing becomes data-dependent and exclusive read-only storage control would require additional cycles to be taken.

LOGICAL ORGANIZATION

The size of the CPU-storage interface varies in width from a byte on Model 30, to a half-word on Model 40, to a full word on Model 50. In the Models 60/62, it is double-word wide. A reduction in the number of storage cycles required for program execution is purchased at the price of hardware. The data path continues double-word throughout. Significant performance improvements in the handling of long-precision floating-point arithmetic and variable-field-length operations are achieved.
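The width-versus-storage-cycles trade described above is easy to make concrete. The interface widths are from the text; the 24-byte transfer is an arbitrary example.

```python
# Storage cycles needed to move a block of bytes across CPU-storage
# interfaces of different widths (widths in bytes, per the text).

import math

WIDTHS = {"Model 30": 1, "Model 40": 2, "Model 50": 4, "Models 60/62": 8}

def storage_cycles(n_bytes, width):
    # one cycle moves `width` bytes, so the count rounds up
    return math.ceil(n_bytes / width)

cycles = {m: storage_cycles(24, w) for m, w in WIDTHS.items()}
```

For a 24-byte stream this gives 24, 12, 6, and 3 cycles respectively, which is the reduction the text says is purchased at the price of hardware.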
A 64-bit instruction-buffer register is fed directly from storage and initially receives all instructions (see Figure 15). Four half-words are loaded and passed, one half-word at a time, through a 16-bit extender register to a 16-bit decoding register where instruction execution commences. As the last of the four half-words enters the extender register, storage is signaled to refill the buffer register. The flow through these three registers neatly overlaps instruction fetching with execution and adds substantially to the internal performance of the CPU.

A parallel adder, 60 bits wide, is the confluence of the data path. Address calculation, arithmetic, shifts, register-to-register transfers, and parity generation and checking are handled in the adder. It is a high-speed unit split into four sections, four groups per section, four bits per group, for look-ahead propagation. Worst-case carries ripple out in 135 nsec. The entire adder output can be held in a latch register equipped with gates which can shift the full sum 4 bits right or left into the latch. Adder input gates provide for left shifts of one or two bits. The adder accommodates, in one pass, the 56-bit fraction arithmetic of long-precision floating-point instructions.

Figure 15. System/360 Models 60/62 Data Flow.

Two 64-bit registers operate in concert with the parallel adder. Both can be directly loaded from storage, and one provides the store path to main core. At times they participate in an action as a 64-bit unit. Frequently they are treated as separate and independent 32-bit registers (identified logically as A-B, S-T). A finer subdivision into 8-bit units can also be effected. Both the A-B and S-T registers have three-position byte counters controlling byte movements.
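The sectioned look-ahead organization described above can be sketched as a two-level carry-lookahead adder: per-group generate/propagate terms first, then carries between groups. This is an illustrative model, not the Model 60/62 logic; the in-group summation below ripples for brevity where the hardware would again use lookahead.

```python
# Two-level carry lookahead over a 60-bit adder, 4 bits per group.

def group_gp(a, b, lo, n):
    """Collapse n bits into one (generate, propagate) pair."""
    G, P = 0, 1
    for i in range(lo, lo + n):
        gi = (a >> i) & (b >> i) & 1          # bit generates a carry
        pi = ((a >> i) | (b >> i)) & 1        # bit propagates a carry
        G = gi | (pi & G)
        P &= pi
    return G, P

def cla_add(a, b, width=60, group=4):
    # first level: per-group generate/propagate terms
    gps = [group_gp(a, b, lo, group) for lo in range(0, width, group)]
    # second level: inter-group carries from the group G/P terms alone
    carries, c = [0], 0
    for G, P in gps:
        c = G | (P & c)
        carries.append(c)
    # each group sums independently once its carry-in is known
    total = 0
    for k, lo in enumerate(range(0, width, group)):
        c = carries[k]
        for i in range(lo, lo + group):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            total |= (ai ^ bi ^ c) << i
            c = (ai & bi) | ((ai | bi) & c)
    return total
```

The point of the structure is that every group's carry-in is available from the group G/P terms without waiting for a bit-by-bit ripple across the full 60 bits.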
By utilizing the 32-bit working register in local store, in addition to the parallel adder paths, any combination of register-to-register transfers between A, B, S and T can be made. The base registers in local storage transmit between the S-T registers. A 24-bit instruction counter and a 24-bit storage address register connect to the parallel adder for loading and incrementing.

An 8-bit serial adder draped with extensive gating is woven into the data flow. It drives a latching register and is byte-fed into the working registers. The many variable-field operations, including arithmetic, work through the serial adder, as do the logical functions AND, OR and EXCLUSIVE OR.

To provide maximum flexibility in present or future high-performance system configurations, physically independent storage and channel units are employed. One common multiplex-type interface serves to permit the attachment of either the 2360 (2 µsec) or the 2362 (1 µsec) storage units to the CPU and 2860 channel. This interface also provides the mechanism for the attachment of multiple 2361 large-capacity storage units. Figure 16 shows that the key to this interface is the cable that serves as the conductor for multiple driving circuits feeding multiple receiving circuits. This permits efficient time sharing of the inter-unit data and address paths under the control of just a few direct-connection signal lines.

SYSTEM/360 MODEL 70

System/360 Model 70 is designed to fill a marketing need for a very-high-performance data processing system. It may serve as a direct functional replacement for any member of the System/360 family, with which it maintains complete program compatibility. The sole constraints on program compatibility are storage size, input/output configuration, and the absence of time-dependent program loops. The fundamental elements contributing to Model 70 performance are: high-speed circuitry, interleaved 1-µsec storage units, and logical organization.
Figure 16. Model 60 Storage-Channel Interface.

CIRCUITS AND PACKAGING

To achieve the desired performance, the Model 70 engineering team specified a circuit family which would permit an effective balance between the basic machine cycle, storage-access time, and storage-cycle time. The optimum cycle time was established at 200 nsec. This, in turn, coupled with machine-packaging constraints, dictated a circuit with a nominal switching time of 5 nsec. The resultant circuit is of the AND-OR-INVERT family. The switching time varies from 4 nsec upward as a function of fan-in and loading, i.e., line length and number of loads. The basic circuit configuration is similar to the 10-nsec AOI but differs in component characteristics. These circuits are packaged in modular form; the modules are mounted on small cards having a capacity for either 12 or 24 modules. A substantial number of these small cards are functionally packaged to achieve high density and relatively short line lengths. Even with a circuit that performed well, extreme care had to be exercised, not only in defining the functional cards but also in card placement. More difficulty was experienced in controlling transmission delays than logical delays.

STORAGE

Main storage for the Model 70 consists of two banks of 1-µsec storage. Each storage bank has a capacity of 256K bytes arranged in 32K double words (64 information + 8 parity bits). Data with even double-word addresses are stored in one bank; data with odd double-word addresses are stored in the other. These two banks are independent of each other, having exclusive drive schemes and data registers. When accesses are made to sequential storage addresses, the storage units operate in an interleaved fashion. The two units cannot be accessed concurrently but must be offset by at least 400 nsec, a restriction imposed by the maximum rate of the storage bus.

LOGICAL ORGANIZATION

The initial design approach was to accentuate the performance of arithmetic operations. Emphasis, however, was also brought to bear on those instructions used heavily in the compiling function, e.g., load, branch, store, and compare. A data flow schematic is shown in Figure 17.

Instruction buffering is provided via two double-word registers, each supplied by an independent bank of main storage. Instructions are executed sequentially, and as the contents of each register are exhausted, it is replenished from storage while the contents of the second register are being used. Some degree of instruction overlap is achieved by operand buffering. A sequencing unit controls the instruction decoding, effective address generation, and operand fetching while the preceding instruction is being executed.
The sequencing unit is not allowed to operate more than one instruction ahead of the execution unit and is not allowed to change any addressable registers while the execution unit is still operating on the previous instruction. This eliminates recovery problems in the event that a branch or interrupt causes the abandonment of the instruction being worked on by the sequencing unit. Address generation is accomplished through the use of a three-input adder in one machine cycle (200 nsec); one input is from the instruction register, the other two from the general-purpose registers specified by the instruction.

The heart of the execution unit is the main adder, which is a 64-bit adder-shifter combination. It has a three-stage full-carry-lookahead scheme which permits an add operation in one clock cycle. The main adder is supplemented with an 8-bit exponent adder for floating-point operations and an 8-bit decimal adder for decimal and VFL operations.

A read-only storage mechanism was not included in the Model 70 since it did not lend itself to the machine organization, especially in the area of required cycle time. As a result, the control functions were implemented through conventional logic design. A further attempt to improve performance was made through utilization of transistor circuitry, rather than core storage, for the general-purpose and floating-point registers. This approach permitted faster access, eliminated regeneration time between fetches, and permitted accessing more than one register at a given time.

In view of the storage interleaving and instruction overlap, a precise estimate of machine performance on a particular program, loop, or subroutine must involve careful scrutiny of instruction and data addresses and instruction sequences.

Figure 17. System/360 Model 70 Data Flow.
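The one-cycle, three-input address generation described above can be sketched as follows. The field names (B, X, D) and the convention that register 0 contributes zero as a base or index follow general System/360 usage; the register contents here are invented.

```python
# Effective-address generation with a three-input adder: displacement plus
# base register plus index register, summed in a single pass with 24-bit
# wraparound.

def effective_address(b, x, d, gpr, width=24):
    base  = gpr[b] if b != 0 else 0     # base field; register 0 reads as 0
    index = gpr[x] if x != 0 else 0     # index field; register 0 reads as 0
    return (base + index + d) & ((1 << width) - 1)

gpr = [0] * 16
gpr[3], gpr[5] = 0x1000, 0x0008
addr = effective_address(b=3, x=5, d=0x0FF, gpr=gpr)
```

All three inputs enter the adder together, so no second cycle is needed to combine base and index.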
CHANNEL

The 2860 Selector Channel is a very-high-performance data channel designed to operate on Models 60, 62 and 70. The performance of this channel, utilizing 30-nsec circuits, permits operation at data rates up to 1.3 megacycles, the rate being measured by the number of 8-bit bytes that pass, via the I/O interface, to or from an appropriate input or output control unit. Any interconnection to input or output control units is via the I/O interface. The channel is of the selector type, permitting interconnection of multiple I/O control units to a single channel. A maximum of 6 channels may be used. Up to 8 control units may be attached to one channel. However, the channel at any given time operates with one, and only one, device of the many attached.

LARGE CAPACITY STORAGE

A considerable asset in achieving throughput improvement is the addition of large capacity storage (LCS) to the Model 70. This storage unit, with a read-write cycle of 8 µsec, may be attached to the system in increments of 1024K and 2048K bytes, up to a maximum of 8 million bytes. These units are attached directly to the storage bus and can be considered an extension of directly addressable main storage. The LCS can also be attached to Models 50, 60 and 62, and has the facility of being shared between any two of these systems.

Figure 18. Model 70 Physical Arrangement.
The channel is "instructed" by the processor system to commence an operation. An operation performed by an instruction may involve any sequence or list of commands to that particular device. After being instructed, the channel independently obtains "commands" and transmits data to or from processor storage until it completes its operation. In operating with storage, the channel shares the bus control unit, used by the processor, to obtain its storage references. The Models 60, 62 and 70 use a double level of priority: in the first level, each channel vies with the other channels attached to that particular CPU for priority, and then in turn vies with the processor for specific priority.

PHYSICAL ORGANIZATION

Figure 18 shows the Model 70 physical arrangement. Each gate in the CPU contains 20 large cards and 4 half-size cards for termination of interframe cables. Two of the gates contain the execution unit; the others contain the sequencing unit. The maintenance-control unit (MCU) frame contains maintenance circuitry, the bulk power supply, and a small amount of the CPU circuitry. Mounting the CPU and the storage units on a common wall helps performance in that it allows short, direct cable connections between them, rather than long under-floor cables. It also means that all installations will have the same cable lengths in these paths.

The 2860 Selector Channel frame is a three-gate stand-alone frame housing three swinging gates, each capable of containing 20 large cards. Power supplies are mounted in the internal column between gates. Since each channel occupies one full gate, up to three channels may be contained in a given three-gate frame.

BIBLIOGRAPHY

1. AMDAHL, G. M., BLAAUW, G. A., and BROOKS, F. P., JR., "Architecture of the IBM System/360," IBM Journal of Research and Development, Vol. 8, No. 2 (April 1964).

2. DAVIS, E. M., HARDING, W. E., SCHWARTZ, R. S., and CORNING, J.
J., "Solid Logic Technology: Versatile, High-Performance Microelectronics," IBM Journal of Research and Development, Vol. 8, No. 2 (April 1964).

3. CARTER, W. C., MONTGOMERY, H. C., PREISS, R. J., and REINHEIMER, H. J., JR., "Design of Serviceability Features for the IBM System/360," IBM Journal of Research and Development, Vol. 8, No. 2 (April 1964).

UNISIM-A SIMULATION PROGRAM FOR COMMUNICATIONS NETWORKS

J. H. Weber and L. A. Gimpelson
Bell Telephone Laboratories, Inc.
Holmdel, New Jersey

I. INTRODUCTION

The design and analysis problems associated with large communications networks are frequently not solvable by analytic means, and it is therefore necessary to turn to simulation techniques. Even with networks which are not particularly large, the computational difficulties encountered when other than very restrictive and simple models are to be considered preclude analysis. It has become clear that the study of network characteristics and traffic-handling procedures must progress beyond the half-dozen-switching-center problem to consider networks of dozens of nodes with hundreds or even thousands of trunks, so that those features unique to these large networks can be determined and used in the design of communications systems. Here it is evident that simulation is the major study tool.

II. SIMULATOR REQUIREMENTS

The type of problem to which this simulator is directed is quite different from many of the queuing and other problems which are the primary application for most simulation programs. Whereas in management simulation and tandem queuing processes (job shop simulations, etc.) problems are characterized by a fairly complex sequence of possible alternatives, the number of demands simultaneously in process is ordinarily not so great as to tax the capacity of the computer. Furthermore, the measured outputs are directly influenced by most or all of the transactions sequenced through the program.

In telephone (and other) traffic simulations, however, particularly of the network type, the possible number of alternatives before any call (demand) is not very great, but the number of calls which are simultaneously in progress is quite large, and their interactions are not predictable. The measure of performance is typically grade of service (or probability of blocking), which is normally in the order of one per cent of the total offered calls. This measure, furthermore, applies to each traffic parcel in the network. For example, if there are 20 nodes in a network, there are 190 two-way traffic items, each with a different demand rate. If the smallest demand contributes only 1/1000 of the total calls in the network at any time, then 1000 calls must be processed for each one from this smallest parcel. A blocking of one per cent should be measured with reasonable reliability on this smallest parcel, and even if 1,000,000 calls are processed, only (1/1000) (1/100) (1,000,000) or ten calls will be blocked, and therefore contribute directly to the measurement on the smallest parcel. Since many networks must be tested using different load levels, it is clear that a primary requirement for a simulator which is to be used in traffic network studies is that it be fast. An improvement in speed of 5 per cent or 10 per cent can be worth many thousands of dollars even for a single study.

The other important characteristic that simulators for this application must have is that they be capable of handling large networks. The toll network in the United States has in the order of 2000 switching centers.
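The measurement arithmetic above reduces to a one-line product: the expected number of blocked calls observed on a parcel is the total call count times the parcel's share of traffic times the blocking probability.

```python
# Expected blocked calls observed on one traffic parcel during a run.

def blocked_calls(total_calls, parcel_share, blocking_prob):
    return total_calls * parcel_share * blocking_prob

# The paper's worked case: 1,000,000 calls, a 1/1000 parcel, 1% blocking.
n = blocked_calls(1_000_000, 1 / 1000, 1 / 100)
```

Only about ten blocked calls bear on the smallest parcel's measurement even after a million simulated calls, which is why raw speed dominates the simulator's requirements.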
Although it is not possible to incorporate even a substantial fraction of this number into a simulation program, the number of nodes which can be accommodated must be capable of exercising all of the proposed routing and control parameters, which can ordinarily not be done with anything fewer than about 20 or 30 nodes, and preferably should be somewhat larger. In practice, the capacity of a simulator is not governed by the number of nodes in the network, but by the number of calls simultaneously in progress, and this maximum must be commensurate with the maximum number of allowable nodes. It would do no good to simulate a 2000-node network which allowed less than one simultaneous call in progress per node pair, since this would not be a realistic situation. This requirement, of course, is in conflict with the speed criterion, since to take advantage of low-speed bulk storage media such as disc files might cause an intolerable slowing down of the program.

III. SIMULATOR CHARACTERISTICS

The simulator can accommodate systems with both direct (line-switched) and store-and-forward traffic, with two nonpreemptive priority levels allowed for each traffic mode. It also allows trunk reservation by mode, priority, or route and can develop its own alternate routing tables according to specified rules. These can then be dynamically changed according to the state of the system. All congestion is assumed to result from trunks only: the switching centers offer no delay or blocking to any call. This assumption, although perhaps somewhat unrealistic in certain circumstances, allows the simulation to operate quite rapidly while facilitating the evaluation of alternate routing patterns strictly on the basis of their basic structure and routing doctrine, without introducing the obscuring effects of particular switching machines. It is felt that the structure information will be common to a large variety of networks, whereas the switching machines are of course unique to each particular system.

The essential characteristics of the simulator are as follows:

1. Two modes of traffic, direct and store-and-forward, are allowed. Direct traffic is served upon arrival, using either a direct or an alternate route. If a direct call is unable to be served immediately, it is considered blocked and is either lost from the system or reattempts after some fixed interval of time. The reattempt interval may be exponentially distributed or constant, and calls may either reattempt after each try or be lost with a given probability. Store-and-forward calls are served immediately, using the direct or alternate route, if possible. If the call cannot be served immediately, it is stored and queued on the most direct route.

2. Each mode of traffic is allowed two nonpreemptive priorities, and these can be distinguished by:
a. Different retrial times for direct traffic.
b. Head-of-the-line queuing for the higher-priority store-and-forward traffic.
c. Trunk reservation procedures in which a certain number of trunks in a group are reserved for high-priority traffic only.

3. The simulator accommodates networks with maximum dimensions of 63 nodes and 1,953 trunk groups. These maxima cannot be simultaneously realized, the primary limitation being approximately 6,000 calls in progress simultaneously. The simulator will handle approximately 500,000 calls per computer hour; this is a maximum speed obtained when a moderately loaded network of about 35 nodes is being simulated (i.e., the number of calls in queues and being retried is small compared with the number of calls in progress). The time required to process the simulator output and produce statistics on the run (included in the above estimate) varies with the network size and may take as long as the simulation itself.

4. The alternate routing procedure which is provided determines its own symmetrical routing arrangement (see Reference 1) based on the shortest paths between each two points, hierarchical routing according to preset rules, or general routing with the route sections read in directly. The routing either remains fixed throughout the run or can be changed at periodic intervals according to the dynamic condition of traffic in the network.

5. The output data are arranged such that information about probability of blocking, delay distributions, trunk usage and several other aspects of the system is summarized over prearranged intervals of time. The results of these summarizations are then printed out, as well as placed on magnetic tape, and selected sections can then be combined using another program to derive means and variances of the appropriate statistics over any group of intervals which is desired.

6. The simulator has the ability to change loads at a linear rate during the course of the run; that is, any load or combination of loads can be made to vary at a fixed rate over a desired period of time, including a step function in which a change is made in zero time.

7. Trunk reservation is set up not only to distinguish between high- and low-priority calls but also between direct and store-and-forward calls: it is possible to reserve some trunks for direct calls only, as well as for the higher priority. Trunks may also be reserved for first-routed traffic at the expense of alternate-routed traffic.

8. Routing is of the "stage-by-stage" variety. As calls progress through the network, their next choice of route is determined only by the condition of the immediately succeeding links. Calls are not allowed to switch through the same node twice, and maxima, in the form of number of links and distance, can be established for each traffic item. It is also possible to allow calls to return to a previous point for rerouting if they are blocked.

IV.
IV. SIMULATOR ORGANIZATION

The Simulator Program (Figure 1) is made up of a number of subprograms, some of which are simultaneously in core and some of which are read in sequentially. This subprogram structure was used in order to accelerate the programming, maximize the availability of core storage for any program, simplify debugging and allow maximum flexibility for future changes. Briefly, the sequential programs are as follows:

Figure 1. Simulator Organization.

PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964

1. The Traffic Generator accepts as input data the point-to-point offered loads, the holding times of the various types of traffic, and the load changes which can be expected during the course of the run. The program generates all the calls which will be used in a simulation run, placing their mode and priority, time of arrival, originating and terminating points and holding time on the Traffic Tape in chronological order. This tape is used as input to the main Simulation Program.

2. The Simulation Program accepts as input the structure of the network in terms of nodes and trunks; trunk reservation information, if any; routing options and limitations; retrial specifications; and the Traffic Tape. It then processes the calls through the simulated system and reports the results on two tapes. One of these, called the Call Record Tape, contains a record of all calls which have been processed; that is, arrival time, service time, holding time, mode and priority, origin, destination and number of links used. The other tape, called the Switch Count Tape, records the results of periodic measurements of all of the trunk groups in the network, reporting on occupancy and reservation levels. These two tapes contain all of the raw output information from the simulation and are used as input to the Output Processor Programs.
The Dynamic Alternate Router Program, which determines the routing pattern as a function of the traffic condition of the network, is alternated in core with the main processing programs, the program not in use being temporarily stored on a scratch tape or disc file.

3. The First Output Processor accepts as input the Call Record Tape and the Switch Count Tape, as well as specifications of the number and length of the time intervals over which the information on the two tapes is to be averaged. It then prepares mean values of blocking probabilities, delay distributions, average delays, trunk usages and links-per-call distributions over the specified intervals. This information is printed out and placed on magnetic tape. By visual inspection of these outputs the program user can then determine the intervals over which the system was sufficiently close to steady state operation to allow longer term averaging. He then can make the appropriate specifications for the Second Output Processor.

4. The Second Output Processor accepts the output tape from the First Output Processor and the interval averaging specifications of the program user, and determines over-all means and variances of all system statistics originally derived by the First Output Processor.

The detailed operation of the several programs mentioned above will be given in subsequent sections.

V. TRAFFIC GENERATOR

The Traffic Generator Program (Figure 2) is run prior to the Simulation Program, and the Traffic Tape which is produced can be reused as required. This procedure realizes a saving in computer time since traffic need not be regenerated to test several network configurations and routing schemes. The Traffic Tape contains an entry for each offered call consisting of essential information such as arrival time, terminal nodes, holding time, etc.
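The tape entry just described can be modeled as a small record (modern Python; the field names are ours, drawn from the fields listed in the text):

```python
from dataclasses import dataclass

@dataclass
class TrafficTapeEntry:
    """One offered call, as written to the Traffic Tape.

    The tape is kept in chronological order of arrival, so the Simulation
    Program can consume calls strictly in time sequence.
    """
    arrival_time: float   # when the call is offered
    mode: str             # "direct" or "store-and-forward"
    priority: int         # one of the two nonpreemptive levels
    origin: int           # originating node number
    destination: int      # terminating node number
    holding_time: float   # how long the call occupies its trunks

def write_traffic_tape(entries):
    """Return the entries in chronological order, as the tape stores them."""
    return sorted(entries, key=lambda e: e.arrival_time)
```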
The calls are generated by using input quantities such as offered loads to generate interarrival times using independent random numbers, selected from a specified distribution (single parameter with unit mean). The program continually searches for the next earliest arrival. Finding that item, a holding time and direction are determined. This information is placed on the output tape. The item then receives a new arrival time, is placed back into the list, and a new search is initiated for the next arrival.

The selection of the earliest arrival is expedited by the use of a technique, suggested by W. S. Hayward, Jr., in which entries in the list are paired, and the earlier of the two arrival times of each pair is placed in a second list. This same process is continued using pairs from subsequent lists until the earliest arrival is selected. The node pair and call type associated with the arrival time can be simply determined.

Figure 2. Traffic Generator.

The economy of this technique results from its ability to change individual items very rapidly without full reconstruction of the lists; in particular, it can select the required call more quickly than would be possible with a full search of the first list.
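Hayward's pairing scheme is, in modern terms, a tournament (selection) tree: the minimum sits at the root, and changing one item's arrival time disturbs only its ancestors. A minimal sketch (class and method names ours):

```python
class TournamentTree:
    """Pairwise-comparison ('tournament') selection of the earliest arrival.

    Leaves hold the next arrival time of each traffic item; each internal
    node holds the earlier of its two children, so the overall minimum is
    at the root and updating one leaf repairs only O(log n) nodes.
    """
    def __init__(self, times):
        self.n = 1
        while self.n < len(times):
            self.n *= 2                      # pad to a power of two
        self.tree = [float("inf")] * (2 * self.n)
        for i, t in enumerate(times):
            self.tree[self.n + i] = t
        for i in range(self.n - 1, 0, -1):   # build internal pairs bottom-up
            self.tree[i] = min(self.tree[2 * i], self.tree[2 * i + 1])

    def earliest(self):
        """Return (item index, arrival time) of the earliest arrival."""
        i = 1
        while i < self.n:                    # descend toward the winning leaf
            i = 2 * i if self.tree[2 * i] <= self.tree[2 * i + 1] else 2 * i + 1
        return i - self.n, self.tree[i]

    def update(self, item, new_time):
        """Replace one item's arrival time and repair its ancestors only."""
        i = self.n + item
        self.tree[i] = new_time
        while i > 1:
            i //= 2
            self.tree[i] = min(self.tree[2 * i], self.tree[2 * i + 1])
```

The generator loop then alternates `earliest()` (emit the call) with `update()` (give that item its next arrival time), exactly the cycle described above, without ever rebuilding the lists.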
During prespecified time intervals in which there are to be load changes, a calculation of each new interarrival time is preceded by a test for the presence of a flag in the listing of the current item of traffic. Should there be one, the interarrival time is modified to produce a linear change in the load.

VI. SIMULATION PROGRAM

A) General Description

The Simulation Program (Figure 3) is composed of a number of subprograms which can be grouped into three categories:

1. Operator Program
2. Routing Programs
3. Record Keeping Programs

The division of tasks between the subprograms was made to allow separate writing and debugging of the programs by a team of programmers. This technique also permits changes in operation to be effected in parts of the simulator without major rewriting of the entire program; for example, several routing schemes are available as "plug-in" units.

The Operator Program maintains a single chronological queue (or linked list) containing all events which occur in the simulator. This includes call arrivals, call departures, call retrials and various instructions to perform control actions. For example, if the event at the head of the queue is the arrival of a new call, placement is attempted; successful placement will move the call back in the queue to a point in time equal to the arrival time plus the holding time; when that time is reached (following processing of intervening events), this call is removed from the network.

In order to maximize the number of calls simultaneously in progress, advantage was taken of the statistics of network behavior in organizing the data structure.

Figure 3. Simulation Program.

Since the chronological queue is the basic data item using most of the memory, the cell size to be used was of critical importance.
It turned out that if a call used only one, two or three links, then three words would be sufficient, but if more links were needed (and a maximum of seven was a requirement for some calls) a larger cell would be needed. In most networks at least 95 per cent of the calls use fewer than four links; so the cell size was set at three words, with the possibility of adding additional words if the particular call required them.

The Operator Program presents a new call to the Direct Router, which determines whether a single-link placement is possible by checking trunk availability for the mode and priority of the new call. If a direct route cannot be obtained, the call is presented to the Alternate Router, which attempts a multiple-link connection. The Alternate Router uses a Trunk and Routing Table (described below) to find an available route for the call. A valid route obtained by either routing program is transmitted to the Call Placer, which updates the Trunk and Routing Table to account for the new call. That call (with its event time changed to its departure time) is returned to the Operator Program for reinsertion into the chronological queue. Upon complete failure to route the call, the Operator will either add the call to a nonchronological queue (for store-and-forward traffic) or, with a new event time obtained by adding a retrial interval to the arrival time, place the call at this event time in the chronological queue (for direct traffic). At the conclusion of the call the Operator presents the call to the Call Releaser, which alters the Trunk and Routing Table to account for the call's withdrawal and checks each trunk group previously employed by this call for the presence of nonchronological queues, reporting these to the Operator.

The Trunk and Routing Table (Figure 4b) contains the occupancy status of each equipped link (Trunk Data) and routing information for each pair of nodes (Routing).
Thus the same table keeps a record of the traffic on individual links and provides full routing information for each pair of nodes in the simulated network. The routing information consists of lists of intermediate nodes to be used for calls which cannot be placed directly.

Figure 4. Typical Trunk and Routing Table.

The Trunk and Routing Table is constructed from the input data by an Initialization Program (Figure 5). For symmetrical routing the program is supplied with point-to-point distances (or other weightings) and the numbers of trunks installed throughout the network. It then determines a number of economical routes in each direction between every two nodes. The first intermediate node to be tried for each route selected, together with the minimum number of links required using that node, is entered into the Trunk and Routing Table.

Figure 5. Initialization Program.
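The paper does not name the route-finding algorithm used by the Initialization Program, so the following is only a plausible modern stand-in: a shortest-path computation (Dijkstra's method) over the supplied weightings that records, for each destination, the first intermediate node to try:

```python
import heapq

def economical_routes(weights, source):
    """Shortest weighted paths from one node to every other node.

    `weights` maps node -> {neighbor: weight}.  Returns, per reachable
    destination, (total cost, first hop out of `source`); the first hop is
    the 'first intermediate node to be tried' entered into the table.
    """
    dist = {source: 0.0}
    first_hop = {}
    heap = [(0.0, source, None)]
    while heap:
        d, node, hop = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                          # stale queue entry, skip it
        for nbr, w in weights.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # the hop taken out of `source` identifies the route's entry node
                first_hop[nbr] = nbr if node == source else hop
                heapq.heappush(heap, (nd, nbr, first_hop[nbr]))
    return {n: (dist[n], first_hop[n]) for n in dist if n != source}
```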
This method of preparing the routing data is especially convenient when large networks are being tested, since the number of alternate routes becomes so large that manual specification is not feasible.

The Initialization Program also determines two lists which facilitate entering the Trunk and Routing Table. Shown in Figure 4a for a network of N nodes, there are N entries in List A, each consisting of the address of the first line of a series of entries in List B. When either trunk availability or routing information is required for a specific call type between nodes X, Y (X < Y, numerically; direction of call is not a consideration at this point), indirect addressing permits use of line X of List A to immediately obtain that line of List B which pertains to the node pair X, Y. List B contains information required to route a call from X to Y.

B) Direct Router

When calls of the following classes reach the top of a chronological queue they are delivered to the Direct Router:

1. New calls.
2. Calls which are to be retried after having previously been blocked.
3. Calls at the head of a nonchronological queue associated with a link on which a call has just been released.
4. Store-and-forward calls which have just completed transmission to an intermediate node after having previously failed to reach their destination.

The Direct Router (Figure 6) must determine the nodes between which a connection is required. If the node pair is unequipped (rapidly determined by a sign-bit test in the Trunk and Routing Table), the Call Data is sent immediately to the Alternate Router. If there is a trunk group installed, the Direct Router enters the Trunk and Routing Table and, using the call type, determines the availability (considering reservations) of the trunk group for this call.
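The List A/List B addressing described above amounts to triangular indexing of unordered node pairs; a sketch with 0-based node numbers (function names ours):

```python
def build_list_a(n):
    """List A: for each low-numbered node x, the address of its first List B line."""
    addr, a = 0, []
    for x in range(n):
        a.append(addr)
        addr += n - 1 - x          # node x pairs with nodes x+1 .. n-1
    return a

def pair_line(list_a, x, y):
    """Address of the Trunk and Routing Table line for node pair (x, y).

    Direction is ignored: the lower-numbered node indexes List A, exactly
    as described for nodes X, Y with X < Y.
    """
    lo, hi = (x, y) if x < y else (y, x)
    return list_a[lo] + (hi - lo - 1)
```

For the simulator's maximum of 63 nodes this yields 63 * 62 / 2 = 1,953 pair lines, matching the maximum of 1,953 trunk groups quoted in Section III.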
The lack of an available trunk sends the Call Data to the Alternate Router, while availability indicates that the call can be placed to its destination, and the Call Placer is supplied with this information. Since every call is offered to the Direct Router and most calls are carried over direct routes, this program was written with particular attention to operating speed. In conjunction with this, the structure of the Trunk and Routing Table was largely determined to facilitate its use by the Direct Router.

Figure 6. Direct Router.

C) Alternate Router

A call is referred to the Alternate Router (Figure 7) only if it has been determined that there is no direct route available. The Alternate Router attempts an immediate multiple-link connection, progressing through the network on a node-by-node basis. Should the Router either exhaust the list of node choices or be unable to find a route with no more than the permitted maximum number of links, a direct call will be returned to the Operator Program for retrial at a later time (a specified per cent of the retrial traffic can be "lost" rather than retried), while a store-and-forward call will be queued on a trunk group. If there is a direct link between the terminal nodes, the store-and-forward call will queue on this link immediately; if there is no such link, the call will queue along the "first choice route"; this is a route selected to have no more links than the minimum number of links specified when the first choice of an intermediate node was obtained by the Alternate Router.

Figure 7. Alternate Router (Without Dynamic Router or Call Back-Up).

"Crank-back" or call "back-up" for direct traffic is available (Figure 8). When used, a call which has been blocked at some point in its routing releases the last link accepted, returning to the previous node where it now tries to reach its destination via another link. The number of links which may be released before each forward progression is an input quantity.

Figure 8. Option 2 Alternate Router (Without Dynamic Router, But With Call Back-Up).

D) Call Placer and Releaser

Upon indication from either router that there is an available route for the current call, the Call Placer (Figure 9) determines which links require updating, enters the Trunk and Routing Table and alters the appropriate data. After updating these records the Call Placer returns the call to the Operator for reinsertion into the chronological queue at the release time.

At the conclusion of a call, the Call Releaser (Figure 10) must update the Trunk Data. Then the call is sent to either the output queue (for transfer to the Output Tape) or to a nonchronological queue on a trunk group (for a store-and-forward call not yet at its destination and still blocked). Finally, the Call Releaser checks each of the trunk groups used by the call just released to determine if there are any nonchronological queues waiting for these groups and presents this information and program control to the Operator.

E) Operator Program

The chronological queue is administered by the Operator Program (Figure 11) using linked lists, which facilitate the maintenance of information in the computer by allowing the ordering of data to be easily altered. In the simulator, a list number is stored with each chronological event. This number is the address of the next event in the queue.
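This linked-cell administration can be sketched in modern Python (the original stored list numbers as machine addresses; class and method names are ours):

```python
class EventPool:
    """Chronological queue plus push-down vacancy list over fixed slots.

    Events live at fixed 'addresses' (slots); each slot stores a link to
    the next event, so reordering or freeing an event only rewrites links
    and never moves the event itself.
    """
    NIL = -1

    def __init__(self, size):
        self.data = [None] * size
        self.next = [i + 1 for i in range(size)]   # initially all slots vacant
        self.next[size - 1] = self.NIL
        self.free = 0                # head of the push-down vacancy list
        self.head = self.NIL         # head of the chronological queue

    def insert(self, time, payload):
        """Take a slot from the vacancy list and link it in time order."""
        slot = self.free
        if slot == self.NIL:
            raise MemoryError("no vacant cells")
        self.free = self.next[slot]
        self.data[slot] = (time, payload)
        prev, cur = self.NIL, self.head
        while cur != self.NIL and self.data[cur][0] <= time:
            prev, cur = cur, self.next[cur]
        self.next[slot] = cur
        if prev == self.NIL:
            self.head = slot
        else:
            self.next[prev] = slot
        return slot

    def pop(self):
        """Unlink the earliest event; its slot goes back on the vacancy list."""
        slot = self.head
        event = self.data[slot]
        self.head = self.next[slot]
        self.next[slot] = self.free
        self.free = slot
        self.data[slot] = None
        return event
```

Moving a call "back in the queue" to its departure time is then a pop followed by an insert at the new event time, with the payload untouched.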
The address (or location in core) of an event remains unchanged from the time it enters the simulator until it is removed (for example, from original entry from the Traffic Tape until the end of processing and placement on the Call Record Tape). The position of a call in the queue, however, is easily changed by altering the list numbers of the preceding event and the event being moved. Vacancies in core are linked together in the same fashion using a push-down vacancy list. Nonchronological queues are similarly constructed by taking blanks from the vacancy list. When calls have been completed they are linked into an output queue which is periodically read on to the Call Record Tape; after this the slots are returned to the vacancy list.

The first of these, called the First Output Processor, reads raw system statistics from the Switch Count and Call Record Tapes and evaluates the mean values of appropriate system statistics over prespecified subintervals of time. These statistics are then printed out and also read onto magnetic tape. The program user can examine the results and, by ascertaining the time periods over which equilibrium was obtained, write the specifications for the second program, called the Second Output Processor.

The Second Output Processor collects the appropriate information from the Processor Tape (generated by the First Output Processor) and determines means, standard deviations, and over-all network statistics for a prespecified number of processing intervals.

Figure 12.
Offered Loads as Used in Generation of Input Tape.

Figure 13. Initial Alternate Routing Tables.

The frequently used sections, such as the Direct Router or the Operator, are basic to the structure either of the system or of the simulator, and changes in these would normally be of a sufficiently fundamental nature to require a substantially new approach. This type of program, then, need not be easily alterable.
This sort of experience leads rather naturally to a categorization of simulation programs according to the following classifications:

(1) Basic subroutines which are called by virtually every transaction. These programs should be efficient but need not be easily modified.

(2) Subroutines which specify the logic to be followed under more unusual circumstances, such as when congestion is encountered. Such routines are critical to the problem, since it is the behavior of the system under such circumstances that is normally the question under investigation. The logic followed in these cases ordinarily can be varied and is under constant assault as new operating procedures are invented and must be tested. These programs, however, must have access to the same data structure as the basic subroutines of class (1).

(3) Programs which perform such functions as preparing the inputs or processing the data. These programs are normally reached only at infrequent times, and can access data which is buffered and altered from the basic data in the simulation. They also may be largely arithmetic in function as opposed to the logical operations generally performed by the programs within the simulation.

It would be desirable in the development of future simulation languages if the capability for using different existing languages for certain sections of the program could be provided. For example, heavily used portions of the program may be best written in a machine language in order to obtain maximum efficiency in speed and space. Others may best be written in a familiar language which is suitable for arithmetic operations (although if a simulation language can incorporate extensive arithmetic functions, this would be useful). The third type of program, however (type (2)), could well be written in some intermediate type of simulation language. The characteristics of the language would have to include at least the following:

(a) It be "natural" in the sense that it be reasonably easy to understand and use, and has a documenting capability to facilitate program changes.

(b) It be capable of interfacing with a program written in machine language and perhaps with a commonly used compiler language such as Fortran. In particular, it must be capable of operating on the same data as do machine language subroutines, with full capability for flexible bit packing and accessing of information which is stored in dynamically changing arrays.
Figure 14. Switch Count Tape Reduction-Mean-For Reporting Interval Number.
0 o. 0 o. 0 O. Q o. 0 O. O. 0 O. 0 O. --.0.... O. 0 O. 0 O. ]00 094 0 O. 0 900 0 100 O. 0.681 158 0.132 0 053 0.829 0.294 0 021 0.024 0 006 0.171 0 0 0 0 906 09. 0 0 0 O. 0 005 0.006 Q 004 O. 0.025 0 0.000 0 O. O. Q O. 0 O. 0 O• 0 o. 0 0 0 0 O. a 1.078 0.928 0.066 0.006 O. O. O. o. . I 146 0 904 0 052 0 039 0 005 D 0 0 1.142 0.868 0.122 0.009 O. ·0.001 O. O. ) 035 0 976 0 012 Q 012 0 0 0 0 1.225 0.B61 0.094 0.019 0.009 0.017 O. o. 1 ?B2 0 BI6 0 10 7 0 056 0 Oll 0 0 0 1.139 0.899 0.081 O. 0.019 O. O. O. 2 196 Q 0 400 0 432 Q 1)9 0 029 0 cj 2.420 O. 0.723 0.168 0.075 0.034 O. O. 2 12.L_-u0~__-l01..o..lL88!1JIL-..JOLla..l].1IJ06~..u.0....J0.w0La.8--L10l&..-_.....IQ~_ _Ol1.o...-_ 1.271 0.794 0.155 0.0.0 0.008 0.002 O. O. I 361 0 150 0 178 0 Olt) Q olg D OlD 0 0 1.026 0.977 0.020 0.003 O. O. O. O. J 057 0 947 0 049 0 004 0 0 Q 0 1.209 0.829 0.139 0.024 0.007 O. O. O. -_ .. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Figure 15. Call Tape Reduction-Mean-For Reporting Interval Number. (c) The implementation should preferably be monitor independent, so that it can be used at installations with nonstandard systems. Failing this, the structure of the translator should be sufficiently well documented so that modifications can be made locally. such a program would provide most of the advantages of a full new language, while at the same time materially reducing the complexity of the program. This would improve the likelihood of its being learned and used, which is the ultimate test of value. (d) Although not essential, if the language is to be used for an entire program, as might be the case for some smaller problems, I/O and arithmetic functions should be included. Special subroutines, such as programs to assemble delay distributions, would be useful in this respect. tion on grounds of space or speed, since critical sections can be done in other languages, and full bit manipulation capability would be provided. 
The above statement of certain desired characteristics of a simulation language in effect sets a rather limited goal, on the grounds that It also would not be excluded from applica- These requirements are quite general in nature, but we do not believe that detailed questions of program structure can readily be deduced from a specific problem of the sort which prompted the writing of UNISIM, nor do we expect that this is an appropriate vehicle for such discussions. 249 UNISIM-A SIMULATION PROGRAM FOR COMMUNICATIONS NETWORKS The essential point, however, is that the programmer_ have available the full capability of the machine if necessary, and a simulation language which is to have application to problems of this sort must make this possible. To our knowledge no language which presently exists and is in general use, provides this capability. Mrs. E. E. Bailey, Miss S. A. Switch, and Mrs. A. Sheehan. REFERENCES 1. "Some Traffic Characteristics of Communications Networks with Automatic Alternate Routing," Bell System Technical Journal, Vol. 41, pp. 769-796, March 1962, J. H. WEBER. X. ACKNOWLEDGMENT 2. "A Simulation Study of Routing and Control in Communications Networks," Bell System Technical Journal, November 1964, J. H. WEBER. The programming of the various sections of UNISIM was accomplished by the following team at Bell Laboratories: Miss G. T. Watling, TRUNKS SUM CARR lEO --'SO""IE=GINNnIN""G-----=EN""OITI"II1NG,.----'f'1l"TfAX"1L--OR?ESf iO..-----,.C"'AR"'R"!""E"D-----=-.,;FUiX-TIME TIME TRUNKS FULL LilAD ACCESS - - --------_.. _-------- ------------------------ ----_ ...__ . . 
CARRIED LIMITED ACCESS PERCENT PANty PERCENT FULL ACCESS PERCENT LIMITED ACCESS 1.00 3000.00 6663 0 ----1001)0-.;001000.00 ~663"----"O 8000.00 11000.00 6663 0 12000.00 15000.00 6663 0 16000.0-'-0_ _1::..9..:.0_00_•..:.0_0_ _ _ _ 66-'6'-.3_ _ _..:.0_ 402.600 76.514 74.636 71.728 80.527 82.218 92.103 92.668 92.013 90.950 90.493 7.891 7.332 7.987 9.:)50 9.507 ~VERALL BEGINNING TIME 1.00 4000.00 8000.00 12000.00 16000.00 ENDING TIME 15000.00 19000.00 5178.997 5365.530 5478.197 TRUNKS RESo FULL 6663 6663 6663 6663 6663 4695.531 ~9n~-99T--·ij:608~-39r--- -364~-600··- 0 0 0 0 0 4765.330 413.667 4879.930 485.600 4_957 .~!._. 5!Q..~g_~ _ - SWITCH C"UNT SYSTEM STANDARD DEVIATUNS lalAL TRUNKS 3000.00 tOOO.OO 11000.00 5098.130 ~CCU- DATA - ItEPeRTING INTERVALS ----------------- SUM CARRIED LIJAD CARRIED FULL ACCESS CARRIED LIMITEO ACCESS 339.522 291.658 214.849 211.104 196.565 199.628 81.048 60. 196 66.274 68.1/5 60.410 235. tt3 229.917 213.305 206.315 --------------------------- eVER-ALL Mt:ANS-CALL DATA TAPE --------------- BEGINNING liME 1.00 4000.00 BLeCK ENDING lIME pUB. AVE. DELAY --- DelAY OISTRIBUTHIN O. 0.5 1.0 1.5 - 31l1rcr.~Ilf---lf~--____u_;_--- 0.-.---0.--. --0. 7000.00 0.009 D. O. o. O. o. -"mnr.0c:r-"Il000;mr-lr.008---1);·~--lf.-·----1)_;;·----O~ 12000.00 15000.00 0.012 O. --n;omr.OO 1900.0.00 0.024 O. O. o. o. o. o. o. O. o. o. U()-:----n-----r.-I0"9 O. 1.103 O. TIME TIME PReB. DELAY o. 0.5 o. - - - - 4OOU.00--rooO;OO---0~014 --1)~-------1J~--~-\l;----8000.00 11000.00 0.013 O. O. o. 1.00 --------12000.00 16000.00 3000.00 0.013 O. -1500cr.:oo----o-.015 - 1).-19000.00 0.017 O. 1.5 o. O. O. AVE. DECAY · 0 ; 1.00 4000.00 - - 8000.00 12000.00 16000.00 O. 0.034 0.026 0.029 0.034 -0.044 O. 0.--· O. ·0. 0.101 0.097 0·.106 0.ll5 0.1l9 ---r';II3---l)·~·8-88· o. 1.124 1.128 O. 0.878 0.874 2.0 0.004 0.004 0.004 0.005 0.006 4 O. O. O. O. O. O. O. j. O. o. o. O. O. o. o. O. O. o. O. o. ~··ClI.L-Uj(TA --------.--~._~ LINK LINKS 3 OlSrRIBUn~N 4 5 O. 
HI SERY FACT0RS BEGINNING ENDING BLeCK - TIME . - -- Tn4F-----PReB~-3000.00 7000.00 lIOOO.OO 15000.00 19000.00 0.892 0.897 0.063 0.033 0.041 0.020 O. O. o. O. 0.·· ll~- -0-;m>l:--lr._1)3{I'O'."'O"'38.--"'....,-,,--,.-----...--...,;.:c--..;:.c:-0.018 O. O. o. O. O. o. O. 0.060 0.032 0.039 0.019 O. O. o. o. - - 0-.------0;- - -0-. - ----·O~ .-- ---0";--··~bZ---lf~()'y32.,--.... O-=-.0;.,4"'0.--;~~---;.-=---....;.:'----...=----;: 0.022 O. O. o. O. O. O. O. O. O. 0.059 0.030 0.037 0.019 o. O. o. O. O. ~ 1.0 ~ LlNKS --·~---------·---·-------------"3V£I\-J.U.NTERVAl:;TANDARODEVTlTIlJNS -1!£GfNNTNG ----EfIDTN(;---·Bl-ocK--/WF;---··_·---'JELlY--DISTRIBUTI"N LX ~~K 01 S JR.I BUTI eN AYE. N0. 2.0 O. O. . O. O. O. DELAY DISTRIBUTIeN 0;5--~---1~0 O. o. O. O. 0.·· O. -D. ~---\l-;--.- o. O. ·r~;-- O. ---_ .. _---------- "2;0 O. o. O. ·0-; --- -0;-- o. O. o. .- O. AVE. Nil. LINK DISrRI8UTI~N L INKr-- --r----.,.-----..----.:---..,------:i~-~ 3 5 0.215 0.220 0.201 0.211 0.189 0.194 ·--o;ns--- -0;ZOS--D.T88 0.219 0.217 - 0.208 0.190 0.205·0~n7 0.031 0.040 O. O. 0.031 O. O. 0.040 0.038 O. o. O. O. o. Figure 16. Overall System Means-Switch Count Data-Reporting Intervals. O. o. O. o. o. o. O. O. o. O. o. THE DATA PROCESSING SYSTEM SIMULATOR (DPSS) (SP.1299/000/01) Michael I. Youchah, Donald D. Rudie, and Edward J. Johnson System Development Corporation Paramus, New Jersey 1.0 INTRODUCTION The D PSS can be used to determine the sensitivity of a data processing system's performance to various system loading or design parameters. In addition, the total system design, including the software and equipment portions, can be subjected to a rigorous analysis and evaluation early in the design process so that key decisions can be made in the areas of: 1. The kind of equipment to be used. 2. The number of each type of equipment. 3. The kind of data processing discipline and strategy required. 4. The projected performance of the system under varying loads. 5. The system's maximum capacity. 6. 
The system's ability to respond as a function of loading, capacity, and environment.

The Data Processing System Simulator (DPSS) is a general purpose computer program that can be used for the evaluation of a proposed new design or a modification to an existing design of a data processing system prior to making equipment selections or performing any significant computer program design. The DPSS can also be used to provide guidance in the design and development of a data processing system during the detailed design stages.

The DPSS was initially designed to meet the needs of analyzing and developing the data processing requirements of the Project 465L Strategic Air Command Control System (SACCS). It has subsequently been generalized to permit its application to other systems in various stages of design and development. Results have shown the usefulness of the DPSS in Project 465L (SACCS) and, in a preliminary manner, its usefulness on the Space Surveillance Project and the New York State Identification and Intelligence System. In all three cases, original concepts about the system's potential performance were evaluated, and new and significant guidance and information were obtained as a result of the use of the DPSS. In the case of the New York State System, the DPSS results showed a whole class of computers to be inadequate for the job.

2.0 DEVELOPMENT OF THE DATA PROCESSING SYSTEM SIMULATOR

In its development, the Data Processing System Simulator has used a higher order simulation language similar to those which have been developed previously for general simulation. However, unlike other techniques, a single combination of these higher order language macro instructions is used in a single logical arrangement, permitting the representation of a wide variety of possible data processing system configurations and processing rules with no additional programming or design.
PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964

As a consequence, the necessity of writing special "event" programs or subroutines, as is usually required with the use of simulation languages, has been eliminated. Further, the flow diagramming, design, coding, compiling, and checkout of each simulator or subroutine created by the use of simulation languages have been totally eliminated. The result is a considerably shortened time required to produce a reliable model of the data processing system to be investigated.

2.1 DPSS Description

The following sections contain a general description of the DPSS and a sample problem illustrating how it can be used.

2.1.1 General Characteristics

The Data Processing System Simulator requires approximately 1,500 instructions written in JOVIAL, a high-order programming language developed by the System Development Corporation. The DPSS was initially designed to run on the AN/FSQ-32-V computer, which is described in Appendix V; however, a new and expanded version of the DPSS has been written to run on the IBM 7094. The AN/FSQ-32-V version of the DPSS requires 15,000 core locations, which are used to store the basic program and the parametric inputs used for each run.

2.1.2 Purposes and Limitations

The DPSS is used to represent (a) the inputs to the system, that is, those message units or informational units which are to be entered into the system, at either local or remote locations, (b) the processing performed by the computer on each input, and (c) the outputs generated by the processing portion of the system based upon the inputs. Processing includes the buffering and the retention of the messages prior to processing. It also includes the specification of the time to load the system with the necessary programs and environment to handle the message, the time to unload, and the time to operate the actual program, as well as the data preparation and data presentation functions.
3.0 DPSS OPERATION-GENERAL

To understand the functioning of the DPSS, a general idea of the key features of the way a possible Data Processing Central (DPC) cycle operates will be discussed. The system configuration to be considered in the example will employ a discontinuous cycle in which interleaving and interrupting are permitted. The material for this example is drawn from experience with several systems and does not represent the design of any single system. See the Glossary of Terms, Appendix III, for an explanation of key terms used in the following paragraphs.

3.1 Data Processing Central (DPC) Operations

The input messages which arrive from various local and remote locations are batched at the DPC (Figure 1) and held until certain criteria for one or more of the batches are reached or exceeded. The specific nature of batching will be discussed subsequently. Once a batch criterion has been exceeded, the Executive program which controls all DPC functions then initiates the processing of the input messages which have been batched. In so doing, it calls in from auxiliary storage the necessary environment and programs for the operation of the processing portion of the DPC cycle. When the processing has been completed, a set of output programs extracts the appropriate data from the data files and prepares the required system outputs.

Figure 1. DPC Cycle Operation. [Diagram not reproduced.]

3.2 Input Batching

An input message batch is characterized by three items: time, size, and interrupt, the latter having two sub-items (see Figure 2). The "time" item indicates that a particular message or group of messages will be accumulated for a given length of time before an indication (cycle initiation request) is given to the Executive or master program that a cycle should start. The second item is the "batch" size.
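The batch bookkeeping introduced here (a time criterion, a size criterion, and an interrupt option, all elaborated in the following paragraphs) can be sketched as follows. This is an invented Python illustration of the idea, not the DPSS code; the class and field names are assumptions made for the example.

```python
class Batch:
    """Accumulates messages until a time or size criterion is exceeded."""

    def __init__(self, time_limit, size_limit, interrupt="no"):
        self.time_limit = time_limit    # seconds to hold the first message
        self.size_limit = size_limit    # message count that triggers a request
        self.interrupt = interrupt      # "no", "immediate", or "wait"
        self.messages = []              # (arrival_time, message_type) pairs

    def add(self, arrival_time, message_type):
        """Collect a message; return the time a cycle request would be issued."""
        self.messages.append((arrival_time, message_type))
        if len(self.messages) >= self.size_limit:
            return arrival_time                     # request because of "size"
        first_arrival = self.messages[0][0]
        return first_arrival + self.time_limit      # request because of "time"
```

For instance, a batch with a 5-second time criterion that receives its first message at 2.16 seconds would request a cycle at 7.16 seconds, as in the worked example later in the paper.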
The DPC will collect messages until a predetermined number of them have been accumulated. This given number of messages must be accumulated in less time than the batch time in order to cause a request for the initiation of a DPC cycle to be made because of "size." Once either of these two values (time or size) has been exceeded, it must then be determined if the interrupt feature (the third item) associated with this particular batch is set to "yes" or "no," and, if set to "yes," whether the "immediate" or "wait" option is set.

There are two cases to consider here: (1) when the DPC cycle initiation request occurs during the operation of an interleaved subsystem, and (2) when the DPC cycle initiation request occurs during the operation of a DPC cycle in progress. When any cycle initiation request occurs during the operation of an interleaved subsystem, the Executive interrupts the interleaved subsystem at the earliest possible moment, regardless of the interrupt setting of the batch, and initiates a DPC cycle. If the cycle initiation request occurs during an on-going DPC cycle, then three things can happen, depending on the setting of the interrupt item and interrupt option.

If the interrupt item is set to "no" for a batch which causes a cycle initiation request to be generated, then an indication will be given to the Executive program that the batch's limits of time or size have been exceeded, but the Executive will not interrupt the cycle in progress. A new cycle will be initiated as soon as the present one has been completed.

If the interrupt item has been set to "yes" and the interrupt option to "immediate," then the Executive program will initiate a new cycle immediately (possibly with certain programming constraints). If the interrupt item has been set to "yes" and the interrupt option to "wait," then the current cycle will be interrupted when the priority of the message causing the interrupt request is equal to or higher than the priority of the message being processed in the current cycle.

The DPSS permits the establishment of as many batches as are required for efficient system operation, assignment and modification of batch characteristics, and the assignment of inputs to each batch.

Figure 2. Batching Concept. [Diagram not reproduced.]

3.3 DPC Task Processing

Messages are processed by tasks within the DPC Program System. The tasks are sequenced according to the priority of the message being processed (see Figure 3). One of the tasks provides outputs from the system upon requests (display requests). This task is shown as the last task in Figure 3; however, there are many logical places where the task could be placed.

When a DPC cycle begins, all of the messages that have been collected in all of the batches up to the time that the cycle begins are transferred from the batches to the task processing area. In the task processing area, the messages lose their batch identity and are processed according to the task sequence.

Figure 3. DPC Task Processing. [Diagram not reproduced.]

3.4 Input Data for Each Simulation Run

The use of the DPSS requires the definition of the program system configuration and the assignment of values for the system parameters as input data. The following set of input data is typical for each run of the DPSS. The sample values (placed within parentheses or in tables) are used to describe a system that will be simulated as an example. Note that if a system does not have some of the characteristics described in the inputs, e.g., batching, then that information can be left out of the input data.

1. The length of the period to be simulated (3600 seconds).

2.
The number of times that this test is to be repeated under the same operating conditions, referred to as the number of cycles in the test (1).

3. The messages to be used in the test (A, B, C, D, E, F).

4. The number of each type of message that will arrive at the DPC. This number is not an absolute number; rather, it determines the relative frequency of each particular message arriving at the DPC. This number is used in conjunction with the traffic rate: (A-10) (B-22) (C-14) (D-31) (E-5) (F-18).

5. The batch criteria for each batch and the messages that are collected in each batch:
a. Time
b. Size
c. Interrupt
1) Immediate
2) Wait

Batch  Message  Time Criteria  Size Criteria  Interrupt Option
1      A        0 Sec.         1              Immediate
2      B, D     5 Sec.         3              Wait
3      C, E, F  10 Sec.        4              No

6. System Tasks-This is a list of the tasks or jobs that the DPC Program System is required to do. The word "job" or "task" means a collection of programs that are used to process a message or set of messages. The performance of a task is measured as follows:

a. "Load Time"-The time it takes the environmental data tables and the operating programs to be transferred from the auxiliary storage to core memory.

b. "Operate Time"-The time it takes, after the environmental data and programs are in core, for the programs themselves to process the messages to assess the intelligence they contain.

In both of these cases, the distribution of the times to be used for this task and the proper parameter(s) for the distribution must be specified. (See Table I.)

TABLE I. SYSTEM TASK SUMMARY

TASK      LOAD Distribution                              OPERATE Distribution
1         Uniform (Min 1 Sec, Max 6 Sec)                 Exponential (Mean .001 Sec)
2         Exponential (Mean 4 Sec)                       Exponential (Mean .01 Sec)
3         Uniform (Min 4 Sec, Max 5 Sec)                 Uniform (Min .02 Sec, Max .02 Sec)
4         Triangular (Min 1 Sec, Peak 3 Sec, Max 8 Sec)  Exponential (Mean 1 Sec)
Display*  Uniform (Min 2 Sec, Max 4 Sec)                 Exponential (Mean 5 Sec)

*Additional task which is performed whenever a system output is generated.

7. Message-Task Relationship-This indicates the messages that are processed by each task. The DPSS will accept any message-task relationship.

Message  Task
A        1
C        2
B, D     3
E, F     4

8. Message-Display Relationship and the Probability of the Forced Display-This is the relationship between messages and forced displays. It indicates which display(s) may be forced as the result of the message's being processed. Any message-display relationship may be established and tested. For each message-display relationship, the probability of a display's being forced is the conditional probability that the display will be forced given that the message has been processed. If a display is forced by more than one message, the probability of a display's being forced may be different for each message. (See Table II.)

TABLE II. MESSAGE-DISPLAY RELATIONSHIPS

Message  Display  Probability of Forced
A        D1       .8
         D2       .3
B        None
C        D1       .6
         D3       .5
         D4       .01
D        None
E        D5       1
F        None

9. Task Sequence-This indicates the order in which the tasks operate. There is no restriction on the order in which the tasks operate. In addition, a task may operate more than once during a cycle.

Task No.  Sequence No.
1         1
2         3
3         2
4         4

10. Traffic Rate-This indicates the total volume of traffic per hour that is arriving at the DPC. By knowing the traffic rate and the relative frequency of each message, one is able to determine the expected number of each type of message that will arrive at the DPC within any time interval. Preplanned changes in traffic rates are permitted during the test (750 messages/hour).

11. Maximum Interleave Time-This is the maximum time available for the interleave subsystem. This time could be zero, in which case the prime DPC cycle would operate in a cyclic fashion (1800 Sec.).

12. The Bookkeeping Time for the Interleave Subsystem-This is the time that is given the interleave subsystem to store and save pertinent data after a DPC cycle has been requested (0 Sec.).

3.5 Results for Each DPSS Run

Each simulation run records and prints out the following items of information:

1. The arrival time of every message that is received by the computer.
2. The time that each cycle is requested.
3. The time that each cycle begins and the reason for initiation.
4. The number of messages that have been collected in each batch at the beginning of each cycle.
5. The time that the processing of each individual message begins.
6. The time that the processing of each individual message is completed (processing complete means that the data files have been updated).
7. For every display that is forced during a cycle, the message that forced the display, the time the message arrived, and the time the display is forced are indicated.
8. For every cycle, the minimum and maximum time required to process each type of message during the cycle.
9. A cumulative average processing time by cycle for each type of message in the system.
10. The time that each cycle ends.
11. A list of messages remaining to be processed at the end of each cycle.
12. The total number of messages that were received and processed by the computer during the simulation period.
13. For every type of message:
a. The number of messages that were expected to arrive during the simulation period.
b. The number of messages that actually arrived.
c. The number of messages processed.
d. The average waiting time for this type of message.
e. The average processing time for this type of message.
f. A histogram which indicates the percentage of messages whose processing time was in each of several one-minute intervals; e.g., the percentage of messages whose processing time was between 0 and 1 minute, between 1 and 2 minutes, ..., 30 to 31 minutes, and 31 plus. The DPSS does not plot the histograms; rather, it supplies the data from which a plot can be drawn.
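The one-minute histogram binning described for item 13f can be sketched as follows; this is an illustrative Python reconstruction, and the function name is an assumption, not part of the DPSS.

```python
def processing_time_histogram(times_sec, max_minute=31):
    """Percentage of messages whose processing time falls in each
    one-minute interval: [0,1), [1,2), ..., [30,31), plus a "31 plus"
    bucket for everything longer."""
    counts = [0] * (max_minute + 1)          # index max_minute is "31 plus"
    for t in times_sec:
        minute = int(t // 60)                # whole minutes of processing time
        counts[min(minute, max_minute)] += 1
    total = len(times_sec)
    return [100.0 * c / total for c in counts]
```

For example, processing times of 30, 95, 70, and 2000 seconds yield 25 percent in the 0-to-1-minute bucket, 50 percent in the 1-to-2-minute bucket, and 25 percent in the 31-plus bucket.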
14. The number of times that each display type was forced.

4.0 DETAILED EXAMPLE OF THE DPSS

The DPSS is described in detail with the aid of an example. This example will consist of a description of the system to be simulated and a detailed account of the arrival and processing of the first few messages. It should be noted that not all of the features of the DPSS are identified in this example. However, enough of them have been identified and described so that the reader can get a good understanding of the model by following the example.

4.1 Description of System Being Simulated

The input parameters identified in Section 3.4 which describe the system being simulated are summarized in four tables: MESSAGE, BATCH, TASK, and SYSTEM tables. The MESSAGE table identifies the messages, their frequencies, the batches in which the messages are collected, and the displays that are associated with the messages. The BATCH table identifies the batches, the batch criteria for each batch, and the message types that are associated with each batch. The TASK table is divided into two parts: LOAD and OPERATE. The LOAD part is used to specify the probability distribution and parameters that are used to determine the length of time it takes to transfer the processing programs and environment from auxiliary storage to core. The OPERATE part is used to specify the probability distribution and parameters that are used to determine the length of time that it takes to process each message after the processing programs and environmental data are in core. The SYSTEM table indicates the length of the test, the rate of the incoming messages, and other pertinent information for the test. See Tables III, IV, V, and VI for this example. The ADD, CHANGE, and DELETE columns are for the convenience of the user when he is making a series of runs and may wish to add or delete displays that are associated with a message, or to change the probability of a display being forced.
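The four input tables just described can be thought of as simple keyed data structures. The sketch below encodes the example's values (from the text and tables) in Python; the dictionary names and layout are invented for illustration and do not reflect the JOVIAL table formats actually used by the DPSS.

```python
# Illustrative encoding of the example's input tables.
MESSAGE_TABLE = {
    # type: (relative frequency per 100, batch, {display: forcing probability})
    "A": (10, 1, {"D1": 0.8, "D2": 0.3}),
    "B": (22, 2, {}),
    "C": (14, 3, {"D1": 0.6, "D3": 0.5, "D4": 0.01}),
    "D": (31, 2, {}),
    "E": (5, 3, {"D5": 1.0}),
    "F": (18, 3, {}),
}

BATCH_TABLE = {
    # batch: (time criterion in seconds, size criterion, interrupt option)
    1: (0.0, 1, "immediate"),
    2: (5.0, 3, "wait"),
    3: (10.0, 4, "no"),
}

SYSTEM_TABLE = {
    "test_length_sec": 3600,
    "traffic_rate_per_hour": 750,
    "max_interleave_time_sec": 1800,
    "interleave_bookkeeping_sec": 0,
    "number_of_test_runs": 1,
}
```

A run of the simulator would then draw everything it needs (frequencies, batch criteria, display probabilities) from these structures rather than from code.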
4.2 The DPSS in Operation

The simulator generates the messages and "sends" them to the computer, in the same fashion as it would receive them in an operating system. This system has six message types. During the simulation period they are "sent" to the computer at a rate of 750 messages/hour.

The incoming system messages arrive according to a Poisson distribution, i.e., the interarrival times of the system inputs are exponentially distributed. If x is the interarrival time, then an exponentially distributed interarrival time has the following form:

    x = (1/a) ln (1/(1 - y))        (1)

where 1/a is the mean of the distribution and y is a uniformly distributed random number between 0 and 1, as determined by the use of a pseudo random number generator. In our case, if we want x in seconds, then 1/a = 3600/750. Suppose that the uniformly distributed pseudo random number y = .362 is picked; then

    x = (3600/750) ln (1/(1 - .362)) = 2.16 seconds

This means that 2.16 seconds after the arrival of the last message (or from the beginning of the test if this is the first message to arrive) another message arrives at the central processor.

The type of message that was received has not yet been determined. Each type of message is assigned a range on the interval (0, 1). The range is dependent on the relative frequency of the particular message type. To determine the message type, another random number is generated and checked to see which range it falls in. It should be noted that the sequence in which the ranges for the various message types are laid out on the unit interval does not in any way influence the simulation process, the reason being that the picked random number is always uniformly distributed over the unit interval. The ranges for the messages in this system appear in Table VII. This means that if, after a message has arrived, the random number picked to determine the type of message is in the interval .47 to .77, then the message will be tagged as message type D.

TABLE III. MESSAGE TABLE

Message  Number Per Test  Batch  Forced Displays  Add      Probability
A        10               1      D1, D2           x, x     .8, .3
B        22               2      None
C        14               3      D1, D3, D4       x, x, x  .6, .5, .01
D        31               2      None
E        5                3      D5               x        1.0
F        18               3      None

(The CHANGE and DELETE columns are empty for this run.)

TABLE IV. BATCH TABLE

Batch  Size  Time    Interrupt Option  Message Associated with Batch
1      1     0 Sec   Immediate         A
2      3     5 Sec   Wait              B, D
3      4     10 Sec  No                C, E, F

TABLE V. TASK TABLE

Task No.  Sequence  Messages Assoc. with Task  LOAD Distribution                              OPERATE Distribution
1         1         A                          Uniform (Min 1 Sec, Max 6 Sec)                 Exponential (Mean .001 Sec)
2         3         C                          Exponential (Mean 4 Sec)                       Exponential (Mean .01 Sec)
3         2         B, D                       Uniform (Min 4 Sec, Max 5 Sec)                 Uniform (Min .02 Sec, Max .02 Sec)
4         4         E, F                       Triangular (Min 1 Sec, Peak 3 Sec, Max 8 Sec)  Exponential (Mean 1 Sec)
Dsp. Tsk            (system outputs)           Uniform (Min 2 Sec, Max 4 Sec)                 Exponential (Mean 5 Sec)

TABLE VI. SYSTEM TABLE

Length of Test: 3600 Sec.
Message Rate: Time (in seconds) 0, Rate of Messages 750
Maximum time interval between completion of a control cycle and request for a new control cycle: 1800 Sec.
Length of time allotted the interleave subsystem to store data: 0 Sec.
Number of test runs: 1

TABLE VII. RANGE OF NUMBERS FOR EACH MESSAGE TYPE

Message Type  Range (Approximate)
A             0 to .10
B             .11 to .32
C             .33 to .46
D             .47 to .77
E             .78 to .82
F             .83 to 1.0

Using the procedures just outlined, the interarrival time, the time of arrival, and the type of each of the first six messages in the system were determined. (See Table VIIIa.)

TABLE VIIIa. MESSAGE ARRIVAL TIMES AND TYPES

Random Number  Interarrival Time  Arrival Time from Start of Test  Random Number  Message Type
.362           2.16               2.16                             .72            D
.023           .11                2.27                             .82            E
.004           .02                2.29                             .02            A
.770           7.06               9.35                             .35            C
.478           3.12               12.47                            .11            B

In addition to these inputs, suppose that the messages arrive at the times indicated. (See Table VIIIb.) The DPSS does not actually generate messages this far in advance; rather, it always generates enough messages to keep the arrival of the last generated message ahead of the time in the processor. The simulator could have been designed to generate all of the messages that will be used in the simulation prior to making the actual run. This is an acceptable method provided that feedback from the central processing unit will not influence the arrival of messages.

TABLE VIIIb. MESSAGE ARRIVAL TIMES AND TYPES

Message  Arrival Time from Start of Test
C        18.4 Sec.
E        19.6 Sec.
F        20.7 Sec.
B        21.3 Sec.
A        21.8 Sec.
D        22.8 Sec.
A        23.7 Sec.
B        25.4 Sec.
C        27.5 Sec.
E        36.5 Sec.

4.2.1 DPC Cycle Simulation-Cycle I (Figures 4 and 5)

The first message (message D) arrives at the computer at 2.16 seconds after the start of the test and is collected in batch 2. It is the first message in the batch. The time criterion of batch 2 is 5 seconds; hence batch 2 will request a data processing cycle at 7.16 seconds. This batch has a "wait-interrupt" option. Message E arrives at 2.27 (2.16 + .11) seconds and is collected in batch 3, which will request a cycle at 10 + 2.27 seconds, or 12.27 seconds. Similarly, message A arrives at 2.29 seconds and is collected in batch 1, which has the "immediate-interrupt" option set. The interrupt occurs immediately and a cycle is initiated. Note that the computer was available for other tasks (other than Executive processing) during the first 2.29 seconds.

As soon as a cycle starts, all of the batched messages are transferred to the task processing area and the time and size criteria associated with each batch are reset. There are three messages to process in this cycle:

Message  Time of Arrival
D        2.16
E        2.27
A        2.29

Figure 4. DPC Cycle Simulation: Message Batching-Cycle I. [Diagram not reproduced.]

Figure 5. DPC Cycle Simulation: Cycle I-Operation, Cycle II-Batching. [Diagram not reproduced.]

The cycle is set so that the tasks 1, 2, 3, 4 operate in the order 1, 3, 2, 4. Task 1 will operate during this cycle, since task 1 processes message type A and at least one message A arrived before the cycle began. The processing time for the task is divided into a load and an operate time. The load time for task 1 is uniformly distributed between 1 and 6 seconds. The length of time that is required to load task 1 this time is determined by picking a uniformly distributed random number and applying it to a formula. To obtain a uniformly distributed random number lying between a minimum "m" and a maximum "M," one picks a uniformly distributed random number on the interval (0, 1) and uses the formula

    y = m + x (M - m)        (2)

where y is the desired value uniformly distributed between m and M, and x is a uniformly distributed random number on the interval (0, 1). In this case, m = 1, M = 6, and if the random number x is .38, then

    y = 1 + .38 (6 - 1) = 2.9 seconds

Hence, it takes 2.9 seconds to load task 1 this time. Since the cycle started at 2.29 and it takes 2.9 seconds to load task 1, the time is advanced to 5.19 seconds.
After the task is loaded, the operate part of the task must be performed. The operate time for task 1 is exponentially distributed with mean .001 seconds. A uniformly distributed random number on (0, 1) is again chosen. The processing time for message A is determined in the same fashion as the interarrival time was chosen for the incoming messages. Suppose that the processing time is determined to be .014 seconds; the simulated time is now advanced to 5.19 + .014, or 5.204 seconds. The DPSS performs the "load" part of the task operation once per task and the "operate" part once for each message that is processed by the task.

The DPSS is an event-oriented simulator; hence the simulated time "steps" from event to event. However, whenever the time is advanced, the simulator always checks to see if any other event occurred or was to occur in the interval between events. Whenever this happens, the simulator "backs up" to take appropriate action.

Message processing for task 1 is now complete. Since message A has two displays associated with it, a check must be made to determine if these displays should be forced this time. The probabilities of forcing the two displays D1 and D2 (Table II) associated with message A are .8 and .3, respectively. Two random numbers are picked, say .94 and .47. Since .94 is not between 0 and .8, and .47 is not between 0 and .3, neither of the displays is forced. If, for example, the random numbers chosen were .63 and .47, then only D1 would be forced.

The next task in the sequence, task 3, operates since it processes messages B and D and one message D arrived before the beginning of the cycle. Task 3 is "loaded" into the computer with the load time determined in the same way as was used to determine that for task 1. Suppose the load time turns out to be 2.71 seconds; the system time is advanced to 7.914 seconds. The "operate" part of task 3 is performed on message D.
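The display-forcing check just described is a per-display Bernoulli trial: a display is forced when its uniform draw falls within its forcing probability. A minimal sketch, with an invented function name:

```python
def forced_displays(display_probs, draws):
    """Decide which displays a processed message forces.

    display_probs: list of (display name, forcing probability) pairs.
    draws: one uniform (0, 1) random number per display.
    A display is forced when its draw lies between 0 and its probability."""
    return [name for (name, p), u in zip(display_probs, draws) if u <= p]
```

Using message A's entries from Table II, draws of .94 and .47 force nothing, while draws of .63 and .47 force only D1, exactly as in the worked example.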
Task 3 has a uniformly distributed "operate" time with the minimum equal to the maximum; hence the "operate" time, .02, is constant. The time is advanced from 7.914 to 7.934 seconds. Task 3 is now ready to generate displays, but since message D does not generate any displays, the cycle moves to the next task in the sequence. Task 2, which processes message type C, is the next task scheduled to be processed; however, no C messages arrived before the cycle began, hence task 2 is skipped during this cycle. Task 4 processes messages E and F, and since an E message arrived before the beginning of the cycle, task 4 will operate. Suppose its load time is 2.6, advancing the time from 7.934 to 10.534 seconds. During the load time for task 4, message C arrived at 9.35. Batch 3 has no interrupt option; however, message C will cause a cycle initiation request to be issued at 19.35 seconds (10 seconds from the arrival of message C). Since no interrupt occurred, the cycle continues, the operate time for message E taking 2.79 seconds. This brings the time from 10.534 to 13.324. During this time message B arrives at 12.47. Message B is collected in batch 2, which has a "wait-interrupt" feature causing a cycle to be requested 5 seconds after its arrival (it being the first message in batch 2), at 17.47, or whenever 2 messages arrive in that batch. The task in progress is continued, and since message E forces display D5 with probability 1.0, the task that forces the display must be "loaded," taking, say, 3.2 seconds, bringing the time from 13.324 to 16.524. Display D5 is now forced, which takes 3.71 seconds, bringing the time to 20.234 seconds. Message A arrives at 16.60 seconds during the "operate" part of the task which forces displays. Since message A is collected in batch 1, a new cycle is requested at 16.60. The simulator has the capability of handling a request for interruption of a logical operation in two ways.
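The batching rules in this example combine a time criterion and a size criterion: batch 2 requests a cycle 5 seconds after its first message, or as soon as 2 messages have been collected, and batch 3 requests one 10 seconds after a message arrives. A hedged sketch of that decision logic (class and method names are mine, not the DPSS's):

```python
class Batch:
    """Collect messages; report when the batch criteria request a DPC cycle."""

    def __init__(self, time_limit, size_limit=None):
        self.time_limit = time_limit
        self.size_limit = size_limit
        self.arrivals = []

    def add(self, t):
        """Record a message at time t; return the cycle-request time, if any."""
        self.arrivals.append(t)
        if self.size_limit is not None and len(self.arrivals) >= self.size_limit:
            return t                      # size criterion met: request immediately
        if len(self.arrivals) == 1:
            return t + self.time_limit    # time criterion runs from the first message
        return None                       # an earlier request is already pending

batch2 = Batch(time_limit=5.0, size_limit=2)   # the "wait-interrupt" batch
batch3 = Batch(time_limit=10.0)                # no size criterion in this example
print(batch2.add(12.47))   # message B: request at 12.47 + 5 = 17.47
print(batch3.add(9.35))    # message C: request at 9.35 + 10 = 19.35
```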
In the first way, the DPSS can honor the interrupt as soon as it occurs; in this case the "operate" part of the task which forces display D5 would have been interrupted. The second way is to recognize the interrupt request and honor it as soon as the current logical operation has been completed. For purposes of this example, the interruption will take place as soon as the current logical operation is complete. Thus the interrupt request will be held until the output processing is complete, at 20.234 seconds; hence a new cycle begins at 20.234 seconds.

4.2.2 DPC Simulator-Cycle II (See Figures 5 and 6)

The following messages arrived during the first cycle and are waiting to be processed in the second cycle:

Message   Time of Arrival
C              9.35
B             12.47
A             16.60
C             18.4
E             19.6

The DPSS has the option of processing the messages in the order that they are received, or according to some sequence which is independent of arrival order. In this system the messages are processed according to a preset sequence, i.e., message A is processed before message B even though it arrived later in time. The two C messages will be processed together, the one arriving at 9.35 seconds being processed before the one arriving at 18.4 seconds. Again, in this cycle, task 1 is the first one to be operated. Suppose the load time is determined to be 3.9 seconds; this advances the time to 24.13 seconds. The following events occur during the load of task 1:

Message   Arrives at   Collected in Batch   Request Cycle at
F           20.7              3                  30.7
B           21.3              2                  26.3
A           21.8              1                  21.8
D           22.8              2                  26.3
A           23.7              1                   -

Figure 6. DPC Cycle Simulation. Cycle II-Operation, Cycle III-Batching.

Message A collected in batch 1 causes a cycle to be requested at 21.8 seconds.
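The preset processing sequence described above can be modeled as a sort key: order first by a fixed type ranking (A before B before C, and so on; the exact ranking is an assumption consistent with the example), then by arrival time within a type:

```python
# Preset type sequence; the A-through-F ordering is inferred from the example.
SEQUENCE = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "F": 5}

def processing_order(messages):
    """Order (type, arrival_time) pairs by preset sequence, then arrival time."""
    return sorted(messages, key=lambda m: (SEQUENCE[m[0]], m[1]))

waiting = [("C", 9.35), ("B", 12.47), ("A", 16.60), ("C", 18.4), ("E", 19.6)]
print(processing_order(waiting))
# A is processed first even though it arrived after B and the first C;
# the two C messages keep their arrival order (9.35 before 18.4).
```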
This request is the "interrupt-immediate" type. However, under the operating rules of this system, the cycle interruption will not take place until task 1 processing has been completed at 24.132 seconds. Note that the second message A, arriving at 23.7 seconds, does not generate another cycle request. The messages that were to be processed by tasks 2, 3, and 4 are queued when the cycle interrupt occurs.

4.2.3 DPC Simulation-Cycle III (See Figures 6 and 7)

The third cycle starts at 24.132 seconds. The batches are again cleared and the time and size criteria counters are reset to zero. Task 1 operates first (there are two A messages to be processed) and is completed at 27.9 seconds. During the operation of task 1, message B arrives at 25.4 seconds and message C arrives at 27.5 seconds. Message B will cause a cycle to be requested at 30.4 seconds, and message C will cause a cycle to be requested at 37.5 seconds. The request from batch 2 (containing message B) will be an "interrupt-wait," that is, the interrupt will occur when all of the messages which are processed before B and D have been processed, but it will not interrupt the currently operating task. In this cycle, task 3 operates after task 1 because two B's and 1 D are waiting to be processed. Processing of task 3 is complete at 31.5 seconds. Now, since B is a higher priority input than those processed by tasks 2 and 4, the cycle will be interrupted after task 3 has been completed. The inputs that would normally be processed by tasks 2 and 4 are queued for the next cycle.

4.2.4 DPC Simulation-Cycle IV (See Figures 7, 8, and 9)

Cycle number 4 begins at 31.5 seconds. The following messages have been collected in the batches during the third cycle or have been queued from previous cycles:

Message   Number
B           1
C           3
E           1
F           1
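The two interrupt-handling modes used in this example can be sketched as a small scheduling loop: an "immediate" request aborts the operation in progress, while the mode used here holds the request until the current logical operation completes. All names and the structure below are mine; the DPSS itself is not reproduced, only the decision rule:

```python
def run_cycle(operations, request_time, immediate=False):
    """operations: (name, start, duration) triples, run in sequence.
    Returns (completed operation names, time the interrupt is honored)."""
    done, t = [], 0.0
    for name, start, duration in operations:
        t = max(t, start)
        end = t + duration
        if immediate and request_time <= end:
            return done, max(t, request_time)   # abort as soon as the request lands
        done.append(name)
        t = end
        if request_time <= t:
            return done, t                      # honor after the current operation
    return done, t

# Cycle II: task 1 load+operate runs 3.9 s from about 20.23 seconds;
# message A requests a new cycle at 21.8, mid-operation.
done, honored = run_cycle([("task 1", 20.232, 3.9)], request_time=21.8)
print(done, round(honored, 3))   # the request waits until task 1 ends
```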
Figure 7. DPC Cycle Simulation. Cycle III-Operation, Cycle IV-Batching.

Figure 8. DPC Cycle Simulation. Cycle IV-Operation.

Figure 9. DPC Cycle Simulation Summary. Cycles I, II, III, IV.

Therefore, tasks 3, 2, and 4 will operate in cycle number IV if no interruptions take place. This procedure is continued until the end of the test.

5.0 DPSS APPLICATIONS

The DPSS has been applied primarily to real time management information and command/control type systems. The initial major emphasis was in the design and development of the 465L SACCS. Approximately 300 production runs were made, simulating 2000 hours of actual operation. While the checkout and installation of the entire operational SACCS program system is not yet complete, significant portions of it have been successfully demonstrated. The results of these demonstrations are classified, but it can be said that the DPSS predicted results were close to actual performance figures.
The initial application of the DPSS to the New York State Identification and Intelligence System during the feasibility study phase showed that a class of computers could not handle the job unless the user drastically changed his requirements. The choice was offered to the user early in the system acquisition process to make the trade-off of dollars versus capability on a more informal basis. As the work on this system progresses, the DPSS will continue to be used to evaluate the various possible system configurations and aid in the selection of the appropriate hardware.

5.1 Application of the DPSS on 465L SACCS

At the outset of the investigations performed with the DPSS, many combinations of SACCS system characteristics were checked because of the complexity of the problem. A list of the major system characteristics checked is shown in Table IX.

TABLE IX. SYSTEM CHARACTERISTICS INVESTIGATED IN INITIAL SIMULATION RUNS

1. Length of Control Cycle
2. Response Time
3. Maximum Message Capacity
4. Age of Data
5. Storage Requirements
6. Average Processing Time Per Message Type
7. Average Waiting Time Per Message Type
8. Queue Lengths Per Message Type
9. Message Priority System
10. Relationship and Sensitivity of the System to Combinations of 1 Through 9 Above

One of the system characteristics initially subjected to detailed investigation was the length of the control cycle. This was done because system response time (the time from the initiation of a request for data until the data was presented) was found to be a function of normal uninterrupted DPC cycle time.* Cycle time was, in turn, found to be a function of many items, such as total message rate, tasks, sequencing, and batches. It was also found to be extremely difficult to predict system response time with a reasonable degree of certainty under a wide variety of operating conditions. To overcome this problem, an "interrupt feature" was introduced into the design, which permits extremely rapid data presentation on demand. However, prior to incorporating this system change, the effect of the interrupt feature on the system was checked, and the "cost" of extremely short response time was dramatically demonstrated (Section 5.1.1 and Figure 11). The costs and other side considerations are discussed in the next section, Results of DPSS Runs and Their Interpretation.

* This evaluation was made prior to the introduction of the interrupt feature, which permits short cycles and fast response, but which causes the average age of data presented and message queue lengths to increase.

5.1.1 Results of DPSS Runs and Their Interpretation

The types of outputs from the simulator runs were as shown in Section 3.5, with some typical results shown in Figures 10 and 11. In testing various possible program system structures, one of the objectives was to attempt to have the average times to process each message type become relatively constant (Figure 10). The average age is checked for a wide range of message loads, and the value at which this leveling out occurs would then represent the expected average age of data in the system by message type.

Figure 10. Min, Max and Average Time to Process Each Type of Message by Cycle.

Message-batch-task arrangements are investigated to find those arrangements which tend to stabilize the average processing time values (with minimum spread) in order to select an optimum program structure. The histogram of message processing time for each message type (Figure 11) for the message-batch-task arrangement is then evaluated.
In the case shown in Figure 11, the effect of the interrupt feature (for display requesting) on system performance was evaluated. The high priority display request technique was introduced to permit the interruption of the system in order to respond to data presentation requests in minimum time. It can be seen from Figure 11 that, for a given set of system loading conditions, 100% of the high priority display requests were honored in one minute or less.

Figure 11. Histogram of Message Processing Time (percent of messages processed vs. time in minutes, for high priority display requests and for low priority data messages; only 1/3 of the low priority messages received were processed).

However, the effect on a low priority data message was dramatic and required further assessment. None of the lower priority data messages was processed in under 28 minutes. Only 5% were processed between 28 and 29 minutes, 10% between 29 and 30 minutes, 35% between 30 and 31 minutes, and 50% of all messages processed took 31 minutes or longer. These results can be interpreted to mean that the high priority data presentation requests which were honored in one minute or less did not make use of the data contained in the lower priority data messages. For the same case, the queue length was determined from the fact that only 1/3 of the lower priority data messages that were received were processed. It was further necessary to determine exactly what type of data was contained in each type of message and what its variability might be. If the lower priority message contained data which varied relatively little over a period of several hours, then the fact that 50% of the data might be as much as 30 minutes old is of considerably less consequence than if the data were extremely variable in less than 30-minute increments. Therefore, by investigating the histograms for each message type, establishing limits on the number of high priority display requests which might be made periodically, and determining the proper message priority and message-batch-task relationship, it was possible to arrive at a program system structure which satisfied Operational Requirements.

5.2 General Results

The results of the simulations showed that the system is most sensitive to the use of the interrupt feature. This "system interrupt" was found to cause short cycle times which could create the impression of highly efficient operation. However, the queue sizes and waiting times were increasing as a result of the increased use of the interrupt features. The total message rate imposed upon the system, with all or most message types being present, was found to be second in importance to the system's operational performance. The presence of all (or most) message types causes a maximum number of input-output (I/O) transfers to occur. I/O transfers are one of the most time consuming aspects of data processing tasks and caused total cycle time to increase as the number of transfers increased.†

It was also determined that the system was not particularly sensitive to the relative frequency of each message type for any given input rate, as long as all or most types of messages were present. This particular aspect of the environment assumed secondary importance based on these results. The investigation of the "time to load" and "time to process" portions of the system's task operating times showed a high proportion of time to load vs. time to process. This suggested that the system's apparent I/O limitations would bear further investigation for possible improvement of overall system operation.

5.3 System Improvements

A disc file is used as the principal auxiliary storage device for the SACCS.
A study of its characteristics suggested that a change from the current serial read procedures to parallel read procedures could produce a considerable reduction of I/O time. A conservative estimate of the I/O time reduction factor attainable was set at 10, because of anticipated engineering problems and also because there are existing hardware items other than disc files which have the desired timing characteristics and do not involve modification of the disc. When only the times to load and unload (I/O) of all system times were reduced by a factor of 10, the results of the DPSS runs showed a very high payoff available for such a modification.

The probable results from the reduction of the I/O time are shown in Table X. In these results, the "rates" refer to the message input rate per hour; the "load" refers to either a "normal" (N) loading and unloading time or a "1/10 normal" loading and unloading (I/O) time. "Interleave" means the availability of DPC time for other operations which might normally occur as part of a time sharing function of the system. The load rates shown are hypothetical and do not in any way reflect actual rates. However, the effect of reducing the I/O time factor on the total system operating time is clearly demonstrated as a function of relative loads.

† This is not necessarily a linear function. Also note that the effect of I/O transfer time on total cycle and response times is a complex relationship, i.e., many I/O's can occur with no responses required, and response time therefore becomes meaningless in this case.

TABLE X. PROBABLE RESULTS FROM MODIFICATION OF I/O DEVICE
(Conservative Assumed Improvement of I/O Time = Factor of 10)

Rate   Load   Interleave   Comment
10     N      0-30%        Acceptable
30     N      None         Poor
10     1/10   80%          All Requirements Met
30     1/10   60%          All Requirements Met
60     1/10   30%          All Requirements Met

It is apparent from the results shown in Table X that auxiliary storage devices with the performance specifications inferred from the reduced I/O time factor would be desirable, to permit maximum time sharing and maximum expansion potential in the system.

6.0 DPSS CAPABILITIES SUMMARY

A summary of the capabilities of the Data Processing System Simulator:

1. System Feasibility Studies
2. Simulates Computer Based Data Processing Systems
3. Evaluate Equipment and Processing Discipline Combinations vs. System Operational Requirements
4. Establish Equipment Configuration for the System
5. Establish Program Configuration for the System
6. Development of Detailed Design Requirements (Operational Program Requirements, Subsystem Design Specifications, Program Design Specifications)
7. Set Initial Parameters for the Operational System
8. Determine System Performance Requirements (for Acceptance Testing)
9. Evaluate Proposed System Modifications and Retrofits before Implementation

The flexibility of the DPSS, in terms of the variability of both the inputs to be tested and the simulation program logic, makes this tool useful in the early stages of establishing data processing system requirements. It is a powerful tool in performing system feasibility studies, in simulating the operation and performance of computer based data processing systems, and in evaluating equipment and data processing discipline combinations as a function of system operational requirements. Once past the initial phases of the system development process, the DPSS can continue to be useful in helping to evaluate the equipment configurations being considered for the system, and in establishing the framework for the computer program configuration for the system. This latter framework includes items such as the need for Executive or master control programs, and the structuring and organization of system inputs, processing tasks, and outputs. During the system implementation and acquisition phase, the DPSS is of continuing usefulness in the development of detailed design requirements.
The key design features for Operational Program Requirements (OPR), Subsystem Design Specifications (SSDS), and Program Design Specifications (PDS) can be determined and established as design goals. This work, in addition, is valuable in setting the initial parameters for the operational system. The DPSS can be used to develop System Performance Requirements (SPR), which can be used at the conclusion of the system acquisition phase, during which acceptance testing is performed. The SPR's can be established early in the design process and used by both the contractors and the procuring agencies as performance criteria for determining the successful completion of the design and implementation phase. Proposed system modifications and retrofits can be evaluated before commitments are made for additional equipment, computer programming, or human action requirements. This evaluation is essentially the performance of the system feasibility studies discussed earlier in this section. Thus, the design, development, installation, and acceptance testing procedure can be completed by providing the analytic capability needed to continually refine and improve any system in existence or proposed for future development, without becoming so deeply committed in time and dollars that prohibitive rework costs are incurred, as is currently the case in the field of command/control and management information systems development.

7.0 EXPANDED DPSS CAPABILITIES-MODEL C

The experience gained by applying the DPSS to three major management control systems in different phases of development (see Section 1.0) identified areas in the original version of the DPSS which could be further generalized and expanded, so that it can be used to simulate a greater variety of data processing systems. The expanded DPSS-Model C is now operational on the IBM 7090/7094 computer.
DPSS-Model C is a self-contained program package; i.e., it does not require the use of a control program. In this way, the operation of the program is "streamlined" so as to reduce the computer time required to make each run. In addition to the capabilities of the original DPSS described in Sections 1.0 through 6.0 inclusive, DPSS-Model C has the following additional features:

a. The DPSS-Model C program is structured so that one programming task has the ability to create inputs for other tasks.

b. Associated with each message-task relationship is the probability of the message being processed by each of its associated tasks. A probability of 1.0 ensures that a message will always be processed by a particular task.

c. In order to accommodate transient states in the input message frequencies, DPSS-Model C permits changes in the relative frequency of input messages during the simulation run. This means, for instance, that the DPSS can simulate changes in the mode of operation of the system.

d. DPSS-Model C has the capability of simulating several levels of system operation, where one level has priority over another. Each level can be considered as a system in itself: messages are collected in batches, processed in tasks according to some processing discipline (e.g., first come, first served, or according to a task sequence), generate displays, etc. Every level may, but need not, have identical characteristics, except that one has priority over the other. When going from one level to another, the simulator has the capability of aborting the lower priority level, either immediately or at the end of a logical operation (e.g., completion of a task), going to the higher level, doing the required processing, and then returning to the point of interruption. Up to 100 levels can be requesting service at the same time, with the highest priority serviced first, the next highest priority second, and so on.

e.
The simulator has the capability of selectively emptying the individual batches for each data processing cycle. This will operate so that only the messages collected in the batches whose batch criteria are exceeded before the beginning of the cycle will be processed in that cycle. The simulator has the option of transferring to the processing area, for processing in the next cycle, only those messages which were collected in batches whose batch criteria were exceeded before the beginning of the cycle. The rest of the messages remain in their respective batches until their batch criteria have been exceeded. This capability enables the DPSS to simulate more than one independent system with a priority arrangement, such as might be found in a time sharing system.4

f. It is possible to delete messages in the system as a result of other messages being processed. This capability covers conditions when "partial updating" of files is specified in early messages in the system, and where a "complete update" message is received which includes the conditions cited in the "partials." In this way, the simulator will not duplicate the processing of any message.

g. The capability is provided to regulate task operation so that a task will operate only once in every "n" cycles; "n" must be specified in the input data.

h. The DPSS has the capability of resetting the computation counters (e.g., those counters used to compute average message processing time, maximum processing time, etc.) at a prespecified time after the beginning of the test. With this feature, it is possible to study systems in transition, during steady state conditions, or in a combination of transient and steady states.

i. The DPSS has the capability of providing three sets of periodic summaries throughout the test. The summaries can be presented for every message-task relationship in the system, using three time periods.
For example, the average processing time for a given message being processed in a given task may be presented every 30 seconds, every 25 minutes, and every hour.

j. The concept of generating or forcing displays as a result of messages being processed has been generalized as follows: (1) There can be any number (limited only by the capabilities of the simulator) of tasks that produce forced displays. Each display producing task has its own processing characteristics. (2) The tasks that produce forced displays need not follow the task which processed the data messages which caused the display to be forced; indeed, the tasks which force displays can be located anyplace in the task sequence.

k. Additional capabilities have been added to the DPSS to permit detailed investigation of the effects of limited buffer size on the functioning of a data processing system. The DPSS also has the option of "losing from the system" those messages which attempt to enter a full buffer. The DPSS identifies those messages which have been "lost," or it can exercise the option of queuing these messages and not losing them.

l. The manner in which outputs from the simulator can be presented has been made more flexible, so that only the necessary or desired information will be produced.

m. The processing of messages by tasks has been modified to permit more flexible and detailed simulation of the operation of a task. Task processing is divided into three sub-operations: (1) performing the input operation, (2) processing each message associated with the task, and (3) performing the output operation. Each of the sub-operations has its own distribution function and parameters. In addition, the DPSS has the option of performing a "save data" operation in the event of an interruption during the operation of a task. When this option is exercised, the "save data" part of the task operation will be executed before the request for interruption is honored.
The "save data" operation has its own distribution function and parameters. There may be a "save data" operation associated with each task in the system.

8.0 FUTURE DEVELOPMENTS

Future developments of data processing simulators will most likely be along the lines of multi- and parallel-processing, and will include prediction techniques so that the time required to develop a data processing system under various configurations can be studied.

APPENDIX I
SAMPLE PRINTOUT

This appendix contains a sample printout from a simulation run. The output data contains the following items of information:

1. The cycle number, its begin and end time.
2. For each message arrival: the message identity, time of arrival, the batch in which the message is batched, the task that processes the message (mod category), the priority of the message, the time when the processing of the message was completed, the displays that are forced by this message, and the time that each display is forced.
3. A list of messages that have not been completely processed at the end of each cycle.
4. Averages by batch, priority, and message of:
   a. Processing time
   b. Total time
   c. Waiting time
5. Number of messages queued at the beginning and end of each cycle.
6. A list of messages remaining to be processed at the end of the test.
7. A total time distribution by minute, per message, at the end of the run. It tells, e.g., what percent of messages were processed between, say, 10 minutes and 11 minutes.

DPC SIMULATION-RUN NUMBER 1

CYCLE REQUEST AT 7.65   REASON SIZE
BEGIN CONTROL CYCLE NUMBER 1 AT 22.65
BEGIN CYCLE NUMBER 5 AT 638.48
MESSAGE QUEUE LENGTH-565

BATCH SIZES
BATCH NR.   0   1   2    3    4
SIZE        0   0   1  373   60
TYPE MOD   ARRIVAL   OUT      BATCH   PRIORITY   TOTAL TIME   WAIT   FORCED IDENT   FORCED TIME
5 2        643.19    664.29   3       0          13C          0
           382.50    693.63   3       1          21           21
2          568.09    568.09   2

CYCLE INTERRUPT AT 813.67   REASON WAIT
END CYCLE NUMBER 5 AT 834.86

690 MESSAGES REMAINING
TYPE MOD   ARRIVAL   BATCH   PRIORITY
24 4       1.52      3       1
40 4       4.41      3       1
21.11  5.40  311.13  311.04  1  809.52  A02  2  809.57  A02P

AVERAGES BY BATCH
BATCH NR   WAIT      TOTAL TIME
0          NO MESSAGES
1          NO MESSAGES
2           56.50     56.56
3          227.53    227.65
4          253.24    253.30

AVERAGES BY PRIORITY
PRIORITY NR   WAIT      TOTAL TIME
0               8.57      17.82
1             233.19     233.25

AVERAGES BY MESSAGE TYPE
TYPE MOD   NR PROCESSED   WAIT     TOTAL TIME   MAX TOTAL TIME   MIN TOTAL TIME
0          NO MESSAGES
1 2         2             275.82   275.88       346.82
2 2         1             233.24   233.29       346.82
3 2        18             245.29   245.35       390.05           169.80

TOTAL TIME DISTRIBUTION
FROM  0 MIN. TO  1 MIN.   .33
FROM  1 MIN. TO  2 MIN.
FROM  2 MIN. TO  3 MIN.   .33
FROM  3 MIN. TO  4 MIN.
FROM  4 MIN. TO  5 MIN.
FROM  5 MIN. TO  6 MIN.
FROM  6 MIN. TO  7 MIN.
FROM  7 MIN. TO  8 MIN.
FROM  8 MIN. TO  9 MIN.
FROM  9 MIN. TO 10 MIN.
FROM 10 MIN. TO 11 MIN.
FROM 11 MIN. TO 12 MIN.
FROM 12 MIN. TO 13 MIN.
FROM 13 MIN. TO 14 MIN.
FROM 14 MIN. TO 15 MIN.
FROM 15 MIN. TO 16 MIN.
FROM 16 MIN. TO 17 MIN.
FROM 17 MIN. TO 18 MIN.
FROM 18 MIN. TO 19 MIN.
FROM 19 MIN. TO 20 MIN.
FROM 20 MIN. TO 21 MIN.
FROM 21 MIN. TO 22 MIN.
FROM 22 MIN. TO 23 MIN.
FROM 23 MIN. TO 24 MIN.
FROM 24 MIN. TO 25 MIN.
FROM 25 MIN. TO 26 MIN.
FROM 26 MIN. TO 27 MIN.
FROM 27 MIN. TO 28 MIN.
FROM 28 MIN. TO 29 MIN.
FROM 29 MIN. TO 30 MIN.
FROM 30 MIN. TO 31 MIN.
FROM 31 MIN. TO           .33

APPENDIX II
THE USE OF THE RANDOM NUMBER GENERATOR

1. The Random Number Generator

The random number generator

X_{n+1} = 129 X_n + 227216619 (mod 268435456)

is used to generate numbers between 0 and 1 which are statistically tested to be uniformly distributed. That is, the numbers

y_n = X_n / 268435456

lie between 0 and 1 and satisfy various statistical tests for being uniformly distributed (0, 1). The first number of the sequence is gotten by letting X_0 = 0, whence X_1 = 227216619.

2. Uniformly Distributed Random Numbers

Suppose it is required to obtain a uniformly distributed random number lying between m and M. If one obtains a number x uniformly distributed (0, 1), then the converted number

y = m + x(M - m)   (3)

is uniformly distributed (m, M). In practice this is used if the only information available is the minimum and maximum of a random variable.

3. Normally Distributed Random Numbers

A table of the normal distribution with mean 0 and standard deviation 1 is assumed stored in core. Suppose a random number y is required with normal distribution of mean μ and standard deviation σ. Then (y - μ)/σ is a normally distributed random variable with mean 0 and standard deviation 1. To choose such a y, a number x, uniformly distributed (0, 1), is chosen. Using x, the corresponding Z is found from the table of the normal distribution (mean 0, standard deviation 1). Now,

Z = (y - μ)/σ   (5)

Hence,

y = μ + σZ

and y is the desired number.

4. Exponentially Distributed Random Numbers

An exponentially distributed random number x has the distribution

P[x ≤ t] = 0 for t < 0;   P[x ≤ t] = 1 - e^(-at) for t ≥ 0   (6)

where 1/a is the mean of the distribution. Since negative values of t have no application in the simulator, it will be assumed here that t ≥ 0. To obtain an exponentially distributed random number, a random number y, uniformly distributed (0, 1), is chosen and set equal to 1 - e^(-ax). Hence,

y = 1 - e^(-ax)   (7)

and, solving for x,

x = (1/a) ln(1/(1 - y))   (8)

5. The Triangular Density Function

Sometimes, besides the minimum and maximum of a random variable, some "favorite" value is known. This suggests the use of the triangular density function. Let m, M be the minimum and maximum and K the "favorite" value.
The density function assumes the triangular shape shown in Figure 12. The area of the large triangle, mPN, is equal to 1. To obtain a number with such a density function, a random number X uniformly distributed (0, 1) is chosen, and a number Y is computed so that the region designated by A has area X.

Figure 12. Triangular Distribution

6. Arbitrary Continuous Distribution

The random number generator enables the user of the simulator to utilize almost any distribution he desires. The basic requirements for such a distribution are:

1. that it be continuous
2. that the random variable be bounded.

Condition 1 is usually the case. A distribution function that has a discontinuity does not usually occur in a simulation, and can, if necessary, be approximated by a continuous distribution. Condition 2 requires that the user specify that the value of the random variable not exceed (so far as the approximation is concerned) some specified value. Suppose Figure 13 represents some distribution. Various values (in this case x1, x2, x3, x4) are chosen from the distribution to be approximated. Let F(x) be the original distribution. Then

P1 = Prob[x ≤ x1]   (9)
P2 = Prob[x ≤ x2]

and so on until

P4 = Prob[x ≤ x4]   (10)

Suppose a random number x, uniformly distributed (0, 1), is chosen. Then if 0 ≤ x ≤ P1, X is obtained by linear interpolation between the lower bound of the random variable and x1 (11), where X is the desired value. If P1 ≤ x ≤ P2, then

X = x1 + ((x - P1)/(P2 - P1)) (x2 - x1)   (12)

and so on, testing whether x lies between two successive P's and then using linear interpolation to determine the value between the two points.

APPENDIX III
GLOSSARY OF TERMS

The following definitions are included for the reader who may not be familiar with the terminology of the document.

BATCH A device which is used to request a DPC cycle.

BATCH CRITERIA Parameters associated with each batch which determine when a batch will request a DPC cycle to be initiated.
DATA PREPARATION A task which processes every incoming message to make sure that it is valid. The Data Preparation Task also determines which tasks must operate in the present DPC cycle.

DISPLAY A presentation of information contained in the DPC program system files.

DPC CYCLE INTERRUPT A DPC cycle interrupt occurs when all of the messages of the present control cycle are not processed before a new cycle begins.

DPC CYCLE REQUEST A request for a DPC cycle when the batch criteria of a batch are exceeded.

DPC PROGRAM SYSTEM CYCLE A sequence of tasks that are performed to process the system message.

DPC PROGRAM SYSTEM (OR CYCLE) That part of the system within the Data Processing Central (DPC) which deals with the primary processing functions.

Figure 13. Arbitrary Continuous Distribution

FLASH MESSAGE A message that is processed immediately upon receipt by the computer. This message does not cause a new cycle to be initiated.

FORCED DISPLAY A display which is generated as the result of a message (data) being processed.

INTERARRIVAL TIME The time between the arrival of two successive messages.

INTERLEAVED SUBSYSTEM Any subsystem other than the primary real-time program system.

LOAD TIME The time required to transfer the operating programs and data environment associated with a task from tape, drum, or disc to core.

MESSAGE An input into the DPC program system.

MOD The position of a task in a sequence of tasks (the task sequence number).

OPERATE TIME The time required by the operating programs to process a message once the operating programs and environmental data are in core.

PRIORITY A measure of the importance of a message.

PROCESSING TIME The time from when processing of a message is begun to when it is completed.

RANDOM NUMBER A number from a random number generator, tested for certain statistical properties.
REQUESTED DISPLAY A display which is generated as the result of a special message being processed (a display request message).

TASK A related collection of programs (considered in the DPSS as a single operation) which operate on a message or a set of messages.

TOTAL TIME The time from the arrival of a message at the DPC to completion of message processing.

WAIT TIME The time which begins when a message arrives in the DPC and ends when processing of this message begins.

APPENDIX IV
MACRO INSTRUCTIONS

The logical operation of the model is governed by the interpretive program. The operation of the simulator can be changed by changing the flow of the interpretive program. The flow diagram that is currently being used in the simulator is shown in Figure 14, which illustrates the interpretive program as it is read into the computer. The first few instructions of the interpretive program are explained below. The last line of the flow reads

STOP IA

This indicates that the flow diagram data is complete and that the program is to execute the instruction labeled IA first. This instruction is found on line 1:

IA QMSGAR IB ID

IA is the instruction label. QMSGAR is the instruction. The Q usually designates that the instruction asks a question; in this case, QMSGAR asks whether any messages have arrived. If yes, go to IB; if no, go to ID. Suppose a message has arrived; then one goes to IB:

IB AGNMSG IA FIN

AGNMSG begins with an A, which usually designates an action. The instruction states that a message is to be generated, and one then goes to IA or FIN depending upon whether there is more to do in the simulation or not. If no message has arrived, one goes to ID:

ID QFLASH IE IC

QFLASH asks whether any flash messages are to be processed. If there are, go to IE; if not, to IC.

Macro Programming Instructions

The instructions QBEGCC, QMSG, QSTART are macro programming instructions.
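The label/question/action flow just described can be sketched as a small table-driven loop. The labels and instruction names (IA, QMSGAR, AGNMSG, QFLASH) are from the text, but the predicate and action bodies here are illustrative stand-ins for the DPSS routines, and the sketch simplifies AGNMSG to always continue at its first target:

```python
def run(flow, state, start, limit=100):
    """Walk the flow table; each entry is (instruction, yes-target, no-target)."""
    label, trace = start, []
    while label != "FIN" and len(trace) < limit:
        trace.append(label)
        instr, yes, no = flow[label]
        if instr.startswith("Q"):      # Q...: ask a question and branch
            label = yes if ask(instr, state) else no
        else:                          # A...: perform an action, then go on
            act(instr, state)
            label = yes
    return trace

def ask(instr, state):
    if instr == "QMSGAR":              # have any messages arrived?
        return bool(state["arrivals"])
    if instr == "QFLASH":              # any flash messages to process?
        return bool(state["flash"])
    raise NotImplementedError(instr)

def act(instr, state):
    if instr == "AGNMSG":              # generate (here: consume) a message
        state["arrivals"].pop()
    else:
        raise NotImplementedError(instr)

# The flow lines quoted in the text; ID's real successors are IE and IC,
# routed to FIN here only to keep the sketch closed.
flow = {"IA": ("QMSGAR", "IB", "ID"),
        "IB": ("AGNMSG", "IA", "FIN"),
        "ID": ("QFLASH", "FIN", "FIN")}

trace = run(flow, {"arrivals": ["m1"], "flash": []}, "IA")
```

Because the flow is data, changing the simulator's logic amounts to editing the table rather than the interpreter, which is the point made above.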
Figure 14. Interpretive Program Flow Diagram

The list of macro instructions and their meanings is as follows:

1. DUMP This instruction gives an octal dump.

2. QBEGCC This instruction asks whether, at the time the instruction is to be operated upon, the system is at the beginning of a cycle. The beginning of a cycle consists of the tasks of loading and operating the data preparation tasks.

3. ABEGPC This instruction performs the loading and operation of the data preparation task only at the beginning of the cycle.

4. QTASKS This instruction inquires whether any more tasks are to be done in the cycle and sets the indicators to the next task done by the cycle.

5. AENDCC This instruction resets various indicators so as to signal the end of a cycle. It also gives a summary of the output data for that cycle.

6. QSTART This instruction inquires whether the system is at that moment at the start of a task, i.e., whether the program for that task was loaded.

7. QMSG This instruction inquires whether there are messages in the table for the particular task to operate upon.

8. LDTASK This instruction "loads" the task in question, i.e., determines how long it takes to load the task and increments the simulated clock.

9. APCMSG This instruction processes the appropriate incoming message according to processing priority.

10. QBEGFD This instruction inquires whether the forced display task should be "loaded."

11. ALDDP This instruction "loads" the display task.

12. QFORCE This instruction inquires whether there are forced displays to process.

13. APCFRC This instruction processes the appropriate forced display.

14. QMSGAR This instruction inquires whether any messages have arrived.

15. AGNMSG This instruction generates messages if any are due to arrive.

16. QFLDSP This instruction inquires whether any of the flash messages generate a display.

17. QFLASH This instruction inquires whether there are any flash messages to process.

18. APCFLS This instruction processes flash messages.

19. QCC This instruction inquires whether the cycle is operating.

20. QINT This instruction inquires whether the cycle is to be interrupted.

21. QFLLD This instruction either "loads" or processes the flash display task, whichever is appropriate.

22. ABEGCC This instruction begins a cycle.

23. QREQ This instruction inquires whether a message will arrive before the request for the initiation of a cycle.

24. AABORT This instruction aborts the interleaved subsystem.

25. FINISH This instruction finishes the run, summarizes the data, and goes on to the next run, if any.

APPENDIX V
SUMMARY DESCRIPTION OF THE IBM AN/FSQ-32 COMPUTER 4

The AN/FSQ-32 computer is a 1's-complement, 48-bit-word computer, with 65,536 words of high-speed (2.5 μsec cycle time, minus overlap) memory available for programs, and an additional 16,384 words of high-speed memory available for data and input/output buffering; the latter memory is called input memory. There are four core memory banks (of 16K words each) which are individually and independently accessible by three control units: the central processor unit, the high-speed control unit, and the low-speed control unit. High-speed I/O, low-speed I/O, and central processing can take place simultaneously out of different memory banks, or, with certain restrictions, out of the same memory bank.
Characteristics of the AN/FSQ-32 Storage Devices

Device                     Size        Word Rate                     Average Access Time
Core Memory                65K         2.5 μsec/wd
Input/Output Core Memory   16K         2.5 μsec/wd
Magnetic Drum              400K        2.75 μsec/wd                  10 msec
Disk File                  4000K       11.75 μsec/wd                 225 msec
Magnetic Tapes             16 Drives   128 μsec/wd (high density)    5 to 30 msec *

* Depending on whether the tape is at load point, and whether it is being read or written.

REFERENCES

1. H. M. MARKOWITZ, B. HAUSNER, and H. W. KARR, Simscript: A Simulation Programming Language, RAND Corporation, Santa Monica, California, 1962.

2. General Purpose Systems Simulator II Reference Manual, IBM, 1963.

3. C. A. KRIBS, Building a Model Using Simpac, System Development Corporation, Santa Monica, California, TM 602/300/00, November 15, 1962.

4. J. I. SCHWARTZ, E. G. COFFMAN, and C. WEISSMAN, A General-Purpose Time-Sharing System, System Development Corporation, Santa Monica, California, SP-1499, April 29, 1964.

5. P. PEACH, "Bias in Pseudo Random Numbers," Journal of the American Statistical Association, Vol. 56, No. 295, September, 1961.

6. M. I. YOUCHAH and D. D. RUDIE, A Universal DPC Simulator Applied to SAGGS Program System Design, System Development Corporation, Santa Monica, California, SP-924, July 8, 1963.

7. D. D. RUDIE and M. I. YOUCHAH, The Data Processing System Simulator (DPSS), System Development Corporation, Santa Monica, California, SP-1299, March 23, 1964.

THE USE OF A JOB SHOP SIMULATOR IN THE GENERATION OF PRODUCTION SCHEDULES

Donald R. Trilling
Westinghouse Electric Corporation, Pittsburgh, Pennsylvania

The following describes some techniques under development at the Steam Division of the Westinghouse Electric Corporation. This plant, located at Lester, Pennsylvania, manufactures large steam turbines, and its main facility is an exceptionally large job shop. It is the locale where many of the concepts discussed below underwent development.
However, it should be made clear that this paper is not in any way intended to be a progress report on their use there. The nature of these techniques remains highly experimental, and they are described here as a matter of interest to those who are concerned with the potential of computers in management applications.

I. BACKGROUND

The technique of job shop simulation is becoming increasingly well-established in industry, and the advent of the macro simulator languages, such as SIMSCRIPT 7, assures continued movement in this direction. To date, job shop simulators have been used principally as a tool for facilities planning 1, 2, and as a testing mechanism for the merit of various decision rules 3, 10. Typically, a model of the shop is set up in the core of the computer, having a similar configuration of men and machines as exists in the real shop or could exist in a contemplated shop. It is supplied with data on manufacturing orders representing the coming load in the shop. The simulator processes the orders much as the real shop would. By seeing how the model shop fares in processing its load, we gain some insight into how the real shop will perform in processing an equivalent load. The model shows specifically how machines will be loaded, the extent of the queues that form, and when orders may be expected to be completed compared to the schedules set for them. If management is considering a change in facilities, or a change in procedures, many of the implications of these changes may be learned in advance by trying them out in the model. The comparison of different simulations is made on the basis of certain shop statistics, reported out periodically as the simulation proceeds. These statistics are well known, and follow the pattern set by the original GE-IBM Job Shop Simulator 4. Such measures as machine utilization, average waiting time in queue, average queue lengths, and order lateness are given. A sampling of these reports is included in Appendix I. To the original set we have added such measures as shop hours, overtime hours, machine substitution counts and some others.

The knowledge gained by simulation, in experiments such as outlined above, is quite useful for many scheduling decisions. However, in general, we may state that this use extends only to what might be called the guiding of scheduling decisions. It does not directly assist in the preparation of actual schedules. The idea that the technique of simulation may be turned to such a use is shown below. The job shop simulator devised at Westinghouse Steam Division has been named SHOP-

[Sample reports, reproduced as figures in the original: "Shop Statistics-Title Page, showing name, manning, dispatch rule, and re-routing table for each machine group" and "Periodic Shop Performance."]
[Sample report, reproduced as a figure in the original: "Short Term Schedule," listing orders by machine group with standard hours, drawing numbers, part descriptions, and routings.]

rules tends to implement certain policies, at some rejection of others. This applies equally well to the schedule analyst, who is also operating under a set of decision rules. If some criteria could be found for establishing a cost of being late, then schedules could be truly optimized and probably fully determined by the computer. Additional research in this area could yield rich returns.

One of the most important functions of the schedule analyst is to learn the sensitivity of the system to changes in different variables. Assume he wishes to accelerate schedules, and is following the aforementioned procedure of adding men to a machine group with a troublesome queue. As one would expect from the lessons of queueing theory, or marginal economic analysis, each successive addition of a man should result in a small absolute reduction in the average queue delay there. In turn, depending on where the jobs are routed to after they leave that machine group, the reduction in delay may result in substantial, little, or no reduction in general aggregated order lateness. At some point it will pay to divert the next intended man increment to some other troublesome machine group. Depending on its queue, its level of utilization, and its importance as a path in the order networks, the overall system may be highly or hardly sensitive to changes in capacity of a given machine group. While it is often startling to see the difference wrought by the addition of one man-shift at certain machines, it can also be very revealing to see the fruitlessness of another placement elsewhere. It will be up to the schedule analyst to know these sensitivities.

One procedure, which is the converse to the one above, has also been tried to advantage. Here, the analyst begins with the simulation of a shop where all possible capacity is available, no matter what the utilization percentages. This gives an upper limit to the possible schedule performance of the shop. From this point, successive selective cutbacks can be made on the machine groups showing low utilization in the Shop Performance report. Again, as utilizations go up, the extent to which the cutbacks are continued, with their commensurate deterioration in schedules, would be a matter of local management decision. This decision would be based on comparison of the cost of the retained man to the reduction in lost time attributable to him.

The above procedures depict a situation which appears to incur extensive computer costs, since repeated runs would be made in order to establish the proper balance between load and capacity. Actually, in practice this will not be the case. If, for instance, new para-schedules were generated weekly, then much knowledge gained from last week's run can be put to use in the current run. In addition, orders don't suddenly get into trouble. They either start out in trouble, or work their way into it gradually.
In most cases there will be sufficient lead time for the analyst to observe the impact of his corrective measures in succeeding simulations. Para-schedules can also be influenced by changes in decision rules, such as dispatch rules, subcontracting policies, and others. However, their effects are studied by separate experiments. If sensitivity to their change proves to be high, then they should be programmed to respond internally as the schedule analyst would manipulate them externally.

It has been shown above that by manipulating machine capacities in the model, and manipulating the priorities within the model, the para-schedules generated are being manipulated. Coming problems are recognized and corrective actions tested weeks or months before the time they will actually be encountered in the shop. If it is impossible to find a para-schedule in which all of the orders meet all of their due dates, then a pretty strong case is made that some commitments will have to be changed. All of the due dates simply cannot be met. Again, the model is available to help pick and choose. All this is done without the slightest gamble in the shop itself. When the analyst has arrived at the best resolution of limited capacities and disappointed customers, he adopts the para-schedule which represents the decisions which gave the best result. At the same time he has in hand the major instrument for implementing these decisions: the para-schedule itself and the dispatch sequence which should produce it.

Dispatch Lists and the Fuzzy Future

The dispatch list is the sequence in which operations are to be performed on each machine. An example is seen in Appendix II under the title "Short Term Schedule." The jobs are listed in the order in which they would be dispatched, with start time, job times, and other pertinent data. They may be printed for any interval of time, but what the time span for the Short Term should be is a matter particular to the shop being simulated.
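The assembly of such a dispatch list can be sketched as follows. The record layout (machine group, order, simulated start time, job hours) is hypothetical, not the actual report format; the machine-group and order identifiers are borrowed from the sample reports:

```python
from collections import defaultdict

def dispatch_lists(operations, horizon):
    """Group simulated operations by machine group and list them in the
    order they should be dispatched (by simulated start time), keeping
    only those falling inside the short-term horizon."""
    lists = defaultdict(list)
    for op in operations:
        if op["start"] <= horizon:
            lists[op["machine_group"]].append(op)
    for ops in lists.values():
        ops.sort(key=lambda op: op["start"])  # earliest start dispatched first
    return dict(lists)

# Hypothetical simulator output records.
ops = [
    {"machine_group": "LOMC116", "order": "J0666G01", "start": 612.4, "hours": 23.0},
    {"machine_group": "LOMC116", "order": "J0029G01", "start": 605.6, "hours": 31.7},
    {"machine_group": "FITF116", "order": "J0983G01", "start": 611.0, "hours": 9.2},
]
lists = dispatch_lists(ops, horizon=650.0)
```

The horizon parameter is what the text calls the time span of the Short Term; the right value is particular to the shop being simulated.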
The dispatch list is the tool to be used in the shop to implement the desired para-schedule. In theory, if the shop followed it precisely, the para-schedule adopted would be met precisely. Obviously, however, this cannot be done. From the instant the list is prepared, things will be happening in the shop to prevent this. Parts may not arrive. Men may report in sick, leaving machines unmanned. Work may be scrapped. Metals may be extra hard, and machining therefore might take a little longer. Nevertheless, in general, the lists may be followed. If the dispatcher always goes to the highest remaining job on the list, he will be implementing the adopted para-schedule.*

This leads us to the question: For how long into the future are these lists useful? Failure to do one job at the appointed time can have perturbations throughout the shop, since other parts of the list for other machines are based on that job being done at the proper time. One failure compounding another and another can lead to substantial misalignments, when it comes to trying to sequence the machines as the list says. However, the actual results should not be so bad. Here are some reasons why. Suppose that the job missed was to have proceeded on to a machine where a large bottleneck exists. In such a case, the expected queue time itself at the bottleneck machine will act as a buffer, allowing extra time for the work to actually arrive at the machine before the dispatch time appearing on the dispatch list. On the other hand, suppose the missed job was to have proceeded on to a relatively slack machine. Such a slack machine would be quite adaptive, and capable of handling the work whenever it did arrive, or shortly thereafter. Similar reasoning would hold for machine groups with high traffic. If many machines are working on many jobs, then machines empty and are ready for new work frequently, and at any such point previously scheduled work arriving late will be accommodated, tending to put it back on schedule.

* At the El Segundo Division of the Hughes Aircraft Co. a job shop simulator prepares daily dispatch lists for their Fabrication Shop. They are apparently being used with great success. See 12.

One situation which is not easily resolved is the case of a machine group of one or few machines that processes very long operations. Here openings are comparatively rare, and when a scheduled job hasn't arrived to be dispatched, a high-level human evaluation is called for. Tying up the only machine for a long time on the wrong job could cause a substantial deviation from the intended schedule. Obviously at the present state of the art, although dispatch sequences are prepared in great detail, they still require human supervision. The need is reduced, but it still remains. Now, when the question is raised of how far into the future dispatch lists are good, it really means for how long they will continue to require less supervision, rather than more. This remains to be seen.

Feedback and Relations to Support Units

The setting of due dates for shop orders begins as a matter of management preference. Once established, the standard backward scheduling procedure can be used to determine due dates for supporting components and a scheduled start date for each operation.* The scheduled start dates are used by the dispatch rules in the simulator. It was seen above that the simulated results may indicate some due dates cannot be met, but the formal act of rescheduling an order may be forestalled, depending on how its dispatch priority within the simulator is influenced. However, the simulator results also bear a relationship to the components included in the order which come from external sources, such as support shops or aisles, or outside suppliers. The para-schedule acknowledges them fully, by means of the "Call-out" mechanism. If information is received that a component will be late, the computer program holds up the order in the model until the time
If information is received that a component will be late, the computer program holds up the order in the model until the time * Rowe's flow allowance 9 could be used nicely here, since information on queue time distributions would be available from previous simulations. It is obvious from the lateness figures that the sample reports reflect runs where the simulator is pitted against a backward schedule having a theoretical minimum of slack. These figures should in no way reflect on the ability of the described facility to deliver their orders on time! 287 the component is expected to arrive. The Order History Diagnostic shows the expected arrival times of the called out components, and shows the slippage, if any, suffered because of delay introduced by them. This means that any component holding up an order is brought to the attention of the analyst so that appropriate pressure may be applied to the source while there is still time. There is still another aspect to this. If an order is running very late, either because of late components or bad queue experience, and nothing can be done about it, then judicious resetting of the due dates for the components may be warranted, especially for those coming from support shops that are already overloaded. New due dates would be based on the required times indicated in the para-schedule. v. PROBLEMS OF IMPLEMENTATION As may be expected, a large number of problems loom in trying to set up such a system. We will touch on a few of them. Supporting Systems The validity of the answers generated by the simUlator, as in all systems, depends in large measure on the amount of information available to it. This information classifies easily into three parts: configuration information about the shop, manufacturing information on orders comprising the load, and third, the intersection of the first two, which is present shop status. The first consists of such things as machines available, manning, scheduled down-time, and overtime plans. 
This information is converted into parameters for the simulator. Such data is comparatively easy to capture. The second set of information consists of all necessary data on all the orders that compose the coming load in the shop. Depending on the size of the shop, this data can be quite massive. It is embodied in a very large "work ahead" file, and the maintenance of this file requires a number of supporting systems. Most of its information is acquired directly from the manufacturing line-up. For the bulk of the items in a job shop, the transfer of this data into the "work ahead" file is not difficult. Because simulations often run far out into the future, and because job shops must fabricate many products which are newly designed, very often orders must be included in the simulation which are booked or are forecast, but for which no manufacturing data is yet available. Therefore a system of representing this work to the simulator is needed. Several systems have been devised. One is based on composing prototype manufacturing line-ups from similar orders previously completed. Another is based on using Monte Carlo to generate samples from a transition matrix constructed from historical line-ups.* The use of component availability dates was discussed above. Such information necessitates still another system to capture these availability dates from the support shops and outside suppliers involved. The shop status information may be quite difficult to acquire. Historically, in job shops most operational information is not in computer-sensitive form. But if simulation is to be used with the precision that has been inferred, then at simulation beginning, the model shop has to be set up to reflect as accurately as possible the present status of the real shop. Such information can come from an extensive data collection system. A final word about supporting systems.
No device, no matter how precisely it processes data, will generate valid answers if the input data is not valid. Everywhere that supporting systems fail to supply useful information, subjective evaluations begin to reappear. There will always be a need for some interpretation and evaluation when appraising the para-schedules generated. Unfortunately such interpretations will grow more and more subjective as the model gets less and less information to work with. If too much interpretation is required, the model has defeated its own purpose as far as generation of schedules is concerned.

Frequency of Simulations

In discussing dispatch lists, it was brought out that the shop must be expected to go considerably astray of the sequences generated. In addition, it may be expected that the older the dispatch lists become, the less valid they will be. With the passage of time, more and more things will happen to disturb the schedule. These would be things that neither the simulator nor any other device could have anticipated. The questions arise then: a) How often should the model be set up again, based on the latest information, and a new simulation run? and b) How far into the future should the simulation run? Much depends on the uses to which the answers are to be put. Generation of dispatch lists should be frequent, but need not extend too far into the future. Long-range simulations for load leveling, manpower planning, and facilities planning would extend very far into the future, but are run infrequently. In the simplest sense these considerations are a function of the production cycle time, the number of operations done per day, and the accuracy of the data. There can be, in addition, many other factors such as engineering changes and repair orders, whose effects are more difficult to determine. At present, the answers must be determined empirically for the shop in question.
Unrepresented Production Patterns

As may be expected, the usefulness of the results of the simulator will be colored by how closely the model is an analogue of the real production patterns in the shop. A case in point is the working of operations out of sequence. The manufacturing line-up implies that this should not be allowed, yet sometimes foremen on the shop floor find cases where they can circumvent this restriction to advantage. The simulator does not recognize such a possibility at all. Other places where the simulator fails to accurately represent some standing shop practices are in lap phasing, bumping, and the saving of set-ups. Simulators can be written which will do such things. The problem is that the cost of capturing the necessary information which they need is prohibitive.

Replication

Each run in a simulation is only one sample from the joint distribution describing possible results from the shop complex. Clearly a greater number of samples, or replications, will improve the accuracy of the predictions embodied in the para-schedule.* The displays in Appendices I and II reflect the results of a single run, but could easily represent the results of many replications. It is not a difficult data processing problem to merge the results of many runs, and format average figures (with accompanying dispersion measures) instead of a single value.

Unfortunately, the number of runs required to allow for a reasonably stated prediction with very modest confidence limits is prohibitively high in cost.** Naturally, in an industrial environment the cost determines the use. Because of this, the number of replications will be low, and scientifically justified statements on accuracy will be limited. However, it is felt that this does not entirely impugn the usefulness of the simulator as an operational tool, for several reasons. The first is the above-mentioned fact that queues tend to act as buffers. Another is that, given a feasible schedule to shoot at, management will put a marked bias on the play of otherwise random events in the shop. Finally, the dispatch sequence generated is at least likely to lead to the goals represented by the adopted para-schedule. Some dispatch sequences which wouldn't lead to the desired schedule have been revealed in prior runs, and have already been eliminated by manipulation of the capacities and priorities in the model. A large number of possible sequences may yet remain which might lead to equally desirable para-schedules, but it is questionable how much would be gained by seeking them out.

* How much is not certain. See B.
** The proper number of replications per simulation could be in the thousands. For example, see 14.

Scaling

A major limitation of the scheduling scheme may appear with the inability to fit some shops into the core of the computer. LeGrande reports on problems of this type in 6. The problem revolves only partly around the number of machines in the shop. A more predominant consideration is the amount of load which must be represented. Each sub-order is on the average half-finished, and all of the unfinished part is in core. Thus, one-half the average number of operations per sub-order, times the average number of orders on the floor, gives the minimum number of operations which must be kept in core during simulation. The program can require from one to two words of core per shop operation, depending on the complexity of the networks. To this must be added the core required for programs (upwards of 8K) and core for tables, which is approximately

44G + 3M + 2F + 154 + Σ (i = 1 to G) (q̄ᵢ + kσᵢ)

where G is the number of machine groups, M is the number of machines, F is the maximum number of jobs which will be forced into a queue overflow zone when the queue area for a particular machine group is full, q̄ᵢ is the mean number of jobs in queue for the i-th machine group, σᵢ is the standard deviation of the number of jobs in queue for the i-th machine group, and k is an arbitrary constant which trades room for speed when searching for jobs in queue. There are several remedies which may be tried when the model shop becomes too large for core, but they are too involved for discussion here.

VI. CONCLUSION

Job shops by their very nature make automatic procedures difficult. Yet it is hoped that these experiments will lead to efficiently mechanizing one of the most difficult of the shop's control problems: the schedule-sequencing determination. For the first time it appears possible that at one central logical control point the entire shop, with all of the interaction among orders competing for facilities, can be examined as a unified whole. Ideally, this should reduce the need for segmenting the scheduling problems and assigning their resolution to the various departments where they occur. This has been done in the past because of the great complexities involved. Those familiar with this practice know that it adds to lead times and encourages suboptimization.

We have outlined above a method that might be considered somewhat unorthodox relative to current scheduling concepts, since the detailed sequencing determines the schedule, instead of vice versa. The full extent of its value remains to be proven. We have tried to accurately portray the problems that exist in implementing such a scheme. It should be pointed out that most of these problems are not especially difficult, or beyond the capabilities of present practices. They are typical implementation problems.

REFERENCES

1. BARNES, W. E., "A Case History on Job Shop Simulation," Proceedings, 16th Annual S.A.M.-A.S.M.E. Management Engineering Conference, April 1961.
2. BURKHART, L.
J., "Application of Simulation to Production and Inventory Control," 15th Annual S.A.M.-A.S.M.E. Management Conference, April 1960.
3. CONWAY, R. W., and MAXWELL, W. L., "Network Scheduling by the Shortest-Operation Discipline," Dept. of Industrial and Engineering Administration, Cornell University.
4. I.B.M. Math and Applications Dept., "Job Shop Simulation Application," M&A-1, I.B.M., White Plains, N. Y.
5. KERPELMAN, H. C., "Solution to Problems of Assembly and Disassembly Operations in a Job Shop," 19th Annual Meeting, Operations Research Society of America, May 1960.
6. LEGRANDE, EARL, "The Development of a Factory Simulation Using Actual Operating Data," Management Technology, May 1963.
7. MARKOWITZ, H. M., HAUSNER, B., and KARR, H. W., "SIMSCRIPT: A Simulation Programming Language," RM-3310-PR, RAND Corporation, Santa Monica, Calif., November 1962 (also Prentice-Hall, 1963).
8. MORGANTHALER, G. W., "The Theory and Application of Simulation in Operations Research," Chapter 9 of Progress in Operations Research, Vol. 1, R. L. Ackoff, ed., New York, John Wiley & Sons, 1961.
9. ROWE, A. J., "Toward a Theory of Scheduling," Report S.P.-61, System Development Corp., Santa Monica, California.
10. ROWE, A. J., "Application of Computer Simulation to Production System Design," in Modern Approaches to Production Planning and Control, R. A. Pritzker and R. A. Gring, eds., New York, American Management Association, Inc.
11. SISSON, R. L., "Sequencing Theory," Chapter 7 of Progress in Operations Research, Vol. 1, R. L. Ackoff, ed., New York, John Wiley & Sons, 1961.
12. STEINHOFF, H. W., JR., "Daily System for Sequencing Orders in a Large Scale Job Shop," 6th Annual ORSA/TIMS Joint Western Regional Meeting, Orcas Island, Wash., April 1964.
13. TRILLING, D. R., "Job Shop Simulation of Orders that are Networks," Westinghouse Electric Corporation, Pittsburgh, Pennsylvania.
14. VAN SLYKE, R.
M., "Monte Carlo Methods and the PERT Problem," Operations Research, September-October 1963.

HYTRAN* - A SOFTWARE SYSTEM TO AID THE ANALOG PROGRAMMER

Wolfgang Ocker and Sandra Teger
Electronic Associates, Inc., Princeton, New Jersey

I. INTRODUCTION

In recent years much attention has been given to combined analog/digital computation, dividing the problem at hand into an analog and a digital part and letting each task be performed most economically by the part of the system which suits it best. This philosophy applies not only to simultaneous analog and digital computation, but also to a sequential use of these two means of computation. One such application is the use of digital computers in the programming and checking of analog computers, a task ideally suited to a digital machine and especially practical in a hybrid installation where the digital computer is most readily available to the analog programmer.

HYTRAN, a system of software programs, has been developed to provide quick digital assistance in the programming of the analog part of the HYDAC 2400 hybrid computer system**, even to the analog programmer unfamiliar with digital computers. Since the function of the HYTRAN system is to replace or complement certain manual procedures, it is necessary to briefly consider the manual method of analog programming and checking on which it is based 1. This method features a two-way static check and can be broken down into the following steps:

(1) To comply with the voltage range of the analog computer, every problem variable is multiplied by a scale factor to form the machine variable in volts.
(2) Parameters stated implicitly have to be evaluated.
(3) A computer diagram is prepared, showing the analog hardware implementation of the equations, component modes, and the expressions represented on potsheets and amplifier sheets.
(4) Potentiometer settings are determined by evaluating the constant expressions of the scaled equations.
(5) To check proper scaling and the correctness of the computer diagram, an off-line static check is carried out as follows:
(a) The theoretical calculations are performed by substituting into the original equations a test initial condition for each variable that is represented by an integrator output, and solving for their highest derivatives.
(b) After the chosen test initial conditions are scaled and entered into the computer diagram as integrator output voltages, the programmer calculates and records on the diagram all component outputs using the voltages present at their inputs.
(c) The computed voltages representing the highest derivatives are compared with their respective theoretical values, any discrepancies are traced back to their origin and corrected, and finally the static check voltages are recorded on the potentiometer, multiplier, and amplifier sheets.
(6) The analog patch-panel is wired according to the computer diagram, and the potentiometers are set.
(7) To insure correct patching and operational hardware, all amplifiers are read out on-line and compared with their respective computed static check values.

* A Service Mark of Electronic Associates, Inc.
** The HYDAC 2400 includes the digital DDP-24 and the analog PACE 231R.

HYTRAN has been written to process digitally the following rather tedious routines of analog programming:

(1) The calculation of the theoretical static check values, voltage static check values, and potentiometer settings.
(2) The performance of the static check itself, both off-line and on-line*.
(3) The complete documentation of the analog program.

Figure 1. Programming of an Analog Computer. (Flow chart: scale equations, draw diagram, compute parameters, compute pot settings, off-line static check, patching, set pots, on-line check — divided into manual operations and HYTRAN operations.)

Figure 1 shows the steps required to program an analog problem with HYTRAN. The scaling of the physical equations and the preparation of the computer diagram are still performed by the programmer, who thereby maintains direct control over the analog implementation of the problem. In order to permit the calculation of the theoretical static check values, HYTRAN must be given the original problem statement and a set of test initial conditions. To calculate the static check voltages and to perform the static checks, the input has to include the patching according to the computer diagram, component settings or modes, and the highest derivatives with their corresponding integrators and scale factors. Expressions representing other component outputs may optionally be inputted to aid in the automatic pin-pointing of errors during the off-line static check.

All inputs are punched on paper tape in an analog-oriented format, which uses patch-panel terminology and allows complex algebraic expressions. HYTRAN outputs include the conventional amplifier and potentiometer sheets, cross-referencing and symbol tables. Potentiometer settings and static check values are put out as typewriter documents and on paper tape. The format of this tape allows automatic setting of potentiometers as well as automatic read-out and check of static check values by means of the ADIOS input/output desk. For a more thorough check of the analog set-up and hardware, the ADIOS tape of measured static check values can be processed by HYTRAN, providing a rapid means of locating mispatching or component failures (Figure 2).
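The two-way static check that HYTRAN automates can be illustrated numerically for the oscillator equation y'' = -w²y. The component names, scale factors, and test values below are invented for illustration; real HYTRAN input uses the patch-panel-oriented tape format described later in the paper.

```python
# A minimal numeric sketch of the two-way static check for y'' = -w**2 * y.
W = 10.0             # problem parameter omega
S = 5.0              # amplitude scale factor (volts per problem unit)
Y0, YD0 = 0.5, 2.0   # test initial conditions for y and y'

# (a) Theoretical check values: substitute the test initial conditions
#     into the scaled problem equations.
theoretical = {
    "A00": S * YD0,        # integrator representing S*y'
    "A01": -S * W * Y0,    # integrator representing -S*w*y
    "A02": S * W * Y0,     # inverter representing S*w*y
}

# (b) Voltage check values: propagate the IC voltages through the
#     (simulated) computer diagram, component by component.
def propagate(ic_a00, ic_a01):
    a00, a01 = ic_a00, ic_a01   # integrators read out their IC voltages
    a02 = -a01                  # unity-gain inverter fed by A01
    return {"A00": a00, "A01": a01, "A02": a02}

# (c) Compare the two independent sets; a disagreement isolates a scaling
#     or patching error to a specific component.
circuit = propagate(S * YD0, -S * W * Y0)
discrepancies = [n for n in theoretical
                 if abs(circuit[n] - theoretical[n]) > 0.01]
```

Because the two value sets are derived independently (one from the equations, one from the diagram), agreement checks both the scaling and the diagram at once.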
II. STATIC CHECK

* Since the use of an ADIOS desk for input/output to the analog computer is of particular advantage with HYTRAN, this discussion presumes its availability for the automatic setting of potentiometers and performance of the on-line static check from data punched on paper tape.

The practice of computing two independent sets of check values has been used as a basis for the HYTRAN off-line static check. The conventional analog circuit diagram states for at least some components their expected outputs, by an expression in terms of parameters, variables and scale factors. By substituting for the parameter and variable names in these expressions their values in physical units, the so-called theoretical static check value is obtained. Further input defines the analog component interconnections, or patching information, which is used to calculate the voltage check values. In this computation all input voltages to a component, the kind of input to which they are connected, and the transfer function of the component are used to determine its voltage output. When both voltage and theoretical values are available for a component output, they are compared and, if in agreement, they yield the off-line static check value for that component. If the values do not agree, then the error is isolated by retaining the theoretical value as static check value for all subsequent use, and an error message is given. Values exceeding the voltage range of the computer will also cause error messages, but will be retained for further static check calculations.

The amount of input information required depends upon the degree of checking desired. If the expression for some component output is omitted, the voltage value is retained as static check value. This has the disadvantage of allowing any undetected error to propagate, rather than isolating it to one component as is done when the expression is given. Conversely, if the patching connections are not stated, the theoretical values are used as static check values. In this case this particular portion of the computer diagram cannot be checked for consistency with the problem statement. In the special case of a high-gain amplifier, checking can be performed only if the expression at the component output is explicitly stated. The program here checks that the sum of the component inputs is zero. Similarly, for an algebraic loop, the output expression of some component in the loop must be given.

Figure 2. HYTRAN Inputs and Outputs.

The statements defining the analog connections need not be given in any particular order. Since a connection statement can only be evaluated once all input voltages to the component have become defined, the resulting static check values are computed in an order that is different from the order of their input statements. The static check values are punched in this computational order in ADIOS tape format for all components with voltage outputs. This is important for the on-line static check, which if performed by ADIOS in a conventional way cannot distinguish propagated errors from original errors. However, discrepancies typed out in computational sequence are always preceded by the original error, thus eliminating any tracing for the error source. In addition, a typeout of the errors in computational sequence and an alphabetical listing of all component names and their static check values is given.

III. THE ON-LINE STATIC CHECK

While in systems without digital access to the analog computer the on-line static check must be performed by manual comparison, the availability of a digital input/output system provides the HYTRAN user with a choice of two automatic procedures in systems using ADIOS. One method is to feed the HYTRAN-generated static check tape into ADIOS to obtain an automatic comparison between the calculated and the measured values.
It is used whenever the digital computer is not available at the time of the analog on-line check. However, if the DDP-24 is available at on-line check-out time, the use of HYTRAN allows an improved consistency check that is expected to become an invaluable tool for debugging of complex problems as well as for preventive maintenance checks. Rather than merely comparing the measured component outputs with their respective computed voltages, the HYTRAN on-line check tests the transfer function of each component used. To accomplish this, the voltages present at the inputs of a particular component as well as its output voltage must be known. The outputs are easily measured, since ADIOS allows automatic punching on paper tape of the output voltages of the components used. The inputs, on the other hand, are either zero (unused) or equal to the voltage at the output of some component, and are therefore easily determined from the connection statements given by the programmer.

HYTRAN then computes the output voltage of each component used in the program and compares it with the measured output, using the tolerances stated in the individual component specifications. In these computations only the measured voltages are used. The resulting theoretical outputs are discarded immediately after the comparison with the measured value. Therefore, errors do not propagate, but are always pin-pointed at the component level. As a result, accumulated errors do not necessarily cause an error message (as they would in the conventional on-line check), but the contribution of each component involved is investigated and checked against its specifications.

It is sometimes desirable to repeat the on-line check at some later time, after initial conditions and parameters have been changed. This is possible without the manual entry of such changes, as they are reflected in a change of potentiometer settings.
These settings are updated by reading in a paper tape containing the actual settings of all potentiometers used, which can be automatically punched by ADIOS. All algebraic loops, including high-gain amplifier circuits, can be checked in closed-loop fashion even if their output is unknown, because the on-line check never uses the given component expressions.

It is also possible to use the on-line check after switching the analog computer into hold mode during a problem run for a reading of component outputs. Because HYTRAN simulates integrators in initial-condition mode (i.e., the output is solely determined by the voltage of the initial condition input), a read-out during hold causes an error at every integrator output. But since HYTRAN can distinguish between actual errors and propagated errors, the remainder of the program can be checked correctly.

Error messages generated in the on-line check state the erroneous component, the correct output voltage, and the actual voltage measured. We recall that the correct voltage is computed from the component function and the connection statement. Therefore, discrepancies can be caused by component failure as well as by patching errors. These error sources can be separated if the analog computer hardware is thoroughly checked before the program is set up. This hardware check, too, can be performed with the HYTRAN on-line diagnostic program, using an artificial problem which has been correctly and permanently wired on an analog patch-panel.

IV. THE HYTRAN LANGUAGE

The input language used in HYTRAN is oriented toward the analog programming procedure rather than toward the digital machine. Input data can be written in a form which closely resembles the programmer's own way of documenting analog programs. The HYTRAN inputs fall into two main categories: inputs which describe the problem to be solved, and inputs describing its implementation on the analog computer.

1. The Problem Statement

The problem is usually stated in mathematical notation. HYTRAN therefore accepts the problem statement in a mathematically oriented language which bears resemblance to the programming languages ALGOL 6 and FORTRAN 7. The problem information to be inputted includes parameter values, variable initial conditions, and a set of algebraic and differential equations, all in terms of physical units. Parameters can be defined by expressions containing numerical values as well as other parameters, while initial conditions of variables can be given in terms of numerical values, parameters, or other initial conditions. All algebraic and differential equations have to be in explicit form with respect to the unknown variable, or the highest derivative, respectively. Otherwise, the equations should be inputted in their original form so that any errors that may occur during further manipulations on the problem equations will be revealed by the theoretical static check.

HYTRAN accepts expressions and equations containing not only the basic algebraic operations, but also the functions sine, cosine, arctangent, square root, logarithms to the bases ten and e, and the exponential of e. In addition, the program processes certain discontinuous functions which are often used in analog computation, such as absolute value, sign, limits, and dead zones. Relational operators are another means of obtaining step functions. They are represented by the relations less than and greater than (for practical purposes, equal to does not exist in analog computation). While a logical and can be performed by multiplication, the or is an explicit Boolean operator in HYTRAN.

2. Circuit Diagram Information

An analog program is generated in the form of an analog circuit diagram, by which the programmer states the outputs of components, their modes, their interconnections (patching), and, in the case of potentiometers and switches, their setting or position. Inputting this information enables HYTRAN to check the analog program both against the original problem input and against the physical set-up on the analog computer. The general form of a component statement is:

Component Name and Number, Mode = Expression; Connection Statement

A console number is given only when a change from one console to another occurs. By mode one designates the configuration in which a component is used, or any special connections not involving problem variables. For example, amplifiers can be connected in integrator, summer, or high-gain mode. The expression gives the (static check) output of the analog component in terms of problem variables and scale factors, written in the format used in the problem statement. Finally, a connection statement is defined as a sequence of up to 32 input statements. Each input statement consists of an input designation (usually identical with the pre-patch panel input name), followed by the name of the analog component which is connected to the specific input.

All computer input data are punched on paper tape, each type of information being preceded by one of the following keywords. These input sections must be presented in the order given below, although within any section the statements can be in arbitrary order.

(1) The PARAMETER and VARIABLE keywords are followed respectively by parameters as used for the static check, and all variable initial conditions. Parameters and variables may be referred to by mnemonic names which, in turn, can be defined in terms of other parameters, initial conditions, or by numerical values. A name thus defined need not be referred to beforehand, but must be specified within the same keyword section.
(2) The EQUATION portion contains the set of theoretical differential and algebraic equations to be implemented on the analog computer.
(3) The keyword COMPONENTS is followed by the information contained in the circuit diagram which represents the analog program.

As a simple example of how the input information to HYTRAN is written, let us consider the case of a second-order equation (see Figure 3). The inputs shown in Figure 4 should be provided if a complete check is desired. Note that comments pertinent to the program input, but meaningless to the digital program, can be inserted if preceded by a tab. The input begins with the problem identification, which contains any information the programmer wishes to use as a heading for all typewriter outputs. The resulting outputs are shown in Figures 5 through 7.

Figure 3. Sample Problem and Analog Circuit Diagram. (Analog implementation of d²y/dt² + ω²y = 0.)

Figure 4. HYTRAN Input Format:

PARAMETERS
S = 5	S = SCALE FACTOR
A = 20	A = AMPLITUDE
OMEG = 10
BETA = 10	BETA = TIME SCALE
TO = 1/10
VARIABLES
Y' = A*SIN(OMEG*TO)
Y = A*COS(OMEG*TO)/OMEG
EQUATIONS
Y'' = -OMEG**2*Y
COMPONENTS
P00 = S*Y'/100; +REF
A00, I = S*Y'; IC:P00
C00 = S*Y''/(BETA*10); I:Q00
Q01 = OMEG/BETA; A00
P01 = S*OMEG*Y/100; -REF
A01, I = -S*OMEG*Y; IC:P01
C01 = -S*OMEG*Y'/(BETA*10); I:Q01
A02, S = S*OMEG*Y; I:A01
Q00 = OMEG/BETA; A02

V. THE INDIVIDUAL HYTRAN PROGRAMS

The HYTRAN system presently consists of three programs which together provide the features described above (see Figure 8). These programs are: (a) an interpretive (off-line) static check generator, (b) an on-line diagnostic program, (c) a documenting program. Each program can be contained in memory, yet allows enough data storage to process three 120-amplifier analog consoles in an 8K core.
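The component statements of Figure 4 follow the `Name, Mode = Expression; Connection Statement` form described in Section IV. A rough sketch of how such a statement decomposes (this parser is an illustration, not the actual HYTRAN tape reader):

```python
def parse_component(stmt):
    """Split 'Name[, Mode] = Expression; connections' into its parts.
    Connections are 'input:source' pairs; a bare item such as '+REF'
    is recorded with no input designation."""
    head, _, tail = stmt.partition("=")
    expr, _, conn = tail.partition(";")
    name, _, mode = head.partition(",")
    connections = []
    for item in conn.split():
        pin, sep, source = item.partition(":")
        connections.append((pin, source) if sep else (None, pin))
    return {"name": name.strip(), "mode": mode.strip() or None,
            "expr": expr.strip(), "connections": connections}
```

For instance, `parse_component("A00, I = S*Y'; IC:P00")` yields name A00, mode I (integrator), the expression S*Y', and a single connection of P00 to the IC input.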
The off-line static check generator converts the information on the programmer's input tape into a compact form suitable for digital processing. At the same time, it checks the input for proper format, typing format-error messages when necessary. Defined expressions are evaluated immediately, while expressions containing undefined symbols are stored in memory until they become defined by subsequent input statements. Static check voltages are evaluated in a similar manner. Here the connection statements may be stored awaiting the calculation of the static check values of all components mentioned in the statement. During the entire process, an intermediate tape is punched containing in compact form all information necessary to run the remaining HYTRAN programs, including possible future extensions of the HYTRAN system.

The inputs to the on-line diagnostic generator are the intermediate tape punched by the off-line program, a tape of measured potentiometer settings, and a tape of measured component outputs. The intermediate tape provides the connection statements and potentiometer settings necessary for the computation of the exact output voltages. The two remaining tapes are direct outputs from the ADIOS input/output desk. Any measured potentiometer setting read replaces the corresponding theoretical setting which was read previously from the intermediate tape; however, the reading of the measured settings is optional and can be suppressed by the use of a console switch.

The documenting program sorts and converts the compact information on the intermediate tape, resulting in component sheets that contain the analog components in an orderly sequence, together with their modes, and their outputs or settings in terms of problem parameters, variables and scale factors. In addition, an alphabetic list of values of parameters and variables is typed out, and ADIOS tapes are generated. One ADIOS tape contains all potentiometer settings in a format which allows the automatic setting of potentiometers; the other one contains the computed static check values for on-line static check by ADIOS. The documenting program finally generates a cross-reference sheet which consists of an alphabetic list of parameters and variables. Each name is followed by a list of potentiometers, the settings of which are dependent upon the parameter in question.

Figure 5. HYTRAN Static Check Output. (Static check values: A00 47.48, A01 87.99, A02 87.99, C00 -8.79, C01 4.75, P00 -47.48, P01 87.99, Q00 87.99, Q01 4.74.)

Figure 6. HYTRAN Symbol Table Output. (Alphabetic lists of parameter and variable values: A 2.00000 E1, BETA 1.00000 E1, OMEG 1.00000 E1, S 5.00000 E0, TO 1.00000 E-1; Y 8.799416E-2, Y' 4.74835E-1, Y'' 8.799416E0.)

Figure 7. HYTRAN Component Sheets and Cross Reference.

Figure 8. Flow of Information in HYTRAN. (HYTRAN input tape, off-line static check generator, intermediate tape, on-line diagnostic program, documenting program, ADIOS tapes, analog computer.)

VI. CONCLUSIONS

HYTRAN is expected to become an important tool for the analog programmer, as it increases programming efficiency and justifies a high degree of confidence in the analog solution. Some of the reasons that lead to this conclusion are:

(1) The automatic evaluation of algebraic expressions saves programming time and prevents arithmetic errors.
(2) The generation of the pot-set tape saves time and eliminates the errors which could occur when transferring the numerical settings from the desk calculator to the ADIOS keyboard.
(3) The documentation generated by HYTRAN provides a complete, error-free, and standard documentation of analog programs. Cross-referencing is expected to speed up the changing of parameters and scale factors.
(4) The off-line static check will check every component in the computer diagram. Automatic static check calculations save time and are error-free. The checking on the component level allows one to omit the checking of any selected portion of the computer diagram, such as circuits that must be checked dynamically or that are considered standard routines. Such an omission does not prevent the checking of other components in the same algebraic chain.
(5) Simple changes of connections, parameters or scale factors may change the majority of the static check values and pot settings in a program, but only a simple change is necessary on the HYTRAN input tape in order to generate a complete, updated set of ADIOS tapes and documents. Obviously, when a new static check is not required, changes can be made in a conventional way and there is no need to update the input tape for every minor change.
(6) When the on-line static check is performed by ADIOS, the computational sequence of the static check values on the HYTRAN-generated tape eliminates tracing for the error sources.
(7) The use of the on-line diagnostic program allows pin-pointing of errors on the component level, even for closed algebraic loops of unknown output.
(8) In conjunction with a permanently wired test problem, the on-line diagnostic program can be used for daily maintenance checks.

Man-power savings from the above benefits outweigh by far the additional effort involved in preparing the HYTRAN input tape, a job that is comparable in size to that of manually preparing potentiometer and amplifier sheets.
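The component-level on-line check of Section III — recomputing each output from *measured* input voltages so that discrepancies are pinpointed rather than propagated — can be sketched as follows. The transfer functions, component names, and tolerance below are invented for illustration:

```python
def online_check(components, measured, tol=0.05):
    """components: name -> (transfer_function, list of input component names)
    measured: name -> measured output voltage (e.g. from an ADIOS tape).
    Each expected output is computed from measured inputs only, so a
    discrepancy flags exactly the failing component."""
    failures = []
    for name, (fn, inputs) in components.items():
        expected = fn(*(measured[i] for i in inputs))
        if abs(expected - measured[name]) > tol:
            failures.append(name)
    return failures
```

Because the check never chains its own computed values, a downstream component fed a wrong voltage is still judged against what is actually present at its inputs, and only the truly faulty component is reported.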
The present system is easily expanded to include new features (such as the generation of a digital check solution), some of which may evolve from the practical use of the present HYTRAN system. Since such additions will require little additional input information, their benefits will be available at small extra cost and will therefore further increase the over-all economy of the system.

REFERENCES

1. CARLSON, A. M., and HANNAUER, G., "Handbook of Analog Computation" (Electronic Associates, Inc., 1964).
2. PAYNTER, H., and SUEZ, J., "Automatic Digital Set-up and Scaling of Analog Computers," ISA Transactions, Jan. 1964.
3. GREEN, G., DEBROUX, A., DEL BIGIO, G., and D'HOOP, H., "The Code APACHE Intended for the Programming of an Analogue Problem by Means of a Digital Computer," Proc. Int. Assoc. Analog Computation, Vol. 5, No. 2, April 1963.
4. OHLINGER, L., "ANATRAN-First Step in Breeding the DIGNALOG," Proc. WJCC, Vol. 17, May 1960.
5. PROCTOR, W., and MITCHELL, M., "The PACE Scaling Routine for Mercury," Computer Journal, Vol. 5, No. 1, 1962.
6. NAUR, P., et al., "Report on the Algorithmic Language ALGOL 60," Comm. ACM, Vol. 3, No. 5, 1960.
7. FORTRAN General Information Manual (International Business Machines Corporation, 1961), ed. F28-8074-1.

"PACTOLUS"-A DIGITAL ANALOG SIMULATOR PROGRAM FOR THE IBM 1620

Robert D. Brennan and Harlan Sano
International Business Machines Corporation
Research Laboratory
San Jose, California

I. INTRODUCTION

Perhaps the most formidable challenge in the field of digital computer applications is the development of equipment and programs which will extend the creative power of scientific users. The crux of the problem, of course, is intimate man-machine communication, a most elusive and difficult-to-define characteristic. The engineer requires a conveniently manageable system; the scientist requires sufficient intimacy to provide real insight into the complex interplay of problem variables; the creative user requires computing power and flexibility to permit imaginative use of the computer, and graphic display to permit recognition of inventive solutions. The development of computers designed specifically for such applications has been slow due to the difficulty of ascertaining the proper man-machine relationship.

Digital analog simulator programs, programs which affect the elements and organization of analog computers, are no novelty. Since the first attempt by Selfridge1 in 1955, there have appeared a number of such programs; best known perhaps are DEPI,2 ASTRAL,3 DEPI 4,4 DYSAC,5 PARTNER,6 DAS,7 JANIS,8 and MIDAS.9 Significant improvements have been made in both the interconnection languages and the computational aspects. The latest and most sophisticated of these programs is MIDAS; it incorporates the best features of its predecessors while presenting several important innovations. However, all these previous programs seem to share a common failing: while they succeed to a greater or lesser extent in using block-oriented languages to express the simulation configuration, they fail to provide the on-line operational flexibility of the analog computer.

Simulation is perhaps unique. It does require close man-machine communication, but the nature of this interplay is fairly well known. Years of experience are available in which analog computers were used for simulation in the entire spectrum of scientific disciplines. The structural organization of the analog computer (its collection of specialized computing elements which can be interconnected in almost any desired configuration) makes it a most flexible tool for the engineer, who is trained to visualize a system as a complex of subsystems. The forte of the analog computer is the very intimacy of the man-machine communication it permits. Since the nature of the task can be so well defined for this particular application, digital simulation seems a most logical starting point in the quest for more creative use of computers. Only when we are able to perform this task well, with ease and flexibility comparable to simulation using analog computers, should we proceed to those applications where the man-machine relationship is more nebulous.

PACTOLUS is an attempt to focus attention upon this seemingly ignored aspect of digital simulation. According to ancient myth, everything that King Midas touched turned into gold. This was fine until dinner time, when his food and drink also turned into gold and what had seemingly been a boon wrested from the gods became a curse. To remedy this golden problem, he was advised to bathe in the River Pactolus. Digital simulation programs, particularly MIDAS, have certainly seemed aurous to many users; yet they must be regarded with mixed feelings, at best, by the engineer who is accustomed to "twisting a pot" or "throwing in a lag circuit" at an analog computer console. To such a user, the remoteness of the digital computer and "turn-around" time seem something of a curse on digital simulation. PACTOLUS is intended to demonstrate that with an appropriate terminal as an operating console, this "curse" can be remedied. PACTOLUS is designed to permit the user to "modify the patchboard," "twist pots," and "watch the plotter," as is the wont of the analog devotee. The name of the program was chosen in deference to the structural and computational excellence of the MIDAS program and to suggest a direction for future development.

II. GENERAL DESCRIPTION

Simulation is a well-established engineering and scientific tool with applications ranging from simple mechanics to hydrodynamics, aerospace, and bio-medical research. It is presently so widely used that most definitions are unduly restrictive.
An excellent definition which avoids this fault is the following: "Simulation is the act of representing some aspects of the real world by numbers or symbols which may be easily manipulated to facilitate their study."10

Analog and digital computers obviously differ markedly in the manner in which they operate; thus, considerable differences are to be expected between analog and digital simulation. Digital simulation offers significant advantages which have been capably described in References 1-9. The consensus of these is that for many types of problems, digital simulation can provide more reliable and accurate results with less over-all engineering time and effort. Much of this potential has already been achieved in the various digital analog simulator programs. In part, the development of PACTOLUS was intended as a commentary on the language and computational aspects of these programs. Primarily, however, we are concerned with the second half of the definition; that is, with improving the ability of the digital computer user to manipulate the numbers or symbols.

PACTOLUS might be described as a block-oriented interpretive program with on-line control and input-output capabilities. The program, written in FORTRAN II-D, was developed for the IBM 1620 computer with the 1627 Plotter and the Card Read-Punch. This is a comparatively small scientific computer, and several features which might have been incorporated with a larger computer had to be foregone. The overriding concern in the development of this program, however, was the demonstration of operational flexibility. For this purpose, the 1620 with its plotter, typewriter, and sense switches has been an excellent choice. The configuration and parameter specifications may be prepared in advance of a problem session in the form of a deck of cards, or this data may be entered via the typewriter in a simple, convenient manner.
If "patchboard wiring" is performed at the console by means of the typewriter, the punch automatically produces a card which may be added to the previously prepared deck. The user, observing the primary output as it is plotted, may interrupt the run at will to modify the configuration, parameters, or initial conditions. The typewriter provides a neat, permanent record of the configuration and parameter values and any modifications specified by the user. In addition, the user specifies those variables of secondary interest which will be recorded by the typewriter at specified intervals during the run.

The program incorporates all the standard analog computer elements (summing amplifiers, inverters, integrators, multipliers, relays) plus many special-purpose analog circuits (absolute value, bang-bang, dead space, limiters, clippers, zero-order hold circuits). Table I contains the complete list of the available elements and symbols. The program is presently limited to a maximum of 75 blocks in a simulation; no more than 25 integrators or 25 unit delay elements may be used. The program is also restricted at present to a maximum of three function generators. The structure of the program is sufficiently simple that modifications or additions to the set of PACTOLUS elements may easily be made to accommodate the requirements of particular users.

Table I. Definition of PACTOLUS Elements.

ARCTANGENT (A)
BANG-BANG (B): eo = +1 for ei > 0; 0 for ei = 0; -1 for ei < 0
DEAD SPACE (D)
EXPONENTIAL (E)
FUNCTION GENERATOR (F): linear interpolation, 10 even segments
GAIN (G)
HALF POWER (H): eo = square root of ei
INTEGRATOR (I)
JITTER (J): random number generator between ±1
CONSTANT (K)
LIMITER (L)
MAGNITUDE (M)
NEGATIVE CLIPPER (N)
OFFSET (O)
POSITIVE CLIPPER (P)
QUIT (Q): terminate run
RELAY (R)
SINE (S): eo = sin(ei), argument in radians
TIME PULSE GENERATOR (T): generates pulse train with period equal to P1; first pulse occurs when e1 >= 0
UNIT DELAY (U)
VACUOUS (V): used in conjunction with element WYE
WEIGHTED SUMMER (W): eo = P1e1 + P2e2 + P3e3
MULTIPLIER (X): eo = e1e2
WYE (Y): logical branch element used for implicit operations
ZERO ORDER HOLD (Z): samples whenever e2 > 0
SUMMER (+): eo = ±e1 ±e2 ±e3; the only element where a negative sign is permissible in the configuration specification
DIVIDER (/): eo = e1/e2
SIGN INVERTER (-): eo = -ei
SPECIAL 1-9: subroutines supplied by the user; n represents the block number

The innovation in PACTOLUS is its attention throughout to operational flexibility. In most other respects, PACTOLUS represents a conscious synthesis of those features which, in our opinion, are the best of the many previous programs. Its interconnection language is modeled on that of ASTRAL, which is flexible yet simple. Each block is identified by a number which also identifies the output from the block. The type of element for each block is specified by a single alphanumeric character or mathematical symbol.
The inputs to a block are specified by listing the block numbers of those elements which provide the inputs.

Structurally, PACTOLUS is an interpretive program and is closely modeled on MIDAS. Like MIDAS, it uses the excellent sorting procedure, a logical test for determining the proper order for the block computations, which had been a feature of the ASTRAL program. The second-order Runge-Kutta integration scheme used in PACTOLUS is a compromise between the Euler integration advocated by the developers of DAS and PARTNER and the more sophisticated formulas used in many of the other programs. For those simulations which commonly involve discontinuous functions, the use of higher-order numerical integration formulas seems unwarranted; apart from accuracy considerations, the requirements of the output plotter demand fairly small time increments even at the cost of prolonged solution time.

In addition to the complement of computing elements provided with most of the digital analog simulator programs, PACTOLUS includes an element similar to the Implicit Function element which is one of the outstanding features of MIDAS. This element is used for the solution of equations of the form Y = f(Y). It permits iteration without advancing the time clock until convergence is (hopefully) achieved. The implementation of this feature seems somewhat superior in PACTOLUS; if the convergence criterion is not satisfied, the computation proceeds again through those blocks in the algebraic loop. MIDAS appears to recompute all the preceding blocks on the sort list, regardless of whether or not they pertain to the Implicit Function. This modification would seemingly result in a significant reduction of solution time for large problems involving Implicit Functions.

Another modest contribution in PACTOLUS is the incorporation of a number of special elements of unspecified function.
An interconnection specification for one of these special elements results in a subroutine call during the interpretive portion of the program. The user may design any complex function for this element by development of the appropriate subroutine. This permits the user to add elements to the standard complement without reprogramming the main PACTOLUS program. The JANIS program, although structurally quite different, must be credited with first utilizing subroutines to achieve this "do-it-yourself" capability.11

III. OPERATING PROCEDURE AND EXAMPLES

The simulation configuration is the specification of the interconnection of the computing blocks, where each block is one of the standard set of PACTOLUS elements or one of the nine special element blocks which the user may prepare for his particular requirements. Each block has but a single output and no more than three inputs. The program is incapable of handling algebraic loops; the existence of such a loop will result in a "sort failure" message.

Since the digital computer is a serial device, the various computations specified by the simulation configuration must be performed for each time cycle in some particular order. The computation for any block should not be attempted until all its inputs have been computed during that iteration. The program automatically performs a sorting operation to achieve this ordering after each configuration change. It is presumed that each simulation involves at least one integrator; the program uses a simple second-order Runge-Kutta numerical integration formula to approximate the outputs of the integrators during the specified integration interval.

Example 1: Use of the program is perhaps best understood by consideration of several simple examples. Figure 1 is a simulation diagram for a second-order system.
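The sorting operation described above can be sketched as repeated sweeps over the block list: a block is placed on the sort list once every one of its inputs is available, where the outputs of integrators and unit delays are known at the start of each cycle; a sweep that places nothing signals an algebraic loop, i.e., a "sort failure." This is an illustrative reconstruction in Python, not the actual FORTRAN II-D code, and the data layout is hypothetical:

```python
# Sketch of an ASTRAL/MIDAS-style sort: order the blocks so that every input
# is computed first; integrators and unit delays ("memory" blocks) have
# outputs known at the start of each time step and need not be ordered.
MEMORY_TYPES = {"I", "U"}

def sort_blocks(config):
    """config: {block: (type_char, [signed input block numbers])}."""
    ready = {b for b, (t, _) in config.items() if t in MEMORY_TYPES}
    order, pending = [], set(config) - ready
    while pending:
        placed = [b for b in pending
                  if all(abs(i) in ready or abs(i) not in config
                         for i in config[b][1])]          # e.g. block 76 (time)
        if not placed:
            raise RuntimeError("sort failure: algebraic loop among %s"
                               % sorted(pending))
        for b in sorted(placed):
            ready.add(b)
            order.append(b)
        pending -= set(placed)
    return order

# Blocks patterned on the second-order example: Summer 46 takes Constant 17
# and (negated) blocks 13 and 23; Gains 4 and 23 follow the integrators.
config = {17: ("K", []), 46: ("+", [17, -13, -23]),
          7: ("I", [46]), 4: ("G", [7]), 13: ("I", [4]), 23: ("G", [7])}
order = sort_blocks(config)
print(order)
```

The same sweep re-run after every configuration change is what lets the user rewire at the console without ever stating a computation order explicitly.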
Figure 1. Simulation Diagram for Second-Order System Example.

The program presumes that ordinarily the user will approach the computer with most of the "patchboard" pre-wired; that is, with a deck of cards specifying the configuration and parameter values. For this simple example, however, it is as easy to put a deck of blank cards in the card reader and do the "wiring" at the console. Figure 2 shows the typewriter record from this problem session. The user first turned on Sense Switch #1 to indicate his intention to enter configuration specifications from the typewriter. The computer then typed the following:

PACTOLUS DIGITAL ANALOG SIMULATOR PROGRAM
CONFIGURATION SPECIFICATION
BLOCK   TYPE   INPUT 1   INPUT 2   INPUT 3
(    )  (  )   (     )   (     )   (     )

The user then turned the typewriter paper roller back one line and inserted the first specification within the parentheses.
After he pressed the RS (release and start) key, the typewriter automatically typed another line of parentheses in anticipation of the next specification. In addition, as each configuration specification was entered from the typewriter, the punch produced a card with the identical data. The collection of these cards forms the wired "patchboard," which may then be saved for subsequent problem sessions, thereby eliminating the necessity of re-typing the specifications.

Figure 2. Typewriter Record for Second-Order System Example.

Just as the analog user merely needs to get the patch-cord into a hole to make a connection, the PACTOLUS user need only get the proper block number within the parentheses; he is not distracted at the console by complicated input format requirements. The equivalence of the format of the configuration specifications shown in Figure 2 to the simulation diagram of Figure 1 should be obvious. Block 17 is a Constant input (K).
Block 46 is a Summer (+) with inputs from blocks 17, 13, and 23; sign inversion is indicated for the latter two. In a similar manner, blocks 4 and 23 are Gain potentiometers (G) and blocks 7 and 13 are Integrators (I). Thus, each component is uniquely identified by a block number. The type of element for each block is specified by a single letter or symbol. These have been assigned as either the first letter of the name of the element or as the common mathematical symbol for the operation. Table I contains the complete list of elements and the corresponding symbols. The inputs to a block are specified by the block numbers of the components which provide the inputs. Block numbers may be assigned arbitrarily between 1 and 75. Block 76 by definition is the time variable; it appears as the input to block 19 since it is desired to plot the response of the second-order system versus time. It should be noted that the plot size is fixed at 10 inches square. The origin is at the center and each axis is scaled for a maximum of ±100.0. In this example, blocks 19 and 20 correspond to the horizontal position and gain controls of the conventional X-Y recorder.

After entering the last configuration specification, the user turned off Sense Switch #1 and turned on #2 to enter initial condition and parameter values from the typewriter. Once again, the typewriter typed a line of parentheses to indicate the proper format. Figure 2 shows that the user specified an initial condition for Integrator 13 of -100.0 and the damping factor control, Gain potentiometer 23, at 0.8. After each of these parameter specifications, the punch again produced a card; thus, both the configuration and the parameters became part of the permanent deck. After entering the last of these specifications, the user turned off the sense switch. The typewriter then requested timing data for the run and specification of those outputs which were to be plotted and those which were to be printed.
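The simple second-order Runge-Kutta formula used for the integrators is, in its most common form, Heun's method: an Euler predictor followed by a trapezoidal corrector. The sketch below applies it to a damped second-order system of the same general form as this example; the equation, coefficients, and initial condition are illustrative, not the actual block values of Figure 1:

```python
# Heun (second-order Runge-Kutta) integration of a damped second-order
# system y'' = u - y - c*y', written as two first-order states (y, y').
def deriv(t, state, u=0.0, c=0.8):
    y, ydot = state
    return (ydot, u - y - c * ydot)        # (dy/dt, d(y')/dt)

def heun_step(f, t, state, dt):
    k1 = f(t, state)                                   # slope at start
    pred = [s + dt * k for s, k in zip(state, k1)]     # Euler predictor
    k2 = f(t + dt, pred)                               # slope at predicted end
    return [s + 0.5 * dt * (a + b) for s, a, b in zip(state, k1, k2)]

state, t, dt = [-100.0, 0.0], 0.0, 0.01    # y(0) = -100, at rest
while t < 10.0 - 1e-9:
    state = heun_step(deriv, t, state, dt)
    t += dt
print(state[0])    # the damped response has decayed toward zero
```

Because the plotter already forces small time increments, a step of this low order costs little accuracy in practice, which is the compromise the authors describe.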
These entries are simply performed since it is only necessary that the typing be within the parentheses provided. The actual time required to "wire at the console," "set the pots," and "adjust the output devices" for this example was 6½ minutes. This time cost would seem to be comparable to that required for the equivalent analog setup, particularly if we insist that the analog operator prepare a neat, permanent record of the interconnection and all parameter values.

The plotter record is shown in Figure 3. Three runs were made, each for a different value of the damping factor. Figure 2 shows the parameter changes for Gain block 23. Each run required 1 minute, 45 seconds; the "pot" changes required 15 seconds each. Run times might have been shortened by increasing the typewriter print interval or the integration step.

Figure 3. Plotter Record for Second-Order System Example.

Example 2: Let us next consider a slightly more complicated example which illustrates how PACTOLUS might be used for a study in speech synthesis. This particular simulation is concerned simply with the response of the first speech formant when driven by a glottal pulse. The first formant corresponds roughly to that portion of the speech signal which remains after passing the signal through a 1 KC low-pass filter. The glottal pulse is closely modeled by the triangular waveform produced by block 6 of Figure 4; the effect of the vocal tract is simulated by blocks 7-9. During utterance of a stop consonant such as the "b" of the word "bah," the lips must open quickly with a resulting rapid rise in the natural frequency of the vocal tract. This variation in frequency of the first formant is controlled by blocks 10-12. The user might experiment with various frequency changes to match actual speech records. To add the second or third formants, it would only be necessary to add additional second-order filters and their associated frequency controllers.

In preparation for this problem session, the user entered the configuration and parameter specifications on special coding forms which minimize concern with data format; from these coding forms a deck of cards was punched. In this sense, the user approached the console with a "pre-wired patchboard." The specifications for blocks 10-12 were omitted, however, as might have occurred from oversight or from early indecision with the form of this formant frequency controller portion of the simulation.

Figure 4. Simulation Diagram for Speech Synthesis Example.

The program caused the configuration specification cards to be read until a blank card was encountered; each of these statements was also printed by the typewriter for the convenience of the user. Since a block 12 had been specified as the input to block 13, but block 12 was as yet unspecified, the program recognized an operator error. After listing the sort failures as an aid for debugging, the typewriter produced a line of parentheses, anticipating that the user would wish to correct the error or omission. The user then turned on Sense Switch #1 until he had entered blocks 10-12. The program then proceeded to read the parameter specification cards until a blank card was encountered. During this period, the user had turned on Sense Switch #2 to permit typewriter entry of the parameters for blocks 10-12. Figure 5 shows that the user neglected to depress the numeric shift key of the typewriter when entering the total time for the run. To recover from this or any other typing error, the user simply turns on Sense Switch #4 prior to pressing the RS key. He would then turn the switch off and re-enter the data on the next line. Figure 6 shows the plotter output from this run.
The total time for reading the input deck, adding the three blocks, adjusting their parameters, and performing the run was 11 minutes. The user finally changed the configuration and repeated the run in order to plot the "glottal pulse." Note that for this purpose, it was not necessary to change parameters, timing, or output data; a series of runs can be performed with a minimum of effort.

Figure 5. Typewriter Record for Speech Synthesis Example.

Figure 6. Plotter Record for Speech Synthesis Example.

Example 3: The third example is a simulation of a sampled-data feedback control system. Figure 7 is a diagrammatic representation of the system; Figure 8 shows the corresponding simulation diagram. The objective of this study was to obtain a digital compensator design which would permit the system output to respond in an acceptable manner to the flat-topped ramp input. Of particular interest is the manner in which the Time Pulse Generator element is used in conjunction with Zero Order Hold and Unit Delay elements to implement the digital compensator. Block 50 produces a series of pulses at the sampling rate; these pulses trigger the hold elements. By alternating hold and delay elements, one may obtain the sampled-data operators z^-1, z^-2, etc. The output of the system is shown in the plotter record, Figure 9. A number of runs were conducted with various values for the parameters of blocks 12 and 13, which determine the gain and weighting factors for the digital controller. Each of these runs required 1½ minutes, exclusive of the time required to decide upon the next set of values.

Figure 7. Diagrammatic Representation of Sampled-Data Feedback Control Problem.

Figure 8. Simulation Diagram for Sampled-Data Feedback Control Problem.

Figure 9. Plotter Record for Sampled-Data Feedback Control Problem.

Special Elements 1-9: When a user finds recurring use for a particular operation, it may be advantageous to define a Special element rather than resort to a number of the standard PACTOLUS elements. For instance, one might prepare a defining subroutine for element Special 1 to perform the function of the digital controller of the third example:

Eo(z)/Ei(z) = k(1 - a z^-1)/(1 - b z^-1)

Such a program might be easily written in FORTRAN or, for a large simulation in which speed was critical, in symbolic machine language. Figure 10 shows a FORTRAN program which would perform this operation; it is important to note that, due to the simplicity of the PACTOLUS program, only modest programming skill is necessary for the definition of the element. This new element, Special 1, can then replace blocks 2-15 of Figure 8. The typewriter record illustrating use of this element as block 13 is shown in Figure 11.

Figure 10. FORTRAN Program Example for Definition of Element "Special 1."

      SUBROUTINE SUB1(C,PAR,I,J,K,L)
C     INPUT IS C(J)    OUTPUT IS C(I)
      DIMENSION C(76),PAR(75,3)
      IF (C(76)) 2,1,2
C     INITIALIZE AT T = 0
    1 CAY = PAR(I,1)
      A = PAR(I,2)
      B = PAR(I,3)
      EO = 0.0
      ZM1EI = 0.0
      ZM1EO = 0.0
C     SAMPLING OCCURS WHEN SECOND INPUT IS POSITIVE
    2 IF (C(K)) 4,4,3
    3 EI = C(J)
      EO = CAY*(EI - A*ZM1EI) + B*ZM1EO
      ZM1EO = EO
      ZM1EI = EI
    4 C(I) = EO
      RETURN
      END
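The difference equation realized by the Figure 10 subroutine, eo(n) = k[ei(n) - a·ei(n-1)] + b·eo(n-1), updated only at sampling instants and held between them, can be sketched in a modern language for comparison; this Python analogue is an illustration, not part of PACTOLUS:

```python
# Sketch of the Special 1 digital compensator k(1 - a z^-1)/(1 - b z^-1):
# the recurrence fires only when the trigger input is positive, and the
# output is held (zero-order hold) between sampling instants.
class Compensator:
    def __init__(self, k, a, b):
        self.k, self.a, self.b = k, a, b
        self.prev_in = 0.0    # z^-1 of the input  (ZM1EI in the FORTRAN)
        self.prev_out = 0.0   # z^-1 of the output (ZM1EO)
        self.out = 0.0

    def step(self, ei, sample):
        if sample > 0.0:      # sampling occurs when the trigger is positive
            self.out = self.k * (ei - self.a * self.prev_in) \
                       + self.b * self.prev_out
            self.prev_out = self.out
            self.prev_in = ei
        return self.out       # held between samples

c = Compensator(k=2.0, a=0.5, b=0.25)
print(c.step(1.0, 1.0))    # 2.0*(1 - 0) + 0 = 2.0
print(c.step(1.0, -1.0))   # no sample: output held at 2.0
print(c.step(0.0, 1.0))    # 2.0*(0 - 0.5*1.0) + 0.25*2.0 = -0.5
```

Encapsulating the two delayed values as state is exactly what the FORTRAN version does with ZM1EI and ZM1EO across calls.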
Figure 11. Typewriter Record for Sampled-Data Feedback Control Problem.

Implicit Operations: The "Wye" logical branch element is used in conjunction with the Vacuous element V for simulations which involve implicit equations of the form Y = f(Y,X). On a digital computer, such an equation is usually solved by iteration. In a realistic problem, Y will often be merely an intermediate variable used in subsequent equations of the form Z = g(Y,X). It is the function of the Wye element to determine whether the iteration on Y is sufficient to satisfy the error criterion. If not, further iterations must be made until the test is (hopefully) satisfied. This iterative procedure must be performed within each of the integration time steps.
There is no need, however, to recompute X at each iteration, since it is independent of the iteration. Similarly, the computation of Z ought not be attempted until the iteration on Y is satisfied. The MIDAS program was the first of its kind to have a capability for implicit equations, but it does not presently incorporate these considerations. The implementation in PACTOLUS is presented (as a peace offering?) for possible improvement of MIDAS and subsequent programs.

The manner in which the elements are to be used is indicated in Figure 12. Although block numbers may be assigned arbitrarily, for illustrative purposes they have been assigned in the same sequence as that determined by the sorting operation. Blocks 1-16 compute a quantity X. The initial estimate for Y is given as an initial condition for V block 17. Blocks 18-30 compute f(Y,X). Block 31 compares f(Y,X) with the previous estimate of Y. Parameter 1 of the Wye element specifies the relative error criterion. If the relative error between the outputs of blocks 30 and 17 is less than specified by that parameter, then the output of block 31 is set equal to that from block 30. Computation of Z then proceeds in the normal manner.

Figure 12. Example Use of the Wye and Vacuous Elements for Simulations Involving Implicit Equations.

If the error criterion is not satisfied, the operation of Wye block 31 is as follows: (1) a new estimate for Y is computed using parameter 2 of the Wye element, Yn+1 = (1 - P2) f(Yn,X) + P2 Yn; (2) Yn+1 replaces the previous Yn as the output of block 17; (3) the program "branches back" to the computation of that element on the sort list which follows the V block; in this case, it branches to block 18.

The inputs, if any, to the Vacuous element V do not affect its output. Its initial output is specified as an initial condition; its subsequent output is determined by the associated Wye block.
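The procedure in steps (1)-(3) is an under-relaxed fixed-point iteration performed within the time step. A Python sketch, with parameter names following the text (P1 the relative error criterion, P2 the relaxation weight); the example function, tolerance, and iteration cap are illustrative:

```python
# Sketch of the Wye element's iteration: Y <- (1 - P2)*f(Y, X) + P2*Y,
# repeated until |f(Y, X) - Y| / |Y| falls below the criterion P1.
import math

def wye_solve(f, x, y0, p1=1e-6, p2=0.5, max_iter=1000):
    y = y0                                # initial condition of the V block
    for _ in range(max_iter):
        fy = f(y, x)                      # blocks 18-30 recomputed
        if abs(fy - y) <= p1 * abs(y):    # relative error test of the Wye
            return fy                     # block 31 output set to f(Y, X)
        y = (1.0 - p2) * fy + p2 * y      # "branch back" with a new estimate
    raise RuntimeError("implicit loop failed to converge")

# Example implicit equation Y = cos(Y) + X, solved here with X = 0.
y = wye_solve(lambda y, x: math.cos(y) + x, 0.0, y0=1.0)
print(y)   # near 0.739, the fixed point of cos
```

Because X is computed once before the loop and only f(Y,X) is re-evaluated, the sketch mirrors the saving the authors claim over MIDAS, which re-runs everything earlier on the sort list.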
The position of the V element in the sort list follows that of each of its inputs. Its purpose is to ensure that, when a "branch back" occurs, these preceding blocks will not be recomputed. There is no requirement that the implicit equation actually involve any of the inputs to the Vacuous block.

Sense Switches

An important factor in the flexibility of analog simulation is the ease with which the operator can control the run, starting and stopping it at will. PACTOLUS uses the sense switches of the 1620 to achieve the same measure of control provided by the usual "Standby-Initial Condition-Operate" switch of the analog computer. The operator may terminate a run at any time by momentarily turning on Sense Switch #4. He then sets the sense switches in accordance with the operational option he desires and presses the Start switch to continue.

PACTOLUS-A DIGITAL ANALOG SIMULATOR PROGRAM FOR THE IBM 1620

The sense switch settings for the various options are as follows: Sense Switch #1 on permits the operator to modify both the configuration and initial condition/parameter specifications; Sense Switch #2 on permits the operator to modify the initial condition/parameter specifications; Sense Switch #3 on permits the operator to modify the timing and output specifications; Sense Switch #4 on causes the plotter to move the paper beyond the present plot and prepare a new plot frame. Sense Switch #3 may be used independently or in conjunction with Sense Switches #1 or #2. Sense Switch #4 is always used in conjunction with one or more of the other switches. If the computer is restarted with all switches off, the program presumes that an entirely new simulation is to be started and attempts to read the configuration specifications.

IV.
SUMMARY

Our objective in developing PACTOLUS has been twofold: to make a critical evaluation of the various techniques employed in previous digital analog simulator programs, and to demonstrate that a modus operandi comparable to that of analog computer users could be obtained for digital simulation. PACTOLUS embodies the conclusions of that evaluation. No attempt has been made herein to detail the reasons for accepting certain features while rejecting others. With the hope that this effort will be of value in the development of subsequent simulation programs, we merely wish to state that this is our considered opinion after serious study. Whether PACTOLUS does indeed represent an innovation in operational flexibility will only be known after the program achieves much wider usage. It has been used for a number of small applications,12 but we readily admit a measure of bias. We do feel that PACTOLUS demonstrates that our objective can be obtained. To the user of the small scientific computer, the program offers a means of conveniently solving many of those problems for which an analog computer would otherwise have been required. The techniques employed in PACTOLUS are hopefully suggestive of the manner in which digital simulation might be provided at remote terminals serviced by a large digital computer. It is our thought that digital simulation is on the brink of significant expansion. The availability of appropriate terminals and visual display units will herald this event.

REFERENCES

1. SELFRIDGE, R. G.: "Coding a General-Purpose Digital Computer to Operate as a Differential Analyzer," Proc. 1955 Western Joint Computer Conference (IRE).
2. LESH, F.: "Methods of Simulating a Differential Analyzer on a Digital Computer," J. of the ACM, Volume 5, Number 3, 1958.
3. STEIN, M. L., ROSE, J., and PARKER, D. B.: "A Compiler with an Analog-Oriented Input Language," Proc. 1959 Western Joint Computer Conference.
4. HURLEY, J.
R.: "DEPI 4 (Differential Equations Pseudo-Code Interpreter)-An Analog Computer Simulator for the IBM 704," internal memorandum, Allis-Chalmers Mfg. Co., January 6, 1960.
5. HURLEY, J. R.: "Digital Simulation I: DYSAC, A Digitally Simulated Analog Computer," AIEE Summer General Meeting, Denver, Colorado, June 17-22, 1962.
6. STOVER, R. F., and KNUDTSON, H. A.: "H800 PARTNER-Proof of Analog Results Through a Numerical Equivalent Routine," Doc. No. U-ED 15002, Minneapolis-Honeywell Regulator Co., Aero. Divn., April 30, 1962.
7. GASKILL, R. A., HARRIS, J. W., and McKNIGHT, A. L.: "DAS-A Digital Analog Simulator," Proc. 1963 Spring Joint Computer Conference.
8. BYRNE, E. R.: "JANIS, A Program for a General-Purpose Digital Computer to Perform Analog Type Simulations," ACM Nat. Conf., Denver, Colorado, August 27, 1963.
9. HARNETT, R. T., SANSOM, F. J., and WARSHAWSKY, L. M.: "MIDAS Programming Guide," Tech. Doc. Report SEG-TDR-64-1, Analog Computation Divn., Wright-Patterson Air Force Base, January 1964.
10. McLEOD, J.: "Simulation is Wha-a-t?," SIMULATION, Vol. 1, No. 1, Fall 1963.
11. LINEBARGER, R. N.: "Digital Simulation Techniques for Direct Digital Control Studies," Proc. 19th Annual Conf. of ISA, New York, October 12, 1964.
12. COWLEY, PERCY E. A.: "An Active Filter for the Measurement of Process Dynamics," Proc. 19th Annual Conf. of ISA, New York, October 12, 1964.

MIDAS - HOW IT WORKS AND HOW IT'S WORKED

Harry E. Petersen, F. John Sansom, Robert T. Hartnett, and L. Milton Warshawsky
Directorate of Computation, Wright-Patterson AFB, Ohio

INTRODUCTION

The possibility of using a digital computer to obtain check solutions for the analog was recognized by many people at the dawn of our 15-year-old history. Unfortunately several problems existed then, mainly at the digital end, which made this impracticable. Digital computers of that day were terribly slow, of small capacity, and painfully primitive in their programming methods. It was usually the case, when a digital check solution was sought for an incoming analog problem, that it was several months after the problem had been solved on the analog computer and the results turned over to the customer before the digital check solution made its appearance. The fact that the two solutions hardly ever agreed was another deterrent to the employment of this system. As we all know, digital computers have made tremendous strides in speed, capacity, and programmability.

In the area of programming - and throughout this paper we're talking of scientific problems expressible as differential equations - the main effort has been in the construction of languages such as Fortran, Algol, etc., to permit entering the problem in a quasi-mathematical form, with the machine taking over the job of converting these to the individual serial elemental steps. While the progress along this line has been truly awe-inspiring to an analog man (usually an engineer), the resulting language has become quite foreign to him, so that if he wishes to avail himself of the digital computer he must normally employ an interpreter in the form of a digital programmer (usually a mathematician). This means that he must describe his engineering problem in the required form, detail, and with sufficient technical insight to have the digital programmer develop a workable program on the first try. This doesn't happen very often, and it is the consensus among computing facility managers that a major source of the difficulty lies in the fact that the engineer does not always realize the full mathematical implications of his problem. For example, in specifying that a displacement is limited, he might not state what happens to the velocity. This can lead to all sorts of errors, as an analog programmer would know. It is, of course, possible for an analog programmer to learn to program a digital computer by studying Fortran. This has been attempted here at Wright-Patterson AF Base with little success, mainly because, unless used very often, the knowledge is lost, so that each time a considerable relearning period is required. Some computing facilities have even embarked on cross-training programs so that each type of programmer knows the other's methods. While this has much to recommend it, it is often impracticable.

In March of 1963, Mr. Roger Gaskill of Martin-Orlando explained to us the operation of DAS (Digital Analog Simulator),1 a block-diagram type of digital program which he intended for use by control system engineers who did not have ready access to an analog computer. We immediately recognized in this type of program the possibility of achieving our long-sought goal of a means to obtain digital check solutions to our analog problems by having the analog programmer program the digital computer himself! We found that our analog engineers became quite proficient in the use of DAS after about one hour's training and were obtaining digital solutions that checked those of the analog. At this point several limitations of this entire method should be acknowledged. First, the idea that obtaining agreement between the digital and analog solutions is very worthwhile is based mainly on an intuitive approach. After all, both solutions could be wrong, since the same programming error could be made in both. Secondly, the validity of the mathematical model is not checked, merely the computed solution. Finally, it might be argued that the necessity of the analog man communicating the problem to his digital counterpart has the value of making him think clearly and organize his work well. This is lost if he programs the digital computer himself.
In spite of these limitations we thought it wise to pursue this idea. Although DAS triggered our activity in the field of analog-type digital programs, several others preceded it. A partial list of these and other such programs would include:

DEPI2 - California Institute of Technology
DYSAC3 - University of Wisconsin
DIDAS4 - Lockheed-Georgia
PARTNER5 - Honeywell Aeronautical Division
DYNASAR6 - General Electric, Jet Engine Division

Almost all of these - with the possible exception of PARTNER (Proof of Analog Results Through a Numerical Equivalent Routine) - had as their prime purpose the avoidance of the analog computer. They merely wished to borrow the beautifully simple programming techniques of the electronic differential analyzer and apply them to the digital computer. While DAS proved to be very useful to us, certain basic modifications were felt to be necessary to tailor it better to our needs. Principal among these modifications was a rather sophisticated integration routine to replace the simple fixed-interval rectangular type of DAS. Other important changes were made, but the change in the integration scheme, and our wish to acknowledge our debt to DAS, led us to the choice of the name MIDAS, an acronym meaning Modified Integration Digital Analog Simulator. In this paper a brief description of the method of using MIDAS will be given, followed by a summary of our experience in using it in a large analog facility for about 18 months.

How MIDAS Works

To a large degree, programming in MIDAS closely resembles the methods used in DAS and, therefore, an analog computer. There are usually three steps we go through in obtaining a solution. First we prepare a block diagram very similar to an analog schematic indicating the operational elements required to solve the problem. Next we prepare a listing which is our means of directing the punching of the cards, one for each line.
This listing indicates the source of the inputs to each element and defines the values of the required numerical data. After the cards have been punched and checked, they are turned in to our IBM 7094 operations group, where the information is placed on magnetic tape, and this tape, along with the MIDAS subroutine, is used to solve our particular problem. The results are given to us in printed form according to a rather fixed format. As an illustration of the steps involved in preparing a MIDAS program, let us set up the classical second-order linear differential equation for the mass, spring, and damper system. The equation is:

Mẍ + Bẋ + Kx = 0    (1)

with initial conditions:

x(0) = A,  ẋ(0) = 0

The MIDAS block diagram for this equation is shown in Figure 1. The following points should be noted:

TABLE I. DESCRIPTION OF MIDAS ELEMENTS

1. MATHEMATICAL OPERATIONS

INTEGRATE (Ij): Out = ∫A dt + IC. 1. Only one input can be accepted. 2. The initial condition, IC, is transmitted via an IC card and its corresponding data card.
SUM (Sj): Out = A1 + A2 + ... + AK, K ≤ 6.
NEGATIVE: Out = -A.
MULTIPLY (Mj): Out = A·B.
DIVIDE (Dj): Out = A/B. The order of listing the inputs is very important; the numerator, A, must be listed first.
ABSOLUTE VALUE (ABSj): Out = |A|.
SQUARE ROOT (SQRj): Out = +√A. Defined only for A ≥ 0.
EXPONENTIAL: Out = e^A.
NATURAL LOGARITHM (LNj): Out = ln A. Defined only for A > 0.
SINE-COSINE (RESj): B out = sin A; C out = cos A. 1. The input angle, A, must be in radians. 2. Since there are two outputs, these must be specified as RESjB or RESjC depending on whether the sine or cosine is required.
ARCTANGENT: Output is an angle in radians. Defined only for the 1st and 4th quadrants, i.e., -π/2 ≤ out < π/2.

2. SWITCHING ELEMENTS

Special note: since all of these elements have more than one input, the listing of inputs must be in the normal order of A, B, C.
RELAY: Out = B when A ≥ 0; Out = C when A < 0.
LIMITER: Out = B for A > B; Out = A for C ≤ A ≤ B; Out = C for A < C.

TABLE I. DESCRIPTION OF MIDAS ELEMENTS (cont'd)

BANG-BANG: Out = +B for A > 0; Out = -B for A < 0.
DEAD SPACE: Out = A - B for A > B; Out = 0 for C ≤ A ≤ B; Out = A - C for A < C.

5. RUN TERMINATION

FINISH (FIN): No output. 1. When A ≥ B, computation is stopped. 2. Every program must contain at least one FIN statement. 3. Numbering of FIN statements is not required.

6. INSERTION OF NUMERICAL ITEMS

CONSTANT or PARAMETER (CON or PAR; values entered four per data card): 1. The name of a constant or parameter can be composed of at most six alphanumeric symbols, excluding blanks and commas. 2. The names must not be the same as that of a functional element used in the problem. 3. The name will appear on a CON or PAR card and its numerical value on a data card. 4. Do not use these special names: IT, TR, MININT, OPTION.
INITIAL CONDITION (IC; supplies Ij(0) for integrator Ij): 1. The name must be the same as the integrator with which it is associated. 2. The name must appear on an IC (or PAR) card and its value on a data card. 3. Only non-zero ICs need be specified. 4. Such ICs must be specified for every run even though they do not change.

SPECIAL STATEMENTS

HEADER (HDR): 1. Contents of the HDR statements are printed at the beginning of each run. 2. Normally used to name the variables recorded in each column. 3. HDR cards should precede the RO cards.
READOUT (RO): Specifies the sources of the variables to be recorded for each run.
END: Signifies the end of the MIDAS symbolic program. Numerical data follows.

SPECIAL NAMES

INDEPENDENT VARIABLE (IT): 1. Gives the current value of the independent variable in the appropriate units. 2. Generated internally, this variable can be obtained by specifying its source as IT.
TIME BETWEEN READOUTS (TR): 1. When listed on a CON or PAR card, TR gives the increment of the independent variable, usually time, between successive readouts. 2.
If no value of TR is given, a standard value of 0.1 units will be used.
MINIMUM INTERVAL OF INTEGRATION (MININT): 1. When listed on a CON or PAR card, MININT specifies the smallest interval that the integration system is permitted to use. 2. If no value of MININT is given, a value of zero is used.
INTEGRATION OPTION (OPTION): When an integration method with an error criterion other than the standard one is desired, call for OPTIONj in columns 1-7 somewhere within the MIDAS program.

3. ARBITRARY FUNCTIONS

For all four types of function generators: 1. Up to 50 sets of x-y coordinates can be used. 2. Spacing of breakpoints is arbitrary. 3. Slope of the function is zero above and below the set of specified points. 4. The method of introducing data is in the text.

FUNCTION: Out = f(A). Linear interpolation; data needed for each run.
CONSTANT FUNCTION: Out = f(A). Linear interpolation; data needed for first run only.
CURVE FOLLOWER: Out = f(A). Quadratic interpolation; data needed for each run.
CONSTANT CURVE FOLLOWER: Out = f(A). Quadratic interpolation; data needed for first run only.

4. ITERATIVE ELEMENT

IMPLICIT FUNCTION: No direct output. 1. The inputs must be listed in the order A, B. 2. If |A - B| > 5×10⁻⁶|A|, iteration occurs by transferring the value of B into A (B → A), recomputing a new value of B, transferring it into A, etc., until the error criterion is satisfied. 3. The criterion for convergence is |∂B/∂A| < 1. 4. An initial estimate of A must be given via a CON or PAR card and its associated data card. 5. When A is needed elsewhere in the problem, it can be taken from the CON or PAR source. (Example: x = f(x) + y.) 6. For multiple runs, A must be named on a PAR card.
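The arbitrary-function elements above amount to table lookup with interpolation that goes flat outside the breakpoints. A minimal sketch of the linear (FUNCTION) variant, with made-up breakpoint data:

```python
import bisect

def function_generator(xs, ys):
    """MIDAS-style arbitrary function: linear interpolation between
    x-y breakpoints (up to 50 pairs in MIDAS), with zero slope above
    and below the set of specified points."""
    def f(a):
        if a <= xs[0]:
            return ys[0]               # flat below the first breakpoint
        if a >= xs[-1]:
            return ys[-1]              # flat above the last breakpoint
        i = bisect.bisect_right(xs, a) - 1
        t = (a - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])
    return f

# Hypothetical breakpoint table
f = function_generator([0.0, 1.0, 2.0], [0.0, 10.0, 15.0])
```

The quadratic (curve-follower) variants would differ only in the interpolation rule.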
~~~_~ ~~~..: Table I 317 318 PROGEEDINGS-F ALL JOINT COMPUTER CONFERENCE, 1964 (1) SI is a summer which adds the quantities -Bi and -Kx to yield Mx In accordance with equation 1. (2) Dl is a divider whose dividend is M~' (the output of SI) and whose divisor is a constant called M. Its output is then +x. (3) II is an integrator whose input is +~i' and output is +x. Unlike the case of analog integrators and summers, there is no change of sign in the equivalent MIDAS elements. Since no initial value is specified the output of II will start at zero. (4) I2)s another integrator whose input is +x and whose output is +x. The initial value of x is indicated by the dashed line extra input to 12. (5) Ml and M2 are multipliers which multiply +x by -B and +x by -K to provide the required inputs to S1. Unlike on analog computers the same type of multiplier is used whether two constants, a variable and a constant, or two variables are being multiplied. (6) FIN is a finish element. This element will stop computation when its A input (in this case t) equals or exceeds its B input (in this case the quantity named STOP) . Computation can be caused to halt by any of several conditions. A FIN box is required for each termination criterion. (7) When numerical data corresponding to M, B, K, STOP and x (0) are furnished, the problem is completely specified. No scaling will be required since numerical values ranging betwen 10-37 and 10+37 can be handled. N ext the listing is prepared. It will contain the following information: (1) One card identifying the problem. (2) A few cards calling out the MIDAS program. (3) Cards giving names to the constants and parameters of the probem, including integrators with non-zero initial values (up to 6 per card). (4) One card for each MIDAS element in the block diagram, identifying it by type and number, and indicating the source of its inputs (for example S2 meaning summer number 2). (5) At least one FIN card spe~ifying a condition for finishing a run. 
(6) One or more HDR cards and RO cards specifying the header names to be placed on top of the columns of output data to be read out at specified increments of the independent variable.
(7) An END card indicating the end of the symbolic program and the start of numerical data.
(8) One or more cards assigning numerical value to the constants, parameters, and initial conditions named in (3) above.
(9) Cards defining arbitrary functions, if any.

An example of a listing is shown in Figure 2 for the mass, spring, and damper system. Note the use of a comment card by the programmer to identify the problem. Also of significance is the columnar location of various types of entries. For example, comment cards have their first letter in column 1. Operational elements, constant and parameter naming cards, and HDR and RO cards all have their first character in column 7. Inputs to these elements are listed starting in column 15. Numerical data is listed starting in columns 1, 11, ... 51. It should be pointed out that in MIDAS, the programmer need not concern himself with the order in which he prepares the listing, since a built-in sorting routine will automatically line up the program properly. This is another important difference from MIDAS' predecessor, DAS. The particular listing shown will result in three runs being made, each starting with x(0) = 20 and terminating when t = 5. The mass M will have a value of 10 for each case, since M was named on a constant card. The three runs will have the following values of -B and -K, each of which was specified as a parameter.
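The built-in sorting routine mentioned above can be approximated by a dependency (topological) sort, with integrator outputs treated as already available so that feedback loops are broken. This is a hedged sketch, not the actual MIDAS algorithm, and the listing below is a hypothetical rendering of the Figure 1 diagram:

```python
from graphlib import TopologicalSorter

def sort_listing(elements):
    """Order a block listing so every element follows its inputs.
    `elements` maps block name -> (type letter, list of input names).
    Integrator ("I") outputs carry state from the previous integration
    step, so they are treated as already computed; this breaks the
    loop that every feedback diagram contains."""
    graph = {}
    for name, (_etype, inputs) in elements.items():
        graph[name] = [i for i in inputs
                       if i in elements and elements[i][0] != "I"]
    # static_order raises CycleError if an algebraic loop remains
    return list(TopologicalSorter(graph).static_order())

# Hypothetical listing for the mass-spring-damper diagram of Figure 1;
# names like "M" and "-B" refer to constants, not computed blocks.
listing = {
    "S1": ("S", ["M1", "M2"]),
    "D1": ("D", ["S1", "M"]),
    "I1": ("I", ["D1"]),
    "I2": ("I", ["I1"]),
    "M1": ("M", ["-B", "I1"]),
    "M2": ("M", ["-K", "I2"]),
}
order = sort_listing(listing)
```

Any order in which M1 and M2 precede S1, and S1 precedes D1, is acceptable, which is why the programmer's card order does not matter.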
[Figure 2. Original Listing: the MIDAS coding form and data form for the mass, spring, and damper problem.]

Run 1: -B = -2.5, -K = -8.6
Run 2: -B = -3.2, -K = -8.6
Run 3: -B = -2.5, -K = -15.0

Finally, a portion of the printed output of the IBM 7094 is shown in Figure 3. Several points are worthy of note.
(1) The printout of the problem listing, including the data for Case 1. Actually some machine mapping and storage information precedes this, but it has been omitted for clarity.
(2) The format of the output. Note the headers and the spacing of the four columns of output. Provision exists for six columns, but only four components were specified to be read out in this simple problem.
(3) The MAXIMA-MINIMA table. This feature, unique to MIDAS, provides the analog programmer with all the information needed to scale his analog schematic, both in amplitude and time. It shows the maximum and minimum values achieved by every component during the course of a run, whether these values occurred at read-out intervals or not.
(4) The printout of the parameter and IC data for Cases 2 and 3, followed by their output.
(5) The job accounting summary. The three cases took a total of 44 seconds.

This completes the description of this problem. Although considerable detail has been presented, in retrospect it can be seen that the main idea was simple. An analog-type block diagram was drawn and a listing prepared describing its interconnections. Information providing numerical data was also furnished.
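The whole example can be mimicked by stepping the Figure 1 block connections directly. This is a sketch only: a simple fixed-step rule stands in for the variable-step MIDAS integrator, and the constants are those of Run 1.

```python
def run_case(M, B, K, x0, stop, dt=1e-4):
    """Fixed-step walk of the Figure 1 block diagram: M1 and M2 form
    -B*xdot and -K*x, summer S1 adds them, divider D1 divides by M,
    and integrators I1 and I2 carry acceleration to velocity to
    displacement. FIN stops the run when t >= stop."""
    t, x, xdot = 0.0, x0, 0.0            # I2 has IC = x0, I1 has IC = 0
    max_x, min_x = x, x
    while t < stop:                      # FIN element: IT >= STOP
        s1 = (-B) * xdot + (-K) * x      # M1, M2 feeding summer S1
        xddot = s1 / M                   # divider D1
        xdot += dt * xddot               # integrator I1
        x += dt * xdot                   # integrator I2
        t += dt
        max_x, min_x = max(max_x, x), min(min_x, x)
    return x, (max_x, min_x)             # MAXIMA-MINIMA style extremes

x_final, (x_max, x_min) = run_case(M=10.0, B=2.5, K=8.6, x0=20.0, stop=5.0)
```

The final displacement lands near the -2.7 printed for Case 1 at t = 5.0, and the tracked extremes play the role of the MAXIMA-MINIMA table.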
[Output listings reproduced as Figure 3: the printout of the symbolic program and data for Case 1; the numerical results (TIME, ACC., VEL., DISP.) for Cases 1, 2, and 3; the MAXIMA-MINIMA tables for each case; the parameter and IC data for Cases 2 and 3; and the job accounting summary (21 JUL 64).]
USONJ '.9000E 00 5.0000E 00 5.1000E 00 5.02HE-OI -2.21~IE-OI -9.2155E-OI 8.3519E 00 8.3718E 00 8.31"E 00 -3.69'IE 00 -2.8570E 00 -2.0221E 00 HOV flANY FIftG,fAS DOES JOHN HAItE 3 NUMERICAL RESULTS FOR CASE 2 :::WA::~ ~:::C~E!\=I~S •• eu Bur I ASSUME IHA51 "E"IIIS (HAS"~ ""USII (CQNT'D) ::~:;~ .. ~~nNtE IS Alle81GUOU5 •• BUT I ASSUMt IHASI In.,'.tS 1""5 "!I P"It'~.J HOM ",NY flNGU5 DOtS JOHN HAye QI IJHf A8UVE SErwJfNC.f IrHE ANSwtR lS UU Figure 3. Problem Output. IS A"IHGWUS •• ~U' f ASS'J-E !HASt ",UNS IHAS AS PUHIJ 283 LINES MIDAS-HOW IT WORKS AND HOW IT'S WORKED This was then converted to punched cards, turned in to the 7094 operators who subsequently supplied the printed output. This example utilized only a few of the available MIDAS elements. The entire complement of them is shown in Table I, along with some description of their use. The A,B,C lettering system has significance where an element has more than one 'input. In this case the order of the sources of the inputs as given in the listing (starting in column 15) should be A,B,C, etc. Integration System in MIDAS The integrations in MIDAS are performed by a variable-step, fifth-order, predict-correct integration routine. 7 This variable-step feature represents the basic departure of MID AS from its predecessor, DAS. It relieves the programmer from the chore of having to specify a fixed increment of the independent variable, an increment which must be small enough to handle the highest frequency phenomena in a problem but not so small as to cause inordinately long solution times. The step size in the MIDAS integration routine adjusts itself to meet a certain error criterion, a factor which allows it to take large steps for those portions of the solution "when not much is happening" and small steps for those portions when one or more variables are changing at rapid rates. The net result is that time-scaling, as the analog programmer knows it, is eliminated in MIDAS. 
However, he must still face the time-scaling problem when he prepares his analog schematic, especially when certain variables in the problem drive narrow-bandwidth analog components such as servo-multipliers, X-Y plotters, etc. MIDAS helps him here, though, because he can observe the maximum values of the derivatives of these variables from the MAXIMA-MINIMA list, and he can then make any necessary time-base changes. Fortunately, for every variable appearing at the output of a MIDAS integrator, its derivative must appear at the output of the element feeding the integrator, since integrators can accept only a single input.

Miscellaneous Information on MIDAS

The MIDAS program has limitations on the number and characteristics of certain components. This information may be of value when a very large problem is to be solved on MIDAS.

Item: Maximum Number
1. Operational elements: 750
2. Symbols (operational elements, constants, and header names): 1000
3. Integrators: 100
4. Function generators: 40
5. Points for each function generator: 50
6. Inputs for each summer: 6

Summary of How MIDAS Works

This has been of necessity a brief review of the method of using MIDAS. A much more complete description, including several fully worked-out examples, is given in Technical Documentary Report No. SEG-TDR-64-1, "MIDAS PROGRAMMING GUIDE," dated January 1964.8 Information on obtaining the MIDAS program can be obtained by contacting the authors.

How MIDAS Has Worked

In the following paragraphs, a brief summary of the experiences of the Wright-Patterson AF Base Computation Facility in the use of MIDAS will be given. Actually, at this writing, approximately 100 computing facilities throughout the U.S. are using MIDAS; thus only a small segment of the total experience can be reported on. The Analog Computation facility at Wright-Patterson has used MIDAS in almost every problem submitted for solution on its large analog computer.
Generally the MIDAS solution is attempted prior to the analog in order to achieve the maximum benefits as regards scaling. Another side benefit of MIDAS has been the calculation of Problem Check values. It has been found in many cases that the extra time involved in programming a MIDAS check is saved in checkout time on the analog computer. The increased confidence in the validity of the solutions when a check 'between MIDAS and the analog solution is obtained is the most important benefit of the program. Another side benefit is the broadened horizons achieved 322 PROCEEDINGS-FALL JOINT COMPUTER CONFERENCE, 1964 by the analog programmers in giving them the ability to program the digital computer. This should be very valuable when hybrid computers come into their own. Until now the problem of finding people with the required capability in both areas has proven to be the greatest deterrent to the growth of these new devices. While everything mentioned above is on the positive side, there are quite a few aspects of MIDAS that are annoying and time consuming. One thing that the analog programmer learns is that the digital computer brooks no mistakes. If a "zero" is punched when the letter "0" is required, the problem comes back -generally the following day-with a diagnostic telling him of an undefined symbol. A little thing like a missing decimal point in numerical data will cause a day to be lost. Another thing that an analog programmer encounters is that when an error exists in a MIDAS program (however not of the type to prevent its running) , the solutions look just as good (as many significant figures of output) as they would if the solution were perfect. On the analog computer errors of the same type would cause overload lights to flash, etc. The very sophisticated integration routine of MIDAS introduces problems at times. 
For example, discontinuities in derivatives sometimes make it impossible for the error criterion to be met even though the increment of the independent variable is halved many times. This will cause a solution to "hang up." Provision has been made to overcome this problem by the use of MININT (see Table I), but it usually requires one or more unsatisfactory runs before the programmer is made aware of the difficulty.

One rather interesting discovery was the fact that an operation that was very easy to perform on an analog computer was very bothersome on the digital. Specifically, a rather large missile simulation was performed first on the analog computer and later using MIDAS. Quite a few first order lags were present in the mathematical model in the form of 1/(TS + 1). On the analog computer this offers no problem. For small values of T, one way to handle this is to parallel the feedback resistor of a summer with a capacitor of T microfarads. In fact, far from creating a difficulty, it generally is beneficial to the analog simulation by reducing some of the high frequency "noise." Using MIDAS, small values of T can cause considerable increase in solution time. For example, in this particular problem, when such transfer functions with T of 0.001 sec were included, the solution time was extrapolated to be 5 1/2 hours for 26 seconds of problem time. This was reduced to 12 1/2 minutes for the same length of problem time, simply by neglecting these small delays, and the effect on the results was insignificant. Incidentally, the analog solution was in "real time," i.e., 26 seconds.

Figure 1. Block Diagram.

MIDAS-HOW IT WORKS AND HOW IT'S WORKED

The subject of solution time is rather important to a digital programmer. We have attempted to gather data on the relative speed of a MIDAS run compared with a Fortran program produced by a skilled programmer.
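The cost of a small lag 1/(TS + 1) can be illustrated with a sketch. This is not MIDAS's integration routine; it is a simple explicit Euler rule whose step must stay well below T for accuracy and stability, which is the same mechanism by which a 1-millisecond lag forces an error-controlled integrator to take enormous numbers of tiny steps over a 26-second problem. The function name and the step fraction are illustrative choices.

```python
# Why a small first-order lag is expensive for an explicit integrator:
# the step size must track T, so the number of steps for a fixed span
# of problem time grows like t_end / T.

def steps_to_settle(tau, t_end=1.0, dt_frac=0.1):
    """Integrate y' = (1 - y)/tau with Euler steps of dt_frac * tau."""
    dt = dt_frac * tau
    y, t, n = 0.0, 0.0, 0
    while t < t_end:
        y += (1.0 - y) / tau * dt
        t += dt
        n += 1
    return n, y

n_small, y_small = steps_to_settle(0.001)  # T = 1 ms lag
n_large, y_large = steps_to_settle(0.1)    # T = 100 ms lag
print(n_small, n_large)  # roughly 10000 vs 100 steps for the same problem time
```

Dropping the lag entirely (treating the output as equal to the input) removes the constraint, which is exactly the remedy described above for the missile simulation.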
With the conditions of the test equalized, the solution time of the Fortran program was approximately half as long; however, the programming time for MIDAS was much less. The question of solution time is not very important for the typical problem handled with MIDAS because usually we are interested in one or two runs, so whether they take 3 minutes or 5 minutes each is of academic interest only. There have been a few problems handled by MIDAS alone without recourse to the analog computer. In these cases, program efficiency was of considerable importance since many runs were required. Here, in the present stage of the development of MIDAS, a specially tailored digital program should receive serious consideration.

At this point the question should be considered of whether MIDAS or a similar digital computer program will take over the role of the analog computer in the areas where the latter shines. Since a MIDAS-type program has appropriated one of the best features of the analog computer, viz., simple block diagram programming, and the speed and capacity of digital computers have developed so much, there is certainly reason to consider this question. While anyone would be foolhardy to give an answer to hold for all time, it is our opinion that MIDAS, rather than threatening the existence of analog computers, has reinforced their position by increasing confidence in their output. There are quite a few advantages to the use of an analog computer which MIDAS doesn't touch. Among these are:

(1) The intimate relationship between the engineer and his problem, which enables him to design his system by observing graphical outputs and changing parameters as required.
(2) The ability to tie in physical hardware to the mathematical simulation.
(3) The ability to use portions of the computer in direct relationship to the size of the problem.
(4) The fact that certain mathematical operations are performed better, e.g., integration.
(5) The very fact that it is a distinctly different technique of solution, thus making possible a checking means.

While some progress has been and is being achieved in items 1, 2, 3 and 4, item 5 will always remain.

Future of MIDAS

Although MIDAS has proven to be very effective in accomplishing its purpose, certain improvements could be made without materially changing its simple programming rules. Among such improvements would be the following:

(1) Increased efficiency, i.e., shorter solution times without losing programming simplicity.
(2) Additional flexibility in naming outputs.
(3) Permit the use of fixed point literals in the body of the program.
(4) A greatly expanded operation list that would include logical operations such as AND, OR, NOT, etc., and others equivalent to the elements found in a hybrid computer.

A new program is being developed at Wright-Patterson AF Base which already includes the improvements listed above. In addition, it is anticipated that the following features will be included:

(1) Ability to add new functions external to the basic program.
(2) Additional controls that would
    (a) Allow the results of one run to dictate automatically the conditions for the next.
    (b) Permit more "hands on" control of operation of the program as advocated by Mr. R. Brennan in his PACTOLUS (Ref. 9) program.

It is further hoped that an investigation of various integration routines will result in an integration system that will automatically account for discontinuities and thus prevent the solution from "hanging up."

The new program, MIMIC, is completely different from MIDAS in concept but it retains the programming ease of MIDAS. It will be written as a system to operate under IBJOB control on an IBM 7090/7094 computer. It is an assembler type program that generates machine language code equivalent to the original problem.
The instruction format is very similar to MIDAS but has been designed to appeal to both analog and digital programmers. If and when this occurs, and both analog and digital programmers employ MIMIC regularly, a very significant first step in breaking down the communications barrier between the two will have been taken, since they will, for the first time, be speaking the same language. Furthermore, just as MIDAS has made the digital computer accessible to the analog man, this new program might serve to educate the digital programmer in analog methods. The day of the omniscient, triple-threat programmer might be on the way!

REFERENCES

1. GASKILL, R. A., HARRIS, J. W., and McKNIGHT, A. L., "DAS-A Digital Analog Simulator," AFIPS Proceedings, 1963 Spring Joint Computer Conference.
2. LESH, F. H., and CURL, F. G., "DEPI: An Interpretive Digital-Computer Routine Simulating Differential-Analyzer Operations," Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, March 22, 1957. (Note: DEPI is an acronym for Differential Equations Pseudocode Interpreter.)
3. HURLEY, J. R., and SKILES, J. J., "DYSAC: A Digitally Simulated Analog Computer," AFIPS Proceedings, 1963 Spring Joint Computer Conference.
4. SLAYTON, G. R., "DIDAS: A Digital Differential Analyzer Simulator," Twelfth National Meeting of the Association for Computing Machinery, June 1958.
5. KNUDTSON, H. A., and STOVER, R. F., "PARTNER-Proof of Analog Results Through a Numerical Equivalent Routine," Aeronautical Division, Minneapolis-Honeywell Regulator Co., MH Aero Document U-ED 15001, August 22, 1961.
6. MARVIN, I. E., and DURAND, H. P., "Jet Engine Control Representation Study," Jet Engine Division, General Electric Company, Cincinnati, Ohio, Air Force Technical Documentary Report ASD-TDR-63-650, July 1963. (Note: DYNASAR is an acronym for Dynamic Systems Analyzer.)
7. MILNE, W. E., and REYNOLDS, R. R., Journal of the ACM, January 1962.
8. HARNETT, R. T., SANSOM, F. J., and WARSHAWSKY, L. M., "MIDAS Programming Guide," Air Force Technical Documentary Report SEG-TDR-64-1, January 1964. DDC (formerly ASTIA) Report No. AD430892. Also available from Office of Technical Services, U.S. Dept. of Commerce.
9. BRENNAN, R. D., and SANO, H., "PACTOLUS-A Digital Analog Simulator Program for the IBM 1620," IBM San Jose Research Laboratory, San Jose, California, IBM Research Report RJ-297, May 6, 1964.

THE RAND TABLET: A MAN-MACHINE GRAPHICAL COMMUNICATION DEVICE*

M. R. Davis and T. O. Ellis
The RAND Corporation
Santa Monica, California

Present-day user-computer interface mechanisms provide far from optimum communication, considerably reducing the probability that full advantage is being taken of the capabilities of either the machine or of the user. A number of separate research projects are underway, aimed at investigating ways of improving the languages by which man communicates with the computer, and at developing more useful and more versatile communication channels. Several of these projects are concerned with the design of "two-dimensional" or "graphical" man-computer links. Early in the development of man-machine studies at RAND, it was felt that exploration of man's existent dexterity with a free, pen-like instrument on a horizontal surface, like a pad of paper, would be fruitful. The concept of generating hand-directed, two-dimensional information on a surface not coincident with the display device (versus a "light pen") is not new and has been examined by others in the field. It is felt, however, that the stylus-tablet device developed at RAND (see Fig. 1) is a highly practical instrument, allowing further investigation of new freedoms of expression in direct communications with computers. The tablet is connected to an input channel of a general-purpose computer and also to an oscilloscope display.
The display control multiplexes the stylus position information with computer-generated information in such a way that the oscilloscope display contains a composite of the current pen position (represented as a dot) and the computer output. In addition, the computer may regenerate meaningful track history on the CRT, so that while the user is writing, it appears that the pen has "ink." The displayed "ink" is visualized from the oscilloscope display while hand-directing the stylus position on the tablet, as in Fig. 1. Users normally adjust within a few minutes to the conceptual superposition of the displayed ink and the actual off-screen pen movement. There is no apparent loss of ease or speed in writing, printing, constructing arbitrary figures, or even in penning one's signature. To maintain the "naturalness" of the pen device, a pressure-sensitive switch in the tip of the stylus indicates "stroke" or intended input information to the computer. This switch is actuated by approximately the same pressure normally used in writing with a pencil, so that strokes within described symbols are defined in a natural manner. The RAND tablet device generates 10-bit x and 10-bit y stylus position information.

* This research was supported by the Advanced Research Projects Agency under contract No. SD-79. Any views or conclusions should not be interpreted as representing the official opinion or policy of ARPA or of the RAND Corporation.

Figure 1. Complete System in Operation.

In addition to the many advantages of a "live pad of paper" for control and interpretive purposes, the user soon finds it very convenient to have no part of the "working" surface (the CRT) covered by the physical pen or the hand. The gross functioning of the RAND tablet system is best illustrated through a general description of the events that occur during a major cycle (220 microseconds; see timing diagram, Fig. 2).
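The data format just described can be sketched briefly: a 10-bit x and a 10-bit y coordinate make one 20-bit position word, and one such word is produced per 220-microsecond major cycle. The packing order (x in the high bits) is an assumption made for illustration; the text does not specify it.

```python
# Sketch of the tablet's position word: two 10-bit coordinates in a
# 20-bit word, one reading per 220-usec major cycle. The bit layout
# (x high, y low) is assumed, not taken from the paper.

def pack_position(x, y):
    """Pack two 10-bit coordinates into a single 20-bit word."""
    assert 0 <= x < 1024 and 0 <= y < 1024
    return (x << 10) | y

def unpack_position(word):
    """Recover the (x, y) pair from a 20-bit position word."""
    return (word >> 10) & 0x3FF, word & 0x3FF

# One full (x, y) reading per 220-usec major cycle:
samples_per_second = 1.0 / 220e-6  # about 4545 positions per second

w = pack_position(513, 7)
print(unpack_position(w), round(samples_per_second))
```

The sample-rate arithmetic shows why the 220-microsecond cycle is comfortable for handwriting: several thousand position readings per second far exceeds the bandwidth of hand motion.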
Figure 3 is the system block diagram with the information flow paths indicated by the heavier lines. The clock sequencer furnishes a time sequence of 20 pulses to the blocking oscillators. During each of the 20 timing periods, a blocking oscillator gives a coincident positive and negative pulse on two lines attached to the tablet.

Figure 2. Timing Waveforms (microseconds; major cycle 220 microseconds, with bookkeeping pulses t21 through t23).

Figure 3. Graphic Input System Block Diagram. (Blocks include the clock sequencer and control, a 20-bit shift register, 20 gates, a 20-bit output register with ready signal, and the output interface.)

The pulses are encoded by the tablet as serial (x,y) Gray-code position information which is sensed by the high-input-impedance, pen-like stylus from the epoxy-coated tablet surface. The pen is of roughly the same size, weight, and appearance as a normal fountain pen. The pen information is strobed, converted from Gray to binary code, assembled in a shift register, and gated in parallel to an interface register. The printed-circuit, all-digital tablet, complete with printed-circuit encoding, is a relatively new concept made possible economically by advances in the art of fine-line photoetching. The tablet is the hub of the graphic input system, and its physical construction and the equivalent circuit of the tablet itself will be considered before proceeding to the system detail. The basic building material for the tablet is 0.5-mil-thick Mylar sheet clad on both sides with 1/2-ounce copper (approximately 0.6 mils thick). Both sides of the copper-clad Mylar sheets are coated with photoresist, exposed to artwork patterns, and etched using standard fine-line etching techniques. The result is a printed circuit on each side of the Mylar, each side in proper registration with the other.
(Accurate registration is important only in the encoder sections, as will be seen later.) Figure 4 is a photo of the printed circuit before it has been packaged. The double-sided, printed screen is cemented to a smooth, rigid substrate and sprayed with a thin coat of epoxy to provide a good wear surface and to prevent electrical contact between the stylus and the printed circuit. The writing area on the tablet is 10.24 x 10.24 in. with resolution of 100 lines per inch. The entire tablet is mounted in a metal case with only the writing area exposed, as can be seen in Fig. 1. Although it would be very difficult to fully illustrate a 1024 x 1024-line system, it does seem necessary, for clarity, to present all the details of the system. Thus, an 8 x 8-line system will be used for the system description, and expansion of the concept to larger systems will be left to the reader.

Figure 4. Unmounted Printed Circuit.

Figure 5 shows the detailed printed circuit on each side of the 0.5-mil Mylar for an 8 x 8-line system. The top circuit contains the x position lines and the two y encoder sections, while the bottom circuit has the y position lines and the two x encoder sections. It should be noted that the position lines are connected at the ends to wide, code-coupling buses. These buses are made as wide as possible in order to obtain the maximum area, since the encoding scheme depends on capacitive coupling from the encoder sections through the Mylar to these wide buses. It should be further noted that the position lines are alternately connected to wide buses on opposite ends. This gives symmetry to the tablet and minimizes the effect of registration errors. With reference to Fig. 5, at time t1 encoder pads P1+ are pulsed with a positive pulse and pads P1- are pulsed with a negative pulse.
Pads P1+ are capacitively coupled through the Mylar to y position lines y5, y6, y7, and y8, thus coupling a positive pulse to these lines. Pads P1- are capacitively coupled to y position lines y1, y2, y3, and y4, putting a negative pulse on these lines. At time t2, encoder pads P2+ and P2- are pulsed plus and minus, respectively, putting positive pulses on y position lines y3, y4, y5, and y6, and negative pulses on y position lines y1, y2, y7, and y8. At the end of time t3, each y position line has been energized with a unique serial sequence of pulses. If positive pulses are considered as ones and negative pulses as zeroes, the Gray-pulse code appearing on the y position wires is as follows:

y1  000
y2  001
y3  011
y4  010
y5  110
y6  111
y7  101
y8  100

The x encoder pads are now sequentially pulsed at times t4, t5, and t6, giving unique definitions to each x position line. If a pen-like stylus with high input impedance is placed anywhere on the tablet, it will pick up a time sequence of six pulses, indicating the (x,y) position of the stylus. It should be pointed out again that the stylus is electrostatically coupled to the (x,y) position lines through the thin, epoxy wearcoat. If the stylus is placed on the tablet surface at a given point, the pulse stream appearing at the pen tip would be as indicated in Fig. 6. This detected pulse pattern will repeat itself every major cycle as long as the stylus is held in this position. If the stylus is moved, a different pulse pattern is sensed, indicating a new (x,y) position.

Figure 5. Double-sided Printed-circuit Layout for 8 x 8 System.

Figure 6. Timing Diagram and Pen Signals for the Example 8 x 8 System.

Figure 7. Equivalent Circuit of Encoder-Tablet-Stylus Coupling and Attenuating Elements. (Typical values: C1, encoder pad coupling capacity, 5 pf; C2, capacity to adjacent parallel wires in tablet, 10 pf; C3, capacity to crossing lines in screen, 100 pf; C4, stylus-to-tablet coupling capacity, 0.5 pf; C5, stylus input shunt capacity, 5 pf; R, stylus input resistance, 200 K.)

The signals arriving at the input to the stylus amplifier are approximately 1/300 of the drive-line signals. The character of the signals at the stylus input is greatly dependent on the drive-pulse rise time. Since there are 1024 x position lines and 1024 y position lines, 20 bits are required to define an (x,y) position. The actual timing used in the RAND system was shown in Fig. 2. Timing pulses t21, t22, and t23 are additional pulses used for bookkeeping and data manipulation at the end of each major cycle. Figure 8 is an oscilloscope pattern of the amplified signals at the stylus output.† These signals are amplified again and strobed into a Gray-code toggle. An x bit at t8 and a y bit at t17 are smaller than the rest. This indicates that the stylus tip is somewhere between lines, and these are the bits that are changing. The position lines on the full-size tablet are 3 mils wide with a 7-mil separation. The code-coupling pads are 16 to 17 mils wide with a 3- to 4-mil separation. Figure 4 shows that the encoding pads which couple to the lower set of position lines (y position lines) are enlarged. This greater coupling area increases the signal on the lower lines to compensate for the loss caused by the shielding effect on the upper lines (since they lie between the lower lines and the stylus pick-up). The encoding pad for the two least-significant bits in both x and y was also enlarged to offset the effect of neighboring-line cancellations. With these compensations, all pulses received at the stylus tip are of approximately the same amplitude. Since the final stages of the amplification and the strobing circuit are dc-coupled, the system is vulnerable to shift in the dc signal level.
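The pad-to-line coupling scheme for the 8 x 8 example can be checked with a short sketch. The first two positive-pad sets below are taken from the text (P1+ drives y5 through y8; P2+ drives y3 through y6); the third set is inferred, since the text does not list it, as the one that completes the reflected Gray code shown in the table.

```python
# Sketch of the encoder-pad scheme: at time t_k, the k-th pad pair
# drives one subset of position lines positive (a "1" pulse) and the
# rest negative (a "0" pulse). Sets 1 and 2 come from the text; set 3
# is inferred to complete the reflected Gray code.

POSITIVE_SETS = [
    {5, 6, 7, 8},   # P1+ at t1 (most significant bit)
    {3, 4, 5, 6},   # P2+ at t2
    {2, 3, 6, 7},   # P3+ at t3 (least significant bit; inferred)
]

def line_code(y):
    """Serial pulse code picked up on position line y (1..8)."""
    return "".join("1" if y in s else "0" for s in POSITIVE_SETS)

for y in range(1, 9):
    print(f"y{y}: {line_code(y)}")
```

Running this reproduces the table above (y1 = 000 through y8 = 100), and makes visible the key property: adjacent lines differ in exactly one bit, so a stylus sitting between two lines corrupts only one bit of the reading.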
For this reason, an automatic level control (ALC) circuit has been provided to insure maximum recognizability of signals. During the first 180 microseconds of a major cycle, the stylus is picking up bits from the tablet. During the last 40 microseconds, the tablet is quiet, i.e., the stylus is at its quiescent level. During this 40-microsecond interval, the quiescent level of the pen is strobed into the ALC toggle. If the quiescent level is recognized as a zero, the ALC condenser changes slowly in the proper direction to change the recognition (via a bias circuit) to a one, and vice versa. For a perfectly balanced system, the ALC toggle would alternate between 1 and 0 with each major cycle.

Figure 7 is an illustration of the approximate equivalent circuit of the encoder-tablet-stylus system, along with typical system parameter values. It is clear that the values of C1 vary with encoder-pad size, and the value of C4 varies according to whether top or bottom lines are being considered. The value of C4 is also dependent on the stylus-tip geometry and wearcoat thickness of the tablet.

A Gray code was selected so that only one bit would change value with each wire position, giving a complete and unambiguous determination of the stylus position. Furthermore, a reflected Gray code facilitates serial conversion to binary. The conversion logic for an N-bit number, when N is the most significant bit, is:

    Binary(N) = Gray(N)
    Binary(i) = Binary(i+1) + Gray(i)  (modulo-2 addition), i = N-1, ..., 1

† It will be noted in the oscilloscope pattern of Fig. 8 that the pulsing sequence is x first and y last. This is mentioned only because it is the opposite order of that shown in the 8 x 8-line example system discussed above; otherwise, it is unimportant.

Figure 8. Oscillogram of Pen Signal and Strobe (x pulses t1 through t10; y pulses t11 through t20).
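The serial conversion logic above can be sketched directly: the most significant binary bit equals the incoming Gray bit, and each subsequent binary bit is the previous binary bit combined with the next Gray bit by modulo-2 addition (XOR). This is why the bit-serial Gray stream from the pen can be converted on the fly as it is shifted in.

```python
# Serial reflected-Gray-to-binary conversion, as stated in the text:
#   Binary(N) = Gray(N)
#   Binary(i) = Binary(i+1) XOR Gray(i)
# Bits arrive most-significant first, one per timing pulse.

def gray_to_binary(gray_bits):
    """Convert a Gray-coded bit sequence (MSB first) to binary bits."""
    binary = []
    prev = 0
    for g in gray_bits:      # each bit as it arrives serially
        prev = prev ^ g      # Binary(i) = Binary(i+1) XOR Gray(i)
        binary.append(prev)
    return binary

# Gray 110 (line y5 in the 8 x 8 example) -> binary 100, i.e. position 4,
# the fifth line counting from zero.
print(gray_to_binary([1, 1, 0]))
```

Only a single flip-flop of state (the previous binary bit) is needed, which matches the paper's description of a Gray-code toggle feeding the shift register.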