AFIPS CONFERENCE PROCEEDINGS
VOLUME 25
1964 SPRING JOINT COMPUTER CONFERENCE

The ideas and opinions expressed herein are solely those of the authors and are not necessarily representative of or endorsed by the 1964 Spring Joint Computer Conference Committee or the American Federation of Information Processing Societies.

Library of Congress Catalog Card Number: 55-44701

Copyright © 1964 by American Federation of Information Processing Societies, P. O. Box 1196, Santa Monica, California. Printed in the United States of America. All rights reserved. This book, or parts thereof, may not be reproduced in any form without permission of the publishers.

Sole Distributors in Great Britain, the British Commonwealth and the Continent of Europe:
CLEAVER-HUME PRESS
10-15 St. Martins Street
London W. C. 2

LIST OF JOINT COMPUTER CONFERENCES

1. 1951 Joint AIEE-IRE Computer Conference, Philadelphia, December 1951
2. 1952 Joint AIEE-IRE-ACM Computer Conference, New York, December 1952
3. 1953 Western Computer Conference, Los Angeles, February 1953
4. 1953 Eastern Joint Computer Conference, Washington, December 1953
5. 1954 Western Computer Conference, Los Angeles, February 1954
6. 1954 Eastern Joint Computer Conference, Philadelphia, December 1954
7. 1955 Western Joint Computer Conference, Los Angeles, March 1955
8. 1955 Eastern Joint Computer Conference, Boston, November 1955
9. 1956 Western Joint Computer Conference, San Francisco, February 1956
10. 1956 Eastern Joint Computer Conference, New York, December 1956
11. 1957 Western Joint Computer Conference, Los Angeles, February 1957
12. 1957 Eastern Joint Computer Conference, Washington, December 1957
13. 1958 Western Joint Computer Conference, Los Angeles, May 1958
14. 1958 Eastern Joint Computer Conference, Philadelphia, December 1958
15. 1959 Western Joint Computer Conference, San Francisco, March 1959
16. 1959 Eastern Joint Computer Conference, Boston, December 1959
17. 1960 Western Joint Computer Conference, San Francisco, May 1960
18. 1960 Eastern Joint Computer Conference, New York, December 1960
19. 1961 Western Joint Computer Conference, Los Angeles, May 1961
20. 1961 Eastern Joint Computer Conference, Washington, December 1961
21. 1962 Spring Joint Computer Conference, San Francisco, May 1962
22. 1962 Fall Joint Computer Conference, Philadelphia, December 1962
23. 1963 Spring Joint Computer Conference, Detroit, May 1963
24. 1963 Fall Joint Computer Conference, Las Vegas, November 1963
25. 1964 Spring Joint Computer Conference, Washington, April 1964

Conferences 1 to 19 were sponsored by the National Joint Computer Committee, predecessor of AFIPS. Back copies of the proceedings of these conferences may be obtained, if available, from:
• Association for Computing Machinery, 14 E. 69th St., New York 21, N. Y.
• American Institute of Electrical Engineers, 345 E. 47th St., New York 17, N. Y.
• Institute of Radio Engineers, 1 E. 79th St., New York 21, N. Y.

Conferences 20 and up are sponsored by AFIPS. Copies of AFIPS Conference Proceedings may be ordered from the publishers as available at the prices indicated below. Members of societies affiliated with AFIPS may obtain copies at the special "Member Price" shown.

Volume   List Price   Member Price   Publisher
20       $12.00       $7.00          Macmillan Co., 60 Fifth Ave., New York 11, N. Y.
21       6.00         6.00           National Press, 850 Hansen Way, Palo Alto, Calif.
22       8.00         4.00           Spartan Books, Inc., 301 N. Charles St., Baltimore 1, Md.
23       10.00        5.00           Spartan Books, Inc.
24       16.50        8.25           Spartan Books, Inc.
25                                   Spartan Books, Inc.
NOTICE TO LIBRARIANS

This volume (25) continues the Joint Computer Conference Proceedings (LC 55-44701) as indicated in the above table. It is suggested that the series be filed under AFIPS and cross-referenced as necessary to the Eastern, Western, Spring, and Fall Joint Computer Conferences.

CONTENTS

Preface

COMPILERS: TUTORIAL
Programming Systems and Languages: A Historical Survey    S. ROSEN    1
Bounded Context Translation    R. M. GRAHAM    17
Syntax-Directed Compiling    T. E. CHEATHAM, JR., K. SATTLEY    31

TECHNICAL SESSION
A General-Purpose Table-Driven Compiler    S. WARSHALL, R. M. SHAPIRO    59

APPLICATIONS
A Computer Technique for Producing Animated Movies    K. C. KNOWLTON    67
Simulation of Biological Cells by Systems Composed of String-Processing Finite Automata    W. R. STAHL, R. W. COFFIN, H. E. GOHEEN    89
Computer Simulation of Human Interaction in Small Groups    J. T. GULLAHORN, J. E. GULLAHORN    103
Real-Time Computer Studies of Bargaining Behavior: The Effects of Threat Upon Bargaining    R. J. MEEKER, G. H. SHURE, W. H. MOORE, JR.    115
Real-Time Quick-Look Analysis for the OGO Satellites    R. J. COYLE, J. K. STEWART    125

SOCIAL IMPLICATIONS OF DATA PROCESSING
An Ethos for the Age of Cyberculture    A. M. HILTON    139
Information Processing and Some Implications for More Rational Economic Activities    H. E. STRINER    155
The Computer Revolution and the Spirit of Man    R. H. DAVIS    161

NUMERICAL ANALYSIS
New Difference Equation Technique for Solving Non-Linear Differential Equations    J. M. HURT    169
Discontinuous System Variables in the Optimum Control of Second Order Oscillatory Systems With Zeros    W. NEVIUS, H. TITUS    181
Two New Direct Minimum Search Procedures for Functions of Several Variables    B. WITTE    195

COMMAND AND CONTROL
On the Evaluation of the Cost-Effectiveness of Command and Control Systems    N. P. EDWARDS    211
Fractionization of the Military Context    F. B. THOMPSON    219
Some Problems Associated With Large Programming Efforts    A. E. DANIELS    231
Some Cost Contributors to Large-Scale Programs    B. NANUS, L. FARR    239

HYBRID SYSTEMS: TUTORIAL
Hybrid Computation . . . What Is It? . . . Who Needs It? . . .    T. D. TRUITT    249

HYBRID SYSTEMS: TECHNICAL
A Hybrid Analog-Digital Parameter Optimizer for Astrac II    B. A. MITCHELL, JR.    271
A Hybrid Analog-Digital Pseudo-Random Noise Generator    R. L. T. HAMPTON    287
A 2MC Bit-Rate Delta-Sigma Modulation System for Analog Function Storage    H. HANDLER, R. H. MANGELS    303

ARTIFICIAL INTELLIGENCE
A Computer-Simulated On-Line Learning Control System    J. D. HILL, G. MCMURTRY, K. S. FU    315
A Heuristic Program to Solve Geometric-Analogy Problems    T. G. EVANS    327
Experiments With a Theorem-Utilizing Program    L. E. TRAVIS    339

EVALUATING COMPUTER SYSTEMS
Analytical Technique for Automatic Data Processing Equipment Acquisition    S. ROSENTHAL    359
Cost-Value Technique for Evaluation of Computer System Proposals    E. O. JOSLIN    367
The Use of a Computer to Evaluate Computers    D. J. HERMAN    383

MULTI-PROGRAMMING
A General-Purpose Time-Sharing System    E. G. COFFMAN, JR., J. I. SCHWARTZ, C. WEISSMAN    397
Remote Computing: An Experimental System. Part 1: External Specifications    T. M. DUNN, J. H. MORRISSEY    413
Remote Computing: An Experimental System. Part 2: Internal Design    J. M. KELLER, E. C. STRUM, G. H. YANG    425
Multi-Computer Programming for a Large-Scale, Real-Time Data Processing System    G. E. PICKERING, G. A. ERICKSON, E. G. MUTSCHLER    445

LOGIC, LAYOUT AND ASSOCIATIVE MEMORIES
On the Analysis and Synthesis of Three-Valued Digital Systems    J. SANTOS, H. ARANGO    463
An Algorithm for Placement of Interconnected Elements Based on Minimum Wire Length    R. A. RUTMAN    477
Studies of an Associatively Addressed Distributed Memory    G. J. SIMMONS    493
Design of an Experimental Multiple Instantaneous Response File    E. L. YOUNKER, C. H. HECKLER, JR., D. P. MASHER, J. M. YARBOROUGH    515

INFORMATION RETRIEVAL: TUTORIAL
Research in Automatic Generation of Classification Systems    H. BORKO    529
Information Storage and Retrieval: Analysis of the State-of-the-Art    G. ARNOVICK, J. A. LILES, A. H. ROSEN, J. S. WOOD    537

INFORMATION RETRIEVAL: TECHNICAL
Training a Computer to Assign Descriptors to Documents: Experiments in Automatic Indexing    M. E. STEVENS, G. H. URBAN    563
Experiments in Information Correlation    J. L. KUHNS, C. A. MONTGOMERY    577
Some Flexible Information Retrieval Systems Using Structure Matching Procedures    G. SALTON, E. H. SUSSENGUTH, JR.    587

BUSINESS DATA PROCESSING
Two New Improvements in Sorting Techniques    M. A. GOETZ    599
Conceptual Models for Determining Information Requirements    J. C. MILLER    609

PREFACE

The following pages contain the papers which were presented and discussed in the formal technical and tutorial sessions of the 1964 Spring Joint Computer Conference. The Conference theme, "Computers '64: Problem-Solving in a Changing World," is intended to suggest the significance of this year as the mid-point between the infancy of electronic digital computation (the first elements of ENIAC began operating in June 1944) and the Orwellian year 1984, which symbolizes to some critics an unhappy potential which might be achieved if we do not wisely guide our rapidly advancing technology. Society is increasingly concerned with the broad adjustments that must take place if man is to gain the maximum long-term advantage from the computer. Reflections of this concern and of the increasing uses to which computers are being applied are, among others, the papers dealing with social implications and the tutorial papers on hybrid systems, compilers, and information retrieval.

This volume, a product of the 25th in the series of Joint Computer Conferences, for the first time includes an index. Thanks for this useful addition are due to the authors and session chairmen, who were most cooperative in getting "final copy" submitted on schedule, and to our publisher, who assumed the burden of preparing the index. Appreciation must also be expressed for the contributions of many other persons, who participated as conference planners, panel members, and reviewers. A special acknowledgement is made to the members of the Technical Program Committee, who willingly assumed the heavy burden of assembling the formal program.

HERBERT R. KOLLER
General Chairman
1964 Spring Joint Computer Conference

PROGRAMMING SYSTEMS AND LANGUAGES
A Historical Survey

Saul Rosen
Professor of Mathematics and Computer Sciences
Purdue University
West Lafayette, Indiana

1. Introduction.

Twenty years ago, in 1943, there were no Electronic computers. Ten years ago, in 1953, a large number of Electronic calculators were in use, but the general purpose stored program electronic computer was still quite rare. The Coincident Current Magnetic Core memory which finally provided both reliability and speed at reasonable cost had only just been developed, and was still a laboratory device. A number of specially designed, mostly one of a kind, computers were in operation at Universities and government research centers. Commercially, a few Univac I computers had been delivered and were operating with great reliability at rather low speed.
A few IBM 701's provided high speed but with very poor reliability. In 1953 most computing was being done by the Card-Programmed Calculator, an ingenious mating of an Electromechanical Accounting Machine with an Electronic Calculating Punch.

Between 1954 and 1958 many hundreds of Electronic computers were installed. This was the era of the Vacuum Tube Computer, with Magnetic Drum storage on lower priced machines, and Magnetic Core storage on the larger more expensive ones. By 1959 the first transistorized computers had been delivered, and the production of vacuum tube computers ceased almost immediately. Low cost magnetic core memories made the magnetic drum almost obsolete except as auxiliary storage. Since 1959 thousands of computers have been delivered and installed, including many hundreds of very large computers. Electronic Computing and Data-processing have become part of the everyday industrial and commercial environment, and Electronic Computer manufacturing and programming has become a multi-billion dollar industry. Because of this rapid, almost explosive pattern of growth, modern systems, both in the hardware and software area, could not be adequately planned, and it was often not possible to make adequate evaluations of old systems in time to use such evaluations in the design of new ones.

2. Developments up to 1957.

The first programming systems were systems of subroutines. Subroutines were in use on the pre-electronic Mark I, a large electromechanical computer built as a joint effort by IBM and Harvard University in the early forties.1 The EDSAC at the University of Cambridge was probably the first stored program Electronic Computer in operation (1949). The classic text on programming the EDSAC, by Wilkes, Wheeler and Gill,2 makes the subroutine library the basis of programming, and stresses the automatic relocation feature of the EDSAC, which makes such libraries easy to use.

The IBM Card-Programmed Calculator, developed between 1947 and 1949, was not very fast by modern standards, but was an extremely versatile device. Operation codes were determined by the wiring of control boards. Ingenuity could be expended on designing boards to optimize performance on particular problems that taxed the capacity of the computer, or it could be expended on the design of general purpose boards for use in large classes of problems. A set of general purpose boards represented a language for the CPC, and a number of languages were tried and used.3 Most Scientific installations finally settled on a wiring that made the CPC appear to be a floating point machine with three-address logic, and with a standard vocabulary of built-in routines like Square Root, Sine, Exponential, etc. Experience with the CPC systems had many influences on the programming systems that were designed for the stored-program computers that followed.

One of the most important of the early automatic programming groups was associated with the Whirlwind project at MIT. Whirlwind, which was built between 1947 and 1951, was a fast but very basic computer. With only a 16 bit word it had a very limited instruction code, and very limited high speed storage. Even relatively simple calculations required the use of multi-precision techniques. The very difficulty of using the machine in its own language provided great incentive toward the development of programming languages.
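For the modern reader the burden of such multi-precision work can be suggested by a small sketch. The following is present-day Python, not anything resembling Whirlwind code; only the 16 bit word size is taken from Whirlwind, and everything else is assumed for illustration. It shows the carry bookkeeping a programmer must do by hand to add two 32 bit integers on a machine that can only hold and add 16 bit words.

    # A minimal sketch, not Whirlwind code: double-precision addition on
    # a hypothetical machine with 16 bit words.  Each 32 bit number is
    # kept as a (high word, low word) pair, and the carry out of the
    # low-order addition is propagated by hand.
    MASK16 = 0xFFFF  # one 16 bit machine word

    def add32(a_hi, a_lo, b_hi, b_lo):
        lo = a_lo + b_lo
        carry = lo >> 16                     # carry out of the low word
        hi = (a_hi + b_hi + carry) & MASK16  # high words plus the carry
        return hi, lo & MASK16

    # 0x0001FFFF + 0x00000001 = 0x00020000
    print(add32(0x0001, 0xFFFF, 0x0000, 0x0001))  # prints (2, 0)

Every arithmetic step in a program had to be expanded this way by hand, which is exactly the clerical work the early programming systems set out to eliminate.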
The Summer Session Computer at MIT was one of the early interpretive systems, designed to make the Whirlwind computer available to students at a summer course in computing at MIT. These early developments led to the design of a quite elaborate "Comprehensive System" for Whirlwind.4 At the associated Lincoln Laboratories the groundwork was laid for the very large programming systems that were developed in connection with Sage and other military projects.5

The first large scale electronic computer available commercially was the Univac I (1951). The first Automatic Programming group associated with a commercial computer effort was the group set up by Dr. Grace Hopper in what was then the Eckert-Mauchly Computer Corp., and which later became the Univac Division of Sperry Rand. The Univac had been designed so as to be relatively easy to program in its own code. It was a decimal, alphanumeric machine, with mnemonic instructions that were easy to remember and use. The 12 character word made scaling of many fixed-point calculations fairly easy. It was not always easy to see the advantage of assembly systems and compilers that were often slow and clumsy on a machine with only 12,000 characters of high speed storage (200 microseconds average access time per 12 character word). In spite of occasional setbacks, Dr. Hopper persevered in her belief that programming should and would be done in problem-oriented languages. Her group embarked on the development of a whole series of languages, of which the most used was probably A2, a compiler that provided a three address floating point system by compiling calls on floating point subroutines stored in main memory.6,7 The Algebraic Translator AT3 (Math-Matic8) contributed a number of ideas to Algol and other compiler efforts, but its own usefulness was very much limited by the fact that Univac had become obsolete as a scientific computer before AT3 was finished. The B-0 (Flow-Matic8,9) compiler was one of the major influences on the COBOL language development which will be discussed at greater length later. The first sort generators10 were produced by the Univac programming group in 1951. They also produced what was probably the first large scale symbol manipulation program, a program that performed differentiation of formulas submitted to it in symbolic form.11

Another and quite independent group at Univac concerned itself with an area that would now be called computer-oriented compilers. Anatol Holt and William Turanski developed a compiler and a concept that they called GP (Generalized Programming12). Their system assumed the existence of a very general subroutine library. All programs would be written as if they were to be library programs, and the library and system would grow together. A program was assembled at compile time by the selection and modification of particular library programs and parts of library programs. The program as written by the programmer would provide parameters and specifications according to which the very general library programs would be made specific to a particular problem. Subroutines in the library were organized in hierarchies, in which subroutines at one level
They were one of the early groups to give serious attention to some difficult problems relative to the structure of programs, in particular the problem of segmenting of programs, and the related problem of storage allocation. Perhaps the most important contribution of this group was the emphasis that they placed on the programming system rather than on the programming language. In their terms, the machine that the programmer uses is not the hardware machine, but rather an extended machine consisting of the hardware enhanced by a programming system that performs all kinds of service and library maintenance functions in addition to the translation and running of programs. The IBM 701 (1952) was the first commercially marketed large scale binary computer. The best known programming language on the 701 was Speedcode,13,14 a language that made the one address, binary, fixed point 701 appear to be a three address decimal floating point machine with index registers. More than almost any other language, Speedcode demonstrated ·the extent to which users were willing to sacrifice computer speed fOF the sake of programming convenience. The PACT15,16 system on the 701 set a precedent as the first programming system designed by a committee of computer users. It also set a precedent for a situation which unfortunately has been quite common in the computer field. The computer for which the programming system was developed was already obsolete before the programming system itself was completed. P ACT ideas had a great deal of influence on the later developments that took place under the auspices of the SHARE organization. Delivery of the smaller, medium scale magnetic-drum computers started in 1953, and by 1955-56 they were a very important factor in the computer field. The IBM 650 was by far the most popular of the early drum computers. The 3 650 was quite easy to program in its own language, and was programmed that way in many applications, especially in the data-processing area. As a scientific computer it lacked floating point hardware, a feature that was later made available. A number of interpretive floating point systems were developed, of which the most popular was the one designed at the Bell Telephone Laboratories. 17 This was a three address floating point system with automatic looping and with built in Mathematical subroutines. It was a logical continuation of the line of systems that had started with the general purpose CPC boards, and had been continued in 701 Speedcode. It proved that on the right kind of computer an interpretive system can provide an efficient effective tool. Interpretive systems fell into disrepute for a number of years: They are making a very strong comeback at the present time in connection with a number of so-called micro-programmed computers that have recently appeared on the market. The 650 permitted optimization of programs by means of proper placement of instructions on the drum. Optimization was a very tedious job f'or the programmer, but could produce a very considerable improvement in program running time. A program called SOAP, a Symbolic Optimize:c. and Assembly Programs, combined the features of symbolic assembly and automatic optim.ization. There is some" doubt a~ to whether a symbolic assembly system would have received very general acceptance on the 650 at the time SOAP was introduced. The optimization feature was obviously valuable. 
Symbolic assembly by itself on a decimal machine without magnetic tape did not present advantages that were nearly as obvious.

The major competitor to the 650 among the early magnetic drum computers was the Datatron 205, which eventually became a Burroughs product. It featured a 4000 word magnetic drum storage plus a small 80 word fast access memory in high speed loops on the drum. Efficient programs had to make very frequent use of block transfer instructions, moving both data and programs to high speed storage. A number of interpretive and assembly systems were built to provide programmed floating point instructions and some measure of automatic use of the high speed loops. The eventual introduction of floating point hardware removed one of the principal advantages of most of these systems. The Datatron was the first commercial system to provide an index register and automatic relocation of subroutines, features provided by programming systems on other computers. For these reasons among others the use of machine code programming persisted through most of the productive lifetime of the Datatron 205.

One of the first Datatron computers was installed at Purdue University. One of the first Algebraic compilers was designed for the Datatron by a group at Purdue headed by Dr. Al Perlis. This is another example of a compiler effort based on a firm belief that programming should be done in problem-oriented languages, even if the computer immediately available may not lend itself too well to the use of such languages. A big problem in the early Datatron systems was the complete lack of alphanumeric input. The computer would recognize pairs of numbers as representing characters for printing on the flexowriter, but there was no way to produce the same pair of numbers by a single key stroke on any input preparation device. Until the much later development of new input-output devices, the input to the Purdue compiler was prepared by manually transcribing the source language into pairs of numbers. When Dr. Perlis moved to Carnegie Tech the same compiler was written for the 650, and was named IT.18 IT made use of the alphanumeric card input of the 650, and translated from a simplified algebraic language into SOAP language as output. IT and languages derived from it became quite popular on the 650, and on other computers, and have had great influence on the later development of programming languages. A language Fortransit provided translation from a subset of Fortran into IT, whence a program would be translated into SOAP, and after two or more passes through SOAP it would finally emerge as a machine language program. The language would probably have been more popular if its translation were not such an involved and time-consuming process. Eventually other, more direct translators were built that avoided many of the intermediate passes.

The 701 used a rather unreliable electrostatic tube storage system. When Magnetic core storage became available there was some talk about a 701M computer that would be an advanced 701 with core storage. The idea of a 701M was soon dropped in favor of a completely new computer, the 704. The 704 was going to incorporate into hardware many of the features for which programming systems had been developed in the past. Automatic floating point hardware and index registers would make interpretive systems like Speedcode unnecessary.
Along with the development of the 704 hardware IBM set up a project headed by John Backus to develop a suitable compiler for the new computer. After the expenditure of about 25 man years of effort they produced the first Fortran compiler.19,20 Fortran is in many ways the most important and most impressive development in the early history of automatic programming.

Like most of the early hardware and software systems, Fortran was late in delivery, and didn't really work when it was delivered. At first people thought it would never be done. Then when it was in field test, with many bugs, and with some of the most important parts unfinished, many thought it would never work. It gradually got to the point where a program in Fortran had a reasonable expectancy of compiling all the way through and maybe even of running. This gradual change of status from an experiment to a working system was true of most compilers. It is stressed here in the case of Fortran only because Fortran is now almost taken for granted, as if it were built into the computer hardware.

In the early days of automatic programming, the most important criterion on which a compiler was judged was the efficiency of the object code. "You only compile once, you run the object program many times," was a statement often quoted to justify a compiler design philosophy that permitted the compiler to take as long as necessary, within reason, to produce good object code. The Fortran compiler on the 704 applied a number of difficult and ingenious techniques in an attempt to produce object coding that would be as good as that produced by a good programmer programming in machine code. For many types of programs the coding produced is very good. Of course there are some for which it is not so good. In order to make effective use of index registers a very complicated index register assignment algorithm was used that involved a complete analysis of the flow of the program and a simulation of the running of the program using information obtained from frequency statements and from the flow analysis. This was very time consuming, especially on the relatively small initial 704 configuration. Part of the index register optimization fell into disuse quite early but much of it was carried along into Fortran II and is still in use on the 704/9/90. In many programs it still contributes to the production of better code than can be achieved on the new Fortran IV compiler.

Experience led to a gradual change of philosophy with respect to compilers. During debugging, compiling is done over and over again. One of the major reasons for using a problem oriented language is to make it easy to modify programs frequently on the basis of experience gained in running the programs. In many cases the total compile time used by a project is much greater than the total time spent running object codes. More recent compilers on many computers have emphasized compiling time rather than run time efficiency. Some may have gone too far in that direction.

It was the development of Fortran II that made it possible to use Fortran for large problems without using excessive compiling time. Fortran II permitted a program to be broken down into subprograms which could be tested and debugged separately. With Fortran II in full operation, the use of Fortran spread very rapidly. Many 704 installations started to use nothing but Fortran.
A revolution was taking place in the scientific computing field, but many of the spokesmen for the computer field were unaware of it. A number of major projects that were at crucial points in their development in 1957-1959 might have proceeded quite differently if there had been more general awareness of the extent to which the use of Fortran had been accepted in many major 704 installations. Among these are the Algol project and the SOS project which are discussed below.

3. Algol and its Dialects.

Until quite recently, large scale computers have been mainly an American phenomenon. Smaller computers were almost worldwide right from the beginning. An active computer organization, GAMM, had been set up in Europe, and in 1957 a number of members of this organization were actively interested in the design of Algebraic compilers for a number of machines. They decided to try to reach agreement on a common language for various machines, and made considerable progress toward the design of such a language. There are many obvious advantages to having generally accepted computer independent problem oriented languages. It was clear that a really international effort in this direction could only be achieved with United States participation. The President of GAMM wrote a letter to John Carr, who was then President of the ACM, suggesting that representatives of ACM and of GAMM meet together for the purpose of specifying an international language for the description of computing procedures.

The ACM up to that time had served as a forum for the presentation of ideas in all aspects of the computer field. It had never engaged in actual design of languages or systems. In response to the letter from GAMM, Dr. Carr appointed Dr. Perlis as chairman of a committee on programming languages. The committee set out to specify an Algebraic compiler language that would represent the American proposal at a meeting with representatives of GAMM at which an attempt would be made to reach agreement on an internationally accepted language. The ACM committee consisted of representatives of the major computer manufacturers, and representatives of several Universities and research agencies that had done work in the compiler field. Probably the most active member of the committee was John Backus of IBM. He was probably the only member of the committee whose position permitted him to spend full time on the language design project, and a good part of the "American Proposal" was based on his work.

The ACM committee had a number of meetings without any very great sense of urgency. Subcommittees worked on various parts of the language and reported back to the full committee, and in general there was little argument or disagreement. There is after all very general agreement about the really basic elements of an Algebraic language. Much of the language is determined by the desirability of remaining as close as possible to Mathematical notation. This is tempered by experience in the use of computers and in the design of compilers which indicates some compromises between the demands of desirable notation and those of practical implementation.

At one meeting of the committee Dr. Bauer, one of the leaders of the GAMM effort, presented a report on the proposed European language. Among other things they proposed that English language key words, like begin, end, for, do, be used as a world-wide standard.
Of course this is something the American committee would never have proposed, but it seemed quite reasonable to go along with the Europeans in this matter. Although some of the notations seemed strange, there were very few basic disagreements between what GAMM was proposing, and what the ACM committee was developing. Dr. Bauer remarked that the GAMM organization felt somewhat like the Russians, who were meeting with constant rebuffs in an effort to set up a summit meeting. With such wide areas of agreement why couldn't the ACM-GAMM meeting take place?

Although there is quite general agreement about the basic elements of an Algebraic language, there is quite considerable disagreement about how far such a language should go, and about how some of the more advanced and more difficult concepts should be specified in the language. Manipulation of strings of symbols, direct handling of vectors, matrices, and multiple precision quantities, ways to specify segmentation of problems, and the allocation and sharing of storage; these were some of the topics which could lead to long and lively discussion. The ACM language committee decided that it was unreasonable to expect to reach an agreement on an international language embodying features of this kind at that time. It was decided to set up two subcommittees. One would deal with the specification of a language which included those features on which it was reasonable to expect a wide range of agreement. The other was to work toward the future, toward the specification of a language that would really represent the most advanced thinking in the computer field.

The short-range committee was to set up a meeting in Europe with representatives of GAMM. Volunteers for work on this committee would have to arrange for the trip to Europe and back, and were therefore limited to those who worked for an organization that would be willing to sponsor such a trip. The ACM was asked to underwrite the trip for Dr. Perlis. The meeting of the ACM and GAMM subcommittees was held in Zurich in the spring of 1958, and the result was a Preliminary Report on an International Algebraic Language, which has since become popularly known as Algol 58.21

With the use of Fortran already well established in 1958, one may wonder why the American committee did not recommend that the international language be an extension of, or at least in some sense compatible with, Fortran. There were a number of reasons. The most obvious has to do with the nature and the limitations of the Fortran language itself. A few features of the Fortran language are clumsy because of the very limited experience with compiler languages that existed when Fortran was designed. Most of Fortran's most serious limitations occur because Fortran was not designed to provide a completely computer independent language; it was designed as a compiler language for the 704. The handling of a number of statement types, in particular the Do and If statements, reflects the hardware constraints of the 704, and the design philosophy which kept these statements simple and therefore restricted in order to simplify optimization of object coding.

Another and perhaps more important reason for the fact that the ACM committee almost ignored the existence of Fortran has to do with the predominant position of IBM in the large scale computer field in 1957-1958 when the Algol development started. Much more so than now, there were no serious competitors.
In the data processing field the Univac II was much too late to give any serious competition to the IBM 705. RCA's Bizmac never really had a chance, and Honeywell's Datamatic 1000, with its 3 inch wide tapes, had only very few specialized customers. In the Scientific field there were those who felt that the Univac 1103/1103a/1105 series was as good or better than the IBM 701/704/709. Univac's record of late delivery and poor service and support seemed calculated to discourage sales to the extent that the 704 had the field almost completely to itself. The first Algebraic compiler produced by the manufacturer for the Univac Scientific computer, the 1103a, was Unicode, a compiler with many interesting features, that was not completed until after 1960, for computers that were already obsolete. There were no other large scale scientific computers. There was a feeling on the part of a number of persons highly placed in the ACM that Fortran represented part of the IBM empire, and that any enhancement of the status of Fortran by accepting it as the basis of an international standard would also enhance IBM's monopoly in the large scale scientific computer field.

The year 1958, in which the first Algol report was published, also marked the emergence of large scale high speed transistorized computers, competitive in price and superior in performance to the vacuum tube computers in general use. At the time I was in charge of Programming systems for the new model 2000 computers that Philco was preparing to market. An Algebraic compiler was an absolute necessity, and there was never really any serious doubt that the language had to be Fortran.22,22A The very first sales contracts for the 2000 specified that the computer had to be equipped with a compiler that would accept 704 Fortran source decks essentially without change. Other manufacturers, Honeywell, Control Data, Bendix, faced with the same problems, came to the same conclusion. Without any formal recognition, in spite of the attitude of the professional committees, Fortran became the standard scientific computing language. Incidentally, the emergence of Fortran as a standard helped rather than hindered the development of a competitive situation in the scientific computer field.

To go on with the Algol development, the years 1958-1959 were years in which many new computers were introduced. The time was ripe for experimentation in new languages. As mentioned earlier there are many elements in common in all Algebraic languages, and everyone who introduced a new language in those years called it Algol, or a dialect of Algol. The initial result of this first attempt at the standardization of Algebraic languages was the proliferation of such languages in great variety.

A very bright young programmer at Burroughs had some ideas about writing a very fast one pass compiler for Burroughs' new 220 computer. The compiler has come to be known as Balgol. A compiler called ALGO was written for the Bendix G15 computer. At Systems Development Corporation, programming systems had to be developed for a large command and control system based on the IBM military computer (ANFSQ32). The resulting Algebraic language with fairly elaborate data description facilities was JOVIAL23 (Jules Schwartz' own Version of the International Algebraic Language). By now compilers for JOVIAL have been written for the IBM 7090, the Control Data 1604, the Philco 2000, the Burroughs D825, and for several versions of IBM military computers.
The Naval Electronics Laboratory at San Diego was getting a new Sperry Rand Computer, the Countess. With a variety of other computers installed and expected, they stressed the description of a compiler in its own language to make it easy, among other things, to produce a compiler on one computer using a compiler on another. They also stressed very fast compiling times, at the expense of object code running times, if necessary. The language was called Neliac,24,25 a dialect of Algol. Compilers for Neliac are available on at least as great a variety of computers as for JOVIAL.

The University of Michigan developed a compiler for a language called Mad, the Michigan Algorithmic Decoder.26,27 They were quite unhappy at the slow compiling times of Fortran, especially in connection with short problems typical of student use of a computer at a University. Mad was originally programmed for the 704 and has been adapted for the 7090. It too was based on the 1958 version of Algol.

All of these languages derived from Algol 58 are well established, in spite of the fact that the ACM-GAMM committee continued its work and issued its now well known report defining Algol 60.28 Algol 60, known simply as Algol, went considerably further than was anticipated in some of the early committee meetings. The language did not limit itself to those areas in which there exists almost universal agreement. Concepts like recursive subroutines, dynamic storage allocation, block structure, own variables and arrays, were introduced which require the inclusion of rather complex structures in the running programs produced by the compiler. Without attempting any serious evaluation of these concepts here, I think it is fair to say that they are difficult, and their inclusion in an Algebraic language that is intended to be universal is controversial. They led to much debate about the difficult areas and tended to obscure some of the more fundamental accomplishments of the Algol committee.

Algol set an important precedent in language definition by presenting a rigorous definition of its syntax in the Backus normal form.29 As compared to Fortran it contains a much more general treatment of iterative loops. It provides a good systematic handling of Boolean expressions and variables and of conditional statements. The most serious deficiency in Algol results from its complete lack of input-output specifications. The handling of input-output is one of the most important services provided by a compiler, and a general purpose Algebraic compiler language is not completely specified until its input-output language has been defined. Algol compilers have been written for many different computers, but with the exception of Burroughs no computer manufacturer has pushed it very strongly. It is very popular among University and mathematically oriented computer people, especially in Europe. For some time in the United States it will probably remain in its status as another available computer language.

4. Data Processing Compilers.

The largest user by far of data-processing equipment is the United States government. The government, by law and by design, avoids giving preferential treatment to any one computer manufacturer. More than any other computer user, the government is plagued by the problems caused by the lack of compatibility between different kinds of computing equipment, whether manufactured by the same or by different suppliers.
In the spring of 1959, the office of the Secretary of Defense summoned representatives of the major manufacturers and users of data-processing equipment to a meeting in Washington, to discuss the problems associated with the lack of standard programming languages in the data processing area. This was the start of the Committee on Data Systems Languages (CODASYL), which went on to produce COBOL, the common business oriented language. From the beginning their effort was marked by missionary zeal for the cause of English language coding.

Actually, there had been very little previous experience with Data processing compilers. Univac's B-0 or Flow-Matic,8,9 which was running in 1956, was probably the first true Data-Processing compiler. It introduced the idea of file descriptions, consisting of detailed record and item descriptions, separate from the description of program procedures. It also introduced the idea of using the English language as a programming language. It is remarkable to note that the Univac I on which Flow-Matic was implemented did not have the data-processing capabilities of a good sized 1401 installation. To add to the problems caused by the inadequacy of the computer, the implementation of the compiler was poor, and compilation was very, very slow. There were installations that tried it and dropped it. Others used it, with the philosophy that even with compiling times measured in hours the total output of the installation was greater using the compiler than without it. Experience with Flow-Matic was almost the only experience available on Data Processing compilers prior to the launching of the COBOL project. A group at IBM had been working for some time on the Commercial Translator,30 and some early experience on that system was also available.

At the original Defense Department meeting there were two points of view. One group felt that the need was so urgent that it was necessary to work within the state of the art as it then existed and to specify a common language on that basis as soon as possible. The other group felt that a better understanding of the problems of Data-Processing programming was needed before a standard language could be proposed. They suggested that a longer range approach looking toward the specification of a language in the course of two or three years might produce better results. As a result two committees were set up, a short range committee, and an intermediate range committee. The original charter of the short range committee was to examine existing techniques and languages, and to report back to CODASYL with recommendations as to how these could be used to produce an acceptable language. The committee set to work with a great sense of urgency. A number of companies represented had commitments to produce Data-processing compilers, and representatives of some of these became part of the driving force behind the committee effort. The short range committee decided that the only way it could satisfy its obligations was to start immediately on the design of a new language. The committee became known as the COBOL committee, and their language was COBOL. Preliminary specifications for the new language were released by the end of 1959, and several companies, Sylvania, RCA, and Univac, started almost immediately on implementation on the MOBIDIC, 501, and Univac II respectively. There then occurred the famous battle of the committees.
The intermediate range committee had been meeting occasionally, and on one of these occasions they evaluated the early COBOL specifications and found them wanting. The preliminary specifications for Honeywell's FACT30 compiler had become available, and the intermediate range committee indicated their feeling that Fact would be a better basis for a Common Business Oriented Language than Cobol. The COBOL committee had no intention of letting their work up to that time go to waste. With some interesting rhetoric about the course of history having made it impossible to consider any other action, and with the support of the Codasyl executive board, they affirmed Cobol as the Cobol. Of course it needed improvements, but the basic structure would remain. The charter of the Cobol committee was revised to eliminate any reference to short term goals, and its effort has continued at an almost unbelievable rate from that time to the present. Computer manufacturers assigned programming Systems people to the committee, essentially on a full time basis. Cobol 60, the first official description of the language, was followed by 61,31 and more recently by 61 extended.32

Some manufacturers dragged their feet with respect to Cobol implementation. Cobol was an incomplete and developing language, and some manufacturers, especially Honeywell and IBM, were implementing quite sophisticated data processing compilers of their own which would become obsolete if Cobol were really to achieve its goal. In 1960 the United States government put the full weight of its prestige and purchasing power behind Cobol, and all resistance disappeared. This was accomplished by a simple announcement that the United States government would not purchase or lease computing equipment from any manufacturer unless a Cobol language compiler was available, or unless the manufacturer could prove that the performance of his equipment would not be enhanced by the availability of such a compiler. No such proof was ever attempted for large scale electronic computers.

To evaluate Cobol in this short talk is out of the question. A number of quite good Cobol compilers have been written. The one on the 7090, with which I have had some experience, may be typical. It implements only a fraction, less than half I would guess, of the language described in the manual for Cobol 61 extended. No announcement has been made as to whether or when the rest, some of which has only been published very recently, will be implemented. What is there is well done, and does many useful things, but the remaining features are important, as are some that have not yet been put into the manual and which may appear in Cobol 63. The language is rather clumsy to use; for example, long words like synchronized and computational must be written out all too frequently; but many programmers are willing to put up with this clumsiness because, within its range of applicability, the compiler performs many important functions that would otherwise have to be spelled out in great detail. It is hard to believe that this is the last, or even very close to the last, word in data processing languages.

Before leaving Data Processing compilers I wish to say a few words about the development of the FACT compiler.
In 1958 Honeywell, after almost leaving the computer business because of the failure of their Datamatic 1000 computer, decided to make an all out effort to capture part of the medium priced computer market with their Honeywell 800 computer. The computer itself is very interesting, but that is part of another talk. They started a trend, now well established, of contracting out their programming systems development, contracting with Computer Usage Co. for a Fortran language compiler. Most interesting from our point of view was their effort in the Data Processing field. On the basis of a contract with Honeywell, the Computer Sciences Corporation was organized. Their contract called for the design and production of a Data processing compiler they called FACT.30,33 Fact combined the ideas of data processing generators as developed by Univac, GE Hanford,34 Surge,35 and 9PAC with the concepts of English language data processing compilers that had been developed in connection with Univac's Flow-Matic and IBM's Commercial Translator. The result was a very powerful and very interesting compiler. When completed it contained over 250,000 three address instructions. Designed to work on configurations as small as 4096 words of storage and 4 tape units, it was not as fast as some more recent compilers on larger machines. The Fact design went far beyond the original COBOL specifications,30 and has had considerable influence on the later COBOL development. Like all other manufacturers Honeywell has decided to go along with the COBOL language, and Fact will probably fall into disuse.

5. Assemblers and Operating Systems.

Symbolic assembly language has become an almost universal form for addressing a computer in a computer oriented language. After the first 704's were delivered in 1956 a number of users produced assembly routines for use on the computer. One of the early standardization efforts involved a choice of a standard assembly program to be used by Share, the 704 users group. It is a sign of some of the thinking that was current then that the standard chosen was UASAP.36 The first SAP was a very basic assembly system. It did practically nothing but one-to-one translation, and left the programmer in complete control of all of the parameters of the program. In the early days many users felt that this was all an assembly system should do. Some still feel that way, but on most computers the simple assembly system has been replaced by the full scale computer oriented compiler in which one-to-one code translation is augmented by extensive libraries of subroutines and generators and by allocation, segmentation, and other program-organization features.37

The word macro-instruction apparently was coined in connection with the symbolic assembly systems that were developed for IBM's 702/705 computers. These Autocoder38 systems with their large macro-instruction libraries have been used for huge amounts of data processing programming on a number of machines.

Assembly systems gradually grew into or became incorporated into operating systems.39,40 Perhaps the earliest monitor system on the 704 was put into operation at the General Motors Research center.41,42 The idea of automatic sequencing of batches of jobs spread rapidly until it was almost universally used in connection with large computers. It made it possible for large computers to handle small jobs with reasonable efficiency and greatly extended their range of application.
The idea of such systems was to run the computer without any stops, and to relegate the operator to occasional mounting of tapes, and otherwise to responding to very simple requests presented to him on the on-line printer. Under such a system debugging becomes a completely off-line process. The only response to trouble in a program is to dump and get on with the next program.

At the end of 1956 IBM announced its new 709 computer. The 709 was essentially a 704 with internally buffered input and output. As mentioned earlier, IBM was at its peak of penetration of the large scale scientific computer market at that time, and the rest of the industry watched with great interest as many of the best programming systems people representing many leading scientific computer installations met as a Share committee to design the very complex programming system which was eventually called SOS (Share Operating System). The design of programming systems by large committees representing many companies and institutions has almost invariably led to disappointing results. SOS was no exception. Planned mainly by personnel of West Coast aircraft and research companies, it was to be written according to their specifications by the IBM programming systems activity on the East Coast. Separation of design and implementation responsibility by 3000 miles is almost enough in itself to guarantee great difficulty, if not complete failure. In 1958 the chairman of the Share 709 system committee wrote,43 "The fundamental procedures used throughout the system will undoubtedly be retained in every installation." This has not been the case. The SOS system is now in use at only a very few installations. There are many reasons, of which I would like to suggest just a few.

SOS put all of its emphasis on the computer oriented programming system. The time during which SOS was being designed and implemented was the time during which the attitude toward Fortran was changing from polite skepticism to very general acceptance. By the time SOS was in nearly full operation some installations were using almost nothing but Fortran. Apparently little or no effort had been expended on the problem of compatibility between SOS and Fortran. It was only in 1962 that an SOS system which handles Fortran was distributed by the Rand Corporation.44 Their system accepts Fortran source programs, and produces the binary symbolic or squoze decks that can be combined with other programs produced by the SOS system. IBM boasted of over 50 man years of effort on the SOS system for the 709. They spent almost no effort on Fortran for the 709, on the theory that the Fortran that had been developed for the 704 would be adequate. The Fortran II system that was originally distributed for the 709 took no advantage of the fact that the 709 hardware permitted buffered input and output. The SOS system provided a very elaborate buffering system.

SOS proposed to provide a system in which the programmer would need to know and use only one language, the compiler source language. One of its major achievements was the provision of source language modification of programs at load time without full recompilation. A very versatile debugging system was built around this feature. While this and other features of the system are extremely attractive, there is a serious question as to whether they are worth the price paid in system complexity, and in increased loading time.

I think it is interesting to point out that a relatively simple assembly system, FAP, and a very basic operating system, the Fortran Monitor System, both of which originated at customer installations and not with the manufacturer, became the most widely used systems on the 709/90 computers. Quite similar systems were introduced on competitive equipment, the Philco 2000 and the CDC 1604. Complexity and system rigidity no doubt contributed to the fact that SOS was not generally accepted.

It will be interesting to follow the history of a new and very complicated system, the IBSYS/IBJOB complex that has recently been introduced by IBM on the 7090 and related machines. A critique of these systems is far beyond the scope of this discussion. A few comments may be in order. IBJOB presents a very elaborate assembly system, MAP, and translators from two languages, Fortran IV and Cobol, into Map. They are then translated into a language that is close to machine language, with the final steps of the translation left to a very complicated loader. The design, which calls for the translation of problem oriented source languages into an intermediate computer oriented source language, is very attractive. By having the assembly system do most of the work of the compiler it is possible to have many of the features of the problem oriented language available by means of subroutine calls to those who prefer to write in assembly language. This design philosophy is attractive, but I think that it is wrong. Attractiveness and elegance should not be the determining design criteria for production compiling systems. Design of a system is a compromise between many design criteria. One of the most important is keeping the system overhead cost low on the many problems that do not require the use of very sophisticated system features. The code produced by a compiler like Fortran is straightforward simple code. The assembler for such code can be simple and straightforward. The loading program can be designed to permit easy combination with programs produced by other systems. An assembler designed to aid in the production of large programming systems contains many features that are seldom used except in the coding of such systems. A wasteful mismatch may occur when the output of Fortran is fed through such an assembler.
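The two-stage scheme under discussion is easy to illustrate for the modern reader. The following toy sketch is not MAP or IBJOB; the source statement, the intermediate language, and the numeric opcodes are all invented for illustration, and serve only to show a compiler handing symbolic intermediate code to a separate assembler that finishes the translation.

    # A toy two-stage translator.  The intermediate language and the
    # opcodes are invented; they stand in for a MAP-like assembly
    # language, not for any real system.
    def compile_assignment(stmt):
        # First stage: translate a statement of the form "X = Y + Z"
        # into symbolic three-instruction intermediate code.
        target, expr = (s.strip() for s in stmt.split("="))
        left, right = (s.strip() for s in expr.split("+"))
        return [("LOAD", left), ("ADD", right), ("STORE", target)]

    def assemble(code, symtab):
        # Second stage: replace mnemonics and symbolic operands with
        # numeric opcodes and storage addresses.
        opcodes = {"LOAD": 1, "ADD": 2, "STORE": 3}
        return [(opcodes[op], symtab[operand]) for op, operand in code]

    intermediate = compile_assignment("A = B + C")
    print(intermediate)   # [('LOAD', 'B'), ('ADD', 'C'), ('STORE', 'A')]
    print(assemble(intermediate, {"A": 100, "B": 101, "C": 102}))
    # An assembly language programmer could write the same three
    # symbolic lines directly and still share the assembler and loader.

The appeal of the design is visible even at this scale: both the compiler user and the assembly language programmer funnel through one assembler. The cost, as argued above, is that every simple compiled program must pass through machinery elaborate enough for the hardest case.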
I think it is interesting to point out that a relatively simple assembly system, FAP, and a very basic operating system, the Fortran Monitor System, both originated at customer installations and not by the manufacturer, became the most widely used systems on the 709/90 computers. Quite similar systems were introduced on competitive equipment, the Philco 2000 and the CDC 1604. Complexity and system rigidity no doubt contributed to the fact that SOS was not generally accepted. It win be interesting to foliow the history of a new and very complicated system, the IBSYS/IBJOB complex that has recently been introduced by IBM on the 7090 and related machines. A critique of these systems is far beyond the scope of this discussion. A few comments may be in order. IBJOB presents a very elaborate assembly system MAP, and translators from two languages, FORTRAN IV and Cobol into Map. They are then translated into a language that is close to machine language, with the final steps of the translation left to a very complicated loader. T~le design, which calls for the translation of problem oriented source languages into an intermediate computer oriented source language is very attractive. By ha ving the assembly system do most of the work of the compiler it is possible to have many of the features of the problem oriented language available by means of subroutine calls to those who prefer to write in assembly language. This design philosophy is attractive, but I think that it is wrong. Attractiveness and elegance should not be the determining design criteria for production compiling systems. Design of a system 12 PROCEEDINGS-SPRING JOINT COMPUTER CONFERENCE, 1964 is a compromise between many design criteria. One of the most important is keeping the system overhead cost low on the many problems that do not require the use of very sophisticated system features. The code produced by a compiler like Fortran is straightforward simple code. The assembler for such code can be simple and straightforward. The loading program can be designed to permit easy combination with programs produced by other systems. An assembler designed to aid in the production of large programming systems contains many features that are seldom used except in the coding of such systems. A wasteful mismatch may occur when the output of Fortran is fed through such an assembler. Not so very many years ago there was quite a bit of discussion as to whether general purpose operating systems should be designed and supplied by the manufacturers. Some users felt that the very great difference in the job mix and operating philosophy at the various installations called for specially designed and tailored system programs. For a time the argument seem to be settled by the almost universal assumption that operating systems, and computer software in general were as much an obligation of the manufacturer as was the building of the computers themselves. I wonder if this assumption will be able to stand up in face of the rapid developments in the large computer field that will lead to computing systems that are very much more diverse, and very much more complex than those that are in general use today. In the large computer field multiprocessing and multiprogramming systems will soon become the rule rather than the exception. Many experiments in these directions are being tried with computers of the generation that is now coming to an end. Systems combining a 7094, a 1301 Disc unit and a 7040 will soon be com·monplace. 
There are a number of military systems involving several computers and much peripheral equipment all working together under a common operating system. Among newer computers already delivered to customers there are several models that have been designed to make it possible and practical to run peripheral equipment on-line while simultaneously carrying out independent computer processing tasks. The distinction between off-line and on-line tends to disappear on such systems, and the operating systems must be able to control equipment in many different modes. Systems already delivered that have some features that permit multiprogramming and multiprocessing include the Honeywell 800, the Ferranti Atlas, and the Burroughs 5000 and D825. There is some very interesting recent literature about programming systems for these computers.45,46,47

In the next generation of large computers it may be possible to implement true demand processing systems. Demand systems have been advocated by many in opposition to batching systems. In a demand system problems are submitted as they arise. The system controls the input of jobs and the scheduling of jobs by stacking jobs in queues according to length, priority, etc. A demand system requires multiprogramming facilities, but also requires much more elaborate decision making on the part of an executive system than is present in most monitors today.

The complexity required in some of these operating systems may seem to require that they be uniform systems designed and produced by the manufacturer. But another feature that is being stressed more and more is modularity, which permits an almost unlimited variety in system configurations. It is very difficult to design a single operating system that is appropriate for a computing system based on disc storage, and also for one based on tapes or drums, and also for any combination of auxiliary devices. The problem will get more complicated when high speed storage at different levels is available in various quantities. It is quite reasonable to anticipate a system in the next few years that will have a very high speed film memory, backed up by a fast core memory, backed up by a large and somewhat slower core memory, backed up by high speed drums, then discs and tapes. It will be a real challenge to design programming systems that are valid for all combinations in such systems.

In the early days one of the aims of the operating system was to get the human being out of the system as much as possible. In a multiprogramming system it is possible to allow human intervention in the running of a program without any appreciable loss of computer time, since the computer will presumably have other programs it can work on. There has been a great deal of publicity given to the experiments in the use of on-line consoles on present day systems.48 Such consoles may be a routine part of many computing systems in a few years. In recent issues of a number of computer publications there is an advertisement for a computer that claims to be faster than the 7090 and costs only $215,000. Whether or not the claim is true, it does serve to emphasize the fact that the cost of computer processing capability is going to go down rapidly. It is going to be economically feasible to put together extremely large, varied, and complicated concentrations of computer components.
Programming systems are going to increase in number and complexity, and the job of the system program designer is going to remain, as it always has been, very difficult, but very, very interesting.

6. Bibliography

This is not intended to be a complete bibliography of the field of Programming Systems and Languages. It is rather a selected list of publications that may help to document the text. A great deal of reference material is contained in manuals published by computer manufacturers. It would serve no useful purpose to list such manuals here. Manuals exist for just about every programming system mentioned in the text, and the mention of a language or system may be interpreted as an implied reference to the manual. I have attempted to include specific references to sources other than manuals for most systems discussed, but in many cases the system manuals provide better and more complete documentation. In the following bibliography the abbreviation "1954 Symposium" will be used to refer to the Proceedings of a Symposium on Automatic Programming for Digital Computers, Navy Mathematical Computing Advisory Panel, Office of Naval Research, Washington, D. C., May 13-14, 1954. These proceedings include a bibliography on pp. 150-152. Additional references will be found in the bibliography on pages 260-270 of reference 49.

1. AIKEN, H. H., and HOPPER, G. M. The Automatic Sequence Controlled Calculator. Electrical Engineering 65 (1946) pp. 384-391, 449-454, 522-528.
2. WILKES, M. V., WHEELER, D. J., and GILL, S. The Preparation of Programs for an Electronic Digital Computer. Addison-Wesley, 1957. (First edition published in 1951.)
3. MACMILLAN, D. B., and STARK, R. H. "Floating Decimal" Calculation on the IBM Card Programmed Electronic Calculator. Mathematical Tables and Other Aids to Computation (Mathematics of Computation) 5 (1951) pp. 86-92.
4. ADAMS, CHARLES W., and LANING, J. H., JR. The M.I.T. System of Automatic Coding: Comprehensive, Summer Session, and Algebraic. 1954 Symposium, pp. 40-68.
5. BENNINGTON, H. D. Production of Large Computer Programs. Proceedings of a Symposium on Advanced Programming Methods for Digital Computers. Navy Mathematical Computing Advisory Panel and Office of Naval Research, Washington, D. C., June 28-29, 1956.
6. HOPPER, G. M. The Education of a Computer. Proceedings of the Conference of the ACM, Pittsburgh, 1952.
7. RIDGWAY, R. K. Compiling Routines. Proceedings of the Conference of the ACM, Toronto, 1952.
8. TAYLOR, A. E. The Flow-Matic and Math-Matic Automatic Programming Systems. Annual Review of Automatic Programming 1 (1960) pp. 196-206. Pergamon.
9. KINZLER, H., and MOSKOWITZ, P. M. The Procedure Translator - A System of Automatic Programming. Automatic Coding. Monograph No. 3, Journal of the Franklin Institute, Philadelphia, 1957, pp. 39-55.
10. HOLBERTON, F. E. Application of Automatic Coding to Logical Processes. 1954 Symposium, pp. 34-39.
11. KAHRIMANIAN, H. G. Analytic Differentiation by a Digital Computer. 1954 Symposium, pp. 6-14.
12. HOLT, ANATOL W. General Purpose Programming Systems. Communications of the ACM 1 (1958) pp. 7-9.
13. BACKUS, J. W., and HERRICK, H. IBM 701 Speedcoding and Other Automatic Programming Systems. 1954 Symposium, pp. 106-113.
14. BACKUS, J. W. The IBM 701 Speedcoding System. Journal of the ACM 1 (1954) pp. 4-6.
15. BAKER, CHARLES L. The Pact I Coding System for the IBM Type 701. Journal of the ACM 3 (1956) pp. 272-278.
This is one of seven papers on the Pact I system in the same issue of JACM.
16. STEEL, T. B., JR. Pact IA. Journal of the ACM 4 (1957) pp. 8-11.
17. WOLONTIS, V. M. A Complete Floating Decimal Interpretive System. Technical Newsletter No. 11, IBM Applied Science Division, 1956.
18. PERLIS, A. J., and SMITH, J. W. A Mathematical Language Compiler. Automatic Coding. Monograph No. 3, Journal of the Franklin Institute, Philadelphia, 1957, pp. 87-102.
19. BACKUS, J. W., et al. The Fortran Automatic Coding System. Proceedings of the Western Joint Computer Conference, Los Angeles, 1957, pp. 188-198.
20. SHERIDAN, P. B. The Arithmetic Translator-Compiler of the IBM Fortran Automatic Coding System. Communications of the ACM 2 (February 1959) pp. 9-21.
21. PERLIS, A. J., and SAMELSON, K. Preliminary Report - International Algebraic Language. Communications of the ACM 1 (December 1958) pp. 8-22.
22. ROSEN, S., and GOLDBERG, I. B. Altac, the Transac Algebraic Translator. Preprints of papers presented at the 14th National Conference of the ACM, Boston, 1959.
22a. ROSEN, S. Altac, Fortran, and Compatibility. Preprints of papers presented at the 16th National Conference of the ACM, Los Angeles, 1961.
23. SHAW, C. J. Jovial - A Programming Language for Real-time Command Systems. Annual Review of Automatic Programming 3 (1963) pp. 53-119. Pergamon.
24. HUSKEY, H. D., HALSTEAD, M. H., and McARTHUR, R. Neliac, a Dialect of Algol. Communications of the ACM 3 (1960) pp. 463-468.
25. HALSTEAD, M. H. Machine Independent Computer Programming. Spartan Books, 1962.
26. ARDEN, B. W., GALLER, B. A., and GRAHAM, R. M. The Internal Organization of the Mad Translator. Communications of the ACM 4 (1961). This issue of the CACM contains the Proceedings of an ACM Compiler Symposium, Washington, D. C., November 17-18, 1960.
27. GALLER, B. A. The Language of Computers. McGraw-Hill, 1962.
28. NAUR, P., Editor. Revised Report on the Algorithmic Language Algol 60. Communications of the ACM 6 (1963) pp. 1-17. Also published in Numerische Mathematik and the Computer Journal.
29. BACKUS, J. W. The Syntax and Semantics of the Proposed International Algebraic Language of the Zurich ACM-GAMM Conference. Information Processing: Proceedings of ICIP, UNESCO, Paris, 1959, pp. 125-132.
30. CLIPPINGER, R. F. FACT - A Business Compiler: Description and Comparison with COBOL and Commercial Translator. Annual Review in Automatic Programming 2 (1961) pp. 231-292. Pergamon.
31. SAMMET, JEAN E. Basic Elements of COBOL 61. Communications of the ACM 5 (1962) pp. 237-253.
32. COBOL-1961 Extended: External Specifications for a Common Business Oriented Language. Dept. of Defense, 1962. U.S. Govt. Printing Office, Washington, D. C.
33. CLIPPINGER, R. F. FACT. The Computer Journal 5 (1962) pp. 112-119.
34. McGEE, W. C. Generalization: Key to Successful Electronic Data Processing. Journal of the ACM 6 (1959) pp. 1-23.
35. LONGO, F. SURGE: A Recording of the COBOL Merchandise Control Algorithm. Communications of the ACM 5 (1962) pp. 98-100.
36. MELCHER, W. P. Share Assembler UASAP 3-7. Share Distributor 564, 1958. The original UASAP 1 was written by Mr. Roy Nutt.
37. CHEATHAM, T. E., JR., and LEONARD, G. F. An Introduction to the CL-II Programming System. Document CAD-63-4-SD, Computer Associates, Inc., Wakefield, Mass., 1963.
38. GOLDFINGER, R. The IBM Type 705 Autocoder. Proceedings of the Western Joint Computer Conference, San Francisco, 1956.
39. ORCHARD-HAYS, W. The Evolution of Programming Systems. Proceedings of the IRE 49 (1961) pp. 283-295.
40. MEALY, G. H. Operating Systems. Rand Corporation Document P-2584, Santa Monica, California, 1962.
41. RYCKMAN, G. F. The Computer Operation Language. Proceedings of the Western Joint Computer Conference, 1960, pp. 341-343.
42. FISHMAN, J. J., HARROFF, D. F., LIVERMORE, F. G., and KUHN, E. F System (FS-3). Internal publication of the General Motors Research Laboratories (1960).
43. SHELL, D. L. The Share 709 System: A Cooperative Effort. Journal of the ACM 6 (1959) pp. 123-127. This is one of the six papers on the Share 709 System in the same issue of JACM.
44. BRYAN, G. E., Editor. The Rand-Share Operating System Manual for the IBM 7090 Computer. Memorandum RM-3327-PR, Rand Corporation, Sept. 1962.
45. KILBURN, T., HOWARTH, D. J., PAYNE, R. B., and SUMNER, F. H. The Manchester University Atlas Operating System, Parts I and II. The Computer Journal 4 (1961) pp. 222-229.
46. KILBURN, T., PAYNE, R. B., and HOWARTH, D. J. The Atlas Supervisor. Proceedings of the Eastern Joint Computer Conference, Washington, D. C., 1961, pp. 279-294.
47. THOMPSON, R. N., and WILKINSON, J. A. The D825 Automatic Operating and Scheduling Program. Proceedings of the Spring Joint Computer Conference, Detroit, Mich., 1963, pp. 41-49.
48. CORBATO, F. J., MERWIN-DAGGETT, M., and DALEY, R. C. An Experimental Time-Sharing System. Proceedings of the Spring Joint Computer Conference, San Francisco, 1962.
49. CARR, J. W., III. Programming and Coding. Part B of Handbook of Automation, Computation and Control, Volume 2. Wiley, 1959.

BOUNDED CONTEXT TRANSLATION

Robert M. Graham
Computation Center
Massachusetts Institute of Technology
Cambridge, Mass.

All translators are syntax-directed in the sense that the translator must obviously recognize the various syntactic structures, and the output of the translator is a function of the syntax of the language. The term syntax-directed is usually applied to a translator which contains a direct encoding of the syntax of the language, this direct encoding being used by the translator as data. The companion paper by Cheatham and Sattley is concerned with this type of translation.1 In the other class of translators the syntax is essentially buried in the coding of the translator. Most of the algebraic languages in use are precedence grammars,2,3 or close enough so that the properties of precedence grammars are useful. Using the properties of precedence grammars, bounded context translation is possible. At each step in the scan of an expression in bounded context translation the decision as to what action to take next is a function of the symbol currently under scan and of N symbols on either side (where N is fixed for the particular language).

Most translators produce an intermediate form of the program before producing the final machine code. Once this intermediate form is produced, the distinction between syntax-directed translation and other methods disappears. The primary goal of the intermediate form is to encode the program in a form which is easily and efficiently processed by the computer. Most optimization algorithms, such as elimination of common subexpressions and optimum evaluation of Boolean expressions,8 are much simpler when applied to some intermediate form rather than to the original expression or the final machine language version. The intermediate form exhibits the structure (i.e., which subexpressions are the operands of each operator) and the order of evaluation of the subexpressions. In this paper we will restrict ourselves to the source language defined in Appendix A. For the sake of brevity, when we say expression we will mean either an expression, as defined in Appendix A, or an assignment statement. This language has no constants and identifiers are single letters, thus avoiding the problem of recognition.
A translator for a language such as ALGOL would have a recognizer which would scan a statement and which, each time it was called, would recognize the next element in the input string. After recognizing an identifier the recognizer would store it in a symbol table if it were not already there. The symbol table would also contain other information pertaining to the identifier, such as type, address, dimension, etc. The recognizer would then return a pointer to the place in the symbol table where that identifier and its properties were stored. A constant, after recognition, would be stored in a table of constants, given an internal identifier, and treated from that point on just as any other identifier would be. When an operator was recognized, a pointer to the place in the operator table where the properties of that operator were stored would be returned. Thus we see that the recognizer would effectively reduce an expression of a language such as ALGOL to an expression of our language; that is, each element would be reduced to a single symbol. In general, the intermediate form does not contain the identifiers or operators but the pointers supplied by the recognizer. However, for clarity, in the examples we will use the actual identifiers and operators. The recognizer is heavily dependent upon the details of the particular language being translated and such characteristics of the computer as character representation, word size, etc. The algorithms to be discussed in this paper depend only upon a few very general properties of the source language. Most of the common compiler languages have these properties, so the techniques discussed here are essentially language independent. While the algorithms which we will describe here apply only to the simple expressions of Appendix A, several of them may be trivially generalized to translate more complex expressions and even the bulk of the ALGOL statements.4,8,12

Some representation of the tree form of an expression is used in several translators. Figure 1 shows the tree form of the expression A := B*C+D*(E-F) and the matrix representation of the tree.5,6,7,8

Figure 1. Tree form and matrix form of A := B*C+D*(E-F). The matrix is:

    1.  *   B  C
    2.  -   E  F
    3.  *   D  2
    4.  +   1  3
    5.  :=  A  4

The first column is the operator, the second column is the left-hand operand, and the third column is the right-hand operand. An integer represents the result of the row with that number. The subexpressions are to be evaluated in the sequence in which the rows are written.
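As a gloss on this representation: in modern notation the matrix is simply a list of (operator, left operand, right operand) triples, and "evaluated in the sequence in which the rows are written" can be made concrete with a few lines of Python. The sketch below is an illustration, not the paper's notation; variable values are supplied in a dictionary, and an integer operand names an earlier row's result.

    def eval_matrix(matrix, env):
        # evaluate the rows in sequence; row results are kept in `results`
        results = {}
        ops = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
               '*': lambda a, b: a * b, '/': lambda a, b: a / b}
        def value(x):
            return results[x] if isinstance(x, int) else env[x]
        for i, (op, left, right) in enumerate(matrix, start=1):
            if op == ':=':
                env[left] = value(right)
            else:
                results[i] = ops[op](value(left), value(right))
        return env

    matrix = [('*', 'B', 'C'), ('-', 'E', 'F'), ('*', 'D', 2),
              ('+', 1, 3), (':=', 'A', 4)]
    print(eval_matrix(matrix, {'B': 2, 'C': 3, 'D': 4, 'E': 9, 'F': 7})['A'])
    # prints 14, i.e. 2*3 + 4*(9-7)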
Figure 2 shows the tree form used by Ross.12 The first three columns in the representation are the same as in Figure 1. There are two additional columns which explicitly state the sequence of subexpression evaluation. The fourth column is the minor evaluation string pointer (dashed arrows in the tree) and the fifth column is the major evaluation string pointer (solid arrows in the tree). In the example the evaluation is to start with row 2. The rules for following the evaluation strings for the row under examination are:

1. If this is the first time this row has been examined, and
   a. the minor string pointer is not empty, proceed to the row named by it;
   b. the minor string pointer is empty, evaluate this row and proceed to the row named by the major string pointer.
2. If this is the second time this row has been examined, evaluate this row and proceed to the row named by the major string pointer.

[Figure 2. Ross's tree form of A := B*C+D*(E-F) and its representation, with minor and major evaluation string pointers.]

In this example the '(' and ')' symbols should be treated as "do nothing" operators when their evaluation is called for by the rules. Ross's representation is part of a more general system and, hence, some features of the representation are not pertinent to this discussion. A representation similar to Ross's, based on threaded lists, is described in Ref. 9.

Evans4 bases his intermediate form on postfix notation. The expression A := B*C+D*(E-F) in postfix form is ABC*DEF-*+:=. In this form both the structure and the order of subexpression evaluation are implicit. The right-hand operand of an operator is the first complete expression to the left of the operator and the left-hand operand is the second complete expression to the left of the operator. In the example the right-hand operand of the '+' is 'DEF-*' and the left-hand operand is 'BC*'. The subexpressions are evaluated in left to right order.

The transformation of a completely parenthesized expression into matrix form is relatively simple. The expression is scanned from left to right. The following rules are applied to each symbol as it is encountered (a variable name is considered as a single symbol):

1. If the symbol is not a ')', continue the scan.
2. If the symbol is a ')' and the expression is properly formed, then the four symbols to the left of the ')' should be of the form '(s1#s2', where 's1' and 's2' stand for variable names or integers (row numbers) and '#' is any operator; write '# s1 s2' as the next row of the matrix and replace '(s1#s2)' in the expression by the number of the row just written in the matrix. Continue the scan.

In Figure 3 the changes in the expression are shown on the left, the small arrow under the expression indicating the points, during the scan, at which a row is written in the matrix. The row actually written at that point is shown on the right.

[Figure 3. Stepwise transformation of (A:=((B*C)+(D*(E-F)))) into matrix form.]
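The two scanning rules translate almost word for word into modern code. The following Python sketch is illustrative only: it assumes a well-formed, completely parenthesized token list, represents the expression as a mutable list of symbols, and uses integers as row numbers, exactly as in the text.

    def paren_to_matrix(symbols):
        # rules 1 and 2 of the text: scan left to right, reduce at each ')'
        matrix = []
        i = 0
        while i < len(symbols):
            if symbols[i] != ')':
                i += 1                        # rule 1: continue the scan
                continue
            # rule 2: the four symbols to the left have the form '(', s1, #, s2
            lparen, s1, op, s2 = symbols[i-4:i]
            matrix.append((op, s1, s2))       # write '# s1 s2' as the next row
            symbols[i-4:i+1] = [len(matrix)]  # replace '(s1#s2)' by the row number
            i -= 4
        return matrix

    expr = ['(', 'A', ':=', '(', '(', 'B', '*', 'C', ')', '+',
            '(', 'D', '*', '(', 'E', '-', 'F', ')', ')', ')', ')']
    print(paren_to_matrix(expr))
    # [('*','B','C'), ('-','E','F'), ('*','D',2), ('+',1,3), (':=','A',4)]

The output is the matrix of Figure 1, row for row.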
Completely parenthesized expressions have such an overabundance of parentheses that they are difficult to read; hence, languages such as ALGOL have precedence rules making it unnecessary to write completely parenthesized expressions. The language is a precedence grammar if the precedence rules are such that, given two adjacent operators, it is unambiguous which is to be evaluated first. Thus if a language is a precedence grammar it is possible to construct a table of the type shown in Figure 4. To determine which of two adjacent operators to evaluate first, find the intersection of the row labeled with the left operator and the column labeled with the right operator.

Figure 4. Precedence table (rows: left operator; columns: right operator):

          +      -      *      /
    +    left   left   right  right
    -    left   left   right  right
    *    left   left   left   left
    /    left   left   left   left
    :=   right  right  right  right

In the context of transforming an expression into matrix form, the order of evaluation of the operators is to be interpreted as the order of their entry into the matrix. A subexpression enclosed by parentheses is to be completely evaluated before any consideration is given to the operators on either side of it. Applying these considerations we have the following rules for transforming an expression to matrix form. We enclose the expression on the left by '|-' and on the right by '-|'. Again we scan from left to right, applying the rules to each symbol in turn:

1. If the symbol is an operator, #1, and the left context is,
   a. '|- s1': continue the scan.
   b. '( s1': continue the scan.
   c. 's2#2s1': look up #2#1 in the table. If the table entry is,
      i. 'right': continue the scan.
      ii. 'left': write '#2 s2 s1' as the next row of the matrix, replace 's2#2s1' in the expression by the number of the row just written in the matrix, and apply rule 1 again.
2. If the symbol is ')' and the left context is,
   a. 's2#s1': write '# s2 s1' as the next row of the matrix, replace 's2#s1' in the expression by the number of the row just written in the matrix, and apply rule 2 again.
   b. '(s': replace '(s)' by 's' and continue the scan.
3. If the symbol is '-|' and the left context is,
   a. 's2#s1': write '# s2 s1' as the next row of the matrix, replace 's2#s1' in the expression by the number of the row just written in the matrix, and apply rule 3 again.
   b. '|- s': the expression has been completely transformed into matrix form.

The rules just stated show that N = 3 for the precedence grammar of Appendix A. That is, in deciding what action to take next, at most only the three symbols immediately to the left of the symbol currently under scan need be examined, regardless of the length of the expression being translated. This is the essence of bounded context translation.

Before examining some ways that bounded context analysis has actually been implemented, let us restate, precisely, the above rules in the form of a flow chart (Figure 5). In the flow charts of this paper the true exit of a decision box is marked t and the false exit is marked f. We consider the expression being analyzed as a string of symbols, S (indexed by k), bounded on the left by '|-' and on the right by '-|'. We will distinguish three classes of symbols:

1. I, the class of identifiers: any variable name.
2. R, the class of matrix row numbers: any integer.
3. θ, the class of operator symbols: θ = {+, -, *, /, :=, (, ), |-, -|}. For future use we distinguish the subclass of arithmetic operators, θ° = {+, -, *, /}.

Symbols from the input string are transferred to an auxiliary list, L (indexed by j). This will avoid the problem of gaps when replacements are made in the input string. M is the matrix (with i the row index) and T(x, y) is a function whose arguments are a pair of operators and whose value is the label found at the intersection of the x-row and the y-column in Figure 6. The label ERR has been filled in for all illegal operator pairs. If one of these occurs then the expression is incorrectly formed. The questions of the detection of incorrectly formed expressions and what action to take when errors are detected are very important in actual translator construction. These questions will not, however, be discussed in this paper. We will assume here that all expressions are properly formed. When the function T(x, y) takes on the value END then the matrix form of the expression is complete.

[Figure 5. Flow chart of the transformation rules.]
[Figure 6. The function T(x, y): for each pair of operators, one of the labels MTX, PUT, DEL, ERR, or END.]
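A compact rendering of rules 1-3 in Python may help fix the ideas. The sketch below is ours, not the flow chart of Figure 5: the table entries are transcribed from Figure 4, the auxiliary list L plays the same role as in the text, and well-formed input is assumed (error detection, the ERR entries, is omitted).

    TABLE = {   # TABLE[left][right] == 'left' means: evaluate left first
        '+':  {'+': 'left', '-': 'left', '*': 'right', '/': 'right'},
        '-':  {'+': 'left', '-': 'left', '*': 'right', '/': 'right'},
        '*':  {'+': 'left', '-': 'left', '*': 'left',  '/': 'left'},
        '/':  {'+': 'left', '-': 'left', '*': 'left',  '/': 'left'},
        ':=': {'+': 'right', '-': 'right', '*': 'right', '/': 'right'},
    }
    OPS = {'+', '-', '*', '/', ':='}

    def reduce_top(L, matrix):
        # write '# s2 s1' as the next row and put the row number on L
        s2, op, s1 = L[-3], L[-2], L[-1]
        matrix.append((op, s2, s1))
        del L[-3:]
        L.append(len(matrix))

    def to_matrix(tokens):
        matrix, L = [], ['|-']
        for sym in tokens + ['-|']:
            if sym in OPS:                                        # rule 1
                while len(L) >= 3 and TABLE.get(L[-2], {}).get(sym) == 'left':
                    reduce_top(L, matrix)
                L.append(sym)
            elif sym == ')':                                      # rule 2
                while len(L) >= 3 and L[-2] in OPS:
                    reduce_top(L, matrix)
                del L[-2]                                         # '(s)' becomes 's'
            elif sym == '-|':                                     # rule 3
                while len(L) >= 3 and L[-2] in OPS:
                    reduce_top(L, matrix)
            else:                                                 # identifier or '('
                L.append(sym)
        return matrix

    print(to_matrix(['A', ':=', 'B', '*', 'C', '+', 'D', '*',
                     '(', 'E', '-', 'F', ')']))

Run on A := B*C+D*(E-F), this again produces the five rows of Figure 1.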
M is the matrix (with i the row index) and T (x, y) is a function whose arguments are a pair of operators and whose value is the label found at the intersection of the x-row and the y-column in Figure Figure 5. . .I + - . I MTX MTX PUT PUT :- ) -f ERR MTX MTX MTX MTX PUT PUT ERR f.lTX MTX MTX MTX MTX ~ITX ERR MTX MTX MTX MTX MTX MTX ERR MTX MTX :- PUT PUT PUT PUT ERR ERR MTX ( PUT PUT PUT PUT ERR DEL ERR ... PUT PUT PUT PUT PUT ERR END Figure 6. BOUNDED CONTEXT TRANSLATION 21 It is obvious from the table in Figure 6 that most operator pairs cause one of two actions to take place: 1) the operator currently under. scan is put on the list, or 2) rows are written in the matri~ until an operator pair is found which calls for some other action. The only exceptions are the removal of parentheses and termination of the transformation. If a precedence function is defined for each operator then a table of all operator pairs is not necessary. A precedence function, P (x), is defined such that, given a pair of operators #1#2, if P(#1)#~#P(#2) then the operator #1 is evaluated before the operator #2. A precedence function for the operators used in the previous examples is defined in Figure 7. The algorithm for transforming an expression into matrix form that is used in GAT5 and MAD6.7 uses a precedence function. The flow chart in Figure 8 gives an algorithm for generating the matrix form of the expression. This algorithm differs from the GAT-MAD algorithm only in the direction of scanning the expression. N 0tice that, assuming all expressions are properly formed and since P('(') < P(')') and P(,I-') < P('-I'), when the test P(Lj-1» ~ P(S(k) ) fails then S(k)==')' implies L(j-1) == '(' and S(k) == '-I' implies L(j-1) == '1-'. is encountered in the scan, instructions are generated to move its value onto the N list. Bauer and Samelson give, in the same paper, modifications to this algorithm which will generate more efficient machine code. Bauer and Samelson 10 use two lists, one, L, in the translator and one, N, in the object program. L, called the "symbols cellar," is used for storing operators which can not be evaluated yet, just as in the previous algorithms. N, called the "numbers cellar," is used by the object program during execution for the temporary storage of partial results and values of identifiers. Their algorithm does not generate an intermediate form, but immediately generates machine code~ In the examples of this paper, the machine language used will be that described in Appendix B. This language is very similar to the F AP symbolic machine language used on the IBM 709-90-94 computers. C (indexed by m) will be a matrix where generated machine instructions are stored. The first column will contain the operation code and the second column the. address. ·Figure 9 is the table, for Bauer and Samelson's algorithm, corresponding to the table in Figure 6, and Figure 10 is their algorithm. Whenever an identifier Figure 7. x P(x) I 7 * 7 + 6 - 6 := 5 ) 4 ( 3 -f 2 I- 1 Figure 8. 22 PROGE~EnINGS-SPRING JOINT COM'PUTER CONFERENCE, 1964 . - * I ( ) PUT PUT PUT PUT PUT ERR END + MTXl MTXl PUT PUT PUT MTXl MTXl - MTX2 MTX2 PUT PUT PUT MTX2 MTX2 * MTX3 MTX3 MTX3' MTX3 PUT MTX3 ~'TX3 I MTX .. MTX .. MTXII MTXII PUT MTX .. MTXII ( PUT PUT PUT PUT PUT DEL ERR ~ -I Figure 9. Figure 11. 
+ C("l,ll:-'CLA' C(ot+l,2':-N(h-ll C(m+2,1):-'FOP' C(ot+2,2':-N(h) C(ot+3,1l:-'STQ' C(m+3 2):-N(h-ll + - * I :- MTX MTX PUT PUT ERR ( ) -I PUT PRN MTX - MTX MTX PUT PUT ERR ,?UT PRN ~'Tx'" * MTX MTX MTX MTX ERR PUT PRN MTX I MTX MTX MTX MTX ERR PUT PRN MTX :- PUT PUT PUT PUT ERR PUT ERR MTX ( PUT PUT PUT PUT ERR PUT MTX ERR ) ·MTX ~'TX MTX MTX ERR ERR ERR MTX ~ PUT PUT PUT PUT PUT PUT ERR END Figure 10. Ross's algorithm, as given in Ref. 12, transforms more than just expressions into tree form. In the version of his algorithm given in Figures 11 and 12 the machinery not needed to transform expres~ions has been omitted. A minor evaluation string pointer is set whenever the right operand of an operator is set and both operands are variables, or whenever a left operand is set and the o-perator is a modifier. The only modifier in an expression is the '('. The minor evaluation string is forced to pass Figure 12. through modifiers since a modifier may change the interpretation of the right operand. For example, the right operand of '(' may' be either a normal subexpression or the argument of a function depending upon whether the left argument of the' (' is empty or is an identifier. A major evaulation string pointer is set whenever a right or left operand is set and the operand, is a row number. BOUNDED CONTEXT TRANSLATION Evans 4 uses an algorithm based on Floyd's productions. l l Instead of generating machine code directly as Floyd did in his paper, Evans transforms the expression into postfix form. This algorithm is probably the most versatile of the lagorithms which we have described here. The central idea here is a set of productions which determine the course of action at each step in the translation process. A production consists of five parts; 1. A top of stack configuration. 2. A replacement for the configuration of part 1. 3. The name of an action routine. 4. Scan, no scan flag. 5. Which production to consider next. Figure 14 gives the productions for transforming ,an expression into postfix form. The expression is scanned from left to right. As each new character is scanned it is put on the top of the pushdown stack and the productions are then searched for the first one whose part 1 matches the present top of the stack (when a class symbol such as eo appears, any member of that class will give a match). When a match is found, that portion of~ the top of the stack which matched part 1 of the production is replaced by part 2 of the production. If part 2 is empty, the replacement degenerates to the deletion of the symbols, which matched part 1, from the top of the stack. The action routine named in part 3 is then executed. After the action routine has been executed the ,productions are again searched for a match with the top of the stack; however, the search is started with the production whose line number is given in part 5 of the last interpreted production. If 1 1 • 2 ( 3 " 5 2 ( ,. ,:- :- -. -. 6 ) 7 () 3 OUT ) Nap CaMP OUTP( 'loc') CaMP " · ·· · CaMP Nap x P(x) * 7 / 7 + 6 - 6 :- 5 ) 4 -I 3 ( 2 1- 1 Figure 14. 5 3 1 1 1 END 7 · 3 P-Table Figure 13. Figure 15. 23 24 PROCE'E'DINGS-SPRING JOINT COMPUTER CONFERENCE, 1964 a '*' appears in part 4, a new symbol is scanned from the input string before the search continues. The productions given in Figure 13 are sufficient only to transform the simple expressions of this paper. The productions that Evans gives will transform all the statements of ALGOL into postfix form. 
Figure 15 gives an algorithm for transforming an expression into postfix form using the productions of Figure 13. The action routine OUT puts the last identifier scanned into the output string, M. The action routine OUTP (x) puts its argument, x, into the output string. The productions cause the unary operator 'loc' to be introduced into the output string following a variable which is on the left ,side of ':==', .which indicates that a location (where a value is to be stored) is involved rather than a valu~. The action routine COMP uses the precedence function, defined in Figure 14, to determine when operators are placed in the output string (i.e., to determine the sequence of evaluation of the operators) . the matrix, M, is translated by making a copy, in the code output matrix, of the pattern corresponding to the operator M (i, 1), replacing all cccurrences of '1' by the left operand, M (i, 2), all occurrences of lr' by the right operand, M (i, 3), and all occurrences of 't' by the row number, i. The row numbers (integers) which appear in the code are to be interpreted as the names of temporary storage locations. Figure 17 is an algorithm for performing this translation. N is the number of rows in the matrix, M, C is the code output matrix, P AT1 (x) a function whose value is the index of the first line of the pattern for the operator x, and P AT2 (x) a function whose value is the index of the last line of the pattern for x (both these functions are defined in Figure 18). The translation of the matrix in Figure 1 is shown in Figure 19. It is immediately obvious that very inefficient machine code is produced by this algorithm. Once the matrix form of the expression has been generated, the final translation to symbolic machine code is relatively simple. Corresponding to each operator is a set of machine instructions, a pattern. Figure 16 gives the patterns for the operators of our language; the fi"rst column is the operation code and the second column is the address. The matrix is translated one row at a time, in sequence. Row i of operator 1 eLA 1 2 FAD r 3 STO t II eLA 1 5 FSB r 6 STO t 7 LDQ 1 8 FMP r t 9 STO 10 eLA I 11 FOP r t 12 STQ 13 eLA r 111 STO 1 + :- Pattern Table Figure 16. Figure 17. BOUNDED CONTEXT TRANSLATION x PATl(x) PAT2(x) + 1 3 - 4 6 * 7 9 I 10 12 :a 13 14 Figure 18. 1. * 2. - :3. * II. + B E D 1 A Matrix C F 2 :3 II LDQ B FMP C STO 1 CLA E FSB F STO 2 LDQ D FMP 2 STO :3 CLA 1 FAD :3 STO II the code output matrix depend upon the contents of the special registers. The method used in the MAD translator is general enough for the machine structure assumed in this paper. The problem of generating efficient machine code is a very difficult one and is yet unsolved. There are methods, undocumented, other than the one to be described but none which can claim to produce highly efficient code in all circumstances. The patterns will be arranged so that the result of a row is, in general, not stored, i.e., it will be left in one of the registers AC or MQ. The machine code produced when a row of the matrix is translated will depend on the values of four Boolean variables. These variables are named AC, MQ, LO, and RO. Suppose we are ready to translate the ith row of the matrix, then these variables have the following meanings: 1. If AC is true, the result produced by row i-I is in the AC. I CLA II I STO A . ~lachlne 25 2. If MQ is true, the result produced by row i-I is.in the MQ. 3. If LO is true, the left operand in row i is i-l (I.e., the result of row i-I). code Figure 19. 
Once we begin to consider the production of efficient machine code, the algorithms rapidly get very complicated. We can make some improvement, however, without a great deal of complication. In the example there are several redundant store and fetch instructions. These can be eliminated if we keep track of the contents of the special machine registers. We then insert store instructions only when the current contents of a special register is not one of the operands of the next operator, and fetch instructions only when the operand is not already in a special register. To implement this we generalize the idea of patterns. Corresponding to each operator is a variable pattern; that is, the instructions which are actually copied into the code output matrix depend upon the contents of the special registers. The method used in the MAD translator is general enough for the machine structure assumed in this paper. The problem of generating efficient machine code is a very difficult one and is as yet unsolved. There are methods, undocumented, other than the one to be described, but none which can claim to produce highly efficient code in all circumstances.

The patterns will be arranged so that the result of a row is, in general, not stored, i.e., it will be left in one of the registers AC or MQ. The machine code produced when a row of the matrix is translated will depend on the values of four Boolean variables. These variables are named AC, MQ, LO, and RO. Suppose we are ready to translate the ith row of the matrix; then these variables have the following meanings:

1. If AC is true, the result produced by row i-1 is in the AC.
2. If MQ is true, the result produced by row i-1 is in the MQ.
3. If LO is true, the left operand in row i is i-1 (i.e., the result of row i-1).
4. If RO is true, the right operand in row i is i-1.

Instead of regarding the patterns as a fixed set of machine code to be produced in translating a row of the matrix, we now take the view that the pattern is really a small program which, when executed, produces machine code. Viewing it in this light, we need a language in which to write the program. In our language we will need the following types of instructions: produce a machine instruction, branch on the truth or falsity of one of the Boolean variables, absolute transfer, set the value of one of the Boolean variables, and halt. Figure 20 is a flow chart for a program to produce the code for '+', where 'l' and 'r' have the same meanings as in Figure 16, but 't' now refers to the temporary used to store the result of the previous row. [Figure 20. Flow chart of the pattern program for '+'.] Notice that if there is a result in the AC or MQ and it is not either of the operands then instructions to store it and fetch one of the operands are generated. If one of the operands is in the wrong special register an exchange instruction is generated. The word COMPILE means, here, write the instruction into the code output matrix.
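The register-tracking idea can be shown in miniature. The Python sketch below is a simplification of the paper's scheme: it models only the AC explicitly (the MQ enters only through XCA, and the quotient of a division is moved to the AC at once), whereas the pattern programs track both registers through the four Boolean variables. A row result left in the AC is stored only when the next row does not use it. All names are illustrative.

    def gen(matrix):
        code, ac, stored = [], None, set()
        def emit(*parts):
            code.append(' '.join(str(p) for p in parts))
        for i, (op, l, r) in enumerate(matrix, start=1):
            # store a leftover row result that this row does not use
            if isinstance(ac, int) and ac not in (l, r):
                emit('STO', ac); stored.add(ac); ac = None
            if op == ':=':
                if ac != r:
                    emit('CLA', r)
                emit('STO', l)
                ac = r
            elif op == '*':
                if ac in (l, r):              # one operand is already in the AC
                    emit('XCA')               # move it to the MQ for FMP
                    emit('FMP', r if ac == l else l)
                else:
                    emit('LDQ', l); emit('FMP', r)
                ac = i
            elif op == '+':
                if ac == r:                   # addition commutes
                    l, r = r, l
                if ac != l:
                    emit('CLA', l)
                emit('FAD', r)
                ac = i
            elif op == '-':
                if ac == r:                   # AC holds the right operand
                    emit('CHS'); emit('FAD', l)   # l - r = -r + l
                else:
                    if ac != l:
                        emit('CLA', l)
                    emit('FSB', r)
                ac = i
            elif op == '/':
                if ac == r:                   # right operand must be in memory
                    emit('STO', r); stored.add(r); ac = None
                if ac != l:
                    emit('CLA', l)
                emit('FDP', r)
                emit('XCA')                   # quotient MQ -> AC (a simplification)
                ac = i
        return code

    matrix = [('*', 'B', 'C'), ('-', 'E', 'F'), ('*', 'D', 2),
              ('+', 1, 3), (':=', 'A', 4)]
    print(gen(matrix))

On the matrix of Figure 1 this emits nine instructions (LDQ B, FMP C, STO 1, CLA E, FSB F, XCA, FMP D, FAD 1, STO A), the same five-instruction saving over Figure 19 that the text reports for Figure 25.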
A command in our pattern programming language will have a line number, an operation code, and two operand fields. The commands are:

    operation code   operands     meaning
    M                inst, a      compile the machine instruction inst with a in
                                  its address field (a may be l, r, t, or blank)
    AC               l1, l2       if AC = 'true' transfer to line l1, otherwise
                                  transfer to line l2 (MQ, LO, and RO branch
                                  similarly)
    S                v, val       set the value of the Boolean variable v to val
    J                l            transfer to line l
    H                             halt

The program for translating the matrix into machine code now becomes an interpreter which executes (interpretively), for each row of the matrix, the appropriate pattern program. The pattern programs will be stored in a matrix, P (indexed by k); the first column is the operation code, the second column is the first operand, and the third column is the second operand. As before, we have a function PAT(x) whose value is the index of the first line of the pattern program for the operator x. The values of LO and RO are set by examining the operands in a row before executing the pattern program for that row. The function PAT(x) is defined by the table of Figure 21, Figure 22 gives the pattern programs (the P matrix), and Figure 23 is the interpreter for the pattern programs, i.e., the algorithm for translating the matrix into machine code. OP(y) is a function, defined in Figure 24, whose argument, y, is one of the operation codes of the pattern programming language and whose value is a label, the label of that portion of the algorithm which interprets the operation code y.

Figure 21. The function PAT(x): PAT(+) = 1, PAT(-) = 20, PAT(*) = 37, PAT(/) = 54, PAT(:=) = 68.
[Figure 22. The pattern programs (the P matrix).]
[Figure 23. Flow chart of the interpreter.]
Figure 24. The function OP(y): OP(M) = MOP, OP(AC) = ACOP, OP(MQ) = MQOP, OP(LO) = LOOP, OP(RO) = ROOP, OP(S) = SOP, OP(J) = JOP, OP(H) = HOP.

Figure 25 shows the translation of the matrix in Figure 1 into machine code using the algorithm of Figure 23. For each row of the matrix is shown the machine code produced for that row and the status of the four Boolean variables after translating that row and just before considering the next row, that is, at point 3 in the flow chart of Figure 23. [Figure 25. For each row of the matrix of Figure 1, the code produced (row 1: LDQ B, FMP C, STO 1; row 2: CLA E, FSB F; row 3: XCA, FMP D; row 4: FAD 1; row 5: STO A) together with the values of AC, MQ, LO, and RO.] Notice that just this simple consideration of the contents of the special registers gives us a saving of five instructions when compared to the instructions produced by the algorithm of Figure 17.
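The shape of the Figure 23 interpreter can be suggested in a few lines of Python. The dispatch on the command's operation code mirrors OP(y); the three-field tuples mirror the rows of the P matrix. The ':=' pattern program shown is invented for illustration and is far shorter than the real programs of Figure 22.

    def run(program, env, emit):
        k = 0
        while True:
            cmd, a, b = program[k]
            if cmd == 'M':                         # compile a machine instruction
                emit(a, env.get(b, b))
                k += 1
            elif cmd in ('AC', 'MQ', 'LO', 'RO'):  # branch on a Boolean variable
                k = a if env[cmd] else b
            elif cmd == 'S':                       # set a Boolean variable
                env[a] = b
                k += 1
            elif cmd == 'J':                       # absolute transfer
                k = a
            elif cmd == 'H':                       # halt
                return

    # An invented pattern program for ':=': if the value to be stored is
    # already in the AC, store it; otherwise fetch it first.
    ASSIGN = [
        ('AC', 2, 1),        # line 0: is the result in the AC?
        ('M', 'CLA', 'r'),   # line 1: no, fetch the right operand
        ('M', 'STO', 'l'),   # line 2: store into the left operand
        ('H', None, None),   # line 3: halt
    ]

    code = []
    run(ASSIGN, {'AC': False, 'l': 'A', 'r': 4},
        lambda op, addr: code.append((op, addr)))
    print(code)              # [('CLA', 4), ('STO', 'A')]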
It is obvious that only trivial modifications are necessary to be able to use the pattern program interpreter with Ross's intermediate form. Instead of considering the rows of the matrix in sequence, the minor and major evaluation strings are followed. When the rules for following the evaluation strings call for evaluation of a row, the appropriate pattern program is interpreted. Evans uses an algorithm very similar to the pattern program one which we have described.

No consideration has been given to the type (real, integer, etc.) of the identifiers involved. In many computers there are different instructions for floating decimal (real) and integer arithmetic. This is easily taken care of by having, for example, two pattern programs for '+', one to be interpreted when the operands are real, and the other to be interpreted when the operands are integer. Finally, it is clear that the interpreter, instead of generating machine instructions, could actually execute them, thus turning the entire translator itself into an interpreter which could execute (interpretively) programs written in the source language.

BIBLIOGRAPHY

1. CHEATHAM, T. E., JR., and K. SATTLEY: Syntax Directed Compiling, Proceedings of the 1964 SJCC, Washington, D. C., April 1964.
2. FLOYD, R. W.: Syntactic Analysis and Operator Precedence, J. ACM, 10 (3): 316-333 (1963).
3. FLOYD, R. W.: Bounded Context Syntactic Analysis, Comm. ACM, 7 (2): 62-65 (1964).
4. EVANS, A., JR.: An ALGOL 60 Compiler, paper presented at the 18th Annual Meeting of the ACM, Denver, August 1963.
5. ARDEN, B. W., and R. M. GRAHAM: On GAT and the Construction of Translators, Comm. ACM, 2 (7): 24-26 (1959); correction, Comm. ACM, 2 (11): 10-11 (1959).
6. ARDEN, B. W., B. A. GALLER, and R. M. GRAHAM: The Internal Organization of the MAD Translator, Comm. ACM, 4 (1): 28-31 (1961).
7. ARDEN, B. W., B. A. GALLER, and R. M. GRAHAM: An Algorithm for Translating Boolean Expressions, J. ACM, 9 (2): 222-239 (1962).
8. GRAHAM, R. M.: Translator Construction, Notes of the Summer Conference on Automatic Programming, University of Michigan, June 1963.
9. EVANS, A., JR., A. J. PERLIS, and H. VAN ZOEREN: The Use of Threaded Lists in Constructing a Combined ALGOL and Machine-Like Assembly Processor, Comm. ACM, 4 (1): 36-41 (1961).
10. BAUER, F. L., and K. SAMELSON: Sequential Formula Translation, Comm. ACM, 3 (2): 76-83 (1960).
11. FLOYD, R. W.: A Descriptive Language for Symbol Manipulation, J. ACM, 8 (4): 579-584 (1961).
12. ROSS, D. T., and J. E. RODRIGUEZ: Theoretical Foundations for the Computer-Aided Design System, Proceedings of the 1963 SJCC, Detroit, May 1963, pp. 305-322.
13. NAUR, P., et al.: Report on the Algorithmic Language ALGOL 60, Comm. ACM, 3 (5): 299-314 (1960).

APPENDIX A

Definition of the Language Used in the Examples:

    <variable> ::= A | B | C | D | E | F | G | H
    <primary> ::= <variable> | ( <expression> )
    <multiplying operator> ::= * | /
    <adding operator> ::= + | -
    <term> ::= <primary> | <term> <multiplying operator> <primary>
    <expression> ::= <term> | <expression> <adding operator> <term>
    <assignment statement> ::= <variable> := <expression>

APPENDIX B

The Machine Language Used in the Examples

There are two special registers, the AC and the MQ. Instructions are single address. The meaning of the instructions is expressed as a short ALGOL program. The '?' in the context 'MQ := ?' means that the contents of the MQ is indeterminate. When the name of a register does not appear to the left of an ':=' in the description of an instruction, then the value of that register is unchanged.

    Instruction   Meaning
    CLA X         AC := X
    FAD X         AC := AC + X; MQ := ?
    LDQ X         MQ := X
    FMP X         AC := MQ * X; MQ := ?
    FDP X         MQ := AC / X; AC := ?
    FSB X         AC := AC - X; MQ := ?
    STO X         X := AC
    STQ X         X := MQ
    CHS           AC := -AC
    XCA           TMP := AC; AC := MQ; MQ := TMP

SYNTAX-DIRECTED COMPILING

T. E. Cheatham, Jr., and Kirk Sattley
Computer Associates, Inc.
Lakeside Office Park
Wakefield, Massachusetts

INTRODUCTION

This paper is primarily concerned with the analysis of source statements in a programming language, although some of the ideas and techniques may be applicable to the analysis of source statements in a natural language. We are particularly concerned with those techniques which might be classed as predictive; the companion paper by Graham7 is concerned with other ("nonpredictive") techniques of analysis. Very broadly, the techniques we will discuss operate as follows: Given a set of rules (Syntax Specification) for forming allowable constructs, eventually resulting in a statement (or sentence, word, program, etc.) of a language, we analyze a source statement in that language by guessing, or predicting, how the statement is constructed and either verifying that this is the case or backing up to try again, assuming some other method of construction. We keep a "history" of our attempts, and when we have determined the exact way in which the statement is constructed we can use this "history" of its construction for further processing of the components of the statement.
The phrase "syntax-directed" in the title refers to the method by which the compiler is given the syntactic specification of the language it is to compile. That is, rather than having the syntactic structure· of the language reflected in the actual encoding of the compiler algorithm, a "syntax-directed" compiler contains (or uses, as parametric data) a relatively simple and direct encoding of the syntactic structure of the language, for example, as it might be expressed in Backus Normal Form.. By "simple and direct encoding," we mean, for instance, numerical codes for the distinct syntactic types of the language, and direct pointers representing the relations of concatenation and alternative choice, plus perhaps some sorting. This paper is not intended as a review or critique of syntax-directed eom'piIers or compiler techniques nor have we presented a comprehensive bibliography on the subject. Rather, our purpose is tutorial-to present in as straightforward a manner as possible the essential ideas of syntax-directed compilers. Unfortunately, there is, at the present time, no completely adequate review paper on the subject; Floyd13 does include a rather complete bibliography. We will be concerned, secondarily, with the synthesis of machine coding, given an analysis of a source statement. We do not, however, discuss in any detail the difficult (and, at this point, not terribly well understood) problems of synthesizing highly efficient coding~ Reference [1] contains a brief discussion of this prohlem.. 31 32 PROCEEDINGS-,SPRING JOINT COMPUTER CONFERENCE, 1964 Our presentation commences with a discussion of syntax and the syntactic specifications of languages-programming languages in particular. We then discuss techniques for encoding the syntax into tables and develop a simple algorithm, the ANALYZER, which can perform a syntactic analysis of source material, using this tabled syntax specification as data. From this we proceed to a discussion of the generation or synthesis of code from the results of the analysis. Finally, we discuss several problems and limitations. Certain problems of syntactic specification and some modifications of the schemes we describe in the body of the paper have been discussed in an appendix. In order to discuss the structure of the language,., we give names to classes of strings in the language-we call these names (or the classes they denote) "Syntactic Types." Some of the classes of interest consist of single characters of the source alphabet: these we call "Terminal Types," and specifically "Terminal Characters"; to talk about any particular one, we will merely display the character. Most of the classes, though, are more complicated in structure and are defined in terms of other classes; these we call "Defined Types," and to designate one, we choose a mnemonic name for the class and enclose it in the signs' <' and' >'. SPECIFICATION OF SYNTAX Rather than proceed immediately to a discussion of Backus Normal Form, we shall first define a simple form of Synt~.x Specificationthe Basic Specification. This consists of a set of "Simple Type Definitions" (meaning, not that the Types are simple, but the Definitions are). A Simple Type Definition consists of the name of a Defined TY'pe, followed by the curious sign ':::=' followed by a sequence of Syntactic Types, Defined or Terminal. An example, taken from the Syntax I-soon to be discussedwould be: Several essentially equivalent formalisms for the representation of syntax have been developed. 
These include such things as Post Production Systems, developed by the logician Emil Post during the 1940's as a tool in the study of Symbolic Logic; Phrase Structure Grammars, developed by the linguist Noam Chomsky during the 1950's as a tool in the study of natural languages; and Backus Normal Form, developed by the programmer John Backus during the late 1950's as a tool in the description of programming languages. We shall use here a formalism most similar to.Backus's. A syntactic specification of a language is a concise and compact representation of the structure of that language, but it is merely thata description of structure-and does not by itself constitute a set of rules either for producing .allowable strings in the language, or for recognizing whether or not a 'Proffered string is, in fact, an allowable string. However, rules can be formulated to produce, or recognize, strings according to the specification. In a "syntax-directed" compiler it is an algorithm which performs the recognition of allowable input strings, and it does this by using (an en co dement of) the Syntax Specification as data. In this paper, 'we shall call such an algorithm an (or the) "Analyzer." Basic Syntax Specification : ::= := The Defined Type on the left of the ':::=' is called The Defined Type of the Definition; and the Definition is said to be a Definition of its defined type. In general--even for the more complicated forms of Type Definitions yet to come-we shall call the right-hand side of the Definition the "Definiens." Any sequence of type designators appearing in a Definiens is called a "Construction," and each type designator within the Construction is a "Component" of the Construction. So, the above example is a Definition of the Defined Type ; its Definiens is a Construction with three components, which are, in the order of their appearance, the Defined Type , the Terminal Charcater ':=' and the Defined Type <.arith expr>. A Simple Type Definition of this sort states that, among the strings of the source language belonging to the .Defined Type, are those which are concatenations of substrings-as many SYNTAX-DIRECTED COMPILING substrings as there are components of its (Simple) Definiens-such that each substring (in order of concatenation) belongs to the Syntactic Type named by the corresponding Component (in order of appearance in the Definiens). Applied to our example: A source string belongs to the class (or, for short, "is an ") if it can be broken into three consecutive substrings, the first of which is a , the second of which is the single character '==', and the third of which is an . If we were interested in using Syntax Specifications as '''generative grammars"-that is, if we were writing an algorithm to use a Syntax Specification to produce sam·ples of strings of the language, we would write something which, applied to our example, would have the effect of: "if you wish to produce an , then: first choose any "definition of and 'produce a string according to that definition, then, second write down the character '==', then third produce an according to any definition of that type; then you have produced an " Thus, the use of a (Basic) Syntax Specification as a generative grammar is quite straightforward. The inverse problem-using the Syntax Specification a~ a "recognition grammar"is, like many inverse problems, rather more involved. 
In our opinion, the fundamental idea -perhaps "germinal" would be a better word -which makes syntax-directed analysis by computer possible is that of goals: a Syntactic Type is construed as a goal for the Analyzer to achieve, and the Definiens of a Defined Type is construed as a recipe for achieving the goal of the Type it defines. * It is this use of goals which leads to another description of analysis techniques of this kind-"predictive analysis" : setting up the recognition of a particular Syntactic Type as a goal amounts to predicting that an instance of that type will be found. N eedless to say, this use of the term "goal" is not to be confused with the "goal-seeking behavior" of "artificial intelligence" programs or "self- * To our knowledge, the first person to formulate and implement this conception was E. T. Irons, in his initial design of the PSYCO compiler; his paper [4] describes the essentials of his compiling technique. 33 organizing systems." However, when we come to specifying the algorithm for selecting par~ ticular goals in a particular order, we reach the point at which the several existing syntaxdirected techniques diverge. Our purpose in this section on "Basic Syntax Specification" is to lay a foundation common to the principal different applications of the technique; hence, if we try to "picture" the use of a Syntax Specification as a recognition grammar, as we pictured its use as a generation grammar in the preceding paragraph, the most generally valid statement we can make is : We can say that we have recognized an occurrence of a given Syntactic Type (at a given position in the source string) if one of the two following conditions obtains: 1. The Syntactic Type is a Terminal Character, and the character at the given position in the source string is exactly that Terminal Character; 2. The Syntactic TY'pe is a Defined Type, and for some one of its (Simple) Definitions, we have already recognized concatenated occurrences of the Components of that Definiens, in the stated order, the first of which occurs at the .given position. In t>rder for the set of Simple Type Definitions to constitute a useful Syntax Specification, it should satisfy some conditions. (Cl) Any Defined Type which occurs as a Component in any Definiens must also occur as the Defined Type of some definition. The desirability of this "completeness condition" is fairly obvious-it will be very difficult to recognize a Defined Type if the Analyzer has no Definition of that Type. Of course, it is possible that the Analyzer may never be asked to find an instance of this Type, but then all the Definitions which included it as a Component would also be superfluous. (C2) Every Defined Type must ultimately be constructible entirely out of Terminal Characters. This "connectedness condition" is designed to prevent a cycle of definitions which it is impossible to break out of-that is, if a Defined Type is defined only in terms of Defined Types, 34 PROCEEDINGS-SPRING JOINT COMPUTER CONFERENCE, 1964 each of which in turn is defined only in terms of Defined Types, etc. Of course, it will be true that there will be cycles within the act of definitions, and these cycles may be traversed arbitrarily many times; but there must exist some point in each cycle where an alternative definition of one of the types exists. It is probably sufficient to restate condition (C2) in the following fashion: A Terminal Type will be said to be "grounded." 
(C3) There must exist exactly one Defined Type which does not appear as a Component in any Definiens (except possibly its own). This Type is called the "Starting Type" of the Syntax Specification.

The Starting Type represents the "largest" construction allowable under the Specification, e.g., "sentence," or perhaps "paragraph," in natural language applications, or usually "program" in compiler applications. If there is no Starting Type, the Analyzer, quite literally, will not know where to begin.

Let us note here in passing that there is a property of Syntax Specifications which is of great importance to theoreticians in this field, and to people who are designing new languages or trying to construct Specifications for existing complex languages, but which is irrelevant to the problem of programming a syntax-directed Analyzer. This is the question of "structural ambiguity": does the Syntax Specification permit a particular source-language string to be correctly analyzed in two different fashions? A simple example, taken from natural language (with apologies to Oettinger) is: "Time flies incessantly." This is certainly an English sentence, but is it a metaphorical declarative sentence, or a terse imperative? In the case of an Analyzer algorithm on a computer, only one thing is done at a time: if the Analyzer is asked to find an instance of an ambiguous Syntactic Type, it must try one of the possible definitions first; if that definition succeeds, the Analyzer is satisfied, and the other definitions are not considered. This is not to say that an Analyzer, one of whose functions is to find all possible analyses, cannot be built; this has been done by Oettinger11 for natural language, and by Irons5 for use in compiling.

Some Transformations of the Basic Specification

We shall now proceed to build up to the description of a particular simple Analyzer algorithm, and at this point, we must choose one among several different techniques. The differences between the various techniques stem from the following considerations:

- Given a Syntax Specification, there are different ways of using it to determine the next goal which the analyzer is to pursue. The two major approaches are called the "top-down" and the "bottom-up" techniques.
- There are different ways to use the output of the Analyzer, e.g., interpretation, immediate generation of output code, recording of the analyzed structure for later generation, etc.

The particular type of Analyzer we have chosen to describe here is, we believe, the easiest to explain, and is suitable for any of the three output-treatments mentioned above. It does not correspond, so far as we know, to any actually existing compiler system, although it bears a surprisingly strong resemblance to the algorithm used in some of the compilers that Computer Associates, Inc., has recently produced (see Shapiro and Warshall1).

The first step is to transform our Basic Syntax Specification into a Backus Normal Form. The description of a Syntactic Type Definition is now expanded so that the Definiens, instead of simply a Construction (which, remember, was a sequence of Components, which in turn were Syntactic Types), can now be a sequence of Constructions, separated by the special sign '|'.
Any such sequence of Constructions, separated by '|' and appearing in a Definiens, is called an "Alternation," and the individual Constructions in the sequence are called "Alternatives" of the Alternation. To transform a Basic Syntax Specification into Backus Normal Form, we must repeatedly apply the following transformation rule to the set of Definitions, until it can no longer be applied:

(T1) If any Defined Type has more than one Definition in the set, delete all such Definitions, and add to the set a new Definition whose left-hand side is the Defined Type in question, and whose Definiens is an Alternation of the original (Basic) Definientes.

As an example, the Basic Syntax Specification for the simple language we are using for illustration in this paper would have contained three definitions for <factor>:

    <factor> ::= <variable>
    <factor> ::= <integer>
    <factor> ::= ( <arith expr> )

After applying (T1) to the Basic Specification, these three Definitions would be replaced by the single Definition

    <factor> ::= <variable> | <integer> | ( <arith expr> )

This Definition, of course, should be read "a source string is a <factor> if it is either a <variable> or an <integer> or an <arith expr> enclosed in parentheses." This Backus Normal Form is exactly the form of Syntax Specification used in the defining documents for ALGOL 60 [8], and Table 1 presents a complete syntax for a simple arithmetic programming language in this form, which we shall refer to as "Syntax I."
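Transformation (T1) amounts to nothing more than grouping the Basic Definitions by their left-hand sides. A sketch, under the same sort of illustrative dictionary representation assumed above:

    # Sketch: transformation (T1), collapsing Basic Definitions into
    # Backus Normal Form (one Alternation per Defined Type).
    def to_backus_normal_form(basic_definitions):
        """basic_definitions: list of (defined_type, construction) pairs;
        a construction is a list of Component names. Returns a dict
        mapping each Defined Type to its list of Alternatives."""
        bnf = {}
        for dtype, construction in basic_definitions:
            bnf.setdefault(dtype, []).append(construction)
        return bnf

    basic = [("factor", ["variable"]),
             ("factor", ["integer"]),
             ("factor", ["(", "arith expr", ")"])]
    print(to_backus_normal_form(basic))
    # {'factor': [['variable'], ['integer'], ['(', 'arith expr', ')']]}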
The Action of the Analyzer

We can now sketch out the action of the Analyzer: At the beginning of the process, it takes the Starting Type of the Specification as its first goal. Then at any point in the process it follows these steps when it has a Defined Type as its current goal: The Analyzer consults the Definition of the Defined Type (in Backus Normal Form, of course, each Defined Type has a unique Definition), and specifically, it considers the first Alternative in that Definition. It then successively takes each Component of that Alternative as a sub-goal. (Of course, it must re-enter itself for each of these goals, and it must keep track of where it was at each level of re-entry.) If at any point it fails to find one of these sub-goals, it abandons that Alternative, considers the next Alternative in that Definition, if there is one, and steps through the Components of that Alternative. If there is no next Alternative, it has failed to realize its current goal, and reports this fact "upstairs." If it succeeds in finding the sub-goals corresponding to each of the Components of any Alternative in the Definition of its current goal, it has found its goal, and reports that fact.

This rough sketch conveniently ignores a number of sticky points which we now have to consider. The first of these points is that we discussed the action of the Analyzer only when its current goal was a Defined Type. What if the goal is a Terminal Character? When it comes to writing a compiler in practice, the question of recognizing Terminal Characters brings us face to face with the lovely problems of restricted character sets, input-output idiosyncrasies of the particular computer, etc. Both in practice and in the remainder of this paper, we assume the presence of another routine, called the "Recognizer," which the Analyzer can call upon to deal with these problems. So far, we have also glossed over the problem of keeping track of where in the Input String the Analyzer is looking.

Obviously, when the first Component of some Construction has been recognized, starting at a certain point in the Input String, then, when the Analyzer proceeds to look for the next Component, it must move its Input-String pointer past the substring which satisfied the first Component. Now, since a Type which has been successfully recognized consists, ultimately, of a sequence of Terminal Characters, and the recognition of Terminal Characters is the job of the Recognizer, we shall also leave the moving of the Input-String pointer to the Recognizer. The fundamental action of the Recognizer is then as follows: The Recognizer is called by the Analyzer, and asked if a specified Terminal Character occurs at a stated character position in the Input String. The Recognizer then negotiates with the I/O system of the computer (if necessary) and examines the character position in the Input String. If the input character at that position is not the Terminal Character the Analyzer asked for, the Recognizer reports failure. However, if the input character is the desired Terminal Character, the Recognizer reports success to the Analyzer, and advances the Input-String pointer by one character position.

Having the Recognizer at hand, it turns out to be convenient in practice to give it some further responsibilities. Consider the definitions of <integer> and <variable> in Syntax I. These amount to saying that an <integer> is any sequence of digits, and a <variable> is any sequence of letters or digits as long as the first one is a letter. If we relegate the recognition of these fundamental types to the Recognizer, rather than the Analyzer, we obtain several advantages.

- The Recognizer can be hand-tailored to perform these particular recognitions very efficiently on the particular machine, and this speeds up the analysis considerably.

- As far as the Analyzer is concerned, if the Syntax Specification calls for an <integer>, for instance, any old integer will do. But when we come to generating output code, we'll need to know the particular integer which occurred at that point. The Recognizer can perform the conversion from external number representation to machine representation, and either return the numerical value, or enter the number in a "Literal Table" and return its index value. Similarly, when it recognizes a <variable>, it can look in a "Symbol Table" for previous occurrences of that particular variable, add it to the table if necessary, and return a line number.

- In practical applications the question of what constitutes a "blank" is often an involved one. In some languages, a long comment may function syntactically as a blank. When a compiler runs under the control of some operating systems, certain segments of the Input String (e.g., identification fields in cards) must be treated as blanks, or ignored entirely. Since the Recognizer constitutes the interface between the Analyzer and the outside world, it can take care of these matters.

To allow for this extended Recognizer in our Syntax Specification, we allow another sort of Terminal Type (up to now, we recall, the only Terminal Types have been Terminal Characters). We designate these new Terminal Types with script capital letters, and call them "Terminal Classes." Thus, in Syntax I, we can delete the definitions of <variable>, <integer>, <letter>, and <digit>, and replace the Definientes of <variable> and <integer> by the Terminal Classes 𝒱 and ℐ, respectively. This produces Syntax II, Table 1, which is the one we shall refer to throughout the rest of this paper.
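To make this division of labor concrete, here is a minimal sketch of such an extended Recognizer. The class code numbers 101 and 102 anticipate the Syntax Type Table given later; the in-memory Input String, the class and method names, and the convention of letting a single Terminal Character stand as its own code are our own illustrative assumptions.

    # Sketch of an extended Recognizer. Code 101 asks for the Terminal
    # Class of variables, 102 for that of integers; any other code is
    # taken here to be the desired Terminal Character itself.
    class Recognizer:
        def __init__(self, source):
            self.source = source        # the Input String, held in memory
            self.symbol_table = []      # variable names, by line number
            self.literal_table = []     # integer constants, by line number
            self.found = None           # table line of the latest class match

        def recognize(self, code, char):
            """Seek Terminal code `code` at position `char` (0-origin).
            Return the advanced pointer on success, None on failure."""
            s = self.source
            if code == 102:             # integer: a nonempty digit sequence
                j = char
                while j < len(s) and s[j].isdigit():
                    j += 1
                if j == char:
                    return None
                self.found = self._enter(self.literal_table, int(s[char:j]))
                return j
            if code == 101:             # variable: letter, then letters/digits
                if char >= len(s) or not s[char].isalpha():
                    return None
                j = char + 1
                while j < len(s) and s[j].isalnum():
                    j += 1
                self.found = self._enter(self.symbol_table, s[char:j])
                return j
            if char < len(s) and s[char] == code:
                return char + 1         # a single Terminal Character
            return None

        def _enter(self, table, item):
            if item not in table:
                table.append(item)
            return table.index(item)

    r = Recognizer("X=NU*(Y+15)")
    assert r.recognize(101, 2) == 4 and r.symbol_table == ["NU"]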
But this technique could be carried further. A compiler-builder might decide that he prefers operator-precedence techniques for the analysis of arithmetic expressions, while keeping the flexibility of syntax-direction for analysis of the larger constructions. His arithmetic-expression scanner would then function as a Recognizer for the previously Defined Type <arith expr>, and, for this simple language, the Syntactic Specification would take the form of Syntax III, Table 1.

To summarize: When the current goal is a Defined Type, the Analyzer calls upon itself to find it, but when the goal is a Terminal Type, it calls upon the Recognizer. When the Recognizer is called, it determines, according to its own internal rules, whether the desired Terminal Type occurs in the Input String at the current pointer position; if not, it reports failure; if so, it advances the pointer past the substring which constitutes the Terminal Type (single character, or member of a Terminal Class), and reports success.

TABLE 1
Alternative Syntax Specifications

Syntax I:
    <program>     ::= <assignment> | <assignment> ; <program>
    <assignment>  ::= <variable> = <arith expr>
    <arith expr>  ::= <term> | <arith expr> + <term>
    <term>        ::= <factor> | <term> * <factor>
    <factor>      ::= <variable> | <integer> | ( <arith expr> )
    <variable>    ::= <letter> | <variable> <letter> | <variable> <digit>
    <integer>     ::= <digit> | <integer> <digit>
    <letter>      ::= A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
    <digit>       ::= 0|1|2|3|4|5|6|7|8|9

Syntax II:
    <program>     ::= <assignment> | <assignment> ; <program>
    <assignment>  ::= <variable> = <arith expr>
    <arith expr>  ::= <term> | <arith expr> + <term>
    <term>        ::= <factor> | <term> * <factor>
    <factor>      ::= <variable> | <integer> | ( <arith expr> )
    <variable>    ::= 𝒱
    <integer>     ::= ℐ

Syntax III:
    <program>     ::= <assignment> | <assignment> ; <program>
    <assignment>  ::= <variable> = <arith expr>
    <arith expr>  ::= ℰ
    <variable>    ::= 𝒱

Encoding the Syntax Specification

We are now almost ready to proceed to the encoding of the Syntax Specification for the use of the Analyzer, except for one embarrassing question: Consider, as an example, the definition of <term> in Syntax II:

    <term> ::= <factor> | <term> * <factor>

What if the Analyzer should find itself considering the second Alternative in this Definition? This would amount to the Analyzer saying to itself "in order to find my current goal, which is <term>, I must set out to find the first Component of this Alternative, which is <term>." In order to find a term, it must be able to find a term first. This is called the "left recursion problem," and it has led some language designers to disallow Type Definitions which include Alternatives which mention the Defined Type of the Definition as their first Component. To a human analyst, of course, the intent of the Definition is plain: he should first look for a <factor>; if he finds one, he has indeed found a <term>, but he should continue looking to see if he can find a '*' followed by another <factor>; if he can, he has found a "longer" <term>, and should continue looking for a still longer one; as soon as he fails to find a '*' following his latest <term>, he can stop looking, confident that he has found the longest <term> at that point in the string. This recognition process can be embodied in the encoding of the Syntax Specification, but it does require detecting the presence of these left-recursive Alternatives, and giving them some special treatment.
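The human analyst's procedure is, in effect, iteration in place of left recursion. A sketch for the Definition of <term>, where parse_factor is an assumed routine returning the new pointer position, or None on failure:

    # Sketch: the analyst's iterative reading of the left-recursive
    # Definition  <term> ::= <factor> | <term> * <factor>.
    def parse_term(s, pos, parse_factor):
        pos = parse_factor(s, pos)
        if pos is None:
            return None                  # no <factor>, hence no <term>
        while pos < len(s) and s[pos] == "*":
            after = parse_factor(s, pos + 1)
            if after is None:
                break                    # no <factor> after '*': stop here
            pos = after                  # found a longer <term>; keep going
        return pos

    # A toy <factor>: a single letter or digit.
    factor = lambda s, p: p + 1 if p < len(s) and s[p].isalnum() else None
    assert parse_term("a*b*c+d", 0, factor) == 5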
Keeping this in mind, we shall proceed to encode the Syntax Specification. The encoding consists of two tables, the Syntax Type Table and the Syntax Structure Table. The Syntax Type Table will contain an entry for each Syntactic Type which occurs anywhere in the Syntax Specification, whether it be a Defined Type or a Terminal Type. Each entry in the Type Table consists of two items: a yes-no item TERMINAL, and an integer item LOOKFOR. When line t in the Type Table corresponds to a Terminal Type, TERMINAL [t] will be set to "Yes," and LOOKFOR [t] will contain an arbitrary code number which the Recognizer will interpret as denoting the particular Terminal Character or Terminal Class it should try to recognize. When line t in the Type Table corresponds to a Defined Type, TERMINAL [t] will be set to "No," and LOOKFOR [t] will contain some line number in the Syntax Structure Table, to be filled in presently. We keep in mind that we can now use small integers as, in effect, names of Syntactic Types, by using them to index the Syntax Type Table.

The Syntax Structure Table will contain a line for each Component of each Alternative of each Definiens in the Syntax Specification. Each line of the Structure Table will consist of four items: TYPECODE, an integer item, will contain line numbers referring to the Type Table; STRUCT, a yes-no item; SUCCESSOR and ALTERNATE, integer items, will contain line numbers referring to other lines of the Structure Table, plus two special codes denoted by "OK" and "FAIL." The Syntax Structure Table is constructed according to the following rules:

Consider first a Defined Type which has no left-recursive Alternative in its Definiens. Reserve a block of entries in the Structure Table. Assign an entry in the Structure Table to each Component in each Alternative in the Definiens. In the Type Table line corresponding to this Defined Type, say t, set LOOKFOR [t] to the Structure-Table line number assigned to the first Component of the first Alternative of the Definiens. In each Component-line s, set TYPECODE [s] to the Type Table line number of the Syntactic Type which occurs as that Component in the Definiens. In each line corresponding to a Component which is not the last Component of an Alternative, set STRUCT to "No" and SUCCESSOR to the line corresponding to the next Component. In each line corresponding to a Component which is the last Component of an Alternative, set STRUCT to "Yes" and SUCCESSOR to "OK." In each line corresponding to a first Component of any Alternative except the last Alternative of the Definiens, set ALTERNATE to the line corresponding to the first Component of the next Alternative. Set all other ALTERNATE fields to "FAIL."

If the Defined Type contains a left-recursive Alternative (we shall here assume there is only one left-recursive Alternative; see Appendix): Set the left-recursive Alternative aside temporarily, and carry out the above process for the other Alternatives. Then: Assign a Structure-Table line to each Component of the recursive Alternative except the recursive Component itself. Set TYPECODE in each of these lines, as above. Set SUCCESSOR and STRUCT in each of these lines, except for the last Component, as above. Call the first of these lines (the one corresponding to the Component which immediately follows the recursive Component in the Definiens) the "handle." Set ALTERNATE in each of these lines, except the handle, to "FAIL." Set ALTERNATE in the handle to "OK." Set SUCCESSOR in the line for the last Component of this Alternative to the handle, and set STRUCT in this line to "Yes." Now, in the lines corresponding to last Components in all the other Alternatives in this Definiens, SUCCESSOR will have been set to "OK" by the nonrecursive treatment. Replace each of these "OK"s by the line number of the handle.
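These rules are mechanical enough to be carried out by a program. A sketch which encodes one Defined Type at a time, under the assumption stated above of at most one left-recursive Alternative; the list-and-dictionary representation and the naming of types by their Type Table indices are our own conventions:

    # Sketch: building Structure Table lines for one Defined Type,
    # following the rules above. Lines are numbered from 1. A grounded
    # grammar guarantees at least one non-recursive Alternative.
    OK, FAIL = "OK", "FAIL"

    def encode(defined, alts, typecode, struct, successor, alternate, lookfor):
        plain = [a for a in alts if a[0] != defined]
        recursive = [a for a in alts if a[0] == defined]
        first_lines, last_lines = [], []
        for alt in plain:
            for k, comp in enumerate(alt):
                typecode.append(comp)
                last = (k == len(alt) - 1)
                struct.append("Yes" if last else "No")
                successor.append(OK if last else len(typecode) + 1)
                alternate.append(FAIL)
                if k == 0:
                    first_lines.append(len(typecode))
                if last:
                    last_lines.append(len(typecode))
        lookfor[defined] = first_lines[0]
        for a, b in zip(first_lines, first_lines[1:]):
            alternate[a - 1] = b        # chain the Alternatives together
        if recursive:
            tail = recursive[0][1:]     # drop the recursive Component
            handle = len(typecode) + 1
            for k, comp in enumerate(tail):
                typecode.append(comp)
                last = (k == len(tail) - 1)
                struct.append("Yes" if last else "No")
                successor.append(handle if last else len(typecode) + 1)
                alternate.append(OK if len(typecode) == handle else FAIL)
            for line in last_lines:     # each plain "OK" becomes the handle
                successor[line - 1] = handle

    # <arith expr> (iii) ::= <term> (iv) | <arith expr> + (xii) <term>:
    tc, st, su, al, lf = [], [], [], [], {}
    encode(3, [[4], [3, 12, 4]], tc, st, su, al, lf)
    assert (tc, su, al) == ([4, 12, 4], [2, 3, 2], [FAIL, OK, FAIL])
    # These are lines 7-9 of Table 2, renumbered locally from 1.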
The Syntax Type Table and the Syntax Structure Table corresponding to Syntax II are shown together as Table 2. In the hope of reducing the confusion involved in having entries in each table pointing to the other, we have indicated the indexing of the Syntax Type Table with Roman numerals, and the indexing of the Syntax Structure Table with Arabic numerals, and we have added the names of the syntactic types corresponding to the lines of the Syntax Type Table.

TABLE 2
The Syntax Tables

2.1 Syntax Type Table

    (Index)  (Type)         TERMINAL  LOOKFOR
    i        <program>      No        1
    ii       <assignment>   No        4
    iii      <arith expr>   No        7
    iv       <term>         No        10
    v        <factor>       No        13
    vi       <variable>     No        18
    vii      <integer>      No        19
    viii     𝒱              Yes       101
    ix       ℐ              Yes       102
    x        ;              Yes       1
    xi       =              Yes       2
    xii      +              Yes       3
    xiii     *              Yes       4
    xiv      (              Yes       5
    xv       )              Yes       6

2.2 Syntax Structure Table

    (Index)  TYPECODE  STRUCT  SUCCESSOR  ALTERNATE  Corresponds to Definition
    1        ii        Yes     2          FAIL       1.1
    2        x         No      3          OK
    3        i         Yes     OK         FAIL       1.2
    4        vi        No      5          FAIL
    5        xi        No      6          FAIL
    6        iii       Yes     OK         FAIL       2.1
    7        iv        Yes     8          FAIL       3.1
    8        xii       No      9          OK
    9        iv        Yes     8          FAIL       3.2
    10       v         Yes     11         FAIL       4.1
    11       xiii      No      12         OK
    12       v         Yes     11         FAIL       4.2
    13       vi        Yes     OK         14         5.1
    14       vii       Yes     OK         15         5.2
    15       xiv       No      16         FAIL
    16       iii       No      17         FAIL
    17       xv        Yes     OK         FAIL       5.3
    18       viii      Yes     OK         FAIL       6.1
    19       ix        Yes     OK         FAIL       7.1

[Figure 1. Flow Chart for ANALYZE. (The chart begins by setting GOAL to the Starting Type, SOURCE to 0, and CHAR to 1, and then tests TERMINAL [GOAL].)]

The Analyzer Algorithm

The flow chart, Figure 1, illustrates an Analyzer working from Syntax Tables of the sort we have just constructed. The following remarks will help in following the flow chart.

- The global quantities GOAL, SOURCE, and CHAR are used as follows: GOAL is the line number in the Syntax Type Table corresponding to the Syntactic Type currently being considered (initially, the Starting Type). SOURCE is the line number in the Syntax Structure Table of the Component currently being considered (initially undefined, hence set to zero). CHAR is the (ordinal) number of the character position in the Input String next to be considered (initially set to the beginning of the String, here 1).

- The operations of "Pushdown" and "Popup" are performed on these globals; this may be done by any appropriate mechanism. For definiteness, let us assume an index Y (initially zero) accessible to these two operations, and arrays GYOYO, SYOYO, and CYOYO. Then (in quasi-ALGOL notation):

    Pushdown:  Y := Y + 1;
               GYOYO[Y] := GOAL;
               SYOYO[Y] := SOURCE;
               CYOYO[Y] := CHAR;

    Popup:     GOAL := GYOYO[Y];
               SOURCE := SYOYO[Y];
               if CHAR is mentioned in the call then CHAR := CYOYO[Y];
               Y := Y - 1;

Plausibly, CHAR is popped up when an Alternative has failed, and the Analyzer must back up to the beginning of that Construction and try another Alternative at the same place; and CHAR is not popped up (hence left as it has been set by the successful recognitions) when some Alternative has succeeded.

- Recognize is assumed as described earlier: It returns a success/failure indicator which is tested in the "Found?" box. For definiteness again, we shall assume that, when it succeeds in recognizing a Terminal Class, it places a Symbol-Table or Literal-Table line number in some global location, for the Generator to use.

- The following sections of this paper will discuss possible uses to be made of the Analyzer's results. The routine which considers these results is named the "Generator," and it is represented in this flow chart by a subroutine call box: "Generate." When Generate is called, the value of SOURCE uniquely indicates the Syntactic Type which has been recognized and, moreover, the particular Alternative in the Definition of that Syntactic Type which has just succeeded. The column headed "Corresponds to Definition" has been added to the Syntax Structure Table to indicate this relationship. The numbers in this column correspond to the Alternatives in the "semantics" tables, Tables 4 and 5.
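In lieu of the flow chart, here is a sketch of the whole Analyzer in present-day terms. The tables are those of Table 2; for brevity the single Terminal Characters stand as their own code numbers, positions are passed and returned rather than kept in a global CHAR (which gives the effect of Pushdown and Popup for free), and Generate is a stub which merely reports the Definition recognized. The Recognizer here is a toy stand-in for the one sketched earlier.

    OK, FAIL = "OK", "FAIL"

    # Syntax Type Table, indices i..xv rendered as 1..15: (TERMINAL, LOOKFOR).
    TYPE = {1: (False, 1), 2: (False, 4), 3: (False, 7), 4: (False, 10),
            5: (False, 13), 6: (False, 18), 7: (False, 19),
            8: (True, 101), 9: (True, 102), 10: (True, ";"),
            11: (True, "="), 12: (True, "+"), 13: (True, "*"),
            14: (True, "("), 15: (True, ")")}

    # Syntax Structure Table: line -> (TYPECODE, STRUCT, SUCCESSOR, ALTERNATE).
    STRUCTURE = {1: (2, 1, 2, FAIL), 2: (10, 0, 3, OK), 3: (1, 1, OK, FAIL),
                 4: (6, 0, 5, FAIL), 5: (11, 0, 6, FAIL), 6: (3, 1, OK, FAIL),
                 7: (4, 1, 8, FAIL), 8: (12, 0, 9, OK), 9: (4, 1, 8, FAIL),
                 10: (5, 1, 11, FAIL), 11: (13, 0, 12, OK), 12: (5, 1, 11, FAIL),
                 13: (6, 1, OK, 14), 14: (7, 1, OK, 15), 15: (14, 0, 16, FAIL),
                 16: (3, 0, 17, FAIL), 17: (15, 1, OK, FAIL),
                 18: (8, 1, OK, FAIL), 19: (9, 1, OK, FAIL)}

    DEFINITION = {1: "1.1", 3: "1.2", 6: "2.1", 7: "3.1", 9: "3.2", 10: "4.1",
                  12: "4.2", 13: "5.1", 14: "5.2", 17: "5.3", 18: "6.1", 19: "7.1"}

    def recognize(code, s, pos):          # toy stand-in for the Recognizer
        if code == 101:                   # Terminal Class: variable
            if pos < len(s) and s[pos].isalpha():
                while pos < len(s) and s[pos].isalnum(): pos += 1
                return pos
            return None
        if code == 102:                   # Terminal Class: integer
            start = pos
            while pos < len(s) and s[pos].isdigit(): pos += 1
            return pos if pos > start else None
        return pos + 1 if pos < len(s) and s[pos] == code else None

    def seek(goal, s, pos):               # pursue one Syntactic Type
        terminal, lookfor = TYPE[goal]
        if terminal:
            return recognize(lookfor, s, pos)
        return step(lookfor, s, pos)

    def step(line, s, pos):               # pursue one Structure Table line
        typecode, struct, succ, alt = STRUCTURE[line]
        after = seek(typecode, s, pos)
        if after is None:                 # the Component failed
            if alt == FAIL: return None
            if alt == OK: return pos      # keep what has been found so far
            return step(alt, s, pos)      # try the next Alternative here
        if struct:                        # a Construction is complete
            print("Generate", DEFINITION[line])
        return after if succ == OK else step(succ, s, after)

    assert seek(1, "X=NU*(Y+15)", 0) == len("X=NU*(Y+15)")

Note how the handle mechanism works out: after a <factor> completes Definition 4.1, line 11 tries for a '*'; failure there is ALTERNATE "OK," so the <term> found so far is kept, while success loops back through line 12 and Definition 4.2 for a longer <term>.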
DOING SOMETHING USEFUL WITH THE ANALYSIS

A syntactic analysis, such as that depicted verbally in the preceding section or via the flow chart in Figure 1, is an important part of the problem which must be solved by a compiler, but it is only a part. The goal of a compiler is, after all, to produce the coding required to carry out the procedure described in the programming language being compiled. This coding might be desired as actual machine instructions for some computer, or it might be desired as instructions appropriate for some interpreter available on one or more machines, or it might be desired in some other form. In any event, some further processing is required once the syntactic analysis is complete in order to "generate" and format the coding to be output.

Let us suppose that the syntactic analysis has proceeded to the point where some syntactic type has been recognized (in the flow chart, Figure 1, we have passed through the "GENERATE" box). The contents of SOURCE tells us which syntactic type has been recognized, as well as which alternative construction of the syntactic type was built. Thus, some action or set of actions could be initiated at this point to process this syntactic type in a variety of ways. For example, Table 3 gives for each alternative construction of the syntactic types of Syntax II a verbal description of the computations to be performed upon recognition of that construction.

TABLE 3
"Interpretive Semantics" for Syntax II

1.1 <program> ::= <assignment>
    {Execute the <assignment>; then halt}
1.2   | <assignment> ; <program>
    {Execute the <assignment>; then proceed}
2.1 <assignment> ::= <variable> = <arith expr>
    {"Locate" the <variable> (determine its address for later assignment of value); then evaluate the <arith expr>; then assign its value to the <variable>}
3.1 <arith expr> ::= <term>
    {Evaluate the <term>; the value of the <arith expr> is this value}
3.2   | <arith expr> + <term>
    {Evaluate the <arith expr> and then the <term>; the value of the <arith expr> (defined) is the sum of these two values}
4.1 <term> ::= <factor>
    {Evaluate the <factor>; the value of the <term> is this value}
4.2   | <term> * <factor>
    {Evaluate the <term> and then the <factor>; the value of the <term> (defined) is the product of these two values}
5.1 <factor> ::= <variable>
    {The value of the <factor> is the value of the <variable>}
5.2   | <integer>
    {The value of the <factor> is the value of the <integer>}
5.3   | ( <arith expr> )
    {Evaluate the <arith expr>; the value of the <factor> is the value of the <arith expr>}
6.1 <variable> ::= 𝒱
    {The value of the <variable> is the value most recently assigned to the variable 𝒱}
7.1 <integer> ::= ℐ
    {The value of the <integer> is the value of the integer ℐ (according to the conventional representation of integers)}

Corresponding to this table, the analysis of the assignment statement X = NU * (Y + 15) could yield the following fragments of (slightly edited) verbal description:

    Line  Construction                                Source             Description of Computation
    6.1   <variable> ::= 𝒱                            X                  The value* of the <variable> is the value most recently assigned to the variable X.
    6.1   <variable> ::= 𝒱                            NU                 The value of the <variable> is the value most recently assigned to the variable NU.
    5.1   <factor> ::= <variable>                     NU                 The value of the <factor> is the value of NU.
    4.1   <term> ::= <factor>                         NU                 The value of the <term> is the value of NU.
    6.1   <variable> ::= 𝒱                            Y                  The value of the <variable> is the value most recently assigned to the variable Y.
    5.1   <factor> ::= <variable>                     Y                  The value of the <factor> is the value of Y.
    4.1   <term> ::= <factor>                         Y                  The value of the <term> is the value of Y.
    3.1   <arith expr> ::= <term>                     Y                  The value of the <arith expr> is the value of Y.
    7.1   <integer> ::= ℐ                             15                 The value of the <integer> is 15.
    5.2   <factor> ::= <integer>                      15                 The value of the <factor> is 15.
    4.1   <term> ::= <factor>                         15                 The value of the <term> is 15.
    3.2   <arith expr> ::= <arith expr> + <term>      Y + 15             The value of the <arith expr> is Y + 15.
    5.3   <factor> ::= ( <arith expr> )               (Y + 15)           The value of the <factor>, (Y + 15), is Y + 15.
    4.2   <term> ::= <term> * <factor>                NU * (Y + 15)      The value of the <term> is NU * (Y + 15).
    3.1   <arith expr> ::= <term>                     NU * (Y + 15)      The value of the <arith expr> is NU * (Y + 15).
    2.1   <assignment> ::= <variable> = <arith expr>  X = NU * (Y + 15)  Assign the value of NU * (Y + 15) to the variable X.
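The rules of Table 3 map directly onto a value stack driven by the Generate calls: each completed Construction pops the values of its evaluated Components and pushes the value of the Type defined. A sketch, with an assumed dictionary env holding the value most recently assigned to each variable, and with the datum the Recognizer supplies for a Terminal Class passed along as arg:

    # Sketch: Table 3 as actions on a value stack, driven by Generate.
    def make_interpreter(env):
        stack = []
        def value(entry):
            kind, x = entry
            return env[x] if kind == "var" else x
        def generate(definition, arg=None):
            if definition == "6.1":          # <variable> ::= V: name from arg
                stack.append(("var", arg))
            elif definition == "7.1":        # <integer> ::= I: value from arg
                stack.append(("val", arg))
            elif definition == "3.2":        # sum of <arith expr> and <term>
                b, a = value(stack.pop()), value(stack.pop())
                stack.append(("val", a + b))
            elif definition == "4.2":        # product of <term> and <factor>
                b, a = value(stack.pop()), value(stack.pop())
                stack.append(("val", a * b))
            elif definition == "2.1":        # assign to the located <variable>
                v = value(stack.pop())
                env[stack.pop()[1]] = v
            # 1.1, 1.2, 3.1, 4.1, 5.1, 5.2, 5.3: the value passes through.
        return generate

    env = {"NU": 3, "Y": 5}
    gen = make_interpreter(env)
    # Generate calls in the order produced by analyzing X = NU * (Y + 15):
    for d, a in [("6.1", "X"), ("6.1", "NU"), ("5.1", None), ("4.1", None),
                 ("6.1", "Y"), ("5.1", None), ("4.1", None), ("3.1", None),
                 ("7.1", 15), ("5.2", None), ("4.1", None), ("3.2", None),
                 ("5.3", None), ("4.2", None), ("3.1", None), ("2.1", None)]:
        gen(d, a)
    assert env["X"] == 60                    # NU * (Y + 15) with NU=3, Y=5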
* Obviously, the current value of the variable X is not of interest here, since the purpose of the assignment is to assign a value to X.

Thus, with a certain amount of editing, the recognition of X = NU * (Y + 15) yields the verbal description: "Let NU and Y represent the values most recently assigned to the variables NU and Y; then compute NU * (Y + 15) and assign the resulting value to the variable X."

Table 4 illustrates a very simple approach to the problem of machine code synthesis. With each syntactic construction is associated a set of actions. These actions are of two types: output and set. The interpretation of the actions is reasonably obvious. The bracketed numerals under the components of a construction identify the components. The notation addr[1] means the result address associated with the component identified by [1] in the syntax specification. A bracketed blank represents the syntactic type being defined, and addr[ ] represents the result address of this type; when addr[ ] appears in a machine instruction, a temporary storage register is to be assigned. In the example below we use the notation tj for the jth such temporary.

TABLE 4
Machine Code Semantics for Syntax II: Direct Generation

1.1 <program> ::= <assignment>
                     [1]
    1. Output END
1.2   | <assignment> ; <program>
          [1]       [2]    [3]
2.1 <assignment> ::= <variable> = <arith expr>
                        [1]    [2]     [3]
    1. Output CLA (addr[3])
    2. Output STO addr[1]
3.1 <arith expr> ::= <term>
                       [1]
    1. Set addr[ ] = addr[1]
3.2   | <arith expr> + <term>
           [1]      [2]   [3]
    1. Output CLA (addr[1])
    2. Output ADD (addr[3])
    3. Output STO (addr[ ])
4.1 <term> ::= <factor>
                 [1]
    1. Set addr[ ] = addr[1]
4.2   | <term> * <factor>
          [1]  [2]   [3]
    1. Output LDQ (addr[1])
    2. Output MPY (addr[3])
    3. Output STQ (addr[ ])
5.1 <factor> ::= <variable>
                    [1]
    1. Set addr[ ] = addr[1]
5.2   | <integer>
           [1]
    1. Set addr[ ] = addr[1]
5.3   | ( <arith expr> )
         [1]   [2]    [3]
    1. Set addr[ ] = addr[2]
6.1 <variable> ::= 𝒱
    1. Set addr[ ] = the variable name recognized at this point of the input string.
7.1 <integer> ::= ℐ
    1. Set addr[ ] = a symbolic name for the address in which will be kept the integer constant recognized at this point in the input string.

Again consider the assignment X = NU * (Y + 15). If this assignment is analyzed and the actions carried out as per Table 4, the following results:

    Line  Construct                  Source  Result
    6.1   <variable> ::= 𝒱           X       addr[ ] = X
    6.1   <variable> ::= 𝒱           NU      addr[ ] = NU
    5.1   <factor> ::= <variable>    NU      addr[ ] = NU
    4.1   <term> ::= <factor>        NU      addr[ ] = NU
    6.1   <variable> ::= 𝒱           Y       addr[ ] = Y
    5.1   <factor> ::= <variable>    Y       addr[